[Qemu-devel] Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.

2010-04-13 Thread Yoshiaki Tamura

Avi Kivity wrote:

On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote:

Is it necessary to update migration and vga bitmaps?

We can simply update the master bitmap, and update the migration and vga
bitmaps only when they need it. That can be done in a different patch.



Let me explain the role of the master bitmap here.

In the previous discussion, the master bitmap represented at least one
dirty type is actually dirty. While implementing this approach, I
though there is one downside that upon resetting the bitmap, it needs
to check all dirty types whether they are already cleared to unset the
master bitmap, and this might hurt the performance in general.


The way I see it, the master bitmap is a buffer between the guest and
all the other client bitmaps (migration and vga). So, a bit can be set
in vga and cleared in master. Let's look at the following scenario:

1. guest touches address 0xa (page 0xa0)
2. qemu updates master bitmap bit 0xa0
3. vga timer fires
4. vga syncs the dirty bitmap for addresses 0xa-0xc
4.1 vga_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.2 migration_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.3 master_bitmap[0xa0-0xc0] = 0
5. vga draws based on vga_bitmap

the nice thing is that step 4.2 can be omitted if migration is not active.

so, client bitmaps are always updated when the client requests them,
master bitmap is only a buffer.


Thanks.  I think I'm finally getting your point.
At 4.2 and 4.3, is syncing to client necessary?
Simply doing OR at get_dirty like in this patch not enough?


In this patch, master bitmap represents all types are dirty, similar
to existing 0xff. With this approach, resetting the master bitmap can
be done without checking the other types. set_dirty_flags is actually
taking the burden in this case though. Anyway, IIUC somebody would be
unhappy depending on the role of the master bitmap.


Yes, the problem is that set_dirty_flags is the common case and also
uses random access, we'd want to make it touch only a single bit. client
access is rare, and also sequential, and therefore efficient.


I see.
So set_dirty_flags would be like,

1. set master first regard less of dirty type.
2. if dirty type was CODE, set the client.


Note that we should only allocate the migration and vga bitmaps when
migration or vga is active.


Right. May I do it in a different patch?


Sure, more patches is usually better.


I think this is about optimization. In this patch, I focused on not
breaking the existing function.


Yes, that's the best approach. But we do need to see how all the
incremental steps end up.


I totally agree.  Thanks for helping.

In summary, I'll prepare two patch sets.

1. v3 of this patch set.
- Change FLAGS value to (1,2,4,8), and add IDX (0,1,2,3)
- Use ffsl to convert FLAGS to IDX.
- Add help function which takes IDX.
- Change the behavior of set_dirty_flags as above.
- Change dirty bitmap access to a loop.
- Add brace after if ()
- Move some macros to qemu-common.h.

2. Allocate vga and migration dirty bitmap dynamically.

Please modify or add items if I were missing.




Re: [Qemu-devel] QEMU regression problems

2010-04-13 Thread Roy Tam
2010/4/12 Gerhard Wiesinger :
> Hello,
>
> Checkit reports some problems under DOS:
> 1.) NPU functions are not correct: NPU Trigonometric Functions: FAILED.
> Seems to be a problem of the instruction set.
> 2.) Real-Time Clock Alarm: FAILED (This might be also the reason for the
> KVM problem, see my previous post). Seems to be that real-time clock is not
> working correct.
> 3.) There is also a problem with the reported base memory under QEMM386
> (HIMEM.SYS and EMM386.EXE is correct here). It is 646kB instead of 640kB.
> Therefore base memory test fails. I guess that reporting memory CMOS
> tables/interrupt functions are not implemented correctly.
>
> Details are listed below.
>
> All issues are NOT present under VMWare Server 2.0 and with real hardware.
>
> QEMU: 0.12.3 under Fedora 11, 2.6.30.10-105.2.23.fc11.x86 on AMD Phenom II
> Quad Core, x86_64-softmmu.
>
> Any comments?
>

Tested with Checkit 3.0 and QEMM 9.0.
- The NPU error is confirmed and it seems to be a floating point
rounding error in QEMU. This should be a 0.12 regression as it works
in 0.11.1 and 0.10.6.
- The RTC Alarm failure is confirmed too. Is it unimplemented in QEMU?
- The Base Memory > 640K error seems to be SeaBIOS related. QEMU Bochs
BIOS(tested with both -old-bios hack in 0.12 series and old 0.11.1)
will just freeze after QEMU counted RAM.(Tested with ScriptPC and
Bochs).

> Thnx.
>
> Ciao,
> Gerhard
>
> --
> http://www.wiesinger.com/
>
> Details:
> 1.)
>NPU Trigonometric Functions.FAILED ***
>Step 1, Expected0.42926, received0.42926
>
> Double 'Npu_oldans1' = 0.429259 (3FDB78F91894EFA5h).
> Double 'Npu_oldans2' = 0.628319 (3FE41B2F769CF0E0h).
> Double 'Npu_result ' = 0.429259 (3FDB78F91894EFA6h).
>
> 2.)
> Compare Current TimePassed
>DOS: 16:24:39.89Real-Time Clock: 16:24:39.00   (.89 apart)
>
> Compare Current DatePassed
>DOS: 04/11/2010 Real-Time Clock: 04/11/2010.
>
> Real-Time Clock Alarm...FAILED ***
>
> Compare Elapsed TimePassed
>DOS: 11.97 Seconds  Real-Time Clock: 12.00 Seconds   (.03 apart)
>
> 3.) Known Memory:
>Base646K   From  0K to646K   (000h to 00A17FFh)
>Base Memory.FAILED ***
>ERROR at Address   0Ah, Bits FEDCBA9876543210
>ERROR at Address   0A0004h, Bits FEDCBA9876543210
>ERROR at Address   0A0006h, Bits FEDCBA9876543210
>ERROR at Address   0A0008h, Bits FEDCBA9876543210
>ERROR at Address   0A000Ah, Bits FEDCBA9876543210
>ERROR at Address   0A000Ch, Bits FEDCBA9876543210
>ERROR at Address   0A000Eh, Bits FEDCBA9876543210
>ERROR at Address   0A0010h, Bits FEDCBA9876543210
>ERROR at Address   0A0012h, Bits FEDCBA9876543210
>ERROR at Address   0A0014h, Bits FEDCBA9876543210
>ERROR at Address   0A0016h, Bits FEDCBA9876543210
>ERROR at Address   0A0018h, Bits FEDCBA9876543210
>ERROR at Address   0A001Ah, Bits FEDCBA9876543210
>ERROR at Address   0A001Ch, Bits FEDCBA9876543210
>ERROR at Address   0A001Eh, Bits FEDCBA9876543210
>ERROR at Address   0A0020h, Bits FEDCBA9876543210
>ADDITIONAL MEMORY ERRORS WERE NOT LISTED DUE TO LACK OF SPACE.
>
>
>




[Qemu-devel] KVM call agenda for Apr 13

2010-04-13 Thread Chris Wright
Please send in any agenda items you are interested in covering.

thanks,
-chris




[Qemu-devel] Re: Missing singlestep for already-translated code?

2010-04-13 Thread takasi-y
Hi,
> So for the already-translated code, we will miss singlestep?
At least SH4(and mips?) shows such behaviour.
I think a patch below enables single stepping in such case, too.
But, I'm not sure if this behaviour is on purpose, nor this patch is correct.
/yoshii

diff --git a/target-sh4/translate.c b/target-sh4/translate.c
index 3537f8c..dfa724a 100644
--- a/target-sh4/translate.c
+++ b/target-sh4/translate.c
@@ -300,7 +300,7 @@ static void gen_goto_tb(DisasContext * ctx, int n, 
target_ulong dest)
 tb = ctx->tb;
 
 if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) &&
-   !ctx->singlestep_enabled) {
+   !ctx->singlestep_enabled && !singlestep) {
/* Use a direct jump if in same page and singlestep not enabled */
 tcg_gen_goto_tb(n);
 tcg_gen_movi_i32(cpu_pc, dest);




[Qemu-devel] [PATCH] fix migration with large mem

2010-04-13 Thread Izik Eidus
From f881b371e08760a67bf1f5b992a586c3de600f7a Mon Sep 17 00:00:00 2001
From: Izik Eidus 
Date: Tue, 13 Apr 2010 12:24:57 +0300
Subject: [PATCH] fix migration with large mem

In cases of guests with large mem that have pages
that all their bytes content are the same, we will
spend alot of time reading the memory from the guest
(is_dup_page())

It is happening beacuse ram_save_live() function have
limit of how much we can send to the dest but not how
much we read from it, and in cases we have many is_dup_page()
hits, we might read huge amount of data without updating important
stuff like the timers...

The guest lose all its repsonsibility and have many softlock ups
inside itself.

this patch add limit on the size we can read from the guest each
iteration.

Thanks.

Signed-off-by: Izik Eidus 
---
 arch_init.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index cfc03ea..e27b1a0 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -88,6 +88,8 @@ const uint32_t arch_type = QEMU_ARCH;
 #define RAM_SAVE_FLAG_PAGE 0x08
 #define RAM_SAVE_FLAG_EOS  0x10
 
+#define MAX_SAVE_BLOCK_READ 10 * 1024 * 1024
+
 static int is_dup_page(uint8_t *page, uint8_t ch)
 {
 uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
@@ -175,6 +177,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, 
void *opaque)
 uint64_t bytes_transferred_last;
 double bwidth = 0;
 uint64_t expected_time = 0;
+int data_read = 0;
 
 if (stage < 0) {
 cpu_physical_memory_set_dirty_tracking(0);
@@ -205,10 +208,11 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, 
void *opaque)
 bytes_transferred_last = bytes_transferred;
 bwidth = qemu_get_clock_ns(rt_clock);
 
-while (!qemu_file_rate_limit(f)) {
+while (!qemu_file_rate_limit(f) && data_read < MAX_SAVE_BLOCK_READ) {
 int ret;
 
 ret = ram_save_block(f);
+data_read += ret * TARGET_PAGE_SIZE;
 bytes_transferred += ret * TARGET_PAGE_SIZE;
 if (ret == 0) { /* no more blocks */
 break;
-- 
1.6.6.1





[Qemu-devel] [PATCH 1/3] block: Convert first_drv to QLIST

2010-04-13 Thread Stefan Hajnoczi
Signed-off-by: Stefan Hajnoczi 
---
 block.c |   22 --
 block_int.h |2 +-
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 5d087de..12cf434 100644
--- a/block.c
+++ b/block.c
@@ -58,7 +58,8 @@ static int bdrv_write_em(BlockDriverState *bs, int64_t 
sector_num,
 static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
 QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
-static BlockDriver *first_drv;
+static QLIST_HEAD(, BlockDriver) bdrv_drivers =
+QLIST_HEAD_INITIALIZER(bdrv_drivers);
 
 /* If non-zero, use only whitelisted block drivers */
 static int use_bdrv_whitelist;
@@ -142,8 +143,7 @@ void bdrv_register(BlockDriver *bdrv)
 if (!bdrv->bdrv_aio_flush)
 bdrv->bdrv_aio_flush = bdrv_aio_flush_em;
 
-bdrv->next = first_drv;
-first_drv = bdrv;
+QLIST_INSERT_HEAD(&bdrv_drivers, bdrv, list);
 }
 
 /* create a new block device (by default it is empty) */
@@ -162,9 +162,10 @@ BlockDriverState *bdrv_new(const char *device_name)
 BlockDriver *bdrv_find_format(const char *format_name)
 {
 BlockDriver *drv1;
-for(drv1 = first_drv; drv1 != NULL; drv1 = drv1->next) {
-if (!strcmp(drv1->format_name, format_name))
+QLIST_FOREACH(drv1, &bdrv_drivers, list) {
+if (!strcmp(drv1->format_name, format_name)) {
 return drv1;
+}
 }
 return NULL;
 }
@@ -265,10 +266,11 @@ static BlockDriver *find_protocol(const char *filename)
 len = sizeof(protocol) - 1;
 memcpy(protocol, filename, len);
 protocol[len] = '\0';
-for(drv1 = first_drv; drv1 != NULL; drv1 = drv1->next) {
+QLIST_FOREACH(drv1, &bdrv_drivers, list) {
 if (drv1->protocol_name &&
-!strcmp(drv1->protocol_name, protocol))
+!strcmp(drv1->protocol_name, protocol)) {
 return drv1;
+}
 }
 return NULL;
 }
@@ -282,7 +284,7 @@ static BlockDriver *find_hdev_driver(const char *filename)
 int score_max = 0, score;
 BlockDriver *drv = NULL, *d;
 
-for (d = first_drv; d; d = d->next) {
+QLIST_FOREACH(d, &bdrv_drivers, list) {
 if (d->bdrv_probe_device) {
 score = d->bdrv_probe_device(filename);
 if (score > score_max) {
@@ -317,7 +319,7 @@ static BlockDriver *find_image_format(const char *filename)
 }
 
 score_max = 0;
-for(drv1 = first_drv; drv1 != NULL; drv1 = drv1->next) {
+QLIST_FOREACH(drv1, &bdrv_drivers, list) {
 if (drv1->bdrv_probe) {
 score = drv1->bdrv_probe(buf, ret, filename);
 if (score > score_max) {
@@ -1157,7 +1159,7 @@ void bdrv_iterate_format(void (*it)(void *opaque, const 
char *name),
 {
 BlockDriver *drv;
 
-for (drv = first_drv; drv != NULL; drv = drv->next) {
+QLIST_FOREACH(drv, &bdrv_drivers, list) {
 it(opaque, drv->format_name);
 }
 }
diff --git a/block_int.h b/block_int.h
index 466a38c..d4067ff 100644
--- a/block_int.h
+++ b/block_int.h
@@ -126,7 +126,7 @@ struct BlockDriver {
 /* Set if newly created images are not guaranteed to contain only zeros */
 int no_zero_init;
 
-struct BlockDriver *next;
+QLIST_ENTRY(BlockDriver) list;
 };
 
 struct BlockDriverState {
-- 
1.7.0





[Qemu-devel] [PATCH 2/3] qemu-img: Eliminate bdrv_new_open() code duplication

2010-04-13 Thread Stefan Hajnoczi
Several commands have code to create a BlockDriverState and open a file.
The bdrv_new_open() function can be used to perform these steps.  This
patch converts the qemu-img commands to actually use bdrv_new_open().

Replaced the bdrv_new_open() 'readonly' argument with bdrv_open()-style
flags to support generic flags like BDRV_O_NO_BACKING.

Signed-off-by: Stefan Hajnoczi 
---
 qemu-img.c |   83 +++
 1 files changed, 10 insertions(+), 73 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 18e43a0..b30effa 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -190,12 +190,11 @@ static int read_password(char *buf, int buf_size)
 
 static BlockDriverState *bdrv_new_open(const char *filename,
const char *fmt,
-   int readonly)
+   int flags)
 {
 BlockDriverState *bs;
 BlockDriver *drv;
 char password[256];
-int flags = BRDV_O_FLAGS;
 
 bs = bdrv_new("");
 if (!bs)
@@ -207,9 +206,6 @@ static BlockDriverState *bdrv_new_open(const char *filename,
 } else {
 drv = NULL;
 }
-if (!readonly) {
-flags |= BDRV_O_RDWR;
-}
 if (bdrv_open(bs, filename, flags, drv) < 0) {
 error("Could not open '%s'", filename);
 }
@@ -349,7 +345,7 @@ static int img_create(int argc, char **argv)
 }
 }
 
-bs = bdrv_new_open(backing_file->value.s, fmt, 1);
+bs = bdrv_new_open(backing_file->value.s, fmt, BRDV_O_FLAGS);
 bdrv_get_geometry(bs, &size);
 size *= 512;
 bdrv_delete(bs);
@@ -384,7 +380,6 @@ static int img_check(int argc, char **argv)
 {
 int c, ret;
 const char *filename, *fmt;
-BlockDriver *drv;
 BlockDriverState *bs;
 
 fmt = NULL;
@@ -405,19 +400,7 @@ static int img_check(int argc, char **argv)
 help();
 filename = argv[optind++];
 
-bs = bdrv_new("");
-if (!bs)
-error("Not enough memory");
-if (fmt) {
-drv = bdrv_find_format(fmt);
-if (!drv)
-error("Unknown file format '%s'", fmt);
-} else {
-drv = NULL;
-}
-if (bdrv_open(bs, filename, BRDV_O_FLAGS, drv) < 0) {
-error("Could not open '%s'", filename);
-}
+bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS);
 ret = bdrv_check(bs);
 switch(ret) {
 case 0:
@@ -443,7 +426,6 @@ static int img_commit(int argc, char **argv)
 {
 int c, ret;
 const char *filename, *fmt;
-BlockDriver *drv;
 BlockDriverState *bs;
 
 fmt = NULL;
@@ -464,19 +446,7 @@ static int img_commit(int argc, char **argv)
 help();
 filename = argv[optind++];
 
-bs = bdrv_new("");
-if (!bs)
-error("Not enough memory");
-if (fmt) {
-drv = bdrv_find_format(fmt);
-if (!drv)
-error("Unknown file format '%s'", fmt);
-} else {
-drv = NULL;
-}
-if (bdrv_open(bs, filename, BRDV_O_FLAGS | BDRV_O_RDWR, drv) < 0) {
-error("Could not open '%s'", filename);
-}
+bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS | BDRV_O_RDWR);
 ret = bdrv_commit(bs);
 switch(ret) {
 case 0:
@@ -633,7 +603,7 @@ static int img_convert(int argc, char **argv)
 
 total_sectors = 0;
 for (bs_i = 0; bs_i < bs_n; bs_i++) {
-bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt, 1);
+bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt, BRDV_O_FLAGS);
 if (!bs[bs_i])
 error("Could not open '%s'", argv[optind + bs_i]);
 bdrv_get_geometry(bs[bs_i], &bs_sectors);
@@ -691,7 +661,7 @@ static int img_convert(int argc, char **argv)
 }
 }
 
-out_bs = bdrv_new_open(out_filename, out_fmt, 0);
+out_bs = bdrv_new_open(out_filename, out_fmt, BRDV_O_FLAGS | BDRV_O_RDWR);
 
 bs_i = 0;
 bs_offset = 0;
@@ -889,7 +859,6 @@ static int img_info(int argc, char **argv)
 {
 int c;
 const char *filename, *fmt;
-BlockDriver *drv;
 BlockDriverState *bs;
 char fmt_name[128], size_buf[128], dsize_buf[128];
 uint64_t total_sectors;
@@ -916,19 +885,7 @@ static int img_info(int argc, char **argv)
 help();
 filename = argv[optind++];
 
-bs = bdrv_new("");
-if (!bs)
-error("Not enough memory");
-if (fmt) {
-drv = bdrv_find_format(fmt);
-if (!drv)
-error("Unknown file format '%s'", fmt);
-} else {
-drv = NULL;
-}
-if (bdrv_open(bs, filename, BRDV_O_FLAGS | BDRV_O_NO_BACKING, drv) < 0) {
-error("Could not open '%s'", filename);
-}
+bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS | BDRV_O_NO_BACKING);
 bdrv_get_format(bs, fmt_name, sizeof(fmt_name));
 bdrv_get_geometry(bs, &total_sectors);
 get_human_readable_size(size_buf, sizeof(size_buf), total_sectors * 512);
@@ -1028,13 +985,7 @@ static int img_snapshot(int argc, char **argv)
   

[Qemu-devel] [PATCH 3/3] qemu-img: Fix BRDV_O_FLAGS typo

2010-04-13 Thread Stefan Hajnoczi
It should be BDRV_O_FLAGS instead of BRDV_O_FLAGS.

Signed-off-by: Stefan Hajnoczi 
---
 qemu-img.c |   20 ++--
 1 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index b30effa..7203b8b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -37,7 +37,7 @@ typedef struct img_cmd_t {
 } img_cmd_t;
 
 /* Default to cache=writeback as data integrity is not important for qemu-tcg. 
*/
-#define BRDV_O_FLAGS BDRV_O_CACHE_WB
+#define BDRV_O_FLAGS BDRV_O_CACHE_WB
 
 static void QEMU_NORETURN error(const char *fmt, ...)
 {
@@ -345,7 +345,7 @@ static int img_create(int argc, char **argv)
 }
 }
 
-bs = bdrv_new_open(backing_file->value.s, fmt, BRDV_O_FLAGS);
+bs = bdrv_new_open(backing_file->value.s, fmt, BDRV_O_FLAGS);
 bdrv_get_geometry(bs, &size);
 size *= 512;
 bdrv_delete(bs);
@@ -400,7 +400,7 @@ static int img_check(int argc, char **argv)
 help();
 filename = argv[optind++];
 
-bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS);
+bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS);
 ret = bdrv_check(bs);
 switch(ret) {
 case 0:
@@ -446,7 +446,7 @@ static int img_commit(int argc, char **argv)
 help();
 filename = argv[optind++];
 
-bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS | BDRV_O_RDWR);
+bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_RDWR);
 ret = bdrv_commit(bs);
 switch(ret) {
 case 0:
@@ -603,7 +603,7 @@ static int img_convert(int argc, char **argv)
 
 total_sectors = 0;
 for (bs_i = 0; bs_i < bs_n; bs_i++) {
-bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt, BRDV_O_FLAGS);
+bs[bs_i] = bdrv_new_open(argv[optind + bs_i], fmt, BDRV_O_FLAGS);
 if (!bs[bs_i])
 error("Could not open '%s'", argv[optind + bs_i]);
 bdrv_get_geometry(bs[bs_i], &bs_sectors);
@@ -661,7 +661,7 @@ static int img_convert(int argc, char **argv)
 }
 }
 
-out_bs = bdrv_new_open(out_filename, out_fmt, BRDV_O_FLAGS | BDRV_O_RDWR);
+out_bs = bdrv_new_open(out_filename, out_fmt, BDRV_O_FLAGS | BDRV_O_RDWR);
 
 bs_i = 0;
 bs_offset = 0;
@@ -885,7 +885,7 @@ static int img_info(int argc, char **argv)
 help();
 filename = argv[optind++];
 
-bs = bdrv_new_open(filename, fmt, BRDV_O_FLAGS | BDRV_O_NO_BACKING);
+bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_NO_BACKING);
 bdrv_get_format(bs, fmt_name, sizeof(fmt_name));
 bdrv_get_geometry(bs, &total_sectors);
 get_human_readable_size(size_buf, sizeof(size_buf), total_sectors * 512);
@@ -1075,7 +1075,7 @@ static int img_rebase(int argc, char **argv)
  * Ignore the old backing file for unsafe rebase in case we want to correct
  * the reference to a renamed or moved backing file.
  */
-flags = BRDV_O_FLAGS | BDRV_O_RDWR | (unsafe ? BDRV_O_NO_BACKING : 0);
+flags = BDRV_O_FLAGS | BDRV_O_RDWR | (unsafe ? BDRV_O_NO_BACKING : 0);
 bs = bdrv_new_open(filename, fmt, flags);
 
 /* Find the right drivers for the backing files */
@@ -1106,7 +1106,7 @@ static int img_rebase(int argc, char **argv)
 
 bs_old_backing = bdrv_new("old_backing");
 bdrv_get_backing_filename(bs, backing_name, sizeof(backing_name));
-if (bdrv_open(bs_old_backing, backing_name, BRDV_O_FLAGS,
+if (bdrv_open(bs_old_backing, backing_name, BDRV_O_FLAGS,
 old_backing_drv))
 {
 error("Could not open old backing file '%s'", backing_name);
@@ -1114,7 +1114,7 @@ static int img_rebase(int argc, char **argv)
 }
 
 bs_new_backing = bdrv_new("new_backing");
-if (bdrv_open(bs_new_backing, out_baseimg, BRDV_O_FLAGS | BDRV_O_RDWR,
+if (bdrv_open(bs_new_backing, out_baseimg, BDRV_O_FLAGS | BDRV_O_RDWR,
 new_backing_drv))
 {
 error("Could not open new backing file '%s'", out_baseimg);
-- 
1.7.0





[Qemu-devel] [PATCH 0/3] block and qemu-img: Further cleanups

2010-04-13 Thread Stefan Hajnoczi
These patches apply to Kevin's block branch.





[Qemu-devel] Re: Missing singlestep for already-translated code?

2010-04-13 Thread Jun Koi
On Tue, Apr 13, 2010 at 6:21 PM,   wrote:
> Hi,
>> So for the already-translated code, we will miss singlestep?
> At least SH4(and mips?) shows such behaviour.
> I think a patch below enables single stepping in such case, too.
> But, I'm not sure if this behaviour is on purpose, nor this patch is correct.
> /yoshii
>
> diff --git a/target-sh4/translate.c b/target-sh4/translate.c
> index 3537f8c..dfa724a 100644
> --- a/target-sh4/translate.c
> +++ b/target-sh4/translate.c
> @@ -300,7 +300,7 @@ static void gen_goto_tb(DisasContext * ctx, int n, 
> target_ulong dest)
>     tb = ctx->tb;
>
>     if ((tb->pc & TARGET_PAGE_MASK) == (dest & TARGET_PAGE_MASK) &&
> -       !ctx->singlestep_enabled) {
> +       !ctx->singlestep_enabled && !singlestep) {
>        /* Use a direct jump if in same page and singlestep not enabled */
>         tcg_gen_goto_tb(n);
>         tcg_gen_movi_i32(cpu_pc, dest);
>

The first glance: because you are patching translate.c, I dont think
you fixed the problem.

Thanks,
J




[Qemu-devel] Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.

2010-04-13 Thread Avi Kivity

On 04/13/2010 11:01 AM, Yoshiaki Tamura wrote:

Avi Kivity wrote:

On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote:

Is it necessary to update migration and vga bitmaps?

We can simply update the master bitmap, and update the migration 
and vga

bitmaps only when they need it. That can be done in a different patch.



Let me explain the role of the master bitmap here.

In the previous discussion, the master bitmap represented at least one
dirty type is actually dirty. While implementing this approach, I
though there is one downside that upon resetting the bitmap, it needs
to check all dirty types whether they are already cleared to unset the
master bitmap, and this might hurt the performance in general.


The way I see it, the master bitmap is a buffer between the guest and
all the other client bitmaps (migration and vga). So, a bit can be set
in vga and cleared in master. Let's look at the following scenario:

1. guest touches address 0xa (page 0xa0)
2. qemu updates master bitmap bit 0xa0
3. vga timer fires
4. vga syncs the dirty bitmap for addresses 0xa-0xc
4.1 vga_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.2 migration_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.3 master_bitmap[0xa0-0xc0] = 0
5. vga draws based on vga_bitmap

the nice thing is that step 4.2 can be omitted if migration is not 
active.


so, client bitmaps are always updated when the client requests them,
master bitmap is only a buffer.


Thanks.  I think I'm finally getting your point.
At 4.2 and 4.3, is syncing to client necessary?


Yes.  We want to clear the master bitmap, otherwise on the next 
iteration we will get dirty bits that may be spurious.  But if we clear 
the dirty bitmap, we must copy it to all client bitmaps, not just vga, 
otherwise we lose data.



Simply doing OR at get_dirty like in this patch not enough?


If you don't clear the master bitmap, then you will get the same bits at 
the next iteration.





In this patch, master bitmap represents all types are dirty, similar
to existing 0xff. With this approach, resetting the master bitmap can
be done without checking the other types. set_dirty_flags is actually
taking the burden in this case though. Anyway, IIUC somebody would be
unhappy depending on the role of the master bitmap.


Yes, the problem is that set_dirty_flags is the common case and also
uses random access, we'd want to make it touch only a single bit. client
access is rare, and also sequential, and therefore efficient.


I see.
So set_dirty_flags would be like,

1. set master first regard less of dirty type.
2. if dirty type was CODE, set the client.


There really should be two APIs, one for general dirtyness and one for 
code dirty.  As Anthony said, the code dirty bitmap is completely 
separate from the other uses.


Consider starting by separating the two uses, it may clear up the code 
and make things easier later on.




In summary, I'll prepare two patch sets.

1. v3 of this patch set.
- Change FLAGS value to (1,2,4,8), and add IDX (0,1,2,3)
- Use ffsl to convert FLAGS to IDX.
- Add help function which takes IDX.
- Change the behavior of set_dirty_flags as above.
- Change dirty bitmap access to a loop.
- Add brace after if ()
- Move some macros to qemu-common.h.

2. Allocate vga and migration dirty bitmap dynamically.

Please modify or add items if I were missing.


I would add separation of CODE and the other dirty bitmaps.

--
error compiling committee.c: too many arguments to function





[Qemu-devel] [PATCH] block.h: bdrv_create2 doesn't exist any more

2010-04-13 Thread Kevin Wolf
The bdrv_create2 implementation has been disappeared long ago. Remove its
prototype from the header file, too.

Signed-off-by: Kevin Wolf 
---
 block.h |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/block.h b/block.h
index 4a57dd5..05ad572 100644
--- a/block.h
+++ b/block.h
@@ -57,10 +57,6 @@ BlockDriver *bdrv_find_format(const char *format_name);
 BlockDriver *bdrv_find_whitelisted_format(const char *format_name);
 int bdrv_create(BlockDriver *drv, const char* filename,
 QEMUOptionParameter *options);
-int bdrv_create2(BlockDriver *drv,
- const char *filename, int64_t size_in_sectors,
- const char *backing_file, const char *backing_format,
- int flags);
 BlockDriverState *bdrv_new(const char *device_name);
 void bdrv_delete(BlockDriverState *bs);
 int bdrv_file_open(BlockDriverState **pbs, const char *filename, int flags);
-- 
1.6.6.1





[Qemu-devel] Re: [PATCH v2 3/6] Modifies wrapper functions for byte-based phys_ram_dirty bitmap to bit-based phys_ram_dirty bitmap.

2010-04-13 Thread Yoshiaki Tamura

Avi Kivity wrote:

On 04/13/2010 11:01 AM, Yoshiaki Tamura wrote:

Avi Kivity wrote:

On 04/12/2010 01:58 PM, Yoshiaki Tamura wrote:

Is it necessary to update migration and vga bitmaps?

We can simply update the master bitmap, and update the migration
and vga
bitmaps only when they need it. That can be done in a different patch.



Let me explain the role of the master bitmap here.

In the previous discussion, the master bitmap represented at least one
dirty type is actually dirty. While implementing this approach, I
though there is one downside that upon resetting the bitmap, it needs
to check all dirty types whether they are already cleared to unset the
master bitmap, and this might hurt the performance in general.


The way I see it, the master bitmap is a buffer between the guest and
all the other client bitmaps (migration and vga). So, a bit can be set
in vga and cleared in master. Let's look at the following scenario:

1. guest touches address 0xa (page 0xa0)
2. qemu updates master bitmap bit 0xa0
3. vga timer fires
4. vga syncs the dirty bitmap for addresses 0xa-0xc
4.1 vga_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.2 migration_bitmap[0xa0-0xc0] |= master_bitmap[0xa0-0xc0]
4.3 master_bitmap[0xa0-0xc0] = 0
5. vga draws based on vga_bitmap

the nice thing is that step 4.2 can be omitted if migration is not
active.

so, client bitmaps are always updated when the client requests them,
master bitmap is only a buffer.


Thanks. I think I'm finally getting your point.
At 4.2 and 4.3, is syncing to client necessary?


Yes. We want to clear the master bitmap, otherwise on the next iteration
we will get dirty bits that may be spurious. But if we clear the dirty
bitmap, we must copy it to all client bitmaps, not just vga, otherwise
we lose data.


Simply doing OR at get_dirty like in this patch not enough?


If you don't clear the master bitmap, then you will get the same bits at
the next iteration.


OK.
Master is just a cache/buffer, and upon get_dirty, we copy it to client bitmaps 
with for loop, clear the master and finally check actual client bitmap.  If 
we're going to allocate client bitmaps dynamically, we need to check whether it 
isn't null too.



In this patch, master bitmap represents all types are dirty, similar
to existing 0xff. With this approach, resetting the master bitmap can
be done without checking the other types. set_dirty_flags is actually
taking the burden in this case though. Anyway, IIUC somebody would be
unhappy depending on the role of the master bitmap.


Yes, the problem is that set_dirty_flags is the common case and also
uses random access, we'd want to make it touch only a single bit. client
access is rare, and also sequential, and therefore efficient.


I see.
So set_dirty_flags would be like,

1. set master first regard less of dirty type.
2. if dirty type was CODE, set the client.


There really should be two APIs, one for general dirtyness and one for
code dirty. As Anthony said, the code dirty bitmap is completely
separate from the other uses.

Consider starting by separating the two uses, it may clear up the code
and make things easier later on.


That really make sense.  A little bit hesitant because the code might look a bit 
duplicated.



In summary, I'll prepare two patch sets.

1. v3 of this patch set.
- Change FLAGS value to (1,2,4,8), and add IDX (0,1,2,3)
- Use ffsl to convert FLAGS to IDX.
- Add help function which takes IDX.
- Change the behavior of set_dirty_flags as above.
- Change dirty bitmap access to a loop.
- Add brace after if ()
- Move some macros to qemu-common.h.

2. Allocate vga and migration dirty bitmap dynamically.

Please modify or add items if I were missing.


I would add separation of CODE and the other dirty bitmaps.


I'm not sure I should starting separating first or later at this moment, but 
will consider it.





[Qemu-devel] Re: [PATCH 0/3] block and qemu-img: Further cleanups

2010-04-13 Thread Kevin Wolf
Am 13.04.2010 11:29, schrieb Stefan Hajnoczi:
> These patches apply to Kevin's block branch.

Thanks. Looks good, applied all to the block branch.

Kevin




Re: [Qemu-devel] How to lock-up your tap-based VM network

2010-04-13 Thread Jan Kiszka
Paul Brook wrote:
>> A major reason for this deadlock could likely be removed by shutting
>> down the tap (if peered) or dropping packets in user space (in case of
>> vlan) when a NIC is stopped or otherwise shut down. Currently most (if
>> not all) NIC models seem to signal both "queue full" and "RX disabled"
>> via !can_receive().
> 
> No. A disabled device should return true from can_recieve, then discard the 
> packets in its receive callback. Failure to do so is a bug in the device. It 
> looks like the virtio-net device may be buggy.

That's not a virtio-only issue. In fact, we ran into this over pcnet,
and a quick check of other popular PCI NIC models (except for rtl8139)
revealed the same picture: They only report can_receive if their
receiver unit is up and ready (some also include the queue state, but
that's an "add-on").

I think it's clear why: "can_receive" strongly suggests that a suspended
receiver should make the model return false. If we want to keep this
handler, it should be refactored to something like "queue_full".

But before starting any refactoring endeavor: Do we have a consensus on
the direction? Refactor can_receive to queue_full? Or even drop it?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




Re: [Qemu-devel] How to lock-up your tap-based VM network

2010-04-13 Thread Jan Kiszka
Jamie Lokier wrote:
> Paul Brook wrote:
>>> A major reason for this deadlock could likely be removed by shutting
>>> down the tap (if peered) or dropping packets in user space (in case of
>>> vlan) when a NIC is stopped or otherwise shut down. Currently most (if
>>> not all) NIC models seem to signal both "queue full" and "RX disabled"
>>> via !can_receive().
>> No. A disabled device should return true from can_recieve, then discard the 
>> packets in its receive callback. Failure to do so is a bug in the device. It 
>> looks like the virtio-net device may be buggy.
> 
> I agree - or alternatively signal that there's no point sending it
> packets and they should be dropped without bothering to construct them.
> 
> But anyway, this flow control mechanism is buggy - what if instead of
> an interface down, you just have a *slow* guest?  That should not push
> back so much that it makes other guests networking with each other
> slow down.

Indeed. So, instead of the current scheme that tries to stop the sender
when some receiver overflows, we must turn it up side down so that this
stop can _never_ happen. We may keep a sufficiently long queue to reduce
the risk of packet drops, but we can't prevent this for all cases anyway.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




Re: [Qemu-devel] How to lock-up your tap-based VM network

2010-04-13 Thread Jan Kiszka
Paul Brook wrote:
>> But anyway, this flow control mechanism is buggy - what if instead of
>> an interface down, you just have a *slow* guest?  That should not push
>> back so much that it makes other guests networking with each other
>> slow down.
> 
> The OP described multiple guests connected via a host bridge. In this case it 
> is entirely the host's responsibility to arbitrate between multiple guests. 
> If 
> one interface can block the bridge simply by failing to respond in a timely 
> manner then this is a serious bug or misconfiguration of your host bridge.

It's not the bridge, I think it's limited to the tun driver as bridge
participant and how it is configured/used by QEMU.

> 
> The reason tap_send exists is because slirp does not implement TCP flow 
> control properly.  Instead it relies on the can_send hook to only avoid 
> dropped packets.
> 
> Using this in the tap code is debatable. My guess is that this is a hack to 
> workaround guest network stacks that expect throughput to be limited by line 
> speed (which is a virtual environment is effectively infinite), have poor 
> higher level flow control, and react badly to packet loss.

[ CC'ing some people who should have been involved in establishing the
TX mitigation for tap devices. Full thread under
http://thread.gmane.org/gmane.comp.emulators.qemu/67809 ]

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




[Qemu-devel] [PATCH] SCSI: Add disk reset handler

2010-04-13 Thread Jan Kiszka
Ensure that pending requests of an SCSI disk are purged on system reset
and also restore max_lba. The latter is now only present in the reset
handler as that one is already automatically called during init.

Signed-off-by: Jan Kiszka 
---
 hw/scsi-disk.c |   32 
 1 files changed, 24 insertions(+), 8 deletions(-)

This should be applied to stable as well. I can provide a corresponding
patch once this one is acceptable.

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 77cb1da..50d5e9c 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -1010,22 +1010,42 @@ static int32_t scsi_send_command(SCSIDevice *d, 
uint32_t tag,
 }
 }
 
-static void scsi_destroy(SCSIDevice *dev)
+static void scsi_disk_purge_requests(SCSIDiskState *s)
 {
-SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
 SCSIDiskReq *r;
 
 while (!QTAILQ_EMPTY(&s->qdev.requests)) {
 r = DO_UPCAST(SCSIDiskReq, req, QTAILQ_FIRST(&s->qdev.requests));
 scsi_remove_request(r);
 }
+}
+
+static void scsi_disk_reset(DeviceState *dev)
+{
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev.qdev, dev);
+uint64_t nb_sectors;
+
+scsi_disk_purge_requests(s);
+
+bdrv_get_geometry(s->bs, &nb_sectors);
+nb_sectors /= s->cluster_size;
+if (nb_sectors) {
+nb_sectors--;
+}
+s->max_lba = nb_sectors;
+}
+
+static void scsi_destroy(SCSIDevice *dev)
+{
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
+
+scsi_disk_purge_requests(s);
 drive_uninit(s->qdev.conf.dinfo);
 }
 
 static int scsi_disk_initfn(SCSIDevice *dev)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, dev);
-uint64_t nb_sectors;
 
 if (!s->qdev.conf.dinfo || !s->qdev.conf.dinfo->bdrv) {
 error_report("scsi-disk: drive property not set");
@@ -1046,11 +1066,6 @@ static int scsi_disk_initfn(SCSIDevice *dev)
 s->cluster_size = s->qdev.blocksize / 512;
 
 s->qdev.type = TYPE_DISK;
-bdrv_get_geometry(s->bs, &nb_sectors);
-nb_sectors /= s->cluster_size;
-if (nb_sectors)
-nb_sectors--;
-s->max_lba = nb_sectors;
 qemu_add_vm_change_state_handler(scsi_dma_restart_cb, s);
 return 0;
 }
@@ -1059,6 +1074,7 @@ static SCSIDeviceInfo scsi_disk_info = {
 .qdev.name= "scsi-disk",
 .qdev.desc= "virtual scsi disk or cdrom",
 .qdev.size= sizeof(SCSIDiskState),
+.qdev.reset   = scsi_disk_reset,
 .init = scsi_disk_initfn,
 .destroy  = scsi_destroy,
 .send_command = scsi_send_command,





Re: [Qemu-devel] How to lock-up your tap-based VM network

2010-04-13 Thread Paul Brook
> Paul Brook wrote:
> >> But anyway, this flow control mechanism is buggy - what if instead of
> >> an interface down, you just have a *slow* guest?  That should not push
> >> back so much that it makes other guests networking with each other
> >> slow down.
> >
> > The OP described multiple guests connected via a host bridge. In this
> > case it is entirely the host's responsibility to arbitrate between
> > multiple guests. If one interface can block the bridge simply by failing
> > to respond in a timely manner then this is a serious bug or
> > misconfiguration of your host bridge.
> 
> It's not the bridge, I think it's limited to the tun driver as bridge
> participant and how it is configured/used by QEMU.

If one tap interface can prevent correct operation of another by failing to 
read data, then this is clearly a host kernel bug. 
I've no idea whether it's a bug in the TAP implementation (e.g. shared queue 
between independent devices), a bug in the bridging implementation (drops 
unrelated packets when one port is slow), or some interaction between the two, 
but it's definitely not a qemu bug.

I'm sure there are plenty of ways to DoS a qemu instance, and prevent it 
reading data from its tap interface.  How/whether this effects other unrelated 
processes on the host machine is not something qemu can or should know about.

Paul




Re: [Qemu-devel] How to lock-up your tap-based VM network

2010-04-13 Thread Jan Kiszka
Paul Brook wrote:
>> Paul Brook wrote:
 A major reason for this deadlock could likely be removed by shutting
 down the tap (if peered) or dropping packets in user space (in case of
 vlan) when a NIC is stopped or otherwise shut down. Currently most (if
 not all) NIC models seem to signal both "queue full" and "RX disabled"
 via !can_receive().
>>> No. A disabled device should return true from can_recieve, then discard
>>> the packets in its receive callback. Failure to do so is a bug in the
>>> device. It looks like the virtio-net device may be buggy.
>> That's not a virtio-only issue. In fact, we ran into this over pcnet,
>> and a quick check of other popular PCI NIC models (except for rtl8139)
>> revealed the same picture: They only report can_receive if their
>> receiver unit is up and ready (some also include the queue state, but
>> that's an "add-on").
> 
> If so these are also buggy.
> 
>> I think it's clear why: "can_receive" strongly suggests that a suspended
>> receiver should make the model return false. If we want to keep this
>> handler, it should be refactored to something like "queue_full".
> 
> I don't see a need to refactor anything.  You just need to fix the devices 
> that incorrectly return false when their RX engine is disabled.

"can_receive" is a misleading name. NIC model developers will continue
to make mistakes when implementing this function. And I don't think we
want such implementations all around:

static int rtl8139_can_receive(VLANClientState *nc)
{
RTL8139State *s = DO_UPCAST(NICState, nc, nc)->opaque;
int avail;

/* Receive (drop) packets if card is disabled.  */
if (!s->clock_enabled)
  return 1;
if (!rtl8139_receiver_enabled(s))
  return 1;

if (rtl8139_cp_receiver_enabled(s)) {
/* ??? Flow control not implemented in c+ mode.
   This is a hack to work around slirp deficiencies anyway.  */
return 1;
} else {
avail = MOD2(s->RxBufferSize + s->RxBufPtr - s->RxBufAddr,
 s->RxBufferSize);
return (avail == 0 || avail >= 1514);
}
}

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




[Qemu-devel] Re: Missing singlestep for already-translated code?

2010-04-13 Thread Jan Kiszka
Jun Koi wrote:
> Hi,
> 
> I am looking into the singlestep command in monitor interface, and it
> seems that we only take into account the singlestep flag when we are
> translating code.
> So for the already-translated code, we will miss singlestep?

This feature is broken. For TCG, it should at least flush the
translation buffer, and for KVM it has to enable single-stepping in the
kernel. That's what happens automatically when you call cpu_single_step.
I guess 'singlestep' wants to be somehow orthogonal to this. But this is
the wrong approach.

Does anyone actually used this feature or still does so? It looks fairly
redundant to me, kind of a poor-man's gdb front-end as part of the
monitor console.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




Re: [Qemu-devel] How to lock-up your tap-based VM network

2010-04-13 Thread Paul Brook
> Paul Brook wrote:
> >> A major reason for this deadlock could likely be removed by shutting
> >> down the tap (if peered) or dropping packets in user space (in case of
> >> vlan) when a NIC is stopped or otherwise shut down. Currently most (if
> >> not all) NIC models seem to signal both "queue full" and "RX disabled"
> >> via !can_receive().
> >
> > No. A disabled device should return true from can_recieve, then discard
> > the packets in its receive callback. Failure to do so is a bug in the
> > device. It looks like the virtio-net device may be buggy.
> 
> That's not a virtio-only issue. In fact, we ran into this over pcnet,
> and a quick check of other popular PCI NIC models (except for rtl8139)
> revealed the same picture: They only report can_receive if their
> receiver unit is up and ready (some also include the queue state, but
> that's an "add-on").

If so these are also buggy.

> I think it's clear why: "can_receive" strongly suggests that a suspended
> receiver should make the model return false. If we want to keep this
> handler, it should be refactored to something like "queue_full".

I don't see a need to refactor anything.  You just need to fix the devices 
that incorrectly return false when their RX engine is disabled.

Paul




[Qemu-devel] KVM call minutes for Apr 13

2010-04-13 Thread Chris Wright
No topics, short call




Re: [Qemu-devel] Re: Missing singlestep for already-translated code?

2010-04-13 Thread Alexander Graf

On 13.04.2010, at 15:36, Jan Kiszka wrote:

> Jun Koi wrote:
>> Hi,
>> 
>> I am looking into the singlestep command in monitor interface, and it
>> seems that we only take into account the singlestep flag when we are
>> translating code.
>> So for the already-translated code, we will miss singlestep?
> 
> This feature is broken. For TCG, it should at least flush the
> translation buffer, and for KVM it has to enable single-stepping in the
> kernel. That's what happens automatically when you call cpu_single_step.
> I guess 'singlestep' wants to be somehow orthogonal to this. But this is
> the wrong approach.
> 
> Does anyone actually used this feature or still does so? It looks fairly
> redundant to me, kind of a poor-man's gdb front-end as part of the
> monitor console.

Not sure what it does, but I use -singlestep quite a lot to get register dumps 
for instructions when using -d cpu.

Alex





Re: [Qemu-devel] [PATCH 2/2] VirtIO RNG

2010-04-13 Thread Ian Molton
Paul Brook wrote:
>> Paul Brook wrote:

>> So now everything that looks like a stream of bytes has to use the
>> virtio-serial code...
> 
> IMO, yes. That's the whole point of virtio-serial.
> 
> You can handle it however you like in your in your favourite guest OS, but I 
> object to qemu having multiple virtio devices that look and smell exactly 
> like 
> a serial port.

I dont think it looks and smells exactly like a serial port. The code is
simpler to begin with, nor does it support any out-of-band signalling.

>> Why? Its not like it'll make the rng device any simpler, smaller,
>> faster, or reduce its dependencies. Virtio is simple enough to begin with!
> 
> I disagree. Past experience shows that it's more than possible to completely 
> screw up something as simple as a serial port.

Straw man. Serial ports might be fairly simple things, but my device is
simpler still than most of them, and you're talking about emulating a
serial port, where you have control lines, line disciplines, etc. to
emulate, all of which need to be right. None of that is a concern in my
driver.

> If everything that looks like a 
> serial port provides their own implementation then the chances are they'll 
> all 
> be missing something, and broken in different ways.

Better never write any code then, its garunteesd to have a bug in it.

I *know* what you're getting at - I removed hundreds of lines of common
code copied and randomly, brokenly, modified from the various ASoC
kernel drivers a few months back.

But this bit of code doesnt really share enough in common with anything
else to make it worth handling your way. Alternatively, prove me wrong!

> Take rate limiting as an example: Your implementation has at least one 
> outright bug (using gettimeofday),

Arguably a bug. I'd disagree that it causes any problems in practice
(qemu_gettimeofday, btw - which I was specifically requested to use
during an earlier review of this patch...)

> and various questionable characteristics (is a 1-second token bucket 
> appropriate).

So what? if people dont like the default policy, or its too limited,
they can feel free to contribute patches. Its hardly like this is core
code that minute changes will require months of review to ratify!

At least I'm doing something about the
total-lack-of-any-support-whatsoever situation.

> The EGD protocol bits are another example. You aren't exposing any of the 
> interesting bits of this protocol to the guest, so it seems entirely 
> reasonable to connect this to a serial port.

Except that then we need to have a little daemon to feed rate limited
EGD protocol data to said serial port.

So now you've got the situation of the USB-serial entropy key, feeding
the (potentially) proprietary egd-compatible-but-not-rate-limited daemon
 that feeds a rate limiting daemon into a serial port on qemu that feeds
the guest OS.

So this is running in a server farm with 64 or 128 guests sat on it.
each one with its own silly rate limiting daemon / serial port mux from
the egd daemon the USB stick feeds. What happens when you apt-get
upgrade and the egd daemon gets replaced? you really want to notify and
restart all 128 noddy mux daemons and reconnect qemu to them? Why dont
we just kill the guests and reboot them all too?

My patchset handled this case without any problems at all (the reader
would simply (and without blocking) reconect to the server).

The point is that you can take 'generic support' too far. Sure you CAN
model this as a bunch of serial ports, and feed them from rate limiting
daemons, which in turn read from some egd-speaking daemon or hardware,
but EGD protocol is *SO* simple. Why on earth go to these lengths when
it can be implemented and embedded in a handful of lines of code?

The chardev reconnect stuff was in the generic qemu chardev layer and
thus reuseable by other parts of qemu that need to maintain a persistent
state even when connected to other processes via sockets, pipes, fifos etc.

> That allows it to be used by guests that don't have a virtio-rng driver.

Thats already supported by use of TCP tunnels. but thats 1) not
efficient and 2) not very secure - few sources of entropy provide crypto
to garuntee that the data source hasnt been frobbed.   3) much more
vulnerable to attack from processes running on the guest (TCP is much
more exposde than the rng-core internals)

So, rather than bike-shedding, how about making some kind of decision?

Either:

1) Dictate how the code is going to be written (in which case, since you
know it so well, write it yourself)

or:

2) Accept that not everyone likes your (idea for a) implementation, that
prior interfaces exist on the guest side, and that a perfectly good and
non-intrusive way of doing it on qemu has beeen written for you.
(including a couple of other nice features / fixes along the way, like
adding the missing SIZE_ property, socket reconnect support, etc.)

Bear in mind that its rather unfair to (as has been in this case) have a
patch re

Re: [Qemu-devel] [PATCH 2/2] VirtIO RNG

2010-04-13 Thread Paul Brook
> Bear in mind that its rather unfair to (as has been in this case) have a
> patch reviewed, modified based on criticism, and then rejected after
> effort has been wasted working on it.

I'm pretty sure I raised all these issues the first time this code was 
submitted.

Paul




[Qemu-devel] [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Michael S. Tsirkin
The following situation was observed in the field:
tap1 sends packets, tap2 does not consume them, as a result
tap1 can not be closed. This happens because
tun/tap devices can hang on to skbs undefinitely.

As noted by Herbert, possible solutions include a timeout followed by a
copy/change of ownership of the skb, or always copying/changing
ownership if we're going into a hostile device.

This patch implements the second approach.

Note: one issue still remaining is that since skbs
keep reference to tun socket and tun socket has a
reference to tun device, we won't flush backlog,
instead simply waiting for all skbs to get transmitted.
At least this is not user-triggerable, and
this was not reported in practice, my assumption is
other devices besides tap complete an skb
within finite time after it has been queued.

A possible solution for the second issue
would not to have socket reference the device,
instead, implement dev->destructor for tun, and
wait for all skbs to complete there, but this
needs some thought, probably too risky for 2.6.34.

Signed-off-by: Michael S. Tsirkin 
Tested-by: Yan Vugenfirer 

---

Please review the below, and consider for 2.6.34,
and stable trees.

 drivers/net/tun.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 96c39bd..4326520 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -387,6 +387,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, 
struct net_device *dev)
}
}
 
+   /* Orphan the skb - required as we might hang on to it
+* for indefinite time. */
+   skb_orphan(skb);
+
/* Enqueue packet */
skb_queue_tail(&tun->socket.sk->sk_receive_queue, skb);
dev->trans_start = jiffies;
-- 
1.7.0.2.280.gc6f05




[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Herbert Xu
On Tue, Apr 13, 2010 at 05:59:44PM +0300, Michael S. Tsirkin wrote:
> The following situation was observed in the field:
> tap1 sends packets, tap2 does not consume them, as a result
> tap1 can not be closed. This happens because
> tun/tap devices can hang on to skbs undefinitely.
> 
> As noted by Herbert, possible solutions include a timeout followed by a
> copy/change of ownership of the skb, or always copying/changing
> ownership if we're going into a hostile device.
> 
> This patch implements the second approach.
> 
> Note: one issue still remaining is that since skbs
> keep reference to tun socket and tun socket has a
> reference to tun device, we won't flush backlog,
> instead simply waiting for all skbs to get transmitted.
> At least this is not user-triggerable, and
> this was not reported in practice, my assumption is
> other devices besides tap complete an skb
> within finite time after it has been queued.
> 
> A possible solution for the second issue
> would not to have socket reference the device,
> instead, implement dev->destructor for tun, and
> wait for all skbs to complete there, but this
> needs some thought, probably too risky for 2.6.34.
> 
> Signed-off-by: Michael S. Tsirkin 
> Tested-by: Yan Vugenfirer 

Acked-by: Herbert Xu 

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt




Re: [Qemu-devel] Re: Missing singlestep for already-translated code?

2010-04-13 Thread Jan Kiszka
Alexander Graf wrote:
> On 13.04.2010, at 15:36, Jan Kiszka wrote:
> 
>> Jun Koi wrote:
>>> Hi,
>>>
>>> I am looking into the singlestep command in monitor interface, and it
>>> seems that we only take into account the singlestep flag when we are
>>> translating code.
>>> So for the already-translated code, we will miss singlestep?
>> This feature is broken. For TCG, it should at least flush the
>> translation buffer, and for KVM it has to enable single-stepping in the
>> kernel. That's what happens automatically when you call cpu_single_step.
>> I guess 'singlestep' wants to be somehow orthogonal to this. But this is
>> the wrong approach.
>>
>> Does anyone actually used this feature or still does so? It looks fairly
>> redundant to me, kind of a poor-man's gdb front-end as part of the
>> monitor console.
> 
> Not sure what it does, but I use -singlestep quite a lot to get register 
> dumps for instructions when using -d cpu.

Ah, "singlestep" is not about stopping the VM after each instruction but
about limiting the TB length to a single instruction. Badly named and
poorly documented.

In that case, the dynamic switch should already be fine by adding a
tb_flush() on enable. Still, someone should also patch at least the docs.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




Re: [Qemu-devel] [PATCH 2/2] VirtIO RNG

2010-04-13 Thread Paul Brook
> So, rather than bike-shedding, how about making some kind of decision?

Ok, let me make this simple.

Features such as rate limiting and EGD protocol translation should not be part 
of individual device emulation. They are part of the host interface, not the 
guest machine.  If you really want these in qemu then they should be 
implemented as part of the chardev layer.

It may be a bit more work to implement these properly. However I'm not going 
to accept a half-assed implementation just because someone wrote it.


A direct result of this is that virtio-rng degenerates to a crippled virtual 
serial port. We already have a virtual serial port implementation designed for 
exactly this kind of application.  IMHO virtio-rng is a complete waste of time 
and space.

Paul




[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Jan Kiszka
Michael S. Tsirkin wrote:
> The following situation was observed in the field:
> tap1 sends packets, tap2 does not consume them, as a result
> tap1 can not be closed.

And before that, tap1 may not be able to send further packets to anyone
else on the bridge as its TX resources were blocked by tap2 - that's
what we saw in the field.

> This happens because
> tun/tap devices can hang on to skbs undefinitely.
> 
> As noted by Herbert, possible solutions include a timeout followed by a
> copy/change of ownership of the skb, or always copying/changing
> ownership if we're going into a hostile device.
> 
> This patch implements the second approach.
> 
> Note: one issue still remaining is that since skbs
> keep reference to tun socket and tun socket has a
> reference to tun device, we won't flush backlog,
> instead simply waiting for all skbs to get transmitted.
> At least this is not user-triggerable, and
> this was not reported in practice, my assumption is
> other devices besides tap complete an skb
> within finite time after it has been queued.
> 
> A possible solution for the second issue
> would not to have socket reference the device,
> instead, implement dev->destructor for tun, and
> wait for all skbs to complete there, but this
> needs some thought, probably too risky for 2.6.34.
> 
> Signed-off-by: Michael S. Tsirkin 
> Tested-by: Yan Vugenfirer 
> 
> ---
> 
> Please review the below, and consider for 2.6.34,
> and stable trees.
> 
>  drivers/net/tun.c |4 
>  1 files changed, 4 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 96c39bd..4326520 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -387,6 +387,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, 
> struct net_device *dev)
>   }
>   }
>  
> + /* Orphan the skb - required as we might hang on to it
> +  * for indefinite time. */
> + skb_orphan(skb);
> +
>   /* Enqueue packet */
>   skb_queue_tail(&tun->socket.sk->sk_receive_queue, skb);
>   dev->trans_start = jiffies;

Nice solution, thanks!

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




[Qemu-devel] Re: Problem with QEMU on KVM

2010-04-13 Thread Jan Kiszka
Mulyadi Santosa wrote:
> Hi Jamie...
> 
> On Mon, Apr 12, 2010 at 19:07, Jamie Lokier  wrote:
>> There are various -no-kvm-XXX options to try:
>>
>> -no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC
>> -no-kvm-pit disable KVM kernel mode PIT
>> -no-kvm-pit-reinjection disable KVM kernel mode PIT interrupt reinjection
> 
> I try to read the source code without too much luck understanding the
> meaning of the above parameters, especially the reinjection. Please
> CMIIW, interrupt reinjection is a way to handle lost ticks, right?

Those switches only exist in the qemu-kvm branch. The fact that
-enable-kvm has some effect for you makes me think that you tried
upstream KVM support (ie. the one that comes with vanilla QEMU), right?

It would be interesting to check if you can reproduce the problem also
with qemu-kvm(-0.12.3 or git head). As you may also face an issue of the
kvm kernel bits, it would be furthermore helpful to check with the
latest kernel from kvm.git or the modules build via kvm-kmod as well.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Jan Kiszka
Eric Dumazet wrote:
> Le mardi 13 avril 2010 à 17:36 +0200, Jan Kiszka a écrit :
>> Michael S. Tsirkin wrote:
>>> The following situation was observed in the field:
>>> tap1 sends packets, tap2 does not consume them, as a result
>>> tap1 can not be closed.
>> And before that, tap1 may not be able to send further packets to anyone
>> else on the bridge as its TX resources were blocked by tap2 - that's
>> what we saw in the field.
>>
> 
> After the patch, tap1 is able to flood tap2, and tap3/tap4 not able to
> send one single frame. Is it OK ?

I think if that's a real issue, you have to apply traffic shaping to the
untrusted nodes. The existing flow-control scheme was fragile anyway as
you had to translate packet lengths on TX side to packet counts on RX.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux




[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Michael S. Tsirkin
On Tue, Apr 13, 2010 at 06:40:38PM +0200, Eric Dumazet wrote:
> Le mardi 13 avril 2010 à 17:36 +0200, Jan Kiszka a écrit :
> > Michael S. Tsirkin wrote:
> > > The following situation was observed in the field:
> > > tap1 sends packets, tap2 does not consume them, as a result
> > > tap1 can not be closed.
> > 
> > And before that, tap1 may not be able to send further packets to anyone
> > else on the bridge as its TX resources were blocked by tap2 - that's
> > what we saw in the field.
> > 
> 
> After the patch, tap1 is able to flood tap2, and tap3/tap4 not able to
> send one single frame. Is it OK ?

Yes :) This was always possible. Number of senders needed to flood
a receiver might vary depending on send/recv queue size
that you set. External sources can also fill your RX queue
if you let them. In the end, we need to rely on the scheduler for fairness,
or apply packet shaping.

> Back to the problem : tap1 cannot be closed.
> 
> Why ? because of refcounts ?

Yes.

> When a socket with inflight tx packets is closed, we dont block the
> close, we only delay the socket freeing once all packets were delivered
> and freed.
> 

Which is wrong, since this is under userspace control, so you get
unkillable processes.

-- 
MST




[Qemu-devel] Re: [PATCH] block: Split bdrv_open

2010-04-13 Thread Christoph Hellwig
On Mon, Apr 12, 2010 at 04:49:16PM +0200, Kevin Wolf wrote:
> bdrv_open contains quite some code that is only useful for opening images (as
> opposed to opening files by a protocol), for example snapshots.
> 
> This patch splits the code so that we have bdrv_open_file() for files (uses
> protocols), bdrv_open() for images (uses format drivers) and bdrv_do_open() 
> for
> the code common for opening both images and files.
> 
> Signed-off-by: Kevin Wolf 
> ---
> This patch applies on top of Christoph's RFC for the format/protocol split

I like this a lot.  A few comments:

 - why is bdrv_do_open below it's users in the code?  I really hate
   forward declarations of functions and they can usually be easily
   avoided.
 - a "do" a function name is not very meaningfull - what about
   bdrv_open_common instead?
 - doesn't the backing device handling only apply to image formats, too?





Re: [Qemu-devel] How to lock-up your tap-based VM network

2010-04-13 Thread Blue Swirl
On 4/12/10, Paul Brook  wrote:
> > A major reason for this deadlock could likely be removed by shutting
>  > down the tap (if peered) or dropping packets in user space (in case of
>  > vlan) when a NIC is stopped or otherwise shut down. Currently most (if
>  > not all) NIC models seem to signal both "queue full" and "RX disabled"
>  > via !can_receive().
>
>
> No. A disabled device should return true from can_recieve, then discard the
>  packets in its receive callback. Failure to do so is a bug in the device. It
>  looks like the virtio-net device may be buggy.

Awesome, it looks like a longstanding bug with pcnet/lance has is
fixed by this change! OpenBSD installer would hang when receiving
packages, now it works!

diff --git a/hw/pcnet.c b/hw/pcnet.c
index 5e63eb5..a04ad09 100644
--- a/hw/pcnet.c
+++ b/hw/pcnet.c
@@ -1031,8 +1031,6 @@ static int pcnet_tdte_poll(PCNetState *s)
 int pcnet_can_receive(VLANClientState *nc)
 {
 PCNetState *s = DO_UPCAST(NICState, nc, nc)->opaque;
-if (CSR_STOP(s) || CSR_SPND(s))
-return 0;

 return sizeof(s->buffer)-16;
 }




[Qemu-devel] Re: SeaBIOS error with Juniper FreeBSD kernel

2010-04-13 Thread Bjørn Mork
Gleb Natapov  writes:
> On Sun, Feb 28, 2010 at 02:39:04PM -0500, Kevin O'Connor wrote:
>> On Sun, Feb 21, 2010 at 04:18:38PM -0700, Brandon Bennett wrote:
>>
>> > I have narrowed it down to SMBIOS.  If I disable CONFIG_SMBIOS the
>> > image boots up fine.
>> 
>> Gleb, have you seen this thread?
>> 
>> Some of the recent changes to smbios that look like possible culprits
>> are:
>> 
>> Make SMBIOS table pass MS SVVP test
>> Use MaxCountCPUs during building of per cpu tables.
>> Add malloc_high/fseg() and rework bios table creation to use them.
>> 
> If there is any seabios revision that works then it is possible to
> bisect to find problematic commit.

Hello,

I'm also bitten by this and have now attempted to bisect it.  The result
was: 

c95d2cee36db79c88253d43a90230ceadf0c26cf is first bad commit
commit c95d2cee36db79c88253d43a90230ceadf0c26cf
Author: Kevin O'Connor 
Date:   Wed Jul 29 20:41:39 2009 -0400

Add auto-generated version info to each build.

Add versioning info to initial debug and screen banner output.

:100644 100644 2e2ba1dbf46cb7a12eeaafbc394f3f279c10abf4 
37589097295dde049f8512d99561a0320955fb41 M  Makefile
:04 04 29c4ea1e03416fce9b3aa963ece74050a7b5769a 
2b5425a20726caeadba524e14756c8827dd66595 M  src



Which looks a bit strange... I'm afraid I'm unable to revert this on the
current HEAD due to other changes in the same area and my complete lack
of understanding any assembler.  But I have verified that 
 commit 8c8a880b584ccf8958d67e99a6750ba32d0b6454
is working for me, while 
 commit c95d2cee36db79c88253d43a90230ceadf0c26cf
has the mentioned problems.


However, I am not completely sure that I didn't mess up something which
affected the result.  I do note that the *working* version logs a 
"Bad SMBIOS data checksum" message:

ol...@canardo:~$ kvm -L /usr/local/src/git/seabios/out -m 1024 -hda olive2.img  
-nographic -serial mon:stdio -monitor null 
Could not open option rom 'vapic.bin': No such file or directory
pci_add_option_rom: failed to find romfile "vgabios-cirrus.bin"
pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin"
Consoles: serial port  
BIOS drive C: is disk0
BIOS 639kB/1047544kB available memory

FreeBSD/i386 bootstrap loader, Revision 1.1
(buil...@ormonth.juniper.net, Tue Nov  3 08:19:23 UTC 2009)
Loading /boot/defaults/loader.conf 
/kernel text=0x74574c smbios_init: Bad SMBIOS data checksum
data=0x42ba0+0x92e34 syms=[0x4+0x7f570+0x4+0xb12b2]
/boot/modules/if_bge.ko text=0xa98c data=0x364+0xc syms=[0x4+0xd50+0x4+0xd18]
/boot/modules/mpt_core.ko text=0x18dbc data=0x488+0x358 
/boot/modules/if_bce.ko text=0xd07c data=0x16d94+0x24e4 
syms=[0x4+0x14d0+0x4+0x1787]
/boot/modules/acb.ko text=0x3d78 data=0x284+0x80 syms=[0x4+0xa70+0x4+0x9a4]
/boot/modules/mcs.ko text=0x4dc0 data=0x391+0xeb syms=[0x4+0xc40+0x4+0xbc2]
/boot/modules/scs.ko text=0x7c08 data=0x564+0x164 syms=[0x4+0x1110+0x4+0x1179]
/boot/modules/rcb.ko text=0x29c8 data=0x184+0x2c syms=[0x4+0x7e0+0x4+0x704]
/boot/modules/cb.ko text=0x63fc data=0x3b8+0x11c syms=[0x4+0xf20+0x4+0xe69]
/boot/modules/mesw.ko text=0x630c data=0x344+0x58 syms=[0x4+0xba0+0x4+0xe7a]
/boot/modules/cbd.ko text=0x1fcc data=0x9c+0xc syms=[0x4+0x540+0x4+0x445]
/boot/modules/sfccb.ko text=0xe30 data=0x1b0+0x14 syms=[0x4+0x540+0x4+0x4a5]
/boot/modules/mac_runasnonroot.ko text=0x7b4 data=0x4d0 
syms=[0x4+0x310+0x4+0x39d]


Hit [Enter] to boot immediately, or space bar for command prompt.
Booting [/kernel]...   
platform_early_bootinit: M/T Series Early Boot Initialization
Olive CPU
GDB: debug ports: sio
[snip]


while the failing version does not cause this message:

ol...@canardo:~$ kvm -L /usr/local/src/git/seabios/out -m 1024 -hda olive2.img  
-nographic -serial mon:stdio -monitor null 
Could not open option rom 'vapic.bin': No such file or directory
pci_add_option_rom: failed to find romfile "vgabios-cirrus.bin"
pci_add_option_rom: failed to find romfile "pxe-rtl8139.bin"
Consoles: serial port  
BIOS drive C: is disk0
BIOS 639kB/1047544kB available memory

FreeBSD/i386 bootstrap loader, Revision 1.1
(buil...@ormonth.juniper.net, Tue Nov  3 08:19:23 UTC 2009)
Loading /boot/defaults/loader.conf 
/kernel text=0x74574c data=0x42ba0+0x92e34 syms=[0x4+0x7f570+0x4+0xb12b2]
/boot/modules/if_bge.ko text=0xa98c data=0x364+0xc syms=[0x4+0xd50+0x4+0xd18]
/boot/modules/mpt_core.ko text=0x18dbc data=0x488+0x358 
/boot/modules/if_bce.ko text=0xd07c data=0x16d94+0x24e4 
syms=[0x4+0x14d0+0x4+0x1787]
/boot/modules/acb.ko text=0x3d78 data=0x284+0x80 syms=[0x4+0xa70+0x4+0x9a4]
/boot/modules/mcs.ko text=0x4dc0 data=0x391+0xeb syms=[0x4+0xc40+0x4+0xbc2]
/boot/modules/scs.ko text=0x7c08 data=0x564+0x164 syms=[0x4+0x1110+0x4+0x1179]
/boot/modules/rcb.ko text=0x29c8 data=0x184+0x2c syms=[0x4+0x7e0+0x4+0x704]
/boot/modules/cb.ko text=0x63fc data=0x3b8+0x11c syms=[0x4+0xf20+0x4+0xe69]
/boot/modules/mesw.ko text=0x630c data=0x344+0x58 syms=[0x4+0xba0+0x4+0xe7a]
/boot/modules/cbd.ko text=0x1fcc data=0x9c+0xc syms=[0x4+0x54

Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-13 Thread jvrao
jvrao wrote:
> Alexander Graf wrote:
>> On 12.04.2010, at 13:58, Jamie Lokier wrote:
>>
>>> Mohammed Gamal wrote:
 On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier  wrote:
> Javier Guerra Giraldez wrote:
>> On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal  
>> wrote:
>>> On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier  
>>> wrote:
 To throw a spanner in, the most widely supported filesystem across
 operating systems is probably NFS, version 2 :-)
>>> Remember that Windows usage on a VM is not some rare use case, and
>>> it'd be a little bit of a pain from a user's perspective to have to
>>> install a third party NFS client for every VM they use. Having
>>> something supported on the VM out of the box is a better option IMO.
>> i don't think virtio-CIFS has any more support out of the box (on any
>> system) than virtio-9P.
> It doesn't, but at least network-CIFS tends to work ok and is the
> method of choice for Windows VMs - when you can setup Samba on the
> host (which as previously noted you cannot always do non-disruptively
> with current Sambas).
>
> -- Jamie
>
 I think having support for both 9p and CIFS would be the best option.
 In that case the user will have the option to use either one,
 depending on how their guests support these filesystems. In that case
 I'd prefer to work on CIFS support while the 9p effort can still go
 on. I don't think both efforts are mutually exclusive.

 What do the rest of you guys think?
>>> I only noted NFS because most old OSes do not support CIFS or 9P -
>>> especially all the old unixes.
>>>
>>> I don't think old versions of MS-DOS and Windows (95, 98, ME, Nt4?)
>>> even support current CIFS.  They need extra server settings to work
>>> - such as setting passwords on the server to non-encrypted and other quirks.
>>>
>>> Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to
>>> properly see symlinks and hard links.
>>>
>>> So there is no really nice out of the box file service which works
>>> easily with all guest OSes.
>>>
>>> I'm guessing that out of all the filesystems, CIFS is the most widely
>>> supported in recent OSes (released in the last 10 years).  But I'm not
>>> really sure what the state of CIFS is for non-Windows, non-Linux,
>>> non-BSD guests.
>> So what? If you want to have direct host fs access, install guest drivers. 
>> If you can't, set up networking and use CIFS or NFS or whatever.
>>
>>> I'm not sure why 9P is being pursued.  Does anything much support it,
>>> or do all OSes except quite recent Linux need a custom driver for 9P?
>>> Even Linux only got the first commit in the kernel 5 years ago, so
>>> probably it was only about 3 years ago that it will have begun
>>> appearing in stable distros, if at all.  Filesystem passthrough to
>>> Linux guests installed in the last couple of years is a useful
>>> feature, and I know that for many people that is their only use of
>>> KVM, but compared with CIFS' broad support it seems like quite a
>>> narrow goal.
>> The goal is to have something simple and fast. We can fine-tune 9P to align 
>> with the Linux VFS structures, making it really little overhead (and little 
>> headache too). For Windows guests, nothing prevents us to expose yet another 
>> 9P flavor. That again would be aligned well with Windows's VFS and be slim 
>> and fast there.
>>
>> The biggest problem I see with CIFS is that it's a huge beast. There are a 
>> lot of corner cases where it just doesn't fit in. See my previous mail for 
>> more details.
>>
> As Alex mentioned, 9P is chosen for its mere simplicity and easy adaptability.
> NFS and CIFS does not give that flexibility. As we mentioned in the patch 
> series, we are 
> already seeing better numbers with 9P. Looking ahead 9P can embed KVM/QEMU 
> knowledge
> to share physical resources like memory/cache between the host and the guest.
> 
> I think looking into the windows side of 9P client would be great option too. 
> The current patch on the mailing list supports .U (unix) protocol and will be 
> introducing
> .L (Linux) variant as we move forward.

Hi Mohammed,
Please let us know once you decide on where your interest lies.
Will be glad to have you on VirtFS (9P) though. :)


- JV

> 
> - JV
> 
> 
>> Alex
>>
>>
>>
> 
> 
> 
> 






Re: [Qemu-devel] How to lock-up your tap-based VM network

2010-04-13 Thread Blue Swirl
On 4/13/10, Blue Swirl  wrote:
> On 4/12/10, Paul Brook  wrote:
>  > > A major reason for this deadlock could likely be removed by shutting
>  >  > down the tap (if peered) or dropping packets in user space (in case of
>  >  > vlan) when a NIC is stopped or otherwise shut down. Currently most (if
>  >  > not all) NIC models seem to signal both "queue full" and "RX disabled"
>  >  > via !can_receive().
>  >
>  >
>  > No. A disabled device should return true from can_recieve, then discard the
>  >  packets in its receive callback. Failure to do so is a bug in the device. 
> It
>  >  looks like the virtio-net device may be buggy.
>
>
> Awesome, it looks like a longstanding bug with pcnet/lance has is
>  fixed by this change! OpenBSD installer would hang when receiving
>  packages, now it works!

I spoke too soon, networking works also without the patch now.




Re: [Qemu-devel] Problem with DOS application and 286 DOS Extender application

2010-04-13 Thread Gerhard Wiesinger

On Tue, 13 Apr 2010, Roy Tam wrote:


2010/4/13 Gerhard Wiesinger :

Hello,

I'm having a problem with a DOS application which uses a 286 DOS Extender,
error message is as the following:
unable to create task for execution
Interrupt 10 (Ah) while creating task: Invalid task segment selector.
Happens with QEMM386 and HIMEM.SYS/EMM386.EXE.

I guess the application does at this point swithing to 286 protected mode
and trying to move conventional memory up to EMS memory.

Issue is NOT present under VMWare Server 2.0 and with real hardware.

DOS; MS-DOS 6.22
QEMU: 0.12.3 under Fedora 11, 2.6.30.10-105.2.23.fc11.x86 on AMD Phenom II
Quad Core, x86_64-softmmu.

Any comments or ideas (I guess something with protected mode and MMU might
be wrong)?



You need to mention the program name so that people can try to
reproduce the bug.



It is a non public, proprietary application which uses the Ergo Computing 
286 DOS Extender. I guess some other application which use the same DOS 
extender have the same problem. So best thing is to find another 
application which uses the Ergo Computing 286 DOS Extender, too.


Ciao,
Gerhard

--
http://www.wiesinger.com/




Re: [Qemu-devel] Problem with QEMU on KVM

2010-04-13 Thread Gerhard Wiesinger

On Mon, 12 Apr 2010, Jamie Lokier wrote:


Mulyadi Santosa wrote:

Hi Gerhard...

On Sun, Apr 11, 2010 at 20:52, Gerhard Wiesinger  wrote:

OK, uses the following ports:
Port 0x20: 8259 interrupt controller
Port 0x40: 8253 timer

Interrupt 0x1A:
ah=0x00: fetches system timer counters
ah=0x02: reads the clock
ah=0x04: fetches date

So there must be something wrong with KVM with the above functionality (I
guess the timers).


Hmm, my silly guess is, maybe the timer is seen as decrementedor
at least "stuck".

I have very little knowledge about git, but maybe you can start doing
git bisect to narrow which git commit that introduce such behaviour.
Meanwhile, let's wait for comments from one of the KVM developers.


There are various -no-kvm-XXX options to try:

-no-kvm-irqchip disable KVM kernel mode PIC/IOAPIC/LAPIC
-no-kvm-pit disable KVM kernel mode PIT
-no-kvm-pit-reinjection disable KVM kernel mode PIT interrupt reinjection

Any of them might be causing this problem.

They disable in-kernel emulation of the interrupt controller and timer
chips, and force the qemu version to be used.  It's possible the
in-kernel emulation is buggier than the qemu version.


Hello Jamie, thanx so far.

I'm starting in this way:
/root/download/qemu/qemu-0.12.3/x86_64-softmmu/qemu-system-x86_64 -enable-kvm 
-a_lot_of_other_options

All options you mentioned don't work:
/root/download/qemu/qemu-0.12.3/x86_64-softmmu/qemu-system-x86_64: invalid 
option -- '-no-kvm-irqchip'
/root/download/qemu/qemu-0.12.3/x86_64-softmmu/qemu-system-x86_64: invalid 
option -- '-no-kvm-pit'
/root/download/qemu/qemu-0.12.3/x86_64-softmmu/qemu-system-x86_64: invalid 
option -- '-no-kvm-no-kvm-irqchip'

Is there another way of startup or is it working only on newer versions?

BTW: For me it is very unclear what's the difference between the 
following commands:

qemu
qemu -enable-kvm
kvm

How can I build the kvm executable? Different source tree? Newer version 
from branch?


Maybe someone can explain the different versions of qemu/kvm and branches 
etc. and possible features in detail.


Thnx.

Ciao,
Gerhard

--
http://www.wiesinger.com/




Re: [Qemu-devel] [GSoC 2010] Pass-through filesystem support.

2010-04-13 Thread Mohammed Gamal
On Tue, Apr 13, 2010 at 9:08 PM, jvrao  wrote:
> jvrao wrote:
>> Alexander Graf wrote:
>>> On 12.04.2010, at 13:58, Jamie Lokier wrote:
>>>
 Mohammed Gamal wrote:
> On Mon, Apr 12, 2010 at 12:29 AM, Jamie Lokier  
> wrote:
>> Javier Guerra Giraldez wrote:
>>> On Sat, Apr 10, 2010 at 7:42 AM, Mohammed Gamal  
>>> wrote:
 On Sat, Apr 10, 2010 at 2:12 PM, Jamie Lokier  
 wrote:
> To throw a spanner in, the most widely supported filesystem across
> operating systems is probably NFS, version 2 :-)
 Remember that Windows usage on a VM is not some rare use case, and
 it'd be a little bit of a pain from a user's perspective to have to
 install a third party NFS client for every VM they use. Having
 something supported on the VM out of the box is a better option IMO.
>>> i don't think virtio-CIFS has any more support out of the box (on any
>>> system) than virtio-9P.
>> It doesn't, but at least network-CIFS tends to work ok and is the
>> method of choice for Windows VMs - when you can setup Samba on the
>> host (which as previously noted you cannot always do non-disruptively
>> with current Sambas).
>>
>> -- Jamie
>>
> I think having support for both 9p and CIFS would be the best option.
> In that case the user will have the option to use either one,
> depending on how their guests support these filesystems. In that case
> I'd prefer to work on CIFS support while the 9p effort can still go
> on. I don't think both efforts are mutually exclusive.
>
> What do the rest of you guys think?
 I only noted NFS because most old OSes do not support CIFS or 9P -
 especially all the old unixes.

 I don't think old versions of MS-DOS and Windows (95, 98, ME, Nt4?)
 even support current CIFS.  They need extra server settings to work
 - such as setting passwords on the server to non-encrypted and other 
 quirks.

 Meanwhile Windows Vista/2008/7 works better with SMB2, not CIFS, to
 properly see symlinks and hard links.

 So there is no really nice out of the box file service which works
 easily with all guest OSes.

 I'm guessing that out of all the filesystems, CIFS is the most widely
 supported in recent OSes (released in the last 10 years).  But I'm not
 really sure what the state of CIFS is for non-Windows, non-Linux,
 non-BSD guests.
>>> So what? If you want to have direct host fs access, install guest drivers. 
>>> If you can't, set up networking and use CIFS or NFS or whatever.
>>>
 I'm not sure why 9P is being pursued.  Does anything much support it,
 or do all OSes except quite recent Linux need a custom driver for 9P?
 Even Linux only got the first commit in the kernel 5 years ago, so
 probably it was only about 3 years ago that it will have begun
 appearing in stable distros, if at all.  Filesystem passthrough to
 Linux guests installed in the last couple of years is a useful
 feature, and I know that for many people that is their only use of
 KVM, but compared with CIFS' broad support it seems like quite a
 narrow goal.
>>> The goal is to have something simple and fast. We can fine-tune 9P to align 
>>> with the Linux VFS structures, making it really little overhead (and little 
>>> headache too). For Windows guests, nothing prevents us to expose yet 
>>> another 9P flavor. That again would be aligned well with Windows's VFS and 
>>> be slim and fast there.
>>>
>>> The biggest problem I see with CIFS is that it's a huge beast. There are a 
>>> lot of corner cases where it just doesn't fit in. See my previous mail for 
>>> more details.
>>>
>> As Alex mentioned, 9P is chosen for its mere simplicity and easy 
>> adaptability.
>> NFS and CIFS does not give that flexibility. As we mentioned in the patch 
>> series, we are
>> already seeing better numbers with 9P. Looking ahead 9P can embed KVM/QEMU 
>> knowledge
>> to share physical resources like memory/cache between the host and the guest.
>>
>> I think looking into the windows side of 9P client would be great option too.
>> The current patch on the mailing list supports .U (unix) protocol and will 
>> be introducing
>> .L (Linux) variant as we move forward.
>
> Hi Mohammed,
> Please let us know once you decide on where your interest lies.
> Will be glad to have you on VirtFS (9P) though. :)
>
>
> - JV
>

It seems the community is more keen on getting 9P support merged than
getting CIFS supported, and they have made good points to support
their argument. I'm not sure whether work on this project could fit in
as a GSoC project and if there is much remaining work that could make
it fit in that direction. But I'd be glad to volunteer anyway :)

Regards,
Mohammed




[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Michael S. Tsirkin
On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> Le mardi 13 avril 2010 à 20:39 +0300, Michael S. Tsirkin a écrit :
> 
> > > When a socket with inflight tx packets is closed, we dont block the
> > > close, we only delay the socket freeing once all packets were delivered
> > > and freed.
> > > 
> > 
> > Which is wrong, since this is under userspace control, so you get
> > unkillable processes.
> > 
> 
> We do not get unkillable processes, at least with sockets I was thinking
> about (TCP/UDP ones).
> 
> Maybe tun sockets can behave the same ?

Looks like that's what my patch does: ip_rcv seems to call
skb_orphan too.

> Herbert Acked your patch, so I guess its OK, but I think it can be
> dangerous.
> Anyway my feeling is that we try to add various mechanisms to keep a
> hostile user flooding another one.
> 
> For example, UDP got memory accounting quite recently, and we added
> socket backlog limits very recently. It was considered not needed few
> years ago.
> 







[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Michael S. Tsirkin
On Tue, Apr 13, 2010 at 10:38:06PM +0200, Eric Dumazet wrote:
> Le mardi 13 avril 2010 à 23:25 +0300, Michael S. Tsirkin a écrit :
> > On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> > > Le mardi 13 avril 2010 à 20:39 +0300, Michael S. Tsirkin a écrit :
> > > 
> > > > > When a socket with inflight tx packets is closed, we dont block the
> > > > > close, we only delay the socket freeing once all packets were 
> > > > > delivered
> > > > > and freed.
> > > > > 
> > > > 
> > > > Which is wrong, since this is under userspace control, so you get
> > > > unkillable processes.
> > > > 
> > > 
> > > We do not get unkillable processes, at least with sockets I was thinking
> > > about (TCP/UDP ones).
> > > 
> > > Maybe tun sockets can behave the same ?
> > 
> > Looks like that's what my patch does: ip_rcv seems to call
> > skb_orphan too.
> 
> Well, I was speaking of tx side, you speak of receiving side.

Point is, both ip_rcv and my patch call skb_orphan.

> An external flood (coming from another domain) is another problem.
> 
> A sender might flood the 'network' inside our domain. How can we
> reasonably limit the sender ?
> 
> Maybe the answer is 'We can not', but it should be stated somewhere, so
> that someone can address this point later.
> 

And whatever's done should ideally work for tap to IP
and IP to IP sockets as well, not just tap to tap.

-- 
MST




[Qemu-devel] [PATCH v2] Write cmos hd data for ide drives using -device parm

2010-04-13 Thread Bruce Rogers

When specifying ide devices using -device, the cmos information 
which the bios depends on is not written. This patch generalizes 
the cmos hd data setting for the existing code path and adds the 
ability to call that code on a per machine, per ide drive basis. 

This is important for older guests, such as Windows XP, and Windows 
Server 2003. 

I needed to add an id to the IDEBus structure since there wasn't 
a way to identify whether an ide bus was primary or secondary in 
the lower level code. 

v2:Fixed same issue for isapc machine type (previous patch addressed
pc machine types only). Additionally, the bus naming needed to be fixed
in the isa case to be able to reference the secondary ide bus in -device
nomenclature.

Signed-off-by: Bruce Rogers  

Bruce 

 hw/ide/internal.h |3 +-
 hw/ide/isa.c  |8 +-
 hw/ide/piix.c |6 +++-
 hw/ide/qdev.c |   13 +++---
 hw/pc.c   |   68 --
 sysemu.h  |1 
 vl.c  |6 
 7 files changed, 71 insertions(+), 34 deletions(-)

diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 027029e..4ea62bd 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -450,6 +450,7 @@ struct IDEBus {
 IDEState ifs[2];
 uint8_t unit;
 uint8_t cmd;
+uint8_t id;
 qemu_irq irq;
 };
 
@@ -565,7 +566,7 @@ void ide_init2(IDEBus *bus, DriveInfo *hd0, DriveInfo *hd1,
 void ide_init_ioport(IDEBus *bus, int iobase, int iobase2);
 
 /* hw/ide/qdev.c */
-void ide_bus_new(IDEBus *idebus, DeviceState *dev);
+void ide_bus_new(IDEBus *idebus, DeviceState *dev, const char *name);
 IDEDevice *ide_create_drive(IDEBus *bus, int unit, DriveInfo *drive);
 
 #endif /* HW_IDE_INTERNAL_H */
diff --git a/hw/ide/isa.c b/hw/ide/isa.c
index dff7c79..ac3ee3b 100644
--- a/hw/ide/isa.c
+++ b/hw/ide/isa.c
@@ -67,7 +67,13 @@ static int isa_ide_initfn(ISADevice *dev)
 {
 ISAIDEState *s = DO_UPCAST(ISAIDEState, dev, dev);
 
-ide_bus_new(&s->bus, &s->dev.qdev);
+if (s->iobase == 0x1f0) {
+ide_bus_new(&s->bus, &s->dev.qdev, "ide.0");
+s->bus.id = 0;
+} else {
+ide_bus_new(&s->bus, &s->dev.qdev, "ide.1");
+s->bus.id = 1;
+}
 ide_init_ioport(&s->bus, s->iobase, s->iobase2);
 isa_init_irq(dev, &s->irq, s->isairq);
 ide_init2(&s->bus, NULL, NULL, s->irq);
diff --git a/hw/ide/piix.c b/hw/ide/piix.c
index 295a93d..377fda8 100644
--- a/hw/ide/piix.c
+++ b/hw/ide/piix.c
@@ -131,11 +131,13 @@ static int pci_piix_ide_initfn(PCIIDEState *d)
 
 vmstate_register(0, &vmstate_ide_pci, d);
 
-ide_bus_new(&d->bus[0], &d->dev.qdev);
-ide_bus_new(&d->bus[1], &d->dev.qdev);
+ide_bus_new(&d->bus[0], &d->dev.qdev, NULL);
+ide_bus_new(&d->bus[1], &d->dev.qdev, NULL);
 ide_init_ioport(&d->bus[0], 0x1f0, 0x3f6);
 ide_init_ioport(&d->bus[1], 0x170, 0x376);
 
+d->bus[0].id = 0;
+d->bus[1].id = 1;
 ide_init2(&d->bus[0], NULL, NULL, isa_reserve_irq(14));
 ide_init2(&d->bus[1], NULL, NULL, isa_reserve_irq(15));
 return 0;
diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c
index b18693d..c868b8e 100644
--- a/hw/ide/qdev.c
+++ b/hw/ide/qdev.c
@@ -29,9 +29,9 @@ static struct BusInfo ide_bus_info = {
 .size  = sizeof(IDEBus),
 };
 
-void ide_bus_new(IDEBus *idebus, DeviceState *dev)
+void ide_bus_new(IDEBus *idebus, DeviceState *dev, const char *name)
 {
-qbus_create_inplace(&idebus->qbus, &ide_bus_info, dev, NULL);
+qbus_create_inplace(&idebus->qbus, &ide_bus_info, dev, name);
 }
 
 static int ide_qdev_init(DeviceState *qdev, DeviceInfo *base)
@@ -39,6 +39,7 @@ static int ide_qdev_init(DeviceState *qdev, DeviceInfo *base)
 IDEDevice *dev = DO_UPCAST(IDEDevice, qdev, qdev);
 IDEDeviceInfo *info = DO_UPCAST(IDEDeviceInfo, qdev, base);
 IDEBus *bus = DO_UPCAST(IDEBus, qbus, qdev->parent_bus);
+int i;
 
 if (!dev->conf.dinfo) {
 fprintf(stderr, "%s: no drive specified\n", qdev->info->name);
@@ -65,7 +66,13 @@ static int ide_qdev_init(DeviceState *qdev, DeviceInfo *base)
 default:
 goto err;
 }
-return info->init(dev);
+dev->dinfo->bus = bus->id;
+dev->dinfo->unit = dev->unit;
+i = info->init(dev);
+if (i >= 0) {
+machine_ide_finalize(dev->dinfo);
+}
+return i;
 
 err:
 return -1;
diff --git a/hw/pc.c b/hw/pc.c
index 69e597f..29f47d4 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -215,6 +215,44 @@ static void cmos_init_hd(int type_ofs, int info_ofs, 
BlockDriverState *hd)
 rtc_set_memory(s, info_ofs + 8, sectors);
 }
 
+static void set_cmos_hd_data(DriveInfo *dinfo)
+{
+static int hd_type = 0;
+static int hd_trans = 0;
+RTCState *s = rtc_state;
+int cylinders, heads, sectors, translation;
+
+if (dinfo->bus == 0) {
+if (dinfo->unit == 0) {
+hd_type |= 0xf0;
+cmos_init_hd(0x19, 0x1b, dinfo->bdrv);
+} else {
+hd_type |= 0x0f;
+cmos_init_hd(0x1a, 0x24, 

[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Eric Dumazet
Le mardi 13 avril 2010 à 17:36 +0200, Jan Kiszka a écrit :
> Michael S. Tsirkin wrote:
> > The following situation was observed in the field:
> > tap1 sends packets, tap2 does not consume them, as a result
> > tap1 can not be closed.
> 
> And before that, tap1 may not be able to send further packets to anyone
> else on the bridge as its TX resources were blocked by tap2 - that's
> what we saw in the field.
> 

After the patch, tap1 is able to flood tap2, and tap3/tap4 not able to
send one single frame. Is it OK ?

Back to the problem : tap1 cannot be closed.

Why ? because of refcounts ?

When a socket with inflight tx packets is closed, we dont block the
close, we only delay the socket freeing once all packets were delivered
and freed.







[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Eric Dumazet
Le mardi 13 avril 2010 à 20:39 +0300, Michael S. Tsirkin a écrit :

> > When a socket with inflight tx packets is closed, we dont block the
> > close, we only delay the socket freeing once all packets were delivered
> > and freed.
> > 
> 
> Which is wrong, since this is under userspace control, so you get
> unkillable processes.
> 

We do not get unkillable processes, at least with sockets I was thinking
about (TCP/UDP ones).

Maybe tun sockets can behave the same ?

Herbert Acked your patch, so I guess its OK, but I think it can be
dangerous.

Anyway my feeling is that we try to add various mechanisms to keep a
hostile user flooding another one.

For example, UDP got memory accounting quite recently, and we added
socket backlog limits very recently. It was considered not needed few
years ago.







[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Eric Dumazet
Le mardi 13 avril 2010 à 23:25 +0300, Michael S. Tsirkin a écrit :
> On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
> > Le mardi 13 avril 2010 à 20:39 +0300, Michael S. Tsirkin a écrit :
> > 
> > > > When a socket with inflight tx packets is closed, we dont block the
> > > > close, we only delay the socket freeing once all packets were delivered
> > > > and freed.
> > > > 
> > > 
> > > Which is wrong, since this is under userspace control, so you get
> > > unkillable processes.
> > > 
> > 
> > We do not get unkillable processes, at least with sockets I was thinking
> > about (TCP/UDP ones).
> > 
> > Maybe tun sockets can behave the same ?
> 
> Looks like that's what my patch does: ip_rcv seems to call
> skb_orphan too.

Well, I was speaking of tx side, you speak of receiving side.
An external flood (coming from another domain) is another problem.

A sender might flood the 'network' inside our domain. How can we
reasonably limit the sender ?

Maybe the answer is 'We can not', but it should be stated somewhere, so
that someone can address this point later.







Re: [Qemu-devel] [PATCH] block.h: bdrv_create2 doesn't exist any more

2010-04-13 Thread Christoph Hellwig
On Tue, Apr 13, 2010 at 12:43:40PM +0200, Kevin Wolf wrote:
> The bdrv_create2 implementation has been disappeared long ago. Remove its
> prototype from the header file, too.
> 
> Signed-off-by: Kevin Wolf 

Looks good.





Re: [Qemu-devel] [PATCH] vhost: fix features ack

2010-04-13 Thread Aurelien Jarno
On Wed, Mar 31, 2010 at 09:20:31PM +0300, Michael S. Tsirkin wrote:
> From: David L Stevens 
> 
> vhost driver in qemu didn't ack features, and this happens
> to work because we don't really require any features. However,
> it's better not to rely on this. This patch passes features to
> vhost as guest acks them.
> 
> Signed-off-by: David L Stevens 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> Anthony, here's a fixup patch to address an issue in vhost
> patches. Incidentially, what's the status of the vhost patchset?
> 

Thanks, applied.

>  hw/virtio-net.c |8 
>  1 files changed, 8 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/virtio-net.c b/hw/virtio-net.c
> index 9ddd58c..4c7c46e 100644
> --- a/hw/virtio-net.c
> +++ b/hw/virtio-net.c
> @@ -218,6 +218,14 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
> uint32_t features)
>  (features >> VIRTIO_NET_F_GUEST_ECN)  & 1,
>  (features >> VIRTIO_NET_F_GUEST_UFO)  & 1);
>  }
> +if (!n->nic->nc.peer ||
> +n->nic->nc.peer->info->type != NET_CLIENT_TYPE_TAP) {
> +return;
> +}
> +if (!tap_get_vhost_net(n->nic->nc.peer)) {
> +return;
> +}
> +return vhost_net_ack_features(tap_get_vhost_net(n->nic->nc.peer), 
> features);
>  }
>  
>  static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> -- 
> 1.7.0.2.280.gc6f05
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] Re: vhost.c: include last

2010-04-13 Thread Aurelien Jarno
On Thu, Apr 08, 2010 at 11:48:56PM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 08, 2010 at 05:49:50PM -0300, Marcelo Tosatti wrote:
> > 
> > So the userspace headers define KERNEL_STRICT_NAMES and there's no
> > conflict on type definition for older kernels.
> > 
> > igned-off-by: Marcelo Tosatti 
> 
> Acked-by: Michael S. Tsirkin 

Thanks, applied.

> > diff --git a/hw/vhost.c b/hw/vhost.c
> > index ad2f98a..d37a66e 100644
> > --- a/hw/vhost.c
> > +++ b/hw/vhost.c
> > @@ -10,13 +10,13 @@
> >   * the COPYING file in the top-level directory.
> >   */
> >  
> > -#include 
> >  #include 
> >  #include 
> >  #include "vhost.h"
> >  #include "hw/hw.h"
> >  /* For range_get_last */
> >  #include "pci.h"
> > +#include 
> >  
> >  static void vhost_dev_sync_region(struct vhost_dev *dev,
> >uint64_t mfirst, uint64_t mlast,
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] Re: Call for 0.12.4

2010-04-13 Thread Aurelien Jarno
On Mon, Apr 12, 2010 at 10:47:55AM +0200, Paolo Bonzini wrote:
> On 04/08/2010 08:37 PM, Aurelien Jarno wrote:
> >Hi all,
> >
> >A number of fixes have been accumulated in the stable-0.12 branch, and
> >I think it's time to release a new stable version. I would like to see
> >that happening for the end of next week (around the 18th of April).
> >
> >If you want to see some patches included, please send a mail to the
> >mailing list with the [STABLE] tag. I would clearly prefer patches that
> >are already in HEAD (if the patch can simply be cherry-picked, there is
> >no need to send a patch, just the commit number), though other patches
> >might be considered too.
> 
> Please consider c5f32c99 too.
> 

Done.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] [PATCH] vhost-net: disable mergeable buffers

2010-04-13 Thread Aurelien Jarno
On Sun, Apr 04, 2010 at 05:36:55PM +0300, Michael S. Tsirkin wrote:
> vhost in current kernels doesn't support mergeable buffers.
> Disable this feature if vhost is enabled, until such
> support is implemented.
> 
> Signed-off-by: Michael S. Tsirkin 

Thanks, applied.

> ---
>  hw/vhost_net.c |1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/vhost_net.c b/hw/vhost_net.c
> index 39643f1..2e292ee 100644
> --- a/hw/vhost_net.c
> +++ b/hw/vhost_net.c
> @@ -51,6 +51,7 @@ unsigned vhost_net_get_features(struct vhost_net *net, 
> unsigned features)
>  if (!(net->dev.features & (1 << VIRTIO_RING_F_INDIRECT_DESC))) {
>  features &= ~(1 << VIRTIO_RING_F_INDIRECT_DESC);
>  }
> +features &= ~(1 << VIRTIO_NET_F_MRG_RXBUF);
>  return features;
>  }
>  
> -- 
> 1.7.0.2.280.gc6f05
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] [PATCH] arm: Fix compiler warning (fprintf format string)

2010-04-13 Thread Aurelien Jarno
On Fri, Apr 09, 2010 at 10:49:50PM +0200, Stefan Weil wrote:
> When argument checking is enabled, gcc throws this error:
> 
> error: format not a string literal and no format arguments
> 
> The patch rewrites the statement to satisfy the compiler.
> 
> Signed-off-by: Stefan Weil 

Thanks, applied.

> ---
>  arm-dis.c |8 
>  1 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arm-dis.c b/arm-dis.c
> index 4fb899e..5028555 100644
> --- a/arm-dis.c
> +++ b/arm-dis.c
> @@ -3149,14 +3149,14 @@ print_insn_thumb16 (bfd_vma pc, struct 
> disassemble_info *info, long given)
> if (started)
>   func (stream, ", ");
> started = 1;
> -   func (stream, arm_regnames[14] /* "lr" */);
> +   func (stream, "%s", arm_regnames[14] /* "lr" */);
>   }
>  
> if (domaskpc)
>   {
> if (started)
>   func (stream, ", ");
> -   func (stream, arm_regnames[15] /* "pc" */);
> +   func (stream, "%s", arm_regnames[15] /* "pc" */);
>   }
>  
> func (stream, "}");
> @@ -3699,7 +3699,7 @@ print_insn_thumb32 (bfd_vma pc, struct disassemble_info 
> *info, long given)
> }
>   else
> {
> - func (stream, psr_name (given & 0xff));
> + func (stream, "%s", psr_name (given & 0xff));
> }
>   break;
>  
> @@ -3707,7 +3707,7 @@ print_insn_thumb32 (bfd_vma pc, struct disassemble_info 
> *info, long given)
>   if ((given & 0xff) == 0)
> func (stream, "%cPSR", (given & 0x10) ? 'S' : 'C');
>   else
> -   func (stream, psr_name (given & 0xff));
> +   func (stream, "%s", psr_name (given & 0xff));
>   break;
>  
> case '0': case '1': case '2': case '3': case '4':
> -- 
> 1.5.6.5
> 
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] [PATCH] m68k: Fix compiler warning (fprintf format string)

2010-04-13 Thread Aurelien Jarno
On Fri, Apr 09, 2010 at 10:49:51PM +0200, Stefan Weil wrote:
> When argument checking is enabled, gcc throws this error:
> 
> error: format not a string literal and no format arguments
> 
> The patch rewrites the statement to satisfy the compiler.
> It also removes a type cast which is not needed.
> 
> Signed-off-by: Stefan Weil 

Thanks, applied.

> ---
>  m68k-dis.c |6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/m68k-dis.c b/m68k-dis.c
> index d38d5a2..7cd88f8 100644
> --- a/m68k-dis.c
> +++ b/m68k-dis.c
> @@ -1104,7 +1104,7 @@ print_insn_arg (const char *d,
>{
>  static const char *const cacheFieldName[] = { "nc", "dc", "ic", "bc" 
> };
>  val = fetch_arg (buffer, place, 2, info);
> -(*info->fprintf_func) (info->stream, cacheFieldName[val]);
> +(*info->fprintf_func) (info->stream, "%s", cacheFieldName[val]);
>  break;
>}
>  
> @@ -1199,7 +1199,7 @@ print_insn_arg (const char *d,
>   {
> static const char *const scalefactor_name[] = { "<<", ">>" };
> val = fetch_arg (buffer, place, 1, info);
> -   (*info->fprintf_func) (info->stream, scalefactor_name[val]);
> +   (*info->fprintf_func) (info->stream, "%s", scalefactor_name[val]);
>   }
>else
>   {
> @@ -1804,7 +1804,7 @@ match_insn_m68k (bfd_vma memaddr,
>  
>save_p = p;
>info->print_address_func = dummy_print_address;
> -  info->fprintf_func = (fprintf_ftype) dummy_printer;
> +  info->fprintf_func = dummy_printer;
>  
>/* We scan the operands twice.  The first time we don't print anything,
>   but look for errors.  */
> -- 
> 1.5.6.5
> 
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] [PATCH] sh4: Fix compiler warning (fprintf format string)

2010-04-13 Thread Aurelien Jarno
On Fri, Apr 09, 2010 at 10:49:52PM +0200, Stefan Weil wrote:
> When argument checking is enabled, gcc throws this error:
> 
> error: format not a string literal and no format arguments
> 
> The patch rewrites the statement to satisfy the compiler.
> 
> Signed-off-by: Stefan Weil 

Thanks, applied.

> ---
>  sh4-dis.c |4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/sh4-dis.c b/sh4-dis.c
> index 41fd866..078a6b2 100644
> --- a/sh4-dis.c
> +++ b/sh4-dis.c
> @@ -1493,10 +1493,10 @@ print_insn_ppi (int field_b, struct disassemble_info 
> *info)
> print_dsp_reg (field_b & 0xf, fprintf_fn, stream);
> break;
>   case DSP_REG_X:
> -   fprintf_fn (stream, sx_tab[(field_b >> 6) & 3]);
> +   fprintf_fn (stream, "%s", sx_tab[(field_b >> 6) & 3]);
> break;
>   case DSP_REG_Y:
> -   fprintf_fn (stream, sy_tab[(field_b >> 4) & 3]);
> +   fprintf_fn (stream, "%s", sy_tab[(field_b >> 4) & 3]);
> break;
>   case A_MACH:
> fprintf_fn (stream, "mach");
> -- 
> 1.5.6.5
> 
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] [PATCH] sparc: Fix compiler warning (fprintf format string)

2010-04-13 Thread Aurelien Jarno
On Fri, Apr 09, 2010 at 10:49:53PM +0200, Stefan Weil wrote:
> When argument checking is enabled, gcc throws this error:
> 
> error: format not a string literal and no format arguments
> 
> The patch rewrites the statement to satisfy the compiler.
> 
> Signed-off-by: Stefan Weil 

Thanks, applied.

> ---
>  sparc-dis.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/sparc-dis.c b/sparc-dis.c
> index 83a12ae..611e74f 100644
> --- a/sparc-dis.c
> +++ b/sparc-dis.c
> @@ -2778,7 +2778,7 @@ print_insn_sparc (bfd_vma memaddr, disassemble_info 
> *info)
>/* Can't do simple format if source and dest are different.  */
>continue;
>  
> -  (*info->fprintf_func) (stream, opcode->name);
> +  (*info->fprintf_func) (stream, "%s", opcode->name);
>  
>{
>  const char *s;
> -- 
> 1.5.6.5
> 
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




Re: [Qemu-devel] [PATCH] linux-user: do_shmdt(): Fix page_set_flags's 2nd arg.

2010-04-13 Thread Aurelien Jarno
On Sun, Apr 11, 2010 at 02:09:57AM +0900, takas...@ops.dti.ne.jp wrote:
> 2nd arg of page_set_flags() should be start+size, but size.
> 
> Signed-off-by: Takashi YOSHII 

Thanks, applied.

> ---
>  linux-user/syscall.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index a03e432..26c0fb4 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -2752,7 +2752,7 @@ static inline abi_long do_shmdt(abi_ulong shmaddr)
>  for (i = 0; i < N_SHM_REGIONS; ++i) {
>  if (shm_regions[i].start == shmaddr) {
>  shm_regions[i].start = 0;
> -page_set_flags(shmaddr, shm_regions[i].size, 0);
> +page_set_flags(shmaddr, shmaddr + shm_regions[i].size, 0);
>  break;
>  }
>  }
> -- 
> 1.6.5
> 
> 
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net




[Qemu-devel] Qemu sharing files help

2010-04-13 Thread Arpit Patel
Hi,

I have Ubuntu on my host system and I am using following command on it, to
load kernel..

*qemu -kernel kernelimage -initrd initrd.img /dev/zero -append "cmdline" *

Was wondering, how can I share/copy files from host Ubuntu system to the one
with above command?

Someone mentioned using sshfs, but was hard for me to figure out, how ?



Thanks,
Arpit


Re: [Qemu-devel] Problem with DOS application and 286 DOS Extender application

2010-04-13 Thread Jamie Lokier
Gerhard Wiesinger wrote:
> It is a non public, proprietary application which uses the Ergo Computing 
> 286 DOS Extender. I guess some other application which use the same DOS 
> extender have the same problem. So best thing is to find another 
> application which uses the Ergo Computing 286 DOS Extender, too.

The 286 was obsolete 20 years ago, although code depending on it
persisted for some years after.

I'm fairly sure the number of people using (or trying to use) Qemu
with 286-specific code is very small indeed, so unfortunately for a
286 problem, you will need to help reproduce it as much as you can for
it to be fixed.

Note that Qemu doesn't emulate segments properly even for 32-bit x86
code, and 16-bit (286) code depends on that all the more.  That may be
the problem.

Or it may be the "reset using keyboard controller and BIOS" method
used to switch from protected mode to real mode on a 286 is not
implemented properly, or is not supported by the BIOS properly.

Or it may simply be a bug in 16-bit task segment switching or
something like that, which is quite complex and so rarely used that it
might never have been properly tested.

Did you try running the application under Bochs, which has a more
accurate emulation of very old x86 CPUs?

-- Jamie




[Qemu-devel] Re: ehci update

2010-04-13 Thread Jan Kiszka
David S. Ahern wrote:
> After a month of code refactoring and clean ups, etc, I thought I would
> send along an update. The attached patch is relative to your ehci
> branch; I also attached the full usb-ehci.c file for easier reading.

Thanks for your work! I applied it and once again merged git head at
this chance.

Just one request for the future: Please keep a queue of incremental
changes. This EHCI beast is sufficiently tricky, and at some point
someone (you included) may want to go back in history to find
out where some change came from, and why it was applied.

E.g.: We apparently regressed further /wrt my favorite test case (as
it's self-contained): "-usbdevice net". qemu is now entering an infinite
loop when you start dhcpcd in the guest on that interface.

> 
> At this point I can get a Windows XP guest to format a 4GB key and read
> from and write to it. I can get an FC-12 guest to format a 4GB key and
> an 8GB key as well as read from and write to both. Write rates are on
> the order of 8 MB/sec for dd:
> 
> # dd if=/dev/zero of=test bs=1M count=100 oflag=dsync
> 100+0 records in
> 100+0 records out
> 104857600 bytes (105 MB) copied, 12.1205 s, 8.7 MB/s
> 
> rsync of text files (e.g., /var/log) is on the order of 2MB/sec.
> 
> 4GB keys are definitely more stable; the 8GB is not recognized by
> Windows XP.
> 
> It still needs a lot of love, but definitely an improvement from the
> last version. The biggest difference for the performance boost and
> stability is discovering that the usbfs in linux limits transactions to
> 16k versus the EHCI spec which allows 20k per qTD. I added a hack to
> submit which detects 20k requests from a guest and breaks it up into 2
> requests through the host (a 16k and then a 4k).

Did someone already bring this up on LKML or wherever usbfs is
discussed? Should be fixable, I naively guess.

Jan



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] Re: ehci update

2010-04-13 Thread David S. Ahern


On 04/13/2010 05:35 PM, Jan Kiszka wrote:
> David S. Ahern wrote:
>> After a month of code refactoring and clean ups, etc, I thought I would
>> send along an update. The attached patch is relative to your ehci
>> branch; I also attached the full usb-ehci.c file for easier reading.
> 
> Thanks for your work! I applied it and once again merged git head at
> this chance.
> 
> Just one request for the future: Please keep a queue of incremental
> changes. This EHCI beast is sufficiently tricky, and at some point
> someone (you included) may want to go back in history to find
> out where some change came from, and why it was applied.

Agreed. I will submit focused changes from now on.

> 
> E.g.: We apparently regressed further /wrt my favorite test case (as
> it's self-contained): "-usbdevice net". qemu is now entering an infinite
> loop when you start dhcpcd in the guest on that interface.

I've been focused on a single path -- async, bulk transfers to a USB
key. I take a look at the ethernet device as well.

I've struggled with the infinite loop part: the async train jumps the
track with FC-12 guest rather quickly (ie., the link list is no longer a
loop). I put in a loop detector in the advance_state function which
helps for storage devices - but clearly something is not right. I've
been roaming the ehci_hcd code in the kernel as well looking for clues.
A lot of details to gel and the day job keeps getting in the way. :-)

> 
>>
>> At this point I can get a Windows XP guest to format a 4GB key and read
>> from and write to it. I can get an FC-12 guest to format a 4GB key and
>> an 8GB key as well as read from and write to both. Write rates are on
>> the order of 8 MB/sec for dd:
>>
>> # dd if=/dev/zero of=test bs=1M count=100 oflag=dsync
>> 100+0 records in
>> 100+0 records out
>> 104857600 bytes (105 MB) copied, 12.1205 s, 8.7 MB/s
>>
>> rsync of text files (e.g., /var/log) is on the order of 2MB/sec.
>>
>> 4GB keys are definitely more stable; the 8GB is not recognized by
>> Windows XP.
>>
>> It still needs a lot of love, but definitely an improvement from the
>> last version. The biggest difference for the performance boost and
>> stability is discovering that the usbfs in linux limits transactions to
>> 16k versus the EHCI spec which allows 20k per qTD. I added a hack to
>> submit which detects 20k requests from a guest and breaks it up into 2
>> requests through the host (a 16k and then a 4k).
> 
> Did someone already bring this up on LKML or wherever usbfs is
> discussed? Should be fixable, I naively guess.

I submitted the patch to linux-usb and it was nack'ed. The response was
that memory is allocated in powers of 2 so trying to up the limit from
16k to 20k means it will actually want to find 32k of contiguous memory.
The suggestion was to handle it with multiple requests within qemu. I
guess libusb does that.

Right now there is a hack in the ehci model to do this, but the
usb-linux code would be a better place since it might be specific to
linux hosts.

David

> 
> Jan
> 




[Qemu-devel] Re: [PATCH] tun: orphan an skb on tx

2010-04-13 Thread Herbert Xu
On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
>
> Herbert Acked your patch, so I guess its OK, but I think it can be
> dangerous.

The tun socket accounting was never designed to stop it from
flooding another tun interface.  It's there to stop it from
transmitting above a destination interface TX bandwidth and
cause unnecessary packet drops.  It also limits the total amount
of kernel memory that can be pinned down by a single tun interface.

In this case, all we're doing is shifting the accounting from the
"hardware" queue to the qdisc queue.

So your ability to flood a tun interface is essentially unchanged.

BTW we do the same thing in a number of hardware drivers, as well
as virtio-net.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt




Re: [SeaBIOS] [Qemu-devel] QEMU regression problems

2010-04-13 Thread Kevin O'Connor
On Tue, Apr 13, 2010 at 02:26:10PM +0800, Roy Tam wrote:
> 2010/4/12 Gerhard Wiesinger :
> > 3.) There is also a problem with the reported base memory under QEMM386
> > (HIMEM.SYS and EMM386.EXE is correct here). It is 646kB instead of 640kB.
> > Therefore base memory test fails. I guess that reporting memory CMOS
> > tables/interrupt functions are not implemented correctly.
> 
> - The Base Memory > 640K error seems to be SeaBIOS related. QEMU Bochs
> BIOS(tested with both -old-bios hack in 0.12 series and old 0.11.1)
> will just freeze after QEMU counted RAM.(Tested with ScriptPC and
> Bochs).

The SeaBIOS log would really help.  This can be done by adding:

-chardev stdio,id=seabios -device isa-debugcon,iobase=0x402,chardev=seabios

to the qemu command line.

The memory can be obtained in several places (int 12, int 1588, int
15e801, int 15e820, and mem 40:13).  All look fine to me from looking
at the code.

-Kevin




Re: [Qemu-devel] Re: ehci update

2010-04-13 Thread Alexander Graf

On 14.04.2010, at 01:50, David S. Ahern wrote:

> 
> 
> On 04/13/2010 05:35 PM, Jan Kiszka wrote:
>> David S. Ahern wrote:
>>> After a month of code refactoring and clean ups, etc, I thought I would
>>> send along an update. The attached patch is relative to your ehci
>>> branch; I also attached the full usb-ehci.c file for easier reading.
>> 
>> Thanks for your work! I applied it and once again merged git head at
>> this chance.
>> 
>> Just one request for the future: Please keep a queue of incremental
>> changes. This EHCI beast is sufficiently tricky, and at some point
>> someone (you included) may want to go back in history to find
>> out where some change came from, and why it was applied.
> 
> Agreed. I will submit focused changes from now on.
> 
>> 
>> E.g.: We apparently regressed further /wrt my favorite test case (as
>> it's self-contained): "-usbdevice net". qemu is now entering an infinite
>> loop when you start dhcpcd in the guest on that interface.
> 
> I've been focused on a single path -- async, bulk transfers to a USB
> key. I take a look at the ethernet device as well.
> 
> I've struggled with the infinite loop part: the async train jumps the
> track with FC-12 guest rather quickly (ie., the link list is no longer a
> loop). I put in a loop detector in the advance_state function which
> helps for storage devices - but clearly something is not right. I've
> been roaming the ehci_hcd code in the kernel as well looking for clues.
> A lot of details to gel and the day job keeps getting in the way. :-)
> 
>> 
>>> 
>>> At this point I can get a Windows XP guest to format a 4GB key and read
>>> from and write to it. I can get an FC-12 guest to format a 4GB key and
>>> an 8GB key as well as read from and write to both. Write rates are on
>>> the order of 8 MB/sec for dd:
>>> 
>>> # dd if=/dev/zero of=test bs=1M count=100 oflag=dsync
>>> 100+0 records in
>>> 100+0 records out
>>> 104857600 bytes (105 MB) copied, 12.1205 s, 8.7 MB/s
>>> 
>>> rsync of text files (e.g., /var/log) is on the order of 2MB/sec.
>>> 
>>> 4GB keys are definitely more stable; the 8GB is not recognized by
>>> Windows XP.
>>> 
>>> It still needs a lot of love, but definitely an improvement from the
>>> last version. The biggest difference for the performance boost and
>>> stability is discovering that the usbfs in linux limits transactions to
>>> 16k versus the EHCI spec which allows 20k per qTD. I added a hack to
>>> submit which detects 20k requests from a guest and breaks it up into 2
>>> requests through the host (a 16k and then a 4k).
>> 
>> Did someone already bring this up on LKML or wherever usbfs is
>> discussed? Should be fixable, I naively guess.
> 
> I submitted the patch to linux-usb and it was nack'ed. The response was
> that memory is allocated in powers of 2 so trying to up the limit from
> 16k to 20k means it will actually want to find 32k of contiguous memory.
> The suggestion was to handle it with multiple requests within qemu. I
> guess libusb does that.

Any reason we're not using libusb?


Alex





[Qemu-devel] Re: SeaBIOS error with Juniper FreeBSD kernel

2010-04-13 Thread Kevin O'Connor
On Tue, Apr 13, 2010 at 08:48:35PM +0200, Bjørn Mork wrote:
> Gleb Natapov  writes:
> > On Sun, Feb 28, 2010 at 02:39:04PM -0500, Kevin O'Connor wrote:
> >> On Sun, Feb 21, 2010 at 04:18:38PM -0700, Brandon Bennett wrote:
> >> > I have narrowed it down to SMBIOS.  If I disable CONFIG_SMBIOS the
> >> > image boots up fine.
> > If there is any seabios revision that works then it is possible to
> > bisect to find problematic commit.
> I'm also bitten by this and have now attempted to bisect it.  The result
> was: 
> 
> c95d2cee36db79c88253d43a90230ceadf0c26cf is first bad commit
> commit c95d2cee36db79c88253d43a90230ceadf0c26cf

It looks like memory layout changes in the f-segment is tickling the
underlying bug.  I don't think SMBIOS, the above commit, or the other
commit identified earlier are the root cause of the problem.  Instead,
I'd guess these commits just change the memory layout enough to avoid
the root cause.

This looks like it is going to require some careful study with a
debugger and some execution traces.  Unfortunately, since this image
isn't available for download it makes it difficult to track down.

-Kevin




[Qemu-devel] Re: [PULL] pci,vhost-net,eepro100

2010-04-13 Thread Anthony Liguori

On 04/11/2010 03:29 PM, Michael S. Tsirkin wrote:

The following changes since commit f7e2aca83419dde3c94fa1d5e615581bb4ded9c0:

   tcg/ppc: Fix typo (2010-04-06 03:10:03 +0400)

are available in the git repository at:
   git://git.kernel.org/pub/scm/linux/kernel/git/mst/qemu.git for_anthony
   


Pulled.  Thanks.

Regards,

Anthony Liguori


David L Stevens (1):
   vhost: fix features ack

Marcelo Tosatti (1):
   vhost.c: include  last

Michael S. Tsirkin (3):
   pci: add API to add capability at a known offset
   eepro100: convert to new capability API
   vhost-net: disable mergeable buffers

Stefan Weil (8):
   eepro100: Don't allow writing SCBStatus
   eepro100: Simplify status handling
   eepro100: Simplified device instantiation
   eepro100: Add new device variant i82801
   eepro100: Set configuration bit for standard TCB
   eepro100: Set power management capability using pci_reserve_capability
   eepro100: fix mapping of flash memory
   eepro100: fix PCI interrupt pin configuration regression

  hw/eepro100.c   |  534 --
  hw/pci.c|   17 ++-
  hw/pci.h|2 +
  hw/vhost.c  |2 +-
  hw/vhost_net.c  |1 +
  hw/virtio-net.c |8 +
  6 files changed, 223 insertions(+), 341 deletions(-)
   






Re: [Qemu-devel] Re: ehci update

2010-04-13 Thread David S. Ahern


On 04/13/2010 07:20 PM, Alexander Graf wrote:
 It still needs a lot of love, but definitely an improvement from the
 last version. The biggest difference for the performance boost and
 stability is discovering that the usbfs in linux limits transactions to
 16k versus the EHCI spec which allows 20k per qTD. I added a hack to
 submit which detects 20k requests from a guest and breaks it up into 2
 requests through the host (a 16k and then a 4k).
>>>
>>> Did someone already bring this up on LKML or wherever usbfs is
>>> discussed? Should be fixable, I naively guess.
>>
>> I submitted the patch to linux-usb and it was nack'ed. The response was
>> that memory is allocated in powers of 2 so trying to up the limit from
>> 16k to 20k means it will actually want to find 32k of contiguous memory.
>> The suggestion was to handle it with multiple requests within qemu. I
>> guess libusb does that.
> 
> Any reason we're not using libusb?

Good question. I was wondering the same. I was going to look at
converting usb-linux to use libusb1 when I get some time.

David

> 
> 
> Alex
> 




[Qemu-devel] [PATCH v5 00/17] virtio-serial fixes, new abi for port discovery, flow control

2010-04-13 Thread Amit Shah
Hello,

These patches rework the way ports are announced to the guests. A
control message is used to let the guest know a new port is
added. Initial port discovery and port hot-plug work via this way now.

This was done to have the host and guest port numbering in sync to
avoid surprises after several hotplug/unplug operations and
migrations.

The ability to assign a particular port number to ports is also added
so that management software can control the placement of ports.

This iteration fixes a few things Juan pointed out.

Juan and Gerd have already acked v4 of the patch series, from which it
remains largely unchanged.

Differences from v4:
- Juan: qemu_free() handles NULL fine

- Juan: Send ports_map as an array of u32s instead of as a buffer
  (big/little endian issues)

- Juan: iov should be compiled only once (nothing target-specific)

- A few fixes from the other submission: discard piled up buffers in
  the vq after port close or hot-unplug.

Overall:
- Users can set the port id they wish to instantiate ports at by using
  the ,nr= parameter to 'virtserialport' and 'virtconsole' devices

- Migration fixes: refuse migration when:
  - number of active ports is different between the src and destination
  - max_nr_ports a device can support on the src is more

- If a qemu chardev connection to a port is closed on the dest while
  it was open on the src, inform the guest about this. (Also do the
  same for port closed on src but open on dest.)

- Use control messages for relaying new port information instead of
  config space (changes abi)

- Propagate error message from guest in instantiating a port or a
  device to the user.

- Handle scatter/gather for control output and data output from the
  guest

- Fix abuse of virtio api in the virtqueue_push() function

- Add an API for the ports for flow control: ports can signal when
  they're ready to accept data / stop sending data.

Amit Shah (17):
  virtio-serial: save/load: Ensure target has enough ports
  virtio-serial: save/load: Ensure nr_ports on src and dest are same.
  virtio-serial: save/load: Ensure we have hot-plugged ports
instantiated
  virtio-serial: save/load: Send target host connection status if
different
  virtio-serial: Use control messages to notify guest of new ports
  virtio-serial: whitespace: match surrounding code
  virtio-serial: Remove redundant check for 0-sized write request
  virtio-serial: Update copyright year to 2010
  virtio-serial: Propagate errors in initialising ports / devices in
guest
  virtio-serial: Send out guest data to ports only if port is opened
  iov: Introduce a new file for helpers around iovs, add iov_from_buf()
  iov: Add iov_to_buf and iov_size helpers
  virtio-serial: Handle scatter-gather buffers for control messages
  virtio-serial: Handle scatter/gather input from the guest
  virtio-serial: Apps should consume all data that guest sends out /
Fix virtio api abuse
  virtio-serial: Discard data that guest sends us when ports aren't
connected
  virtio-serial: Implement flow control for individual ports

 Makefile   |2 +
 Makefile.objs  |1 +
 hw/iov.c   |   70 ++
 hw/iov.h   |   19 +++
 hw/virtio-balloon.c|   35 +-
 hw/virtio-console.c|   11 +-
 hw/virtio-net.c|   20 +---
 hw/virtio-serial-bus.c |  333 
 hw/virtio-serial.h |   34 --
 9 files changed, 377 insertions(+), 148 deletions(-)
 create mode 100644 hw/iov.c
 create mode 100644 hw/iov.h





[Qemu-devel] [PATCH v5 01/17] virtio-serial: save/load: Ensure target has enough ports

2010-04-13 Thread Amit Shah
The target could be started with max_nr_ports for a virtio-serial device
lesser than what was available on the source machine. Fail the migration
in such a case.

Signed-off-by: Amit Shah 
Reported-by: Juan Quintela 
---
 hw/virtio-serial-bus.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 17c1ec1..9a7f0c1 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -374,10 +374,13 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
 
 /* Items in struct VirtIOSerial */
 
+qemu_put_be32s(f, &s->bus->max_nr_ports);
+
 /* Do this because we might have hot-unplugged some ports */
 nr_active_ports = 0;
-QTAILQ_FOREACH(port, &s->ports, next)
+QTAILQ_FOREACH(port, &s->ports, next) {
 nr_active_ports++;
+}
 
 qemu_put_be32s(f, &nr_active_ports);
 
@@ -399,7 +402,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 {
 VirtIOSerial *s = opaque;
 VirtIOSerialPort *port;
-uint32_t nr_active_ports;
+uint32_t max_nr_ports, nr_active_ports;
 unsigned int i;
 
 if (version_id > 2) {
@@ -420,6 +423,12 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 
 /* Items in struct VirtIOSerial */
 
+qemu_get_be32s(f, &max_nr_ports);
+if (max_nr_ports > s->bus->max_nr_ports) {
+/* Source could have more ports than us. Fail migration. */
+return -EINVAL;
+}
+
 qemu_get_be32s(f, &nr_active_ports);
 
 /* Items in struct VirtIOSerialPort */
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 02/17] virtio-serial: save/load: Ensure nr_ports on src and dest are same.

2010-04-13 Thread Amit Shah
The number of ports on the source as well as the destination machines
should match. If they don't, it means some ports that got hotplugged on
the source aren't instantiated on the destination. Or that ports that
were hot-unplugged on the source are created on the destination.

Signed-off-by: Amit Shah 
Reported-by: Juan Quintela 
---
 hw/virtio-serial-bus.c |   18 --
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 9a7f0c1..d31e62d 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -402,7 +402,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 {
 VirtIOSerial *s = opaque;
 VirtIOSerialPort *port;
-uint32_t max_nr_ports, nr_active_ports;
+uint32_t max_nr_ports, nr_active_ports, nr_ports;
 unsigned int i;
 
 if (version_id > 2) {
@@ -419,7 +419,21 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 /* The config space */
 qemu_get_be16s(f, &s->config.cols);
 qemu_get_be16s(f, &s->config.rows);
-s->config.nr_ports = qemu_get_be32(f);
+nr_ports = qemu_get_be32(f);
+
+if (nr_ports != s->config.nr_ports) {
+/*
+ * Source hot-plugged/unplugged ports and we don't have all of
+ * them here.
+ *
+ * Note: This condition cannot check for all hotplug/unplug
+ * events: eg, if one port was hot-plugged and one was
+ * unplugged, the nr_ports remains the same but the port id's
+ * would have changed and we won't catch it here. A later
+ * check for !find_port_by_id() will confirm if this happened.
+ */
+return -EINVAL;
+}
 
 /* Items in struct VirtIOSerial */
 
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 06/17] virtio-serial: whitespace: match surrounding code

2010-04-13 Thread Amit Shah
The virtio-serial code doesn't mix declarations and definitions, so
separate them out on different lines.

Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index bb11a9b..8efba0b 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -354,7 +354,10 @@ static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
 
 static uint32_t get_features(VirtIODevice *vdev, uint32_t features)
 {
-VirtIOSerial *vser = DO_UPCAST(VirtIOSerial, vdev, vdev);
+VirtIOSerial *vser;
+
+vser = DO_UPCAST(VirtIOSerial, vdev, vdev);
+
 if (vser->bus->max_nr_ports > 1) {
 features |= (1 << VIRTIO_CONSOLE_F_MULTIPORT);
 }
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 03/17] virtio-serial: save/load: Ensure we have hot-plugged ports instantiated

2010-04-13 Thread Amit Shah
If some ports that were hot-plugged on the source are not available on
the destination, fail migration instead of trying to deref a NULL
pointer.

Signed-off-by: Amit Shah 
Reported-by: Juan Quintela 
---
 hw/virtio-serial-bus.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index d31e62d..5316ef6 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -451,6 +451,13 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 
 id = qemu_get_be32(f);
 port = find_port_by_id(s, id);
+if (!port) {
+/*
+ * The requested port was hot-plugged on the source but we
+ * don't have it
+ */
+return -EINVAL;
+}
 
 port->guest_connected = qemu_get_byte(f);
 }
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 07/17] virtio-serial: Remove redundant check for 0-sized write request

2010-04-13 Thread Amit Shah
The check for a 0-sized write request to a guest port is not necessary;
the while loop below won't be executed in this case and all will be
fine.

Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 8efba0b..c1d9851 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -91,9 +91,6 @@ static size_t write_to_port(VirtIOSerialPort *port,
 if (!virtio_queue_ready(vq)) {
 return 0;
 }
-if (!size) {
-return 0;
-}
 
 while (offset < size) {
 int i;
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 05/17] virtio-serial: Use control messages to notify guest of new ports

2010-04-13 Thread Amit Shah
Allow the port 'id's to be set by a user on the command line. This is
needed by management apps that will want a stable port numbering scheme
for hot-plug/unplug and migration.

Since the port numbers are shared with the guest (to identify ports in
control messages), we just send a control message to the guest
indicating addition of new ports (hot-plug) or notifying the guest of
the available ports when the guest sends us a DEVICE_READY control
message.

Signed-off-by: Amit Shah 
---
 hw/virtio-console.c|2 +
 hw/virtio-serial-bus.c |  184 +++
 hw/virtio-serial.h |   17 +++--
 3 files changed, 133 insertions(+), 70 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index bd44ec6..17b221d 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -99,6 +99,7 @@ static VirtIOSerialPortInfo virtconsole_info = {
 .exit  = virtconsole_exitfn,
 .qdev.props = (Property[]) {
 DEFINE_PROP_UINT8("is_console", VirtConsole, port.is_console, 1),
+DEFINE_PROP_UINT32("nr", VirtConsole, port.id, VIRTIO_CONSOLE_BAD_ID),
 DEFINE_PROP_CHR("chardev", VirtConsole, chr),
 DEFINE_PROP_STRING("name", VirtConsole, port.name),
 DEFINE_PROP_END_OF_LIST(),
@@ -133,6 +134,7 @@ static VirtIOSerialPortInfo virtserialport_info = {
 .init  = virtserialport_initfn,
 .exit  = virtconsole_exitfn,
 .qdev.props = (Property[]) {
+DEFINE_PROP_UINT32("nr", VirtConsole, port.id, VIRTIO_CONSOLE_BAD_ID),
 DEFINE_PROP_CHR("chardev", VirtConsole, chr),
 DEFINE_PROP_STRING("name", VirtConsole, port.name),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 484dc94..bb11a9b 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -41,6 +41,10 @@ struct VirtIOSerial {
 VirtIOSerialBus *bus;
 
 QTAILQ_HEAD(, VirtIOSerialPort) ports;
+
+/* bitmap for identifying active ports */
+uint32_t *ports_map;
+
 struct virtio_console_config config;
 };
 
@@ -48,6 +52,10 @@ static VirtIOSerialPort *find_port_by_id(VirtIOSerial *vser, 
uint32_t id)
 {
 VirtIOSerialPort *port;
 
+if (id == VIRTIO_CONSOLE_BAD_ID) {
+return NULL;
+}
+
 QTAILQ_FOREACH(port, &vser->ports, next) {
 if (port->id == id)
 return port;
@@ -208,14 +216,25 @@ static void handle_control_message(VirtIOSerial *vser, 
void *buf)
 size_t buffer_len;
 
 gcpkt = buf;
-port = find_port_by_id(vser, ldl_p(&gcpkt->id));
-if (!port)
-return;
 
 cpkt.event = lduw_p(&gcpkt->event);
 cpkt.value = lduw_p(&gcpkt->value);
 
+port = find_port_by_id(vser, ldl_p(&gcpkt->id));
+if (!port && cpkt.event != VIRTIO_CONSOLE_DEVICE_READY)
+return;
+
 switch(cpkt.event) {
+case VIRTIO_CONSOLE_DEVICE_READY:
+/*
+ * The device is up, we can now tell the device about all the
+ * ports we have here.
+ */
+QTAILQ_FOREACH(port, &vser->ports, next) {
+send_control_event(port, VIRTIO_CONSOLE_PORT_ADD, 1);
+}
+break;
+
 case VIRTIO_CONSOLE_PORT_READY:
 /*
  * Now that we know the guest asked for the port name, we're
@@ -363,6 +382,7 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
 VirtIOSerial *s = opaque;
 VirtIOSerialPort *port;
 uint32_t nr_active_ports;
+unsigned int i;
 
 /* The virtio device */
 virtio_save(&s->vdev, f);
@@ -370,13 +390,17 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
 /* The config space */
 qemu_put_be16s(f, &s->config.cols);
 qemu_put_be16s(f, &s->config.rows);
-qemu_put_be32s(f, &s->config.nr_ports);
 
-/* Items in struct VirtIOSerial */
+qemu_put_be32s(f, &s->config.max_nr_ports);
+
+/* The ports map */
+
+for (i = 0; i < (s->config.max_nr_ports + 31) / 32; i++) {
+qemu_put_be32s(f, &s->ports_map[i]);
+}
 
-qemu_put_be32s(f, &s->bus->max_nr_ports);
+/* Ports */
 
-/* Do this because we might have hot-unplugged some ports */
 nr_active_ports = 0;
 QTAILQ_FOREACH(port, &s->ports, next) {
 nr_active_ports++;
@@ -388,11 +412,6 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
  * Items in struct VirtIOSerialPort.
  */
 QTAILQ_FOREACH(port, &s->ports, next) {
-/*
- * We put the port number because we may not have an active
- * port at id 0 that's reserved for a console port, or in case
- * of ports that might have gotten unplugged
- */
 qemu_put_be32s(f, &port->id);
 qemu_put_byte(f, port->guest_connected);
 qemu_put_byte(f, port->host_connected);
@@ -403,7 +422,8 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 {
 VirtIOSerial *s = opaque;
 VirtIOSerialPort *port;
-uint32_t max_nr_ports, nr_active_ports, nr_ports;
+  

[Qemu-devel] [PATCH v5 08/17] virtio-serial: Update copyright year to 2010

2010-04-13 Thread Amit Shah
Signed-off-by: Amit Shah 
---
 hw/virtio-console.c|2 +-
 hw/virtio-serial-bus.c |2 +-
 hw/virtio-serial.h |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index 17b221d..6b8 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -1,7 +1,7 @@
 /*
  * Virtio Console and Generic Serial Port Devices
  *
- * Copyright Red Hat, Inc. 2009
+ * Copyright Red Hat, Inc. 2009, 2010
  *
  * Authors:
  *  Amit Shah 
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index c1d9851..c77ea4f 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -1,7 +1,7 @@
 /*
  * A bus for connecting virtio serial and console ports
  *
- * Copyright (C) 2009 Red Hat, Inc.
+ * Copyright (C) 2009, 2010 Red Hat, Inc.
  *
  * Author(s):
  *  Amit Shah 
diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h
index 0548689..f023873 100644
--- a/hw/virtio-serial.h
+++ b/hw/virtio-serial.h
@@ -2,7 +2,7 @@
  * Virtio Serial / Console Support
  *
  * Copyright IBM, Corp. 2008
- * Copyright Red Hat, Inc. 2009
+ * Copyright Red Hat, Inc. 2009, 2010
  *
  * Authors:
  *  Christian Ehrhardt 
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 09/17] virtio-serial: Propagate errors in initialising ports / devices in guest

2010-04-13 Thread Amit Shah
If adding of ports or devices in the guest fails we can send out a QMP
event so that management software can deal with it.

Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |   10 ++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index c77ea4f..3a09f0d 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -223,6 +223,11 @@ static void handle_control_message(VirtIOSerial *vser, 
void *buf)
 
 switch(cpkt.event) {
 case VIRTIO_CONSOLE_DEVICE_READY:
+if (!cpkt.value) {
+error_report("virtio-serial-bus: Guest failure in adding device 
%s\n",
+ vser->bus->qbus.name);
+break;
+}
 /*
  * The device is up, we can now tell the device about all the
  * ports we have here.
@@ -233,6 +238,11 @@ static void handle_control_message(VirtIOSerial *vser, 
void *buf)
 break;
 
 case VIRTIO_CONSOLE_PORT_READY:
+if (!cpkt.value) {
+error_report("virtio-serial-bus: Guest failure in adding port %u 
for device %s\n",
+ port->id, vser->bus->qbus.name);
+break;
+}
 /*
  * Now that we know the guest asked for the port name, we're
  * sure the guest has initialised whatever state is necessary
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 10/17] virtio-serial: Send out guest data to ports only if port is opened

2010-04-13 Thread Amit Shah
Data should be written only when ports are open.

Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 3a09f0d..6befd4d 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -335,6 +335,11 @@ static void handle_output(VirtIODevice *vdev, VirtQueue 
*vq)
 goto next_buf;
 }
 
+   if (!port->host_connected) {
+ret = 0;
+goto next_buf;
+}
+
 /*
  * A port may not have any handler registered for consuming the
  * data that the guest sends or it may not have a chardev associated
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 11/17] iov: Introduce a new file for helpers around iovs, add iov_from_buf()

2010-04-13 Thread Amit Shah
The virtio-net code uses iov_fill() which fills an iov from a linear
buffer. The virtio-serial-bus code does something similar in an
open-coded function.

Create a new iov.c file that has iov_from_buf().

Convert virtio-net and virtio-serial-bus over to use this functionality.
virtio-net used ints to hold sizes, the new function is going to use
size_t types.

Later commits will add the opposite functionality -- going from an iov
to a linear buffer.

Signed-off-by: Amit Shah 
---
 Makefile   |2 ++
 Makefile.objs  |1 +
 hw/iov.c   |   33 +
 hw/iov.h   |   16 
 hw/virtio-net.c|   20 +++-
 hw/virtio-serial-bus.c |   15 +++
 6 files changed, 62 insertions(+), 25 deletions(-)
 create mode 100644 hw/iov.c
 create mode 100644 hw/iov.h

diff --git a/Makefile b/Makefile
index a404fda..18e7368 100644
--- a/Makefile
+++ b/Makefile
@@ -124,6 +124,8 @@ curses.o: curses.c keymaps.h curses_keys.h
 
 bt-host.o: QEMU_CFLAGS += $(BLUEZ_CFLAGS)
 
+iov.o: iov.c iov.h
+
 ##
 
 qemu-img.o: qemu-img-cmds.h
diff --git a/Makefile.objs b/Makefile.objs
index ab1af88..fa4456c 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -100,6 +100,7 @@ common-obj-y += keymaps.o
 common-obj-$(CONFIG_SDL) += sdl.o sdl_zoom.o x_keymap.o
 common-obj-$(CONFIG_CURSES) += curses.o
 common-obj-y += vnc.o acl.o d3des.o
+common-obj-y += iov.o
 common-obj-$(CONFIG_VNC_TLS) += vnc-tls.o vnc-auth-vencrypt.o
 common-obj-$(CONFIG_VNC_SASL) += vnc-auth-sasl.o
 common-obj-$(CONFIG_COCOA) += cocoa.o
diff --git a/hw/iov.c b/hw/iov.c
new file mode 100644
index 000..07bd499
--- /dev/null
+++ b/hw/iov.c
@@ -0,0 +1,33 @@
+/*
+ * Helpers for getting linearized buffers from iov / filling buffers into iovs
+ *
+ * Copyright IBM, Corp. 2007, 2008
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * Author(s):
+ *  Anthony Liguori 
+ *  Amit Shah 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "iov.h"
+
+size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
+const void *buf, size_t size)
+{
+size_t offset;
+unsigned int i;
+
+offset = 0;
+for (i = 0; offset < size && i < iovcnt; i++) {
+size_t len;
+
+len = MIN(iov[i].iov_len, size - offset);
+
+memcpy(iov[i].iov_base, buf + offset, len);
+offset += len;
+}
+return offset;
+}
diff --git a/hw/iov.h b/hw/iov.h
new file mode 100644
index 000..5e3e541
--- /dev/null
+++ b/hw/iov.h
@@ -0,0 +1,16 @@
+/*
+ * Helpers for getting linearized buffers from iov / filling buffers into iovs
+ *
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * Author(s):
+ *  Amit Shah 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu-common.h"
+
+size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
+const void *buf, size_t size);
diff --git a/hw/virtio-net.c b/hw/virtio-net.c
index acb3cec..d602c56 100644
--- a/hw/virtio-net.c
+++ b/hw/virtio-net.c
@@ -11,6 +11,7 @@
  *
  */
 
+#include "iov.h"
 #include "virtio.h"
 #include "net.h"
 #include "net/checksum.h"
@@ -445,21 +446,6 @@ static void work_around_broken_dhclient(struct 
virtio_net_hdr *hdr,
 }
 }
 
-static int iov_fill(struct iovec *iov, int iovcnt, const void *buf, int count)
-{
-int offset, i;
-
-offset = i = 0;
-while (offset < count && i < iovcnt) {
-int len = MIN(iov[i].iov_len, count - offset);
-memcpy(iov[i].iov_base, buf + offset, len);
-offset += len;
-i++;
-}
-
-return offset;
-}
-
 static int receive_header(VirtIONet *n, struct iovec *iov, int iovcnt,
   const void *buf, size_t size, size_t hdr_len)
 {
@@ -595,8 +581,8 @@ static ssize_t virtio_net_receive(VLANClientState *nc, 
const uint8_t *buf, size_
 }
 
 /* copy in packet.  ugh */
-len = iov_fill(sg, elem.in_num,
-   buf + offset, size - offset);
+len = iov_from_buf(sg, elem.in_num,
+   buf + offset, size - offset);
 total += len;
 
 /* signal other side */
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 6befd4d..a72b6b5 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -15,6 +15,7 @@
  * the COPYING file in the top-level directory.
  */
 
+#include "iov.h"
 #include "monitor.h"
 #include "qemu-queue.h"
 #include "sysbus.h"
@@ -84,27 +85,25 @@ static size_t write_to_port(VirtIOSerialPort *port,
 {
 VirtQueueElement elem;
 VirtQueue *vq;
-size_t offset = 0;
-size_t len = 0;
+size_t offset;
 
 vq = port->ivq;
 if (!virtio_queue_ready(vq)) {
 return 0;
 }
 
+offset = 0;
 while (offset < 

[Qemu-devel] [PATCH v5 12/17] iov: Add iov_to_buf and iov_size helpers

2010-04-13 Thread Amit Shah
iov_to_buf() puts the buffer contents in the iov in a linearized buffer.

iov_size() gets the length of the contents in the iov.

The iov_to_buf() function is the memcpy_to_iovec() function that was
used in virtio-ballon.c.

Signed-off-by: Amit Shah 
---
 hw/iov.c|   37 +
 hw/iov.h|3 +++
 hw/virtio-balloon.c |   35 ---
 3 files changed, 44 insertions(+), 31 deletions(-)

diff --git a/hw/iov.c b/hw/iov.c
index 07bd499..588cd04 100644
--- a/hw/iov.c
+++ b/hw/iov.c
@@ -31,3 +31,40 @@ size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
 }
 return offset;
 }
+
+size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
+  void *buf, size_t offset, size_t size)
+{
+uint8_t *ptr;
+size_t iov_off, buf_off;
+unsigned int i;
+
+ptr = buf;
+iov_off = 0;
+buf_off = 0;
+for (i = 0; i < iovcnt && size; i++) {
+if (offset < (iov_off + iov[i].iov_len)) {
+size_t len = MIN((iov_off + iov[i].iov_len) - offset , size);
+
+memcpy(ptr + buf_off, iov[i].iov_base + (offset - iov_off), len);
+
+buf_off += len;
+offset += len;
+size -= len;
+}
+iov_off += iov[i].iov_len;
+}
+return buf_off;
+}
+
+size_t iov_size(const struct iovec *iov, const unsigned int iovcnt)
+{
+size_t len;
+unsigned int i;
+
+len = 0;
+for (i = 0; i < iovcnt; i++) {
+len += iov[i].iov_len;
+}
+return len;
+}
diff --git a/hw/iov.h b/hw/iov.h
index 5e3e541..60a8547 100644
--- a/hw/iov.h
+++ b/hw/iov.h
@@ -14,3 +14,6 @@
 
 size_t iov_from_buf(struct iovec *iov, unsigned int iovcnt,
 const void *buf, size_t size);
+size_t iov_to_buf(const struct iovec *iov, const unsigned int iovcnt,
+  void *buf, size_t offset, size_t size);
+size_t iov_size(const struct iovec *iov, const unsigned int iovcnt);
diff --git a/hw/virtio-balloon.c b/hw/virtio-balloon.c
index f55f7ec..152af80 100644
--- a/hw/virtio-balloon.c
+++ b/hw/virtio-balloon.c
@@ -11,6 +11,7 @@
  *
  */
 
+#include "iov.h"
 #include "qemu-common.h"
 #include "virtio.h"
 #include "pc.h"
@@ -92,33 +93,6 @@ static QObject *get_stats_qobject(VirtIOBalloon *dev)
 return QOBJECT(dict);
 }
 
-/* FIXME: once we do a virtio refactoring, this will get subsumed into common
- * code */
-static size_t memcpy_from_iovector(void *data, size_t offset, size_t size,
-   struct iovec *iov, int iovlen)
-{
-int i;
-uint8_t *ptr = data;
-size_t iov_off = 0;
-size_t data_off = 0;
-
-for (i = 0; i < iovlen && size; i++) {
-if (offset < (iov_off + iov[i].iov_len)) {
-size_t len = MIN((iov_off + iov[i].iov_len) - offset , size);
-
-memcpy(ptr + data_off, iov[i].iov_base + (offset - iov_off), len);
-
-data_off += len;
-offset += len;
-size -= len;
-}
-
-iov_off += iov[i].iov_len;
-}
-
-return data_off;
-}
-
 static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOBalloon *s = to_virtio_balloon(vdev);
@@ -128,8 +102,7 @@ static void virtio_balloon_handle_output(VirtIODevice 
*vdev, VirtQueue *vq)
 size_t offset = 0;
 uint32_t pfn;
 
-while (memcpy_from_iovector(&pfn, offset, 4,
-elem.out_sg, elem.out_num) == 4) {
+while (iov_to_buf(elem.out_sg, elem.out_num, &pfn, offset, 4) == 4) {
 ram_addr_t pa;
 ram_addr_t addr;
 
@@ -181,8 +154,8 @@ static void virtio_balloon_receive_stats(VirtIODevice 
*vdev, VirtQueue *vq)
  */
 reset_stats(s);
 
-while (memcpy_from_iovector(&stat, offset, sizeof(stat), elem->out_sg,
-elem->out_num) == sizeof(stat)) {
+while (iov_to_buf(elem->out_sg, elem->out_num, &stat, offset, sizeof(stat))
+   == sizeof(stat)) {
 uint16_t tag = tswap16(stat.tag);
 uint64_t val = tswap64(stat.val);
 
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 13/17] virtio-serial: Handle scatter-gather buffers for control messages

2010-04-13 Thread Amit Shah
Current control messages are small enough to not be split into multiple
buffers but we could run into such a situation in the future or a
malicious guest could cause such a situation.

So handle the entire iov request for control messages.

Also ensure the size of the control request is >= what we expect
otherwise we risk accessing memory that we don't own.

Signed-off-by: Amit Shah 
CC: Avi Kivity 
Reported-by: Avi Kivity 
---
 hw/virtio-serial-bus.c |   31 ---
 1 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index a72b6b5..b8410c3 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -204,7 +204,7 @@ size_t virtio_serial_guest_ready(VirtIOSerialPort *port)
 }
 
 /* Guest wants to notify us of some event */
-static void handle_control_message(VirtIOSerial *vser, void *buf)
+static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len)
 {
 struct VirtIOSerialPort *port;
 struct virtio_console_control cpkt, *gcpkt;
@@ -213,6 +213,11 @@ static void handle_control_message(VirtIOSerial *vser, 
void *buf)
 
 gcpkt = buf;
 
+if (len < sizeof(cpkt)) {
+/* The guest sent an invalid control packet */
+return;
+}
+
 cpkt.event = lduw_p(&gcpkt->event);
 cpkt.value = lduw_p(&gcpkt->value);
 
@@ -306,13 +311,33 @@ static void control_out(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtQueueElement elem;
 VirtIOSerial *vser;
+uint8_t *buf;
+size_t len;
 
 vser = DO_UPCAST(VirtIOSerial, vdev, vdev);
 
+len = 0;
+buf = NULL;
 while (virtqueue_pop(vq, &elem)) {
-handle_control_message(vser, elem.out_sg[0].iov_base);
-virtqueue_push(vq, &elem, elem.out_sg[0].iov_len);
+size_t cur_len, copied;
+
+cur_len = iov_size(elem.out_sg, elem.out_num);
+/*
+ * Allocate a new buf only if we didn't have one previously or
+ * if the size of the buf differs
+ */
+if (cur_len > len) {
+qemu_free(buf);
+
+buf = qemu_malloc(cur_len);
+len = cur_len;
+}
+copied = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, len);
+
+handle_control_message(vser, buf, copied);
+virtqueue_push(vq, &elem, copied);
 }
+qemu_free(buf);
 virtio_notify(vdev, vq);
 }
 
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 14/17] virtio-serial: Handle scatter/gather input from the guest

2010-04-13 Thread Amit Shah
Current guests don't send more than one iov but it can change later.
Ensure we handle that case.

Signed-off-by: Amit Shah 
CC: Avi Kivity 
---
 hw/virtio-serial-bus.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index b8410c3..3053a35 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -351,7 +351,8 @@ static void handle_output(VirtIODevice *vdev, VirtQueue *vq)
 
 while (virtqueue_pop(vq, &elem)) {
 VirtIOSerialPort *port;
-size_t ret;
+uint8_t *buf;
+size_t ret, buf_size;
 
 port = find_port_by_vq(vser, vq);
 if (!port) {
@@ -374,9 +375,12 @@ static void handle_output(VirtIODevice *vdev, VirtQueue 
*vq)
 goto next_buf;
 }
 
-/* The guest always sends only one sg */
-ret = port->info->have_data(port, elem.out_sg[0].iov_base,
-elem.out_sg[0].iov_len);
+buf_size = iov_size(elem.out_sg, elem.out_num);
+buf = qemu_malloc(buf_size);
+ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size);
+
+ret = port->info->have_data(port, buf, ret);
+qemu_free(buf);
 
 next_buf:
 virtqueue_push(vq, &elem, ret);
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 16/17] virtio-serial: Discard data that guest sends us when ports aren't connected

2010-04-13 Thread Amit Shah
Before the earlier patch, we relied on incorrect virtio api usage to
signal to the guest that a particular buffer wasn't consumed by the
host.

After fixing that, we now just discard the data the guest sends us while
a host port is disconnected or doesn't have a handler registered for
consuming data.

This commit really doesn't change anything from the current behaviour,
just makes the code slightly better by spinning off data handling to
ports in another function.

Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |   68 ++--
 1 files changed, 31 insertions(+), 37 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index ad44127..0166780 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -111,6 +111,29 @@ static size_t write_to_port(VirtIOSerialPort *port,
 return offset;
 }
 
+static void flush_queued_data(VirtIOSerialPort *port, bool discard)
+{
+VirtQueue *vq;
+VirtQueueElement elem;
+
+vq = port->ovq;
+while (virtqueue_pop(vq, &elem)) {
+uint8_t *buf;
+size_t ret, buf_size;
+
+if (!discard) {
+buf_size = iov_size(elem.out_sg, elem.out_num);
+buf = qemu_malloc(buf_size);
+ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size);
+
+port->info->have_data(port, buf, ret);
+qemu_free(buf);
+}
+virtqueue_push(vq, &elem, 0);
+}
+virtio_notify(&port->vser->vdev, vq);
+}
+
 static size_t send_control_msg(VirtIOSerialPort *port, void *buf, size_t len)
 {
 VirtQueueElement elem;
@@ -345,47 +368,18 @@ static void control_out(VirtIODevice *vdev, VirtQueue *vq)
 static void handle_output(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOSerial *vser;
-VirtQueueElement elem;
+VirtIOSerialPort *port;
+bool discard;
 
 vser = DO_UPCAST(VirtIOSerial, vdev, vdev);
+port = find_port_by_vq(vser, vq);
 
-while (virtqueue_pop(vq, &elem)) {
-VirtIOSerialPort *port;
-uint8_t *buf;
-size_t ret, buf_size;
-
-port = find_port_by_vq(vser, vq);
-if (!port) {
-ret = 0;
-goto next_buf;
-}
-
-   if (!port->host_connected) {
-ret = 0;
-goto next_buf;
-}
-
-/*
- * A port may not have any handler registered for consuming the
- * data that the guest sends or it may not have a chardev associated
- * with it. Just ignore the data in that case.
- */
-if (!port->info->have_data) {
-ret = 0;
-goto next_buf;
-}
-
-buf_size = iov_size(elem.out_sg, elem.out_num);
-buf = qemu_malloc(buf_size);
-ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size);
-
-port->info->have_data(port, buf, ret);
-qemu_free(buf);
-
-next_buf:
-virtqueue_push(vq, &elem, 0);
+discard = false;
+if (!port || !port->host_connected || !port->info->have_data) {
+discard = true;
 }
-virtio_notify(vdev, vq);
+
+flush_queued_data(port, discard);
 }
 
 static void handle_input(VirtIODevice *vdev, VirtQueue *vq)
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 17/17] virtio-serial: Implement flow control for individual ports

2010-04-13 Thread Amit Shah
Individual ports can now signal to the virtio-serial core to stop
sending data if the ports cannot immediately handle new data.  When a
port later unthrottles, any data queued up in the virtqueue are sent to
the port.

Disable throttling once a port is closed (and we discard all the
unconsumed buffers in the vq).

The guest kernel can reclaim the buffers when it receives the port close
event or when a port is being removed. Ensure we free up the buffers
before we send out any events to the guest.

Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |   37 ++---
 hw/virtio-serial.h |9 +
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 0166780..de64154 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -117,7 +117,7 @@ static void flush_queued_data(VirtIOSerialPort *port, bool 
discard)
 VirtQueueElement elem;
 
 vq = port->ovq;
-while (virtqueue_pop(vq, &elem)) {
+while ((discard || !port->throttled) && virtqueue_pop(vq, &elem)) {
 uint8_t *buf;
 size_t ret, buf_size;
 
@@ -185,6 +185,13 @@ int virtio_serial_open(VirtIOSerialPort *port)
 int virtio_serial_close(VirtIOSerialPort *port)
 {
 port->host_connected = false;
+/*
+ * If there's any data the guest sent which the app didn't
+ * consume, reset the throttling flag and discard the data.
+ */
+port->throttled = false;
+flush_queued_data(port, true);
+
 send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN, 0);
 
 return 0;
@@ -226,6 +233,20 @@ size_t virtio_serial_guest_ready(VirtIOSerialPort *port)
 return 0;
 }
 
+void virtio_serial_throttle_port(VirtIOSerialPort *port, bool throttle)
+{
+if (!port) {
+return;
+}
+
+port->throttled = throttle;
+if (throttle) {
+return;
+}
+
+flush_queued_data(port, false);
+}
+
 /* Guest wants to notify us of some event */
 static void handle_control_message(VirtIOSerial *vser, void *buf, size_t len)
 {
@@ -379,6 +400,10 @@ static void handle_output(VirtIODevice *vdev, VirtQueue 
*vq)
 discard = true;
 }
 
+if (!discard && port->throttled) {
+return;
+}
+
 flush_queued_data(port, discard);
 }
 
@@ -554,6 +579,8 @@ static void virtser_bus_dev_print(Monitor *mon, DeviceState 
*qdev, int indent)
indent, "", port->guest_connected);
 monitor_printf(mon, "%*s dev-prop-int: host_connected: %d\n",
indent, "", port->host_connected);
+monitor_printf(mon, "%*s dev-prop-int: throttled: %d\n",
+   indent, "", port->throttled);
 }
 
 /* This function is only used if a port id is not provided by the user */
@@ -591,13 +618,17 @@ static void add_port(VirtIOSerial *vser, uint32_t port_id)
 
 static void remove_port(VirtIOSerial *vser, uint32_t port_id)
 {
+VirtIOSerialPort *port;
 unsigned int i;
 
 i = port_id / 32;
 vser->ports_map[i] &= ~(1U << (port_id % 32));
 
-send_control_event(find_port_by_id(vser, port_id),
-   VIRTIO_CONSOLE_PORT_REMOVE, 1);
+port = find_port_by_id(vser, port_id);
+/* Flush out any unconsumed buffers first */
+flush_queued_data(port, true);
+
+send_control_event(port, VIRTIO_CONSOLE_PORT_REMOVE, 1);
 }
 
 static int virtser_port_qdev_init(DeviceState *qdev, DeviceInfo *base)
diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h
index 62d76a2..a93b545 100644
--- a/hw/virtio-serial.h
+++ b/hw/virtio-serial.h
@@ -110,6 +110,8 @@ struct VirtIOSerialPort {
 bool guest_connected;
 /* Is this device open for IO on the host? */
 bool host_connected;
+/* Do apps not want to receive data? */
+bool throttled;
 };
 
 struct VirtIOSerialPortInfo {
@@ -173,4 +175,11 @@ ssize_t virtio_serial_write(VirtIOSerialPort *port, const 
uint8_t *buf,
  */
 size_t virtio_serial_guest_ready(VirtIOSerialPort *port);
 
+/*
+ * Flow control: Ports can signal to the virtio-serial core to stop
+ * sending data or re-start sending data, depending on the 'throttle'
+ * value here.
+ */
+void virtio_serial_throttle_port(VirtIOSerialPort *port, bool throttle);
+
 #endif
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 15/17] virtio-serial: Apps should consume all data that guest sends out / Fix virtio api abuse

2010-04-13 Thread Amit Shah
We cannot indicate to the guest how much data was consumed by an app for
out_bufs.  So we just have to assume the apps will consume all the data
that are handed over to them.

Fix the virtio api abuse in control_out() and handle_output().

Signed-off-by: Amit Shah 
---
 hw/virtio-console.c|7 ++-
 hw/virtio-serial-bus.c |6 +++---
 hw/virtio-serial.h |6 +++---
 3 files changed, 8 insertions(+), 11 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index 6b8..caea11f 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -20,14 +20,11 @@ typedef struct VirtConsole {
 
 
 /* Callback function that's called when the guest sends us data */
-static size_t flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t len)
+static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t len)
 {
 VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
-ssize_t ret;
 
-ret = qemu_chr_write(vcon->chr, buf, len);
-
-return ret < 0 ? 0 : ret;
+qemu_chr_write(vcon->chr, buf, len);
 }
 
 /* Readiness of the guest to accept data on a port */
diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 3053a35..ad44127 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -335,7 +335,7 @@ static void control_out(VirtIODevice *vdev, VirtQueue *vq)
 copied = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, len);
 
 handle_control_message(vser, buf, copied);
-virtqueue_push(vq, &elem, copied);
+virtqueue_push(vq, &elem, 0);
 }
 qemu_free(buf);
 virtio_notify(vdev, vq);
@@ -379,11 +379,11 @@ static void handle_output(VirtIODevice *vdev, VirtQueue 
*vq)
 buf = qemu_malloc(buf_size);
 ret = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, buf_size);
 
-ret = port->info->have_data(port, buf, ret);
+port->info->have_data(port, buf, ret);
 qemu_free(buf);
 
 next_buf:
-virtqueue_push(vq, &elem, ret);
+virtqueue_push(vq, &elem, 0);
 }
 virtio_notify(vdev, vq);
 }
diff --git a/hw/virtio-serial.h b/hw/virtio-serial.h
index f023873..62d76a2 100644
--- a/hw/virtio-serial.h
+++ b/hw/virtio-serial.h
@@ -136,10 +136,10 @@ struct VirtIOSerialPortInfo {
 
 /*
  * Guest wrote some data to the port. This data is handed over to
- * the app via this callback. The app should return the number of
- * bytes it successfully consumed.
+ * the app via this callback.  The app is supposed to consume all
+ * the data that is presented to it.
  */
-size_t (*have_data)(VirtIOSerialPort *port, const uint8_t *buf, size_t 
len);
+void (*have_data)(VirtIOSerialPort *port, const uint8_t *buf, size_t len);
 };
 
 /* Interface to the virtio-serial bus */
-- 
1.6.2.5





[Qemu-devel] [PATCH v5 04/17] virtio-serial: save/load: Send target host connection status if different

2010-04-13 Thread Amit Shah
If the host connection to a port is closed on the destination machine
after migration, whereas the connection was open on the source, the
guest has to be informed of that.

Similar for a host connection open on the destination.

Signed-off-by: Amit Shah 
---
 hw/virtio-serial-bus.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
index 5316ef6..484dc94 100644
--- a/hw/virtio-serial-bus.c
+++ b/hw/virtio-serial-bus.c
@@ -395,6 +395,7 @@ static void virtio_serial_save(QEMUFile *f, void *opaque)
  */
 qemu_put_be32s(f, &port->id);
 qemu_put_byte(f, port->guest_connected);
+qemu_put_byte(f, port->host_connected);
 }
 }
 
@@ -448,6 +449,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 /* Items in struct VirtIOSerialPort */
 for (i = 0; i < nr_active_ports; i++) {
 uint32_t id;
+bool host_connected;
 
 id = qemu_get_be32(f);
 port = find_port_by_id(s, id);
@@ -460,6 +462,15 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, 
int version_id)
 }
 
 port->guest_connected = qemu_get_byte(f);
+host_connected = qemu_get_byte(f);
+if (host_connected != port->host_connected) {
+/*
+ * We have to let the guest know of the host connection
+ * status change
+ */
+send_control_event(port, VIRTIO_CONSOLE_PORT_OPEN,
+   port->host_connected);
+}
 }
 
 return 0;
-- 
1.6.2.5