date:20100921

[Qemu-devel] Re: [PATCH] Fix vhost_net compilation errors for i386-softmmu target

2010-09-21 Thread Michal Novotny


On 09/20/2010 07:53 PM, Michael S. Tsirkin wrote:

On Mon, Sep 20, 2010 at 11:36:58AM +0200, Michal Novotny wrote:
   

Hi,
there were compilation errors when I was trying to compile
i386-softmmu target on i386
host (running on Fedora-13 with development version of qemu
downloaded from git).

The build failed with a "comparison of unsigned expression >= 0 is
always true" error, because warnings are treated as errors.
This simple patch fixes the issue.

...
cc1: warnings being treated as errors
.../hw/vhost_net.c: In function ‘vhost_net_start’:
.../vhost_net.c:154: error: comparison of unsigned expression >= 0 is always true
make[1]: *** [vhost_net.o] Error 1
make: *** [subdir-i386-softmmu] Error 2

Signed-off-by: Michal Novotny

--
Michal Novotny, RHCE
Virtualization Team (xen userspace), Red Hat

 

This is not the right fix though. I have queued
the correct one on my tree, will send pull request.

   


Oh, OK. Since file.index is defined as unsigned, it can never be
negative, which is why I changed the check from "greater than or equal
to zero" to "greater than zero". Maybe that's not the right way to fix
it, as you say. Since your patch is already queued, that's fine.
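
To illustrate the point about unsigned indexes, here is a minimal sketch. The function names and the `limit` parameter are assumptions made for illustration and are not taken from vhost_net.c. It shows why a ">= 0" test cannot reject anything once the index is unsigned, and one possible alternative (checking against an explicit bound), which is not necessarily the fix that was queued.

```c
#include <limits.h>
#include <stdbool.h>

/* With a signed index, a negative value can serve as an "invalid" marker. */
bool signed_index_valid(int index)
{
    return index >= 0;
}

/*
 * With an unsigned index, -1 converts to UINT_MAX, so a ">= 0" test would
 * accept every value (hence the compiler diagnostic). This hypothetical
 * helper checks against an explicit upper bound instead.
 */
bool unsigned_index_valid(unsigned int index, unsigned int limit)
{
    return index < limit;
}
```

Here `unsigned_index_valid((unsigned int)-1, 8)` is false, because the converted value is UINT_MAX rather than a negative number.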


Michal

--
Michal Novotny, RHCE
Virtualization Team (xen userspace), Red Hat

[Qemu-devel] Re: [PATCH] Make NIC model fallback to default when specified model is not supported

2010-09-21 Thread Michal Novotny


On 09/20/2010 07:58 PM, Michael S. Tsirkin wrote:

On Mon, Sep 20, 2010 at 11:47:59AM +0200, Michal Novotny wrote:
   

Hi,

this is the patch to introduce a NIC model fallback to default when model
specified is not supported. It's been tested on i386-softmmu target on
i386 host using the Windows XP x86 virtual machine and by trying to setup
the invalid (unsupported) model of NIC device. Also, the new constant in
the net.h called the DEFAULT_NIC_MODEL has been introduced to be able to
change the default NIC model easily. This variable is being used to set
the default NIC model when necessary.
 

Why is this a good idea? This will create problems for anyone
doing migration, etc.
   


Why do you think it would introduce issues when doing migrations?
Imagine one version (v1) of qemu supports a NIC model called e.g.
"model", but a newer version (v2) no longer supports it, and you
migrate a guest using the "model" NIC from v1 to v2. What happens when
you do this now? Will the migration fail, leaving the guest working fine
on v1 but not migrated to v2? And what would my patch do for migrations?
Would the migration pass, with the NIC type changed to the default?


   

Also, some bits per mips_jazz were added but usage of some constant for
MIPS is not necessary since there is only one NIC model supported there.

Michal

Signed-off-by: Michal Novotny
 

I think adding NIC_DEFAULT_MODEL macro in net.h is problematic exactly
because each platform has its own.
It belongs in per-platform .c file I think.
   


That's right. Implementing this in the per-platform .c file could be a
better idea. However, I saw the "Unsupported NIC" message only in
mips_jazz.c and net.c, so maybe I got confused. The NIC_DEFAULT_MODEL
definition would make it possible to change the default NIC model in the
future without digging into the code so much, so I thought it could be
useful.
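
The fallback idea discussed here can be sketched as follows. This is a minimal illustration and not the code from the patch: the function names are hypothetical, and the default model is passed in by the caller, which would let each platform file supply its own default as suggested above.

```c
#include <stddef.h>
#include <string.h>

/* Return the index of `name` in a NULL-terminated list, or -1 if absent. */
static int find_model(const char *const *models, const char *name)
{
    if (name == NULL) {
        return -1;
    }
    for (int i = 0; models[i] != NULL; i++) {
        if (strcmp(models[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

/*
 * Hypothetical helper: look up the requested model and fall back to the
 * platform's default when the request is missing or unsupported.
 * Returns -1 only if the default itself is not in the list.
 */
int nic_model_index_or_default(const char *const *models,
                               const char *requested,
                               const char *default_model)
{
    int idx = find_model(models, requested);
    if (idx >= 0) {
        return idx;
    }
    return find_model(models, default_model);
}
```

Whether a silent fallback is desirable is exactly what the migration question above is about; a caller could instead log a warning or refuse to start when the fallback path is taken.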


Michal

--
Michal Novotny, RHCE
Virtualization Team (xen userspace), Red Hat

[Qemu-devel] Re: Win2k host problem with {get, free}{addr, name}info()

2010-09-21 Thread Paolo Bonzini


Does gnulib have a similar replacement function?


Very similar, in fact that must be the source.


The nice thing about gnulib is that in the long term, we could potentially
use gnulib for compatibility and make sure to get updated code.


One problem is that the current versions use GPLv3.


Sorry, I made too hasty conclusions based on a few files.
getaddrinfo.c and inet_ntop.c are both GPLv2+.


gnulib has a mix of various licenses.  People are usually not too picky 
about relicensing GPLv3 stuff to GPLv2+ and LGPLv3 stuff to either 
LGPLv2+, or dual GPLv2/LGPLv3.


However, using gnulib may require autoconfiscation of qemu.

Paolo

[Qemu-devel] Re: Caching modes

2010-09-21 Thread Kevin Wolf

Am 21.09.2010 02:18, schrieb Anthony Liguori:
> On 09/20/2010 06:17 PM, Christoph Hellwig wrote:
>> On Mon, Sep 20, 2010 at 03:11:31PM -0500, Anthony Liguori wrote:
>>
> All read and write requests SHOULD avoid any type of caching in the
> host.  Any write request MUST complete after the next level of storage
> reports that the write request has completed.  A flush from the guest
> MUST complete after all pending I/O requests for the guest have been
> completed.
>
> As an implementation detail, with the raw format, these guarantees are
> only in place for preallocated images.  Sparse images do not provide as
> strong of a guarantee.
>
>  
 That's not how cache=none ever worked nor works currently.


>>> How does it work today compared to what I wrote above?
>>>  
>> From the guest's point of view it works exactly as you describe
>> cache=writeback.  There are no ordering or cache flushing guarantees.  By
>> using O_DIRECT we do bypass the host file cache, but we don't even try
>> on the others (disk cache, committing metadata transactions that are
>> required to actually see the committed data for sparse, preallocated or
>> growing images).
>>
> 
> O_DIRECT alone to a pre-allocated file on a normal file system should 
> result in the data being visible without any additional metadata 
> transactions.
> 
> The only time when that isn't true is when dealing with CoW or other 
> special filesystem features.

I think preallocated files are the exception, usually people use sparse
files. And even with preallocation, the disk cache is still left.

>> What you describe above is the equivalent of O_DSYNC|O_DIRECT which
>> doesn't exist in current qemu, except that O_DSYNC|O_DIRECT also
>> guarantees the semantics for sparse images.  Sparse images really aren't
>> special in any way - preallocation using posix_fallocate or COW
>> filesystems like btrfs, nilfs2 or zfs have exactly the same issues.
>>
>>
                        | WC enable | WC disable
 -----------------------+-----------+------------
  direct                |           |
  buffer                |           |
  buffer + ignore flush |           |

 currently we only have:

   cache=none           direct + WC enable
   cache=writeback      buffer + WC enable
   cache=writethrough   buffer + WC disable
   cache=unsafe         buffer + ignore flush + WC enable


>>> Where does O_DSYNC fit into this chart?
>>>  
>> O_DSYNC is used for all WC disable modes.
>>
>>
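
The mapping quoted above can be written down as a small lookup table. This is only a sketch of the thread's description: the struct, field, and function names are assumptions for illustration and do not correspond to qemu's block layer code. It encodes the four existing modes and the statement that O_DSYNC is used for all "WC disable" modes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one cache= mode, following the chart. */
struct cache_mode_sketch {
    const char *name;
    bool direct;        /* bypass the host page cache (O_DIRECT) */
    bool wc_enabled;    /* guest is told a volatile write cache exists */
    bool ignore_flush;  /* guest flush requests are dropped */
};

static const struct cache_mode_sketch cache_modes[] = {
    { "none",         true,  true,  false },
    { "writeback",    false, true,  false },
    { "writethrough", false, false, false },
    { "unsafe",       false, true,  true  },
};

/* Return the entry for `name`, or NULL if the mode is unknown. */
const struct cache_mode_sketch *cache_mode_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(cache_modes) / sizeof(cache_modes[0]); i++) {
        if (strcmp(cache_modes[i].name, name) == 0) {
            return &cache_modes[i];
        }
    }
    return NULL;
}

/* Per the thread, O_DSYNC is used for all "WC disable" modes. */
bool cache_mode_needs_dsync(const struct cache_mode_sketch *mode)
{
    return !mode->wc_enabled;
}
```

The "direct + WC disable" cell of the chart has no entry here, which matches the remark that O_DSYNC|O_DIRECT does not exist in current qemu.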
>>> Do all modern filesystems implement O_DSYNC without generating
>>> additional barriers per request?
>>>
>>> Having a barrier per-write request is ultimately not the right semantic
>>> for any of the modes.  However, without the use of O_DSYNC (or
>>> sync_file_range(), which I know you dislike), I don't see how we can
>>> have reasonable semantics without always implementing write back caching
>>> in the host.
>>>  
>> Barriers are a Linux-specific implementation detail that is in the
>> process of going away, probably in Linux 2.6.37.  But if you want
>> O_DSYNC semantics with a volatile disk write cache there is no way
>> around using a cache flush or the FUA bit on all I/O caused by it.
> 
> If you have a volatile disk write cache, then we don't need O_DSYNC 
> semantics.

What do the semantics of a qemu option have to do with the host disk
write cache? We always need to provide the same semantics. If anything,
we can take advantage of a host providing write-through/no caches so
that we don't have to issue the flushes ourselves.

>> We
>> currently use the cache flush, and although I plan to experiment a bit
>> more with the FUA bit for O_DIRECT | O_DSYNC writes I would be very
>> surprised if they actually are any faster.
>>
> 
> The thing I struggle with understanding is that if the guest is sending 
> us a write request, why are we sending the underlying disk a write + 
> flush request?  That doesn't seem logical at all to me.
> 
> Even if we advertise WC disable, it should be up to the guest to decide 
> when to issue flushes.

Why should a guest ever flush a cache when it's told that this cache
doesn't exist?

Kevin

[Qemu-devel] [PATCH 1/2] qemu-virto9p: Implement TLOCK

2010-09-21 Thread M. Mohan Kumar

Synopsis

size[4] TLock tag[2] fid[4] flock[n]
size[4] RLock tag[2] status[1]

Description

Tlock is used to acquire/release byte range posix locks on a file
identified by given fid. The reply contains status of the lock request

flock structure:
type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK
flags[4] - Flags could be either of
  P9_LOCK_FLAGS_BLOCK(1) - Blocking lock request: if a conflicting lock
    exists, wait for that lock to be released.
  P9_LOCK_FLAGS_RECLAIM(2) - Reclaim lock request, used when the client is
    trying to reclaim a lock after a server restart (due to a crash)
start[8] - Starting offset for lock
length[8] - Number of bytes to lock
  If length is 0, lock all bytes starting at the location 'start'
  through to the end of file
pid[4] - PID of the process that wants to take lock
client_id[4] - Unique client id

status[1] - Status of the lock request, can be
  P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or
  P9_LOCK_GRACE(3)
  P9_LOCK_SUCCESS - Request was successful
  P9_LOCK_BLOCKED - A conflicting lock is held by another process
  P9_LOCK_ERROR - Error while processing the lock request
  P9_LOCK_GRACE - Server is in grace period, it can't accept new lock
    requests in this period (except locks with the
    P9_LOCK_FLAGS_RECLAIM flag set)
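
As an illustration of the layout described above, the following sketch packs the flock fields into a byte buffer in the order given, using little-endian integers as 9P does on the wire. The struct and function names are hypothetical. It follows the description literally, with client_id as a 4-byte integer, whereas the patch below unmarshals client_id with the string format ("s"), so treat the exact encoding of that field as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag and status values as listed in the description above. */
enum { P9_LOCK_FLAGS_BLOCK = 1, P9_LOCK_FLAGS_RECLAIM = 2 };
enum { P9_LOCK_SUCCESS = 0, P9_LOCK_BLOCKED = 1,
       P9_LOCK_ERROR = 2, P9_LOCK_GRACE = 3 };

/* Hypothetical in-memory form of the flock structure. */
struct p9_flock_sketch {
    uint8_t  type;       /* type[1] */
    uint32_t flags;      /* flags[4] */
    uint64_t start;      /* start[8] */
    uint64_t length;     /* length[8]; 0 means "to end of file" */
    uint32_t pid;        /* pid[4] */
    uint32_t client_id;  /* client_id[4], per the description */
};

/* 1 + 4 + 8 + 8 + 4 + 4 bytes */
#define P9_FLOCK_SKETCH_SIZE 29

/* Store the low `nbytes` bytes of `v` in little-endian order. */
static size_t put_le(uint8_t *out, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++) {
        out[i] = (uint8_t)(v >> (8 * i));
    }
    return nbytes;
}

/* Encode `f` into `out`, which must hold P9_FLOCK_SKETCH_SIZE bytes. */
size_t p9_flock_encode(const struct p9_flock_sketch *f, uint8_t *out)
{
    size_t off = 0;
    off += put_le(out + off, f->type, 1);
    off += put_le(out + off, f->flags, 4);
    off += put_le(out + off, f->start, 8);
    off += put_le(out + off, f->length, 8);
    off += put_le(out + off, f->pid, 4);
    off += put_le(out + off, f->client_id, 4);
    return off;
}
```

In a full TLock message this payload would follow the size[4], type, tag[2] and fid[4] fields shown in the synopsis.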

Signed-off-by: M. Mohan Kumar 
Signed-off-by: Aneesh Kumar K.V 
---
 hw/virtio-9p-debug.c |   14 ++
 hw/virtio-9p.c   |   50 ++
 hw/virtio-9p.h   |   30 ++
 3 files changed, 94 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-9p-debug.c b/hw/virtio-9p-debug.c
index 6f6a0ec..045774f 100644
--- a/hw/virtio-9p-debug.c
+++ b/hw/virtio-9p-debug.c
@@ -588,6 +588,20 @@ void pprint_pdu(V9fsPDU *pdu)
 case P9_RXATTRCREATE:
 fprintf(llogfile, "RXATTRCREATE: (");
 break;
+case P9_TLOCK:
+fprintf(llogfile, "TLOCK: (");
+pprint_int32(pdu, 0, &offset, "fid");
+pprint_int8(pdu, 0, &offset, ", type");
+pprint_int32(pdu, 0, &offset, ", flags");
+pprint_int64(pdu, 0, &offset, ", start");
+pprint_int64(pdu, 0, &offset, ", length");
+pprint_int32(pdu, 0, &offset, ", proc_id");
+pprint_str(pdu, 0, &offset, ", client_id");
+break;
+case P9_RLOCK:
+fprintf(llogfile, "RLOCK: (");
+pprint_int8(pdu, 0, &offset, "status");
+break;
 default:
 fprintf(llogfile, "unknown(%d): (", pdu->id);
 break;
diff --git a/hw/virtio-9p.c b/hw/virtio-9p.c
index fd2147e..606604b 100644
--- a/hw/virtio-9p.c
+++ b/hw/virtio-9p.c
@@ -3152,6 +3152,55 @@ out:
 qemu_free(vs);
 }
 
+/*
+ * Implement posix byte range locking code
+ * Server side handling of locking code is very simple, because 9p server in QEMU
+ * can handle only one client.  And most of the lock handling (like conflict,
+ * merging) etc is done by the VFS layer itself, so no need to do any thing in
+ * qemu 9p server side lock code path.
+ * So when a TLOCK request comes, always return success
+ */
+
+static void v9fs_lock(V9fsState *s, V9fsPDU *pdu)
+{
+int32_t fid, err = 0;
+V9fsLockState *vs;
+
+vs = qemu_mallocz(sizeof(*vs));
+vs->pdu = pdu;
+vs->offset = 7;
+
+vs->flock = qemu_malloc(sizeof(*vs->flock));
+pdu_unmarshal(vs->pdu, vs->offset, "dbdqqds", &fid, &vs->flock->type,
+&vs->flock->flags, &vs->flock->start, &vs->flock->length,
+&vs->flock->proc_id, &vs->flock->client_id);
+
+vs->status = P9_LOCK_ERROR;
+
+/* We support only block flag now (that too ignored currently) */
+if (vs->flock->flags & ~P9_LOCK_FLAGS_BLOCK) {
+err = -EINVAL;
+goto out;
+}
+vs->fidp = lookup_fid(s, fid);
+if (vs->fidp == NULL) {
+err = -ENOENT;
+goto out;
+}
+
+err = v9fs_do_fstat(s, vs->fidp->fs.fd, &vs->stbuf);
+if (err < 0) {
+err = -errno;
+goto out;
+}
+vs->status = P9_LOCK_SUCCESS;
+out:
+vs->offset += pdu_marshal(vs->pdu, vs->offset, "b", vs->status);
+complete_pdu(s, vs->pdu, err);
+qemu_free(vs->flock);
+qemu_free(vs);
+}
+
 static void v9fs_mkdir_post_lstat(V9fsState *s, V9fsMkState *vs, int err)
 {
 if (err == -1) {
@@ -3414,6 +3463,7 @@ static pdu_handler_t *pdu_handlers[] = {
 [P9_TXATTRCREATE] = v9fs_xattrcreate,
 [P9_TMKNOD] = v9fs_mknod,
 [P9_TRENAME] = v9fs_rename,
+[P9_TLOCK] = v9fs_lock,
 [P9_TMKDIR] = v9fs_mkdir,
 [P9_TVERSION] = v9fs_version,
 [P9_TLOPEN] = v9fs_open,
diff --git a/hw/virtio-9p.h b/hw/virtio-9p.h
index 0816ad6..4555f39 100644
--- a/hw/virtio-9p.h
+++ b/hw/virtio-9p.h
@@ -37,6 +37,8 @@ enum {
 P9_RXATTRCREATE,
 P9_TREADDIR = 40,
 P9_RREADDIR,
+

[Qemu-devel] [PATCH 2/2] qemu-virtio9p: Implement TGETLOCK

2010-09-21 Thread M. Mohan Kumar

Synopsis

size[4] TGetlock tag[2] fid[4] getlock[n]
size[4] RGetlock tag[2] getlock[n]

Description

TGetlock is used to test for the existence of byte range posix locks on
a file identified by given fid. The reply contains getlock structure. If
the lock could be placed it returns F_UNLCK in type field of getlock structure.
Otherwise it returns the details of the conflicting locks in the getlock
structure

getlock structure:
  type[1] - Type of lock: F_RDLCK, F_WRLCK
  start[8] - Starting offset for lock
  length[8] - Number of bytes to lock
If length is 0, lock all bytes starting at the location
'start' through to the end of file
  proc_id[4] - process id that wants to take lock/owns the task
   in case of reply
  client[4] - Client id of the system that owns the process
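
The semantics described above (report F_UNLCK if the lock could be placed, otherwise describe a conflicting lock) can be sketched as below. This illustrates the protocol description only, under assumed names; the handler in this patch does not track locks and always answers F_UNLCK, leaving conflict handling to the client's VFS layer. The sketch treats a length of 0 as "to end of file" and, as with POSIX record locks, lets a process never conflict with itself.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of the getlock structure (client id omitted). */
struct getlock_sketch {
    short    type;     /* F_RDLCK, F_WRLCK, or F_UNLCK in a reply */
    uint64_t start;
    uint64_t length;   /* 0 means "through to the end of file" */
    uint32_t proc_id;
};

/* Exclusive end of a range; length 0 (or overflow) extends to the maximum. */
static uint64_t range_end(const struct getlock_sketch *l)
{
    if (l->length == 0 || l->start > UINT64_MAX - l->length) {
        return UINT64_MAX;
    }
    return l->start + l->length;
}

static int locks_conflict(const struct getlock_sketch *a,
                          const struct getlock_sketch *b)
{
    if (a->proc_id == b->proc_id) {
        return 0;                       /* same owner never conflicts */
    }
    if (a->type != F_WRLCK && b->type != F_WRLCK) {
        return 0;                       /* two read locks can coexist */
    }
    return a->start < range_end(b) && b->start < range_end(a);
}

/*
 * Overwrite `req` with the first conflicting held lock, or set its type to
 * F_UNLCK when the requested lock could be placed.
 */
void getlock_query(const struct getlock_sketch *held, size_t n,
                   struct getlock_sketch *req)
{
    for (size_t i = 0; i < n; i++) {
        if (locks_conflict(&held[i], req)) {
            *req = held[i];
            return;
        }
    }
    req->type = F_UNLCK;
}
```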

Signed-off-by: M. Mohan Kumar 
Signed-off-by: Aneesh Kumar K.V 
---
 hw/virtio-9p-debug.c |   17 +
 hw/virtio-9p.c   |   41 +
 hw/virtio-9p.h   |   21 +
 3 files changed, 79 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-9p-debug.c b/hw/virtio-9p-debug.c
index 045774f..85c9900 100644
--- a/hw/virtio-9p-debug.c
+++ b/hw/virtio-9p-debug.c
@@ -602,6 +602,23 @@ void pprint_pdu(V9fsPDU *pdu)
 fprintf(llogfile, "RLOCK: (");
 pprint_int8(pdu, 0, &offset, "status");
 break;
+case P9_TGETLOCK:
+fprintf(llogfile, "TGETLOCK: (");
+pprint_int32(pdu, 0, &offset, "fid");
+pprint_int8(pdu, 0, &offset, ", type");
+pprint_int64(pdu, 0, &offset, ", start");
+pprint_int64(pdu, 0, &offset, ", length");
+pprint_int32(pdu, 0, &offset, ", proc_id");
+pprint_str(pdu, 0, &offset, ", client_id");
+break;
+case P9_RGETLOCK:
+fprintf(llogfile, "RGETLOCK: (");
+pprint_int8(pdu, 0, &offset, "type");
+pprint_int64(pdu, 0, &offset, ", start");
+pprint_int64(pdu, 0, &offset, ", length");
+pprint_int32(pdu, 0, &offset, ", proc_id");
+pprint_str(pdu, 0, &offset, ", client_id");
+break;
 default:
 fprintf(llogfile, "unknown(%d): (", pdu->id);
 break;
diff --git a/hw/virtio-9p.c b/hw/virtio-9p.c
index 606604b..33718ea 100644
--- a/hw/virtio-9p.c
+++ b/hw/virtio-9p.c
@@ -3201,6 +3201,46 @@ out:
 qemu_free(vs);
 }
 
+/*
+ * When a TGETLOCK request comes, always return success because all lock
+ * handling is done by client's VFS layer.
+ */
+
+static void v9fs_getlock(V9fsState *s, V9fsPDU *pdu)
+{
+int32_t fid, err = 0;
+V9fsGetlockState *vs;
+
+vs = qemu_mallocz(sizeof(*vs));
+vs->pdu = pdu;
+vs->offset = 7;
+
+vs->glock = qemu_malloc(sizeof(*vs->glock));
+pdu_unmarshal(vs->pdu, vs->offset, "dbqqds", &fid, &vs->glock->type,
+&vs->glock->start, &vs->glock->length, &vs->glock->proc_id,
+   &vs->glock->client_id);
+
+vs->fidp = lookup_fid(s, fid);
+if (vs->fidp == NULL) {
+err = -ENOENT;
+goto out;
+}
+
+err = v9fs_do_fstat(s, vs->fidp->fs.fd, &vs->stbuf);
+if (err < 0) {
+err = -errno;
+goto out;
+}
+vs->glock->type = F_UNLCK;
+vs->offset += pdu_marshal(vs->pdu, vs->offset, "bqqds", vs->glock->type,
+vs->glock->start, vs->glock->length, vs->glock->proc_id,
+   &vs->glock->client_id);
+out:
+complete_pdu(s, vs->pdu, err);
+qemu_free(vs->glock);
+qemu_free(vs);
+}
+
 static void v9fs_mkdir_post_lstat(V9fsState *s, V9fsMkState *vs, int err)
 {
 if (err == -1) {
@@ -3464,6 +3504,7 @@ static pdu_handler_t *pdu_handlers[] = {
 [P9_TMKNOD] = v9fs_mknod,
 [P9_TRENAME] = v9fs_rename,
 [P9_TLOCK] = v9fs_lock,
+[P9_TGETLOCK] = v9fs_getlock,
 [P9_TMKDIR] = v9fs_mkdir,
 [P9_TVERSION] = v9fs_version,
 [P9_TLOPEN] = v9fs_open,
diff --git a/hw/virtio-9p.h b/hw/virtio-9p.h
index 4555f39..6a68895 100644
--- a/hw/virtio-9p.h
+++ b/hw/virtio-9p.h
@@ -39,6 +39,8 @@ enum {
 P9_RREADDIR,
 P9_TLOCK = 52,
 P9_RLOCK,
+P9_TGETLOCK = 54,
+P9_RGETLOCK,
 P9_TLINK = 70,
 P9_RLINK,
 P9_TMKDIR = 72,
@@ -464,6 +466,25 @@ typedef struct V9fsLockState
 V9fsFlock *flock;
 } V9fsLockState;
 
+typedef struct V9fsGetlock
+{
+uint8_t type;
+uint64_t start; /* absolute offset */
+uint64_t length;
+uint32_t proc_id;
+V9fsString client_id;
+} V9fsGetlock;
+
+typedef struct V9fsGetlockState
+{
+V9fsPDU *pdu;
+size_t offset;
+struct stat stbuf;
+V9fsFidState *fidp;
+V9fsGetlock *glock;
+} V9fsGetlockState;
+
+
 extern size_t pdu_packunpack(void *addr, struct iovec *sg, int sg_count,
 size_t offset, size_t size, int pack);
 
-- 
1.7.0.4

Fwd: [Qemu-devel]ask for help-register allocation algorithms

2010-09-21 Thread ustc ustc

-- Forwarded message --
From: ly 
Date: September 20, 2010, 10:22 AM
Subject: [Qemu-devel]ask for help
To: qemu-devel@nongnu.org


I want to modify the register allocation algorithm in qemu (x86->mips).
Specifically, I would like to change how the guest CPU registers are mapped
on the host, for example mapping %eax directly to s7, to reduce memory
accesses.
I am a beginner with qemu. How should I start this work? Thank you.

[Qemu-devel] [Bug 638955] Re: emulated netcards don't work with recent sunos kernel

2010-09-21 Thread daniel pecka

http://patchwork.ozlabs.org/patch/65137/raw/

well, this *fixed the issue .. it's very good that we (sunos guys) can now
use the best virt platform (kvm - IMO) ..

regards and thanks folks
ave, daniel

-- 
emulated netcards don't work with recent sunos kernel
https://bugs.launchpad.net/bugs/638955
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.

Status in QEMU: New

Bug description:
hi there,

i'm using qemu-kvm backend in version: # qemu-kvm -version
QEMU PC emulator version 0.12.5 (qemu-kvm-0.12.5), Copyright (c) 2003-2008 
Fabrice Bellard

and none of the model=$type options work in combination with recent
sunos (solaris, openindiana, opensolaris, ..) ..

you can download an iso for testing purposes from here:
http://dlc-origin.openindiana.org/isos/147/ or from here:
http://genunix.org/distributions/indiana/ << osol and oi are also ubuntu-like
*live cds, so no need to bother with installing

behaviour is as follows:
e1000 - receiving doesn't work, transmitting works .. dladm (tool for handling
ethers) shows that all is ok, correct mode is loaded up, it just seems like
this driver works at 100% but ..

rtl8169|pcnet - works in 10Mbit mode with several other issues like high cpu
utilization and so on .. dladm is unable to recognize options for this kind of -nic

others - just don't work

.. i experienced this issue several times in the past .. the workaround was
that rtl8169 worked so-so .. with recent sunos kernel it doesn't.

it's easy to reproduce, which is why i'm not putting here more than the
launch script for my virtual machine:

# cat openindiana.sh
qemu-kvm -hda /home/kvm/openindiana/openindiana.img -m 2048 -localtime \
-cdrom /home/kvm/+images/oi-dev-147-x86.iso -boot d \
-vga std -vnc :9 -k en-us \
-monitor unix:/home/kvm/openindiana/instance,server,nowait \
-net nic,model=e1000,vlan=1 -net tap,ifname=oi0,script=no,vlan=1 &

sleep 2;
ip l set oi0 up;
ip a a 192.168.99.9/24 dev oi0;

regards by daniel

Re: [Qemu-devel] Re: [PATCH] net: delay peer host device delete

2010-09-21 Thread Daniel P. Berrange

On Mon, Sep 20, 2010 at 09:37:16PM +0200, Michael S. Tsirkin wrote:
> On Mon, Sep 20, 2010 at 02:22:18PM -0500, Anthony Liguori wrote:
> > On 09/20/2010 01:59 PM, Michael S. Tsirkin wrote:
> > >>You can also initiate the unplug from the OS without the ACPI event
> > >>ever happening.  I suspect that in our current implementation, that
> > >>means that we'll automatically delete the device which may have
> > >>strange effects on management tools.
> > >>
> > >>So it probably makes sense for our interface to present the same
> > >>procedure.  What do you think?
> > >>
> > >>Regards,
> > >>
> > >>Anthony Liguori
> > >We seem to have two discussions here. you speak about how an ideal hot plug
> > >interface will look. This can involve new commands etc.
> > >I speak about fixing existing ones so qemu and/or guest won't crash.
> > 
> > To be fair, existing qemu won't crash if you do:
> > 
> > (qemu) device_del 
> > Use info_qtree to notice when device goes away
> > (qemu) netdev_del 
> 
> Asking libvirt to busy loop polling the monitor sounds like a really bad
> situation: note that guest is blocked from doing any io while monitor is
> used, so it may in fact prevent it from making progress. Right?

Clearly we need either an async command completion, or an async
event notification of device_del. No one wants to do polling,
nor does anyone sane want to try to parse the output of info
qtree :-)

> So why can't we let management do netdev_del and have it take effect
> when this becomes possible?

That would be really unpleasant to deal with. netdev_del should
always kill the backend immediately, even if the frontend device 
still exists. If this could cause issues for the frontend, then just
connect it to a no-op backend internally so it gets no further data.
In the context of drive_del, once it returns, libvirt changes the security
labelling, so QEMU is guaranteed not to be able to use the backend
anymore, even if it tries to. We would do the same for netdev_del if
we could.

> > You're trying to come up with a workaround for the fact that libvirt
> > is making bad assumptions.
> 
> BTW, even if it is, I don't think we should be crashing qemu or guest.

Even if libvirt is making bad assumptions about when a monitor
command can be issued, this is no excuse for QEMU crashing. If the
netdev_del can't safely be performed, it should return an error
via the monitor, not crash QEMU.

Daniel
-- 
|: Red Hat, Engineering, London-o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org -o- http://virt-manager.org -o- http://deltacloud.org :|
|: http://autobuild.org-o- http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-   F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

Re: [Qemu-devel] [RFC] block-queue: Delay and batch metadata writes

2010-09-21 Thread Kevin Wolf

Am 20.09.2010 17:51, schrieb Anthony Liguori:
> On 09/20/2010 10:08 AM, Kevin Wolf wrote:
>>
>>> If you're comfortable with a writeback cache for metadata, then you
>>> should also be comfortable with a writeback cache for data in which
>>> case, cache=writeback is the answer.
>>>  
>> Well, there is a difference: We don't pollute the host page cache with
>> guest data and we don't get a virtual "disk cache" as big as the host
>> RAM, but only a very limited queue of metadata.
> 
> Would it be a mortal sin to open the file twice and have a cache=none 
> version for data and cache=writeback for metadata?

Is the behaviour for this well-defined and portable?

> The two definitely aren't consistent with each other but I think the 
> whole point here is that we don't care.

What we do care about is ordering between data and metadata writes, for
example when doing COW, the copy of the data should have completed
before we update the L2 table.
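
The ordering requirement can be sketched with plain POSIX calls. This is an assumption-laden illustration, not block-queue or qcow2 code: the function names and the 8-byte L2 entry are hypothetical, and fdatasync() is used as a simple stand-in for whatever ordering mechanism the block layer would actually use. The point is only that the metadata write is not issued until the data copy has completed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write the whole buffer at `off`; returns 0 on success, -1 on error. */
static int write_all_at(int fd, const void *buf, size_t len, off_t off)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/*
 * Hypothetical COW step: copy the data first, wait until it is stable,
 * and only then update the (made-up) L2 table entry that points at it.
 */
int cow_update_sketch(int fd, const void *data, size_t data_len,
                      off_t data_off, uint64_t l2_entry, off_t l2_off)
{
    if (write_all_at(fd, data, data_len, data_off) < 0) {
        return -1;
    }
    if (fdatasync(fd) < 0) {            /* ordering point: data before metadata */
        return -1;
    }
    return write_all_at(fd, &l2_entry, sizeof(l2_entry), l2_off);
}
```

If the two writes went through independently cached file descriptors, as in the "open the file twice" suggestion, nothing in this sketch would enforce the ordering, which is the concern raised above.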

Also, what happens (in qcow2) when we free a data cluster and reuse it
as metadata (or the other way round). Does this work or is there a
chance that the old content is resurrected?

Kevin

[Bug 638955] Re: [Qemu-devel] [PATCH] e1000: Pad short frames to minimum size (60 bytes)

2010-09-21 Thread Stefan Hajnoczi

On Mon, Sep 20, 2010 at 9:31 PM, Anthony Liguori  wrote:
> On 09/20/2010 05:42 AM, Michael S. Tsirkin wrote:
>>
>> On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
>>
>>>
>>> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
>>>   wrote:
>>>

 This doesn't look right. AFAIK, MAC's dont pad on receive.

>>>
>>> I agree.  NICs that do padding will do it on transmit, not receive.
>>> Anything coming in on the wire should already have the minimum length.
>>>
>>
>> QEMU never gets access to the wire.
>> Our APIs do not really pass complete ethernet packets:
>> we forward packets without checksum and padding.
>>
>> I think it makes complete sense to keep this and
>> handle padding in devices because we
>> have devices that pass the frame to guest without padding and checksum.
>> It should be easy to replace padding code in devices that
>> need it with some kind of macro.
>>
>
> Would this not also address the problem?  It sounds like the root cause is
> the tap code, not the devices..

This won't work when s->has_vnet_hdr is 1 because the virtio-net
header consumes buffer space and reduces the amount we pad.  The
padding size should be 60 + (s->has_vnet_hdr ? sizeof(struct
virtio_net_hdr) : 0).

Adjusting the length without clearing the untouched buffer space is
probably fine.  I'm trying to think of a scenario where this becomes
an information leak (security issue).  Perhaps if the guest has vlans
enabled and allows different users to sniff traffic only on their
vlans?  Then you may be able to read part of another vlan's traffic by
sending short packets to your vlan and gathering the padding data.
This is pretty contrived but doing a <60 byte memset would prevent the
issue for sure.
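
A minimal sketch of the padding with the memset included, so that stale buffer contents are never exposed. The function name and the `hdr_len` parameter are assumptions for illustration: `hdr_len` stands for the size of any header preceding the frame in the buffer (for example a virtio-net header when s->has_vnet_hdr is set) and is 0 otherwise.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MIN_FRAME_LEN 60   /* minimum Ethernet frame size without the FCS */

/*
 * Pad a short frame in place with zero bytes. `len` is the number of valid
 * bytes in `buf` (including any leading header of `hdr_len` bytes) and
 * `cap` is the buffer capacity. Returns the new length, or 0 if the buffer
 * is too small to hold the padded frame.
 */
size_t pad_short_frame(uint8_t *buf, size_t len, size_t cap, size_t hdr_len)
{
    size_t min_len = hdr_len + MIN_FRAME_LEN;

    if (len >= min_len) {
        return len;                     /* already long enough */
    }
    if (cap < min_len) {
        return 0;
    }
    memset(buf + len, 0, min_len - len); /* clear the padding bytes */
    return min_len;
}
```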

Stefan


[Qemu-devel] Re: [PATCH] net: delay peer host device delete

2010-09-21 Thread Michael S. Tsirkin

On Mon, Sep 20, 2010 at 03:50:51PM -0500, Anthony Liguori wrote:
> On 09/20/2010 03:37 PM, Michael S. Tsirkin wrote:
> >On Mon, Sep 20, 2010 at 03:38:36PM -0500, Anthony Liguori wrote:
> >>On 09/20/2010 03:27 PM, Michael S. Tsirkin wrote:
> >>>On Mon, Sep 20, 2010 at 03:20:59PM -0500, Anthony Liguori wrote:
> On 09/20/2010 02:44 PM, Michael S. Tsirkin wrote:
> >>I think the only workable approach that doesn't involve new commands
> >>is to change the semantics of the existing ones.
> >>
> >>Make netdev_del work regardless of whether the device is still present.
> >>
> >>You would need to reference count the actual netdev structure and
> >>have each device using it unref on delete.  You make netdev_del mark
> >>the device as deleted and when a device is deleted, any calls into
> >>the device effectively become nops.
> >>
> >>You have to go through most of the cleanup process to ensure that
> >>tap device gets closed even before your reference count goes to
> >>zero.
> >I think you mean 'does not get closed': we need the fd to get the flags 
> >etc.
> No, I actually meant does get closed.
> 
> When you do netdev_del, it should result in the fd getting closed.
> 
> The actual netdev structure then becomes a zombie that's completely
> useless until the device goes away.
> 
> >Note that it will mostly work, except when it crashes.
> >Issue is we don't have any documentation so
> >people get the command set by trial and error.
> >
> >So how can we prove it's a user bug and not qemu bug?
> >I guess we should blame ourselves until proven innocent.
> Here's what I'm now suggesting:
> 
> device_del ->   may or may not unplug a device from a guest when it
> returns.  To figure out if it does, you have to run info qdm.
> >>>I think it should also always unplug on guest reset.
> >>>
> netdev_del ->   always destroys a netdev device when it returns.  May
> be called at any point in time.  If you destroy a netdev while the
> device is still using it, all packets go into the bit bucket and the
> link status is modified to be unplugged.
> >>>One issue here is that we can't allow a new device with same name
> >>>to be created until the nic is destroyed.
> >>A new netdev device?  Why not?
> >Because it won't work: it will try to pair with existing nic device
> >(it is looked up by name) and that will fail.
> 
> No, netdev_del should remove the VLANClientState from the
> non_vlan_clients list.
> 
> It's no longer enumerable and it's no longer lookup-able.
> 
> The only reason it stays around is so that the device doesn't have a
> reference to a free pointer.  The only field that's ever looked at
> is is_deleted which is used by every function to turn around and
> implement a nop.
> 
> The VLANClientState is a hollow shell of its former glorious self.
> The remainder of its (hopefully short) life is merely so that we
> can avoid touching every device to teach them about disconnecting
> backends.

We'll have to tell them link is down, won't we?
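
The "zombie" scheme discussed above can be sketched as a reference-counted object with an is_deleted flag. All names here are hypothetical and the structure is far simpler than VLANClientState: deleting the netdev closes the backend and marks the link down right away, while the memory survives until the last user (the NIC) drops its reference.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-in for a netdev backend. */
struct netdev_sketch {
    int    refcount;
    bool   is_deleted;
    bool   link_down;
    int    fd;          /* backend descriptor, -1 once closed */
    size_t delivered;   /* packets actually passed to the backend */
};

/* Create a netdev holding one reference for its monitor-visible name. */
struct netdev_sketch *netdev_sketch_new(int fd)
{
    struct netdev_sketch *nd = calloc(1, sizeof(*nd));
    if (nd != NULL) {
        nd->refcount = 1;
        nd->fd = fd;
    }
    return nd;
}

void netdev_sketch_ref(struct netdev_sketch *nd)
{
    nd->refcount++;
}

/* Drop one reference; returns true if the object was freed. */
bool netdev_sketch_unref(struct netdev_sketch *nd)
{
    if (--nd->refcount == 0) {
        free(nd);
        return true;
    }
    return false;
}

/* netdev_del: take effect immediately, but keep the shell for the NIC. */
void netdev_sketch_del(struct netdev_sketch *nd)
{
    nd->is_deleted = true;
    nd->link_down = true;   /* the frontend should see the link go down */
    nd->fd = -1;            /* a real implementation would close() the fd here */
    netdev_sketch_unref(nd);
}

/* Packets sent to a deleted netdev are silently dropped (bit bucket). */
size_t netdev_sketch_send(struct netdev_sketch *nd, size_t len)
{
    if (!nd->is_deleted) {
        nd->delivered++;
    }
    return len;
}
```

In this sketch the NIC takes its own reference with netdev_sketch_ref() when it attaches and calls netdev_sketch_unref() when it is finally unplugged, at which point the shell is freed.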

> >>Because the fundamental problem is that device_del is too difficult
> >>to use.  You're just making netdev_del equally difficult to use.
> >>
> >>Try your patch with libvirt, don't load acpiphp in the guest, and
> >>then play around with virsh device_detach and device_attach.  All
> >>sorts of badness will ensue as libvirt tries to manage assignment of
> >>PCI slot numbers and such.
> >>
> >>Regards,
> >>
> >>Anthony Liguori
> >At some level, that's right. Is yours better?
> 
> device_del is still busted but at least netdev_del behaves the way
> libvirt expects it to.
> 
> >I guess the right thing is to wait for libvirt guys to tell
> >us what they prefer.
> 
> Yeah, I think some clarity is needed.
> 
> Regards,
> 
> Anthony Liguori
>

Re: [Qemu-devel] Re: [PATCH] net: delay peer host device delete

2010-09-21 Thread Michael S. Tsirkin

On Tue, Sep 21, 2010 at 09:58:14AM +0100, Daniel P. Berrange wrote:
> On Mon, Sep 20, 2010 at 09:37:16PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Sep 20, 2010 at 02:22:18PM -0500, Anthony Liguori wrote:
> > > On 09/20/2010 01:59 PM, Michael S. Tsirkin wrote:
> > > >>You can also initiate the unplug from the OS without the ACPI event
> > > >>ever happening.  I suspect that in our current implementation, that
> > > >>means that we'll automatically delete the device which may have
> > > >>strange effects on management tools.
> > > >>
> > > >>So it probably makes sense for our interface to present the same
> > > >>procedure.  What do you think?
> > > >>
> > > >>Regards,
> > > >>
> > > >>Anthony Liguori
> > > > >We seem to have two discussions here. You speak about how an ideal
> > > > >hot plug interface will look. This can involve new commands etc.
> > > >I speak about fixing existing ones so qemu and/or guest won't crash.
> > > 
> > > To be fair, existing qemu won't crash if you do:
> > > 
> > > (qemu) device_del 
> > > Use info_qtree to notice when device goes away
> > > (qemu) netdev_del 
> > 
> > Asking libvirt to busy loop polling the monitor sounds like a really bad
> > situation: note that guest is blocked from doing any io while monitor is
> > used, so it may in fact prevent it from making progress. Right?
> 
> Clearly we need either an async command completion, or an async
> event notification of device_del. No one wants to do polling,
> nor does anyone sane want to try to parse the output of info
> qtree :-)
> 
>  
> > So why can't we let management do netdev_del and have it take effect
> > when this becomes possible?
> 
> That would be really unpleasant to deal with. netdev_del should
> always kill the backend immediately, even if the frontend device 
> still exists. If this could cause issues for the frontend, then just
> connect it to a no-op backend internally so it gets no further data.
> In the context of drive_del, once it returns, libvirt changes the security
> labelling, so QEMU is guaranteed not to be able to use the backend
> anymore, even if it tries to. We would do the same for netdev_del if
> we could.


OK, that's clear enough.
One note though: you won't be able to create another backend
with the same name until the frontend is gone.

-- 
MST

Re: [Qemu-devel] [PATCH] e1000: Pad short frames to minimum size (60 bytes)

2010-09-21 Thread Michael S. Tsirkin

On Mon, Sep 20, 2010 at 10:51:36PM +0200, Edgar E. Iglesias wrote:
> On Mon, Sep 20, 2010 at 03:31:32PM -0500, Anthony Liguori wrote:
> > On 09/20/2010 05:42 AM, Michael S. Tsirkin wrote:
> > > On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> > >
> > >> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> > >>   wrote:
> > >>  
> > > >>> This doesn't look right. AFAIK, MACs don't pad on receive.
> > >>>
> > >> I agree.  NICs that do padding will do it on transmit, not receive.
> > >> Anything coming in on the wire should already have the minimum length.
> > >>  
> > > QEMU never gets access to the wire.
> > > Our APIs do not really pass complete ethernet packets:
> > > we forward packets without checksum and padding.
> > >
> > > I think it makes complete sense to keep this and
> > > handle padding in devices because we
> > > have devices that pass the frame to guest without padding and checksum.
> > > It should be easy to replace padding code in devices that
> > > need it with some kind of macro.
> > >
> > 
> > Would this not also address the problem?  It sounds like the root cause 
> > is the tap code, not the devices..
> > 
> > Regards,
> > 
> > Anthony Liguori
> > 
> > >
> > >> In QEMU that isn't true today and that's why rtl8139, pcnet, and
> > >> ne2000 already do this same padding.  This patch is the smallest
> > >> change to cover e1000.
> > >>
> > >>  
> > > >>> IMO this kind of padding should somehow be done by the bridge that
> > > >>> forwards packets into the qemu vlan (e.g. slirp or the generic tap bridge).
> > >>>
> > >> That should work and we can then drop the padding code from existing
> > >> NICs.  I'll take a look.
> > >>
> > >> Stefan
> > >>  
> > >
> > 
> 
> > From f77c3143f3fbefdfa2f0cc873c2665b5aa78e8c9 Mon Sep 17 00:00:00 2001
> > From: Anthony Liguori 
> > Date: Mon, 20 Sep 2010 15:29:31 -0500
> > Subject: [PATCH] tap: make sure packets are at least 40 bytes long
> > 
> > This is required by ethernet drivers but not enforced in the Linux tap
> > code so we need to fix it up ourselves.
> > 
> > Signed-off-by: Anthony Liguori 
> > 
> > diff --git a/net/tap.c b/net/tap.c
> > index 4afb314..822241a 100644
> > --- a/net/tap.c
> > +++ b/net/tap.c
> > @@ -179,7 +179,13 @@ static int tap_can_send(void *opaque)
> >  #ifndef __sun__
> >  ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen)
> >  {
> > -return read(tapfd, buf, maxlen);
> > +ssize_t len;
> > +
> > +len = read(tapfd, buf, maxlen);
> > +if (len > 0) {
> > +len = MAX(MIN(maxlen, 40), len);
> 
> 
> A small detail :)
> 40 -> 64 (including a dummy FCS).

I don't think so: e1000 at least has code to tack the FCS on,
so we'll end up with 68 bytes.

> > +}
> > +return len;
> >  }
> >  #endif
> >  
> > -- 
> > 1.7.0.4
> >

Re: [Qemu-devel] [PATCH] e1000: Pad short frames to minimum size (60 bytes)

2010-09-21 Thread Edgar E. Iglesias

On Tue, Sep 21, 2010 at 11:17:07AM +0200, Michael S. Tsirkin wrote:
> On Mon, Sep 20, 2010 at 10:51:36PM +0200, Edgar E. Iglesias wrote:
> > On Mon, Sep 20, 2010 at 03:31:32PM -0500, Anthony Liguori wrote:
> > > On 09/20/2010 05:42 AM, Michael S. Tsirkin wrote:
> > > > On Sun, Sep 19, 2010 at 07:36:51AM +0100, Stefan Hajnoczi wrote:
> > > >
> > > >> On Sat, Sep 18, 2010 at 10:27 PM, Edgar E. Iglesias
> > > >>   wrote:
> > > >>  
> > > > >>> This doesn't look right. AFAIK, MACs don't pad on receive.
> > > >>>
> > > >> I agree.  NICs that do padding will do it on transmit, not receive.
> > > >> Anything coming in on the wire should already have the minimum length.
> > > >>  
> > > > QEMU never gets access to the wire.
> > > > Our APIs do not really pass complete ethernet packets:
> > > > we forward packets without checksum and padding.
> > > >
> > > > I think it makes complete sense to keep this and
> > > > handle padding in devices because we
> > > > have devices that pass the frame to guest without padding and checksum.
> > > > It should be easy to replace padding code in devices that
> > > > need it with some kind of macro.
> > > >
> > > 
> > > Would this not also address the problem?  It sounds like the root cause 
> > > is the tap code, not the devices..
> > > 
> > > Regards,
> > > 
> > > Anthony Liguori
> > > 
> > > >
> > > >> In QEMU that isn't true today and that's why rtl8139, pcnet, and
> > > >> ne2000 already do this same padding.  This patch is the smallest
> > > >> change to cover e1000.
> > > >>
> > > >>  
> > > > >>> IMO this kind of padding should somehow be done by the bridge that
> > > > >>> forwards packets into the qemu vlan (e.g. slirp or the generic tap bridge).
> > > >>>
> > > >> That should work and we can then drop the padding code from existing
> > > >> NICs.  I'll take a look.
> > > >>
> > > >> Stefan
> > > >>  
> > > >
> > > 
> > 
> > > From f77c3143f3fbefdfa2f0cc873c2665b5aa78e8c9 Mon Sep 17 00:00:00 2001
> > > From: Anthony Liguori 
> > > Date: Mon, 20 Sep 2010 15:29:31 -0500
> > > Subject: [PATCH] tap: make sure packets are at least 40 bytes long
> > > 
> > > This is required by ethernet drivers but not enforced in the Linux tap
> > > code so we need to fix it up ourselves.
> > > 
> > > Signed-off-by: Anthony Liguori 
> > > 
> > > diff --git a/net/tap.c b/net/tap.c
> > > index 4afb314..822241a 100644
> > > --- a/net/tap.c
> > > +++ b/net/tap.c
> > > @@ -179,7 +179,13 @@ static int tap_can_send(void *opaque)
> > >  #ifndef __sun__
> > >  ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen)
> > >  {
> > > -return read(tapfd, buf, maxlen);
> > > +ssize_t len;
> > > +
> > > +len = read(tapfd, buf, maxlen);
> > > +if (len > 0) {
> > > +len = MAX(MIN(maxlen, 40), len);
> > 
> > 
> > A small detail :)
> > 40 -> 64 (including a dummy FCS).
> 
> I don't think so: e1000 at least has code to tack the FCS on,
> so we'll end up with 68 bytes.

And at the moment e1000 also has padding, both padding
and FCS appending should go away from ethernet models before
this goes in.

Anyway, if you guys maintaining the networking parts are in
agreement that padding and FCS appending should be done in
the device models (at least for the moment), I'll accept
that and back-off. In that case, I think your suggestion
of hiding things behind some kind of generic macro or
function would be good. At least it will clarify things.

Cheers
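
The "generic macro or function" idea discussed above could look roughly like
the sketch below. This is only an illustration under stated assumptions:
eth_pad_short_frame() and ETH_MIN_FRAME_LEN are hypothetical names, not
existing QEMU symbols, and the 60-byte minimum excludes the FCS, matching the
subject of this thread. Unlike simply bumping the returned length, the helper
zero-fills the padding so that stale buffer contents are not handed to the
guest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimum Ethernet frame length, not counting the 4-byte FCS. */
#define ETH_MIN_FRAME_LEN 60

/*
 * Hypothetical helper (not an existing QEMU function).
 *
 * If the frame in buf is shorter than ETH_MIN_FRAME_LEN, zero-fill it up
 * to that length and return the new length.  Frames that are already long
 * enough, or buffers too small to hold a padded frame, are left untouched
 * and the original length is returned.
 */
static size_t eth_pad_short_frame(uint8_t *buf, size_t len, size_t bufsize)
{
    assert(len <= bufsize);

    if (len >= ETH_MIN_FRAME_LEN || bufsize < ETH_MIN_FRAME_LEN) {
        return len;
    }
    memset(buf + len, 0, ETH_MIN_FRAME_LEN - len);
    return ETH_MIN_FRAME_LEN;
}
```

A device model's receive path would call the helper once on its own copy of
the frame before handing it to the guest. Whether that call belongs in the
device models or in the tap code is exactly the open question in this thread.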

[Qemu-devel] [PATCH][qemu-iotests] Consider more cases in parsing qemu-io output

2010-09-21 Thread Kevin Wolf

I got a bug report with test output diffs like this:

-4 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+4 KiB, 1 ops; 0. sec (inf EiB/sec and inf ops/sec)

This patch extends the regular expression to consider terabytes, petabytes and
exabytes, and to allow inf as value for the throughput.

Signed-off-by: Kevin Wolf 
---
 common.filter |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/common.filter b/common.filter
index ce81266..da55f54 100644
--- a/common.filter
+++ b/common.filter
@@ -137,7 +137,7 @@ _filter_testdir()
 # sanitize qemu-io output
 _filter_qemu_io()
 {
-sed -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.]* [GMKiBbytes]*\/sec and [0-9/.]* ops\/sec)/X ops\; XX:XX:XX.X (XXX YYY\/sec and XXX ops\/sec)/"
+sed -e "s/[0-9]* ops\; [0-9/:. sec]* ([0-9/.inf]* [EPTGMKiBbytes]*\/sec and [0-9/.inf]* ops\/sec)/X ops\; XX:XX:XX.X (XXX YYY\/sec and XXX ops\/sec)/"
 }
 
 # make sure this script returns success
-- 
1.7.2.2

[Qemu-devel] Re: [PATCH] blkverify: Handle overlapping I/O vector buffers

2010-09-21 Thread Kevin Wolf

Am 20.09.2010 15:31, schrieb Stefan Hajnoczi:
> When blkverify clones an I/O vector in order to perform mirrored reads
> and then compare their contents, it does not take into account the
> layout of individual buffers.  It turns out this is important because
> guests may issue requests with overlapping buffers and the results
> differ depending on how buffers are overlapped.
> 
> This patch introduces logic to honor overlap relationships when cloning
> I/O vectors.
> 
> Signed-off-by: Stefan Hajnoczi 

Took me a while to review this. These buffer calculations always look so
harmless, but it's not trivial at all...

Anyway, looks good to me.

Kevin

Re: [Qemu-devel] Re: Win2k host problem with {get, free}{addr, name}info()

2010-09-21 Thread Bastien ROUCARIES

On Tue, Sep 21, 2010 at 10:09 AM, Paolo Bonzini  wrote:
 Does gnulib have a similar replacement function?
>>>
>>> Very similar, in fact that must be the source.
>>>
 The nice thing about gnulib is that in the long term, we could
 potentially
 use gnulib for compatibility and make sure to get updated code.
>>>
>>> One problem is that the current versions use GPLv3.
>>
>> Sorry, I made too hasty conclusions based on a few files.
>> getaddrinfo.c and inet_ntop.c are both GPLv2+.
>
> gnulib has a mix of various licenses.  People are usually not too picky
> about relicensing GPLv3 stuff to GPLv2+ and LGPLv3 stuff to either LGPLv2+,
> or dual GPLv2/LGPLv3.
>
> However, using gnulib may require autoconfiscation of qemu.

Not sure. You could put the autoconf stuff in a well-isolated directory
and use the libposix trick.

I use gnulib with cmake in this way

Bastien

Re: [Qemu-devel] Re: [PATCH] blkverify: Handle overlapping I/O vector buffers

2010-09-21 Thread Stefan Hajnoczi

On Tue, Sep 21, 2010 at 11:06 AM, Kevin Wolf  wrote:
> Am 20.09.2010 15:31, schrieb Stefan Hajnoczi:
>> When blkverify clones an I/O vector in order to perform mirrored reads
>> and then compare their contents, it does not take into account the
>> layout of individual buffers.  It turns out this is important because
>> guests may issue requests with overlapping buffers and the results
>> differ depending on how buffers are overlapped.
>>
>> This patch introduces logic to honor overlap relationships when cloning
>> I/O vectors.
>>
>> Signed-off-by: Stefan Hajnoczi 
>
> Took me a while to review this. These buffer calculations always look so
> harmless, but it's not trivial at all...

Thanks for the review!  I wasn't thrilled to add this logic either but
I don't see a way around it.

Will merge into blkverify so there is one unified patch including
fixes for your review comments.

Stefan

Re: [Qemu-devel] [PATCH 5/5] ide: propagate the required alignment

2010-09-21 Thread Kevin Wolf

Am 12.09.2010 23:44, schrieb Christoph Hellwig:
> IDE is a bit ugly in this respect.  For one it doesn't really keep track
> of a sector size - most of the protocol is in units of 512 bytes, and we
> assume 2048 bytes for CDROMs which is correct most of the time.
> 
> Second IDE allocates an I/O buffer long before we know if we're dealing
> with a CDROM or not, so increase the alignment for the io_buffer
> unconditionally.
> 
> Signed-off-by: Christoph Hellwig 

I'm not very happy about the last three patches in this series because
they mix guest device and backend properties. But it's probably the best
we can get without major effort.

Applied all to the block branch.

Kevin

[Qemu-devel] [PATCH] qemu-io: New command map

2010-09-21 Thread Kevin Wolf

The new map command in qemu-io lists all allocated/unallocated areas in an
image file.

Signed-off-by: Kevin Wolf 
---
 qemu-io.c |   38 ++
 1 files changed, 38 insertions(+), 0 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index b4e5cc8..169dd51 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1443,6 +1443,43 @@ static const cmdinfo_t alloc_cmd = {
 };
 
 static int
+map_f(int argc, char **argv)
+{
+   int64_t offset;
+   int nb_sectors;
+   char s1[64];
+   int num;
+   int ret;
+   const char *retstr;
+
+   offset = 0;
+   nb_sectors = bs->total_sectors;
+
+   do {
+   ret = bdrv_is_allocated(bs, offset, nb_sectors, &num);
+   retstr = ret ? "allocated" : "not allocated";
+   cvtstr(offset << 9ULL, s1, sizeof(s1));
+   printf("[% 24" PRId64 "] % 8d/% 8d sectors %s at offset %s (%d)\n",
+   offset << 9ULL, num, nb_sectors, retstr, s1, ret);
+
+   offset += num;
+   nb_sectors -= num;
+   } while(offset < bs->total_sectors);
+
+   return 0;
+}
+
+static const cmdinfo_t map_cmd = {
+   .name   = "map",
+   .argmin = 0,
+   .argmax = 0,
+   .cfunc  = map_f,
+   .args   = "",
+   .oneline= "prints the allocated areas of a file",
+};
+
+
+static int
 close_f(int argc, char **argv)
 {
bdrv_close(bs);
@@ -1680,6 +1717,7 @@ int main(int argc, char **argv)
add_command(&length_cmd);
add_command(&info_cmd);
add_command(&alloc_cmd);
+   add_command(&map_cmd);
 
add_args_command(init_args_command);
add_check_command(init_check_command);
-- 
1.7.2.2

Re: [Qemu-devel] [PATCH] qemu-io: New command map

2010-09-21 Thread Stefan Hajnoczi

On Tue, Sep 21, 2010 at 12:01 PM, Kevin Wolf  wrote:
> The new map command in qemu-io lists all allocated/unallocated areas in an
> image file.
>
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-io.c |   38 ++
>  1 files changed, 38 insertions(+), 0 deletions(-)
>
> diff --git a/qemu-io.c b/qemu-io.c
> index b4e5cc8..169dd51 100644
> --- a/qemu-io.c
> +++ b/qemu-io.c
> @@ -1443,6 +1443,43 @@ static const cmdinfo_t alloc_cmd = {
>  };
>
>  static int
> +map_f(int argc, char **argv)
> +{
> +       int64_t offset;
> +       int nb_sectors;
> +       char s1[64];
> +       int num;
> +       int ret;
> +       const char *retstr;
> +
> +       offset = 0;
> +       nb_sectors = bs->total_sectors;

total_sectors is int64_t but nb_sectors is int.  A >1 TB image will
have nb_sectors > 0x80000000, which does not fit in an int.  The safest
solution is to use int64_t and cap the nb_sectors argument at INT_MAX
per iteration.

Stefan
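
To make the suggestion concrete, here is a minimal standalone sketch of the
capping logic. The names sectors_this_round() and count_rounds() are made up
for illustration and do not exist in qemu-io. The loop also assumes that each
query reports on the whole chunk it was given, which the real
bdrv_is_allocated() does not promise.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many sectors to pass to an interface that takes
 * an int count, given an int64_t number of sectors still to be examined.
 */
static int sectors_this_round(int64_t remaining)
{
    assert(remaining >= 0);
    return remaining > INT_MAX ? INT_MAX : (int)remaining;
}

/*
 * Walk total_sectors in chunks that each fit in an int and return the
 * number of iterations needed.  This stands in for the do/while loop of a
 * map-style command; the offset stays 64-bit throughout.
 */
static int64_t count_rounds(int64_t total_sectors)
{
    int64_t offset = 0;
    int64_t rounds = 0;

    while (offset < total_sectors) {
        offset += sectors_this_round(total_sectors - offset);
        rounds++;
    }
    return rounds;
}
```

Keeping offset and the remaining count as int64_t and narrowing only at the
call boundary means the conversion to int happens in exactly one place.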

[Qemu-devel] Re: KVM call agenda for Sept 21

2010-09-21 Thread Avi Kivity

 On 09/21/2010 05:37 AM, Nakajima, Jun wrote:

Avi Kivity wrote on Mon, 20 Sep 2010 at 09:50:55:

>On 09/20/2010 06:44 PM, Chris Wright wrote:
>>  Please send in any agenda items you are interested in covering.
>>
>   nested vmx: the resurrection.  Nice to see it progressing again, but
>  there's still a lot of ground to cover.  Perhaps we can involve Intel to
>  speed things up?
>
Hi, Avi

What are you looking for?

Help in getting the patchset in.  Reviewing is always appreciated (while 
it tends to increase the time, the result is usually better).  If we can 
find a way to share the work, even better.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

[Qemu-devel] Re: [PATCH] scsi-generic: add missing reset handler

2010-09-21 Thread Kevin Wolf

Am 06.09.2010 16:07, schrieb Bernhard Kohl:
> Ensure that pending requests of a SCSI generic device are purged on
> system reset. This also avoids calling a NULL function in lsi53c895a.
> The lsi code was recently changed to call the .qdev.reset function.
> 
> Signed-off-by: Bernhard Kohl 

Thanks, applied to the block branch.

Kevin

[Qemu-devel] [PATCH v2] qemu-io: New command map

2010-09-21 Thread Kevin Wolf

The new map command in qemu-io lists all allocated/unallocated areas in an
image file.

Signed-off-by: Kevin Wolf 
---
 qemu-io.c |   39 +++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index b4e5cc8..ff353eb 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1443,6 +1443,44 @@ static const cmdinfo_t alloc_cmd = {
 };
 
 static int
+map_f(int argc, char **argv)
+{
+   int64_t offset;
+   int64_t nb_sectors;
+   char s1[64];
+   int num, num_checked;
+   int ret;
+   const char *retstr;
+
+   offset = 0;
+   nb_sectors = bs->total_sectors;
+
+   do {
+   num_checked = MIN(nb_sectors, INT_MAX);
+   ret = bdrv_is_allocated(bs, offset, num_checked, &num);
+   retstr = ret ? "allocated" : "not allocated";
+   cvtstr(offset << 9ULL, s1, sizeof(s1));
+   printf("[% 24" PRId64 "] % 8d/% 8d sectors %s at offset %s (%d)\n",
+   offset << 9ULL, num, num_checked, retstr, s1, ret);
+
+   offset += num;
+   nb_sectors -= num;
+   } while(offset < bs->total_sectors);
+
+   return 0;
+}
+
+static const cmdinfo_t map_cmd = {
+   .name   = "map",
+   .argmin = 0,
+   .argmax = 0,
+   .cfunc  = map_f,
+   .args   = "",
+   .oneline= "prints the allocated areas of a file",
+};
+
+
+static int
 close_f(int argc, char **argv)
 {
bdrv_close(bs);
@@ -1680,6 +1718,7 @@ int main(int argc, char **argv)
add_command(&length_cmd);
add_command(&info_cmd);
add_command(&alloc_cmd);
+   add_command(&map_cmd);
 
add_args_command(init_args_command);
add_check_command(init_check_command);
-- 
1.7.2.2

[Qemu-devel] Re: [PATCH] net: delay peer host device delete

2010-09-21 Thread Anthony Liguori


On 09/21/2010 04:18 AM, Michael S. Tsirkin wrote:

No, netdev_del should remove the VLANClientState from the
non_vlan_clients list.

It's no longer enumerable and it's no longer lookup-able.

The only reason it stays around is so that the device doesn't have a
reference to a free pointer.  The only field that's ever looked at
is is_deleted which is used by every function to turn around and
implement a nop.

The VLANClientState is a hollow shell of its former glorious self.
The remainder of its (hopefully short) life is merely so that we
can avoid touching every device to teach them about disconnecting
backends.
 

We'll have to tell them link is down, won't we?
   


Yes, that would be a nice touch.

Regards,

Anthony Liguori
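
The "hollow shell" behaviour described above can be sketched as follows. This
is a simplified illustration, not QEMU's actual VLANClientState: the Backend
type and the backend_* functions are hypothetical, and only the is_deleted
flag comes from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for a net backend. */
typedef struct Backend {
    bool is_deleted;     /* set once the backend has been removed */
    size_t bytes_sent;   /* bytes forwarded while the backend was alive */
} Backend;

/* Mark the backend as gone.  The frontend may still hold a pointer to it. */
static void backend_delete(Backend *b)
{
    assert(b != NULL);
    b->is_deleted = true;
}

/*
 * Accept a packet from the frontend.  After deletion the packet is dropped
 * silently, but the full length is still reported so that the frontend
 * does not queue the packet and retry.
 */
static size_t backend_receive(Backend *b, const void *buf, size_t len)
{
    (void)buf;
    if (!b->is_deleted) {
        b->bytes_sent += len;
    }
    return len;
}
```

The point of the pattern is that the frontend never dereferences freed
memory: the structure stays allocated until the device itself goes away, and
every entry point checks the flag first.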

[Qemu-devel] [PATCH v4] blkverify: Add block driver for verifying I/O

2010-09-21 Thread Stefan Hajnoczi

The blkverify block driver makes investigating image format data
corruption much easier.  A raw image initialized with the same contents
as the test image (e.g. qcow2 file) must be provided.  The raw image
mirrors read/write operations and is used to verify that data read from
the test image is correct.

See docs/blkverify.txt for more information.

Signed-off-by: Stefan Hajnoczi 
---
v4:
 * Make blkverify_aio_cancel() wait for request to fully complete
 * Delete s->test_file on .open() failure
 * Merge overlapping buffer allocation logic in blkverify_iovec_clone()

v3:
 * Fix compile error in blkverify_aio_cancel()

v2:
 * Implement aio_cancel() by waiting for pending requests
 * Fix conflict in Makefile.objs

 Makefile.objs  |2 +-
 block/blkverify.c  |  381 
 docs/blkverify.txt |   69 ++
 3 files changed, 451 insertions(+), 1 deletions(-)
 create mode 100644 block/blkverify.c
 create mode 100644 docs/blkverify.txt

diff --git a/Makefile.objs b/Makefile.objs
index 3ef6d80..dad4593 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -14,7 +14,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
 block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/blkverify.c b/block/blkverify.c
new file mode 100644
index 000..3ad4f3e
--- /dev/null
+++ b/block/blkverify.c
@@ -0,0 +1,381 @@
+/*
+ * Block protocol for block driver correctness testing
+ *
+ * Copyright (C) 2010 IBM, Corp.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include 
+#include "block_int.h"
+
+typedef struct {
+BlockDriverState *test_file;
+} BDRVBlkverifyState;
+
+typedef struct BlkverifyAIOCB BlkverifyAIOCB;
+struct BlkverifyAIOCB {
+BlockDriverAIOCB common;
+QEMUBH *bh;
+
+/* Request metadata */
+bool is_write;
+int64_t sector_num;
+int nb_sectors;
+
+int ret;/* first completed request's result */
+unsigned int done;  /* completion counter */
+bool *finished; /* completion signal for cancel */
+
+QEMUIOVector *qiov; /* user I/O vector */
+QEMUIOVector raw_qiov;  /* cloned I/O vector for raw file */
+void *buf;  /* buffer for raw file I/O */
+
+void (*verify)(BlkverifyAIOCB *acb);
+};
+
+static void blkverify_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+BlkverifyAIOCB *acb = (BlkverifyAIOCB *)blockacb;
+bool finished = false;
+
+/* Wait until request completes, invokes its callback, and frees itself */
+acb->finished = &finished;
+while (!finished) {
+qemu_aio_wait();
+}
+}
+
+static AIOPool blkverify_aio_pool = {
+.aiocb_size = sizeof(BlkverifyAIOCB),
+.cancel = blkverify_aio_cancel,
+};
+
+static void blkverify_err(BlkverifyAIOCB *acb, const char *fmt, ...)
+{
+va_list ap;
+
+va_start(ap, fmt);
+fprintf(stderr, "blkverify: %s sector_num=%ld nb_sectors=%d ",
+acb->is_write ? "write" : "read", acb->sector_num,
+acb->nb_sectors);
+vfprintf(stderr, fmt, ap);
+fprintf(stderr, "\n");
+va_end(ap);
+exit(1);
+}
+
+/* Valid blkverify filenames look like blkverify:path/to/raw_image:path/to/image */
+static int blkverify_open(BlockDriverState *bs, const char *filename, int flags)
+{
+BDRVBlkverifyState *s = bs->opaque;
+int ret;
+char *raw, *c;
+
+/* Parse the blkverify: prefix */
+if (strncmp(filename, "blkverify:", strlen("blkverify:"))) {
+return -EINVAL;
+}
+filename += strlen("blkverify:");
+
+/* Parse the raw image filename */
+c = strchr(filename, ':');
+if (c == NULL) {
+return -EINVAL;
+}
+
+raw = strdup(filename);
+raw[c - filename] = '\0';
+ret = bdrv_file_open(&bs->file, raw, flags);
+free(raw);
+if (ret < 0) {
+return ret;
+}
+filename = c + 1;
+
+/* Open the test file */
+s->test_file = bdrv_new("");
+ret = bdrv_open(s->test_file, filename, flags, NULL);
+if (ret < 0) {
+bdrv_delete(s->test_file);
+s->test_file = NULL;
+return ret;
+}
+
+return 0;
+}
+
+static void blkverify_close(BlockDriverState *bs)
+{
+BDRVBlkverifyState *s = bs->opaque;
+
+bdrv_delete(s->test_file);
+s->test_file = NULL;
+}
+
+static void blkverify_flush(BlockDriverState *bs)
+{
+BDRVBlkverifyState *s = bs->opaque;
+
+/* Only flush test file, the raw file is not important */
+bdrv_flush(s->test_file);
+}
+
+static int64_t blkverif

Re: [Qemu-devel] Re: [PATCH] net: delay peer host device delete

2010-09-21 Thread Anthony Liguori


On 09/21/2010 04:20 AM, Michael S. Tsirkin wrote:


OK, that's clear enough.
One note though: you won't be able to create another backend
with the same name until the frontend is gone.
   


If you remove it from the linked list, you'll be able to create another 
backend just fine.


Regards,

Anthony Liguori

[Qemu-devel] Re: [PATCH v2] qemu-io: New command map

2010-09-21 Thread Stefan Hajnoczi

On Tue, Sep 21, 2010 at 1:40 PM, Kevin Wolf  wrote:
> The new map command in qemu-io lists all allocated/unallocated areas in an
> image file.
>
> Signed-off-by: Kevin Wolf 
> ---
>  qemu-io.c |   39 +++
>  1 files changed, 39 insertions(+), 0 deletions(-)

Looks good.

Stefan

Re: [Qemu-devel] [PATCH 00/18] Monitor: split HMP and QMP dispatch tables

2010-09-21 Thread Anthony Liguori


On 09/16/2010 03:20 PM, Luiz Capitulino wrote:

The subject says it all: with this series applied we'll get different
dispatch tables for HMP and QMP, which has the side effect of making
QMP commands (such as qmp_capabilities) disappear from HMP's scope.

This is also the beginning of the Monitor's redesign, which aims to
separate QMP, HMP and common code.

There's a penalty, though. We're going to get a bit of duplication
during the process, like duplicated handlers entries in the
dispatch tables.

We'll need more separation and a proper internal QMP interface to
solve that...
   


Acked-by: Anthony Liguori 

It all looks pretty straightforward.  Nice work!

Regards,

Anthony Liguori


---
  Makefile|2 +-
  Makefile.target |7 +-
  monitor.c   |  357 -
  monitor.h   |1 -
  qemu-monitor-qmp.hx | 1541 +++
  qemu-monitor.hx | 1361 +-
  6 files changed, 1774 insertions(+), 1495 deletions(-)

[Qemu-devel] Re: [PATCH] scsi_bus: fix length and xfer_mode for RESERVE and RELEASE commands

2010-09-21 Thread Kevin Wolf

Am 06.09.2010 16:58, schrieb Bernhard Kohl:
> For the RESERVE and RELEASE commands the length must be zero
> and xfer_mode must be SCSI_XFER_NONE.
> 
> Signed-off-by: Bernhard Kohl 

Thanks, applied to the block branch.

Kevin

[Qemu-devel] Re: [PATCH v4] blkverify: Add block driver for verifying I/O

2010-09-21 Thread Kevin Wolf

Am 21.09.2010 14:44, schrieb Stefan Hajnoczi:
> The blkverify block driver makes investigating image format data
> corruption much easier.  A raw image initialized with the same contents
> as the test image (e.g. qcow2 file) must be provided.  The raw image
> mirrors read/write operations and is used to verify that data read from
> the test image is correct.
> 
> See docs/blkverify.txt for more information.
> 
> Signed-off-by: Stefan Hajnoczi 

Thanks, applied to the block branch.

Kevin

[Qemu-devel] NetBSD qemu block device support

2010-09-21 Thread haad

Hi,
On Mar,Thursday 18 2010, at 9:32 PM, Blue Swirl wrote:

> On 3/17/10, haad  wrote:
>> Hi folks,
>>
>> This patch at [1] add support for NetBSD block ioctl calls to qemu
>> block-raw.c file. It was written for xen version of qemu but basically
>> it will work with vanilla qemu, too. Would anyone like to commit this
>> patch so it can be included into the base xen distribution ?
>
> The patch does not apply to current development repository.
>
> There is no description of the change suitable for changelog without
> any editing.
>
> Signed-off-by: line is missing.
>
> The patch combines formatting (whitespace) changes with functional
> changes. The formatting changes seem useless.
>
> The essence of the patch is twofold, It adds NetBSD specific includes
> and NetBSD specific code to raw_getlength(). These look OK, except
> formatting may be a bit off (no space after if) and currently the code
> probably should be surrounded by
> #ifdef DIOCGWEDGEINFO
> #endif
> or just #ifdef __NetBSD__.

I wrote this patch some time ago and then totally forgot about
it (I'm very sorry about that).

Today I looked at it again and changed it to be clean and to properly
apply against current sources.


-- 


Regards.

Adam
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 72fb8ce..444fb2d 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -62,6 +62,12 @@
 #include 
 #include 
 #endif
+#if defined(__NetBSD__)
+#include 
+#include 
+#include 
+#include 
+#endif
 
 #ifdef __DragonFly__
 #include 
@@ -595,6 +601,28 @@ static int64_t raw_getlength(BlockDriverState *bs)
 } else
 return st.st_size;
 }
+#elif defined(__NetBSD__)
+static int64_t  raw_getlength(BlockDriverState *bs)
+{
+	int fd = ((BDRVRawState*)bs->opaque)->fd;
+	struct stat st;
+	if (fstat(fd, &st))
+		return -1;
+	if (S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode)) {
+		struct dkwedge_info dkw;
+		if (ioctl(fd, DIOCGWEDGEINFO, &dkw) != -1) {
+			/* NetBSD still supports only 512 bytes for disk sector */
+			return dkw.dkw_size * 512;
+		} else {
+			struct disklabel dl;
+			if(ioctl(fd, DIOCGDINFO, &dl))
+				return -1;
+			return (uint64_t)dl.d_secsize *
+			dl.d_partitions[DISKPART(st.st_rdev)].p_size;
+		}
+	} else
+		return st.st_size;
+}
 #elif defined(__sun__)
 static int64_t raw_getlength(BlockDriverState *bs)
 {

[Qemu-devel] [PATCH] qemu-virtio-9p: Implement TREADLINK operation for 9p2000.L

2010-09-21 Thread M. Mohan Kumar

Synopsis

size[4] TReadlink tag[2] fid[4]
size[4] RReadlink tag[2] target[s]

Description
Readlink is used to return the contents of the symbolic link
referred to by fid. The contents of the symbolic link are returned
as the response.

target[s] - Contents of the symbolic link referred by fid.

Signed-off-by: M. Mohan Kumar 
Reviewed-by: Aneesh Kumar K.V 
---
 hw/virtio-9p-debug.c |8 
 hw/virtio-9p.c   |   44 
 hw/virtio-9p.h   |9 +
 3 files changed, 61 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-9p-debug.c b/hw/virtio-9p-debug.c
index 6f6a0ec..cfaae6b 100644
--- a/hw/virtio-9p-debug.c
+++ b/hw/virtio-9p-debug.c
@@ -502,6 +502,14 @@ void pprint_pdu(V9fsPDU *pdu)
 fprintf(llogfile, "RMKNOD: )");
 pprint_qid(pdu, 0, &offset, "qid");
 break;
+case P9_TREADLINK:
+   fprintf(llogfile, "TREADLINK: (");
+pprint_int32(pdu, 0, &offset, "fid");
+break;
+case P9_RREADLINK:
+   fprintf(llogfile, "RREADLINK: (");
+pprint_str(pdu, 0, &offset, "target");
+break;
 case P9_TREAD:
 fprintf(llogfile, "TREAD: (");
 pprint_int32(pdu, 0, &offset, "fid");
diff --git a/hw/virtio-9p.c b/hw/virtio-9p.c
index fd2147e..5d1fbee 100644
--- a/hw/virtio-9p.c
+++ b/hw/virtio-9p.c
@@ -3403,6 +3403,49 @@ out:
 qemu_free(vs);
 }
 
+static void v9fs_readlink_post_readlink(V9fsState *s, V9fsReadLinkState *vs,
+int err)
+{
+if (err < 0) {
+err = -errno;
+goto out;
+}
+vs->offset += pdu_marshal(vs->pdu, vs->offset, "s", &vs->target);
+err = vs->offset;
+out:
+complete_pdu(s, vs->pdu, err);
+v9fs_string_free(&vs->target);
+qemu_free(vs);
+}
+
+static void v9fs_readlink(V9fsState *s, V9fsPDU *pdu)
+{
+int32_t fid;
+V9fsReadLinkState *vs;
+int err = 0;
+V9fsFidState *fidp;
+
+vs = qemu_malloc(sizeof(*vs));
+vs->pdu = pdu;
+vs->offset = 7;
+
+pdu_unmarshal(vs->pdu, vs->offset, "d", &fid);
+
+fidp = lookup_fid(s, fid);
+if (fidp == NULL) {
+err = -ENOENT;
+goto out;
+}
+
+v9fs_string_init(&vs->target);
+err = v9fs_do_readlink(s, &fidp->path, &vs->target);
+v9fs_readlink_post_readlink(s, vs, err);
+return;
+out:
+complete_pdu(s, vs->pdu, err);
+qemu_free(vs);
+}
+
 typedef void (pdu_handler_t)(V9fsState *s, V9fsPDU *pdu);
 
 static pdu_handler_t *pdu_handlers[] = {
@@ -3414,6 +3457,7 @@ static pdu_handler_t *pdu_handlers[] = {
 [P9_TXATTRCREATE] = v9fs_xattrcreate,
 [P9_TMKNOD] = v9fs_mknod,
 [P9_TRENAME] = v9fs_rename,
+[P9_TREADLINK] = v9fs_readlink,
 [P9_TMKDIR] = v9fs_mkdir,
 [P9_TVERSION] = v9fs_version,
 [P9_TLOPEN] = v9fs_open,
diff --git a/hw/virtio-9p.h b/hw/virtio-9p.h
index 0816ad6..bbc23d5 100644
--- a/hw/virtio-9p.h
+++ b/hw/virtio-9p.h
@@ -27,6 +27,8 @@ enum {
 P9_RMKNOD,
 P9_TRENAME = 20,
 P9_RRENAME,
+P9_TREADLINK = 22,
+P9_RREADLINK,
 P9_TGETATTR = 24,
 P9_RGETATTR,
 P9_TSETATTR = 26,
@@ -434,6 +436,13 @@ typedef struct V9fsXattrState
 void *value;
 } V9fsXattrState;
 
+typedef struct V9fsReadLinkState
+{
+V9fsPDU *pdu;
+size_t offset;
+V9fsString target;
+} V9fsReadLinkState;
+
 extern size_t pdu_packunpack(void *addr, struct iovec *sg, int sg_count,
 size_t offset, size_t size, int pack);
 
-- 
1.7.0.4
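
For readers following the TREADLINK handler above: on the host side the request presumably ends up in readlink(2) via v9fs_do_readlink() (an assumption, the body of that helper is not part of this patch). A minimal standalone sketch of reading a link target into a NUL-terminated buffer follows; read_link_target is a hypothetical name, not a QEMU function.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read the target of the symlink at 'path' into 'buf'.
 * readlink(2) does not NUL-terminate its result, so one byte is reserved
 * for the terminator. Returns the target length, or -1 on error.
 * A target longer than bufsize - 1 bytes is silently truncated.
 */
static ssize_t read_link_target(const char *path, char *buf, size_t bufsize)
{
    ssize_t n;

    if (bufsize == 0) {
        return -1;
    }
    n = readlink(path, buf, bufsize - 1);
    if (n < 0) {
        return -1;      /* errno is set by readlink() */
    }
    buf[n] = '\0';
    return n;
}
```

The returned string corresponds to the target[s] field of the RREADLINK response described in the commit message.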

[Qemu-devel] Re: [PATCH] new parameter boot=on|off for "-net nic" and "-device" NIC devices

2010-09-21 Thread Bernhard Kohl


On 19.09.2010 18:07, ext Michael S. Tsirkin wrote:

On Tue, Sep 14, 2010 at 05:46:55PM +0200, Bernhard Kohl wrote:
   

>  This patch was motivated by the following use case: In our system
>  the VMs usually have 4 NICs, any combination of virtio-net-pci and
>  pci-assign NIC devices. The VMs boot via gPXE preferably over the
>  pci-assign devices.
>
>  There is no way to make this work with a combination of the
>  current options -net -pcidevice -device -optionrom -boot.
>
>  With the parameter boot=off it is possible to avoid loading
>  and using gPXE option ROMs either for old style "-net nic" or
>  for "-device" NIC devices. So we can select which NIC is used
>  for booting.
>
>  A side effect of the boot=off parameter is that unneeded ROMs
>  which might waste memory are no longer loaded. E.g. if you have
>  2 virtio-net-pci and 2 pci-assign NICs, in sum 4 option ROMs are
>  loaded and the virtio ROMs take precedence over the pci-assign
>  ROMs. The BIOS uses the first gPXE ROM which it finds and only
>  needs one of them even if there are more NICs of the same type.
>
>  Without using the boot=on|off parameter the current behaviour
>  does not change.
>
>  Signed-off-by: Thomas Ostler
>  Signed-off-by: Bernhard Kohl
 

I think this is useful, however:

- We have bit properties which handle parsing on/off
   and other formats automatically. Please don't use string.
- boot is not a great property name for PCI: what
   you actually do is disable option rom.
   So maybe call it 'rom' or something like that?
   

Our main goal is to select a certain NIC's option rom for booting.
So the other roms are not needed and not loaded. We used 'boot'
as the property name because it is similar to the '-drive' option for
selecting a certain disk for booting. How about calling it 'bootrom'?

- given you have added a property, it can now
   be changed with -device and is visible in -device ?
   This also has an advantage of only applying to pci devices
   (-net option would appear to apply to non-pci but have no effect).
   Please do not add more flag parsing in qemu-options, net and vl.c
   

I think it is OK that we don't support this new feature for the old
style '-net' option and only implement it for qdev / -device?

To summarize, just add a qdev bit option and check
the bit.

   

In general, we will rework the patch and use all new qdev features.
There are also some memory leaks as Chris told us, because of the
usage of strdup.

This rework might take 2 weeks because of vacation.

Thanks
Bernhard

[Qemu-devel] Re: [PATCH v4] blkverify: Add block driver for verifying I/O

2010-09-21 Thread Kevin Wolf

On 21.09.2010 14:44, Stefan Hajnoczi wrote:
> The blkverify block driver makes investigating image format data
> corruption much easier.  A raw image initialized with the same contents
> as the test image (e.g. qcow2 file) must be provided.  The raw image
> mirrors read/write operations and is used to verify that data read from
> the test image is correct.
> 
> See docs/blkverify.txt for more information.
> 
> Signed-off-by: Stefan Hajnoczi 

Heh, so four versions is actually not enough. This breaks the mingw32
build (EINPROGRESS is not defined).

Kevin

[Qemu-devel] Re: Caching modes

2010-09-21 Thread Christoph Hellwig

On Mon, Sep 20, 2010 at 07:18:14PM -0500, Anthony Liguori wrote:
> O_DIRECT alone to a pre-allocated file on a normal file system should 
> result in the data being visible without any additional metadata 
> transactions.

Anthony, for the third time: no.  O_DIRECT is a non-portable extension
in Linux (taken from IRIX) and is defined as:

   O_DIRECT (Since Linux 2.4.10)
  Try  to minimize cache effects of the I/O to and from this file.
  In general this will degrade performance, but it  is  useful  in
  special  situations,  such  as  when  applications  do their own
  caching.  File I/O is done directly to/from user space  buffers.
  The O_DIRECT flag on its own makes an effort to transfer data
  synchronously, but does not give the guarantees  of  the  O_SYNC
  that  data and necessary metadata are transferred.  To guarantee
  synchronous I/O the O_SYNC must be used in addition to O_DIRECT.
  See NOTES below for further discussion.

  A  semantically  similar  (but  deprecated)  interface for block
  devices is described in raw(8).

O_DIRECT does not have any meaning for data integrity, it just tells the
filesystem it *should* not use the pagecache.  Even then, various
filesystems fall back to buffered I/O for corner cases.
It does *not* mean the actual disk cache gets flushed, and it does *not*
guarantee anything about metadata, which is very important.

Metadata updates happen when filling a sparse file, when extending the file
size, when using a COW filesystem, and when converting preallocated to
fully allocated extents in practice, and could happen in many more cases
depending on the filesystem implementation.

> >Barriers are a Linux-specific implementation detail that is in the
> >process of going away, probably in Linux 2.6.37.  But if you want
> >O_DSYNC semantics with a volatile disk write cache there is no way
> >around using a cache flush or the FUA bit on all I/O caused by it.
> 
> If you have a volatile disk write cache, then we don't need O_DSYNC 
> semantics.

If you present a volatile write cache to the guest you do indeed not
need O_DSYNC and can rely on the guest sending fdatasync calls when it
wants to flush the cache.  But for the statement above you can replace
O_DSYNC with fdatasync and it will still be correct.  O_DSYNC in current
Linux kernels is nothing but an implicit range fdatasync after each
write.

> >   We
> >currently use the cache flush, and although I plan to experiment a bit
> >more with the FUA bit for O_DIRECT | O_DSYNC writes I would be very
> >surprised if they actually are any faster.
> >   
> 
> The thing I struggle with understanding is that if the guest is sending 
> us a write request, why are we sending the underlying disk a write + 
> flush request?  That doesn't seem logical at all to me.

We only send a cache flush request *iff* we present the guest a device
without a volatile write cache so that it can assume all writes are
stable and we sit on a device that does have a volatile write cache.

> Even if we advertise WC disable, it should be up to the guest to decide 
> when to issue flushes.

No.  If we don't claim to have a volatile cache no guest will ever flush
the cache.  Which is just logical given that we just told it that we
don't have a cache that needs flushing.

> >ext3 and ext4 have really bad fsync implementations.  Just use a better
> >filesystem or bug one of its developers if you want that fixed.  But
> >except for disabling the disk cache there is no way to get data integrity
> >without cache flushes (the FUA bit is nothing but an implicit flush).
> >   
> 
> But why are we issuing more flushes than the guest is issuing if we 
> don't have to worry about filesystem metadata (i.e. preallocated storage 
> or physical devices)?

Who is "we" and what is the workload/filesystem/kernel combination?
Specific details and numbers please.
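
To make the O_DSYNC point above concrete, here is a minimal sketch of the explicit equivalent: a write followed by fdatasync(2). It is an illustration only and not code from QEMU; write_then_flush is a hypothetical name. The sketch uses a plain descriptor; with O_DIRECT the buffer, offset and length would additionally have to be suitably aligned, which is not shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 'len' bytes at offset 'off', then ask the
 * kernel to flush the data (plus the metadata needed to read it back).
 * This is the explicit form of what an O_DSYNC descriptor does implicitly
 * after each write. Returns 0 on success, -1 on error.
 */
static int write_then_flush(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            return -1;          /* errno is set by pwrite() */
        }
        p += n;
        off += n;
        len -= (size_t)n;
    }
    /* Without this call the data may still sit in a volatile cache. */
    return fdatasync(fd);
}
```

Opening the file with O_DIRECT alone changes none of this: the flush, whether implicit through O_DSYNC or explicit through fdatasync, is still what provides the integrity guarantee.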

Re: [Qemu-devel] Re: [PATCH v4] blkverify: Add block driver for verifying I/O

2010-09-21 Thread Stefan Hajnoczi

On Tue, Sep 21, 2010 at 3:26 PM, Kevin Wolf  wrote:
> On 21.09.2010 14:44, Stefan Hajnoczi wrote:
>> The blkverify block driver makes investigating image format data
>> corruption much easier.  A raw image initialized with the same contents
>> as the test image (e.g. qcow2 file) must be provided.  The raw image
>> mirrors read/write operations and is used to verify that data read from
>> the test image is correct.
>>
>> See docs/blkverify.txt for more information.
>>
>> Signed-off-by: Stefan Hajnoczi 
>
> Heh, so four versions is actually not enough. This breaks the mingw32
> build (EINPROGRESS is not defined).

For Windows we need to include qemu_socket.h so EINPROGRESS is
defined.  I will send a v5, thank you for pointing this out.

Stefan

[Qemu-devel] [PATCH v5] blkverify: Add block driver for verifying I/O

2010-09-21 Thread Stefan Hajnoczi

The blkverify block driver makes investigating image format data
corruption much easier.  A raw image initialized with the same contents
as the test image (e.g. qcow2 file) must be provided.  The raw image
mirrors read/write operations and is used to verify that data read from
the test image is correct.

See docs/blkverify.txt for more information.

Signed-off-by: Stefan Hajnoczi 
---
v5:
 * Include qemu_socket.h for EINPROGRESS on Windows

v4:
 * Make blkverify_aio_cancel() wait for request to fully complete
 * Delete s->test_file on .open() failure
 * Merge overlapping buffer allocation logic in blkverify_iovec_clone()

v3:
 * Fix compile error in blkverify_aio_cancel()

v2:
 * Implement aio_cancel() by waiting for pending requests
 * Fix conflict in Makefile.objs

 Makefile.objs  |2 +-
 block/blkverify.c  |  382 
 docs/blkverify.txt |   69 ++
 3 files changed, 452 insertions(+), 1 deletions(-)
 create mode 100644 block/blkverify.c
 create mode 100644 docs/blkverify.txt

diff --git a/Makefile.objs b/Makefile.objs
index 3ef6d80..dad4593 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -14,7 +14,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
 block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/blkverify.c b/block/blkverify.c
new file mode 100644
index 000..4202685
--- /dev/null
+++ b/block/blkverify.c
@@ -0,0 +1,382 @@
+/*
+ * Block protocol for block driver correctness testing
+ *
+ * Copyright (C) 2010 IBM, Corp.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include 
+#include "qemu_socket.h" /* for EINPROGRESS on Windows */
+#include "block_int.h"
+
+typedef struct {
+BlockDriverState *test_file;
+} BDRVBlkverifyState;
+
+typedef struct BlkverifyAIOCB BlkverifyAIOCB;
+struct BlkverifyAIOCB {
+BlockDriverAIOCB common;
+QEMUBH *bh;
+
+/* Request metadata */
+bool is_write;
+int64_t sector_num;
+int nb_sectors;
+
+int ret;/* first completed request's result */
+unsigned int done;  /* completion counter */
+bool *finished; /* completion signal for cancel */
+
+QEMUIOVector *qiov; /* user I/O vector */
+QEMUIOVector raw_qiov;  /* cloned I/O vector for raw file */
+void *buf;  /* buffer for raw file I/O */
+
+void (*verify)(BlkverifyAIOCB *acb);
+};
+
+static void blkverify_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+BlkverifyAIOCB *acb = (BlkverifyAIOCB *)blockacb;
+bool finished = false;
+
+/* Wait until request completes, invokes its callback, and frees itself */
+acb->finished = &finished;
+while (!finished) {
+qemu_aio_wait();
+}
+}
+
+static AIOPool blkverify_aio_pool = {
+.aiocb_size = sizeof(BlkverifyAIOCB),
+.cancel = blkverify_aio_cancel,
+};
+
+static void blkverify_err(BlkverifyAIOCB *acb, const char *fmt, ...)
+{
+va_list ap;
+
+va_start(ap, fmt);
+fprintf(stderr, "blkverify: %s sector_num=%ld nb_sectors=%d ",
+acb->is_write ? "write" : "read", acb->sector_num,
+acb->nb_sectors);
+vfprintf(stderr, fmt, ap);
+fprintf(stderr, "\n");
+va_end(ap);
+exit(1);
+}
+
+/* Valid blkverify filenames look like blkverify:path/to/raw_image:path/to/image */
+static int blkverify_open(BlockDriverState *bs, const char *filename, int flags)
+{
+    BDRVBlkverifyState *s = bs->opaque;
+    int ret;
+    char *raw, *c;
+
+    /* Parse the blkverify: prefix */
+    if (strncmp(filename, "blkverify:", strlen("blkverify:"))) {
+        return -EINVAL;
+    }
+    filename += strlen("blkverify:");
+
+    /* Parse the raw image filename */
+    c = strchr(filename, ':');
+    if (c == NULL) {
+        return -EINVAL;
+    }
+
+    raw = strdup(filename);
+    raw[c - filename] = '\0';
+    ret = bdrv_file_open(&bs->file, raw, flags);
+    free(raw);
+    if (ret < 0) {
+        return ret;
+    }
+    filename = c + 1;
+
+    /* Open the test file */
+    s->test_file = bdrv_new("");
+    ret = bdrv_open(s->test_file, filename, flags, NULL);
+    if (ret < 0) {
+        bdrv_delete(s->test_file);
+        s->test_file = NULL;
+        return ret;
+    }
+
+    return 0;
+}
+
+static void blkverify_close(BlockDriverState *bs)
+{
+BDRVBlkverifyState *s = bs->opaque;
+
+bdrv_delete(s->test_file);
+s->test_file = NULL;
+}
+
+static void blkverify_flush(BlockDriverState *bs)
+{
+BDRVBlkverifyState *s = bs->opaque;
+
+/*

[Qemu-devel] Re: [PATCH] new parameter boot=on|off for "-net nic" and "-device" NIC devices

2010-09-21 Thread Michael S. Tsirkin

On Tue, Sep 21, 2010 at 04:18:56PM +0200, Bernhard Kohl wrote:
> On 19.09.2010 18:07, ext Michael S. Tsirkin wrote:
> >On Tue, Sep 14, 2010 at 05:46:55PM +0200, Bernhard Kohl wrote:
> >>>  This patch was motivated by the following use case: In our system
> >>>  the VMs usually have 4 NICs, any combination of virtio-net-pci and
> >>>  pci-assign NIC devices. The VMs boot via gPXE preferably over the
> >>>  pci-assign devices.
> >>>
> >>>  There is no way to make this work with a combination of the
> >>>  current options -net -pcidevice -device -optionrom -boot.
> >>>
> >>>  With the parameter boot=off it is possible to avoid loading
> >>>  and using gPXE option ROMs either for old style "-net nic" or
> >>>  for "-device" NIC devices. So we can select which NIC is used
> >>>  for booting.
> >>>
> >>>  A side effect of the boot=off parameter is that unneeded ROMs
> >>>  which might waste memory are no longer loaded. E.g. if you have
> >>>  2 virtio-net-pci and 2 pci-assign NICs, in sum 4 option ROMs are
> >>>  loaded and the virtio ROMs take precedence over the pci-assign
> >>>  ROMs. The BIOS uses the first gPXE ROM which it finds and only
> >>>  needs one of them even if there are more NICs of the same type.
> >>>
> >>>  Without using the boot=on|off parameter the current behaviour
> >>>  does not change.
> >>>
> >>>  Signed-off-by: Thomas Ostler
> >>>  Signed-off-by: Bernhard Kohl
> >I think this is useful, however:
> >
> >- We have bit properties which handle parsing on/off
> >   and other formats automatically. Please don't use string.
> >- boot is not a great property name for PCI: what
> >   you actually do is disable option rom.
> >   So maybe call it 'rom' or something like that?
> Our main goal is to select a certain NIC's option rom for booting.
> So the other roms are not needed and not loaded. We used 'boot'
> as the property name because it is similar to the '-drive' option for
> selecting a certain disk for booting. How about calling it 'bootrom'?

Sounds good.

> >- given you have added a property, it can now
> >   be changed with -device. and visible in -device ?
> >   This also has an advantage of only applying to pci devices
> >   (-net option would appear to apply to non-pci but have no effect).
> >   Please do not add more flag parsing in qemu-options, net and vl.c
> I think it is OK that we don't support this new feature for the old
> style '-net' option and only implement it for qdev / -device?


I think so, too.

> >To summarize, just add a qdev bit option and check
> >the bit.
> >
> In general, we will rework the patch and use all new qdev features.
> There are also some memory leaks as Chris told us, because of the
> usage of strdup.
> 
> This rework might take 2 weeks because of vacation.
> 
> Thanks
> Bernhard

Sounds good.

-- 
MST

Re: [Qemu-devel] Re: [PATCH] new parameter boot=on|off for "-net nic" and "-device" NIC devices

2010-09-21 Thread Anthony Liguori


On 09/19/2010 11:07 AM, Michael S. Tsirkin wrote:

On Tue, Sep 14, 2010 at 05:46:55PM +0200, Bernhard Kohl wrote:
   

This patch was motivated by the following use case: In our system
the VMs usually have 4 NICs, any combination of virtio-net-pci and
pci-assign NIC devices. The VMs boot via gPXE preferably over the
pci-assign devices.

There is no way to make this work with a combination of the
current options -net -pcidevice -device -optionrom -boot.

With the parameter boot=off it is possible to avoid loading
and using gPXE option ROMs either for old style "-net nic" or
for "-device" NIC devices. So we can select which NIC is used
for booting.

A side effect of the boot=off parameter is that unneeded ROMs
which might waste memory are no longer loaded. E.g. if you have
2 virtio-net-pci and 2 pci-assign NICs in sum 4 option ROMs are
loaded and the virtio ROMs take precedence over the pci-assign
ROMs. The BIOS uses the first gPXE ROM which it finds and only
needs one of them even if there are more NICs of the same type.

Without using the boot=on|off parameter the current behaviour
does not change.

Signed-off-by: Thomas Ostler
Signed-off-by: Bernhard Kohl
 

I think this is useful, however:

- We have bit properties which handle parsing on/off
   and other formats automatically. Please don't use string.
   


This is unneeded.  Just do romfile= with -device and it will suppress 
the option rom loading.


IOW:

-device virtio-net-pci,romfile= -pcidevice ...

But BTW, you should be able to select the pci device by doing:

-boot cdn,menu=on

And then hitting F12.  We need to come up with a better way to let 
particular BEV or BCV devices be chosen from the command line.


Regards,

Anthony Liguori


- boot is not a great property name for PCI: what
   you actually do is disable option rom.
   So maybe call it 'rom' or something like that?
- given you have added a property, it can now
   be changed with -device. and visible in -device ?
   This also has an advantage of only applying to pci devices
   (-net option would appear to apply to non-pci but have no effect).
   Please do not add more flag parsing in qemu-options, net and vl.c

To summarize, just add a qdev bit option and check
the bit.

   

---
  hw/pci.c|8 +++-
  hw/pci.h|1 +
  hw/qdev.c   |6 ++
  hw/qdev.h   |1 +
  net.c   |8 
  net.h   |1 +
  qemu-options.hx |8 ++--
  vl.c|   27 +++
  8 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index a98d6f3..055a2be 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -71,6 +71,7 @@ static struct BusInfo pci_bus_info = {
  DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
  DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
  QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
+DEFINE_PROP_STRING("boot", PCIDevice, boot),
  DEFINE_PROP_END_OF_LIST()
  }
  };
@@ -1513,6 +1514,10 @@ PCIDevice *pci_nic_init(NICInfo *nd, const char *default_model,

  pci_dev = pci_create(bus, devfn, pci_nic_names[i]);
  dev =&pci_dev->qdev;
+if (nd->name)
+dev->id = qemu_strdup(nd->name);
+if (nd->no_boot)
+dev->no_boot = 1;
  qdev_set_nic_properties(dev, nd);
  if (qdev_init(dev)<  0)
  return NULL;
@@ -1693,7 +1698,8 @@ static int pci_qdev_init(DeviceState *qdev, DeviceInfo *base)
  /* rom loading */
  if (pci_dev->romfile == NULL&&  info->romfile != NULL)
  pci_dev->romfile = qemu_strdup(info->romfile);
-pci_add_option_rom(pci_dev);
+if (!qdev->no_boot)
+pci_add_option_rom(pci_dev);

  if (qdev->hotplugged) {
  rc = bus->hotplug(bus->hotplug_qdev, pci_dev, 1);
diff --git a/hw/pci.h b/hw/pci.h
index 1eab7e7..20aa038 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -172,6 +172,7 @@ struct PCIDevice {
  char *romfile;
  ram_addr_t rom_offset;
  uint32_t rom_bar;
+char *boot;
  };

  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
diff --git a/hw/qdev.c b/hw/qdev.c
index 35858cb..8445bc9 100644
--- a/hw/qdev.c
+++ b/hw/qdev.c
@@ -249,6 +249,10 @@ DeviceState *qdev_device_add(QemuOpts *opts)
  qdev_free(qdev);
  return NULL;
  }
+if (qemu_opt_get(opts, "boot")) {
+if (!strcmp("off", qemu_strdup(qemu_opt_get(opts, "boot"))))
+qdev->no_boot = 1;
+}
  if (qdev_init(qdev)<  0) {
  qerror_report(QERR_DEVICE_INIT_FAILED, driver);
  return NULL;
@@ -421,6 +425,8 @@ void qdev_set_nic_properties(DeviceState *dev, NICInfo *nd)
  qdev_prop_exists(dev, "vectors")) {
  qdev_prop_set_uint32(dev, "vectors", nd->nvectors);
  }
+if (nd->no_boot)
+qdev_prop_parse(dev, "boot", "off");
  }

  static int next_block_unit[IF_COUNT];
diff --git a/hw/qdev.h b/hw/qdev.h
index 579328a..e7df371 100644
--- a/hw/qdev.h
+++ b/hw/qdev.h

[Qemu-devel] Re: Caching modes

2010-09-21 Thread Anthony Liguori


On 09/21/2010 09:26 AM, Christoph Hellwig wrote:

On Mon, Sep 20, 2010 at 07:18:14PM -0500, Anthony Liguori wrote:
   

O_DIRECT alone to a pre-allocated file on a normal file system should
result in the data being visible without any additional metadata
transactions.
 

Anthony, for the third time: no.  O_DIRECT is a non-portable extension
in Linux (taken from IRIX) and is defined as:


O_DIRECT (Since Linux 2.4.10)
   Try  to minimize cache effects of the I/O to and from this file.
   In general this will degrade performance, but it  is  useful  in
   special  situations,  such  as  when  applications  do their own
   caching.  File I/O is done directly to/from user space  buffers.
   The O_DIRECT flag on its own makes an effort to transfer data
   synchronously, but does not give the guarantees  of  the  O_SYNC
   that  data and necessary metadata are transferred.  To guarantee
   synchronous I/O the O_SYNC must be used in addition to O_DIRECT.
   See NOTES below for further discussion.

   A  semantically  similar  (but  deprecated)  interface for block
   devices is described in raw(8).

O_DIRECT does not have any meaning for data integrity, it just tells the
filesystem it *should* not use the pagecache.  Even then, various
filesystems fall back to buffered I/O for corner cases.
It does *not* mean the actual disk cache gets flushed, and it does *not*
guarantee anything about metadata, which is very important.
   


Yes, I understand all of this but I was trying to avoid accepting it.  
But after the call today, I'm convinced that this is fundamentally a 
filesystem problem.


I think what we need to do is:

1) make the virtual WC guest controllable.  If a guest enables WC, &= 
~O_DSYNC.  If it disables WC, |= O_DSYNC.  Obviously, we can let a user 
specify the virtual WC mode but it has to be changeable during live 
migration.


2) only let the user choose between using and not using the host page 
cache.  IOW, direct=on|off.  cache=XXX is deprecated.


3) make O_DIRECT | O_DSYNC not suck so badly on ext4.


Barriers are a Linux-specific implementation detail that is in the
process of going away, probably in Linux 2.6.37.  But if you want
O_DSYNC semantics with a volatile disk write cache there is no way
around using a cache flush or the FUA bit on all I/O caused by it.
   

If you have a volatile disk write cache, then we don't need O_DSYNC
semantics.
 

If you present a volatile write cache to the guest you do indeed not
need O_DSYNC and can rely on the guest sending fdatasync calls when it
wants to flush the cache.  But for the statement above you can replace
O_DSYNC with fdatasync and it will still be correct.  O_DSYNC in current
Linux kernels is nothing but an implicit range fdatasync after each
write.
   


Yes.  I was stuck on O_DSYNC being independent of the virtual WC but 
it's clear to me now that it cannot be.



ext3 and ext4 have really bad fsync implementations.  Just use a better
filesystem or bug one of its developers if you want that fixed.  But
except for disabling the disk cache there is no way to get data integrity
without cache flushes (the FUA bit is nothing but an implicit flush).

   

But why are we issuing more flushes than the guest is issuing if we
don't have to worry about filesystem metadata (i.e. preallocated storage
or physical devices)?
 

Who is "we" and what is the workload/filesystem/kernel combination?
Specific details and numbers please.
   


My concern is ext4.  With a preallocated file and cache=none as 
implemented today, performance is good even when barrier=1.  If we 
enable O_DSYNC, performance will plummet.  Ultimately, this is an ext4 
problem, not a QEMU problem.


Perhaps we can issue a warning if the WC is disabled and we do an fsstat 
and see that it's ext4 with barriers enabled.


I think it's more common for a user to want to disable a virtual WC 
because they have less faith in the hypervisor than they have in the 
underlying storage.


The scenarios I am concerned about:

1) User has enterprise storage, but has an image on ext4 with 
barrier=1.  User explicitly disables WC in guest because they have 
enterprise storage but not a UPS for the hypervisor.


2) User does not have enterprise storage, but has an image on ext4 with 
barrier=1.  User explicitly disables WC in guest because they don't know 
what they're doing.


In the case of (1), the answer may be "ext4 sucks, remount with 
barrier=0" but I think we need to at least warn the user of this.


For (2), again it's probably the user doing the wrong thing because if 
they don't have enterprise storage, then they shouldn't care about a 
virtual WC.  Practically though, I've seen a lot of this with users.


Regards,

Anthony Liguori

Re: [Qemu-devel] Re: [PATCH] new parameter boot=on|off for "-net nic" and "-device" NIC devices

2010-09-21 Thread Anthony Liguori

On 09/21/2010 09:18 AM, Bernhard Kohl wrote:

On 19.09.2010 18:07, ext Michael S. Tsirkin wrote:

On Tue, Sep 14, 2010 at 05:46:55PM +0200, Bernhard Kohl wrote:

>  This patch was motivated by the following use case: In our system
>  the VMs usually have 4 NICs, any combination of virtio-net-pci and
>  pci-assign NIC devices. The VMs boot via gPXE preferably over the
>  pci-assign devices.
>
>  There is no way to make this work with a combination of the
>  current options -net -pcidevice -device -optionrom -boot.
>
>  With the parameter boot=off it is possible to avoid loading
>  and using gPXE option ROMs either for old style "-net nic" or
>  for "-device" NIC devices. So we can select which NIC is used
>  for booting.
>
>  A side effect of the boot=off parameter is that unneeded ROMs
>  which might waste memory are no longer loaded. E.g. if you have
>  2 virtio-net-pci and 2 pci-assign NICs, in sum 4 option ROMs are
>  loaded and the virtio ROMs take precedence over the pci-assign
>  ROMs. The BIOS uses the first gPXE ROM which it finds and only
>  needs one of them even if there are more NICs of the same type.
>
>  Without using the boot=on|off parameter the current behaviour
>  does not change.
>
>  Signed-off-by: Thomas Ostler
>  Signed-off-by: Bernhard Kohl

I think this is useful, however:

- We have bit properties which handle parsing on/off
   and other formats automatically. Please don't use string.
- boot is not a great property name for PCI: what
   you actually do is disable option rom.
   So maybe call it 'rom' or something like that?

Our main goal is to select a certain NIC's option rom for booting.
So the other roms are not needed and not loaded. We used 'boot'
as the property name because it is similar to the '-drive' option for
selecting a certain disk for booting. How about calling it 'bootrom'?

This is the wrong approach.  A rom doesn't have to be BEV and if it's 
not BEV or PnP, it may search for any device it can understand.  So even 
if you did boot=on to the third NIC, the first NIC may be booted from.  
So just suppressing ROM loading does not imply that any specific device 
will be chosen to boot from.

The correct approach is to communicate to the BIOS which of the boot 
entries you would like to be chosen by default.  That means QEMU and 
SeaBIOS need to agree on a mechanism to map QEMU visible devices to a 
boot entry.

Regards,

Anthony Liguori

[Qemu-devel] [PATCH 01/20] vvfat: Fix segfault on write to read-only disk

2010-09-21 Thread Kevin Wolf

From: Kevin Wolf 

vvfat tries to set the readonly flag in its open function, but nowadays
this is overwritten by the readonly=... command line option. Check in
bdrv_write if the vvfat was opened read-only and return an error in this
case.

Without this check, vvfat tries to access the qcow bs, which is NULL
without enabled write support.

Signed-off-by: Kevin Wolf 
---
 block/vvfat.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 365332a..5898d66 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2665,6 +2665,11 @@ static int vvfat_write(BlockDriverState *bs, int64_t sector_num,
 
 DLOG(checkpoint());
 
+/* Check if we're operating in read-only mode */
+if (s->qcow == NULL) {
+return -EACCES;
+}
+
 vvfat_close_current_file(s);
 
 /*
-- 
1.7.2.2

[Qemu-devel] [PATCH 05/20] raw-posix: handle > 512 byte alignment correctly

2010-09-21 Thread Kevin Wolf

From: Christoph Hellwig 

Replace the hardcoded handling of 512 byte alignment with bs->buffer_alignment
to handle larger sector size devices correctly.

Note that we cannot rely on it being initialized in bdrv_open, so deal
with the worst case there.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Kevin Wolf 
---
 block/raw-posix.c |   79 +++--
 1 files changed, 46 insertions(+), 33 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 813372a..a5cbb7e 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -97,12 +97,12 @@
 #define FTYPE_CD 1
 #define FTYPE_FD 2
 
-#define ALIGNED_BUFFER_SIZE (32 * 512)
-
 /* if the FD is not accessed during that time (in ms), we try to
reopen it to see if the disk has been changed */
 #define FD_OPEN_TIMEOUT 1000
 
+#define MAX_BLOCKSIZE  4096
+
 typedef struct BDRVRawState {
 int fd;
 int type;
@@ -118,7 +118,8 @@ typedef struct BDRVRawState {
 int use_aio;
 void *aio_ctx;
 #endif
-uint8_t* aligned_buf;
+uint8_t *aligned_buf;
+unsigned aligned_buf_size;
 } BDRVRawState;
 
 static int fd_open(BlockDriverState *bs);
@@ -161,7 +162,12 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
 s->aligned_buf = NULL;
 
 if ((bdrv_flags & BDRV_O_NOCACHE)) {
-s->aligned_buf = qemu_blockalign(bs, ALIGNED_BUFFER_SIZE);
+/*
+ * Allocate a buffer for read/modify/write cycles.  Chose the size
+ * pessimistically as we don't know the block size yet.
+ */
+s->aligned_buf_size = 32 * MAX_BLOCKSIZE;
+s->aligned_buf = qemu_memalign(MAX_BLOCKSIZE, s->aligned_buf_size);
 if (s->aligned_buf == NULL) {
 goto out_close;
 }
@@ -278,8 +284,9 @@ static int raw_pread_aligned(BlockDriverState *bs, int64_t offset,
 }
 
 /*
- * offset and count are in bytes, but must be multiples of 512 for files
- * opened with O_DIRECT. buf must be aligned to 512 bytes then.
+ * offset and count are in bytes, but must be multiples of the sector size
+ * for files opened with O_DIRECT. buf must be aligned to sector size bytes
+ * then.
  *
  * This function may be called without alignment if the caller ensures
  * that O_DIRECT is not in effect.
@@ -316,24 +323,25 @@ static int raw_pread(BlockDriverState *bs, int64_t offset,
  uint8_t *buf, int count)
 {
 BDRVRawState *s = bs->opaque;
+unsigned sector_mask = bs->buffer_alignment - 1;
 int size, ret, shift, sum;
 
 sum = 0;
 
 if (s->aligned_buf != NULL)  {
 
-if (offset & 0x1ff) {
-/* align offset on a 512 bytes boundary */
+if (offset & sector_mask) {
+/* align offset on a sector size bytes boundary */
 
-shift = offset & 0x1ff;
-size = (shift + count + 0x1ff) & ~0x1ff;
-if (size > ALIGNED_BUFFER_SIZE)
-size = ALIGNED_BUFFER_SIZE;
+shift = offset & sector_mask;
+size = (shift + count + sector_mask) & ~sector_mask;
+if (size > s->aligned_buf_size)
+size = s->aligned_buf_size;
 ret = raw_pread_aligned(bs, offset - shift, s->aligned_buf, size);
 if (ret < 0)
 return ret;
 
-size = 512 - shift;
+size = bs->buffer_alignment - shift;
 if (size > count)
 size = count;
 memcpy(buf, s->aligned_buf + shift, size);
@@ -346,15 +354,15 @@ static int raw_pread(BlockDriverState *bs, int64_t offset,
 if (count == 0)
 return sum;
 }
-if (count & 0x1ff || (uintptr_t) buf & 0x1ff) {
+if (count & sector_mask || (uintptr_t) buf & sector_mask) {
 
 /* read on aligned buffer */
 
 while (count) {
 
-size = (count + 0x1ff) & ~0x1ff;
-if (size > ALIGNED_BUFFER_SIZE)
-size = ALIGNED_BUFFER_SIZE;
+size = (count + sector_mask) & ~sector_mask;
+if (size > s->aligned_buf_size)
+size = s->aligned_buf_size;
 
 ret = raw_pread_aligned(bs, offset, s->aligned_buf, size);
 if (ret < 0) {
@@ -404,25 +412,28 @@ static int raw_pwrite(BlockDriverState *bs, int64_t offset,
   const uint8_t *buf, int count)
 {
 BDRVRawState *s = bs->opaque;
+unsigned sector_mask = bs->buffer_alignment - 1;
 int size, ret, shift, sum;
 
 sum = 0;
 
 if (s->aligned_buf != NULL) {
 
-if (offset & 0x1ff) {
-/* align offset on a 512 bytes boundary */
-shift = offset & 0x1ff;
-ret = raw_pread_aligned(bs, offset - shift, s->aligned_buf, 512);
+if (offset & sector_mask) {
+/* align offset on a sector size bytes boundary */
+shift = offset & sector_mask;
+ret = raw_pread_aligned(bs, offset - shift
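
The masking arithmetic used by raw_pread and raw_pwrite above can be tried
in isolation. What follows is only a sketch under assumptions: the struct
and the helper name align_request are hypothetical and not part of the
patch, and the alignment is assumed to be a power of two, as the
"buffer_alignment - 1" mask in the patch implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the aligned window that covers a request. */
struct aligned_window {
    int64_t start;   /* requested offset rounded down to the alignment */
    unsigned shift;  /* bytes between start and the requested offset */
    size_t size;     /* window length, rounded up and capped to the buffer */
};

/*
 * Compute the aligned window for an unaligned request, mirroring the
 * shift/size computation in the patch.  alignment must be a power of two.
 */
static struct aligned_window align_request(int64_t offset, size_t count,
                                           unsigned alignment,
                                           size_t buf_size)
{
    struct aligned_window w;
    unsigned mask = alignment - 1;

    assert(alignment != 0 && (alignment & mask) == 0);
    assert(offset >= 0);

    w.shift = (unsigned)(offset & mask);
    w.start = offset - w.shift;
    /* round (shift + count) up to the next multiple of the alignment */
    w.size = (w.shift + count + mask) & ~(size_t)mask;
    if (w.size > buf_size) {
        w.size = buf_size;
    }
    return w;
}
```

With a 4096-byte alignment, a 100-byte read at offset 5000 becomes one
aligned 4096-byte read at offset 4096, and the caller copies from
position 904 of the bounce buffer.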

[Qemu-devel] [PATCH 06/20] Improve qemu-nbd performance by 4400 %

2010-09-21 Thread Kevin Wolf

From: Laurent Vivier 

This patch reduces the boot time from an NBD server from 225 seconds to
5 seconds (time between the "boot cd:0" and the kernel init) for the
following command lines:

./qemu-nbd -t ../ISO/debian-500-powerpc-netinst.iso
and
./ppc-softmmu/qemu-system-ppc -cdrom nbd:localhost:1024

This patch combines the reply header and the payload into a single send
operation.

Signed-off-by: Laurent Vivier 
Signed-off-by: Kevin Wolf 
---
 nbd.c |   25 ++---
 1 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/nbd.c b/nbd.c
index 011b50f..4bf2eb7 100644
--- a/nbd.c
+++ b/nbd.c
@@ -49,6 +49,7 @@
 
 /* This is all part of the "official" NBD API */
 
+#define NBD_REPLY_SIZE (4 + 4 + 8)
 #define NBD_REQUEST_MAGIC   0x25609513
 #define NBD_REPLY_MAGIC 0x67446698
 
@@ -588,7 +589,7 @@ static int nbd_receive_request(int csock, struct nbd_request *request)
 
 int nbd_receive_reply(int csock, struct nbd_reply *reply)
 {
-   uint8_t buf[4 + 4 + 8];
+   uint8_t buf[NBD_REPLY_SIZE];
uint32_t magic;
 
memset(buf, 0xAA, sizeof(buf));
@@ -655,9 +656,9 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
if (nbd_receive_request(csock, &request) == -1)
return -1;
 
-   if (request.len > data_size) {
+   if (request.len + NBD_REPLY_SIZE > data_size) {
LOG("len (%u) is larger than max len (%u)",
-   request.len, data_size);
+   request.len + NBD_REPLY_SIZE, data_size);
errno = EINVAL;
return -1;
}
@@ -687,7 +688,8 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
case NBD_CMD_READ:
TRACE("Request type is READ");
 
-   if (bdrv_read(bs, (request.from + dev_offset) / 512, data,
+   if (bdrv_read(bs, (request.from + dev_offset) / 512,
+ data + NBD_REPLY_SIZE,
  request.len / 512) == -1) {
LOG("reading from file failed");
errno = EINVAL;
@@ -697,12 +699,21 @@ int nbd_trip(BlockDriverState *bs, int csock, off_t size, uint64_t dev_offset,
 
TRACE("Read %u byte(s)", request.len);
 
-   if (nbd_send_reply(csock, &reply) == -1)
-   return -1;
+   /* Reply
+  [ 0 ..  3]magic   (NBD_REPLY_MAGIC)
+  [ 4 ..  7]error   (0 == no error)
+  [ 8 .. 15]handle
+*/
+
+   cpu_to_be32w((uint32_t*)data, NBD_REPLY_MAGIC);
+   cpu_to_be32w((uint32_t*)(data + 4), reply.error);
+   cpu_to_be64w((uint64_t*)(data + 8), reply.handle);
 
TRACE("Sending data to client");
 
-   if (write_sync(csock, data, request.len) != request.len) {
+   if (write_sync(csock, data,
+  request.len + NBD_REPLY_SIZE) !=
+  request.len + NBD_REPLY_SIZE) {
LOG("writing to socket failed");
errno = EINVAL;
return -1;
-- 
1.7.2.2
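
The idea of the nbd patch above, building the 16-byte reply header in
front of the payload so that both go out in one write, can be sketched
as follows. This is an illustration under assumptions, not the code
from nbd.c: nbd_pack_reply, put_be32 and put_be64 are hypothetical
names, and only the magic value and the header layout (magic, error,
handle) are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REPLY_MAGIC       0x67446698u   /* NBD_REPLY_MAGIC in the patch */
#define REPLY_HEADER_SIZE (4 + 4 + 8)   /* magic + error + handle */

/* Store a 32-bit value in big-endian byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Store a 64-bit value in big-endian byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/*
 * Fill in the header at the start of buf.  The payload is assumed to be
 * already in place at buf + REPLY_HEADER_SIZE.  Returns the number of
 * bytes to hand to a single write call, or 0 if the buffer is too small.
 */
static size_t nbd_pack_reply(uint8_t *buf, size_t buf_size, uint32_t error,
                             uint64_t handle, size_t payload_len)
{
    if (buf_size < REPLY_HEADER_SIZE ||
        payload_len > buf_size - REPLY_HEADER_SIZE) {
        return 0;
    }
    put_be32(buf, REPLY_MAGIC);       /* bytes  0 ..  3 */
    put_be32(buf + 4, error);         /* bytes  4 ..  7 */
    put_be64(buf + 8, handle);        /* bytes  8 .. 15 */
    return REPLY_HEADER_SIZE + payload_len;
}
```

The bounds check is written as a subtraction on the buffer size rather
than an addition on the payload length, so a very large payload length
cannot wrap around.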

[Qemu-devel] [PATCH 10/20] qcow2: Move sync out of qcow2_alloc_clusters

2010-09-21 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
---
 block/qcow2-cluster.c  |3 +++
 block/qcow2-refcount.c |4 ++--
 block/qcow2-snapshot.c |2 ++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index f562b16..818c0db 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -60,6 +60,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size)
 qemu_free(new_l1_table);
 return new_l1_table_offset;
 }
+bdrv_flush(bs->file);
 
 BLKDBG_EVENT(bs->file, BLKDBG_L1_GROW_WRITE_TABLE);
 for(i = 0; i < s->l1_size; i++)
@@ -243,6 +244,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
 if (l2_offset < 0) {
 return l2_offset;
 }
+bdrv_flush(bs->file);
 
 /* allocate a new entry in the l2 cache */
 
@@ -863,6 +865,7 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
 QLIST_REMOVE(m, next_in_flight);
 return cluster_offset;
 }
+bdrv_flush(bs->file);
 
 /* save info needed for meta data update */
 m->offset = offset;
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 4fc3f80..7082601 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -629,8 +629,6 @@ int64_t qcow2_alloc_clusters(BlockDriverState *bs, int64_t size)
 return ret;
 }
 
-bdrv_flush(bs->file);
-
 return offset;
 }
 
@@ -678,6 +676,8 @@ int64_t qcow2_alloc_bytes(BlockDriverState *bs, int size)
 goto redo;
 }
 }
+
+bdrv_flush(bs->file);
 return offset;
 }
 
diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index 6228612..bbfcaaa 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -138,6 +138,7 @@ static int qcow_write_snapshots(BlockDriverState *bs)
 snapshots_size = offset;
 
 snapshots_offset = qcow2_alloc_clusters(bs, snapshots_size);
+bdrv_flush(bs->file);
 offset = snapshots_offset;
 if (offset < 0) {
 return offset;
@@ -271,6 +272,7 @@ int qcow2_snapshot_create(BlockDriverState *bs, QEMUSnapshotInfo *sn_info)
 if (l1_table_offset < 0) {
 goto fail;
 }
+bdrv_flush(bs->file);
 
 sn->l1_table_offset = l1_table_offset;
 sn->l1_size = s->l1_size;
-- 
1.7.2.2

[Qemu-devel] [PATCH 02/20] vvfat: Fix double free for opening the image rw

2010-09-21 Thread Kevin Wolf

From: Kevin Wolf 

Allocation and deallocation of bs->opaque is not in the control of a
block driver. Therefore it should not set bs->opaque to a data structure
used by another bs, or closing the image will lead to a double free.

Signed-off-by: Kevin Wolf 
---
 block/vvfat.c |7 ---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 5898d66..0772037 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2768,12 +2768,12 @@ static int vvfat_is_allocated(BlockDriverState *bs,
 
 static int write_target_commit(BlockDriverState *bs, int64_t sector_num,
const uint8_t* buffer, int nb_sectors) {
-BDRVVVFATState* s = bs->opaque;
+BDRVVVFATState* s = *((BDRVVVFATState**) bs->opaque);
 return try_commit(s);
 }
 
 static void write_target_close(BlockDriverState *bs) {
-BDRVVVFATState* s = bs->opaque;
+BDRVVVFATState* s = *((BDRVVVFATState**) bs->opaque);
 bdrv_delete(s->qcow);
 free(s->qcow_filename);
 }
@@ -2816,7 +2816,8 @@ static int enable_write_target(BDRVVVFATState *s)
 
 s->bs->backing_hd = calloc(sizeof(BlockDriverState), 1);
 s->bs->backing_hd->drv = &vvfat_write_target;
-s->bs->backing_hd->opaque = s;
+s->bs->backing_hd->opaque = qemu_malloc(sizeof(void*));
+*(void**)s->bs->backing_hd->opaque = s;
 
 return 0;
 }
-- 
1.7.2.2
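
The indirection introduced by the vvfat patch above can be shown with a
toy model. The sketch assumes a framework that frees "opaque" when a
node is closed; all names here (struct node, struct shared_state,
node_attach_shared and so on) are hypothetical and only illustrate why
storing a small heap-allocated box that points at the shared state
avoids freeing that state twice.

```c
#include <assert.h>
#include <stdlib.h>

/* State owned by someone else; the node must not free it. */
struct shared_state {
    int commits;
};

/* Toy node: the framework frees opaque on close. */
struct node {
    void *opaque;
};

/* Give the node its own allocation that merely points at the state. */
static int node_attach_shared(struct node *n, struct shared_state *s)
{
    struct shared_state **box = malloc(sizeof(*box));

    if (box == NULL) {
        return -1;
    }
    *box = s;
    n->opaque = box;
    return 0;
}

/* Follow the extra level of indirection to reach the shared state. */
static struct shared_state *node_shared(const struct node *n)
{
    return *(struct shared_state **)n->opaque;
}

/* Stand-in for the framework's close path: frees only the box. */
static void node_close(struct node *n)
{
    free(n->opaque);
    n->opaque = NULL;
}
```

Closing the node releases only the box, so the owner of the shared
state can still free it exactly once.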

[Qemu-devel] [PATCH 11/20] qcow2: Get rid of additional sync on COW

2010-09-21 Thread Kevin Wolf

We always have a sync for the refcount update when a new cluster is
allocated. If we move this past the COW, we can save an additional sync.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-cluster.c |   10 --
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index 818c0db..cb2e33f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -415,7 +415,7 @@ static int copy_sectors(BlockDriverState *bs, uint64_t start_sect,
 &s->aes_encrypt_key);
 }
 BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
-ret = bdrv_write_sync(bs->file, (cluster_offset >> 9) + n_start,
+ret = bdrv_write(bs->file, (cluster_offset >> 9) + n_start,
 s->cluster_data, n);
 if (ret < 0)
 return ret;
@@ -714,6 +714,13 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
 (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
  }
 
+/*
+ * Before we update the L2 table to actually point to the new cluster, we
+ * need to be sure that the refcounts have been increased and COW was
+ * handled.
+ */
+bdrv_flush(bs->file);
+
 ret = write_l2_entries(bs, l2_table, l2_offset, l2_index, m->nb_clusters);
 if (ret < 0) {
 qcow2_l2_cache_reset(bs);
@@ -865,7 +872,6 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset,
 QLIST_REMOVE(m, next_in_flight);
 return cluster_offset;
 }
-bdrv_flush(bs->file);
 
 /* save info needed for meta data update */
 m->offset = offset;
-- 
1.7.2.2
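
The ordering rule behind the qcow2 sync patches above (flush the data
and refcount writes before writing the L2 entry that points at them)
can be modelled with a toy image. This is a sketch under assumptions,
not qcow2 code: struct toy_image keeps a volatile "cache" and a
"durable" copy that only toy_flush updates, and every name is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IMG_SIZE 64

/* Toy image: writes land in cache; only a flush makes them durable. */
struct toy_image {
    uint8_t cache[IMG_SIZE];
    uint8_t durable[IMG_SIZE];
};

static int toy_write(struct toy_image *img, size_t off,
                     const void *buf, size_t len)
{
    if (off > IMG_SIZE || len > IMG_SIZE - off) {
        return -1;
    }
    memcpy(img->cache + off, buf, len);
    return 0;
}

/* Models a flush: everything written so far survives a crash. */
static void toy_flush(struct toy_image *img)
{
    memcpy(img->durable, img->cache, IMG_SIZE);
}

/*
 * Write the payload, flush, and only then write the one-byte table
 * entry that points at it.  A crash after the flush but before the
 * entry reaches disk leaves an unreferenced payload (a leak), never an
 * entry that points at data which was not written.
 */
static int publish(struct toy_image *img, size_t data_off,
                   const void *data, size_t len, size_t entry_off)
{
    uint8_t entry = (uint8_t)data_off;

    if (toy_write(img, data_off, data, len) < 0) {
        return -1;
    }
    toy_flush(img);
    return toy_write(img, entry_off, &entry, 1);
}
```

In this model a single flush placed before the metadata update is
enough, which is the saving the patch aims for.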

[Qemu-devel] [PATCH 19/20] scsi_bus: fix length and xfer_mode for RESERVE and RELEASE commands

2010-09-21 Thread Kevin Wolf

From: Bernhard Kohl 

For the RESERVE and RELEASE commands the length must be zero
and xfer_mode must be SCSI_XFER_NONE.

Signed-off-by: Bernhard Kohl 
Signed-off-by: Kevin Wolf 
---
 hw/scsi-bus.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/scsi-bus.c b/hw/scsi-bus.c
index 7aa0bcd..5a3fd4b 100644
--- a/hw/scsi-bus.c
+++ b/hw/scsi-bus.c
@@ -208,6 +208,8 @@ static int scsi_req_length(SCSIRequest *req, uint8_t *cmd)
 case SEEK_6:
 case WRITE_FILEMARKS:
 case SPACE:
+case RESERVE:
+case RELEASE:
 case ERASE:
 case ALLOW_MEDIUM_REMOVAL:
 case VERIFY:
@@ -319,7 +321,6 @@ static void scsi_req_xfer_mode(SCSIRequest *req)
 case WRITE_BUFFER:
 case FORMAT_UNIT:
 case REASSIGN_BLOCKS:
-case RESERVE:
 case SEARCH_EQUAL:
 case SEARCH_HIGH:
 case SEARCH_LOW:
-- 
1.7.2.2

[Qemu-devel] [PATCH 04/20] use qemu_blockalign consistently

2010-09-21 Thread Kevin Wolf

From: Christoph Hellwig 

Use qemu_blockalign for all allocations in the block layer.  This allows
increasing the required alignment, which is needed to support O_DIRECT on
devices with large block sizes.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Kevin Wolf 
---
 hw/scsi-disk.c |9 +
 hw/sd.c|2 +-
 posix-aio-compat.c |2 +-
 qemu-io.c  |2 +-
 qemu-nbd.c |2 +-
 5 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 1446ca6..ee20e8f 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -70,14 +70,15 @@ struct SCSIDiskState
 char *serial;
 };
 
-static SCSIDiskReq *scsi_new_request(SCSIDevice *d, uint32_t tag, uint32_t lun)
+static SCSIDiskReq *scsi_new_request(SCSIDiskState *s, uint32_t tag,
+uint32_t lun)
 {
 SCSIRequest *req;
 SCSIDiskReq *r;
 
-req = scsi_req_alloc(sizeof(SCSIDiskReq), d, tag, lun);
+req = scsi_req_alloc(sizeof(SCSIDiskReq), &s->qdev, tag, lun);
 r = DO_UPCAST(SCSIDiskReq, req, req);
-r->iov.iov_base = qemu_memalign(512, SCSI_DMA_BUF_SIZE);
+r->iov.iov_base = qemu_blockalign(s->bs, SCSI_DMA_BUF_SIZE);
 return r;
 }
 
@@ -939,7 +940,7 @@ static int32_t scsi_send_command(SCSIDevice *d, uint32_t tag,
 }
 /* ??? Tags are not unique for different luns.  We only implement a
single lun, so this should not matter.  */
-r = scsi_new_request(d, tag, lun);
+r = scsi_new_request(s, tag, lun);
 outbuf = (uint8_t *)r->iov.iov_base;
 is_write = 0;
 DPRINTF("Command: lun=%d tag=0x%x data=0x%02x", lun, tag, buf[0]);
diff --git a/hw/sd.c b/hw/sd.c
index c928120..4bcf1c0 100644
--- a/hw/sd.c
+++ b/hw/sd.c
@@ -440,7 +440,7 @@ SDState *sd_init(BlockDriverState *bs, int is_spi)
 SDState *sd;
 
 sd = (SDState *) qemu_mallocz(sizeof(SDState));
-sd->buf = qemu_memalign(512, 512);
+sd->buf = qemu_blockalign(bs, 512);
 sd->spi = is_spi;
 sd->enable = 1;
 sd_reset(sd, bs);
diff --git a/posix-aio-compat.c b/posix-aio-compat.c
index 10f1f03..842f1a2 100644
--- a/posix-aio-compat.c
+++ b/posix-aio-compat.c
@@ -270,7 +270,7 @@ static ssize_t handle_aiocb_rw(struct qemu_paiocb *aiocb)
  * Ok, we have to do it the hard way, copy all segments into
  * a single aligned buffer.
  */
-buf = qemu_memalign(512, aiocb->aio_nbytes);
+buf = qemu_blockalign(aiocb->common.bs, aiocb->aio_nbytes);
 if (aiocb->aio_type & QEMU_AIO_WRITE) {
 char *p = buf;
 int i;
diff --git a/qemu-io.c b/qemu-io.c
index bd3bd16..b4e5cc8 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -61,7 +61,7 @@ static void *qemu_io_alloc(size_t len, int pattern)
 
if (misalign)
len += MISALIGN_OFFSET;
-   buf = qemu_memalign(512, len);
+   buf = qemu_blockalign(bs, len);
memset(buf, pattern, len);
if (misalign)
buf += MISALIGN_OFFSET;
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 91b569f..923a3bf 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -446,7 +446,7 @@ int main(int argc, char **argv)
 max_fd = sharing_fds[0];
 nb_fds++;
 
-data = qemu_memalign(512, NBD_BUFFER_SIZE);
+data = qemu_blockalign(bs, NBD_BUFFER_SIZE);
 if (data == NULL)
 errx(EXIT_FAILURE, "Cannot allocate data buffer");
 
-- 
1.7.2.2
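
The point of the conversion above is that the alignment comes from the
device instead of a hard-coded 512. A minimal sketch of such an
allocator follows. It is not the QEMU implementation: struct toy_device
and toy_blockalign are hypothetical names, the 512-byte fallback for an
unknown alignment is an assumption, and posix_memalign stands in for
qemu_memalign.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical device descriptor; 0 means the alignment is unknown. */
struct toy_device {
    size_t buffer_alignment;
};

/*
 * Allocate size bytes aligned to the device's requirement.  Falls back
 * to 512 bytes (assumed default) when no device or alignment is given.
 * The caller releases the buffer with free().
 */
static void *toy_blockalign(const struct toy_device *dev, size_t size)
{
    size_t align = 512;
    void *p = NULL;

    if (dev != NULL && dev->buffer_alignment != 0) {
        align = dev->buffer_alignment;
    }
    /* posix_memalign needs a power of two that is a multiple of
     * sizeof(void *) */
    assert((align & (align - 1)) == 0);
    if (align < sizeof(void *)) {
        align = sizeof(void *);
    }
    if (posix_memalign(&p, align, size) != 0) {
        return NULL;
    }
    return p;
}
```

A device with 4096-byte sectors then gets 4096-aligned buffers from
every caller without each caller knowing the sector size.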

[Qemu-devel] [PATCH 07/20] nbd: correctly manage default port

2010-09-21 Thread Kevin Wolf

From: Laurent Vivier 

block/nbd.c: use default port number when none is specified
qemu-nbd.c:  use IANA-assigned port number: 10809

Signed-off-by: Laurent Vivier 
Signed-off-by: Kevin Wolf 
---
 block/nbd.c |2 --
 qemu-nbd.c  |6 +++---
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 5e9d6cb..c8dc763 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -95,8 +95,6 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
 if (r == p) {
 goto out;
 }
-} else if (name == NULL) {
-goto out;
 }
 
 sock = tcp_socket_outgoing(hostname, port);
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 923a3bf..99f1d22 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -44,7 +44,7 @@ static void usage(const char *name)
 "Usage: %s [OPTIONS] FILE\n"
 "QEMU Disk Network Block Device Server\n"
 "\n"
-"  -p, --port=PORT  port to listen on (default `1024')\n"
+"  -p, --port=PORT  port to listen on (default `%d')\n"
 "  -o, --offset=OFFSET  offset into the image\n"
 "  -b, --bind=IFACE interface to bind to (default `0.0.0.0')\n"
 "  -k, --socket=PATHpath to the unix socket\n"
@@ -62,7 +62,7 @@ static void usage(const char *name)
 "  -V, --versionoutput version information and exit\n"
 "\n"
 "Report bugs to \n"
-, name, "DEVICE");
+, name, NBD_DEFAULT_PORT, "DEVICE");
 }
 
 static void version(const char *name)
@@ -188,7 +188,7 @@ int main(int argc, char **argv)
 bool readonly = false;
 bool disconnect = false;
 const char *bindto = "0.0.0.0";
-int port = 1024;
+int port = NBD_DEFAULT_PORT;
 struct sockaddr_in addr;
 socklen_t addr_len = sizeof(addr);
 off_t fd_size;
-- 
1.7.2.2
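
The behaviour described in the commit message above (use the default
port when none is specified) can be sketched as a small parser. This is
an illustration under assumptions and not the parsing code from
block/nbd.c: parse_host_port is a hypothetical name, only the
"host[:port]" form is handled, and IPv6 literals are out of scope. The
value 10809 is the IANA-assigned port named in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define DEFAULT_PORT 10809   /* IANA-assigned NBD port */

/*
 * Split "host[:port]" into host and port.  A missing port selects
 * DEFAULT_PORT.  Returns 0 on success, -1 on malformed input.
 */
static int parse_host_port(const char *spec, char *host, size_t host_size,
                           int *port)
{
    const char *colon = strchr(spec, ':');
    size_t host_len = colon ? (size_t)(colon - spec) : strlen(spec);
    char *end;
    long value;

    if (host_len == 0 || host_len >= host_size) {
        return -1;
    }
    memcpy(host, spec, host_len);
    host[host_len] = '\0';

    if (colon == NULL) {
        *port = DEFAULT_PORT;
        return 0;
    }

    errno = 0;
    value = strtol(colon + 1, &end, 10);
    if (end == colon + 1 || *end != '\0' || errno != 0 ||
        value < 1 || value > 65535) {
        return -1;
    }
    *port = (int)value;
    return 0;
}
```

With this, "localhost" and "localhost:10809" reach the same server,
while "localhost:1024" still selects the old default explicitly.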

[Qemu-devel] [PATCH 13/20] scsi-disk: propagate the required alignment

2010-09-21 Thread Kevin Wolf

From: Christoph Hellwig 

Signed-off-by: Christoph Hellwig 
Signed-off-by: Kevin Wolf 
---
 hw/scsi-disk.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index ee20e8f..9628b39 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -1178,6 +1178,7 @@ static int scsi_disk_initfn(SCSIDevice *dev)
 s->qdev.blocksize = s->qdev.conf.logical_block_size;
 }
 s->cluster_size = s->qdev.blocksize / 512;
+s->bs->buffer_alignment = s->qdev.blocksize;
 
 s->qdev.type = TYPE_DISK;
 qemu_add_vm_change_state_handler(scsi_dma_restart_cb, s);
-- 
1.7.2.2

[Qemu-devel] [PATCH 20/20] blkverify: Add block driver for verifying I/O

2010-09-21 Thread Kevin Wolf

From: Stefan Hajnoczi 

The blkverify block driver makes investigating image format data
corruption much easier.  A raw image initialized with the same contents
as the test image (e.g. qcow2 file) must be provided.  The raw image
mirrors read/write operations and is used to verify that data read from
the test image is correct.

See docs/blkverify.txt for more information.

Signed-off-by: Stefan Hajnoczi 
Signed-off-by: Kevin Wolf 
---
 Makefile.objs  |2 +-
 block/blkverify.c  |  382 
 docs/blkverify.txt |   69 ++
 3 files changed, 452 insertions(+), 1 deletions(-)
 create mode 100644 block/blkverify.c
 create mode 100644 docs/blkverify.txt

diff --git a/Makefile.objs b/Makefile.objs
index 3ef6d80..dad4593 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -14,7 +14,7 @@ block-obj-$(CONFIG_LINUX_AIO) += linux-aio.o
 
 block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
-block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o
+block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/blkverify.c b/block/blkverify.c
new file mode 100644
index 000..4202685
--- /dev/null
+++ b/block/blkverify.c
@@ -0,0 +1,382 @@
+/*
+ * Block protocol for block driver correctness testing
+ *
+ * Copyright (C) 2010 IBM, Corp.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include <stdarg.h>
+#include "qemu_socket.h" /* for EINPROGRESS on Windows */
+#include "block_int.h"
+
+typedef struct {
+BlockDriverState *test_file;
+} BDRVBlkverifyState;
+
+typedef struct BlkverifyAIOCB BlkverifyAIOCB;
+struct BlkverifyAIOCB {
+BlockDriverAIOCB common;
+QEMUBH *bh;
+
+/* Request metadata */
+bool is_write;
+int64_t sector_num;
+int nb_sectors;
+
+int ret;/* first completed request's result */
+unsigned int done;  /* completion counter */
+bool *finished; /* completion signal for cancel */
+
+QEMUIOVector *qiov; /* user I/O vector */
+QEMUIOVector raw_qiov;  /* cloned I/O vector for raw file */
+void *buf;  /* buffer for raw file I/O */
+
+void (*verify)(BlkverifyAIOCB *acb);
+};
+
+static void blkverify_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+BlkverifyAIOCB *acb = (BlkverifyAIOCB *)blockacb;
+bool finished = false;
+
+/* Wait until request completes, invokes its callback, and frees itself */
+acb->finished = &finished;
+while (!finished) {
+qemu_aio_wait();
+}
+}
+
+static AIOPool blkverify_aio_pool = {
+.aiocb_size = sizeof(BlkverifyAIOCB),
+.cancel = blkverify_aio_cancel,
+};
+
+static void blkverify_err(BlkverifyAIOCB *acb, const char *fmt, ...)
+{
+va_list ap;
+
+va_start(ap, fmt);
+fprintf(stderr, "blkverify: %s sector_num=%ld nb_sectors=%d ",
+acb->is_write ? "write" : "read", acb->sector_num,
+acb->nb_sectors);
+vfprintf(stderr, fmt, ap);
+fprintf(stderr, "\n");
+va_end(ap);
+exit(1);
+}
+
+/* Valid blkverify filenames look like blkverify:path/to/raw_image:path/to/image */
+static int blkverify_open(BlockDriverState *bs, const char *filename, int flags)
+{
+BDRVBlkverifyState *s = bs->opaque;
+int ret;
+char *raw, *c;
+
+/* Parse the blkverify: prefix */
+if (strncmp(filename, "blkverify:", strlen("blkverify:"))) {
+return -EINVAL;
+}
+filename += strlen("blkverify:");
+
+/* Parse the raw image filename */
+c = strchr(filename, ':');
+if (c == NULL) {
+return -EINVAL;
+}
+
+raw = strdup(filename);
+raw[c - filename] = '\0';
+ret = bdrv_file_open(&bs->file, raw, flags);
+free(raw);
+if (ret < 0) {
+return ret;
+}
+filename = c + 1;
+
+/* Open the test file */
+s->test_file = bdrv_new("");
+ret = bdrv_open(s->test_file, filename, flags, NULL);
+if (ret < 0) {
+bdrv_delete(s->test_file);
+s->test_file = NULL;
+return ret;
+}
+
+return 0;
+}
+
+static void blkverify_close(BlockDriverState *bs)
+{
+BDRVBlkverifyState *s = bs->opaque;
+
+bdrv_delete(s->test_file);
+s->test_file = NULL;
+}
+
+static void blkverify_flush(BlockDriverState *bs)
+{
+BDRVBlkverifyState *s = bs->opaque;
+
+/* Only flush test file, the raw file is not important */
+bdrv_flush(s->test_file);
+}
+
+static int64_t blkverify_getlength(BlockDriverState *bs)
+{
+BDRVBlkverifyState *s = bs->opaque;
+
+return bdrv_getlength(s->test_file);
+}
+
+/**
+ * Check that I/O vector contents are identical
+ *
+ * @a:  I/O vector
+ * @b:

[Qemu-devel] [PATCH 16/20] qcow2: Avoid bounce buffers for AIO read requests

2010-09-21 Thread Kevin Wolf

qcow2 used to use bounce buffers for all AIO requests. This implies not only
unnecessary copying but also unbounded allocations, which should be avoided.

This patch removes bounce buffers from the normal AIO read path, and constrains
them to a constant size for encrypted images.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-cluster.c |8 -
 block/qcow2.c |   86 +---
 block/qcow2.h |4 +-
 3 files changed, 68 insertions(+), 30 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index cb2e33f..fb4224a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -350,6 +350,8 @@ static int qcow_read(BlockDriverState *bs, int64_t sector_num,
 BDRVQcowState *s = bs->opaque;
 int ret, index_in_cluster, n, n1;
 uint64_t cluster_offset;
+struct iovec iov;
+QEMUIOVector qiov;
 
 while (nb_sectors > 0) {
 n = nb_sectors;
@@ -364,7 +366,11 @@ static int qcow_read(BlockDriverState *bs, int64_t sector_num,
 if (!cluster_offset) {
 if (bs->backing_hd) {
 /* read from the base image */
-n1 = qcow2_backing_read1(bs->backing_hd, sector_num, buf, n);
+iov.iov_base = buf;
+iov.iov_len = n * 512;
+qemu_iovec_init_external(&qiov, &iov, 1);
+
+n1 = qcow2_backing_read1(bs->backing_hd, &qiov, sector_num, n);
 if (n1 > 0) {
 BLKDBG_EVENT(bs->file, BLKDBG_READ_BACKING);
 ret = bdrv_read(bs->backing_hd, sector_num, buf, n1);
diff --git a/block/qcow2.c b/block/qcow2.c
index a53014d..4a688b4 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -311,8 +311,8 @@ static int qcow_is_allocated(BlockDriverState *bs, int64_t sector_num,
 }
 
 /* handle reading after the end of the backing file */
-int qcow2_backing_read1(BlockDriverState *bs,
-  int64_t sector_num, uint8_t *buf, int nb_sectors)
+int qcow2_backing_read1(BlockDriverState *bs, QEMUIOVector *qiov,
+  int64_t sector_num, int nb_sectors)
 {
 int n1;
 if ((sector_num + nb_sectors) <= bs->total_sectors)
@@ -321,7 +321,9 @@ int qcow2_backing_read1(BlockDriverState *bs,
 n1 = 0;
 else
 n1 = bs->total_sectors - sector_num;
-memset(buf + n1 * 512, 0, 512 * (nb_sectors - n1));
+
+qemu_iovec_memset(qiov, 0, 512 * (nb_sectors - n1));
+
 return n1;
 }
 
@@ -333,6 +335,7 @@ typedef struct QCowAIOCB {
 void *orig_buf;
 int remaining_sectors;
 int cur_nr_sectors;/* number of sectors in current iteration */
+uint64_t bytes_done;
 uint64_t cluster_offset;
 uint8_t *cluster_data;
 BlockDriverAIOCB *hd_aiocb;
@@ -397,15 +400,19 @@ static void qcow_aio_read_cb(void *opaque, int ret)
 /* nothing to do */
 } else {
 if (s->crypt_method) {
-qcow2_encrypt_sectors(s, acb->sector_num, acb->buf, acb->buf,
-acb->cur_nr_sectors, 0,
-&s->aes_decrypt_key);
+qcow2_encrypt_sectors(s, acb->sector_num,  acb->cluster_data,
+acb->cluster_data, acb->cur_nr_sectors, 0, &s->aes_decrypt_key);
+qemu_iovec_reset(&acb->hd_qiov);
+qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done,
+acb->cur_nr_sectors * 512);
+qemu_iovec_from_buffer(&acb->hd_qiov, acb->cluster_data,
+512 * acb->cur_nr_sectors);
 }
 }
 
 acb->remaining_sectors -= acb->cur_nr_sectors;
 acb->sector_num += acb->cur_nr_sectors;
-acb->buf += acb->cur_nr_sectors * 512;
+acb->bytes_done += acb->cur_nr_sectors * 512;
 
 if (acb->remaining_sectors == 0) {
 /* request completed */
@@ -415,6 +422,11 @@ static void qcow_aio_read_cb(void *opaque, int ret)
 
 /* prepare next AIO request */
 acb->cur_nr_sectors = acb->remaining_sectors;
+if (s->crypt_method) {
+acb->cur_nr_sectors = MIN(acb->cur_nr_sectors,
+QCOW_MAX_CRYPT_CLUSTERS * s->cluster_sectors);
+}
+
 ret = qcow2_get_cluster_offset(bs, acb->sector_num << 9,
 &acb->cur_nr_sectors, &acb->cluster_offset);
 if (ret < 0) {
@@ -423,15 +435,17 @@ static void qcow_aio_read_cb(void *opaque, int ret)
 
 index_in_cluster = acb->sector_num & (s->cluster_sectors - 1);
 
+qemu_iovec_reset(&acb->hd_qiov);
+qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done,
+acb->cur_nr_sectors * 512);
+
 if (!acb->cluster_offset) {
+
 if (bs->backing_hd) {
 /* read from the base image */
-n1 = qcow2_backing_read1(bs->backing_hd, acb->sector_num,
-   acb->buf, acb->cur_nr_sectors);
+n1 = qcow2_backing_read1(bs->backing_hd, &acb->hd_qiov,
+acb->sector_num, acb->cur_nr_sectors);
 if (n1 > 0) {
-acb->hd_iov.iov_base = (v
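
The job of qcow2_backing_read1 above, handling reads that run past the
end of the backing file, can be sketched on a flat buffer. This is an
illustration under assumptions: clamp_to_backing is a hypothetical
name, and it zeroes the tail of a plain byte buffer the way the removed
memset line does, rather than operating on a QEMUIOVector.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512

/*
 * Return how many of the nb_sectors starting at sector_num can really
 * be read from a backing file of total_sectors.  The part of buf that
 * lies beyond the end of the backing file is filled with zeros.
 */
static int clamp_to_backing(int64_t total_sectors, int64_t sector_num,
                            uint8_t *buf, int nb_sectors)
{
    int n1;

    assert(sector_num >= 0 && nb_sectors >= 0);

    if (sector_num + nb_sectors <= total_sectors) {
        return nb_sectors;              /* fully inside the backing file */
    }
    if (sector_num >= total_sectors) {
        n1 = 0;                         /* fully beyond the end */
    } else {
        n1 = (int)(total_sectors - sector_num);
    }
    memset(buf + (size_t)n1 * SECTOR_SIZE, 0,
           (size_t)(nb_sectors - n1) * SECTOR_SIZE);
    return n1;
}
```

The caller then issues a real read for only the first n1 sectors and
treats the rest of the request as already satisfied.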

[Qemu-devel] [PATCH 08/20] qcow2: Move sync out of write_refcount_block_entries

2010-09-21 Thread Kevin Wolf

Signed-off-by: Kevin Wolf 
---
 block/qcow2-refcount.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 4c19e7e..7dc75d1 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -444,7 +444,7 @@ static int write_refcount_block_entries(BlockDriverState *bs,
 size = (last_index - first_index) << REFCOUNT_SHIFT;
 
 BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
-ret = bdrv_pwrite_sync(bs->file,
+ret = bdrv_pwrite(bs->file,
 refcount_block_offset + (first_index << REFCOUNT_SHIFT),
 &s->refcount_block_cache[first_index], size);
 if (ret < 0) {
@@ -551,6 +551,8 @@ fail:
 dummy = update_refcount(bs, offset, cluster_offset - offset, -addend);
 }
 
+bdrv_flush(bs->file);
+
 return ret;
 }
 
-- 
1.7.2.2

[Qemu-devel] [PATCH 12/20] virtio-blk: propagate the required alignment

2010-09-21 Thread Kevin Wolf

From: Christoph Hellwig 

Signed-off-by: Christoph Hellwig 
Signed-off-by: Kevin Wolf 
---
 hw/virtio-blk.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index bd6bbe6..a1df26d 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -540,6 +540,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf)
 register_savevm(dev, "virtio-blk", virtio_blk_id++, 2,
 virtio_blk_save, virtio_blk_load, s);
 bdrv_set_removable(s->bs, 0);
+s->bs->buffer_alignment = conf->logical_block_size;
 
 return &s->vdev;
 }
-- 
1.7.2.2

[Qemu-devel] [PATCH 14/20] ide: propagate the required alignment

2010-09-21 Thread Kevin Wolf

From: Christoph Hellwig 

IDE is a bit ugly in this respect.  First, it doesn't really keep track
of a sector size: most of the protocol is in units of 512 bytes, and we
assume 2048 bytes for CDROMs, which is correct most of the time.

Second, IDE allocates an I/O buffer long before we know whether we're
dealing with a CDROM or not, so increase the alignment for the io_buffer
unconditionally.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Kevin Wolf 
---
 hw/ide/core.c |4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 1e466d1..06b6e14 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2645,6 +2645,7 @@ int ide_init_drive(IDEState *s, BlockDriverState *bs,
 if (bdrv_get_type_hint(bs) == BDRV_TYPE_CDROM) {
 s->drive_kind = IDE_CD;
 bdrv_set_change_cb(bs, cdrom_change_cb, s);
+bs->buffer_alignment = 2048;
 } else {
 if (!bdrv_is_inserted(s->bs)) {
 error_report("Device needs media, but drive is empty");
@@ -2679,7 +2680,8 @@ static void ide_init1(IDEBus *bus, int unit)
 s->bus = bus;
 s->unit = unit;
 s->drive_serial = drive_serial++;
-s->io_buffer = qemu_blockalign(s->bs, IDE_DMA_BUF_SECTORS*512 + 4);
+/* we need at least 2k alignment for accessing CDROMs using O_DIRECT */
+s->io_buffer = qemu_memalign(2048, IDE_DMA_BUF_SECTORS*512 + 4);
 s->io_buffer_total_len = IDE_DMA_BUF_SECTORS*512 + 4;
 s->smart_selftest_data = qemu_blockalign(s->bs, 512);
 s->sector_write_timer = qemu_new_timer(vm_clock,
-- 
1.7.2.2

[Qemu-devel] [PATCH 15/20] cutils: qemu_iovec_copy and qemu_iovec_memset

2010-09-21 Thread Kevin Wolf

This adds two functions that work on QEMUIOVectors and will be used by the next
qcow2 patches.

Signed-off-by: Kevin Wolf 
---
 cutils.c  |   50 +-
 qemu-common.h |3 +++
 2 files changed, 44 insertions(+), 9 deletions(-)

diff --git a/cutils.c b/cutils.c
index 036ae3c..5883737 100644
--- a/cutils.c
+++ b/cutils.c
@@ -168,30 +168,50 @@ void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len)
 }
 
 /*
- * Copies iovecs from src to the end dst until src is completely copied or the
- * total size of the copied iovec reaches size. The size of the last copied
- * iovec is changed in order to fit the specified total size if it isn't a
- * perfect fit already.
+ * Copies iovecs from src to the end of dst. It starts copying after skipping
+ * the given number of bytes in src and copies until src is completely copied
+ * or the total size of the copied iovec reaches size. The size of the last
+ * copied iovec is changed in order to fit the specified total size if it isn't
+ * a perfect fit already.
  */
-void qemu_iovec_concat(QEMUIOVector *dst, QEMUIOVector *src, size_t size)
+void qemu_iovec_copy(QEMUIOVector *dst, QEMUIOVector *src, uint64_t skip,
+size_t size)
 {
 int i;
 size_t done;
+void *iov_base;
+uint64_t iov_len;
 
 assert(dst->nalloc != -1);
 
 done = 0;
 for (i = 0; (i < src->niov) && (done != size); i++) {
-if (done + src->iov[i].iov_len > size) {
-qemu_iovec_add(dst, src->iov[i].iov_base, size - done);
+if (skip >= src->iov[i].iov_len) {
+/* Skip the whole iov */
+skip -= src->iov[i].iov_len;
+continue;
+} else {
+/* Skip only part (or nothing) of the iov */
+iov_base = (uint8_t*) src->iov[i].iov_base + skip;
+iov_len = src->iov[i].iov_len - skip;
+skip = 0;
+}
+
+if (done + iov_len > size) {
+qemu_iovec_add(dst, iov_base, size - done);
 break;
 } else {
-qemu_iovec_add(dst, src->iov[i].iov_base, src->iov[i].iov_len);
+qemu_iovec_add(dst, iov_base, iov_len);
 }
-done += src->iov[i].iov_len;
+done += iov_len;
 }
 }
 
+void qemu_iovec_concat(QEMUIOVector *dst, QEMUIOVector *src, size_t size)
+{
+qemu_iovec_copy(dst, src, 0, size);
+}
+
 void qemu_iovec_destroy(QEMUIOVector *qiov)
 {
 assert(qiov->nalloc != -1);
@@ -234,6 +254,18 @@ void qemu_iovec_from_buffer(QEMUIOVector *qiov, const void *buf, size_t count)
 }
 }
 
+void qemu_iovec_memset(QEMUIOVector *qiov, int c, size_t count)
+{
+size_t n;
+int i;
+
+for (i = 0; i < qiov->niov && count; ++i) {
+n = MIN(count, qiov->iov[i].iov_len);
+memset(qiov->iov[i].iov_base, c, n);
+count -= n;
+}
+}
+
 #ifndef _WIN32
 /* Sets a specific flag */
 int fcntl_setfl(int fd, int flag)
diff --git a/qemu-common.h b/qemu-common.h
index dfd3dc0..5544ffd 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -278,11 +278,14 @@ typedef struct QEMUIOVector {
 void qemu_iovec_init(QEMUIOVector *qiov, int alloc_hint);
 void qemu_iovec_init_external(QEMUIOVector *qiov, struct iovec *iov, int niov);
 void qemu_iovec_add(QEMUIOVector *qiov, void *base, size_t len);
+void qemu_iovec_copy(QEMUIOVector *dst, QEMUIOVector *src, uint64_t skip,
+size_t size);
 void qemu_iovec_concat(QEMUIOVector *dst, QEMUIOVector *src, size_t size);
 void qemu_iovec_destroy(QEMUIOVector *qiov);
 void qemu_iovec_reset(QEMUIOVector *qiov);
 void qemu_iovec_to_buffer(QEMUIOVector *qiov, void *buf);
 void qemu_iovec_from_buffer(QEMUIOVector *qiov, const void *buf, size_t count);
+void qemu_iovec_memset(QEMUIOVector *qiov, int c, size_t count);
 
 struct Monitor;
 typedef struct Monitor Monitor;
-- 
1.7.2.2
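
The skip-and-trim logic of qemu_iovec_copy and the bounded fill of
qemu_iovec_memset above can be tried on plain struct iovec arrays. The
following is a sketch under assumptions, not the cutils.c code:
iov_slice and iov_memset are hypothetical names, and the destination is
a fixed-capacity array instead of a growable QEMUIOVector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Describe up to size bytes of src, starting skip bytes in, as entries
 * of dst.  No data is copied; dst entries point into the src buffers.
 * Returns the number of dst entries used, or -1 if dst is too small.
 */
static int iov_slice(struct iovec *dst, int dst_cap,
                     const struct iovec *src, int src_cnt,
                     size_t skip, size_t size)
{
    int n = 0;
    size_t done = 0;

    for (int i = 0; i < src_cnt && done < size; i++) {
        uint8_t *base = src[i].iov_base;
        size_t len = src[i].iov_len;

        if (skip >= len) {
            skip -= len;                /* skip the whole entry */
            continue;
        }
        base += skip;                   /* skip part (or none) of it */
        len -= skip;
        skip = 0;

        if (len > size - done) {
            len = size - done;          /* trim the last entry to fit */
        }
        if (n == dst_cap) {
            return -1;
        }
        dst[n].iov_base = base;
        dst[n].iov_len = len;
        n++;
        done += len;
    }
    return n;
}

/* Fill the first count bytes described by iov with the byte c. */
static void iov_memset(const struct iovec *iov, int cnt, int c, size_t count)
{
    for (int i = 0; i < cnt && count > 0; i++) {
        size_t n = count < iov[i].iov_len ? count : iov[i].iov_len;

        memset(iov[i].iov_base, c, n);
        count -= n;
    }
}
```

Slicing first and then operating on the slice is what lets a request be
processed in pieces without a bounce buffer: each piece is described by
offset (skip) and length (size) into the caller's original vector.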

[Qemu-devel] [PATCH 09/20] qcow2: Move sync out of update_refcount

2010-09-21 Thread Kevin Wolf

Note that the flush is omitted intentionally in qcow2_free_clusters. If
anything, we can leak clusters here if we lose the writes.

Signed-off-by: Kevin Wolf 
---
 block/qcow2-refcount.c |   13 +++--
 1 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 7dc75d1..4fc3f80 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -261,6 +261,8 @@ static int64_t alloc_refcount_block(BlockDriverState *bs, int64_t cluster_index)
 goto fail_block;
 }
 
+bdrv_flush(bs->file);
+
 /* Initialize the new refcount block only after updating its refcount,
  * update_refcount uses the refcount cache itself */
 memset(s->refcount_block_cache, 0, s->cluster_size);
@@ -551,8 +553,6 @@ fail:
 dummy = update_refcount(bs, offset, cluster_offset - offset, -addend);
 }
 
-bdrv_flush(bs->file);
-
 return ret;
 }
 
@@ -575,6 +575,8 @@ static int update_cluster_refcount(BlockDriverState *bs,
 return ret;
 }
 
+bdrv_flush(bs->file);
+
 return get_refcount(bs, cluster_index);
 }
 
@@ -626,6 +628,9 @@ int64_t qcow2_alloc_clusters(BlockDriverState *bs, int64_t size)
 if (ret < 0) {
 return ret;
 }
+
+bdrv_flush(bs->file);
+
 return offset;
 }
 
@@ -803,6 +808,10 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
 if (ret < 0) {
 goto fail;
 }
+
+/* TODO Flushing once for the whole function should
+ * be enough */
+bdrv_flush(bs->file);
 }
 /* compressed clusters are never modified */
 refcount = 2;
-- 
1.7.2.2

Re: [Qemu-devel] Re: [PATCH] new parameter boot=on|off for "-net nic" and "-device" NIC devices

2010-09-21 Thread Michael S. Tsirkin

On Tue, Sep 21, 2010 at 10:16:29AM -0500, Anthony Liguori wrote:
> On 09/19/2010 11:07 AM, Michael S. Tsirkin wrote:
> >On Tue, Sep 14, 2010 at 05:46:55PM +0200, Bernhard Kohl wrote:
> >>This patch was motivated by the following use case: In our system
> >>the VMs usually have 4 NICs, any combination of virtio-net-pci and
> >>pci-assign NIC devices. The VMs boot via gPXE preferably over the
> >>pci-assign devices.
> >>
> >>There is no way to make this work with a combination of the
> >>current options -net -pcidevice -device -optionrom -boot.
> >>
> >>With the parameter boot=off it is possible to avoid loading
> >>and using gPXE option ROMs either for old style "-net nic" or
> >>for "-device" NIC devices. So we can select which NIC is used
> >>for booting.
> >>
> >>A side effect of the boot=off parameter is that unneeded ROMs
> >>which might waste memory are no longer loaded. E.g. if you have
> >>2 virtio-net-pci and 2 pci-assign NICs, 4 option ROMs in total are
> >>loaded and the virtio ROMs take precedence over the pci-assign
> >>ROMs. The BIOS uses the first gPXE ROM which it finds and only
> >>needs one of them even if there are more NICs of the same type.
> >>
> >>Without using the boot=on|off parameter the current behaviour
> >>does not change.
> >>
> >>Signed-off-by: Thomas Ostler
> >>Signed-off-by: Bernhard Kohl
> >I think this is useful, however:
> >
> >- We have bit properties which handle parsing on/off
> >   and other formats automatically. Please don't use string.
> 
> This is unneeded.  Just do romfile= with -device and it will
> suppress the option rom loading.
> 
> IOW:
> 
> -device virtio-net-pci,romfile= -pcidevice ...

Cool, but needs to be documented.

> But BTW, you should be able to select the pci device by doing:
> 
> -boot cdn,menu=on
> 
> And then hitting F12.  We need to come up with a better way to let
> particular BEV or BCV devices be chosen from the command line.
> 
> Regards,
> 
> Anthony Liguori
> 
> >- boot is not a great property name for PCI: what
> >   you actually do is disable option rom.
> >   So maybe call it 'rom' or something like that?
> >- given you have added a property, it can now
> >   be changed with -device, and is visible in -device ?
> >   This also has the advantage of only applying to pci devices
> >   (a -net option would appear to apply to non-pci but have no effect).
> >   Please do not add more flag parsing in qemu-options, net and vl.c
> >
> >To summarize, just add a qdev bit option and check
> >the bit.
> >
> >>---
> >>  hw/pci.c|8 +++-
> >>  hw/pci.h|1 +
> >>  hw/qdev.c   |6 ++
> >>  hw/qdev.h   |1 +
> >>  net.c   |8 
> >>  net.h   |1 +
> >>  qemu-options.hx |8 ++--
> >>  vl.c|   27 +++
> >>  8 files changed, 57 insertions(+), 3 deletions(-)
> >>
> >>diff --git a/hw/pci.c b/hw/pci.c
> >>index a98d6f3..055a2be 100644
> >>--- a/hw/pci.c
> >>+++ b/hw/pci.c
> >>@@ -71,6 +71,7 @@ static struct BusInfo pci_bus_info = {
> >>  DEFINE_PROP_UINT32("rombar",  PCIDevice, rom_bar, 1),
> >>  DEFINE_PROP_BIT("multifunction", PCIDevice, cap_present,
> >>  QEMU_PCI_CAP_MULTIFUNCTION_BITNR, false),
> >>+DEFINE_PROP_STRING("boot", PCIDevice, boot),
> >>  DEFINE_PROP_END_OF_LIST()
> >>  }
> >>  };
> >>@@ -1513,6 +1514,10 @@ PCIDevice *pci_nic_init(NICInfo *nd, const char *default_model,
> >>
> >>  pci_dev = pci_create(bus, devfn, pci_nic_names[i]);
> >>  dev =&pci_dev->qdev;
> >>+if (nd->name)
> >>+dev->id = qemu_strdup(nd->name);
> >>+if (nd->no_boot)
> >>+dev->no_boot = 1;
> >>  qdev_set_nic_properties(dev, nd);
> >>  if (qdev_init(dev)<  0)
> >>  return NULL;
> >>@@ -1693,7 +1698,8 @@ static int pci_qdev_init(DeviceState *qdev, DeviceInfo *base)
> >>  /* rom loading */
> >>  if (pci_dev->romfile == NULL&&  info->romfile != NULL)
> >>  pci_dev->romfile = qemu_strdup(info->romfile);
> >>-pci_add_option_rom(pci_dev);
> >>+if (!qdev->no_boot)
> >>+pci_add_option_rom(pci_dev);
> >>
> >>  if (qdev->hotplugged) {
> >>  rc = bus->hotplug(bus->hotplug_qdev, pci_dev, 1);
> >>diff --git a/hw/pci.h b/hw/pci.h
> >>index 1eab7e7..20aa038 100644
> >>--- a/hw/pci.h
> >>+++ b/hw/pci.h
> >>@@ -172,6 +172,7 @@ struct PCIDevice {
> >>  char *romfile;
> >>  ram_addr_t rom_offset;
> >>  uint32_t rom_bar;
> >>+char *boot;
> >>  };
> >>
> >>  PCIDevice *pci_register_device(PCIBus *bus, const char *name,
> >>diff --git a/hw/qdev.c b/hw/qdev.c
> >>index 35858cb..8445bc9 100644
> >>--- a/hw/qdev.c
> >>+++ b/hw/qdev.c
> >>@@ -249,6 +249,10 @@ DeviceState *qdev_device_add(QemuOpts *opts)
> >>  qdev_free(qdev);
> >>  return NULL;
> >>  }
> >>+if (qemu_opt_get(opts, "boot")) {
> >>+if (!strcmp("off", qemu_strdup(qemu_opt_get(opts, "boot"
> >>+qdev->no_

[Qemu-devel] [PULL 00/20] Block patches

2010-09-21 Thread Kevin Wolf

The following changes since commit a287916c712b0c57a97cd35c663c5e7ba061bc7e:

  Merge remote branch 'mst/for_anthony' into staging (2010-09-20 13:22:20 -0500)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git for-anthony

Bernhard Kohl (2):
  scsi-generic: add missing reset handler
  scsi_bus: fix length and xfer_mode for RESERVE and RELEASE commands

Christoph Hellwig (5):
  use qemu_blockalign consistently
  raw-posix: handle > 512 byte alignment correctly
  virtio-blk: propagate the required alignment
  scsi-disk: propagate the required alignment
  ide: propagate the required alignment

Kevin Wolf (10):
  vvfat: Fix segfault on write to read-only disk
  vvfat: Fix double free for opening the image rw
  vvfat: Use cache=unsafe
  qcow2: Move sync out of write_refcount_block_entries
  qcow2: Move sync out of update_refcount
  qcow2: Move sync out of qcow2_alloc_clusters
  qcow2: Get rid of additional sync on COW
  cutils: qemu_iovec_copy and qemu_iovec_memset
  qcow2: Avoid bounce buffers for AIO read requests
  qcow2: Avoid bounce buffers for AIO write requests

Laurent Vivier (2):
  Improve qemu-nbd performance by 4400 %
  nbd: correctly manage default port

Stefan Hajnoczi (1):
  blkverify: Add block driver for verifying I/O

 Makefile.objs  |2 +-
 block/blkverify.c  |  382 
 block/nbd.c|2 -
 block/qcow2-cluster.c  |   19 ++-
 block/qcow2-refcount.c |   13 ++-
 block/qcow2-snapshot.c |2 +
 block/qcow2.c  |  115 +--
 block/qcow2.h  |4 +-
 block/raw-posix.c  |   79 ++
 block/vvfat.c  |   26 +++-
 cutils.c   |   50 +-
 docs/blkverify.txt |   69 +
 hw/ide/core.c  |4 +-
 hw/scsi-bus.c  |3 +-
 hw/scsi-disk.c |   10 +-
 hw/scsi-generic.c  |   21 +++-
 hw/sd.c|2 +-
 hw/virtio-blk.c|1 +
 nbd.c  |   25 +++-
 posix-aio-compat.c |2 +-
 qemu-common.h  |3 +
 qemu-io.c  |2 +-
 qemu-nbd.c |8 +-
 23 files changed, 721 insertions(+), 123 deletions(-)
 create mode 100644 block/blkverify.c
 create mode 100644 docs/blkverify.txt

[Qemu-devel] [PATCH 18/20] scsi-generic: add missing reset handler

2010-09-21 Thread Kevin Wolf

From: Bernhard Kohl 

Ensure that pending requests of a SCSI generic device are purged on
system reset. This also avoids calling a NULL function in lsi53c895a.
The lsi code was recently changed to call the .qdev.reset function.

Signed-off-by: Bernhard Kohl 
Signed-off-by: Kevin Wolf 
---
 hw/scsi-generic.c |   21 +++--
 1 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index 9538027..7212091 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -455,15 +455,31 @@ static int get_stream_blocksize(BlockDriverState *bdrv)
 return (buf[9] << 16) | (buf[10] << 8) | buf[11];
 }
 
-static void scsi_destroy(SCSIDevice *d)
+static void scsi_generic_purge_requests(SCSIGenericState *s)
 {
-SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev, d);
 SCSIGenericReq *r;
 
 while (!QTAILQ_EMPTY(&s->qdev.requests)) {
 r = DO_UPCAST(SCSIGenericReq, req, QTAILQ_FIRST(&s->qdev.requests));
+if (r->req.aiocb) {
+bdrv_aio_cancel(r->req.aiocb);
+}
 scsi_remove_request(r);
 }
+}
+
+static void scsi_generic_reset(DeviceState *dev)
+{
+SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev.qdev, dev);
+
+scsi_generic_purge_requests(s);
+}
+
+static void scsi_destroy(SCSIDevice *d)
+{
+SCSIGenericState *s = DO_UPCAST(SCSIGenericState, qdev, d);
+
+scsi_generic_purge_requests(s);
 blockdev_mark_auto_del(s->qdev.conf.bs);
 }
 
@@ -537,6 +553,7 @@ static SCSIDeviceInfo scsi_generic_info = {
 .qdev.name= "scsi-generic",
 .qdev.desc= "pass through generic scsi device (/dev/sg*)",
 .qdev.size= sizeof(SCSIGenericState),
+.qdev.reset   = scsi_generic_reset,
 .init = scsi_generic_initfn,
 .destroy  = scsi_destroy,
 .send_command = scsi_send_command,
-- 
1.7.2.2
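
The refactoring in this patch (one purge helper shared by the reset handler
and the destroy path) can be sketched independently of the SCSI code. The
Req and Dev types and all function names below are hypothetical stand-ins,
and incrementing a counter stands in for cancelling the asynchronous I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the request and device state types. */
typedef struct Req {
    struct Req *next;
    int in_flight;      /* non-zero while an async operation is pending */
} Req;

typedef struct Dev {
    Req *requests;      /* head of the pending-request list */
    int cancelled;      /* counts cancelled async operations */
} Dev;

static Req *dev_add_request(Dev *d, int in_flight)
{
    Req *r = calloc(1, sizeof(*r));

    assert(r != NULL);
    r->in_flight = in_flight;
    r->next = d->requests;
    d->requests = r;
    return r;
}

/* Shared helper: cancel in-flight I/O first, then drop every request. */
static void dev_purge_requests(Dev *d)
{
    while (d->requests != NULL) {
        Req *r = d->requests;

        if (r->in_flight) {
            d->cancelled++;     /* stands in for cancelling the async I/O */
        }
        d->requests = r->next;
        free(r);
    }
}

/* Reset only needs to purge; the device itself stays usable. */
static void dev_reset(Dev *d)
{
    dev_purge_requests(d);
}

/* Destroy purges as well; further teardown would follow here. */
static void dev_destroy(Dev *d)
{
    dev_purge_requests(d);
}
```

Calling the purge helper a second time on an empty list is a no-op, so a
reset followed by a destroy is safe in this sketch.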

[Qemu-devel] [PATCH 17/20] qcow2: Avoid bounce buffers for AIO write requests

2010-09-21 Thread Kevin Wolf

qcow2 used to use bounce buffers for any AIO requests. This does not only imply
unnecessary copying, but also unbounded allocations which should be avoided.

This patch removes bounce buffers from the normal AIO write path. Encrypted
images continue to use a bounce buffer, however with constant size.

Signed-off-by: Kevin Wolf 
---
 block/qcow2.c |   41 ++---
 1 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 4a688b4..ee3481b 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -331,15 +331,12 @@ typedef struct QCowAIOCB {
 BlockDriverAIOCB common;
 int64_t sector_num;
 QEMUIOVector *qiov;
-uint8_t *buf;
-void *orig_buf;
 int remaining_sectors;
 int cur_nr_sectors;/* number of sectors in current iteration */
 uint64_t bytes_done;
 uint64_t cluster_offset;
 uint8_t *cluster_data;
 BlockDriverAIOCB *hd_aiocb;
-struct iovec hd_iov;
 QEMUIOVector hd_qiov;
 QEMUBH *bh;
 QCowL2Meta l2meta;
@@ -530,14 +527,7 @@ static QCowAIOCB *qcow_aio_setup(BlockDriverState *bs,
 acb->sector_num = sector_num;
 acb->qiov = qiov;
 
-if (!is_write) {
-qemu_iovec_init(&acb->hd_qiov, qiov->niov);
-} else if (qiov->niov == 1) {
-acb->buf = (uint8_t *)qiov->iov->iov_base;
-} else {
-acb->buf = acb->orig_buf = qemu_blockalign(bs, qiov->size);
-qemu_iovec_to_buffer(qiov, acb->buf);
-}
+qemu_iovec_init(&acb->hd_qiov, qiov->niov);
 
 acb->bytes_done = 0;
 acb->remaining_sectors = nb_sectors;
@@ -589,7 +579,6 @@ static void qcow_aio_write_cb(void *opaque, int ret)
 BlockDriverState *bs = acb->common.bs;
 BDRVQcowState *s = bs->opaque;
 int index_in_cluster;
-const uint8_t *src_buf;
 int n_end;
 
 acb->hd_aiocb = NULL;
@@ -605,7 +594,7 @@ static void qcow_aio_write_cb(void *opaque, int ret)
 
 acb->remaining_sectors -= acb->cur_nr_sectors;
 acb->sector_num += acb->cur_nr_sectors;
-acb->buf += acb->cur_nr_sectors * 512;
+acb->bytes_done += acb->cur_nr_sectors * 512;
 
 if (acb->remaining_sectors == 0) {
 /* request completed */
@@ -636,20 +625,27 @@ static void qcow_aio_write_cb(void *opaque, int ret)
 
 assert((acb->cluster_offset & 511) == 0);
 
+qemu_iovec_reset(&acb->hd_qiov);
+qemu_iovec_copy(&acb->hd_qiov, acb->qiov, acb->bytes_done,
+acb->cur_nr_sectors * 512);
+
 if (s->crypt_method) {
 if (!acb->cluster_data) {
 acb->cluster_data = qemu_mallocz(QCOW_MAX_CRYPT_CLUSTERS *
  s->cluster_size);
 }
-qcow2_encrypt_sectors(s, acb->sector_num, acb->cluster_data, acb->buf,
-acb->cur_nr_sectors, 1, &s->aes_encrypt_key);
-src_buf = acb->cluster_data;
-} else {
-src_buf = acb->buf;
+
+assert(acb->hd_qiov.size <= QCOW_MAX_CRYPT_CLUSTERS * s->cluster_size);
+qemu_iovec_to_buffer(&acb->hd_qiov, acb->cluster_data);
+
+qcow2_encrypt_sectors(s, acb->sector_num, acb->cluster_data,
+acb->cluster_data, acb->cur_nr_sectors, 1, &s->aes_encrypt_key);
+
+qemu_iovec_reset(&acb->hd_qiov);
+qemu_iovec_add(&acb->hd_qiov, acb->cluster_data,
+acb->cur_nr_sectors * 512);
 }
-acb->hd_iov.iov_base = (void *)src_buf;
-acb->hd_iov.iov_len = acb->cur_nr_sectors * 512;
-qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+
 BLKDBG_EVENT(bs->file, BLKDBG_WRITE_AIO);
 acb->hd_aiocb = bdrv_aio_writev(bs->file,
 (acb->cluster_offset >> 9) + index_in_cluster,
@@ -667,9 +663,8 @@ fail:
 QLIST_REMOVE(&acb->l2meta, next_in_flight);
 }
 done:
-if (acb->qiov->niov > 1)
-qemu_vfree(acb->orig_buf);
 acb->common.cb(acb->common.opaque, ret);
+qemu_iovec_destroy(&acb->hd_qiov);
 qemu_aio_release(acb);
 }
 
-- 
1.7.2.2
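
For the encrypted case the patch still linearizes the current chunk into a
buffer of constant size before encrypting it in place. A minimal standalone
sketch of that gather step, assuming plain struct iovec arrays, is shown
below; iov_gather() is a hypothetical name, not a QEMU function, and the
assert plays the role of the size check in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Hypothetical helper (not QEMU code): linearize a scatter/gather list
 * into one fixed-size buffer. The caller guarantees that the data fits,
 * which is what allows the buffer to keep a constant size.
 * Returns the number of bytes gathered.
 */
static size_t iov_gather(void *buf, size_t buf_size,
                         const struct iovec *iov, int cnt)
{
    uint8_t *p = buf;
    size_t total = 0;

    for (int i = 0; i < cnt; i++) {
        /* the remaining space must cover this element */
        assert(iov[i].iov_len <= buf_size - total);
        memcpy(p + total, iov[i].iov_base, iov[i].iov_len);
        total += iov[i].iov_len;
    }
    return total;
}
```

Because each iteration handles at most a bounded number of sectors, the
buffer size does not depend on the size of the guest request.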

[Qemu-devel] kvm networking todo wiki

2010-09-21 Thread Michael S. Tsirkin

I've put up a wiki page with a kvm networking todo list,
mainly to avoid effort duplication, but also in the hope
to draw attention to what I think we should try addressing
in KVM:

http://www.linux-kvm.org/page/NetworkingTodo

This page could cover all networking related activity in KVM,
currently most info is related to virtio-net.

Note: if there's no developer listed for an item,
this just means I don't know of anyone actively working
on an issue at the moment, not that no one intends to.

I would appreciate it if others working on one of the items on this list
would add their names so we can communicate better.  If others like this
wiki page, please go ahead and add stuff you are working on if any.

It would be especially nice to add autotest projects:
there is just a short test matrix and a catch-all
'Cover test matrix with autotest', currently.

Currently there are some links to Red Hat bugzilla entries,
feel free to add links to other bugzillas.

Thanks!

-- 
MST

[Qemu-devel] [PATCH 03/20] vvfat: Use cache=unsafe

2010-09-21 Thread Kevin Wolf

From: Kevin Wolf 

The qcow file used for write support in vvfat is a temporary file,
so we can use cache=unsafe there. Without this, write support is just
too slow to be of any use.

Signed-off-by: Kevin Wolf 
---
 block/vvfat.c |   14 ++
 1 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index 0772037..53e57bf 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2788,6 +2788,7 @@ static int enable_write_target(BDRVVVFATState *s)
 {
 BlockDriver *bdrv_qcow;
 QEMUOptionParameter *options;
+int ret;
 int size = sector2cluster(s, s->sector_count);
 s->used_clusters = calloc(size, 1);
 
@@ -2803,11 +2804,16 @@ static int enable_write_target(BDRVVVFATState *s)
 
 if (bdrv_create(bdrv_qcow, s->qcow_filename, options) < 0)
return -1;
+
 s->qcow = bdrv_new("");
-if (s->qcow == NULL ||
-bdrv_open(s->qcow, s->qcow_filename, BDRV_O_RDWR, bdrv_qcow) < 0)
-{
-   return -1;
+if (s->qcow == NULL) {
+return -1;
+}
+
+ret = bdrv_open(s->qcow, s->qcow_filename,
+BDRV_O_RDWR | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH, bdrv_qcow);
+if (ret < 0) {
+   return ret;
 }
 
 #ifndef _WIN32
-- 
1.7.2.2

[Qemu-devel] How to fix NEON of cortext-a9 on qemu

2010-09-21 Thread Pham Van Thiet

Hello everybody,
Some days ago, I read a topic about "QEMU state of ARM NEON support" from 
address: http://comments.gmane.org/gmane.comp.emulators.qemu/65999 I dowloaded 
test file anh run it on QEMU for cortex-a8 because the reference test file is 
written for cortex-a8 and I also listed some instructions that run incorrectly 
and some instructions not support by QEMU (test for cortex-a8). Now, I must 
list 

detail all NEON instructions of cortex-a9 that run incorrectly and not 
support by qemu, but I don't have reference file, if you have neon instructions 
list that not support and run incorrectly by QEMU, please send them for me. 
Actual, I met much difference for this task because I am a new bie.
And I also learn about how to implement a neon instruction for qemu, but I 
don't know what is file on qemu I will change, please help me. If you have an 
example for it please send it for me. Or if you have a patch file for NEON 
instruction on qemu please send it for me?
Thanks and best regard,
 

 
Pham Van Thiet

[Qemu-devel] [PATCH v6 00/10] initial spice support.

2010-09-21 Thread Gerd Hoffmann

  Hi,

Here comes v6 of the initial spice support patch series which is
largely a repost of v5.

 * Detect spice in configure, Makefile windup.
 * Support for keyboard, mouse and tablet.
 * Support for simple display output (works as DisplayChangeListener,
   plays with any gfx card, sends simple draw commands to update
   dirty regions).

Note that this patch series does *not* yet contain the qxl paravirtual
gfx card.  That will come as part of an additional patch series after
sorting the vgabios support.

The patches are also available in the git repository at:
  git://anongit.freedesktop.org/spice/qemu submit.6

Changes since v5:
  * Rebased to latest master.
  * Sorted some minor conflicts with trace patches.

Changes since v4:
  * Code style fixups.
  * Small bug fix (display_remote not being set when configuring
spice via config file).

Changes since v3:
  * Drop global spice_server variable, provide a thin wrapper function
instead so spice interfaces can be registered without needing
spice_server.
  * Update locking comments.

Changes since v2:
  * Add copyright headers to the files.
  * Add dprint for debug logging.
  * Add mapping for buttons and leds.
  * Add comments for locking+threads.
  * Drop includes which qemu-common.h brings in.
  * Compile -spice switch unconditionally.
  * Hook up spice init using module.h, drop #ifdefs.
  * Misc minor tweaks.


Gerd Hoffmann (10):
  Use display types for local display only.
  Use machine_init() to register virtfs config options.
  add pflib: PixelFormat conversion library.
  configure: add logging
  add spice into the configure file
  spice: core bits
  spice: add keyboard
  spice: add mouse
  spice: simple display
  spice: add tablet support

 Makefile.objs  |3 +
 configure  |   42 +-
 fsdev/qemu-fsdev.c |9 +
 pflib.c|  213 +++
 pflib.h|   20 +++
 qemu-config.c  |   18 +++
 qemu-config.h  |1 +
 qemu-options.hx|   21 +++
 sysemu.h   |1 -
 ui/qemu-spice.h|   41 +
 ui/spice-core.c|  189 
 ui/spice-display.c |  412 
 ui/spice-display.h |   69 +
 ui/spice-input.c   |  217 +++
 vl.c   |   50 +--
 15 files changed, 1287 insertions(+), 19 deletions(-)
 create mode 100644 pflib.c
 create mode 100644 pflib.h
 create mode 100644 ui/qemu-spice.h
 create mode 100644 ui/spice-core.c
 create mode 100644 ui/spice-display.c
 create mode 100644 ui/spice-display.h
 create mode 100644 ui/spice-input.c

[Qemu-devel] [PATCH v6 04/10] configure: add logging

2010-09-21 Thread Gerd Hoffmann

Write compile commands and messages to config.log.
Useful for debugging configure.

Signed-off-by: Gerd Hoffmann 
---
 configure |7 +--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 3bfc5e9..66b1d0b 100755
--- a/configure
+++ b/configure
@@ -16,15 +16,18 @@ TMPO="${TMPDIR1}/qemu-conf-${RANDOM}-$$-${RANDOM}.o"
 TMPE="${TMPDIR1}/qemu-conf-${RANDOM}-$$-${RANDOM}.exe"
 
 trap "rm -f $TMPC $TMPO $TMPE ; exit" EXIT INT QUIT TERM
+rm -f config.log
 
 compile_object() {
-  $cc $QEMU_CFLAGS -c -o $TMPO $TMPC > /dev/null 2> /dev/null
+  echo $cc $QEMU_CFLAGS -c -o $TMPO $TMPC >> config.log
+  $cc $QEMU_CFLAGS -c -o $TMPO $TMPC >> config.log 2>&1
 }
 
 compile_prog() {
   local_cflags="$1"
   local_ldflags="$2"
-  $cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags > /dev/null 2> /dev/null
+  echo $cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags >> config.log
+  $cc $QEMU_CFLAGS $local_cflags -o $TMPE $TMPC $LDFLAGS $local_ldflags >> config.log 2>&1
 }
 
 # check whether a command is available to this shell (may be either an
-- 
1.7.1

[Qemu-devel] [PATCH v6 03/10] add pflib: PixelFormat conversion library.

2010-09-21 Thread Gerd Hoffmann


Signed-off-by: Gerd Hoffmann 
---
 Makefile.objs |1 +
 pflib.c   |  213 +
 pflib.h   |   20 ++
 3 files changed, 234 insertions(+), 0 deletions(-)
 create mode 100644 pflib.c
 create mode 100644 pflib.h

diff --git a/Makefile.objs b/Makefile.objs
index 3ef6d80..8268574 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -83,6 +83,7 @@ common-obj-y += qemu-char.o savevm.o #aio.o
 common-obj-y += msmouse.o ps2.o
 common-obj-y += qdev.o qdev-properties.o
 common-obj-y += block-migration.o
+common-obj-y += pflib.o
 
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
diff --git a/pflib.c b/pflib.c
new file mode 100644
index 000..1154d0c
--- /dev/null
+++ b/pflib.c
@@ -0,0 +1,213 @@
+/*
+ * PixelFormat conversion library.
+ *
+ * Author: Gerd Hoffmann 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+#include "qemu-common.h"
+#include "console.h"
+#include "pflib.h"
+
+typedef struct QemuPixel QemuPixel;
+
+typedef void (*pf_convert)(QemuPfConv *conv,
+   void *dst, void *src, uint32_t cnt);
+typedef void (*pf_convert_from)(PixelFormat *pf,
+QemuPixel *dst, void *src, uint32_t cnt);
+typedef void (*pf_convert_to)(PixelFormat *pf,
+  void *dst, QemuPixel *src, uint32_t cnt);
+
+struct QemuPfConv {
+pf_convertconvert;
+PixelFormat   src;
+PixelFormat   dst;
+
+/* for copy_generic() */
+pf_convert_from   conv_from;
+pf_convert_to conv_to;
+QemuPixel *conv_buf;
+uint32_t  conv_cnt;
+};
+
+struct QemuPixel {
+uint8_t red;
+uint8_t green;
+uint8_t blue;
+uint8_t alpha;
+};
+
+/* --- */
+/* PixelFormat -> QemuPixel conversions*/
+
+static void conv_16_to_pixel(PixelFormat *pf,
+ QemuPixel *dst, void *src, uint32_t cnt)
+{
+uint16_t *src16 = src;
+
+while (cnt > 0) {
+dst->red   = ((*src16 & pf->rmask) >> pf->rshift) << (8 - pf->rbits);
+dst->green = ((*src16 & pf->gmask) >> pf->gshift) << (8 - pf->gbits);
+dst->blue  = ((*src16 & pf->bmask) >> pf->bshift) << (8 - pf->bbits);
+dst->alpha = ((*src16 & pf->amask) >> pf->ashift) << (8 - pf->abits);
+dst++, src16++, cnt--;
+}
+}
+
+/* assumes pf->{r,g,b,a}bits == 8 */
+static void conv_32_to_pixel_fast(PixelFormat *pf,
+  QemuPixel *dst, void *src, uint32_t cnt)
+{
+uint32_t *src32 = src;
+
+while (cnt > 0) {
+dst->red   = (*src32 & pf->rmask) >> pf->rshift;
+dst->green = (*src32 & pf->gmask) >> pf->gshift;
+dst->blue  = (*src32 & pf->bmask) >> pf->bshift;
+dst->alpha = (*src32 & pf->amask) >> pf->ashift;
+dst++, src32++, cnt--;
+}
+}
+
+static void conv_32_to_pixel_generic(PixelFormat *pf,
+ QemuPixel *dst, void *src, uint32_t cnt)
+{
+uint32_t *src32 = src;
+
+while (cnt > 0) {
+if (pf->rbits < 8) {
+dst->red   = ((*src32 & pf->rmask) >> pf->rshift) << (8 - pf->rbits);
+} else {
+dst->red   = ((*src32 & pf->rmask) >> pf->rshift) >> (pf->rbits - 8);
+}
+if (pf->gbits < 8) {
+dst->green = ((*src32 & pf->gmask) >> pf->gshift) << (8 - pf->gbits);
+} else {
+dst->green = ((*src32 & pf->gmask) >> pf->gshift) >> (pf->gbits - 8);
+}
+if (pf->bbits < 8) {
+dst->blue  = ((*src32 & pf->bmask) >> pf->bshift) << (8 - pf->bbits);
+} else {
+dst->blue  = ((*src32 & pf->bmask) >> pf->bshift) >> (pf->bbits - 8);
+}
+if (pf->abits < 8) {
+dst->alpha = ((*src32 & pf->amask) >> pf->ashift) << (8 - pf->abits);
+} else {
+dst->alpha = ((*src32 & pf->amask) >> pf->ashift) >> (pf->abits - 8);
+}
+dst++, src32++, cnt--;
+}
+}
+
+/* --- */
+/* QemuPixel -> PixelFormat conversions*/
+
+static void conv_pixel_to_16(PixelFormat *pf,
+ void *dst, QemuPixel *src, uint32_t cnt)
+{
+uint16_t *dst16 = dst;
+
+while (cnt > 0) {
+*dst16  = ((uint16_t)src->red   >> (8 - pf->rbits)) << pf->rshift;
+*dst16 |= ((uint16_t)src->green >> (8 - pf->gbits)) << pf->gshift;
+*dst16 |= ((uint16_t)src->blue  >> (8 - pf->bbits)) << pf->bshift;
+*dst16 |= ((uint16_t)src->alpha >> (8 - pf->abits)) << pf->ashift;
+dst16++, src++, cnt--;
+}
+}
+
+static void conv_pixel_to_32(PixelFormat *pf,
+ void *dst, QemuPixe
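
The patch text is cut off at this point. To illustrate the mask/shift scheme
used by conv_16_to_pixel() and conv_pixel_to_16() above, here is a minimal
standalone sketch for a single pixel. The Fmt and Pixel types are hypothetical
reductions of PixelFormat and QemuPixel (alpha is left out), and the RGB565
layout is an assumption chosen for the example, not something the patch
prescribes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduction of PixelFormat to the fields used here. */
typedef struct Fmt {
    uint8_t  rbits, gbits, bbits;
    uint8_t  rshift, gshift, bshift;
    uint32_t rmask, gmask, bmask;
} Fmt;

/* Hypothetical reduction of QemuPixel: 8 bits per channel, no alpha. */
typedef struct Pixel {
    uint8_t red, green, blue;
} Pixel;

/* Assumed example layout: 5 bits red, 6 bits green, 5 bits blue. */
static const Fmt rgb565 = {
    5, 6, 5,
    11, 5, 0,
    0xf800, 0x07e0, 0x001f,
};

/* Isolate each channel, then widen it to 8 bits by shifting left. */
static Pixel pixel_from_16(const Fmt *pf, uint16_t v)
{
    Pixel p;

    assert(pf->rbits <= 8 && pf->gbits <= 8 && pf->bbits <= 8);
    p.red   = (uint8_t)(((v & pf->rmask) >> pf->rshift) << (8 - pf->rbits));
    p.green = (uint8_t)(((v & pf->gmask) >> pf->gshift) << (8 - pf->gbits));
    p.blue  = (uint8_t)(((v & pf->bmask) >> pf->bshift) << (8 - pf->bbits));
    return p;
}

/* Drop the low bits of each channel and move it into position. */
static uint16_t pixel_to_16(const Fmt *pf, Pixel p)
{
    uint16_t v;

    assert(pf->rbits <= 8 && pf->gbits <= 8 && pf->bbits <= 8);
    v  = (uint16_t)((p.red   >> (8 - pf->rbits)) << pf->rshift);
    v |= (uint16_t)((p.green >> (8 - pf->gbits)) << pf->gshift);
    v |= (uint16_t)((p.blue  >> (8 - pf->bbits)) << pf->bshift);
    return v;
}
```

Widening by a plain left shift leaves the low bits zero, so a fully
saturated 5-bit channel becomes 248 rather than 255. Converting a 16-bit
value to a Pixel and back is lossless with this scheme.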

[Qemu-devel] [PATCH v6 07/10] spice: add keyboard

2010-09-21 Thread Gerd Hoffmann

Open keyboard channel.  Now you can type into the spice client and the
keyboard events are sent to your guest.  You'll need some other display
like vnc to actually see the guest responding to them though.

Signed-off-by: Gerd Hoffmann 
---
 Makefile.objs|2 +-
 ui/qemu-spice.h  |1 +
 ui/spice-core.c  |2 +
 ui/spice-input.c |   85 ++
 4 files changed, 89 insertions(+), 1 deletions(-)
 create mode 100644 ui/spice-input.c

diff --git a/Makefile.objs b/Makefile.objs
index 2b58109..a95106f 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -88,7 +88,7 @@ common-obj-y += pflib.o
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
 
-common-obj-$(CONFIG_SPICE) += ui/spice-core.o
+common-obj-$(CONFIG_SPICE) += ui/spice-core.o ui/spice-input.o
 
 audio-obj-y = audio.o noaudio.o wavaudio.o mixeng.o
 audio-obj-$(CONFIG_SDL) += sdlaudio.o
diff --git a/ui/qemu-spice.h b/ui/qemu-spice.h
index 50faefb..175c961 100644
--- a/ui/qemu-spice.h
+++ b/ui/qemu-spice.h
@@ -28,6 +28,7 @@
 extern int using_spice;
 
 void qemu_spice_init(void);
+void qemu_spice_input_init(void);
 int qemu_spice_add_interface(SpiceBaseInstance *sin);
 
 #else  /* CONFIG_SPICE */
diff --git a/ui/spice-core.c b/ui/spice-core.c
index 006604b..8b5e4a8 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -167,6 +167,8 @@ void qemu_spice_init(void)
 
 spice_server_init(spice_server, &core_interface);
 using_spice = 1;
+
+qemu_spice_input_init();
 }
 
 int qemu_spice_add_interface(SpiceBaseInstance *sin)
diff --git a/ui/spice-input.c b/ui/spice-input.c
new file mode 100644
index 000..5538a79
--- /dev/null
+++ b/ui/spice-input.c
@@ -0,0 +1,85 @@
+/*
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include "qemu-common.h"
+#include "qemu-spice.h"
+#include "console.h"
+
+/* keyboard bits */
+
+typedef struct QemuSpiceKbd {
+SpiceKbdInstance sin;
+int ledstate;
+} QemuSpiceKbd;
+
+static void kbd_push_key(SpiceKbdInstance *sin, uint8_t frag);
+static uint8_t kbd_get_leds(SpiceKbdInstance *sin);
+static void kbd_leds(void *opaque, int l);
+
+static const SpiceKbdInterface kbd_interface = {
+.base.type  = SPICE_INTERFACE_KEYBOARD,
+.base.description   = "qemu keyboard",
+.base.major_version = SPICE_INTERFACE_KEYBOARD_MAJOR,
+.base.minor_version = SPICE_INTERFACE_KEYBOARD_MINOR,
+.push_scan_freg = kbd_push_key,
+.get_leds   = kbd_get_leds,
+};
+
+static void kbd_push_key(SpiceKbdInstance *sin, uint8_t frag)
+{
+kbd_put_keycode(frag);
+}
+
+static uint8_t kbd_get_leds(SpiceKbdInstance *sin)
+{
+QemuSpiceKbd *kbd = container_of(sin, QemuSpiceKbd, sin);
+return kbd->ledstate;
+}
+
+static void kbd_leds(void *opaque, int ledstate)
+{
+QemuSpiceKbd *kbd = opaque;
+
+kbd->ledstate = 0;
+if (ledstate & QEMU_SCROLL_LOCK_LED) {
+kbd->ledstate |= SPICE_KEYBOARD_MODIFIER_FLAGS_SCROLL_LOCK;
+}
+if (ledstate & QEMU_NUM_LOCK_LED) {
+kbd->ledstate |= SPICE_KEYBOARD_MODIFIER_FLAGS_NUM_LOCK;
+}
+if (ledstate & QEMU_CAPS_LOCK_LED) {
+kbd->ledstate |= SPICE_KEYBOARD_MODIFIER_FLAGS_CAPS_LOCK;
+}
+spice_server_kbd_leds(&kbd->sin, ledstate);
+}
+
+void qemu_spice_input_init(void)
+{
+QemuSpiceKbd *kbd;
+
+kbd = qemu_mallocz(sizeof(*kbd));
+kbd->sin.base.sif = &kbd_interface.base;
+qemu_spice_add_interface(&kbd->sin.base);
+qemu_add_led_event_handler(kbd_leds, kbd);
+}
-- 
1.7.1
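
The LED handling in kbd_leds() is a translation between two sets of bit
flags. A table-driven variant of the same idea is sketched below. All
constant names and values are hypothetical; they are chosen to differ on
purpose so that the translation is visible, and they are not the QEMU or
spice definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bit values; the two sides deliberately disagree. */
enum { SRC_LED_SCROLL = 1 << 0, SRC_LED_NUM = 1 << 1, SRC_LED_CAPS = 1 << 2 };
enum { DST_LED_SCROLL = 1 << 2, DST_LED_NUM = 1 << 0, DST_LED_CAPS = 1 << 1 };

/* Translate a mask of SRC_LED_* bits into the matching DST_LED_* bits. */
static int translate_leds(int src)
{
    static const struct { int from; int to; } map[] = {
        { SRC_LED_SCROLL, DST_LED_SCROLL },
        { SRC_LED_NUM,    DST_LED_NUM    },
        { SRC_LED_CAPS,   DST_LED_CAPS   },
    };
    int dst = 0;

    /* reject bits this table does not know about */
    assert((src & ~(SRC_LED_SCROLL | SRC_LED_NUM | SRC_LED_CAPS)) == 0);

    for (size_t i = 0; i < sizeof(map) / sizeof(map[0]); i++) {
        if (src & map[i].from) {
            dst |= map[i].to;
        }
    }
    return dst;
}
```

Note that kbd_leds() in the patch stores the translated mask in
kbd->ledstate but passes the untranslated ledstate argument to
spice_server_kbd_leds(). The sketch returns only the translated value; which
of the two that call is meant to receive is not clear from the patch alone.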

[Qemu-devel] [PATCH v6 06/10] spice: core bits

2010-09-21 Thread Gerd Hoffmann

Add -spice command line switch.  Supports setting the password and port for
now.  With this patch applied the spice client can successfully connect
to qemu.  You can't do anything useful yet though.

Signed-off-by: Gerd Hoffmann 
---
 Makefile.objs   |2 +
 qemu-config.c   |   18 +
 qemu-config.h   |1 +
 qemu-options.hx |   21 ++
 ui/qemu-spice.h |   39 
 ui/spice-core.c |  187 +++
 vl.c|   14 
 7 files changed, 282 insertions(+), 0 deletions(-)
 create mode 100644 ui/qemu-spice.h
 create mode 100644 ui/spice-core.c

diff --git a/Makefile.objs b/Makefile.objs
index 8268574..2b58109 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -88,6 +88,8 @@ common-obj-y += pflib.o
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
 
+common-obj-$(CONFIG_SPICE) += ui/spice-core.o
+
 audio-obj-y = audio.o noaudio.o wavaudio.o mixeng.o
 audio-obj-$(CONFIG_SDL) += sdlaudio.o
 audio-obj-$(CONFIG_OSS) += ossaudio.o
diff --git a/qemu-config.c b/qemu-config.c
index e3b746c..5c6ae63 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -351,6 +351,24 @@ static QemuOptsList qemu_cpudef_opts = {
 },
 };
 
+QemuOptsList qemu_spice_opts = {
+.name = "spice",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_spice_opts.head),
+.desc = {
+{
+.name = "port",
+.type = QEMU_OPT_NUMBER,
+},{
+.name = "password",
+.type = QEMU_OPT_STRING,
+},{
+.name = "disable-ticketing",
+.type = QEMU_OPT_BOOL,
+},
+{ /* end if list */ }
+},
+};
+
 static QemuOptsList *vm_config_groups[32] = {
 &qemu_drive_opts,
 &qemu_chardev_opts,
diff --git a/qemu-config.h b/qemu-config.h
index 533a049..20d707f 100644
--- a/qemu-config.h
+++ b/qemu-config.h
@@ -3,6 +3,7 @@
 
 extern QemuOptsList qemu_fsdev_opts;
 extern QemuOptsList qemu_virtfs_opts;
+extern QemuOptsList qemu_spice_opts;
 
 QemuOptsList *qemu_find_opts(const char *group);
 void qemu_add_opts(QemuOptsList *list);
diff --git a/qemu-options.hx b/qemu-options.hx
index a0b5ae9..718d47a 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -670,6 +670,27 @@ STEXI
 Enable SDL.
 ETEXI
 
+DEF("spice", HAS_ARG, QEMU_OPTION_spice,
+"-spiceenable spice\n", QEMU_ARCH_ALL)
+STEXI
+@item -spice @var{option}[,@var{option}[,...]]
+@findex -spice
+Enable the spice remote desktop protocol. Valid options are
+
+@table @option
+
+@item port=
+Set the TCP port spice is listening on.
+
+@item password=
+Set the password you need to authenticate.
+
+@item disable-ticketing
+Allow client connects without authentication.
+
+@end table
+ETEXI
+
 DEF("portrait", 0, QEMU_OPTION_portrait,
 "-portrait   rotate graphical output 90 deg left (only PXA LCD)\n",
 QEMU_ARCH_ALL)
diff --git a/ui/qemu-spice.h b/ui/qemu-spice.h
new file mode 100644
index 000..50faefb
--- /dev/null
+++ b/ui/qemu-spice.h
@@ -0,0 +1,39 @@
+/*
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef QEMU_SPICE_H
+#define QEMU_SPICE_H
+
+#ifdef CONFIG_SPICE
+
+#include 
+
+#include "qemu-option.h"
+#include "qemu-config.h"
+
+extern int using_spice;
+
+void qemu_spice_init(void);
+int qemu_spice_add_interface(SpiceBaseInstance *sin);
+
+#else  /* CONFIG_SPICE */
+
+#define using_spice 0
+
+#endif /* CONFIG_SPICE */
+
+#endif /* QEMU_SPICE_H */
diff --git a/ui/spice-core.c b/ui/spice-core.c
new file mode 100644
index 000..006604b
--- /dev/null
+++ b/ui/spice-core.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include 
+#include 
+
+#incl

[Qemu-devel] [PATCH v6 05/10] add spice into the configure file

2010-09-21 Thread Gerd Hoffmann


Signed-off-by: Gerd Hoffmann 
---
 configure |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 66b1d0b..695a632 100755
--- a/configure
+++ b/configure
@@ -325,6 +325,7 @@ user_pie="no"
 zero_malloc=""
 trace_backend="nop"
 trace_file="trace"
+spice=""
 
 # OS specific
 if check_define __linux__ ; then
@@ -630,6 +631,10 @@ for opt do
   ;;
   --enable-kvm) kvm="yes"
   ;;
+  --disable-spice) spice="no"
+  ;;
+  --enable-spice) spice="yes"
+  ;;
   --enable-profiler) profiler="yes"
   ;;
   --enable-cocoa)
@@ -912,6 +917,8 @@ echo "  --enable-vhost-net   enable vhost-net acceleration support"
 echo "  --trace-backend=BTrace backend nop simple ust"
 echo "  --trace-file=NAMEFull PATH,NAME of file to store traces"
 echo "   Default:trace-"
+echo "  --disable-spice  disable spice"
+echo "  --enable-spice   enable spice"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2062,6 +2069,29 @@ if compile_prog "" ""; then
 gcc_attribute_warn_unused_result=yes
 fi
 
+# spice probe
+if test "$spice" != "no" ; then
+  cat > $TMPC << EOF
+#include 
+int main(void) { spice_server_new(); return 0; }
+EOF
+  spice_cflags=$($pkgconfig --cflags spice-protocol spice-server 2>/dev/null)
+  spice_libs=$($pkgconfig --libs spice-protocol spice-server 2>/dev/null)
+  if $pkgconfig --atleast-version=0.5.3 spice-server &&\
+ compile_prog "$spice_cflags" "$spice_libs" ; then
+spice="yes"
+libs_softmmu="$libs_softmmu $spice_libs"
+QEMU_CFLAGS="$QEMU_CFLAGS $spice_cflags"
+  else
+if test "$spice" = "yes" ; then
+  feature_not_found "spice"
+fi
+spice="no"
+  fi
+fi
+
+##
+
 ##
 # check if we have fdatasync
 
@@ -2245,6 +2275,7 @@ echo "uuid support  $uuid"
 echo "vhost-net support $vhost_net"
 echo "Trace backend $trace_backend"
 echo "Trace output file $trace_file-"
+echo "spice support $spice"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2482,6 +2513,10 @@ if test "$fdatasync" = "yes" ; then
   echo "CONFIG_FDATASYNC=y" >> $config_host_mak
 fi
 
+if test "$spice" = "yes" ; then
+  echo "CONFIG_SPICE=y" >> $config_host_mak
+fi
+
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
   echo "CONFIG_BSD=y" >> $config_host_mak
-- 
1.7.1

[Qemu-devel] [PATCH v6 10/10] spice: add tablet support

2010-09-21 Thread Gerd Hoffmann

Add support for the spice tablet interface.  The tablet interface will
be registered (and then used by the spice client) as soon as an absolute
pointing device is available and used by the guest, i.e. you'll have to
configure your guest with '-usbdevice tablet'.

Signed-off-by: Gerd Hoffmann 
---
 ui/spice-display.c |2 +-
 ui/spice-input.c   |   94 
 2 files changed, 88 insertions(+), 8 deletions(-)

diff --git a/ui/spice-display.c b/ui/spice-display.c
index 0bc230e..6702dfd 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -180,7 +180,7 @@ void qemu_spice_create_host_primary(SimpleSpiceDisplay *ssd)
 surface.width  = ds_get_width(ssd->ds);
 surface.height = ds_get_height(ssd->ds);
 surface.stride = -surface.width * 4;
-surface.mouse_mode = 0;
+surface.mouse_mode = true;
 surface.flags  = 0;
 surface.type   = 0;
 surface.mem= (intptr_t)ssd->buf;
diff --git a/ui/spice-input.c b/ui/spice-input.c
index 91cf18d..37c8578 100644
--- a/ui/spice-input.c
+++ b/ui/spice-input.c
@@ -17,6 +17,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -76,9 +77,13 @@ static void kbd_leds(void *opaque, int ledstate)
 
 /* mouse bits */
 
-typedef struct QemuSpiceMouse {
-SpiceMouseInstance sin;
-} QemuSpiceMouse;
+typedef struct QemuSpicePointer {
+SpiceMouseInstance  mouse;
+SpiceTabletInstance tablet;
+int width, height, x, y;
+Notifier mouse_mode;
+bool absolute;
+} QemuSpicePointer;
 
 static int map_buttons(int spice_buttons)
 {
@@ -121,17 +126,92 @@ static const SpiceMouseInterface mouse_interface = {
 .buttons= mouse_buttons,
 };
 
+static void tablet_set_logical_size(SpiceTabletInstance* sin, int width, int height)
+{
+QemuSpicePointer *pointer = container_of(sin, QemuSpicePointer, tablet);
+
+if (height < 16) {
+height = 16;
+}
+if (width < 16) {
+width = 16;
+}
+pointer->width  = width;
+pointer->height = height;
+}
+
+static void tablet_position(SpiceTabletInstance* sin, int x, int y,
+uint32_t buttons_state)
+{
+QemuSpicePointer *pointer = container_of(sin, QemuSpicePointer, tablet);
+
+pointer->x = x * 0x7FFF / (pointer->width - 1);
+pointer->y = y * 0x7FFF / (pointer->height - 1);
+kbd_mouse_event(pointer->x, pointer->y, 0, map_buttons(buttons_state));
+}
+
+
+static void tablet_wheel(SpiceTabletInstance* sin, int wheel,
+ uint32_t buttons_state)
+{
+QemuSpicePointer *pointer = container_of(sin, QemuSpicePointer, tablet);
+
+kbd_mouse_event(pointer->x, pointer->y, wheel, map_buttons(buttons_state));
+}
+
+static void tablet_buttons(SpiceTabletInstance *sin,
+   uint32_t buttons_state)
+{
+QemuSpicePointer *pointer = container_of(sin, QemuSpicePointer, tablet);
+
+kbd_mouse_event(pointer->x, pointer->y, 0, map_buttons(buttons_state));
+}
+
+static const SpiceTabletInterface tablet_interface = {
+.base.type  = SPICE_INTERFACE_TABLET,
+.base.description   = "tablet",
+.base.major_version = SPICE_INTERFACE_TABLET_MAJOR,
+.base.minor_version = SPICE_INTERFACE_TABLET_MINOR,
+.set_logical_size   = tablet_set_logical_size,
+.position   = tablet_position,
+.wheel  = tablet_wheel,
+.buttons= tablet_buttons,
+};
+
+static void mouse_mode_notifier(Notifier *notifier)
+{
+QemuSpicePointer *pointer = container_of(notifier, QemuSpicePointer, mouse_mode);
+bool is_absolute  = kbd_mouse_is_absolute();
+
+if (pointer->absolute == is_absolute) {
+return;
+}
+
+if (is_absolute) {
+qemu_spice_add_interface(&pointer->tablet.base);
+} else {
+spice_server_remove_interface(&pointer->tablet.base);
+}
+pointer->absolute = is_absolute;
+}
+
 void qemu_spice_input_init(void)
 {
 QemuSpiceKbd *kbd;
-QemuSpiceMouse *mouse;
+QemuSpicePointer *pointer;
 
 kbd = qemu_mallocz(sizeof(*kbd));
 kbd->sin.base.sif = &kbd_interface.base;
 qemu_spice_add_interface(&kbd->sin.base);
 qemu_add_led_event_handler(kbd_leds, kbd);
 
-mouse = qemu_mallocz(sizeof(*mouse));
-mouse->sin.base.sif = &mouse_interface.base;
-qemu_spice_add_interface(&mouse->sin.base);
+pointer = qemu_mallocz(sizeof(*pointer));
+pointer->mouse.base.sif  = &mouse_interface.base;
+pointer->tablet.base.sif = &tablet_interface.base;
+qemu_spice_add_interface(&pointer->mouse.base);
+
+pointer->absolute = false;
+pointer->mouse_mode.notify = mouse_mode_notifier;
+qemu_add_mouse_mode_change_notifier(&pointer->mouse_mode);
+mouse_mode_notifier(&pointer->mouse_mode);
 }
-- 
1.7.1

[Qemu-devel] [PATCH v6 02/10] Use machine_init() to register virtfs config options.

2010-09-21 Thread Gerd Hoffmann


Signed-off-by: Gerd Hoffmann 
---
 fsdev/qemu-fsdev.c |9 +
 vl.c   |5 -
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/fsdev/qemu-fsdev.c b/fsdev/qemu-fsdev.c
index ad69b0e..280b8f5 100644
--- a/fsdev/qemu-fsdev.c
+++ b/fsdev/qemu-fsdev.c
@@ -16,6 +16,7 @@
 #include "qemu-queue.h"
 #include "osdep.h"
 #include "qemu-common.h"
+#include "qemu-config.h"
 
 static QTAILQ_HEAD(FsTypeEntry_head, FsTypeListEntry) fstype_entries =
 QTAILQ_HEAD_INITIALIZER(fstype_entries);
@@ -75,3 +76,11 @@ FsTypeEntry *get_fsdev_fsentry(char *id)
 }
 return NULL;
 }
+
+static void fsdev_register_config(void)
+{
+qemu_add_opts(&qemu_fsdev_opts);
+qemu_add_opts(&qemu_virtfs_opts);
+}
+machine_init(fsdev_register_config);
+
diff --git a/vl.c b/vl.c
index f434193..2edc853 100644
--- a/vl.c
+++ b/vl.c
@@ -1863,11 +1863,6 @@ int main(int argc, char **argv, char **envp)
 tb_size = 0;
 autostart= 1;
 
-#ifdef CONFIG_VIRTFS
-qemu_add_opts(&qemu_fsdev_opts);
-qemu_add_opts(&qemu_virtfs_opts);
-#endif
-
 /* first pass of option parsing */
 optind = 1;
 while (optind < argc) {
-- 
1.7.1

[Qemu-devel] [PATCH v6 01/10] Use display types for local display only.

2010-09-21 Thread Gerd Hoffmann

This patch drops DT_VNC.  The display types are only used to select
the local display (i.e. curses, sdl, cocoa, ...).  Remote
displays (for now only vnc, spice will follow) can be enabled
independently.

Signed-off-by: Gerd Hoffmann 
---
 sysemu.h |1 -
 vl.c |   24 +---
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/sysemu.h b/sysemu.h
index a1f6466..b81a70e 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -94,7 +94,6 @@ typedef enum DisplayType
 DT_DEFAULT,
 DT_CURSES,
 DT_SDL,
-DT_VNC,
 DT_NOGRAPHIC,
 } DisplayType;
 
diff --git a/vl.c b/vl.c
index 3f45aa9..f434193 100644
--- a/vl.c
+++ b/vl.c
@@ -176,6 +176,7 @@ static const char *data_dir;
 const char *bios_name = NULL;
 enum vga_retrace_method vga_retrace_method = VGA_RETRACE_DUMB;
 DisplayType display_type = DT_DEFAULT;
+int display_remote = 0;
 const char* keyboard_layout = NULL;
 ram_addr_t ram_size;
 const char *mem_path = NULL;
@@ -2477,7 +2478,7 @@ int main(int argc, char **argv, char **envp)
 }
 break;
case QEMU_OPTION_vnc:
-display_type = DT_VNC;
+display_remote++;
vnc_display = optarg;
break;
 case QEMU_OPTION_no_acpi:
@@ -2921,17 +2922,17 @@ int main(int argc, char **argv, char **envp)
 /* just use the first displaystate for the moment */
 ds = get_displaystate();
 
-if (display_type == DT_DEFAULT) {
+if (display_type == DT_DEFAULT && !display_remote) {
 #if defined(CONFIG_SDL) || defined(CONFIG_COCOA)
 display_type = DT_SDL;
 #else
-display_type = DT_VNC;
 vnc_display = "localhost:0,to=99";
 show_vnc_port = 1;
 #endif
 }
 
 
+/* init local displays */
 switch (display_type) {
 case DT_NOGRAPHIC:
 break;
@@ -2949,7 +2950,12 @@ int main(int argc, char **argv, char **envp)
 cocoa_display_init(ds, full_screen);
 break;
 #endif
-case DT_VNC:
+default:
+break;
+}
+
+/* init remote displays */
+if (vnc_display) {
 vnc_display_init(ds);
 if (vnc_display_open(ds, vnc_display) < 0)
 exit(1);
@@ -2957,12 +2963,10 @@ int main(int argc, char **argv, char **envp)
 if (show_vnc_port) {
 printf("VNC server running on `%s'\n", vnc_display_local_addr(ds));
 }
-break;
-default:
-break;
 }
-dpy_resize(ds);
 
+/* display setup */
+dpy_resize(ds);
 dcl = ds->listeners;
 while (dcl != NULL) {
 if (dcl->dpy_refresh != NULL) {
@@ -2972,12 +2976,10 @@ int main(int argc, char **argv, char **envp)
 }
 dcl = dcl->next;
 }
-
-if (display_type == DT_NOGRAPHIC || display_type == DT_VNC) {
+if (ds->gui_timer == NULL) {
 nographic_timer = qemu_new_timer(rt_clock, nographic_update, NULL);
 qemu_mod_timer(nographic_timer, qemu_get_clock(rt_clock));
 }
-
 text_consoles_set_display(ds);
 
 if (gdbstub_dev && gdbserver_start(gdbstub_dev) < 0) {
-- 
1.7.1
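
The selection logic after this patch can be summarised by the standalone
sketch below.  It is an illustration under assumptions, not the code from
vl.c: DisplayChoice and resolve_display() are hypothetical names, and the
have_sdl flag stands in for the CONFIG_SDL / CONFIG_COCOA preprocessor
check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DT_DEFAULT, DT_CURSES, DT_SDL, DT_NOGRAPHIC } DisplayType;

/* Hypothetical result type: the local display plus the vnc settings. */
typedef struct {
    DisplayType local;
    const char *vnc_display;
    bool show_vnc_port;
} DisplayChoice;

/*
 * Pick defaults only when the user asked for neither a local display
 * type nor a remote display.  Remote displays no longer change 'local'.
 */
static DisplayChoice resolve_display(DisplayType requested, int display_remote,
                                     const char *vnc_display, bool have_sdl)
{
    DisplayChoice c = { requested, vnc_display, false };

    assert(display_remote >= 0);   /* it is a counter of remote displays */
    if (requested == DT_DEFAULT && !display_remote) {
        if (have_sdl) {
            c.local = DT_SDL;
        } else {
            /* No local GUI available: fall back to a local vnc server. */
            c.vnc_display = "localhost:0,to=99";
            c.show_vnc_port = true;
        }
    }
    return c;
}
```

For example, resolve_display(DT_CURSES, 1, ":1", true) keeps DT_CURSES as
the local display and leaves the vnc display string untouched, which is
the "local and remote are independent" behaviour the commit message
describes.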

[Qemu-devel] [PATCH v6 09/10] spice: simple display

2010-09-21 Thread Gerd Hoffmann

With this patch applied you'll actually see the guest's screen in the
spice client.  This does *not* bring qxl and full spice support though.
This is basically the qxl vga mode made more generic, so it plays
together with any qemu-emulated gfx card.  You can display stdvga or
cirrus via spice client.  You can have both vnc and spice enabled and
clients connected at the same time.

Signed-off-by: Gerd Hoffmann 
---
 Makefile.objs  |2 +-
 ui/qemu-spice.h|1 +
 ui/spice-display.c |  412 
 ui/spice-display.h |   69 +
 vl.c   |7 +
 5 files changed, 490 insertions(+), 1 deletions(-)
 create mode 100644 ui/spice-display.c
 create mode 100644 ui/spice-display.h

diff --git a/Makefile.objs b/Makefile.objs
index a95106f..a3113d8 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -88,7 +88,7 @@ common-obj-y += pflib.o
 common-obj-$(CONFIG_BRLAPI) += baum.o
 common-obj-$(CONFIG_POSIX) += migration-exec.o migration-unix.o migration-fd.o
 
-common-obj-$(CONFIG_SPICE) += ui/spice-core.o ui/spice-input.o
+common-obj-$(CONFIG_SPICE) += ui/spice-core.o ui/spice-input.o ui/spice-display.o
 
 audio-obj-y = audio.o noaudio.o wavaudio.o mixeng.o
 audio-obj-$(CONFIG_SDL) += sdlaudio.o
diff --git a/ui/qemu-spice.h b/ui/qemu-spice.h
index 175c961..063c7dc 100644
--- a/ui/qemu-spice.h
+++ b/ui/qemu-spice.h
@@ -29,6 +29,7 @@ extern int using_spice;
 
 void qemu_spice_init(void);
 void qemu_spice_input_init(void);
+void qemu_spice_display_init(DisplayState *ds);
 int qemu_spice_add_interface(SpiceBaseInstance *sin);
 
 #else  /* CONFIG_SPICE */
diff --git a/ui/spice-display.c b/ui/spice-display.c
new file mode 100644
index 000..0bc230e
--- /dev/null
+++ b/ui/spice-display.c
@@ -0,0 +1,412 @@
+/*
+ * Copyright (C) 2010 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) version 3 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include 
+
+#include "qemu-common.h"
+#include "qemu-spice.h"
+#include "qemu-timer.h"
+#include "qemu-queue.h"
+#include "monitor.h"
+#include "console.h"
+#include "sysemu.h"
+
+#include "spice-display.h"
+
+static int debug = 0;
+
+static void __attribute__((format(printf,2,3)))
+dprint(int level, const char *fmt, ...)
+{
+va_list args;
+
+if (level <= debug) {
+va_start(args, fmt);
+vfprintf(stderr, fmt, args);
+va_end(args);
+}
+}
+
+int qemu_spice_rect_is_empty(const QXLRect* r)
+{
+return r->top == r->bottom || r->left == r->right;
+}
+
+void qemu_spice_rect_union(QXLRect *dest, const QXLRect *r)
+{
+if (qemu_spice_rect_is_empty(r)) {
+return;
+}
+
+if (qemu_spice_rect_is_empty(dest)) {
+*dest = *r;
+return;
+}
+
+dest->top = MIN(dest->top, r->top);
+dest->left = MIN(dest->left, r->left);
+dest->bottom = MAX(dest->bottom, r->bottom);
+dest->right = MAX(dest->right, r->right);
+}
+
+/*
+ * Called from spice server thread context (via interface_get_command).
+ * We do *not* hold the global qemu mutex here, so extra care is needed
+ * when calling qemu functions.  Qemu interfaces used:
+ *- pflib (is re-entrant).
+ *- qemu_malloc (underlying glibc malloc is re-entrant).
+ */
+SimpleSpiceUpdate *qemu_spice_create_update(SimpleSpiceDisplay *ssd)
+{
+SimpleSpiceUpdate *update;
+QXLDrawable *drawable;
+QXLImage *image;
+QXLCommand *cmd;
+uint8_t *src, *dst;
+int by, bw, bh;
+
+if (qemu_spice_rect_is_empty(&ssd->dirty)) {
+return NULL;
+};
+
+pthread_mutex_lock(&ssd->lock);
+dprint(2, "%s: lr %d -> %d,  tb -> %d -> %d\n", __FUNCTION__,
+   ssd->dirty.left, ssd->dirty.right,
+   ssd->dirty.top, ssd->dirty.bottom);
+
+update   = qemu_mallocz(sizeof(*update));
+drawable = &update->drawable;
+image= &update->image;
+cmd  = &update->ext.cmd;
+
+bw   = ssd->dirty.right - ssd->dirty.left;
+bh   = ssd->dirty.bottom - ssd->dirty.top;
+update->bitmap = qemu_malloc(bw * bh * 4);
+
+drawable->bbox= ssd->dirty;
+drawable->clip.type   = SPICE_CLIP_TYPE_NONE;
+drawable->effect  = QXL_EFFECT_OPAQUE;
+drawable->release_info.id = (intptr_t)update;
+drawable->type= QXL_DRAW_COPY;
+drawable->surfaces_dest[0] = -1;
+drawable->surfaces_dest[1] = -1;
+drawable->surfaces_dest[2] = -1;
+
+drawable->u.copy.rop_des

[Qemu-devel] [PATCH v6 08/10] spice: add mouse

2010-09-21 Thread Gerd Hoffmann

Open mouse channel.  Now you can move the guest's mouse pointer.
No tablet / absolute positioning (yet) though.

Signed-off-by: Gerd Hoffmann 
---
 ui/spice-input.c |   52 
 1 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/ui/spice-input.c b/ui/spice-input.c
index 5538a79..91cf18d 100644
--- a/ui/spice-input.c
+++ b/ui/spice-input.c
@@ -74,12 +74,64 @@ static void kbd_leds(void *opaque, int ledstate)
 spice_server_kbd_leds(&kbd->sin, ledstate);
 }
 
+/* mouse bits */
+
+typedef struct QemuSpiceMouse {
+SpiceMouseInstance sin;
+} QemuSpiceMouse;
+
+static int map_buttons(int spice_buttons)
+{
+int qemu_buttons = 0;
+
+/*
+ * Note: SPICE_MOUSE_BUTTON_* specifies the wire protocol but this
+ * isn't what we get passed in via interface callbacks for the
+ * middle and right button ...
+ */
+if (spice_buttons & SPICE_MOUSE_BUTTON_MASK_LEFT) {
+qemu_buttons |= MOUSE_EVENT_LBUTTON;
+}
+if (spice_buttons & 0x04 /* SPICE_MOUSE_BUTTON_MASK_MIDDLE */) {
+qemu_buttons |= MOUSE_EVENT_MBUTTON;
+}
+if (spice_buttons & 0x02 /* SPICE_MOUSE_BUTTON_MASK_RIGHT */) {
+qemu_buttons |= MOUSE_EVENT_RBUTTON;
+}
+return qemu_buttons;
+}
+
+static void mouse_motion(SpiceMouseInstance *sin, int dx, int dy, int dz,
+ uint32_t buttons_state)
+{
+kbd_mouse_event(dx, dy, dz, map_buttons(buttons_state));
+}
+
+static void mouse_buttons(SpiceMouseInstance *sin, uint32_t buttons_state)
+{
+kbd_mouse_event(0, 0, 0, map_buttons(buttons_state));
+}
+
+static const SpiceMouseInterface mouse_interface = {
+.base.type  = SPICE_INTERFACE_MOUSE,
+.base.description   = "mouse",
+.base.major_version = SPICE_INTERFACE_MOUSE_MAJOR,
+.base.minor_version = SPICE_INTERFACE_MOUSE_MINOR,
+.motion = mouse_motion,
+.buttons= mouse_buttons,
+};
+
 void qemu_spice_input_init(void)
 {
 QemuSpiceKbd *kbd;
+QemuSpiceMouse *mouse;
 
 kbd = qemu_mallocz(sizeof(*kbd));
 kbd->sin.base.sif = &kbd_interface.base;
 qemu_spice_add_interface(&kbd->sin.base);
 qemu_add_led_event_handler(kbd_leds, kbd);
+
+mouse = qemu_mallocz(sizeof(*mouse));
+mouse->sin.base.sif = &mouse_interface.base;
+qemu_spice_add_interface(&mouse->sin.base);
 }
-- 
1.7.1
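
The button translation in map_buttons() above can be tried out on its own
with the sketch below.  The numeric values are assumptions made for this
example: the MOUSE_EVENT_* values are what I believe QEMU uses, 0x04 and
0x02 are the literals from the patch, and the CB_BUTTON_* names are
hypothetical labels for "the bits as delivered to the callbacks".

```c
#include <assert.h>
#include <stdint.h>

/* QEMU-side button bits; values assumed here for illustration. */
#define MOUSE_EVENT_LBUTTON 0x01
#define MOUSE_EVENT_RBUTTON 0x02
#define MOUSE_EVENT_MBUTTON 0x04

/* Hypothetical names for the bits as passed to the interface callbacks. */
#define CB_BUTTON_LEFT   0x01  /* assumed value of SPICE_MOUSE_BUTTON_MASK_LEFT */
#define CB_BUTTON_RIGHT  0x02  /* literal used by the patch */
#define CB_BUTTON_MIDDLE 0x04  /* literal used by the patch */

/* Translate callback button bits into QEMU mouse event bits. */
static int map_buttons(uint32_t cb_buttons)
{
    int qemu_buttons = 0;

    if (cb_buttons & CB_BUTTON_LEFT) {
        qemu_buttons |= MOUSE_EVENT_LBUTTON;
    }
    if (cb_buttons & CB_BUTTON_MIDDLE) {
        qemu_buttons |= MOUSE_EVENT_MBUTTON;
    }
    if (cb_buttons & CB_BUTTON_RIGHT) {
        qemu_buttons |= MOUSE_EVENT_RBUTTON;
    }
    return qemu_buttons;
}

/* Small usage example: unknown bits are dropped, known bits are mapped. */
void map_buttons_example(void)
{
    assert(map_buttons(CB_BUTTON_LEFT | CB_BUTTON_MIDDLE) ==
           (MOUSE_EVENT_LBUTTON | MOUSE_EVENT_MBUTTON));
    assert(map_buttons(0x08) == 0);
}
```

Testing each bit explicitly, rather than passing the mask through, keeps
the mapping correct even if the two sides ever disagree on bit positions.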

[Qemu-devel] [VGABIOS PATCH 08/11] Add qemu stdvga pci bios

2010-09-21 Thread Gerd Hoffmann

Add PCI vgabios for the qemu standard vga (1234:).
Name it vgabios-stdvga.bin.

Signed-off-by: Gerd Hoffmann 
---
 Makefile |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index d440b93..58f064e 100644
--- a/Makefile
+++ b/Makefile
@@ -14,12 +14,14 @@ RELVERS = `pwd | sed "s-.*/--" | sed "s/vgabios//" | sed "s/-//"`
 
 VGABIOS_DATE = "-DVGABIOS_DATE=\"$(RELDATE)\""
 
-all: bios cirrus-bios
+all: bios cirrus-bios stdvga-bios
 
 bios: vgabios.bin vgabios.debug.bin
 
 cirrus-bios: vgabios-cirrus.bin vgabios-cirrus.debug.bin
 
+stdvga-bios: vgabios-stdvga.bin vgabios-stdvga.debug.bin
+
 clean:
/bin/rm -f  biossums vbetables-gen vbetables.h *.o *.s *.ld86 \
   temp.awk.* vgabios*.orig _vgabios_* _vgabios-debug_* core vgabios*.bin vgabios*.txt $(RELEASE).bin *.bak
@@ -35,18 +37,24 @@ vgabios.bin  : VGAFLAGS := -DVBE -DPCI_VID=0x1234
 vgabios.debug.bin: VGAFLAGS := -DVBE -DPCI_VID=0x1234 -DDEBUG
 vgabios-cirrus.bin   : VGAFLAGS := -DCIRRUS -DPCIBIOS 
 vgabios-cirrus.debug.bin : VGAFLAGS := -DCIRRUS -DPCIBIOS -DCIRRUS_DEBUG
+vgabios-stdvga.bin   : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x1234 -DPCI_DID=0x
+vgabios-stdvga.debug.bin : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x1234 -DPCI_DID=0x -DDEBUG
 
 # dist names
 vgabios.bin  : DISTNAME := VGABIOS-lgpl-latest.bin
 vgabios.debug.bin: DISTNAME := VGABIOS-lgpl-latest.debug.bin
 vgabios-cirrus.bin   : DISTNAME := VGABIOS-lgpl-latest.cirrus.bin
 vgabios-cirrus.debug.bin : DISTNAME := VGABIOS-lgpl-latest.cirrus.debug.bin
+vgabios-stdvga.bin   : DISTNAME := VGABIOS-lgpl-latest.stdvga.bin
+vgabios-stdvga.debug.bin : DISTNAME := VGABIOS-lgpl-latest.stdvga.debug.bin
 
 # dependencies
 vgabios.bin  : $(VGA_FILES) $(VBE_FILES) biossums
 vgabios.debug.bin: $(VGA_FILES) $(VBE_FILES) biossums
 vgabios-cirrus.bin   : $(VGA_FILES) clext.c biossums
 vgabios-cirrus.debug.bin : $(VGA_FILES) clext.c biossums
+vgabios-stdvga.bin   : $(VGA_FILES) $(VBE_FILES) biossums
+vgabios-stdvga.debug.bin : $(VGA_FILES) $(VBE_FILES) biossums
 
 # build rule
 %.bin:
-- 
1.7.1

[Qemu-devel] [VGABIOS PATCH 02/11] - added support for a lot more non-standard VBE modes (e.g. widescreen modes) - requires latest Bochs VBE code (16 MB video memory, VBE_DISPI_ID5, VRAM size in 64k p

2010-09-21 Thread Gerd Hoffmann

From: Volker Ruppert 


Signed-off-by: Gerd Hoffmann 
---
 vbe.c   |   31 ++--
 vbe.h   |   70 --
 vbetables-gen.c |   43 +
 3 files changed, 91 insertions(+), 53 deletions(-)

diff --git a/vbe.c b/vbe.c
index 92e3d0d..ecff90d 100644
--- a/vbe.c
+++ b/vbe.c
@@ -38,8 +38,6 @@
 #include "vbe.h"
 #include "vbetables.h"
 
-#define VBE_TOTAL_VIDEO_MEMORY_DIV_64K (VBE_DISPI_TOTAL_VIDEO_MEMORY_MB*1024/64)
-
 // The current OEM Software Revision of this VBE Bios
 #define VBE_OEM_SOFTWARE_REV 0x0002;
 
@@ -715,7 +713,7 @@ vbe_init:
   mov  [bx], al
   pop  bx
   pop  ds
-  mov  ax, # VBE_DISPI_ID4
+  mov  ax, # VBE_DISPI_ID5
   call dispi_set_id
 no_vbe_interface:
 #if defined(USE_BX_INFO) || defined(DEBUG)
@@ -742,7 +740,19 @@ no_vbe_flag:
   mov  ds, ax
   mov  si, #_no_vbebios_info_string
   jmp  _display_string
-ASM_END  
+
+; helper function for memory size calculation
+
+lmulul:
+  and eax, #0x
+  shl ebx, #16
+  or  eax, ebx
+  SEG SS
+  mul eax, dword ptr [di]
+  mov ebx, eax
+  shr ebx, #16
+  ret
+ASM_END
 
 /** Function 00h - Return VBE Controller Information
  * 
@@ -765,6 +775,7 @@ Bit16u *AX;Bit16u ES;Bit16u DI;
 Bit16uvbe2_info;
 Bit16ucur_mode=0;
 Bit16ucur_ptr=34;
+Bit16usize_64k;
 ModeInfoListItem  *cur_info=&mode_info_list;
 
 status = read_word(ss, AX);
@@ -820,8 +831,9 @@ Bit16u *AX;Bit16u ES;Bit16u DI;
 vbe_info_block.VideoModePtr_Seg= ES ;
 vbe_info_block.VideoModePtr_Off= DI + 34;
 
-// VBE Total Memory (in 64b blocks)
-vbe_info_block.TotalMemory = VBE_TOTAL_VIDEO_MEMORY_DIV_64K;
+// VBE Total Memory (in 64k blocks)
+outw(VBE_DISPI_IOPORT_INDEX, VBE_DISPI_INDEX_VIDEO_MEMORY_64K);
+vbe_info_block.TotalMemory = inw(VBE_DISPI_IOPORT_DATA);
 
 if (vbe2_info)
 {
@@ -845,8 +857,11 @@ Bit16u *AX;Bit16u ES;Bit16u DI;
 
 do
 {
+size_64k = (Bit16u)((Bit32u)cur_info->info.XResolution * cur_info->info.XResolution * cur_info->info.BitsPerPixel) >> 19;
+
 if ((cur_info->info.XResolution <= dispi_get_max_xres()) &&
-(cur_info->info.BitsPerPixel <= dispi_get_max_bpp())) {
+(cur_info->info.BitsPerPixel <= dispi_get_max_bpp()) &&
+(size_64k <= vbe_info_block.TotalMemory)) {
 #ifdef DEBUG
   printf("VBE found mode %x => %x\n", cur_info->mode,cur_mode);
 #endif
@@ -855,7 +870,7 @@ Bit16u *AX;Bit16u ES;Bit16u DI;
   cur_ptr+=2;
 } else {
 #ifdef DEBUG
-  printf("VBE mode %x (xres=%x / bpp=%02x) not supported by display\n", cur_info->mode,cur_info->info.XResolution,cur_info->info.BitsPerPixel);
+  printf("VBE mode %x (xres=%x / bpp=%02x) not supported \n", cur_info->mode,cur_info->info.XResolution,cur_info->info.BitsPerPixel);
 #endif
 }
 cur_info++;
diff --git a/vbe.h b/vbe.h
index 60434ac..72cb045 100644
--- a/vbe.h
+++ b/vbe.h
@@ -275,39 +275,41 @@ typedef struct ModeInfoListItem
 //like 0xE000
 
 
-  #define VBE_DISPI_BANK_ADDRESS  0xA
-  #define VBE_DISPI_BANK_SIZE_KB  64
-  
-  #define VBE_DISPI_MAX_XRES  1024
-  #define VBE_DISPI_MAX_YRES  768
-  
-  #define VBE_DISPI_IOPORT_INDEX  0x01CE
-  #define VBE_DISPI_IOPORT_DATA   0x01CF
-  
-  #define VBE_DISPI_INDEX_ID  0x0
-  #define VBE_DISPI_INDEX_XRES0x1
-  #define VBE_DISPI_INDEX_YRES0x2
-  #define VBE_DISPI_INDEX_BPP 0x3
-  #define VBE_DISPI_INDEX_ENABLE  0x4
-  #define VBE_DISPI_INDEX_BANK0x5
-  #define VBE_DISPI_INDEX_VIRT_WIDTH  0x6
-  #define VBE_DISPI_INDEX_VIRT_HEIGHT 0x7
-  #define VBE_DISPI_INDEX_X_OFFSET0x8
-  #define VBE_DISPI_INDEX_Y_OFFSET0x9
-  
-  #define VBE_DISPI_ID0   0xB0C0
-  #define VBE_DISPI_ID1   0xB0C1
-  #define VBE_DISPI_ID2   0xB0C2
-  #define VBE_DISPI_ID3   0xB0C3
-  #define VBE_DISPI_ID4   0xB0C4
-  
-  #define VBE_DISPI_DISABLED  0x00
-  #define VBE_DISPI_ENABLED   0x01
-  #define VBE_DISPI_GETCAPS   0x02
-  #define VBE_DISPI_8BIT_DAC  0x20
-  #define VBE_DISPI_LFB_ENABLED   0x40
-  #define VBE_DISPI_NOCLEARMEM0x80
-  
-  #define VBE_DISPI_LFB_PHYSICAL_ADDRESS  0xE000
+  #define VBE_DISPI_BANK_ADDRESS   0xA
+  #define VBE_DISPI_BANK_SIZE_KB   64
+
+  #define VBE_DISPI_MAX_XRES   2560
+  #define VBE_DISPI_MAX_YRES   1600
+
+  #define VBE_DISPI_IOPORT_INDEX   0x01CE
+  #define VBE_DISPI_IOPORT_DATA0x01CF
+
+  #define VBE_DISPI_INDEX_ID
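
The video memory check this patch adds to the mode list can be sketched
as plain arithmetic.  This is a standalone illustration and not a
transcription of vbe.c: it assumes the intended size is width x height x
bits per pixel, and mode_size_64k() and mode_fits() are hypothetical
names.  The upper bounds in the assert are the VBE_DISPI_MAX_XRES and
VBE_DISPI_MAX_YRES values from the patch plus an assumed maximum of 32
bits per pixel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Size of a mode in 64 KiB blocks, rounded down.
 * bits / 8 gives bytes, bytes / 65536 gives blocks, so bits >> 19 overall.
 */
static uint16_t mode_size_64k(uint16_t xres, uint16_t yres, uint8_t bpp)
{
    /* Within these bounds the product fits into 32 bits. */
    assert(xres <= 2560 && yres <= 1600 && bpp <= 32);

    uint32_t bits = (uint32_t)xres * yres * bpp;
    return (uint16_t)(bits >> 19);
}

/* A mode is listed only if it fits into the reported video memory. */
static bool mode_fits(uint16_t xres, uint16_t yres, uint8_t bpp,
                      uint16_t total_64k)
{
    return mode_size_64k(xres, yres, bpp) <= total_64k;
}
```

As a usage example, 1024x768 at 32 bpp needs 48 blocks (3 MB), and
2560x1600 at 32 bpp needs 250 blocks, which fits into the 16 MB (256
blocks) mentioned in the subject line but not into 8 MB (128 blocks).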

[Qemu-devel] [VGABIOS PATCH 04/11] - biosfn_write_teletype: fixed attribute when scrolling in text mode

2010-09-21 Thread Gerd Hoffmann

From: Volker Ruppert 


Signed-off-by: Gerd Hoffmann 
---
 vgabios.c |6 --
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/vgabios.c b/vgabios.c
index fbc3588..ea3aae8 100644
--- a/vgabios.c
+++ b/vgabios.c
@@ -2039,7 +2039,9 @@ Bit8u car;Bit8u page;Bit8u attr;Bit8u flag;
   {
if(vga_modes[line].class==TEXT)
 {
- biosfn_scroll(0x01,0x07,0,0,nbrows-1,nbcols-1,page,SCROLL_UP);
+ address=SCREEN_MEM_START(nbcols,nbrows,page)+(xcurs+(ycurs-1)*nbcols)*2;
+ attr=read_byte(vga_modes[line].sstart,address+1);
+ biosfn_scroll(0x01,attr,0,0,nbrows-1,nbcols-1,page,SCROLL_UP);
 }
else
 {
@@ -2047,7 +2049,7 @@ Bit8u car;Bit8u page;Bit8u attr;Bit8u flag;
 }
ycurs-=1;
   }
- 
+
  // Set the cursor for the page
  cursor=ycurs; cursor<<=8; cursor+=xcurs;
  biosfn_set_cursor_pos(page,cursor);
-- 
1.7.1

[Qemu-devel] [VGABIOS PATCH 05/11] - updates for release 0.6c

2010-09-21 Thread Gerd Hoffmann

From: Volker Ruppert 


Signed-off-by: Gerd Hoffmann 
---
 ChangeLog |   12 
 README|3 ++-
 2 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 35bf00a..dbaed5d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2009-04-07 20:18  vruppert
+
+   * vgabios.c (1.69):
+
+   - biosfn_write_teletype: fixed attribute when scrolling in text mode
+
+2009-04-06 20:17  vruppert
+
+   * ChangeLog (1.28), README (1.17):
+
+   - preparing for release 0.6c
+
 2009-01-25 16:46  vruppert
 
* vbe.c (1.62), vbe.h (1.28), vbetables-gen.c (1.5):
diff --git a/README b/README
index ce67aeb..c68b573 100644
--- a/README
+++ b/README
@@ -90,11 +90,12 @@ For any information on qemu, visit the website http://fabrice.bellard.free.fr/qe
 
 History
 ---
-vgabios-0.6c : not yet released
+vgabios-0.6c : Apr 08 2009
   - Volker
 . added DPMS support to cirrus vgabios (patch from Gleb Natapov)
 . use VBE LFB address from PCI base address if present
 . added support for a lot more non-standard VBE modes (e.g. widescreen modes)
+. minor bugfixes
 
 vgabios-0.6b : May 30 2008
   - Volker
-- 
1.7.1

[Qemu-devel] [VGABIOS PATCH 06/11] Makefile cleanup

2010-09-21 Thread Gerd Hoffmann

Use a single rule for building bios binaries.
Use target specific variables to set compile flags.

This makes it more obvious what the differences between the versions
are.  It also makes it easier to add new bios binaries with slightly
different settings.

Signed-off-by: Gerd Hoffmann 
---
 Makefile |   76 +++--
 1 files changed, 34 insertions(+), 42 deletions(-)

diff --git a/Makefile b/Makefile
index 00e8c66..c3c744c 100644
--- a/Makefile
+++ b/Makefile
@@ -16,8 +16,7 @@ VGABIOS_DATE = "-DVGABIOS_DATE=\"$(RELDATE)\""
 
 all: bios cirrus-bios
 
-
-bios: biossums vgabios.bin vgabios.debug.bin
+bios: vgabios.bin vgabios.debug.bin
 
 cirrus-bios: vgabios-cirrus.bin vgabios-cirrus.debug.bin
 
@@ -27,6 +26,39 @@ clean:
 
 dist-clean: clean
 
+# source files
+VGA_FILES := vgabios.c vgabios.h vgafonts.h vgatables.h
+VBE_FILES := vbe.h vbe.c vbetables.h
+
+# build flags
+vgabios.bin  : VGAFLAGS := -DVBE
+vgabios.debug.bin: VGAFLAGS := -DVBE -DDEBUG
+vgabios-cirrus.bin   : VGAFLAGS := -DCIRRUS -DPCIBIOS 
+vgabios-cirrus.debug.bin : VGAFLAGS := -DCIRRUS -DPCIBIOS -DCIRRUS_DEBUG
+
+# dist names
+vgabios.bin  : DISTNAME := VGABIOS-lgpl-latest.bin
+vgabios.debug.bin: DISTNAME := VGABIOS-lgpl-latest.debug.bin
+vgabios-cirrus.bin   : DISTNAME := VGABIOS-lgpl-latest.cirrus.bin
+vgabios-cirrus.debug.bin : DISTNAME := VGABIOS-lgpl-latest.cirrus.debug.bin
+
+# dependencies
+vgabios.bin  : $(VGA_FILES) $(VBE_FILES) biossums
+vgabios.debug.bin: $(VGA_FILES) $(VBE_FILES) biossums
+vgabios-cirrus.bin   : $(VGA_FILES) clext.c biossums
+vgabios-cirrus.debug.bin : $(VGA_FILES) clext.c biossums
+
+# build rule
+%.bin:
+   $(GCC) -E -P vgabios.c $(VGABIOS_VERS) $(VGAFLAGS) $(VGABIOS_DATE) > _$*_.c
+   $(BCC) -o $*.s -C-c -D__i86__ -S -0 _$*_.c
+   sed -e 's/^\.text//' -e 's/^\.data//' $*.s > _$*_.s
+   $(AS86) _$*_.s -b $*.bin -u -w- -g -0 -j -O -l $*.txt
+   rm -f _$*_.s _$*_.c $*.s
+   mv $*.bin $(DISTNAME)
+   ./biossums $(DISTNAME)
+   ls -l $(DISTNAME)
+
 release: 
VGABIOS_VERS=\"-DVGABIOS_VERS=\\\"$(RELVERS)\\\"\" make bios cirrus-bios
/bin/rm -f  *.o *.s *.ld86 \
@@ -37,46 +69,6 @@ release:
cp VGABIOS-lgpl-latest.cirrus.debug.bin ../$(RELEASE).cirrus.debug.bin
tar czvf ../$(RELEASE).tgz --exclude CVS -C .. $(RELEASE)/
 
-vgabios.bin: vgabios.c vgabios.h vgafonts.h vgatables.h vbe.h vbe.c vbetables.h
-   $(GCC) -E -P vgabios.c $(VGABIOS_VERS) -DVBE $(VGABIOS_DATE) > _vgabios_.c
-   $(BCC) -o vgabios.s -C-c -D__i86__ -S -0 _vgabios_.c
-   sed -e 's/^\.text//' -e 's/^\.data//' vgabios.s > _vgabios_.s
-   $(AS86) _vgabios_.s -b vgabios.bin -u -w- -g -0 -j -O -l vgabios.txt
-   rm -f _vgabios_.s _vgabios_.c vgabios.s
-   mv vgabios.bin VGABIOS-lgpl-latest.bin
-   ./biossums VGABIOS-lgpl-latest.bin
-   ls -l VGABIOS-lgpl-latest.bin
-
-vgabios.debug.bin: vgabios.c vgabios.h vgafonts.h vgatables.h vbe.h vbe.c vbetables.h
-   $(GCC) -E -P vgabios.c $(VGABIOS_VERS) -DVBE -DDEBUG $(VGABIOS_DATE) > _vgabios-debug_.c
-   $(BCC) -o vgabios-debug.s -C-c -D__i86__ -S -0 _vgabios-debug_.c
-   sed -e 's/^\.text//' -e 's/^\.data//' vgabios-debug.s > _vgabios-debug_.s
-   $(AS86) _vgabios-debug_.s -b vgabios.debug.bin -u -w- -g -0 -j -O -l vgabios.debug.txt
-   rm -f _vgabios-debug_.s _vgabios-debug_.c vgabios-debug.s
-   mv vgabios.debug.bin VGABIOS-lgpl-latest.debug.bin
-   ./biossums VGABIOS-lgpl-latest.debug.bin
-   ls -l VGABIOS-lgpl-latest.debug.bin
-
-vgabios-cirrus.bin: vgabios.c vgabios.h vgafonts.h vgatables.h clext.c
-   $(GCC) -E -P vgabios.c $(VGABIOS_VERS) -DCIRRUS -DPCIBIOS $(VGABIOS_DATE) > _vgabios-cirrus_.c
-   $(BCC) -o vgabios-cirrus.s -C-c -D__i86__ -S -0 _vgabios-cirrus_.c
-   sed -e 's/^\.text//' -e 's/^\.data//' vgabios-cirrus.s > _vgabios-cirrus_.s
-   $(AS86) _vgabios-cirrus_.s -b vgabios-cirrus.bin -u -w- -g -0 -j -O -l vgabios.cirrus.txt
-   rm -f _vgabios-cirrus_.s _vgabios-cirrus_.c vgabios-cirrus.s
-   mv vgabios-cirrus.bin VGABIOS-lgpl-latest.cirrus.bin
-   ./biossums VGABIOS-lgpl-latest.cirrus.bin
-   ls -l VGABIOS-lgpl-latest.cirrus.bin
-
-vgabios-cirrus.debug.bin: vgabios.c vgabios.h vgafonts.h vgatables.h clext.c
-   $(GCC) -E -P vgabios.c $(VGABIOS_VERS) -DCIRRUS -DCIRRUS_DEBUG 
-DPCIBIOS $(VGABIOS_DATE) > _vgabios-cirrus-debug_.c
-   $(BCC) -o vgabios-cirrus-debug.s -C-c -D__i86__ -S -0 
_vgabios-cirrus-debug_.c
-   sed -e 's/^\.text//' -e 's/^\.data//' vgabios-cirrus-debug.s > 
_vgabios-cirrus-debug_.s
-   $(AS86) _vgabios-cirrus-debug_.s -b vgabios.cirrus.debug.bin -u -w- -g 
-0 -j -O -l vgabios.cirrus.debug.txt
-   rm -f _vgabios-cirrus-debug_.s _vgabios-cirrus-debug_.c 
vgabios-cirrus-debug.s
-   mv vgabios.cirrus.debug.bin VGABIOS-lgpl-latest.cirrus.debug.bin
-   ./biossums VGABIOS-l

[Qemu-devel] [VGABIOS PATCH 09/11] update pci_get_lfb_addr for vmware vga

2010-09-21 Thread Gerd Hoffmann

vmware vga has the framebuffer at pci region 1 not 0.  This patch makes
pci_get_lfb_addr check region 1 too.  It also gives names to the
numbered labels to make the code more readable.

Signed-off-by: Gerd Hoffmann 
---
 vgabios.c |   23 ++-
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/vgabios.c b/vgabios.c
index 2e8b5d7..c1e312b 100644
--- a/vgabios.c
+++ b/vgabios.c
@@ -3852,26 +3852,31 @@ _pci_get_lfb_addr:
 mov dl, #0x00
 call pci_read_reg
 cmp ax, #0x
-jz pci_get_lfb_addr_5
- pci_get_lfb_addr_3:
+jz pci_get_lfb_addr_fail
+ pci_get_lfb_addr_next_dev:
 mov dl, #0x00
 call pci_read_reg
 cmp ax, bx ;; check vendor
-jz pci_get_lfb_addr_4
+jz pci_get_lfb_addr_found
 add cx, #0x8
 cmp cx, #0x200 ;; search bus #0 and #1
-jb pci_get_lfb_addr_3
- pci_get_lfb_addr_5:
+jb pci_get_lfb_addr_next_dev
+ pci_get_lfb_addr_fail:
 xor dx, dx ;; no LFB
-jmp pci_get_lfb_addr_6
- pci_get_lfb_addr_4:
+jmp pci_get_lfb_addr_return
+ pci_get_lfb_addr_found:
 mov dl, #0x10 ;; I/O space #0
 call pci_read_reg
 test ax, #0xfff1
-jnz pci_get_lfb_addr_5
+jz pci_get_lfb_addr_success
+mov dl, #0x14 ;; I/O space #1
+call pci_read_reg
+test ax, #0xfff1
+jnz pci_get_lfb_addr_fail
+ pci_get_lfb_addr_success:
 shr eax, #16
 mov dx, ax ;; LFB address
- pci_get_lfb_addr_6:
+ pci_get_lfb_addr_return:
   pop eax
   mov ax, dx
   pop dx
-- 
1.7.1
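
For readers who do not read x86 assembly, the region selection this patch adds to pci_get_lfb_addr can be sketched in C. This is an illustration only, not code from vgabios: the names bar_is_usable_lfb and lfb_addr_from_bars are made up for the example, and the two arguments stand for the 32-bit values read from PCI config offsets 0x10 and 0x14.

```c
#include <stdint.h>

/*
 * Mirrors "test ax, #0xfff1": only the low 16 bits are examined.
 * Bit 0 must be clear (memory BAR, not I/O) and bits 4..15 must be clear.
 * Bits 1..3 (type and prefetchable flags) are ignored by the mask.
 */
static int bar_is_usable_lfb(uint32_t bar)
{
    return (bar & 0xfff1u) == 0;
}

/*
 * Returns the high 16 bits of the LFB address, or 0 for "no LFB".
 * Region 0 is tried first; region 1 is only consulted when region 0
 * fails the test, matching the jump order in the patched assembly.
 */
static uint16_t lfb_addr_from_bars(uint32_t bar0, uint32_t bar1)
{
    if (bar_is_usable_lfb(bar0)) {
        return (uint16_t)(bar0 >> 16);
    }
    if (bar_is_usable_lfb(bar1)) {
        return (uint16_t)(bar1 >> 16);
    }
    return 0;
}
```

With a vmware-style layout where region 0 is an I/O BAR (bit 0 set), for example the made-up values bar0 = 0x0000c001 and bar1 = 0xf0000008, the sketch falls through to region 1 and returns 0xf000.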

[Qemu-devel] [VGABIOS PATCH 03/11] - preparing for release 0.6c

2010-09-21 Thread Gerd Hoffmann

From: Volker Ruppert 


Signed-off-by: Gerd Hoffmann 
---
 ChangeLog |   35 +++
 README|6 ++
 2 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 75be5bd..35bf00a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,38 @@
+2009-01-25 16:46  vruppert
+
+   * vbe.c (1.62), vbe.h (1.28), vbetables-gen.c (1.5):
+
+   - added support for a lot more non-standard VBE modes (e.g. widescreen modes)
+   - requires latest Bochs VBE code (16 MB video memory, VBE_DISPI_ID5, VRAM size
+ in 64k pages stored in VBE register)
+   - check if VBE mode is supported with current VRAM size
+
+2009-01-24 11:02  vruppert
+
+   * clext.c (1.14), vbe.c (1.61), vgabios.c (1.68):
+
+   - use VBE LFB address from PCI base address if present (rewrite of the cirrus
+ specific function in main vgabios code)
+   - removed unnecessary spaces
+
+2008-12-14 09:29  vruppert
+
+   * clext.c (1.13):
+
+   - added DPMS support to cirrus vgabios (patch from Gleb Natapov)
+
+2008-05-30 17:28  vruppert
+
+   * README (1.16):
+
+   - updated for release 0.6b
+
+2008-05-22 12:55  vruppert
+
+   * ChangeLog (1.27), README (1.15):
+
+   - preparations for release 0.6b
+
 2008-05-11 08:40  vruppert
 
* biossums.c (1.6):
diff --git a/README b/README
index 90141d4..ce67aeb 100644
--- a/README
+++ b/README
@@ -90,6 +90,12 @@ For any information on qemu, visit the website http://fabrice.bellard.free.fr/qe
 
 History
 ---
+vgabios-0.6c : not yet released
+  - Volker
+. added DPMS support to cirrus vgabios (patch from Gleb Natapov)
+. use VBE LFB address from PCI base address if present
+. added support for a lot more non-standard VBE modes (e.g. widescreen modes)
+
 vgabios-0.6b : May 30 2008
   - Volker
 . added PCI data structure for the Cirrus VGABIOS images
-- 
1.7.1

[Qemu-devel] [VGABIOS PATCH 00/11] vgabios update

2010-09-21 Thread Gerd Hoffmann

  Hi,

This patch series updates the vgabios.  The first five patches are taken
from the vgabios cvs and update the vgabios.git tree @ qemu.org to
vgabios release 0.6c.  As this update depends on a newer bochs API it
fully works on qemu 0.13 and master only.  When using this vgabios
version on qemu 0.12 vesa bios support will break.

The last six patches clean up the build system a bit, add a proper
PCIROM header so seabios will happily load the roms from the PCI option
rom bar and add vgabios binaries for all current and the upcoming qxl
vga device.

cheers,
  Gerd

Gerd Hoffmann (6):
  Makefile cleanup
  Add defines for PCI IDs.
  Add qemu stdvga pci bios
  update pci_get_lfb_addr for vmware vga
  Add qemu vmware vga pci bios
  Add qemu qxl vga pci bios

Volker Ruppert (5):
  - use VBE LFB address from PCI base address if present (rewrite of
the cirrus specific function in main vgabios code) - removed
unnecessary spaces
  - added support for a lot more non-standard VBE modes (e.g.
widescreen modes) - requires latest Bochs VBE code (16 MB video
memory, VBE_DISPI_ID5, VRAM size in 64k pages stored in VBE
register) - check if VBE mode is supported with current VRAM size
  - preparing for release 0.6c
  - biosfn_write_teletype: fixed attribute when scrolling in text mode
  - updates for release 0.6c

 ChangeLog   |   47 +
 Makefile|  102 ---
 README  |7 
 clext.c |   51 +--
 vbe.c   |   94 ---
 vbe.h   |   70 +++--
 vbetables-gen.c |   43 +--
 vgabios.c   |   74 ++-
 8 files changed, 314 insertions(+), 174 deletions(-)

[Qemu-devel] [VGABIOS PATCH 11/11] Add qemu qxl vga pci bios

2010-09-21 Thread Gerd Hoffmann

Add PCI vgabios for the qemu qxl vga (1b36:0100).
Name it vgabios-qxl.bin.

Signed-off-by: Gerd Hoffmann 
---
 Makefile |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 2a093e8..578721a 100644
--- a/Makefile
+++ b/Makefile
@@ -14,7 +14,7 @@ RELVERS = `pwd | sed "s-.*/--" | sed "s/vgabios//" | sed "s/-//"`
 
 VGABIOS_DATE = "-DVGABIOS_DATE=\"$(RELDATE)\""
 
-all: bios cirrus-bios stdvga-bios vmware-bios
+all: bios cirrus-bios stdvga-bios vmware-bios qxl-bios
 
 bios: vgabios.bin vgabios.debug.bin
 
@@ -24,6 +24,8 @@ stdvga-bios: vgabios-stdvga.bin vgabios-stdvga.debug.bin
 
 vmware-bios: vgabios-vmware.bin vgabios-vmware.debug.bin
 
+qxl-bios: vgabios-qxl.bin vgabios-qxl.debug.bin
+
 clean:
/bin/rm -f  biossums vbetables-gen vbetables.h *.o *.s *.ld86 \
   temp.awk.* vgabios*.orig _vgabios_* _vgabios-debug_* core vgabios*.bin vgabios*.txt $(RELEASE).bin *.bak
@@ -43,6 +45,8 @@ vgabios-stdvga.bin   : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x1234 -DPCI_DI
 vgabios-stdvga.debug.bin : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x1234 -DPCI_DID=0x -DDEBUG
 vgabios-vmware.bin   : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x15ad -DPCI_DID=0x0405
 vgabios-vmware.debug.bin : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x15ad -DPCI_DID=0x0405 -DDEBUG
+vgabios-qxl.bin  : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x1b36 -DPCI_DID=0x0100
+vgabios-qxl.debug.bin: VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x1b36 -DPCI_DID=0x0100 -DDEBUG
 
 # dist names
 vgabios.bin  : DISTNAME := VGABIOS-lgpl-latest.bin
@@ -53,6 +57,8 @@ vgabios-stdvga.bin   : DISTNAME := VGABIOS-lgpl-latest.stdvga.bin
 vgabios-stdvga.debug.bin : DISTNAME := VGABIOS-lgpl-latest.stdvga.debug.bin
 vgabios-vmware.bin   : DISTNAME := VGABIOS-lgpl-latest.vmware.bin
 vgabios-vmware.debug.bin : DISTNAME := VGABIOS-lgpl-latest.vmware.debug.bin
+vgabios-qxl.bin  : DISTNAME := VGABIOS-lgpl-latest.qxl.bin
+vgabios-qxl.debug.bin: DISTNAME := VGABIOS-lgpl-latest.qxl.debug.bin
 
 # dependencies
 vgabios.bin  : $(VGA_FILES) $(VBE_FILES) biossums
@@ -63,6 +69,8 @@ vgabios-stdvga.bin   : $(VGA_FILES) $(VBE_FILES) biossums
 vgabios-stdvga.debug.bin : $(VGA_FILES) $(VBE_FILES) biossums
 vgabios-vmware.bin   : $(VGA_FILES) $(VBE_FILES) biossums
 vgabios-vmware.debug.bin : $(VGA_FILES) $(VBE_FILES) biossums
+vgabios-qxl.bin  : $(VGA_FILES) $(VBE_FILES) biossums
+vgabios-qxl.debug.bin: $(VGA_FILES) $(VBE_FILES) biossums
 
 # build rule
 %.bin:
-- 
1.7.1

[Qemu-devel] [VGABIOS PATCH 07/11] Add defines for PCI IDs.

2010-09-21 Thread Gerd Hoffmann

This patch allows setting the PCI vendor and device IDs using defines
(PCI_VID and PCI_DID).  Use it for vgabios.bin.

Signed-off-by: Gerd Hoffmann 
---
 Makefile  |4 ++--
 vbe.c |6 +-
 vgabios.c |5 +
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index c3c744c..d440b93 100644
--- a/Makefile
+++ b/Makefile
@@ -31,8 +31,8 @@ VGA_FILES := vgabios.c vgabios.h vgafonts.h vgatables.h
 VBE_FILES := vbe.h vbe.c vbetables.h
 
 # build flags
-vgabios.bin  : VGAFLAGS := -DVBE
-vgabios.debug.bin: VGAFLAGS := -DVBE -DDEBUG
+vgabios.bin  : VGAFLAGS := -DVBE -DPCI_VID=0x1234
+vgabios.debug.bin: VGAFLAGS := -DVBE -DPCI_VID=0x1234 -DDEBUG
 vgabios-cirrus.bin   : VGAFLAGS := -DCIRRUS -DPCIBIOS 
 vgabios-cirrus.debug.bin : VGAFLAGS := -DCIRRUS -DPCIBIOS -DCIRRUS_DEBUG
 
diff --git a/vbe.c b/vbe.c
index ecff90d..1fab2f9 100644
--- a/vbe.c
+++ b/vbe.c
@@ -925,7 +925,11 @@ Bit16u *AX;Bit16u CX; Bit16u ES;Bit16u DI;
 if (using_lfb) {
   info.NumberOfBanks = 1;
 }
-lfb_addr = pci_get_lfb_addr(0x1234); // experimental vendor
+#ifdef PCI_VID
+lfb_addr = pci_get_lfb_addr(PCI_VID);
+#else
+lfb_addr = 0;
+#endif
 if (lfb_addr > 0) {
   info.PhysBasePtr = ((Bit32u)lfb_addr << 16);
 }
diff --git a/vgabios.c b/vgabios.c
index ea3aae8..2e8b5d7 100644
--- a/vgabios.c
+++ b/vgabios.c
@@ -210,8 +210,13 @@ vgabios_pci_data:
 .word 0x1013
 .word 0x00b8 // CLGD5446
 #else
+#ifdef PCI_VID
+.word PCI_VID
+.word PCI_DID
+#else
 #error "Unknown PCI vendor and device id"
 #endif
+#endif
 .word 0 // reserved
 .word 0x18 // dlen
 .byte 0 // revision
-- 
1.7.1

[Qemu-devel] [PATCH REPOST 0/3] use new vgabios.

2010-09-21 Thread Gerd Hoffmann

  Hi,

This patch series will put the new vgabios (patches just re-posted)
into use for stdvga and vmware_vga.

For obvious reasons it depends on the new vgabios binaries being
present, i.e. vgabios patches being committed to vgabios.git, subtree
being updated and vgabios binaries being recompiled + committed to
qemu.git.

The patches are also available from
  git://anongit.freedesktop.org/spice/qemu vgabios

There is also a "vgabios.testing" branch available.  That one has one
more commit which just drops in new vgabios binaries, so the patches
can easily be tested even before the vgabios patches are committed.

cheers,
  Gerd

Gerd Hoffmann (3):
  Add new vgabios binaries to blobs list.
  switch stdvga to pci vgabios
  switch vmware_vga to pci vgabios

 Makefile|5 +++--
 hw/vga-pci.c|7 +++
 hw/vmware_vga.c |7 +--
 3 files changed, 7 insertions(+), 12 deletions(-)

[Qemu-devel] [PATCH REPOST 2/3] switch stdvga to pci vgabios

2010-09-21 Thread Gerd Hoffmann

Make stdvga provide the new vgabios binary (with pcibios support)
using the PCI option rom bar.  Seabios will happily load it from
there.  The new vga bios will also look up the framebuffer address
in pci config space, so the magic bochs lfb @ 0xe000 is not
needed any more -> zap it.

Without the patch:

  # dmesg | grep framebuffer
  vesafb: framebuffer at 0xe000, mapped to 0xf7e8, using 1875k, total 8192k
  # lspci -vs2
  00:02.0 VGA compatible controller: Technical Corp. Device  (prog-if 00 [VGA controller])
Subsystem: Qumranet, Inc. Device 1100
Physical Slot: 2
Flags: fast devsel
Memory at f000 (32-bit, prefetchable) [size=8M]
Expansion ROM at  [disabled]

With patch applied:

  # dmesg | grep framebuffer
  vesafb: framebuffer at 0xf000, mapped to 0xf7e8, using 1875k, total 8192k
  # lspci -vs2
  00:02.0 VGA compatible controller: Technical Corp. Device  (prog-if 00 [VGA controller])
Subsystem: Qumranet, Inc. Device 1100
Physical Slot: 2
Flags: fast devsel
Memory at f000 (32-bit, prefetchable) [size=8M]
Expansion ROM at f080 [disabled] [size=64K]

cheers,
  Gerd

Signed-off-by: Gerd Hoffmann 
---
 hw/vga-pci.c |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/hw/vga-pci.c b/hw/vga-pci.c
index 2315f70..eef0e3c 100644
--- a/hw/vga-pci.c
+++ b/hw/vga-pci.c
@@ -103,11 +103,10 @@ static int pci_vga_initfn(PCIDevice *dev)
 bios_total_size <<= 1;
 pci_register_bar(&d->dev, PCI_ROM_SLOT, bios_total_size,
  PCI_BASE_ADDRESS_MEM_PREFETCH, vga_map);
+ } else {
+ if (dev->romfile == NULL)
+ dev->romfile = qemu_strdup("vgabios-stdvga.bin");
  }
-
-vga_init_vbe(s);
- /* ROM BIOS */
- rom_add_vga(VGABIOS_FILENAME);
  return 0;
 }
 
-- 
1.7.1

[Qemu-devel] [PATCH REPOST 3/3] switch vmware_vga to pci vgabios

2010-09-21 Thread Gerd Hoffmann


Signed-off-by: Gerd Hoffmann 
---
 hw/vmware_vga.c |7 +--
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/hw/vmware_vga.c b/hw/vmware_vga.c
index 12bff48..682f287 100644
--- a/hw/vmware_vga.c
+++ b/hw/vmware_vga.c
@@ -114,14 +114,12 @@ struct pci_vmsvga_state_s {
 # define SVGA_IO_BASE  SVGA_LEGACY_BASE_PORT
 # define SVGA_IO_MUL   1
 # define SVGA_FIFO_SIZE0x1
-# define SVGA_MEM_BASE 0xe000
 # define SVGA_PCI_DEVICE_IDPCI_DEVICE_ID_VMWARE_SVGA2
 #else
 # define SVGA_ID   SVGA_ID_1
 # define SVGA_IO_BASE  SVGA_LEGACY_BASE_PORT
 # define SVGA_IO_MUL   4
 # define SVGA_FIFO_SIZE0x1
-# define SVGA_MEM_BASE 0xe000
 # define SVGA_PCI_DEVICE_IDPCI_DEVICE_ID_VMWARE_SVGA
 #endif
 
@@ -1171,10 +1169,6 @@ static void vmsvga_init(struct vmsvga_state_s *s, int vga_ram_size)
 vga_init(&s->vga);
 vmstate_register(NULL, 0, &vmstate_vga_common, &s->vga);
 
-vga_init_vbe(&s->vga);
-
-rom_add_vga(VGABIOS_FILENAME);
-
 vmsvga_reset(s);
 }
 
@@ -1272,6 +1266,7 @@ static PCIDeviceInfo vmsvga_info = {
 .qdev.size= sizeof(struct pci_vmsvga_state_s),
 .qdev.vmsd= &vmstate_vmware_vga,
 .init = pci_vmsvga_initfn,
+.romfile  = "vgabios-vmware.bin",
 };
 
 static void vmsvga_register(void)
-- 
1.7.1

[Qemu-devel] [VGABIOS PATCH 01/11] - use VBE LFB address from PCI base address if present (rewrite of the cirrus specific function in main vgabios code) - removed unnecessary spaces

2010-09-21 Thread Gerd Hoffmann

From: Volker Ruppert 


Signed-off-by: Gerd Hoffmann 
---
 clext.c   |   51 ++-
 vbe.c |   59 ---
 vgabios.c |   58 ++
 3 files changed, 92 insertions(+), 76 deletions(-)

diff --git a/clext.c b/clext.c
index c7a2ad0..b0b6834 100644
--- a/clext.c
+++ b/clext.c
@@ -948,7 +948,8 @@ cirrus_vesa_01h_3:
   ;; 32-bit LFB address
   xor ax, ax
   stosw
-  call cirrus_get_lfb_addr
+  mov ax, #0x1013 ;; vendor Cirrus
+  call _pci_get_lfb_addr
   stosw
   or ax, ax
   jz cirrus_vesa_01h_4
@@ -1293,54 +1294,6 @@ cgm_2:
 cgm_3:
   ret
 
-  ; get LFB address
-  ; out - ax:LFB address (high 16 bit)
-  ;; NOTE - may be called in protected mode
-cirrus_get_lfb_addr:
-  push cx
-  push dx
-  push eax
-xor cx, cx
-mov dl, #0x00
-call cirrus_pci_read
-cmp ax, #0x
-jz cirrus_get_lfb_addr_5
- cirrus_get_lfb_addr_3:
-mov dl, #0x00
-call cirrus_pci_read
-cmp ax, #0x1013 ;; cirrus
-jz cirrus_get_lfb_addr_4
-add cx, #0x8
-cmp cx, #0x200 ;; search bus #0 and #1
-jb cirrus_get_lfb_addr_3
- cirrus_get_lfb_addr_5:
-xor dx, dx ;; no LFB
-jmp cirrus_get_lfb_addr_6
- cirrus_get_lfb_addr_4:
-mov dl, #0x10 ;; I/O space #0
-call cirrus_pci_read
-test ax, #0xfff1
-jnz cirrus_get_lfb_addr_5
-shr eax, #16
-mov dx, ax ;; LFB address
- cirrus_get_lfb_addr_6:
-  pop eax
-  mov ax, dx
-  pop dx
-  pop cx
-  ret
-
-cirrus_pci_read:
-  mov eax, #0x0080
-  mov ax, cx
-  shl eax, #8
-  mov al, dl
-  mov dx, #0xcf8
-  out dx, eax
-  add dl, #4
-  in  eax, dx
-  ret
-
 ;; out - al:bytes per pixel
 cirrus_get_bpp_bytes:
   push dx
diff --git a/vbe.c b/vbe.c
index 6173ca0..92e3d0d 100644
--- a/vbe.c
+++ b/vbe.c
@@ -766,9 +766,9 @@ Bit16u *AX;Bit16u ES;Bit16u DI;
 Bit16ucur_mode=0;
 Bit16ucur_ptr=34;
 ModeInfoListItem  *cur_info=&mode_info_list;
-
+
 status = read_word(ss, AX);
-
+
 #ifdef DEBUG
 printf("VBE vbe_biosfn_return_vbe_info ES%x DI%x AX%x\n",ES,DI,status);
 #endif
@@ -784,7 +784,7 @@ Bit16u *AX;Bit16u ES;Bit16u DI;
  (vbe_info_block.VbeSignature[1] == 'B') &&
  (vbe_info_block.VbeSignature[2] == 'E') &&
  (vbe_info_block.VbeSignature[3] == '2')) ||
- 
+
 ((vbe_info_block.VbeSignature[0] == 'V') &&
  (vbe_info_block.VbeSignature[1] == 'E') &&
  (vbe_info_block.VbeSignature[2] == 'S') &&
@@ -796,20 +796,20 @@ Bit16u *AX;Bit16u ES;Bit16u DI;
 #endif
 }
 #endif
-
+
 // VBE Signature
 vbe_info_block.VbeSignature[0] = 'V';
 vbe_info_block.VbeSignature[1] = 'E';
 vbe_info_block.VbeSignature[2] = 'S';
 vbe_info_block.VbeSignature[3] = 'A';
-
+
 // VBE Version supported
 vbe_info_block.VbeVersion = 0x0200;
-
+
 // OEM String
 vbe_info_block.OemStringPtr_Seg = 0xc000;
 vbe_info_block.OemStringPtr_Off = &vbebios_copyright;
-
+
 // Capabilities
 vbe_info_block.Capabilities[0] = VBE_CAPABILITY_8BIT_DAC;
 vbe_info_block.Capabilities[1] = 0;
@@ -824,7 +824,7 @@ Bit16u *AX;Bit16u ES;Bit16u DI;
 vbe_info_block.TotalMemory = VBE_TOTAL_VIDEO_MEMORY_DIV_64K;
 
 if (vbe2_info)
-   {
+{
 // OEM Stuff
 vbe_info_block.OemSoftwareRev = VBE_OEM_SOFTWARE_REV;
 vbe_info_block.OemVendorNamePtr_Seg = 0xc000;
@@ -837,12 +837,12 @@ Bit16u *AX;Bit16u ES;Bit16u DI;
 // copy updates in vbe_info_block back
 memcpyb(ES, DI, ss, &vbe_info_block, sizeof(vbe_info_block));
 }
-   else
-   {
+else
+{
 // copy updates in vbe_info_block back (VBE 1.x compatibility)
 memcpyb(ES, DI, ss, &vbe_info_block, 256);
-   }
-
+}
+
 do
 {
 if ((cur_info->info.XResolution <= dispi_get_max_xres()) &&
@@ -860,7 +860,7 @@ Bit16u *AX;Bit16u ES;Bit16u DI;
 }
 cur_info++;
 } while (cur_info->mode != VBE_VESA_MODE_END_OF_LIST);
-
+
 // Add vesa mode list terminator
 write_word(ES, DI + cur_ptr, cur_info->mode);
 
@@ -888,32 +888,37 @@ Bit16u *AX;Bit16u CX; Bit16u ES;Bit16u DI;
 ModeInfoBlock info;
 ModeInfoListItem  *cur_info;
 Boolean   using_lfb;
+Bit16ulfb_addr;
 
 #ifdef DEBUG
 printf("VBE vbe_biosfn_return_mode_information ES%x DI%x CX%x\n",ES,DI,CX);
 #endif
 
 using_lfb=((CX & VBE_MODE_LINEAR_FRAME_BUFFER) == VBE_MODE_LINEAR_FRAME_BUFFER);
-
+
 CX = (CX & 0x1ff);
-
+
 cur_info = mode_info_find_mode(CX, using_lfb, &cur_info);
 
 if (cur_info

[Qemu-devel] [VGABIOS PATCH 10/11] Add qemu vmware vga pci bios

2010-09-21 Thread Gerd Hoffmann

Add PCI vgabios for the qemu vmware vga (15ad:0405).
Name it vgabios-vmware.bin.

Signed-off-by: Gerd Hoffmann 
---
 Makefile |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 58f064e..2a093e8 100644
--- a/Makefile
+++ b/Makefile
@@ -14,7 +14,7 @@ RELVERS = `pwd | sed "s-.*/--" | sed "s/vgabios//" | sed "s/-//"`
 
 VGABIOS_DATE = "-DVGABIOS_DATE=\"$(RELDATE)\""
 
-all: bios cirrus-bios stdvga-bios
+all: bios cirrus-bios stdvga-bios vmware-bios
 
 bios: vgabios.bin vgabios.debug.bin
 
@@ -22,6 +22,8 @@ cirrus-bios: vgabios-cirrus.bin vgabios-cirrus.debug.bin
 
 stdvga-bios: vgabios-stdvga.bin vgabios-stdvga.debug.bin
 
+vmware-bios: vgabios-vmware.bin vgabios-vmware.debug.bin
+
 clean:
/bin/rm -f  biossums vbetables-gen vbetables.h *.o *.s *.ld86 \
   temp.awk.* vgabios*.orig _vgabios_* _vgabios-debug_* core vgabios*.bin vgabios*.txt $(RELEASE).bin *.bak
@@ -39,6 +41,8 @@ vgabios-cirrus.bin   : VGAFLAGS := -DCIRRUS -DPCIBIOS
 vgabios-cirrus.debug.bin : VGAFLAGS := -DCIRRUS -DPCIBIOS -DCIRRUS_DEBUG
 vgabios-stdvga.bin   : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x1234 -DPCI_DID=0x
 vgabios-stdvga.debug.bin : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x1234 -DPCI_DID=0x -DDEBUG
+vgabios-vmware.bin   : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x15ad -DPCI_DID=0x0405
+vgabios-vmware.debug.bin : VGAFLAGS := -DVBE -DPCIBIOS -DPCI_VID=0x15ad -DPCI_DID=0x0405 -DDEBUG
 
 # dist names
 vgabios.bin  : DISTNAME := VGABIOS-lgpl-latest.bin
@@ -47,6 +51,8 @@ vgabios-cirrus.bin   : DISTNAME := VGABIOS-lgpl-latest.cirrus.bin
 vgabios-cirrus.debug.bin : DISTNAME := VGABIOS-lgpl-latest.cirrus.debug.bin
 vgabios-stdvga.bin   : DISTNAME := VGABIOS-lgpl-latest.stdvga.bin
 vgabios-stdvga.debug.bin : DISTNAME := VGABIOS-lgpl-latest.stdvga.debug.bin
+vgabios-vmware.bin   : DISTNAME := VGABIOS-lgpl-latest.vmware.bin
+vgabios-vmware.debug.bin : DISTNAME := VGABIOS-lgpl-latest.vmware.debug.bin
 
 # dependencies
 vgabios.bin  : $(VGA_FILES) $(VBE_FILES) biossums
@@ -55,6 +61,8 @@ vgabios-cirrus.bin   : $(VGA_FILES) clext.c biossums
 vgabios-cirrus.debug.bin : $(VGA_FILES) clext.c biossums
 vgabios-stdvga.bin   : $(VGA_FILES) $(VBE_FILES) biossums
 vgabios-stdvga.debug.bin : $(VGA_FILES) $(VBE_FILES) biossums
+vgabios-vmware.bin   : $(VGA_FILES) $(VBE_FILES) biossums
+vgabios-vmware.debug.bin : $(VGA_FILES) $(VBE_FILES) biossums
 
 # build rule
 %.bin:
-- 
1.7.1

[Qemu-devel] [PATCH REPOST 1/3] Add new vgabios binaries to blobs list.

2010-09-21 Thread Gerd Hoffmann


Signed-off-by: Gerd Hoffmann 
---
 Makefile |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index f95cc2f..867e038 100644
--- a/Makefile
+++ b/Makefile
@@ -154,8 +154,9 @@ ar  de en-us  fi  fr-be  hr it  lv  nl 
pl  ru th \
 common  de-ch  es fo  fr-ca  hu ja  mk  nl-be  pt  sl tr
 
 ifdef INSTALL_BLOBS
-BLOBS=bios.bin vgabios.bin vgabios-cirrus.bin ppc_rom.bin \
-video.x openbios-sparc32 openbios-sparc64 openbios-ppc \
+BLOBS=bios.bin vgabios.bin vgabios-cirrus.bin \
+vgabios-stdvga.bin vgabios-vmware.bin vgabios-qxl.bin \
+ppc_rom.bin video.x openbios-sparc32 openbios-sparc64 openbios-ppc \
 gpxe-eepro100-80861209.rom \
 gpxe-eepro100-80861229.rom \
 pxe-e1000.bin \
-- 
1.7.1

[Qemu-devel] Re: [PATCH] Move macros GCC_ATTR and GCC_FMT_ATTR to common header file

2010-09-21 Thread Stefan Weil


Please ignore this patch. It's wrong (= instead of ==).
I'll send a fixed version.

Sorry,
Stefan

Am 20.09.2010 23:05, schrieb Stefan Weil:

By moving the definition of GCC_ATTR and GCC_FMT_ATTR
from audio_int.h to qemu-common.h these macros are
now generally available for further patches which add
the gcc format attribute.

Newer gcc versions support format gnu_printf which is
better suited for use in QEMU than format printf
(QEMU always uses standard format strings (even with mingw32)).

Cc: Blue Swirl
Signed-off-by: Stefan Weil
---
  audio/audio_int.h |8 
  qemu-common.h |   16 
  2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/audio/audio_int.h b/audio/audio_int.h
index f6a77ad..d8560b6 100644
--- a/audio/audio_int.h
+++ b/audio/audio_int.h
@@ -236,14 +236,6 @@ static inline int audio_ring_dist (int dst, int src, int len)
  return (dst>= src) ? (dst - src) : (len - src + dst);
  }

-#if defined __GNUC__
-#define GCC_ATTR __attribute__ ((__unused__, format (gnu_printf, 1, 2)))
-#define GCC_FMT_ATTR(n, m) __attribute__ ((format (gnu_printf, n, m)))
-#else
-#define GCC_ATTR /**/
-#define GCC_FMT_ATTR(n, m)
-#endif
-
  static void GCC_ATTR dolog (const char *fmt, ...)
  {
  va_list ap;
diff --git a/qemu-common.h b/qemu-common.h
index 956b545..8a2872a 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -70,6 +70,22 @@ struct iovec {
  #include
  #endif

+#if defined __GNUC__
+# if (__GNUC__<  4) || \
+ defined(__GNUC_MINOR__)&&  (__GNUC__ = 4)&&  (__GNUC_MINOR__<  4)
+   /* gcc versions before 4.4.x don't support gnu_printf, so use printf. */
+#  define GCC_ATTR __attribute__((__unused__, format(printf, 1, 2)))
+#  define GCC_FMT_ATTR(n, m) __attribute__((format(printf, n, m)))
+# else
+   /* Use gnu_printf when supported (qemu uses standard format strings). */
+#  define GCC_ATTR __attribute__((__unused__, format(gnu_printf, 1, 2)))
+#  define GCC_FMT_ATTR(n, m) __attribute__((format(gnu_printf, n, m)))
+# endif
+#else
+#define GCC_ATTR /**/
+#define GCC_FMT_ATTR(n, m)
+#endif
+
  #ifdef _WIN32
  #define fsync _commit
  #define lseek _lseeki64

[Qemu-devel] KVM call minutes for Sept 21

2010-09-21 Thread Chris Wright

Nested VMX
- looking for forward progress and better collaboration between the
  Intel and IBM teams
- needs more review (not a new issue)
- use cases
- work todo
  - merge baseline patch
- looks pretty good
- review is finding mostly small things at this point
- need some correctness verification (both review from Intel and testing)
  - need a test suite
- test suite harness will help here
  - a few dozen nested SVM tests are there, can follow for nested VMX
  - nested EPT
  - optimize (reduce vmreads and vmwrites)
- has long term maintan

Hotplug
- command...guest may or may not respond
- guest can't be trusted to be direct part of request/response loop
- solve at QMP level
- human monitor issues (multiple successive commands to complete a
  single unplug)
  - should be a GUI interface design decision, human monitor is not a
good design point
- digression into GUI interface

Drive caching
- need to formalize the meanings in terms of data integrity guarantees
- guest write cache (does it directly reflect the host write cache?)
  - live migration, underlying block dev changes, so need to decouple the two
- O_DIRECT + O_DSYNC
  - O_DSYNC needed based on whether disk cache is available
  - also issues with sparse files (e.g. O_DIRECT to unallocated extent)
  - how to manage w/out needing to flush every write, slow
- perhaps start with O_DIRECT on raw, non-sparse files only?
- backend needs to open backing store matching the guest's disk cache state
- O_DIRECT itself has inconsistent integrity guarantees
  - works well with fully allocated file, dependent on disk cache disable
(or fs specific flushing)
- filesystem specific warnings (ext4 w/ barriers on, btrfs)
- need to be able to open w/ O_DSYNC depending on guest's write cache mode
- make write cache visible to guest (need a knob for this)
- qemu default is cache=writethrough, do we need to revisit that?
- just present user with option whether or not to use host page cache
- allow guest OS to choose disk write cache setting
  - set up host backend accordingly
- be nice to preserve write cache settings over boot (outgrowing cmos storage)
- maybe some host fs-level optimization possible
  - e.g. O_DSYNC to allocated O_DIRECT extent becomes no-op
- conclusion
  - one direct user tunable, "use host page cache or not"
  - one guest OS tunable, "enable disk cache"

[Qemu-devel] [PATCH] Move macros GCC_ATTR and GCC_FMT_ATTR to common header file

2010-09-21 Thread Stefan Weil

By moving the definition of GCC_ATTR and GCC_FMT_ATTR
from audio_int.h to qemu-common.h these macros are
now generally available for further patches which add
the gcc format attribute.

Newer gcc versions support format gnu_printf which is
better suited for use in QEMU than format printf
(QEMU always uses standard format strings (even with mingw32)).

V2: Use correct operator '==' (instead of '=')

Cc: Blue Swirl 
Signed-off-by: Stefan Weil 
---
 audio/audio_int.h |8 
 qemu-common.h |   16 
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/audio/audio_int.h b/audio/audio_int.h
index f6a77ad..d8560b6 100644
--- a/audio/audio_int.h
+++ b/audio/audio_int.h
@@ -236,14 +236,6 @@ static inline int audio_ring_dist (int dst, int src, int len)
 return (dst >= src) ? (dst - src) : (len - src + dst);
 }
 
-#if defined __GNUC__
-#define GCC_ATTR __attribute__ ((__unused__, format (gnu_printf, 1, 2)))
-#define GCC_FMT_ATTR(n, m) __attribute__ ((format (gnu_printf, n, m)))
-#else
-#define GCC_ATTR /**/
-#define GCC_FMT_ATTR(n, m)
-#endif
-
 static void GCC_ATTR dolog (const char *fmt, ...)
 {
 va_list ap;
diff --git a/qemu-common.h b/qemu-common.h
index 956b545..e97a96e 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -70,6 +70,22 @@ struct iovec {
 #include 
 #endif
 
+#if defined __GNUC__
+# if (__GNUC__ < 4) || \
+ defined(__GNUC_MINOR__) && (__GNUC__ == 4) && (__GNUC_MINOR__ < 4)
+   /* gcc versions before 4.4.x don't support gnu_printf, so use printf. */
+#  define GCC_ATTR __attribute__((__unused__, format(printf, 1, 2)))
+#  define GCC_FMT_ATTR(n, m) __attribute__((format(printf, n, m)))
+# else
+   /* Use gnu_printf when supported (qemu uses standard format strings). */
+#  define GCC_ATTR __attribute__((__unused__, format(gnu_printf, 1, 2)))
+#  define GCC_FMT_ATTR(n, m) __attribute__((format(gnu_printf, n, m)))
+# endif
+#else
+#define GCC_ATTR /**/
+#define GCC_FMT_ATTR(n, m)
+#endif
+
 #ifdef _WIN32
 #define fsync _commit
 #define lseek _lseeki64
-- 
1.7.1

[Qemu-devel] Re: [PATCH] Move macros GCC_ATTR and GCC_FMT_ATTR to common header file

2010-09-21 Thread Blue Swirl

On Tue, Sep 21, 2010 at 5:48 PM, Stefan Weil  wrote:
> By moving the definition of GCC_ATTR and GCC_FMT_ATTR
> from audio_int.h to qemu-common.h these macros are
> now generally available for further patches which add
> the gcc format attribute.
>
> Newer gcc versions support format gnu_printf which is
> better suited for use in QEMU than format printf
> (QEMU always uses standard format strings (even with mingw32)).
>
> V2: Use correct operator '==' (instead of '=')
>
> Cc: Blue Swirl 
> Signed-off-by: Stefan Weil 
> ---
>  audio/audio_int.h |    8 
>  qemu-common.h     |   16 
>  2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/audio/audio_int.h b/audio/audio_int.h
> index f6a77ad..d8560b6 100644
> --- a/audio/audio_int.h
> +++ b/audio/audio_int.h
> @@ -236,14 +236,6 @@ static inline int audio_ring_dist (int dst, int src, int len)
>     return (dst >= src) ? (dst - src) : (len - src + dst);
>  }
>
> -#if defined __GNUC__
> -#define GCC_ATTR __attribute__ ((__unused__, format (gnu_printf, 1, 2)))
> -#define GCC_FMT_ATTR(n, m) __attribute__ ((format (gnu_printf, n, m)))

The patch doesn't apply, the above lines do not match HEAD.

[Qemu-devel] Re: KVM call minutes for Sept 21

2010-09-21 Thread Anthony Liguori


On 09/21/2010 01:05 PM, Chris Wright wrote:

Nested VMX
- looking for forward progress and better collaboration between the
   Intel and IBM teams
- needs more review (not a new issue)
- use cases
- work todo
   - merge baseline patch
 - looks pretty good
 - review is finding mostly small things at this point
 - need some correctness verification (both review from Intel and testing)
   - need a test suite
 - test suite harness will help here
   - a few dozen nested SVM tests are there, can follow for nested VMX
   - nested EPT
   - optimize (reduce vmreads and vmwrites)
- has long term maintan

Hotplug
- command...guest may or may not respond
- guest can't be trusted to be direct part of request/response loop
- solve at QMP level
- human monitor issues (multiple successive commands to complete a
   single unplug)
   - should be a GUI interface design decision, human monitor is not a
 good design point
 - digression into GUI interface
   


The way this works IRL is:

1) Administrator presses a physical button.  This sends an ACPI 
notification to the guest.


2) The guest makes a decision about how to handle APCI notification.

3) To initiate unplug, the guest disables the device and performs an 
operation to indicate to the PCI bus that the device is unloaded.


4) Step (3) causes an LED (usually near the button in 1) to change colors

5) Administrator then physically removes the device.

So we need at least a QMP command to perform step (1).  Since (3) can 
occur independently of (1), it should be an async notification.  
device_del should only perform step (5).


A management tool needs to:

pci_unplug_request 
/* wait for PCI_UNPLUGGED event */
device_del 
netdev_del 
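
The ordering above can be sketched as a tiny state machine. pci_unplug_request and PCI_UNPLUGGED are the names proposed in this mail, not an existing interface as far as this thread shows; the enums and unplug_step are hypothetical scaffolding, and ignoring device_del before the event has arrived is an assumption of the sketch rather than something decided on the call.

```c
/* Hypothetical device states as seen by a management tool. */
enum unplug_state {
    DEV_PLUGGED,
    DEV_UNPLUG_REQUESTED,   /* step (1) sent, guest has not answered yet */
    DEV_UNPLUGGED,          /* guest released the device, step (3) */
    DEV_DELETED             /* device removed, step (5) */
};

/* Hypothetical inputs: two commands and one asynchronous event. */
enum unplug_input {
    IN_PCI_UNPLUG_REQUEST,
    IN_PCI_UNPLUGGED_EVENT,
    IN_DEVICE_DEL
};

static enum unplug_state unplug_step(enum unplug_state s, enum unplug_input in)
{
    switch (in) {
    case IN_PCI_UNPLUG_REQUEST:
        return s == DEV_PLUGGED ? DEV_UNPLUG_REQUESTED : s;
    case IN_PCI_UNPLUGGED_EVENT:
        /* Step (3) can occur independently of step (1), so the event
         * is accepted whether or not a request was sent first. */
        return (s == DEV_PLUGGED || s == DEV_UNPLUG_REQUESTED)
               ? DEV_UNPLUGGED : s;
    case IN_DEVICE_DEL:
        /* Assumption: removal only proceeds once the guest let go. */
        return s == DEV_UNPLUGGED ? DEV_DELETED : s;
    }
    return s;
}
```

Keeping the guest's answer as an event rather than a command reply is what keeps the guest out of the request/response loop.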


Drive caching
- need to formalize the meanings in terms of data integrity guarantees
- guest write cache (does it directly reflect the host write cache?)
   - live migration, underlying block dev changes, so need to decouple the two
- O_DIRECT + O_DSYNC
   - O_DSYNC needed based on whether disk cache is available
   - also issues with sparse files (e.g. O_DIRECT to unallocated extent)
   - how to manage w/out needing to flush every write, slow
- perhaps start with O_DIRECT on raw, non-sparse files only?
- backend needs to open backing store matching the guest's disk cache state
- O_DIRECT itself has inconsistent integrity guarantees
   - works well with fully allocated file, dependent on disk cache disable
 (or fs specific flushing)
- filesystem specific warnings (ext4 w/ barriers on, btrfs)
- need to be able to open w/ O_DSYNC depending on guest's write cache mode
- make write cache visible to guest (need a knob for this)
- qemu default is cache=writethrough, do we need to revisit that?
- just present user with option whether or not to use host page cache
- allow guest OS to choose disk write cache setting
   - set up host backend accordingly
- be nice to preserve write cache settings over boot (outgrowing cmos storage)
- maybe some host fs-level optimization possible
   - e.g. O_DSYNC to allocated O_DIRECT extent becomes no-op
- conclusion
   - one direct user tunable, "use host page cache or not"
   - one guest OS tunable, "enable disk cache"
   


IOW, a qdev 'write-cache=on|off' property and a blockdev 'direct=on|off' 
property.  For completeness, a blockdev 'unsafe=on|off' property.


Open flags are:

write-cache=on,  direct=on     O_DIRECT
write-cache=off, direct=on     O_DIRECT | O_DSYNC
write-cache=on,  direct=off    0
write-cache=off, direct=off    O_DSYNC
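
The same mapping written out as code, as a minimal sketch: cache_open_flags is a hypothetical helper and not an existing QEMU function, and the two booleans stand for the proposed write-cache and direct properties. O_DIRECT is a Linux extension, hence the _GNU_SOURCE define.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* needed for O_DIRECT with glibc */
#endif
#include <fcntl.h>
#include <stdbool.h>

/*
 * direct=on       -> bypass the host page cache (O_DIRECT)
 * write-cache=off -> every write must be stable on return (O_DSYNC)
 * The result is meant to be OR'ed into the flags passed to open().
 */
static int cache_open_flags(bool write_cache, bool direct)
{
    int flags = 0;

    if (direct) {
        flags |= O_DIRECT;
    }
    if (!write_cache) {
        flags |= O_DSYNC;
    }
    return flags;
}
```

The two properties are independent, so each one contributes at most one flag and the four rows of the table fall out of two separate tests.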

It's still unclear what our default mode will be.

The problem is, O_DSYNC has terrible performance on ext4 when barrier=1.

write-cache=on,direct=off is a bad default because if you do a simple 
performance test, you'll get better than native and that upsets people.


write-cache=off,direct=off is a bad default because ext4's default 
config sucks with this.


likewise, write-cache=off, direct=on is a bad default for the same reason.

Regards,

Anthony Liguori

