Re: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle 64-bit bar with more than 4G size

2015-08-12 Thread Jan Beulich
>>> On 05.08.15 at 04:02,  wrote:
> @@ -491,8 +474,9 @@ static int xen_pt_bar_reg_write(XenPCIPassthroughState 
> *s, XenPTReg *cfg_entry,
>  bar_ro_mask = XEN_PT_BAR_IO_RO_MASK | (r_size - 1);
>  break;
>  case XEN_PT_BAR_FLAG_UPPER:
> +r = &d->io_regions[index-1];

Perhaps worth an assert(index > 0)?

Jan




Re: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle 64-bit bar with more than 4G size

2015-08-12 Thread Wu, Feng


> -Original Message-
> From: qemu-devel-bounces+feng.wu=intel@nongnu.org
> [mailto:qemu-devel-bounces+feng.wu=intel@nongnu.org] On Behalf Of
> Jan Beulich
> Sent: Wednesday, August 12, 2015 2:59 PM
> To: Wu, Feng
> Cc: xen-de...@lists.xensource.com; qemu-devel@nongnu.org;
> stefano.stabell...@eu.citrix.com
> Subject: Re: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle 64-bit
> bar with more than 4G size
> 
> >>> On 05.08.15 at 04:02,  wrote:
> > @@ -491,8 +474,9 @@ static int
> xen_pt_bar_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
> >  bar_ro_mask = XEN_PT_BAR_IO_RO_MASK | (r_size - 1);
> >  break;
> >  case XEN_PT_BAR_FLAG_UPPER:
> > +r = &d->io_regions[index-1];
> 
> Perhaps worth an assert(index > 0)?

No problem, I will add it. BTW, do you have any other comments about this 
patch? If no, I am
going to send out the new version with this changes.

Thanks,
Feng

> 
> Jan
> 




Re: [Qemu-devel] [PATCH v6 0/2] vhost user: Add live migration

2015-08-12 Thread Michael S. Tsirkin
On Thu, Aug 06, 2015 at 10:45:07AM +0200, Thibaut Collet wrote:
> v5->v6
> 1. First patch: remove a warning log
> 2. Second patch: rename some functions to be more explicit on the purpose of
>these functions.
> 
> The first patch provides limited live migration:
> - guest without GUEST_ANNOUNCE capabilities does not announce migration ending
> and peers talking to the migrated guest can suffer important network outage.
> - Some packets sent by remote peers to the guest can be lost during migration.
> 
> The second patch fixes limitation for guest without GUEST_ANNOUNCE 
> capabilities
> and patches from Marc Andre Lureau fix potential packet's lost during 
> migration.
> 
> Thibaut Collet (2):
>   vhost user: add support of live migration
>   vhost user: add rarp sending after live migration for legacy guest

I think these patches need to be rebased on top of Marc Andre's ones,
and use protocol flags to negotiate capabilities.
Right?


>  docs/specs/vhost-user.txt |   15 +++
>  hw/net/vhost_net.c|   18 ++
>  hw/virtio/vhost-backend.c |3 ++-
>  hw/virtio/vhost-user.c|   32 ++--
>  include/hw/virtio/vhost-backend.h |2 ++
>  include/net/vhost_net.h   |1 +
>  net/vhost-user.c  |   31 +--
>  7 files changed, 97 insertions(+), 5 deletions(-)
> 
> -- 
> 1.7.10.4



Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread alvise rigo
I think that tlb_flush_entry is not enough, since in theory another
vCPU could have a different TLB address referring the same phys
address.

alvise

On Tue, Aug 11, 2015 at 6:32 PM, Paolo Bonzini  wrote:
>
>
> On 11/08/2015 18:11, alvise rigo wrote:
>>> > Why flush the entire cache (I understand you mean TLB)?
>> Sorry, I meant the TLB.
>> If for each removal of an exclusive entry we set also the bit to 1, we
>> force the following LL to make a tlb_flush() on every vCPU.
>
> What if you only flush one entry with tlb_flush_entry (on every vCPU)?
>
> Paolo



Re: [Qemu-devel] [PATCH for-2.5 14/30] m68k: allow adda/suba to add/sub word

2015-08-12 Thread Richard Henderson

On 08/09/2015 01:13 PM, Laurent Vivier wrote:

Signed-off-by: Laurent Vivier
---
  target-m68k/translate.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [Qemu-devel] [PATCH for-2.5 0/4] vhost: cleanups and switching to sorted memory map

2015-08-12 Thread Michael S. Tsirkin
On Tue, Jul 28, 2015 at 04:52:49PM +0200, Igor Mammedov wrote:
> making memory map a sorted array helps to simplify
> and speed up lookup/insertion and deletion ops on it.
> It also makes insertion/deteletion code easier to read.

I'm a bit confused by all the vhost patches you sent.
Is this series still something you want me to merge?
Or was it the one that caused memory corruptions with
mem hotplug?

> Igor Mammedov (4):
>   vhost: codding style fix tab indents
>   vhost: simplify/speedify vhost_dev_assign_memory()
>   vhost: switch region lookup from linear to bsearch
>   vhost: simplify/speedify vhost_dev_unassign_memory()
> 
>  hw/virtio/vhost.c | 252 
> ++
>  1 file changed, 123 insertions(+), 129 deletions(-)
> 
> -- 
> 1.8.3.1



Re: [Qemu-devel] [PATCH v2 1/6] hw/pci: Use pow2ceil() rather than hand-calculation

2015-08-12 Thread Michael S. Tsirkin
On Fri, Jul 24, 2015 at 01:33:07PM +0100, Peter Maydell wrote:
> A couple of places in hw/pci use an inline calculation to round a
> size up to the next largest power of 2. We have a utility routine
> for this, so use it.
> 
> (The behaviour of the old code is different if the size value
> is 0 -- it would leave it as 0 rather than rounding up to 1,
> but in both cases we know the size can't be 0.
> In the case where the size value had bit 31 set, the old code
> would invoke undefined behaviour; the new code will give a
> result of 0. Presumably that could never happen either.)
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Michael S. Tsirkin 

> ---
>  hw/pci/msix.c | 4 +---
>  hw/pci/pci.c  | 4 +---
>  2 files changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index 7716bf3..2fdada4 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -314,9 +314,7 @@ int msix_init_exclusive_bar(PCIDevice *dev, unsigned 
> short nentries,
>  bar_size = bar_pba_offset + bar_pba_size;
>  }
>  
> -if (bar_size & (bar_size - 1)) {
> -bar_size = 1 << qemu_fls(bar_size);
> -}
> +bar_size = pow2ceil(bar_size);
>  
>  name = g_strdup_printf("%s-msix", dev->name);
>  memory_region_init(&dev->msix_exclusive_bar, OBJECT(dev), name, 
> bar_size);
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index a017614..502da8d 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2065,9 +2065,7 @@ static void pci_add_option_rom(PCIDevice *pdev, bool 
> is_default_rom,
>  g_free(path);
>  return;
>  }
> -if (size & (size - 1)) {
> -size = 1 << qemu_fls(size);
> -}
> +size = pow2ceil(size);
>  
>  vmsd = qdev_get_vmsd(DEVICE(pdev));
>  
> -- 
> 1.9.1



Re: [Qemu-devel] [PATCH v2 2/6] hw/virtio/virtio-pci: Use pow2ceil() rather than hand-calculation

2015-08-12 Thread Michael S. Tsirkin
On Fri, Jul 24, 2015 at 01:33:08PM +0100, Peter Maydell wrote:
> Use the utility function pow2ceil() for rounding up to the next
> largest power of 2, rather than inline calculation.
> 
> Signed-off-by: Peter Maydell 

Reviewed-by: Michael S. Tsirkin 

> ---
>  hw/virtio/virtio-pci.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 283401a..845f52f 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -1497,9 +1497,7 @@ static void virtio_pci_device_plugged(DeviceState *d, 
> Error **errp)
>  if (legacy) {
>  size = VIRTIO_PCI_REGION_SIZE(&proxy->pci_dev)
>  + virtio_bus_get_vdev_config_len(bus);
> -if (size & (size - 1)) {
> -size = 1 << qemu_fls(size);
> -}
> +size = pow2ceil(size);
>  
>  memory_region_init_io(&proxy->bar, OBJECT(proxy),
>&virtio_pci_config_ops,
> -- 
> 1.9.1



Re: [Qemu-devel] [PATCH for-2.5 15/30] m68k: add more modes to movem

2015-08-12 Thread Richard Henderson

On 08/09/2015 01:13 PM, Laurent Vivier wrote:

+opsize = (insn & 0x40) != 0 ? OS_LONG : OS_WORD;
+incr = opsize_bytes(opsize);
+if (!is_load && (insn & 070) == 040) {
+for (i = 15; i >= 0; i--, mask >>= 1) {


This has got to be wrong.  Just because it's pre-decrement doesn't mean you 
should skip all of the loads.



r~



Re: [Qemu-devel] [PATCH] linux-user: elfload: Still use TARGET_PAGE_SIZE for i386 guest

2015-08-12 Thread gchen gchen
On 2015年08月12日 14:45, Richard Henderson wrote:
> On 08/11/2015 11:03 PM, gchen gchen wrote:
>> Under Alpha host, for ubuntu12.04.5 i386 guest, it will cause failure:
>> "Invalid ELF image for this architecture".
>>
>> The related issue commit is "a70daba linux-user: Tell guest about big
>> host page sizes".
>>
>> Signed-off-by: Chen Gang
>> ---
>>   linux-user/elfload.c | 4 
>>   1 file changed, 4 insertions(+)
>
> Nack.  There's 99 problems with host page size> guest page size.  This
> solves none of them, and in the hackiest way possible.
>

Under alpha virtual machine, if set i386 guest page size 8KB, it will
cause failure directly (any dynamically linked binaries can not work).

This fix can let i386 bash/cp/vi/ls run under alpha virtual machine,
although there maybe be many other issues (which I did not meet, now).

Do you have any other ideas for solving this issue?  e.g. can we rebuild
all i386 programs with segment alignment to 8KB? I am not quite sure,
but some i386 programs (which are interested in) have no source code!
:-(.


Thanks.
--
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
  

Re: [Qemu-devel] [PATCH for-2.5 01/30] m68k: define m680x0 CPUs and features

2015-08-12 Thread Laurent Vivier


Le 12/08/2015 01:13, Richard Henderson a écrit :
> On 08/09/2015 01:13 PM, Laurent Vivier wrote:
>>  INSN(undef, , , CF_ISA_A);
>> +INSN(undef, , , M68000);
>>  INSN(arith_im,  0080, fff8, CF_ISA_A);
>> +INSN(arith_im,  , ff00, M68000);
>> +INSN(undef, 00c0, ffc0, M68000);
>>  INSN(bitrev,00c0, fff8, CF_ISA_APLUSC);
>>  INSN(bitop_reg, 0100, f1c0, CF_ISA_A);
>> +INSN(bitop_reg, 0100, f1c0, M68000);
>>  INSN(bitop_reg, 0140, f1c0, CF_ISA_A);
>> +INSN(bitop_reg, 0140, f1c0, M68000);
> 
> There's a *lot* of repetition in here.
> 
> Can we also introduce a BASE() macro that's like INSN() except that it doesn't
> bother checking m68k_feature?  That way if both CF_ISA_A and M68000 are set, 
> we
> don't have to duplicate the entry.

Thank you, good idea.

Laurent



Re: [Qemu-devel] [PATCH for-2.5 15/30] m68k: add more modes to movem

2015-08-12 Thread Andreas Schwab
Richard Henderson  writes:

> On 08/09/2015 01:13 PM, Laurent Vivier wrote:
>> +opsize = (insn & 0x40) != 0 ? OS_LONG : OS_WORD;
>> +incr = opsize_bytes(opsize);
>> +if (!is_load && (insn & 070) == 040) {
>> +for (i = 15; i >= 0; i--, mask >>= 1) {
>
> This has got to be wrong.  Just because it's pre-decrement doesn't mean
> you should skip all of the loads.

Pre-dec only supports reg-to-mem, and is special because mask is bit
reversed.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



Re: [Qemu-devel] [PATCH COLO-Frame v8 00/34] COarse-grain LOck-stepping(COLO) Virtual Machines for Non-stop Service (FT)

2015-08-12 Thread zhanghailiang

On 2015/8/5 19:24, Dr. David Alan Gilbert wrote:

* zhanghailiang (zhang.zhanghaili...@huawei.com) wrote:

This is the 8th version of COLO.

Here is only COLO frame part, include: VM checkpoint,
failover, proxy API, block replication API, not include block replication.
The block part is treated as a separate series.

As usual, we provide 'basic' and 'developing' branches in github:
https://github.com/coloft/qemu/commits/colo-v1.5-basic
https://github.com/coloft/qemu/commits/colo-v1.5-developing (more features)

The 'basic' branch is exactly the same with this patch series,
We will keep this series simple as possible, just for easy review.

The extra features in colo-v1.5-developing branch:
1) Separate ram and device save/load process to reduce size of extra memory
used during checkpoint
2) Live migrate part of dirty pages to slave during sleep time.
3) You get the statistic info about checkpoint by command 'info migrate'


I'm hitting a problem that I think is due to the new global_state section
that Juan recently added; if I cause a failover I hit:

ERROR: invalid runstate transition: 'colo' -> 'prelaunch'

(on the secondary).
I think the problem is that, the global_state is only sent for any 'unusual' 
states,
so in the first migration that gets done at startup, 'prelaunch' is included in 
the stream
in the global state, but then for later checkpoints the global_state probably 
isn't
sent.

I hacked around it by making global_state_needed return false; I guess
we need to find a better fix!



Hi Dave,

This problem can be fixed by the following modification:

diff --git a/migration/colo.c b/migration/colo.c
index 89f4a3f..fccb384 100644
--- a/migration/colo.c
+++ b/migration/colo.c
@@ -424,6 +424,10 @@ static void *colo_thread(void *opaque)
 qemu_mutex_unlock_iothread();
 trace_colo_vm_state_change("stop", "run");

+if (global_state_store() < 0) {
+goto out;
+}
+

Actually, the global_state is sent in every cycle of COLO checkpoint.
But the value is always the old one (prelaunch). It is only stored
in the first migration's last stage, but not been updated after going into colo 
mode,
where the new state is 'running'.


Thanks,
zhanghailiang






Please reference to the follow link to test COLO.
http://wiki.qemu.org/Features/COLO.

COLO is a totally new feature which is still in early stage,
your comments and feedback are warmly welcomed.

NOTE:
We have decided to re-implement the colo proxy in userspace (In qemu exactly).
you can find the discussion about why & how to realize the colo proxy in qemu 
from the follow link:
http://lists.nongnu.org/archive/html/qemu-devel/2015-07/msg04069.html

TODO:
1. COLO function switch on/off
2. The capability of continuous FT
3. Optimize the performance.

v8:
- Move some global variables into MigrationIncomingState and MigrationState
- Move some cleanup work form colo thread and colo incoming thread into failover
   BH function and also fix the code logic for the cleanup work.
- fix the bug that colo thread and colo incoming thread possibly block in the
   socket 'recv' call when do failover work.
- Optimize colo_flush_ram_cache()
- Add migration state for incoming side, we use the state to verify if migration
   incoming side is in COLO state or not (Patch 5).
- Drop the patch 'COLO: Disable qdev hotplug when VM is in COLO mode', since it 
is not correct.

zhanghailiang (34):
   configure: Add parameter for configure to enable/disable COLO support
   migration: Introduce capability 'colo' to migration
   COLO: migrate colo related info to slave
   colo-comm/migration: skip colo info section for special cases
   migration: Add state records for migration incoming
   migration: Integrate COLO checkpoint process into migration
   migration: Integrate COLO checkpoint process into loadvm
   COLO: Implement colo checkpoint protocol
   COLO: Add a new RunState RUN_STATE_COLO
   QEMUSizedBuffer: Introduce two help functions for qsb
   COLO: Save VM state to slave when do checkpoint
   COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
   COLO VMstate: Load VM state into qsb before restore it
   arch_init: Start to trace dirty pages of SVM
   COLO RAM: Flush cached RAM into SVM's memory
   COLO failover: Introduce a new command to trigger a failover
   COLO failover: Introduce state to record failover process
   COLO failover: Implement COLO primary/secondary vm failover work
   qmp event: Add event notification for COLO error
   COLO failover: Don't do failover during loading VM's state
   COLO: Add new command parameter 'forward_nic' 'colo_script' for net
   COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
   tap: Make launch_script() public
   COLO NIC: Implement colo nic device interface configure()
   colo-nic: Handle secondary VM's original net device configure
   COLO NIC: Implement colo nic init/destroy function
   COLO NIC: Some init work related with proxy module
   COLO: Handle nfnetlink message from proxy m

Re: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle 64-bit bar with more than 4G size

2015-08-12 Thread Jan Beulich
>>> On 12.08.15 at 09:10,  wrote:

> 
>> -Original Message-
>> From: qemu-devel-bounces+feng.wu=intel@nongnu.org 
>> [mailto:qemu-devel-bounces+feng.wu=intel@nongnu.org] On Behalf Of
>> Jan Beulich
>> Sent: Wednesday, August 12, 2015 2:59 PM
>> To: Wu, Feng
>> Cc: xen-de...@lists.xensource.com; qemu-devel@nongnu.org;
>> stefano.stabell...@eu.citrix.com 
>> Subject: Re: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle 
>> 64-bit
>> bar with more than 4G size
>> 
>> >>> On 05.08.15 at 04:02,  wrote:
>> > @@ -491,8 +474,9 @@ static int
>> xen_pt_bar_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
>> >  bar_ro_mask = XEN_PT_BAR_IO_RO_MASK | (r_size - 1);
>> >  break;
>> >  case XEN_PT_BAR_FLAG_UPPER:
>> > +r = &d->io_regions[index-1];
>> 
>> Perhaps worth an assert(index > 0)?
> 
> No problem, I will add it. BTW, do you have any other comments about this 
> patch? If no, I am
> going to send out the new version with this changes.

No - everything else looks to make sense (but continues to need
testing).

Jan




Re: [Qemu-devel] [PATCH for-2.5 05/30] m68k: define operand sizes

2015-08-12 Thread Laurent Vivier


Le 12/08/2015 06:07, Richard Henderson a écrit :
> On 08/09/2015 01:13 PM, Laurent Vivier wrote:
>> -#define OS_BYTE 0
>> -#define OS_WORD 1
>> -#define OS_LONG 2
>> -#define OS_SINGLE 4
>> -#define OS_DOUBLE 5
>> +#define OS_BYTE 1
>> +#define OS_WORD 2
>> +#define OS_LONG 3
>> +#define OS_SINGLE   4
>> +#define OS_DOUBLE   5
>> +#define OS_EXTENDED 6
>> +#define OS_PACKED   7
>>
> 
> Is there a reason you've skipped the 0 value when adding the new values?

I think there is no reason, but if I change the value I have to update
abdc_mem, sbcd_mem instructions as they use it as an
incrementer/decrementer. I agree, it's a strange idea.

> 
>> +static inline int insn_opsize(int insn, int pos)
>> +{
>> +switch ((insn >> pos) & 3) {
> 
> 
> In particular, that change means that insn_opsize is more complicated
> than needed.  Further, is there any reason for POS to be a varable? 
> Isn't it at the same place for all insns?
> 
>> +static inline int ext_opsize(int ext, int pos)
> 
> This should probably wait until the fp insns get added.

Yes.

Laurent



Re: [Qemu-devel] [PATCH for-2.5 05/30] m68k: define operand sizes

2015-08-12 Thread Andreas Schwab
Laurent Vivier  writes:

> Le 12/08/2015 06:07, Richard Henderson a écrit :
>> Is there a reason you've skipped the 0 value when adding the new values?
>
> I think there is no reason, but if I change the value I have to update
> abdc_mem, sbcd_mem instructions as they use it as an
> incrementer/decrementer. I agree, it's a strange idea.

Those uses are really opsize_bytes(OS_BYTE), technically.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



Re: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle 64-bit bar with more than 4G size

2015-08-12 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Wednesday, August 12, 2015 4:43 PM
> To: Wu, Feng
> Cc: stefano.stabell...@eu.citrix.com; xen-de...@lists.xensource.com;
> qemu-devel@nongnu.org
> Subject: RE: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle 64-bit
> bar with more than 4G size
> 
> >>> On 12.08.15 at 09:10,  wrote:
> 
> >
> >> -Original Message-
> >> From: qemu-devel-bounces+feng.wu=intel@nongnu.org
> >> [mailto:qemu-devel-bounces+feng.wu=intel@nongnu.org] On Behalf Of
> >> Jan Beulich
> >> Sent: Wednesday, August 12, 2015 2:59 PM
> >> To: Wu, Feng
> >> Cc: xen-de...@lists.xensource.com; qemu-devel@nongnu.org;
> >> stefano.stabell...@eu.citrix.com
> >> Subject: Re: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle
> 64-bit
> >> bar with more than 4G size
> >>
> >> >>> On 05.08.15 at 04:02,  wrote:
> >> > @@ -491,8 +474,9 @@ static int
> >> xen_pt_bar_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
> >> >  bar_ro_mask = XEN_PT_BAR_IO_RO_MASK | (r_size - 1);
> >> >  break;
> >> >  case XEN_PT_BAR_FLAG_UPPER:
> >> > +r = &d->io_regions[index-1];
> >>
> >> Perhaps worth an assert(index > 0)?
> >
> > No problem, I will add it. BTW, do you have any other comments about this
> > patch? If no, I am
> > going to send out the new version with this changes.
> 
> No - everything else looks to make sense (but continues to need
> testing).
> 

I don't have such a device in hand. Can anybody who has such a device help to 
test this
patch? It would be highly appreciated!

Thanks,
Feng

> Jan




Re: [Qemu-devel] [PATCH] linux-user: elfload: Still use TARGET_PAGE_SIZE for i386 guest

2015-08-12 Thread gchen gchen
On 2015年08月12日 15:59, gchen gchen wrote:
> On 2015年08月12日 14:45, Richard Henderson wrote:
>> On 08/11/2015 11:03 PM, gchen gchen wrote:
>>> Under Alpha host, for ubuntu12.04.5 i386 guest, it will cause failure:
>>> "Invalid ELF image for this architecture".
>>>
>>> The related issue commit is "a70daba linux-user: Tell guest about big
>>> host page sizes".
>>>
>>> Signed-off-by: Chen Gang
>>> ---
>>>   linux-user/elfload.c | 4 
>>>   1 file changed, 4 insertions(+)
>>
>> Nack.  There's 99 problems with host page size> guest page size.  This
>> solves none of them, and in the hackiest way possible.
>>
>
> Under alpha virtual machine, if set i386 guest page size 8KB, it will
> cause failure directly (any dynamically linked binaries can not work).
>
> This fix can let i386 bash/cp/vi/ls run under alpha virtual machine,
> although there maybe be many other issues (which I did not meet, now).
>
> Do you have any other ideas for solving this issue?  e.g. can we rebuild
> all i386 programs with segment alignment to 8KB? I am not quite sure,
> but some i386 programs (which are interested in) have no source code!
> :-(.
>

By the way, the patches about the alpha are my current work, I can do
it during my work time, it will have no any negative effect with my
tilegx linux-user development in my free time. :-)

One of my current work contents are: "run ubuntu12.04.5 (no graphic)
under alpha virtual machine, and run wine under ubuntu12.04.5 i386
guest under ubuntu alpha vm, and wine will run Win32 graphic programs".

 - I want to x86_64 laptop run Alpha linux-user run i386 linux-user run
   wine run Win32 graphic programs. I guess, I need build X libs under
   ubuntu Alpha vm (which nographic) -- use them for alpha linux-user.

 - At present, x86_64 laptop run arm linux-user run i386 linux-user run
   wine run Windows i386 graphic programs are OK, the performance is
   acceptable! the qemu code keeps no touch -- our qemu is very good!!!.


Welcome any ideas, suggestions and completions.

Thanks.
--
Chen Gang

Open, share, and attitude like air, water, and life which God blessed
  

[Qemu-devel] [PATCH 1/2] migration: do cleanup operation after completion

2015-08-12 Thread Liang Li
Because of the patch 3ea3b7fa9af067982f34b of kvm, now the migration_end()
is a time consuming operation, which takes about dozens of milliseconds, and
will prolong VM downtime. Such an operation should be done after migration
completion.

For a VM with 8G RAM, this patch can reduce the VM downtime about 32 ms during
live migration.

Signed-off-by: Liang Li 
---
 migration/block.c | 1 -
 migration/migration.c | 5 -
 migration/ram.c   | 1 -
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index ed865ed..85496fd 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -750,7 +750,6 @@ static int block_save_complete(QEMUFile *f, void *opaque)
 
 qemu_put_be64(f, BLK_MIG_FLAG_EOS);
 
-blk_mig_cleanup();
 return 0;
 }
 
diff --git a/migration/migration.c b/migration/migration.c
index 662e77e..c22095e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -923,6 +923,7 @@ static void *migration_thread(void *opaque)
 int64_t initial_bytes = 0;
 int64_t max_size = 0;
 int64_t start_time = initial_time;
+int64_t end_time;
 bool old_vm_running = false;
 
 rcu_register_thread();
@@ -1008,8 +1009,10 @@ static void *migration_thread(void *opaque)
 }
 
 qemu_mutex_lock_iothread();
+end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+qemu_savevm_state_cancel();
+
 if (s->state == MIGRATION_STATUS_COMPLETED) {
-int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 uint64_t transferred_bytes = qemu_ftell(s->file);
 s->total_time = end_time - s->total_time;
 s->downtime = end_time - start_time;
diff --git a/migration/ram.c b/migration/ram.c
index 7f007e6..6249f6e 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1269,7 +1269,6 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
 
 rcu_read_unlock();
 
-migration_end();
 qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
 
 return 0;
-- 
1.9.1




[Qemu-devel] [PATCH 2/2] migration: rename qemu_savevm_state_cancel

2015-08-12 Thread Liang Li
The function qemu_savevm_state_cancel is called after the migration
in migration_thread, it seems strange to 'cancel' it after completion,
rename it to qemu_savevm_state_cleanup looks better.

Signed-off-by: Liang Li 
---
 include/sysemu/sysemu.h | 2 +-
 migration/migration.c   | 4 ++--
 migration/savevm.c  | 6 +++---
 trace-events| 2 +-
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 44570d1..9cc0240 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -88,7 +88,7 @@ void qemu_savevm_state_begin(QEMUFile *f,
 void qemu_savevm_state_header(QEMUFile *f);
 int qemu_savevm_state_iterate(QEMUFile *f);
 void qemu_savevm_state_complete(QEMUFile *f);
-void qemu_savevm_state_cancel(void);
+void qemu_savevm_state_cleanup(void);
 uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size);
 int qemu_loadvm_state(QEMUFile *f);
 
diff --git a/migration/migration.c b/migration/migration.c
index c22095e..e7fe50f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -561,7 +561,7 @@ static void migrate_fd_cleanup(void *opaque)
 assert(s->state != MIGRATION_STATUS_ACTIVE);
 
 if (s->state != MIGRATION_STATUS_COMPLETED) {
-qemu_savevm_state_cancel();
+qemu_savevm_state_cleanup();
 if (s->state == MIGRATION_STATUS_CANCELLING) {
 migrate_set_state(s, MIGRATION_STATUS_CANCELLING,
   MIGRATION_STATUS_CANCELLED);
@@ -1010,7 +1010,7 @@ static void *migration_thread(void *opaque)
 
 qemu_mutex_lock_iothread();
 end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
-qemu_savevm_state_cancel();
+qemu_savevm_state_cleanup();
 
 if (s->state == MIGRATION_STATUS_COMPLETED) {
 uint64_t transferred_bytes = qemu_ftell(s->file);
diff --git a/migration/savevm.c b/migration/savevm.c
index 6071215..b141b17 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -905,11 +905,11 @@ uint64_t qemu_savevm_state_pending(QEMUFile *f, uint64_t 
max_size)
 return ret;
 }
 
-void qemu_savevm_state_cancel(void)
+void qemu_savevm_state_cleanup(void)
 {
 SaveStateEntry *se;
 
-trace_savevm_state_cancel();
+trace_savevm_state_cleanup();
 QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
 if (se->ops && se->ops->cancel) {
 se->ops->cancel(se->opaque);
@@ -946,7 +946,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
 ret = qemu_file_get_error(f);
 }
 if (ret != 0) {
-qemu_savevm_state_cancel();
+qemu_savevm_state_cleanup();
 error_setg_errno(errp, -ret, "Error while writing VM state");
 }
 return ret;
diff --git a/trace-events b/trace-events
index 94bf3bb..102373c 100644
--- a/trace-events
+++ b/trace-events
@@ -1195,7 +1195,7 @@ savevm_state_begin(void) ""
 savevm_state_header(void) ""
 savevm_state_iterate(void) ""
 savevm_state_complete(void) ""
-savevm_state_cancel(void) ""
+savevm_state_cleanup(void) ""
 vmstate_save(const char *idstr, const char *vmsd_name) "%s, %s"
 vmstate_load(const char *idstr, const char *vmsd_name) "%s, %s"
 qemu_announce_self_iter(const char *mac) "%s"
-- 
1.9.1




Re: [Qemu-devel] [ARM SMBIOS V3 PATCH 4/5] smbios: add smbios 3.0 support

2015-08-12 Thread Michael S. Tsirkin
On Tue, Aug 11, 2015 at 10:08:21PM -0400, Wei Huang wrote:
> This patch adds support for SMBIOS 3.0 entry point. When caller invokes
> smbios_set_defaults(), it can specify entry point as 2.1 or 3.0. Then
> smbios_get_tables() will return the entry point table in right format.
> 
> Acked-by: Gabriel Somlo 
> Tested-by: Gabriel Somlo 
> Tested-by: Leif Lindholm 
> Signed-off-by: Wei Huang 
> ---
>  hw/i386/pc_piix.c  |  3 +-
>  hw/i386/pc_q35.c   |  3 +-
>  hw/smbios/smbios.c | 83 
> +-
>  include/hw/smbios/smbios.h | 51 
>  4 files changed, 101 insertions(+), 39 deletions(-)
> 
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 653c710..04636b1 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -173,7 +173,8 @@ static void pc_init1(MachineState *machine)
>  MachineClass *mc = MACHINE_GET_CLASS(machine);
>  /* These values are guest ABI, do not change */
>  smbios_set_defaults("QEMU", "Standard PC (i440FX + PIIX, 1996)",
> -mc->name, smbios_legacy_mode, 
> smbios_uuid_encoded);
> +mc->name, smbios_legacy_mode, 
> smbios_uuid_encoded,
> +SMBIOS_ENTRY_POINT_21);
>  }
>  
>  /* allocate ram and load rom/bios */
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index d83df14..061507d 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -165,7 +165,8 @@ static void pc_q35_init(MachineState *machine)
>  if (smbios_defaults) {
>  /* These values are guest ABI, do not change */
>  smbios_set_defaults("QEMU", "Standard PC (Q35 + ICH9, 2009)",
> -mc->name, smbios_legacy_mode, 
> smbios_uuid_encoded);
> +mc->name, smbios_legacy_mode, 
> smbios_uuid_encoded,
> +SMBIOS_ENTRY_POINT_21);
>  }
>  
>  /* allocate ram and load rom/bios */
> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> index efdbb5d..de3cab5 100644
> --- a/hw/smbios/smbios.c
> +++ b/hw/smbios/smbios.c
> @@ -55,7 +55,10 @@ static uint8_t *smbios_tables;
>  static size_t smbios_tables_len;
>  static unsigned smbios_table_max;
>  static unsigned smbios_table_cnt;
> -static struct smbios_entry_point ep;
> +static SmbiosEntryPointType smbios_ep_type = SMBIOS_ENTRY_POINT_21;
> +
> +static SmbiosEntryPoint ep;
> +static size_t ep_length;

I think it's better to drop ep_length and write a function that
retrieves length from ep, based on anchor string.  Don't duplicate
information.


>  
>  static int smbios_type4_count = 0;
>  static bool smbios_immutable;
> @@ -771,11 +774,12 @@ void smbios_set_cpuid(uint32_t version, uint32_t 
> features)
>  
>  void smbios_set_defaults(const char *manufacturer, const char *product,
>   const char *version, bool legacy_mode,
> - bool uuid_encoded)
> + bool uuid_encoded, SmbiosEntryPointType ep_type)
>  {
>  smbios_have_defaults = true;
>  smbios_legacy = legacy_mode;
>  smbios_uuid_encoded = uuid_encoded;
> +smbios_ep_type = ep_type;
>  
>  /* drop unwanted version of command-line file blob(s) */
>  if (smbios_legacy) {
> @@ -808,26 +812,59 @@ void smbios_set_defaults(const char *manufacturer, 
> const char *product,
>  
>  static void smbios_entry_point_setup(void)
>  {
> -memcpy(ep.anchor_string, "_SM_", 4);
> -memcpy(ep.intermediate_anchor_string, "_DMI_", 5);
> -ep.length = sizeof(struct smbios_entry_point);
> -ep.entry_point_revision = 0; /* formatted_area reserved, per spec v2.1+ 
> */
> -memset(ep.formatted_area, 0, 5);
> -
> -/* compliant with smbios spec v2.8 */
> -ep.smbios_major_version = 2;
> -ep.smbios_minor_version = 8;
> -ep.smbios_bcd_revision = 0x28;
> -
> -/* set during table construction, but BIOS may override: */
> -ep.structure_table_length = cpu_to_le16(smbios_tables_len);
> -ep.max_structure_size = cpu_to_le16(smbios_table_max);
> -ep.number_of_structures = cpu_to_le16(smbios_table_cnt);
> -
> -/* BIOS must recalculate: */
> -ep.checksum = 0;
> -ep.intermediate_checksum = 0;
> -ep.structure_table_address = cpu_to_le32(0);
> +switch (smbios_ep_type) {
> +case SMBIOS_ENTRY_POINT_21:
> +memcpy(ep.ep21.anchor_string, "_SM_", 4);
> +memcpy(ep.ep21.intermediate_anchor_string, "_DMI_", 5);
> +ep.ep21.length = sizeof(struct smbios_21_entry_point);
> +ep.ep21.entry_point_revision = 0; /* formatted_area reserved */
> +memset(ep.ep21.formatted_area, 0, 5);
> +
> +/* compliant with smbios spec v2.8 */
> +ep.ep21.smbios_major_version = 2;
> +ep.ep21.smbios_minor_version = 8;
> +ep.ep21.smbios_bcd_revision = 0x28;
> +
> +/* set during table construction, but BIOS may override: */
> +ep.ep21.structure_table

[Qemu-devel] [PATCH 0/2] Fix long vm downtime during live migration

2015-08-12 Thread Liang Li
Some cleanup operations take long time during the pause and copy stage,
especially with the KVM patch 3ea3b7fa9af067, do these operation after
the completion of live migration can help to reduce VM downtime.


Liang Li (2):
  migration: do cleanup operation after completion
  migration: rename qemu_savevm_state_cancel

 include/sysemu/sysemu.h | 2 +-
 migration/block.c   | 1 -
 migration/migration.c   | 7 +--
 migration/ram.c | 1 -
 migration/savevm.c  | 6 +++---
 trace-events| 2 +-
 6 files changed, 10 insertions(+), 9 deletions(-)

-- 
1.9.1




[Qemu-devel] [PATCH] MAINTAINERS: list smbios maintainers

2015-08-12 Thread Michael S. Tsirkin
Now that smbios has its own directory, list its
maintainers. Same people as ACPI so just reuse that
entry.

Signed-off-by: Michael S. Tsirkin 
---
 MAINTAINERS | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 978b717..a059d5d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -645,13 +645,15 @@ S: Supported
 F: include/hw/pci/*
 F: hw/pci/*
 
-ACPI
+ACPI/SMBIOS
 M: Michael S. Tsirkin 
 M: Igor Mammedov 
 S: Supported
 F: include/hw/acpi/*
+F: include/hw/smbios/*
 F: hw/mem/*
 F: hw/acpi/*
+F: hw/smbios/*
 F: hw/i386/acpi-build.[hc]
 F: hw/i386/*dsl
 F: hw/arm/virt-acpi-build.c
-- 
MST



Re: [Qemu-devel] [PATCH 0/2] Fix long vm downtime during live migration

2015-08-12 Thread Paolo Bonzini


On 12/08/2015 23:04, Liang Li wrote:
> Some cleanup operations take long time during the pause and copy stage,
> especially with the KVM patch 3ea3b7fa9af067, do these operation after
> the completion of live migration can help to reduce VM downtime.
> 
> 
> Liang Li (2):
>   migration: do cleanup operation after completion
>   migration: rename qemu_savevm_state_cancel
> 
>  include/sysemu/sysemu.h | 2 +-
>  migration/block.c   | 1 -
>  migration/migration.c   | 7 +--
>  migration/ram.c | 1 -
>  migration/savevm.c  | 6 +++---
>  trace-events| 2 +-
>  6 files changed, 10 insertions(+), 9 deletions(-)
> 

That's really a clever solution!

Reviewed-by: Paolo Bonzini 
Cc: qemu-sta...@nongnu.org



Re: [Qemu-devel] [PATCH 1/2] migration: do cleanup operation after completion

2015-08-12 Thread Paolo Bonzini


On 12/08/2015 23:04, Liang Li wrote:
> @@ -1008,8 +1009,10 @@ static void *migration_thread(void *opaque)
>  }
>  
>  qemu_mutex_lock_iothread();
> +end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> +qemu_savevm_state_cancel();
> +

You can remove the qemu_savevm_state_cancel() call from
migrate_fd_cleanup, too.  Probably best to post a v2 with that change as
well.

Paolo   



Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution

2015-08-12 Thread Paolo Bonzini


On 11/08/2015 23:34, Frederic Konrad wrote:
>>>
>> Also if qemu_cond_broadcast(&qemu_io_proceeded_cond) is being dropped
>> there is no point keeping the guff around in qemu_tcg_wait_io_event.
>>
> Yes good point.
> 
> BTW this leads to high consumption of host CPU eg: 100% per VCPU thread as
> the VCPUs thread are no longer waiting for qemu_io_proceeded_cond.

If the guest CPU is busy waiting, that's expected.  But if the guest CPU
is halted, it should not have 100% host CPU consumption.

Paolo



Re: [Qemu-devel] [PATCH v6 0/2] vhost user: Add live migration

2015-08-12 Thread Marc-André Lureau
Hi

On Wed, Aug 12, 2015 at 9:25 AM, Michael S. Tsirkin  wrote:
> I think these patches need to be rebased on top of Marc Andre's ones,
> and use protocol flags to negotiate capabilities.
> Right?

Correct. His patches should be applied before my migration tests, though.

-- 
Marc-André Lureau



[Qemu-devel] [PATCH] virtio/vhost: drop unnecessary VHOST_SET_VRING call

2015-08-12 Thread Marcel Apfelbaum
No need to send VHOST_SET_VRING_CALL to backend
before the negotiation with the guest is finished.

Signed-off-by: Marcel Apfelbaum 
---
 hw/virtio/vhost.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2712c6f..b448542 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -875,24 +875,13 @@ static void vhost_eventfd_del(MemoryListener *listener,
 static int vhost_virtqueue_init(struct vhost_dev *dev,
 struct vhost_virtqueue *vq, int n)
 {
-struct vhost_vring_file file = {
-.index = n,
-};
 int r = event_notifier_init(&vq->masked_notifier, 0);
+
 if (r < 0) {
 return r;
 }
 
-file.fd = event_notifier_get_fd(&vq->masked_notifier);
-r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_CALL, &file);
-if (r) {
-r = -errno;
-goto fail_call;
-}
 return 0;
-fail_call:
-event_notifier_cleanup(&vq->masked_notifier);
-return r;
 }
 
 static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
-- 
2.1.0




Re: [Qemu-devel] [PATCH] monitor: remove QAPI_EVENT_VSERPORT_CHANGE throttle

2015-08-12 Thread Daniel P. Berrange
On Tue, Aug 11, 2015 at 08:21:18PM +0200, Laszlo Ersek wrote:
> On 08/11/15 19:04, marcandre.lur...@redhat.com wrote:
> > From: Marc-André Lureau 
> > 
> > QAPI_EVENT_VSERPORT_CHANGE reports changes of a virtio serial port
> > state. However, the events may be for different ports, but the throttle
> > mechanism may replace the event for a different port, since it only
> > checks the event type.
> > 
> > libvirt relies on a correct state to be reported for all channels: the
> > qemu-ga commands may no longer work if the state is reported
> > disconnected. This can be triggered easily by having more than 1
> > virtio-serial (qemu-ga + spice agent for example), and restarting
> > quickly daemons or more realistically going quickly in and out of
> > suspend.
> > 
> > In a future patch, we may want to throttle events based on their
> > arguments, but this will likely require dynamic allocations and more
> > complicated code to insert/lookup pending events based on various
> > arguments ("id" in QAPI_EVENT_VSERPORT_CHANGE case).
> > 
> > Fixes:
> > https://bugzilla.redhat.com/show_bug.cgi?id=1244064
> > 
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  monitor.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/monitor.c b/monitor.c
> > index aeea2b5..e4d56f7 100644
> > --- a/monitor.c
> > +++ b/monitor.c
> > @@ -558,7 +558,6 @@ static void monitor_qapi_event_init(void)
> >  monitor_qapi_event_throttle(QAPI_EVENT_BALLOON_CHANGE, 1000);
> >  monitor_qapi_event_throttle(QAPI_EVENT_QUORUM_REPORT_BAD, 1000);
> >  monitor_qapi_event_throttle(QAPI_EVENT_QUORUM_FAILURE, 1000);
> > -monitor_qapi_event_throttle(QAPI_EVENT_VSERPORT_CHANGE, 1000);
> >  
> >  qmp_event_set_func_emit(monitor_qapi_event_queue);
> >  }
> > 
> 
> I don't mind the change (and the point of argument sensitivity is not
> lost on me), but note that this undoes the protection that is spelled
> out in the leading comment of the function, not visible in the context here:
> 
> /* Limit guest-triggerable events to 1 per second */
> 
> That was probably put in place in order to prevent a "malicious" guest
> from spamming the log files (and the CPU usage) of the management apps.
> 
> One solution to that would be arg sensitivity. Another would be a
> burst-capable (ie. a slowly re-filling, limited size token bucket)
> throttle, maintained per event type.
> 
> Until one of those gets written, I guess this patch is acceptable -- as
> long as mgmt people are okay with it. Daniel seems to be, so I don't mind.

Not having rate limiting is certainly an undesirable situation, but
rate limiting which discards unrecoverable data is even worse as it
makes the event useless for the app. We've got a number of events which
are not rate limited for this reason. Ultimately we do need to figure
out a better way to rate limit such events, but until then we have no
option but to skip rate limiting for events which have this issue.

NB, some events which have args are acceptable to rate limit. eg the
balloon change event is fine, because apps generally only care about
the /current/ balloon level, so if 3 balloon events are emitted and
the first 2 are discarded that is fine, because the most recent balloon
level is the only one that matters.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH] virtio/vhost: drop unnecessary VHOST_SET_VRING call

2015-08-12 Thread Michael S. Tsirkin
On Wed, Aug 12, 2015 at 01:19:51PM +0300, Marcel Apfelbaum wrote:
> No need to send VHOST_SET_VRING_CALL to backend
> before the negotiation with the guest is finished.
> 
> Signed-off-by: Marcel Apfelbaum 

Well - we do need to set it to the masked notifier initially to avoid
losing events.  You can't just drop it - need to move this call
somewhere else.

> ---
>  hw/virtio/vhost.c | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)
> 
> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> index 2712c6f..b448542 100644
> --- a/hw/virtio/vhost.c
> +++ b/hw/virtio/vhost.c
> @@ -875,24 +875,13 @@ static void vhost_eventfd_del(MemoryListener *listener,
>  static int vhost_virtqueue_init(struct vhost_dev *dev,
>  struct vhost_virtqueue *vq, int n)
>  {
> -struct vhost_vring_file file = {
> -.index = n,
> -};
>  int r = event_notifier_init(&vq->masked_notifier, 0);
> +
>  if (r < 0) {
>  return r;
>  }
>  
> -file.fd = event_notifier_get_fd(&vq->masked_notifier);
> -r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_CALL, &file);
> -if (r) {
> -r = -errno;
> -goto fail_call;
> -}
>  return 0;
> -fail_call:
> -event_notifier_cleanup(&vq->masked_notifier);
> -return r;
>  }
>  
>  static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
> -- 
> 2.1.0



Re: [Qemu-devel] [PATCH] virtio/vhost: drop unnecessary VHOST_SET_VRING call

2015-08-12 Thread Marcel Apfelbaum

On 08/12/2015 01:34 PM, Michael S. Tsirkin wrote:

On Wed, Aug 12, 2015 at 01:19:51PM +0300, Marcel Apfelbaum wrote:

No need to send VHOST_SET_VRING_CALL to backend
before the negotiation with the guest is finished.

Signed-off-by: Marcel Apfelbaum 


Well - we do need to set it to the masked notifier initially to avoid
losing events.  You can't just drop it - need to move this call
somewhere else.

What do we need to set?
I just dropped the call to VHOST_SET_VRING_CALL.

Thanks,
Marcel




---
  hw/virtio/vhost.c | 13 +
  1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2712c6f..b448542 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -875,24 +875,13 @@ static void vhost_eventfd_del(MemoryListener *listener,
  static int vhost_virtqueue_init(struct vhost_dev *dev,
  struct vhost_virtqueue *vq, int n)
  {
-struct vhost_vring_file file = {
-.index = n,
-};
  int r = event_notifier_init(&vq->masked_notifier, 0);
+
  if (r < 0) {
  return r;
  }

-file.fd = event_notifier_get_fd(&vq->masked_notifier);
-r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_CALL, &file);
-if (r) {
-r = -errno;
-goto fail_call;
-}
  return 0;
-fail_call:
-event_notifier_cleanup(&vq->masked_notifier);
-return r;
  }

  static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
--
2.1.0





Re: [Qemu-devel] [PATCH 1/2] migration: do cleanup operation after completion

2015-08-12 Thread Li, Liang Z
> 
> On 12/08/2015 23:04, Liang Li wrote:
> > @@ -1008,8 +1009,10 @@ static void *migration_thread(void *opaque)
> >  }
> >
> >  qemu_mutex_lock_iothread();
> > +end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > +qemu_savevm_state_cancel();
> > +
> 
> You can remove the qemu_savevm_state_cancel() call from
> migrate_fd_cleanup, too.  Probably best to post a v2 with that change as well.
> 
> Paolo

You are right. Done.

Liang



Re: [Qemu-devel] [PATCH] virtio/vhost: drop unnecessary VHOST_SET_VRING call

2015-08-12 Thread Michael S. Tsirkin
On Wed, Aug 12, 2015 at 02:10:56PM +0300, Marcel Apfelbaum wrote:
> On 08/12/2015 01:34 PM, Michael S. Tsirkin wrote:
> >On Wed, Aug 12, 2015 at 01:19:51PM +0300, Marcel Apfelbaum wrote:
> >>No need to send VHOST_SET_VRING_CALL to backend
> >>before the negotiation with the guest is finished.
> >>
> >>Signed-off-by: Marcel Apfelbaum 
> >
> >Well - we do need to set it to the masked notifier initially to avoid
> >losing events.  You can't just drop it - need to move this call
> >somewhere else.
> What do we need to set?
> I just dropped the call to VHOST_SET_VRING_CALL.
> 
> Thanks,
> Marcel

We use two eventfds: masked and unmasked one.
We switch dynamically dependent on msi mask value.
Code assumes we start out masked, so we need to match that.


> >
> >>---
> >>  hw/virtio/vhost.c | 13 +
> >>  1 file changed, 1 insertion(+), 12 deletions(-)
> >>
> >>diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>index 2712c6f..b448542 100644
> >>--- a/hw/virtio/vhost.c
> >>+++ b/hw/virtio/vhost.c
> >>@@ -875,24 +875,13 @@ static void vhost_eventfd_del(MemoryListener 
> >>*listener,
> >>  static int vhost_virtqueue_init(struct vhost_dev *dev,
> >>  struct vhost_virtqueue *vq, int n)
> >>  {
> >>-struct vhost_vring_file file = {
> >>-.index = n,
> >>-};
> >>  int r = event_notifier_init(&vq->masked_notifier, 0);
> >>+
> >>  if (r < 0) {
> >>  return r;
> >>  }
> >>
> >>-file.fd = event_notifier_get_fd(&vq->masked_notifier);
> >>-r = dev->vhost_ops->vhost_call(dev, VHOST_SET_VRING_CALL, &file);
> >>-if (r) {
> >>-r = -errno;
> >>-goto fail_call;
> >>-}
> >>  return 0;
> >>-fail_call:
> >>-event_notifier_cleanup(&vq->masked_notifier);
> >>-return r;
> >>  }
> >>
> >>  static void vhost_virtqueue_cleanup(struct vhost_virtqueue *vq)
> >>--
> >>2.1.0



Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM

2015-08-12 Thread Pavel Fedin
 Hello!

> I still think this is the wrong approach -- see my remarks
> in the previous round of patch review.

 Christoffer did not reply anything to your question back then. So - what to 
do? Probe for all possible GICs? Remove the probe at all?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia





Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM

2015-08-12 Thread Christoffer Dall
On Wed, Aug 12, 2015 at 1:44 PM, Pavel Fedin  wrote:
>  Hello!
>
>> I still think this is the wrong approach -- see my remarks
>> in the previous round of patch review.
>
>  Christoffer did not reply anything to your question back then. So - what to 
> do? Probe for all possible GICs? Remove the probe at all?
>
I may have missed something here, what was the question?

-Christoffer



[Qemu-devel] [PATCH 3/3] monitor: added generation of documentation for hmp-commands-info.hx

2015-08-12 Thread Denis V. Lunev
From: Pavel Butsykin 

It will be easier if you need to add info-commands to edit
only hmp-commands-info.hx, before this had to edit monitor.c and
hmp-commands.hx

Signed-off-by: Pavel Butsykin 
Signed-off-by: Denis V. Lunev 
CC: Luiz Capitulino 
CC: Paolo Bonzini 
CC: Peter Maydell 
---
 .gitignore   |   1 +
 Makefile |   9 ++--
 hmp-commands-info.hx |   4 ++
 hmp-commands.hx  | 120 ---
 qemu-doc.texi|   2 +
 5 files changed, 13 insertions(+), 123 deletions(-)

diff --git a/.gitignore b/.gitignore
index 61bc492..f1c881a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -49,6 +49,7 @@
 /qemu-ga
 /qemu-bridge-helper
 /qemu-monitor.texi
+/qemu-monitor-info.texi
 /qmp-commands.txt
 /vscclient
 /fsdev/virtfs-proxy-helper
diff --git a/Makefile b/Makefile
index 340d9c8..768422b 100644
--- a/Makefile
+++ b/Makefile
@@ -344,7 +344,7 @@ qemu-%.tar.bz2:
$(SRC_PATH)/scripts/make-release "$(SRC_PATH)" "$(patsubst 
qemu-%.tar.bz2,%,$@)"
 
 distclean: clean
-   rm -f config-host.mak config-host.h* config-host.ld $(DOCS) 
qemu-options.texi qemu-img-cmds.texi qemu-monitor.texi
+   rm -f config-host.mak config-host.h* config-host.ld $(DOCS) 
qemu-options.texi qemu-img-cmds.texi qemu-monitor.texi qemu-monitor-info.texi
rm -f config-all-devices.mak config-all-disas.mak config.status
rm -f po/*.mo tests/qemu-iotests/common.env
rm -f roms/seabios/config.mak roms/vgabios/config.mak
@@ -508,13 +508,16 @@ qemu-options.texi: $(SRC_PATH)/qemu-options.hx
 qemu-monitor.texi: $(SRC_PATH)/hmp-commands.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"  GEN  
 $@")
 
+qemu-monitor-info.texi: $(SRC_PATH)/hmp-commands-info.hx
+   $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"  GEN  
 $@")
+
 qmp-commands.txt: $(SRC_PATH)/qmp-commands.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -q < $< > $@,"  GEN  
 $@")
 
 qemu-img-cmds.texi: $(SRC_PATH)/qemu-img-cmds.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -t < $< > $@,"  GEN  
 $@")
 
-qemu.1: qemu-doc.texi qemu-options.texi qemu-monitor.texi
+qemu.1: qemu-doc.texi qemu-options.texi qemu-monitor.texi 
qemu-monitor-info.texi
$(call quiet-command, \
  perl -Ww -- $(SRC_PATH)/scripts/texi2pod.pl $< qemu.pod && \
  $(POD2MAN) --section=1 --center=" " --release=" " qemu.pod > $@, \
@@ -551,7 +554,7 @@ pdf: qemu-doc.pdf qemu-tech.pdf
 
 qemu-doc.dvi qemu-doc.html qemu-doc.info qemu-doc.pdf: \
qemu-img.texi qemu-nbd.texi qemu-options.texi \
-   qemu-monitor.texi qemu-img-cmds.texi
+   qemu-monitor.texi qemu-monitor-info.texi qemu-img-cmds.texi
 
 ifdef CONFIG_WIN32
 
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 9ccb33f..81ae9d7 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -6,6 +6,9 @@ HXCOMM monitor info commands
 HXCOMM HXCOMM can be used for comments, discarded from both texi and C
 
 STEXI
+@item info @var{subcommand}
+@findex info
+Show various information about the system state.
 @table @option
 ETEXI
 
@@ -708,4 +711,5 @@ ETEXI
 
 STEXI
 @end table
+@end table
 ETEXI
diff --git a/hmp-commands.hx b/hmp-commands.hx
index d3b7932..3b36db4 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1705,123 +1705,3 @@ ETEXI
 .mhandler.cmd = hmp_info_help,
 .sub_table = info_cmds,
 },
-
-STEXI
-@item info @var{subcommand}
-@findex info
-Show various information about the system state.
-
-@table @option
-@item info version
-show the version of QEMU
-@item info network
-show the various VLANs and the associated devices
-@item info chardev
-show the character devices
-@item info block
-show the block devices
-@item info blockstats
-show block device statistics
-@item info registers
-show the cpu registers
-@item info cpus
-show infos for each CPU
-@item info history
-show the command line history
-@item info irq
-show the interrupts statistics (if available)
-@item info pic
-show i8259 (PIC) state
-@item info pci
-show emulated PCI device info
-@item info tlb
-show virtual to physical memory mappings (i386, SH4, SPARC, PPC, and Xtensa 
only)
-@item info mem
-show the active virtual memory mappings (i386 only)
-@item info jit
-show dynamic compiler info
-@item info numa
-show NUMA information
-@item info kvm
-show KVM information
-@item info usb
-show USB devices plugged on the virtual USB hub
-@item info usbhost
-show all USB host devices
-@item info profile
-show profiling information
-@item info capture
-show information about active capturing
-@item info snapshots
-show list of VM snapshots
-@item info status
-show the current VM status (running|paused)
-@item info mice
-show which guest mouse is receiving events
-@item info vnc
-show the vnc server status
-@item info name
-show the current VM name
-@item info uuid
-show the current VM UUID
-@item info cpustats
-show CPU statistics
-@item info usernet
-show user networ

[Qemu-devel] [PATCH v2 for 2.5 0/3] Move target- and device specific code from monitor

2015-08-12 Thread Denis V. Lunev
The monivation of this set is simple. Recently we have proposed patch
to monitor.c with specific x86 APIC HMP commands. The patchset was denied
with the main motivation "No more arch specific code in monitor.c"
This patchset is the first step to move arch specific code from
monitor.c targets.

So, monitor.c already contains a lot of generic code, as well as the target
specifics code and eventually monitor.c volume will only grow. This trend leads
to a variety of fouling code ifdeffery(and combinations thereof),
poor readability, and entanglement of architecture of the project.
If someone wants to improve processing logic commands at the monitor,
it isn't necessarily must differentiate amongst the implementation of some ARM
or x86_64 specific commands, because the project already has separation of
target specific code on directories.

The presented solution is not the best, but it is quite simple
(PATCH doesn't add more code!) and decides the above mentioned issue.
Subsequently it will not prevent the introduction of more advanced mechanism
that can more effectively resolve the issue.

There is a issue with the placement of code for multiple architectures
(isn't for everyone), but this code is very small. This patch is a step towards
solving the issue associated with maintaining the purity of the code and
structure of the project, which solves not all, but doing a little better
than it is.

Changes from v1:
- ported to new head

Signed-off-by: Pavel Butsykin 
Signed-off-by: Denis V. Lunev 
CC: Luiz Capitulino 
CC: Paolo Bonzini 
CC: Peter Maydell 




[Qemu-devel] [PATCH 1/3] hmp-commands-info: move info_cmds content out of monitor.c

2015-08-12 Thread Denis V. Lunev
From: Pavel Butsykin 

For moving target- and device-specific code  from monitor.c,
to beginning we move info_cmds content to hmp-commands-info.hx

Signed-off-by: Pavel Butsykin 
Signed-off-by: Denis V. Lunev 
CC: Luiz Capitulino 
CC: Paolo Bonzini 
CC: Peter Maydell 
---
 Makefile.target  |   5 +-
 hmp-commands-info.hx | 711 +++
 monitor.c| 373 +--
 3 files changed, 717 insertions(+), 372 deletions(-)
 create mode 100644 hmp-commands-info.hx

diff --git a/Makefile.target b/Makefile.target
index 3e7aafd..93ae46d 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -151,7 +151,7 @@ else
 obj-y += hw/$(TARGET_BASE_ARCH)/
 endif
 
-GENERATED_HEADERS += hmp-commands.h qmp-commands-old.h
+GENERATED_HEADERS += hmp-commands.h hmp-commands-info.h qmp-commands-old.h
 
 endif # CONFIG_SOFTMMU
 
@@ -193,6 +193,9 @@ gdbstub-xml.c: $(TARGET_XML_FILES) 
$(SRC_PATH)/scripts/feature_to_c.sh
 hmp-commands.h: $(SRC_PATH)/hmp-commands.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN  
 $(TARGET_DIR)$@")
 
+hmp-commands-info.h: $(SRC_PATH)/hmp-commands-info.hx
+   $(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN  
 $(TARGET_DIR)$@")
+
 qmp-commands-old.h: $(SRC_PATH)/qmp-commands.hx
$(call quiet-command,sh $(SRC_PATH)/scripts/hxtool -h < $< > $@,"  GEN  
 $(TARGET_DIR)$@")
 
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
new file mode 100644
index 000..9ccb33f
--- /dev/null
+++ b/hmp-commands-info.hx
@@ -0,0 +1,711 @@
+HXCOMM Use DEFHEADING() to define headings in both help text and texi
+HXCOMM Text between STEXI and ETEXI are copied to texi version and
+HXCOMM discarded from C version
+HXCOMM DEF(command, args, callback, arg_string, help) is used to construct
+HXCOMM monitor info commands
+HXCOMM HXCOMM can be used for comments, discarded from both texi and C
+
+STEXI
+@table @option
+ETEXI
+
+{
+.name   = "version",
+.args_type  = "",
+.params = "",
+.help   = "show the version of QEMU",
+.mhandler.cmd = hmp_info_version,
+},
+
+STEXI
+@item info version
+@findex version
+Show the version of QEMU.
+ETEXI
+
+{
+.name   = "network",
+.args_type  = "",
+.params = "",
+.help   = "show the network state",
+.mhandler.cmd = hmp_info_network,
+},
+
+STEXI
+@item info network
+@findex network
+Show the network state.
+ETEXI
+
+{
+.name   = "chardev",
+.args_type  = "",
+.params = "",
+.help   = "show the character devices",
+.mhandler.cmd = hmp_info_chardev,
+},
+
+STEXI
+@item info chardev
+@findex chardev
+Show the character devices.
+ETEXI
+
+{
+.name   = "block",
+.args_type  = "nodes:-n,verbose:-v,device:B?",
+.params = "[-n] [-v] [device]",
+.help   = "show info of one block device or all block devices "
+  "(-n: show named nodes; -v: show details)",
+.mhandler.cmd = hmp_info_block,
+},
+
+STEXI
+@item info block
+@findex block
+Show info of one block device or all block devices.
+ETEXI
+
+{
+.name   = "blockstats",
+.args_type  = "",
+.params = "",
+.help   = "show block device statistics",
+.mhandler.cmd = hmp_info_blockstats,
+},
+
+STEXI
+@item info blockstats
+@findex blockstats
+Show block device statistics.
+ETEXI
+
+{
+.name   = "block-jobs",
+.args_type  = "",
+.params = "",
+.help   = "show progress of ongoing block device operations",
+.mhandler.cmd = hmp_info_block_jobs,
+},
+
+STEXI
+@item info block-jobs
+@findex block-jobs
+Show progress of ongoing block device operations.
+ETEXI
+
+{
+.name   = "registers",
+.args_type  = "",
+.params = "",
+.help   = "show the cpu registers",
+.mhandler.cmd = hmp_info_registers,
+},
+
+STEXI
+@item info registers
+@findex registers
+Show the cpu registers.
+ETEXI
+
+{
+.name   = "cpus",
+.args_type  = "",
+.params = "",
+.help   = "show infos for each CPU",
+.mhandler.cmd = hmp_info_cpus,
+},
+
+STEXI
+@item info cpus
+@findex cpus
+Show infos for each CPU.
+ETEXI
+
+{
+.name   = "history",
+.args_type  = "",
+.params = "",
+.help   = "show the command line history",
+.mhandler.cmd = hmp_info_history,
+},
+
+STEXI
+@item info history
+@findex history
+Show the command line history.
+ETEXI
+
+#if defined(TARGET_I386) || defined(TARGET_PPC) || defined(TARGET_MIPS) || \
+defined(TARGET_LM32) || (defined(TARGET_SPARC) && !defined(TARGET_SPARC64))
+{
+.name   = "irq",
+.args_type  = "",
+.params = "",
+.help   = "

[Qemu-devel] [PATCH 2/3] monitor: remove target-specific code from monitor.c

2015-08-12 Thread Denis V. Lunev
From: Pavel Butsykin 

Move target-specific code out of /monitor.c to /target-*/monitor.c,
this will avoid code cluttering and using random ifdeffery.  The solution
is quite simple, but solves the issue of the separation of target-specific
code from monitor

Signed-off-by: Pavel Butsykin 
Signed-off-by: Denis V. Lunev 
CC: Luiz Capitulino 
CC: Paolo Bonzini 
CC: Peter Maydell 
---
 include/monitor/monitor-common.h |  43 ++
 monitor.c| 854 +--
 target-i386/Makefile.objs|   2 +-
 target-i386/monitor.c| 489 ++
 target-ppc/Makefile.objs |   2 +-
 target-ppc/monitor.c | 250 
 target-sh4/Makefile.objs |   1 +
 target-sh4/monitor.c |  52 +++
 target-sparc/Makefile.objs   |   2 +-
 target-sparc/monitor.c   | 153 +++
 target-xtensa/Makefile.objs  |   1 +
 target-xtensa/monitor.c  |  34 ++
 12 files changed, 1032 insertions(+), 851 deletions(-)
 create mode 100644 include/monitor/monitor-common.h
 create mode 100644 target-i386/monitor.c
 create mode 100644 target-ppc/monitor.c
 create mode 100644 target-sh4/monitor.c
 create mode 100644 target-sparc/monitor.c
 create mode 100644 target-xtensa/monitor.c

diff --git a/include/monitor/monitor-common.h b/include/monitor/monitor-common.h
new file mode 100644
index 000..e1d79b9
--- /dev/null
+++ b/include/monitor/monitor-common.h
@@ -0,0 +1,43 @@
+/*
+ * QEMU monitor
+ *
+ * Copyright (c) 2003-2004 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef MONITOR_COMMON_H
+#define MONITOR_COMMON_H
+
+#define MD_TLONG 0
+#define MD_I32   1
+
+typedef struct MonitorDef {
+const char *name;
+int offset;
+target_long (*get_value)(const struct MonitorDef *md, int val);
+int type;
+} MonitorDef;
+
+CPUArchState *mon_get_cpu_env(void);
+
+void hmp_info_mem(Monitor *mon, const QDict *qdict);
+void hmp_info_tlb(Monitor *mon, const QDict *qdict);
+void hmp_mce(Monitor *mon, const QDict *qdict);
+
+#endif /* MONITOR_COMMON */
diff --git a/monitor.c b/monitor.c
index 89c7a52..de00d84 100644
--- a/monitor.c
+++ b/monitor.c
@@ -63,6 +63,7 @@
 #include "cpu.h"
 #include "trace.h"
 #include "trace/control.h"
+#include "monitor/monitor-common.h"
 #ifdef CONFIG_TRACE_SIMPLE
 #include "trace/simple.h"
 #endif
@@ -946,7 +947,7 @@ static CPUState *mon_get_cpu(void)
 return cur_mon->mon_cpu;
 }
 
-static CPUArchState *mon_get_cpu_env(void)
+CPUArchState *mon_get_cpu_env(void)
 {
 return mon_get_cpu()->env_ptr;
 }
@@ -1420,442 +1421,6 @@ static void hmp_boot_set(Monitor *mon, const QDict 
*qdict)
 }
 }
 
-#if defined(TARGET_I386)
-static void print_pte(Monitor *mon, hwaddr addr,
-  hwaddr pte,
-  hwaddr mask)
-{
-#ifdef TARGET_X86_64
-if (addr & (1ULL << 47)) {
-addr |= -1LL << 48;
-}
-#endif
-monitor_printf(mon, TARGET_FMT_plx ": " TARGET_FMT_plx
-   " %c%c%c%c%c%c%c%c%c\n",
-   addr,
-   pte & mask,
-   pte & PG_NX_MASK ? 'X' : '-',
-   pte & PG_GLOBAL_MASK ? 'G' : '-',
-   pte & PG_PSE_MASK ? 'P' : '-',
-   pte & PG_DIRTY_MASK ? 'D' : '-',
-   pte & PG_ACCESSED_MASK ? 'A' : '-',
-   pte & PG_PCD_MASK ? 'C' : '-',
-   pte & PG_PWT_MASK ? 'T' : '-',
-   pte & PG_USER_MASK ? 'U' : '-',
-   pte & PG_RW_MASK ? 'W' : '-');
-}
-
-static void tlb_info_32(Monitor *mon, CPUArchState *env)
-{
-unsigned int l1, l2;
-uint32_t pgd, pde, pte;
-
-pgd = env->cr[3] & ~0xfff;
-for(l1 = 0; l1 < 1024; l1++) {
-cpu_physical_memory_read(pgd + l1 * 4, &pde, 4);
-pde = le32_to_cpu(pde);
-if (pde & PG_PRESENT_MASK) {
-if ((pde & PG_PSE_MASK) && (env->cr[4] & CR4_P

Re: [Qemu-devel] [PATCH v6 0/12] HyperV equivalent of pvpanic driver

2015-08-12 Thread Denis V. Lunev

On 07/07/2015 01:20 PM, Paolo Bonzini wrote:


On 03/07/2015 14:01, Denis V. Lunev wrote:

Windows 2012 guests can notify hypervisor about occurred guest crash
(Windows bugcheck(BSOD)) by writing specific Hyper-V msrs. This patch does
handling of this MSR's by KVM and sending notification to user space that
allows to gather Windows guest crash dump by QEMU/LIBVIRT.

The idea is to provide functionality equal to pvpanic device without
QEMU guest agent for Windows.

The idea is borrowed from Linux HyperV bus driver and validated against
Windows 2k12.

Changes from v5:
* added hyperv crash msrs into supported/emulated list
* qemu: reset CPUState::crash_occurred at cpu reset
* qemu: userspace checks kernel support of hyperv crash msrs
   by kvm_get_supported_msrs

Changes from v4:
* fixed typo in email of Andreas Färber 
   my vim strangely behaves on lines with extended Deutch chars

Changes from v3:
* remove unused HV_X64_MSR_CRASH_CTL_NOTIFY
* added documentation section about KVM_SYSTEM_EVENT_CRASH
* allow only supported values inside crash ctl msr
* qemu: split patch into generic crash handling patches and hyperv specific
* qemu: skip migration of crash ctl msr value

Changes from v2:
* forbid modification crash ctl msr by guest
* qemu_system_guest_panicked usage in pvpanic and s390x
* hyper-v crash handler move from generic kvm to i386
* hyper-v crash handler: skip fetching crash msrs just mark crash occurred
* sync with linux-next 20150629
* patch 11 squashed to patch 10
* patch 9 squashed to patch 7

Changes from v1:
* hyperv code move to hyperv.c
* added read handlers of crash data msrs
* added per vm and per cpu hyperv context structures
* added saving crash msrs inside qemu cpu state
* added qemu fetch and update of crash msrs
* added qemu crash msrs store in cpu state and it's migration

Signed-off-by: Andrey Smetanin 
Signed-off-by: Denis V. Lunev 
CC: Gleb Natapov 
CC: Paolo Bonzini 


I'm queuing patches 1-8 to the KVM tree.  For patch 9-12, I've applied
them locally but would like Eduardo or Andreas to ack 11 and 12.

Paolo

guys?

we are going to move forward with other HyperV bits.

Den


[Qemu-devel] [PATCH v6] hw/arm/virt: Add high MMIO PCI region, 512G in size

2015-08-12 Thread Pavel Fedin
This large region is necessary for some devices like ivshmem and video cards
32-bit kernels can be built without LPAE support. In this case such a kernel
will not be able to use PCI controller which has windows in high addresses.
In order to work around the problem, "highmem" option is introduced. It
defaults to on on, but can be manually set to off in order to be able to run
those old 32-bit guests.

Signed-off-by: Pavel Fedin 
---
v5 => v6:
- Specify correct FDT_PCI_RANGE_MMIO_64BIT type for the region, the bug
  was discovered by running UEFI
v4 => v5:
- Removed machine-dependent "highmem" default, now always ON
v3 => v4:
- Added "highmem" option which controls presence of this region. Default
  value is on for 64-bit CPUs and off for 32-bit CPUs.
- Supply correct min and max address to aml_qword_memory()
v2 => v3:
- Region size increased to 512G
- Added ACPI description
v1 => v2:
- Region address changed to 512G, leaving more space for RAM
---
 hw/arm/virt-acpi-build.c | 17 +--
 hw/arm/virt.c| 63 +++-
 include/hw/arm/virt-acpi-build.h |  1 +
 include/hw/arm/virt.h|  1 +
 4 files changed, 73 insertions(+), 9 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index f365140..9088248 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -159,7 +159,8 @@ static void acpi_dsdt_add_virtio(Aml *scope,
 }
 }
 
-static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap, int irq)
+static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap, int irq,
+  bool use_highmem)
 {
 Aml *method, *crs, *ifctx, *UUID, *ifctx1, *elsectx, *buf;
 int i, bus_no;
@@ -234,6 +235,17 @@ static void acpi_dsdt_add_pci(Aml *scope, const 
MemMapEntry *memmap, int irq)
  AML_ENTIRE_RANGE, 0x, 0x, size_pio - 1, base_pio,
  size_pio));
 
+if (use_highmem) {
+hwaddr base_mmio_high = memmap[VIRT_PCIE_MMIO_HIGH].base;
+hwaddr size_mmio_high = memmap[VIRT_PCIE_MMIO_HIGH].size;
+
+aml_append(rbuf,
+aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED, AML_MAX_FIXED,
+ AML_NON_CACHEABLE, AML_READ_WRITE, 0x,
+ base_mmio_high, base_mmio_high, 0x,
+ size_mmio_high));
+}
+
 aml_append(method, aml_name_decl("RBUF", rbuf));
 aml_append(method, aml_return(rbuf));
 aml_append(dev, method);
@@ -510,7 +522,8 @@ build_dsdt(GArray *table_data, GArray *linker, 
VirtGuestInfo *guest_info)
 acpi_dsdt_add_flash(scope, &memmap[VIRT_FLASH]);
 acpi_dsdt_add_virtio(scope, &memmap[VIRT_MMIO],
 (irqmap[VIRT_MMIO] + ARM_SPI_BASE), NUM_VIRTIO_TRANSPORTS);
-acpi_dsdt_add_pci(scope, memmap, (irqmap[VIRT_PCIE] + ARM_SPI_BASE));
+acpi_dsdt_add_pci(scope, memmap, (irqmap[VIRT_PCIE] + ARM_SPI_BASE),
+  guest_info->use_highmem);
 
 aml_append(dsdt, scope);
 
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 4846892..44dcd0c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -77,6 +77,7 @@ typedef struct {
 typedef struct {
 MachineState parent;
 bool secure;
+bool highmem;
 } VirtMachineState;
 
 #define TYPE_VIRT_MACHINE   "virt"
@@ -117,6 +118,7 @@ static const MemMapEntry a15memmap[] = {
 [VIRT_PCIE_PIO] =   { 0x3eff, 0x0001 },
 [VIRT_PCIE_ECAM] =  { 0x3f00, 0x0100 },
 [VIRT_MEM] ={ 0x4000, 30ULL * 1024 * 1024 * 1024 },
+[VIRT_PCIE_MMIO_HIGH] =   { 0x80, 0x80 },
 };
 
 static const int a15irqmap[] = {
@@ -658,7 +660,8 @@ static void create_pcie_irq_map(const VirtBoardInfo *vbi, 
uint32_t gic_phandle,
0x7   /* PCI irq */);
 }
 
-static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic)
+static void create_pcie(const VirtBoardInfo *vbi, qemu_irq *pic,
+bool use_highmem)
 {
 hwaddr base_mmio = vbi->memmap[VIRT_PCIE_MMIO].base;
 hwaddr size_mmio = vbi->memmap[VIRT_PCIE_MMIO].size;
@@ -719,11 +722,33 @@ static void create_pcie(const VirtBoardInfo *vbi, 
qemu_irq *pic)
 
 qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "reg",
  2, base_ecam, 2, size_ecam);
-qemu_fdt_setprop_sized_cells(vbi->fdt, nodename, "ranges",
- 1, FDT_PCI_RANGE_IOPORT, 2, 0,
- 2, base_pio, 2, size_pio,
- 1, FDT_PCI_RANGE_MMIO, 2, base_mmio,
- 2, base_mmio, 2, size_mmio);
+
+if (use_highmem) {
+/* High MMIO space */
+hwaddr base_mmio_high = vbi->memmap[VIRT_PCIE_MMIO_HIGH].base;
+hwaddr size_mmio_high = vbi->memmap[VIRT_PCIE_MMIO_HIGH].size;
+
+mmio_alias = g_new0(MemoryRegion, 1);
+memory_region_init_

Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM

2015-08-12 Thread Pavel Fedin
 Hello!

> I still think this is the wrong approach -- see my remarks
> in the previous round of patch review.

 You know... I thought a little bit...
 So far, test = true in KVM_CREATE_DEVICE means that we just want to know 
whether this type is supported. No actual actions is done by the kernel. Is it 
correct? If yes, we can just leave this test as it is, because if it says that 
GICv2 is supported by KVM_CREATE_DEVICE, then:
1. We use new API. No KVM_IRQCHIP_CREATE.
2. GICv3 may be supported.

 Therefore, if we do this check, and it succeeds, then we just proceed, and 
later actually try to create GICv3. If it fails for some reason, we will see 
error message anyway. So would it be OK just not to touch 
kvm_arch_irqchip_create() at all?

Kind regards,
Pavel Fedin
Expert Engineer
Samsung Electronics Research center Russia





Re: [Qemu-devel] [RFC PATCH V7 09/19] Drop global lock during TCG code execution

2015-08-12 Thread Frederic Konrad

On 12/08/2015 11:58, Paolo Bonzini wrote:


On 11/08/2015 23:34, Frederic Konrad wrote:

Also if qemu_cond_broadcast(&qemu_io_proceeded_cond) is being dropped
there is no point keeping the guff around in qemu_tcg_wait_io_event.


Yes good point.

BTW this leads to high consumption of host CPU eg: 100% per VCPU thread as
the VCPUs thread are no longer waiting for qemu_io_proceeded_cond.

If the guest CPU is busy waiting, that's expected.  But if the guest CPU
is halted, it should not have 100% host CPU consumption.

Paolo

Hmm so that's definitely strange... I mean theorically it's the same as 
before?


An other thing. It seems that we need to signal the VCPU when the 
iothread take

the lock eg:

if (tcg_enabled() && qemu_thread_is_self(&io_thread)) {
CPU_FOREACH(cpu) {
cpu_exit(cpu);
}
}

To make this patch working without MTTCG.

Fred



Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread Paolo Bonzini


On 12/08/2015 09:31, alvise rigo wrote:
> I think that tlb_flush_entry is not enough, since in theory another
> vCPU could have a different TLB address referring the same phys
> address.

You're right, this is a TLB so it's virtually-indexed. :(  I'm not sure 
what happens on ARM, since it has a virtually indexed (VIVT or VIPT) 
cache, but indeed it would be a problem when implementing e.g. CMPXCHG 
using the TCG ll/sc ops.

I'm a bit worried about adding such a big bitmap.  It's only used on 
TCG, but it is also allocated on KVM and on KVM you can have hundreds 
of VCPUs.  Wasting 200 bits per guest memory page (i.e. ~0.6% of guest 
memory) is obviously not a great idea. :(

Perhaps we can use a bytemap instead:

- 0..253 = TLB_EXCL must be set in all VCPUs except CPU n.  A VCPU that
loads the TLB for this vaddr does not have to set it.

- 254 = TLB_EXCL must be set in all VCPUs.  A VCPU that
loads the TLB for this vaddr has to set it.

- 255 = TLB_EXCL not set in at least two VCPUs

Transitions:

- ll transitions: anything -> 254

- sc transitions: 254 -> current CPU_ID

- TLB_EXCL store transitions: 254 -> current CPU_ID

- tlb_st_page transitions: CPU_ID other than current -> 255

The initial value is 255 on SMP guests, 0 on UP guests.

The algorithms are very similar to yours, just using this approximate
representation.

ll algorithm:
  llsc_value = bytemap[vaddr]
  if llsc_value == CPU_ID
 do nothing
  elseif llsc_value < 254
 flush TLB of CPU llsc_value
  elseif llsc_value == 255
 flush all TLBs
  set TLB_EXCL
  bytemap[vaddr] = 254
  load

tlb_set_page algorithm:
  llsc_value = bytemap[vaddr]
  if llsc_value == CPU_ID or llsc_value == 255
 do nothing
  else if llsc_value == 254
 set TLB_EXCL
  else
 # two CPUs without TLB_EXCL
 bytemap[vaddr] = 255

TLB_EXCL slow path algorithm:
   if bytemap[vaddr] == 254
  bytemap[vaddr] = CPU_ID
   else
  # two CPUs without TLB_EXCL
  bytemap[vaddr] = 255
   clear TLB_EXCL in this CPU
   store

sc algorithm:
   if bytemap[vaddr] == CPU_ID or bytemap[vaddr] == 254
  bytemap[vaddr] = CPU_ID
  clear TLB_EXCL in this CPU
  store
  succeed
   else
  fail

clear algorithm:
   if bytemap[vaddr] == 254
  bytemap[vaddr] = CPU_ID

The UP case is optimized because bytemap[vaddr] will always be 0 or 254.

The algorithm requires the LL to be cleared e.g. on exceptions.
Paolo

> alvise
> 
> On Tue, Aug 11, 2015 at 6:32 PM, Paolo Bonzini  wrote:
>>
>>
>> On 11/08/2015 18:11, alvise rigo wrote:
> Why flush the entire cache (I understand you mean TLB)?
>>> Sorry, I meant the TLB.
>>> If for each removal of an exclusive entry we set also the bit to 1, we
>>> force the following LL to make a tlb_flush() on every vCPU.
>>
>> What if you only flush one entry with tlb_flush_entry (on every vCPU)?
>>
>> Paolo



Re: [Qemu-devel] [RFC v4 3/9] softmmu: Add helpers for a new slowpath

2015-08-12 Thread Paolo Bonzini


On 07/08/2015 19:03, Alvise Rigo wrote:
> +
> +/* For this vCPU, just update the TLB entry, no need to flush. */
> +env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;

Couldn't this vCPU also have two aliasing entries in the TLB?

Paolo



Re: [Qemu-devel] [PATCH v6 0/12] HyperV equivalent of pvpanic driver

2015-08-12 Thread Paolo Bonzini


On 12/08/2015 13:54, Denis V. Lunev wrote:
> guys?
> 
> we are going to move forward with other HyperV bits.

Wait a second, 2.4 was released only a few hours ago...

Paolo



Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM

2015-08-12 Thread Peter Maydell
On 12 August 2015 at 13:27, Pavel Fedin  wrote:
>  Hello!
>
>> I still think this is the wrong approach -- see my remarks
>> in the previous round of patch review.
>
>  You know... I thought a little bit...
>  So far, test = true in KVM_CREATE_DEVICE means that we just want to know 
> whether this type is supported. No actual actions is done by the kernel. Is 
> it correct? If yes, we can just leave this test as it is, because if it says 
> that GICv2 is supported by KVM_CREATE_DEVICE, then:
> 1. We use new API. No KVM_IRQCHIP_CREATE.
> 2. GICv3 may be supported.
>
>  Therefore, if we do this check, and it succeeds, then we just
> proceed, and later actually try to create GICv3. If it fails for some
> reason, we will see error message anyway. So would it be OK just not
> to touch kvm_arch_irqchip_create() at all?

No, because then if the kernel only supports GICv3 the
code in kvm_arch_irqchip_create() (as it is currently written)
will erroneously fall back to using the old API.

Christoffer: the question was, why does kvm_arch_irqchip_create()
not only check the KVM_CAP_DEVICE_CTRL but also try to see
if it can KVM_CREATE_DEVICE the GICv2 in order to avoid
falling back to the old pre-KVM_CREATE_DEVICE API ? Are there
kernels which have the capability bit set but which can't
actually use KVM_CREATE_DEVICE to create the irqchip?

My preference here would be for kvm_arch_irqchip_create()
to just use 'is the KVM_CAP_DEVICE_CTRL capability set'
for its "can we use the new API" test; that will then
work whether we have a GICv2 or GICv3 in the host. (The
actual GIC device creation later on might fail, of course,
but that can be handled at that point. The only thing
we need to do as early as kvm_arch_irqchip_create is
determine whether we must use the old API.)

thanks
-- PMM



Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread Peter Maydell
On 12 August 2015 at 13:36, Paolo Bonzini  wrote:
>
>
> On 12/08/2015 09:31, alvise rigo wrote:
>> I think that tlb_flush_entry is not enough, since in theory another
>> vCPU could have a different TLB address referring the same phys
>> address.
>
> You're right, this is a TLB so it's virtually-indexed. :(  I'm not sure
> what happens on ARM, since it has a virtually indexed (VIVT or VIPT)
> cache, but indeed it would be a problem when implementing e.g. CMPXCHG
> using the TCG ll/sc ops.

On ARM the exclusives operate on physical addresses, not virtual.

-- PMM



Re: [Qemu-devel] [RFC v4 3/9] softmmu: Add helpers for a new slowpath

2015-08-12 Thread alvise rigo
Yes, it could. However, it's really unlikely that a vCPU, after
issuing a LL to the virtual address x, it stores to the same phys
address using the virtual address y.

I'm not really sure If we really need to handle these cases.

alvise

On Wed, Aug 12, 2015 at 2:43 PM, Paolo Bonzini  wrote:
>
>
> On 07/08/2015 19:03, Alvise Rigo wrote:
>> +
>> +/* For this vCPU, just update the TLB entry, no need to flush. */
>> +env->tlb_table[mmu_idx][index].addr_write |= TLB_EXCL;
>
> Couldn't this vCPU also have two aliasing entries in the TLB?
>
> Paolo



Re: [Qemu-devel] [RFC v4 3/9] softmmu: Add helpers for a new slowpath

2015-08-12 Thread Paolo Bonzini


On 12/08/2015 15:09, alvise rigo wrote:
> Yes, it could. However, it's really unlikely that a vCPU, after
> issuing a LL to the virtual address x, it stores to the same phys
> address using the virtual address y.
> 
> I'm not really sure If we really need to handle these cases.

Ok, if we had to it's just a matter of adding a TLB flush here.

Paolo



Re: [Qemu-devel] [PATCH v6 0/12] HyperV equivalent of pvpanic driver

2015-08-12 Thread Denis V. Lunev

On 08/12/2015 03:47 PM, Paolo Bonzini wrote:


On 12/08/2015 13:54, Denis V. Lunev wrote:

guys?

we are going to move forward with other HyperV bits.

Wait a second, 2.4 was released only a few hours ago...

Paolo


sure :)



Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM

2015-08-12 Thread Christoffer Dall
On Wed, Aug 12, 2015 at 2:59 PM, Peter Maydell  wrote:
> On 12 August 2015 at 13:27, Pavel Fedin  wrote:
>>  Hello!
>>
>>> I still think this is the wrong approach -- see my remarks
>>> in the previous round of patch review.
>>
>>  You know... I thought a little bit...
>>  So far, test = true in KVM_CREATE_DEVICE means that we just want to know 
>> whether this type is supported. No actual actions is done by the kernel. Is 
>> it correct? If yes, we can just leave this test as it is, because if it says 
>> that GICv2 is supported by KVM_CREATE_DEVICE, then:
>> 1. We use new API. No KVM_IRQCHIP_CREATE.
>> 2. GICv3 may be supported.
>>
>>  Therefore, if we do this check, and it succeeds, then we just
>> proceed, and later actually try to create GICv3. If it fails for some
>> reason, we will see error message anyway. So would it be OK just not
>> to touch kvm_arch_irqchip_create() at all?
>
> No, because then if the kernel only supports GICv3 the
> code in kvm_arch_irqchip_create() (as it is currently written)
> will erroneously fall back to using the old API.
>
> Christoffer: the question was, why does kvm_arch_irqchip_create()
> not only check the KVM_CAP_DEVICE_CTRL but also try to see
> if it can KVM_CREATE_DEVICE the GICv2 in order to avoid
> falling back to the old pre-KVM_CREATE_DEVICE API ? Are there
> kernels which have the capability bit set but which can't
> actually use KVM_CREATE_DEVICE to create the irqchip?

My thinking probably was that technically the KVM_CAP_DEVICE_CTRL is
an orthogonal concept from how to create the vgic, and you could
advertise this capability without also supporting the GICv2 device
type.

However, I don't believe such kernels exist and they cannot in the
future as that would be because we would remove an actively supported
API.

>
> My preference here would be for kvm_arch_irqchip_create()
> to just use 'is the KVM_CAP_DEVICE_CTRL capability set'
> for its "can we use the new API" test; that will then
> work whether we have a GICv2 or GICv3 in the host. (The
> actual GIC device creation later on might fail, of course,
> but that can be handled at that point. The only thing
> we need to do as early as kvm_arch_irqchip_create is
> determine whether we must use the old API.)
>
I'm fine with this.

-Christoffer



[Qemu-devel] [PATCH v3 0/7] Extract TLS handling code from VNC server

2015-08-12 Thread Daniel P. Berrange
This small patch series is a formal submission of another part
of my previous series

 v1: https://lists.gnu.org/archive/html/qemu-devel/2015-04/msg02038.html
 v2: https://lists.gnu.org/archive/html/qemu-devel/2015-08/msg01267.html

Now we have the basic crypto module defined for hash/cipher APIs,
we extend it to also cover TLS credential and TLS session handling
APIs. These new TLS related APIs obsolete the vast majority of the
TLS handling code in the current VNC server. As a result the VNC
server no longer has to worry about conditional compilation for
GNUTLS. It also gives us code reuse for future patches which intend
to add TLS support to chardevs, migration, nbd, etc.

This series deprecates the existing way of configuring TLS for
VNC on the command line, but maintains support for back-compat
reasons.

Since the nice is now totally isolated from the VNC server it is
also practical to provide significant unit test coverage of what
is security critical code.

Aside from the new CLI syntax for configuring TLS with VNC, the
only other functional change is to allow diffie-hellman params
to be loaded from a file, instead of being generated at startup.

Changes in v3:

 - Switched "tls-creds" object to be just an abstract base class
 - Created "tls-creds-anon" object subclass in new file
 - Created "tls-creds-x509" object subclass in new file

Daniel P. Berrange (7):
  crypto: introduce new base module for TLS credentials
  crypto: introduce new module for TLS anonymous credentials
  crypto: introduce new module for TLS x509 credentials
  crypto: add sanity checking of TLS x509 credentials
  crypto: introduce new module for handling TLS sessions
  ui: fix return type for VNC I/O functions to be ssize_t
  ui: convert VNC server to use QCryptoTLSSession

 configure|   53 +-
 crypto/Makefile.objs |4 +
 crypto/init.c|   10 +
 crypto/tlscreds.c|  264 +
 crypto/tlscredsanon.c|  235 
 crypto/tlscredspriv.h|   41 ++
 crypto/tlscredsx509.c|  821 
 crypto/tlssession.c  |  578 
 include/crypto/tlscreds.h|   74 +++
 include/crypto/tlscredsanon.h|  113 
 include/crypto/tlscredsx509.h|  115 
 include/crypto/tlssession.h  |  322 +++
 qemu-options.hx  |   75 ++-
 tests/.gitignore |7 +
 tests/Makefile   |   14 +-
 tests/crypto-tls-x509-helpers.c  |  486 +
 tests/crypto-tls-x509-helpers.h  |  133 +
 tests/pkix_asn1_tab.c| 1103 ++
 tests/test-crypto-tlscredsx509.c |  734 +
 tests/test-crypto-tlssession.c   |  534 ++
 ui/Makefile.objs |2 +-
 ui/vnc-auth-sasl.c   |   36 +-
 ui/vnc-auth-vencrypt.c   |   80 +--
 ui/vnc-tls.c |  474 
 ui/vnc-tls.h |   69 ---
 ui/vnc-ws.c  |   82 +--
 ui/vnc-ws.h  |2 -
 ui/vnc.c |  360 -
 ui/vnc.h |   17 +-
 29 files changed, 6025 insertions(+), 813 deletions(-)
 create mode 100644 crypto/tlscreds.c
 create mode 100644 crypto/tlscredsanon.c
 create mode 100644 crypto/tlscredspriv.h
 create mode 100644 crypto/tlscredsx509.c
 create mode 100644 crypto/tlssession.c
 create mode 100644 include/crypto/tlscreds.h
 create mode 100644 include/crypto/tlscredsanon.h
 create mode 100644 include/crypto/tlscredsx509.h
 create mode 100644 include/crypto/tlssession.h
 create mode 100644 tests/crypto-tls-x509-helpers.c
 create mode 100644 tests/crypto-tls-x509-helpers.h
 create mode 100644 tests/pkix_asn1_tab.c
 create mode 100644 tests/test-crypto-tlscredsx509.c
 create mode 100644 tests/test-crypto-tlssession.c
 delete mode 100644 ui/vnc-tls.c
 delete mode 100644 ui/vnc-tls.h

-- 
2.4.3




[Qemu-devel] [PATCH v3 2/7] crypto: introduce new module for TLS anonymous credentials

2015-08-12 Thread Daniel P. Berrange
Introduce a QCryptoTLSCredsAnon class which is used to
manage anonymous TLS credentials. Use of this class is
generally discouraged since it does not offer strong
security, but it is required for backwards compatibility
with the current VNC server implementation.

Simple example CLI configuration:

 $QEMU -object tls-creds-anon,id=tls0,endpoint=server

Example using pre-created diffie-hellman parameters

 $QEMU -object tls-creds-anon,id=tls0,endpoint=server,\
   dir=/path/to/creds/dir

The 'id' value in the -object args will be used to associate the
credentials with the network services. For eample, when the VNC
server is later converted it would use

 $QEMU -object tls-creds-anon,id=tls0, \
   -vnc 127.0.0.1:1,tls-creds=tls0

Signed-off-by: Daniel P. Berrange 
---
 crypto/Makefile.objs  |   1 +
 crypto/init.c |   8 ++
 crypto/tlscredsanon.c | 235 ++
 include/crypto/tlscredsanon.h | 113 
 qemu-options.hx   |  20 
 5 files changed, 377 insertions(+)
 create mode 100644 crypto/tlscredsanon.c
 create mode 100644 include/crypto/tlscredsanon.h

diff --git a/crypto/Makefile.objs b/crypto/Makefile.objs
index cf62d51..cf0199e 100644
--- a/crypto/Makefile.objs
+++ b/crypto/Makefile.objs
@@ -4,3 +4,4 @@ util-obj-y += aes.o
 util-obj-y += desrfb.o
 util-obj-y += cipher.o
 util-obj-y += tlscreds.o
+util-obj-y += tlscredsanon.o
diff --git a/crypto/init.c b/crypto/init.c
index 7447882..bc4d769 100644
--- a/crypto/init.c
+++ b/crypto/init.c
@@ -19,6 +19,7 @@
  */
 
 #include "crypto/init.h"
+#include "crypto/tlscredsanon.h"
 #include "qemu/thread.h"
 
 #ifdef CONFIG_GNUTLS
@@ -137,6 +138,13 @@ int qcrypto_init(Error **errp)
 gcry_control(GCRYCTL_INITIALIZATION_FINISHED, 0);
 #endif
 
+/* XXX hack - if we don't reference any function in tlscreds*.c
+ * then the linker drops tlscred*.o from libqemutil.a when it
+ * links the emulators as it thinks it is unused. It isn't
+ * clever enough to see the constructor :-(
+ */
+qcrypto_tls_creds_anon_dummy();
+
 return 0;
 }
 
diff --git a/crypto/tlscredsanon.c b/crypto/tlscredsanon.c
new file mode 100644
index 000..b51ea90
--- /dev/null
+++ b/crypto/tlscredsanon.c
@@ -0,0 +1,235 @@
+/*
+ * QEMU crypto TLS anonymous credential support
+ *
+ * Copyright (c) 2015 Red Hat, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ *
+ */
+
+#include "crypto/tlscredsanon.h"
+#include "crypto/tlscredspriv.h"
+#include "qom/object_interfaces.h"
+
+/* #define QCRYPTO_DEBUG */
+
+#ifdef QCRYPTO_DEBUG
+#define DPRINTF(fmt, ...) do { fprintf(stderr, fmt, ## __VA_ARGS__); } while 
(0)
+#else
+#define DPRINTF(fmt, ...) do { } while (0)
+#endif
+
+
+#ifdef CONFIG_GNUTLS
+
+
+static int
+qcrypto_tls_creds_anon_load(QCryptoTLSCredsAnon *creds,
+Error **errp)
+{
+char *dhparams = NULL;
+int ret;
+int rv = -1;
+
+DPRINTF("Loading anon creds %d from %s\n",
+creds->type, creds->dir ? creds->dir : "");
+
+if (creds->parent_obj.endpoint == QCRYPTO_TLS_CREDS_ENDPOINT_SERVER) {
+if (qcrypto_tls_creds_get_path(&creds->parent_obj,
+   QCRYPTO_TLS_CREDS_DH_PARAMS,
+   false, &dhparams, errp) < 0) {
+goto cleanup;
+}
+
+ret = gnutls_anon_allocate_server_credentials(&creds->data.server);
+if (ret < 0) {
+error_setg(errp, "Cannot allocate credentials: %s",
+   gnutls_strerror(ret));
+goto cleanup;
+}
+
+if (qcrypto_tls_creds_get_dh_params_file(dhparams,
+ &creds->parent_obj.dh_params,
+ errp) < 0) {
+goto cleanup;
+}
+
+gnutls_anon_set_server_dh_params(creds->data.server,
+ creds->parent_obj.dh_params);
+} else {
+ret = gnutls_anon_allocate_client_credentials(&creds->data.client);
+if (ret < 0) {
+error_setg(errp, "Cannot allocate credentials: %s",
+   gnutls_strerror(ret));
+goto cleanup;
+}
+}
+
+rv = 0;
+ cleanup:
+g_free(dhparams);
+return rv;
+}
+

[Qemu-devel] [PATCH v3 1/7] crypto: introduce new base module for TLS credentials

2015-08-12 Thread Daniel P. Berrange
Introduce a QCryptoTLSCreds class to act as the base class for
storing TLS credentials. This will be later subclassed to provide
handling of anonymous and x509 credential types. The subclasses
will be user creatable objects, so instances can be created &
deleted via 'object-add' and 'object-del' QMP commands respectively,
or via the -object command line arg.

If the credentials cannot be initialized an error will be reported
as a QMP reply, or on stderr respectively.

The idea is to make it possible to represent and manager TLS
credentials independantly of the network service that is using
them. This will enable multiple services to use the same set of
credentials and minimize code duplication. A later patch will
convert the current VNC server TLS code over to use this object.

The representation of credentials will be functionally equivalent
to that currently implemented in the VNC server with one exception.
The new code has the ability to (optionally) load a pre-generated
set of diffie-hellman parameters, if the file dh-params.pem exists,
whereas the current VNC server will always generate them on startup.
This is beneficial for admins who wish to avoid the (small) time
sink of generating DH parameters at startup and/or avoid depleting
entropy.

Signed-off-by: Daniel P. Berrange 
---
 crypto/Makefile.objs  |   1 +
 crypto/tlscreds.c | 264 ++
 crypto/tlscredspriv.h |  41 +++
 include/crypto/tlscreds.h |  74 +
 tests/Makefile|   4 +-
 5 files changed, 382 insertions(+), 2 deletions(-)
 create mode 100644 crypto/tlscreds.c
 create mode 100644 crypto/tlscredspriv.h
 create mode 100644 include/crypto/tlscreds.h

diff --git a/crypto/Makefile.objs b/crypto/Makefile.objs
index b050138..cf62d51 100644
--- a/crypto/Makefile.objs
+++ b/crypto/Makefile.objs
@@ -3,3 +3,4 @@ util-obj-y += hash.o
 util-obj-y += aes.o
 util-obj-y += desrfb.o
 util-obj-y += cipher.o
+util-obj-y += tlscreds.o
diff --git a/crypto/tlscreds.c b/crypto/tlscreds.c
new file mode 100644
index 000..38e24be
--- /dev/null
+++ b/crypto/tlscreds.c
@@ -0,0 +1,264 @@
+/*
+ * QEMU crypto TLS credential support
+ *
+ * Copyright (c) 2015 Red Hat, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ *
+ */
+
+#include "crypto/tlscredspriv.h"
+
+/* #define QCRYPTO_DEBUG */
+
+#ifdef QCRYPTO_DEBUG
+#define DPRINTF(fmt, ...) do { fprintf(stderr, fmt, ## __VA_ARGS__); } while 
(0)
+#else
+#define DPRINTF(fmt, ...) do { } while (0)
+#endif
+
+
+#define DH_BITS 2048
+
+static const char * const endpoint_map[QCRYPTO_TLS_CREDS_ENDPOINT_LAST + 1] = {
+[QCRYPTO_TLS_CREDS_ENDPOINT_SERVER] = "server",
+[QCRYPTO_TLS_CREDS_ENDPOINT_CLIENT] = "client",
+[QCRYPTO_TLS_CREDS_ENDPOINT_LAST] = NULL,
+};
+
+
+#ifdef CONFIG_GNUTLS
+int
+qcrypto_tls_creds_get_dh_params_file(const char *filename,
+ gnutls_dh_params_t *dh_params,
+ Error **errp)
+{
+int ret;
+
+DPRINTF("Loading DH params %s\n", filename ? filename : "");
+if (filename == NULL) {
+ret = gnutls_dh_params_init(dh_params);
+if (ret < 0) {
+error_setg(errp, "Unable to initialize DH parameters %s",
+   gnutls_strerror(ret));
+return -1;
+}
+ret = gnutls_dh_params_generate2(*dh_params, DH_BITS);
+if (ret < 0) {
+gnutls_dh_params_deinit(*dh_params);
+*dh_params = NULL;
+error_setg(errp, "Unable to generate DH parameters %s",
+   gnutls_strerror(ret));
+return -1;
+}
+} else {
+GError *gerr = NULL;
+gchar *contents;
+gsize len;
+gnutls_datum_t data;
+if (!g_file_get_contents(filename,
+ &contents,
+ &len,
+ &gerr)) {
+
+error_setg(errp, "%s", gerr->message);
+g_error_free(gerr);
+return -1;
+}
+data.data = (unsigned char *)contents;
+data.size = len;
+ret = gnutls_dh_params_init(dh_params);
+if (ret < 0) {
+g_free(contents);
+error_setg(errp, "Unable to initialize DH parameters %s",
+   gnutls_strerror

[Qemu-devel] [PATCH v3 3/7] crypto: introduce new module for TLS x509 credentials

2015-08-12 Thread Daniel P. Berrange
Introduce a QCryptoTLSCredsX509 class which is used to
manage x509 certificate TLS credentials. This will be
the preferred credential type offering strong security
characteristics

Example CLI configuration:

 $QEMU -object tls-creds-x509,id=tls0,endpoint=server,\
   dir=/path/to/creds/dir,verify-peer=yes

The 'id' value in the -object args will be used to associate the
credentials with the network services. For eample, when the VNC
server is later converted it would use

 $QEMU -object tls-creds-x509,id=tls0, \
   -vnc 127.0.0.1:1,tls-creds=tls0

Signed-off-by: Daniel P. Berrange 
---
 crypto/Makefile.objs  |   1 +
 crypto/init.c |   2 +
 crypto/tlscredsx509.c | 277 ++
 include/crypto/tlscredsx509.h | 114 +
 qemu-options.hx   |  27 
 5 files changed, 421 insertions(+)
 create mode 100644 crypto/tlscredsx509.c
 create mode 100644 include/crypto/tlscredsx509.h

diff --git a/crypto/Makefile.objs b/crypto/Makefile.objs
index cf0199e..914bb68 100644
--- a/crypto/Makefile.objs
+++ b/crypto/Makefile.objs
@@ -5,3 +5,4 @@ util-obj-y += desrfb.o
 util-obj-y += cipher.o
 util-obj-y += tlscreds.o
 util-obj-y += tlscredsanon.o
+util-obj-y += tlscredsx509.o
diff --git a/crypto/init.c b/crypto/init.c
index bc4d769..a8b555a 100644
--- a/crypto/init.c
+++ b/crypto/init.c
@@ -20,6 +20,7 @@
 
 #include "crypto/init.h"
 #include "crypto/tlscredsanon.h"
+#include "crypto/tlscredsx509.h"
 #include "qemu/thread.h"
 
 #ifdef CONFIG_GNUTLS
@@ -144,6 +145,7 @@ int qcrypto_init(Error **errp)
  * clever enough to see the constructor :-(
  */
 qcrypto_tls_creds_anon_dummy();
+qcrypto_tls_creds_x509_dummy();
 
 return 0;
 }
diff --git a/crypto/tlscredsx509.c b/crypto/tlscredsx509.c
new file mode 100644
index 000..0adb065
--- /dev/null
+++ b/crypto/tlscredsx509.c
@@ -0,0 +1,277 @@
+/*
+ * QEMU crypto TLS x509 credential support
+ *
+ * Copyright (c) 2015 Red Hat, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ *
+ */
+
+#include "crypto/tlscredsx509.h"
+#include "crypto/tlscredspriv.h"
+#include "qom/object_interfaces.h"
+
+/* #define QCRYPTO_DEBUG */
+
+#ifdef QCRYPTO_DEBUG
+#define DPRINTF(fmt, ...) do { fprintf(stderr, fmt, ## __VA_ARGS__); } while 
(0)
+#else
+#define DPRINTF(fmt, ...) do { } while (0)
+#endif
+
+
+#ifdef CONFIG_GNUTLS
+
+
+static int
+qcrypto_tls_creds_x509_load(QCryptoTLSCredsX509 *creds,
+Error **errp)
+{
+char *cacert = NULL, *cacrl = NULL, *cert = NULL,
+*key = NULL, *dhparams = NULL;
+int ret;
+int rv = -1;
+
+DPRINTF("Loading x509 creds %d from %s\n",
+creds->type, creds->dir ? creds->dir : "");
+
+if (creds->parent_obj.endpoint == QCRYPTO_TLS_CREDS_ENDPOINT_SERVER) {
+if (qcrypto_tls_creds_get_path(&creds->parent_obj,
+   QCRYPTO_TLS_CREDS_X509_CA_CERT,
+   true, &cacert, errp) < 0 ||
+qcrypto_tls_creds_get_path(&creds->parent_obj,
+   QCRYPTO_TLS_CREDS_X509_CA_CRL,
+   false, &cacrl, errp) < 0 ||
+qcrypto_tls_creds_get_path(&creds->parent_obj,
+   QCRYPTO_TLS_CREDS_X509_SERVER_CERT,
+   true, &cert, errp) < 0 ||
+qcrypto_tls_creds_get_path(&creds->parent_obj,
+   QCRYPTO_TLS_CREDS_X509_SERVER_KEY,
+   true, &key, errp) < 0 ||
+qcrypto_tls_creds_get_path(&creds->parent_obj,
+   QCRYPTO_TLS_CREDS_DH_PARAMS,
+   false, &dhparams, errp) < 0) {
+goto cleanup;
+}
+} else {
+if (qcrypto_tls_creds_get_path(&creds->parent_obj,
+   QCRYPTO_TLS_CREDS_X509_CA_CERT,
+   true, &cacert, errp) < 0 ||
+qcrypto_tls_creds_get_path(&creds->parent_obj,
+   QCRYPTO_TLS_CREDS_X509_CLIENT_CERT,
+   false, &cert, errp) < 0 ||
+qcrypto_tls_creds_get_path(&creds->parent_obj,
+

[Qemu-devel] [PATCH v3 6/7] ui: fix return type for VNC I/O functions to be ssize_t

2015-08-12 Thread Daniel P. Berrange
Various VNC server I/O functions return 'long' and then
also pass this to a method accepting 'int'. All these
should be ssize_t to match the signature of read/write
APIs and thus avoid potential for integer truncation /
wraparound.

Signed-off-by: Daniel P. Berrange 
---
 ui/vnc.c | 36 ++--
 ui/vnc.h |  2 +-
 2 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/ui/vnc.c b/ui/vnc.c
index e26973a..37133a4 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -1268,7 +1268,7 @@ void vnc_disconnect_finish(VncState *vs)
 g_free(vs);
 }
 
-int vnc_client_io_error(VncState *vs, int ret, int last_errno)
+ssize_t vnc_client_io_error(VncState *vs, ssize_t ret, int last_errno)
 {
 if (ret == 0 || ret == -1) {
 if (ret == -1) {
@@ -1284,7 +1284,7 @@ int vnc_client_io_error(VncState *vs, int ret, int 
last_errno)
 }
 }
 
-VNC_DEBUG("Closing down client sock: ret %d, errno %d\n",
+VNC_DEBUG("Closing down client sock: ret %zd, errno %d\n",
   ret, ret < 0 ? last_errno : 0);
 vnc_disconnect_start(vs);
 
@@ -1301,11 +1301,11 @@ void vnc_client_error(VncState *vs)
 }
 
 #ifdef CONFIG_VNC_TLS
-static long vnc_client_write_tls(gnutls_session_t *session,
- const uint8_t *data,
- size_t datalen)
+static ssize_t vnc_client_write_tls(gnutls_session_t *session,
+const uint8_t *data,
+size_t datalen)
 {
-long ret = gnutls_write(*session, data, datalen);
+ssize_t ret = gnutls_write(*session, data, datalen);
 if (ret < 0) {
 if (ret == GNUTLS_E_AGAIN) {
 errno = EAGAIN;
@@ -1333,9 +1333,9 @@ static long vnc_client_write_tls(gnutls_session_t 
*session,
  * the requested 'datalen' if the socket would block. Returns
  * -1 on error, and disconnects the client socket.
  */
-long vnc_client_write_buf(VncState *vs, const uint8_t *data, size_t datalen)
+ssize_t vnc_client_write_buf(VncState *vs, const uint8_t *data, size_t datalen)
 {
-long ret;
+ssize_t ret;
 #ifdef CONFIG_VNC_TLS
 if (vs->tls.session) {
 ret = vnc_client_write_tls(&vs->tls.session, data, datalen);
@@ -1360,9 +1360,9 @@ long vnc_client_write_buf(VncState *vs, const uint8_t 
*data, size_t datalen)
  * the buffered output data if the socket would block. Returns
  * -1 on error, and disconnects the client socket.
  */
-static long vnc_client_write_plain(VncState *vs)
+static ssize_t vnc_client_write_plain(VncState *vs)
 {
-long ret;
+ssize_t ret;
 
 #ifdef CONFIG_VNC_SASL
 VNC_DEBUG("Write Plain: Pending output %p size %zd offset %zd. Wait SSF 
%d\n",
@@ -1436,10 +1436,10 @@ void vnc_read_when(VncState *vs, VncReadEvent *func, 
size_t expecting)
 }
 
 #ifdef CONFIG_VNC_TLS
-static long vnc_client_read_tls(gnutls_session_t *session, uint8_t *data,
-size_t datalen)
+static ssize_t vnc_client_read_tls(gnutls_session_t *session, uint8_t *data,
+   size_t datalen)
 {
-long ret = gnutls_read(*session, data, datalen);
+ssize_t ret = gnutls_read(*session, data, datalen);
 if (ret < 0) {
 if (ret == GNUTLS_E_AGAIN) {
 errno = EAGAIN;
@@ -1467,9 +1467,9 @@ static long vnc_client_read_tls(gnutls_session_t 
*session, uint8_t *data,
  * the requested 'datalen' if the socket would block. Returns
  * -1 on error, and disconnects the client socket.
  */
-long vnc_client_read_buf(VncState *vs, uint8_t *data, size_t datalen)
+ssize_t vnc_client_read_buf(VncState *vs, uint8_t *data, size_t datalen)
 {
-long ret;
+ssize_t ret;
 #ifdef CONFIG_VNC_TLS
 if (vs->tls.session) {
 ret = vnc_client_read_tls(&vs->tls.session, data, datalen);
@@ -1492,9 +1492,9 @@ long vnc_client_read_buf(VncState *vs, uint8_t *data, 
size_t datalen)
  * Returns the number of bytes read. Returns -1 on error, and
  * disconnects the client socket.
  */
-static long vnc_client_read_plain(VncState *vs)
+static ssize_t vnc_client_read_plain(VncState *vs)
 {
-int ret;
+ssize_t ret;
 VNC_DEBUG("Read plain %p size %zd offset %zd\n",
   vs->input.buffer, vs->input.capacity, vs->input.offset);
 buffer_reserve(&vs->input, 4096);
@@ -1520,7 +1520,7 @@ static void vnc_jobs_bh(void *opaque)
 void vnc_client_read(void *opaque)
 {
 VncState *vs = opaque;
-long ret;
+ssize_t ret;
 
 #ifdef CONFIG_VNC_SASL
 if (vs->sasl.conn && vs->sasl.runSSF)
diff --git a/ui/vnc.h b/ui/vnc.h
index 814d720..798b25b 100644
--- a/ui/vnc.h
+++ b/ui/vnc.h
@@ -533,7 +533,7 @@ uint32_t read_u32(uint8_t *data, size_t offset);
 
 /* Protocol stage functions */
 void vnc_client_error(VncState *vs);
-int vnc_client_io_error(VncState *vs, int ret, int last_errno);
+ssize_t vnc_client_io_error(VncState *vs, ssize_t ret, int last_errno);
 
 void start_client_init(VncState *vs);
 void start_auth_vnc(VncState *v

[Qemu-devel] [PATCH v3 5/7] crypto: introduce new module for handling TLS sessions

2015-08-12 Thread Daniel P. Berrange
Introduce a QCryptoTLSSession object that will encapsulate
all the code for setting up and using a client/sever TLS
session. This isolates the code which depends on the gnutls
library, avoiding #ifdefs in the rest of the codebase, as
well as facilitating any possible future port to other TLS
libraries, if desired. It makes use of the previously
defined QCryptoTLSCreds object to access credentials to
use with the session. It also includes further unit tests
to validate the correctness of the TLS session handshake
and certificate validation. This is functionally equivalent
to the current TLS session handling code embedded in the
VNC server, and will obsolete it.

Signed-off-by: Daniel P. Berrange 
---
 crypto/Makefile.objs   |   1 +
 crypto/tlssession.c| 578 +
 include/crypto/tlssession.h| 322 +++
 tests/.gitignore   |   4 +
 tests/Makefile |   3 +
 tests/test-crypto-tlssession.c | 534 +
 6 files changed, 1442 insertions(+)
 create mode 100644 crypto/tlssession.c
 create mode 100644 include/crypto/tlssession.h
 create mode 100644 tests/test-crypto-tlssession.c

diff --git a/crypto/Makefile.objs b/crypto/Makefile.objs
index 914bb68..52a192d 100644
--- a/crypto/Makefile.objs
+++ b/crypto/Makefile.objs
@@ -6,3 +6,4 @@ util-obj-y += cipher.o
 util-obj-y += tlscreds.o
 util-obj-y += tlscredsanon.o
 util-obj-y += tlscredsx509.o
+util-obj-y += tlssession.o
diff --git a/crypto/tlssession.c b/crypto/tlssession.c
new file mode 100644
index 000..dd0cd9f
--- /dev/null
+++ b/crypto/tlssession.c
@@ -0,0 +1,578 @@
+/*
+ * QEMU crypto TLS session support
+ *
+ * Copyright (c) 2015 Red Hat, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ *
+ */
+
+#include "crypto/tlssession.h"
+#include "crypto/tlscredsanon.h"
+#include "crypto/tlscredsx509.h"
+#include "qemu/acl.h"
+
+#include 
+
+#ifdef CONFIG_GNUTLS
+
+/* #define QCRYPTO_DEBUG */
+
+#ifdef QCRYPTO_DEBUG
+#define DPRINTF(fmt, ...) do { fprintf(stderr, fmt, ## __VA_ARGS__); } while 
(0)
+#else
+#define DPRINTF(fmt, ...) do { } while (0)
+#endif
+
+#include 
+
+
+struct _QCryptoTLSSession {
+QCryptoTLSCreds *creds;
+gnutls_session_t handle;
+char *hostname;
+char *aclname;
+bool handshakeComplete;
+QCryptoTLSSessionWriteFunc writeFunc;
+QCryptoTLSSessionReadFunc readFunc;
+void *opaque;
+char *peername;
+};
+
+
+void
+qcrypto_tls_session_free(QCryptoTLSSession *session)
+{
+if (!session) {
+return;
+}
+
+gnutls_deinit(session->handle);
+g_free(session->hostname);
+g_free(session->peername);
+g_free(session->aclname);
+object_unref(OBJECT(session->creds));
+g_free(session);
+}
+
+
+static ssize_t
+qcrypto_tls_session_push(void *opaque, const void *buf, size_t len)
+{
+QCryptoTLSSession *session = opaque;
+
+if (!session->writeFunc) {
+errno = EIO;
+return -1;
+};
+
+return session->writeFunc(buf, len, session->opaque);
+}
+
+
+static ssize_t
+qcrypto_tls_session_pull(void *opaque, void *buf, size_t len)
+{
+QCryptoTLSSession *session = opaque;
+
+if (!session->readFunc) {
+errno = EIO;
+return -1;
+};
+
+return session->readFunc(buf, len, session->opaque);
+}
+
+
+QCryptoTLSSession *
+qcrypto_tls_session_new(QCryptoTLSCreds *creds,
+const char *hostname,
+const char *aclname,
+QCryptoTLSCredsEndpoint endpoint,
+Error **errp)
+{
+QCryptoTLSSession *session;
+int ret;
+
+DPRINTF("New session creds=%p acl=%s\n", creds, aclname ? aclname : 
"none");
+session = g_new0(QCryptoTLSSession, 1);
+if (hostname) {
+session->hostname = g_strdup(hostname);
+}
+if (aclname) {
+session->aclname = g_strdup(aclname);
+}
+session->creds = creds;
+object_ref(OBJECT(creds));
+
+if (creds->endpoint != endpoint) {
+error_setg(errp, "Credentials endpoint doesn't match session");
+goto error;
+}
+
+if (endpoint == QCRYPTO_TLS_CREDS_ENDPOINT_SERVER) {
+ret = gnutls_init(&session->handle, GNUTLS_SERVER);
+} else {
+ret = gnutls_init(&session->handle, GNUTLS_CLIENT);
+}
+if (ret < 0)

[Qemu-devel] [PATCH v3 7/7] ui: convert VNC server to use QCryptoTLSSession

2015-08-12 Thread Daniel P. Berrange
Switch VNC server over to using the QCryptoTLSSession object
for the TLS session. This removes the direct use of gnutls
from the VNC server code. It also removes most knowledge
about TLS certificate handling from the VNC server code.
This has the nice effect that all the CONFIG_VNC_TLS
conditionals go away and the user gets an actual error
message when requesting TLS instead of it being silently
ignored.

With this change, the existing configuration options for
enabling TLS with -vnc are deprecated.

Old syntax for anon-DH credentials:

  -vnc hostname:0,tls

New syntax:

  -object tls-creds-anon,id=tls0,endpoint=server \
  -vnc hostname:0,tls-creds=tls0

Old syntax for x509 credentials, no client certs:

  -vnc hostname:0,tls,x509=/path/to/certs

New syntax:

  -object 
tls-creds-x509,id=tls0,dir=/path/to/certs,endpoint=server,verify-peer=no \
  -vnc hostname:0,tls-creds=tls0

Old syntax for x509 credentials, requiring client certs:

  -vnc hostname:0,tls,x509verify=/path/to/certs

New syntax:

  -object 
tls-creds-x509,id=tls0,dir=/path/to/certs,endpoint=server,verify-peer=yes \
  -vnc hostname:0,tls-creds=tls0

This aligns VNC with the way TLS credentials are to be
configured in the future for chardev, nbd and migration
backends. It also has the benefit that the same TLS
credentials can be shared across multiple VNC server
instances, if desired.

If someone uses the deprecated syntax, it will internally
result in the creation of a 'tls-creds' object with an ID
based on the VNC server ID. This allows backwards compat
with the CLI syntax, while still deleting all the original
TLS code from the VNC server.

Signed-off-by: Daniel P. Berrange 
---
 configure  |  31 
 qemu-options.hx|  28 ++-
 ui/Makefile.objs   |   2 +-
 ui/vnc-auth-sasl.c |  36 ++--
 ui/vnc-auth-vencrypt.c |  80 +
 ui/vnc-tls.c   | 474 -
 ui/vnc-tls.h   |  69 ---
 ui/vnc-ws.c|  82 +
 ui/vnc-ws.h|   2 -
 ui/vnc.c   | 338 ++-
 ui/vnc.h   |  15 +-
 11 files changed, 359 insertions(+), 798 deletions(-)
 delete mode 100644 ui/vnc-tls.c
 delete mode 100644 ui/vnc-tls.h

diff --git a/configure b/configure
index 264aea3..0754e27 100755
--- a/configure
+++ b/configure
@@ -242,7 +242,6 @@ vnc="yes"
 sparse="no"
 uuid=""
 vde=""
-vnc_tls=""
 vnc_sasl=""
 vnc_jpeg=""
 vnc_png=""
@@ -883,10 +882,6 @@ for opt do
   ;;
   --disable-strip) strip_opt="no"
   ;;
-  --disable-vnc-tls) vnc_tls="no"
-  ;;
-  --enable-vnc-tls) vnc_tls="yes"
-  ;;
   --disable-vnc-sasl) vnc_sasl="no"
   ;;
   --enable-vnc-sasl) vnc_sasl="yes"
@@ -2369,28 +2364,6 @@ EOF
   fi
 fi
 
-##
-# VNC TLS/WS detection
-if test "$vnc" = "yes" -a "$vnc_tls" != "no" ; then
-  cat > $TMPC <
-int main(void) { gnutls_session_t s; gnutls_init(&s, GNUTLS_SERVER); return 0; 
}
-EOF
-  vnc_tls_cflags=`$pkg_config --cflags gnutls 2> /dev/null`
-  vnc_tls_libs=`$pkg_config --libs gnutls 2> /dev/null`
-  if compile_prog "$vnc_tls_cflags" "$vnc_tls_libs" ; then
-if test "$vnc_tls" != "no" ; then
-  vnc_tls=yes
-fi
-libs_softmmu="$vnc_tls_libs $libs_softmmu"
-QEMU_CFLAGS="$QEMU_CFLAGS $vnc_tls_cflags"
-  else
-if test "$vnc_tls" = "yes" ; then
-  feature_not_found "vnc-tls" "Install gnutls devel"
-fi
-vnc_tls=no
-  fi
-fi
 
 ##
 # VNC SASL detection
@@ -4534,7 +4507,6 @@ echo "Block whitelist (ro) $block_drv_ro_whitelist"
 echo "VirtFS support$virtfs"
 echo "VNC support   $vnc"
 if test "$vnc" = "yes" ; then
-echo "VNC TLS support   $vnc_tls"
 echo "VNC SASL support  $vnc_sasl"
 echo "VNC JPEG support  $vnc_jpeg"
 echo "VNC PNG support   $vnc_png"
@@ -4741,9 +4713,6 @@ echo "CONFIG_BDRV_RO_WHITELIST=$block_drv_ro_whitelist" 
>> $config_host_mak
 if test "$vnc" = "yes" ; then
   echo "CONFIG_VNC=y" >> $config_host_mak
 fi
-if test "$vnc_tls" = "yes" ; then
-  echo "CONFIG_VNC_TLS=y" >> $config_host_mak
-fi
 if test "$vnc_sasl" = "yes" ; then
   echo "CONFIG_VNC_SASL=y" >> $config_host_mak
 fi
diff --git a/qemu-options.hx b/qemu-options.hx
index 23c91d5..4256483 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1214,8 +1214,9 @@ By definition the Websocket port is 5700+@var{display}. 
If @var{host} is
 specified connections will only be allowed from this host.
 As an alternative the Websocket port could be specified by using
 @code{websocket}=@var{port}.
-TLS encryption for the Websocket connection is supported if the required
-certificates are specified with the VNC option @option{x509}.
+If no TLS credentials are provided, the websocket connection runs in
+uncrypted mode. If TLS credentials are provided, the websocket connection
+requires encrypted client connections.
 
 @item password
 
@@ -1236,6 +1237,20 @@ date and time).
 You can also use keywords "now" or "never" 

[Qemu-devel] [PATCH v3 4/7] crypto: add sanity checking of TLS x509 credentials

2015-08-12 Thread Daniel P. Berrange
If the administrator incorrectly sets up their x509 certificates,
the errors seen at runtime during connection attempts are very
obscure and difficult to diagnose. This has been a particular
problem for people using openssl to generate their certificates
instead of the gnutls certtool, because the openssl tools don't
turn on the various x509 extensions that gnutls expects to be
present by default.

This change thus adds support in the TLS credentials object to
sanity check the certificates when QEMU first loads them. This
gives the administrator immediate feedback for the majority of
common configuration mistakes, reducing the pain involved in
setting up TLS. The code is derived from equivalent code that
has been part of libvirt's TLS support and has been seen to be
valuable in assisting admins.

It is possible to disable the sanity checking, however, via
the new 'sanity-check' property on the tls-creds object type,
with a value of 'no'.

Unit tests are included in this change to verify the correctness
of the sanity checking code in all the key scenarios it is
intended to cope with. As part of the test suite, the pkix_asn1_tab.c
from gnutls is imported. This file is intentionally copied from the
(long since obsolete) gnutls 1.6.3 source tree, since that version
was still under GPLv2+, rather than the GPLv3+ of gnutls >= 2.0.

Signed-off-by: Daniel P. Berrange 
---
 configure|   22 +
 crypto/tlscredsx509.c|  544 +++
 include/crypto/tlscredsx509.h|1 +
 tests/.gitignore |3 +
 tests/Makefile   |7 +-
 tests/crypto-tls-x509-helpers.c  |  486 +
 tests/crypto-tls-x509-helpers.h  |  133 +
 tests/pkix_asn1_tab.c| 1103 ++
 tests/test-crypto-tlscredsx509.c |  734 +
 9 files changed, 3032 insertions(+), 1 deletion(-)
 create mode 100644 tests/crypto-tls-x509-helpers.c
 create mode 100644 tests/crypto-tls-x509-helpers.h
 create mode 100644 tests/pkix_asn1_tab.c
 create mode 100644 tests/test-crypto-tlscredsx509.c

diff --git a/configure b/configure
index cd219d8..264aea3 100755
--- a/configure
+++ b/configure
@@ -416,6 +416,9 @@ if test "$debug_info" = "yes"; then
 LDFLAGS="-g $LDFLAGS"
 fi
 
+test_cflags=""
+test_libs=""
+
 # make source path absolute
 source_path=`cd "$source_path"; pwd`
 
@@ -2207,6 +2210,19 @@ if test "$gnutls_nettle" != "no"; then
 fi
 fi
 
+##
+# libtasn1 - only for the TLS creds/session test suite
+
+tasn1=yes
+if $pkg_config --exists "libtasn1"; then
+tasn1_cflags=`$pkg_config --cflags libtasn1`
+tasn1_libs=`$pkg_config --libs libtasn1`
+test_cflags="$test_cflags $tasn1_cflags"
+test_libs="$test_libs $tasn1_libs"
+else
+tasn1=no
+fi
+
 
 ##
 # VTE probe
@@ -4507,6 +4523,7 @@ echo "GNUTLS support$gnutls"
 echo "GNUTLS hash   $gnutls_hash"
 echo "GNUTLS gcrypt $gnutls_gcrypt"
 echo "GNUTLS nettle $gnutls_nettle ${gnutls_nettle+($nettle_version)}"
+echo "libtasn1  $tasn1"
 echo "VTE support   $vte"
 echo "curses support$curses"
 echo "curl support  $curl"
@@ -4876,6 +4893,9 @@ if test "$gnutls_nettle" = "yes" ; then
   echo "CONFIG_GNUTLS_NETTLE=y" >> $config_host_mak
   echo "CONFIG_NETTLE_VERSION_MAJOR=${nettle_version%%.*}" >> $config_host_mak
 fi
+if test "$tasn1" = "yes" ; then
+  echo "CONFIG_TASN1=y" >> $config_host_mak
+fi
 if test "$vte" = "yes" ; then
   echo "CONFIG_VTE=y" >> $config_host_mak
   echo "VTE_CFLAGS=$vte_cflags" >> $config_host_mak
@@ -5199,6 +5219,8 @@ echo "EXESUF=$EXESUF" >> $config_host_mak
 echo "DSOSUF=$DSOSUF" >> $config_host_mak
 echo "LDFLAGS_SHARED=$LDFLAGS_SHARED" >> $config_host_mak
 echo "LIBS_QGA+=$libs_qga" >> $config_host_mak
+echo "TEST_LIBS=$test_libs" >> $config_host_mak
+echo "TEST_CFLAGS=$test_cflags" >> $config_host_mak
 echo "POD2MAN=$POD2MAN" >> $config_host_mak
 echo "TRANSLATE_OPT_CFLAGS=$TRANSLATE_OPT_CFLAGS" >> $config_host_mak
 if test "$gcov" = "yes" ; then
diff --git a/crypto/tlscredsx509.c b/crypto/tlscredsx509.c
index 0adb065..e223501 100644
--- a/crypto/tlscredsx509.c
+++ b/crypto/tlscredsx509.c
@@ -33,6 +33,514 @@
 
 #ifdef CONFIG_GNUTLS
 
+#include 
+
+
+static int
+qcrypto_tls_creds_check_cert_times(gnutls_x509_crt_t cert,
+   const char *certFile,
+   bool isServer,
+   bool isCA,
+   Error **errp)
+{
+time_t now = time(NULL);
+
+if (now == ((time_t)-1)) {
+error_setg_errno(errp, errno, "cannot get current time");
+return -1;
+}
+
+if (gnutls_x509_crt_get_expiration_time(cert) < now) {
+error_setg(errp,
+   (isCA ?
+"The CA certificate %s has expired" :
+(isServer ?
+ 

Re: [Qemu-devel] [PATCH] Makefile.target: include top level build dir in vpath

2015-08-12 Thread Paolo Bonzini


On 09/08/2015 09:02, Michael Marineau wrote:
> Using ccache with CCACHE_BASEDIR set to $(SRC_PATH) or a parent will
> rewrite all absolute paths to relative paths. This interacts poorly with
> QEMU's two-level build directory scheme. For example, lets say
> BUILD_DIR=$(SRC_PATH)/build so build/blockdev.d will contain:
> 
>   blockdev.o: ../blockdev.c ../include/sysemu/block-backend.h \
> 
> Now the target build under build/x86_64-softmmu or similar will depend
> on ../blockdev.o which in turn will get make to source ../blockdev.d to
> check its dependencies. Since make always considers paths relative to
> the current working directory rather than the makefile the path appeared
> in the relative path to ../blockdev.c is useless.
> 
> This change simply adds the top level build directory to vpath so paths
> relative to the source directory, top build directory, and target build
> directory all work just fine.
> ---
>  Makefile.target | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/Makefile.target b/Makefile.target
> index 3e7aafd..dc32294 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -7,7 +7,7 @@ include config-target.mak
>  include config-devices.mak
>  include $(SRC_PATH)/rules.mak
>  
> -$(call set-vpath, $(SRC_PATH))
> +$(call set-vpath, $(SRC_PATH):$(BUILD_DIR))
>  ifdef CONFIG_LINUX
>  QEMU_CFLAGS += -I../linux-headers
>  endif
> 

Hi,

can you reply with "Signed-off-by: Michael Marineau
", representing that you've read and agreed
to the Developer Certificate of Origin (http://developercertificate.org/)?

This is necessary to include your patch in QEMU.

Paolo



[Qemu-devel] [PULL 03/20] tests: virtio-scsi: clear unit attention after reset

2015-08-12 Thread Paolo Bonzini
From: Stefan Hajnoczi 

The unit attention after reset (power on) prevents normal commands from
running.  The unaligned WRITE SAME test never executed its command!

Signed-off-by: Stefan Hajnoczi 
Message-Id: <1438262173-11546-4-git-send-email-stefa...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/virtio-scsi-test.c | 90 +---
 1 file changed, 54 insertions(+), 36 deletions(-)

diff --git a/tests/virtio-scsi-test.c b/tests/virtio-scsi-test.c
index 11ccdd6..6e91ca9 100644
--- a/tests/virtio-scsi-test.c
+++ b/tests/virtio-scsi-test.c
@@ -13,6 +13,7 @@
 #include "libqtest.h"
 #include "qemu/osdep.h"
 #include 
+#include "block/scsi.h"
 #include "libqos/virtio.h"
 #include "libqos/virtio-pci.h"
 #include "libqos/pci-pc.h"
@@ -71,40 +72,6 @@ static void qvirtio_scsi_stop(void)
 qtest_end();
 }
 
-static QVirtIOSCSI *qvirtio_scsi_pci_init(int slot)
-{
-QVirtIOSCSI *vs;
-QVirtioPCIDevice *dev;
-void *addr;
-int i;
-
-vs = g_new0(QVirtIOSCSI, 1);
-vs->alloc = pc_alloc_init();
-vs->bus = qpci_init_pc();
-
-dev = qvirtio_pci_device_find(vs->bus, QVIRTIO_SCSI_DEVICE_ID);
-vs->dev = (QVirtioDevice *)dev;
-g_assert(dev != NULL);
-g_assert_cmphex(vs->dev->device_type, ==, QVIRTIO_SCSI_DEVICE_ID);
-
-qvirtio_pci_device_enable(dev);
-qvirtio_reset(&qvirtio_pci, vs->dev);
-qvirtio_set_acknowledge(&qvirtio_pci, vs->dev);
-qvirtio_set_driver(&qvirtio_pci, vs->dev);
-
-addr = dev->addr + QVIRTIO_PCI_DEVICE_SPECIFIC_NO_MSIX;
-vs->num_queues = qvirtio_config_readl(&qvirtio_pci, vs->dev,
-  (uint64_t)(uintptr_t)addr);
-
-g_assert_cmpint(vs->num_queues, <, MAX_NUM_QUEUES);
-
-for (i = 0; i < vs->num_queues + 2; i++) {
-vs->vq[i] = qvirtqueue_setup(&qvirtio_pci, vs->dev, vs->alloc, i);
-}
-
-return vs;
-}
-
 static void qvirtio_scsi_pci_free(QVirtIOSCSI *vs)
 {
 int i;
@@ -134,7 +101,8 @@ static uint64_t qvirtio_scsi_alloc(QVirtIOSCSI *vs, size_t 
alloc_size,
 static uint8_t virtio_scsi_do_command(QVirtIOSCSI *vs, const uint8_t *cdb,
   const uint8_t *data_in,
   size_t data_in_len,
-  uint8_t *data_out, size_t data_out_len)
+  uint8_t *data_out, size_t data_out_len,
+  QVirtIOSCSICmdResp *resp_out)
 {
 QVirtQueue *vq;
 QVirtIOSCSICmdReq req = { { 0 } };
@@ -174,6 +142,10 @@ static uint8_t virtio_scsi_do_command(QVirtIOSCSI *vs, 
const uint8_t *cdb,
 
 response = readb(resp_addr + offsetof(QVirtIOSCSICmdResp, response));
 
+if (resp_out) {
+memread(resp_addr, resp_out, sizeof(*resp_out));
+}
+
 guest_free(vs->alloc, req_addr);
 guest_free(vs->alloc, resp_addr);
 guest_free(vs->alloc, data_in_addr);
@@ -181,6 +153,52 @@ static uint8_t virtio_scsi_do_command(QVirtIOSCSI *vs, 
const uint8_t *cdb,
 return response;
 }
 
+static QVirtIOSCSI *qvirtio_scsi_pci_init(int slot)
+{
+const uint8_t test_unit_ready_cdb[CDB_SIZE] = {};
+QVirtIOSCSI *vs;
+QVirtioPCIDevice *dev;
+QVirtIOSCSICmdResp resp;
+void *addr;
+int i;
+
+vs = g_new0(QVirtIOSCSI, 1);
+vs->alloc = pc_alloc_init();
+vs->bus = qpci_init_pc();
+
+dev = qvirtio_pci_device_find(vs->bus, QVIRTIO_SCSI_DEVICE_ID);
+vs->dev = (QVirtioDevice *)dev;
+g_assert(dev != NULL);
+g_assert_cmphex(vs->dev->device_type, ==, QVIRTIO_SCSI_DEVICE_ID);
+
+qvirtio_pci_device_enable(dev);
+qvirtio_reset(&qvirtio_pci, vs->dev);
+qvirtio_set_acknowledge(&qvirtio_pci, vs->dev);
+qvirtio_set_driver(&qvirtio_pci, vs->dev);
+
+addr = dev->addr + QVIRTIO_PCI_DEVICE_SPECIFIC_NO_MSIX;
+vs->num_queues = qvirtio_config_readl(&qvirtio_pci, vs->dev,
+  (uint64_t)(uintptr_t)addr);
+
+g_assert_cmpint(vs->num_queues, <, MAX_NUM_QUEUES);
+
+for (i = 0; i < vs->num_queues + 2; i++) {
+vs->vq[i] = qvirtqueue_setup(&qvirtio_pci, vs->dev, vs->alloc, i);
+}
+
+/* Clear the POWER ON OCCURRED unit attention */
+g_assert_cmpint(virtio_scsi_do_command(vs, test_unit_ready_cdb,
+   NULL, 0, NULL, 0, &resp),
+==, 0);
+g_assert_cmpint(resp.status, ==, CHECK_CONDITION);
+g_assert_cmpint(resp.sense[0], ==, 0x70); /* Fixed format sense buffer */
+g_assert_cmpint(resp.sense[2], ==, UNIT_ATTENTION);
+g_assert_cmpint(resp.sense[12], ==, 0x29); /* POWER ON */
+g_assert_cmpint(resp.sense[13], ==, 0x00);
+
+return vs;
+}
+
 /* Tests only initialization so far. TODO: Replace with functional tests */
 static void pci_nop(void)
 {
@@ -231,7 +249,7 @@ static void test_unaligned_write_same(void)
 vs = qvirtio_scsi_pci_init(PCI_SLOT);
 
 g_assert_cmphex(0, ==,
-virtio_scsi_do_command(vs, write_same_c

[Qemu-devel] [PULL 02/20] scsi-disk: fix cmd.mode field typo

2015-08-12 Thread Paolo Bonzini
From: Stefan Hajnoczi 

The cmd.xfer field is the data length.  The cmd.mode field is the data
transfer direction.

scsi_handle_rw_error() was using the wrong error policy for read
requests.

Signed-off-by: Stefan Hajnoczi 
Message-Id: <1438262173-11546-3-git-send-email-stefa...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-disk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 64f0694..73fed3f 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -399,7 +399,7 @@ static void scsi_read_data(SCSIRequest *req)
  */
 static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
 {
-bool is_read = (r->req.cmd.xfer == SCSI_XFER_FROM_DEV);
+bool is_read = (r->req.cmd.mode == SCSI_XFER_FROM_DEV);
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 BlockErrorAction action = blk_get_error_action(s->qdev.conf.blk,
is_read, error);
-- 
2.4.3





[Qemu-devel] [PULL 00/20] SCSI, build, TCG, RCU, misc patches for 2015-08-12

2015-08-12 Thread Paolo Bonzini
The following changes since commit cb48f67ad8c7b33c617d4f8144a27706e69fd688:

  bsd-user: Fix operand to cpu_x86_exec (2015-07-30 12:38:49 +0100)

are available in the git repository at:

  git://github.com/bonzini/qemu.git tags/for-upstream

for you to fetch changes up to 70c6c8bdc7c91bb111710156e1eee7bbe769985f:

  disas: Defeature print_target_address (2015-08-12 15:32:57 +0200)


* SCSI fixes from Stefan and Fam
* vhost-scsi fix from Igor and Lu Lina
* a build system fix from Daniel
* two more multi-arch-related patches from Peter C.
* TCG patches from myself and Sergey Fedorov
* RCU improvement from Wen Congyang
* a few more simple cleanups


Chen Hanxiao (1):
  exec: use macro ROUND_UP for alignment

Daniel P. Berrange (1):
  configure: only add CONFIG_RDMA to config-host.h once

Fam Zheng (3):
  scsi-disk: Fix assertion failure on WRITE SAME
  virtio-scsi-test: Add test case for tail unaligned WRITE SAME
  configure: Default to enable module build

Igor Mammedov (1):
  vhost/scsi: call vhost_dev_cleanup() at unrealize() time

Lu Lina (1):
  vhost-scsi: Clarify vhost_virtqueue_mask argument

Paolo Bonzini (6):
  exec: drop cpu_can_do_io, just read cpu->can_do_io
  qemu-nbd: remove unnecessary qemu_notify_event()
  scsi: create restart bottom half in the right AioContext
  scsi-disk: identify AIO callbacks more clearly
  scsi-generic: identify AIO callbacks more clearly
  hw: fix mask for ColdFire UART command register

Peter Crosthwaite (2):
  cpu_defs: Simplify CPUTLB padding logic
  disas: Defeature print_target_address

Sergey Fedorov (1):
  cpu-exec: Do not invalidate original TB in cpu_exec_nocache()

Stefan Hajnoczi (3):
  virtio-scsi: use virtqueue_map_sg() when loading requests
  scsi-disk: fix cmd.mode field typo
  tests: virtio-scsi: clear unit attention after reset

Wen Congyang (1):
  rcu: Allow calling rcu_(un)register_thread() during synchronize_rcu()

 .travis.yml  |   2 +-
 configure| 125 +++
 cpu-exec.c   |  10 ++--
 cpus.c   |   2 +-
 disas.c  |  12 +
 exec.c   |   2 +-
 hw/char/mcf_uart.c   |   2 +-
 hw/scsi/scsi-bus.c   |   3 +-
 hw/scsi/scsi-disk.c  |  97 
 hw/scsi/scsi-generic.c   |  66 +++--
 hw/scsi/vhost-scsi.c |   3 +-
 hw/scsi/virtio-scsi.c|   5 ++
 include/exec/cpu-defs.h  |  23 -
 include/exec/exec-all.h  |  23 +
 include/qom/cpu.h|   4 +-
 qemu-nbd.c   |   1 -
 qom/cpu.c|   2 +-
 softmmu_template.h   |   4 +-
 tests/virtio-scsi-test.c | 100 +++--
 translate-all.c  |  11 -
 util/rcu.c   |  48 +-
 21 files changed, 335 insertions(+), 210 deletions(-)
-- 
2.4.3




[Qemu-devel] [PULL 10/20] exec: drop cpu_can_do_io, just read cpu->can_do_io

2015-08-12 Thread Paolo Bonzini
After commit 626cf8f (icount: set can_do_io outside TB execution,
2014-12-08), can_do_io is set to 1 if not executing code.  It is
no longer necessary to make this assumption in cpu_can_do_io.

It is also possible to remove the use_icount test, simply by
never setting cpu->can_do_io to 0 unless use_icount is true.

With these changes cpu_can_do_io boils down to a read of
cpu->can_do_io.

Signed-off-by: Paolo Bonzini 
---
 cpu-exec.c  |  2 +-
 cpus.c  |  2 +-
 include/exec/exec-all.h | 21 -
 include/qom/cpu.h   |  4 +++-
 qom/cpu.c   |  2 +-
 softmmu_template.h  |  4 ++--
 translate-all.c |  3 ++-
 7 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 407fa47..713540f 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -196,7 +196,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, 
uint8_t *tb_ptr)
 }
 #endif /* DEBUG_DISAS */
 
-cpu->can_do_io = 0;
+cpu->can_do_io = !use_icount;
 next_tb = tcg_qemu_tb_exec(env, tb_ptr);
 cpu->can_do_io = 1;
 trace_exec_tb_exit((void *) (next_tb & ~TB_EXIT_MASK),
diff --git a/cpus.c b/cpus.c
index a822ce3..c1e74d9 100644
--- a/cpus.c
+++ b/cpus.c
@@ -145,7 +145,7 @@ int64_t cpu_get_icount_raw(void)
 
 icount = timers_state.qemu_icount;
 if (cpu) {
-if (!cpu_can_do_io(cpu)) {
+if (!cpu->can_do_io) {
 fprintf(stderr, "Bad icount read\n");
 exit(1);
 }
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 8427225..29775c0 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -346,27 +346,6 @@ extern int singlestep;
 /* cpu-exec.c */
 extern volatile sig_atomic_t exit_request;
 
-/**
- * cpu_can_do_io:
- * @cpu: The CPU for which to check IO.
- *
- * Deterministic execution requires that IO only be performed on the last
- * instruction of a TB so that interrupts take effect immediately.
- *
- * Returns: %true if memory-mapped IO is safe, %false otherwise.
- */
-static inline bool cpu_can_do_io(CPUState *cpu)
-{
-if (!use_icount) {
-return true;
-}
-/* If not executing code then assume we are ok.  */
-if (cpu->current_tb == NULL) {
-return true;
-}
-return cpu->can_do_io != 0;
-}
-
 #if !defined(CONFIG_USER_ONLY)
 void migration_bitmap_extend(ram_addr_t old, ram_addr_t new);
 #endif
diff --git a/include/qom/cpu.h b/include/qom/cpu.h
index 20aabc9..39712ab 100644
--- a/include/qom/cpu.h
+++ b/include/qom/cpu.h
@@ -231,7 +231,9 @@ struct kvm_run;
  * @icount_decr: Number of cycles left, with interrupt flag in high bit.
  * This allows a single read-compare-cbranch-write sequence to test
  * for both decrementer underflow and exceptions.
- * @can_do_io: Nonzero if memory-mapped IO is safe.
+ * @can_do_io: Nonzero if memory-mapped IO is safe. Deterministic execution
+ * requires that IO only be performed on the last instruction of a TB
+ * so that interrupts take effect immediately.
  * @env_ptr: Pointer to subclass-specific CPUArchState field.
  * @current_tb: Currently executing TB.
  * @gdb_regs: Additional GDB registers.
diff --git a/qom/cpu.c b/qom/cpu.c
index eb9cfec..62f4b5d 100644
--- a/qom/cpu.c
+++ b/qom/cpu.c
@@ -247,7 +247,7 @@ static void cpu_common_reset(CPUState *cpu)
 cpu->mem_io_vaddr = 0;
 cpu->icount_extra = 0;
 cpu->icount_decr.u32 = 0;
-cpu->can_do_io = 0;
+cpu->can_do_io = 1;
 cpu->exception_index = -1;
 memset(cpu->tb_jmp_cache, 0, TB_JMP_CACHE_SIZE * sizeof(void *));
 }
diff --git a/softmmu_template.h b/softmmu_template.h
index d42d89d..50dec1c 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -154,7 +154,7 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState 
*env,
 
 physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
 cpu->mem_io_pc = retaddr;
-if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu_can_do_io(cpu)) {
+if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
 cpu_io_recompile(cpu, retaddr);
 }
 
@@ -374,7 +374,7 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
 MemoryRegion *mr = iotlb_to_region(cpu, physaddr);
 
 physaddr = (physaddr & TARGET_PAGE_MASK) + addr;
-if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu_can_do_io(cpu)) {
+if (mr != &io_mem_rom && mr != &io_mem_notdirty && !cpu->can_do_io) {
 cpu_io_recompile(cpu, retaddr);
 }
 
diff --git a/translate-all.c b/translate-all.c
index 755cdab..9c46ffa 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -222,6 +222,7 @@ static int cpu_restore_state_from_tb(CPUState *cpu, 
TranslationBlock *tb,
 gen_intermediate_code_pc(env, tb);
 
 if (tb->cflags & CF_USE_ICOUNT) {
+assert(use_icount);
 /* Reset the cycle counter to the start of the block.  */
 cpu->icount_decr.u16.low += tb->icount;
 /* Clear the IO flag.  */
@@ -1470,7 +1471,7 @@ static void tcg_

[Qemu-devel] [PULL 01/20] virtio-scsi: use virtqueue_map_sg() when loading requests

2015-08-12 Thread Paolo Bonzini
From: Stefan Hajnoczi 

The VirtQueueElement struct is serialized during migration but the
in_sg[]/out_sg[] iovec arrays are not usable on the destination host
because the pointers are meaningless.

Use virtqueue_map_sg() to refresh in_sg[]/out_sg[] to valid pointers
based on in_addr[]/out_addr[] hwaddrs.

Cc: Paolo Bonzini 
Signed-off-by: Stefan Hajnoczi 
Message-Id: <1438262173-11546-2-git-send-email-stefa...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/virtio-scsi.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 811c3da..a8bb1c6 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -217,6 +217,11 @@ static void *virtio_scsi_load_request(QEMUFile *f, 
SCSIRequest *sreq)
 assert(req->elem.in_num <= ARRAY_SIZE(req->elem.in_sg));
 assert(req->elem.out_num <= ARRAY_SIZE(req->elem.out_sg));
 
+virtqueue_map_sg(req->elem.in_sg, req->elem.in_addr,
+ req->elem.in_num, 1);
+virtqueue_map_sg(req->elem.out_sg, req->elem.out_addr,
+ req->elem.out_num, 0);
+
 if (virtio_scsi_parse_req(req, sizeof(VirtIOSCSICmdReq) + vs->cdb_size,
   sizeof(VirtIOSCSICmdResp) + vs->sense_size) < 0) 
{
 error_report("invalid SCSI request migration data");
-- 
2.4.3





[Qemu-devel] [PULL 05/20] virtio-scsi-test: Add test case for tail unaligned WRITE SAME

2015-08-12 Thread Paolo Bonzini
From: Fam Zheng 

Signed-off-by: Fam Zheng 
Message-Id: <1438159512-3871-3-git-send-email-f...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 tests/virtio-scsi-test.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/tests/virtio-scsi-test.c b/tests/virtio-scsi-test.c
index 6e91ca9..66d8491 100644
--- a/tests/virtio-scsi-test.c
+++ b/tests/virtio-scsi-test.c
@@ -239,9 +239,12 @@ static void hotplug(void)
 static void test_unaligned_write_same(void)
 {
 QVirtIOSCSI *vs;
-uint8_t buf[512] = { 0 };
-const uint8_t write_same_cdb[CDB_SIZE] = { 0x41, 0x00, 0x00, 0x00, 0x00,
+uint8_t buf1[512] = { 0 };
+uint8_t buf2[512] = { 1 };
+const uint8_t write_same_cdb_1[CDB_SIZE] = { 0x41, 0x00, 0x00, 0x00, 0x00,
0x01, 0x00, 0x00, 0x02, 0x00 };
+const uint8_t write_same_cdb_2[CDB_SIZE] = { 0x41, 0x00, 0x00, 0x00, 0x00,
+   0x01, 0x00, 0x33, 0x00, 0x00 };
 
 qvirtio_scsi_start("-drive file=blkdebug::null-co://,if=none,id=dr1"
",format=raw,file.align=4k "
@@ -249,7 +252,10 @@ static void test_unaligned_write_same(void)
 vs = qvirtio_scsi_pci_init(PCI_SLOT);
 
 g_assert_cmphex(0, ==,
-virtio_scsi_do_command(vs, write_same_cdb, NULL, 0, buf, 512, NULL));
+virtio_scsi_do_command(vs, write_same_cdb_1, NULL, 0, buf1, 512, 
NULL));
+
+g_assert_cmphex(0, ==,
+virtio_scsi_do_command(vs, write_same_cdb_2, NULL, 0, buf2, 512, 
NULL));
 
 qvirtio_scsi_pci_free(vs);
 qvirtio_scsi_stop();
-- 
2.4.3





[Qemu-devel] [PULL 07/20] cpu-exec: Do not invalidate original TB in cpu_exec_nocache()

2015-08-12 Thread Paolo Bonzini
From: Sergey Fedorov 

Instead of invalidating an original TB in cpu_exec_nocache()
prematurely, just save a link to it in the temporary generated TB. If
cpu_io_recompile() is raised subsequently from the temporary TB,
invalidate the original one as well. That allows reusing the original TB
each time cpu_exec_nocache() is called to handle expired instruction
counter in icount mode.

Signed-off-by: Sergey Fedorov 
Message-Id: <1435656909-29116-1-git-send-email-serge.f...@gmail.com>
Signed-off-by: Paolo Bonzini 
---
 cpu-exec.c  | 8 ++--
 include/exec/exec-all.h | 2 ++
 translate-all.c | 8 
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/cpu-exec.c b/cpu-exec.c
index 75694f3..407fa47 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -231,19 +231,15 @@ static void cpu_exec_nocache(CPUState *cpu, int 
max_cycles,
  TranslationBlock *orig_tb)
 {
 TranslationBlock *tb;
-target_ulong pc = orig_tb->pc;
-target_ulong cs_base = orig_tb->cs_base;
-uint64_t flags = orig_tb->flags;
 
 /* Should never happen.
We only end up here when an existing TB is too long.  */
 if (max_cycles > CF_COUNT_MASK)
 max_cycles = CF_COUNT_MASK;
 
-/* tb_gen_code can flush our orig_tb, invalidate it now */
-tb_phys_invalidate(orig_tb, -1);
-tb = tb_gen_code(cpu, pc, cs_base, flags,
+tb = tb_gen_code(cpu, orig_tb->pc, orig_tb->cs_base, orig_tb->flags,
  max_cycles | CF_NOCACHE);
+tb->orig_tb = tcg_ctx.tb_ctx.tb_invalidated_flag ? NULL : orig_tb;
 cpu->current_tb = tb;
 /* execute the generated code */
 trace_exec_tb_nocache(tb, tb->pc);
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index a6fce04..8427225 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -155,6 +155,8 @@ struct TranslationBlock {
 void *tc_ptr;/* pointer to the translated code */
 /* next matching tb for physical address. */
 struct TranslationBlock *phys_hash_next;
+/* original tb when cflags has CF_NOCACHE */
+struct TranslationBlock *orig_tb;
 /* first and second physical page containing code. The lower bit
of the pointer tells the index in page_next[] */
 struct TranslationBlock *page_next[2];
diff --git a/translate-all.c b/translate-all.c
index 60a3d8b..755cdab 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -1533,6 +1533,14 @@ void cpu_io_recompile(CPUState *cpu, uintptr_t retaddr)
 cs_base = tb->cs_base;
 flags = tb->flags;
 tb_phys_invalidate(tb, -1);
+if (tb->cflags & CF_NOCACHE) {
+if (tb->orig_tb) {
+/* Invalidate original TB if this TB was generated in
+ * cpu_exec_nocache() */
+tb_phys_invalidate(tb->orig_tb, -1);
+}
+tb_free(tb);
+}
 /* FIXME: In theory this could raise an exception.  In practice
we have already translated the block once so it's probably ok.  */
 tb_gen_code(cpu, pc, cs_base, flags, cflags);
-- 
2.4.3





[Qemu-devel] [PATCH] qemu-thread: add a fast path to the Win32 QemuEvent

2015-08-12 Thread Paolo Bonzini
QemuEvents are used heavily by call_rcu.  We do not want them to be slow,
but the current implementation does a kernel call on every invocation
of qemu_event_* and won't cut it.

So, wrap a Win32 manual-reset event with a fast userspace path.  The
states and transitions are the same as for the futex and mutex/condvar
implementations, but the slow path is different of course.  The idea
is to reset the Win32 event lazily, as part of a test-reset-test-wait
sequence.  Such a sequence is, indeed, how QemuEvents are used by
RCU and other subsystems!

The patch includes a formal model of the algorithm.

Signed-off-by: Paolo Bonzini 
---
 docs/win32-qemu-event.promela | 98 +++
 include/qemu/thread-win32.h   |  1 +
 util/qemu-thread-win32.c  | 66 +++--
 3 files changed, 161 insertions(+), 4 deletions(-)
 create mode 100644 docs/win32-qemu-event.promela

diff --git a/docs/win32-qemu-event.promela b/docs/win32-qemu-event.promela
new file mode 100644
index 000..c446a71
--- /dev/null
+++ b/docs/win32-qemu-event.promela
@@ -0,0 +1,98 @@
+/*
+ * This model describes the implementation of QemuEvent in
+ * util/qemu-thread-win32.c.
+ *
+ * Author: Paolo Bonzini 
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To verify it:
+ * spin -a docs/event.promela
+ * gcc -O2 pan.c -DSAFETY
+ * ./a.out
+ */
+
+bool event;
+int value;
+
+/* Primitives for a Win32 event */
+#define RAW_RESET event = false
+#define RAW_SET   event = true
+#define RAW_WAIT  do :: event -> break; od
+
+#if 0
+/* Basic sanity checking: test the Win32 event primitives */
+#define RESET RAW_RESET
+#define SET   RAW_SET
+#define WAIT  RAW_WAIT
+#else
+/* Full model: layer a userspace-only fast path on top of the RAW_*
+ * primitives.  SET/RESET/WAIT have exactly the same semantics as
+ * RAW_SET/RAW_RESET/RAW_WAIT, but try to avoid invoking them.
+ */
+#define EV_SET 0
+#define EV_FREE 1
+#define EV_BUSY -1
+
+int state = EV_FREE;
+
+int xchg_result;
+#define SET   if :: state != EV_SET ->  \
+atomic { /* xchg_result=xchg(state, EV_SET) */  \
+xchg_result = state;\
+state = EV_SET; \
+}   \
+if :: xchg_result == EV_BUSY -> RAW_SET;\
+   :: else -> skip; \
+fi; \
+ :: else -> skip;   \
+  fi
+
+#define RESET if :: state == EV_SET -> atomic { state = state | EV_FREE; }  \
+ :: else-> skip;\
+  fi
+
+int tmp1, tmp2;
+#define WAIT  tmp1 = state; \
+  if :: tmp1 != EV_SET ->   \
+if :: tmp1 == EV_FREE ->\
+  RAW_RESET;\
+  atomic { /* tmp2=cas(state, EV_FREE, EV_BUSY) */  \
+  tmp2 = state; \
+  if :: tmp2 == EV_FREE -> state = EV_BUSY; \
+ :: else-> skip;\
+  fi;   \
+  } \
+  if :: tmp2 == EV_SET -> tmp1 = EV_SET;\
+ :: else   -> tmp1 = EV_BUSY;   \
+  fi;   \
+   :: else -> skip; \
+fi; \
+assert(tmp1 != EV_FREE);\
+if :: tmp1 == EV_BUSY -> RAW_WAIT;  \
+   :: else -> skip; \
+fi; \
+ :: else -> skip;   \
+  fi
+#endif
+
+active proctype waiter()
+{
+ if
+ :: !value ->
+ RESET;
+ if
+ :: !value -> WAIT;
+ :: else   -> skip;
+ fi;
+:: else -> skip;
+fi;
+assert(value);
+}
+
+active proctype notifier()
+{
+value = true;
+SET;
+}
diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win3

[Qemu-devel] [PULL 15/20] configure: only add CONFIG_RDMA to config-host.h once

2015-08-12 Thread Paolo Bonzini
From: "Daniel P. Berrange" 

For unknown reasons (probably a git rebase merge mistake)

  commit 2da776db4846eadcb808598a5d3484d149773c05
  Author: Michael R. Hines 
  Date:   Mon Jul 22 10:01:54 2013 -0400

rdma: core logic

Adds CONFIG_RDMA to config-host.h twice, as can be seen
in the generated file:

 $ grep CONFIG_RDMA config-host.h
 #define CONFIG_RDMA 1
 #define CONFIG_RDMA 1

Signed-off-by: Daniel P. Berrange 
Message-Id: <1438345403-32467-1-git-send-email-berra...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 configure | 4 
 1 file changed, 4 deletions(-)

diff --git a/configure b/configure
index 28bf755..a19e8ec 100755
--- a/configure
+++ b/configure
@@ -5646,10 +5646,6 @@ if [ "$pixman" = "internal" ]; then
   echo "config-host.h: subdir-pixman" >> $config_host_mak
 fi
 
-if test "$rdma" = "yes" ; then
-echo "CONFIG_RDMA=y" >> $config_host_mak
-fi
-
 if [ "$dtc_internal" = "yes" ]; then
   echo "config-host.h: subdir-dtc" >> $config_host_mak
 fi
-- 
2.4.3





[Qemu-devel] [PULL 04/20] scsi-disk: Fix assertion failure on WRITE SAME

2015-08-12 Thread Paolo Bonzini
From: Fam Zheng 

The last portion of an unaligned WRITE SAME command could fail the
assertion in bdrv_aligned_pwritev:

assert(!qiov || bytes == qiov->size);

Because we updated data->iov.iov_len right above this if block, but
data->qiov still has the old size.

Reinitialize the qiov to make them equal and keep block layer happy.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Fam Zheng 
Message-Id: <1438159512-3871-2-git-send-email-f...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-disk.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 73fed3f..087541d 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -1683,6 +1683,10 @@ static void scsi_write_same_complete(void *opaque, int 
ret)
 if (data->iov.iov_len) {
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct,
  data->iov.iov_len, BLOCK_ACCT_WRITE);
+/* blk_aio_write doesn't like the qiov size being different from
+ * nb_sectors, make sure they match.
+ */
+qemu_iovec_init_external(&data->qiov, &data->iov, 1);
 r->req.aiocb = blk_aio_writev(s->qdev.conf.blk, data->sector,
   &data->qiov, data->iov.iov_len / 512,
   scsi_write_same_complete, data);
-- 
2.4.3





[Qemu-devel] [PULL 08/20] cpu_defs: Simplify CPUTLB padding logic

2015-08-12 Thread Paolo Bonzini
From: Peter Crosthwaite 

There was a complicated subtractive arithmetic for determining the
padding on the CPUTLBEntry structure. Simplify this with a union.

Signed-off-by: Peter Crosthwaite 
Message-Id: <1436130533-18565-1-git-send-email-crosthwaite.pe...@gmail.com>
Signed-off-by: Paolo Bonzini 
---
 include/exec/cpu-defs.h | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 98b9cff..5093be2 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -105,17 +105,18 @@ typedef struct CPUTLBEntry {
bit 3  : indicates that the entry is invalid
bit 2..0   : zero
 */
-target_ulong addr_read;
-target_ulong addr_write;
-target_ulong addr_code;
-/* Addend to virtual address to get host address.  IO accesses
-   use the corresponding iotlb value.  */
-uintptr_t addend;
-/* padding to get a power of two size */
-uint8_t dummy[(1 << CPU_TLB_ENTRY_BITS) -
-  (sizeof(target_ulong) * 3 +
-   ((-sizeof(target_ulong) * 3) & (sizeof(uintptr_t) - 1)) +
-   sizeof(uintptr_t))];
+union {
+struct {
+target_ulong addr_read;
+target_ulong addr_write;
+target_ulong addr_code;
+/* Addend to virtual address to get host address.  IO accesses
+   use the corresponding iotlb value.  */
+uintptr_t addend;
+};
+/* padding to get a power of two size */
+uint8_t dummy[1 << CPU_TLB_ENTRY_BITS];
+};
 } CPUTLBEntry;
 
 QEMU_BUILD_BUG_ON(sizeof(CPUTLBEntry) != (1 << CPU_TLB_ENTRY_BITS));
-- 
2.4.3





[Qemu-devel] [PULL 09/20] configure: Default to enable module build

2015-08-12 Thread Paolo Bonzini
From: Fam Zheng 

We have module build support around for a while, but also had it bitrot
several times. It probably makes sense to enable it by default so that
people can notice and use it.

Add --disable-modules as a counterpart to --enable-modules, which is
now turned on by default.  If both are omitted, support is guessed as
usual.

pie is now checked for all platforms, because it's depended on by module
build.

Signed-off-by: Fam Zheng 
Message-Id: <1423481144-20314-2-git-send-email-f...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 .travis.yml |   2 +-
 configure   | 121 +---
 2 files changed, 84 insertions(+), 39 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index 0ac170b..12bf1db 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -99,5 +99,5 @@ matrix:
   EXTRA_CONFIG="--enable-trace-backends=ust"
   compiler: gcc
 - env: TARGETS=i386-softmmu,x86_64-softmmu
-   EXTRA_CONFIG="--enable-modules"
+   EXTRA_CONFIG="--disable-modules"
   compiler: gcc
diff --git a/configure b/configure
index 704b34c..28bf755 100755
--- a/configure
+++ b/configure
@@ -271,7 +271,7 @@ gcov_tool="gcov"
 EXESUF=""
 DSOSUF=".so"
 LDFLAGS_SHARED="-shared"
-modules="no"
+modules=""
 prefix="/usr/local"
 mandir="\${prefix}/share/man"
 datadir="\${prefix}/share"
@@ -784,6 +784,9 @@ for opt do
   --enable-modules)
   modules="yes"
   ;;
+  --disable-modules)
+  modules="no"
+  ;;
   --cpu=*)
   ;;
   --target-list=*) target_list="$optarg"
@@ -1508,9 +1511,6 @@ if compile_prog "-Werror -fno-gcse" "" ; then
 fi
 
 if test "$static" = "yes" ; then
-  if test "$modules" = "yes" ; then
-error_exit "static and modules are mutually incompatible"
-  fi
   if test "$pie" = "yes" ; then
 error_exit "static and pie are mutually incompatible"
   else
@@ -1518,17 +1518,6 @@ if test "$static" = "yes" ; then
   fi
 fi
 
-# Unconditional check for compiler __thread support
-  cat > $TMPC << EOF
-static __thread int tls_var;
-int main(void) { return tls_var; }
-EOF
-
-if ! compile_prog "-Werror" "" ; then
-error_exit "Your compiler does not support the __thread specifier for " \
-   "Thread-Local Storage (TLS). Please upgrade to a version that does."
-fi
-
 if test "$pie" = ""; then
   case "$cpu-$targetos" in
 i386-Linux|x86_64-Linux|x32-Linux|i386-OpenBSD|x86_64-OpenBSD)
@@ -1601,6 +1590,17 @@ EOF
   fi
 fi
 
+# Unconditional check for compiler __thread support
+  cat > $TMPC << EOF
+static __thread int tls_var;
+int main(void) { return tls_var; }
+EOF
+
+if ! compile_prog "-Werror" "" ; then
+error_exit "Your compiler does not support the __thread specifier for " \
+   "Thread-Local Storage (TLS). Please upgrade to a version that does."
+fi
+
 ##
 # __sync_fetch_and_and requires at least -march=i486. Many toolchains
 # use i686 as default anyway, but for those that don't, an explicit
@@ -2784,17 +2784,24 @@ if test "$modules" = yes; then
 glib_modules="$glib_modules gmodule-2.0"
 fi
 
-for i in $glib_modules; do
-if $pkg_config --atleast-version=$glib_req_ver $i; then
-glib_cflags=`$pkg_config --cflags $i`
-glib_libs=`$pkg_config --libs $i`
-CFLAGS="$glib_cflags $CFLAGS"
-LIBS="$glib_libs $LIBS"
-libs_qga="$glib_libs $libs_qga"
-else
-error_exit "glib-$glib_req_ver $i is required to compile QEMU"
-fi
-done
+glib_pkg_config()
+{
+  if $pkg_config --atleast-version=$glib_req_ver $1; then
+local probe_cflags=$($pkg_config --cflags $1)
+local probe_libs=$($pkg_config --libs $1)
+CFLAGS="$probe_cflags $CFLAGS"
+LIBS="$probe_libs $LIBS"
+libs_qga="$probe_libs $libs_qga"
+glib_cflags="$probe_cflags $glib_cflags"
+glib_libs="$probe_libs $glib_libs"
+return 0
+  else
+return 1
+  fi
+}
+
+glib_pkg_config gthread-2.0 || \
+  error_exit "glib-$glib_req_ver gthread-2.0 is required to compile QEMU"
 
 # g_test_trap_subprocess added in 2.38. Used by some tests.
 glib_subprocess=yes
@@ -2815,19 +2822,57 @@ if ! compile_prog "$glib_cflags -Werror" "$glib_libs" ; 
then
 fi
 
 ##
-# SHA command probe for modules
-if test "$modules" = yes; then
-shacmd_probe="sha1sum sha1 shasum"
-for c in $shacmd_probe; do
-if has $c; then
-shacmd="$c"
-break
-fi
-done
-if test "$shacmd" = ""; then
-error_exit "one of the checksum commands is required to enable 
modules: $shacmd_probe"
+# SHA command and gmodule-2.0 probe for modules
+# return 0 if probe succeeds
+# $1: true - force mode, exit if probe fail
+# false - optoinal mode, return 1 if probe fail
+module_try_enable()
+{
+  force=$1
+  if test "$static" = "yes"; then
+if $force; then
+  error_exit "static and modules are mutually incompatible"
+else
+  modules="no"
+  return
 fi
-fi
+  fi
+  shacmd_probe="sha1sum sha1 shasum"

[Qemu-devel] [PULL 06/20] vhost/scsi: call vhost_dev_cleanup() at unrealize() time

2015-08-12 Thread Paolo Bonzini
From: Igor Mammedov 

vhost-scsi calls vhost_dev_init() at realize() time
but forgets to call it's counterpart vhost_dev_cleanup()
at unrealize() time.

Calling it should fix leaking of memory table and
mem_sections table in vhost device. And also unregister
vhost's memory listerner to prevent access from
memory core to freed memory.

Signed-off-by: Igor Mammedov 
Message-Id: <1438262999-287627-1-git-send-email-imamm...@redhat.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/vhost-scsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index a69918b..0dd57ff 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -277,6 +277,7 @@ static void vhost_scsi_unrealize(DeviceState *dev, Error 
**errp)
 /* This will stop vhost backend. */
 vhost_scsi_set_status(vdev, 0);
 
+vhost_dev_cleanup(&s->dev);
 g_free(s->dev.vqs);
 
 virtio_scsi_common_unrealize(dev, errp);
-- 
2.4.3





[Qemu-devel] [PULL 18/20] scsi-generic: identify AIO callbacks more clearly

2015-08-12 Thread Paolo Bonzini
Functions that are not callbacks should assert that aiocb is NULL and
have a SCSIGenericReq argument.

AIO callbacks should assert that aiocb is not NULL.  They also have an
opaque argument.

Reviewed-by: Fam Zheng 
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-generic.c | 66 +++---
 1 file changed, 41 insertions(+), 25 deletions(-)

diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index e53470f..1b6350b 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -88,12 +88,12 @@ static void scsi_free_request(SCSIRequest *req)
 }
 
 /* Helper function for command completion.  */
-static void scsi_command_complete(void *opaque, int ret)
+static void scsi_command_complete_noio(SCSIGenericReq *r, int ret)
 {
 int status;
-SCSIGenericReq *r = (SCSIGenericReq *)opaque;
 
-r->req.aiocb = NULL;
+assert(r->req.aiocb == NULL);
+
 if (r->req.io_canceled) {
 scsi_req_cancel_complete(&r->req);
 goto done;
@@ -142,6 +142,15 @@ done:
 scsi_req_unref(&r->req);
 }
 
+static void scsi_command_complete(void *opaque, int ret)
+{
+SCSIGenericReq *r = (SCSIGenericReq *)opaque;
+
+assert(r->req.aiocb != NULL);
+r->req.aiocb = NULL;
+scsi_command_complete_noio(r, ret);
+}
+
 static int execute_command(BlockBackend *blk,
SCSIGenericReq *r, int direction,
BlockCompletionFunc *complete)
@@ -172,33 +181,37 @@ static void scsi_read_complete(void * opaque, int ret)
 SCSIDevice *s = r->req.dev;
 int len;
 
+assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
+
 if (ret || r->req.io_canceled) {
-scsi_command_complete(r, ret);
+scsi_command_complete_noio(r, ret);
 return;
 }
+
 len = r->io_header.dxfer_len - r->io_header.resid;
 DPRINTF("Data ready tag=0x%x len=%d\n", r->req.tag, len);
 
 r->len = -1;
 if (len == 0) {
-scsi_command_complete(r, 0);
-} else {
-/* Snoop READ CAPACITY output to set the blocksize.  */
-if (r->req.cmd.buf[0] == READ_CAPACITY_10 &&
-(ldl_be_p(&r->buf[0]) != 0xU || s->max_lba == 0)) {
-s->blocksize = ldl_be_p(&r->buf[4]);
-s->max_lba = ldl_be_p(&r->buf[0]) & 0xULL;
-} else if (r->req.cmd.buf[0] == SERVICE_ACTION_IN_16 &&
-   (r->req.cmd.buf[1] & 31) == SAI_READ_CAPACITY_16) {
-s->blocksize = ldl_be_p(&r->buf[8]);
-s->max_lba = ldq_be_p(&r->buf[0]);
-}
-blk_set_guest_block_size(s->conf.blk, s->blocksize);
+scsi_command_complete_noio(r, 0);
+return;
+}
 
-scsi_req_data(&r->req, len);
-scsi_req_unref(&r->req);
+/* Snoop READ CAPACITY output to set the blocksize.  */
+if (r->req.cmd.buf[0] == READ_CAPACITY_10 &&
+(ldl_be_p(&r->buf[0]) != 0xU || s->max_lba == 0)) {
+s->blocksize = ldl_be_p(&r->buf[4]);
+s->max_lba = ldl_be_p(&r->buf[0]) & 0xULL;
+} else if (r->req.cmd.buf[0] == SERVICE_ACTION_IN_16 &&
+   (r->req.cmd.buf[1] & 31) == SAI_READ_CAPACITY_16) {
+s->blocksize = ldl_be_p(&r->buf[8]);
+s->max_lba = ldq_be_p(&r->buf[0]);
 }
+blk_set_guest_block_size(s->conf.blk, s->blocksize);
+
+scsi_req_data(&r->req, len);
+scsi_req_unref(&r->req);
 }
 
 /* Read more data from scsi device into buffer.  */
@@ -213,14 +226,14 @@ static void scsi_read_data(SCSIRequest *req)
 /* The request is used as the AIO opaque value, so add a ref.  */
 scsi_req_ref(&r->req);
 if (r->len == -1) {
-scsi_command_complete(r, 0);
+scsi_command_complete_noio(r, 0);
 return;
 }
 
 ret = execute_command(s->conf.blk, r, SG_DXFER_FROM_DEV,
   scsi_read_complete);
 if (ret < 0) {
-scsi_command_complete(r, ret);
+scsi_command_complete_noio(r, ret);
 }
 }
 
@@ -230,9 +243,12 @@ static void scsi_write_complete(void * opaque, int ret)
 SCSIDevice *s = r->req.dev;
 
 DPRINTF("scsi_write_complete() ret = %d\n", ret);
+
+assert(r->req.aiocb != NULL);
 r->req.aiocb = NULL;
+
 if (ret || r->req.io_canceled) {
-scsi_command_complete(r, ret);
+scsi_command_complete_noio(r, ret);
 return;
 }
 
@@ -242,7 +258,7 @@ static void scsi_write_complete(void * opaque, int ret)
 DPRINTF("block size %d\n", s->blocksize);
 }
 
-scsi_command_complete(r, ret);
+scsi_command_complete_noio(r, ret);
 }
 
 /* Write data to a scsi device.  Returns nonzero on failure.
@@ -264,7 +280,7 @@ static void scsi_write_data(SCSIRequest *req)
 scsi_req_ref(&r->req);
 ret = execute_command(s->conf.blk, r, SG_DXFER_TO_DEV, 
scsi_write_complete);
 if (ret < 0) {
-scsi_command_complete(r, ret);
+scsi_command_complete_noio(r, ret);
 }
 }
 
@@ -306,7 +322,7 @@ static int32_t scsi_send_command(SCSIReq

[Qemu-devel] [PULL 11/20] rcu: Allow calling rcu_(un)register_thread() during synchronize_rcu()

2015-08-12 Thread Paolo Bonzini
From: Wen Congyang 

If rcu_(un)register_thread() is called together with synchronize_rcu(),
it will wait for the synchronize_rcu() to finish. But when synchronize_rcu()
waits for some events, we can modify the list registry.
We also use the lock rcu_gp_lock to assume that synchronize_rcu() isn't
executed in more than one thread at the same time. Add a new mutex lock
rcu_sync_lock to assume it and rename rcu_gp_lock to rcu_registry_lock.
Release rcu_registry_lock when synchronize_rcu() waits for some events.

Signed-off-by: Wen Congyang 
Message-Id: <55b59652.4090...@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini 
---
 util/rcu.c | 48 +++-
 1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/util/rcu.c b/util/rcu.c
index cdcad67..8ba304d 100644
--- a/util/rcu.c
+++ b/util/rcu.c
@@ -47,7 +47,8 @@
 unsigned long rcu_gp_ctr = RCU_GP_LOCKED;
 
 QemuEvent rcu_gp_event;
-static QemuMutex rcu_gp_lock;
+static QemuMutex rcu_registry_lock;
+static QemuMutex rcu_sync_lock;
 
 /*
  * Check whether a quiescent state was crossed between the beginning of
@@ -66,7 +67,7 @@ static inline int rcu_gp_ongoing(unsigned long *ctr)
  */
 __thread struct rcu_reader_data rcu_reader;
 
-/* Protected by rcu_gp_lock.  */
+/* Protected by rcu_registry_lock.  */
 typedef QLIST_HEAD(, rcu_reader_data) ThreadList;
 static ThreadList registry = QLIST_HEAD_INITIALIZER(registry);
 
@@ -114,10 +115,26 @@ static void wait_for_readers(void)
 break;
 }
 
-/* Wait for one thread to report a quiescent state and
- * try again.
+/* Wait for one thread to report a quiescent state and try again.
+ * Release rcu_registry_lock, so rcu_(un)register_thread() doesn't
+ * wait too much time.
+ *
+ * rcu_register_thread() may add nodes to ®istry; it will not
+ * wake up synchronize_rcu, but that is okay because at least another
+ * thread must exit its RCU read-side critical section before
+ * synchronize_rcu is done.  The next iteration of the loop will
+ * move the new thread's rcu_reader from ®istry to &qsreaders,
+ * because rcu_gp_ongoing() will return false.
+ *
+ * rcu_unregister_thread() may remove nodes from &qsreaders instead
+ * of ®istry if it runs during qemu_event_wait.  That's okay;
+ * the node then will not be added back to ®istry by QLIST_SWAP
+ * below.  The invariant is that the node is part of one list when
+ * rcu_registry_lock is released.
  */
+qemu_mutex_unlock(&rcu_registry_lock);
 qemu_event_wait(&rcu_gp_event);
+qemu_mutex_lock(&rcu_registry_lock);
 }
 
 /* put back the reader list in the registry */
@@ -126,7 +143,8 @@ static void wait_for_readers(void)
 
 void synchronize_rcu(void)
 {
-qemu_mutex_lock(&rcu_gp_lock);
+qemu_mutex_lock(&rcu_sync_lock);
+qemu_mutex_lock(&rcu_registry_lock);
 
 if (!QLIST_EMPTY(®istry)) {
 /* In either case, the atomic_mb_set below blocks stores that free
@@ -149,7 +167,8 @@ void synchronize_rcu(void)
 wait_for_readers();
 }
 
-qemu_mutex_unlock(&rcu_gp_lock);
+qemu_mutex_unlock(&rcu_registry_lock);
+qemu_mutex_unlock(&rcu_sync_lock);
 }
 
 
@@ -273,23 +292,24 @@ void call_rcu1(struct rcu_head *node, void (*func)(struct 
rcu_head *node))
 void rcu_register_thread(void)
 {
 assert(rcu_reader.ctr == 0);
-qemu_mutex_lock(&rcu_gp_lock);
+qemu_mutex_lock(&rcu_registry_lock);
 QLIST_INSERT_HEAD(®istry, &rcu_reader, node);
-qemu_mutex_unlock(&rcu_gp_lock);
+qemu_mutex_unlock(&rcu_registry_lock);
 }
 
 void rcu_unregister_thread(void)
 {
-qemu_mutex_lock(&rcu_gp_lock);
+qemu_mutex_lock(&rcu_registry_lock);
 QLIST_REMOVE(&rcu_reader, node);
-qemu_mutex_unlock(&rcu_gp_lock);
+qemu_mutex_unlock(&rcu_registry_lock);
 }
 
 static void rcu_init_complete(void)
 {
 QemuThread thread;
 
-qemu_mutex_init(&rcu_gp_lock);
+qemu_mutex_init(&rcu_registry_lock);
+qemu_mutex_init(&rcu_sync_lock);
 qemu_event_init(&rcu_gp_event, true);
 
 qemu_event_init(&rcu_call_ready_event, false);
@@ -306,12 +326,14 @@ static void rcu_init_complete(void)
 #ifdef CONFIG_POSIX
 static void rcu_init_lock(void)
 {
-qemu_mutex_lock(&rcu_gp_lock);
+qemu_mutex_lock(&rcu_sync_lock);
+qemu_mutex_lock(&rcu_registry_lock);
 }
 
 static void rcu_init_unlock(void)
 {
-qemu_mutex_unlock(&rcu_gp_lock);
+qemu_mutex_unlock(&rcu_registry_lock);
+qemu_mutex_unlock(&rcu_sync_lock);
 }
 #endif
 
-- 
2.4.3





[Qemu-devel] [PULL 12/20] exec: use macro ROUND_UP for alignment

2015-08-12 Thread Paolo Bonzini
From: Chen Hanxiao 

Use ROUND_UP instead.

Signed-off-by: Chen Hanxiao 
Message-Id: <1437707523-4910-1-git-send-email-chenhanx...@cn.fujitsu.com>
Signed-off-by: Paolo Bonzini 
---
 exec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/exec.c b/exec.c
index 0a4a0c5..54cd70a 100644
--- a/exec.c
+++ b/exec.c
@@ -1210,7 +1210,7 @@ static void *file_ram_alloc(RAMBlock *block,
 unlink(filename);
 g_free(filename);
 
-memory = (memory+hpagesize-1) & ~(hpagesize-1);
+memory = ROUND_UP(memory, hpagesize);
 
 /*
  * ftruncate is not supported by hugetlbfs in older
-- 
2.4.3





[Qemu-devel] [PULL 17/20] scsi-disk: identify AIO callbacks more clearly

2015-08-12 Thread Paolo Bonzini
Functions that are not callbacks should assert that aiocb is NULL and
have a non-opaque argument (usually a pointer to SCSIDiskReq).

AIO callbacks should assert that aiocb is not NULL and take care of
calling block_acct done.  They also have an opaque argument.

Reviewed-by: Fam Zheng 
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-disk.c | 91 +++--
 1 file changed, 61 insertions(+), 30 deletions(-)

diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index 087541d..bada9a7 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -217,6 +217,8 @@ static void scsi_write_do_fua(SCSIDiskReq *r)
 {
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
+assert(r->req.aiocb == NULL);
+
 if (r->req.io_canceled) {
 scsi_req_cancel_complete(&r->req);
 goto done;
@@ -235,15 +237,10 @@ done:
 scsi_req_unref(&r->req);
 }
 
-static void scsi_dma_complete_noio(void *opaque, int ret)
+static void scsi_dma_complete_noio(SCSIDiskReq *r, int ret)
 {
-SCSIDiskReq *r = (SCSIDiskReq *)opaque;
-SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+assert(r->req.aiocb == NULL);
 
-if (r->req.aiocb != NULL) {
-r->req.aiocb = NULL;
-block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
-}
 if (r->req.io_canceled) {
 scsi_req_cancel_complete(&r->req);
 goto done;
@@ -271,9 +268,13 @@ done:
 static void scsi_dma_complete(void *opaque, int ret)
 {
 SCSIDiskReq *r = (SCSIDiskReq *)opaque;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 
 assert(r->req.aiocb != NULL);
-scsi_dma_complete_noio(opaque, ret);
+r->req.aiocb = NULL;
+
+block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
+scsi_dma_complete_noio(r, ret);
 }
 
 static void scsi_read_complete(void * opaque, int ret)
@@ -308,16 +309,13 @@ done:
 }
 
 /* Actually issue a read to the block device.  */
-static void scsi_do_read(void *opaque, int ret)
+static void scsi_do_read(SCSIDiskReq *r, int ret)
 {
-SCSIDiskReq *r = opaque;
 SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 uint32_t n;
 
-if (r->req.aiocb != NULL) {
-r->req.aiocb = NULL;
-block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
-}
+assert (r->req.aiocb == NULL);
+
 if (r->req.io_canceled) {
 scsi_req_cancel_complete(&r->req);
 goto done;
@@ -349,6 +347,18 @@ done:
 scsi_req_unref(&r->req);
 }
 
+static void scsi_do_read_cb(void *opaque, int ret)
+{
+SCSIDiskReq *r = (SCSIDiskReq *)opaque;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+assert (r->req.aiocb != NULL);
+r->req.aiocb = NULL;
+
+block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
+scsi_do_read(opaque, ret);
+}
+
 /* Read more data from scsi device into buffer.  */
 static void scsi_read_data(SCSIRequest *req)
 {
@@ -384,7 +394,7 @@ static void scsi_read_data(SCSIRequest *req)
 if (first && scsi_is_cmd_fua(&r->req.cmd)) {
 block_acct_start(blk_get_stats(s->qdev.conf.blk), &r->acct, 0,
  BLOCK_ACCT_FLUSH);
-r->req.aiocb = blk_aio_flush(s->qdev.conf.blk, scsi_do_read, r);
+r->req.aiocb = blk_aio_flush(s->qdev.conf.blk, scsi_do_read_cb, r);
 } else {
 scsi_do_read(r, 0);
 }
@@ -430,16 +440,12 @@ static int scsi_handle_rw_error(SCSIDiskReq *r, int error)
 return action != BLOCK_ERROR_ACTION_IGNORE;
 }
 
-static void scsi_write_complete(void * opaque, int ret)
+static void scsi_write_complete_noio(SCSIDiskReq *r, int ret)
 {
-SCSIDiskReq *r = (SCSIDiskReq *)opaque;
-SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
 uint32_t n;
 
-if (r->req.aiocb != NULL) {
-r->req.aiocb = NULL;
-block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
-}
+assert (r->req.aiocb == NULL);
+
 if (r->req.io_canceled) {
 scsi_req_cancel_complete(&r->req);
 goto done;
@@ -467,6 +473,18 @@ done:
 scsi_req_unref(&r->req);
 }
 
+static void scsi_write_complete(void * opaque, int ret)
+{
+SCSIDiskReq *r = (SCSIDiskReq *)opaque;
+SCSIDiskState *s = DO_UPCAST(SCSIDiskState, qdev, r->req.dev);
+
+assert (r->req.aiocb != NULL);
+r->req.aiocb = NULL;
+
+block_acct_done(blk_get_stats(s->qdev.conf.blk), &r->acct);
+scsi_write_complete_noio(r, ret);
+}
+
 static void scsi_write_data(SCSIRequest *req)
 {
 SCSIDiskReq *r = DO_UPCAST(SCSIDiskReq, req, req);
@@ -480,18 +498,18 @@ static void scsi_write_data(SCSIRequest *req)
 scsi_req_ref(&r->req);
 if (r->req.cmd.mode != SCSI_XFER_TO_DEV) {
 DPRINTF("Data transfer direction invalid\n");
-scsi_write_complete(r, -EINVAL);
+scsi_write_complete_noio(r, -EINVAL);
 return;
 }
 
 if (!r->req.sg && !r->qiov.size) {
 /* Called for the first time.  Ask the driver to s

[Qemu-devel] [PULL 16/20] scsi: create restart bottom half in the right AioContext

2015-08-12 Thread Paolo Bonzini
This matches commit 4407c1c (virtio-blk: Schedule BH in the right context,
2014-06-17), which did the same thing for virtio-blk.

Reviewed-by: Fam Zheng 
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/scsi-bus.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index f0ae462..ffac8f4 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -136,7 +136,8 @@ static void scsi_dma_restart_cb(void *opaque, int running, 
RunState state)
 return;
 }
 if (!s->bh) {
-s->bh = qemu_bh_new(scsi_dma_restart_bh, s);
+AioContext *ctx = blk_get_aio_context(s->conf.blk);
+s->bh = aio_bh_new(ctx, scsi_dma_restart_bh, s);
 qemu_bh_schedule(s->bh);
 }
 }
-- 
2.4.3





Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread alvise rigo
On Wed, Aug 12, 2015 at 2:36 PM, Paolo Bonzini  wrote:
>
>
> On 12/08/2015 09:31, alvise rigo wrote:
>> I think that tlb_flush_entry is not enough, since in theory another
>> vCPU could have a different TLB address referring the same phys
>> address.
>
> You're right, this is a TLB so it's virtually-indexed. :(  I'm not sure
> what happens on ARM, since it has a virtually indexed (VIVT or VIPT)
> cache, but indeed it would be a problem when implementing e.g. CMPXCHG
> using the TCG ll/sc ops.
>
> I'm a bit worried about adding such a big bitmap.  It's only used on
> TCG, but it is also allocated on KVM and on KVM you can have hundreds
> of VCPUs.  Wasting 200 bits per guest memory page (i.e. ~0.6% of guest
> memory) is obviously not a great idea. :(

I agree, it's a waste of memory.

>
> Perhaps we can use a bytemap instead:
>
> - 0..253 = TLB_EXCL must be set in all VCPUs except CPU n.  A VCPU that
> loads the TLB for this vaddr does not have to set it.
>
> - 254 = TLB_EXCL must be set in all VCPUs.  A VCPU that
> loads the TLB for this vaddr has to set it.
>
> - 255 = TLB_EXCL not set in at least two VCPUs
>
> Transitions:
>
> - ll transitions: anything -> 254
>
> - sc transitions: 254 -> current CPU_ID
>
> - TLB_EXCL store transitions: 254 -> current CPU_ID
>
> - tlb_st_page transitions: CPU_ID other than current -> 255
>
> The initial value is 255 on SMP guests, 0 on UP guests.
>
> The algorithms are very similar to yours, just using this approximate
> representation.
>
> ll algorithm:
>   llsc_value = bytemap[vaddr]
>   if llsc_value == CPU_ID
>  do nothing
>   elseif llsc_value < 254
>  flush TLB of CPU llsc_value
>   elseif llsc_value == 255
>  flush all TLBs
>   set TLB_EXCL
>   bytemap[vaddr] = 254
>   load
>
> tlb_set_page algorithm:
>   llsc_value = bytemap[vaddr]
>   if llsc_value == CPU_ID or llsc_value == 255
>  do nothing
>   else if llsc_value == 254
>  set TLB_EXCL
>   else
>  # two CPUs without TLB_EXCL
>  bytemap[vaddr] = 255
>
> TLB_EXCL slow path algorithm:
>if bytemap[vaddr] == 254
>   bytemap[vaddr] = CPU_ID
>else
>   # two CPUs without TLB_EXCL
>   bytemap[vaddr] = 255
>clear TLB_EXCL in this CPU
>store
>
> sc algorithm:
>if bytemap[vaddr] == CPU_ID or bytemap[vaddr] == 254
>   bytemap[vaddr] = CPU_ID
>   clear TLB_EXCL in this CPU
>   store
>   succeed
>else
>   fail
>
> clear algorithm:
>if bytemap[vaddr] == 254
>   bytemap[vaddr] = CPU_ID

Isn't this also required for the clear algorithm?

if bytemap[vaddr] < 254
/* this can happen for the TLB_EXCL slow path effect */
bytemap[vaddr] = 255

The whole idea makes sense, I will consider it for the next iteration
of the patches.

Thanks,
alvise

>
> The UP case is optimized because bytemap[vaddr] will always be 0 or 254.
>
> The algorithm requires the LL to be cleared e.g. on exceptions.
> Paolo
>
>> alvise
>>
>> On Tue, Aug 11, 2015 at 6:32 PM, Paolo Bonzini  wrote:
>>>
>>>
>>> On 11/08/2015 18:11, alvise rigo wrote:
>> Why flush the entire cache (I understand you mean TLB)?
 Sorry, I meant the TLB.
 If for each removal of an exclusive entry we set also the bit to 1, we
 force the following LL to make a tlb_flush() on every vCPU.
>>>
>>> What if you only flush one entry with tlb_flush_entry (on every vCPU)?
>>>
>>> Paolo



[Qemu-devel] [PULL 20/20] disas: Defeature print_target_address

2015-08-12 Thread Paolo Bonzini
From: Peter Crosthwaite 

It does not work in multi-arch as it requires the CPU specific
TARGET_VIRT_ADDR_SPACE_BITS global define. Just use the generic
version that does no masking. Targets should be responsible for
passing in a sane virtual address.

Signed-off-by: Peter Crosthwaite 
Message-Id: <1436129432-16617-1-git-send-email-crosthwaite.pe...@gmail.com>
Signed-off-by: Paolo Bonzini 
---
 disas.c | 12 ++--
 1 file changed, 2 insertions(+), 10 deletions(-)

diff --git a/disas.c b/disas.c
index 69a6066..0ae70c2 100644
--- a/disas.c
+++ b/disas.c
@@ -72,14 +72,6 @@ generic_print_address (bfd_vma addr, struct disassemble_info 
*info)
 (*info->fprintf_func) (info->stream, "0x%" PRIx64, addr);
 }
 
-/* Print address in hex, truncated to the width of a target virtual address. */
-static void
-generic_print_target_address(bfd_vma addr, struct disassemble_info *info)
-{
-uint64_t mask = ~0ULL >> (64 - TARGET_VIRT_ADDR_SPACE_BITS);
-generic_print_address(addr & mask, info);
-}
-
 /* Print address in hex, truncated to the width of a host virtual address. */
 static void
 generic_print_host_address(bfd_vma addr, struct disassemble_info *info)
@@ -201,7 +193,7 @@ void target_disas(FILE *out, CPUState *cpu, target_ulong 
code,
 s.info.read_memory_func = target_read_memory;
 s.info.buffer_vma = code;
 s.info.buffer_length = size;
-s.info.print_address_func = generic_print_target_address;
+s.info.print_address_func = generic_print_address;
 
 #ifdef TARGET_WORDS_BIGENDIAN
 s.info.endian = BFD_ENDIAN_BIG;
@@ -424,7 +416,7 @@ void monitor_disas(Monitor *mon, CPUState *cpu,
 s.cpu = cpu;
 monitor_disas_is_physical = is_physical;
 s.info.read_memory_func = monitor_read_memory;
-s.info.print_address_func = generic_print_target_address;
+s.info.print_address_func = generic_print_address;
 
 s.info.buffer_vma = pc;
 
-- 
2.4.3




[Qemu-devel] [PULL 19/20] hw: fix mask for ColdFire UART command register

2015-08-12 Thread Paolo Bonzini
The "miscellaneous commands" part of the register is 3 bits wide.
Spotted by Coverity and confirmed in the datasheet, downloadable from
http://cache.freescale.com/files/32bit/doc/ref_manual/MCF5307BUM.pdf
(figure 14-6).

Signed-off-by: Paolo Bonzini 
---
 hw/char/mcf_uart.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/char/mcf_uart.c b/hw/char/mcf_uart.c
index 98fd44e..cda22ee 100644
--- a/hw/char/mcf_uart.c
+++ b/hw/char/mcf_uart.c
@@ -126,7 +126,7 @@ static void mcf_uart_do_tx(mcf_uart_state *s)
 static void mcf_do_command(mcf_uart_state *s, uint8_t cmd)
 {
 /* Misc command.  */
-switch ((cmd >> 4) & 3) {
+switch ((cmd >> 4) & 7) {
 case 0: /* No-op.  */
 break;
 case 1: /* Reset mode register pointer.  */
-- 
2.4.3





[Qemu-devel] [PULL 14/20] qemu-nbd: remove unnecessary qemu_notify_event()

2015-08-12 Thread Paolo Bonzini
This was needed when qemu-nbd was using qemu_set_fd_handler2.  It is
not needed anymore now that nbd_update_server_fd_handler is called
whenever nbd_can_accept() can change from false to true.
nbd_update_server_fd_handler will call qemu_set_fd_handler(),
which will call qemu_notify_event().

Reviewed-by: Max Reitz 
Signed-off-by: Paolo Bonzini 
---
 qemu-nbd.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/qemu-nbd.c b/qemu-nbd.c
index 5106b80..d9644b2 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -362,7 +362,6 @@ static void nbd_client_closed(NBDClient *client)
 state = TERMINATE;
 }
 nbd_update_server_fd_handler(server_fd);
-qemu_notify_event();
 nbd_client_put(client);
 }
 
-- 
2.4.3





Re: [Qemu-devel] [RFC PATCH V7 16/19] translate-all: introduces tb_flush_safe.

2015-08-12 Thread Paolo Bonzini


On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:
> From: KONRAD Frederic 
> 
> tb_flush is not thread safe we definitely need to exit VCPUs to do that.
> This introduces tb_flush_safe which just creates an async safe work which will
> do a tb_flush later.
> 
> Signed-off-by: KONRAD Frederic 

You could also allocate a new code buffer and free the old one with
call_rcu.  This should simplify things a lot.

Paolo



[Qemu-devel] [PULL 13/20] vhost-scsi: Clarify vhost_virtqueue_mask argument

2015-08-12 Thread Paolo Bonzini
From: Lu Lina 

vhost_virtqueue_mask takes an "absolute" virtqueue index, while the
code looks like it's passing an index that is relative to
s->dev.vq_index.  In reality, s->dev.vq_index is always zero, so
this patch does not make any difference, but the code is clearer.

Signed-off-by: Lu Lina 
Signed-off-by: Gonglei 
Message-Id: <1437978359-17960-1-git-send-email-arei.gong...@huawei.com>
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/vhost-scsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 0dd57ff..7eacca9 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -118,7 +118,7 @@ static int vhost_scsi_start(VHostSCSI *s)
  * enabling/disabling irqfd.
  */
 for (i = 0; i < s->dev.nvqs; i++) {
-vhost_virtqueue_mask(&s->dev, vdev, i, false);
+vhost_virtqueue_mask(&s->dev, vdev, s->dev.vq_index + i, false);
 }
 
 return ret;
-- 
2.4.3





[Qemu-devel] [PATCH] ioapic: fix contents of arbitration register

2015-08-12 Thread Paolo Bonzini
The arbitration register should read to the same value as the
IOAPIC id register.  Fixes kvm-unit-tests ioapic.flat.

Signed-off-by: Paolo Bonzini 
---
 hw/intc/ioapic.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
index 6ad3c66..bde52e8 100644
--- a/hw/intc/ioapic.c
+++ b/hw/intc/ioapic.c
@@ -156,15 +156,13 @@ ioapic_mem_read(void *opaque, hwaddr addr, unsigned int 
size)
 }
 switch (s->ioregsel) {
 case IOAPIC_REG_ID:
+case IOAPIC_REG_ARB:
 val = s->id << IOAPIC_ID_SHIFT;
 break;
 case IOAPIC_REG_VER:
 val = IOAPIC_VERSION |
 ((IOAPIC_NUM_PINS - 1) << IOAPIC_VER_ENTRIES_SHIFT);
 break;
-case IOAPIC_REG_ARB:
-val = 0;
-break;
 default:
 index = (s->ioregsel - IOAPIC_REG_REDTBL_BASE) >> 1;
 if (index >= 0 && index < IOAPIC_NUM_PINS) {
-- 
2.4.3




Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread Paolo Bonzini


On 12/08/2015 16:04, alvise rigo wrote:
>> > clear algorithm:
>> >if bytemap[vaddr] == 254
>> >   bytemap[vaddr] = CPU_ID
> Isn't this also required for the clear algorithm?
> 
> if bytemap[vaddr] < 254
> /* this can happen for the TLB_EXCL slow path effect */
> bytemap[vaddr] = 255

I don't think so because clear doesn't clear TLB_EXCL.  But maybe we're
talking about two different things. :)

Paolo



Re: [Qemu-devel] [RFC PATCH V7 16/19] translate-all: introduces tb_flush_safe.

2015-08-12 Thread Frederic Konrad

On 12/08/2015 16:09, Paolo Bonzini wrote:


On 10/08/2015 17:27, fred.kon...@greensocs.com wrote:

From: KONRAD Frederic 

tb_flush is not thread safe we definitely need to exit VCPUs to do that.
This introduces tb_flush_safe which just creates an async safe work which will
do a tb_flush later.

Signed-off-by: KONRAD Frederic 

You could also allocate a new code buffer and free the old one with
call_rcu.  This should simplify things a lot.

Paolo

Depending the size of the code buffer this might be a good idea. :).

Fred



[Qemu-devel] [PATCH] hw/misc/zynq_slcr: Change CPU clock rate

2015-08-12 Thread Guenter Roeck
The Linux kernel only accepts 34 Khz and 67 Khz clock rates, and
may crash if the actual clock rate is too low. The clock rate used to be
(ps-clk-frequency * 26 / 4), which resulted in a CPU frequency of
21 Khz if ps-clk-frequency was set to  Hz. Change it to
(ps-clk-frequency * 20 / 2) = 33 Khz to make Linux happy.

Signed-off-by: Guenter Roeck 
---
 hw/misc/zynq_slcr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/misc/zynq_slcr.c b/hw/misc/zynq_slcr.c
index 964f253..d3e1ce0 100644
--- a/hw/misc/zynq_slcr.c
+++ b/hw/misc/zynq_slcr.c
@@ -189,7 +189,7 @@ static void zynq_slcr_reset(DeviceState *d)
 
 s->regs[LOCKSTA] = 1;
 /* 0x100 - 0x11C */
-s->regs[ARM_PLL_CTRL]   = 0x0001A008;
+s->regs[ARM_PLL_CTRL]   = 0x00014008;
 s->regs[DDR_PLL_CTRL]   = 0x0001A008;
 s->regs[IO_PLL_CTRL]= 0x0001A008;
 s->regs[PLL_STATUS] = 0x003F;
@@ -198,7 +198,7 @@ static void zynq_slcr_reset(DeviceState *d)
 s->regs[IO_PLL_CFG] = 0x00014000;
 
 /* 0x120 - 0x16C */
-s->regs[ARM_CLK_CTRL]   = 0x1F000400;
+s->regs[ARM_CLK_CTRL]   = 0x1F000200;
 s->regs[DDR_CLK_CTRL]   = 0x1843;
 s->regs[DCI_CLK_CTRL]   = 0x01E03201;
 s->regs[APER_CLK_CTRL]  = 0x01FFCCCD;
-- 
2.1.4




Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM

2015-08-12 Thread Christoffer Dall
On Wed, Aug 12, 2015 at 4:14 PM, Eric Auger  wrote:
> Hi,
> On 08/12/2015 03:23 PM, Christoffer Dall wrote:
>> On Wed, Aug 12, 2015 at 2:59 PM, Peter Maydell  
>> wrote:
>>> On 12 August 2015 at 13:27, Pavel Fedin  wrote:
  Hello!

> I still think this is the wrong approach -- see my remarks
> in the previous round of patch review.

  You know... I thought a little bit...
  So far, test = true in KVM_CREATE_DEVICE means that we just want to know 
 whether this type is supported. No actual actions is done by the kernel. 
 Is it correct?
> yes
> If yes, we can just leave this test as it is, because if it says that
> GICv2 is supported by KVM_CREATE_DEVICE, then:
 1. We use new API. No KVM_IRQCHIP_CREATE.
 2. GICv3 may be supported.

  Therefore, if we do this check, and it succeeds, then we just
 proceed, and later actually try to create GICv3. If it fails for some
 reason, we will see error message anyway. So would it be OK just not
 to touch kvm_arch_irqchip_create() at all?
>>>
>>> No, because then if the kernel only supports GICv3 the
>>> code in kvm_arch_irqchip_create() (as it is currently written)
>>> will erroneously fall back to using the old API.
>>>
>>> Christoffer: the question was, why does kvm_arch_irqchip_create()
>>> not only check the KVM_CAP_DEVICE_CTRL but also try to see
>>> if it can KVM_CREATE_DEVICE the GICv2 in order to avoid
>>> falling back to the old pre-KVM_CREATE_DEVICE API ? Are there
>>> kernels which have the capability bit set but which can't
>>> actually use KVM_CREATE_DEVICE to create the irqchip?
>>
>> My thinking probably was that technically the KVM_CAP_DEVICE_CTRL is
>> an orthogonal concept from how to create the vgic, and you could
>> advertise this capability without also supporting the GICv2 device
>> type.
>>
>> However, I don't believe such kernels exist
> the capability was advertised for arm/arm64 with GICv2
> 7330672  KVM: arm-vgic: Support KVM_CREATE_DEVICE for VGIC (1 year, 8
> months ago) 
> so effectively I don't think we have any arm kernel advertising the CAP
> without GICv3.
>
>  and they cannot in the
>> future as that would be because we would remove an actively supported
>> API.
>>
>>>
>>> My preference here would be for kvm_arch_irqchip_create()
>>> to just use 'is the KVM_CAP_DEVICE_CTRL capability set'
>>> for its "can we use the new API" test; that will then
>>> work whether we have a GICv2 or GICv3 in the host. (The
>>> actual GIC device creation later on might fail, of course,
>>> but that can be handled at that point. The only thing
>>> we need to do as early as kvm_arch_irqchip_create is
>>> determine whether we must use the old API.)
> another way was proposed in the past was consisting in calling first
> ret = kvm_create_device(s, KVM_DEV_TYPE_ARM_VGIC_V3, true);
> if this succeeds this means we have the new API (the only one used vy
> v3) and hence we have it as well for VGIC_V2, we can return ...
> if this fails we just can say VGICv3 KVM device hasn't registered, try
> KVM_DEV_TYPE_ARM_VGIC_V2
> if we have it return else fall back to older API
>
> I think it worked as well?
>
> Besides I think Peter's suggestion is simpler.
>
> For info we also use KVM VFIO device now.
>
Note that there's a difference between "I called the create-device
ioctl, and the creation of the device failed" vs. "I called the ioctl
and I got an error because the ioctl is not supported", only in the
latter case should you fall back to the older API.

Not sure if the end result is the same with the suggested approach,
based on the right error values etc.

-Christoffer



Re: [Qemu-devel] [PATCH] Makefile.target: include top level build dir in vpath

2015-08-12 Thread Michael Marineau
On Aug 12, 2015 6:32 AM, "Paolo Bonzini"  wrote:
>
>
>
> On 09/08/2015 09:02, Michael Marineau wrote:
> > Using ccache with CCACHE_BASEDIR set to $(SRC_PATH) or a parent will
> > rewrite all absolute paths to relative paths. This interacts poorly with
> > QEMU's two-level build directory scheme. For example, lets say
> > BUILD_DIR=$(SRC_PATH)/build so build/blockdev.d will contain:
> >
> >   blockdev.o: ../blockdev.c ../include/sysemu/block-backend.h \
> >
> > Now the target build under build/x86_64-softmmu or similar will depend
> > on ../blockdev.o which in turn will get make to source ../blockdev.d to
> > check its dependencies. Since make always considers paths relative to
> > the current working directory rather than the makefile the path appeared
> > in the relative path to ../blockdev.c is useless.
> >
> > This change simply adds the top level build directory to vpath so paths
> > relative to the source directory, top build directory, and target build
> > directory all work just fine.
> > ---
> >  Makefile.target | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/Makefile.target b/Makefile.target
> > index 3e7aafd..dc32294 100644
> > --- a/Makefile.target
> > +++ b/Makefile.target
> > @@ -7,7 +7,7 @@ include config-target.mak
> >  include config-devices.mak
> >  include $(SRC_PATH)/rules.mak
> >
> > -$(call set-vpath, $(SRC_PATH))
> > +$(call set-vpath, $(SRC_PATH):$(BUILD_DIR))
> >  ifdef CONFIG_LINUX
> >  QEMU_CFLAGS += -I../linux-headers
> >  endif
> >
>
> Hi,
>
> can you reply with "Signed-off-by: Michael Marineau
> ", representing that you've read and agreed
> to the Developer Certificate of Origin (http://developercertificate.org/)?
>
> This is necessary to include your patch in QEMU.
>
> Paolo

Oops, forgot about that.

Signed-off-by: Michael Marineau



Re: [Qemu-devel] [PATCH v2] target-cris: update CPU state save/load to use VMStateDescription

2015-08-12 Thread Edgar E. Iglesias
On Fri, Aug 07, 2015 at 05:02:14PM +0100, Peter Maydell wrote:
> From: Juan Quintela 
> 
> Update the CRIS CPU state save/load to use a VMStateDescription struct
> rather than cpu_save/cpu_load functions.
> 
> Have to define TLBSet struct.
> Multidimensional arrays in C are a mess, just unroll them.
> 
> Signed-off-by: Juan Quintela 


Acked-by: Edgar E. Iglesias 



> [PMM:
>  * expand commit message a little since it's no longer one patch in
>a 35-patch series
>  * add header/copyright comment to machine.c; credited copyright is
>Red Hat and author is Juan, since this commit gives the file all-new
>contents; license is LGPL-2-or-later, to match other target-cris code
>  * remove hardcoded tab
>  * add fields for locked_irq, interrupt_vector, fault_vector, trap_vector
>  * drop minimum_version_id_old fields
>  * bump version_id to 2 as we are not compatible with old state format
>  * remove unnecessary hw/boards.h include
>  * update to register via dc->vmsd]
> Signed-off-by: Peter Maydell 
> ---
> 
> This is a patch of Juan's from way back in 2012, which I am resurrecting
> because we now have only two CPUs which still use old-style non-VMState
> save/load (CRIS and SPARC). If we can update them both we can drop the
> machinery in the common code which supports this.
> 
> Notes:
>  * CPUCRISState indent style is somewhat mismatched (cf load_info);
>I took the "minimal make checkpatch happy" approach, but am
>happy to do something else with the line changed here
>  * I have added a copyright header to target-cris/machine.c, because it
>did not have one at all, and this commit effectively gives the
>file all-new contents. I have set it up as LGPLv2-or-later,
>copyright Red Hat, author Juan. Please let me know if you would
>prefer something else!
>  * I added vmstate entries for four fields which did not previously
>get saved and restored, which is presumably a bug fix...
>  * vmsave/vmload on the axis-dev88 board does not currently seem to
>work (among other obvious problems, there is no vmstate support
>in the interrupt controller), so we're limited to "looks good
>on code review" here.
> 
> Changes v1->v2:
>  * actually register the vmstate struct, by setting dc->vmsd
>(discussion on IRC with Andreas a few weeks back indicated that
>dc->vmsd is preferred over cc->vmsd for new CPUs)
>  * update it to be the right format for a dc->vmsd struct
> 
> Testing: I checked with a host gdb that we seem to be filling in
> the registers with the right values. Since none of the devices
> in the axis model (including the interrupt controller) support
> vmstate save/load, the run after load from snapshot goes wrong pretty
> quickly, obviously.
> 
>  target-cris/cpu-qom.h |   4 ++
>  target-cris/cpu.c |   1 +
>  target-cris/cpu.h |  13 ++--
>  target-cris/machine.c | 167 
> +-
>  4 files changed, 95 insertions(+), 90 deletions(-)
> 
> diff --git a/target-cris/cpu-qom.h b/target-cris/cpu-qom.h
> index 6fc30c2..df4c0b5 100644
> --- a/target-cris/cpu-qom.h
> +++ b/target-cris/cpu-qom.h
> @@ -73,6 +73,10 @@ static inline CRISCPU *cris_env_get_cpu(CPUCRISState *env)
>  
>  #define ENV_OFFSET offsetof(CRISCPU, env)
>  
> +#ifndef CONFIG_USER_ONLY
> +extern const struct VMStateDescription vmstate_cris_cpu;
> +#endif
> +
>  void cris_cpu_do_interrupt(CPUState *cpu);
>  void crisv10_cpu_do_interrupt(CPUState *cpu);
>  bool cris_cpu_exec_interrupt(CPUState *cpu, int int_req);
> diff --git a/target-cris/cpu.c b/target-cris/cpu.c
> index b17e849..d461e07 100644
> --- a/target-cris/cpu.c
> +++ b/target-cris/cpu.c
> @@ -302,6 +302,7 @@ static void cris_cpu_class_init(ObjectClass *oc, void 
> *data)
>  cc->handle_mmu_fault = cris_cpu_handle_mmu_fault;
>  #else
>  cc->get_phys_page_debug = cris_cpu_get_phys_page_debug;
> +dc->vmsd = &vmstate_cris_cpu;
>  #endif
>  
>  cc->gdb_num_core_regs = 49;
> diff --git a/target-cris/cpu.h b/target-cris/cpu.h
> index d422e35..6760dc6 100644
> --- a/target-cris/cpu.h
> +++ b/target-cris/cpu.h
> @@ -108,6 +108,11 @@
>  
>  #define NB_MMU_MODES 2
>  
> +typedef struct {
> +uint32_t hi;
> +uint32_t lo;
> +} TLBSet;
> +
>  typedef struct CPUCRISState {
>   uint32_t regs[16];
>   /* P0 - P15 are referred to as special registers in the docs.  */
> @@ -161,11 +166,7 @@ typedef struct CPUCRISState {
>*
>* One for I and another for D.
>*/
> - struct
> - {
> - uint32_t hi;
> - uint32_t lo;
> - } tlbsets[2][4][16];
> +TLBSet tlbsets[2][4][16];
>  
>   CPU_COMMON
>  
> @@ -227,8 +228,6 @@ enum {
>  #define cpu_gen_code cpu_cris_gen_code
>  #define cpu_signal_handler cpu_cris_signal_handler
>  
> -#define CPU_SAVE_VERSION 1
> -
>  /* MMU modes definitions */
>  #define MMU_MODE0_SUFFIX _kernel
>  #define MMU_MODE1_SUFFIX _user
> diff --git a/target-cris/machine.c b/target-cris/machine.c
> index 

Re: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle 64-bit bar with more than 4G size

2015-08-12 Thread Konrad Rzeszutek Wilk
On Wed, Aug 12, 2015 at 08:53:44AM +, Wu, Feng wrote:
> 
> 
> > -Original Message-
> > From: Jan Beulich [mailto:jbeul...@suse.com]
> > Sent: Wednesday, August 12, 2015 4:43 PM
> > To: Wu, Feng
> > Cc: stefano.stabell...@eu.citrix.com; xen-de...@lists.xensource.com;
> > qemu-devel@nongnu.org
> > Subject: RE: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle 
> > 64-bit
> > bar with more than 4G size
> > 
> > >>> On 12.08.15 at 09:10,  wrote:
> > 
> > >
> > >> -Original Message-
> > >> From: qemu-devel-bounces+feng.wu=intel@nongnu.org
> > >> [mailto:qemu-devel-bounces+feng.wu=intel@nongnu.org] On Behalf Of
> > >> Jan Beulich
> > >> Sent: Wednesday, August 12, 2015 2:59 PM
> > >> To: Wu, Feng
> > >> Cc: xen-de...@lists.xensource.com; qemu-devel@nongnu.org;
> > >> stefano.stabell...@eu.citrix.com
> > >> Subject: Re: [Qemu-devel] [Xen-devel] [PATCH v1] xenpt: Properly handle
> > 64-bit
> > >> bar with more than 4G size
> > >>
> > >> >>> On 05.08.15 at 04:02,  wrote:
> > >> > @@ -491,8 +474,9 @@ static int
> > >> xen_pt_bar_reg_write(XenPCIPassthroughState *s, XenPTReg *cfg_entry,
> > >> >  bar_ro_mask = XEN_PT_BAR_IO_RO_MASK | (r_size - 1);
> > >> >  break;
> > >> >  case XEN_PT_BAR_FLAG_UPPER:
> > >> > +r = &d->io_regions[index-1];
> > >>
> > >> Perhaps worth an assert(index > 0)?
> > >
> > > No problem, I will add it. BTW, do you have any other comments about this
> > > patch? If no, I am
> > > going to send out the new version with this changes.
> > 
> > No - everything else looks to make sense (but continues to need
> > testing).
> > 
> 
> I don't have such a device in hand. Can anybody who has such a device help to 
> test this
> patch? It would be highly appreciated!

Um, 4GB MMIO bars? Wouldn't the Nvidia Quadro K6000 12GB GDDR do it? I am sure
that Intel would be OK expensing $4K of hardware :-)

> 
> Thanks,
> Feng
> 
> > Jan
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> http://lists.xen.org/xen-devel



Re: [Qemu-devel] [PATCH] MAINTAINERS: list smbios maintainers

2015-08-12 Thread Wei Huang



On 8/12/15 04:20, Michael S. Tsirkin wrote:

Now that smbios has its own directory, list its
maintainers. Same people as ACPI so just reuse that
entry.

Signed-off-by: Michael S. Tsirkin 
---
  MAINTAINERS | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 978b717..a059d5d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -645,13 +645,15 @@ S: Supported
  F: include/hw/pci/*
  F: hw/pci/*

-ACPI
+ACPI/SMBIOS
  M: Michael S. Tsirkin 
  M: Igor Mammedov 
  S: Supported
  F: include/hw/acpi/*
+F: include/hw/smbios/*
  F: hw/mem/*
  F: hw/acpi/*
+F: hw/smbios/*
  F: hw/i386/acpi-build.[hc]
  F: hw/i386/*dsl
  F: hw/arm/virt-acpi-build.c


Acked-by: Wei Huang 

Note that SMBIOS patch isn't in yet.

-Wei



Re: [Qemu-devel] [RFC v4 1/9] exec.c: Add new exclusive bitmap to ram_list

2015-08-12 Thread alvise rigo
On Wed, Aug 12, 2015 at 4:10 PM, Paolo Bonzini  wrote:
>
>
> On 12/08/2015 16:04, alvise rigo wrote:
>>> > clear algorithm:
>>> >if bytemap[vaddr] == 254
>>> >   bytemap[vaddr] = CPU_ID
>> Isn't this also required for the clear algorithm?
>>
>> if bytemap[vaddr] < 254
>> /* this can happen for the TLB_EXCL slow path effect */
>> bytemap[vaddr] = 255
>
> I don't think so because clear doesn't clear TLB_EXCL.  But maybe we're
> talking about two different things. :)

Let me go through it again, I might have missed something :)

alvise

>
> Paolo



Re: [Qemu-devel] [RFC PATCH V7 16/19] translate-all: introduces tb_flush_safe.

2015-08-12 Thread Paolo Bonzini


On 12/08/2015 16:11, Frederic Konrad wrote:
>> You could also allocate a new code buffer and free the old one with
>> call_rcu.  This should simplify things a lot.
>
> Depending the size of the code buffer this might be a good idea. :).

32 megabytes.

Paolo



Re: [Qemu-devel] [PATCH v8 3/5] Introduce irqchip type specification for KVM

2015-08-12 Thread Eric Auger
Hi,
On 08/12/2015 03:23 PM, Christoffer Dall wrote:
> On Wed, Aug 12, 2015 at 2:59 PM, Peter Maydell  
> wrote:
>> On 12 August 2015 at 13:27, Pavel Fedin  wrote:
>>>  Hello!
>>>
 I still think this is the wrong approach -- see my remarks
 in the previous round of patch review.
>>>
>>>  You know... I thought a little bit...
>>>  So far, test = true in KVM_CREATE_DEVICE means that we just want to know 
>>> whether this type is supported. No actual actions is done by the kernel. Is 
>>> it correct? 
yes
If yes, we can just leave this test as it is, because if it says that
GICv2 is supported by KVM_CREATE_DEVICE, then:
>>> 1. We use new API. No KVM_IRQCHIP_CREATE.
>>> 2. GICv3 may be supported.
>>>
>>>  Therefore, if we do this check, and it succeeds, then we just
>>> proceed, and later actually try to create GICv3. If it fails for some
>>> reason, we will see error message anyway. So would it be OK just not
>>> to touch kvm_arch_irqchip_create() at all?
>>
>> No, because then if the kernel only supports GICv3 the
>> code in kvm_arch_irqchip_create() (as it is currently written)
>> will erroneously fall back to using the old API.
>>
>> Christoffer: the question was, why does kvm_arch_irqchip_create()
>> not only check the KVM_CAP_DEVICE_CTRL but also try to see
>> if it can KVM_CREATE_DEVICE the GICv2 in order to avoid
>> falling back to the old pre-KVM_CREATE_DEVICE API ? Are there
>> kernels which have the capability bit set but which can't
>> actually use KVM_CREATE_DEVICE to create the irqchip?
> 
> My thinking probably was that technically the KVM_CAP_DEVICE_CTRL is
> an orthogonal concept from how to create the vgic, and you could
> advertise this capability without also supporting the GICv2 device
> type.
> 
> However, I don't believe such kernels exist
the capability was advertised for arm/arm64 with GICv2
7330672  KVM: arm-vgic: Support KVM_CREATE_DEVICE for VGIC (1 year, 8
months ago) 
so effectively I don't think we have any arm kernel advertising the CAP
without GICv3.

 and they cannot in the
> future as that would be because we would remove an actively supported
> API.
> 
>>
>> My preference here would be for kvm_arch_irqchip_create()
>> to just use 'is the KVM_CAP_DEVICE_CTRL capability set'
>> for its "can we use the new API" test; that will then
>> work whether we have a GICv2 or GICv3 in the host. (The
>> actual GIC device creation later on might fail, of course,
>> but that can be handled at that point. The only thing
>> we need to do as early as kvm_arch_irqchip_create is
>> determine whether we must use the old API.)
another way was proposed in the past was consisting in calling first
ret = kvm_create_device(s, KVM_DEV_TYPE_ARM_VGIC_V3, true);
if this succeeds this means we have the new API (the only one used vy
v3) and hence we have it as well for VGIC_V2, we can return ...
if this fails we just can say VGICv3 KVM device hasn't registered, try
KVM_DEV_TYPE_ARM_VGIC_V2
if we have it return else fall back to older API

I think it worked as well?

Besides I think Peter's suggestion is simpler.

For info we also use KVM VFIO device now.

Best Regards

Eric
>>
> I'm fine with this.
> 
> -Christoffer
> 




[Qemu-devel] [PATCH] sh4: Fix initramfs initialization for endiannes-mismatched targets

2015-08-12 Thread Guenter Roeck
If host and target endianness does not match, loding an initramfs does not work.
Fix by writing boot parameters with appropriate endianness conversion.

Signed-off-by: Guenter Roeck 
---
 hw/sh4/r2d.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/sh4/r2d.c b/hw/sh4/r2d.c
index 5e22ed7..3b0b2ec 100644
--- a/hw/sh4/r2d.c
+++ b/hw/sh4/r2d.c
@@ -338,9 +338,9 @@ static void r2d_init(MachineState *machine)
 }
 
 /* initialization which should be done by firmware */
-boot_params.loader_type = 1;
-boot_params.initrd_start = INITRD_LOAD_OFFSET;
-boot_params.initrd_size = initrd_size;
+boot_params.loader_type = tswap32(1);
+boot_params.initrd_start = tswap32(INITRD_LOAD_OFFSET);
+boot_params.initrd_size = tswap32(initrd_size);
 }
 
 if (kernel_cmdline) {
-- 
2.1.4




Re: [Qemu-devel] [PATCH] linux-user: elfload: Still use TARGET_PAGE_SIZE for i386 guest

2015-08-12 Thread Richard Henderson
On 08/12/2015 12:59 AM, gchen gchen wrote:
>> Nack.  There's 99 problems with host page size> guest page size.  This
>> solves none of them, and in the hackiest way possible.
>>
> 
> Under alpha virtual machine, if set i386 guest page size 8KB, it will
> cause failure directly (any dynamically linked binaries can not work).

Yes, I know.  The same thing happens when running i386 guests on other
(non-virtual) hosts.  E.g. Sparc64's 8kB page, PowerPC64's 64kB page.

> Do you have any other ideas for solving this issue?

The only complete solution that I see is to use softmmu with linux-user, so
that we properly emulate the guest pages.  Yes, it will cause quite some
slow-down in emulation, but I believe it's the only reliable way.


r~



Re: [Qemu-devel] [PATCH for-2.5 15/30] m68k: add more modes to movem

2015-08-12 Thread Richard Henderson
On 08/12/2015 01:07 AM, Andreas Schwab wrote:
> Richard Henderson  writes:
> 
>> On 08/09/2015 01:13 PM, Laurent Vivier wrote:
>>> +opsize = (insn & 0x40) != 0 ? OS_LONG : OS_WORD;
>>> +incr = opsize_bytes(opsize);
>>> +if (!is_load && (insn & 070) == 040) {
>>> +for (i = 15; i >= 0; i--, mask >>= 1) {
>>
>> This has got to be wrong.  Just because it's pre-decrement doesn't mean
>> you should skip all of the loads.
> 
> Pre-dec only supports reg-to-mem, and is special because mask is bit
> reversed.

Ah, I'd never noticed that.  A comment to that effect would be good.


r~



  1   2   >