date:20170130

Re: [Qemu-devel] Towards an ivshmem 2.0?

2017-01-30 Thread Markus Armbruster

Jan Kiszka  writes:

> On 2017-01-27 20:36, Markus Armbruster wrote:
>> Jan Kiszka  writes:
>> 
>>> On 2017-01-23 15:19, Markus Armbruster wrote:
 Jan Kiszka  writes:

> Hi,
>
> some of you may know that we are using a shared memory device similar to
> ivshmem in the partitioning hypervisor Jailhouse [1].
>
> We started as being compatible to the original ivshmem that QEMU
> implements, but we quickly deviated in some details, and in the recent
> months even more. Some of the deviations are related to making the
> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is

 Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.
>>>
>>> That difference comes from remote/migration support and general QEMU
>>> integration - likely not very telling due to the different environments.
>> 
>> Plausible.
>> 
> aiming at safety critical systems and, therefore, a small code base.
> Other changes address deficits in the original design, like missing
> life-cycle management.
>
> Now the question is if there is interest in defining a common new
> revision of this device and maybe also of some protocols used on top,
> such as virtual network links. Ideally, this would enable us to share
> Linux drivers. We will definitely go for upstreaming at least a network
> driver such as [2], a UIO driver and maybe also a serial port/console.
>
> I've attached a first draft of the specification of our new ivshmem
> device. A working implementation can be found in the wip/ivshmem2 branch
> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>
> Deviations from the original design:
>
> - Only two peers per link

 Uh, define "link".
>>>
>>> VMs are linked via a common shared memory. Interrupt delivery follows
>>> that route as well.
>>>

>   This simplifies the implementation and also the interfaces (think of
>   life-cycle management in a multi-peer environment). Moreover, we do
>   not have an urgent use case for multiple peers, thus also not
>   reference for a protocol that could be used in such setups. If someone
>   else happens to share such a protocol, it would be possible to discuss
>   potential extensions and their implications.
>
> - Side-band registers to discover and configure share memory regions
>
>   This was one of the first changes: We removed the memory regions from
>   the PCI BARs and gave them special configuration space registers. By
>   now, these registers are embedded in a PCI capability. The reasons are
>   that Jailhouse does not allow to relocate the regions in guest address
>   space (but other hypervisors may if they like to) and that we now have
>   up to three of them.

 I'm afraid I don't quite understand the change, nor the rationale.  I
 guess I could figure out the former by studying the specification.
>>>
>>> a) It's a Jailhouse thing (we disallow the guest to move the regions
>>>around in its address space)
>>> b) With 3 regions + MSI-X + MMIO registers, we run out of BARs (or
>>>would have to downgrade them to 32 bit)
>> 
>> Have you considered putting your three shared memory regions in memory
>> consecutively, so they can be covered by a single BAR?  Similar to how a
>> single BAR covers both MSI-X table and PBA.
>
> Would still require to pass three times some size information (each
> region can be different or empty/non-existent).

Yes.  Precedence: location of MSI-X table and PBA are specified in the
MSI-X Capability Structure as offset and BIR.

> Moreover, a) is not
> possible then without ugly modifications to the guest because they
> expect BAR-based regions to be relocatable.

Can you explain why not letting the guest map the shared memory into its
address space on its own just like any other piece of device memory is a
requirement?

> - Changed PCI base class code to 0xff (unspecified class)

 Changed from 0x5 (memory controller).
>>>
>>> Right.
>>>

>   This allows us to define our own sub classes and interfaces. That is
>   now exploited for specifying the shared memory protocol the two
>   connected peers should use. It also allows the Linux drivers to match
>   on that.
>
> - INTx interrupts support is back
>
>   This is needed on target platforms without MSI controllers, i.e.
>   without the required guest support. Namely some PCI-less ARM SoCs
>   required the reintroduction. While doing this, we also took care of
>   keeping the MMIO registers free of privileged controls so that a
>   guest OS can map them safely into a guest userspace application.

 So you need interrupt capability.  Current upstream ivshmem requires a
 server such as the one in contrib/ivshmem-server/.  What about yours?
>>>
>>> IIRC, the need for

Re: [Qemu-devel] Towards an ivshmem 2.0?

2017-01-30 Thread Markus Armbruster

Jan Kiszka  writes:

> On 2017-01-29 15:00, Marc-André Lureau wrote:
>> Hi
>> 
>> On Sun, Jan 29, 2017 at 12:44 PM Jan Kiszka > > wrote:
>> 
>> >> Of course, I'm careful with investing much time into expanding the
>> >> existing, for Jailhouse possibly sufficient design if there no real
>> >> interest in continuing the ivshmem support in QEMU - because of
>> >> vhost-pci or other reasons. But if that interest exists, it would be
>> >> beneficial for us to have QEMU supporting a compatible version
>> and using
>> >> the same guest drivers. Then I would start looking into concrete
>> patches
>> >> for it as well.
>> >
>> > Interest is difficult for me to gauge, not least because alternatives
>> > are still being worked on.
>> 
>> I'm considering to suggest this as GSoC project now.
>> 
>> 
>> It's better for a student and for the community if the work get accepted
>> in the end.

Yes.

>> So, I think that could be an intersting GSoC (implementing your ivshmem
>> 2 proposal). However, if the qemu community isn't ready to accept a new
>> ivshmem, and would rather have vhost-pci based solution, I would suggest
>> a different project (hopefully Wei Wang can help define it and mentor):
>> work on a vhost-pci using dedicated shared PCI BARs (and kernel support
>> to avoid extra copy - if I understand the extra copy situation correctly).
>
> It's still open if vhost-pci can replace ivshmem (not to speak of being
> desirable for Jailhouse - I'm still studying). In that light, having
> both implementations available to do real comparisons is valuable IMHO.

Yes, but is it appropriate for GSoC?

> That said, we will play with open cards, explain the student the
> situation and let her/him decide knowingly.

Both the student and the QEMU project need to consider the situation
carefully.

> Jan
>
> PS: We have a mixed history /wrt actually merging student projects.

Yes, but having screwed up is no license to screw up some more :)

Re: [Qemu-devel] Towards an ivshmem 2.0?

2017-01-30 Thread Jan Kiszka

On 2017-01-30 09:02, Markus Armbruster wrote:
> Jan Kiszka  writes:
> 
>> On 2017-01-29 15:00, Marc-André Lureau wrote:
>>> Hi
>>>
>>> On Sun, Jan 29, 2017 at 12:44 PM Jan Kiszka >> > wrote:
>>>
>>> >> Of course, I'm careful with investing much time into expanding the
>>> >> existing, for Jailhouse possibly sufficient design if there no real
>>> >> interest in continuing the ivshmem support in QEMU - because of
>>> >> vhost-pci or other reasons. But if that interest exists, it would be
>>> >> beneficial for us to have QEMU supporting a compatible version
>>> and using
>>> >> the same guest drivers. Then I would start looking into concrete
>>> patches
>>> >> for it as well.
>>> >
>>> > Interest is difficult for me to gauge, not least because alternatives
>>> > are still being worked on.
>>>
>>> I'm considering to suggest this as GSoC project now.
>>>
>>>
>>> It's better for a student and for the community if the work get accepted
>>> in the end.
> 
> Yes.
> 
>>> So, I think that could be an intersting GSoC (implementing your ivshmem
>>> 2 proposal). However, if the qemu community isn't ready to accept a new
>>> ivshmem, and would rather have vhost-pci based solution, I would suggest
>>> a different project (hopefully Wei Wang can help define it and mentor):
>>> work on a vhost-pci using dedicated shared PCI BARs (and kernel support
>>> to avoid extra copy - if I understand the extra copy situation correctly).
>>
>> It's still open if vhost-pci can replace ivshmem (not to speak of being
>> desirable for Jailhouse - I'm still studying). In that light, having
>> both implementations available to do real comparisons is valuable IMHO.
> 
> Yes, but is it appropriate for GSoC?
> 
>> That said, we will play with open cards, explain the student the
>> situation and let her/him decide knowingly.
> 
> Both the student and the QEMU project need to consider the situation
> carefully.
> 
>> Jan
>>
>> PS: We have a mixed history /wrt actually merging student projects.
> 
> Yes, but having screwed up is no license to screw up some more :)
> 

After having received multiple feedbacks in this direction, I will drop
that proposal from our list. So, don't worry. ;)

Jan

Re: [Qemu-devel] [PATCH v2 8/8] hw: Drop superfluous special checks for orphaned -drive

2017-01-30 Thread Markus Armbruster

John Snow  writes:

> On 01/27/2017 11:04 AM, Markus Armbruster wrote:
>> John Snow  writes:
>> 
>>> On 01/27/2017 06:51 AM, Markus Armbruster wrote:
 John Snow  writes:

> On 01/26/2017 10:09 AM, Markus Armbruster wrote:
>> We've traditionally rejected orphans here and there, but not
>> systematically.  For instance, the sun4m machines have an onboard SCSI
>> HBA (bus=0), and have always rejected bus>0.  Other machines with an
>> onboard SCSI HBA don't.
>>
>> Commit a66c9dc made all orphans trigger a warning, and the previous
>> commit turned this into an error.  The checks "here and there" are now
>> redundant.  Drop them.
>>
>> Note that the one in mips_jazz.c was wrong: it rejected bus > MAX_FD,
>> but MAX_FD is the number of floppy drives per bus.
>>
>> Error messages change from
>>
>> $ qemu-system-x86_64 -drive if=ide,bus=2
>> qemu-system-x86_64: Too many IDE buses defined (3 > 2)
>> $ qemu-system-mips64 -M magnum,accel=qtest -drive 
>> if=floppy,bus=2,id=fd1
>> qemu: too many floppy drives
>> $ qemu-system-sparc -M LX -drive if=scsi,bus=1
>> qemu: too many SCSI bus
>>
>> to
>>
>> $ qemu-system-x86_64 -drive if=ide,bus=2
>> qemu-system-x86_64: -drive if=ide,bus=2: machine type does not 
>> support this drive
>> $ qemu-system-mips64 -M magnum,accel=qtest -drive 
>> if=floppy,bus=2,id=fd1
>> qemu-system-mips64: -drive if=floppy,bus=2,id=fd1: machine type does 
>> not support this drive
>> $ qemu-system-sparc -M LX -drive if=scsi,bus=1
>> qemu-system-sparc: -drive if=scsi,bus=1: machine type does not 
>> support this drive
>>
>
> Hm, that's a lot less helpful, isn't it? Can we augment with hints?

 The message itself may be less specific, but it now comes with a precise
 location.  Personally, I'd even find

 qemu-system-sparc: -drive if=scsi,bus=1: *mumble* *mumble*

 more helpful than

 qemu: too many SCSI bus

 because the former tells me *which* of the options is bad.  We tend to
 have lots and lots of them.

 The deleted special case errors cover only a minority of "orphan"
 -drive.  If these cases need improvement, then so will the general case.
 If you can come up with a hint that makes the general case message more
 useful, I'm more than happy to squash it into PATCH 6.

>>>
>>> The old error had "why" and the new error has "where" but neither has
>>> both. I would suggest that from the "why" you can divine the "where,"
>>> but the opposite is not as easily true.
>> 
>> Some users will be able to divine more easily than others.  Consider my
>> "too many floppy drives" example.  There's just one, and the machine
>> actually supports two.  The user has to make the connection to "bus=2"
>> somehow.  Now, anybody crazy enough to mess with bus= can probably be
>> expected to figure this out, but still, the deleted error messages
>> aren't exactly wonderful.
>> 
>>> The new error even suggests information I think is wrong and misleading:
>>> We do support SCSI! (Just not this many of them.)
>> 
>> Well, the error doesn't say "machine doesn't support SCSI", only
>> "doesn't support this particular -drive".  Perhaps it could be worded
>> more clearly.  Ideas?
>> 
>
> Ah, I see what you mean now. I interpreted "this drive" to mean SCSI,
> not this SCSI *instance*.

Uh, I would've written "does not support if=scsi" then.  But I see where
you come from.

>   If it can be made clearer that QEMU is simply
> unable to instantiate this particular instance, that'd be fine.
>
> Instead of "Machine type does not support this drive,"
>
> how about
>
> "Machine type cannot instantiate this drive instance"

Hmm.

> Or ... follow your own best judgement. This is really YOUR wheelhouse.
> My example is a little wordy.

All I can come up with is even wordier: "Machine type doesn't support
this combination of if, bus, unit", or "if=scsi,bus=%d,unit=%d not
supported with this machine type".  Could also be confusing when the
user specified index instead of bus, unit.

Good error messages are hard...

Re: [Qemu-devel] Towards an ivshmem 2.0?

2017-01-30 Thread Jan Kiszka

On 2017-01-30 09:00, Markus Armbruster wrote:
> Jan Kiszka  writes:
> 
>> On 2017-01-27 20:36, Markus Armbruster wrote:
>>> Jan Kiszka  writes:
>>>
 On 2017-01-23 15:19, Markus Armbruster wrote:
> Jan Kiszka  writes:
>
>> Hi,
>>
>> some of you may know that we are using a shared memory device similar to
>> ivshmem in the partitioning hypervisor Jailhouse [1].
>>
>> We started as being compatible to the original ivshmem that QEMU
>> implements, but we quickly deviated in some details, and in the recent
>> months even more. Some of the deviations are related to making the
>> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>
> Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.

 That difference comes from remote/migration support and general QEMU
 integration - likely not very telling due to the different environments.
>>>
>>> Plausible.
>>>
>> aiming at safety critical systems and, therefore, a small code base.
>> Other changes address deficits in the original design, like missing
>> life-cycle management.
>>
>> Now the question is if there is interest in defining a common new
>> revision of this device and maybe also of some protocols used on top,
>> such as virtual network links. Ideally, this would enable us to share
>> Linux drivers. We will definitely go for upstreaming at least a network
>> driver such as [2], a UIO driver and maybe also a serial port/console.
>>
>> I've attached a first draft of the specification of our new ivshmem
>> device. A working implementation can be found in the wip/ivshmem2 branch
>> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>>
>> Deviations from the original design:
>>
>> - Only two peers per link
>
> Uh, define "link".

 VMs are linked via a common shared memory. Interrupt delivery follows
 that route as well.

>
>>   This simplifies the implementation and also the interfaces (think of
>>   life-cycle management in a multi-peer environment). Moreover, we do
>>   not have an urgent use case for multiple peers, thus also not
>>   reference for a protocol that could be used in such setups. If someone
>>   else happens to share such a protocol, it would be possible to discuss
>>   potential extensions and their implications.
>>
>> - Side-band registers to discover and configure share memory regions
>>
>>   This was one of the first changes: We removed the memory regions from
>>   the PCI BARs and gave them special configuration space registers. By
>>   now, these registers are embedded in a PCI capability. The reasons are
>>   that Jailhouse does not allow to relocate the regions in guest address
>>   space (but other hypervisors may if they like to) and that we now have
>>   up to three of them.
>
> I'm afraid I don't quite understand the change, nor the rationale.  I
> guess I could figure out the former by studying the specification.

 a) It's a Jailhouse thing (we disallow the guest to move the regions
around in its address space)
 b) With 3 regions + MSI-X + MMIO registers, we run out of BARs (or
would have to downgrade them to 32 bit)
>>>
>>> Have you considered putting your three shared memory regions in memory
>>> consecutively, so they can be covered by a single BAR?  Similar to how a
>>> single BAR covers both MSI-X table and PBA.
>>
>> Would still require to pass three times some size information (each
>> region can be different or empty/non-existent).
> 
> Yes.  Precedence: location of MSI-X table and PBA are specified in the
> MSI-X Capability Structure as offset and BIR.
> 
>> Moreover, a) is not
>> possible then without ugly modifications to the guest because they
>> expect BAR-based regions to be relocatable.
> 
> Can you explain why not letting the guest map the shared memory into its
> address space on its own just like any other piece of device memory is a
> requirement?

It requires reconfiguration of the sensitive 2nd level page tables
during runtime of the guest. We are avoiding the neccessery checking and
synchronization measures so far which reduces code complexity further.

BTW, PCI has a similar concept of static assignment (PCI EA), but that
is unfortunately incompatible to our needs [1].

> 
>> - Changed PCI base class code to 0xff (unspecified class)
>
> Changed from 0x5 (memory controller).

 Right.

>
>>   This allows us to define our own sub classes and interfaces. That is
>>   now exploited for specifying the shared memory protocol the two
>>   connected peers should use. It also allows the Linux drivers to match
>>   on that.
>>
>> - INTx interrupts support is back
>>
>>   This is needed on target platforms without MSI contr

Re: [Qemu-devel] [PATCH RFC] mem-prealloc: Reduce large guest start-up and migration time.

2017-01-30 Thread Jitendra Kolhe

On 1/27/2017 6:23 PM, Juan Quintela wrote:
> Jitendra Kolhe  wrote:
>> Using "-mem-prealloc" option for a very large guest leads to huge guest
>> start-up and migration time. This is because with "-mem-prealloc" option
>> qemu tries to map every guest page (create address translations), and
>> make sure the pages are available during runtime. virsh/libvirt by
>> default, seems to use "-mem-prealloc" option in case the guest is
>> configured to use huge pages. The patch tries to map all guest pages
>> simultaneously by spawning multiple threads. Given the problem is more
>> prominent for large guests, the patch limits the changes to the guests
>> of at-least 64GB of memory size. Currently limiting the change to QEMU
>> library functions on POSIX compliant host only, as we are not sure if
>> the problem exists on win32. Below are some stats with "-mem-prealloc"
>> option for guest configured to use huge pages.
>>
>> 
>> Idle Guest  | Start-up time | Migration time
>> 
>> Guest stats with 2M HugePage usage - single threaded (existing code)
>> 
>> 64 Core - 4TB   | 54m11.796s| 75m43.843s
> ^^
> 
>> 64 Core - 1TB   | 8m56.576s | 14m29.049s
>> 64 Core - 256GB | 2m11.245s | 3m26.598s
>> 
>> Guest stats with 2M HugePage usage - map guest pages using 8 threads
>> 
>> 64 Core - 4TB   | 5m1.027s  | 34m10.565s
>> 64 Core - 1TB   | 1m10.366s | 8m28.188s
>> 64 Core - 256GB | 0m19.040s | 2m10.148s
>> ---
>> Guest stats with 2M HugePage usage - map guest pages using 16 threads
>> ---
>> 64 Core - 4TB   | 1m58.970s | 31m43.400s
> ^
> 
> Impressive, not everyday one get an speedup of 20 O:-)
> 
> 
>> +static void *do_touch_pages(void *arg)
>> +{
>> +PageRange *range = (PageRange *)arg;
>> +char *start_addr = range->addr;
>> +uint64_t numpages = range->numpages;
>> +uint64_t hpagesize = range->hpagesize;
>> +uint64_t i = 0;
>> +
>> +for (i = 0; i < numpages; i++) {
>> +memset(start_addr + (hpagesize * i), 0, 1);
> 
> I would use the range->addr and similar here directly, but it is just a
> question of taste.
> 

Thanks for your response, 
will update my next patch.

>> -/* MAP_POPULATE silently ignores failures */
>> -for (i = 0; i < numpages; i++) {
>> -memset(area + (hpagesize * i), 0, 1);
>> +/* touch pages simultaneously for memory >= 64G */
>> +if (memory < (1ULL << 36)) {
> 
> 64GB guest already took quite a bit of time, I think I would put it
> always as min(num_vcpus, 16).  So, we always execute the multiple theard
> codepath?
> 

It sounds good idea to have a heuristic on vcpu count. I will add in my 
next version. But shouldn't we also restrict based on guest RAM size too, 
to avoid overhead spawning multiple threads for thin guests?

Thanks,
- Jitendra

> But very nice, thanks.
> 
> Later, Juan.
>

Re: [Qemu-devel] [PATCH v2 8/8] hw: Drop superfluous special checks for orphaned -drive

2017-01-30 Thread John Snow



On 01/30/2017 03:10 AM, Markus Armbruster wrote:
> John Snow  writes:
> 
>> On 01/27/2017 11:04 AM, Markus Armbruster wrote:
>>> John Snow  writes:
>>>
 On 01/27/2017 06:51 AM, Markus Armbruster wrote:
> John Snow  writes:
>
>> On 01/26/2017 10:09 AM, Markus Armbruster wrote:
>>> We've traditionally rejected orphans here and there, but not
>>> systematically.  For instance, the sun4m machines have an onboard SCSI
>>> HBA (bus=0), and have always rejected bus>0.  Other machines with an
>>> onboard SCSI HBA don't.
>>>
>>> Commit a66c9dc made all orphans trigger a warning, and the previous
>>> commit turned this into an error.  The checks "here and there" are now
>>> redundant.  Drop them.
>>>
>>> Note that the one in mips_jazz.c was wrong: it rejected bus > MAX_FD,
>>> but MAX_FD is the number of floppy drives per bus.
>>>
>>> Error messages change from
>>>
>>> $ qemu-system-x86_64 -drive if=ide,bus=2
>>> qemu-system-x86_64: Too many IDE buses defined (3 > 2)
>>> $ qemu-system-mips64 -M magnum,accel=qtest -drive 
>>> if=floppy,bus=2,id=fd1
>>> qemu: too many floppy drives
>>> $ qemu-system-sparc -M LX -drive if=scsi,bus=1
>>> qemu: too many SCSI bus
>>>
>>> to
>>>
>>> $ qemu-system-x86_64 -drive if=ide,bus=2
>>> qemu-system-x86_64: -drive if=ide,bus=2: machine type does not 
>>> support this drive
>>> $ qemu-system-mips64 -M magnum,accel=qtest -drive 
>>> if=floppy,bus=2,id=fd1
>>> qemu-system-mips64: -drive if=floppy,bus=2,id=fd1: machine type 
>>> does not support this drive
>>> $ qemu-system-sparc -M LX -drive if=scsi,bus=1
>>> qemu-system-sparc: -drive if=scsi,bus=1: machine type does not 
>>> support this drive
>>>
>>
>> Hm, that's a lot less helpful, isn't it? Can we augment with hints?
>
> The message itself may be less specific, but it now comes with a precise
> location.  Personally, I'd even find
>
> qemu-system-sparc: -drive if=scsi,bus=1: *mumble* *mumble*
>
> more helpful than
>
> qemu: too many SCSI bus
>
> because the former tells me *which* of the options is bad.  We tend to
> have lots and lots of them.
>
> The deleted special case errors cover only a minority of "orphan"
> -drive.  If these cases need improvement, then so will the general case.
> If you can come up with a hint that makes the general case message more
> useful, I'm more than happy to squash it into PATCH 6.
>

 The old error had "why" and the new error has "where" but neither has
 both. I would suggest that from the "why" you can divine the "where,"
 but the opposite is not as easily true.
>>>
>>> Some users will be able to divine more easily than others.  Consider my
>>> "too many floppy drives" example.  There's just one, and the machine
>>> actually supports two.  The user has to make the connection to "bus=2"
>>> somehow.  Now, anybody crazy enough to mess with bus= can probably be
>>> expected to figure this out, but still, the deleted error messages
>>> aren't exactly wonderful.
>>>
 The new error even suggests information I think is wrong and misleading:
 We do support SCSI! (Just not this many of them.)
>>>
>>> Well, the error doesn't say "machine doesn't support SCSI", only
>>> "doesn't support this particular -drive".  Perhaps it could be worded
>>> more clearly.  Ideas?
>>>
>>
>> Ah, I see what you mean now. I interpreted "this drive" to mean SCSI,
>> not this SCSI *instance*.
> 
> Uh, I would've written "does not support if=scsi" then.  But I see where
> you come from.
> 
>>   If it can be made clearer that QEMU is simply
>> unable to instantiate this particular instance, that'd be fine.
>>
>> Instead of "Machine type does not support this drive,"
>>
>> how about
>>
>> "Machine type cannot instantiate this drive instance"
> 
> Hmm.
> 
>> Or ... follow your own best judgement. This is really YOUR wheelhouse.
>> My example is a little wordy.
> 
> All I can come up with is even wordier: "Machine type doesn't support
> this combination of if, bus, unit", or "if=scsi,bus=%d,unit=%d not
> supported with this machine type".  Could also be confusing when the
> user specified index instead of bus, unit.
> 
> Good error messages are hard...
> 

Yes, so just follow your own instincts. Just offering a non-blocking
comment.

--js

Re: [Qemu-devel] [PATCH 1/1] block: add missed BDRV_O_NOCACHE when block device is opened without file

2017-01-30 Thread Denis V. Lunev

On 01/28/2017 07:23 PM, Max Reitz wrote:
> On 25.01.2017 20:44, Denis V. Lunev wrote:
>> On 01/25/2017 08:59 PM, Max Reitz wrote:
>>> [CC-ing John]
>>>
>>> On 25.01.2017 17:42, Denis V. Lunev wrote:
 Technically there is a problem when the guest DVD is created by libvirt
 with AIO mode 'native' on Linux. Current QEMU is unable to start the
 domain configured as follows:
 
   
   
   
 
 The problem comes from the combination of 'cache' and 'io' options.

 'io' option is common option and it is removed from block driver
 specific options. 'cache' originally is not. The patch makes 'cache'
 option common. This works fine as long as cdrom media insertion
 later on.

 Signed-off-by: Denis V. Lunev 
 CC: Kevin Wolf 
 CC: Max Reitz 
 ---
  blockdev.c | 18 +++---
  1 file changed, 15 insertions(+), 3 deletions(-)
>>> There was a Red Hat BZ for this:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1342999
>>>
>>> There is now a corresponding BZ for libvirt:
>>>
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1377321
>>>
>>> The gist is that it was determined to be a problem with libvirt.
>>>
>>> RHEL has a downstream commit to work around this issue by ignoring the
>>> cache mode set for empty CD-ROMs -- but that is only because that was
>>> simpler than fixing libvirt.
>>>
>>> (Note that it's ignoring the cache mode instead of actually evaluating it.)
>>>
>> This is what I have exactly started from:
>> http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/qemu-kvm-rhev-2.6.0-27.el7.src.rpm
>>
>> This package starts VM well for the above mentioned configuration:
>>
>> 
>>   
>>   
>>   
>> 
>>
>> but the problem comes later at 'change' moment. It reports
>>
>> 'Details: internal error: unable to execute QEMU command 'change':
>> aio=native was specified, but it requires cache.direct=on, which
>> was not specified.)'
>>
>>
>> Thus partial solution implemented by the RedHat is really
>> partial and does not work at the second stage.
> Yes, because it is not supposed to work for that. The only thing the
> downstream patch does is ignore the cache mode so you can still start
> the VM even if libvirt decides to specify a cache parameter for an empty
> drive (which it should not do).
>
>>I have started from
>>
>> diff --git a/blockdev.c b/blockdev.c
>> index d280dc4..e2c9053 100644   
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -608,6 +608,13 @@ static BlockBackend *blockdev_init(const char
>> *file, QDict *bs_opts,
>>  goto early_err;
>>  }
>>  
>> +if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_DIRECT)) {
>> +bdrv_flags |= BDRV_O_NOCACHE;
>> +}
>> +if (qdict_haskey(bs_opts, BDRV_OPT_CACHE_NO_FLUSH)) {
>> +bdrv_flags |= BDRV_O_NO_FLUSH;
>> +}
>> +
>>  blk_rs = blk_get_root_state(blk);
>>  blk_rs->open_flags= bdrv_flags;
>>  blk_rs->read_only = !(bdrv_flags & BDRV_O_RDWR);
>>
>> The problem for us is that we have such previously valid configurations
>> in the field.
>>
 May be this has already discussed, but still. AIO=native for CDROM without
 media seems important case.
>>> First, I personally don't find aio=native very important for CD-ROM
>>> drives. Yes, we shouldn't make IDE CD-ROM slower than necessary, but I
>>> don't think aio=native will make the difference between "Slow like
>>> CD-ROMs are in reality" and "As fast as virtio".
>> the problem for me is that each clone() call is costly and counts. That
>> is why we are trying to avoid it whenever possible. That is why 'native'
>> mode is MUCH better. Also it would be very nice not to use cached
>> IO, which is very good for memory overcommit situations.
>>
>> Here I am fighting not with the performance, which does not make
>> any real sense, but with a memory footprint.
> Fair enough. Still, specifying a cache mode for an empty drive simply
> doesn't make sense.
>
> Have you considered inserting a null-co:// file at startup as a
> workaround? If you use the old 'change' (or blockdev-change-medium)
> command, it should retain AIO and cache mode.
ok. Thanks for the suggestion.


>
>>> Second, all this patch does is revert some changes done by commit
>>> 91a097e7478940483e76d52217f05bc05b98d5a5, which was very deliberate.
>>>
>>> Third, you may then be asking for the recommended way to put an
>>> aio=native medium into a CD-ROM drive. Good thing you ask, because we
>>> have a way that we want to recommend but can't because it's still
>>> considered experimental:
>>>
>>> The BDS is added using blockdev-add, with all of the appropriate caching
>>> and aio options. Then it's inserted into the drive using the
>>> x-blockdev-insert-medium command, and the drive is closed using
>>> blockdev-close-tray.
>> the problem, again, is that with x-blockdev-insert-me

Re: [Qemu-devel] [PATCH RFC] mem-prealloc: Reduce large guest start-up and migration time.

2017-01-30 Thread Jitendra Kolhe

On 1/27/2017 6:33 PM, Dr. David Alan Gilbert wrote:
> * Jitendra Kolhe (jitendra.ko...@hpe.com) wrote:
>> Using "-mem-prealloc" option for a very large guest leads to huge guest
>> start-up and migration time. This is because with "-mem-prealloc" option
>> qemu tries to map every guest page (create address translations), and
>> make sure the pages are available during runtime. virsh/libvirt by
>> default, seems to use "-mem-prealloc" option in case the guest is
>> configured to use huge pages. The patch tries to map all guest pages
>> simultaneously by spawning multiple threads. Given the problem is more
>> prominent for large guests, the patch limits the changes to the guests
>> of at-least 64GB of memory size. Currently limiting the change to QEMU
>> library functions on POSIX compliant host only, as we are not sure if
>> the problem exists on win32. Below are some stats with "-mem-prealloc"
>> option for guest configured to use huge pages.
>>
>> 
>> Idle Guest  | Start-up time | Migration time
>> 
>> Guest stats with 2M HugePage usage - single threaded (existing code)
>> 
>> 64 Core - 4TB   | 54m11.796s| 75m43.843s
>> 64 Core - 1TB   | 8m56.576s | 14m29.049s
>> 64 Core - 256GB | 2m11.245s | 3m26.598s
>> 
>> Guest stats with 2M HugePage usage - map guest pages using 8 threads
>> 
>> 64 Core - 4TB   | 5m1.027s  | 34m10.565s
>> 64 Core - 1TB   | 1m10.366s | 8m28.188s
>> 64 Core - 256GB | 0m19.040s | 2m10.148s
>> ---
>> Guest stats with 2M HugePage usage - map guest pages using 16 threads
>> ---
>> 64 Core - 4TB   | 1m58.970s | 31m43.400s
>> 64 Core - 1TB   | 0m39.885s | 7m55.289s
>> 64 Core - 256GB | 0m11.960s | 2m0.135s
>> ---
> 
> That's a nice improvement.
> 
>> Signed-off-by: Jitendra Kolhe 
>> ---
>>  util/oslib-posix.c | 64 
>> +++---
>>  1 file changed, 61 insertions(+), 3 deletions(-)
>>
>> diff --git a/util/oslib-posix.c b/util/oslib-posix.c
>> index f631464..a8bd7c2 100644
>> --- a/util/oslib-posix.c
>> +++ b/util/oslib-posix.c
>> @@ -55,6 +55,13 @@
>>  #include "qemu/error-report.h"
>>  #endif
>>  
>> +#define PAGE_TOUCH_THREAD_COUNT 8
> 
> It seems a shame to fix that number as a constant.
> 

Yes, as per comments received we will update patch to incorporate vcpu count.

>> +typedef struct {
>> +char *addr;
>> +uint64_t numpages;
>> +uint64_t hpagesize;
>> +} PageRange;
>> +
>>  int qemu_get_thread_id(void)
>>  {
>>  #if defined(__linux__)
>> @@ -323,6 +330,52 @@ static void sigbus_handler(int signal)
>>  siglongjmp(sigjump, 1);
>>  }
>>  
>> +static void *do_touch_pages(void *arg)
>> +{
>> +PageRange *range = (PageRange *)arg;
>> +char *start_addr = range->addr;
>> +uint64_t numpages = range->numpages;
>> +uint64_t hpagesize = range->hpagesize;
>> +uint64_t i = 0;
>> +
>> +for (i = 0; i < numpages; i++) {
>> +memset(start_addr + (hpagesize * i), 0, 1);
>> +}
>> +qemu_thread_exit(NULL);
>> +
>> +return NULL;
>> +}
>> +
>> +static int touch_all_pages(char *area, size_t hpagesize, size_t numpages)
>> +{
>> +QemuThread page_threads[PAGE_TOUCH_THREAD_COUNT];
>> +PageRange page_range[PAGE_TOUCH_THREAD_COUNT];
>> +uint64_tnumpage_per_thread, size_per_thread;
>> +int i = 0, tcount = 0;
>> +
>> +numpage_per_thread = (numpages / PAGE_TOUCH_THREAD_COUNT);
>> +size_per_thread = (hpagesize * numpage_per_thread);
>> +for (i = 0; i < (PAGE_TOUCH_THREAD_COUNT - 1); i++) {
>> +page_range[i].addr = area;
>> +page_range[i].numpages = numpage_per_thread;
>> +page_range[i].hpagesize = hpagesize;
>> +
>> +qemu_thread_create(page_threads + i, "touch_pages",
>> +   do_touch_pages, (page_range + i),
>> +   QEMU_THREAD_JOINABLE);
>> +tcount++;
>> +area += size_per_thread;
>> +numpages -= numpage_per_thread;
>> +}
>> +for (i = 0; i < numpages; i++) {
>> +memset(area + (hpagesize * i), 0, 1);
>> +}
>> +for (i = 0; i < tcount; i++) {
>> +qemu_thread_join(page_threads + i);
>> +}
>> +return 0;
>> +}
>> +
>>  void os_mem_prealloc(int fd, char *area, size_t memory, Error **errp)
>>  {
>>  int ret;
>> @@ -353,9 +406,14 @@ void os_mem_prealloc(int fd, char *area, size_t memory, 
>> Error **errp)
>>  size_t hpagesize = qemu_fd_getpagesiz

Re: [Qemu-devel] [PATCH v3 2/3] xen-platform: add support for unplugging NVMe disks...

2017-01-30 Thread Paul Durrant

> -Original Message-
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: 27 January 2017 23:35
> To: Paul Durrant 
> Cc: qemu-devel@nongnu.org; xen-de...@lists.xenproject.org; Stefano
> Stabellini ; Anthony Perard
> ; Michael S. Tsirkin ; Paolo
> Bonzini ; Richard Henderson ;
> Eduardo Habkost 
> Subject: Re: [PATCH v3 2/3] xen-platform: add support for unplugging NVMe
> disks...
> 
> On Thu, 26 Jan 2017, Paul Durrant wrote:
> > ...not just IDE and SCSI.
> >
> > This patch allows the Xen tool-stack to fully support of NVMe as an
> > emulated disk type. See [1] for the relevant tool-stack patch discussion.
> >
> > [1] https://lists.xen.org/archives/html/xen-devel/2017-01/msg01225.html
> >
> > Signed-off-by: Paul Durrant 
> 
> Reviewed-by: Stefano Stabellini 
> 
> I queued the whole series. FYI I am waiting for another patch to fix a
> regression before sending a pull request.
> 

Cool. Thanks Stefano.

  Paul

> 
> > Cc: Stefano Stabellini 
> > Cc: Anthony Perard 
> > Cc: "Michael S. Tsirkin" 
> > Cc: Paolo Bonzini 
> > Cc: Richard Henderson 
> > Cc: Eduardo Habkost 
> >
> > v3:
> > - Add reference to xen-devel patch discussion in commit message as
> >   requested by Stefano.
> > ---
> >  hw/i386/xen/xen_platform.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/hw/i386/xen/xen_platform.c b/hw/i386/xen/xen_platform.c
> > index f50915f..7d41ebb 100644
> > --- a/hw/i386/xen/xen_platform.c
> > +++ b/hw/i386/xen/xen_platform.c
> > @@ -120,6 +120,7 @@ static void unplug_disks(PCIBus *b, PCIDevice *d,
> void *o)
> >  break;
> >
> >  case PCI_CLASS_STORAGE_SCSI:
> > +case PCI_CLASS_STORAGE_EXPRESS:
> >  object_unparent(OBJECT(d));
> >  break;
> >
> > --
> > 2.1.4
> >

Re: [Qemu-devel] [RFC 3/4] hw/intc/arm_gicv3_its: Implement state save/restore

2017-01-30 Thread Juan Quintela

Eric Auger  wrote:
> We need to handle both registers and ITS tables. While
> register handling is standard, ITS table handling is more
> challenging since the kernel API is devised so that the
> tables are flushed into guest RAM and not in vmstate buffers.
>
> Flushing the ITS tables on device pre_save() is too late
> since the guest RAM had already been saved at this point.
>
> Table flushing needs to happen when we are sure the vcpus
> are stopped and before the last dirty page saving. The
> right point is RUN_STATE_FINISH_MIGRATE but sometimes the
> VM gets stopped before migration launch so let's simply
> flush the tables each time the VM gets stopped.
>
> For regular ITS registers we just can use vmstate pre_save
> and post_load callbacks.
>
> Signed-off-by: Eric Auger 

Hi


> + * vm_change_state_handler - VM change state callback aiming at flushing
> + * ITS tables into guest RAM
> + *
> + * The tables get flushed to guest RAM whenever the VM gets stopped.
> + */
> +static void vm_change_state_handler(void *opaque, int running,
> +RunState state)
> +{
> +GICv3ITSState *s = (GICv3ITSState *)opaque;

Cast is unneeded.

> +
> +if (running) {
> +return;
> +}
> +kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_TABLES,
> +  0, NULL, false);

As you are adding it to do everytime that we stop the guest, how
expensive/slow is that?

Thanks, Juan.

Re: [Qemu-devel] [PATCH 1/1] block: add missed BDRV_O_NOCACHE when block device is opened without file

2017-01-30 Thread Kevin Wolf

Am 25.01.2017 um 20:44 hat Denis V. Lunev geschrieben:
> This is what I have exactly started from:
> http://ftp.redhat.com/pub/redhat/linux/enterprise/7Server/en/RHEV/SRPMS/qemu-kvm-rhev-2.6.0-27.el7.src.rpm
> 
> This package starts VM well for the above mentioned configuration:
> 
> 
>   
>   
>   
> 
> 
> but the problem comes later at 'change' moment. It reports
> 
> 'Details: internal error: unable to execute QEMU command 'change':
> aio=native was specified, but it requires cache.direct=on, which
> was not specified.)'
> 
> 
> Thus partial solution implemented by the RedHat is really
> partial and does not work at the second stage. I have started from

That's a downstream problem that is probably worth fixing, but of course
not with an upstream patch. Can you create a Bugzilla for this and
assign it to Max, John or me? (Putting the rest of us into the CC list)

> the problem for me is that each clone() call is costly and counts. That
> is why we are trying to avoid it whenever possible. That is why 'native'
> mode is MUCH better. Also it would be very nice not to use cached
> IO, which is very good for memory overcommit situations.
> 
> Here I am fighting not with the performance, which does not make
> any real sense, but with a memory footprint.

This makes sense generally, but for empty CD-ROMs, no I/O request is
ever made, so neither Linux AIO nor the thread pool is used.

You only get a specific cache/aio mode once you actually insert a block
driver node to the virtual device.

> > Second, all this patch does is revert some changes done by commit
> > 91a097e7478940483e76d52217f05bc05b98d5a5, which was very deliberate.
> >
> > Third, you may then be asking for the recommended way to put an
> > aio=native medium into a CD-ROM drive. Good thing you ask, because we
> > have a way that we want to recommend but can't because it's still
> > considered experimental:
> >
> > The BDS is added using blockdev-add, with all of the appropriate caching
> > and aio options. Then it's inserted into the drive using the
> > x-blockdev-insert-medium command, and the drive is closed using
> > blockdev-close-tray.
> the problem, again, is that with x-blockdev-insert-medium I can not
> deal with block driver options, or may be I am missing something.

The thing that you insert with x-blockdev-insert-medium is a block
driver node that you presumably created with blockdev-add, so you did
have a chance to specify whichever options you want.

Kevin

Re: [Qemu-devel] [PATCH v8 08/25] tcg: drop global lock during TCG code execution

2017-01-30 Thread Alex Bennée


Pranith Kumar  writes:

> Alex Bennée writes:
>
>> From: Jan Kiszka 
>>
>> This finally allows TCG to benefit from the iothread introduction: Drop
>> the global mutex while running pure TCG CPU code. Reacquire the lock
>> when entering MMIO or PIO emulation, or when leaving the TCG loop.
>>
>> We have to revert a few optimization for the current TCG threading
>> model, namely kicking the TCG thread in qemu_mutex_lock_iothread and not
>> kicking it in qemu_cpu_kick. We also need to disable RAM block
>> reordering until we have a more efficient locking mechanism at hand.
>>
>> Still, a Linux x86 UP guest and my Musicpal ARM model boot fine here.
>> These numbers demonstrate where we gain something:
>>
>> 20338 jan   20   0  331m  75m 6904 R   99  0.9   0:50.95 qemu-system-arm
>> 20337 jan   20   0  331m  75m 6904 S   20  0.9   0:26.50 qemu-system-arm
>>
>> The guest CPU was fully loaded, but the iothread could still run mostly
>> independent on a second core. Without the patch we don't get beyond
>>
>> 32206 jan   20   0  330m  73m 7036 R   82  0.9   1:06.00 qemu-system-arm
>> 32204 jan   20   0  330m  73m 7036 S   21  0.9   0:17.03 qemu-system-arm
>>
>> We don't benefit significantly, though, when the guest is not fully
>> loading a host CPU.
>>
>> Signed-off-by: Jan Kiszka 
>> Message-Id: <1439220437-23957-10-git-send-email-fred.kon...@greensocs.com>
>> [FK: Rebase, fix qemu_devices_reset deadlock, rm address_space_* mutex]
>> Signed-off-by: KONRAD Frederic 
>> [EGC: fixed iothread lock for cpu-exec IRQ handling]
>> Signed-off-by: Emilio G. Cota 
>> [AJB: -smp single-threaded fix, clean commit msg, BQL fixes]
>> Signed-off-by: Alex Bennée 
>> Reviewed-by: Richard Henderson 
>>
>> ---
>> v8:
>>  - merged in BQL fixes for PPC target: ppc_set_irq
>>  - merged in BQL fixes for ARM target: ARM_CP_IO helpers
>>  - merged in BQL fixes for ARM target: arm_call_el_change_hook
>>
>> v5 (ajb, base patches):
>>  - added an assert to BQL unlock/lock functions instead of hanging
>>  - ensure all cpu->interrupt_requests *modifications* protected by BQL
>>  - add a re-read on cpu->interrupt_request for correctness
>>  - BQL fixes for:
>>- assert BQL held for PPC hypercalls (emulate_spar_hypercall)
>>- SCLP service calls on s390x
>>  - merge conflict with kick timer patch
>> v4 (ajb, base patches):
>>  - protect cpu->interrupt updates with BQL
>>  - fix wording io_mem_notdirty calls
>>  - s/we/with/
>> v3 (ajb, base-patches):
>>   - stale iothread_unlocks removed (cpu_exit/resume_from_signal deals
>>   with it in the longjmp).
>>   - fix re-base conflicts
>> v2 (ajb):
>>   - merge with tcg: grab iothread lock in cpu-exec interrupt handling
>>   - use existing fns for tracking lock state
>>   - lock iothread for mem_region
>> - add assert on mem region modification
>> - ensure smm_helper holds iothread
>>   - Add JK s-o-b
>>   - Fix-up FK s-o-b annotation
>> v1 (ajb, base-patches):
>>   - SMP failure now fixed by previous commit
>>
>> Changes from Fred Konrad (mttcg-v7 via paolo):
>>   * Rebase on the current HEAD.
>>   * Fixes a deadlock in qemu_devices_reset().
>>   * Remove the mutex in address_space_*
>> ---
>>  cpu-exec.c | 20 ++--
>>  cpus.c | 28 +---
>>  cputlb.c   | 21 -
>>  exec.c | 12 +---
>>  hw/core/irq.c  |  1 +
>>  hw/i386/kvmvapic.c |  4 ++--
>>  hw/intc/arm_gicv3_cpuif.c  |  3 +++
>>  hw/ppc/ppc.c   | 16 +++-
>>  hw/ppc/spapr.c |  3 +++
>>  include/qom/cpu.h  |  1 +
>>  memory.c   |  2 ++
>>  qom/cpu.c  | 10 ++
>>  target/arm/helper.c|  6 ++
>>  target/arm/op_helper.c | 43 +++
>>  target/i386/smm_helper.c   |  7 +++
>>  target/s390x/misc_helper.c |  5 -
>>  translate-all.c|  9 +++--
>>  translate-common.c | 21 +++--
>>  18 files changed, 163 insertions(+), 49 deletions(-)
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index f9e836c8dd..f42a128bdf 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -29,6 +29,7 @@
>>  #include "qemu/rcu.h"
>>  #include "exec/tb-hash.h"
>>  #include "exec/log.h"
>> +#include "qemu/main-loop.h"
>>  #if defined(TARGET_I386) && !defined(CONFIG_USER_ONLY)
>>  #include "hw/i386/apic.h"
>>  #endif
>> @@ -388,8 +389,10 @@ static inline bool cpu_handle_halt(CPUState *cpu)
>>  if ((cpu->interrupt_request & CPU_INTERRUPT_POLL)
>>  && replay_interrupt()) {
>>  X86CPU *x86_cpu = X86_CPU(cpu);
>> +qemu_mutex_lock_iothread();
>>  apic_poll_irq(x86_cpu->apic_state);
>>  cpu_reset_interrupt(cpu, CPU_INTERRUPT_POLL);
>> +qemu_mutex_unlock_iothread();
>>  }
>>  #endif
>>  if (!cpu_has_work(cpu)) {
>> @@ -443,7 +446,9 @@ static

Re: [Qemu-devel] [PATCH v8 04/25] tcg: move TCG_MO/BAR types into own file

2017-01-30 Thread Alex Bennée


Pranith Kumar  writes:

> Alex Bennée writes:
>
>> We'll be using the memory ordering definitions to define values for
>> both the host and guest. To avoid fighting with circular header
>> dependencies just move these types into their own minimal header.
>>
>> Signed-off-by: Alex Bennée 
>> Reviewed-by: Richard Henderson 
>> ---
>>  tcg/tcg-mo.h | 45 +
>>  tcg/tcg.h| 18 +-
>>  2 files changed, 46 insertions(+), 17 deletions(-)
>>  create mode 100644 tcg/tcg-mo.h
>>
>> diff --git a/tcg/tcg-mo.h b/tcg/tcg-mo.h
>> new file mode 100644
>> index 00..429b022561
>> --- /dev/null
>> +++ b/tcg/tcg-mo.h
>> @@ -0,0 +1,45 @@
>> +/*
>> + * Tiny Code Generator for QEMU
>> + *
>> + * Copyright (c) 2008 Fabrice Bellard
>> + *
>> + * Permission is hereby granted, free of charge, to any person obtaining a 
>> copy
>> + * of this software and associated documentation files (the "Software"), to 
>> deal
>> + * in the Software without restriction, including without limitation the 
>> rights
>> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> + * copies of the Software, and to permit persons to whom the Software is
>> + * furnished to do so, subject to the following conditions:
>> + *
>> + * The above copyright notice and this permission notice shall be included 
>> in
>> + * all copies or substantial portions of the Software.
>> + *
>> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
>> OR
>> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
>> OTHER
>> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
>> FROM,
>> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> + * THE SOFTWARE.
>> + */
>> +
>> +#ifndef TCG_MO_H
>> +#define TCG_MO_H
>> +
>> +typedef enum {
>> +/* Used to indicate the type of accesses on which ordering
>> +   is to be ensured.  Modeled after SPARC barriers.  */
>> +TCG_MO_LD_LD  = 0x01,
>> +TCG_MO_ST_LD  = 0x02,
>> +TCG_MO_LD_ST  = 0x04,
>> +TCG_MO_ST_ST  = 0x08,
>> +TCG_MO_ALL= 0x0F,  /* OR of the above */
>> +
>> +/* Used to indicate the kind of ordering which is to be ensured by the
>> +   instruction.  These types are derived from x86/aarch64 instructions.
>> +   It should be noted that these are different from C11 semantics.  */
>> +TCG_BAR_LDAQ  = 0x10,  /* Following ops will not come forward */
>> +TCG_BAR_STRL  = 0x20,  /* Previous ops will not be delayed */
>> +TCG_BAR_SC= 0x30,  /* No ops cross barrier; OR of the above */
>> +} TCGBar;
>> +
>> +#endif /* TCG_MO_H */
>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>> index 631c6f69b1..f946452049 100644
>> --- a/tcg/tcg.h
>> +++ b/tcg/tcg.h
>> @@ -29,6 +29,7 @@
>>  #include "cpu.h"
>>  #include "exec/tb-context.h"
>>  #include "qemu/bitops.h"
>> +#include "tcg-mo.h"
>>  #include "tcg-target.h"
>>
>>  /* XXX: make safe guess about sizes */
>> @@ -498,23 +499,6 @@ static inline intptr_t QEMU_ARTIFICIAL 
>> GET_TCGV_PTR(TCGv_ptr t)
>>  #define TCG_CALL_DUMMY_TCGV MAKE_TCGV_I32(-1)
>>  #define TCG_CALL_DUMMY_ARG  ((TCGArg)(-1))
>>
>> -typedef enum {
>> -/* Used to indicate the type of accesses on which ordering
>> -   is to be ensured.  Modeled after SPARC barriers.  */
>> -TCG_MO_LD_LD  = 0x01,
>> -TCG_MO_ST_LD  = 0x02,
>> -TCG_MO_LD_ST  = 0x04,
>> -TCG_MO_ST_ST  = 0x08,
>> -TCG_MO_ALL= 0x0F,  /* OR of the above */
>
> Can you please add the following comment:
>
> This is of the form TCG_MO_A_B where A is before B in program order.

Good idea. Will do.


--
Alex Bennée

Re: [Qemu-devel] [PATCH] target/s390x: Fix broken user mode

2017-01-30 Thread Christian Borntraeger

On 01/28/2017 09:36 AM, Stefan Weil wrote:
> Returning NULL from get_max_cpu_model results in a SIGSEGV runtime error.
> 
> Signed-off-by: Stefan Weil 
> ---
> 
> This is also broken in Debian.
> 
> In addition, there is no default CPU ("any"), so binfmt and related
> actions currently don't work. I hacked my local installation by
> duplicating the "qemu" cpu definition for "any", but maybe there is
> a better solution.


> 
> Regards
> Stefan
> 
>  target/s390x/cpu_models.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
> index 2a894ee..6e34763 100644
> --- a/target/s390x/cpu_models.c
> +++ b/target/s390x/cpu_models.c
> @@ -660,7 +660,6 @@ static void check_compatibility(const S390CPUModel 
> *max_model,
>   static S390CPUModel *get_max_cpu_model(Error **errp)
>  {

I have for whatever reasons problems with this patch. Looks like you pasted
it into thunderbird or something like that and the whitespaces look mangled,
e.g. look at the indentation of static vs {.

> -#ifndef CONFIG_USER_ONLY
>  static S390CPUModel max_model;
>  static bool cached;
>  @@ -680,7 +679,6 @@ static S390CPUModel *get_max_cpu_model(Error **errp)
>  cached = true;
>  return &max_model;
>  }
> -#endif
>  return NULL;
>  }
>  -- 2.1.4
> 
>

Re: [Qemu-devel] [PATCH 10/17] migration: create ram_multifd_page

2017-01-30 Thread Juan Quintela

"Dr. David Alan Gilbert"  wrote:
> * Juan Quintela (quint...@redhat.com) wrote:
>> The function still don't use multifd, but we have simplified
>> ram_save_page, xbzrle and RDMA stuff is gone.  We have added a new
>> counter and a new flag for this type of pages.
>> 
>> Signed-off-by: Juan Quintela 

>> --- a/migration/ram.c
>> +++ b/migration/ram.c
>> @@ -61,6 +61,7 @@ static uint64_t bitmap_sync_count;
>>  #define RAM_SAVE_FLAG_XBZRLE   0x40
>>  /* 0x80 is reserved in migration.h start with 0x100 next */
>>  #define RAM_SAVE_FLAG_COMPRESS_PAGE0x100
>> +#define RAM_SAVE_FLAG_MULTIFD_PAGE 0x200
>
> I think a similar reminder from the last iteration of this patch;
> I think we're out of bits here - I'm not sure if 0x200 is even
> available.

In previous iteration, I used *two* bits.  As per your recomendation, I
"reused" and old one for doing the synchronization.


Later, Juan.

[Qemu-devel] [PATCH 1/1] mirror: restore offset after zeroing out the image

2017-01-30 Thread Denis V. Lunev

If explicit zeroing out before mirroring is required for the target image,
it moves the block job offset counter to EOF, then offset and len counters
count the image size twice.

There is no harm but confusing stats (e.g. for 1G image the completion
counter starts from 1G and increases to 2G)

The patch fixed that problem by resetting the offset counter.

Signed-off-by: Anton Nefedov 
Signed-off-by: Denis V. Lunev 
CC: Jeff Cody 
CC: Kevin Wolf 
CC: Max Reitz 
---
 block/mirror.c | 3 +++
 tests/qemu-iotests/094.out | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 301ba92..94915e8 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -587,6 +587,9 @@ static int coroutine_fn mirror_dirty_init(MirrorBlockJob *s)
 }
 
 mirror_wait_for_all_io(s);
+
+/* offset has moved to EOF, restore it */
+s->common.offset = 0;
 }
 
 /* First part, loop on the sectors and initialize the dirty bitmap.  */
diff --git a/tests/qemu-iotests/094.out b/tests/qemu-iotests/094.out
index b66dc07..522a20c 100644
--- a/tests/qemu-iotests/094.out
+++ b/tests/qemu-iotests/094.out
@@ -3,9 +3,9 @@ Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 Formatting 'TEST_DIR/source.IMGFMT', fmt=IMGFMT size=67108864
 {"return": {}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 67108864, "offset": 
67108864, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "src", "len": 0, "offset": 0, "speed": 0, 
"type": "mirror"}}
 {"return": {}}
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 67108864, "offset": 
67108864, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "src", "len": 0, "offset": 0, 
"speed": 0, "type": "mirror"}}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"SHUTDOWN"}
 *** done
-- 
2.7.4

Re: [Qemu-devel] [PULL 00/22] target-arm queue

2017-01-30 Thread Peter Maydell

On 27 January 2017 at 15:31, Peter Maydell  wrote:
> ARM queue; the bulk of this is M profile bugfixes.
>
> thanks
> -- PMM

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PATCH] sd: sdhci: check data length during dma_memory_read

2017-01-30 Thread Peter Maydell

On 30 January 2017 at 06:47, P J P  wrote:
> From: Prasad J Pandit 
>
> While doing multi block SDMA transfer in routine
> 'sdhci_sdma_transfer_multi_blocks', the 's->fifo_buffer' starting
> index 'begin' and data length 's->data_count' could end up to be same.
> This could lead to an OOB access issue. Correct transfer data length
> to avoid it.
>
> Reported-by: Jiang Xin 
> Signed-off-by: Prasad J Pandit 
> ---
>  hw/sd/sdhci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/hw/sd/sdhci.c b/hw/sd/sdhci.c
> index 01fbf22..5bd5ab6 100644
> --- a/hw/sd/sdhci.c
> +++ b/hw/sd/sdhci.c
> @@ -536,7 +536,7 @@ static void sdhci_sdma_transfer_multi_blocks(SDHCIState 
> *s)
>  boundary_count -= block_size - begin;
>  }
>  dma_memory_read(&address_space_memory, s->sdmasysad,
> -&s->fifo_buffer[begin], s->data_count);
> +&s->fifo_buffer[begin], s->data_count - begin);
>  s->sdmasysad += s->data_count - begin;
>  if (s->data_count == block_size) {
>  for (n = 0; n < block_size; n++) {

Reviewed-by: Peter Maydell 
Cc: qemu-sta...@nongnu.org

thanks
-- PMM

Re: [Qemu-devel] [PATCH 0/2] tcg/i386 guest_base improvements

2017-01-30 Thread Alex Bennée


Richard Henderson  writes:

> The first change does two things: (1) improve bsd-user so that it
> doesn't continually reload guest_base into a temp register and
> (2) extract the bulk of the guest_base logic to a routine that
> is run once at startup.
>
> The second change adds segmentation support to 32-bit linux.  There,
> if we're using a guest base, we can save 3 bytes per memory op by
> using a segment override.  In addition, if we're using a reserved_va,
> we can set up the segment such that guest memory references are
> constrained by the segment.
>
> Comments?

I'm not sure how to best review this given its fairly low level
x86 stuff. Do you have any numbers to show how this improves things?

>
>
> r~
>
>
> Richard Henderson (2):
>   tcg/i386: Reserve register for guest_base if a segment isn't available
>   tcg/i386: Use segment for 32-bit guest base on linux
>
>  tcg/i386/tcg-target.inc.c | 181 
> +-
>  1 file changed, 100 insertions(+), 81 deletions(-)


--
Alex Bennée

Re: [Qemu-devel] [RFC 3/4] hw/intc/arm_gicv3_its: Implement state save/restore

2017-01-30 Thread Auger Eric

Hi Juan,

On 30/01/2017 10:15, Juan Quintela wrote:
> Eric Auger  wrote:
>> We need to handle both registers and ITS tables. While
>> register handling is standard, ITS table handling is more
>> challenging since the kernel API is devised so that the
>> tables are flushed into guest RAM and not in vmstate buffers.
>>
>> Flushing the ITS tables on device pre_save() is too late
>> since the guest RAM had already been saved at this point.
>>
>> Table flushing needs to happen when we are sure the vcpus
>> are stopped and before the last dirty page saving. The
>> right point is RUN_STATE_FINISH_MIGRATE but sometimes the
>> VM gets stopped before migration launch so let's simply
>> flush the tables each time the VM gets stopped.
>>
>> For regular ITS registers we just can use vmstate pre_save
>> and post_load callbacks.
>>
>> Signed-off-by: Eric Auger 
> 
> Hi
> 
> 
>> + * vm_change_state_handler - VM change state callback aiming at flushing
>> + * ITS tables into guest RAM
>> + *
>> + * The tables get flushed to guest RAM whenever the VM gets stopped.
>> + */
>> +static void vm_change_state_handler(void *opaque, int running,
>> +RunState state)
>> +{
>> +GICv3ITSState *s = (GICv3ITSState *)opaque;
> 
> Cast is unneeded.
> 
>> +
>> +if (running) {
>> +return;
>> +}
>> +kvm_device_access(s->dev_fd, KVM_DEV_ARM_VGIC_GRP_ITS_TABLES,
>> +  0, NULL, false);
> 
> As you are adding it to do everytime that we stop the guest, how
> expensive/slow is that?

This is highly dependent on the number of devices using MSIs and number
of allocated MSIs on guest. The number of bytes to transfer basically is:

(#nb_vcpus + #nb_devices_using_MSI_on_guest  +  2 *
nb_allocated_guest_MSIs bytes ) * 8 bytes

So I would say < 10 kB in real life case. In my virtio-pci test case it
is just 440 Bytes.

For live migration I could hook a callback at RUN_STATE_FINISH_MIGRATE.
However this does not work with virsh save/restore use case since the
notifier is not called (the VM being already paused), hence that choice.

Thanks

Eric

> 
> Thanks, Juan.
>

[Qemu-devel] [PATCH] spice: wakeup QXL worker to pick up mouse changes

2017-01-30 Thread Marc-André Lureau

Without it, server-mode mouse is "slow" to update position: QXL will
wait until new display commands come. This is very visible with
virtio-gpu.

Signed-off-by: Marc-André Lureau 
---
 ui/spice-display.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ui/spice-display.c b/ui/spice-display.c
index 5e6f78a219..64e472eeb0 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -769,6 +769,7 @@ static void display_mouse_set(DisplayChangeListener *dcl,
 g_free(ssd->ptr_move);
 ssd->ptr_move = qemu_spice_create_cursor_update(ssd, NULL, on);
 qemu_mutex_unlock(&ssd->lock);
+qemu_spice_wakeup(ssd);
 }
 
 static void display_mouse_define(DisplayChangeListener *dcl,
@@ -787,6 +788,7 @@ static void display_mouse_define(DisplayChangeListener *dcl,
 g_free(ssd->ptr_define);
 ssd->ptr_define = qemu_spice_create_cursor_update(ssd, c, 0);
 qemu_mutex_unlock(&ssd->lock);
+qemu_spice_wakeup(ssd);
 }
 
 static const DisplayChangeListenerOps display_listener_ops = {
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH] io: fix decoding when multiple websockets frames arrive at once

2017-01-30 Thread Daniel P. Berrange

The qio_channel_websock_read_wire() method will read upto 4096
bytes off the socket and then decode the websockets header and
payload. The code was only decoding a single websockets frame,
even if the buffered data contained multiple frames. This meant
that decoding of subsequent frames was delayed until further
input arrived on the socket. This backlog of delayed frames
gets worse & worse over time.

Symptom was that when connecting to the VNC server via the
built-in websockets server, mouse/keyboard interaction would
start out fine, but slowly get more & more delayed until it
was unusable.

Signed-off-by: Daniel P. Berrange 
---
 io/channel-websock.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/io/channel-websock.c b/io/channel-websock.c
index e47279a..a06a4a8 100644
--- a/io/channel-websock.c
+++ b/io/channel-websock.c
@@ -570,21 +570,24 @@ static ssize_t 
qio_channel_websock_read_wire(QIOChannelWebsock *ioc,
 ioc->encinput.offset += ret;
 }
 
-if (ioc->payload_remain == 0) {
-ret = qio_channel_websock_decode_header(ioc, errp);
+while (ioc->encinput.offset != 0) {
+if (ioc->payload_remain == 0) {
+ret = qio_channel_websock_decode_header(ioc, errp);
+if (ret < 0) {
+return ret;
+}
+if (ret == 0) {
+ioc->io_eof = TRUE;
+break;
+}
+}
+
+ret = qio_channel_websock_decode_payload(ioc, errp);
 if (ret < 0) {
 return ret;
 }
-if (ret == 0) {
-return 0;
-}
 }
-
-ret = qio_channel_websock_decode_payload(ioc, errp);
-if (ret < 0) {
-return ret;
-}
-return ret;
+return 1;
 }
 
 
@@ -642,9 +645,6 @@ static gboolean qio_channel_websock_flush(QIOChannel *ioc,
 if (ret < 0) {
 goto cleanup;
 }
-if (ret == 0) {
-wioc->io_eof = TRUE;
-}
 }
 
  cleanup:
-- 
2.9.3

Re: [Qemu-devel] QEMU websockets support is laggy?

2017-01-30 Thread Daniel P. Berrange

On Fri, Jan 27, 2017 at 06:08:20PM +, Daniel P. Berrange wrote:
> On Fri, Jan 27, 2017 at 09:35:38AM +, Daniel P. Berrange wrote:
> > On Tue, Jan 24, 2017 at 05:02:25PM -0500, Brian Rak wrote:
> > > We've been considering switching over to using qemu's built in websockets
> > > support (to avoid the overhead of needing websockify running).  We've been
> > > seeing very poor performance after the switch (it takes the console 4-5
> > > seconds to update after pressing a key).  So far, I haven't been able to
> > > find any indication of why this is happening.  The exact same 
> > > configuration
> > > works perfectly when running with websockify, but laggy when hitting qemu
> > > directly.
> > > 
> > > I've tried a few things (disabling encryption, bypassing our usual nginx
> > > proxy, even connecting via a ssh tunnel), and haven't made any sort of
> > > progress here.  Has anyone else seen this?  Any suggestions as to where I
> > > should start looking?
> > 
> > Can you clarify the exact setup you have ?  As mentioned on IRC, I don't
> > see any degradation in performance betweeen builtin websockets vs a
> > websockets proxy - if anything the builtin websockets is marginally less
> > laggy.  I was connecting over TCP localhost, however, so would not see
> > any effects of network latency. My test was QEMU git master, with noVNC
> > git master.
> 
> It turns out that to see the problem you have a wait a while - the
> connection is initially fine, but gets worse over time. Using a TCP
> connection seems to make it get worse more quickly.
> 
> After a painful debugging session I've discovered the problem is that
> QEMU is reading data off the socket, decoding a websockets packet and
> then processing it. The problem is if the read off the socket gets
> multiple websockets packets at once, it only decodes the first packet.
> The remaining packets aren't decoded until more data arrives on the
> socket. This gets progressively worse & worse. I'll send a patch for
> this next week...

I've just sent a patch & would appreciate your testing to confirm that
it fixes the problem for you too.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://entangle-photo.org   -o-http://search.cpan.org/~danberr/ :|

Re: [Qemu-devel] Towards an ivshmem 2.0?

2017-01-30 Thread Stefan Hajnoczi

On Sun, Jan 29, 2017 at 12:56:23PM +0100, msuchanek wrote:
> On 2017-01-17 10:59, Stefan Hajnoczi wrote:
> > On Mon, Jan 16, 2017 at 02:10:17PM +0100, Jan Kiszka wrote:
> > > On 2017-01-16 13:41, Marc-André Lureau wrote:
> > > > On Mon, Jan 16, 2017 at 12:37 PM Jan Kiszka  > > > > wrote:
> > > > So, this is our proposal. Would be great to hear some opinions if 
> > > > you
> > > > see value in adding support for such an "ivshmem 2.0" device to 
> > > > QEMU as
> > > > well and expand its ecosystem towards Linux upstream, maybe also 
> > > > DPDK
> > > > again. If you see problems in the new design /wrt what QEMU 
> > > > provides so
> > > > far with its ivshmem device, let's discuss how to resolve them. 
> > > > Looking
> > > > forward to any feedback!
> > > >
> > > >
> > > > My feeling is that ivshmem is not being actively developped in qemu, but
> > > > rather virtio-based solutions (vhost-pci for vm2vm).
> > > 
> > > As pointed out, for us it's most important to keep the design simple -
> > > even at the price of "reinventing" some drivers for upstream (at
> > > least,
> > > we do not need two sets of drivers because our interface is fully
> > > symmetric). I don't see yet how vhost-pci could achieve the same, but
> > > I'm open to learn more!
> > 
> > The concept of symmetry is nice but only applies for communications
> > channels like networking and serial.
> > 
> > It doesn't apply for I/O that is fundamentally asymmetric like disk I/O.
> > 
> > I just wanted to point this out because lack symmetry has also bothered
> > me about virtio but it's actually impossible to achieve it for all
> > device types.
> > 
> 
> What's asymetric about storage? IIRC both SCSI and Firewire which can be
> used for storage are symmetric. All asymmetry only comes from usage
> convention or less capable buses like IDE/SATA.

I'll also add Intel NVMe and virtio-blk to the list of interfaces that
are not symmetric.

Even for SCSI, separate roles for initiator and target are central to
the SCSI Architecture Model.  The consequence is that hardware
interfaces and software stacks are not symmetric.  For example, the
Linux SCSI target only supports a small set of FC HBAs with explicit
target mode support rather than all SCSI HBAs.

Intuitively this makes sense - if the I/O has clear "client" and
"server" roles then why should both sides implement both roles?  It adds
cost and complexity for no benefit.

The same goes for other device types like graphics cards.  They are
inherently asymmetric.  Only one side has the actual hardware to perform
the I/O so it doesn't make sense to be symmetric.

You can pretend they are symmetric by restricting the hardware interface
and driver to just message passing.  Then another layer of software
handles the asymmetric behavior.  But then you may as well use iSCSI,
VNC, etc and not have a hardware interface for disk and graphics.

Stefan

signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH Risu 1/2] risu_ppc64: Fix Risu to run under qemu linux user

2017-01-30 Thread Alex Bennée


Jose Ricardo Ziviani  writes:

> Qemu linux-user doesn't fill uc_mcontext completely like full emul.
> does.

Are you going to submit a fix for QEMU for this? Is there a reason it
doesn't do it correctly?

> For instance, uc->uc_mcontext.regs->nip is an invalid so this
> commit replaces it by uc->uc_mcontext.gp_regs[PT_NIP]
>
> Signed-off-by: Jose Ricardo Ziviani 
> ---
>  risu_ppc64le.c |  2 +-
>  risu_reginfo_ppc64le.c | 11 ++-
>  test_ppc64le.s | 20 +---
>  3 files changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/risu_ppc64le.c b/risu_ppc64le.c
> index 9c1fafd..773d14c 100644
> --- a/risu_ppc64le.c
> +++ b/risu_ppc64le.c
> @@ -27,7 +27,7 @@ uint8_t apprentice_memblock[MEMBLOCKLEN];
>  void advance_pc(void *vuc)
>  {
>  ucontext_t *uc = (ucontext_t*)vuc;
> -uc->uc_mcontext.regs->nip += 4;
> +uc->uc_mcontext.gp_regs[PT_NIP] += 4;
>  }
>
>  void set_x0(void *vuc, uint64_t x0)
> diff --git a/risu_reginfo_ppc64le.c b/risu_reginfo_ppc64le.c
> index 7a54eab..4dc509c 100644
> --- a/risu_reginfo_ppc64le.c
> +++ b/risu_reginfo_ppc64le.c
> @@ -28,8 +28,9 @@ void reginfo_init(struct reginfo *ri, ucontext_t *uc)
>  int i;
>  memset(ri, 0, sizeof(*ri));
>
> -ri->faulting_insn = *((uint32_t *)uc->uc_mcontext.regs->nip);
> -ri->nip = uc->uc_mcontext.regs->nip - image_start_address;
> +ri->faulting_insn = *((uint32_t *)uc->uc_mcontext.gp_regs[PT_NIP]);
> +ri->prev_insn = *((uint32_t *)(uc->uc_mcontext.gp_regs[PT_NIP] - 4));
> +ri->nip = uc->uc_mcontext.gp_regs[PT_NIP] - image_start_address;
>
>  for (i = 0; i < NGREG; i++) {
>  ri->gregs[i] = uc->uc_mcontext.gp_regs[i];
> @@ -105,9 +106,9 @@ void reginfo_dump(struct reginfo *ri, int is_master)
>  {
>  int i;
>  if (is_master) {
> -fprintf(stderr, "  faulting insn \e[1;101;37m0x%x\e[0m\n", 
> ri->faulting_insn);
> -fprintf(stderr, "  prev insn \e[1;101;37m0x%x\e[0m\n", 
> ri->prev_insn);
> -fprintf(stderr, "  prev addr \e[1;101;37m0x%" PRIx64 
> "\e[0m\n\n", ri->prev_addr);
> +fprintf(stderr, "  faulting insn 0x%x\n", ri->faulting_insn);
> +fprintf(stderr, "  prev insn 0x%x\n", ri->prev_insn);
> +fprintf(stderr, "  prev addr0x%" PRIx64 "\n\n", ri->nip);
>  }
>
>  for (i = 0; i < 16; i++) {
> diff --git a/test_ppc64le.s b/test_ppc64le.s
> index 4321751..4af770c 100644
> --- a/test_ppc64le.s
> +++ b/test_ppc64le.s
> @@ -12,20 +12,18 @@
>   
> */
>
>  /* Initialise the gp regs */
> -li 0,0
> -li 1,1
> -li 2,2
> -li 3,3
> -li 4,4
> -li 5,5
> -li 6,6
> -li 7,7
> -li 8,8
> -li 9,9
> +li 0, 0
> +li 2, 2
> +li 3, 3
> +li 4, 4
> +li 5, 5
> +li 6, 6
> +li 7, 7
> +li 8, 8
> +li 9, 9
>  li 10, 10
>  li 11, 11
>  li 12, 12
> -li 13, 13
>  li 14, 14
>  li 15, 15
>  li 16, 16


--
Alex Bennée

Re: [Qemu-devel] [PATCH Risu 1/2] risu_ppc64: Fix Risu to run under qemu linux user

2017-01-30 Thread Peter Maydell

On 30 January 2017 at 02:47, Jose Ricardo Ziviani
 wrote:
> Qemu linux-user doesn't fill uc_mcontext completely like full emul. does.
> For instance, uc->uc_mcontext.regs->nip is an invalid so this
> commit replaces it by uc->uc_mcontext.gp_regs[PT_NIP]

It's not clear to me from this commit message whether this is
a bug in QEMU's userspace emulation which this is trying to work
around (in which case we should just fix it in QEMU), or a
bug in risu where we were incorrectly relying on something the
kernel doesn't actually guarantee. Which is it?

Also, looking at the kernel source and headers as far
as I can see uc_context.regs is a pointer set up such that
uc->uc_mcontext.regs->nip is pointing at the same bit of
memory where uc->uc_mcontext.gp_regs[PT_NIP] is,
and the QEMU code does similar, so I don't see how you can
get two different values from the two things.

(It is certainly the case that risu is quite good at exercising
odd corner cases of the signal handling code in QEMU which most
normal programs don't care about...)

> Signed-off-by: Jose Ricardo Ziviani 
> ---
>  risu_ppc64le.c |  2 +-
>  risu_reginfo_ppc64le.c | 11 ++-
>  test_ppc64le.s | 20 +---
>  3 files changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/risu_ppc64le.c b/risu_ppc64le.c
> index 9c1fafd..773d14c 100644
> --- a/risu_ppc64le.c
> +++ b/risu_ppc64le.c
> @@ -27,7 +27,7 @@ uint8_t apprentice_memblock[MEMBLOCKLEN];
>  void advance_pc(void *vuc)
>  {
>  ucontext_t *uc = (ucontext_t*)vuc;
> -uc->uc_mcontext.regs->nip += 4;
> +uc->uc_mcontext.gp_regs[PT_NIP] += 4;
>  }
>
>  void set_x0(void *vuc, uint64_t x0)
> diff --git a/risu_reginfo_ppc64le.c b/risu_reginfo_ppc64le.c
> index 7a54eab..4dc509c 100644
> --- a/risu_reginfo_ppc64le.c
> +++ b/risu_reginfo_ppc64le.c
> @@ -28,8 +28,9 @@ void reginfo_init(struct reginfo *ri, ucontext_t *uc)
>  int i;
>  memset(ri, 0, sizeof(*ri));
>
> -ri->faulting_insn = *((uint32_t *)uc->uc_mcontext.regs->nip);
> -ri->nip = uc->uc_mcontext.regs->nip - image_start_address;
> +ri->faulting_insn = *((uint32_t *)uc->uc_mcontext.gp_regs[PT_NIP]);
> +ri->prev_insn = *((uint32_t *)(uc->uc_mcontext.gp_regs[PT_NIP] - 4));
> +ri->nip = uc->uc_mcontext.gp_regs[PT_NIP] - image_start_address;
>
>  for (i = 0; i < NGREG; i++) {
>  ri->gregs[i] = uc->uc_mcontext.gp_regs[i];
> @@ -105,9 +106,9 @@ void reginfo_dump(struct reginfo *ri, int is_master)
>  {
>  int i;
>  if (is_master) {
> -fprintf(stderr, "  faulting insn \e[1;101;37m0x%x\e[0m\n", 
> ri->faulting_insn);
> -fprintf(stderr, "  prev insn \e[1;101;37m0x%x\e[0m\n", 
> ri->prev_insn);
> -fprintf(stderr, "  prev addr \e[1;101;37m0x%" PRIx64 
> "\e[0m\n\n", ri->prev_addr);
> +fprintf(stderr, "  faulting insn 0x%x\n", ri->faulting_insn);
> +fprintf(stderr, "  prev insn 0x%x\n", ri->prev_insn);
> +fprintf(stderr, "  prev addr0x%" PRIx64 "\n\n", ri->nip);

This seems to be fixing two unrelated things:
 * escape sequences in the output (which we should indeed get rid of)
 * printing an uninitialized and unused prev_addr field
   (which we should fix and also drop the unneeded field from reginfo)

>  }
>
>  for (i = 0; i < 16; i++) {
> diff --git a/test_ppc64le.s b/test_ppc64le.s
> index 4321751..4af770c 100644
> --- a/test_ppc64le.s
> +++ b/test_ppc64le.s
> @@ -12,20 +12,18 @@
>   
> */
>
>  /* Initialise the gp regs */
> -li 0,0
> -li 1,1
> -li 2,2
> -li 3,3
> -li 4,4
> -li 5,5
> -li 6,6
> -li 7,7
> -li 8,8
> -li 9,9
> +li 0, 0
> +li 2, 2
> +li 3, 3
> +li 4, 4
> +li 5, 5
> +li 6, 6
> +li 7, 7
> +li 8, 8
> +li 9, 9

This appears to be unrelated changes to whitespace...

>  li 10, 10
>  li 11, 11
>  li 12, 12
> -li 13, 13
>  li 14, 14
>  li 15, 15
>  li 16, 16

thanks
-- PMM

Re: [Qemu-devel] [PULL 15/18] spapr: CPU hot unplug support

2017-01-30 Thread Igor Mammedov

On Thu, 26 Jan 2017 19:56:35 +0530
Bharata B Rao  wrote:

> On Thu, Jan 26, 2017 at 12:32:58PM +0100, Igor Mammedov wrote:
> > On Fri, 17 Jun 2016 16:36:36 +1000
> > David Gibson  wrote:
> > 
> > > From: Bharata B Rao 
> > > 
> > > Remove the CPU core device by removing the underlying CPU thread devices.
> > > Hot removal of CPU for sPAPR guests is achieved by sending the hot unplug
> > > notification to the guest. Release the vCPU object after CPU hot unplug so
> > > that vCPU fd can be parked and reused.
> > > 
> > > Signed-off-by: Bharata B Rao 
> > > Signed-off-by: David Gibson 
> > [...]
> > 
> > 
> > Bharata,
> > 
> > Here is some notes I've made while auditing spapr cpu hotplug code.
> >   
> > spapr_core_release() should be spapr_core_unrealize()
> > except of machine related
> >  spapr->cores[cc->core_id / smt] = NULL;
> > which should go to spapr_core_unplug()
> 
> There were some issues in calling cpu_remove_[sync] from unrealize
> path. I know that x86 does that way. let me remember and get back on this.
on the first glance it doesn't look like there should be issues with
making it spapr_core_unrealize(), but since it's way out of scope
numa rework I'd leave 'fixing' it upto you. 

> 
> > 
> > > +static void spapr_core_release(DeviceState *dev, void *opaque)
> > > +{
> > > +sPAPRCPUCore *sc = SPAPR_CPU_CORE(OBJECT(dev));
> > > +const char *typename = object_class_get_name(sc->cpu_class);
> > > +size_t size = object_type_get_instance_size(typename);
> > > +sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine());
> > > +sPAPRCPUCore *core = SPAPR_CPU_CORE(OBJECT(dev));
> > > +CPUCore *cc = CPU_CORE(dev);
> > > +int smt = kvmppc_smt_threads();
> > > +int i;
> > > +
> > > +for (i = 0; i < cc->nr_threads; i++) {
> > > +void *obj = sc->threads + i * size;
> > > +DeviceState *dev = DEVICE(obj);
> > > +CPUState *cs = CPU(dev);
> > > +PowerPCCPU *cpu = POWERPC_CPU(cs);
> > > +
> > > +spapr_cpu_destroy(cpu);
> > > +cpu_remove_sync(cs);
> > > +object_unparent(obj);
> > > +}
> > > +
> > > +spapr->cores[cc->core_id / smt] = NULL;
> > > +
> > > +g_free(core->threads);
> > > +object_unparent(OBJECT(dev));
> > > +}
> > > +
> > 
> > spapr_core_[un]plug() functions belong to machine code and should
> > be in hw/ppc/spapr.c
> 
> That's how the series started, but eventually we consolidated all
> core related routines in spapr_cpu_core.c

Since spapr_core_[un]plug() manage spapr machine state and not internal
core state, I'd like to move them close to other machine code (spapr.c)
if you don't mind.

> 
> > 
> > > +void spapr_core_unplug(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > +   Error **errp)
> > > +{
> > > +sPAPRCPUCore *core = SPAPR_CPU_CORE(OBJECT(dev));
> > > +PowerPCCPU *cpu = POWERPC_CPU(core->threads);
> > > +int id = ppc_get_vcpu_dt_id(cpu);
> > > +sPAPRDRConnector *drc =
> > > +spapr_dr_connector_by_id(SPAPR_DR_CONNECTOR_TYPE_CPU, id);
> > > +sPAPRDRConnectorClass *drck;
> > > +Error *local_err = NULL;
> > > +
> > > +g_assert(drc);
> > > +
> > > +drck = SPAPR_DR_CONNECTOR_GET_CLASS(drc);
> > > +drck->detach(drc, dev, spapr_core_release, NULL, &local_err);
> > 
> > Could you explain call flow during cpu unplug?
> 
> In response to unplug request, spapr_core_unplug() gets called which
> does a detach() on the associated DRC object. The detach() registers
> a callback (spapr_core_release)  and signals the guest about the unplug
> request.
> 
> When the guest is ready to let go of the CPU core, DRC subsystem ends up
> calling the callback spapr_core_release. For each of the CPU thread objects
> of the core, spapr_core_release will call cpu_remove_sync() and waits
> for the CPU to be really removed. cpu_remove will result in CPU unrealize
> function being called (ppc_cpu_unrealizefn) for each of the removed
> CPU.
> 
> After we are done waiting for all the threads' removal, the core object is
> ready for removal.
> 
> > 
> > My expectations were that unplug_request() handler asks for CPU removal
> > and unplug() handler removes CPU.
> > It's obviously messed up somehow.
> 
> When we did CPU unplug, we didn't really implement ->unplug_request() for
> sPAPR. It was added later when memory unplug came in.
It ended up that spapr_core_unplug() is called from both ->unplug_request()
and ->unplug(). Where ->unplug() is dead path that's never called.
I'll send patches to fix hot-unlpug flow to conform to generic hotplug
pattern
   ->unplug_request() -> register callback
and
   callback ->unplug() -> release_core()

> 
> Regards,
> Bharata.
> 
>

[Qemu-devel] [PATCH v3 2/3] qemu-io: Add regression tests

2017-01-30 Thread Nir Soffer

From: Nir Soffer 

Add regression tests checking that qemu-io fail with non-zero exit code
when reading non-existing file or using the wrong format.

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/173 | 59 ++
 tests/qemu-iotests/173.out |  9 +++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 69 insertions(+)
 create mode 100755 tests/qemu-iotests/173
 create mode 100644 tests/qemu-iotests/173.out

diff --git a/tests/qemu-iotests/173 b/tests/qemu-iotests/173
new file mode 100755
index 000..1d1fd6d
--- /dev/null
+++ b/tests/qemu-iotests/173
@@ -0,0 +1,59 @@
+#!/bin/bash
+#
+# Test that qemu-io fail with non-zero exit code
+#
+# Copyright (C) 2017 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=nir...@gmail.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+_cleanup()
+{
+   _cleanup_test_img
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+
+
+size=256K
+_make_test_img $size
+
+echo
+echo "== reading wrong format should fail =="
+$QEMU_IO -f qcow2 -c "read 0 $size" "$TEST_IMG" 2>&1 | _filter_testdir
+test "${PIPESTATUS[0]}" -eq 1 || _fail "did not fail"
+
+echo
+echo "== reading missing file should fail =="
+$QEMU_IO -c "read 0 $size" "$TEST_DIR/missing" 2>&1 | _filter_testdir
+test "${PIPESTATUS[0]}" -eq 1 || _fail "did not fail"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/173.out b/tests/qemu-iotests/173.out
new file mode 100644
index 000..47012a3
--- /dev/null
+++ b/tests/qemu-iotests/173.out
@@ -0,0 +1,9 @@
+QA output created by 173
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=262144
+
+== reading wrong format should fail ==
+can't open device TEST_DIR/t.raw: Image is not in qcow2 format
+
+== reading missing file should fail ==
+can't open device TEST_DIR/missing: Could not open 'TEST_DIR/missing': No such 
file or directory
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 866c1a0..069a5f3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -165,3 +165,4 @@
 170 rw auto quick
 171 rw auto quick
 172 auto
+173 auto
-- 
2.9.3

[Qemu-devel] [PATCH v3 1/3] qemu-io: Return non-zero exit code on failure

2017-01-30 Thread Nir Soffer

From: Nir Soffer 

The result of openfile was not checked, leading to failure deep in the
actual command with confusing error message, and exiting with exit code 0.

Here is a simple example - trying to read with the wrong format:

$ touch file
$ qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
no file open, try 'help open'
0

With this patch, we fail earlier with exit code 1:

$ ./qemu-io -f qcow2 -c 'read -P 1 0 1024' file; echo $?
can't open device file: Image is not in qcow2 format
1

Signed-off-by: Nir Soffer 
Reviewed-by: Eric Blake 
Reviewed-by: Fam Zheng 
---

Changes since v2:
- Adding missing signed-off-by
- Fix tests expecting the wrong output

 qemu-io.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 23a229f..427cbae 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -595,13 +595,17 @@ int main(int argc, char **argv)
 exit(1);
 }
 opts = qemu_opts_to_qdict(qopts, NULL);
-openfile(NULL, flags, writethrough, opts);
+if (openfile(NULL, flags, writethrough, opts)) {
+exit(1);
+}
 } else {
 if (format) {
 opts = qdict_new();
 qdict_put(opts, "driver", qstring_from_str(format));
 }
-openfile(argv[optind], flags, writethrough, opts);
+if (openfile(argv[optind], flags, writethrough, opts)) {
+exit(1);
+}
 }
 }
 command_loop();
-- 
2.9.3

[Qemu-devel] [PATCH v3 3/3] qemu-io: Fix tests expecting the wrong output

2017-01-30 Thread Nir Soffer

From: Nir Soffer 

Many tests expected the wrong behavior when qemu-io call into the
command with after failing to open the file, writing this error:

no file open, try 'help open'

Now that we fail immediately when opening a file fails, this error does
not exist in the output; remove it from tests output.

Tested using:

./check 059 -vmdk (unrelated failure)
./check 070 -vhdx
./check 075 -cloop
./check 076 -parallels
./check 078 -bochs
./check 080 -qcow2
./check 083 -nbd
./check 088 -vpc
./check 092 -qcow
./check 116 -qed
./check 131 -parallels
./check 140 -raw
./check 140 -qcow2
./check -raw
./check -qcow2

Signed-off-by: Nir Soffer 
---
 tests/qemu-iotests/059.out |  3 ---
 tests/qemu-iotests/070.out |  1 -
 tests/qemu-iotests/075.out |  7 ---
 tests/qemu-iotests/076.out |  3 ---
 tests/qemu-iotests/078.out |  6 --
 tests/qemu-iotests/080.out | 18 --
 tests/qemu-iotests/083.out | 17 -
 tests/qemu-iotests/088.out |  6 --
 tests/qemu-iotests/092.out | 12 
 tests/qemu-iotests/116.out |  7 ---
 tests/qemu-iotests/131.out |  1 -
 tests/qemu-iotests/140.out |  1 -
 12 files changed, 82 deletions(-)

diff --git a/tests/qemu-iotests/059.out b/tests/qemu-iotests/059.out
index 678adb4..898b528 100644
--- a/tests/qemu-iotests/059.out
+++ b/tests/qemu-iotests/059.out
@@ -3,17 +3,14 @@ QA output created by 059
 === Testing invalid granularity ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: Invalid granularity, image may be corrupt
-no file open, try 'help open'
 
 === Testing too big L2 table size ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: L2 table size too big
-no file open, try 'help open'
 
 === Testing too big L1 table size ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
 can't open device TEST_DIR/t.vmdk: L1 size too big
-no file open, try 'help open'
 
 === Testing monolithicFlat creation and opening ===
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2147483648 
subformat=monolithicFlat
diff --git a/tests/qemu-iotests/070.out b/tests/qemu-iotests/070.out
index 131a5b1..c269d99 100644
--- a/tests/qemu-iotests/070.out
+++ b/tests/qemu-iotests/070.out
@@ -4,7 +4,6 @@ QA output created by 070
 can't open device TEST_DIR/iotest-dirtylog-10G-4M.vhdx: VHDX image file 
'TEST_DIR/iotest-dirtylog-10G-4M.vhdx' opened read-only, but contains a log 
that needs to be replayed
 To replay the log, run:
 qemu-img check -r all 'TEST_DIR/iotest-dirtylog-10G-4M.vhdx'
- no file open, try 'help open'
 === Verify open image replays log  ===
 read 18874368/18874368 bytes at offset 0
 18 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
diff --git a/tests/qemu-iotests/075.out b/tests/qemu-iotests/075.out
index 87beae4..b234b75 100644
--- a/tests/qemu-iotests/075.out
+++ b/tests/qemu-iotests/075.out
@@ -10,29 +10,22 @@ read 512/512 bytes at offset 1048064
 
 == block_size must be a multiple of 512 ==
 can't open device TEST_DIR/simple-pattern.cloop: block_size 513 must be a 
multiple of 512
-no file open, try 'help open'
 
 == block_size cannot be zero ==
 can't open device TEST_DIR/simple-pattern.cloop: block_size cannot be zero
-no file open, try 'help open'
 
 == huge block_size ===
 can't open device TEST_DIR/simple-pattern.cloop: block_size 4294966784 must be 
64 MB or less
-no file open, try 'help open'
 
 == offsets_size overflow ===
 can't open device TEST_DIR/simple-pattern.cloop: n_blocks 4294967295 must be 
536870911 or less
-no file open, try 'help open'
 
 == refuse images that require too many offsets ===
 can't open device TEST_DIR/simple-pattern.cloop: image requires too many 
offsets, try increasing block size
-no file open, try 'help open'
 
 == refuse images with non-monotonically increasing offsets ==
 can't open device TEST_DIR/simple-pattern.cloop: offsets not monotonically 
increasing at index 1, image file is corrupt
-no file open, try 'help open'
 
 == refuse images with invalid compressed block size ==
 can't open device TEST_DIR/simple-pattern.cloop: invalid compressed block size 
at index 1, image file is corrupt
-no file open, try 'help open'
 *** done
diff --git a/tests/qemu-iotests/076.out b/tests/qemu-iotests/076.out
index 72645b2..9c66c5f 100644
--- a/tests/qemu-iotests/076.out
+++ b/tests/qemu-iotests/076.out
@@ -6,15 +6,12 @@ read 65536/65536 bytes at offset 0
 
 == Negative catalog size ==
 can't open device TEST_DIR/parallels-v1: Catalog too large
-no file open, try 'help open'
 
 == Overflow in catalog allocation ==
 can't open device TEST_DIR/parallels-v1: Catalog too large
-no file open, try 'help open'
 
 == Zero sectors per track ==
 can't open device TEST_DIR/parallels-v1: Invalid image: Zero sectors per track
-no file open, try 'help open'
 
 == Read from a valid v2 image ==
 read 65536/65536 bytes at offset 0
diff --git a/tests/qemu-iotests/078.out b/tests/qemu-iotests/078.out
index 42b8a83..c3d6aa4 100644

Re: [Qemu-devel] [PATCH v4 1/9] ACPI: Add a function for building named qword entries

2017-01-30 Thread Laszlo Ersek

On 01/27/17 19:19, Ben Warren wrote:
> 
>> On Jan 27, 2017, at 8:12 AM, Laszlo Ersek > > wrote:
>>
>> On 01/27/17 16:43, Kevin O'Connor wrote:
>>> On Fri, Jan 27, 2017 at 03:46:33PM +0100, Laszlo Ersek wrote:
 On 01/27/17 15:18, Kevin O'Connor wrote:
> If an offset is going to be added, shouldn't both a source offset and
> destination offset be used?
>
>/*
> * COMMAND_WRITE_POINTER - update a writeable file named
> * @pointer.dest_file at @pointer.dest_offset, by writing
> pointer
> * plus @pointer.src_offset to the blob originating from
> * @src_file. 1,2,4 or 8 byte unsigned write is used depending
> * on @pointer.size.
> */
>struct {
>char dest_file[BIOS_LINKER_LOADER_FILESZ];
>char src_file[BIOS_LINKER_LOADER_FILESZ];
>uint32_t src_offset, dest_offset;
>uint8_t size;
>} pointer;
>
> I doubt the offsets or size is really all that important though.

 The offset into the fw_cfg file that receives the allocation address is
 important, that allows the same file to receive several different
 addresses (for different downloaded blobs), at different offsets.

 OTOH, asking the firmware to add a constant to the address value before
 writing it to the fw_cfg file is not necessary, in my opinion. The blob
 that the firmware allocated and downloaded originates from QEMU to begin
 with, so QEMU knows its internal structure.
>>>
>>> I guess I'm missing why QEMU would want to use the same writable file
>>> for multiple pointers
>>
>> I know no specific reason; I just thought this possible generalization
>> was one of the advantages in Michael's suggestion.
>>
>>> as well as why it would want support for
>>> pointers smaller than 8 bytes in size.
>>
>> Hm, right, good point.
>>
>>> If it's because it may be
>>> easier to support an internal QEMU blob of a particular format, then
>>> adding a src_offset would facilitate that.
>>>
>>> However, if it was done so that WRITE_POINTER mimicks ADD_POINTER then
>>> that's fine too.
>>
>> That might be the main reason I guess; reading back a bit, Michael wrote
>> "... a variant of ADD_POINTER".
>>
>>> I'm okay with either format.
>>
>> I'd say let's go ahead with Michael's proposal then. Ben, are you okay
>> with that?
>>
> Yes, this seems reasonable.  If I’m interpreting this properly, it’s
> like ADD_POINTER, except that we extend the write path back to QEMU over
> DMA.  Is that about right?

Yes. Basically the command differs from ADD_POINTER in that (a) the
pointer field to patch lives in "fw_cfg space", not in guest RAM, so
rather than updating guest RAM, the firmware has to select a writeable
fw_cfg file, seek ("skip") to the specified offset, and rewrite the
field, (b) the nature of the patching is not "increment", just a plain
"overwrite".

Thanks!
Laszlo

[Qemu-devel] [PATCH RFC 00/36] 9pfs: local: fix vulnerability to symlink attacks

2017-01-30 Thread Greg Kurz

This series tries to fix CVE-2016-9602. This vulnerability affects all
accesses to the underlying filesystem in the "local" backend code.

If QEMU is started with:

-fsdev local,security_model=,path=/foo/bar

then the guest can cause QEMU to create symlinks in /foo/bar.

This causes accesses to any path /foo/bar/some/path to be unsafe, since
untrusted code within the guest (or in another guest sharing the same
virtfs folder) could change some/path to point to a random path of the
host filesystem.

The core problem is that the "local" backend relies on path-based syscalls
to access the underlying filesystem. All path-based syscalls are vulnerable
to this issue, even open(O_NOFOLLOW) or syscalls that explicitly don't
dereference symlinks, since the kernel only checks the rightmost element of
the path. Depending on the privilege level of the QEMU process, a guest can
end up opening, renaming, changing ACLs, unlinking... files on the host
filesystem.

A possible fix is to always walk paths manually with openat(O_NOFOLLOW), and
use "*at()" variants of all syscalls in the "local" backend code. This will
likely not improve performances for path-based syscalls in the guest, but I
don't see how to fix the issue without kernel support (like an O_PATHSTATIC
flag to tell the full path should not traverse any symlink for example).

A fair amount of code is shared by all security models: this series hence
starts with preparatory patches to split the code. This allows to have
patches of reasonable size, that don't affect too many code paths.

TODO:
- the accesses to metadata files of the "mapped-file" security mode also need
  to be converted

---

Greg Kurz (36):
  9pfs: local: move xattr security ops to 9p-xattr.c
  9pfs: local: split chmod operation per security model
  9pfs: local: split mknod operation per security model
  9pfs: local: split mkdir operation per security model
  9pfs: local: split open2 operation per security model
  9pfs: local: split symlink operation per security model
  9pfs: local: split mkdir operation per security model
  9pfs: local: improve error handling in link op
  9pfs: local: post link operation for mapped-file security
  v9fs: local: improve error handling in rename op
  9pfs: local: post rename operation for mapped-file security
  9pfs: local: pre remove operation for mapped-file security
  9pfs: local: pre unlikat operation for mapped-file security
  9pfs: remove side-effects in local_init()
  9pfs: remove side-effects in local_open() and local_opendir()
  9pfs: introduce openat_nofollow() helper
  9pfs: local: keep a file descriptor on the shared folder
  9pfs: local: open/opendir: don't follow symlinks
  9pfs: local: utimensat: don't follow symlinks
  9pfs: local: readlink: don't follow symlinks
  9pfs: local: truncate: don't follow symlinks
  9pfs: local: statfs: don't follow symlinks
  9pfs: local: mknod/mkdir/open2: don't follow symlinks
  9pfs: local: chmod: don't follow symlinks
  9pfs: local: symlink: don't follow symlinks
  9pfs: local: chown: don't follow symlinks
  9pfs: local: link: don't follow symlinks
  9pfs: local: rename: don't follow symlinks
  9pfs: local: remove: don't follow symlinks
  9pfs: local: unlinkat: don't follow symlinks
  9pfs: local: introduce symlink-attack safe xattr helpers
  9pfs: local: lstat: don't follow symlinks
  9pfs: local: lgetxattr: don't follow symlinks
  9pfs: local: llistxattr: don't follow symlinks
  9pfs: local: lsetxattr: don't follow symlinks
  9pfs: local: lremovexattr: don't follow symlinks


 hw/9pfs/9p-local.c  | 1319 +--
 hw/9pfs/9p-local.h  |   22 +
 hw/9pfs/9p-posix-acl.c  |   48 --
 hw/9pfs/9p-util.c   |   69 ++
 hw/9pfs/9p-util.h   |   25 +
 hw/9pfs/9p-xattr-user.c |   28 -
 hw/9pfs/9p-xattr.c  |  229 
 hw/9pfs/9p-xattr.h  |   91 +--
 hw/9pfs/Makefile.objs   |2 
 9 files changed, 1306 insertions(+), 527 deletions(-)
 create mode 100644 hw/9pfs/9p-local.h
 create mode 100644 hw/9pfs/9p-util.c
 create mode 100644 hw/9pfs/9p-util.h

--
Greg

[Qemu-devel] [PATCH RFC 04/36] 9pfs: local: split mkdir operation per security model

2017-01-30 Thread Greg Kurz

Having all security models implemented in one monolithic function is
cumbersome. Especially when the need arises to fix something in the
shared code, as it forces to change all the paths at the same time.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |  134 +---
 1 file changed, 95 insertions(+), 39 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 7f513c5728f6..0d6869123094 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -626,8 +626,8 @@ static int local_mknod(FsContext *fs_ctx, V9fsPath 
*dir_path, const char *name,
 g_assert_not_reached();
 }
 
-static int local_mkdir(FsContext *fs_ctx, V9fsPath *dir_path,
-   const char *name, FsCred *credp)
+static int local_mkdir_mapped(FsContext *fs_ctx, V9fsPath *dir_path,
+  const char *name, FsCred *credp)
 {
 char *path;
 int err = -1;
@@ -639,43 +639,16 @@ static int local_mkdir(FsContext *fs_ctx, V9fsPath 
*dir_path,
 v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
 path = fullname.data;
 
-/* Determine the security model */
-if (fs_ctx->export_flags & V9FS_SM_MAPPED) {
-buffer = rpath(fs_ctx, path);
-err = mkdir(buffer, SM_LOCAL_DIR_MODE_BITS);
-if (err == -1) {
-goto out;
-}
-credp->fc_mode = credp->fc_mode|S_IFDIR;
-err = local_set_xattr(buffer, credp);
-if (err == -1) {
-serrno = errno;
-goto err_end;
-}
-} else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-buffer = rpath(fs_ctx, path);
-err = mkdir(buffer, SM_LOCAL_DIR_MODE_BITS);
-if (err == -1) {
-goto out;
-}
-credp->fc_mode = credp->fc_mode|S_IFDIR;
-err = local_set_mapped_file_attr(fs_ctx, path, credp);
-if (err == -1) {
-serrno = errno;
-goto err_end;
-}
-} else if ((fs_ctx->export_flags & V9FS_SM_PASSTHROUGH) ||
-   (fs_ctx->export_flags & V9FS_SM_NONE)) {
-buffer = rpath(fs_ctx, path);
-err = mkdir(buffer, credp->fc_mode);
-if (err == -1) {
-goto out;
-}
-err = local_post_create_passthrough(fs_ctx, path, credp);
-if (err == -1) {
-serrno = errno;
-goto err_end;
-}
+buffer = rpath(fs_ctx, path);
+err = mkdir(buffer, SM_LOCAL_DIR_MODE_BITS);
+if (err == -1) {
+goto out;
+}
+credp->fc_mode = credp->fc_mode | S_IFDIR;
+err = local_set_xattr(buffer, credp);
+if (err == -1) {
+serrno = errno;
+goto err_end;
 }
 goto out;
 
@@ -688,6 +661,89 @@ out:
 return err;
 }
 
+static int local_mkdir_mapped_file(FsContext *fs_ctx, V9fsPath *dir_path,
+   const char *name, FsCred *credp)
+{
+char *path;
+int err = -1;
+int serrno = 0;
+V9fsString fullname;
+char *buffer = NULL;
+
+v9fs_string_init(&fullname);
+v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
+path = fullname.data;
+
+buffer = rpath(fs_ctx, path);
+err = mkdir(buffer, SM_LOCAL_DIR_MODE_BITS);
+if (err == -1) {
+goto out;
+}
+credp->fc_mode = credp->fc_mode | S_IFDIR;
+err = local_set_mapped_file_attr(fs_ctx, path, credp);
+if (err == -1) {
+serrno = errno;
+goto err_end;
+}
+goto out;
+
+err_end:
+remove(buffer);
+errno = serrno;
+out:
+g_free(buffer);
+v9fs_string_free(&fullname);
+return err;
+}
+
+static int local_mkdir_passthrough(FsContext *fs_ctx, V9fsPath *dir_path,
+   const char *name, FsCred *credp)
+{
+char *path;
+int err = -1;
+int serrno = 0;
+V9fsString fullname;
+char *buffer = NULL;
+
+v9fs_string_init(&fullname);
+v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
+path = fullname.data;
+
+buffer = rpath(fs_ctx, path);
+err = mkdir(buffer, credp->fc_mode);
+if (err == -1) {
+goto out;
+}
+err = local_post_create_passthrough(fs_ctx, path, credp);
+if (err == -1) {
+serrno = errno;
+goto err_end;
+}
+goto out;
+
+err_end:
+remove(buffer);
+errno = serrno;
+out:
+g_free(buffer);
+v9fs_string_free(&fullname);
+return err;
+}
+
+static int local_mkdir(FsContext *fs_ctx, V9fsPath *dir_path, const char *name,
+   FsCred *credp)
+{
+if (fs_ctx->export_flags & V9FS_SM_MAPPED) {
+return local_mkdir_mapped(fs_ctx, dir_path, name, credp);
+} else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
+return local_mkdir_mapped_file(fs_ctx, dir_path, name, credp);
+} else if ((fs_ctx->export_flags & V9FS_SM_PASSTHROUGH) ||
+   (fs_ctx->export_flags & V9FS_SM_NONE)) {
+ret

[Qemu-devel] [PATCH RFC 01/36] 9pfs: local: move xattr security ops to 9p-xattr.c

2017-01-30 Thread Greg Kurz

These functions are always called indirectly. It really doesn't make sense
for them to sit in a header file.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-xattr.c |   61 
 hw/9pfs/9p-xattr.h |   80 +---
 2 files changed, 75 insertions(+), 66 deletions(-)

diff --git a/hw/9pfs/9p-xattr.c b/hw/9pfs/9p-xattr.c
index 5d8595ed932a..19a2daf02f5c 100644
--- a/hw/9pfs/9p-xattr.c
+++ b/hw/9pfs/9p-xattr.c
@@ -143,6 +143,67 @@ int v9fs_remove_xattr(FsContext *ctx,
 
 }
 
+ssize_t pt_getxattr(FsContext *ctx, const char *path, const char *name,
+void *value, size_t size)
+{
+char *buffer;
+ssize_t ret;
+
+buffer = rpath(ctx, path);
+ret = lgetxattr(buffer, name, value, size);
+g_free(buffer);
+return ret;
+}
+
+int pt_setxattr(FsContext *ctx, const char *path, const char *name, void 
*value,
+size_t size, int flags)
+{
+char *buffer;
+int ret;
+
+buffer = rpath(ctx, path);
+ret = lsetxattr(buffer, name, value, size, flags);
+g_free(buffer);
+return ret;
+}
+
+int pt_removexattr(FsContext *ctx, const char *path, const char *name)
+{
+char *buffer;
+int ret;
+
+buffer = rpath(ctx, path);
+ret = lremovexattr(path, name);
+g_free(buffer);
+return ret;
+}
+
+ssize_t notsup_getxattr(FsContext *ctx, const char *path, const char *name,
+void *value, size_t size)
+{
+errno = ENOTSUP;
+return -1;
+}
+
+int notsup_setxattr(FsContext *ctx, const char *path, const char *name,
+void *value, size_t size, int flags)
+{
+errno = ENOTSUP;
+return -1;
+}
+
+ssize_t notsup_listxattr(FsContext *ctx, const char *path, char *name,
+ void *value, size_t size)
+{
+return 0;
+}
+
+int notsup_removexattr(FsContext *ctx, const char *path, const char *name)
+{
+errno = ENOTSUP;
+return -1;
+}
+
 XattrOperations *mapped_xattr_ops[] = {
 &mapped_user_xattr,
 &mapped_pacl_xattr,
diff --git a/hw/9pfs/9p-xattr.h b/hw/9pfs/9p-xattr.h
index a853ea641c0b..3f43f5153f3c 100644
--- a/hw/9pfs/9p-xattr.h
+++ b/hw/9pfs/9p-xattr.h
@@ -49,73 +49,21 @@ ssize_t v9fs_list_xattr(FsContext *ctx, const char *path, 
void *value,
 int v9fs_set_xattr(FsContext *ctx, const char *path, const char *name,
   void *value, size_t size, int flags);
 int v9fs_remove_xattr(FsContext *ctx, const char *path, const char *name);
+
 ssize_t pt_listxattr(FsContext *ctx, const char *path, char *name, void *value,
  size_t size);
-
-static inline ssize_t pt_getxattr(FsContext *ctx, const char *path,
-  const char *name, void *value, size_t size)
-{
-char *buffer;
-ssize_t ret;
-
-buffer = rpath(ctx, path);
-ret = lgetxattr(buffer, name, value, size);
-g_free(buffer);
-return ret;
-}
-
-static inline int pt_setxattr(FsContext *ctx, const char *path,
-  const char *name, void *value,
-  size_t size, int flags)
-{
-char *buffer;
-int ret;
-
-buffer = rpath(ctx, path);
-ret = lsetxattr(buffer, name, value, size, flags);
-g_free(buffer);
-return ret;
-}
-
-static inline int pt_removexattr(FsContext *ctx,
- const char *path, const char *name)
-{
-char *buffer;
-int ret;
-
-buffer = rpath(ctx, path);
-ret = lremovexattr(path, name);
-g_free(buffer);
-return ret;
-}
-
-static inline ssize_t notsup_getxattr(FsContext *ctx, const char *path,
-  const char *name, void *value,
-  size_t size)
-{
-errno = ENOTSUP;
-return -1;
-}
-
-static inline int notsup_setxattr(FsContext *ctx, const char *path,
-  const char *name, void *value,
-  size_t size, int flags)
-{
-errno = ENOTSUP;
-return -1;
-}
-
-static inline ssize_t notsup_listxattr(FsContext *ctx, const char *path,
-   char *name, void *value, size_t size)
-{
-return 0;
-}
-
-static inline int notsup_removexattr(FsContext *ctx,
- const char *path, const char *name)
-{
-errno = ENOTSUP;
-return -1;
-}
+ssize_t pt_getxattr(FsContext *ctx, const char *path, const char *name,
+void *value, size_t size);
+int pt_setxattr(FsContext *ctx, const char *path, const char *name, void 
*value,
+size_t size, int flags);
+int pt_removexattr(FsContext *ctx, const char *path, const char *name);
+
+ssize_t notsup_getxattr(FsContext *ctx, const char *path, const char *name,
+void *value, size_t size);
+int notsup_setxattr(FsContext *ctx, const char *path, const char *name,
+void *value, size_t size, int flags);
+ssize_t notsup_listxat

[Qemu-devel] [PATCH RFC 09/36] 9pfs: local: post link operation for mapped-file security

2017-01-30 Thread Greg Kurz

The link operation is really the same for the passthrough and mapped
security models. This patch simply moves the mapped-file bits to a
separate function. This will make future modifications easier.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   42 ++
 1 file changed, 30 insertions(+), 12 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index c0b1907f7901..ebc3e208efa0 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1108,6 +1108,35 @@ static int local_symlink(FsContext *fs_ctx, const char 
*oldpath,
 g_assert_not_reached();
 }
 
+static int local_post_link_mapped_file(FsContext *ctx, V9fsPath *oldpath,
+   V9fsPath *dirpath, const char *name)
+{
+int ret;
+V9fsString newpath;
+char *buffer, *buffer1;
+
+v9fs_string_init(&newpath);
+v9fs_string_sprintf(&newpath, "%s/%s", dirpath->data, name);
+
+/* Link the .virtfs_metadata files. Create the metada directory */
+ret = local_create_mapped_attr_dir(ctx, newpath.data);
+if (ret < 0) {
+goto out;
+}
+buffer = local_mapped_attr_path(ctx, oldpath->data);
+buffer1 = local_mapped_attr_path(ctx, newpath.data);
+ret = link(buffer, buffer1);
+g_free(buffer);
+g_free(buffer1);
+if (ret < 0 && errno == ENOENT) {
+ret = 0;
+}
+
+out:
+v9fs_string_free(&newpath);
+return ret;
+}
+
 static int local_link(FsContext *ctx, V9fsPath *oldpath,
   V9fsPath *dirpath, const char *name)
 {
@@ -1129,21 +1158,10 @@ static int local_link(FsContext *ctx, V9fsPath *oldpath,
 
 /* now link the virtfs_metadata files */
 if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-char *vbuffer, *vbuffer1;
-
-/* Link the .virtfs_metadata files. Create the metada directory */
-ret = local_create_mapped_attr_dir(ctx, newpath.data);
+ret = local_post_link_mapped_file(ctx, oldpath, dirpath, name);
 if (ret < 0) {
 goto err_out;
 }
-vbuffer = local_mapped_attr_path(ctx, oldpath->data);
-vbuffer1 = local_mapped_attr_path(ctx, newpath.data);
-ret = link(vbuffer, vbuffer1);
-g_free(vbuffer);
-g_free(vbuffer1);
-if (ret < 0 && errno != ENOENT) {
-goto err_out;
-}
 }
 goto out;

[Qemu-devel] [PATCH RFC 02/36] 9pfs: local: split chmod operation per security model

2017-01-30 Thread Greg Kurz

Having all security models implemented in one monolithic function is
cumbersome. Especially when the need arises to fix something in the
shared code, as it forces to change all the paths at the same time.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   46 +-
 1 file changed, 37 insertions(+), 9 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 7de07e1ba67f..73a20657f1fc 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -461,25 +461,53 @@ static ssize_t local_pwritev(FsContext *ctx, 
V9fsFidOpenState *fs,
 return ret;
 }
 
-static int local_chmod(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *credp)
+static int local_chmod_mapped(FsContext *fs_ctx, V9fsPath *fs_path,
+  FsCred *credp)
+{
+char *buffer;
+int ret = -1;
+char *path = fs_path->data;
+
+buffer = rpath(fs_ctx, path);
+ret = local_set_xattr(buffer, credp);
+g_free(buffer);
+
+return ret;
+}
+
+static int local_chmod_passthrough(FsContext *fs_ctx, V9fsPath *fs_path,
+   FsCred *credp)
 {
 char *buffer;
 int ret = -1;
 char *path = fs_path->data;
 
+buffer = rpath(fs_ctx, path);
+ret = chmod(buffer, credp->fc_mode);
+g_free(buffer);
+
+return ret;
+}
+
+static int local_chmod_mapped_file(FsContext *fs_ctx, V9fsPath *fs_path,
+   FsCred *credp)
+{
+char *path = fs_path->data;
+
+return local_set_mapped_file_attr(fs_ctx, path, credp);
+}
+
+static int local_chmod(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *credp)
+{
 if (fs_ctx->export_flags & V9FS_SM_MAPPED) {
-buffer = rpath(fs_ctx, path);
-ret = local_set_xattr(buffer, credp);
-g_free(buffer);
+return local_chmod_mapped(fs_ctx, fs_path, credp);
 } else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-return local_set_mapped_file_attr(fs_ctx, path, credp);
+return local_chmod_mapped_file(fs_ctx, fs_path, credp);
 } else if ((fs_ctx->export_flags & V9FS_SM_PASSTHROUGH) ||
(fs_ctx->export_flags & V9FS_SM_NONE)) {
-buffer = rpath(fs_ctx, path);
-ret = chmod(buffer, credp->fc_mode);
-g_free(buffer);
+return local_chmod_passthrough(fs_ctx, fs_path, credp);
 }
-return ret;
+g_assert_not_reached();
 }
 
 static int local_mknod(FsContext *fs_ctx, V9fsPath *dir_path,

[Qemu-devel] [PATCH RFC 03/36] 9pfs: local: split mknod operation per security model

2017-01-30 Thread Greg Kurz

Having all security models implemented in one monolithic function is
cumbersome. Especially when the need arises to fix something in the
shared code, as it forces to change all the paths at the same time.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |  129 +---
 1 file changed, 92 insertions(+), 37 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 73a20657f1fc..7f513c5728f6 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -510,8 +510,8 @@ static int local_chmod(FsContext *fs_ctx, V9fsPath 
*fs_path, FsCred *credp)
 g_assert_not_reached();
 }
 
-static int local_mknod(FsContext *fs_ctx, V9fsPath *dir_path,
-   const char *name, FsCred *credp)
+static int local_mknod_mapped(FsContext *fs_ctx, V9fsPath *dir_path,
+  const char *name, FsCred *credp)
 {
 char *path;
 int err = -1;
@@ -523,42 +523,49 @@ static int local_mknod(FsContext *fs_ctx, V9fsPath 
*dir_path,
 v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
 path = fullname.data;
 
-/* Determine the security model */
-if (fs_ctx->export_flags & V9FS_SM_MAPPED) {
-buffer = rpath(fs_ctx, path);
-err = mknod(buffer, SM_LOCAL_MODE_BITS|S_IFREG, 0);
-if (err == -1) {
-goto out;
-}
-err = local_set_xattr(buffer, credp);
-if (err == -1) {
-serrno = errno;
-goto err_end;
-}
-} else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
+buffer = rpath(fs_ctx, path);
+err = mknod(buffer, SM_LOCAL_MODE_BITS | S_IFREG, 0);
+if (err == -1) {
+goto out;
+}
+err = local_set_xattr(buffer, credp);
+if (err == -1) {
+serrno = errno;
+goto err_end;
+}
+goto out;
 
-buffer = rpath(fs_ctx, path);
-err = mknod(buffer, SM_LOCAL_MODE_BITS|S_IFREG, 0);
-if (err == -1) {
-goto out;
-}
-err = local_set_mapped_file_attr(fs_ctx, path, credp);
-if (err == -1) {
-serrno = errno;
-goto err_end;
-}
-} else if ((fs_ctx->export_flags & V9FS_SM_PASSTHROUGH) ||
-   (fs_ctx->export_flags & V9FS_SM_NONE)) {
-buffer = rpath(fs_ctx, path);
-err = mknod(buffer, credp->fc_mode, credp->fc_rdev);
-if (err == -1) {
-goto out;
-}
-err = local_post_create_passthrough(fs_ctx, path, credp);
-if (err == -1) {
-serrno = errno;
-goto err_end;
-}
+err_end:
+remove(buffer);
+errno = serrno;
+out:
+g_free(buffer);
+v9fs_string_free(&fullname);
+return err;
+}
+
+static int local_mknod_mapped_file(FsContext *fs_ctx, V9fsPath *dir_path,
+   const char *name, FsCred *credp)
+{
+char *path;
+int err = -1;
+int serrno = 0;
+V9fsString fullname;
+char *buffer = NULL;
+
+v9fs_string_init(&fullname);
+v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
+path = fullname.data;
+
+buffer = rpath(fs_ctx, path);
+err = mknod(buffer, SM_LOCAL_MODE_BITS | S_IFREG, 0);
+if (err == -1) {
+goto out;
+}
+err = local_set_mapped_file_attr(fs_ctx, path, credp);
+if (err == -1) {
+serrno = errno;
+goto err_end;
 }
 goto out;
 
@@ -571,6 +578,54 @@ out:
 return err;
 }
 
+static int local_mknod_passthrough(FsContext *fs_ctx, V9fsPath *dir_path,
+   const char *name, FsCred *credp)
+{
+char *path;
+int err = -1;
+int serrno = 0;
+V9fsString fullname;
+char *buffer = NULL;
+
+v9fs_string_init(&fullname);
+v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
+path = fullname.data;
+
+buffer = rpath(fs_ctx, path);
+err = mknod(buffer, credp->fc_mode, credp->fc_rdev);
+if (err == -1) {
+goto out;
+}
+err = local_post_create_passthrough(fs_ctx, path, credp);
+if (err == -1) {
+serrno = errno;
+goto err_end;
+}
+goto out;
+
+err_end:
+remove(buffer);
+errno = serrno;
+out:
+g_free(buffer);
+v9fs_string_free(&fullname);
+return err;
+}
+
+static int local_mknod(FsContext *fs_ctx, V9fsPath *dir_path, const char *name,
+   FsCred *credp)
+{
+if (fs_ctx->export_flags & V9FS_SM_MAPPED) {
+return local_mknod_mapped(fs_ctx, dir_path, name, credp);
+} else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
+return local_mknod_mapped_file(fs_ctx, dir_path, name, credp);
+} else if ((fs_ctx->export_flags & V9FS_SM_PASSTHROUGH) ||
+   (fs_ctx->export_flags & V9FS_SM_NONE)) {
+return local_mknod_passthrough(fs_ctx, dir_path, name, credp);
+}
+g_assert_not_reached();
+}
+
 static int local_mkdir(FsContext *

[Qemu-devel] [PATCH RFC 05/36] 9pfs: local: split open2 operation per security model

2017-01-30 Thread Greg Kurz

Having all security models implemented in one monolithic function is
cumbersome. Especially when the need arises to fix something in the
shared code, as it forces to change all the paths at the same time.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |  167 ++--
 1 file changed, 123 insertions(+), 44 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 0d6869123094..4deb19ce8775 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -785,8 +785,9 @@ static int local_fstat(FsContext *fs_ctx, int fid_type,
 return err;
 }
 
-static int local_open2(FsContext *fs_ctx, V9fsPath *dir_path, const char *name,
-   int flags, FsCred *credp, V9fsFidOpenState *fs)
+static int local_open2_mapped(FsContext *fs_ctx, V9fsPath *dir_path,
+  const char *name, int flags, FsCred *credp,
+  V9fsFidOpenState *fs)
 {
 char *path;
 int fd = -1;
@@ -804,48 +805,65 @@ static int local_open2(FsContext *fs_ctx, V9fsPath 
*dir_path, const char *name,
 v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
 path = fullname.data;
 
-/* Determine the security model */
-if (fs_ctx->export_flags & V9FS_SM_MAPPED) {
-buffer = rpath(fs_ctx, path);
-fd = open(buffer, flags, SM_LOCAL_MODE_BITS);
-if (fd == -1) {
-err = fd;
-goto out;
-}
-credp->fc_mode = credp->fc_mode|S_IFREG;
-/* Set cleint credentials in xattr */
-err = local_set_xattr(buffer, credp);
-if (err == -1) {
-serrno = errno;
-goto err_end;
-}
-} else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-buffer = rpath(fs_ctx, path);
-fd = open(buffer, flags, SM_LOCAL_MODE_BITS);
-if (fd == -1) {
-err = fd;
-goto out;
-}
-credp->fc_mode = credp->fc_mode|S_IFREG;
-/* Set client credentials in .virtfs_metadata directory files */
-err = local_set_mapped_file_attr(fs_ctx, path, credp);
-if (err == -1) {
-serrno = errno;
-goto err_end;
-}
-} else if ((fs_ctx->export_flags & V9FS_SM_PASSTHROUGH) ||
-   (fs_ctx->export_flags & V9FS_SM_NONE)) {
-buffer = rpath(fs_ctx, path);
-fd = open(buffer, flags, credp->fc_mode);
-if (fd == -1) {
-err = fd;
-goto out;
-}
-err = local_post_create_passthrough(fs_ctx, path, credp);
-if (err == -1) {
-serrno = errno;
-goto err_end;
-}
+buffer = rpath(fs_ctx, path);
+fd = open(buffer, flags, SM_LOCAL_MODE_BITS);
+if (fd == -1) {
+err = fd;
+goto out;
+}
+credp->fc_mode = credp->fc_mode | S_IFREG;
+/* Set cleint credentials in xattr */
+err = local_set_xattr(buffer, credp);
+if (err == -1) {
+serrno = errno;
+goto err_end;
+}
+err = fd;
+fs->fd = fd;
+goto out;
+
+err_end:
+close(fd);
+remove(buffer);
+errno = serrno;
+out:
+g_free(buffer);
+v9fs_string_free(&fullname);
+return err;
+}
+
+static int local_open2_mapped_file(FsContext *fs_ctx, V9fsPath *dir_path,
+   const char *name, int flags, FsCred *credp,
+   V9fsFidOpenState *fs)
+{
+char *path;
+int fd = -1;
+int err = -1;
+int serrno = 0;
+V9fsString fullname;
+char *buffer = NULL;
+
+/*
+ * Mark all the open to not follow symlinks
+ */
+flags |= O_NOFOLLOW;
+
+v9fs_string_init(&fullname);
+v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
+path = fullname.data;
+
+buffer = rpath(fs_ctx, path);
+fd = open(buffer, flags, SM_LOCAL_MODE_BITS);
+if (fd == -1) {
+err = fd;
+goto out;
+}
+credp->fc_mode = credp->fc_mode | S_IFREG;
+/* Set client credentials in .virtfs_metadata directory files */
+err = local_set_mapped_file_attr(fs_ctx, path, credp);
+if (err == -1) {
+serrno = errno;
+goto err_end;
 }
 err = fd;
 fs->fd = fd;
@@ -861,6 +879,67 @@ out:
 return err;
 }
 
+static int local_open2_passthrough(FsContext *fs_ctx, V9fsPath *dir_path,
+   const char *name, int flags, FsCred *credp,
+   V9fsFidOpenState *fs)
+{
+char *path;
+int fd = -1;
+int err = -1;
+int serrno = 0;
+V9fsString fullname;
+char *buffer = NULL;
+
+/*
+ * Mark all the open to not follow symlinks
+ */
+flags |= O_NOFOLLOW;
+
+v9fs_string_init(&fullname);
+v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
+path = fullname.data;
+
+buffer = rpath(fs_ctx, path);
+fd = open(buffer, flags, credp->fc

[Qemu-devel] [PATCH RFC 12/36] 9pfs: local: pre remove operation for mapped-file security

2017-01-30 Thread Greg Kurz

The remove operation is really the same for the passthrough and mapped
security models. This patch simply moves the mapped-file bits to a
separate function. This will make future modifications easier.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   74 +++-
 1 file changed, 44 insertions(+), 30 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 74953e4dbfe0..7506d2155c05 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1303,41 +1303,25 @@ static int local_utimensat(FsContext *s, V9fsPath 
*fs_path,
 return ret;
 }
 
-static int local_remove(FsContext *ctx, const char *path)
+static int local_pre_remove_mapped_file(FsContext *ctx, const char *path)
 {
 int err;
 struct stat stbuf;
 char *buffer;
 
-if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-buffer = rpath(ctx, path);
-err =  lstat(buffer, &stbuf);
-g_free(buffer);
-if (err) {
-goto err_out;
-}
-/*
- * If directory remove .virtfs_metadata contained in the
- * directory
- */
-if (S_ISDIR(stbuf.st_mode)) {
-buffer = g_strdup_printf("%s/%s/%s", ctx->fs_root,
- path, VIRTFS_META_DIR);
-err = remove(buffer);
-g_free(buffer);
-if (err < 0 && errno != ENOENT) {
-/*
- * We didn't had the .virtfs_metadata file. May be file created
- * in non-mapped mode ?. Ignore ENOENT.
- */
-goto err_out;
-}
-}
-/*
- * Now remove the name from parent directory
- * .virtfs_metadata directory
- */
-buffer = local_mapped_attr_path(ctx, path);
+buffer = rpath(ctx, path);
+err =  lstat(buffer, &stbuf);
+g_free(buffer);
+if (err) {
+goto err_out;
+}
+/*
+ * If directory remove .virtfs_metadata contained in the
+ * directory
+ */
+if (S_ISDIR(stbuf.st_mode)) {
+buffer = g_strdup_printf("%s/%s/%s", ctx->fs_root,
+ path, VIRTFS_META_DIR);
 err = remove(buffer);
 g_free(buffer);
 if (err < 0 && errno != ENOENT) {
@@ -1348,6 +1332,36 @@ static int local_remove(FsContext *ctx, const char *path)
 goto err_out;
 }
 }
+/*
+ * Now remove the name from parent directory
+ * .virtfs_metadata directory
+ */
+buffer = local_mapped_attr_path(ctx, path);
+err = remove(buffer);
+g_free(buffer);
+if (err < 0 && errno == ENOENT) {
+/*
+ * We didn't had the .virtfs_metadata file. May be file created
+ * in non-mapped mode ?. Ignore ENOENT.
+ */
+err = 0;
+}
+
+err_out:
+return err;
+}
+
+static int local_remove(FsContext *ctx, const char *path)
+{
+int err;
+char *buffer;
+
+if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
+err =  local_pre_remove_mapped_file(ctx, path);
+if (err) {
+goto err_out;
+}
+}
 
 buffer = rpath(ctx, path);
 err = remove(buffer);

[Qemu-devel] [PATCH RFC 11/36] 9pfs: local: post rename operation for mapped-file security

2017-01-30 Thread Greg Kurz

The rename operation is really the same for the passthrough and mapped
security models. This patch simply moves the mapped-file bits to a
separate function. This will make future modifications easier.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index df453414c902..74953e4dbfe0 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1187,6 +1187,29 @@ static int local_truncate(FsContext *ctx, V9fsPath 
*fs_path, off_t size)
 return ret;
 }
 
+static int local_post_rename_mapped_file(FsContext *ctx, const char *oldpath,
+ const char *newpath)
+{
+int err;
+char *buffer, *buffer1;
+
+err = local_create_mapped_attr_dir(ctx, newpath);
+if (err < 0) {
+return err;
+}
+/* rename the .virtfs_metadata files */
+buffer = local_mapped_attr_path(ctx, oldpath);
+buffer1 = local_mapped_attr_path(ctx, newpath);
+err = rename(buffer, buffer1);
+g_free(buffer);
+g_free(buffer1);
+if (err < 0 && errno != ENOENT) {
+return err;
+}
+
+return 0;
+}
+
 static int local_rename(FsContext *ctx, const char *oldpath,
 const char *newpath)
 {
@@ -1202,19 +1225,8 @@ static int local_rename(FsContext *ctx, const char 
*oldpath,
 }
 
 if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-char *vbuffer, *vbuffer1;
-
-err = local_create_mapped_attr_dir(ctx, newpath);
+err = local_post_rename_mapped_file(ctx, oldpath, newpath);
 if (err < 0) {
-goto out_err;
-}
-/* rename the .virtfs_metadata files */
-vbuffer = local_mapped_attr_path(ctx, oldpath);
-vbuffer1 = local_mapped_attr_path(ctx, newpath);
-err = rename(vbuffer, vbuffer1);
-g_free(vbuffer);
-g_free(vbuffer1);
-if (err < 0 && errno != ENOENT) {
 goto err_out;
 }
 }

[Qemu-devel] [PATCH RFC 08/36] 9pfs: local: improve error handling in link op

2017-01-30 Thread Greg Kurz

When using the mapped-file security model, we also have to create a link
for the metadata file if it exists. In case of failuire, we should rollback.

That's what this patch does.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   26 +++---
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index d424d8994779..c0b1907f7901 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1114,6 +1114,7 @@ static int local_link(FsContext *ctx, V9fsPath *oldpath,
 int ret;
 V9fsString newpath;
 char *buffer, *buffer1;
+int serrno;
 
 v9fs_string_init(&newpath);
 v9fs_string_sprintf(&newpath, "%s/%s", dirpath->data, name);
@@ -1122,25 +1123,36 @@ static int local_link(FsContext *ctx, V9fsPath *oldpath,
 buffer1 = rpath(ctx, newpath.data);
 ret = link(buffer, buffer1);
 g_free(buffer);
-g_free(buffer1);
+if (ret < 0) {
+goto out;
+}
 
 /* now link the virtfs_metadata files */
-if (!ret && (ctx->export_flags & V9FS_SM_MAPPED_FILE)) {
+if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
+char *vbuffer, *vbuffer1;
+
 /* Link the .virtfs_metadata files. Create the metada directory */
 ret = local_create_mapped_attr_dir(ctx, newpath.data);
 if (ret < 0) {
 goto err_out;
 }
-buffer = local_mapped_attr_path(ctx, oldpath->data);
-buffer1 = local_mapped_attr_path(ctx, newpath.data);
-ret = link(buffer, buffer1);
-g_free(buffer);
-g_free(buffer1);
+vbuffer = local_mapped_attr_path(ctx, oldpath->data);
+vbuffer1 = local_mapped_attr_path(ctx, newpath.data);
+ret = link(vbuffer, vbuffer1);
+g_free(vbuffer);
+g_free(vbuffer1);
 if (ret < 0 && errno != ENOENT) {
 goto err_out;
 }
 }
+goto out;
+
 err_out:
+serrno = errno;
+remove(buffer1);
+errno = serrno;
+out:
+g_free(buffer1);
 v9fs_string_free(&newpath);
 return ret;
 }

[Qemu-devel] [PATCH RFC 07/36] 9pfs: local: split mkdir operation per security model

2017-01-30 Thread Greg Kurz

Having all security models implemented in one monolithic function is
cumbersome. Especially when the need arises to fix something in the
shared code, as it forces to change all the paths at the same time.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   44 +++-
 1 file changed, 35 insertions(+), 9 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index f9abd4229b17..d424d8994779 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1187,26 +1187,52 @@ static int local_rename(FsContext *ctx, const char 
*oldpath,
 return err;
 }
 
-static int local_chown(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *credp)
+static int local_chown_passthrough(FsContext *fs_ctx, V9fsPath *fs_path,
+   FsCred *credp)
 {
 char *buffer;
 int ret = -1;
 char *path = fs_path->data;
 
+buffer = rpath(fs_ctx, path);
+ret = lchown(buffer, credp->fc_uid, credp->fc_gid);
+g_free(buffer);
+return ret;
+}
+
+static int local_chown_mapped(FsContext *fs_ctx, V9fsPath *fs_path,
+  FsCred *credp)
+{
+char *buffer;
+int ret = -1;
+char *path = fs_path->data;
+
+buffer = rpath(fs_ctx, path);
+ret = local_set_xattr(buffer, credp);
+g_free(buffer);
+return ret;
+}
+
+static int local_chown_mapped_file(FsContext *fs_ctx, V9fsPath *fs_path,
+   FsCred *credp)
+{
+char *path = fs_path->data;
+
+return local_set_mapped_file_attr(fs_ctx, path, credp);
+}
+
+static int local_chown(FsContext *fs_ctx, V9fsPath *fs_path, FsCred *credp)
+{
 if ((credp->fc_uid == -1 && credp->fc_gid == -1) ||
 (fs_ctx->export_flags & V9FS_SM_PASSTHROUGH) ||
 (fs_ctx->export_flags & V9FS_SM_NONE)) {
-buffer = rpath(fs_ctx, path);
-ret = lchown(buffer, credp->fc_uid, credp->fc_gid);
-g_free(buffer);
+return local_chown_passthrough(fs_ctx, fs_path, credp);
 } else if (fs_ctx->export_flags & V9FS_SM_MAPPED) {
-buffer = rpath(fs_ctx, path);
-ret = local_set_xattr(buffer, credp);
-g_free(buffer);
+return local_chown_mapped(fs_ctx, fs_path, credp);
 } else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-return local_set_mapped_file_attr(fs_ctx, path, credp);
+return local_chown_mapped_file(fs_ctx, fs_path, credp);
 }
-return ret;
+g_assert_not_reached();
 }
 
 static int local_utimensat(FsContext *s, V9fsPath *fs_path,

[Qemu-devel] [PATCH RFC 13/36] 9pfs: local: pre unlikat operation for mapped-file security

2017-01-30 Thread Greg Kurz

The unlinkat operation is really the same for the passthrough and
mapped security models. This patch simply moves the mapped-file bits
to a separate function. This will make future modifications easier.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   68 ++--
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 7506d2155c05..b75ad452decc 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1462,39 +1462,23 @@ static int local_renameat(FsContext *ctx, V9fsPath 
*olddir,
 return ret;
 }
 
-static int local_unlinkat(FsContext *ctx, V9fsPath *dir,
-  const char *name, int flags)
+static int local_pre_unlinkat_mapped_file(FsContext *ctx, V9fsPath *dir,
+  const char *name, int flags)
 {
 int ret;
 V9fsString fullname;
 char *buffer;
 
 v9fs_string_init(&fullname);
-
 v9fs_string_sprintf(&fullname, "%s/%s", dir->data, name);
-if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-if (flags == AT_REMOVEDIR) {
-/*
- * If directory remove .virtfs_metadata contained in the
- * directory
- */
-buffer = g_strdup_printf("%s/%s/%s", ctx->fs_root,
- fullname.data, VIRTFS_META_DIR);
-ret = remove(buffer);
-g_free(buffer);
-if (ret < 0 && errno != ENOENT) {
-/*
- * We didn't had the .virtfs_metadata file. May be file created
- * in non-mapped mode ?. Ignore ENOENT.
- */
-goto err_out;
-}
-}
+
+if (flags == AT_REMOVEDIR) {
 /*
- * Now remove the name from parent directory
- * .virtfs_metadata directory.
+ * If directory remove .virtfs_metadata contained in the
+ * directory
  */
-buffer = local_mapped_attr_path(ctx, fullname.data);
+buffer = g_strdup_printf("%s/%s/%s", ctx->fs_root,
+ fullname.data, VIRTFS_META_DIR);
 ret = remove(buffer);
 g_free(buffer);
 if (ret < 0 && errno != ENOENT) {
@@ -1505,6 +1489,42 @@ static int local_unlinkat(FsContext *ctx, V9fsPath *dir,
 goto err_out;
 }
 }
+/*
+ * Now remove the name from parent directory
+ * .virtfs_metadata directory.
+ */
+buffer = local_mapped_attr_path(ctx, fullname.data);
+ret = remove(buffer);
+g_free(buffer);
+if (ret < 0 && errno == ENOENT) {
+/*
+ * We didn't had the .virtfs_metadata file. May be file created
+ * in non-mapped mode ?. Ignore ENOENT.
+ */
+ret = 0;
+}
+
+err_out:
+v9fs_string_free(&fullname);
+return ret;
+}
+
+static int local_unlinkat(FsContext *ctx, V9fsPath *dir, const char *name,
+  int flags)
+{
+int ret;
+V9fsString fullname;
+char *buffer;
+
+v9fs_string_init(&fullname);
+v9fs_string_sprintf(&fullname, "%s/%s", dir->data, name);
+
+if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
+ret = local_pre_unlinkat_mapped_file(ctx, dir, name, flags);
+if (ret < 0) {
+goto err_out;
+}
+}
 /* Remove the name finally */
 buffer = rpath(ctx, fullname.data);
 ret = remove(buffer);

[Qemu-devel] [PATCH RFC 15/36] 9pfs: remove side-effects in local_open() and local_opendir()

2017-01-30 Thread Greg Kurz

If these functions fail, they should not change *fs. Let's use local
variables to fix this. While here, let's also do some cosmetic fixes
on the function args.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 443197841780..d3c6ccf30b53 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -351,30 +351,37 @@ static int local_closedir(FsContext *ctx, 
V9fsFidOpenState *fs)
 return closedir(fs->dir.stream);
 }
 
-static int local_open(FsContext *ctx, V9fsPath *fs_path,
-  int flags, V9fsFidOpenState *fs)
+static int local_open(FsContext *ctx, V9fsPath *fs_path, int flags,
+  V9fsFidOpenState *fs)
 {
 char *buffer;
 char *path = fs_path->data;
+int fd;
 
 buffer = rpath(ctx, path);
-fs->fd = open(buffer, flags | O_NOFOLLOW);
+fd = open(buffer, flags | O_NOFOLLOW);
 g_free(buffer);
+if (fd == -1) {
+return -1;
+}
+fs->fd = fd;
 return fs->fd;
 }
 
-static int local_opendir(FsContext *ctx,
- V9fsPath *fs_path, V9fsFidOpenState *fs)
+static int local_opendir(FsContext *ctx, V9fsPath *fs_path,
+ V9fsFidOpenState *fs)
 {
 char *buffer;
 char *path = fs_path->data;
+DIR *stream;
 
 buffer = rpath(ctx, path);
-fs->dir.stream = opendir(buffer);
+stream = opendir(buffer);
 g_free(buffer);
-if (!fs->dir.stream) {
+if (!stream) {
 return -1;
 }
+fs->dir.stream = stream;
 return 0;
 }

[Qemu-devel] [PATCH RFC 20/36] 9pfs: local: readlink: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for all security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   37 +++--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index a1fff04c3219..1f9239de07e5 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -346,31 +346,48 @@ err:
 return -1;
 }
 
+static int readlink_nofollow(FsContext *fs_ctx, const char *path, char *buf,
+ size_t bufsz)
+{
+char *dirpath = local_dirname(path);
+char *name = local_basename(path);
+int dirfd, ret = -1;
+
+dirfd = local_opendir_nofollow(fs_ctx, dirpath);
+if (dirfd == -1) {
+goto out;
+}
+
+ret = readlinkat(dirfd, name, buf, bufsz);
+close_preserve_errno(dirfd);
+out:
+g_free(name);
+g_free(dirpath);
+return ret;
+}
+
 static ssize_t local_readlink(FsContext *fs_ctx, V9fsPath *fs_path,
   char *buf, size_t bufsz)
 {
-ssize_t tsize = -1;
-char *buffer;
-char *path = fs_path->data;
+ssize_t tsize;
 
 if ((fs_ctx->export_flags & V9FS_SM_MAPPED) ||
 (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE)) {
 int fd;
-buffer = rpath(fs_ctx, path);
-fd = open(buffer, O_RDONLY | O_NOFOLLOW);
-g_free(buffer);
+
+fd = local_open_nofollow(fs_ctx, fs_path->data, O_RDONLY, 0);
 if (fd == -1) {
 return -1;
 }
 do {
 tsize = read(fd, (void *)buf, bufsz);
 } while (tsize == -1 && errno == EINTR);
-close(fd);
+close_preserve_errno(fd);
 } else if ((fs_ctx->export_flags & V9FS_SM_PASSTHROUGH) ||
(fs_ctx->export_flags & V9FS_SM_NONE)) {
-buffer = rpath(fs_ctx, path);
-tsize = readlink(buffer, buf, bufsz);
-g_free(buffer);
+tsize = readlink_nofollow(fs_ctx, fs_path->data, buf, bufsz);
+} else {
+g_assert_not_reached();
 }
 return tsize;
 }

[Qemu-devel] [PATCH RFC 10/36] v9fs: local: improve error handling in rename op

2017-01-30 Thread Greg Kurz

When using the mapped-file security model, we also have to rename the
metadata file if it exists. In case of failure, we should rollback.

To achieve that, this patch moves the renaming of the main file before
the renaming of the metadata file.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   33 +++--
 1 file changed, 23 insertions(+), 10 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index ebc3e208efa0..df453414c902 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1192,26 +1192,39 @@ static int local_rename(FsContext *ctx, const char 
*oldpath,
 {
 int err;
 char *buffer, *buffer1;
+int serrno;
+
+buffer = rpath(ctx, oldpath);
+buffer1 = rpath(ctx, newpath);
+err = rename(buffer, buffer1);
+if (err < 0) {
+goto out;
+}
 
 if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
+char *vbuffer, *vbuffer1;
+
 err = local_create_mapped_attr_dir(ctx, newpath);
 if (err < 0) {
-return err;
+goto out_err;
 }
 /* rename the .virtfs_metadata files */
-buffer = local_mapped_attr_path(ctx, oldpath);
-buffer1 = local_mapped_attr_path(ctx, newpath);
-err = rename(buffer, buffer1);
-g_free(buffer);
-g_free(buffer1);
+vbuffer = local_mapped_attr_path(ctx, oldpath);
+vbuffer1 = local_mapped_attr_path(ctx, newpath);
+err = rename(vbuffer, vbuffer1);
+g_free(vbuffer);
+g_free(vbuffer1);
 if (err < 0 && errno != ENOENT) {
-return err;
+goto err_out;
 }
 }
+goto out;
 
-buffer = rpath(ctx, oldpath);
-buffer1 = rpath(ctx, newpath);
-err = rename(buffer, buffer1);
+err_out:
+serrno = errno;
+rename(buffer1, buffer);
+errno = serrno;
+out:
 g_free(buffer);
 g_free(buffer1);
 return err;

[Qemu-devel] [PATCH RFC 17/36] 9pfs: local: keep a file descriptor on the shared folder

2017-01-30 Thread Greg Kurz

This patch opens the shared folder and caches the file descriptor, so that
it can be used to do symlink-safe path walk. Since nothing prevents several
QEMU instances to pass overlapping export paths to -fsdev, we also make
sure that the export path doesn't traverse a symlink either.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   38 --
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index d3c6ccf30b53..8a1d52cd6c2a 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -14,6 +14,7 @@
 #include "qemu/osdep.h"
 #include "9p.h"
 #include "9p-xattr.h"
+#include "9p-util.h"
 #include "fsdev/qemu-fsdev.h"   /* local_ops */
 #include 
 #include 
@@ -43,6 +44,10 @@
 #define BTRFS_SUPER_MAGIC 0x9123683E
 #endif
 
+struct local_data {
+int mountfd;
+};
+
 #define VIRTFS_META_DIR ".virtfs_metadata"
 
 static char *local_mapped_attr_path(FsContext *ctx, const char *path)
@@ -1573,13 +1578,28 @@ static int local_ioc_getversion(FsContext *ctx, 
V9fsPath *path,
 static int local_init(FsContext *ctx)
 {
 struct statfs stbuf;
+struct local_data *data = g_malloc(sizeof(*data));
+int rootfd;
+
+rootfd = open("/", O_DIRECTORY | O_RDONLY);
+if (rootfd == -1) {
+goto err;
+}
+
+data->mountfd = openat_nofollow(rootfd, ctx->fs_root,
+O_DIRECTORY | O_RDONLY, 0);
+close_preserve_errno(rootfd);
+if (data->mountfd == -1) {
+goto err;
+}
 
 #ifdef FS_IOC_GETVERSION
 /*
  * use ioc_getversion only if the iocl is definied
  */
-if (statfs(ctx->fs_root, &stbuf) < 0) {
-return -1;
+if (fstatfs(data->mountfd, &stbuf) < 0) {
+close_preserve_errno(data->mountfd);
+goto err;
 }
 switch (stbuf.f_type) {
 case EXT2_SUPER_MAGIC:
@@ -1606,7 +1626,20 @@ static int local_init(FsContext *ctx)
 }
 ctx->export_flags |= V9FS_PATHNAME_FSCONTEXT;
 
+ctx->private = data;
 return 0;
+
+err:
+g_free(data);
+return -1;
+}
+
+static void local_cleanup(FsContext *ctx)
+{
+struct local_data *data = ctx->private;
+
+close(data->mountfd);
+g_free(data);
 }
 
 static int local_parse_opts(QemuOpts *opts, struct FsDriverEntry *fse)
@@ -1649,6 +1682,7 @@ static int local_parse_opts(QemuOpts *opts, struct 
FsDriverEntry *fse)
 FileOperations local_ops = {
 .parse_opts = local_parse_opts,
 .init  = local_init,
+.cleanup = local_cleanup,
 .lstat = local_lstat,
 .readlink = local_readlink,
 .close = local_close,

[Qemu-devel] [PATCH RFC 16/36] 9pfs: introduce openat_nofollow() helper

2017-01-30 Thread Greg Kurz

When using the passthrough security mode, symbolic links created by the
guest are actual symbolic links on the host file system.

Since the resolution of symbolic links during path walk is supposed to
occur on the client side. The server should never have to follow them.
The current code hence relies on O_NOFOLLOW and "l*()" variants of system
calls. Unfortunately, this only applies to the rightmost path component.
A guest could maliciously replace any component in a trusted path with a
symbolic link. This could allow any guest to escape a virtfs shared folder.

This patch introduces a variant of the openat() syscall that successively
opens each path element with O_NOFOLLOW. It will be used by subsequent
patches to implement symlink-safe path walk for any access to the backend.

Suggested-by: Jann Horn 
Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-util.c |   69 +
 hw/9pfs/9p-util.h |   25 ++
 hw/9pfs/Makefile.objs |2 +
 3 files changed, 95 insertions(+), 1 deletion(-)
 create mode 100644 hw/9pfs/9p-util.c
 create mode 100644 hw/9pfs/9p-util.h

diff --git a/hw/9pfs/9p-util.c b/hw/9pfs/9p-util.c
new file mode 100644
index ..48292d948401
--- /dev/null
+++ b/hw/9pfs/9p-util.c
@@ -0,0 +1,69 @@
+/*
+ * 9p utilities
+ *
+ * Copyright IBM, Corp. 2017
+ *
+ * Authors:
+ *  Greg Kurz 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "9p-util.h"
+
+int openat_nofollow(int dirfd, const char *path, int flags, mode_t mode)
+{
+const char *tail = path;
+const char *c;
+int fd;
+
+fd = dup(dirfd);
+if (fd == -1) {
+return -1;
+}
+
+while (*tail) {
+int next_fd;
+char *head;
+
+while (*tail == '/') {
+tail++;
+}
+
+if (!*tail) {
+break;
+}
+
+head = g_strdup(tail);
+c = strchr(tail, '/');
+if (c) {
+head[c - tail] = 0;
+next_fd = openat(fd, head, O_DIRECTORY | O_RDONLY | O_NOFOLLOW);
+} else {
+/* We don't want bad things to happen like opening a file that
+ * sits outside the virtfs export, or hanging on a named pipe,
+ * or changing the controlling process of a terminal.
+ */
+flags |= O_NOFOLLOW | O_NONBLOCK | O_NOCTTY;
+next_fd = openat(fd, head, flags, mode);
+}
+g_free(head);
+if (next_fd == -1) {
+close_preserve_errno(fd);
+return -1;
+}
+close(fd);
+fd = next_fd;
+
+if (!c) {
+break;
+}
+tail = c + 1;
+}
+/* O_NONBLOCK was only needed to open the file. Let's drop it. */
+assert(!fcntl(fd, F_SETFL, flags));
+
+return fd;
+}
diff --git a/hw/9pfs/9p-util.h b/hw/9pfs/9p-util.h
new file mode 100644
index ..e19673d85222
--- /dev/null
+++ b/hw/9pfs/9p-util.h
@@ -0,0 +1,25 @@
+/*
+ * 9p utilities
+ *
+ * Copyright IBM, Corp. 2017
+ *
+ * Authors:
+ *  Greg Kurz 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_9P_UTIL_H
+#define QEMU_9P_UTIL_H
+
+static inline void close_preserve_errno(int fd)
+{
+int serrno = errno;
+close(fd);
+errno = serrno;
+}
+
+int openat_nofollow(int dirfd, const char *path, int flags, mode_t mode);
+
+#endif
diff --git a/hw/9pfs/Makefile.objs b/hw/9pfs/Makefile.objs
index da0ae0cfdbae..32197e6671dd 100644
--- a/hw/9pfs/Makefile.objs
+++ b/hw/9pfs/Makefile.objs
@@ -1,4 +1,4 @@
-common-obj-y  = 9p.o
+common-obj-y  = 9p.o 9p-util.o
 common-obj-y += 9p-local.o 9p-xattr.o
 common-obj-y += 9p-xattr-user.o 9p-posix-acl.o
 common-obj-y += coth.o cofs.o codir.o cofile.o

[Qemu-devel] [PATCH RFC 06/36] 9pfs: local: split symlink operation per security model

2017-01-30 Thread Greg Kurz

Having all security models implemented in one monolithic function is
cumbersome. Especially when the need arises to fix something in the
shared code, as it forces to change all the paths at the same time.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |  210 ++--
 1 file changed, 137 insertions(+), 73 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 4deb19ce8775..f9abd4229b17 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -941,96 +941,144 @@ static int local_open2(FsContext *fs_ctx, V9fsPath 
*dir_path, const char *name,
 g_assert_not_reached();
 }
 
-static int local_symlink(FsContext *fs_ctx, const char *oldpath,
- V9fsPath *dir_path, const char *name, FsCred *credp)
+static int local_symlink_mapped(FsContext *fs_ctx, const char *oldpath,
+V9fsPath *dir_path, const char *name,
+FsCred *credp)
 {
 int err = -1;
 int serrno = 0;
 char *newpath;
 V9fsString fullname;
 char *buffer = NULL;
+int fd;
+ssize_t oldpath_size, write_size;
 
 v9fs_string_init(&fullname);
 v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
 newpath = fullname.data;
 
-/* Determine the security model */
-if (fs_ctx->export_flags & V9FS_SM_MAPPED) {
-int fd;
-ssize_t oldpath_size, write_size;
-buffer = rpath(fs_ctx, newpath);
-fd = open(buffer, O_CREAT|O_EXCL|O_RDWR|O_NOFOLLOW, 
SM_LOCAL_MODE_BITS);
-if (fd == -1) {
-err = fd;
-goto out;
-}
-/* Write the oldpath (target) to the file. */
-oldpath_size = strlen(oldpath);
-do {
-write_size = write(fd, (void *)oldpath, oldpath_size);
-} while (write_size == -1 && errno == EINTR);
+buffer = rpath(fs_ctx, newpath);
+fd = open(buffer, O_CREAT | O_EXCL | O_RDWR | O_NOFOLLOW,
+  SM_LOCAL_MODE_BITS);
+if (fd == -1) {
+err = fd;
+goto out;
+}
+/* Write the oldpath (target) to the file. */
+oldpath_size = strlen(oldpath);
+do {
+write_size = write(fd, (void *)oldpath, oldpath_size);
+} while (write_size == -1 && errno == EINTR);
 
-if (write_size != oldpath_size) {
-serrno = errno;
-close(fd);
-err = -1;
-goto err_end;
-}
+if (write_size != oldpath_size) {
+serrno = errno;
 close(fd);
-/* Set cleint credentials in symlink's xattr */
-credp->fc_mode = credp->fc_mode|S_IFLNK;
-err = local_set_xattr(buffer, credp);
-if (err == -1) {
-serrno = errno;
-goto err_end;
-}
-} else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-int fd;
-ssize_t oldpath_size, write_size;
-buffer = rpath(fs_ctx, newpath);
-fd = open(buffer, O_CREAT|O_EXCL|O_RDWR|O_NOFOLLOW, 
SM_LOCAL_MODE_BITS);
-if (fd == -1) {
-err = fd;
-goto out;
-}
-/* Write the oldpath (target) to the file. */
-oldpath_size = strlen(oldpath);
-do {
-write_size = write(fd, (void *)oldpath, oldpath_size);
-} while (write_size == -1 && errno == EINTR);
+err = -1;
+goto err_end;
+}
+close(fd);
+/* Set cleint credentials in symlink's xattr */
+credp->fc_mode = credp->fc_mode | S_IFLNK;
+err = local_set_xattr(buffer, credp);
+if (err == -1) {
+serrno = errno;
+goto err_end;
+}
+goto out;
 
-if (write_size != oldpath_size) {
-serrno = errno;
-close(fd);
-err = -1;
-goto err_end;
-}
+err_end:
+remove(buffer);
+errno = serrno;
+out:
+g_free(buffer);
+v9fs_string_free(&fullname);
+return err;
+}
+
+static int local_symlink_mapped_file(FsContext *fs_ctx, const char *oldpath,
+ V9fsPath *dir_path, const char *name,
+ FsCred *credp)
+{
+int err = -1;
+int serrno = 0;
+char *newpath;
+V9fsString fullname;
+char *buffer = NULL;
+int fd;
+ssize_t oldpath_size, write_size;
+
+v9fs_string_init(&fullname);
+v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
+newpath = fullname.data;
+
+buffer = rpath(fs_ctx, newpath);
+fd = open(buffer, O_CREAT | O_EXCL | O_RDWR | O_NOFOLLOW,
+  SM_LOCAL_MODE_BITS);
+if (fd == -1) {
+err = fd;
+goto out;
+}
+/* Write the oldpath (target) to the file. */
+oldpath_size = strlen(oldpath);
+do {
+write_size = write(fd, (void *)oldpath, oldpath_size);
+} while (write_size == -1 && errno == EINTR);
+
+if (write_size != oldpath_size) {
+serrno = errno;

[Qemu-devel] [PATCH RFC 21/36] 9pfs: local: truncate: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for all security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 1f9239de07e5..4377aa6524c2 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1231,15 +1231,16 @@ out:
 return ret;
 }
 
-static int local_truncate(FsContext *ctx, V9fsPath *fs_path, off_t size)
+static int local_truncate(FsContext *fs_ctx, V9fsPath *fs_path, off_t size)
 {
-char *buffer;
-int ret;
-char *path = fs_path->data;
+int fd, ret;
 
-buffer = rpath(ctx, path);
-ret = truncate(buffer, size);
-g_free(buffer);
+fd = local_open_nofollow(fs_ctx, fs_path->data, O_WRONLY, 0);
+if (fd == -1) {
+return -1;
+}
+ret = ftruncate(fd, size);
+close_preserve_errno(fd);
 return ret;
 }

[Qemu-devel] [PATCH RFC 22/36] 9pfs: local: statfs: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for all security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 4377aa6524c2..dbc56b16979c 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1452,15 +1452,14 @@ static int local_fsync(FsContext *ctx, int fid_type,
 }
 }
 
-static int local_statfs(FsContext *s, V9fsPath *fs_path, struct statfs *stbuf)
+static int local_statfs(FsContext *fs_ctx, V9fsPath *fs_path,
+struct statfs *stbuf)
 {
-char *buffer;
-int ret;
-char *path = fs_path->data;
+int fd, ret;
 
-buffer = rpath(s, path);
-ret = statfs(buffer, stbuf);
-g_free(buffer);
+fd = local_open_nofollow(fs_ctx, fs_path->data, O_RDONLY, 0);
+ret = fstatfs(fd, stbuf);
+close_preserve_errno(fd);
 return ret;
 }

[Qemu-devel] [PATCH RFC 27/36] 9pfs: local: link: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for the "passthrough" and "mapped" security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   40 ++--
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 5e320917c484..de860db3d70b 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1181,21 +1181,23 @@ out:
 static int local_link(FsContext *ctx, V9fsPath *oldpath,
   V9fsPath *dirpath, const char *name)
 {
-int ret;
-V9fsString newpath;
-char *buffer, *buffer1;
-int serrno;
-
-v9fs_string_init(&newpath);
-v9fs_string_sprintf(&newpath, "%s/%s", dirpath->data, name);
+char *odirpath = local_dirname(oldpath->data);
+char *oname = local_basename(oldpath->data);
+int ret = -1;
+int odirfd, ndirfd;
 
-buffer = rpath(ctx, oldpath->data);
-buffer1 = rpath(ctx, newpath.data);
-ret = link(buffer, buffer1);
-g_free(buffer);
-if (ret < 0) {
+odirfd = local_opendir_nofollow(ctx, odirpath);
+if (odirfd == -1) {
 goto out;
 }
+ndirfd = local_opendir_nofollow(ctx, dirpath->data);
+if (ndirfd == -1) {
+goto out_close_odirfd;
+}
+ret = linkat(odirfd, oname, ndirfd, name, AT_SYMLINK_FOLLOW);
+if (ret < 0) {
+goto out_close_ndirfd;
+}
 
 /* now link the virtfs_metadata files */
 if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
@@ -1204,15 +1206,17 @@ static int local_link(FsContext *ctx, V9fsPath *oldpath,
 goto err_out;
 }
 }
-goto out;
+goto out_close_ndirfd;
 
 err_out:
-serrno = errno;
-remove(buffer1);
-errno = serrno;
+unlinkat_preserve_errno(ndirfd, name, 0);
+out_close_ndirfd:
+close_preserve_errno(ndirfd);
+out_close_odirfd:
+close_preserve_errno(odirfd);
 out:
-g_free(buffer1);
-v9fs_string_free(&newpath);
+g_free(oname);
+g_free(odirpath);
 return ret;
 }

[Qemu-devel] [PATCH RFC 24/36] 9pfs: local: chmod: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for the "passthrough" security model.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 48d46b6abd28..9dfa3e306245 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -551,13 +551,14 @@ static int local_chmod_mapped(FsContext *fs_ctx, V9fsPath 
*fs_path,
 static int local_chmod_passthrough(FsContext *fs_ctx, V9fsPath *fs_path,
FsCred *credp)
 {
-char *buffer;
-int ret = -1;
-char *path = fs_path->data;
+int fd, ret;
 
-buffer = rpath(fs_ctx, path);
-ret = chmod(buffer, credp->fc_mode);
-g_free(buffer);
+fd = local_open_nofollow(fs_ctx, fs_path->data, O_RDONLY, 0);
+if (fd == -1) {
+return -1;
+}
+ret = fchmod(fd, credp->fc_mode);
+close_preserve_errno(fd);
 
 return ret;
 }

[Qemu-devel] [PATCH RFC 14/36] 9pfs: remove side-effects in local_init()

2017-01-30 Thread Greg Kurz

If this function fails, it should not modify *ctx.

This doesn't fix any bug, it is just preparatory cleanup.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   37 +++--
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index b75ad452decc..443197841780 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1565,9 +1565,25 @@ static int local_ioc_getversion(FsContext *ctx, V9fsPath 
*path,
 
 static int local_init(FsContext *ctx)
 {
-int err = 0;
 struct statfs stbuf;
 
+#ifdef FS_IOC_GETVERSION
+/*
+ * use ioc_getversion only if the iocl is definied
+ */
+if (statfs(ctx->fs_root, &stbuf) < 0) {
+return -1;
+}
+switch (stbuf.f_type) {
+case EXT2_SUPER_MAGIC:
+case BTRFS_SUPER_MAGIC:
+case REISERFS_SUPER_MAGIC:
+case XFS_SUPER_MAGIC:
+ctx->exops.get_st_gen = local_ioc_getversion;
+break;
+}
+#endif
+
 if (ctx->export_flags & V9FS_SM_PASSTHROUGH) {
 ctx->xops = passthrough_xattr_ops;
 } else if (ctx->export_flags & V9FS_SM_MAPPED) {
@@ -1582,23 +1598,8 @@ static int local_init(FsContext *ctx)
 ctx->xops = passthrough_xattr_ops;
 }
 ctx->export_flags |= V9FS_PATHNAME_FSCONTEXT;
-#ifdef FS_IOC_GETVERSION
-/*
- * use ioc_getversion only if the iocl is definied
- */
-err = statfs(ctx->fs_root, &stbuf);
-if (!err) {
-switch (stbuf.f_type) {
-case EXT2_SUPER_MAGIC:
-case BTRFS_SUPER_MAGIC:
-case REISERFS_SUPER_MAGIC:
-case XFS_SUPER_MAGIC:
-ctx->exops.get_st_gen = local_ioc_getversion;
-break;
-}
-}
-#endif
-return err;
+
+return 0;
 }
 
 static int local_parse_opts(QemuOpts *opts, struct FsDriverEntry *fse)

[Qemu-devel] [PATCH RFC 29/36] 9pfs: local: remove: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for the "passthrough" and "mapped" security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   28 +++-
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 364da435350b..573852a55a00 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1432,8 +1432,17 @@ err_out:
 
 static int local_remove(FsContext *ctx, const char *path)
 {
-int err;
-char *buffer;
+char *dirpath = local_dirname(path);
+char *name = local_basename(path);
+struct stat stbuf;
+int flags = 0;
+int err = -1;
+int dirfd;
+
+dirfd = local_opendir_nofollow(ctx, dirpath);
+if (dirfd) {
+goto out;
+}
 
 if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
 err =  local_pre_remove_mapped_file(ctx, path);
@@ -1442,10 +1451,19 @@ static int local_remove(FsContext *ctx, const char 
*path)
 }
 }
 
-buffer = rpath(ctx, path);
-err = remove(buffer);
-g_free(buffer);
+if (fstatat(dirfd, path, &stbuf, AT_SYMLINK_NOFOLLOW) < 0) {
+goto err_out;
+}
+if (S_ISDIR(stbuf.st_mode)) {
+flags |= AT_REMOVEDIR;
+}
+
+err = unlinkat(dirfd, name, flags);
 err_out:
+close_preserve_errno(dirfd);
+out:
+g_free(name);
+g_free(dirpath);
 return err;
 }

[Qemu-devel] [PATCH RFC 25/36] 9pfs: local: symlink: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for the "passthrough" security model.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   26 ++
 1 file changed, 10 insertions(+), 16 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 9dfa3e306245..bbc08184564f 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1100,29 +1100,25 @@ static int local_symlink_passthrough(FsContext *fs_ctx, 
const char *oldpath,
  V9fsPath *dir_path, const char *name,
  FsCred *credp)
 {
-int err = -1;
-int serrno = 0;
-char *newpath;
-V9fsString fullname;
-char *buffer = NULL;
+int dirfd, err;
 
-v9fs_string_init(&fullname);
-v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
-newpath = fullname.data;
+dirfd = local_opendir_nofollow(fs_ctx, dir_path->data);
+if (dirfd == -1) {
+return -1;
+}
 
-buffer = rpath(fs_ctx, newpath);
-err = symlink(oldpath, buffer);
+err = symlinkat(oldpath, dirfd, name);
 if (err) {
 goto out;
 }
-err = lchown(buffer, credp->fc_uid, credp->fc_gid);
+err = fchownat(dirfd, name, credp->fc_uid, credp->fc_gid,
+   AT_SYMLINK_NOFOLLOW);
 if (err == -1) {
 /*
  * If we fail to change ownership and if we are
  * using security model none. Ignore the error
  */
 if ((fs_ctx->export_flags & V9FS_SEC_MASK) != V9FS_SM_NONE) {
-serrno = errno;
 goto err_end;
 } else {
 err = 0;
@@ -1131,11 +1127,9 @@ static int local_symlink_passthrough(FsContext *fs_ctx, 
const char *oldpath,
 goto out;
 
 err_end:
-remove(buffer);
-errno = serrno;
+unlinkat_preserve_errno(dirfd, name, 0);
 out:
-g_free(buffer);
-v9fs_string_free(&fullname);
+close_preserve_errno(dirfd);
 return err;
 }

[Qemu-devel] [PATCH RFC 18/36] 9pfs: local: open/opendir: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for all security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   31 +--
 hw/9pfs/9p-local.h |   20 
 2 files changed, 41 insertions(+), 10 deletions(-)
 create mode 100644 hw/9pfs/9p-local.h

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 8a1d52cd6c2a..783b4006ffd4 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "9p.h"
+#include "9p-local.h"
 #include "9p-xattr.h"
 #include "9p-util.h"
 #include "fsdev/qemu-fsdev.h"   /* local_ops */
@@ -48,6 +49,18 @@ struct local_data {
 int mountfd;
 };
 
+int local_open_nofollow(FsContext *fs_ctx, const char *path, int flags,
+mode_t mode)
+{
+struct local_data *data = fs_ctx->private;
+return openat_nofollow(data->mountfd, path, flags, mode);
+}
+
+int local_opendir_nofollow(FsContext *fs_ctx, const char *path)
+{
+return local_open_nofollow(fs_ctx, path, O_DIRECTORY | O_RDONLY, 0);
+}
+
 #define VIRTFS_META_DIR ".virtfs_metadata"
 
 static char *local_mapped_attr_path(FsContext *ctx, const char *path)
@@ -359,13 +372,9 @@ static int local_closedir(FsContext *ctx, V9fsFidOpenState 
*fs)
 static int local_open(FsContext *ctx, V9fsPath *fs_path, int flags,
   V9fsFidOpenState *fs)
 {
-char *buffer;
-char *path = fs_path->data;
 int fd;
 
-buffer = rpath(ctx, path);
-fd = open(buffer, flags | O_NOFOLLOW);
-g_free(buffer);
+fd = local_open_nofollow(ctx, fs_path->data, flags, 0);
 if (fd == -1) {
 return -1;
 }
@@ -376,13 +385,15 @@ static int local_open(FsContext *ctx, V9fsPath *fs_path, 
int flags,
 static int local_opendir(FsContext *ctx, V9fsPath *fs_path,
  V9fsFidOpenState *fs)
 {
-char *buffer;
-char *path = fs_path->data;
+int dirfd;
 DIR *stream;
 
-buffer = rpath(ctx, path);
-stream = opendir(buffer);
-g_free(buffer);
+dirfd = local_opendir_nofollow(ctx, fs_path->data);
+if (dirfd == -1) {
+return -1;
+}
+
+stream = fdopendir(dirfd);
 if (!stream) {
 return -1;
 }
diff --git a/hw/9pfs/9p-local.h b/hw/9pfs/9p-local.h
new file mode 100644
index ..32c72749d9df
--- /dev/null
+++ b/hw/9pfs/9p-local.h
@@ -0,0 +1,20 @@
+/*
+ * 9p local backend utilities
+ *
+ * Copyright IBM, Corp. 2017
+ *
+ * Authors:
+ *  Greg Kurz 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_9P_LOCAL_H
+#define QEMU_9P_LOCAL_H
+
+int local_open_nofollow(FsContext *fs_ctx, const char *path, int flags,
+mode_t mode);
+int local_opendir_nofollow(FsContext *fs_ctx, const char *path);
+
+#endif

[Qemu-devel] [PATCH RFC 30/36] 9pfs: local: unlinkat: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for the "passthrough" and "mapped" security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 573852a55a00..60edfb25f8a5 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1609,25 +1609,23 @@ static int local_unlinkat(FsContext *ctx, V9fsPath 
*dir, const char *name,
   int flags)
 {
 int ret;
-V9fsString fullname;
-char *buffer;
+int dirfd;
 
-v9fs_string_init(&fullname);
-v9fs_string_sprintf(&fullname, "%s/%s", dir->data, name);
+dirfd = local_opendir_nofollow(ctx, dir->data);
+if (dirfd == -1) {
+return -1;
+}
 
 if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
 ret = local_pre_unlinkat_mapped_file(ctx, dir, name, flags);
-if (ret < 0) {
+if (ret) {
 goto err_out;
 }
 }
-/* Remove the name finally */
-buffer = rpath(ctx, fullname.data);
-ret = remove(buffer);
-g_free(buffer);
 
+ret = unlinkat(dirfd, name, flags);
 err_out:
-v9fs_string_free(&fullname);
+close_preserve_errno(dirfd);
 return ret;
 }

[Qemu-devel] [PATCH RFC 28/36] 9pfs: local: rename: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for the "passthrough" and "mapped" security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   44 +++-
 1 file changed, 31 insertions(+), 13 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index de860db3d70b..364da435350b 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -84,6 +84,14 @@ static void unlinkat_preserve_errno(int dirfd, const char 
*path, int flags)
 errno = serrno;
 }
 
+static void renameat_preserve_errno(int odirfd, const char *opath, int ndirfd,
+const char *npath)
+{
+int serrno = errno;
+renameat(odirfd, opath, ndirfd, npath);
+errno = serrno;
+}
+
 #define VIRTFS_META_DIR ".virtfs_metadata"
 
 static char *local_mapped_attr_path(FsContext *ctx, const char *path)
@@ -1259,16 +1267,22 @@ static int local_post_rename_mapped_file(FsContext 
*ctx, const char *oldpath,
 static int local_rename(FsContext *ctx, const char *oldpath,
 const char *newpath)
 {
-int err;
-char *buffer, *buffer1;
-int serrno;
+char *odirpath = local_dirname(oldpath);
+char *oname = local_basename(oldpath);
+char *ndirpath = local_dirname(newpath);
+char *nname = local_basename(newpath);
+int err = -1;
+int odirfd, ndirfd;
 
-buffer = rpath(ctx, oldpath);
-buffer1 = rpath(ctx, newpath);
-err = rename(buffer, buffer1);
-if (err < 0) {
+odirfd = local_opendir_nofollow(ctx, odirpath);
+if (odirfd == -1) {
 goto out;
 }
+ndirfd = local_opendir_nofollow(ctx, ndirpath);
+if (ndirfd < 0) {
+goto out_close_odirfd;
+}
+err = renameat(odirfd, oname, ndirfd, nname);
 
 if (ctx->export_flags & V9FS_SM_MAPPED_FILE) {
 err = local_post_rename_mapped_file(ctx, oldpath, newpath);
@@ -1276,15 +1290,19 @@ static int local_rename(FsContext *ctx, const char 
*oldpath,
 goto err_out;
 }
 }
-goto out;
+goto out_close_ndirfd;
 
 err_out:
-serrno = errno;
-rename(buffer1, buffer);
-errno = serrno;
+renameat_preserve_errno(ndirfd, nname, odirfd, oname);
+out_close_ndirfd:
+close_preserve_errno(ndirfd);
+out_close_odirfd:
+close_preserve_errno(odirfd);
 out:
-g_free(buffer);
-g_free(buffer1);
+g_free(nname);
+g_free(ndirpath);
+g_free(oname);
+g_free(odirpath);
 return err;
 }

[Qemu-devel] [PATCH RFC 19/36] 9pfs: local: utimensat: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for all security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   37 ++---
 hw/9pfs/9p-local.h |2 ++
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 783b4006ffd4..a1fff04c3219 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -61,6 +61,22 @@ int local_opendir_nofollow(FsContext *fs_ctx, const char 
*path)
 return local_open_nofollow(fs_ctx, path, O_DIRECTORY | O_RDONLY, 0);
 }
 
+char *local_dirname(const char *path)
+{
+char *tmp_path = g_strdup(path);
+char *result = g_strdup(dirname(tmp_path));
+g_free(tmp_path);
+return result;
+}
+
+char *local_basename(const char *path)
+{
+char *tmp_path = g_strdup(path);
+char *result = g_strdup(basename(tmp_path));
+g_free(tmp_path);
+return result;
+}
+
 #define VIRTFS_META_DIR ".virtfs_metadata"
 
 static char *local_mapped_attr_path(FsContext *ctx, const char *path)
@@ -1313,16 +1329,23 @@ static int local_chown(FsContext *fs_ctx, V9fsPath 
*fs_path, FsCred *credp)
 g_assert_not_reached();
 }
 
-static int local_utimensat(FsContext *s, V9fsPath *fs_path,
+static int local_utimensat(FsContext *fs_ctx, V9fsPath *fs_path,
const struct timespec *buf)
 {
-char *buffer;
-int ret;
-char *path = fs_path->data;
+char *dirpath = local_dirname(fs_path->data);
+char *name = local_basename(fs_path->data);
+int dirfd, ret = -1;
 
-buffer = rpath(s, path);
-ret = qemu_utimens(buffer, buf);
-g_free(buffer);
+dirfd = local_opendir_nofollow(fs_ctx, dirpath);
+if (dirfd == -1) {
+goto out;
+}
+
+ret = utimensat(dirfd, name, buf, AT_SYMLINK_NOFOLLOW);
+close_preserve_errno(dirfd);
+out:
+g_free(dirpath);
+g_free(name);
 return ret;
 }
 
diff --git a/hw/9pfs/9p-local.h b/hw/9pfs/9p-local.h
index 32c72749d9df..2732da2b5d55 100644
--- a/hw/9pfs/9p-local.h
+++ b/hw/9pfs/9p-local.h
@@ -16,5 +16,7 @@
 int local_open_nofollow(FsContext *fs_ctx, const char *path, int flags,
 mode_t mode);
 int local_opendir_nofollow(FsContext *fs_ctx, const char *path);
+char *local_dirname(const char *path);
+char *local_basename(const char *path);
 
 #endif

[Qemu-devel] [PATCH RFC 33/36] 9pfs: local: lgetxattr: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for all security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-posix-acl.c  |   16 ++--
 hw/9pfs/9p-xattr-user.c |8 +---
 hw/9pfs/9p-xattr.c  |8 +---
 3 files changed, 4 insertions(+), 28 deletions(-)

diff --git a/hw/9pfs/9p-posix-acl.c b/hw/9pfs/9p-posix-acl.c
index ec003181cd33..9435e27a368c 100644
--- a/hw/9pfs/9p-posix-acl.c
+++ b/hw/9pfs/9p-posix-acl.c
@@ -25,13 +25,7 @@
 static ssize_t mp_pacl_getxattr(FsContext *ctx, const char *path,
 const char *name, void *value, size_t size)
 {
-char *buffer;
-ssize_t ret;
-
-buffer = rpath(ctx, path);
-ret = lgetxattr(buffer, MAP_ACL_ACCESS, value, size);
-g_free(buffer);
-return ret;
+return local_getxattr_nofollow(ctx, path, MAP_ACL_ACCESS, value, size);
 }
 
 static ssize_t mp_pacl_listxattr(FsContext *ctx, const char *path,
@@ -89,13 +83,7 @@ static int mp_pacl_removexattr(FsContext *ctx,
 static ssize_t mp_dacl_getxattr(FsContext *ctx, const char *path,
 const char *name, void *value, size_t size)
 {
-char *buffer;
-ssize_t ret;
-
-buffer = rpath(ctx, path);
-ret = lgetxattr(buffer, MAP_ACL_DEFAULT, value, size);
-g_free(buffer);
-return ret;
+return local_getxattr_nofollow(ctx, path, MAP_ACL_DEFAULT, value, size);
 }
 
 static ssize_t mp_dacl_listxattr(FsContext *ctx, const char *path,
diff --git a/hw/9pfs/9p-xattr-user.c b/hw/9pfs/9p-xattr-user.c
index f87530c8b526..4071fbc4c086 100644
--- a/hw/9pfs/9p-xattr-user.c
+++ b/hw/9pfs/9p-xattr-user.c
@@ -20,9 +20,6 @@
 static ssize_t mp_user_getxattr(FsContext *ctx, const char *path,
 const char *name, void *value, size_t size)
 {
-char *buffer;
-ssize_t ret;
-
 if (strncmp(name, "user.virtfs.", 12) == 0) {
 /*
  * Don't allow fetch of user.virtfs namesapce
@@ -31,10 +28,7 @@ static ssize_t mp_user_getxattr(FsContext *ctx, const char 
*path,
 errno = ENOATTR;
 return -1;
 }
-buffer = rpath(ctx, path);
-ret = lgetxattr(buffer, name, value, size);
-g_free(buffer);
-return ret;
+return local_getxattr_nofollow(ctx, path, name, value, size);
 }
 
 static ssize_t mp_user_listxattr(FsContext *ctx, const char *path,
diff --git a/hw/9pfs/9p-xattr.c b/hw/9pfs/9p-xattr.c
index ea0695f37242..29f4f940a23f 100644
--- a/hw/9pfs/9p-xattr.c
+++ b/hw/9pfs/9p-xattr.c
@@ -302,13 +302,7 @@ int v9fs_remove_xattr(FsContext *ctx,
 ssize_t pt_getxattr(FsContext *ctx, const char *path, const char *name,
 void *value, size_t size)
 {
-char *buffer;
-ssize_t ret;
-
-buffer = rpath(ctx, path);
-ret = lgetxattr(buffer, name, value, size);
-g_free(buffer);
-return ret;
+return local_getxattr_nofollow(ctx, path, name, value, size);
 }
 
 int pt_setxattr(FsContext *ctx, const char *path, const char *name, void 
*value,

Re: [Qemu-devel] Towards an ivshmem 2.0?

2017-01-30 Thread Markus Armbruster

Jan Kiszka  writes:

> On 2017-01-30 09:00, Markus Armbruster wrote:
>> Jan Kiszka  writes:
>> 
>>> On 2017-01-27 20:36, Markus Armbruster wrote:
 Jan Kiszka  writes:

> On 2017-01-23 15:19, Markus Armbruster wrote:
>> Jan Kiszka  writes:
>>
>>> Hi,
>>>
>>> some of you may know that we are using a shared memory device similar to
>>> ivshmem in the partitioning hypervisor Jailhouse [1].
>>>
>>> We started as being compatible to the original ivshmem that QEMU
>>> implements, but we quickly deviated in some details, and in the recent
>>> months even more. Some of the deviations are related to making the
>>> implementation simpler. The new ivshmem takes <500 LoC - Jailhouse is
>>
>> Compare: hw/misc/ivshmem.c ~1000 SLOC, measured with sloccount.
>
> That difference comes from remote/migration support and general QEMU
> integration - likely not very telling due to the different environments.

 Plausible.

>>> aiming at safety critical systems and, therefore, a small code base.
>>> Other changes address deficits in the original design, like missing
>>> life-cycle management.
>>>
>>> Now the question is if there is interest in defining a common new
>>> revision of this device and maybe also of some protocols used on top,
>>> such as virtual network links. Ideally, this would enable us to share
>>> Linux drivers. We will definitely go for upstreaming at least a network
>>> driver such as [2], a UIO driver and maybe also a serial port/console.
>>>
>>> I've attached a first draft of the specification of our new ivshmem
>>> device. A working implementation can be found in the wip/ivshmem2 branch
>>> of Jailhouse [3], the corresponding ivshmem-net driver in [4].
>>>
>>> Deviations from the original design:
>>>
>>> - Only two peers per link
>>
>> Uh, define "link".
>
> VMs are linked via a common shared memory. Interrupt delivery follows
> that route as well.
>
>>
>>>   This simplifies the implementation and also the interfaces (think of
>>>   life-cycle management in a multi-peer environment). Moreover, we do
>>>   not have an urgent use case for multiple peers, thus also not
>>>   reference for a protocol that could be used in such setups. If someone
>>>   else happens to share such a protocol, it would be possible to discuss
>>>   potential extensions and their implications.
>>>
>>> - Side-band registers to discover and configure share memory regions
>>>
>>>   This was one of the first changes: We removed the memory regions from
>>>   the PCI BARs and gave them special configuration space registers. By
>>>   now, these registers are embedded in a PCI capability. The reasons are
>>>   that Jailhouse does not allow to relocate the regions in guest address
>>>   space (but other hypervisors may if they like to) and that we now have
>>>   up to three of them.
>>
>> I'm afraid I don't quite understand the change, nor the rationale.  I
>> guess I could figure out the former by studying the specification.
>
> a) It's a Jailhouse thing (we disallow the guest to move the regions
>around in its address space)
> b) With 3 regions + MSI-X + MMIO registers, we run out of BARs (or
>would have to downgrade them to 32 bit)

 Have you considered putting your three shared memory regions in memory
 consecutively, so they can be covered by a single BAR?  Similar to how a
 single BAR covers both MSI-X table and PBA.
>>>
>>> Would still require to pass three times some size information (each
>>> region can be different or empty/non-existent).
>> 
>> Yes.  Precedence: location of MSI-X table and PBA are specified in the
>> MSI-X Capability Structure as offset and BIR.
>> 
>>> Moreover, a) is not
>>> possible then without ugly modifications to the guest because they
>>> expect BAR-based regions to be relocatable.
>> 
>> Can you explain why not letting the guest map the shared memory into its
>> address space on its own just like any other piece of device memory is a
>> requirement?
>
> It requires reconfiguration of the sensitive 2nd level page tables
> during runtime of the guest. We are avoiding the neccessery checking and
> synchronization measures so far which reduces code complexity further.

You mean the hypervisor needs to act when the guest maps BARs, and that
gives the guest an attack vector?

Don't you have to deal with that anyway, for other PCI devices?

This is just out of curiosity, feel free to ignore me :)

> BTW, PCI has a similar concept of static assignment (PCI EA), but that
> is unfortunately incompatible to our needs [1].

Interesting.

>> 
>>> - Changed PCI base class code to 0xff (unspecified class)
>>
>> Changed from 0x5 (memory controller).
>
> Right.
>

[Qemu-devel] [PATCH RFC 23/36] 9pfs: local: mknod/mkdir/open2: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for the "passthrough" security model.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |  128 
 1 file changed, 59 insertions(+), 69 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index dbc56b16979c..48d46b6abd28 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -77,6 +77,13 @@ char *local_basename(const char *path)
 return result;
 }
 
+static void unlinkat_preserve_errno(int dirfd, const char *path, int flags)
+{
+int serrno = errno;
+unlinkat(dirfd, path, flags);
+errno = serrno;
+}
+
 #define VIRTFS_META_DIR ".virtfs_metadata"
 
 static char *local_mapped_attr_path(FsContext *ctx, const char *path)
@@ -319,31 +326,41 @@ static int local_set_xattr(const char *path, FsCred 
*credp)
 return 0;
 }
 
-static int local_post_create_passthrough(FsContext *fs_ctx, const char *path,
- FsCred *credp)
+static int local_post_create_passthrough(FsContext *fs_ctx, int dirfd,
+ const char *name, FsCred *credp)
 {
-char *buffer;
+int fd, err = -1;
 
-buffer = rpath(fs_ctx, path);
-if (lchown(buffer, credp->fc_uid, credp->fc_gid) < 0) {
+if (fchownat(dirfd, name, credp->fc_uid, credp->fc_gid,
+ AT_SYMLINK_NOFOLLOW | AT_EMPTY_PATH) < 0) {
 /*
  * If we fail to change ownership and if we are
  * using security model none. Ignore the error
  */
 if ((fs_ctx->export_flags & V9FS_SEC_MASK) != V9FS_SM_NONE) {
-goto err;
+return -1;
 }
 }
 
-if (chmod(buffer, credp->fc_mode & 0) < 0) {
-goto err;
+if (name[0]) {
+fd = openat_nofollow(dirfd, name, O_RDONLY, 0);
+if (fd == -1) {
+return -1;
+}
+} else {
+fd = dirfd;
 }
 
-g_free(buffer);
-return 0;
-err:
-g_free(buffer);
-return -1;
+if (fchmod(fd, credp->fc_mode & 0) < 0) {
+goto out;
+}
+err = 0;
+
+out:
+if (name[0]) {
+close_preserve_errno(fd);
+}
+return err;
 }
 
 static int readlink_nofollow(FsContext *fs_ctx, const char *path, char *buf,
@@ -637,34 +654,27 @@ out:
 static int local_mknod_passthrough(FsContext *fs_ctx, V9fsPath *dir_path,
const char *name, FsCred *credp)
 {
-char *path;
-int err = -1;
-int serrno = 0;
-V9fsString fullname;
-char *buffer = NULL;
+int dirfd, err;
 
-v9fs_string_init(&fullname);
-v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
-path = fullname.data;
+dirfd = local_opendir_nofollow(fs_ctx, dir_path->data);
+if (dirfd == -1) {
+return -1;
+}
 
-buffer = rpath(fs_ctx, path);
-err = mknod(buffer, credp->fc_mode, credp->fc_rdev);
+err = mknodat(dirfd, name, credp->fc_mode, credp->fc_rdev);
 if (err == -1) {
 goto out;
 }
-err = local_post_create_passthrough(fs_ctx, path, credp);
+err = local_post_create_passthrough(fs_ctx, dirfd, name, credp);
 if (err == -1) {
-serrno = errno;
 goto err_end;
 }
 goto out;
 
 err_end:
-remove(buffer);
-errno = serrno;
+unlinkat_preserve_errno(dirfd, name, 0);
 out:
-g_free(buffer);
-v9fs_string_free(&fullname);
+close_preserve_errno(dirfd);
 return err;
 }
 
@@ -755,34 +765,27 @@ out:
 static int local_mkdir_passthrough(FsContext *fs_ctx, V9fsPath *dir_path,
const char *name, FsCred *credp)
 {
-char *path;
-int err = -1;
-int serrno = 0;
-V9fsString fullname;
-char *buffer = NULL;
+int dirfd, err;
 
-v9fs_string_init(&fullname);
-v9fs_string_sprintf(&fullname, "%s/%s", dir_path->data, name);
-path = fullname.data;
+dirfd = local_opendir_nofollow(fs_ctx, dir_path->data);
+if (dirfd == -1) {
+return -1;
+}
 
-buffer = rpath(fs_ctx, path);
-err = mkdir(buffer, credp->fc_mode);
+err = mkdirat(dirfd, name, credp->fc_mode);
 if (err == -1) {
 goto out;
 }
-err = local_post_create_passthrough(fs_ctx, path, credp);
+err = local_post_create_passthrough(fs_ctx, dirfd, name, credp);
 if (err == -1) {
-serrno = errno;
 goto err_end;
 }
 goto out;
 
 err_end:
-remove(buffer);
-errno = serrno;
+unlinkat_preserve_errno(dirfd, name, AT_REMOVEDIR);
 out:
-g_free(buffer);
-v9fs_string_free(&fullname);
+close_preserve_errno(dirfd);
 return err;
 }
 
@@ -939,44 +942,31 @@ static int local_open2_passthrough(FsContext *fs_ctx, 
V9fsPath *dir_path,
const char *name, int flags, FsCred *credp,
V9fsFidOpenState *fs)
 {
-char *path;
-int fd = -1;
 int err = -1;
-int serrno = 0;
-V9fsString fullname;
-char *buffer = NULL;
+

[Qemu-devel] [PATCH RFC 35/36] 9pfs: local: lsetxattr: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for all security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-posix-acl.c  |   18 --
 hw/9pfs/9p-xattr-user.c |8 +---
 hw/9pfs/9p-xattr.c  |8 +---
 3 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/hw/9pfs/9p-posix-acl.c b/hw/9pfs/9p-posix-acl.c
index 9435e27a368c..0154e2a7605f 100644
--- a/hw/9pfs/9p-posix-acl.c
+++ b/hw/9pfs/9p-posix-acl.c
@@ -50,13 +50,8 @@ static ssize_t mp_pacl_listxattr(FsContext *ctx, const char 
*path,
 static int mp_pacl_setxattr(FsContext *ctx, const char *path, const char *name,
 void *value, size_t size, int flags)
 {
-char *buffer;
-int ret;
-
-buffer = rpath(ctx, path);
-ret = lsetxattr(buffer, MAP_ACL_ACCESS, value, size, flags);
-g_free(buffer);
-return ret;
+return local_setxattr_nofollow(ctx, path, MAP_ACL_ACCESS, value, size,
+   flags);
 }
 
 static int mp_pacl_removexattr(FsContext *ctx,
@@ -108,13 +103,8 @@ static ssize_t mp_dacl_listxattr(FsContext *ctx, const 
char *path,
 static int mp_dacl_setxattr(FsContext *ctx, const char *path, const char *name,
 void *value, size_t size, int flags)
 {
-char *buffer;
-int ret;
-
-buffer = rpath(ctx, path);
-ret = lsetxattr(buffer, MAP_ACL_DEFAULT, value, size, flags);
-g_free(buffer);
-return ret;
+return local_setxattr_nofollow(ctx, path, MAP_ACL_DEFAULT, value, size,
+   flags);
 }
 
 static int mp_dacl_removexattr(FsContext *ctx,
diff --git a/hw/9pfs/9p-xattr-user.c b/hw/9pfs/9p-xattr-user.c
index 4071fbc4c086..1840a5db66f3 100644
--- a/hw/9pfs/9p-xattr-user.c
+++ b/hw/9pfs/9p-xattr-user.c
@@ -67,9 +67,6 @@ static ssize_t mp_user_listxattr(FsContext *ctx, const char 
*path,
 static int mp_user_setxattr(FsContext *ctx, const char *path, const char *name,
 void *value, size_t size, int flags)
 {
-char *buffer;
-int ret;
-
 if (strncmp(name, "user.virtfs.", 12) == 0) {
 /*
  * Don't allow fetch of user.virtfs namesapce
@@ -78,10 +75,7 @@ static int mp_user_setxattr(FsContext *ctx, const char 
*path, const char *name,
 errno = EACCES;
 return -1;
 }
-buffer = rpath(ctx, path);
-ret = lsetxattr(buffer, name, value, size, flags);
-g_free(buffer);
-return ret;
+return local_setxattr_nofollow(ctx, path, name, value, size, flags);
 }
 
 static int mp_user_removexattr(FsContext *ctx,
diff --git a/hw/9pfs/9p-xattr.c b/hw/9pfs/9p-xattr.c
index 08df02e0bab2..053ab6eff692 100644
--- a/hw/9pfs/9p-xattr.c
+++ b/hw/9pfs/9p-xattr.c
@@ -326,13 +326,7 @@ ssize_t pt_getxattr(FsContext *ctx, const char *path, 
const char *name,
 int pt_setxattr(FsContext *ctx, const char *path, const char *name, void 
*value,
 size_t size, int flags)
 {
-char *buffer;
-int ret;
-
-buffer = rpath(ctx, path);
-ret = lsetxattr(buffer, name, value, size, flags);
-g_free(buffer);
-return ret;
+return local_setxattr_nofollow(ctx, path, name, value, size, flags);
 }
 
 int pt_removexattr(FsContext *ctx, const char *path, const char *name)

[Qemu-devel] [PATCH RFC 26/36] 9pfs: local: chown: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for the "passthrough" security model.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index bbc08184564f..5e320917c484 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -1287,13 +1287,21 @@ out:
 static int local_chown_passthrough(FsContext *fs_ctx, V9fsPath *fs_path,
FsCred *credp)
 {
-char *buffer;
-int ret = -1;
-char *path = fs_path->data;
+char *dirpath = local_dirname(fs_path->data);
+char *name = local_basename(fs_path->data);
+int dirfd, ret = -1;
 
-buffer = rpath(fs_ctx, path);
-ret = lchown(buffer, credp->fc_uid, credp->fc_gid);
-g_free(buffer);
+dirfd = local_opendir_nofollow(fs_ctx, dirpath);
+if (dirfd == -1) {
+goto out;
+}
+ret = fchownat(dirfd, name, credp->fc_uid, credp->fc_gid,
+   AT_SYMLINK_NOFOLLOW);
+close_preserve_errno(dirfd);
+
+out:
+g_free(name);
+g_free(dirpath);
 return ret;
 }

[Qemu-devel] [PATCH RFC 36/36] 9pfs: local: lremovexattr: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for all security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-posix-acl.c  |   14 --
 hw/9pfs/9p-xattr-user.c |   12 +++-
 hw/9pfs/9p-xattr.c  |8 +---
 3 files changed, 8 insertions(+), 26 deletions(-)

diff --git a/hw/9pfs/9p-posix-acl.c b/hw/9pfs/9p-posix-acl.c
index 0154e2a7605f..73aee0e9d1fa 100644
--- a/hw/9pfs/9p-posix-acl.c
+++ b/hw/9pfs/9p-posix-acl.c
@@ -54,14 +54,12 @@ static int mp_pacl_setxattr(FsContext *ctx, const char 
*path, const char *name,
flags);
 }
 
-static int mp_pacl_removexattr(FsContext *ctx,
-   const char *path, const char *name)
+static int mp_pacl_removexattr(FsContext *ctx, const char *path,
+   const char *name)
 {
 int ret;
-char *buffer;
 
-buffer = rpath(ctx, path);
-ret  = lremovexattr(buffer, MAP_ACL_ACCESS);
+ret = local_removexattr_nofollow(ctx, MAP_ACL_ACCESS, name);
 if (ret == -1 && errno == ENODATA) {
 /*
  * We don't get ENODATA error when trying to remove a
@@ -71,7 +69,6 @@ static int mp_pacl_removexattr(FsContext *ctx,
 errno = 0;
 ret = 0;
 }
-g_free(buffer);
 return ret;
 }
 
@@ -111,10 +108,8 @@ static int mp_dacl_removexattr(FsContext *ctx,
const char *path, const char *name)
 {
 int ret;
-char *buffer;
 
-buffer = rpath(ctx, path);
-ret  = lremovexattr(buffer, MAP_ACL_DEFAULT);
+ret = local_removexattr_nofollow(ctx, MAP_ACL_DEFAULT, name);
 if (ret == -1 && errno == ENODATA) {
 /*
  * We don't get ENODATA error when trying to remove a
@@ -124,7 +119,6 @@ static int mp_dacl_removexattr(FsContext *ctx,
 errno = 0;
 ret = 0;
 }
-g_free(buffer);
 return ret;
 }
 
diff --git a/hw/9pfs/9p-xattr-user.c b/hw/9pfs/9p-xattr-user.c
index 1840a5db66f3..ec606a4146a7 100644
--- a/hw/9pfs/9p-xattr-user.c
+++ b/hw/9pfs/9p-xattr-user.c
@@ -78,12 +78,9 @@ static int mp_user_setxattr(FsContext *ctx, const char 
*path, const char *name,
 return local_setxattr_nofollow(ctx, path, name, value, size, flags);
 }
 
-static int mp_user_removexattr(FsContext *ctx,
-   const char *path, const char *name)
+static int mp_user_removexattr(FsContext *ctx, const char *path,
+   const char *name)
 {
-char *buffer;
-int ret;
-
 if (strncmp(name, "user.virtfs.", 12) == 0) {
 /*
  * Don't allow fetch of user.virtfs namesapce
@@ -92,10 +89,7 @@ static int mp_user_removexattr(FsContext *ctx,
 errno = EACCES;
 return -1;
 }
-buffer = rpath(ctx, path);
-ret = lremovexattr(buffer, name);
-g_free(buffer);
-return ret;
+return local_removexattr_nofollow(ctx, path, name);
 }
 
 XattrOperations mapped_user_xattr = {
diff --git a/hw/9pfs/9p-xattr.c b/hw/9pfs/9p-xattr.c
index 053ab6eff692..bbfef248e767 100644
--- a/hw/9pfs/9p-xattr.c
+++ b/hw/9pfs/9p-xattr.c
@@ -331,13 +331,7 @@ int pt_setxattr(FsContext *ctx, const char *path, const 
char *name, void *value,
 
 int pt_removexattr(FsContext *ctx, const char *path, const char *name)
 {
-char *buffer;
-int ret;
-
-buffer = rpath(ctx, path);
-ret = lremovexattr(path, name);
-g_free(buffer);
-return ret;
+return local_removexattr_nofollow(ctx, path, name);
 }
 
 ssize_t notsup_getxattr(FsContext *ctx, const char *path, const char *name,

[Qemu-devel] [PATCH RFC 32/36] 9pfs: local: lstat: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for the "passthrough" and "mapped" security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-local.c |   36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/hw/9pfs/9p-local.c b/hw/9pfs/9p-local.c
index 60edfb25f8a5..b46a81792ecf 100644
--- a/hw/9pfs/9p-local.c
+++ b/hw/9pfs/9p-local.c
@@ -168,12 +168,17 @@ static void local_mapped_file_attr(FsContext *ctx, const 
char *path,
 
 static int local_lstat(FsContext *fs_ctx, V9fsPath *fs_path, struct stat 
*stbuf)
 {
-int err;
-char *buffer;
-char *path = fs_path->data;
+int err = -1;
+char *dirpath = local_dirname(fs_path->data);
+char *name = local_basename(fs_path->data);
+int dirfd;
 
-buffer = rpath(fs_ctx, path);
-err =  lstat(buffer, stbuf);
+dirfd = local_opendir_nofollow(fs_ctx, dirpath);
+if (dirfd == -1) {
+goto out;
+}
+
+err = fstatat(dirfd, name, stbuf, AT_SYMLINK_NOFOLLOW);
 if (err) {
 goto err_out;
 }
@@ -183,25 +188,32 @@ static int local_lstat(FsContext *fs_ctx, V9fsPath 
*fs_path, struct stat *stbuf)
 gid_t tmp_gid;
 mode_t tmp_mode;
 dev_t tmp_dev;
-if (getxattr(buffer, "user.virtfs.uid", &tmp_uid, sizeof(uid_t)) > 0) {
+
+if (fgetxattrat(dirfd, name, "user.virtfs.uid", &tmp_uid,
+sizeof(uid_t)) > 0) {
 stbuf->st_uid = le32_to_cpu(tmp_uid);
 }
-if (getxattr(buffer, "user.virtfs.gid", &tmp_gid, sizeof(gid_t)) > 0) {
+if (fgetxattrat(dirfd, name, "user.virtfs.gid", &tmp_gid,
+sizeof(gid_t)) > 0) {
 stbuf->st_gid = le32_to_cpu(tmp_gid);
 }
-if (getxattr(buffer, "user.virtfs.mode",
-&tmp_mode, sizeof(mode_t)) > 0) {
+if (fgetxattrat(dirfd, name, "user.virtfs.mode", &tmp_mode,
+sizeof(mode_t)) > 0) {
 stbuf->st_mode = le32_to_cpu(tmp_mode);
 }
-if (getxattr(buffer, "user.virtfs.rdev", &tmp_dev, sizeof(dev_t)) > 0) 
{
+if (fgetxattrat(dirfd, name, "user.virtfs.rdev", &tmp_dev,
+sizeof(dev_t)) > 0) {
 stbuf->st_rdev = le64_to_cpu(tmp_dev);
 }
 } else if (fs_ctx->export_flags & V9FS_SM_MAPPED_FILE) {
-local_mapped_file_attr(fs_ctx, path, stbuf);
+local_mapped_file_attr(fs_ctx, fs_path->data, stbuf);
 }
 
 err_out:
-g_free(buffer);
+close_preserve_errno(dirfd);
+out:
+g_free(name);
+g_free(dirpath);
 return err;
 }

[Qemu-devel] [PATCH v2] hw/core/register: Mark the device with cannot_instantiate_with_device_add_yet

2017-01-30 Thread Thomas Huth

The "qemu,register" device needs to be wired up in source code, there
is no way the user can add this device with the "-device" parameter or
the "device_add" monitor command yet.

Signed-off-by: Thomas Huth 
---
 v2: Changed patch description and comment

 hw/core/register.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/core/register.c b/hw/core/register.c
index 4bfbc50..1416691 100644
--- a/hw/core/register.c
+++ b/hw/core/register.c
@@ -274,9 +274,18 @@ void register_finalize_block(RegisterInfoArray *r_array)
 g_free(r_array);
 }
 
+static void register_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+
+/* Needs to be wired up in source code, see register_init_block32() */
+dc->cannot_instantiate_with_device_add_yet = true;
+}
+
 static const TypeInfo register_info = {
 .name  = TYPE_REGISTER,
 .parent = TYPE_DEVICE,
+.class_init = register_class_init,
 };
 
 static void register_register_types(void)
-- 
1.8.3.1

[Qemu-devel] [PATCH RFC 34/36] 9pfs: local: llistxattr: don't follow symlinks

2017-01-30 Thread Greg Kurz

This fixes CVE-2016-9602 for all security models.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-xattr.c |   30 --
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/hw/9pfs/9p-xattr.c b/hw/9pfs/9p-xattr.c
index 29f4f940a23f..08df02e0bab2 100644
--- a/hw/9pfs/9p-xattr.c
+++ b/hw/9pfs/9p-xattr.c
@@ -214,6 +214,11 @@ ssize_t pt_listxattr(FsContext *ctx, const char *path,
 return name_size;
 }
 
+static ssize_t flistxattrat(int dirfd, const char *path, char *list,
+size_t size)
+{
+return do_xattrat_op(XATTRAT_OP_LIST, dirfd, path, NULL, list, size, 0);
+}
 
 /*
  * Get the list and pass to each layer to find out whether
@@ -223,24 +228,37 @@ ssize_t v9fs_list_xattr(FsContext *ctx, const char *path,
 void *value, size_t vsize)
 {
 ssize_t size = 0;
-char *buffer;
 void *ovalue = value;
 XattrOperations *xops;
 char *orig_value, *orig_value_start;
 ssize_t xattr_len, parsed_len = 0, attr_len;
+char *dirpath, *name;
+int dirfd;
 
 /* Get the actual len */
-buffer = rpath(ctx, path);
-xattr_len = llistxattr(buffer, value, 0);
+dirpath = local_dirname(path);
+dirfd = local_opendir_nofollow(ctx, dirpath);
+g_free(dirpath);
+if (dirfd == -1) {
+return -1;
+}
+
+name = local_basename(path);
+xattr_len = flistxattrat(dirfd, name, value, 0);
 if (xattr_len <= 0) {
-g_free(buffer);
+g_free(name);
+close_preserve_errno(dirfd);
 return xattr_len;
 }
 
 /* Now fetch the xattr and find the actual size */
 orig_value = g_malloc(xattr_len);
-xattr_len = llistxattr(buffer, orig_value, xattr_len);
-g_free(buffer);
+xattr_len = flistxattrat(dirfd, name, orig_value, xattr_len);
+g_free(name);
+close_preserve_errno(dirfd);
+if (xattr_len < 0) {
+return -1;
+}
 
 /* store the orig pointer */
 orig_value_start = orig_value;

Re: [Qemu-devel] [PULL v2 00/41] Misc patches for 2017-01-27

2017-01-30 Thread Laszlo Ersek

Peter,

On 01/27/17 20:35, Paolo Bonzini wrote:
> The following changes since commit 29ba0cdc1fd1300f910d150c03a0f74236083bf7:
> 
>   Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-01-27' 
> into staging (2017-01-27 15:20:08 +)
> 
> are available in the git repository at:
> 
>   git://github.com/bonzini/qemu.git tags/for-upstream
> 
> for you to fetch changes up to 6da67de6803e93cbb7e93ac3497865832f8c00ea:
> 
>   memory: don't sign-extend 32-bit writes (2017-01-27 18:08:00 +0100)
> 

Can you please pull this v2? I think it resolves the conflict you
pointed out in
.

The pull request touches a sizeable and diverse set of files (cf. "misc"
in the subject); the more we wait with it the more likely another
conflict emerges. I have firmware patches that depend on my stuff
herein, so if you can find the time, please...

Thank you!
Laszlo

> 
> * SCSI max_transfer support for scsi-generic (Eric)
> * x86 SMI broadcast (Laszlo)
> * Character device QOMification (Marc-André)
> * Record/replay improvements (Pavel)
> * iscsi fixes (Peter L.)
> * "info mtree -f" command (Peter Xu)
> * TSC clock rate reporting (Phil)
> * DEVICE_CATEGORY_CPU (Thomas)
> * Memory sign-extension fix (Ladi)
> 
> 
> Eric Farman (3):
>   hw/scsi: Fix debug message of cdb structure in scsi-generic
>   block: Fix target variable of BLKSECTGET ioctl
>   block: get max_transfer limit for char (scsi-generic) devices
> 
> Ladi Prosek (1):
>   memory: don't sign-extend 32-bit writes
> 
> Laszlo Ersek (3):
>   hw/isa/lpc_ich9: add SMI feature negotiation via fw_cfg
>   hw/isa/lpc_ich9: add broadcast SMI feature
>   hw/isa/lpc_ich9: negotiate SMI broadcast on pc-q35-2.9+ machine types
> 
> Marc-André Lureau (20):
>   tests: fix linking test-char on win32
>   qemu-options: stdio is available on win32
>   char: add qemu_chr_fe_add_watch() Returns description
>   doc: fix spelling
>   char: use a const CharDriver
>   char: use a static array for backends
>   char: move callbacks in CharDriver
>   char: fold single-user functions in caller
>   char: introduce generic qemu_chr_get_kind()
>   char: use a feature bit for replay
>   char: allocate CharDriverState as a single object
>   bt: use qemu_chr_alloc()
>   char: rename CharDriverState Chardev
>   char: rename TCPChardev and NetChardev
>   spice-char: improve error reporting
>   char: use error_report()
>   gtk: overwrite the console.c char driver
>   baum: use a common prefix for chr callbacks
>   vc: use a common prefix for chr callbacks
>   chardev: qom-ify
> 
> Pavel Dovgalyuk (7):
>   icount: update instruction counter on apic patching
>   replay: improve interrupt handling
>   replay: don't use rtc clock on loadvm phase
>   savevm: add public save_vmstate function
>   replay: save/load initial state
>   replay: exception replay fix
>   apic: save apic_delivered flag
> 
> Peter Lieven (2):
>   block/iscsi: avoid data corruption with cache=writeback
>   block/iscsi: statically link qemu_iscsi_opts
> 
> Peter Xu (2):
>   memory: tune mtree_print_mr() to dump mr type
>   memory: hmp: add "-f" for "info mtree"
> 
> Phil Dennis-Jordan (2):
>   x86-KVM: Supply TSC and APIC clock rates to guest like VMWare
>   pc: Enable vmware-cpuid-freq CPU option for 2.9+ machine types
> 
> Thomas Huth (1):
>   Introduce DEVICE_CATEGORY_CPU for CPU devices
> 
>  MAINTAINERS   |1 +
>  backends/baum.c   |  102 +-
>  backends/msmouse.c|   77 +-
>  backends/rng-egd.c|4 +-
>  backends/testdev.c|   53 +-
>  block/Makefile.objs   |1 +
>  block/file-posix.c|   19 +-
>  block/iscsi-opts.c|   69 ++
>  block/iscsi.c |8 +-
>  cpu-exec.c|2 +-
>  docs/replay.txt   |   16 +
>  exec.c|2 +-
>  gdbstub.c |   45 +-
>  hmp-commands-info.hx  |6 +-
>  hw/arm/fsl-imx25.c|2 +-
>  hw/arm/fsl-imx31.c|2 +-
>  hw/arm/fsl-imx6.c |2 +-
>  hw/arm/nseries.c  |2 +-
>  hw/arm/omap2.c|2 +-
>  hw/arm/pxa2xx.c   |2 +-
>  hw/arm/virt.c |2 +-
>  hw/bt/hci-csr.c   |   64 +-
>  hw/char/escc.c|2 +-
>  hw/char/exynos4210_uart.c |2 +-
>  hw/char/imx_serial.c  |2 +-
>  hw/char/mcf_uart.c|4 +-
>  hw/char/omap_uart.c   |6 +-
>  hw/char/parallel.c|

[Qemu-devel] [PATCH RFC 31/36] 9pfs: local: introduce symlink-attack safe xattr helpers

2017-01-30 Thread Greg Kurz

There are no "at" variants for xattr syscalls. This patch implement them
using a separate process.

Signed-off-by: Greg Kurz 
---
 hw/9pfs/9p-xattr.c |  156 
 hw/9pfs/9p-xattr.h |   11 
 2 files changed, 167 insertions(+)

diff --git a/hw/9pfs/9p-xattr.c b/hw/9pfs/9p-xattr.c
index 19a2daf02f5c..ea0695f37242 100644
--- a/hw/9pfs/9p-xattr.c
+++ b/hw/9pfs/9p-xattr.c
@@ -15,7 +15,163 @@
 #include "9p.h"
 #include "fsdev/file-op-9p.h"
 #include "9p-xattr.h"
+#include "9p-util.h"
 
+enum {
+XATTRAT_OP_GET = 0,
+XATTRAT_OP_LIST,
+XATTRAT_OP_SET,
+XATTRAT_OP_REMOVE
+};
+
+struct xattrat_data {
+ssize_t ret;
+int serrno;
+char value[0];
+};
+
+static void munmap_preserver_errno(void *addr, size_t length)
+{
+int serrno = errno;
+munmap(addr, length);
+errno = serrno;
+}
+
+static ssize_t do_xattrat_op(int op_type, int dirfd, const char *path,
+ const char *name, void *value, size_t size,
+ int flags)
+{
+struct xattrat_data *data;
+pid_t pid;
+ssize_t ret = -1;
+int wstatus;
+
+data = mmap(NULL, sizeof(*data) + size, PROT_READ | PROT_WRITE,
+MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+if (data == MAP_FAILED) {
+return -1;
+}
+data->ret = -1;
+
+pid = fork();
+if (pid < 0) {
+goto err_out;
+} else if (pid == 0) {
+if (fchdir(dirfd) == 0) {
+switch (op_type) {
+case XATTRAT_OP_GET:
+data->ret = lgetxattr(path, name, data->value, size);
+break;
+case XATTRAT_OP_LIST:
+data->ret = llistxattr(path, data->value, size);
+break;
+case XATTRAT_OP_SET:
+data->ret = lsetxattr(path, name, data->value, size, flags);
+break;
+case XATTRAT_OP_REMOVE:
+data->ret = lremovexattr(path, name);
+break;
+default:
+g_assert_not_reached();
+}
+}
+data->serrno = errno;
+_exit(0);
+}
+assert(waitpid(pid, &wstatus, 0) == pid && WIFEXITED(wstatus));
+
+ret = data->ret;
+if (ret < 0) {
+errno = data->serrno;
+goto err_out;
+}
+memcpy(value, data->value, data->ret);
+
+err_out:
+munmap_preserver_errno(data, sizeof(*data) + size);
+return ret;
+}
+
+ssize_t fgetxattrat(int dirfd, const char *path, const char *name, void *value,
+size_t size)
+{
+return do_xattrat_op(XATTRAT_OP_GET, dirfd, path, name, value, size, 0);
+}
+
+ssize_t local_getxattr_nofollow(FsContext *ctx, const char *path,
+const char *name, void *value, size_t size)
+{
+char *dirpath = local_dirname(path);
+char *filename = local_basename(path);
+int dirfd;
+ssize_t ret = -1;
+
+dirfd = local_opendir_nofollow(ctx, dirpath);
+if (dirfd == -1) {
+goto out;
+}
+
+ret = fgetxattrat(dirfd, filename, name, value, size);
+close_preserve_errno(dirfd);
+out:
+g_free(dirpath);
+g_free(filename);
+return ret;
+}
+
+static ssize_t fsetxattrat(int dirfd, const char *path, const char *name,
+   void *value, size_t size, int flags)
+{
+return do_xattrat_op(XATTRAT_OP_SET, dirfd, path, name, value, size, 
flags);
+}
+
+ssize_t local_setxattr_nofollow(FsContext *ctx, const char *path,
+const char *name, void *value, size_t size,
+int flags)
+{
+char *dirpath = local_dirname(path);
+char *filename = local_basename(path);
+int dirfd;
+ssize_t ret = -1;
+
+dirfd = local_opendir_nofollow(ctx, dirpath);
+if (dirfd == -1) {
+goto out;
+}
+
+ret = fsetxattrat(dirfd, filename, name, value, size, flags);
+close_preserve_errno(dirfd);
+out:
+g_free(dirpath);
+g_free(filename);
+return ret;
+}
+
+static ssize_t fremovexattrat(int dirfd, const char *path, const char *name)
+{
+return do_xattrat_op(XATTRAT_OP_GET, dirfd, path, name, NULL, 0, 0);
+}
+
+ssize_t local_removexattr_nofollow(FsContext *ctx, const char *path,
+   const char *name)
+{
+char *dirpath = local_dirname(path);
+char *filename = local_basename(path);
+int dirfd;
+ssize_t ret = -1;
+
+dirfd = local_opendir_nofollow(ctx, dirpath);
+if (dirfd == -1) {
+goto out;
+}
+
+ret = fremovexattrat(dirfd, filename, name);
+close_preserve_errno(dirfd);
+out:
+g_free(dirpath);
+g_free(filename);
+return ret;
+}
 
 static XattrOperations *get_xattr_operations(XattrOperations **h,
  const char *name)
diff --git a/hw/9pfs/9p-xattr.h b/hw/9pfs/9p-xattr.h
index 3f43f5153f3c..d95ccc26a18d 100644
--- a/hw/9pfs/9p-xattr.h
+++ b/hw/9pfs/9p-xattr.h
@@ -15,6

Re: [Qemu-devel] [PULL v2 00/41] Misc patches for 2017-01-27

2017-01-30 Thread Peter Maydell

On 30 January 2017 at 12:33, Laszlo Ersek  wrote:
> Peter,
>
> On 01/27/17 20:35, Paolo Bonzini wrote:
>> The following changes since commit 29ba0cdc1fd1300f910d150c03a0f74236083bf7:
>>
>>   Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-01-27' 
>> into staging (2017-01-27 15:20:08 +)
>>
>> are available in the git repository at:
>>
>>   git://github.com/bonzini/qemu.git tags/for-upstream
>>
>> for you to fetch changes up to 6da67de6803e93cbb7e93ac3497865832f8c00ea:
>>
>>   memory: don't sign-extend 32-bit writes (2017-01-27 18:08:00 +0100)
>>
>
> Can you please pull this v2?

I am currently processing it, yes. Paolo only sent it to the list
late on Friday evening, and I don't generally work weekends,
so it's not like this has been lying around for a long time...

thanks
-- PMM

[Qemu-devel] [PATCH v2] hw/core/or-irq: Mark the device with cannot_instantiate_with_device_add_yet

2017-01-30 Thread Thomas Huth

The "or-irq" device needs to be wired up in source code, there is no
way the user can add this device with the "-device" parameter or the
"device_add" monitor command yet.

Signed-off-by: Thomas Huth 
---
 v2: Changed patch description and comment

 hw/core/or-irq.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/core/or-irq.c b/hw/core/or-irq.c
index 1ac090d..808c8d1 100644
--- a/hw/core/or-irq.c
+++ b/hw/core/or-irq.c
@@ -89,6 +89,9 @@ static void or_irq_class_init(ObjectClass *klass, void *data)
 dc->props = or_irq_properties;
 dc->realize = or_irq_realize;
 dc->vmsd = &vmstate_or_irq;
+
+/* Reason: Needs to be wired up in source code, e.g. see stm32f205_soc.c */
+dc->cannot_instantiate_with_device_add_yet = true;
 }
 
 static const TypeInfo or_irq_type_info = {
-- 
1.8.3.1

Re: [Qemu-devel] [PATCH v2 10/11] aspeed: use first FMC flash as a boot ROM

2017-01-30 Thread Peter Maydell

On 9 January 2017 at 16:24, Cédric Le Goater  wrote:
> Create a ROM region, using the default size of the mapping window for
> the CE0 FMC flash module, and fill it with the flash content.
>
> This is a little hacky but until we can boot from a MMIO region, it
> seems difficult to do anything else.
>
> Signed-off-by: Cédric Le Goater 
> Reviewed-by: Joel Stanley 
> Reviewed-by: Andrew Jeffery 
> ---
>  hw/arm/aspeed.c | 41 +
>  1 file changed, 41 insertions(+)
>
> diff --git a/hw/arm/aspeed.c b/hw/arm/aspeed.c
> index 40c13838fb2d..a92c2f1c362b 100644
> --- a/hw/arm/aspeed.c
> +++ b/hw/arm/aspeed.c
> @@ -20,6 +20,8 @@
>  #include "qemu/log.h"
>  #include "sysemu/block-backend.h"
>  #include "sysemu/blockdev.h"
> +#include "hw/loader.h"
> +#include "qemu/error-report.h"
>
>  static struct arm_boot_info aspeed_board_binfo = {
>  .board_id = -1, /* device-tree-only board */
> @@ -104,6 +106,28 @@ static const AspeedBoardConfig aspeed_boards[] = {
>  },
>  };
>
> +#define FIRMWARE_ADDR 0x0
> +
> +static void write_boot_rom(DriveInfo *dinfo, hwaddr addr, size_t rom_size,
> +   Error **errp)
> +{
> +BlockBackend *blk = blk_by_legacy_dinfo(dinfo);
> +uint8_t *storage;
> +
> +if (rom_size > blk_getlength(blk)) {
> +rom_size = blk_getlength(blk);
> +}
> +
> +storage = g_new0(uint8_t, rom_size);

Hi -- coverity points out that you're not checking for the
case where blk_getlength() returns negative. (This can happen
for the case where the BlockBackend represents an insertable
device like a floppy or cdrom with no medium present.)

Since for aspeed this dinfo always represents a flash device,
we know this shouldn't happen, so you can just make it a fatal
error if blk_getlength() doesn't return what you're expecting.

> +if (blk_pread(blk, 0, storage, rom_size) < 0) {
> +error_setg(errp, "failed to read the initial flash content");
> +return;
> +}
> +
> +rom_add_blob_fixed("aspeed.boot_rom", storage, rom_size, addr);
> +g_free(storage);
> +}

thanks
-- PMM

Re: [Qemu-devel] [PATCH 2/2] migration: discard non-dirty ram pages after the start of postcopy

2017-01-30 Thread Pavel Butsykin


On 27.01.2017 14:39, Dr. David Alan Gilbert wrote:

* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:

After the start of postcopy migration there are some non-dirty pages which have
already been migrated. These pages are no longer needed on the source vm so that
we can free them and it doen't hurt to complete the migration.

Signed-off-by: Pavel Butsykin 
---
  include/migration/migration.h |  1 +
  migration/migration.c |  2 ++
  migration/ram.c   | 25 +
  3 files changed, 28 insertions(+)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index d7bd404365..0d9b81545c 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -279,6 +279,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms);
  int ram_discard_range(MigrationIncomingState *mis, const char *block_name,
uint64_t start, size_t length);
  int ram_postcopy_incoming_init(MigrationIncomingState *mis);
+void ram_postcopy_migrated_memory_discard(MigrationState *ms);

  /**
   * @migrate_add_blocker - prevent migration from proceeding
diff --git a/migration/migration.c b/migration/migration.c
index 391db6f28b..20490ed020 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1662,6 +1662,8 @@ static int postcopy_start(MigrationState *ms, bool 
*old_vm_running)
   */
  qemu_savevm_send_ping(ms->to_dst_file, 4);

+ram_postcopy_migrated_memory_discard(ms);
+


Did you intend this to be selected based on your capability flag?



I did, but apparently lost..




  ret = qemu_file_get_error(ms->to_dst_file);
  if (ret) {
  error_report("postcopy_start: Migration stream errored");
diff --git a/migration/ram.c b/migration/ram.c
index b0322a0b5c..8a6b614b0d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -1546,6 +1546,31 @@ void ram_debug_dump_bitmap(unsigned long *todump, bool 
expected)

  /*  functions for postcopy * */

+void ram_postcopy_migrated_memory_discard(MigrationState *ms)
+{
+struct RAMBlock *block;
+unsigned long *bitmap = atomic_rcu_read(&migration_bitmap_rcu)->bmap;
+
+QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+unsigned long first = block->offset >> TARGET_PAGE_BITS;
+unsigned long range = first + (block->used_length >> TARGET_PAGE_BITS);
+unsigned long run_start = find_next_zero_bit(bitmap, range, first);
+
+while (run_start < range) {
+unsigned long run_end = find_next_bit(bitmap, range, run_start + 
1);
+uint8_t *addr = block->host + (run_start << TARGET_PAGE_BITS);
+size_t chunk_size = (run_end - run_start) << TARGET_PAGE_BITS;
+
+if (qemu_madvise(addr, chunk_size, QEMU_MADV_DONTNEED) < 0) {
+error_report("migrate: madvise DONTNEED failed %p %ld: %s",
+ addr, chunk_size, strerror(errno));
+}


can you use your ram_discard_page here, it keeps all the use of madvise 
together.


ok.


+
+run_start = find_next_zero_bit(bitmap, range, run_end + 1);
+}
+}
+}


Dave


  /*
   * Callback from postcopy_each_ram_send_discard for each RAMBlock
   * Note: At this point the 'unsentmap' is the processed bitmap combined
--
2.11.0



--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

[Qemu-devel] [PATCH v2] target/s390x: Fix broken user mode

2017-01-30 Thread Stefan Weil

Returning NULL from get_max_cpu_model results in a SIGSEGV runtime error.

Signed-off-by: Stefan Weil 
---

v2: Re-sent as v1 was damaged by my mailer.

This is also broken in Debian.

In addition, there is no default CPU ("any"), so binfmt and related
actions currently don't work. I hacked my local installation by
duplicating the "qemu" cpu definition for "any", but maybe there is
a better solution.

Regards
Stefan

 target/s390x/cpu_models.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/target/s390x/cpu_models.c b/target/s390x/cpu_models.c
index 2a894ee..6e34763 100644
--- a/target/s390x/cpu_models.c
+++ b/target/s390x/cpu_models.c
@@ -660,7 +660,6 @@ static void check_compatibility(const S390CPUModel 
*max_model,
 
 static S390CPUModel *get_max_cpu_model(Error **errp)
 {
-#ifndef CONFIG_USER_ONLY
 static S390CPUModel max_model;
 static bool cached;
 
@@ -680,7 +679,6 @@ static S390CPUModel *get_max_cpu_model(Error **errp)
 cached = true;
 return &max_model;
 }
-#endif
 return NULL;
 }
 
-- 
2.1.4

Re: [Qemu-devel] [PATCH] target/s390x: Fix broken user mode

2017-01-30 Thread Stefan Weil

Am 30.01.2017 um 11:01 schrieb Christian Borntraeger:
> I have for whatever reasons problems with this patch. Looks like you
> pasted
> it into thunderbird or something like that and the whitespaces look mangled,
> e.g. look at the indentation of static vs {.

The original e-mail was not delivered by my mail server.
Therefore I had sent it as "new" with Thunderbird, which
wrapped some lines.

I'm sorry for that and have sent a v2 which should be fine.

Stefan

Re: [Qemu-devel] [PATCH 1/2] add 'discard-ram' migrate capability

2017-01-30 Thread Pavel Butsykin


On 27.01.2017 14:01, Dr. David Alan Gilbert wrote:

* Pavel Butsykin (pbutsy...@virtuozzo.com) wrote:

This feature frees the migrated memory on the source during postcopy-ram
migration. In the second step of postcopy-ram migration when the source vm
is put on pause we can free unnecessary memory. It will allow, in particular,
to start relaxing the memory stress on the source host in a load-balancing
scenario.

Signed-off-by: Pavel Butsykin 


Hi Pavel,
   Firstly a higher-level thing, can we use a different word than 'discard'
because I already use 'discard' in postcopy to mean the request from the source
to the destination to discard pages that are redirtied.  I suggest 'release-ram'
to just pick a different word that means the same thing.



Hi David,

Yeah, I thought about it.. the same names can confuse here, so I agree 
to change the name.



Also, see patchew's build error it spotted.


---
  include/migration/migration.h |  1 +
  include/migration/qemu-file.h |  3 ++-
  migration/migration.c |  9 +++
  migration/qemu-file.c | 59 ++-
  migration/ram.c   | 24 +-
  qapi-schema.json  |  5 +++-
  6 files changed, 91 insertions(+), 10 deletions(-)

diff --git a/include/migration/migration.h b/include/migration/migration.h
index c309d23370..d7bd404365 100644
--- a/include/migration/migration.h
+++ b/include/migration/migration.h
@@ -294,6 +294,7 @@ void migrate_add_blocker(Error *reason);
   */
  void migrate_del_blocker(Error *reason);

+bool migrate_discard_ram(void);
  bool migrate_postcopy_ram(void);
  bool migrate_zero_blocks(void);

diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index abedd466c9..0cd648a733 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -132,7 +132,8 @@ void qemu_put_byte(QEMUFile *f, int v);
   * put_buffer without copying the buffer.
   * The buffer should be available till it is sent asynchronously.
   */
-void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size);
+void qemu_put_buffer_async(QEMUFile *f, const uint8_t *buf, size_t size,
+   bool may_free);
  bool qemu_file_mode_is_not_valid(const char *mode);
  bool qemu_file_is_writable(QEMUFile *f);

diff --git a/migration/migration.c b/migration/migration.c
index f498ab84f2..391db6f28b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1251,6 +1251,15 @@ void qmp_migrate_set_downtime(double value, Error **errp)
  qmp_migrate_set_parameters(&p, errp);
  }

+bool migrate_discard_ram(void)
+{
+MigrationState *s;
+
+s = migrate_get_current();
+
+return s->enabled_capabilities[MIGRATION_CAPABILITY_DISCARD_RAM];
+}
+
  bool migrate_postcopy_ram(void)
  {
  MigrationState *s;
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index e9fae31158..f85a0ecd9e 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -30,6 +30,7 @@
  #include "qemu/coroutine.h"
  #include "migration/migration.h"
  #include "migration/qemu-file.h"
+#include "sysemu/sysemu.h"
  #include "trace.h"

  #define IO_BUF_SIZE 32768
@@ -49,6 +50,7 @@ struct QEMUFile {
  int buf_size; /* 0 when writing */
  uint8_t buf[IO_BUF_SIZE];

+DECLARE_BITMAP(may_free, MAX_IOV_SIZE);
  struct iovec iov[MAX_IOV_SIZE];
  unsigned int iovcnt;

@@ -132,6 +134,40 @@ bool qemu_file_is_writable(QEMUFile *f)
  return f->ops->writev_buffer;
  }

+static void qemu_iovec_discard_ram(QEMUFile *f)
+{
+struct iovec iov;
+unsigned long idx;
+
+if (!migrate_discard_ram() || !runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+return;
+}


Can we split this out into a separate function please; so qemu_iovec_discard_ram
always does it, and then you have something that only calls it if enabled.



Of course, I think migrate_discard_ram() is not needed here, because it
is already invoked in ram.c:

static int ram_save_page(QEMUFile *f, PageSearchStatus *pss,
...
if (send_async) {
qemu_put_buffer_async(
f, p, TARGET_PAGE_SIZE, migrate_discard_ram());
} else {
qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
...


Also, it will fix 'build error'.


+idx = find_next_bit(f->may_free, f->iovcnt, 0);
+if (idx >= f->iovcnt) {
+return;
+}
+iov = f->iov[idx];
+
+while ((idx = find_next_bit(f->may_free, f->iovcnt, idx + 1)) < f->iovcnt) 
{
+/* check for adjacent buffer and coalesce them */
+if (iov.iov_base + iov.iov_len == f->iov[idx].iov_base) {
+iov.iov_len += f->iov[idx].iov_len;
+continue;
+}
+if (qemu_madvise(iov.iov_base, iov.iov_len, QEMU_MADV_DONTNEED) < 0) {
+error_report("migrate: madvise DONTNEED failed %p %ld: %s",
+ iov.iov_base, iov.iov_len, strerror(errno));
+}
+iov = f->iov[idx];


Can you add s

[Qemu-devel] [PATCH 1/1] io: ignore case in WebSocket HTTP header #PSBM-57554

2017-01-30 Thread Denis V. Lunev

From: Anton Nefedov 

According to RFC7230 Section 3.2, header field name is case-insensitive.

The haystack string length is limited by 4096 bytes by
qio_channel_websock_handshake_read().

Further, handshake_process() dups and NULL-terminates the string
so it is safe to call non length-limited functions like strcasestr().

Signed-off-by: Anton Nefedov 
Signed-off-by: Denis V. Lunev 
CC: Daniel P. Berrange 
---
 io/channel-websock.c | 25 ++---
 1 file changed, 14 insertions(+), 11 deletions(-)

diff --git a/io/channel-websock.c b/io/channel-websock.c
index d5a4ed3..991925a 100644
--- a/io/channel-websock.c
+++ b/io/channel-websock.c
@@ -108,18 +108,16 @@ enum {
 };
 
 static char *qio_channel_websock_handshake_entry(const char *handshake,
- size_t handshake_len,
  const char *name)
 {
 char *begin, *end, *ret = NULL;
 char *line = g_strdup_printf("%s%s: ",
  QIO_CHANNEL_WEBSOCK_HANDSHAKE_DELIM,
  name);
-begin = g_strstr_len(handshake, handshake_len, line);
+begin = strcasestr(handshake, line);
 if (begin != NULL) {
 begin += strlen(line);
-end = g_strstr_len(begin, handshake_len - (begin - handshake),
-QIO_CHANNEL_WEBSOCK_HANDSHAKE_DELIM);
+end = strstr(begin, QIO_CHANNEL_WEBSOCK_HANDSHAKE_DELIM);
 if (end != NULL) {
 ret = g_strndup(begin, end - begin);
 }
@@ -170,12 +168,14 @@ static int 
qio_channel_websock_handshake_process(QIOChannelWebsock *ioc,
  Error **errp)
 {
 int ret = -1;
+/* make it NULL-terminated */
+char *handshake = g_strndup(line, size);
 char *protocols = qio_channel_websock_handshake_entry(
-line, size, QIO_CHANNEL_WEBSOCK_HEADER_PROTOCOL);
+handshake, QIO_CHANNEL_WEBSOCK_HEADER_PROTOCOL);
 char *version = qio_channel_websock_handshake_entry(
-line, size, QIO_CHANNEL_WEBSOCK_HEADER_VERSION);
+handshake, QIO_CHANNEL_WEBSOCK_HEADER_VERSION);
 char *key = qio_channel_websock_handshake_entry(
-line, size, QIO_CHANNEL_WEBSOCK_HEADER_KEY);
+handshake, QIO_CHANNEL_WEBSOCK_HEADER_KEY);
 
 if (!protocols) {
 error_setg(errp, "Missing websocket protocol header data");
@@ -213,6 +213,7 @@ static int 
qio_channel_websock_handshake_process(QIOChannelWebsock *ioc,
 ret = qio_channel_websock_handshake_send_response(ioc, key, errp);
 
  cleanup:
+g_free(handshake);
 g_free(protocols);
 g_free(version);
 g_free(key);
@@ -248,10 +249,12 @@ static int 
qio_channel_websock_handshake_read(QIOChannelWebsock *ioc,
 }
 }
 
-if (qio_channel_websock_handshake_process(ioc,
-  (char *)ioc->encinput.buffer,
-  ioc->encinput.offset,
-  errp) < 0) {
+if (qio_channel_websock_handshake_process(
+ioc,
+(char *)ioc->encinput.buffer,
+handshake_end - (char *)ioc->encinput.buffer
++ strlen(QIO_CHANNEL_WEBSOCK_HANDSHAKE_END),
+errp) < 0) {
 return -1;
 }
 
-- 
2.7.4

Re: [Qemu-devel] [PULL v2 00/41] Misc patches for 2017-01-27

2017-01-30 Thread Peter Maydell

On 27 January 2017 at 19:35, Paolo Bonzini  wrote:
> The following changes since commit 29ba0cdc1fd1300f910d150c03a0f74236083bf7:
>
>   Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-01-27' 
> into staging (2017-01-27 15:20:08 +)
>
> are available in the git repository at:
>
>   git://github.com/bonzini/qemu.git tags/for-upstream
>
> for you to fetch changes up to 6da67de6803e93cbb7e93ac3497865832f8c00ea:
>
>   memory: don't sign-extend 32-bit writes (2017-01-27 18:08:00 +0100)
>
> 
> * SCSI max_transfer support for scsi-generic (Eric)
> * x86 SMI broadcast (Laszlo)
> * Character device QOMification (Marc-André)
> * Record/replay improvements (Pavel)
> * iscsi fixes (Peter L.)
> * "info mtree -f" command (Peter Xu)
> * TSC clock rate reporting (Phil)
> * DEVICE_CATEGORY_CPU (Thomas)
> * Memory sign-extension fix (Ladi)
>

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PULL v2 00/41] Misc patches for 2017-01-27

2017-01-30 Thread Laszlo Ersek

On 01/30/17 13:57, Peter Maydell wrote:
> On 30 January 2017 at 12:33, Laszlo Ersek  wrote:
>> Peter,
>>
>> On 01/27/17 20:35, Paolo Bonzini wrote:
>>> The following changes since commit 29ba0cdc1fd1300f910d150c03a0f74236083bf7:
>>>
>>>   Merge remote-tracking branch 'remotes/armbru/tags/pull-qapi-2017-01-27' 
>>> into staging (2017-01-27 15:20:08 +)
>>>
>>> are available in the git repository at:
>>>
>>>   git://github.com/bonzini/qemu.git tags/for-upstream
>>>
>>> for you to fetch changes up to 6da67de6803e93cbb7e93ac3497865832f8c00ea:
>>>
>>>   memory: don't sign-extend 32-bit writes (2017-01-27 18:08:00 +0100)
>>>
>>
>> Can you please pull this v2?
> 
> I am currently processing it, yes. Paolo only sent it to the list

Apologies, I missed that.

> late on Friday evening, and I don't generally work weekends,

I didn't try to imply that you should have -- I also don't check my work
email over the weekend, on purpose. I check the occasional public list
archive, maybe.

> so it's not like this has been lying around for a long time...

Yeah I was conflicted whether I should ping you or not... The very first
version of the patches goes back to October 2015; I may have gotten
over-enthusiastic about the feature being finally merged, and possibly
made the wrong call. Sorry.

(BTW I know about , and for the v1 pull req, I
did notice quickly enough the conflict ("Failed in applying to current
master"), even before reading your feedback on-list; but for v2, patchew
only says "Series not complete".)

It's difficult to ask politely in email... I tend to start with
apologizing and various disclaimers, saying "please" and "thank you"
profusely, but I've also been told that I should be more "assertive"...
Sigh.

Thank you anyway, and sorry about the poke. It was meant in good faith.
Laszlo

Re: [Qemu-devel] [PULL v2 00/41] Misc patches for 2017-01-27

2017-01-30 Thread Peter Maydell

On 30 January 2017 at 13:29, Laszlo Ersek  wrote:
> Yeah I was conflicted whether I should ping you or not... The very first
> version of the patches goes back to October 2015; I may have gotten
> over-enthusiastic about the feature being finally merged, and possibly
> made the wrong call. Sorry.

> It's difficult to ask politely in email... I tend to start with
> apologizing and various disclaimers, saying "please" and "thank you"
> profusely, but I've also been told that I should be more "assertive"...
> Sigh.
>
> Thank you anyway, and sorry about the poke. It was meant in good faith.

No problem; I should probably have been a bit clearer that I
wasn't strongly objecting to being nudged here, merely noting
that it wasn't really necessary just yet. Sorry for being
unnecessarily grumpy.

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 10/11] aspeed: use first FMC flash as a boot ROM

2017-01-30 Thread Cédric Le Goater

>> +#define FIRMWARE_ADDR 0x0
>> +
>> +static void write_boot_rom(DriveInfo *dinfo, hwaddr addr, size_t rom_size,
>> +   Error **errp)
>> +{
>> +BlockBackend *blk = blk_by_legacy_dinfo(dinfo);
>> +uint8_t *storage;
>> +
>> +if (rom_size > blk_getlength(blk)) {
>> +rom_size = blk_getlength(blk);
>> +}
>> +
>> +storage = g_new0(uint8_t, rom_size);
> 
> Hi -- coverity points out that you're not checking for the
> case where blk_getlength() returns negative. (This can happen
> for the case where the BlockBackend represents an insertable
> device like a floppy or cdrom with no medium present.)
> 
> Since for aspeed this dinfo always represents a flash device,
> we know this shouldn't happen, so you can just make it a fatal
> error if blk_getlength() doesn't return what you're expecting.

OK. I will send a fix to improve this routine. 

Thanks,

C.

[Qemu-devel] [PATCH v2 00/41] chardev: qom clean-up and split in various backend files

2017-01-30 Thread Marc-André Lureau

Finish qom-ification by using instance finalizers and split the big
qemu-char.c file in many backend-specific units.

This is part of a larger refactoring series that I try to keep up to date here:
https://github.com/elmarco/qemu/commits/chrfe

v2: (after Eric's review)
- replace win32 BOOL with bool
- use a chardev_name_foreach() helper to avoid unnecessary list creation
- back copyright blurbs to new files
- fix osdep.h inclusion in header
- long lines split, fixed some code churn
- add a-b/r-b tags
- propose myself as qemu-char maintainer

Marc-André Lureau (41):
  MAINTAINERS: add myself to qemu-char.c
  spice-qemu-char: convert to finalize
  baum: convert to finalize
  msmouse: convert to finalize
  mux: convert to finalize
  char-udp: convert to finalize
  char-socket: convert to finalize
  char-pty: convert to finalize
  char-ringbuf: convert to finalize
  char-parallel: convert parallel to finalize
  char-stdio: convert to finalize
  char-win-stdio: convert to finalize
  char-win: do not override chr_free
  char-win: convert to finalize
  char-fd: convert to finalize
  char: remove chr_free
  char: get rid of CharDriver
  char: rename remaining CharDriver to Chardev
  char: remove class kind field
  char: move to chardev/
  char: create chardev-obj-y
  char: make null_chr_write() the default method
  char: move null chardev to its own file
  char: move mux to its own file
  char: move ringbuf/memory to its own file
  char: rename and move to header CHR_READ_BUF_LEN
  char: remove unused READ_RETRIES
  char: move QIOChannel-related stuff to char-io.h
  char: move fd chardev in its own file
  char: move win chardev base class in its own file
  char: move win-stdio into its own file
  char: move socket chardev to its own file
  char: move udp chardev in its own file
  char: move file chardev in its own file
  char: move stdio in its own file
  char: move console in its own file
  char: move pipe chardev in its own file
  char: move pty chardev in its own file
  char: move serial chardev to its own file
  char: move parallel chardev in its own file
  char: headers clean-up

 chardev/char-fd.h|   44 +
 chardev/char-io.h|   46 +
 chardev/char-mux.h   |   63 +
 chardev/char-parallel.h  |   32 +
 chardev/char-serial.h|   35 +
 chardev/char-win-stdio.h |   29 +
 chardev/char-win.h   |   53 +
 include/sysemu/char.h|   69 +-
 backends/baum.c  |   11 +-
 backends/msmouse.c   |   11 +-
 backends/testdev.c   |5 -
 chardev/char-console.c   |   53 +
 chardev/char-fd.c|  170 ++
 chardev/char-file.c  |  139 ++
 chardev/char-io.c|  192 ++
 chardev/char-mux.c   |  358 
 chardev/char-null.c  |   54 +
 chardev/char-parallel.c  |  316 +++
 chardev/char-pipe.c  |  191 ++
 chardev/char-pty.c   |  300 +++
 chardev/char-ringbuf.c   |  249 +++
 chardev/char-serial.c|  318 +++
 chardev/char-socket.c| 1017 +
 chardev/char-stdio.c |  164 ++
 chardev/char-udp.c   |  233 +++
 chardev/char-win-stdio.c |  266 +++
 chardev/char-win.c   |  265 +++
 chardev/char.c   | 1334 
 hmp.c|1 +
 monitor.c|1 +
 qemu-char.c  | 5171 --
 qmp.c|1 +
 spice-qemu-char.c|   21 +-
 tests/vhost-user-test.c  |1 +
 ui/console.c |   10 +-
 ui/gtk.c |9 +-
 MAINTAINERS  |3 +-
 Makefile |4 +-
 Makefile.objs|5 +-
 Makefile.target  |3 +
 chardev/Makefile.objs|   17 +
 tests/Makefile.include   |9 +-
 42 files changed, 5999 insertions(+), 5274 deletions(-)
 create mode 100644 chardev/char-fd.h
 create mode 100644 chardev/char-io.h
 create mode 100644 chardev/char-mux.h
 create mode 100644 chardev/char-parallel.h
 create mode 100644 chardev/char-serial.h
 create mode 100644 chardev/char-win-stdio.h
 create mode 100644 chardev/char-win.h
 create mode 100644 chardev/char-console.c
 create mode 100644 chardev/char-fd.c
 create mode 100644 chardev/char-file.c
 create mode 100644 chardev/char-io.c
 create mode 100644 chardev/char-mux.c
 create mode 100644 chardev/char-null.c
 create mode 100644 chardev/char-parallel.c
 create mode 100644 chardev/char-pipe.c
 create mode 100644 chardev/char-pty.c
 create mode 100644 chardev/char-ringbuf.c
 create mode 100644 chardev/char-serial.c
 create mode 100644 chardev/char-socket.c
 create mode 100644 chardev/char-stdio.c
 create mode 100644 chardev/char-udp.c
 create mode 100644 chardev/char-win-stdio.c
 create mode 100644 chardev/char-win.c
 create mode 100644 chardev/char.c
 delete mode 100644 qemu-char.c
 create mode 100644 chardev/Makefile.objs

-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 01/41] MAINTAINERS: add myself to qemu-char.c

2017-01-30 Thread Marc-André Lureau

I consider to have enough experience with qemu-char to propose myself as
maintainer. This will allow me to send pull request without waiting for
Paolo.

Signed-off-by: Marc-André Lureau 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e0be7bc0d4..5efd8cc671 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1194,6 +1194,7 @@ T: git git://github.com/jnsnow/qemu.git bitmaps
 
 Character device backends
 M: Paolo Bonzini 
+M: Marc-André Lureau 
 S: Maintained
 F: qemu-char.c
 F: backends/msmouse.c
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 04/41] msmouse: convert to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 backends/msmouse.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/backends/msmouse.c b/backends/msmouse.c
index 936a5476d5..55c344f0e1 100644
--- a/backends/msmouse.c
+++ b/backends/msmouse.c
@@ -139,9 +139,9 @@ static int msmouse_chr_write(struct Chardev *s, const 
uint8_t *buf, int len)
 return len;
 }
 
-static void msmouse_chr_free(struct Chardev *chr)
+static void char_msmouse_finalize(Object *obj)
 {
-MouseChardev *mouse = MOUSE_CHARDEV(chr);
+MouseChardev *mouse = MOUSE_CHARDEV(obj);
 
 qemu_input_handler_unregister(mouse->hs);
 }
@@ -172,13 +172,13 @@ static void char_msmouse_class_init(ObjectClass *oc, void 
*data)
 cc->open = msmouse_chr_open;
 cc->chr_write = msmouse_chr_write;
 cc->chr_accept_input = msmouse_chr_accept_input;
-cc->chr_free = msmouse_chr_free;
 }
 
 static const TypeInfo char_msmouse_type_info = {
 .name = TYPE_CHARDEV_MSMOUSE,
 .parent = TYPE_CHARDEV,
 .instance_size = sizeof(MouseChardev),
+.instance_finalize = char_msmouse_finalize,
 .class_init = char_msmouse_class_init,
 };
 
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 02/41] spice-qemu-char: convert to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 spice-qemu-char.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/spice-qemu-char.c b/spice-qemu-char.c
index dd97c17fca..3902202a35 100644
--- a/spice-qemu-char.c
+++ b/spice-qemu-char.c
@@ -210,9 +210,9 @@ static int spice_chr_write(Chardev *chr, const uint8_t 
*buf, int len)
 return read_bytes;
 }
 
-static void spice_chr_free(struct Chardev *chr)
+static void char_spice_finalize(Object *obj)
 {
-SpiceChardev *s = SPICE_CHARDEV(chr);
+SpiceChardev *s = SPICE_CHARDEV(obj);
 
 vmc_unregister_interface(s);
 QLIST_REMOVE(s, next);
@@ -365,13 +365,13 @@ static void char_spice_class_init(ObjectClass *oc, void 
*data)
 cc->chr_write = spice_chr_write;
 cc->chr_add_watch = spice_chr_add_watch;
 cc->chr_accept_input = spice_chr_accept_input;
-cc->chr_free = spice_chr_free;
 }
 
 static const TypeInfo char_spice_type_info = {
 .name = TYPE_CHARDEV_SPICE,
 .parent = TYPE_CHARDEV,
 .instance_size = sizeof(SpiceChardev),
+.instance_finalize = char_spice_finalize,
 .class_init = char_spice_class_init,
 .abstract = true,
 };
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 03/41] baum: convert to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 backends/baum.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/backends/baum.c b/backends/baum.c
index 0f418ed358..39c9365024 100644
--- a/backends/baum.c
+++ b/backends/baum.c
@@ -616,9 +616,9 @@ static void baum_chr_read(void *opaque)
 }
 }
 
-static void baum_chr_free(Chardev *chr)
+static void char_braille_finalize(Object *obj)
 {
-BaumChardev *baum = BAUM_CHARDEV(chr);
+BaumChardev *baum = BAUM_CHARDEV(obj);
 
 timer_free(baum->cellCount_timer);
 if (baum->brlapi) {
@@ -659,13 +659,13 @@ static void char_braille_class_init(ObjectClass *oc, void 
*data)
 cc->open = baum_chr_open;
 cc->chr_write = baum_chr_write;
 cc->chr_accept_input = baum_chr_accept_input;
-cc->chr_free = baum_chr_free;
 }
 
 static const TypeInfo char_braille_type_info = {
 .name = TYPE_CHARDEV_BRAILLE,
 .parent = TYPE_CHARDEV,
 .instance_size = sizeof(BaumChardev),
+.instance_finalize = char_braille_finalize,
 .class_init = char_braille_class_init,
 };
 
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 09/41] char-ringbuf: convert to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 qemu-char.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index fecebde87a..74e78b56d1 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -3550,9 +3550,9 @@ static int ringbuf_chr_read(Chardev *chr, uint8_t *buf, 
int len)
 return i;
 }
 
-static void ringbuf_chr_free(struct Chardev *chr)
+static void char_ringbuf_finalize(Object *obj)
 {
-RingBufChardev *d = RINGBUF_CHARDEV(chr);
+RingBufChardev *d = RINGBUF_CHARDEV(obj);
 
 g_free(d->cbuf);
 }
@@ -3982,7 +3982,6 @@ static void char_ringbuf_class_init(ObjectClass *oc, void 
*data)
 
 cc->open = qemu_chr_open_ringbuf;
 cc->chr_write = ringbuf_chr_write;
-cc->chr_free = ringbuf_chr_free;
 }
 
 static const TypeInfo char_ringbuf_type_info = {
@@ -3990,6 +3989,7 @@ static const TypeInfo char_ringbuf_type_info = {
 .parent = TYPE_CHARDEV,
 .class_init = char_ringbuf_class_init,
 .instance_size = sizeof(RingBufChardev),
+.instance_finalize = char_ringbuf_finalize,
 };
 
 /* Bug-compatibility: */
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 07/41] char-socket: convert to finalize

2017-01-30 Thread Marc-André Lureau

Notice that finalize() will be run after a failure to open(), so cleanup
code must be adjusted.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 qemu-char.c | 12 +---
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 1c4fcf3a6e..a00bbb0a1c 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -3446,9 +3446,10 @@ int qemu_chr_fe_wait_connected(CharBackend *be, Error 
**errp)
 return qemu_chr_wait_connected(be->chr, errp);
 }
 
-static void tcp_chr_free(Chardev *chr)
+static void char_socket_finalize(Object *obj)
 {
-SocketChardev *s = SOCKET_CHARDEV(chr);
+Chardev *chr = CHARDEV(obj);
+SocketChardev *s = SOCKET_CHARDEV(obj);
 
 tcp_chr_free_connection(chr);
 
@@ -4886,7 +4887,7 @@ static void qmp_chardev_open_socket(Chardev *chr,
 s->listen_ioc = sioc;
 if (is_waitconnect &&
 qemu_chr_wait_connected(chr, errp) < 0) {
-goto error;
+return;
 }
 if (!s->ioc) {
 s->listen_tag = qio_channel_add_watch(
@@ -4904,9 +4905,6 @@ error:
 if (sioc) {
 object_unref(OBJECT(sioc));
 }
-if (s->tls_creds) {
-object_unref(OBJECT(s->tls_creds));
-}
 }
 
 static const CharDriver socket_driver = {
@@ -4928,13 +4926,13 @@ static void char_socket_class_init(ObjectClass *oc, 
void *data)
 cc->chr_add_client = tcp_chr_add_client;
 cc->chr_add_watch = tcp_chr_add_watch;
 cc->chr_update_read_handler = tcp_chr_update_read_handler;
-cc->chr_free = tcp_chr_free;
 }
 
 static const TypeInfo char_socket_type_info = {
 .name = TYPE_CHARDEV_SOCKET,
 .parent = TYPE_CHARDEV,
 .instance_size = sizeof(SocketChardev),
+.instance_finalize = char_socket_finalize,
 .class_init = char_socket_class_init,
 };
 
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 05/41] mux: convert to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 qemu-char.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 6b4a299702..91ca4cb083 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -859,9 +859,9 @@ static GSource *mux_chr_add_watch(Chardev *s, GIOCondition 
cond)
 return cc->chr_add_watch(chr, cond);
 }
 
-static void mux_chr_free(struct Chardev *chr)
+static void char_mux_finalize(Object *obj)
 {
-MuxChardev *d = MUX_CHARDEV(chr);
+MuxChardev *d = MUX_CHARDEV(obj);
 int i;
 
 for (i = 0; i < d->mux_cnt; i++) {
@@ -4025,7 +4025,6 @@ static void char_mux_class_init(ObjectClass *oc, void 
*data)
 ChardevClass *cc = CHARDEV_CLASS(oc);
 
 cc->open = qemu_chr_open_mux;
-cc->chr_free = mux_chr_free;
 cc->chr_write = mux_chr_write;
 cc->chr_accept_input = mux_chr_accept_input;
 cc->chr_add_watch = mux_chr_add_watch;
@@ -4036,6 +4035,7 @@ static const TypeInfo char_mux_type_info = {
 .parent = TYPE_CHARDEV,
 .class_init = char_mux_class_init,
 .instance_size = sizeof(MuxChardev),
+.instance_finalize = char_mux_finalize,
 };
 
 static void qemu_chr_parse_socket(QemuOpts *opts, ChardevBackend *backend,
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 13/41] char-win: do not override chr_free

2017-01-30 Thread Marc-André Lureau

For some unclear reason to me, char-file does not have chr_free on
win32. Since we want to switch to instance finalizer instead of class
chr_free, we should be able to run the base WinChardev class finalizer
in any case. Use a boolean to skip free to ease the transition to
instance finalizer.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 qemu-char.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index d92642735e..83636d76c0 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2122,6 +2122,8 @@ typedef struct {
 
 /* Protected by the Chardev chr_write_lock.  */
 OVERLAPPED osend;
+/* FIXME: file/console do not finalize */
+bool skip_free;
 } WinChardev;
 
 #define TYPE_CHARDEV_WIN "chardev-win"
@@ -2152,6 +2154,10 @@ static void win_chr_free(Chardev *chr)
 {
 WinChardev *s = WIN_CHARDEV(chr);
 
+if (s->skip_free) {
+return;
+}
+
 if (s->hsend) {
 CloseHandle(s->hsend);
 s->hsend = NULL;
@@ -2432,6 +2438,7 @@ static void qemu_chr_open_win_file(Chardev *chr, HANDLE 
fd_out)
 {
 WinChardev *s = WIN_CHARDEV(chr);
 
+s->skip_free = true;
 s->hcom = fd_out;
 }
 
@@ -2468,7 +2475,6 @@ static void char_console_class_init(ObjectClass *oc, void 
*data)
 ChardevClass *cc = CHARDEV_CLASS(oc);
 
 cc->open = qemu_chr_open_win_con;
-cc->chr_free = NULL;
 }
 
 static const TypeInfo char_console_type_info = {
@@ -4731,10 +4737,6 @@ static void char_file_class_init(ObjectClass *oc, void 
*data)
 ChardevClass *cc = CHARDEV_CLASS(oc);
 
 cc->open = qmp_chardev_open_file;
-#ifdef _WIN32
-/* FIXME: no chr_free */
-cc->chr_free = NULL;
-#endif
 }
 
 static const TypeInfo char_file_type_info = {
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 10/41] char-parallel: convert parallel to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 qemu-char.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 74e78b56d1..1981058f45 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2025,17 +2025,6 @@ static int pp_ioctl(Chardev *chr, int cmd, void *arg)
 return 0;
 }
 
-static void pp_free(Chardev *chr)
-{
-ParallelChardev *drv = PARALLEL_CHARDEV(chr);
-int fd = drv->fd;
-
-pp_hw_mode(drv, IEEE1284_MODE_COMPAT);
-ioctl(fd, PPRELEASE);
-close(fd);
-qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
-}
-
 static void qemu_chr_open_pp_fd(Chardev *chr,
 int fd,
 bool *be_opened,
@@ -4699,18 +4688,33 @@ static void char_parallel_class_init(ObjectClass *oc, 
void *data)
 #if defined(__linux__)
 cc->chr_write = null_chr_write;
 cc->chr_ioctl = pp_ioctl;
-cc->chr_free = pp_free;
 #elif defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || 
defined(__DragonFly__)
-/* FIXME: no chr_free */
 cc->chr_write = null_chr_write;
 cc->chr_ioctl = pp_ioctl;
 #endif
 }
 
+static void char_parallel_finalize(Object *obj)
+{
+#if defined(__linux__)
+Chardev *chr = CHARDEV(obj);
+ParallelChardev *drv = PARALLEL_CHARDEV(chr);
+int fd = drv->fd;
+
+pp_hw_mode(drv, IEEE1284_MODE_COMPAT);
+ioctl(fd, PPRELEASE);
+close(fd);
+qemu_chr_be_event(chr, CHR_EVENT_CLOSED);
+#elif defined(__FreeBSD__) || defined(__FreeBSD_kernel__) || 
defined(__DragonFly__)
+/* FIXME: close fd? */
+#endif
+}
+
 static const TypeInfo char_parallel_type_info = {
 .name = TYPE_CHARDEV_PARALLEL,
 .parent = TYPE_CHARDEV,
 .instance_size = sizeof(ParallelChardev),
+.instance_finalize = char_parallel_finalize,
 .class_init = char_parallel_class_init,
 };
 #endif
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 06/41] char-udp: convert to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 qemu-char.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 91ca4cb083..1c4fcf3a6e 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2779,9 +2779,10 @@ static void udp_chr_update_read_handler(Chardev *chr,
 }
 }
 
-static void udp_chr_free(Chardev *chr)
+static void char_udp_finalize(Object *obj)
 {
-UdpChardev *s = UDP_CHARDEV(chr);
+Chardev *chr = CHARDEV(obj);
+UdpChardev *s = UDP_CHARDEV(obj);
 
 remove_fd_in_watch(chr);
 if (s->ioc) {
@@ -4975,13 +4976,13 @@ static void char_udp_class_init(ObjectClass *oc, void 
*data)
 cc->open = qmp_chardev_open_udp;
 cc->chr_write = udp_chr_write;
 cc->chr_update_read_handler = udp_chr_update_read_handler;
-cc->chr_free = udp_chr_free;
 }
 
 static const TypeInfo char_udp_type_info = {
 .name = TYPE_CHARDEV_UDP,
 .parent = TYPE_CHARDEV,
 .instance_size = sizeof(UdpChardev),
+.instance_finalize = char_udp_finalize,
 .class_init = char_udp_class_init,
 };
 
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 11/41] char-stdio: convert to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 qemu-char.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 1981058f45..390e6a99de 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -1422,10 +1422,10 @@ static void qemu_chr_set_echo_stdio(Chardev *chr, bool 
echo)
 tcsetattr (0, TCSANOW, &tty);
 }
 
-static void qemu_chr_free_stdio(struct Chardev *chr)
+static void char_stdio_finalize(Object *obj)
 {
 term_exit();
-fd_chr_free(chr);
+fd_chr_free(CHARDEV(chr));
 }
 
 static void qemu_chr_open_stdio(Chardev *chr,
@@ -3859,7 +3859,6 @@ static void char_stdio_class_init(ObjectClass *oc, void 
*data)
 cc->chr_free = win_stdio_free;
 #else
 cc->chr_set_echo = qemu_chr_set_echo_stdio;
-cc->chr_free = qemu_chr_free_stdio;
 #endif
 }
 
@@ -3869,6 +3868,7 @@ static const TypeInfo char_stdio_type_info = {
 .parent = TYPE_CHARDEV_WIN_STDIO,
 #else
 .parent = TYPE_CHARDEV_FD,
+.instance_finalize = char_stdio_finalize,
 #endif
 .class_init = char_stdio_class_init,
 };
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 12/41] char-win-stdio: convert to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 qemu-char.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 390e6a99de..d92642735e 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2594,9 +2594,9 @@ static void qemu_chr_set_echo_win_stdio(Chardev *chr, 
bool echo)
 }
 }
 
-static void win_stdio_free(Chardev *chr)
+static void char_win_stdio_finalize(Object *obj)
 {
-WinStdioChardev *stdio = WIN_STDIO_CHARDEV(chr);
+WinStdioChardev *stdio = WIN_STDIO_CHARDEV(obj);
 
 if (stdio->hInputReadyEvent != INVALID_HANDLE_VALUE) {
 CloseHandle(stdio->hInputReadyEvent);
@@ -2613,6 +2613,7 @@ static const TypeInfo char_win_stdio_type_info = {
 .name = TYPE_CHARDEV_WIN_STDIO,
 .parent = TYPE_CHARDEV,
 .instance_size = sizeof(WinStdioChardev),
+.instance_finalize = char_win_stdio_finalize,
 .abstract = true,
 };
 
@@ -3856,7 +3857,6 @@ static void char_stdio_class_init(ObjectClass *oc, void 
*data)
 #ifdef _WIN32
 cc->chr_write = win_stdio_write;
 cc->chr_set_echo = qemu_chr_set_echo_win_stdio;
-cc->chr_free = win_stdio_free;
 #else
 cc->chr_set_echo = qemu_chr_set_echo_stdio;
 #endif
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 08/41] char-pty: convert to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 qemu-char.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index a00bbb0a1c..fecebde87a 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -1659,9 +1659,10 @@ static void pty_chr_state(Chardev *chr, int connected)
 }
 }
 
-static void pty_chr_free(struct Chardev *chr)
+static void char_pty_finalize(Object *obj)
 {
-PtyChardev *s = PTY_CHARDEV(chr);
+Chardev *chr = CHARDEV(obj);
+PtyChardev *s = PTY_CHARDEV(obj);
 
 qemu_mutex_lock(&chr->chr_write_lock);
 pty_chr_state(chr, 0);
@@ -1718,13 +1719,13 @@ static void char_pty_class_init(ObjectClass *oc, void 
*data)
 cc->chr_write = char_pty_chr_write;
 cc->chr_update_read_handler = pty_chr_update_read_handler;
 cc->chr_add_watch = pty_chr_add_watch;
-cc->chr_free = pty_chr_free;
 }
 
 static const TypeInfo char_pty_type_info = {
 .name = TYPE_CHARDEV_PTY,
 .parent = TYPE_CHARDEV,
 .instance_size = sizeof(PtyChardev),
+.instance_finalize = char_pty_finalize,
 .class_init = char_pty_class_init,
 };
 
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 14/41] char-win: convert to finalize

2017-01-30 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 qemu-char.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 83636d76c0..0cece8a34f 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -2150,8 +2150,9 @@ typedef struct {
 static int win_chr_poll(void *opaque);
 static int win_chr_pipe_poll(void *opaque);
 
-static void win_chr_free(Chardev *chr)
+static void char_win_finalize(Object *obj)
 {
+Chardev *chr = CHARDEV(obj);
 WinChardev *s = WIN_CHARDEV(chr);
 
 if (s->skip_free) {
@@ -2160,15 +2161,12 @@ static void win_chr_free(Chardev *chr)
 
 if (s->hsend) {
 CloseHandle(s->hsend);
-s->hsend = NULL;
 }
 if (s->hrecv) {
 CloseHandle(s->hrecv);
-s->hrecv = NULL;
 }
 if (s->hcom) {
 CloseHandle(s->hcom);
-s->hcom = NULL;
 }
 if (s->fpipe)
 qemu_del_polling_cb(win_chr_pipe_poll, chr);
@@ -2241,7 +2239,6 @@ static int win_chr_init(Chardev *chr, const char 
*filename, Error **errp)
 return 0;
 
  fail:
-win_chr_free(chr);
 return -1;
 }
 
@@ -2416,7 +2413,6 @@ static int win_chr_pipe_init(Chardev *chr, const char 
*filename,
 return 0;
 
  fail:
-win_chr_free(chr);
 return -1;
 }
 
@@ -2447,13 +2443,13 @@ static void char_win_class_init(ObjectClass *oc, void 
*data)
 ChardevClass *cc = CHARDEV_CLASS(oc);
 
 cc->chr_write = win_chr_write;
-cc->chr_free = win_chr_free;
 }
 
 static const TypeInfo char_win_type_info = {
 .name = TYPE_CHARDEV_WIN,
 .parent = TYPE_CHARDEV,
 .instance_size = sizeof(WinChardev),
+.instance_finalize = char_win_finalize,
 .class_init = char_win_class_init,
 .abstract = true,
 };
-- 
2.11.0.295.gd7dffce1c.dirty

[Qemu-devel] [PATCH v2 16/41] char: remove chr_free

2017-01-30 Thread Marc-André Lureau

Now it uses Object instance_finalize instead.

Signed-off-by: Marc-André Lureau 
Reviewed-by: Eric Blake 
---
 include/sysemu/char.h |  8 
 qemu-char.c   | 10 +-
 2 files changed, 1 insertion(+), 17 deletions(-)

diff --git a/include/sysemu/char.h b/include/sysemu/char.h
index da0e7dd494..5e5594f305 100644
--- a/include/sysemu/char.h
+++ b/include/sysemu/char.h
@@ -185,13 +185,6 @@ Chardev *qemu_chr_new_noreplay(const char *label, const 
char *filename);
 void qemu_chr_delete(Chardev *chr);
 
 /**
- * @qemu_chr_free:
- *
- * Destroy a character backend.
- */
-void qemu_chr_free(Chardev *chr);
-
-/**
  * @qemu_chr_fe_set_echo:
  *
  * Ask the backend to override its normal echo setting.  This only really
@@ -496,7 +489,6 @@ typedef struct ChardevClass {
 int (*set_msgfds)(Chardev *s, int *fds, int num);
 int (*chr_add_client)(Chardev *chr, int fd);
 int (*chr_wait_connected)(Chardev *chr, Error **errp);
-void (*chr_free)(Chardev *chr);
 void (*chr_disconnect)(Chardev *chr);
 void (*chr_accept_input)(Chardev *chr);
 void (*chr_set_echo)(Chardev *chr, bool echo);
diff --git a/qemu-char.c b/qemu-char.c
index 3d0d690999..c239418c25 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -4371,18 +4371,10 @@ void qemu_chr_fe_disconnect(CharBackend *be)
 }
 }
 
-void qemu_chr_free(Chardev *chr)
-{
-if (CHARDEV_GET_CLASS(chr)->chr_free) {
-CHARDEV_GET_CLASS(chr)->chr_free(chr);
-}
-object_unref(OBJECT(chr));
-}
-
 void qemu_chr_delete(Chardev *chr)
 {
 QTAILQ_REMOVE(&chardevs, chr, next);
-qemu_chr_free(chr);
+object_unref(OBJECT(chr));
 }
 
 ChardevInfoList *qmp_query_chardev(Error **errp)
-- 
2.11.0.295.gd7dffce1c.dirty

1 2 3 >

1 - 100 of 249 matches

Mail list logo