Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2015-01-09 Thread Tian, Kevin
> From: Tim Deegan [mailto:t...@xen.org]
> Sent: Thursday, January 08, 2015 8:43 PM
> 
> Hi,
> 
> > > Not really.  The IOMMU tables are also 64-bit so there must be enough
> > > addresses to map all of RAM.  There shouldn't be any need for these
> > > mappings to be _contiguous_, btw.  You just need to have one free
> > > address for each mapping.  Again, following how grant maps work, I'd
> > > imagine that PVH guests will allocate an unused GFN for each mapping
> > > and do enough bookkeeping to make sure they don't clash with other GFN
> > > users (grant mapping, ballooning, &c).  PV guests will probably be
> > > given a BFN by the hypervisor at map time (which will be == MFN in
> > > practice) and just needs to pass the same BFN to the unmap call later
> > > (it can store it in the GTT meanwhile).
> >
> > if possible prefer to make both consistent, i.e. always finding unused GFN?
> 
> I don't think it will be possible.  PV domains are already using BFNs
> supplied by Xen (in fact == MFN) for backend grant mappings, which
> would conflict with supplying their own for these mappings.  But
> again, I think the kernel maintainers for Xen may have a better idea
> of how these interfaces are used inside the kernel.  For example,
> it might be easy enough to wrap the two systems inside a common API
> inside linux.   Again, following how grant mapping works seems like
> the way forward.
> 

So Konrad, do you have any insight here? :-)

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] x86: also allow REP STOS emulation acceleration

2015-01-09 Thread Jan Beulich
>>> On 08.01.15 at 20:29,  wrote:
> On 08/01/15 15:50, Jan Beulich wrote:
>> +if ( !buf )
>> +buf = p_data;
>> +else
>> +switch ( bytes_per_rep )
>> +{
>> +#define CASE(bits, suffix)  \
>> +case (bits) / 8:\
>> +asm ( "rep stos" #suffix\
>> +  :: "a" (*(const uint##bits##_t *)p_data), \
>> + "D" (buf), "c" (*reps) \
>> +  : "memory" ); \
> 
> This looks as if it needs output clobbers for D and c

Well spotted, will fix.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Intel-gfx] [Announcement] 2015-Q1 release of XenGT - a Mediated Graphics Passthrough Solution from Intel

2015-01-09 Thread Jike Song

Hi all,

  We're pleased to announce a public update to Intel Graphics Virtualization 
Technology (Intel GVT-g, formerly known as XenGT). Intel GVT-g is a complete 
vGPU solution with mediated pass-through, supported today on 4th generation 
Intel Core(TM) processors with Intel Graphics processors. A virtual GPU 
instance is maintained for each VM, with part of performance critical resources 
directly assigned. The capability of running native graphics driver inside a 
VM, without hypervisor intervention in performance critical paths, achieves a 
good balance among performance, feature, and sharing capability. Though we only 
support Xen on Intel Processor Graphics so far, the core logic can be easily 
ported to other hypervisors.   The XenGT project should be considered a work in 
progress, As such it is not a complete product nor should it be considered 
one., Extra care should be taken when testing and configuring a system to use 
the XenGT project.

The news of this update:

- kernel update from 3.14.1 to drm-intel 3.17.0.
- We plan to integrate Intel GVT-g as a feature in i915 driver. That 
effort is still under review, not included in this update yet.
- Next update will be around early Apr, 2015.

This update consists of:

- Including some bug fixes and stability enhancement.
- Making XenGT device model to be aware of Broadwell. In this version 
BDW is not yet functioning.
- Available Fence registers number is changed to 32 from 16 to align 
with HSW hardware.
- New cascade interrupt framework for supporting interrupt 
virtualization on both Haswell and Broadwell.
- Add back the gem_vgtbuffer. The previous release did not build that 
module for 3.14 kernel. In this release, the module is back and rebased to 3.17.
- Enable the irq based context switch in vgt driver, which will help 
reduce the cpu utilization while doing context switch, it is enabled by 
default, and can be turn off by kernel flag irq_based_ctx_switch.


Please refer to the new setup guide, which provides step-to-step details about 
building/configuring/running Intel GVT-g:


https://github.com/01org/XenGT-Preview-kernel/blob/master/XenGT_Setup_Guide.pdf

The new source codes are available at the updated github repos:

Linux: https://github.com/01org/XenGT-Preview-kernel.git
Xen: https://github.com/01org/XenGT-Preview-xen.git
Qemu: https://github.com/01org/XenGT-Preview-qemu.git


More information about Intel GVT-g background, architecture, etc can be found 
at:



https://www.usenix.org/conference/atc14/technical-sessions/presentation/tian

http://events.linuxfoundation.org/sites/events/files/slides/XenGT-Xen%20Summit-v7_0.pdf
https://01.org/xen/blogs/srclarkx/2013/graphics-virtualization-xengt



The previous update can be found here:


http://lists.xen.org/archives/html/xen-devel/2014-12/msg00474.html



Appreciate your comments!



--
Thanks,
Jike


On 12/04/2014 10:45 AM, Jike Song wrote:

Hi all,

We're pleased to announce a public release to Intel Graphics Virtualization 
Technology (Intel GVT-g, formerly known as XenGT). Intel GVT-g is a complete 
vGPU solution with mediated pass-through, supported today on 4th generation 
Intel Core(TM) processors with Intel Graphics processors. A virtual GPU 
instance is maintained for each VM, with part of performance critical resources 
directly assigned. The capability of running native graphics driver inside a 
VM, without hypervisor intervention in performance critical paths, achieves a 
good balance among performance, feature, and sharing capability. Though we only 
support Xen on Intel Processor Graphics so far, the core logic can be easily 
ported to other hypervisors.


The news of this update:


- kernel update from 3.11.6 to 3.14.1

- We plan to integrate Intel GVT-g as a feature in i915 driver. That 
effort is still under review, not included in this update yet

- Next update will be around early Jan, 2015


This update consists of:

- Windows HVM support with driver version 15.33.3910

- Stability fixes, e.g. stabilize GPU, the GPU hang occurrence rate 
becomes rare now

- Hardware Media Acceleration for Decoding/Encoding/Transcoding, VC1, 
H264 etc. format supporting

- Display enhancements, e.g. DP type is supported for virtual PORT

- Display port capability virtualization: with this feature, dom0 
manager could freely assign virtual DDI ports to VM, not necessary to check 
whether the corresponding physical DDI ports are available



Please refer to the new setup guide, which provides step-to-step details about 
building/configuring/running Intel GVT-g:



https://github.com/01org/XenGT-Preview-kernel/blob/master/XenGT_Setup_Guide.pdf



The new source codes are available at the updated github repos:


Linux: https://github.com/01org/XenGT-Preview-kernel.git

  

Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Jan Beulich
>>> On 08.01.15 at 19:02,  wrote:
> On Thu, Jan 8, 2015 at 4:10 PM, Jan Beulich  wrote:
> On 08.01.15 at 16:59,  wrote:
>>> It seems a lot cleaner to me to have the toolstack tell Xen what
>>> ranges are reserved for RMRR per VM, and then have Xen check again
>>> when assigning a device to make sure that the RMRRs have already been
>>> reserved.
>>
>> With an extra level of what can be got wrong by the admin.
>> However, I now realize that doing it this way would allow
>> specifying regions not associated with any device on the host
>> the guest boots on, but associated with one on a host the guest
>> may later migrate to.
> 
> I did say the toolstack, not the admin. :-)

You did, but the tool stack needs to take its knowledge from
somewhere, which I implied to be the guest config.

> At the xl level, I envisioned a single boolean that would say, "Make
> my memory layout resemble the host system" -- so the MMIO hole would
> be the same size, and all the RMRRs would be reserved.

Right, that's an option where things can't go wrong, but not allowing
maximum flexibility (as mentioned before when considering migration).

> But xapi, for instance, has a concept of "hardware pools" containing
> individual hardware devices, which can be assigned to VMs.  You could
> imagine a toolstack like xapi keeping track of all devices which
> *might be* assigned to a guest, and supplying Xen with the RMRRs.  As
> you say, then this could include hardware across a pool of hosts, with
> the RMRRs of any device in the system reserved.
> 
> Alternately, could the toolstack could be responsible for making sure
> that nobody uses such a range; and then Xen when a device is assigned,
> Xen can check to make sure that the gpfn space is empty before adding
> the RMRRs?  That might be the most flexible.

"such a range" is pretty unspecific here: The main question is where the
tool stack would get the information on the set of ranges from. Of course
if it has this information, it can make sure no guest (unless marked
otherwise) uses any of these ranges. The hypervisor side verification
needs to be done in any case.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 03:27,  wrote:
>>  From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, January 08, 2015 9:55 PM
>> >>> On 26.12.14 at 12:23,  wrote:
>> > 3) hvmloader
>> >   Hvmloader allocates other resources (ACPI, PCI MMIO, etc.) and
>> > internal data structures in gfn space, and it creates the final guest
>> > e820. So hvmloader also needs to detect conflictions when conducting
>> > those operations. If there's no confliction, hvmloader will reserve
>> > those regions in guest e820 to let guest OS aware.
>> 
>> Ideally, rather than detecting conflicts, hvmloader would just
>> consume what libxc set up. Obviously that would require awareness
>> in libxc of things it currently doesn't care about (like fitting PCI BARs
>> into the MMIO hole, enlarging it as necessary). I admit that this may
>> end up being difficult to implement. Another alternative would be to
>> have libxc only populate a limited part of RAM (for hvmloader to be
>> loadable), and have hvmloader do the bulk of the populating.
> 
> there are quite some allocations which are suitable in hvmloader, such
> as ACPI, PCI BARs, and other hole allocations. some of them are hvmloader's
> own usage, and others are related to guest bios. I don't think it's worthy 
> of
> the mass refactoring of moving those allocations to libxc, just for this 
> very
> specific task. As long as hvmloader still needs allocate gfns, it needs to
> keep confliction detection logic itself.

Allocations done by hvmloader doesn't need to look for conflicts.
All it needs to make sure is that what it allocates is actually RAM.
It not doing so today is already a (latent) bug. The thing is that
if libxc set up a proper, fully correct memory map for hvmloader
to consume, hvmloader doesn't need to do anything else than
play by this memory map.

>> 3.3 Policies
>> > 
>> > An intuitive thought is to fail immediately upon a confliction, however
>> > it is not flexible regarding to different requirments:
>> >
>> > a) it's not appropriate to fail libxc domain builder just because such
>> > confliction. We still want the guest to boot even w/o assigned device;
>> 
>> I don't think that's right (and I believe this was discussed before):
>> When device assignment fails, VM creation should fail too. It is the
>> responsibility of the host admin in that case to remove some or all
>> of the to be assigned devices from the guest config.
> 
> think about bare metal. If a device say NIC doesn't work, would the
> platform reject to work at all? there could be errors, but their scope
> are limited within specific function. user can still use a platform w/
> errors as long as related functions are not used.
> 
> Similarly we should allow domainbuilder to move forward upon a
> device assignment failure (something like circuit error when powering
> the device), and user will note this problem when using the device
> (either not present or not function correctly).
> 
> same thing for hotplug usage. all the detections for future hotplug
> usage are just preparation and not strict. you don't want to hang
> a platform just because it's not suitable to hotplug some device in
> the future.

Hotplug is something that can fail, and surely shouldn't lead to a
hung guest. Very similar to hotplug on bare metal indeed.

Boot time device assignment is different: The question isn't whether
an assigned device works, instead the proper analogy is whether a
device is _present_. If a device doesn't work on bare metal, it will
still be discoverable. Yet if device assignment fails, that's not going
to be the case - for security reasons, the guest would not see any
notion of the device.

The "device does not work" analogy to bare metal would only apply
if the device's presence would prevent the system from booting (in
which case you'd have to physically remove it from the system, just
like for the virtualized case you'd have to remove it from the guest
config).

>> > We propose report-all as the simple solution (different from last sent
>> > version which used report-sel), regarding to the below facts:
>> >
>> >   - 'warn' policy in user space makes report-all not harmful
>> >   - 'report-all' still means only a few entries in reality:
>> > * RMRR reserved regions should be avoided or limited by platform
>> > designers, per VT-d specification;
>> > * RMRR reserved regions are only a few on real platforms, per our
>> > current observations;
>> 
>> Few yes, but in the IGD example you gave the region is quite large,
>> and it would be fairly odd to have all guests have a strange, large
>> hole in their address spaces. Furthermore remember that these
>> holes vary from machine to machine, so a migrateable guest would
>> needlessly end up having a hole potentially not helping subsequent
>> hotplug at all.
> 
> it's not strange since it never exceeds the set on bare metal, but yes, 
> migration raises another interesting point. currently I don't think 
> migration w/ assigned devices is supp

Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 03:29,  wrote:
>>  From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, January 08, 2015 8:59 PM
>> 
>> >>> On 08.01.15 at 13:49,  wrote:
>> > One question: where are these RMRRs typically located in memory?  Are
>> > they normally up in the MMIO region?  Or can they occur anywhere (even
>> > in really low areas, say, under 1GiB)?
>> 
>> They would typically sit in the MMIO hole or below 1Mb; that latter case
>> is particularly problematic as it might conflict with what we want to put
>> there (BIOS etc).
>> 
> 
> and later case is not solvable, which is then related to other discussion 
> whether
> we want to fail such case

That latter case is partially solvable: The BIOS put below 1Mb has a
permanent and a transient part. Dealing with the transient part
overlapping an RMRR ought to be possible (e.g. by delaying the
actual device assignment until the point where hvmloader knows it
is safe to do). An overlap of the permanent part with an RMRR is of
course fatal.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [linux-3.10 test] 33271: regressions - trouble: broken/fail/pass

2015-01-09 Thread xen . org
flight 33271 linux-3.10 real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/33271/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-winxpsp3  7 windows-install fail REGR. vs. 26303

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-sedf  3 host-install(3)   broken pass in 33190
 test-armhf-armhf-xl   3 host-install(3)  broken in 33190 pass in 33271

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 26303
 test-amd64-amd64-xl-winxpsp3  7 windows-install  fail   like 26303
 build-armhf-libvirt 3 host-install(3) broken in 33190 blocked in 26303
 test-amd64-amd64-rumpuserxen-amd64 11 rumpuserxen-demo-xenstorels/xenstorels 
fail in 33190 blocked in 26303
 build-i386-libvirt  3 host-install(3) broken in 33190 blocked in 26303
 build-amd64-libvirt 3 host-install(3) broken in 33190 blocked in 26303
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 10 guest-localmigrate fail in 33190 
blocked in 26303

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-armhf-armhf-libvirt  5 xen-boot fail   never pass
 test-armhf-armhf-xl   5 xen-boot fail   never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-i386-libvirt   1 build-check(1)blocked in 33190 n/a
 test-amd64-amd64-libvirt  1 build-check(1)blocked in 33190 n/a
 test-armhf-armhf-libvirt  1 build-check(1)blocked in 33190 n/a

version targeted for testing:
 linuxa472efc75989c7092187fe00f0400e02c495c436
baseline version:
 linuxbe67db109090b17b56eb8eb2190cd70700f107aa


816 people touched revisions under test,
not listing them all


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  pass
 test-armhf-armhf-xl  fail
 test-amd64-i386-xl   pass
 test-amd64-amd64-xl-pvh-amd  fail
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-f

Re: [Xen-devel] [libvirt test] 33157: regressions - trouble: blocked/fail/pass/preparing/queued

2015-01-09 Thread Ian Campbell
On Thu, 2015-01-08 at 14:56 -0700, Jim Fehlig wrote:
> Jim Fehlig wrote:
> > Ian Campbell wrote:
> >
> >   
> >> I must confess that I thought you already did use libxlu...
> >>   
> >> 
> >
> > No, not directly.  But I will be doing so now.  I should try to revert
> > all this nonsense and use libxlu before it ends up in the next libvirt
> > release.
> >   
> 
> Hmm, I don't think that is going to be possible since libxlutil.h has
> never been installed :-(.  I guess something like the below (untested)
> patch would have been necessary long ago.  Changing libvirt to use
> libxlutil would break a lot of builds where the distro's existing
> xen-devel package is missing libxlutil.h.

Whoops, yes, how embarrassing!

Acked-by: Ian Campbell 

And of course it should be backported to everywhere.

Later:
On Thu, 2015-01-08 at 22:59 -0700, Jim Fehlig wrote:
[...]
> That's a bit pessimistic.  I could always add something like the
> following hack to libvirt

I'd say that if you combine that with an AC_CHECK_HEADERS([libxlutil.h])
in configure.ac and wrap the hack in #ifndef HAVE_LIBXLUTIL_H that
wouldn't be too aweful a hack.

> 
> typedef struct XLU_Config XLU_Config;
> 
> extern XLU_Config *xlu_cfg_init(FILE *report,
> const char *report_filename);
> 
> extern int xlu_disk_parse(XLU_Config *cfg,
>   int nspecs,
>   const char *const *specs,
>   libxl_device_disk *disk);
> 
> Regards,
> Jim
> 
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 07:57,  wrote:
> 1) 'fail' vs. 'warn' upon gfn confliction
> 
> Assigning device which fails RMRR confliction check (i.e. intended gfns 
> already allocated for other resources) actually brings unknown stability 
> problem (device may clobber those valid resources) and potentially security 
> issue within VM (but not worse than what a malicious driver can do w/o 
> virtual IOMMU). 
> 
> So by default we should not move forward if gfn confliction is detected 
> when setting up RMRR identity mapping, so-called a 'fail' policy.
> 
> One open though, is whether we want to allow admin to override default
> 'fail' policy with a 'warn' policy, i.e. throwing out confliction detail but
> succeeding device assignment. USB is discussed as one example before
> (hack how it works today upon <1MB confliction), so it might be good to 
> allow enthusiast trying device assignment, or provide flexibility to users
> who already verified predicted potential problem not a real issue to their
> specific deployment.
> 
> I'd like to hear your votes on whether to provide such 'warn' option.

Yes, I certainly see value in such an option, to have a means to
circumvent the other possible case of perceived regressions in not
being able to pass through certain devices anymore.

> 1.1) per-device 'warn' vs. global 'warn'
> 
> Both Tim/Jan prefer to 'warn' as a per-device option to the admin instead 
> of a global option.
> 
> In a glimpse a per-device 'warn' option provides more fine-grained control 
> than a global option, however if thinking it carefully allowing one device 
> w/ 
> potential problem isn't more correct or secure than allowing multiple 
> devices w/ potential problem. Even in practice a device like USB can
> work bearing <1MB confliction, like Jan pointed out there's always corner
> cases which we might not know so as long as we open door for one device,
> it implies a problematic environment to users and user's judge on whether
> he can live up to this problem is not impacted by how many devices the door
> is opened for (he anyway needs to study warning message and do verification
> if choosing to live up)
> 
> Regarding to that, imo if we agree to provide 'warn' option, just providing
> a global overriding option (definitely per-vm) is acceptable and simpler.

If the admin determined that ignoring the RMRR requirements for one
devices is safe, that doesn't (and shouldn't) mean this is the case for
all other devices too.

> 1.2) when to 'fail'
> 
> There is one open whether we should fail immediately in domain builder
> if a confliction is detected. 
> 
> Jan's comment is yes, we should 'fail' the VM creation as it's an error.
> 
> My previous point is more mimicking native behavior, where a device 
> failure (in our case it's actually potential device failure since VM is not 
> powered yet) doesn't impact user until its function is actually touched. 
> In our case, even domain builder fails to re-arrange guest RAM to skip 
> reserved regions, we have centralized policy (either 'fail' or 'warn' per 
> above conclusion) in Xen hypervisor when the device is actually assigned. 
> so a 'warn' should be fine, but my insist on this is not strong.

See my earlier reply: Failure to add a device to me is more like a
device preventing a bare metal system from coming up altogether.

> and another point is about hotplug. 'fail' for future devices is too strict,
> but to differentiate that from static-assigned devices, domain builder
> will then need maintain a per-device reserved region structure. just
> 'warn' makes things simple.

Whereas here I agree - hotplug should just fail (without otherwise
impacting the guest).

> 2) RMRR management
> 
> George raised a good point that RMRR reserved regions can be maintained
> in toolstack, and it's toolstack to tell Xen which regions to be reserved. 
> When
> providing more flexibility, another benefit from Jan is to specify reserved 
> regions in another node (might-be-migrated-to) as a preparation for migration.
> 
> When it sounds like a good long term plan, my feeling is that it might be
> some parallel effort driving from toolstack experts. Xen can't simply rely
> on user space to setup all necessary reserved regions, since it violates the
> isolation philosophy in Xen. Whatever a toolstack may tell Xen, Xen still
> needs to setup identity mapping for all reserved regions reported for the
> assigned device.

Of course. If the tool stack failed to reserve a certain page in a guest's
memory map, failure will result.

> So I still prefer to current way i.e. having Xen to organize reserve regions
> according to assigned device, and then having libxc/hvmloader to query
> to avoid confliction. In the future new interface can be created to allow
> toolstack specific plain reserved regions for whatever reason to Xen, as
> a compliment.

Indeed we should presumably allow for both - the guest config may
specify regions independent on what the host properties are, yet by
d

Re: [Xen-devel] [PATCH 3/3] xen: use correct type for physical addresses

2015-01-09 Thread Jan Beulich
>>> On 08.01.15 at 18:01,  wrote:
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -140,7 +140,7 @@ static void __init xen_del_extra_mem(u64 start, u64 size)
>  unsigned long __ref xen_chk_extra_mem(unsigned long pfn)
>  {
>   int i;
> - unsigned long addr = PFN_PHYS(pfn);
> + u64 addr = PFN_PHYS(pfn);

Isn't phys_addr_t the type to use here?

> @@ -284,7 +286,7 @@ static void __init xen_update_mem_tables(unsigned long 
> pfn, unsigned long mfn)
>   }
>  
>   /* Update kernel mapping, but not for highmem. */
> - if ((pfn << PAGE_SHIFT) >= __pa(high_memory))
> + if (PFN_PHYS(pfn) >= (u64)(__pa(high_memory)))

I don't think you really need the cast on the right side - __pa()
should be returning a value of suitable type (and unsigned long
would be sufficient for anything up to and including high_memory).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [RFC V9 4/4] domain snapshot design: libxl/libxlu

2015-01-09 Thread Chun Yan Liu


>>> On 1/8/2015 at 08:11 PM, in message <1420719107.19787.53.ca...@citrix.com>, 
>>> Ian
Campbell  wrote: 
> On Mon, 2014-12-22 at 02:36 -0700, Chun Yan Liu wrote: 
> > > > b). For internal snapshot, like qcow2, lvm too. For lvm, it doesn't  
> support  
> > > > snapshot of snapshot, so out of scope. For qcow2, delete any disk  
> snapshot  
> > > > won't affect others.  
> > >   
> > > For either internal or external if you are removing a snapshot from the  
> > > middle of a chain which ends in one or more active disks, then surely  
> > > the disk backend associated with those domains need to get some sort of  
> > > notification, otherwise they would need to be written *very* carefully  
> > > in order to be able to cope with disk metadata changing under their  
> > > feet.  
> > >   
> > > Are you saying that the qemu/qcow implementation has indeed been written  
> > > with this in mind and can cope with arbitrary other processes modifying  
> > > the qcow metadata under their feet?  
> >  
> > Yes. 
> >  
> > I add qemu-devel Kevin and Stefan in this thread in case my understanding 
> > has somewhere wrong. 
> >  
> > Kevin & Stefan, 
> >  
> > About the qcow2 snapshot implementation,  in following snapshot chain case, 
> > if we delete SNAPSHOT A, will it affect domain 1 and domain 2 which uses 
> > SNAPSHOT B and SNAPSHOT C? 
> >  
> > From my understanding, creating a snapshot will increases refcount of  
> original data, 
> > deleting a snapshot only descreases the refcount (won't delete data until  
> the refcount 
> > becomes 0), so I think it won't affect domain 1 and domain 2. 
> > Is that right? 
>  
> I'm not worried about the data being deleted (I'm sure qcow2 will get 
> that right), but rather about the snapshot chain being collapsed and 
> therefore the metadata (e.g. the tables of which block is in which 
> backing file, and perhaps the location of the data itself) changing 
> while a domain is running, e.g. 
>  
> BASE---SNAPSHOT A---SNAPSHOT B --- domain 1  
>  `--SNAPSHOT C --- domain 2  
>  
> becoming  
>  
> BASESNAPSHOT B --- domain 1  
>  `--SNAPSHOT C --- domain 2  
>  
> (essentially changing B and C's tables to accommodate the lack of A) 
>  
> For an internal snapshot I can see that it would be sensible (and easy) 
> to keep A around as a "ghost", to avoid this case, and the need to 
> perhaps move data around or duplicate it. 

This is what we talked about a lot :-)

If I understand correctly, qcow2 internal snapshot implementation
should be like this:
Taking a snapshot is increasing a new table which records the data
clusters index in this snapshot, at the same time increase the refcount
of those data clusters;
Teleting a snapshot only decrease the refcount (not deleting the data,
only when refcount=0 the data will be deleted.) and remove the map table.

So, I think deleting a snapshot even within a snapshot chain, won't
affect other snapshots. ( Otherwise, even if that's not the case, how can
we move internal data around or duplicate it? )

>  
> If A were external though an admin might think they could delete the 
> file...

About external, generally, external disk snapshot are made by backing file,
qemu, vhd-util and lvm are all done that way. Like:
snapshot A -> snapshot B -> Current image
A is B's backing file, B is current image's backing file, as backing file, it
should not be changed, otherwise if A is changed or deleted, B is corrupted.
So, a delete operation is actually a merge process, like 'virsh blockcommit':
shorten disk image chain by live merging the current active disk content.

To external disk snapshot: revert (may need to clone the snapshot out)
and delete (merge in fact as mentioned above). The complexity is
internal/external mixing case (take internal snapshot, revert, take
external snapshot, ...)

Chunyan

>
> Ian. 
>  
>  
>  


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] xen-ringwatch issues

2015-01-09 Thread Ian Campbell
On Thu, 2015-01-08 at 14:07 -0500, moftah moftah wrote:
> Hi All,
> We are using Xenserver 6.2

FYI xenserver is developed as a separate project over at
www.xenserver.org, so in general you should be reporting
issue/requesting help over on their forums and lists etc.

However, since xen-ringwatch is shipped by upstream Xen we can at least
try and help with that bit here.

[...]
> so after searching around we changed the file xen-ringwatch in order
> to see the real issue the changes are

Those look sensible, please could you send with a changelog message and
a Signed-off-by as described in
http://wiki.xen.org/wiki/Submitting_Xen_Patches . The S-o-b in
particular is required in order to be able to accept a code
contribution.

> --- /usr/sbin/xen-ringwatch 2013-07-22 13:52:19.0 +0200
> +++ /usr/sbin/xen-ringwatch 2013-07-22 13:52:30.0 +0200
> @@ -238,7 +238,7 @@
>  match = cls._pattern.search(line)
>  if not match:
>  raise Exception, "Malformed %s input: %s" % \
> -(cls.__name__, repr(s))
> +(cls.__name__, repr(line))
>  
>  i = iter(match.groups())
>  for k in i:
> 
> now the issue we see is like this
> Exception: Malformed Req input: 'req prod 3412900880 cons -882066416 event 
> 3412900881'

I bet the negative number is confusing things (ah, which you also said
further down). 

Really the kernel ought to be printing these as unsigned (for which
you'll need to speak to the xenserver.org folks, I think)

But the python code could also deal with them more gracefully when it
sees them. You'd need to start by allowing the regex used for the match
to accept an optional leading "-" on the numbers.

You probably also want to cast the result to the unsigned value during
the subsequent parsing, my Python-fu isn't sufficient to know off hand
how one would do that.

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, January 09, 2015 5:24 PM
> 
> >>> On 09.01.15 at 03:29,  wrote:
> >>  From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Thursday, January 08, 2015 8:59 PM
> >>
> >> >>> On 08.01.15 at 13:49,  wrote:
> >> > One question: where are these RMRRs typically located in memory?
> Are
> >> > they normally up in the MMIO region?  Or can they occur anywhere
> (even
> >> > in really low areas, say, under 1GiB)?
> >>
> >> They would typically sit in the MMIO hole or below 1Mb; that latter case
> >> is particularly problematic as it might conflict with what we want to put
> >> there (BIOS etc).
> >>
> >
> > and later case is not solvable, which is then related to other discussion
> > whether
> > we want to fail such case
> 
> That latter case is partially solvable: The BIOS put below 1Mb has a
> permanent and a transient part. Dealing with the transient part
> overlapping an RMRR ought to be possible (e.g. by delaying the
> actual device assignment until the point where hvmloader knows it
> is safe to do). An overlap of the permanent part with an RMRR is of
> course fatal.
> 

yes, but that type of changes are tricky, and since it's only partial so not
worthy of doing so. :-)

Thanks
Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, January 09, 2015 5:21 PM
> 
> >>> On 09.01.15 at 03:27,  wrote:
> >>  From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Thursday, January 08, 2015 9:55 PM
> >> >>> On 26.12.14 at 12:23,  wrote:
> >> > 3) hvmloader
> >> >   Hvmloader allocates other resources (ACPI, PCI MMIO, etc.) and
> >> > internal data structures in gfn space, and it creates the final guest
> >> > e820. So hvmloader also needs to detect conflictions when conducting
> >> > those operations. If there's no confliction, hvmloader will reserve
> >> > those regions in guest e820 to let guest OS aware.
> >>
> >> Ideally, rather than detecting conflicts, hvmloader would just
> >> consume what libxc set up. Obviously that would require awareness
> >> in libxc of things it currently doesn't care about (like fitting PCI BARs
> >> into the MMIO hole, enlarging it as necessary). I admit that this may
> >> end up being difficult to implement. Another alternative would be to
> >> have libxc only populate a limited part of RAM (for hvmloader to be
> >> loadable), and have hvmloader do the bulk of the populating.
> >
> > there are quite some allocations which are suitable in hvmloader, such
> > as ACPI, PCI BARs, and other hole allocations. some of them are hvmloader's
> > own usage, and others are related to guest bios. I don't think it's worthy
> > of
> > the mass refactoring of moving those allocations to libxc, just for this
> > very
> > specific task. As long as hvmloader still needs allocate gfns, it needs to
> > keep confliction detection logic itself.
> 
> Allocations done by hvmloader doesn't need to look for conflicts.
> All it needs to make sure is that what it allocates is actually RAM.
> It not doing so today is already a (latent) bug. The thing is that
> if libxc set up a proper, fully correct memory map for hvmloader
> to consume, hvmloader doesn't need to do anything else than
> play by this memory map.

for RAM I think we're aligned. with your earlier suggestion I think we can
achieve the goal. :-)

> 
> >> 3.3 Policies
> >> > 
> >> > An intuitive thought is to fail immediately upon a confliction, however
> >> > it is not flexible regarding to different requirments:
> >> >
> >> > a) it's not appropriate to fail libxc domain builder just because such
> >> > confliction. We still want the guest to boot even w/o assigned device;
> >>
> >> I don't think that's right (and I believe this was discussed before):
> >> When device assignment fails, VM creation should fail too. It is the
> >> responsibility of the host admin in that case to remove some or all
> >> of the to be assigned devices from the guest config.
> >
> > think about bare metal. If a device say NIC doesn't work, would the
> > platform reject to work at all? there could be errors, but their scope
> > are limited within specific function. user can still use a platform w/
> > errors as long as related functions are not used.
> >
> > Similarly we should allow domainbuilder to move forward upon a
> > device assignment failure (something like circuit error when powering
> > the device), and user will note this problem when using the device
> > (either not present or not function correctly).
> >
> > same thing for hotplug usage. all the detections for future hotplug
> > usage are just preparation and not strict. you don't want to hang
> > a platform just because it's not suitable to hotplug some device in
> > the future.
> 
> Hotplug is something that can fail, and surely shouldn't lead to a
> hung guest. Very similar to hotplug on bare metal indeed.
> 
> Boot time device assignment is different: The question isn't whether
> an assigned device works, instead the proper analogy is whether a
> device is _present_. If a device doesn't work on bare metal, it will
> still be discoverable. Yet if device assignment fails, that's not going
> to be the case - for security reasons, the guest would not see any
> notion of the device.

the question is whether we want such device assignment fail due to
RMRR confliction, and the fail decision should be when Xen handles
actual assignment instead of when domain builder prepares reserved
regions.

> 
> The "device does not work" analogy to bare metal would only apply
> if the device's presence would prevent the system from booting (in
> which case you'd have to physically remove it from the system, just
> like for the virtualized case you'd have to remove it from the guest
> config).
> 
> >> > We propose report-all as the simple solution (different from last sent
> >> > version which used report-sel), regarding to the below facts:
> >> >
> >> >   - 'warn' policy in user space makes report-all not harmful
> >> >   - 'report-all' still means only a few entries in reality:
> >> > * RMRR reserved regions should be avoided or limited by platform
> >> > designers, per VT-d specification;
> >> > * RMRR reserved regions are only a few on real platforms, per our
> >> > current 

Re: [Xen-devel] [PATCH 0/3] XSA-114 follow-ups

2015-01-09 Thread Andrew Cooper
On 08/01/15 16:28, Tim Deegan wrote:
> At 16:00 + on 08 Jan (1420729255), Jan Beulich wrote:
>> 1: spinlock: use local_irq_disable() instead of local_irq_save() where 
>> possible
>> 2: rwlock: allow arch to override read_unlock() atomic
>> 3: rwlock: allow arch to override write_unlock() atomic
>>
>> Signed-off-by: Jan Beulich 
> Reviewed-by: Tim Deegan 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] x86/HVM: vMSI simplification

2015-01-09 Thread Andrew Cooper
On 08/01/15 15:52, Jan Beulich wrote:
> - struct msixtbl_entry's table_len field can be unsigned int, and by
>   moving it down a little the structure size can be reduced slightly
> - a disjoint xmalloc()/memset() pair can be converted to xzalloc()
> - a pointless local variable can be dropped
>
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/3] x86: also allow REP STOS emulation acceleration

2015-01-09 Thread Jan Beulich
While the REP MOVS acceleration appears to have helped qemu-traditional
based guests, qemu-upstream (or really the respective video BIOSes)
doesn't appear to benefit from that. Instead the acceleration added
here provides a visible performance improvement during very early HVM
guest boot.

Signed-off-by: Jan Beulich 
---
v2: fix asm() constraints in hvmemul_rep_stos(), as pointed out by
Andrew. Add output operand telling the compiler that "buf" is being
written.

--- unstable.orig/xen/arch/x86/hvm/emulate.c2015-01-09 11:19:01.0 
+0100
+++ unstable/xen/arch/x86/hvm/emulate.c 2015-01-09 11:19:36.0 +0100
@@ -731,6 +731,17 @@ static int hvmemul_rep_movs_discard(
 return X86EMUL_OKAY;
 }
 
+static int hvmemul_rep_stos_discard(
+void *p_data,
+enum x86_segment seg,
+unsigned long offset,
+unsigned int bytes_per_rep,
+unsigned long *reps,
+struct x86_emulate_ctxt *ctxt)
+{
+return X86EMUL_OKAY;
+}
+
 static int hvmemul_rep_outs_discard(
 enum x86_segment src_seg,
 unsigned long src_offset,
@@ -982,6 +993,114 @@ static int hvmemul_rep_movs(
 return X86EMUL_OKAY;
 }
 
+static int hvmemul_rep_stos(
+void *p_data,
+enum x86_segment seg,
+unsigned long offset,
+unsigned int bytes_per_rep,
+unsigned long *reps,
+struct x86_emulate_ctxt *ctxt)
+{
+struct hvm_emulate_ctxt *hvmemul_ctxt =
+container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
+unsigned long addr;
+paddr_t gpa;
+p2m_type_t p2mt;
+bool_t df = !!(ctxt->regs->eflags & X86_EFLAGS_DF);
+int rc = hvmemul_virtual_to_linear(seg, offset, bytes_per_rep, reps,
+   hvm_access_write, hvmemul_ctxt, &addr);
+
+if ( rc == X86EMUL_OKAY )
+{
+uint32_t pfec = PFEC_page_present | PFEC_write_access;
+
+if ( hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3 )
+pfec |= PFEC_user_mode;
+
+rc = hvmemul_linear_to_phys(
+addr, &gpa, bytes_per_rep, reps, pfec, hvmemul_ctxt);
+}
+if ( rc != X86EMUL_OKAY )
+return rc;
+
+/* Check for MMIO op */
+(void)get_gfn_query_unlocked(current->domain, gpa >> PAGE_SHIFT, &p2mt);
+
+switch ( p2mt )
+{
+unsigned long bytes;
+void *buf;
+
+default:
+/* Allocate temporary buffer. */
+for ( ; ; )
+{
+bytes = *reps * bytes_per_rep;
+buf = xmalloc_bytes(bytes);
+if ( buf || *reps <= 1 )
+break;
+*reps >>= 1;
+}
+
+if ( !buf )
+buf = p_data;
+else
+switch ( bytes_per_rep )
+{
+unsigned long dummy;
+
+#define CASE(bits, suffix) \
+case (bits) / 8:   \
+asm ( "rep stos" #suffix   \
+  : "=m" (*(char (*)[bytes])buf),  \
+"=D" (dummy), "=c" (dummy) \
+  : "a" (*(const uint##bits##_t *)p_data), \
+ "1" (buf), "2" (*reps)\
+  : "memory" );\
+break
+CASE(8, b);
+CASE(16, w);
+CASE(32, l);
+CASE(64, q);
+#undef CASE
+
+default:
+xfree(buf);
+ASSERT(!buf);
+return X86EMUL_UNHANDLEABLE;
+}
+
+/* Adjust address for reverse store. */
+if ( df )
+gpa -= bytes - bytes_per_rep;
+
+rc = hvm_copy_to_guest_phys(gpa, buf, bytes);
+
+if ( buf != p_data )
+xfree(buf);
+
+switch ( rc )
+{
+case HVMCOPY_gfn_paged_out:
+case HVMCOPY_gfn_shared:
+return X86EMUL_RETRY;
+case HVMCOPY_okay:
+return X86EMUL_OKAY;
+}
+
+gdprintk(XENLOG_WARNING,
+ "Failed REP STOS: gpa=%"PRIpaddr" reps=%lu 
bytes_per_rep=%u\n",
+ gpa, *reps, bytes_per_rep);
+/* fall through */
+case p2m_mmio_direct:
+return X86EMUL_UNHANDLEABLE;
+
+case p2m_mmio_dm:
+return hvmemul_do_mmio(gpa, reps, bytes_per_rep, 0, IOREQ_WRITE, df,
+   p_data);
+}
+}
+
 static int hvmemul_read_segment(
 enum x86_segment seg,
 struct segment_register *reg,
@@ -1239,6 +1358,7 @@ static const struct x86_emulate_ops hvm_
 .rep_ins   = hvmemul_rep_ins,
 .rep_outs  = hvmemul_rep_outs,
 .rep_movs  = hvmemul_rep_movs,
+.rep_stos  = hvmemul_rep_stos,
 .read_segment  = hvmemul_read_segment,
 .write_segment = hvmemul_write_segment,
 .read_io   = hvmemul_read_io,
@@ -1264,6 +1384,7 @@ static const struct x86_emulate_ops hvm_
 .rep_ins   = hvmemul_rep_ins_discard,
 .rep_outs  = hvmemul_rep_outs_discard

Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Tian, Kevin
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, January 09, 2015 5:46 PM
> 
> >>> On 09.01.15 at 07:57,  wrote:
> > 1) 'fail' vs. 'warn' upon gfn confliction
> >
> > Assigning device which fails RMRR confliction check (i.e. intended gfns
> > already allocated for other resources) actually brings unknown stability
> > problem (device may clobber those valid resources) and potentially security
> > issue within VM (but not worse than what a malicious driver can do w/o
> > virtual IOMMU).
> >
> > So by default we should not move forward if gfn confliction is detected
> > when setting up RMRR identity mapping, so-called a 'fail' policy.
> >
> > One open though, is whether we want to allow admin to override default
> > 'fail' policy with a 'warn' policy, i.e. throwing out confliction detail but
> > succeeding device assignment. USB is discussed as one example before
> > (hack how it works today upon <1MB confliction), so it might be good to
> > allow enthusiast trying device assignment, or provide flexibility to users
> > who already verified predicted potential problem not a real issue to their
> > specific deployment.
> >
> > I'd like to hear your votes on whether to provide such 'warn' option.
> 
> Yes, I certainly see value in such an option, to have a means to
> circumvent the other possible case of perceived regressions in not
> being able to pass through certain devices anymore.
> 
> > 1.1) per-device 'warn' vs. global 'warn'
> >
> > Both Tim/Jan prefer to 'warn' as a per-device option to the admin instead
> > of a global option.
> >
> > In a glimpse a per-device 'warn' option provides more fine-grained control
> > than a global option, however if thinking it carefully allowing one device
> > w/
> > potential problem isn't more correct or secure than allowing multiple
> > devices w/ potential problem. Even in practice a device like USB can
> > work bearing <1MB confliction, like Jan pointed out there's always corner
> > cases which we might not know so as long as we open door for one device,
> > it implies a problematic environment to users and user's judge on whether
> > he can live up to this problem is not impacted by how many devices the door
> > is opened for (he anyway needs to study warning message and do
> verification
> > if choosing to live up)
> >
> > Regarding to that, imo if we agree to provide 'warn' option, just providing
> > a global overriding option (definitely per-vm) is acceptable and simpler.
> 
> If the admin determined that ignoring the RMRR requirements for one
> devices is safe, that doesn't (and shouldn't) mean this is the case for
> all other devices too.

I don't think admin can determine whether it's 100% safe. What admin can 
decide is whether he lives up to the potential problem based on his purpose
or based on some experiments. only device vendor knows when and how
RMRR is used. So as long as warn is opened for one device, I think it
already means a problem environment and then adding more device is
just same situation.

> 
> > 1.2) when to 'fail'
> >
> > There is one open whether we should fail immediately in domain builder
> > if a confliction is detected.
> >
> > Jan's comment is yes, we should 'fail' the VM creation as it's an error.
> >
> > My previous point is more mimicking native behavior, where a device
> > failure (in our case it's actually potential device failure since VM is not
> > powered yet) doesn't impact user until its function is actually touched.
> > In our case, even domain builder fails to re-arrange guest RAM to skip
> > reserved regions, we have centralized policy (either 'fail' or 'warn' per
> > above conclusion) in Xen hypervisor when the device is actually assigned.
> > so a 'warn' should be fine, but my insist on this is not strong.
> 
> See my earlier reply: Failure to add a device to me is more like a
> device preventing a bare metal system from coming up altogether.

not all devices are required for bare metal to boot. it causes problem
only when it's being used in the boot process. say at powering up the
disk (insert in the PCI slot) is broken (not sure whether you call such
thing as 'failure to add a device'), it is only error when BIOS tries to
read disk.

note device assignment path is the actual path to decide whether a
device will be present to the guest. not at this domain build time.

> 
> > and another point is about hotplug. 'fail' for future devices is too strict,
> > but to differentiate that from static-assigned devices, domain builder
> > will then need maintain a per-device reserved region structure. just
> > 'warn' makes things simple.
> 
> Whereas here I agree - hotplug should just fail (without otherwise
> impacting the guest).

so 'should' -> 'shoundn't'?

> 
> > 2) RMRR management
> >
> > George raised a good point that RMRR reserved regions can be maintained
> > in toolstack, and it's toolstack to tell Xen which regions to be reserved.
> > When
> > providing more flexibility, another benefit from Jan is to s

[Xen-devel] [libvirt test] 33280: regressions - FAIL

2015-01-09 Thread xen . org
flight 33280 libvirt real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/33280/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   5 libvirt-build fail REGR. vs. 32648
 build-i386-libvirt5 libvirt-build fail REGR. vs. 32648
 build-amd64-libvirt   5 libvirt-build fail REGR. vs. 32648

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  bb072e8a38d1d6d2f7819bbcb8ff106520081667
baseline version:
 libvirt  2360fe5d24175835d3f5fd1c7e8e6e13addab629


People who touched revisions under test:
  Alexander Burluka 
  Cedric Bosdonnat 
  Chunyan Liu 
  Cédric Bosdonnat 
  Daniel P. Berrange 
  Eric Blake 
  Geoff Hickey 
  Jim Fehlig 
  Jiri Denemark 
  Ján Tomko 
  Kiarie Kahurani 
  Luyao Huang 
  Michal Privoznik 
  Nehal J Wani 
  Pavel Hrdina 
  Peter Krempa 
  Stefan Berger 


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  fail
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt blocked
 test-armhf-armhf-libvirt blocked
 test-amd64-i386-libvirt  blocked



sg-report-flight on osstest.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 560 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [seabios test] 33272: trouble: blocked/broken/fail/pass

2015-01-09 Thread xen . org
flight 33272 seabios real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/33272/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i3863 host-install(3) broken REGR. vs. 32830

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-i386-rhel6hvm-intel  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl1 build-check(1)   blocked  n/a
 test-amd64-i386-qemut-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-rhel6hvm-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-xl-credit21 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-xl-qemut-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemut-winxpsp3  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-i386-xl-winxpsp3-vcpus1  1 build-check(1)   blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-win7-amd64  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-win7-amd64  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  1 build-check(1) blocked n/a
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  1 build-check(1) blocked n/a
 test-amd64-i386-xl-winxpsp3   1 build-check(1)   blocked  n/a

version targeted for testing:
 seabios  60e0e55f212dadd043ab9e39bee05a48013ddd8f
baseline version:
 seabios  2c9870f9f55d9c1ecddf50eb26b777f0cea06313


People who touched revisions under test:
  Kevin O'Connor 
  Paolo Bonzini 


jobs:
 build-amd64  pass
 build-i386   broken  
 build-amd64-libvirt  pass
 build-i386-libvirt   blocked 
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl  pass
 test-amd64-i386-xl   blocked 
 test-amd64-amd64-xl-pvh-amd  fail
 test-amd64-i386-rhel6hvm-amd blocked 
 test-amd64-i386-qemut-rhel6hvm-amd   blocked 
 test-amd64-i386-qemuu-rhel6hvm-amd   blocked 
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 blocked 
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 blocked 
 test-amd64-i386-freebsd10-amd64  blocked 
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  blocked 
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemu

Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 11:10,  wrote:
>>  From: Jan Beulich [mailto:jbeul...@suse.com]
>> Boot time device assignment is different: The question isn't whether
>> an assigned device works, instead the proper analogy is whether a
>> device is _present_. If a device doesn't work on bare metal, it will
>> still be discoverable. Yet if device assignment fails, that's not going
>> to be the case - for security reasons, the guest would not see any
>> notion of the device.
> 
> the question is whether we want such device assignment fail due to
> RMRR confliction, and the fail decision should be when Xen handles
> actual assignment instead of when domain builder prepares reserved
> regions.

Detecting the failure only in the hypervisor has the downside of
potentially leaving the user with little clues as to what went wrong.
Sending messages to the hypervisor log in that case is
questionable, yet the tool stack (namely libxc) is known to not
always do a good job in error propagation.

>> The question isn't about migrating with devices assigned, but about
>> assigning devices after migration (consider a dual vif + SR-IOV NIC
>> guest setup where the SR-IOV NIC gets hot-removed before
>> migration and a new one hot-plugged afterwards).
>> 
>> Furthermore any tying of the guest memory layout to the host's
>> where the guest first boots is awkward, as post-migration there's
>> not going to be any reliable correlation between the guest layout
>> and the new host's.
> 
> how can you solve this? like above example, a NIC on node-A leaves
> a reserved region in guest e820. now it's hot-removed and then
> migrated to node-b. there's no way to update e820 again since it's
> only boot structure. then user will still see such awkward regions.
> since it's not avoidable, report-all in the summary mail looks not
> causing a new problem.

The solution to this are reserved regions specified in the guest config,
independent of host characteristics.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 11:26,  wrote:
>>  From: Jan Beulich [mailto:jbeul...@suse.com]
>> >>> On 09.01.15 at 07:57,  wrote:
>> > 1.1) per-device 'warn' vs. global 'warn'
>> >
>> > Both Tim/Jan prefer to 'warn' as a per-device option to the admin instead
>> > of a global option.
>> >
>> > In a glimpse a per-device 'warn' option provides more fine-grained control
>> > than a global option, however if thinking it carefully allowing one device
>> > w/
>> > potential problem isn't more correct or secure than allowing multiple
>> > devices w/ potential problem. Even in practice a device like USB can
>> > work bearing <1MB confliction, like Jan pointed out there's always corner
>> > cases which we might not know so as long as we open door for one device,
>> > it implies a problematic environment to users and user's judge on whether
>> > he can live up to this problem is not impacted by how many devices the door
>> > is opened for (he anyway needs to study warning message and do
>> verification
>> > if choosing to live up)
>> >
>> > Regarding to that, imo if we agree to provide 'warn' option, just providing
>> > a global overriding option (definitely per-vm) is acceptable and simpler.
>> 
>> If the admin determined that ignoring the RMRR requirements for one
>> devices is safe, that doesn't (and shouldn't) mean this is the case for
>> all other devices too.
> 
> I don't think admin can determine whether it's 100% safe. What admin can 
> decide is whether he lives up to the potential problem based on his purpose
> or based on some experiments. only device vendor knows when and how
> RMRR is used. So as long as warn is opened for one device, I think it
> already means a problem environment and then adding more device is
> just same situation.

What if the admin consulted the device and BIOS vendors, and got
assured there's not going to be any accesses to the reserved regions
post-boot?

>> > 1.2) when to 'fail'
>> >
>> > There is one open whether we should fail immediately in domain builder
>> > if a confliction is detected.
>> >
>> > Jan's comment is yes, we should 'fail' the VM creation as it's an error.
>> >
>> > My previous point is more mimicking native behavior, where a device
>> > failure (in our case it's actually potential device failure since VM is not
>> > powered yet) doesn't impact user until its function is actually touched.
>> > In our case, even domain builder fails to re-arrange guest RAM to skip
>> > reserved regions, we have centralized policy (either 'fail' or 'warn' per
>> > above conclusion) in Xen hypervisor when the device is actually assigned.
>> > so a 'warn' should be fine, but my insist on this is not strong.
>> 
>> See my earlier reply: Failure to add a device to me is more like a
>> device preventing a bare metal system from coming up altogether.
> 
> not all devices are required for bare metal to boot. it causes problem
> only when it's being used in the boot process. say at powering up the
> disk (insert in the PCI slot) is broken (not sure whether you call such
> thing as 'failure to add a device'), it is only error when BIOS tries to
> read disk.

Not necessarily. Any malfunctioning device touched by the BIOS,
irrespective of whether the device is needed for booting, can cause
the boot process to hang. Again, the analogy to bare metal is
device presence, not whether the device is functioning properly.

> note device assignment path is the actual path to decide whether a
> device will be present to the guest. not at this domain build time.

That would only make a marginal difference in time of when domain
creation fails.

>> > and another point is about hotplug. 'fail' for future devices is too 
> strict,
>> > but to differentiate that from static-assigned devices, domain builder
>> > will then need maintain a per-device reserved region structure. just
>> > 'warn' makes things simple.
>> 
>> Whereas here I agree - hotplug should just fail (without otherwise
>> impacting the guest).
> 
> so 'should' -> 'shoundn't'?

No. Perhaps what you imply from fail is different from my reading:
I mean this to be the result of the hotplug operation - the device
would just not appear in the guest. The guest isn't to be brought
down because of such failure (i.e. behavior here is different from
the boot time assignment, where the guest would be prevented
from coming up).

>> > second, I'm not sure to what level users care about those reserved regions.
>> > At most it's same layout as physical so even sensitive users won't see it 
>> > as
>> > a UFO. :-) and e820 is platform attributes so user shouldn't set assumption
>> > on it.
>> 
>> Just consider the case where, in order to accommodate the reserved
>> regions, low memory needs to be reduced from the default of over
>> 3Gb to say 1Gb. If the guest OS then is incapable of using memory
>> above 4Gb (say Linux with HIGHMEM=n), there is a significant
>> difference to be seen by the user.
> 
> that makes some sense... but if yes it's also a limitation to you

[Xen-devel] [PATCH] MAINTAINERS: Andrew Cooper to co-maintain x86

2015-01-09 Thread Jan Beulich
Signed-off-by: Jan Beulich 

--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -341,6 +341,7 @@
 X86 ARCHITECTURE
 M: Keir Fraser 
 M: Jan Beulich 
+M: Andrew Cooper 
 S: Supported
 L: xen-devel@lists.xen.org 
 F: xen/arch/x86/




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/3] x86: also allow REP STOS emulation acceleration

2015-01-09 Thread Andrew Cooper
On 09/01/15 10:27, Jan Beulich wrote:
> While the REP MOVS acceleration appears to have helped qemu-traditional
> based guests, qemu-upstream (or really the respective video BIOSes)
> doesn't appear to benefit from that. Instead the acceleration added
> here provides a visible performance improvement during very early HVM
> guest boot.
>
> Signed-off-by: Jan Beulich 
> ---
> v2: fix asm() constraints in hvmemul_rep_stos(), as pointed out by
> Andrew. Add output operand telling the compiler that "buf" is being
> written.

Is writing buf wise? it looks like you will xfree() a wild pointer, and
appears to interfere the "buf != p_data" logic.

~Andrew

>
> --- unstable.orig/xen/arch/x86/hvm/emulate.c  2015-01-09 11:19:01.0 
> +0100
> +++ unstable/xen/arch/x86/hvm/emulate.c   2015-01-09 11:19:36.0 
> +0100
> @@ -731,6 +731,17 @@ static int hvmemul_rep_movs_discard(
>  return X86EMUL_OKAY;
>  }
>  
> +static int hvmemul_rep_stos_discard(
> +void *p_data,
> +enum x86_segment seg,
> +unsigned long offset,
> +unsigned int bytes_per_rep,
> +unsigned long *reps,
> +struct x86_emulate_ctxt *ctxt)
> +{
> +return X86EMUL_OKAY;
> +}
> +
>  static int hvmemul_rep_outs_discard(
>  enum x86_segment src_seg,
>  unsigned long src_offset,
> @@ -982,6 +993,114 @@ static int hvmemul_rep_movs(
>  return X86EMUL_OKAY;
>  }
>  
> +static int hvmemul_rep_stos(
> +void *p_data,
> +enum x86_segment seg,
> +unsigned long offset,
> +unsigned int bytes_per_rep,
> +unsigned long *reps,
> +struct x86_emulate_ctxt *ctxt)
> +{
> +struct hvm_emulate_ctxt *hvmemul_ctxt =
> +container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
> +unsigned long addr;
> +paddr_t gpa;
> +p2m_type_t p2mt;
> +bool_t df = !!(ctxt->regs->eflags & X86_EFLAGS_DF);
> +int rc = hvmemul_virtual_to_linear(seg, offset, bytes_per_rep, reps,
> +   hvm_access_write, hvmemul_ctxt, 
> &addr);
> +
> +if ( rc == X86EMUL_OKAY )
> +{
> +uint32_t pfec = PFEC_page_present | PFEC_write_access;
> +
> +if ( hvmemul_ctxt->seg_reg[x86_seg_ss].attr.fields.dpl == 3 )
> +pfec |= PFEC_user_mode;
> +
> +rc = hvmemul_linear_to_phys(
> +addr, &gpa, bytes_per_rep, reps, pfec, hvmemul_ctxt);
> +}
> +if ( rc != X86EMUL_OKAY )
> +return rc;
> +
> +/* Check for MMIO op */
> +(void)get_gfn_query_unlocked(current->domain, gpa >> PAGE_SHIFT, &p2mt);
> +
> +switch ( p2mt )
> +{
> +unsigned long bytes;
> +void *buf;
> +
> +default:
> +/* Allocate temporary buffer. */
> +for ( ; ; )
> +{
> +bytes = *reps * bytes_per_rep;
> +buf = xmalloc_bytes(bytes);
> +if ( buf || *reps <= 1 )
> +break;
> +*reps >>= 1;
> +}
> +
> +if ( !buf )
> +buf = p_data;
> +else
> +switch ( bytes_per_rep )
> +{
> +unsigned long dummy;
> +
> +#define CASE(bits, suffix) \
> +case (bits) / 8:   \
> +asm ( "rep stos" #suffix   \
> +  : "=m" (*(char (*)[bytes])buf),  \
> +"=D" (dummy), "=c" (dummy) \
> +  : "a" (*(const uint##bits##_t *)p_data), \
> + "1" (buf), "2" (*reps)\
> +  : "memory" );\
> +break
> +CASE(8, b);
> +CASE(16, w);
> +CASE(32, l);
> +CASE(64, q);
> +#undef CASE
> +
> +default:
> +xfree(buf);
> +ASSERT(!buf);
> +return X86EMUL_UNHANDLEABLE;
> +}
> +
> +/* Adjust address for reverse store. */
> +if ( df )
> +gpa -= bytes - bytes_per_rep;
> +
> +rc = hvm_copy_to_guest_phys(gpa, buf, bytes);
> +
> +if ( buf != p_data )
> +xfree(buf);
> +
> +switch ( rc )
> +{
> +case HVMCOPY_gfn_paged_out:
> +case HVMCOPY_gfn_shared:
> +return X86EMUL_RETRY;
> +case HVMCOPY_okay:
> +return X86EMUL_OKAY;
> +}
> +
> +gdprintk(XENLOG_WARNING,
> + "Failed REP STOS: gpa=%"PRIpaddr" reps=%lu 
> bytes_per_rep=%u\n",
> + gpa, *reps, bytes_per_rep);
> +/* fall through */
> +case p2m_mmio_direct:
> +return X86EMUL_UNHANDLEABLE;
> +
> +case p2m_mmio_dm:
> +return hvmemul_do_mmio(gpa, reps, bytes_per_rep, 0, IOREQ_WRITE, df,
> +   p_data);
> +}
> +}
> +
>  static int hvmemul_read_segment(
>  enum x86_segment seg,
>  struct segment_register *reg,
> @@ -1239,6 +1358,7 @@ static 

Re: [Xen-devel] [PATCH] MAINTAINERS: Andrew Cooper to co-maintain x86

2015-01-09 Thread Andrew Cooper
On 09/01/15 10:55, Jan Beulich wrote:
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 

(or Reviewed-by perhaps? I am not certain of the protocol here)

>
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -341,6 +341,7 @@
>  X86 ARCHITECTURE
>  M:   Keir Fraser 
>  M:   Jan Beulich 
> +M:   Andrew Cooper 
>  S:   Supported
>  L:   xen-devel@lists.xen.org 
>  F:   xen/arch/x86/
>
>
>


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/3] x86: also allow REP STOS emulation acceleration

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 11:56,  wrote:
> On 09/01/15 10:27, Jan Beulich wrote:
>> While the REP MOVS acceleration appears to have helped qemu-traditional
>> based guests, qemu-upstream (or really the respective video BIOSes)
>> doesn't appear to benefit from that. Instead the acceleration added
>> here provides a visible performance improvement during very early HVM
>> guest boot.
>>
>> Signed-off-by: Jan Beulich 
>> ---
>> v2: fix asm() constraints in hvmemul_rep_stos(), as pointed out by
>> Andrew. Add output operand telling the compiler that "buf" is being
>> written.
> 
> Is writing buf wise? it looks like you will xfree() a wild pointer, and
> appears to interfere the "buf != p_data" logic.

Perhaps the textual description was a little ambiguous, but the code
is not: It's not the variable "buf" that gets written, but buf[] in its
entirety.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] MAINTAINERS: Andrew Cooper to co-maintain x86

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 12:01,  wrote:
> On 09/01/15 10:55, Jan Beulich wrote:
>> Signed-off-by: Jan Beulich 
> 
> Acked-by: Andrew Cooper 
> 
> (or Reviewed-by perhaps? I am not certain of the protocol here)

I'd call it Accepted-by in this case, and would be expecting a formal
ack from Keir before committing.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/3] x86: also allow REP STOS emulation acceleration

2015-01-09 Thread Tim Deegan
At 10:56 + on 09 Jan (1420797418), Andrew Cooper wrote:
> On 09/01/15 10:27, Jan Beulich wrote:
> > While the REP MOVS acceleration appears to have helped qemu-traditional
> > based guests, qemu-upstream (or really the respective video BIOSes)
> > doesn't appear to benefit from that. Instead the acceleration added
> > here provides a visible performance improvement during very early HVM
> > guest boot.
> >
> > Signed-off-by: Jan Beulich 
> > ---
> > v2: fix asm() constraints in hvmemul_rep_stos(), as pointed out by
> > Andrew. Add output operand telling the compiler that "buf" is being
> > written.
> 
> Is writing buf wise? it looks like you will xfree() a wild pointer, and
> appears to interfere the "buf != p_data" logic.

I think the constraints are correct, though the 'memory' clobber makes
the rest of it moot.  But this:

> > +default:
> > +xfree(buf);
> > +ASSERT(!buf);

looks dodgy...

Tim.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] x86/HVM: make hvm_efer_valid() honor guest features

2015-01-09 Thread Jan Beulich
>>> On 08.01.15 at 19:49,  wrote:
> On 08/01/15 15:23, Jan Beulich wrote:
>> Following the earlier similar change validating CR4 modifications.
>>
>> Signed-off-by: Jan Beulich 
>>
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -1672,20 +1672,53 @@ static int hvm_save_cpu_ctxt(struct doma
>>  return 0;
>>  }
>>  
>> -static bool_t hvm_efer_valid(struct domain *d,
>> - uint64_t value, uint64_t efer_validbits)
>> +static bool_t hvm_efer_valid(const struct vcpu *v, uint64_t value,
>> + bool_t restore)
>>  {
>> -if ( nestedhvm_enabled(d) && cpu_has_svm )
>> -efer_validbits |= EFER_SVME;
>> +unsigned int ext1_ecx = 0, ext1_edx = 0;
>>  
>> -return !((value & ~efer_validbits) ||
>> - ((sizeof(long) != 8) && (value & EFER_LME)) ||
>> - (!cpu_has_svm && (value & EFER_SVME)) ||
>> - (!cpu_has_nx && (value & EFER_NX)) ||
>> - (!cpu_has_syscall && (value & EFER_SCE)) ||
>> - (!cpu_has_lmsl && (value & EFER_LMSLE)) ||
>> - (!cpu_has_ffxsr && (value & EFER_FFXSE)) ||
>> - ((value & (EFER_LME|EFER_LMA)) == EFER_LMA));
>> +if ( !restore && !is_hardware_domain(v->domain) )
>> +{
>> +unsigned int level;
>> +
>> +ASSERT(v == current);
>> +hvm_cpuid(0x8000, &level, NULL, NULL, NULL);
>> +if ( level >= 0x8001 )
>> +hvm_cpuid(0x8001, NULL, NULL, &ext1_ecx, &ext1_edx);
>> +}
>> +else
>> +{
>> +ext1_edx = boot_cpu_data.x86_capability[X86_FEATURE_LM / 32];
>> +ext1_ecx = boot_cpu_data.x86_capability[X86_FEATURE_SVM / 32];
>> +}
>> +
>> +if ( (value & EFER_SCE) &&
>> + !(ext1_edx & cpufeat_mask(X86_FEATURE_SYSCALL)) )
>> +return 0;
>> +
>> +if ( (value & (EFER_LME | EFER_LMA)) &&
>> + !(ext1_edx & cpufeat_mask(X86_FEATURE_LM)) )
>> +return 0;
>> +
>> +if ( (value & EFER_LMA) && !(value & EFER_LME) )
>> +return 0;
> 
> The LME/LMA handling more complicated than this.
> 
> LMA is read-only on Intel, but specified as read-write on AMD, with the
> requirement that if it doesn't match the value generated by hardware, a
> #GP fault will occur.  I believe this actually means it is read-only on
> AMD as well.
> 
> LMA only gets set by hardware after paging is enabled and the processor
> switches properly into long mode, which means that there is a window
> between setting LME and setting CR0.PG where LMA should read as 0.

Did you note that hvm_set_efer() clears EFER_LMA before calling
hvm_efer_valid(), thus avoiding all those issues?

> I think hvm_efer_valid() also needs the current EFER and CR0 to work out
> what the current LMA should be, and reject any attempt to change it.

I.e. this would be needed only for the restore path (and it not being
done hasn't caused problems for me because guests don't normally
get migrated before they fully enabled long mode).

>> +if ( (value & EFER_NX) && !(ext1_edx & cpufeat_mask(X86_FEATURE_NX)) )
>> +return 0;
>> +
>> +if ( (value & EFER_SVME) &&
>> + (!(ext1_ecx & cpufeat_mask(X86_FEATURE_SVM)) ||
>> +  !nestedhvm_enabled(v->domain)) )
> 
> This is going to cause an issue for the restore case, as the HVM PARAMs
> follow the architectural state.
> 
> I don't believe it is reasonable for nestedhvm_enabled() to disagree
> with cpufeat_mask(X86_FEATURE_{SVM,VMX}), or for the toolstack to be
> able to yank nestedhvm while the VM is active.

I effectively only translated the pre-existing (and kind of suspicious
to me) nestedhvm_enabled() use here.

> Looking at the checks when setting HVM_PARAM_NESTEDHVM, neither of the
> guest feature flags are actually checked before setting up the nested
> infrastructure.

Which would be a separate issue.

> Having said all of this, don't appear to be any migration records
> associated with nested state (hardware-loaded VMCS/VMCB pointers,
> currently nested?) which leads me to suspect that a domain actually
> using nested virt is not going to survive a migration, so it might be
> acceptable to fudge the checking of SVME for now.

So am I getting you right that while these are all valid observations,
you don't really ask for a change to the patch in this regard (i.e.
taking care of above EFER_LMA issue is all that's needed, and even
that only for the unusual case of a guest getting migrated in the
middle of it enabling long mode on one of its CPUs)?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/3] x86: also allow REP STOS emulation acceleration

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 12:18,  wrote:
> At 10:56 + on 09 Jan (1420797418), Andrew Cooper wrote:
>> On 09/01/15 10:27, Jan Beulich wrote:
>> > While the REP MOVS acceleration appears to have helped qemu-traditional
>> > based guests, qemu-upstream (or really the respective video BIOSes)
>> > doesn't appear to benefit from that. Instead the acceleration added
>> > here provides a visible performance improvement during very early HVM
>> > guest boot.
>> >
>> > Signed-off-by: Jan Beulich 
>> > ---
>> > v2: fix asm() constraints in hvmemul_rep_stos(), as pointed out by
>> > Andrew. Add output operand telling the compiler that "buf" is being
>> > written.
>> 
>> Is writing buf wise? it looks like you will xfree() a wild pointer, and
>> appears to interfere the "buf != p_data" logic.
> 
> I think the constraints are correct, though the 'memory' clobber makes
> the rest of it moot.  But this:

Ah, yes, the memory clobber could be dropped now that the "=m"
is there.

>> > +default:
>> > +xfree(buf);
>> > +ASSERT(!buf);
> 
> looks dodgy...

In which way? The "default" is supposed to be unreachable, and sits
in the else branch to an if(!buf), i.e. in a release build we'll correctly
free the buffer, while in a debug build the ASSERT() will trigger.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] treewide: Convert clockevents_notify to use int cpu

2015-01-09 Thread Thierry Reding
On Wed, Dec 10, 2014 at 03:28:53PM -0800, Joe Perches wrote:
[...]
>  arch/arm/mach-tegra/cpuidle-tegra114.c |  4 ++--
>  arch/arm/mach-tegra/cpuidle-tegra20.c  |  8 
>  arch/arm/mach-tegra/cpuidle-tegra30.c  |  8 
[...]

Acked-by: Thierry Reding 


pgpsUlJQpYjp_.pgp
Description: PGP signature
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/3] x86: also allow REP STOS emulation acceleration

2015-01-09 Thread Tim Deegan
At 11:24 + on 09 Jan (1420799087), Jan Beulich wrote:
> >>> On 09.01.15 at 12:18,  wrote:
> >> > +default:
> >> > +xfree(buf);
> >> > +ASSERT(!buf);
> > 
> > looks dodgy...
> 
> In which way? The "default" is supposed to be unreachable, and sits
> in the else branch to an if(!buf), i.e. in a release build we'll correctly
> free the buffer, while in a debug build the ASSERT() will trigger.

Oh I see.  Can you please use ASSERT(0) for that?

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Andrew Cooper
On 09/01/15 00:53, Tian, Kevin wrote:
>> From: Tim Deegan [mailto:t...@xen.org]
>> Sent: Thursday, January 08, 2015 8:32 PM
>>
>> Hi Kevin,
>>
>> Thanks for sending out this design document.  I think Jan will have
>> the most to say about this.  Looking just at the hypervisor side of
>> things, and leaving the tools desig to others...
>>
>> At 11:23 + on 26 Dec (1419589382), Tian, Kevin wrote:
>>> c) in Xen hypervisor it is reasonable to fail upon confliction, where
>>> device is actually assigned. But due to the same requirement on USB
>>> controller, sometimes we might want it succeed just w/ warnings.
>> Can you explain more concretely why we would want to allow assignment
>> whn the RMRR setup fails?  It seems like the device's use of the RMRR
>> will (at least) corrupt the OS and (possibly) make the device itself
>> fail.
> For USB device, RMRR is used only in early-boot phase, as a way for
> communication between legacy keyboard driver and bios. That emulation
> mode will be disabled either at some ACPI initialization step or when
> USB keyboard driver is loader (sorry don't remember detail). So when
> it's assigned to a guest, there's no legacy emulation and not setting up
> identity mapping is not a critical issue. If we think such usage case is
> a valid one, 'fail' would cause regression on that.

XenServer has observed OEMs using RMRRs for stats reporting back to the BMC.

It is not safe to assume that a USB device with an RMRR would only ever
be using it for legacy keyboard emulation.  It is also not safe to
assume that legacy keyboard emulation might not resume at some point in
the future, although I would hope that it would not.

Therefore, I feel that USB controllers should not be permitted a special
case compared to other devices.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/3] x86: also allow REP STOS emulation acceleration

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 12:45,  wrote:
> At 11:24 + on 09 Jan (1420799087), Jan Beulich wrote:
>> >>> On 09.01.15 at 12:18,  wrote:
>> >> > +default:
>> >> > +xfree(buf);
>> >> > +ASSERT(!buf);
>> > 
>> > looks dodgy...
>> 
>> In which way? The "default" is supposed to be unreachable, and sits
>> in the else branch to an if(!buf), i.e. in a release build we'll correctly
>> free the buffer, while in a debug build the ASSERT() will trigger.
> 
> Oh I see.  Can you please use ASSERT(0) for that?

I sincerely dislike ASSERT(0), but if that's the only way to get
the patch accepted...

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3] libxl: Spice streaming video setting support for upstream qemu

2015-01-09 Thread Fabio Fantoni
Usage:
spice_streaming_video=[filter|all|off]

Specifies what streaming video setting is to be used by spice (if
given),
otherwise the qemu default will be used.

Signed-off-by: Fabio Fantoni 

---

Changes in v3:
- fixed a mistake in libxl_dm.c

Changes in v2:
- refresh
---
 docs/man/xl.cfg.pod.5   |  5 +
 tools/libxl/libxl.h | 11 +++
 tools/libxl/libxl_dm.c  |  3 +++
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c|  2 ++
 5 files changed, 22 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 07797b1..2725eab 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -1426,6 +1426,11 @@ an usb2 controller. The default is disabled (0).
 Specifies what image compression is to be used by spice (if given), otherwise
 the qemu default will be used.
 
+=item B
+
+Specifies what streaming video setting is to be used by spice (if given),
+otherwise the qemu default will be used.
+
 =back
 
 =head3 Miscellaneous Emulated Hardware
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index b8e0b67..c219f59 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -539,6 +539,17 @@ typedef struct libxl__ctx libxl_ctx;
 #define LIBXL_HAVE_SPICE_IMAGECOMPRESSION 1
 
 /*
+ * LIBXL_HAVE_SPICE_STREAMINGVIDEO
+ *
+ * If defined, then the libxl_spice_info structure will contain a string type
+ * field: streaming_video. This value defines what Spice streaming video 
setting
+ * is used.
+ *
+ * If this is not defined, the Spice streaming video setting support is 
ignored.
+ */
+#define LIBXL_HAVE_SPICE_STREAMINGVIDEO 1
+
+/*
  * LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1
  *
  * If this is defined, libxl_domain_create_restore()'s API has changed to
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index e1188f4..e80b8d7 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -401,6 +401,9 @@ static char *dm_spice_options(libxl__gc *gc,
 if (spice->image_compression)
 opt = libxl__sprintf(gc, "%s,image-compression=%s", opt, 
spice->image_compression);
 
+if (spice->streaming_video)
+opt = libxl__sprintf(gc, "%s,streaming-video=%s", opt, 
spice->streaming_video);
+
 return opt;
 }
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 1dc4233..fbbac62 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -241,6 +241,7 @@ libxl_spice_info = Struct("spice_info", [
 ("clipboard_sharing", libxl_defbool),
 ("usbredirection", integer),
 ("image_compression", string),
+("streaming_video", string),
 ])
 
 libxl_sdl_info = Struct("sdl_info", [
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 8afac5a..bb22ae3 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1940,6 +1940,8 @@ skip_vfb:
 b_info->u.hvm.spice.usbredirection = l;
 xlu_cfg_replace_string (config, "spice_image_compression",
 &b_info->u.hvm.spice.image_compression, 0);
+xlu_cfg_replace_string (config, "spice_streaming_video",
+&b_info->u.hvm.spice.streaming_video, 0);
 xlu_cfg_get_defbool(config, "nographic", &b_info->u.hvm.nographic, 0);
 xlu_cfg_get_defbool(config, "gfx_passthru",
 &b_info->u.hvm.gfx_passthru, 0);
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/3] x86: also allow REP STOS emulation acceleration

2015-01-09 Thread Andrew Cooper
On 09/01/15 12:10, Jan Beulich wrote:
 On 09.01.15 at 12:45,  wrote:
>> At 11:24 + on 09 Jan (1420799087), Jan Beulich wrote:
>> On 09.01.15 at 12:18,  wrote:
>> +default:
>> +xfree(buf);
>> +ASSERT(!buf);
 looks dodgy...
>>> In which way? The "default" is supposed to be unreachable, and sits
>>> in the else branch to an if(!buf), i.e. in a release build we'll correctly
>>> free the buffer, while in a debug build the ASSERT() will trigger.
>> Oh I see.  Can you please use ASSERT(0) for that?
> I sincerely dislike ASSERT(0), but if that's the only way to get
> the patch accepted...
>
> Jan
>

Perhaps introducing ASSERT_UNREACHABLE() as an alternative which is more
obvious in nature than both ASSERT(!buf) and ASSERT(0) ?

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] xen: use correct type for physical addresses

2015-01-09 Thread David Vrabel
On 09/01/15 09:57, Jan Beulich wrote:
 On 08.01.15 at 18:01,  wrote:
>> --- a/arch/x86/xen/setup.c
>> +++ b/arch/x86/xen/setup.c
>> @@ -140,7 +140,7 @@ static void __init xen_del_extra_mem(u64 start, u64 size)
>>  unsigned long __ref xen_chk_extra_mem(unsigned long pfn)
>>  {
>>  int i;
>> -unsigned long addr = PFN_PHYS(pfn);
>> +u64 addr = PFN_PHYS(pfn);
> 
> Isn't phys_addr_t the type to use here?

Agreed.

>> @@ -284,7 +286,7 @@ static void __init xen_update_mem_tables(unsigned long 
>> pfn, unsigned long mfn)
>>  }
>>  
>>  /* Update kernel mapping, but not for highmem. */
>> -if ((pfn << PAGE_SHIFT) >= __pa(high_memory))
>> +if (PFN_PHYS(pfn) >= (u64)(__pa(high_memory)))
> 
> I don't think you really need the cast on the right side - __pa()
> should be returning a value of suitable type (and unsigned long
> would be sufficient for anything up to and including high_memory).

I'd prefer:

if (pfn >= PFN_DOWN(__pa(high_memory))

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] xen: use correct type for physical addresses

2015-01-09 Thread David Vrabel
On 08/01/15 17:01, Juergen Gross wrote:
> When converting a pfn to a physical address be sure to use 64 bit
> wide types.
> 
> Also avoid invalidating memory for zero sized non-aligned extra
> memory regions.

"Also" means this bit should be in another patch...

> Signed-off-by: Juergen Gross 
> Tested-by: Boris Ostrovsky 
> ---
>  arch/x86/xen/setup.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index feb6d86..8839d7b 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -160,6 +160,8 @@ void __init xen_inv_extra_mem(void)
>   int i;
>  
>   for (i = 0; i < XEN_EXTRA_MEM_MAX_REGIONS; i++) {
> + if (!xen_extra_mem[i].size)
> + continue;
>   pfn_s = PFN_DOWN(xen_extra_mem[i].start);
>   pfn_e = PFN_UP(xen_extra_mem[i].start + xen_extra_mem[i].size);
>   for (pfn = pfn_s; pfn < pfn_e; pfn++)

i.e., this hunk should be in a separate path.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] xen: use correct type for physical addresses

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 13:51,  wrote:
> On 09/01/15 09:57, Jan Beulich wrote:
> On 08.01.15 at 18:01,  wrote:
>>> @@ -284,7 +286,7 @@ static void __init xen_update_mem_tables(unsigned long 
>>> pfn, unsigned long mfn)
>>> }
>>>  
>>> /* Update kernel mapping, but not for highmem. */
>>> -   if ((pfn << PAGE_SHIFT) >= __pa(high_memory))
>>> +   if (PFN_PHYS(pfn) >= (u64)(__pa(high_memory)))
>> 
>> I don't think you really need the cast on the right side - __pa()
>> should be returning a value of suitable type (and unsigned long
>> would be sufficient for anything up to and including high_memory).
> 
> I'd prefer:
> 
> if (pfn >= PFN_DOWN(__pa(high_memory))

Even better indeed. Just one more point: Strictly speaking
__pa(high_memory) is invalid, as __pa() is defined for low
memory only. Hence perhaps PFN_UP(__pa(high_memory - 1))?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/2] gnttab: Introduce rwlock to protect updates to grant table state

2015-01-09 Thread Egger, Christoph
On 2014/12/18 12:37, Jan Beulich wrote:
 On 03.12.14 at 15:29,  wrote:
>> --- a/xen/common/grant_table.c
>> +++ b/xen/common/grant_table.c
[...]
>> @@ -944,6 +944,7 @@ __gnttab_unmap_common(
>>  }
>>  
>>  op->rd = rd;
>> +read_lock(&rgt->lock);
>>  act = &active_entry(rgt, op->map->ref);
>>  
>>  if ( op->frame == 0 )
> 
> The nesting of the two locks should be mentioned in the doc change.

Where do you see the nesting coming from?

Christoph



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen 4.5 Development Update (GA slip by a week), GA slip + 1 day

2015-01-09 Thread Lars Kurth
Note that I published the Acknowledgements and most 4.5 documentation is now in 
place. Bits which you may want to check and fix
* Spelling of your name if it contains special characters as the scripts that I 
use nuke them - http://wiki.xenproject.org/wiki/Xen_Project_4.5_Acknowledgements
* Some additions to 
http://wiki.xenproject.org/wiki/Xen_Project_Release_Features, which I will do

And we do need to populate the Known Issues section of 
http://wiki.xenproject.org/wiki/Xen_Project_4.5_Release_Notes

Regards
Lars

On 8 Jan 2015, at 19:30, Konrad Rzeszutek Wilk  wrote:

 = Timeline =
 
 Xen 4.5 is a 10 month release. The dates are:
 
 * Feature Freeze: 24th September 2014
 * First RC: 24th October [Friday!]
 * RC2: Nov 11th
 * RC2 Test-day: Nov 13th
 * RC3: Dec 3rd.
 * RC3 Test-day: Dec 4th
 * RC4: Dec 15th
 * RC4 Test-day: Dec 17th
 
 < WE ARE HERE ===>
 
 Release Date: Jan 14th.
> 
> New date: Jan 15th.
> 
> Due to marketing reasons (LinuxFoundation has something
> scheduled on Jan 14th and we don't want to overlap).
> 
> That should not change anything on the technical side as the
> last commit went in yesterday. Just a bit longer..
> 
 
 
 ___
 Xen-devel mailing list
 Xen-devel@lists.xen.org
 http://lists.xen.org/xen-devel
>>> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] libxl: Don't ignore error when we fail to give access to ioport/irq

2015-01-09 Thread Julien Grall
If we fail to give the access, the domain will unlikely work correctly.
So we should bail out at the first error.

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Stefano Stabellini 
Cc: Ian Campbell 
Cc: Wei Liu 

---
This patch is candidate for a backport for at least Xen 4.4 and Xen 4.5.
---
 tools/libxl/libxl_create.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1198225..09d481a 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1167,6 +1167,7 @@ static void domcreate_launch_dm(libxl__egc *egc, 
libxl__multidev *multidev,
  "failed give dom%d access to ioports %"PRIx32"-%"PRIx32,
  domid, io->first, io->first + io->number - 1);
 ret = ERROR_FAIL;
+goto error_out;
 }
 }
 
@@ -1182,6 +1183,7 @@ static void domcreate_launch_dm(libxl__egc *egc, 
libxl__multidev *multidev,
 if (ret < 0) {
 LOGE(ERROR, "failed give dom%d access to irq %d", domid, irq);
 ret = ERROR_FAIL;
+goto error_out;
 }
 }
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/2] gnttab: Introduce rwlock to protect updates to grant table state

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 14:05,  wrote:
> On 2014/12/18 12:37, Jan Beulich wrote:
> On 03.12.14 at 15:29,  wrote:
>>> --- a/xen/common/grant_table.c
>>> +++ b/xen/common/grant_table.c
> [...]
>>> @@ -944,6 +944,7 @@ __gnttab_unmap_common(
>>>  }
>>>  
>>>  op->rd = rd;
>>> +read_lock(&rgt->lock);
>>>  act = &active_entry(rgt, op->map->ref);
>>>  
>>>  if ( op->frame == 0 )
>> 
>> The nesting of the two locks should be mentioned in the doc change.
> 
> Where do you see the nesting coming from?

Reproducing the subsequent hunk I had commented on

> @@ -1004,6 +1005,7 @@ __gnttab_unmap_common(
>  
>   unmap_out:
>  double_gt_unlock(lgt, rgt);
> +read_unlock(&rgt->lock);

there obviously are two (or really three) locks in simultaneous use
here.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] libxl: Don't ignore error when we fail to give access to ioport/irq

2015-01-09 Thread Wei Liu
On Fri, Jan 09, 2015 at 01:54:22PM +, Julien Grall wrote:
> If we fail to give the access, the domain will unlikely work correctly.
> So we should bail out at the first error.
> 
> Signed-off-by: Julien Grall 
> Cc: Ian Jackson 
> Cc: Stefano Stabellini 
> Cc: Ian Campbell 
> Cc: Wei Liu 
> 
> ---
> This patch is candidate for a backport for at least Xen 4.4 and Xen 4.5.

The handling of io mem has same issue. You may also want to fix that?

Also do you need to clean up (revoke permission) the ports, irqs and io
mems that have already been assigned? I presume that's done when domain
is destroyed?

Wei.

> ---
>  tools/libxl/libxl_create.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
> index 1198225..09d481a 100644
> --- a/tools/libxl/libxl_create.c
> +++ b/tools/libxl/libxl_create.c
> @@ -1167,6 +1167,7 @@ static void domcreate_launch_dm(libxl__egc *egc, 
> libxl__multidev *multidev,
>   "failed give dom%d access to ioports %"PRIx32"-%"PRIx32,
>   domid, io->first, io->first + io->number - 1);
>  ret = ERROR_FAIL;
> +goto error_out;
>  }
>  }
>  
> @@ -1182,6 +1183,7 @@ static void domcreate_launch_dm(libxl__egc *egc, 
> libxl__multidev *multidev,
>  if (ret < 0) {
>  LOGE(ERROR, "failed give dom%d access to irq %d", domid, irq);
>  ret = ERROR_FAIL;
> +goto error_out;
>  }
>  }
>  
> -- 
> 2.1.4

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen 4.5 Development Update (GA slip by a week), GA slip + 1 day

2015-01-09 Thread Julien Grall
Sorry, I forgot to CC xen-devel.

On 9 January 2015 at 14:19, Julien Grall  wrote:
> (Dropping most of the CC)
>
> Hi Lars,
>
> On 09/01/15 13:51, Lars Kurth wrote:
>> Note that I published the Acknowledgements and most 4.5 documentation is now 
>> in place. Bits which you may want to check and fix
>> * Spelling of your name if it contains special characters as the scripts 
>> that I use nuke them - 
>> http://wiki.xenproject.org/wiki/Xen_Project_4.5_Acknowledgements
>> * Some additions to 
>> http://wiki.xenproject.org/wiki/Xen_Project_Release_Features, which I will do
>
> Some of the release features section gives the impression that we
> support them in ARM:
> * Advanced Memory Management:
> - Memory sharing is not supported on ARM.
> - I'm not sure about Memory paging
> * Interoperability / Hardware Support:
> - None of them is supported on ARM
> * High Availability and Fault tolerance:
> - Live Migration is not supported.
> - vMCE is x86 specific
> * Security:
> - Device Model stub domains is only x86
> - MemAccess is not yet supported on ARM. It has been deferred 
> for Xen 4.6
>
> * Tooling:
> - Gdbsx is not supported
> - vPMU is not supported
> - xentrace is not supported
>
>
>
> --
> Julien Grall



-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] libxl: Don't ignore error when we fail to give access to ioport/irq

2015-01-09 Thread Julien Grall
Hi Wei,

On 09/01/15 14:15, Wei Liu wrote:
> On Fri, Jan 09, 2015 at 01:54:22PM +, Julien Grall wrote:
>> If we fail to give the access, the domain will unlikely work correctly.
>> So we should bail out at the first error.
>>
>> Signed-off-by: Julien Grall 
>> Cc: Ian Jackson 
>> Cc: Stefano Stabellini 
>> Cc: Ian Campbell 
>> Cc: Wei Liu 
>>
>> ---
>> This patch is candidate for a backport for at least Xen 4.4 and Xen 4.5.
> 
> The handling of io mem has same issue. You may also want to fix that?

I forgot this one. I will fix in the next version.

> Also do you need to clean up (revoke permission) the ports, irqs and io
> mems that have already been assigned?

We don't need to clean up ioport/irqs/iomems. It's already done
implicitly when the domain is destroyed.

> I presume that's done when domain
> is destroyed?

If domain destroyed doesn't correctly revoke the permission that would
mean the code is buggy in the normal behavior.

Regards,


-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] libxl: provide xenlight.pc

2015-01-09 Thread Wei Liu
A pkg-config file for libxl. It also contains two variables
(xenfirmwaredir and libexec_bin) so that tools that are very keen on
knowing the locations of Xen binaries (say, libvirt) can use them to
determine the location of the binaries.

Please rerun autogen.sh after applying this patch.

Signed-off-by: Wei Liu 
Cc: Ian Campbell 
Cc: Ian Jackson 
---
 .gitignore|2 ++
 config/Paths.mk.in|2 +-
 m4/paths.m4   |3 +++
 tools/configure.ac|1 +
 tools/libxl/Makefile  |   11 ++-
 tools/libxl/xenlight.pc.in.in |   11 +++
 6 files changed, 28 insertions(+), 2 deletions(-)
 create mode 100644 tools/libxl/xenlight.pc.in.in

diff --git a/.gitignore b/.gitignore
index 8c8c06f..21f0ca6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -158,6 +158,8 @@ tools/include/xen/*
 tools/include/xen-foreign/*.(c|h|size)
 tools/include/xen-foreign/checker
 tools/libxl/libxlu_cfg_y.output
+tools/libxl/xenlight.pc
+tools/libxl/xenlight.pc.in
 tools/libxl/xl
 tools/libxl/testenum
 tools/libxl/testenum.c
diff --git a/config/Paths.mk.in b/config/Paths.mk.in
index fe10f76..150bae7 100644
--- a/config/Paths.mk.in
+++ b/config/Paths.mk.in
@@ -57,7 +57,7 @@ BASH_COMPLETION_DIR  := $(CONFIG_DIR)/bash_completion.d
 XEN_LOCK_DIR := @XEN_LOCK_DIR@
 XEN_PAGING_DIR   := @XEN_PAGING_DIR@
 
-XENFIRMWAREDIR   := $(LIBEXEC)/boot
+XENFIRMWAREDIR   := @XENFIRMWAREDIR@
 
 XEN_CONFIG_DIR   := @XEN_CONFIG_DIR@
 XEN_SCRIPT_DIR   := @XEN_SCRIPT_DIR@
diff --git a/m4/paths.m4 b/m4/paths.m4
index 7ede5bd..db74f55 100644
--- a/m4/paths.m4
+++ b/m4/paths.m4
@@ -77,6 +77,9 @@ dnl This variable will be substituted in various .in files
 LIBEXEC_BIN=`eval echo $libexecdir/$PACKAGE_TARNAME/bin`
 AC_SUBST(LIBEXEC_BIN)
 
+XENFIRMWAREDIR=`eval echo $libexecdir/$PACKAGE_TARNAME/boot`
+AC_SUBST(XENFIRMWAREDIR)
+
 XEN_RUN_DIR=$localstatedir/run/xen
 AC_SUBST(XEN_RUN_DIR)
 
diff --git a/tools/configure.ac b/tools/configure.ac
index 1ac63a3..741c36c 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -27,6 +27,7 @@ hotplug/Linux/xen-backend.rules
 hotplug/Linux/xen-hotplug-common.sh
 hotplug/Linux/xendomains
 hotplug/NetBSD/rc.d/xencommons
+libxl/xenlight.pc.in
 ])
 AC_CONFIG_HEADERS([config.h])
 AC_CONFIG_AUX_DIR([../])
diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index df08c8a..b6d5d22 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -143,13 +143,15 @@ $(XEN_INIT_DOM0_OBJS): CFLAGS += $(CFLAGS_libxenstore)
 SAVE_HELPER_OBJS = libxl_save_helper.o _libxl_save_msgs_helper.o
 $(SAVE_HELPER_OBJS): CFLAGS += $(CFLAGS_libxenctrl)
 
+PKG_CONFIG = xenlight.pc
+
 testidl.o: CFLAGS += $(CFLAGS_libxenctrl) $(CFLAGS_libxenlight)
 testidl.c: libxl_types.idl gentest.py libxl.h $(AUTOINCS)
$(PYTHON) gentest.py libxl_types.idl testidl.c.new
mv testidl.c.new testidl.c
 
 .PHONY: all
-all: $(CLIENTS) $(TEST_PROGS) \
+all: $(CLIENTS) $(TEST_PROGS) $(PKG_CONFIG) \
libxenlight.so libxenlight.a libxlutil.so libxlutil.a \
$(AUTOSRCS) $(AUTOINCS)
 
@@ -248,6 +250,10 @@ libxl-save-helper: $(SAVE_HELPER_OBJS) libxenlight.so
 testidl: testidl.o libxlutil.so libxenlight.so
$(CC) $(LDFLAGS) -o $@ testidl.o libxlutil.so $(LDLIBS_libxenlight) 
$(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xenlight.pc: xenlight.pc.in Makefile
+   @sed -e 's/@@version@@/$(MAJOR).$(MINOR)/g' < xenlight.pc.in > $@.new
+   @mv -f $@.new $@
+
 .PHONY: install
 install: all
$(INSTALL_DIR) $(DESTDIR)$(SBINDIR)
@@ -255,6 +261,7 @@ install: all
$(INSTALL_DIR) $(DESTDIR)$(INCLUDEDIR)
$(INSTALL_DIR) $(DESTDIR)$(BASH_COMPLETION_DIR)
$(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
+   $(INSTALL_DIR) $(DESTDIR)$(SHAREDIR)/pkgconfig
$(INSTALL_PROG) xl $(DESTDIR)$(SBINDIR)
$(INSTALL_PROG) xen-init-dom0 $(DESTDIR)$(LIBEXEC_BIN)
$(INSTALL_PROG) libxl-save-helper $(DESTDIR)$(LIBEXEC_BIN)
@@ -268,12 +275,14 @@ install: all
$(INSTALL_DATA) libxlutil.a $(DESTDIR)$(LIBDIR)
$(INSTALL_DATA) libxl.h libxl_event.h libxl_json.h _libxl_types.h 
_libxl_types_json.h _libxl_list.h libxl_utils.h libxl_uuid.h 
$(DESTDIR)$(INCLUDEDIR)
$(INSTALL_DATA) bash-completion $(DESTDIR)$(BASH_COMPLETION_DIR)/xl.sh
+   $(INSTALL_DATA) xenlight.pc $(DESTDIR)$(SHAREDIR)/pkgconfig/
 
 .PHONY: clean
 clean:
$(RM) -f _*.h *.o *.so* *.a $(CLIENTS) $(DEPS)
$(RM) -f _*.c *.pyc _paths.*.tmp _*.api-for-check
$(RM) -f testidl.c.new testidl.c *.api-ok
+   $(RM) -f xenlight.pc xenlight.pc.in
 
 distclean: clean
 
diff --git a/tools/libxl/xenlight.pc.in.in b/tools/libxl/xenlight.pc.in.in
new file mode 100644
index 000..c27872e
--- /dev/null
+++ b/tools/libxl/xenlight.pc.in.in
@@ -0,0 +1,11 @@
+prefix=@prefix@
+includedir=@includedir@
+libdir=@libdir@
+xenfirmwaredir=@XENFIRMWAREDIR@
+libexec_bin=@LIBEXEC_BIN@
+
+Name: Xenlight
+Descriptio

Re: [Xen-devel] [PATCH] MAINTAINERS: Andrew Cooper to co-maintain x86

2015-01-09 Thread Keir Fraser



Jan Beulich 
9 January 2015 10:55
Signed-off-by: Jan Beulich 


Acked-by: Keir Fraser 


--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -341,6 +341,7 @@
X86 ARCHITECTURE
M: Keir Fraser 
M: Jan Beulich 
+M: Andrew Cooper 
S: Supported
L: xen-devel@lists.xen.org
F: xen/arch/x86/



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH resend] libxl, hotplug/Linux: default to phy backend for raw format file, take 2

2015-01-09 Thread Wei Liu
On Thu, Jan 08, 2015 at 03:33:17PM +0100, Fabio Fantoni wrote:
> Il 08/01/2015 14:46, Wei Liu ha scritto:
> >This patch resurrects 11a63a166. The previous patch had a bug that
> >wrong "physical-device" was written to xenstore causing block script
> >execution fail. This patch fixes that problem.
> >
> >Following configurations have been tested:
> >
> >1. Raw file and PV
> >2. Raw file and HVM
> >3. Block device and PV
> >4. Block device and HVM
> >
> >Creation / destruction / local migration all worked.
> >
> >Original commit message from 11a63a166:
> >
> >Modify libxl and hotplug script to allow raw format file to use phy
> >backend.
> >
> >The block script now tests the path and determine the actual type of
> >file (block device or regular file) then use the actual type to
> >determine which branch to run.
> >
> >With these changes, plus the current ordering of backend preference (phy
> >>qdisk > tap), we will use phy backend for raw format file by default.
> 
> Sorry for the probably stupid question, what are the pros and cons of
> default use of phy instead qdisk for raw files as domU disk?
> 

There's no stupid question. :-)

I was told that it performs better and enables other potential
improvements.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] tools/misc: Cleanup makefile

2015-01-09 Thread Wei Liu
On Thu, Jan 08, 2015 at 01:34:22PM +, Andrew Cooper wrote:
> The existing makefile was awkward with needing to express conditional
> inclusion for both the build and install rules, and contained both split and
> unsplit long lines.
> 
> The INSTALL_* rules now contain the conditional inclusion information, while
> the TARGET_* rules generate the build list from the complete install list,
> less the minority of scripts which simply need copying into place.  Comments
> are introduces to aid clarity.
> 
> In addition, collect the CFLAGS expressions, remove the unreferenced and empty
> HDRS list, and restrict the libxc internals build fix to the offending
> binaries.
> 
> No functional change as a result of this patch, but it is rather more simple
> to add or remove utilities.
> 
> Signed-off-by: Andrew Cooper 
> CC: Ian Campbell 
> CC: Ian Jackson 
> CC: Wei Liu 

This looks correct to me.

Acked-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 1/2] gnttab: Introduce rwlock to protect updates to grant table state

2015-01-09 Thread Egger, Christoph
On 2015/01/09 15:10, Jan Beulich wrote:
 On 09.01.15 at 14:05,  wrote:
>> On 2014/12/18 12:37, Jan Beulich wrote:
>> On 03.12.14 at 15:29,  wrote:
 --- a/xen/common/grant_table.c
 +++ b/xen/common/grant_table.c
>> [...]
 @@ -944,6 +944,7 @@ __gnttab_unmap_common(
  }
  
  op->rd = rd;
 +read_lock(&rgt->lock);
  act = &active_entry(rgt, op->map->ref);
  
  if ( op->frame == 0 )
>>>
>>> The nesting of the two locks should be mentioned in the doc change.
>>
>> Where do you see the nesting coming from?
> 
> Reproducing the subsequent hunk I had commented on
> 
>> @@ -1004,6 +1005,7 @@ __gnttab_unmap_common(
>>  
>>   unmap_out:
>>  double_gt_unlock(lgt, rgt);
>> +read_unlock(&rgt->lock);
> 
> there obviously are two (or really three) locks in simultaneous use
> here.

Ah, I see now. My mistake is I was looking for multiple subsequent
read_lock(&rgt->lock); calls and found nothing.

Christoph


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/3] xen: correct race in alloc_p2m_pmd()

2015-01-09 Thread David Vrabel
On 08/01/15 17:01, Juergen Gross wrote:
> When allocating a new pmd for the linear mapped p2m list a check is
> done for not introducing another pmd when this just happened on
> another cpu. In this case the old pte pointer was returned which
> points to the p2m_missing or p2m_identity page. The correct value
> would be the pointer to the found new page.

Looking at the check I don't see how it works at all.

alloc_p2m() looks up the address of the PTE page

ptep = lookup_address(addr, &level);
pte_pg = (pte_t *)((unsigned long)ptep & ~(PAGE_SIZE - 1));

But the check under the lock that is still true does:

ptechk = lookup_address(vaddr, &level);
if (ptechk == pte_pg) {

So I don't see how this works unless (by chance) we happen to get the
first entry in the PTE page.

It also doesn't handle racing with p2m_missing_pte to p2m_identity_pte
(or vice-versa) change.

Something like:

ptechk = lookup_address(vaddr, &level);
ptechk = (pte_t *)((unsigned long)ptechk & ~(PAGE_SIZE - 1));
if (ptechk == p2m_missing_pte || ptechk == p2m_identity) {
  set_pmd(..)

Perhaps?

David

> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -440,10 +440,9 @@ EXPORT_SYMBOL_GPL(get_phys_to_machine);
>   * a new pmd is to replace p2m_missing_pte or p2m_identity_pte by a 
> individual
>   * pmd. In case of PAE/x86-32 there are multiple pmds to allocate!
>   */
> -static pte_t *alloc_p2m_pmd(unsigned long addr, pte_t *ptep, pte_t *pte_pg)
> +static pte_t *alloc_p2m_pmd(unsigned long addr, pte_t *pte_pg)
>  {
>   pte_t *ptechk;
> - pte_t *pteret = ptep;
>   pte_t *pte_newpg[PMDS_PER_MID_PAGE];
>   pmd_t *pmdp;
>   unsigned int level;
> @@ -477,8 +476,6 @@ static pte_t *alloc_p2m_pmd(unsigned long addr, pte_t 
> *ptep, pte_t *pte_pg)
>   if (ptechk == pte_pg) {
>   set_pmd(pmdp,
>   __pmd(__pa(pte_newpg[i]) | _KERNPG_TABLE));
> - if (vaddr == (addr & ~(PMD_SIZE - 1)))
> - pteret = pte_offset_kernel(pmdp, addr);
>   pte_newpg[i] = NULL;
>   }
>  
> @@ -492,7 +489,7 @@ static pte_t *alloc_p2m_pmd(unsigned long addr, pte_t 
> *ptep, pte_t *pte_pg)
>   vaddr += PMD_SIZE;
>   }
>  
> - return pteret;
> + return lookup_address(addr, &level);
>  }
>  
>  /*
> @@ -521,7 +518,7 @@ static bool alloc_p2m(unsigned long pfn)
>  
>   if (pte_pg == p2m_missing_pte || pte_pg == p2m_identity_pte) {
>   /* PMD level is missing, allocate a new one */
> - ptep = alloc_p2m_pmd(addr, ptep, pte_pg);
> + ptep = alloc_p2m_pmd(addr, pte_pg);
>   if (!ptep)
>   return false;
>   }
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] x86/HVM: make hvm_efer_valid() honor guest features

2015-01-09 Thread Andrew Cooper
On 09/01/15 11:20, Jan Beulich wrote:
 On 08.01.15 at 19:49,  wrote:
>> On 08/01/15 15:23, Jan Beulich wrote:
>>> Following the earlier similar change validating CR4 modifications.
>>>
>>> Signed-off-by: Jan Beulich 
>>>
>>> --- a/xen/arch/x86/hvm/hvm.c
>>> +++ b/xen/arch/x86/hvm/hvm.c
>>> @@ -1672,20 +1672,53 @@ static int hvm_save_cpu_ctxt(struct doma
>>>  return 0;
>>>  }
>>>  
>>> -static bool_t hvm_efer_valid(struct domain *d,
>>> - uint64_t value, uint64_t efer_validbits)
>>> +static bool_t hvm_efer_valid(const struct vcpu *v, uint64_t value,
>>> + bool_t restore)
>>>  {
>>> -if ( nestedhvm_enabled(d) && cpu_has_svm )
>>> -efer_validbits |= EFER_SVME;
>>> +unsigned int ext1_ecx = 0, ext1_edx = 0;
>>>  
>>> -return !((value & ~efer_validbits) ||
>>> - ((sizeof(long) != 8) && (value & EFER_LME)) ||
>>> - (!cpu_has_svm && (value & EFER_SVME)) ||
>>> - (!cpu_has_nx && (value & EFER_NX)) ||
>>> - (!cpu_has_syscall && (value & EFER_SCE)) ||
>>> - (!cpu_has_lmsl && (value & EFER_LMSLE)) ||
>>> - (!cpu_has_ffxsr && (value & EFER_FFXSE)) ||
>>> - ((value & (EFER_LME|EFER_LMA)) == EFER_LMA));
>>> +if ( !restore && !is_hardware_domain(v->domain) )
>>> +{
>>> +unsigned int level;
>>> +
>>> +ASSERT(v == current);
>>> +hvm_cpuid(0x8000, &level, NULL, NULL, NULL);
>>> +if ( level >= 0x8001 )
>>> +hvm_cpuid(0x8001, NULL, NULL, &ext1_ecx, &ext1_edx);
>>> +}
>>> +else
>>> +{
>>> +ext1_edx = boot_cpu_data.x86_capability[X86_FEATURE_LM / 32];
>>> +ext1_ecx = boot_cpu_data.x86_capability[X86_FEATURE_SVM / 32];
>>> +}
>>> +
>>> +if ( (value & EFER_SCE) &&
>>> + !(ext1_edx & cpufeat_mask(X86_FEATURE_SYSCALL)) )
>>> +return 0;
>>> +
>>> +if ( (value & (EFER_LME | EFER_LMA)) &&
>>> + !(ext1_edx & cpufeat_mask(X86_FEATURE_LM)) )
>>> +return 0;
>>> +
>>> +if ( (value & EFER_LMA) && !(value & EFER_LME) )
>>> +return 0;
>> The LME/LMA handling more complicated than this.
>>
>> LMA is read-only on Intel, but specified as read-write on AMD, with the
>> requirement that if it doesn't match the value generated by hardware, a
>> #GP fault will occur.  I believe this actually means it is read-only on
>> AMD as well.
>>
>> LMA only gets set by hardware after paging is enabled and the processor
>> switches properly into long mode, which means that there is a window
>> between setting LME and setting CR0.PG where LMA should read as 0.
> Did you note that hvm_set_efer() clears EFER_LMA before calling
> hvm_efer_valid(), thus avoiding all those issues?

I failed to appreciate its intent.

>
>> I think hvm_efer_valid() also needs the current EFER and CR0 to work out
>> what the current LMA should be, and reject any attempt to change it.
> I.e. this would be needed only for the restore path (and it not being
> done hasn't caused problems for me because guests don't normally
> get migrated before they fully enabled long mode).

Urgh.  In the restore case we don't have a current EFER to use.  In this
case I think it is acceptable to check that new_lma is (new_lme &&
!cr4.pg), which allows us to declare that the restore state is
consistent (presuming we have already validated cr4).

For the non-restore case, the caller takes care of removing LMA from
consideration.

>
>>> +if ( (value & EFER_NX) && !(ext1_edx & cpufeat_mask(X86_FEATURE_NX)) )
>>> +return 0;
>>> +
>>> +if ( (value & EFER_SVME) &&
>>> + (!(ext1_ecx & cpufeat_mask(X86_FEATURE_SVM)) ||
>>> +  !nestedhvm_enabled(v->domain)) )
>> This is going to cause an issue for the restore case, as the HVM PARAMs
>> follow the architectural state.
>>
>> I don't believe it is reasonable for nestedhvm_enabled() to disagree
>> with cpufeat_mask(X86_FEATURE_{SVM,VMX}), or for the toolstack to be
>> able to yank nestedhvm while the VM is active.
> I effectively only translated the pre-existing (and kind of suspicious
> to me) nestedhvm_enabled() use here.

So you did.  (My review focused a bit too much along the lines of "does
this new function match what the manuals state")

>
>> Looking at the checks when setting HVM_PARAM_NESTEDHVM, neither of the
>> guest feature flags are actually checked before setting up the nested
>> infrastructure.
> Which would be a separate issue.

It is indeed.  I apologize for rambling slightly - it was intended
purely as an observation.

>
>> Having said all of this, don't appear to be any migration records
>> associated with nested state (hardware-loaded VMCS/VMCB pointers,
>> currently nested?) which leads me to suspect that a domain actually
>> using nested virt is not going to survive a migration, so it might be
>> acceptable to fudge the checking of SVME for now.
> So am I getting you right that while these are all valid obs

Re: [Xen-devel] [PATCH 3/3] xen: use correct type for physical addresses

2015-01-09 Thread David Vrabel
On 09/01/15 12:56, Jan Beulich wrote:
 On 09.01.15 at 13:51,  wrote:
>> On 09/01/15 09:57, Jan Beulich wrote:
>> On 08.01.15 at 18:01,  wrote:
 @@ -284,7 +286,7 @@ static void __init xen_update_mem_tables(unsigned long 
 pfn, unsigned long mfn)
}
  
/* Update kernel mapping, but not for highmem. */
 -  if ((pfn << PAGE_SHIFT) >= __pa(high_memory))
 +  if (PFN_PHYS(pfn) >= (u64)(__pa(high_memory)))
>>>
>>> I don't think you really need the cast on the right side - __pa()
>>> should be returning a value of suitable type (and unsigned long
>>> would be sufficient for anything up to and including high_memory).
>>
>> I'd prefer:
>>
>> if (pfn >= PFN_DOWN(__pa(high_memory))
> 
> Even better indeed. Just one more point: Strictly speaking
> __pa(high_memory) is invalid, as __pa() is defined for low
> memory only. Hence perhaps PFN_UP(__pa(high_memory - 1))?

Yes.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 0/2] gnttab: Improve scaleability

2015-01-09 Thread Christoph Egger
This patch series changes the grant table locking to
a more fain grained locking protocol. The result is
a performance boost measured with blkfront/blkback.
Document the locking protocol.

v4:
  * Coding style nits from Jan Beulich
  * Fixup read locks pointed out by Jan Beulich
  * renamed double_gt_(un)lock to double_maptrack_(un)lock
per request from Jan Beulich
  * Addressed ASSERT()'s from Jan Beulich

v3:
  * Addressed gnttab_swap_grant_ref() comment from Andrew Cooper
v2:
  * Add arm part per request from Julien Grall

Christoph Egger (1):
  gnttab: Introduce rwlock to protect updates to grant table state

Matt Wilson (1):
  gnttab: refactor locking for scalability

 docs/misc/grant-tables.txt|   49 ++-
 xen/arch/arm/mm.c |4 +-
 xen/arch/x86/mm.c |4 +-
 xen/common/grant_table.c  |  321 +
 xen/include/xen/grant_table.h |9 +-
 5 files changed, 258 insertions(+), 129 deletions(-)

-- 
1.7.9.5


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 2/2] gnttab: refactor locking for scalability

2015-01-09 Thread Christoph Egger
From: Matt Wilson 

This patch refactors grant table locking. It splits the previous
single spinlock per grant table into multiple locks. The heavily
modified components of the grant table (the maptrack state and the
active entries) are now protected by their own spinlocks. The
remaining elements of the grant table are read-mostly, so the main
grant table lock is modified to be a rwlock to improve concurrency.

Workloads with high grant table operation rates, especially map/unmap
operations as used by blkback/blkfront when persistent grants are not
supported, show marked improvement with these changes. A domU with 24
VBDs in a streaming 2M write workload achieved 1,400 MB/sec before
this change. Performance more than doubles with this patch, reaching
3,000 MB/sec before tuning and 3,600 MB/sec after adjusting event
channel vCPU bindings.

Signed-off-by: Matt Wilson 
[chegger: ported to xen-staging, split into multiple commits]

v4:
  * Addressed ASSERT()'s from Jan Beulich
  * Addressed locking issues in unmap_common pointed out
by Jan Beulich
v3:
  * Addressed gnttab_swap_grant_ref() comment from Andrew Cooper

Signed-off-by: Christoph Egger 
Cc: Anthony Liguori 
Cc: Jan Beulich 
Cc: Keir Fraser 
---
 docs/misc/grant-tables.txt |   21 +
 xen/common/grant_table.c   |  219 +++-
 2 files changed, 158 insertions(+), 82 deletions(-)

diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index c9ae8f2..1ada018 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -63,6 +63,7 @@ is complete.
   act->domid : remote domain being granted rights
   act->frame : machine frame being granted
   act->pin   : used to hold reference counts
+  act->lock  : spinlock used to serialize access to active entry state
 
  Map tracking
  
@@ -87,6 +88,8 @@ is complete.
version, partially initialized active table 
pages,
etc.
   grant_table->maptrack_lock : spinlock used to protect the maptrack state
+  active_grant_entry->lock   : spinlock used to serialize modifications to
+   active entries
 
  The primary lock for the grant table is a read/write spinlock. All
  functions that access members of struct grant_table must acquire a
@@ -102,6 +105,24 @@ is complete.
  state can be rapidly modified under some workloads, and the critical
  sections are very small, thus we use a spinlock to protect them.
 
+ Active entries are obtained by calling active_entry_acquire(gt, ref).
+ This function returns a pointer to the active entry after locking its
+ spinlock. The caller must hold the rwlock for the gt in question
+ before calling active_entry_acquire(). This is because the grant
+ table can be dynamically extended via gnttab_grow_table() while a
+ domain is running and must be fully initialized. Once all access to
+ the active entry is complete, release the lock by calling
+ active_entry_release(act).
+
+ Summary of rules for locking:
+  active_entry_acquire() and active_entry_release() can only be
+  called when holding the relevant grant table's lock. I.e.:
+read_lock(>->lock);
+act = active_entry_acquire(gt, ref);
+...
+active_entry_release(act);
+read_unlock(>->lock);
+
 

 
  Granting a foreign domain access to frames
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index 586ca94..92b3cb1 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -157,10 +157,13 @@ struct active_grant_entry {
in the page.   */
 unsigned  length:16; /* For sub-page grants, the length of the
 grant.*/
+spinlock_tlock;  /* lock to protect access of this entry.
+see docs/misc/grant-tables.txt for
+locking protocol  */
 };
 
 #define ACGNT_PER_PAGE (PAGE_SIZE / sizeof(struct active_grant_entry))
-#define active_entry(t, e) \
+#define _active_entry(t, e) \
 ((t)->active[(e)/ACGNT_PER_PAGE][(e)%ACGNT_PER_PAGE])
 
 static inline void gnttab_flush_tlb(const struct domain *d)
@@ -188,6 +191,26 @@ nr_active_grant_frames(struct grant_table *gt)
 return num_act_frames_from_sha_frames(nr_grant_frames(gt));
 }
 
+static inline struct active_grant_entry *
+active_entry_acquire(struct grant_table *t, grant_ref_t e)
+{
+struct active_grant_entry *act;
+
+/* not perfect, but better than nothing for a debug build
+ * sanity check */
+ASSERT(rw_is_locked(&t->lock));
+
+act = &_active_entry(t, e);
+spin_lock(&act->lock);
+
+return act;
+}
+
+static inline void active_entry_release(struct active_grant_entry *act)
+{
+spin_unlock(&act->lock);
+}
+
 /* Check if the page has been paged out, or needs

[Xen-devel] [PATCH v4 1/2] gnttab: Introduce rwlock to protect updates to grant table state

2015-01-09 Thread Christoph Egger
Split grant table lock into two separate locks. One to protect
maptrack state and change the other into a rwlock.

The rwlock is used to prevent readers from accessing
inconsistent grant table state such as current
version, partially initialized active table pages,
etc.

Signed-off-by: Matt Wilson 
[chegger: ported to xen-staging, split into multiple commits]

v4:
  * Coding style nits from Jan Beulich
  * Fixup read locks pointed out by Jan Beulich
  * renamed double_gt_(un)lock to double_maptrack_(un)lock
per request from Jan Beulich
v3:
  * Addressed gnttab_swap_grant_ref() comment from Andrew Cooper
v2:
  * Add arm part per request from Julien Grall

Signed-off-by: Christoph Egger 
Cc: Jan Beulich 
Cc: Keir Fraser 
Cc: Julien Grall 
---
 docs/misc/grant-tables.txt|   28 -
 xen/arch/arm/mm.c |4 +-
 xen/arch/x86/mm.c |4 +-
 xen/common/grant_table.c  |  134 +++--
 xen/include/xen/grant_table.h |   17 --
 5 files changed, 117 insertions(+), 70 deletions(-)

diff --git a/docs/misc/grant-tables.txt b/docs/misc/grant-tables.txt
index 19db4ec..c9ae8f2 100644
--- a/docs/misc/grant-tables.txt
+++ b/docs/misc/grant-tables.txt
@@ -74,7 +74,33 @@ is complete.
  matching map track entry is then removed, as if unmap had been invoked.
  These are not used by the transfer mechanism.
   map->domid : owner of the mapped frame
-  map->ref_and_flags : grant reference, ro/rw, mapped for host or device access
+  map->ref   : grant reference
+  map->flags : ro/rw, mapped for host or device access
+
+
+ Locking
+ ~~~
+ Xen uses several locks to serialize access to the internal grant table state.
+
+  grant_table->lock  : rwlock used to prevent readers from accessing
+   inconsistent grant table state such as current
+   version, partially initialized active table 
pages,
+   etc.
+  grant_table->maptrack_lock : spinlock used to protect the maptrack state
+
+ The primary lock for the grant table is a read/write spinlock. All
+ functions that access members of struct grant_table must acquire a
+ read lock around critical sections. Any modification to the members
+ of struct grant_table (e.g., nr_status_frames, nr_grant_frames,
+ active frames, etc.) must only be made if the write lock is
+ held. These elements are read-mostly, and read critical sections can
+ be large, which makes a rwlock a good choice.
+
+ The maptrack state is protected by its own spinlock. Any access (read
+ or write) of struct grant_table members that have a "maptrack_"
+ prefix must be made while holding the maptrack lock. The maptrack
+ state can be rapidly modified under some workloads, and the critical
+ sections are very small, thus we use a spinlock to protect them.
 
 

 
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index 7d4ba0c..2765683 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1037,7 +1037,7 @@ int xenmem_add_to_physmap_one(
 switch ( space )
 {
 case XENMAPSPACE_grant_table:
-spin_lock(&d->grant_table->lock);
+write_lock(&d->grant_table->lock);
 
 if ( d->grant_table->gt_version == 0 )
 d->grant_table->gt_version = 1;
@@ -1067,7 +1067,7 @@ int xenmem_add_to_physmap_one(
 
 t = p2m_ram_rw;
 
-spin_unlock(&d->grant_table->lock);
+write_unlock(&d->grant_table->lock);
 break;
 case XENMAPSPACE_shared_info:
 if ( idx != 0 )
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 6e9c2c0..d371ff9 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4565,7 +4565,7 @@ int xenmem_add_to_physmap_one(
 mfn = virt_to_mfn(d->shared_info);
 break;
 case XENMAPSPACE_grant_table:
-spin_lock(&d->grant_table->lock);
+write_lock(&d->grant_table->lock);
 
 if ( d->grant_table->gt_version == 0 )
 d->grant_table->gt_version = 1;
@@ -4587,7 +4587,7 @@ int xenmem_add_to_physmap_one(
 mfn = virt_to_mfn(d->grant_table->shared_raw[idx]);
 }
 
-spin_unlock(&d->grant_table->lock);
+write_unlock(&d->grant_table->lock);
 break;
 case XENMAPSPACE_gmfn_range:
 case XENMAPSPACE_gmfn:
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index fe52b63..586ca94 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -229,27 +229,27 @@ static int __get_paged_frame(unsigned long gfn, unsigned 
long *frame, struct pag
 }
 
 static inline void
-double_gt_lock(struct grant_table *lgt, struct grant_table *rgt)
+double_maptrack_lock(struct grant_table *lgt, struct grant_table *rgt)
 {
 if ( lgt < rgt )
  

Re: [Xen-devel] [PATCH 2/3] xen: correct race in alloc_p2m_pmd()

2015-01-09 Thread Juergen Gross

On 01/09/2015 04:09 PM, David Vrabel wrote:

On 08/01/15 17:01, Juergen Gross wrote:

When allocating a new pmd for the linear mapped p2m list a check is
done for not introducing another pmd when this just happened on
another cpu. In this case the old pte pointer was returned which
points to the p2m_missing or p2m_identity page. The correct value
would be the pointer to the found new page.


Looking at the check I don't see how it works at all.

alloc_p2m() looks up the address of the PTE page

ptep = lookup_address(addr, &level);
pte_pg = (pte_t *)((unsigned long)ptep & ~(PAGE_SIZE - 1));

But the check under the lock that is still true does:

ptechk = lookup_address(vaddr, &level);
if (ptechk == pte_pg) {

So I don't see how this works unless (by chance) we happen to get the
first entry in the PTE page.


The check under lock tests vaddr which is pmd aligned.


It also doesn't handle racing with p2m_missing_pte to p2m_identity_pte
(or vice-versa) change.


The change is always and only for missing/identity to individual pmd.
There is no transition between all missing and identity pmds.


Something like:

ptechk = lookup_address(vaddr, &level);
ptechk = (pte_t *)((unsigned long)ptechk & ~(PAGE_SIZE - 1));
if (ptechk == p2m_missing_pte || ptechk == p2m_identity) {
   set_pmd(..)

Perhaps?


Would work, but is more complicated than needed. :-)


Juergen



David


--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -440,10 +440,9 @@ EXPORT_SYMBOL_GPL(get_phys_to_machine);
   * a new pmd is to replace p2m_missing_pte or p2m_identity_pte by a individual
   * pmd. In case of PAE/x86-32 there are multiple pmds to allocate!
   */
-static pte_t *alloc_p2m_pmd(unsigned long addr, pte_t *ptep, pte_t *pte_pg)
+static pte_t *alloc_p2m_pmd(unsigned long addr, pte_t *pte_pg)
  {
pte_t *ptechk;
-   pte_t *pteret = ptep;
pte_t *pte_newpg[PMDS_PER_MID_PAGE];
pmd_t *pmdp;
unsigned int level;
@@ -477,8 +476,6 @@ static pte_t *alloc_p2m_pmd(unsigned long addr, pte_t 
*ptep, pte_t *pte_pg)
if (ptechk == pte_pg) {
set_pmd(pmdp,
__pmd(__pa(pte_newpg[i]) | _KERNPG_TABLE));
-   if (vaddr == (addr & ~(PMD_SIZE - 1)))
-   pteret = pte_offset_kernel(pmdp, addr);
pte_newpg[i] = NULL;
}

@@ -492,7 +489,7 @@ static pte_t *alloc_p2m_pmd(unsigned long addr, pte_t 
*ptep, pte_t *pte_pg)
vaddr += PMD_SIZE;
}

-   return pteret;
+   return lookup_address(addr, &level);
  }

  /*
@@ -521,7 +518,7 @@ static bool alloc_p2m(unsigned long pfn)

if (pte_pg == p2m_missing_pte || pte_pg == p2m_identity_pte) {
/* PMD level is missing, allocate a new one */
-   ptep = alloc_p2m_pmd(addr, ptep, pte_pg);
+   ptep = alloc_p2m_pmd(addr, pte_pg);
if (!ptep)
return false;
}







___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/3] x86: also allow REP STOS emulation acceleration

2015-01-09 Thread Tim Deegan
At 12:46 + on 09 Jan (1420804005), Andrew Cooper wrote:
> On 09/01/15 12:10, Jan Beulich wrote:
>  On 09.01.15 at 12:45,  wrote:
> >> At 11:24 + on 09 Jan (1420799087), Jan Beulich wrote:
> >> On 09.01.15 at 12:18,  wrote:
> >> +default:
> >> +xfree(buf);
> >> +ASSERT(!buf);
>  looks dodgy...
> >>> In which way? The "default" is supposed to be unreachable, and sits
> >>> in the else branch to an if(!buf), i.e. in a release build we'll correctly
> >>> free the buffer, while in a debug build the ASSERT() will trigger.
> >> Oh I see.  Can you please use ASSERT(0) for that?
> > I sincerely dislike ASSERT(0), but if that's the only way to get
> > the patch accepted...
> >
> > Jan
> >
> 
> Perhaps introducing ASSERT_UNREACHABLE() as an alternative which is more
> obvious in nature than both ASSERT(!buf) and ASSERT(0) ?

Fine by me!

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] MAINTAINERS: Andrew Cooper to co-maintain x86

2015-01-09 Thread Konrad Rzeszutek Wilk
On Fri, Jan 09, 2015 at 11:01:01AM +, Andrew Cooper wrote:
> On 09/01/15 10:55, Jan Beulich wrote:
> > Signed-off-by: Jan Beulich 
> 
> Acked-by: Andrew Cooper 

> 
> (or Reviewed-by perhaps? I am not certain of the protocol here)

Congrantulations-by: Konrad Rzeszutek Wilk  :-)
> 
> >
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -341,6 +341,7 @@
> >  X86 ARCHITECTURE
> >  M: Keir Fraser 
> >  M: Jan Beulich 
> > +M: Andrew Cooper 
> >  S: Supported
> >  L: xen-devel@lists.xen.org 
> >  F: xen/arch/x86/
> >
> >
> >
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] x86/HVM: make hvm_efer_valid() honor guest features

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 16:09,  wrote:
> On 09/01/15 11:20, Jan Beulich wrote:
> On 08.01.15 at 19:49,  wrote:
>>> I think hvm_efer_valid() also needs the current EFER and CR0 to work out
>>> what the current LMA should be, and reject any attempt to change it.
>> I.e. this would be needed only for the restore path (and it not being
>> done hasn't caused problems for me because guests don't normally
>> get migrated before they fully enabled long mode).
> 
> Urgh.  In the restore case we don't have a current EFER to use.  In this
> case I think it is acceptable to check that new_lma is (new_lme &&
> !cr4.pg), which allows us to declare that the restore state is
> consistent (presuming we have already validated cr4).

I suppose you mean cr0 here, and also new_lma == new_lme && cr0.pg?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] libxl: Spice image compression setting support for upstream qemu

2015-01-09 Thread Wei Liu
On Thu, Jan 08, 2015 at 04:26:30PM +0100, Fabio Fantoni wrote:
> Usage:
> spice_image_compression=[auto_glz|auto_lz|quic|glz|lz|off]
> 
> Specifies what image compression is to be used by spice (if given),
> otherwise the qemu default will be used.
> 
> Signed-off-by: Fabio Fantoni 
> 
> ---
> 
> Changes in v2:
> - refresh
> ---
>  docs/man/xl.cfg.pod.5   |  5 +
>  tools/libxl/libxl.h | 11 +++
>  tools/libxl/libxl_dm.c  |  3 +++
>  tools/libxl/libxl_types.idl |  1 +
>  tools/libxl/xl_cmdimpl.c|  2 ++
>  5 files changed, 22 insertions(+)
> 
> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
> index 622ea53..07797b1 100644
> --- a/docs/man/xl.cfg.pod.5
> +++ b/docs/man/xl.cfg.pod.5
> @@ -1421,6 +1421,11 @@ for redirection of up to 4 usb devices from spice 
> client to domU's qemu.
>  It requires an usb controller and if not defined it will automatically adds
>  an usb2 controller. The default is disabled (0).
>  
> +=item B
> +
> +Specifies what image compression is to be used by spice (if given), otherwise
> +the qemu default will be used.
> +

I don't think you should be so explicit here. Potentially QEMU can add
other compression algorithms or remove some algorithms. The doc should
ask user to consult output of the QEMU they use.

>  =back
>  
>  =head3 Miscellaneous Emulated Hardware
> diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
> index 0a123f1..b8e0b67 100644
> --- a/tools/libxl/libxl.h
> +++ b/tools/libxl/libxl.h
> @@ -528,6 +528,17 @@ typedef struct libxl__ctx libxl_ctx;
>  #define LIBXL_HAVE_SPICE_USBREDIREDIRECTION 1
>  
>  /*
> + * LIBXL_HAVE_SPICE_IMAGECOMPRESSION
> + *
> + * If defined, then the libxl_spice_info structure will contain a string type
> + * field: image_compression. This value defines what Spice image compression
> + * is used.
> + *
> + * If this is not defined, the Spice image compression setting support is 
> ignored.
> + */
> +#define LIBXL_HAVE_SPICE_IMAGECOMPRESSION 1
> +
> +/*
>   * LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1
>   *
>   * If this is defined, libxl_domain_create_restore()'s API has changed to
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index c2b0487..e1188f4 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -398,6 +398,9 @@ static char *dm_spice_options(libxl__gc *gc,
>  if (!libxl_defbool_val(spice->clipboard_sharing))
>  opt = libxl__sprintf(gc, "%s,disable-copy-paste", opt);
>  
> +if (spice->image_compression)
> +opt = libxl__sprintf(gc, "%s,image-compression=%s", opt, 
> spice->image_compression);

Line too long.

> +
>  return opt;
>  }
>  
> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
> index f7fc695..1dc4233 100644
> --- a/tools/libxl/libxl_types.idl
> +++ b/tools/libxl/libxl_types.idl
> @@ -240,6 +240,7 @@ libxl_spice_info = Struct("spice_info", [
>  ("vdagent", libxl_defbool),
>  ("clipboard_sharing", libxl_defbool),
>  ("usbredirection", integer),
> +("image_compression", string),
>  ])
>  
>  libxl_sdl_info = Struct("sdl_info", [
> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index ed0d478..8afac5a 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
> @@ -1938,6 +1938,8 @@ skip_vfb:
>  &b_info->u.hvm.spice.clipboard_sharing, 0);
>  if (!xlu_cfg_get_long (config, "spiceusbredirection", &l, 0))
>  b_info->u.hvm.spice.usbredirection = l;
> +xlu_cfg_replace_string (config, "spice_image_compression",
> +&b_info->u.hvm.spice.image_compression, 0);
>  xlu_cfg_get_defbool(config, "nographic", &b_info->u.hvm.nographic, 
> 0);
>  xlu_cfg_get_defbool(config, "gfx_passthru",
>  &b_info->u.hvm.gfx_passthru, 0);
> -- 
> 1.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/3] x86/HVM: make hvm_efer_valid() honor guest features

2015-01-09 Thread Andrew Cooper
On 09/01/15 15:33, Jan Beulich wrote:
 On 09.01.15 at 16:09,  wrote:
>> On 09/01/15 11:20, Jan Beulich wrote:
>> On 08.01.15 at 19:49,  wrote:
 I think hvm_efer_valid() also needs the current EFER and CR0 to work out
 what the current LMA should be, and reject any attempt to change it.
>>> I.e. this would be needed only for the restore path (and it not being
>>> done hasn't caused problems for me because guests don't normally
>>> get migrated before they fully enabled long mode).
>> Urgh.  In the restore case we don't have a current EFER to use.  In this
>> case I think it is acceptable to check that new_lma is (new_lme &&
>> !cr4.pg), which allows us to declare that the restore state is
>> consistent (presuming we have already validated cr4).
> I suppose you mean cr0 here, and also new_lma == new_lme && cr0.pg?

Oops yes - I do on both counts.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3] libxl: Spice streaming video setting support for upstream qemu

2015-01-09 Thread Wei Liu
On Fri, Jan 09, 2015 at 01:32:12PM +0100, Fabio Fantoni wrote:
[...]
>   * LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1
>   *
>   * If this is defined, libxl_domain_create_restore()'s API has changed to
> diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
> index e1188f4..e80b8d7 100644
> --- a/tools/libxl/libxl_dm.c
> +++ b/tools/libxl/libxl_dm.c
> @@ -401,6 +401,9 @@ static char *dm_spice_options(libxl__gc *gc,
>  if (spice->image_compression)
>  opt = libxl__sprintf(gc, "%s,image-compression=%s", opt, 
> spice->image_compression);
>  
> +if (spice->streaming_video)
> +opt = libxl__sprintf(gc, "%s,streaming-video=%s", opt, 
> spice->streaming_video);

Line too long.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen/blkfront: restart request queue when there is enough persistent_gnts_c

2015-01-09 Thread Roger Pau Monné
El 06/01/15 a les 14.19, Bob Liu ha escrit:
> When there is no enough free grants, gnttab_alloc_grant_references()
> will fail and block request queue will stop.
> If the system is always lack of grants, blkif_restart_queue_callback() can't 
> be
> scheduled and block request queue can't be restart(block I/O hang).
> 
> But when there are former requests complete, some grants may free to
> persistent_gnts_c, we can give the request queue another chance to restart and
> avoid block hang.
> 
> Reported-by: Junxiao Bi 
> Signed-off-by: Bob Liu 
> ---
>  drivers/block/xen-blkfront.c |   11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 2236c6f..dd30f99 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -1125,6 +1125,17 @@ static void blkif_completion(struct blk_shadow *s, 
> struct blkfront_info *info,
>   }
>   }
>   }
> +
> + /*
> +  * Request queue would be stopped if failed to alloc enough grants and
> +  * won't be restarted until gnttab_free_count >= info->callback->count.
> +  *
> +  * But there is another case, once we have enough persistent grants we
> +  * can try to restart the request queue instead of continue to wait for
> +  * 'gnttab_free_count'.
> +  */
> + if (info->persistent_gnts_c >= info->callback.count)
> + schedule_work(&info->work);

I guess I'm missing something here, but blkif_completion is called by
blkif_interrupt, which in turn calls kick_pending_request_queues when
finished, which IMHO should be enough to restart the processing of requests.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA

2015-01-09 Thread Stefano Stabellini
On Tue, 6 Jan 2015, David Vrabel wrote:
> From: Jenny Herbert 
> 
> For each VMA for grant maps, provide the correct array of pages so
> get_user_pages() on foreign mappings works in PV guests.
> 
> Signed-off-by: Jenny Herbert 
> Signed-off-by: David Vrabel 
> ---
>  drivers/xen/gntdev.c |2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index 5d73fe8..09e7863 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -870,6 +870,8 @@ static int gntdev_mmap(struct file *flip, struct 
> vm_area_struct *vma)
>   vma->vm_end - vma->vm_start,
>   set_grant_ptes_as_special, NULL);
>   }
> +
> + vma->pages = map->pages;

Shouldn't this be done just for x86 PV guests rather than all the
callers of gntdev_mmap?


>   }
>  
>   return 0;
> -- 
> 1.7.10.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] libxl: Don't ignore error when we fail to give access to ioport/irq/iomem

2015-01-09 Thread Julien Grall
If we fail to give the access, the domain will unlikely work correctly.
So we should bail out at the first error.

Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Stefano Stabellini 
Cc: Ian Campbell 
Cc: Wei Liu 

---
This patch is candidate for a backport for at least Xen 4.4 and Xen 4.5.

Changes in v2:
- Also bail out when libxl fails to give access to iomem
---
 tools/libxl/libxl_create.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 1198225..6f87d1c 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1167,6 +1167,7 @@ static void domcreate_launch_dm(libxl__egc *egc, 
libxl__multidev *multidev,
  "failed give dom%d access to ioports %"PRIx32"-%"PRIx32,
  domid, io->first, io->first + io->number - 1);
 ret = ERROR_FAIL;
+goto error_out;
 }
 }
 
@@ -1182,6 +1183,7 @@ static void domcreate_launch_dm(libxl__egc *egc, 
libxl__multidev *multidev,
 if (ret < 0) {
 LOGE(ERROR, "failed give dom%d access to irq %d", domid, irq);
 ret = ERROR_FAIL;
+goto error_out;
 }
 }
 
@@ -1198,7 +1200,7 @@ static void domcreate_launch_dm(libxl__egc *egc, 
libxl__multidev *multidev,
  "failed give dom%d access to iomem range %"PRIx64"-%"PRIx64,
  domid, io->start, io->start + io->number - 1);
 ret = ERROR_FAIL;
-continue;
+goto error_out;
 }
 ret = xc_domain_memory_mapping(CTX->xch, domid,
io->gfn, io->start,
@@ -1209,6 +1211,7 @@ static void domcreate_launch_dm(libxl__egc *egc, 
libxl__multidev *multidev,
  " to guest address %"PRIx64,
  domid, io->start, io->start + io->number - 1, io->gfn);
 ret = ERROR_FAIL;
+goto error_out;
 }
 }
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] spinlock: use local_irq_disable() instead of local_irq_save() where possible

2015-01-09 Thread Konrad Rzeszutek Wilk
On Thu, Jan 08, 2015 at 04:13:03PM +, Jan Beulich wrote:
> ... as generally being a cheaper operation.

I was wondering if it would be possible to change some of the
EFLAGS after when we go in the 'cpu_relax' - and an interrupt
happens, we process it, alter the EFLAGS, then when we are
done, the EFLAGS are different - which the original code would
save when it was done sitting on the cpu_relax() loop.

Actually that sounds bad - we only want to restore the flags
that we had when going in this spin lock. Would make sense
to add an ASSERT to check for flags being different from the
EFLAGS?


> 
> Signed-off-by: Jan Beulich 
> 
> --- a/xen/common/spinlock.c
> +++ b/xen/common/spinlock.c
> @@ -162,7 +162,7 @@ unsigned long _spin_lock_irqsave(spinloc
>  local_irq_restore(flags);
>  while ( likely(_raw_spin_is_locked(&lock->raw)) )
>  cpu_relax();
> -local_irq_save(flags);
> +local_irq_disable();
>  }
>  LOCK_PROFILE_GOT;
>  preempt_disable();
> @@ -313,7 +313,7 @@ unsigned long _read_lock_irqsave(rwlock_
>  local_irq_restore(flags);
>  while ( (x = lock->lock) & RW_WRITE_FLAG )
>  cpu_relax();
> -local_irq_save(flags);
> +local_irq_disable();
>  }
>  } while ( cmpxchg(&lock->lock, x, x+1) != x );
>  preempt_disable();
> @@ -409,7 +409,7 @@ unsigned long _write_lock_irqsave(rwlock
>  local_irq_restore(flags);
>  while ( (x = lock->lock) & RW_WRITE_FLAG )
>  cpu_relax();
> -local_irq_save(flags);
> +local_irq_disable();
>  }
>  } while ( cmpxchg(&lock->lock, x, x|RW_WRITE_FLAG) != x );
>  while ( x != 0 )
> 
> 
> 

> spinlock: use local_irq_disable() instead of local_irq_save() where possible
> 
> ... as generally being a cheaper operation.
> 
> Signed-off-by: Jan Beulich 
> 
> --- a/xen/common/spinlock.c
> +++ b/xen/common/spinlock.c
> @@ -162,7 +162,7 @@ unsigned long _spin_lock_irqsave(spinloc
>  local_irq_restore(flags);
>  while ( likely(_raw_spin_is_locked(&lock->raw)) )
>  cpu_relax();
> -local_irq_save(flags);
> +local_irq_disable();
>  }
>  LOCK_PROFILE_GOT;
>  preempt_disable();
> @@ -313,7 +313,7 @@ unsigned long _read_lock_irqsave(rwlock_
>  local_irq_restore(flags);
>  while ( (x = lock->lock) & RW_WRITE_FLAG )
>  cpu_relax();
> -local_irq_save(flags);
> +local_irq_disable();
>  }
>  } while ( cmpxchg(&lock->lock, x, x+1) != x );
>  preempt_disable();
> @@ -409,7 +409,7 @@ unsigned long _write_lock_irqsave(rwlock
>  local_irq_restore(flags);
>  while ( (x = lock->lock) & RW_WRITE_FLAG )
>  cpu_relax();
> -local_irq_save(flags);
> +local_irq_disable();
>  }
>  } while ( cmpxchg(&lock->lock, x, x|RW_WRITE_FLAG) != x );
>  while ( x != 0 )

> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 06/12] xen: mark grant mapped pages as foreign

2015-01-09 Thread Stefano Stabellini
On Tue, 6 Jan 2015, David Vrabel wrote:
> From: Jenny Herbert 
> 
> Use the "foreign" page flag to mark pages that have a grant map.  Use
> page->private to store information of the grant (the granting domain
> and the grant reference).
> 
> Signed-off-by: Jenny Herbert 
> Signed-off-by: David Vrabel 
> ---
>  arch/x86/xen/p2m.c|   50 
> ++---
>  include/xen/grant_table.h |   13 
>  2 files changed, 56 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> index 0d70814..22624a3 100644
> --- a/arch/x86/xen/p2m.c
> +++ b/arch/x86/xen/p2m.c
> @@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned 
> long mfn)
>   return true;
>  }
>  
> +static int
> +init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref)
> +{
> +#ifdef CONFIG_X86_64
> + uint64_t gref;
> + uint64_t* gref_p = &gref;
> +#else
> + uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL);
> + if (!gref)
> + return -ENOMEM;
> + uint64_t* gref = gref_p;
> +#endif
> +
> + *gref_p = ((uint64_t) grantref << 32) | domid;
> + p->private = gref;
> +
> + WARN_ON(PagePrivate(p));
> + WARN_ON(PageForeign(p));
> + SetPagePrivate(p);
> + SetPageForeign(p);
> + return 0;
> +}
> +
> +static void
> +clear_page_grant_ref(struct page *p)
> +{
> + WARN_ON(!PagePrivate(p));
> + WARN_ON(!PageForeign(p));
> +
> +#ifndef CONFIG_X86_64
> + kfree(p->private);
> +#endif
> + p->private = 0;
> + ClearPagePrivate(p);
> + ClearPageForeign(p);
> +}

Given that get_page_grant_ref is used by netback, these functions need
to be made arch-independent, moved to an arch-independent code location
and called appropriately.


>  int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
>   struct gnttab_map_grant_ref *kmap_ops,
>   struct page **pages, unsigned int count)
> @@ -681,11 +718,12 @@ int set_foreign_p2m_mapping(struct gnttab_map_grant_ref 
> *map_ops,
>   }
>   pfn = page_to_pfn(pages[i]);
>  
> - WARN_ON(PagePrivate(pages[i]));
> - WARN(pfn_to_mfn(pfn) != INVALID_P2M_ENTRY, "page must be 
> ballooned");
> + ret = init_page_grant_ref(pages[i],
> +   map_ops[i].dom, map_ops[i].ref);
> + if (ret < 0)
> + goto out;
>  
> - SetPagePrivate(pages[i]);
> - set_page_private(pages[i], mfn);
> + WARN(pfn_to_mfn(pfn) != INVALID_P2M_ENTRY, "page must be 
> ballooned");
>  
>   if (unlikely(!set_phys_to_machine(pfn, FOREIGN_FRAME(mfn {
>   ret = -ENOMEM;
> @@ -716,9 +754,7 @@ int clear_foreign_p2m_mapping(struct 
> gnttab_unmap_grant_ref *unmap_ops,
>   goto out;
>   }
>  
> - set_page_private(pages[i], INVALID_P2M_ENTRY);
> - WARN_ON(!PagePrivate(pages[i]));
> - ClearPagePrivate(pages[i]);
> + clear_page_grant_ref(pages[i]);
>   set_phys_to_machine(pfn, INVALID_P2M_ENTRY);
>   }
>   if (kunmap_ops)
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 7235d8f..381f007 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -45,6 +45,7 @@
>  #include 
>  
>  #include 
> +#include 
>  
>  #define GNTTAB_RESERVED_XENSTORE 1
>  
> @@ -182,4 +183,16 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref 
> *unmap_ops,
>  void gnttab_batch_map(struct gnttab_map_grant_ref *batch, unsigned count);
>  void gnttab_batch_copy(struct gnttab_copy *batch, unsigned count);
>  
> +static inline void
> +get_page_grant_ref(struct page *p, domid_t* domid, grant_ref_t* grantref)
> +{
> +#ifdef CONFIG_X86_64
> + uint64_t gref = p->private;
> +#else
> + uint64_t gref = *p->private;
> +#endif
> + *domid = gref & 0x;
> + *grantref = gref >> 32;
> +}
> +
>  #endif /* __ASM_GNTTAB_H__ */
> -- 
> 1.7.10.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA

2015-01-09 Thread Stefano Stabellini
On Fri, 9 Jan 2015, Stefano Stabellini wrote:
> On Tue, 6 Jan 2015, David Vrabel wrote:
> > From: Jenny Herbert 
> > 
> > For each VMA for grant maps, provide the correct array of pages so
> > get_user_pages() on foreign mappings works in PV guests.
> > 
> > Signed-off-by: Jenny Herbert 
> > Signed-off-by: David Vrabel 
> > ---
> >  drivers/xen/gntdev.c |2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> > index 5d73fe8..09e7863 100644
> > --- a/drivers/xen/gntdev.c
> > +++ b/drivers/xen/gntdev.c
> > @@ -870,6 +870,8 @@ static int gntdev_mmap(struct file *flip, struct 
> > vm_area_struct *vma)
> > vma->vm_end - vma->vm_start,
> > set_grant_ptes_as_special, NULL);
> > }
> > +
> > +   vma->pages = map->pages;
> 
> Shouldn't this be done just for x86 PV guests rather than all the
> callers of gntdev_mmap?

I guess that since on ARM and PVH the pte wouldn't be marked SPECIAL, it
wouldn't make a difference, but it would still be nice to be consistent.


> > }
> >  
> > return 0;
> > -- 
> > 1.7.10.4
> > 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] libxl: Don't ignore error when we fail to give access to ioport/irq/iomem

2015-01-09 Thread Wei Liu
On Fri, Jan 09, 2015 at 03:56:45PM +, Julien Grall wrote:
> If we fail to give the access, the domain will unlikely work correctly.
> So we should bail out at the first error.
> 
> Signed-off-by: Julien Grall 
> Cc: Ian Jackson 
> Cc: Stefano Stabellini 
> Cc: Ian Campbell 

Acked-by: Wei Liu 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] spinlock: use local_irq_disable() instead of local_irq_save() where possible

2015-01-09 Thread Jan Beulich
>>> On 09.01.15 at 17:02,  wrote:
> On Thu, Jan 08, 2015 at 04:13:03PM +, Jan Beulich wrote:
>> ... as generally being a cheaper operation.
> 
> I was wondering if it would be possible to change some of the
> EFLAGS after when we go in the 'cpu_relax' - and an interrupt
> happens, we process it, alter the EFLAGS, then when we are
> done, the EFLAGS are different - which the original code would
> save when it was done sitting on the cpu_relax() loop.
> 
> Actually that sounds bad - we only want to restore the flags
> that we had when going in this spin lock. Would make sense
> to add an ASSERT to check for flags being different from the
> EFLAGS?

I'm not sure I'm following what you try to tell me. For one,
interrupts don't change EFLAGS - the IRET restores them. And
second, local_irq_restore() is only guaranteed to restore
EFLAGS.IF - whether it touches other flags if largely undefined.

Jan

>> --- a/xen/common/spinlock.c
>> +++ b/xen/common/spinlock.c
>> @@ -162,7 +162,7 @@ unsigned long _spin_lock_irqsave(spinloc
>>  local_irq_restore(flags);
>>  while ( likely(_raw_spin_is_locked(&lock->raw)) )
>>  cpu_relax();
>> -local_irq_save(flags);
>> +local_irq_disable();
>>  }
>>  LOCK_PROFILE_GOT;
>>  preempt_disable();
>> @@ -313,7 +313,7 @@ unsigned long _read_lock_irqsave(rwlock_
>>  local_irq_restore(flags);
>>  while ( (x = lock->lock) & RW_WRITE_FLAG )
>>  cpu_relax();
>> -local_irq_save(flags);
>> +local_irq_disable();
>>  }
>>  } while ( cmpxchg(&lock->lock, x, x+1) != x );
>>  preempt_disable();
>> @@ -409,7 +409,7 @@ unsigned long _write_lock_irqsave(rwlock
>>  local_irq_restore(flags);
>>  while ( (x = lock->lock) & RW_WRITE_FLAG )
>>  cpu_relax();
>> -local_irq_save(flags);
>> +local_irq_disable();
>>  }
>>  } while ( cmpxchg(&lock->lock, x, x|RW_WRITE_FLAG) != x );
>>  while ( x != 0 )



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] spinlock: use local_irq_disable() instead of local_irq_save() where possible

2015-01-09 Thread Andrew Cooper
On 09/01/15 16:02, Konrad Rzeszutek Wilk wrote:
> On Thu, Jan 08, 2015 at 04:13:03PM +, Jan Beulich wrote:
>> ... as generally being a cheaper operation.
> I was wondering if it would be possible to change some of the
> EFLAGS after when we go in the 'cpu_relax' - and an interrupt
> happens, we process it, alter the EFLAGS, then when we are
> done, the EFLAGS are different - which the original code would
> save when it was done sitting on the cpu_relax() loop.
>
> Actually that sounds bad - we only want to restore the flags
> that we had when going in this spin lock. Would make sense
> to add an ASSERT to check for flags being different from the
> EFLAGS?

local_irq_restore() only restores the interrupt flag from flags.  All
other bits in EFLAGS are unmodified.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 08/12] xen/grant-table: add a mechanism to safely unmap pages that are in use

2015-01-09 Thread Stefano Stabellini
On Tue, 6 Jan 2015, David Vrabel wrote:
> From: Jenny Herbert 
> 
> Introduce gnttab_unmap_refs_async() that can be used to safely unmap
> pages that may be in use (ref count > 1).  If the pages are in use the
> unmap is deferred and retried later.  This polling is not very clever
> but it should be good enough if the cases where the delay is necessary
> are rare.
> 
> This is needed to allow block backends using grant mapping to safely
> use network storage (block or filesystem based such as iSCSI or NFS).
> 
> The network storage driver may complete a block request whilst there
> is a queued network packet retry (because the ack from the remote end
> races with deciding to queue the retry).  The pages for the retried
> packet would be grant unmapped and the network driver (or hardware)
> would access the unmapped page.
> 
> Signed-off-by: Jenny Herbert 
> Signed-off-by: David Vrabel 
> ---
>  drivers/xen/grant-table.c |   44 
>  include/xen/grant_table.h |   18 ++
>  2 files changed, 62 insertions(+)
> 
> diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
> index 999d7ab..dc0edfa 100644
> --- a/drivers/xen/grant-table.c
> +++ b/drivers/xen/grant-table.c
> @@ -42,6 +42,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -751,6 +752,49 @@ int gnttab_unmap_refs(struct gnttab_unmap_grant_ref 
> *unmap_ops,
>  }
>  EXPORT_SYMBOL_GPL(gnttab_unmap_refs);
>  
> +#define GNTTAB_UNMAP_REFS_DELAY 5
> +
> +static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item);
> +
> +static void gnttab_unmap_work(struct work_struct *work)
> +{
> + struct gntab_unmap_queue_data
> + *unmap_data = container_of(work, 
> +struct gntab_unmap_queue_data,
> +gnttab_work.work);
> + if (unmap_data->age!=UINT_MAX)

code style


> + unmap_data->age++;
> + __gnttab_unmap_refs_async(unmap_data);
> +}
> +
> +static void __gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item)
> +{
> + int ret;
> + int pc;
> +
> + for (pc = 0; pc < item->count; pc++) {
> + if (page_count(item->pages[pc]) > 1) {
> + unsigned long delay = GNTTAB_UNMAP_REFS_DELAY * 
> (item->age + 1);

I am curious: why are you increasing the delay at each iteration
instead of keeping it constant?


> + schedule_delayed_work(&item->gnttab_work,
> +   msecs_to_jiffies(delay));
> + return;
> + }
> + }
> +
> + ret = gnttab_unmap_refs(item->unmap_ops, item->kunmap_ops,
> + item->pages, item->count);
> + item->done(ret, item);
> +}
> +
> +void gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item)
> +{
> + INIT_DELAYED_WORK(&item->gnttab_work, gnttab_unmap_work);
> + item->age = 0;
> +
> + __gnttab_unmap_refs_async(item);
> +}
> +EXPORT_SYMBOL_GPL(gnttab_unmap_refs_async);
> +
>  static int gnttab_map_frames_v1(xen_pfn_t *frames, unsigned int nr_gframes)
>  {
>   int rc;
> diff --git a/include/xen/grant_table.h b/include/xen/grant_table.h
> index 381f007..f7c4821 100644
> --- a/include/xen/grant_table.h
> +++ b/include/xen/grant_table.h
> @@ -59,6 +59,22 @@ struct gnttab_free_callback {
>   u16 count;
>  };
>  
> +struct gntab_unmap_queue_data;
> +
> +typedef void (*gnttab_unmap_refs_done)(int result, struct 
> gntab_unmap_queue_data *data);
> +
> +struct gntab_unmap_queue_data
> +{
> + struct delayed_work gnttab_work;
> + void *data;
> + gnttab_unmap_refs_done  done;
> + struct gnttab_unmap_grant_ref *unmap_ops;
> + struct gnttab_unmap_grant_ref *kunmap_ops;
> + struct page **pages;
> + unsigned int count;
> + unsigned int age;
> +};
> +
>  int gnttab_init(void);
>  int gnttab_suspend(void);
>  int gnttab_resume(void);
> @@ -170,6 +186,8 @@ int gnttab_map_refs(struct gnttab_map_grant_ref *map_ops,
>  int gnttab_unmap_refs(struct gnttab_unmap_grant_ref *unmap_ops,
> struct gnttab_unmap_grant_ref *kunmap_ops,
> struct page **pages, unsigned int count);
> +void gnttab_unmap_refs_async(struct gntab_unmap_queue_data* item);
> +
>  
>  /* Perform a batch of grant map/copy operations. Retry every batch slot
>   * for which the hypervisor returns GNTST_eagain. This is typically due
> -- 
> 1.7.10.4
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] libxl/arm: Correctly spelled FDT_ERR_* in a comment

2015-01-09 Thread Julien Grall
Signed-off-by: Julien Grall 
Cc: Ian Jackson 
Cc: Stefano Stabellini 
Cc: Ian Campbell 
Cc: Wei Liu 
---
 tools/libxl/libxl_arm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 448ac07..ce0be84 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -546,7 +546,7 @@ int libxl__arch_domain_init_hw_description(libxl__gc *gc,
 LOG(DEBUG, "  - vGIC version: %s", gicv_to_string(config.gic_version));
 
 /*
- * Call "call" handling FDR_ERR_*. Will either:
+ * Call "call" handling FDT_ERR_*. Will either:
  * - loop back to retry_resize
  * - set rc and goto out
  * - fall through successfully
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 06/12] xen: mark grant mapped pages as foreign

2015-01-09 Thread Ian Campbell
On Fri, 2015-01-09 at 16:03 +, Stefano Stabellini wrote:
> On Tue, 6 Jan 2015, David Vrabel wrote:
> > From: Jenny Herbert 
> > 
> > Use the "foreign" page flag to mark pages that have a grant map.  Use
> > page->private to store information of the grant (the granting domain
> > and the grant reference).
> > 
> > Signed-off-by: Jenny Herbert 
> > Signed-off-by: David Vrabel 
> > ---
> >  arch/x86/xen/p2m.c|   50 
> > ++---
> >  include/xen/grant_table.h |   13 
> >  2 files changed, 56 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> > index 0d70814..22624a3 100644
> > --- a/arch/x86/xen/p2m.c
> > +++ b/arch/x86/xen/p2m.c
> > @@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned 
> > long mfn)
> > return true;
> >  }
> >  
> > +static int
> > +init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref)
> > +{
> > +#ifdef CONFIG_X86_64
> > +   uint64_t gref;
> > +   uint64_t* gref_p = &gref;
> > +#else
> > +   uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL);
> > +   if (!gref)
> > +   return -ENOMEM;
> > +   uint64_t* gref = gref_p;
> > +#endif
> > +
> > +   *gref_p = ((uint64_t) grantref << 32) | domid;
> > +   p->private = gref;
> > +
> > +   WARN_ON(PagePrivate(p));
> > +   WARN_ON(PageForeign(p));
> > +   SetPagePrivate(p);
> > +   SetPageForeign(p);
> > +   return 0;
> > +}
> > +
> > +static void
> > +clear_page_grant_ref(struct page *p)
> > +{
> > +   WARN_ON(!PagePrivate(p));
> > +   WARN_ON(!PageForeign(p));
> > +
> > +#ifndef CONFIG_X86_64
> > +   kfree(p->private);
> > +#endif
> > +   p->private = 0;
> > +   ClearPagePrivate(p);
> > +   ClearPageForeign(p);
> > +}
> 
> Given that get_page_grant_ref is used by netback, these functions need
> to be made arch-independent, moved to an arch-independent code location
> and called appropriately.

... or stubbed out for arches which don't need this (which might include
arm*?).



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/3] spinlock: use local_irq_disable() instead of local_irq_save() where possible

2015-01-09 Thread Konrad Rzeszutek Wilk
On Fri, Jan 09, 2015 at 04:09:30PM +, Andrew Cooper wrote:
> On 09/01/15 16:02, Konrad Rzeszutek Wilk wrote:
> > On Thu, Jan 08, 2015 at 04:13:03PM +, Jan Beulich wrote:
> >> ... as generally being a cheaper operation.
> > I was wondering if it would be possible to change some of the
> > EFLAGS after when we go in the 'cpu_relax' - and an interrupt
> > happens, we process it, alter the EFLAGS, then when we are
> > done, the EFLAGS are different - which the original code would
> > save when it was done sitting on the cpu_relax() loop.
> >
> > Actually that sounds bad - we only want to restore the flags
> > that we had when going in this spin lock. Would make sense
> > to add an ASSERT to check for flags being different from the
> > EFLAGS?
> 
> local_irq_restore() only restores the interrupt flag from flags.  All
> other bits in EFLAGS are unmodified.

which I would have found out if I read the code from local_irq_restore().

Sorry about the noise - should have looked at the code before asking
questions!
> 
> ~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for 2.3 v2 1/1] xen-hvm: increase maxmem before calling xc_domain_populate_physmap

2015-01-09 Thread Don Slutz
Ping

On 12/23/14 09:35, Don Slutz wrote:
> Ping.
> 
> On 12/03/14 08:15, Don Slutz wrote:
>> From: Stefano Stabellini 
>>
>> Increase maxmem before calling xc_domain_populate_physmap_exact to
>> avoid the risk of running out of guest memory. This way we can also
>> avoid complex memory calculations in libxl at domain construction
>> time.
>>
>> This patch fixes an abort() when assigning more than 4 NICs to a VM.
>>
>> Signed-off-by: Stefano Stabellini 
>> Signed-off-by: Don Slutz 
>> ---
>> v2: Changes by Don Slutz
>>Switch from xc_domain_getinfo to xc_domain_getinfolist
>>Fix error check for xc_domain_getinfolist
>>Limit increase of maxmem to only do when needed:
>>  Add QEMU_SPARE_PAGES (How many pages to leave free)
>>  Add free_pages calculation
>>
>>   xen-hvm.c | 19 +++
>>   1 file changed, 19 insertions(+)
>>
>> diff --git a/xen-hvm.c b/xen-hvm.c
>> index 7548794..d30e77e 100644
>> --- a/xen-hvm.c
>> +++ b/xen-hvm.c
>> @@ -90,6 +90,7 @@ static inline ioreq_t
>> *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
>>   #endif
>> #define BUFFER_IO_MAX_DELAY  100
>> +#define QEMU_SPARE_PAGES 16
>> typedef struct XenPhysmap {
>>   hwaddr start_addr;
>> @@ -244,6 +245,8 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t
>> size, MemoryRegion *mr)
>>   unsigned long nr_pfn;
>>   xen_pfn_t *pfn_list;
>>   int i;
>> +xc_domaininfo_t info;
>> +unsigned long free_pages;
>> if (runstate_check(RUN_STATE_INMIGRATE)) {
>>   /* RAM already populated in Xen */
>> @@ -266,6 +269,22 @@ void xen_ram_alloc(ram_addr_t ram_addr,
>> ram_addr_t size, MemoryRegion *mr)
>>   pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
>>   }
>>   +if ((xc_domain_getinfolist(xen_xc, xen_domid, 1, &info) != 1) ||
>> +(info.domain != xen_domid)) {
>> +hw_error("xc_domain_getinfolist failed");
>> +}
>> +free_pages = info.max_pages - info.tot_pages;
>> +if (free_pages > QEMU_SPARE_PAGES) {
>> +free_pages -= QEMU_SPARE_PAGES;
>> +} else {
>> +free_pages = 0;
>> +}
>> +if ((free_pages < nr_pfn) &&
>> +(xc_domain_setmaxmem(xen_xc, xen_domid,
>> + ((info.max_pages + nr_pfn - free_pages)
>> +  << (XC_PAGE_SHIFT - 10))) < 0)) {
>> +hw_error("xc_domain_setmaxmem failed");
>> +}
>>   if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn,
>> 0, 0, pfn_list)) {
>>   hw_error("xen: failed to populate ram at " RAM_ADDR_FMT,
>> ram_addr);
>>   }
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 06/12] xen: mark grant mapped pages as foreign

2015-01-09 Thread David Vrabel
On 09/01/15 16:19, Ian Campbell wrote:
> On Fri, 2015-01-09 at 16:03 +, Stefano Stabellini wrote:
>> On Tue, 6 Jan 2015, David Vrabel wrote:
>>> From: Jenny Herbert 
>>>
>>> Use the "foreign" page flag to mark pages that have a grant map.  Use
>>> page->private to store information of the grant (the granting domain
>>> and the grant reference).
>>>
>>> Signed-off-by: Jenny Herbert 
>>> Signed-off-by: David Vrabel 
>>> ---
>>>  arch/x86/xen/p2m.c|   50 
>>> ++---
>>>  include/xen/grant_table.h |   13 
>>>  2 files changed, 56 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
>>> index 0d70814..22624a3 100644
>>> --- a/arch/x86/xen/p2m.c
>>> +++ b/arch/x86/xen/p2m.c
>>> @@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned 
>>> long mfn)
>>> return true;
>>>  }
>>>  
>>> +static int
>>> +init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref)
>>> +{
>>> +#ifdef CONFIG_X86_64
>>> +   uint64_t gref;
>>> +   uint64_t* gref_p = &gref;
>>> +#else
>>> +   uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL);
>>> +   if (!gref)
>>> +   return -ENOMEM;
>>> +   uint64_t* gref = gref_p;
>>> +#endif
>>> +
>>> +   *gref_p = ((uint64_t) grantref << 32) | domid;
>>> +   p->private = gref;
>>> +
>>> +   WARN_ON(PagePrivate(p));
>>> +   WARN_ON(PageForeign(p));
>>> +   SetPagePrivate(p);
>>> +   SetPageForeign(p);
>>> +   return 0;
>>> +}
>>> +
>>> +static void
>>> +clear_page_grant_ref(struct page *p)
>>> +{
>>> +   WARN_ON(!PagePrivate(p));
>>> +   WARN_ON(!PageForeign(p));
>>> +
>>> +#ifndef CONFIG_X86_64
>>> +   kfree(p->private);
>>> +#endif
>>> +   p->private = 0;
>>> +   ClearPagePrivate(p);
>>> +   ClearPageForeign(p);
>>> +}
>>
>> Given that get_page_grant_ref is used by netback, these functions need
>> to be made arch-independent, moved to an arch-independent code location
>> and called appropriately.

I've fixed this already.

> ... or stubbed out for arches which don't need this (which might include
> arm*?).

I'm reasonably certain that this is required for HVM and ARM guests as
well.  The grant copy will still fail to get the page by gfn since the
mfn is owned by a different domain.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 12/12] xen/gntdev: provide a set of pages for each VMA

2015-01-09 Thread David Vrabel
On 09/01/15 16:05, Stefano Stabellini wrote:
> On Fri, 9 Jan 2015, Stefano Stabellini wrote:
>> On Tue, 6 Jan 2015, David Vrabel wrote:
>>> From: Jenny Herbert 
>>>
>>> For each VMA for grant maps, provide the correct array of pages so
>>> get_user_pages() on foreign mappings works in PV guests.
>>>
>>> Signed-off-by: Jenny Herbert 
>>> Signed-off-by: David Vrabel 
>>> ---
>>>  drivers/xen/gntdev.c |2 ++
>>>  1 file changed, 2 insertions(+)
>>>
>>> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
>>> index 5d73fe8..09e7863 100644
>>> --- a/drivers/xen/gntdev.c
>>> +++ b/drivers/xen/gntdev.c
>>> @@ -870,6 +870,8 @@ static int gntdev_mmap(struct file *flip, struct 
>>> vm_area_struct *vma)
>>> vma->vm_end - vma->vm_start,
>>> set_grant_ptes_as_special, NULL);
>>> }
>>> +
>>> +   vma->pages = map->pages;
>>
>> Shouldn't this be done just for x86 PV guests rather than all the
>> callers of gntdev_mmap?
> 
> I guess that since on ARM and PVH the pte wouldn't be marked SPECIAL, it
> wouldn't make a difference, but it would still be nice to be consistent.

This is already PV only.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 06/12] xen: mark grant mapped pages as foreign

2015-01-09 Thread Stefano Stabellini
On Fri, 9 Jan 2015, David Vrabel wrote:
> On 09/01/15 16:19, Ian Campbell wrote:
> > On Fri, 2015-01-09 at 16:03 +, Stefano Stabellini wrote:
> >> On Tue, 6 Jan 2015, David Vrabel wrote:
> >>> From: Jenny Herbert 
> >>>
> >>> Use the "foreign" page flag to mark pages that have a grant map.  Use
> >>> page->private to store information of the grant (the granting domain
> >>> and the grant reference).
> >>>
> >>> Signed-off-by: Jenny Herbert 
> >>> Signed-off-by: David Vrabel 
> >>> ---
> >>>  arch/x86/xen/p2m.c|   50 
> >>> ++---
> >>>  include/xen/grant_table.h |   13 
> >>>  2 files changed, 56 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
> >>> index 0d70814..22624a3 100644
> >>> --- a/arch/x86/xen/p2m.c
> >>> +++ b/arch/x86/xen/p2m.c
> >>> @@ -648,6 +648,43 @@ bool set_phys_to_machine(unsigned long pfn, unsigned 
> >>> long mfn)
> >>>   return true;
> >>>  }
> >>>  
> >>> +static int
> >>> +init_page_grant_ref(struct page *p, domid_t domid, grant_ref_t grantref)
> >>> +{
> >>> +#ifdef CONFIG_X86_64
> >>> + uint64_t gref;
> >>> + uint64_t* gref_p = &gref;
> >>> +#else
> >>> + uint64_t* gref_p = kmalloc(sizeof(uint64_t), GFP_KERNEL);
> >>> + if (!gref)
> >>> + return -ENOMEM;
> >>> + uint64_t* gref = gref_p;
> >>> +#endif
> >>> +
> >>> + *gref_p = ((uint64_t) grantref << 32) | domid;
> >>> + p->private = gref;
> >>> +
> >>> + WARN_ON(PagePrivate(p));
> >>> + WARN_ON(PageForeign(p));
> >>> + SetPagePrivate(p);
> >>> + SetPageForeign(p);
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static void
> >>> +clear_page_grant_ref(struct page *p)
> >>> +{
> >>> + WARN_ON(!PagePrivate(p));
> >>> + WARN_ON(!PageForeign(p));
> >>> +
> >>> +#ifndef CONFIG_X86_64
> >>> + kfree(p->private);
> >>> +#endif
> >>> + p->private = 0;
> >>> + ClearPagePrivate(p);
> >>> + ClearPageForeign(p);
> >>> +}
> >>
> >> Given that get_page_grant_ref is used by netback, these functions need
> >> to be made arch-independent, moved to an arch-independent code location
> >> and called appropriately.
> 
> I've fixed this already.
> 
> > ... or stubbed out for arches which don't need this (which might include
> > arm*?).
> 
> I'm reasonably certain that this is required for HVM and ARM guests as
> well.  The grant copy will still fail to get the page by gfn since the
> mfn is owned by a different domain.

I agree

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 06/12] xen: mark grant mapped pages as foreign

2015-01-09 Thread Ian Campbell
On Fri, 2015-01-09 at 16:39 +, David Vrabel wrote:
> > ... or stubbed out for arches which don't need this (which might include
> > arm*?).
> 
> I'm reasonably certain that this is required for HVM and ARM guests as
> well.  The grant copy will still fail to get the page by gfn since the
> mfn is owned by a different domain.

Yes, I'd somehow convinced myself this snippet was to do with the
m2p_override aspect of the series, which it's not.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/traps: Avoid the #GP slowpath for guest #DB exceptions

2015-01-09 Thread Andrew Cooper
On 08/01/15 14:00, Jan Beulich wrote:
 On 08.01.15 at 14:25,  wrote:
>> do_debug() is capable of correctly dealing with #DB exceptions in guest
>> context, and indeed needs to be as the 'icebp' instruction skips the DPL
>> check anyway.
> I don't follow: ICEBP doesn't check DPL, right, but what does setting
> DPL to 3 buy us? Other than for INTO and INT3, we don't want to
> encourage use of INT $0x01 instructions, nor am I aware of anyone
> commonly using them. Yet afaict only they would be affected by your
> change. Actual #DB, just like any other hardware exceptions, don't
> consider DPL just like ICEBP doesn't.
>
> Jan
>

I noticed it as I was attempting to get my debugtraps test case working
for PV guests, which does indeed use 'int $0x1'.  But as you point out,
it is only 'int $0x1' we would gain a fastpath for, which doesn't really
make it worthwhile.

~Andrew


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [xen-4.2-testing test] 33282: tolerable FAIL - PUSHED

2015-01-09 Thread xen . org
flight 33282 xen-4.2-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/33282/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-i386-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 build-amd64-rumpuserxen   5 rumpuserxen-buildfail   never pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 debian-hvm-install  fail never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-i386-i386-libvirt9 guest-start  fail   never pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 debian-hvm-install fail never pass
 build-i386-rumpuserxen5 rumpuserxen-buildfail   never pass
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-i386-i386-xl-qemut-winxpsp3 14 guest-stop fail never pass
 test-amd64-i386-xend-winxpsp3 17 leak-check/check fail  never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-i386-i386-xl-qemuu-winxpsp3 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xend-qemut-winxpsp3 17 leak-check/checkfail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-i386-i386-xl-winxpsp3   14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass

version targeted for testing:
 xen  95af3f09eeef089e0100a8518f7ca75206e33c7c
baseline version:
 xen  353de6b221c2d0fb59edfceb1f535357e4d84825


People who touched revisions under test:
  Mihai Donțu 
  Răzvan Cojocaru 


jobs:
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-xl  pass
 test-amd64-i386-xl   pass
 test-i386-i386-xlpass
 test-amd64-i386-rhel6hvm-amd pass
 test-amd64-i386-qemut-rhel6hvm-amd   pass
 test-amd64-i386-qemuu-rhel6hvm-amd   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64pass
 test-amd64-i386-xl-qemut-debianhvm-amd64 pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 pass
 test-amd64-i386-qemuu-freebsd10-amd64pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 fail
 test-amd64-i386-xl-qemuu-ovmf-amd64  fail
 test-amd64-amd64-rumpuserxen-amd64   blocked
 test-amd64-amd64-xl-qemut-win7-amd64 fail
 test-amd64-i386-xl-qemut-win7-amd64  fail
 test-amd64-amd64-xl-qemuu-win7-amd64 fail
 test-amd64-i386-xl-qemuu-win7-amd64  fail
 test-amd64-amd64-xl-win7-amd64   fail
 test-amd64-i386-xl-win7-amd64fail
 test-amd64-i386-xl-credit2   pass
 test-amd64-i386-qemuu-freebsd10-i386 pass
 test-amd64-i386-rumpuserxen-i386 blocked
 test-i386-i386-rumpuserxen-i386  blocked
 test-amd64-amd64-xl-pcipt-intel

Re: [Xen-devel] [libvirt test] 33157: regressions - trouble: blocked/fail/pass/preparing/queued

2015-01-09 Thread Dario Faggioli
On Thu, 2015-01-08 at 14:56 -0700, Jim Fehlig wrote:
> Jim Fehlig wrote:
> > No, not directly.  But I will be doing so now.  I should try to revert
> > all this nonsense and use libxlu before it ends up in the next libvirt
> > release.
> >   
> 
> Hmm, I don't think that is going to be possible since libxlutil.h has
> never been installed :-(.  
>
Wow... That's bad! :-/

> From 3425c1cef21d0295fa8fbf9465ea7273b717f637 Mon Sep 17 00:00:00 2001
> From: Jim Fehlig 
> Date: Thu, 8 Jan 2015 14:43:28 -0700
> Subject: [PATCH] Install libxlutil.h
>
> libxlutil.{a,so} are installed, but not the corresponding header
> file.
>
> Signed-off-by: Jim Fehlig 
>
FWIW, Reviewed-by: Dario Faggioli 

BTW, would backporting this patch makes things at least a bit better?

Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH resend] libxl, hotplug/Linux: default to phy backend for raw format file, take 2

2015-01-09 Thread Dario Faggioli
On Fri, 2015-01-09 at 14:42 +, Wei Liu wrote:
> On Thu, Jan 08, 2015 at 03:33:17PM +0100, Fabio Fantoni wrote:

> > Sorry for the probably stupid question, what are the pros and cons of
> > default use of phy instead qdisk for raw files as domU disk?
> > 
> 
> There's no stupid question. :-)
> 
> I was told that it performs better and enables other potential
> improvements.
> 
Not a big deal, probably, but IMO it would be nice to have at least a
few words about when it's happening in the changelog, wouldn't it?

Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [libvirt test] 33157: regressions - trouble: blocked/fail/pass/preparing/queued

2015-01-09 Thread Jim Fehlig
Dario Faggioli wrote:
> On Thu, 2015-01-08 at 14:56 -0700, Jim Fehlig wrote:
>   
>> Jim Fehlig wrote:
>> 
>>> No, not directly.  But I will be doing so now.  I should try to revert
>>> all this nonsense and use libxlu before it ends up in the next libvirt
>>> release.
>>>   
>>>   
>> Hmm, I don't think that is going to be possible since libxlutil.h has
>> never been installed :-(.  
>>
>> 
> Wow... That's bad! :-/
>
>   
>> From 3425c1cef21d0295fa8fbf9465ea7273b717f637 Mon Sep 17 00:00:00 2001
>> From: Jim Fehlig 
>> Date: Thu, 8 Jan 2015 14:43:28 -0700
>> Subject: [PATCH] Install libxlutil.h
>>
>> libxlutil.{a,so} are installed, but not the corresponding header
>> file.
>>
>> Signed-off-by: Jim Fehlig 
>>
>> 
> FWIW, Reviewed-by: Dario Faggioli 
>
> BTW, would backporting this patch makes things at least a bit better?
>   

Yes.  As Ian noted, it should be backported everywhere.  But I'll also
provide a workaround in libvirt.

Regards,
Jim

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen 4.5 Development Update (GA slip by a week), GA slip + 1 day

2015-01-09 Thread Konrad Rzeszutek Wilk
On Fri, Jan 09, 2015 at 01:51:56PM +, Lars Kurth wrote:
> Note that I published the Acknowledgements and most 4.5 documentation is now 
> in place. Bits which you may want to check and fix
> * Spelling of your name if it contains special characters as the scripts that 
> I use nuke them - 
> http://wiki.xenproject.org/wiki/Xen_Project_4.5_Acknowledgements
> * Some additions to 
> http://wiki.xenproject.org/wiki/Xen_Project_Release_Features, which I will do
> 
> And we do need to populate the Known Issues section of 
> http://wiki.xenproject.org/wiki/Xen_Project_4.5_Release_Notes

I've added the one that I am aware (systemd).

In 4.4 we also had this 
http://wiki.xenproject.org/wiki?title=Xen_Project_4.4_Feature_List
which was linked of the  
http://wiki.xenproject.org/wiki/Xen_Project_4.4_Release_Notes

I can populate most of htem or we can just reference the blog that will be on
the release day?
> 
> Regards
> Lars
> 
> On 8 Jan 2015, at 19:30, Konrad Rzeszutek Wilk  wrote:
> 
>  = Timeline =
>  
>  Xen 4.5 is a 10 month release. The dates are:
>  
>  * Feature Freeze: 24th September 2014
>  * First RC: 24th October [Friday!]
>  * RC2: Nov 11th
>  * RC2 Test-day: Nov 13th
>  * RC3: Dec 3rd.
>  * RC3 Test-day: Dec 4th
>  * RC4: Dec 15th
>  * RC4 Test-day: Dec 17th
>  
>  < WE ARE HERE ===>
>  
>  Release Date: Jan 14th.
> > 
> > New date: Jan 15th.
> > 
> > Due to marketing reasons (LinuxFoundation has something
> > scheduled on Jan 14th and we don't want to overlap).
> > 
> > That should not change anything on the technical side as the
> > last commit went in yesterday. Just a bit longer..
> > 
>  
>  
>  ___
>  Xen-devel mailing list
>  Xen-devel@lists.xen.org
>  http://lists.xen.org/xen-devel
> >>> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen's Linux kernel config options V2

2015-01-09 Thread Konrad Rzeszutek Wilk
On Tue, Dec 16, 2014 at 05:21:05PM +0100, Juergen Gross wrote:
> Hi,
> 
> This is a design proposal for a rework of the config options on the
> Linux kernel which are related to Xen.
> 
> The need to do so arose from the fact that it is currently not
> possible to build the Xen frontend drivers for a non-pvops kernel,
> e.g. to run them in a HVM-domain. There are more drawbacks in the
> current config options to which I'll come later.
> 
> Current Xen related config options are as follows:
> 
> Option  Selects Depends
> -
> XEN PARAVIRT_CLOCK(x86) PARAVIRT(x86)
> XEN_HAVE_PVMMU(x86)
> SWIOTLB_XEN(arm,arm64)
>   PCI_XEN(x86)  SWIOTLB_XEN
>   XEN_DOM0  PCI_XEN(x86)
> XEN_BACKEND
>   XEN_BLKDEV_BACKEND
>   XEN_PCIDEV_BACKEND(x86)
>   XEN_SCSI_BACKEND
>   XEN_NETDEV_BACKEND
> XEN_ACPI_HOTPLUG_MEMORY XEN_STUB
> XEN_ACPI_HOTPLUG_CPUXEN_STUB
> XEN_MCE_LOG(x86)
>   XEN_PVHVM(x86)
> XEN_PVH(x86)
>   XEN_MAX_DOMAIN_MEMORY(x86)
>   XEN_SAVE_RESTORE(x86)
>   XEN_DEBUG_FS(x86)
>   XEN_FBDEV_FRONTENDXEN_XENBUS_FRONTEND
> INPUT_XEN_KBDDEV_FRONTEND
>   XEN_BLKDEV_FRONTEND   XEN_XENBUS_FRONTEND
>   HVC_XEN
> HVC_XEN_FRONTENDXEN_XENBUS_FRONTEND
>   TCG_XEN   XEN_XENBUS_FRONTEND
>   XEN_PCIDEV_FRONTEND   PCI_XEN
> XEN_XENBUS_FRONTEND
>   XEN_SCSI_FRONTEND XEN_XENBUS_FRONTEND
>   INPUT_XEN_KBDDEV_FRONTEND XEN_XENBUS_FRONTEND
>   XEN_WDT
>   XEN_BALLOON
> XEN_SELFBALLOONING  XEN_TMEM
> XEN_BALLOON_MEMORY_HOTPLUG
> XEN_SCRUB_PAGES
>   XEN_DEV_EVTCHN
>   XENFS XEN_PRIVCMD
> XEN_COMPAT_XENFS
>   XEN_SYS_HYPERVISOR
>   XEN_XENBUS_FRONTEND
>   XEN_GNTDEV
>   XEN_GRANT_DEV_ALLOC
>   SWIOTLB_XEN
>   XEN_TMEM(x86)
>   XEN_PRIVCMD
>   XEN_STUB(x86_64)  BROKEN
>   XEN_ACPI_PROCESSOR(x86)
>   XEN_HAVE_PVMMU
>   XEN_EFI(x64)
>   XEN_NETDEV_FRONTEND   XEN_XENBUS_FRONTEND
> 
> Legend:
> - The first column are the Xen config options. Indentation in this
>   column reflects the dependency between those options (e.g.
>   XEN_SCSI_BACKEND depends on XEN_BACKEND, which in turn depends on
>   XEN_DOM0, which depends on XEN).
> - The second column shows the options which are selected automatically
>   if the corresponding option in the first column is configured.
> - The last column shows additional dependencies which can't be shown via
>   the indentation level of the first column.
> - Some options or dependencies are architecture specific, they are
>   listed in brackets (e.g. XEN_TMEM which is available for x86 only).
> - Non-Xen options are mentioned only if they seem to be really relevant,
>   like e.g. PARAVIRT or BROKEN.
> 
> There are several reasons to change the config options:
> - Be able to build Xen frontend drivers for HVM domains without the need
>   for choosing a pvops kernel: currently the frontend drivers need XEN
>   configured which depends on PARAVIRT.
> - Be able to build a Dom0 using XEN_PVH without having to configure
>   XEN_HAVE_PVMMU.
> - Be able to build HVM driver domains.
> - Some features are available for x86 only, in spite of being not
>   architecture specific, e.g. XEN_DEBUG_FS or XEN_TMEM.
> 
> After some feedback for the first draft I'd suggest the following:
> 
> Option  Selects Depends
> --
> XEN
>   XEN_PV(x86)   XEN_HAVE_PVMMU
> PARAVIRT
> PARAVIRT_CLOCK
>   XEN_PVH(x86)  XEN_PVHVM
> PARAVIRT
> PARAVIRT_CLOCK
>   XEN_PVHVM PARAVIRT
> PARAVIRT_CLOCK
>   XEN_BACKEND   SWIOTLB_XEN(arm,arm64)  XEN_PV(x86) ||
> XEN_PVH(x86) ||
> XEN_PVHVM
> XEN_BLKDEV_BACKEND
> XEN_PCIDEV_BACKEND(x86)
> XEN_SCSI_BACKEND
> XEN_NETDEV_BACKEND
>   PCI_XEN(x86)  SWIOTLB_XEN
>   XEN_DOM0  XEN_BACKEND XEN_PV(x86) ||
> PCI_XEN(x86)XEN_PVH(x86)
> XEN_ACPI_HOTPLUG_MEMORY XEN_STUB
> XEN_ACPI_HOTPLUG_CPUXEN_STUB
> XEN_MCE_LOG(x86)

and XEN_ACPI_PROCESSOR(x86)

>   XEN_MAX_DOMAIN_MEMORY(x86)

which depends on XEN_PV

Re: [Xen-devel] One question about the hypercall to translate gfn to mfn.

2015-01-09 Thread Konrad Rzeszutek Wilk
On Fri, Jan 09, 2015 at 08:02:48AM +, Tian, Kevin wrote:
> > From: Tim Deegan [mailto:t...@xen.org]
> > Sent: Thursday, January 08, 2015 8:43 PM
> > 
> > Hi,
> > 
> > > > Not really.  The IOMMU tables are also 64-bit so there must be enough
> > > > addresses to map all of RAM.  There shouldn't be any need for these
> > > > mappings to be _contiguous_, btw.  You just need to have one free
> > > > address for each mapping.  Again, following how grant maps work, I'd
> > > > imagine that PVH guests will allocate an unused GFN for each mapping
> > > > and do enough bookkeeping to make sure they don't clash with other GFN
> > > > users (grant mapping, ballooning, &c).  PV guests will probably be
> > > > given a BFN by the hypervisor at map time (which will be == MFN in
> > > > practice) and just needs to pass the same BFN to the unmap call later
> > > > (it can store it in the GTT meanwhile).
> > >
> > > if possible prefer to make both consistent, i.e. always finding unused 
> > > GFN?
> > 
> > I don't think it will be possible.  PV domains are already using BFNs
> > supplied by Xen (in fact == MFN) for backend grant mappings, which
> > would conflict with supplying their own for these mappings.  But
> > again, I think the kernel maintainers for Xen may have a better idea
> > of how these interfaces are used inside the kernel.  For example,
> > it might be easy enough to wrap the two systems inside a common API
> > inside linux.   Again, following how grant mapping works seems like
> > the way forward.
> > 
> 
> So Konrad, do you have any insight here? :-)

For grants we end up making the 'struct page' for said grant be visible
in our linear space. We stash the original BFNs(MFN) in the 'struct page'
and replace the P2M in PV guests with the new BFN(MFN). David and Jenniefer
is working on making this more lightweight.

How often do we these updates? We could also do simpler way - which is
what backend drivers do - is to get a swath of vmalloc memory and hooking
the BFNs to it.  That can stay for quite some time.

The neat thing about vmalloc is that it is an sliding-window
type mechanism to deal with memory that is not usually accessed via
linear page tables. 

I suppose the complexity behind this is that this 'window' at the GPU
page tables needs to change. As in it moves around as there are different
guests doing things. So the mechanism of swapping this 'window' is going
to be expensive to map/unmap (as you have to flush the TLBs in the 
initial domain for the page-tables - unless you have multiple
'windows' and we flush the olders ones lazily? But that sounds complex).

Who is doing the audit/modification ? Is it some application in the
initial domain (backend) domain or some driver in the kernel?

> 
> Thanks
> Kevin

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-09 Thread Konrad Rzeszutek Wilk
On Thu, Jan 08, 2015 at 06:02:04PM +, George Dunlap wrote:
> On Thu, Jan 8, 2015 at 4:10 PM, Jan Beulich  wrote:
>  On 08.01.15 at 16:59,  wrote:
> >> On Thu, Jan 8, 2015 at 1:54 PM, Jan Beulich  wrote:
>  the 1st invocation of this interface will save all reported reserved
>  regions under domain structure, and later invocation (e.g. from
>  hvmloader) gets saved content.
> >>>
> >>> Why would the reserved regions need attaching to the domain
> >>> structure? The combination of (to be) assigned devices and
> >>> global RMRR list always allow reproducing the intended set of
> >>> regions without any extra storage.
> >>
> >> So when you say "(to be) assigned devices", you mean any device which
> >> is currently assigned, *or may be assigned at some point in the
> >> future*?
> >
> > Yes.
> >
> >> Do you think the extra storage for "this VM might possibly be assigned
> >> this device at some point" wouldn't really be that much bigger than
> >> "this VM might possibly map this RMRR at some point in the future"?
> >
> > Since listing devices without RMRR association would be pointless,
> > I think a list of devices would require less storage. But see below.
> >
> >> It seems a lot cleaner to me to have the toolstack tell Xen what
> >> ranges are reserved for RMRR per VM, and then have Xen check again
> >> when assigning a device to make sure that the RMRRs have already been
> >> reserved.
> >
> > With an extra level of what can be got wrong by the admin.
> > However, I now realize that doing it this way would allow
> > specifying regions not associated with any device on the host
> > the guest boots on, but associated with one on a host the guest
> > may later migrate to.
> 
> I did say the toolstack, not the admin. :-)
> 
> At the xl level, I envisioned a single boolean that would say, "Make
> my memory layout resemble the host system" -- so the MMIO hole would
> be the same size, and all the RMRRs would be reserved.

Like the e820_host=1 ? :-)

> 
> But xapi, for instance, has a concept of "hardware pools" containing
> individual hardware devices, which can be assigned to VMs.  You could
> imagine a toolstack like xapi keeping track of all devices which
> *might be* assigned to a guest, and supplying Xen with the RMRRs.  As
> you say, then this could include hardware across a pool of hosts, with
> the RMRRs of any device in the system reserved.
> 
> Alternately, could the toolstack could be responsible for making sure
> that nobody uses such a range; and then Xen when a device is assigned,
> Xen can check to make sure that the gpfn space is empty before adding
> the RMRRs?  That might be the most flexible.
> 
>  -George
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Bugfix] x86/apic: Fix xen IRQ allocation failure caused by commit b81975eade8c

2015-01-09 Thread Konrad Rzeszutek Wilk
On Thu, Jan 08, 2015 at 02:36:38PM +0800, Jiang Liu wrote:
> On 2015/1/7 23:44, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jan 07, 2015 at 11:37:52PM +0800, Jiang Liu wrote:
> >> On 2015/1/7 22:50, Konrad Rzeszutek Wilk wrote:
> >>> On Wed, Jan 07, 2015 at 02:13:49PM +0800, Jiang Liu wrote:
>  Commit b81975eade8c ("x86, irq: Clean up irqdomain transition code")
>  breaks xen IRQ allocation because xen_smp_prepare_cpus() doesn't invoke
>  setup_IO_APIC(), so no irqdomains created for IOAPICs and
>  mp_map_pin_to_irq() fails at the very beginning.
>  --- a/arch/x86/kernel/apic/io_apic.c
>  +++ b/arch/x86/kernel/apic/io_apic.c
>  @@ -2369,31 +2369,29 @@ static void ioapic_destroy_irqdomain(int idx)
>   ioapics[idx].pin_info = NULL;
>   }
>   
>  -void __init setup_IO_APIC(void)
>  +void __init setup_IO_APIC(bool xen_smp)
>   {
>   int ioapic;
>   
>  -/*
>  - * calling enable_IO_APIC() is moved to setup_local_APIC for BP
>  - */
>  -io_apic_irqs = nr_legacy_irqs() ? ~PIC_IRQS : ~0UL;
>  +if (!xen_smp) {
>  +apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n");
>  +io_apic_irqs = nr_legacy_irqs() ? ~PIC_IRQS : ~0UL;
>  +
>  +/* Set up IO-APIC IRQ routing. */
>  +x86_init.mpparse.setup_ioapic_ids();
>  +sync_Arb_IDs();
>  +}
> Hi Konrad,
>   Enabling above code for Xen dom0 will cause following warning
> because it writes a special value to ICR register.
> [3.394981] [ cut here ]
> [3.394985] WARNING: CPU: 0 PID: 1 at arch/x86/xen/enlighten.c:968
> xen_apic_write+0x15/0x20()
> [3.394988] Modules linked in:
> [3.394991] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.19.0-rc3+ #5
> [3.394993] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A03
> 09/17/2013
> [3.394996]  03c8 88003056bdd8 817611bb
> 03c8
> [3.395000]   88003056be18 8106f4ea
> 0008
> [3.395004]  81fc1120 880030561348 a108
> a101
> [3.395008] Call Trace:
> [3.395012]  [] dump_stack+0x4f/0x6c
> [3.395015]  [] warn_slowpath_common+0xaa/0xd0
> [3.395018]  [] warn_slowpath_null+0x15/0x20
> [3.395021]  [] xen_apic_write+0x15/0x20
> [3.395026]  [] sync_Arb_IDs+0x84/0x86
> [3.395029]  [] setup_IO_APIC+0x7f/0x8e3
> [3.395033]  [] ? trace_hardirqs_on+0xd/0x10
> [3.395036]  [] ? _raw_spin_unlock_irqrestore+0x8a/0xa0
> [3.395040]  [] xen_smp_prepare_cpus+0x5d/0x184
> [3.395044]  [] kernel_init_freeable+0x149/0x293
> [3.395047]  [] ? kernel_init+0x9/0xf0
> [3.395049]  [] ? rest_init+0xd0/0xd0
> [3.395052]  [] kernel_init+0x9/0xf0
> [3.395054]  [] ret_from_fork+0x7c/0xb0
> [3.395057]  [] ? rest_init+0xd0/0xd0
> [3.395066] ---[ end trace 7c4371c8ba33d5d0 ]---
> 
> 
>   ioapic_initialized = 1;
>  +
>  +if (!xen_smp) {
>  +init_IO_APIC_traps();
>  +if (nr_legacy_irqs())
>  +check_timer();
>  +}
>   }
> And enabling above code causes Xen dom0 reboots.


Which is due to the 'check_timer' trying to setup its timer and
failing and then moving under its feet the legacy_pic to the NULL one
and then hitting panic.

The 'check_timer' has the logic to swap the 'legacy_pic':

2186 legacy_pic->init(1);   
 

which ends up executing:

317 new_val = inb(PIC_MASTER_IMR);  

318 if (new_val != probe_val) { 

319 printk(KERN_INFO "Using NULL legacy PIC\n");

320 legacy_pic = &null_legacy_pic;  

321 raw_spin_unlock_irqrestore(&i8259A_lock, flags);

322 return; 

323 } 

And the 'legacy_pic' has now be swapped over to the 'null_legacy_pic'
for which:

2393 if (nr_legacy_irqs())  
 
2394 check_timer(); 
 
2395
 

 70 static inline int nr_legacy_irqs(void)  

 71 {   

 72 return legacy_pic->nr_legacy_irqs;  

 73 }   

 74

would return zero (and not invoke the 'check

[Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m

2015-01-09 Thread Ed White
This set of patches adds support to hvm domains for EPTP switching by creating
multiple copies of the host p2m (currently limited to 10 copies).

The primary use of this capability is expected to be in scenarios where access
to memory needs to be monitored and/or restricted below the level at which the
guest OS page tables operate. Two examples that were discussed at the 2014 Xen
developer summit are:

VM introspection: 
http://www.slideshare.net/xen_com_mgr/
zero-footprint-guest-memory-introspection-from-xen

Secure inter-VM communication:
http://www.slideshare.net/xen_com_mgr/nakajima-nvf

Each p2m copy is populated lazily on EPT violations, and only contains entries 
for
ram p2m types. Permissions for pages in alternate p2m's can be changed in a 
similar
way to the existing memory access interface, and gfn->mfn mappings can be 
changed.

All this is done through extra HVMOP types.

The cross-domain HVMOP code has been compile-tested only. Also, the cross-domain
code is hypervisor-only, the toolstack has not been modified.

The intra-domain code has been tested. Violation notifications can only be 
received
for pages that have been modified (access permissions and/or gfn->mfn mapping) 
intra-domain, and only on VCPU's that have enabled notification.

VMFUNC and #VE will both be emulated on hardware without native support.

This code is not compatible with nested hvm functionality and will refuse to 
work
with nested hvm active. It is also not compatible with migration. It should be
considered experimental.

Ed White (11):
  VMX: VMFUNC and #VE definitions and detection.
  VMX: implement suppress #VE.
  x86/HVM: Hardware alternate p2m support detection.
  x86/MM: Improve p2m type checks.
  x86/altp2m: basic data structures and support routines.
  VMX/altp2m: add code to support EPTP switching and #VE.
  x86/altp2m: introduce p2m_ram_rw_ve type.
  x86/altp2m: add remaining support routines.
  x86/altp2m: define and implement alternate p2m HVMOP types.
  x86/altp2m: fix log-dirty handling.
  x86/altp2m: alternate p2m memory events.

 docs/misc/xen-command-line.markdown |   7 +
 xen/arch/x86/hvm/Makefile   |   3 +-
 xen/arch/x86/hvm/altp2mhvm.c|  77 ++
 xen/arch/x86/hvm/hvm.c  | 264 +++-
 xen/arch/x86/hvm/vmx/vmcs.c |  40 +++
 xen/arch/x86/hvm/vmx/vmx.c  | 139 +++
 xen/arch/x86/mm/guest_walk.c|   2 +-
 xen/arch/x86/mm/hap/Makefile|   1 +
 xen/arch/x86/mm/hap/altp2m_hap.c| 191 +++
 xen/arch/x86/mm/hap/guest_walk.c|   4 +-
 xen/arch/x86/mm/hap/hap.c   |  30 ++-
 xen/arch/x86/mm/mm-locks.h  |   4 +
 xen/arch/x86/mm/p2m-ept.c   |  40 ++-
 xen/arch/x86/mm/p2m.c   | 472 +++-
 xen/arch/x86/mm/paging.c|   5 -
 xen/common/mem_access.c |   1 +
 xen/include/asm-arm/p2m.h   |   7 +
 xen/include/asm-x86/domain.h|   7 +
 xen/include/asm-x86/hvm/altp2mhvm.h |  42 
 xen/include/asm-x86/hvm/hvm.h   |  23 ++
 xen/include/asm-x86/hvm/vcpu.h  |   9 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |  16 ++
 xen/include/asm-x86/hvm/vmx/vmx.h   |  14 +-
 xen/include/asm-x86/msr-index.h |   1 +
 xen/include/asm-x86/p2m.h   |  61 -
 xen/include/public/hvm/hvm_op.h |  68 ++
 xen/include/public/mem_event.h  |   9 +
 27 files changed, 1513 insertions(+), 24 deletions(-)
 create mode 100644 xen/arch/x86/hvm/altp2mhvm.c
 create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
 create mode 100644 xen/include/asm-x86/hvm/altp2mhvm.h

-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 10/11] x86/altp2m: fix log-dirty handling.

2015-01-09 Thread Ed White
Log-dirty, as used to track vram changes, works exclusively on the host p2m.
As a result, when running on any other p2m vram changes aren't tracked
properly and the domain's console display is corrupted.

To fix this, log-dirty pages are never valid in the alternate p2m's, and
if the type of any page in the host p2m is changed, that page is immediately
removed from any alternate p2m in which it was previously valid.

This requires taking the alternate p2m list lock, so to avoid a locking
order violation p2m_change_type_one() must not be called with the host p2m
lock held. This requires a minor change to the exit code flow in the
nested page fault handler, and removing the p2m locking code in
paging_log_dirty_range().

As far as I can tell, removing the latter code is safe since
p2m_change_type_one() acquires a gfn lock on the page before changing it.

With these changes, the alternate p2m nested page fault handler can safely
ignore log-dirty and leave it to be handled in the host p2m nested page
fault handler.

Signed-off-by: Ed White 
---
 xen/arch/x86/hvm/hvm.c   | 4 +++-
 xen/arch/x86/mm/p2m.c| 4 
 xen/arch/x86/mm/paging.c | 5 -
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index afe16bf..18d5987 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2885,6 +2885,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long 
gla,
 /* Spurious fault? PoD and log-dirty also take this path. */
 if ( p2m_is_ram(p2mt) )
 {
+rc = 1;
 /*
  * Page log dirty is always done with order 0. If this mfn resides in
  * a large page, we do not change other pages type within that large
@@ -2893,9 +2894,10 @@ int hvm_hap_nested_page_fault(paddr_t gpa, unsigned long 
gla,
 if ( npfec.write_access )
 {
 paging_mark_dirty(v->domain, mfn_x(mfn));
+put_gfn(p2m->domain, gfn);
 p2m_change_type_one(v->domain, gfn, p2m_ram_logdirty, p2m_ram_rw);
+goto out;
 }
-rc = 1;
 goto out_put_gfn;
 }
 
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 44bf1ad..843a433 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -793,6 +793,10 @@ int p2m_change_type_one(struct domain *d, unsigned long 
gfn,
 
 gfn_unlock(p2m, gfn, 0);
 
+if ( pt == ot && altp2mhvm_active(d) )
+/* make sure this page isn't valid in any alternate p2m */
+p2m_remove_altp2m_page(d, gfn);
+
 return rc;
 }
 
diff --git a/xen/arch/x86/mm/paging.c b/xen/arch/x86/mm/paging.c
index 6b788f7..2be68ae 100644
--- a/xen/arch/x86/mm/paging.c
+++ b/xen/arch/x86/mm/paging.c
@@ -574,7 +574,6 @@ void paging_log_dirty_range(struct domain *d,
unsigned long nr,
uint8_t *dirty_bitmap)
 {
-struct p2m_domain *p2m = p2m_get_hostp2m(d);
 int i;
 unsigned long pfn;
 
@@ -588,14 +587,10 @@ void paging_log_dirty_range(struct domain *d,
  * switched to read-write.
  */
 
-p2m_lock(p2m);
-
 for ( i = 0, pfn = begin_pfn; pfn < begin_pfn + nr; i++, pfn++ )
 if ( !p2m_change_type_one(d, pfn, p2m_ram_rw, p2m_ram_logdirty) )
 dirty_bitmap[i >> 3] |= (1 << (i & 7));
 
-p2m_unlock(p2m);
-
 flush_tlb_mask(d->domain_dirty_cpumask);
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH 11/11] x86/altp2m: alternate p2m memory events.

2015-01-09 Thread Ed White
Add a flag to indicate that a memory event occurred in an alternate p2m
and a field containing the p2m index. Allow the response to switch to
a different p2m using the same flag and field.

Modify p2m_access_check() to handle alternate p2m's. Access_required is
always assumed for an alternate p2m.

Signed-off-by: Ed White 
---
 xen/arch/x86/mm/hap/altp2m_hap.c | 53 ++--
 xen/arch/x86/mm/p2m.c| 18 --
 xen/common/mem_access.c  |  1 +
 xen/include/asm-arm/p2m.h|  7 ++
 xen/include/asm-x86/p2m.h|  4 +++
 xen/include/public/mem_event.h   |  9 +++
 6 files changed, 88 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/hap/altp2m_hap.c b/xen/arch/x86/mm/hap/altp2m_hap.c
index b889626..dd56bbc 100644
--- a/xen/arch/x86/mm/hap/altp2m_hap.c
+++ b/xen/arch/x86/mm/hap/altp2m_hap.c
@@ -19,6 +19,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -63,8 +64,9 @@ altp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long 
gfn,
  * else indicate that outer handler should handle fault
  *
  * If the fault is for a present entry:
- * if the page type is not p2m_ram_rw_ve crash domain
- * else if hardware does not support #VE emulate it and retry
+ * if the page type is p2m_ram_rw_ve and hardware does not support #VE
+ *  emulate #VE and retry if successful
+ * else try to send a memory event
  * else crash domain
  */
 
@@ -90,11 +92,58 @@ altp2mhvm_hap_nested_page_fault(struct vcpu *v, paddr_t gpa,
 
 if ( mfn_valid(mfn) )
 {
+bool_t violation;
+mem_event_request_t *req_ptr = NULL;
+
 /* Should #VE be emulated for this fault? */
 if ( p2mt == p2m_ram_rw_ve && !cpu_has_vmx_virt_exceptions &&
  ahvm_vcpu_emulate_ve(v) )
 return ALTP2MHVM_PAGEFAULT_DONE;
 
+/* Fault not handled yet, so try for mem_event */
+switch (p2ma)
+{
+case p2m_access_n:
+case p2m_access_n2rwx:
+default:
+violation = npfec.read_access || npfec.write_access || 
npfec.insn_fetch;
+break;
+case p2m_access_r:
+violation = npfec.write_access || npfec.insn_fetch;
+break;
+case p2m_access_w:
+violation = npfec.read_access || npfec.insn_fetch;
+break;
+case p2m_access_x:
+violation = npfec.read_access || npfec.write_access;
+break;
+case p2m_access_rx:
+case p2m_access_rx2rw:
+violation = npfec.write_access;
+break;
+case p2m_access_wx:
+violation = npfec.read_access;
+break;
+case p2m_access_rw:
+violation = npfec.insn_fetch;
+break;
+case p2m_access_rwx:
+violation = 0;
+break;
+}
+
+if ( violation )
+{
+p2m_mem_access_check(gpa, gla, npfec, &req_ptr);
+
+if ( req_ptr )
+{
+mem_access_send_req(v->domain, req_ptr);
+xfree(req_ptr);
+return ALTP2MHVM_PAGEFAULT_DONE;
+}
+}
+
 /* Could not handle fault here */
 gdprintk(XENLOG_INFO, "Altp2m memory access permissions failure, "
   "no mem_event listener VCPU %d, dom %d\n",
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 843a433..d296c8f 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1486,6 +1486,13 @@ void p2m_mem_event_emulate_check(struct vcpu *v, const 
mem_event_response_t *rsp
 }
 }
 
+void p2m_mem_event_altp2m_check(struct vcpu *v, const mem_event_response_t 
*rsp)
+{
+if ( (rsp->flags & MEM_EVENT_FLAG_ALTERNATE_P2M) &&
+ altp2mhvm_active(v->domain) )
+p2m_switch_vcpu_altp2m_by_id(v, rsp->altp2m_idx);
+}
+
 void p2m_setup_introspection(struct domain *d)
 {
 if ( hvm_funcs.enable_msr_exit_interception )
@@ -1502,7 +1509,8 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long 
gla,
 struct vcpu *v = current;
 unsigned long gfn = gpa >> PAGE_SHIFT;
 struct domain *d = v->domain;
-struct p2m_domain* p2m = p2m_get_hostp2m(d);
+struct p2m_domain *p2m = altp2mhvm_active(v->domain) ?
+p2m_get_altp2m(v) : p2m_get_hostp2m(d);
 mfn_t mfn;
 p2m_type_t p2mt;
 p2m_access_t p2ma;
@@ -1536,7 +1544,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long 
gla,
 if ( !mem_event_check_ring(&d->mem_event->access) || !req_ptr ) 
 {
 /* No listener */
-if ( p2m->access_required ) 
+if ( p2m->access_required || altp2mhvm_active(v->domain) )
 {
 gdprintk(XENLOG_INFO, "Memory access permissions failure, "
   "no mem_event listener VCPU %d, dom %d\n",
@@ -1612,6 +1620,12 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long 
gla,
 req->vcpu_id = v->vc

[Xen-devel] [PATCH 09/11] x86/altp2m: define and implement alternate p2m HVMOP types.

2015-01-09 Thread Ed White
Signed-off-by: Ed White 
---
 xen/arch/x86/hvm/hvm.c  | 217 
 xen/include/public/hvm/hvm_op.h |  68 +
 2 files changed, 285 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index e6f64a3..afe16bf 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -6145,6 +6145,223 @@ long do_hvm_op(unsigned long op, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 break;
 }
 
+case HVMOP_altp2m_get_domain_state:
+{
+struct xen_hvm_altp2m_domain_state a;
+struct domain *d;
+
+if ( copy_from_guest(&a, arg, 1) )
+return -EFAULT;
+
+d = rcu_lock_domain_by_any_id(a.domid);
+if ( d == NULL )
+return -ESRCH;
+
+rc = -EINVAL;
+if ( !is_hvm_domain(d) || !hvm_altp2m_supported() )
+goto param_fail9;
+
+a.state = altp2mhvm_active(d);
+rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
+
+param_fail9:
+rcu_unlock_domain(d);
+break;
+}
+
+case HVMOP_altp2m_set_domain_state:
+{
+struct xen_hvm_altp2m_domain_state a;
+struct domain *d;
+struct vcpu *v;
+bool_t ostate;
+
+if ( copy_from_guest(&a, arg, 1) )
+return -EFAULT;
+
+d = rcu_lock_domain_by_any_id(a.domid);
+if ( d == NULL )
+return -ESRCH;
+
+rc = -EINVAL;
+if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+ nestedhvm_enabled(d) )
+goto param_fail10;
+
+ostate = d->arch.altp2m_active;
+d->arch.altp2m_active = !!a.state;
+
+/* If the alternate p2m state has changed, handle appropriately */
+if ( d->arch.altp2m_active != ostate )
+{
+if ( !ostate && !p2m_init_altp2m_by_id(d, 0) )
+goto param_fail10;
+
+for_each_vcpu( d, v )
+if (!ostate)
+altp2mhvm_vcpu_initialise(v);
+else
+altp2mhvm_vcpu_destroy(v);
+
+if ( ostate )
+p2m_flush_altp2m(d);
+}
+
+rc = 0;
+
+param_fail10:
+rcu_unlock_domain(d);
+break;
+}
+
+case HVMOP_altp2m_vcpu_enable_notify:
+{
+struct vcpu *curr = current;
+struct xen_hvm_altp2m_vcpu_enable_notify a;
+
+if ( copy_from_guest(&a, arg, 1) )
+return -EFAULT;
+
+if ( !is_hvm_domain(curr_d) || !hvm_altp2m_supported() ||
+ !curr_d->arch.altp2m_active || vcpu_altp2mhvm(curr).veinfo )
+return -EINVAL;
+
+vcpu_altp2mhvm(curr).veinfo = a.pfn;
+ahvm_vcpu_update_vmfunc_ve(curr);
+rc = 0;
+
+break;
+}
+
+case HVMOP_altp2m_create_p2m:
+{
+struct xen_hvm_altp2m_view a;
+struct domain *d;
+
+if ( copy_from_guest(&a, arg, 1) )
+return -EFAULT;
+
+d = rcu_lock_domain_by_any_id(a.domid);
+if ( d == NULL )
+return -ESRCH;
+
+rc = -EINVAL;
+if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+ !d->arch.altp2m_active )
+goto param_fail11;
+
+if ( !p2m_init_next_altp2m(d, &a.view) )
+goto param_fail11;
+
+p2m_set_altp2m_mem_access(d, a.view, ~0ul, a.hvmmem_default_access);
+
+rc = copy_to_guest(arg, &a, 1) ? -EFAULT : 0;
+
+param_fail11:
+rcu_unlock_domain(d);
+break;
+}
+
+case HVMOP_altp2m_destroy_p2m:
+{
+struct xen_hvm_altp2m_view a;
+struct domain *d;
+
+if ( copy_from_guest(&a, arg, 1) )
+return -EFAULT;
+
+d = rcu_lock_domain_by_any_id(a.domid);
+if ( d == NULL )
+return -ESRCH;
+
+rc = -EINVAL;
+if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+ !d->arch.altp2m_active )
+goto param_fail12;
+
+if ( p2m_destroy_altp2m_by_id(d, a.view) )
+rc = 0;
+
+param_fail12:
+rcu_unlock_domain(d);
+break;
+}
+
+case HVMOP_altp2m_switch_p2m:
+{
+struct xen_hvm_altp2m_view a;
+struct domain *d;
+
+if ( copy_from_guest(&a, arg, 1) )
+return -EFAULT;
+
+d = rcu_lock_domain_by_any_id(a.domid);
+if ( d == NULL )
+return -ESRCH;
+
+rc = -EINVAL;
+if ( !is_hvm_domain(d) || !hvm_altp2m_supported() ||
+ !d->arch.altp2m_active )
+goto param_fail13;
+
+if ( p2m_switch_domain_altp2m_by_id(d, a.view) )
+rc = 0;
+
+param_fail13:
+rcu_unlock_domain(d);
+break;
+}
+
+case HVMOP_altp2m_set_mem_access:
+{
+struct xen_hvm_altp2m_set_mem_access a;
+struct domain *d;
+
+if ( copy_from_guest(&a, arg, 1) )
+return -EFAULT;
+
+d = rcu_lock_domain_by_any_id(a.domid);
+if ( d == NULL )
+   

[Xen-devel] [PATCH 01/11] VMX: VMFUNC and #VE definitions and detection.

2015-01-09 Thread Ed White
Currently, neither is enabled globally but may be enabled on a per-VCPU
basis by the altp2m code.

Everything can be force-disabled globally by specifying vmfunc=0 on the
Xen command line.

Remove the check for EPTE bit 63 == zero in ept_split_super_page(), as
that bit is now hardware-defined.

Signed-off-by: Ed White 
---
 docs/misc/xen-command-line.markdown |  7 +++
 xen/arch/x86/hvm/vmx/vmcs.c | 40 +
 xen/arch/x86/mm/p2m-ept.c   |  1 -
 xen/include/asm-x86/hvm/vmx/vmcs.h  | 16 +++
 xen/include/asm-x86/hvm/vmx/vmx.h   | 13 +++-
 xen/include/asm-x86/msr-index.h |  1 +
 6 files changed, 76 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown 
b/docs/misc/xen-command-line.markdown
index 152ae03..00fbae7 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1305,6 +1305,13 @@ The optional `keep` parameter causes Xen to continue 
using the vga
 console even after dom0 has been started.  The default behaviour is to
 relinquish control to dom0.
 
+### vmfunc (Intel)
+> `= `
+
+> Default: `true`
+
+Use VMFUNC and #VE support if available.
+
 ### vpid (Intel)
 > `= `
 
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 9d8033e..4274e92 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -50,6 +50,9 @@ boolean_param("unrestricted_guest", 
opt_unrestricted_guest_enabled);
 static bool_t __read_mostly opt_apicv_enabled = 1;
 boolean_param("apicv", opt_apicv_enabled);
 
+static bool_t __read_mostly opt_vmfunc_enabled = 1;
+boolean_param("vmfunc", opt_vmfunc_enabled);
+
 /*
  * These two parameters are used to config the controls for Pause-Loop Exiting:
  * ple_gap:upper bound on the amount of time between two successive
@@ -71,6 +74,8 @@ u32 vmx_secondary_exec_control __read_mostly;
 u32 vmx_vmexit_control __read_mostly;
 u32 vmx_vmentry_control __read_mostly;
 u64 vmx_ept_vpid_cap __read_mostly;
+u64 vmx_vmfunc __read_mostly;
+bool_t vmx_virt_exception __read_mostly;
 
 const u32 vmx_introspection_force_enabled_msrs[] = {
 MSR_IA32_SYSENTER_EIP,
@@ -110,6 +115,8 @@ static void __init vmx_display_features(void)
 P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery");
 P(cpu_has_vmx_posted_intr_processing, "Posted Interrupt Processing");
 P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
+P(cpu_has_vmx_vmfunc, "VM Functions");
+P(cpu_has_vmx_virt_exceptions, "Virtualization Exceptions");
 #undef P
 
 if ( !printed )
@@ -154,6 +161,7 @@ static int vmx_init_vmcs_config(void)
 u64 _vmx_misc_cap = 0;
 u32 _vmx_vmexit_control;
 u32 _vmx_vmentry_control;
+u64 _vmx_vmfunc = 0;
 bool_t mismatch = 0;
 
 rdmsr(MSR_IA32_VMX_BASIC, vmx_basic_msr_low, vmx_basic_msr_high);
@@ -207,6 +215,9 @@ static int vmx_init_vmcs_config(void)
 opt |= SECONDARY_EXEC_ENABLE_VPID;
 if ( opt_unrestricted_guest_enabled )
 opt |= SECONDARY_EXEC_UNRESTRICTED_GUEST;
+if ( opt_vmfunc_enabled )
+opt |= SECONDARY_EXEC_ENABLE_VM_FUNCTIONS |
+   SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
 
 /*
  * "APIC Register Virtualization" and "Virtual Interrupt Delivery"
@@ -296,6 +307,24 @@ static int vmx_init_vmcs_config(void)
   || !(_vmx_vmexit_control & VM_EXIT_ACK_INTR_ON_EXIT) )
 _vmx_pin_based_exec_control  &= ~ PIN_BASED_POSTED_INTERRUPT;
 
+/* The IA32_VMX_VMFUNC MSR exists only when VMFUNC is available */
+if ( _vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS )
+{
+rdmsrl(MSR_IA32_VMX_VMFUNC, _vmx_vmfunc);
+
+/*
+ * VMFUNC leaf 0 (EPTP switching) must be supported.
+ *
+ * Or we just don't use VMFUNC.
+ */
+if ( !(_vmx_vmfunc & VMX_VMFUNC_EPTP_SWITCHING) )
+_vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;
+}
+
+/* Virtualization exceptions are only enabled if VMFUNC is enabled */
+if ( !(_vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
+_vmx_secondary_exec_control &= ~SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
+
 min = 0;
 opt = VM_ENTRY_LOAD_GUEST_PAT | VM_ENTRY_LOAD_BNDCFGS;
 _vmx_vmentry_control = adjust_vmx_controls(
@@ -316,6 +345,9 @@ static int vmx_init_vmcs_config(void)
 vmx_vmentry_control= _vmx_vmentry_control;
 vmx_basic_msr  = ((u64)vmx_basic_msr_high << 32) |
  vmx_basic_msr_low;
+vmx_vmfunc = _vmx_vmfunc;
+vmx_virt_exception = !!(_vmx_secondary_exec_control &
+   SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS);
 vmx_display_features();
 
 /* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
@@ -352,6 +384,9 @@ static int vmx_init_vmcs_config(void)
 mismatch |= 

[Xen-devel] [PATCH 06/11] VMX/altp2m: add code to support EPTP switching and #VE.

2015-01-09 Thread Ed White
Implement and hook up the code to enable VMX support of VMFUNC and #VE.

VMFUNC leaf 0 (EPTP switching) and #VE are emulated on hardware that
doesn't support them.

Signed-off-by: Ed White 
---
 xen/arch/x86/hvm/vmx/vmx.c | 138 +
 1 file changed, 138 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 931709b..a0a2d02 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -56,6 +56,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -1718,6 +1719,91 @@ static void vmx_enable_msr_exit_interception(struct 
domain *d)
  MSR_TYPE_W);
 }
 
+static void vmx_vcpu_update_eptp(struct vcpu *v)
+{
+struct domain *d = v->domain;
+struct p2m_domain *p2m = altp2mhvm_active(d) ?
+p2m_get_altp2m(v) : p2m_get_hostp2m(d);
+struct ept_data *ept = &p2m->ept;
+
+ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+
+vmx_vmcs_enter(v);
+
+__vmwrite(EPT_POINTER, ept_get_eptp(ept));
+
+vmx_vmcs_exit(v);
+}
+
+static void vmx_vcpu_update_vmfunc_ve(struct vcpu *v)
+{
+struct domain *d = v->domain;
+u32 mask = SECONDARY_EXEC_ENABLE_VM_FUNCTIONS;
+
+if ( !cpu_has_vmx_vmfunc )
+return;
+
+if ( cpu_has_vmx_virt_exceptions )
+mask |= SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS;
+
+vmx_vmcs_enter(v);
+
+if ( !d->is_dying && altp2mhvm_active(d) )
+{
+v->arch.hvm_vmx.secondary_exec_control |= mask;
+__vmwrite(VM_FUNCTION_CONTROL, VMX_VMFUNC_EPTP_SWITCHING);
+__vmwrite(EPTP_LIST_ADDR, virt_to_maddr(d->arch.altp2m_eptp));
+
+if ( cpu_has_vmx_virt_exceptions )
+{
+p2m_type_t t;
+mfn_t mfn;
+
+mfn = get_gfn_query_unlocked(d, vcpu_altp2mhvm(v).veinfo, &t);
+__vmwrite(VIRT_EXCEPTION_INFO, mfn_x(mfn) << PAGE_SHIFT);
+}
+}
+else
+v->arch.hvm_vmx.secondary_exec_control &= ~mask;
+
+__vmwrite(SECONDARY_VM_EXEC_CONTROL,
+v->arch.hvm_vmx.secondary_exec_control);
+
+vmx_vmcs_exit(v);
+}
+
+static bool_t vmx_vcpu_emulate_ve(struct vcpu *v)
+{
+bool_t rc = 0;
+ve_info_t *veinfo = vcpu_altp2mhvm(v).veinfo ?
+hvm_map_guest_frame_rw(vcpu_altp2mhvm(v).veinfo, 0) : NULL;
+
+if ( !veinfo )
+return 0;
+
+if ( veinfo->semaphore != 0 )
+goto out;
+
+rc = 1;
+
+veinfo->exit_reason = EXIT_REASON_EPT_VIOLATION;
+veinfo->semaphore = ~0l;
+veinfo->eptp_index = vcpu_altp2mhvm(v).p2midx;
+
+vmx_vmcs_enter(v);
+__vmread(EXIT_QUALIFICATION, &veinfo->exit_qualification);
+__vmread(GUEST_LINEAR_ADDRESS, &veinfo->gla);
+__vmread(GUEST_PHYSICAL_ADDRESS, &veinfo->gpa);
+vmx_vmcs_exit(v);
+
+hvm_inject_hw_exception(TRAP_virtualisation,
+HVM_DELIVER_NO_ERROR_CODE);
+
+out:
+hvm_unmap_guest_frame(veinfo, 0);
+return rc;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
 .name = "VMX",
 .cpu_up_prepare   = vmx_cpu_up_prepare,
@@ -1777,6 +1863,9 @@ static struct hvm_function_table __initdata 
vmx_function_table = {
 .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
 .hypervisor_cpuid_leaf = vmx_hypervisor_cpuid_leaf,
 .enable_msr_exit_interception = vmx_enable_msr_exit_interception,
+.ahvm_vcpu_update_eptp = vmx_vcpu_update_eptp,
+.ahvm_vcpu_update_vmfunc_ve = vmx_vcpu_update_vmfunc_ve,
+.ahvm_vcpu_emulate_ve = vmx_vcpu_emulate_ve,
 };
 
 const struct hvm_function_table * __init start_vmx(void)
@@ -2551,6 +2640,17 @@ static void vmx_vmexit_ud_intercept(struct cpu_user_regs 
*regs)
 hvm_inject_hw_exception(TRAP_invalid_op, HVM_DELIVER_NO_ERROR_CODE);
 break;
 case X86EMUL_EXCEPTION:
+/* check for a VMFUNC that should be emulated */
+if ( !cpu_has_vmx_vmfunc && altp2mhvm_active(current->domain) &&
+ ctxt.insn_buf_bytes >= 3 && ctxt.insn_buf[0] == 0x0f &&
+ ctxt.insn_buf[1] == 0x01 && ctxt.insn_buf[2] == 0xd4 &&
+ regs->eax == 0 &&
+ p2m_switch_vcpu_altp2m_by_id(current, (uint16_t)regs->ecx) )
+{
+regs->eip += 3;
+return;
+}
+
 if ( ctxt.exn_pending )
 hvm_inject_trap(&ctxt.trap);
 /* fall through */
@@ -2698,6 +2798,40 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 
 /* Now enable interrupts so it's safe to take locks. */
 local_irq_enable();
+ 
+/*
+ * If the guest has the ability to switch EPTP without an exit,
+ * figure out whether it has done so and update the altp2m data.
+ */
+if ( altp2mhvm_active(v->domain) &&
+(v->arch.hvm_vmx.secondary_exec_control &
+SECONDARY_EXEC_ENABLE_VM_FUNCTIONS) )
+{
+unsigned long idx;
+
+if ( v->arch.hvm_vmx.secondary_exec_control &
+SECONDARY_EXEC_ENABLE_VIRT_EXCEPT

[Xen-devel] [PATCH 05/11] x86/altp2m: basic data structures and support routines.

2015-01-09 Thread Ed White
Add the basic data structures needed to support alternate p2m's and
the functions to initialise them and tear them down.

Although Intel hardware can handle 512 EPTP's per hardware thread
concurrently, only 10 per domain are supported in this patch for
performance reasons.

The iterator in hap_enable() does need to handle 512, so that is now
uint16_t.

Signed-off-by: Ed White 
---
 xen/arch/x86/hvm/Makefile   |   3 +-
 xen/arch/x86/hvm/altp2mhvm.c|  77 +++
 xen/arch/x86/hvm/hvm.c  |  21 
 xen/arch/x86/mm/hap/Makefile|   1 +
 xen/arch/x86/mm/hap/altp2m_hap.c|  66 +++
 xen/arch/x86/mm/hap/hap.c   |  30 ++-
 xen/arch/x86/mm/mm-locks.h  |   4 ++
 xen/arch/x86/mm/p2m.c   | 102 
 xen/include/asm-x86/domain.h|   7 +++
 xen/include/asm-x86/hvm/altp2mhvm.h |  36 +
 xen/include/asm-x86/hvm/hvm.h   |  17 ++
 xen/include/asm-x86/hvm/vcpu.h  |   9 
 xen/include/asm-x86/p2m.h   |  22 
 13 files changed, 393 insertions(+), 2 deletions(-)
 create mode 100644 xen/arch/x86/hvm/altp2mhvm.c
 create mode 100644 xen/arch/x86/mm/hap/altp2m_hap.c
 create mode 100644 xen/include/asm-x86/hvm/altp2mhvm.h

diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index eea..5bf8b4f 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -22,4 +22,5 @@ obj-y += vlapic.o
 obj-y += vmsi.o
 obj-y += vpic.o
 obj-y += vpt.o
-obj-y += vpmu.o
\ No newline at end of file
+obj-y += vpmu.o
+obj-y += altp2mhvm.o
diff --git a/xen/arch/x86/hvm/altp2mhvm.c b/xen/arch/x86/hvm/altp2mhvm.c
new file mode 100644
index 000..fa0af0c
--- /dev/null
+++ b/xen/arch/x86/hvm/altp2mhvm.c
@@ -0,0 +1,77 @@
+/*
+ * Alternate p2m HVM
+ * Copyright (c) 2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+void
+altp2mhvm_vcpu_reset(struct vcpu *v)
+{
+struct altp2mvcpu *av = &vcpu_altp2mhvm(v);
+
+av->p2midx = 0;
+av->veinfo = 0;
+
+if ( hvm_funcs.ahvm_vcpu_reset )
+hvm_funcs.ahvm_vcpu_reset(v);
+}
+
+int
+altp2mhvm_vcpu_initialise(struct vcpu *v)
+{
+int rc = -EOPNOTSUPP;
+
+if ( v != current )
+vcpu_pause(v);
+
+if ( !hvm_funcs.ahvm_vcpu_initialise ||
+ (hvm_funcs.ahvm_vcpu_initialise(v) == 0) )
+{
+rc = 0;
+altp2mhvm_vcpu_reset(v);
+cpumask_set_cpu(v->vcpu_id, p2m_get_altp2m(v)->dirty_cpumask);
+
+ahvm_vcpu_update_eptp(v);
+}
+
+if ( v != current )
+vcpu_unpause(v);
+
+return rc;
+}
+
+void
+altp2mhvm_vcpu_destroy(struct vcpu *v)
+{
+if ( v != current )
+vcpu_pause(v);
+
+if ( hvm_funcs.ahvm_vcpu_destroy )
+hvm_funcs.ahvm_vcpu_destroy(v);
+
+cpumask_clear_cpu(v->processor, p2m_get_altp2m(v)->dirty_cpumask);
+altp2mhvm_vcpu_reset(v);
+
+ahvm_vcpu_update_eptp(v);
+ahvm_vcpu_update_vmfunc_ve(v);
+
+if ( v != current )
+vcpu_unpause(v);
+}
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index b89e9d2..e8787cc 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -60,6 +60,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2290,6 +2291,7 @@ void hvm_vcpu_destroy(struct vcpu *v)
 
 hvm_all_ioreq_servers_remove_vcpu(d, v);
 
+altp2mhvm_vcpu_destroy(v);
 nestedhvm_vcpu_destroy(v);
 
 free_compat_arg_xlat(v);
@@ -6377,6 +6379,25 @@ bool_t hvm_altp2m_supported()
 return hvm_funcs.altp2m_supported;
 }
 
+void ahvm_vcpu_update_eptp(struct vcpu *v)
+{
+if (hvm_funcs.ahvm_vcpu_update_eptp)
+hvm_funcs.ahvm_vcpu_update_eptp(v);
+}
+
+void ahvm_vcpu_update_vmfunc_ve(struct vcpu *v)
+{
+if (hvm_funcs.ahvm_vcpu_update_vmfunc_ve)
+hvm_funcs.ahvm_vcpu_update_vmfunc_ve(v);
+}
+
+bool_t ahvm_vcpu_emulate_ve(struct vcpu *v)
+{
+if (hvm_funcs.ahvm_vcpu_emulate_ve)
+return hvm_funcs.ahvm_vcpu_emulate_ve(v);
+return 0;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
index 68f2bb5..216cd90 100644
--- a/xen/arch/x86/mm/hap/Makefile
+++ b/xen/arch/x86/mm/hap/Makefile
@@ -4,6 +4,7 @@ obj-y += guest_walk_3level.o
 obj-$

[Xen-devel] [PATCH 03/11] x86/HVM: Hardware alternate p2m support detection.

2015-01-09 Thread Ed White
As implemented here, only supported on platforms with VMX HAP.

Signed-off-by: Ed White 
---
 xen/arch/x86/hvm/hvm.c| 8 
 xen/arch/x86/hvm/vmx/vmx.c| 1 +
 xen/include/asm-x86/hvm/hvm.h | 6 ++
 3 files changed, 15 insertions(+)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bc414ff..3a7367c 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -157,6 +157,9 @@ static int __init hvm_enable(void)
 if ( !fns->pvh_supported )
 printk(XENLOG_INFO "HVM: PVH mode not supported on this platform\n");
 
+if ( !fns->altp2m_supported )
+printk(XENLOG_INFO "HVM: Alternate p2m mode not supported on this 
platform\n");
+
 /*
  * Allow direct access to the PC debug ports 0x80 and 0xed (they are
  * often used for I/O delays, but the vmexits simply slow things down).
@@ -6369,6 +6372,11 @@ enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v)
 return hvm_funcs.nhvm_intr_blocked(v);
 }
 
+bool_t hvm_altp2m_supported()
+{
+return hvm_funcs.altp2m_supported;
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index f2554d6..931709b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1796,6 +1796,7 @@ const struct hvm_function_table * __init start_vmx(void)
 if ( cpu_has_vmx_ept && (cpu_has_vmx_pat || opt_force_ept) )
 {
 vmx_function_table.hap_supported = 1;
+vmx_function_table.altp2m_supported = 1;
 
 vmx_function_table.hap_capabilities = 0;
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index e3d2d9a..7115a68 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -94,6 +94,9 @@ struct hvm_function_table {
 /* Necessary hardware support for PVH mode? */
 int pvh_supported;
 
+/* Necessary hardware support for alternate p2m's? */
+int altp2m_supported;
+
 /* Indicate HAP capabilities. */
 int hap_capabilities;
 
@@ -518,6 +521,9 @@ bool_t nhvm_vmcx_hap_enabled(struct vcpu *v);
 /* interrupt */
 enum hvm_intblk nhvm_interrupt_blocked(struct vcpu *v);
 
+/* returns true if hardware supports alternate p2m's */
+bool_t hvm_altp2m_supported(void);
+
 #ifndef NDEBUG
 /* Permit use of the Forced Emulation Prefix in HVM guests */
 extern bool_t opt_hvm_fep;
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


  1   2   >