Re: [Xen-devel] [PATCH 00/27] Libxl migration v2

2015-07-02 Thread Yang Hongyang

Hi Andrew,

  Are there any updates to this series that I can check out and rebase mine onto?
:)

On 06/15/2015 09:44 PM, Andrew Cooper wrote:

This series adds support for the libxl migration v2 stream, and untangles the
existing layering violations of the toolstack and qemu records.

At the end of the series, legacy migration is no longer used.

Note: Remus support is broken and (RFC) fixed in separate patches in this
series.  It was too tangled to fix in a bisectable fashion.  Plain
suspend/migrate/resume however is (should be) bisectable along the entire
series.

There are a couple of outstanding questions:

1) What to do about the toolstack/xenstore record.  It is currently being
passed around as a blob, but it might be better to split it out.

2) What (if any) ABI/API qualifications are needed? (Particularly in reference
to patch 21)

The Remus code is untested by me, but is hopefully in the correct ballpark.
All other combinations of suspend/migrate/resume have been tested with PV and
HVM guests (qemu-trad and qemu-upstream), including 32 -> 64 bit migration
(which was the underlying bug causing us to write migration v2 in the first
place).

There are some further improvements which could be made.  In particular, it
appears that sending the toolstack record on each checkpoint is redundant, and
there is certainly room for some more pruning of the legacy migration code.

Anyway, thoughts/comments welcome.  Please test!

~Andrew


Andrew Cooper (22):
   tools/libxl: Fix libxl__ev_child_inuse() check for not-yet-initialised children
   tools/libxc: Always compile the compat qemu variables into xc_sr_context
   tools/libxl: Stash all restore parameters in domain_create_state
   tools/xl: Mandatory flag indicating the format of the migration stream
   tools/libxl: Introduce ROUNDUP()
   tools/libxl: Extra APIs for the save helper
   tools/libxl: Pass restore_fd as a parameter to libxl__xc_domain_restore()
   docs: Libxl migration v2 stream specification
   tools/python: Libxc migration v2 infrastructure
   tools/python: Libxl migration v2 infrastructure
   tools/python: Verification utility for v2 stream spec compliance
   tools/python: Conversion utility for legacy migration streams
   tools/libxl: Support converting a legacy stream to a v2 stream
   tools/libxl: Convert a legacy stream if needed
   tools/libxc+libxl+xl: Restore v2 streams
   tools/libxc+libxl+xl: Save v2 streams
   docs/libxl: [RFC] Introduce CHECKPOINT_END to support migration v2 remus streams
   tools/libxl: [RFC] Write checkpoint records into the stream
   tools/libx{c,l}: [RFC] Introduce restore_callbacks.checkpoint()
   tools/libxl: [RFC] Handle checkpoint records in a libxl migration v2 stream
   tools/libxc: Drop all XG_LIBXL_HVM_COMPAT code from libxc
   tools/libxl: Drop all knowledge of toolstack callbacks

Ian Jackson (2):
   libxl: cancellation: Preparations for save/restore cancellation
   libxl: cancellation: Handle SIGTERM in save/restore helper

Ross Lagerwall (3):
   tools/libxl: Migration v2 stream format
   tools/libxl: Infrastructure for reading a libxl migration v2 stream
   tools/libxl: Infrastructure for writing a v2 stream

  docs/specs/libxl-migration-stream.pandoc  |  218 
  tools/libxc/Makefile  |2 -
  tools/libxc/include/xenguest.h|3 +
  tools/libxc/xc_sr_common.h|5 -
  tools/libxc/xc_sr_restore.c   |   33 +-
  tools/libxc/xc_sr_restore_x86_hvm.c   |  124 -
  tools/libxc/xc_sr_save_x86_hvm.c  |   36 --
  tools/libxl/Makefile  |2 +
  tools/libxl/libxl_aoutils.c   |7 +
  tools/libxl/libxl_convert_callout.c   |  146 ++
  tools/libxl/libxl_create.c|   80 +--
  tools/libxl/libxl_dom.c   |   61 +--
  tools/libxl/libxl_internal.h  |  140 -
  tools/libxl/libxl_save_callout.c  |   63 +--
  tools/libxl/libxl_save_helper.c   |   95 ++--
  tools/libxl/libxl_save_msgs_gen.pl|9 +-
  tools/libxl/libxl_sr_stream_format.h  |   58 +++
  tools/libxl/libxl_stream_read.c   |  663 
  tools/libxl/libxl_stream_write.c  |  640 +++
  tools/libxl/libxl_types.idl   |2 +
  tools/libxl/xl_cmdimpl.c  |9 +-
  tools/python/Makefile |4 +
  tools/python/scripts/convert-legacy-stream.py |  683 +
  tools/python/scripts/verify-stream-v2.py  |  174 +++
  tools/python/setup.py |1 +
  tools/python/xen/migration/libxc.py   |  446 
  tools/python/xen/migration/libxl.py   |  199 +++
  tools/python/xen/migration/tests.py   |   54 ++
  tools/python/xen/migration/verify.py  |   37 ++
  29 files changed, 3638 i

Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)

2015-07-02 Thread Roger Pau Monné
On 01/07/15 at 18:15, Boris Ostrovsky wrote:
> On 07/01/2015 11:57 AM, Ian Campbell wrote:
>> The next Xen technical call will be at:
>>  Wed  8 Jul 17:00:00 BST 2015
>>  `date -d @1436371200`
>>
>> See http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html
>> for more information on the call.
>>
>> Please let me know (CC-ing the list) any topics which you would like to
>> discuss. It might be useful to include:
>>
>>* References to any relevant/recent mailing list threads;
>>* Other people who you think should be involved in the discussion (and
>>  CC them);
>>
>> If you would like to attend then please let me know so I can send you the
>> dial in details.
> 
> 
> Given that there is a fair amount of PVH-related work happening now
> (Roger's, Elena's and mine) perhaps we should have a discussion about
> that to see where we are going?
> 
> Andrew, Tim, Roger, Jan (if he is back from vacation), Elena, Konrad and
> David would be good to have present.

I'm sorry but I will be on vacation next week, hiking in the mountains
without Internet or cell phone coverage, so I won't be able to join the
meeting.

I will be back on the 20th, maybe we can arrange to do another of these
PVH meetings on the next technical call in early August?

Roger.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU scheduling

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 04:32 +, Wu, Feng wrote:
> 
> 
> > -Original Message-
> > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> > Sent: Tuesday, June 30, 2015 10:58 AM
> > To: Wu, Feng
> > Cc: xen-devel; k...@xen.org; jbeul...@suse.com; andrew.coop...@citrix.com;
> > Tian, Kevin; Zhang, Yang Z; george.dun...@eu.citrix.com; Wu, Feng
> > Subject: Re: Fwd: [v3 14/15] Update Posted-Interrupts Descriptor during vCPU
> > scheduling
> > 
> > On Mon, 2015-06-29 at 18:36 +0100, Andrew Cooper wrote:
> > 
> > >
> > > The basic idea here is:
> > > 1. When vCPU's state is RUNSTATE_running,
> > > - set 'NV' to 'Notification Vector'.
> > > - Clear 'SN' to accept PI.
> > > - set 'NDST' to the right pCPU.
> > > 2. When vCPU's state is RUNSTATE_blocked,
> > > - set 'NV' to 'Wake-up Vector', so we can wake up the
> > >   related vCPU when posted-interrupt happens for it.
> > > - Clear 'SN' to accept PI.
> > > 3. When vCPU's state is RUNSTATE_runnable/RUNSTATE_offline,
> > > - Set 'SN' to suppress non-urgent interrupts.
> > >   (Currently, we only support non-urgent interrupts)
> > > - Set 'NV' back to 'Notification Vector' if needed.
> > >
> > It might be me, but it feels a bit odd to see RUNSTATE-s being (ab)used
> > directly for this, as it does feel odd to see arch specific code being
> > added in there.
> > 
> > Can't this be done in context_switch(), which is already architecture
> > specific? I was thinking of something very similar to what has been done
> > for PSR, i.e., on x86, put everything in __context_switch().
> > 
> > Looking at who's prev and who's next, and at what pause_flags each has
> > set, you should be able to implement all of the above logic.
> > 
> > Or am I missing something?
> 
> As mentioned in the description of this patch, here we need to do
> something when the vCPU's state is changed, can we get the
> state transition in __context_switch(), such as "running -> blocking"?
> 
Well, in the patch description you mention how you've done it, so of
course it mentions runstates.

That does not necessarily mean "we need to do something" in
vcpu_runstate_change(). Actually, that's exactly what I'm asking: can
you check whether this thing that you need doing can be done somewhere
else than in vcpu_runstate_change()?

In fact, looking at how, where and what for runstates are used, that
really does not feel right, at least to me. What you seem to be
interested in is whether a vCPU blocks and/or unblocks. Runstates are an
abstraction, built on top of (mostly) pause_flags, like _VPF_blocked
(look at how runstate is updated).

I think you should not build on top of such abstraction, but on top of
pause_flags directly. I had a quick look, and it indeed seems to me that
you can get all you need from there too. It might even result in the
code looking simpler (but that's of course hard to tell without actually
trying). In fact, inside the context switching code, you already know
that prev was running so, if it has the proper flag set, it means it's
blocking (i.e., going to RUNSTATE_blocked, in runstates language), if
not, it may be being preempted (i.e., going to RUNSTATE_runnable).
Therefore, you can enact all your logic, even without any need to keep
track of the previous runstate, and without needing to build up a full
state machine and looking at all possible transitions.

So, can you have a look at whether that solution can fly? Because, if it
does, I think it would be a lot better.
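
A minimal sketch of that idea, under stated assumptions: struct
vcpu_sketch, VPF_BLOCKED, the pi_action values and both helpers are
hypothetical stand-ins invented for illustration, not Xen's real
definitions. It only shows that, inside the context switch, the action
for prev can be derived from its pause flags alone, with no record of
the previous runstate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the _VPF_blocked bit in pause_flags. */
#define VPF_BLOCKED (1UL << 0)

/* Hypothetical names for the three descriptor updates in the patch text. */
enum pi_action {
    PI_SET_NOTIFICATION_VECTOR, /* vCPU is about to run: NV = notification, SN clear */
    PI_SET_WAKEUP_VECTOR,       /* vCPU is blocking: NV = wake-up vector, SN clear */
    PI_SUPPRESS                 /* vCPU was preempted: set SN */
};

/* Simplified vCPU: only the field this sketch needs. */
struct vcpu_sketch {
    unsigned long pause_flags;
};

/*
 * prev is known to have been running, so its flags are enough:
 * blocked flag set means it is blocking, otherwise it was preempted.
 */
static enum pi_action pi_action_for_prev(const struct vcpu_sketch *prev)
{
    assert(prev != NULL);
    return (prev->pause_flags & VPF_BLOCKED) ? PI_SET_WAKEUP_VECTOR
                                             : PI_SUPPRESS;
}

/* next is about to run, so it always gets the notification vector. */
static enum pi_action pi_action_for_next(const struct vcpu_sketch *next)
{
    assert(next != NULL);
    return PI_SET_NOTIFICATION_VECTOR;
}
```

A context switch hook would call pi_action_for_prev(prev) and
pi_action_for_next(next) and apply the result to each descriptor;
whether that covers every transition the patch cares about is exactly
the open question above.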

Regards,
Dario
-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:

> > > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> > > > +pi_blocked_vcpu_list)
> > > > +if ( vmx->pi_desc.on )
> > > > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
> > >
> > > There is a logical bug here.  If we have two NV's delivered to this
> > > pcpu, we will kick the first vcpu twice.
> > >
> > > On finding desc.on, a kick should be scheduled, then the vcpu removed
> > > from this list.  With desc.on set, we know for certain that another NV
> > > will not arrive for it until it has been scheduled again and the
> > > interrupt posted.
> > >
> > Yes, that seems a possible issue (and one that should indeed be
> > avoided).
> > 
> > I'm still unsure about the one that I raised myself but, if it is
> > possible to have more than one vcpu in a pcpu list, with desc.on==true,
> > then it looks to me that we kick all of them, for each notification.
> > 
> > Adding what Andrew spotted, if there are a bunch of vcpus, queued with
> > desc.on==true, and a bunch of notifications arrive before the tasklet
> > gets executed, we'll be kicking the whole bunch of them a bunch of
> > times! :-/
> 
> As Andrew mentioned, removing the vCPUs with desc.on = true from the
> list avoids kicking vCPUs multiple times.
> 
It avoids kicking vcpus multiple times if more than one notification
arrives, yes.
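
Andrew's suggestion quoted above (schedule the kick, then take the vcpu
off the list) can be sketched roughly as follows. This is an
illustrative model only: struct blocked_vcpu, its fields and
wakeup_handler() are hypothetical, a plain singly linked list stands in
for the per-pCPU list, and a counter stands in for tasklet_schedule().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified entry on a per-pCPU blocked list. */
struct blocked_vcpu {
    bool on;                    /* models pi_desc.on */
    int kicks;                  /* stands in for scheduling the wakeup tasklet */
    struct blocked_vcpu *next;
};

/*
 * Kick every entry that has 'on' set and unlink it, so that a second
 * notification arriving before the vcpu runs again cannot kick it twice.
 * Returns the number of vcpus kicked by this invocation.
 */
static int wakeup_handler(struct blocked_vcpu **head)
{
    int kicked = 0;
    struct blocked_vcpu **link = head;

    assert(head != NULL);
    while (*link != NULL) {
        struct blocked_vcpu *v = *link;

        if (v->on) {
            v->kicks++;
            kicked++;
            *link = v->next;    /* unlink the kicked entry */
            v->next = NULL;
        } else {
            link = &v->next;    /* leave entries without a pending PI queued */
        }
    }
    return kicked;
}
```

Note that this only addresses repeated notifications; every queued
entry with 'on' set is still kicked on the first one.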

It is, however, not effective in making sure that, even with only one
notification, you only kick the interested vcpu.

This is the third time that I ask:
 (1) whether it is possible to have more than one vcpu queued on one
     pcpu's PI blocked list with desc.on set (I really believe it is);
 (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
     soon as any notification arrives, instead of putting together a
     mechanism for kicking only a specific one.

The fact that you're not answering is not so much of a big deal for
me... I'll just keep asking! :-D


Regards,
Dario
-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] Xen 4.6 Development Update (2 WEEKS TO FREEZE, important information in preamble)

2015-07-02 Thread Ian Campbell
On Wed, 2015-07-01 at 11:17 +0100, Julien Grall wrote:
> As suggested by Wei at the top of his mail [1], can you please CC only
> relevant people and avoid replying to all?

It seems that many people are unable to follow these simple
instructions.

Wei, perhaps you could stop CCing people who inappropriately do not trim
their quotes or the CC list in the future. If they cannot do us the
courtesy of doing so I don't see why they should receive a courtesy copy
of the status mail.

A somewhat less aggressive approach might be to use Bcc instead of Cc for
the bulk of people (i.e. anyone who needn't be cc-d on every reply). The
failure case of someone who cannot read simple instructions then becomes
a lack of CCs rather than a plethora of unwanted Ccs.

Ian.

> 
> Many thanks,
> 
> [1] (Note, please trim your quotes when replying, and also trim the CC list if
> necessary. You might also consider changing the subject line of your reply to
> "Status of FOO (Was: Xen 4.6 Development Update (X months reminder)")
> 
> On 01/07/15 07:12, Chen, Tiejun wrote:
> >>
> >> *  RMRR fix (fair)
> >> RFC posted
> > 
> > Wei,
> > 
> > I think this should be ok or good based on current status, and also
> > should remove "RFC" here.
> > 
> > Thanks
> > Tiejun
> > 
> 
> 





Re: [Xen-devel] [RFC PATCH v3 10/18] xen/arm: ITS: Add APIs to add and assign device

2015-07-02 Thread Vijay Kilari
On Mon, Jun 29, 2015 at 5:59 PM, Ian Campbell  wrote:
> On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote:
>> +/* Device assignment. Should be called from pci_device_add */
>> +int its_add_device(struct domain *d, u32 devid)
>> +{
>
> Prior to the PCI series landing, and to enable dom0 to use ITS it might
> be possible to call this from xen/arch/arm/platforms/thunderx.c via the
> specific_mappings platform hook, which would also expose the PCI
> controller to dom0 via a series of specific mmio mappings (look at
> xen/arch/arm/platforms/xgene-storm.c for the sort of thing I mean).
>
> That would, I think, give basic PCI functionality for dom0 (i.e.
> allowing us to boot on thunderx) and decouple things from the PCI series
> somewhat, which ought to make things easier overall IMHO.

  In the case of ThunderX, the mmio mappings of the PCI RC are parsed to
find devices. How do we know the device ids up front in order to call
its_add_device?

>
> Ian.
>



Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)

2015-07-02 Thread Ian Campbell
On Thu, 2015-07-02 at 09:32 +0200, Roger Pau Monné wrote:
> On 01/07/15 at 18:15, Boris Ostrovsky wrote:
> > On 07/01/2015 11:57 AM, Ian Campbell wrote:
> >> The next Xen technical call will be at:
> >>  Wed  8 Jul 17:00:00 BST 2015
> >>  `date -d @1436371200`
> >>
> >> See http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html
> >> for more information on the call.
> >>
> >> Please let me know (CC-ing the list) any topics which you would like to
> >> discuss. It might be useful to include:
> >>
> >>* References to any relevant/recent mailing list threads;
> >>* Other people who you think should be involved in the discussion (and
> >>  CC them);
> >>
> >> If you would like to attend then please let me know so I can send you the
> >> dial in details.
> > 
> > 
> > Given that there is a fair amount of PVH-related work happening now
> > (Roger's, Elena's and mine) perhaps we should have a discussion about
> > that to see where we are going?
> > 
> > Andrew, Tim, Roger, Jan (if he is back from vacation), Elena, Konrad and
> > David would be good to have present.

Yes, I think this topic would indeed be a worthwhile subject for
discussion.

I think there is a toolstack element to all this as well, so would it be
a good idea to have Ian J, Stefano and Wei in addition to me (who will
be there to run the call in any case)? I'd probably mark them in as
optional participants for scheduling purposes.

> I'm sorry but I will be on vacation next week, hiking in the mountains
> without Internet or cell phone coverage, so I won't be able to join the
> meeting.

I think you are one of the critical participants in the discussion, i.e.
we can't go ahead without you.

> I will be back on the 20th, maybe we can arrange to do another of these
> PVH meetings on the next technical call in early August?

The August call is scheduled for 12 August, which is 6 weeks from
yesterday.

We could either go for that or if it is too far out we could push the
July iteration of the call back two weeks to Wednesday 22 July (so 3
weeks from yesterday). (a one week slip would still land in your
vacation time).

There is also the devsummit in the week 17 August (~7 weeks time) which
is an opportunity for a f2f.

Shall I put up a poll of some sort to gather preferred timeslot options
out of that set?

Ian.




Re: [Xen-devel] [xen-unstable test] 58974: regressions - FAIL

2015-07-02 Thread Dario Faggioli
On Wed, 2015-07-01 at 21:29 -0700, Meng Xu wrote:
> Hi Dario,
> 
Hi,

> 2015-06-30 10:14 GMT-07:00 Dario Faggioli :
> > if you'd be able to have a look at what's
> > happening, that would be awesome. If you don't have time, I will have a
> > look myself, but only in a few days.
> >
> Hmm, this is another bug for RTDS on ARM. :-(
> I don't have an ARM board set up right now. I'm not sure if I can
> run/test it on ARM. I'm curious if this bug is similar to the
> previous lock-related bug of RTDS scheduler on ARM.
> Denys fixed the previous bug at
> http://lists.xenproject.org/archives/html/xen-devel/2015-01/msg03933.html
> 
Yes, it reminded me of that one too... It likely is something
similar. :-/

I understand you don't have any suitable hardware. Still, if you could
at least have a look at the failure and at the code, and try to provide
an analysis and/or suggest a way for resolving the issue, we'll be glad
to test patches on actual ARM systems.

If you need to look at the Xen (quite likely) and Linux (quite unlikely)
images, e.g., to disassemble it and check what corresponds to the
address where things explode, and similar things, they're available too.

Just look here:
http://logs.test-lab.xenproject.org/osstest/logs/58974/test-armhf-armhf-xl-rtds/info.html

And check what the build jobs, Xen and Linux paths are, in the table at
the bottom. You'll find this:

  buildjob        build-armhf
  xenbuildjob     build-armhf
  xen_kernel_ver  3.16.7-ckt12+

Go to the summary of the flight, which includes build jobs:

 'Flight 58974 scoreboard'
 http://logs.test-lab.xenproject.org/osstest/logs/58974/

Identify that job and go to its report:
http://logs.test-lab.xenproject.org/osstest/logs/58974/build-armhf/info.html

And go to:

 'build/ (outputs from build)'
 http://logs.test-lab.xenproject.org/osstest/logs/58974/build-armhf/build/

You'll find the binaries (for Xen, in this case), there.

The only caveat is that you'll need an ARM toolchain, to be able to use,
in this case, arm-elf-objdump and similar tools. The procedure to get
one is distro dependent, but it's not that hard these days (there are
guides on our wiki too).

Dario
-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy

2015-07-02 Thread Chen, Tiejun

@@ -1898,7 +1899,13 @@ static int intel_iommu_add_device(u8 devfn, struct pci_dev *pdev)
   PCI_BUS(bdf) == pdev->bus &&
   PCI_DEVFN2(bdf) == devfn )
  {
-ret = rmrr_identity_mapping(pdev->domain, 1, rmrr);
+/*
+ * RMRR is always reserved on e820 so either of flag
+ * is fine for hardware domain and here we'd like to
+ * pass XEN_DOMCTL_DEV_RDM_RELAXED.
+ */
+ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
+XEN_DOMCTL_DEV_RDM_RELAXED);


So two things.

First, you assert that the value here won't matter, because the
hardware domain is guaranteed never to have a conflict.

Which is likely to be true almost all the time; but the question is,
*if* something goes wrong, what should happen?

For instance, suppose that someone accidentally introduces a bug in
Xen that messes up or ignores reading a portion of the e820 map under
certain circumstances.  What should happen?


Yes, you can imagine all sorts of cases. But if this kind of bug really
occurred, I very much doubt that Xen could boot successfully. e820 is
fundamental to running the OS, so this case would very easily panic Xen, right?


Anyway, I agree we should consider all corner cases.



If you set this to RELAXED, this clash will be silently ignored; which
means that devices that need RMRR will simply malfunction in weird
ways without any warning messages having been printed that might give


No. We always post that message regardless of relaxed or strict, since
the message depends only on whether a conflict exists.



someone a hint about what is going on.

If you set this to STRICT, then this clash will print an error
message, but as far as I can tell, the rest of the device assignment
will continue as normal.  (Please correct me if I've followed the code
wrong.)


Not all cases behave like this, but here it is true.



Since the device should be just as functional (or not functional)
either way, but in the STRICT case should actually print an error
message which someone might notice, it seems to me that STRICT is a
better option for the hardware domain.



Just see above.


Secondly, you assert in response to Kevin's question in v3 that this
path is only reachable when assigning to the hardware domain.  I think
you at least need to update the comment here to indicate that's what
you think; it's not at all obvious just from looking at the function


What about this?

  PCI_DEVFN2(bdf) == devfn )
 {
 /*
- * RMRR is always reserved on e820 so either of flag
- * is fine for hardware domain and here we'd like to
- * pass XEN_DOMCTL_DEV_RDM_RELAXED.
+ * Here we're adding a device to the hardware domain.
+ * RMRR is always reserved on e820, so either flag is
+ * fine for the hardware domain, and here we'd like to
+ * pass XEN_DOMCTL_DEV_RDM_RELAXED.
  */
 ret = rmrr_identity_mapping(pdev->domain, 1, rmrr,
 XEN_DOMCTL_DEV_RDM_RELAXED);



that this is true.  And if we do end up doing something besides
STRICT, we should check to make sure that pdev->domain really *is* the
hardware domain before acting like it is.


  if ( ret )
  dprintk(XENLOG_ERR VTDPREFIX, "d%d: RMRR mapping failed\n",
  pdev->domain->domain_id);
@@ -1939,7 +1946,8 @@ static int intel_iommu_remove_device(u8 devfn, struct pci_dev *pdev)
   PCI_DEVFN2(bdf) != devfn )
  continue;

-rmrr_identity_mapping(pdev->domain, 0, rmrr);
+rmrr_identity_mapping(pdev->domain, 0, rmrr,
+  XEN_DOMCTL_DEV_RDM_RELAXED);
  }


Same here wrt STRICT.


This is inside intel_iommu_remove_device(), so no flag actually has any
effect on rmrr_identity_mapping() here. But I should add a comment like
this:


+/*
+ * Any flag is nothing to clear these mappings so here
+ * its always safe to set XEN_DOMCTL_DEV_RDM_RELAXED.
+ */




After those changes (a single RDM_RELAXED flag, passing STRICT in for
the hardware domain) then I think this patch is in good shape.



Based on my understanding of your concern, it seems you think that in
the "relaxed" case we don't post any message, right? But as I replied
above, this is not correct, so what are your further thoughts?


Anyway, I'm fine with changing this. And since you suggested keeping one
bit just to indicate XEN_DOMCTL_DEV_RDM_RELAXED, we no longer have an
actual XEN_DOMCTL_DEV_RDM_STRICT, so I can simply reset the associated
flag to 0.
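
To make the single-bit scheme concrete, here is a rough sketch, not the
patch itself. Only the name XEN_DOMCTL_DEV_RDM_RELAXED comes from this
thread; its numeric value, the helper rdm_conflict_result() and the
choice of -1 as the strict-mode error are assumptions for illustration.
It models the behaviour described above: a conflict always produces a
message, and strict is simply the absence of the relaxed bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; only the name appears in the patch. */
#define XEN_DOMCTL_DEV_RDM_RELAXED 1u

/*
 * Hypothetical helper: decide the outcome of an RMRR mapping attempt.
 * 'warnings' counts messages that would be logged. With no dedicated
 * STRICT constant, a flag value of 0 means strict.
 */
static int rdm_conflict_result(uint32_t flag, bool conflict, int *warnings)
{
    assert(warnings != NULL);

    if (!conflict)
        return 0;

    (*warnings)++;      /* the message is posted under either policy */

    /* Relaxed: carry on despite the conflict. Strict: report failure. */
    return (flag & XEN_DOMCTL_DEV_RDM_RELAXED) ? 0 : -1;
}
```

Under this model, passing 0 for the hardware domain would surface a
clash as an error as well as a message, which is the behaviour being
asked for in the review.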


Thanks
Tiejun




Re: [Xen-devel] Xen 4.6 Development Update (2 WEEKS TO FREEZE, important information in preamble)

2015-07-02 Thread Razvan Cojocaru
Hello Ian, Julien,

On 07/02/2015 11:35 AM, Ian Campbell wrote:
> On Wed, 2015-07-01 at 11:17 +0100, Julien Grall wrote:
>> As suggested by Wei at the top of his mail [1], can you please CC only
>> relevant people and avoid replying to all?
> 
> It seems that many people are unable to follow these simple
> instructions.
> 
> Wei, perhaps you could stop CCing people who inappropriately do not trim
> their quotes or the CC list in the future. If they cannot do us the
> courtesy of doing so I don't see why they should receive a courtesy copy
> of the status mail.
> 
> A somewhat less aggressive approach might be to use Bcc instead of Cc for
> the bulk of people (i.e. anyone who needn't be cc-d on every reply). The
> failure case of someone who cannot read simple instructions then becomes
> a lack of CCs rather than a plethora of unwanted Ccs.

First of all, let me apologize for doing this in the past, I'll
certainly remember to not let it happen again.

Second, I'd like to point out that, while I cannot speak for everyone,
and so maybe it's just me, I find this statement: "please trim your
quotes when replying, and also trim the CC list if necessary" a bit
ambiguous.

The quotes part is obvious (and not that many people have omitted to do
that), but to be honest I haven't been clear on who is supposed to be in
the trimmed CC list: the maintainers of the code I'm touching with my
series and Wei? There doesn't seem to be a clear rule about who should
be replied to (or maybe there is and I've missed it? If so, could you
please point it out?). Maybe clearing this up could help with this
problem in the future.


Thanks,
Razvan



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Thursday, July 02, 2015 4:30 PM
> To: Wu, Feng
> Cc: Andrew Cooper; xen-devel@lists.xen.org; Zhang, Yang Z;
> george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:
> 
> > > > > +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> > > > > +pi_blocked_vcpu_list)
> > > > > +if ( vmx->pi_desc.on )
> > > > > +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
> > > >
> > > > There is a logical bug here.  If we have two NV's delivered to this
> > > > pcpu, we will kick the first vcpu twice.
> > > >
> > > > On finding desc.on, a kick should be scheduled, then the vcpu removed
> > > > from this list.  With desc.on set, we know for certain that another NV
> > > > will not arrive for it until it has been scheduled again and the
> > > > interrupt posted.
> > > >
> > > Yes, that seems a possible issue (and one that should indeed be
> > > avoided).
> > >
> > > I'm still unsure about the one that I raised myself but, if it is
> > > possible to have more than one vcpu in a pcpu list, with desc.on==true,
> > > then it looks to me that we kick all of them, for each notification.
> > >
> > > Adding what Andrew spotted, if there are a bunch of vcpus, queued with
> > > desc.on==true, and a bunch of notifications arrive before the tasklet
> > > gets executed, we'll be kicking the whole bunch of them a bunch of
> > > times! :-/
> >
> > As Andrew mentioned, removing the vCPUs with desc.on = true from the
> > list avoids kicking vCPUs multiple times.
> >
> It avoids kicking vcpus multiple times if more than one notification
> arrives, yes.
> 
> It is, however, not effective in making sure that, even with only one
> notification, you only kick the interested vcpu.
> 
> This is the third time that I ask:
>  (1) whether it is possible to have more than one vcpu queued on one
>      pcpu's PI blocked list with desc.on set (I really believe it is);

I think it is, please see the following scenario:

When the cpu has interrupts masked and an external interrupt occurs for
the assigned device while the target vCPU2 is blocked, the wakeup
notification event handler has no chance to run. After a while, another
wakeup notification event occurs for vCPU4, which is blocked on the same
pCPU. Once the cpu unmasks interrupts, the wakeup notification handler
gets called. Then we get:
vCPU2, desc.on = 1 and vCPU4, desc.on = 1
Then in the handler we need to kick both of them.

>  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
>      soon as any notification arrives, instead of putting together a
>      mechanism for kicking only a specific one.
> 
Why can't we kick all of them? 'desc.on = 1' means there is a pending
interrupt; when we meet this condition, kicking the related vCPU should
be the right thing to do.

Thanks,
Feng

> The fact that you're not answering is not so much of a big deal for
> me... I'll just keep asking! :-D
> 
> 
> Regards,
> Dario
> --
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


Re: [Xen-devel] [RFC PATCH v3 10/18] xen/arm: ITS: Add APIs to add and assign device

2015-07-02 Thread Ian Campbell
On Thu, 2015-07-02 at 14:10 +0530, Vijay Kilari wrote:
> On Mon, Jun 29, 2015 at 5:59 PM, Ian Campbell  wrote:
> > On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote:
> >> +/* Device assignment. Should be called from pci_device_add */
> >> +int its_add_device(struct domain *d, u32 devid)
> >> +{
> >
> > Prior to the PCI series landing, and to enable dom0 to use ITS it might
> > be possible to call this from xen/arch/arm/platforms/thunderx.c via the
> > specific_mappings platform hook, which would also expose the PCI
> > controller to dom0 via a series of specific mmio mappings (look at
> > xen/arch/arm/platforms/xgene-storm.c for the sort of thing I mean).
> >
> > That would, I think, give basic PCI functionality for dom0 (i.e.
> > allowing us to boot on thunderx) and decouple things from the PCI series
> > somewhat, which ought to make things easier overall IMHO.
> 
>   In the case of ThunderX, the mmio mappings of the PCI RC are parsed to
> find devices. How do we know the device ids up front in order to call
> its_add_device?

Ah yes, this approach probably doesn't work for ITS. Which is a shame.

Ian.




Re: [Xen-devel] [PATCH xen] stubdom: vtpmmgr: Correctly format size_t with %z when printing.

2015-07-02 Thread Ian Campbell
On Thu, 2015-07-02 at 02:23 +0200, Samuel Thibault wrote:
> Ian Campbell wrote on Fri 26 Jun 2015 12:06:09 +0100:
> > Also contains a fix from Thomas Leonard (to use %u for "4 + 32", not
> > %lu) previously posted as part of "mini-os: enable compiler check for
> > printk format types", but with mini-os now having been split into a
> > separate repo most of that change has been applied there.
> > 
> > This fixes the 32-bit build with updated mini-os which includes format
> > string checking.
> > 
> > Signed-off-by: Thomas Leonard 
> > Signed-off-by: Ian Campbell 
> > Cc: Daniel De Graaf 
> > Cc: Stefano Stabellini 
> 
> Acked-By: Samuel Thibault 
> 
> (after the 'z' modifier support is committed, of course)

Er, yes, well spotted!

> > ---
> > I intend to fold in an update to MINIOS_UPSTREAM_REVISION upon commit
> > to pull in the updated mini-os plus the "Correct printf formatting for
> > tpm_tis message." patch I've just posted.
> > ---
> >  stubdom/vtpmmgr/disk_read.c |   12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> > 
> > diff --git a/stubdom/vtpmmgr/disk_read.c b/stubdom/vtpmmgr/disk_read.c
> > index e9dc20f..944d3ff 100644
> > --- a/stubdom/vtpmmgr/disk_read.c
> > +++ b/stubdom/vtpmmgr/disk_read.c
> > @@ -548,18 +548,18 @@ int vtpm_load_disk(void)
> > TPM_read_pcrs();
> >  
> > printk("TPM Manager - disk format %d\n", TPM_MGR_VERSION);
> > -   printk(" root seal: %lu; sector of %d: %lu\n",
> > +   printk(" root seal: %zu; sector of %d: %zu\n",
> > sizeof(struct disk_root_sealed_data), SEALS_PER_ROOT_SEAL_LIST, 
> > sizeof(struct disk_seal_list));
> > -   printk(" root: %lu v=%lu\n", sizeof(root1), sizeof(root1.v));
> > -   printk(" itree: %lu; sector of %d: %lu\n",
> > +   printk(" root: %zu v=%zu\n", sizeof(root1), sizeof(root1.v));
> > +   printk(" itree: %u; sector of %d: %zu\n",
> > 4 + 32, NR_ENTRIES_PER_ITREE, sizeof(struct disk_itree_sector));
> > -   printk(" group: %lu v=%lu id=%lu md=%lu\n",
> > +   printk(" group: %zu v=%zu id=%zu md=%zu\n",
> > sizeof(struct disk_group_sector), sizeof(struct 
> > disk_group_sector_mac3_area),
> > sizeof(struct group_id_data), sizeof(struct group_details));
> > -   printk(" group seal: %lu; %d in parent: %lu; sector of %d: %lu\n",
> > +   printk(" group seal: %zu; %d in parent: %zu; sector of %d: %zu\n",
> > sizeof(struct disk_group_sealed_data), NR_SEALS_PER_GROUP, 
> > sizeof(struct disk_group_boot_config_list),
> > SEALS_PER_GROUP_SEAL_LIST, sizeof(struct disk_group_seal_list));
> > -   printk(" vtpm: %lu+%lu; sector of %d: %lu\n",
> > +   printk(" vtpm: %zu+%zu; sector of %d: %zu\n",
> > sizeof(struct disk_vtpm_plain), sizeof(struct disk_vtpm_secret),
> > VTPMS_PER_SECTOR, sizeof(struct disk_vtpm_sector));
> >  
> > -- 
> > 1.7.10.4
> > 
> 





Re: [Xen-devel] [PATCH v2 11/22] xen/x86: allow disabling emulated devices for HVM guests

2015-07-02 Thread Roger Pau Monné
El 01/07/15 a les 17.46, Andrew Cooper ha escrit:
> On 01/07/15 15:46, Roger Pau Monne wrote:
>> Introduce a new DOMCTL flag that can be used to disable device emulation
>> inside of Xen for HVM guests. The following emulated devices are disabled
>> when the XEN_DOMCTL_CDF_noemu is used: hpet, pmtimer, rtc, ioapic, lapic,
>> pic and pmu. Also all the MMIO handlers are disabled.
>>
>> Signed-off-by: Roger Pau Monné 
>> Cc: Jan Beulich 
>> Cc: Andrew Cooper 
>> Cc: Boris Ostrovsky 
>> Cc: Suravee Suthikulpanit 
>> Cc: Aravind Gopalakrishnan 
>> Cc: Jun Nakajima 
>> Cc: Eddie Dong 
>> Cc: Kevin Tian 
> 
> I would be hesitant to have a blanket change like this.
> 
> Consider APICV/AVIC.  For performance reasons, we absolutely want HVM
> and PVH to make use of them, as they are substantially more efficient
> using hardware support than even using plain evtchn hypercalls.
> 
> However, the flipside is that we must provide an LAPIC emulation to
> cover the bits which hardware cannot virtualise.
> 
> As a random idea, how about having a new hypercall or hvmparam which
> provides a bitmap of permitted emulators?  This would allow far finer
> grain control over what is and isn't available to a domain.

I don't think using a new hypercall or hvmparam is suitable for this.
The emulators are initialized in hvm_domain_initialise, which is called
by the XEN_DOMCTL_createdomain hypercall. Trying to set them before
calling XEN_DOMCTL_createdomain is impossible because there's no domain
struct yet, and adding a new hypercall to do that later seems quite
convoluted. IMHO it's best to never initialize them in the first place.

I would rather add a bitmap field to xen_arch_domainconfig in order to
describe which emulators we want to enable.
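To make the bitmap idea concrete, here is a rough sketch of how such a field
could look. Everything in it is an assumption for illustration: the EMU_*
flag names, the struct emu_config_sketch type and the emu_enabled() helper do
not exist in Xen, and the real xen_arch_domainconfig layout is not shown here.
The device list is simply the one from the commit message quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flags, one bit per emulated device; not an existing Xen ABI. */
#define EMU_LAPIC   (1u << 0)
#define EMU_HPET    (1u << 1)
#define EMU_PMTIMER (1u << 2)
#define EMU_RTC     (1u << 3)
#define EMU_IOAPIC  (1u << 4)
#define EMU_PIC     (1u << 5)
#define EMU_PMU     (1u << 6)

/* Hypothetical stand-in for a bitmap field in the arch domain config. */
struct emu_config_sketch {
    uint32_t emulation_flags;
};

/* Returns true if the given emulator bit is set for this domain. */
bool emu_enabled(const struct emu_config_sketch *cfg, uint32_t dev)
{
    return (cfg->emulation_flags & dev) != 0;
}
```

With something along these lines, domain creation could consult the bitmap
before initialising each emulator, so a guest that only wants the local APIC
would pass EMU_LAPIC and nothing else.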

I've also been wondering why we need to introduce this now. AFAICT we
can always introduce this bitmap field later and remove
XEN_DOMCTL_CDF_noemu/DOMCRF_noemu because the DOMCTL interface is not
stable anyway.

Also, from a guest POV, how is the hw emulated local apic going to be
used? Are we going to route the interrupts from virtual devices
(netfront, blkfront) to the lapic? Or we just want it for the timer and
ditch the PV timer?

I can see that this is more interesting for a PVH/HVMlite Dom0, but
still in that case I'm not sure how a guest is supposed to interact with
it. Will the PHYSDEV hypercalls route interrupts to the emulated local
apic instead of pirqs event channels? Will we trap PCI/MSI/MSI-X
configuration and emulate it?

Roger.




[Xen-devel] [PATCH OSSTEST v3] mg-all-branch-statuses: Show how up to date each branch is

2015-07-02 Thread Ian Campbell
Using report_find_push_age_info allows us to provide counts of
attempts since the last baseline on current tip as well as the first
attempt of each of those.

Since everything serialises on the repo lock I didn't bother trying to
parallelise anything.

It's a little terse to keep it in 80 chars.

Signed-off-by: Ian Campbell 
---
v2: Use new report_find_push_age_info functionality, output condensed.

v3: - Correctly quote parameters to printf, so empty ones don't misalign the
  rest.
- Drop dates in favour of number of days, and n/a in those cases
- Print "Error!" if no tip is available and "UpToDate" if tip==basis.

time with a recently updated set of repos says:

57.34user 28.28system 5:34.16elapsed 25%CPU (0avgtext+0avgdata 47256maxresident)k
100216inputs+600outputs (673major+2332436minor)pagefaults 0swaps

So it's not quick...

Example output:
Branch   BasisTip  #Tip #Tot 1stTip 1stNew
libvirt  d10a5f58 845184b20   10 n/a8 days
linux-3.0e1c63f9f 5dba9ddd29 2 days 868 days
linux-3.10   b3d78448 UpToDate
linux-3.14   762167f9 UpToDate
linux-3.16   162d6432 26749e7511 1 day  1 day
linux-3.18   d048c068 ea5dd38e33 2 days 2 days
linux-3.4bb4a05a0 cf1b3dad   15  161 11 days212 days
linux-4.1b953c0d2 6a010c0a0- n/an/a
linux-arm-xen64972ceb UpToDate
linux-linus  6aaf0da8 4da3064d00 n/an/a
linux-mingo-tip-master   d935d0f7 778a1ac50   16 n/a1173 days
linux-nextc8a9ad220  219 n/a447 days
osstest  15d2dd50 Error!  0- n/an/a
ovmf 269e0aeb 288ed59001 n/a1 day
qemu-mainlined2966f80 UpToDate
qemu-upstream-4.2-testingd2382550 UpToDate
qemu-upstream-4.3-testingefae5e0f UpToDate
qemu-upstream-4.4-testing32226f42 UpToDate
qemu-upstream-4.5-testingd9552b0a UpToDate
qemu-upstream-unstable   c4a962ec UpToDate
rumpuserxen  30d72f3f 3b91e449   61  113 77 days149 days
seabios  f24eb2f8 UpToDate
xen-4.0-testing  2692df2a UpToDate
xen-4.1-testing  40feff87 UpToDate
xen-4.2-testing  38fcda22 UpToDate
xen-4.3-testing  e7c02297 UpToDate
xen-4.4-testing  6c1cb3db UpToDate
xen-4.5-testing  e3bd3cef UpToDate
xen-unstable c40317f1 3d55a17903 n/a2 days
---
 mg-all-branch-statuses | 135 +
 1 file changed, 135 insertions(+)
 create mode 100755 mg-all-branch-statuses

diff --git a/mg-all-branch-statuses b/mg-all-branch-statuses
new file mode 100755
index 000..9a78ec2
--- /dev/null
+++ b/mg-all-branch-statuses
@@ -0,0 +1,135 @@
+#!/bin/bash
+# -*- bash -*-
+#
+# Prints the status of each branch
+#
+# Usage:
+#./mg-all-branch-statuses [BRANCH]
+#
+# If no BRANCHes specified, does all that are normally run by
+# cr-daily-branch or out of crontab.
+
+# This is part of "osstest", an automated testing framework for Xen.
+# Copyright (C) 2009-2014 Citrix Inc.
+#
+# This program is free software: you can redistribute it and/or modify
+# it under the terms of the GNU Affero General Public License as published by
+# the Free Software Foundation, either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU Affero General Public License for more details.
+#
+# You should have received a copy of the GNU Affero General Public License
+# along with this program.  If not, see .
+
+set -e
+
+. cri-common
+
+mkdir -p tmp
+
+if [ $# = 0 ]; then
+   set `./mg-list-all-branches`
+fi
+
+gather_info()
+{
+local branch=$1; shift
+local basis=$1;  shift
+local tip=$1;shift
+
+select_xenbranch
+
+local info=`perl -we '
+   use Osstest::Executive;
+   use Osstest;
+   use Data::Dumper;
+   open DEBUG, ">/dev/null" or die $!;
+   #open DEBUG, ">&STDERR" or die $!;
+   csreadconfig();
+   my ($branch,$tree,$basis,$tip) = @ARGV;
+   print DEBUG "branch=$branch tree=$tree basis=$basis tip=$tip\n";
+   my $info = report_find_push_age_info([qw(real adhoc play)],
+ undef, [($branch)],
+ $tree, $basis, $tip);
+   print DEBUG Dumper $info;
+   my $onevar = sub {
+   my ($var,$dflt) = @_;
+   $dflt //= "";
+   print "export ".uc($var)."=\"";
+   print $info->{$var}

Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy

2015-07-02 Thread George Dunlap
On 07/02/2015 02:11 AM, Chen, Tiejun wrote:
 If I'm correct, then #3 means it's not possible to have devices
 for a
 domain *default* to strict, but to be relaxed in individual
 instances.
 If you had five devices you wanted strict, and only one device you
 wanted to be relaxed (because you knew it didn't matter), you'd
 have
 to set reserved=strict for all the other devices, rather than just
 being able to set the domain setting to strict and set
 reserve=relaxed
 for the one.

 I think that both violates the principle of least surprise, and is
 less useful.
>>
>
> So what's your idea to follow our requirement?

 So consider the following config snippet:

 ---
 rdm="reserve=relaxed"

 pci=['01:00.1,msitranslate=1']
 

 What should the policy for that device be?

 According to your policy document, it seems to me like it should be
 "relaxed", since the domain default* is set to "relaxed" and nothing
>>>
>>> Why? "strict" should be in this case.
>>
>> OK, I think I see where the problem is.  I had expected the domain-wide
>> setting to be a default which was overridden by per-device policies (see
>> pci_permissive and friends).  So when I saw "global default RDM policy"
> 
> We knew this behavior but we'd like to take a different consideration in
> this case.
> 
>> confirmation bias caused me to interpret it as what I expected to see --
>> the domain setting as the default, which the local setting could
>> override.
>>
>> I see now that in your documentation you consistently talk about two
>> different policies, each of which have their own defaults, and that the
>> effective permissions for a device end up being the intersection of the
>> two (i.e., only relaxed if both are relaxed; strict under all other
>> circumstances).
>>
>>> Why are you saying this is not our expectation? Just let me pick up that
>>> description *again*,
>>>
>>> "Default per-device RDM policy is 'strict', while default global RDM
>>> policy is 'relaxed'. When both policies are specified on a given region,
>>> 'strict' is always preferred."
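For readers following the argument, the rule as quoted above can be written
down in a few lines. This is only a sketch of the semantics being debated,
not code from the series: the enum, its values and effective_rdm_policy() are
hypothetical names chosen for illustration.

```c
/* Hypothetical policy values; RDM_UNSET means "not given in the config". */
enum rdm_policy {
    RDM_UNSET = 0,
    RDM_STRICT,
    RDM_RELAXED
};

/*
 * Combine the global and per-device settings as described in the quoted
 * text: the global default is relaxed, the per-device default is strict,
 * and strict wins whenever the two disagree.
 */
enum rdm_policy effective_rdm_policy(enum rdm_policy global,
                                     enum rdm_policy device)
{
    if (global == RDM_UNSET)
        global = RDM_RELAXED;   /* default global policy */
    if (device == RDM_UNSET)
        device = RDM_STRICT;    /* default per-device policy */

    return (global == RDM_RELAXED && device == RDM_RELAXED)
               ? RDM_RELAXED
               : RDM_STRICT;
}
```

Under this reading, a config with rdm="reserve=relaxed" and a pci entry that
carries no per-device setting ends up strict, because the unset per-device
value falls back to its strict default.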
>>
>> Look, if I haven't understood what you meant by the exact same words the
>> first 4 times I read it, simply repeating the same exact words is not
>> going to be helpful.  Ideally you need to try to understand where my
>> misunderstanding is coming from and explain where I've misunderstood
>> something; or, at least you need to try to use different words, or
>> explain how the words you're using apply to the given situation.
> 
> From my point of view, I already replied to this previously by quoting part
> of the patch head description. As you know this revision is already
> marked as v4 and although I admit some code implementations still need
> further review, at least our policy should already be acknowledged right
> now unless this is really wrong. But in our case, it looks like you're
> concerned that our mechanism is not what you expected. So
> 
>>
 This interface doesn't make any sense to me.  Why, if the "global
>>>
>>> If you have any objection to our solution, and if you can't find any
>>> reasonable answer from our design, just please ping Jan or Kevin because
> 
> just do it to make this clear to us. And then, whatever, I'm going to be
> fine to move on to the next step.
> 
>>> I'm really not that person who can address this kind of change at this
>>> point in this high level.
>>
>> And you have no idea why that design was chosen; you're just doing what
> 
> Certainly I have my own understanding with this issue. But
> 
>> you're told?
> 
> in high level I have to say Yes. If you really read that v2 design and
> its associated discussion, you should notice I didn't put any response
> right there.

Look, I'm getting a bit angry at your continual implication that I
haven't put in enough work reading the background for this series.  If
you go back and look at the v2 design discussion, you'll see that I was
actively involved in that discussion, and sent at least a dozen emails
about it.  I have now spent nearly two full days just on this series,
including going back over lots of conversations that have happened
before to find answers to questions which you could have given in a
single line; and also to check assertions that you've made which have
turned out to be false.

In the v2 design discussion, the only thing I could find regarding the
relationship between per-device settings and the domain-wide setting was
where you said [1]:

"per-device override is always favored if a conflicting setting in
rmrr_host."

And in v2, Wei asked you [2]:

"But this only works with global configuration and individual
configuration in PCI spec trumps this, right?"

And you responded [3]:

"You're right."

Now it happens that in all those cases you were literally talking about
the rmrr_host part of the configuration, not the strict/relaxed part of
the conf

Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)

2015-07-02 Thread Roger Pau Monné
El 02/07/15 a les 10.45, Ian Campbell ha escrit:
> On Thu, 2015-07-02 at 09:32 +0200, Roger Pau Monné wrote:
>> El 01/07/15 a les 18.15, Boris Ostrovsky ha escrit:
>>> On 07/01/2015 11:57 AM, Ian Campbell wrote:
 The next Xen technical call will be at:
  Wed  8 Jul 17:00:00 BST 2015
  `date -d @1436371200`

 See http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html
 for more information on the call.

 Please let me know (CC-ing the list) any topics which you would like to
 discuss. It might be useful to include:

* References to any relevant/recent mailing list threads;
* Other people who you think should be involved in the discussion (and
  CC them);

 If you would like to attend then please let me know so I can send you the
 dial in details.
>>>
>>>
>>> Given that there is fair amount of PVH-related work happening now
>>> (Roger's, Elena's and mine) perhaps we should have a discussion about
>>> that to see where we are going?
>>>
>>> Andrew, Tim, Roger, Jan (if he is back from vacation), Elena, Konrad and
>>> David would be good to have present.
> 
> Yes, I think this topic would indeed be a worthwhile subject for
> discussion.
> 
> I think there is a toolstack element to all this as well, so would it be
> a good idea to have Ian J, Stefano and Wei in addition to me (who will
> be there to run the call in any case)? I'd probably mark them in as
> optional participants for scheduling purposes.

Yes, my series includes way more toolstack than hypervisor changes.

>> I'm sorry but I will be on vacation next week, hiking in the mountains
>> without Internet or cell phone coverage, so I won't be able to join the
>> meeting.
> 
> I think you are one of the critical participants in the discussion, i.e.
> we can't go ahead without you.
> 
>> I will be back on the 20th, maybe we can arrange to do another of these
>> PVH meetings on the next technical call in early August?
> 
> The August call is scheduled for 12 August, which is 6 weeks from
> yesterday.
> 
> We could either go for that or if it is too far out we could push the
> July iteration of the call back two weeks to Wednesday 22 July (so 3
> weeks from yesterday). (a one week slip would still land in your
> vacation time).

I would rather go for the 22nd or 29th of July or the 5th of August, because
I'm also on vacation from the 10th to the 16th of August.

> There is also the devsummit in the week 17 August (~7 weeks time) which
> is an opportunity for a f2f.
> 
> Shall I put up a poll of some sort to gather preferred timeslot options
> out of that set?

That sounds fine, thanks.

Roger.



Re: [Xen-devel] [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy

2015-07-02 Thread Chen, Tiejun

I'm not only suggesting changing the layout of the patches; I'm


Sorry for this misunderstanding.


suggesting modifying the functionality.

In patch 12 you add a new command-line parameter to xl; so that you have
to type something like this:

# xl pci-attach ubuntu01 01:00.1,msitranslate=1 relaxed

What I'm saying is that you can drop the xl part of that patch entirely,
because once you have the xlu code in, you can just do this:

# xl pci-attach ubuntu01 01:00.1,msitranslate=1,rdm_reserve=relaxed

This has the positive advantage that you can copy and paste the same
string into both the xl command and the xl config file.
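To illustrate why a single string works for both places, here is a small
sketch of pulling one key=value option out of such a device string. It is
not the libxlu parser: pci_opt_lookup() is a hypothetical helper written
only for this example, and it assumes the options follow the BDF as a
comma-separated list.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not libxlu code: look up "key=value" in a device
 * string such as "01:00.1,msitranslate=1,rdm_reserve=relaxed" and copy
 * the value into out.  Returns true if the key was found and its value
 * fits in out (including the terminating NUL).
 */
bool pci_opt_lookup(const char *spec, const char *key,
                    char *out, size_t outlen)
{
    size_t klen = strlen(key);
    const char *p = strchr(spec, ',');      /* skip the BDF part */

    while (p != NULL) {
        p++;                                /* step past the comma */
        if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
            const char *val = p + klen + 1;
            size_t vlen = strcspn(val, ","); /* value ends at next comma */

            if (vlen >= outlen)
                return false;               /* value does not fit */
            memcpy(out, val, vlen);
            out[vlen] = '\0';
            return true;
        }
        p = strchr(p, ',');                 /* move to the next option */
    }
    return false;
}
```

Looking up "rdm_reserve" in the string from the command above would yield
"relaxed", and the same lookup would apply to an entry copied into the
pci=[...] list of a config file.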



I think you're right,

pciattach()
|
+ xlu_pci_parse_bdf()

So I really should drop this patch as you said.

Thanks
Tiejun




Re: [Xen-devel] [PATCH 00/27] Libxl migration v2

2015-07-02 Thread Andrew Cooper
On 02/07/15 08:33, Yang Hongyang wrote:
> Hi Andrew,
>
>   Are there any updates of this series that I can checkout and rebase
> mine onto?
> :)

Not yet - I am very sorry it is taking this long.  I am working on it
and am half way through, but rebasing over the AO Abort series is
proving far more complicated than I initially expected.  A lot of
functions which I had introduced or modified have had their number of
parameters and error semantics changed.

~Andrew



Re: [Xen-devel] [PATCH 00/27] Libxl migration v2

2015-07-02 Thread Yang Hongyang



On 07/02/2015 05:26 PM, Andrew Cooper wrote:

On 02/07/15 08:33, Yang Hongyang wrote:

Hi Andrew,

   Are there any updates of this series that I can checkout and rebase
mine onto?
:)


Not yet - I am very sorry it is taking this long.  I am working on it
and am half way through, but rebasing over the AO Abort series is
proving far more complicated than I initially expected.  A lot of
functions which I had introduced or modified have had their number of
parameters and error semantics changed.


I understand that's a pain...



~Andrew
.



--
Thanks,
Yang.



Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)

2015-07-02 Thread Ian Campbell
On Thu, 2015-07-02 at 09:45 +0100, Ian Campbell wrote:
> Shall I put up a poll of some sort to gather preferred timeslot options
> out of that set?

Please can everyone who is interested in this topic indicate their date
preference/availability at:

http://doodle.com/cy88dhwzybg7hh7p

I've gone with the usual 5pm BST slot for simplicity. That's 12 noon
EDT, 9am PDT and 6pm CEST.

If it turns out we can't find an option at that time then we can try
again with some other times in the mix.

Ian.




[Xen-devel] Xen + futexes

2015-07-02 Thread Vitaly Chernooky
Hi all!

I have a theoretical question.

What do you think about porting futexes or binder to XEN?

With best regards,

-- 
*Vitaly Chernooky | Senior Developer - Product Engineering and Development*
GlobalLogic
P *+380.44.4929695 ext.1136* M *+380.63.6011802* S cvv_2k
www.globallogic.com

http://www.globallogic.com/email_disclaimer.txt


[Xen-devel] [linux-3.16 test] 59012: tolerable FAIL - PUSHED

2015-07-02 Thread osstest service user
flight 59012 linux-3.16 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59012/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemut-rhel6hvm-intel 12 guest-start/redhat.repeat fail in 58996 pass in 59012
 test-armhf-armhf-xl-arndale   6 xen-boot fail pass in 58996
 test-armhf-armhf-xl-multivcpu  7 host-ping-check-xen fail pass in 58996

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 14 guest-localmigrate fail in 58996 baseline untested
 test-armhf-armhf-xl-rtds 14 guest-start.2  fail in 58996 baseline untested
 test-amd64-i386-libvirt  11 guest-start  fail   like 58447
 test-amd64-amd64-xl-credit2  17 guest-localmigrate/x10   fail   like 58447
 test-amd64-i386-libvirt-xsm  11 guest-start  fail   like 58447
 test-amd64-amd64-xl-qemut-win7-amd64 16 guest-stop fail like 58447

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale  12 migrate-support-check fail in 58996 never pass
 test-armhf-armhf-xl-multivcpu 12 migrate-support-check fail in 58996 never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-i386-freebsd10-i386  9 freebsd-install  fail never pass
 test-amd64-i386-freebsd10-amd64  9 freebsd-install fail never pass
 test-amd64-amd64-xl-multivcpu 17 guest-localmigrate/x10   fail  never pass
 test-amd64-amd64-xl-rtds 17 guest-localmigrate/x10   fail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 15 guest-start/debian.repeat fail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-xsm  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 12 migrate-support-check fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail never pass
 test-armhf-armhf-libvirt 12 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt-xsm 12 migrate-support-check fail   never pass

version targeted for testing:
 linux                26749e751cc2be7bc0f17a6cca68f6e5c0675191
baseline version:
 linux                162d64326176ee1916fb98323d810c78a7e3d042


People who touched revisions under test:
  Aaro Koskinen 
  Adam Jackson 
  Alex Deucher 
  Andrew Morton 
  Andy Lutomirski 
  Axel Lin 
  Chengyu Song 
  Chris Mason 
  Clemens Ladisch 
  Dan Carpenter 
  Dan Williams 
  Dave Airlie 
  David S. Miller 
  David Woodhouse 
  Dieter Jurzitza
  Dmitry Torokhov 
  Eric Dumazet 
  Fabio Estevam 
  Fan Du 
  Felipe Balbi 
  Filipe Manana 
  Florian Fainelli 
  Greg Kroah-Hartman 
  Gregory CLEMENT 
  Gu Zheng 
  H. Peter Anvin 
  Hannes Frederic Sowa 
  Hans de Goede 
  Herbert Xu 
  Horia Geanta 
  Hui Wang 
  Ian Campbell 
  Ingo Molnar 
  James Hogan 
  Jan Kara 
  Jani Nikula 
  Jason A. Donenfeld 
  Jean Delvare 
  Jeff Mahoney 
  Jenny Falkovich 
  Jens Axboe 
  Jiang Liu 
  Jim Bride 
  Jiri Pirko 
  Johan Hovold 
  Johannes Berg 
  John D. Blair 
  Jonathan Cameron 
  Jérôme Glisse 
  Kim Phillips 
  Lars-Peter Clausen 
  Laura Abbott 
  Li RongQing 
  Linus Torvalds 
  Luis Henriques 
  Lukas Wunner 
  Mark Salyzyn 
  Michael S. Tsirkin 
  Michal Marek 
  Nadav Haklai 
  Nicholas Bellinger 
  Nicholas Mc Guire 
  nightmixes 
  Nikolay Aleksandrov 
  Oliver Grafe  (v2)
  Patrick Riphagen 
  Paul Cercueil 
  Peter Feuerer 
  Peter Hutterer 
  Peter Kümmel 
  Philipp Zabel 
  Ralf Baechle 
  Richard Cochran 
  Ross Lagerwall 
  Sachin Kamat 
  Sagi Grimberg 
  Sam Hung 
  Shawn Bohrer 
  Steffen Klassert 
  Steve Cornelius 
  Steven Rostedt 
  Subbaraya Sundeep Bhatta 
  Subbaraya Sundeep Bhatta 
  Takashi Iwai 
  Tejun Heo 
  Thomas Gleixner 
  Tim Gardner 
  tommy.gag...@gmail.com
  Veaceslav Falico 
  Victoria Milhoan 
  Vince Weaver 
  Vlad Yasevich 
  Vladislav Yasevich 
  Wang Long 
  Wei Liu 
  Wolfram Sang 
  洪一竹 


jobs:
 build-amd64-xsm  pass
 build-armhf-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 bu

Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy

2015-07-02 Thread Chen, Tiejun

in high level I have to say Yes. If you really read that v2 design and
its associated discussion, you should notice I didn't put any response
right there.


Look, I'm getting a bit angry at your continual implication that I


Sorry about this.


haven't put in enough work reading the background for this series.  If
you go back and look at the v2 design discussion, you'll see that I was
actively involved in that discussion, and sent at least a dozen emails
about it.  I have now spent nearly two full days just on this series,


Sure and thanks for your review and time.


including going back over lots of conversations that have happened
before to find answers to questions which you could have given in a
single line; and also to check assertions that you've made which have
turned out to be false.

In the v2 design discussion, the only thing I could find regarding the
relationship between per-device settings and the domain-wide setting was
where you said [1]:

"per-device override is always favored if a conflicting setting in
rmrr_host."

And in v2, Wei asked you [2]:

"But this only works with global configuration and individual
configuration in PCI spec trumps this, right?"

And you responded [3]:

"You're right."

Now it happens that in all those cases you were literally talking about
the rmrr_host part of the configuration, not the strict/relaxed part of
the configuration; but that doesn't even make sense, since there *is* no
device-specific rmrr_host setting -- the only configuration which has
both a domain-wide and per-device component is the relaxed/strict.

So:

1. After spending yet another half hour doing research, I haven't found
any discussion that concluded we should have the global policy override
the local policy


I also took some time to go back and check this point, and indeed this is
not in that public design. As I mentioned in another email following
this one, I also talked to Kevin about this issue. It looks like
this was concluded from our internal discussion, and he didn't post
it in the v2 design because, as you know, that design is about
the high level. As I recall, those discussions couldn't cover
everything at that moment because they thought we'd better post
preliminary patches for further discussion, since this is really a
complicated case. So afterwards I sent out two RFC revisions to help
everyone finalize a good solution. I can confirm the current policy has
been the same since the first RFC, but we didn't see any opposing advice
until now.




2. The only discussion I *did* find has *you yourself* saying that the
per-device setting should override the global setting, not once, but
twice; and nobody contradicting you.

Maybe there is somewhere else a discussion somewhere where this was
changed; but I've already spent half an hour this morning looking at
where you said it was (v2 design discussion), and found the opposite --
just as I remembered.  I'm not going to look anymore.

You have now caused me to waste an awful lot of time on this series that
could profitably have been used elsewhere.


Sorry about this, but we already have 2 RFC revisions and 4
revisions without RFC, and some patches are already Acked. Should we
really overturn this policy right now?




[1]
marc.info/?i=

[2] marc.info/?i=<20150519110041.gb21...@zion.uk.xensource.com>

[3] marc.info/?i=<555c1b5c.7070...@intel.com>



I was involved in the design discussion, and from the very beginning I
probably saw your plan but misunderstood it.  I wouldn't be surprised if
some others didn't quite understand what they were agreeing to.


Again, I wasn't involved in the v2 design. So here I don't want to cause any
confusion for you with my reply.


This is your feature, so it is your responsibility to understand and
explain why you are doing what you are doing, if only to say "Jan wanted


Maybe you remember I posted v1, but according to some feedback that was
not a good design for this implementation, so Kevin
issued the v2 revision and had a wider discussion with you guys. Since then
I have just followed that version. So I mean I don't hold a position on these
things at the high level, since I think either policy is fine with me because
IMO these two approaches are optional.



X to happen because of Y [see $ref]."



So this is why I said you'd better ask Kevin or Jan about this, since I
can't decide what's next at this point.


Thanks
Tiejun



Re: [Xen-devel] [PATCH] x86/p2m-ept: Don't unmap the EPT pagetable while it is still in use

2015-07-02 Thread George Dunlap
On 06/30/2015 06:09 PM, Andrew Cooper wrote:
> The call to iommu_pte_flush() between the two hunks uses &ept_entry->epte
> which is a pointer into the mapped page.
> 
> It is eventually passed to the `clflush` instruction, which will suffer a pagefault
> if the virtual mapping has fallen out of the TLB.
> 
> (XEN) [ Xen-4.5.0-xs102594-d  x86_64  debug=y  Not tainted ]
> (XEN) CPU:7
> (XEN) RIP:e008:[] cacheline_flush+0x4/0x9
> 
> (XEN) Xen call trace:
> (XEN)[] cacheline_flush+0x4/0x9
> (XEN)[] __iommu_flush_cache+0x4a/0x6a
> (XEN)[] iommu_pte_flush+0x2b/0xd5
> (XEN)[] ept_set_entry+0x4bc/0x61f
> (XEN)[] p2m_set_entry+0xd1/0x112
> (XEN)[] clear_mmio_p2m_entry+0x1a0/0x200
> (XEN)[] unmap_mmio_regions+0x49/0x73
> (XEN)[] do_domctl+0x15bd/0x1edb
> (XEN)[] syscall_enter+0xeb/0x145
> (XEN)
> (XEN) Pagetable walk from 820040004ae0:
> (XEN)  L4[0x104] = 0008668a5063 
> (XEN)  L3[0x001] = 0008668a3063 
> (XEN)  L2[0x000] = 00086689c063 
> (XEN)  L1[0x004] = 00056f078063 0007f678
> (XEN)
> (XEN) 
> (XEN) Panic on CPU 7:
> (XEN) FATAL PAGE FAULT
> (XEN) [error_code=]
> (XEN) Faulting linear address: 820040004ae0
> (XEN) 
> 
> Signed-off-by: Andrew Cooper 
> CC: Jan Beulich 
> CC: George Dunlap 
> CC: Jun Nakajima 
> CC: Eddie Dong 
> CC: Kevin Tian 

Reviewed-by: George Dunlap 

 -George




Re: [Xen-devel] [PATCH] OSSTest: stop testing SEDF at all

2015-07-02 Thread George Dunlap
On 06/30/2015 06:36 PM, Dario Faggioli wrote:
> SEDF has been broken and unmaintained at least
> since Xen 4.2, and most likely even before! Tests
> are failing without anyone caring, and yet we
> keep using test resources for them.
> 
> Let's stop doing this!
> 
> Signed-off-by: Dario Faggioli 

Acked-by: George Dunlap 

> ---
> diff --git a/allow.all b/allow.all
> index 88a3038..8067d5e 100644
> --- a/allow.all
> +++ b/allow.all
> @@ -1,4 +1,3 @@
> -test-@@-sedf@@
>  test-@@-rtds@@
>  build-@@logs-capture@@
>  test-@@-pcipt@@
> diff --git a/make-flight b/make-flight
> index 31cb942..c763ce9 100755
> --- a/make-flight
> +++ b/make-flight
> @@ -274,21 +274,6 @@ do_hvm_rhel6_tests () {
>done
>  }
>  
> -do_sedf_tests () {
> -  if [ $xenarch != $dom0arch -o x$test_sedf != xy ]; then
> -return
> -  fi
> -
> -  for pin in '' -pin; do
> -job_create_test test-$xenarch$kern-$dom0arch-xl-sedf$pin  \
> -   test-debian xl $xenarch $dom0arch  \
> -guests_vcpus=4\
> -xen_boot_append="sched=sedf loglvl=all ${pin:+dom0_vcpus_pin}" \
> -linux_boot_append='loglevel=9 debug'  \
> -$debian_runvars all_hostflags=$most_hostflags
> -  done
> -}
> -
>  do_credit2_tests () {
>if [ $xenarch != $dom0arch ]; then
>  return
> @@ -374,19 +359,18 @@ test_matrix_do_one () {
>  
>do_multivcpu_tests
>  
> -  # RTDS came in 4.4, while SEDF is going away in 4.6
> +  # RTDS came in 4.5
>case "$xenbranch" in
> -  xen-3.*-testing) test_sedf=y; test_rtds=n ;;
> -  xen-4.0-testing) test_sedf=y; test_rtds=n ;;
> -  xen-4.1-testing) test_sedf=y; test_rtds=n ;;
> -  xen-4.2-testing) test_sedf=y; test_rtds=n ;;
> -  xen-4.3-testing) test_sedf=y; test_rtds=n ;;
> -  xen-4.4-testing) test_sedf=y; test_rtds=n ;;
> -  xen-4.5-testing) test_sedf=y; test_rtds=y ;;
> -  *)   test_sedf=n; test_rtds=y ;;
> +  xen-3.*-testing) test_rtds=n ;;
> +  xen-4.0-testing) test_rtds=n ;;
> +  xen-4.1-testing) test_rtds=n ;;
> +  xen-4.2-testing) test_rtds=n ;;
> +  xen-4.3-testing) test_rtds=n ;;
> +  xen-4.4-testing) test_rtds=n ;;
> +  xen-4.5-testing) test_rtds=y ;;
> +  *)   test_rtds=y ;;
>esac
>  
> -  do_sedf_tests
>do_rtds_tests
>do_credit2_tests
>  
> 




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 08:58 +, Wu, Feng wrote:

> > -Original Message-
> > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> >
> > This is the third time that I ask:
> >  (1) whether it is possible to have more vcpus queued on one pcpu PI
> >  blocked list with desc.on (I really believe it is);
> 
> I think it is, please see the following scenario:
> 
> When cpu masks the interrupts, and an external interrupt occurs for the
> assigned device while the target vCPU2 is blocked, the wakeup notification
> event handler has no chance to run, after a while, another wakeup
> notification event for vCPU4 blocking on the same pCPU occurs,
> after cpu unmasks the interrupts, wakeup notification handler
> gets called. Then we get:
>   vCPU2, desc.on = 1 and vCPU4, desc.on = 1
> Then in the handler we need to kick both of them.
> 
Ok, first of all, thanks for answering! :-)

And yes, this makes sense.

> >  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
> >  soon as any notification arrives, instead that putting together a
> >  mechanism for kicking only a specific one.
> > 
> Why can't we kick all of them, 'desc.on = 1' means there is a pending
> interrupt, when we meet this condition, kicking the related vCPU should
> be the right thing to do.
> 
Right, I see it now. I felt like I was missing something, and that's why
I was asking you to elaborate a bit more.
Thanks again for having done this. I was missing/forgetting half of the
way desc.on is actually handled, sorry for this.

BTW, I'm finding it hard reading this series from the archives; there
appear to be some threading issues and some missing messages. I also
don't have it in my inbox, because my filters failed to spot and flag it
properly. If you send a new version, please, Cc me, so it will be easier
for me to look at all the patches, and provide a more helpful review.

Thanks and Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy

2015-07-02 Thread George Dunlap
On 07/02/2015 11:01 AM, Chen, Tiejun wrote:
>> 1. After spending yet another half hour doing research, I haven't found
>> any discussion that concluded we should have the global policy override
>> the local policy
> 
> I also took some time to go back checking this point and indeed this is
> not in that public design. And as I mentioned in another email which is
> following this, I also had a talk to Kevin about this issue, and looks
> this is just concluded from our internal discussion and he didn't post
> this in v2 design again because as you know, that design is about
> something in high level. And as I recall, these discussions can't cover
> everything at that moment because they thought we'd better post a
> preliminary patches to further discuss something since this is really a
> complicated case. So afterwards I sent out two RFC revisions to help all
> guys finalize a good solution. And I can confirm current policy is
> always same from the first RFC, but we didn't see any opposite advice
> until now.

Probably because the reviewers all assumed that the design draft had
been followed, and you didn't make it clear that you'd changed it.

>> 2. The only discussion I *did* find has *you yourself* saying that the
>> per-device setting should override the global setting, not once, but
>> twice; and nobody contradicting you.
>>
>> Maybe there is somewhere else a discussion somewhere where this was
>> changed; but I've already spent half an hour this morning looking at
>> where you said it was (v2 design discussion), and found the opposite --
>> just as I remembered.  I'm not going to look anymore.
>>
>> You have now caused me to waste an awful lot of time on this series that
>> could profitably have been used elsewhere.
> 
> Sorry to this but I just think we already have 2 RFC revisions and 4
> revisions without RFC, and some patches are already Acked, we really
> should overturn this policy right now?

First of all, I think it's easy to change.

Even if it weren't, I already said that I'd be OK with accepting the
patch series with the existing "override" semantics, and without the
"default" semantics, *if* it were renamed to make it clear what was
going on.

But, for future reference, I am not going to approve an interface I
think is misleading or wrong -- particularly one like the xl interface
which we want to avoid changing if possible -- just because time is
short.  One of my own features, HVM USB pass-through, has narrowly
missed two releases (including the current one) because we wanted to be
careful to get the interface right.

>>> Again, I didn't walk into v2 design. So here I don't want to bring any
>>> confusion to you just with my reply.
>>
>> This is your feature, so it is your responsibility to understand and
>> explain why you are doing what you are doing, if only to say "Jan wanted
> 
> Maybe you remember I just posted v1 but looks that was not a better
> design to show this implementation according to some feedback, so Kevin
> issued v2 revision and had a wider discussion with you guys. Since then
> I just follow this version. So I mean I don't further hold these things
> in high level since I just think both policy is fine to me because IMO,
> these two approaches are optional.
> 
>> X to happen because of Y [see $ref]."
>>
> 
> So this is why I said you'd better ask this to Kevin or Jan since I
> can't decide what's next at this point.

Let me say that again: I don't care whether anyone "pulled rank" and
ordered you to do something a certain way.  YOU are the one submitting
this patch.  That means YOU are responsible for understanding why they want
it that way, and YOU are responsible for justifying it to other people.
 If you don't understand it at all, it's YOUR responsibility to get them
to explain it, not mine to chase them down.

 -George




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Andrew Cooper
On 02/07/15 09:30, Dario Faggioli wrote:
> On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:
>
>>>>> +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
>>>>> +pi_blocked_vcpu_list)
>>>>> +if ( vmx->pi_desc.on )
>>>>> +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
>>>> There is a logical bug here.  If we have two NV's delivered to this
>>>> pcpu, we will kick the first vcpu twice.
>>>>
>>>> On finding desc.on, a kick should be scheduled, then the vcpu removed
>>>> from this list.  With desc.on set, we know for certain that another NV
>>>> will not arrive for it until it has been scheduled again and the
>>>> interrupt posted.
>>>>
>>> Yes, that seems a possible issue (and one that should indeed be
>>> avoided).
>>>
>>> I'm still unsure about the one that I raised myself but, if it is
>>> possible to have more than one vcpu in a pcpu list, with desc.on==true,
>>> then it looks to me that we kick all of them, for each notification.
>>>
>>> Added what Andrew's spotted, if there are a bunch of vcpus, queued with
>>> desc.on==true, and a bunch of notifications arrives before the tasklet
>>> gets executed, we'll be kicking the whole bunch of them for a bunch of
>>> times! :-/
>> As Andrew mentioned, removing the vCPUs with desc.on = true from the
>> list can avoid kick vCPUs for multiple times.
>>
> It avoids kicking vcpus multiple times if more than one notification
> arrives, yes.
>
> It is, therefore, not effective in making sure that, even with only one
> notification, you only kick the interested vcpu.
>
> This is the third time that I ask:
>  (1) whether it is possible to have more vcpus queued on one pcpu PI 
>  blocked list with desc.on (I really believe it is);
>  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
>  soon as any notification arrives, instead that putting together a
>  mechanism for kicking only a specific one.

We will receive one NV for every time the hardware managed to
successfully set desc.on

If multiple stack up and we proactively drain the list, we will
subsequently search the list to completion for all remaining NV's, due
to finding no appropriate entries.

I can't currently decide whether this will be quicker or slower overall,
or (most likely) it will even out to equal in the general case.

~Andrew
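
As an illustration of the "kick, then remove from the list" behaviour discussed above, here is a minimal standalone sketch. It is not the Xen code: struct blocked_vcpu, its fields and pi_wakeup_drain() are hypothetical stand-ins, a plain singly linked list replaces the per-pCPU list, and incrementing a counter stands in for tasklet_schedule().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU queued on a pCPU's PI blocked list. */
struct blocked_vcpu {
    struct blocked_vcpu *next;
    bool on;          /* mirrors pi_desc.on: a notification is pending */
    unsigned kicks;   /* how many times this vCPU has been woken */
};

/*
 * Handle one notification vector (NV): kick every vCPU that has 'on' set
 * and unlink it, so that a later NV cannot kick it a second time.
 * Returns the number of vCPUs kicked by this call.
 */
static unsigned pi_wakeup_drain(struct blocked_vcpu **head)
{
    unsigned kicked = 0;
    struct blocked_vcpu **pp = head;

    while (*pp != NULL) {
        struct blocked_vcpu *v = *pp;

        if (v->on) {
            v->kicks++;        /* stands in for tasklet_schedule() */
            kicked++;
            *pp = v->next;     /* remove from the blocked list */
            v->next = NULL;
        } else {
            pp = &v->next;     /* leave vCPUs with nothing pending queued */
        }
    }
    return kicked;
}
```

With two vCPUs pending and two NVs delivered, the first call kicks both and the second call walks the remaining list to completion and finds nothing to do, which is the trade-off described above.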



[Xen-devel] [PATCH] doc: Fix nonexistent error code in libxl_event_check example

2015-07-02 Thread Euan Harris
Fix example code in comment.  libxl_event_check() can return
ERROR_NOT_READY;  LIBXL_NOT_READY does not exist.

Signed-off-by: Euan Harris 
---
 tools/libxl/libxl_event.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/libxl/libxl_event.h b/tools/libxl/libxl_event.h
index 3c6fcfe..fad4c14 100644
--- a/tools/libxl/libxl_event.h
+++ b/tools/libxl/libxl_event.h
@@ -213,7 +213,7 @@ void libxl_evdisable_disk_eject(libxl_ctx *ctx, libxl_evgen_disk_eject*);
  *  libxl_osevent_afterpoll(...);
  *  for (;;) {
  *  r = libxl_event_check(...);
- *  if (r==LIBXL_NOT_READY) break;
+ *  if (r==ERROR_NOT_READY) break;
  *  if (r) goto error_out;
  *  do something with the event;
  *  }
-- 
2.4.3




Re: [Xen-devel] [RFC PATCH v3 10/18] xen/arm: ITS: Add APIs to add and assign device

2015-07-02 Thread Julien Grall
On 02/07/15 10:01, Ian Campbell wrote:
> On Thu, 2015-07-02 at 14:10 +0530, Vijay Kilari wrote:
>> On Mon, Jun 29, 2015 at 5:59 PM, Ian Campbell wrote:
>>> On Mon, 2015-06-22 at 17:31 +0530, vijay.kil...@gmail.com wrote:
>>>> +/* Device assignment. Should be called from pci_device_add */
>>>> +int its_add_device(struct domain *d, u32 devid)
>>>> +{
>>>
>>> Prior to the PCI series landing, and to enable dom0 to use ITS it might
>>> be possible to call this from xen/arch/arm/platforms/thunderx.c via the
>>> specific_mappings platform hook, which would also expose the PCI
>>> controller to dom0 via a series of specific mmio mappings (look at
>>> xen/arch/arm/platforms/xgene-storm.c for the sort of thing I mean).
>>>
>>> That would, I think, give basic PCI functionality for dom0 (i.e.
>>> allowing us to boot on thunderx) and decouple things from the PCI series
>>> somewhat, which ought to make things easier overall IMHO.
>>
>>   In the ThunderX case, the PCI RC mmio mappings are parsed to find devices.
>> How do we know device ids upfront to call its_add_device?
> 
> Ah yes, this approach probably doesn't work for ITS. Which is a shame.

I guess the number of device IDs supported by ThunderX is bounded and
not so big. So you could do a loop which registers all the IDs from 0 to N.

Though, it may be possible to be smarter.
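
A rough sketch of the "register every ID" loop suggested above. Everything except the its_add_device() signature is an assumption: ITS_MAX_DEVID is a made-up bound (the thread does not give the real ThunderX range), its_add_device() is stubbed out here so the fragment stands alone, and uint32_t is used in place of Xen's u32.

```c
#include <stdint.h>

struct domain;                       /* opaque in this sketch */

/* Hypothetical bound; the real device ID range is not stated in the thread. */
#define ITS_MAX_DEVID 16u

static unsigned int its_registered;  /* counts successful registrations */

/* Stub with the signature from the quoted patch; the real one is in the ITS driver. */
static int its_add_device(struct domain *d, uint32_t devid)
{
    (void)d;
    (void)devid;
    its_registered++;
    return 0;
}

/* Register every device ID in [0, ITS_MAX_DEVID), stopping at the first failure. */
static int its_add_all_devices(struct domain *d)
{
    for (uint32_t devid = 0; devid < ITS_MAX_DEVID; devid++) {
        int rc = its_add_device(d, devid);

        if (rc) {
            return rc;
        }
    }
    return 0;
}
```

The cost is one registration per possible ID whether or not a device exists behind it, which is presumably why a smarter approach may be preferable.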

Regards,

-- 
Julien Grall



Re: [Xen-devel] Xen 4.6 Development Update (2 WEEKS TO FREEZE, important information in preamble)

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 11:51 +0300, Razvan Cojocaru wrote:
> On 07/02/2015 11:35 AM, Ian Campbell wrote:
> > It seems that many people are unable to follow these simple
> > instructions.
> 
> The quotes part is obvious (and not that many people have omitted to do
> that), but to be honest I haven't been clear on who is supposed to be in
> the trimmed CC list: the maintainers of the code I'm touching with my
> series and Wei? 
>
Exactly.

It's the 'interested parties' that should be Cc-ed, which, 99% of the
times, is exactly what you just said:
 - the maintainers because, well, they are the maintainers, they know
   the code, they most likely know your series and will be able to
   engage in a conversation on whether the estimation is correct or not;
 - the release manager, since we're discussing release;
 - there might be more people, such as, people that have been involved
   in the review process, despite not being maintainers, or... no,
   that's all that comes to my mind.

So, trim to such a set, and you'll make most of the people happy, I bet.

> There doesn't seem to be a clear rule about who should
> be replied to (or maybe there is and I've missed it? If so, could you
> please point it out?). Maybe clearing this up could help with this
> problem in the future.
> 
I think it was pretty obvious. I guess it would not harm to add a line
making this crystal clear in the mail, but it's Wei's call to judge
whether that is really necessary.

Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [v3 00/13] xen/arm: Add support for GICv2 on GICv3

2015-07-02 Thread Julien Grall
On 01/07/15 12:00, Julien Grall wrote:
> Hi all,

Hi Ian,

> This patch series adds support for GICv2 on GICv3. This feature is available
> only when the GICv3 hardware is compatible with GICv2.
> 
> When that is the case, the same interface is provided in order to use a
> virtualized GICv2 (i.e. GICC and GICV). This will allow us to re-use the
> same vGIC driver.
> 
> Currently the GIC and vGIC drivers are tightly coupled because the domain
> initialization is split between GIC and vGIC. This patch series intends to
> remove this dependency in order to make the vGIC driver agnostic of the GIC
> driver.
> 
> It has been tested on the ARMv8 Foundation Model with GICv2 and GICv3 as
> well as changing the vGIC version emulated for the guest (only on GICv3 host).
> 
> A branch with all the patches can be found here:
> git://xenbits.xen.org/people/julieng/xen-unstable.git branch 
> gicv2-on-gicv3-v3
> 
> All the patches have been acked except #11 and #12.

I was wondering if you can apply patch #1-#10 as they are already acked.
It would save me from resending the whole series.

If you ack #11, you could even apply #11 and #13 (#12 is independent).

Regards,

-- 
Julien Grall



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Thursday, July 02, 2015 6:10 PM
> To: Wu, Feng
> Cc: Andrew Cooper; xen-devel@lists.xen.org; Zhang, Yang Z;
> george.dun...@eu.citrix.com; Tian, Kevin; k...@xen.org; jbeul...@suse.com
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> On Thu, 2015-07-02 at 08:58 +, Wu, Feng wrote:
> 
> > > -Original Message-
> > > From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> > >
> > > This is the third time that I ask:
> > >  (1) whether it is possible to have more vcpus queued on one pcpu PI
> > >  blocked list with desc.on (I really believe it is);
> >
> > I think it is, please see the following scenario:
> >
> > When cpu masks the interrupts, and an external interrupt occurs for the
> > assigned device while the target vCPU2 is blocked, the wakeup notification
> > event handler has no chance to run, after a while, another wakeup
> > notification event for vCPU4 blocking on the same pCPU occurs,
> > after cpu unmasks the interrupts, wakeup notification handler
> > gets called. Then we get:
> > vCPU2, desc.on = 1 and vCPU4, desc.on = 1
> > Then in the handler we need to kick both of them.
> >
> Ok, first of all, thanks for answering! :-)
> 
> And yes, this makes sense.
> 
> > >  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
> > >  soon as any notification arrives, instead that putting together a
> > >  mechanism for kicking only a specific one.
> > >
> > Why can't we kick all of them, 'desc.on = 1' means there is a pending
> > interrupt, when we meet this condition, kicking the related vCPU should
> > be the right thing to do.
> >
> Right, I see it now. I felt like I was missing something, and that's why
> I was asking you to elaborate a bit more.
> Thanks again for having done this. I was missing/forgetting half of the
> way desc.on is actually handled, sorry for this.
> 
> BTW, I'm finding it hard reading this series from the archives; there
> appear to be some threading issues and some missing messages. I also
> don't have it in my inbox, because my filters failed to spot and flag it
> properly. If you send a new version, please, Cc me, so it will be easier
> for me to look at all the patches, and provide a more helpful review.

Sure, thanks for the review!

Thanks,
Feng

> 
> Thanks and Regards,
> Dario
> 
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access

2015-07-02 Thread George Dunlap
On 06/29/2015 04:45 PM, Vitaly Kuznetsov wrote:
> 'pfn' and 'start_pfn' are ambiguous, both these functions expect GFNs as
> input.
> 
> On x86 the interface of p2m_set_mem_access() in p2m.c doesn't match the
> declaration in p2m-common.h as 'pfn' is being used instead of 'start_pfn'.
> 
> On ARM both p2m_set_mem_access and p2m_get_mem_access interfaces don't match
> declarations from p2m-common.h: p2m_set_mem_access uses 'pfn' instead of
> 'start_pfn' and p2m_get_mem_access uses 'gpfn' instead of 'pfn'.
> 
> Convert p2m_get_mem_access/p2m_set_mem_access (and __p2m_get_mem_access on ARM)
> interfaces to using gfn_t instead of unsigned long and update all users of
> these functions.
> 
> There is also an issue in p2m_get_mem_access on x86: 'gfn' parameter passed to
> gfn_lock/gfn_unlock is not defined. This code compiles only because of a
> coincidence: gfn_lock/gfn_unlock are currently macros which don't use their
> second argument.
> 
> Signed-off-by: Vitaly Kuznetsov 
> ---
> Changes since v3:
> - Comment coding style fix [Razvan Cojocaru]
> - Use INVALID_GFN instead of ~0 and -1 [Andrew Cooper]
> - Convert p2m_get_mem_access/p2m_set_mem_access interfaces to using gfn_t
>   [Andrew Cooper]

But you missed a change...

> @@ -1600,9 +1600,11 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned long gla,
>  return (p2ma == p2m_access_n2rwx);
>  }
>  
> -/* Set access type for a region of pfns.
> - * If start_pfn == -1ul, sets the default access type */
> -long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr,
> +/*
> + * Set access type for a region of gfns.
> + * If gfn == INVALID_GFN, sets the default access type.
> + */
> +long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr,
>  uint32_t start, uint32_t mask, xenmem_access_t access)
>  {
>  struct p2m_domain *p2m = p2m_get_hostp2m(d);
> @@ -1638,18 +1640,19 @@ long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr,
>  return -EINVAL;
>  }
>  
> -/* If request to set default access */
> -if ( pfn == ~0ul )
> +/* If request to set default access. */
> +if ( gfn_x(gfn) == INVALID_GFN )
>  {
>  p2m->default_access = a;
>  return 0;
>  }
>  
>  p2m_lock(p2m);
> -for ( pfn += start; nr > start; ++pfn )
> +for ( gfn = _gfn(gfn_x(gfn) + start); nr > start;
> +  gfn = _gfn(gfn_x(gfn) + 1) )

Now in p2m_set_mem_access(), rather than just using an unsigned long in
the loop iterating over gfns, you do this thing where you convert gfn_t
to unsigned long, add one, and then convert it back to gfn_t again.

I can't see any comments in v3 that suggest doing that, and it seems
a bit clunky.  Is that really necessary?  Wouldn't it be better to
declare a local variable?

I'm not strongly opinionated on this one, it just seems a bit strange.
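
To make the two loop styles concrete, here is a small standalone comparison. The gfn_t, _gfn() and gfn_x() definitions are simplified stand-ins written for this sketch (an assumption, not Xen's actual definitions), and summing the visited gfns merely stands in for the per-gfn work done in p2m_set_mem_access().

```c
#include <stdint.h>

/* Simplified stand-ins for Xen's typesafe gfn wrappers (assumed for this sketch). */
typedef struct { unsigned long gfn; } gfn_t;
static inline gfn_t _gfn(unsigned long n) { gfn_t g = { n }; return g; }
static inline unsigned long gfn_x(gfn_t g) { return g.gfn; }

/* Style used in the patch: unwrap and re-wrap the gfn_t on every iteration. */
static unsigned long sum_rewrap(gfn_t gfn, uint32_t nr, uint32_t start)
{
    unsigned long sum = 0;

    for (gfn = _gfn(gfn_x(gfn) + start); nr > start;
         gfn = _gfn(gfn_x(gfn) + 1), ++start) {
        sum += gfn_x(gfn);
    }
    return sum;
}

/* Suggested alternative: unwrap once into a local and iterate on that. */
static unsigned long sum_local(gfn_t gfn, uint32_t nr, uint32_t start)
{
    unsigned long sum = 0;
    unsigned long gfn_l = gfn_x(gfn) + start;

    for (; nr > start; ++gfn_l, ++start) {
        sum += gfn_l;   /* a real loop would pass _gfn(gfn_l) to the callee */
    }
    return sum;
}
```

Both functions visit the same gfns; the second keeps the type-safe wrapper at the interface while the loop arithmetic stays on a plain integer.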

Everything else looks good, thanks.

 -George




Re: [Xen-devel] Xen + futexes

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 12:48 +0300, Vitaly Chernooky wrote:
> Hi all!
>
Hi,
> 
> I have theoretical question.
>
Great, I love theoretical stuff! :-P
> 
> What do you think about porting futexes or binder to XEN?
>
Ouch... That's *too* theoretical! :-)

No, really, I know a bit about futexes [0], while I know nothing about
binder (I'll investigate).

Actually, we don't even have mutexes _within_ the hypervisor, so I
struggle a bit to figure out how it would be useful to offer a similar
support to whatever is userspace for us (i.e., what futexes do, as far
as I can recall).

So, that's why I was joking about this being too
theoretical: do you have a use case in mind where something like that
can help and be useful? If yes, I'd be interested in hearing about it.

Regards,
Dario

[0] http://www.akkadia.org/drepper/futex.pdf
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-07-08)

2015-07-02 Thread Konrad Rzeszutek Wilk
On July 2, 2015 3:32:14 AM EDT, "Roger Pau Monné"  wrote:
>On 01/07/15 at 18.15, Boris Ostrovsky wrote:
>> On 07/01/2015 11:57 AM, Ian Campbell wrote:
>>> The next Xen technical call will be at:
>>>  Wed  8 Jul 17:00:00 BST 2015
>>>  `date -d @1436371200`
>>>
>>> See
>http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html
>>> for more information on the call.
>>>
>>> Please let me know (CC-ing the list) any topics which you would like
>to
>>> discuss. It might be useful to include:
>>>
>>>* References to any relevant/recent mailing list threads;
>>>* Other people who you think should be involved in the discussion
>(and
>>>  CC them);
>>>
>>> If you would like to attend then please let me know so I can send
>you the
>>> dial in details.
>> 
>> 
>> Given that there is fair amount of PVH-related work happening now
>> (Roger's, Elena's and mine) perhaps we should have a discussion about
>> that to see where we are going?
>> 
>> Andrew, Tim, Roger, Jan (if he is back from vacation), Elena, Konrad
>and
>> David would be good to have present.
>
>I'm sorry but I will be on vacation next week, hiking in the mountains
>without Internet or cell phone coverage, so I won't be able to join the
>meeting.

They might have those old payphones off the road :-)

Enjoy the hike!!
>
>I will be back on the 20th, maybe we can arrange to do another of this
>PVH meetings on the next technical call in early August?
>
>Roger.





Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Wu, Feng


> -Original Message-
> From: Andrew Cooper [mailto:andrew.coop...@citrix.com]
> Sent: Thursday, July 02, 2015 6:30 PM
> To: Dario Faggioli; Wu, Feng
> Cc: xen-devel@lists.xen.org; Zhang, Yang Z; george.dun...@eu.citrix.com;
> Tian, Kevin; k...@xen.org; jbeul...@suse.com
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> On 02/07/15 09:30, Dario Faggioli wrote:
> > On Thu, 2015-07-02 at 04:27 +, Wu, Feng wrote:
> >
> >>>>> +list_for_each_entry(vmx, &per_cpu(pi_blocked_vcpu, cpu),
> >>>>> +pi_blocked_vcpu_list)
> >>>>> +if ( vmx->pi_desc.on )
> >>>>> +tasklet_schedule(&vmx->pi_vcpu_wakeup_tasklet);
> >>>> There is a logical bug here.  If we have two NV's delivered to this
> >>>> pcpu, we will kick the first vcpu twice.
> >>>>
> >>>> On finding desc.on, a kick should be scheduled, then the vcpu removed
> >>>> from this list.  With desc.on set, we know for certain that another NV
> >>>> will not arrive for it until it has been scheduled again and the
> >>>> interrupt posted.
> >>>>
> >>> Yes, that seems a possible issue (and one that should indeed be
> >>> avoided).
> >>>
> >>> I'm still unsure about the one that I raised myself but, if it is
> >>> possible to have more than one vcpu in a pcpu list, with desc.on==true,
> >>> then it looks to me that we kick all of them, for each notification.
> >>>
> >>> Added what Andrew's spotted, if there are a bunch of vcpus, queued with
> >>> desc.on==true, and a bunch of notifications arrives before the tasklet
> >>> gets executed, we'll be kicking the whole bunch of them for a bunch of
> >>> times! :-/
> >> As Andrew mentioned, removing the vCPUs with desc.on = true from the
> >> list can avoid kick vCPUs for multiple times.
> >>
> > It avoids kicking vcpus multiple times if more than one notification
> > arrives, yes.
> >
> > It is, therefore, not effective in making sure that, even with only one
> > notification, you only kick the interested vcpu.
> >
> > This is the third time that I ask:
> >  (1) whether it is possible to have more vcpus queued on one pcpu PI
> >  blocked list with desc.on (I really believe it is);
> >  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
> >  soon as any notification arrives, instead that putting together a
> >  mechanism for kicking only a specific one.
> 
> We will receive one NV for every time the hardware managed to
> successfully set desc.on
> 
> If multiple stack up and we proactively drain the list, we will
> subsequently search the list to completion for all remaining NV's, due
> to finding no appropriate entries.
> 
> I can't currently decide whether this will be quicker or slower overall,
> or (most likely) it will even out to equal in the general case.

What do you mean by "general case"?

Thanks,
Feng

> 
> ~Andrew


Re: [Xen-devel] [PATCH RFC 4/6] xen: Print and use errno where applicable.

2015-07-02 Thread Stefano Stabellini
On Wed, 1 Jul 2015, Konrad Rzeszutek Wilk wrote:
> On Wed, Jul 01, 2015 at 02:01:07PM +0100, Stefano Stabellini wrote:
> > On Mon, 29 Jun 2015, Konrad Rzeszutek Wilk wrote:
> > > In Xen 4.6 commit cd2f100f0f61b3f333d52d1737dd73f02daee592
> > > "libxc: Fix do_memory_op to return negative value on errors"
> > > made the libxc API less odd-ball: On errors, return value is
> > > -1 and error code is in errno. On success the return value
> > > is either 0 or a positive value.
> > > 
> > > Since we could be running with an old toolstack in which the
> > > Exx value is in rc or the newer, we print both and return
> > > the -EXX depending on rc == -1 condition.
> > > 
> > > Signed-off-by: Konrad Rzeszutek Wilk 
> > > ---
> > >  xen-hvm.c | 10 ++
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/xen-hvm.c b/xen-hvm.c
> > > index 0408462..a92bc14 100644
> > > --- a/xen-hvm.c
> > > +++ b/xen-hvm.c
> > > @@ -345,11 +345,12 @@ go_physmap:
> > >  unsigned long idx = pfn + i;
> > >  xen_pfn_t gpfn = start_gpfn + i;
> > >  
> > > +/* In Xen 4.6 rc is -1 and errno contains the error value. */
> > >  rc = xc_domain_add_to_physmap(xen_xc, xen_domid, XENMAPSPACE_gmfn, idx, gpfn);
> > >  if (rc) {
> > >  DPRINTF("add_to_physmap MFN %"PRI_xen_pfn" to PFN %"
> > > -PRI_xen_pfn" failed: %d\n", idx, gpfn, rc);
> > > -return -rc;
> > > +PRI_xen_pfn" failed: %d (errno: %d)\n", idx, gpfn, rc, errno);
> > > +return rc == -1 ? -errno : -rc;
> > 
> > Printing both rc and errno is the right thing to do, but I am not sure
> > changing return value depending on the libxc version is a good idea.
> > Maybe we should be consistent and always return rc?
> 
> In Xen 4.5 and earlier this function would return -EINVAL (say rc=EINVAL).
> With Xen 4.6 it would always return 1 on errors (rc is -1, and with --1 we get 1),
> while the errno would have EINVAL.
> 
> To be consistent and have this function return a proper -Exx value we
> need that check to use errno in case rc == -1.

Maybe the best thing to do is to introduce a versioned
xen_xc_domain_add_to_physmap to include/hw/xen/xen_common.h
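
For illustration only, the rc/errno mapping from the patch could be factored into a small helper like the one below. The name xen_rc_to_neg_errno() is hypothetical and this is not the versioned wrapper proposed above; it simply applies the same `rc == -1 ? -errno : -rc` expression, assuming (as the patch does) that any nonzero rc from this call is an error.

```c
#include <errno.h>

/*
 * Hypothetical helper: map a libxc-style return code onto a negative errno.
 * Newer libxc (Xen 4.6): rc == -1 and the error code is in errno.
 * Older libxc: rc itself carries the Exx value, as the patch assumes.
 */
static int xen_rc_to_neg_errno(int rc)
{
    if (rc == 0) {
        return 0;              /* success, nothing to translate */
    }
    return rc == -1 ? -errno : -rc;
}
```

A caller would then write `return xen_rc_to_neg_errno(rc);` in the error path, so the function yields a negative Exx value with either toolstack.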


> I am uncomfortable with returning positive values as errors, which reminds me -
> I need to update the commit to mention the return 1 issue.

Agreed


> > 
> > 
> > >  }
> > >  }
> > >  
> > > @@ -422,11 +423,12 @@ static int xen_remove_from_physmap(XenIOState *state,
> > >  xen_pfn_t idx = start_addr + i;
> > >  xen_pfn_t gpfn = phys_offset + i;
> > >  
> > > +/* In Xen 4.6 rc is -1 and errno contains the error value. */
> > >  rc = xc_domain_add_to_physmap(xen_xc, xen_domid, XENMAPSPACE_gmfn, idx, gpfn);
> > >  if (rc) {
> > >  fprintf(stderr, "add_to_physmap MFN %"PRI_xen_pfn" to PFN %"
> > > -PRI_xen_pfn" failed: %d\n", idx, gpfn, rc);
> > > -return -rc;
> > > +PRI_xen_pfn" failed: %d (errno: %d)\n", idx, gpfn, rc, errno);
> > > +return rc == -1 ? -errno : -rc;
> > >  }
> > >  }
> > >  
> > > -- 
> > > 2.1.0
> > > 
> 



Re: [Xen-devel] [PATCH v3 2/6] libxl: do not add a vkb backend to hvm guests

2015-07-02 Thread Stefano Stabellini
On Wed, 1 Jul 2015, Konrad Rzeszutek Wilk wrote:
> On Wed, Jul 01, 2015 at 11:29:46AM +0100, Stefano Stabellini wrote:
> > On Tue, 30 Jun 2015, Konrad Rzeszutek Wilk wrote:
> > > On Tue, Jun 30, 2015 at 03:13:53PM +0100, Ian Campbell wrote:
> > > > On Tue, 2015-06-30 at 15:02 +0100, Stefano Stabellini wrote:
> > > > > On Tue, 30 Jun 2015, Ian Campbell wrote:
> > > > > > On Tue, 2015-06-30 at 12:21 +0100, Stefano Stabellini wrote:
> > > > > > > On Tue, 30 Jun 2015, Ian Campbell wrote:
> > > > > > > > On Mon, 2015-06-29 at 18:59 +0100, Stefano Stabellini wrote:
> > > > > > > > > On Thu, 25 Jun 2015, Ian Campbell wrote:
> > > > > > > > > > On Tue, 2015-06-16 at 16:39 +0100, Stefano Stabellini wrote:
> > > > > > > > > > > On Tue, 16 Jun 2015, Wei Liu wrote:
> > > > > > > > > > > > On Wed, Jun 10, 2015 at 11:09:50AM +0100, Stefano Stabellini wrote:
> > > > > > > > > > > > > When QEMU restricts its xenstore connection, it cannot provide PV
> > > > > > > > > > > > > backends. A separate QEMU instance is required to provide PV backends in
> > > > > > > > > > > > > userspace, such as qdisk. With two separate instances, it is not
> > > > > > > > > > > > > possible to take advantage of vkb for mouse and keyboard, as the QEMU
> > > > > > > > > > > > > that emulates the graphic card (the device model), would be separate
> > > > > > > > > > > > > from the QEMU running the vkb backend (PV QEMU).
> > > > > > > > > > > > > 
> > > > > > > > > > > > 
> > > > > > > > > > > > The question is that how would this affect the 
> > > > > > > > > > > > non-split setup.
> > > > > > > > > > > 
> > > > > > > > > > > vkb is useful because emulating usb forces QEMU to wake 
> > > > > > > > > > > up more often.
> > > > > > > > > > > However there is no way around it.
> > > > > > > > > > 
> > > > > > > > > > Does pvfb+vkb continue to work due to code somewhere else?
> > > > > > > > > 
> > > > > > > > > Yes, it continues to work as usual for PV guests.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > > Do we think anyone will actually be using emulated VGA + PV 
> > > > > > > > > > input
> > > > > > > > > > devices?
> > > > > > > > > 
> > > > > > > > > VGA + PV input only works with Linux and is only useful for 
> > > > > > > > > power
> > > > > > > > > efficiency, because if you disable usb emulation in QEMU, 
> > > > > > > > > then QEMU
> > > > > > > > > would be able to wake up less often. Given that usb emulation 
> > > > > > > > > is still
> > > > > > > > > on by default, I don't think that this change will have a big 
> > > > > > > > > impact.
> > > > > > > > 
> > > > > > > > My question was whether we thought anyone would be using this
> > > > > > > > non-default configuration, not what the impact on the default 
> > > > > > > > is.
> > > > > > > > 
> > > > > > > > You gave a good reason why people might be using this facility, 
> > > > > > > > do you
> > > > > > > > think anyone is actually using it?
> > > > > > >  
> > > > > > > I don't know of anybody using it. I don't think we made clear 
> > > > > > > enough how
> > > > > > > to use this non-default configuration and its advantages for 
> > > > > > > users to go
> > > > > > > > out of their way to use it. 
> > > > > > > 
> > > > > > > That's good enough for me, thanks.
> > > > > 
> > > > > Can I add your acked-by?
> > > > 
> > > > If you put some distillation of the reasoning given in this subthread
> > > > for why we think we can get away with it into the commit message then
> > > > yes.
> > > 
> > > Why don't we also make the Linux code not expose this driver for HVM 
> > > guests?
> > > 
> > > I had a go at this last year (can't find the link) as it would unduly
> 
> And the link:
> 
> http://lists.xenproject.org/archives/html/xen-devel/2015-03/msg00160.html
> > > cause the Linux kernel to take an extra 30 seconds to boot. That is 
> > > because
> > > 'xend' by default exposes the PV configuration even for HVM guests - and 
> > > of
> > > course there are no PV drivers (as the VGA in QEMU is enabled).
> > 
> > But even with xend it only happens when a vfb line is in the config
> > file, right? That's why we didn't fix it back when the issue was
> > reported, if I remember correctly.
> 
> Both. If you had 'vnc' or 'vfb' it would setup the 'vfb' key.
> > 
> > 
> > > The only use case I had was for ARM - where there are no VGA - and the
> > > patch I think I had just disabled the xen-fbfront under X86 HVM.
> > 
> > Yeah, we need xen-fbfront for ARM.
> > 
> > 
> > Given that xen-fbfront is likely to go away for HVM guests, I wouldn't
> > be opposed to stop the driver initialization in Linux on x86/HVM. Unless
> > Roger's work on HVMlite is going to need xen-fbfront again, but in that
> > case we'll be able to distinguish a regular HVM guest from an HVMlite
> > guest, I think.
> 
> Correct. Right now the 'xen_pvh_do

Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access

2015-07-02 Thread Andrew Cooper
On 02/07/15 11:48, George Dunlap wrote:
> On 06/29/2015 04:45 PM, Vitaly Kuznetsov wrote:
>> 'pfn' and 'start_pfn' are ambiguous, both these functions expect GFNs as 
>> input.
>>
>> On x86 the interface of p2m_set_mem_access() in p2m.c doesn't match the
>> declaration in p2m-common.h as 'pfn' is being used instead of 'start_pfn'.
>>
>> On ARM both p2m_set_mem_access and p2m_get_mem_access interfaces don't match
>> declarations from p2m-common.h: p2m_set_mem_access uses 'pfn' instead of
>> 'start_pfn' and p2m_get_mem_access uses 'gpfn' instead of 'pfn'.
>>
>> Convert p2m_get_mem_access/p2m_set_mem_access (and __p2m_get_mem_access on 
>> ARM)
>> interfaces to using gfn_t instead of unsigned long and update all users of
>> these functions.
>>
>> There is also an issue in p2m_get_mem_access on x86: 'gfn' parameter passed 
>> to
>> gfn_lock/gfn_unlock is not defined. This code compiles only because of a
>> coincidence: gfn_lock/gfn_unlock are currently macros which don't use their
>> second argument.
>>
>> Signed-off-by: Vitaly Kuznetsov 
>> ---
>> Changes since v3:
>> - Comment coding style fix [Razvan Cojocaru]
>> - Use INVALID_GFN instead of ~0 and -1 [Andrew Cooper]
>> - Convert p2m_get_mem_access/p2m_set_mem_access interfaces to using gfn_t
>>   [Andrew Cooper]
> But you missed a change...
>
>> @@ -1600,9 +1600,11 @@ bool_t p2m_mem_access_check(paddr_t gpa, unsigned 
>> long gla,
>>  return (p2ma == p2m_access_n2rwx);
>>  }
>>  
>> -/* Set access type for a region of pfns.
>> - * If start_pfn == -1ul, sets the default access type */
>> -long p2m_set_mem_access(struct domain *d, unsigned long pfn, uint32_t nr,
>> +/*
>> + * Set access type for a region of gfns.
>> + * If gfn == INVALID_GFN, sets the default access type.
>> + */
>> +long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr,
>>  uint32_t start, uint32_t mask, xenmem_access_t 
>> access)
>>  {
>>  struct p2m_domain *p2m = p2m_get_hostp2m(d);
>> @@ -1638,18 +1640,19 @@ long p2m_set_mem_access(struct domain *d, unsigned 
>> long pfn, uint32_t nr,
>>  return -EINVAL;
>>  }
>>  
>> -/* If request to set default access */
>> -if ( pfn == ~0ul )
>> +/* If request to set default access. */
>> +if ( gfn_x(gfn) == INVALID_GFN )
>>  {
>>  p2m->default_access = a;
>>  return 0;
>>  }
>>  
>>  p2m_lock(p2m);
>> -for ( pfn += start; nr > start; ++pfn )
>> +for ( gfn = _gfn(gfn_x(gfn) + start); nr > start;
>> +  gfn = _gfn(gfn_x(gfn) + 1) )
> Now in p2m_set_mem_access(), rather than just using an unsigned long in
> the loop iterating over gfns, you do this thing where you convert gfn_t
> to unsigned long, add one, and then convert it back to gfn_t again.
>
> I can't see any comments in v3 that suggest you doing that, and it seems
> a bit clunky.  Is that really necessary?  Wouldn't it be better to
> declare a local variable?
>
> I'm not strongly opinionated on this one, it just seems a bit strange.
>
> Everything else looks good, thanks.

Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for
typesafety don't allow for simply adding 1 to a typesafe variable.

In cases like this, I think it is acceptable to keep an unsigned long
shadow variable and manipulate it as a plain integer.  The eventual
_gfn() required to pass it further down the callchain will help to
visually reinforce the appropriate type.

After all, the entire point of these typesafes is to try and avoid
accidentally mixing up the different address spaces, but a function
which takes a typesafe, loops over a subset and passes the same typesafe
further down can probably be trusted to DTRT, catching errors at review
time.

Off the top of my head, the only functions which would normally expect
to mix and match the typesafes are the pagetable walking ones.
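
The shadow-variable approach described above could look roughly like the sketch below. It is only an illustration: the gfn_t wrapper is a simplified stand-in for Xen's typesafe type, and set_entry() and set_range() are hypothetical names, not functions from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for a typesafe gfn: wrapping the value in a struct
 * makes the compiler reject plain arithmetic such as "gfn + 1". */
typedef struct { unsigned long gfn; } gfn_t;

static inline gfn_t _gfn(unsigned long n) { return (gfn_t){ n }; }
static inline unsigned long gfn_x(gfn_t g) { return g.gfn; }

/* Hypothetical callee further down the callchain; it still takes gfn_t. */
static unsigned long last_seen;
static unsigned int calls;
static void set_entry(gfn_t gfn) { last_seen = gfn_x(gfn); calls++; }

/* Unbox once, iterate over a plain unsigned long shadow variable, and
 * re-box with _gfn() only where the value is passed on. */
static void set_range(gfn_t start_gfn, uint32_t start, uint32_t nr)
{
    unsigned long gfn_l = gfn_x(start_gfn) + start;

    for ( ; nr > start; ++gfn_l, ++start )
        set_entry(_gfn(gfn_l));
}
```

The _gfn() at the call site is the only place where the plain integer becomes a typesafe value again, which is the visual cue mentioned above.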

~Andrew



Re: [Xen-devel] [PATCH] libxl: Increase device model startup timeout to 1min.

2015-07-02 Thread Anthony PERARD
On Wed, Jul 01, 2015 at 04:03:55PM +0100, Stefano Stabellini wrote:
> On Tue, 30 Jun 2015, Ian Jackson wrote:
> > > >   * The number and nature of parallel operations done in the stress
> > > > test is unreasonable for the provided hardware:
> > > >   => the timeout is fine
> > > 
> > > I don't know if it is our place to make this call.  Should we really be
> > > deciding what is considered "reasonable"? I think not. Defining what is
> > > reasonable and policies that match it is not a route I think we should
> > > take in libxl.
> > 
> > Nevertheless if we are defining timeouts we are implicitly setting
> > some parameters which imply that certain configurations are
> > unreasonable.  Hopefully all such configurations are absurd.
> > 
> > If what you mean is that our bounds of `reasonable' should be very
> > wide, then I agree.  If anyone could reasonably expect it to work,
> > > then that is fine.  Certainly we should refrain from subjective
> > judgements.
> 
> OK.  How do you measure reasonable for this case?
> 
> What I actually mean to ask is how do you suggest we proceed on this
> problem?
> 
> Of course it would be nice if we knew exactly why this is happening, but
> the issue only happens once every 2-3 tempest runs, each of them takes
> about 1 hour.  Tempest executes about 1300 tests for each run, some
> of them in parallel. We haven't taken the time to read all the tests run
> by tempest so we don't know exactly what they do.
> 
> We don't really know the environment that causes the failure. Reading
> all the tests is not an option. We could try adding more tracing to the
> system, but given the type of error, if we do we are not likely to
> reproduce the error at all, or maybe reproduce something different.
> 
> 
> Given the state of things, I suggest we make sure that increasing the
> timeout actually fixes/works-around the problem. I would also like to
> see some empirical measurements that tell us by how much we should
> increase the timeout. Is 1 minute actually enough?

I tested an increased timeout overnight, and here are the results.

The machine is an AMD Opteron(tm) Processor 4284, with 8G of RAM and 8 pCPUs.
It's running Ubuntu 14.04 with Xen 4.4. On top of that, OpenStack has
been deployed via devstack on a single machine.

The test is to run Tempest with --concurrency=4. Four tests run in
parallel, but they don't necessarily start a VM. When they do, it's a PV
guest with 64MB of RAM and 1 vCPU, and sometimes with double that amount
of RAM.

The stats:
  Tempest runs: 22
  Tempest run time for each run: ~3000s
  Number of Tempest tests: 1143
After 22 runs of Tempest:
  QEMU starts: 3352
  number of starts that took more than 2s: 20
  number of starts that took more than 9s: 6
  maximum start time: 10.973713s

I gathered the QEMU start times by running strace on each of
them. I then looked at the time from the first syscall,
execve('qemu'), until the syscall where QEMU responds on its QMP socket
(libxl has acknowledged that QEMU is running at that point).
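
To put the figures above in proportion, a small sketch such as the following can be used. The constants are copied from the stats in this message and the 60 s value is the 1 min from the patch subject; the helper names percent() and headroom() are made up for this illustration.

```c
#include <assert.h>

/* Figures reported in this message. */
#define QEMU_STARTS       3352
#define SLOWER_THAN_2S    20
#define SLOWER_THAN_9S    6
#define MAX_START_SECS    10.973713
#define PROPOSED_TIMEOUT  60.0   /* the 1 min from the patch subject */

/* Share of "part" in "total", as a percentage. */
static double percent(unsigned int part, unsigned int total)
{
    return 100.0 * part / total;
}

/* How many times the worst observed start fits into the timeout. */
static double headroom(double timeout_secs, double worst_secs)
{
    return timeout_secs / worst_secs;
}
```

With these numbers, about 0.6% of the starts took more than 2s, about 0.18% took more than 9s, and a 60 s timeout is roughly 5.5 times the slowest start observed.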

-- 
Anthony PERARD



Re: [Xen-devel] [PATCH 1/4] xen: sched: avoid dumping duplicate information

2015-07-02 Thread George Dunlap
On Thu, Jun 25, 2015 at 1:15 PM, Dario Faggioli
 wrote:
> When dumping scheduling information (debug key 'r'), what
> we print as 'Idle cpupool' is pretty much the same as what
> we print immediately after as 'Cpupool0'. In fact, if there
> are no pCPUs outside of any cpupools, it is exactly the
> same.
>
> If there are free pCPUs, there is some valuable information,
> but still a lot of duplication:
>
>  (XEN) Online Cpus: 0-15
>  (XEN) Free Cpus: 8
>  (XEN) Idle cpupool:
>  (XEN) Scheduler: SMP Credit Scheduler (credit)
>  (XEN) info:
>  (XEN)   ncpus  = 13
>  (XEN)   master = 0
>  (XEN)   credit = 3900
>  (XEN)   credit balance = 45
>  (XEN)   weight = 1280
>  (XEN)   runq_sort  = 11820
>  (XEN)   default-weight = 256
>  (XEN)   tslice = 30ms
>  (XEN)   ratelimit  = 1000us
>  (XEN)   credits per msec   = 10
>  (XEN)   ticks per tslice   = 3
>  (XEN)   migration delay= 0us
>  (XEN) idlers: ,6d29
>  (XEN) active vcpus:
>  (XEN) 1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) 
> {a/i=22/21 m=18+5 (k=0)}
>  (XEN) 2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) 
> {a/i=37/36 m=11+544 (k=0)}
>  (XEN) 3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) 
> {a/i=127/126 m=108+9 (k=0)}
>  (XEN) 4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] 
> (-84+300) {a/i=163/162 m=36+568 (k=0)}
>  (XEN) 5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) 
> {a/i=129/128 m=16+50 (k=0)}
>  (XEN) CPU[08]  sort=5791, sibling=,0300, core=,ff00
>  (XEN)   run: [32767.8] pri=-64 flags=0 cpu=8
>  (XEN) Cpupool 0:
>  (XEN) Cpus: 0-5,10-15
>  (XEN) Scheduler: SMP Credit Scheduler (credit)
>  (XEN) info:
>  (XEN)   ncpus  = 13
>  (XEN)   master = 0
>  (XEN)   credit = 3900
>  (XEN)   credit balance = 45
>  (XEN)   weight = 1280
>  (XEN)   runq_sort  = 11820
>  (XEN)   default-weight = 256
>  (XEN)   tslice = 30ms
>  (XEN)   ratelimit  = 1000us
>  (XEN)   credits per msec   = 10
>  (XEN)   ticks per tslice   = 3
>  (XEN)   migration delay= 0us
>  (XEN) idlers: ,6d29
>  (XEN) active vcpus:
>  (XEN) 1: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) 
> {a/i=22/21 m=18+5 (k=0)}
>  (XEN) 2: [1.3] pri=0 flags=0 cpu=1 credit=-113 [w=256,cap=0] (87+300) 
> {a/i=37/36 m=11+544 (k=0)}
>  (XEN) 3: [0.15] pri=-1 flags=0 cpu=4 credit=95 [w=256,cap=0] (210+300) 
> {a/i=127/126 m=108+9 (k=0)}
>  (XEN) 4: [0.10] pri=-2 flags=0 cpu=12 credit=-287 [w=256,cap=0] 
> (-84+300) {a/i=163/162 m=36+568 (k=0)}
>  (XEN) 5: [0.7] pri=-2 flags=0 cpu=2 credit=-242 [w=256,cap=0] (-42+300) 
> {a/i=129/128 m=16+50 (k=0)}
>  (XEN) CPU[00]  sort=11801, sibling=,0003, core=,00ff
>  (XEN)   run: [32767.0] pri=-64 flags=0 cpu=0
>  ... ... ...
>  (XEN) CPU[15]  sort=11820, sibling=,c000, core=,ff00
>  (XEN)   run: [1.7] pri=-1 flags=0 cpu=15 credit=-116 [w=256,cap=0] (84+300) 
> {a/i=22/21 m=18+5 (k=0)}
>  (XEN) 1: [32767.15] pri=-64 flags=0 cpu=15
>  (XEN) Cpupool 1:
>  (XEN) Cpus: 6-7,9
>  (XEN) Scheduler: SMP RTDS Scheduler (rtds)
>  (XEN) CPU[06]
>  (XEN) CPU[07]
>  (XEN) CPU[09]
>
> With this change, we get rid of the redundancy, and retain
> only the information about the free pCPUs.
>
> (While there, turn a loop index variable from `int' to
> `unsigned int' in schedule_dump().)
>
> Signed-off-by: Dario Faggioli 

Acked-by: George Dunlap 

> ---
> Cc: Juergen Gross 
> Cc: George Dunlap 
> ---
>  xen/common/cpupool.c  |6 +++---
>  xen/common/schedule.c |   18 +-
>  2 files changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
> index 563864d..5471f93 100644
> --- a/xen/common/cpupool.c
> +++ b/xen/common/cpupool.c
> @@ -728,10 +728,10 @@ void dump_runq(unsigned char key)
>
>  print_cpumap("Online Cpus", &cpu_online_map);
>  if ( !cpumask_empty(&cpupool_free_cpus) )
> +{
>  print_cpumap("Free Cpus", &cpupool_free_cpus);
> -
> -printk("Idle cpupool:\n");
> -schedule_dump(NULL);
> +schedule_dump(NULL);
> +}
>
>  for_each_cpupool(c)
>  {
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index ecf1545..4ffcd98 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -1473,16 +1473,24 @@ void scheduler_free(struct scheduler *sched)
>
>  void schedule_dump(struct cpupool *c)
>  {
> -int   i;
> +unsigned int  i;
>  struct scheduler *sched;
>  cpumask_t*cpus;
>
>  /* Locking, if necessary, must be handled withing each scheduler */
>
> -sched = (c == NULL) ? &ops : c->sched;
> -cpus = cpupool_scheduler_cpumask(c);
> -printk("Scheduler: %s (%s)\n", sched->name, sched->opt_name);
> -SC

[Xen-devel] Sharing display between guests

2015-07-02 Thread Maxime Ripard
Hi,

I've started using Xen on an Allwinner A33, which works great as a
headless device using the latest PSCI patches in U-Boot.

However, we would like to do something more with it, and we would need
to have two VMs accessing the display at once, each one drawing in its
own part of the framebuffer.

Something that would look like this:

        Framebuffer
+-------------------------+
|                         |
|         Guest 1         |
|                         |
+-------------------------+
|                         |
|                         |
|                         |
|         Guest 2         |
|                         |
|                         |
|                         |
+-------------------------+

Where things start to get interesting is that the second guest would be
running Android, and as such would need OpenGL support, and access to
the GPU, and that ideally the first guest would need to be able to
draw over all the screen to create some kind of a drop-down menu.

Our first thought was to use two different planes of a DRM/KMS driver,
one for each VM, with the second guest having the primary plane, and
the first guest having an overlay, and we would set it up in dom0.

That would mean that we would have a static "composition", that would
be setup once and we could forget about it during the life of the
system.

This way, we would also have a fixed size framebuffer assigned to
Android, which is much easier to support, and since we have total
control over the application in the first guest, we would be able to
control how much "transparency" we want to leave (== how much of
Android do we want to be displayed), and we would be able to create
our drop-down menu.

Now the hard part: is such a setup possible at all with Xen? Can we
export a single plane to a guest and let it be the only user of it? If
that is possible, how would that interact with the 3D acceleration?

If not, is it something that is conceptually flawed, or does one just
need to write the appropriate amount of code? Do you have a better
solution to this problem?

Thanks a lot for your feedback,
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com




Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access

2015-07-02 Thread Tim Deegan
At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote:
> On 02/07/15 11:48, George Dunlap wrote:
> > Now in p2m_set_mem_access(), rather than just using an unsigned long in
> > the loop iterating over gfns, you do this thing where you convert gfn_t
> > to unsigned long, add one, and then convert it back to gfn_t again.
> >
> > I can't see any comments in v3 that suggest you doing that, and it seems
> > a bit clunky.  Is that really necessary?  Wouldn't it be better to
> > declare a local variable?
> >
> > I'm not strongly opinionated on this one, it just seems a bit strange.
> >
> > Everything else looks good, thanks.
> 
> Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for
> typesafety don't allow for simply adding 1 to a typesafe variable.
> 
> In cases like this, I think it is acceptable to keep an unsigned long
> shadow variable and manipulate it as a plain integer.  The eventual
> _gfn() required to pass it further down the callchain will help to
> visually reinforce the appropriate type.
> 
> After all, the entire point of these typesafes is to try and avoid
> accidentally mixing up the different address spaces, but a function
> which takes a typesafe, loops over a subset and passes the same typesafe
> further down can probably be trusted to DTRT, catching errors at review
> time. 
> 
> Off the top of my head, the only functions which would normally expect
> to mix and match the typesafes are the pagetable walking ones.

It should be easy enough to extend the macros to define a
gfn_inc(&gfn_t) operator for this kind of thing.
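
A gfn_inc() helper along these lines might look like the sketch below. The gfn_t wrapper is a simplified stand-in for the real typesafe type, and the helper itself is hypothetical; it only shows the unbox/modify/box step gathered in one place.

```c
#include <assert.h>

/* Simplified stand-in for a typesafe gfn. */
typedef struct { unsigned long gfn; } gfn_t;

static inline gfn_t _gfn(unsigned long n) { return (gfn_t){ n }; }
static inline unsigned long gfn_x(gfn_t g) { return g.gfn; }

/* Hypothetical helper: unbox, add one, re-box. */
static inline void gfn_inc(gfn_t *gfn)
{
    *gfn = _gfn(gfn_x(*gfn) + 1);
}
```

A loop could then advance with gfn_inc(&gfn) instead of spelling out gfn = _gfn(gfn_x(gfn) + 1) each time.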

Tim.



Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access

2015-07-02 Thread George Dunlap
On 07/02/2015 12:25 PM, Tim Deegan wrote:
> At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote:
>> On 02/07/15 11:48, George Dunlap wrote:
>>> Now in p2m_set_mem_access(), rather than just using an unsigned long in
>>> the loop iterating over gfns, you do this thing where you convert gfn_t
>>> to unsigned long, add one, and then convert it back to gfn_t again.
>>>
>>> I can't see any comments in v3 that suggest you doing that, and it seems
>>> a bit clunky.  Is that really necessary?  Wouldn't it be better to
>>> declare a local variable?
>>>
>>> I'm not strongly opinionated on this one, it just seems a bit strange.
>>>
>>> Everything else looks good, thanks.
>>
>> Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for
>> typesafety don't allow for simply adding 1 to a typesafe variable.
>>
>> In cases like this, I think it is acceptable to keep an unsigned long
>> shadow variable and manipulate it as a plain integer.  The eventual
>> _gfn() required to pass it further down the callchain will help to
>> visually reinforce the appropriate type.
>>
>> After all, the entire point of these typesafes is to try and avoid
>> accidentally mixing up the different address spaces, but a function
>> which takes a typesafe, loops over a subset and passes the same typesafe
>> further down can probably be trusted to DTRT, catching errors at review
>> time. 
>>
>> Off the top of my head, the only functions which would normally expect
>> to mix and match the typesafes are the pagetable walking ones.
> 
> It should be easy enough to extend the macros to define a
> gfn_inc(&gfn_t) operator for this kind of thing.

I was thinking that -- although in this case you'd still need to un-pack
it to do the loop exit conditional.  To really make things pretty you'd
want a for_gfn_range() macro or something like that that takes a start
gfn and a number.

But that's really starting to be feature creep for this patch, which is
why I didn't want to suggest it on v4. :-)
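
For illustration only, a for_gfn_range() macro of the kind mentioned above could be sketched as follows. The macro, the simplified gfn_t wrapper and sum_range() are all hypothetical; the sketch relies on C99 declarations in the for statement.

```c
#include <assert.h>

/* Simplified stand-in for a typesafe gfn. */
typedef struct { unsigned long gfn; } gfn_t;

static inline gfn_t _gfn(unsigned long n) { return (gfn_t){ n }; }
static inline unsigned long gfn_x(gfn_t g) { return g.gfn; }

/* Hypothetical macro: assign each of the nr consecutive gfns beginning at
 * start to var in turn. The count and the exit condition live on plain
 * integers hidden inside the macro, so the caller never unboxes. */
#define for_gfn_range(var, start, nr)                              \
    for ( unsigned long i_ = 0, n_ = (nr), s_ = gfn_x(start);      \
          i_ < n_ && ((var) = _gfn(s_ + i_), 1);                   \
          ++i_ )

/* Example user: add up the raw values of every gfn in the range. */
static unsigned long sum_range(gfn_t start, unsigned long nr)
{
    gfn_t gfn;
    unsigned long sum = 0;

    for_gfn_range ( gfn, start, nr )
        sum += gfn_x(gfn);

    return sum;
}
```

This keeps the loop exit conditional out of the caller, which is the part a plain gfn_inc() would not cover.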

 -George




Re: [Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy

2015-07-02 Thread Chen, Tiejun

On 2015/7/2 18:28, George Dunlap wrote:

On 07/02/2015 11:01 AM, Chen, Tiejun wrote:

1. After spending yet another half hour doing research, I haven't found
any discussion that concluded we should have the global policy override
the local policy


I also took some time to go back checking this point and indeed this is
not in that public design. And as I mentioned in another email which is
following this, I also had a talk to Kevin about this issue, and looks
this is just concluded from our internal discussion and he didn't post
this in v2 design again because as you know, that design is about
something in high level. And as I recall, these discussions can't cover
everything at that moment because they thought we'd better post a
preliminary patches to further discuss something since this is really a
complicated case. So afterwards I sent out two RFC revisions to help all
guys finalize a good solution. And I can confirm current policy is
always same from the first RFC, but we didn't see any opposite advice
until now.


Probably because the reviewers all assumed that the design draft had
been followed, and you didn't make it clear that you'd changed it.


Shouldn't the patch header description already clarify this point? I 
also commented on this point in the code. After all, we have already had 
several rounds of technical review, so it's a little hard to believe 
this was missed.





2. The only discussion I *did* find has *you yourself* saying that the
per-device setting should override the global setting, not once, but
twice; and nobody contradicting you.

Maybe there is somewhere else a discussion somewhere where this was
changed; but I've already spent half an hour this morning looking at
where you said it was (v2 design discussion), and found the opposite --
just as I remembered.  I'm not going to look anymore.

You have now caused me to waste an awful lot of time on this series that
could profitably have been used elsewhere.


Sorry to this but I just think we already have 2 RFC revisions and 4
revisions without RFC, and some patches are already Acked, we really
should overturn this policy right now?


First of all, I think it's easy to change.



I agree, but what I'm saying is that this involves our policy. We 
shouldn't change this sort of thing unless all the associated maintainers 
agree with you.



Even if it weren't, I already said that I'd be OK with accepting the
patch series with the existing "override" semantics, and without the
"default" semantics, *if* it were renamed to make it clear what was
going on.

But, for future reference, I am not going to approve an interface I
think is misleading or wrong -- particularly one like the xl interface
which we want to avoid changing if possible -- just because time is
short.  One of my own features, HVM USB pass-through, has narrowly
missed two releases (including the current one) because we wanted to be
careful to get the interface right.


I admit I should consider everything as carefully as you do.




Again, I didn't walk into v2 design. So here I don't want to bring any
confusion to you just with my reply.


This is your feature, so it is your responsibility to understand and
explain why you are doing what you are doing, if only to say "Jan wanted
X to happen because of Y [see $ref]."


Maybe you remember I just posted v1 but looks that was not a better
design to show this implementation according to some feedback, so Kevin
issued v2 revision and had a wider discussion with you guys. Since then
I just follow this version. So I mean I don't further hold these things
in high level since I just think both policy is fine to me because IMO,
these two approaches are optional.



So this is why I said you'd better ask this to Kevin or Jan since I
can't decide what's next at this point.


Let me say that again: I don't care whether anyone "pulled rank" and
ordered you to do something a certain way.  YOU are the one submitting
this patch.  That means YOU responsible for understanding why they want
it that way, and YOU are responsible for justifying it to other people.
  If you don't understand it at all, it's YOUR responsibility to get them
to explain it, not mine to chase them down.



As I said above, I initially thought the two approaches were optional, 
and that it was just a matter of preference. So I pointed to the patch 
descriptions that were reviewed in public to show that this is what we 
expected. But it looks like this doesn't satisfy you, and I don't think 
I can explain this any further myself, so I am asking you to ping Jan or 
Kevin to get a formal answer. Is this procedure not reasonable?


Thanks
Tiejun




Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access

2015-07-02 Thread Andrew Cooper
On 02/07/15 12:25, Tim Deegan wrote:
> At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote:
>> On 02/07/15 11:48, George Dunlap wrote:
>>> Now in p2m_set_mem_access(), rather than just using an unsigned long in
>>> the loop iterating over gfns, you do this thing where you convert gfn_t
>>> to unsigned long, add one, and then convert it back to gfn_t again.
>>>
>>> I can't see any comments in v3 that suggest you doing that, and it seems
>>> a bit clunky.  Is that really necessary?  Wouldn't it be better to
>>> declare a local variable?
>>>
>>> I'm not strongly opinionated on this one, it just seems a bit strange.
>>>
>>> Everything else looks good, thanks.
>> Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for
>> typesafety don't allow for simply adding 1 to a typesafe variable.
>>
>> In cases like this, I think it is acceptable to keep an unsigned long
>> shadow variable and manipulate it as a plain integer.  The eventual
>> _gfn() required to pass it further down the callchain will help to
>> visually reinforce the appropriate type.
>>
>> After all, the entire point of these typesafes is to try and avoid
>> accidentally mixing up the different address spaces, but a function
>> which takes a typesafe, loops over a subset and passes the same typesafe
>> further down can probably be trusted to DTRT, catching errors at review
>> time. 
>>
>> Off the top of my head, the only functions which would normally expect
>> to mix and match the typesafes are the pagetable walking ones.
> It should be easy enough to extend the macros to define a
> gfn_inc(&gfn_t) operator for this kind of thing.

Very true.  I suspect inc and dec will cover the majority of the
unbox/modify/box bits we currently have.

OTOH, ideally I would prefer not to introduce even more magic which
cscope/tags can't spot, but introducing gfn_inc() does appear to be the
neater solution.

~Andrew



Re: [Xen-devel] [PATCH v2 11/22] xen/x86: allow disabling emulated devices for HVM guests

2015-07-02 Thread Stefano Stabellini
On Wed, 1 Jul 2015, Andrew Cooper wrote:
> On 01/07/15 17:13, Stefano Stabellini wrote:
> > On Wed, 1 Jul 2015, Andrew Cooper wrote:
> >> On 01/07/15 16:51, Boris Ostrovsky wrote:
> >>> On 07/01/2015 11:46 AM, Andrew Cooper wrote:
>  On 01/07/15 15:46, Roger Pau Monne wrote:
> > Introduce a new DOMCTL flag that can be used to disable device
> > emulation
> > inside of Xen for HVM guests. The following emulated devices are
> > disabled
> > when the XEN_DOMCTL_CDF_noemu is used: hpet, pmtimer, rtc, ioapic,
> > lapic,
> > pic and pmu. Also all the MMIO handlers are disabled.
> >
> > Signed-off-by: Roger Pau Monné 
> > Cc: Jan Beulich 
> > Cc: Andrew Cooper 
> > Cc: Boris Ostrovsky 
> > Cc: Suravee Suthikulpanit 
> > Cc: Aravind Gopalakrishnan 
> > Cc: Jun Nakajima 
> > Cc: Eddie Dong 
> > Cc: Kevin Tian 
>  I would be hesitant to have a blanket change like this.
> 
>  Consider APICV/AVIC.  For performance reasons, we absolutely want HVM
>  and PVH to make use of them, as they are substantially more efficient
>  using hardware support than even using plain evtchn hypercalls.
> 
>  However, the flipside is that we must provide an LAPIC emulation to
>  cover the bits which hardware cannot virtualise.
> 
>  As a random idea, how about having a new hypercall or hvmparam which
>  provides a bitmap of permitted emulators?  This would allow far finer
>  grain control over what is and isn't available to a domain.
> >>> I think we also need to decide on which subsets of emulators we are
> >>> going to support, otherwise test matrix will become pretty big. For
> >>> example, initially we may want to allow all (for what we now call HVM)
> >>> or none (PVH).
> >> Right, but that can currently be enforced with an "if ( arg != 0 && arg
> >> != ~0 ) return -EOPNOTSUPP;" in the hypercall handler for now.
> >>
> >> It still leaves us with the ability to add in LAPIC emulation in the
> >> future by changing the auditing.  A blanket "no emulation" boolean is
> >> very much harder to relax in the future.
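
The "all or none" audit quoted above can be sketched as below. This is only an illustration: the EMU_* bit assignments and the audit_emulators() name are hypothetical and are not taken from any Xen interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit assignments for a bitmap of permitted emulators. */
#define EMU_LAPIC  (UINT64_C(1) << 0)
#define EMU_HPET   (UINT64_C(1) << 1)
#define EMU_RTC    (UINT64_C(1) << 2)

/* Accept only "no emulators" or "all emulators" for now. Relaxing the
 * check later (for example to permit EMU_LAPIC on its own) changes only
 * this audit, not the interface. */
static int audit_emulators(uint64_t bitmap)
{
    if ( bitmap != 0 && bitmap != ~UINT64_C(0) )
        return -EOPNOTSUPP;

    return 0;
}
```

A single "no emulation" boolean has no equivalent place where individual emulators could be allowed later, which is the difference the quoted text points at.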
> > APICV is a bit of a special case, because it is partially virtualized in
> > hardware.
>
> Not in the slightest.  It is *exactly* the same as existing hardware
> virt.

I thought we were speaking about emulation, specifically regarding
device emulation in Xen x86, such as the hpet for example. In this
context APICV is a bit of a special case. Are there other devices being
partially virtualized in hardware on x86? (I admit that I haven't followed
x86 development that closely.)


> Hardware does most of the work, but occasionally needs to break
> into Xen to manage things.  The difference is that we don't call some of
> the existing vmexits "emulating an x86 cpu", despite this being what is
> actually happening.

To me, that is different.


> > But in general, considering that the whole purpose of PVH as DomU is
> > security
>
> Says who?  An entirely reasonable alternate opinion is "HVM without the
> emulation overhead".

I say that :-)

I don't want to diminish the value of PVH. It is indeed very valuable
and some people might agree with your statement. However I am trying to
establish a few clear use cases that, as Xen Project, we do support well,
rather than trying to be everything to everybody. I wouldn't want to
find ourselves in a situation similar to stubdom in a few years from
now with PVH too.

I think we should consider our current test matrix, realize that it is way
too large for our current level of engagement, and try to decrease it,
rather than increasing it.


> > , as a Xen user, I would not want any emulators running with PVH
> > guests. Otherwise I might as well run PV on HVM.
>
> That is fine from a security point of view, but is not shared by most
> users of Xen.
>
> Most users of Xen want to squeeze every ounce of performance out of the
> hardware they paid $$$ for, and won't mind exposing an LAPIC
> implementation to a PVH guest, seeing as the same implementation is
> available to windows or existing PVHVM guests.

Sure, in that case let's make a choice and enable the vLAPIC for PVH by
default. Why do we need to make it configurable? Why do we need 2
configurations instead of 1? We cannot always be everything to
everybody.


> (when eventually supported), offering host administrators a choice
> between more security or more performance is perfectly ok, but designing
> PVH to be secure at the deliberate detriment of performance is unacceptable.

It is never that simple.

If we have 2 configs for PVH, then we need two sets of tests in OSSTest,
that are going to cost time to write and resources to run. As you
probably know we already have too many. They are going to require more
hardware in the Xen Project test infrastructure and eventually more
people to maintain it.

If we have 2 configs for PVH, we have two code paths to maintain in Xen
and Linux.  Are you up for taking up maintenance of PVH in 

[Xen-devel] [PATCH v2 2/3] xen/domain_page: Convert copy/clear_domain_page() to using mfn_t

2015-07-02 Thread Ben Catterall
From: Andrew Cooper 

Signed-off-by: Andrew Cooper 
[Convert grant_table.c to pass mfn_t types and fix ARM compiling]

Signed-off-by: Ben Catterall 
Reviewed-by: Andrew Cooper 
---
 xen/arch/x86/mm.c |  7 ---
 xen/common/grant_table.c  |  2 +-
 xen/common/kimage.c   | 12 ++--
 xen/common/memory.c   | 12 +---
 xen/include/xen/domain_page.h | 15 ++-
 5 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9e08c9b..8a7524f 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -3294,7 +3294,7 @@ long do_mmuext_op(
 /* A page is dirtied when it's being cleared. */
 paging_mark_dirty(pg_owner, page_to_mfn(page));
 
-clear_domain_page(page_to_mfn(page));
+clear_domain_page(_mfn(page_to_mfn(page)));
 
 put_page_and_type(page);
 break;
@@ -3328,7 +3328,8 @@ long do_mmuext_op(
 /* A page is dirtied when it's being copied to. */
 paging_mark_dirty(pg_owner, page_to_mfn(dst_page));
 
-copy_domain_page(page_to_mfn(dst_page), page_to_mfn(src_page));
+copy_domain_page(_mfn(page_to_mfn(dst_page)),
+ _mfn(page_to_mfn(src_page)));
 
 put_page_and_type(dst_page);
 put_page(src_page);
@@ -6005,7 +6006,7 @@ int create_perdomain_mapping(struct domain *d, unsigned long va,
 pg = alloc_domheap_page(d, MEMF_no_owner);
 if ( pg )
 {
-clear_domain_page(page_to_mfn(pg));
+clear_domain_page(_mfn(page_to_mfn(pg)));
 if ( !IS_NIL(ppg) )
 *ppg++ = pg;
 l1tab[l1_table_offset(va)] =
diff --git a/xen/common/grant_table.c b/xen/common/grant_table.c
index a011276..9786ecd 100644
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -1732,7 +1732,7 @@ gnttab_transfer(
 goto unlock_and_copyback;
 }
 
-copy_domain_page(page_to_mfn(new_page), mfn);
+copy_domain_page(_mfn(page_to_mfn(new_page)), _mfn(mfn));
 
 page->count_info &= ~(PGC_count_mask|PGC_allocated);
 free_domheap_page(page);
diff --git a/xen/common/kimage.c b/xen/common/kimage.c
index 8c4854d..742e4e8 100644
--- a/xen/common/kimage.c
+++ b/xen/common/kimage.c
@@ -77,7 +77,7 @@ static struct page_info *kimage_alloc_zeroed_page(unsigned memflags)
 if ( !page )
 return NULL;
 
-clear_domain_page(page_to_mfn(page));
+clear_domain_page(_mfn(page_to_mfn(page)));
 
 return page;
 }
@@ -409,7 +409,7 @@ static struct page_info *kimage_alloc_crash_control_page(struct kexec_image *ima
 if ( page )
 {
 image->next_crash_page = hole_end;
-clear_domain_page(page_to_mfn(page));
+clear_domain_page(_mfn(page_to_mfn(page)));
 }
 
 return page;
@@ -637,15 +637,15 @@ static struct page_info *kimage_alloc_page(struct kexec_image *image,
 if ( old )
 {
 /* If so move it. */
-unsigned long old_mfn = *old >> PAGE_SHIFT;
-unsigned long mfn = addr >> PAGE_SHIFT;
+mfn_t old_mfn = _mfn(*old >> PAGE_SHIFT);
+mfn_t mfn = _mfn(addr >> PAGE_SHIFT);
 
 copy_domain_page(mfn, old_mfn);
 clear_domain_page(old_mfn);
 *old = (addr & ~PAGE_MASK) | IND_SOURCE;
 unmap_domain_page(old);
 
-page = mfn_to_page(old_mfn);
+page = mfn_to_page(mfn_x(old_mfn));
 break;
 }
 else
@@ -917,7 +917,7 @@ int kimage_build_ind(struct kexec_image *image, unsigned long ind_mfn,
 goto done;
 }
 
-copy_domain_page(page_to_mfn(xen_page), mfn);
+copy_domain_page(_mfn(page_to_mfn(xen_page)), _mfn(mfn));
 put_page(guest_page);
 
 ret = kimage_add_page(image, page_to_maddr(xen_page));
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..ae4c32e 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1170,25 +1170,23 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-#ifdef CONFIG_DOMAIN_PAGE
-void clear_domain_page(unsigned long mfn)
+void clear_domain_page(mfn_t mfn)
 {
-void *ptr = map_domain_page(mfn);
+void *ptr = map_domain_page(mfn_x(mfn));
 
 clear_page(ptr);
 unmap_domain_page(ptr);
 }
 
-void copy_domain_page(unsigned long dmfn, unsigned long smfn)
+void copy_domain_page(mfn_t dest, mfn_t source)
 {
-const void *src = map_domain_page(smfn);
-void *dst = map_domain_page(dmfn);
+const void *src = map_domain_page(mfn_x(source));
+void *dst = map_domain_page(mfn_x(dest));
 
 copy_page(dst, src);
 unmap_domain_page(dst);
 unmap_domain_page(src);
 }
-#endif
 
 void destroy_ring_for_helper(
 void **_va, struct page_info *page)

Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:
> On 02/07/15 09:30, Dario Faggioli wrote:

> > It is, therefore, not effective in making sure that, even with only one
> > notification, you only kick the interested vcpu.
> >
> > This is the third time that I ask:
> >  (1) whether it is possible to have more vcpus queued on one pcpu PI 
> >  blocked list with desc.on (I really believe it is);
> >  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
> >  soon as any notification arrives, instead of putting together a
> >  mechanism for kicking only a specific one.
> 
> We will receive one NV for every time the hardware managed to
> successfully set desc.on
> 
Right, I see it now, thanks.

> If multiple stack up and we proactively drain the list, we will
> subsequently search the list to completion for all remaining NV's, due
> to finding no appropriate entries.
> 
> I can't currently decide whether this will be quicker or slower overall,
> or (most likely) it will even out to equal in the general case.
> 
Well, given the thing works as you (two) just described, I think
draining the list is the only thing we can do.

In fact, AFAICT, since we can't know for what vcpu a particular
notification is intended, we don't have alternatives to waking them all,
do we?

Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/3] xen/domain_page: Convert map_domain_page_global() to using mfn_t

2015-07-02 Thread Ben Catterall
From: Andrew Cooper 

The sh_map/unmap wrappers can be dropped, and take the opportunity to turn
some #define's into static inlines, for added type safety.

As part of adding the type safety, GCC highlights a problematic include cycle
with arm/mm.h including domain_page.h which includes xen/mm.h and falls over
__page_to_mfn being used before being declared.  Simply dropping the inclusion
of domain_page.h fixes the compilation issue.

Signed-off-by: Andrew Cooper 
CC: Jan Beulich 
CC: Tim Deegan 
CC: Ian Campbell 
CC: Stefano Stabellini 
---
 xen/arch/arm/mm.c|  6 ++
 xen/arch/x86/domain_page.c   |  9 -
 xen/arch/x86/mm/shadow/multi.c   | 10 +-
 xen/arch/x86/mm/shadow/private.h | 12 
 xen/include/asm-arm/mm.h |  1 -
 xen/include/xen/domain_page.h| 22 +-
 6 files changed, 28 insertions(+), 32 deletions(-)

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index ff1b330..d479048 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -271,11 +271,9 @@ void clear_fixmap(unsigned map)
 }
 
 #ifdef CONFIG_DOMAIN_PAGE
-void *map_domain_page_global(unsigned long mfn)
+void *map_domain_page_global(mfn_t mfn)
 {
-mfn_t m = _mfn(mfn);
-
-return vmap(&m, 1);
+return vmap(&mfn, 1);
 }
 
 void unmap_domain_page_global(const void *va)
diff --git a/xen/arch/x86/domain_page.c b/xen/arch/x86/domain_page.c
index d684b2f..0f7548b 100644
--- a/xen/arch/x86/domain_page.c
+++ b/xen/arch/x86/domain_page.c
@@ -302,17 +302,16 @@ int mapcache_vcpu_init(struct vcpu *v)
 return 0;
 }
 
-void *map_domain_page_global(unsigned long mfn)
+void *map_domain_page_global(mfn_t mfn)
 {
-mfn_t m = _mfn(mfn);
 ASSERT(!in_irq() && local_irq_is_enabled());
 
 #ifdef NDEBUG
-if ( mfn <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
-return mfn_to_virt(mfn);
+if ( mfn_x(mfn) <= PFN_DOWN(__pa(HYPERVISOR_VIRT_END - 1)) )
+return mfn_to_virt(mfn_x(mfn));
 #endif
 
-return vmap(&m, 1);
+return vmap(&mfn, 1);
 }
 
 void unmap_domain_page_global(const void *ptr)
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 42204d9..54d0bd3 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -3806,7 +3806,7 @@ sh_detach_old_tables(struct vcpu *v)
 if ( v->arch.paging.shadow.guest_vtable )
 {
 if ( shadow_mode_external(d) || shadow_mode_translate(d) )
-sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
+unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
 v->arch.paging.shadow.guest_vtable = NULL;
 }
 #endif // !NDEBUG
@@ -3977,8 +3977,8 @@ sh_update_cr3(struct vcpu *v, int do_locking)
 if ( shadow_mode_external(d) || shadow_mode_translate(d) )
 {
 if ( v->arch.paging.shadow.guest_vtable )
-sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
-v->arch.paging.shadow.guest_vtable = sh_map_domain_page_global(gmfn);
+unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
+v->arch.paging.shadow.guest_vtable = map_domain_page_global(gmfn);
 /* PAGING_LEVELS==4 implies 64-bit, which means that
  * map_domain_page_global can't fail */
 BUG_ON(v->arch.paging.shadow.guest_vtable == NULL);
@@ -4010,8 +4010,8 @@ sh_update_cr3(struct vcpu *v, int do_locking)
 if ( shadow_mode_external(d) || shadow_mode_translate(d) )
 {
 if ( v->arch.paging.shadow.guest_vtable )
-sh_unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
-v->arch.paging.shadow.guest_vtable = sh_map_domain_page_global(gmfn);
+unmap_domain_page_global(v->arch.paging.shadow.guest_vtable);
+v->arch.paging.shadow.guest_vtable = map_domain_page_global(gmfn);
 /* Does this really need map_domain_page_global?  Handle the
  * error properly if so. */
 BUG_ON(v->arch.paging.shadow.guest_vtable == NULL); /* XXX */
diff --git a/xen/arch/x86/mm/shadow/private.h b/xen/arch/x86/mm/shadow/private.h
index f72ea9f..eff39dc 100644
--- a/xen/arch/x86/mm/shadow/private.h
+++ b/xen/arch/x86/mm/shadow/private.h
@@ -517,18 +517,6 @@ sh_unmap_domain_page(void *p)
 unmap_domain_page(p);
 }
 
-static inline void *
-sh_map_domain_page_global(mfn_t mfn)
-{
-return map_domain_page_global(mfn_x(mfn));
-}
-
-static inline void
-sh_unmap_domain_page_global(void *p)
-{
-unmap_domain_page_global(p);
-}
-
 /**/
 /* Shadow-page refcounting. */
 
diff --git a/xen/include/asm-arm/mm.h b/xen/include/asm-arm/mm.h
index 3601140..2e1f21a 100644
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -5,7 +5,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 /* Align Xen to a 2 MiB boundary. */
diff --git a/xen/include/xen/domain_page.h b/xen/include/xen/domain_page.h
index b7a710b..1aac0eb 100644

[Xen-devel] [PATCH v2 3/3] Convert map_domain_page() to use the new mfn_t type

2015-07-02 Thread Ben Catterall
Reworked the internals and declaration, applying (un)boxing
where needed. Converted calls to map_domain_page() to
provide mfn_t types, boxing where needed.

Signed-off-by: Ben Catterall 
Reviewed-by: Andrew Cooper 

---
Changed since v1:
   * Created paddr_to_mfn() and mfn_to_paddr() for both x86 and ARM
   * Converted code to use the new paddr_to_mfn() rather than e.g.
 paddr>>PAGE_SHIFT

Signed-off-by: Ben Catterall 
---
 xen/arch/arm/domain_build.c   |  2 +-
 xen/arch/arm/kernel.c |  2 +-
 xen/arch/arm/mm.c | 12 +-
 xen/arch/arm/p2m.c|  4 ++--
 xen/arch/arm/traps.c  |  4 ++--
 xen/arch/x86/debug.c  | 10 
 xen/arch/x86/domain.c |  4 ++--
 xen/arch/x86/domain_build.c   | 10 
 xen/arch/x86/domain_page.c| 22 -
 xen/arch/x86/domctl.c |  2 +-
 xen/arch/x86/mm.c | 40 +++
 xen/arch/x86/mm/guest_walk.c  |  2 +-
 xen/arch/x86/mm/hap/guest_walk.c  |  2 +-
 xen/arch/x86/mm/mem_sharing.c |  4 ++--
 xen/arch/x86/mm/p2m-ept.c | 22 -
 xen/arch/x86/mm/p2m-pod.c |  8 +++
 xen/arch/x86/mm/p2m-pt.c  | 28 +++---
 xen/arch/x86/mm/p2m.c |  2 +-
 xen/arch/x86/mm/paging.c  | 32 -
 xen/arch/x86/mm/shadow/common.c   |  2 +-
 xen/arch/x86/mm/shadow/multi.c|  4 ++--
 xen/arch/x86/mm/shadow/private.h  |  2 +-
 xen/arch/x86/smpboot.c|  2 +-
 xen/arch/x86/tboot.c  |  4 ++--
 xen/arch/x86/traps.c  | 12 +-
 xen/arch/x86/x86_64/mm.c  | 14 +--
 xen/arch/x86/x86_64/traps.c   | 10 
 xen/arch/x86/x86_emulate.c| 10 
 xen/common/grant_table.c  |  4 ++--
 xen/common/kexec.c|  4 ++--
 xen/common/kimage.c   | 10 
 xen/common/memory.c   |  6 ++---
 xen/common/tmem_xen.c |  6 ++---
 xen/drivers/passthrough/amd/iommu_guest.c | 10 
 xen/drivers/passthrough/amd/iommu_map.c   | 14 +--
 xen/drivers/passthrough/vtd/x86/vtd.c |  2 +-
 xen/include/asm-arm/mm.h  |  2 ++
 xen/include/asm-x86/hap.h |  2 +-
 xen/include/asm-x86/page.h| 10 +---
 xen/include/asm-x86/paging.h  |  2 +-
 xen/include/xen/domain_page.h |  8 +++
 41 files changed, 179 insertions(+), 173 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index e9cb8a9..37db8b7 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1322,7 +1322,7 @@ static void initrd_load(struct kernel_info *kinfo)
 return;
 }
 
-dst = map_domain_page(ma>>PAGE_SHIFT);
+dst = map_domain_page(_mfn(paddr_to_mfn(ma)));
 
 copy_from_paddr(dst + s, paddr + offs, l);
 
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 209c3dd..9826fb2 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -182,7 +182,7 @@ static void kernel_zimage_load(struct kernel_info *info)
 return;
 }
 
-dst = map_domain_page(ma>>PAGE_SHIFT);
+dst = map_domain_page(_mfn(paddr_to_mfn(ma)));
 
 copy_from_paddr(dst + s, paddr + offs, l);
 
diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index d479048..ae0f34c 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -213,7 +213,7 @@ void dump_pt_walk(paddr_t ttbr, paddr_t addr,
 else
 root_table = 0;
 
-mapping = map_domain_page(root_pfn + root_table);
+mapping = map_domain_page(_mfn(root_pfn + root_table));
 
 for ( level = root_level; ; level++ )
 {
@@ -230,7 +230,7 @@ void dump_pt_walk(paddr_t ttbr, paddr_t addr,
 
 /* For next iteration */
 unmap_domain_page(mapping);
-mapping = map_domain_page(pte.walk.base);
+mapping = map_domain_page(_mfn(pte.walk.base));
 }
 
 unmap_domain_page(mapping);
@@ -282,11 +282,11 @@ void unmap_domain_page_global(const void *va)
 }
 
 /* Map a page of domheap memory */
-void *map_domain_page(unsigned long mfn)
+void *map_domain_page(mfn_t mfn)
 {
 unsigned long flags;
 lpae_t *map = this_cpu(xen_dommap);
-unsigned long slot_mfn = mfn & ~LPAE_ENTRY_MASK;
+unsigned long slot_mfn = mfn_x(mfn) & ~LPAE_ENTRY_MASK;
 vaddr_t va;
 lpae_t pte;
 int i, slot;
@@ -339,7 +339,7 @@ void *map_domain_page(unsigned long mfn)
 
 va = (DOMHEAP_VIRT_START
   + (slot << SECOND_SHIFT)
-  + ((mfn & LPAE_ENTRY_MASK) << THIRD_SHIFT));
+  + ((mfn_x(mfn) & LPAE_ENTRY_MASK) << THIRD_SHIFT));

Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Thursday, July 02, 2015 8:04 PM
> To: Andrew Cooper
> Cc: Wu, Feng; Tian, Kevin; k...@xen.org; george.dun...@eu.citrix.com;
> xen-devel@lists.xen.org; jbeul...@suse.com; Zhang, Yang Z
> Subject: Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU
> is blocked
> 
> On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:
> > On 02/07/15 09:30, Dario Faggioli wrote:
> 
> > > It is, therefore, not effective in making sure that, even with only one
> > > notification, you only kick the interested vcpu.
> > >
> > > This is the third time that I ask:
> > >  (1) whether it is possible to have more vcpus queued on one pcpu PI
> > >  blocked list with desc.on (I really believe it is);
> > >  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
> > >  soon as any notification arrives, instead of putting together a
> > >  mechanism for kicking only a specific one.
> >
> > We will receive one NV for every time the hardware managed to
> > successfully set desc.on
> >
> Right, I see it now, thanks.
> 
> > If multiple stack up and we proactively drain the list, we will
> > subsequently search the list to completion for all remaining NV's, due
> > to finding no appropriate entries.
> >
> > I can't currently decide whether this will be quicker or slower overall,
> > or (most likely) it will even out to equal in the general case.
> >
> Well, given the thing works as you (two) just described, I think
> draining the list is the only thing we can do.
> 
> In fact, AFAICT, since we can't know for what vcpu a particular
> notification is intended,

Exactly. When a notification event happens, the hardware sets 'ON', and
software then finds the vCPUs with 'ON' set. Software doesn't know
which vCPU the wakeup event is targeting; the only thing it can do
is kick the vCPUs with desc.on = 1.

Thanks,
Feng

> we don't have alternatives to waking them all,
> do we?
> 
> Dario
> 
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


[Xen-devel] Mapping Dom0 page in hypervisor from domctl

2015-07-02 Thread Bread Cutter
Hi all,

I'm trying to map a page provided by Dom0 into Xen's address space,
but I keep freezing the physical computer when I try. Maybe I'm
hitting a spinlock or something, but clearly I'm doing something
incorrectly. I'll probably be able to get farther once I have another
computer with a serial port to get logging, but hopefully it's
something obvious.

Dom0 sends a domctl containing a virtual address. I attempt to
translate it to a GFN, and then get the page_info structure, and
finally map it.

Source code:

/* Unmap any existing mapping */
if ( d->arch.hvm_domain.dom0_mapping.page_info )
{
    unmap_domain_page(d->arch.hvm_domain.dom0_mapping.buffer);
    put_page(d->arch.hvm_domain.dom0_mapping.page_info);

    d->arch.hvm_domain.dom0_mapping.buffer = NULL;
    d->arch.hvm_domain.dom0_mapping.page_info = NULL;
}

if ( domctl->va )
{
    struct page_info* page_info;
    uint64_t gfn;
    uint32_t pfec = 0;

    gdprintk (XENLOG_DEBUG, "Using guest VA 0x%lX\n", domctl->va);

    /* Validate that the given VA is page-aligned */
    if ((domctl->va & ~PAGE_MASK) != 0)
    {
        gdprintk (XENLOG_DEBUG, "VA is not page aligned\n");
        ret = -EINVAL;
        break;
    }

    /* Translate the given virtual address to the guest frame number */
    gfn = paging_gva_to_gfn(current, domctl->va, &pfec);
    if ( gfn == INVALID_GFN )
    {
        gdprintk (XENLOG_DEBUG, "Invalid GFN\n");
        ret = -EFAULT;
        break;
    }
    gdprintk (XENLOG_DEBUG, "Translated to gfn 0x%lX\n", gfn);

    /* Get the page info */
    page_info = get_page_from_gfn(current->domain, gfn, NULL,
                                  P2M_UNSHARE);
    if ( !page_info )
    {
        gdprintk (XENLOG_DEBUG, "No page info\n");
        ret = -ESRCH;
        break;
    }
    gdprintk (XENLOG_DEBUG, "Translated to mfn 0x%lX\n",
              page_to_mfn(page_info));

    d->arch.hvm_domain.dom0_mapping.page_info = page_info;
    d->arch.hvm_domain.dom0_mapping.buffer =
        (uint8_t*)map_domain_page_global(page_to_mfn(page_info));
    if ( !d->arch.hvm_domain.dom0_mapping.buffer )
    {
        gdprintk (XENLOG_DEBUG, "Map failed\n");
        put_page(page_info);
        ret = -ENOMEM;
        break;
    }

    gdprintk (XENLOG_DEBUG, "All Mapped: %p\n",
              d->arch.hvm_domain.dom0_mapping.buffer);
}

Any insight into what I'm doing wrong would be appreciated!
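
Two details of the snippet above are visible from the code alone, whether or
not either is related to the freeze: the page is mapped with
map_domain_page_global() but released with unmap_domain_page(), and on map
failure the reference is dropped while dom0_mapping.page_info still points
at the page, so the next call would release it a second time. The following
is a minimal, self-contained sketch of the pairing and ordering only. The
*_stub helpers and struct dom0_mapping are hypothetical stand-ins so that it
compiles outside Xen; they are not Xen's API.

```c
#include <stddef.h>

/* Hypothetical stand-ins for Xen's page and mapping primitives. */
struct page_info { int refcount; };

static int global_mappings;   /* number of live "global" mappings */

static void *map_global_stub(struct page_info *pg)
{
    global_mappings++;
    return pg;
}

static void unmap_global_stub(const void *va)
{
    (void)va;
    global_mappings--;
}

static void put_page_stub(struct page_info *pg)
{
    pg->refcount--;
}

struct dom0_mapping {
    struct page_info *page_info;
    void *buffer;
};

/* Tear down: release exactly what was taken, with the matching primitive. */
static void dom0_mapping_teardown(struct dom0_mapping *m)
{
    if ( !m->page_info )
        return;

    unmap_global_stub(m->buffer);     /* pairs with map_global_stub() */
    put_page_stub(m->page_info);

    m->buffer = NULL;
    m->page_info = NULL;
}

/* Set up: publish state in *m only after every step has succeeded. */
static int dom0_mapping_setup(struct dom0_mapping *m, struct page_info *pg)
{
    void *va;

    dom0_mapping_teardown(m);

    pg->refcount++;                   /* stands in for taking a page ref */
    va = map_global_stub(pg);
    if ( !va )
    {
        put_page_stub(pg);
        return -1;                    /* nothing stale is left in *m */
    }

    m->page_info = pg;
    m->buffer = va;
    return 0;
}
```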



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Andrew Cooper
On 02/07/15 13:04, Dario Faggioli wrote:
> On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:
>> On 02/07/15 09:30, Dario Faggioli wrote:
>>> It is, therefore, not effective in making sure that, even with only one
>>> notification, you only kick the interested vcpu.
>>>
>>> This is the third time that I ask:
>>>  (1) whether it is possible to have more vcpus queued on one pcpu PI 
>>>  blocked list with desc.on (I really believe it is);
>>>  (2) if yes, whether it is TheRightThing(TM) to kick all of them, as
>>>  soon as any notification arrives, instead of putting together a
>>>  mechanism for kicking only a specific one.
>> We will receive one NV for every time the hardware managed to
>> successfully set desc.on
>>
> Right, I see it now, thanks.
>
>> If multiple stack up and we proactively drain the list, we will
>> subsequently search the list to completion for all remaining NV's, due
>> to finding no appropriate entries.
>>
>> I can't currently decide whether this will be quicker or slower overall,
>> or (most likely) it will even out to equal in the general case.
>>
> Well, given the thing works as you (two) just described, I think
> draining the list is the only thing we can do.
>
> In fact, AFAICT, since we can't know for what vcpu a particular
> notification is intended, we don't have alternatives to waking them all,
> do we?

Perhaps you misunderstand.

Every single vcpu has a PI descriptor which is shared memory with hardware.

A NV is delivered strictly when hardware atomically changes desc.on from
0 to 1, i.e. the first time that an outstanding notification arrives.
(iirc, desc.on is later cleared by hardware when the vcpu is scheduled
and the vector(s) actually injected.)

Part of the scheduling modifications alter when a vcpu is eligible to
have NV's delivered on its behalf.  non-scheduled vcpus get NV's while
scheduled vcpus have direct injection instead.

Therefore, in the case that an NV arrives, we know for certain that one
of the NV-eligible vcpus has had desc.on set by hardware, and we can
uniquely identify it by searching for the vcpu for which desc.on is set.

In the case of stacked NV's, we cannot associate which specific vcpu
caused which NV, but we know that we will get one NV per vcpu needing
kicking.

~Andrew
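
The behaviour described above, including the stacked-NV case, can be
illustrated with a small model. This is only a sketch under stated
assumptions: struct blocked_vcpu, its fields and pi_wakeup_handler() are
hypothetical names, the per-pCPU blocked list is modelled as a plain array,
and locking is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for a per-vCPU posted-interrupt descriptor. */
struct pi_desc { bool on; };

struct blocked_vcpu {
    struct pi_desc desc;
    bool blocked;                 /* still on this pCPU's blocked list */
};

/*
 * Handler for one notification vector: kick every blocked vCPU whose
 * descriptor has ON set, and take it off the list.  Returns the number
 * of vCPUs kicked by this invocation.
 */
static unsigned int pi_wakeup_handler(struct blocked_vcpu *list, size_t n)
{
    unsigned int kicked = 0;

    for ( size_t i = 0; i < n; i++ )
    {
        if ( list[i].blocked && list[i].desc.on )
        {
            list[i].blocked = false;  /* stands in for unblocking the vCPU */
            kicked++;
        }
    }

    return kicked;
}
```

With two vCPUs pending, the first NV handler drains both and the second
finds nothing left to do, which is the "search the list to completion"
cost mentioned earlier in the thread.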



Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-02 Thread Vijay Kilari
On Fri, Jun 26, 2015 at 8:35 PM, Julien Grall  wrote:
> Hi Vijay,
>
> On 26/06/2015 14:54, Vijay Kilari wrote:
>>
>> On Tue, Jun 23, 2015 at 8:02 PM, Julien Grall 
>> wrote:
>>>
>>> Hi Vijay,
>>>
>>> On 22/06/15 13:01, vijay.kil...@gmail.com wrote:

>>>> From: Vijaya Kumar K
>>>>
>>>> Implements hw_irq_controller api's required
>>>> to handle LPI's
>>>
>>>
>>> This patch doesn't implement a hw_irq_controller for LPIs but just hacks
>>> around the current GICv3 host hw_irq_controller.
>>>
>>> As said on the previous version, the goal of hw_irq_controller is to
>>> keep things simple (i.e. little conditional code). Please introduce a
>>> separate hw_irq_controller for LPIs.
>>
>>
>> If new hw_irq_controller is introduced for LPIs, then this has to
>> be exported using some lpi structure which holds pointer to
>> hw_irq_controller
>> for guest & host type similar to gic_hw_ops
>
>
> The interface is not set in stone, you are free to change what you want as
> long as we keep something clean and comprehensible. It's the same for the
> functions (I have in mind route_irq_to_guest).
>
> In this case, I would prefer to see 2 callbacks (one for the host the other
> for the guest) which return the correct IRQ controller for a specific IRQ. I
> have in mind something like:
>
>get_guest_hw_irq_controller(unsigned int irq)
>{
>if ( !is_lpi )
>  return &gicv3_guest_irq_controller
>else
>  return &gicv3_guest_lpi_controller
>}
>
> Same for the host irq controller. So the selection of the IRQ controller
> would be hidden from gic.c and keep the code as generic as possible.
>
>>>> +/*
>>>> + * Make the above write visible to the redistributors.
>>>> + * And yes, we're flushing exactly: One. Single. Byte.
>>>> + * Humpf...
>>>> + */
>>>> +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING )
>>>> +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg));
>>>> +else
>>>> +dsb(ishst);
>>>> +
>>>> +/* Get collection id for this event id */
>>>> +col = &its_dev->its->collections[virq % num_online_cpus()];
>>>
>>>
>>> This is fragile, you are assuming that num_online_cpus() will never
>>> change. Why don't you store the collection in every irq_desc?
>>
>>
>>This will add additional 8 bytes overhead for each irq_desc.
>>
>> Also, is there a macro to get the actual number of processors in the
>> system?
>> AFAIU, nr_cpu_ids always returns 128
>
>
> AFAIU, nr_cpu_ids should reflect the number of CPUs of the platform. x86
> correctly sets it when parsing the ACPI. So I think this is a bug in the ARM
> code.
>
> In fact, I wasn't able to find a place in the ARM code where this value is
> changed.

nr_cpu_ids is not changed in the ARM case. I think this value has
to be updated in start_xen with the cpus value:

void __init start_xen(unsigned long boot_phys_offset,
  unsigned long fdt_paddr,
  unsigned long cpuid)


cpus = smp_get_max_cpus();
..

}
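
The per-IRQ controller selection suggested in the quoted text above could be
fleshed out along these lines. It is a sketch only: struct hw_irq_controller
is reduced to a name, is_lpi() and FIRST_LPI are hypothetical helpers, and
the only hardware fact relied on is that GICv3 LPI INTIDs start at 8192.

```c
#include <stdbool.h>

/* Hypothetical, stripped-down stand-in for Xen's hw_irq_controller. */
struct hw_irq_controller { const char *name; };

#define FIRST_LPI 8192u   /* GICv3: INTIDs from 8192 upwards are LPIs */

static const struct hw_irq_controller gicv3_guest_irq_controller = {
    "gicv3-guest"
};
static const struct hw_irq_controller gicv3_guest_lpi_controller = {
    "gicv3-guest-lpi"
};

static bool is_lpi(unsigned int irq)
{
    return irq >= FIRST_LPI;
}

/* Callers ask for "the controller for this IRQ" and never test for LPIs. */
static const struct hw_irq_controller *
get_guest_hw_irq_controller(unsigned int irq)
{
    return is_lpi(irq) ? &gicv3_guest_lpi_controller
                       : &gicv3_guest_irq_controller;
}
```

A host-side counterpart would have the same shape, which keeps the LPI
special case out of the generic gic.c paths.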



Re: [Xen-devel] Sharing display between guests

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 13:13 +0200, Maxime Ripard wrote:
> Hi,
> 
Hi,

I don't have many comments on all this, just that, reading it, it
sounded somewhat similar to what GL is doing:

http://www.xenproject.org/component/allvideoshare/video/latest/ces15-globalogic.html
http://www.xenproject.org/component/allvideoshare/video/globallogic-xen-android.html

> That would mean that we would have a static "composition", that would
> be setup once and we could forget about it during the life of the
> system.
> 
Well, they use a DomU and Dom0, rather than 2 DomUs (or so I think), but
it seems to me that at least some composition is indeed happening.

I don't have many more details, I'm afraid, but if you think your use
case is related, or at least that you could benefit from interacting
with them and sharing experiences/code/whatever, just ask, I've Cc-ed
some of them. :-)

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-02 Thread Vijay Kilari
On Mon, Jun 29, 2015 at 5:29 PM, Ian Campbell  wrote:
> On Tue, 2015-06-23 at 15:32 +0100, Julien Grall wrote:
> [...]
>> > +{
>> > +struct its_collection *col;
>> > +struct its_device *its_dev = get_irq_device(desc);
>> > +u8 *cfg;
>> > +u32 virq = irq_to_virq(desc);
>> > +
>> > +ASSERT(virq < its_dev->nr_lpis);
>> > +
>> > +cfg = gic_rdists->prop_page + desc->irq - NR_GIC_LPI;
>> > +if ( enable )
>> > +*cfg |= LPI_PROP_ENABLED;
>> > +else
>> > +*cfg &= ~LPI_PROP_ENABLED;
>> > +
>> > +/*
>> > + * Make the above write visible to the redistributors.
>> > + * And yes, we're flushing exactly: One. Single. Byte.
>> > + * Humpf...
>> > + */
>> > +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING )
>> > +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg));
>> > +else
>> > +dsb(ishst);
>> > +
>> > +/* Get collection id for this event id */
>> > +col = &its_dev->its->collections[virq % num_online_cpus()];
>>
>> This is fragile, you are assuming that num_online_cpus() will never
>> change. Why don't you store the collection in every irq_desc?
>
> The original Linux code upon which this is based doesn't seem to need to
> lookup the collection here, why is flushing needed for us but not Linux?

We are writing to lpi property table. Even linux code flushes it.

>
> I'm also confused by the use of the variable name "virq" in a function
> called set_lpi_config which appears to be dealing with host level
> physical LPIs. It seems like this function would be broken for LPIs
> which were delivered to Xen and not to a guest, and that the irq_to_virq
> in here ought to be desc->irq, no?

I am using desc->irq for updating lpi property table but using vid to
send inv command

>
> BTW, the Linux original calls what this calls "virq" "id" instead, which
> is much less confusing.

OK will rename virq to vid
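
The fragility raised in the quoted review (recomputing the collection as
virq % num_online_cpus() on every call) can be shown with a small model. All
names here (struct lpi_desc, pick_by_modulo(), lpi_desc_init()) are
hypothetical, and the cost is the extra pointer per descriptor mentioned
earlier in the thread.

```c
/* Hypothetical stand-in for an ITS collection. */
struct its_collection { unsigned int target_cpu; };

/* Per-LPI state: the collection is chosen once and then remembered. */
struct lpi_desc {
    unsigned int vid;
    struct its_collection *col;
};

/* The lookup as done in the patch: depends on the current online count. */
static struct its_collection *pick_by_modulo(struct its_collection *cols,
                                             unsigned int vid,
                                             unsigned int online_cpus)
{
    return &cols[vid % online_cpus];
}

/* Alternative: evaluate the choice at setup time and store the result. */
static void lpi_desc_init(struct lpi_desc *d, struct its_collection *cols,
                          unsigned int vid, unsigned int online_cpus)
{
    d->vid = vid;
    d->col = pick_by_modulo(cols, vid, online_cpus);
}
```

If the number of online CPUs changes between setup and a later
enable/disable, the recomputed lookup lands on a different collection,
while the stored pointer still names the one the LPI was mapped to.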



Re: [Xen-devel] [PATCH] Refactor ioreq server for better performance

2015-07-02 Thread Yu, Zhang

[snip]



Thanks, Paul.
Well, I agree the former approach would be simpler. But I still doubt
if this is more reasonable. :)
IIUC, one of the reasons for struct domain to have a rangeset list (and
a spinlock - rangesets_lock), is because there are iomem_caps and
irq_caps for each domain. These 2 rangeset members of struct domain are
platform independent.
However, struct rb_rangeset is only supposed to be used in ioreq
server, which is only for x86 hvm cases. Adding a rb_rangeset list
member (similarly, if so, a rb_rangesets_lock is also required) in
struct domain may be useless for hardware domain and for platforms other
than x86.


Fair enough.


So, I'd like to register a new debug key, to dump the ioreq server
information, just like the keys to dump iommu p2m table or the irq
mappings. With a new debug key, we do not need to add a spinlock for
rb_rangeset in struct domain, the one in ioreq server would be enough.
Does this sound reasonable?



That would be ok with me, but I'm not sure about claiming a whole debug key for 
this. Is there any other one that you could piggy-back on? If not, then maybe 
just make it part of the 'q' output.


Thanks, my new implementation uses the 'q' debug key. Will send out the
new version later. :)

Yu


   Paul



[snip]




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel







___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access

2015-07-02 Thread Vitaly Kuznetsov
George Dunlap  writes:

> On 07/02/2015 12:25 PM, Tim Deegan wrote:
>> At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote:
>>> On 02/07/15 11:48, George Dunlap wrote:
>>>> Now in p2m_set_mem_access(), rather than just using an unsigned long in
>>>> the loop iterating over gfns, you do this thing where you convert gfn_t
>>>> to unsigned long, add one, and then convert it back to gfn_t again.
>>>>
>>>> I can't see any comments in v3 that suggest you doing that, and it seems
>>>> a bit clunky.  Is that really necessary?  Wouldn't it be better to
>>>> declare a local variable?
>>>>
>>>> I'm not strongly opinionated on this one, it just seems a bit strange.
>>>>
>>>> Everything else looks good, thanks.
>>>
>>> Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for
>>> typesafety don't allow for simply adding 1 to a typesafe variable.
>>>
>>> In cases like this, I think it is acceptable to keep an unsigned long
>>> shadow variable and manipulate it as a plain integer.  The eventual
>>> _gfn() required to pass it further down the callchain will help to
>>> visually reinforce the appropriate type.
>>>
>>> After all, the entire point of these typesafes is to try and avoid
>>> accidentally mixing up the different address spaces, but a function
>>> which takes a typesafe, loops over a subset and passes the same typesafe
>>> further down can probably be trusted to DTRT, catching errors at review
>>> time. 
>>>
>>> Off the top of my head, the only functions which would normally expect
>>> to mix and match the typesafes are the pagetable walking ones.
>> 
>> It should be easy enough to extend the macros to define a
>> gfn_inc(&gfn_t) operator for this kind of thing.
>
> I was thinking that -- although in this case you'd still need to un-pack
> it to do the loop exit conditional.  To really make things pretty you'd
> want a for_gfn_range() macro or something like that that takes a start
> gfn and a number.
>
> But that's really starting to be feature creep for this patch, which is
> why I didn't want to suggest it on v4. :-)

Well, if you look at what I was fixing in v1 ... :-)

I suggest we add a local unsigned long here and close the deal, when
{g,m,p}fn_{inc,dec} macros are available we'll get rid of it.
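
For illustration only, here is a minimal sketch of the local-variable approach described above. It is not the Xen code: the gfn_t wrapper and the _gfn()/gfn_x() helpers are simplified stand-ins, and set_entry(), set_range() and visited_sum are hypothetical names invented for this example.

```c
#include <assert.h>
#include <limits.h>

/* Simplified stand-in for a typesafe guest frame number. */
typedef struct { unsigned long gfn; } gfn_t;

static inline gfn_t _gfn(unsigned long n) { gfn_t g = { n }; return g; }
static inline unsigned long gfn_x(gfn_t g) { return g.gfn; }

/* Hypothetical per-gfn operation; it only records what it was given. */
static unsigned long visited_sum;

static int set_entry(gfn_t gfn)
{
    visited_sum += gfn_x(gfn);
    return 0;
}

/*
 * Take a typesafe start gfn, iterate with a plain unsigned long shadow
 * variable, and re-box with _gfn() only when calling further down.
 */
static int set_range(gfn_t start, unsigned long nr)
{
    unsigned long gfn_l;

    /* Guard against wrap-around of the loop bound. */
    assert(nr <= ULONG_MAX - gfn_x(start));

    for ( gfn_l = gfn_x(start); gfn_l < gfn_x(start) + nr; gfn_l++ )
    {
        int rc = set_entry(_gfn(gfn_l));

        if ( rc )
            return rc;
    }

    return 0;
}
```

The boxing happens only at the call boundary, so the loop counter and exit condition stay plain integer arithmetic.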

-- 
  Vitaly



Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-02 Thread Ian Campbell
On Thu, 2015-07-02 at 17:51 +0530, Vijay Kilari wrote:
> On Mon, Jun 29, 2015 at 5:29 PM, Ian Campbell  wrote:
> > On Tue, 2015-06-23 at 15:32 +0100, Julien Grall wrote:
> > [...]
> >> > +{
> >> > +struct its_collection *col;
> >> > +struct its_device *its_dev = get_irq_device(desc);
> >> > +u8 *cfg;
> >> > +u32 virq = irq_to_virq(desc);
> >> > +
> >> > +ASSERT(virq < its_dev->nr_lpis);
> >> > +
> >> > +cfg = gic_rdists->prop_page + desc->irq - NR_GIC_LPI;
> >> > +if ( enable )
> >> > +*cfg |= LPI_PROP_ENABLED;
> >> > +else
> >> > +*cfg &= ~LPI_PROP_ENABLED;
> >> > +
> >> > +/*
> >> > + * Make the above write visible to the redistributors.
> >> > + * And yes, we're flushing exactly: One. Single. Byte.
> >> > + * Humpf...
> >> > + */
> >> > +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING )
> >> > +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg));
> >> > +else
> >> > +dsb(ishst);
> >> > +
> >> > +/* Get collection id for this event id */
> >> > +col = &its_dev->its->collections[virq % num_online_cpus()];
> >>
> >> This is fragile, you are assuming that num_online_cpus() will never
> >> change. Why don't you store the collection in every irq_desc?
> >
> > The original Linux code upon which this is based doesn't seem to need to
> > lookup the collection here, why is flushing needed for us but not Linux?
> 
> We are writing to lpi property table. Even linux code flushes it.

Sorry I was referring to the collection look up and inv, not the cache
flush, i.e. this bit:

+/* Get collection id for this event id */
+col = &its_dev->its->collections[virq % num_online_cpus()];
+its_send_inv(its_dev, col, virq);

Linux doesn't seem to do that INV there.

Ian.




Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Dario Faggioli
On Thu, 2015-07-02 at 13:16 +0100, Andrew Cooper wrote:
> On 02/07/15 13:04, Dario Faggioli wrote:
> > On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:

> >> I can't currently decide whether this will be quicker or slower overall,
> >> or (most likely) it will even out to equal in the general case.
> >>
> > Well, given the thing works as you (two) just described, I think
> > draining the list is the only thing we can do.
> >
> > In fact, AFAICT, since we can't know for what vcpu a particular
> > notification is intended, we don't have alternatives to waking them all,
> > do we?
> 
> Perhaps you misunderstand.
> 
I'm quite sure I was. While I think now I'm getting it.

> Every single vcpu has a PI descriptor which is shared memory with hardware.
> 
Right.

> A NV is delivered strictly when hardware atomically changes desc.on from
> 0 to 1.  i.e. the first time that an outstanding notification arrives. 
> (iirc, desc.on is later cleared by hardware when the vcpu is scheduled
> and the vector(s) actually injected.)
> 
> Part of the scheduling modifications alter when a vcpu is eligible to
> have NV's delivered on its behalf.  non-scheduled vcpus get NV's while
> scheduled vcpus have direct injection instead.
> 
Blocked vcpus, AFAICT. But that's not relevant here.

> Therefore, in the case that an NV arrives, we know for certain that one
> of the NV-eligible vcpus has had desc.on set by hardware, and we can
> uniquely identify it by searching for the vcpu for which desc.on is set.
> 
Yeah, but we can have more than one of them. You said "I can't currently
decide whether this will be quicker or slower", which I read like you
were suggesting that not draining the queue was a plausible alternative,
while I now think it's not.

Perhaps you were not meaning anything like that, so it was not necessary
for me to point this out, in which case, sorry for the noise. :-)

> In the case of stacked NV's, we cannot associate which specific vcpu
> caused which NV, but we know that we will get one NV per vcpu needing
> kicking.
> 
Exactly, and that's what I'm talking about, and why I'm saying that
waking everyone is the only solution. The bottom line being that, even
in case this is deemed too slow, we don't have the option of waking only
one vcpu at each NV, as we wouldn't know who to wake, and hence we'd
need to make things faster in some other way.


Regards,
Dario
-- 
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)




Re: [Xen-devel] [PATCH] libxl: Increase device model startup timeout to 1min.

2015-07-02 Thread Ian Jackson
Anthony PERARD writes ("Re: [PATCH] libxl: Increase device model startup 
timeout to 1min."):
> I tested an increased timeout overnight. Here are the results.
> 
> The machine is an AMD Opteron(tm) Processor 4284, with 8G of RAM and 8 pCPUs.
> It's running Ubuntu 14.04, with Xen 4.4. On top of that, OpenStack has
> been deployed via devstack on a single.
> 
> The test is to run Tempest with --concurrency=4. There are 4 tests run
> in parallel, but they don't necessarily start a VM. When they do, it's a PV
> with 64MB and 1 vCPU and sometimes with double the amount of RAM.
> 
> The stats:
>   Tempest run: 22
>   Tempest run time for each run: ~3000s
>   Tempest number of test: 1143
> after 22 run of tempest:
>   QEMU start: 3352
>   number of run that took more than 2s: 20
>   number of run that took more than 9s: 6
>   maximum start time: 10.973713s
> 
> I have gathered the QEMU start time by having strace running for each of
> them. I then looked at the time it took from the first syscall
> execve('qemu') until the syscall where QEMU responds on its QMP socket
> (libxl has acknowledged that QEMU is running at that time).

Thanks for this information.

So from what you say it appears that we are running at most 4 copies
of libxl and qemu in parallel, along with at most 4 VMs ?

And out of 3352 qemu starts, we have
  <= 2s   3332
  >2s <= 9s 14
  > 9s   6
?

Do you have any information about the maximum system load in general ?
dom0 load, vcpu overcommit, etc. ?  Do you know what the guests are
doing ?

I'm starting to think that this might be a real bug but that the bug
might be "Linux's I/O subsystem sometimes produces appalling latency
under load" (which is hardly news).

Ian.



[Xen-devel] [PATCH] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()

2015-07-02 Thread Ben Catterall
Removed as they were wrappers around map_domain_page() to
make it appear to take an mfn_t type.

Signed-off-by: Ben Catterall 
---
 xen/arch/x86/mm/hap/hap.c|   4 +-
 xen/arch/x86/mm/shadow/common.c  |  22 +++---
 xen/arch/x86/mm/shadow/multi.c   | 152 +++
 xen/arch/x86/mm/shadow/private.h |  13 
 xen/include/asm-x86/hap.h|  15 
 5 files changed, 89 insertions(+), 117 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index d0d3f1e..63980af 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -395,7 +395,7 @@ static void hap_install_xen_entries_in_l4(struct vcpu *v, mfn_t l4mfn)
 struct domain *d = v->domain;
 l4_pgentry_t *l4e;
 
-l4e = hap_map_domain_page(l4mfn);
+l4e = map_domain_page(l4mfn);
 
 /* Copy the common Xen mappings from the idle domain */
 memcpy(&l4e[ROOT_PAGETABLE_FIRST_XEN_SLOT],
@@ -411,7 +411,7 @@ static void hap_install_xen_entries_in_l4(struct vcpu *v, mfn_t l4mfn)
 l4e[l4_table_offset(LINEAR_PT_VIRT_START)] =
 l4e_from_pfn(mfn_x(l4mfn), __PAGE_HYPERVISOR);
 
-hap_unmap_domain_page(l4e);
+unmap_domain_page(l4e);
 }
 
 static mfn_t hap_make_monitor_table(struct vcpu *v)
diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
index da6b847..3da9767 100644
--- a/xen/arch/x86/mm/shadow/common.c
+++ b/xen/arch/x86/mm/shadow/common.c
@@ -781,11 +781,11 @@ static void oos_hash_add(struct vcpu *v, mfn_t gmfn)
 if ( swap )
 SWAP(oos_snapshot[idx], oos_snapshot[oidx]);
 
-gptr = sh_map_domain_page(oos[oidx]);
-gsnpptr = sh_map_domain_page(oos_snapshot[oidx]);
+gptr = map_domain_page(oos[oidx]);
+gsnpptr = map_domain_page(oos_snapshot[oidx]);
 memcpy(gsnpptr, gptr, PAGE_SIZE);
-sh_unmap_domain_page(gptr);
-sh_unmap_domain_page(gsnpptr);
+unmap_domain_page(gptr);
+unmap_domain_page(gsnpptr);
 }
 
 /* Remove an MFN from the list of out-of-sync guest pagetables */
@@ -1498,7 +1498,7 @@ mfn_t shadow_alloc(struct domain *d,
 p = __map_domain_page(sp);
 ASSERT(p != NULL);
 clear_page(p);
-sh_unmap_domain_page(p);
+unmap_domain_page(p);
 INIT_PAGE_LIST_ENTRY(&sp->list);
 page_list_add(sp, &tmp_list);
 sp->u.sh.type = shadow_type;
@@ -2524,7 +2524,7 @@ static int sh_remove_shadow_via_pointer(struct domain *d, mfn_t smfn)
 if (sp->up == 0) return 0;
 pmfn = _mfn(sp->up >> PAGE_SHIFT);
 ASSERT(mfn_valid(pmfn));
-vaddr = sh_map_domain_page(pmfn);
+vaddr = map_domain_page(pmfn);
 ASSERT(vaddr);
 vaddr += sp->up & (PAGE_SIZE-1);
 ASSERT(l1e_get_pfn(*(l1_pgentry_t *)vaddr) == mfn_x(smfn));
@@ -2554,7 +2554,7 @@ static int sh_remove_shadow_via_pointer(struct domain *d, mfn_t smfn)
 default: BUG(); /* Some wierd unknown shadow type */
 }
 
-sh_unmap_domain_page(vaddr);
+unmap_domain_page(vaddr);
 if ( rc )
 perfc_incr(shadow_up_pointer);
 else
@@ -3028,7 +3028,7 @@ int shadow_enable(struct domain *d, u32 mode)
 e[i] = ((0x40U * i)
 | _PAGE_PRESENT | _PAGE_RW | _PAGE_USER
 | _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
-sh_unmap_domain_page(e);
+unmap_domain_page(e);
 pg->u.inuse.type_info = PGT_l2_page_table | 1 | PGT_validated;
 }
 
@@ -3631,8 +3631,8 @@ int shadow_track_dirty_vram(struct domain *d,
 if ( sl1mfn != map_mfn )
 {
 if ( map_sl1p )
-sh_unmap_domain_page(map_sl1p);
-map_sl1p = sh_map_domain_page(_mfn(sl1mfn));
+unmap_domain_page(map_sl1p);
+map_sl1p = map_domain_page(_mfn(sl1mfn));
 map_mfn = sl1mfn;
 }
 sl1e = map_sl1p + (sl1ma & ~PAGE_MASK);
@@ -3663,7 +3663,7 @@ int shadow_track_dirty_vram(struct domain *d,
 }
 
 if ( map_sl1p )
-sh_unmap_domain_page(map_sl1p);
+unmap_domain_page(map_sl1p);
 
 memcpy(dirty_bitmap, dirty_vram->dirty_bitmap, dirty_size);
 memset(dirty_vram->dirty_bitmap, 0, dirty_size);
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 2e3d3f6..4c8badf 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -221,16 +221,16 @@ shadow_check_gwalk(struct vcpu *v, unsigned long va, walk_t *gw, int version)
 #if GUEST_PAGING_LEVELS >= 4 /* 64-bit only... */
 l4p = (guest_l4e_t *)v->arch.paging.shadow.guest_vtable;
 mismatch |= (gw->l4e.l4 != l4p[guest_l4_table_offset(va)].l4);
-l3p = sh_map_domain_page(gw->l3mfn);
+l3p = map_domain_page(gw->l3mfn);
 mismatch |= (gw->l3e.l3 != l3p[guest_l3_table_offset(va)].l3);
-sh_unmap_domain_page(l3p);
+unma

Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-02 Thread Vijay Kilari
On Thu, Jul 2, 2015 at 6:05 PM, Ian Campbell  wrote:
> On Thu, 2015-07-02 at 17:51 +0530, Vijay Kilari wrote:
>> On Mon, Jun 29, 2015 at 5:29 PM, Ian Campbell  wrote:
>> > On Tue, 2015-06-23 at 15:32 +0100, Julien Grall wrote:
>> > [...]
>> >> > +{
>> >> > +struct its_collection *col;
>> >> > +struct its_device *its_dev = get_irq_device(desc);
>> >> > +u8 *cfg;
>> >> > +u32 virq = irq_to_virq(desc);
>> >> > +
>> >> > +ASSERT(virq < its_dev->nr_lpis);
>> >> > +
>> >> > +cfg = gic_rdists->prop_page + desc->irq - NR_GIC_LPI;
>> >> > +if ( enable )
>> >> > +*cfg |= LPI_PROP_ENABLED;
>> >> > +else
>> >> > +*cfg &= ~LPI_PROP_ENABLED;
>> >> > +
>> >> > +/*
>> >> > + * Make the above write visible to the redistributors.
>> >> > + * And yes, we're flushing exactly: One. Single. Byte.
>> >> > + * Humpf...
>> >> > + */
>> >> > +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING )
>> >> > +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg));
>> >> > +else
>> >> > +dsb(ishst);
>> >> > +
>> >> > +/* Get collection id for this event id */
>> >> > +col = &its_dev->its->collections[virq % num_online_cpus()];
>> >>
>> >> This is fragile, you are assuming that num_online_cpus() will never
>> >> change. Why don't you store the collection in every irq_desc?
>> >
>> > The original Linux code upon which this is based doesn't seem to need to
>> > lookup the collection here, why is flushing needed for us but not Linux?
>>
>> We are writing to lpi property table. Even linux code flushes it.
>
> Sorry I was referring to the collection look up and inv, not the cache
> flush, i.e. this bit:
>
> +/* Get collection id for this event id */
> +col = &its_dev->its->collections[virq % num_online_cpus()];
> +its_send_inv(its_dev, col, virq);
>
> Linux doesn't seem to do that INV there.

Linux does INV.

https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/irqchip/irq-gic-v3-its.c?id=refs/tags/v4.1

line 555



Re: [Xen-devel] [v3 12/15] vmx: posted-interrupt handling when vCPU is blocked

2015-07-02 Thread Andrew Cooper
On 02/07/15 13:38, Dario Faggioli wrote:
> On Thu, 2015-07-02 at 13:16 +0100, Andrew Cooper wrote:
>> On 02/07/15 13:04, Dario Faggioli wrote:
>>> On Thu, 2015-07-02 at 11:30 +0100, Andrew Cooper wrote:
>>>> I can't currently decide whether this will be quicker or slower overall,
>>>> or (most likely) it will even out to equal in the general case.
>>>>
>>> Well, given the thing works as you (two) just described, I think
>>> draining the list is the only thing we can do.
>>>
>>> In fact, AFAICT, since we can't know for what vcpu a particular
>>> notification is intended, we don't have alternatives to waking them all,
>>> do we?
>> Perhaps you misunderstand.
>>
> I'm quite sure I was. While I think now I'm getting it.
>
>> Every single vcpu has a PI descriptor which is shared memory with hardware.
>>
> Right.
>
>> A NV is delivered strictly when hardware atomically changes desc.on from
>> 0 to 1.  i.e. the first time that an outstanding notification arrives. 
>> (iirc, desc.on is later cleared by hardware when the vcpu is scheduled
>> and the vector(s) actually injected.)
>>
>> Part of the scheduling modifications alter when a vcpu is eligible to
>> have NV's delivered on its behalf.  non-scheduled vcpus get NV's while
>> scheduled vcpus have direct injection instead.
>>
> Blocked vcpus, AFAICT. But that's not relevant here.
>
>> Therefore, in the case that an NV arrives, we know for certain that one
>> of the NV-eligible vcpus has had desc.on set by hardware, and we can
>> uniquely identify it by searching for the vcpu for which desc.on is set.
>>
> Yeah, but we can have more than one of them. You said "I can't currently
> decide whether this will be quicker or slower", which I read like you
> were suggesting that not draining the queue was a plausible alternative,
> while I now think it's not.
>
> Perhaps you were not meaning anything like that, so it was not necessary
> for me to point this out, in which case, sorry for the noise. :-)

To be clear, (assuming that a kicked vcpu is removed from the list),
then both options of kicking exactly one vcpu or kicking all vcpus will
function.  The end result after all processing of NVs will be that every
vcpu with desc.on set will be kicked exactly once.

I just was concerned about the O() of searching the list on a subsequent
NV, knowing that we most likely took the relevant entry off the list on
the previous NV.

>
>> In the case of stacked NV's, we cannot associate which specific vcpu
>> caused which NV, but we know that we will get one NV per vcpu needing
>> kicking.
>>
> Exactly, and that's what I'm talking about, and why I'm saying that
> waking everyone is the only solution. The bottom line being that, even
> in case this is deemed too slow, we don't have the option of waking only
> one vcpu at each NV, as we wouldn't know who to wake, and hence we'd
> need to make things faster in some other way.

Ah - I see your point now.

Yes - kicking exactly one vcpu per NV could result in a different vcpu
being deferred based on the interrupt activity of other vcpus and its
position in the list.

In which case, we should eagerly kick all vcpus.
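
A minimal sketch of the "kick everything with desc.on set" conclusion, under stated assumptions: struct blocked_vcpu, its on and kicked fields, and drain_blocked_list() are hypothetical names for this illustration and do not come from the patch series. The on field merely stands in for the PI descriptor's ON bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-vcpu entry on a list of blocked, NV-eligible vcpus. */
struct blocked_vcpu {
    int id;
    bool on;                    /* stand-in for desc.on */
    bool kicked;                /* records that the vcpu was woken */
    struct blocked_vcpu *next;
};

/*
 * On a notification, walk the whole list and kick every vcpu whose
 * descriptor has 'on' set, unlinking each kicked entry so that a later
 * notification does not have to search past it again.
 * Returns the number of vcpus kicked.
 */
static unsigned int drain_blocked_list(struct blocked_vcpu **head)
{
    unsigned int kicked = 0;
    struct blocked_vcpu **pp = head;

    assert(head != NULL);

    while ( *pp != NULL )
    {
        struct blocked_vcpu *v = *pp;

        if ( v->on )
        {
            *pp = v->next;      /* unlink the kicked entry */
            v->next = NULL;
            v->kicked = true;
            kicked++;
        }
        else
            pp = &v->next;      /* leave idle entries on the list */
    }

    return kicked;
}
```

Each notification costs one full pass over the list, but no vcpu with a pending notification is deferred because of its position in the list.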

~Andrew



Re: [Xen-devel] [RFC PATCH v3 07/18] xen/arm: ITS: implement hw_irq_controller for LPIs

2015-07-02 Thread Ian Campbell
On Thu, 2015-07-02 at 18:14 +0530, Vijay Kilari wrote:
> On Thu, Jul 2, 2015 at 6:05 PM, Ian Campbell  wrote:
> > On Thu, 2015-07-02 at 17:51 +0530, Vijay Kilari wrote:
> >> On Mon, Jun 29, 2015 at 5:29 PM, Ian Campbell  wrote:
> >> > On Tue, 2015-06-23 at 15:32 +0100, Julien Grall wrote:
> >> > [...]
> >> >> > +{
> >> >> > +struct its_collection *col;
> >> >> > +struct its_device *its_dev = get_irq_device(desc);
> >> >> > +u8 *cfg;
> >> >> > +u32 virq = irq_to_virq(desc);
> >> >> > +
> >> >> > +ASSERT(virq < its_dev->nr_lpis);
> >> >> > +
> >> >> > +cfg = gic_rdists->prop_page + desc->irq - NR_GIC_LPI;
> >> >> > +if ( enable )
> >> >> > +*cfg |= LPI_PROP_ENABLED;
> >> >> > +else
> >> >> > +*cfg &= ~LPI_PROP_ENABLED;
> >> >> > +
> >> >> > +/*
> >> >> > + * Make the above write visible to the redistributors.
> >> >> > + * And yes, we're flushing exactly: One. Single. Byte.
> >> >> > + * Humpf...
> >> >> > + */
> >> >> > +if ( gic_rdists->flags & RDIST_FLAGS_PROPBASE_NEEDS_FLUSHING )
> >> >> > +clean_and_invalidate_dcache_va_range(cfg, sizeof(*cfg));
> >> >> > +else
> >> >> > +dsb(ishst);
> >> >> > +
> >> >> > +/* Get collection id for this event id */
> >> >> > +col = &its_dev->its->collections[virq % num_online_cpus()];
> >> >>
> >> >> This is fragile, you are assuming that num_online_cpus() will never
> >> >> change. Why don't you store the collection in every irq_desc?
> >> >
> >> > The original Linux code upon which this is based doesn't seem to need to
> >> > lookup the collection here, why is flushing needed for us but not Linux?
> >>
> >> We are writing to lpi property table. Even linux code flushes it.
> >
> > Sorry I was referring to the collection look up and inv, not the cache
> > flush, i.e. this bit:
> >
> > +/* Get collection id for this event id */
> > +col = &its_dev->its->collections[virq % num_online_cpus()];
> > +its_send_inv(its_dev, col, virq);
> >
> > Linux doesn't seem to do that INV there.
> 
> Linux does INV.
> 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/irqchip/irq-gic-v3-its.c?id=refs/tags/v4.1
> 
> line 555

So it does, not sure how I missed that when I first looked.

Linux's approach of saving collection in the its_dev seems preferable to
looking it up like this here though.
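
To illustrate the difference being discussed, a rough sketch only: struct its_device_sketch, lookup_by_modulo() and device_assign_collection() are hypothetical names, and this is neither the Xen nor the Linux implementation. It just contrasts recomputing the collection from the online CPU count with remembering it once per device.

```c
#include <assert.h>
#include <stddef.h>

struct its_collection {
    unsigned int col_id;
};

/* Hypothetical device structure that remembers its collection. */
struct its_device_sketch {
    struct its_collection *collection;
};

/*
 * Fragile variant: the answer depends on how many CPUs happen to be
 * online at the time of the call.
 */
static struct its_collection *lookup_by_modulo(struct its_collection *cols,
                                               unsigned int virq,
                                               unsigned int nr_online)
{
    assert(nr_online != 0);
    return &cols[virq % nr_online];
}

/* Stable variant: choose the collection once and store it. */
static void device_assign_collection(struct its_device_sketch *dev,
                                     struct its_collection *cols,
                                     unsigned int virq,
                                     unsigned int nr_online)
{
    dev->collection = lookup_by_modulo(cols, virq, nr_online);
}
```

If a CPU goes offline after setup, the modulo lookup can return a different collection from the one the interrupt was mapped to, while the stored pointer keeps matching the original mapping.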

Ian.




Re: [Xen-devel] [PATCH v4] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access

2015-07-02 Thread George Dunlap
On 07/02/2015 01:26 PM, Vitaly Kuznetsov wrote:
> George Dunlap  writes:
> 
>> On 07/02/2015 12:25 PM, Tim Deegan wrote:
>>> At 12:09 +0100 on 02 Jul (1435838956), Andrew Cooper wrote:
>>>> On 02/07/15 11:48, George Dunlap wrote:
>>>>> Now in p2m_set_mem_access(), rather than just using an unsigned long in
>>>>> the loop iterating over gfns, you do this thing where you convert gfn_t
>>>>> to unsigned long, add one, and then convert it back to gfn_t again.
>>>>>
>>>>> I can't see any comments in v3 that suggest you doing that, and it seems
>>>>> a bit clunky.  Is that really necessary?  Wouldn't it be better to
>>>>> declare a local variable?
>>>>>
>>>>> I'm not strongly opinionated on this one, it just seems a bit strange.
>>>>>
>>>>> Everything else looks good, thanks.
>>>>
>>>> Looping over {g,m,p}fn_t's is indeed awkward, as the compiler tricks for
>>>> typesafety don't allow for simply adding 1 to a typesafe variable.
>>>>
>>>> In cases like this, I think it is acceptable to keep an unsigned long
>>>> shadow variable and manipulate it as a plain integer.  The eventual
>>>> _gfn() required to pass it further down the callchain will help to
>>>> visually reinforce the appropriate type.
>>>>
>>>> After all, the entire point of these typesafes is to try and avoid
>>>> accidentally mixing up the different address spaces, but a function
>>>> which takes a typesafe, loops over a subset and passes the same typesafe
>>>> further down can probably be trusted to DTRT, catching errors at review
>>>> time.
>>>>
>>>> Off the top of my head, the only functions which would normally expect
>>>> to mix and match the typesafes are the pagetable walking ones.
>>>
>>> It should be easy enough to extend the macros to define a
>>> gfn_inc(&gfn_t) operator for this kind of thing.
>>
>> I was thinking that -- although in this case you'd still need to un-pack
>> it to do the loop exit conditional.  To really make things pretty you'd
>> want a for_gfn_range() macro or something like that that takes a start
>> gfn and a number.
>>
>> But that's really starting to be feature creep for this patch, which is
>> why I didn't want to suggest it on v4. :-)
> 
> Well, if you look at what I was fixing in v1 ... :-)
> 
> I suggest we add a local unsigned long here and close the deal, when
> {g,m,p}fn_{inc,dec} macros are available we'll get rid of it.

+1

 -G




[Xen-devel] [PATCH 1/2] Resize the MAX_NR_IO_RANGES for ioreq server

2015-07-02 Thread Yu Zhang
MAX_NR_IO_RANGES is used by ioreq server as the maximum
number of discrete ranges to be tracked. This patch changes
its value to 8k, so that more ranges can be tracked on the next
generation of Intel platforms in XenGT. Future patches can
extend the limit to be toolstack tunable, and MAX_NR_IO_RANGES
can serve as a default limit.

Signed-off-by: Yu Zhang 
---
 xen/include/asm-x86/hvm/domain.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index ad68fcf..d62fda9 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -49,7 +49,7 @@ struct hvm_ioreq_vcpu {
 };
 
 #define NR_IO_RANGE_TYPES (HVMOP_IO_RANGE_PCI + 1)
-#define MAX_NR_IO_RANGES  256
+#define MAX_NR_IO_RANGES  8192
 
 struct hvm_ioreq_server {
 struct list_head   list_entry;
-- 
1.9.1




[Xen-devel] [PATCH 2/2] Add new data structure to track ranges.

2015-07-02 Thread Yu Zhang
This patch introduces a new data structure, struct rb_rangeset,
to represent a group of continuous ranges, e.g. the start and end
addresses for PIO/MMIO regions. For now, this structure is intended
to help the ioreq server forward I/O requests to backend device
models more efficiently.

Behavior of this new data structure is quite similar to rangeset,
with the major difference being the time complexity. Based on a doubly
linked list, struct rangeset provides O(n) time complexity for
searching. struct rb_rangeset is based on a red-black tree; with
binary searching, the time complexity is improved to O(log(n)) -
more suitable to track massive discrete ranges.
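
As a rough illustration of why an ordered structure helps here, the sketch below performs an O(log(n)) containment check over sorted, non-overlapping, already-merged ranges. It is not the rb_rangeset code from this patch: a sorted array with binary search stands in for the red-black tree, and struct range and ranges_contain() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Inclusive range [s, e]. */
struct range {
    unsigned long s, e;
};

/*
 * Return true if [s, e] lies entirely inside one of the n ranges in r.
 * Assumes r is sorted by start address and that ranges neither overlap
 * nor touch (adjacent ranges are assumed to have been merged).
 */
static bool ranges_contain(const struct range *r, size_t n,
                           unsigned long s, unsigned long e)
{
    size_t lo = 0, hi = n;

    assert(r != NULL || n == 0);

    while ( lo < hi )
    {
        size_t mid = lo + (hi - lo) / 2;

        if ( e < r[mid].s )
            hi = mid;               /* query lies wholly below r[mid] */
        else if ( s > r[mid].e )
            lo = mid + 1;           /* query lies wholly above r[mid] */
        else                        /* query overlaps r[mid] */
            return s >= r[mid].s && e <= r[mid].e;
    }

    return false;
}
```

A tree-based set walks from the root in the same way, halving the candidate ranges at each step, and unlike a sorted array it does not need to shift elements on insertion or removal.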

Ioreq server code is changed to utilize this new type, and a new
routine, hvm_ioreq_server_dump_range_info, is added to dump all the
ranges tracked in an ioreq server.

Signed-off-by: Yu Zhang 
---
 xen/arch/x86/domain.c|   3 +
 xen/arch/x86/hvm/hvm.c   |  56 ++--
 xen/common/Makefile  |   1 +
 xen/common/rb_rangeset.c | 281 +++
 xen/include/asm-x86/hvm/domain.h |   2 +-
 xen/include/asm-x86/hvm/hvm.h|   1 +
 xen/include/xen/rb_rangeset.h|  49 +++
 7 files changed, 378 insertions(+), 15 deletions(-)
 create mode 100644 xen/common/rb_rangeset.c
 create mode 100644 xen/include/xen/rb_rangeset.h

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a8fe046..f8a8b80 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2086,6 +2086,9 @@ int domain_relinquish_resources(struct domain *d)
 void arch_dump_domain_info(struct domain *d)
 {
 paging_dump_domain_info(d);
+
+if ( is_hvm_domain(d) )
+hvm_ioreq_server_dump_range_info(d);
 }
 
 void arch_dump_vcpu_info(struct vcpu *v)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..c79676e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -37,6 +37,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -818,7 +819,7 @@ static void hvm_ioreq_server_free_rangesets(struct hvm_ioreq_server *s,
 return;
 
 for ( i = 0; i < NR_IO_RANGE_TYPES; i++ )
-rangeset_destroy(s->range[i]);
+rb_rangeset_destroy(s->range[i]);
 }
 
 static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s, 
@@ -842,8 +843,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
 if ( rc )
 goto fail;
 
-s->range[i] = rangeset_new(s->domain, name,
-   RANGESETF_prettyprint_hex);
+s->range[i] = rb_rangeset_new(name);
 
 xfree(name);
 
@@ -851,7 +851,7 @@ static int hvm_ioreq_server_alloc_rangesets(struct hvm_ioreq_server *s,
 if ( !s->range[i] )
 goto fail;
 
-rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
+rb_rangeset_limit(s->range[i], MAX_NR_IO_RANGES);
 }
 
  done:
@@ -1149,7 +1149,7 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
 
 if ( s->id == id )
 {
-struct rangeset *r;
+struct rb_rangeset *r;
 
 switch ( type )
 {
@@ -1169,10 +1169,10 @@ static int hvm_map_io_range_to_ioreq_server(struct domain *d, ioservid_t id,
 break;
 
 rc = -EEXIST;
-if ( rangeset_overlaps_range(r, start, end) )
+if ( rb_rangeset_overlaps_range(r, start, end) )
 break;
 
-rc = rangeset_add_range(r, start, end);
+rc = rb_rangeset_add_range(r, start, end);
 break;
 }
 }
@@ -1200,7 +1200,7 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
 
 if ( s->id == id )
 {
-struct rangeset *r;
+struct rb_rangeset *r;
 
 switch ( type )
 {
@@ -1220,10 +1220,10 @@ static int hvm_unmap_io_range_from_ioreq_server(struct domain *d, ioservid_t id,
 break;
 
 rc = -ENOENT;
-if ( !rangeset_contains_range(r, start, end) )
+if ( !rb_rangeset_contains_range(r, start, end) )
 break;
 
-rc = rangeset_remove_range(r, start, end);
+rc = rb_rangeset_remove_range(r, start, end);
 break;
 }
 }
@@ -1349,6 +1349,34 @@ static void hvm_destroy_all_ioreq_servers(struct domain *d)
 spin_unlock(&d->arch.hvm_domain.ioreq_server.lock);
 }
 
+void  hvm_ioreq_server_dump_range_info(struct domain *d)
+{
+unsigned int i;
+struct hvm_ioreq_server *s;
+
+spin_lock(&d->arch.hvm_domain.ioreq_server.lock);
+
+list_for_each_entry ( s,
+  &d->arch.hvm_domain.ioreq_server.list,
+  list_entry )
+{
+if ( s == d->arch.hvm_domain.default_ioreq_server )
+continue;
+
+ printk("Domain %u, ranges tracked in ioreq server %u:\n", 
d

[Xen-devel] [PATCH 0/2] Refactor ioreq server for better performance.

2015-07-02 Thread Yu Zhang
XenGT leverages ioreq server to track and forward the accesses to
GPU I/O resources, e.g. the PPGTT(per-process graphic translation
tables). Currently, ioreq server uses rangeset to track the BDF/
PIO/MMIO ranges to be emulated. To select an ioreq server, the
rangeset is searched to see if the I/O range is recorded. However,
traversing the linked list inside rangeset could be time consuming
when the number of ranges is high. On the HSW platform, the number of PPGTTs
for each vGPU could be several hundred. On BDW, this value could
be several thousand.

To accommodate more ranges, the limit on the number of ranges in an
ioreq server, MAX_NR_IO_RANGES, is raised - future patches will be
provided to tune this with other approaches. And to increase the ioreq
server performance, a new data structure, rb_rangeset, is introduced.

Yu Zhang (2):
  Resize the MAX_NR_IO_RANGES for ioreq server
  Add new data structure to track ranges.

 xen/arch/x86/domain.c|   3 +
 xen/arch/x86/hvm/hvm.c   |  56 ++--
 xen/common/Makefile  |   1 +
 xen/common/rb_rangeset.c | 281 +++
 xen/include/asm-x86/hvm/domain.h |   4 +-
 xen/include/asm-x86/hvm/hvm.h|   1 +
 xen/include/xen/rb_rangeset.h|  49 +++
 7 files changed, 379 insertions(+), 16 deletions(-)
 create mode 100644 xen/common/rb_rangeset.c
 create mode 100644 xen/include/xen/rb_rangeset.h

-- 
1.9.1




Re: [Xen-devel] [PATCH] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()

2015-07-02 Thread Andrew Cooper
On 02/07/15 13:43, Ben Catterall wrote:
> Removed as they were wrappers around map_domain_page() to
> make it appear to take an mfn_t type.
>
> Signed-off-by: Ben Catterall 

Reviewed-by: Andrew Cooper 

In the future, a patch like this should either state it is dependent on
another series, or in this case, probably be a 4/4 on the existing series.

~Andrew



[Xen-devel] [PATCH v5] x86/arm/mm: use gfn instead of pfn in p2m_get_mem_access/p2m_set_mem_access

2015-07-02 Thread Vitaly Kuznetsov
'pfn' and 'start_pfn' are ambiguous, both these functions expect GFNs as input.

On x86 the interface of p2m_set_mem_access() in p2m.c doesn't match the
declaration in p2m-common.h as 'pfn' is being used instead of 'start_pfn'.

On ARM both p2m_set_mem_access and p2m_get_mem_access interfaces don't match
declarations from p2m-common.h: p2m_set_mem_access uses 'pfn' instead of
'start_pfn' and p2m_get_mem_access uses 'gpfn' instead of 'pfn'.

Convert p2m_get_mem_access/p2m_set_mem_access (and __p2m_get_mem_access on ARM)
interfaces to using gfn_t instead of unsigned long and update all users of
these functions.

There is also an issue in p2m_get_mem_access on x86: 'gfn' parameter passed to
gfn_lock/gfn_unlock is not defined. This code compiles only because of a
coincidence: gfn_lock/gfn_unlock are currently macros which don't use their
second argument.
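
The coincidence can be reproduced outside Xen with a small sketch (the
struct and the macro bodies below are hypothetical stand-ins, not the real
gfn_lock/gfn_unlock definitions): a macro parameter that is never expanded
may name an identifier that does not exist.

```c
/* Hypothetical lock state, just enough to observe the macros working. */
struct p2m { int lock_count; };

/* Stand-in macros: the second parameter 'g' is never used in the body. */
#define gfn_lock(p, g, o)   ((p)->lock_count++)
#define gfn_unlock(p, g, o) ((p)->lock_count--)

/* Returns the lock depth seen while the "lock" is held. */
static int lock_depth_demo(struct p2m *p)
{
    int depth;

    /* 'no_such_variable' is declared nowhere; the preprocessor drops it
     * before the compiler ever sees it, so this still builds. */
    gfn_lock(p, no_such_variable, 0);
    depth = p->lock_count;
    gfn_unlock(p, no_such_variable, 0);

    return depth;
}
```

If the macros were ever turned into real functions, the same call sites
would stop compiling.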

Signed-off-by: Vitaly Kuznetsov 
---
Changes since v4:
- In p2m_set_mem_access use local unsigned long gfn_l for looping instead of
  boxing/unboxing [George Dunlap]

Changes since v3:
- Comment coding style fix [Razvan Cojocaru]
- Use INVALID_GFN instead of ~0 and -1 [Andrew Cooper]
- Convert p2m_get_mem_access/p2m_set_mem_access interfaces to using gfn_t
  [Andrew Cooper]

Changes since v2:
- Instead of adding start_ prefix on ARM remove it on x86 [Jan Beulich,
  Ian Campbell, Razvan Cojocaru]

Changes since v1:
- This patch is a successor of '[PATCH] x86/mm: use existing 'pfn' in
  p2m_get_mem_access', instead of fixing gfn_lock/gfn_unlock arguments we do
  s/pfn/gfn/g for both p2m_get_mem_access/p2m_set_mem_access [Andrew Cooper,
  Jan Beulich]

P.S.
- The patch was compile-tested on x86 and ARM64.
---
 xen/arch/arm/p2m.c   | 33 +
 xen/arch/x86/mm/p2m.c| 36 
 xen/common/mem_access.c  |  4 ++--
 xen/include/xen/p2m-common.h | 13 ++---
 4 files changed, 45 insertions(+), 41 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index 903fa3f..6b9ef33 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -436,7 +436,7 @@ static int p2m_create_table(struct domain *d, lpae_t *entry,
 return 0;
 }
 
-static int __p2m_get_mem_access(struct domain *d, unsigned long gpfn,
+static int __p2m_get_mem_access(struct domain *d, gfn_t gfn,
 xenmem_access_t *access)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
@@ -465,14 +465,14 @@ static int __p2m_get_mem_access(struct domain *d, unsigned long gpfn,
 return 0;
 }
 
-/* If request to get default access */
-if ( gpfn == ~0ul )
+/* If request to get default access. */
+if ( gfn_x(gfn) == INVALID_GFN )
 {
 *access = memaccess[p2m->default_access];
 return 0;
 }
 
-i = radix_tree_lookup(&p2m->mem_access_settings, gpfn);
+i = radix_tree_lookup(&p2m->mem_access_settings, gfn_x(gfn));
 
 if ( !i )
 {
@@ -480,7 +480,7 @@ static int __p2m_get_mem_access(struct domain *d, unsigned long gpfn,
  * No setting was found in the Radix tree. Check if the
  * entry exists in the page-tables.
  */
-paddr_t maddr = p2m_lookup(d, gpfn << PAGE_SHIFT, NULL);
+paddr_t maddr = p2m_lookup(d, gfn_x(gfn) << PAGE_SHIFT, NULL);
 if ( INVALID_PADDR == maddr )
 return -ESRCH;
 
@@ -1386,7 +1386,7 @@ p2m_mem_access_check_and_get_page(vaddr_t gva, unsigned long flag)
  * We do this first as this is faster in the default case when no
  * permission is set on the page.
  */
-rc = __p2m_get_mem_access(current->domain, paddr_to_pfn(ipa), &xma);
+rc = __p2m_get_mem_access(current->domain, _gfn(paddr_to_pfn(ipa)), &xma);
 if ( rc < 0 )
 goto err;
 
@@ -1590,7 +1590,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec)
 if ( !p2m->mem_access_enabled )
 return true;
 
-rc = p2m_get_mem_access(v->domain, paddr_to_pfn(gpa), &xma);
+rc = p2m_get_mem_access(v->domain, _gfn(paddr_to_pfn(gpa)), &xma);
 if ( rc )
 return true;
 
@@ -1632,13 +1632,13 @@ bool_t p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec)
 /* First, handle rx2rw and n2rwx conversion automatically. */
 if ( npfec.write_access && xma == XENMEM_access_rx2rw )
 {
-rc = p2m_set_mem_access(v->domain, paddr_to_pfn(gpa), 1,
+rc = p2m_set_mem_access(v->domain, _gfn(paddr_to_pfn(gpa)), 1,
 0, ~0, XENMEM_access_rw);
 return false;
 }
 else if ( xma == XENMEM_access_n2rwx )
 {
-rc = p2m_set_mem_access(v->domain, paddr_to_pfn(gpa), 1,
+rc = p2m_set_mem_access(v->domain, _gfn(paddr_to_pfn(gpa)), 1,
 0, ~0, XENMEM_access_rwx);
 }
 
@@ -1660,7 +1660,7 @@ bool_t p2m_mem_access_check(paddr_t gpa, vaddr_t gla, const struct npfec npfec)
 {
 /* A listener is no

Re: [Xen-devel] [PATCH 1/7] libxc: fix uninitialized variable in xc_cpuid_pv_policy()

2015-07-02 Thread Andrew Cooper
On 01/07/15 18:37, Jennifer Herbert wrote:
> If xc_domain_get_guest_width were to fail, guest_width is not set, and
> hence guest_64bit becomes undefined.
> Fix is to initialise to 0, and report error if call fails.
>
> Signed-off-by: Jennifer Herbert 
> ---
>  tools/libxc/xc_cpuid_x86.c |8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
> index c97f91a..847b701 100644
> --- a/tools/libxc/xc_cpuid_x86.c
> +++ b/tools/libxc/xc_cpuid_x86.c
> @@ -437,14 +437,16 @@ static void xc_cpuid_pv_policy(

Urgh - yet another libxc function which can fail hard in several ways,
yet is void.

>  {
>  DECLARE_DOMCTL;
>  unsigned int guest_width;
> -int guest_64bit;
> +int guest_64bit = 0;

The default in Xen is that a PV guest is 64bit until explicitly
converted to being compat.  The better default therefore is 1.

>  char brand[13];
>  uint64_t xfeature_mask;
>  
>  xc_cpuid_brand_get(brand);
>  
> -xc_domain_get_guest_width(xch, domid, &guest_width);
> -guest_64bit = (guest_width == 8);
> +if (xc_domain_get_guest_width(xch, domid, &guest_width) == 0)
> +guest_64bit = (guest_width == 8);
> +else
> +ERROR("Could not read guest word width.");

No full stop please.
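
Putting the review comments above together, the logic might end up looking
roughly like this. It is a standalone sketch: get_guest_width_stub() is a
hypothetical stand-in for xc_domain_get_guest_width(), and fprintf() stands
in for libxc's ERROR() macro, so that it builds outside the tree.

```c
#include <stdio.h>

/* Hypothetical stand-in for xc_domain_get_guest_width():
 * returns 0 and fills *width on success, -1 on failure. */
static int get_guest_width_stub(int fail, unsigned int *width)
{
    if (fail)
        return -1;
    *width = 4;  /* pretend the guest is 32-bit (compat) */
    return 0;
}

/* Returns nonzero if the guest should be treated as 64-bit. */
static int guest_is_64bit(int fail)
{
    unsigned int guest_width;
    int guest_64bit = 1;  /* a PV guest is 64-bit until converted to compat */

    if (get_guest_width_stub(fail, &guest_width) == 0)
        guest_64bit = (guest_width == 8);
    else
        fprintf(stderr, "Could not read guest word width\n");  /* no full stop */

    return guest_64bit;
}
```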

>  
>  /* Detecting Xen's atitude towards XSAVE */
>  memset(&domctl, 0, sizeof(domctl));




Re: [Xen-devel] [PATCH 0/7] libxc: Fix a number of coverity issues.

2015-07-02 Thread Andrew Cooper
On 01/07/15 18:37, Jennifer Herbert wrote:
> Fix a number of coverity issues in libxc.

Patches 2 through 7:
  Reviewed-by: Andrew Cooper 




Re: [Xen-devel] [RFC PATCH v3 08/18] xen/arm: vITS: Add virtual ITS driver

2015-07-02 Thread Vijay Kilari
Hi Julien,

On Tue, Jun 23, 2015 at 10:09 PM, Julien Grall  wrote:
> Hi Vijay,
>
>>
>> +struct vits_device {
>> +uint32_t vdevid;
>> +uint32_t pdevid;
>> +struct its_device *its_dev;
>> +struct rb_node node;
>> +};
>
> We spoke about a specific structure in the design [2] but you introduced
> a new one. Why?

Section 6 of DraftG specifies managing a separate tree for device assignment.
This lets us keep a per-domain RB-tree holding the list of devices
assigned to that domain, indexed by vdevid.

It also lets us check whether a device is assigned to this domain before
processing any ITS command with that vdevid.
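
To make the lookup concrete, here is a simplified sketch. It uses a plain
unbalanced binary search tree rather than Xen's rb_node, and vdev_node,
vdev_insert() and vdev_find() are names invented for this example, not
taken from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified node; the posted struct vits_device embeds a struct rb_node. */
struct vdev_node {
    uint32_t vdevid;
    struct vdev_node *left, *right;
};

/* Insert into a per-domain tree keyed by vdevid.
 * Returns 0 on success, -1 if the vdevid is already present. */
static int vdev_insert(struct vdev_node **root, struct vdev_node *n)
{
    while (*root) {
        if (n->vdevid == (*root)->vdevid)
            return -1;
        root = n->vdevid < (*root)->vdevid ? &(*root)->left : &(*root)->right;
    }
    n->left = n->right = NULL;
    *root = n;
    return 0;
}

/* Look up a vdevid; NULL means the device is not assigned to this domain,
 * so an ITS command naming it would be rejected. */
static struct vdev_node *vdev_find(struct vdev_node *root, uint32_t vdevid)
{
    while (root && root->vdevid != vdevid)
        root = vdevid < root->vdevid ? root->left : root->right;
    return root;
}
```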

>
> Having everything in the its_device would help to catch a device
> attached to 2 different domains...

One option is to introduce a new variable inside its_device to know
to which domain the device is currently assigned.

>
> Also, the field pdevid is not vits specific but its.
pdevid can be removed, as the its_device structure already has it.

Regards
Vijay



Re: [Xen-devel] [PATCH v2 11/22] xen/x86: allow disabling emulated devices for HVM guests

2015-07-02 Thread Boris Ostrovsky

On 07/02/2015 07:49 AM, Stefano Stabellini wrote:

On Wed, 1 Jul 2015, Andrew Cooper wrote:

On 01/07/15 17:13, Stefano Stabellini wrote:

On Wed, 1 Jul 2015, Andrew Cooper wrote:

On 01/07/15 16:51, Boris Ostrovsky wrote:

On 07/01/2015 11:46 AM, Andrew Cooper wrote:

On 01/07/15 15:46, Roger Pau Monne wrote:

Introduce a new DOMCTL flag that can be used to disable device
emulation
inside of Xen for HVM guests. The following emulated devices are
disabled
when the XEN_DOMCTL_CDF_noemu is used: hpet, pmtimer, rtc, ioapic,
lapic,
pic and pmu. Also all the MMIO handlers are disabled.

Signed-off-by: Roger Pau Monné 
Cc: Jan Beulich 
Cc: Andrew Cooper 
Cc: Boris Ostrovsky 
Cc: Suravee Suthikulpanit 
Cc: Aravind Gopalakrishnan 
Cc: Jun Nakajima 
Cc: Eddie Dong 
Cc: Kevin Tian 

I would be hesitant to have a blanket change like this.

Consider APICV/AVIC.  For performance reasons, we absolutely want HVM
and PVH to make use of them, as they are substantially more efficient
using hardware support than even using plain evtchn hypercalls.

However, the flipside is that we must provide an LAPIC emulation to
cover the bits which hardware cannot virtualise.

As a random idea, how about having a new hypercall or hvmparam which
provides a bitmap of permitted emulators?  This would allow far finer
grain control over what is and isn't available to a domain.

I think we also need to decide on which subsets of emulators we are
going to support, otherwise test matrix will become pretty big. For
example, initially we may want to allow all (for what we now call HVM)
or none (PVH).

Right, but that can currently be enforced with an "if ( arg != 0 && arg
!= ~0 ) return -EOPNOTSUPP;" in the hypercall handler for now.

It still leaves us with the ability to add in LAPIC emulation in the
future by changing the auditing.  A blanket "no emulation" boolean is
very much harder to relax in the future.
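
For illustration, the auditing described above could look something like
the following. The bit names and audit_emulators() are hypothetical; no
such interface exists in this series.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit assignments, one per in-Xen emulator. */
#define EMU_LAPIC (1u << 0)
#define EMU_HPET  (1u << 1)

/* Validate the requested emulator bitmap.  For now only "everything"
 * (what we call HVM today) or "nothing" (PVH) is accepted. */
static int audit_emulators(uint32_t requested)
{
    if (requested != 0 && requested != ~0u)
        return -EOPNOTSUPP;
    return 0;
}
```

Allowing, say, LAPIC-only later would then be a change to this one check
rather than a new interface.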

APICV is a bit of a special case, because it is partially virtualized in
hardware.

Not in the slightest.  It is *exactly* the same as existing hardware
virt.

I thought we were speaking about emulation, specifically regarding
device emulation in Xen x86, such as the hpet for example. In this
context APICV is a bit of a special case. Are there other devices being
partially virtualized in hardware on x86? (I admit that I haven't followed
x86 development that closely.)



Hardware does most of the work, but occasionally needs to break
into Xen to manage things.  The difference is that we don't call some of
the existing vmexits "emulating an x86 cpu", despite this being what is
actually happening.

To me, that is different.



But in general, considering that the whole purpose of PVH as DomU is
security


From the kernel's perspective, the major reason for having PVH is to move
away from PV memory management (in the long term).



-boris




Re: [Xen-devel] [PATCH RFC 6/6] xen: Add backtrace for serious issues.

2015-07-02 Thread Stefan Hajnoczi
On Wed, Jul 01, 2015 at 02:06:30PM +0100, Stefano Stabellini wrote:
> On Mon, 29 Jun 2015, Konrad Rzeszutek Wilk wrote:
> > When debugging issues that caused the emulator to kill itself
> > or skipping certain operations (unable to write to host
> > registers) an stack trace will most definitly aid in debugging
> > the problem.
> > 
> > As such this patch uses the most basic backtrace to print out
> > details.
> > 
> > Signed-off-by: Konrad Rzeszutek Wilk 
> 
> I think it could be useful, but it cannot be done as a xen-hvm.c thing.
> It should be somewhere generic, maybe under util? Stefan, any
> suggestions?

Yes, it seems like a util/ thing.  backtrace() and
backtrace_symbols_fd() are glibc-specific so it must not break the build
on other platforms.

I think the reason we've survived without backtraces so far is because
fatal errors are typically handled with abort(3).  It causes a core dump
so you have the full process state, including backtraces.

I'm fine with adding a backtrace function though since it's more
lightweight and allows for graceful shutdown or error recovery.
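
A minimal sketch of what such a util/ helper might look like is below.
The name util_print_backtrace() and the __GLIBC__ guard are assumptions
made for this example; a real patch would more likely use a configure-time
check.

```c
#include <unistd.h>   /* on glibc this also defines __GLIBC__ for the test below */

#ifdef __GLIBC__
#include <execinfo.h>
#endif

#define MAX_FRAMES 32

/* Print the current call stack to stderr.
 * Returns the number of frames printed, or 0 where unsupported. */
static int util_print_backtrace(void)
{
#ifdef __GLIBC__
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);

    /* Writes straight to the fd without calling malloc(). */
    backtrace_symbols_fd(frames, n, STDERR_FILENO);
    return n;
#else
    return 0;  /* no-op on platforms without backtrace() */
#endif
}
```

On non-glibc platforms the function compiles to a no-op, so the build does
not break there.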




Re: [Xen-devel] [PATCH 0/6] add xsaves/xrstors support

2015-07-02 Thread Andrew Cooper
On 02/07/15 15:02, Shuai Ruan wrote:
> This patchset enable xsaves/xrstors feature.
> It includes tree parts:
> 1. add xsaves/xrstors for xen.
> 2. add xsaves/xrstors for pv guest.
> 3. add xsaves/xrstors for hvn guest.

What is xsaves/xrstors and why might I want Xen to use it? What
advantages does it give? When might these instructions be available?
Where can I read more details about this?

~Andrew



Re: [Xen-devel] [PATCH] Remove sh_{un}map_domain_page() and hap_{un}map_domain_page()

2015-07-02 Thread Ben Catterall



On 02/07/15 14:13, Andrew Cooper wrote:
> On 02/07/15 13:43, Ben Catterall wrote:
>> Removed as they were wrappers around map_domain_page() to
>> make it appear to take an mfn_t type.
>>
>> Signed-off-by: Ben Catterall 
>
> Reviewed-by: Andrew Cooper 
>
> In the future, a patch like this should either state it is dependent on
> another series, or in this case, probably be a 4/4 on the existing series.
>
> ~Andrew

Ah ok, understood!

Ben



Re: [Xen-devel] [PATCH RFC 4/6] xen: Print and use errno where applicable.

2015-07-02 Thread Konrad Rzeszutek Wilk
On Thu, Jul 02, 2015 at 12:00:29PM +0100, Stefano Stabellini wrote:
> On Wed, 1 Jul 2015, Konrad Rzeszutek Wilk wrote:
> > On Wed, Jul 01, 2015 at 02:01:07PM +0100, Stefano Stabellini wrote:
> > > On Mon, 29 Jun 2015, Konrad Rzeszutek Wilk wrote:
> > > > In Xen 4.6 commit cd2f100f0f61b3f333d52d1737dd73f02daee592
> > > > "libxc: Fix do_memory_op to return negative value on errors"
> > > > made the libxc API less odd-ball: On errors, return value is
> > > > -1 and error code is in errno. On success the return value
> > > > is either 0 or an positive value.
> > > > 
> > > > Since we could be running with an old toolstack in which the
> > > > Exx value is in rc or the newer, we print both and return
> > > > the -EXX depending on rc == -1 condition.
> > > > 
> > > > Signed-off-by: Konrad Rzeszutek Wilk 
> > > > ---
> > > >  xen-hvm.c | 10 ++
> > > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > > > 
> > > > diff --git a/xen-hvm.c b/xen-hvm.c
> > > > index 0408462..a92bc14 100644
> > > > --- a/xen-hvm.c
> > > > +++ b/xen-hvm.c
> > > > @@ -345,11 +345,12 @@ go_physmap:
> > > >  unsigned long idx = pfn + i;
> > > >  xen_pfn_t gpfn = start_gpfn + i;
> > > >  
> > > > +/* In Xen 4.6 rc is -1 and errno contains the error value. */
> > > >  rc = xc_domain_add_to_physmap(xen_xc, xen_domid, 
> > > > XENMAPSPACE_gmfn, idx, gpfn);
> > > >  if (rc) {
> > > >  DPRINTF("add_to_physmap MFN %"PRI_xen_pfn" to PFN %"
> > > > -PRI_xen_pfn" failed: %d\n", idx, gpfn, rc);
> > > > -return -rc;
> > > > +PRI_xen_pfn" failed: %d (errno: %d)\n", idx, gpfn, 
> > > > rc, errno);
> > > > +return rc == -1 ? -errno : -rc;
> > > 
> > > Printing both rc and errno is the right thing to do, but I am not sure
> > > changing return value depending on the libxc version is a good idea.
> > > Maybe we should be consistent and always return rc?
> > 
> > In Xen 4.5 and earlier this function would return -EINVAL (say rc=EINVAL).
> > With Xen 4.6 it would always return 1 on errors (rc is -1, and with --1 we 
> > get 1), while
> > the errno would have EINVAL.
> > 
> > To be consistent and have this function return an proper -Exx value we
> > need that check to use errno in case rc == -1.
> 
> Maybe the best thing to do is to introduce a versioned
> xen_xc_domain_add_to_physmap to include/hw/xen/xen_common.h

Aah, hadn't seen that before. Yes will do that.
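
Something along these lines is what I have in mind for the wrapper's
return handling. It is only a sketch: xc_rc_to_neg_errno() is a made-up
name, and it covers just the add_to_physmap case discussed here, where any
non-zero rc is a failure.

```c
#include <errno.h>

/* Map a libxc return value to 0 or a negative errno value.
 *   newer libxc: rc == -1 and the error code is in errno
 *   older libxc: rc carries the (positive) error code itself */
static int xc_rc_to_neg_errno(int rc)
{
    if (rc == 0)
        return 0;
    return rc == -1 ? -errno : -rc;
}
```

Either way the caller sees a proper -Exx value and never a positive 1.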
> 
> 
> > I am uncomfortable with returning positive values as errors, which reminds 
> > me -
> > I need to update the commit to mention the return 1 issue.
> 
> Agreed
> 
> 
> > > 
> > > 
> > > >  }
> > > >  }
> > > >  
> > > > @@ -422,11 +423,12 @@ static int xen_remove_from_physmap(XenIOState 
> > > > *state,
> > > >  xen_pfn_t idx = start_addr + i;
> > > >  xen_pfn_t gpfn = phys_offset + i;
> > > >  
> > > > +/* In Xen 4.6 rc is -1 and errno contains the error value. */
> > > >  rc = xc_domain_add_to_physmap(xen_xc, xen_domid, 
> > > > XENMAPSPACE_gmfn, idx, gpfn);
> > > >  if (rc) {
> > > >  fprintf(stderr, "add_to_physmap MFN %"PRI_xen_pfn" to PFN 
> > > > %"
> > > > -PRI_xen_pfn" failed: %d\n", idx, gpfn, rc);
> > > > -return -rc;
> > > > +PRI_xen_pfn" failed: %d (errno: %d)\n", idx, gpfn, 
> > > > rc, errno);
> > > > +return rc == -1 ? -errno : -rc;
> > > >  }
> > > >  }
> > > >  
> > > > -- 
> > > > 2.1.0
> > > > 
> > 



[Xen-devel] [PATCH 5/6] x86/xsaves: support compact format for hvm save/restore

2015-07-02 Thread Shuai Ruan
xsaves/xrstors only use the compact format, so format conversion
is needed when performing save/restore.
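
To illustrate the layout difference the conversion has to bridge: in the
compact format the enabled components are packed back to back after the
legacy area and header, instead of sitting at their fixed CPUID-reported
offsets. The sketch below is standalone and simplified; compact_layout()
is a name invented for this example, the sizes are supplied by the caller
rather than read from CPUID, and any per-component alignment requirement
is ignored.

```c
#include <stdint.h>

#define LEGACY_AREA_SIZE 512u  /* FP/SSE region */
#define XSAVE_HDR_SIZE   64u   /* XSAVE header */
#define MAX_COMPONENTS   64

/* Compute the compacted offset of every component >= 2 enabled in mask.
 * Disabled components get offset 0.  Returns the total area size. */
static unsigned int compact_layout(uint64_t mask,
                                   const unsigned int sizes[MAX_COMPONENTS],
                                   unsigned int comp_offsets[MAX_COMPONENTS])
{
    unsigned int next = LEGACY_AREA_SIZE + XSAVE_HDR_SIZE;
    int i;

    for (i = 2; i < MAX_COMPONENTS; i++) {
        if (!(mask & (1ULL << i))) {
            comp_offsets[i] = 0;
            continue;
        }
        comp_offsets[i] = next;   /* packed directly after the previous one */
        next += sizes[i];
    }
    return next;
}
```

Saving then copies each component from its compacted offset to its
standard offset, and restoring does the reverse.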

Signed-off-by: Shuai Ruan 
---
 xen/arch/x86/hvm/hvm.c   |  16 +++--
 xen/arch/x86/xstate.c| 137 +++
 xen/include/asm-x86/xstate.h |   6 ++
 3 files changed, 154 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 2958e0d..86d1579 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2045,8 +2045,11 @@ static int hvm_save_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
 ctxt->xfeature_mask = xfeature_mask;
 ctxt->xcr0 = v->arch.xcr0;
 ctxt->xcr0_accum = v->arch.xcr0_accum;
-memcpy(&ctxt->save_area, v->arch.xsave_area,
-   size - offsetof(struct hvm_hw_cpu_xsave, save_area));
+if ( cpu_has_xsaves )
+save_xsave_states(v, (u8 *)&ctxt->save_area);
+else
+memcpy(&ctxt->save_area, v->arch.xsave_area,
+   size - offsetof(struct hvm_hw_cpu_xsave, save_area));
 }
 
 return 0;
@@ -2145,9 +2148,12 @@ static int hvm_load_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h)
 v->arch.xcr0_accum = ctxt->xcr0_accum;
 if ( ctxt->xcr0_accum & XSTATE_NONLAZY )
 v->arch.nonlazy_xstate_used = 1;
-memcpy(v->arch.xsave_area, &ctxt->save_area,
-   min(desc->length, size) - offsetof(struct hvm_hw_cpu_xsave,
-   save_area));
+if ( cpu_has_xsaves )
+load_xsave_states(v, (u8 *)&ctxt->save_area);
+else
+memcpy(v->arch.xsave_area, &ctxt->save_area,
+   min(desc->length, size) - offsetof(struct hvm_hw_cpu_xsave,
+   save_area));
 
 return 0;
 }
diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index 73a16b9..c20f865 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -30,6 +30,9 @@ static u32 __read_mostly xsave_cntxt_size;
 /* A 64-bit bitmask of the XSAVE/XRSTOR features supported by processor. */
 u64 __read_mostly xfeature_mask;
 
+static unsigned int *xstate_offsets, *xstate_sizes;
+static unsigned int xstate_features;
+static unsigned int xstate_comp_offsets[sizeof(xfeature_mask)*8];
 /* Cached xcr0 for fast read */
 static DEFINE_PER_CPU(uint64_t, xcr0);
 
@@ -66,6 +69,137 @@ uint64_t get_xcr0(void)
 return this_cpu(xcr0);
 }
 
+static void setup_xstate_features(void)
+{
+unsigned int eax, ebx, ecx, edx, leaf = 0x2;
+
+xstate_features = fls(xfeature_mask);
+xstate_offsets = _xzalloc(xstate_features, sizeof(int));
+xstate_sizes = _xzalloc(xstate_features, sizeof(int));
+
+do {
+cpuid_count(XSTATE_CPUID, leaf, &eax, &ebx, &ecx, &edx);
+
+if ( eax == 0 )
+break;
+
+xstate_offsets[leaf] = ebx;
+xstate_sizes[leaf] = eax;
+
+leaf++;
+} while (1);
+}
+
+static void setup_xstate_comp(u64 xcr0)
+{
+unsigned int xstate_comp_sizes[sizeof(xfeature_mask)*8];
+int i;
+
+/*
+ * The FP xstates and SSE xstates are legacy states. They are always
+ * in the fixed offsets in the xsave area in either compacted form
+ * or standard form.
+ */
+xstate_comp_offsets[0] = 0;
+xstate_comp_offsets[1] = XSAVE_SSE_OFFSET;
+
+xstate_comp_offsets[2] = FXSAVE_SIZE + XSAVE_HDR_SIZE;
+
+for (i = 2; i < xstate_features; i++)
+{
+if ( 1 << i & xcr0 )
+xstate_comp_sizes[i] = xstate_sizes[i];
+else
+xstate_comp_sizes[i] = 0;
+
+if ( i > 2 )
+xstate_comp_offsets[i] = xstate_comp_offsets[i-1]
++ xstate_comp_sizes[i-1];
+}
+}
+
+static void *get_xsave_addr(struct xsave_struct *xsave, int xstate)
+{
+int feature = fls(xstate) - 1;
+if ( !(1 << feature & xfeature_mask) )
+return NULL;
+
+return (void *)xsave + xstate_comp_offsets[feature];
+}
+
+void save_xsave_states(struct vcpu *v, u8 *dest)
+{
+struct xsave_struct *xsave = v->arch.xsave_area;
+u64 xstate_bv = xsave->xsave_hdr.xstate_bv;
+u64 valid;
+
+setup_xstate_comp(v->arch.xcr0);
+/*
+ * Copy legacy XSAVE area, to avoid complications with CPUID
+ * leaves 0 and 1 in the loop below.
+ */
+memcpy(dest, xsave, XSAVE_HDR_OFFSET);
+
+/* Set XSTATE_BV */
+*(u64 *)(dest + XSAVE_HDR_OFFSET) = xstate_bv;
+
+/*
+ * Copy each region from the possibly compacted offset to the
+ * non-compacted offset.
+ */
+valid = xstate_bv & ~XSTATE_FP_SSE;
+while ( valid )
+{
+u64 feature = valid & -valid;
+int index = fls(feature) - 1;
+void *src = get_xsave_addr(xsave, feature);
+
+if ( src )
+memcpy(dest + xstate_offsets[index], src, xstate_sizes[index]);
+else
+WARN_ON(1);
+
+valid -= feature;
+}
+}
+
+void load_xsave_states(struct vcpu *v, u8 *src)
+{
+struct xsave_struct *xsave = v->arch.xsave_area;
+ 

[Xen-devel] [PATCH 0/6] add xsaves/xrstors support

2015-07-02 Thread Shuai Ruan
This patchset enables the xsaves/xrstors feature.
It includes three parts:
1. add xsaves/xrstors for xen.
2. add xsaves/xrstors for pv guest.
3. add xsaves/xrstors for hvm guest.

Shuai Ruan (6):
  x86/xsaves: enable xsaves/xrstors for pv guest
  x86/xsaves: enable xsaves/xrstors in xen
  x86/xsaves: enable xsaves/xrstors for hvm guest
  libxc: expose xsaves/xgetbv/xsavec to hvm guest
  x86/xsaves: support compact format for hvm save/restore
  x86/xsaves: detect xsaves/xgetbv in xen

 tools/libxc/xc_cpuid_x86.c |   8 +-
 xen/arch/x86/domain.c  |   3 +
 xen/arch/x86/hvm/hvm.c |  56 +++-
 xen/arch/x86/hvm/vmx/vmcs.c|   7 +-
 xen/arch/x86/hvm/vmx/vmx.c |  18 +++
 xen/arch/x86/traps.c   |  85 
 xen/arch/x86/x86_64/mm.c   |  52 +++
 xen/arch/x86/xstate.c  | 271 -
 xen/include/asm-x86/domain.h   |   1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h |   5 +
 xen/include/asm-x86/hvm/vmx/vmx.h  |   2 +
 xen/include/asm-x86/mm.h   |   1 +
 xen/include/asm-x86/msr-index.h|   2 +
 xen/include/asm-x86/xstate.h   |  13 +-
 14 files changed, 477 insertions(+), 47 deletions(-)

-- 
1.9.1




[Xen-devel] [PATCH 6/6] x86/xsaves: detect xsaves/xgetbv in xen

2015-07-02 Thread Shuai Ruan
As xsaves/xgetbv are now supported, switch on their detection.

Signed-off-by: Shuai Ruan 
---
 xen/arch/x86/xstate.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index c20f865..ebf9920 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -525,15 +525,15 @@ void xstate_init(bool_t bsp)
 {
 cpu_has_xsaveopt = !!(eax & XSTATE_FEATURE_XSAVEOPT);
 cpu_has_xsavec = !!(eax & XSTATE_FEATURE_XSAVEC);
-/* XXX cpu_has_xgetbv1 = !!(eax & XSTATE_FEATURE_XGETBV1); */
-/* XXX cpu_has_xsaves = !!(eax & XSTATE_FEATURE_XSAVES); */
+cpu_has_xgetbv1 = !!(eax & XSTATE_FEATURE_XGETBV1);
+cpu_has_xsaves = !!(eax & XSTATE_FEATURE_XSAVES);
 }
 else
 {
 BUG_ON(!cpu_has_xsaveopt != !(eax & XSTATE_FEATURE_XSAVEOPT));
 BUG_ON(!cpu_has_xsavec != !(eax & XSTATE_FEATURE_XSAVEC));
-/* XXX BUG_ON(!cpu_has_xgetbv1 != !(eax & XSTATE_FEATURE_XGETBV1)); */
-/* XXX BUG_ON(!cpu_has_xsaves != !(eax & XSTATE_FEATURE_XSAVES)); */
+BUG_ON(!cpu_has_xgetbv1 != !(eax & XSTATE_FEATURE_XGETBV1));
+BUG_ON(!cpu_has_xsaves != !(eax & XSTATE_FEATURE_XSAVES));
 }
 
 if ( cpu_has_xsaves )
-- 
1.9.1




[Xen-devel] [PATCH 1/6] x86/xsaves: enable xsaves/xrstors for pv guest

2015-07-02 Thread Shuai Ruan
This patch emulates the xsaves/xrstors instructions and
XSS msr access.

The xsaves/xrstors instructions and XSS msr accesses can
only be executed in ring0, so emulation is needed when a pv
guest uses these instructions.
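
One detail the emulation below has to handle is a guest save area that
straddles a page boundary, in which case two frames must be mapped
contiguously. The check can be sketched on its own as follows (standalone,
assuming 4K pages; crosses_page() is a name made up for this example):

```c
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_MASK  (~(PAGE_SIZE - 1))

/* Nonzero if [addr, addr + size) extends past the end of addr's page. */
static int crosses_page(unsigned long addr, unsigned long size)
{
    return (addr & ~PAGE_MASK) + size > PAGE_SIZE;
}
```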

Signed-off-by: Shuai Ruan 
---
 xen/arch/x86/domain.c   |  3 ++
 xen/arch/x86/traps.c| 85 +
 xen/arch/x86/x86_64/mm.c| 52 +
 xen/arch/x86/xstate.c   | 39 +++
 xen/include/asm-x86/domain.h|  1 +
 xen/include/asm-x86/mm.h|  1 +
 xen/include/asm-x86/msr-index.h |  2 +
 xen/include/asm-x86/xstate.h|  3 ++
 8 files changed, 186 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index a8fe046..66f8231 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -426,6 +426,7 @@ int vcpu_initialise(struct vcpu *v)
 
 /* By default, do not emulate */
 v->arch.vm_event.emulate_flags = 0;
+v->arch.msr_ia32_xss = 0;
 
 rc = mapcache_vcpu_init(v);
 if ( rc )
@@ -1494,6 +1495,8 @@ static void __context_switch(void)
 if ( xcr0 != get_xcr0() && !set_xcr0(xcr0) )
 BUG();
 }
+if ( cpu_has_xsaves )
+wrmsr_safe(MSR_IA32_XSS, n->arch.msr_ia32_xss);
 vcpu_restore_fpu_eager(n);
 n->arch.ctxt_switch_to(n);
 }
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index ac62f20..227670b 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2346,6 +2346,80 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
 }
 break;
 
+case 0xc7:
+{
+void *xsave_addr;
+int not_page_aligned = 0;
+u32 guest_xsaves_size = xstate_ctxt_size_compact(v->arch.xcr0);
+
+switch ( insn_fetch(u8, code_base, eip, code_limit) )
+{
+case 0x2f:/* XSAVES */
+{
+if ( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
+{
+mfn_t mfn_list[2];
+void *va;
+
+not_page_aligned = 1;
+mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
+mfn_list[1] = _mfn(do_page_walk_mfn(v,
+   PAGE_ALIGN(regs->edi)));
+va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
+ASSERT(((unsigned long) va & ~PAGE_MASK) == 0);
+xsave_addr = (void *)((unsigned long)va +
+ (regs->edi & ~PAGE_MASK));
+}
+else
+xsave_addr = do_page_walk(v, regs->edi);
+
+if ( !xsave_addr )
+goto fail;
+
+xsaves(regs->eax, regs->edx, xsave_addr);
+
+if ( not_page_aligned )
+vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
+else
+unmap_domain_page(xsave_addr);
+break;
+}
+case 0x1f:/* XRSTORS */
+{
+if( (regs->edi & ~PAGE_MASK) + guest_xsaves_size > PAGE_SIZE )
+{
+mfn_t mfn_list[2];
+void *va;
+
+not_page_aligned = 1;
+mfn_list[0] = _mfn(do_page_walk_mfn(v, regs->edi));
+mfn_list[1] = _mfn(do_page_walk_mfn(v,
+   PAGE_ALIGN(regs->edi)));
+va = __vmap(mfn_list, 1, 2, PAGE_SIZE, PAGE_HYPERVISOR);
+ASSERT(((unsigned long) va & ~PAGE_MASK) == 0);
+xsave_addr = (void *)((unsigned long)va +
+ (regs->edi & ~PAGE_MASK));
+}
+else
+xsave_addr = do_page_walk(v, regs->edi);
+
+if ( !xsave_addr )
+goto fail;
+
+xrstors(regs->eax, regs->edx, xsave_addr);
+
+if ( not_page_aligned )
+vunmap((void *)((unsigned long)xsave_addr & PAGE_MASK));
+else
+unmap_domain_page(xsave_addr);
+break;
+}
+default:
+goto fail;
+}
+break;
+}
+
 case 0x06: /* CLTS */
 (void)do_fpu_taskswitch(0);
 break;
@@ -2638,6 +2712,12 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
 wrmsrl(regs->_ecx, msr_content);
 break;
 
+case MSR_IA32_XSS:
+if ( wrmsr_safe(regs->ecx, msr_content) != 0 )
+goto fail;
+v->arch.msr_ia32_xss = msr_content;
+break;
+
 default:
 if ( wrmsr_hypervisor_regs(regs->ecx, msr_content) == 1 )
 break;
@@ -2740,6 +2820,11 @@ static int emulate_privileged_op(struct cpu_user_regs *regs)
 

[Xen-devel] [PATCH 4/6] libxc: expose xsaves/xgetbv/xsavec to hvm guest

2015-07-02 Thread Shuai Ruan
This patch exposes xsaves/xgetbv/xsavec to hvm guest.
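
In effect the guest-visible EAX of CPUID leaf 0xD, sub-leaf 1 becomes the
host value masked down to the four known feature bits. A standalone sketch
of that filtering (filter_xsave_leaf1_eax() is a name invented for this
example; the bit positions match the defines added in the diff):

```c
#include <stdint.h>

#define XSAVEOPT (1u << 0)
#define XSAVEC   (1u << 1)
#define XGETBV   (1u << 2)  /* XGETBV with ECX=1 */
#define XSAVES   (1u << 3)

/* Keep only the feature bits we are prepared to expose to the guest. */
static uint32_t filter_xsave_leaf1_eax(uint32_t host_eax)
{
    return host_eax & (XSAVEOPT | XSAVEC | XGETBV | XSAVES);
}
```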

Signed-off-by: Shuai Ruan 
---
 tools/libxc/xc_cpuid_x86.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/libxc/xc_cpuid_x86.c b/tools/libxc/xc_cpuid_x86.c
index c97f91a..0ed8b68 100644
--- a/tools/libxc/xc_cpuid_x86.c
+++ b/tools/libxc/xc_cpuid_x86.c
@@ -211,6 +211,9 @@ static void intel_xc_cpuid_policy(
 }
 
 #define XSAVEOPT (1 << 0)
+#define XSAVEC  (1 << 1)
+#define XGETBV  (1 << 2)
+#define XSAVES  (1 << 3)
 /* Configure extended state enumeration leaves (0x000D for xsave) */
 static void xc_cpuid_config_xsave(
 xc_interface *xch, domid_t domid, uint64_t xfeature_mask,
@@ -247,8 +250,7 @@ static void xc_cpuid_config_xsave(
 regs[1] = 512 + 64; /* FP/SSE + XSAVE.HEADER */
 break;
 case 1: /* leaf 1 */
-regs[0] &= XSAVEOPT;
-regs[1] = regs[2] = regs[3] = 0;
+regs[0] &= (XSAVEOPT | XSAVEC | XGETBV | XSAVES);
 break;
 case 2 ... 63: /* sub-leaves */
 if ( !(xfeature_mask & (1ULL << input[1])) )
@@ -256,8 +258,6 @@ static void xc_cpuid_config_xsave(
 regs[0] = regs[1] = regs[2] = regs[3] = 0;
 break;
 }
-/* Don't touch EAX, EBX. Also cleanup ECX and EDX */
-regs[2] = regs[3] = 0;
 break;
 }
 }
-- 
1.9.1




[Xen-devel] [PATCH 2/6] x86/xsaves: enable xsaves/xrstors in xen

2015-07-02 Thread Shuai Ruan
This patch uses xsaves/xrstors instead of xsaveopt/xrstor
when performing a task switch in xen, if the feature is supported
in hardware.

Please note that xsaves/xrstors only use the compact format.
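
Because of that, the save area header has to be marked as compacted before
xrstors is used on it. The fix-up done in the diff can be sketched in
isolation like this (prepare_xcomp_bv() is a name made up for this example;
the patch open-codes the same test inline):

```c
#include <stdint.h>

#define XSTATE_COMPACTION_ENABLED (1ULL << 63)

/* Return the xcomp_bv value to use before xrstors: leave it alone if the
 * area is already marked compacted, otherwise derive it from xcr0. */
static uint64_t prepare_xcomp_bv(uint64_t xcomp_bv, uint64_t xcr0)
{
    if (!(xcomp_bv & XSTATE_COMPACTION_ENABLED))
        xcomp_bv = xcr0 | XSTATE_COMPACTION_ENABLED;
    return xcomp_bv;
}
```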

Signed-off-by: Shuai Ruan 
---
 xen/arch/x86/xstate.c| 83 
 xen/include/asm-x86/xstate.h |  3 +-
 2 files changed, 55 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/xstate.c b/xen/arch/x86/xstate.c
index e34eda3..ff67986 100644
--- a/xen/arch/x86/xstate.c
+++ b/xen/arch/x86/xstate.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 
+#define XSTATE_COMPACTION_ENABLED  (1ULL << 63)
 static bool_t __read_mostly cpu_has_xsaveopt;
 static bool_t __read_mostly cpu_has_xsavec;
 bool_t __read_mostly cpu_has_xgetbv1;
@@ -102,7 +103,9 @@ void xsave(struct vcpu *v, uint64_t mask)
 typeof(ptr->fpu_sse.fip.sel) fcs = ptr->fpu_sse.fip.sel;
 typeof(ptr->fpu_sse.fdp.sel) fds = ptr->fpu_sse.fdp.sel;
 
-if ( cpu_has_xsaveopt )
+if ( cpu_has_xsaves )
+xsaves(lmask, hmask, ptr);
+else if ( cpu_has_xsaveopt )
 {
 /*
  * xsaveopt may not write the FPU portion even when the respective
@@ -155,7 +158,9 @@ void xsave(struct vcpu *v, uint64_t mask)
 }
 else
 {
-if ( cpu_has_xsaveopt )
+if ( cpu_has_xsaves )
+xsaves(lmask, hmask, ptr);
+else if ( cpu_has_xsaveopt )
 asm volatile ( ".byte 0x0f,0xae,0x37"
: "=m" (*ptr)
: "a" (lmask), "d" (hmask), "D" (ptr) );
@@ -198,36 +203,54 @@ void xrstor(struct vcpu *v, uint64_t mask)
 switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) )
 {
 default:
-asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
-   ".section .fixup,\"ax\"  \n"
-   "2: mov %5,%%ecx \n"
-   "   xor %1,%1\n"
-   "   rep stosb\n"
-   "   lea %2,%0\n"
-   "   mov %3,%1\n"
-   "   jmp 1b   \n"
-   ".previous   \n"
-   _ASM_EXTABLE(1b, 2b)
-   : "+&D" (ptr), "+&a" (lmask)
-   : "m" (*ptr), "g" (lmask), "d" (hmask),
- "m" (xsave_cntxt_size)
-   : "ecx" );
+if ( cpu_has_xsaves )
+{
+if ( !(v->arch.xsave_area->xsave_hdr.xcomp_bv &
+ XSTATE_COMPACTION_ENABLED) )
+v->arch.xsave_area->xsave_hdr.xcomp_bv = get_xcr0() |
+XSTATE_COMPACTION_ENABLED;
+xrstors(lmask, hmask, ptr);
+}
+else
+asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
+   ".section .fixup,\"ax\"  \n"
+   "2: mov %5,%%ecx \n"
+   "   xor %1,%1\n"
+   "   rep stosb\n"
+   "   lea %2,%0\n"
+   "   mov %3,%1\n"
+   "   jmp 1b   \n"
+   ".previous   \n"
+   _ASM_EXTABLE(1b, 2b)
+   : "+&D" (ptr), "+&a" (lmask)
+   : "m" (*ptr), "g" (lmask), "d" (hmask),
+ "m" (xsave_cntxt_size)
+   : "ecx" );
 break;
 case 4: case 2:
-asm volatile ( "1: .byte 0x0f,0xae,0x2f\n"
-   ".section .fixup,\"ax\" \n"
-   "2: mov %5,%%ecx\n"
-   "   xor %1,%1   \n"
-   "   rep stosb   \n"
-   "   lea %2,%0   \n"
-   "   mov %3,%1   \n"
-   "   jmp 1b  \n"
-   ".previous  \n"
-   _ASM_EXTABLE(1b, 2b)
-   : "+&D" (ptr), "+&a" (lmask)
-   : "m" (*ptr), "g" (lmask), "d" (hmask),
- "m" (xsave_cntxt_size)
-   : "ecx" );
+if ( cpu_has_xsaves )
+{
+if ( !(v->arch.xsave_area->xsave_hdr.xcomp_bv &
+ XSTATE_COMPACTION_ENABLED) )
+v->arch.xsave_area->xsave_hdr.xcomp_bv = get_xcr0() |
+XSTATE_COMPACTION_ENABLED;
+xrstors(lmask, hmask, ptr);
+}
+else
+asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n"
+   ".section .fixup,\"ax\"  \n"
+   "2: mov %5,%%ecx \n"
+   "   xor

[Xen-devel] [PATCH 3/6] x86/xsaves: enable xsaves/xrstors for hvm guest

2015-07-02 Thread Shuai Ruan
This patch enables xsaves for hvm guest, which includes:
1. handling xsaves vmcs init and vmexit.
2. adding logic to write/read the XSS msr.

Signed-off-by: Shuai Ruan 
---
 xen/arch/x86/hvm/hvm.c | 40 ++
 xen/arch/x86/hvm/vmx/vmcs.c|  7 ++-
 xen/arch/x86/hvm/vmx/vmx.c | 18 +
 xen/arch/x86/xstate.c  |  4 ++--
 xen/include/asm-x86/hvm/vmx/vmcs.h |  5 +
 xen/include/asm-x86/hvm/vmx/vmx.h  |  2 ++
 xen/include/asm-x86/xstate.h   |  1 +
 7 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..2958e0d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4269,6 +4269,10 @@ void hvm_hypervisor_cpuid_leaf(uint32_t sub_idx,
 }
 }
 
+#define XSAVEOPT   (1 << 0)
+#define XSAVEC (1 << 1)
+#define XGETBV (1 << 2)
+#define XSAVES (1 << 3)
 void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx)
 {
@@ -4355,6 +4359,34 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
 *ebx = _eax + _ebx;
 }
 }
+if ( count == 1 )
+{
+if ( cpu_has_xsaves )
+{
+*ebx = XSTATE_AREA_MIN_SIZE;
+if ( v->arch.xcr0 | v->arch.msr_ia32_xss )
+for ( sub_leaf = 2; sub_leaf < 63; sub_leaf++ )
+{
+if ( !((v->arch.xcr0 | v->arch.msr_ia32_xss)
+  & (1ULL << sub_leaf)) )
+continue;
+domain_cpuid(d, input, sub_leaf, &_eax, &_ebx, &_ecx,
+ &_edx);
+*ebx =  *ebx + _eax;
+}
+}
+else
+{
+*eax &= ~XSAVES;
+if ( !cpu_has_xgetbv1 )
+*eax &= ~XGETBV;
+if ( !cpu_has_xsavec )
+*eax &= ~XSAVEC;
+if ( !cpu_has_xsaveopt )
+*eax &= ~XSAVEOPT;
+*ebx = *ecx = *edx = 0;
+}
+}
 break;
 
 case 0x80000001:
@@ -4454,6 +4486,10 @@ int hvm_msr_read_intercept(unsigned int msr, uint64_t *msr_content)
 *msr_content = v->arch.hvm_vcpu.guest_efer;
 break;
 
+case MSR_IA32_XSS:
+*msr_content = v->arch.msr_ia32_xss;
+break;
+
 case MSR_IA32_TSC:
 *msr_content = _hvm_rdtsc_intercept();
 break;
@@ -4573,6 +4609,10 @@ int hvm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
return X86EMUL_EXCEPTION;
 break;
 
+case MSR_IA32_XSS:
+v->arch.msr_ia32_xss = msr_content;
+break;
+
 case MSR_IA32_TSC:
 hvm_set_guest_tsc(v, msr_content);
 break;
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 4c5ceb5..8e61e3f 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -230,7 +230,8 @@ static int vmx_init_vmcs_config(void)
SECONDARY_EXEC_ENABLE_EPT |
SECONDARY_EXEC_ENABLE_RDTSCP |
SECONDARY_EXEC_PAUSE_LOOP_EXITING |
-   SECONDARY_EXEC_ENABLE_INVPCID);
+   SECONDARY_EXEC_ENABLE_INVPCID |
+   SECONDARY_EXEC_XSAVES);
 rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
 if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
 opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
@@ -921,6 +922,7 @@ void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
 virtual_vmcs_exit(vvmcs);
 }
 
+#define VMX_XSS_EXIT_BITMAP 0
 static int construct_vmcs(struct vcpu *v)
 {
 struct domain *d = v->domain;
@@ -1204,6 +1206,9 @@ static int construct_vmcs(struct vcpu *v)
 __vmwrite(GUEST_PAT, guest_pat);
 }
 
+if ( cpu_has_vmx_xsaves )
+__vmwrite(XSS_EXIT_BITMAP, VMX_XSS_EXIT_BITMAP);
+
 vmx_vmcs_exit(v);
 
 /* PVH: paging mode is updated by arch_set_info_guest(). */
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index fc29b89..7c950b3 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2683,6 +2683,16 @@ static int vmx_handle_apic_write(void)
 return vlapic_apicv_write(current, exit_qualification & 0xfff);
 }
 
+static void vmx_handle_xsaves(void)
+{
+WARN();
+}
+
+static void vmx_handle_xrstors(void)
+{
+WARN();
+}
+
 void vmx_vmexit_handler(struct cpu_user_regs *regs)
 {
 unsigned long exit_qualification, exit_reason, idtv_info, intr_info = 0;
@@ -3201,6 +3211,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
 vmx_vcpu_flush_pml_buffer(v);
 break;
 
+case EXIT_REASON_XSAVES:
+vmx_handle_xsaves();
+break;
+
+case EXIT_REASON_XRSTORS:
+vmx_handle_

Re: [Xen-devel] [RFC PATCH v3 08/18] xen/arm: vITS: Add virtual ITS driver

2015-07-02 Thread Ian Campbell
On Thu, 2015-07-02 at 19:03 +0530, Vijay Kilari wrote:
> Hi Julien,
> 
> On Tue, Jun 23, 2015 at 10:09 PM, Julien Grall  
> wrote:
> > Hi Vijay,
> >
> >>
> >> +struct vits_device {
> >> +uint32_t vdevid;
> >> +uint32_t pdevid;
> >> +struct its_device *its_dev;
> >> +struct rb_node node;
> >> +};
> >
> > We spoke about a specific structure in the design [2] but you introduced
> > a new one. Why?
> 
> Section 6 of DraftG specifies to manage separate tree for device assignment.
> This helps to manage RB-tree per domain to hold list of devices
> assigned to this domain index with vdevid.
> 
> This helps to check if device is assigned to this domain before processing
> any ITS command with that vdevid.
> 
> >
> > Having everything in the its_device would help to catch a device
> > attached to 2 different domains...
> 
> One option is to introduce a new variable inside its_device to know
> to which domain the device is currently assigned.

IIRC that's what I intended, e.g. two trees referencing the same
underlying data structure. Sorry that wasn't clear.
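
To illustrate the idea of several lookups sharing one record, here is a minimal sketch. It is not the vITS code: the struct and function names are hypothetical, and a linear scan stands in for the RB-trees discussed in the thread. The point is only that keeping the owner in the shared record lets a second assignment be refused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared record: one per physical device. */
struct sketch_its_device {
    uint32_t pdevid;      /* physical device ID */
    uint32_t vdevid;      /* virtual device ID seen by the owning domain */
    int owner_domid;      /* -1 while unassigned */
};

/* Assign a device to a domain; refuses a second owner. Returns 0 on success. */
int sketch_assign(struct sketch_its_device *dev, int domid, uint32_t vdevid)
{
    assert(dev != NULL);
    if ( dev->owner_domid != -1 )
        return -1;        /* already attached to some domain */
    dev->owner_domid = domid;
    dev->vdevid = vdevid;
    return 0;
}

/* Per-domain lookup by vdevid over the same records. */
struct sketch_its_device *sketch_find(struct sketch_its_device *devs, size_t n,
                                      int domid, uint32_t vdevid)
{
    size_t i;

    for ( i = 0; i < n; i++ )
        if ( devs[i].owner_domid == domid && devs[i].vdevid == vdevid )
            return &devs[i];
    return NULL;
}
```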

> 
> >
> > Also, the field pdevid is not vits specific but its.
> pdevid can be removed as its_device structure already has it
> 
> Regards
> Vijay



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v3 2/6] libxl: do not add a vkb backend to hvm guests

2015-07-02 Thread Konrad Rzeszutek Wilk
> > But I am looking at the xen-fbfront.c driver and it might be that
> > I had already fixed this issue! (inadvertently it seems)
> > 
> > 51c71a3bbaca868043cc45b3ad3786dd48a90235
> > Author: Konrad Rzeszutek Wilk 
> > Date:   Tue Nov 26 15:05:40 2013 -0500
> > 
> > xen/pvhvm: If xen_platform_pci=0 is set don't blow up (v4).
> > 
> > ..
> >- if running in HVM, check if user wanted 'xen_emul_unplug=never',
> >in which case bail out and don't load any PV drivers.
> >  - if running in HVM, and if PCI device 5853:0001 (xen_platform_pci)
> >does not exist, then bail out and not load PV drivers.
> >  - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=ide-disks',
> >    then bail out for all PV devices _except_ the block one.
> >    Ditto for the network one ('nics').
> >  - (v2) if running in HVM, and if the user wanted 'xen_emul_unplug=unnecessary'
> >then load block PV driver, and also setup the legacy IDE paths.
> >In (v3) make it actually load PV drivers.
> > 
> > .. which means that if the driver does not use the 'xen_has_pv_XXX_devices'
> > but only the 'xen_has_pv_devices' then for a normal HVM guest it won't load
> > it.
> > 
> > And sure enough we have:
> > 
> > +   if (!xen_has_pv_devices())
> > +   return -ENODEV;
> > 
> > so we bail out and not load it under HVM.
>  
> And at the same time it works on ARM because CONFIG_XEN_PVHVM is not
> defined there, right?

Yup, and it ends up doing:

static inline bool xen_has_pv_devices(void) 
{   
return IS_ENABLED(CONFIG_XEN);  
}  

which will return true if CONFIG_XEN is set.




[Xen-devel] [linux-3.4 test] 59013: regressions - FAIL

2015-07-02 Thread osstest service user
flight 59013 linux-3.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/59013/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-win7-amd64  6 xen-boot  fail REGR. vs. 30511

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-sedf-pin  6 xen-boot   fail in 58831 pass in 58798
 test-amd64-i386-xl-qemuu-win7-amd64 9 windows-install fail in 58831 pass in 59013
 test-amd64-amd64-pair 10 xen-boot/dst_host   fail pass in 58798
 test-amd64-amd64-pair 9 xen-boot/src_host   fail pass in 58798
 test-amd64-i386-pair 10 xen-boot/dst_host   fail pass in 58831
 test-amd64-i386-pair  9 xen-boot/src_host   fail pass in 58831
 test-amd64-amd64-xl   6 xen-boot fail pass in 58941

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-libvirt-xsm   6 xen-boot fail baseline untested
 test-amd64-amd64-xl-multivcpu  6 xen-boot   fail baseline untested
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-libvirt-xsm  6 xen-boot fail baseline untested
 test-amd64-amd64-xl-credit2   6 xen-boot fail baseline untested
 test-amd64-amd64-xl-xsm   6 xen-boot fail baseline untested
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 14 guest-localmigrate.2 fail baseline untested
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-sedf  6 xen-boot  fail in 58831 like 30406
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 9 debian-hvm-install fail in 58941 baseline untested
 test-amd64-i386-libvirt  11 guest-start  fail   like 30511
 test-amd64-amd64-libvirt 11 guest-start  fail   like 30511
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 30511
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail like 30511
 test-amd64-amd64-xl-qemuu-ovmf-amd64  6 xen-boot fail like 53709-bisect
 test-amd64-i386-xl 6 xen-boot fail like 53725-bisect
 test-amd64-i386-freebsd10-amd64  6 xen-boot fail like 58780-bisect
 test-amd64-i386-xl-qemuu-winxpsp3  6 xen-boot   fail like 58786-bisect
 test-amd64-i386-qemut-rhel6hvm-intel  6 xen-boot fail like 58788-bisect
 test-amd64-i386-rumpuserxen-i386  6 xen-boot fail like 58799-bisect
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  6 xen-boot fail like 58801-bisect
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  6 xen-boot   fail like 58803-bisect
 test-amd64-amd64-xl-qemut-winxpsp3  6 xen-boot  fail like 58804-bisect
 test-amd64-i386-freebsd10-i386  6 xen-boot  fail like 58805-bisect
 test-amd64-i386-xl-qemuu-ovmf-amd64  6 xen-boot fail like 58806-bisect
 test-amd64-amd64-xl-qemuu-winxpsp3  6 xen-boot  fail like 58807-bisect
 test-amd64-i386-xl-qemut-winxpsp3  6 xen-boot   fail like 58808-bisect
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  6 xen-boot fail like 58809-bisect
 test-amd64-amd64-rumpuserxen-amd64  6 xen-boot  fail like 58810-bisect
 test-amd64-i386-xl-qemuu-debianhvm-amd64  6 xen-boot fail like 58811-bisect
 test-amd64-amd64-xl-qemut-debianhvm-amd64  6 xen-boot   fail like 58813-bisect
 test-amd64-i386-qemuu-rhel6hvm-intel  6 xen-boot fail like 58814-bisect
 test-amd64-i386-xl-qemut-debianhvm-amd64  6 xen-boot fail like 58815-bisect

Tests which did not succeed, but are not blocking:
 test-amd64-i386-libvirt  12 migrate-support-check fail in 58831 never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-check fail in 58831 never pass
 test-amd64-amd64-libvirt 12 migrate-support-check fail in 58831 never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop  fail never pass

version targeted for testing:
 linux cf1b3dad6c5699b977273276bada8597636ef3e2
baseline version:
 linux bb4a05a0400ed6d2f1e13d1f82f289ff74300a70


500 people touched revisions under test,
not listing them all


jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64

Re: [Xen-devel] [PATCH v5 05/16] x86/hvm: unify internal portio and mmio intercepts

2015-07-02 Thread Roger Pau Monné
On 30/06/15 at 15:05, Paul Durrant wrote:
[...]
> +void msixtbl_init(struct domain *d)
> +{
> +register_mmio_handler(d, &msixtbl_mmio_ops);

Since you are adding an initialization function to vmsi, I think
msixtbl_list and msixtbl_list_lock should also be initialized here
instead of in hvm_domain_initialise.

Roger.




[Xen-devel] [PATCH v5 5/6] xen/PMU: Intercept PMU-related MSR and APIC accesses

2015-07-02 Thread Boris Ostrovsky
Provide interfaces for recognizing accesses to PMU-related MSRs and
LVTPC APIC and process these accesses in Xen PMU code.

(The interrupt handler performs XENPMU_flush right away in the beginning
since no PMU emulation is available. It will be added with a later patch).

Signed-off-by: Boris Ostrovsky 
Reviewed-by: David Vrabel 
---
 arch/x86/xen/apic.c|  5 ++-
 arch/x86/xen/enlighten.c   | 11 +++--
 arch/x86/xen/pmu.c | 95 +-
 arch/x86/xen/pmu.h |  4 ++
 include/xen/interface/xenpmu.h |  2 +
 5 files changed, 109 insertions(+), 8 deletions(-)

diff --git a/arch/x86/xen/apic.c b/arch/x86/xen/apic.c
index d03ebfa..acda713 100644
--- a/arch/x86/xen/apic.c
+++ b/arch/x86/xen/apic.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include "xen-ops.h"
+#include "pmu.h"
 #include "smp.h"
 
 static unsigned int xen_io_apic_read(unsigned apic, unsigned reg)
@@ -72,8 +73,10 @@ static u32 xen_apic_read(u32 reg)
 
 static void xen_apic_write(u32 reg, u32 val)
 {
-   if (reg == APIC_LVTPC)
+   if (reg == APIC_LVTPC) {
+   (void)pmu_apic_update(reg);
return;
+   }
 
/* Warn to see if there's any stray references */
WARN(1,"register: %x, value: %x\n", reg, val);
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index cf46416..623f8ad 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -991,6 +991,9 @@ static u64 xen_read_msr_safe(unsigned int msr, int *err)
 {
u64 val;
 
+   if (pmu_msr_read(msr, &val, err))
+   return val;
+
val = native_read_msr_safe(msr, err);
switch (msr) {
case MSR_IA32_APICBASE:
@@ -1037,17 +1040,13 @@ static int xen_write_msr_safe(unsigned int msr, unsigned low, unsigned high)
   Xen console noise. */
 
default:
-   ret = native_write_msr_safe(msr, low, high);
+   if (!pmu_msr_write(msr, low, high, &ret))
+   ret = native_write_msr_safe(msr, low, high);
}
 
return ret;
 }
 
-unsigned long long xen_read_pmc(int counter)
-{
-   return 0;
-}
-
 void xen_setup_shared_info(void)
 {
if (!xen_feature(XENFEAT_auto_translated_physmap)) {
diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 1fc7e10..69a0b68 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -51,6 +51,8 @@ static __read_mostly int amd_num_counters;
 /* Alias registers (0x4c1) for full-width writes to PMCs */
 #define MSR_PMC_ALIAS_MASK  (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_PMC0))
 
+#define INTEL_PMC_TYPE_SHIFT30
+
 static __read_mostly int intel_num_arch_counters, intel_num_fixed_counters;
 
 
@@ -167,6 +169,91 @@ static int is_intel_pmu_msr(u32 msr_index, int *type, int *index)
}
 }
 
+bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err)
+{
+
+   if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+   if (is_amd_pmu_msr(msr)) {
+   *val = native_read_msr_safe(msr, err);
+   return true;
+   }
+   } else {
+   int type, index;
+
+   if (is_intel_pmu_msr(msr, &type, &index)) {
+   *val = native_read_msr_safe(msr, err);
+   return true;
+   }
+   }
+
+   return false;
+}
+
+bool pmu_msr_write(unsigned int msr, uint32_t low, uint32_t high, int *err)
+{
+   if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+   if (is_amd_pmu_msr(msr)) {
+   *err = native_write_msr_safe(msr, low, high);
+   return true;
+   }
+   } else {
+   int type, index;
+
+   if (is_intel_pmu_msr(msr, &type, &index)) {
+   *err = native_write_msr_safe(msr, low, high);
+   return true;
+   }
+   }
+
+   return false;
+}
+
+unsigned long long xen_amd_read_pmc(int counter)
+{
+   uint32_t msr;
+   int err;
+
+   msr = amd_counters_base + (counter * amd_msr_step);
+   return native_read_msr_safe(msr, &err);
+}
+
+unsigned long long xen_intel_read_pmc(int counter)
+{
+   int err;
+   uint32_t msr;
+
+   if (counter & (1

[Xen-devel] [PATCH v5 0/6] xen/PMU: PMU support for Xen PV(H) guests

2015-07-02 Thread Boris Ostrovsky
I haven't posted the Linux part of PV(H) VPMU support in a while, but now
that (hopefully) the hypervisor part is getting close to being done, I
think it's time to post it again.

There are very few differences compared to the last version, mostly due
to updates in shared structures layouts. Patches 1 and 4 have no changes
at all and patch 5 has minor changes due to rebasing so I kept David's
Reviewed-by tag.


Boris Ostrovsky (6):
  xen: xensyms support
  xen/PMU: Sysfs interface for setting Xen PMU mode
  xen/PMU: Initialization code for Xen PMU
  xen/PMU: Describe vendor-specific PMU registers
  xen/PMU: Intercept PMU-related MSR and APIC accesses
  xen/PMU: PMU emulation code

 Documentation/ABI/testing/sysfs-hypervisor-pmu |  23 +
 arch/x86/include/asm/xen/hypercall.h   |   6 +
 arch/x86/include/asm/xen/interface.h   | 123 ++
 arch/x86/xen/Makefile  |   2 +-
 arch/x86/xen/apic.c|   6 +
 arch/x86/xen/enlighten.c   |  13 +-
 arch/x86/xen/pmu.c | 572 +
 arch/x86/xen/pmu.h |  15 +
 arch/x86/xen/smp.c |  29 +-
 arch/x86/xen/suspend.c |  23 +-
 drivers/xen/Kconfig|   8 +
 drivers/xen/sys-hypervisor.c   | 127 ++
 drivers/xen/xenfs/Makefile |   1 +
 drivers/xen/xenfs/super.c  |   3 +
 drivers/xen/xenfs/xenfs.h  |   1 +
 drivers/xen/xenfs/xensyms.c| 152 +++
 include/xen/interface/platform.h   |  18 +
 include/xen/interface/xen.h|   2 +
 include/xen/interface/xenpmu.h |  94 
 19 files changed, 1208 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu
 create mode 100644 arch/x86/xen/pmu.c
 create mode 100644 arch/x86/xen/pmu.h
 create mode 100644 drivers/xen/xenfs/xensyms.c
 create mode 100644 include/xen/interface/xenpmu.h

-- 
1.8.1.4




[Xen-devel] [PATCH v5 1/6] xen: xensyms support

2015-07-02 Thread Boris Ostrovsky
Export Xen symbols to dom0 via /proc/xen/xensyms (similar to
/proc/kallsyms).
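
For a consumer of this file, each line has the shape the patch prints with "%016llx %c %s": a hexadecimal address, a one-character type and a symbol name. A minimal user-space parsing sketch follows; the struct, the function name and the 127-character name limit are assumptions made for the example.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_NAME_LEN 128

struct sketch_sym {
    unsigned long long address;
    char type;
    char name[SKETCH_NAME_LEN];
};

/* Parse one "<hex address> <type> <name>" line; returns 0 on success. */
int sketch_parse_sym(const char *line, struct sketch_sym *sym)
{
    assert(line != NULL && sym != NULL);
    /* %127s leaves room for the terminating NUL in name[]. */
    if (sscanf(line, "%llx %c %127s",
               &sym->address, &sym->type, sym->name) != 3)
        return -1;
    return 0;
}
```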

Signed-off-by: Boris Ostrovsky 
Reviewed-by: David Vrabel 
---
 drivers/xen/Kconfig  |   8 +++
 drivers/xen/xenfs/Makefile   |   1 +
 drivers/xen/xenfs/super.c|   3 +
 drivers/xen/xenfs/xenfs.h|   1 +
 drivers/xen/xenfs/xensyms.c  | 152 +++
 include/xen/interface/platform.h |  18 +
 6 files changed, 183 insertions(+)
 create mode 100644 drivers/xen/xenfs/xensyms.c

diff --git a/drivers/xen/Kconfig b/drivers/xen/Kconfig
index 7cd226d..1d825ca 100644
--- a/drivers/xen/Kconfig
+++ b/drivers/xen/Kconfig
@@ -280,4 +280,12 @@ config XEN_ACPI
def_bool y
depends on X86 && ACPI
 
+config XEN_SYMS
+   bool "Xen symbols"
+   depends on XEN_DOM0 && XENFS
+   default y if KALLSYMS
+   help
+  Exports hypervisor symbols (along with their types and addresses) via
+  /proc/xen/xensyms file, similar to /proc/kallsyms
+
 endmenu
diff --git a/drivers/xen/xenfs/Makefile b/drivers/xen/xenfs/Makefile
index b019865..1a83010 100644
--- a/drivers/xen/xenfs/Makefile
+++ b/drivers/xen/xenfs/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_XENFS) += xenfs.o
 
 xenfs-y  = super.o
 xenfs-$(CONFIG_XEN_DOM0) += xenstored.o
+xenfs-$(CONFIG_XEN_SYMS) += xensyms.o
diff --git a/drivers/xen/xenfs/super.c b/drivers/xen/xenfs/super.c
index 06092e0..8559a71 100644
--- a/drivers/xen/xenfs/super.c
+++ b/drivers/xen/xenfs/super.c
@@ -57,6 +57,9 @@ static int xenfs_fill_super(struct super_block *sb, void *data, int silent)
{ "privcmd", &xen_privcmd_fops, S_IRUSR|S_IWUSR },
{ "xsd_kva", &xsd_kva_file_ops, S_IRUSR|S_IWUSR},
{ "xsd_port", &xsd_port_file_ops, S_IRUSR|S_IWUSR},
+#ifdef CONFIG_XEN_SYMS
+   { "xensyms", &xensyms_ops, S_IRUSR},
+#endif
{""},
};
 
diff --git a/drivers/xen/xenfs/xenfs.h b/drivers/xen/xenfs/xenfs.h
index 6b80c77..2c5934e 100644
--- a/drivers/xen/xenfs/xenfs.h
+++ b/drivers/xen/xenfs/xenfs.h
@@ -3,5 +3,6 @@
 
 extern const struct file_operations xsd_kva_file_ops;
 extern const struct file_operations xsd_port_file_ops;
+extern const struct file_operations xensyms_ops;
 
 #endif /* _XENFS_XENBUS_H */
diff --git a/drivers/xen/xenfs/xensyms.c b/drivers/xen/xenfs/xensyms.c
new file mode 100644
index 0000000..ed16f8d
--- /dev/null
+++ b/drivers/xen/xenfs/xensyms.c
@@ -0,0 +1,152 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "xenfs.h"
+
+
+#define XEN_KSYM_NAME_LEN 127 /* Hypervisor may have different name length */
+
+struct xensyms {
+   struct xen_platform_op op;
+   char *name;
+   uint32_t namelen;
+};
+
+/* Grab next output page from the hypervisor */
+static int xensyms_next_sym(struct xensyms *xs)
+{
+   int ret;
+   struct xenpf_symdata *symdata = &xs->op.u.symdata;
+   uint64_t symnum;
+
+   memset(xs->name, 0, xs->namelen);
+   symdata->namelen = xs->namelen;
+
+   symnum = symdata->symnum;
+
+   ret = HYPERVISOR_dom0_op(&xs->op);
+   if (ret < 0)
+   return ret;
+
+   /*
+* If hypervisor's symbol didn't fit into the buffer then allocate
+* a larger buffer and try again
+*/
+   if (unlikely(symdata->namelen > xs->namelen)) {
+   kfree(xs->name);
+
+   xs->namelen = symdata->namelen;
+   xs->name = kzalloc(xs->namelen, GFP_KERNEL);
+   if (!xs->name)
+   return 1;
+
+   set_xen_guest_handle(symdata->name, xs->name);
+   symdata->symnum--; /* Rewind */
+
+   ret = HYPERVISOR_dom0_op(&xs->op);
+   if (ret < 0)
+   return ret;
+   }
+
+   if (symdata->symnum == symnum)
+   /* End of symbols */
+   return 1;
+
+   return 0;
+}
+
+static void *xensyms_start(struct seq_file *m, loff_t *pos)
+{
+   struct xensyms *xs = (struct xensyms *)m->private;
+
+   xs->op.u.symdata.symnum = *pos;
+
+   if (xensyms_next_sym(xs))
+   return NULL;
+
+   return m->private;
+}
+
+static void *xensyms_next(struct seq_file *m, void *p, loff_t *pos)
+{
+   struct xensyms *xs = (struct xensyms *)m->private;
+
+   xs->op.u.symdata.symnum = ++(*pos);
+
+   if (xensyms_next_sym(xs))
+   return NULL;
+
+   return p;
+}
+
+static int xensyms_show(struct seq_file *m, void *p)
+{
+   struct xensyms *xs = (struct xensyms *)m->private;
+   struct xenpf_symdata *symdata = &xs->op.u.symdata;
+
+   seq_printf(m, "%016llx %c %s\n", symdata->address,
+  symdata->type, xs->name);
+
+   return 0;
+}
+
+static void xensyms_stop(struct seq_file *m, void *p)
+{
+}
+
+static const struct seq_operations xensyms_seq_ops = {
+   .start = xensyms_start,
+

[Xen-devel] [PATCH v5 6/6] xen/PMU: PMU emulation code

2015-07-02 Thread Boris Ostrovsky
Add PMU emulation code that runs while we are processing a PMU interrupt.
This lets us avoid trapping to the hypervisor on each MSR/LVTPC access
(of which there may be quite a few in the handler).
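
A minimal sketch of the idea, under simplifying assumptions: the context struct is trimmed to three registers, and every `sketch_`/`SKETCH_` name is hypothetical. While the interrupt-processing flag is set, reads and writes are served from the cached context; otherwise the function declines and the caller would perform the real access.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_IRQ_PROCESSING 1   /* set while the PMU interrupt handler runs */

/* Trimmed-down cached context; the real one has more fields. */
struct sketch_intel_ctxt {
    uint64_t global_ctrl;
    uint64_t global_status;
    uint64_t global_ovf_ctrl;
};

enum sketch_reg {
    SKETCH_GLOBAL_CTRL,
    SKETCH_GLOBAL_STATUS,
    SKETCH_GLOBAL_OVF_CTRL,
};

/* Returns true if the access was satisfied from the cached context. */
bool sketch_emulate(struct sketch_intel_ctxt *ctxt, uint8_t flags,
                    enum sketch_reg which, uint64_t *val, bool is_read)
{
    uint64_t *reg;

    assert(val != NULL);
    if (!ctxt || !(flags & SKETCH_IRQ_PROCESSING))
        return false;             /* caller falls back to the real access */

    switch (which) {
    case SKETCH_GLOBAL_CTRL:     reg = &ctxt->global_ctrl;     break;
    case SKETCH_GLOBAL_STATUS:   reg = &ctxt->global_status;   break;
    case SKETCH_GLOBAL_OVF_CTRL: reg = &ctxt->global_ovf_ctrl; break;
    default:                     return false;
    }

    if (is_read) {
        *val = *reg;
    } else {
        *reg = *val;
        /* Writing overflow-control acknowledges the matching status bits. */
        if (which == SKETCH_GLOBAL_OVF_CTRL)
            ctxt->global_status &= ~*val;
    }
    return true;
}
```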

Signed-off-by: Boris Ostrovsky 
---
 arch/x86/xen/pmu.c | 214 +
 1 file changed, 185 insertions(+), 29 deletions(-)

diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 69a0b68..4c09167 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -13,11 +13,20 @@
 /* x86_pmu.handle_irq definition */
 #include "../kernel/cpu/perf_event.h"
 
+#define XENPMU_IRQ_PROCESSING1
+struct xenpmu {
+   /* Shared page between hypervisor and domain */
+   struct xen_pmu_data *xenpmu_data;
 
-/* Shared page between hypervisor and domain */
-DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared);
-#define get_xenpmu_data()per_cpu(xenpmu_shared, smp_processor_id())
+   uint8_t flags;
+};
+DEFINE_PER_CPU(struct xenpmu, xenpmu_shared);
+#define get_xenpmu_data()(this_cpu_ptr(&xenpmu_shared)->xenpmu_data)
+#define get_xenpmu_flags()   (this_cpu_ptr(&xenpmu_shared)->flags)
 
+/* Macro for computing address of a PMU MSR bank */
+#define field_offset(ctxt, field) ((void *)((uintptr_t)ctxt + \
+   (uintptr_t)ctxt->field))
 
 /* AMD PMU */
 #define F15H_NUM_COUNTERS   6
@@ -169,19 +178,124 @@ static int is_intel_pmu_msr(u32 msr_index, int *type, int *index)
}
 }
 
-bool pmu_msr_read(unsigned int msr, uint64_t *val, int *err)
+static bool xen_intel_pmu_emulate(unsigned int msr, u64 *val, int type,
+ int index, bool is_read)
 {
+   uint64_t *reg = NULL;
+   struct xen_pmu_intel_ctxt *ctxt;
+   uint64_t *fix_counters;
+   struct xen_pmu_cntr_pair *arch_cntr_pair;
+   struct xen_pmu_data *xenpmu_data = get_xenpmu_data();
+   uint8_t xenpmu_flags = get_xenpmu_flags();
+
 
+   if (!xenpmu_data || !(xenpmu_flags & XENPMU_IRQ_PROCESSING))
+   return false;
+
+   ctxt = &xenpmu_data->pmu.c.intel;
+
+   switch (msr) {
+   case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+   reg = &ctxt->global_ovf_ctrl;
+   break;
+   case MSR_CORE_PERF_GLOBAL_STATUS:
+   reg = &ctxt->global_status;
+   break;
+   case MSR_CORE_PERF_GLOBAL_CTRL:
+   reg = &ctxt->global_ctrl;
+   break;
+   case MSR_CORE_PERF_FIXED_CTR_CTRL:
+   reg = &ctxt->fixed_ctrl;
+   break;
+   default:
+   switch (type) {
+   case MSR_TYPE_COUNTER:
+   fix_counters = field_offset(ctxt, fixed_counters);
+   reg = &fix_counters[index];
+   break;
+   case MSR_TYPE_ARCH_COUNTER:
+   arch_cntr_pair = field_offset(ctxt, arch_counters);
+   reg = &arch_cntr_pair[index].counter;
+   break;
+   case MSR_TYPE_ARCH_CTRL:
+   arch_cntr_pair = field_offset(ctxt, arch_counters);
+   reg = &arch_cntr_pair[index].control;
+   break;
+   default:
+   return false;
+   }
+   }
+
+   if (reg) {
+   if (is_read)
+   *val = *reg;
+   else {
+   *reg = *val;
+
+   if (msr == MSR_CORE_PERF_GLOBAL_OVF_CTRL)
+   ctxt->global_status &= (~(*val));
+   }
+   return true;
+   }
+
+   return false;
+}
+
+static bool xen_amd_pmu_emulate(unsigned int msr, u64 *val, bool is_read)
+{
+   uint64_t *reg = NULL;
+   int i, off = 0;
+   struct xen_pmu_amd_ctxt *ctxt;
+   uint64_t *counter_regs, *ctrl_regs;
+   struct xen_pmu_data *xenpmu_data = get_xenpmu_data();
+   uint8_t xenpmu_flags = get_xenpmu_flags();
+
+   if (!xenpmu_data || !(xenpmu_flags & XENPMU_IRQ_PROCESSING))
+   return false;
+
+   if (k7_counters_mirrored &&
+   ((msr >= MSR_K7_EVNTSEL0) && (msr <= MSR_K7_PERFCTR3)))
+   msr = get_fam15h_addr(msr);
+
+   ctxt = &xenpmu_data->pmu.c.amd;
+   for (i = 0; i < amd_num_counters; i++) {
+   if (msr == amd_ctrls_base + off) {
+   ctrl_regs = field_offset(ctxt, ctrls);
+   reg = &ctrl_regs[i];
+   break;
+   } else if (msr == amd_counters_base + off) {
+   counter_regs = field_offset(ctxt, counters);
+   reg = &counter_regs[i];
+   break;
+   }
+   off += amd_msr_step;
+   }
+
+   if (reg) {
+   if (is_read)
+   *val = *reg;
+   else
+   *reg = *val;
+
+   return true;
+   

[Xen-devel] [PATCH v5 2/6] xen/PMU: Sysfs interface for setting Xen PMU mode

2015-07-02 Thread Boris Ostrovsky
Set Xen's PMU mode via /sys/hypervisor/pmu/pmu_mode. Add XENPMU hypercall.
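
A sysfs store handler of this kind has to map the written string to a mode value. The user-space sketch below shows one way to do that lookup; the table values are placeholders rather than the XENPMU_MODE_* constants, and the function name is hypothetical. Note that it compares whole names after dropping a trailing newline, which is stricter than a prefix comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct sketch_mode {
    const char *name;
    unsigned int mode;
};

/* Placeholder values, not the real XENPMU_MODE_* constants. */
static const struct sketch_mode sketch_modes[] = {
    { "off", 0 }, { "self", 1 }, { "hv", 2 }, { "all", 3 },
};

/* Returns the table index matching buffer[0..len), or -1 if none does. */
int sketch_lookup_mode(const char *buffer, size_t len)
{
    size_t i;

    assert(buffer != NULL);
    if (len > 0 && buffer[len - 1] == '\n')
        len--;                    /* drop the newline that echo appends */

    for (i = 0; i < sizeof(sketch_modes) / sizeof(sketch_modes[0]); i++) {
        const char *name = sketch_modes[i].name;

        if (strlen(name) == len && memcmp(buffer, name, len) == 0)
            return (int)i;
    }
    return -1;
}
```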

Signed-off-by: Boris Ostrovsky 
---
 Documentation/ABI/testing/sysfs-hypervisor-pmu |  23 +
 arch/x86/include/asm/xen/hypercall.h   |   6 ++
 drivers/xen/sys-hypervisor.c   | 127 +
 include/xen/interface/xen.h|   1 +
 include/xen/interface/xenpmu.h |  59 
 5 files changed, 216 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu
 create mode 100644 include/xen/interface/xenpmu.h

diff --git a/Documentation/ABI/testing/sysfs-hypervisor-pmu b/Documentation/ABI/testing/sysfs-hypervisor-pmu
new file mode 100644
index 0000000..d1ceb77
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-hypervisor-pmu
@@ -0,0 +1,23 @@
+What:  /sys/hypervisor/pmu/pmu_mode
+Date:  July 2015
+KernelVersion: 4.2
+Contact:   Boris Ostrovsky 
+Description:
+   Describes mode that Xen's performance-monitoring unit (PMU)
+   uses. Accepted values are
+   "off"  -- PMU is disabled
+   "self" -- The guest can profile itself
+   "hv"   -- The guest can profile itself and, if it is
+ privileged (e.g. dom0), the hypervisor
+   "all" --  The guest can profile itself, the hypervisor
+ and all other guests. Only available to
+ privileged guests.
+
+What:   /sys/hypervisor/pmu/pmu_features
+Date:   July 2015
+KernelVersion:  4.2
+Contact:Boris Ostrovsky 
+Description:
+   Describes Xen PMU features (as an integer). A set bit indicates
+   that the corresponding feature is enabled. See
+   include/xen/interface/xenpmu.h for available features
diff --git a/arch/x86/include/asm/xen/hypercall.h b/arch/x86/include/asm/xen/hypercall.h
index ca08a27..83aea80 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -465,6 +465,12 @@ HYPERVISOR_tmem_op(
return _hypercall1(int, tmem_op, op);
 }
 
+static inline int
+HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
+{
+   return _hypercall2(int, xenpmu_op, op, arg);
+}
+
 static inline void
 MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set)
 {
diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 96453f8..f642085 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define HYPERVISOR_ATTR_RO(_name) \
 static struct hyp_sysfs_attr  _name##_attr = __ATTR_RO(_name)
@@ -368,6 +369,124 @@ static void xen_properties_destroy(void)
sysfs_remove_group(hypervisor_kobj, &xen_properties_group);
 }
 
+struct pmu_mode {
+   const char *name;
+   uint32_t mode;
+};
+
+struct pmu_mode pmu_modes[] = {
+   {"off", XENPMU_MODE_OFF},
+   {"self", XENPMU_MODE_SELF},
+   {"hv", XENPMU_MODE_HV},
+   {"all", XENPMU_MODE_ALL}
+};
+
+static ssize_t pmu_mode_store(struct hyp_sysfs_attr *attr,
+ const char *buffer, size_t len)
+{
+   int ret;
+   struct xen_pmu_params xp;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(pmu_modes); i++) {
+   if (strncmp(buffer, pmu_modes[i].name, len - 1) == 0) {
+   xp.val = pmu_modes[i].mode;
+   break;
+   }
+   }
+
+   if (i == ARRAY_SIZE(pmu_modes))
+   return -EINVAL;
+
+   xp.version.maj = XENPMU_VER_MAJ;
+   xp.version.min = XENPMU_VER_MIN;
+   ret = HYPERVISOR_xenpmu_op(XENPMU_mode_set, &xp);
+   if (ret)
+   return ret;
+
+   return len;
+}
+
+static ssize_t pmu_mode_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+   int ret;
+   struct xen_pmu_params xp;
+   int i;
+   uint32_t mode;
+
+   xp.version.maj = XENPMU_VER_MAJ;
+   xp.version.min = XENPMU_VER_MIN;
+   ret = HYPERVISOR_xenpmu_op(XENPMU_mode_get, &xp);
+   if (ret)
+   return ret;
+
+   mode = (uint32_t)xp.val;
+   for (i = 0; i < ARRAY_SIZE(pmu_modes); i++) {
+   if (mode == pmu_modes[i].mode)
+   return sprintf(buffer, "%s\n", pmu_modes[i].name);
+   }
+
+   return -EINVAL;
+}
+HYPERVISOR_ATTR_RW(pmu_mode);
+
+static ssize_t pmu_features_store(struct hyp_sysfs_attr *attr,
+ const char *buffer, size_t len)
+{
+   int ret;
+   uint32_t features;
+   struct xen_pmu_params xp;
+
+   ret = kstrtou32(buffer, 0, &features);
+   if (ret)
+   return ret;
+
+   xp.val = features;
+   xp.version.maj = XENPMU_VER_MAJ;
+   xp.version.min = XENPMU_VER_MIN;
+   ret = HYPERVISOR_xenpmu_op(XENPMU_feature_set, &xp);
+   if (ret)
+

[Xen-devel] [PATCH v5 3/6] xen/PMU: Initialization code for Xen PMU

2015-07-02 Thread Boris Ostrovsky
Map shared data structure that will hold CPU registers, VPMU context,
V/PCPU IDs of the CPU interrupted by PMU interrupt. Hypervisor fills
this information in its handler and passes it to the guest for further
processing.
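
The vendor contexts in the shared structure locate their register banks through byte offsets stored in the context itself. A minimal sketch of that layout and of resolving an offset to a pointer follows. All `sketch_` names are hypothetical, and the allocator here only plays the role of the producer that fills in the offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct sketch_amd_ctxt {
    uint32_t counters;    /* byte offset of the counter bank from the struct start */
    uint32_t ctrls;       /* byte offset of the control bank */
    uint64_t regs[];      /* both banks live here */
};

/* Resolve a stored byte offset to the bank it designates. */
uint64_t *sketch_bank(struct sketch_amd_ctxt *ctxt, uint32_t offset)
{
    assert(ctxt != NULL);
    return (uint64_t *)((uintptr_t)ctxt + offset);
}

/* Allocate a context with n counters followed by n control registers. */
struct sketch_amd_ctxt *sketch_alloc_ctxt(uint32_t n)
{
    struct sketch_amd_ctxt *ctxt =
        calloc(1, sizeof(*ctxt) + 2 * (size_t)n * sizeof(uint64_t));

    if (!ctxt)
        return NULL;
    ctxt->counters = (uint32_t)offsetof(struct sketch_amd_ctxt, regs);
    ctxt->ctrls = ctxt->counters + n * (uint32_t)sizeof(uint64_t);
    return ctxt;
}
```

Storing offsets rather than pointers keeps the structure meaningful to both sides of the shared page, since each side maps it at a different address.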

Set up PMU VIRQ.

Now that perf infrastructure will assume that PMU is available on a PV
guest we need to be careful and make sure that accesses via RDPMC
instruction don't cause fatal traps by the hypervisor. Provide a nop
RDPMC handler.

For the same reason avoid issuing a warning on a write to APIC's LVTPC.

Both of these will be made functional in later patches.

Signed-off-by: Boris Ostrovsky 
---
 arch/x86/include/asm/xen/interface.h | 123 +
 arch/x86/xen/Makefile|   2 +-
 arch/x86/xen/apic.c  |   3 +
 arch/x86/xen/enlighten.c |  12 ++-
 arch/x86/xen/pmu.c   | 172 +++
 arch/x86/xen/pmu.h   |  11 +++
 arch/x86/xen/smp.c   |  29 +-
 arch/x86/xen/suspend.c   |  23 +++--
 include/xen/interface/xen.h  |   1 +
 include/xen/interface/xenpmu.h   |  33 +++
 10 files changed, 400 insertions(+), 9 deletions(-)
 create mode 100644 arch/x86/xen/pmu.c
 create mode 100644 arch/x86/xen/pmu.h

diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h
index 3400dba..dced9bc 100644
--- a/arch/x86/include/asm/xen/interface.h
+++ b/arch/x86/include/asm/xen/interface.h
@@ -172,6 +172,129 @@ struct vcpu_guest_context {
 #endif
 };
 DEFINE_GUEST_HANDLE_STRUCT(vcpu_guest_context);
+
+/* AMD PMU registers and structures */
+struct xen_pmu_amd_ctxt {
+   /*
+* Offsets to counter and control MSRs (relative to xen_pmu_arch.c.amd).
+* For PV(H) guests these fields are RO.
+*/
+   uint32_t counters;
+   uint32_t ctrls;
+
+   /* Counter MSRs */
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+   uint64_t regs[];
+#elif defined(__GNUC__)
+   uint64_t regs[0];
+#endif
+};
+
+/* Intel PMU registers and structures */
+struct xen_pmu_cntr_pair {
+   uint64_t counter;
+   uint64_t control;
+};
+
+struct xen_pmu_intel_ctxt {
+   /*
+* Offsets to fixed and architectural counter MSRs (relative to
+* xen_pmu_arch.c.intel).
+* For PV(H) guests these fields are RO.
+*/
+   uint32_t fixed_counters;
+   uint32_t arch_counters;
+
+   /* PMU registers */
+   uint64_t global_ctrl;
+   uint64_t global_ovf_ctrl;
+   uint64_t global_status;
+   uint64_t fixed_ctrl;
+   uint64_t ds_area;
+   uint64_t pebs_enable;
+   uint64_t debugctl;
+
+   /* Fixed and architectural counter MSRs */
+#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
+   uint64_t regs[];
+#elif defined(__GNUC__)
+   uint64_t regs[0];
+#endif
+};
+
+/* Sampled domain's registers */
+struct xen_pmu_regs {
+   uint64_t ip;
+   uint64_t sp;
+   uint64_t flags;
+   uint16_t cs;
+   uint16_t ss;
+   uint8_t cpl;
+   uint8_t pad[3];
+};
+
+/* PMU flags */
+#define PMU_CACHED        (1<<0) /* PMU MSRs are cached in the context */
+#define PMU_SAMPLE_USER   (1<<1) /* Sample is from user or kernel mode */
+#define PMU_SAMPLE_REAL   (1<<2) /* Sample is from realmode */
+#define PMU_SAMPLE_PV     (1<<3) /* Sample from a PV guest */
+
+/*
+ * Architecture-specific information describing state of the processor at
+ * the time of PMU interrupt.
+ * Fields of this structure marked as RW for guest should only be written by
+ * the guest when PMU_CACHED bit in pmu_flags is set (which is done by the
+ * hypervisor during PMU interrupt). Hypervisor will read updated data in
+ * XENPMU_flush hypercall and clear PMU_CACHED bit.
+ */
+struct xen_pmu_arch {
+   union {
+   /*
+* Processor's registers at the time of interrupt.
+* WO for hypervisor, RO for guests.
+*/
+   struct xen_pmu_regs regs;
+   /*
+* Padding for adding new registers to xen_pmu_regs in
+* the future
+*/
+#define XENPMU_REGS_PAD_SZ  64
+   uint8_t pad[XENPMU_REGS_PAD_SZ];
+   } r;
+
+   /* WO for hypervisor, RO for guest */
+   uint64_t pmu_flags;
+
+   /*
+* APIC LVTPC register.
+* RW for both hypervisor and guest.
+* Only APIC_LVT_MASKED bit is loaded by the hypervisor into hardware
+* during XENPMU_flush or XENPMU_lvtpc_set.
+*/
+   union {
+   uint32_t lapic_lvtpc;
+   uint64_t pad;
+   } l;
+
+   /*
+* Vendor-specific PMU registers.
+* RW for both hypervisor and guest (see exceptions above).
+* Guest's updates to this field are verified and then loaded by the
+* hypervisor into hardware during XENPMU_flush
+*/
+   union

[Xen-devel] [PATCH v5 4/6] xen/PMU: Describe vendor-specific PMU registers

2015-07-02 Thread Boris Ostrovsky

AMD and Intel PMU register initialization and helpers that determine
whether a register belongs to PMU.

This and some of the subsequent PMU emulation code is somewhat similar to
Xen's PMU implementation.
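
As a rough illustration of the helpers described above, here is a minimal
user-space sketch of the two checks: decoding the counter counts from CPUID
leaf 0xa, and testing whether an MSR falls in the AMD PMU ranges. The MSR
numbers are assumed to match Linux's asm/msr-index.h, and struct pmu_counts,
decode_cpuid_0xa() and amd_pmu_msr() are hypothetical names used only for
this example; the patch itself keeps the results in static variables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MSR numbers, assumed to match Linux's asm/msr-index.h. */
#define MSR_K7_EVNTSEL0    0xc0010000u
#define MSR_K7_PERFCTR0    0xc0010004u
#define MSR_F15H_PERF_CTL  0xc0010200u
#define MSR_F15H_PERF_CTR  0xc0010201u

/* Field layout of CPUID leaf 0xa, using the patch's macro names. */
#define PMU_GENERAL_NR_SHIFT 8
#define PMU_GENERAL_NR_BITS  8
#define PMU_GENERAL_NR_MASK  (((1u << PMU_GENERAL_NR_BITS) - 1) \
                              << PMU_GENERAL_NR_SHIFT)
#define PMU_FIXED_NR_SHIFT   0
#define PMU_FIXED_NR_BITS    5
#define PMU_FIXED_NR_MASK    (((1u << PMU_FIXED_NR_BITS) - 1) \
                              << PMU_FIXED_NR_SHIFT)

/* Hypothetical container for the decoded counter counts. */
struct pmu_counts {
	unsigned int arch;   /* general-purpose counters, from EAX[15:8] */
	unsigned int fixed;  /* fixed-function counters, from EDX[4:0] */
};

/* Extract both counts from raw CPUID(0xa) register values. */
static struct pmu_counts decode_cpuid_0xa(uint32_t eax, uint32_t edx)
{
	struct pmu_counts c;

	c.arch = (eax & PMU_GENERAL_NR_MASK) >> PMU_GENERAL_NR_SHIFT;
	c.fixed = (edx & PMU_FIXED_NR_MASK) >> PMU_FIXED_NR_SHIFT;
	return c;
}

/*
 * Same range test as the patch's is_amd_pmu_msr(), with the counter
 * count passed in instead of read from a static variable.
 */
static bool amd_pmu_msr(uint32_t msr, unsigned int num_counters)
{
	assert(num_counters > 0);

	return (msr >= MSR_F15H_PERF_CTL &&
		msr < MSR_F15H_PERF_CTR + num_counters * 2) ||
	       (msr >= MSR_K7_EVNTSEL0 &&
		msr < MSR_K7_PERFCTR0 + num_counters);
}
```

For example, with eax = 0x07300404 and edx = 0x00000603 the decode yields
4 general-purpose and 3 fixed counters, and amd_pmu_msr(MSR_K7_PERFCTR0, 4)
accepts the first legacy counter MSR.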

Signed-off-by: Boris Ostrovsky 
Reviewed-by: David Vrabel 
---
 arch/x86/xen/pmu.c | 153 -
 1 file changed, 152 insertions(+), 1 deletion(-)

diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index ba7687c..1fc7e10 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -18,6 +18,155 @@
 DEFINE_PER_CPU(struct xen_pmu_data *, xenpmu_shared);
 #define get_xenpmu_data()    per_cpu(xenpmu_shared, smp_processor_id())
 
+
+/* AMD PMU */
+#define F15H_NUM_COUNTERS   6
+#define F10H_NUM_COUNTERS   4
+
+static __read_mostly uint32_t amd_counters_base;
+static __read_mostly uint32_t amd_ctrls_base;
+static __read_mostly int amd_msr_step;
+static __read_mostly int k7_counters_mirrored;
+static __read_mostly int amd_num_counters;
+
+/* Intel PMU */
+#define MSR_TYPE_COUNTER    0
+#define MSR_TYPE_CTRL   1
+#define MSR_TYPE_GLOBAL 2
+#define MSR_TYPE_ARCH_COUNTER   3
+#define MSR_TYPE_ARCH_CTRL  4
+
+/* Number of general pmu registers (CPUID.EAX[0xa].EAX[8..15]) */
+#define PMU_GENERAL_NR_SHIFT    8
+#define PMU_GENERAL_NR_BITS 8
+#define PMU_GENERAL_NR_MASK (((1 << PMU_GENERAL_NR_BITS) - 1) \
+<< PMU_GENERAL_NR_SHIFT)
+
+/* Number of fixed pmu registers (CPUID.EDX[0xa].EDX[0..4]) */
+#define PMU_FIXED_NR_SHIFT  0
+#define PMU_FIXED_NR_BITS   5
+#define PMU_FIXED_NR_MASK   (((1 << PMU_FIXED_NR_BITS) - 1) \
+<< PMU_FIXED_NR_SHIFT)
+
+/* Alias registers (0x4c1) for full-width writes to PMCs */
+#define MSR_PMC_ALIAS_MASK  (~(MSR_IA32_PERFCTR0 ^ MSR_IA32_PMC0))
+
+static __read_mostly int intel_num_arch_counters, intel_num_fixed_counters;
+
+
+static void xen_pmu_arch_init(void)
+{
+   if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) {
+
+   switch (boot_cpu_data.x86) {
+   case 0x15:
+   amd_num_counters = F15H_NUM_COUNTERS;
+   amd_counters_base = MSR_F15H_PERF_CTR;
+   amd_ctrls_base = MSR_F15H_PERF_CTL;
+   amd_msr_step = 2;
+   k7_counters_mirrored = 1;
+   break;
+   case 0x10:
+   case 0x12:
+   case 0x14:
+   case 0x16:
+   default:
+   amd_num_counters = F10H_NUM_COUNTERS;
+   amd_counters_base = MSR_K7_PERFCTR0;
+   amd_ctrls_base = MSR_K7_EVNTSEL0;
+   amd_msr_step = 1;
+   k7_counters_mirrored = 0;
+   break;
+   }
+   } else {
+   uint32_t eax, ebx, ecx, edx;
+
+   cpuid(0xa, &eax, &ebx, &ecx, &edx);
+
+   intel_num_arch_counters = (eax & PMU_GENERAL_NR_MASK) >>
+   PMU_GENERAL_NR_SHIFT;
+   intel_num_fixed_counters = (edx & PMU_FIXED_NR_MASK) >>
+   PMU_FIXED_NR_SHIFT;
+   }
+}
+
+static inline uint32_t get_fam15h_addr(u32 addr)
+{
+   switch (addr) {
+   case MSR_K7_PERFCTR0:
+   case MSR_K7_PERFCTR1:
+   case MSR_K7_PERFCTR2:
+   case MSR_K7_PERFCTR3:
+   return MSR_F15H_PERF_CTR + (addr - MSR_K7_PERFCTR0);
+   case MSR_K7_EVNTSEL0:
+   case MSR_K7_EVNTSEL1:
+   case MSR_K7_EVNTSEL2:
+   case MSR_K7_EVNTSEL3:
+   return MSR_F15H_PERF_CTL + (addr - MSR_K7_EVNTSEL0);
+   default:
+   break;
+   }
+
+   return addr;
+}
+
+static inline bool is_amd_pmu_msr(unsigned int msr)
+{
+   if ((msr >= MSR_F15H_PERF_CTL &&
+msr < MSR_F15H_PERF_CTR + (amd_num_counters * 2)) ||
+   (msr >= MSR_K7_EVNTSEL0 &&
+msr < MSR_K7_PERFCTR0 + amd_num_counters))
+   return true;
+
+   return false;
+}
+
+static int is_intel_pmu_msr(u32 msr_index, int *type, int *index)
+{
+   u32 msr_index_pmc;
+
+   switch (msr_index) {
+   case MSR_CORE_PERF_FIXED_CTR_CTRL:
+   case MSR_IA32_DS_AREA:
+   case MSR_IA32_PEBS_ENABLE:
+   *type = MSR_TYPE_CTRL;
+   return true;
+
+   case MSR_CORE_PERF_GLOBAL_CTRL:
+   case MSR_CORE_PERF_GLOBAL_STATUS:
+   case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+   *type = MSR_TYPE_GLOBAL;
+   return true;
+
+   default:
+
+   if ((msr_index >= MSR_CORE_PERF_FIXED_CTR0) &&
+   (msr_index < MSR_CORE_PERF_FIXED_CTR0 +
+intel_num_fixed_counters)) {
+   *index = msr_index - MSR_CORE_PERF_FIXED_CTR0;
+   *type = MSR_TYPE_COUNTER;
+   return true
