date:20150623

Re: [Xen-devel] [PATCH v9 03/13] x86: maintain COS to CBM mapping for each socket

2015-06-23 Thread Chao Peng

On Mon, Jun 15, 2015 at 05:02:59PM +0100, Jan Beulich wrote:
> >>> On 03.06.15 at 06:53,  wrote:
> > @@ -209,6 +215,23 @@ void psr_ctxt_switch_to(struct domain *d)
> >  }
> >  }
> >  
> > +static int cat_cpu_prepare(unsigned int cpu)
> > +{
> > +struct psr_cat_socket_info *info;
> > +
> > +if ( !cat_socket_info )
> > +return 0;
> > +
> > +info = cat_socket_info + cpu_to_socket(cpu);
> 
> Is cpu_to_socket() guaranteed to always return a value < nr_sockets?

Return –ENOSPC once this happen?

Chao

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC 3/4] xen: implement SCHEDOP_soft_reset

2015-06-23 Thread Jan Beulich

>>> On 22.06.15 at 18:24,  wrote:
> "Jan Beulich"  writes:
> 
> On 22.06.15 at 18:00,  wrote:
>>> "Jan Beulich"  writes:
>>> 
>>> On 03.06.15 at 15:35,  wrote:
> @@ -1129,8 +1129,9 @@ void unmap_vcpu_info(struct vcpu *v)
>  mfn = v->vcpu_info_mfn;
>  unmap_domain_page_global((void *)
>   ((unsigned long)v->vcpu_info & PAGE_MASK));
> -
> -v->vcpu_info = &dummy_vcpu_info;
> +v->vcpu_info = ((v->vcpu_id < XEN_LEGACY_MAX_VCPUS)
> +? (vcpu_info_t *)&shared_info(d, 
> vcpu_info[v->vcpu_id])

 Is this cast really needed?

>>> 
>>> Without it my gcc-4.8.3 complains:
>>> 
>>> domain.c: In function ‘unmap_vcpu_info’:
>>> domain.c:1158:21: error: pointer type mismatch in conditional expression 
>>> [-Werror]
>>>  : &dummy_vcpu_info);
>>>  ^
>>> cc1: all warnings being treated as errors
>>
>> Which is the kind of warning one normally should _not_ work
>> around by adding a cast.
> 
> In this (and in alloc_vcpu() from where this expression was copied)
> particular case this is probably OK: in struct shared_info we have
> 'struct vcpu_info vcpu_info[XEN_LEGACY_MAX_VCPUS]' as its first
> member. But I may be missing something..

Did you read the comment accompanying the definition of
__shared_info()?

The cast is presumably safe here, but it doesn't _look_ safe. And
for future readers (and future changes) it would be better if it did.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1 00/13] Introduce HMV without dm and new boot ABI

2015-06-23 Thread Roger Pau Monné

Hello,

El 22/06/15 a les 19.55, Stefano Stabellini ha escrit:
> Hi Roger,
> 
> given that this patch series is actually using the Xen "hvm" builder, I
> take that all the PVH code paths in Xen or the guest kernel are not
> actually used, correct? This is more like PV on HVM without QEMU, right?

>From a Xen POV this is not a PVH guest (it's an HVM guest), although I'm
using some code paths which are shared with PVH. In the guest (in this
case FreeBSD) I'm using the same paths as PVH, since the exposed
interface is very similar. We probably aim at enabling the same set of
hypercalls that are enabled on PVH.

> Do you think think this can work for Dom0 too?

I don't see why it couldn't be made to work.

> Would that make all the PVH stuff in Xen and Linux effectively useless?

No, I expect that some code paths inside of Xen will be shared between
PVH and this HVM-without-dm guest type.

Then from a guest POV the interface is quite similar, so most of the
code present in Linux and FreeBSD in order to run as a PVH guest can be
reused for this new guest mode. If you take a look at the FreeBSD patch
the change is just mostly during early boot in order to set the page
tables and jump into long mode, then the rest is quite similar to PVH.

Forgot to mention, I've also tested it with hap=0 (so using shadow) and
it seems to work fine.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] x86/MSI-X: provide hypercall interface for mask-all control

2015-06-23 Thread Jan Beulich

>>> On 22.06.15 at 19:02,  wrote:
> OK, I didn't get the part of the question. AFAICT yes, FreeBSD will 
> access the low 256 bytes of the config space. For example the stub to 
> write to a cfg register is as follows:
> 
> void
> pci_cfgregwrite(int bus, int slot, int func, int reg, u_int32_t data, int 
> bytes)
> {
> 
>   if (cfgmech == CFGMECH_PCIE &&
>   (bus >= pcie_minbus && bus <= pcie_maxbus) &&
>   (bus != 0 || !(1 << slot & pcie_badslots)))
>   pciereg_cfgwrite(bus, slot, func, reg, data, bytes);
>   else
>   pcireg_cfgwrite(bus, slot, func, reg, data, bytes);
> }
> 
> I take that you would prefer it to be:
> 
> void
> pci_cfgregwrite(int bus, int slot, int func, int reg, u_int32_t data, int 
> bytes)
> {
> 
>   if (cfgmech == CFGMECH_PCIE &&
>   (bus >= pcie_minbus && bus <= pcie_maxbus) &&
>   (bus != 0 || !(1 << slot & pcie_badslots)) &&
>   (reg > PCI_REGMAX))
>   pciereg_cfgwrite(bus, slot, func, reg, data, bytes);
>   else
>   pcireg_cfgwrite(bus, slot, func, reg, data, bytes);
> }
> 
> Where 'PCI_REGMAX' is 255.

Indeed that would have been nice, but we have to live with what
OSes do. But then again - did I understand correctly that FreeBSD
doesn't support PV Dom0 operation anymore, but wants to only
run in PVH mode? Was that a recent change, i.e. are PV Dom0-s
assumed to still be in use?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v9 06/13] x86: dynamically get/set CBM for a domain

2015-06-23 Thread Chao Peng

On Tue, Jun 16, 2015 at 08:08:34AM +0100, Jan Beulich wrote:
> >>> On 03.06.15 at 06:53,  wrote:
> > +int psr_set_l3_cbm(struct domain *d, unsigned int socket, uint64_t cbm)
> > +{
> > +unsigned int old_cos, cos;
> > +struct psr_cat_cbm *map, *found = NULL;
> > +struct psr_cat_socket_info *info = NULL;
> > +int ret = get_cat_socket_info(socket, &info);
> > +
> > +if ( ret )
> > +return ret;
> > +
> > +if ( !psr_check_cbm(info->cbm_len, cbm) )
> > +return -EINVAL;
> > +
> > +old_cos = d->arch.psr_cos_ids[socket];
> > +map = info->cos_to_cbm;
> > +
> > +spin_lock(&info->cbm_lock);
> > +
> > +for ( cos = 0; cos <= info->cos_max; cos++ )
> > +{
> > +/* If still not found, then keep unused one. */
> > +if ( !found && cos != 0 && map[cos].ref == 0 )
> > +found = map + cos;
> > +else if ( map[cos].cbm == cbm )
> > +{
> > +if ( unlikely(cos == old_cos) )
> > +{
> > +spin_unlock(&info->cbm_lock);
> > +return 0;
> 
> Is this in particular, but also the surrounding "else if", correct when
> map[cos].ref == 0? 

I can't see any problem now.

> I can't seem to see map[cos].cbm getting
> invalidated when an entry's refcount drops to zero...

map[cos].cbm is not invalidated when refcount == 0 so that when a same
cbm is requested again I don't need to write the corresponding register.

Chao

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v4 RFC 1/6] x86/PCI: add config space write abstract intercept logic

2015-06-23 Thread Jan Beulich

>>> On 22.06.15 at 21:31,  wrote:
>> @@ -1804,8 +1804,12 @@ static bool_t pci_cfg_ok(struct domain *
>>  start |= CF8_ADDR_HI(currd->arch.pci_cf8);
>>  }
>>  
>> -return !xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
>> -  start, start + size - 1, write);
>> +if ( xsm_pci_config_permission(XSM_HOOK, currd, machine_bdf,
>> +   start, start + size - 1, !!write) != 0 )
>> + return 0;
>> +
>> +return !write ||
>> +   pci_conf_write_intercept(0, machine_bdf, start, size, write) >= 
>> 0;
> 
> Won't the 'write' parameter cause an compiler error as it expects an 
> pointer?

No, certainly not - !write means the same as write != NULL, but is
(imo) easier to read.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] x86/MSI-X: provide hypercall interface for mask-all control

2015-06-23 Thread Roger Pau Monné

El 23/06/15 a les 9.20, Jan Beulich ha escrit:
 On 22.06.15 at 19:02,  wrote:
>> OK, I didn't get the part of the question. AFAICT yes, FreeBSD will 
>> access the low 256 bytes of the config space. For example the stub to 
>> write to a cfg register is as follows:
>>
>> void
>> pci_cfgregwrite(int bus, int slot, int func, int reg, u_int32_t data, int 
>> bytes)
>> {
>>
>>  if (cfgmech == CFGMECH_PCIE &&
>>  (bus >= pcie_minbus && bus <= pcie_maxbus) &&
>>  (bus != 0 || !(1 << slot & pcie_badslots)))
>>  pciereg_cfgwrite(bus, slot, func, reg, data, bytes);
>>  else
>>  pcireg_cfgwrite(bus, slot, func, reg, data, bytes);
>> }
>>
>> I take that you would prefer it to be:
>>
>> void
>> pci_cfgregwrite(int bus, int slot, int func, int reg, u_int32_t data, int 
>> bytes)
>> {
>>
>>  if (cfgmech == CFGMECH_PCIE &&
>>  (bus >= pcie_minbus && bus <= pcie_maxbus) &&
>>  (bus != 0 || !(1 << slot & pcie_badslots)) &&
>>  (reg > PCI_REGMAX))
>>  pciereg_cfgwrite(bus, slot, func, reg, data, bytes);
>>  else
>>  pcireg_cfgwrite(bus, slot, func, reg, data, bytes);
>> }
>>
>> Where 'PCI_REGMAX' is 255.
> 
> Indeed that would have been nice, but we have to live with what
> OSes do. But then again - did I understand correctly that FreeBSD
> doesn't support PV Dom0 operation anymore, but wants to only
> run in PVH mode? Was that a recent change, i.e. are PV Dom0-s
> assumed to still be in use?

FreeBSD never supported PV Dom0 operation, it only had a very minimal
and crappy i386 PV DomU support which has now been completely removed.
Maybe you are confusing it with NetBSD, which does have PV Dom0 support
since a long time ago?

Yes, FreeBSD only aims to run as PVHVM or PVH (or whatever we call HVM
without a device model guests). I frankly don't mind trying to change
this in FreeBSD if there's a good rational behind it. I'm currently
testing FreeBSD with the change above to see how it behaves.

Why doesn't Linux use the low 256 bytes of the config space and instead
uses the IO ports with PCIe devices?

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 04/11] x86/intel_pstate: relocate the driver register function

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 05:40,  wrote:
> On 18/06/2015 22:30, Jan Beulich wrote:
>> >>> On 11.06.15 at 10:27,  wrote:
>> > Register the CPU hotplug notifier when the driver is registered, and
>> > move the driver register function to the cpufreq.c.
>> 
>> The first half of the sentence fails to say why. And I suppose if you 
> explained
>> that (to yourself) you'd figure that the change is wrong (or at least 
> altering
>> behavior in a way that needs more explanation to be verifiably correct): The
>> calls to cpufreq_register_driver() sit in __initcall handlers, yet what you
>> replace is a presmp_initcall. I.e. all APs being brought up at boot time 
>> won't
>> get the callback invoked for them anymore.
>> 
>> I suppose you tested your series on a system where the new driver will kick
>> in. You of course also need to test the case where this isn't the case -
>> everything needs to continue to function there.
> 
> The cpu_callback() is removed in the following patch (05/11) because it's 
> redundant.
> Functions in cpufreq_register_driver() are also called only once, because 
> other calls to cpufreq_register_driver() will just return due to the 
> unsuccessful condition check at the beginning of the function.

If this is the case today, then the removal should be a separate
patch early in the series (properly explaining the situation).

>> > --- a/xen/drivers/cpufreq/cpufreq.c
>> > +++ b/xen/drivers/cpufreq/cpufreq.c
>> > @@ -630,12 +630,21 @@ static struct notifier_block cpu_nfb = {
>> >  .notifier_call = cpu_callback
>> >  };
>> >
>> > -static int __init cpufreq_presmp_init(void)
>> > +int cpufreq_register_driver(struct cpufreq_driver *driver_data)
>> >  {
>> >  void *cpu = (void *)(long)smp_processor_id();
>> >  cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
>> > +if (!driver_data || !driver_data->init
>> > +|| !driver_data->verify || !driver_data->exit
>> > +|| (!driver_data->target == !driver_data->setpolicy))
>> 
>> Do you really want/need to enforce this policy (target set if and only if
>> setpolicy is not set) here? And if that's to uniformly hold, the two could be
>> put into a union...
> 
> driver_data->target() is used by a driver which relies on the old Governor 
> framework.
> driver_data->setpolicy() is used by a driver which implements its internal 
> governor.
> So, the driver either uses the old Governor framework or has its own private 
> internal governor.
> We shouldn't change to use union, because in many places, we distinguish the 
> two by checking if it's using "->target" or "->setpolicy".

The distinction between the two driver modes shouldn't be based on
arbitrary accessors they may or may not implement. There should be
a dedicated flag or alike.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 04/11] x86/intel_pstate: relocate the driver register function

2015-06-23 Thread Wang, Wei W

On 23/06/2015 15:31, Jan Beulich wrote:
> >>> On 23.06.15 at 05:40,  wrote:
> > On 18/06/2015 22:30, Jan Beulich wrote:
> >> >>> On 11.06.15 at 10:27,  wrote:
> >> > -static int __init cpufreq_presmp_init(void)
> >> > +int cpufreq_register_driver(struct cpufreq_driver *driver_data)
> >> >  {
> >> >  void *cpu = (void *)(long)smp_processor_id();
> >> >  cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
> >> > +if (!driver_data || !driver_data->init
> >> > +|| !driver_data->verify || !driver_data->exit
> >> > +|| (!driver_data->target == !driver_data->setpolicy))
> >>
> >> Do you really want/need to enforce this policy (target set if and
> >> only if setpolicy is not set) here? And if that's to uniformly hold,
> >> the two could be put into a union...
> >
> > driver_data->target() is used by a driver which relies on the old
> > Governor framework.
> > driver_data->setpolicy() is used by a driver which implements its
> > internal governor.
> > So, the driver either uses the old Governor framework or has its own
> > private internal governor.
> > We shouldn't change to use union, because in many places, we
> > distinguish the two by checking if it's using "->target" or "->setpolicy".
> 
> The distinction between the two driver modes shouldn't be based on
> arbitrary accessors they may or may not implement. There should be a
> dedicated flag or alike.

This is not arbitrary - "->target()" is dedicated to the Governor framework, 
and "->setpolicy" is dedicated to the internal governor implementation. The 
Linux kernel also takes advantage of this method. I think we don't need to add 
another new functionally equivalent flag to do so.
Shall we keep using it?

Best,
Wei

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 07/11] x86/intel_pstate: changes in cpufreq_del_cpu for CPU offline

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 08:16,  wrote:
> On 19/06/2015 17:44, Jan Beulich wrote:
>> >>> On 11.06.15 at 10:28,  wrote:
>> > cpufreq_cpu_policy is used in intel_pstate_set_pstate(), so we change
>> > to NULL it after the call of cpufreq_driver->exit. Otherwise, a
>> > calltrace will show up on your screen due to the reference of a NULL
>> > pointer when you power down the system.
>> 
>> Apart from what was already said on this patch, I think it is placed too 
>> late in
>> this series (should precede patch 6) or, even better, not needed at all: 
>> ->exit()
>> gets passed the policy being cleaned up, so I don't follow why the driver
>> needs to consult the per-CPU field to obtain it.
> 
>> Plus, if the patch is to be kept, the description suggesting this to be a
>> problem at system shutdown only is wrong - it can equally well happen while
>> offlining a CPU. Just saying that the pointer is still needed would be 
>> sufficient.
> 
> It's needed. When unplugging a CPU, the intel_pstate driver sets it to run 
> with the lowest P-state.

I understand the latter, but this doesn't explain why you can't do
what I suggested above.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 04/11] x86/intel_pstate: relocate the driver register function

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 10:01,  wrote:
> On 23/06/2015 15:31, Jan Beulich wrote:
>> >>> On 23.06.15 at 05:40,  wrote:
>> > On 18/06/2015 22:30, Jan Beulich wrote:
>> >> >>> On 11.06.15 at 10:27,  wrote:
>> >> > -static int __init cpufreq_presmp_init(void)
>> >> > +int cpufreq_register_driver(struct cpufreq_driver *driver_data)
>> >> >  {
>> >> >  void *cpu = (void *)(long)smp_processor_id();
>> >> >  cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
>> >> > +if (!driver_data || !driver_data->init
>> >> > +|| !driver_data->verify || !driver_data->exit
>> >> > +|| (!driver_data->target == !driver_data->setpolicy))
>> >>
>> >> Do you really want/need to enforce this policy (target set if and
>> >> only if setpolicy is not set) here? And if that's to uniformly hold,
>> >> the two could be put into a union...
>> >
>> > driver_data->target() is used by a driver which relies on the old
>> > Governor framework.
>> > driver_data->setpolicy() is used by a driver which implements its
>> > internal governor.
>> > So, the driver either uses the old Governor framework or has its own
>> > private internal governor.
>> > We shouldn't change to use union, because in many places, we
>> > distinguish the two by checking if it's using "->target" or "->setpolicy".
>> 
>> The distinction between the two driver modes shouldn't be based on
>> arbitrary accessors they may or may not implement. There should be a
>> dedicated flag or alike.
> 
> This is not arbitrary - "->target()" is dedicated to the Governor framework, 
> and "->setpolicy" is dedicated to the internal governor implementation. The 
> Linux kernel also takes advantage of this method. I think we don't need to 
> add another new functionally equivalent flag to do so.
> Shall we keep using it?

If this distinction is being made clear by comments accompanying
the definitions of the target and setpolicy fields, I'm fine with
keeping it that way. Without making this explicit it would continue
to look arbitrary (and prone to break, should another driver
elect to also implement the setpolicy hook [for whatever purpose]).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 10/10] x86/MSI-X: provide hypercall interface for mask-all control

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 09:29,  wrote:
> FreeBSD never supported PV Dom0 operation, it only had a very minimal
> and crappy i386 PV DomU support which has now been completely removed.
> Maybe you are confusing it with NetBSD, which does have PV Dom0 support
> since a long time ago?

Very likely.

> Yes, FreeBSD only aims to run as PVHVM or PVH (or whatever we call HVM
> without a device model guests). I frankly don't mind trying to change
> this in FreeBSD if there's a good rational behind it. I'm currently
> testing FreeBSD with the change above to see how it behaves.
> 
> Why doesn't Linux use the low 256 bytes of the config space and instead
> uses the IO ports with PCIe devices?

See
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit?id=a0ca9909609470ad779b9b9cc68ce96e975afff7

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1 00/13] Introduce HMV without dm and new boot ABI

2015-06-23 Thread Roger Pau Monné

El 22/06/15 a les 20.05, Konrad Rzeszutek Wilk ha escrit:
> On Mon, Jun 22, 2015 at 06:55:12PM +0100, Stefano Stabellini wrote:
>> Hi Roger,
>>
>> given that this patch series is actually using the Xen "hvm" builder, I
>> take that all the PVH code paths in Xen or the guest kernel are not
>> actually used, correct? This is more like PV on HVM without QEMU, right?
> 
> Are you saying it should be called now 'HVM-diet' ? Or 'HVMlite' instead
> of PVH since it is looking at this from the HVM perspective instead of PVH?
> 
> 
>>
>> Do you think think this can work for Dom0 too?
>>
>> Would that make all the PVH stuff in Xen and Linux effectively useless?
> 
> No. The AP bootup is still the same. So would all the hypercalls I think.

This is something that we have not discussed, but for HVM domains the
vCPU initialize hypercall only allows starting the vCPU in 32bit mode
with paging disabled (just like how we are starting the BSP). I'm fine
with that and I don't mind implementing a small trampoline for APs also,
but would like to hear opinions about it.

Roger.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [rumpuserxen test] 58843: regressions - FAIL

2015-06-23 Thread osstest service user

flight 58843 rumpuserxen real [real]
http://logs.test-lab.xenproject.org/osstest/logs/58843/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386-rumpuserxen5 rumpuserxen-build fail REGR. vs. 33866
 build-amd64-rumpuserxen   5 rumpuserxen-build fail REGR. vs. 33866

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a

version targeted for testing:
 rumpuserxen  3b91e44996ea6ae1276bce1cc44f38701c53ee6f
baseline version:
 rumpuserxen  30d72f3fc5e35cd53afd82c8179cc0e0b11146ad


People who touched revisions under test:
  Antti Kantee 
  Ian Jackson 
  Martin Lucina 
  Wei Liu 


jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  fail
 build-i386-rumpuserxen   fail
 test-amd64-amd64-rumpuserxen-amd64   blocked 
 test-amd64-i386-rumpuserxen-i386 blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 535 lines long.)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 07/11] x86/intel_pstate: changes in cpufreq_del_cpu for CPU offline

2015-06-23 Thread Wang, Wei W

On 23/06/2015 16:08, Jan Beulich wrote:
> >>> On 23.06.15 at 08:16,  wrote:
> > On 19/06/2015 17:44, Jan Beulich wrote:
> >> >>> On 11.06.15 at 10:28,  wrote:
> >> > cpufreq_cpu_policy is used in intel_pstate_set_pstate(), so we
> >> > change to NULL it after the call of cpufreq_driver->exit.
> >> > Otherwise, a calltrace will show up on your screen due to the
> >> > reference of a NULL pointer when you power down the system.
> >>
> >> Apart from what was already said on this patch, I think it is placed
> >> too late in this series (should precede patch 6) or, even better, not
> >> needed at all: ->exit() gets passed the policy being cleaned up, so I
> >> don't follow why the driver needs to consult the per-CPU field to obtain 
> >> it.
> >
> >> Plus, if the patch is to be kept, the description suggesting this to
> >> be a problem at system shutdown only is wrong - it can equally well
> >> happen while offlining a CPU. Just saying that the pointer is still needed
> would be sufficient.
> >
> > It's needed. When unplugging a CPU, the intel_pstate driver sets it to
> > run with the lowest P-state.
> 
> I understand the latter, but this doesn't explain why you can't do what I
> suggested above.

Because the "->exit()" needs to call "intel_pstate_set_pstate()" which does not 
receive "policy" as a parameter. "intel_pstate_set_pstate()" is also called by 
another function without passing "policy". So I think it is simpler to just 
change the order of NULLing the pointer, instead of changing more code.

Best,
Wei  

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v25 04/15] x86/VPMU: Interface for setting PMU mode and flags

2015-06-23 Thread Jan Beulich

>>> On 22.06.15 at 18:10,  wrote:
> On 06/22/2015 11:10 AM, Jan Beulich wrote:
>>
>>> +switch ( op )
>>> +{
>>> +case XENPMU_mode_set:
>>> +{
>>> +if ( (pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV)) ||
>>> + (hweight64(pmu_params.val) > 1) )
>>> +return -EINVAL;
>>> +
>>> +/* 32-bit dom0 can only sample itself. */
>>> +if ( is_pv_32bit_vcpu(current) && (pmu_params.val & 
>>> XENPMU_MODE_HV) )
>>> +return -EINVAL;
>>> +
>>> +spin_lock(&vpmu_lock);
>>> +
>>> +/*
>>> + * We can always safely switch between XENPMU_MODE_SELF and
>>> + * XENPMU_MODE_HV while other VPMUs are active.
>>> + */
>>> +if ( (vpmu_count == 0) || (vpmu_mode == pmu_params.val) ||
>>> + ((vpmu_mode ^ pmu_params.val) ==
>>> +  (XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
>>> +vpmu_mode = pmu_params.val;
>>> +else
>>> +{
>> I think this would better be
>>
>>  if ( (vpmu_count == 0) ||
>>   ((vpmu_mode ^ pmu_params.val) ==
>>(XENPMU_MODE_SELF | XENPMU_MODE_HV)) )
>>  vpmu_mode = pmu_params.val;
>>  else if ( vpmu_mode != pmu_params.val )
>>  {
>>
>> But there's no need to re-submit just because of this (it could be
>> changed upon commit if you agree).
> 
> This will generate an error (with a message to the log) even if we keep 
> the mode unchanged, something that I wanted to avoid. (Same thing for 
> XENPMU_feature_set, which is what Kevin mentioned last time).

I don't see this: The only difference to the code you have is -
afaics - that my variant avoids the pointless write to vpmu_mode.

>> Or wait - I just checked again, and while I thought this was long
>> addressed I still don't seen struct xen_pmu_params "pad" field
>> being checked to be zero as input, and being stored as zero when
>> only an output. Did this get lost? Did I forget about a reason why
>> this isn't being done? Unless the latter is the case, the ack above
>> is dependent upon this getting fixed.
> 
> Yes, we did talk about this and as result I added major version check 
> (which I should have had there anyway): if we decide to start using this 
> field we'd need to increment the major version.

Okay then.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 04/11] x86/intel_pstate: relocate the driver register function

2015-06-23 Thread Wang, Wei W

On 23/06/2015 16:11, Jan Beulich wrote:
> >>> On 23.06.15 at 10:01,  wrote:
> > On 23/06/2015 15:31, Jan Beulich wrote:
> >> >>> On 23.06.15 at 05:40,  wrote:
> >> > On 18/06/2015 22:30, Jan Beulich wrote:
> >> >> >>> On 11.06.15 at 10:27,  wrote:
> >> >> > -static int __init cpufreq_presmp_init(void)
> >> >> > +int cpufreq_register_driver(struct cpufreq_driver *driver_data)
> >> >> >  {
> >> >> >  void *cpu = (void *)(long)smp_processor_id();
> >> >> >  cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
> >> >> > +if (!driver_data || !driver_data->init
> >> >> > +|| !driver_data->verify || !driver_data->exit
> >> >> > +|| (!driver_data->target == !driver_data->setpolicy))
> >> >>
> >> >> Do you really want/need to enforce this policy (target set if and
> >> >> only if setpolicy is not set) here? And if that's to uniformly
> >> >> hold, the two could be put into a union...
> >> >
> >> > driver_data->target() is used by a driver which relies on the old
> >> > Governor framework.
> >> > driver_data->setpolicy() is used by a driver which implements its
> >> > internal governor.
> >> > So, the driver either uses the old Governor framework or has its
> >> > own private internal governor.
> >> > We shouldn't change to use union, because in many places, we
> >> > distinguish the two by checking if it's using "->target" or 
> >> > "->setpolicy".
> >>
> >> The distinction between the two driver modes shouldn't be based on
> >> arbitrary accessors they may or may not implement. There should be a
> >> dedicated flag or alike.
> >
> > This is not arbitrary - "->target()" is dedicated to the Governor
> > framework, and "->setpolicy" is dedicated to the internal governor
> > implementation. The Linux kernel also takes advantage of this method.
> > I think we don't need to add another new functionally equivalent flag to do
> so.
> > Shall we keep using it?
> 
> If this distinction is being made clear by comments accompanying the
> definitions of the target and setpolicy fields, I'm fine with keeping it that 
> way.
> Without making this explicit it would continue to look arbitrary (and prone to
> break, should another driver elect to also implement the setpolicy hook [for
> whatever purpose]).

OK, I will add some explanation to the related commit message.

Best,
Wei

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v9 03/13] x86: maintain COS to CBM mapping for each socket

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 09:08,  wrote:
> On Mon, Jun 15, 2015 at 05:02:59PM +0100, Jan Beulich wrote:
>> >>> On 03.06.15 at 06:53,  wrote:
>> > @@ -209,6 +215,23 @@ void psr_ctxt_switch_to(struct domain *d)
>> >  }
>> >  }
>> >  
>> > +static int cat_cpu_prepare(unsigned int cpu)
>> > +{
>> > +struct psr_cat_socket_info *info;
>> > +
>> > +if ( !cat_socket_info )
>> > +return 0;
>> > +
>> > +info = cat_socket_info + cpu_to_socket(cpu);
>> 
>> Is cpu_to_socket() guaranteed to always return a value < nr_sockets?
> 
> Return –ENOSPC once this happen?

Maybe. I can't tell what's a suitable action to take without digging
out the full patch (and perhaps series) again.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v9 06/13] x86: dynamically get/set CBM for a domain

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 09:19,  wrote:
> On Tue, Jun 16, 2015 at 08:08:34AM +0100, Jan Beulich wrote:
>> >>> On 03.06.15 at 06:53,  wrote:
>> > +int psr_set_l3_cbm(struct domain *d, unsigned int socket, uint64_t cbm)
>> > +{
>> > +unsigned int old_cos, cos;
>> > +struct psr_cat_cbm *map, *found = NULL;
>> > +struct psr_cat_socket_info *info = NULL;
>> > +int ret = get_cat_socket_info(socket, &info);
>> > +
>> > +if ( ret )
>> > +return ret;
>> > +
>> > +if ( !psr_check_cbm(info->cbm_len, cbm) )
>> > +return -EINVAL;
>> > +
>> > +old_cos = d->arch.psr_cos_ids[socket];
>> > +map = info->cos_to_cbm;
>> > +
>> > +spin_lock(&info->cbm_lock);
>> > +
>> > +for ( cos = 0; cos <= info->cos_max; cos++ )
>> > +{
>> > +/* If still not found, then keep unused one. */
>> > +if ( !found && cos != 0 && map[cos].ref == 0 )
>> > +found = map + cos;
>> > +else if ( map[cos].cbm == cbm )
>> > +{
>> > +if ( unlikely(cos == old_cos) )
>> > +{
>> > +spin_unlock(&info->cbm_lock);
>> > +return 0;
>> 
>> Is this in particular, but also the surrounding "else if", correct when
>> map[cos].ref == 0? 
> 
> I can't see any problem now.

Further down in the function you increment found->ref, and it looks
suspicious that you return success here having found a slot possibly
having refount zero (and thus available for re-use for another CBM).
I.e. if this indeed is intended and correct, I think this needs to be
explained in a brief comment.

>> I can't seem to see map[cos].cbm getting
>> invalidated when an entry's refcount drops to zero...
> 
> map[cos].cbm is not invalidated when refcount == 0 so that when a same
> cbm is requested again I don't need to write the corresponding register.

Ah, makes sense.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 04/11] x86/intel_pstate: relocate the driver register function

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 10:24,  wrote:
> On 23/06/2015 16:11, Jan Beulich wrote:
>> >>> On 23.06.15 at 10:01,  wrote:
>> > On 23/06/2015 15:31, Jan Beulich wrote:
>> >> >>> On 23.06.15 at 05:40,  wrote:
>> >> > On 18/06/2015 22:30, Jan Beulich wrote:
>> >> >> >>> On 11.06.15 at 10:27,  wrote:
>> >> >> > -static int __init cpufreq_presmp_init(void)
>> >> >> > +int cpufreq_register_driver(struct cpufreq_driver *driver_data)
>> >> >> >  {
>> >> >> >  void *cpu = (void *)(long)smp_processor_id();
>> >> >> >  cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
>> >> >> > +if (!driver_data || !driver_data->init
>> >> >> > +|| !driver_data->verify || !driver_data->exit
>> >> >> > +|| (!driver_data->target == !driver_data->setpolicy))
>> >> >>
>> >> >> Do you really want/need to enforce this policy (target set if and
>> >> >> only if setpolicy is not set) here? And if that's to uniformly
>> >> >> hold, the two could be put into a union...
>> >> >
>> >> > driver_data->target() is used by a driver which relies on the old
>> >> > Governor framework.
>> >> > driver_data->setpolicy() is used by a driver which implements its
>> >> > internal governor.
>> >> > So, the driver either uses the old Governor framework or has its
>> >> > own private internal governor.
>> >> > We shouldn't change to use union, because in many places, we
>> >> > distinguish the two by checking if it's using "->target" or 
>> >> > "->setpolicy".
>> >>
>> >> The distinction between the two driver modes shouldn't be based on
>> >> arbitrary accessors they may or may not implement. There should be a
>> >> dedicated flag or alike.
>> >
>> > This is not arbitrary - "->target()" is dedicated to the Governor
>> > framework, and "->setpolicy" is dedicated to the internal governor
>> > implementation. The Linux kernel also takes advantage of this method.
>> > I think we don't need to add another new functionally equivalent flag to do
>> so.
>> > Shall we keep using it?
>> 
>> If this distinction is being made clear by comments accompanying the
>> definitions of the target and setpolicy fields, I'm fine with keeping it 
>> that way.
>> Without making this explicit it would continue to look arbitrary (and prone 
>> to
>> break, should another driver elect to also implement the setpolicy hook [for
>> whatever purpose]).
> 
> OK, I will add some explanation to the related commit message.

The commit message will not (normally) be read by people wanting
to make further adjustments to the code. Such an explanation
belongs - as I said - alongside the field declarations.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] libxl: Add AHCI support for upstream qemu

2015-06-23 Thread Fabio Fantoni


Il 22/06/2015 17:32, Stefano Stabellini ha scritto:

On Mon, 22 Jun 2015, Fabio Fantoni wrote:

Usage:
ahci=0|1 (default=0)

If enabled adds ich9 disk controller in ahci mode and uses it with
upstream qemu to emulate disks instead of ide.
It doesn't support cdroms which still using ide (cdroms will use
"-device ide-cd" as new qemu parameter)
Ahci requires new qemu parameter but for now other emulated disks cases
remains with old ones (I did it in other patch, not needed by this one)
I did it as libxl parameter disabled by default to avoid possible
problems:
- with save/restore/migration (restoring with ahci a domU that was with
ide instead)
- windows < 8 without pv drivers (a registry key change is needed for
AHCI<->IDE change FWIK to avoid possible blue screen)
- windows XP or older that many not support ahci by default.
Setting AHCI with libxl parameter and default to disabled seems the best
solution.
AHCI increase hvm domUs boot performance. On linux hvm domU I saw up to
only 20% of the previous total boot time, whereas boot time decrease a
lot on W7 domUs for most of boots I have done. Small difference in boot
time compared to ide mode on W8 and newer (probably other xen
improvements or fixes are needed not ahci related)

Signed-off-by: Fabio Fantoni 

I am OK with this patch going in Xen 4.6.



  docs/man/xl.cfg.pod.5   |  9 +
  tools/libxl/libxl.h | 10 ++
  tools/libxl/libxl_create.c  |  1 +
  tools/libxl/libxl_dm.c  | 10 +-
  tools/libxl/libxl_types.idl |  1 +
  tools/libxl/xl_cmdimpl.c|  1 +
  6 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..7e16123 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -904,6 +904,15 @@ default is B.
  
  =back
  
+=item B

+
+If enabled adds ich9 disk controller in ahci mode and uses it with
+upstream qemu to emulate disks instead of ide. It decrease boot time but
+may be not supported by default in windows xp and older windows.
+The default is disabled (0).
+
+=back
+
  =head3 Paging
  
  The following options control the mechanisms used to virtualise guest

diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a7913b..6a3677d 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -596,6 +596,16 @@ typedef struct libxl__ctx libxl_ctx;
  #define LIBXL_HAVE_SPICE_STREAMINGVIDEO 1
  
  /*

+ * LIBXL_HAVE_AHCI
+ *
+ * If defined, then the u.hvm structure will contain a boolean type:
+ * ahci. This value defines if ahci support is present.
+ *
+ * If this is not defined, the ahci support is ignored.
+ */
+#define LIBXL_HAVE_AHCI 1
+
+/*
   * LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1
   *
   * If this is defined, libxl_domain_create_restore()'s API has changed to
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 86384d2..8ca2481 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -331,6 +331,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
  libxl_defbool_setdefault(&b_info->u.hvm.nested_hvm, false);
  libxl_defbool_setdefault(&b_info->u.hvm.usb,false);
  libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
+libxl_defbool_setdefault(&b_info->u.hvm.ahci,   false);
  
  libxl_defbool_setdefault(&b_info->u.hvm.spice.enable, false);

  if (!libxl_defbool_val(b_info->u.hvm.spice.enable) &&
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 33f9ce6..93f191a 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -818,6 +818,8 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
  flexarray_append(dm_args, libxl__sprintf(gc, "%"PRId64, ram_size));
  
  if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {

+if (libxl_defbool_val(b_info->u.hvm.ahci))
+flexarray_append_pair(dm_args, "-device", "ahci,id=ahci0");
  for (i = 0; i < num_disks; i++) {
  int disk, part;
  int dev_number =
@@ -872,7 +874,13 @@ static int libxl__build_device_model_args_new(libxl__gc 
*gc,
  drive = libxl__sprintf
  (gc, 
"file=%s,if=scsi,bus=0,unit=%d,format=%s,cache=writeback",
   pdev_path, disk, format);
-else if (disk < 4)
+else if (disk < 6 && libxl_defbool_val(b_info->u.hvm.ahci)){

One space missing.

This means that we can now have disks up to hdf. I think you should
update the in-code comment above. I think you should also update
docs/txt/misc/vbd-interface.txt, explaining that hd* mean AHCI if ahci=1
is specified.


Thanks for reply, I'll do in v2.





+flexarray_vappend(dm_args, "-drive",
+
GCSPRINTF("file=%s,if=none,id=ahcidisk-%d,format=%s,cache=writeback",
+pdev_path, disk, format), "-device", 
GCSPRINTF("ide-hd,bus=ahci0.%d,unit=0,dr

Re: [Xen-devel] PCI Passthrough ARM Design : Draft1

2015-06-23 Thread Ian Campbell

On Mon, 2015-06-22 at 14:33 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Jun 17, 2015 at 03:35:02PM +0100, Stefano Stabellini wrote:
> > On Wed, 17 Jun 2015, Ian Campbell wrote:
> > > On Wed, 2015-06-17 at 07:14 -0700, Manish Jaggi wrote:
> > > > 
> > > > On Wednesday 17 June 2015 06:43 AM, Ian Campbell wrote:
> > > > > On Wed, 2015-06-17 at 13:58 +0100, Stefano Stabellini wrote:
> > > > >> Yes, pciback is already capable of doing that, see
> > > > >> drivers/xen/xen-pciback/conf_space.c
> > > > >>
> > > > >>> I am not sure if the pci-back driver can query the guest memory 
> > > > >>> map. Is there an existing hypercall ?
> > > > >> No, that is missing.  I think it would be OK for the virtual BAR to 
> > > > >> be
> > > > >> initialized to the same value as the physical BAR.  But I would let 
> > > > >> the
> > > > >> guest change the virtual BAR address and map the MMIO region 
> > > > >> wherever it
> > > > >> wants in the guest physical address space with
> > > > >> XENMEM_add_to_physmap_range.
> > > > > I disagree, given that we've apparently survived for years with x86 PV
> > > > > guests not being able to right to the BARs I think it would be far
> > > > > simpler to extend this to ARM and x86 PVH too than to allow guests to
> > > > > start writing BARs which has various complex questions around it.
> > > > > All that's needed is for the toolstack to set everything up and write
> > > > > some new xenstore nodes in the per-device directory with the BAR
> > > > > address/size.
> > > > >
> > > > > Also most guests apparently don't reassign the PCI bus by default, so
> > > > > using a 1:1 by default and allowing it to be changed would require
> > > > > modifying the guests to reasssign. Easy on Linux, but I don't know 
> > > > > about
> > > > > others and I imagine some OSes (especially simpler/embedded ones) are
> > > > > assuming the firmware sets up something sane by default.
> > > > Does the Flow below captures all points
> > > > a) When assigning a device to domU, toolstack creates a node in per 
> > > > device directory with virtual BAR address/size
> > > > 
> > > > Option1:
> > > > b) toolstack using some hypercall ask xen to create p2m mapping { 
> > > > virtual BAR : physical BAR } for domU
> > > > c) domU will not anytime update the BARs, if it does then it is a 
> > > > fault, 
> > > > till we decide how to handle it
> > > 
> > > As Julien has noted pciback already deals with this correctly, because
> > > sizing a BAR involves a write, it implementes a scheme which allows
> > > either the hardcoded virtual BAR to be written or all 1s (needed for
> > > size detection).
> > > 
> > > > d) when domU queries BAR address from pci-back the virtual BAR address 
> > > > is provided.
> > > > 
> > > > Option2:
> > > > b) domU will not anytime update the BARs, if it does then it is a 
> > > > fault, 
> > > > till we decide how to handle it
> > > > c) when domU queries BAR address from pci-back the virtual BAR address 
> > > > is provided.
> > > > d) domU sends a hypercall to map virtual BARs,
> > > > e) xen pci code reads the BAR and maps { virtual BAR : physical BAR } 
> > > > for domU
> > > > 
> > > > Which option is better I think Ian is for (2) and Stefano may be (1)
> > > 
> > > In fact I'm now (after Julien pointed out the current behaviour of
> > > pciback) in favour of (1), although I'm not sure if Stefano is too.
> > > 
> > > (I was never in favour of (2), FWIW, I previously was in favour of (3)
> > > which is like (2) except pciback makes the hypervcall to map the virtual
> > > bars to the guest, I'd still favour that over (2) but (1) is now my
> > > preference)
> > 
> > OK, let's go with (1).
> 
> Right, and as the maintainer of pciback that means I don't have to do
> anything right :-)

I _think_ there will need to be some mechanism for the toolstack to
inform pciback what the virtual BARs should contain, so it can correctly
process read requests. But other than review that (hopefully) small bit
of code, I think there is nothing for you to do.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 10/11] x86/intel_pstate: support the use of intel_pstate in pmstat.c

2015-06-23 Thread Wang, Wei W

On 23/06/2015 09:41, Wei Wang wrote:
> On 11/06/2015 16:31, Wei Wang wrote:
> > Add support in the pmstat.c so that the xenpm tool can request to
> > access the intel_pstate driver.
> 
> I want to propose some other changes here to commonize the intel_pstate
> implementation in this common code (pmstat.c).
> 
> 1) introduce a new struct:
> struct internal_governor {
> char *avail_gov;
> uint32_t cur_gov;
> uint32_t gov_num;
> }
> 
> 2) a new callback in cpufreq_driver, "cpufreq_driver-
> >get_internal_gov(internal_gov)",
>  This allows other possible cpufreq drivers, which implement their own
> internal governors, to return their own interenal governor info (e.g. we will
> not need to use the INTEL_PSTATE_GOV_NUM, which is defined to be 4, in
> this patch).


Jan, do you have any comments on this change?

Best,
Wei

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 07/11] x86/intel_pstate: changes in cpufreq_del_cpu for CPU offline

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 10:21,  wrote:
> On 23/06/2015 16:08, Jan Beulich wrote:
>> >>> On 23.06.15 at 08:16,  wrote:
>> > On 19/06/2015 17:44, Jan Beulich wrote:
>> >> >>> On 11.06.15 at 10:28,  wrote:
>> >> > cpufreq_cpu_policy is used in intel_pstate_set_pstate(), so we
>> >> > change to NULL it after the call of cpufreq_driver->exit.
>> >> > Otherwise, a calltrace will show up on your screen due to the
>> >> > reference of a NULL pointer when you power down the system.
>> >>
>> >> Apart from what was already said on this patch, I think it is placed
>> >> too late in this series (should precede patch 6) or, even better, not
>> >> needed at all: ->exit() gets passed the policy being cleaned up, so I
>> >> don't follow why the driver needs to consult the per-CPU field to obtain 
>> >> it.
>> >
>> >> Plus, if the patch is to be kept, the description suggesting this to
>> >> be a problem at system shutdown only is wrong - it can equally well
>> >> happen while offlining a CPU. Just saying that the pointer is still needed
>> would be sufficient.
>> >
>> > It's needed. When unplugging a CPU, the intel_pstate driver sets it to
>> > run with the lowest P-state.
>> 
>> I understand the latter, but this doesn't explain why you can't do what I
>> suggested above.
> 
> Because the "->exit()" needs to call "intel_pstate_set_pstate()" which does 
> not receive "policy" as a parameter. "intel_pstate_set_pstate()" is also 
> called by another function without passing "policy". So I think it is simpler 
> to just change the order of NULLing the pointer, instead of changing more 
> code.

I yet have to see the amount of differences to the Linux original to be
convinced that this is the better approach, so let's see how the next
version of the series is going to look like.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH v3 10/11] x86/intel_pstate: support the use of intel_pstate in pmstat.c

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 03:40,  wrote:
> Hi Jan,
> On 11/06/2015 16:31, Wei Wang wrote:
>> Add support in the pmstat.c so that the xenpm tool can request to access the
>> intel_pstate driver.
> 
> I want to propose some other changes here to commonize the intel_pstate 
> implementation in this common code (pmstat.c).
>  
> 1) introduce a new struct:
> struct internal_governor {
> char *avail_gov;
> uint32_t cur_gov; 
> uint32_t gov_num;
> }
> 
> 2) a new callback in cpufreq_driver, 
> "cpufreq_driver->get_internal_gov(internal_gov)",
>  This allows other possible cpufreq drivers, which implement their own 
> internal governors, to return their own interenal governor info (e.g. we will 
> not need to use the INTEL_PSTATE_GOV_NUM, which is defined to be 4, in this 
> patch).
> 
> Do you think this would be better?

I really need to see this in context.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST] Add some sanity checks for presence of Repos configuration

2015-06-23 Thread Ian Campbell

Without this then anything which uses cr-daily-branch produces the
rather cryptic:

+ test -f daily.xsettings
++ ./ap-print-url xen-unstable
with-lock-ex ./ap-print-url: /lock: Permission denied
+ treeurl=
FAILED rc=255

Which has caught out one or two people using standalone mode.

Signed-off-by: Ian Campbell 
Cc: dario.faggi...@citrix.com
---
 README |  6 ++
 cri-lock-repos |  8 
 standalone | 15 +++
 3 files changed, 29 insertions(+)

diff --git a/README b/README
index 44e2989..a53fdc4 100644
--- a/README
+++ b/README
@@ -422,6 +422,12 @@ Stash
 Images
 Logs
 
+Repos  Full path to a temporary directory where repositories can
+  be cloned. This is needed for anything which uses 
cr-daily-branch,
+  including "./standalone make-flight" and
+  "standalone-generate-dump-flight-runvars".
+
+
 DebianSuite
 GuestDebianSuite   defaults to DebianSuite
 
diff --git a/cri-lock-repos b/cri-lock-repos
index 7d10c87..ee0d9ba 100644
--- a/cri-lock-repos
+++ b/cri-lock-repos
@@ -20,6 +20,14 @@
 . cri-common
 
 repos=`getconfig Repos`
+if [ -z "$repos" ] ; then
+   echo "Repos must be configured in $config" >&2
+   exit 1
+fi
+if [ ! -d "$repos" ] ; then
+   echo "Repos $repos does not exist" >&2
+   exit 1
+fi
 repos_lock="$repos/lock"
 
 if [ "x$OSSTEST_REPOS_LOCK_LOCKED" != "x$repos_lock" ]; then
diff --git a/standalone b/standalone
index 91d18b5..a81a648 100755
--- a/standalone
+++ b/standalone
@@ -142,6 +142,20 @@ need_host() {
 fi
 }
 
+check_repos() {
+local repos=`OSSTEST_CONFIG=$config getconfig Repos`
+if [ -z "$repos" ] ; then
+   echo "Repos must be configured in $config" >&2
+   exit 1
+fi
+if [ ! -d "$repos" ] ; then
+   # Is likely an absolute path, so don't create automatically,
+   # just in case...
+   echo "Repos $repos does not exist" >&2
+   exit 1
+fi
+}
+
 ensure_logs() {
 if [ ! -d "logs" ] ; then
mkdir "logs"
@@ -171,6 +185,7 @@ case $op in
;;
 
 make-flight)
+check_repos
 need_flight
 
 if [ $# -lt 1 ] ; then
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [linux-3.4 test] 58831: regressions - FAIL

2015-06-23 Thread osstest service user

flight 58831 linux-3.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/58831/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qemut-win7-amd64  6 xen-boot  fail REGR. vs. 30511

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-pair10 xen-boot/dst_host   fail pass in 58798
 test-amd64-amd64-pair 9 xen-boot/src_host   fail pass in 58798
 test-amd64-amd64-xl-sedf-pin  6 xen-bootfail pass in 58798
 test-amd64-i386-xl-qemuu-win7-amd64  9 windows-install  fail pass in 58820

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-libvirt-xsm   6 xen-bootfail baseline untested
 test-amd64-amd64-xl-multivcpu  6 xen-boot   fail baseline untested
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 6 xen-boot fail baseline untested
 test-amd64-amd64-xl-sedf  6 xen-boot fail   like 30406
 test-amd64-amd64-xl-qemuu-win7-amd64 16 guest-stop fail like 30511
 test-amd64-i386-xl-qemut-win7-amd64 16 guest-stop  fail like 30511
 test-amd64-amd64-xl-qemuu-ovmf-amd64  6 xen-bootfail like 53709-bisect
 test-amd64-i386-freebsd10-amd64  6 xen-boot fail like 58780-bisect
 test-amd64-i386-xl-qemuu-winxpsp3  6 xen-boot   fail like 58786-bisect
 test-amd64-i386-qemut-rhel6hvm-intel  6 xen-bootfail like 58788-bisect
 test-amd64-i386-rumpuserxen-i386  6 xen-bootfail like 58799-bisect
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  6 xen-bootfail like 58801-bisect
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  6 xen-boot   fail like 58803-bisect
 test-amd64-amd64-xl-qemut-winxpsp3  6 xen-boot  fail like 58804-bisect
 test-amd64-i386-freebsd10-i386  6 xen-boot  fail like 58805-bisect
 test-amd64-i386-xl-qemuu-ovmf-amd64  6 xen-boot fail like 58806-bisect
 test-amd64-amd64-xl-qemuu-winxpsp3  6 xen-boot  fail like 58807-bisect
 test-amd64-i386-xl-qemut-winxpsp3  6 xen-boot   fail like 58808-bisect
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  6 xen-bootfail like 58809-bisect
 test-amd64-amd64-rumpuserxen-amd64  6 xen-boot  fail like 58810-bisect
 test-amd64-i386-xl-qemuu-debianhvm-amd64  6 xen-bootfail like 58811-bisect
 test-amd64-amd64-xl-qemut-debianhvm-amd64  6 xen-boot   fail like 58813-bisect
 test-amd64-i386-qemuu-rhel6hvm-intel  6 xen-bootfail like 58814-bisect
 test-amd64-i386-xl-qemut-debianhvm-amd64  6 xen-bootfail like 58815-bisect

Tests which did not succeed, but are not blocking:
 test-amd64-i386-xl-qemuu-win7-amd64 16 guest-stop fail in 58820 never pass
 test-amd64-amd64-xl-pvh-amd  11 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel 11 guest-start  fail  never pass
 test-amd64-i386-libvirt  12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 12 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 12 migrate-support-checkfail   never pass

version targeted for testing:
 linuxcf1b3dad6c5699b977273276bada8597636ef3e2
baseline version:
 linuxbb4a05a0400ed6d2f1e13d1f82f289ff74300a70


500 people touched revisions under test,
not listing them all


jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 build-amd64-rumpuserxen  pass
 build-i386-rumpuserxen   pass
 test-amd64-amd64-xl  pass
 test-amd64-i386-xl   pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsmfail
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm fail
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsmfail
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm fail
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsmpass
 test-amd64-i386-xl

Re: [Xen-devel] [PATCH v9 06/13] x86: dynamically get/set CBM for a domain

2015-06-23 Thread Chao Peng

On Tue, Jun 23, 2015 at 09:35:30AM +0100, Jan Beulich wrote:
> >>> On 23.06.15 at 09:19,  wrote:
> > On Tue, Jun 16, 2015 at 08:08:34AM +0100, Jan Beulich wrote:
> >> >>> On 03.06.15 at 06:53,  wrote:
> >> > +int psr_set_l3_cbm(struct domain *d, unsigned int socket, uint64_t cbm)
> >> > +{
> >> > +unsigned int old_cos, cos;
> >> > +struct psr_cat_cbm *map, *found = NULL;
> >> > +struct psr_cat_socket_info *info = NULL;
> >> > +int ret = get_cat_socket_info(socket, &info);
> >> > +
> >> > +if ( ret )
> >> > +return ret;
> >> > +
> >> > +if ( !psr_check_cbm(info->cbm_len, cbm) )
> >> > +return -EINVAL;
> >> > +
> >> > +old_cos = d->arch.psr_cos_ids[socket];
> >> > +map = info->cos_to_cbm;
> >> > +
> >> > +spin_lock(&info->cbm_lock);
> >> > +
> >> > +for ( cos = 0; cos <= info->cos_max; cos++ )
> >> > +{
> >> > +/* If still not found, then keep unused one. */
> >> > +if ( !found && cos != 0 && map[cos].ref == 0 )
> >> > +found = map + cos;
> >> > +else if ( map[cos].cbm == cbm )
> >> > +{
> >> > +if ( unlikely(cos == old_cos) )
> >> > +{
> >> > +spin_unlock(&info->cbm_lock);
> >> > +return 0;
> >> 
> >> Is this in particular, but also the surrounding "else if", correct when
> >> map[cos].ref == 0? 
> > 
> > I can't see any problem now.
> 
> Further down in the function you increment found->ref, and it looks
> suspicious that you return success here having found a slot possibly
> having refount zero (and thus available for re-use for another CBM).

In such case 'cos == old_cos' can't be true, otherwise refcount can't be
zero.

> I.e. if this indeed is intended and correct, I think this needs to be
> explained in a brief comment.

So yes, I can leave a comment here.

Chao

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCHv1] x86: only check for one watchdog NMI

2015-06-23 Thread Jan Beulich

>>> On 22.06.15 at 18:21,  wrote:
> Since the NMI handler can now recognize watchdog NMIs, make
> check_nmi_watchdog() only check for at least one watchdog NMI.  This
> prevents false negatives caused by other processors (which may be
> being power managed by the BIOS) running at reduced clock frequencies.

Doesn't this go a little too far, in that it allows through the case where
an NMI works exactly once (a behavior not unheard of)? Lowering the
count to e.g. 3 would seem acceptable, but not any further. Raising
the wait time might then need to be the way to go if the (approximate)
1:3 ratio still isn't enough to cope with BIOS power managed CPUs.

Btw., can an OS know of the power state CPUs come up in? I.e. can
the wait time be adjusted dynamically? Or is this (perhaps intentionally)
completely transparent to the OS?

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [DESIGN] Feature Levelling improvements

2015-06-23 Thread Andrew Cooper

On 22/06/15 20:18, Konrad Rzeszutek Wilk wrote:
> Thank you for posting this!
>
> Some comments below.
>
>> Design
>> ==
>>
>> `struct sysctl_physinfo.levelling_caps`
>> ---
>>
>> Xen shall gain a new physinfo field which reports the degree to which it can
>> influence `CPUID` executed by a PV guest.  This is a bitmap containing:
>>
>> * `faulting`
>> * CPUID Faulting is available, and full control can be exercised.
>> * `mask_ecx`
>> * Leaf 0x0001.ECX
>> * `mask_edx`
>> * Leaf 0x0001.EDX
>> * `mask_extd_ecx`
>> * Leaf 0x8001.ECX
>> * `mask_extd_edx`
>> * Leaf 0x8001.EDX
>> * `mask_xsave_eax`
>> * Leaf 0x000D[ECX=1].EAX
>> * `mask_therm_ecx`
>> * Leaf 0x0006.ECX
>> * `mask_l7s0_eax`
>> * Leaf 0x0007[ECX=0].EAX
>> * `mask_l7s0_ebx`
> Those 'l' look like '1' in the PDF.
>
> Can it be called something else?

If you can suggest a better name, yes.  As for now, these are the
variable names used in-tree (top of xen/arch/x86/cpu/amd.c)

>
>> * Leaf 0x0007[ECX=0].EBX
>>
>> At the time of writing, these are all the masking MSRs known by Xen.  The
>> bitmap shall be extended as new MSRs become available.
>>
>> New 'featureset' API for use by the toolstack
>> -
>>
>> A featureset is a defined as a collection of words covering the cpuid leaves
>> which report features to the caller.  It is variable length, and expected to
>> grow over time as processors gain more features, or Xen starts supporting
>> exposing more features to guests.
>>
>> At the time of writing, the leaves containing feature bits are:
>>
>> * 0x0001.ECX
>> * 0x0001.EDX
>> * 0x8001.ECX
>> * 0x8001.EDX
>> * 0x000D[ECX=1].EAX
>> * 0x0007[ECX=0].EBX
>> * 0x0006.EAX
>> * 0x0006.ECX
>> * 0x000A.EAX
>> * 0x000A.EBX
>> * 0x000F[ECX=0].EDX
>> * 0x000F[ECX=1].EDX
>>
>> XEN_SYSCTL_get_featureset
>> -
>>
>> Xen shall on boot create a featureset for itself, and the maximum available
>> features for each type of guest, based on hardware features, command line
>> options etc.  A toolstack shall be able to query all of these.
> maximum available features?

Maximum set of features Xen is able to provide to particular guests on
this specific host.

>  As in two sets of features - one for
> PV and another for HVM. The PV being a subset of HVM (since it is more
> constrained)?

Three really (including the host featureset), but yes.

>
> Command line options being the same old ones (the cpuid_mask..?) and then
> more? Or just rewrite this to be:
>
> cpuid=mask_therm_ecx=[blahbla],mask_xsave_eax=[blahbal] ?

No.  What I meant by that is that something like "no-xsave" will turn
off whole swathes of features in all sets.

The maximum set of features available to Xen, PV and HVM guests alike
depends on the hardware, firmware settings and command line options
provided to Xen enabling or disabling functionality.

It is specifically not guaranteed to remain the same across reboot,
which is why Xen shall recalculate it on each boot.

>
>
>> Cpuid feature-verification library
>> --
>>
>> There shall be a new library (shared between Xen and libxc in the same
>> way as
>> libelf etc.) which can verify the a featureset.  In particular, it will
> s/ a //
>> confirm that no features are enabled without their dependent features.
> And presumarily can compare these features and do a and-subset (or an
> or-subset) ?

At the end of the day, these are just bitmaps with a (unknown but fixed)
integer length.

>
>> XEN_DOMCTL_set_cpuid
>> 
>>
>> This is an existing hypercall.  Currently it just stashes the policy from
>> userspace.  It shall be extended to provide verification of the policy, and
>> reject attempts to advertise features which Xen is incapable of providing
>> (via hardware or emulation support).
> Where would be the code to trim the 'maximum available features' in the
> subsets (like PV) with some cpuid=X flags from user-space?

There is already code to do this in both libxl and libxc.  There will of
course be some changes as part of this work, but nothing major (I hope).

The important point is that the hypercall shall now check Xen's ability
to provide what the toolstack has requested, and say no if it can't. 
This will avoid the current situation which exists where the domain
cpuid code in Xen is always needing to second-guess what is present in
the domain policy, due to it usually being junk.

>
>
>> VCPU context switch
>> ---
>>
>> Xen shall be updated to lazily context switch all available masking
>> MSRs.  It
>> is noted that this shall incur a performance overhead if restricted
>> featuresets are assigned to PV guests, and _CPUID Faulting_ is not
>> available.
>>
>> It shall be the responsibility of the host administrator to avoid creating
>> such a scenario, if the performance overh

Re: [Xen-devel] [PATCH v9 06/13] x86: dynamically get/set CBM for a domain

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 11:03,  wrote:
> On Tue, Jun 23, 2015 at 09:35:30AM +0100, Jan Beulich wrote:
>> >>> On 23.06.15 at 09:19,  wrote:
>> > On Tue, Jun 16, 2015 at 08:08:34AM +0100, Jan Beulich wrote:
>> >> >>> On 03.06.15 at 06:53,  wrote:
>> >> > +int psr_set_l3_cbm(struct domain *d, unsigned int socket, uint64_t cbm)
>> >> > +{
>> >> > +unsigned int old_cos, cos;
>> >> > +struct psr_cat_cbm *map, *found = NULL;
>> >> > +struct psr_cat_socket_info *info = NULL;
>> >> > +int ret = get_cat_socket_info(socket, &info);
>> >> > +
>> >> > +if ( ret )
>> >> > +return ret;
>> >> > +
>> >> > +if ( !psr_check_cbm(info->cbm_len, cbm) )
>> >> > +return -EINVAL;
>> >> > +
>> >> > +old_cos = d->arch.psr_cos_ids[socket];
>> >> > +map = info->cos_to_cbm;
>> >> > +
>> >> > +spin_lock(&info->cbm_lock);
>> >> > +
>> >> > +for ( cos = 0; cos <= info->cos_max; cos++ )
>> >> > +{
>> >> > +/* If still not found, then keep unused one. */
>> >> > +if ( !found && cos != 0 && map[cos].ref == 0 )
>> >> > +found = map + cos;
>> >> > +else if ( map[cos].cbm == cbm )
>> >> > +{
>> >> > +if ( unlikely(cos == old_cos) )
>> >> > +{
>> >> > +spin_unlock(&info->cbm_lock);
>> >> > +return 0;
>> >> 
>> >> Is this in particular, but also the surrounding "else if", correct when
>> >> map[cos].ref == 0? 
>> > 
>> > I can't see any problem now.
>> 
>> Further down in the function you increment found->ref, and it looks
>> suspicious that you return success here having found a slot possibly
>> having refount zero (and thus available for re-use for another CBM).
> 
> In such case 'cos == old_cos' can't be true, otherwise refcount can't be
> zero.
> 
>> I.e. if this indeed is intended and correct, I think this needs to be
>> explained in a brief comment.
> 
> So yes, I can leave a comment here.

Or maybe even better a respective ASSERT().

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v2] libxl: Add AHCI support for upstream qemu

2015-06-23 Thread Fabio Fantoni

Usage:
ahci=0|1 (default=0)

If enabled adds ich9 disk controller in ahci mode and uses it with
upstream qemu to emulate disks instead of ide.
It doesn't support cdroms which still using ide (cdroms will use
"-device ide-cd" as new qemu parameter)
Ahci requires new qemu parameter but for now other emulated disks cases
remains with old ones (I did it in other patch, not needed by this one)
I did it as libxl parameter disabled by default to avoid possible
problems:
- with save/restore/migration (restoring with ahci a domU that was with
ide instead)
- windows < 8 without pv drivers (a registry key change is needed for
AHCI<->IDE change FWIK to avoid possible blue screen)
- windows XP or older that many not support ahci by default.
Setting AHCI with libxl parameter and default to disabled seems the best
solution.
AHCI increase hvm domUs boot performance. On linux hvm domU I saw up to
only 20% of the previous total boot time, whereas boot time decrease a
lot on W7 domUs for most of boots I have done. Small difference in boot
time compared to ide mode on W8 and newer (probably other xen
improvements or fixes are needed not ahci related)

Signed-off-by: Fabio Fantoni 

---

Changes in v2:
- libxl_dm.c: small code style fix
- added vbd-interface.txt changes
---
 docs/man/xl.cfg.pod.5   |  9 +
 docs/misc/vbd-interface.txt |  5 +++--
 tools/libxl/libxl.h | 10 ++
 tools/libxl/libxl_create.c  |  1 +
 tools/libxl/libxl_dm.c  | 10 +-
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c|  1 +
 7 files changed, 34 insertions(+), 3 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..7e16123 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -904,6 +904,15 @@ default is B.
 
 =back
 
+=item B
+
+If enabled adds ich9 disk controller in ahci mode and uses it with
+upstream qemu to emulate disks instead of ide. It decrease boot time but
+may be not supported by default in windows xp and older windows.
+The default is disabled (0).
+
+=back
+
 =head3 Paging
 
 The following options control the mechanisms used to virtualise guest
diff --git a/docs/misc/vbd-interface.txt b/docs/misc/vbd-interface.txt
index f873db0..afb6846 100644
--- a/docs/misc/vbd-interface.txt
+++ b/docs/misc/vbd-interface.txt
@@ -3,18 +3,19 @@ Xen guest interface
 
 A Xen guest can be provided with block devices.  These are always
 provided as Xen VBDs; for HVM guests they may also be provided as
-emulated IDE or SCSI disks.
+emulated IDE, AHCI or SCSI disks.
 
 The abstract interface involves specifying, for each block device:
 
  * Nominal disk type: Xen virtual disk (aka xvd*, the default); SCSI
-   (sd*); IDE (hd*).
+   (sd*); IDE or AHCI (hd*).
 
For HVM guests, each whole-disk hd* and and sd* device is made
available _both_ via emulated IDE resp. SCSI controller, _and_ as a
Xen VBD.  The HVM guest is entitled to assume that the IDE or SCSI
disks available via the emulated IDE controller target the same
underlying devices as the corresponding Xen VBD (ie, multipath).
+   In hd* case with ahci=1, disk will be AHCI via emulated ich9 controller.
 
For PV guests every device is made available to the guest only as a
Xen VBD.  For these domains the type is advisory, for use by the
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a7913b..6a3677d 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -596,6 +596,16 @@ typedef struct libxl__ctx libxl_ctx;
 #define LIBXL_HAVE_SPICE_STREAMINGVIDEO 1
 
 /*
+ * LIBXL_HAVE_AHCI
+ *
+ * If defined, then the u.hvm structure will contain a boolean type:
+ * ahci. This value defines if ahci support is present.
+ *
+ * If this is not defined, the ahci support is ignored.
+ */
+#define LIBXL_HAVE_AHCI 1
+
+/*
  * LIBXL_HAVE_DOMAIN_CREATE_RESTORE_PARAMS 1
  *
  * If this is defined, libxl_domain_create_restore()'s API has changed to
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 86384d2..8ca2481 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -331,6 +331,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 libxl_defbool_setdefault(&b_info->u.hvm.nested_hvm, false);
 libxl_defbool_setdefault(&b_info->u.hvm.usb,false);
 libxl_defbool_setdefault(&b_info->u.hvm.xen_platform_pci,   true);
+libxl_defbool_setdefault(&b_info->u.hvm.ahci,   false);
 
 libxl_defbool_setdefault(&b_info->u.hvm.spice.enable, false);
 if (!libxl_defbool_val(b_info->u.hvm.spice.enable) &&
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 33f9ce6..9216028 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -818,6 +818,8 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
 flexarray_append(dm_args, libxl__sprintf(gc, "%"PRId64, ram_size));
 
 if (b_info->type == LIBXL_DOMAIN_TYPE_HVM) {
+if (libxl_defboo

Re: [Xen-devel] [osstest test] 58828: regressions - trouble: blocked/broken/fail/pass

2015-06-23 Thread Ian Campbell

On Tue, 2015-06-23 at 02:18 +, osstest service user wrote:
> flight 58828 osstest real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/58828/
> 
> Regressions :-(
> 
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  build-armhf   3 host-install(3) broken REGR. vs. 
> 58726
>  build-armhf-xsm   3 host-install(3) broken REGR. vs. 
> 58726
>  build-armhf-pvops 3 host-install(3) broken REGR. vs. 
> 58726

Copy kernel 
/home/tftp//osstest/debian-installer/armhf/2015-01-10-wheezy/backports.deb 
failed: No such file or directory at Osstest/Debian.pm line 884.

Looks like I probably botched something in one of:
5dd4c52291b6 mg-debian-installer-update: updates to better handle Jessie 
onwards.
cdbe966bb86e More flexible handling of need-kernel-deb-$flavour host flag

I'll investigate.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH OSSTEST] Add some sanity checks for presence of Repos configuration

2015-06-23 Thread Dario Faggioli

On Tue, 2015-06-23 at 09:59 +0100, Ian Campbell wrote:
> Without this then anything which uses cr-daily-branch produces the
> rather cryptic:
> 
> + test -f daily.xsettings
> ++ ./ap-print-url xen-unstable
> with-lock-ex ./ap-print-url: /lock: Permission denied
> + treeurl=
> FAILED rc=255
> 
> Which has caught out one or two people using standalone mode.
> 
> Signed-off-by: Ian Campbell 
> Cc: dario.faggi...@citrix.com
>
Reviewed-and-Tested-by: Dario Faggioli 

Regards,
Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [RFC v2 14/15] Suppress posting interrupts when 'SN' is set

2015-06-23 Thread Wu, Feng



> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Thursday, June 18, 2015 4:10 PM
> To: Wu, Feng
> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
> Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
> Subject: RE: [RFC v2 14/15] Suppress posting interrupts when 'SN' is set
> 
> >>> On 18.06.15 at 10:01,  wrote:
> 
> >
> >> -Original Message-
> >> From: Jan Beulich [mailto:jbeul...@suse.com]
> >> Sent: Thursday, June 18, 2015 3:51 PM
> >> To: Wu, Feng
> >> Cc: andrew.coop...@citrix.com; george.dun...@eu.citrix.com; Tian, Kevin;
> >> Zhang, Yang Z; xen-devel@lists.xen.org; k...@xen.org
> >> Subject: RE: [RFC v2 14/15] Suppress posting interrupts when 'SN' is set
> >>
> >> >>> On 18.06.15 at 09:34,  wrote:
> >> > Seems it is a little tricky here. What we need to do for this is
> >> > like something below:
> >> >
> >> > ..
> >> >
> >> > else if ( !pi_test_sn(&v->arch.hvm_vmx.pi_desc) ||
> >> >   !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> >> > {
> >> > __vmx_deliver_posted_interrupt(v);
> >> > return;
> >> > }
> >> >
> >> > ..
> >> >
> >> > But how to handle this if 'SN' is set between pi_test_sn() and
> >> > pi_test_and_set_on() ? Seems we cannot guarantee this two
> >> > operations are atomic in software way.But thinking a bit
> >> > more, maybe this solution is acceptable, 'SN' is only set when
> >> > the vCPU's state is going to be runnable, which means the
> >> > vCPU is not running in non-root, in this case, it doesn't matter
> >> > whether we send notification event or not, as long as the
> >> > interrupt is stored in PIR, and they will be delivered to guest
> >> > during the next vm-entry.
> >> >
> >> > Any ideas about this?
> >>
> >> Unless the two functions don't do what their name suggests I
> >> don't see why you need to invoke pi_test_sn() first - the purpose
> >> of pi_test_and_set_on() after all is what its name says: Test
> >> and set the flag atomically, returning the previous state.
> >
> > If 'SN' is set, then interrupts are suppress, we cannot send
> > notification event, then we don't need to test and set 'ON',
> > and it the purpose of this patch,right?
> 
> Oh, sorry, the single character difference in the name didn't
> catch my attention. Testing the state of one bit and setting
> (perhaps also along with testing it) another one should be done
> with cmpxchg().

Yes, we can use cmpxchg(), but what happened if other bits are
changed during this process. Please see the following pseudo
code and scenario:

uint64_t flag = pi_desc.control;
uint64_t old = flag & ( ~(POSTED_INTR_ON | POSTED_INTR_SN) );
uint64_t new = flag | POSTED_INTR_ON;

/* What should we do if other bits except ON and SN are
  changed here?*/

cmpxchg(&flag, old, new);

I think about this for some time, unfortunately I don't get a good
solution for this. Could you please give more hints? Thanks a lot!

Thanks,
Feng

> 
> Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1 05/13] libxc: introduce a domain loader for HVM guest firmware

2015-06-23 Thread Jan Beulich

>>> On 22.06.15 at 18:11,  wrote:
> Introduce a very simple (and dummy) domain loader to be used to load the
> firmware (hvmloader) into HVM guests. Since hmvloader is just a 32bit elf
> executable the loader is fairly simple.

But hvmloader gets loaded fine today - why is this needed?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1 09/13] elfnotes: intorduce a new PHYS_ENTRY elfnote

2015-06-23 Thread Jan Beulich

>>> On 22.06.15 at 18:11,  wrote:
> @@ -213,6 +214,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
>  elf, note, sizeof(*parms->f_supported), i);
>  break;
>  
> +case XEN_ELFNOTE_PHYS_ENTRY:
> +parms->phys_entry = val;

I don't think I've seen this field having got added in an earlier patch,
and it's also not getting added here.

> --- a/xen/include/public/elfnote.h
> +++ b/xen/include/public/elfnote.h
> @@ -200,9 +200,18 @@
>  #define XEN_ELFNOTE_SUPPORTED_FEATURES 17
>  
>  /*
> + * Physical entry point into the kernel.
> + *
> + * 32bit entry point into the kernel. Xen will use this entry point
> + * in order to launch the guest kernel in 32bit protected mode
> + * with paging disabled.
> + */
> +#define XEN_ELFNOTE_PHYS_ENTRY 18

Perhaps XEN_ELFNOTE_PHYS_ENTRY32 or XEN_ELFNOTE_PHYS32_ENTRY
then?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCHv1] x86: only check for one watchdog NMI

2015-06-23 Thread Andrew Cooper

On 23/06/15 10:09, Jan Beulich wrote:
 On 22.06.15 at 18:21,  wrote:
>> Since the NMI handler can now recognize watchdog NMIs, make
>> check_nmi_watchdog() only check for at least one watchdog NMI.  This
>> prevents false negatives caused by other processors (which may be
>> being power managed by the BIOS) running at reduced clock frequencies.
> Doesn't this go a little too far, in that it allows through the case where
> an NMI works exactly once (a behavior not unheard of)? Lowering the
> count to e.g. 3 would seem acceptable, but not any further. Raising
> the wait time might then need to be the way to go if the (approximate)
> 1:3 ratio still isn't enough to cope with BIOS power managed CPUs.
>
> Btw., can an OS know of the power state CPUs come up in? I.e. can
> the wait time be adjusted dynamically? Or is this (perhaps intentionally)
> completely transparent to the OS?

With the mwait driver, probably, as it is specifically designed to
completely bypass any firmware settings which might be in place.

Anything else is dependent on how much information can be gleaned from
the ACPI tables, but most firmware deliberately has a "BIOS controlled"
mode which is designed to restrict what the OS is capable of doing.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1 05/13] libxc: introduce a domain loader for HVM guest firmware

2015-06-23 Thread Roger Pau Monné

El 23/06/15 a les 11.29, Jan Beulich ha escrit:
 On 22.06.15 at 18:11,  wrote:
>> Introduce a very simple (and dummy) domain loader to be used to load the
>> firmware (hvmloader) into HVM guests. Since hmvloader is just a 32bit elf
>> executable the loader is fairly simple.
> 
> But hvmloader gets loaded fine today - why is this needed?

So that we can use the same set of xc_dom_* functions that we currently
use on x86 to build PV guests, instead of the crappy setup_guest stub
that we have in xc_hvm_build_x86.c.

As said in the cover letter, patches 1 to 7 aim at unifying the way HVM
and PV guests are created on x86.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PULL 0/5] xen-220615

2015-06-23 Thread Peter Maydell

On 22 June 2015 at 14:08, Stefano Stabellini
 wrote:
> The following changes since commit 00967f4e0bab246679d0ddc32fd31a7179345baf:
>
>   Merge remote-tracking branch 'remotes/agraf/tags/signed-s390-for-upstream' 
> into staging (2015-06-05 12:04:42 +0100)
>
> are available in the git repository at:
>
>
>   git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-220615
>
> for you to fetch changes up to 960ce9e49c4aaf591f13ced0e74a93499f46016b:
>
>   Revert "xen-hvm: increase maxmem before calling xc_domain_populate_physmap" 
> (2015-06-22 13:00:42 +)
>
> 
> Xen tree 22/06/2015
>
> 
> Jan Beulich (4):
>   xen/pass-through: fold host PCI command register writes
>   xen/pass-through: ROM BAR handling adjustments
>   xen/pass-through: log errno values rather than function return ones
>   xen/pass-through: constify some static data
>
> Stefano Stabellini (1):
>   Revert "xen-hvm: increase maxmem before calling 
> xc_domain_populate_physmap"

I'm afraid I can't apply this -- this revert commit is missing a signed-off-by
line. Can you respin, please?

thanks
-- PMM

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1 09/13] elfnotes: intorduce a new PHYS_ENTRY elfnote

2015-06-23 Thread Roger Pau Monné

El 23/06/15 a les 11.35, Jan Beulich ha escrit:
 On 22.06.15 at 18:11,  wrote:
>> @@ -213,6 +214,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary 
>> *elf,
>>  elf, note, sizeof(*parms->f_supported), i);
>>  break;
>>  
>> +case XEN_ELFNOTE_PHYS_ENTRY:
>> +parms->phys_entry = val;
> 
> I don't think I've seen this field having got added in an earlier patch,
> and it's also not getting added here.

Yes, it's added in patch 5, because it's also used by the HVM-generic
loader.

>> --- a/xen/include/public/elfnote.h
>> +++ b/xen/include/public/elfnote.h
>> @@ -200,9 +200,18 @@
>>  #define XEN_ELFNOTE_SUPPORTED_FEATURES 17
>>  
>>  /*
>> + * Physical entry point into the kernel.
>> + *
>> + * 32bit entry point into the kernel. Xen will use this entry point
>> + * in order to launch the guest kernel in 32bit protected mode
>> + * with paging disabled.
>> + */
>> +#define XEN_ELFNOTE_PHYS_ENTRY 18
> 
> Perhaps XEN_ELFNOTE_PHYS_ENTRY32 or XEN_ELFNOTE_PHYS32_ENTRY
> then?

That's fine, I don't mind changing it. Although AFAIK it's not possible
to have a 64bit physical entry point (paging-disabled).

Roger.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1 11/13] xen/libxl: allow creating HVM guests without a device model

2015-06-23 Thread Jan Beulich

>>> On 22.06.15 at 18:11,  wrote:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -343,7 +343,7 @@ u64 hvm_get_guest_tsc_adjust(struct vcpu *v)
>  void hvm_migrate_timers(struct vcpu *v)
>  {
>  /* PVH doesn't use rtc and emulated timers, it uses pvclock mechanism. */
> -if ( is_pvh_vcpu(v) )
> +if ( is_pvh_vcpu(v) || v->domain->arch.hvm_domain.no_emu )

Why would you need to keep the is-PVH check when you have ...

> @@ -1485,9 +1485,10 @@ int hvm_domain_initialise(struct domain *d)
>  else
>  d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
>  
> -if ( is_pvh_domain(d) )
> +if ( is_pvh_domain(d) || domcr_flags & DOMCRF_noemu )
>  {
>  register_portio_handler(d, 0, 0x10003, handle_pvh_io);
> +d->arch.hvm_domain.no_emu = TRUE;

... this (which of course shouldn't use TRUE).

> @@ -2327,6 +2328,9 @@ int hvm_vcpu_initialise(struct vcpu *v)
>  return 0;
>  }
>  
> +if ( d->arch.hvm_domain.no_emu )
> +return 0;
> +
>  rc = setup_compat_arg_xlat(v); /* teardown: free_compat_arg_xlat() */

How are you going to get away without an argument translation
area?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH OSSTEST] Add some sanity checks for presence of Repos configuration

2015-06-23 Thread Ian Jackson

Ian Campbell writes ("[PATCH OSSTEST] Add some sanity checks for presence of 
Repos configuration"):
> Without this then anything which uses cr-daily-branch produces the
> rather cryptic:
> 
> + test -f daily.xsettings
> ++ ./ap-print-url xen-unstable
> with-lock-ex ./ap-print-url: /lock: Permission denied
> + treeurl=
> FAILED rc=255
> 
> Which has caught out one or two people using standalone mode.

I agree this should be improved, thanks.

> diff --git a/README b/README
> index 44e2989..a53fdc4 100644
> --- a/README
> +++ b/README
> @@ -422,6 +422,12 @@ Stash
>  Images
>  Logs
>  
> +Repos  Full path to a temporary directory where repositories \
can
> +be cloned. This is needed for anything which uses cr-dail\
y-branch,
> +including "./standalone make-flight" and

Shows wrap damage on my screen, as you see.

> diff --git a/cri-lock-repos b/cri-lock-repos
> index 7d10c87..ee0d9ba 100644
> --- a/cri-lock-repos
> +++ b/cri-lock-repos
> @@ -20,6 +20,14 @@
>  . cri-common
>  
>  repos=`getconfig Repos`
> +if [ -z "$repos" ] ; then
> + echo "Repos must be configured in $config" >&2
> + exit 1
> +fi
...
> +check_repos() {
> +local repos=`OSSTEST_CONFIG=$config getconfig Repos`
> +if [ -z "$repos" ] ; then
> + echo "Repos must be configured in $config" >&2
> + exit 1
> +fi
...

This is the same code twice.  I think check_repos could live in
cri-getconfig.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH RFC v1 12/13] xen: allow 64bit HVM guests to use XENMEM_memory_map

2015-06-23 Thread Jan Beulich

>>> On 22.06.15 at 18:11,  wrote:
> Enable this hypercall for 64bit HVM guests in order to fetch the e820 memory
> map in the absence of an emulated BIOS. The memory map is populated and
> notified to Xen in arch_setup_meminit_hvm.

I see no reason why this should be limited to 64-bit guests.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Bug in devicetree_for_each_node() in xen/arch/arm/bootfdt.c ?

2015-06-23 Thread David Vrabel

On 23/06/15 00:02, Chris (Christopher) Brand wrote:
> I’ve been trying to figure out why Xen only reports 2GB on my ARM
> platform that actually has 3GB, and I think I’ve found a bug, but I’m
> not familiar enough with the Xen code to fix it.
> 
> The relevant parts of my dts are:
> 
> /dts-v1/;
> / {
> 
>  model = "Broadcom STB (7445d0)";
>  compatible = "brcm,bcm7445d0", "brcm,brcmstb";
>  #address-cells = <0x2>;
>  #size-cells = <0x2>;
>  interrupt-parent = <0x1>;
> 
>  memory {
>#address-cells = <0x1>;
>#size-cells = <0x1>;
>device_type = "memory";
>reg = <0x0 0x0 0x0 0x4000 0x0 0x4000 0x0 0x4000>;

It's been a while since I've looked at device tree stuff but I think you
need 64-bit values for this reg property because the parent node has
#address-cells == 0x2 and #size-cells == 0x2.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v3][PATCH 07/16] hvmloader/pci: skip reserved ranges

2015-06-23 Thread Chen, Tiejun


On 2015/6/19 10:02, Chen, Tiejun wrote:

On 2015/6/18 16:05, Jan Beulich wrote:

On 18.06.15 at 09:01,  wrote:

On 2015/6/18 14:29, Jan Beulich wrote:

On 18.06.15 at 08:17,  wrote:

On 2015/6/17 17:24, Jan Beulich wrote:

On 17.06.15 at 11:18,  wrote:

On 2015/6/17 17:02, Jan Beulich wrote:

On 17.06.15 at 10:26,  wrote:

Something hits me to generate another idea,

#1. Still allocate all devices as before.
#2. Lookup all actual bars to check if they're conflicting RMRR

We can skip these bars to keep zero. Then later it would make
lookup easily.

#3. Need to reallocate these conflicting bars.
#3.1 Trying to reallocate them with the remaining resources
#3.2 If the remaining resources aren't enough, we need to
allocate them
from high_mem_resource.


That's possible onyl for 64-bit BARs.


You're right so this means its not proper to adjust mmio_total to
include conflicting reserved ranges and finally moved all
conflicting
bars to high_mem_resource as Kevin suggested previously, so i high
level, we still need to decrease pci_mem_start to populate more
RAM to
compensate them as I did, right?


You probably should do both: Prefer moving things beyond 4Gb,
but if not possible increase the MMIO hole.



I'm trying to figure out a better solution. Perhaps we can allocate
32-bit bars and 64-bit bars orderly. This may help us bypass those
complicated corner cases.


Dealing with 32- and 64-bit BARs separately won't help at all, as


More precisely I'm saying to deal with them orderly.


there may only be 32-bit ones, or the set of 32-bit ones may
already require you to do re-arrangements. Plus, for compatibility


Yes but I don't understand they are specific cases to my idea.


Perhaps the problem is that you don't say what "orderly" is supposed
to mean here?


You're right. Here when "separately" vs "orderly", I should definitely
use "orderly" to make more understandable.




reasons (just like physical machines' BIOSes do) avoiding to place
MMIO above 4Gb where possible is still a goal.


So are you sure you see my idea completely? I don't intend to expand pci
memory above 4GB.

Let me clear this simply,

#1. I'm still trying to allocate all 32bit bars from
[pci_mem_start,pci_mem_end] as before.

#2. But [pci_mem_start,pci_mem_end] mightn't enough cover all 32bit-bars
again because of RMRR, right? So I will populate RAM to push downward at
cur_pci_mem_start ( = pci_mem_start - reserved device memory), then
allocate the remaining 32bit-bars from [cur_pci_mem_start ,
pci_mem_start]

#3. Then I'm still trying to allocate 64bit-bars from
[pci_mem_start,pci_mem_end], unless its not enough. This is just going
to follow the original.

So anything is breaking that goal?


Maybe not, from the above.


And overall, its same as the original.


If the model follows the original, what's the point of outlining
supposed changes to the model? All I'm trying to understand is how


Its not same completely, or let me change this statement, "same" ->
"similar".


you want to change the current code to accommodate the not
aligned reserved memory regions. If everything is the same as
before, this can't have been taken care of. If something is different
from the original, that's what needs spelling out (and nothing else,
as that would only clutter the picture).


Doesn't look like the right approach to me. As said before, I think


Could you see what I'm saying again? I just feel you don't understand
what you mean. If you still think I'm wrong let me know.


I think I understand what _I_ mean, but I'm indeed unsure I see
what _you_ mean. Part of the problem is that you toggle between
sending (incomplete) patches, code fragments, and discussing the
approach verbally. I'd much prefer if either you started with a clear
picture of what you intend to implement, or with an implementation
that at least attempts to take care of all the corner cases (showing
that you understand what the corner cases are, which so far I'm
getting the - perhaps false - impression that you don't).



Based on my previous recognition and our recent discussion, my current
understanding can be summarized as follows;

#1. Goal

MMIO region should exclude all reserved device memory

#2. Requirements

#2.1 Still need to make sure MMIO region is fit all pci devices as before

#2.2 Accommodate the not aligned reserved memory regions

If I'm missing something let me know.

#3. How to

#3.1 Address #2.1

We need to either of populating more RAM, or of expanding more highmem.
But we should know just 64bit-bar can work with highmem, and as you
mentioned we also should avoid expanding highmem as possible.

So my implementation is to allocate 32bit-bar and 64bit-bar orderly.

1>. The first allocation round just to 32bit-bar

If we can finish allocating all 32bit-bar, we just go to allocate
64bit-bar with all remaining resources including low pci memory.

If not, we need to calculate how much RAM should be populated to
allocate the remaining 32bit-bars, then populate sufficien

[Xen-devel] Xen 4.5.1 released

2015-06-23 Thread Jan Beulich

All,

I am pleased to announce the release of Xen 4.5.1. This is
available immediately from its git repository
http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.5 
(tag RELEASE-4.5.1) or from the XenProject download page
http://www.xenproject.org/downloads/xen-archives/xen-45-series/xen-451.html 
(where a list of changes can also be found).

We recommend all users of the 4.5 stable series to update to this
first point release.

Regards,
Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCHv1] x86: only check for one watchdog NMI

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 11:34,  wrote:
> On 23/06/15 10:09, Jan Beulich wrote:
> On 22.06.15 at 18:21,  wrote:
>>> Since the NMI handler can now recognize watchdog NMIs, make
>>> check_nmi_watchdog() only check for at least one watchdog NMI.  This
>>> prevents false negatives caused by other processors (which may be
>>> being power managed by the BIOS) running at reduced clock frequencies.
>> Doesn't this go a little too far, in that it allows through the case where
>> an NMI works exactly once (a behavior not unheard of)? Lowering the
>> count to e.g. 3 would seem acceptable, but not any further. Raising
>> the wait time might then need to be the way to go if the (approximate)
>> 1:3 ratio still isn't enough to cope with BIOS power managed CPUs.
>>
>> Btw., can an OS know of the power state CPUs come up in? I.e. can
>> the wait time be adjusted dynamically? Or is this (perhaps intentionally)
>> completely transparent to the OS?
> 
> With the mwait driver, probably, as it is specifically designed to
> completely bypass any firmware settings which might be in place.

The mwait driver deals with C states, while reduced frequency
means P states (and we specifically busy wait here to prevent
entering and C state).

> Anything else is dependent on how much information can be gleaned from
> the ACPI tables, but most firmware deliberately has a "BIOS controlled"
> mode which is designed to restrict what the OS is capable of doing.

None of which would help in identifying what state a CPU is in.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v2] Add some sanity checks for presence of Repos configuration

2015-06-23 Thread Ian Campbell

By providing an explicit fetch method in cri-getconfig which checks
things.

Without this then anything which uses cr-daily-branch produces the
rather cryptic:

+ test -f daily.xsettings
++ ./ap-print-url xen-unstable
with-lock-ex ./ap-print-url: /lock: Permission denied
+ treeurl=
FAILED rc=255

Which has caught out one or two people using standalone mode.

Signed-off-by: Ian Campbell 
Cc: dario.faggi...@citrix.com
---
v2: README wrapping
Move into a cri-getconfig helper (invalidating Dario's T-a-a-b)
---
 README |  5 +
 cri-getconfig  | 15 +++
 cri-lock-repos |  2 +-
 standalone |  7 +++
 4 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/README b/README
index 44e2989..c3508a3 100644
--- a/README
+++ b/README
@@ -422,6 +422,11 @@ Stash
 Images
 Logs
 
+Repos  Full path to a temporary directory where repositories
+   can be cloned. This is needed for anything which uses
+   cr-daily-branch, including "./standalone make-flight"
+   and "standalone-generate-dump-flight-runvars".
+
 DebianSuite
 GuestDebianSuite   defaults to DebianSuite
 
diff --git a/cri-getconfig b/cri-getconfig
index a19b900..0589bf0 100644
--- a/cri-getconfig
+++ b/cri-getconfig
@@ -24,3 +24,18 @@ getconfig () {
 print $c{"'$1'"} or die $!;
 '
 }
+
+getrepos() {
+   local repos=`getconfig Repos`
+   if [ -z "$repos" ] ; then
+   echo "Repos must be configured in $config" >&2
+   exit 1
+   fi
+   if [ ! -d "$repos" ] ; then
+   # Is likely an absolute path, so don't create automatically,
+   # just in case...
+   echo "Repos $repos does not exist" >&2
+   exit 1
+   fi
+   echo $repos
+}
diff --git a/cri-lock-repos b/cri-lock-repos
index 7d10c87..c8269f3 100644
--- a/cri-lock-repos
+++ b/cri-lock-repos
@@ -19,7 +19,7 @@
 
 . cri-common
 
-repos=`getconfig Repos`
+repos=`getrepos`
 repos_lock="$repos/lock"
 
 if [ "x$OSSTEST_REPOS_LOCK_LOCKED" != "x$repos_lock" ]; then
diff --git a/standalone b/standalone
index 91d18b5..c89347f 100755
--- a/standalone
+++ b/standalone
@@ -142,6 +142,12 @@ need_host() {
 fi
 }
 
+check_repos() {
+# We don't care about the answer, just the error checking and
+# logging to stderr.
+OSSTEST_CONFIG=$config getrepos >/dev/null
+}
+
 ensure_logs() {
 if [ ! -d "logs" ] ; then
mkdir "logs"
@@ -171,6 +177,7 @@ case $op in
;;
 
 make-flight)
+check_repos
 need_flight
 
 if [ $# -lt 1 ] ; then
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH OSSTEST v] Add some sanity checks for presence of Repos configuration

2015-06-23 Thread Ian Campbell

By providing an explicit fetch method in cri-getconfig which checks
things.

Without this then anything which uses cr-daily-branch produces the
rather cryptic:

+ test -f daily.xsettings
++ ./ap-print-url xen-unstable
with-lock-ex ./ap-print-url: /lock: Permission denied
+ treeurl=
FAILED rc=255

Which has caught out one or two people using standalone mode.

Signed-off-by: Ian Campbell 
Cc: dario.faggi...@citrix.com
---
v2: README wrapping
Move into a cri-getconfig helper (invalidating Dario's T-a-a-b)
---
 README |  5 +
 cri-getconfig  | 15 +++
 cri-lock-repos |  2 +-
 standalone |  7 +++
 4 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/README b/README
index 44e2989..c3508a3 100644
--- a/README
+++ b/README
@@ -422,6 +422,11 @@ Stash
 Images
 Logs
 
+Repos  Full path to a temporary directory where repositories
+   can be cloned. This is needed for anything which uses
+   cr-daily-branch, including "./standalone make-flight"
+   and "standalone-generate-dump-flight-runvars".
+
 DebianSuite
 GuestDebianSuite   defaults to DebianSuite
 
diff --git a/cri-getconfig b/cri-getconfig
index a19b900..0589bf0 100644
--- a/cri-getconfig
+++ b/cri-getconfig
@@ -24,3 +24,18 @@ getconfig () {
 print $c{"'$1'"} or die $!;
 '
 }
+
+getrepos() {
+   local repos=`getconfig Repos`
+   if [ -z "$repos" ] ; then
+   echo "Repos must be configured in $config" >&2
+   exit 1
+   fi
+   if [ ! -d "$repos" ] ; then
+   # Is likely an absolute path, so don't create automatically,
+   # just in case...
+   echo "Repos $repos does not exist" >&2
+   exit 1
+   fi
+   echo $repos
+}
diff --git a/cri-lock-repos b/cri-lock-repos
index 7d10c87..c8269f3 100644
--- a/cri-lock-repos
+++ b/cri-lock-repos
@@ -19,7 +19,7 @@
 
 . cri-common
 
-repos=`getconfig Repos`
+repos=`getrepos`
 repos_lock="$repos/lock"
 
 if [ "x$OSSTEST_REPOS_LOCK_LOCKED" != "x$repos_lock" ]; then
diff --git a/standalone b/standalone
index 91d18b5..c89347f 100755
--- a/standalone
+++ b/standalone
@@ -142,6 +142,12 @@ need_host() {
 fi
 }
 
+check_repos() {
+# We don't care about the answer, just the error checking and
+# logging to stderr.
+OSSTEST_CONFIG=$config getrepos >/dev/null
+}
+
 ensure_logs() {
 if [ ! -d "logs" ] ; then
mkdir "logs"
@@ -171,6 +177,7 @@ case $op in
;;
 
 make-flight)
+check_repos
 need_flight
 
 if [ $# -lt 1 ] ; then
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH OSSTEST v2] Add some sanity checks for presence of Repos configuration

2015-06-23 Thread Dario Faggioli

On Tue, 2015-06-23 at 10:54 +0100, Ian Campbell wrote:
> By providing an explicit fetch method in cri-getconfig which checks
> things.
> 
> Without this then anything which uses cr-daily-branch produces the
> rather cryptic:
> 
> + test -f daily.xsettings
> ++ ./ap-print-url xen-unstable
> with-lock-ex ./ap-print-url: /lock: Permission denied
> + treeurl=
> FAILED rc=255
> 
> Which has caught out one or two people using standalone mode.
> 
> Signed-off-by: Ian Campbell 
> Cc: dario.faggi...@citrix.com
>
Reviewed-and-Tested-by: Dario Faggioli 

Dario

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen 4.6 Development Update (five months reminder, 5 WEEKS TO FREEZE)

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 11:45,  wrote:
> We seem to have gone off list -- not on purpose I think?

No, sorry - when I trimmed the Cc list I trimmed it a little too much.
Re-added.

Jan

> On Tue, 2015-06-23 at 10:20 +0100, Jan Beulich wrote:
>> >>> On 23.06.15 at 11:16,  wrote:
>> > On Mon, Jun 22, 2015 at 03:37:45PM +0100, Jan Beulich wrote:
>> >> >>> On 05.06.15 at 15:53,  wrote:
>> >> > == Hypervisor ==
>> >> 
>> >> What I seem to be missing in this section are the ticket lock patches
>> >> which got temporarily reverted (and should imo go in again rather
>> >> sooner than later).
>> >> 
>> > 
>> > Now tracked.
>> > 
>> > I agree it should go in sooner rather than later. As I understand it the
>> > issue (netback hotplug script issue) that caused that series to be
>> > reverted has been solved.
>> 
>> I think it is only slowly finding its way into stable Linux releases.
> 
> 
> In terms of releases I've seen 3.14.45 with it in was released this
> morning, which is the tree that will be used for x86 tests by default.
> I've had a mail saying it has been added to the 3.16.7-ckt queue and
> will be in 3.16.7-ckt14, but I've no info on when that will be released,
> I've not seen an associated patchbomb yet.
> 
> ARM testing currently uses 3.16 so that's the one we really care about
> here (since it was arm which was failing). Since we have our own
> linux-xen-arm tree anyway (which I'd forgotten until yesterday) perhaps
> Stefano could add it there in the meantime?
> 
> Stefano, that's a request to backport
> 31a418986a5852034d520a5bab546821ff1ccf3d from upstream into the
> linux-xen-arm tree (sorry I should have thought of this yesterday when
> you were pulling the latest cktN in).
> 
> It's already in
> http://kernel.ubuntu.com/git/ubuntu/linux.git/log/?h=linux-3.16.y-queue 
> but I think it would be better not to wait.
> 
> Ian.




___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Bug in devicetree_for_each_node() in xen/arch/arm/bootfdt.c ?

2015-06-23 Thread Ian Campbell

On Tue, 2015-06-23 at 10:44 +0100, David Vrabel wrote:
> On 23/06/15 00:02, Chris (Christopher) Brand wrote:
> > I’ve been trying to figure out why Xen only reports 2GB on my ARM
> > platform that actually has 3GB, and I think I’ve found a bug, but I’m
> > not familiar enough with the Xen code to fix it.
> > 
> > The relevant parts of my dts are:
> > 
> > /dts-v1/;
> > / {
> > 
> >  model = "Broadcom STB (7445d0)";
> >  compatible = "brcm,bcm7445d0", "brcm,brcmstb";
> >  #address-cells = <0x2>;
> >  #size-cells = <0x2>;
> >  interrupt-parent = <0x1>;
> > 
> >  memory {
> >#address-cells = <0x1>;
> >#size-cells = <0x1>;
> >device_type = "memory";
> >reg = <0x0 0x0 0x0 0x4000 0x0 0x4000 0x0 0x4000>;
> 
> It's been a while since I've looked at device tree stuff but I think you
> need 64-bit values for this reg property because the parent node has
> #address-cells == 0x2 and #size-cells == 0x2.

Yes, the prevailing sizes will be 0x2 here, since the 0x1 only apply to
_children_. However you still need to write the cells as separate 32-bit
entries, so the above encodes to memory regions from 0x0.0 to
0x0.4000 and 0x0.4000 to 0x0.8000 (using . to show where the
cell boundary lies).

I don't know this platform, but that seems a plausible description of 2x
1GB regions.

I've not yet looked at the code closer to see what's going on.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH OSSTEST v] Add some sanity checks for presence of Repos configuration

2015-06-23 Thread Ian Campbell

On Tue, 2015-06-23 at 10:54 +0100, Ian Campbell wrote:

Sorry for this, I hit "v, CR, 2" instead of "v, 2, CR" and apparently
didn't get to Ctrl-C quick enough.

This is the same as v2 which followed shortly after.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v4][PATCH 03/19] xen/vtd: create RMRR mapping

2015-06-23 Thread Tiejun Chen

RMRR reserved regions must be setup in the pfn space with an identity
mapping to reported mfn. However existing code has problem to setup
correct mapping when VT-d shares EPT page table, so lead to problem
when assigning devices (e.g GPU) with RMRR reported. So instead, this
patch aims to setup identity mapping in p2m layer, regardless of
whether EPT is shared or not. And we still keep creating VT-d table.

CC: Yang Zhang 
CC: Kevin Tian 
Signed-off-by: Tiejun Chen 
---
v4:

* Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. 

* Drop iommu_map_page() since actually ept_set_entry() can do this
  internally.

 xen/drivers/passthrough/vtd/iommu.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 44ed23d..202b2d0 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-if ( intel_iommu_unmap_page(d, base_pfn) )
+if ( guest_physmap_remove_page(d, base_pfn, base_pfn, 0) )
 ret = -ENXIO;
 base_pfn++;
 }
@@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, bool_t 
map,
 
 while ( base_pfn < end_pfn )
 {
-int err = intel_iommu_map_page(d, base_pfn, base_pfn,
-   IOMMUF_readable|IOMMUF_writable);
+int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);
 
 if ( err )
 return err;
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v4][PATCH 00/19] Fix RMRR

2015-06-23 Thread Tiejun Chen

v4:

* Change one condition inside patch #2, "xen/x86/p2m: introduce
  set_identity_p2m_entry",

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )

 to make sure we just catch our requirement.

* Inside patch #3, "xen/vtd: create RMRR mapping",
  Instead of intel_iommu_unmap_page(), we should use
  guest_physmap_remove_page() to unmap rmrr mapping correctly. And drop
  iommu_map_page() since actually ept_set_entry() can do this
  internally.

* Inside patch #4, "xen/passthrough: extend hypercall to support rdm
  reservation policy", add code comments to describer why we fix to set a
  policy flag in some cases like adding a device to hwdomain, and removing
  a device from user domain. And fix one judging condition

  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

  Additionally, also add to range check the flag passed to make future
  extensions possible (and to avoid ambiguity on what out of range values
  would mean).

* Inside patch #6, "hvmloader: get guest memory map into memory_map[]", we
  move some codes related to e820 to that specific file, e820.c, and consolidate
  "printf()+BUG()" and "BUG_ON()", and also avoid another fixed width type for
  the parameter of get_mem_mapping_layout()

* Inside patch #7, "hvmloader/pci: skip reserved ranges"
  We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either of populating more RAM, or of expanding more highmem. But
  we should know just 64bit-bar can work with highmem, and as you mentioned we
  also should avoid expanding highmem as possible. So my implementation is to 
  allocate 32bit-bar and 64bit-bar orderly.

  1>. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2>.

  2>. The second allocation round to the remaining 32bit-bar

  We should can finish allocating all 32bit-bar in theory, then go to the third
  allocation round 3>.

  3>. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource->base and
  reserved device memory. If a hole exists between base and reserved device
  memory, lets go out simply to try allocate for next bar since all bars are in
  descending order of size. If not, we need to move resource->base to 
reserved_end
  just to reallocate this bar

* Inside of patch #8, "hvmloader/e820: construct guest e820 table", we need to
  adjust highmme if lowmem is changed such as hvmloader has to populate more
  RAM to allocate bars.

* Inside of patch #11, "tools: introduce some new parameters to set rdm policy",
  we don't define init_val for for libxl_rdm_reserve_type since its just zero,
  and grab those changes to xl/libxlu to as a final patch.

* Inside of patch #12, "passes rdm reservation policy", fix one typo,
  s/unkwon/unknown. And in command description, we should use "[]" to indicate 
  it's optional for that extended xl command, pci-attach.

* Patch #13 is separated from current patch #14 since this is specific to xc.

* Inside of patch #14, "tools/libxl: detect and avoid conflicts with RDM", and
  just unconditionally set *nr_entries to 0. And additionally, we grab to all
  stuffs to provide a parameter to set our predefined boundary dynamically to as
  a separated patch later

* Inside of patch #16, "tools/libxl: extend XENMEM_set_memory_map", we use
  goto style error handling, and instead of NOGC, we shoud use
  libxl__malloc(gc,XXX) to allocate local e820.

Overall, we refined several the patch head descriptions and code comments.

v3:

* Rearrange all patches orderly as Wei suggested
* Rebase on the latest tree
* Address some Wei's comments on tools side
* Two changes for runtime cycle
   patch #2,xen/x86/p2m: introduce set_identity_p2m_entry, on hypervisor side

  a>. Introduce paging_mode_translate()
  Otherwise, we'll see this error when boot Xen/Dom0

(XEN) Assertion 'paging_mode_translate(p2m->domain)' failed at p2m-pt.c:702
(XEN) [ Xen-4.6-unstable  x86_64  debug=y  Tainted:C ]

(XEN) Xen call trace:

Re: [Xen-devel] [PATCH RFC v1 09/13] elfnotes: intorduce a new PHYS_ENTRY elfnote

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 11:40,  wrote:
> El 23/06/15 a les 11.35, Jan Beulich ha escrit:
> On 22.06.15 at 18:11,  wrote:
>>> @@ -213,6 +214,9 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary 
> *elf,
>>>  elf, note, sizeof(*parms->f_supported), i);
>>>  break;
>>>  
>>> +case XEN_ELFNOTE_PHYS_ENTRY:
>>> +parms->phys_entry = val;
>> 
>> I don't think I've seen this field having got added in an earlier patch,
>> and it's also not getting added here.
> 
> Yes, it's added in patch 5, because it's also used by the HVM-generic
> loader.

So indeed missed it (being in an otherwise tools only patch). Sorry.

>>> --- a/xen/include/public/elfnote.h
>>> +++ b/xen/include/public/elfnote.h
>>> @@ -200,9 +200,18 @@
>>>  #define XEN_ELFNOTE_SUPPORTED_FEATURES 17
>>>  
>>>  /*
>>> + * Physical entry point into the kernel.
>>> + *
>>> + * 32bit entry point into the kernel. Xen will use this entry point
>>> + * in order to launch the guest kernel in 32bit protected mode
>>> + * with paging disabled.
>>> + */
>>> +#define XEN_ELFNOTE_PHYS_ENTRY 18
>> 
>> Perhaps XEN_ELFNOTE_PHYS_ENTRY32 or XEN_ELFNOTE_PHYS32_ENTRY
>> then?
> 
> That's fine, I don't mind changing it. Although AFAIK it's not possible
> to have a 64bit physical entry point (paging-disabled).

That depends on the perspective you take: Under UEFI the kernel
would be entered in pseudo-physical (1:1 mapped virtual) mode.
And that's certainly a model to at least keep in mind.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v4][PATCH 01/19] xen: introduce XENMEM_reserved_device_memory_map

2015-06-23 Thread Tiejun Chen

From: Jan Beulich 

This is a prerequisite for punching holes into HVM and PVH guests' P2M
to allow passing through devices that are associated with (on VT-d)
RMRRs.

CC: Jan Beulich 
CC: Yang Zhang 
CC: Kevin Tian 
Signed-off-by: Jan Beulich 
Signed-off-by: Tiejun Chen 
Acked-by: Kevin Tian 
---
v4:

* Nothing is changed.

 xen/common/compat/memory.c   | 66 
 xen/common/memory.c  | 64 ++
 xen/drivers/passthrough/iommu.c  | 10 ++
 xen/drivers/passthrough/vtd/dmar.c   | 32 +
 xen/drivers/passthrough/vtd/extern.h |  1 +
 xen/drivers/passthrough/vtd/iommu.c  |  1 +
 xen/include/public/memory.h  | 32 -
 xen/include/xen/iommu.h  | 10 ++
 xen/include/xen/pci.h|  2 ++
 xen/include/xlat.lst |  3 +-
 10 files changed, 219 insertions(+), 2 deletions(-)

diff --git a/xen/common/compat/memory.c b/xen/common/compat/memory.c
index b258138..b608496 100644
--- a/xen/common/compat/memory.c
+++ b/xen/common/compat/memory.c
@@ -17,6 +17,45 @@ CHECK_TYPE(domid);
 CHECK_mem_access_op;
 CHECK_vmemrange;
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct compat_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+struct compat_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+if ( rdm.start_pfn != start || rdm.nr_pages != nr )
+return -ERANGE;
+
+if ( __copy_to_compat_offset(grdm->map.buffer,
+ grdm->used_entries,
+ &rdm,
+ 1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1;
+}
+
+return 0;
+}
+#endif
+
 int compat_memory_op(unsigned int cmd, XEN_GUEST_HANDLE_PARAM(void) compat)
 {
 int split, op = cmd & MEMOP_CMD_MASK;
@@ -303,6 +342,33 @@ int compat_memory_op(unsigned int cmd, 
XEN_GUEST_HANDLE_PARAM(void) compat)
 break;
 }
 
+#ifdef HAS_PASSTHROUGH
+case XENMEM_reserved_device_memory_map:
+{
+struct get_reserved_device_memory grdm;
+
+if ( copy_from_guest(&grdm.map, compat, 1) ||
+ !compat_handle_okay(grdm.map.buffer, grdm.map.nr_entries) )
+return -EFAULT;
+
+grdm.used_entries = 0;
+rc = iommu_get_reserved_device_memory(get_reserved_device_memory,
+  &grdm);
+
+if ( !rc && grdm.map.nr_entries < grdm.used_entries )
+rc = -ENOBUFS;
+
+grdm.map.nr_entries = grdm.used_entries;
+if ( grdm.map.nr_entries )
+{
+if ( __copy_to_guest(compat, &grdm.map, 1) )
+rc = -EFAULT;
+}
+
+return rc;
+}
+#endif
+
 default:
 return compat_arch_memory_op(cmd, compat);
 }
diff --git a/xen/common/memory.c b/xen/common/memory.c
index c84fcdd..7b6281b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -748,6 +748,43 @@ static int construct_memop_from_reservation(
 return 0;
 }
 
+#ifdef HAS_PASSTHROUGH
+struct get_reserved_device_memory {
+struct xen_reserved_device_memory_map map;
+unsigned int used_entries;
+};
+
+static int get_reserved_device_memory(xen_pfn_t start, xen_ulong_t nr,
+  u32 id, void *ctxt)
+{
+struct get_reserved_device_memory *grdm = ctxt;
+u32 sbdf;
+
+sbdf = PCI_SBDF2(grdm->map.seg, grdm->map.bus, grdm->map.devfn);
+if ( (grdm->map.flag & PCI_DEV_RDM_ALL) || (sbdf == id) )
+{
+if ( grdm->used_entries < grdm->map.nr_entries )
+{
+struct xen_reserved_device_memory rdm = {
+.start_pfn = start, .nr_pages = nr
+};
+
+if ( __copy_to_guest_offset(grdm->map.buffer,
+grdm->used_entries,
+&rdm,
+1) )
+{
+return -EFAULT;
+}
+}
+++grdm->used_entries;
+return 1;
+}
+
+return 0;
+}
+#endif
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 struct domain *d;
@@ -1162,6 +1199,33 @@ long do_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)

[Xen-devel] [v4][PATCH 07/19] hvmloader/pci: skip reserved ranges

2015-06-23 Thread Tiejun Chen

When allocating mmio address for PCI bars, we need to make
sure they don't overlap with reserved regions.

CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v4:

* We have to re-design this as follows:

  #1. Goal

  MMIO region should exclude all reserved device memory

  #2. Requirements

  #2.1 Still need to make sure MMIO region is fit all pci devices as before

  #2.2 Accommodate the not aligned reserved memory regions

  If I'm missing something let me know.

  #3. How to

  #3.1 Address #2.1

  We need to either of populating more RAM, or of expanding more highmem. But
  we should know just 64bit-bar can work with highmem, and as you mentioned we
  also should avoid expanding highmem as possible. So my implementation is to 
  allocate 32bit-bar and 64bit-bar orderly.

  1>. The first allocation round just to 32bit-bar

  If we can finish allocating all 32bit-bar, we just go to allocate 64bit-bar
  with all remaining resources including low pci memory.

  If not, we need to calculate how much RAM should be populated to allocate the 
  remaining 32bit-bars, then populate sufficient RAM as exp_mem_resource to go
  to the second allocation round 2>.

  2>. The second allocation round to the remaining 32bit-bar

  We should can finish allocating all 32bit-bar in theory, then go to the third
  allocation round 3>.

  3>. The third allocation round to 64bit-bar

  We'll try to first allocate from the remaining low memory resource. If that
  isn't enough, we try to expand highmem to allocate for 64bit-bar. This process
  should be same as the original.

  #3.2 Address #2.2

  I'm trying to accommodate the not aligned reserved memory regions:

  We should skip all reserved device memory, but we also need to check if other
  smaller bars can be allocated if a mmio hole exists between resource->base and
  reserved device memory. If a hole exists between base and reserved device
  memory, lets go out simply to try allocate for next bar since all bars are in
  descending order of size. If not, we need to move resource->base to 
reserved_end
  just to reallocate this bar

 tools/firmware/hvmloader/pci.c | 180 +++--
 1 file changed, 154 insertions(+), 26 deletions(-)

diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 5ff87a7..5470958 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -38,6 +38,31 @@ uint64_t pci_hi_mem_start = 0, pci_hi_mem_end = 0;
 enum virtual_vga virtual_vga = VGA_none;
 unsigned long igd_opregion_pgbase = 0;
 
+static void relocate_ram_for_pci_memory(unsigned long cur_pci_mem_start)
+{
+struct xen_add_to_physmap xatp;
+unsigned int nr_pages = min_t(
+unsigned int,
+hvm_info->low_mem_pgend - (cur_pci_mem_start >> PAGE_SHIFT),
+(1u << 16) - 1);
+if ( hvm_info->high_mem_pgend == 0 )
+hvm_info->high_mem_pgend = 1ull << (32 - PAGE_SHIFT);
+hvm_info->low_mem_pgend -= nr_pages;
+printf("Relocating 0x%x pages from "PRIllx" to "PRIllx\
+   " for lowmem MMIO hole\n",
+   nr_pages,
+   PRIllx_arg(((uint64_t)hvm_info->low_mem_pgend)high_mem_pgend;
+xatp.size  = nr_pages;
+if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+BUG();
+hvm_info->high_mem_pgend += nr_pages;
+}
+
 void pci_setup(void)
 {
 uint8_t is_64bar, using_64bar, bar64_relocate = 0;
@@ -50,7 +75,7 @@ void pci_setup(void)
 /* Resources assignable to PCI devices via BARs. */
 struct resource {
 uint64_t base, max;
-} *resource, mem_resource, high_mem_resource, io_resource;
+} *resource, mem_resource, high_mem_resource, io_resource, 
exp_mem_resource;
 
 /* Create a list of device BARs in descending order of size. */
 struct bars {
@@ -59,8 +84,11 @@ void pci_setup(void)
 uint32_t bar_reg;
 uint64_t bar_sz;
 } *bars = (struct bars *)scratch_start;
-unsigned int i, nr_bars = 0;
-uint64_t mmio_hole_size = 0;
+unsigned int i, j, n, nr_bars = 0;
+uint64_t mmio_hole_size = 0, reserved_start, reserved_end, reserved_size;
+bool bar32_allocating = 0;
+uint64_t mmio32_unallocated_total = 0;
+unsigned long cur_pci_mem_start = 0;
 
 const char *s;
 /*
@@ -309,29 +337,31 @@ void pci_setup(void)
 }
 
 /* Relocate RAM that overlaps PCI space (in 64k-page chunks). */
+cur_pci_mem_start = pci_mem_start;
 while ( (pci_mem_start >> PAGE_SHIFT) < hvm_info->low_mem_pgend )
+relocate_ram_for_pci_memory(cur_pci_mem_start);
+
+/*
+ * Check if reserved device memory conflicts current pci memory.
+ * If yes, we need to first allocate bar32 since reserved devices
+ * always occupy low memory, and also enable relocating some BARs
+ * to 64bit as possible.
+ */
+

[Xen-devel] [v4][PATCH 09/19] tools/libxc: Expose new hypercall xc_reserved_device_memory_map

2015-06-23 Thread Tiejun Chen

We will introduce the hypercall xc_reserved_device_memory_map
approach to libxc. This helps us get rdm entry info according to
different parameters. If flag == PCI_DEV_RDM_ALL, all entries
should be exposed. Or we just expose that rdm entry specific to
a SBDF.

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
Reviewed-by: Kevin Tian 
---
v4:

* Nothing is changed.

 tools/libxc/include/xenctrl.h |  8 
 tools/libxc/xc_domain.c   | 36 
 2 files changed, 44 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d1d2ab3..9160623 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -1326,6 +1326,14 @@ int xc_domain_set_memory_map(xc_interface *xch,
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries);
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries);
 #endif
 int xc_domain_set_time_offset(xc_interface *xch,
   uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index ce51e69..0951291 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -684,6 +684,42 @@ int xc_domain_set_memory_map(xc_interface *xch,
 
 return rc;
 }
+
+int xc_reserved_device_memory_map(xc_interface *xch,
+  uint32_t flag,
+  uint16_t seg,
+  uint8_t bus,
+  uint8_t devfn,
+  struct xen_reserved_device_memory entries[],
+  uint32_t *max_entries)
+{
+int rc;
+struct xen_reserved_device_memory_map xrdmmap = {
+.flag = flag,
+.seg = seg,
+.bus = bus,
+.devfn = devfn,
+.nr_entries = *max_entries
+};
+DECLARE_HYPERCALL_BOUNCE(entries,
+ sizeof(struct xen_reserved_device_memory) *
+ *max_entries, XC_HYPERCALL_BUFFER_BOUNCE_OUT);
+
+if ( xc_hypercall_bounce_pre(xch, entries) )
+return -1;
+
+set_xen_guest_handle(xrdmmap.buffer, entries);
+
+rc = do_memory_op(xch, XENMEM_reserved_device_memory_map,
+  &xrdmmap, sizeof(xrdmmap));
+
+xc_hypercall_bounce_post(xch, entries);
+
+*max_entries = xrdmmap.nr_entries;
+
+return rc;
+}
+
 int xc_get_machine_memory_map(xc_interface *xch,
   struct e820entry entries[],
   uint32_t max_entries)
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v4][PATCH 19/19] tools: parse to enable new rdm policy parameters

2015-06-23 Thread Tiejun Chen

This patch parses to enable user configurable parameters to specify
RDM resource and according policies,

Global RDM parameter:
rdm = "type=none/host,reserve=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

Default per-device RDM policy is 'strict', while default global RDM policy
is 'relaxed'. When both policies are specified on a given region, 'strict' is
always preferred.

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v4:

* Separated from current patch #11 to parse/enable our rdm policy parameters
  since its make a lot sense and these stuffs are specific to xl/libxlu.

 tools/libxl/libxlu_pci.c | 92 
 tools/libxl/libxlutil.h  |  4 +++
 tools/libxl/xl_cmdimpl.c | 10 ++
 3 files changed, 106 insertions(+)

diff --git a/tools/libxl/libxlu_pci.c b/tools/libxl/libxlu_pci.c
index 26fb143..9255878 100644
--- a/tools/libxl/libxlu_pci.c
+++ b/tools/libxl/libxlu_pci.c
@@ -42,6 +42,9 @@ static int pcidev_struct_fill(libxl_device_pci *pcidev, 
unsigned int domain,
 #define STATE_OPTIONS_K 6
 #define STATE_OPTIONS_V 7
 #define STATE_TERMINAL  8
+#define STATE_TYPE  9
+#define STATE_RDM_TYPE  10
+#define STATE_RESERVE_FLAG  11
 int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char 
*str)
 {
 unsigned state = STATE_DOMAIN;
@@ -143,6 +146,17 @@ int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci 
*pcidev, const char *str
 pcidev->permissive = atoi(tok);
 }else if ( !strcmp(optkey, "seize") ) {
 pcidev->seize = atoi(tok);
+}else if ( !strcmp(optkey, "rdm_reserve") ) {
+if ( !strcmp(tok, "strict") ) {
+pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+} else if ( !strcmp(tok, "relaxed") ) {
+pcidev->rdm_reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, "%s is not an valid PCI RDM property"
+  " flag: 'strict' or 'relaxed'.",
+ tok);
+goto parse_error;
+}
 }else{
 XLU__PCI_ERR(cfg, "Unknown PCI BDF option: %s", optkey);
 }
@@ -167,6 +181,84 @@ parse_error:
 return ERROR_INVAL;
 }
 
+int xlu_rdm_parse(XLU_Config *cfg, libxl_rdm_reserve *rdm, const char *str)
+{
+unsigned state = STATE_TYPE;
+char *buf2, *tok, *ptr, *end;
+
+if (NULL == (buf2 = ptr = strdup(str)))
+return ERROR_NOMEM;
+
+for (tok = ptr, end = ptr + strlen(ptr) + 1; ptr < end; ptr++) {
+switch(state) {
+case STATE_TYPE:
+if (*ptr == '=') {
+state = STATE_RDM_TYPE;
+*ptr = '\0';
+if (strcmp(tok, "type")) {
+XLU__PCI_ERR(cfg, "Unknown RDM state option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RDM_TYPE:
+if (*ptr == '\0' || *ptr == ',') {
+state = STATE_RESERVE_FLAG;
+*ptr = '\0';
+if (!strcmp(tok, "host")) {
+rdm->type = LIBXL_RDM_RESERVE_TYPE_HOST;
+} else if (!strcmp(tok, "none")) {
+rdm->type = LIBXL_RDM_RESERVE_TYPE_NONE;
+} else {
+XLU__PCI_ERR(cfg, "Unknown RDM type option: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_RESERVE_FLAG:
+if (*ptr == '=') {
+state = STATE_OPTIONS_V;
+*ptr = '\0';
+if (strcmp(tok, "reserve")) {
+XLU__PCI_ERR(cfg, "Unknown RDM property value: %s", tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+break;
+case STATE_OPTIONS_V:
+if (*ptr == ',' || *ptr == '\0') {
+state = STATE_TERMINAL;
+*ptr = '\0';
+if (!strcmp(tok, "strict")) {
+rdm->reserve = LIBXL_RDM_RESERVE_FLAG_STRICT;
+} else if (!strcmp(tok, "relaxed")) {
+rdm->reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+} else {
+XLU__PCI_ERR(cfg, "Unknown RDM property flag value: %s",
+ tok);
+goto parse_error;
+}
+tok = ptr + 1;
+}
+default:
+break;
+}
+}
+
+free(buf2);
+
+if (tok != ptr || state != STATE_TERMINAL)
+goto parse_error;
+
+return 0;
+

[Xen-devel] [v4][PATCH 08/19] hvmloader/e820: construct guest e820 table

2015-06-23 Thread Tiejun Chen

Now we can use that memory map to build our final
e820 table but it may need to reorder all e820
entries.

CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v4:

* Rename local variable, low_mem_pgend, to low_mem_end.

* Improve some code comments

* Adjust highmem after lowmem is changed.

 tools/firmware/hvmloader/e820.c | 80 +
 1 file changed, 66 insertions(+), 14 deletions(-)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 3e53c47..aa2569f 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -108,7 +108,9 @@ int build_e820_table(struct e820entry *e820,
  unsigned int lowmem_reserved_base,
  unsigned int bios_image_base)
 {
-unsigned int nr = 0;
+unsigned int nr = 0, i, j;
+uint64_t add_high_mem = 0;
+uint64_t low_mem_end = hvm_info->low_mem_pgend << PAGE_SHIFT;
 
 if ( !lowmem_reserved_base )
 lowmem_reserved_base = 0xA;
@@ -152,13 +154,6 @@ int build_e820_table(struct e820entry *e820,
 e820[nr].type = E820_RESERVED;
 nr++;
 
-/* Low RAM goes here. Reserve space for special pages. */
-BUG_ON((hvm_info->low_mem_pgend << PAGE_SHIFT) < (2u << 20));
-e820[nr].addr = 0x10;
-e820[nr].size = (hvm_info->low_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
-nr++;
-
 /*
  * Explicitly reserve space for special pages.
  * This space starts at RESERVED_MEMBASE an extends to cover various
@@ -194,16 +189,73 @@ int build_e820_table(struct e820entry *e820,
 nr++;
 }
 
-
-if ( hvm_info->high_mem_pgend )
+/*
+ * Construct E820 table according to recorded memory map.
+ *
+ * The memory map created by toolstack may include,
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ *
+ * #2. Reserved regions if they exist
+ *
+ * #3. High memory region if it exists
+ */
+for ( i = 0; i < memory_map.nr_map; i++ )
 {
-e820[nr].addr = ((uint64_t)1 << 32);
-e820[nr].size =
-((uint64_t)hvm_info->high_mem_pgend << PAGE_SHIFT) - e820[nr].addr;
-e820[nr].type = E820_RAM;
+e820[nr] = memory_map.map[i];
 nr++;
 }
 
+/* Low RAM goes here. Reserve space for special pages. */
+BUG_ON(low_mem_end < (2u << 20));
+
+/*
+ * We may need to adjust real lowmem end since we may
+ * populate RAM to get enough MMIO previously.
+ */
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+uint64_t end = e820[i].addr + e820[i].size;
+if ( e820[i].type == E820_RAM &&
+ low_mem_end > e820[i].addr && low_mem_end < end )
+{
+add_high_mem = end - low_mem_end;
+e820[i].size = low_mem_end - e820[i].addr;
+}
+}
+
+/*
+ * And then we also need to adjust highmem.
+ */
+if ( add_high_mem )
+{
+for ( i = 0; i < memory_map.nr_map; i++ )
+{
+if ( e820[i].type == E820_RAM &&
+ e820[i].addr > (1ull << 32))
+e820[i].size += add_high_mem;
+}
+}
+
+/* Finally we need to reorder all e820 entries. */
+for ( j = 0; j < nr-1; j++ )
+{
+for ( i = j+1; i < nr; i++ )
+{
+if ( e820[j].addr > e820[i].addr )
+{
+struct e820entry tmp;
+tmp = e820[j];
+e820[j] = e820[i];
+e820[i] = tmp;
+}
+}
+}
+
 return nr;
 }
 
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v4][PATCH 11/19] tools: introduce some new parameters to set rdm policy

2015-06-23 Thread Tiejun Chen

This patch introduces user configurable parameters to specify RDM
resource and according policies,

Global RDM parameter:
rdm = "type=none/host,reserve=strict/relaxed"
Per-device RDM parameter:
pci = [ 'sbdf, rdm_reserve=strict/relaxed' ]

Global RDM parameter, "type", allows user to specify reserved regions
explicitly, e.g. using 'host' to include all reserved regions reported
on this platform which is good to handle hotplug scenario. In the future
this parameter may be further extended to allow specifying random regions,
e.g. even those belonging to another platform as a preparation for live
migration with passthrough devices. Instead, 'none' means we have nothing
to do all reserved regions and ignore all policies, so guest work as before.

'strict/relaxed' policy decides how to handle conflict when reserving RDM
regions in pfn space. If conflict exists, 'strict' means an immediate error
so VM will be killed, while 'relaxed' allows moving forward with a warning
message thrown out.

Default per-device RDM policy is 'strict', while default global RDM policy
is 'relaxed'. When both policies are specified on a given region, 'strict' is
always preferred.

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v4:

* No need to define init_val for libxl_rdm_reserve_type since its just zero
* Grab those changes to xl/libxlu to as a final patch

 docs/man/xl.cfg.pod.5| 50 
 docs/misc/vtd.txt| 24 +
 tools/libxl/libxl_create.c   | 13 
 tools/libxl/libxl_internal.h |  2 ++
 tools/libxl/libxl_pci.c  |  3 +++
 tools/libxl/libxl_types.idl  | 18 
 6 files changed, 110 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index a3e0e2e..638b350 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -655,6 +655,49 @@ assigned slave device.
 
 =back
 
+=item B
+
+(HVM/x86 only) Specifies the information about Reserved Device Memory (RDM),
+which is necessary to enable robust device passthrough. One example of RDM
+is reported through ACPI Reserved Memory Region Reporting (RMRR) structure
+on x86 platform.
+
+B has the form C<[KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B
+
+Possible Bs are:
+
+=over 4
+
+=item B
+
+Currently we just have two types:
+
+"host" means all reserved device memory on this platform should be reserved
+in this VM's guest address space space. This global RDM parameter allows
+user to specify reserved regions explicitly. And using "host" to include all
+reserved regions reported on this platform which is good to handle hotplug
+scenario. In the future this parameter may be further extended to allow
+specifying random regions, e.g. even those belonging to another platform as
+a preparation for live migration with passthrough devices.
+
+"none" means we have nothing to do all reserved regions and ignore all 
policies,
+so guest work as before.
+
+=over 4
+
+=item B
+
+Conflict may be detected when reserving reserved device memory in guest address
+space. "strict" means an unsolved conflict leads to immediate VM crash, while
+"relaxed" allows VM moving forward with a warning message thrown out. "relaxed"
+is default.
+
+Note this may be overridden by rdm_reserve option in PCI device configuration.
+
 =item B
 
 Specifies the host PCI devices to passthrough to this guest. Each 
B
@@ -717,6 +760,13 @@ dom0 without confirmation.  Please use with care.
 D0-D3hot power management states for the PCI device. False (0) by
 default.
 
+=item B
+
+(HVM/x86 only) This is same as reserve option above but just specific
+to a given device, and "strict" is default here.
+
+Note this would override global B option.
+
 =back
 
 =back
diff --git a/docs/misc/vtd.txt b/docs/misc/vtd.txt
index 9af0e99..7d63c47 100644
--- a/docs/misc/vtd.txt
+++ b/docs/misc/vtd.txt
@@ -111,6 +111,30 @@ in the config file:
 To override for a specific device:
pci = [ '01:00.0,msitranslate=0', '03:00.0' ]
 
+RDM, 'reserved device memory', for PCI Device Passthrough
+-
+
+There are some devices the BIOS controls, for e.g. USB devices to perform
+PS2 emulation. The regions of memory used for these devices are marked
+reserved in the e820 map. When we turn on DMA translation, DMA to those
+regions will fail. Hence BIOS uses RMRR to specify these regions along with
+devices that need to access these regions. OS is expected to setup
+identity mappings for these regions for these devices to access these regions.
+
+While creating a VM we should reserve them in advance, and avoid any conflicts.
+So we introduce user configurable parameters to specify RDM resource and
+according policies,
+
+To enable this globally, add "rdm" in the config file:
+
+rdm = "type=host, reserve=relaxed"   (default policy is "relaxed")
+
+Or just for a specific device:
+
+pci = [ '01:00.0,rdm_r

[Xen-devel] [v4][PATCH 16/19] tools/libxl: extend XENMEM_set_memory_map

2015-06-23 Thread Tiejun Chen

Here we'll construct a basic guest e820 table via
XENMEM_set_memory_map. This table includes lowmem, highmem
and RDMs if they exist. And hvmloader would need this info
later.

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v4:

* Use goto style error handling.
* Instead of NOGC, we shoud use libxl__malloc(gc,XXX) to allocate local e820.

 tools/libxl/libxl_dom.c  |  5 +++
 tools/libxl/libxl_internal.h | 24 +
 tools/libxl/libxl_x86.c  | 83 
 3 files changed, 112 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 0987991..bc8fd5b 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -1004,6 +1004,11 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }
 
+if (libxl__domain_construct_e820(gc, d_config, domid, &args)) {
+LOG(ERROR, "setting domain memory map failed");
+goto out;
+}
+
 ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
&state->store_mfn, state->console_port,
&state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index c0acf11..ae2f5e0 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3714,6 +3714,30 @@ static inline void libxl__update_config_vtpm(libxl__gc 
*gc,
  */
 void libxl__bitmap_copy_best_effort(libxl__gc *gc, libxl_bitmap *dptr,
 const libxl_bitmap *sptr);
+
+/*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+int libxl__domain_construct_e820(libxl__gc *gc,
+ libxl_domain_config *d_config,
+ uint32_t domid,
+ struct xc_hvm_build_args *args);
+
 #endif
 
 /*
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index ed2bd38..be297b2 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -438,6 +438,89 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 }
 
 /*
+ * Here we're just trying to set these kinds of e820 mappings:
+ *
+ * #1. Low memory region
+ *
+ * Low RAM starts at least from 1M to make sure all standard regions
+ * of the PC memory map, like BIOS, VGA memory-mapped I/O and vgabios,
+ * have enough space.
+ * Note: Those stuffs below 1M are still constructed with multiple
+ * e820 entries by hvmloader. At this point we don't change anything.
+ *
+ * #2. RDM region if it exists
+ *
+ * #3. High memory region if it exists
+ *
+ * Note: these regions are not overlapping since we already check
+ * to adjust them. Please refer to libxl__domain_device_construct_rdm().
+ */
+#define GUEST_LOW_MEM_START_DEFAULT 0x10
+int libxl__domain_construct_e820(libxl__gc *gc,
+ libxl_domain_config *d_config,
+ uint32_t domid,
+ struct xc_hvm_build_args *args)
+{
+int rc = 0;
+unsigned int nr = 0, i;
+/* We always own at least one lowmem entry. */
+unsigned int e820_entries = 1;
+struct e820entry *e820 = NULL;
+uint64_t highmem_size =
+args->highmem_end ? args->highmem_end - (1ull << 32) : 0;
+
+/* Add all rdm entries. */
+for (i = 0; i < d_config->num_rdms; i++)
+if (d_config->rdms[i].flag != LIBXL_RDM_RESERVE_FLAG_INVALID)
+e820_entries++;
+
+
+/* If we should have a highmem range. */
+if (highmem_size)
+e820_entries++;
+
+if (e820_entries >= E820MAX) {
+LOG(ERROR, "Ooops! Too many entries in the memory map!\n");
+rc = ERROR_INVAL;
+goto out;
+}
+
+e820 = libxl__malloc(gc, sizeof(struct e820entry) * e820_entries);
+
+/* Low memory */
+e820[nr].addr = GUEST_LOW_MEM_START_DEFAULT;
+e820[nr].size = args->lowmem_end - GUEST_LOW_MEM_START_DEFAULT;
+e820[nr].type = E820_RAM;
+nr++;
+
+/* RDM mapping */
+for (i = 0; i < d_config->num_rdms; i++) {
+if (d_config->rdms[i].flag == LIBXL_RDM_RESERVE_FLAG_INVALID)
+continue;
+
+e820[nr].addr = d_config->rdms[i].start;
+e820[nr].size = d_config->rdms[i].size;
+e820[nr].type = E820_RESERVED;
+nr++;
+}
+
+/* High

[Xen-devel] [v4][PATCH 18/19] xen/vtd: prevent from assign the device with shared rmrr

2015-06-23 Thread Tiejun Chen

Currently we're intending to cover this kind of devices
with shared RMRR simply since the case of shared RMRR is
a rare case according to our previous experiences. But
late we can group these devices which shared rmrr, and
then allow all devices within a group to be assigned to
same domain.

CC: Yang Zhang 
CC: Kevin Tian 
Signed-off-by: Tiejun Chen 
Acked-by: Kevin Tian 
---
v4:

* Refine one code comment.

 xen/drivers/passthrough/vtd/iommu.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 07f5c7c..43ba131 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2291,13 +2291,39 @@ static int intel_iommu_assign_device(
 if ( list_empty(&acpi_drhd_units) )
 return -ENODEV;
 
+seg = pdev->seg;
+bus = pdev->bus;
+/*
+ * In rare cases one given rmrr is shared by multiple devices but
+ * obviously this would put the security of a system at risk. So
+ * we should prevent from this sort of device assignment.
+ *
+ * TODO: in the future we can introduce group device assignment
+ * interface to make sure devices sharing RMRR are assigned to the
+ * same domain together.
+ */
+for_each_rmrr_device( rmrr, bdf, i )
+{
+if ( rmrr->segment == seg &&
+ PCI_BUS(bdf) == bus &&
+ PCI_DEVFN2(bdf) == devfn )
+{
+if ( rmrr->scope.devices_cnt > 1 )
+{
+printk(XENLOG_G_ERR VTDPREFIX
+   " cannot assign %04x:%02x:%02x.%u"
+   " with shared RMRR for Dom%d.\n",
+   seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
+   d->domain_id);
+return -EPERM;
+}
+}
+}
+
 ret = reassign_device_ownership(hardware_domain, d, devfn, pdev);
 if ( ret )
 return ret;
 
-seg = pdev->seg;
-bus = pdev->bus;
-
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v4][PATCH 05/19] xen: enable XENMEM_memory_map in hvm

2015-06-23 Thread Tiejun Chen

This patch enables XENMEM_memory_map in hvm. So hvmloader can
use it to setup the e820 mappings.

CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
Signed-off-by: Tiejun Chen 
Reviewed-by: Tim Deegan 
Reviewed-by: Kevin Tian 
Acked-by: Jan Beulich 
---
v4:

* Just refine the patch head description as Jan commented.

 xen/arch/x86/hvm/hvm.c | 2 --
 xen/arch/x86/mm.c  | 6 --
 2 files changed, 8 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d5e5242..f63b01a 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4745,7 +4745,6 @@ static long hvm_memory_op(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd & MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
@@ -4821,7 +4820,6 @@ static long hvm_memory_op_compat32(int cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 
 switch ( cmd & MEMOP_CMD_MASK )
 {
-case XENMEM_memory_map:
 case XENMEM_machine_memory_map:
 case XENMEM_machphys_mapping:
 return -ENOSYS;
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 9e08c9b..fcb8682 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -4717,12 +4717,6 @@ long arch_memory_op(unsigned long cmd, 
XEN_GUEST_HANDLE_PARAM(void) arg)
 return rc;
 }
 
-if ( is_hvm_domain(d) )
-{
-rcu_unlock_domain(d);
-return -EPERM;
-}
-
 e820 = xmalloc_array(e820entry_t, fmap.map.nr_entries);
 if ( e820 == NULL )
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v4][PATCH 13/19] tools/libxc: check to set args.mmio_size before call xc_hvm_build

2015-06-23 Thread Tiejun Chen

After commit 5dff8e9eedc7, "libxc/libxl: fill xc_hvm_build_args in
libxl" is introduced, we won't check to set args.mmio_size inside
xc_hvm_build as before. So instead, we need to do this before call
that.

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v4:

* Separate this from currenpt patch #14 since this is specific to xc.

 tools/libxc/xc_hvm_build_x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index 003ea06..7343e87 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -754,6 +754,8 @@ int xc_hvm_build_target_mem(xc_interface *xch,
 args.mem_size = (uint64_t)memsize << 20;
 args.mem_target = (uint64_t)target << 20;
 args.image_file_name = image_name;
+if ( args.mmio_size == 0 )
+args.mmio_size = HVM_BELOW_4G_MMIO_LENGTH;
 
 return xc_hvm_build(xch, domid, &args);
 }
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v4][PATCH 02/19] xen/x86/p2m: introduce set_identity_p2m_entry

2015-06-23 Thread Tiejun Chen

We will create this sort of identity mapping as follows:

If the gfn space is unoccupied, we just set the mapping. If space
is already occupied by desired identity mapping, do nothing.
Otherwise, failure is returned.

And we also add a returning value to guest_physmap_remove_page()
then make that as a better helper to clear such a p2m entry.

CC: Tim Deegan 
CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
Signed-off-by: Tiejun Chen 
Reviewed-by: Kevin Tian 
---
v4:

* Change that orginal condition,

  if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
  
  to make sure we catch those invalid mfn mapping as we expected.

* To have

  if ( !paging_mode_translate(p2m->domain) )
return 0;

  at the start, instead of indenting the whole body of the function
  in an inner scope. 

* extend guest_physmap_remove_page() to return a value as a proper
  unmapping helper

 xen/arch/x86/mm/p2m.c | 40 ++--
 xen/include/asm-x86/p2m.h | 10 +++---
 2 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1fd1194..7e50db6 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -584,14 +584,16 @@ p2m_remove_page(struct p2m_domain *p2m, unsigned long 
gfn, unsigned long mfn,
  p2m->default_access);
 }
 
-void
+int
 guest_physmap_remove_page(struct domain *d, unsigned long gfn,
   unsigned long mfn, unsigned int page_order)
 {
 struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int rc;
 gfn_lock(p2m, gfn, page_order);
-p2m_remove_page(p2m, gfn, mfn, page_order);
+rc = p2m_remove_page(p2m, gfn, mfn, page_order);
 gfn_unlock(p2m, gfn, page_order);
+return rc;
 }
 
 int
@@ -898,6 +900,40 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long 
gfn, mfn_t mfn,
 return set_typed_p2m_entry(d, gfn, mfn, p2m_mmio_direct, access);
 }
 
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+   p2m_access_t p2ma)
+{
+p2m_type_t p2mt;
+p2m_access_t a;
+mfn_t mfn;
+struct p2m_domain *p2m = p2m_get_hostp2m(d);
+int ret;
+
+if ( !paging_mode_translate(p2m->domain) )
+return 0;
+
+gfn_lock(p2m, gfn, 0);
+
+mfn = p2m->get_entry(p2m, gfn, &p2mt, &a, 0, NULL);
+
+if ( p2mt == p2m_invalid || p2mt == p2m_mmio_dm )
+ret = p2m_set_entry(p2m, gfn, _mfn(gfn), PAGE_ORDER_4K,
+p2m_mmio_direct, p2ma);
+else if ( mfn_x(mfn) == gfn && p2mt == p2m_mmio_direct && a == p2ma )
+ret = 0;
+else
+{
+ret = -EBUSY;
+printk(XENLOG_G_WARNING
+   "Cannot setup identity map d%d:%lx,"
+   " gfn already mapped to %lx.\n",
+   d->domain_id, gfn, mfn_x(mfn));
+}
+
+gfn_unlock(p2m, gfn, 0);
+return ret;
+}
+
 /* Returns: 0 for success, -errno for failure */
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn)
 {
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index b49c09b..538a1cf 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -503,9 +503,9 @@ static inline int guest_physmap_add_page(struct domain *d,
 }
 
 /* Remove a page from a domain's p2m table */
-void guest_physmap_remove_page(struct domain *d,
-   unsigned long gfn,
-   unsigned long mfn, unsigned int page_order);
+int guest_physmap_remove_page(struct domain *d,
+  unsigned long gfn,
+  unsigned long mfn, unsigned int page_order);
 
 /* Set a p2m range as populate-on-demand */
 int guest_physmap_mark_populate_on_demand(struct domain *d, unsigned long gfn,
@@ -543,6 +543,10 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long 
gfn, mfn_t mfn,
p2m_access_t access);
 int clear_mmio_p2m_entry(struct domain *d, unsigned long gfn, mfn_t mfn);
 
+/* Set identity addresses in the p2m table (for pass-through) */
+int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
+   p2m_access_t p2ma);
+
 /* Add foreign mapping to the guest's p2m table. */
 int p2m_add_foreign(struct domain *tdom, unsigned long fgfn,
 unsigned long gpfn, domid_t foreign_domid);
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v4][PATCH 17/19] xen/vtd: enable USB device assignment

2015-06-23 Thread Tiejun Chen

USB RMRR may conflict with guest BIOS region. In such case, identity
mapping setup is simply skipped in previous implementation. Now we
can handle this scenario cleanly with new policy mechanism so previous
hack code can be removed now.

CC: Yang Zhang 
CC: Kevin Tian 
Signed-off-by: Tiejun Chen 
Acked-by: Kevin Tian 
---
v4:

* Refine the patch head description

 xen/drivers/passthrough/vtd/dmar.h  |  1 -
 xen/drivers/passthrough/vtd/iommu.c | 11 ++-
 xen/drivers/passthrough/vtd/utils.c |  7 ---
 3 files changed, 2 insertions(+), 17 deletions(-)

diff --git a/xen/drivers/passthrough/vtd/dmar.h 
b/xen/drivers/passthrough/vtd/dmar.h
index af1feef..af205f5 100644
--- a/xen/drivers/passthrough/vtd/dmar.h
+++ b/xen/drivers/passthrough/vtd/dmar.h
@@ -129,7 +129,6 @@ do {\
 
 int vtd_hw_check(void);
 void disable_pmr(struct iommu *iommu);
-int is_usb_device(u16 seg, u8 bus, u8 devfn);
 int is_igd_drhd(struct acpi_drhd_unit *drhd);
 
 #endif /* _DMAR_H_ */
diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c
index 59d5fd7..07f5c7c 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2239,11 +2239,9 @@ static int reassign_device_ownership(
 /*
  * If the device belongs to the hardware domain, and it has RMRR, don't
  * remove it from the hardware domain, because BIOS may use RMRR at
- * booting time. Also account for the special casing of USB below (in
- * intel_iommu_assign_device()).
+ * booting time.
  */
-if ( !is_hardware_domain(source) &&
- !is_usb_device(pdev->seg, pdev->bus, pdev->devfn) )
+if ( !is_hardware_domain(source) )
 {
 const struct acpi_rmrr_unit *rmrr;
 u16 bdf;
@@ -2297,13 +2295,8 @@ static int intel_iommu_assign_device(
 if ( ret )
 return ret;
 
-/* FIXME: Because USB RMRR conflicts with guest bios region,
- * ignore USB RMRR temporarily.
- */
 seg = pdev->seg;
 bus = pdev->bus;
-if ( is_usb_device(seg, bus, pdev->devfn) )
-return 0;
 
 /* Setup rmrr identity mapping */
 for_each_rmrr_device( rmrr, bdf, i )
diff --git a/xen/drivers/passthrough/vtd/utils.c 
b/xen/drivers/passthrough/vtd/utils.c
index bd14c02..b8a077f 100644
--- a/xen/drivers/passthrough/vtd/utils.c
+++ b/xen/drivers/passthrough/vtd/utils.c
@@ -29,13 +29,6 @@
 #include "extern.h"
 #include 
 
-int is_usb_device(u16 seg, u8 bus, u8 devfn)
-{
-u16 class = pci_conf_read16(seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
-PCI_CLASS_DEVICE);
-return (class == 0xc03);
-}
-
 /* Disable vt-d protected memory registers. */
 void disable_pmr(struct iommu *iommu)
 {
-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [v4][PATCH 14/19] tools/libxl: detect and avoid conflicts with RDM

2015-06-23 Thread Tiejun Chen

While building a VM, HVM domain builder provides struct hvm_info_table{}
to help hvmloader. Currently it includes two fields to construct guest
e820 table by hvmloader, low_mem_pgend and high_mem_pgend. So we should
check them to fix any conflict with RAM.

RMRR can reside in address space beyond 4G theoretically, but we never
see this in real world. So in order to avoid breaking highmem layout
we don't solve highmem conflict. Note this means highmem rmrr could still
be supported if no conflict.

But in the case of lowmem, RMRR probably scatter the whole RAM space.
Especially multiple RMRR entries would worsen this to lead a complicated
memory layout. And then its hard to extend hvm_info_table{} to work
hvmloader out. So here we're trying to figure out a simple solution to
avoid breaking existing layout. So when a conflict occurs,

#1. Above a predefined boundary (2G)
- move lowmem_end below reserved region to solve conflict;

#2. Below a predefined boundary (2G)
- Check strict/relaxed policy.
"strict" policy leads to fail libxl. Note when both policies
are specified on a given region, 'strict' is always preferred.
"relaxed" policy issue a warning message and also mask this entry 
INVALID
to indicate we shouldn't expose this entry to hvmloader.

Note later we need to provide a parameter to set that predefined boundary
dynamically.

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
Reviewed-by: Kevin Tian 
---
v4:

* Consistent to use term "RDM".
* Unconditionally set *nr_entries to 0
* Grab to all sutffs to provide a parameter to set our predefined boundary
  dynamically to as a separated patch later

 tools/libxl/libxl_create.c   |   2 +-
 tools/libxl/libxl_dm.c   | 259 +++
 tools/libxl/libxl_dom.c  |  17 ++-
 tools/libxl/libxl_internal.h |  11 +-
 tools/libxl/libxl_types.idl  |   7 ++
 5 files changed, 293 insertions(+), 3 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 6c8ec63..30e6593 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -460,7 +460,7 @@ int libxl__domain_build(libxl__gc *gc,
 
 switch (info->type) {
 case LIBXL_DOMAIN_TYPE_HVM:
-ret = libxl__build_hvm(gc, domid, info, state);
+ret = libxl__build_hvm(gc, domid, d_config, state);
 if (ret)
 goto out;
 
diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 33f9ce6..5436bcf 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -90,6 +90,265 @@ const char *libxl__domain_device_model(libxl__gc *gc,
 return dm;
 }
 
+static struct xen_reserved_device_memory
+*xc_device_get_rdm(libxl__gc *gc,
+   uint32_t flag,
+   uint16_t seg,
+   uint8_t bus,
+   uint8_t devfn,
+   unsigned int *nr_entries)
+{
+struct xen_reserved_device_memory *xrdm;
+int rc;
+
+/*
+ * We really can't presume how many entries we can get in advance.
+ */
+*nr_entries = 0;
+rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+   NULL, nr_entries);
+assert(rc <= 0);
+/* "0" means we have no any rdm entry. */
+if (!rc)
+goto out;
+
+if (errno == ENOBUFS) {
+xrdm = malloc(*nr_entries * sizeof(xen_reserved_device_memory_t));
+if (!xrdm) {
+LOG(ERROR, "Could not allocate RDM buffer!\n");
+goto out;
+}
+rc = xc_reserved_device_memory_map(CTX->xch, flag, seg, bus, devfn,
+   xrdm, nr_entries);
+if (rc) {
+LOG(ERROR, "Could not get reserved device memory maps.\n");
+*nr_entries = 0;
+free(xrdm);
+xrdm = NULL;
+}
+} else
+LOG(ERROR, "Could not get reserved device memory maps.\n");
+
+ out:
+return xrdm;
+}
+
+/*
+ * Check whether there exists rdm hole in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+static bool overlaps_rdm(uint64_t start, uint64_t memsize,
+ uint64_t rdm_start, uint64_t rdm_size)
+{
+return (start + memsize > rdm_start) && (start < rdm_start + rdm_size);
+}
+
+/*
+ * Check reported RDM regions and handle potential gfn conflicts according
+ * to user preferred policy.
+ *
+ * RDM can reside in address space beyond 4G theoretically, but we never
+ * see this in real world. So in order to avoid breaking highmem layout
+ * we don't solve highmem conflict. Note this means highmem rmrr could still
+ * be supported if no conflict.
+ *
+ * But in the case of lowmem, RDM probably scatter the whole RAM space.
+ * Especially multiple RDM entries would worsen this to lead a complicated
+ * memory layout. And then its hard to extend hvm_info_table{} to work
+ * hvmloader out.

[Xen-devel] [v4][PATCH 15/19] tools: introduce a new parameter to set a predefined rdm boundary

2015-06-23 Thread Tiejun Chen

Previously we always fix that predefined boundary as 2G to handle
conflict between memory and rdm, but now this predefined boundar
can be changes with the parameter "rdm_mem_boundary" in .cfg file.

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v4:

* Separated from the previous patch to provide a parameter to set that
  predefined boundary dynamically.

 docs/man/xl.cfg.pod.5   | 21 +
 tools/libxl/libxl.h |  6 ++
 tools/libxl/libxl_create.c  |  4 
 tools/libxl/libxl_dom.c |  8 +---
 tools/libxl/libxl_types.idl |  1 +
 tools/libxl/xl_cmdimpl.c|  3 +++
 6 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 638b350..079465f 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -767,6 +767,27 @@ to a given device, and "strict" is default here.
 
 Note this would override global B option.
 
+=item B
+
+Number of megabytes to set a boundary for checking rdm conflict.
+
+When RDM conflicts with RAM, RDM probably scatter the whole RAM space.
+Especially multiple RDM entries would worsen this to lead a complicated
+memory layout. So here we're trying to figure out a simple solution to
+avoid breaking existing layout. So when a conflict occurs,
+
+#1. Above a predefined boundary
+- move lowmem_end below reserved region to solve conflict;
+
+#2. Below a predefined boundary
+- Check strict/relaxed policy.
+"strict" policy leads to fail libxl. Note when both policies
+are specified on a given region, 'strict' is always preferred.
+"relaxed" policy issue a warning message and also mask this entry 
INVALID
+to indicate we shouldn't expose this entry to hvmloader.
+
+Here the default is 2G.
+
 =back
 
 =back
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a7913b..a6212fb 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -858,6 +858,12 @@ const char *libxl_defbool_to_string(libxl_defbool b);
 #define LIBXL_TIMER_MODE_DEFAULT -1
 #define LIBXL_MEMKB_DEFAULT ~0ULL
 
+/*
+ * We'd like to set a memory boundary to determine if we need to check
+ * any overlap with reserved device memory.
+ */
+#define LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT (2048 * 1024)
+
 #define LIBXL_MS_VM_GENID_LEN 16
 typedef struct {
 uint8_t bytes[LIBXL_MS_VM_GENID_LEN];
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 30e6593..0438731 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -109,6 +109,10 @@ void libxl__rdm_setdefault(libxl__gc *gc, 
libxl_domain_build_info *b_info)
 {
 if (b_info->rdm.reserve == LIBXL_RDM_RESERVE_FLAG_INVALID)
 b_info->rdm.reserve = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+
+if (b_info->rdm_mem_boundary_memkb == LIBXL_MEMKB_DEFAULT)
+b_info->rdm_mem_boundary_memkb =
+LIBXL_RDM_MEM_BOUNDARY_MEMKB_DEFAULT;
 }
 
 int libxl__domain_build_info_setdefault(libxl__gc *gc,
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index 34bd466..0987991 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -922,12 +922,6 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 int ret, rc = ERROR_FAIL;
 uint64_t mmio_start, lowmem_end, highmem_end;
 libxl_domain_build_info *const info = &d_config->b_info;
-/*
- * Currently we fix this as 2G to guarantte how to handle
- * our rdm policy. But we'll provide a parameter to set
- * this dynamically.
- */
-uint64_t rdm_mem_boundary = 0x8000;
 
 memset(&args, 0, sizeof(struct xc_hvm_build_args));
 /* The params from the configuration file are in Mb, which are then
@@ -966,7 +960,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 args.mmio_start = mmio_start;
 
 ret = libxl__domain_device_construct_rdm(gc, d_config,
- rdm_mem_boundary,
+ info->rdm_mem_boundary_memkb*1024,
  &args);
 if (ret) {
 LOG(ERROR, "checking reserved device memory failed");
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 5ba075d..d130d48 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -395,6 +395,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
 ("target_memkb",MemKB),
 ("video_memkb", MemKB),
 ("shadow_memkb",MemKB),
+("rdm_mem_boundary_memkb",MemKB),
 ("rtc_timeoffset",  uint32),
 ("exec_ssidref",uint32),
 ("exec_ssid_label", string),
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 5637c30..c7a12b1 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -1374,6 +1374,9 @@ static void parse_config_data(const char *config_source,
 if (!xlu_cfg_get_long (config, "videoram", &l, 0))

[Xen-devel] [v4][PATCH 10/19] tools: extend xc_assign_device() to support rdm reservation policy

2015-06-23 Thread Tiejun Chen

This patch passes rdm reservation policy to xc_assign_device() so the policy
is checked when assigning devices to a VM.

Note this also bring some fallout to python usage of xc_assign_device().

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
CC: David Scott 
Signed-off-by: Tiejun Chen 
---
v4:

* In the patch head description, I add to explain why we need to sync
  the xc.c file

 tools/libxc/include/xenctrl.h   |  3 ++-
 tools/libxc/xc_domain.c |  6 +-
 tools/libxl/libxl_pci.c |  3 ++-
 tools/ocaml/libs/xc/xenctrl_stubs.c | 18 ++
 tools/python/xen/lowlevel/xc/xc.c   | 29 +++--
 5 files changed, 42 insertions(+), 17 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 9160623..89cbc5a 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2079,7 +2079,8 @@ int xc_hvm_destroy_ioreq_server(xc_interface *xch,
 /* HVM guest pass-through */
 int xc_assign_device(xc_interface *xch,
  uint32_t domid,
- uint32_t machine_sbdf);
+ uint32_t machine_sbdf,
+ uint32_t flag);
 
 int xc_get_device_group(xc_interface *xch,
  uint32_t domid,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 0951291..40ff0f4 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1697,7 +1697,8 @@ int xc_domain_setdebugging(xc_interface *xch,
 int xc_assign_device(
 xc_interface *xch,
 uint32_t domid,
-uint32_t machine_sbdf)
+uint32_t machine_sbdf,
+uint32_t flag)
 {
 DECLARE_DOMCTL;
 
@@ -1705,6 +1706,7 @@ int xc_assign_device(
 domctl.domain = domid;
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
 domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.flag = flag;
 
 return do_domctl(xch, &domctl);
 }
@@ -1792,6 +1794,8 @@ int xc_assign_dt_device(
 
 domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
 domctl.u.assign_device.u.dt.size = size;
+/* DT doesn't own any RDM. */
+domctl.u.assign_device.flag = XEN_DOMCTL_DEV_NO_RDM;
 set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
 
 rc = do_domctl(xch, &domctl);
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index e0743f8..632c15e 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,6 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
+uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -987,7 +988,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 
 out:
 if (!libxl_is_stubdom(ctx, domid, NULL)) {
-rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev));
+rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), 
flag);
 if (rc < 0 && (hvm || errno != ENOSYS)) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
 return ERROR_FAIL;
diff --git a/tools/ocaml/libs/xc/xenctrl_stubs.c 
b/tools/ocaml/libs/xc/xenctrl_stubs.c
index 64f1137..317bf75 100644
--- a/tools/ocaml/libs/xc/xenctrl_stubs.c
+++ b/tools/ocaml/libs/xc/xenctrl_stubs.c
@@ -1172,12 +1172,19 @@ CAMLprim value stub_xc_domain_test_assign_device(value 
xch, value domid, value d
CAMLreturn(Val_bool(ret == 0));
 }
 
-CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc)
+static int domain_assign_device_rdm_flag_table[] = {
+XEN_DOMCTL_DEV_NO_RDM,
+XEN_DOMCTL_DEV_RDM_RELAXED,
+XEN_DOMCTL_DEV_RDM_STRICT,
+};
+
+CAMLprim value stub_xc_domain_assign_device(value xch, value domid, value desc,
+value rflag)
 {
-   CAMLparam3(xch, domid, desc);
+   CAMLparam4(xch, domid, desc, rflag);
int ret;
int domain, bus, dev, func;
-   uint32_t sbdf;
+   uint32_t sbdf, flag;
 
domain = Int_val(Field(desc, 0));
bus = Int_val(Field(desc, 1));
@@ -1185,7 +1192,10 @@ CAMLprim value stub_xc_domain_assign_device(value xch, 
value domid, value desc)
func = Int_val(Field(desc, 3));
sbdf = encode_sbdf(domain, bus, dev, func);
 
-   ret = xc_assign_device(_H(xch), _D(domid), sbdf);
+   ret = Int_val(Field(rflag, 0));
+   flag = domain_assign_device_rdm_flag_table[ret];
+
+   ret = xc_assign_device(_H(xch), _D(domid), sbdf, flag);
 
if (ret < 0)
failwith_xc(_H(xch));
diff --git a/tools/python/xen/lowlevel/xc/xc.c 
b/tools/python/xen/lowlevel/xc/xc.c
index c77e15b..172bdf0 100644
--- a/tools/python/xen/lowlevel/xc/xc.c
+++ b/tools/python/xen/lowlevel/xc/xc.c
@@ -592,7 +592,8 @@ static int token_value(char *token)

[Xen-devel] [v4][PATCH 06/19] hvmloader: get guest memory map into memory_map[]

2015-06-23 Thread Tiejun Chen

Now we get this map layout by call XENMEM_memory_map then
save them into one global variable memory_map[]. It should
include lowmem range, rdm range and highmem range. Note
rdm range and highmem range may not exist in some cases.

And here we need to check if any reserved memory conflicts with
[RESERVED_MEMORY_DYNAMIC_START - 1, RESERVED_MEMORY_DYNAMIC_END].
This range is used to allocate memory in hvmloder level, and
we would lead hvmloader failed in case of conflict since its
another rare possibility in real world.

CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
Reviewed-by: Kevin Tian 
---
v4:

* Move some codes related to e820 to that specific file, e820.c.

* Consolidate "printf()+BUG()" and "BUG_ON()"

* Avoid another fixed width type for the parameter of get_mem_mapping_layout()

 tools/firmware/hvmloader/e820.c  | 35 +++
 tools/firmware/hvmloader/e820.h  |  7 +++
 tools/firmware/hvmloader/hvmloader.c |  2 ++
 tools/firmware/hvmloader/util.c  | 26 ++
 tools/firmware/hvmloader/util.h  | 12 
 5 files changed, 82 insertions(+)

diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 2e05e93..3e53c47 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -23,6 +23,41 @@
 #include "config.h"
 #include "util.h"
 
+struct e820map memory_map;
+
+void memory_map_setup(void)
+{
+unsigned int nr_entries = E820MAX, i;
+int rc;
+uint64_t alloc_addr = RESERVED_MEMORY_DYNAMIC_START - 1;
+uint64_t alloc_size = RESERVED_MEMORY_DYNAMIC_END - alloc_addr;
+
+rc = get_mem_mapping_layout(memory_map.map, &nr_entries);
+
+if ( rc || !nr_entries )
+{
+printf("Get guest memory maps[%d] failed. (%d)\n", nr_entries, rc);
+BUG();
+}
+
+memory_map.nr_map = nr_entries;
+
+for ( i = 0; i < nr_entries; i++ )
+{
+if ( memory_map.map[i].type == E820_RESERVED )
+{
+if ( check_overlap(alloc_addr, alloc_size,
+   memory_map.map[i].addr,
+   memory_map.map[i].size) )
+{
+printf("Fail to setup memory map due to conflict");
+printf(" on dynamic reserved memory range.\n");
+BUG();
+}
+}
+}
+}
+
 void dump_e820_table(struct e820entry *e820, unsigned int nr)
 {
 uint64_t last_end = 0, start, end;
diff --git a/tools/firmware/hvmloader/e820.h b/tools/firmware/hvmloader/e820.h
index b2ead7f..8b5a9e0 100644
--- a/tools/firmware/hvmloader/e820.h
+++ b/tools/firmware/hvmloader/e820.h
@@ -15,6 +15,13 @@ struct e820entry {
 uint32_t type;
 } __attribute__((packed));
 
+#define E820MAX128
+
+struct e820map {
+unsigned int nr_map;
+struct e820entry map[E820MAX];
+};
+
 #endif /* __HVMLOADER_E820_H__ */
 
 /*
diff --git a/tools/firmware/hvmloader/hvmloader.c 
b/tools/firmware/hvmloader/hvmloader.c
index 25b7f08..84c588c 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -262,6 +262,8 @@ int main(void)
 
 init_hypercalls();
 
+memory_map_setup();
+
 xenbus_setup();
 
 bios = detect_bios();
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 80d822f..122e3fa 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -27,6 +27,17 @@
 #include 
 #include 
 
+/*
+ * Check whether there exists overlap in the specified memory range.
+ * Returns true if exists, else returns false.
+ */
+bool check_overlap(uint64_t start, uint64_t size,
+   uint64_t reserved_start, uint64_t reserved_size)
+{
+return (start + size > reserved_start) &&
+(start < reserved_start + reserved_size);
+}
+
 void wrmsr(uint32_t idx, uint64_t v)
 {
 asm volatile (
@@ -368,6 +379,21 @@ uuid_to_string(char *dest, uint8_t *uuid)
 *p = '\0';
 }
 
+int get_mem_mapping_layout(struct e820entry entries[], uint32_t *max_entries)
+{
+int rc;
+struct xen_memory_map memmap = {
+.nr_entries = *max_entries
+};
+
+set_xen_guest_handle(memmap.buffer, entries);
+
+rc = hypercall_memory_op(XENMEM_memory_map, &memmap);
+*max_entries = memmap.nr_entries;
+
+return rc;
+}
+
 void mem_hole_populate_ram(xen_pfn_t mfn, uint32_t nr_mfns)
 {
 static int over_allocated;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index f99c0f19..1100a3b 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -4,8 +4,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
+#include "e820.h"
 
 #define __STR(...) #__VA_ARGS__
 #define STR(...) __STR(__VA_ARGS__)
@@ -222,6 +224,9 @@ int hvm_param_set(uint32_t index, uint64_t value);
 /* Setup PCI bus */
 void pci

[Xen-devel] [v4][PATCH 04/19] xen/passthrough: extend hypercall to support rdm reservation policy

2015-06-23 Thread Tiejun Chen

This patch extends the existing hypercall to support rdm reservation policy.
We return error or just throw out a warning message depending on whether
the policy is "strict" or "relaxed" when reserving RDM regions in pfn space.
Note in some special cases, e.g. add a device to hwdomain, and remove a
device from user domain, 'relaxed' is fine enough since this is always safe
to hwdomain.

CC: Tim Deegan 
CC: Keir Fraser 
CC: Jan Beulich 
CC: Andrew Cooper 
CC: Suravee Suthikulpanit 
CC: Aravind Gopalakrishnan 
CC: Ian Campbell 
CC: Stefano Stabellini 
CC: Yang Zhang 
CC: Kevin Tian 
Signed-off-by: Tiejun Chen 
---
v4:

* Add code comments to describer why we fix to set a policy flag in some
  cases like adding a device to hwdomain, and removing a device from user 
domain.

* Avoid using fixed width types for the parameter of set_identity_p2m_entry()

* Fix one judging condition
  domctl->u.assign_device.flag == XEN_DOMCTL_DEV_NO_RDM
  -> domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM

* Add to range check the flag passed to make future extensions possible
  (and to avoid ambiguity on what out of range values would mean).

 xen/arch/x86/mm/p2m.c   |  7 --
 xen/drivers/passthrough/amd/pci_amd_iommu.c |  3 ++-
 xen/drivers/passthrough/arm/smmu.c  |  2 +-
 xen/drivers/passthrough/device_tree.c   | 11 +-
 xen/drivers/passthrough/pci.c   | 15 +
 xen/drivers/passthrough/vtd/iommu.c | 34 ++---
 xen/include/asm-x86/p2m.h   |  2 +-
 xen/include/public/domctl.h |  5 +
 xen/include/xen/iommu.h |  2 +-
 9 files changed, 62 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 7e50db6..a3e07d3 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -901,7 +901,7 @@ int set_mmio_p2m_entry(struct domain *d, unsigned long gfn, 
mfn_t mfn,
 }
 
 int set_identity_p2m_entry(struct domain *d, unsigned long gfn,
-   p2m_access_t p2ma)
+   p2m_access_t p2ma, unsigned int flag)
 {
 p2m_type_t p2mt;
 p2m_access_t a;
@@ -923,7 +923,10 @@ int set_identity_p2m_entry(struct domain *d, unsigned long 
gfn,
 ret = 0;
 else
 {
-ret = -EBUSY;
+if ( flag == XEN_DOMCTL_DEV_RDM_STRICT )
+ret = -EBUSY;
+else
+ret = 0;
 printk(XENLOG_G_WARNING
"Cannot setup identity map d%d:%lx,"
" gfn already mapped to %lx.\n",
diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
b/xen/drivers/passthrough/amd/pci_amd_iommu.c
index e83bb35..920b35a 100644
--- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
+++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
@@ -394,7 +394,8 @@ static int reassign_device(struct domain *source, struct 
domain *target,
 }
 
 static int amd_iommu_assign_device(struct domain *d, u8 devfn,
-   struct pci_dev *pdev)
+   struct pci_dev *pdev,
+   u32 flag)
 {
 struct ivrs_mappings *ivrs_mappings = get_ivrs_mappings(pdev->seg);
 int bdf = PCI_BDF2(pdev->bus, devfn);
diff --git a/xen/drivers/passthrough/arm/smmu.c 
b/xen/drivers/passthrough/arm/smmu.c
index 6cc4394..9a667e9 100644
--- a/xen/drivers/passthrough/arm/smmu.c
+++ b/xen/drivers/passthrough/arm/smmu.c
@@ -2605,7 +2605,7 @@ static void arm_smmu_destroy_iommu_domain(struct 
iommu_domain *domain)
 }
 
 static int arm_smmu_assign_dev(struct domain *d, u8 devfn,
-  struct device *dev)
+  struct device *dev, u32 flag)
 {
struct iommu_domain *domain;
struct arm_smmu_xen_domain *xen_domain;
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 5d3842a..e286f1e 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -52,7 +52,8 @@ int iommu_assign_dt_device(struct domain *d, struct 
dt_device_node *dev)
 goto fail;
 }
 
-rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev));
+rc = hd->platform_ops->assign_device(d, 0, dt_to_dev(dev),
+ XEN_DOMCTL_DEV_NO_RDM);
 
 if ( rc )
 goto fail;
@@ -148,6 +149,14 @@ int iommu_do_dt_domctl(struct xen_domctl *domctl, struct 
domain *d,
 if ( domctl->u.assign_device.dev != XEN_DOMCTL_DEV_DT )
 break;
 
+if ( domctl->u.assign_device.flag != XEN_DOMCTL_DEV_NO_RDM )
+{
+printk(XENLOG_G_ERR "XEN_DOMCTL_assign_device: assign \"%s\""
+   " to dom%u failed (%d) since we don't support RDM.\n",
+   dt_node_full_name(dev), d->domain_id, ret);
+break;
+}
+
 if ( unlikely(d->is_dying) )
 {
 ret = -EINVAL;
diff --git a/xen/drivers/pass

[Xen-devel] [v4][PATCH 12/19] tools/libxl: passes rdm reservation policy

2015-06-23 Thread Tiejun Chen

This patch passes our rdm reservation policy inside libxl
when we assign a device or attach a device.

CC: Ian Jackson 
CC: Stefano Stabellini 
CC: Ian Campbell 
CC: Wei Liu 
Signed-off-by: Tiejun Chen 
---
v4:

* Fix one typo, s/unkwon/unknown
* In command description, we should use "[]" to indicate it's optional
  for that extended xl command, pci-attach.

 docs/man/xl.pod.1 |  7 ++-
 tools/libxl/libxl_pci.c   | 10 +-
 tools/libxl/xl_cmdimpl.c  | 23 +++
 tools/libxl/xl_cmdtable.c |  2 +-
 4 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 4eb929d..c5c4809 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1368,10 +1368,15 @@ it will also attempt to re-bind the device to its 
original driver, making it
 usable by Domain 0 again.  If the device is not bound to pciback, it will
 return success.
 
-=item B I I
+=item B I I [I]
 
 Hot-plug a new pass-through pci device to the specified domain.
 B is the PCI Bus/Device/Function of the physical device to pass-through.
+B is about how to handle conflict between reserving reserved device
+memory and guest address space. "strict" means an unsolved conflict leads to
+immediate VM crash, while "relaxed" allows VM moving forward with a warning
+message thrown out. Here "strict" is default.
+
 
 =item B [I<-f>] I I
 
diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c
index a00d799..a6a2a8c 100644
--- a/tools/libxl/libxl_pci.c
+++ b/tools/libxl/libxl_pci.c
@@ -894,7 +894,7 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 FILE *f;
 unsigned long long start, end, flags, size;
 int irq, i, rc, hvm = 0;
-uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
+uint32_t flag;
 
 if (type == LIBXL_DOMAIN_TYPE_INVALID)
 return ERROR_FAIL;
@@ -988,6 +988,14 @@ static int do_pci_add(libxl__gc *gc, uint32_t domid, 
libxl_device_pci *pcidev, i
 
 out:
 if (!libxl_is_stubdom(ctx, domid, NULL)) {
+if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_RELAXED) {
+flag = XEN_DOMCTL_DEV_RDM_RELAXED;
+} else if (pcidev->rdm_reserve == LIBXL_RDM_RESERVE_FLAG_STRICT) {
+flag = XEN_DOMCTL_DEV_RDM_STRICT;
+} else {
+LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "unknown rdm check flag.");
+return ERROR_FAIL;
+}
 rc = xc_assign_device(ctx->xch, domid, pcidev_encode_bdf(pcidev), 
flag);
 if (rc < 0 && (hvm || errno != ENOSYS)) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_assign_device failed");
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index c858068..5637c30 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -3349,7 +3349,8 @@ int main_pcidetach(int argc, char **argv)
 pcidetach(domid, bdf, force);
 return 0;
 }
-static void pciattach(uint32_t domid, const char *bdf, const char *vs)
+static void pciattach(uint32_t domid, const char *bdf, const char *vs,
+  uint32_t flag)
 {
 libxl_device_pci pcidev;
 XLU_Config *config;
@@ -3359,6 +3360,7 @@ static void pciattach(uint32_t domid, const char *bdf, 
const char *vs)
 config = xlu_cfg_init(stderr, "command line");
 if (!config) { perror("xlu_cfg_inig"); exit(-1); }
 
+pcidev.rdm_reserve = flag;
 if (xlu_pci_parse_bdf(config, &pcidev, bdf)) {
 fprintf(stderr, "pci-attach: malformed BDF specification \"%s\"\n", 
bdf);
 exit(2);
@@ -3371,9 +3373,9 @@ static void pciattach(uint32_t domid, const char *bdf, 
const char *vs)
 
 int main_pciattach(int argc, char **argv)
 {
-uint32_t domid;
+uint32_t domid, flag;
 int opt;
-const char *bdf = NULL, *vs = NULL;
+const char *bdf = NULL, *vs = NULL, *rdm_policy = NULL;
 
 SWITCH_FOREACH_OPT(opt, "", NULL, "pci-attach", 2) {
 /* No options */
@@ -3385,7 +3387,20 @@ int main_pciattach(int argc, char **argv)
 if (optind + 1 < argc)
 vs = argv[optind + 2];
 
-pciattach(domid, bdf, vs);
+if (optind + 2 < argc) {
+rdm_policy = argv[optind + 3];
+}
+if (!strcmp(rdm_policy, "strict")) {
+flag = LIBXL_RDM_RESERVE_FLAG_STRICT;
+} else if (!strcmp(rdm_policy, "relaxed")) {
+flag = LIBXL_RDM_RESERVE_FLAG_RELAXED;
+} else {
+fprintf(stderr, "%s is an invalid rdm policy: 'strict'|'relaxed'\n",
+rdm_policy);
+exit(2);
+}
+
+pciattach(domid, bdf, vs, flag);
 return 0;
 }
 
diff --git a/tools/libxl/xl_cmdtable.c b/tools/libxl/xl_cmdtable.c
index 7f4759b..36a2aaa 100644
--- a/tools/libxl/xl_cmdtable.c
+++ b/tools/libxl/xl_cmdtable.c
@@ -88,7 +88,7 @@ struct cmd_spec cmd_table[] = {
 { "pci-attach",
   &main_pciattach, 0, 1,
   "Insert a new pass-through pci device",
-  "  [Virtual Slot]",
+  "  [Virtual Slot] [policy to reserve 
rdm<'strice'|'relaxed'>]",
 },
 { "pci-detach",

Re: [Xen-devel] [RFC v2 14/15] Suppress posting interrupts when 'SN' is set

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 11:26,  wrote:
>> From: Jan Beulich [mailto:jbeul...@suse.com]
>> Sent: Thursday, June 18, 2015 4:10 PM
>> >>> On 18.06.15 at 10:01,  wrote:
>> >> From: Jan Beulich [mailto:jbeul...@suse.com]
>> >> Sent: Thursday, June 18, 2015 3:51 PM
>> >> >>> On 18.06.15 at 09:34,  wrote:
>> >> > Seems it is a little tricky here. What we need to do for this is
>> >> > like something below:
>> >> >
>> >> > ..
>> >> >
>> >> > else if ( !pi_test_sn(&v->arch.hvm_vmx.pi_desc) ||
>> >> >   !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
>> >> > {
>> >> > __vmx_deliver_posted_interrupt(v);
>> >> > return;
>> >> > }
>> >> >
>> >> > ..
>> >> >
>> >> > But how to handle this if 'SN' is set between pi_test_sn() and
>> >> > pi_test_and_set_on() ? Seems we cannot guarantee this two
>> >> > operations are atomic in software way.But thinking a bit
>> >> > more, maybe this solution is acceptable, 'SN' is only set when
>> >> > the vCPU's state is going to be runnable, which means the
>> >> > vCPU is not running in non-root, in this case, it doesn't matter
>> >> > whether we send notification event or not, as long as the
>> >> > interrupt is stored in PIR, and they will be delivered to guest
>> >> > during the next vm-entry.
>> >> >
>> >> > Any ideas about this?
>> >>
>> >> Unless the two functions don't do what their name suggests I
>> >> don't see why you need to invoke pi_test_sn() first - the purpose
>> >> of pi_test_and_set_on() after all is what its name says: Test
>> >> and set the flag atomically, returning the previous state.
>> >
>> > If 'SN' is set, then interrupts are suppress, we cannot send
>> > notification event, then we don't need to test and set 'ON',
>> > and it the purpose of this patch,right?
>> 
>> Oh, sorry, the single character difference in the name didn't
>> catch my attention. Testing the state of one bit and setting
>> (perhaps also along with testing it) another one should be done
>> with cmpxchg().
> 
> Yes, we can use cmpxchg(), but what happened if other bits are
> changed during this process. Please see the following pseudo
> code and scenario:
> 
> uint64_t flag = pi_desc.control;
> uint64_t old = flag & ( ~(POSTED_INTR_ON | POSTED_INTR_SN) );
> uint64_t new = flag | POSTED_INTR_ON;
> 
> /* What should we do if other bits except ON and SN are
>   changed here?*/
> 
> cmpxchg(&flag, old, new);
> 
> I think about this for some time, unfortunately I don't get a good
> solution for this. Could you please give more hints? Thanks a lot!

This is not a problem unique to your code. Just go look elsewhere -
you simply need to loop over cmpxchg() (with some guarantee to
be had that this loop will eventually be exited).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen 4.5.1 released

2015-06-23 Thread M A Young

On Tue, 23 Jun 2015, Jan Beulich wrote:

> All,
> 
> I am pleased to announce the release of Xen 4.5.1. This is
> available immediately from its git repository
> http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.5 
> (tag RELEASE-4.5.1) or from the XenProject download page
> http://www.xenproject.org/downloads/xen-archives/xen-45-series/xen-451.html 
> (where a list of changes can also be found).
> 
> We recommend all users of the 4.5 stable series to update to this
> first point release.

I don't believe this release has the qemu-xen-traditional half of XSA-135. 
If this wasn't deliberate it might be worth noting it somewhere.

Michael Young

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [v4][PATCH 03/19] xen/vtd: create RMRR mapping

2015-06-23 Thread Jan Beulich

>>> On 23.06.15 at 11:57,  wrote:
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1839,7 +1839,7 @@ static int rmrr_identity_mapping(struct domain *d, 
> bool_t map,
>  
>  while ( base_pfn < end_pfn )
>  {
> -if ( intel_iommu_unmap_page(d, base_pfn) )
> +if ( guest_physmap_remove_page(d, base_pfn, base_pfn, 0) )
>  ret = -ENXIO;
>  base_pfn++;
>  }
> @@ -1855,8 +1855,7 @@ static int rmrr_identity_mapping(struct domain *d, 
> bool_t map,
>  
>  while ( base_pfn < end_pfn )
>  {
> -int err = intel_iommu_map_page(d, base_pfn, base_pfn,
> -   IOMMUF_readable|IOMMUF_writable);
> +int err = set_identity_p2m_entry(d, base_pfn, p2m_access_rw);

Shouldn't the two continue to be the inverse of one another?
Maybe guest_physmap_remove_page() does what you want,
but it looks like at least an abuse.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Bug in devicetree_for_each_node() in xen/arch/arm/bootfdt.c ?

2015-06-23 Thread Julien Grall


Hi,

On 23/06/2015 10:44, David Vrabel wrote:

On 23/06/15 00:02, Chris (Christopher) Brand wrote:

I’ve been trying to figure out why Xen only reports 2GB on my ARM
platform that actually has 3GB, and I think I’ve found a bug, but I’m
not familiar enough with the Xen code to fix it.

The relevant parts of my dts are:

/dts-v1/;
/ {

  model = "Broadcom STB (7445d0)";
  compatible = "brcm,bcm7445d0", "brcm,brcmstb";
  #address-cells = <0x2>;
  #size-cells = <0x2>;
  interrupt-parent = <0x1>;

  memory {
#address-cells = <0x1>;
#size-cells = <0x1>;
device_type = "memory";
reg = <0x0 0x0 0x0 0x4000 0x0 0x4000 0x0 0x4000>;


It's been a while since I've looked at device tree stuff but I think you
need 64-bit values for this reg property because the parent node has
#address-cells == 0x2 and #size-cells == 0x2.


I think they are already on 64-bit values, otherwise you would have a 
bank starting at 0 of a size 0 which seems odd.


Anyway, the format of this memory node is not supported on Xen and I 
wasn't able to find a bindings somewhere. Will extend my point by 
answering to his mail.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Bug in devicetree_for_each_node() in xen/arch/arm/bootfdt.c ?

2015-06-23 Thread Ian Campbell

On Tue, 2015-06-23 at 11:08 +0100, Ian Campbell wrote:
> On Tue, 2015-06-23 at 10:57 +0100, Ian Campbell wrote:
> > On Tue, 2015-06-23 at 10:44 +0100, David Vrabel wrote:
> > > On 23/06/15 00:02, Chris (Christopher) Brand wrote:
> > > > I’ve been trying to figure out why Xen only reports 2GB on my ARM
> > > > platform that actually has 3GB, and I think I’ve found a bug, but I’m
> > > > not familiar enough with the Xen code to fix it.
> > > > 
> > > > The relevant parts of my dts are:
> > > > 
> > > > /dts-v1/;
> > > > / {
> > > > 
> > > >  model = "Broadcom STB (7445d0)";
> > > >  compatible = "brcm,bcm7445d0", "brcm,brcmstb";
> > > >  #address-cells = <0x2>;
> > > >  #size-cells = <0x2>;
> > > >  interrupt-parent = <0x1>;
> > > > 
> > > >  memory {
> > > >#address-cells = <0x1>;
> > > >#size-cells = <0x1>;
> > > >device_type = "memory";
> > > >reg = <0x0 0x0 0x0 0x4000 0x0 0x4000 0x0 0x4000>;
> > > 
> > > It's been a while since I've looked at device tree stuff but I think you
> > > need 64-bit values for this reg property because the parent node has
> > > #address-cells == 0x2 and #size-cells == 0x2.
> > 
> > Yes, the prevailing sizes will be 0x2 here, since the 0x1 only apply to
> > _children_. However you still need to write the cells as separate 32-bit
> > entries, so the above encodes to memory regions from 0x0.0 to
> > 0x0.4000 and 0x0.4000 to 0x0.8000 (using . to show where the
> > cell boundary lies).

And to further clarify an apparent misunderstanding in the first mail: A
reg property is a list of   pairs, with each consuming
the appropriate #foo-cells (so 2 for both here, meaning each entry is 4
cells in total).

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Bug in devicetree_for_each_node() in xen/arch/arm/bootfdt.c ?

2015-06-23 Thread Julien Grall


Hi,

On 23/06/2015 00:02, Chris (Christopher) Brand wrote:

I’ve been trying to figure out why Xen only reports 2GB on my ARM
platform that actually has 3GB, and I think I’ve found a bug, but I’m
not familiar enough with the Xen code to fix it.
/dts-v1/;

/ {

  model = "Broadcom STB (7445d0)";

  compatible = "brcm,bcm7445d0", "brcm,brcmstb";

  #address-cells = <0x2>;

  #size-cells = <0x2>;

  interrupt-parent = <0x1>;

  memory {

#address-cells = <0x1>;

#size-cells = <0x1>;

device_type = "memory";

reg = <0x0 0x0 0x0 0x4000 0x0 0x4000 0x0 0x4000>;


As said by David, Xen will parse "reg" using the #address-cells and 
#size-cells of the parent. So it's normal to see 2GB. Does the same 
device tree reports 3GB on Linux? I suspect no.


Although, what the rest of the node used for? Do we expect to parse it? 
I wasn't able to find a suitable bindings in the docs...


Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Bug in devicetree_for_each_node() in xen/arch/arm/bootfdt.c ?

2015-06-23 Thread Ian Campbell

On Tue, 2015-06-23 at 10:57 +0100, Ian Campbell wrote:
> On Tue, 2015-06-23 at 10:44 +0100, David Vrabel wrote:
> > On 23/06/15 00:02, Chris (Christopher) Brand wrote:
> > > I’ve been trying to figure out why Xen only reports 2GB on my ARM
> > > platform that actually has 3GB, and I think I’ve found a bug, but I’m
> > > not familiar enough with the Xen code to fix it.
> > > 
> > > The relevant parts of my dts are:
> > > 
> > > /dts-v1/;
> > > / {
> > > 
> > >  model = "Broadcom STB (7445d0)";
> > >  compatible = "brcm,bcm7445d0", "brcm,brcmstb";
> > >  #address-cells = <0x2>;
> > >  #size-cells = <0x2>;
> > >  interrupt-parent = <0x1>;
> > > 
> > >  memory {
> > >#address-cells = <0x1>;
> > >#size-cells = <0x1>;
> > >device_type = "memory";
> > >reg = <0x0 0x0 0x0 0x4000 0x0 0x4000 0x0 0x4000>;
> > 
> > It's been a while since I've looked at device tree stuff but I think you
> > need 64-bit values for this reg property because the parent node has
> > #address-cells == 0x2 and #size-cells == 0x2.
> 
> Yes, the prevailing sizes will be 0x2 here, since the 0x1 only apply to
> _children_. However you still need to write the cells as separate 32-bit
> entries, so the above encodes to memory regions from 0x0.0 to
> 0x0.4000 and 0x0.4000 to 0x0.8000 (using . to show where the
> cell boundary lies).
> 
> I don't know this platform, but that seems a plausible description of 2x
> 1GB regions.

I read the original report backwards, thinking 2GB was expected, but
rereading I see that 3GB was expected. In which case I think there is a
region missing in the DTB (or the hardware really only has 2GB).

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [osstest test] 58828: regressions - trouble: blocked/broken/fail/pass

2015-06-23 Thread Ian Campbell

On Tue, 2015-06-23 at 10:16 +0100, Ian Campbell wrote:
> On Tue, 2015-06-23 at 02:18 +, osstest service user wrote:
> > flight 58828 osstest real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/58828/
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  build-armhf   3 host-install(3) broken REGR. vs. 
> > 58726
> >  build-armhf-xsm   3 host-install(3) broken REGR. vs. 
> > 58726
> >  build-armhf-pvops 3 host-install(3) broken REGR. vs. 
> > 58726
> 
> Copy kernel 
> /home/tftp//osstest/debian-installer/armhf/2015-01-10-wheezy/backports.deb 
> failed: No such file or directory at Osstest/Debian.pm line 884.
> 
> Looks like I probably botched something in one of:
> 5dd4c52291b6 mg-debian-installer-update: updates to better handle Jessie 
> onwards.
> cdbe966bb86e More flexible handling of need-kernel-deb-$flavour host flag
> 
> I'll investigate.

So, the way I appear to have structured the backwards compatibility it
retains compatibility with the existing host props, but not with the
existing TftpDiVersion.

IOW mg-debian-installer-update-all should have been run while committing
"More flexible handling of need-kernel-deb-$flavour host flag", which I
seem to have forgotten to note anywhere -- although Ian J did do so in
response to v2.

I don't think handling compatibility with the layout of the existing
TftpDiVersion dir is going to be easy or worth it.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH V4 3/7] libxl: add pvusb API

2015-06-23 Thread Chun Yan Liu



>>> On 6/16/2015 at 01:47 AM, in message <557f0fa7.2060...@eu.citrix.com>, 
>>> George
Dunlap  wrote: 
> On 06/10/2015 04:20 AM, Chunyan Liu wrote: 
> > Add pvusb APIs, including: 
> >  - attach/detach (create/destroy) virtual usb controller. 
> >  - attach/detach usb device 
> >  - list usb controller and usb devices 
> >  - some other helper functions 
> >  
> > Signed-off-by: Chunyan Liu  
> > Signed-off-by: Simon Cao  
> >  
> > --- 
> > changes: 
> >   - make libxl_device_usbctrl_add async, to be consistent with 
> > libxl_device_usbctrl_remove, using MACROs DEFINE_DEVICE_ADD 
> > and DEFINE_DEVICES_ADD, adjusting related codes. 
> >   - correct update_json related processing. 
> >   - use libxl__* helper functions instead of calloc, realloc 
> > and strdup, etc. whereever possible. 
> >   - merge protocol definition pv|qemu in this patch 
> >   - treat it as warning at rebind failure instead of error in usb_remove 
> >   - add documentation docs/misc/pvusb.txt to describe pvusb 
> > details (toolstack design, libxl design, xenstore info, etc.) 
> >   - other fixes addring Wei and George's comments 
> >  
> >  docs/misc/pvusb.txt  |  243 +++ 
> >  tools/libxl/Makefile |2 +- 
> >  tools/libxl/libxl.c  |6 + 
> >  tools/libxl/libxl.h  |   65 ++ 
> >  tools/libxl/libxl_internal.h |   16 +- 
> >  tools/libxl/libxl_osdeps.h   |   13 + 
> >  tools/libxl/libxl_pvusb.c| 1249  
> ++ 
> >  tools/libxl/libxl_types.idl  |   52 ++ 
> >  tools/libxl/libxl_types_internal.idl |1 + 
> >  tools/libxl/libxl_utils.c|   16 + 
> >  10 files changed, 1661 insertions(+), 2 deletions(-) 
> >  create mode 100644 docs/misc/pvusb.txt 
> >  create mode 100644 tools/libxl/libxl_pvusb.c 
> >  
> > diff --git a/docs/misc/pvusb.txt b/docs/misc/pvusb.txt 
> > new file mode 100644 
> > index 000..4cdf965 
> > --- /dev/null 
> > +++ b/docs/misc/pvusb.txt 
> > @@ -0,0 +1,243 @@ 
> > +1. Overview 
> > + 
> > +There are two general methods for passing through individual host 
> > +devices to a guest. The first is via an emulated USB device 
> > +controller; the second is PVUSB. 
> > + 
> > +Additionally, there are two ways to add USB devices to a guest: via 
> > +the config file at domain creation time, and via hot-plug while the VM 
> > +is running. 
> > + 
> > +* Emulated USB 
> > + 
> > +In emulated USB, the device model (qemu) presents an emulated USB 
> > +controller to the guest. The device model process then grabs control 
> > +of the device from domain 0 and and passes the USB commands between 
> > +the guest OS and the host USB device. 
> > + 
> > +This method is only available to HVM domains, and is not available for 
> > +domains running with device model stubdomains. 
> > + 
> > +* PVUSB 
> > + 
> > +PVUSB uses a paravirtialized front-end/back-end interface, similar to 
> > +the traditional Xen PV network and disk protocols. In order to use 
> > +PVUSB, you need usbfront in your guest OS, and usbback in dom0 (or 
> > +your USB driver domain). 
> > + 
> > +Additionally, for easy use of PVUSB, you need support in the toolstack 
> > +to get the two sides to talk to each other. 
>  
> I think this paragraph is probably unnecessary.  By the time this is 
> checked in, the toolstack will have the necessary support. 
>  
> > +2. Specifying a host USB device 
> > + 
> > +QEMU hmp commands allows USB devices to be specified either by their 
>  
> s/hmp/qmp/; ? 
>  
> > +bus address (in the form bus.device) or their device tag (in the form 
> > +vendorid:deviceid). 
> > + 
> > +Each way of specifying has its advantages: 
> > + 
> > +Specifying by device tag will always get the same device, 
> > +regardless of where the device ends up in the USB bus topology. 
> > +However, if there are two identical devices, it will not allow you to 
> > +specify which one. 
> > + 
> > +Specifying by bus address will always allow you to choose a 
> > +specific device, even if you have duplicates. However, the bus address 
> > +may change depending on which port you plugged the device into, and 
> > +possibly also after a reboot. 
> > + 
> > +To avoid duplication of vendorid:deviceid, we'll use bus address to 
> > +specify host USB device in xl toolstack. 
>  
> This paragraph comparing the different ways of specifying devices makes 
> sense to include in the 0/N series summary, but not so much in a file 
> we're checking in. 
>  
> > + 
> > +You can use lsusb to list the USB devices on the system: 
> > + 
> > +Bus 001 Device 003: ID 0424:2514 Standard Microsystems Corp. USB 2.0 
> > +Hub 
> > +Bus 003 Device 002: ID f617:0905 
> > +Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub 
> > +Bus 001 Device 004: ID 0424:2640 Standard Microsystems Corp. USB 2.0 
> > +Hub 
> > +Bus 001 Device 005: ID 0424:4060 Standard Microsystems Corp. Ultra 
> > +Fast Media Rea

Re: [Xen-devel] [RFC PATCH v3 06/18] xen/arm: ITS: Add helper functions to manage its_devices

2015-06-23 Thread Julien Grall


Hi Vijay,

On 22/06/2015 13:01, vijay.kil...@gmail.com wrote:

From: Vijaya Kumar K 

Helper functions to mange its devices using RB-tree


s/mange/manage/


are introduced in physical ITS driver.

This is global list of all the devices.

Signed-off-by: Vijaya Kumar K 
---
  xen/arch/arm/gic-v3-its.c |   49 +
  xen/include/asm-arm/gic-its.h |3 +++
  2 files changed, 52 insertions(+)

diff --git a/xen/arch/arm/gic-v3-its.c b/xen/arch/arm/gic-v3-its.c
index b1a97c1..349d0bb 100644
--- a/xen/arch/arm/gic-v3-its.c
+++ b/xen/arch/arm/gic-v3-its.c
@@ -92,6 +92,7 @@ struct its_node {
  static LIST_HEAD(its_nodes);
  static DEFINE_SPINLOCK(its_lock);
  static struct rdist_prop  *gic_rdists;
+static struct rb_root rb_its_dev;

  #define gic_data_rdist()(per_cpu(rdist, smp_processor_id()))
  #define gic_data_rdist_rd_base()(per_cpu(rdist, smp_processor_id()).rbase)
@@ -145,6 +146,53 @@ void dump_cmd(its_cmd_block *cmd)
  }
  #endif

+/* RB-tree helpers for its_device */
+struct its_device * find_its_device(struct rb_root *root, u32 devid)


coding style: struct its_device *find


I would rename to its_find_device to keep consistent with the function 
name within this file.


Also, for any exported function you have to declare the prototype in the 
header within the same patch.



+{
+struct rb_node *node = root->rb_node;
+
+while ( node )
+{
+struct its_device *dev;
+
+dev = container_of(node, struct its_device, node);
+if ( devid < dev->device_id )
+node = node->rb_left;
+else if ( devid > dev->device_id )
+node = node->rb_right;
+else
+return dev;
+}
+
+return NULL;
+}
+
+int insert_its_device(struct rb_root *root, struct its_device *dev)


Why do you need the root in parameter? You already it within the file.

Also, I would rename to its_add_device.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH] xen-netback: fix a BUG() during initialization

2015-06-23 Thread David Miller

From: Imre Palik 
Date: Fri, 19 Jun 2015 14:21:51 +0200

> From: "Palik, Imre" 
> 
> Commit edafc132baac ("xen-netback: making the bandwidth limiter runtime 
> settable")
> introduced the capability to change the bandwidth rate limit at runtime.
> But it also introduced a possible crashing bug.
> 
> If netback receives two XenbusStateConnected without getting the
> hotplug-status watch firing in between, then it will try to register the
> watches for the rate limiter again.  But this triggers a BUG() in the watch
> registration code.
> 
> The fix modifies connect() to remove the possibly existing packet-rate
> watches before trying to install those watches.  This behaviour is in line
> with how connect() deals with the hotplug-status watch.
> 
> Signed-off-by: Imre Palik 
> Cc: Matt Wilson 

Applied, thank you.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Xen 4.5.1 released

2015-06-23 Thread Ian Jackson

M A Young writes ("Re: [Xen-devel] Xen 4.5.1 released"):
> On Tue, 23 Jun 2015, Jan Beulich wrote:
> > I am pleased to announce the release of Xen 4.5.1. This is
> > available immediately from its git repository
> > http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.5 
> > (tag RELEASE-4.5.1) or from the XenProject download page
> > http://www.xenproject.org/downloads/xen-archives/xen-45-series/xen-451.html 
> > (where a list of changes can also be found).
> > 
> > We recommend all users of the 4.5 stable series to update to this
> > first point release.
> 
> I don't believe this release has the qemu-xen-traditional half of XSA-135. 
> If this wasn't deliberate it might be worth noting it somewhere.

You're right.  It appears that the patch for XSA-135 was never applied
to qemu-traditional, due to an oversight.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Bug in devicetree_for_each_node() in xen/arch/arm/bootfdt.c ?

2015-06-23 Thread Ian Campbell

On Tue, 2015-06-23 at 11:08 +0100, Julien Grall wrote:

> > reg = <0x0 0x0 0x0 0x4000 0x0 0x4000 0x0 0x4000>;
> 
> Although, what the rest of the node used for? Do we expect to parse it? 
> I wasn't able to find a suitable bindings in the docs...

A reg can encode multiple regions, by just listing them one after the
other. It'll be in the generic docs about reg properties I expect,
there's nothing special about a reg property in a memory node in this
regard.

Xen parses these just fine, via the loop and use of device_tree_get_reg
in process_memory_node

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PULL 0/5] xen-220615

2015-06-23 Thread Stefano Stabellini

On Tue, 23 Jun 2015, Peter Maydell wrote:
> On 22 June 2015 at 14:08, Stefano Stabellini
>  wrote:
> > The following changes since commit 00967f4e0bab246679d0ddc32fd31a7179345baf:
> >
> >   Merge remote-tracking branch 
> > 'remotes/agraf/tags/signed-s390-for-upstream' into staging (2015-06-05 
> > 12:04:42 +0100)
> >
> > are available in the git repository at:
> >
> >
> >   git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-220615
> >
> > for you to fetch changes up to 960ce9e49c4aaf591f13ced0e74a93499f46016b:
> >
> >   Revert "xen-hvm: increase maxmem before calling 
> > xc_domain_populate_physmap" (2015-06-22 13:00:42 +)
> >
> > 
> > Xen tree 22/06/2015
> >
> > 
> > Jan Beulich (4):
> >   xen/pass-through: fold host PCI command register writes
> >   xen/pass-through: ROM BAR handling adjustments
> >   xen/pass-through: log errno values rather than function return ones
> >   xen/pass-through: constify some static data
> >
> > Stefano Stabellini (1):
> >   Revert "xen-hvm: increase maxmem before calling 
> > xc_domain_populate_physmap"
> 
> I'm afraid I can't apply this -- this revert commit is missing a signed-off-by
> line. Can you respin, please?

I thought that being a straight revert, didn't need a SOB line.
xen-220615-2 is the tag with SOB line on the last commit:

git://xenbits.xen.org/people/sstabellini/qemu-dm.git tags/xen-220615-2

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] Bug in devicetree_for_each_node() in xen/arch/arm/bootfdt.c ?

2015-06-23 Thread Julien Grall




On 23/06/2015 11:27, Ian Campbell wrote:

On Tue, 2015-06-23 at 11:08 +0100, Julien Grall wrote:


 reg = <0x0 0x0 0x0 0x4000 0x0 0x4000 0x0 0x4000>;


Although, what the rest of the node used for? Do we expect to parse it?
I wasn't able to find a suitable bindings in the docs...


A reg can encode multiple regions, by just listing them one after the
other. It'll be in the generic docs about reg properties I expect,
there's nothing special about a reg property in a memory node in this
regard.

Xen parses these just fine, via the loop and use of device_tree_get_reg
in process_memory_node


I know that a "reg" can encode multiple regions... My question was 
related to the "rest of the node" and not the rest of the property... i.e


   region@1000 {

contiguous-region;

reg = <0x1000 0x1f80>;

linux,phandle = <0x2>;

phandle = <0x2>;

   };

AFAICT we don't parse it.

Regards,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v3 05/18] x86/hvm: remove multiple open coded 'chunking' loops

2015-06-23 Thread Paul Durrant

...in hvmemul_read/write()

Add hvmemul_phys_mmio_access() and hvmemul_linear_mmio_access() functions
to reduce code duplication.

Signed-off-by: Paul Durrant 
Cc: Keir Fraser 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/hvm/emulate.c |  232 
 1 file changed, 126 insertions(+), 106 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index f3372fc..02796d0 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -540,6 +540,115 @@ static int hvmemul_virtual_to_linear(
 return X86EMUL_EXCEPTION;
 }
 
+static int hvmemul_phys_mmio_access(
+paddr_t gpa, unsigned int size, uint8_t dir, uint8_t *buffer,
+unsigned int *off)
+{
+unsigned long one_rep = 1;
+unsigned int chunk;
+int rc = 0;
+
+/* Accesses must fall within a page */
+if ( (gpa & (PAGE_SIZE - 1)) + size > PAGE_SIZE )
+return X86EMUL_UNHANDLEABLE;
+
+/*
+ * hvmemul_do_io() cannot handle non-power-of-2 accesses or
+ * accesses larger than sizeof(long), so choose the highest power
+ * of 2 not exceeding sizeof(long) as the 'chunk' size.
+ */
+chunk = 1 << (fls(size) - 1);
+if ( chunk > sizeof (long) )
+chunk = sizeof (long);
+
+while ( size != 0 )
+{
+rc = hvmemul_do_mmio_buffer(gpa, &one_rep, chunk, dir, 0,
+&buffer[*off]);
+if ( rc != X86EMUL_OKAY )
+break;
+
+/* Advance to the next chunk */
+gpa += chunk;
+*off += chunk;
+size -= chunk;
+
+/*
+ * If the chunk now exceeds the remaining size, choose the next
+ * lowest power of 2 that will fit.
+ */
+while ( chunk > size )
+chunk >>= 1;
+}
+
+return rc;
+}
+
+static inline int hvmemul_phys_mmio_read(
+paddr_t gpa, unsigned int size, uint8_t *buffer, unsigned int *off)
+{
+return hvmemul_phys_mmio_access(gpa, size, IOREQ_READ, buffer,
+off);
+}
+
+static inline int hvmemul_phys_mmio_write(
+paddr_t gpa, unsigned int size, uint8_t *buffer, unsigned int *off)
+{
+return hvmemul_phys_mmio_access(gpa, size, IOREQ_WRITE, buffer,
+off);
+}
+
+static int hvmemul_linear_mmio_access(
+unsigned long gla, unsigned int size, uint8_t dir, void *buffer,
+uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+unsigned long page_off = gla & (PAGE_SIZE - 1);
+unsigned int chunk, buffer_off = 0;
+paddr_t gpa;
+unsigned long one_rep = 1;
+int rc;
+
+chunk = min_t(unsigned int, size, PAGE_SIZE - page_off);
+rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+hvmemul_ctxt);
+while ( rc == X86EMUL_OKAY )
+{
+rc = hvmemul_phys_mmio_access(gpa, chunk, dir, buffer,
+  &buffer_off);
+if ( rc != X86EMUL_OKAY )
+break;
+
+gla += chunk;
+size -= chunk;
+
+if ( size == 0 )
+break;
+
+ASSERT((gla & (PAGE_SIZE - 1)) == 0);
+chunk = min_t(unsigned int, size, PAGE_SIZE);
+rc = hvmemul_linear_to_phys(gla, &gpa, chunk, &one_rep, pfec,
+hvmemul_ctxt);
+}
+
+return rc;
+}
+
+static inline int hvmemul_linear_mmio_read(
+unsigned long gla, unsigned int size, void *buffer,
+uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+return hvmemul_linear_mmio_access(gla, size, IOREQ_READ, buffer,
+  pfec, hvmemul_ctxt);
+}
+
+static inline int hvmemul_linear_mmio_write(
+unsigned long gla, unsigned int size, void *buffer,
+uint32_t pfec, struct hvm_emulate_ctxt *hvmemul_ctxt)
+{
+return hvmemul_linear_mmio_access(gla, size, IOREQ_WRITE, buffer,
+  pfec, hvmemul_ctxt);
+}
+
 static int __hvmemul_read(
 enum x86_segment seg,
 unsigned long offset,
@@ -549,52 +658,26 @@ static int __hvmemul_read(
 struct hvm_emulate_ctxt *hvmemul_ctxt)
 {
 struct vcpu *curr = current;
-unsigned long addr, reps = 1;
-unsigned int off, chunk = min(bytes, 1U << LONG_BYTEORDER);
+unsigned long addr, one_rep = 1;
 uint32_t pfec = PFEC_page_present;
 struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
-paddr_t gpa;
 int rc;
 
 rc = hvmemul_virtual_to_linear(
-seg, offset, bytes, &reps, access_type, hvmemul_ctxt, &addr);
+seg, offset, bytes, &one_rep, access_type, hvmemul_ctxt, &addr);
 if ( rc != X86EMUL_OKAY )
 return rc;
-off = addr & (PAGE_SIZE - 1);
-/*
- * We only need to handle sizes actual instruction operands can have. All
- * such sizes are either powers of 2 or the sum of two powers of 2. Thus
- * picking as initial chunk size the largest power of 2 not greater than
- * the total size will

[Xen-devel] [PATCH v3 07/18] x86/hvm: unify internal portio and mmio intercepts

2015-06-23 Thread Paul Durrant

The implementation of mmio and portio intercepts is unnecessarily different.
This leads to much code duplication. This patch unifies much of the
intercept handling, leaving only distinct handlers for stdvga mmio and dpci
portio. Subsequent patches will unify those handlers.

Signed-off-by: Paul Durrant 
Cc: Keir Fraser 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/hvm/emulate.c|   11 +-
 xen/arch/x86/hvm/hpet.c   |4 +-
 xen/arch/x86/hvm/hvm.c|7 +-
 xen/arch/x86/hvm/intercept.c  |  502 +
 xen/arch/x86/hvm/stdvga.c |   30 +-
 xen/arch/x86/hvm/vioapic.c|4 +-
 xen/arch/x86/hvm/vlapic.c |5 +-
 xen/arch/x86/hvm/vmsi.c   |7 +-
 xen/drivers/passthrough/amd/iommu_guest.c |   30 +-
 xen/include/asm-x86/hvm/domain.h  |1 +
 xen/include/asm-x86/hvm/io.h  |  119 +++
 11 files changed, 350 insertions(+), 370 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 02796d0..91e57b0 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -143,16 +143,7 @@ static int hvmemul_do_io(
 hvmtrace_io_assist(&p);
 }
 
-if ( is_mmio )
-{
-rc = hvm_mmio_intercept(&p);
-if ( rc == X86EMUL_UNHANDLEABLE )
-rc = hvm_buffered_io_intercept(&p);
-}
-else
-{
-rc = hvm_portio_intercept(&p);
-}
+rc = hvm_io_intercept(&p);
 
 switch ( rc )
 {
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 9585ca8..8958873 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -504,7 +504,7 @@ static int hpet_range(struct vcpu *v, unsigned long addr)
  (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
 }
 
-const struct hvm_mmio_ops hpet_mmio_ops = {
+static const struct hvm_mmio_ops hpet_mmio_ops = {
 .check = hpet_range,
 .read  = hpet_read,
 .write = hpet_write
@@ -659,6 +659,8 @@ void hpet_init(struct domain *d)
 h->hpet.comparator64[i] = ~0ULL;
 h->pt[i].source = PTSRC_isa;
 }
+
+register_mmio_handler(d, &hpet_mmio_ops);
 }
 
 void hpet_deinit(struct domain *d)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 535d622..c10db78 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1465,11 +1465,12 @@ int hvm_domain_initialise(struct domain *d)
 goto fail0;
 
 d->arch.hvm_domain.params = xzalloc_array(uint64_t, HVM_NR_PARAMS);
-d->arch.hvm_domain.io_handler = xmalloc(struct hvm_io_handler);
+d->arch.hvm_domain.io_handler = xzalloc_array(struct hvm_io_handler,
+  NR_IO_HANDLERS);
 rc = -ENOMEM;
 if ( !d->arch.hvm_domain.params || !d->arch.hvm_domain.io_handler )
 goto fail1;
-d->arch.hvm_domain.io_handler->num_slot = 0;
+d->arch.hvm_domain.io_handler_count = 0;
 
 /* Set the default IO Bitmap. */
 if ( is_hardware_domain(d) )
@@ -1506,6 +1507,8 @@ int hvm_domain_initialise(struct domain *d)
 
 rtc_init(d);
 
+msixtbl_init(d);
+
 register_portio_handler(d, 0xe9, 1, hvm_print_line);
 register_portio_handler(d, 0xcf8, 4, hvm_access_cf8);
 
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index cc44733..4db024e 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -32,205 +32,97 @@
 #include 
 #include 
 
-static const struct hvm_mmio_ops *const
-hvm_mmio_handlers[HVM_MMIO_HANDLER_NR] =
+static bool_t hvm_mmio_accept(struct hvm_io_handler *handler,
+  uint64_t addr,
+  uint64_t size)
 {
-&hpet_mmio_ops,
-&vlapic_mmio_ops,
-&vioapic_mmio_ops,
-&msixtbl_mmio_ops,
-&iommu_mmio_ops
-};
+BUG_ON(handler->type != IOREQ_TYPE_COPY);
+
+return handler->u.mmio.ops->check(current, addr);
+}
 
-static int hvm_mmio_access(struct vcpu *v,
-   ioreq_t *p,
-   hvm_mmio_read_t read,
-   hvm_mmio_write_t write)
+static int hvm_mmio_read(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size,
+ uint64_t *data)
 {
-struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
-unsigned long data;
-int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
+BUG_ON(handler->type != IOREQ_TYPE_COPY);
 
-if ( !p->data_is_ptr )
-{
-if ( p->dir == IOREQ_READ )
-{
-if ( vio->mmio_retrying )
-{
-if ( vio->mmio_large_read_bytes != p->size )
-return X86EMUL_UNHANDLEABLE;
-memcpy(&data, vio->mmio_large_read, p->size);
-vio->mmio_large_read_bytes = 0;
-vio->mmio_retrying = 0;
-}
-el

[Xen-devel] [PATCH v3 06/18] x86/hvm: re-name struct hvm_mmio_handler to hvm_mmio_ops

2015-06-23 Thread Paul Durrant

The struct just contains three methods and no data, so the name
hvm_mmio_ops more accurately reflects its content. A subsequent patch
introduces a new structure which more accurately warrants the name
hvm_mmio_handler so doing the rename in this purely cosmetic patch avoids
conflating functional and cosmetic changes in a single patch.

Signed-off-by: Paul Durrant 
Cc: Keir Fraser 
Acked-by: Jan Beulich 
Cc: Andrew Cooper 
---
 tools/tests/vhpet/emul.h  |8 +++---
 tools/tests/vhpet/main.c  |6 ++---
 xen/arch/x86/hvm/hpet.c   |8 +++---
 xen/arch/x86/hvm/intercept.c  |   41 ++---
 xen/arch/x86/hvm/vioapic.c|8 +++---
 xen/arch/x86/hvm/vlapic.c |8 +++---
 xen/arch/x86/hvm/vmsi.c   |8 +++---
 xen/drivers/passthrough/amd/iommu_guest.c |8 +++---
 xen/include/asm-x86/hvm/io.h  |   18 ++---
 9 files changed, 56 insertions(+), 57 deletions(-)

diff --git a/tools/tests/vhpet/emul.h b/tools/tests/vhpet/emul.h
index 09e4611..383acff 100644
--- a/tools/tests/vhpet/emul.h
+++ b/tools/tests/vhpet/emul.h
@@ -237,11 +237,11 @@ typedef int (*hvm_mmio_write_t)(struct vcpu *v,
 typedef int (*hvm_mmio_check_t)(struct vcpu *v, unsigned long addr);
 
 
-struct hvm_mmio_handler
+struct hvm_mmio_ops
 {
-hvm_mmio_check_t check_handler;
-hvm_mmio_read_t read_handler;
-hvm_mmio_write_t write_handler;
+hvm_mmio_check_t check;
+hvm_mmio_read_t  read;
+hvm_mmio_write_t write;
 };
 
 /* Marshalling and unmarshalling uses a buffer with size and cursor. */
diff --git a/tools/tests/vhpet/main.c b/tools/tests/vhpet/main.c
index fbd7510..6fe65ea 100644
--- a/tools/tests/vhpet/main.c
+++ b/tools/tests/vhpet/main.c
@@ -70,7 +70,7 @@ static int skip_error_on_load;
 
 static char *global_thousep;
 
-extern const struct hvm_mmio_handler hpet_mmio_handler;
+extern const struct hvm_mmio_ops hpet_mmio_ops;
 
 struct domain dom1;
 struct vcpu vcpu0;
@@ -297,13 +297,13 @@ void udelay(int w)
 unsigned int hpet_readl(unsigned long a)
 {
 unsigned long ret = 0;
-hpet_mmio_handler.read_handler(current, a, 4, &ret);
+hpet_mmio_ops.read(current, a, 4, &ret);
 return ret;
 }
 
 void hpet_writel(unsigned long d, unsigned long a)
 {
-hpet_mmio_handler.write_handler(current, a, 4, d);
+hpet_mmio_ops.write(current, a, 4, d);
 return;
 }
 
diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index d898169..9585ca8 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -504,10 +504,10 @@ static int hpet_range(struct vcpu *v, unsigned long addr)
  (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
 }
 
-const struct hvm_mmio_handler hpet_mmio_handler = {
-.check_handler = hpet_range,
-.read_handler  = hpet_read,
-.write_handler = hpet_write
+const struct hvm_mmio_ops hpet_mmio_ops = {
+.check = hpet_range,
+.read  = hpet_read,
+.write = hpet_write
 };
 
 
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index d52a48c..cc44733 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -32,20 +32,20 @@
 #include 
 #include 
 
-static const struct hvm_mmio_handler *const
+static const struct hvm_mmio_ops *const
 hvm_mmio_handlers[HVM_MMIO_HANDLER_NR] =
 {
-&hpet_mmio_handler,
-&vlapic_mmio_handler,
-&vioapic_mmio_handler,
-&msixtbl_mmio_handler,
-&iommu_mmio_handler
+&hpet_mmio_ops,
+&vlapic_mmio_ops,
+&vioapic_mmio_ops,
+&msixtbl_mmio_ops,
+&iommu_mmio_ops
 };
 
 static int hvm_mmio_access(struct vcpu *v,
ioreq_t *p,
-   hvm_mmio_read_t read_handler,
-   hvm_mmio_write_t write_handler)
+   hvm_mmio_read_t read,
+   hvm_mmio_write_t write)
 {
 struct hvm_vcpu_io *vio = &v->arch.hvm_vcpu.hvm_io;
 unsigned long data;
@@ -64,11 +64,11 @@ static int hvm_mmio_access(struct vcpu *v,
 vio->mmio_retrying = 0;
 }
 else
-rc = read_handler(v, p->addr, p->size, &data);
+rc = read(v, p->addr, p->size, &data);
 p->data = data;
 }
 else /* p->dir == IOREQ_WRITE */
-rc = write_handler(v, p->addr, p->size, p->data);
+rc = write(v, p->addr, p->size, p->data);
 return rc;
 }
 
@@ -86,7 +86,7 @@ static int hvm_mmio_access(struct vcpu *v,
 }
 else
 {
-rc = read_handler(v, p->addr + step * i, p->size, &data);
+rc = read(v, p->addr + step * i, p->size, &data);
 if ( rc != X86EMUL_OKAY )
 break;
 }
@@ -145,7 +145,7 @@ static int hvm_mmio_access(struct vcpu *v,
 }
 if ( rc != X86EMUL_OKAY )
 break;
-

[Xen-devel] [PATCH v3 04/18] x86/hvm: make sure translated MMIO reads or writes fall within a page

2015-06-23 Thread Paul Durrant

...otherwise they will simply carry on to the next page using a normal
linear-to-phys translation.

Signed-off-by: Paul Durrant 
Cc: Keir Fraser 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/hvm/emulate.c |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 935eab3..f3372fc 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -586,7 +586,6 @@ static int __hvmemul_read(
 p_data);
 if ( rc != X86EMUL_OKAY || bytes == chunk )
 return rc;
-addr += chunk;
 off += chunk;
 gpa += chunk;
 p_data += chunk;
@@ -594,6 +593,8 @@ static int __hvmemul_read(
 if ( bytes < chunk )
 chunk = bytes;
 }
+
+return X86EMUL_UNHANDLEABLE;
 }
 
 if ( (seg != x86_seg_none) &&
@@ -730,7 +731,6 @@ static int hvmemul_write(
 p_data);
 if ( rc != X86EMUL_OKAY || bytes == chunk )
 return rc;
-addr += chunk;
 off += chunk;
 gpa += chunk;
 p_data += chunk;
@@ -738,6 +738,8 @@ static int hvmemul_write(
 if ( bytes < chunk )
 chunk = bytes;
 }
+
+return X86EMUL_UNHANDLEABLE;
 }
 
 if ( (seg != x86_seg_none) &&
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v3 08/18] x86/hvm: add length to mmio check op

2015-06-23 Thread Paul Durrant

When memory mapped I/O is range checked by internal handlers, the length
of the access should be taken into account.

Signed-off-by: Paul Durrant 
Cc: Keir Fraser 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/hvm/hpet.c   |7 ---
 xen/arch/x86/hvm/intercept.c  |2 +-
 xen/arch/x86/hvm/vioapic.c|   17 ++---
 xen/arch/x86/hvm/vlapic.c |8 +---
 xen/arch/x86/hvm/vmsi.c   |   27 ---
 xen/drivers/passthrough/amd/iommu_guest.c |   18 +++---
 xen/include/asm-x86/hvm/io.h  |4 +++-
 7 files changed, 62 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/hpet.c b/xen/arch/x86/hvm/hpet.c
index 8958873..1a1f239 100644
--- a/xen/arch/x86/hvm/hpet.c
+++ b/xen/arch/x86/hvm/hpet.c
@@ -498,10 +498,11 @@ static int hpet_write(
 return X86EMUL_OKAY;
 }
 
-static int hpet_range(struct vcpu *v, unsigned long addr)
+static int hpet_range(struct vcpu *v, unsigned long addr,
+  unsigned long length)
 {
-return ( (addr >= HPET_BASE_ADDRESS) &&
- (addr < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE)) );
+return (addr >= HPET_BASE_ADDRESS) &&
+   ((addr + length) < (HPET_BASE_ADDRESS + HPET_MMAP_SIZE));
 }
 
 static const struct hvm_mmio_ops hpet_mmio_ops = {
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 4db024e..5e8d8b2 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -38,7 +38,7 @@ static bool_t hvm_mmio_accept(struct hvm_io_handler *handler,
 {
 BUG_ON(handler->type != IOREQ_TYPE_COPY);
 
-return handler->u.mmio.ops->check(current, addr);
+return handler->u.mmio.ops->check(current, addr, size);
 }
 
 static int hvm_mmio_read(struct hvm_io_handler *handler,
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 9ad909b..4a9b33e 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -242,12 +242,13 @@ static int vioapic_write(
 return X86EMUL_OKAY;
 }
 
-static int vioapic_range(struct vcpu *v, unsigned long addr)
+static int vioapic_range(struct vcpu *v, unsigned long addr,
+unsigned long length)
 {
 struct hvm_hw_vioapic *vioapic = domain_vioapic(v->domain);
 
-return ((addr >= vioapic->base_address &&
- (addr < vioapic->base_address + VIOAPIC_MEM_LENGTH)));
+return (addr >= vioapic->base_address) &&
+   ((addr + length) <= (vioapic->base_address + VIOAPIC_MEM_LENGTH));
 }
 
 static const struct hvm_mmio_ops vioapic_mmio_ops = {
@@ -466,3 +467,13 @@ void vioapic_deinit(struct domain *d)
 xfree(d->arch.hvm_domain.vioapic);
 d->arch.hvm_domain.vioapic = NULL;
 }
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/vlapic.c b/xen/arch/x86/hvm/vlapic.c
index f2052cf..7421fc5 100644
--- a/xen/arch/x86/hvm/vlapic.c
+++ b/xen/arch/x86/hvm/vlapic.c
@@ -986,14 +986,16 @@ int hvm_x2apic_msr_write(struct vcpu *v, unsigned int 
msr, uint64_t msr_content)
 return vlapic_reg_write(v, offset, (uint32_t)msr_content);
 }
 
-static int vlapic_range(struct vcpu *v, unsigned long addr)
+static int vlapic_range(struct vcpu *v, unsigned long address,
+unsigned long len)
 {
 struct vlapic *vlapic = vcpu_vlapic(v);
-unsigned long offset  = addr - vlapic_base_address(vlapic);
+unsigned long offset  = address - vlapic_base_address(vlapic);
 
 return !vlapic_hw_disabled(vlapic) &&
!vlapic_x2apic_mode(vlapic) &&
-   (offset < PAGE_SIZE);
+   (address >= vlapic_base_address(vlapic)) &&
+   ((offset + len) <= PAGE_SIZE);
 }
 
 static const struct hvm_mmio_ops vlapic_mmio_ops = {
diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
index 09ea301..61fe391 100644
--- a/xen/arch/x86/hvm/vmsi.c
+++ b/xen/arch/x86/hvm/vmsi.c
@@ -168,14 +168,14 @@ struct msixtbl_entry
 static DEFINE_RCU_READ_LOCK(msixtbl_rcu_lock);
 
 static struct msixtbl_entry *msixtbl_find_entry(
-struct vcpu *v, unsigned long addr)
+struct vcpu *v, unsigned long address, unsigned long len)
 {
 struct msixtbl_entry *entry;
 struct domain *d = v->domain;
 
 list_for_each_entry( entry, &d->arch.hvm_domain.msixtbl_list, list )
-if ( addr >= entry->gtable &&
- addr < entry->gtable + entry->table_len )
+if ( (address >= entry->gtable) &&
+ ((address + len) <= (entry->gtable + entry->table_len)) )
 return entry;
 
 return NULL;
@@ -214,7 +214,7 @@ static int msixtbl_read(
 
 rcu_read_lock(&msixtbl_rcu_lock);
 
-entry = msixtbl_find_entry(v, address);
+entry = msixtbl_find_entry(v, address, len);
 if ( !entry )
 goto out;
 offset = address & (PCI_MSIX_ENTRY_SIZE - 1);
@@ -273,7 +273,7 @@ static int m

[Xen-devel] [PATCH v3 03/18] x86/hvm: remove extraneous parameter from hvmtrace_io_assist()

2015-06-23 Thread Paul Durrant

The is_mmio parameter can be inferred from the ioreq type.

Signed-off-by: Paul Durrant 
Cc: Keir Fraser 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/hvm/emulate.c |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index b412302..935eab3 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -23,8 +23,9 @@
 #include 
 #include 
 
-static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
+static void hvmtrace_io_assist(ioreq_t *p)
 {
+bool_t is_mmio = (p->type == IOREQ_TYPE_COPY);
 unsigned int size, event;
 unsigned char buffer[12];
 
@@ -139,7 +140,7 @@ static int hvmemul_do_io(
 if ( !data_is_addr )
 memcpy(&p.data, p_data, size);
 
-hvmtrace_io_assist(is_mmio, &p);
+hvmtrace_io_assist(&p);
 }
 
 if ( is_mmio )
@@ -200,7 +201,7 @@ static int hvmemul_do_io(
  finish_access:
 if ( dir == IOREQ_READ )
 {
-hvmtrace_io_assist(is_mmio, &p);
+hvmtrace_io_assist(&p);
 
 if ( !data_is_addr )
 memcpy(p_data, &vio->io_data, size);
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v3 01/18] x86/hvm: simplify hvmemul_do_io()

2015-06-23 Thread Paul Durrant

Currently hvmemul_do_io() handles paging for I/O to/from a guest address
inline. This causes every exit point to have to execute:

if ( ram_page )
put_page(ram_page);

This patch introduces wrapper hvmemul_do_io_addr() and
hvmemul_do_io_buffer() functions. The latter is used for I/O to/from a Xen
buffer and thus the complexity of paging can be restricted only to the
former, making the common hvmemul_do_io() function less convoluted.

This patch also tightens up some types and introduces pio/mmio wrappers
for the above functions with comments to document their semantics.

Signed-off-by: Paul Durrant 
Cc: Keir Fraser 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/hvm/emulate.c|  278 -
 xen/arch/x86/hvm/io.c |4 +-
 xen/include/asm-x86/hvm/emulate.h |   17 ++-
 3 files changed, 198 insertions(+), 101 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index ac9c9d6..9d7af0c 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -51,41 +51,23 @@ static void hvmtrace_io_assist(int is_mmio, ioreq_t *p)
 }
 
 static int hvmemul_do_io(
-int is_mmio, paddr_t addr, unsigned long *reps, int size,
-paddr_t ram_gpa, int dir, int df, void *p_data)
+bool_t is_mmio, paddr_t addr, unsigned long *reps, unsigned int size,
+uint8_t dir, bool_t df, bool_t data_is_addr, uintptr_t data)
 {
 struct vcpu *curr = current;
-struct hvm_vcpu_io *vio;
+struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
 ioreq_t p = {
 .type = is_mmio ? IOREQ_TYPE_COPY : IOREQ_TYPE_PIO,
 .addr = addr,
 .size = size,
 .dir = dir,
 .df = df,
-.data = ram_gpa,
-.data_is_ptr = (p_data == NULL),
+.data = data,
+.data_is_ptr = data_is_addr, /* ioreq_t field name is misleading */
 };
-unsigned long ram_gfn = paddr_to_pfn(ram_gpa);
-p2m_type_t p2mt;
-struct page_info *ram_page;
+void *p_data = (void *)data;
 int rc;
 
-/* Check for paged out page */
-ram_page = get_page_from_gfn(curr->domain, ram_gfn, &p2mt, P2M_UNSHARE);
-if ( p2m_is_paging(p2mt) )
-{
-if ( ram_page )
-put_page(ram_page);
-p2m_mem_paging_populate(curr->domain, ram_gfn);
-return X86EMUL_RETRY;
-}
-if ( p2m_is_shared(p2mt) )
-{
-if ( ram_page )
-put_page(ram_page);
-return X86EMUL_RETRY;
-}
-
 /*
  * Weird-sized accesses have undefined behaviour: we discard writes
  * and read all-ones.
@@ -93,23 +75,10 @@ static int hvmemul_do_io(
 if ( unlikely((size > sizeof(long)) || (size & (size - 1))) )
 {
 gdprintk(XENLOG_WARNING, "bad mmio size %d\n", size);
-ASSERT(p_data != NULL); /* cannot happen with a REP prefix */
-if ( dir == IOREQ_READ )
-memset(p_data, ~0, size);
-if ( ram_page )
-put_page(ram_page);
 return X86EMUL_UNHANDLEABLE;
 }
 
-if ( !p.data_is_ptr && (dir == IOREQ_WRITE) )
-{
-memcpy(&p.data, p_data, size);
-p_data = NULL;
-}
-
-vio = &curr->arch.hvm_vcpu.hvm_io;
-
-if ( is_mmio && !p.data_is_ptr )
+if ( is_mmio && !data_is_addr )
 {
 /* Part of a multi-cycle read or write? */
 if ( dir == IOREQ_WRITE )
@@ -117,11 +86,7 @@ static int hvmemul_do_io(
 paddr_t pa = vio->mmio_large_write_pa;
 unsigned int bytes = vio->mmio_large_write_bytes;
 if ( (addr >= pa) && ((addr + size) <= (pa + bytes)) )
-{
-if ( ram_page )
-put_page(ram_page);
 return X86EMUL_OKAY;
-}
 }
 else
 {
@@ -131,8 +96,6 @@ static int hvmemul_do_io(
 {
 memcpy(p_data, &vio->mmio_large_read[addr - pa],
size);
-if ( ram_page )
-put_page(ram_page);
 return X86EMUL_OKAY;
 }
 }
@@ -144,40 +107,28 @@ static int hvmemul_do_io(
 break;
 case HVMIO_completed:
 vio->io_state = HVMIO_none;
-if ( p_data == NULL )
-{
-if ( ram_page )
-put_page(ram_page);
+if ( data_is_addr || dir == IOREQ_WRITE )
 return X86EMUL_UNHANDLEABLE;
-}
 goto finish_access;
 case HVMIO_dispatched:
 /* May have to wait for previous cycle of a multi-write to complete. */
-if ( is_mmio && !p.data_is_ptr && (dir == IOREQ_WRITE) &&
+if ( is_mmio && !data_is_addr && (dir == IOREQ_WRITE) &&
  (addr == (vio->mmio_large_write_pa +
vio->mmio_large_write_bytes)) )
-{
-if ( ram_page )
-put_page(ram_page);
 return X86EMUL_RETRY;
-}
 /* fallthrough */
 default:
-if ( ram_page )
-put

[Xen-devel] [PATCH v3 09/18] x86/hvm: unify dpci portio intercept with standard portio intercept

2015-06-23 Thread Paul Durrant

This patch re-works the dpci portio intercepts so that they can be unified
with standard portio handling thereby removing a substantial amount of
code duplication.

Signed-off-by: Paul Durrant 
Cc: Keir Fraser 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/hvm/hvm.c |2 +
 xen/arch/x86/hvm/intercept.c   |   22 ++--
 xen/arch/x86/hvm/io.c  |  225 +---
 xen/include/asm-x86/hvm/io.h   |8 ++
 xen/include/asm-x86/hvm/vcpu.h |2 +
 xen/include/xen/iommu.h|1 -
 6 files changed, 88 insertions(+), 172 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index c10db78..f8486f4 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1486,6 +1486,8 @@ int hvm_domain_initialise(struct domain *d)
 else
 d->arch.hvm_domain.io_bitmap = hvm_io_bitmap;
 
+register_dpci_portio_handler(d);
+
 if ( is_pvh_domain(d) )
 {
 register_portio_handler(d, 0, 0x10003, handle_pvh_io);
diff --git a/xen/arch/x86/hvm/intercept.c b/xen/arch/x86/hvm/intercept.c
index 5e8d8b2..5633959 100644
--- a/xen/arch/x86/hvm/intercept.c
+++ b/xen/arch/x86/hvm/intercept.c
@@ -116,10 +116,7 @@ static int hvm_process_io_intercept(struct hvm_io_handler 
*handler,
 ioreq_t *p)
 {
 struct hvm_vcpu_io *vio = ¤t->arch.hvm_vcpu.hvm_io;
-const struct hvm_io_ops *ops =
-(p->type == IOREQ_TYPE_COPY) ?
-&mmio_ops :
-&portio_ops;
+const struct hvm_io_ops *ops = handler->ops;
 int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
 uint64_t data;
 uint64_t addr;
@@ -237,16 +234,13 @@ static struct hvm_io_handler *hvm_find_io_handler(ioreq_t 
*p)
 {
 struct vcpu *curr = current;
 struct domain *curr_d = curr->domain;
-const struct hvm_io_ops *ops =
-(p->type == IOREQ_TYPE_COPY) ?
-&mmio_ops :
-&portio_ops;
 unsigned int i;
 
 for ( i = 0; i < curr_d->arch.hvm_domain.io_handler_count; i++ )
 {
 struct hvm_io_handler *handler =
 &curr_d->arch.hvm_domain.io_handler[i];
+const struct hvm_io_ops *ops = handler->ops;
 uint64_t start, end, count = p->count, size = p->size;
 
 if ( handler->type != p->type )
@@ -285,13 +279,7 @@ int hvm_io_intercept(ioreq_t *p)
 {
 struct hvm_io_handler *handler;
 
-if ( p->type == IOREQ_TYPE_PIO )
-{
-int rc = dpci_ioport_intercept(p);
-if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
-return rc;
-}
-else if ( p->type == IOREQ_TYPE_COPY )
+if ( p->type == IOREQ_TYPE_COPY )
 {
 int rc = stdvga_intercept_mmio(p);
 if ( (rc == X86EMUL_OKAY) || (rc == X86EMUL_RETRY) )
@@ -306,7 +294,7 @@ int hvm_io_intercept(ioreq_t *p)
 return hvm_process_io_intercept(handler, p);
 }
 
-static struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
+struct hvm_io_handler *hvm_next_io_handler(struct domain *d)
 {
 unsigned int i = d->arch.hvm_domain.io_handler_count++;
 
@@ -321,6 +309,7 @@ void register_mmio_handler(struct domain *d, const struct 
hvm_mmio_ops *ops)
 struct hvm_io_handler *handler = hvm_next_io_handler(d);
 
 handler->type = IOREQ_TYPE_COPY;
+handler->ops = &mmio_ops;
 handler->u.mmio.ops = ops;
 }
 
@@ -330,6 +319,7 @@ void register_portio_handler(struct domain *d, uint32_t 
addr,
 struct hvm_io_handler *handler = hvm_next_io_handler(d);
 
 handler->type = IOREQ_TYPE_PIO;
+handler->ops = &portio_ops;
 handler->u.portio.start = addr;
 handler->u.portio.end = addr + size;
 handler->u.portio.action = action;
diff --git a/xen/arch/x86/hvm/io.c b/xen/arch/x86/hvm/io.c
index c0964ec..51ef19a 100644
--- a/xen/arch/x86/hvm/io.c
+++ b/xen/arch/x86/hvm/io.c
@@ -208,185 +208,100 @@ void hvm_io_assist(ioreq_t *p)
 }
 }
 
-static int dpci_ioport_read(uint32_t mport, ioreq_t *p)
+static bool_t dpci_portio_accept(struct hvm_io_handler *handler,
+ uint64_t addr,
+ uint64_t size)
 {
-struct hvm_vcpu_io *vio = ¤t->arch.hvm_vcpu.hvm_io;
-int rc = X86EMUL_OKAY, i, step = p->df ? -p->size : p->size;
-uint32_t data = 0;
+struct vcpu *curr = current;
+struct hvm_iommu *hd = domain_hvm_iommu(curr->domain);
+struct hvm_vcpu_io *vio = &curr->arch.hvm_vcpu.hvm_io;
+struct g2m_ioport *g2m_ioport;
+uint32_t start, end;
+uint32_t gport = addr, mport;
 
-for ( i = 0; i < p->count; i++ )
+list_for_each_entry( g2m_ioport, &hd->arch.g2m_ioport_list, list )
 {
-if ( vio->mmio_retrying )
-{
-if ( vio->mmio_large_read_bytes != p->size )
-return X86EMUL_UNHANDLEABLE;
-memcpy(&data, vio->mmio_large_read, p->size);
-vio->mmio_large_read_bytes = 0;
-vio->mmio_retrying = 0;
-}
-else switch ( p->size )
-{
-case

[Xen-devel] [PATCH v3 00/18] x86/hvm: I/O emulation cleanup and fix

2015-06-23 Thread Paul Durrant

This patch series re-works much of the code involved in emulation of port
and memory mapped I/O for HVM guests.

The code has become very convoluted and, at least by inspection, certain
emulations will apparently malfunction.

The series is broken down into 18 patches (which are also available in
my xenbits repo: http://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git
on the emulation25 branch) as follows:

0001-x86-hvm-simplify-hvmemul_do_io.patch
0002-x86-hvm-remove-hvm_io_pending-check-in-hvmemul_do_io.patch
0003-x86-hvm-remove-extraneous-parameter-from-hvmtrace_io.patch
0004-x86-hvm-make-sure-translated-MMIO-reads-or-writes-fa.patch
0005-x86-hvm-remove-multiple-open-coded-chunking-loops.patch
0006-x86-hvm-re-name-struct-hvm_mmio_handler-to-hvm_mmio_.patch
0007-x86-hvm-unify-internal-portio-and-mmio-intercepts.patch
0008-x86-hvm-add-length-to-mmio-check-op.patch
0009-x86-hvm-unify-dpci-portio-intercept-with-standard-po.patch
0010-x86-hvm-unify-stdvga-mmio-intercept-with-standard-mm.patch
0011-x86-hvm-revert-82ed8716b-fix-direct-PCI-port-I-O-emu.patch
0012-x86-hvm-only-call-hvm_io_assist-from-hvm_wait_for_io.patch
0013-x86-hvm-split-I-O-completion-handling-from-state-mod.patch
0014-x86-hvm-remove-HVMIO_dispatched-I-O-state.patch
0015-x86-hvm-remove-hvm_io_state-enumeration.patch
0016-x86-hvm-use-ioreq_t-to-track-in-flight-state.patch
0017-x86-hvm-always-re-emulate-I-O-from-a-buffer.patch
0018-x86-hvm-track-large-memory-mapped-accesses-by-buffer.patch

v2:
 - Removed bogus assertion from patch 16
 - Re-worked patch 17 after basic testing of back-port onto XenServer

v3:
 - Addressed comments from Jan
 - Re-ordered series to bring a couple of more trivial patches to the
   front
 - Backport to XenServer (4.5) now passing automated tests
 - Tested on unstable with QEMU upstream and trad, with and without
   HAP (to force shadow emulation)

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

[Xen-devel] [PATCH v3 02/18] x86/hvm: remove hvm_io_pending() check in hvmemul_do_io()

2015-06-23 Thread Paul Durrant

The check is done at the wrong point (since it is irrelevant if the
I/O is to be handled by the hypervisor) and its functionality can be
covered by returning X86EMUL_UNHANDLEABLE from hvm_send_assist_req()
instead.

This patch also removes the domain_crash() call from
hvm_send_assist_req(). Returning X86EMUL_UNHANDLEABLE allows the
higher layers of emulation to decide what to do instead.

Signed-off-by: Paul Durrant 
Cc: Keir Fraser 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/arch/x86/hvm/emulate.c|   10 ++
 xen/arch/x86/hvm/hvm.c|   16 ++--
 xen/include/asm-x86/hvm/hvm.h |2 +-
 3 files changed, 9 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/emulate.c b/xen/arch/x86/hvm/emulate.c
index 9d7af0c..b412302 100644
--- a/xen/arch/x86/hvm/emulate.c
+++ b/xen/arch/x86/hvm/emulate.c
@@ -121,12 +121,6 @@ static int hvmemul_do_io(
 return X86EMUL_UNHANDLEABLE;
 }
 
-if ( hvm_io_pending(curr) )
-{
-gdprintk(XENLOG_WARNING, "WARNING: io already pending?\n");
-return X86EMUL_UNHANDLEABLE;
-}
-
 vio->io_state = (data_is_addr || dir == IOREQ_WRITE) ?
 HVMIO_dispatched : HVMIO_awaiting_completion;
 vio->io_size = size;
@@ -188,8 +182,8 @@ static int hvmemul_do_io(
 }
 else
 {
-rc = X86EMUL_RETRY;
-if ( !hvm_send_assist_req(s, &p) )
+rc = hvm_send_assist_req(s, &p);
+if ( rc != X86EMUL_RETRY )
 vio->io_state = HVMIO_none;
 else if ( data_is_addr || dir == IOREQ_WRITE )
 rc = X86EMUL_OKAY;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index d5e5242..535d622 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -2605,7 +2605,7 @@ int hvm_buffered_io_send(ioreq_t *p)
 return 1;
 }
 
-bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
+int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *proto_p)
 {
 struct vcpu *curr = current;
 struct domain *d = curr->domain;
@@ -2613,7 +2613,7 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, 
ioreq_t *proto_p)
 
 ASSERT(s);
 if ( unlikely(!vcpu_start_shutdown_deferral(curr)) )
-return 0; /* implicitly bins the i/o operation */
+return X86EMUL_OKAY;
 
 list_for_each_entry ( sv,
   &s->ioreq_vcpu_list,
@@ -2628,14 +2628,14 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, 
ioreq_t *proto_p)
 {
 gprintk(XENLOG_ERR, "device model set bad IO state %d\n",
 p->state);
-goto crash;
+break;
 }
 
 if ( unlikely(p->vp_eport != port) )
 {
 gprintk(XENLOG_ERR, "device model set bad event channel %d\n",
 p->vp_eport);
-goto crash;
+break;
 }
 
 proto_p->state = STATE_IOREQ_NONE;
@@ -2651,15 +2651,11 @@ bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, 
ioreq_t *proto_p)
  */
 p->state = STATE_IOREQ_READY;
 notify_via_xen_event_channel(d, port);
-break;
+return X86EMUL_RETRY;
 }
 }
 
-return 1;
-
- crash:
-domain_crash(d);
-return 0;
+return X86EMUL_UNHANDLEABLE;
 }
 
 void hvm_complete_assist_req(ioreq_t *p)
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 77eeac5..57f9605 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -230,7 +230,7 @@ void hvm_vcpu_reset_state(struct vcpu *v, uint16_t cs, 
uint16_t ip);
 
 struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
  ioreq_t *p);
-bool_t hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *p);
+int hvm_send_assist_req(struct hvm_ioreq_server *s, ioreq_t *p);
 void hvm_broadcast_assist_req(ioreq_t *p);
 void hvm_complete_assist_req(ioreq_t *p);
 
-- 
1.7.10.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

1 2 3 >

1 - 100 of 277 matches

Mail list logo