Re: [Xen-devel] [RFC] xen/arm: Handling cache maintenance instructions by set/way

2017-12-08 Thread Tim Deegan
Hi,

At 12:58 + on 06 Dec (1512565090), Julien Grall wrote:
> On 12/06/2017 12:28 PM, George Dunlap wrote:
> > 2. It sounds like rather than using PoD, you could use the
> > "misconfigured p2m table" technique that x86 uses: set bits in the p2m
> > entry which cause a specific kind of HAP fault when accessed.  The fault
> > handler then looks in the p2m entry, and if it finds an otherwise valid
> > entry, it just fixes the "misconfigured" bits and continues.
> 
> I thought about this. But when do you set the entry to misconfigured?
> 
> If you take the example of 32-bit Linux: there are a couple of full 
> cache cleans during the boot of a uni-processor. So you would need to go 
> through the p2m multiple times and reset the access bits.

My 2c (echoing what some others have already said):

+1 for avoiding the full majesty of PoD if you don't need it.

It should be possible to do something like the misconfigured-entry bit
trick by _allocating_ the memory up-front and building the p2m entries
but only making them usable by the {IO}MMUs on first access.  That
would make these early p2m walks shorter (because they can skip whole
subtrees that aren't marked present yet) without making major changes
to domain build or introducing run-time failures.

Also beware of DoS conditions -- a guest that touches all its memory
and then flushes by set/way mustn't be allowed to hurt the rest of the
system.  That probably means the set/way flush has to be preemptable.

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Ping: [PATCH] x86/mm: drop bogus assertion

2017-12-08 Thread Tim Deegan
Hi,

At 02:31 -0700 on 06 Dec (1512527481), Jan Beulich wrote:
> >>> On 30.11.17 at 10:10,  wrote:
> > i.e. the guest specified a runstate area address which has a non-present
> > mapping in the page tables [EC=0002 CR2=88003d405220], but that's
> > not something the hypervisor needs to be concerned about.) Release
> > builds work fine, which is a first indication that the assertion isn't
> > really needed.

Yep, this assertion should just go away, so:

Reviewed-by: Tim Deegan 

> > What's worse though - there appears to be a timing window where the
> > guest runs in shadow mode, but not in log-dirty mode, and that is what
> > triggers the assertion (the same could, afaict, be achieved by test-
> enabling shadow mode on a PV guest). This is because turning off log-
> > dirty mode is being performed in two steps: First the log-dirty bit gets
> > cleared (paging_log_dirty_disable() [having paused the domain] ->
> > sh_disable_log_dirty() -> shadow_one_bit_disable()), followed by
> > unpausing the domain and only then clearing shadow mode (via
> > shadow_test_disable(), which pauses the domain a second time).
> > 
> > Hence besides removing the ASSERT() here (or optionally replacing it by
> > explicit translate and refcounts mode checks, but this seems rather
> > pointless now that the three are tied together) I wonder whether either
> > shadow_one_bit_disable() should turn off shadow mode if no other bit
> > besides PG_SH_enable remains set (just like shadow_one_bit_enable()
> > enables it if not already set), or the domain pausing scope should be
> > extended so that both steps occur without the domain getting a chance to
> > run in between.

I'd be fine with either of those.  It would avoid unexpected surprises
in the relatively untested pv-shadow-without-logdirty paths.  

Tim.


Re: [Xen-devel] [RFC] xen/arm: Handling cache maintenance instructions by set/way

2017-12-10 Thread Tim Deegan
At 14:38 + on 08 Dec (1512743913), Julien Grall wrote:
> On 08/12/17 08:03, Tim Deegan wrote:
> > +1 for avoiding the full majesty of PoD if you don't need it.
> > 
> > It should be possible to do something like the misconfigured-entry bit
> > trick by _allocating_ the memory up-front and building the p2m entries
> > but only making them usable by the {IO}MMUs on first access.  That
> > would make these early p2m walks shorter (because they can skip whole
> > subtrees that aren't marked present yet) without making major changes
> > to domain build or introducing run-time failures.
> 
> I am not aware of any way on Arm to misconfigure an entry. We do have 
> valid and access bits, although they will affect the IOMMU as well. So 
> it will not be possible to get page-table sharing with this "feature" 
> enabled.

How unfortunate.  How does KVM's demand-population scheme handle the IOMMU? 

Tim.


Re: [Xen-devel] [PATCH v8 1/6] x86/domain: remove the 'oos_off' flag

2019-09-03 Thread Tim Deegan
At 15:50 +0100 on 02 Sep (1567439409), Paul Durrant wrote:
> The flag is not needed since the domain 'options' can now be tested
> directly.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Tim Deegan 


Re: [Xen-devel] [PATCH] x86/shadow: don't enable shadow mode with too small a shadow allocation (part 2)

2019-09-05 Thread Tim Deegan
At 09:55 +0200 on 04 Sep (1567590940), Jan Beulich wrote:
> Commit 2634b997af ("x86/shadow: don't enable shadow mode with too small
> a shadow allocation") was incomplete: The adjustment done there to
> shadow_enable() is also needed in shadow_one_bit_enable(). The (new)
> problem report was (apparently) a failed PV guest migration followed by
> another migration attempt for that same guest. Disabling log-dirty mode
> after the first one had left a couple of shadow pages allocated (perhaps
> something that also wants fixing), and hence the second enabling of
> log-dirty mode wouldn't have allocated anything further.
> 
> Reported-by: James Wang 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 

> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -2864,7 +2864,8 @@ static int shadow_one_bit_enable(struct
>  
>  mode |= PG_SH_enable;
>  
> -if ( d->arch.paging.shadow.total_pages == 0 )
> +if ( d->arch.paging.shadow.total_pages <
> + sh_min_allocation(d) + d->arch.paging.shadow.p2m_pages )
>  {
>  /* Init the shadow memory allocation if the user hasn't done so */
>  if ( shadow_set_allocation(d, 1, NULL) != 0 )


Re: [Xen-devel] [PATCH] x86/shadow: fold p2m page accounting into sh_min_allocation()

2019-09-11 Thread Tim Deegan
At 10:34 +0200 on 05 Sep (1567679687), Jan Beulich wrote:
> This is to make the function live up to the promise its name makes. And
> it simplifies all callers.
> 
> Suggested-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH v2] x86/shadow: adjust minimum allocation calculations

2019-02-09 Thread Tim Deegan
At 04:41 -0700 on 07 Feb (1549514492), Jan Beulich wrote:
> A previously bad situation has become worse with the early setting of
> ->max_vcpus: The value returned by shadow_min_acceptable_pages() has
> further grown, and hence now holds back even more memory from use for
> the p2m.
> 
> Make sh_min_allocation() account for all p2m memory needed for
> shadow_enable() to succeed during domain creation (at which point the
> domain has no memory at all allocated to it yet, and hence use of
> d->tot_pages is meaningless).
> 
> Also make shadow_min_acceptable_pages() no longer needlessly add 1 to
> the vCPU count.
> 
> Finally make the debugging printk() in shadow_alloc_p2m_page() a little
> more useful by logging some of the relevant domain settings.
> 
> Reported-by: Roger Pau Monné 
> Signed-off-by: Jan Beulich 
> Reviewed-by: Roger Pau Monné 

Acked-by: Tim Deegan 

I don't think it's worth bikeshedding too hard over individual
pages on this path; these numbers look OK.

Cheers,

Tim.



Re: [Xen-devel] [PATCH] x86/shadow: don't use map_domain_page_global() on paths that may not fail

2019-02-22 Thread Tim Deegan
At 08:15 -0700 on 20 Feb (1550650529), Jan Beulich wrote:
> The assumption (according to one comment) and hope (according to
> another) that map_domain_page_global() can't fail are both wrong on
> large enough systems. Do away with the guest_vtable field altogether,
> and establish / tear down the desired mapping as necessary.
> 
> The alternatives, discarded as being undesirable, would have been to
> either crash the guest in sh_update_cr3() when the mapping fails, or to
> bubble up an error indicator, which upper layers would have a hard time
> dealing with (other than, again, by crashing the guest).
> 
> Signed-off-by: Jan Beulich 

I follow your argument, so Acked-by: Tim Deegan 

I would expect this to have a measurable cost on page fault times (on
configurations where global map isn't just a directmap).  It would be
good to know if that's the case.

Cheers,

Tim.



Re: [Xen-devel] [PATCH 0/2] x86/shadow: two tiny further bits of PV/HVM separation

2019-03-11 Thread Tim Deegan
At 10:56 -0600 on 11 Mar (1552301785), Jan Beulich wrote:
> 1: sh_validate_guest_pt_write() is HVM-only
> 2: sh_{write,cmpxchg}_guest_entry() are PV-only

Acked-by: Tim Deegan 

Thanks,

Tim.


Re: [Xen-devel] [PATCH] x86/shadow: Drop incorrect diagnostic when shadowing TSS.RSP0

2019-04-16 Thread Tim Deegan
At 13:32 +0100 on 08 Apr (1554730320), Andrew Cooper wrote:
> On 08/04/2019 13:11, Jan Beulich wrote:
> >>>> On 08.04.19 at 13:37,  wrote:
> >> On 08/04/2019 11:14, Jan Beulich wrote:
> >>>>>> On 05.04.19 at 21:09,  wrote:
> >>>> --- a/xen/arch/x86/mm/shadow/multi.c
> >>>> +++ b/xen/arch/x86/mm/shadow/multi.c
> >>>> @@ -3305,8 +3305,9 @@ static int sh_page_fault(struct vcpu *v,
> >>>>  {
> >>>>  /*
> >>>>   * If we are in the middle of injecting an exception or 
> >>>> interrupt then
> >>>> - * we should not emulate: it is not the instruction at %eip 
> >>>> that caused
> >>>> - * the fault. Furthermore it is almost certainly the case the 
> >>>> handler
> >>>> + * we should not emulate: the fault is a side effect of the 
> >>>> processor
> >>>> + * trying to push an exception frame onto a stack which has yet 
> >>>> to be
> >>>> + * shadowed.  Furthermore it is almost certainly the case the 
> >>>> handler
> >>>>   * stack is currently considered to be a page table, so we 
> >>>> should
> >>>>   * unshadow the faulting page before exiting.
> >>>>   */
> > I'm (at least) mildly confused: I follow what you write (I think), but
> > you again say "the stack always becomes shadowed". My original
> > question was whether you really mean that, as stacks, if at all,
> > should get shadowed only unintentionally (and hence get un-shadowed
> > immediately when that condition is detected). That is, my (slightly
> > rephrased) question stands: Do you perhaps mean the page tables
> > mapping the stack to become shadowed, rather than the stack itself?
> 
> I guess this is an issue of terminology, to which I defer to Tim to judge.

Hi,

Sorry for the delay; I have been away.

FWIW I prefer the original comment, and I think that referring to the
stack as "not yet shadowed" is confusing when the problem might be
that the stack itself is indeed shadowed.  I'd also be happy with just
saying "the fault is a side effect of the processor trying to push an
exception frame onto the stack."

Happy to see the debug message being removed if it's being triggered
in the real world.  IIRC it has not proved to be useful since that
code was first developed.  So, with the comment adjusted,
Acked-by: Tim Deegan 

Cheers,

Tim.


Re: [Xen-devel] [PATCH] Fix the KDD_LOG statements to use appropriate format specifier for printing uint64_t

2019-12-01 Thread Tim Deegan
At 03:11 -0500 on 30 Nov (1575083478), Julian Tuminaro wrote:
> The previous commit to kdd.c had a small issue which led to a
> warning/error while compiling on 32-bit systems, due to a type-size
> mismatch when casting from uint64_t to void *.
> 
> Signed-off-by: Jenish Rakholiya 
> Signed-off-by: Julian Tuminaro 

Acked-by: Tim Deegan 

Thanks for the fix!

Tim.


[Xen-devel] [PATCH] MAINTAINERS: drop Tim Deegan from 'The Rest'

2019-10-16 Thread Tim Deegan
I have not been active in this role for a while now.

Signed-off-by: Tim Deegan 
---
 MAINTAINERS | 1 -
 1 file changed, 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 533cfdc08f..f60d765baf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -537,7 +537,6 @@ M:  Jan Beulich 
 M: Julien Grall 
 M: Konrad Rzeszutek Wilk 
 M: Stefano Stabellini 
-M: Tim Deegan 
 M: Wei Liu 
 L: xen-devel@lists.xenproject.org
 S: Supported
-- 
2.20.1


Re: [Xen-devel] [PATCH 1/1] kdd.c: Add support for initial handshake in KD protocol for Win 7, 8 and 10 (64 bit)

2019-11-10 Thread Tim Deegan
Hi,

At 21:24 -0500 on 05 Nov (1572989067), Julian Tuminaro wrote:
> The current implementation of find_os is based on hard-coded values for
> different Windows versions. It uses the value to get the address at which
> to start looking for the DOS header in the given range. However, this
> is not scalable to all versions of Windows, as it would require us to keep
> adding new entries; also, due to KASLR, the chances of not hitting the PE
> header are significant. We implement a way for 64-bit systems to use an IDT
> entry to get a valid exception/interrupt handler and then move back through
> memory to find the valid DOS header. Since IDT entries are protected
> by PatchGuard, we think our assumption that IDT entries will not be
> corrupted is valid for our purpose. Once we have the image base, we
> search for the DBGKD_GET_VERSION64 structure in the .data section to
> get the information required for the handshake.

Thank you very much for this - it looks super-useful!  Does this
technique only work if the guest kernel has debugging enabled, or can
it work on all systems?

I have some comments on the code, below.

>  /* Windows version details */
>  typedef struct {
>  uint32_t build; 
> @@ -62,6 +64,7 @@ typedef struct {
>  uint32_t version;   /* +-> NtBuildNumber */
>  uint32_t modules;   /* +-> PsLoadedModuleList */
>  uint32_t prcbs; /* +-> KiProcessorBlock */
> +uint32_t kddl;

This needs a comment describing the Windows name of what it points to.

> +/**
> + * @brief Parse the memory at \a filebase as a valid DOS header and get
> + * the virtual address offset and size for any given section name (if it exists)
> + *
> + * @param s Pointer to the kdd_state structure
> + * @param filebase Base address of the file structure
> + * @param sectname Pointer to the section name c-string to look for
> + * @param vaddr Pointer to write the virtual address of section start to
> + * (if found)
> + * @param vsize Pointer to write the section size to (if found)
> + *
> + * @return -1 on failure to find the section name
> + * @return 0 on success
> + */
> +static int get_pe64_sections(kdd_state *s, uint64_t filebase, char *sectname,
> +uint64_t *vaddr, uint32_t *vsize)

These new functions don't belong in the 'Utility functions' section.
Please move them to beside the other OS-finding code.

> +{
> +uint8_t buf[0x30];

PE_SECT_ENT_SZ, please.

> +uint64_t pe_hdr;
> +uint64_t sect_start;
> +uint16_t num_sections;
> +int ret;
> +
> +ret = -1;
> +
> +if (!s->os.w64) {
> +return ret;
> +}
> +
> +// read PE header offset
> +if (kdd_read_virtual(s, s->cpuid, filebase + DOS_HDR_PE_OFF, DOS_HDR_PE_SZ,
> +buf) != DOS_HDR_PE_SZ) {
> +return -1;
> +}
> +pe_hdr = filebase + *(uint32_t *)buf;

Here and elsewhere, please read directly into the variables, e.g.:

  uint32_t pe_hdr;
  kdd_read_virtual(s, s->cpuid, filebase + DOS_HDR_PE_OFF,
   sizeof pe_hdr, &pe_hdr);
  pe_hdr += filebase;

That gives neater code and avoids any confusion about the sizes of
various copies and buffers.

> +
> +// read number of sections
> +if (kdd_read_virtual(s, s->cpuid, pe_hdr + PE_NUM_SECTION_OFF,
> +PE_NUM_SECTION_SZ, &buf) != PE_NUM_SECTION_SZ) {
> +return -1;
> +}
> +num_sections = *(uint16_t *)buf;

This needs a check for very large numbers -- loading 65,535 section
headers might take a long time.

> +// read size of optional header
> +if (kdd_read_virtual(s, s->cpuid, pe_hdr + PE_OPT_HDR_SZ_OFF,
> +PE_OPT_HDR_SZ_SZ, &buf) != PE_OPT_HDR_SZ_SZ) {
> +return -1;
> +}
> +
> +// 0x18 is the size of PE header
> +sect_start = pe_hdr + PE_HDR_SZ + *(uint16_t *)buf;
> +
> +for (int i = 0; i < num_sections; i++) {
> +if (kdd_read_virtual(s, s->cpuid, sect_start + (i * PE_SECT_ENT_SZ),
> +PE_SECT_ENT_SZ, &buf) != PE_SECT_ENT_SZ) {
> +return -1;
> +}
> +
> +if (!strncmp(sectname, (char *)(buf + PE_SECT_NAME_OFF),
> +PE_SECT_NAME_SZ)) {
> +*vaddr = filebase + *(uint32_t *)(buf + PE_SECT_RVA_OFF);
> +*vsize = *(uint32_t *)(buf + PE_SECT_VSIZE_OFF);
> +ret = 0;
> +break;

Just 'return 0' will do here, and..

> +}
> +}
> +
> +return ret;

return -1 here, and drop 'ret'.

> +}
> +
> +/**
> + * @brief Get the OS information like base address, minor version,
> + * PsLoadedModuleList and DebuggerDataList (basically the fields of
> + * DBGKD_GET_VERSION64 structure required to do the handshake).
> + *
> + * This is done by reading the IDT entry for divide-by-zero exception and
> + * searching back into the memory for DOS header (which is our kernel base).
> + * Once we have the kernel base, we parse the PE header and look for kernel
> + * base address in the .data section. Once we have possible va

Re: [Xen-devel] [PATCH V2] kdd.c: Add support for initial handshake in KD protocol for Win 7, 8 and 10 (64 bit)

2019-11-14 Thread Tim Deegan
Hi,

At 23:55 -0500 on 13 Nov (1573689341), Julian Tuminaro wrote:
> From: Julian Tuminaro and Jenish Rakholiya  rakholiyajenish...@gmail.com>
> 
> The current implementation of find_os is based on hard-coded values for
> different Windows versions. It uses the value to get the address at which
> to start looking for the DOS header in the given range. However, this
> is not scalable to all versions of Windows, as it would require us to keep
> adding new entries; also, due to KASLR, the chances of not hitting the PE
> header are significant. We implement a way for 64-bit systems to use an IDT
> entry to get a valid exception/interrupt handler and then move back through
> memory to find the valid DOS header. Since IDT entries are protected
> by PatchGuard, we think our assumption that IDT entries will not be
> corrupted is valid for our purpose. Once we have the image base, we
> search for the DBGKD_GET_VERSION64 structure in the .data section to
> get the information required for the handshake.

Thanks for the updates, this looks good!

Reviewed-by: Tim Deegan 



Re: [Xen-devel] [PATCH] x86/shadow: don't enable shadow mode with too small a shadow allocation

2018-11-30 Thread Tim Deegan
At 07:53 -0700 on 29 Nov (1543477992), Jan Beulich wrote:
> We've had more than one report of host crashes after failed migration,
> and in at least one case we've had a hint towards a too far shrunk
> shadow allocation pool. Instead of just checking the pool for being
> empty, check whether the pool is smaller than what
> shadow_set_allocation() would minimally bump it to if it was invoked in
> the first place.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 

Thanks,

Tim.


Re: [Xen-devel] [PATCH 7/7] tools/kdd: mute spurious gcc warning

2018-04-06 Thread Tim Deegan
Hi,

At 16:32 +0200 on 06 Apr (1523032362), Marek Marczykowski-Górecki wrote:
> On Fri, Apr 06, 2018 at 09:56:05AM -0400, Boris Ostrovsky wrote:
> > Can we instead pre-compute the pointer to pacify the compiler? I haven't
> > seen the original error so I can't test it, but something like
> 
> Nope, it doesn't help. But adding "if (offset > 0)" before that "+=
> offset" does... 
> For me it looks like a gcc bug. Not sure how to deal with this. Enclose
> #pragma with #if __GNUC__ >= 8 ?

If the logic can be reshuffled so that gcc 8 is happy with it, that would
be better than silencing the warning, IMO.  As far as I can see the
pointer is indeed safe[1], and if the compiler's going to decide
that it's not, I'd rather have the warning than have it silently
optimized out.

Cheers,

Tim.

[1] except a possible memcpy(dst, one-past-the-end-of-the-struct, 0),
which is not what's being complained of here, and is a good deal
too theological for me while I'm on my holiday.

> > diff --git a/tools/debugger/kdd/kdd.c b/tools/debugger/kdd/kdd.c
> > index 61d769e..1b048ac 100644
> > --- a/tools/debugger/kdd/kdd.c
> > +++ b/tools/debugger/kdd/kdd.c
> > @@ -688,6 +688,7 @@ static void kdd_handle_read_ctrl(kdd_state *s)
> >  } else {
> >  /* 32-bit control-register space starts at 0x[2]cc, for 84 bytes */
> >  uint64_t offset = addr;
> > +    void *ptr = &ctrl.c32;
> >  if (offset > 0x200)
> >  offset -= 0x200;
> >  offset -= 0xcc;
> > @@ -695,10 +696,8 @@ static void kdd_handle_read_ctrl(kdd_state *s)
> >  KDD_LOG(s, "Request outside of known control space\n");
> >  len = 0;
> >  } else {
> > -#pragma GCC diagnostic push
> > -#pragma GCC diagnostic ignored "-Warray-bounds"
> > -    memcpy(buf, ((uint8_t *)&ctrl.c32) + offset, len);
> > -#pragma GCC diagnostic pop
> > +    ptr += offset;
> > +    memcpy(buf, ptr, len);
> >  }
> >  }
> >  
> > -boris
> 
> -- 
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?




Re: [Xen-devel] [PATCH] tools/kdd: use mute -Warray-bounds only on new gcc version

2018-04-07 Thread Tim Deegan
At 00:39 +0200 on 07 Apr (1523061555), Marek Marczykowski-Górecki wrote:
> On Fri, Apr 06, 2018 at 06:12:50PM +0100, Wei Liu wrote:
> > On Fri, Apr 06, 2018 at 05:32:57PM +0200, Marek Marczykowski-Górecki wrote:
> > Oh thanks for the quick turnaround.
> > 
> > Since Tim thinks it is better to not disable the warning -- how about
> > using assert() to give the compiler a hint? Would that work?
> 
> No, it doesn't.
> 
> Changing offset type to uint32_t, or unsigned int works. Also adding
> "offset &= 0x2ff" helps (but changes behavior). And now I wonder if
> this warning isn't legitimate - maybe there is some int overflow case
> that I don't see?

Huh.  I don't see an overflow - if anything, having offset be 64bit
should make it safer.  But having offset be uint32_t should be fine
too, so if it makes gcc happy, that's fine.

Cheers,

Tim.


Re: [Xen-devel] [PATCH] tools/kdd: silence gcc 8 warning a different way

2018-04-16 Thread Tim Deegan


At 09:29 +0100 on 16 Apr (1523870989), Wei Liu wrote:
> Cc Tim
> On Thu, Apr 12, 2018 at 06:04:49AM -0600, Jan Beulich wrote:
> > Older gcc doesn't like "#pragma GCC diagnostic" inside functions.
> > 
> > Signed-off-by: Jan Beulich 
> > 
> > --- a/tools/debugger/kdd/kdd.c
> > +++ b/tools/debugger/kdd/kdd.c
> > @@ -695,10 +695,10 @@ static void kdd_handle_read_ctrl(kdd_sta
> >  KDD_LOG(s, "Request outside of known control space\n");
> >  len = 0;
> >  } else {
> > -#pragma GCC diagnostic push
> > -#pragma GCC diagnostic ignored "-Warray-bounds"
> > -memcpy(buf, ((uint8_t *)&ctrl.c32) + offset, len);
> > -#pragma GCC diagnostic pop
> > +/* Suppress bogus gcc 8 "out of bounds" warning. */
> > +const uint8_t *src;
> > +asm ("" : "=g" (src) : "0" ((uint8_t *)&ctrl.c32 + offset));

That's terrifying!  Does casting the offset to uint32_t not DTRT?

Tim.


Re: [Xen-devel] [PATCH v7 1/9] x86/xpti: avoid copying L4 page table contents when possible

2018-04-16 Thread Tim Deegan
Hi,

At 02:43 -0600 on 13 Apr (1523587395), Jan Beulich wrote:
> >>> On 12.04.18 at 20:09,  wrote:
> > For mitigation of Meltdown the current L4 page table is copied to the
> > cpu local root page table each time a 64 bit pv guest is entered.
> > 
> > Copying can be avoided in cases where the guest L4 page table hasn't
> > been modified while running the hypervisor, e.g. when handling
> > interrupts or any hypercall not modifying the L4 page table or %cr3.
> > 
> > So add a per-cpu flag indicating whether the copying should be
> > performed and set that flag only when loading a new %cr3 or modifying
> > the L4 page table.  This includes synchronization of the cpu local
> > root page table with other cpus, so add a special synchronization flag
> > for that case.
> > 
> > A simple performance check (compiling the hypervisor via "make -j 4")
> > in dom0 with 4 vcpus shows a significant improvement:
> > 
> > - real time drops from 112 seconds to 103 seconds
> > - system time drops from 142 seconds to 131 seconds
> > 
> > Signed-off-by: Juergen Gross 
> > Reviewed-by: Jan Beulich 
> > ---
> > V7:
> > - add missing flag setting in shadow code
> 
> This now needs an ack from Tim (now Cc-ed).

I may be misunderstanding how this flag is supposed to work, but this
seems at first glance to do both too much and too little.  The sl4
table that's being modified may be in use on any number of other
pcpus, and might not be in use on the current pcpu.

It looks like the do_mmu_update path issues a flush IPI; I think that
the equivalent IPI would be a better place to hook if you can.

Also I'm not sure why the flag needs to be set in
l4e_propagate_from_guest() as well as shadow_set_l4e().  Can you
elaborate?

Cheers,

Tim.


Re: [Xen-devel] [PATCH v8 1/9] x86/xpti: avoid copying L4 page table contents when possible

2018-04-21 Thread Tim Deegan
Hi,

At 09:44 +0200 on 19 Apr (1524131080), Juergen Gross wrote:
> >> So either I'm adding some kind of locking/rcu, or I'm switching to use
> >> IPIs and access root_pgt_changed only locally.
> >>
> >> Do you have any preference?
> > 
> > Since issuing an IPI is just a single call, I'd prefer not to have new
> > (locking, rcu, or whatever else) logic added here. Unless of course
> > someone, in particular Tim, thinks sending an IPI here is a rather bad idea.

AFAICS you're calling this from shadow code whenever it changes an
L4e, so I'd rather not have an IPI here if we don't need it.

> Another alternative would be to pass another flag to the callers to
> signal the need for a flush. This would require quite some modifications
> to shadow code I'd like to avoid, though. OTOH this way we could combine
> flushing the tlb and the root page tables. Tim, any preferences?

This sounds like a promising direction, but it should be doable without
major surgery to the shadow code.  The shadow code already leaves old sl4es
visible (in TLBs) when it's safe to do so, so I think the right place
to hook this is on the receiving side of the TLB flush IPI.  IOW as
long as:
 - you copy the L4 on context switch; and
 - you copy it on the TLB flush IPI is received
then you can rely on the existing TLB flush mechanisms to do what you need.
And shadow doesn't have to behave differently from 'normal' PV MM.

Do you think it needs more (in particular to avoid the L4 copy on TLB
flushes?)  Would a per-domain flag be good enough if per-vcpu is
difficult?

Tim.



Re: [Xen-devel] [PATCH v8 1/9] x86/xpti: avoid copying L4 page table contents when possible

2018-04-22 Thread Tim Deegan
At 19:11 +0200 on 21 Apr (1524337893), Juergen Gross wrote:
> On 21/04/18 15:32, Tim Deegan wrote:
> > At 09:44 +0200 on 19 Apr (1524131080), Juergen Gross wrote:
> >> Another alternative would be to pass another flag to the callers to
> >> signal the need for a flush. This would require quite some modifications
> >> to shadow code I'd like to avoid, though. OTOH this way we could combine
> >> flushing the tlb and the root page tables. Tim, any preferences?
> > 
> > This sounds like a promising direction, but it should be doable without
> > major surgery to the shadow code.  The shadow code already leaves old sl4es
> > visible (in TLBs) when it's safe to do so, so I think the right place
> > to hook this is on the receiving side of the TLB flush IPI.  IOW as
> > long as:
> >  - you copy the L4 on context switch; and
> >  - you copy it on the TLB flush IPI is received
> > then you can rely on the existing TLB flush mechanisms to do what you need.
> > And shadow doesn't have to behave differently from 'normal' PV MM.
> 
> It is not so easy. The problem is that e.g. a page fault will flush the
> TLB entry for the page in question, but it won't lead to the L4 to be
> copied.

Oh yes, I see; thanks for the explanation.  It might be worth copying
what the hardware does here, and checking/propagating the relevant l4e
in the PV pagefault handler.

Tim.



Re: [Xen-devel] [PATCH v8 1/9] x86/xpti: avoid copying L4 page table contents when possible

2018-04-24 Thread Tim Deegan
At 07:45 +0200 on 23 Apr (1524469545), Juergen Gross wrote:
> On 22/04/18 18:39, Tim Deegan wrote:
> > At 19:11 +0200 on 21 Apr (1524337893), Juergen Gross wrote:
> >> On 21/04/18 15:32, Tim Deegan wrote:
> >>> At 09:44 +0200 on 19 Apr (1524131080), Juergen Gross wrote:
> >>>> Another alternative would be to pass another flag to the callers to
> >>>> signal the need for a flush. This would require quite some modifications
> >>>> to shadow code I'd like to avoid, though. OTOH this way we could combine
> >>>> flushing the tlb and the root page tables. Tim, any preferences?
> >>>
> >>> This sounds like a promising direction, but it should be doable without
> >>> major surgery to the shadow code.  The shadow code already leaves old sl4es
> >>> visible (in TLBs) when it's safe to do so, so I think the right place
> >>> to hook this is on the receiving side of the TLB flush IPI.  IOW as
> >>> long as:
> >>>  - you copy the L4 on context switch; and
> >>>  - you copy it on the TLB flush IPI is received
> >>> then you can rely on the existing TLB flush mechanisms to do what you 
> >>> need.
> >>> And shadow doesn't have to behave differently from 'normal' PV MM.
> >>
> >> It is not so easy. The problem is that e.g. a page fault will flush the
> >> TLB entry for the page in question, but it won't lead to the L4 to be
> >> copied.
> > 
> > Oh yes, I see; thanks for the explanation.  It might be worth copying
> > what the hardware does here, and checking/propagating the relevant l4e
> > in the PV pagefault handler.
> 
> While in the long run being an interesting option I'm not sure I want
> to go this route for 4.11. There might be nasty corner cases and I think
> such a lazy approach is much more error prone than doing explicit
> updates of the L4 table on the affected cpus in case of a modified
> entry. I think we should either do the explicit call of flush_mask() in
> shadow_set_l4e() or propagate the need for the flush up to the caller.

FWIW, I disagree -- I think that having the fault handler DTRT and
relying on the existing, tested, TLB-flush logic is more likely to
work than introducing a new mechanism that _also_ has to catch every
possible l4e update.  It should touch less code and be less likely to
break with later changes.  And I think it would be better to do it
'properly' now than to hope there's time to revisit it later.  That
said, if Jan agrees that this way is OK, I'll quit grumbling and
review the shadow parts. :)

I think that setting the bits in shadow_set_l4e() is better than
having this leak out into all the callers.  I'm happy to see that the
hunk in l4e_propagate_from_guest() has gone away too.

Please move the shadow_set_l4e() hunk up so it's just after the write,
and before the general TLB flush logic.

Please move the logic into your code: the new function should take a
domain pointer and do all the filtering itself rather than have shadow
code be aware of what xpti is or why the domain's dirty-cpumask is
relevant.

It doesn't look like there's any check limiting this to PV guests, and
I think there should be, right?

Cheers,

Tim.

Re: [Xen-devel] [PATCH v9 1/9] x86/xpti: avoid copying L4 page table contents when possible

2018-04-26 Thread Tim Deegan
Hi,

At 13:33 +0200 on 26 Apr (1524749590), Juergen Gross wrote:
> For mitigation of Meltdown the current L4 page table is copied to the
> cpu local root page table each time a 64 bit pv guest is entered.
> 
> Copying can be avoided in cases where the guest L4 page table hasn't
> been modified while running the hypervisor, e.g. when handling
> interrupts or any hypercall not modifying the L4 page table or %cr3.
> 
> So add a per-cpu flag indicating whether the copying should be
> performed and set that flag only when loading a new %cr3 or modifying
> the L4 page table.  This includes synchronization of the cpu local
> root page table with other cpus, so add a special synchronization flag
> for that case.
> 
> A simple performance check (compiling the hypervisor via "make -j 4")
> in dom0 with 4 vcpus shows a significant improvement:
> 
> - real time drops from 112 seconds to 103 seconds
> - system time drops from 142 seconds to 131 seconds
> 
> Signed-off-by: Juergen Gross 
> Reviewed-by: Jan Beulich 

Acked-by: Tim Deegan 

Re: [Xen-devel] [PATCH v2 07/10] xen/arm: Release maintenance interrupt when CPU is hot-unplugged

2018-04-26 Thread Tim Deegan
At 11:08 +0100 on 26 Apr (1524740921), Julien Grall wrote:
>  On 20/04/18 13:25, Mirela Simonovic wrote:
> >> This looks a bit weird. AFAIU, if you disable the CPU interface, then you
> >> should never receive interrupts afterwards. So why would you re-enable them?
> >>
> >> I realize the code in __cpu_disable does that, but this looks quite wrong to
> >> me. There is no way to receive a queued timer interrupt afterwards.
> >>
> > 
> > That is what I took as a reference, but I asked myself the same.
> > There is (extremely small, but it exists) time window between
> > disabling irq locally and disabling CPU interface. An interrupt
> > received in that time window would propagate to the CPU but I'm not
> > sure would happen after the GIC CPU interface is disabled and
> > interrupts are locally enabled. That is the only explanation I can
> > come up with, although I believe the answer is nothing. Since you're
> > at ARM you could check this internally.
> 
> Speaking with Andre (in CC), the GIC CPU interface may have forwarded an 
> interrupt to the processor before it gets disabled. So when the 
> interrupt will be re-enabled, the processor will jump to the interrupt 
> exception entry.
> 
> However, looking at the spec (4-78 in ARM IHI 0048B.b), Xen will read a 
> spurious interrupt ID from GICC_IAR. So I am not sure what the point of 
> that code is. It looks like it has been taken from x86, but some bits are
> missing.
> 
> AFAIU, x86 will only suspend the timer afterwards (see time_suspend). I 
> am not fully sure why this code is there on Arm. Whether we expect a 
> timer interrupt to come up. Stefano, Tim, do you have any insight on 
> that code?

Sorry, no.  I pretty clearly copied this logic from x86, which copied
it directly from Linux at some point in the past.  I don't know why
x86 does it this way, and I haven't dived into linux to find out. :)
But draining the outstanding IRQs seems like a polite thing to do if
you're ever going to re-enable this CPU (at least without resetting
it first).

Cheers,

Tim.


Re: [Xen-devel] [PATCH v2 07/10] xen/arm: Release maintenance interrupt when CPU is hot-unplugged

2018-04-27 Thread Tim Deegan
Hi,

At 10:28 +0100 on 27 Apr (1524824906), Julien Grall wrote:
> On 26/04/18 15:23, Tim Deegan wrote:
> > At 11:08 +0100 on 26 Apr (1524740921), Julien Grall wrote:
> >>>>>> On 20/04/18 13:25, Mirela Simonovic wrote:
> >>>> This looks a bit weird. AFAIU, if you disable the CPU interface, then you
> >>>> should never receive interrupts afterwards. So why would you re-enable them?
> >>>>
> >>>> I realize the code in __cpu_disable does that, but this looks quite wrong
> >>>> to me. There is no way to receive a queued timer interrupt afterwards.
> >>>>
> >>>
> >>> That is what I took as a reference, but I asked myself the same.
> >>> There is (extremely small, but it exists) time window between
> >>> disabling irq locally and disabling CPU interface. An interrupt
> >>> received in that time window would propagate to the CPU but I'm not
> >>> sure what would happen after the GIC CPU interface is disabled and
> >>> interrupts are locally enabled. That is the only explanation I can
> >>> come up with, although I believe the answer is nothing. Since you're
> >>> at ARM you could check this internally.
> >>
> >> Speaking with Andre (in CC), the GIC CPU interface may have forwarded an
> >> interrupt to the processor before it gets disabled. So when the
> >> interrupt will be re-enabled, the processor will jump to the interrupt
> >> exception entry.
> >>
> >> However, looking at the spec (4-78 in ARM IHI 0048B.b), Xen will read a
> >> spurious interrupt ID from GICC_IAR. So I am not sure what the point of
> >> that code is. It looks like it has been taken from x86, but some bits are
> >> missing.
> >>
> >> AFAIU, x86 will only suspend the timer afterwards (see time_suspend). I
> >> am not fully sure why this code is there on Arm. Whether we expect a
> >> timer interrupt to come up. Stefano, Tim, do you have any insight on
> >> that code?
> > 
> > Sorry, no.  I pretty clearly copied this logic from x86, which copied
> > it directly from Linux at some point in the past.  I don't know why
> > x86 does it this way, and I haven't dived into linux to find out. :)
> > But draining the outstanding IRQs seems like a polite thing to do if
> > you're ever going to re-enable this CPU (at least without resetting
> > it first).
> 
> I am not entirely sure what you mean by draining; do you mean they will be
> serviced by Xen? If so, what kind of interrupts do you expect to be
> serviced (e.g. PPIs, SPIs)?

All I mean is, when you disable the GICC (or APIC, or whatever), you
know that it won't send any more interrupts to the CPU.  But you would
like to also be certain that any interrupts it already sent to the CPU
get processed now.  Otherwise, if you bring the CPU up again later
that interrupt could still be there.  Better to get it out of the way
now, right?

AIUI that's what x86 is doing by re-enabling interrupts and waiting a
bit, which seems a bit crude but OK.  ARM could maybe do the same
thing by disabling GICC, dsb(), then disable interrupts.  But I don't
understand the interface between GICD, GICC and CPU well enough to
reason about it properly.

It's also possible that there's some subtlety of the timer interrupt
handling that I don't know about -- I _think_ that the reason timer
interrupts are relevant is that they're generated inside the APIC,
so that even when no interrupts are routed to the core, the APIC could
still generate one as it's being shut down.

> Clearly, this code does not seem to be doing what we are expecting. 
> Speaking the Marc Z. (GIC maintainers in Linux), there are no need to 
> disable the GIC CPU interface in the hypervisor/OS. You are going to 
> shutdown the CPU and it will be reset when you are coming back.

Well that answers my question, then.  If you know you're going to
reset the core (and not just put it in a deep sleep and wake it up
again) then I think this is all moot and you can just disable
interrupts once.

Cheers,

Tim.

Re: [Xen-devel] [PATCH v1 2/2] Replace occurances of xen.org with xenproject.org

2018-04-27 Thread Tim Deegan
At 10:40 +0100 on 27 Apr (1524825602), Wei Liu wrote:
> On Fri, Apr 27, 2018 at 10:30:51AM +0100, Lars Kurth wrote:
> >  KDD DEBUGGER
> > -M: Tim Deegan 
> > +M: Tim Deegan 
> 
> I think Tim should choose which domain name he prefers.

I would prefer to keep using @xen.org, thank you.

Cheers,

Tim.

Re: [Xen-devel] [PATCH v6 3/9] x86/mm: add disallow_mask parameter to get_page_from_l1e

2018-03-19 Thread Tim Deegan
At 09:54 -0600 on 13 Mar (1520934871), Jan Beulich wrote:
> >>> On 13.02.18 at 21:04,  wrote:
> > --- a/xen/arch/x86/mm/shadow/multi.c
> > +++ b/xen/arch/x86/mm/shadow/multi.c
> > @@ -858,13 +858,21 @@ shadow_get_page_from_l1e(shadow_l1e_t sl1e, struct domain *d, p2m_type_t type)
> >      int res;
> >      mfn_t mfn;
> >      struct domain *owner;
> > +    /* The disallow mask is taken from arch/x86/mm.c for HVM guest */
> > +    uint32_t disallow_mask =
> > +        ~(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | _PAGE_ACCESSED |
> > +          _PAGE_DIRTY | _PAGE_AVAIL | _PAGE_AVAIL_HIGH | _PAGE_NX);
> >  
> > +    disallow_mask = (disallow_mask | _PAGE_GNTTAB) & ~_PAGE_GLOBAL;
> > +    disallow_mask &= ~PAGE_CACHE_ATTRS;
> 
> If any of this is needed in the first place (see below), at least this
> last line could be folded into the variable's initializer as it looks.

Building it piecewise is fine by me, but since this is a constant it
should probably be declared once in some header file rather than here.
In any case it needs a comment that describes what it is, and not that
it used to live in mm.c. :)

Like Jan, I'm not convinced that having this be an argument to
get_page_from_l1e is super-useful -- it seems like an opportunity for
one of the callers to get it wrong and fail to enforce our invariants.

> > +    ASSERT(is_hvm_domain(d));

This assertion isn't particularly helpful IMO (because it's
shadow_mode_refcounts that's relevant here, not is_hvm) but I don't
object to it.

Cheers,

Tim.

Re: [Xen-devel] [for-4.11][PATCH v6 16/16] xen: Convert page_to_mfn and mfn_to_page to use typesafe MFN

2018-03-22 Thread Tim Deegan
Hi,

At 04:47 + on 21 Mar (1521607657), Julien Grall wrote:
> Most of the users of page_to_mfn and mfn_to_page are either overriding
> the macros to make them work with mfn_t or use mfn_x/_mfn because the
> rest of the function use mfn_t.
> 
> So make page_to_mfn and mfn_to_page return mfn_t by default. The __*
> version are now dropped as this patch will convert all the remaining
> non-typesafe callers.
> 
> Only reasonable clean-ups are done in this patch. The rest will use
> _mfn/mfn_x for the time being.
> 
> Lastly, domain_page_to_mfn is also converted to use mfn_t given that
> most of the callers are now switched to _mfn(domain_page_to_mfn(...)).
> 
> Signed-off-by: Julien Grall 
> Acked-by: Razvan Cojocaru 
> Reviewed-by: Paul Durrant 
> Reviewed-by: Boris Ostrovsky 
> Reviewed-by: Kevin Tian 
> Reviewed-by: Wei Liu 
> Acked-by: Jan Beulich 
> Reviewed-by: George Dunlap 

Thought I'd already acked this for the shadow code, but clearly not.
Sorry for the delay, and:

Acked-by: Tim Deegan 


Re: [Xen-devel] [PATCH 07/16] x86/shadow: fetch CPL just once in sh_page_fault()

2018-07-11 Thread Tim Deegan
At 07:29 -0600 on 11 Jul (1531294179), Jan Beulich wrote:
> This isn't so much an optimization as a way to avoid triggering a gcc bug
> affecting 5.x ... 7.x, triggered by any asm() put inside the ad hoc
> "rewalk" loop and taking as an (output?) operand a register variable
> tied to %rdx (an "rdx" clobber is fine). The issue is due to an apparent
> collision in register use with the modulo operation in vtlb_hash(),
> which (with optimization enabled) involves a multiplication of two
> 64-bit values with the upper half (in %rdx) of the 128-bit result being
> of interest.
> 
> Such an asm() was originally meant to be implicitly introduced into the
> code when converting most indirect calls through the hvm_funcs table to
> direct calls (via alternative instruction patching); that model was
> switched to clobbers due to further compiler problems, but I think the
> change here is worthwhile nevertheless.
> 
> Signed-off-by: Jan Beulich 

I don't quite follow what the compiler bug does here -- it would be nice
to say what effect it has on the final code.  In any case, the code
change is fine.

Reviewed-by: Tim Deegan 

Cheers,

Tim.

Re: [Xen-devel] [PATCH 6/6] x86/shadow: a little bit of style cleanup

2018-07-23 Thread Tim Deegan
At 04:51 -0600 on 19 Jul (1531975863), Jan Beulich wrote:
> Correct indentation of a piece of code, adjusting comment style at the
> same time. Constify gl3e pointers and drop a bogus (and useless once
> corrected) cast.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 

Re: [Xen-devel] [PATCH 3/6] x86/HVM: implement memory read caching

2018-07-23 Thread Tim Deegan
Hi,

At 04:48 -0600 on 19 Jul (1531975717), Jan Beulich wrote:
> Emulation requiring device model assistance uses a form of instruction
> re-execution, assuming that the second (and any further) pass takes
> exactly the same path. This is a valid assumption as far as use of CPU
> registers goes (as those can't change without any other instruction
> executing in between), but is wrong for memory accesses. In particular
> it has been observed that Windows might page out buffers underneath an
> instruction currently under emulation (hitting between two passes). If
> the first pass translated a linear address successfully, any subsequent
> pass needs to do so too, yielding the exact same translation.
> 
> Introduce a cache (used just by guest page table accesses for now) to
> make sure above described assumption holds. This is a very simplistic
> implementation for now: Only exact matches are satisfied (no overlaps or
> partial reads or anything).
> 
> Signed-off-by: Jan Beulich 

For the change to shadow code:

Acked-by: Tim Deegan 

Re: [Xen-devel] [PATCH RFC] tools/kdd: avoid adversarial optimisation hazard

2018-08-03 Thread Tim Deegan
Hi,

Apologies for the delay.  Several of my other hats were on fire.

> > I suspect the address, from which offset is derived, is bounded. But I
> > haven't found the spec for KD.
> 
> I don’t think there is one.

Indeed not.  The official way to extend windbg &c is to write a plugin
that runs on the Windows machine where you run the debugger.

At 13:37 +0100 on 26 Jul (1532612265), Ian Jackson wrote:
> It's still very obscure because this test
> 
> if (offset > sizeof ctrl.c32 || offset + len > sizeof ctrl.c32) {
> 
> depends critically on the size of offset, etc.
> 
> Is it not still possible that this test could be fooled ?  Suppose
> offset is 0x.  Then before the test, offset is 0xfd33.

This is > sizeof ctrl.c32.  But:

> This kind of reasoning is awful.  The code should be rewritten so that
> it is obvious that it won't go wrong.

Yes.  How about this (compile tested only, and I haven't checked the buggy
gcc versions):

diff --git a/tools/debugger/kdd/kdd.c b/tools/debugger/kdd/kdd.c
index 5a019a0a0c..64aacde1ee 100644
--- a/tools/debugger/kdd/kdd.c
+++ b/tools/debugger/kdd/kdd.c
@@ -687,11 +687,11 @@ static void kdd_handle_read_ctrl(kdd_state *s)
         }
     } else {
         /* 32-bit control-register space starts at 0x[2]cc, for 84 bytes */
-        uint32_t offset = addr;
-        if (offset > 0x200)
-            offset -= 0x200;
-        offset -= 0xcc;
-        if (offset > sizeof ctrl.c32 || offset + len > sizeof ctrl.c32) {
+        uint32_t offset = addr - 0xcc;
+        if (offset > sizeof ctrl.c32)
+            offset = addr - 0x2cc;
+        if (offset > sizeof ctrl.c32
+            || len > sizeof ctrl.c32 - offset) {
             KDD_LOG(s, "Request outside of known control space\n");
             len = 0;
         } else {


Re: [Xen-devel] [PATCH 9/9] x86/vmx: Don't leak EFER.NXE into guest context

2018-05-24 Thread Tim Deegan
At 12:20 +0100 on 22 May (1526991646), Andrew Cooper wrote:
> Intel hardware only uses 4 bits in MSR_EFER.  Changes to LME and LMA are
> handled automatically via the VMENTRY_CTLS.IA32E_MODE bit.
> 
> SCE is handled by ad-hoc logic in context_switch(), vmx_restore_guest_msrs()
> and vmx_update_guest_efer(), and works by altering the host SCE value to match
> the setting the guest wants.  This works because, in HVM vcpu context, Xen
> never needs to execute a SYSCALL or SYSRET instruction.
> 
> However, NXE has never been context switched.  Unlike SCE, NXE cannot be
> context switched at vcpu boundaries because disabling NXE makes PTE.NX bits
> reserved and cause a pagefault when encountered.  This means that the guest
> always has Xen's setting in effect, irrespective of the bit it can see and
> modify in its virtualised view of MSR_EFER.
> 
> This isn't a major problem for production operating systems because they, like
> Xen, always turn the NXE on when it is available.  However, it does have an
> observable effect on which guest PTE bits are valid, and whether
> PFEC_insn_fetch is visible in a #PF error code.
> 
> Second generation VT-x hardware has host and guest EFER fields in the VMCS,
> and support for loading and saving them automatically.  First generation VT-x
> hardware needs to use MSR load/save lists to cause an atomic switch of
> MSR_EFER on vmentry/exit.
> 
> Therefore we update vmx_init_vmcs_config() to find and use guest/host EFER
> support when available (and MSR load/save lists on older hardware) and drop
> all ad-hoc alteration of SCE.
> 
> There are two complications for shadow guests.  NXE, being a paging setting
> needs to remain under host control, but that is fine as it is also Xen which
> handles the pagefaults.  Also, it turns out that without EPT enabled, hardware
> won't tolerate LME and LMA being different via either the GUEST_EFER VMCS
> setting, or via the guest load list.  This doesn't matter in practice as we
> intercept all writes to CR0 and reads from MSR_EFER, so can provide
> architecturally consistent behaviour from the guests point of view.
> 
> As a result of fixing EFER context switching, we can remove the Intel-special
> case from hvm_nx_enabled() and let guest_walk_tables() work with the real
> guest paging settings.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Tim Deegan 

Re: [Xen-devel] [PATCH] tools/kdd: alternative way of muting spurious gcc warning

2018-05-24 Thread Tim Deegan
At 06:02 +0200 on 23 May (1527055365), Juergen Gross wrote:
> On 22/05/18 21:47, Marek Marczykowski-Górecki wrote:
> > Older gcc does not support #pragma GCC diagnostics, so use alternative
> > approach - change variable type to uint32_t (this code handle 32-bit
> > requests only anyway), which apparently also avoid gcc complaining about
> > this (otherwise correct) code.
> > 
> > Fixes 437e00fea04becc91c1b6bc1c0baa636b067a5cc "tools/kdd: mute spurious
> > gcc warning"
> > 
> > Signed-off-by: Marek Marczykowski-Górecki 
> 
> Release-acked-by: Juergen Gross 

Acked-by: Tim Deegan 

Re: [Xen-devel] [PATCH 0/6] x86/mm: Minor non-functional cleanup

2018-08-16 Thread Tim Deegan
At 19:34 +0100 on 15 Aug (1534361671), Andrew Cooper wrote:
> Minor cleanup which has accumulated while doing other work.  No functional
> change anywhere.
> 
> Andrew Cooper (6):
>   x86/mm: Use mfn_eq()/mfn_add() rather than opencoded variations
>   x86/shadow: Use more appropriate conversion functions
>   x86/shadow: Switch shadow_domain.has_fast_mmio_entries to bool
>   x86/shadow: Use MASK_* helpers for the MMIO fastpath PTE manipulation
>   x86/shadow: Clean up the MMIO fastpath helpers
>   x86/shadow: Use mfn_t in shadow_track_dirty_vram()

Reviewed-by: Tim Deegan 
(with the one correction that Roger asked for in patch 1/6)


Re: [Xen-devel] [PATCH 25/34] x86/mm/shadow: make it build with !CONFIG_HVM

2018-08-23 Thread Tim Deegan
At 16:12 +0100 on 17 Aug (1534522363), Wei Liu wrote:
> Enclose HVM only emulation code under CONFIG_HVM. Add some BUG()s to
> to catch any issue.
> 
> Note that although some code checks is_hvm_*, which hints it can be
> called for PV as well, I can't find such paths.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Tim Deegan 

Re: [Xen-devel] [PATCH 26/34] x86/mm/shadow: split out HVM only code

2018-08-23 Thread Tim Deegan
At 16:12 +0100 on 17 Aug (1534522364), Wei Liu wrote:
> Move the code previously enclosed in CONFIG_HVM into its own file.
> 
> Note that although some code explicitly check is_hvm_*, which hints it
> can be used for PV too, I can't find a code path that would be the
> case.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Tim Deegan 

Re: [Xen-devel] [PATCH] x86/mm: Drop {HAP,SHADOW}_ERROR() wrappers

2018-08-30 Thread Tim Deegan
At 19:11 +0100 on 28 Aug (1535483514), Andrew Cooper wrote:
> Unlike the PRINTK/DEBUG wrappers, these go straight out to the console, rather
> than ending up in the debugtrace buffer.
> 
> A number of these users are followed by domain_crash(), and future changes
> will want to combine the printk() into the domain_crash() call.  Expand these
> wrappers in place, using XENLOG_ERR before a BUG(), and XENLOG_G_ERR before a
> domain_crash().
> 
> Perform some %pv/PRI_mfn/etc cleanup while modifying the invocations, and
> explicitly drop some calls which are unnecessary (bad shadow op, and the empty
> stubs for incorrect sh_map_and_validate_gl?e() calls).
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 

Re: [Xen-devel] Ping: [PATCH] x86: improve vCPU selection in pagetable_dying()

2018-10-03 Thread Tim Deegan
At 17:56 +0100 on 03 Oct (1538589366), George Dunlap wrote:
> On 09/26/2018 08:04 AM, Jan Beulich wrote:
> > Looking at things again (in particular
> > the comment ahead of pagetable_dying()) I now actually wonder why
> > HVMOP_pagetable_dying is permitted to be called by other than a domain
> > for itself. There's no use of it in the tool stack. Disallowing the unused
> > case would mean the fast-path logic in sh_pagetable_dying() could
> > become the only valid/implemented case. Tim?
> 
> Not so -- a guest could still call pagetable_dying() on the top level PT
> of a process not currently running.
> 
> I would be totally in favor of limiting this call to the guest itself,
> however -- that would simplify the logic even more.

Yes, I think that this can be restricted to the caller's domain, and
so always use current as the vcpu.  I don't recall a use case for
setting this from outside the VM.

I can't find reason for the vcpu[0] in the history, but it does look
wrong.  I suspect this patch might have been in a XenServer patch
queue for a while, and perhaps the plumbing was fixed up incorrectly
when it was upstreamed.

Cheers,

Tim.

Re: [Xen-devel] [PATCH 18/18] tools/debugger/kdd: Install as `xen-kdd', not just `kdd'

2018-10-07 Thread Tim Deegan
At 18:29 +0100 on 05 Oct (1538764157), Ian Jackson wrote:
> `kdd' is an unfortunate namespace landgrab.

Bah, humbug, etc. :)  Can we have a note in the changelog for the next
release to warn the few kdd users that we've done this?

> Signed-off-by: Ian Jackson 

Acked-by: Tim Deegan 

Cheers,

Tim.

Re: [Xen-devel] [PATCH v3 25/25] x86/shadow: fold sh_x86_emulate_{write, cmpxchg}() into their only callers

2018-02-05 Thread Tim Deegan
Hi,

At 01:42 -0700 on 05 Feb (1517794959), Jan Beulich wrote:
> >>> On 02.02.18 at 17:52,  wrote:
> > On 07/12/17 14:19, Jan Beulich wrote:
> >> +    case 1: prev = cmpxchg((uint8_t  *)ptr, old, new); break;
> >> +    case 2: prev = cmpxchg((uint16_t *)ptr, old, new); break;
> >> +    case 4: prev = cmpxchg((uint32_t *)ptr, old, new); break;
> >> +    case 8: prev = cmpxchg((uint64_t *)ptr, old, new); break;
> >> +    default:
> >> +        SHADOW_PRINTK("cmpxchg size %u is not supported\n", bytes);
> > 
> > Given the earlier patches in the series, is it worth introducing case 16
> > here?
> 
> In a follow-up patch this could be an option (unless Tim knows a
> reason why this might be a bad idea), but I certainly wouldn't want
> to do so here.

I agree that adding the 16 case shouldn't happen in this patch, and I
don't see a need for it.  Unless we think guest OSes will use 16-byte
atomic ops to update their 8-byte PTEs, the shadow code is probably
better off taking that as a hint to unshadow.

Cheers,

Tim.

Re: [Xen-devel] [PATCH v4 15/20] x86emul: correctly handle CMPXCHG* comparison failures

2018-02-28 Thread Tim Deegan
At 06:09 -0700 on 28 Feb (1519798185), Jan Beulich wrote:
> If the ->cmpxchg() hook finds a mismatch, we should deal with this the
> same way as when the "manual" comparison reports a mismatch.
> 
> This involves reverting bfce0e62c3 ("x86/emul: Drop
> X86EMUL_CMPXCHG_FAILED"), albeit with X86EMUL_CMPXCHG_FAILED now
> becoming a value distinct from X86EMUL_RETRY.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: Andrew Cooper 

Shadow parts Acked-by: Tim Deegan 

And also for the other parts of the series (13/20, 19/20 and 20/20).

Tim.

Re: [Xen-devel] [PATCH 5/3] x86/shadow: emulate_gva_to_mfn() should respect p2m_ioreq_server

2018-11-14 Thread Tim Deegan
At 04:22 -0700 on 13 Nov (1542082936), Jan Beulich wrote:
> >>> On 13.11.18 at 11:59,  wrote:
> >> Subject: [PATCH 5/3] x86/shadow: emulate_gva_to_mfn() should respect
> >> p2m_ioreq_server
> >> 
> >> Writes to such pages would need to be handed to the emulator, which we're
> >> not prepared to do at this point.
> >> 
> >> Signed-off-by: Jan Beulich 
> >> 
> >> --- a/xen/arch/x86/mm/shadow/hvm.c
> >> +++ b/xen/arch/x86/mm/shadow/hvm.c
> >> @@ -338,7 +338,7 @@ static mfn_t emulate_gva_to_mfn(struct v
> >>      {
> >>          return _mfn(BAD_GFN_TO_MFN);
> >>      }
> >> -    if ( p2m_is_discard_write(p2mt) )
> >> +    if ( p2m_is_discard_write(p2mt) || p2mt == p2m_ioreq_server )
> >>      {
> >>          put_page(page);
> >>          return _mfn(READONLY_GFN);
> > 
> > Is this what we want here? I would have thought we want to return 
> > BAD_GFN_TO_MFN in the p2m_ioreq_server case so that the caller treats this 
> > in 
> > the same way it would MMIO.
> 
> I'm not sure which behavior is better; I'm certainly fine with switching
> as you say, but I'd first like to see Tim's opinion as well.

I'm not clear on what behaviour you want for this kind of page in
general -- I suspect I have missed or forgotten some context.  If the
guest's not supposed to write to it, then IMO you should add it to
P2M_DISCARD_WRITE_TYPES rather than special-casing it here.

Cheers,

Tim.


Re: [Xen-devel] [PATCH 5/3] x86/shadow: emulate_gva_to_mfn() should respect p2m_ioreq_server

2018-11-14 Thread Tim Deegan
At 12:44 + on 14 Nov (1542199496), Paul Durrant wrote:
> > -Original Message-
> > From: Tim Deegan [mailto:t...@xen.org]
> > Sent: 14 November 2018 12:42
> > To: Jan Beulich 
> > Cc: Paul Durrant ; Andrew Cooper
> > ; Wei Liu ; xen-devel
> > 
> > Subject: Re: [PATCH 5/3] x86/shadow: emulate_gva_to_mfn() should respect
> > p2m_ioreq_server
> > 
> > At 04:22 -0700 on 13 Nov (1542082936), Jan Beulich wrote:
> > > >>> On 13.11.18 at 11:59,  wrote:
> > > >> Subject: [PATCH 5/3] x86/shadow: emulate_gva_to_mfn() should respect
> > > >> p2m_ioreq_server
> > > >>
> > > >> Writes to such pages would need to be handed to the emulator, which
> > we're
> > > >> not prepared to do at this point.
> > > >>
> > > >> Signed-off-by: Jan Beulich 
> > > >>
> > > >> --- a/xen/arch/x86/mm/shadow/hvm.c
> > > >> +++ b/xen/arch/x86/mm/shadow/hvm.c
> > > >> @@ -338,7 +338,7 @@ static mfn_t emulate_gva_to_mfn(struct v
> > > >>      {
> > > >>          return _mfn(BAD_GFN_TO_MFN);
> > > >>      }
> > > >> -    if ( p2m_is_discard_write(p2mt) )
> > > >> +    if ( p2m_is_discard_write(p2mt) || p2mt == p2m_ioreq_server )
> > > >>      {
> > > >>          put_page(page);
> > > >>          return _mfn(READONLY_GFN);
> > > >
> > > > Is this what we want here? I would have thought we want to return
> > > > BAD_GFN_TO_MFN in the p2m_ioreq_server case so that the caller treats
> > this in
> > > > the same way it would MMIO.
> > >
> > > I'm not sure which behavior is better; I'm certainly fine with switching
> > > as you say, but I'd first like to see Tim's opinion as well.
> > 
> > I'm not clear on what behaviour you want for this kind of page in
> > general -- I suspect I have missed or forgotten some context.  If the
> > guest's not supposed to write to it, then IMO you should add it to
> > P2M_DISCARD_WRITE_TYPES rather than special-casing it here.
> 
> The type has an odd semantic... it needs to read as normal RAM but writes 
> need to hit emulation. The use-case is for GVT-g where the emulator needs to 
> intercept writes to the graphics pagetables in guest RAM.

I see, thanks.  In that case, I agree with you that you should signal
this as BAD_GFN_TO_MFN here to trigger the MMIO path.

Cheers,

Tim.

Re: [Xen-devel] [PATCH v2 3/3] x86/shadow: emulate_gva_to_mfn() should respect p2m_ioreq_server

2018-11-15 Thread Tim Deegan
At 05:51 -0700 on 15 Nov (1542261108), Jan Beulich wrote:
> Writes to such pages need to be handed to the emulator.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 


Re: [Xen-devel] [PATCH] x86/shadow: un-hide "full" auditing code

2018-11-20 Thread Tim Deegan
At 08:59 -0700 on 20 Nov (1542704343), Jan Beulich wrote:
> In particular sh_oos_audit() has become stale due to changes elsewhere,
> and the need for adjustment was not noticed because both "full audit"
> flags are off in both release and debug builds. Switch away from pre-
> processor conditionals, thus exposing the code to the compiler at all
> times. This obviously requires correcting the accumulated issues with
> the so far hidden code.
> 
> Note that shadow_audit_tables() now also gains an effect with "full
> entry audit" mode disabled; the prior code structure suggests that this
> was originally intended anyway.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 

Thanks!

Tim.


Re: [Xen-devel] [PATCH] x86/shadow: make 1-bit-disable match 1-bit-enable

2017-12-20 Thread Tim Deegan
At 02:28 -0700 on 18 Dec (1513564115), Jan Beulich wrote:
> shadow_one_bit_enable() sets PG_SH_enable (if not already set of course)
> in addition to the bit being requested. Make shadow_one_bit_disable()
> behave similarly - clear PG_SH_enable if that's the only bit remaining.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 



Re: [Xen-devel] [PATCH 1/6] x86/shadow: drop further 32-bit relics

2017-12-20 Thread Tim Deegan
At 08:04 -0700 on 12 Dec (1513065884), Jan Beulich wrote:
> PV guests don't ever get shadowed in other than 4-level mode anymore;
> commit 5a3ce8f85e ("x86/shadow: drop stray name tags from
> sh_{guest_get,map}_eff_l1e()") didn't go quite far enough (and there's
> a good chance that further cleanup opportunity exists, which I simply
> didn't notice).
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 

> I'm not sure if all the ASSERT()s are really useful to have.

They seem good to me.

Tim.


Re: [Xen-devel] [PATCH 2/6] x86/shadow: remove pointless loops over all vCPU-s

2017-12-20 Thread Tim Deegan
At 08:05 -0700 on 12 Dec (1513065939), Jan Beulich wrote:
> The vCPU count can be had more directly.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 

> In the sh_make_shadow() case the question is whether it really was
> intended to count all vCPU-s, rather than e.g. only all initialized
> ones. I guess the problem would be the phase before the guest
> actually starts secondary processors, but that could perhaps be
> covered by using ->max_vcpus if otherwise 1 would result.

Yes, the intention was to count 'active' vcpus in both cases.  I'm OK
with the change though, and I don't think it's worth making things any
more complex here.

Cheers,

Tim.


Re: [Xen-devel] [PATCH 4/6] x86/shadow: widen reference count

2017-12-20 Thread Tim Deegan
At 08:07 -0700 on 12 Dec (1513066056), Jan Beulich wrote:
> Utilize as many of the bits available in the union as possible, without
> (just to be on the safe side) colliding with any of the bits outside of
> PGT_type_mask.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 


Re: [Xen-devel] [PATCH 3/6] x86/shadow: ignore sh_pin() failure in one more case

2017-12-20 Thread Tim Deegan
At 08:06 -0700 on 12 Dec (1513065979), Jan Beulich wrote:
> Following what we've already done in the XSA-250 fix, convert another
> sh_pin() caller to no longer fail the higher level operation if pinning
> fails, as pinning is a performance optimization only in those places.
> 
> Suggested-by: Tim Deegan 
> Signed-off-by: Jan Beulich 

Reviewed-by: Tim Deegan 

> Would it be worth making sh_pin() return void, by adding a bool
> parameter which the other call site in sh_set_toplevel_shadow() could
> use to indicate a ref is available to be used (instead of acquiring a
> new one)? This would allow to drop another domain_crash().

No, let's keep the calling convention as it is, please.

Cheers

Tim.


Re: [Xen-devel] [PATCH 5/6] x86/mm: clean up SHARED_M2P{, _ENTRY} uses

2017-12-20 Thread Tim Deegan
At 08:08 -0700 on 12 Dec (1513066100), Jan Beulich wrote:
> Stop open-coding SHARED_M2P() and drop a pointless use of it from
> paging_mfn_is_dirty() (!VALID_M2P() is a superset of SHARED_M2P()) and
> another one from free_page_type() (prior assertions render this
> redundant).
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 


Re: [Xen-devel] [PATCH 6/6] x86: use paging_mark_pfn_dirty()

2017-12-20 Thread Tim Deegan
At 08:09 -0700 on 12 Dec (1513066153), Jan Beulich wrote:
> ... in preference over paging_mark_dirty(), when the PFN is known
> anyway.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 


Re: [PATCH 0/2] x86/mm: {paging,sh}_{cmpxchg,write}_guest_entry() adjustments

2020-10-03 Thread Tim Deegan
At 13:56 +0200 on 28 Sep (1601301371), Jan Beulich wrote:
> 1: {paging,sh}_{cmpxchg,write}_guest_entry() cannot fault
> 2: remove some indirection from {paging,sh}_cmpxchg_guest_entry()

Acked-by: Tim Deegan 



Re: [PATCH 04/10] x86/shadow: sh_update_linear_entries() is a no-op for PV

2020-04-18 Thread Tim Deegan
At 16:26 +0200 on 17 Apr (1587140817), Jan Beulich wrote:
> Consolidate the shadow_mode_external() in here: Check this once at the
> start of the function.
> 
> Signed-off-by: Jan Beulich 
> 
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -3707,34 +3707,30 @@ sh_update_linear_entries(struct vcpu *v)
>   */

This looks good.  Can you please also delete the out-of-date comment
just above that talks about PAE linear pagetables in PV guests,
to make it clear that PV guests don't need any maintenance here?

Cheers,

Tim.



Re: [PATCH 07/10] x86/shadow: the guess_wrmap() hook is needed for HVM only

2020-04-18 Thread Tim Deegan
At 16:28 +0200 on 17 Apr (1587140897), Jan Beulich wrote:
> sh_remove_write_access() bails early for !external guests, and hence its
> building and thus the need for the hook can be suppressed altogether in
> !HVM configs.
> 
> Signed-off-by: Jan Beulich 

> @@ -366,6 +367,14 @@ int sh_validate_guest_entry(struct vcpu
>  extern int sh_remove_write_access(struct domain *d, mfn_t readonly_mfn,
>unsigned int level,
>unsigned long fault_addr);
> +#else
> +static inline int sh_remove_write_access(struct domain *d, mfn_t 
> readonly_mfn,
> + unsigned int level,
> + unsigned long fault_addr)
> +{

Can we have an ASSERT(!shadow_mode_refcounts(d)) here, please,
matching the check that would have made it a noop before?

Cheers,

Tim.




Re: [PATCH 00/10] x86: mm (mainly shadow) adjustments

2020-04-18 Thread Tim Deegan
At 16:23 +0200 on 17 Apr (1587140581), Jan Beulich wrote:
> Large parts of this series are to further isolate pieces which
> are needed for HVM only, and hence would better not be built
> with HVM=n. But there are also a few other items which I've
> noticed along the road.

Acked-by: Tim Deegan 
with two suggestions that I've sent separately.

Cheers,

Tim.



Re: [PATCH 07/10] x86/shadow: the guess_wrmap() hook is needed for HVM only

2020-04-20 Thread Tim Deegan
At 15:06 +0200 on 20 Apr (1587395210), Jan Beulich wrote:
> On 18.04.2020 11:03, Tim Deegan wrote:
> > At 16:28 +0200 on 17 Apr (1587140897), Jan Beulich wrote:
> >> sh_remove_write_access() bails early for !external guests, and hence its
> >> building and thus the need for the hook can be suppressed altogether in
> >> !HVM configs.
> >>
> >> Signed-off-by: Jan Beulich 
> > 
> >> @@ -366,6 +367,14 @@ int sh_validate_guest_entry(struct vcpu
> >>  extern int sh_remove_write_access(struct domain *d, mfn_t readonly_mfn,
> >>unsigned int level,
> >>unsigned long fault_addr);
> >> +#else
> >> +static inline int sh_remove_write_access(struct domain *d, mfn_t 
> >> readonly_mfn,
> >> + unsigned int level,
> >> + unsigned long fault_addr)
> >> +{
> > 
> > Can we have an ASSERT(!shadow_mode_refcounts(d)) here, please,
> > matching the check that would have made it a noop before?
> 
> I've added one, but I find this quite odd in a !HVM build, where
> 
> #define PG_refcounts   0
> 
> and
> 
> #define paging_mode_refcounts(_d) (!!((_d)->arch.paging.mode & PG_refcounts))
> 
> Perhaps you're wanting this mainly for documentation purposes?


Yeah, that and future-proofing.  If !HVM builds ever start using
paging_mode_refcounts then this assertion will forcibly remind us that
we need changes here.  I'm glad that it compiles away to nothing in
the meantime.

Cheers,

Tim.



Re: [PATCH] x86/shadow: make sh_remove_write_access() helper HVM only

2020-04-21 Thread Tim Deegan
At 14:05 +0200 on 21 Apr (1587477956), Jan Beulich wrote:
> Despite the inline attribute at least some clang versions warn about
> trace_shadow_wrmap_bf() being unused in !HVM builds. Include the helper
> in the #ifdef region.
> 
> Fixes: 8b8d011ad868 ("x86/shadow: the guess_wrmap() hook is needed for HVM 
> only")
> Reported-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 



Re: [PATCH v2 2/4] x86/shadow: sh_update_linear_entries() is a no-op for PV

2020-04-21 Thread Tim Deegan
At 11:11 +0200 on 21 Apr (1587467497), Jan Beulich wrote:
> Consolidate the shadow_mode_external() in here: Check this once at the
> start of the function.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: Andrew Cooper 
> Acked-by: Tim Deegan 
> ---
> v2: Delete stale part of comment.
> ---
> Tim - I'm re-posting as I wasn't entirely sure whether you meant to drop
> the entire PV part of the comment, or only the last two sentences.

Looks good, thanks!

Acked-by: Tim Deegan 



Re: [PATCH v11 1/3] x86/tlb: introduce a flush HVM ASIDs flag

2020-04-28 Thread Tim Deegan
At 11:12 +0100 on 27 Apr (1587985955), Wei Liu wrote:
> On Thu, Apr 23, 2020 at 06:33:49PM +0200, Jan Beulich wrote:
> > On 23.04.2020 16:56, Roger Pau Monne wrote:
> > > Introduce a specific flag to request a HVM guest linear TLB flush,
> > > which is an ASID/VPID tickle that forces a guest linear to guest
> > > physical TLB flush for all HVM guests.
> > > 
> > > This was previously unconditionally done in each pre_flush call, but
> > > that's not required: HVM guests not using shadow don't require linear
> > > TLB flushes as Xen doesn't modify the pages tables the guest runs on
> > > in that case (ie: when using HAP). Note that shadow paging code
> > > already takes care of issuing the necessary flushes when the shadow
> > > page tables are modified.
> > > 
> > > In order to keep the previous behavior modify all shadow code TLB
> > > flushes to also flush the guest linear to physical TLB if the guest is
> > > HVM. I haven't looked at each specific shadow code TLB flush in order
> > > to figure out whether it actually requires a guest TLB flush or not,
> > > so there might be room for improvement in that regard.
> > > 
> > > Also perform ASID/VPID flushes when modifying the p2m tables as it's a
> > > requirement for AMD hardware. Finally keep the flush in
> > > switch_cr3_cr4, as it's not clear whether code could rely on
> > > switch_cr3_cr4 also performing a guest linear TLB flush. A following
> > > patch can remove the ASID/VPID tickle from switch_cr3_cr4 if found to
> > > not be necessary.
> > > 
> > > Signed-off-by: Roger Pau Monné 
> > 
> > Reviewed-by: Jan Beulich 
> 
> Tim, ICYMI, this patch needs your ack.

Sorry!  Thanks for the reminder.

Acked-by: Tim Deegan 




Re: [PATCH] x86/traps: Rework #PF[Rsvd] bit handling

2020-05-20 Thread Tim Deegan
At 18:09 +0200 on 19 May (1589911795), Jan Beulich wrote:
> static inline int sh_type_has_up_pointer(struct domain *d, unsigned int t)
> {
> /* Multi-page shadows don't have up-pointers */
> if ( t == SH_type_l1_32_shadow
>  || t == SH_type_fl1_32_shadow
>  || t == SH_type_l2_32_shadow )
> return 0;
> /* Pinnable shadows don't have up-pointers either */
> return !sh_type_is_pinnable(d, t);
> }
> 
> It's unclear to me in which way SH_type_l1_32_shadow and
> SH_type_l2_32_shadow are "multi-page" shadows; I'd rather have
> expected all three SH_type_fl1_*_shadow to be. Tim?

They are multi-page in the sense that the shadow itself is more than a
page long (because it needs to have 1024 8-byte entries).

The FL1 shadows are the same size as their equivalent L1s.

Tim.



Re: [Xen-devel] [PATCH v9 0/4] purge free_shared_domheap_page()

2020-02-07 Thread Tim Deegan
At 09:17 + on 06 Feb (1580980664), Durrant, Paul wrote:
> > -Original Message-
> > From: Jan Beulich 
> > On 06.02.2020 09:28, Durrant, Paul wrote:
> > >>  xen/arch/x86/mm/shadow/common.c |  2 +-

> George, Julien, Tim,
> 
>   Can I have acks or otherwise, please?

Acked-by: Tim Deegan 

Cheers,

Tim.


Re: [Xen-devel] [PATCH v4 2/7] x86/paging: add TLB flush hooks

2020-02-13 Thread Tim Deegan
At 18:28 +0100 on 10 Feb (1581359304), Roger Pau Monne wrote:
> Add shadow and hap implementation specific helpers to perform guest
> TLB flushes. Note that the code for both is exactly the same at the
> moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
> further patches that will add implementation specific optimizations to
> them.
> 
> No functional change intended.
> 
> Signed-off-by: Roger Pau Monné 
> Reviewed-by: Wei Liu 

Acked-by: Tim Deegan 


Re: [Xen-devel] [PATCH v4 2/7] x86/paging: add TLB flush hooks

2020-02-13 Thread Tim Deegan
At 09:02 + on 13 Feb (1581584528), Tim Deegan wrote:
> At 18:28 +0100 on 10 Feb (1581359304), Roger Pau Monne wrote:
> > Add shadow and hap implementation specific helpers to perform guest
> > TLB flushes. Note that the code for both is exactly the same at the
> > moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
> > further patches that will add implementation specific optimizations to
> > them.
> > 
> > No functional change intended.
> > 
> > Signed-off-by: Roger Pau Monné 
> > Reviewed-by: Wei Liu 
> 
> Acked-by: Tim Deegan 

Oops, wrong address, sorry.

Acked-by: Tim Deegan 


Re: [Xen-devel] [PATCH v4 4/7] x86/tlb: introduce a flush guests TLB flag

2020-02-13 Thread Tim Deegan
At 18:28 +0100 on 10 Feb (1581359306), Roger Pau Monne wrote:
> Introduce a specific flag to request a HVM guest TLB flush, which is
> an ASID/VPID tickle that forces a linear TLB flush for all HVM guests.
> 
> This was previously unconditionally done in each pre_flush call, but
> that's not required: HVM guests not using shadow don't require linear
> TLB flushes as Xen doesn't modify the guest page tables in that case
> (ie: when using HAP).
> 
> Modify all shadow code TLB flushes to also flush the guest TLB, in
> order to keep the previous behavior. I haven't looked at each specific
> shadow code TLB flush in order to figure out whether it actually
> requires a guest TLB flush or not, so there might be room for
> improvement in that regard.
> 
> Signed-off-by: Roger Pau Monné 
> Reviewed-by: Wei Liu 

Acked-by: Tim Deegan 


Re: [PATCH v3 2/3] x86/shadow: refactor shadow_vram_{get,put}_l1e()

2020-10-29 Thread Tim Deegan
At 10:44 +0200 on 19 Oct (1603104271), Jan Beulich wrote:
> By passing the functions an MFN and flags, only a single instance of
> each is needed; they were pretty large for being inline functions
> anyway.
> 
> While moving the code, also adjust coding style and add const where
> sensible / possible.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 



Re: [PATCH v3 3/3] x86/shadow: sh_{make,destroy}_monitor_table() are "even more" HVM-only

2020-10-29 Thread Tim Deegan
At 10:45 +0200 on 19 Oct (1603104300), Jan Beulich wrote:
> With them depending on just the number of shadow levels, there's no need
> for more than one instance of them, and hence no need for any hook (IOW
> 452219e24648 ["x86/shadow: monitor table is HVM-only"] didn't go quite
> far enough). Move the functions to hvm.c while dropping the dead
> is_pv_32bit_domain() code paths.
>
> While moving the code, replace a stale comment reference to
> sh_install_xen_entries_in_l4(). Doing so made me notice the function
> also didn't have its prototype dropped in 8d7b633adab7 ("x86/mm:
> Consolidate all Xen L4 slot writing into init_xen_l4_slots()"), which
> gets done here as well.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 

> TBD: In principle both functions could have their first parameter
>  constified. In fact, "destroy" doesn't depend on the vCPU at all
>  and hence could be passed a struct domain *. Not sure whether such
>  an asymmetry would be acceptable.
>  In principle "make" would also not need passing of the number of
>  shadow levels (can be derived from v), which would result in yet
>  another asymmetry.
>  If these asymmetries were acceptable, "make" could then also update
>  v->arch.hvm.monitor_table, instead of doing so at both call sites.

Feel free to add consts, but please don't change the function
parameters any more than that.  I would rather keep them as a matched
pair, and leave the hvm.monitor_table updates in the caller, where
it's easier to see why they're not symmetrical.

Cheers

Tim.



Re: [PATCH 2/5] x86/p2m: collapse the two ->write_p2m_entry() hooks

2020-10-29 Thread Tim Deegan
At 10:22 +0100 on 28 Oct (1603880578), Jan Beulich wrote:
> The struct paging_mode instances get set to the same functions
> regardless of mode by both HAP and shadow code, hence there's no point
> having this hook there. The hook also doesn't need moving elsewhere - we
> can directly use struct p2m_domain's. This merely requires (from a
> strictly formal pov; in practice this may not even be needed) making
> sure we don't end up using safe_write_pte() for nested P2Ms.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 



Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook

2020-10-29 Thread Tim Deegan
At 10:24 +0100 on 28 Oct (1603880693), Jan Beulich wrote:
> Fair parts of the present handlers are identical; in fact
> nestedp2m_write_p2m_entry() lacks a call to p2m_entry_modify(). Move
> common parts right into write_p2m_entry(), splitting the hooks into a
> "pre" one (needed just by shadow code) and a "post" one.
> 
> For the common parts moved I think that the p2m_flush_nestedp2m() is,
> at least from an abstract perspective, also applicable in the shadow
> case. Hence it doesn't get a 3rd hook put in place.
> 
> The initial comment that was in hap_write_p2m_entry() gets dropped: Its
> placement was bogus, and looking back at the commit introducing it
> (dd6de3ab9985 "Implement Nested-on-Nested") I can't see either what use
> of a p2m it was meant to be associated with.
> 
> Signed-off-by: Jan Beulich 

This seems like a good approach to me.  I'm happy with the shadow
parts but am not confident enough on nested p2m any more to have an
opinion on that side. 

Acked-by: Tim Deegan 




Re: [PATCH] x86/shadow: correct GFN use by sh_unshadow_for_p2m_change()

2020-10-29 Thread Tim Deegan
At 16:42 +0100 on 28 Oct (1603903365), Jan Beulich wrote:
> Luckily sh_remove_all_mappings()'s use of the parameter is limited to
> generation of log messages. Nevertheless we'd better pass correct GFNs
> around:
> - the incoming GFN, when replacing a large page, may not be large page
>   aligned,
> - incrementing by page-size-scaled values can't be right.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Tim Deegan 

Thanks!

Tim.



Re: Ping: [PATCH v3 0/3] x86: shim building adjustments (plus shadow follow-on)

2020-10-29 Thread Tim Deegan
At 14:40 +0100 on 29 Oct (1603982415), Jan Beulich wrote:
> Tim,
> 
> unless you tell me otherwise I'm intending to commit the latter
> two with Roger's acks some time next week. Of course it would
> still be nice to know your view on the first of the TBDs in
> patch 3 (which may result in further changes, in a follow-up).

Ack, sorry for the dropped patches, and thank you for the ping.  I've
now replied to everything that I think is waiting for my review.

Cheers,

Tim.



Re: [PATCH v2 7/9] x86/p2m: pass old PTE directly to write_p2m_entry_pre() hook

2020-11-06 Thread Tim Deegan
At 10:38 +0100 on 06 Nov (1604659085), Jan Beulich wrote:
> In no case is a pointer to non-const needed. Since no pointer arithmetic
> is done by the sole user of the hook, passing in the PTE itself is quite
> fine.
> 
> While doing this adjustment also
> - drop the intermediate sh_write_p2m_entry_pre():
>   sh_unshadow_for_p2m_change() can itself be used as the hook function,
>   moving the conditional into there,
> - introduce a local variable holding the flags of the old entry.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 




Re: [PATCH v2 8/9] x86/shadow: cosmetics to sh_unshadow_for_p2m_change()

2020-11-07 Thread Tim Deegan
At 10:38 +0100 on 06 Nov (1604659127), Jan Beulich wrote:
> Besides the adjustments for style
> - use switch(),
> - widen scope of commonly used variables,
> - narrow scope of other variables.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 



Re: [PATCH v2 9/9] x86/shadow: adjust TLB flushing in sh_unshadow_for_p2m_change()

2020-11-07 Thread Tim Deegan
At 10:39 +0100 on 06 Nov (1604659168), Jan Beulich wrote:
> Accumulating transient state of d->dirty_cpumask in a local variable is
> unnecessary here: The flush is fine to make with the dirty set at the
> time of the call. With this, move the invocation to a central place at
> the end of the function.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 



Re: [PATCH 5/5] x86/p2m: split write_p2m_entry() hook

2020-11-12 Thread Tim Deegan
At 15:04 +0100 on 12 Nov (1605193496), Jan Beulich wrote:
> On 12.11.2020 14:07, Roger Pau Monné wrote:
> > On Thu, Nov 12, 2020 at 01:29:33PM +0100, Jan Beulich wrote:
> >> I agree with all this. If only it was merely about TLB flushes. In
> >> the shadow case, shadow_blow_all_tables() gets invoked, and that
> >> one - looking at the other call sites - wants the paging lock held.
[...]
> > The post hook for shadow could pick the lock again, as I don't think
> > the removal of the tables needs to be strictly done inside of the same
> > locked region?
> 
> I think it does, or else a use of the now stale tables may occur
> before they got blown away. Tim?

Is this the call to shadow_blow_tables() in the write_p2m_entry path?
I think it would be safe to drop and re-take the paging lock there as
long as the call happens before the write is considered to have
finished.

But it would not be a useful performance improvement - any update that
takes this path is going to be very slow regardless.  So unless you
have another pressing reason to split it up, I would be inclined to
leave it as it is.  That way it's easier to see that the locking is
correct.

Cheers,

Tim.



Re: [Xen-devel] [PATCH] x86/shadow: use single (atomic) MOV for emulated writes

2020-01-16 Thread Tim Deegan
At 15:29 -0500 on 16 Jan (1579188566), Jason Andryuk wrote:
> This is the corresponding change to the shadow code as made by
> bf08a8a08a2e "x86/HVM: use single (atomic) MOV for aligned emulated
> writes" to the non-shadow HVM code.
> 
> The bf08a8a08a2e commit message:
> Using memcpy() may result in multiple individual byte accesses
> (depending how memcpy() is implemented and how the resulting insns,
> e.g. REP MOVSB, get carried out in hardware), which isn't what we
> want/need for carrying out guest insns as correctly as possible. Fall
> back to memcpy() only for accesses not 2, 4, or 8 bytes in size.
> 
> Signed-off-by: Jason Andryuk 

Acked-by: Tim Deegan 


Re: [Xen-devel] Issues/improvements performing flush of guest TLBs

2020-01-16 Thread Tim Deegan
Hi,

At 12:16 +0100 on 15 Jan (1579090561), Roger Pau Monné wrote:
>  - Shadow: it's not clear to me exactly which parts of sh_update_cr3
>are needed in order to perform a guest TLB flush. I think calling:
> 
> #if (SHADOW_OPTIMIZATIONS & SHOPT_VIRTUAL_TLB)
> /* No longer safe to use cached gva->gfn translations */
> vtlb_flush(v);
> #endif
> #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
> /* Need to resync all the shadow entries on a TLB flush. */
> shadow_resync_current_vcpu(v);
> #endif
> 
> if ( is_hvm_domain(d) )
> /*
>  * Linear mappings might be cached in non-root mode when ASID/VPID is
>  * in use and hence they need to be flushed here.
>  */
> hvm_asid_flush_vcpu(v);
> 
>Should be enough but I'm not very familiar with the shadow code,
>and hence would like some feedback from someone more familiar with
>shadow in order to assert exactly what's required to perform a
>guest TLB flush.

I would advise keeping the whole thing until you have measurements
that show that it's worthwhile being clever here (e.g. that the IPI
costs don't dominate).

But I think for safety we need at least the code you mention and also:
 - the code that reloads the PAE top-level entries; and
 - the shadow_resync_other_vcpus() at the end.

>Also, AFAICT sh_update_cr3 is not safe to be called on vCPUs
>currently running on remote pCPUs, albeit there are no assertions
>to that end. It's also not clear which parts of sh_update_cr3 are
>safe to be called while the vCPU is running.

Yeah, sh_update_cr3 makes a bunch of state changes and assumes
that the vcpu can't do TLB loads part-way through.  It may be possible
to do some of it remotely but as you say it would take a lot of
thinking, and if the guest is running you're going to need an IPI
anyway to flush the actual TLB.

> FWIW, there also seems to be a lot of unneeded flushes of HVM guests
> TLB, as do_tlb_flush will unconditionally clear all HVM guest TLBs on
> the pCPU by calling hvm_asid_flush_core which I don't think it's
> necessary/intended by quite a lot of the Xen TLB flush callers. I
> guess this would also warrant a different discussion, as there seems
> to be room for improvement in this area.

There may be room for improvement, but do be careful - the Xen MM
safety rules depend on TLB flushes when a page's type or ownership
changes, and that does mean flushing even the guest TLBs.  ISTR
discussing this at the time that vTLBs were introduced and deciding
that it wasn't worth adding all the tracking that would be necessary;
that may have changed now that the p2m infrastructure is better
developed.

Cheers,

Tim.


Re: [Xen-devel] [PATCH] xen/x86: p2m: Don't initialize slot 0 of the P2M

2020-02-05 Thread Tim Deegan
At 18:31 + on 03 Feb (1580754711), Julien Grall wrote:
> On 03/02/2020 17:37, George Dunlap wrote:
> > On 2/3/20 5:22 PM, Julien Grall wrote:
> >> On 03/02/2020 17:10, George Dunlap wrote:
> >>> On 2/3/20 4:58 PM, Julien Grall wrote:
>  From: Julien Grall 
> 
>  It is not entirely clear why the slot 0 of each p2m should be populated
>  with empty page-tables. The commit introducing it 759af8e3800 "[HVM]
>  Fix 64-bit HVM domain creation." does not contain meaningful
>  explanation except that it was necessary for shadow.
> >>>
> >>> Tim, any ideas here?

Afraid not, sorry.  I can't think what would rely on the tables being
allocated for slot 0 in particular.  Maybe there's something later
that adds other entries in the bottom 2MB and can't handle a table
allocation failure?

> > Also, it's not clear to me what kind of bug the code you're deleting
> > would fix.  If you read a not-present entry, you should get INVALID_MFN
> > anyway.  Unless you were calling p2m_get_entry_query(), which I'm pretty
> > sure hadn't been introduced at this point.
> 
> I can't find this function you mention in staging. Was it removed recently?
> 
> The code is allocating all page-tables for _gfn(0). I would not expect 
> the common code to care whether a table is allocated or not. So this 
> would suggest that an internal implementation (of the shadow?) is 
> relying on this.
> 
> However, I can't find anything obvious suggesting that is necessary. If 
> there was anything, I would expect to happen during domain creation, as 
> neither Xen nor a guest could rely on this (there are way to make those 
> pages disappear with the MEMORY op hypercall).

That may not have been true at the time (and so whatever it was that
needed this may have been fixed when it became true?)

Cheers,

Tim.


Re: [PATCH 15/17] x86/shadow: drop SH_type_l2h_pae_shadow

2021-01-22 Thread Tim Deegan
Hi,

At 16:10 +0100 on 14 Jan (1610640627), Jan Beulich wrote:
> This is a remnant from 32-bit days, having no place anymore where a
> shadow of this type would be created.
> 
> Signed-off-by: Jan Beulich 
> ---
> hash_{domain,vcpu}_foreach() have a use each of literal 15. It's not
> clear to me what the proper replacement constant would be, as it
> doesn't look as if SH_type_monitor_table was meant.

Good spot.  I think the '<= 15' should be replaced with '< SH_type_unused'.
It originally matched the callback arrays all being coded as
"static hash_callback_t callbacks[16]".

Cheers,

Tim



Re: [PATCH 00/17] x86/PV: avoid speculation abuse through guest accessors plus ...

2021-01-22 Thread Tim Deegan
At 16:01 +0100 on 14 Jan (1610640090), Jan Beulich wrote:
> ... shadow adjustments towards not building 2- and 3-level code
> when !HVM. While the latter isn't functionally related to the
> former, it depends on some of the rearrangements done there.

The shadow changes look good, thank you!
Reviewed-by: Tim Deegan 

I have read the uaccess stuff in passing and it looks OK to me too,
but I didn't review it in detail.

Cheers,

Tim.




Re: [PATCH 15/17] x86/shadow: drop SH_type_l2h_pae_shadow

2021-01-22 Thread Tim Deegan
Hi,

At 17:31 +0100 on 22 Jan (1611336662), Jan Beulich wrote:
> On 22.01.2021 14:11, Tim Deegan wrote:
> > At 16:10 +0100 on 14 Jan (1610640627), Jan Beulich wrote:
> >> hash_{domain,vcpu}_foreach() have a use each of literal 15. It's not
> >> clear to me what the proper replacement constant would be, as it
> >> doesn't look as if SH_type_monitor_table was meant.
> > 
> > Good spot.  I think the '<= 15' should be replaced with '< SH_type_unused'.
> > It originally matched the callback arrays all being coded as
> > "static hash_callback_t callbacks[16]".
> 
> So are you saying this was off by one then prior to this patch
> (when SH_type_unused was still 17), albeit in apparently a
> benign way?

Yes - this assertion is just to catch overruns of the callback table,
and so it was OK for its limit to be too low.  The new types that were
added since then are never put in the hash table, so don't trigger
this assertion.

> And the comments in sh_remove_write_access(),
> sh_remove_all_mappings(), sh_remove_shadows(), and
> sh_reset_l3_up_pointers() are then wrong as well, and would
> instead better be like in shadow_audit_tables()?

Yes, it looks like those comments are also out of date where they
mention 'unused'.

> Because of this having been benign (due to none of the callback
> tables specifying non-NULL entries there), wouldn't it make
> sense to dimension the tables by SH_type_max_shadow + 1 only?
> Or would you consider this too risky?

Yes, I think that would be fine, also changing '<= 15' to
'<= SH_type_max_shadow'.  Maybe add a matching
ASSERT(t <= SH_type_max_shadow) in shadow_hash_insert as well?

Cheers,

Tim.



Re: [PATCH] x86/shadow: replace stale literal numbers in hash_{vcpu,domain}_foreach()

2021-01-25 Thread Tim Deegan
At 12:07 +0100 on 25 Jan (1611576438), Jan Beulich wrote:
> 15 apparently once used to be the last valid type to request a callback
> for, and the dimension of the respective array. The arrays meanwhile are
> larger than this (in a benign way, i.e. no caller ever sets a mask bit
> higher than 15), dimensioned by SH_type_unused. Have the ASSERT()s
> follow suit and add build time checks at the call sites.
> 
> Also adjust a comment naming the wrong of the two functions.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Tim Deegan 

> ---
> The ASSERT()s being adjusted look redundant with the BUILD_BUG_ON()s
> being added, so I wonder whether dropping them wouldn't be the better
> route.

I'm happy to keep both, as they do slightly different things.

Thanks for fixing this up!

Tim.



Re: [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only

2020-07-18 Thread Tim Deegan
At 12:00 +0200 on 15 Jul (1594814409), Jan Beulich wrote:
> ... by the very fact that they're 3-level specific, while PV always gets
> run in 4-level mode. This requires adding some seemingly redundant
> #ifdef-s - some of them will be possible to drop again once 2- and
> 3-level guest code doesn't get built anymore in !HVM configs, but I'm
> afraid there's still quite a bit of disentangling work to be done to
> make this possible.
> 
> Signed-off-by: Jan Beulich 

Looks good.  It seems like the new code for '3-level non-HVM' in
guest-walks ought to have some sort of assert-unreachable in them too
- or is there a reason not to?

Cheers,

Tim.




Re: [PATCH 0/5] x86: mostly shadow related XSA-319 follow-up

2020-07-18 Thread Tim Deegan
At 11:56 +0200 on 15 Jul (1594814214), Jan Beulich wrote:
> This in particular goes a few small steps further towards proper
> !HVM and !PV config handling (i.e. no carrying of unnecessary
> baggage).
> 
> 1: x86/shadow: dirty VRAM tracking is needed for HVM only
> 2: x86/shadow: shadow_table[] needs only one entry for PV-only configs
> 3: x86/PV: drop a few misleading paging_mode_refcounts() checks
> 4: x86/shadow: have just a single instance of sh_set_toplevel_shadow()
> 5: x86/shadow: l3table[] and gl3e[] are HVM only

I sent a question on #5 separatly; otherwise these all seem good to
me, thank you!

Acked-by: Tim Deegan 



Re: [PATCH 5/5] x86/shadow: l3table[] and gl3e[] are HVM only

2020-07-20 Thread Tim Deegan
At 10:55 +0200 on 20 Jul (1595242521), Jan Beulich wrote:
> On 18.07.2020 20:20, Tim Deegan wrote:
> > At 12:00 +0200 on 15 Jul (1594814409), Jan Beulich wrote:
> >> ... by the very fact that they're 3-level specific, while PV always gets
> >> run in 4-level mode. This requires adding some seemingly redundant
> >> #ifdef-s - some of them will be possible to drop again once 2- and
> >> 3-level guest code doesn't get built anymore in !HVM configs, but I'm
> >> afraid there's still quite a bit of disentangling work to be done to
> >> make this possible.
> >>
> >> Signed-off-by: Jan Beulich 
> > 
> > Looks good.  It seems like the new code for '3-level non-HVM' in
> > guest-walks ought to have some sort of assert-unreachable in them too
> > - or is there a reason not to?
> 
> You mean this piece of code
> 
> +#elif !defined(CONFIG_HVM)
> +(void)root_gfn;
> +memset(gw, 0, sizeof(*gw));
> +return false;
> +#else /* PAE */
> 
> If so - sure, ASSERT_UNREACHABLE() could be added there. It simply
> didn't occur to me. I take it your ack for the entire series holds
> here with this addition.

Yes, it does.  Thanks!

Tim.



Re: [PATCH] x86/hvm: Clean up track_dirty_vram() calltree

2020-07-26 Thread Tim Deegan
At 16:15 +0100 on 22 Jul (1595434548), Andrew Cooper wrote:
>  * Rename nr to nr_frames.  A plain 'nr' is confusing to follow in the
>lower levels.
>  * Use DIV_ROUND_UP() rather than opencoding it in several different ways
>  * The hypercall input is capped at uint32_t, so there is no need for
>nr_frames to be unsigned long in the lower levels.
> 
> No functional change.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Tim Deegan 



Re: [PATCH] x86/shadow: Reposition sh_remove_write_access_from_sl1p()

2020-05-24 Thread Tim Deegan
At 10:04 +0100 on 21 May (1590055468), Andrew Cooper wrote:
> When compiling with SHOPT_OUT_OF_SYNC disabled, the build fails with:
> 
>   common.c:41:12: error: ‘sh_remove_write_access_from_sl1p’ declared ‘static’ 
> but never defined [-Werror=unused-function]
>static int sh_remove_write_access_from_sl1p(struct domain *d, mfn_t gmfn,
>   ^~~~
> 
> due to an unguarded forward declaration.
> 
> It turns out there is no need to forward declare
> sh_remove_write_access_from_sl1p() to begin with, so move it to just ahead of
> its first users, which is within a larger #ifdef'd SHOPT_OUT_OF_SYNC block.
> 
> Fix up for style while moving it.  No functional change.
> 
> Signed-off-by: Andrew Cooper 

Thank you!  This is fine, either as-is or with the suggested change to
a switch.

Reviewed-by: Tim Deegan 




Re: [PATCH v1] kdd: remove zero-length arrays

2020-06-09 Thread Tim Deegan
Hi,

At 09:55 +0100 on 09 Jun (1591696552), Paul Durrant wrote:
> > -Original Message-
> > From: Xen-devel  On Behalf Of Olaf 
> > Hering
> > Sent: 08 June 2020 21:39
> > To: xen-devel@lists.xenproject.org
> > Cc: Ian Jackson ; Olaf Hering ; 
> > Tim Deegan ;
> > Wei Liu 
> > Subject: [PATCH v1] kdd: remove zero-length arrays
> > 
> > Struct 'kdd_hdr' already has a member named 'payload[]' to easily
> > refer to the data after the header.  Remove the payload member from
> > 'kdd_pkt' and always use 'kdd_hdr' to fix the following compile error:
> 
> Is it not sufficient to just change the declaration of payload in kdd_pkt 
> from [0] to []? In fact I can't see any use of the payload
> field in kdd_hdr so it looks like that is the one that ought to be dropped.

Yes, if one of these has to go, it should be the one in the header,
since it's not used and the one in the packet is neatly in the union
with the other descriptions of the packet payload.

I'm not currently in a position to test patches, but might be later in
the week.  Olaf, can you try dropping the 'payload' field from the
header and replacing the payload[0] in pkt with payload[] ?

Cheers,

Tim.



Re: [PATCH v1] kdd: remove zero-length arrays

2020-06-10 Thread Tim Deegan
At 15:22 +0200 on 09 Jun (1591716153), Olaf Hering wrote:
> Am Tue, 9 Jun 2020 13:15:49 +0100
> schrieb Tim Deegan :
> 
> > Olaf, can you try dropping the 'payload' field from the header and 
> > replacing the payload[0] in pkt with payload[] ?
> 
> In file included from kdd.c:53:
> kdd.h:325:17: error: flexible array member in union
>   325 | uint8_t payload[];

How tedious.  Well, the only place we actually allocate one of these
we already leave enough space for a max-size packet, so how about
this?

kdd: stop using [0] arrays to access packet contents.

GCC 10 is unhappy about this, and we already use 64k buffers
in the only places where packets are allocated, so move the
64k size into the packet definition.

Reported-by: Olaf Hering 
Signed-off-by: Tim Deegan 
---
 tools/debugger/kdd/kdd.c | 4 ++--
 tools/debugger/kdd/kdd.h | 3 +--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git tools/debugger/kdd/kdd.c tools/debugger/kdd/kdd.c
index 3ebda9b12c..a7d0976ea4 100644
--- tools/debugger/kdd/kdd.c
+++ tools/debugger/kdd/kdd.c
@@ -79,11 +79,11 @@ typedef struct {
 /* State of the debugger stub */
 typedef struct {
 union {
-uint8_t txb[sizeof (kdd_hdr) + 65536];   /* Marshalling area for tx */
+uint8_t txb[sizeof (kdd_pkt)];   /* Marshalling area for tx */
 kdd_pkt txp; /* Also readable as a packet structure */
 };
 union {
-uint8_t rxb[sizeof (kdd_hdr) + 65536];   /* Marshalling area for rx */
+uint8_t rxb[sizeof (kdd_pkt)];   /* Marshalling area for rx */
 kdd_pkt rxp; /* Also readable as a packet structure */
 };
 unsigned int cur;   /* Offset into rx where we'll put the next byte */
diff --git tools/debugger/kdd/kdd.h tools/debugger/kdd/kdd.h
index bfb00ba5c5..b9a17440df 100644
--- tools/debugger/kdd/kdd.h
+++ tools/debugger/kdd/kdd.h
@@ -68,7 +68,6 @@ typedef struct {
 uint16_t len; /* Payload length, excl. header and trailing byte */
 uint32_t id;  /* Echoed in responses */
 uint32_t sum; /* Unsigned sum of all payload bytes */
-uint8_t payload[0];
 } PACKED kdd_hdr;
 
 #define KDD_PKT_CMD 0x0002  /* Debugger commands (and replies to them) */
@@ -323,7 +322,7 @@ typedef struct {
 kdd_msg msg;
 kdd_reg reg;
 kdd_stc stc;
-uint8_t payload[0];
+uint8_t payload[65536];
 };
 } PACKED kdd_pkt;
 
-- 
2.26.2




Re: Build problems in kdd.c with xen-4.14.0-rc4

2020-07-03 Thread Tim Deegan
Hi Michael,

Thanks for the report!

At 23:21 +0100 on 30 Jun (1593559296), Michael Young wrote:
> I get the following errors when trying to build xen-4.14.0-rc4
> 
> kdd.c: In function 'kdd_tx':
> kdd.c:754:15: error: array subscript 16 is above array bounds of 
> 'uint8_t[16]' {aka 'unsigned char[16]'} [-Werror=array-bounds]
>754 | s->txb[len++] = 0xaa;
>| ~~^~~
> kdd.c:82:17: note: while referencing 'txb'
> 82 | uint8_t txb[sizeof (kdd_hdr)];   /* Marshalling area 
> for tx */
>| ^~~
> kdd.c: In function 'kdd_break':
> kdd.c:819:19: error: array subscript 16 is above array bounds of 
> 'uint8_t[16]' {aka 'unsigned char[16]'} [-Werror=array-bounds]

Oh dear.  The fix for the last kdd bug seems to have gone wrong
somewhere.  The patch I posted has:

-uint8_t txb[sizeof (kdd_hdr) + 65536];   /* Marshalling area for tx */
+uint8_t txb[sizeof (kdd_pkt)];   /* Marshalling area for tx */

but as applied in master it's:

-uint8_t txb[sizeof (kdd_hdr) + 65536];   /* Marshalling area for tx */
+uint8_t txb[sizeof (kdd_hdr)];   /* Marshalling area for tx */

i.e. the marshalling area is only large enough for a header and GCC
is correctly complaining about that.

Wei, it looks like you committed this patch - can you figure out what
happened to it please?

Cheers,

Tim.



Re: [PATCH for-4.14] kdd: fix build again

2020-07-03 Thread Tim Deegan
At 20:10 + on 03 Jul (1593807001), Wei Liu wrote:
> Restore Tim's patch. The one that was committed was recreated by me
> because git didn't accept my saved copy. I made some mistakes while
> recreating that patch and here we are.
> 
> Fixes: 3471cafbdda3 ("kdd: stop using [0] arrays to access packet contents")
> Reported-by: Michael Young 
> Signed-off-by: Wei Liu 

Reviewed-by: Tim Deegan 

Thanks!

Tim.



Re: [Xen-devel] [PATCH 4/6] x86/domain: remove the 'oos_off' flag

2019-07-24 Thread Tim Deegan
At 17:06 +0100 on 23 Jul (1563901567), Paul Durrant wrote:
> The flag is not needed since the domain 'createflags' can now be tested
> directly.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Tim Deegan 

though some of this change seems to have got into patch 3, maybe they
were reordered at some point?

Cheers,

Tim.


> ---
> Cc: George Dunlap 
> Cc: Jan Beulich 
> Cc: Andrew Cooper 
> Cc: Wei Liu 
> Cc: "Roger Pau Monné" 
> ---
>  xen/arch/x86/mm/shadow/common.c | 3 +--
>  xen/include/asm-x86/domain.h| 1 -
>  2 files changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/mm/shadow/common.c b/xen/arch/x86/mm/shadow/common.c
> index 320ea0db21..2c7fafa4fb 100644
> --- a/xen/arch/x86/mm/shadow/common.c
> +++ b/xen/arch/x86/mm/shadow/common.c
> @@ -62,7 +62,6 @@ int shadow_domain_init(struct domain *d)
>  
>  #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
>  d->arch.paging.shadow.oos_active = 0;
> -d->arch.paging.shadow.oos_off = d->createflags & XEN_DOMCTL_CDF_oos_off;
>  #endif
>  d->arch.paging.shadow.pagetable_dying_op = 0;
>  
> @@ -2523,7 +2522,7 @@ static void sh_update_paging_modes(struct vcpu *v)
>  #if (SHADOW_OPTIMIZATIONS & SHOPT_OUT_OF_SYNC)
>  /* We need to check that all the vcpus have paging enabled to
>   * unsync PTs. */
> -if ( is_hvm_domain(d) && !d->arch.paging.shadow.oos_off )
> +if ( is_hvm_domain(d) && !(d->createflags & XEN_DOMCTL_CDF_oos_off) )
>  {
>  int pe = 1;
>  struct vcpu *vptr;
> diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h
> index 933b85901f..5f9899469c 100644
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -115,7 +115,6 @@ struct shadow_domain {
>  
>  /* OOS */
>  bool_t oos_active;
> -bool_t oos_off;
>  
>  /* Has this domain ever used HVMOP_pagetable_dying? */
>  bool_t pagetable_dying_op;
> -- 
> 2.20.1.2.gb21ebb671
> 


Re: [PATCH v3 0/4] x86/P2M: allow 2M superpage use for shadowed guests

2022-08-16 Thread Tim Deegan
At 09:43 +0200 on 12 Aug (1660297391), Jan Beulich wrote:
> I did notice this anomaly in the context of IOMMU side work.
> 
> 1: shadow: slightly consolidate sh_unshadow_for_p2m_change() (part I)
> 2: shadow: slightly consolidate sh_unshadow_for_p2m_change() (part II)
> 3: shadow: slightly consolidate sh_unshadow_for_p2m_change() (part III)
> 4: P2M: allow 2M superpage use for shadowed guests


Acked-by: Tim Deegan 

FWIW I think that adding some kind of mfn_mask() operation would
be neater and more understandable than subtracting the PSE flag.

Cheers,

Tim.



Re: [PATCH 0/8] x86: XSA-40{1,2,8} follow-up

2022-07-30 Thread Tim Deegan
At 17:58 +0200 on 26 Jul (1658858332), Jan Beulich wrote:
> Perhaps not entirely unexpected the work there has turned up a few
> further items which want dealing with. Most if not all of these
> aren't interdependent, so could probably be looked at (and go in)
> in (about) any order.

Shadow parts Acked-by: Tim Deegan 

Cheers,

Tim.



Re: [PATCH 07/16] x86/shadow: call sh_update_cr3() directly from sh_page_fault()

2023-03-27 Thread Tim Deegan
Hi,

At 10:33 +0100 on 22 Mar (1679481226), Jan Beulich wrote:
> There's no need for an indirect call here, as the mode is invariant
> throughout the entire paging-locked region. All it takes to avoid it is
> to have a forward declaration of sh_update_cr3() in place.
> 
> Signed-off-by: Jan Beulich 
> ---
> I find this and the respective Win7 related comment suspicious: If we
> really need to "fix up" L3 entries "on demand", wouldn't we better retry
> the shadow_get_and_create_l1e() rather than exit? The spurious page
> fault that the guest observes can, after all, not be known to be non-
> fatal inside the guest. That's purely an OS policy.

I think it has to be non-fatal because it can happen on real hardware,
even if the hardware *does* fill the TLB here (which it is not
required to).

Filling just one sl3e sounds plausible, though we don't want to go
back to the idea of having L3 shadows on PAE!

> Furthermore the sh_update_cr3() will also invalidate L3 entries which
> were loaded successfully before, but invalidated by the guest
> afterwards. I strongly suspect that the described hardware behavior is
> _only_ to load previously not-present entries from the PDPT, but not
> purge ones already marked present.

Very likely, but we *are* allowed to forget old entries whenever we
want to, so this is at worst a performance problem.

> IOW I think sh_update_cr3() would
> need calling in an "incremental" mode here.

This would be the best way of updating just the one entry - but as far
as I can tell the existing code is correct so I wouldn't add any more
complexity unless we know that this path is causing perf problems.

> In any event emitting a TRC_SHADOW_DOMF_DYING trace record in this case
> looks wrong.

Yep.

> Beyond the "on demand" L3 entry creation I also can't see what guest
> actions could lead to the ASSERT() being inapplicable in the PAE case.
> The 3-level code in shadow_get_and_create_l2e() doesn't consult guest
> PDPTEs, and all other logic is similar to that for other modes.

The assert's not true here because the guest can push us down this
path by doing exactly what Win 7 does here - loading CR3 with a
missing L3E, then filling the L3E and causing a page fault that uses
the now-filled L3E.  (Or maybe that's not true any more; my mental
model of the pagetable walker code might be out of date)

Cheers,

Tim.




Re: [PATCH 07/16] x86/shadow: call sh_update_cr3() directly from sh_page_fault()

2023-03-28 Thread Tim Deegan
At 12:37 +0200 on 28 Mar (1680007032), Jan Beulich wrote:
> On 27.03.2023 17:39, Tim Deegan wrote:
> > At 10:33 +0100 on 22 Mar (1679481226), Jan Beulich wrote:
> >> There's no need for an indirect call here, as the mode is invariant
> >> throughout the entire paging-locked region. All it takes to avoid it is
> >> to have a forward declaration of sh_update_cr3() in place.
> >>
> >> Signed-off-by: Jan Beulich 
> >> ---
> >> I find this and the respective Win7 related comment suspicious: If we
> >> really need to "fix up" L3 entries "on demand", wouldn't we better retry
> >> the shadow_get_and_create_l1e() rather than exit? The spurious page
> >> fault that the guest observes can, after all, not be known to be non-
> >> fatal inside the guest. That's purely an OS policy.
> > 
> > I think it has to be non-fatal because it can happen on real hardware,
> > even if the hardware *does* fill the TLB here (which it is not
> > required to).
> 
> But if hardware filled the TLB, it won't raise #PF. If it didn't fill
> the TLB (e.g. because of not even doing a walk when a PDPTE is non-
> present), a #PF would be legitimate, and the handler would then need
> to reload CR3. But such a #PF is what, according to the comment, Win7
> chokes on.

IIRC the Win7 behaviour is to accept and discard the #PF as spurious
(because it sees the PDPTE is present) *without* reloading CR3, so it
gets stuck in a loop taking pagefaults.  Here, we reload CR3 for it,
to unstick it.

I can't think of a mental model of the MMU that would have a problem
here -- either the L3Es are special in which case the pagefault is
correct (and we shouldn't even read the new contents) or they're just
like other PTEs in which case the spurious fault is fine.

> > The assert's not true here because the guest can push us down this
> > path by doing exactly what Win 7 does here - loading CR3 with a
> > missing L3E, then filling the L3E and causing a page fault that uses
> > the now-filled L3E.  (Or maybe that's not true any more; my mental
> > model of the pagetable walker code might be out of date)
> 
> Well, I specifically started the paragraph with 'Beyond the "on demand"
> L3 entry creation'. IOW the assertion would look applicable to me if
> we, as suggested higher up, retried shadow_get_and_create_l1e() and it
> failed again. As the comment there says, "we know the guest entries are
> OK", so the missing L3 entry must have appeared.

Ah, I didn't quite understand you.  Yes, if we changed the handler to
rebuild the SL3E then I think the assertion would be valid again.

Cheers,

Tim.



Re: [PATCH] x86/shadow: drop callback_mask pseudo-variables

2021-07-03 Thread Tim Deegan
At 08:42 +0200 on 30 Jun (1625042541), Jan Beulich wrote:
> In commit 90629587e16e ("x86/shadow: replace stale literal numbers in
> hash_{vcpu,domain}_foreach()") I had to work around a clang shortcoming
> (if you like), leveraging that gcc tolerates static const variables in
> otherwise integer constant expressions. Roberto suggests that we'd
> better not rely on such behavior. Drop the involved static const-s,
> using their "expansions" in both of the prior use sites each. This then
> allows dropping the short-circuiting of the check for clang.
> 
> Requested-by: Roberto Bagnara 
> Signed-off-by: Jan Beulich 

Acked-by: Tim Deegan 



Re: [PATCH 1/5] tools/debugger: Fix PAGE_SIZE redefinition error

2021-04-29 Thread Tim Deegan
Hi,

At 15:05 +0300 on 27 Apr (1619535916), Costin Lupu wrote:
> If PAGE_SIZE is already defined in the system (e.g. in
> /usr/include/limits.h header) then gcc will trigger a redefinition error
> because of -Werror. This commit also protects PAGE_SHIFT definitions for
> keeping consistency.

Thanks for looking into this!  I think properly speaking we should fix
this by defining KDD_PAGE_SHIFT and KDD_PAGE_SIZE in kdd.h and using
those everywhere we currently use PAGE_SIZE/PAGE_SHIFT. in kdd.c and
kdd-xen.c.  If for some reason we ever ended up with a system-defined
PAGE_SIZE that wasn't 4096u then we would not want to use it here
because it would break our guest operations.

Cheers,

Tim



Re: [PATCH 1/5] tools/debugger: Fix PAGE_SIZE redefinition error

2021-04-30 Thread Tim Deegan
At 14:36 +0300 on 30 Apr (1619793419), Costin Lupu wrote:
> Hi Tim,
> 
> On 4/29/21 10:58 PM, Tim Deegan wrote:
> > Hi,
> > 
> > At 15:05 +0300 on 27 Apr (1619535916), Costin Lupu wrote:
> >> If PAGE_SIZE is already defined in the system (e.g. in
> >> /usr/include/limits.h header) then gcc will trigger a redefinition error
> >> because of -Werror. This commit also protects PAGE_SHIFT definitions for
> >> keeping consistency.
> > 
> > Thanks for looking into this!  I think properly speaking we should fix
> > this by defining KDD_PAGE_SHIFT and KDD_PAGE_SIZE in kdd.h and using
> > those everywhere we currently use PAGE_SIZE/PAGE_SHIFT. in kdd.c and
> > kdd-xen.c.  If for some reason we ever ended up with a system-defined
> > PAGE_SIZE that wasn't 4096u then we would not want to use it here
> > because it would break our guest operations.
> 
> As discussed for another patch of the series, an agreed solution that
> would apply for other libs as well would be to use XC_PAGE_* macros
> instead of PAGE_* macros. I've just sent a v2 doing that. Please let me
> know if you think it would be better to introduce the KDD_PAGE_*
> definitions instead.

Sorry to be annoying, but yes, please do introduce the KDD_ versions.
All the xen-specific code in KDD lives in kdd-xen.c; kdd.c shouldn't
include any xen headers.

Cheers,

Tim.


