> From: Lv, Zhiyuan
> Sent: Friday, February 05, 2016 10:01 AM
> 
> Hi George,
> 
> On Thu, Feb 04, 2016 at 11:06:33AM +0000, George Dunlap wrote:
> > On Thu, Feb 4, 2016 at 9:38 AM, Yu, Zhang <yu.c.zh...@linux.intel.com> 
> > wrote:
> > > On 2/4/2016 5:28 PM, Paul Durrant wrote:
> > >> I assume this means that the emulator can 'unshadow' GTTs (I guess on an
> > >> LRU basis) so that it can shadow new ones when the limit has been 
> > >> exhausted?
> > >> If so, how bad is performance likely to be if we live with a lower limit
> > >> and take the hit of unshadowing if the guest GTTs become heavily 
> > >> fragmented?
> > >>
> > > Thank you, Paul.
> > >
> > > Well, I was told the emulator have approaches to delay the shadowing of
> > > the GTT till future GPU commands are submitted. By now, I'm not sure
> > > about the performance penalties if the limit is set too low. Although
> > > we are confident 8K is a secure limit, it seems still too high to be
> > > accepted. We will perform more experiments with this new approach to
> > > find a balance between the lowest limit and the XenGT performance.
> >
> > Just to check some of my assumptions:
> >
> > I assume that unlike memory accesses, your GPU hardware cannot
> > 'recover' from faults in the GTTs. That is, for memory, you can take a
> > page fault, fix up the pagetables, and then re-execute the original
> > instruction; but so far I haven't heard of any devices being able to
> > seamlessly re-execute a transaction after a fault.  Is my
> > understanding correct?
> 
> That's correct. At least for now, we do not have GPU page fault.
> 
> >
> > If that is the case, then for every top-level value (whatever the
> > equivalent of the CR3), you need to be able to shadow the entire GTT
> > tree below it, yes?  You can't use a trick that the memory shadow
> 
> That's right also.
> 
> > pagetables can use, of unshadowing parts of the tree and reshadowing
> > them.
> >
> > So as long as the currently-in-use GTT tree contains no more than
> > $LIMIT ranges, you can unshadow and reshadow; this will be slow, but
> > strictly speaking correct.
> >
> > What do you do if the guest driver switches to a GTT such that the
> > entire tree takes up more than $LIMIT entries?
> 
> GPU has some special properties different from CPU, which make things
> easier. The GPU page table is constructed by CPU and used by GPU
> workloads. GPU workload itself will not change the page table.
> Meanwhile, GPU workload submission, in virtualized environment, is
> controled by our device model. Then we can reshadow the whole table
> every time before we submit workload. That can reduce the total number
> required for write protection but with performance impact, because GPU
> has to have more idle time waiting for CPU. Hope the info helps.
> Thanks!
> 

Putting in another way, it's fully under mediation when a GPU page table 
(GTT) will be referenced by the GPU, so there're plenty of room to
optimize existing shadowing (always shadowing all recognized GPU page
tables), e.g. shadowing only active one when a VM is scheduled in. It's
a performance matter but no correctness issue.

This is why Yu mentioned earlier whether we can just set a default
limit which is good for majority of use cases, while extending our
device mode to drop/recreate some shadow tables upon the limitation
is hit. I think this matches how today's CPU shadow page table is
implemented, which also has a limitation of how many shadow pages
are allowed per-VM.

Thanks
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Reply via email to