Quoting Zhao, Yakui (2018-07-04 03:09:03)
> >-----Original Message-----
> >From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> >Sent: Tuesday, July 3, 2018 10:08 PM
> >To: Zhao, Yakui <yakui.z...@intel.com>; Daniel Vetter <dan...@ffwll.ch>
> >Cc: intel-gfx@lists.freedesktop.org
> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on VGPU
> >
> >Quoting Zhao, Yakui (2018-07-03 14:58:31)
> >> >-----Original Message-----
> >> >From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> >> >Sent: Tuesday, July 3, 2018 9:25 PM
> >> >To: Zhao, Yakui <yakui.z...@intel.com>; Daniel Vetter
> >> ><dan...@ffwll.ch>
> >> >Cc: intel-gfx@lists.freedesktop.org
> >> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg
> >> >only once on VGPU
> >> >
> >> >Quoting Zhao, Yakui (2018-07-03 13:47:46)
> >> >>
> >> >> >-----Original Message-----
> >> >> >From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of
> >> >> >Daniel Vetter
> >> >> >Sent: Tuesday, July 3, 2018 5:52 PM
> >> >> >To: Chris Wilson <ch...@chris-wilson.co.uk>
> >> >> >Cc: Daniel Vetter <dan...@ffwll.ch>; Zhao, Yakui
> >> >> ><yakui.z...@intel.com>; intel-gfx@lists.freedesktop.org
> >> >> >Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg
> >> >> >only once on VGPU
> >> >> >
> >> >> >On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:
> >> >> >> Quoting Daniel Vetter (2018-07-03 09:51:03)
> >> >> >> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
> >> >> >> > > In a VGPU scenario the read/write operations on the fence_reg are
> >> >> >> > > trapped by GVT-g, and GVT-g then follows the HW spec to write the
> >> >> >> > > fence_reg. So it is unnecessary to read/write the fence reg several
> >> >> >> > > times. This helps to reduce the unnecessary traps of fence_reg
> >> >> >> > > mmio operations.
> >> >> >> > >
> >> >> >> > > V1->V2: Fix one typo error of parameter when calling
> >> >> >> > > intel_vgpu_active
> >> >> >> > >
> >> >> >> > > Signed-off-by: Zhao Yakui <yakui.z...@intel.com>
> >> >> >> >
> >> >> >> > Ok this makes more sense. Except you need to put the 64bit
> >> >> >> > entirely into the vgpu block, with a comment explaining why
> >> >> >> > this is safe (since the vgpu will take care of updating fences
> >> >> >> > correctly).
> >> >> >>
> >> >> >> Except, who cares? Are fence registers being rewritten so
> >> >> >> frequently that special-casing vgpu is worth the hassle? Part of
> >> >> >> that is that you need to leave a hint behind in the code that
> >> >> >> (a) explains why it is safe after having the "here be dragons"
> >> >> >> and (b) why we care.
> >> >> >>
> >> >> >> On a more pragmatic level if fencing doesn't plateau out to
> >> >> >> steady state, that is a worrying amount of contention -- the
> >> >> >> actual fence write itself would be the least of my worries.
> >> >> >
> >> >> >I can easily imagine that with the few per-client fences vgpu
> >> >> >hands out, rewrites are much more common. But yeah, some real data
> >> >> >would be good.
> >> >> >And more reasons to get mesa off of the gtt mmaps.
> >> >>
> >> >> Hi, Daniel/Chris
> >> >>
> >> >>       Thanks for your comments.
> >> >>       The fence reg is used to ensure access to a tiled surface
> >> >> through the aperture window. When a fence is needed, the driver finds
> >> >> an available fence reg and configures it. Once it is no longer used,
> >> >> the fence is turned off and can be allocated for the next user. It
> >> >> doesn't rely on the previous state of the fence reg, so we don't need
> >> >> to worry about an unsteady state.
> >> >>
> >> >>       For the VGPU operation: every access to a fence reg is trapped,
> >> >> and GVT-g then programs the fence_reg according to the trapped value
> >> >> (for any trapped write of a fence reg it turns the fence off and then
> >> >> writes the expected value). The trapped op in GVT-g is safe.
> >> >>
> >> >>       Based on the current logic, five traps are needed when one
> >> >> fence reg is configured under VGPU mode (three writes, two reads).
> >> >> If it is programmed with one 64-bit op under VGPU mode, only one trap
> >> >> is needed, and GVT-g can still configure the expected fence value.
> >> >> As traps are quite heavy for a VGPU, the trap time can be saved.
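
For reference, the five accesses being counted look roughly like the
following sketch of the gen6+ fence write path (helper and register names
are illustrative stand-ins for the i915 code, not the literal source):

	/* Each MMIO access below is a separate trap under vGPU: five in
	 * total for a single fence update.
	 */
	I915_WRITE(fence_reg_lo, 0);                    /* write 1: disable the fence */
	POSTING_READ(fence_reg_lo);                     /* read 1: flush the disable */
	I915_WRITE(fence_reg_hi, upper_32_bits(val));   /* write 2: new upper dword */
	I915_WRITE(fence_reg_lo, lower_32_bits(val));   /* write 3: new lower dword */
	POSTING_READ(fence_reg_lo);                     /* read 2: flush the update */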
> >> >
> >> >But the argument is: can we avoid it entirely by never changing the
> >> >fence? You say this is used for mapping through the aperture (GTT),
> >> >we say userspace shouldn't be doing that for performance reasons :) A
> >> >slow trap on top of a slow operation that is already causing
> >> >contention seems more sensible to fix at source. (Albeit so long as
> >> >the maintenance burden is considered and found to be reasonable,
> >> >adding special cases with their rationale is acceptable.) So you have
> >> >to sell why this mmio is worthy of special attention and curtail any
> >> >future questions.
> >>
> >> If the userspace driver/app takes care of the buffer allocation,
> >> especially for tiled surfaces, maybe it can reduce the rate of
> >> fence changes. But fencing can't be avoided entirely once a tiled
> >> buffer is needed and allocated. This also depends on the userspace
> >> driver, and it is beyond the responsibility of the kernel driver.
> >
> >We own the stack. If userspace is not behaving as well as it might, we need
> >to let them know. Improving algorithms is several orders of magnitude more
> >beneficial than micro-optimising the implementation. (Speaking as one who
> >sadly does do much of the latter.) But you are asking us to maintain this
> >path with neither any CI coverage nor any indication of the userspace impact.
> >
> >> In non-VGPU mode the several writes of the fence_reg don't matter. But
> >> under VGPU mode, each trapped fence_reg access causes an exit from the
> >> virtual machine. Only after the trap emulation is finished can it return
> >> to the guest OS and resume the following sequence (for example, on KVMGT
> >> it exits from the guest OS to the Linux host). The exit from the guest
> >> OS is quite costly (for example, the host needs to check the exit reason
> >> and decide who performs the trap emulation and how).
> >> As mentioned in the previous email, the current sequence on VGPU needs
> >> five traps when one fence reg is programmed (three writes, two reads).
> >> If only one trap is needed while the fence reg is still configured
> >> correctly, the time of the unnecessary traps for fence_reg programming
> >> is saved. Of course this will help to improve efficiency in the guest OS.
> >
> >Numbers help: we understand why it is slow to do several traps versus one;
> >what we don't know is what impact that actually has on userspace, and
> >whether it is worth optimising this versus every other register.
> >
> Hi, Chris
> 
>        The userspace driver/library on the guest OS has no awareness that
> it is running in a virtual machine.

And? Fence thrashing is bad irrespective of virtualisation.
Improving algorithms to avoid fencing should be effective whether or
not userspace is running on a mediated vgpu, and avoiding everything in
the path leading to rewriting fences (struct_mutex!) is so much more
effective than improving the traps.

It is not that one precludes the other; it's just that the
micro-optimisations still need to be justified, or they'll be forgotten
and removed.
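
To make the special case concrete, the shape of what is being proposed,
plus the comment Daniel asked for, is roughly this (a sketch with
illustrative names, not the actual diff):

	if (intel_vgpu_active(dev_priv)) {
		/*
		 * On vGPU every fence_reg access is trapped, and GVT-g
		 * itself follows the HW spec (disable, then rewrite) when
		 * it programs the real register, so a single 64-bit write
		 * is sufficient: one trap instead of five.
		 */
		I915_WRITE64(fence_reg_lo, val);
	} else {
		/* Bare metal keeps the existing sequence: disable, posting
		 * read, write hi, write lo, posting read.
		 */
	}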

> In such a case the trapping of fence_reg on VGPU is transparent to the
> userspace driver/library.
> Only when running a performance test does the impact of so many traps
> show up in the numbers.

But how much so?
 
> If the traps of other registers can also be optimized, it will help the
> performance of the guest OS. Of course the prerequisite is that we collect
> the corresponding trap info when running the corresponding workload.
> This is case by case.
>
> >What is the before/after of something like gem_fence_upload?
> 
> I checked the case of gem_fence_upload. It only measures the upload
> performance after the fence is already programmed,
> so this patch has no impact on gem_fence_upload.

Incorrect. By the same token, you haven't measured the impact of this
change yourself?
-Chris