Quoting Zhao, Yakui (2018-07-04 03:09:03)
> >-----Original Message-----
> >From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> >Sent: Tuesday, July 3, 2018 10:08 PM
> >To: Zhao, Yakui <yakui.z...@intel.com>; Daniel Vetter <dan...@ffwll.ch>
> >Cc: intel-gfx@lists.freedesktop.org
> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on VGPU
> >
> >Quoting Zhao, Yakui (2018-07-03 14:58:31)
> >> >-----Original Message-----
> >> >From: Chris Wilson [mailto:ch...@chris-wilson.co.uk]
> >> >Sent: Tuesday, July 3, 2018 9:25 PM
> >> >To: Zhao, Yakui <yakui.z...@intel.com>; Daniel Vetter <dan...@ffwll.ch>
> >> >Cc: intel-gfx@lists.freedesktop.org
> >> >Subject: RE: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on VGPU
> >> >
> >> >Quoting Zhao, Yakui (2018-07-03 13:47:46)
> >> >>
> >> >> >-----Original Message-----
> >> >> >From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel Vetter
> >> >> >Sent: Tuesday, July 3, 2018 5:52 PM
> >> >> >To: Chris Wilson <ch...@chris-wilson.co.uk>
> >> >> >Cc: Daniel Vetter <dan...@ffwll.ch>; Zhao, Yakui <yakui.z...@intel.com>; intel-gfx@lists.freedesktop.org
> >> >> >Subject: Re: [Intel-gfx] [PATCH v2 2/2] drm/i915: write fence reg only once on VGPU
> >> >> >
> >> >> >On Tue, Jul 03, 2018 at 10:05:28AM +0100, Chris Wilson wrote:
> >> >> >> Quoting Daniel Vetter (2018-07-03 09:51:03)
> >> >> >> > On Tue, Jul 03, 2018 at 10:56:17AM +0800, Zhao Yakui wrote:
> >> >> >> > > In the VGPU scenario, read/write operations on the fence_reg
> >> >> >> > > are trapped by GVT-g, which then follows the HW spec to
> >> >> >> > > write the fence_reg. So it is unnecessary to read/write the
> >> >> >> > > fence reg several times. This helps to reduce the
> >> >> >> > > unnecessary traps on fence_reg mmio operations.
> >> >> >> > >
> >> >> >> > > V1->V2: Fix one typo in a parameter when calling
> >> >> >> > > intel_vgpu_active
> >> >> >> > >
> >> >> >> > > Signed-off-by: Zhao Yakui <yakui.z...@intel.com>
> >> >> >> >
> >> >> >> > Ok this makes more sense. Except you need to put the 64bit
> >> >> >> > entirely into the vgpu block, with a comment explaining why
> >> >> >> > this is safe (since the vgpu will take care of updating fences
> >> >> >> > correctly).
> >> >> >>
> >> >> >> Except, who cares? Are fence registers being rewritten so
> >> >> >> frequently that special-casing vgpu is worth the hassle? Part of
> >> >> >> that is that you need to leave a hint behind in the code that
> >> >> >> (a) explains why it is safe after having the "here be dragons"
> >> >> >> and (b) why we care.
> >> >> >>
> >> >> >> On a more pragmatic level, if fencing doesn't plateau out to a
> >> >> >> steady state, that is a worrying amount of contention -- the
> >> >> >> actual fence write itself would be the least of my worries.
> >> >> >
> >> >> >I can easily imagine that with the few per-client fences vgpu
> >> >> >hands out, rewrites are much more common. But yeah, some real data
> >> >> >would be good. And more reasons to get mesa off of the gtt mmaps.
> >> >>
> >> >> Hi, Daniel/Chris
> >> >>
> >> >> Thanks for your comments.
> >> >> The fence reg is used to assure access to a tiled surface through
> >> >> the aperture window. When a fence is needed, the driver finds one
> >> >> available fence reg and configures it. After it is no longer used,
> >> >> the fence is turned off and can be allocated for the next use. It
> >> >> doesn't rely on the previous state of the fence reg, so in this
> >> >> case we don't need to worry about an unsteady state.
> >> >>
> >> >> For the VGPU operation: the op on the fence reg is trapped, and
> >> >> GVT-g then follows the trapped value to program the fence reg.
> >> >> (It will turn off and then write the expected value for any
> >> >> trapped write op on the fence reg.) The trapped op in GVT-g is
> >> >> safe.
> >> >>
> >> >> Based on the current logic, five traps are needed when one fence
> >> >> reg is configured under VGPU mode (three writes, two reads).
> >> >> If it is programmed in one 64-bit op under VGPU mode, only one
> >> >> trap is needed, and GVT-g can still configure the expected fence
> >> >> value. As traps are quite heavy for a VGPU, the trap time can be
> >> >> saved.
> >> >
> >> >But the argument is: can we avoid it entirely by never changing the
> >> >fence? You say this is used for mapping through the aperture (GTT);
> >> >we say userspace shouldn't be doing that for performance reasons :)
> >> >A slow trap on top of a slow operation that is already causing
> >> >contention seems more sensible to fix at source. (Albeit so long as
> >> >the maintenance burden is considered and found to be reasonable,
> >> >adding special cases with their rationale is acceptable.) So you
> >> >have to sell why this mmio is worthy of special attention and
> >> >curtail any future questions.
> >>
> >> If the userspace driver/app can take care of buffer allocation,
> >> especially for tiled surfaces, maybe it can reduce the rate of
> >> changing the fence. But this can't be avoided if a tiled buffer is
> >> needed and allocated. This also depends on the userspace driver, and
> >> it is beyond the responsibility of the kernel driver.
> >
> >We own the stack. If userspace is not behaving as well as it might, we
> >need to let them know. Improving algorithms is several orders more
> >beneficial than micro-optimising the implementation. (Speaking as one
> >who sadly does do much of the latter.) But you are asking us to
> >maintain this path with neither any CI coverage nor any indication of
> >the userspace impact.
> >
> >> If it is configured in non-VGPU mode, the several writes of the
> >> fence_reg don't matter.
> >> But under VGPU mode, a trap on the fence_reg causes an exit from the
> >> virtual machine. After the trap emulation is finished, control
> >> returns to the guest OS, which then resumes the following sequence
> >> (for example, on KVMGT it exits from the guest OS to the Linux
> >> host). The exit from the guest OS is quite costly (for example, the
> >> host needs to check the exit reason and decide who performs the trap
> >> emulation and how).
> >> As mentioned in the previous email, the current sequence on VGPU
> >> needs five traps when one fence reg is programmed (three writes, two
> >> reads). If only one trap is needed while the fence reg is still
> >> configured correctly, the time spent on unnecessary traps for
> >> fence_reg programming is saved. Of course this will help to improve
> >> efficiency on the guest OS.
> >
> >Numbers help. We understand why it is slow to do several traps versus
> >one; what we don't know is what impact that actually has on userspace,
> >and whether it is worth optimising this register versus every other.
>
> Hi, Chris
>
> The userspace driver/library on the guest OS has no sense that it is
> running on a virtual machine.
And? Fence thrashing is bad irrespective of virtualisation. Improving
algorithms to avoid fencing should be effective whether or not userspace
is running on a mediated vgpu, and avoiding everything in the path that
leads to rewriting fences (struct_mutex!) is so much more effective than
improving the traps. It is not that one precludes the other; it's just
that the micro-optimisations still need to be justified, or they'll be
forgotten and removed.

> In such a case the trapping of fence_reg on VGPU is transparent to the
> userspace driver/library.
> Only when running a performance test will the performance be affected
> by so many traps.

But how much so?

> If the traps on other registers can also be optimised, that will help
> the performance of the guest OS. Of course the prerequisite is that we
> collect the corresponding trap info when running the corresponding
> workload.
> This is case by case.
>
> >What is the before/after of something like gem_fence_upload?
>
> I checked the case of gem_fence_upload. It only checks the upload
> performance after the fence is already programmed.
> So this patch has no impact on gem_fence_upload.

Incorrect.

By the same token, you haven't measured yourself the impact of this
change?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx