Re: [Intel-gfx] [PATCH 03/11] drm/i915: Allow the sysadmin to override security mitigations
> -Original Message- > From: Chris Wilson > Sent: Sunday, January 10, 2021 7:04 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Joonas Lahtinen > ; Bloomfield, Jon > ; Vivi, Rodrigo ; > sta...@vger.kernel.org > Subject: [PATCH 03/11] drm/i915: Allow the sysadmin to override security > mitigations > > The clear-residuals mitigation is a relatively heavy hammer and under some > circumstances the user may wish to forgo the context isolation in order > to meet some performance requirement. Introduce a generic module > parameter to allow selectively enabling/disabling different mitigations. > > Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1858 > Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen > Cc: Jon Bloomfield > Cc: Rodrigo Vivi > Cc: sta...@vger.kernel.org # v5.7 > --- Reviewed-by: Jon Bloomfield ? ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
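For readers outside the thread: the mechanism under review amounts to a writable module parameter that gates individual mitigations at runtime. Below is a minimal sketch of that shape; the identifier names (mitigations_set, clear_residuals_enabled) are placeholders, not the exact patch. The merged parameter is richer — it accepts a list so each mitigation can be toggled separately — but the policy move is the same: the sysadmin, not the driver, decides.

/* Illustrative sketch of an opt-out mitigations modparam; names are
 * placeholders, not the exact patch. */
#include <linux/errno.h>
#include <linux/moduleparam.h>
#include <linux/string.h>

static bool clear_residuals_enabled = true; /* mitigation on by default */

static int mitigations_set(const char *val, const struct kernel_param *kp)
{
	/* sysfs_streq() tolerates the trailing newline of a sysfs write */
	if (sysfs_streq(val, "auto"))
		clear_residuals_enabled = true;  /* keep every mitigation */
	else if (sysfs_streq(val, "off"))
		clear_residuals_enabled = false; /* forgo context isolation */
	else
		return -EINVAL;

	return 0;
}

static const struct kernel_param_ops mitigations_ops = {
	.set = mitigations_set,
};
module_param_cb(mitigations, &mitigations_ops, NULL, 0600);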
Re: [Intel-gfx] [RFC PATCH 00/97] Basic GuC submission support in the i915
> -Original Message- > From: Martin Peres > Sent: Tuesday, May 11, 2021 1:06 AM > To: Daniel Vetter > Cc: Jason Ekstrand ; Brost, Matthew > ; intel-gfx ; > dri-devel ; Ursulin, Tvrtko > ; Ekstrand, Jason ; > Ceraolo Spurio, Daniele ; Bloomfield, Jon > ; Vetter, Daniel ; > Harrison, John C > Subject: Re: [RFC PATCH 00/97] Basic GuC submission support in the i915 > > On 10/05/2021 19:33, Daniel Vetter wrote: > > On Mon, May 10, 2021 at 3:55 PM Martin Peres > wrote: > >> > >> On 10/05/2021 02:11, Jason Ekstrand wrote: > >>> On May 9, 2021 12:12:36 Martin Peres wrote: > >>> > >>>> Hi, > >>>> > >>>> On 06/05/2021 22:13, Matthew Brost wrote: > >>>>> Basic GuC submission support. This is the first bullet point in the > >>>>> upstreaming plan covered in the following RFC [1]. > >>>>> > >>>>> At a very high level the GuC is a piece of firmware which sits between > >>>>> the i915 and the GPU. It offloads some of the scheduling of contexts > >>>>> from the i915 and programs the GPU to submit contexts. The i915 > >>>>> communicates with the GuC and the GuC communicates with the > GPU. > >>>> > >>>> May I ask what will GuC command submission do that execlist > won't/can't > >>>> do? And what would be the impact on users? Even forgetting the > troubled > >>>> history of GuC (instability, performance regression, poor level of user > >>>> support, 6+ years of trying to upstream it...), adding this much code > >>>> and doubling the amount of validation needed should come with a > >>>> rationale making it feel worth it... and I am not seeing here. Would you > >>>> mind providing the rationale behind this work? > >>>> > >>>>> > >>>>> GuC submission will be disabled by default on all current upstream > >>>>> platforms behind a module parameter - enable_guc. A value of 3 will > >>>>> enable submission and HuC loading via the GuC. GuC submission > should > >>>>> work on all gen11+ platforms assuming the GuC firmware is present. > >>>> > >>>> What is the plan here when it comes to keeping support for execlist? I > >>>> am afraid that landing GuC support in Linux is the first step towards > >>>> killing the execlist, which would force users to use proprietary > >>>> firmwares that even most Intel engineers have little influence over. > >>>> Indeed, if "drm/i915/guc: Disable semaphores when using GuC > scheduling" > >>>> which states "Disable semaphores when using GuC scheduling as > semaphores > >>>> are broken in the current GuC firmware." is anything to go by, it means > >>>> that even Intel developers seem to prefer working around the GuC > >>>> firmware, rather than fixing it. > >>> > >>> Yes, landing GuC support may be the first step in removing execlist > >>> support. The inevitable reality is that GPU scheduling is coming and > >>> likely to be there only path in the not-too-distant future. (See also > >>> the ongoing thread with AMD about fences.) I'm not going to pass > >>> judgement on whether or not this is a good thing. I'm just reading the > >>> winds and, in my view, this is where things are headed for good or ill. > >>> > >>> In answer to the question above, the answer to "what do we gain from > >>> GuC?" may soon be, "you get to use your GPU." We're not there yet > and, > >>> again, I'm not necessarily advocating for it, but that is likely where > >>> things are headed. > >> > >> This will be a sad day, especially since it seems fundamentally opposed > >> with any long-term support, on top of taking away user freedom to > >> fix/tweak their system when Intel won't. 
> >> > >>> A firmware-based submission model isn't a bad design IMO and, aside > from > >>> the firmware freedom issues, I think there are actual advantages to the > >>> model. Immediately, it'll unlock a few features like parallel submission > >>> (more on that in a bit) and long-running compute because they're > >>> implemented in GuC and the work to implement them properly in the > >>> execlist scheduler is highly non-trivial. Longer t
Re: [Intel-gfx] [PATCH 07/11] drm/i915: Add i915_gem_context_is_full_ppgtt
> -Original Message- > From: Tvrtko Ursulin > Sent: Thursday, September 2, 2021 9:42 AM > To: Daniel Vetter > Cc: Daniel Vetter ; DRI Development de...@lists.freedesktop.org>; Intel Graphics Development g...@lists.freedesktop.org>; Maarten Lankhorst > ; Vetter, Daniel > ; Bloomfield, Jon ; > Chris Wilson ; Joonas Lahtinen > ; Thomas Hellström > ; Auld, Matthew > ; Landwerlin, Lionel G > ; Dave Airlie ; Jason > Ekstrand > Subject: Re: [Intel-gfx] [PATCH 07/11] drm/i915: Add > i915_gem_context_is_full_ppgtt > > > On 02/09/2021 16:22, Daniel Vetter wrote: > > On Thu, Sep 02, 2021 at 03:54:36PM +0100, Tvrtko Ursulin wrote: > >> On 02/09/2021 15:20, Daniel Vetter wrote: > >>> And use it anywhere we have open-coded checks for ctx->vm that really > >>> only check for full ppgtt. > >>> > >>> Plus for paranoia add a GEM_BUG_ON that checks it's really only set > >>> when we have full ppgtt, just in case. gem_context->vm is different > >>> since it's NULL in ggtt mode, unlike intel_context->vm or gt->vm, > >>> which is always set. > >>> > >>> v2: 0day found a testcase that I missed. > >>> > >>> Reviewed-by: Maarten Lankhorst > >>> Signed-off-by: Daniel Vetter > >>> Cc: Jon Bloomfield > >>> Cc: Chris Wilson > >>> Cc: Maarten Lankhorst > >>> Cc: Joonas Lahtinen > >>> Cc: Daniel Vetter > >>> Cc: "Thomas Hellström" > >>> Cc: Matthew Auld > >>> Cc: Lionel Landwerlin > >>> Cc: Dave Airlie > >>> Cc: Jason Ekstrand > >>> --- > >>>drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +- > >>>drivers/gpu/drm/i915/gem/i915_gem_context.h | 7 +++ > >>>drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c| 2 +- > >>>drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 6 +++--- > >>>4 files changed, 12 insertions(+), 5 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > >>> index 7a566fb7cca4..1eec85944c1f 100644 > >>> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > >>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > >>> @@ -1566,7 +1566,7 @@ static int get_ppgtt(struct > drm_i915_file_private *file_priv, > >>> int err; > >>> u32 id; > >>> - if (!rcu_access_pointer(ctx->vm)) > >>> + if (!i915_gem_context_is_full_ppgtt(ctx)) > >> > >> It reads a bit wrong because GEM context cannot *be* full ppgtt. It can > be > >> associated with a VM which is or isn't full ppgtt. So a test on a VM > >> retrieved from a context is semantically more correct. Perhaps you want > to > >> consider adding a helper to that effect instead? It could mean splitting > >> into two helpers (getter + test) or maybe just renaming would work. Like > >> i915_gem_context_has_full_ppgtt_vm(ctx)? > > > > The pointer isn't set when the driver/context isn't running in full ppgtt > > mode. This is why I've added the GEM_BUG_ON to check we're not > breaking > > any invariants. So yeah it is a full ppgtt context or it's not, that is > > indeed the question here. > > > > I'm happy to bikeshed the naming, but I don't see how your suggestion is > > an improvement. > > I think the pointer being set or not is implementation detail, for > instance we could have it always set just like it is in intel_context. > > I simply think GEM context *isn't* full ppgtt, but the VM is. And since > GEM context *points* to a VM, *has* is the right verb in my mind. You > did not write why you do not see has as more correct than is, so I don't > want to be guessing too much. FWIW, I agree with Tvrtko. i915_gem_context_is_full_ppgtt is incorrect grammar. It IS a bike shed, but, hey it'll live for a while.
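For readers without the rest of the patch, the helper being bikeshedded reconstructs from the diff and commit message to roughly the following; the GEM_BUG_ON encodes the invariant Daniel describes, that ctx->vm is only ever set when the context runs in full ppgtt:

static inline bool i915_gem_context_is_full_ppgtt(struct i915_gem_context *ctx)
{
	/* NULL in ggtt mode, set iff the context runs in full ppgtt mode */
	bool ppgtt = !!rcu_access_pointer(ctx->vm);

	GEM_BUG_ON(ppgtt && !HAS_FULL_PPGTT(ctx->i915));

	return ppgtt;
}

Tvrtko's counter-proposal would keep the same body but rename it (e.g. i915_gem_context_has_full_ppgtt_vm) so the predicate reads as a property of the VM rather than of the context.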
Re: [Intel-gfx] [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI
> -Original Message- > From: Auld, Matthew > Sent: Monday, April 26, 2021 2:39 AM > To: intel-gfx@lists.freedesktop.org > Cc: Joonas Lahtinen ; Thomas Hellström > ; Ceraolo Spurio, Daniele > ; Lionel Landwerlin > ; Bloomfield, Jon > ; Justen, Jordan L ; > Vetter, Daniel ; Kenneth Graunke > ; Jason Ekstrand ; Dave > Airlie ; dri-de...@lists.freedesktop.org; mesa- > d...@lists.freedesktop.org; Daniel Vetter ; Dave > Airlie > Subject: [PATCH 1/9] drm/doc/rfc: i915 DG1 uAPI > > Add an entry for the new uAPI needed for DG1. Also add the overall > upstream plan, including some notes for the TTM conversion. > > v2(Daniel): > - include the overall upstreaming plan > - add a note for mmap, there are differences here for TTM vs i915 > - bunch of other suggestions from Daniel > v3: > (Daniel) > - add a note for set/get caching stuff > - add some more docs for existing query and extensions stuff > - add an actual code example for regions query > - bunch of other stuff > (Jason) > - uAPI change(!): > - try a simpler design with the placements extension > - rather than have a generic setparam which can cover multiple > use cases, have each extension be responsible for one thing > only > v4: > (Daniel) > - add some more notes for ttm conversion > - bunch of other stuff > (Jason) > - uAPI change(!): > - drop all the extra rsvd members for the region_query and > region_info, just keep the bare minimum needed for padding > > Signed-off-by: Matthew Auld > Cc: Joonas Lahtinen > Cc: Thomas Hellström > Cc: Daniele Ceraolo Spurio > Cc: Lionel Landwerlin > Cc: Jon Bloomfield > Cc: Jordan Justen > Cc: Daniel Vetter > Cc: Kenneth Graunke > Cc: Jason Ekstrand > Cc: Dave Airlie > Cc: dri-de...@lists.freedesktop.org > Cc: mesa-...@lists.freedesktop.org > Acked-by: Daniel Vetter > Acked-by: Dave Airlie > --- Acked-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption
> -Original Message- > From: Intel-gfx On Behalf Of Tvrtko > Ursulin > Sent: Friday, September 20, 2019 8:12 AM > To: Chris Wilson ; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from > overtaking each other on preemption > > > On 20/09/2019 15:57, Chris Wilson wrote: > > Quoting Chris Wilson (2019-09-20 09:36:24) > >> Force bonded requests to run on distinct engines so that they cannot be > >> shuffled onto the same engine where timeslicing will reverse the order. > >> A bonded request will often wait on a semaphore signaled by its master, > >> creating an implicit dependency -- if we ignore that implicit dependency > >> and allow the bonded request to run on the same engine and before its > >> master, we will cause a GPU hang. > > > > Thinking more, it should not directly cause a GPU hang, as the stuck request > > should be timesliced away, and each preemption should be enough to keep > > hangcheck at bay (though we have evidence it may not). So at best it runs > > at half-speed, at worst a third (if my model is correct). > > But I think it is still correct to do since we don't have the coupling > information on re-submit. Hm.. but don't we need to prevent slave from > changing engines as well? Unless I'm missing something, the proposal here is to set the engines in stone at first submission, and never change them? If so, that does sound overly restrictive, and will prevent any kind of rebalancing as workloads (of varying slave counts) come and go. During the original design it was called out that the workloads should be pre-empted atomically. That allows the entire bonding mask to be re-evaluated at every context switch and so we can then rebalance. Still not easy to achieve I agree :-( > > Regards, > > Tvrtko > ___ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from overtaking each other on preemption
> -Original Message- > From: Chris Wilson > Sent: Friday, September 20, 2019 9:04 AM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org; Tvrtko Ursulin > Subject: RE: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from > overtaking each other on preemption > > Quoting Bloomfield, Jon (2019-09-20 16:50:57) > > > -Original Message- > > > From: Intel-gfx On Behalf Of > Tvrtko > > > Ursulin > > > Sent: Friday, September 20, 2019 8:12 AM > > > To: Chris Wilson ; > > > intel-gfx@lists.freedesktop.org > > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Prevent bonded requests from > > > overtaking each other on preemption > > > > > > > > > On 20/09/2019 15:57, Chris Wilson wrote: > > > > Quoting Chris Wilson (2019-09-20 09:36:24) > > > >> Force bonded requests to run on distinct engines so that they cannot be > > > >> shuffled onto the same engine where timeslicing will reverse the order. > > > >> A bonded request will often wait on a semaphore signaled by its master, > > > >> creating an implicit dependency -- if we ignore that implicit > > > >> dependency > > > >> and allow the bonded request to run on the same engine and before its > > > >> master, we will cause a GPU hang. > > > > > > > > Thinking more, it should not directly cause a GPU hang, as the stuck > request > > > > should be timesliced away, and each preemption should be enough to > keep > > > > hangcheck at bay (though we have evidence it may not). So at best it > > > > runs > > > > at half-speed, at worst a third (if my model is correct). > > > > > > But I think it is still correct to do since we don't have the coupling > > > information on re-submit. Hm.. but don't we need to prevent slave from > > > changing engines as well? > > > > Unless I'm missing something, the proposal here is to set the engines in > > stone > at first submission, and never change them? > > For submission here, think execution (submission to actual HW). (We have > 2 separate phases that all like to be called submit()!) > > > If so, that does sound overly restrictive, and will prevent any kind of > rebalancing as workloads (of varying slave counts) come and go. > > We are only restricting this request, not the contexts. We still have > balancing overall, just not instantaneous balancing if we timeslice out > of this request -- we put it back onto the "same" engine and not another. > Which is in some ways is less than ideal, although strictly we are only > saying don't put it back onto an engine we have earmarked for our bonded > request, and so we avoid contending with our parallel request reducing > that to serial (and often bad) behaviour. > > [So at the end of this statement, I'm more happy with the restriction ;] > > > During the original design it was called out that the workloads should be > > pre- > empted atomically. That allows the entire bonding mask to be re-evaluated at > every context switch and so we can then rebalance. Still not easy to achieve I > agree :-( > > The problem with that statement is that atomic implies a global > scheduling decision. Blood, sweat and tears. Agreed - It isn't fun. Perhaps it doesn't matter anyway. Once GuC is offloading the scheduling it should be able to do a little more wrt rebalancing. Let's make it a GuC headache instead. > > Of course, with your endless scheme, scheduling is all in the purview of > the user :) Hey, don't tarnish me with that brush. I don't like it either. Actually, it's your scheme technically. 
I just asked for a way to enable HPC workloads, and you enthusiastically offered heartbeats & non-persistence. So shall history be written :-) > -Chris
Re: [Intel-gfx] [PATCH] drm/i915/ehl: Add missing VECS engine
> -Original Message- > From: Intel-gfx On Behalf Of Tvrtko > Ursulin > Sent: Tuesday, June 25, 2019 10:22 PM > To: Ceraolo Spurio, Daniele ; Roper, > Matthew D ; Souza, Jose > Cc: intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915/ehl: Add missing VECS engine > > > On 25/06/2019 22:48, Daniele Ceraolo Spurio wrote: > > On 6/25/19 8:26 AM, Matt Roper wrote: > >> On Fri, Jun 14, 2019 at 03:17:39PM -0700, Matt Roper wrote: > >>> On Fri, Jun 14, 2019 at 02:37:49PM -0700, José Roberto de Souza wrote: > EHL can have up to one VECS(video enhancement) engine, so add it to > the device_info. > >>> > >>> Bspec 29150 has a footnote on VEbox that indicates "Pass-through only, > >>> no VEbox processing logic." That note seems a bit vague, but I think I > >>> saw some more detailed info in the past somewhere that indicated the > >>> VECS command streamer is still technically present but doesn't actually > >>> do any video enhancement on EHL; it just passes content through to SFC. > >>> > >>> I'm not terribly plugged into the media side of the world, so I'm not > >>> sure if we want to expose VECS to userspace if it's basically a noop and > >>> doesn't do what it normally does on other platforms. Bspec page 5229 > >>> implies that SFC can be fed directly by the decode engine without going > >>> through VEBOX, so I'm not sure if media userspace would ever have a use > >>> for the passthrough-only VECS streamer. > >>> > >>> We should probably ask someone on the media team what their thoughts > are > >>> on this. > >> > >> Since the media team confirmed that there is indeed a use case for a > >> passthrough-only VECS, > >> > >> Reviewed-by: Matt Roper > >> > > > > A bit late for a question, but how does userspace know that this is just > > a pass-through VECS? Are we expecting them to switch based on platform > > instead of just using the kernel API? IMO it'd be better to hide the > > engine in the query ioctl by default and only show it if userspace > > passes an appropriate flag, otherwise legacy apps could try to submit > > VECS-specific commands to the engine. > > I have a patch which would enable this, guess it's time to send it.. > > If we go this route (hide the engine by default), this patch would need > to add a new capability flag. But what to call it? > I915_VIDEO_ENHANCE_CLASS_PASSTHROUGH? I don't get the legacy userspace problem. It's not realistic to expect userspace to 'just work' on new platforms. Instructions are added and removed at each new gen, and umd needs to know that. The capabilities were not really designed to insulate the umd's from making device specific decisions. If umd can easily deduce a capability from the device-id, it should be doing so. The capabilities will become unwieldy very quickly if we start poking in new ones for every change in hardware shape. Caps should be limited to things that cannot be easily deduced, or that can be different on different SKU's of the same device (e.g. fusing). The SFC cap is needed because we don't expose explicit engine numbers to userspace, so it has no easy way to determine which VDBox's have SFC and which do not. Here, we're talking about an easily deducible feature. > > Regards, > > Tvrtko > ___ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
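Background for the capability question: userspace already discovers engines and their per-engine capability bits via the query uAPI, so Tvrtko's flag would surface as one more bit in drm_i915_engine_info.capabilities, next to the existing SFC bit. A sketch of the discovery side (the passthrough flag itself is only a proposal in this thread; the rest is the stock two-pass query):

#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Two-pass query: the first ioctl reports the size, the second fills it. */
static struct drm_i915_query_engine_info *query_engines(int fd)
{
	struct drm_i915_query_item item = {
		.query_id = DRM_I915_QUERY_ENGINE_INFO,
	};
	struct drm_i915_query query = {
		.num_items = 1,
		.items_ptr = (uintptr_t)&item,
	};
	struct drm_i915_query_engine_info *info;

	if (ioctl(fd, DRM_IOCTL_I915_QUERY, &query) || item.length <= 0)
		return NULL;

	info = calloc(1, item.length);
	item.data_ptr = (uintptr_t)info;
	if (ioctl(fd, DRM_IOCTL_I915_QUERY, &query)) {
		free(info);
		return NULL;
	}

	/* info->engines[i].capabilities carries bits such as
	 * I915_VIDEO_AND_ENHANCE_CLASS_CAPABILITY_SFC */
	return info;
}

Whether a passthrough-only VECS is then deduced from the device id (Jon's position) or read from a new capability bit (Tvrtko's) is exactly the disagreement above.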
Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
+@Vetter, Daniel Let's not start re-inventing this on the fly again. That's how we got into trouble in the past. The SAS/Whitepaper does currently require the SMEM+LMEM placement for mappable, for good reasons. We cannot 'always migrate to mappable in the fault handler'. Or at least, this is not as trivial as it is to write in a sentence due to the need to spill out other active objects, and all the usual challenges with context synchronization etc. It is possible, perhaps with a lot of care, but it is challenging to guarantee, easy to break, and not needed for 99.9% of software. We are trying to simplify our driver stack. If we need a special mechanism for debug, we should devise a special mechanism, not throw out the general LMEM+SMEM requirement. Are there any identified first-class clients that require such access, or is it only debugging tools? If only debug, then why can't the tool use a copy engine submission to access the data in place? Or perhaps a bespoke ioctl to access this via the KMD (and kmd submitted copy-engine BB)? Thanks, Jon > -Original Message- > From: Thomas Hellström > Sent: Thursday, March 17, 2022 2:35 AM > To: Joonas Lahtinen ; Bloomfield, Jon > ; Intel Graphics Development g...@lists.freedesktop.org>; Auld, Matthew ; C, > Ramalingam > Subject: Re: Small bar recovery vs compressed content on DG2 > > On Thu, 2022-03-17 at 10:43 +0200, Joonas Lahtinen wrote: > > Quoting Thomas Hellström (2022-03-16 09:25:16) > > > Hi! > > > > > > Do we somehow need to clarify in the headers the semantics for > > > this? > > > > > > From my understanding when discussing the CCS migration series > > > with > > > Ram, the kernel will never do any resolving (compressing / > > > decompressing) migrations or evictions which basically implies the > > > following: > > > > > > *) Compressed data must have LMEM only placement, otherwise the > GPU > > > would read garbage if accessing from SMEM. > > > > This has always been the case, so it should be documented in the uAPI > > headers and kerneldocs. > > > > > *) Compressed data can't be assumed to be mappable by the CPU, > > > because > > > in order to ensure that on small BAR, the placement needs to be > > > LMEM+SMEM. > > > > Not strictly true, as we could always migrate to the mappable region > > in > > the CPU fault handler. Will need the same set of tricks as with > > limited > > mappable GGTT in past. > > In addition to Matt's reply: > > Yes, if there is sufficient space. I'm not sure we want to complicate > this to migrate only part of the buffer to mappable on a fault basis? > Otherwise this is likely to fail. > > One option is to allow cpu-mapping from SYSTEM like TTM is doing for > evicted buffers, even if SYSTEM is not in the placement list, and then > migrate back to LMEM for gpu access. > > But can user-space even interpret the compressed data when CPU- > mapping? > without access to the CCS metadata? > > > > > > *) Neither can compressed data be part of a CAPTURE buffer, because > > > that > > > requires the data to be CPU-mappable. > > > > Especially this will be too big of a limitation which we can't really > > afford > > when it comes to debugging. > > Same here WRT user-space interpretation. > > This will become especially tricky on small BAR, because either we need > to fit all compressed buffers in the mappable portion, or be able to > blit the contents of the capture buffers from within the fence > signalling critical section, which will require a lot of work I guess. 
> > /Thomas > > > > > > Regards, Joonas > > > > > Are we (and user-mode drivers) OK with these restrictions, or do we > > > need > > > to rethink? > > > > > > Thanks, > > > > > > Thomas > > > > > > >
Re: [Intel-gfx] Small bar recovery vs compressed content on DG2
@Thomas Hellström - I agree :-) My question was really to @Joonas Lahtinen, who was saying we could always migrate in the CPU fault handler. I am pushing back on that unless we have no choice. It's the very complication we were trying to avoid with the current SAS. If that's what's needed, then so be it. But I'm asking whether we can instead handle this specially, instead of adding generic complexity to the primary code paths. Jon > -Original Message- > From: Thomas Hellström > Sent: Friday, March 18, 2022 2:48 AM > To: Bloomfield, Jon ; Joonas Lahtinen > ; Intel Graphics Development g...@lists.freedesktop.org>; Auld, Matthew ; C, > Ramalingam ; Vetter, Daniel > > Subject: Re: Small bar recovery vs compressed content on DG2 > > Hi, > > On Thu, 2022-03-17 at 18:21 +, Bloomfield, Jon wrote: > > +@Vetter, Daniel > > > > Let's not start re-inventing this on the fly again. That's how we got > > into trouble in the past. The SAS/Whitepaper does currently require > > the SMEM+LMEM placement for mappable, for good reasons. > > Just to avoid any misunderstandings here: > > We have two hard requirements from Arch that clash, main problem is > compressed bos can't be captured on error with current designs. > > From an engineering point of view we can do little more than list > options available to resolve this and whether they are hard or not so > hard to implemement. But IMHO Arch needs to agree on what's got to > give. > > Thanks, > Thomas > > > > > > We cannot 'always migrate to mappable in the fault handler'. Or at > > least, this is not as trivial as it is to write in a sentence due to > > the need to spill out other active objects, and all the usual > > challenges with context synchronization etc. It is possible, perhaps > > with a lot of care, but it is challenging to guarantee, easy to > > break, and not needed for 99.9% of software. We are trying to > > simplify our driver stack. > > > > If we need a special mechanism for debug, we should devise a special > > mechanism, not throw out the general LMEM+SMEM requirement. Are > there > > any identified first-class clients that require such access, or is it > > only debugging tools? > > > > If only debug, then why can't the tool use a copy engine submission > > to access the data in place? Or perhaps a bespoke ioctl to access > > this via the KMD (and kmd submitted copy-engine BB)? > > > > Thanks, > > > > Jon > > > > > -Original Message- > > > From: Thomas Hellström > > > Sent: Thursday, March 17, 2022 2:35 AM > > > To: Joonas Lahtinen ; Bloomfield, > > > Jon > > > ; Intel Graphics Development > > g...@lists.freedesktop.org>; Auld, Matthew ; > > > C, > > > Ramalingam > > > Subject: Re: Small bar recovery vs compressed content on DG2 > > > > > > On Thu, 2022-03-17 at 10:43 +0200, Joonas Lahtinen wrote: > > > > Quoting Thomas Hellström (2022-03-16 09:25:16) > > > > > Hi! > > > > > > > > > > Do we somehow need to clarify in the headers the semantics for > > > > > this? > > > > > > > > > > From my understanding when discussing the CCS migration series > > > > > with > > > > > Ram, the kernel will never do any resolving (compressing / > > > > > decompressing) migrations or evictions which basically implies > > > > > the > > > > > following: > > > > > > > > > > *) Compressed data must have LMEM only placement, otherwise the > > > GPU > > > > > would read garbage if accessing from SMEM. > > > > > > > > This has always been the case, so it should be documented in the > > > > uAPI > > > > headers and kerneldocs. 
> > > > > > > > > *) Compressed data can't be assumed to be mappable by the CPU, > > > > > because > > > > > in order to ensure that on small BAR, the placement needs to be > > > > > LMEM+SMEM. > > > > > > > > Not strictly true, as we could always migrate to the mappable > > > > region > > > > in > > > > the CPU fault handler. Will need the same set of tricks as with > > > > limited > > > > mappable GGTT in past. > > > > > > In addition to Matt's reply: > > > > > > Yes, if there is sufficient space. I'm not sure we want to > >
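To make the placement rules in this thread concrete: with the gem_create_ext uAPI that eventually landed, "LMEM only" versus "LMEM+SMEM" is simply the regions list supplied at object creation. A sketch, error handling elided:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* A single DEVICE placement: per the thread, required for compressed
 * surfaces. Listing SYSTEM as a second placement is what keeps an
 * object CPU-mappable on small-BAR parts, since the kernel is then
 * free to migrate it out of local memory for the fault handler. */
static uint32_t create_lmem_only(int fd, uint64_t size)
{
	struct drm_i915_gem_memory_class_instance lmem = {
		.memory_class = I915_MEMORY_CLASS_DEVICE,
		.memory_instance = 0,
	};
	struct drm_i915_gem_create_ext_memory_regions regions = {
		.base.name = I915_GEM_CREATE_EXT_MEMORY_REGIONS,
		.num_regions = 1,
		.regions = (uintptr_t)&lmem,
	};
	struct drm_i915_gem_create_ext create = {
		.size = size,
		.extensions = (uintptr_t)&regions,
	};

	ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);
	return create.handle;
}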
Re: [Intel-gfx] [PATCH] drm/i915/adlp: Remove require_force_probe protection
Assuming the whitespace cleanup requested below is completed: Acked-by: Jon Bloomfield > -Original Message- > From: Intel-gfx On Behalf Of > Rodrigo Vivi > Sent: Tuesday, November 16, 2021 2:33 PM > To: Taylor, Clinton A > Cc: Intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915/adlp: Remove > require_force_probe protection > > On Mon, Nov 15, 2021 at 03:53:45PM -0800, clinton.a.tay...@intel.com > wrote: > > From: Clint Taylor > > > > drm/i915/adlp: Remove require_force_probe protection > > > > Removing force probe protection from ADL_P platform. Did > > not observe warnings, errors, flickering or any visual > > defects while doing ordinary tasks like browsing and > > editing documents in a two monitor setup. > > some strange alignment here... please remove the extra > tabs here. > > > > > For more info drm-tip idle run results : > > https://intel-gfx-ci.01.org/tree/drm-tip/drmtip.html? > > hmm... I could swear that I had seen the ADL-P green there a few > days ago as well... But right now I couldn't see ADL-P there... > > So that fails on having a *reliable* green CI picture... > Any idea why that is down at this moment? > > > > > Cc: Rodrigo Vivi > > > > Signed-off-by: Clint Taylor > > --- > > drivers/gpu/drm/i915/i915_pci.c | 1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_pci.c > b/drivers/gpu/drm/i915/i915_pci.c > > index 4c7fcc5f9a97..af9f4988bd88 100644 > > --- a/drivers/gpu/drm/i915/i915_pci.c > > +++ b/drivers/gpu/drm/i915/i915_pci.c > > @@ -970,7 +970,6 @@ static const struct intel_device_info adl_p_info = { > > GEN12_FEATURES, > > XE_LPD_FEATURES, > > PLATFORM(INTEL_ALDERLAKE_P), > > - .require_force_probe = 1, > > .cpu_transcoder_mask = BIT(TRANSCODER_A) | > BIT(TRANSCODER_B) | > >BIT(TRANSCODER_C) | BIT(TRANSCODER_D) | > >BIT(TRANSCODER_DSI_0) | > BIT(TRANSCODER_DSI_1), > > -- > > 2.33.1 > >
Re: [Intel-gfx] [PATCH v3 0/4] GuC HWCONFIG with documentation
Acked-by: Jon Bloomfield > -Original Message- > From: Intel-gfx On Behalf Of > Jordan Justen > Sent: Tuesday, February 8, 2022 1:05 PM > To: intel-gfx > Cc: dri-devel > Subject: [Intel-gfx] [PATCH v3 0/4] GuC HWCONFIG with documentation > > This is John/Rodrigo's 2 patches with some minor changes, and I added > 2 patches. > > "drm/i915/uapi: Add query for hwconfig blob" was changed: > > * Rename DRM_I915_QUERY_HWCONFIG_TABLE to > DRM_I915_QUERY_HWCONFIG_BLOB >as requested by Joonas. > > * Reword commit message > > * I added Acked-by to this patch, but this only applies in the >context of this version of the patchset. If my changes are >rejected, then please *do not* add my Acked-by to the other series. > >In particular, I do not want my Acked-by on the patch if the patch >mentions the HWCONFIG format, but is not willing to add that to the >actual uAPI. > >I also do not want my Acked-by on it if it mentions "consolidation" >of this data. Since we are dealing with open source projects (aside >from GuC), this doesn't seem appropriate. > > "drm/i915/uapi: Add struct drm_i915_query_hwconfig_blob_item" adds a > struct to the uAPI and documents the return value for > DRM_I915_QUERY_HWCONFIG_BLOB. (Except, keys / values are still > deferred to the PRM.) > > "drm/i915/guc: Verify hwconfig blob matches supported format" does the > simple verification of the blob to make sure it matches what the uAPI > documents. > > v2: > * Fix -Werror errors. > * Rebase to drm-intel/for-linux-next instead of >drm-intel/for-linux-next-gt, as this seems to be what CI wants. > * Fix u32 -> __u32. (Sorry, I was first testing in Mesa tree.) > * Add commit message for "Verify hwconfig blob" patch as requested by >Tvrtko. > * Reword text added to i915_drm.h as requested by Tvrtko. (Attempting >to indicate the overall blob ends right at the last blob item.) > > v3: > * Add several changes suggested by Tvrtko in the "Verify hwconfig >blob", along with some tweaks to i915_drm.h from the feedback for >the same patch. > > John Harrison (1): > drm/i915/guc: Add fetch of hwconfig table > > Jordan Justen (2): > drm/i915/uapi: Add struct drm_i915_query_hwconfig_blob_item > drm/i915/guc: Verify hwconfig blob matches supported format > > Rodrigo Vivi (1): > drm/i915/uapi: Add query for hwconfig blob > > drivers/gpu/drm/i915/Makefile | 1 + > .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 1 + > .../gpu/drm/i915/gt/uc/abi/guc_errors_abi.h | 4 + > drivers/gpu/drm/i915/gt/uc/intel_guc.h| 3 + > .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.c | 201 ++ > .../gpu/drm/i915/gt/uc/intel_guc_hwconfig.h | 19 ++ > drivers/gpu/drm/i915/gt/uc/intel_uc.c | 6 + > drivers/gpu/drm/i915/i915_query.c | 23 ++ > include/uapi/drm/i915_drm.h | 33 +++ > 9 files changed, 291 insertions(+) > create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.c > create mode 100644 drivers/gpu/drm/i915/gt/uc/intel_guc_hwconfig.h > > -- > 2.34.1
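As the series documents it, the returned blob is a packed run of { key, length, data[] } u32 items with no outer framing, ending exactly at the last item; a consumer walks it like this (sketch; key meanings are deferred to the PRM, as noted above):

#include <stdint.h>
#include <stdio.h>

/* Walk a hwconfig blob of len_dw u32s: each item is a key, a length in
 * u32s, then that many u32s of data. */
static int walk_hwconfig(const uint32_t *blob, uint32_t len_dw)
{
	uint32_t i = 0;

	while (i + 2 <= len_dw) {
		uint32_t key = blob[i];
		uint32_t length = blob[i + 1];

		if (length > len_dw - i - 2)
			return -1; /* malformed: item overruns the blob */

		printf("key %u: %u dword(s) of data\n", key, length);
		i += 2 + length;
	}

	return i == len_dw ? 0 : -1; /* must end on an item boundary */
}

This is essentially the check the "Verify hwconfig blob" patch performs in the kernel before exposing the table.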
Re: [Intel-gfx] [PATCH 20/30] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Wednesday, October 2, 2019 4:20 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Joonas Lahtinen > ; Winiarski, Michal > ; Bloomfield, Jon > Subject: [PATCH 20/30] drm/i915: Cancel non-persistent contexts on close > > Normally, we rely on our hangcheck to prevent persistent batches from > hogging the GPU. However, if the user disables hangcheck, this mechanism > breaks down. Despite our insistence that this is unsafe, the users are > equally insistent that they want to use endless batches and will disable > the hangcheck mechanism. We are looking at perhaps replacing hangcheck > with a softer mechanism, that sends a pulse down the engine to check if > it is well. We can use the same preemptive pulse to flush an active > persistent context off the GPU upon context close, preventing resources > being lost and unkillable requests remaining on the GPU after process > termination. To avoid changing the ABI and accidentally breaking > existing userspace, we make the persistence of a context explicit and > enable it by default (matching current ABI). Userspace can opt out of > persistent mode (forcing requests to be cancelled when the context is > closed by process termination or explicitly) by a context parameter. To > facilitate existing use-cases of disabling hangcheck, if the modparam is > disabled (i915.enable_hangcheck=0), we disable persistence mode by > default. (Note, one of the outcomes for supporting endless mode will be > the removal of hangchecking, at which point opting into persistent mode > will be mandatory, or maybe the default perhaps controlled by cgroups.) > > Testcase: igt/gem_ctx_persistence > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen > Cc: Michał Winiarski > Cc: Jon Bloomfield > Reviewed-by: Jon Bloomfield > --- > drivers/gpu/drm/i915/Makefile | 1 + > drivers/gpu/drm/i915/gem/i915_gem_context.c | 122 ++ > drivers/gpu/drm/i915/gem/i915_gem_context.h | 15 +++ > .../gpu/drm/i915/gem/i915_gem_context_types.h | 1 + > .../gpu/drm/i915/gem/selftests/mock_context.c | 2 + > .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 56 > .../gpu/drm/i915/gt/intel_engine_heartbeat.h | 14 ++ > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- > drivers/gpu/drm/i915/i915_priolist_types.h| 1 + > include/uapi/drm/i915_drm.h | 15 +++ > 10 files changed, 228 insertions(+), 1 deletion(-) > create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c > create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h > > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index 1f87c70a2842..561fa2bb3006 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -76,6 +76,7 @@ gt-y += \ > gt/intel_breadcrumbs.o \ > gt/intel_context.o \ > gt/intel_engine_cs.o \ > + gt/intel_engine_heartbeat.o \ > gt/intel_engine_pm.o \ > gt/intel_engine_pool.o \ > gt/intel_engine_sysfs.o \ > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > index 0ab416887fc2..e832238a72e5 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > +static int > +set_persistence(struct i915_gem_context *ctx, > + const struct drm_i915_gem_context_param *args) > +{ > + if (args->size) > + return -EINVAL; > + > + if (args->value) { > + i915_gem_context_set_persistence(ctx); > + return 0; > + } > + > + /* To cancel a context we use "preempt-to-idle" */ > + if (!(ctx->i915->caps.scheduler & > 
I915_SCHEDULER_CAP_PREEMPTION)) > + return -ENODEV; > + > + i915_gem_context_clear_persistence(ctx); > + return 0; > +} Hmmn. Given that disabling hangcheck is an explicit operation, and we already change the default setting, can't we make it a hard requirement that persistence requires hangcheck? You should not really be able to opt back in to persistence if hangcheck is disabled. In fact you could just test for hangcheck when deciding whether to kill the context, and force-kill if it is off - that way if hangcheck is disabled after a context starts it will get cleaned up. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
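A sketch of the coupling Jon asks for, folded into the hunk quoted above (names as in the patch; only the hangcheck test is new, and v2 of the patch went on to implement an equivalent check at context-termination time instead):

static int
set_persistence(struct i915_gem_context *ctx,
		const struct drm_i915_gem_context_param *args)
{
	if (args->size)
		return -EINVAL;

	if (args->value) {
		/* Suggested hard requirement: no opting back in to
		 * persistence while hangcheck is disabled. */
		if (!i915_modparams.enable_hangcheck)
			return -EPERM;

		i915_gem_context_set_persistence(ctx);
		return 0;
	}

	/* To cancel a context we use "preempt-to-idle" */
	if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
		return -ENODEV;

	i915_gem_context_clear_persistence(ctx);
	return 0;
}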
Re: [Intel-gfx] [PATCH 21/30] drm/i915: Replace hangcheck by heartbeats
> -Original Message- > From: Chris Wilson > Sent: Wednesday, October 2, 2019 4:20 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Joonas Lahtinen > ; Ursulin, Tvrtko ; > Bloomfield, Jon > Subject: [PATCH 21/30] drm/i915: Replace hangcheck by heartbeats > > Replace sampling the engine state every so often with a periodic > heartbeat request to measure the health of an engine. This is coupled > with the forced-preemption to allow long running requests to survive so > long as they do not block other users. > > The heartbeat interval can be adjusted per-engine using, > > /sys/class/drm/card0/engine/*/heartbeat_interval_ms Not true for discrete :-) Perhaps: /sys/class/drm/card<N>/engine/*/heartbeat_interval_ms And again in the code below.
Re: [Intel-gfx] [PATCH v2] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Wednesday, October 2, 2019 7:23 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Joonas Lahtinen > ; Winiarski, Michal > ; Bloomfield, Jon > Subject: [PATCH v2] drm/i915: Cancel non-persistent contexts on close > > Normally, we rely on our hangcheck to prevent persistent batches from > hogging the GPU. However, if the user disables hangcheck, this mechanism > breaks down. Despite our insistence that this is unsafe, the users are > equally insistent that they want to use endless batches and will disable > the hangcheck mechanism. We are looking at perhaps replacing hangcheck > with a softer mechanism, that sends a pulse down the engine to check if > it is well. We can use the same preemptive pulse to flush an active > persistent context off the GPU upon context close, preventing resources > being lost and unkillable requests remaining on the GPU after process > termination. To avoid changing the ABI and accidentally breaking > existing userspace, we make the persistence of a context explicit and > enable it by default (matching current ABI). Userspace can opt out of > persistent mode (forcing requests to be cancelled when the context is > closed by process termination or explicitly) by a context parameter. To > facilitate existing use-cases of disabling hangcheck, if the modparam is > disabled (i915.enable_hangcheck=0), we disable persistence mode by > default. (Note, one of the outcomes for supporting endless mode will be > the removal of hangchecking, at which point opting into persistent mode > will be mandatory, or maybe the default perhaps controlled by cgroups.) > > v2: Check for hangchecking at context termination, so that we are not > left with undying contexts from a craft user. s/craft/crafty/ unless we only care about sailors. Otherwise: Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 20/30] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Wednesday, October 2, 2019 7:24 AM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Winiarski, Michal > > Subject: RE: [PATCH 20/30] drm/i915: Cancel non-persistent contexts on close > > Quoting Bloomfield, Jon (2019-10-02 14:52:32) > > > > > > > > Hmmn. Given that disabling hangcheck is an explicit operation, and we > already change the default setting, can't we make it a hard requirement that > persistence requires hangcheck? You should not really be able to opt back in > to > persistence if hangcheck is disabled. In fact you could just test for > hangcheck > when deciding whether to kill the context, and force-kill if it is off - that > way if > hangcheck is disabled after a context starts it will get cleaned up. > > Just great, now I got to update the igt to treat i915.enable_hangcheck > as API! > -Chris Don't blame me ;-) I'm in damage limitation mode. I'd prefer we didn't have persistence at all. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2 3/5] drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET
> -Original Message- > From: Intel-gfx On Behalf Of Abdiel > Janulgue > Sent: Monday, October 7, 2019 2:19 AM > To: intel-gfx@lists.freedesktop.org > Cc: Auld, Matthew > Subject: [Intel-gfx] [PATCH v2 3/5] drm/i915: Introduce > DRM_I915_GEM_MMAP_OFFSET > > This is really just an alias of mmap_gtt. Add a new CPU mmap > implementation that allows multiple fault handlers that depends on > the object's backing pages. > > Note that we multiplex mmap_gtt and mmap_offset through the same ioctl, > and use the zero extending behaviour of drm to differentiate between > them, when we inspect the flags. > > Signed-off-by: Abdiel Janulgue > Signed-off-by: Matthew Auld > Cc: Joonas Lahtinen > --- > drivers/gpu/drm/i915/gem/i915_gem_mman.c | 36 +-- > .../gpu/drm/i915/gem/i915_gem_object_types.h | 3 ++ > drivers/gpu/drm/i915/i915_drv.c | 2 +- > include/uapi/drm/i915_drm.h | 28 +++ > 4 files changed, 66 insertions(+), 3 deletions(-) How does the label 'offset' fit into this API if it's really about multiple fault handlers? Could do with a much better description here I think. Who would use this, and why, would help a lot. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
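For what the multiplexing looks like from the other side: userspace fills in a handle and a caching-mode flag, receives a fake offset into the drm fd's address space, and hands that to mmap(2) — the "offset" in the name is that fake offset. A sketch against the uAPI's eventual upstream form (the RFC here may differ in detail):

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <drm/i915_drm.h>

static void *gem_mmap_wc(int fd, uint32_t handle, size_t size)
{
	struct drm_i915_gem_mmap_offset arg = {
		.handle = handle,
		/* CPU mmap of the backing pages, write-combined; a GTT
		 * flag value preserves the old mmap_gtt behaviour */
		.flags = I915_MMAP_OFFSET_WC,
	};

	if (ioctl(fd, DRM_IOCTL_I915_GEM_MMAP_OFFSET, &arg))
		return NULL;

	return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    fd, arg.offset);
}

The per-backing-store fault handlers the commit message mentions are what service the resulting VMA, depending on where the object's pages currently live.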
Re: [Intel-gfx] [PATCH i-g-t] igt: Use COND_BBEND for busy spinning on gen9
> -Original Message- > From: Chris Wilson > Sent: Wednesday, November 13, 2019 11:47 AM > To: intel-gfx@lists.freedesktop.org > Cc: Bloomfield, Jon ; Lahtinen, Joonas > ; Vivi, Rodrigo ; > Kuoppala, Mika ; Mika Kuoppala > > Subject: [PATCH i-g-t] igt: Use COND_BBEND for busy spinning on gen9 > > From: Jon Bloomfield > > gen9+ introduces a cmdparser for the BLT engine which copies the > incoming BB to a kmd owned buffer for submission (to prevent changes > being made after the bb has been safely scanned). This breaks the > spin functionality because it relies on changing the submitted spin > buffers in order to terminate them. > > Instead, for gen9+, we change the semantics by introducing a COND_BB_END > into the infinite loop, to wait until a memory flag (in anothe bo) is > cleared. > > v2: Correct nop length to avoid overwriting bb_end instr when using > a dependency bo (cork) > v3: fix conflicts on igt_dummyload (Mika) > v4: s/bool running/uint32_t running, fix r->delta (Mika) > v5: remove overzealous assert (Mika) > v6: rebase on top of lib changes (Mika) > v7: rework on top of public igt lib changes (Mika) > v8: rebase > v9: simplify by using bb end as conditional (Chris) > > Signed-off-by: Jon Bloomfield (v2) > Cc: Joonas Lahtinen > Cc: Rodrigo Vivi > Cc: Mika Kuoppala > Signed-off-by: Mika Kuoppala > --- > lib/i830_reg.h | 2 +- > lib/igt_dummyload.c | 24 +++- > lib/intel_reg.h | 3 +++ > tests/i915/gem_double_irq_loop.c| 2 -- > tests/i915/gem_write_read_ring_switch.c | 2 +- > 5 files changed, 28 insertions(+), 5 deletions(-) > > diff --git a/lib/i830_reg.h b/lib/i830_reg.h > index a57691c7e..410202567 100644 > --- a/lib/i830_reg.h > +++ b/lib/i830_reg.h > @@ -43,7 +43,7 @@ > /* broadwater flush bits */ > #define BRW_MI_GLOBAL_SNAPSHOT_RESET (1 << 3) > > -#define MI_COND_BATCH_BUFFER_END (0x36<<23 | 1) > +#define MI_COND_BATCH_BUFFER_END (0x36 << 23) > #define MI_DO_COMPARE(1<<21) > > #define MI_BATCH_BUFFER_END (0xA << 23) > diff --git a/lib/igt_dummyload.c b/lib/igt_dummyload.c > index c079bd045..93b8b6bc6 100644 > --- a/lib/igt_dummyload.c > +++ b/lib/igt_dummyload.c > @@ -208,7 +208,29 @@ emit_recursive_batch(igt_spin_t *spin, >* trouble. See https://bugs.freedesktop.org/show_bug.cgi?id=102262 >*/ > if (!(opts->flags & IGT_SPIN_FAST)) > - cs += 1000; > + cs += 960; > + > + /* > + * When using a cmdparser, the batch is copied into a read only > location > + * and validated. We are then unable to alter the executing batch, > + * breaking the older *spin->condition = MI_BB_END termination. > + * Instead we can use a conditional MI_BB_END here that looks at > + * the user's copy of the batch and terminates when they modified it. > + */ > + if (gen >= 9) { > + r = &relocs[batch->relocation_count++]; > + > + r->presumed_offset = 0; > + r->target_handle = batch->handle; > + r->offset = (cs + 2 - batch_start) * sizeof(*cs); > + r->read_domains = I915_GEM_DOMAIN_COMMAND; > + r->delta = (spin->condition - batch_start) * sizeof(*cs); > + > + *cs++ = MI_COND_BATCH_BUFFER_END | 1 << 21 | 5 << 12 | > 2; > + *cs++ = MI_BATCH_BUFFER_END; > + *cs++ = r->delta; > + *cs++ = 0; > + } > > /* recurse */ > r = &relocs[batch->relocation_count++]; > diff --git a/lib/intel_reg.h b/lib/intel_reg.h > index 069440cb7..10ca760a2 100644 > --- a/lib/intel_reg.h > +++ b/lib/intel_reg.h > @@ -2593,6 +2593,9 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN > THE SOFTWARE. 
> #define MI_BATCH_BUFFER ((0x30 << 23) | 1) > #define MI_BATCH_BUFFER_START(0x31 << 23) > #define MI_BATCH_BUFFER_END (0xA << 23) > +#define MI_COND_BATCH_BUFFER_END (0x36 << 23) > +#define MI_DO_COMPARE (1<<21) > + > #define MI_BATCH_NON_SECURE (1) > #define MI_BATCH_NON_SECURE_I965 (1 << 8) > #define MI_BATCH_NON_SECURE_HSW (1<<13) /* Additional > bit for RCS */ > diff --git a/tests/i915/gem_double_irq_loop.c > b/tests/i915/gem_double_irq_loop.c > index b326fc58a..f17f61c19 100644 > --- a/tests/i915/gem_double_irq_loop.c > +++ b/tests/i915/gem_double_irq_loop.c > @@ -52,8 +52,6 @@ static drm_intel_bo *target_buff
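For completeness, with the conditional terminator above a spinner is ended by rewriting the user's copy of the batch rather than the kmd-owned copy the cmdparser validated. In igt this is igt_spin_end(); from memory it amounts to roughly:

/* Rough shape of the termination path (see lib/igt_dummyload.c) */
static void spin_end(igt_spin_t *spin)
{
	*spin->condition = MI_BATCH_BUFFER_END;
	__sync_synchronize(); /* publish the store so the GPU's
			       * MI_COND_BATCH_BUFFER_END poll observes it */
}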
Re: [Intel-gfx] drm-next + i915 CVE yolo merge
> -Original Message- > From: Daniel Vetter > Sent: Thursday, November 14, 2019 12:13 AM > To: Dave Airlie > Cc: Jani Nikula ; Bloomfield, Jon > ; Joonas Lahtinen > ; Chris Wilson ; > Stephen Rothwell ; dri-devel de...@lists.freedesktop.org>; Intel Graphics Development g...@lists.freedesktop.org>; Linus Torvalds > Subject: Re: drm-next + i915 CVE yolo merge > > On Thu, Nov 14, 2019 at 2:33 AM Dave Airlie wrote: > > The landing of the i915 CVE fixes into Linus tree has created a bit of > > a mess in linux-next and downstream in drm-next trees. > > > > I talked to Daniel and he had talked to Joonas a bit, and I decided to > > go with what Daniel describes as the YOLO merge, where I just solve it > > and pray, and everyone else verifies/fixes it. > > > > In my favour I've been reading these patches for a couple of months > > now and applied them to a lot of places, so I'm quite familiar with > > what they are doing. > > > > The worst culprit was the RC6 ctx corruption fix since the whole of > > rc6 got refactored in next. However I also had access to a version of > > this patch Jon wrote on drm-tip a couple of weeks ago. > > > > I took that patch, applied it and fixed it up on top of drm-next. Then > > I backmerged the commit that also went into Linus' tree. Then I > > removed any evidence of the RC6 patch from Linus' tree and left the > > newer version pieces in place. The other stuff mostly merged fine and > > the results looked fine, but I'd definitely think everyone at Intel > > should be staring at it, and my dinq tip resolutions ASAP and > > hopefully it goes into CI and comes out smelling of something good. > > Imre should look at the RC6 fix - He did all the hard work on that, including the rebases I sent to Dave. I was just a proxy :-) > > Let me know if it's all horrible asap, > > Add Martin and Arek for CI results. The yolo stuff landed in > CI_DRM_7340, did we break anything in there? From a quick look seems > all ok. Jon ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Replace hangcheck by heartbeats
> -Original Message- > From: Chris Wilson > Sent: Thursday, July 25, 2019 4:17 PM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Joonas Lahtinen > ; Ursulin, Tvrtko ; > Bloomfield, Jon > Subject: [PATCH] drm/i915: Replace hangcheck by heartbeats > > Replace sampling the engine state every so often with a periodic > heartbeat request to measure the health of an engine. This is coupled > with the forced-preemption to allow long running requests to survive so > long as they do not block other users. Can you explain why we would need this at all if we have forced-preemption? Forced preemption guarantees that an engine cannot interfere with the timely execution of other contexts. If it hangs, but nothing else wants to use the engine then do we care? Power, obviously. But I'm not sure everything can be pre-empted. If a compute context is running on an engine, and no other contexts require that engine, then is it right to nuke it just because it's busy in a long running thread? > > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen > Cc: Tvrtko Ursulin > Cc: Jon Bloomfield > --- > Note, strictly this still requires struct_mutex-free retirement for the > corner case where the idle-worker is ineffective and we get a backlog of > requests on the kernel ring. Or if the legacy ringbuffer is full... > When we are able to retire from outside of struct_mutex we can do the > full idle-barrier and idle-work from here. > --- > drivers/gpu/drm/i915/Kconfig.profile | 11 + > drivers/gpu/drm/i915/Makefile | 2 +- > drivers/gpu/drm/i915/gem/i915_gem_pm.c| 2 - > drivers/gpu/drm/i915/gt/intel_engine.h| 32 -- > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 10 +- > .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 99 + > .../gpu/drm/i915/gt/intel_engine_heartbeat.h | 17 + > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 5 +- > drivers/gpu/drm/i915/gt/intel_engine_types.h | 14 +- > drivers/gpu/drm/i915/gt/intel_gt.c| 1 - > drivers/gpu/drm/i915/gt/intel_gt.h| 4 - > drivers/gpu/drm/i915/gt/intel_gt_pm.c | 2 - > drivers/gpu/drm/i915/gt/intel_gt_types.h | 9 - > drivers/gpu/drm/i915/gt/intel_hangcheck.c | 360 -- > drivers/gpu/drm/i915/gt/intel_reset.c | 3 +- > drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 - > drivers/gpu/drm/i915/i915_debugfs.c | 86 - > drivers/gpu/drm/i915/i915_drv.c | 6 +- > drivers/gpu/drm/i915/i915_drv.h | 1 - > drivers/gpu/drm/i915/i915_gpu_error.c | 37 +- > drivers/gpu/drm/i915/i915_gpu_error.h | 2 - > drivers/gpu/drm/i915/i915_params.c| 5 - > drivers/gpu/drm/i915/i915_params.h| 1 - > 23 files changed, 151 insertions(+), 562 deletions(-) > create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c > create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h > delete mode 100644 drivers/gpu/drm/i915/gt/intel_hangcheck.c > > diff --git a/drivers/gpu/drm/i915/Kconfig.profile > b/drivers/gpu/drm/i915/Kconfig.profile > index 3184e8491333..aafb57f84169 100644 > --- a/drivers/gpu/drm/i915/Kconfig.profile > +++ b/drivers/gpu/drm/i915/Kconfig.profile > @@ -37,3 +37,14 @@ config DRM_I915_PREEMPT_TIMEOUT > to execute. > > May be 0 to disable the timeout. > + > +config DRM_I915_HEARTBEAT_INTERVAL > + int "Interval between heartbeat pulses (ms)" > + default 2500 # microseconds > + help > + While active the driver uses a periodic request, a heartbeat, to > + check the wellness of the GPU and to regularly flush state changes > + (idle barriers). > + > + May be 0 to disable heartbeats and therefore disable automatic GPU > + hang detection. 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index 524516251a40..18201852d68d 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -73,10 +73,10 @@ gt-y += \ > gt/intel_breadcrumbs.o \ > gt/intel_context.o \ > gt/intel_engine_cs.o \ > + gt/intel_engine_heartbeat.o \ > gt/intel_engine_pm.o \ > gt/intel_gt.o \ > gt/intel_gt_pm.o \ > - gt/intel_hangcheck.o \ > gt/intel_lrc.o \ > gt/intel_renderstate.o \ > gt/intel_reset.o \ > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c > b/drivers/gpu/drm/i915/gem/i915_gem_pm.c > index b5561cbdc5ea..a5a0aefcc04c 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c > @@ -170,8 +170,6 @@ void i915_gem_suspend(struct drm_i9
Re: [Intel-gfx] [PATCH] drm/i915: Replace hangcheck by heartbeats
> -Original Message- > From: Chris Wilson > Sent: Thursday, July 25, 2019 4:28 PM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Ursulin, Tvrtko > > Subject: RE: [PATCH] drm/i915: Replace hangcheck by heartbeats > > Quoting Bloomfield, Jon (2019-07-26 00:21:47) > > > -Original Message- > > > From: Chris Wilson > > > Sent: Thursday, July 25, 2019 4:17 PM > > > To: intel-gfx@lists.freedesktop.org > > > Cc: Chris Wilson ; Joonas Lahtinen > > > ; Ursulin, Tvrtko > ; > > > Bloomfield, Jon > > > Subject: [PATCH] drm/i915: Replace hangcheck by heartbeats > > > > > > Replace sampling the engine state every so often with a periodic > > > heartbeat request to measure the health of an engine. This is coupled > > > with the forced-preemption to allow long running requests to survive so > > > long as they do not block other users. > > > > Can you explain why we would need this at all if we have forced-preemption? > > Forced preemption guarantees that an engine cannot interfere with the > timely > > execution of other contexts. If it hangs, but nothing else wants to use the > engine > > then do we care? > > We may not have something else waiting to use the engine, but we may > have users waiting for the response where we need to detect the GPU hang > to prevent an infinite wait / stuck processes and infinite power drain. I'm not sure I buy that logic. Being able to pre-empt doesn't imply it will ever end. As written a context can sit forever, apparently making progress but never actually returning a response to the user. If the user isn't happy with the progress they will kill the process. So we haven't solved the user responsiveness here. All we've done is eliminated the potential to run one class of otherwise valid workload. Same argument goes for power. Just because it yields when other contexts want to run doesn't mean it won't consume lots of power indefinitely. I can equally write a CPU program to burn lots of power, forever, and it won't get nuked. TDR made sense when it was the only way to ensure contexts could always make forward progress. But force-preemption does everything we need to ensure that as far as I can tell. > > There is also the secondary task of flushing idle barriers. > > > Power, obviously. But I'm not everything can be pre-empted. If a compute > > context is running on an engine, and no other contexts require that engine, > > then is it right to nuke it just because it's busy in a long running > > thread? > > Yes. Unless you ask that we implement GPU-isolation where not even the > kernel is allowed to use a particular set of engines. > -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Replace hangcheck by heartbeats
> -Original Message- > From: Chris Wilson > Sent: Thursday, July 25, 2019 4:52 PM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Ursulin, Tvrtko > > Subject: RE: [PATCH] drm/i915: Replace hangcheck by heartbeats > > Quoting Bloomfield, Jon (2019-07-26 00:41:49) > > > -Original Message- > > > From: Chris Wilson > > > Sent: Thursday, July 25, 2019 4:28 PM > > > To: Bloomfield, Jon ; intel- > > > g...@lists.freedesktop.org > > > Cc: Joonas Lahtinen ; Ursulin, Tvrtko > > > > > > Subject: RE: [PATCH] drm/i915: Replace hangcheck by heartbeats > > > > > > Quoting Bloomfield, Jon (2019-07-26 00:21:47) > > > > > -Original Message- > > > > > From: Chris Wilson > > > > > Sent: Thursday, July 25, 2019 4:17 PM > > > > > To: intel-gfx@lists.freedesktop.org > > > > > Cc: Chris Wilson ; Joonas Lahtinen > > > > > ; Ursulin, Tvrtko > > > ; > > > > > Bloomfield, Jon > > > > > Subject: [PATCH] drm/i915: Replace hangcheck by heartbeats > > > > > > > > > > Replace sampling the engine state every so often with a periodic > > > > > heartbeat request to measure the health of an engine. This is coupled > > > > > with the forced-preemption to allow long running requests to survive > > > > > so > > > > > long as they do not block other users. > > > > > > > > Can you explain why we would need this at all if we have forced- > preemption? > > > > Forced preemption guarantees that an engine cannot interfere with the > > > timely > > > > execution of other contexts. If it hangs, but nothing else wants to use > > > > the > > > engine > > > > then do we care? > > > > > > We may not have something else waiting to use the engine, but we may > > > have users waiting for the response where we need to detect the GPU hang > > > to prevent an infinite wait / stuck processes and infinite power drain. > > > > I'm not sure I buy that logic. Being able to pre-empt doesn't imply it will > > ever end. As written a context can sit forever, apparently making progress > > but never actually returning a response to the user. If the user isn't happy > > with the progress they will kill the process. So we haven't solved the > > user responsiveness here. All we've done is eliminated the potential to > > run one class of otherwise valid workload. > > Indeed, one of the conditions I have in mind for endless is rlimits. The > user + admin should be able to specify that a context not exceed so much > runtime, and if we ever get a scheduler, we can write that as a budget > (along with deadlines). Agreed, an rlimit solution would work better, if there was an option for 'unlimited'. The specific reason I poked was to try to find a solution to the long-running compute workload scenarios. In those cases there is no fwd progress on the ring/BB, but the kernel will still be making fwd progress. The challenge here is that it's not easy for compiler to guarantee to fold a user kernel into something that fit any specific time boundary. It depends on the user kernel structure. A thread could take several seconds (or possibly minutes) to complete. That's not automatically bad. We need a solution to that specific problem, and while I agree that it would be nice to detect errant contexts and kick them out, that relies on an accurate classification of 'errant' and 'safe'. The response needs to be proportional to the problem. > > > Same argument goes for power. Just because it yields when other contexts > > want to run doesn't mean it won't consume lots of power indefinitely. 
I can > > equally write a CPU program to burn lots of power, forever, and it won't get > > nuked. > > I agree, and continue to dislike letting hogs have free reign. But even with this change a BB hog can still have free reign to consume power, as long as it pauses it's unsavoury habit whenever the heartbeat request comes snooping on the engine. What we're effectively saying with this is that it's ok for a context to spin in a BB, but not ok to spin in an EU kernel. Why would we differentiate when both can burn the same power? So we haven't solved this problem, but continue to victimize legitimate code. > > > TDR made sense when it was the only way to ensure contexts could always > > make forward progress. But force-preemption does everything we need to >
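To make the mechanism being argued over concrete, here is a minimal sketch of the heartbeat idea - an illustration only, not the i915 code. The helpers submit_pulse(), pulse_completed() and reset_engine(), and the engine/pulse types, are assumptions standing in for driver internals; the 2500ms interval matches the default proposed later in this series.

	/*
	 * Hedged sketch: a periodic, low-priority request is sent down the
	 * engine; if the previous pulse has still not run by the next tick,
	 * whatever occupies the engine refused preemption and the engine is
	 * declared hung.
	 */
	#include <linux/kernel.h>
	#include <linux/types.h>
	#include <linux/workqueue.h>
	#include <linux/jiffies.h>

	#define HEARTBEAT_INTERVAL_MS 2500

	struct engine;
	struct pulse;
	/* Assumed helpers -- stand-ins for the real submission machinery: */
	extern struct pulse *submit_pulse(struct engine *e);
	extern bool pulse_completed(const struct pulse *p);
	extern void reset_engine(struct engine *e);

	struct engine_heartbeat {
		struct delayed_work work;
		struct pulse *systole;	/* the last heartbeat request, if any */
		struct engine *engine;
	};

	static void heartbeat_tick(struct work_struct *wrk)
	{
		struct engine_heartbeat *hb =
			container_of(wrk, struct engine_heartbeat, work.work);

		if (hb->systole && !pulse_completed(hb->systole)) {
			/* Denied execution for a whole interval: engine is hung. */
			reset_engine(hb->engine);
			hb->systole = NULL;
		} else {
			/* Healthy (or idle): send the next pulse. */
			hb->systole = submit_pulse(hb->engine);
		}

		schedule_delayed_work(&hb->work,
				      msecs_to_jiffies(HEARTBEAT_INTERVAL_MS));
	}

	static void heartbeat_start(struct engine_heartbeat *hb, struct engine *e)
	{
		hb->engine = e;
		hb->systole = NULL;
		INIT_DELAYED_WORK(&hb->work, heartbeat_tick);
		schedule_delayed_work(&hb->work,
				      msecs_to_jiffies(HEARTBEAT_INTERVAL_MS));
	}

The determinism Jon concedes later in the thread falls out of this structure: the pulse is injected at a fixed cadence whether or not any other context wants the engine.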
Re: [Intel-gfx] [PATCH] drm/i915: Replace hangcheck by heartbeats
> -Original Message- > From: Chris Wilson > Sent: Friday, July 26, 2019 1:11 PM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Ursulin, Tvrtko > > Subject: RE: [PATCH] drm/i915: Replace hangcheck by heartbeats > > Quoting Bloomfield, Jon (2019-07-26 16:00:06) > > > -Original Message- > > > From: Chris Wilson > > > Sent: Thursday, July 25, 2019 4:52 PM > > > To: Bloomfield, Jon ; intel- > > > g...@lists.freedesktop.org > > > Cc: Joonas Lahtinen ; Ursulin, Tvrtko > > > > > > Subject: RE: [PATCH] drm/i915: Replace hangcheck by heartbeats > > > > > > Quoting Bloomfield, Jon (2019-07-26 00:41:49) > > > > > -Original Message- > > > > > From: Chris Wilson > > > > > Sent: Thursday, July 25, 2019 4:28 PM > > > > > To: Bloomfield, Jon ; intel- > > > > > g...@lists.freedesktop.org > > > > > Cc: Joonas Lahtinen ; Ursulin, Tvrtko > > > > > > > > > > Subject: RE: [PATCH] drm/i915: Replace hangcheck by heartbeats > > > > > > > > > > Quoting Bloomfield, Jon (2019-07-26 00:21:47) > > > > > > > -Original Message- > > > > > > > From: Chris Wilson > > > > > > > Sent: Thursday, July 25, 2019 4:17 PM > > > > > > > To: intel-gfx@lists.freedesktop.org > > > > > > > Cc: Chris Wilson ; Joonas Lahtinen > > > > > > > ; Ursulin, Tvrtko > > > > > ; > > > > > > > Bloomfield, Jon > > > > > > > Subject: [PATCH] drm/i915: Replace hangcheck by heartbeats > > > > > > > > > > > > > > Replace sampling the engine state every so often with a periodic > > > > > > > heartbeat request to measure the health of an engine. This is > coupled > > > > > > > with the forced-preemption to allow long running requests to > survive so > > > > > > > long as they do not block other users. > > > > > > > > > > > > Can you explain why we would need this at all if we have forced- > > > preemption? > > > > > > Forced preemption guarantees that an engine cannot interfere with > the > > > > > timely > > > > > > execution of other contexts. If it hangs, but nothing else wants to > > > > > > use > the > > > > > engine > > > > > > then do we care? > > > > > > > > > > We may not have something else waiting to use the engine, but we may > > > > > have users waiting for the response where we need to detect the GPU > hang > > > > > to prevent an infinite wait / stuck processes and infinite power > > > > > drain. > > > > > > > > I'm not sure I buy that logic. Being able to pre-empt doesn't imply it > > > > will > > > > ever end. As written a context can sit forever, apparently making > > > > progress > > > > but never actually returning a response to the user. If the user isn't > > > > happy > > > > with the progress they will kill the process. So we haven't solved the > > > > user responsiveness here. All we've done is eliminated the potential to > > > > run one class of otherwise valid workload. > > > > > > Indeed, one of the conditions I have in mind for endless is rlimits. The > > > user + admin should be able to specify that a context not exceed so much > > > runtime, and if we ever get a scheduler, we can write that as a budget > > > (along with deadlines). > > > > Agreed, an rlimit solution would work better, if there was an option for > 'unlimited'. > > > > The specific reason I poked was to try to find a solution to the > > long-running compute workload scenarios. In those cases there is no fwd > > progress on the ring/BB, but the kernel will still be making fwd progress. 
> > The challenge here is that it's not easy for the compiler to guarantee to fold a > > user kernel into something that fits any specific time boundary. It depends on the > > user kernel structure. A thread could take several seconds (or possibly > > minutes) to complete. That's not automatically bad. > > > > We need a solution to that specific problem, and while I agree that it would > be nice to detect errant contexts and kick them out, that relies on an accurate > classification of 'errant' and 'safe'.
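The rlimit idea both sides favour could be expressed roughly as below - a sketch only, with the context type, the busy-time accounting and ban_context() all assumed for illustration; a budget of zero stands for the 'unlimited' option Jon asks for.

	#include <linux/types.h>

	struct gpu_context;
	/* Assumed helpers for illustration: */
	extern u64 context_runtime_ns(const struct gpu_context *ctx);
	extern void ban_context(struct gpu_context *ctx);

	struct context_budget {
		u64 budget_ns;	/* 0 == unlimited, for trusted compute contexts */
	};

	/* Called from periodic housekeeping, e.g. the heartbeat tick. */
	static void check_budget(struct gpu_context *ctx,
				 const struct context_budget *rl)
	{
		if (!rl->budget_ns)
			return;		/* admin granted unlimited runtime */

		if (context_runtime_ns(ctx) > rl->budget_ns)
			ban_context(ctx);	/* rlimit exceeded: evict from the GPU */
	}

This cleanly separates "runs for a long time" (a policy question, answered per context by the admin) from "refuses to yield" (a hang, answered by preemption).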
Re: [Intel-gfx] [PATCH] drm/i915: Replace hangcheck by heartbeats
Hmmn. We're still on orthogonal perspectives as far as our previous arguments stand. But it doesn't matter because while thinking through your replies, I realized there is one argument in favour, which trumps all my previous arguments against this patch - it makes things deterministic. Without this patch (or hangcheck), whether a context gets nuked depends on what else is running. And that's a recipe for confused support emails. So I retract my other arguments, thanks for staying with me :-) BTW: TDR==Timeout-Detection and Reset. Essentially hangcheck and recovery. > -Original Message- > From: Chris Wilson > Sent: Friday, July 26, 2019 2:30 PM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Ursulin, Tvrtko > > Subject: RE: [PATCH] drm/i915: Replace hangcheck by heartbeats > > Quoting Bloomfield, Jon (2019-07-26 21:58:38) > > > From: Chris Wilson > > > It's no more often than before, you have to fail to advance within an > > > interval, and then deny preemption request. > > > > It's entrapment. You are creating an artificial workload for the context to > impede. Before that artificial workload was injected, the context would have > run to completion, the world would be at peace (and the user would have her > new bitcoin). Instead she stays poor because the DoS police launched an > illegal > sting on her otherwise genius operation. > > It's housekeeping; it's the cost of keeping the engine alive. This > argument is why CPU-isolation came about (aiui). > > > > > I do like the rlimit suggestion, but until we have that, just disabling > > > > TDR > feels > > > like a better stop-gap than nuking innocent compute contexts just in case > they > > > might block something. > > > > > > They're not innocent though :-p > > > > They are innocent until proven guilty :-) > > > > > > > > Disabling hangcheck (no idea why you confuse that with the recovery > > > procedure) makes igt unhappy, but they are able to do that today with > > > the modparam. This patch makes it easier to move that to an engine > > > parameter, but to disable it entirely you still need to be able to reset > > > the GPU on demand (suspend, eviction, oom). Without hangcheck we need > to > > > evaluate all MAX_SCHEDULE_TIMEOUT waits and supply a reset-on- > timeout > > > along critical paths. > > > -Chris > > > > I don't think I'm confusing hang-check with the recovery. I've talked about > TDR, which to me is a periodic hangcheck, combined with a recovery by engine > reset. I don't argue against being able to reset, just against the blunt > classification that hangcheck itself provides. > > > > TDR was originally looking for regressive workloads that were not making > forward progress to protect against DoS. But it was always a very blunt tool. > It's > never been able to differentiate long running, but otherwise valid, compute > workloads from genuine BB hangs, but that was fine at the time, and as you say > users could always switch the modparam. > > To be honest, I still have no idea what TDR is. But I take it that you > agree that we're only talking about hangcheck :) What I think you are > missing out on is that we have some more or less essential (itself > depending on client behaviour) housekeeping that goes along side it. > > My claim is that without a guarantee of isolation, anything else that > wants to use that engine will need the background housekeeping. 
(And if > we don't go as far as performing complete isolation, I expect userspace > is going to need the kernel to clean up as they go along, as they are > unlikely to be prepared to do the system maintenance themselves.) > > Now that we have more emphasis on compute, we need a solution that doesn't > involve a modparam. This was specifically requested by the compute team - > they know that they can flip the tdr switch, but that means their workload > will > only run if the user modifies the system. That's hardly ideal. > > It means they can adjust things to their admins' hearts' content, and > it's a udev rule away from setting permissions to allow the compute group > to freely reconfigure the settings. > > > Without the rlimit concept I don't think we can prevent power hogs > whatever we do, any more than the core kernel can prevent CPU power hogs. > So, if we can prevent a workload from blocking other contexts, then it is > unhelpful to continue either with the blunt tool that TDR is, or the similarly > blunt heartbeat. If
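Forced preemption, the other half of the proposal, amounts to arming a timer whenever a high-priority request (such as the heartbeat pulse) asks for the engine. A sketch under the same caveats - reset_engine() is an assumed stand-in, and a real driver would defer the reset out of the hardirq timer callback rather than doing it inline:

	#include <linux/kernel.h>
	#include <linux/hrtimer.h>
	#include <linux/ktime.h>

	struct engine;
	extern void reset_engine(struct engine *e);	/* assumed helper */

	struct preempt_guard {
		struct hrtimer timer;
		struct engine *engine;
	};

	static enum hrtimer_restart preempt_expired(struct hrtimer *t)
	{
		struct preempt_guard *pg =
			container_of(t, struct preempt_guard, timer);

		/* The running context ignored the preemption request: shoot it.
		 * (Simplified; the reset would normally be kicked to a worker.) */
		reset_engine(pg->engine);
		return HRTIMER_NORESTART;
	}

	/* Arm when a preemption request is submitted; cancel when it lands. */
	static void arm_preempt_guard(struct preempt_guard *pg,
				      unsigned int timeout_ms)
	{
		hrtimer_init(&pg->timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
		pg->timer.function = preempt_expired;
		hrtimer_start(&pg->timer, ms_to_ktime(timeout_ms),
			      HRTIMER_MODE_REL);
	}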
Re: [Intel-gfx] [PATCH] drm/i915: Replace hangcheck by heartbeats
> -Original Message- > From: Joonas Lahtinen > Sent: Monday, July 29, 2019 5:50 AM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org; Chris Wilson > Cc: Ursulin, Tvrtko > Subject: RE: [PATCH] drm/i915: Replace hangcheck by heartbeats > > Quoting Chris Wilson (2019-07-29 12:45:52) > > Quoting Joonas Lahtinen (2019-07-29 10:26:47) > > > Ok, so just confirming here. The plan is still to have userspace set a > > > per context (or per request) time limit for expected completion of a > > > request. This will be useful for the media workloads that consume > > > deterministic amount of time for correct bitstream. And the userspace > > > wants to be notified much quicker than the generic hangcheck time if > > > the operation failed due to corrupt bitstream. > > > > > > This time limit can be set to infinite by compute workloads. > > > > That only provides a cap on the context itself. We need to make sure that proposals such as the above are compatible with GuC. The nice thing about the heartbeat is that it relies on a more or less standard request/context and so should be compatible with any back end. > > Yes. > > > We also have the > > criteria that is something else has been selected to run on the GPU, you > > have to allow preemption within a certain period or else you will be > > shot. > > This is what I meant ... > > > > Then, in parallel to that, we have cgroups or system wide configuration > > > for maximum allowed timeslice per process/context. That means that a > > > long-running workload must pre-empt at that granularity. > > ... with this. > > > Not quite. It must preempt within a few ms of being asked, that is a > > different problem to the timeslice granularity (which is when we ask it > > to switch, or if not due to a high priority request earlier). It's a QoS > > issue for the other context. Setting that timeout is hard, we can allow > > a context to select its own timeout, or define it via sysfs/cgroups, but > > because it depends on the running context, it causes another context to > > fail in non-trivial ways. The GPU is simply not as preemptible as one > > would like. > > Right, I was only thinking about the pre-emption delay, maybe I chose my > words wrong. Basically what the admin wants to control is exactly what > you wrote, how long it can take from pre-emption request to completion. > This is probably useful as a CONTEXT_GETPARAM for userspace to consider. > They might decide how many loops to run without MI_ARB_CHECK in > non-pre-emptible sections. Dunno. > > That parameter configures the QoS level of the system, how fast a high > priority requests gets to run on the hardware. > > > Fwiw, I was thinking the next step would be to put per-engine controls > > in sysfs, then cross the cgroups bridge. I'm not sure my previous plan > > of exposing per-context parameters for timeslice/preemption is that > > usable. > > The prefered frequency of how often a context would like to be scheduled > on the hardware, makes sense as context setparam. Compute workloads are > probably indifferent and something related to media most likely wants to > run at some multiple of video FPS. > > I guess the userspace could really only request being run more > frequently than the default, and in exchange it would receive less > execution time per each slice. We probably want to control the upper > bound of the frequency. 
> > > > That pre-emption/hearbeat should happen regardless if others contexts are > > > requesting the hardware or not, because better start recovery of a hung > > > task as soon as it misbehaves. > > > > I concur, but Jon would like the opposite to allow for uncooperative > > compute kernels that simply block preemption forever. I think for the I wasn't asking that :-) What I was originally asking for is to allow a compute workload to run forever IF no other contexts need to run. Don't launch a pre-emptive strike, only kill it if it actually blocks a real workload. But then I realized that this is bad for deterministic behaviour. So retracted the ask. Using the heartbeat to test the workload for pre-emptability is a good solution because it ensures a workload always fails quickly, or never fails. > > extreme Jon wants, something like CPU-isolation fits better, where the > > important client owns an engine all to itself and the kernel is not even > > allowed to do housekeeping on that engine. (We would turn off time- > > slicing, preemption timers, etc on that engine and basically run it in > > submission order.) Agreed, isolation is really the only way we could permit a workload to hog an engine indefinitely. This would be beneficial to some of the RTOS use cases in particular. > > Makes sense to me. > > Regards, Joonas ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
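Joonas's suggestion of surfacing the effective preemption delay as a CONTEXT_GETPARAM would look something like this from userspace. The struct and ioctl are the existing i915 uapi; the param name and value below are hypothetical, since no such parameter existed at the time of this thread:

	#include <stdint.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <drm/i915_drm.h>

	/* Hypothetical: a stand-in for whatever uapi name such a query would get. */
	#define I915_CONTEXT_PARAM_PREEMPT_TIMEOUT_MS 0x10	/* assumed value */

	static int query_preempt_timeout(int fd, uint32_t ctx_id,
					 uint64_t *timeout_ms)
	{
		struct drm_i915_gem_context_param p;

		memset(&p, 0, sizeof(p));
		p.ctx_id = ctx_id;
		p.param = I915_CONTEXT_PARAM_PREEMPT_TIMEOUT_MS;

		if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &p))
			return -1;

		/* e.g. a compute runtime could bound its non-preemptible
		 * (no MI_ARB_CHECK) sections to stay under this value. */
		*timeout_ms = p.value;
		return 0;
	}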
Re: [Intel-gfx] [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Tuesday, August 6, 2019 6:47 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Joonas Lahtinen > ; Winiarski, Michal > ; Bloomfield, Jon > Subject: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close > > Normally, we rely on our hangcheck to prevent persistent batches from > hogging the GPU. However, if the user disables hangcheck, this mechanism > breaks down. Despite our insistence that this is unsafe, the users are > equally insistent that they want to use endless batches and will disable > the hangcheck mechanism. We are looking are perhaps replacing hangcheck > with a softer mechanism, that sends a pulse down the engine to check if > it is well. We can use the same preemptive pulse to flush an active > persistent context off the GPU upon context close, preventing resources > being lost and unkillable requests remaining on the GPU, after process > termination. To avoid changing the ABI and accidentally breaking > existing userspace, we make the persistence of a context explicit and > enable it by default. Userspace can opt out of persistent mode (forcing > requests to be cancelled when the context is closed by process > termination or explicitly) by a context parameter, or to facilitate > existing use-cases by disabling hangcheck (i915.enable_hangcheck=0). > (Note, one of the outcomes for supporting endless mode will be the > removal of hangchecking, at which point opting into persistent mode will > be mandatory, or maybe the default.) > > Testcase: igt/gem_ctx_persistence > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen > Cc: Michał Winiarski > Cc: Jon Bloomfield > > --- > Same sort of caveats as for hangcheck, a few corner cases need > struct_mutex and some preallocation. 
> --- > drivers/gpu/drm/i915/Makefile | 3 +- > drivers/gpu/drm/i915/gem/i915_gem_context.c | 79 +++ > drivers/gpu/drm/i915/gem/i915_gem_context.h | 15 > .../gpu/drm/i915/gem/i915_gem_context_types.h | 1 + > .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 53 + > .../gpu/drm/i915/gt/intel_engine_heartbeat.h | 14 > include/uapi/drm/i915_drm.h | 15 > 7 files changed, 179 insertions(+), 1 deletion(-) > create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c > create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h > > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index a1016858d014..42417d87496e 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -71,9 +71,10 @@ obj-y += gt/ > gt-y += \ > gt/intel_breadcrumbs.o \ > gt/intel_context.o \ > - gt/intel_engine_pool.o \ > gt/intel_engine_cs.o \ > + gt/intel_engine_heartbeat.o \ > gt/intel_engine_pm.o \ > + gt/intel_engine_pool.o \ > gt/intel_gt.o \ > gt/intel_gt_pm.o \ > gt/intel_hangcheck.o \ > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > index 64f7a533e886..5718b74f95b7 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > @@ -70,6 +70,7 @@ > #include > > #include "gt/intel_lrc_reg.h" > +#include "gt/intel_engine_heartbeat.h" > > #include "i915_gem_context.h" > #include "i915_globals.h" > @@ -373,6 +374,45 @@ void i915_gem_context_release(struct kref *ref) > queue_work(i915->wq, &i915->contexts.free_work); > } > > +static void kill_context(struct i915_gem_context *ctx) > +{ > + struct i915_gem_engines_iter it; > + struct intel_engine_cs *engine; > + intel_engine_mask_t tmp, active; > + struct intel_context *ce; > + > + if (i915_gem_context_is_banned(ctx)) > + return; > + > + i915_gem_context_set_banned(ctx); > + > + active = 0; > + for_each_gem_engine(ce, i915_gem_context_lock_engines(ctx), it) { > + struct i915_request *rq; > + > + if (!ce->ring) > + continue; > + > + rq = i915_active_request_get_unlocked(&ce->ring->timeline- > >last_request); > + if (!rq) > + continue; > + > + active |= rq->engine->mask; > + i915_request_put(rq); > + } > + i915_gem_context_unlock_engines(ctx); > + > + /* > + * Send a "high priority pulse" down the engine to cause the > + * current request to be momentarily preempted. (If it fails to > +
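The diff above is cut off mid-function; in outline, and strictly as a sketch of the intent described in the commit message rather than the code itself, the close path does roughly the following (send_pulse() and reset_engines() stand in for the heartbeat and reset machinery):

	#include <linux/types.h>

	struct gem_context;
	/* Assumed stand-ins for illustration: */
	extern bool context_is_banned(const struct gem_context *ctx);
	extern void context_set_banned(struct gem_context *ctx);
	extern unsigned long context_active_engines(struct gem_context *ctx);
	extern int send_pulse(unsigned long engine_mask);
	extern void reset_engines(unsigned long engine_mask);

	static void kill_context_sketch(struct gem_context *ctx)
	{
		unsigned long active;

		/* A banned context was already evicted by the reset path. */
		if (context_is_banned(ctx))
			return;
		context_set_banned(ctx);	/* no further execution allowed */

		/* Which engines still carry requests from this context? */
		active = context_active_engines(ctx);
		if (!active)
			return;

		/* Preempt the remaining requests with a high-priority pulse; if
		 * the pulse cannot be delivered, fall back to resetting those
		 * engines so nothing unkillable survives process termination. */
		if (send_pulse(active))
			reset_engines(active);
	}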
Re: [Intel-gfx] [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Wednesday, August 7, 2019 6:23 AM > To: intel-gfx@lists.freedesktop.org > Cc: Joonas Lahtinen ; Winiarski, Michal > ; Bloomfield, Jon > Subject: Re: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close > > Quoting Chris Wilson (2019-08-06 14:47:25) > > @@ -433,6 +482,8 @@ __create_context(struct drm_i915_private *i915) > > > > i915_gem_context_set_bannable(ctx); > > i915_gem_context_set_recoverable(ctx); > > + if (i915_modparams.enable_hangcheck) > > + i915_gem_context_set_persistence(ctx); > > I am not fond of this, but from a pragmatic point of view, this does > prevent the issue Jon raised: HW resources being pinned indefinitely > past process termination. > > I don't like it because we cannot perform the operation cleanly > everywhere, it requires preemption first and foremost (with a cooperating > submission backend) and per-engine reset. The alternative is to try and > do a full GPU reset if the context is still active. For the sake of the > issue raised, I think that (full reset on older HW) is required. > > That we are baking in a change of ABI due to an unsafe modparam is meh. > There are a few more corner cases to deal with before endless just > works. But since it is being used in the wild, I'm not sure we can wait > for userspace to opt-in, or wait for cgroups. However, since users are > being encouraged to disable hangcheck, should we extend the concept of > persistence to also mean "survives hangcheck"? -- though it should be a > separate parameter, and I'm not sure how exactly to protect it from the > heartbeat reset without giving gross privileges to the context. (CPU > isolation is nicer from the pov where we can just give up and not even > worry about the engine. If userspace can request isolation, it has the > privilege to screw up.) > -Chris Ok, so your concern is supporting non-persistence on older, non-preempting, non-engine-reset-capable hardware. Why is that strictly required? Can't we simply make it dependent on the features needed to do it well, and if your hardware cannot, then the advice is not to disable hangcheck? I'm doubtful that anyone would attempt an HPC-type workload on an IVB. I'm not sure I understand your "survives hangcheck" idea. You mean instead of simply disabling hangcheck, just opt in to persistence and have that also prevent hangcheck? Isn't that the wrong way around, since persistence is what is happening today? ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Wednesday, August 7, 2019 7:14 AM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Winiarski, Michal > > Subject: RE: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close > > Quoting Bloomfield, Jon (2019-08-07 15:04:16) > > Ok, so your concern is supporting non-persistence on older non-preempting, > engine-reset capable, hardware. Why is that strictly required? Can't we simply > make it dependent on the features needed to do it well, and if your hardware > cannot, then the advice is not to disable hangcheck? I'm doubtful that anyone > would attempt a HPC type workload on n IVB. > > Our advice was not to disable hangcheck :) > > With the cat out of the bag, my concern is dotting the Is and crossing > the Ts. Fixing up the error handling path to the reset isn't all that > bad. But I'm not going to advertise the persistence-parameter support > unless we have a clean solution, and we can advise that compute > workloads are better handled with new hardware. > > > I'm not sure I understand your "survives hangcheck" idea. You mean instead > of simply disabling hangcheck, just opt in to persistence and have that also > prevent hangcheck? Isn't that the wrong way around, since persistence is what > is happening today? > > Persistence is the clear-and-present danger. I'm just trying to sketch a > path for endless support, trying to ask myself questions such as: Is the > persistence parameter still required? What other parameters make sense? > Does anything less than CPU-esque isolation make sense? :) > -Chris I personally liked your persistence idea :-) Isolation doesn't really solve the problem in this case. So, customer enables isolation for a HPC workload. That workload hangs, and user hits ctrl-C. We still have a hung workload and the next job in the queue still can't run. Also, Isolation is kind of meaningless when there is only one engine that's capable of running your workload. On Gen9, pretty much every type of workload requires some RCS involvement, and RCS is where the compute workloads need to run. So isolation hasn't helped us. I'd settle for umd opting in to non-persistence and not providing the automatic association with hangcheck. That at least ensures well behaved umd's don't kill the system. We didn't explore the idea of terminating orphaned contexts though (where none of their resources are referenced by any other contexts). Is there a reason why this is not feasible? In the case of compute (certainly HPC) workloads, there would be no compositor taking the output so this might be a solution. Jon ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Wednesday, August 7, 2019 8:08 AM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Winiarski, Michal > > Subject: RE: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close > > Quoting Bloomfield, Jon (2019-08-07 15:33:51) > [skip to end] > > We didn't explore the idea of terminating orphaned contexts though > (where none of their resources are referenced by any other contexts). Is > there a reason why this is not feasible? In the case of compute (certainly > HPC) workloads, there would be no compositor taking the output so this > might be a solution. > > Sounds easier said than done. We have to go through each request and > determine it if has an external reference (or if the object holding the > reference has an external reference) to see if the output would be > visible to a third party. Sounds like a conservative GC :| > (Coming to that conclusion suggests that we should structure the request > tracking to make reparenting easier.) > > We could take a pid-1 approach and move all the orphan timelines over to > a new parent purely responsible for them. That honestly doesn't seem to > achieve anything. (We are still stuck with tasks on the GPU and no way > to kill them.) > > In comparison, persistence is a rarely used "feature" and cleaning up on > context close fits in nicely with the process model. It just works as > most users/clients would expect. (Although running in non-persistent > by default hasn't show anything to explode on the desktop, it's too easy > to construct scenarios where persistence turns out to be an advantage, > particularly with chains of clients (the compositor model).) Between the > two modes, we should have most bases covered, it's hard to argue for a > third way (that is until someone has a usecase!) > -Chris Ok, makes sense. Thanks. But have we converged on a decision :-) As I said, requiring compute umd optin should be ok for the immediate HPC issue, but I'd personally argue that it's valid to change the contract for hangcheck=0 and switch the default to non-persistent. Jon ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Wednesday, August 7, 2019 9:51 AM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Winiarski, Michal > > Subject: RE: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close > > Quoting Bloomfield, Jon (2019-08-07 16:29:55) > > > -Original Message- > > > From: Chris Wilson > > > Sent: Wednesday, August 7, 2019 8:08 AM > > > To: Bloomfield, Jon ; intel- > > > g...@lists.freedesktop.org > > > Cc: Joonas Lahtinen ; Winiarski, Michal > > > > > > Subject: RE: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close > > > > > > Quoting Bloomfield, Jon (2019-08-07 15:33:51) > > > [skip to end] > > > > We didn't explore the idea of terminating orphaned contexts though > > > (where none of their resources are referenced by any other contexts). Is > > > there a reason why this is not feasible? In the case of compute (certainly > > > HPC) workloads, there would be no compositor taking the output so this > > > might be a solution. > > > > > > Sounds easier said than done. We have to go through each request and > > > determine it if has an external reference (or if the object holding the > > > reference has an external reference) to see if the output would be > > > visible to a third party. Sounds like a conservative GC :| > > > (Coming to that conclusion suggests that we should structure the request > > > tracking to make reparenting easier.) > > > > > > We could take a pid-1 approach and move all the orphan timelines over to > > > a new parent purely responsible for them. That honestly doesn't seem to > > > achieve anything. (We are still stuck with tasks on the GPU and no way > > > to kill them.) > > > > > > In comparison, persistence is a rarely used "feature" and cleaning up on > > > context close fits in nicely with the process model. It just works as > > > most users/clients would expect. (Although running in non-persistent > > > by default hasn't show anything to explode on the desktop, it's too easy > > > to construct scenarios where persistence turns out to be an advantage, > > > particularly with chains of clients (the compositor model).) Between the > > > two modes, we should have most bases covered, it's hard to argue for a > > > third way (that is until someone has a usecase!) > > > -Chris > > > > Ok, makes sense. Thanks. > > > > But have we converged on a decision :-) > > > > As I said, requiring compute umd optin should be ok for the immediate HPC > issue, but I'd personally argue that it's valid to change the contract for > hangcheck=0 and switch the default to non-persistent. 
> > Could you tender > > diff --git a/runtime/os_interface/linux/drm_neo.cpp > b/runtime/os_interface/linux/drm_neo.cpp > index 31deb68b..8a9af363 100644 > --- a/runtime/os_interface/linux/drm_neo.cpp > +++ b/runtime/os_interface/linux/drm_neo.cpp > @@ -141,11 +141,22 @@ void Drm::setLowPriorityContextParam(uint32_t > drmContextId) { > UNRECOVERABLE_IF(retVal != 0); > } > > +void setNonPersistent(uint32_t drmContextId) { > +drm_i915_gem_context_param gcp = {}; > +gcp.ctx_id = drmContextId; > +gcp.param = 0xb; /* I915_CONTEXT_PARAM_PERSISTENCE; */ > + > +ioctl(DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &gcp); > +} > + > uint32_t Drm::createDrmContext() { > drm_i915_gem_context_create gcc = {}; > auto retVal = ioctl(DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &gcc); > UNRECOVERABLE_IF(retVal != 0); > > +/* enable cleanup of resources on process termination */ > +setNonPersistent(gcc.ctx_id); > + > return gcc.ctx_id; > } > > to interested parties? > -Chris Yes, that's exactly what I had in mind. I think it's enough to resolve the HPC challenges. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
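Outside of NEO's wrappers, the same opt-in is a few lines for any userspace driver. A self-contained sketch using the value 0xb from the diff above (semantics per the commit message: 0 cancels the context's requests on close, 1 keeps today's behaviour):

	#include <stdint.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <drm/i915_drm.h>

	#ifndef I915_CONTEXT_PARAM_PERSISTENCE
	#define I915_CONTEXT_PARAM_PERSISTENCE 0xb	/* value from the diff above */
	#endif

	/* Ask the kernel to cancel this context's requests when it is closed. */
	static int set_non_persistent(int fd, uint32_t ctx_id)
	{
		struct drm_i915_gem_context_param p;

		memset(&p, 0, sizeof(p));
		p.ctx_id = ctx_id;
		p.param = I915_CONTEXT_PARAM_PERSISTENCE;
		p.value = 0;

		return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
	}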
Re: [Intel-gfx] [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Friday, August 9, 2019 4:35 PM > To: intel-gfx@lists.freedesktop.org > Cc: Joonas Lahtinen ; Winiarski, Michal > ; Bloomfield, Jon > Subject: Re: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close > > Quoting Chris Wilson (2019-08-06 14:47:25) > > Normally, we rely on our hangcheck to prevent persistent batches from > > hogging the GPU. However, if the user disables hangcheck, this mechanism > > breaks down. Despite our insistence that this is unsafe, the users are > > equally insistent that they want to use endless batches and will disable > > the hangcheck mechanism. We are looking are perhaps replacing hangcheck > > with a softer mechanism, that sends a pulse down the engine to check if > > it is well. We can use the same preemptive pulse to flush an active > > persistent context off the GPU upon context close, preventing resources > > being lost and unkillable requests remaining on the GPU, after process > > termination. To avoid changing the ABI and accidentally breaking > > existing userspace, we make the persistence of a context explicit and > > enable it by default. Userspace can opt out of persistent mode (forcing > > requests to be cancelled when the context is closed by process > > termination or explicitly) by a context parameter, or to facilitate > > existing use-cases by disabling hangcheck (i915.enable_hangcheck=0). > > (Note, one of the outcomes for supporting endless mode will be the > > removal of hangchecking, at which point opting into persistent mode will > > be mandatory, or maybe the default.) > > For the record, I've finally run into examples of desktop clients > exiting before their rendering is shown. No longer hypothetical. > -Chris Can you share any details - Might be useful for testing. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 27/28] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Monday, August 26, 2019 12:22 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Joonas Lahtinen > ; Winiarski, Michal > ; Bloomfield, Jon > Subject: [PATCH 27/28] drm/i915: Cancel non-persistent contexts on close > > Normally, we rely on our hangcheck to prevent persistent batches from > hogging the GPU. However, if the user disables hangcheck, this mechanism > breaks down. Despite our insistence that this is unsafe, the users are > equally insistent that they want to use endless batches and will disable > the hangcheck mechanism. We are looking are perhaps replacing hangcheck "looking at"? > with a softer mechanism, that sends a pulse down the engine to check if > it is well. We can use the same preemptive pulse to flush an active > persistent context off the GPU upon context close, preventing resources > being lost and unkillable requests remaining on the GPU after process > termination. To avoid changing the ABI and accidentally breaking > existing userspace, we make the persistence of a context explicit and > enable it by default (matching current ABI). Userspace can opt out of > persistent mode (forcing requests to be cancelled when the context is > closed by process termination or explicitly) by a context parameter. To > facilitate existing use-cases of disabling hangcheck, if the modparam is > disabled (i915.enable_hangcheck=0), we disable peristence mode by > default. (Note, one of the outcomes for supporting endless mode will be > the removal of hangchecking, at which point opting into persistent mode > will be mandatory, or maybe the default perhaps controlled by cgroups.) > > Testcase: igt/gem_ctx_persistence > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen > Cc: Michał Winiarski > Cc: Jon Bloomfield > --- > drivers/gpu/drm/i915/Makefile | 3 +- > drivers/gpu/drm/i915/gem/i915_gem_context.c | 122 ++ > drivers/gpu/drm/i915/gem/i915_gem_context.h | 15 +++ > .../gpu/drm/i915/gem/i915_gem_context_types.h | 1 + > .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 54 > .../gpu/drm/i915/gt/intel_engine_heartbeat.h | 14 ++ > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- > drivers/gpu/drm/i915/i915_priolist_types.h| 1 + > include/uapi/drm/i915_drm.h | 15 +++ > 9 files changed, 225 insertions(+), 2 deletions(-) > create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c > create mode 100644 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h > > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index 658b930d34a8..eaa74e000985 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -76,8 +76,9 @@ gt-y += \ > gt/intel_breadcrumbs.o \ > gt/intel_context.o \ > gt/intel_engine_cs.o \ > - gt/intel_engine_pool.o \ > + gt/intel_engine_heartbeat.o \ > gt/intel_engine_pm.o \ > + gt/intel_engine_pool.o \ > gt/intel_engine_user.o \ > gt/intel_gt.o \ > gt/intel_gt_irq.o \ > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > index bd9397669332..5520a896e701 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > @@ -70,6 +70,7 @@ > #include > > #include "gt/intel_lrc_reg.h" > +#include "gt/intel_engine_heartbeat.h" > #include "gt/intel_engine_user.h" > > #include "i915_gem_context.h" > @@ -375,6 +376,78 @@ void i915_gem_context_release(struct kref *ref) > queue_work(i915->wq, &i915->contexts.free_work); > } > > +static inline struct i915_gem_engines * > 
+__context_engines_static(struct i915_gem_context *ctx) > +{ > + return rcu_dereference_protected(ctx->engines, true); > +} > + > +static void kill_context(struct i915_gem_context *ctx) > +{ > + intel_engine_mask_t tmp, active, reset; > + struct intel_gt *gt = &ctx->i915->gt; > + struct i915_gem_engines_iter it; > + struct intel_engine_cs *engine; > + struct intel_context *ce; > + > + /* > + * If we are already banned, it was due to a guilty request causing > + * a reset and the entire context being evicted from the GPU. > + */ > + if (i915_gem_context_is_banned(ctx)) > + return; > + > + i915_gem_context_set_banned(ctx); > + > + /* > + * Map the user's engine back to the actual engines; one virtual > + * engine will be mapped to multiple engines, and using ctx-&g
Re: [Intel-gfx] [PATCH 28/28] drm/i915: Replace hangcheck by heartbeats
> -Original Message- > From: Chris Wilson > Sent: Monday, August 26, 2019 12:22 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Joonas Lahtinen > ; Ursulin, Tvrtko ; > Bloomfield, Jon > Subject: [PATCH 28/28] drm/i915: Replace hangcheck by heartbeats > > Replace sampling the engine state every so often with a periodic > heartbeat request to measure the health of an engine. This is coupled > with the forced-preemption to allow long running requests to survive so > long as they do not block other users. > > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen > Cc: Tvrtko Ursulin > Cc: Jon Bloomfield > --- > drivers/gpu/drm/i915/Kconfig.profile | 11 + > drivers/gpu/drm/i915/Makefile | 1 - > drivers/gpu/drm/i915/display/intel_display.c | 2 +- > drivers/gpu/drm/i915/gem/i915_gem_object.h| 1 - > drivers/gpu/drm/i915/gem/i915_gem_pm.c| 2 - > drivers/gpu/drm/i915/gt/intel_engine.h| 32 -- > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 10 +- > .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 113 +- > .../gpu/drm/i915/gt/intel_engine_heartbeat.h | 5 + > drivers/gpu/drm/i915/gt/intel_engine_pm.c | 5 +- > drivers/gpu/drm/i915/gt/intel_engine_types.h | 14 +- > drivers/gpu/drm/i915/gt/intel_gt.c| 1 - > drivers/gpu/drm/i915/gt/intel_gt.h| 4 - > drivers/gpu/drm/i915/gt/intel_gt_pm.c | 2 - > drivers/gpu/drm/i915/gt/intel_gt_types.h | 9 - > drivers/gpu/drm/i915/gt/intel_hangcheck.c | 361 -- > drivers/gpu/drm/i915/gt/intel_reset.c | 3 +- > drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 - > drivers/gpu/drm/i915/i915_debugfs.c | 87 - > drivers/gpu/drm/i915/i915_drv.c | 3 - > drivers/gpu/drm/i915/i915_drv.h | 1 - > drivers/gpu/drm/i915/i915_getparam.c | 3 +- > drivers/gpu/drm/i915/i915_gpu_error.c | 33 +- > drivers/gpu/drm/i915/i915_gpu_error.h | 2 - > drivers/gpu/drm/i915/i915_params.c| 6 +- > drivers/gpu/drm/i915/i915_priolist_types.h| 6 + > 26 files changed, 159 insertions(+), 562 deletions(-) > delete mode 100644 drivers/gpu/drm/i915/gt/intel_hangcheck.c > > diff --git a/drivers/gpu/drm/i915/Kconfig.profile > b/drivers/gpu/drm/i915/Kconfig.profile > index 3184e8491333..aafb57f84169 100644 > --- a/drivers/gpu/drm/i915/Kconfig.profile > +++ b/drivers/gpu/drm/i915/Kconfig.profile > @@ -37,3 +37,14 @@ config DRM_I915_PREEMPT_TIMEOUT > to execute. > > May be 0 to disable the timeout. > + > +config DRM_I915_HEARTBEAT_INTERVAL > + int "Interval between heartbeat pulses (ms)" > + default 2500 # microseconds > + help > + While active the driver uses a periodic request, a heartbeat, to > + check the wellness of the GPU and to regularly flush state changes > + (idle barriers). > + > + May be 0 to disable heartbeats and therefore disable automatic GPU > + hang detection. 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index eaa74e000985..d7286720de83 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -84,7 +84,6 @@ gt-y += \ > gt/intel_gt_irq.o \ > gt/intel_gt_pm.o \ > gt/intel_gt_pm_irq.o \ > - gt/intel_hangcheck.o \ > gt/intel_lrc.o \ > gt/intel_renderstate.o \ > gt/intel_reset.o \ > diff --git a/drivers/gpu/drm/i915/display/intel_display.c > b/drivers/gpu/drm/i915/display/intel_display.c > index 6e74c33f2ec4..e008016d864c 100644 > --- a/drivers/gpu/drm/i915/display/intel_display.c > +++ b/drivers/gpu/drm/i915/display/intel_display.c > @@ -14306,7 +14306,7 @@ static void intel_plane_unpin_fb(struct > intel_plane_state *old_plane_state) > static void fb_obj_bump_render_priority(struct drm_i915_gem_object *obj) > { > struct i915_sched_attr attr = { > - .priority = I915_PRIORITY_DISPLAY, > + .priority = I915_USER_PRIORITY(I915_PRIORITY_DISPLAY), > }; > > i915_gem_object_wait_priority(obj, 0, &attr); > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h > b/drivers/gpu/drm/i915/gem/i915_gem_object.h > index a78af25dce36..967c30737dc5 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h > @@ -429,6 +429,5 @@ int i915_gem_object_wait(struct > drm_i915_gem_object *obj, > int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj, > unsigned int flags, > const struct i915_sched_
Re: [Intel-gfx] [PATCH 28/28] drm/i915: Replace hangcheck by heartbeats
> -Original Message- > From: Chris Wilson > Sent: Monday, August 26, 2019 9:57 AM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Ursulin, Tvrtko > > Subject: RE: [PATCH 28/28] drm/i915: Replace hangcheck by heartbeats > > > Isn't engine->heartbeat now NULL in some cases? > > engine->heartbeat, the worker > vs > engine->last_heartbeat > Doh! > Maybe, > > struct intel_engine_heartbeat { > work; > systole; > }; > > > > > diff --git a/drivers/gpu/drm/i915/i915_getparam.c > > > b/drivers/gpu/drm/i915/i915_getparam.c > > > index 5d9101376a3d..e6c351080593 100644 > > > --- a/drivers/gpu/drm/i915/i915_getparam.c > > > +++ b/drivers/gpu/drm/i915/i915_getparam.c > > > @@ -78,8 +78,7 @@ int i915_getparam_ioctl(struct drm_device *dev, void > > > *data, > > > return -ENODEV; > > > break; > > > case I915_PARAM_HAS_GPU_RESET: > > > - value = i915_modparams.enable_hangcheck && > > > - intel_has_gpu_reset(i915); > > > + value = intel_has_gpu_reset(i915); > > > > Don't understand this tweak. We haven't really changed the essence of > hangcheck, just improved it, so why do we change this get_param? > > I deleted the modparam in earlier patches. But anticipated you would > object... Ok, I see. But then shouldn't we just be checking the new param for a non-zero timeout? That would then be equivalent. Or, it seems fair to conclude that this never made sense, but then it really ought to be a separate patch to remove the association between HAS_GPU_RESET and hangcheck. > -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
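For reference, the get-param whose semantics are being debated is consumed from userspace like so (existing uapi; the question in the thread is only what the kernel folds into the returned value):

	#include <sys/ioctl.h>
	#include <drm/i915_drm.h>

	/* Returns >0 if the kernel reports GPU reset support, 0 if not,
	 * -1 on error. */
	static int has_gpu_reset(int fd)
	{
		int value = 0;
		struct drm_i915_getparam gp = {
			.param = I915_PARAM_HAS_GPU_RESET,
			.value = &value,
		};

		if (ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
			return -1;

		return value;
	}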
Re: [Intel-gfx] [PATCH 29/36] drm/i915: Replace hangcheck by heartbeats
> -Original Message- > From: Chris Wilson > Sent: Thursday, August 29, 2019 1:12 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Joonas Lahtinen > ; Ursulin, Tvrtko ; > Bloomfield, Jon > Subject: [PATCH 29/36] drm/i915: Replace hangcheck by heartbeats > > Replace sampling the engine state every so often with a periodic > heartbeat request to measure the health of an engine. This is coupled > with the forced-preemption to allow long running requests to survive so > long as they do not block other users. > > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen > Cc: Tvrtko Ursulin > Cc: Jon Bloomfield > --- Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 28/36] drm/i915: Cancel non-persistent contexts on close
> -Original Message- > From: Chris Wilson > Sent: Thursday, August 29, 2019 1:12 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Joonas Lahtinen > ; Winiarski, Michal > ; Bloomfield, Jon > Subject: [PATCH 28/36] drm/i915: Cancel non-persistent contexts on close > > Normally, we rely on our hangcheck to prevent persistent batches from > hogging the GPU. However, if the user disables hangcheck, this mechanism > breaks down. Despite our insistence that this is unsafe, the users are > equally insistent that they want to use endless batches and will disable > the hangcheck mechanism. We are looking at perhaps replacing hangcheck > with a softer mechanism, that sends a pulse down the engine to check if > it is well. We can use the same preemptive pulse to flush an active > persistent context off the GPU upon context close, preventing resources > being lost and unkillable requests remaining on the GPU after process > termination. To avoid changing the ABI and accidentally breaking > existing userspace, we make the persistence of a context explicit and > enable it by default (matching current ABI). Userspace can opt out of > persistent mode (forcing requests to be cancelled when the context is > closed by process termination or explicitly) by a context parameter. To > facilitate existing use-cases of disabling hangcheck, if the modparam is > disabled (i915.enable_hangcheck=0), we disable persistence mode by > default. (Note, one of the outcomes for supporting endless mode will be > the removal of hangchecking, at which point opting into persistent mode > will be mandatory, or maybe the default perhaps controlled by cgroups.) > > Testcase: igt/gem_ctx_persistence > Signed-off-by: Chris Wilson > Cc: Joonas Lahtinen > Cc: Michał Winiarski > Cc: Jon Bloomfield > --- Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 2/2] drm/i915/gen7: Clear all EU/L3 residual contexts
Reducing audience as this series is of high interest externally. I fully agree with Joonas' suggestion here, and we have been looking at doing just that. But can we iterate as a follow up patch series? Putting in the infra to support igt assembly from source will take a little time (igt assembler doesn't like the source right now, so it looks like it will need updating), and we are under pressure to get this security fix out. Jon > -Original Message- > From: Joonas Lahtinen > Sent: Friday, January 31, 2020 1:52 AM > To: Abodunrin, Akeem G ; Wilson, Chris P > ; Phillips, D Scott ; > Vetter, Daniel ; Stewart, David C > ; dri-de...@lists.freedesktop.org; Balestrieri, > Francesco ; intel-gfx@lists.freedesktop.org; > Nikula, Jani ; Bloomfield, Jon > ; Kuoppala, Mika ; > Aran, Omer ; Pathi, Pragyansri > ; Kumar Valsan, Prathap > ; Dutt, Sudeep ; > Luck, Tony > Subject: Re: [PATCH 2/2] drm/i915/gen7: Clear all EU/L3 residual contexts > > Quoting Akeem G Abodunrin (2020-01-30 18:57:21) > > From: Prathap Kumar Valsan > > > > On gen7 and gen7.5 devices, there could be leftover data residuals in > > EU/L3 from the retiring context. This patch introduces workaround to clear > > that residual contexts, by submitting a batch buffer with dedicated HW > > context to the GPU with ring allocation for each context switching. > > > > This security mitigation change does not trigger any performance > > regression. Performance is on par with current mainline/drm-tip. > > > > Signed-off-by: Mika Kuoppala > > Signed-off-by: Prathap Kumar Valsan > > Signed-off-by: Akeem G Abodunrin > > Cc: Chris Wilson > > Cc: Balestrieri Francesco > > Cc: Bloomfield Jon > > Cc: Dutt Sudeep > > --- > > drivers/gpu/drm/i915/Makefile | 1 + > > drivers/gpu/drm/i915/gt/gen7_renderclear.c| 535 ++ > > drivers/gpu/drm/i915/gt/gen7_renderclear.h| 15 + > > drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 17 +- > > .../gpu/drm/i915/gt/intel_ring_submission.c | 3 +- > > drivers/gpu/drm/i915/i915_utils.h | 5 + > > 6 files changed, 572 insertions(+), 4 deletions(-) > > create mode 100644 drivers/gpu/drm/i915/gt/gen7_renderclear.c > > create mode 100644 drivers/gpu/drm/i915/gt/gen7_renderclear.h > > > > diff --git a/drivers/gpu/drm/i915/Makefile > b/drivers/gpu/drm/i915/Makefile > > index 3c88d7d8c764..f96bae664a03 100644 > > --- a/drivers/gpu/drm/i915/Makefile > > +++ b/drivers/gpu/drm/i915/Makefile > > @@ -78,6 +78,7 @@ gt-y += \ > > gt/debugfs_gt.o \ > > gt/debugfs_gt_pm.o \ > > gt/gen6_ppgtt.o \ > > + gt/gen7_renderclear.o \ > > gt/gen8_ppgtt.o \ > > gt/intel_breadcrumbs.o \ > > gt/intel_context.o \ > > diff --git a/drivers/gpu/drm/i915/gt/gen7_renderclear.c > b/drivers/gpu/drm/i915/gt/gen7_renderclear.c > > new file mode 100644 > > index ..a6f5f1602e33 > > --- /dev/null > > +++ b/drivers/gpu/drm/i915/gt/gen7_renderclear.c > > @@ -0,0 +1,535 @@ > > +// SPDX-License-Identifier: MIT > > +/* > > + * Copyright © 2019 Intel Corporation > > + */ > > + > > +#include "gen7_renderclear.h" > > +#include "i915_drv.h" > > +#include "i915_utils.h" > > +#include "intel_gpu_commands.h" > > + > > +#define MAX_URB_ENTRIES 64 > > +#define STATE_SIZE (4 * 1024) > > +#define GT3_INLINE_DATA_DELAYS 0x1E00 > > +#define batch_advance(Y, CS) GEM_BUG_ON((Y)->end != (CS)) > > + > > +/* > > + * Media CB Kernel for gen7 devices > > + * TODO: Add comments to kernel, indicating what each array of hex does > or > > + * include header file, which has assembly source and support in igt to be > > + * able to generate kernel in this same format > > + */ > > Having the original 
source code for the kernels in IGT is the > best way to proceed. The kernels should also be split into > separate files which can be generated from IGT and copied > over as-is for easy verification. > > Regards, Joonas ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Include asm sources for {ivb, hsw}_clear_kernel.c
> -Original Message- > From: Intel-gfx On Behalf Of > Rodrigo Vivi > Sent: Wednesday, June 10, 2020 1:18 PM > To: intel-gfx@lists.freedesktop.org > Cc: Alexandre Oliva ; Nikula, Jani ; > sta...@vger.kernel.org; Chris Wilson > Subject: [Intel-gfx] [PATCH] drm/i915: Include asm sources for {ivb, > hsw}_clear_kernel.c > > Alexandre Oliva has recently removed these files from Linux Libre > with concerns that the sources weren't available. > > The sources are available on IGT repository, and only open source > tools are used to generate the {ivb,hsw}_clear_kernel.c files. > > However, the remaining concern from Alexandre Oliva was around > GPL license and the source not been present when distributing > the code. > > So, it looks like 2 alternatives are possible, the use of > linux-firmware.git repository to store the blob or making sure > that the source is also present in our tree. Since the goal > is to limit the i915 firmware to only the micro-controller blobs > let's make sure that we do include the asm sources here in our tree. > > Btw, I tried to have some diligence here and make sure that the > asms that these commits are adding are truly the source for > the mentioned files: > > igt$ ./scripts/generate_clear_kernel.sh -g ivb \ > -m ~/mesa/build/src/intel/tools/i965_asm > Output file not specified - using default file "ivb-cb_assembled" > > Generating gen7 CB Kernel assembled file "ivb_clear_kernel.c" > for i915 driver... > > igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/ivb_clear_kernel.c \ > ivb_clear_kernel.c > > < * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:29:32 AM UTC > > * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:00:54 AM PDT > 61c61 > < }; > > }; > \ No newline at end of file > > igt$ ./scripts/generate_clear_kernel.sh -g hsw \ > -m ~/mesa/build/src/intel/tools/i965_asm > Output file not specified - using default file "hsw-cb_assembled" > > Generating gen7.5 CB Kernel assembled file "hsw_clear_kernel.c" > for i915 driver... > > igt$ diff ~/i915/drm-tip/drivers/gpu/drm/i915/gt/hsw_clear_kernel.c \ > hsw_clear_kernel.c > 5c5 > < * Generated by: IGT Gpu Tools on Fri 21 Feb 2020 05:30:13 AM UTC > > * Generated by: IGT Gpu Tools on Mon 08 Jun 2020 10:01:42 AM PDT > 61c61 > < }; > > }; > \ No newline at end of file > > Used IGT and Mesa master repositories from Fri Jun 5 2020) > IGT: 53e8c878a6fb ("tests/kms_chamelium: Force reprobe after replugging > the connector") > Mesa: 5d13c7477eb1 ("radv: set keep_statistic_info with > RADV_DEBUG=shaderstats") > Mesa built with: meson build -D platforms=drm,x11 -D dri-drivers=i965 \ > -D gallium-drivers=iris -D prefix=/usr \ >-D libdir=/usr/lib64/ -Dtools=intel \ >-Dkulkan-drivers=intel && ninja -C build > > v2: Header clean-up and include build instructions in a readme (Chris) > Modified commit message to respect check-patch > > Reference: http://www.fsfla.org/pipermail/linux-libre/2020- > June/003374.html > Reference: http://www.fsfla.org/pipermail/linux-libre/2020- > June/003375.html > Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts") > Cc: # v5.7+ > Cc: Alexandre Oliva > Cc: Prathap Kumar Valsan > Cc: Akeem G Abodunrin > Cc: Mika Kuoppala > Cc: Chris Wilson > Cc: Jani Nikula > Cc: Joonas Lahtinen > Signed-off-by: Rodrigo Vivi Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
> -Original Message- > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > Sent: Wednesday, December 6, 2017 7:38 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Bloomfield, Jon > ; Harrison, John C ; > Ursulin, Tvrtko ; Joonas Lahtinen > ; Daniel Vetter > Subject: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a > and error capture > > Since capturing the error state requires fiddling around with the GGTT > to read arbitrary buffers and is itself run under stop_machine(), it > deadlocks the machine (effectively a hard hang) when run in conjunction > with Broxton's VTd workaround to serialize GGTT access. > > v2: Store the ERR_PTR in first_error so that the error can be reported > to the user via sysfs. > > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT") > Signed-off-by: Chris Wilson > Cc: Jon Bloomfield > Cc: John Harrison > Cc: Tvrtko Ursulin > Cc: Joonas Lahtinen > Cc: Daniel Vetter It's a real shame to lose error capture on BXT. Can we wrap stop_machine to make it recursive ? Something like... static cpumask_t sm_mask; struct sm_args { cpu_stop_fn_t *fn; void *data; }; void do_recursive_stop(void *sm_arg_data) { struct sm_arg *args = sm_arg_data; /* We're stopped - flag the fact to prevent recursion */ cpumask_set_cpu(smp_processor_id(), &sm_mask); args->fn(args->data); /* Re-enable recursion */ cpumask_clear_cpu(smp_processor_id(), &sm_mask); } void recursive_stop_machine(cpu_stop_fn_t fn, void *data) { if (cpumask_test_cpu(smp_processor_id(), &sm_mask)) { /* We were already stopped, so can just call directly */ fn(data); } else { /* Our CPU is not currently stopped */ struct sm_args *args = {fn, data}; stop_machine(do_recursive_stop, args, NULL); } } ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture
> -Original Message- > From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf > Of Bloomfield, Jon > Sent: Wednesday, December 6, 2017 9:01 AM > To: Chris Wilson ; intel-gfx@lists.freedesktop.org > Cc: Daniel Vetter > Subject: Re: [Intel-gfx] [PATCH v2] drm/i915: Prevent machine hang from > Broxton's vtd w/a and error capture > > > -Original Message- > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > Sent: Wednesday, December 6, 2017 7:38 AM > > To: intel-gfx@lists.freedesktop.org > > Cc: Chris Wilson ; Bloomfield, Jon > > ; Harrison, John C > ; > > Ursulin, Tvrtko ; Joonas Lahtinen > > ; Daniel Vetter > > Subject: [PATCH v2] drm/i915: Prevent machine hang from Broxton's vtd > w/a > > and error capture > > > > Since capturing the error state requires fiddling around with the GGTT > > to read arbitrary buffers and is itself run under stop_machine(), it > > deadlocks the machine (effectively a hard hang) when run in conjunction > > with Broxton's VTd workaround to serialize GGTT access. > > > > v2: Store the ERR_PTR in first_error so that the error can be reported > > to the user via sysfs. > > > > Fixes: 0ef34ad6222a ("drm/i915: Serialize GTT/Aperture accesses on BXT") > > Signed-off-by: Chris Wilson > > Cc: Jon Bloomfield > > Cc: John Harrison > > Cc: Tvrtko Ursulin > > Cc: Joonas Lahtinen > > Cc: Daniel Vetter > > It's a real shame to lose error capture on BXT. Can we wrap stop_machine to > make it recursive ? > > Something like... > > static cpumask_t sm_mask; > > struct sm_args { > cpu_stop_fn_t *fn; > void *data; > }; > > void do_recursive_stop(void *sm_arg_data) > { > struct sm_arg *args = sm_arg_data; > > /* We're stopped - flag the fact to prevent recursion */ > cpumask_set_cpu(smp_processor_id(), &sm_mask); > > args->fn(args->data); > > /* Re-enable recursion */ > cpumask_clear_cpu(smp_processor_id(), &sm_mask); > } > > void recursive_stop_machine(cpu_stop_fn_t fn, void *data) > { > if (cpumask_test_cpu(smp_processor_id(), &sm_mask)) { > /* We were already stopped, so can just call directly */ > fn(data); > } > else { > /* Our CPU is not currently stopped */ > struct sm_args *args = {fn, data}; > stop_machine(do_recursive_stop, args, NULL); > } > } ... I think a single bool is sufficient in place of the cpumask, since it is set and cleared within stop_machine - I started out trying to set/clear outside. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
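Folding Jon's single-bool simplification in, and fixing the sketch up so it compiles (as posted it mixes struct tags and initializes a pointer from a brace list), the wrapper would look like this - still only a sketch of the idea, not a claim that nesting error capture this way is safe on BXT:

	#include <linux/types.h>
	#include <linux/stop_machine.h>

	struct sm_args {
		cpu_stop_fn_t fn;
		void *data;
	};

	/* Only ever set and cleared inside stop_machine(), so a bare bool
	 * is sufficient -- no other CPU can observe it mid-update. */
	static bool sm_stopped;

	static int do_recursive_stop(void *arg)
	{
		struct sm_args *args = arg;
		int ret;

		sm_stopped = true;	/* flag that the machine is stopped */
		ret = args->fn(args->data);
		sm_stopped = false;

		return ret;
	}

	static int recursive_stop_machine(cpu_stop_fn_t fn, void *data)
	{
		struct sm_args args = { .fn = fn, .data = data };

		/* Already inside stop_machine(): call directly to avoid
		 * deadlocking against ourselves. */
		if (sm_stopped)
			return fn(data);

		return stop_machine(do_recursive_stop, &args, NULL);
	}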
Re: [Intel-gfx] [PATCH] drm/i915/icl: Adjust BSD2 semantics to mean any second VCS instance
> -Original Message- > From: Intel-gfx On Behalf Of > Joonas Lahtinen > Sent: Wednesday, April 18, 2018 3:43 AM > To: Intel-gfx@lists.freedesktop.org; Tvrtko Ursulin > Subject: Re: [Intel-gfx] [PATCH] drm/i915/icl: Adjust BSD2 semantics to mean > any second VCS instance > > Quoting Tvrtko Ursulin (2018-04-18 12:33:42) > > From: Tvrtko Ursulin > > > > Currently our driver assumes BSD2 means hardware engine instance > number > > two. This does not work for Icelake parts with two VCS engines, but which > > are hardware instances 0 and 2, and not 0 and 1 as with previous parts. > > > > This makes the second engine not discoverable via HAS_BSD2 get param, > nor > > can it be targeted by execbuf. > > > > While we are working on the next generation execbuf put in a hack which > > allows discovery and access to this second VCS engine using legacy ABI. > > > > Signed-off-by: Tvrtko Ursulin > > Cc: Chris Wilson > > Cc: Jon Bloomfield > > Cc: Tony Ye > > --- > > Compile tested only. > > > > Also, one could argue if this is just a temporary hack or could actually > > make sense to have this permanently in. It feels like the ABI semantics of > > BSD2 are blurry, or at least could be re-blurred for Gen11. > > --- > > drivers/gpu/drm/i915/i915_drv.c| 8 +++- > > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 10 +- > > 2 files changed, 16 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c > b/drivers/gpu/drm/i915/i915_drv.c > > index b7dbeba72dec..a185366d9beb 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.c > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > @@ -331,7 +331,13 @@ static int i915_getparam_ioctl(struct drm_device > *dev, void *data, > > value = !!dev_priv->engine[VECS]; > > break; > > case I915_PARAM_HAS_BSD2: > > - value = !!dev_priv->engine[VCS2]; > > + /* > > +* FIXME: Temporary hack for Icelake. > > +* > > +* Make semantics of HAS_BSD2 "has second", or "has two" > VDBOXes > > +* instead of "has VDBOX 2nd hardware instance". > > +*/ > > + value = dev_priv->engine[VCS2] || dev_priv->engine[VCS3]; > > There can be no temporary hacks for the uAPI... You either sign yourself > up to keep it working indefinitely or don't :) > > Regards, Joonas This doesn't really change the API, does it? In fact I'd argue this is simply fixing a breakage in the API wrt previous devices. It makes no sense to expose holes in the engine map to userspace. If a device has two usable VCS engines, HAS_BSD2 should reflect that, and the second engine (wherever it sits physically) should be addressable through the legacy BSD2 execbuf interface. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/icl: Adjust BSD2 semantics to mean any second VCS instance
> -Original Message- > From: Tvrtko Ursulin > Sent: Wednesday, April 18, 2018 2:34 AM > To: Intel-gfx@lists.freedesktop.org > Cc: tursu...@ursulin.net; Ursulin, Tvrtko ; Chris > Wilson ; Bloomfield, Jon > ; Ye, Tony > Subject: [PATCH] drm/i915/icl: Adjust BSD2 semantics to mean any second > VCS instance > > From: Tvrtko Ursulin > > Currently our driver assumes BSD2 means hardware engine instance number > two. This does not work for Icelake parts with two VCS engines, but which > are hardware instances 0 and 2, and not 0 and 1 as with previous parts. > > This makes the second engine not discoverable via HAS_BSD2 get param, nor > can it be targeted by execbuf. > > While we are working on the next generation execbuf put in a hack which > allows discovery and access to this second VCS engine using legacy ABI. > > Signed-off-by: Tvrtko Ursulin > Cc: Chris Wilson > Cc: Jon Bloomfield > Cc: Tony Ye I would advocate this patch being merged while the new execbuf API is being developed. Currently there is no way to submit to 2-engine SKUs with non-sequential engine IDs. This doesn't introduce a new ABI, and there is no reason that I can see that the new execbuf solution couldn't be made backward compatible with this. Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
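To make the legacy flow concrete, discovery and submission through this ABI looks roughly like the following from userspace. This is a sketch built on the existing getparam and execbuffer2 flag definitions, not code from the patch itself:

#include <xf86drm.h>
#include <drm/i915_drm.h>

static void submit_to_second_vcs(int fd, struct drm_i915_gem_execbuffer2 *execbuf)
{
	int has_bsd2 = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_HAS_BSD2,
		.value = &has_bsd2,
	};

	/* With the patch, HAS_BSD2 means "has a second usable VCS",
	 * wherever that engine sits physically. */
	if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) == 0 && has_bsd2)
		execbuf->flags |= I915_EXEC_BSD | I915_EXEC_BSD_RING2;

	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, execbuf);
}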
Re: [Intel-gfx] [PATCH] drm/i915/icl: Adjust BSD2 semantics to mean any second VCS instance
> -Original Message- > From: Tvrtko Ursulin > Sent: Friday, April 20, 2018 9:56 AM > To: Bloomfield, Jon ; Tvrtko Ursulin > ; Intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915/icl: Adjust BSD2 semantics to mean > any second VCS instance > > > On 20/04/2018 15:19, Bloomfield, Jon wrote: > >> -Original Message- > >> From: Tvrtko Ursulin > >> Sent: Wednesday, April 18, 2018 2:34 AM > >> To: Intel-gfx@lists.freedesktop.org > >> Cc: tursu...@ursulin.net; Ursulin, Tvrtko ; Chris > >> Wilson ; Bloomfield, Jon > >> ; Ye, Tony > >> Subject: [PATCH] drm/i915/icl: Adjust BSD2 semantics to mean any second > >> VCS instance > >> > >> From: Tvrtko Ursulin > >> > >> Currently our driver assumes BSD2 means hardware engine instance > number > >> two. This does not work for Icelake parts with two VCS engines, but which > >> are hardware instances 0 and 2, and not 0 and 1 as with previous parts. > >> > >> This makes the second engine not discoverable via HAS_BSD2 get param, > nor > >> can it be targeted by execbuf. > >> > >> While we are working on the next generation execbuf put in a hack which > >> allows discovery and access to this second VCS engine using legacy ABI. > >> > >> Signed-off-by: Tvrtko Ursulin > >> Cc: Chris Wilson > >> Cc: Jon Bloomfield > >> Cc: Tony Ye > > I would advocate this patch being merged while the new execbuf API is > being > > developed. Currently there is no way to submit to 2-engine SKUs with non- sequential > > engine IDs. This doesn't introduce a new ABI, and there is no reason that I > can see > > that the new execbuf solution couldn't be made backward compatible with > this. > > It is a bit of an awkward period to commit to this permanently because it > only solves a subset of problem space and that makes it a hard sell in > that context. > > If there was legacy userspace which ran on 2 VCS Gen11 then maybe, but > otherwise I think best is just wait for the new execbuf API. Or in fact > would there be _any_ upstream userspace using this before the new > execbuf API happens? > Fair point. Will you be physically inhibiting this legacy ABI for gen11? If you intend to allow it, it's worth merging, because right now it simply doesn't work. If you will never allow the legacy ABI, and will forcibly prevent it (hardcode HAS_BSD2==0 for gen11+), then I agree we may as well carry the patch as a delta until the new execbuf lands. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v8 12/35] drm/i915: Implement the HDCP2.2 support for DP
I'm not formally reviewing this series, but while glancing at it, I noticed > -Original Message- > From: Intel-gfx On Behalf Of > Ramalingam C > Sent: Tuesday, November 27, 2018 2:43 AM > To: intel-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; > daniel.vet...@ffwll.ch; Winkler, Tomas > Subject: [Intel-gfx] [PATCH v8 12/35] drm/i915: Implement the HDCP2.2 > support for DP > > Implements the DP adaptation specific HDCP2.2 functions. > > These functions perform the DPCD read and write for communicating the > HDCP2.2 auth message back and forth. > > v2: > wait for cp_irq is merged with this patch. Rebased. > v3: > wait_queue is used for wait for cp_irq [Chris Wilson] > v4: > Style fixed. > %s/PARING/PAIRING > Few style fixes [Uma] > v5: > Lookup table for DP HDCP2.2 msg details [Daniel]. > Extra lines are removed. > v6: > Rebased. > v7: > Fixed some regression introduced at v5. [Ankit] > Macro HDCP_2_2_RX_CAPS_VERSION_VAL is reused [Uma] > Converted a function to inline [Uma] > %s/uintxx_t/uxx > v8: > Error due to the sinks are reported as DEBUG logs. > Adjust to the new mei interface. > > Signed-off-by: Ramalingam C > Signed-off-by: Ankit K Nautiyal > Reviewed-by: Uma Shankar > --- > drivers/gpu/drm/i915/intel_dp.c | 338 > ++ > drivers/gpu/drm/i915/intel_drv.h | 7 + > drivers/gpu/drm/i915/intel_hdcp.c | 6 + > 3 files changed, 351 insertions(+) > > diff --git a/drivers/gpu/drm/i915/intel_dp.c > b/drivers/gpu/drm/i915/intel_dp.c > index ecc4706db7dc..1cc82e490999 100644 > --- a/drivers/gpu/drm/i915/intel_dp.c > +++ b/drivers/gpu/drm/i915/intel_dp.c > @@ -31,6 +31,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -5347,6 +5348,27 @@ void intel_dp_encoder_suspend(struct > intel_encoder *intel_encoder) > pps_unlock(intel_dp); > } > > + > +static struct hdcp2_dp_msg_data *get_hdcp2_dp_msg_data(u8 msg_id) > +{ > + int i; > + > + for (i = 0; i < sizeof(hdcp2_msg_data); i++) Should be sizeof(hdcp2_msg_data) / sizeof(*hdcp2_msg_data) or equivalent. Has the failure return been tested? > + if (hdcp2_msg_data[i].msg_id == msg_id) > + return &hdcp2_msg_data[i]; > + > + return NULL; > +} > + > -- > 2.7.4 > > ___ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
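Spelled out, the fix Jon is pointing at is to bound the loop by the element count rather than the byte size, e.g. with the kernel's ARRAY_SIZE() macro. A sketch of the corrected lookup:

static struct hdcp2_dp_msg_data *get_hdcp2_dp_msg_data(u8 msg_id)
{
	int i;

	/* Bound by element count; sizeof() alone counts bytes and
	 * would walk far past the end of the table. */
	for (i = 0; i < ARRAY_SIZE(hdcp2_msg_data); i++)
		if (hdcp2_msg_data[i].msg_id == msg_id)
			return &hdcp2_msg_data[i];

	return NULL;
}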
Re: [Intel-gfx] [PATCH 3/6] drm/i915/gtt: Disable read-only support under GVT
> -Original Message- > From: Chris Wilson > Sent: Thursday, July 12, 2018 11:53 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Zhenyu Wang > ; Bloomfield, Jon ; > Joonas Lahtinen ; Matthew Auld > > Subject: [PATCH 3/6] drm/i915/gtt: Disable read-only support under GVT > > GVT is not propagating the PTE bits, and is always setting the > read-write bit, thus breaking read-only support. > > Signed-off-by: Chris Wilson > Cc: Zhenyu Wang > Cc: Jon Bloomfield > Cc: Joonas Lahtinen > Cc: Matthew Auld > --- > drivers/gpu/drm/i915/i915_gem_gtt.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c > b/drivers/gpu/drm/i915/i915_gem_gtt.c > index 6c0b438afe46..8e70a45b8a90 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -1662,8 +1662,12 @@ static struct i915_hw_ppgtt > *gen8_ppgtt_create(struct drm_i915_private *i915) > 1ULL << 48 : > 1ULL << 32; > > - /* From bdw, there is support for read-only pages in the PPGTT */ > - ppgtt->vm.has_read_only = true; > + /* > + * From bdw, there is support for read-only pages in the PPGTT. > + * > + * XXX GVT is not setting honouring the PTE bits. > + */ > + ppgtt->vm.has_read_only = !intel_vgpu_active(i915); > > i915_address_space_init(&ppgtt->vm, i915); > > -- > 2.18.0 Is there a blocker that prevents gvt respecting this bit? I can't think of an obvious reason why it would be a bad thing to support. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/6] drm/i915/gtt: Disable read-only support under GVT
> -Original Message- > From: Chris Wilson > Sent: Friday, July 13, 2018 1:06 AM > To: Bloomfield, Jon ; Zhenyu Wang > > Cc: intel-gfx@lists.freedesktop.org; Zhenyu Wang > ; Joonas Lahtinen > ; Matthew Auld > > Subject: Re: [PATCH 3/6] drm/i915/gtt: Disable read-only support under GVT > > Quoting Zhenyu Wang (2018-07-13 03:03:10) > > On 2018.07.12 20:36:03 +, Bloomfield, Jon wrote: > > > > -Original Message- > > > > From: Chris Wilson > > > > Sent: Thursday, July 12, 2018 11:53 AM > > > > To: intel-gfx@lists.freedesktop.org > > > > Cc: Chris Wilson ; Zhenyu Wang > > > > ; Bloomfield, Jon > ; > > > > Joonas Lahtinen ; Matthew Auld > > > > > > > > Subject: [PATCH 3/6] drm/i915/gtt: Disable read-only support under > GVT > > > > > > > > GVT is not propagating the PTE bits, and is always setting the > > > > read-write bit, thus breaking read-only support. > > > > > > > > Signed-off-by: Chris Wilson > > > > Cc: Zhenyu Wang > > > > Cc: Jon Bloomfield > > > > Cc: Joonas Lahtinen > > > > Cc: Matthew Auld > > > > --- > > > > drivers/gpu/drm/i915/i915_gem_gtt.c | 8 ++-- > > > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c > > > > b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > > index 6c0b438afe46..8e70a45b8a90 100644 > > > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > > @@ -1662,8 +1662,12 @@ static struct i915_hw_ppgtt > > > > *gen8_ppgtt_create(struct drm_i915_private *i915) > > > > 1ULL << 48 : > > > > 1ULL << 32; > > > > > > > > - /* From bdw, there is support for read-only pages in the PPGTT */ > > > > - ppgtt->vm.has_read_only = true; > > > > + /* > > > > +* From bdw, there is support for read-only pages in the PPGTT. > > > > +* > > > > +* XXX GVT is not setting honouring the PTE bits. > > > > +*/ > > > > + ppgtt->vm.has_read_only = !intel_vgpu_active(i915); > > > > > > > > i915_address_space_init(&ppgtt->vm, i915); > > > > > > > > -- > > > > 2.18.0 > > > > > > Is there a blocker that prevents gvt respecting this bit? I can't think > > > of an > obvious > > > reason why it would be a bad thing to support. > > > > I don't think any blocker for gvt support, we can respect that bit when > shadowing. > > But we need capability check on host gvt when that support is ready. > > Another cap bit required, so ack on both sides? > -Chris I see. Not as permanent disable, just more plumbing needed. I'm happy then :-) Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
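The plumbing agreed in this exchange would presumably end up as a capability check alongside the vGPU test, along these lines. The capability helper named here is hypothetical, purely to illustrate the shape:

/* Hypothetical follow-up: keep read-only support under a vGPU once
 * the host GVT advertises that it honours the PTE read-only bit. */
ppgtt->vm.has_read_only = !intel_vgpu_active(i915) ||
			  intel_vgpu_has_ro_ppgtt(i915); /* hypothetical cap */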
Re: [Intel-gfx] [PATCH v9 7/7] drm/i915: add a sysfs entry to let users set sseu configs
> -Original Message- > From: Intel-gfx On Behalf Of > Tvrtko Ursulin > Sent: Thursday, June 14, 2018 1:29 AM > To: Joonas Lahtinen ; Chris Wilson > ; Landwerlin, Lionel G > ; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH v9 7/7] drm/i915: add a sysfs entry to let > users > set sseu configs > > > On 13/06/2018 13:49, Joonas Lahtinen wrote: > > Quoting Tvrtko Ursulin (2018-06-12 15:02:07) > >> > >> On 12/06/2018 11:52, Lionel Landwerlin wrote: > >>> On 12/06/18 11:37, Chris Wilson wrote: > Quoting Lionel Landwerlin (2018-06-12 11:33:34) > > On 12/06/18 10:20, Joonas Lahtinen wrote: > >> Quoting Chris Wilson (2018-06-11 18:02:37) > >>> Quoting Lionel Landwerlin (2018-06-11 14:46:07) > On 11/06/18 13:10, Tvrtko Ursulin wrote: > > On 30/05/2018 15:33, Lionel Landwerlin wrote: > >> There are concerns about denial of service around the per > >> context sseu > >> configuration capability. In a previous commit introducing the > >> capability we allowed it only for capable users. This changes > >> adds a > >> new debugfs entry to let any user configure its own context > >> powergating setup. > > As far as I understood it, Joonas' concerns here are: > > > > 1) That in the containers use case individual containers wouldn't > be > > able to turn on the sysfs toggle for them. > > > > 2) That also in the containers use case if box admin turns on the > > feature, some containers would potentially start negatively > > affecting > > the others (via the accumulated cost of slice re-configuration on > > context switching). > > > > I am not familiar with typical container setups to be authoritative > > here, but intuitively I find it reasonable that a low-level hardware > > switch like this would be under the control of a master domain > > administrator. ("If you are installing our product in the container > > environment, make sure your system administrator enables this > > hardware > > feature.", "Note to system administrators: Enabling this features > > may > > negatively affect the performance of other containers.") > > > > Alternative proposal is for the i915 to apply an "or" filter on all > > requested masks and in that way ensure dynamic re- > configuration > > doesn't happen on context switches, but driven from userspace > via > > ioctls. > > > > In other words, should _all_ userspace agree between > themselves that > > they want to turn off a slice, they would then need to send out a > > concerted ioctl storm, where number of needed ioctls equals > the > > number > > of currently active contexts. (This may have its own performance > > consequences caused by the barriers needed to modify all > context > > images.) > > > > This was deemed acceptable the the media use case, but my > concern is > > the approach is not elegant and will tie us with the "or" policy in > > the ABI. (Performance concerns I haven't evaluated yet, but > they > > also > > may be significant.) > > > > If we go back thinking about the containers use case, then it > > transpires that even though the "or" policy does prevent one > > container > > from affecting the other from one angle, it also prevents one > > container from exercising the feature unless all containers > > co-operate. > > > > As such, we can view the original problem statement where we > have an > > issue if not everyone co-operates, as conceptually the same just > > from > > an opposite angle. 
(Rather than one container incurring the > > increased > > cost of context switches to the rest, we would have one > container > > preventing the optimized slice configuration to the other.) > > > > From this follows that both proposals require complete > > co-operation > > from all running userspace to avoid complete control of the > feature. > > > > Since the balance between the benefit of optimized slice > > configuration > > (or penalty of suboptimal one), versus the penalty of increased > > context switch times, cannot be know by the driver (barring > > venturing > > into the heuristics territory), that is another reason why I find > > the > > "or" policy in the driver questionable. > > > > We can also ask a question of - If we go with the "or" policy, why > > require N per-context ioctls to modify the global GPU > configuration > > and not instead add a global driver ioctl to modify the state? > > >
Re: [Intel-gfx] [PATCH v9 7/7] drm/i915: add a sysfs entry to let users set sseu configs
Gratuitous top posting to re-kick the thread. For Gen11 we can't have an on/off switch anyway (media simply won't run with an incompatible slice config), so let's agree on an API to allow userland to select the slice configuration for its contexts, for Gen11 only. I'd prefer a simple {MediaConfig/GeneralConfig} type context param but I really don't care that much. For gen9/10 it's arguable whether we need this at all. The effect on media workloads varies, but I'm guessing normal users (outside a transcode type environment) will never know they're missing anything. Either way, we can continue discussing while we progress the gen11 solution. Jon > -Original Message- > From: Intel-gfx On Behalf Of > Bloomfield, Jon > Sent: Wednesday, July 18, 2018 9:44 AM > To: Tvrtko Ursulin ; Joonas Lahtinen > ; Chris Wilson ; > Landwerlin, Lionel G ; intel- > g...@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH v9 7/7] drm/i915: add a sysfs entry to let users > set sseu configs > > [snip - full quote of the June thread shown above]
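A {MediaConfig/GeneralConfig} style context param of the kind Jon describes could reuse the existing context-setparam plumbing, sketched below. The parameter name and values are hypothetical, not a landed uAPI:

#include <xf86drm.h>
#include <drm/i915_drm.h>

#define I915_CONTEXT_PARAM_SLICE_CONFIG	0x42	/* hypothetical */
#define I915_SLICE_CONFIG_GENERAL	0	/* hypothetical */
#define I915_SLICE_CONFIG_MEDIA		1	/* hypothetical */

static int set_media_slice_config(int fd, __u32 ctx_id)
{
	struct drm_i915_gem_context_param p = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_SLICE_CONFIG,
		.value = I915_SLICE_CONFIG_MEDIA,
	};

	/* One coarse knob per context instead of raw slice masks */
	return drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
}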
Re: [Intel-gfx] [RFC] drm/i915: Engine discovery query
> -Original Message- > From: Tvrtko Ursulin [mailto:tursu...@ursulin.net] > Sent: Wednesday, March 14, 2018 7:06 AM > To: Intel-gfx@lists.freedesktop.org > Cc: tursu...@ursulin.net; Ursulin, Tvrtko ; Chris > Wilson ; Bloomfield, Jon > ; Rogozhkin, Dmitry V > ; Landwerlin, Lionel G > ; Joonas Lahtinen > > Subject: [RFC] drm/i915: Engine discovery query > > From: Tvrtko Ursulin > > Engine discovery query allows userspace to enumerate engines, probe their > configuration features, all without needing to maintain the internal PCI > ID based database. > > A new query for the generic i915 query ioctl is added named > DRM_I915_QUERY_ENGINE_INFO, together with accompanying structure > drm_i915_query_engine_info. The address of latter should be passed to the > kernel in the query.data_ptr field, and should be large enough for the > kernel to fill out all known engines as struct drm_i915_engine_info > elements trailing the query. > > As with other queries, setting the item query length to zero allows > userspace to query minimum required buffer size. > > Enumerated engines have common type mask which can be used to query all > hardware engines, versus engines userspace can submit to using the execbuf > uAPI. > > Engines also have capabilities which are per engine class namespace of > bits describing features not present on all engine instances. > > Signed-off-by: Tvrtko Ursulin > Cc: Chris Wilson > Cc: Jon Bloomfield > Cc: Dmitry Rogozhkin > Cc: Lionel Landwerlin > Cc: Joonas Lahtinen > --- > This is RFC for now since it is not very usable before the new execbuf API > or virtual engine queue submission gets closer. > > In this version I have added capability of distinguishing between hardware > engines and ABI engines. This is to account for probable future use cases > like virtualization, where guest might only see one engine instance, but > will want to know overall hardware capabilities in order to tune its > behaviour. The patch looks good, but... I can't see any use for exposing the unreachable engines. Can you elaborate on why a UMD running in a VM would need to know about the engines assigned to other VMs ? Is it even desirable to leak the physical capabilities to VMs ? In general a VM would have a very limited view of the underlying hardware. It's unlikely to even be capable of discovering true h/w engine counts. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
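The two-pass size negotiation described in the cover letter would be driven from userspace roughly as follows. This is a sketch against the generic query ioctl; the structure and names follow the RFC description and may differ in the final uAPI:

#include <stdint.h>
#include <stdlib.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

static void *query_engine_info(int fd)
{
	struct drm_i915_query_item item = {
		.query_id = DRM_I915_QUERY_ENGINE_INFO,
		.length = 0,	/* first pass: ask for the required size */
	};
	struct drm_i915_query query = {
		.num_items = 1,
		.items_ptr = (uintptr_t)&item,
	};
	void *info;

	if (drmIoctl(fd, DRM_IOCTL_I915_QUERY, &query) || item.length <= 0)
		return NULL;

	info = calloc(1, item.length);
	item.data_ptr = (uintptr_t)info;

	/* Second pass: the kernel fills the engine info elements */
	if (drmIoctl(fd, DRM_IOCTL_I915_QUERY, &query)) {
		free(info);
		return NULL;
	}

	return info;
}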
Re: [Intel-gfx] [RFC 0/8] Force preemption
> From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf > Of Jeff McGee > Sent: Thursday, March 22, 2018 12:09 PM > To: Tvrtko Ursulin > Cc: Kondapally, Kalyan ; intel- > g...@lists.freedesktop.org; b...@bwidawsk.net > Subject: Re: [Intel-gfx] [RFC 0/8] Force preemption > > On Thu, Mar 22, 2018 at 05:41:57PM +, Tvrtko Ursulin wrote: > > > > On 22/03/2018 16:01, Jeff McGee wrote: > > >On Thu, Mar 22, 2018 at 03:57:49PM +, Tvrtko Ursulin wrote: > > >> > > >>On 22/03/2018 14:34, Jeff McGee wrote: > > >>>On Thu, Mar 22, 2018 at 09:28:00AM +, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2018-03-22 09:22:55) > > > > > >On 21/03/2018 17:26, jeff.mc...@intel.com wrote: > > >>From: Jeff McGee > > >> > > >>Force preemption uses engine reset to enforce a limit on the time > > >>that a request targeted for preemption can block. This feature is > > >>a requirement in automotive systems where the GPU may be > shared by > > >>clients of critically high priority and clients of low priority that > > >>may not have been curated to be preemption friendly. There may > be > > >>more general applications of this feature. I'm sharing as an RFC to > > >>stimulate that discussion and also to get any technical feedback > > >>that I can before submitting to the product kernel that needs this. > > >>I have developed the patches for ease of rebase, given that this is > > >>for the moment considered a non-upstreamable feature. It would > be > > >>possible to refactor hangcheck to fully incorporate force > preemption > > >>as another tier of patience (or impatience) with the running > request. > > > > > >Sorry if it was mentioned elsewhere and I missed it - but does this > work > > >only with stateless clients - or in other words, what would happen to > > >stateful clients which would be force preempted? Or the answer is > we > > >don't care since they are misbehaving? > > > > They get notified of being guilty for causing a gpu reset; three strikes > > and they are out (banned from using the gpu) using the current rules. > > This is a very blunt hammer that requires the rest of the system to be > > robust; one might argue time spent making the system robust would > be > > better served making sure that the timer never expired in the first > place > > thereby eliminating the need for a forced gpu reset. > > -Chris > > >>> > > >>>Yes, for simplication the policy applied to force preempted contexts > > >>>is the same as for hanging contexts. It is known that this feature > > >>>should not be required in a fully curated system. It's a requirement > > >>>if end user will be alllowed to install 3rd party apps to run in the > > >>>non-critical domain. > > >> > > >>My concern is whether it safe to call this force _preemption_, while > > >>it is not really expected to work as preemption from the point of > > >>view of preempted context. I may be missing some angle here, but I > > >>think a better name would include words like maximum request > > >>duration or something. > > >> > > >>I can see a difference between allowed maximum duration when there > > >>is something else pending, and when it isn't, but I don't > > >>immediately see that we should consider this distinction for any > > >>real benefit? > > >> > > >>So should the feature just be "maximum request duration"? This would > > >>perhaps make it just a special case of hangcheck, which ignores head > > >>progress, or whatever we do in there. > > >> > > >>Regards, > > >> > > >>Tvrtko > > > > > >I think you might be unclear about how this works. 
We're not starting a > > >preemption to see if we can cleanly remove a request who has begun to > > >exceed its normal time slice, i.e. hangcheck. This is about bounding > > >the time that a normal preemption can take. So first start preemption > > >in response to higher-priority request arrival, then wait for preemption > > >to complete within a certain amount of time. If it does not, resort to > > >reset. > > > > > >So it's really "force the resolution of a preemption", shortened to > > >"force preemption". > > > > You are right, I veered off in my thinking and ended up with > > something different. :) > > > > I however still think the name is potentially misleading, since the > > request/context is not getting preempted. It is getting effectively > > killed (sooner or later, directly or indirectly). > > > > Maybe that is OK for the specific use case when everything is only > > broken and not malicious. > > > > In a more general purpose system it would be a bit random when > > something would work, and when it wouldn't, depending on system > > setup and even timings. > > > > Hm, maybe you don't even really benefit from the standard three > > strikes and you are out policy, and for this specific use case, you > > should just kill it straight away. If it couldn't be preempted once, > > why pay
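For readers skimming the thread, the mechanism under discussion reduces to a watchdog on preemption completion, roughly as below. All names here are hypothetical simplifications for illustration, not the posted RFC patches:

static unsigned int fp_timeout_ms = 100;	/* hypothetical knob */

/* Arm a bound on how long the running request may block a preemption */
static void inject_preempt_context(struct intel_engine_cs *engine)
{
	engine_queue_preemption(engine);	/* hypothetical */
	mod_timer(&engine->fp_timer,		/* hypothetical field */
		  jiffies + msecs_to_jiffies(fp_timeout_ms));
}

/* Timer expiry: the preemption did not resolve in time, so fall back
 * to an engine reset, after which the guilty context is handled by the
 * usual three-strikes banning policy. */
static void fp_timeout(struct timer_list *t)
{
	struct intel_engine_cs *engine = from_timer(engine, t, fp_timer);

	if (!engine_preemption_complete(engine))	/* hypothetical */
		engine_reset(engine);			/* hypothetical */
}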
Re: [Intel-gfx] [PATCH 3/5] drm/i915: Prevent writing into a read-only object via a GGTT mmap
> -Original Message- > From: Chris Wilson > Sent: Thursday, June 14, 2018 5:00 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Bloomfield, Jon > ; Joonas Lahtinen > ; Matthew Auld > > Subject: [PATCH 3/5] drm/i915: Prevent writing into a read-only object via a > GGTT mmap > > If the user has created a read-only object, they should not be allowed > to circumvent the write protection by using a GGTT mmapping. Deny it. > > Also most machines do not support read-only GGTT PTEs, so again we have > to reject attempted writes. Fortunately, this is known a priori, so we > can at least reject in the call to create the mmap with backup in the > fault handler. This is a little draconian as we could blatantly ignore > the write protection on the pages, but it is far simply to keep the > readonly object pure. (It is easier to lift a restriction than to impose > it later!) Are you sure this is necessary? I assumed you would just create a ro IA mapping to the page, irrespective of the ability of ggtt. It feels wrong to disallow mapping a read-only object to the CPU as read-only. With ppgtt the presence of an unprotected mapping in the ggtt should be immune from tampering in the GT, so only the cpu mapping should really matter. > > Signed-off-by: Chris Wilson > Cc: Jon Bloomfield > Cc: Joonas Lahtinen > Cc: Matthew Auld > --- ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/5] drm/i915: Prevent writing into a read-only object via a GGTT mmap
> -Original Message- > From: Chris Wilson > Sent: Thursday, June 14, 2018 8:00 AM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Matthew Auld > > Subject: RE: [PATCH 3/5] drm/i915: Prevent writing into a read-only object via > a GGTT mmap > > Quoting Bloomfield, Jon (2018-06-14 15:53:13) > > > -Original Message- > > > From: Chris Wilson > > > Sent: Thursday, June 14, 2018 5:00 AM > > > To: intel-gfx@lists.freedesktop.org > > > Cc: Chris Wilson ; Bloomfield, Jon > > > ; Joonas Lahtinen > > > ; Matthew Auld > > > > > > Subject: [PATCH 3/5] drm/i915: Prevent writing into a read-only object via > a > > > GGTT mmap > > > > > > If the user has created a read-only object, they should not be allowed > > > to circumvent the write protection by using a GGTT mmapping. Deny it. > > > > > > Also most machines do not support read-only GGTT PTEs, so again we > have > > > to reject attempted writes. Fortunately, this is known a priori, so we > > > can at least reject in the call to create the mmap with backup in the > > > fault handler. This is a little draconian as we could blatantly ignore > > > the write protection on the pages, but it is far simply to keep the > > > readonly object pure. (It is easier to lift a restriction than to impose > > > it later!) > > Are you sure this is necessary? I assumed you would just create a ro IA > > mapping to the page, irrespective of the ability of ggtt. > > You are thinking of the CPU mmap? The GTT mmap offers a linear view of > the tiled object. It would be very wrong for us to bypass the PROT_READ > protection of a user page by accessing it via the GTT. No, I was thinking of gtt mmap. That requires both GTT and IA PTE mappings, right? One to map the object into the GTT, and then a second to point the IA at the aperture. Why wouldn't marking the IA mapping RO protect the object if the GT cannot reach the GTT mapping (from user batches)? > > > It feels wrong to > > disallow mapping a read-only object to the CPU as read-only. With ppgtt > > the presence of an unprotected mapping in the ggtt should be immune > > from tampering in the GT, so only the cpu mapping should really matter. > > And the CPU mapping has its protection bits on the IA PTE. > -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 3/5] drm/i915: Prevent writing into a read-only object via a GGTT mmap
> -Original Message- > From: Chris Wilson > Sent: Thursday, June 14, 2018 8:22 AM > To: Bloomfield, Jon ; intel- > g...@lists.freedesktop.org > Cc: Joonas Lahtinen ; Matthew Auld > > Subject: RE: [PATCH 3/5] drm/i915: Prevent writing into a read-only object via > a GGTT mmap > > Quoting Bloomfield, Jon (2018-06-14 16:06:40) > > > -Original Message- > > > From: Chris Wilson > > > Sent: Thursday, June 14, 2018 8:00 AM > > > To: Bloomfield, Jon ; intel- > > > g...@lists.freedesktop.org > > > Cc: Joonas Lahtinen ; Matthew Auld > > > > > > Subject: RE: [PATCH 3/5] drm/i915: Prevent writing into a read-only > object via > > > a GGTT mmap > > > > > > Quoting Bloomfield, Jon (2018-06-14 15:53:13) > > > > > -Original Message- > > > > > From: Chris Wilson > > > > > Sent: Thursday, June 14, 2018 5:00 AM > > > > > To: intel-gfx@lists.freedesktop.org > > > > > Cc: Chris Wilson ; Bloomfield, Jon > > > > > ; Joonas Lahtinen > > > > > ; Matthew Auld > > > > > > > > > > Subject: [PATCH 3/5] drm/i915: Prevent writing into a read-only > object via > > > a > > > > > GGTT mmap > > > > > > > > > > If the user has created a read-only object, they should not be allowed > > > > > to circumvent the write protection by using a GGTT mmapping. Deny > it. > > > > > > > > > > Also most machines do not support read-only GGTT PTEs, so again we > > > have > > > > > to reject attempted writes. Fortunately, this is known a priori, so we > > > > > can at least reject in the call to create the mmap with backup in the > > > > > fault handler. This is a little draconian as we could blatantly ignore > > > > > the write protection on the pages, but it is far simply to keep the > > > > > readonly object pure. (It is easier to lift a restriction than to impose > > > > > it later!) > > > > Are you sure this is necessary? I assumed you would just create a ro IA > > > > mapping to the page, irrespective of the ability of ggtt. > > > > > > You are thinking of the CPU mmap? The GTT mmap offers a linear view of > > > the tiled object. It would be very wrong for us to bypass the PROT_READ > > > protection of a user page by accessing it via the GTT. > > No, I was thinking of gtt mmap. That requires both GTT and IA PTE > mappings > > right? One to map the object into the GTT, and then a second to point the > > IA at the aperture. Why wouldn't marking the IA mapping RO protect the > > object if the GT cannot reach the GTT mapping (from user batches). > > Hmm. I keep forgetting that we can get at the vma from mmap(), because > that's hidden away elsewhere and only see i915_gem_fault() on a daily > basis. Hmm, is it legal to read a PROT_READ-only vma if PROT_WRITE is > requested, or are we meant to return -EINVAL? > -Chris That's a trickier question :-) My instinct is -EINVAL if the user specifies PROT_WRITE, but allow it if they only request PROT_READ, and ppgtt is enabled (including aliased) so that userspace can't see the gtt mapping from the GT. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v2] drm/i915: Prevent writing into a read-only object via a GGTT mmap
> -Original Message- > From: Chris Wilson > Sent: Thursday, June 14, 2018 9:07 AM > To: intel-gfx@lists.freedesktop.org > Cc: dri-de...@lists.freedesktop.org; Chris Wilson ; > Bloomfield, Jon ; Joonas Lahtinen > ; Matthew Auld > ; David Herrmann > > Subject: [PATCH v2] drm/i915: Prevent writing into a read-only object via a > GGTT mmap > > If the user has created a read-only object, they should not be allowed > to circumvent the write protection by using a GGTT mmapping. Deny it. > > Also most machines do not support read-only GGTT PTEs, so again we have > to reject attempted writes. Fortunately, this is known a priori, so we > can at least reject in the call to create the mmap (with a sanity check > in the fault handler). > > v2: Check the vma->vm_flags during mmap() to allow readonly access. > > Signed-off-by: Chris Wilson > Cc: Jon Bloomfield > Cc: Joonas Lahtinen > Cc: Matthew Auld > Cc: David Herrmann Shame about the BUG_ON, but probably overkill to add code to suppress the RO flag just for mmap. Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
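The v2 behaviour under review amounts to a check like the following in the GTT mmap path. This is a sketch of the idea, not the literal patch:

static int check_readonly_mmap(struct drm_i915_gem_object *obj,
			       struct vm_area_struct *vma)
{
	if (!i915_gem_object_is_readonly(obj))
		return 0;

	/* Deny writable mappings of a read-only object outright... */
	if (vma->vm_flags & VM_WRITE)
		return -EINVAL;

	/* ...and forbid upgrading the mapping later via mprotect() */
	vma->vm_flags &= ~VM_MAYWRITE;

	return 0;
}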
Re: [Intel-gfx] [PATCH 5/5] drm/i915/userptr: Enable read-only support on gen8+
> -Original Message- > From: Chris Wilson > Sent: Thursday, June 14, 2018 5:00 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Bloomfield, Jon > ; Joonas Lahtinen > > Subject: [PATCH 5/5] drm/i915/userptr: Enable read-only support on gen8+ > > On gen8 and onwards, we can mark GPU accesses through the ppGTT as > being > read-only, that is cause any GPU write onto that page to be discarded > (not triggering a fault). This is all that we need to finally support > the read-only flag for userptr! > > Testcase: igt/gem_userptr_blits/readonly* > Signed-off-by: Chris Wilson > Cc: Jon Bloomfield > Cc: Joonas Lahtinen > --- Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 4/5] drm/i915: Reject attempted pwrites into a read-only object
> -Original Message- > From: Chris Wilson > Sent: Thursday, June 14, 2018 5:00 AM > To: intel-gfx@lists.freedesktop.org > Cc: Chris Wilson ; Bloomfield, Jon > ; Joonas Lahtinen > ; Matthew Auld > > Subject: [PATCH 4/5] drm/i915: Reject attempted pwrites into a read-only > object > > If the user created a read-only object, they should not be allowed to > circumvent the write protection using the pwrite ioctl. > > Signed-off-by: Chris Wilson > Cc: Jon Bloomfield > Cc: Joonas Lahtinen > Cc: Matthew Auld > --- Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
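The guard being reviewed is essentially one check near the top of the pwrite ioctl, something like the following fragment (placement approximate):

/* In i915_gem_pwrite_ioctl(), before touching the backing store */
if (i915_gem_object_is_readonly(obj)) {
	ret = -EINVAL;
	goto err;
}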
Re: [Intel-gfx] [PATCH] drm/i915: Serialize GTT Updates on BXT
> -Original Message- > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > Sent: Monday, May 22, 2017 1:05 PM > To: Bloomfield, Jon > Cc: intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Serialize GTT Updates on BXT > > On Mon, May 22, 2017 at 11:07:25AM -0700, Jon Bloomfield wrote: > > BXT requires accesses to the GTT (i.e. PTE updates) to be serialized > > when IOMMU is enabled. > > Serialised with what, since all writes are serialized already? Fair cop guv. I'll reword. > > The reason is that you need to explain the hw model you are protecting, for > example do clears need to be protected? > > > This patch guarantees this by wrapping all updates in stop_machine and > > using a flushing read to guarantee that the GTT writes have reached > > their destination before restarting. > > If you mention which patch you are reinstating (for a new problem) and cc > the author, he might point out what has changed in the meantime. I don't understand. I'm not re-instating any patches to my knowledge, so it's a bit hard to cc the author. > > Note, the flush here is not about ensuring the GTT writes reach their > destination. > > > Signed-off-by: Jon Bloomfield > > If you are the author and sender, what is John's s-o-b doing afterwards? This patch was previously signed off by John. > > > Signed-off-by: John Harrison > > --- > > drivers/gpu/drm/i915/i915_gem_gtt.c | 106 > > > > 1 file changed, 106 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c > > b/drivers/gpu/drm/i915/i915_gem_gtt.c > > index 7c769d7..6360d92 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > @@ -2191,6 +2191,100 @@ static void gen8_ggtt_clear_range(struct > i915_address_space *vm, > > gen8_set_pte(&gtt_base[i], scratch_pte); } > > > > +#ifdef CONFIG_INTEL_IOMMU > > +struct insert_page { > > + struct i915_address_space *vm; > > + dma_addr_t addr; > > + u64 offset; > > + enum i915_cache_level level; > > +}; > > + > > +static int gen8_ggtt_insert_page__cb(void *_arg) { > > + struct insert_page *arg = _arg; > > + > > + struct drm_i915_private *dev_priv = arg->vm->i915; > > + > > + gen8_ggtt_insert_page(arg->vm, arg->addr, > > + arg->offset, arg->level, 0); > > + > > + POSTING_READ(GFX_FLSH_CNTL_GEN6); > > This is now just a call to i915_ggtt_invalidate() because we are now also > responsible for invalidating the guc tlbs as well as the chipset. > And more importantly it is already done by gen8_ggtt_insert_page. > > All the POSTING_READ(GFX_FLSH_CNTL_GEN6) are spurious. Are you sure - The purpose of the register read is to ensure that all the PTE writes are flushed from the h/w queue before we restart the machine. It is critical that all the PTE writes have left this queue before any other accesses are allowed to begin. Isn't the invalidate a posted write ? If so, it won't drain the queue. Even if the invalidate is guaranteed to effect this pipeline flush, the clear_page path doesn't call invalidate, so it's certainly required there. > > static void gen6_ggtt_clear_range(struct i915_address_space *vm, > u64 start, u64 length) > { > @@ -2789,6 +2883,18 @@ static int gen8_gmch_probe(struct i915_ggtt > *ggtt) > > ggtt->base.insert_entries = gen8_ggtt_insert_entries; > > +#ifdef CONFIG_INTEL_IOMMU > + /* Serialize GTT updates on BXT if VT-d is on. */ > + if (IS_BROXTON(dev_priv) && intel_iommu_gfx_mapped) { Move to a header and don't ifdef out the users.
A small cost in object side > for > the benefit of keeping these ifdef out of code. Move what to a header ? You mean create a macro for the test, the whole block, or something else ? I was following the pattern used elsewhere in the code in the vain hope that by following established convention we might avoid bike-shedding. Every single use of intel_iommu_gfx_mapped in this file is protected by the #ifdef. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
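Chris's suggestion here is to hide the #ifdef behind a single predicate in a header, so that the callers stay clean and the ifdef appears exactly once. A sketch of that shape (the helper name is an assumption, not necessarily what was merged):

/* e.g. in i915_drv.h */
static inline bool ggtt_update_needs_vtd_wa(struct drm_i915_private *i915)
{
#ifdef CONFIG_INTEL_IOMMU
	return IS_BROXTON(i915) && intel_iommu_gfx_mapped;
#else
	return false;
#endif
}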
Re: [Intel-gfx] [PATCH] drm/i915: Serialize GTT Updates on BXT
> -Original Message- > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > Sent: Tuesday, May 23, 2017 12:33 AM > To: Bloomfield, Jon > Cc: intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Serialize GTT Updates on BXT > > On Mon, May 22, 2017 at 10:18:31PM +0000, Bloomfield, Jon wrote: > > > -Original Message- > > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > > Sent: Monday, May 22, 2017 1:05 PM > > > To: Bloomfield, Jon > > > Cc: intel-gfx@lists.freedesktop.org > > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Serialize GTT Updates on BXT > > > > > > On Mon, May 22, 2017 at 11:07:25AM -0700, Jon Bloomfield wrote: > > > > BXT requires accesses to the GTT (i.e. PTE updates) to be serialized > > > > when IOMMU is enabled. > > > > > > Serialised with what, since all writes are serialized already? > > > > Fair cop guv. I'll reword. > > > > > > > > The reason is that you need to explain the hw model you are protecting, > for > > > example do clears need to be protected? > > > > > > > This patch guarantees this by wrapping all updates in stop_machine and > > > > using a flushing read to guarantee that the GTT writes have reached > > > > their destination before restarting. > > > > > > If you mention which patch you are reinstating (for a new problem) and > cc > > > the author, he might point out what has changed in the meantime. > > > > I don't understand. I'm not re-instating any patches to my knowledge, so > it's a bit hard to cc the author. > > Please then review history before submitting *copied* code. > > > > Note, the flush here is not about ensuring the GTT writes reach their > > > destination. > > > > > > > Signed-off-by: Jon Bloomfield > > > > > > If you are the author and sender, what is John's s-o-b doing afterwards? > > This patch was previously signed off by John. > > > > > > > > > Signed-off-by: John Harrison > > > > --- > > > > drivers/gpu/drm/i915/i915_gem_gtt.c | 106 > > > > > > > > 1 file changed, 106 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c > > > > b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > > index 7c769d7..6360d92 100644 > > > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > > @@ -2191,6 +2191,100 @@ static void gen8_ggtt_clear_range(struct > > > i915_address_space *vm, > > > > gen8_set_pte(&gtt_base[i], scratch_pte); } > > > > > > > > +#ifdef CONFIG_INTEL_IOMMU > > > > +struct insert_page { > > > > + struct i915_address_space *vm; > > > > + dma_addr_t addr; > > > > + u64 offset; > > > > + enum i915_cache_level level; > > > > +}; > > > > + > > > > +static int gen8_ggtt_insert_page__cb(void *_arg) { > > > > + struct insert_page *arg = _arg; > > > > + > > > > + struct drm_i915_private *dev_priv = arg->vm->i915; > > > > + > > > > + gen8_ggtt_insert_page(arg->vm, arg->addr, > > > > + arg->offset, arg->level, 0); > > > > + > > > > + POSTING_READ(GFX_FLSH_CNTL_GEN6); > > > > > > This is now just a call to i915_ggtt_invalidate() because we are now also > > > responsible for invalidating the guc tlbs as well as the chipset. > > > And more importantly it is already done by gen8_ggtt_insert_page. > > > > > > All the POSTING_READ(GFX_FLSH_CNTL_GEN6) are spurious. > > > > Are you sure - The purpose of the register read is to ensure that all the > > PTE writes are flushed from the h/w queue > > before we restart the machine. It is critical that all the PTE writes have > > left > this queue before any other accesses are > > allowed to begin. 
> > Isn't the invalidate a posted write ? If so, it won't drain the queue. > > Even if the invalidate is guaranteed to effect this pipeline flush, the > clear_page path doesn't call invalidate, so it's > > certainly required there. > > It's an uncached mmio write. It is strongly ordered and serial. Besides > if you feel it is wrong, fix the bug at source.
Re: [Intel-gfx] [PATCH] drm/i915: Serialize GTT Updates on BXT
> -Original Message- > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > Sent: Tuesday, May 23, 2017 1:41 AM > To: Bloomfield, Jon > Cc: intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Serialize GTT Updates on BXT > > On Mon, May 22, 2017 at 11:07:25AM -0700, Jon Bloomfield wrote: > > BXT requires accesses to the GTT (i.e. PTE updates) to be serialized > > when IOMMU is enabled. This patch guarantees this by wrapping all > > updates in stop_machine and using a flushing read to guarantee that > > the GTT writes have reached their destination before restarting. > > > > Testcase? igt/gem_concurrent_blit shows the failure. I was using a combination of tests, run in parallel to hit this bug. I need to get hold of a system again to re-run. Are you saying you have also repro'd the bug just with gem_concurrent_blit or just suggesting that it might be a good candidate ? I'll also re-try without the flushing read, but I'm wary of removing this unless I can understand why the mmio write has the same effect. It might be luck. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Serialize GTT Updates on BXT
> -Original Message- > From: Bloomfield, Jon > Sent: Tuesday, May 23, 2017 7:28 AM > To: 'Chris Wilson' > Cc: intel-gfx@lists.freedesktop.org > Subject: RE: [Intel-gfx] [PATCH] drm/i915: Serialize GTT Updates on BXT > > > -Original Message- > > From: Chris Wilson [mailto:ch...@chris-wilson.co.uk] > > Sent: Tuesday, May 23, 2017 12:33 AM > > To: Bloomfield, Jon > > Cc: intel-gfx@lists.freedesktop.org > > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Serialize GTT Updates on BXT > > It's an uncached mmio write. It is strongly ordered and serial. Besides > > if you feel it is wrong, fix the bug at source. > > Strongly ordered is not enough, nor is being uncached - that just ensures the > PTE > writes have left the CPU. We need to ensure they have left the GAM before > we > allow any following Aperture accesses to reach the GAM. On the other hand > it > may be that the write to the flushctl reg will explicitly cause the h/w to > clear > the > fifo. I'll check with hw. H/W have confirmed that the flushing write is not sufficient to avoid the bug. In their words [almost]: "You need the read. The [FLSH_CTRL write] will invalidate the GTLB. However, it won't ensure an [Aperture] read isn't in the pipe." So the mmio read needs to stay, and we're left with where to issue it. I placed it in the _cb function because it is only needed for BXT and doing it there avoids any (admittedly small) extra overhead for other devices. But I have no strong opinion on this, so if you really want it to go into the main function I'll move it. I do think it should be conditional though. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
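Putting the thread's conclusion together: each GGTT update is wrapped in stop_machine(), and the callback ends with an uncached register read to drain the GAM queue before the machine restarts. A sketch of that shape, not the merged patch:

static void bxt_vtd_ggtt_wa(struct i915_address_space *vm)
{
	struct drm_i915_private *dev_priv = vm->i915;

	/* Per the h/w feedback above, the read is what matters: the
	 * TLB-invalidating write alone cannot stop a following aperture
	 * access from overtaking the PTE update. */
	POSTING_READ(GFX_FLSH_CNTL_GEN6);
}

static int bxt_vtd_ggtt_insert_page__cb(void *_arg)
{
	struct insert_page *arg = _arg;

	gen8_ggtt_insert_page(arg->vm, arg->addr, arg->offset, arg->level, 0);
	bxt_vtd_ggtt_wa(arg->vm);

	return 0;
}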
Re: [Intel-gfx] [PATCH] drm/i915/bxt: Enable VBT based BL control for DP
> -Original Message- > From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf > Of Imre Deak > Sent: Tuesday, June 6, 2017 5:34 AM > To: Jani Nikula > Cc: intel-gfx@lists.freedesktop.org; Mustaffa, Mustamin B > > Subject: Re: [Intel-gfx] [PATCH] drm/i915/bxt: Enable VBT based BL control > for DP > > On Tue, Jun 06, 2017 at 12:24:30PM +0300, Jani Nikula wrote: > > On Tue, 06 Jun 2017, Mustamin B Mustaffa > wrote: > > > [...] > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c > b/drivers/gpu/drm/i915/intel_dp.c > > > index d1670b8..124f58b 100644 > > > --- a/drivers/gpu/drm/i915/intel_dp.c > > > +++ b/drivers/gpu/drm/i915/intel_dp.c > > > @@ -591,13 +591,8 @@ bxt_power_sequencer_idx(struct intel_dp > *intel_dp) > > > /* We should never land here with regular DP ports */ > > > WARN_ON(!is_edp(intel_dp)); > > > > > > - /* > > > - * TODO: BXT has 2 PPS instances. The correct port->PPS instance > > > - * mapping needs to be retrieved from VBT, for now just hard-code > to > > > - * use instance #0 always. > > > - */ > > > if (!intel_dp->pps_reset) > > > - return 0; > > > + return dev_priv->vbt.backlight.controller; > > > > So the existing code around here looks a bit convoluted, not least > > because now pretty much all PPS access first does > > > > - intel_pps_get_registers(), which calls > > - bxt_power_sequencer_idx(), which calls > > - intel_dp_init_panel_power_sequencer_registers(), which calls > > - intel_pps_get_registers()... > > > > With your change, for controller == 1 and pps_reset == true, the first > > time the registers are needed, we'll initialize the correct controller 1 > > registers, but controller 0 registers are returned. From there on, we'll > > keep returning controller 1 registers until pps_reset is set to true > > again. > > > > Cc: Imre as author of commits 78597996370c ("drm/i915/bxt: Fix PPS lost > > state after suspend breaking eDP link training") and 8e8232d51878 > > ("drm/i915: Deduplicate PPS register retrieval") which I think create > > the loop. > > A loop would be prevented by the pps->reset check, but agree the code > isn't nice, I guess I overlooked this when I wrote it. To make things > clearer we could factor out a helper from > intel_dp_init_panel_power_sequencer_registers() that takes pps_registers > and call this helper from bxt_power_sequencer_idx(). > > So how about something like the following: Just checking what the intention is here because your proposed change omits the VBT fix... Are you going to post these changes as a new baseline for Mustaffa's patch ? Or are you asking Mustaffa to fold these changes into his patch ? Hoping it's the former :-) ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915/bxt: Enable VBT based BL control for DP
> -Original Message- > From: Jani Nikula [mailto:jani.nik...@linux.intel.com] > Sent: Wednesday, June 7, 2017 1:16 AM > To: Deak, Imre ; Bloomfield, Jon > > Cc: intel-gfx@lists.freedesktop.org; Mustaffa, Mustamin B > > Subject: Re: [Intel-gfx] [PATCH] drm/i915/bxt: Enable VBT based BL control > for DP > > On Tue, 06 Jun 2017, Imre Deak wrote: > > On Tue, Jun 06, 2017 at 05:58:43PM +0300, Bloomfield, Jon wrote: > >> > -Original Message- > >> > From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On > Behalf > >> > Of Imre Deak > >> > Sent: Tuesday, June 6, 2017 5:34 AM > >> > To: Jani Nikula > >> > Cc: intel-gfx@lists.freedesktop.org; Mustaffa, Mustamin B > >> > > >> > Subject: Re: [Intel-gfx] [PATCH] drm/i915/bxt: Enable VBT based BL > control > >> > for DP > >> > > >> > On Tue, Jun 06, 2017 at 12:24:30PM +0300, Jani Nikula wrote: > >> > > On Tue, 06 Jun 2017, Mustamin B Mustaffa > >> > wrote: > >> > > > [...] > >> > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c > >> > b/drivers/gpu/drm/i915/intel_dp.c > >> > > > index d1670b8..124f58b 100644 > >> > > > --- a/drivers/gpu/drm/i915/intel_dp.c > >> > > > +++ b/drivers/gpu/drm/i915/intel_dp.c > >> > > > @@ -591,13 +591,8 @@ bxt_power_sequencer_idx(struct intel_dp > >> > *intel_dp) > >> > > > /* We should never land here with regular DP ports */ > >> > > > WARN_ON(!is_edp(intel_dp)); > >> > > > > >> > > > -/* > >> > > > - * TODO: BXT has 2 PPS instances. The correct port->PPS > instance > >> > > > - * mapping needs to be retrieved from VBT, for now just > hard-code > >> > to > >> > > > - * use instance #0 always. > >> > > > - */ > >> > > > if (!intel_dp->pps_reset) > >> > > > -return 0; > >> > > > +return dev_priv->vbt.backlight.controller; > >> > > > >> > > So the existing code around here looks a bit convoluted, not least > >> > > because now pretty much all PPS access first does > >> > > > >> > > - intel_pps_get_registers(), which calls > >> > > - bxt_power_sequencer_idx(), which calls > >> > > - intel_dp_init_panel_power_sequencer_registers(), which calls > >> > > - intel_pps_get_registers()... > >> > > > >> > > With your change, for controller == 1 and pps_reset == true, the first > >> > > time the registers are needed, we'll initialize the correct controller > >> > > 1 > >> > > registers, but controller 0 registers are returned. From there on, > >> > > we'll > >> > > keep returning controller 1 registers until pps_reset is set to true > >> > > again. > >> > > > >> > > Cc: Imre as author of commits 78597996370c ("drm/i915/bxt: Fix PPS > lost > >> > > state after suspend breaking eDP link training") and 8e8232d51878 > >> > > ("drm/i915: Deduplicate PPS register retrieval") which I think create > >> > > the loop. > >> > > >> > A loop would be prevented by the pps->reset check, but agree the code > >> > isn't nice, I guess I overlooked this when I wrote it. To make things > >> > clearer we could factor out a helper from > >> > intel_dp_init_panel_power_sequencer_registers() that takes > pps_registers > >> > and call this helper from bxt_power_sequencer_idx(). > >> > > >> > So how about something like the following: > >> > >> Just checking what the intention is here because your proposed change > >> ommits the VBT fix... Are you going to post these changes as a new > >> baseline for Mustaffa's patch ? Or are you asking Mustaffa to fold > >> these changes into his patch ? Hoping it's the former :-) > > > > The change is just to make the code clearer, unrelated to the VBT fix, > > so it should be a separate patch. 
I don't mind doing this as a follow-up > > to Mustaffa's patchset. What his patch here would need is just to return > > the correct index from bxt_power_sequencer_idx() in all cases. > > I think we might need to backport Mustaffa's patch to stable so we need > to do that first as a standalone change. After it has been fixed > according to Imre's and my feedback. Oh, and I'd still like someone(tm) > to check if the PPS-PWM mapping is fixed 1:1, or can we cross connect > them? Yes, I was having doubts about this yesterday too. I can find nothing in the BSpec to indicate any relationship at all between PWM and PPS. Mustamin is from IOTG and this is for a specific product they're working on. The correct fix is probably to extend VBT to include a separate PPS select field and then key off that. As this is a new product, there should be no issues with updating the VBT (I hope), but how does that sit with you guys ? Hardcoding is certainly plain wrong, even if all current released products do need 0. > > I just involved Imre here because the existing code is, I think, > unnecessarily hard to follow. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
> -Original Message- > From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Saturday, June 08, 2013 10:22 PM > To: Carsten Emde > Cc: Chris Wilson; Jesse Barnes; Intel Graphics Development; Bloomfield, Jon; > Steven Rostedt; Christoph Mathys; stable > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence > between fences and LLC across multiple CPUs > > On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde wrote: > > On 04/08/2013 11:54 AM, Daniel Vetter wrote: > >> > >> On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote: > >>> > >>> On Thu, 4 Apr 2013 21:31:03 +0100 > >>> Chris Wilson wrote: > >>> > >>>> In order to fully serialize access to the fenced region and the > >>>> update to the fence register we need to take extreme measures on > >>>> SNB+, and manually flush writes to memory prior to writing the > >>>> fence register in conjunction with the memory barriers placed around > the register write. > >>>> > >>>> Fixes i-g-t/gem_fence_thrash > >>>> > >>>> v2: Bring a bigger gun > >>>> v3: Switch the bigger gun for heavier bullets (Arjan van de Ven) > >>>> v4: Remove changes for working generations. > >>>> v5: Reduce to a per-cpu wbinvd() call prior to updating the fences. > >>>> v6: Rewrite comments to ellide forgotten history. > >>>> > >>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191 > >>>> Signed-off-by: Chris Wilson > >>>> Cc: Jon Bloomfield > >>>> Tested-by: Jon Bloomfield (v2) > >>>> Cc: sta...@vger.kernel.org > >>>> --- > >>>> drivers/gpu/drm/i915/i915_gem.c | 28 > +++- > >>>> 1 file changed, 23 insertions(+), 5 deletions(-) > >>>> > >>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c > >>>> b/drivers/gpu/drm/i915/i915_gem.c index fa4ea1a..8f7739e 100644 > >>>> --- a/drivers/gpu/drm/i915/i915_gem.c > >>>> +++ b/drivers/gpu/drm/i915/i915_gem.c > >>>> @@ -2689,17 +2689,35 @@ static inline int fence_number(struct > >>>> drm_i915_private *dev_priv, > >>>> return fence - dev_priv->fence_regs; > >>>> } > >>>> > >>>> +static void i915_gem_write_fence__ipi(void *data) { > >>>> + wbinvd(); > >>>> +} > >>>> + > >>>> static void i915_gem_object_update_fence(struct > >>>> drm_i915_gem_object *obj, > >>>> struct drm_i915_fence_reg > >>>> *fence, > >>>> bool enable) > >>>> { > >>>> - struct drm_i915_private *dev_priv = obj->base.dev->dev_private; > >>>> - int reg = fence_number(dev_priv, fence); > >>>> - > >>>> - i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL); > >>>> + struct drm_device *dev = obj->base.dev; > >>>> + struct drm_i915_private *dev_priv = dev->dev_private; > >>>> + int fence_reg = fence_number(dev_priv, fence); > >>>> + > >>>> + /* In order to fully serialize access to the fenced region and > >>>> +* the update to the fence register we need to take extreme > >>>> +* measures on SNB+. In theory, the write to the fence register > >>>> +* flushes all memory transactions before, and coupled with the > >>>> +* mb() placed around the register write we serialise all memory > >>>> +* operations with respect to the changes in the tiler. Yet, on > >>>> +* SNB+ we need to take a step further and emit an explicit > >>>> wbinvd() > >>>> +* on each processor in order to manually flush all memory > >>>> +* transactions before updating the fence register. > >>>> +*/ > >>>> + if (HAS_LLC(obj->base.dev)) > >>>> + on_each_cpu(i915_gem_write_fence__ipi, NULL, 1); > >>>> + i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL); > >>>> > >>>> if (enable) { > >>>> - obj->fence_reg = reg; > >>>> +
Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence between fences and LLC across multiple CPUs
> -Original Message- > From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Monday, June 10, 2013 3:38 PM > To: Bloomfield, Jon > Cc: Carsten Emde; Chris Wilson; Jesse Barnes; Intel Graphics Development; > Steven Rostedt; Christoph Mathys; stable > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence > between fences and LLC across multiple CPUs > > On Mon, Jun 10, 2013 at 3:17 PM, Bloomfield, Jon > wrote: > >> -Original Message- > >> From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On > >> Behalf Of Daniel Vetter > >> Sent: Saturday, June 08, 2013 10:22 PM > >> To: Carsten Emde > >> Cc: Chris Wilson; Jesse Barnes; Intel Graphics Development; > >> Bloomfield, Jon; Steven Rostedt; Christoph Mathys; stable > >> Subject: Re: [Intel-gfx] [PATCH] drm/i915: Workaround incoherence > >> between fences and LLC across multiple CPUs > >> > >> On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde > wrote: > >> > On 04/08/2013 11:54 AM, Daniel Vetter wrote: > >> >> > >> >> On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote: > >> >>> > >> >>> On Thu, 4 Apr 2013 21:31:03 +0100 Chris Wilson > >> >>> wrote: > >> >>> > >> >>>> In order to fully serialize access to the fenced region and the > >> >>>> update to the fence register we need to take extreme measures on > >> >>>> SNB+, and manually flush writes to memory prior to writing the > >> >>>> fence register in conjunction with the memory barriers placed > >> >>>> around > >> the register write. > >> >>>> > >> >>>> Fixes i-g-t/gem_fence_thrash > >> >>>> > >> >>>> v2: Bring a bigger gun > >> >>>> v3: Switch the bigger gun for heavier bullets (Arjan van de Ven) > >> >>>> v4: Remove changes for working generations. > >> >>>> v5: Reduce to a per-cpu wbinvd() call prior to updating the fences. > >> >>>> v6: Rewrite comments to ellide forgotten history. > >> >>>> > >> >>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191 > >> >>>> Signed-off-by: Chris Wilson > >> >>>> Cc: Jon Bloomfield > >> >>>> Tested-by: Jon Bloomfield (v2) > >> >>>> Cc: sta...@vger.kernel.org > >> >>>> --- > >> >>>> drivers/gpu/drm/i915/i915_gem.c | 28 > >> +++- > >> >>>> 1 file changed, 23 insertions(+), 5 deletions(-) > >> >>>> > >> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c > >> >>>> b/drivers/gpu/drm/i915/i915_gem.c index fa4ea1a..8f7739e 100644 > >> >>>> --- a/drivers/gpu/drm/i915/i915_gem.c > >> >>>> +++ b/drivers/gpu/drm/i915/i915_gem.c > >> >>>> @@ -2689,17 +2689,35 @@ static inline int fence_number(struct > >> >>>> drm_i915_private *dev_priv, > >> >>>> return fence - dev_priv->fence_regs; > >> >>>> } > >> >>>> > >> >>>> +static void i915_gem_write_fence__ipi(void *data) { > >> >>>> + wbinvd(); > >> >>>> +} > >> >>>> + > >> >>>> static void i915_gem_object_update_fence(struct > >> >>>> drm_i915_gem_object *obj, > >> >>>> struct > >> >>>> drm_i915_fence_reg *fence, > >> >>>> bool enable) > >> >>>> { > >> >>>> - struct drm_i915_private *dev_priv = obj->base.dev- > >dev_private; > >> >>>> - int reg = fence_number(dev_priv, fence); > >> >>>> - > >> >>>> - i915_gem_write_fence(obj->base.dev, reg, enable ? obj : > NULL); > >> >>>> + struct drm_device *dev = obj->base.dev; > >> >>>> + struct drm_i915_private *dev_priv = dev->dev_private; > >> >>>> + int fence_reg = fence_number(dev_priv, fence); > >> >>>> + > >> >>>> + /* In order to fully serialize access to the fenced region and &
Re: [Intel-gfx] [PATCH] drm/i915: Introduce vblank work function
What's the status of this patch ? I can't find any subsequent mention of it, but we currently use it in one of our Android development trees. I'm trying to work out whether to retain it or replace it. Was it killed off, or is it still in the pipeline ? Thanks. > -Original Message- > From: intel-gfx-bounces+jon.bloomfield=intel@lists.freedesktop.org > [mailto:intel-gfx-bounces+jon.bloomfield=intel@lists.freedesktop.org] > On Behalf Of Chris Wilson > Sent: Friday, July 05, 2013 9:49 AM > To: intel-gfx@lists.freedesktop.org > Subject: [Intel-gfx] [PATCH] drm/i915: Introduce vblank work function > > Throughout the driver, we have a number of pieces of code that must wait > for a vblank before we can update some state. Often these could be run > asynchronously since they are merely freeing a resource no long referenced > by a double-buffered registered. So we introduce a vblank worker upon > which we queue various tasks to be run after the next vvblank. > > This will be used in the next patches to avoid unnecessary stalls when > updating registers and for freeing resources. > > Signed-off-by: Chris Wilson > --- > drivers/gpu/drm/i915/intel_display.c | 79 > > drivers/gpu/drm/i915/intel_drv.h | 8 +++- > 2 files changed, 86 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/intel_display.c > b/drivers/gpu/drm/i915/intel_display.c > index b70ce4e..dff3b93 100644 > --- a/drivers/gpu/drm/i915/intel_display.c > +++ b/drivers/gpu/drm/i915/intel_display.c > @@ -795,6 +795,74 @@ void intel_wait_for_vblank(struct drm_device *dev, > int pipe) > DRM_DEBUG_KMS("vblank wait timed out\n"); } > > +struct intel_crtc_vblank_task { > + struct list_head list; > + void (*func)(struct intel_crtc *, void *data); > + void *data; > +}; > + > +static void intel_crtc_vblank_work_fn(struct work_struct *_work) { > + struct intel_crtc_vblank_work *work = > + container_of(_work, struct intel_crtc_vblank_work, work); > + struct intel_crtc *crtc = > + container_of(work, struct intel_crtc, vblank_work); > + struct list_head tasks; > + > + intel_wait_for_vblank(crtc->base.dev, crtc->pipe); > + > + mutex_lock(&crtc->vblank_work.mutex); > + list_replace_init(&work->tasks, &tasks); > + mutex_unlock(&crtc->vblank_work.mutex); > + > + while (!list_empty(&tasks)) { > + struct intel_crtc_vblank_task *task > + = list_first_entry(&tasks, > +struct intel_crtc_vblank_task, > +list); > + > + task->func(crtc, task->data); > + list_del(&task->list); > + kfree(task); > + } > +} > + > +static int intel_crtc_add_vblank_task(struct intel_crtc *crtc, > + bool single, > + void (*func)(struct intel_crtc *, > +void *data), > + void *data) > +{ > + struct intel_crtc_vblank_task *task; > + struct intel_crtc_vblank_work *work = &crtc->vblank_work; > + > + task = kzalloc(sizeof *task, GFP_KERNEL); > + if (task == NULL) > + return -ENOMEM; > + task->func = func; > + task->data = data; > + > + mutex_lock(&work->mutex); > + if (list_empty(&work->tasks)) { > + schedule_work(&work->work); > + } else if (single) { > + struct intel_crtc_vblank_task *old; > + list_for_each_entry(old, &work->tasks, list) { > + if (task->func == func && task->data == data) { > + func = NULL; > + break; > + } > + } > + } > + if (func) > + list_add(&task->list, &work->tasks); > + else > + kfree(task); > + mutex_unlock(&work->mutex); > + > + return 0; > +} > + > /* > * intel_wait_for_pipe_off - wait for pipe to turn off > * @dev: drm device > @@ -2835,6 +2903,8 @@ static void intel_crtc_wait_for_pending_flips(struct > drm_crtc *crtc) > if (crtc->fb == NULL) > return; > 
> + flush_work(&to_intel_crtc(crtc)->vblank_work.work); > + > WARN_ON(waitqueue_active(&dev_priv->pending_flip_queue)); > > wait_event(dev_priv->pending_flip_queue, > @@ -9097,6 +9167,10 @@ static void intel_crtc_init(struct drm_device *dev, > int pipe) > if (intel_crtc == NULL) > return; > > + mutex_init(&intel_crtc->vblank_work.mutex); > + INIT_LIST_HEAD(&intel_crtc->vblank_work.tasks); > + INIT_WORK(&intel_crtc->vblank_work.work, > intel_crtc_vblank_work_fn); > + > drm_crtc_init(dev, &intel_crtc->base, &intel_crtc_funcs); > > if (drm_mode_crtc_set_gamma_size(&intel_crtc->base, 256)) @@ - > 10296,11 +10370,16 @@ void intel_modeset_cleanup(struct drm_de
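For illustration, a sketch of how a caller might use the proposed helper to defer freeing a scanout resource until after the next vblank. The callback body is simplified (locking elided; intel_unpin_fb_obj() is the existing unpin helper the follow-up patches would presumably queue this way):

/* Runs on the common workqueue once the next vblank has passed. */
static void unpin_old_fb(struct intel_crtc *crtc, void *data)
{
	struct drm_i915_gem_object *obj = data;

	intel_unpin_fb_obj(obj);
	drm_gem_object_unreference(&obj->base);
}

/* At flip time: queue the unpin instead of stalling for the vblank.
 * single == true dedups against an identical already-queued task.
 */
intel_crtc_add_vblank_task(intel_crtc, true, unpin_old_fb, old_obj);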
Re: [Intel-gfx] [PATCH] drm/i915: Introduce vblank work function
Ok thanks. To add weight to it becoming official in some form, we're using it for various deferred operations: drm/i915: Make plane switching asynchronous drm/i915: Asynchronously unpin the old framebuffer after the next vblank They aren't my patches but I believe they should be upstreamed in the near future. The claim is that these give a noticeable performance boost. I'll leave it in and hope it becomes official. > -Original Message- > From: Daniel Vetter [mailto:daniel.vet...@ffwll.ch] On Behalf Of Daniel > Vetter > Sent: Friday, December 06, 2013 12:07 PM > To: Bloomfield, Jon > Cc: Chris Wilson; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Introduce vblank work function > > On Fri, Dec 06, 2013 at 10:44:15AM +, Bloomfield, Jon wrote: > > What's the status of this patch ? I can't find any subsequent mention > > of it, but we currently use it in one of our Android development > > trees. I'm trying to work out whether to retain it or replace it. > > > > Was it killed off, or is it still in the pipeline ? > > Stalled atm I think. The overall concept of a vblank worker/timer support > code is still useful imo. I think I've written up all the bikesheds Chris&I > discussed on irc in my other reply. > -Daniel > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/2] drm/i915: Wait for completion of pending flips when starved of fences
> -Original Message- > From: intel-gfx-boun...@lists.freedesktop.org [mailto:intel-gfx- > boun...@lists.freedesktop.org] On Behalf Of Chris Wilson > Sent: Monday, January 20, 2014 10:18 AM > To: intel-gfx@lists.freedesktop.org > Subject: [Intel-gfx] [PATCH 1/2] drm/i915: Wait for completion of pending > flips when starved of fences > > On older generations (gen2, gen3) the GPU requires fences for many > operations, such as blits. The display hardware also requires fences for > scanouts and this leads to a situation where an arbitrary number of fences > may be pinned by old scanouts following a pageflip but before we have > executed the unpin workqueue. This is unpredictable by userspace and leads > to random EDEADLK when submitting an otherwise benign execbuffer. > However, we can detect when we have an outstanding flip and so cause > userspace to wait upon their completion before finally declaring that the > system is starved of fences. This is really no worse than forcing the GPU to > stall waiting for older execbuffer to retire and release their fences before > we > can reallocate them for the next execbuffer. > > v2: move the test for a pending fb unpin to a common routine for later reuse > during eviction > > Reported-and-tested-by: di...@gmx.net > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=73696 > Signed-off-by: Chris Wilson Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
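For reference, a sketch of the "outstanding flip" test the commit message describes (the v2 note moves it into a common routine for reuse during eviction). The unpin_work field is from the driver of that era; the rest is illustrative:

/* If any crtc still has a page-flip unpin pending, fence starvation
 * is transient: have the caller wait for flip completion instead of
 * returning EDEADLK to userspace.
 */
static bool intel_has_pending_fb_unpin(struct drm_device *dev)
{
	struct intel_crtc *crtc;

	list_for_each_entry(crtc, &dev->mode_config.crtc_list, base.head)
		if (crtc->unpin_work)
			return true;

	return false;
}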
Re: [Intel-gfx] [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip completions are outstanding
> From: intel-gfx-boun...@lists.freedesktop.org [mailto:intel-gfx- > boun...@lists.freedesktop.org] On Behalf Of Chris Wilson > Sent: Monday, January 20, 2014 10:18 AM > To: intel-gfx@lists.freedesktop.org > Subject: [Intel-gfx] [PATCH 2/2] drm/i915: Repeat evictions whilst pageflip > completions are outstanding > > Since an old pageflip will keep its scanout buffer object pinned until it has > executed its unpin task on the common workqueue, we can clog up our > GGTT with stale pinned objects. As we cannot flush those workqueues > without dropping our locks, we have to resort to falling back to userspace > and telling them to repeat the operation in order to have a chance to run our > workqueues and free up the required memory. If we fail, then we are forced > to report ENOSPC back to userspace causing the operation to fail and best- > case scenario is that it introduces temporary corruption. > > Signed-off-by: Chris Wilson Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Wait and retry if there is no space in the aperture mappable area
> On Tue, Oct 22, 2013 at 01:15:32PM +0100, Chris Wilson wrote: > > On Tue, Oct 22, 2013 at 12:04:17PM +, Siluvery, Arun wrote: > > > From: "Siluvery, Arun" > > > > > > When a mapping is requested and if there is no space the mapping > > > fails and the region is not physically backed. This results in > > > signal 7(SIGBUS), code 2 (BUS_ADRERR) when it is actually accessed. > > > This patch handles this error, continues to wait and retries to find > > > space. > > > > Eh, no. The line before will remove everything from the aperture that > > is unpinned. Throwing an evict_everything in here breaks reservations, > > so I think you are just papering over a bug. > > If we want to fix this for real (i.e. allow userspace to reliably map stuff, > maybe even bigger than the aperture) we need to fall back to suballocating > tile-row aligned strides of the buffer (maybe pick the tile row multiply in > between 1M-2M). Until that's done userspace can't rely on gtt mmaps reliably > working for large buffers. The current heuristic we're using is half of the > mappable space, but that's probably a bit too optimistic. > -Daniel Is calling i915_gem_evict_everything() actually dangerous ? Despite its name, it appears to only evict unpinned buffers. Or am I missing something ? What it does do is update the status of buffers which are no longer in use on the ring by calling i915_gem_retire_requests(). So from what I can tell (from a 10-minute trawl of the code admittedly) all this patch is doing is getting a more up-to-date view of the GTT demands, so that we only fail with ENOSPC if there are no pinned buffers which can now be unpinned. It doesn't address our underlying issue - userspace should still handle ENOSPC gracefully. However it certainly seems to improve things considerably, so it would be beneficial if it really is a safe thing to do. Jon ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
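To make "handle ENOSPC gracefully" concrete, a sketch of a userspace-side fallback; map_gtt() and wait_rendering() are hypothetical wrappers around the corresponding ioctls, not real libdrm API:

#include <errno.h>
#include <stddef.h>
#include <stdint.h>

void *map_gtt(int fd, uint32_t handle, size_t size);	/* hypothetical */
void wait_rendering(int fd, uint32_t handle);		/* hypothetical */

void *map_gtt_with_retry(int fd, uint32_t handle, size_t size)
{
	int attempt;

	for (attempt = 0; attempt < 3; attempt++) {
		void *ptr = map_gtt(fd, handle, size);
		if (ptr)
			return ptr;
		if (errno != ENOSPC)
			return NULL;
		/* let in-flight work retire so stale scanouts can be
		 * unpinned before the next attempt
		 */
		wait_rendering(fd, handle);
	}

	return NULL;	/* caller falls back to pread/pwrite paths */
}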
Re: [Intel-gfx] [PATCH] drm/i915: Wait and retry if there is no space in the aperture mappable area
Thanks Daniel, sermon noted. I hadn't twigged that we were pinning buffers to the mappable GTT region which didn't really need to be there. Do we definitely never need to modify or interrogate the hw contexts from the CPU ? What about for debug ? Jon > -Original Message- > From: daniel.vet...@ffwll.ch [mailto:daniel.vet...@ffwll.ch] On Behalf Of > Daniel Vetter > Sent: Friday, November 01, 2013 4:57 PM > To: Chris Wilson > Cc: Bloomfield, Jon; intel-gfx > Subject: Re: [Intel-gfx] [PATCH] drm/i915: Wait and retry if there is no space > in the aperture mappable area > > Dragging this discussion back onto the mailing list. > > The Intel GenX kernel team is massively distributed over multiple continents > and lots of different business groups within Intel, and a few interested > parties outside of Intel. We need to work hard and actively to make sure > everyone is in the loop, that interested parties can jump into a discussion or > at least learn a bit by following it. > > Which means _please_ _always_ cc mailing list. Either the public intel-gfx > list > or when discussing confidential topics (and they can't be redacted out > without causing confusion) the internal gfx-internal-devel mailing list. > > /end of sermon, thanks for your attention > > Short recap for newcomers: Jon's group is hitting reliability issues in stress > testing where gtt mmappings fail with SIGBUS. The frequency sharply > increased when they started to support more outputs/sprites and so > squarely points at mappable gtt fragmentation due to pinned buffers. > > On Fri, Nov 1, 2013 at 1:47 PM, Chris Wilson > wrote: > > On Fri, Nov 01, 2013 at 12:02:07PM +, Bloomfield, Jon wrote: > >> > > > That statement is false. > >> > > Fair enough, but why is it false ? > >> > > >> > You often have buffers larger than the mappable aperture. At > >> > present, such mappings fail (they should be pruned out earlier, but in > >> > the fault they would give a SIGBUS). That limit is oft requested to be > raised by magic. > >> But a mapping like this should fail when the user asks to map the buffer, > not when they subsequently access it. It can never succeed, so tell the user > immediately. > >> > >> > > Userspace fails when accessing a 90MB mapping. > >> > > Don't understand how a trampoline would simplify this - > >> > > trampoline as in > >> > vectored jump ? > >> > > Don't understand the 'educate' comment - they are simply mapping > >> > > a > >> > 90MB buffer and trying to use it. Why should that be slow in the > >> > general case ? > >> > > >> > Failing to map a 90MiB buffer is a kernel bug, pure and simple. > >> > Failing to map a 900MiB buffer is a user education issue. (It could > >> > be done, but there are at least a dozen ways the user could do the > >> > operation differently that will be > >> > faster.) > >> Why a kernel bug ? If there are too many surfaces pinned for display (or > sufficiently fragmented), is it guaranteed that a 90MB map should still > succeed ? > >> > >> > The borderline is whether the user expects a 128MiB mapping to > >> > always work. Worst case scenario (without a > >> > trampoline) is that requires the outputs to be disabled first i.e. > >> > process deadlock which leads to a system deadlock with anything like X. > >> Again, how would a trampoline help ? > > > > All I am stating here is that your idea can^Wwill lead to a system > > deadlock. NAK. > > I agree with Chris, implicit waiting for a random pin count to drop is a > deadlock nightmare. 
We'd need at least an explicit unpin_lru where we track > all soon-to-be-unpinned objects. And then we need to walk that list in > evict_something as a last resort. > > Still resolving the locking inversion this causes between gem and kms code > won't be fun at all, and we still haven't gained any solid guarantees for > userspace. The mappable gtt can still be easily fragmented, it's just a notch > less likely to cause problems. > > We need a better solution, and we need one with actual guarantees for > userspace to depend upon. > > So here's my idea to solve this for real: > - split long-term pinned objects into two classes: Those that might need to be > accessed through the mappable gtt mmio window (like scanout > buffers) and those that don't (like hw contexts). > - Pin objects that are never accessed through the mappable gtt by the cpu to > the
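A short sketch of the two-class idea in code terms. PIN_MAPPABLE is illustrative of a flag-based pin API shape (the pin interface at this point in time took bools rather than flags):

/* Scanout: the CPU pokes at it through the GTT mmio window, so ask
 * for mappable space explicitly.
 */
ret = i915_gem_obj_ggtt_pin(fb_obj, alignment, PIN_MAPPABLE);

/* hw context: GPU-only, never CPU-mapped through the aperture, so
 * pin it anywhere in the GGTT and keep the mappable range free.
 */
ret = i915_gem_obj_ggtt_pin(ctx_obj, 4096, 0);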
Re: [Intel-gfx] [PATCH 20/62] drm/i915/bdw: Add GTT functions
> -Original Message- > From: intel-gfx-boun...@lists.freedesktop.org [mailto:intel-gfx- > boun...@lists.freedesktop.org] On Behalf Of Ben Widawsky > Sent: Sunday, November 03, 2013 4:07 AM > To: Intel GFX > Cc: Daniel Vetter; Ben Widawsky; Widawsky, Benjamin > Subject: [Intel-gfx] [PATCH 20/62] drm/i915/bdw: Add GTT functions > > With the PTE clarifications, the bind and clear functions can now be added for > gen8. > > v2: Use for_each_sg_pages in gen8_ggtt_insert_entries. > > v3: Drop dev argument to pte encode functions, upstream lost it. Also rebase > on top of the scratch page movement. > > v4: Rebase on top of the new address space vfuncs. > > v5: Add the bool use_scratch argument to clear_range and the bool valid > argument to the PTE encode function to follow upstream changes. > > Signed-off-by: Ben Widawsky (v1) > Signed-off-by: Daniel Vetter > --- > drivers/gpu/drm/i915/i915_gem_gtt.c | 88 > +++-- > 1 file changed, 85 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c > b/drivers/gpu/drm/i915/i915_gem_gtt.c > index 8bf2184..df992dc 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -58,6 +58,15 @@ typedef uint64_t gen8_gtt_pte_t; > #define HSW_WB_ELLC_LLC_AGE0 > HSW_CACHEABILITY_CONTROL(0xb) > #define HSW_WT_ELLC_LLC_AGE0 > HSW_CACHEABILITY_CONTROL(0x6) > > +static inline gen8_gtt_pte_t gen8_pte_encode(dma_addr_t addr, > + enum i915_cache_level level, > + bool valid) > +{ > + gen8_gtt_pte_t pte = valid ? _PAGE_PRESENT | _PAGE_RW : 0; > + pte |= addr; > + return pte; > +} > + > static gen6_gtt_pte_t snb_pte_encode(dma_addr_t addr, >enum i915_cache_level level, >bool valid) > @@ -576,6 +585,56 @@ int i915_gem_gtt_prepare_object(struct > drm_i915_gem_object *obj) > return 0; > } > > +static inline void gen8_set_pte(void __iomem *addr, gen8_gtt_pte_t pte) > +{ #ifdef writeq > + writeq(pte, addr); > +#else > + iowrite32((u32)pte, addr); > + iowrite32(pte >> 32, addr + 4); > +#endif > +} > + > +static void gen8_ggtt_insert_entries(struct i915_address_space *vm, > + struct sg_table *st, > + unsigned int first_entry, > + enum i915_cache_level level) > +{ > + struct drm_i915_private *dev_priv = vm->dev->dev_private; > + gen8_gtt_pte_t __iomem *gtt_entries = > + (gen8_gtt_pte_t __iomem *)dev_priv->gtt.gsm + > first_entry; > + int i = 0; > + struct sg_page_iter sg_iter; > + dma_addr_t addr; > + > + for_each_sg_page(st->sgl, &sg_iter, st->nents, 0) { > + addr = sg_dma_address(sg_iter.sg) + > + (sg_iter.sg_pgoffset << PAGE_SHIFT); > + gen8_set_pte(&gtt_entries[i], > + gen8_pte_encode(addr, level, true)); > + i++; > + } > + > + /* XXX: This serves as a posting read to make sure that the PTE has > + * actually been updated. There is some concern that even though > + * registers and PTEs are within the same BAR that they are > potentially > + * of NUMA access patterns. Therefore, even with the way we > assume > + * hardware should work, we must keep this posting read for > paranoia. > + */ > + if (i != 0) > + WARN_ON(readl(&gtt_entries[i-1]) > + != gen8_pte_encode(addr, level, true)); Comparing a u32 with a 64-bit page-table entry ? ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
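On the comment above (readl() only checks the low dword), a sketch of a 64-bit-clean posting read mirroring the #ifdef writeq split in gen8_set_pte(); illustrative, not necessarily the eventual upstream fix:

static inline gen8_gtt_pte_t gen8_get_pte(void __iomem *addr)
{
#ifdef readq
	return readq(addr);
#else
	/* reassemble the two dwords written by gen8_set_pte() */
	return ((u64)ioread32(addr + 4) << 32) | ioread32(addr);
#endif
}

	/* posting read, now comparing the whole 64-bit entry */
	if (i != 0)
		WARN_ON(gen8_get_pte(&gtt_entries[i - 1]) !=
			gen8_pte_encode(addr, level, true));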
Re: [Intel-gfx] [PATCH v5 1/7] drm/i915: Implement a framework for batch buffer pools
> -Original Message- > From: Nguyen, Michael H > Sent: Wednesday, November 26, 2014 9:54 PM > To: intel-gfx@lists.freedesktop.org > Cc: Bloomfield, Jon; Volkin, Bradley D > Subject: [PATCH v5 1/7] drm/i915: Implement a framework for batch buffer > pools > > From: Brad Volkin > > This adds a small module for managing a pool of batch buffers. > The only current use case is for the command parser, as described in the > kerneldoc in the patch. The code is simple, but separating it out makes it > easier to change the underlying algorithms and to extend to future use cases > should they arise. > > The interface is simple: init to create an empty pool, fini to clean it up, > get to > obtain a new buffer. Note that all buffers are expected to be inactive before > cleaning up the pool. > > Locking is currently based on the caller holding the struct_mutex. > We already do that in the places where we will use the batch pool for the > command parser. > > v2: > - s/BUG_ON/WARN_ON/ for locking assertions > - Remove the cap on pool size > - Switch from alloc/free to init/fini > > v3: > - Idiomatic looping structure in _fini > - Correct handling of purged objects > - Don't return a buffer that's too much larger than needed > > v4: > - Rebased to latest -nightly > > v5: > - Remove _put() function and clean up comments to match > > v6: > - Move purged check inside the loop (danvet, from v4 1/7 feedback) > > Issue: VIZ-4719 > Signed-off-by: Brad Volkin > --- > Documentation/DocBook/drm.tmpl | 5 + > drivers/gpu/drm/i915/Makefile | 1 + > drivers/gpu/drm/i915/i915_drv.h| 15 +++ > drivers/gpu/drm/i915/i915_gem.c| 1 + > drivers/gpu/drm/i915/i915_gem_batch_pool.c | 152 > + > 5 files changed, 174 insertions(+) > create mode 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.c > > diff --git a/Documentation/DocBook/drm.tmpl > b/Documentation/DocBook/drm.tmpl index 944e8ed..b5943d4 100644 > --- a/Documentation/DocBook/drm.tmpl > +++ b/Documentation/DocBook/drm.tmpl > @@ -4028,6 +4028,11 @@ int num_ioctls; > !Idrivers/gpu/drm/i915/i915_cmd_parser.c > > > +Batchbuffer Pools > +!Pdrivers/gpu/drm/i915/i915_gem_batch_pool.c batch pool > +!Idrivers/gpu/drm/i915/i915_gem_batch_pool.c > + > + > Logical Rings, Logical Ring Contexts and Execlists > !Pdrivers/gpu/drm/i915/intel_lrc.c Logical Rings, Logical Ring Contexts and > Execlists !Idrivers/gpu/drm/i915/intel_lrc.c > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index e4083e4..c8dbb37d 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -19,6 +19,7 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o > > # GEM code > i915-y += i915_cmd_parser.o \ > + i915_gem_batch_pool.o \ > i915_gem_context.o \ > i915_gem_render_state.o \ > i915_gem_debug.o \ > diff --git a/drivers/gpu/drm/i915/i915_drv.h > b/drivers/gpu/drm/i915/i915_drv.h index c4f2cb6..b12a062 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -1154,6 +1154,12 @@ struct intel_l3_parity { > int which_slice; > }; > > +struct i915_gem_batch_pool { > + struct drm_device *dev; > + struct list_head active_list; > + struct list_head inactive_list; > +}; > + > struct i915_gem_mm { > /** Memory allocator for GTT stolen memory */ > struct drm_mm stolen; > @@ -1885,6 +1891,8 @@ struct drm_i915_gem_object { > /** Used in execbuf to temporarily hold a ref */ > struct list_head obj_exec_link; > > + struct list_head batch_pool_list; > + > /** >* This is set if the object is on the active lists (has pending >* rendering and so a non-zero 
seqno), and is not set if it is on @@ - > 2849,6 +2857,13 @@ void i915_destroy_error_state(struct drm_device > *dev); void i915_get_extra_instdone(struct drm_device *dev, uint32_t > *instdone); const char *i915_cache_level_str(struct drm_i915_private *i915, > int type); > > +/* i915_gem_batch_pool.c */ > +void i915_gem_batch_pool_init(struct drm_device *dev, > + struct i915_gem_batch_pool *pool); void > +i915_gem_batch_pool_fini(struct i915_gem_batch_pool *pool); struct > +drm_i915_gem_object* i915_gem_batch_pool_get(struct > i915_gem_batch_pool > +*pool, size_t size); > + > /* i915_cmd_parser.c */ > int i915_cmd_parser_get_version(void); > int i915_cmd_parser_init_ring(struct intel_engine_cs *ring); diff --git > a/drivers/gpu/drm/i915/i915
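For reference, a sketch of the intended calling pattern for the pool API declared above (struct_mutex held throughout, per the locking note in the commit message):

struct i915_gem_batch_pool pool;
struct drm_i915_gem_object *shadow;

/* once, at setup */
i915_gem_batch_pool_init(dev, &pool);

/* per use: grab a buffer at least "size" bytes large */
shadow = i915_gem_batch_pool_get(&pool, size);
if (IS_ERR(shadow))
	return PTR_ERR(shadow);

/* ... fill and submit "shadow" ... */

/* at teardown; all pool buffers must be inactive again by now */
i915_gem_batch_pool_fini(&pool);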
Re: [Intel-gfx] [PATCH] drm/i915: Add MI_SET_APPID cmd to cmd parser tables
> -Original Message- > From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf > Of michael.h.ngu...@intel.com > Sent: Friday, November 21, 2014 5:36 PM > To: intel-gfx@lists.freedesktop.org > Cc: Vetter, Daniel; Barnes, Jesse > Subject: [Intel-gfx] [PATCH] drm/i915: Add MI_SET_APPID cmd to cmd parser > tables > > From: "Michael H. Nguyen" > > Was missing > > Issue: VIZ-4701 > Signed-off-by: Michael H. Nguyen > --- > drivers/gpu/drm/i915/i915_cmd_parser.c | 14 +++--- > drivers/gpu/drm/i915/i915_reg.h| 3 +++ > 2 files changed, 14 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c > b/drivers/gpu/drm/i915/i915_cmd_parser.c > index 22c992a..364aff7 100644 > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c > @@ -152,6 +152,7 @@ static const struct drm_i915_cmd_descriptor > render_cmds[] = { > CMD( MI_PREDICATE, SMI,F, 1, S ), > CMD( MI_TOPOLOGY_FILTER, SMI,F, 1, S ), > CMD( MI_DISPLAY_FLIP, SMI, !F, 0xFF, R ), > + CMD( MI_SET_APPID, SMI,F, 1, S ), > CMD( MI_SET_CONTEXT, SMI, !F, 0xFF, R ), > CMD( MI_URB_CLEAR, SMI, !F, 0xFF, S ), > CMD( MI_STORE_DWORD_IMM, SMI, !F, 0x3F, B, > @@ -210,6 +211,7 @@ static const struct drm_i915_cmd_descriptor > hsw_render_cmds[] = { > CMD( MI_SET_PREDICATE, SMI,F, 1, S ), > CMD( MI_RS_CONTROL,SMI,F, 1, S ), > CMD( MI_URB_ATOMIC_ALLOC, SMI,F, 1, S ), > + CMD( MI_SET_APPID, SMI,F, 1, S ), > CMD( MI_RS_CONTEXT,SMI,F, 1, S ), > CMD( MI_LOAD_SCAN_LINES_INCL, SMI, !F, 0x3F, M ), > CMD( MI_LOAD_SCAN_LINES_EXCL, SMI, !F, 0x3F, R ), > @@ -229,6 +231,7 @@ static const struct drm_i915_cmd_descriptor > hsw_render_cmds[] = { > > static const struct drm_i915_cmd_descriptor video_cmds[] = { > CMD( MI_ARB_ON_OFF,SMI,F, 1, R ), > + CMD( MI_SET_APPID, SMI,F, 1, S ), > CMD( MI_STORE_DWORD_IMM, SMI, !F, 0xFF, B, > .bits = {{ > .offset = 0, > @@ -272,6 +275,7 @@ static const struct drm_i915_cmd_descriptor > video_cmds[] = { > > static const struct drm_i915_cmd_descriptor vecs_cmds[] = { > CMD( MI_ARB_ON_OFF,SMI,F, 1, R ), > + CMD( MI_SET_APPID, SMI,F, 1, S ), > CMD( MI_STORE_DWORD_IMM, SMI, !F, 0xFF, B, > .bits = {{ > .offset = 0, > @@ -481,13 +485,17 @@ static u32 gen7_bsd_get_cmd_length_mask(u32 > cmd_header) > u32 client = (cmd_header & INSTR_CLIENT_MASK) >> > INSTR_CLIENT_SHIFT; > u32 subclient = > (cmd_header & INSTR_SUBCLIENT_MASK) >> > INSTR_SUBCLIENT_SHIFT; > + u32 op = (cmd_header & INSTR_26_TO_24_MASK) >> > INSTR_26_TO_24_SHIFT; > > if (client == INSTR_MI_CLIENT) > return 0x3F; > else if (client == INSTR_RC_CLIENT) { > - if (subclient == INSTR_MEDIA_SUBCLIENT) > - return 0xFFF; > - else > + if (subclient == INSTR_MEDIA_SUBCLIENT) { > + if (op == 6) > + return 0xFFFF; > + else > + return 0xFFF; > + } else > return 0xFF; > } > > diff --git a/drivers/gpu/drm/i915/i915_reg.h > b/drivers/gpu/drm/i915/i915_reg.h index 7a77cd5..c881d88 100644 > --- a/drivers/gpu/drm/i915/i915_reg.h > +++ b/drivers/gpu/drm/i915/i915_reg.h > @@ -204,6 +204,8 @@ > #define INSTR_SUBCLIENT_SHIFT 27 > #define INSTR_SUBCLIENT_MASK 0x18000000 > #define INSTR_MEDIA_SUBCLIENT 0x2 > +#define INSTR_26_TO_24_MASK 0x7000000 > +#define INSTR_26_TO_24_SHIFT 24 > > /* > * Memory interface instructions used by the kernel @@ -233,6 +235,7 @@ > #define MI_BATCH_BUFFER_END MI_INSTR(0x0a, 0) > #define MI_SUSPEND_FLUSH MI_INSTR(0x0b, 0) > #define MI_SUSPEND_FLUSH_EN(1<<0) > +#define MI_SET_APPID MI_INSTR(0x0e, 0) > #define MI_OVERLAY_FLIP MI_INSTR(0x11, 0) > #define MI_OVERLAY_CONTINUE(0x0<<21) > #define MI_OVERLAY_ON (0x1<<21) 
Reviewed-by: Jon Bloomfield > -- > 1.9.1 > > ___ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mai
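As a worked example of the decode added above: the subclient mask covers bits 28:27 and the new op mask bits 26:24, which follows directly from the shifts. A standalone sketch (client/subclient values taken from the existing INSTR_* defines; only the media path is shown):

/* Length-mask selection for a header in the render client, media
 * subclient: op == 6 selects the new 16-bit length field, everything
 * else keeps the old 12-bit mask.
 */
static u32 media_cmd_length_mask(u32 cmd_header)
{
	u32 client    = (cmd_header & 0xE0000000) >> 29; /* INSTR_CLIENT_* */
	u32 subclient = (cmd_header & 0x18000000) >> 27; /* INSTR_SUBCLIENT_* */
	u32 op        = (cmd_header & 0x7000000) >> 24;  /* INSTR_26_TO_24_* */

	if (client == 0x3 && subclient == 0x2)	/* RC client, media */
		return op == 6 ? 0xFFFF : 0xFFF;

	return 0;	/* other clients elided; see the patch above */
}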
Re: [Intel-gfx] [PATCH v6 1/5] drm/i915: Implement a framework for batch buffer pools
> -Original Message- > From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On Behalf > Of Daniel Vetter > Sent: Tuesday, December 09, 2014 1:18 PM > To: Nguyen, Michael H > Cc: Brad Volkin; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH v6 1/5] drm/i915: Implement a framework for > batch buffer pools > > On Mon, Dec 08, 2014 at 02:33:46PM -0800, michael.h.ngu...@intel.com > wrote: > > From: Brad Volkin > > > > This adds a small module for managing a pool of batch buffers. > > The only current use case is for the command parser, as described in > > the kerneldoc in the patch. The code is simple, but separating it out > > makes it easier to change the underlying algorithms and to extend to > > future use cases should they arise. > > > > The interface is simple: init to create an empty pool, fini to clean > > it up, get to obtain a new buffer. Note that all buffers are expected > > to be inactive before cleaning up the pool. > > > > Locking is currently based on the caller holding the struct_mutex. > > We already do that in the places where we will use the batch pool for > > the command parser. > > > > v2: > > - s/BUG_ON/WARN_ON/ for locking assertions > > - Remove the cap on pool size > > - Switch from alloc/free to init/fini > > > > v3: > > - Idiomatic looping structure in _fini > > - Correct handling of purged objects > > - Don't return a buffer that's too much larger than needed > > > > v4: > > - Rebased to latest -nightly > > > > v5: > > - Remove _put() function and clean up comments to match > > > > v6: > > - Move purged check inside the loop (danvet, from v4 1/7 feedback) > > > > v7: > > - Use single list instead of two. (Chris W) > > - s/active_list/cache_list > > - Squashed in debug patches (Chris W) > > drm/i915: Add a batch pool debugfs file > > > > It provides some useful information about the buffers in > > the global command parser batch pool. > > > > v2: rebase on global pool instead of per-ring pools > > v3: rebase > > > > drm/i915: Add batch pool details to i915_gem_objects debugfs > > > > To better account for the potentially large memory consumption > > of the batch pool. 
> > > > Issue: VIZ-4719 > > Signed-off-by: Brad Volkin > > --- > > Documentation/DocBook/drm.tmpl | 5 ++ > > drivers/gpu/drm/i915/Makefile | 1 + > > drivers/gpu/drm/i915/i915_debugfs.c| 71 ++-- > > drivers/gpu/drm/i915/i915_drv.h| 21 + > > drivers/gpu/drm/i915/i915_gem.c| 1 + > > drivers/gpu/drm/i915/i915_gem_batch_pool.c | 132 > > + > > 6 files changed, 222 insertions(+), 9 deletions(-) create mode > > 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.c > > > > diff --git a/Documentation/DocBook/drm.tmpl > > b/Documentation/DocBook/drm.tmpl index 85287cb..022923a 100644 > > --- a/Documentation/DocBook/drm.tmpl > > +++ b/Documentation/DocBook/drm.tmpl > > @@ -4029,6 +4029,11 @@ int num_ioctls; > > !Idrivers/gpu/drm/i915/i915_cmd_parser.c > > > > > > +Batchbuffer Pools > > +!Pdrivers/gpu/drm/i915/i915_gem_batch_pool.c batch pool > > +!Idrivers/gpu/drm/i915/i915_gem_batch_pool.c > > + > > + > > Logical Rings, Logical Ring Contexts and > > Execlists !Pdrivers/gpu/drm/i915/intel_lrc.c Logical Rings, > > Logical Ring Contexts and Execlists > > !Idrivers/gpu/drm/i915/intel_lrc.c > > diff --git a/drivers/gpu/drm/i915/Makefile > > b/drivers/gpu/drm/i915/Makefile index e4083e4..c8dbb37d 100644 > > --- a/drivers/gpu/drm/i915/Makefile > > +++ b/drivers/gpu/drm/i915/Makefile > > @@ -19,6 +19,7 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o > > > > # GEM code > > i915-y += i915_cmd_parser.o \ > > + i915_gem_batch_pool.o \ > > i915_gem_context.o \ > > i915_gem_render_state.o \ > > i915_gem_debug.o \ > > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c > > b/drivers/gpu/drm/i915/i915_debugfs.c > > index d0e445e..3c3bf98 100644 > > --- a/drivers/gpu/drm/i915/i915_debugfs.c > > +++ b/drivers/gpu/drm/i915/i915_debugfs.c > > @@ -359,6 +359,33 @@ static int per_file_stats(int id, void *ptr, void > *data) > > return 0; > > } > > > > +#define print_file_stats(m, name, stats) \ > > + seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, > %zu global, %zu shared, %zu unbound)\n", \ > > + name, \ > > + stats.count, \ > > + stats.total, \ > > + stats.active, \ > > + stats.inactive, \ > > + stats.global, \ > > + stats.shared, \ > > + stats.unbound) "print_file_stats" might be better named "print_obj_stats" or similar. As it stands there is nothing in its name to suggest its purpose is to describe an object. > > + > > +static void print_batch_pool_stats(struct seq_file *m, > > + struct drm_i915_private *dev_priv) { > > + struct drm_i915_gem_object *obj; >
Re: [Intel-gfx] [PATCH v6 1/5] drm/i915: Implement a framework for batch buffer pools
> -Original Message- > From: Nguyen, Michael H > Sent: Wednesday, December 10, 2014 4:34 PM > To: Bloomfield, Jon > Cc: Daniel Vetter; intel-gfx@lists.freedesktop.org > Subject: Re: [Intel-gfx] [PATCH v6 1/5] drm/i915: Implement a framework for > batch buffer pools > > Hi Jon, > > On 12/10/2014 03:06 AM, Bloomfield, Jon wrote: > >> -Original Message- > >> From: Intel-gfx [mailto:intel-gfx-boun...@lists.freedesktop.org] On > >> Behalf Of Daniel Vetter > >> Sent: Tuesday, December 09, 2014 1:18 PM > >> To: Nguyen, Michael H > >> Cc: Brad Volkin; intel-gfx@lists.freedesktop.org > >> Subject: Re: [Intel-gfx] [PATCH v6 1/5] drm/i915: Implement a > >> framework for batch buffer pools > >> > >> On Mon, Dec 08, 2014 at 02:33:46PM -0800, michael.h.ngu...@intel.com > >> wrote: > >>> From: Brad Volkin > >>> > >>> This adds a small module for managing a pool of batch buffers. > >>> The only current use case is for the command parser, as described in > >>> the kerneldoc in the patch. The code is simple, but separating it > >>> out makes it easier to change the underlying algorithms and to > >>> extend to future use cases should they arise. > >>> > >>> The interface is simple: init to create an empty pool, fini to clean > >>> it up, get to obtain a new buffer. Note that all buffers are > >>> expected to be inactive before cleaning up the pool. > >>> > >>> Locking is currently based on the caller holding the struct_mutex. > >>> We already do that in the places where we will use the batch pool > >>> for the command parser. > >>> > >>> v2: > >>> - s/BUG_ON/WARN_ON/ for locking assertions > >>> - Remove the cap on pool size > >>> - Switch from alloc/free to init/fini > >>> > >>> v3: > >>> - Idiomatic looping structure in _fini > >>> - Correct handling of purged objects > >>> - Don't return a buffer that's too much larger than needed > >>> > >>> v4: > >>> - Rebased to latest -nightly > >>> > >>> v5: > >>> - Remove _put() function and clean up comments to match > >>> > >>> v6: > >>> - Move purged check inside the loop (danvet, from v4 1/7 feedback) > >>> > >>> v7: > >>> - Use single list instead of two. (Chris W) > >>> - s/active_list/cache_list > >>> - Squashed in debug patches (Chris W) > >>>drm/i915: Add a batch pool debugfs file > >>> > >>>It provides some useful information about the buffers in > >>>the global command parser batch pool. > >>> > >>>v2: rebase on global pool instead of per-ring pools > >>>v3: rebase > >>> > >>>drm/i915: Add batch pool details to i915_gem_objects debugfs > >>> > >>>To better account for the potentially large memory consumption > >>>of the batch pool. 
> >>> > >>> Issue: VIZ-4719 > >>> Signed-off-by: Brad Volkin > >>> --- > >>> Documentation/DocBook/drm.tmpl | 5 ++ > >>> drivers/gpu/drm/i915/Makefile | 1 + > >>> drivers/gpu/drm/i915/i915_debugfs.c| 71 ++-- > >>> drivers/gpu/drm/i915/i915_drv.h| 21 + > >>> drivers/gpu/drm/i915/i915_gem.c| 1 + > >>> drivers/gpu/drm/i915/i915_gem_batch_pool.c | 132 > >>> + > >>> 6 files changed, 222 insertions(+), 9 deletions(-) create mode > >>> 100644 drivers/gpu/drm/i915/i915_gem_batch_pool.c > >>> > >>> diff --git a/Documentation/DocBook/drm.tmpl > >>> b/Documentation/DocBook/drm.tmpl index 85287cb..022923a 100644 > >>> --- a/Documentation/DocBook/drm.tmpl > >>> +++ b/Documentation/DocBook/drm.tmpl > >>> @@ -4029,6 +4029,11 @@ int num_ioctls; > >>> !Idrivers/gpu/drm/i915/i915_cmd_parser.c > >>> > >>> > >>> +Batchbuffer Pools > >>> +!Pdrivers/gpu/drm/i915/i915_gem_batch_pool.c batch pool > >>> +!Idrivers/gpu/drm/i915/i915_gem_batch_pool.c > >>> + > >>> + > >>> Logical Rings, Logical Ring Contexts and > >>> Execlists !Pdrivers/gpu/drm/i915/intel_lrc.c Lo
Re: [Intel-gfx] [PATCH v6 4/5] drm/i915: Mark shadow batch buffers as purgeable
> -Original Message- > From: Nguyen, Michael H > Sent: Monday, December 08, 2014 10:34 PM > To: intel-gfx@lists.freedesktop.org > Cc: Bloomfield, Jon; Brad Volkin > Subject: [PATCH v6 4/5] drm/i915: Mark shadow batch buffers as purgeable > > From: Brad Volkin > > By adding a new exec_entry flag, we cleanly mark the shadow objects as > purgeable after they are on the active list. > > v2: > - Move 'shadow_batch_obj->madv = I915_MADV_WILLNEED' inside _get > fnc (danvet, from v4 6/7 feedback) > > Issue: VIZ-4719 > Signed-off-by: Brad Volkin > --- > drivers/gpu/drm/i915/i915_gem_batch_pool.c | 2 ++ > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 ++- > 2 files changed, 12 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c > b/drivers/gpu/drm/i915/i915_gem_batch_pool.c > index e9349e3..9e0ec4b 100644 > --- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c > +++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c > @@ -128,5 +128,7 @@ i915_gem_batch_pool_get(struct > i915_gem_batch_pool *pool, > list_add_tail(&obj->batch_pool_list, &pool->cache_list); > } > > + obj->madv = I915_MADV_WILLNEED; > + > return obj; > } > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > index fccfff5..2071938 100644 > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > @@ -37,6 +37,7 @@ > #define __EXEC_OBJECT_HAS_FENCE (1<<30) #define > __EXEC_OBJECT_NEEDS_MAP (1<<29) #define > __EXEC_OBJECT_NEEDS_BIAS (1<<28) > +#define __EXEC_OBJECT_PURGEABLE (1<<27) > > #define BATCH_OFFSET_BIAS (256*1024) > > @@ -226,7 +227,12 @@ i915_gem_execbuffer_unreserve_vma(struct > i915_vma *vma) > if (entry->flags & __EXEC_OBJECT_HAS_PIN) > vma->pin_count--; > > - entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | > __EXEC_OBJECT_HAS_PIN); > + if (entry->flags & __EXEC_OBJECT_PURGEABLE) > + obj->madv = I915_MADV_DONTNEED; > + > + entry->flags &= ~(__EXEC_OBJECT_HAS_FENCE | > + __EXEC_OBJECT_HAS_PIN | > + __EXEC_OBJECT_PURGEABLE); > } > > static void eb_destroy(struct eb_vmas *eb) @@ -1410,6 +1416,8 @@ > i915_gem_do_execbuffer(struct drm_device *dev, void *data, > goto err; > } > > + shadow_batch_obj->madv = I915_MADV_WILLNEED; > + Hasn't i915_gem_batch_pool_get() already marked the buffer as WILLNEED ? > ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0); > if (ret) > goto err; > @@ -1433,6 +1441,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, > void *data, > > vma = i915_gem_obj_to_ggtt(shadow_batch_obj); > vma->exec_entry = &shadow_exec_entry; > + vma->exec_entry->flags = > __EXEC_OBJECT_PURGEABLE; > drm_gem_object_reference(&shadow_batch_obj- > >base); > list_add_tail(&vma->exec_list, &eb->vmas); > > -- > 1.9.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
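To spell out the lifecycle this patch (together with the _get() change) sets up, a comment-form sketch; note the v3 changelog of the later v7 series removes the duplicated WILLNEED that the question above points at:

/*
 * madv lifecycle of a pool buffer under this scheme:
 *
 *   i915_gem_batch_pool_get()            obj->madv = I915_MADV_WILLNEED
 *   execbuffer flags the shadow vma      __EXEC_OBJECT_PURGEABLE
 *   i915_gem_execbuffer_unreserve_vma()  obj->madv = I915_MADV_DONTNEED
 *
 * so between submissions the shadow batch is purgeable and the
 * shrinker may reclaim its backing pages; _get() revives whichever
 * buffer is handed out next.
 */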
Re: [Intel-gfx] [PATCH v6 5/5] drm/i915: Tidy up execbuffer command parsing code
> -Original Message- > From: Nguyen, Michael H > Sent: Monday, December 08, 2014 10:34 PM > To: intel-gfx@lists.freedesktop.org > Cc: Bloomfield, Jon; Brad Volkin > Subject: [PATCH v6 5/5] drm/i915: Tidy up execbuffer command parsing code > > From: Brad Volkin > > Move it to a separate function since the main do_execbuffer function > already has so much going on. > > v2: > - Move pin/unpin calls inside i915_parse_cmds() (Chris W, v4 7/7 > feedback) > > Issue: VIZ-4719 > Signed-off-by: Brad Volkin > > Conflicts: > drivers/gpu/drm/i915/i915_gem_execbuffer.c > --- > drivers/gpu/drm/i915/i915_cmd_parser.c | 8 ++ > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 126 --- > -- > 2 files changed, 77 insertions(+), 57 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c > b/drivers/gpu/drm/i915/i915_cmd_parser.c > index 118079d..80b1b28 100644 > --- a/drivers/gpu/drm/i915/i915_cmd_parser.c > +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c > @@ -1042,10 +1042,17 @@ int i915_parse_cmds(struct intel_engine_cs > *ring, > struct drm_i915_cmd_descriptor default_desc = { 0 }; > bool oacontrol_set = false; /* OACONTROL tracking. See > check_cmd() */ > > + ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0); > + if (ret) { > + DRM_DEBUG_DRIVER("CMD: Failed to pin shadow batch\n"); > + return -1; > + } > + > batch_base = copy_batch(shadow_batch_obj, batch_obj, > batch_start_offset, batch_len); > if (IS_ERR(batch_base)) { > DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n"); > + i915_gem_object_ggtt_unpin(shadow_batch_obj); > return PTR_ERR(batch_base); > } > > @@ -1116,6 +1123,7 @@ int i915_parse_cmds(struct intel_engine_cs *ring, > } > > vunmap(batch_base); > + i915_gem_object_ggtt_unpin(shadow_batch_obj); > > return ret; > } > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > index 2071938..3d36465 100644 > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c > @@ -1069,6 +1069,65 @@ i915_emit_box(struct intel_engine_cs *ring, > return 0; > } > > +static struct drm_i915_gem_object* > +i915_gem_execbuffer_parse(struct intel_engine_cs *ring, > + struct drm_i915_gem_exec_object2 > *shadow_exec_entry, > + struct eb_vmas *eb, > + struct drm_i915_gem_object *batch_obj, > + u32 batch_start_offset, > + u32 batch_len, > + bool is_master, > + u32 *flags) > +{ > + struct drm_i915_private *dev_priv = to_i915(batch_obj->base.dev); > + struct drm_i915_gem_object *shadow_batch_obj; > + int ret; > + > + shadow_batch_obj = i915_gem_batch_pool_get(&dev_priv- > >mm.batch_pool, > +batch_obj->base.size); > + if (IS_ERR(shadow_batch_obj)) > + return shadow_batch_obj; > + > + ret = i915_parse_cmds(ring, > + batch_obj, > + shadow_batch_obj, > + batch_start_offset, > + batch_len, > + is_master); > + if (ret) { > + if (ret == -EACCES) > + return batch_obj; Shouldn't this be returning an error code ? > + } else { > + struct i915_vma *vma; > + > + memset(shadow_exec_entry, 0, > sizeof(*shadow_exec_entry)); > + > + vma = i915_gem_obj_to_ggtt(shadow_batch_obj); > + vma->exec_entry = shadow_exec_entry; > + vma->exec_entry->flags = __EXEC_OBJECT_PURGEABLE; > + drm_gem_object_reference(&shadow_batch_obj->base); > + list_add_tail(&vma->exec_list, &eb->vmas); > + > + shadow_batch_obj->base.pending_read_domains = > + batch_obj->base.pending_read_domains; > + > + /* > + * Set the DISPATCH_SECURE bit to remove the > NON_SECURE > + * bit from MI_BATCH_BUFFER_START commands issued in > the > + * dispatch_execbuffer implementations. 
We specifically > + * don't want that set when the command parser is > + * enabled. > + * > + * FIXME: with aliasing ppg
Re: [Intel-gfx] [PATCH v7 1/5] drm/i915: Implement a framework for batch buffer pools
> -Original Message- > From: Nguyen, Michael H > Sent: Thursday, December 11, 2014 8:13 PM > To: intel-gfx@lists.freedesktop.org > Cc: Bloomfield, Jon; Brad Volkin > Subject: [PATCH v7 1/5] drm/i915: Implement a framework for batch buffer > pools > > From: Brad Volkin > > This adds a small module for managing a pool of batch buffers. > The only current use case is for the command parser, as described in the > kerneldoc in the patch. The code is simple, but separating it out makes it > easier to change the underlying algorithms and to extend to future use cases > should they arise. > > The interface is simple: init to create an empty pool, fini to clean it up, > get to > obtain a new buffer. Note that all buffers are expected to be inactive before > cleaning up the pool. > > Locking is currently based on the caller holding the struct_mutex. > We already do that in the places where we will use the batch pool for the > command parser. > > v2: > - s/BUG_ON/WARN_ON/ for locking assertions > - Remove the cap on pool size > - Switch from alloc/free to init/fini > > v3: > - Idiomatic looping structure in _fini > - Correct handling of purged objects > - Don't return a buffer that's too much larger than needed > > v4: > - Rebased to latest -nightly > > v5: > - Remove _put() function and clean up comments to match > > v6: > - Move purged check inside the loop (danvet, from v4 1/7 feedback) > > v7: > - Use single list instead of two. (Chris W) > - s/active_list/cache_list > - Squashed in debug patches (Chris W) > drm/i915: Add a batch pool debugfs file > > It provides some useful information about the buffers in > the global command parser batch pool. > > v2: rebase on global pool instead of per-ring pools > v3: rebase > > drm/i915: Add batch pool details to i915_gem_objects debugfs > > To better account for the potentially large memory consumption > of the batch pool. 
> > v8: > - Keep cache in LRU order (danvet, from v6 1/5 feedback) > > Issue: VIZ-4719 > Signed-off-by: Brad Volkin > --- > Documentation/DocBook/drm.tmpl | 5 ++ > drivers/gpu/drm/i915/Makefile | 1 + > drivers/gpu/drm/i915/i915_debugfs.c| 71 +-- > drivers/gpu/drm/i915/i915_drv.h| 21 + > drivers/gpu/drm/i915/i915_gem.c| 1 + > drivers/gpu/drm/i915/i915_gem_batch_pool.c | 135 > + > 6 files changed, 225 insertions(+), 9 deletions(-) create mode 100644 > drivers/gpu/drm/i915/i915_gem_batch_pool.c > > diff --git a/Documentation/DocBook/drm.tmpl > b/Documentation/DocBook/drm.tmpl index 25c23ca..21cbcdb 100644 > --- a/Documentation/DocBook/drm.tmpl > +++ b/Documentation/DocBook/drm.tmpl > @@ -4059,6 +4059,11 @@ int num_ioctls; > !Idrivers/gpu/drm/i915/i915_cmd_parser.c > > > +Batchbuffer Pools > +!Pdrivers/gpu/drm/i915/i915_gem_batch_pool.c batch pool > +!Idrivers/gpu/drm/i915/i915_gem_batch_pool.c > + > + > Logical Rings, Logical Ring Contexts and Execlists > !Pdrivers/gpu/drm/i915/intel_lrc.c Logical Rings, Logical Ring Contexts and > Execlists !Idrivers/gpu/drm/i915/intel_lrc.c > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index 3cf70a6..1849ffa 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -19,6 +19,7 @@ i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o > > # GEM code > i915-y += i915_cmd_parser.o \ > + i915_gem_batch_pool.o \ > i915_gem_context.o \ > i915_gem_render_state.o \ > i915_gem_debug.o \ > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c > b/drivers/gpu/drm/i915/i915_debugfs.c > index 479e0c1..5d96d94 100644 > --- a/drivers/gpu/drm/i915/i915_debugfs.c > +++ b/drivers/gpu/drm/i915/i915_debugfs.c > @@ -359,6 +359,33 @@ static int per_file_stats(int id, void *ptr, void *data) > return 0; > } > > +#define print_file_stats(m, name, stats) \ > + seq_printf(m, "%s: %u objects, %zu bytes (%zu active, %zu inactive, > %zu global, %zu shared, %zu unbound)\n", \ > +name, \ > +stats.count, \ > +stats.total, \ > +stats.active, \ > +stats.inactive, \ > +stats.global, \ > +stats.shared, \ > +stats.unbound) > + > +static void print_batch_pool_stats(struct seq_file *m, > +struct drm_i915_private *dev_priv) { > + struct drm_i915_gem_object *obj; > + struct file_stats stats; >
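For reference, a sketch of the v8 "keep cache in LRU order" point: scan the single cache_list from the head (coldest) and move a reused buffer to the tail (hottest), so idle buffers age toward the front. Field names follow the patch; the busy check is simplified:

	struct drm_i915_gem_object *obj = NULL, *tmp;

	list_for_each_entry(tmp, &pool->cache_list, batch_pool_list) {
		if (tmp->active)	/* still busy on the GPU: skip */
			continue;
		if (tmp->base.size >= size) {
			obj = tmp;
			break;
		}
	}

	if (obj)	/* cache hit: make it most-recently-used */
		list_move_tail(&obj->batch_pool_list, &pool->cache_list);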
Re: [Intel-gfx] [PATCH v7 2/5] drm/i915: Use batch pools with the command parser
> -Original Message- > From: Nguyen, Michael H > Sent: Thursday, December 11, 2014 8:13 PM > To: intel-gfx@lists.freedesktop.org > Cc: Bloomfield, Jon; Brad Volkin > Subject: [PATCH v7 2/5] drm/i915: Use batch pools with the command parser > > From: Brad Volkin > > This patch sets up all of the tracking and copying necessary to use batch > pools > with the command parser and dispatches the copied > (shadow) batch to the hardware. > > After this patch, the parser is in 'enabling' mode. > > Note that performance takes a hit from the copy in some cases and will likely > need some work. At a rough pass, the memcpy appears to be the > bottleneck. Without having done a deeper analysis, two ideas that come to > mind are: > 1) Copy sections of the batch at a time, as they are reached >by parsing. Might improve cache locality. > 2) Copy only up to the userspace-supplied batch length and >memset the rest of the buffer. Reduces the number of reads. > > v2: > - Remove setting the capacity of the pool > - One global pool instead of per-ring pools > - Replace batch_obj with shadow_batch_obj and hook into eb->vmas > - Memset any space in the shadow batch beyond what gets copied > - Rebased on execlist prep refactoring > > v3: > - Rebase on chained batch handling > - Squash in setting the secure dispatch flag > - Add a note about the interaction w/secure dispatch pinning > - Check for request->batch_obj == NULL in i915_gem_free_request > > v4: > - Fix read domains for shadow_batch_obj > - Remove the set_to_gtt_domain call from i915_parse_cmds > - ggtt_pin/unpin in the parser block to simplify error handling > - Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag > - Remove i915_gem_batch_pool_put calls > > v5: > - Move 'pending_read_domains |= I915_GEM_DOMAIN_COMMAND' after > the parser (danvet, from v4 0/7 feedback) > > Issue: VIZ-4719 > Signed-off-by: Brad Volkin > --- Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v7 3/5] drm/i915: Use batch length instead of object size in command parser
> -Original Message- > From: Nguyen, Michael H > Sent: Thursday, December 11, 2014 8:13 PM > To: intel-gfx@lists.freedesktop.org > Cc: Bloomfield, Jon; Brad Volkin > Subject: [PATCH v7 3/5] drm/i915: Use batch length instead of object size in > command parser > > From: Brad Volkin > > Previously we couldn't trust the user-supplied batch length because it came > directly from userspace (i.e. untrusted code). It would have affected what > commands software parsed without regard to what hardware would actually > execute, leaving a potential hole. > > With the parser now copying the user supplied batch buffer and writing > MI_NOP commands to any space after the copied region, we can safely use > the batch length input. This should be a performance win as the actual batch > length is frequently much smaller than the allocated object size. > > v2: Fix handling of non-zero batch_start_offset > > Issue: VIZ-4719 > Signed-off-by: Brad Volkin Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v7 4/5] drm/i915: Mark shadow batch buffers as purgeable
> -Original Message- > From: Nguyen, Michael H > Sent: Thursday, December 11, 2014 8:13 PM > To: intel-gfx@lists.freedesktop.org > Cc: Bloomfield, Jon; Brad Volkin > Subject: [PATCH v7 4/5] drm/i915: Mark shadow batch buffers as purgeable > > From: Brad Volkin > > By adding a new exec_entry flag, we cleanly mark the shadow objects as > purgeable after they are on the active list. > > v2: > - Move 'shadow_batch_obj->madv = I915_MADV_WILLNEED' inside _get > fnc (danvet, from v4 6/7 feedback) > > v3: > - Remove duplicate 'madv = I915_MADV_WILLNEED' (danvet, from v6 4/5) > > Issue: VIZ-4719 > Signed-off-by: Brad Volkin Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v7 5/5] drm/i915: Tidy up execbuffer command parsing code
> -Original Message- > From: Nguyen, Michael H > Sent: Thursday, December 11, 2014 8:13 PM > To: intel-gfx@lists.freedesktop.org > Cc: Bloomfield, Jon; Brad Volkin > Subject: [PATCH v7 5/5] drm/i915: Tidy up execbuffer command parsing code > > From: Brad Volkin > > Move it to a separate function since the main do_execbuffer function > already has so much going on. > > v2: > - Move pin/unpin calls inside i915_parse_cmds() (Chris W, v4 7/7 > feedback) > > Issue: VIZ-4719 > Signed-off-by: Brad Volkin Reviewed-by: Jon Bloomfield ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH] drm/i915: Revert "drm/i915: Reject the pin ioctl on gen6+"
On Fri, Sep 19, 2014 Daniel Vetter wrote: > I still consider pinning stuff behind the kernels back too evil. So if people > want that I'd like to see the use-case and why we can't do this differently. > > And I've never approved of the pin ioctl but really loudly complained about > each occasion I've spotted, so I think the internal users just have to keep > the > pieces for a bit longer. > -Daniel Daniel. All that's being asked for is a re-instatement of the old mechanism until a better solution can be designed and implemented. It may well be evil, but the mechanism was there, and was the only known way to handle the OA buffer, so why wouldn't someone use it ? You've broken userspace before you provided any alternative solution. Please, let's revert this while a more elegant solution is designed, implemented, reviewed, re-implemented, reviewed again, and maybe one day merged. Remember that it will take a while to filter down to people even after you merge any new solution to nightly, or even drm-next. If we have to wait for the new design, it's not likely to reach us until next year. Can't we just agree that we're evil, but turn a blind eye while we're coaxed slowly back towards the light ? Jon ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/intel-gfx