[Intel-gfx] ✓ Fi.CI.IGT: success for Begin enabling Xe_HP SDV and DG2 platforms (rev9)
== Series Details == Series: Begin enabling Xe_HP SDV and DG2 platforms (rev9) URL : https://patchwork.freedesktop.org/series/92135/ State : success == Summary == CI Bug Log - changes from CI_DRM_10451_full -> Patchwork_20776_full Summary --- **SUCCESS** No regressions found. Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_20776_full: ### CI changes ### Suppressed The following results come from untrusted machines, tests, or statuses. They do not affect the overall result. * boot: - {shard-rkl}:([PASS][1], [PASS][2], [PASS][3], [PASS][4], [PASS][5], [PASS][6], [PASS][7], [PASS][8], [PASS][9], [PASS][10], [PASS][11], [PASS][12], [PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], [PASS][20], [PASS][21], [PASS][22]) -> ([PASS][23], [PASS][24], [PASS][25], [PASS][26], [PASS][27], [FAIL][28], [PASS][29], [PASS][30], [PASS][31], [PASS][32], [PASS][33], [PASS][34], [PASS][35], [PASS][36], [PASS][37], [PASS][38], [PASS][39], [PASS][40], [PASS][41], [PASS][42], [PASS][43], [PASS][44], [PASS][45]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-6/boot.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-6/boot.html [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-6/boot.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-6/boot.html [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-5/boot.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-5/boot.html [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-5/boot.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-5/boot.html [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-5/boot.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-2/boot.html [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-2/boot.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-2/boot.html [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-2/boot.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-2/boot.html [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-2/boot.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-2/boot.html [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-1/boot.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-1/boot.html [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-1/boot.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-1/boot.html [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-1/boot.html [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-rkl-1/boot.html [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-6/boot.html [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-6/boot.html [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-6/boot.html [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-5/boot.html [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-5/boot.html [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-5/boot.html [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-5/boot.html [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-5/boot.html [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-5/boot.html [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-2/boot.html [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-2/boot.html [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-2/boot.html [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-2/boot.html [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-2/boot.html [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-2/boot.html [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-1/boot.html [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-1/boot.html [40]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-1/boot.html [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-1/boot.html [42]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-1/boot.html [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20776/shard-rkl-1/boot.html [44]: https://intel-gfx-ci
Re: [Intel-gfx] [PATCH i-g-t v3 03/11] lib/i915/gem_mman: add fixed mode to gem_mmap_offset__cpu
Op 02-08-2021 om 08:29 schreef Dixit, Ashutosh: > On Fri, 30 Jul 2021 01:53:40 -0700, Matthew Auld wrote: >> On discrete we only support the new fixed mode. >> >> Signed-off-by: Matthew Auld >> Cc: Maarten Lankhorst >> Cc: Ashutosh Dixit >> Cc: Daniel Vetter >> Cc: Ramalingam C >> --- >> lib/i915/gem_mman.c | 8 +++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c >> index c432bb16..563a7ccf 100644 >> --- a/lib/i915/gem_mman.c >> +++ b/lib/i915/gem_mman.c >> @@ -474,8 +474,14 @@ void *gem_mmap__cpu(int fd, uint32_t handle, uint64_t >> offset, uint64_t size, uns >> void *__gem_mmap_offset__cpu(int fd, uint32_t handle, uint64_t offset, >> uint64_t size, unsigned prot) >> { >> -return __gem_mmap_offset(fd, handle, offset, size, prot, >> +void *ptr; >> + >> +ptr = __gem_mmap_offset(fd, handle, offset, size, prot, >> I915_MMAP_OFFSET_WB); >> +if (!ptr) >> +ptr = __gem_mmap_offset__fixed(fd, handle, offset, size, prot); >> + >> +return ptr; > Imo there's some asymmetry here. If we are adding fixed mode to > mmap__device_coherent (in the previous patch) then we should also be adding > it to mmap__cpu_coherent (as before). Or, if we are adding fixed mode to > __gem_mmap_offset__cpu we should also be adding it to > __gem_mmap_offset__wc. Thanks. I've applied the patch series for now. Not sure I agree on adding it to __gem_mmap_offset__wc, but if there's a need, we could do that. :)
[Intel-gfx] [PULL] drm-intel-gt-next
Hi Dave & Daniel, Sorry for the big PR in advance. Had the summer vacations and did not notice until tool late how many patches were in already before leaving. As requested, there is a lot of refactoring to increase the use of TTM allocator and prep for DRM scheduler. Note that at times the patches trade simplicity for performance and revert optimizations not seen as critical. So some performance regressions are expected. As an example is dropping of faster GPU relocation path for older platforms, which should be mitigated by updating to the latest UMD versions to regain the perf. This PR drops various bits of uAPI where userspace has dropped the ball after reviews have been completed on both sides (Thanks Jason for doing the due-diligence!). [1] Due to the complexity of tracing back who used what, I think we could do better in the future to avoid such situations to begin with. See below for my suggestion [2]. In addition to the refactoring and uAPI cleanup there is preliminay code for ADL-P/XeHP and DG2 platforms. Fixes for ADL-S and removal of CNL code. A couple of fixes that have been Cc: stable already. Removing unnecessary workarounds per stepping and adding missing for Gen12 iGFX. I915_MMAP_OFFSET_FIXED is also added to to align with the static/fixed caching mode per BO instead of per-mapping mode (for dGFX only). There is GuC firmware interface update and backend code major rework that unblock enabling GuC on Gen11 (not on by default). Then there is the addition of the GuCRC power management feature which can be enabled for Gen12+ when submission is enabled. Then there is finally the drm-next backmerge and 3 topic branch merges. I think that is mostly all. I tried to summarize much in the tag description so it should hopefully not be horribly long... Regards, Joonas [1] Given that Jason is only human and there is no way to automate this analysis, we may have to bring something back as fixes if we find breakage (like with the MMAP IOCTL). [2] As we first require to merge the kernel code, should we introduce some further rules that the userspace has to land their code before the final kernel version release? If that is not followed through, we would neuter the new uAPI with as small patch as possible in the final release candidate and then remove it in next release. Thoughts? *** drm-intel-gt-next-2021-08-06-1: UAPI Changes: - Add I915_MMAP_OFFSET_FIXED On devices with local memory `I915_MMAP_OFFSET_FIXED` is the only valid type. On devices without local memory, this caching mode is invalid. As caching mode when specifying `I915_MMAP_OFFSET_FIXED`, WC or WB will be used, depending on the object placement on creation. WB will be used when the object can only exist in system memory, WC otherwise. Userspace: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11888 - Reinstate the mmap ioctl for (already released) integrated Gen12 platforms Rationale: Otherwise media driver breaks eg. for ADL-P. Long term goal is still to sunset the IOCTL even for integrated and require using mmap_offset. - Reject caching/set_domain IOCTLs on discrete Expected to become immutable property of the BO - Disallow changing context parameters after first use on Gen12 and earlier - Require setting context parameters at creation on platforms after Gen12 Rationale (for both): Allow less dynamic changes to the context to simplify the implementation and avoid user shooting theirselves in the foot. - Drop I915_CONTEXT_PARAM_RINGSIZE Userspace PR for compute-driver has not been merged - Drop I915_CONTEXT_PARAM_NO_ZEROMAP Userspace PR for libdrm / Beignet was never landed - Drop CONTEXT_CLONE API Userspace PR for Mesa was never landed - Drop getparam support for I915_CONTEXT_PARAM_ENGINES Only existed for symmetry wrt. setparam, never used. - Disallow bonding of virtual engines Drop the prep work, no hardware has been released needing it. - (Implicit) Disable gpu relocations Media userspace was the last userspace to still use them. They have converted so performance can be regained with an update. Core Changes: - Merge topic branch 'topic/i915-ttm-2021-06-11' (from Maarten) - Merge topic branch 'topic/revid_steppings' (from Matt R) - Merge topic branch 'topic/xehp-dg2-definitions-2021-07-21' (from Matt R) - Backmerges drm-next (Rodrigo) Driver Changes: - Initial workarounds for ADL-P (Clint) - Preliminary code for XeHP/DG2 (Stuart, Umesh, Matt R, Prathap, Ram, Venkata, Akeem, Tvrtko, John, Lucas) - Fix ADL-S DMA mask size to 39 bits (Tejas) - Remove code for CNL (Lucas) - Add ADL-P GuC/HuC firmwares (John) - Update HuC to 7.9.3 for TGL/ADL-S/RKL (John) - Fix -EDEADLK handling regression (Ville) - Implement Wa_1508744258 for DG1 and Gen12 iGFX (Jose) - Extend Wa_1406941453 to ADL-S (Jose) - Drop unnecessary workarounds per stepping for SKL/BXT/ICL (Matt R) - Use fuse info to enable SFC on Gen12 (Venkata) - Unconditionally flush the pages on acq
Re: [Intel-gfx] [PATCH i-g-t v3 03/11] lib/i915/gem_mman: add fixed mode to gem_mmap_offset__cpu
Op 02-08-2021 om 08:29 schreef Dixit, Ashutosh: > On Fri, 30 Jul 2021 01:53:40 -0700, Matthew Auld wrote: >> On discrete we only support the new fixed mode. >> >> Signed-off-by: Matthew Auld >> Cc: Maarten Lankhorst >> Cc: Ashutosh Dixit >> Cc: Daniel Vetter >> Cc: Ramalingam C >> --- >> lib/i915/gem_mman.c | 8 +++- >> 1 file changed, 7 insertions(+), 1 deletion(-) >> >> diff --git a/lib/i915/gem_mman.c b/lib/i915/gem_mman.c >> index c432bb16..563a7ccf 100644 >> --- a/lib/i915/gem_mman.c >> +++ b/lib/i915/gem_mman.c >> @@ -474,8 +474,14 @@ void *gem_mmap__cpu(int fd, uint32_t handle, uint64_t >> offset, uint64_t size, uns >> void *__gem_mmap_offset__cpu(int fd, uint32_t handle, uint64_t offset, >> uint64_t size, unsigned prot) >> { >> -return __gem_mmap_offset(fd, handle, offset, size, prot, >> +void *ptr; >> + >> +ptr = __gem_mmap_offset(fd, handle, offset, size, prot, >> I915_MMAP_OFFSET_WB); >> +if (!ptr) >> +ptr = __gem_mmap_offset__fixed(fd, handle, offset, size, prot); >> + >> +return ptr; > Imo there's some asymmetry here. If we are adding fixed mode to > mmap__device_coherent (in the previous patch) then we should also be adding > it to mmap__cpu_coherent (as before). Or, if we are adding fixed mode to > __gem_mmap_offset__cpu we should also be adding it to > __gem_mmap_offset__wc. Thanks. Why do we need it in __gem_mmap_offset_wc btw? Rest of series doesn't seem to be blocked by it, so pushed for now.
Re: [Intel-gfx] [PULL] drm-intel-gt-next
Quoting Joonas Lahtinen (2021-08-06 13:06:17) > Hi Dave & Daniel, > > Sorry for the big PR in advance. Had the summer vacations and did not > notice until tool late how many patches were in already before leaving. > > As requested, there is a lot of refactoring to increase the use of TTM > allocator and prep for DRM scheduler. Note that at times the patches trade > simplicity for performance and revert optimizations not seen as critical. > So some performance regressions are expected. > > As an example is dropping of faster GPU relocation path for older platforms, > which should be mitigated by updating to the latest UMD versions to regain > the perf. > > This PR drops various bits of uAPI where userspace has dropped the ball after > reviews > have been completed on both sides (Thanks Jason for doing the > due-diligence!). [1] > Due to the complexity of tracing back who used what, I think we could do > better in the > future to avoid such situations to begin with. See below for my suggestion > [2]. > > In addition to the refactoring and uAPI cleanup there is preliminay code for > ADL-P/XeHP and DG2 platforms. Fixes for ADL-S and removal of CNL code. > A couple of fixes that have been Cc: stable already. Removing unnecessary > workarounds per stepping and adding missing for Gen12 iGFX. > > I915_MMAP_OFFSET_FIXED is also added to to align with the static/fixed caching > mode per BO instead of per-mapping mode (for dGFX only). There is GuC firmware > interface update and backend code major rework that unblock enabling GuC on > Gen11 > (not on by default). Then there is the addition of the GuCRC power management > feature which can be enabled for Gen12+ when submission is enabled. There is also addition of I915_USERPTR_PROBE with userspace changes in: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12044 Regards, Joonas > > Then there is finally the drm-next backmerge and 3 topic branch merges. > > I think that is mostly all. I tried to summarize much in the tag description > so > it should hopefully not be horribly long... > > Regards, Joonas > > [1] Given that Jason is only human and there is no way to automate this > analysis, we > may have to bring something back as fixes if we find breakage (like with the > MMAP IOCTL). > > [2] As we first require to merge the kernel code, should we introduce some > further rules > that the userspace has to land their code before the final kernel version > release? If > that is not followed through, we would neuter the new uAPI with as small > patch as possible > in the final release candidate and then remove it in next release. Thoughts? > > *** > > drm-intel-gt-next-2021-08-06-1: > > UAPI Changes: > > - Add I915_MMAP_OFFSET_FIXED > > On devices with local memory `I915_MMAP_OFFSET_FIXED` is the only valid > type. On devices without local memory, this caching mode is invalid. > > As caching mode when specifying `I915_MMAP_OFFSET_FIXED`, WC or WB will > be used, depending on the object placement on creation. WB will be used > when the object can only exist in system memory, WC otherwise. > > Userspace: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11888 > > - Reinstate the mmap ioctl for (already released) integrated Gen12 platforms > > Rationale: Otherwise media driver breaks eg. for ADL-P. Long term goal is > still to sunset the IOCTL even for integrated and require using mmap_offset. > > - Reject caching/set_domain IOCTLs on discrete > > Expected to become immutable property of the BO > > - Disallow changing context parameters after first use on Gen12 and earlier > - Require setting context parameters at creation on platforms after Gen12 > > Rationale (for both): Allow less dynamic changes to the context to simplify > the implementation and avoid user shooting theirselves in the foot. > > - Drop I915_CONTEXT_PARAM_RINGSIZE > > Userspace PR for compute-driver has not been merged > > - Drop I915_CONTEXT_PARAM_NO_ZEROMAP > > Userspace PR for libdrm / Beignet was never landed > > - Drop CONTEXT_CLONE API > > Userspace PR for Mesa was never landed > > - Drop getparam support for I915_CONTEXT_PARAM_ENGINES > > Only existed for symmetry wrt. setparam, never used. > > - Disallow bonding of virtual engines > > Drop the prep work, no hardware has been released needing it. > > - (Implicit) Disable gpu relocations > > Media userspace was the last userspace to still use them. They > have converted so performance can be regained with an update. > > Core Changes: > > - Merge topic branch 'topic/i915-ttm-2021-06-11' (from Maarten) > - Merge topic branch 'topic/revid_steppings' (from Matt R) > - Merge topic branch 'topic/xehp-dg2-definitions-2021-07-21' (from Matt R) > - Backmerges drm-next (Rodrigo) > > Driver Changes: > > - Initial workarounds for ADL-P (Clint) > - Preliminary code for XeHP/DG2 (Stuart, Umesh, Matt R, Prathap, Ram, > Venkata, Akeem, Tvrtko, John, Lucas) >
Re: [Intel-gfx] [PATCH] drm/i915: Be more gentle when exiting non-persistent contexts
On 05/08/2021 17:32, Matthew Brost wrote: On Thu, Aug 05, 2021 at 01:05:09PM +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin When a non-persistent context exits we currently mark it as banned in order to trigger fast termination of any outstanding GPU jobs it may have left running. In doing so we apply a very strict 1ms limit in which the left over job has to preempt before we issues an engine resets. Some workloads are not able to cleanly preempt in that time window and it can be argued that it would instead be better to give them a bit more grace since avoiding engine resets is generally preferrable. To achieve this the patch splits handling of banned contexts from simply closed non-persistent ones and then applies different timeouts for both and also extends the criteria which determines if a request should be scheduled back in after preemption or not. 15ms preempt timeout grace is given to exited non-persistent contexts which have been empirically tested to satisfy customers requirements and still provides reasonably quick cleanup post exit. I think you need to rework your thinking here a bit as this a very execlists specific solution and the GuC needs to be considered. Slipped my mind GuC patches were merged in the meantime. (This patch predates that.) But I think wording in the commit message is fine. It is just the implementation that now has to handle the GuC as well. v2: * Streamline fast path checks. v3: * Simplify by using only schedulable status. * Increase timeout to 20ms. v4: * Fix live_execlists selftest. v5: * Fix logic in kill_engines. v6: * Rebase. Signed-off-by: Tvrtko Ursulin Cc: Chris Wilson Cc: Zhen Han --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 22 +-- drivers/gpu/drm/i915/gt/intel_context.c | 2 ++ drivers/gpu/drm/i915/gt/intel_context.h | 17 +- drivers/gpu/drm/i915/gt/intel_context_types.h | 1 + .../drm/i915/gt/intel_execlists_submission.c | 11 -- drivers/gpu/drm/i915/gt/selftest_execlists.c | 20 +++-- drivers/gpu/drm/i915/i915_request.c | 2 +- 7 files changed, 57 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index cff72679ad7c..21fe5d4057ab 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1065,7 +1065,8 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce) return engine; } -static void kill_engines(struct i915_gem_engines *engines, bool ban) +static void +kill_engines(struct i915_gem_engines *engines, bool ban, bool persistent) { struct i915_gem_engines_iter it; struct intel_context *ce; @@ -1079,8 +1080,15 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban) */ for_each_gem_engine(ce, engines, it) { struct intel_engine_cs *engine; + bool skip = false; + + if (ban) + skip = intel_context_ban(ce, NULL); + else if (!persistent) + skip = !intel_context_clear_schedulable(ce); schedulable doesn't hook into the backend at all, while intel_context_ban does. In the case of GuC submission intel_context_ban changes to preemption timeout to 1 us and disables scheduling resulting in the context getting kicked off the hardware immediately. You likely need to update intel_context_clear_schedulable to use the same vfunc as intel_context_ban() but accept an argument for the value of the preemption timeout. For a ban user a lower value, for clearing schedulable use a higher value. Okay I'll have a look. Might go back to closed flag as opposed to schedulable as well since I don't quite like schedulable being the odd one out. - if (ban && intel_context_ban(ce, NULL)) + /* Already previously banned or made non-schedulable? */ + if (skip) continue; /* @@ -1093,7 +1101,7 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban) engine = active_engine(ce); /* First attempt to gracefully cancel the context */ - if (engine && !__cancel_engine(engine) && ban) + if (engine && !__cancel_engine(engine) && (ban || !persistent)) /* * If we are unable to send a preemptive pulse to bump * the context from the GPU, we have to resort to a full @@ -1105,8 +1113,6 @@ static void kill_engines(struct i915_gem_engines *engines, bool ban) static void kill_context(struct i915_gem_context *ctx) { - bool ban = (!i915_gem_context_is_persistent(ctx) || - !ctx->i915->params.enable_hangcheck); struct i915_gem_engines *pos, *next; spin_lock_irq(&ctx->stale.lock); @@ -1119,7 +1125,8 @@ static vo
Re: [Intel-gfx] [PATCH 0/4] Enable GuC submission by default on DG1
Hi, On 8/3/21 7:26 PM, Matthew Brost wrote: On Tue, Aug 03, 2021 at 02:15:13PM +0200, Daniel Vetter wrote: On Tue, Aug 3, 2021 at 6:53 AM Matthew Brost wrote: Minimum set of patches to enable GuC submission on DG1 and enable it by default. A little difficult to test as IGTs do not work with DG1 due to a bunch of uAPI features being disabled (e.g. relocations, caching memory options, etc...). Matt Auld has an igt series which fixes a lot of this stuff, would be good to do at least a Test-With run with that. It looks like Maarten now merged Matt's series to IGT. There is a series on IGT trybot with pending work to have some igt tests support relocations, https://patchwork.freedesktop.org/series/92043/ One of the tests that have WIP fixes is gem_exec_whisper, and that particular test has historically shown occasional hangs with GuC submission on DG1 so it would be very desirable if we could make that test in particular work (I haven't verified that that's the case) reliably. Also the following series: https://patchwork.freedesktop.org/series/93455/ tries a bit harder to get some more tests running, squashing the above series on top of latest IGT. Thanks, /Thomas
Re: [Intel-gfx] [PATCH 0/4] Enable GuC submission by default on DG1
On 8/6/21 1:34 PM, Thomas Hellström (Intel) wrote: Hi, On 8/3/21 7:26 PM, Matthew Brost wrote: On Tue, Aug 03, 2021 at 02:15:13PM +0200, Daniel Vetter wrote: On Tue, Aug 3, 2021 at 6:53 AM Matthew Brost wrote: Minimum set of patches to enable GuC submission on DG1 and enable it by default. A little difficult to test as IGTs do not work with DG1 due to a bunch of uAPI features being disabled (e.g. relocations, caching memory options, etc...). Matt Auld has an igt series which fixes a lot of this stuff, would be good to do at least a Test-With run with that. It looks like Maarten now merged Matt's series to IGT. There is a series on IGT trybot with pending work to have some igt tests support relocations, https://patchwork.freedesktop.org/series/92043/ One of the tests that have WIP fixes is gem_exec_whisper, and that particular test has historically shown occasional hangs with GuC submission on DG1 so it would be very desirable if we could make that test in particular work (I haven't verified that that's the case) reliably. Also the following series: https://patchwork.freedesktop.org/series/93455/ tries a bit harder to get some more tests running, squashing the above series on top of latest IGT. Thanks, /Thomas And also while we're working on getting igt adapted to uapi changes and to get more LMEM coverage in there, an IMO relevant test case to run manually is "piglit quick"on top of DG1-enabled OpenGL checking for regressions and significant changes in execution time. /Thomas
[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/gvt: Fix cached atomics setting for Windows VM (rev2)
== Series Details == Series: drm/i915/gvt: Fix cached atomics setting for Windows VM (rev2) URL : https://patchwork.freedesktop.org/series/92809/ State : failure == Summary == CI Bug Log - changes from CI_DRM_10454_full -> Patchwork_20779_full Summary --- **FAILURE** Serious unknown changes coming with Patchwork_20779_full absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_20779_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_20779_full: ### IGT changes ### Possible regressions * igt@gem_exec_schedule@u-submit-golden-slice@vecs0: - shard-iclb: [PASS][1] -> [INCOMPLETE][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb5/igt@gem_exec_schedule@u-submit-golden-sl...@vecs0.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20779/shard-iclb4/igt@gem_exec_schedule@u-submit-golden-sl...@vecs0.html Known issues Here are the changes found in Patchwork_20779_full that come from known issues: ### CI changes ### Issues hit * boot: - shard-iclb: ([PASS][3], [PASS][4], [PASS][5], [PASS][6], [PASS][7], [PASS][8], [PASS][9], [PASS][10], [PASS][11], [PASS][12], [PASS][13], [PASS][14], [PASS][15], [PASS][16], [PASS][17], [PASS][18], [PASS][19], [PASS][20], [PASS][21], [PASS][22], [PASS][23], [PASS][24], [PASS][25], [PASS][26], [PASS][27]) -> ([PASS][28], [PASS][29], [PASS][30], [PASS][31], [PASS][32], [PASS][33], [PASS][34], [PASS][35], [PASS][36], [PASS][37], [FAIL][38], [PASS][39], [PASS][40], [PASS][41], [PASS][42], [PASS][43], [PASS][44], [PASS][45], [PASS][46], [PASS][47], [PASS][48], [PASS][49], [PASS][50], [PASS][51], [PASS][52]) ([i915#3521]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb3/boot.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb8/boot.html [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb8/boot.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb7/boot.html [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb7/boot.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb7/boot.html [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb6/boot.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb6/boot.html [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb6/boot.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb6/boot.html [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb5/boot.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb5/boot.html [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb5/boot.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb4/boot.html [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb4/boot.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb4/boot.html [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb4/boot.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb3/boot.html [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb3/boot.html [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb2/boot.html [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb2/boot.html [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb2/boot.html [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb1/boot.html [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb1/boot.html [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10454/shard-iclb1/boot.html [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20779/shard-iclb1/boot.html [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20779/shard-iclb1/boot.html [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20779/shard-iclb1/boot.html [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20779/shard-iclb2/boot.html [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20779/shard-iclb2/boot.html [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20779/shard-iclb2/boot.html [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20779/shard-iclb2/boot.html [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20779/shard-iclb3/boot.html [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20779/shard-iclb3/boot.html [37]: ht
Re: [Intel-gfx] [PATCH 0/4] Enable GuC submission by default on DG1
On 8/6/21 1:34 PM, Thomas Hellström (Intel) wrote: Hi, On 8/3/21 7:26 PM, Matthew Brost wrote: On Tue, Aug 03, 2021 at 02:15:13PM +0200, Daniel Vetter wrote: On Tue, Aug 3, 2021 at 6:53 AM Matthew Brost wrote: Minimum set of patches to enable GuC submission on DG1 and enable it by default. A little difficult to test as IGTs do not work with DG1 due to a bunch of uAPI features being disabled (e.g. relocations, caching memory options, etc...). Matt Auld has an igt series which fixes a lot of this stuff, would be good to do at least a Test-With run with that. It looks like Maarten now merged Matt's series to IGT. There is a series on IGT trybot with pending work to have some igt tests support relocations, https://patchwork.freedesktop.org/series/92043/ One of the tests that have WIP fixes is gem_exec_whisper, and that particular test has historically shown occasional hangs with GuC submission on DG1 so it would be very desirable if we could make that test in particular work (I haven't verified that that's the case) reliably. Also the following series: https://patchwork.freedesktop.org/series/93455/ tries a bit harder to get some more tests running, squashing the above series on top of latest IGT. Thanks, /Thomas Just verified that gem-exec-whisper is running now on DG1 on latest igt and the above series. Haven't tried with GuC submission, though. /Thomas
Re: [Intel-gfx] [PATCH v5 02/20] drm/msm: Fix drm/sched point of no return rules
On Fri, Aug 6, 2021 at 12:58 AM Rob Clark wrote: > > On Thu, Aug 5, 2021 at 3:47 AM Daniel Vetter wrote: > > > > Originally drm_sched_job_init was the point of no return, after which > > drivers must submit a job. I've split that up, which allows us to fix > > this issue pretty easily. > > > > Only thing we have to take care of is to not skip to error paths after > > that. Other drivers do this the same for out-fence and similar things. > > > > Fixes: 1d8a5ca436ee ("drm/msm: Conversion to drm scheduler") > > Cc: Rob Clark > > Cc: Rob Clark > > Cc: Sean Paul > > Cc: Sumit Semwal > > Cc: "Christian König" > > Cc: linux-arm-...@vger.kernel.org > > Cc: dri-de...@lists.freedesktop.org > > Cc: freedr...@lists.freedesktop.org > > Cc: linux-me...@vger.kernel.org > > Cc: linaro-mm-...@lists.linaro.org > > Signed-off-by: Daniel Vetter > > --- > > drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++ > > 1 file changed, 7 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c > > b/drivers/gpu/drm/msm/msm_gem_submit.c > > index 6d6c44f0e1f3..d0ed4ddc509e 100644 > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > > @@ -52,9 +52,6 @@ static struct msm_gem_submit *submit_create(struct > > drm_device *dev, > > return ERR_PTR(ret); > > } > > > > - /* FIXME: this is way too early */ > > - drm_sched_job_arm(&job->base); > > - > > xa_init_flags(&submit->deps, XA_FLAGS_ALLOC); > > > > kref_init(&submit->ref); > > @@ -883,6 +880,9 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > > *data, > > > > submit->user_fence = dma_fence_get(&submit->base.s_fence->finished); > > > > + /* point of no return, we _have_ to submit no matter what */ > > + drm_sched_job_arm(&submit->base); > > + > > /* > > * Allocate an id which can be used by WAIT_FENCE ioctl to map back > > * to the underlying fence. > > @@ -892,17 +892,16 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > > *data, > > if (submit->fence_id < 0) { > > ret = submit->fence_id = 0; > > submit->fence_id = 0; > > - goto out; > > } > > > > - if (args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > + if (ret == 0 && args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > struct sync_file *sync_file = > > sync_file_create(submit->user_fence); > > if (!sync_file) { > > ret = -ENOMEM; > > - goto out; > > + } else { > > + fd_install(out_fence_fd, sync_file->file); > > + args->fence_fd = out_fence_fd; > > } > > - fd_install(out_fence_fd, sync_file->file); > > - args->fence_fd = out_fence_fd; > > I wonder if instead we should (approximately) undo "drm/msm/submit: > Simplify out-fence-fd handling" so that the point that it could fail > is moved up ahead of the drm_sched_job_arm()? Hm yeah. Up to you how you want to paint this shed, I think either is fine. > Also, does the dma_fence_get() work before drm_sched_job_arm()? From > a quick look, it looks like it won't, but I'm still playing catchup > and haven't had a chance to look at your entire series. If it doesn't > work before drm_sched_job_arm(), then there is really no way to > prevent a error path between the fence-init and job-submit. Yes. I thought I've checked that I put the _arm() in the right spot, but I guess I screwed up and you need the fence before the point where I've put the job_arm()? And yes the error path cannot be avoided for out-fences, that's what I tried to explain in the commit message. > But, prior to your series, wouldn't a failure after > drm_sched_job_init() but before the job is submitted just burn a > fence-id, and otherwise carry on it's merry way? Maybe? I'm not sure whether the scheduler gets confused about the gap and freak out abou that. I'm fairly new to that code and learning (which is part why I'm working on it). Since you look up in fences/syncobj after job_init() it should be pretty easy to whip up a testcase and see what happens. Also as long as nothing fails you won't see an issue, that's for sure. -Daniel > BR, > -R > > > } > > > > submit_attach_object_fences(submit); > > -- > > 2.32.0 > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH v5 02/20] drm/msm: Fix drm/sched point of no return rules
On Fri, Aug 6, 2021 at 9:42 AM Daniel Vetter wrote: > > On Fri, Aug 6, 2021 at 12:58 AM Rob Clark wrote: > > > > On Thu, Aug 5, 2021 at 3:47 AM Daniel Vetter wrote: > > > > > > Originally drm_sched_job_init was the point of no return, after which > > > drivers must submit a job. I've split that up, which allows us to fix > > > this issue pretty easily. > > > > > > Only thing we have to take care of is to not skip to error paths after > > > that. Other drivers do this the same for out-fence and similar things. > > > > > > Fixes: 1d8a5ca436ee ("drm/msm: Conversion to drm scheduler") > > > Cc: Rob Clark > > > Cc: Rob Clark > > > Cc: Sean Paul > > > Cc: Sumit Semwal > > > Cc: "Christian König" > > > Cc: linux-arm-...@vger.kernel.org > > > Cc: dri-de...@lists.freedesktop.org > > > Cc: freedr...@lists.freedesktop.org > > > Cc: linux-me...@vger.kernel.org > > > Cc: linaro-mm-...@lists.linaro.org > > > Signed-off-by: Daniel Vetter > > > --- > > > drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++ > > > 1 file changed, 7 insertions(+), 8 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c > > > b/drivers/gpu/drm/msm/msm_gem_submit.c > > > index 6d6c44f0e1f3..d0ed4ddc509e 100644 > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > > > @@ -52,9 +52,6 @@ static struct msm_gem_submit *submit_create(struct > > > drm_device *dev, > > > return ERR_PTR(ret); > > > } > > > > > > - /* FIXME: this is way too early */ > > > - drm_sched_job_arm(&job->base); > > > - > > > xa_init_flags(&submit->deps, XA_FLAGS_ALLOC); > > > > > > kref_init(&submit->ref); > > > @@ -883,6 +880,9 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void > > > *data, > > > > > > submit->user_fence = > > > dma_fence_get(&submit->base.s_fence->finished); > > > > > > + /* point of no return, we _have_ to submit no matter what */ > > > + drm_sched_job_arm(&submit->base); > > > + > > > /* > > > * Allocate an id which can be used by WAIT_FENCE ioctl to map > > > back > > > * to the underlying fence. > > > @@ -892,17 +892,16 @@ int msm_ioctl_gem_submit(struct drm_device *dev, > > > void *data, > > > if (submit->fence_id < 0) { > > > ret = submit->fence_id = 0; > > > submit->fence_id = 0; > > > - goto out; > > > } > > > > > > - if (args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > > + if (ret == 0 && args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > > struct sync_file *sync_file = > > > sync_file_create(submit->user_fence); > > > if (!sync_file) { > > > ret = -ENOMEM; > > > - goto out; > > > + } else { > > > + fd_install(out_fence_fd, sync_file->file); > > > + args->fence_fd = out_fence_fd; > > > } > > > - fd_install(out_fence_fd, sync_file->file); > > > - args->fence_fd = out_fence_fd; > > > > I wonder if instead we should (approximately) undo "drm/msm/submit: > > Simplify out-fence-fd handling" so that the point that it could fail > > is moved up ahead of the drm_sched_job_arm()? > > Hm yeah. Up to you how you want to paint this shed, I think either is fine. > > > Also, does the dma_fence_get() work before drm_sched_job_arm()? From > > a quick look, it looks like it won't, but I'm still playing catchup > > and haven't had a chance to look at your entire series. If it doesn't > > work before drm_sched_job_arm(), then there is really no way to > > prevent a error path between the fence-init and job-submit. > > Yes. I thought I've checked that I put the _arm() in the right spot, > but I guess I screwed up and you need the fence before the point where > I've put the job_arm()? And yes the error path cannot be avoided for > out-fences, that's what I tried to explain in the commit message. > > > But, prior to your series, wouldn't a failure after > > drm_sched_job_init() but before the job is submitted just burn a > > fence-id, and otherwise carry on it's merry way? > > Maybe? I'm not sure whether the scheduler gets confused about the gap > and freak out abou that. I'm fairly new to that code and learning > (which is part why I'm working on it). Since you look up in > fences/syncobj after job_init() it should be pretty easy to whip up a > testcase and see what happens. Also as long as nothing fails you won't > see an issue, that's for sure. fair.. I'll try to come up with a test case.. pre-scheduler-conversion it wasn't a problem to fail after the fence seqno was allocated (well, I guess you might have problems if you had 2^31 failures in a row) BR, -R > -Daniel > > > BR, > > -R > > > > > } > > > > > > submit_attach_object_fences(submit); > > > -- > > > 2.32.0 > > > > > > > -- > Daniel Vetter > Software Engineer,
[Intel-gfx] [PATCH v5.1 4/9] drm/i915/xehpsdv: Add compute DSS type
From: Stuart Summers Starting in XeHP, the concept of slice has been removed in favor of DSS (Dual-Subslice) masks for various workload types. These workloads have been divided into those enabled for geometry and those enabled for compute. i915 currently maintains a single set of S/SS/EU masks for the device. The goal of this patch set is to minimize the amount of impact to prior generations while still giving the user maximum flexibility. v2: - Generalize a comment about uapi access to geometry/compute masks; the proposed uapi has changed since the comment was first written, and will show up in a future series once the userspace code is published. (Lucas) v3: - Eliminate unnecessary has_compute_dss flag. (Lucas) - Drop unwanted comment change in uapi header. (Lucas) Bspec: 33117, 33118, 20376 Cc: Daniele Ceraolo Spurio Cc: Matt Roper Cc: Lucas De Marchi Signed-off-by: Stuart Summers Signed-off-by: Steve Hampson Signed-off-by: Matt Roper --- drivers/gpu/drm/i915/gt/intel_sseu.c | 60 +--- drivers/gpu/drm/i915/gt/intel_sseu.h | 4 +- drivers/gpu/drm/i915/i915_reg.h | 3 +- 3 files changed, 50 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_sseu.c b/drivers/gpu/drm/i915/gt/intel_sseu.c index bbd272943c3f..b0e09b58005e 100644 --- a/drivers/gpu/drm/i915/gt/intel_sseu.c +++ b/drivers/gpu/drm/i915/gt/intel_sseu.c @@ -46,11 +46,11 @@ u32 intel_sseu_get_subslices(const struct sseu_dev_info *sseu, u8 slice) } void intel_sseu_set_subslices(struct sseu_dev_info *sseu, int slice, - u32 ss_mask) + u8 *subslice_mask, u32 ss_mask) { int offset = slice * sseu->ss_stride; - memcpy(&sseu->subslice_mask[offset], &ss_mask, sseu->ss_stride); + memcpy(&subslice_mask[offset], &ss_mask, sseu->ss_stride); } unsigned int @@ -100,14 +100,24 @@ static u16 compute_eu_total(const struct sseu_dev_info *sseu) return total; } -static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, - u8 s_en, u32 ss_en, u16 eu_en) +static u32 get_ss_stride_mask(struct sseu_dev_info *sseu, u8 s, u32 ss_en) +{ + u32 ss_mask; + + ss_mask = ss_en >> (s * sseu->max_subslices); + ss_mask &= GENMASK(sseu->max_subslices - 1, 0); + + return ss_mask; +} + +static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, u8 s_en, + u32 g_ss_en, u32 c_ss_en, u16 eu_en) { int s, ss; - /* ss_en represents entire subslice mask across all slices */ + /* g_ss_en/c_ss_en represent entire subslice mask across all slices */ GEM_BUG_ON(sseu->max_slices * sseu->max_subslices > - sizeof(ss_en) * BITS_PER_BYTE); + sizeof(g_ss_en) * BITS_PER_BYTE); for (s = 0; s < sseu->max_slices; s++) { if ((s_en & BIT(s)) == 0) @@ -115,7 +125,22 @@ static void gen11_compute_sseu_info(struct sseu_dev_info *sseu, sseu->slice_mask |= BIT(s); - intel_sseu_set_subslices(sseu, s, ss_en); + /* +* XeHP introduces the concept of compute vs geometry DSS. To +* reduce variation between GENs around subslice usage, store a +* mask for both the geometry and compute enabled masks since +* userspace will need to be able to query these masks +* independently. Also compute a total enabled subslice count +* for the purposes of selecting subslices to use in a +* particular GEM context. +*/ + intel_sseu_set_subslices(sseu, s, sseu->compute_subslice_mask, +get_ss_stride_mask(sseu, s, c_ss_en)); + intel_sseu_set_subslices(sseu, s, sseu->geometry_subslice_mask, +get_ss_stride_mask(sseu, s, g_ss_en)); + intel_sseu_set_subslices(sseu, s, sseu->subslice_mask, +get_ss_stride_mask(sseu, s, + g_ss_en | c_ss_en)); for (ss = 0; ss < sseu->max_subslices; ss++) if (intel_sseu_has_subslice(sseu, s, ss)) @@ -129,7 +154,7 @@ static void gen12_sseu_info_init(struct intel_gt *gt) { struct sseu_dev_info *sseu = >->info.sseu; struct intel_uncore *uncore = gt->uncore; - u32 dss_en; + u32 g_dss_en, c_dss_en = 0; u16 eu_en = 0; u8 eu_en_fuse; u8 s_en; @@ -160,7 +185,9 @@ static void gen12_sseu_info_init(struct intel_gt *gt) s_en = intel_uncore_read(uncore, GEN11_GT_SLICE_ENABLE) & GEN11_GT_S_ENA_MASK; - dss_en = intel_uncore_read(uncore, GEN12_GT_DSS_ENABLE); + g_dss_en = intel_uncore_read(uncore, GEN12_GT_GEOMETRY_DSS_ENABLE); +
[Intel-gfx] [PATCH] drm/i915: Only access SFC_DONE when media domain is not fused off
The SFC_DONE register lives within the corresponding VD0/VD2/VD4/VD6 forcewake domain and is not accessible if the vdbox in that domain is fused off and the forcewake is not initialized. This mistake went unnoticed because until recently we were using the wrong register offset for the SFC_DONE register; once the register offset was corrected, we started hitting errors like <4> [544.989065] i915 :cc:00.0: Uninitialized forcewake domain(s) 0x80 accessed at 0x1ce000 on parts with fused-off vdbox engines. Fixes: e50dbdbfd9fb ("drm/i915/tgl: Add SFC instdone to error state") Fixes: 82929a2140eb ("drm/i915: Correct SFC_DONE register offset") Cc: Daniele Ceraolo Spurio Cc: Mika Kuoppala Signed-off-by: Matt Roper --- drivers/gpu/drm/i915/i915_gpu_error.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 0f08bcfbe964..9cf6ac575de1 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -727,9 +727,18 @@ static void err_print_gt(struct drm_i915_error_state_buf *m, if (GRAPHICS_VER(m->i915) >= 12) { int i; - for (i = 0; i < GEN12_SFC_DONE_MAX; i++) + for (i = 0; i < GEN12_SFC_DONE_MAX; i++) { + /* +* SFC_DONE resides in the VD forcewake domain, so it +* only exists if the corresponding VCS engine is +* present. +*/ + if (!HAS_ENGINE(gt->_gt, _VCS(i * 2))) + continue; + err_printf(m, " SFC_DONE[%d]: 0x%08x\n", i, gt->sfc_done[i]); + } err_printf(m, " GAM_DONE: 0x%08x\n", gt->gam_done); } @@ -1598,6 +1607,14 @@ static void gt_record_regs(struct intel_gt_coredump *gt) if (GRAPHICS_VER(i915) >= 12) { for (i = 0; i < GEN12_SFC_DONE_MAX; i++) { + /* +* SFC_DONE resides in the VD forcewake domain, so it +* only exists if the corresponding VCS engine is +* present. +*/ + if (!HAS_ENGINE(gt->_gt, _VCS(i * 2))) + continue; + gt->sfc_done[i] = intel_uncore_read(uncore, GEN12_SFC_DONE(i)); } -- 2.25.4
Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context
On 8/2/2021 02:40, Tvrtko Ursulin wrote: On 30/07/2021 19:13, John Harrison wrote: On 7/30/2021 02:49, Tvrtko Ursulin wrote: On 30/07/2021 01:13, John Harrison wrote: On 7/28/2021 17:34, Matthew Brost wrote: If an engine associated with a context does not have a heartbeat, ban it immediately. This is needed for GuC submission as a idle pulse doesn't kick the context off the hardware where it then can check for a heartbeat and ban the context. Pulse, that is a request with I915_PRIORITY_BARRIER, does not preempt a running normal priority context? Why does it matter then whether or not heartbeats are enabled - when heartbeat just ends up sending the same engine pulse (eventually, with raising priority)? The point is that the pulse is pointless. See the rest of my comments below, specifically "the context will get resubmitted to the hardware after the pulse completes". To re-iterate... Yes, it preempts the context. Yes, it does so whether heartbeats are enabled or not. But so what? Who cares? You have preempted a context. It is no longer running on the hardware. BUT IT IS STILL A VALID CONTEXT. It is valid yes, and it even may be the current ABI so another question is whether it is okay to change that. The backend scheduler will just resubmit it to the hardware as soon as the pulse completes. The only reason this works at all is because of the horrid hack in the execlist scheduler's back end implementation (in __execlists_schedule_in): if (unlikely(intel_context_is_closed(ce) && !intel_engine_has_heartbeat(engine))) intel_context_set_banned(ce); Right, is the above code then needed with this patch - when ban is immediately applied on the higher level? The actual back end scheduler is saying "Is this a zombie context? Is the heartbeat disabled? Then ban it". No other scheduler backend is going to have knowledge of zombie context status or of the heartbeat status. Nor are they going to call back into the higher levels of the i915 driver to trigger a ban operation. Certainly a hardware implemented scheduler is not going to be looking at private i915 driver information to decide whether to submit a context or whether to tell the OS to kill it off instead. For persistence to work with a hardware scheduler (or a non-Intel specific scheduler such as the DRM one), the handling of zombie contexts, banning, etc. *must* be done entirely in the front end. It cannot rely on any backend hacks. That means you can't rely on any fancy behaviour of pulses. If you want to ban a context then you must explicitly ban that context. If you want to ban it at some later point then you need to track it at the top level as a zombie and then explicitly ban that zombie at whatever later point. I am still trying to understand it all. If I go by the commit message: """ This is needed for GuC submission as a idle pulse doesn't kick the context off the hardware where it then can check for a heartbeat and ban the context. """ That did not explain things for me. Sentence does not appear to make sense. Now, it seems "kick off the hardware" is meant as revoke and not just preempt. Which is fine, perhaps just needs to be written more explicitly. But the part of checking for heartbeat after idle pulse does not compute for me. It is the heartbeat which emits idle pulses, not idle pulse emitting heartbeats. I am in agreement that the commit message is confusing and does not explain either the problem or the solution. But anyway, I can buy the handling at the front end story completely. It makes sense. We just need to agree that a) it is okay to change the ABI and b) remove the backend check from execlists if it is not needed any longer. And if ABI change is okay then commit message needs to talk about it loudly and clearly. I don't think we have a choice. The current ABI is not and cannot ever be compatible with any scheduler external to i915. It cannot be implemented with a hardware scheduler such as the GuC and it cannot be implemented with an external software scheduler such as the DRM one. My view is that any implementation involving knowledge of the heartbeat is fundamentally broken. According to Daniel Vetter, the DRM ABI on this subject is that an actively executing context should persist until the DRM file handle is closed. That seems like a much more plausible and simple ABI than one that says 'if the heartbeat is running then a context will persist forever, if the heartbeat is not running then it will be killed immediately, if the heart was running but then stops running then the context will be killed on the next context switch, ...'. And if I understand it correctly, the current ABI allows a badly written user app to cause a denial of service by leaving contexts permanently running an infinite loop on the hardware even after the app has been killed! How can that ever be considered a good idea? Th
[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Begin enabling Xe_HP SDV and DG2 platforms (rev10)
== Series Details == Series: Begin enabling Xe_HP SDV and DG2 platforms (rev10) URL : https://patchwork.freedesktop.org/series/92135/ State : warning == Summary == $ dim checkpatch origin/drm-tip 0ebc963f011d drm/i915/xehp: Loop over all gslices for INSTDONE processing -:135: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'iter_' - possible side-effects? #135: FILE: drivers/gpu/drm/i915/gt/intel_engine_types.h:582: +#define for_each_instdone_gslice_dss_xehp(dev_priv_, sseu_, iter_, gslice_, dss_) \ + for ((iter_) = 0, (gslice_) = 0, (dss_) = 0; \ +(iter_) < GEN_MAX_SUBSLICES; \ +(iter_)++, (gslice_) = (iter_) / GEN_DSS_PER_GSLICE, \ +(dss_) = (iter_) % GEN_DSS_PER_GSLICE) \ + for_each_if(intel_sseu_has_subslice((sseu_), 0, (iter_))) -:135: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'gslice_' - possible side-effects? #135: FILE: drivers/gpu/drm/i915/gt/intel_engine_types.h:582: +#define for_each_instdone_gslice_dss_xehp(dev_priv_, sseu_, iter_, gslice_, dss_) \ + for ((iter_) = 0, (gslice_) = 0, (dss_) = 0; \ +(iter_) < GEN_MAX_SUBSLICES; \ +(iter_)++, (gslice_) = (iter_) / GEN_DSS_PER_GSLICE, \ +(dss_) = (iter_) % GEN_DSS_PER_GSLICE) \ + for_each_if(intel_sseu_has_subslice((sseu_), 0, (iter_))) -:135: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'dss_' - possible side-effects? #135: FILE: drivers/gpu/drm/i915/gt/intel_engine_types.h:582: +#define for_each_instdone_gslice_dss_xehp(dev_priv_, sseu_, iter_, gslice_, dss_) \ + for ((iter_) = 0, (gslice_) = 0, (dss_) = 0; \ +(iter_) < GEN_MAX_SUBSLICES; \ +(iter_)++, (gslice_) = (iter_) / GEN_DSS_PER_GSLICE, \ +(dss_) = (iter_) % GEN_DSS_PER_GSLICE) \ + for_each_if(intel_sseu_has_subslice((sseu_), 0, (iter_))) total: 0 errors, 0 warnings, 3 checks, 164 lines checked 91cde6f62ffc drm/i915/dg2: Report INSTDONE_GEOM values in error state 9dd72d6a97d4 drm/i915/xehpsdv: Add compute DSS type 9793162b05a6 drm/i915/xehpsdv: factor out function to read RP_STATE_CAP 8db8e3b6bca0 drm/i915/xehpsdv: Read correct RP_STATE_CAP register 8c7ca653ac33 drm/i915/dg2: Add new LRI reg offsets 23befa04925d drm/i915/dg2: Maintain backward-compatible nested batch behavior c5300668b502 drm/i915/dg2: Configure PCON in DP pre-enable path
[Intel-gfx] ✗ Fi.CI.SPARSE: warning for Begin enabling Xe_HP SDV and DG2 platforms (rev10)
== Series Details == Series: Begin enabling Xe_HP SDV and DG2 platforms (rev10) URL : https://patchwork.freedesktop.org/series/92135/ State : warning == Summary == $ dim sparse --fast origin/drm-tip Sparse version: v0.6.2 Fast mode used, each commit won't be checked separately. - +drivers/gpu/drm/i915/display/intel_display.c:1901:21:expected struct i915_vma *[assigned] vma +drivers/gpu/drm/i915/display/intel_display.c:1901:21:got void [noderef] __iomem *[assigned] iomem +drivers/gpu/drm/i915/display/intel_display.c:1901:21: warning: incorrect type in assignment (different address spaces) +drivers/gpu/drm/i915/gem/i915_gem_context.c:1410:34:expected struct i915_address_space *vm +drivers/gpu/drm/i915/gem/i915_gem_context.c:1410:34:got struct i915_address_space [noderef] __rcu *vm +drivers/gpu/drm/i915/gem/i915_gem_context.c:1410:34: warning: incorrect type in argument 1 (different address spaces) +drivers/gpu/drm/i915/gem/selftests/mock_context.c:43:25:expected struct i915_address_space [noderef] __rcu *vm +drivers/gpu/drm/i915/gem/selftests/mock_context.c:43:25:got struct i915_address_space * +drivers/gpu/drm/i915/gem/selftests/mock_context.c:43:25: warning: incorrect type in assignment (different address spaces) +drivers/gpu/drm/i915/gem/selftests/mock_context.c:60:34:expected struct i915_address_space *vm +drivers/gpu/drm/i915/gem/selftests/mock_context.c:60:34:got struct i915_address_space [noderef] __rcu *vm +drivers/gpu/drm/i915/gem/selftests/mock_context.c:60:34: warning: incorrect type in argument 1 (different address spaces) +drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31 +drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31 +drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31 +drivers/gpu/drm/i915/gt/intel_engine_stats.h:32:9: warning: trying to copy expression type 31 +drivers/gpu/drm/i915/gt/intel_engine_stats.h:32:9: warning: trying to copy expression type 31 +drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31 +drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31 +drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31 +drivers/gpu/drm/i915/gt/intel_engine_stats.h:56:9: warning: trying to copy expression type 31 +drivers/gpu/drm/i915/gt/intel_engine_stats.h:56:9: warning: trying to copy expression type 31 +drivers/gpu/drm/i915/gt/intel_reset.c:1392:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block +drivers/gpu/drm/i915/gt/intel_ring_submission.c:1268:24: warning: Using plain integer as NULL pointer +drivers/gpu/drm/i915/i915_perf.c:1442:15: warning: memset with byte count of 16777216 +drivers/gpu/drm/i915/i915_perf.c:1496:15: warning: memset with byte count of 16777216 +./include/asm-generic/bitops/find.h:112:45: warning: shift count is negative (-262080) +./include/asm-generic/bitops/find.h:32:31: warning: shift count is negative (-262080) +./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read16' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read32' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read64' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read8' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write16' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write32' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write8' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read16' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read32' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read64' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read8' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write16' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write32' - different lock contexts for basic block +./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write8' - different lock contexts for basic block +./include/linux/spinlock.h:409:9:
Re: [Intel-gfx] [PATCH] drm/i915: Fix syncmap memory leak
On 7/30/2021 12:53, Matthew Brost wrote: A small race exists between intel_gt_retire_requests_timeout and intel_timeline_exit which could result in the syncmap not getting free'd. Rather than work to hard to seal this race, simply cleanup the free'd -> freed syncmap on fini. unreferenced object 0x88813bc53b18 (size 96): comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00 backtrace: [<120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915] [<042f6959>] __sync_set+0x1bb/0x240 [i915] [<90f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915] [<56a48219>] i915_request_await_object+0x222/0x360 [i915] [] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915] [<3c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915] [ ] drm_ioctl_kernel+0xb0/0xf0 [drm] [ ] drm_ioctl+0x305/0x3c0 [drm] [<8b0d8986>] __x64_sys_ioctl+0x71/0xb0 [<76c362a4>] do_syscall_64+0x33/0x80 [ ] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Matthew Brost Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit") Cc: --- drivers/gpu/drm/i915/gt/intel_timeline.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index c4a126c8caef..1257f4f11e66 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu) i915_vma_put(timeline->hwsp_ggtt); i915_active_fini(&timeline->active); + + /* +* A small race exists between intel_gt_retire_requests_timeout and +* intel_timeline_exit which could result in the syncmap not getting +* free'd. Rather than work to hard to seal this race, simply cleanup +* the syncmap on fini. What is the race? I'm going round in circles just trying to work out how intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit in the first place. Also, free'd -> freed. John. +*/ + i915_syncmap_free(&timeline->sync); + kfree(timeline); }
Re: [Intel-gfx] [PATCH] drm/i915: Disable bonding on gen12+ platforms
On 7/28/2021 12:21, Matthew Brost wrote: Disable bonding on gen12+ platforms aside from ones already supported by the i915 - TGL, RKL, and ADL-S. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 05c3ee191710..9c3672bac0e2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -446,6 +446,13 @@ set_proto_ctx_engines_bond(struct i915_user_extension __user *base, void *data) u16 idx, num_bonds; int err, n; + if (GRAPHICS_VER(i915) >= 12 && !IS_TIGERLAKE(i915) && + !IS_ROCKETLAKE(i915) && !IS_ALDERLAKE_S(i915)) { + drm_dbg(&i915->drm, + "Bonding on gen12+ aside from TGL, RKL, and ADL_S not allowed\n"); I would have said not supported rather than not allowed. Either way: Reviewed-by: John Harrison + return -ENODEV; + } + if (get_user(idx, &ext->virtual_index)) return -EFAULT;
Re: [Intel-gfx] [PATCH] drm/i915: Fix syncmap memory leak
On Fri, Aug 06, 2021 at 11:23:06AM -0700, John Harrison wrote: > On 7/30/2021 12:53, Matthew Brost wrote: > > A small race exists between intel_gt_retire_requests_timeout and > > intel_timeline_exit which could result in the syncmap not getting > > free'd. Rather than work to hard to seal this race, simply cleanup the > free'd -> freed > Sure. > > syncmap on fini. > > > > unreferenced object 0x88813bc53b18 (size 96): > >comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s) > >hex dump (first 32 bytes): > > 01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 > > 00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00 > >backtrace: > > [<120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915] > > [<042f6959>] __sync_set+0x1bb/0x240 [i915] > > [<90f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915] > > [<56a48219>] i915_request_await_object+0x222/0x360 [i915] > > [] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915] > > [<3c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915] > > [ ] drm_ioctl_kernel+0xb0/0xf0 [drm] > > [ ] drm_ioctl+0x305/0x3c0 [drm] > > [<8b0d8986>] __x64_sys_ioctl+0x71/0xb0 > > [<76c362a4>] do_syscall_64+0x33/0x80 > > [ ] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > Signed-off-by: Matthew Brost > > Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit") > > Cc: > > --- > > drivers/gpu/drm/i915/gt/intel_timeline.c | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c > > b/drivers/gpu/drm/i915/gt/intel_timeline.c > > index c4a126c8caef..1257f4f11e66 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_timeline.c > > +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c > > @@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu) > > i915_vma_put(timeline->hwsp_ggtt); > > i915_active_fini(&timeline->active); > > + > > + /* > > +* A small race exists between intel_gt_retire_requests_timeout and > > +* intel_timeline_exit which could result in the syncmap not getting > > +* free'd. Rather than work to hard to seal this race, simply cleanup > > +* the syncmap on fini. > What is the race? I'm going round in circles just trying to work out how > intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit > in the first place. > intel_gt_retire_requests_timeout increments tl->active_count, active_count == 2 intel_timeline_exit is called, returns on atomic_add_unless, active_count == 1 intel_gt_retire_requests_timeout decrements tl->active_count, active_count == 0 i915_syncmap_free is never called, memory leak Matt > Also, free'd -> freed. > > John. > > > > +*/ > > + i915_syncmap_free(&timeline->sync); > > + > > kfree(timeline); > > } >
Re: [Intel-gfx] [PATCH v5 02/20] drm/msm: Fix drm/sched point of no return rules
On Fri, Aug 6, 2021 at 7:15 PM Rob Clark wrote: > > On Fri, Aug 6, 2021 at 9:42 AM Daniel Vetter wrote: > > > > On Fri, Aug 6, 2021 at 12:58 AM Rob Clark wrote: > > > > > > On Thu, Aug 5, 2021 at 3:47 AM Daniel Vetter > > > wrote: > > > > > > > > Originally drm_sched_job_init was the point of no return, after which > > > > drivers must submit a job. I've split that up, which allows us to fix > > > > this issue pretty easily. > > > > > > > > Only thing we have to take care of is to not skip to error paths after > > > > that. Other drivers do this the same for out-fence and similar things. > > > > > > > > Fixes: 1d8a5ca436ee ("drm/msm: Conversion to drm scheduler") > > > > Cc: Rob Clark > > > > Cc: Rob Clark > > > > Cc: Sean Paul > > > > Cc: Sumit Semwal > > > > Cc: "Christian König" > > > > Cc: linux-arm-...@vger.kernel.org > > > > Cc: dri-de...@lists.freedesktop.org > > > > Cc: freedr...@lists.freedesktop.org > > > > Cc: linux-me...@vger.kernel.org > > > > Cc: linaro-mm-...@lists.linaro.org > > > > Signed-off-by: Daniel Vetter > > > > --- > > > > drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++ > > > > 1 file changed, 7 insertions(+), 8 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c > > > > b/drivers/gpu/drm/msm/msm_gem_submit.c > > > > index 6d6c44f0e1f3..d0ed4ddc509e 100644 > > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > > > > @@ -52,9 +52,6 @@ static struct msm_gem_submit *submit_create(struct > > > > drm_device *dev, > > > > return ERR_PTR(ret); > > > > } > > > > > > > > - /* FIXME: this is way too early */ > > > > - drm_sched_job_arm(&job->base); > > > > - > > > > xa_init_flags(&submit->deps, XA_FLAGS_ALLOC); > > > > > > > > kref_init(&submit->ref); > > > > @@ -883,6 +880,9 @@ int msm_ioctl_gem_submit(struct drm_device *dev, > > > > void *data, > > > > > > > > submit->user_fence = > > > > dma_fence_get(&submit->base.s_fence->finished); > > > > > > > > + /* point of no return, we _have_ to submit no matter what */ > > > > + drm_sched_job_arm(&submit->base); > > > > + > > > > /* > > > > * Allocate an id which can be used by WAIT_FENCE ioctl to map > > > > back > > > > * to the underlying fence. > > > > @@ -892,17 +892,16 @@ int msm_ioctl_gem_submit(struct drm_device *dev, > > > > void *data, > > > > if (submit->fence_id < 0) { > > > > ret = submit->fence_id = 0; > > > > submit->fence_id = 0; > > > > - goto out; > > > > } > > > > > > > > - if (args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > > > + if (ret == 0 && args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > > > struct sync_file *sync_file = > > > > sync_file_create(submit->user_fence); > > > > if (!sync_file) { > > > > ret = -ENOMEM; > > > > - goto out; > > > > + } else { > > > > + fd_install(out_fence_fd, sync_file->file); > > > > + args->fence_fd = out_fence_fd; > > > > } > > > > - fd_install(out_fence_fd, sync_file->file); > > > > - args->fence_fd = out_fence_fd; > > > > > > I wonder if instead we should (approximately) undo "drm/msm/submit: > > > Simplify out-fence-fd handling" so that the point that it could fail > > > is moved up ahead of the drm_sched_job_arm()? > > > > Hm yeah. Up to you how you want to paint this shed, I think either is fine. > > > > > Also, does the dma_fence_get() work before drm_sched_job_arm()? From > > > a quick look, it looks like it won't, but I'm still playing catchup > > > and haven't had a chance to look at your entire series. If it doesn't > > > work before drm_sched_job_arm(), then there is really no way to > > > prevent a error path between the fence-init and job-submit. > > > > Yes. I thought I've checked that I put the _arm() in the right spot, > > but I guess I screwed up and you need the fence before the point where > > I've put the job_arm()? And yes the error path cannot be avoided for > > out-fences, that's what I tried to explain in the commit message. > > > > > But, prior to your series, wouldn't a failure after > > > drm_sched_job_init() but before the job is submitted just burn a > > > fence-id, and otherwise carry on it's merry way? > > > > Maybe? I'm not sure whether the scheduler gets confused about the gap > > and freak out abou that. I'm fairly new to that code and learning > > (which is part why I'm working on it). Since you look up in > > fences/syncobj after job_init() it should be pretty easy to whip up a > > testcase and see what happens. Also as long as nothing fails you won't > > see an issue, that's for sure. > > fair.. I'll try to come up with a test case.. pre-scheduler-conversion > it wasn't a problem to fail after the fence seqno
[Intel-gfx] ✓ Fi.CI.BAT: success for Begin enabling Xe_HP SDV and DG2 platforms (rev10)
== Series Details == Series: Begin enabling Xe_HP SDV and DG2 platforms (rev10) URL : https://patchwork.freedesktop.org/series/92135/ State : success == Summary == CI Bug Log - changes from CI_DRM_10457 -> Patchwork_20781 Summary --- **SUCCESS** No regressions found. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/index.html Known issues Here are the changes found in Patchwork_20781 that come from known issues: ### IGT changes ### Possible fixes * igt@gem_exec_suspend@basic-s3: - fi-tgl-1115g4: [FAIL][1] ([i915#1888]) -> [PASS][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/fi-tgl-1115g4/igt@gem_exec_susp...@basic-s3.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/fi-tgl-1115g4/igt@gem_exec_susp...@basic-s3.html * igt@i915_selftest@live@hangcheck: - fi-icl-u2: [DMESG-FAIL][3] -> [PASS][4] [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/fi-icl-u2/igt@i915_selftest@l...@hangcheck.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/fi-icl-u2/igt@i915_selftest@l...@hangcheck.html {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE). [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888 [i915#541]: https://gitlab.freedesktop.org/drm/intel/issues/541 Participating hosts (37 -> 34) -- Missing(3): fi-bdw-samus fi-bsw-cyan bat-jsl-1 Build changes - * Linux: CI_DRM_10457 -> Patchwork_20781 CI-20190529: 20190529 CI_DRM_10457: 7700f858b68060307b0a7d94377a5d8f64000e5d @ git://anongit.freedesktop.org/gfx-ci/linux IGT_6162: 2f32b9e0da5f1ac9529318dd5b836c8cf4d3c441 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git Patchwork_20781: c5300668b502fff59ac563d2a7b22f2341bd8c95 @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == c5300668b502 drm/i915/dg2: Configure PCON in DP pre-enable path 23befa04925d drm/i915/dg2: Maintain backward-compatible nested batch behavior 8c7ca653ac33 drm/i915/dg2: Add new LRI reg offsets 8db8e3b6bca0 drm/i915/xehpsdv: Read correct RP_STATE_CAP register 9793162b05a6 drm/i915/xehpsdv: factor out function to read RP_STATE_CAP 9dd72d6a97d4 drm/i915/xehpsdv: Add compute DSS type 91cde6f62ffc drm/i915/dg2: Report INSTDONE_GEOM values in error state 0ebc963f011d drm/i915/xehp: Loop over all gslices for INSTDONE processing == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/index.html
Re: [Intel-gfx] [PATCH 2/4] drm/i915/guc: put all guc objects in lmem when available
On 8/2/2021 22:11, Matthew Brost wrote: From: Daniele Ceraolo Spurio The firmware binary has to be loaded from lmem and the recommendation is to put all other objects in there as well. Note that we don't fall back to system memory if the allocation in lmem fails because all objects are allocated during driver load and if we have issues with lmem at that point something is seriously wrong with the system, so no point in trying to handle it. Cc: Matthew Auld Cc: Abdiel Janulgue Cc: Michal Wajdeczko Cc: Vinay Belgaumkar Cc: Radoslaw Szwichtenberg Signed-off-by: Daniele Ceraolo Spurio Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 26 drivers/gpu/drm/i915/gem/i915_gem_lmem.h | 4 ++ drivers/gpu/drm/i915/gt/uc/intel_guc.c| 9 ++- drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c | 11 +++- drivers/gpu/drm/i915/gt/uc/intel_huc.c| 14 - drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 75 +-- 6 files changed, 127 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c index eb345305dc52..034226c5d4d0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c @@ -103,6 +103,32 @@ __i915_gem_object_create_lmem_with_ps(struct drm_i915_private *i915, size, page_size, flags); } +struct drm_i915_gem_object * +i915_gem_object_create_lmem_from_data(struct drm_i915_private *i915, + const void *data, size_t size) +{ + struct drm_i915_gem_object *obj; + void *map; + + obj = i915_gem_object_create_lmem(i915, + round_up(size, PAGE_SIZE), + I915_BO_ALLOC_CONTIGUOUS); + if (IS_ERR(obj)) + return obj; + + map = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WC); + if (IS_ERR(map)) { + i915_gem_object_put(obj); + return map; + } + + memcpy(map, data, size); + + i915_gem_object_unpin_map(obj); + + return obj; +} + struct drm_i915_gem_object * i915_gem_object_create_lmem(struct drm_i915_private *i915, resource_size_t size, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h index 4ee81fc66302..1b88ea13435c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.h @@ -23,6 +23,10 @@ bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj); bool __i915_gem_object_is_lmem(struct drm_i915_gem_object *obj); +struct drm_i915_gem_object * +i915_gem_object_create_lmem_from_data(struct drm_i915_private *i915, + const void *data, size_t size); + struct drm_i915_gem_object * __i915_gem_object_create_lmem_with_ps(struct drm_i915_private *i915, resource_size_t size, diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c b/drivers/gpu/drm/i915/gt/uc/intel_guc.c index 979128e28372..55160d3e401a 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c @@ -3,6 +3,7 @@ * Copyright © 2014-2019 Intel Corporation */ +#include "gem/i915_gem_lmem.h" #include "gt/intel_gt.h" #include "gt/intel_gt_irq.h" #include "gt/intel_gt_pm_irq.h" @@ -630,7 +631,13 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size) u64 flags; int ret; - obj = i915_gem_object_create_shmem(gt->i915, size); + if (HAS_LMEM(gt->i915)) + obj = i915_gem_object_create_lmem(gt->i915, size, + I915_BO_ALLOC_CPU_CLEAR | + I915_BO_ALLOC_CONTIGUOUS); + else + obj = i915_gem_object_create_shmem(gt->i915, size); + if (IS_ERR(obj)) return ERR_CAST(obj); diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c index 76fe766ad1bc..962be0c12208 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_fw.c @@ -41,7 +41,7 @@ static void guc_prepare_xfer(struct intel_uncore *uncore) } /* Copy RSA signature from the fw image to HW for verification */ -static void guc_xfer_rsa(struct intel_uc_fw *guc_fw, +static int guc_xfer_rsa(struct intel_uc_fw *guc_fw, struct intel_uncore *uncore) { u32 rsa[UOS_RSA_SCRATCH_COUNT]; @@ -49,10 +49,13 @@ static void guc_xfer_rsa(struct intel_uc_fw *guc_fw, int i; copied = intel_uc_fw_copy_rsa(guc_fw, rsa, sizeof(rsa)); - GEM_BUG_ON(copied < sizeof(rsa)); + if (copied < sizeof(rsa)) + return -ENOMEM; for (i = 0; i < UOS_RSA_SCRATCH_COUNT; i++) intel_uncore_write(u
Re: [Intel-gfx] [PATCH 3/4] drm/i915/guc: Add DG1 GuC / HuC firmware defs
On 8/2/2021 22:11, Matthew Brost wrote: Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c index f8cb00ffb506..a685d563df72 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c @@ -51,6 +51,7 @@ void intel_uc_fw_change_status(struct intel_uc_fw *uc_fw, #define INTEL_UC_FIRMWARE_DEFS(fw_def, guc_def, huc_def) \ fw_def(ALDERLAKE_P, 0, guc_def(adlp, 62, 0, 3), huc_def(tgl, 7, 9, 3)) \ fw_def(ALDERLAKE_S, 0, guc_def(tgl, 62, 0, 0), huc_def(tgl, 7, 9, 3)) \ + fw_def(DG1, 0, guc_def(dg1, 62, 0, 0), huc_def(dg1, 7, 9, 3)) \ fw_def(ROCKETLAKE, 0, guc_def(tgl, 62, 0, 0), huc_def(tgl, 7, 9, 3)) \ fw_def(TIGERLAKE, 0, guc_def(tgl, 62, 0, 0), huc_def(tgl, 7, 9, 3)) \ fw_def(JASPERLAKE, 0, guc_def(ehl, 62, 0, 0), huc_def(ehl, 9, 0, 0)) \ Reviewed-by: John Harrison
Re: [Intel-gfx] [PATCH 4/4] drm/i915/guc: Enable GuC submission by default on DG1
On 8/2/2021 22:11, Matthew Brost wrote: Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_uc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c index da57d18d9f6b..fc2fc8d111d8 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c @@ -35,7 +35,7 @@ static void uc_expand_default_options(struct intel_uc *uc) } /* Intermediate platforms are HuC authentication only */ - if (IS_DG1(i915) || IS_ALDERLAKE_S(i915)) { + if (IS_ALDERLAKE_S(i915)) { i915->params.enable_guc = ENABLE_GUC_LOAD_HUC; return; } Reviewed-by: John Harrison
[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Only access SFC_DONE when media domain is not fused off
== Series Details == Series: drm/i915: Only access SFC_DONE when media domain is not fused off URL : https://patchwork.freedesktop.org/series/93468/ State : warning == Summary == $ dim checkpatch origin/drm-tip 0e9d9b7275f5 drm/i915: Only access SFC_DONE when media domain is not fused off -:15: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line) #15: <4> [544.989065] i915 :cc:00.0: Uninitialized forcewake domain(s) 0x80 accessed at 0x1ce000 total: 0 errors, 1 warnings, 0 checks, 33 lines checked
Re: [Intel-gfx] [PATCH] drm/i915: Disable bonding on gen12+ platforms
On Fri, Aug 6, 2021 at 8:25 PM John Harrison wrote: > On 7/28/2021 12:21, Matthew Brost wrote: > > Disable bonding on gen12+ platforms aside from ones already supported by > > the i915 - TGL, RKL, and ADL-S. > > > > Signed-off-by: Matthew Brost > > --- > > drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > > index 05c3ee191710..9c3672bac0e2 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > > @@ -446,6 +446,13 @@ set_proto_ctx_engines_bond(struct i915_user_extension > > __user *base, void *data) > > u16 idx, num_bonds; > > int err, n; > > > > + if (GRAPHICS_VER(i915) >= 12 && !IS_TIGERLAKE(i915) && > > + !IS_ROCKETLAKE(i915) && !IS_ALDERLAKE_S(i915)) { > > + drm_dbg(&i915->drm, > > + "Bonding on gen12+ aside from TGL, RKL, and ADL_S not > > allowed\n"); > I would have said not supported rather than not allowed. Either way: > Reviewed-by: John Harrison Either is fine with me. Acked-by: Daniel Vetter > > > + return -ENODEV; > > + } > > + > > if (get_user(idx, &ext->virtual_index)) > > return -EFAULT; > > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH v5 02/20] drm/msm: Fix drm/sched point of no return rules
On Fri, Aug 6, 2021 at 11:41 AM Daniel Vetter wrote: > > On Fri, Aug 6, 2021 at 7:15 PM Rob Clark wrote: > > > > On Fri, Aug 6, 2021 at 9:42 AM Daniel Vetter wrote: > > > > > > On Fri, Aug 6, 2021 at 12:58 AM Rob Clark wrote: > > > > > > > > On Thu, Aug 5, 2021 at 3:47 AM Daniel Vetter > > > > wrote: > > > > > > > > > > Originally drm_sched_job_init was the point of no return, after which > > > > > drivers must submit a job. I've split that up, which allows us to fix > > > > > this issue pretty easily. > > > > > > > > > > Only thing we have to take care of is to not skip to error paths after > > > > > that. Other drivers do this the same for out-fence and similar things. > > > > > > > > > > Fixes: 1d8a5ca436ee ("drm/msm: Conversion to drm scheduler") > > > > > Cc: Rob Clark > > > > > Cc: Rob Clark > > > > > Cc: Sean Paul > > > > > Cc: Sumit Semwal > > > > > Cc: "Christian König" > > > > > Cc: linux-arm-...@vger.kernel.org > > > > > Cc: dri-de...@lists.freedesktop.org > > > > > Cc: freedr...@lists.freedesktop.org > > > > > Cc: linux-me...@vger.kernel.org > > > > > Cc: linaro-mm-...@lists.linaro.org > > > > > Signed-off-by: Daniel Vetter > > > > > --- > > > > > drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++ > > > > > 1 file changed, 7 insertions(+), 8 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > b/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > index 6d6c44f0e1f3..d0ed4ddc509e 100644 > > > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > @@ -52,9 +52,6 @@ static struct msm_gem_submit *submit_create(struct > > > > > drm_device *dev, > > > > > return ERR_PTR(ret); > > > > > } > > > > > > > > > > - /* FIXME: this is way too early */ > > > > > - drm_sched_job_arm(&job->base); > > > > > - > > > > > xa_init_flags(&submit->deps, XA_FLAGS_ALLOC); > > > > > > > > > > kref_init(&submit->ref); > > > > > @@ -883,6 +880,9 @@ int msm_ioctl_gem_submit(struct drm_device *dev, > > > > > void *data, > > > > > > > > > > submit->user_fence = > > > > > dma_fence_get(&submit->base.s_fence->finished); > > > > > > > > > > + /* point of no return, we _have_ to submit no matter what */ > > > > > + drm_sched_job_arm(&submit->base); > > > > > + > > > > > /* > > > > > * Allocate an id which can be used by WAIT_FENCE ioctl to > > > > > map back > > > > > * to the underlying fence. > > > > > @@ -892,17 +892,16 @@ int msm_ioctl_gem_submit(struct drm_device > > > > > *dev, void *data, > > > > > if (submit->fence_id < 0) { > > > > > ret = submit->fence_id = 0; > > > > > submit->fence_id = 0; > > > > > - goto out; > > > > > } > > > > > > > > > > - if (args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > > > > + if (ret == 0 && args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > > > > struct sync_file *sync_file = > > > > > sync_file_create(submit->user_fence); > > > > > if (!sync_file) { > > > > > ret = -ENOMEM; > > > > > - goto out; > > > > > + } else { > > > > > + fd_install(out_fence_fd, sync_file->file); > > > > > + args->fence_fd = out_fence_fd; > > > > > } > > > > > - fd_install(out_fence_fd, sync_file->file); > > > > > - args->fence_fd = out_fence_fd; > > > > > > > > I wonder if instead we should (approximately) undo "drm/msm/submit: > > > > Simplify out-fence-fd handling" so that the point that it could fail > > > > is moved up ahead of the drm_sched_job_arm()? > > > > > > Hm yeah. Up to you how you want to paint this shed, I think either is > > > fine. > > > > > > > Also, does the dma_fence_get() work before drm_sched_job_arm()? From > > > > a quick look, it looks like it won't, but I'm still playing catchup > > > > and haven't had a chance to look at your entire series. If it doesn't > > > > work before drm_sched_job_arm(), then there is really no way to > > > > prevent a error path between the fence-init and job-submit. > > > > > > Yes. I thought I've checked that I put the _arm() in the right spot, > > > but I guess I screwed up and you need the fence before the point where > > > I've put the job_arm()? And yes the error path cannot be avoided for > > > out-fences, that's what I tried to explain in the commit message. > > > > > > > But, prior to your series, wouldn't a failure after > > > > drm_sched_job_init() but before the job is submitted just burn a > > > > fence-id, and otherwise carry on it's merry way? > > > > > > Maybe? I'm not sure whether the scheduler gets confused about the gap > > > and freak out abou that. I'm fairly new to that code and learning > > > (which is part why I'm working on it). Since you look up in > > > fences/syncobj after job_ini
Re: [Intel-gfx] [PATCH v5 02/20] drm/msm: Fix drm/sched point of no return rules
On Fri, Aug 6, 2021 at 8:57 PM Rob Clark wrote: > > On Fri, Aug 6, 2021 at 11:41 AM Daniel Vetter wrote: > > > > On Fri, Aug 6, 2021 at 7:15 PM Rob Clark wrote: > > > > > > On Fri, Aug 6, 2021 at 9:42 AM Daniel Vetter > > > wrote: > > > > > > > > On Fri, Aug 6, 2021 at 12:58 AM Rob Clark wrote: > > > > > > > > > > On Thu, Aug 5, 2021 at 3:47 AM Daniel Vetter > > > > > wrote: > > > > > > > > > > > > Originally drm_sched_job_init was the point of no return, after > > > > > > which > > > > > > drivers must submit a job. I've split that up, which allows us to > > > > > > fix > > > > > > this issue pretty easily. > > > > > > > > > > > > Only thing we have to take care of is to not skip to error paths > > > > > > after > > > > > > that. Other drivers do this the same for out-fence and similar > > > > > > things. > > > > > > > > > > > > Fixes: 1d8a5ca436ee ("drm/msm: Conversion to drm scheduler") > > > > > > Cc: Rob Clark > > > > > > Cc: Rob Clark > > > > > > Cc: Sean Paul > > > > > > Cc: Sumit Semwal > > > > > > Cc: "Christian König" > > > > > > Cc: linux-arm-...@vger.kernel.org > > > > > > Cc: dri-de...@lists.freedesktop.org > > > > > > Cc: freedr...@lists.freedesktop.org > > > > > > Cc: linux-me...@vger.kernel.org > > > > > > Cc: linaro-mm-...@lists.linaro.org > > > > > > Signed-off-by: Daniel Vetter > > > > > > --- > > > > > > drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++ > > > > > > 1 file changed, 7 insertions(+), 8 deletions(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > > b/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > > index 6d6c44f0e1f3..d0ed4ddc509e 100644 > > > > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > > @@ -52,9 +52,6 @@ static struct msm_gem_submit > > > > > > *submit_create(struct drm_device *dev, > > > > > > return ERR_PTR(ret); > > > > > > } > > > > > > > > > > > > - /* FIXME: this is way too early */ > > > > > > - drm_sched_job_arm(&job->base); > > > > > > - > > > > > > xa_init_flags(&submit->deps, XA_FLAGS_ALLOC); > > > > > > > > > > > > kref_init(&submit->ref); > > > > > > @@ -883,6 +880,9 @@ int msm_ioctl_gem_submit(struct drm_device > > > > > > *dev, void *data, > > > > > > > > > > > > submit->user_fence = > > > > > > dma_fence_get(&submit->base.s_fence->finished); > > > > > > > > > > > > + /* point of no return, we _have_ to submit no matter what */ > > > > > > + drm_sched_job_arm(&submit->base); > > > > > > + > > > > > > /* > > > > > > * Allocate an id which can be used by WAIT_FENCE ioctl to > > > > > > map back > > > > > > * to the underlying fence. > > > > > > @@ -892,17 +892,16 @@ int msm_ioctl_gem_submit(struct drm_device > > > > > > *dev, void *data, > > > > > > if (submit->fence_id < 0) { > > > > > > ret = submit->fence_id = 0; > > > > > > submit->fence_id = 0; > > > > > > - goto out; > > > > > > } > > > > > > > > > > > > - if (args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > > > > > + if (ret == 0 && args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > > > > > struct sync_file *sync_file = > > > > > > sync_file_create(submit->user_fence); > > > > > > if (!sync_file) { > > > > > > ret = -ENOMEM; > > > > > > - goto out; > > > > > > + } else { > > > > > > + fd_install(out_fence_fd, sync_file->file); > > > > > > + args->fence_fd = out_fence_fd; > > > > > > } > > > > > > - fd_install(out_fence_fd, sync_file->file); > > > > > > - args->fence_fd = out_fence_fd; > > > > > > > > > > I wonder if instead we should (approximately) undo "drm/msm/submit: > > > > > Simplify out-fence-fd handling" so that the point that it could fail > > > > > is moved up ahead of the drm_sched_job_arm()? > > > > > > > > Hm yeah. Up to you how you want to paint this shed, I think either is > > > > fine. > > > > > > > > > Also, does the dma_fence_get() work before drm_sched_job_arm()? From > > > > > a quick look, it looks like it won't, but I'm still playing catchup > > > > > and haven't had a chance to look at your entire series. If it doesn't > > > > > work before drm_sched_job_arm(), then there is really no way to > > > > > prevent a error path between the fence-init and job-submit. > > > > > > > > Yes. I thought I've checked that I put the _arm() in the right spot, > > > > but I guess I screwed up and you need the fence before the point where > > > > I've put the job_arm()? And yes the error path cannot be avoided for > > > > out-fences, that's what I tried to explain in the commit message. > > > > > > > > > But, prior to your series, wouldn't a failure after > > > > > drm_sched_job_init() but before the job is submitted jus
[Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/sched dependency handling and implicit sync fixes (rev2)
== Series Details == Series: drm/sched dependency handling and implicit sync fixes (rev2) URL : https://patchwork.freedesktop.org/series/93415/ State : failure == Summary == Applying: drm/sched: Split drm_sched_job_init Applying: drm/msm: Fix drm/sched point of no return rules error: git diff header lacks filename information when removing 1 leading pathname component (line 2) error: could not build fake ancestor hint: Use 'git am --show-current-patch=diff' to see the failed patch Patch failed at 0002 drm/msm: Fix drm/sched point of no return rules When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort".
[Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Only access SFC_DONE when media domain is not fused off
== Series Details == Series: drm/i915: Only access SFC_DONE when media domain is not fused off URL : https://patchwork.freedesktop.org/series/93468/ State : success == Summary == CI Bug Log - changes from CI_DRM_10457 -> Patchwork_20782 Summary --- **SUCCESS** No regressions found. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/index.html Known issues Here are the changes found in Patchwork_20782 that come from known issues: ### IGT changes ### Issues hit * igt@kms_chamelium@hdmi-edid-read: - fi-kbl-7500u: [PASS][1] -> [FAIL][2] ([i915#3449]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/fi-kbl-7500u/igt@kms_chamel...@hdmi-edid-read.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/fi-kbl-7500u/igt@kms_chamel...@hdmi-edid-read.html Possible fixes * igt@gem_exec_suspend@basic-s3: - fi-tgl-1115g4: [FAIL][3] ([i915#1888]) -> [PASS][4] [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/fi-tgl-1115g4/igt@gem_exec_susp...@basic-s3.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/fi-tgl-1115g4/igt@gem_exec_susp...@basic-s3.html * igt@i915_selftest@live@hangcheck: - fi-icl-u2: [DMESG-FAIL][5] -> [PASS][6] [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/fi-icl-u2/igt@i915_selftest@l...@hangcheck.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/fi-icl-u2/igt@i915_selftest@l...@hangcheck.html [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888 [i915#3449]: https://gitlab.freedesktop.org/drm/intel/issues/3449 Participating hosts (37 -> 34) -- Missing(3): fi-bdw-samus fi-bsw-cyan bat-jsl-1 Build changes - * Linux: CI_DRM_10457 -> Patchwork_20782 CI-20190529: 20190529 CI_DRM_10457: 7700f858b68060307b0a7d94377a5d8f64000e5d @ git://anongit.freedesktop.org/gfx-ci/linux IGT_6162: 2f32b9e0da5f1ac9529318dd5b836c8cf4d3c441 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git Patchwork_20782: 0e9d9b7275f5382258a63683599e60f68ccb1e31 @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == 0e9d9b7275f5 drm/i915: Only access SFC_DONE when media domain is not fused off == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/index.html
Re: [Intel-gfx] [PATCH 1/1] drm/i915: Check if engine has heartbeat when closing a context
Seen this fly by and figured I dropped a few thoughts in here. At the likely cost of looking a bit out of whack :-) On Fri, Aug 6, 2021 at 8:01 PM John Harrison wrote: > On 8/2/2021 02:40, Tvrtko Ursulin wrote: > > On 30/07/2021 19:13, John Harrison wrote: > >> On 7/30/2021 02:49, Tvrtko Ursulin wrote: > >>> On 30/07/2021 01:13, John Harrison wrote: > On 7/28/2021 17:34, Matthew Brost wrote: > > If an engine associated with a context does not have a heartbeat, > > ban it > > immediately. This is needed for GuC submission as a idle pulse > > doesn't > > kick the context off the hardware where it then can check for a > > heartbeat and ban the context. > >>> > >>> Pulse, that is a request with I915_PRIORITY_BARRIER, does not > >>> preempt a running normal priority context? > >>> > >>> Why does it matter then whether or not heartbeats are enabled - when > >>> heartbeat just ends up sending the same engine pulse (eventually, > >>> with raising priority)? > >> The point is that the pulse is pointless. See the rest of my comments > >> below, specifically "the context will get resubmitted to the hardware > >> after the pulse completes". To re-iterate... > >> > >> Yes, it preempts the context. Yes, it does so whether heartbeats are > >> enabled or not. But so what? Who cares? You have preempted a context. > >> It is no longer running on the hardware. BUT IT IS STILL A VALID > >> CONTEXT. > > > > It is valid yes, and it even may be the current ABI so another > > question is whether it is okay to change that. > > > >> The backend scheduler will just resubmit it to the hardware as soon > >> as the pulse completes. The only reason this works at all is because > >> of the horrid hack in the execlist scheduler's back end > >> implementation (in __execlists_schedule_in): > >> if (unlikely(intel_context_is_closed(ce) && > >> !intel_engine_has_heartbeat(engine))) > >> intel_context_set_banned(ce); > > > > Right, is the above code then needed with this patch - when ban is > > immediately applied on the higher level? > > > >> The actual back end scheduler is saying "Is this a zombie context? Is > >> the heartbeat disabled? Then ban it". No other scheduler backend is > >> going to have knowledge of zombie context status or of the heartbeat > >> status. Nor are they going to call back into the higher levels of the > >> i915 driver to trigger a ban operation. Certainly a hardware > >> implemented scheduler is not going to be looking at private i915 > >> driver information to decide whether to submit a context or whether > >> to tell the OS to kill it off instead. > >> > >> For persistence to work with a hardware scheduler (or a non-Intel > >> specific scheduler such as the DRM one), the handling of zombie > >> contexts, banning, etc. *must* be done entirely in the front end. It > >> cannot rely on any backend hacks. That means you can't rely on any > >> fancy behaviour of pulses. > >> > >> If you want to ban a context then you must explicitly ban that > >> context. If you want to ban it at some later point then you need to > >> track it at the top level as a zombie and then explicitly ban that > >> zombie at whatever later point. > > > > I am still trying to understand it all. If I go by the commit message: > > > > """ > > This is needed for GuC submission as a idle pulse doesn't > > kick the context off the hardware where it then can check for a > > heartbeat and ban the context. > > """ > > > > That did not explain things for me. Sentence does not appear to make > > sense. Now, it seems "kick off the hardware" is meant as revoke and > > not just preempt. Which is fine, perhaps just needs to be written more > > explicitly. But the part of checking for heartbeat after idle pulse > > does not compute for me. It is the heartbeat which emits idle pulses, > > not idle pulse emitting heartbeats. > I am in agreement that the commit message is confusing and does not > explain either the problem or the solution. > > > > > > > > But anyway, I can buy the handling at the front end story completely. > > It makes sense. We just need to agree that a) it is okay to change the > > ABI and b) remove the backend check from execlists if it is not needed > > any longer. > > > > And if ABI change is okay then commit message needs to talk about it > > loudly and clearly. > I don't think we have a choice. The current ABI is not and cannot ever > be compatible with any scheduler external to i915. It cannot be > implemented with a hardware scheduler such as the GuC and it cannot be > implemented with an external software scheduler such as the DRM one. So generally on linux we implement helper libraries, which means massive flexibility everywhere. https://blog.ffwll.ch/2016/12/midlayers-once-more-with-feeling.html So it shouldn't be an insurmountable problem to make this happen even with drm/scheduler, we can patch it up. Whether that's justified is another question. > M
Re: [Intel-gfx] [PATCH v5 02/20] drm/msm: Fix drm/sched point of no return rules
On Fri, Aug 6, 2021 at 12:11 PM Daniel Vetter wrote: > > On Fri, Aug 6, 2021 at 8:57 PM Rob Clark wrote: > > > > On Fri, Aug 6, 2021 at 11:41 AM Daniel Vetter > > wrote: > > > > > > On Fri, Aug 6, 2021 at 7:15 PM Rob Clark wrote: > > > > > > > > On Fri, Aug 6, 2021 at 9:42 AM Daniel Vetter > > > > wrote: > > > > > > > > > > On Fri, Aug 6, 2021 at 12:58 AM Rob Clark wrote: > > > > > > > > > > > > On Thu, Aug 5, 2021 at 3:47 AM Daniel Vetter > > > > > > wrote: > > > > > > > > > > > > > > Originally drm_sched_job_init was the point of no return, after > > > > > > > which > > > > > > > drivers must submit a job. I've split that up, which allows us to > > > > > > > fix > > > > > > > this issue pretty easily. > > > > > > > > > > > > > > Only thing we have to take care of is to not skip to error paths > > > > > > > after > > > > > > > that. Other drivers do this the same for out-fence and similar > > > > > > > things. > > > > > > > > > > > > > > Fixes: 1d8a5ca436ee ("drm/msm: Conversion to drm scheduler") > > > > > > > Cc: Rob Clark > > > > > > > Cc: Rob Clark > > > > > > > Cc: Sean Paul > > > > > > > Cc: Sumit Semwal > > > > > > > Cc: "Christian König" > > > > > > > Cc: linux-arm-...@vger.kernel.org > > > > > > > Cc: dri-de...@lists.freedesktop.org > > > > > > > Cc: freedr...@lists.freedesktop.org > > > > > > > Cc: linux-me...@vger.kernel.org > > > > > > > Cc: linaro-mm-...@lists.linaro.org > > > > > > > Signed-off-by: Daniel Vetter > > > > > > > --- > > > > > > > drivers/gpu/drm/msm/msm_gem_submit.c | 15 +++ > > > > > > > 1 file changed, 7 insertions(+), 8 deletions(-) > > > > > > > > > > > > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > > > b/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > > > index 6d6c44f0e1f3..d0ed4ddc509e 100644 > > > > > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > > > @@ -52,9 +52,6 @@ static struct msm_gem_submit > > > > > > > *submit_create(struct drm_device *dev, > > > > > > > return ERR_PTR(ret); > > > > > > > } > > > > > > > > > > > > > > - /* FIXME: this is way too early */ > > > > > > > - drm_sched_job_arm(&job->base); > > > > > > > - > > > > > > > xa_init_flags(&submit->deps, XA_FLAGS_ALLOC); > > > > > > > > > > > > > > kref_init(&submit->ref); > > > > > > > @@ -883,6 +880,9 @@ int msm_ioctl_gem_submit(struct drm_device > > > > > > > *dev, void *data, > > > > > > > > > > > > > > submit->user_fence = > > > > > > > dma_fence_get(&submit->base.s_fence->finished); > > > > > > > > > > > > > > + /* point of no return, we _have_ to submit no matter what > > > > > > > */ > > > > > > > + drm_sched_job_arm(&submit->base); > > > > > > > + > > > > > > > /* > > > > > > > * Allocate an id which can be used by WAIT_FENCE ioctl > > > > > > > to map back > > > > > > > * to the underlying fence. > > > > > > > @@ -892,17 +892,16 @@ int msm_ioctl_gem_submit(struct drm_device > > > > > > > *dev, void *data, > > > > > > > if (submit->fence_id < 0) { > > > > > > > ret = submit->fence_id = 0; > > > > > > > submit->fence_id = 0; > > > > > > > - goto out; > > > > > > > } > > > > > > > > > > > > > > - if (args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > > > > > > + if (ret == 0 && args->flags & MSM_SUBMIT_FENCE_FD_OUT) { > > > > > > > struct sync_file *sync_file = > > > > > > > sync_file_create(submit->user_fence); > > > > > > > if (!sync_file) { > > > > > > > ret = -ENOMEM; > > > > > > > - goto out; > > > > > > > + } else { > > > > > > > + fd_install(out_fence_fd, sync_file->file); > > > > > > > + args->fence_fd = out_fence_fd; > > > > > > > } > > > > > > > - fd_install(out_fence_fd, sync_file->file); > > > > > > > - args->fence_fd = out_fence_fd; > > > > > > > > > > > > I wonder if instead we should (approximately) undo "drm/msm/submit: > > > > > > Simplify out-fence-fd handling" so that the point that it could fail > > > > > > is moved up ahead of the drm_sched_job_arm()? > > > > > > > > > > Hm yeah. Up to you how you want to paint this shed, I think either is > > > > > fine. > > > > > > > > > > > Also, does the dma_fence_get() work before drm_sched_job_arm()? > > > > > > From > > > > > > a quick look, it looks like it won't, but I'm still playing catchup > > > > > > and haven't had a chance to look at your entire series. If it > > > > > > doesn't > > > > > > work before drm_sched_job_arm(), then there is really no way to > > > > > > prevent a error path between the fence-init and job-submit. > > > > > > > > > > Yes. I thought I've checked that I put the _arm() in the right spot, > > > > > but I guess I screwed up and you need the
[Intel-gfx] ✗ Fi.CI.IGT: failure for remove rcu support from i915_address_space (rev5)
== Series Details == Series: remove rcu support from i915_address_space (rev5) URL : https://patchwork.freedesktop.org/series/93314/ State : failure == Summary == CI Bug Log - changes from CI_DRM_10451_full -> Patchwork_20777_full Summary --- **FAILURE** Serious unknown changes coming with Patchwork_20777_full absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_20777_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_20777_full: ### IGT changes ### Possible regressions * igt@i915_selftest@live@hangcheck: - shard-snb: [PASS][1] -> [INCOMPLETE][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-snb7/igt@i915_selftest@l...@hangcheck.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-snb2/igt@i915_selftest@l...@hangcheck.html Known issues Here are the changes found in Patchwork_20777_full that come from known issues: ### IGT changes ### Issues hit * igt@gem_create@create-massive: - shard-apl: NOTRUN -> [DMESG-WARN][3] ([i915#3002]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-apl7/igt@gem_cre...@create-massive.html * igt@gem_ctx_persistence@idempotent: - shard-snb: NOTRUN -> [SKIP][4] ([fdo#109271] / [i915#1099]) +6 similar issues [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-snb6/igt@gem_ctx_persiste...@idempotent.html * igt@gem_eio@unwedge-stress: - shard-skl: [PASS][5] -> [TIMEOUT][6] ([i915#2369] / [i915#3063]) [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-skl1/igt@gem_...@unwedge-stress.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-skl9/igt@gem_...@unwedge-stress.html - shard-snb: NOTRUN -> [FAIL][7] ([i915#3354]) [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-snb6/igt@gem_...@unwedge-stress.html * igt@gem_exec_fair@basic-deadline: - shard-apl: NOTRUN -> [FAIL][8] ([i915#2846]) [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-apl6/igt@gem_exec_f...@basic-deadline.html * igt@gem_exec_fair@basic-none-rrul@rcs0: - shard-tglb: NOTRUN -> [FAIL][9] ([i915#2842]) [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-tglb1/igt@gem_exec_fair@basic-none-r...@rcs0.html * igt@gem_exec_fair@basic-none-share@rcs0: - shard-iclb: [PASS][10] -> [FAIL][11] ([i915#2842]) [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-iclb1/igt@gem_exec_fair@basic-none-sh...@rcs0.html [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-iclb5/igt@gem_exec_fair@basic-none-sh...@rcs0.html - shard-tglb: [PASS][12] -> [FAIL][13] ([i915#2842]) [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-tglb5/igt@gem_exec_fair@basic-none-sh...@rcs0.html [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-tglb7/igt@gem_exec_fair@basic-none-sh...@rcs0.html * igt@gem_exec_fair@basic-none-solo@rcs0: - shard-glk: [PASS][14] -> [FAIL][15] ([i915#2842]) +1 similar issue [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-glk6/igt@gem_exec_fair@basic-none-s...@rcs0.html [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-glk1/igt@gem_exec_fair@basic-none-s...@rcs0.html * igt@gem_exec_whisper@basic-contexts-all: - shard-glk: [PASS][16] -> [DMESG-WARN][17] ([i915#118] / [i915#95]) [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10451/shard-glk7/igt@gem_exec_whis...@basic-contexts-all.html [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-glk7/igt@gem_exec_whis...@basic-contexts-all.html * igt@gem_userptr_blits@unsync-overlap: - shard-tglb: NOTRUN -> [SKIP][18] ([i915#3297]) [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-tglb1/igt@gem_userptr_bl...@unsync-overlap.html * igt@gem_userptr_blits@unsync-unmap-cycles: - shard-iclb: NOTRUN -> [SKIP][19] ([i915#3297]) [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-iclb6/igt@gem_userptr_bl...@unsync-unmap-cycles.html * igt@gem_userptr_blits@vma-merge: - shard-apl: NOTRUN -> [FAIL][20] ([i915#3318]) [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-apl1/igt@gem_userptr_bl...@vma-merge.html * igt@gen3_render_linear_blits: - shard-tglb: NOTRUN -> [SKIP][21] ([fdo#109289]) +1 similar issue [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20777/shard-tglb1/
Re: [Intel-gfx] [PATCH] drm/i915: Fix syncmap memory leak
On 8/6/2021 11:29, Matthew Brost wrote: On Fri, Aug 06, 2021 at 11:23:06AM -0700, John Harrison wrote: On 7/30/2021 12:53, Matthew Brost wrote: A small race exists between intel_gt_retire_requests_timeout and intel_timeline_exit which could result in the syncmap not getting free'd. Rather than work to hard to seal this race, simply cleanup the free'd -> freed Sure. syncmap on fini. unreferenced object 0x88813bc53b18 (size 96): comm "gem_close_race", pid 5410, jiffies 4294917818 (age 1105.600s) hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 6b 6b 6b 6b 06 00 00 00 backtrace: [<120b863a>] __sync_alloc_leaf+0x1e/0x40 [i915] [<042f6959>] __sync_set+0x1bb/0x240 [i915] [<90f0e90f>] i915_request_await_dma_fence+0x1c7/0x400 [i915] [<56a48219>] i915_request_await_object+0x222/0x360 [i915] [] i915_gem_do_execbuffer+0x1bd0/0x2250 [i915] [<3c9d830f>] i915_gem_execbuffer2_ioctl+0x405/0xce0 [i915] [ ] drm_ioctl_kernel+0xb0/0xf0 [drm] [ ] drm_ioctl+0x305/0x3c0 [drm] [<8b0d8986>] __x64_sys_ioctl+0x71/0xb0 [<76c362a4>] do_syscall_64+0x33/0x80 [ ] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Signed-off-by: Matthew Brost Fixes: 531958f6f357 ("drm/i915/gt: Track timeline activeness in enter/exit") Cc: --- drivers/gpu/drm/i915/gt/intel_timeline.c | 9 + 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c index c4a126c8caef..1257f4f11e66 100644 --- a/drivers/gpu/drm/i915/gt/intel_timeline.c +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c @@ -127,6 +127,15 @@ static void intel_timeline_fini(struct rcu_head *rcu) i915_vma_put(timeline->hwsp_ggtt); i915_active_fini(&timeline->active); + + /* +* A small race exists between intel_gt_retire_requests_timeout and +* intel_timeline_exit which could result in the syncmap not getting +* free'd. Rather than work to hard to seal this race, simply cleanup +* the syncmap on fini. What is the race? I'm going round in circles just trying to work out how intel_gt_retire_requests_timeout is supposed to get to intel_timeline_exit in the first place. intel_gt_retire_requests_timeout increments tl->active_count, active_count == 2 intel_timeline_exit is called, returns on atomic_add_unless, active_count == 1 intel_gt_retire_requests_timeout decrements tl->active_count, active_count == 0 i915_syncmap_free is never called, memory leak Matt Okay. Think I follow it now. Seems like the syncmap free should have been in timeline_fini instead of timeline_exit in the first place? Reviewed-by: John Harrison Also, free'd -> freed. John. +*/ + i915_syncmap_free(&timeline->sync); + kfree(timeline); }
[Intel-gfx] [PATCH] drm/i915: Release ctx->syncobj on final put, not on ctx close
gem context refcounting is another exercise in least locking design it seems, where most things get destroyed upon context closure (which can race with anything really). Only the actual memory allocation and the locks survive while holding a reference. This tripped up Jason when reimplementing the single timeline feature in commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d Author: Jason Ekstrand Date: Thu Jul 8 10:48:12 2021 -0500 drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4) We could fix the bug by holding ctx->mutex, but it's cleaner to just make the context object actually invariant over its _entire_ lifetime. Signed-off-by: Daniel Vetter Fixes: 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)") Cc: Jason Ekstrand Cc: Chris Wilson Cc: Tvrtko Ursulin Cc: Joonas Lahtinen Cc: Matthew Brost Cc: Matthew Auld Cc: Maarten Lankhorst Cc: "Thomas Hellström" Cc: Lionel Landwerlin Cc: Dave Airlie --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 754b9b8d4981..93ba0197d70a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -940,6 +940,9 @@ void i915_gem_context_release(struct kref *ref) trace_i915_context_free(ctx); GEM_BUG_ON(!i915_gem_context_is_closed(ctx)); + if (ctx->syncobj) + drm_syncobj_put(ctx->syncobj); + mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex); @@ -1159,9 +1162,6 @@ static void context_close(struct i915_gem_context *ctx) if (vm) i915_vm_close(vm); - if (ctx->syncobj) - drm_syncobj_put(ctx->syncobj); - ctx->file_priv = ERR_PTR(-EBADF); /* -- 2.32.0
[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: Release ctx->syncobj on final put, not on ctx close
== Series Details == Series: drm/i915: Release ctx->syncobj on final put, not on ctx close URL : https://patchwork.freedesktop.org/series/93470/ State : warning == Summary == $ dim checkpatch origin/drm-tip d1dd9f17231e drm/i915: Release ctx->syncobj on final put, not on ctx close -:17: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("")' - ie: 'commit 00dae4d3d35d ("drm/i915: Implement SINGLE_TIMELINE with a syncobj (v4)")' #17: commit 00dae4d3d35d4f526929633b76e00b0ab4d3970d -:62: WARNING:FROM_SIGN_OFF_MISMATCH: From:/Signed-off-by: email address mismatch: 'From: Daniel Vetter ' != 'Signed-off-by: Daniel Vetter ' total: 1 errors, 1 warnings, 0 checks, 18 lines checked
[Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: Release ctx->syncobj on final put, not on ctx close
== Series Details == Series: drm/i915: Release ctx->syncobj on final put, not on ctx close URL : https://patchwork.freedesktop.org/series/93470/ State : success == Summary == CI Bug Log - changes from CI_DRM_10457 -> Patchwork_20784 Summary --- **SUCCESS** No regressions found. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/index.html Known issues Here are the changes found in Patchwork_20784 that come from known issues: ### IGT changes ### Issues hit * igt@i915_selftest@live@gt_heartbeat: - fi-kbl-soraka: [PASS][1] -> [DMESG-FAIL][2] ([i915#2291] / [i915#541]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/fi-kbl-soraka/igt@i915_selftest@live@gt_heartbeat.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/fi-kbl-soraka/igt@i915_selftest@live@gt_heartbeat.html Possible fixes * igt@gem_exec_suspend@basic-s3: - fi-tgl-1115g4: [FAIL][3] ([i915#1888]) -> [PASS][4] [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/fi-tgl-1115g4/igt@gem_exec_susp...@basic-s3.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/fi-tgl-1115g4/igt@gem_exec_susp...@basic-s3.html * igt@i915_selftest@live@hangcheck: - fi-icl-u2: [DMESG-FAIL][5] -> [PASS][6] [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/fi-icl-u2/igt@i915_selftest@l...@hangcheck.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/fi-icl-u2/igt@i915_selftest@l...@hangcheck.html [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888 [i915#2291]: https://gitlab.freedesktop.org/drm/intel/issues/2291 [i915#541]: https://gitlab.freedesktop.org/drm/intel/issues/541 Participating hosts (37 -> 34) -- Missing(3): fi-bdw-samus fi-bsw-cyan bat-jsl-1 Build changes - * Linux: CI_DRM_10457 -> Patchwork_20784 CI-20190529: 20190529 CI_DRM_10457: 7700f858b68060307b0a7d94377a5d8f64000e5d @ git://anongit.freedesktop.org/gfx-ci/linux IGT_6162: 2f32b9e0da5f1ac9529318dd5b836c8cf4d3c441 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git Patchwork_20784: d1dd9f17231e132d8ea8aeaeb7fffdf45380f3ee @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == d1dd9f17231e drm/i915: Release ctx->syncobj on final put, not on ctx close == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/index.html
[Intel-gfx] ✓ Fi.CI.IGT: success for Begin enabling Xe_HP SDV and DG2 platforms (rev10)
== Series Details == Series: Begin enabling Xe_HP SDV and DG2 platforms (rev10) URL : https://patchwork.freedesktop.org/series/92135/ State : success == Summary == CI Bug Log - changes from CI_DRM_10457_full -> Patchwork_20781_full Summary --- **SUCCESS** No regressions found. Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_20781_full: ### IGT changes ### Suppressed The following results come from untrusted machines, tests, or statuses. They do not affect the overall result. * {igt@gem_userptr_blits@map-fixed-invalidate-overlap@fixed}: - {shard-rkl}:NOTRUN -> [SKIP][1] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-rkl-5/igt@gem_userptr_blits@map-fixed-invalidate-over...@fixed.html Known issues Here are the changes found in Patchwork_20781_full that come from known issues: ### IGT changes ### Issues hit * igt@gem_create@create-massive: - shard-snb: NOTRUN -> [DMESG-WARN][2] ([i915#3002]) [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-snb5/igt@gem_cre...@create-massive.html * igt@gem_ctx_isolation@preservation-s3@vcs0: - shard-kbl: [PASS][3] -> [DMESG-WARN][4] ([i915#180]) +2 similar issues [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-kbl1/igt@gem_ctx_isolation@preservation...@vcs0.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-kbl3/igt@gem_ctx_isolation@preservation...@vcs0.html - shard-skl: [PASS][5] -> [INCOMPLETE][6] ([i915#198]) [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-skl1/igt@gem_ctx_isolation@preservation...@vcs0.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-skl4/igt@gem_ctx_isolation@preservation...@vcs0.html * igt@gem_ctx_persistence@legacy-engines-mixed: - shard-snb: NOTRUN -> [SKIP][7] ([fdo#109271] / [i915#1099]) +1 similar issue [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-snb7/igt@gem_ctx_persiste...@legacy-engines-mixed.html * igt@gem_eio@unwedge-stress: - shard-snb: NOTRUN -> [FAIL][8] ([i915#3354]) [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-snb5/igt@gem_...@unwedge-stress.html * igt@gem_exec_endless@dispatch@rcs0: - shard-tglb: [PASS][9] -> [INCOMPLETE][10] ([i915#3778]) [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-tglb3/igt@gem_exec_endless@dispa...@rcs0.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-tglb3/igt@gem_exec_endless@dispa...@rcs0.html * igt@gem_exec_fair@basic-none-rrul@rcs0: - shard-glk: [PASS][11] -> [FAIL][12] ([i915#2842]) +1 similar issue [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-glk5/igt@gem_exec_fair@basic-none-r...@rcs0.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-glk6/igt@gem_exec_fair@basic-none-r...@rcs0.html * igt@gem_exec_fair@basic-none-share@rcs0: - shard-iclb: [PASS][13] -> [FAIL][14] ([i915#2842]) [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-iclb6/igt@gem_exec_fair@basic-none-sh...@rcs0.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-iclb1/igt@gem_exec_fair@basic-none-sh...@rcs0.html * igt@gem_exec_fair@basic-none@vcs0: - shard-kbl: NOTRUN -> [FAIL][15] ([i915#2842]) +3 similar issues [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-kbl2/igt@gem_exec_fair@basic-n...@vcs0.html * igt@gem_exec_fair@basic-pace@rcs0: - shard-tglb: [PASS][16] -> [FAIL][17] ([i915#2842]) [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-tglb1/igt@gem_exec_fair@basic-p...@rcs0.html [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-tglb7/igt@gem_exec_fair@basic-p...@rcs0.html * igt@gem_exec_fair@basic-pace@vcs1: - shard-kbl: [PASS][18] -> [SKIP][19] ([fdo#109271]) [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-kbl7/igt@gem_exec_fair@basic-p...@vcs1.html [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-kbl3/igt@gem_exec_fair@basic-p...@vcs1.html * igt@gem_exec_fair@basic-pace@vecs0: - shard-kbl: [PASS][20] -> [FAIL][21] ([i915#2842]) +1 similar issue [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-kbl7/igt@gem_exec_fair@basic-p...@vecs0.html [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20781/shard-kbl3/igt@gem_exec_fair@basic-p...@vecs0.html * igt@gem_exec_whisper@basic-queues: - shard-glk: [PASS][22] -> [DMESG-WARN][23] ([i915#118] / [i915#95]) [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-glk1/igt@gem_exec_whis...@basic-queu
[Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915: Only access SFC_DONE when media domain is not fused off
== Series Details == Series: drm/i915: Only access SFC_DONE when media domain is not fused off URL : https://patchwork.freedesktop.org/series/93468/ State : success == Summary == CI Bug Log - changes from CI_DRM_10457_full -> Patchwork_20782_full Summary --- **SUCCESS** No regressions found. Known issues Here are the changes found in Patchwork_20782_full that come from known issues: ### IGT changes ### Issues hit * igt@gem_ctx_persistence@legacy-engines-mixed: - shard-snb: NOTRUN -> [SKIP][1] ([fdo#109271] / [i915#1099]) +3 similar issues [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-snb6/igt@gem_ctx_persiste...@legacy-engines-mixed.html * igt@gem_ctx_sseu@invalid-args: - shard-apl: NOTRUN -> [SKIP][2] ([fdo#109271]) +107 similar issues [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-apl6/igt@gem_ctx_s...@invalid-args.html * igt@gem_eio@unwedge-stress: - shard-tglb: [PASS][3] -> [TIMEOUT][4] ([i915#2369] / [i915#3063] / [i915#3648]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-tglb2/igt@gem_...@unwedge-stress.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-tglb5/igt@gem_...@unwedge-stress.html - shard-snb: NOTRUN -> [FAIL][5] ([i915#3354]) [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-snb7/igt@gem_...@unwedge-stress.html * igt@gem_exec_fair@basic-none-rrul@rcs0: - shard-glk: [PASS][6] -> [FAIL][7] ([i915#2842]) [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-glk5/igt@gem_exec_fair@basic-none-r...@rcs0.html [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-glk3/igt@gem_exec_fair@basic-none-r...@rcs0.html * igt@gem_exec_fair@basic-none-share@rcs0: - shard-iclb: [PASS][8] -> [FAIL][9] ([i915#2842]) [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-iclb6/igt@gem_exec_fair@basic-none-sh...@rcs0.html [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-iclb6/igt@gem_exec_fair@basic-none-sh...@rcs0.html * igt@gem_exec_fair@basic-none@vcs1: - shard-iclb: NOTRUN -> [FAIL][10] ([i915#2842]) [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-iclb2/igt@gem_exec_fair@basic-n...@vcs1.html * igt@gem_exec_fair@basic-pace@rcs0: - shard-tglb: [PASS][11] -> [FAIL][12] ([i915#2842]) [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-tglb1/igt@gem_exec_fair@basic-p...@rcs0.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-tglb3/igt@gem_exec_fair@basic-p...@rcs0.html * igt@gem_exec_fair@basic-pace@vecs0: - shard-kbl: [PASS][13] -> [FAIL][14] ([i915#2842]) +2 similar issues [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-kbl7/igt@gem_exec_fair@basic-p...@vecs0.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-kbl4/igt@gem_exec_fair@basic-p...@vecs0.html * igt@gem_mmap_gtt@big-copy: - shard-glk: [PASS][15] -> [FAIL][16] ([i915#1888] / [i915#307]) [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-glk3/igt@gem_mmap_...@big-copy.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-glk1/igt@gem_mmap_...@big-copy.html * igt@gem_render_copy@x-tiled-to-vebox-yf-tiled: - shard-kbl: NOTRUN -> [SKIP][17] ([fdo#109271]) +72 similar issues [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-kbl1/igt@gem_render_c...@x-tiled-to-vebox-yf-tiled.html * igt@gem_softpin@noreloc-s3: - shard-apl: NOTRUN -> [DMESG-WARN][18] ([i915#180]) [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-apl3/igt@gem_soft...@noreloc-s3.html * igt@gen9_exec_parse@allowed-single: - shard-apl: [PASS][19] -> [DMESG-WARN][20] ([i915#1436] / [i915#716]) [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-apl1/igt@gen9_exec_pa...@allowed-single.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-apl2/igt@gen9_exec_pa...@allowed-single.html * igt@i915_pm_rpm@basic-rte: - shard-apl: NOTRUN -> [FAIL][21] ([i915#579]) [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-apl1/igt@i915_pm_...@basic-rte.html * igt@i915_suspend@fence-restore-untiled: - shard-apl: [PASS][22] -> [DMESG-WARN][23] ([i915#180]) +1 similar issue [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-apl7/igt@i915_susp...@fence-restore-untiled.html [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20782/shard-apl8/igt@i915_susp...@fence-restore-untiled.html * igt@i915_suspend@forcewake: - shard-kbl: [PASS][24] -> [DMESG-WARN][25] ([i915#180]) +4 similar issue
[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: Release ctx->syncobj on final put, not on ctx close
== Series Details == Series: drm/i915: Release ctx->syncobj on final put, not on ctx close URL : https://patchwork.freedesktop.org/series/93470/ State : failure == Summary == CI Bug Log - changes from CI_DRM_10457_full -> Patchwork_20784_full Summary --- **FAILURE** Serious unknown changes coming with Patchwork_20784_full absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_20784_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_20784_full: ### IGT changes ### Possible regressions * igt@gem_ctx_shared@exec-single-timeline@vecs0: - shard-apl: NOTRUN -> [DMESG-WARN][1] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-apl2/igt@gem_ctx_shared@exec-single-timel...@vecs0.html * igt@gem_ctx_shared@q-in-order@bcs0: - shard-apl: [PASS][2] -> [DMESG-WARN][3] +5 similar issues [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-apl1/igt@gem_ctx_shared@q-in-or...@bcs0.html [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-apl8/igt@gem_ctx_shared@q-in-or...@bcs0.html * igt@gem_ctx_shared@q-out-order@bcs0: - shard-iclb: [PASS][4] -> [DMESG-WARN][5] +9 similar issues [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-iclb8/igt@gem_ctx_shared@q-out-or...@bcs0.html [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-iclb6/igt@gem_ctx_shared@q-out-or...@bcs0.html * igt@gem_ctx_shared@q-smoketest-all: - shard-glk: [PASS][6] -> [DMESG-WARN][7] +7 similar issues [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-glk9/igt@gem_ctx_sha...@q-smoketest-all.html [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-glk1/igt@gem_ctx_sha...@q-smoketest-all.html * igt@gem_ctx_shared@single-timeline: - shard-skl: [PASS][8] -> [DMESG-WARN][9] +12 similar issues [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-skl8/igt@gem_ctx_sha...@single-timeline.html [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-skl4/igt@gem_ctx_sha...@single-timeline.html * igt@gem_exec_whisper@basic-queues-all: - shard-tglb: [PASS][10] -> [DMESG-WARN][11] +9 similar issues [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-tglb8/igt@gem_exec_whis...@basic-queues-all.html [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-tglb8/igt@gem_exec_whis...@basic-queues-all.html - shard-kbl: NOTRUN -> [DMESG-WARN][12] +1 similar issue [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-kbl1/igt@gem_exec_whis...@basic-queues-all.html * igt@gem_exec_whisper@basic-queues-priority-all: - shard-kbl: [PASS][13] -> [DMESG-WARN][14] +6 similar issues [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-kbl3/igt@gem_exec_whis...@basic-queues-priority-all.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-kbl4/igt@gem_exec_whis...@basic-queues-priority-all.html Suppressed The following results come from untrusted machines, tests, or statuses. They do not affect the overall result. * igt@gem_ctx_shared@q-independent@bcs0: - {shard-rkl}:[PASS][15] -> [DMESG-WARN][16] +7 similar issues [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-rkl-2/igt@gem_ctx_shared@q-independ...@bcs0.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-rkl-2/igt@gem_ctx_shared@q-independ...@bcs0.html * igt@gem_ctx_shared@q-smoketest@rcs0: - {shard-rkl}:NOTRUN -> [DMESG-WARN][17] +1 similar issue [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-rkl-2/igt@gem_ctx_shared@q-smoket...@rcs0.html Known issues Here are the changes found in Patchwork_20784_full that come from known issues: ### IGT changes ### Issues hit * igt@gem_ctx_isolation@preservation-s3@bcs0: - shard-kbl: [PASS][18] -> [DMESG-WARN][19] ([i915#180]) +2 similar issues [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10457/shard-kbl1/igt@gem_ctx_isolation@preservation...@bcs0.html [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-kbl6/igt@gem_ctx_isolation@preservation...@bcs0.html * igt@gem_ctx_persistence@smoketest: - shard-snb: NOTRUN -> [SKIP][20] ([fdo#109271] / [i915#1099]) +1 similar issue [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20784/shard-snb6/igt@gem_ctx_persiste...@smoketest.html * igt@gem_eio@unwedge-stress: - shard-tglb: [PA
Re: [Intel-gfx] [PATCH] drm/i915: Disable bonding on gen12+ platforms
On Fri, Aug 06, 2021 at 08:54:59PM +0200, Daniel Vetter wrote: > On Fri, Aug 6, 2021 at 8:25 PM John Harrison > wrote: > > On 7/28/2021 12:21, Matthew Brost wrote: > > > Disable bonding on gen12+ platforms aside from ones already supported by > > > the i915 - TGL, RKL, and ADL-S. > > > > > > Signed-off-by: Matthew Brost > > > --- > > > drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++ > > > 1 file changed, 7 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c > > > b/drivers/gpu/drm/i915/gem/i915_gem_context.c > > > index 05c3ee191710..9c3672bac0e2 100644 > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c > > > @@ -446,6 +446,13 @@ set_proto_ctx_engines_bond(struct > > > i915_user_extension __user *base, void *data) > > > u16 idx, num_bonds; > > > int err, n; > > > > > > + if (GRAPHICS_VER(i915) >= 12 && !IS_TIGERLAKE(i915) && > > > + !IS_ROCKETLAKE(i915) && !IS_ALDERLAKE_S(i915)) { > > > + drm_dbg(&i915->drm, > > > + "Bonding on gen12+ aside from TGL, RKL, and ADL_S > > > not allowed\n"); > > I would have said not supported rather than not allowed. Either way: > > Reviewed-by: John Harrison > > Either is fine with me. > > Acked-by: Daniel Vetter > Applied to drm-intel-gt-next (with the suggested debug message wording tweak). Thanks for the patch and reviews. Matt > > > > > + return -ENODEV; > > > + } > > > + > > > if (get_user(idx, &ext->virtual_index)) > > > return -EFAULT; > > > > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch -- Matt Roper Graphics Software Engineer VTT-OSGC Platform Enablement Intel Corporation (916) 356-2795