[Intel-gfx] [PULL] gvt-next-fixes
Hi,

Here's a pull containing only gvt fixes for 5.9-rc1. Two fixes to make guest suspend/resume work gracefully are included.

Thanks
--
The following changes since commit e57bd05ec0d2d82d63725dedf9f5a063f879de25:

  drm/i915: Update DRIVER_DATE to 20200715 (2020-07-15 14:18:02 +0300)

are available in the Git repository at:

  https://github.com/intel/gvt-linux tags/gvt-next-fixes-2020-08-05

for you to fetch changes up to 9e7c0efadb86ddb58965561bbca638d44792d78f:

  drm/i915/gvt: Do not reset pv_notified when vGPU transit from D3->D0 (2020-07-29 14:18:32 +0800)

----------------------------------------------------------------
gvt-next-fixes-2020-08-05

- Fix low performance of guest suspend/resume caused by shadow ppgtt handling (Colin)
- Fix PV notifier handling for guest suspend/resume (Colin)

----------------------------------------------------------------
Colin Xu (2):
      drm/i915/gvt: Do not destroy ppgtt_mm during vGPU D3->D0.
      drm/i915/gvt: Do not reset pv_notified when vGPU transit from D3->D0

 drivers/gpu/drm/i915/gvt/cfg_space.c | 24
 drivers/gpu/drm/i915/gvt/gtt.c       |  2 +-
 drivers/gpu/drm/i915/gvt/gtt.h       |  2 ++
 drivers/gpu/drm/i915/gvt/gvt.h       |  3 +++
 drivers/gpu/drm/i915/gvt/vgpu.c      | 20 +---
 5 files changed, 47 insertions(+), 4 deletions(-)
[Intel-gfx] [PATCH] drm/i915/gt: Prevent immediate reuse of the last context tag
While we only release the context tag after we have processed the context-switch event away from the context, be paranoid in case that value remains live in HW and so avoid reusing the last tag for the next context after a brief idle.

Signed-off-by: Chris Wilson
Cc: Ramalingam C
---
 drivers/gpu/drm/i915/gt/intel_engine_types.h |  1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c          | 20
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index c400aaa2287b..bfa0199b7a2c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -330,6 +330,7 @@ struct intel_engine_cs {
 	atomic_t fw_active;

 	unsigned long context_tag;
+	unsigned long context_last;

 	struct rb_node uabi_node;

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 417f6b0c6c61..f8a0ee67d930 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1335,6 +1335,21 @@ static void intel_context_update_runtime(struct intel_context *ce)
 	ce->runtime.total += dt;
 }

+static unsigned int next_cyclic_tag(struct intel_engine_cs *engine)
+{
+	unsigned long tag, mask = ~0ul << engine->context_last;
+
+	/* Cyclically allocate unused ids, prevent immediate reuse of last */
+	tag = READ_ONCE(engine->context_tag);
+	tag = (tag & mask) ?: tag;
+	GEM_BUG_ON(tag == 0);
+
+	tag = __ffs(tag);
+	clear_bit(tag, &engine->context_tag);
+
+	return engine->context_last = tag + 1;
+}
+
 static inline struct intel_engine_cs *
 __execlists_schedule_in(struct i915_request *rq)
 {
@@ -1355,12 +1370,9 @@ __execlists_schedule_in(struct i915_request *rq)
 		ce->lrc.ccid = ce->tag;
 	} else {
 		/* We don't need a strict matching tag, just different values */
-		unsigned int tag = ffs(READ_ONCE(engine->context_tag));
+		unsigned int tag = next_cyclic_tag(engine);

-		GEM_BUG_ON(tag == 0 || tag >= BITS_PER_LONG);
-		clear_bit(tag - 1, &engine->context_tag);
 		ce->lrc.ccid = tag << (GEN11_SW_CTX_ID_SHIFT - 32);
-		BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID);
 	}
--
2.20.1
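The cyclic allocation above can be modelled standalone; a minimal sketch of the same idea (the names free_mask/last are hypothetical, and the kernel helpers READ_ONCE()/__ffs()/clear_bit() are replaced with plain-C equivalents):

#include <stdio.h>

/* Toy model of the cyclic tag allocator: 'free_mask' has one bit set
 * per free tag, 'last' is the most recently handed-out tag (1-based).
 * We prefer a free tag above 'last' so the tag just released is not
 * immediately reused. */
static unsigned int next_cyclic_tag(unsigned long *free_mask, unsigned int *last)
{
	unsigned long mask = ~0ul << *last;	/* tags strictly above 'last' */
	unsigned long tag = (*free_mask & mask) ? (*free_mask & mask) : *free_mask;
	unsigned int bit = __builtin_ctzl(tag);	/* stand-in for __ffs() */

	*free_mask &= ~(1ul << bit);		/* mark the tag busy */
	return *last = bit + 1;
}

int main(void)
{
	unsigned long free_mask = ~0ul;
	unsigned int last = 0;

	for (int i = 0; i < 4; i++) {
		unsigned int tag = next_cyclic_tag(&free_mask, &last);
		free_mask |= 1ul << (tag - 1);	/* release immediately ... */
		printf("tag %u\n", tag);	/* ... yet never handed out twice in a row */
	}
	return 0;
}

Even though each tag is freed straight away, the demo prints 1, 2, 3, 4: the allocator always skips past the tag it just released, which is exactly the paranoia the patch adds.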
Re: [Intel-gfx] [PATCH] drm/i915/gt: Prevent immediate reuse of the last context tag
Quoting Chris Wilson (2020-08-05 09:37:51)
> While we only release the context tag after we have processed the
> context-switch event away from the context, be paranoid in case that
> value remains live in HW and so avoid reusing the last tag for the next
> context after a brief idle.
>
> Fixes: 5c4a53e3b1cb ("drm/i915/execlists: Track inflight CCID")
> Signed-off-by: Chris Wilson
> Cc: Ramalingam C
Cc: # v5.5+
-Chris
[Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/gt: Prevent immediate reuse of the last context tag
== Series Details ==

Series: drm/i915/gt: Prevent immediate reuse of the last context tag
URL   : https://patchwork.freedesktop.org/series/80277/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.
[Intel-gfx] [PULL] drm-misc-next-fixes
drm-misc-next-fixes-2020-08-05:

drm-misc-next-fixes for v5.9-rc1:
- Fix drm_dp_mst_port refcount leaks in drm_dp_mst_allocate_vcpi
- Fix a fbcon OOB read in fbdev, found by syzbot.
- Mark vga_tryget static as it's not used elsewhere.
- Small fixes to xlnx.
- Remove null check for kfree in drm_dev_release.
- Fix DRM_FORMAT_MOD_AMLOGIC_FBC definition.
- Fix mode initialization in omap_connector_mode_valid().

The following changes since commit 206739119508d5ab4b42ab480ff61a7e6cd72d7c:

  Merge tag 'amd-drm-next-5.9-2020-07-17' of git://people.freedesktop.org/~agd5f/linux into drm-next (2020-07-23 15:38:11 +1000)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-next-fixes-2020-08-05

for you to fetch changes up to a34a0a632dd991a371fec56431d73279f9c54029:

  drm: fix drm_dp_mst_port refcount leaks in drm_dp_mst_allocate_vcpi (2020-08-04 12:21:11 -0400)

----------------------------------------------------------------
drm-misc-next-fixes for v5.9-rc1:
- Fix drm_dp_mst_port refcount leaks in drm_dp_mst_allocate_vcpi
- Fix a fbcon OOB read in fbdev, found by syzbot.
- Mark vga_tryget static as it's not used elsewhere.
- Small fixes to xlnx.
- Remove null check for kfree in drm_dev_release.
- Fix DRM_FORMAT_MOD_AMLOGIC_FBC definition.
- Fix mode initialization in omap_connector_mode_valid().

----------------------------------------------------------------
Christoph Hellwig (1):
      vgaarb: mark vga_tryget static

Colin Ian King (1):
      drm: xlnx: fix spelling mistake "failes" -> "failed"

Hyun Kwon (1):
      drm: xlnx: zynqmp: Use switch - case for link rate downshift

Li Heng (1):
      drm: Remove redundant NULL check

Neil Armstrong (1):
      drm/fourcc: fix Amlogic Video Framebuffer Compression macro

Tetsuo Handa (1):
      fbmem: pull fbcon_update_vcs() out of fb_set_var()

Ville Syrjälä (1):
      drm/omap: Use {} to zero initialize the mode

Wei Yongjun (1):
      drm: xlnx: Fix typo in parameter description

Xin Xiong (1):
      drm: fix drm_dp_mst_port refcount leaks in drm_dp_mst_allocate_vcpi

 drivers/gpu/drm/drm_dp_mst_topology.c    |  7 ---
 drivers/gpu/drm/drm_drv.c                |  3 +--
 drivers/gpu/drm/omapdrm/omap_connector.c |  2 +-
 drivers/gpu/drm/xlnx/zynqmp_dp.c         | 33 +---
 drivers/gpu/vga/vgaarb.c                 |  3 +--
 drivers/video/fbdev/core/fbmem.c         |  8 ++--
 drivers/video/fbdev/core/fbsysfs.c       |  4 ++--
 drivers/video/fbdev/ps3fb.c              |  5 +++--
 include/linux/fb.h                       |  2 --
 include/linux/vgaarb.h                   |  6 --
 include/uapi/drm/drm_fourcc.h            |  2 +-
 11 files changed, 33 insertions(+), 42 deletions(-)
[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/gt: Prevent immediate reuse of the last context tag
== Series Details ==

Series: drm/i915/gt: Prevent immediate reuse of the last context tag
URL   : https://patchwork.freedesktop.org/series/80277/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8844 -> Patchwork_18308

Summary
-------

**FAILURE**

Serious unknown changes coming with Patchwork_18308 absolutely need to be verified manually.

If you think the reported changes have nothing to do with the changes introduced in Patchwork_18308, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI.

External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/index.html

Possible new issues
-------------------

Here are the unknown changes that may have been introduced in Patchwork_18308:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_pm_rpm@module-reload:
    - fi-hsw-4770:        [PASS][1] -> [DMESG-WARN][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8844/fi-hsw-4770/igt@i915_pm_...@module-reload.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-hsw-4770/igt@i915_pm_...@module-reload.html

#### Suppressed ####

The following results come from untrusted machines, tests, or statuses. They do not affect the overall result.

  * igt@i915_pm_rpm@module-reload:
    - {fi-kbl-7560u}:     [PASS][3] -> [DMESG-WARN][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8844/fi-kbl-7560u/igt@i915_pm_...@module-reload.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-kbl-7560u/igt@i915_pm_...@module-reload.html

  * igt@runner@aborted:
    - {fi-tgl-dsi}:       NOTRUN -> [FAIL][5]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-tgl-dsi/igt@run...@aborted.html

Known issues
------------

Here are the changes found in Patchwork_18308 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_busy@basic@flip:
    - fi-kbl-x1275:       [PASS][6] -> [DMESG-WARN][7] ([i915#62] / [i915#92] / [i915#95])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8844/fi-kbl-x1275/igt@kms_busy@ba...@flip.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-kbl-x1275/igt@kms_busy@ba...@flip.html

#### Possible fixes ####

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - fi-bsw-kefka:       [DMESG-WARN][8] ([i915#1982]) -> [PASS][9]
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8844/fi-bsw-kefka/igt@i915_pm_...@basic-pci-d3-state.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-bsw-kefka/igt@i915_pm_...@basic-pci-d3-state.html

  * igt@i915_selftest@live@execlists:
    - fi-kbl-guc:         [INCOMPLETE][10] ([i915#794]) -> [PASS][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8844/fi-kbl-guc/igt@i915_selftest@l...@execlists.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-kbl-guc/igt@i915_selftest@l...@execlists.html

  * igt@i915_selftest@live@gem_contexts:
    - fi-tgl-u2:          [INCOMPLETE][12] ([i915#2045]) -> [PASS][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8844/fi-tgl-u2/igt@i915_selftest@live@gem_contexts.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-tgl-u2/igt@i915_selftest@live@gem_contexts.html

  * igt@kms_flip@basic-flip-vs-wf_vblank@c-edp1:
    - fi-icl-u2:          [DMESG-WARN][14] ([i915#1982]) -> [PASS][15] +1 similar issue
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8844/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vbl...@c-edp1.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vbl...@c-edp1.html

  * igt@kms_flip@basic-flip-vs-wf_vblank@c-hdmi-a2:
    - fi-skl-guc:         [DMESG-WARN][16] ([i915#2203]) -> [PASS][17]
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8844/fi-skl-guc/igt@kms_flip@basic-flip-vs-wf_vbl...@c-hdmi-a2.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-skl-guc/igt@kms_flip@basic-flip-vs-wf_vbl...@c-hdmi-a2.html

#### Warnings ####

  * igt@i915_pm_rpm@module-reload:
    - fi-kbl-guc:         [DMESG-FAIL][18] ([i915#2203]) -> [DMESG-WARN][19] ([i915#2203])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8844/fi-kbl-guc/igt@i915_pm_...@module-reload.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-kbl-guc/igt@i915_pm_...@module-reload.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy:
    - fi-kbl-x1275:       [DMESG-WARN][20] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][21] ([i915#62] / [i915#92]) +6 similar issues
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8844/fi-kbl-x1275/igt@kms_cursor_leg...@basic-busy-flip-before-cursor-legacy.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18308/fi-kbl-x1275/igt@kms_cursor_leg...@basic-busy-flip-befo
[Intel-gfx] [PATCH v3 0/2] HDCP minor refactoring
No functional change.

Anshuman Gupta (2):
  drm/i915/hdcp: Add update_pipe early return
  drm/i915/hdcp: No direct access to power_well desc

 drivers/gpu/drm/i915/display/intel_hdcp.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

--
2.26.2
[Intel-gfx] [PATCH v3 2/2] drm/i915/hdcp: No direct access to power_well desc
HDCP code doesn't need to access power_well internals; instead it should use intel_display_power_well_is_enabled() to get the status of the desired power well. No functional change.

v2:
- used with_intel_runtime_pm instead of get/put. [Jani]
v3:
- rebased.

Cc: Jani Nikula
Signed-off-by: Anshuman Gupta
---
 drivers/gpu/drm/i915/display/intel_hdcp.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_hdcp.c b/drivers/gpu/drm/i915/display/intel_hdcp.c
index a1e0d518e529..e76b049618db 100644
--- a/drivers/gpu/drm/i915/display/intel_hdcp.c
+++ b/drivers/gpu/drm/i915/display/intel_hdcp.c
@@ -148,9 +148,8 @@ static int intel_hdcp_poll_ksv_fifo(struct intel_digital_port *dig_port,
 
 static bool hdcp_key_loadable(struct drm_i915_private *dev_priv)
 {
-	struct i915_power_domains *power_domains = &dev_priv->power_domains;
-	struct i915_power_well *power_well;
 	enum i915_power_well_id id;
+	intel_wakeref_t wakeref;
 	bool enabled = false;
 
 	/*
@@ -162,17 +161,9 @@ static bool hdcp_key_loadable(struct drm_i915_private *dev_priv)
 	else
 		id = SKL_DISP_PW_1;
 
-	mutex_lock(&power_domains->lock);
-
 	/* PG1 (power well #1) needs to be enabled */
-	for_each_power_well(dev_priv, power_well) {
-		if (power_well->desc->id == id) {
-			enabled = power_well->desc->ops->is_enabled(dev_priv,
-								    power_well);
-			break;
-		}
-	}
-	mutex_unlock(&power_domains->lock);
+	with_intel_runtime_pm(&dev_priv->runtime_pm, wakeref)
+		enabled = intel_display_power_well_is_enabled(dev_priv, id);
 
 	/*
 	 * Another req for hdcp key loadability is enabled state of pll for
--
2.26.2
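For reference, with_intel_runtime_pm() is a scope-style helper; a minimal sketch of how such a scope macro can be built in plain C (the names with_resource/acquire/release are hypothetical stand-ins, not the i915 implementation):

#include <stdio.h>

/* Hypothetical acquire/release pair standing in for
 * intel_runtime_pm_get()/intel_runtime_pm_put(). */
static int acquire(void) { puts("acquire"); return 1; }
static void release(int ref) { (void)ref; puts("release"); }

/* A for-loop based scope macro: the body runs exactly once with
 * 'ref' held, and release() runs when the scope is left normally. */
#define with_resource(ref) \
	for ((ref) = acquire(); (ref); release(ref), (ref) = 0)

int main(void)
{
	int wakeref;

	with_resource(wakeref)
		printf("do work under ref %d\n", wakeref);
	return 0;
}

The attraction of this shape is that the caller cannot forget the put: the release is folded into the loop's increment expression, so lock/unlock pairing is enforced syntactically.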
[Intel-gfx] [PATCH v3 1/2] drm/i915/hdcp: Add update_pipe early return
Currently intel_hdcp_update_pipe() is also called for non-HDCP connectors and goes through its conditional code flow, which is completely unnecessary for non-HDCP connectors; therefore it makes sense to have an early return. No functional change.

v2:
- rebased.

Reviewed-by: Uma Shankar
Signed-off-by: Anshuman Gupta
---
 drivers/gpu/drm/i915/display/intel_hdcp.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_hdcp.c b/drivers/gpu/drm/i915/display/intel_hdcp.c
index 89a4d294822d..a1e0d518e529 100644
--- a/drivers/gpu/drm/i915/display/intel_hdcp.c
+++ b/drivers/gpu/drm/i915/display/intel_hdcp.c
@@ -2082,11 +2082,15 @@ void intel_hdcp_update_pipe(struct intel_atomic_state *state,
 	struct intel_connector *connector =
 		to_intel_connector(conn_state->connector);
 	struct intel_hdcp *hdcp = &connector->hdcp;
-	bool content_protection_type_changed =
+	bool content_protection_type_changed, desired_and_not_enabled = false;
+
+	if (!connector->hdcp.shim)
+		return;
+
+	content_protection_type_changed =
 		(conn_state->hdcp_content_type != hdcp->content_type &&
 		 conn_state->content_protection !=
 		 DRM_MODE_CONTENT_PROTECTION_UNDESIRED);
-	bool desired_and_not_enabled = false;
 
 	/*
 	 * During the HDCP encryption session if Type change is requested,
--
2.26.2
Re: [Intel-gfx] [PATCH 28/66] drm/i915/gem: Replace i915_gem_object.mm.mutex with reservation_ww_class
Quoting Thomas Hellström (Intel) (2020-07-29 14:44:41)
>
> On 7/29/20 2:17 PM, Tvrtko Ursulin wrote:
> >
> > On 28/07/2020 12:17, Thomas Hellström (Intel) wrote:
> >> On 7/16/20 5:53 PM, Tvrtko Ursulin wrote:
> >>> On 15/07/2020 16:43, Maarten Lankhorst wrote:
> Op 15-07-2020 om 13:51 schreef Chris Wilson:
> > Our goal is to pull all memory reservations (next iteration
> > obj->ops->get_pages()) under a ww_mutex, and to align those reservations
> > with other drivers, i.e. control all such allocations with the
> > reservation_ww_class. Currently, this is under the purview of the
> > obj->mm.mutex, and while obj->mm remains an embedded struct we can
> > "simply" switch to using the reservation_ww_class obj->base.resv->lock
> >
> > The major consequence is the impact on the shrinker paths as the
> > reservation_ww_class is used to wrap allocations, and a ww_mutex does
> > not support subclassing so we cannot do our usual trick of knowing that
> > we never recurse inside the shrinker and instead have to finish the
> > reclaim with a trylock. This may result in us failing to release the
> > pages after having released the vma. This will have to do until a better
> > idea comes along.
> >
> > However, this step only converts the mutex over and continues to treat
> > everything as a single allocation and pinning the pages. With the
> > ww_mutex in place we can remove the temporary pinning, as we can then
> > reserve all storage en masse.
> >
> > One last thing to do: kill the implicit page pinning for active vma.
> > This will require us to invalidate the vma->pages when the backing store
> > is removed (and we expect that while the vma is active, we mark the
> > backing store as active so that it cannot be removed while the HW is
> > busy.)
> >
> > Signed-off-by: Chris Wilson
> >>>
> >>> [snip]
> >>>
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> > index dc8f052a0ffe..4e928103a38f 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c
> > @@ -47,10 +47,7 @@ static bool unsafe_drop_pages(struct drm_i915_gem_object *obj,
> >  	if (!(shrink & I915_SHRINK_BOUND))
> >  		flags = I915_GEM_OBJECT_UNBIND_TEST;
> > -	if (i915_gem_object_unbind(obj, flags) == 0)
> > -		__i915_gem_object_put_pages(obj);
> > -
> > -	return !i915_gem_object_has_pages(obj);
> > +	return i915_gem_object_unbind(obj, flags) == 0;
> >  }
> >  static void try_to_writeback(struct drm_i915_gem_object *obj,
> > @@ -199,14 +196,14 @@ i915_gem_shrink(struct drm_i915_private *i915,
> >  	spin_unlock_irqrestore(&i915->mm.obj_lock, flags);
> > -	if (unsafe_drop_pages(obj, shrink)) {
> > -		/* May arrive from get_pages on another bo */
> > -		mutex_lock(&obj->mm.lock);
> > +	if (unsafe_drop_pages(obj, shrink) &&
> > +	    i915_gem_object_trylock(obj)) {
> >>>
> Why trylock? Because of the nesting? In that case, still use ww ctx if provided please
> >>>
> >>> By "if provided" you mean for code paths where we are calling the
> >>> shrinker ourselves, as opposed to reclaim, like shmem_get_pages?
> >>>
> >>> That indeed sounds like the right thing to do, since all the
> >>> get_pages from execbuf are in the reservation phase, collecting a
> >>> list of GEM objects to lock, the ones to shrink sound like should be
> >>> on that list.
> >>>
> > +		__i915_gem_object_put_pages(obj);
> >  	if (!i915_gem_object_has_pages(obj)) {
> >  		try_to_writeback(obj, shrink);
> >  		count += obj->base.size >> PAGE_SHIFT;
> >  	}
> > -	mutex_unlock(&obj->mm.lock);
> > +	i915_gem_object_unlock(obj);
> >  }
> >  scanned += obj->base.size >> PAGE_SHIFT;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
> > index ff72ee2fd9cd..ac12e1c20e66 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c
> > @@ -265,7 +265,6 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj,
> >  	 * pages to prevent them being swapped out and causing corruption
> >  	 * due to the change in swizzling.
> >  	 */
> > -	mutex_lock(&obj->mm.lock);
> >  	if (i915_gem_object_has_pages(obj) &&
> >  	    obj->mm.madv == I915_MADV_WILLNEED &&
> >  	    i915->quirks & QUIRK_PIN_SWIZZLED_PAGES) {
> > @@ -280,7 +279,6 @@ i915_gem_object
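The trylock-under-reclaim constraint being debated above reduces to a simple rule: the shrinker may be invoked from any allocation site, possibly while the caller already holds locks of the same class, so it must not block on object locks. A minimal standalone sketch of that pattern (pthread mutexes stand in for obj->base.resv->lock; shrink_one() and the 'pages' field are hypothetical):

#include <pthread.h>
#include <stdio.h>

struct object {
	pthread_mutex_t lock;	/* stand-in for obj->base.resv->lock */
	int pages;
};

/* Hypothetical shrinker step: blocking here could deadlock against a
 * caller who already holds this object's lock, so only trylock and
 * skip contended objects rather than wait for them. */
static int shrink_one(struct object *obj)
{
	if (pthread_mutex_trylock(&obj->lock))
		return 0;		/* busy elsewhere: skip, don't deadlock */

	int freed = obj->pages;
	obj->pages = 0;			/* drop the backing pages */
	pthread_mutex_unlock(&obj->lock);
	return freed;
}

int main(void)
{
	struct object obj = { PTHREAD_MUTEX_INITIALIZER, 32 };

	printf("freed %d pages\n", shrink_one(&obj));
	return 0;
}

Tvrtko's counter-proposal in the thread is the refinement: when the driver calls the shrinker itself (not via reclaim) it already owns a ww acquire context, and passing that context in would let the shrinker lock contended objects safely instead of skipping them.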
[Intel-gfx] [PATCH 24/37] drm/i915/gem: Reintroduce multiple passes for reloc processing
The prospect of locking the entire submission sequence under a wide ww_mutex re-imposes some key restrictions, in particular that we must not call copy_(from|to)_user underneath the mutex (as the fault handlers themselves may need to take the ww_mutex). To satisfy this requirement, we need to split the relocation handling into multiple phases again.

After dropping the reservations, we need to allocate enough buffer space to both copy the relocations from userspace into, and to serve as the relocation command buffer. Once we have finished copying the relocations, we can then re-acquire all the objects for the execbuf and rebind them, including our new relocation objects. After we have bound all the new and old objects into their final locations, we can then convert the relocation entries into the GPU commands to update the relocated vma. Finally, once it is all over and we have dropped the ww_mutex for the last time, we can then complete the update of the user relocation entries.

Signed-off-by: Chris Wilson
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 883 +-
 .../i915/gem/selftests/i915_gem_execbuffer.c | 206 ++--
 .../drm/i915/gt/intel_gt_buffer_pool_types.h |   2 +-
 3 files changed, 585 insertions(+), 506 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 0839397c7e50..58e40348b551 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -59,6 +59,20 @@ struct eb_vma_array {
 	struct eb_vma vma[];
 };
 
+struct eb_relocs_link {
+	unsigned long rsvd; /* overwritten by MI_BATCH_BUFFER_END */
+	struct i915_vma *vma;
+};
+
+struct eb_relocs {
+	struct i915_vma *head;
+	struct drm_i915_gem_relocation_entry *map;
+	unsigned int pos;
+	unsigned int max;
+
+	unsigned int bufsz;
+};
+
 #define __EXEC_OBJECT_HAS_PIN		BIT(31)
 #define __EXEC_OBJECT_HAS_FENCE	BIT(30)
 #define __EXEC_OBJECT_NEEDS_MAP	BIT(29)
@@ -250,6 +264,7 @@ struct i915_execbuffer {
 	struct intel_engine_cs *engine; /** engine to queue the request to */
 	struct intel_context *context; /* logical state for the request */
+	struct intel_context *reloc_context; /* distinct context for relocs */
 	struct i915_gem_context *gem_context; /** caller's context */
 	struct i915_request *request; /** our request to build */
@@ -261,27 +276,11 @@ struct i915_execbuffer {
 	/** list of all vma required to be bound for this execbuf */
 	struct list_head bind_list;
 
-	/** list of vma that have execobj.relocation_count */
-	struct list_head relocs_list;
-
 	struct list_head submit_list;
 
-	/**
-	 * Track the most recently used object for relocations, as we
-	 * frequently have to perform multiple relocations within the same
-	 * obj/page
-	 */
-	struct reloc_cache {
-		struct drm_mm_node node; /** temporary GTT binding */
-
-		struct intel_context *ce;
-
-		struct i915_vma *target;
-		struct i915_request *rq;
-		struct i915_vma *rq_vma;
-		u32 *rq_cmd;
-		unsigned int rq_size;
-	} reloc_cache;
+	/** list of vma that have execobj.relocation_count */
+	struct list_head relocs_list;
+	unsigned long relocs_count;
 
 	struct eb_cmdparser {
 		struct eb_vma *shadow;
@@ -297,7 +296,6 @@ struct i915_execbuffer {
 	unsigned int gen; /** Cached value of INTEL_GEN */
 	bool use_64bit_reloc : 1;
-	bool has_llc : 1;
 	bool has_fence : 1;
 	bool needs_unfenced : 1;
@@ -485,6 +483,7 @@ static int eb_create(struct i915_execbuffer *eb)
 	INIT_LIST_HEAD(&eb->bind_list);
 	INIT_LIST_HEAD(&eb->submit_list);
 	INIT_LIST_HEAD(&eb->relocs_list);
+	eb->relocs_count = 0;
 
 	return 0;
 }
@@ -631,8 +630,10 @@ eb_add_vma(struct i915_execbuffer *eb,
 	list_add_tail(&ev->bind_link, &eb->bind_list);
 	list_add_tail(&ev->submit_link, &eb->submit_list);
 
-	if (entry->relocation_count)
+	if (entry->relocation_count) {
 		list_add_tail(&ev->reloc_link, &eb->relocs_list);
+		eb->relocs_count += entry->relocation_count;
+	}
 
 	/*
 	 * SNA is doing fancy tricks with compressing batch buffers, which leads
@@ -1889,8 +1890,6 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
 
 static void eb_destroy(const struct i915_execbuffer *eb)
 {
-	GEM_BUG_ON(eb->reloc_cache.rq);
-
 	eb_vma_array_put(eb->array);
 
 	if (eb->lut_size > 0)
 		kfree(eb->buckets);
@@ -1908,90 +1907,11 @@ static void eb_info_init(struct i915_execbuffer *eb,
 {
 	/* Must be a variable in the struct to allow GCC to unroll. */
 	eb->gen = INTEL_GEN(i915);
-	eb->h
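The ordering constraint the commit message describes (no copy_from_user while holding the ww_mutex, because the fault handler may itself need that lock) reduces to a simple two-phase pattern; a minimal standalone sketch with a plain mutex standing in for the ww_mutex and memcpy() standing in for copy_from_user() (all names here are hypothetical):

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static pthread_mutex_t reservation = PTHREAD_MUTEX_INITIALIZER;

/* Phase 1: copy the user's relocation entries into a kernel-side
 * buffer *before* taking the reservation lock; the copy may fault,
 * and faulting must never happen under the lock. */
static int *copy_relocs(const int *user, size_t n)
{
	int *buf = malloc(n * sizeof(*buf));

	if (buf)
		memcpy(buf, user, n * sizeof(*buf));	/* stand-in for copy_from_user() */
	return buf;
}

/* Phase 2: with the copy safely in hand, take the lock and apply. */
static void apply_relocs(int *target, const int *relocs, size_t n)
{
	pthread_mutex_lock(&reservation);
	for (size_t i = 0; i < n; i++)
		target[relocs[i]] += 1;		/* pretend relocation fixup */
	pthread_mutex_unlock(&reservation);
}

int main(void)
{
	int user_relocs[] = { 0, 2 };		/* "userspace" input */
	int target[4] = { 0 };
	int *relocs = copy_relocs(user_relocs, 2);

	if (relocs) {
		apply_relocs(target, relocs, 2);
		printf("%d %d %d %d\n", target[0], target[1], target[2], target[3]);
		free(relocs);
	}
	return 0;
}

The patch's third phase (writing back the updated user entries) follows the mirror-image rule: it runs only after the lock has been dropped for the last time.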
[Intel-gfx] [PATCH 21/37] drm/i915/gem: Include cmdparser in common execbuf pinning
Pull the cmdparser allocations into the reservation phase, and then they are included in the common vma pinning pass.

Signed-off-by: Chris Wilson
Reviewed-by: Thomas Hellström
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 360 +++---
 drivers/gpu/drm/i915/gem/i915_gem_object.h |  10 +
 drivers/gpu/drm/i915/i915_cmd_parser.c     |  21 +-
 3 files changed, 230 insertions(+), 161 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 4cdaf5d81ef1..236d4ad3516b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -25,6 +25,7 @@
 #include "i915_gem_clflush.h"
 #include "i915_gem_context.h"
 #include "i915_gem_ioctls.h"
+#include "i915_memcpy.h"
 #include "i915_sw_fence_work.h"
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
@@ -53,6 +54,7 @@ struct eb_bind_vma {
 
 struct eb_vma_array {
 	struct kref kref;
+	struct list_head aux_list;
 	struct eb_vma vma[];
 };
@@ -254,7 +256,6 @@ struct i915_execbuffer {
 	struct i915_request *request; /** our request to build */
 	struct eb_vma *batch; /** identity of the batch obj/vma */
-	struct i915_vma *trampoline; /** trampoline used for chaining */
 
 	/** actual size of execobj[] as we may extend it for the cmdparser */
 	unsigned int buffer_count;
@@ -284,6 +285,11 @@ struct i915_execbuffer {
 		unsigned int rq_size;
 	} reloc_cache;
 
+	struct eb_cmdparser {
+		struct eb_vma *shadow;
+		struct eb_vma *trampoline;
+	} parser;
+
 	u64 invalid_flags; /** Set of execobj.flags that are invalid */
 	u32 context_flags; /** Set of execobj.flags to insert from the ctx */
@@ -310,6 +316,10 @@ struct i915_execbuffer {
 	unsigned long num_fences;
 };
 
+static struct drm_i915_gem_exec_object2 no_entry = {
+	.offset = -1ull
+};
+
 static inline bool eb_use_cmdparser(const struct i915_execbuffer *eb)
 {
 	return intel_engine_requires_cmd_parser(eb->engine) ||
@@ -326,6 +336,7 @@ static struct eb_vma_array *eb_vma_array_create(unsigned int count)
 		return NULL;
 
 	kref_init(&arr->kref);
+	INIT_LIST_HEAD(&arr->aux_list);
 	arr->vma[0].vma = NULL;
 
 	return arr;
@@ -351,16 +362,31 @@ static inline void eb_unreserve_vma(struct eb_vma *ev)
 		   __EXEC_OBJECT_HAS_FENCE);
 }
 
+static void eb_vma_destroy(struct eb_vma *ev)
+{
+	eb_unreserve_vma(ev);
+	i915_vma_put(ev->vma);
+}
+
+static void eb_destroy_aux(struct eb_vma_array *arr)
+{
+	struct eb_vma *ev, *en;
+
+	list_for_each_entry_safe(ev, en, &arr->aux_list, reloc_link) {
+		eb_vma_destroy(ev);
+		kfree(ev);
+	}
+}
+
 static void eb_vma_array_destroy(struct kref *kref)
 {
 	struct eb_vma_array *arr = container_of(kref, typeof(*arr), kref);
-	struct eb_vma *ev = arr->vma;
+	struct eb_vma *ev;
 
-	while (ev->vma) {
-		eb_unreserve_vma(ev);
-		i915_vma_put(ev->vma);
-		ev++;
-	}
+	eb_destroy_aux(arr);
+
+	for (ev = arr->vma; ev->vma; ev++)
+		eb_vma_destroy(ev);
 
 	kvfree(arr);
 }
@@ -408,8 +434,8 @@ eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire)
 
 static int eb_create(struct i915_execbuffer *eb)
 {
-	/* Allocate an extra slot for use by the command parser + sentinel */
-	eb->array = eb_vma_array_create(eb->buffer_count + 2);
+	/* Allocate an extra slot for use by the sentinel */
+	eb->array = eb_vma_array_create(eb->buffer_count + 1);
 	if (!eb->array)
 		return -ENOMEM;
@@ -1076,7 +1102,7 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind)
 	GEM_BUG_ON(!(drm_mm_node_allocated(&vma->node) ^
 		     drm_mm_node_allocated(&bind->hole)));
 
-	if (entry->offset != vma->node.start) {
+	if (entry != &no_entry && entry->offset != vma->node.start) {
 		entry->offset = vma->node.start | UPDATE;
 		*work->p_flags |= __EXEC_HAS_RELOC;
 	}
@@ -1369,7 +1395,8 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 		struct i915_vma *vma = ev->vma;
 
 		if (eb_pin_vma_inplace(eb, entry, ev)) {
-			if (entry->offset != vma->node.start) {
+			if (entry != &no_entry &&
+			    entry->offset != vma->node.start) {
 				entry->offset = vma->node.start | UPDATE;
 				eb->args->flags |= __EXEC_HAS_RELOC;
 			}
@@ -1540,6 +1567,113 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 	} while (1);
 }
 
+static int eb_alloc_cmdparser(struct i915_execbuffer *eb)
+{
+	struct intel_gt_buffer_pool_node *
[Intel-gfx] [PATCH 32/37] drm/i915: Specialise GGTT binding
The Global GTT mappings do not require any backing storage for the page directories and so do not need extensive support for preallocations, or for handling multiple bindings en masse. The Global GTT bindings also need to take into account an eviction strategy for pinned vma, which we want to explicitly avoid for user bindings. It is easier to specialise i915_ggtt_pin() to keep alive the pages/address as they are used by HW in its private GTT, while we deconstruct and rebuild i915_vma_pin().

Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c           |   7 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c   |   7 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.h   |   4 +-
 drivers/gpu/drm/i915/gt/selftest_context.c     |   2 +-
 .../drm/i915/gt/selftest_engine_heartbeat.c    |   7 +-
 drivers/gpu/drm/i915/i915_active.c             |   2 +-
 drivers/gpu/drm/i915/i915_vma.c                | 180 --
 drivers/gpu/drm/i915/i915_vma.h                |   1 +
 .../gpu/drm/i915/selftests/i915_gem_evict.c    | 151 ---
 9 files changed, 180 insertions(+), 181 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index fae75830494d..3aadb3c80794 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -390,8 +390,11 @@ int gen6_ppgtt_pin(struct i915_ppgtt *base)
 	 * size. We allocate at the top of the GTT to avoid fragmentation.
 	 */
 	err = 0;
-	if (!atomic_read(&ppgtt->pin_count))
-		err = i915_ggtt_pin(ppgtt->vma, GEN6_PD_ALIGN, PIN_HIGH);
+	if (!atomic_read(&ppgtt->pin_count)) {
+		err = i915_ggtt_pin_locked(ppgtt->vma, GEN6_PD_ALIGN, PIN_HIGH);
+		if (err == 0)
+			err = i915_vma_wait_for_bind(ppgtt->vma);
+	}
 	if (!err)
 		atomic_inc(&ppgtt->pin_count);
 	mutex_unlock(&ppgtt->pin_mutex);

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 377cbfdb3355..382b0ced18e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -255,7 +255,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
 	return err;
 }
 
-int intel_engine_flush_barriers(struct intel_engine_cs *engine)
+int intel_engine_flush_barriers(struct intel_engine_cs *engine, gfp_t gfp)
 {
 	struct i915_sched_attr attr = {
 		.priority = I915_USER_PRIORITY(I915_PRIORITY_MIN),
@@ -270,12 +270,13 @@ int intel_engine_flush_barriers(struct intel_engine_cs *engine)
 	if (!intel_engine_pm_get_if_awake(engine))
 		return 0;
 
-	if (mutex_lock_interruptible(&ce->timeline->mutex)) {
+	if (mutex_lock_interruptible_nested(&ce->timeline->mutex,
+					    !gfpflags_allow_blocking(gfp))) {
 		err = -EINTR;
 		goto out_rpm;
 	}
 
-	rq = heartbeat_create(ce, GFP_KERNEL);
+	rq = heartbeat_create(ce, gfp);
 	if (IS_ERR(rq)) {
 		err = PTR_ERR(rq);
 		goto out_unlock;

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
index a7b8c0f9e005..996e12e7ccf8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.h
@@ -7,6 +7,8 @@
 #ifndef INTEL_ENGINE_HEARTBEAT_H
 #define INTEL_ENGINE_HEARTBEAT_H
 
+#include
+
 struct intel_engine_cs;
 
 void intel_engine_init_heartbeat(struct intel_engine_cs *engine);
@@ -18,6 +20,6 @@ void intel_engine_park_heartbeat(struct intel_engine_cs *engine);
 void intel_engine_unpark_heartbeat(struct intel_engine_cs *engine);
 
 int intel_engine_pulse(struct intel_engine_cs *engine);
-int intel_engine_flush_barriers(struct intel_engine_cs *engine);
+int intel_engine_flush_barriers(struct intel_engine_cs *engine, gfp_t gfp);
 
 #endif /* INTEL_ENGINE_HEARTBEAT_H */

diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
index 1f4020e906a8..e97e522f947b 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -261,7 +261,7 @@ static int __live_active_context(struct intel_engine_cs *engine)
 	}
 
 	/* Now make sure our idle-barriers are flushed */
-	err = intel_engine_flush_barriers(engine);
+	err = intel_engine_flush_barriers(engine, GFP_KERNEL);
 	if (err)
 		goto err;

diff --git a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
index e73854dd2fe0..d22a7956c9a5 100644
--- a/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/selftest_engine_heartbeat.c
@@ -146,6 +146,11 @@ static int __live_idle_pulse(struct intel_engine_cs *engine,
 	return err;
 }
[Intel-gfx] [PATCH 00/37] Replace obj->mm.lock with reservation_ww_class
Long story short, we need to manage evictions using dma_resv & dma_fence tracking. The backing storage will then be managed using the ww_mutex borrowed from (and shared via) obj->base.resv, rather than the current obj->mm.lock.

Skipping over the breadcrumbs, the first step is to remove the final crutches of struct_mutex from execbuf and to broaden the hold for the dma-resv to guard not just publishing the dma-fences, but for the duration of the execbuf submission (holding all objects and their backing store from the point of acquisition to publishing of the final GPU work, after which the guard is delegated to the dma-fences).

This is of course made complicated by our history. On top of the user's objects, we also have the HW/kernel objects with their own lifetimes, and a bunch of auxiliary objects used for working around unhappy HW and for providing the legacy relocation mechanism. We add every auxiliary object to the list of user objects required, and attempt to acquire them en masse. Since all the objects can be known a priori, we can build a list of those objects and pass that to a routine that can resolve the -EDEADLK (and evictions). [To avoid relocations imposing a penalty on sane userspace that avoids them, we do not touch any relocations until necessary, at which point we have to unroll the state, and rebuild a new list with more auxiliary buffers to accommodate the extra copy_from_user.]

More examples are included as to how we can break down operations involving multiple objects into an acquire phase prior to those operations, keeping the -EDEADLK handling under control. A sketch of that acquire/backoff shape follows below.

execbuf is the unique interface in that it deals with multiple user and kernel buffers. After that, we have callers that in principle care about accessing a single buffer, and so can be migrated over to a helper that permits only holding one such buffer at a time. That enables us to swap out obj->mm.lock for obj->base.resv->lock, use lockdep to spot illegal nesting, and throw away the temporary pins by replacing them with holding the ww_mutex for the duration instead.

What's changed? Some patch splitting, and we need to pull in Matthew's patch to map the page directories under the ww_mutex.
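The acquire-then-backoff strategy described above, reduced to a standalone toy: plain trylock/unwind on pthread mutexes rather than the real reservation_ww_class, and the "object list" is just an array (all names are hypothetical):

#include <pthread.h>
#include <stdio.h>

/* Toy analogue of the ww_mutex acquire phase: try to take every lock
 * in the list; on contention (the "-EDEADLK" case), drop everything
 * acquired so far and start over. The real reservation_ww_class adds
 * fair ordering via acquire-context ages so one contender always wins. */
static int acquire_all(pthread_mutex_t **locks, int n)
{
	for (;;) {
		int i, blocked = -1;

		for (i = 0; i < n; i++) {
			if (pthread_mutex_trylock(locks[i])) {
				blocked = i;
				break;
			}
		}
		if (blocked < 0)
			return 0;		/* whole list acquired */

		while (--i >= 0)
			pthread_mutex_unlock(locks[i]);	/* backoff: unwind */
		/* In the kernel this is where we would sleep on the
		 * contended lock (ww_mutex_lock_slow) before retrying. */
	}
}

int main(void)
{
	pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
	pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;
	pthread_mutex_t *objs[] = { &a, &b };

	if (acquire_all(objs, 2) == 0)
		puts("all objects acquired");
	pthread_mutex_unlock(&a);
	pthread_mutex_unlock(&b);
	return 0;
}

The key property the series relies on is that the whole set of objects (user, kernel, and auxiliary) is known before any lock is taken, so the retry loop can always restart from a clean slate.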
[Intel-gfx] [PATCH 04/37] drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission
Move the slow register write and readback out of the critical path for execlists submission and delay it until the following worker, shaving off around 200us. Note that the same signal_irq_work() is allowed to run concurrently on each CPU (it will only be queued once, but once running it can be requeued and reexecuted), so we have to remember to lock the global interactions, as we cannot rely on the signal_irq_work() itself providing the serialisation (in contrast to a tasklet).

Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 72 ++---
 drivers/gpu/drm/i915/gt/intel_engine_pm.h   |  5 ++
 2 files changed, 52 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index d8b206e53660..dee6d5c9b413 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -30,6 +30,7 @@
 #include "i915_trace.h"
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
+#include "intel_engine_pm.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
@@ -57,12 +58,10 @@ static void irq_disable(struct intel_engine_cs *engine)
 
 static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
-	lockdep_assert_held(&b->irq_lock);
-
-	if (!b->irq_engine || b->irq_armed)
+	if (!b->irq_engine)
 		return;
 
-	if (!intel_gt_pm_get_if_awake(b->irq_engine->gt))
+	if (GEM_WARN_ON(!intel_gt_pm_get_if_awake(b->irq_engine->gt)))
 		return;
 
 	/*
@@ -83,15 +82,13 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 	if (!b->irq_enabled++)
 		irq_enable(b->irq_engine);
+
+	/* Requests may have completed before we could enable the interrupt. */
+	irq_work_queue(&b->irq_work);
 }
 
 static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 {
-	lockdep_assert_held(&b->irq_lock);
-
-	if (!b->irq_engine || !b->irq_armed)
-		return;
-
 	GEM_BUG_ON(!b->irq_enabled);
 	if (!--b->irq_enabled)
 		irq_disable(b->irq_engine);
@@ -105,8 +102,6 @@ static void add_signaling_context(struct intel_breadcrumbs *b,
 {
 	intel_context_get(ce);
 	list_add_tail(&ce->signal_link, &b->signalers);
-	if (list_is_first(&ce->signal_link, &b->signalers))
-		__intel_breadcrumbs_arm_irq(b);
 }
 
 static void remove_signaling_context(struct intel_breadcrumbs *b,
@@ -197,7 +192,30 @@ static void signal_irq_work(struct irq_work *work)
 
 	spin_lock(&b->irq_lock);
 
-	if (list_empty(&b->signalers))
+	/*
+	 * Keep the irq armed until the interrupt after all listeners are gone.
+	 *
+	 * Enabling/disabling the interrupt is rather costly, roughly a couple
+	 * of hundred microseconds. If we are proactive and enable/disable
+	 * the interrupt around every request that wants a breadcrumb, we
+	 * quickly drown in the extra orders of magnitude of latency imposed
+	 * on request submission.
+	 *
+	 * So we try to be lazy, and keep the interrupts enabled until no
+	 * more listeners appear within a breadcrumb interrupt interval (that
+	 * is until a request completes that no one cares about). The
+	 * observation is that listeners come in batches, and will often
+	 * listen to a bunch of requests in succession.
+	 *
+	 * We also try to avoid raising too many interrupts, as they may
+	 * be generated by userspace batches and it is unfortunately rather
+	 * too easy to drown the CPU under a flood of GPU interrupts. Thus
+	 * whenever no one appears to be listening, we turn off the interrupts.
+	 * Fewer interrupts should conserve power -- at the very least, fewer
+	 * interrupt draw less ire from other users of the system and tools
+	 * like powertop.
+	 */
+	if (b->irq_armed && list_empty(&b->signalers))
 		__intel_breadcrumbs_disarm_irq(b);
 
 	list_splice_init(&b->signaled_requests, &signal);
@@ -251,6 +269,15 @@ static void signal_irq_work(struct irq_work *work)
 		i915_request_put(rq);
 	}
+
+	if (!READ_ONCE(b->irq_armed) && !list_empty(&b->signalers)) {
+		spin_lock(&b->irq_lock);
+		if (!b->irq_armed)
+			__intel_breadcrumbs_arm_irq(b);
+		spin_unlock(&b->irq_lock);
+	}
+	if (READ_ONCE(b->irq_armed) && intel_engine_is_parking(b->irq_engine))
+		irq_work_queue(&b->irq_work); /* flush the signalers */
 }
 
 struct intel_breadcrumbs *
@@ -292,16 +319,8 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
 
 void intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 {
-	unsigned long flags;
-
-	if (!READ_ONCE(b->irq_armed))
-		return;
-
-	spin_
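The lazy disarm heuristic documented in the comment above can be modelled in a few lines; a toy standalone sketch (the names armed/listeners/irq_fired are hypothetical and the interrupt is simulated by a plain function call):

#include <stdbool.h>
#include <stdio.h>

static bool armed;
static int listeners;

/* Toy model: the irq stays armed across bursts of listeners and is
 * only disarmed when an interrupt fires and finds no one listening,
 * i.e. the disarm cost is paid once per quiet interval, not once per
 * request. */
static void irq_fired(void)
{
	if (listeners) {
		printf("signal %d listeners\n", listeners);
		listeners = 0;		/* all requests completed */
	} else if (armed) {
		armed = false;		/* quiet interval: now pay the disarm cost */
		puts("disarm");
	}
}

static void add_listener(void)
{
	listeners++;
	if (!armed) {
		armed = true;		/* pay the (expensive) arming cost once */
		puts("arm");
	}
}

int main(void)
{
	add_listener();	/* arm */
	irq_fired();	/* signal, stay armed */
	add_listener();	/* no re-arm cost: batching pays off */
	irq_fired();	/* signal */
	irq_fired();	/* idle interrupt: disarm */
	return 0;
}

The trace (arm, signal, signal, disarm) shows why listeners arriving in batches amortise the ~200us register cost across the whole burst.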
[Intel-gfx] [PATCH 06/37] drm/i915/gt: Don't cancel the interrupt shadow too early
We currently want to keep the interrupt enabled until the interrupt after which we have no more work to do. This heuristic was broken by us kicking the irq-work on adding a completed request without attaching a signaler -- hence it appearing to the irq-worker that an interrupt had fired when we were idle.

Fixes: bda4d4db6dd6 ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs")
Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 9710d09e7670..ae8895b48eca 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -216,7 +216,7 @@ static void signal_irq_work(struct irq_work *work)
 	 * interrupt draw less ire from other users of the system and tools
 	 * like powertop.
 	 */
-	if (b->irq_armed && list_empty(&b->signalers))
+	if (!signal && b->irq_armed && list_empty(&b->signalers))
 		__intel_breadcrumbs_disarm_irq(b);
--
2.20.1
[Intel-gfx] [PATCH 30/37] drm/i915: Hold wakeref for the duration of the vma GGTT binding
Now that we have pushed the binding itself outside of the vm->mutex, we are clear of the potential wakeref inversions and can take the wakeref around the actual duration of the HW interaction.

Signed-off-by: Chris Wilson
Reviewed-by: Thomas Hellström
---
 drivers/gpu/drm/i915/gt/intel_ggtt.c | 39
 drivers/gpu/drm/i915/i915_vma.c      |  6 -
 2 files changed, 22 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 92b6cc754d5b..a2c7c55b358d 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -434,27 +434,39 @@ static void i915_ggtt_clear_range(struct i915_address_space *vm,
 	intel_gtt_clear_range(start >> PAGE_SHIFT, length >> PAGE_SHIFT);
 }
 
-static void ggtt_bind_vma(struct i915_address_space *vm,
-			  struct i915_vm_pt_stash *stash,
-			  struct i915_vma *vma,
-			  enum i915_cache_level cache_level,
-			  u32 flags)
+static void __ggtt_bind_vma(struct i915_address_space *vm,
+			    struct i915_vm_pt_stash *stash,
+			    struct i915_vma *vma,
+			    enum i915_cache_level cache_level,
+			    u32 flags)
 {
 	struct drm_i915_gem_object *obj = vma->obj;
+	intel_wakeref_t wakeref;
 	u32 pte_flags;
 
-	if (i915_vma_is_bound(vma, ~flags & I915_VMA_BIND_MASK))
-		return;
-
 	/* Applicable to VLV (gen8+ do not support RO in the GGTT) */
 	pte_flags = 0;
 	if (i915_gem_object_is_readonly(obj))
 		pte_flags |= PTE_READ_ONLY;
 
-	vm->insert_entries(vm, vma, cache_level, pte_flags);
+	with_intel_runtime_pm(vm->gt->uncore->rpm, wakeref)
+		vm->insert_entries(vm, vma, cache_level, pte_flags);
+
 	vma->page_sizes.gtt = I915_GTT_PAGE_SIZE;
 }
 
+static void ggtt_bind_vma(struct i915_address_space *vm,
+			  struct i915_vm_pt_stash *stash,
+			  struct i915_vma *vma,
+			  enum i915_cache_level cache_level,
+			  u32 flags)
+{
+	if (i915_vma_is_bound(vma, ~flags & I915_VMA_BIND_MASK))
+		return;
+
+	__ggtt_bind_vma(vm, stash, vma, cache_level, flags);
+}
+
 static void ggtt_unbind_vma(struct i915_address_space *vm, struct i915_vma *vma)
 {
 	vm->clear_range(vm, vma->node.start, vma->size);
@@ -571,19 +583,12 @@ static void aliasing_gtt_bind_vma(struct i915_address_space *vm,
 				  enum i915_cache_level cache_level,
 				  u32 flags)
 {
-	u32 pte_flags;
-
-	/* Currently applicable only to VLV */
-	pte_flags = 0;
-	if (i915_gem_object_is_readonly(vma->obj))
-		pte_flags |= PTE_READ_ONLY;
-
 	if (flags & I915_VMA_LOCAL_BIND)
 		ppgtt_bind_vma(&i915_vm_to_ggtt(vm)->alias->vm,
 			       stash, vma, cache_level, flags);
 
 	if (flags & I915_VMA_GLOBAL_BIND)
-		vm->insert_entries(vm, vma, cache_level, pte_flags);
+		__ggtt_bind_vma(vm, stash, vma, cache_level, flags);
 }
 
 static void aliasing_gtt_unbind_vma(struct i915_address_space *vm,

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index 40e38b533b59..320f6f8ec042 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -794,7 +794,6 @@ static int __wait_for_unbind(struct i915_vma *vma, unsigned int flags)
 int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 {
 	struct i915_vma_work *work = NULL;
-	intel_wakeref_t wakeref = 0;
 	unsigned int bound;
 	int err;
 
@@ -813,9 +812,6 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 		return err;
 	}
 
-	if (flags & PIN_GLOBAL)
-		wakeref = intel_runtime_pm_get(&vma->vm->i915->runtime_pm);
-
 	err = __wait_for_unbind(vma, flags);
 	if (err)
 		goto err_rpm;
@@ -925,8 +921,6 @@ int i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags)
 err_fence:
 	dma_fence_work_commit_imm(&work->base);
 err_rpm:
-	if (wakeref)
-		intel_runtime_pm_put(&vma->vm->i915->runtime_pm, wakeref);
 	if (vma->obj)
 		i915_gem_object_unpin_pages(vma->obj);
 	return err;
--
2.20.1
[Intel-gfx] [PATCH 36/37] drm/i915/display: Drop object lock from intel_unpin_fb_vma
The obj->resv->lock does not serialise anything within intel_unpin_fb_vma(), so remove the redundant contention point.

Signed-off-by: Chris Wilson
---
 drivers/gpu/drm/i915/display/intel_display.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 522c772a2111..a70b41b63650 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -2311,12 +2311,9 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb,
 
 void intel_unpin_fb_vma(struct i915_vma *vma, unsigned long flags)
 {
-	i915_gem_object_lock(vma->obj);
 	if (flags & PLANE_HAS_FENCE)
 		i915_vma_unpin_fence(vma);
 	i915_gem_object_unpin_from_display_plane(vma);
-	i915_gem_object_unlock(vma->obj);
-
 	i915_vma_put(vma);
 }
--
2.20.1
[Intel-gfx] [PATCH 28/37] drm/i915: Acquire the object lock around page directories
Now that the page directories are backed by an object, and we wish to acquire multiple objects together under the same acquire context, teach i915_vm_map_pt_stash() to use i915_acquire_ctx.

Signed-off-by: Chris Wilson
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/gt/intel_gtt.c        | 14 +++-
 drivers/gpu/drm/i915/gt/intel_gtt.h        |  4 +++
 drivers/gpu/drm/i915/gt/intel_ppgtt.c      | 34 +--
 4 files changed, 49 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index d3ac2542a039..94ec3536cac4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1450,7 +1450,7 @@ static int eb_reserve_vm(struct i915_execbuffer *eb)
 		return eb_vm_work_cancel(work, err);
 
 	/* We also need to prepare mappings to write the PD pages */
-	err = i915_vm_map_pt_stash(work->vm, &work->stash);
+	err = __i915_vm_map_pt_stash_locked(work->vm, &work->stash);
 	if (err)
 		return eb_vm_work_cancel(work, err);

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 1a7efbad8f74..b0629de490a3 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -19,7 +19,8 @@ struct drm_i915_gem_object *alloc_pt_dma(struct i915_address_space *vm, int sz)
 	return i915_gem_object_create_internal(vm->i915, sz);
 }
 
-int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
+int __map_pt_dma_locked(struct i915_address_space *vm,
+			struct drm_i915_gem_object *obj)
 {
 	void *vaddr;
 
@@ -31,6 +32,17 @@ int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
 	return 0;
 }
 
+int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj)
+{
+	int err;
+
+	i915_gem_object_lock(obj);
+	err = __map_pt_dma_locked(vm, obj);
+	i915_gem_object_unlock(obj);
+
+	return err;
+}
+
 void __i915_vm_close(struct i915_address_space *vm)
 {
 	struct i915_vma *vma, *vn;

diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.h b/drivers/gpu/drm/i915/gt/intel_gtt.h
index c659dbd6cda2..b4e1519e4028 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.h
@@ -525,6 +525,8 @@ struct i915_page_directory *alloc_pd(struct i915_address_space *vm);
 struct i915_page_directory *__alloc_pd(int npde);
 
 int map_pt_dma(struct i915_address_space *vm, struct drm_i915_gem_object *obj);
+int __map_pt_dma_locked(struct i915_address_space *vm,
+			struct drm_i915_gem_object *obj);
 
 void free_px(struct i915_address_space *vm,
 	     struct i915_page_table *pt, int lvl);
@@ -573,6 +575,8 @@ int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
 			   u64 size);
 int i915_vm_map_pt_stash(struct i915_address_space *vm,
 			 struct i915_vm_pt_stash *stash);
+int __i915_vm_map_pt_stash_locked(struct i915_address_space *vm,
+				  struct i915_vm_pt_stash *stash);
 void i915_vm_free_pt_stash(struct i915_address_space *vm,
 			   struct i915_vm_pt_stash *stash);

diff --git a/drivers/gpu/drm/i915/gt/intel_ppgtt.c b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
index 11e7288464c0..ada894885795 100644
--- a/drivers/gpu/drm/i915/gt/intel_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ppgtt.c
@@ -5,6 +5,8 @@
 
 #include
 
+#include "mm/i915_acquire_ctx.h"
+
 #include "i915_trace.h"
 #include "intel_gtt.h"
 #include "gen6_ppgtt.h"
@@ -253,15 +255,15 @@ int i915_vm_alloc_pt_stash(struct i915_address_space *vm,
 	return 0;
 }
 
-int i915_vm_map_pt_stash(struct i915_address_space *vm,
-			 struct i915_vm_pt_stash *stash)
+int __i915_vm_map_pt_stash_locked(struct i915_address_space *vm,
+				  struct i915_vm_pt_stash *stash)
 {
 	struct i915_page_table *pt;
 	int n, err;
 
 	for (n = 0; n < ARRAY_SIZE(stash->pt); n++) {
 		for (pt = stash->pt[n]; pt; pt = pt->stash) {
-			err = map_pt_dma(vm, pt->base);
+			err = __map_pt_dma_locked(vm, pt->base);
 			if (err)
 				return err;
 		}
@@ -270,6 +272,32 @@ int i915_vm_map_pt_stash(struct i915_address_space *vm,
 	return 0;
 }
 
+int i915_vm_map_pt_stash(struct i915_address_space *vm,
+			 struct i915_vm_pt_stash *stash)
+{
+	struct i915_acquire_ctx acquire;
+	struct i915_page_table *pt;
+	int n, err;
+
+	/* Acquire all the pages for the page directories simultaneously */
+	i915_acquire_ctx_init(&acquire);
+	for (n = 0; n < ARRAY_SIZE(stash->pt); n++)
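The map_pt_dma()/__map_pt_dma_locked() split above follows the common kernel convention of pairing a locking wrapper with a __*_locked worker; a minimal standalone sketch of that shape (the obj/do_map names are hypothetical, and a pthread mutex stands in for the object lock):

#include <pthread.h>
#include <stdio.h>

struct obj {
	pthread_mutex_t lock;
	int mapped;
};

/* __*_locked worker: the caller must already hold obj->lock. Callers
 * that batch several objects under one acquire context reuse this body
 * without double-locking. */
static int do_map_locked(struct obj *o)
{
	o->mapped = 1;
	return 0;
}

/* Convenience wrapper for callers touching a single object. */
static int do_map(struct obj *o)
{
	pthread_mutex_lock(&o->lock);
	int err = do_map_locked(o);
	pthread_mutex_unlock(&o->lock);
	return err;
}

int main(void)
{
	struct obj o = { PTHREAD_MUTEX_INITIALIZER, 0 };
	int err = do_map(&o);

	printf("err=%d mapped=%d\n", err, o.mapped);
	return 0;
}

In the patch, the execbuf path (which already holds all objects under its acquire context) calls the __locked variant directly, while standalone callers get the self-locking wrapper.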
[Intel-gfx] [PATCH 25/37] drm/i915: Add an implementation for common reservation_ww_class locking
From: Maarten Lankhorst

i915_gem_ww_ctx is used to lock all gem bo's for pinning and memory eviction. We don't use it yet, but let's start adding the definition first.

To use it, we have to pass a non-NULL ww to gem_object_lock, and don't unlock directly. It is done in i915_gem_ww_ctx_fini.

Changes since v1:
- Change ww_ctx and obj order in locking functions (Joonas Lahtinen)
v3:
- Build a list of all objects first, centralise -EDEADLK handling

Signed-off-by: Maarten Lankhorst
Reviewed-by: Thomas Hellström
---
 drivers/gpu/drm/i915/Makefile                |   4 +
 drivers/gpu/drm/i915/i915_globals.c          |   1 +
 drivers/gpu/drm/i915/i915_globals.h          |   1 +
 drivers/gpu/drm/i915/mm/i915_acquire_ctx.c   | 139 ++
 drivers/gpu/drm/i915/mm/i915_acquire_ctx.h   |  34 +++
 drivers/gpu/drm/i915/mm/st_acquire_ctx.c     | 242 ++
 .../drm/i915/selftests/i915_mock_selftests.h |   1 +
 7 files changed, 422 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/mm/i915_acquire_ctx.c
 create mode 100644 drivers/gpu/drm/i915/mm/i915_acquire_ctx.h
 create mode 100644 drivers/gpu/drm/i915/mm/st_acquire_ctx.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index bda4c0e408f8..a3a4c8a555ec 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -125,6 +125,10 @@ gt-y += \
 	gt/gen9_renderstate.o
 i915-y += $(gt-y)
 
+# Memory + DMA management
+i915-y += \
+	mm/i915_acquire_ctx.o
+
 # GEM (Graphics Execution Management) code
 gem-y += \
 	gem/i915_gem_busy.o \

diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
index 3aa213684293..51ec42a14694 100644
--- a/drivers/gpu/drm/i915/i915_globals.c
+++ b/drivers/gpu/drm/i915/i915_globals.c
@@ -87,6 +87,7 @@ static void __i915_globals_cleanup(void)
 
 static __initconst int (* const initfn[])(void) = {
 	i915_global_active_init,
+	i915_global_acquire_init,
 	i915_global_buddy_init,
 	i915_global_context_init,
 	i915_global_gem_context_init,

diff --git a/drivers/gpu/drm/i915/i915_globals.h b/drivers/gpu/drm/i915/i915_globals.h
index b2f5cd9b9b1a..11227abf2769 100644
--- a/drivers/gpu/drm/i915/i915_globals.h
+++ b/drivers/gpu/drm/i915/i915_globals.h
@@ -27,6 +27,7 @@ void i915_globals_exit(void);
 
 /* constructors */
 int i915_global_active_init(void);
+int i915_global_acquire_init(void);
 int i915_global_buddy_init(void);
 int i915_global_context_init(void);
 int i915_global_gem_context_init(void);

diff --git a/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c b/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c
new file mode 100644
index ..d1c3b958c15d
--- /dev/null
+++ b/drivers/gpu/drm/i915/mm/i915_acquire_ctx.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include
+
+#include "i915_globals.h"
+#include "gem/i915_gem_object.h"
+
+#include "i915_acquire_ctx.h"
+
+static struct i915_global_acquire {
+	struct i915_global base;
+	struct kmem_cache *slab_acquires;
+} global;
+
+struct i915_acquire {
+	struct drm_i915_gem_object *obj;
+	struct i915_acquire *next;
+};
+
+static struct i915_acquire *i915_acquire_alloc(void)
+{
+	return kmem_cache_alloc(global.slab_acquires, GFP_KERNEL);
+}
+
+static void i915_acquire_free(struct i915_acquire *lnk)
+{
+	kmem_cache_free(global.slab_acquires, lnk);
+}
+
+void i915_acquire_ctx_init(struct i915_acquire_ctx *ctx)
+{
+	ww_acquire_init(&ctx->ctx, &reservation_ww_class);
+	ctx->locked = NULL;
+}
+
+int i915_acquire_ctx_lock(struct i915_acquire_ctx *ctx,
+			  struct drm_i915_gem_object *obj)
+{
+	struct i915_acquire *lock, *lnk;
+	int err;
+
+	lock = i915_acquire_alloc();
+	if (!lock)
+		return -ENOMEM;
+
+	lock->obj = i915_gem_object_get(obj);
+	lock->next = NULL;
+
+	while ((lnk = lock)) {
+		obj = lnk->obj;
+		lock = lnk->next;
+
+		err = dma_resv_lock_interruptible(obj->base.resv, &ctx->ctx);
+		if (err == -EDEADLK) {
+			struct i915_acquire *old;
+
+			while ((old = ctx->locked)) {
+				i915_gem_object_unlock(old->obj);
+				ctx->locked = old->next;
+				old->next = lock;
+				lock = old;
+			}
+
+			err = dma_resv_lock_slow_interruptible(obj->base.resv,
+							       &ctx->ctx);
+		}
+		if (!err) {
+			lnk->next = ctx->locked;
+			ctx->locked = lnk;
+		} else {
+			i915_gem_object_put(obj);
+			i915_acquire_free(lnk);
+		}
+		if (err == -EALREADY)
+
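The i915_acquire tracking list in the patch above is what makes the -EDEADLK rollback possible: every lock taken is pushed onto a LIFO so it can be unwound in reverse. A toy standalone model of that intended call sequence — init, lock each object, then fini to unwind — with all names as stand-ins for the real API, not the real API itself:

#include <stdio.h>
#include <stdlib.h>

struct object { int id; int locked; };

struct acquire_link {
	struct object *obj;
	struct acquire_link *next;
};

struct acquire_ctx {
	struct acquire_link *locked;	/* LIFO of everything we hold */
};

static int ctx_lock(struct acquire_ctx *ctx, struct object *obj)
{
	struct acquire_link *lnk = malloc(sizeof(*lnk));

	if (!lnk)
		return -1;
	obj->locked = 1;		/* stand-in for dma_resv_lock() */
	lnk->obj = obj;
	lnk->next = ctx->locked;	/* push, so unwind runs in reverse order */
	ctx->locked = lnk;
	return 0;
}

static void ctx_fini(struct acquire_ctx *ctx)
{
	struct acquire_link *lnk;

	while ((lnk = ctx->locked)) {	/* pop and unlock everything */
		ctx->locked = lnk->next;
		lnk->obj->locked = 0;
		printf("unlocked obj %d\n", lnk->obj->id);
		free(lnk);
	}
}

int main(void)
{
	struct object a = { 1, 0 }, b = { 2, 0 };
	struct acquire_ctx ctx = { NULL };

	if (!ctx_lock(&ctx, &a) && !ctx_lock(&ctx, &b))
		puts("both objects held; do the work here");
	ctx_fini(&ctx);
	return 0;
}

On -EDEADLK the real implementation walks the same list to drop every lock before retrying with the slow-path lock, which is the backoff behaviour the cover letter describes.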
[Intel-gfx] [PATCH 11/37] drm/i915/gem: Move the 'cached' info to i915_execbuffer
The reloc_cache contains some details that are used outside of the relocation handling, so lift those out of the embedded struct into the principal struct i915_execbuffer.

Signed-off-by: Chris Wilson
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c   | 61 +++
 .../i915/gem/selftests/i915_gem_execbuffer.c |  6 +-
 2 files changed, 37 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index e7e16c62df1c..e9ef0c287fd9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -261,11 +261,6 @@ struct i915_execbuffer {
 	 */
 	struct reloc_cache {
 		struct drm_mm_node node; /** temporary GTT binding */
-		unsigned int gen; /** Cached value of INTEL_GEN */
-		bool use_64bit_reloc : 1;
-		bool has_llc : 1;
-		bool has_fence : 1;
-		bool needs_unfenced : 1;
 
 		struct intel_context *ce;
 
@@ -283,6 +278,12 @@ struct i915_execbuffer {
 	u32 batch_len; /** Length of batch within object */
 	u32 batch_flags; /** Flags composed for emit_bb_start() */
 
+	unsigned int gen; /** Cached value of INTEL_GEN */
+	bool use_64bit_reloc : 1;
+	bool has_llc : 1;
+	bool has_fence : 1;
+	bool needs_unfenced : 1;
+
 	/**
 	 * Indicate either the size of the hashtable used to resolve
 	 * relocation handles, or if negative that we are using a direct
@@ -540,11 +541,11 @@ eb_validate_vma(struct i915_execbuffer *eb,
 	 */
 	entry->offset = gen8_noncanonical_addr(entry->offset);
 
-	if (!eb->reloc_cache.has_fence) {
+	if (!eb->has_fence) {
 		entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE;
 	} else {
 		if ((entry->flags & EXEC_OBJECT_NEEDS_FENCE ||
-		     eb->reloc_cache.needs_unfenced) &&
+		     eb->needs_unfenced) &&
 		    i915_gem_object_is_tiled(vma->obj))
 			entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP;
 	}
@@ -592,7 +593,7 @@ eb_add_vma(struct i915_execbuffer *eb,
 	if (entry->relocation_count && !(ev->flags & EXEC_OBJECT_PINNED))
 		ev->flags |= __EXEC_OBJECT_NEEDS_BIAS;
 
-	if (eb->reloc_cache.has_fence)
+	if (eb->has_fence)
 		ev->flags |= EXEC_OBJECT_NEEDS_FENCE;
 
 	eb->batch = ev;
@@ -995,15 +996,19 @@ relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
 	return gen8_canonical_addr((int)reloc->delta + target->node.start);
 }
 
-static void reloc_cache_init(struct reloc_cache *cache,
-			     struct drm_i915_private *i915)
+static void eb_info_init(struct i915_execbuffer *eb,
+			 struct drm_i915_private *i915)
 {
 	/* Must be a variable in the struct to allow GCC to unroll. */
-	cache->gen = INTEL_GEN(i915);
-	cache->has_llc = HAS_LLC(i915);
-	cache->use_64bit_reloc = HAS_64BIT_RELOC(i915);
-	cache->has_fence = cache->gen < 4;
-	cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
+	eb->gen = INTEL_GEN(i915);
+	eb->has_llc = HAS_LLC(i915);
+	eb->use_64bit_reloc = HAS_64BIT_RELOC(i915);
+	eb->has_fence = eb->gen < 4;
+	eb->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
+}
+
+static void reloc_cache_init(struct reloc_cache *cache)
+{
 	cache->node.flags = 0;
 	cache->rq = NULL;
 	cache->target = NULL;
@@ -1011,8 +1016,9 @@ static void reloc_cache_init(struct reloc_cache *cache,
 
 #define RELOC_TAIL 4
 
-static int reloc_gpu_chain(struct reloc_cache *cache)
+static int reloc_gpu_chain(struct i915_execbuffer *eb)
 {
+	struct reloc_cache *cache = &eb->reloc_cache;
 	struct intel_gt_buffer_pool_node *pool;
 	struct i915_request *rq = cache->rq;
 	struct i915_vma *batch;
@@ -1036,9 +1042,9 @@ static int reloc_gpu_chain(struct reloc_cache *cache)
 	GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE / sizeof(u32));
 	cmd = cache->rq_cmd + cache->rq_size;
 	*cmd++ = MI_ARB_CHECK;
-	if (cache->gen >= 8)
+	if (eb->gen >= 8)
 		*cmd++ = MI_BATCH_BUFFER_START_GEN8;
-	else if (cache->gen >= 6)
+	else if (eb->gen >= 6)
 		*cmd++ = MI_BATCH_BUFFER_START;
 	else
 		*cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT;
@@ -1061,7 +1067,7 @@ static int reloc_gpu_chain(struct reloc_cache *cache)
 		goto out_pool;
 
 	cmd = i915_gem_object_pin_map(batch->obj,
-				      cache->has_llc ?
+				      eb->has_llc ?
				      I915_MAP_FORCE_WB :
				      I915_MAP_FORC
[Intel-gfx] [PATCH 03/37] drm/i915/gt: Free stale request on destroying the virtual engine
Since preempt-to-busy, we may unsubmit a request while it is still on the HW and completes asynchronously. That means it may be retired and in the process destroy the virtual engine (as the user has closed their context), but that engine may still be holding onto the unsubmitted completed request. Therefore we need to potentially clean up the old request on destroying the virtual engine. We also have to keep the virtual_engine alive until after the siblings' execlists_dequeue() have finished peeking into the virtual engines, for which we serialise with RCU. Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_lrc.c | 22 +++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 417f6b0c6c61..cb04bc5474be 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -180,6 +180,7 @@ #define EXECLISTS_REQUEST_SIZE 64 /* bytes */ struct virtual_engine { + struct rcu_head rcu; struct intel_engine_cs base; struct intel_context context; @@ -5393,10 +5394,25 @@ static void virtual_context_destroy(struct kref *kref) container_of(kref, typeof(*ve), context.ref); unsigned int n; - GEM_BUG_ON(!list_empty(virtual_queue(ve))); - GEM_BUG_ON(ve->request); GEM_BUG_ON(ve->context.inflight); + if (unlikely(ve->request)) { + struct i915_request *old; + unsigned long flags; + + spin_lock_irqsave(&ve->base.active.lock, flags); + + old = fetch_and_zero(&ve->request); + if (old) { + GEM_BUG_ON(!i915_request_completed(old)); + __i915_request_submit(old); + i915_request_put(old); + } + + spin_unlock_irqrestore(&ve->base.active.lock, flags); + } + GEM_BUG_ON(!list_empty(virtual_queue(ve))); + for (n = 0; n < ve->num_siblings; n++) { struct intel_engine_cs *sibling = ve->siblings[n]; struct rb_node *node = &ve->nodes[sibling->id].rb; @@ -5422,7 +5438,7 @@ static void virtual_context_destroy(struct kref *kref) intel_engine_free_request_pool(&ve->base); kfree(ve->bonds); - kfree(ve); + kfree_rcu(ve, rcu); } static void virtual_engine_initial_hint(struct virtual_engine *ve) -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
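The kfree_rcu() conversion is the crux here: execlists_dequeue() peeks at virtual engines under rcu_read_lock(), so the struct must outlive any reader still inside its grace period. A stripped-down sketch of that rule with stub types (not the driver's own):

	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct ve_stub {
		struct rcu_head rcu;
		int state;
	};

	static void peek(struct ve_stub __rcu **slot)
	{
		struct ve_stub *ve;

		rcu_read_lock();
		ve = rcu_dereference(*slot);
		if (ve)
			(void)READ_ONCE(ve->state); /* safe until grace period ends */
		rcu_read_unlock();
	}

	static void destroy(struct ve_stub *ve)
	{
		/* a plain kfree() here could free under a concurrent peek() */
		kfree_rcu(ve, rcu);
	}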
[Intel-gfx] [PATCH 19/37] drm/i915/gem: Asynchronous GTT unbinding
It is reasonably common for userspace (even modern drivers like iris) to reuse an active address for a new buffer. This would cause the application to stall under its mutex (originally struct_mutex) until the old batches were idle and it could synchronously remove the stale PTE. However, we can queue up a job that waits on the signals for the old nodes to complete and, upon those signals, removes the old nodes, replacing them with the new ones for the batch. This is still CPU driven, but in theory we can do the GTT patching from the GPU. The job itself has a completion signal allowing the execbuf to wait upon the rebinding, and also allowing other observers to coordinate with the common VM activity. Letting userspace queue up more work allows it to do more without blocking other clients. In turn, we take care not to let it queue up too much concurrent work, creating a small number of queues for each context to limit the number of concurrent tasks. The implementation relies on only scheduling one unbind operation per vma as we use the unbound vma->node location to track the stale PTE. If there are multiple processes thrashing the same vm, the eviction processing will become synchronous, with the clients having to wait for execbuf to schedule their work. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1402 Signed-off-by: Chris Wilson Cc: Matthew Auld Cc: Andi Shyti Reviewed-by: Thomas Hellström --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 919 -- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 1 + drivers/gpu/drm/i915/gt/intel_gtt.c | 4 + drivers/gpu/drm/i915/gt/intel_gtt.h | 2 + drivers/gpu/drm/i915/i915_gem.c | 7 + drivers/gpu/drm/i915/i915_gem_gtt.c | 5 + drivers/gpu/drm/i915/i915_vma.c | 71 +- drivers/gpu/drm/i915/i915_vma.h | 4 + 8 files changed, 883 insertions(+), 130 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 32d23718ee1e..301e67dcdbde 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -18,6 +18,7 @@ #include "gt/intel_gt.h" #include "gt/intel_gt_buffer_pool.h" #include "gt/intel_gt_pm.h" +#include "gt/intel_gt_requests.h" #include "gt/intel_ring.h" #include "i915_drv.h" @@ -44,6 +45,12 @@ struct eb_vma { u32 handle; }; +struct eb_bind_vma { + struct eb_vma *ev; + struct drm_mm_node hole; + unsigned int bind_flags; +}; + struct eb_vma_array { struct kref kref; struct eb_vma vma[]; @@ -67,11 +74,12 @@ struct eb_vma_array { I915_EXEC_RESOURCE_STREAMER) /* Catch emission of unexpected errors for CI!
*/ +#define __EINVAL__ 22 #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) #undef EINVAL #define EINVAL ({ \ DRM_DEBUG_DRIVER("EINVAL at %s:%d\n", __func__, __LINE__); \ - 22; \ + __EINVAL__; \ }) #endif @@ -323,6 +331,12 @@ static struct eb_vma_array *eb_vma_array_create(unsigned int count) return arr; } +static struct eb_vma_array *eb_vma_array_get(struct eb_vma_array *arr) +{ + kref_get(&arr->kref); + return arr; +} + static inline void eb_unreserve_vma(struct eb_vma *ev) { struct i915_vma *vma = ev->vma; @@ -456,7 +470,10 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, const struct i915_vma *vma, unsigned int flags) { - if (vma->node.size < entry->pad_to_size) + if (test_bit(I915_VMA_ERROR_BIT, __i915_vma_flags(vma))) + return true; + + if (vma->node.size < max(vma->size, entry->pad_to_size)) return true; if (entry->alignment && !IS_ALIGNED(vma->node.start, entry->alignment)) @@ -481,32 +498,6 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, return false; } -static u64 eb_pin_flags(const struct drm_i915_gem_exec_object2 *entry, - unsigned int exec_flags) -{ - u64 pin_flags = 0; - - if (exec_flags & EXEC_OBJECT_NEEDS_GTT) - pin_flags |= PIN_GLOBAL; - - /* -* Wa32bitGeneralStateOffset & Wa32bitInstructionBaseOffset, -* limit address to the first 4GBs for unflagged objects. -*/ - if (!(exec_flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)) - pin_flags |= PIN_ZONE_4G; - - if (exec_flags & __EXEC_OBJECT_NEEDS_MAP) - pin_flags |= PIN_MAPPABLE; - - if (exec_flags & EXEC_OBJECT_PINNED) - pin_flags |= entry->offset | PIN_OFFSET_FIXED; - else if (exec_flags & __EXEC_OBJECT_NEEDS_BIAS) - pin_flags |= BATCH_OFFSET_BIAS | PIN_OFFSET_BIAS; - - return pin_flags; -} - static bool eb_pin_vma_fence_inplace(struct eb_vma *ev) { return false; /* We need to add some new fence serialisation */ @@ -520,6 +511,10 @@ eb_pin_vma_inplace(struct i915_e
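In outline, the mechanism is a CPU work item gated on the old user's fences, which exposes its own completion fence for execbuf and other VM observers to order against. A conceptual sketch only, with stub types; evict_and_rebind() is hypothetical, standing in for the PTE patching the real patch performs:

	#include <linux/dma-fence.h>
	#include <linux/workqueue.h>

	struct vma_stub;

	struct async_unbind {
		struct work_struct work;
		struct dma_fence *old_active; /* last activity on the old node */
		struct dma_fence *done;	      /* signalled once PTEs are swapped */
		struct vma_stub *vma;
	};

	static void async_unbind_fn(struct work_struct *w)
	{
		struct async_unbind *au = container_of(w, typeof(*au), work);

		/* CPU driven today; could become a GPU job patching the GTT */
		dma_fence_wait(au->old_active, false);
		evict_and_rebind(au->vma); /* hypothetical */
		dma_fence_signal(au->done);
	}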
[Intel-gfx] [PATCH 22/37] drm/i915/gem: Include secure batch in common execbuf pinning
Pull the GGTT binding for the secure batch dispatch into the common vma pinning routine for execbuf, so that there is just a single central place for all i915_vma_pin(). Signed-off-by: Chris Wilson Reviewed-by: Thomas Hellström --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 88 +++ 1 file changed, 51 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 236d4ad3516b..19cab5541dbc 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1674,6 +1674,48 @@ static int eb_alloc_cmdparser(struct i915_execbuffer *eb) return err; } +static int eb_secure_batch(struct i915_execbuffer *eb) +{ + struct i915_vma *vma = eb->batch->vma; + + /* +* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure +* batch" bit. Hence we need to pin secure batches into the global gtt. +* hsw should have this fixed, but bdw mucks it up again. +*/ + if (!(eb->batch_flags & I915_DISPATCH_SECURE)) + return 0; + + if (GEM_WARN_ON(vma->vm != &eb->engine->gt->ggtt->vm)) { + struct eb_vma *ev; + + ev = kzalloc(sizeof(*ev), GFP_KERNEL); + if (!ev) + return -ENOMEM; + + vma = i915_vma_instance(vma->obj, + &eb->engine->gt->ggtt->vm, + NULL); + if (IS_ERR(vma)) { + kfree(ev); + return PTR_ERR(vma); + } + + ev->vma = i915_vma_get(vma); + ev->exec = &no_entry; + + list_add(&ev->submit_link, &eb->submit_list); + list_add(&ev->reloc_link, &eb->array->aux_list); + list_add(&ev->bind_link, &eb->bind_list); + + GEM_BUG_ON(eb->batch->vma->private); + eb->batch = ev; + } + + eb->batch->flags |= EXEC_OBJECT_NEEDS_GTT; + return 0; +} + static unsigned int eb_batch_index(const struct i915_execbuffer *eb) { if (eb->args->flags & I915_EXEC_BATCH_FIRST) @@ -1823,6 +1865,10 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) if (err) return err; + err = eb_secure_batch(eb); + if (err) + return err; + return 0; } @@ -2798,7 +2844,7 @@ static int eb_parse(struct i915_execbuffer *eb) return 0; } -static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch) +static int eb_submit(struct i915_execbuffer *eb) { int err; @@ -2825,7 +2871,7 @@ static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch) } err = eb->engine->emit_bb_start(eb->request, - batch->node.start + + eb->batch->vma->node.start + eb->batch_start_offset, eb->batch_len, eb->batch_flags); @@ -3486,7 +3532,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, struct i915_execbuffer eb; struct dma_fence *in_fence = NULL; struct sync_file *out_fence = NULL; - struct i915_vma *batch; int out_fence_fd = -1; int err; @@ -3601,34 +3646,6 @@ i915_gem_do_execbuffer(struct drm_device *dev, if (err) goto err_vma; - /* -* snb/ivb/vlv conflate the "batch in ppgtt" bit with the "non-secure -* batch" bit. Hence we need to pin secure batches into the global gtt. -* hsw should have this fixed, but bdw mucks it up again. */ - batch = i915_vma_get(eb.batch->vma); - if (eb.batch_flags & I915_DISPATCH_SECURE) { - struct i915_vma *vma; - - /* -* So on first glance it looks freaky that we pin the batch here -* outside of the reservation loop. But: -* - The batch is already pinned into the relevant ppgtt, so we -* already have the backing storage fully allocated. -* - No other BO uses the global gtt (well contexts, but meh), -* so we don't really have issues with multiple objects not -* fitting due to fragmentation. -* So this is actually safe. 
-*/ - vma = i915_gem_object_ggtt_pin(batch->obj, NULL, 0, 0, 0); - if (IS_ERR(vma)) { - err = PTR_ERR(vma); - goto err_vma; - } - - GEM_BUG_ON(vma->obj != batch->obj); - batch = vma; - } - /* All GPU relocation batches must be submitted prior to the user rq */ GEM_BUG_ON(eb.reloc_cache.rq); @@ -3636,7 +3653,7 @@ i915_gem_do_exec
[Intel-gfx] [PATCH 12/37] drm/i915/gem: Break apart the early i915_vma_pin from execbuf object lookup
As a prelude to the next step, where we want to perform all the object allocations together under the same lock, we must first delay the i915_vma_pin() as it implicitly performs those allocations for us, one by one. Because it allocates one by one, it is not allowed to wait/evict, whereas with all the allocations pulled together the entire set can be scheduled as one. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin Reviewed-by: Thomas Hellström --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 74 ++- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index e9ef0c287fd9..2f6fa8b3a805 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -34,6 +34,8 @@ struct eb_vma { /** This vma's place in the execbuf reservation list */ struct drm_i915_gem_exec_object2 *exec; + + struct list_head bind_link; struct list_head unbound_link; struct list_head reloc_link; @@ -248,8 +250,8 @@ struct i915_execbuffer { /** actual size of execobj[] as we may extend it for the cmdparser */ unsigned int buffer_count; - /** list of vma not yet bound during reservation phase */ - struct list_head unbound; + /** list of all vma required to be bound for this execbuf */ + struct list_head bind_list; /** list of vma that have execobj.relocation_count */ struct list_head relocs_list; @@ -577,6 +579,8 @@ eb_add_vma(struct i915_execbuffer *eb, eb->lut_size)]); } + list_add_tail(&ev->bind_link, &eb->bind_list); + if (entry->relocation_count) list_add_tail(&ev->reloc_link, &eb->relocs_list); @@ -598,16 +602,6 @@ eb_add_vma(struct i915_execbuffer *eb, eb->batch = ev; } - - if (eb_pin_vma(eb, entry, ev)) { - if (entry->offset != vma->node.start) { - entry->offset = vma->node.start | UPDATE; - eb->args->flags |= __EXEC_HAS_RELOC; - } - } else { - eb_unreserve_vma(ev); - list_add_tail(&ev->unbound_link, &eb->unbound); - } } static int eb_reserve_vma(const struct i915_execbuffer *eb, @@ -682,13 +676,31 @@ static int wait_for_timeline(struct intel_timeline *tl) } while (1); } -static int eb_reserve(struct i915_execbuffer *eb) +static int eb_reserve_vm(struct i915_execbuffer *eb) { - const unsigned int count = eb->buffer_count; unsigned int pin_flags = PIN_USER | PIN_NONBLOCK; - struct list_head last; + struct list_head last, unbound; struct eb_vma *ev; - unsigned int i, pass; + unsigned int pass; + + INIT_LIST_HEAD(&unbound); + list_for_each_entry(ev, &eb->bind_list, bind_link) { + struct drm_i915_gem_exec_object2 *entry = ev->exec; + struct i915_vma *vma = ev->vma; + + if (eb_pin_vma(eb, entry, ev)) { + if (entry->offset != vma->node.start) { + entry->offset = vma->node.start | UPDATE; + eb->args->flags |= __EXEC_HAS_RELOC; + } + } else { + eb_unreserve_vma(ev); + list_add_tail(&ev->unbound_link, &unbound); + } + } + + if (list_empty(&unbound)) + return 0; /* * Attempt to pin all of the buffers into the GTT.
@@ -726,7 +738,7 @@ static int eb_reserve(struct i915_execbuffer *eb) if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex)) return -EINTR; - list_for_each_entry(ev, &eb->unbound, unbound_link) { + list_for_each_entry(ev, &unbound, unbound_link) { err = eb_reserve_vma(eb, ev, pin_flags); if (err) break; @@ -737,13 +749,11 @@ static int eb_reserve(struct i915_execbuffer *eb) } /* Resort *all* the objects into priority order */ - INIT_LIST_HEAD(&eb->unbound); + INIT_LIST_HEAD(&unbound); INIT_LIST_HEAD(&last); - for (i = 0; i < count; i++) { - unsigned int flags; + list_for_each_entry(ev, &eb->bind_list, bind_link) { + unsigned int flags = ev->flags; - ev = &eb->vma[i]; - flags = ev->flags; if (flags & EXEC_OBJECT_PINNED && flags & __EXEC_OBJECT_HAS_PIN) continue; @@ -752,17 +762,17 @@ static int eb_reserve(struct i915_execbuffer *eb)
[Intel-gfx] [PATCH 34/37] drm/i915/gt: Push the wait for the context to be bound to the request
Rather than synchronously wait for the context to be bound, within the intel_context_pin(), we can track the pending completion of the bind fence and only submit requests along the context when signaled. Signed-off-by: Chris Wilson Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/intel_context.c| 80 +- drivers/gpu/drm/i915/gt/intel_context.h| 6 ++ drivers/gpu/drm/i915/i915_active.h | 1 - drivers/gpu/drm/i915/i915_request.c| 4 ++ drivers/gpu/drm/i915/i915_sw_fence_await.c | 62 + drivers/gpu/drm/i915/i915_sw_fence_await.h | 19 + 7 files changed, 140 insertions(+), 33 deletions(-) create mode 100644 drivers/gpu/drm/i915/i915_sw_fence_await.c create mode 100644 drivers/gpu/drm/i915/i915_sw_fence_await.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index a3a4c8a555ec..2cf54db8b847 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -61,6 +61,7 @@ i915-y += \ i915_memcpy.o \ i915_mm.o \ i915_sw_fence.o \ + i915_sw_fence_await.o \ i915_sw_fence_work.o \ i915_syncmap.o \ i915_user_extensions.o diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index ff3f7580d1ca..04c2f207b11d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -10,6 +10,7 @@ #include "i915_drv.h" #include "i915_globals.h" +#include "i915_sw_fence_await.h" #include "intel_context.h" #include "intel_engine.h" @@ -140,31 +141,71 @@ intel_context_acquire_lock(struct intel_context *ce, return 0; } +static int await_bind(struct dma_fence_await *fence, struct i915_vma *vma) +{ + struct dma_fence *bind; + int err = 0; + + bind = i915_active_fence_get(&vma->active.excl); + if (bind) { + err = i915_sw_fence_await_dma_fence(&fence->await, bind, + 0, GFP_KERNEL); + dma_fence_put(bind); + } + + return err; +} + static int intel_context_active_locked(struct intel_context *ce) { + struct dma_fence_await *fence; int err; + fence = dma_fence_await_create(GFP_KERNEL); + if (!fence) + return -ENOMEM; + err = __ring_active_locked(ce->ring); if (err) - return err; + goto out_fence; + + err = await_bind(fence, ce->ring->vma); + if (err < 0) + goto err_ring; err = intel_timeline_pin_locked(ce->timeline); if (err) goto err_ring; - if (!ce->state) - return 0; - - err = __context_active_locked(ce->state); - if (err) + err = await_bind(fence, ce->timeline->hwsp_ggtt); + if (err < 0) goto err_timeline; - return 0; + if (ce->state) { + err = __context_active_locked(ce->state); + if (err) + goto err_timeline; + + err = await_bind(fence, ce->state); + if (err < 0) + goto err_state; + } + + /* Must be the last action as it *releases* the ce->active */ + if (atomic_read(&fence->await.pending) > 1) + i915_active_set_exclusive(&ce->active, &fence->dma); + err = 0; + goto out_fence; + +err_state: + __context_retire_state(ce->state); err_timeline: intel_timeline_unpin(ce->timeline); err_ring: __ring_retire(ce->ring); +out_fence: + i915_sw_fence_commit(&fence->await); return err; } @@ -322,27 +363,6 @@ static void intel_context_active_release(struct intel_context *ce) i915_active_release(&ce->active); } -static int __intel_context_sync(struct intel_context *ce) -{ - int err; - - err = i915_vma_wait_for_bind(ce->ring->vma); - if (err) - return err; - - err = i915_vma_wait_for_bind(ce->timeline->hwsp_ggtt); - if (err) - return err; - - if (ce->state) { - err = i915_vma_wait_for_bind(ce->state); - if (err) - return err; - } - - return 0; -} - int __intel_context_do_pin(struct 
intel_context *ce) { int err; @@ -368,10 +388,6 @@ int __intel_context_do_pin(struct intel_context *ce) } if (likely(!atomic_add_unless(&ce->pin_count, 1, 0))) { - err = __intel_context_sync(ce); - if (unlikely(err)) - goto out_unlock; - err = intel_context_active_acquire(ce); if (unlikely(err)) goto out_unlock; diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 07be021882cc..f48df2784a6c 100644 --- a/drivers/gpu/drm/i915/g
[Intel-gfx] [PATCH 14/37] drm/i915: Serialise i915_vma_pin_inplace() with i915_vma_unbind()
Directly serialise the atomic pinning of the vma with its eviction from unbind, using a pair of coupled cmpxchg to avoid fighting over vm->mutex. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_vma.c | 45 ++--- 1 file changed, 14 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index dbe11b349175..17ce0bce318e 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -742,12 +742,10 @@ i915_vma_detach(struct i915_vma *vma) bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags) { - unsigned int bound; - bool pinned = true; + unsigned int bound = atomic_read(&vma->flags); GEM_BUG_ON(flags & ~I915_VMA_BIND_MASK); - bound = atomic_read(&vma->flags); do { if (unlikely(flags & ~bound)) return false; @@ -755,34 +753,10 @@ bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags) if (unlikely(bound & (I915_VMA_OVERFLOW | I915_VMA_ERROR))) return false; - if (!(bound & I915_VMA_PIN_MASK)) - goto unpinned; - GEM_BUG_ON(((bound + 1) & I915_VMA_PIN_MASK) == 0); } while (!atomic_try_cmpxchg(&vma->flags, &bound, bound + 1)); return true; - -unpinned: - /* -* If pin_count==0, but we are bound, check under the lock to avoid -* racing with a concurrent i915_vma_unbind(). -*/ - mutex_lock(&vma->vm->mutex); - do { - if (unlikely(bound & (I915_VMA_OVERFLOW | I915_VMA_ERROR))) { - pinned = false; - break; - } - - if (unlikely(flags & ~bound)) { - pinned = false; - break; - } - } while (!atomic_try_cmpxchg(&vma->flags, &bound, bound + 1)); - mutex_unlock(&vma->vm->mutex); - - return pinned; } static int vma_get_pages(struct i915_vma *vma) @@ -1292,6 +1266,7 @@ void __i915_vma_evict(struct i915_vma *vma) int __i915_vma_unbind(struct i915_vma *vma) { + unsigned int bound; int ret; lockdep_assert_held(&vma->vm->mutex); @@ -1299,10 +1274,18 @@ int __i915_vma_unbind(struct i915_vma *vma) if (!drm_mm_node_allocated(&vma->node)) return 0; - if (i915_vma_is_pinned(vma)) { - vma_print_allocator(vma, "is pinned"); - return -EAGAIN; - } + /* Serialise with i915_vma_pin_inplace() */ + bound = atomic_read(&vma->flags); + do { + if (unlikely(bound & I915_VMA_PIN_MASK)) { + vma_print_allocator(vma, "is pinned"); + return -EAGAIN; + } + + if (unlikely(bound & I915_VMA_ERROR)) + break; + } while (!atomic_try_cmpxchg(&vma->flags, +&bound, bound | I915_VMA_ERROR)); /* * After confirming that no one else is pinning this vma, wait for -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
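The shape of the coupled cmpxchg is worth spelling out: pin only succeeds while no error/unbind mark is set, and unbind only marks the vma while the pin count is zero, so the two sides serialise without taking vm->mutex. A generic sketch (the flag layout is illustrative, not the driver's real I915_VMA_* encoding):

	#include <linux/atomic.h>

	#define PIN_MASK	0x00ffffff	/* illustrative layout */
	#define ERROR_BIT	0x01000000

	static bool try_pin(atomic_t *flags)
	{
		int old = atomic_read(flags);

		do {
			if (old & ERROR_BIT)
				return false;	/* unbind won the race */
		} while (!atomic_try_cmpxchg(flags, &old, old + 1));

		return true;
	}

	static bool try_unbind(atomic_t *flags)
	{
		int old = atomic_read(flags);

		do {
			if (old & PIN_MASK)
				return false;	/* pin won the race */
		} while (!atomic_try_cmpxchg(flags, &old, old | ERROR_BIT));

		return true;
	}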
[Intel-gfx] [PATCH 08/37] drm/i915/gem: Don't drop the timeline lock during execbuf
Our timeline lock is our defence against a concurrent execbuf interrupting our request construction. We need to hold it throughout or, for example, a second thread may interject a relocation request in between our own relocation request and execution in the ring. A second, major benefit is that it allows us to preserve a large chunk of the ringbuffer for our exclusive use; which should virtually eliminate the threat of hitting a wait_for_space during request construction -- although we should have already dropped other contentious locks at that point. Signed-off-by: Chris Wilson Reviewed-by: Thomas Hellström --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 462 +++--- .../i915/gem/selftests/i915_gem_execbuffer.c | 29 +- 2 files changed, 312 insertions(+), 179 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 9ce114d67288..2dc30dbbdbf3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -267,6 +267,8 @@ struct i915_execbuffer { bool has_fence : 1; bool needs_unfenced : 1; + struct intel_context *ce; + struct i915_vma *target; struct i915_request *rq; struct i915_vma *rq_vma; @@ -650,6 +652,35 @@ static int eb_reserve_vma(const struct i915_execbuffer *eb, return 0; } +static void retire_requests(struct intel_timeline *tl) +{ + struct i915_request *rq, *rn; + + list_for_each_entry_safe(rq, rn, &tl->requests, link) + if (!i915_request_retire(rq)) + break; +} + +static int wait_for_timeline(struct intel_timeline *tl) +{ + do { + struct dma_fence *fence; + int err; + + fence = i915_active_fence_get(&tl->last_request); + if (!fence) + return 0; + + err = dma_fence_wait(fence, true); + dma_fence_put(fence); + if (err) + return err; + + /* Retiring may trigger a barrier, requiring an extra pass */ + retire_requests(tl); + } while (1); +} + static int eb_reserve(struct i915_execbuffer *eb) { const unsigned int count = eb->buffer_count; @@ -657,7 +688,6 @@ static int eb_reserve(struct i915_execbuffer *eb) struct list_head last; struct eb_vma *ev; unsigned int i, pass; - int err = 0; /* * Attempt to pin all of the buffers into the GTT. @@ -673,18 +703,37 @@ static int eb_reserve(struct i915_execbuffer *eb) * room for the earlier objects *unless* we need to defragment. */ - if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex)) - return -EINTR; - pass = 0; do { + int err = 0; + + /* +* We need to hold one lock as we bind all the vma so that +* we have a consistent view of the entire vm and can plan +* evictions to fill the whole GTT. If we allow a second +* thread to run as we do this, it will either unbind +* everything we want pinned, or steal space that we need for +* ourselves. The closer we are to a full GTT, the more likely +* such contention will cause us to fail to bind the workload +* for this batch. Since we know at this point we need to +* find space for new buffers, we know that extra pressure +* from contention is likely. +* +* In lieu of being able to hold vm->mutex for the entire +* sequence (it's complicated!), we opt for struct_mutex.
+*/ + if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex)) + return -EINTR; + list_for_each_entry(ev, &eb->unbound, bind_link) { err = eb_reserve_vma(eb, ev, pin_flags); if (err) break; } - if (!(err == -ENOSPC || err == -EAGAIN)) - break; + if (!(err == -ENOSPC || err == -EAGAIN)) { + mutex_unlock(&eb->i915->drm.struct_mutex); + return err; + } /* Resort *all* the objects into priority order */ INIT_LIST_HEAD(&eb->unbound); @@ -713,38 +762,50 @@ static int eb_reserve(struct i915_execbuffer *eb) list_add_tail(&ev->bind_link, &last); } list_splice_tail(&last, &eb->unbound); + mutex_unlock(&eb->i915->drm.struct_mutex); if (err == -EAGAIN) { - mutex_unlock(&eb->i915->drm.struct_mutex); flu
[Intel-gfx] [PATCH 18/37] drm/i915/gem: Separate the ww_mutex walker into its own list
In preparation for making eb_vma bigger and heavier to run in parallel, we need to stop applying an in-place swap() to reorder around ww_mutex deadlocks. Keep the array intact and reorder the locks using a dedicated list. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin Reviewed-by: Thomas Hellström --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 83 --- 1 file changed, 54 insertions(+), 29 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 62a1de1dd238..32d23718ee1e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -38,6 +38,7 @@ struct eb_vma { struct list_head bind_link; struct list_head unbound_link; struct list_head reloc_link; + struct list_head submit_link; struct hlist_node node; u32 handle; @@ -256,6 +257,8 @@ struct i915_execbuffer { /** list of vma that have execobj.relocation_count */ struct list_head relocs_list; + struct list_head submit_list; + /** * Track the most recently used object for relocations, as we * frequently have to perform multiple relocations within the same @@ -353,6 +356,42 @@ static void eb_vma_array_put(struct eb_vma_array *arr) kref_put(&arr->kref, eb_vma_array_destroy); } +static int +eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire) +{ + struct eb_vma *ev; + int err = 0; + + list_for_each_entry(ev, &eb->submit_list, submit_link) { + struct i915_vma *vma = ev->vma; + + err = ww_mutex_lock_interruptible(&vma->resv->lock, acquire); + if (err == -EDEADLK) { + struct eb_vma *unlock = ev, *en; + + list_for_each_entry_safe_continue_reverse(unlock, en, + &eb->submit_list, + submit_link) { + ww_mutex_unlock(&unlock->vma->resv->lock); + list_move_tail(&unlock->submit_link, &eb->submit_list); + } + + GEM_BUG_ON(!list_is_first(&ev->submit_link, &eb->submit_list)); + err = ww_mutex_lock_slow_interruptible(&vma->resv->lock, + acquire); + } + if (err) { + list_for_each_entry_continue_reverse(ev, +&eb->submit_list, +submit_link) + ww_mutex_unlock(&ev->vma->resv->lock); + break; + } + } + + return err; +} + static int eb_create(struct i915_execbuffer *eb) { /* Allocate an extra slot for use by the command parser + sentinel */ @@ -405,6 +444,10 @@ static int eb_create(struct i915_execbuffer *eb) eb->lut_size = -eb->buffer_count; } + INIT_LIST_HEAD(&eb->bind_list); + INIT_LIST_HEAD(&eb->submit_list); + INIT_LIST_HEAD(&eb->relocs_list); + return 0; } @@ -572,6 +615,7 @@ eb_add_vma(struct i915_execbuffer *eb, } list_add_tail(&ev->bind_link, &eb->bind_list); + list_add_tail(&ev->submit_link, &eb->submit_list); if (entry->relocation_count) list_add_tail(&ev->reloc_link, &eb->relocs_list); @@ -938,9 +982,6 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) unsigned int i; int err = 0; - INIT_LIST_HEAD(&eb->bind_list); - INIT_LIST_HEAD(&eb->relocs_list); - for (i = 0; i < eb->buffer_count; i++) { struct i915_vma *vma; @@ -1613,38 +1654,19 @@ static int eb_relocate(struct i915_execbuffer *eb) static int eb_move_to_gpu(struct i915_execbuffer *eb) { - const unsigned int count = eb->buffer_count; struct ww_acquire_ctx acquire; - unsigned int i; + struct eb_vma *ev; int err = 0; ww_acquire_init(&acquire, &reservation_ww_class); - for (i = 0; i < count; i++) { - struct eb_vma *ev = &eb->vma[i]; - struct i915_vma *vma = ev->vma; - - err = ww_mutex_lock_interruptible(&vma->resv->lock, &acquire); - if (err == -EDEADLK) { - GEM_BUG_ON(i == 0); - do { - int j = i - 1; - - ww_mutex_unlock(&eb->vma[j].vma->resv->lock);
- - swap(eb->vma[i], eb->vma[j]); - } while (--i); + err = eb_lock_vma(eb, &acquire); + if (err) + goto err_fini; - err = ww_mutex_lock_slow_interr
[Intel-gfx] [PATCH 17/37] drm/i915/gem: Assign context id for async work
Allocate a few dma fence context ids that we can use to associate async work [for the CPU] launched on behalf of this context. For extra fun, we allow a configurable concurrency width. A current example would be that we spawn an unbound worker for every userptr get_pages. In the future, we wish to charge this work to the context that initiated the async work and to impose concurrency limits based on the context. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 4 drivers/gpu/drm/i915/gem/i915_gem_context.h | 6 ++ drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 6 ++ 3 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index db893f6c516b..bc80e7d3c50a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -721,6 +721,10 @@ __create_context(struct drm_i915_private *i915) mutex_init(&ctx->mutex); INIT_LIST_HEAD(&ctx->link); + ctx->async.width = rounddown_pow_of_two(num_online_cpus()); + ctx->async.context = dma_fence_context_alloc(ctx->async.width); + ctx->async.width--; + spin_lock_init(&ctx->stale.lock); INIT_LIST_HEAD(&ctx->stale.engines); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h index a133f92bbedb..f254458a795e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h @@ -134,6 +134,12 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data, struct drm_file *file); +static inline u64 i915_gem_context_async_id(struct i915_gem_context *ctx) +{ + return (ctx->async.context + + (atomic_fetch_inc(&ctx->async.cur) & ctx->async.width)); +} + static inline struct i915_gem_context * i915_gem_context_get(struct i915_gem_context *ctx) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index ae14ca24a11f..52561f98000f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -85,6 +85,12 @@ struct i915_gem_context { struct intel_timeline *timeline; + struct { + u64 context; + atomic_t cur; + unsigned int width; + } async; + /** * @vm: unique address space (GTT) * -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
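The id arithmetic deserves a note: the width is rounded down to a power of two so that the stored width-1 acts as a mask, and atomic_fetch_inc() spreads work round-robin across the reserved fence contexts. A sketch of the same math with stub naming:

	#include <linux/atomic.h>
	#include <linux/dma-fence.h>
	#include <linux/log2.h>

	struct async_ids {
		u64 base;	   /* first of 'width' dma-fence contexts */
		atomic_t cur;
		unsigned int mask; /* width - 1, width a power of two */
	};

	static void async_ids_init(struct async_ids *ids, unsigned int ncpus)
	{
		unsigned int width = rounddown_pow_of_two(ncpus);

		ids->base = dma_fence_context_alloc(width);
		ids->mask = width - 1;
		atomic_set(&ids->cur, 0);
	}

	static u64 async_id_next(struct async_ids *ids)
	{
		/* cycles through [base, base + mask] without any locking */
		return ids->base + (atomic_fetch_inc(&ids->cur) & ids->mask);
	}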
[Intel-gfx] [PATCH 01/37] drm/i915/gem: Reduce context termination list iteration guard to RCU
As we now protect the timeline list using RCU, we can drop the timeline->mutex for guarding the list iteration during context close, as we are searching for an inflight request. Any new request will see the context is banned and not be submitted. In doing so, pull the checks for a concurrent submission of the request (notably the i915_request_completed()) under the engine spinlock, to fully serialise with __i915_request_submit(). That is, in the case of preempt-to-busy, where the request may be completed during __i915_request_submit(), we need to be careful to sample the request status after serialising so that we don't miss the request the engine is actually submitting. Fixes: 4a3174152147 ("drm/i915/gem: Refine occupancy test in kill_context()") References: d22d2d073ef8 ("drm/i915: Protect i915_request_await_start from early waits") # rcu protection of timeline->requests References: https://gitlab.freedesktop.org/drm/intel/-/issues/1622 Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 32 - 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index d8cccbab7a51..db893f6c516b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -439,29 +439,36 @@ static bool __cancel_engine(struct intel_engine_cs *engine) return __reset_engine(engine); } -static struct intel_engine_cs *__active_engine(struct i915_request *rq) +static bool +__active_engine(struct i915_request *rq, struct intel_engine_cs **active) { struct intel_engine_cs *engine, *locked; + bool ret = false; /* * Serialise with __i915_request_submit() so that it sees * is-banned?, or we know the request is already inflight. +* +* Note that rq->engine is unstable, and so we double +* check that we have acquired the lock on the final engine. */ locked = READ_ONCE(rq->engine); spin_lock_irq(&locked->active.lock); while (unlikely(locked != (engine = READ_ONCE(rq->engine { spin_unlock(&locked->active.lock); - spin_lock(&engine->active.lock); locked = engine; + spin_lock(&locked->active.lock); } - engine = NULL; - if (i915_request_is_active(rq) && rq->fence.error != -EIO) - engine = rq->engine; + if (!i915_request_completed(rq)) { + if (i915_request_is_active(rq) && rq->fence.error != -EIO) + *active = locked; + ret = true; + } spin_unlock_irq(&locked->active.lock); - return engine; + return ret; } static struct intel_engine_cs *active_engine(struct intel_context *ce) @@ -472,17 +479,16 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce) if (!ce->timeline) return NULL; - mutex_lock(&ce->timeline->mutex); - list_for_each_entry_reverse(rq, &ce->timeline->requests, link) { - if (i915_request_completed(rq)) - break; + rcu_read_lock(); + list_for_each_entry_rcu(rq, &ce->timeline->requests, link) { + if (i915_request_is_active(rq) && i915_request_completed(rq)) + continue; /* Check with the backend if the request is inflight */ - engine = __active_engine(rq); - if (engine) + if (__active_engine(rq, &engine)) break; } - mutex_unlock(&ce->timeline->mutex); + rcu_read_unlock(); return engine; } -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
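The locking change reduces to a familiar pattern: a read-only scan that used to take the timeline mutex can walk the RCU-protected list instead, provided entries are added/removed with the _rcu list helpers and freed only after a grace period. A minimal sketch with stub types:

	#include <linux/rculist.h>

	struct rq_stub {
		struct list_head link;
		bool active;
	};

	static struct rq_stub *find_inflight(struct list_head *requests)
	{
		struct rq_stub *rq, *found = NULL;

		rcu_read_lock();
		list_for_each_entry_rcu(rq, requests, link) {
			if (READ_ONCE(rq->active)) {
				found = rq;
				break;
			}
		}
		rcu_read_unlock();

		/* only stable if the caller has its own reference scheme */
		return found;
	}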
[Intel-gfx] [PATCH 31/37] drm/i915/gt: Refactor heartbeat request construction and submission
Pull the individual strands of creating a custom heartbeat request into a pair of common functions. This will reduce the number of changes we will need to make in future. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gt/intel_engine_heartbeat.c | 59 +-- 1 file changed, 41 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c index 8ffdf676c0a0..377cbfdb3355 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c @@ -37,12 +37,33 @@ static bool next_heartbeat(struct intel_engine_cs *engine) return true; } +static struct i915_request * +heartbeat_create(struct intel_context *ce, gfp_t gfp) +{ + struct i915_request *rq; + + intel_context_enter(ce); + rq = __i915_request_create(ce, gfp); + intel_context_exit(ce); + + return rq; +} + static void idle_pulse(struct intel_engine_cs *engine, struct i915_request *rq) { engine->wakeref_serial = READ_ONCE(engine->serial) + 1; i915_request_add_active_barriers(rq); } +static void heartbeat_commit(struct i915_request *rq, +const struct i915_sched_attr *attr) +{ + idle_pulse(rq->engine, rq); + + __i915_request_commit(rq); + __i915_request_queue(rq, attr); +} + static void show_heartbeat(const struct i915_request *rq, struct intel_engine_cs *engine) { @@ -137,18 +158,14 @@ static void heartbeat(struct work_struct *wrk) goto out; } - intel_context_enter(ce); - rq = __i915_request_create(ce, GFP_NOWAIT | __GFP_NOWARN); - intel_context_exit(ce); + rq = heartbeat_create(ce, GFP_NOWAIT | __GFP_NOWARN); if (IS_ERR(rq)) goto unlock; - idle_pulse(engine, rq); if (engine->i915->params.enable_hangcheck) engine->heartbeat.systole = i915_request_get(rq); - __i915_request_commit(rq); - __i915_request_queue(rq, &attr); + heartbeat_commit(rq, &attr); unlock: mutex_unlock(&ce->timeline->mutex); @@ -220,19 +237,14 @@ int intel_engine_pulse(struct intel_engine_cs *engine) goto out_rpm; } - intel_context_enter(ce); - rq = __i915_request_create(ce, GFP_NOWAIT | __GFP_NOWARN); - intel_context_exit(ce); + rq = heartbeat_create(ce, GFP_NOWAIT | __GFP_NOWARN); if (IS_ERR(rq)) { err = PTR_ERR(rq); goto out_unlock; } __set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags); - idle_pulse(engine, rq); - - __i915_request_commit(rq); - __i915_request_queue(rq, &attr); + heartbeat_commit(rq, &attr); GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER); err = 0; @@ -245,8 +257,12 @@ int intel_engine_pulse(struct intel_engine_cs *engine) int intel_engine_flush_barriers(struct intel_engine_cs *engine) { + struct i915_sched_attr attr = { + .priority = I915_USER_PRIORITY(I915_PRIORITY_MIN), + }; + struct intel_context *ce = engine->kernel_context; struct i915_request *rq; - int err = 0; + int err; if (llist_empty(&engine->barrier_tasks)) return 0; @@ -254,15 +270,22 @@ int intel_engine_flush_barriers(struct intel_engine_cs *engine) if (!intel_engine_pm_get_if_awake(engine)) return 0; - rq = i915_request_create(engine->kernel_context); + if (mutex_lock_interruptible(&ce->timeline->mutex)) { + err = -EINTR; + goto out_rpm; + } + + rq = heartbeat_create(ce, GFP_KERNEL); if (IS_ERR(rq)) { err = PTR_ERR(rq); - goto out_rpm; + goto out_unlock; } - idle_pulse(engine, rq); - i915_request_add(rq); + heartbeat_commit(rq, &attr); + err = 0; +out_unlock: + mutex_unlock(&ce->timeline->mutex); out_rpm: intel_engine_pm_put(engine); return err; -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 07/37] drm/i915/gt: Split the breadcrumb spinlock between global and contexts
As we funnel more and more contexts into the breadcrumbs on an engine, the hold time of b->irq_lock grows. As we may then contend with the b->irq_lock during request submission, this increases the burden upon the engine->active.lock and so directly impacts both our execution latency and client latency. If we split the b->irq_lock by introducing a per-context spinlock to manage the signalers within a context, we then only need the b->irq_lock for enabling/disabling the interrupt and can avoid taking the lock for walking the list of contexts within the signal worker. Even with the current setup, this greatly reduces the number of times we have to take and fight for b->irq_lock. Fixes: bda4d4db6dd6 ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs") Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 157 ++ .../gpu/drm/i915/gt/intel_breadcrumbs_types.h | 6 +- drivers/gpu/drm/i915/gt/intel_context.c | 1 + drivers/gpu/drm/i915/gt/intel_context_types.h | 1 + 4 files changed, 89 insertions(+), 76 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index ae8895b48eca..8802b47fbd8f 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -100,15 +100,16 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b) static void add_signaling_context(struct intel_breadcrumbs *b, struct intel_context *ce) { - intel_context_get(ce); - list_add_tail(&ce->signal_link, &b->signalers); + lockdep_assert_held(&b->signalers_lock); + list_add_rcu(&ce->signal_link, &b->signalers); } static void remove_signaling_context(struct intel_breadcrumbs *b, struct intel_context *ce) { - list_del(&ce->signal_link); - intel_context_put(ce); + spin_lock(&b->signalers_lock); + list_del_rcu(&ce->signal_link); + spin_unlock(&b->signalers_lock); } static inline bool __request_completed(const struct i915_request *rq) @@ -184,15 +185,12 @@ static void signal_irq_work(struct irq_work *work) struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work); const ktime_t timestamp = ktime_get(); struct llist_node *signal, *sn; - struct intel_context *ce, *cn; - struct list_head *pos, *next; + struct intel_context *ce; signal = NULL; if (unlikely(!llist_empty(&b->signaled_requests))) signal = llist_del_all(&b->signaled_requests); - spin_lock(&b->irq_lock); - /* * Keep the irq armed until the interrupt after all listeners are gone. * @@ -216,11 +214,23 @@ static void signal_irq_work(struct irq_work *work) * interrupt draw less ire from other users of the system and tools * like powertop. 
*/ - if (!signal && b->irq_armed && list_empty(&b->signalers)) - __intel_breadcrumbs_disarm_irq(b); + if (!signal && READ_ONCE(b->irq_armed) && list_empty(&b->signalers)) { + spin_lock(&b->irq_lock); + if (b->irq_armed) + __intel_breadcrumbs_disarm_irq(b); + spin_unlock(&b->irq_lock); + } + + rcu_read_lock(); + list_for_each_entry_rcu(ce, &b->signalers, signal_link) { + struct list_head *pos, *next; + bool release = false; - list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) { - GEM_BUG_ON(list_empty(&ce->signals)); + if (!spin_trylock(&ce->signal_lock)) + continue; + + if (list_empty(&ce->signals)) + goto unlock; list_for_each_safe(pos, next, &ce->signals) { struct i915_request *rq = @@ -253,11 +263,16 @@ static void signal_irq_work(struct irq_work *work) if (&ce->signals == pos) { /* now empty */ add_retire(b, ce->timeline); remove_signaling_context(b, ce); + release = true; } } - } - spin_unlock(&b->irq_lock); +unlock: + spin_unlock(&ce->signal_lock); + if (release) + intel_context_put(ce); + } + rcu_read_unlock(); llist_for_each_safe(signal, sn, signal) { struct i915_request *rq = @@ -292,14 +307,15 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine) if (!b) return NULL; - spin_lock_init(&b->irq_lock); + b->irq_engine = irq_engine; + + spin_lock_init(&b->signalers_lock); INIT_LIST_HEAD(&b->signalers); init_llist_head(&b->signaled_requests); + spin_lock_init(&b->irq_lock);
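The essence of the split is that the global lock shrinks to guarding only interrupt arming, while each context owns a lock for its signal list; the worker walks the contexts under RCU and merely trylocks each, since a contended context will be revisited on the next interrupt anyway. Sketched with stub types (process_signals_locked() is hypothetical):

	#include <linux/rculist.h>
	#include <linux/spinlock.h>

	struct ce_stub {
		struct list_head signal_link;
		spinlock_t signal_lock;	/* guards this context's signals */
	};

	struct b_stub {
		spinlock_t irq_lock;	/* now only guards irq arming */
		struct list_head signalers;
	};

	static void signal_worker(struct b_stub *b)
	{
		struct ce_stub *ce;

		rcu_read_lock();
		list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
			if (!spin_trylock(&ce->signal_lock))
				continue; /* contended: retry on next irq */

			process_signals_locked(ce); /* hypothetical */
			spin_unlock(&ce->signal_lock);
		}
		rcu_read_unlock();
	}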
[Intel-gfx] [PATCH 05/37] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
Make b->signaled_requests a lockless-list so that we can manipulate it outside of the b->irq_lock. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 28 +++ .../gpu/drm/i915/gt/intel_breadcrumbs_types.h | 2 +- drivers/gpu/drm/i915/i915_request.h | 6 +++- 3 files changed, 22 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index dee6d5c9b413..9710d09e7670 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -169,16 +169,13 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl) intel_engine_add_retire(b->irq_engine, tl); } -static bool __signal_request(struct i915_request *rq, struct list_head *signals) +static bool __signal_request(struct i915_request *rq) { - clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); - if (!__dma_fence_signal(&rq->fence)) { i915_request_put(rq); return false; } - list_add_tail(&rq->signal_link, signals); return true; } @@ -186,9 +183,13 @@ static void signal_irq_work(struct irq_work *work) { struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work); const ktime_t timestamp = ktime_get(); + struct llist_node *signal, *sn; struct intel_context *ce, *cn; struct list_head *pos, *next; - LIST_HEAD(signal); + + signal = NULL; + if (unlikely(!llist_empty(&b->signaled_requests))) + signal = llist_del_all(&b->signaled_requests); spin_lock(&b->irq_lock); @@ -218,8 +219,6 @@ static void signal_irq_work(struct irq_work *work) if (b->irq_armed && list_empty(&b->signalers)) __intel_breadcrumbs_disarm_irq(b); - list_splice_init(&b->signaled_requests, &signal); - list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) { GEM_BUG_ON(list_empty(&ce->signals)); @@ -236,7 +235,11 @@ static void signal_irq_work(struct irq_work *work) * spinlock as the callback chain may end up adding * more signalers to the same context or engine. */ - __signal_request(rq, &signal); + clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); + if (__signal_request(rq)) { + rq->signal_node.next = signal; + signal = &rq->signal_node; + } } /* @@ -256,9 +259,9 @@ static void signal_irq_work(struct irq_work *work) spin_unlock(&b->irq_lock); - list_for_each_safe(pos, next, &signal) { + llist_for_each_safe(signal, sn, signal) { struct i915_request *rq = - list_entry(pos, typeof(*rq), signal_link); + llist_entry(signal, typeof(*rq), signal_node); struct list_head cb_list; spin_lock(&rq->lock); @@ -291,7 +294,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine) spin_lock_init(&b->irq_lock); INIT_LIST_HEAD(&b->signalers); - INIT_LIST_HEAD(&b->signaled_requests); + init_llist_head(&b->signaled_requests); init_irq_work(&b->irq_work, signal_irq_work); @@ -346,7 +349,8 @@ static void insert_breadcrumb(struct i915_request *rq, * its signal completion. 
*/ if (__request_completed(rq)) { - if (__signal_request(rq, &b->signaled_requests)) + if (__signal_request(rq) && + llist_add(&rq->signal_node, &b->signaled_requests)) irq_work_queue(&b->irq_work); return; } diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h index 8e53b9942695..3fa19820b37a 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h @@ -35,7 +35,7 @@ struct intel_breadcrumbs { struct intel_engine_cs *irq_engine; struct list_head signalers; - struct list_head signaled_requests; + struct llist_head signaled_requests; struct irq_work irq_work; /* for use from inside irq_lock */ diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 16b721080195..874af6db6103 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -176,7 +176,11 @@ struct i915_request { struct intel_context *context; struct intel_ring *ring; struct intel_timeline __rcu *timeline; - struct list_head signal_link; + + union { + struct list_head signal_link; + struct llist_node signal_node; +
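The llist conversion buys a lock-free handoff: producers push completed requests from any context with llist_add(), and the irq worker drains everything at once with llist_del_all(). A self-contained sketch of that producer/consumer shape (kick_worker() and signal_one() are hypothetical stand-ins):

	#include <linux/llist.h>

	struct sig_stub {
		struct llist_node node;
	};

	static LLIST_HEAD(pending);

	static void producer(struct sig_stub *rq)
	{
		/* returns true if the list was empty: time to kick the worker */
		if (llist_add(&rq->node, &pending))
			kick_worker(); /* hypothetical, e.g. irq_work_queue() */
	}

	static void consumer(void)
	{
		struct llist_node *pos, *n;

		llist_for_each_safe(pos, n, llist_del_all(&pending)) {
			struct sig_stub *rq =
				llist_entry(pos, struct sig_stub, node);

			signal_one(rq); /* hypothetical */
		}
	}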
[Intel-gfx] [PATCH 02/37] drm/i915/gt: Protect context lifetime with RCU
Allow a brief period for continued access to a dead intel_context by deferring the release of the struct until after an RCU grace period. As we are using a dedicated slab cache for the contexts, we can defer the release of the slab pages via RCU, with the caveat that individual structs may be reused from the freelist within an RCU grace period. To handle that, we have to avoid clearing members of the zombie struct. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_context.c | 330 +--- drivers/gpu/drm/i915/i915_active.c | 10 + drivers/gpu/drm/i915/i915_active.h | 2 + drivers/gpu/drm/i915/i915_utils.h | 7 + 4 files changed, 202 insertions(+), 147 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 52db2bde44a3..4e7924640ffa 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -22,7 +22,7 @@ static struct i915_global_context { static struct intel_context *intel_context_alloc(void) { - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL); + return kmem_cache_alloc(global.slab_ce, GFP_KERNEL); } void intel_context_free(struct intel_context *ce) @@ -30,6 +30,177 @@ void intel_context_free(struct intel_context *ce) kmem_cache_free(global.slab_ce, ce); } +static int __context_pin_state(struct i915_vma *vma) +{ + unsigned int bias = i915_ggtt_pin_bias(vma) | PIN_OFFSET_BIAS; + int err; + + err = i915_ggtt_pin(vma, 0, bias | PIN_HIGH); + if (err) + return err; + + err = i915_active_acquire(&vma->active); + if (err) + goto err_unpin; + + /* +* And mark it as a globally pinned object to let the shrinker know +* it cannot reclaim the object until we release it. +*/ + i915_vma_make_unshrinkable(vma); + vma->obj->mm.dirty = true; + + return 0; + +err_unpin: + i915_vma_unpin(vma); + return err; +} + +static void __context_unpin_state(struct i915_vma *vma) +{ + i915_vma_make_shrinkable(vma); + i915_active_release(&vma->active); + __i915_vma_unpin(vma); +} + +static int __ring_active(struct intel_ring *ring) +{ + int err; + + err = intel_ring_pin(ring); + if (err) + return err; + + err = i915_active_acquire(&ring->vma->active); + if (err) + goto err_pin; + + return 0; + +err_pin: + intel_ring_unpin(ring); + return err; +} + +static void __ring_retire(struct intel_ring *ring) +{ + i915_active_release(&ring->vma->active); + intel_ring_unpin(ring); +} + +__i915_active_call +static void __intel_context_retire(struct i915_active *active) +{ + struct intel_context *ce = container_of(active, typeof(*ce), active); + + CE_TRACE(ce, "retire runtime: { total:%lluns, avg:%lluns }\n", +intel_context_get_total_runtime_ns(ce), +intel_context_get_avg_runtime_ns(ce)); + + set_bit(CONTEXT_VALID_BIT, &ce->flags); + if (ce->state) + __context_unpin_state(ce->state); + + intel_timeline_unpin(ce->timeline); + __ring_retire(ce->ring); + + intel_context_put(ce); +} + +static int __intel_context_active(struct i915_active *active) +{ + struct intel_context *ce = container_of(active, typeof(*ce), active); + int err; + + CE_TRACE(ce, "active\n"); + + intel_context_get(ce); + + err = __ring_active(ce->ring); + if (err) + goto err_put; + + err = intel_timeline_pin(ce->timeline); + if (err) + goto err_ring; + + if (!ce->state) + return 0; + + err = __context_pin_state(ce->state); + if (err) + goto err_timeline; + + return 0; + +err_timeline: + intel_timeline_unpin(ce->timeline); +err_ring: + __ring_retire(ce->ring); +err_put: + intel_context_put(ce); + return err; +} + +static void __intel_context_ctor(void *arg) +{ 
+ struct intel_context *ce = arg; + + INIT_LIST_HEAD(&ce->signal_link); + INIT_LIST_HEAD(&ce->signals); + + atomic_set(&ce->pin_count, 0); + mutex_init(&ce->pin_mutex); + + ce->active_count = 0; + i915_active_init(&ce->active, +__intel_context_active, __intel_context_retire); + + ce->inflight = NULL; + ce->lrc_reg_state = NULL; + ce->lrc.desc = 0; +} + +static void +__intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) +{ + GEM_BUG_ON(!engine->cops); + GEM_BUG_ON(!engine->gt->vm); + + kref_init(&ce->ref); + i915_active_reinit(&ce->active); + mutex_reinit(&ce->pin_mutex); + + ce->engine = engine; + ce->ops = engine->cops; + ce->sseu = engine->sseu; + + ce->wa_bb_page = 0; + ce->flags = 0; + ce->tag = 0; + +
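The caveat in the commit message is the classic typesafe-by-RCU slab contract (the SLAB_TYPESAFE_BY_RCU flag itself is an assumption here; the visible hunks show only the ctor/reinit split): a struct may be recycled within a grace period, so fields a racing reader relies on are initialised once in the slab constructor and never cleared at free, while per-use fields are re-initialised on each allocation. A minimal sketch of that split:

	#include <linux/slab.h>
	#include <linux/spinlock.h>

	struct ce_stub {
		spinlock_t lock; /* stable: ctor-initialised, survives reuse */
		int per_use;	 /* reset on every allocation, never at free */
	};

	static void ce_ctor(void *arg)
	{
		struct ce_stub *ce = arg;

		spin_lock_init(&ce->lock);
	}

	static struct kmem_cache *slab;

	static int cache_init(void)
	{
		slab = kmem_cache_create("ce_stub", sizeof(struct ce_stub), 0,
					 SLAB_TYPESAFE_BY_RCU, ce_ctor);
		return slab ? 0 : -ENOMEM;
	}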
[Intel-gfx] [PATCH 16/37] drm/i915: Always defer fenced work to the worker
Currently, if an error is raised we always call the cleanup locally [and skip the main work callback]. However, some future users may need to take a mutex to cleanup and so we cannot immediately execute the cleanup as we may still be in interrupt context. For example, if we have committed sensitive changes [like evicting from the ppGTT layout] that are visible but gated behind the fence, we need to ensure those changes are completed even after an error. [This does suggest the split between the work/release callback is artificial and we may be able to simplify the worker api by only requiring a single callback.] With the execute-immediate flag, for most cases this should result in immediate cleanup of an error. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_sw_fence_work.c | 26 +++ 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c index a3a81bb8f2c3..e094fd0a4202 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c @@ -16,11 +16,14 @@ static void fence_complete(struct dma_fence_work *f) static void fence_work(struct work_struct *work) { struct dma_fence_work *f = container_of(work, typeof(*f), work); - int err; - err = f->ops->work(f); - if (err) - dma_fence_set_error(&f->dma, err); + if (!f->dma.error) { + int err; + + err = f->ops->work(f); + if (err) + dma_fence_set_error(&f->dma, err); + } fence_complete(f); dma_fence_put(&f->dma); @@ -36,15 +39,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) if (fence->error) dma_fence_set_error(&f->dma, fence->error); - if (!f->dma.error) { - dma_fence_get(&f->dma); - if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags)) - fence_work(&f->work); - else - queue_work(system_unbound_wq, &f->work); - } else { - fence_complete(f); - } + if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags)) + fence_work(&f->work); + else + queue_work(system_unbound_wq, &f->work); break; case FENCE_FREE: @@ -91,6 +89,8 @@ void dma_fence_work_init(struct dma_fence_work *f, dma_fence_init(&f->dma, &fence_ops, &f->lock, 0, 0); i915_sw_fence_init(&f->chain, fence_notify); INIT_WORK(&f->work, fence_work); + + dma_fence_get(&f->dma); /* once for the chain; once for the work */ } int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal) -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
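The refcount comment in the diff ("once for the chain; once for the work") is the key invariant: the fence now holds one reference for the notify chain and one for the deferred work, so the release callback can always run from process context, even on error. A condensed sketch of that flow with stub types, not the driver's dma_fence_work itself:

	#include <linux/kref.h>
	#include <linux/slab.h>
	#include <linux/workqueue.h>

	struct fwork_stub {
		struct kref ref;	/* initialised to 2: chain + work */
		struct work_struct work;
		int error;
		int (*work_fn)(struct fwork_stub *);
		void (*release_fn)(struct fwork_stub *);
	};

	static void fwork_free(struct kref *k)
	{
		kfree(container_of(k, struct fwork_stub, ref));
	}

	static void fwork_run(struct work_struct *w)
	{
		struct fwork_stub *f = container_of(w, typeof(*f), work);

		if (!f->error)		/* skip main callback after an error */
			f->error = f->work_fn(f);

		f->release_fn(f);	/* cleanup always in process context */
		kref_put(&f->ref, fwork_free); /* the work's reference */
	}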
[Intel-gfx] [PATCH 15/37] drm/i915: Add list_for_each_entry_safe_continue_reverse
One more list iterator variant, for when we want to unwind from inside one list iterator with the intention of restarting from the current entry as the new head of the list. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/i915_utils.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h index ef8db3aa75c7..3873834f2316 100644 --- a/drivers/gpu/drm/i915/i915_utils.h +++ b/drivers/gpu/drm/i915/i915_utils.h @@ -266,6 +266,12 @@ static inline int list_is_last_rcu(const struct list_head *list, return READ_ONCE(list->next) == head; } +#define list_for_each_entry_safe_continue_reverse(pos, n, head, member) \ + for (pos = list_prev_entry(pos, member),\ +n = list_prev_entry(pos, member); \ +&pos->member != (head);\ +pos = n, n = list_prev_entry(n, member)) + static inline unsigned long msecs_to_jiffies_timeout(const unsigned int m) { unsigned long j = msecs_to_jiffies(m); -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
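For a feel of the intended use (this mirrors the ww_mutex backoff in the execbuf patch of this series): from a cursor inside a forward walk, unwind the already-visited entries, moving each to the tail so that the deadlocking entry becomes the new head for the retry. Stub types, with unlock_entry() hypothetical:

	struct ev_stub {
		struct list_head link;
	};

	static void unwind_from(struct ev_stub *ev, struct list_head *head)
	{
		struct ev_stub *pos = ev, *n;

		list_for_each_entry_safe_continue_reverse(pos, n, head, link) {
			unlock_entry(pos);	/* hypothetical */
			/* _safe: the move below edits the list mid-walk */
			list_move_tail(&pos->link, head);
		}
	}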
[Intel-gfx] [PATCH 10/37] drm/i915/gem: Rename the list of relocations to reloc_list
Continuing the theme of calling the lists a foo_list, rename the relocs list. This means that we can now use relocs for the old reloc_cache that is not a cache anymore. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index a5b63ae17241..e7e16c62df1c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -252,7 +252,7 @@ struct i915_execbuffer { struct list_head unbound; /** list of vma that have execobj.relocation_count */ - struct list_head relocs; + struct list_head relocs_list; /** * Track the most recently used object for relocations, as we @@ -577,7 +577,7 @@ eb_add_vma(struct i915_execbuffer *eb, } if (entry->relocation_count) - list_add_tail(&ev->reloc_link, &eb->relocs); + list_add_tail(&ev->reloc_link, &eb->relocs_list); /* * SNA is doing fancy tricks with compressing batch buffers, which leads @@ -932,7 +932,7 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) unsigned int i; int err = 0; - INIT_LIST_HEAD(&eb->relocs); + INIT_LIST_HEAD(&eb->relocs_list); INIT_LIST_HEAD(&eb->unbound); for (i = 0; i < eb->buffer_count; i++) { @@ -1592,7 +1592,7 @@ static int eb_relocate(struct i915_execbuffer *eb) struct eb_vma *ev; int flush; - list_for_each_entry(ev, &eb->relocs, reloc_link) { + list_for_each_entry(ev, &eb->relocs_list, reloc_link) { err = eb_relocate_vma(eb, ev); if (err) break; -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 13/37] drm/i915/gem: Remove the call for no-evict i915_vma_pin
Remove the stub i915_vma_pin() used for incrementally pinning objects for execbuf (under the severe restriction that they must not wait on a resource as we may have already pinned it) and replace it with a i915_vma_pin_inplace() that is only allowed to reclaim the currently bound location for the vma (and will never wait for a pinned resource). v2: Bail out if fences are in use. Signed-off-by: Chris Wilson Reviewed-by: Thomas Hellström --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 55 +-- drivers/gpu/drm/i915/i915_vma.c | 6 +- drivers/gpu/drm/i915/i915_vma.h | 2 + 3 files changed, 31 insertions(+), 32 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 2f6fa8b3a805..62a1de1dd238 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -464,49 +464,41 @@ static u64 eb_pin_flags(const struct drm_i915_gem_exec_object2 *entry, return pin_flags; } +static bool eb_pin_vma_fence_inplace(struct eb_vma *ev) +{ + return false; /* We need to add some new fence serialisation */ +} + static inline bool -eb_pin_vma(struct i915_execbuffer *eb, - const struct drm_i915_gem_exec_object2 *entry, - struct eb_vma *ev) +eb_pin_vma_inplace(struct i915_execbuffer *eb, + const struct drm_i915_gem_exec_object2 *entry, + struct eb_vma *ev) { struct i915_vma *vma = ev->vma; - u64 pin_flags; + unsigned int pin_flags; - if (vma->node.size) - pin_flags = vma->node.start; - else - pin_flags = entry->offset & PIN_OFFSET_MASK; + if (eb_vma_misplaced(entry, vma, ev->flags)) + return false; - pin_flags |= PIN_USER | PIN_NOEVICT | PIN_OFFSET_FIXED; + pin_flags = PIN_USER; if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_GTT)) pin_flags |= PIN_GLOBAL; /* Attempt to reuse the current location if available */ - if (unlikely(i915_vma_pin(vma, 0, 0, pin_flags))) { - if (entry->flags & EXEC_OBJECT_PINNED) - return false; - - /* Failing that pick any _free_ space if suitable */ - if (unlikely(i915_vma_pin(vma, - entry->pad_to_size, - entry->alignment, - eb_pin_flags(entry, ev->flags) | - PIN_USER | PIN_NOEVICT))) - return false; - } + if (!i915_vma_pin_inplace(vma, pin_flags)) + return false; if (unlikely(ev->flags & EXEC_OBJECT_NEEDS_FENCE)) { - if (unlikely(i915_vma_pin_fence(vma))) { - i915_vma_unpin(vma); + if (!eb_pin_vma_fence_inplace(ev)) { + __i915_vma_unpin(vma); return false; } - - if (vma->fence) - ev->flags |= __EXEC_OBJECT_HAS_FENCE; } + GEM_BUG_ON(eb_vma_misplaced(entry, vma, ev->flags)); + ev->flags |= __EXEC_OBJECT_HAS_PIN; - return !eb_vma_misplaced(entry, vma, ev->flags); + return true; } static int @@ -688,14 +680,17 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) struct drm_i915_gem_exec_object2 *entry = ev->exec; struct i915_vma *vma = ev->vma; - if (eb_pin_vma(eb, entry, ev)) { + if (eb_pin_vma_inplace(eb, entry, ev)) { if (entry->offset != vma->node.start) { entry->offset = vma->node.start | UPDATE; eb->args->flags |= __EXEC_HAS_RELOC; } } else { - eb_unreserve_vma(ev); - list_add_tail(&ev->unbound_link, &unbound); + /* Lightly sort user placed objects to the fore */ + if (ev->flags & EXEC_OBJECT_PINNED) + list_add(&ev->unbound_link, &unbound); + else + list_add_tail(&ev->unbound_link, &unbound); } } diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index c6bf04ca2032..dbe11b349175 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -740,11 +740,13 @@ i915_vma_detach(struct i915_vma *vma) list_del(&vma->vm_link); 
} -static bool try_qad_pin(struct i915_vma *vma, unsigned int flags) +bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags) { unsigned int bound; bool pinned = true; + GEM_BUG_ON(flags & ~I915_VMA_BIND_MASK); + bound = atomic_read(&vma->flags); do {
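The tail of the diff is truncated in the archive; for orientation, the pin-in-place idea reduces to a cmpxchg loop that raises the pin count only while the vma already holds every binding the caller asked for, and otherwise fails without waiting or rebinding. A hedged sketch with illustrative flag names (MY_VMA_ERROR and MY_PIN_UNIT are placeholders; the real i915_vma flag layout differs in detail):

static bool pin_inplace(atomic_t *flags, unsigned int bind_mask)
{
        unsigned int bound = atomic_read(flags);

        do {
                if (unlikely(bound & MY_VMA_ERROR))
                        return false;

                /* Must already be bound everywhere the caller needs. */
                if ((bound & bind_mask) != bind_mask)
                        return false;
        } while (!atomic_try_cmpxchg(flags, &bound, bound + MY_PIN_UNIT));

        return true;    /* pin taken; drop it with atomic_sub(MY_PIN_UNIT) */
}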
[Intel-gfx] [PATCH 09/37] drm/i915/gem: Rename execbuf.bind_link to unbound_link
Rename the current list of unbound objects so that we can track of all objects that we need to bind, as well as the list of currently unbound [unprocessed] objects. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 14 +++--- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 2dc30dbbdbf3..a5b63ae17241 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -34,7 +34,7 @@ struct eb_vma { /** This vma's place in the execbuf reservation list */ struct drm_i915_gem_exec_object2 *exec; - struct list_head bind_link; + struct list_head unbound_link; struct list_head reloc_link; struct hlist_node node; @@ -605,7 +605,7 @@ eb_add_vma(struct i915_execbuffer *eb, } } else { eb_unreserve_vma(ev); - list_add_tail(&ev->bind_link, &eb->unbound); + list_add_tail(&ev->unbound_link, &eb->unbound); } } @@ -725,7 +725,7 @@ static int eb_reserve(struct i915_execbuffer *eb) if (mutex_lock_interruptible(&eb->i915->drm.struct_mutex)) return -EINTR; - list_for_each_entry(ev, &eb->unbound, bind_link) { + list_for_each_entry(ev, &eb->unbound, unbound_link) { err = eb_reserve_vma(eb, ev, pin_flags); if (err) break; @@ -751,15 +751,15 @@ static int eb_reserve(struct i915_execbuffer *eb) if (flags & EXEC_OBJECT_PINNED) /* Pinned must have their slot */ - list_add(&ev->bind_link, &eb->unbound); + list_add(&ev->unbound_link, &eb->unbound); else if (flags & __EXEC_OBJECT_NEEDS_MAP) /* Map require the lowest 256MiB (aperture) */ - list_add_tail(&ev->bind_link, &eb->unbound); + list_add_tail(&ev->unbound_link, &eb->unbound); else if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS)) /* Prioritise 4GiB region for restricted bo */ - list_add(&ev->bind_link, &last); + list_add(&ev->unbound_link, &last); else - list_add_tail(&ev->bind_link, &last); + list_add_tail(&ev->unbound_link, &last); } list_splice_tail(&last, &eb->unbound); mutex_unlock(&eb->i915->drm.struct_mutex); -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
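Beyond the rename, the retry loop above is worth a second look: it produces a rough priority order with no comparator at all, only by choosing head or tail insertion across lists that are spliced afterwards. A condensed, illustrative sketch of that technique (a simplification of the eb_reserve() hunk, using the entry names from the diff):

LIST_HEAD(high);        /* objects that must land in a specific slot */
LIST_HEAD(low);         /* everything else */
struct eb_vma *ev, *next;

list_for_each_entry_safe(ev, next, &eb->unbound, unbound_link) {
        if (ev->flags & EXEC_OBJECT_PINNED)
                list_move(&ev->unbound_link, &high);            /* very front */
        else if (ev->flags & __EXEC_OBJECT_NEEDS_MAP)
                list_move_tail(&ev->unbound_link, &high);       /* after pinned */
        else
                list_move_tail(&ev->unbound_link, &low);
}
list_splice(&high, &eb->unbound);       /* high-priority entries first */
list_splice_tail(&low, &eb->unbound);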
[Intel-gfx] [PATCH 20/37] drm/i915/gem: Bind the fence async for execbuf
It is illegal to wait on an another vma while holding the vm->mutex, as that easily leads to ABBA deadlocks (we wait on a second vma that waits on us to release the vm->mutex). So while the vm->mutex exists, we compute the required register transfer inside the i915_ggtt.mutex, setting up a fence for tracking the register writes, but move the waiting outside of the lock into the async binding pipeline. Signed-off-by: Chris Wilson Reviewed-by: Thomas Hellström --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 21 +-- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 139 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.h | 5 + drivers/gpu/drm/i915/i915_vma.h | 2 - 4 files changed, 152 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 301e67dcdbde..4cdaf5d81ef1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1054,15 +1054,6 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind) return err; pin: - if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) { - err = __i915_vma_pin_fence(vma); /* XXX no waiting */ - if (unlikely(err)) - return err; - - if (vma->fence) - bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE; - } - bind_flags &= ~atomic_read(&vma->flags); if (bind_flags) { err = set_bind_fence(vma, work); @@ -1093,6 +1084,15 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind) bind->ev->flags |= __EXEC_OBJECT_HAS_PIN; GEM_BUG_ON(eb_vma_misplaced(entry, vma, bind->ev->flags)); + if (unlikely(exec_flags & EXEC_OBJECT_NEEDS_FENCE)) { + err = __i915_vma_pin_fence_async(vma, &work->base); + if (unlikely(err)) + return err; + + if (vma->fence) + bind->ev->flags |= __EXEC_OBJECT_HAS_FENCE; + } + return 0; } @@ -1158,6 +1158,9 @@ static void __eb_bind_vma(struct eb_vm_work *work) struct eb_bind_vma *bind = &work->bind[n]; struct i915_vma *vma = bind->ev->vma; + if (bind->ev->flags & __EXEC_OBJECT_HAS_FENCE) + __i915_vma_apply_fence_async(vma); + if (!bind->bind_flags) goto put; diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c index 7fb36b12fe7a..ce06b949dc7c 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c @@ -21,10 +21,13 @@ * IN THE SOFTWARE. */ +#include "i915_active.h" #include "i915_drv.h" #include "i915_scatterlist.h" +#include "i915_sw_fence_work.h" #include "i915_pvinfo.h" #include "i915_vgpu.h" +#include "i915_vma.h" /** * DOC: fence register handling @@ -340,19 +343,37 @@ static struct i915_fence_reg *fence_find(struct i915_ggtt *ggtt) return ERR_PTR(-EDEADLK); } -int __i915_vma_pin_fence(struct i915_vma *vma) +static int fence_wait_bind(struct i915_fence_reg *reg) +{ + struct dma_fence *fence; + int err = 0; + + fence = i915_active_fence_get(®->active.excl); + if (fence) { + err = dma_fence_wait(fence, true); + dma_fence_put(fence); + } + + return err; +} + +static int __i915_vma_pin_fence(struct i915_vma *vma) { struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm); - struct i915_fence_reg *fence; + struct i915_fence_reg *fence = vma->fence; struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL; int err; lockdep_assert_held(&vma->vm->mutex); /* Just update our place in the LRU if our fence is getting reused. 
*/ - if (vma->fence) { - fence = vma->fence; + if (fence) { GEM_BUG_ON(fence->vma != vma); + + err = fence_wait_bind(fence); + if (err) + return err; + atomic_inc(&fence->pin_count); if (!fence->dirty) { list_move_tail(&fence->link, &ggtt->fence_list); @@ -384,6 +405,116 @@ int __i915_vma_pin_fence(struct i915_vma *vma) return err; } +static int set_bind_fence(struct i915_fence_reg *fence, + struct dma_fence_work *work) +{ + struct dma_fence *prev; + int err; + + if (rcu_access_pointer(fence->active.excl.fence) == &work->dma) + return 0; + + err = i915_sw_fence_await_active(&work->chain, +&fence->active, +I915_ACTIVE_AWAIT_ACTIVE); + if (err) + return err; + + if (i915_active_ac
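The set_bind_fence() hunk is cut off above; its producer side pairs with the fence_wait_bind() consumer shown earlier. In outline (a sketch built from the i915_active calls visible in the diff, not the verbatim patch): chain the worker after any update already pending on the fence register, then publish the worker's fence as the register's new exclusive fence so later users wait on that fence rather than inside the mutex.

static int queue_fence_write(struct i915_fence_reg *reg,
                             struct dma_fence_work *work)
{
        int err;

        /* Order our register write after whatever is already queued. */
        err = i915_sw_fence_await_active(&work->chain, &reg->active,
                                         I915_ACTIVE_AWAIT_ACTIVE);
        if (err)
                return err;

        /* Publish: later pin_fence callers will wait on work->dma. */
        i915_active_set_exclusive(&reg->active, &work->dma);
        return 0;
}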
[Intel-gfx] [PATCH 33/37] drm/i915/gt: Acquire backing storage for the context
Pull the individual acquisition of the context objects (state, ring, timeline) under a common i915_acquire_ctx in preparation to allow the context to evict memory (or rather the i915_acquire_ctx on its behalf). The context objects maintain their semi-permanent status; that is they are assumed to be accessible by the HW at all times until we receive a signal from the HW that they are no longer in use. Currently, we generate such a signal ourselves from the context switch following the final use of the objects. This means that they will remain on the HW for an indefinite amount of time, and we retain the use of pinning to keep them in the same place. As they are pinned, they can be processed outside of the working set for the requests within the context. This is useful, as the context share some global state causing it to incur a global lock via its objects. By only requiring that lock as the context is activated, it is both reduced in frequency and reduced in duration (as compared to execbuf). Signed-off-by: Chris Wilson Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/gt/intel_context.c | 113 ++--- drivers/gpu/drm/i915/gt/intel_ring.c | 17 ++- drivers/gpu/drm/i915/gt/intel_ring.h | 5 +- .../gpu/drm/i915/gt/intel_ring_submission.c | 119 +++--- drivers/gpu/drm/i915/gt/intel_timeline.c | 14 ++- drivers/gpu/drm/i915/gt/intel_timeline.h | 8 +- drivers/gpu/drm/i915/gt/mock_engine.c | 2 + drivers/gpu/drm/i915/gt/selftest_timeline.c | 30 - 8 files changed, 240 insertions(+), 68 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index cde356c7754d..ff3f7580d1ca 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -6,6 +6,7 @@ #include "gem/i915_gem_context.h" #include "gem/i915_gem_pm.h" +#include "mm/i915_acquire_ctx.h" #include "i915_drv.h" #include "i915_globals.h" @@ -30,12 +31,12 @@ void intel_context_free(struct intel_context *ce) kmem_cache_free(global.slab_ce, ce); } -static int __context_pin_state(struct i915_vma *vma) +static int __context_active_locked(struct i915_vma *vma) { unsigned int bias = i915_ggtt_pin_bias(vma) | PIN_OFFSET_BIAS; int err; - err = i915_ggtt_pin(vma, 0, bias | PIN_HIGH); + err = i915_ggtt_pin_locked(vma, 0, bias | PIN_HIGH); if (err) return err; @@ -57,18 +58,18 @@ static int __context_pin_state(struct i915_vma *vma) return err; } -static void __context_unpin_state(struct i915_vma *vma) +static void __context_retire_state(struct i915_vma *vma) { i915_vma_make_shrinkable(vma); i915_active_release(&vma->active); __i915_vma_unpin(vma); } -static int __ring_active(struct intel_ring *ring) +static int __ring_active_locked(struct intel_ring *ring) { int err; - err = intel_ring_pin(ring); + err = intel_ring_pin_locked(ring); if (err) return err; @@ -100,7 +101,7 @@ static void __intel_context_retire(struct i915_active *active) set_bit(CONTEXT_VALID_BIT, &ce->flags); if (ce->state) - __context_unpin_state(ce->state); + __context_retire_state(ce->state); intel_timeline_unpin(ce->timeline); __ring_retire(ce->ring); @@ -108,27 +109,53 @@ static void __intel_context_retire(struct i915_active *active) intel_context_put(ce); } -static int __intel_context_active(struct i915_active *active) +static int +__intel_context_acquire_lock(struct intel_context *ce, +struct i915_acquire_ctx *ctx) +{ + return i915_acquire_ctx_lock(ctx, ce->state->obj); +} + +static int +intel_context_acquire_lock(struct intel_context *ce, + struct i915_acquire_ctx *ctx) { - struct intel_context 
*ce = container_of(active, typeof(*ce), active); int err; - CE_TRACE(ce, "active\n"); + err = intel_ring_acquire_lock(ce->ring, ctx); + if (err) + return err; - intel_context_get(ce); + if (ce->state) { + err = __intel_context_acquire_lock(ce, ctx); + if (err) + return err; + } - err = __ring_active(ce->ring); + /* Note that the timeline will migrate as the seqno wrap around */ + err = intel_timeline_acquire_lock(ce->timeline, ctx); if (err) - goto err_put; + return err; + + return 0; +} - err = intel_timeline_pin(ce->timeline); +static int intel_context_active_locked(struct intel_context *ce) +{ + int err; + + err = __ring_active_locked(ce->ring); + if (err) + return err; + + err = intel_timeline_pin_locked(ce->timeline); if (err) goto err_ring; if (!ce->state)
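The message is truncated, but the shape of the change is visible: acquisition splits into a lock phase under a single i915_acquire_ctx and a pin phase using _locked variants that must not sleep on eviction. A hedged sketch of how the two phases compose, assembled from the helpers named in the diff (error unwinding trimmed):

static int context_pin_objects(struct intel_context *ce)
{
        struct i915_acquire_ctx acquire;
        int err;

        i915_acquire_ctx_init(&acquire);

        /* Phase 1: lock the ring, state and timeline objects together. */
        err = intel_context_acquire_lock(ce, &acquire);
        if (err == 0)
                /* Reserve all backing storage en masse. */
                err = i915_acquire_mm(&acquire);
        if (err == 0)
                /* Phase 2: pin in place; may no longer wait for eviction. */
                err = intel_context_active_locked(ce);

        /* The pins, not the locks, keep the objects resident from here. */
        i915_acquire_ctx_fini(&acquire);
        return err;
}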
[Intel-gfx] [PATCH 26/37] drm/i915/gem: Pull execbuf dma resv under a single critical section
Acquire all the objects and their backing storage, and page directories, as used by execbuf under a single common ww_mutex. Albeit we have to restart the critical section a few times in order to handle various restrictions (such as avoiding copy_(from|to)_user and mmap_sem). Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 166 +- .../i915/gem/selftests/i915_gem_execbuffer.c | 2 + 2 files changed, 84 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 58e40348b551..3a79b6facb02 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -20,6 +20,7 @@ #include "gt/intel_gt_pm.h" #include "gt/intel_gt_requests.h" #include "gt/intel_ring.h" +#include "mm/i915_acquire_ctx.h" #include "i915_drv.h" #include "i915_gem_clflush.h" @@ -267,6 +268,8 @@ struct i915_execbuffer { struct intel_context *reloc_context; /* distinct context for relocs */ struct i915_gem_context *gem_context; /** caller's context */ + struct i915_acquire_ctx acquire; /** lock for _all_ DMA reservations */ + struct i915_request *request; /** our request to build */ struct eb_vma *batch; /** identity of the batch obj/vma */ @@ -392,42 +395,6 @@ static void eb_vma_array_put(struct eb_vma_array *arr) kref_put(&arr->kref, eb_vma_array_destroy); } -static int -eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire) -{ - struct eb_vma *ev; - int err = 0; - - list_for_each_entry(ev, &eb->submit_list, submit_link) { - struct i915_vma *vma = ev->vma; - - err = ww_mutex_lock_interruptible(&vma->resv->lock, acquire); - if (err == -EDEADLK) { - struct eb_vma *unlock = ev, *en; - - list_for_each_entry_safe_continue_reverse(unlock, en, - &eb->submit_list, - submit_link) { - ww_mutex_unlock(&unlock->vma->resv->lock); - list_move_tail(&unlock->submit_link, &eb->submit_list); - } - - GEM_BUG_ON(!list_is_first(&ev->submit_link, &eb->submit_list)); - err = ww_mutex_lock_slow_interruptible(&vma->resv->lock, - acquire); - } - if (err) { - list_for_each_entry_continue_reverse(ev, -&eb->submit_list, -submit_link) - ww_mutex_unlock(&ev->vma->resv->lock); - break; - } - } - - return err; -} - static int eb_create(struct i915_execbuffer *eb) { /* Allocate an extra slot for use by the sentinel */ @@ -656,6 +623,25 @@ eb_add_vma(struct i915_execbuffer *eb, } } +static int eb_lock_mm(struct i915_execbuffer *eb) +{ + struct eb_vma *ev; + int err; + + list_for_each_entry(ev, &eb->bind_list, bind_link) { + err = i915_acquire_ctx_lock(&eb->acquire, ev->vma->obj); + if (err) + return err; + } + + return 0; +} + +static int eb_acquire_mm(struct i915_execbuffer *eb) +{ + return i915_acquire_mm(&eb->acquire); +} + struct eb_vm_work { struct dma_fence_work base; struct eb_vma_array *array; @@ -1378,7 +1364,15 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) unsigned long count; struct eb_vma *ev; unsigned int pass; - int err = 0; + int err; + + err = eb_lock_mm(eb); + if (err) + return err; + + err = eb_acquire_mm(eb); + if (err) + return err; count = 0; INIT_LIST_HEAD(&unbound); @@ -1404,10 +1398,15 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) if (count == 0) return 0; + /* We need to reserve page directories, release all, start over */ + i915_acquire_ctx_fini(&eb->acquire); + pass = 0; do { struct eb_vm_work *work; + i915_acquire_ctx_init(&eb->acquire); + /* * We need to hold one lock as we bind all the vma so that * we have a consistent view of the 
entire vm and can plan @@ -1424,6 +1423,11 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) * beneath it, so we have to stage and preallocate all the * resources we may require before taking the mutex. */ + + err = eb_lock_mm(eb); + if (e
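For reference, the removed eb_lock_vma() open-coded the standard ww_mutex dance that i915_acquire_ctx now centralises. The same shape in an array-based sketch (mirroring what drm_gem_lock_reservations() does in drm core; the caller is assumed to have done ww_acquire_init() and to call ww_acquire_fini() when finished):

#include <linux/ww_mutex.h>

static int lock_all_resv(struct ww_mutex **locks, int count,
                         struct ww_acquire_ctx *ctx)
{
        struct ww_mutex *contended = NULL, *busy;
        int i, err;

retry:
        for (i = 0; i < count; i++) {
                if (locks[i] == contended) {
                        contended = NULL;       /* already held via slow path */
                        continue;
                }

                err = ww_mutex_lock_interruptible(locks[i], ctx);
                if (!err)
                        continue;

                /* Back off: drop everything taken on this pass. */
                busy = locks[i];
                while (i--)
                        ww_mutex_unlock(locks[i]);
                if (contended)
                        ww_mutex_unlock(contended);

                if (err != -EDEADLK)
                        return err;

                /* Sleep on the lock we lost, then start the pass over. */
                err = ww_mutex_lock_slow_interruptible(busy, ctx);
                if (err)
                        return err;

                contended = busy;
                goto retry;
        }

        ww_acquire_done(ctx);
        return 0;
}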
[Intel-gfx] [PATCH 37/37] drm/i915/gem: Delay attach mmu-notifier until we acquire the pinned userptr
On the fast path, we first try to pin the user pages and then attach the mmu-notifier. On the slow path, we did it the opposite way around, carrying the mmu-notifier over from the tail of the fast path. However, if we are mapping a fresh batch of user pages, we will always hit a pmd split operation (to replace the zero pages with real pages), triggering an invalidate-range callback for this userptr, and so we have to cancel the work [after completing the pinning] and cause the caller to retry (an extra EAGAIN return from an ioctl for some paths). If we follow the fast path approach and attach the callback after completion, we only see the invalidate-range for revocations of our pages. The risk (the same as for the fast path) is that if the mmu-notifier should have been run during the page lookup, we will have missed it and the pages will be mixed. One might conclude that the fast path is wrong, and we should always attach the mmu-notifier first and bear the cost of redundant repetition. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 80907c00c6fd..ba1f01650eeb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -500,14 +500,13 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work) pages = __i915_gem_userptr_alloc_pages(obj, pvec, npages); if (!IS_ERR(pages)) { + __i915_gem_userptr_set_active(obj, true); pinned = 0; pages = NULL; } } obj->userptr.work = ERR_CAST(pages); - if (IS_ERR(pages)) - __i915_gem_userptr_set_active(obj, false); } i915_gem_object_unlock(obj); @@ -566,7 +565,6 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj) struct mm_struct *mm = obj->userptr.mm->mm; struct page **pvec; struct sg_table *pages; - bool active; int pinned; unsigned int gup_flags = 0; @@ -621,19 +619,16 @@ static int i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj) } } - active = false; if (pinned < 0) { pages = ERR_PTR(pinned); pinned = 0; } else if (pinned < num_pages) { pages = __i915_gem_userptr_get_pages_schedule(obj); - active = pages == ERR_PTR(-EAGAIN); } else { pages = __i915_gem_userptr_alloc_pages(obj, pvec, num_pages); - active = !IS_ERR(pages); + if (!IS_ERR(pages)) + __i915_gem_userptr_set_active(obj, true); } - if (active) - __i915_gem_userptr_set_active(obj, true); if (IS_ERR(pages)) unpin_user_pages(pvec, pinned); -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
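A sketch of the resulting ordering with invented helpers (my_userptr, my_arm_notifier), to make the trade-off concrete: the pin happens first, the notifier is armed second, and the window in between is accepted just as it is on the fast path.

static int userptr_acquire(struct my_userptr *p)
{
        int pinned, err;

        pinned = pin_user_pages_fast(p->userptr, p->npages,
                                     FOLL_WRITE | FOLL_LONGTERM, p->pages);
        if (pinned < 0)
                return pinned;
        if (pinned < p->npages) {
                err = -EFAULT;
                goto err_unpin;
        }

        /*
         * Arm revocation only after the pin completes: we then only see
         * invalidate-range callbacks for real revocations, not for the
         * pmd splits the pin itself triggers on fresh mappings.
         */
        err = my_arm_notifier(p);
        if (err)
                goto err_unpin;

        return 0;

err_unpin:
        unpin_user_pages(p->pages, pinned);
        return err;
}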
[Intel-gfx] [PATCH 35/37] drm/i915: Remove unused i915_gem_evict_vm()
Obsolete, last user removed. Signed-off-by: Chris Wilson Reviewed-by: Thomas Hellström --- drivers/gpu/drm/i915/i915_drv.h | 1 - drivers/gpu/drm/i915/i915_gem_evict.c | 57 --- .../gpu/drm/i915/selftests/i915_gem_evict.c | 40 - 3 files changed, 98 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 05a2624116a1..04243dc286c7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1867,7 +1867,6 @@ int __must_check i915_gem_evict_something(struct i915_address_space *vm, int __must_check i915_gem_evict_for_node(struct i915_address_space *vm, struct drm_mm_node *node, unsigned int flags); -int i915_gem_evict_vm(struct i915_address_space *vm); /* i915_gem_internal.c */ struct drm_i915_gem_object * diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c index 6501939929d5..e35f0ba5e245 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -343,63 +343,6 @@ int i915_gem_evict_for_node(struct i915_address_space *vm, return ret; } -/** - * i915_gem_evict_vm - Evict all idle vmas from a vm - * @vm: Address space to cleanse - * - * This function evicts all vmas from a vm. - * - * This is used by the execbuf code as a last-ditch effort to defragment the - * address space. - * - * To clarify: This is for freeing up virtual address space, not for freeing - * memory in e.g. the shrinker. - */ -int i915_gem_evict_vm(struct i915_address_space *vm) -{ - int ret = 0; - - lockdep_assert_held(&vm->mutex); - trace_i915_gem_evict_vm(vm); - - /* Switch back to the default context in order to unpin -* the existing context objects. However, such objects only -* pin themselves inside the global GTT and performing the -* switch otherwise is ineffective. -*/ - if (i915_is_ggtt(vm)) { - ret = ggtt_flush(vm->gt); - if (ret) - return ret; - } - - do { - struct i915_vma *vma, *vn; - LIST_HEAD(eviction_list); - - list_for_each_entry(vma, &vm->bound_list, vm_link) { - if (i915_vma_is_pinned(vma)) - continue; - - __i915_vma_pin(vma); - list_add(&vma->evict_link, &eviction_list); - } - if (list_empty(&eviction_list)) - break; - - ret = 0; - list_for_each_entry_safe(vma, vn, &eviction_list, evict_link) { - __i915_vma_unpin(vma); - if (ret == 0) - ret = __i915_vma_unbind(vma); - if (ret != -EINTR) /* "Get me out of here!" */ - ret = 0; - } - } while (ret == 0); - - return ret; -} - #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/i915_gem_evict.c" #endif diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c index 48ea7f0ff7b9..b851b17d6f5a 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c @@ -327,52 +327,12 @@ static int igt_evict_for_cache_color(void *arg) return err; } -static int igt_evict_vm(void *arg) -{ - struct intel_gt *gt = arg; - struct i915_ggtt *ggtt = gt->ggtt; - LIST_HEAD(objects); - int err; - - /* Fill the GGTT with pinned objects and try to evict everything. 
*/ - - err = populate_ggtt(ggtt, &objects); - if (err) - goto cleanup; - - /* Everything is pinned, nothing should happen */ - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_evict_vm(&ggtt->vm); - mutex_unlock(&ggtt->vm.mutex); - if (err) { - pr_err("i915_gem_evict_vm on a full GGTT returned err=%d]\n", - err); - goto cleanup; - } - - unpin_ggtt(ggtt); - - mutex_lock(&ggtt->vm.mutex); - err = i915_gem_evict_vm(&ggtt->vm); - mutex_unlock(&ggtt->vm.mutex); - if (err) { - pr_err("i915_gem_evict_vm on a full GGTT returned err=%d]\n", - err); - goto cleanup; - } - -cleanup: - cleanup_objects(ggtt, &objects); - return err; -} - int i915_gem_evict_mock_selftests(void) { static const struct i915_subtest tests[] = { SUBTEST(igt_evict_something), SUBTEST(igt_evict_for_vma), SUBTEST(igt_evict_for_cache_color), - SUBTEST(igt_evict_vm), SUBTEST(igt_overcommit), }; struct drm_i915_private *i915; -- 2.20.1 __
[Intel-gfx] [PATCH 29/37] drm/i915/gem: Replace i915_gem_object.mm.mutex with reservation_ww_class
Our goal is to pull all memory reservations (next iteration obj->ops->get_pages()) under a ww_mutex, and to align those reservations with other drivers, i.e. control all such allocations with the reservation_ww_class. Currently, this is under the purview of the obj->mm.mutex, and while obj->mm remains an embedded struct we can "simply" switch to using the reservation_ww_class obj->base.resv->lock The major consequence is the impact on the shrinker paths as the reservation_ww_class is used to wrap allocations, and a ww_mutex does not support subclassing so we cannot do our usual trick of knowing that we never recurse inside the shrinker and instead have to finish the reclaim with a trylock. This may result in us failing to release the pages after having released the vma. This will have to do until a better idea comes along. However, this step only converts the mutex over and continues to treat everything as a single allocation and pinning the pages. With the ww_mutex in place we can remove the temporary pinning, as we can then reserve all storage en masse. One last thing to do: kill the implict page pinning for active vma. This will require us to invalidate the vma->pages when the backing store is removed (and we expect that while the vma is active, we mark the backing store as active so that it cannot be removed while the HW is busy.) Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_clflush.c | 20 +- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 19 +- drivers/gpu/drm/i915/gem/i915_gem_domain.c| 65 ++ .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 40 +++- drivers/gpu/drm/i915/gem/i915_gem_object.c| 8 +- drivers/gpu/drm/i915/gem/i915_gem_object.h| 37 +-- .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 - drivers/gpu/drm/i915/gem/i915_gem_pages.c | 147 ++-- drivers/gpu/drm/i915/gem/i915_gem_phys.c | 8 +- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 13 +- drivers/gpu/drm/i915/gem/i915_gem_tiling.c| 2 - drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 15 +- .../gpu/drm/i915/gem/selftests/huge_pages.c | 32 ++- .../i915/gem/selftests/i915_gem_coherency.c | 14 +- .../drm/i915/gem/selftests/i915_gem_context.c | 10 +- .../drm/i915/gem/selftests/i915_gem_mman.c| 2 + drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 2 - drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 1 - drivers/gpu/drm/i915/gt/intel_ggtt.c | 5 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 6 +- drivers/gpu/drm/i915/gt/intel_gtt.h | 2 - drivers/gpu/drm/i915/gt/intel_ppgtt.c | 1 + drivers/gpu/drm/i915/i915_gem.c | 16 +- drivers/gpu/drm/i915/i915_vma.c | 216 +++--- drivers/gpu/drm/i915/i915_vma_types.h | 6 - drivers/gpu/drm/i915/mm/i915_acquire_ctx.c| 12 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 4 +- .../drm/i915/selftests/intel_memory_region.c | 17 +- 28 files changed, 336 insertions(+), 385 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c index bc0223716906..a32fd0d5570b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c @@ -27,16 +27,8 @@ static void __do_clflush(struct drm_i915_gem_object *obj) static int clflush_work(struct dma_fence_work *base) { struct clflush *clflush = container_of(base, typeof(*clflush), base); - struct drm_i915_gem_object *obj = clflush->obj; - int err; - - err = i915_gem_object_pin_pages(obj); - if (err) - return err; - - __do_clflush(obj); - i915_gem_object_unpin_pages(obj); + __do_clflush(clflush->obj); return 0; } @@ -44,7 +36,7 @@ static void clflush_release(struct dma_fence_work *base) { 
struct clflush *clflush = container_of(base, typeof(*clflush), base); - i915_gem_object_put(clflush->obj); + i915_gem_object_unpin_pages(clflush->obj); } static const struct dma_fence_work_ops clflush_ops = { @@ -63,8 +55,14 @@ static struct clflush *clflush_work_create(struct drm_i915_gem_object *obj) if (!clflush) return NULL; + if (__i915_gem_object_get_pages_locked(obj)) { + kfree(clflush); + return NULL; + } + dma_fence_work_init(&clflush->base, &clflush_ops); - clflush->obj = i915_gem_object_get(obj); /* obj <-> clflush cycle */ + __i915_gem_object_pin_pages(obj); + clflush->obj = obj; /* Beware the obj.resv <-> clflush fence cycle */ return clflush; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 2679380159fc..f965fa6c3353 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -124,19 +124,19 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, e
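In caller terms, the conversion means page acquisition happens under the object's reservation lock rather than obj->mm.lock. A minimal sketch using the helpers visible in the diff (single-object case; the execbuf path goes through the acquire context instead of a bare lock):

static int use_object_pages(struct drm_i915_gem_object *obj)
{
        int err;

        err = dma_resv_lock_interruptible(obj->base.resv, NULL);
        if (err)
                return err;

        err = __i915_gem_object_get_pages_locked(obj);
        if (err == 0)
                __i915_gem_object_pin_pages(obj);   /* keep pages past unlock */

        dma_resv_unlock(obj->base.resv);
        return err;
}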
[Intel-gfx] [PATCH 23/37] drm/i915/gem: Manage GTT placement bias (starting offset) explicitly
Since we can control placement in the ppGTT explicitly, we can specify our desired starting offset exactly on a per-vma basis. This prevents us falling down a few corner cases where we confuse the user with our choices. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 67 +-- 1 file changed, 31 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 19cab5541dbc..0839397c7e50 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -36,6 +36,7 @@ struct eb_vma { /** This vma's place in the execbuf reservation list */ struct drm_i915_gem_exec_object2 *exec; + u32 bias; struct list_head bind_link; struct list_head unbound_link; @@ -61,15 +62,12 @@ struct eb_vma_array { #define __EXEC_OBJECT_HAS_PIN BIT(31) #define __EXEC_OBJECT_HAS_FENCEBIT(30) #define __EXEC_OBJECT_NEEDS_MAPBIT(29) -#define __EXEC_OBJECT_NEEDS_BIAS BIT(28) -#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */ +#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 29) /* all of the above */ #define __EXEC_HAS_RELOC BIT(31) #define __EXEC_INTERNAL_FLAGS (~0u << 31) #define UPDATE PIN_OFFSET_FIXED -#define BATCH_OFFSET_BIAS (256*1024) - #define __I915_EXEC_ILLEGAL_FLAGS \ (__I915_EXEC_UNKNOWN_FLAGS | \ I915_EXEC_CONSTANTS_MASK | \ @@ -291,7 +289,7 @@ struct i915_execbuffer { } parser; u64 invalid_flags; /** Set of execobj.flags that are invalid */ - u32 context_flags; /** Set of execobj.flags to insert from the ctx */ + u32 context_bias; u32 batch_start_offset; /** Location within object of batch */ u32 batch_len; /** Length of batch within object */ @@ -491,11 +489,12 @@ static int eb_create(struct i915_execbuffer *eb) return 0; } -static bool -eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, -const struct i915_vma *vma, -unsigned int flags) +static bool eb_vma_misplaced(const struct eb_vma *ev) { + const struct drm_i915_gem_exec_object2 *entry = ev->exec; + const struct i915_vma *vma = ev->vma; + unsigned int flags = ev->flags; + if (test_bit(I915_VMA_ERROR_BIT, __i915_vma_flags(vma))) return true; @@ -509,8 +508,7 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, vma->node.start != entry->offset) return true; - if (flags & __EXEC_OBJECT_NEEDS_BIAS && - vma->node.start < BATCH_OFFSET_BIAS) + if (vma->node.start < ev->bias) return true; if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) && @@ -529,10 +527,7 @@ static bool eb_pin_vma_fence_inplace(struct eb_vma *ev) return false; /* We need to add some new fence serialisation */ } -static inline bool -eb_pin_vma_inplace(struct i915_execbuffer *eb, - const struct drm_i915_gem_exec_object2 *entry, - struct eb_vma *ev) +static inline bool eb_pin_vma_inplace(struct eb_vma *ev) { struct i915_vma *vma = ev->vma; unsigned int pin_flags; @@ -541,7 +536,7 @@ eb_pin_vma_inplace(struct i915_execbuffer *eb, if (!i915_active_is_idle(&vma->vm->binding)) return false; - if (eb_vma_misplaced(entry, vma, ev->flags)) + if (eb_vma_misplaced(ev)) return false; pin_flags = PIN_USER; @@ -559,7 +554,7 @@ eb_pin_vma_inplace(struct i915_execbuffer *eb, } } - GEM_BUG_ON(eb_vma_misplaced(entry, vma, ev->flags)); + GEM_BUG_ON(eb_vma_misplaced(ev)); ev->flags |= __EXEC_OBJECT_HAS_PIN; return true; @@ -608,9 +603,6 @@ eb_validate_vma(struct i915_execbuffer *eb, entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP; } - if (!(entry->flags & EXEC_OBJECT_PINNED)) - entry->flags |= 
eb->context_flags; - return 0; } @@ -627,6 +619,7 @@ eb_add_vma(struct i915_execbuffer *eb, ev->vma = vma; ev->exec = entry; ev->flags = entry->flags; + ev->bias = eb->context_bias; if (eb->lut_size > 0) { ev->handle = entry->handle; @@ -653,7 +646,8 @@ eb_add_vma(struct i915_execbuffer *eb, if (i == batch_idx) { if (entry->relocation_count && !(ev->flags & EXEC_OBJECT_PINNED)) - ev->flags |= __EXEC_OBJECT_NEEDS_BIAS; + ev->bias = max_t(u32, ev->bias, SZ_256K); + if (eb->has_fence) ev->flags |= EXEC_OBJECT_NEEDS_FENCE; @@ -979,8 +973,9 @@ static int eb_reserve_vma(struct eb_vm_work *work, struct eb_bind_vma *bind) co
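The end of the message is truncated; note that the batch keeps its historical 256KiB floor via max_t(u32, ev->bias, SZ_256K), matching the removed BATCH_OFFSET_BIAS. Presumably the same per-vma value also feeds the pin request, along the lines of this hypothetical condensation (PIN_OFFSET_BIAS is the existing i915 flag requesting a minimum start offset):

static u64 ev_pin_flags(const struct eb_vma *ev)
{
        u64 pin_flags = PIN_USER;

        if (ev->bias)
                /* Ask the allocator for node.start >= ev->bias. */
                pin_flags |= ev->bias | PIN_OFFSET_BIAS;

        return pin_flags;
}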
[Intel-gfx] [PATCH 27/37] drm/i915/gtt: map the PD up front
From: Matthew Auld We need to general our accessor for the page directories and tables from using the simple kmap_atomic to support local memory, and this setup must be done on acquisition of the backing storage prior to entering fence execution contexts. Here we replace the kmap with the object maping code that for simple single page shmemfs object will return a plain kmap, that is then kept for the lifetime of the page directory. Signed-off-by: Matthew Auld Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 3 +- drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 11 +++--- drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 26 ++ drivers/gpu/drm/i915/gt/intel_ggtt.c | 2 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 34 --- drivers/gpu/drm/i915/gt/intel_gtt.h | 9 ++--- drivers/gpu/drm/i915/gt/intel_ppgtt.c | 7 ++-- drivers/gpu/drm/i915/i915_vma.c | 3 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 8 ++--- 9 files changed, 45 insertions(+), 58 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 3a79b6facb02..d3ac2542a039 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1449,7 +1449,8 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) if (err) return eb_vm_work_cancel(work, err); - err = i915_vm_pin_pt_stash(work->vm, &work->stash); + /* We also need to prepare mappings to write the PD pages */ + err = i915_vm_map_pt_stash(work->vm, &work->stash); if (err) return eb_vm_work_cancel(work, err); diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c index 1b4fa9ce6658..dd723d9832b9 100644 --- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c @@ -105,9 +105,8 @@ static void gen6_ppgtt_clear_range(struct i915_address_space *vm, * entries back to scratch. 
*/ - vaddr = kmap_atomic_px(pt); + vaddr = px_vaddr(pt); memset32(vaddr + pte, scratch_pte, count); - kunmap_atomic(vaddr); pte = 0; } @@ -129,7 +128,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, GEM_BUG_ON(!pd->entry[act_pt]); - vaddr = kmap_atomic_px(i915_pt_entry(pd, act_pt)); + vaddr = px_vaddr(i915_pt_entry(pd, act_pt)); do { GEM_BUG_ON(iter.sg->length < I915_GTT_PAGE_SIZE); vaddr[act_pte] = pte_encode | GEN6_PTE_ADDR_ENCODE(iter.dma); @@ -145,12 +144,10 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, } if (++act_pte == GEN6_PTES) { - kunmap_atomic(vaddr); - vaddr = kmap_atomic_px(i915_pt_entry(pd, ++act_pt)); + vaddr = px_vaddr(i915_pt_entry(pd, ++act_pt)); act_pte = 0; } } while (1); - kunmap_atomic(vaddr); vma->page_sizes.gtt = I915_GTT_PAGE_SIZE; } @@ -242,7 +239,7 @@ static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt) if (IS_ERR(vm->scratch[1])) return PTR_ERR(vm->scratch[1]); - ret = pin_pt_dma(vm, vm->scratch[1]); + ret = map_pt_dma(vm, vm->scratch[1]); if (ret) { i915_gem_object_put(vm->scratch[1]); return ret; diff --git a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c index eb64f474a78c..ca25e751a023 100644 --- a/drivers/gpu/drm/i915/gt/gen8_ppgtt.c +++ b/drivers/gpu/drm/i915/gt/gen8_ppgtt.c @@ -237,11 +237,10 @@ static u64 __gen8_ppgtt_clear(struct i915_address_space * const vm, atomic_read(&pt->used)); GEM_BUG_ON(!count || count >= atomic_read(&pt->used)); - vaddr = kmap_atomic_px(pt); + vaddr = px_vaddr(pt); memset64(vaddr + gen8_pd_index(start, 0), vm->scratch[0]->encode, count); - kunmap_atomic(vaddr); atomic_sub(count, &pt->used); start += count; @@ -370,7 +369,7 @@ gen8_ppgtt_insert_pte(struct i915_ppgtt *ppgtt, gen8_pte_t *vaddr; pd = i915_pd_entry(pdp, gen8_pd_index(idx, 2)); - vaddr = kmap_atomic_px(i915_pt_entry(pd, gen8_pd_index(idx, 1))); + vaddr = px_vaddr(i915_pt_entry(pd, gen8_pd_index(idx, 1))); do { GEM_BUG_ON(iter->sg->length < I915_GTT_PAGE_SIZE); vaddr[gen8_pd_index(idx, 0)] = pte_encode | iter->dma; @@ -397,12 +396,10 @@ gen8_ppgtt_inser
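The payoff of mapping at acquisition time is that every later accessor collapses to a plain pointer dereference, which is what makes it usable from fence-execution contexts. A sketch (px_vaddr() per the diff returns the mapping cached when the stash was mapped; the pte type here is illustrative):

static void write_pte(struct i915_page_table *pt, unsigned int idx, u64 pte)
{
        u64 *vaddr = px_vaddr(pt);      /* cached at map_pt_dma() time */

        /* No kmap_atomic()/kunmap_atomic() pair, and nothing can sleep. */
        vaddr[idx] = pte;
}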
[Intel-gfx] ✓ Fi.CI.BAT: success for HDCP minor refactoring (rev3)
== Series Details == Series: HDCP minor refactoring (rev3) URL : https://patchwork.freedesktop.org/series/77224/ State : success == Summary == CI Bug Log - changes from CI_DRM_8845 -> Patchwork_18309 Summary --- **SUCCESS** No regressions found. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/index.html Known issues Here are the changes found in Patchwork_18309 that come from known issues: ### IGT changes ### Issues hit * igt@i915_selftest@live@gt_lrc: - fi-tgl-u2: [PASS][1] -> [DMESG-FAIL][2] ([i915#1233]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-tgl-u2/igt@i915_selftest@live@gt_lrc.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/fi-tgl-u2/igt@i915_selftest@live@gt_lrc.html * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic: - fi-icl-u2: [PASS][3] -> [DMESG-WARN][4] ([i915#1982]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-icl-u2/igt@kms_cursor_leg...@basic-busy-flip-before-cursor-atomic.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/fi-icl-u2/igt@kms_cursor_leg...@basic-busy-flip-before-cursor-atomic.html * igt@kms_pipe_crc_basic@hang-read-crc-pipe-a: - fi-bsw-kefka: [PASS][5] -> [DMESG-WARN][6] ([i915#1982]) [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-bsw-kefka/igt@kms_pipe_crc_ba...@hang-read-crc-pipe-a.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/fi-bsw-kefka/igt@kms_pipe_crc_ba...@hang-read-crc-pipe-a.html Possible fixes * igt@kms_busy@basic@flip: - {fi-tgl-dsi}: [DMESG-WARN][7] ([i915#1982]) -> [PASS][8] [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-tgl-dsi/igt@kms_busy@ba...@flip.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/fi-tgl-dsi/igt@kms_busy@ba...@flip.html - fi-kbl-x1275: [DMESG-WARN][9] ([i915#62] / [i915#92] / [i915#95]) -> [PASS][10] [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-x1275/igt@kms_busy@ba...@flip.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/fi-kbl-x1275/igt@kms_busy@ba...@flip.html * igt@kms_chamelium@common-hpd-after-suspend: - fi-kbl-7500u: [DMESG-WARN][11] ([i915#2203]) -> [PASS][12] [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-7500u/igt@kms_chamel...@common-hpd-after-suspend.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/fi-kbl-7500u/igt@kms_chamel...@common-hpd-after-suspend.html * igt@kms_flip@basic-flip-vs-wf_vblank@b-edp1: - fi-icl-u2: [DMESG-WARN][13] ([i915#1982]) -> [PASS][14] [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vbl...@b-edp1.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vbl...@b-edp1.html Warnings * igt@kms_cursor_legacy@basic-flip-before-cursor-atomic: - fi-kbl-x1275: [DMESG-WARN][15] ([i915#62] / [i915#92]) -> [DMESG-WARN][16] ([i915#62] / [i915#92] / [i915#95]) +5 similar issues [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-x1275/igt@kms_cursor_leg...@basic-flip-before-cursor-atomic.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/fi-kbl-x1275/igt@kms_cursor_leg...@basic-flip-before-cursor-atomic.html * igt@kms_force_connector_basic@force-edid: - fi-kbl-x1275: [DMESG-WARN][17] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][18] ([i915#62] / [i915#92]) +4 similar issues [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-x1275/igt@kms_force_connector_ba...@force-edid.html [18]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/fi-kbl-x1275/igt@kms_force_connector_ba...@force-edid.html {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE). [i915#1233]: https://gitlab.freedesktop.org/drm/intel/issues/1233 [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982 [i915#2203]: https://gitlab.freedesktop.org/drm/intel/issues/2203 [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62 [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92 [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95 Participating hosts (44 -> 37) -- Missing(7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus Build changes - * Linux: CI_DRM_8845 -> Patchwork_18309 CI-20190529: 20190529 CI_DRM_8845: a486392fed875e0b9154eaeb4bf6a4193484e0b3 @ git://anongit.freedesktop.org/gfx-ci/linux IGT_5758: bb34603947667cb44ed9ff0db8dccbb9d3f42357 @ git://anongit.freedesktop.org/xorg/app/inte
[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Replace obj->mm.lock with reservation_ww_class
== Series Details == Series: Replace obj->mm.lock with reservation_ww_class URL : https://patchwork.freedesktop.org/series/80291/ State : warning == Summary == $ dim checkpatch origin/drm-tip fa0ff87bd9b0 drm/i915/gem: Reduce context termination list iteration guard to RCU -:20: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line) #20: References: d22d2d073ef8 ("drm/i915: Protect i915_request_await_start from early waits") # rcu protection of timeline->requests -:20: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("")' - ie: 'commit d22d2d073ef8 ("drm/i915: Protect i915_request_await_start from early waits")' #20: References: d22d2d073ef8 ("drm/i915: Protect i915_request_await_start from early waits") # rcu protection of timeline->requests total: 1 errors, 1 warnings, 0 checks, 65 lines checked ce73eec0532c drm/i915/gt: Protect context lifetime with RCU dd5a5156d3f0 drm/i915/gt: Free stale request on destroying the virtual engine 3bc9e76bcdd9 drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission 4749f15dd1d3 drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock 042d0d931dcf drm/i915/gt: Don't cancel the interrupt shadow too early 1ee396796726 drm/i915/gt: Split the breadcrumb spinlock between global and contexts -:339: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment #339: FILE: drivers/gpu/drm/i915/gt/intel_context_types.h:54: + spinlock_t signal_lock; total: 0 errors, 0 warnings, 1 checks, 293 lines checked 98e29e72ccd2 drm/i915/gem: Don't drop the timeline lock during execbuf f0a442c8ea00 drm/i915/gem: Rename execbuf.bind_link to unbound_link aae43a649127 drm/i915/gem: Rename the list of relocations to reloc_list f6729467a31c drm/i915/gem: Move the 'cached' info to i915_execbuffer cbf2d6b56eea drm/i915/gem: Break apart the early i915_vma_pin from execbuf object lookup 89bf55e09136 drm/i915/gem: Remove the call for no-evict i915_vma_pin 225f65b93b67 drm/i915: Serialise i915_vma_pin_inplace() with i915_vma_unbind() aeec0677ca2f drm/i915: Add list_for_each_entry_safe_continue_reverse -:25: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'pos' - possible side-effects? #25: FILE: drivers/gpu/drm/i915/i915_utils.h:269: +#define list_for_each_entry_safe_continue_reverse(pos, n, head, member) \ + for (pos = list_prev_entry(pos, member),\ +n = list_prev_entry(pos, member); \ +&pos->member != (head);\ +pos = n, n = list_prev_entry(n, member)) -:25: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'n' - possible side-effects? #25: FILE: drivers/gpu/drm/i915/i915_utils.h:269: +#define list_for_each_entry_safe_continue_reverse(pos, n, head, member) \ + for (pos = list_prev_entry(pos, member),\ +n = list_prev_entry(pos, member); \ +&pos->member != (head);\ +pos = n, n = list_prev_entry(n, member)) -:25: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects? 
#25: FILE: drivers/gpu/drm/i915/i915_utils.h:269: +#define list_for_each_entry_safe_continue_reverse(pos, n, head, member) \ + for (pos = list_prev_entry(pos, member),\ +n = list_prev_entry(pos, member); \ +&pos->member != (head);\ +pos = n, n = list_prev_entry(n, member)) total: 0 errors, 0 warnings, 3 checks, 12 lines checked 7ebf143a2b61 drm/i915: Always defer fenced work to the worker 1105264eb6df drm/i915/gem: Assign context id for async work b5426c3e8f1e drm/i915/gem: Separate the ww_mutex walker into its own list 418443e4f373 drm/i915/gem: Asynchronous GTT unbinding 6909712422ff drm/i915/gem: Bind the fence async for execbuf 08253ca5d7f8 drm/i915/gem: Include cmdparser in common execbuf pinning c04ff4d7bd4f drm/i915/gem: Include secure batch in common execbuf pinning 555b300812f8 drm/i915/gem: Manage GTT placement bias (starting offset) explicitly 245b73f9f203 drm/i915/gem: Reintroduce multiple passes for reloc processing -:1390: WARNING:MEMORY_BARRIER: memory barrier without comment #1390: FILE: drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c:174: + wmb(); total: 0 errors, 1 warnings, 0 checks, 1401 lines checked aec7bc6e676a drm/i915: Add an implementation for common reservation_ww_class locking -:65: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating? #65: new file mode 100644 -:360: WARNING:LINE_SPACING: Missing a blank line after declarations #360: FILE: drivers/gpu/drm/i915/mm/st_acquire_ctx.c:106: + const unsigned int total = ARRAY_SIZE(dl->obj); + I915_RND_STATE(prng); -:456: WA
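For context on the three MACRO_ARG_REUSE checks: 'pos', 'n' and 'member' are each expanded more than once by the iterator, so an argument with side effects would be evaluated repeatedly. As with the rest of the list_for_each_* family the convention is to pass plain lvalues, which is what the series' own caller (eb_lock_vma() in patch 26) does:

struct eb_vma *unlock = ev, *en;

list_for_each_entry_safe_continue_reverse(unlock, en,
                                          &eb->submit_list,
                                          submit_link) {
        ww_mutex_unlock(&unlock->vma->resv->lock);
        list_move_tail(&unlock->submit_link, &eb->submit_list);
}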
[Intel-gfx] ✗ Fi.CI.SPARSE: warning for Replace obj->mm.lock with reservation_ww_class
== Series Details == Series: Replace obj->mm.lock with reservation_ww_class URL : https://patchwork.freedesktop.org/series/80291/ State : warning == Summary == $ dim sparse --fast origin/drm-tip Sparse version: v0.6.0 Fast mode used, each commit won't be checked separately. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✓ Fi.CI.BAT: success for Replace obj->mm.lock with reservation_ww_class
== Series Details == Series: Replace obj->mm.lock with reservation_ww_class URL : https://patchwork.freedesktop.org/series/80291/ State : success == Summary == CI Bug Log - changes from CI_DRM_8845 -> Patchwork_18310 Summary --- **SUCCESS** No regressions found. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/index.html Known issues Here are the changes found in Patchwork_18310 that come from known issues: ### IGT changes ### Issues hit * igt@gem_exec_suspend@basic-s0: - fi-tgl-u2: [PASS][1] -> [FAIL][2] ([i915#1888]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-tgl-u2/igt@gem_exec_susp...@basic-s0.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/fi-tgl-u2/igt@gem_exec_susp...@basic-s0.html Possible fixes * igt@i915_module_load@reload: - fi-byt-j1900: [DMESG-WARN][3] ([i915#1982]) -> [PASS][4] [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-byt-j1900/igt@i915_module_l...@reload.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/fi-byt-j1900/igt@i915_module_l...@reload.html * igt@kms_busy@basic@flip: - fi-kbl-x1275: [DMESG-WARN][5] ([i915#62] / [i915#92] / [i915#95]) -> [PASS][6] [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-x1275/igt@kms_busy@ba...@flip.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/fi-kbl-x1275/igt@kms_busy@ba...@flip.html * igt@kms_chamelium@common-hpd-after-suspend: - fi-kbl-7500u: [DMESG-WARN][7] ([i915#2203]) -> [PASS][8] [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-7500u/igt@kms_chamel...@common-hpd-after-suspend.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/fi-kbl-7500u/igt@kms_chamel...@common-hpd-after-suspend.html * igt@kms_flip@basic-flip-vs-wf_vblank@b-edp1: - fi-icl-u2: [DMESG-WARN][9] ([i915#1982]) -> [PASS][10] [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vbl...@b-edp1.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vbl...@b-edp1.html Warnings * igt@gem_exec_suspend@basic-s0: - fi-kbl-x1275: [DMESG-WARN][11] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][12] ([i915#1982] / [i915#62] / [i915#92]) [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-x1275/igt@gem_exec_susp...@basic-s0.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/fi-kbl-x1275/igt@gem_exec_susp...@basic-s0.html * igt@kms_force_connector_basic@force-connector-state: - fi-kbl-x1275: [DMESG-WARN][13] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][14] ([i915#62] / [i915#92]) +1 similar issue [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-x1275/igt@kms_force_connector_ba...@force-connector-state.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/fi-kbl-x1275/igt@kms_force_connector_ba...@force-connector-state.html * igt@prime_vgem@basic-fence-flip: - fi-kbl-x1275: [DMESG-WARN][15] ([i915#62] / [i915#92]) -> [DMESG-WARN][16] ([i915#62] / [i915#92] / [i915#95]) +9 similar issues [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-x1275/igt@prime_v...@basic-fence-flip.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/fi-kbl-x1275/igt@prime_v...@basic-fence-flip.html {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE). 
[i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888 [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982 [i915#2203]: https://gitlab.freedesktop.org/drm/intel/issues/2203 [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62 [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92 [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95 Participating hosts (44 -> 37) -- Missing(7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus Build changes - * Linux: CI_DRM_8845 -> Patchwork_18310 CI-20190529: 20190529 CI_DRM_8845: a486392fed875e0b9154eaeb4bf6a4193484e0b3 @ git://anongit.freedesktop.org/gfx-ci/linux IGT_5758: bb34603947667cb44ed9ff0db8dccbb9d3f42357 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools Patchwork_18310: ed8fec45359a20d6d36aa007d51975e219e295d1 @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == ed8fec45359a drm/i915/gem: Delay attach mmu-notifier until we acquire the pinned userptr 5fce5e4b3f18 drm/i915/display: Drop object lock from intel_unpin_fb_vma 6e88c2dcdd0b drm/i915: Remove unused i915_gem_evict_vm() c51644
Re: [Intel-gfx] [PATCH v3 1/2] drm/i915/hdcp: Add update_pipe early return
On 2020-08-05 at 17:15:20 +0530, Anshuman Gupta wrote: > Currently intel_hdcp_update_pipe() is also getting called for non-hdcp > connectors and get through its conditional code flow, which is completely > unnecessary for non-hdcp connectors, therefore it make sense to > have an early return. No functional change. Looks good to me Reviewed-by: Ramalingam C > > v2: > - rebased. > > Reviewed-by: Uma Shankar > Signed-off-by: Anshuman Gupta > --- > drivers/gpu/drm/i915/display/intel_hdcp.c | 8 ++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/display/intel_hdcp.c > b/drivers/gpu/drm/i915/display/intel_hdcp.c > index 89a4d294822d..a1e0d518e529 100644 > --- a/drivers/gpu/drm/i915/display/intel_hdcp.c > +++ b/drivers/gpu/drm/i915/display/intel_hdcp.c > @@ -2082,11 +2082,15 @@ void intel_hdcp_update_pipe(struct intel_atomic_state > *state, > struct intel_connector *connector = > to_intel_connector(conn_state->connector); > struct intel_hdcp *hdcp = &connector->hdcp; > - bool content_protection_type_changed = > + bool content_protection_type_changed, desired_and_not_enabled = false; > + > + if (!connector->hdcp.shim) > + return; > + > + content_protection_type_changed = > (conn_state->hdcp_content_type != hdcp->content_type && >conn_state->content_protection != >DRM_MODE_CONTENT_PROTECTION_UNDESIRED); > - bool desired_and_not_enabled = false; > > /* >* During the HDCP encryption session if Type change is requested, > -- > 2.26.2 > ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 10/37] drm/i915/gem: Rename the list of relocations to reloc_list
On 05/08/2020 13:22, Chris Wilson wrote: Continuing the theme of calling the lists a foo_list, rename the relocs list. This means that we can now use relocs for the old reloc_cache that is not a cache anymore. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index a5b63ae17241..e7e16c62df1c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -252,7 +252,7 @@ struct i915_execbuffer { struct list_head unbound; /** list of vma that have execobj.relocation_count */ - struct list_head relocs; + struct list_head relocs_list; /** * Track the most recently used object for relocations, as we @@ -577,7 +577,7 @@ eb_add_vma(struct i915_execbuffer *eb, } if (entry->relocation_count) - list_add_tail(&ev->reloc_link, &eb->relocs); + list_add_tail(&ev->reloc_link, &eb->relocs_list); /* * SNA is doing fancy tricks with compressing batch buffers, which leads @@ -932,7 +932,7 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb) unsigned int i; int err = 0; - INIT_LIST_HEAD(&eb->relocs); + INIT_LIST_HEAD(&eb->relocs_list); INIT_LIST_HEAD(&eb->unbound); for (i = 0; i < eb->buffer_count; i++) { @@ -1592,7 +1592,7 @@ static int eb_relocate(struct i915_execbuffer *eb) struct eb_vma *ev; int flush; - list_for_each_entry(ev, &eb->relocs, reloc_link) { + list_for_each_entry(ev, &eb->relocs_list, reloc_link) { err = eb_relocate_vma(eb, ev); if (err) break; Reviewed-by: Tvrtko Ursulin Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 11/37] drm/i915/gem: Move the 'cached' info to i915_execbuffer
On 05/08/2020 13:22, Chris Wilson wrote: The reloc_cache contains some details that are used outside of the relocation handling, so lift those out of the embedded struct into the principal struct i915_execbuffer. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 61 +++ .../i915/gem/selftests/i915_gem_execbuffer.c | 6 +- 2 files changed, 37 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index e7e16c62df1c..e9ef0c287fd9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -261,11 +261,6 @@ struct i915_execbuffer { */ struct reloc_cache { struct drm_mm_node node; /** temporary GTT binding */ - unsigned int gen; /** Cached value of INTEL_GEN */ - bool use_64bit_reloc : 1; - bool has_llc : 1; - bool has_fence : 1; - bool needs_unfenced : 1; struct intel_context *ce; @@ -283,6 +278,12 @@ struct i915_execbuffer { u32 batch_len; /** Length of batch within object */ u32 batch_flags; /** Flags composed for emit_bb_start() */ + unsigned int gen; /** Cached value of INTEL_GEN */ + bool use_64bit_reloc : 1; + bool has_llc : 1; + bool has_fence : 1; + bool needs_unfenced : 1; + /** * Indicate either the size of the hastable used to resolve * relocation handles, or if negative that we are using a direct @@ -540,11 +541,11 @@ eb_validate_vma(struct i915_execbuffer *eb, */ entry->offset = gen8_noncanonical_addr(entry->offset); - if (!eb->reloc_cache.has_fence) { + if (!eb->has_fence) { entry->flags &= ~EXEC_OBJECT_NEEDS_FENCE; } else { if ((entry->flags & EXEC_OBJECT_NEEDS_FENCE || -eb->reloc_cache.needs_unfenced) && +eb->needs_unfenced) && i915_gem_object_is_tiled(vma->obj)) entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP; } @@ -592,7 +593,7 @@ eb_add_vma(struct i915_execbuffer *eb, if (entry->relocation_count && !(ev->flags & EXEC_OBJECT_PINNED)) ev->flags |= __EXEC_OBJECT_NEEDS_BIAS; - if (eb->reloc_cache.has_fence) + if (eb->has_fence) ev->flags |= EXEC_OBJECT_NEEDS_FENCE; eb->batch = ev; @@ -995,15 +996,19 @@ relocation_target(const struct drm_i915_gem_relocation_entry *reloc, return gen8_canonical_addr((int)reloc->delta + target->node.start); } -static void reloc_cache_init(struct reloc_cache *cache, -struct drm_i915_private *i915) +static void eb_info_init(struct i915_execbuffer *eb, +struct drm_i915_private *i915) { /* Must be a variable in the struct to allow GCC to unroll. 
*/ - cache->gen = INTEL_GEN(i915); - cache->has_llc = HAS_LLC(i915); - cache->use_64bit_reloc = HAS_64BIT_RELOC(i915); - cache->has_fence = cache->gen < 4; - cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment; + eb->gen = INTEL_GEN(i915); + eb->has_llc = HAS_LLC(i915); + eb->use_64bit_reloc = HAS_64BIT_RELOC(i915); + eb->has_fence = eb->gen < 4; + eb->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment; +} + +static void reloc_cache_init(struct reloc_cache *cache) +{ cache->node.flags = 0; cache->rq = NULL; cache->target = NULL; @@ -1011,8 +1016,9 @@ static void reloc_cache_init(struct reloc_cache *cache, #define RELOC_TAIL 4 -static int reloc_gpu_chain(struct reloc_cache *cache) +static int reloc_gpu_chain(struct i915_execbuffer *eb) { + struct reloc_cache *cache = &eb->reloc_cache; struct intel_gt_buffer_pool_node *pool; struct i915_request *rq = cache->rq; struct i915_vma *batch; @@ -1036,9 +1042,9 @@ static int reloc_gpu_chain(struct reloc_cache *cache) GEM_BUG_ON(cache->rq_size + RELOC_TAIL > PAGE_SIZE / sizeof(u32)); cmd = cache->rq_cmd + cache->rq_size; *cmd++ = MI_ARB_CHECK; - if (cache->gen >= 8) + if (eb->gen >= 8) *cmd++ = MI_BATCH_BUFFER_START_GEN8; - else if (cache->gen >= 6) + else if (eb->gen >= 6) *cmd++ = MI_BATCH_BUFFER_START; else *cmd++ = MI_BATCH_BUFFER_START | MI_BATCH_GTT; @@ -1061,7 +1067,7 @@ static int reloc_gpu_chain(struct reloc_cache *cache) goto out_pool; cmd = i915_gem_object_pin_map(batch->obj, - cache->has_llc ? + eb->has_llc ? I915_MAP_FORCE_WB :
Re: [Intel-gfx] [PATCH v3 2/2] drm/i915/hdcp: No direct access to power_well desc
On 2020-08-05 at 17:15:21 +0530, Anshuman Gupta wrote: > HDCP code doesn't need to access power_well internals, > instead it should use the intel_display_power_well_is_enabled() > to get the status of the desired power_well. > No functional change. > > v2: > - used with_intel_runtime_pm instead of get/put. [Jani] > v3: > - rebased. > > Cc: Jani Nikula > Signed-off-by: Anshuman Gupta LGTM. Reviewed-by: Ramalingam C -Ram > --- > drivers/gpu/drm/i915/display/intel_hdcp.c | 15 +++ > 1 file changed, 3 insertions(+), 12 deletions(-) > > diff --git a/drivers/gpu/drm/i915/display/intel_hdcp.c > b/drivers/gpu/drm/i915/display/intel_hdcp.c > index a1e0d518e529..e76b049618db 100644 > --- a/drivers/gpu/drm/i915/display/intel_hdcp.c > +++ b/drivers/gpu/drm/i915/display/intel_hdcp.c > @@ -148,9 +148,8 @@ static int intel_hdcp_poll_ksv_fifo(struct > intel_digital_port *dig_port, > > static bool hdcp_key_loadable(struct drm_i915_private *dev_priv) > { > - struct i915_power_domains *power_domains = &dev_priv->power_domains; > - struct i915_power_well *power_well; > enum i915_power_well_id id; > + intel_wakeref_t wakeref; > bool enabled = false; > > /* > @@ -162,17 +161,9 @@ static bool hdcp_key_loadable(struct drm_i915_private > *dev_priv) > else > id = SKL_DISP_PW_1; > > - mutex_lock(&power_domains->lock); > - > /* PG1 (power well #1) needs to be enabled */ > - for_each_power_well(dev_priv, power_well) { > - if (power_well->desc->id == id) { > - enabled = power_well->desc->ops->is_enabled(dev_priv, > - power_well); > - break; > - } > - } > - mutex_unlock(&power_domains->lock); > + with_intel_runtime_pm(&dev_priv->runtime_pm, wakeref) > + enabled = intel_display_power_well_is_enabled(dev_priv, id); > > /* >* Another req for hdcp key loadability is enabled state of pll for > -- > 2.26.2 > ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: Add enable/disable flip done and flip done handler
On 7/25/2020 4:56 AM, Paulo Zanoni wrote: Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu: Add enable/disable flip done functions and the flip done handler function which handles the flip done interrupt. Enable the flip done interrupt in IER. Enable flip done function is called before writing the surface address register as the write to this register triggers the flip done interrupt. Flip done handler is used to send the page flip event as soon as the surface address is written as per the requirement of async flips. The interrupt is disabled after the event is sent. v2: -Change function name from icl_* to skl_* (Paulo) -Move flip handler to this patch (Paulo) -Remove vblank_put() (Paulo) -Enable flip done interrupt for gen9+ only (Paulo) -Enable flip done interrupt in power_well_post_enable hook (Paulo) -Removed the event check in flip done handler to handle async flips without pageflip events. v3: -Move skl_disable_flip_done out of interrupt handler (Paulo) -Make the pending vblank event NULL in the beginning of flip_done_handler to remove sporadic WARN_ON that is seen. v4: -Calculate timestamps using flip done time stamp and current timestamp for async flips (Ville) v5: -Fix the sparse warning by making the function 'g4x_get_flip_counter' static.(Reported-by: kernel test robot ) -Fix the typo in commit message. Signed-off-by: Karthik B S Signed-off-by: Vandita Kulkarni --- drivers/gpu/drm/i915/display/intel_display.c | 10 +++ drivers/gpu/drm/i915/i915_irq.c | 83 ++-- drivers/gpu/drm/i915/i915_irq.h | 2 + drivers/gpu/drm/i915/i915_reg.h | 4 +- 4 files changed, 91 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index db2a5a1a9b35..b8ff032195d9 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -15562,6 +15562,13 @@ static void intel_atomic_commit_tail(struct intel_atomic_state *state) intel_dbuf_pre_plane_update(state); + for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) { + if (new_crtc_state->uapi.async_flip) { + skl_enable_flip_done(&crtc->base); + break; Do we really want the break here? What if more than one CRTC wants an async flip? Thanks for the review. This will fail for the multiple CRTC case, I will remove this break. Perhaps you could extend IGT to try this. Currently we cannot add this scenario of having 2 CRTCs in the same commit, as we're using the page flip ioctl. But I did try by hacking via the atomic path and 2 displays with async is working fine. + } + } + /* Now enable the clocks, plane, pipe, and connectors that we set up. */ dev_priv->display.commit_modeset_enables(state); @@ -15583,6 +15590,9 @@ static void intel_atomic_commit_tail(struct intel_atomic_state *state) drm_atomic_helper_wait_for_flip_done(dev, &state->base); for_each_new_intel_crtc_in_state(state, crtc, new_crtc_state, i) { + if (new_crtc_state->uapi.async_flip) + skl_disable_flip_done(&crtc->base); Here we don't break in the first found, so at least there's an inconsistency. I will remove the break in the earlier loop.
+ if (new_crtc_state->hw.active && !needs_modeset(new_crtc_state) && !new_crtc_state->preload_luts && diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 1fa67700d8f4..95953b393941 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -697,14 +697,24 @@ u32 i915_get_vblank_counter(struct drm_crtc *crtc) return (((high1 << 8) | low) + (pixel >= vbl_start)) & 0xff; } +static u32 g4x_get_flip_counter(struct drm_crtc *crtc) +{ + struct drm_i915_private *dev_priv = to_i915(crtc->dev); + enum pipe pipe = to_intel_crtc(crtc)->pipe; + + return I915_READ(PIPE_FLIPCOUNT_G4X(pipe)); +} + u32 g4x_get_vblank_counter(struct drm_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); enum pipe pipe = to_intel_crtc(crtc)->pipe; + if (crtc->state->async_flip) + return g4x_get_flip_counter(crtc); + return I915_READ(PIPE_FRMCOUNT_G4X(pipe)); I don't understand the intention behind this, can you please clarify? This goes back to my reply of the cover letter. It seems that here we're going to alternate between two different counters in our vblank count. So if user space alternates between sometimes using async flips and sometimes using normal flip it's going to get some very weird deltas, isn't it? At least this is what I remember from when I played with these registers: FLIPCOUNT drifts away from FRMCOUNT when we start using async
Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: Add enable/disable flip done and flip done handler
On 7/27/2020 5:57 PM, Michel Dänzer wrote: On 2020-07-25 1:26 a.m., Paulo Zanoni wrote: Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu: diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 1fa67700d8f4..95953b393941 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -697,14 +697,24 @@ u32 i915_get_vblank_counter(struct drm_crtc *crtc) return (((high1 << 8) | low) + (pixel >= vbl_start)) & 0xff; } +static u32 g4x_get_flip_counter(struct drm_crtc *crtc) +{ + struct drm_i915_private *dev_priv = to_i915(crtc->dev); + enum pipe pipe = to_intel_crtc(crtc)->pipe; + + return I915_READ(PIPE_FLIPCOUNT_G4X(pipe)); +} + u32 g4x_get_vblank_counter(struct drm_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); enum pipe pipe = to_intel_crtc(crtc)->pipe; + if (crtc->state->async_flip) + return g4x_get_flip_counter(crtc); + return I915_READ(PIPE_FRMCOUNT_G4X(pipe)); I don't understand the intention behind this, can you please clarify? This goes back to my reply of the cover letter. It seems that here we're going to alternate between two different counters in our vblank count. So if user space alternates between sometimes using async flips and sometimes using normal flip it's going to get some very weird deltas, isn't it? At least this is what I remember from when I played with these registers: FLIPCOUNT drifts away from FRMCOUNT when we start using async flips. This definitely looks wrong. The counter value returned by the get_vblank_counter hook is supposed to increment when a vertical blank period occurs; page flips are not supposed to affect this in any way. Thanks for the review. As per the feedback received, I will be removing this and will revert back to the original implementation in the next revision. Thanks, Karthik.B.S ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
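To make the requested semantics concrete: the hook should report only the hardware frame counter, regardless of flip type. A minimal sketch of that behaviour, which is essentially the pre-patch code trimmed down:

static u32 g4x_get_vblank_counter(struct drm_crtc *crtc)
{
	struct drm_i915_private *dev_priv = to_i915(crtc->dev);
	enum pipe pipe = to_intel_crtc(crtc)->pipe;

	/*
	 * Always read FRMCOUNT: it ticks once per vertical blank.
	 * FLIPCOUNT advances per flip and drifts away from FRMCOUNT
	 * under async flips, which would corrupt vblank deltas for
	 * user space that mixes async and normal flips.
	 */
	return I915_READ(PIPE_FRMCOUNT_G4X(pipe));
}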
Re: [Intel-gfx] [PATCH v5 1/5] drm/i915: Add enable/disable flip done and flip done handler
On 7/28/2020 3:04 AM, Daniel Vetter wrote: On Mon, Jul 27, 2020 at 2:27 PM Michel Dänzer wrote: On 2020-07-25 1:26 a.m., Paulo Zanoni wrote: Em seg, 2020-07-20 às 17:01 +0530, Karthik B S escreveu: diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 1fa67700d8f4..95953b393941 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -697,14 +697,24 @@ u32 i915_get_vblank_counter(struct drm_crtc *crtc) return (((high1 << 8) | low) + (pixel >= vbl_start)) & 0xff; } +static u32 g4x_get_flip_counter(struct drm_crtc *crtc) +{ +struct drm_i915_private *dev_priv = to_i915(crtc->dev); +enum pipe pipe = to_intel_crtc(crtc)->pipe; + +return I915_READ(PIPE_FLIPCOUNT_G4X(pipe)); +} + u32 g4x_get_vblank_counter(struct drm_crtc *crtc) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); enum pipe pipe = to_intel_crtc(crtc)->pipe; +if (crtc->state->async_flip) +return g4x_get_flip_counter(crtc); + return I915_READ(PIPE_FRMCOUNT_G4X(pipe)); I don't understand the intention behind this, can you please clarify? This goes back to my reply of the cover letter. It seems that here we're going to alternate between two different counters in our vblank count. So if user space alternates between sometimes using async flips and sometimes using normal flip it's going to get some very weird deltas, isn't it? At least this is what I remember from when I played with these registers: FLIPCOUNT drifts away from FRMCOUNT when we start using async flips. This definitely looks wrong. The counter value returned by the get_vblank_counter hook is supposed to increment when a vertical blank period occurs; page flips are not supposed to affect this in any way. Also you just flat out can't access crtc->state from interrupt context. Anything you need in there needs to be protected by the right irq-type spin_lock, updates correctly synchronized against both the interrupt handler and atomic updates, and data copied over, not pointers. Otherwise just crash&burn. Thanks for the review. I will be removing this change in the next revision based on the feedback received, but I will keep this in mind whenever I'll have to access something from the interrupt context. Thanks, Karthik.B.S -Daniel ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
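A minimal sketch of the discipline Daniel describes: stash the data you need under an irq-safe lock at commit time, and in the handler read the copy under that same lock, rather than dereferencing crtc->state from interrupt context. The flip_done_event field below is hypothetical, used only to illustrate the copied-data pattern:

/* commit path (process context): stash the pending event for the irq */
static void stash_flip_done_event(struct intel_crtc *crtc,
				  struct drm_crtc_state *new_state)
{
	struct drm_i915_private *i915 = to_i915(crtc->base.dev);

	spin_lock_irq(&i915->drm.event_lock);
	crtc->flip_done_event = new_state->event;	/* hypothetical field */
	new_state->event = NULL;
	spin_unlock_irq(&i915->drm.event_lock);
}

/* interrupt handler: consume the copy, never chase state pointers */
static void flip_done_handler(struct drm_i915_private *i915, enum pipe pipe)
{
	struct intel_crtc *crtc = intel_get_crtc_for_pipe(i915, pipe);
	struct drm_pending_vblank_event *e;
	unsigned long irqflags;

	spin_lock_irqsave(&i915->drm.event_lock, irqflags);
	e = crtc->flip_done_event;	/* copied at commit time */
	crtc->flip_done_event = NULL;
	if (e)
		drm_crtc_send_vblank_event(&crtc->base, e);
	spin_unlock_irqrestore(&i915->drm.event_lock, irqflags);
}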
Re: [Intel-gfx] [PATCH 14/37] drm/i915: Serialise i915_vma_pin_inplace() with i915_vma_unbind()
On 05/08/2020 13:22, Chris Wilson wrote: Directly serialise the atomic pinning with evicting the vma from unbind with a pair of coupled cmpxchg to avoid fighting over vm->mutex. Assumption being bind/unbind should never contend and create a busy-spinny section? And motivation being.. ? Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_vma.c | 45 ++--- 1 file changed, 14 insertions(+), 31 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index dbe11b349175..17ce0bce318e 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -742,12 +742,10 @@ i915_vma_detach(struct i915_vma *vma) bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags) { - unsigned int bound; - bool pinned = true; + unsigned int bound = atomic_read(&vma->flags); GEM_BUG_ON(flags & ~I915_VMA_BIND_MASK); - bound = atomic_read(&vma->flags); do { if (unlikely(flags & ~bound)) return false; @@ -755,34 +753,10 @@ bool i915_vma_pin_inplace(struct i915_vma *vma, unsigned int flags) if (unlikely(bound & (I915_VMA_OVERFLOW | I915_VMA_ERROR))) return false; - if (!(bound & I915_VMA_PIN_MASK)) - goto unpinned; - GEM_BUG_ON(((bound + 1) & I915_VMA_PIN_MASK) == 0); } while (!atomic_try_cmpxchg(&vma->flags, &bound, bound + 1)); return true; - -unpinned: - /* -* If pin_count==0, but we are bound, check under the lock to avoid -* racing with a concurrent i915_vma_unbind(). -*/ - mutex_lock(&vma->vm->mutex); - do { - if (unlikely(bound & (I915_VMA_OVERFLOW | I915_VMA_ERROR))) { - pinned = false; - break; - } - - if (unlikely(flags & ~bound)) { - pinned = false; - break; - } - } while (!atomic_try_cmpxchg(&vma->flags, &bound, bound + 1)); - mutex_unlock(&vma->vm->mutex); - - return pinned; } static int vma_get_pages(struct i915_vma *vma) @@ -1292,6 +1266,7 @@ void __i915_vma_evict(struct i915_vma *vma) int __i915_vma_unbind(struct i915_vma *vma) { + unsigned int bound; int ret; lockdep_assert_held(&vma->vm->mutex); @@ -1299,10 +1274,18 @@ int __i915_vma_unbind(struct i915_vma *vma) if (!drm_mm_node_allocated(&vma->node)) return 0; - if (i915_vma_is_pinned(vma)) { - vma_print_allocator(vma, "is pinned"); - return -EAGAIN; - } + /* Serialise with i915_vma_pin_inplace() */ + bound = atomic_read(&vma->flags); + do { + if (unlikely(bound & I915_VMA_PIN_MASK)) { + vma_print_allocator(vma, "is pinned"); + return -EAGAIN; + } + + if (unlikely(bound & I915_VMA_ERROR)) + break; + } while (!atomic_try_cmpxchg(&vma->flags, +&bound, bound | I915_VMA_ERROR)); Using the error flag is somehow critical for this scheme to work? Can you please explain in the comment and/or commit message? /* * After confirming that no one else is pinning this vma, wait for Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
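For reference, the coupled-cmpxchg serialisation boils down to two compare-and-swap loops over a single flags word: pin succeeds only while no unbind/error mark is set, and unbind succeeds only while the pin count is zero, so exactly one side can win without either taking vm->mutex. A stripped-down sketch of the shape (not the i915 code; the field layout is illustrative):

/* low bits: pin count; one high bit: vma marked unbound/error */
#define SKETCH_PIN_MASK	0x3fffffff
#define SKETCH_ERROR	0x40000000

static bool sketch_try_pin(atomic_t *flags)
{
	int old = atomic_read(flags);

	do {
		if (old & SKETCH_ERROR)		/* unbind already won */
			return false;
	} while (!atomic_try_cmpxchg(flags, &old, old + 1));

	return true;
}

static bool sketch_try_unbind(atomic_t *flags)
{
	int old = atomic_read(flags);

	do {
		if (old & SKETCH_PIN_MASK)	/* a pinner already won */
			return false;
	} while (!atomic_try_cmpxchg(flags, &old, old | SKETCH_ERROR));

	return true;
}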
Re: [Intel-gfx] [PATCH 16/37] drm/i915: Always defer fenced work to the worker
On 05/08/2020 13:22, Chris Wilson wrote: Currently, if an error is raised we always call the cleanup locally [and skip the main work callback]. However, some future users may need to take a mutex to cleanup and so we cannot immediately execute the cleanup as we may still be in interrupt context. For example, if we have committed sensitive changes [like evicting from the ppGTT layout] that are visible but gated behind the fence, we need to ensure those changes are completed even after an error. [This does suggest the split between the work/release callback is artificial and we may be able to simplify the worker api by only requiring a single callback.] With the execute-immediate flag, for most cases this should result in immediate cleanup of an error. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_sw_fence_work.c | 26 +++ 1 file changed, 13 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c index a3a81bb8f2c3..e094fd0a4202 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c @@ -16,11 +16,14 @@ static void fence_complete(struct dma_fence_work *f) static void fence_work(struct work_struct *work) { struct dma_fence_work *f = container_of(work, typeof(*f), work); - int err; - err = f->ops->work(f); - if (err) - dma_fence_set_error(&f->dma, err); + if (!f->dma.error) { + int err; + + err = f->ops->work(f); + if (err) + dma_fence_set_error(&f->dma, err); + } fence_complete(f); dma_fence_put(&f->dma); @@ -36,15 +39,10 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) if (fence->error) dma_fence_set_error(&f->dma, fence->error); - if (!f->dma.error) { - dma_fence_get(&f->dma); - if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags)) - fence_work(&f->work); - else - queue_work(system_unbound_wq, &f->work); - } else { - fence_complete(f); - } + if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags)) + fence_work(&f->work); + else + queue_work(system_unbound_wq, &f->work); break; case FENCE_FREE: @@ -91,6 +89,8 @@ void dma_fence_work_init(struct dma_fence_work *f, dma_fence_init(&f->dma, &fence_ops, &f->lock, 0, 0); i915_sw_fence_init(&f->chain, fence_notify); INIT_WORK(&f->work, fence_work); + + dma_fence_get(&f->dma); /* once for the chain; once for the work */ } int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal) Reviewed-by: Tvrtko Ursulin Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
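The underlying constraint is worth spelling out: the fence callback can fire in softirq/irq context, while a release that takes a mutex must run in process context, so even the error path has to go through the worker. A self-contained generic sketch of that split (not the dma_fence_work code; INIT_WORK(&cw->work, do_cleanup) is assumed at setup):

struct cleanup_work {
	struct work_struct work;
	struct mutex *lock;	/* cleanup needs this, so it may sleep */
};

static void do_cleanup(struct work_struct *work)
{
	struct cleanup_work *cw = container_of(work, typeof(*cw), work);

	mutex_lock(cw->lock);	/* legal: the worker runs in process context */
	/* release resources guarded by the mutex */
	mutex_unlock(cw->lock);
	kfree(cw);
}

/* called from a fence callback, possibly in irq context */
static void on_fence_signaled(struct cleanup_work *cw)
{
	/* never mutex_lock() here; defer even the error path */
	queue_work(system_unbound_wq, &cw->work);
}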
Re: [Intel-gfx] [PATCH 17/37] drm/i915/gem: Assign context id for async work
On 05/08/2020 13:22, Chris Wilson wrote: Allocate a few dma fence context ids that we can use to associate async work [for the CPU] launched on behalf of this context. For extra fun, we allow a configurable concurrency width. A current example would be that we spawn an unbound worker for every userptr get_pages. In the future, we wish to charge this work to the context that initiated the async work and to impose concurrency limits based on the context. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 4 drivers/gpu/drm/i915/gem/i915_gem_context.h | 6 ++ drivers/gpu/drm/i915/gem/i915_gem_context_types.h | 6 ++ 3 files changed, 16 insertions(+) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index db893f6c516b..bc80e7d3c50a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -721,6 +721,10 @@ __create_context(struct drm_i915_private *i915) mutex_init(&ctx->mutex); INIT_LIST_HEAD(&ctx->link); + ctx->async.width = rounddown_pow_of_two(num_online_cpus()); + ctx->async.context = dma_fence_context_alloc(ctx->async.width); + ctx->async.width--; + spin_lock_init(&ctx->stale.lock); INIT_LIST_HEAD(&ctx->stale.engines); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h index a133f92bbedb..f254458a795e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h @@ -134,6 +134,12 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data, int i915_gem_context_reset_stats_ioctl(struct drm_device *dev, void *data, struct drm_file *file); +static inline u64 i915_gem_context_async_id(struct i915_gem_context *ctx) +{ + return (ctx->async.context + + (atomic_fetch_inc(&ctx->async.cur) & ctx->async.width)); +} + static inline struct i915_gem_context * i915_gem_context_get(struct i915_gem_context *ctx) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index ae14ca24a11f..52561f98000f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -85,6 +85,12 @@ struct i915_gem_context { struct intel_timeline *timeline; + struct { + u64 context; + atomic_t cur; + unsigned int width; + } async; + /** * @vm: unique address space (GTT) * Reviewed-by: Tvrtko Ursulin Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
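The id scheme in the patch: reserve a power-of-two block of fence contexts, store width - 1 as a mask, and cycle through the block with an atomic counter. A tiny standalone sketch of the same arithmetic:

/* at setup, mirroring __create_context() above:
 * width = rounddown_pow_of_two(num_online_cpus());
 * base = dma_fence_context_alloc(width); mask = width - 1;
 */
static u64 pick_async_fence_context(u64 base, unsigned int mask,
				    atomic_t *cur)
{
	/*
	 * Consecutive callers cycle through [base, base + mask], so at
	 * most mask + 1 async works run on distinct fence contexts,
	 * while fences sharing a context stay ordered.
	 */
	return base + (atomic_fetch_inc(cur) & mask);
}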
Re: [Intel-gfx] [PATCH 23/37] drm/i915/gem: Manage GTT placement bias (starting offset) explicitly
On 05/08/2020 13:22, Chris Wilson wrote: Since we can control placement in the ppGTT explicitly, we can specify our desired starting offset exactly on a per-vma basis. This prevents us falling down a few corner cases where we confuse the user with our choices. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 67 +-- 1 file changed, 31 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 19cab5541dbc..0839397c7e50 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -36,6 +36,7 @@ struct eb_vma { /** This vma's place in the execbuf reservation list */ struct drm_i915_gem_exec_object2 *exec; + u32 bias; struct list_head bind_link; struct list_head unbound_link; @@ -61,15 +62,12 @@ struct eb_vma_array { #define __EXEC_OBJECT_HAS_PIN BIT(31) #define __EXEC_OBJECT_HAS_FENCE BIT(30) #define __EXEC_OBJECT_NEEDS_MAP BIT(29) -#define __EXEC_OBJECT_NEEDS_BIAS BIT(28) -#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */ +#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 29) /* all of the above */ #define __EXEC_HAS_RELOC BIT(31) #define __EXEC_INTERNAL_FLAGS (~0u << 31) #define UPDATEPIN_OFFSET_FIXED -#define BATCH_OFFSET_BIAS (256*1024) - #define __I915_EXEC_ILLEGAL_FLAGS \ (__I915_EXEC_UNKNOWN_FLAGS | \ I915_EXEC_CONSTANTS_MASK | \ @@ -291,7 +289,7 @@ struct i915_execbuffer { } parser; u64 invalid_flags; /** Set of execobj.flags that are invalid */ - u32 context_flags; /** Set of execobj.flags to insert from the ctx */ + u32 context_bias; u32 batch_start_offset; /** Location within object of batch */ u32 batch_len; /** Length of batch within object */ @@ -491,11 +489,12 @@ static int eb_create(struct i915_execbuffer *eb) return 0; } -static bool -eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, -const struct i915_vma *vma, -unsigned int flags) +static bool eb_vma_misplaced(const struct eb_vma *ev) { + const struct drm_i915_gem_exec_object2 *entry = ev->exec; + const struct i915_vma *vma = ev->vma; + unsigned int flags = ev->flags; + if (test_bit(I915_VMA_ERROR_BIT, __i915_vma_flags(vma))) return true; @@ -509,8 +508,7 @@ eb_vma_misplaced(const struct drm_i915_gem_exec_object2 *entry, vma->node.start != entry->offset) return true; - if (flags & __EXEC_OBJECT_NEEDS_BIAS && - vma->node.start < BATCH_OFFSET_BIAS) + if (vma->node.start < ev->bias) return true; if (!(flags & EXEC_OBJECT_SUPPORTS_48B_ADDRESS) && @@ -529,10 +527,7 @@ static bool eb_pin_vma_fence_inplace(struct eb_vma *ev) return false; /* We need to add some new fence serialisation */ } -static inline bool -eb_pin_vma_inplace(struct i915_execbuffer *eb, - const struct drm_i915_gem_exec_object2 *entry, - struct eb_vma *ev) +static inline bool eb_pin_vma_inplace(struct eb_vma *ev) { struct i915_vma *vma = ev->vma; unsigned int pin_flags; @@ -541,7 +536,7 @@ eb_pin_vma_inplace(struct i915_execbuffer *eb, if (!i915_active_is_idle(&vma->vm->binding)) return false; - if (eb_vma_misplaced(entry, vma, ev->flags)) + if (eb_vma_misplaced(ev)) return false; pin_flags = PIN_USER; @@ -559,7 +554,7 @@ eb_pin_vma_inplace(struct i915_execbuffer *eb, } } - GEM_BUG_ON(eb_vma_misplaced(entry, vma, ev->flags)); + GEM_BUG_ON(eb_vma_misplaced(ev)); ev->flags |= __EXEC_OBJECT_HAS_PIN; return true; @@ -608,9 +603,6 @@ eb_validate_vma(struct i915_execbuffer *eb, entry->flags |= EXEC_OBJECT_NEEDS_GTT | __EXEC_OBJECT_NEEDS_MAP; } - if (!(entry->flags & 
EXEC_OBJECT_PINNED)) - entry->flags |= eb->context_flags; - return 0; } @@ -627,6 +619,7 @@ eb_add_vma(struct i915_execbuffer *eb, ev->vma = vma; ev->exec = entry; ev->flags = entry->flags; + ev->bias = eb->context_bias; if (eb->lut_size > 0) { ev->handle = entry->handle; @@ -653,7 +646,8 @@ eb_add_vma(struct i915_execbuffer *eb, if (i == batch_idx) { if (entry->relocation_count && !(ev->flags & EXEC_OBJECT_PINNED)) - ev->flags |= __EXEC_OBJECT_NEEDS_BIAS; + ev->bias = max_t(u32, ev->bias, SZ_256K); What dictates the 256KiB border? Wondering if this is too hidden in here or not. Regards, Tvrtko + if (eb->has_fence) ev->flags |= EXE
Re: [Intel-gfx] [PATCH 6/6] drm/xen-front: Add support for EDID based configuration
Hi Oleksandr, I love your patch! Perhaps something to improve: [auto build test WARNING on drm-exynos/exynos-drm-next] [also build test WARNING on drm-intel/for-linux-next tegra-drm/drm/tegra/for-next drm-tip/drm-tip linus/master v5.8 next-20200804] [cannot apply to xen-tip/linux-next drm/drm-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Oleksandr-Andrushchenko/Fixes-and-improvements-for-Xen-pvdrm/20200731-205350 base: https://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos.git exynos-drm-next compiler: aarch64-linux-gcc (GCC) 9.3.0 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot cppcheck warnings: (new ones prefixed by >>) >> drivers/irqchip/irq-gic.c:161:24: warning: Local variable gic_data shadows >> outer variable [shadowVar] struct gic_chip_data *gic_data = irq_data_get_irq_chip_data(d); ^ drivers/irqchip/irq-gic.c:123:29: note: Shadowed declaration static struct gic_chip_data gic_data[CONFIG_ARM_GIC_MAX_NR] __read_mostly; ^ drivers/irqchip/irq-gic.c:161:24: note: Shadow variable struct gic_chip_data *gic_data = irq_data_get_irq_chip_data(d); ^ drivers/irqchip/irq-gic.c:167:24: warning: Local variable gic_data shadows outer variable [shadowVar] struct gic_chip_data *gic_data = irq_data_get_irq_chip_data(d); ^ drivers/irqchip/irq-gic.c:123:29: note: Shadowed declaration static struct gic_chip_data gic_data[CONFIG_ARM_GIC_MAX_NR] __read_mostly; ^ drivers/irqchip/irq-gic.c:167:24: note: Shadow variable struct gic_chip_data *gic_data = irq_data_get_irq_chip_data(d); ^ >> drivers/irqchip/irq-gic.c:400:28: warning: Local variable gic_irq shadows >> outer function [shadowFunction] unsigned int cascade_irq, gic_irq; ^ drivers/irqchip/irq-gic.c:171:28: note: Shadowed declaration static inline unsigned int gic_irq(struct irq_data *d) ^ drivers/irqchip/irq-gic.c:400:28: note: Shadow variable unsigned int cascade_irq, gic_irq; ^ >> drivers/irqchip/irq-gic.c:1507:14: warning: Local variable gic_cpu_base >> shadows outer function [shadowFunction] phys_addr_t gic_cpu_base; ^ drivers/irqchip/irq-gic.c:165:29: note: Shadowed declaration static inline void __iomem *gic_cpu_base(struct irq_data *d) ^ drivers/irqchip/irq-gic.c:1507:14: note: Shadow variable phys_addr_t gic_cpu_base; ^ >> drivers/irqchip/irq-gic-v3.c:874:71: warning: Boolean result is used in >> bitwise operation. Clarify expression with parentheses. [clarifyCondition] gic_data.rdists.has_direct_lpi &= (!!(typer & GICR_TYPER_DirectLPIS) | ^ >> drivers/irqchip/irq-gic-v3.c:1808:6: warning: Local variable >> nr_redist_regions shadows outer variable [shadowVar] u32 nr_redist_regions; ^ drivers/irqchip/irq-gic-v3.c:1880:6: note: Shadowed declaration u32 nr_redist_regions; ^ drivers/irqchip/irq-gic-v3.c:1808:6: note: Shadow variable u32 nr_redist_regions; ^ >> drivers/irqchip/irq-gic-v3.c:2042:6: warning: Local variable maint_irq_mode >> shadows outer variable [shadowVar] int maint_irq_mode; ^ drivers/irqchip/irq-gic-v3.c:1884:6: note: Shadowed declaration int maint_irq_mode; ^ drivers/irqchip/irq-gic-v3.c:2042:6: note: Shadow variable int maint_irq_mode; ^ >> drivers/gpu/drm/xen/xen_drm_front_cfg.c:76:6: warning: Variable 'ret' is >> reassigned a value before the old one has been used. 
[redundantAssignment] ret = xen_drm_front_get_edid(front_info, index, pages, ^ drivers/gpu/drm/xen/xen_drm_front_cfg.c:61:0: note: Variable 'ret' is reassigned a value before the old one has been used. int i, npages, ret = -ENOMEM; ^ drivers/gpu/drm/xen/xen_drm_front_cfg.c:76:6: note: Variable 'ret' is reassigned a value before the old one has been used. ret = xen_drm_front_get_edid(front_info, index, pages, ^ vim +/ret +76 drivers/gpu/drm/xen/xen_drm_front_cfg.c 54 55 static void cfg_connector_edid(struct xen_drm_front_info *front_info, 56 struct xen_drm_front_cfg_connector *connector, 57 int index) 58 { 59 struct page **pages; 60 u32 edid_sz; 61 int i, npages, ret = -ENOMEM; 62 63 connector->edid = vmalloc(XENDISPL_EDID_MAX_SIZE); 64 if (!connector->edid) 65
[Intel-gfx] [PATCH] i915/tgl: Fix TC-cold block/unblock sequence
The command register is the low PCODE MBOX register as described by the spec, not the high one. This left the system with the TC-cold power state being blocked all the time. Fix things by using the correct register. Also to make sure we retry a request for at least 600usec, when the PCODE MBOX command itself succeeded, but the TC-cold block command failed, sleep for 1msec unconditionally after any fail. The change was tested with JTAG register read of the HW/FW's actual TC-cold state, which reported the expected states after this change. Tested-by: Nivedita Swaminathan Cc: José Roberto de Souza Signed-off-by: Imre Deak --- drivers/gpu/drm/i915/display/intel_display_power.c | 10 +- drivers/gpu/drm/i915/i915_reg.h| 4 ++-- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c index 9f0241a53a45..8f0b712ed7a0 100644 --- a/drivers/gpu/drm/i915/display/intel_display_power.c +++ b/drivers/gpu/drm/i915/display/intel_display_power.c @@ -3927,12 +3927,13 @@ tgl_tc_cold_request(struct drm_i915_private *i915, bool block) int ret; while (1) { - u32 low_val = 0, high_val; + u32 low_val; + u32 high_val = 0; if (block) - high_val = TGL_PCODE_EXIT_TCCOLD_DATA_H_BLOCK_REQ; + low_val = TGL_PCODE_EXIT_TCCOLD_DATA_L_BLOCK_REQ; else - high_val = TGL_PCODE_EXIT_TCCOLD_DATA_H_UNBLOCK_REQ; + low_val = TGL_PCODE_EXIT_TCCOLD_DATA_L_UNBLOCK_REQ; /* * Spec states that we should timeout the request after 200us @@ -3951,8 +3952,7 @@ tgl_tc_cold_request(struct drm_i915_private *i915, bool block) if (++tries == 3) break; - if (ret == -EAGAIN) - msleep(1); + msleep(1); } if (ret) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 2b403df03404..e85c6fc1f3cb 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -9226,8 +9226,8 @@ enum { #define DISPLAY_IPS_CONTROL 0x19 #define TGL_PCODE_TCCOLD 0x26 #define TGL_PCODE_EXIT_TCCOLD_DATA_L_EXIT_FAILED REG_BIT(0) -#define TGL_PCODE_EXIT_TCCOLD_DATA_H_BLOCK_REQ 0 -#define TGL_PCODE_EXIT_TCCOLD_DATA_H_UNBLOCK_REQ REG_BIT(0) +#define TGL_PCODE_EXIT_TCCOLD_DATA_L_BLOCK_REQ 0 +#define TGL_PCODE_EXIT_TCCOLD_DATA_L_UNBLOCK_REQ REG_BIT(0) /* See also IPS_CTL */ #define IPS_PCODE_CONTROL (1 << 30) #define HSW_PCODE_DYNAMIC_DUTY_CYCLE_CONTROL 0x1A -- 2.23.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
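On the retry budget: the diff context notes the spec's 200us timeout per PCODE request, and the loop gives up after three tries with an unconditional msleep(1) between failures. That yields at least 3 * 200us = 600us spent in actual requests (plus roughly 2ms of sleep), which satisfies the commit message's 600us retry requirement even in the case where the MBOX command succeeds but the TC-cold block request itself fails, a case the old -EAGAIN-only check missed.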
Re: [Intel-gfx] [PATCH 01/37] drm/i915/gem: Reduce context termination list iteration guard to RCU
On 05/08/2020 13:21, Chris Wilson wrote: As we now protect the timeline list using RCU, we can drop the timeline->mutex for guarding the list iteration during context close, as we are searching for an inflight request. Any new request will see the context is banned and not be submitted. In doing so, pull the checks for a concurrent submission of the request (notably the i915_request_completed()) under the engine spinlock, to fully serialise with __i915_request_submit(). That is, in the case of preempt-to-busy, where the request may be completed during the __i915_request_submit(), we need to be careful that we sample the request status after serialising so that we don't miss the request the engine is actually submitting. Fixes: 4a3174152147 ("drm/i915/gem: Refine occupancy test in kill_context()") References: d22d2d073ef8 ("drm/i915: Protect i915_request_await_start from early waits") # rcu protection of timeline->requests References: https://gitlab.freedesktop.org/drm/intel/-/issues/1622 Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 32 - 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index d8cccbab7a51..db893f6c516b 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -439,29 +439,36 @@ static bool __cancel_engine(struct intel_engine_cs *engine) return __reset_engine(engine); } -static struct intel_engine_cs *__active_engine(struct i915_request *rq) +static bool +__active_engine(struct i915_request *rq, struct intel_engine_cs **active) { struct intel_engine_cs *engine, *locked; + bool ret = false; /* * Serialise with __i915_request_submit() so that it sees * is-banned?, or we know the request is already inflight. +* +* Note that rq->engine is unstable, and so we double +* check that we have acquired the lock on the final engine. */ locked = READ_ONCE(rq->engine); spin_lock_irq(&locked->active.lock); while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) { spin_unlock(&locked->active.lock); - spin_lock(&engine->active.lock); locked = engine; + spin_lock(&locked->active.lock); } - engine = NULL; - if (i915_request_is_active(rq) && rq->fence.error != -EIO) - engine = rq->engine; + if (!i915_request_completed(rq)) { + if (i915_request_is_active(rq) && rq->fence.error != -EIO) + *active = locked; + ret = true; So not completed but also not submitted will return true and no engine.. + } spin_unlock_irq(&locked->active.lock); - return engine; + return ret; } static struct intel_engine_cs *active_engine(struct intel_context *ce) @@ -472,17 +479,16 @@ static struct intel_engine_cs *active_engine(struct intel_context *ce) if (!ce->timeline) return NULL; - mutex_lock(&ce->timeline->mutex); - list_for_each_entry_reverse(rq, &ce->timeline->requests, link) { - if (i915_request_completed(rq)) - break; + rcu_read_lock(); + list_for_each_entry_rcu(rq, &ce->timeline->requests, link) { + if (i915_request_is_active(rq) && i915_request_completed(rq)) + continue; /* Check with the backend if the request is inflight */ - engine = __active_engine(rq); - if (engine) + if (__active_engine(rq, &engine)) break; ... hence the caller of this will say no action. Because not active means not submitted so that's okay and matches old behaviour. Need for bool return and output engine looks like a consequence of iterating the list in a different direction. 
Reviewed-by: Tvrtko Ursulin Regards, Tvrtko } - mutex_unlock(&ce->timeline->mutex); + rcu_read_unlock(); return engine; } ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
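The locking idiom at the heart of __active_engine() generalises: when a field (rq->engine here) can be rewritten under a lock you have not taken yet, lock whatever the field currently points at, re-check, and repeat until the field is stable. Isolated, the idiom looks like this (same shape as the code in the patch above):

static struct intel_engine_cs *lock_request_engine(struct i915_request *rq)
{
	struct intel_engine_cs *engine, *locked;

	locked = READ_ONCE(rq->engine);
	spin_lock_irq(&locked->active.lock);
	while (unlikely(locked != (engine = READ_ONCE(rq->engine)))) {
		/* rq migrated while we waited; chase the new engine */
		spin_unlock(&locked->active.lock);
		locked = engine;
		spin_lock(&locked->active.lock);
	}

	/* rq->engine is now stable until the lock is dropped */
	return locked;
}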
Re: [Intel-gfx] [PATCH 02/37] drm/i915/gt: Protect context lifetime with RCU
On 05/08/2020 13:21, Chris Wilson wrote: Allow a brief period for continued access to a dead intel_context by deferring the release of the struct until after an RCU grace period. As we are using a dedicated slab cache for the contexts, we can defer the release of the slab pages via RCU, with the caveat that individual structs may be reused from the freelist within an RCU grace period. To handle that, we have to avoid clearing members of the zombie struct. Is this related to debugfs race, optimising the driver latencies or both? Need to hack up mutex_reinit bothers me, on top of general desire to avoid even more rcu complexity. Regards, Tvrtko Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_context.c | 330 +--- drivers/gpu/drm/i915/i915_active.c | 10 + drivers/gpu/drm/i915/i915_active.h | 2 + drivers/gpu/drm/i915/i915_utils.h | 7 + 4 files changed, 202 insertions(+), 147 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 52db2bde44a3..4e7924640ffa 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -22,7 +22,7 @@ static struct i915_global_context { static struct intel_context *intel_context_alloc(void) { - return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL); + return kmem_cache_alloc(global.slab_ce, GFP_KERNEL); } void intel_context_free(struct intel_context *ce) @@ -30,6 +30,177 @@ void intel_context_free(struct intel_context *ce) kmem_cache_free(global.slab_ce, ce); } +static int __context_pin_state(struct i915_vma *vma) +{ + unsigned int bias = i915_ggtt_pin_bias(vma) | PIN_OFFSET_BIAS; + int err; + + err = i915_ggtt_pin(vma, 0, bias | PIN_HIGH); + if (err) + return err; + + err = i915_active_acquire(&vma->active); + if (err) + goto err_unpin; + + /* +* And mark it as a globally pinned object to let the shrinker know +* it cannot reclaim the object until we release it. 
+*/ + i915_vma_make_unshrinkable(vma); + vma->obj->mm.dirty = true; + + return 0; + +err_unpin: + i915_vma_unpin(vma); + return err; +} + +static void __context_unpin_state(struct i915_vma *vma) +{ + i915_vma_make_shrinkable(vma); + i915_active_release(&vma->active); + __i915_vma_unpin(vma); +} + +static int __ring_active(struct intel_ring *ring) +{ + int err; + + err = intel_ring_pin(ring); + if (err) + return err; + + err = i915_active_acquire(&ring->vma->active); + if (err) + goto err_pin; + + return 0; + +err_pin: + intel_ring_unpin(ring); + return err; +} + +static void __ring_retire(struct intel_ring *ring) +{ + i915_active_release(&ring->vma->active); + intel_ring_unpin(ring); +} + +__i915_active_call +static void __intel_context_retire(struct i915_active *active) +{ + struct intel_context *ce = container_of(active, typeof(*ce), active); + + CE_TRACE(ce, "retire runtime: { total:%lluns, avg:%lluns }\n", +intel_context_get_total_runtime_ns(ce), +intel_context_get_avg_runtime_ns(ce)); + + set_bit(CONTEXT_VALID_BIT, &ce->flags); + if (ce->state) + __context_unpin_state(ce->state); + + intel_timeline_unpin(ce->timeline); + __ring_retire(ce->ring); + + intel_context_put(ce); +} + +static int __intel_context_active(struct i915_active *active) +{ + struct intel_context *ce = container_of(active, typeof(*ce), active); + int err; + + CE_TRACE(ce, "active\n"); + + intel_context_get(ce); + + err = __ring_active(ce->ring); + if (err) + goto err_put; + + err = intel_timeline_pin(ce->timeline); + if (err) + goto err_ring; + + if (!ce->state) + return 0; + + err = __context_pin_state(ce->state); + if (err) + goto err_timeline; + + return 0; + +err_timeline: + intel_timeline_unpin(ce->timeline); +err_ring: + __ring_retire(ce->ring); +err_put: + intel_context_put(ce); + return err; +} + +static void __intel_context_ctor(void *arg) +{ + struct intel_context *ce = arg; + + INIT_LIST_HEAD(&ce->signal_link); + INIT_LIST_HEAD(&ce->signals); + + atomic_set(&ce->pin_count, 0); + mutex_init(&ce->pin_mutex); + + ce->active_count = 0; + i915_active_init(&ce->active, +__intel_context_active, __intel_context_retire); + + ce->inflight = NULL; + ce->lrc_reg_state = NULL; + ce->lrc.desc = 0; +} + +static void +__intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) +{ + GEM_BUG_ON(!engine->cops); + GEM_BUG_ON(!engine->gt->vm); + + kref_init(&ce->re
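The "avoid clearing members of the zombie struct" caveat is the classic typesafe-by-RCU discipline: members that readers may touch during the grace period (locks, list heads) are initialised once by the slab constructor and never re-zeroed on reuse, while lookups take a reference with kref_get_unless_zero() and then revalidate identity. A generic sketch of that discipline (not the intel_context code, just the pattern it relies on):

static struct kmem_cache *zombie_cache;

struct zombie_safe {
	struct kref ref;	/* kref_init() on every allocation */
	spinlock_t lock;	/* ctor-initialised, never re-cleared */
	struct list_head link;
	u64 id;			/* re-set on every allocation */
};

static void zombie_safe_ctor(void *arg)
{
	struct zombie_safe *z = arg;

	/* runs once per slab object, not on every reuse */
	spin_lock_init(&z->lock);
	INIT_LIST_HEAD(&z->link);
}

/* setup: zombie_cache = kmem_cache_create("zombie_safe",
 *		sizeof(struct zombie_safe), 0,
 *		SLAB_TYPESAFE_BY_RCU, zombie_safe_ctor);
 * per allocation: kref_init(&z->ref); z->id = ...; never memset().
 */

static void zombie_safe_release(struct kref *ref)
{
	struct zombie_safe *z = container_of(ref, struct zombie_safe, ref);

	kmem_cache_free(zombie_cache, z);	/* slot may be reused at once */
}

static struct zombie_safe *zombie_safe_get(struct zombie_safe *z, u64 id)
{
	struct zombie_safe *ret = NULL;

	rcu_read_lock();
	if (kref_get_unless_zero(&z->ref)) {
		if (z->id == id)		/* revalidate after reuse */
			ret = z;
		else
			kref_put(&z->ref, zombie_safe_release);
	}
	rcu_read_unlock();

	return ret;
}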
Re: [Intel-gfx] [PATCH 03/37] drm/i915/gt: Free stale request on destroying the virtual engine
On 05/08/2020 13:21, Chris Wilson wrote: Since preempt-to-busy, we may unsubmit a request while it is still on the HW and completes asynchronously. That means it may be retired and in the process destroy the virtual engine (as the user has closed their context), but that engine may still be holding onto the unsubmitted completed request. Therefore we need to potentially cleanup the old request on destroying the virtual engine. We also have to keep the virtual_engine alive until after the sibling's execlists_dequeue() have finished peeking into the virtual engines, for which we serialise with RCU. Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_lrc.c | 22 +++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 417f6b0c6c61..cb04bc5474be 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -180,6 +180,7 @@ #define EXECLISTS_REQUEST_SIZE 64 /* bytes */ struct virtual_engine { + struct rcu_head rcu; struct intel_engine_cs base; struct intel_context context; @@ -5393,10 +5394,25 @@ static void virtual_context_destroy(struct kref *kref) container_of(kref, typeof(*ve), context.ref); unsigned int n; - GEM_BUG_ON(!list_empty(virtual_queue(ve))); - GEM_BUG_ON(ve->request); GEM_BUG_ON(ve->context.inflight); + if (unlikely(ve->request)) { + struct i915_request *old; + unsigned long flags; + + spin_lock_irqsave(&ve->base.active.lock, flags); + + old = fetch_and_zero(&ve->request); + if (old) { + GEM_BUG_ON(!i915_request_completed(old)); + __i915_request_submit(old); + i915_request_put(old); + } + + spin_unlock_irqrestore(&ve->base.active.lock, flags); + } + GEM_BUG_ON(!list_empty(virtual_queue(ve))); + for (n = 0; n < ve->num_siblings; n++) { struct intel_engine_cs *sibling = ve->siblings[n]; struct rb_node *node = &ve->nodes[sibling->id].rb; @@ -5422,7 +5438,7 @@ static void virtual_context_destroy(struct kref *kref) intel_engine_free_request_pool(&ve->base); kfree(ve->bonds); - kfree(ve); + kfree_rcu(ve, rcu); } static void virtual_engine_initial_hint(struct virtual_engine *ve) If it would go without the previous patch I think it would simply mean a normal kfree here. In both cases it looks okay to me. Reviewed-by: Tvrtko Ursulin Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for i915/tgl: Fix TC-cold block/unblock sequence
== Series Details == Series: i915/tgl: Fix TC-cold block/unblock sequence URL : https://patchwork.freedesktop.org/series/80302/ State : warning == Summary == $ dim checkpatch origin/drm-tip 26989606f3cd i915/tgl: Fix TC-cold block/unblock sequence -:52: WARNING:MSLEEP: msleep < 20ms can sleep for up to 20ms; see Documentation/timers/timers-howto.rst #52: FILE: drivers/gpu/drm/i915/display/intel_display_power.c:3955: + msleep(1); total: 0 errors, 1 warnings, 0 checks, 35 lines checked ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✗ Fi.CI.SPARSE: warning for i915/tgl: Fix TC-cold block/unblock sequence
== Series Details == Series: i915/tgl: Fix TC-cold block/unblock sequence URL : https://patchwork.freedesktop.org/series/80302/ State : warning == Summary == $ dim sparse --fast origin/drm-tip Sparse version: v0.6.0 Fast mode used, each commit won't be checked separately. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✓ Fi.CI.BAT: success for i915/tgl: Fix TC-cold block/unblock sequence
== Series Details == Series: i915/tgl: Fix TC-cold block/unblock sequence URL : https://patchwork.freedesktop.org/series/80302/ State : success == Summary == CI Bug Log - changes from CI_DRM_8845 -> Patchwork_18311 Summary --- **SUCCESS** No regressions found. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/index.html Known issues Here are the changes found in Patchwork_18311 that come from known issues: ### IGT changes ### Issues hit * igt@i915_module_load@reload: - fi-apl-guc: [PASS][1] -> [DMESG-WARN][2] ([i915#1635] / [i915#1982]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-apl-guc/igt@i915_module_l...@reload.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/fi-apl-guc/igt@i915_module_l...@reload.html * igt@i915_selftest@live@execlists: - fi-kbl-r: [PASS][3] -> [INCOMPLETE][4] ([i915#794]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-r/igt@i915_selftest@l...@execlists.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/fi-kbl-r/igt@i915_selftest@l...@execlists.html Possible fixes * igt@i915_module_load@reload: - fi-byt-j1900: [DMESG-WARN][5] ([i915#1982]) -> [PASS][6] [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-byt-j1900/igt@i915_module_l...@reload.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/fi-byt-j1900/igt@i915_module_l...@reload.html * igt@i915_pm_rpm@basic-pci-d3-state: - fi-bsw-kefka: [DMESG-WARN][7] ([i915#1982]) -> [PASS][8] [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-bsw-kefka/igt@i915_pm_...@basic-pci-d3-state.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/fi-bsw-kefka/igt@i915_pm_...@basic-pci-d3-state.html * igt@kms_busy@basic@flip: - {fi-tgl-dsi}: [DMESG-WARN][9] ([i915#1982]) -> [PASS][10] [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-tgl-dsi/igt@kms_busy@ba...@flip.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/fi-tgl-dsi/igt@kms_busy@ba...@flip.html - fi-kbl-x1275: [DMESG-WARN][11] ([i915#62] / [i915#92] / [i915#95]) -> [PASS][12] [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-x1275/igt@kms_busy@ba...@flip.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/fi-kbl-x1275/igt@kms_busy@ba...@flip.html * igt@kms_chamelium@common-hpd-after-suspend: - fi-kbl-7500u: [DMESG-WARN][13] ([i915#2203]) -> [PASS][14] [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-7500u/igt@kms_chamel...@common-hpd-after-suspend.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/fi-kbl-7500u/igt@kms_chamel...@common-hpd-after-suspend.html * igt@kms_flip@basic-flip-vs-wf_vblank@b-edp1: - fi-icl-u2: [DMESG-WARN][15] ([i915#1982]) -> [PASS][16] [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vbl...@b-edp1.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/fi-icl-u2/igt@kms_flip@basic-flip-vs-wf_vbl...@b-edp1.html Warnings * igt@kms_force_connector_basic@force-edid: - fi-kbl-x1275: [DMESG-WARN][17] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][18] ([i915#62] / [i915#92]) +2 similar issues [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-x1275/igt@kms_force_connector_ba...@force-edid.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/fi-kbl-x1275/igt@kms_force_connector_ba...@force-edid.html * igt@prime_vgem@basic-fence-flip: - fi-kbl-x1275: [DMESG-WARN][19] ([i915#62] / [i915#92]) -> [DMESG-WARN][20] ([i915#62] / [i915#92] / 
[i915#95]) +5 similar issues [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/fi-kbl-x1275/igt@prime_v...@basic-fence-flip.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/fi-kbl-x1275/igt@prime_v...@basic-fence-flip.html {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE). [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635 [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982 [i915#2203]: https://gitlab.freedesktop.org/drm/intel/issues/2203 [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62 [i915#794]: https://gitlab.freedesktop.org/drm/intel/issues/794 [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92 [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95 Participating hosts (44 -> 37) -- Missing(7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus Build changes - * Linux: CI_DRM_8845
Re: [Intel-gfx] [PATCH 26/37] drm/i915/gem: Pull execbuf dma resv under a single critical section
Hi, Chris, On 8/5/20 2:22 PM, Chris Wilson wrote: Acquire all the objects and their backing storage, and page directories, as used by execbuf under a single common ww_mutex. Albeit we have to restart the critical section a few times in order to handle various restrictions (such as avoiding copy_(from|to)_user and mmap_sem). Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 166 +- .../i915/gem/selftests/i915_gem_execbuffer.c | 2 + 2 files changed, 84 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 58e40348b551..3a79b6facb02 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -20,6 +20,7 @@ #include "gt/intel_gt_pm.h" #include "gt/intel_gt_requests.h" #include "gt/intel_ring.h" +#include "mm/i915_acquire_ctx.h" #include "i915_drv.h" #include "i915_gem_clflush.h" @@ -267,6 +268,8 @@ struct i915_execbuffer { struct intel_context *reloc_context; /* distinct context for relocs */ struct i915_gem_context *gem_context; /** caller's context */ + struct i915_acquire_ctx acquire; /** lock for _all_ DMA reservations */ + struct i915_request *request; /** our request to build */ struct eb_vma *batch; /** identity of the batch obj/vma */ @@ -392,42 +395,6 @@ static void eb_vma_array_put(struct eb_vma_array *arr) kref_put(&arr->kref, eb_vma_array_destroy); } -static int -eb_lock_vma(struct i915_execbuffer *eb, struct ww_acquire_ctx *acquire) -{ - struct eb_vma *ev; - int err = 0; - - list_for_each_entry(ev, &eb->submit_list, submit_link) { - struct i915_vma *vma = ev->vma; - - err = ww_mutex_lock_interruptible(&vma->resv->lock, acquire); - if (err == -EDEADLK) { - struct eb_vma *unlock = ev, *en; - - list_for_each_entry_safe_continue_reverse(unlock, en, - &eb->submit_list, - submit_link) { - ww_mutex_unlock(&unlock->vma->resv->lock); - list_move_tail(&unlock->submit_link, &eb->submit_list); - } - - GEM_BUG_ON(!list_is_first(&ev->submit_link, &eb->submit_list)); - err = ww_mutex_lock_slow_interruptible(&vma->resv->lock, - acquire); - } - if (err) { - list_for_each_entry_continue_reverse(ev, -&eb->submit_list, -submit_link) - ww_mutex_unlock(&ev->vma->resv->lock); - break; - } - } - - return err; -} - static int eb_create(struct i915_execbuffer *eb) { /* Allocate an extra slot for use by the sentinel */ @@ -656,6 +623,25 @@ eb_add_vma(struct i915_execbuffer *eb, } } +static int eb_lock_mm(struct i915_execbuffer *eb) +{ + struct eb_vma *ev; + int err; + + list_for_each_entry(ev, &eb->bind_list, bind_link) { + err = i915_acquire_ctx_lock(&eb->acquire, ev->vma->obj); + if (err) + return err; + } + + return 0; +} + +static int eb_acquire_mm(struct i915_execbuffer *eb) +{ + return i915_acquire_mm(&eb->acquire); +} + struct eb_vm_work { struct dma_fence_work base; struct eb_vma_array *array; @@ -1378,7 +1364,15 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) unsigned long count; struct eb_vma *ev; unsigned int pass; - int err = 0; + int err; + + err = eb_lock_mm(eb); + if (err) + return err; + + err = eb_acquire_mm(eb); + if (err) + return err; count = 0; INIT_LIST_HEAD(&unbound); @@ -1404,10 +1398,15 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) if (count == 0) return 0; + /* We need to reserve page directories, release all, start over */ + i915_acquire_ctx_fini(&eb->acquire); + pass = 0; do { struct eb_vm_work *work; + i915_acquire_ctx_init(&eb->acquire); + /* * We need to hold one lock as we bind all the 
vma so that * we have a consistent view of the entire vm and can plan @@ -1424,6 +1423,11 @@ static int eb_reserve_vm(struct i915_execbuffer *eb) * beneath it, so we have to stage and preallocate all the * resources we may require before taking the mutex. */ + +
Re: [Intel-gfx] [PATCH 00/37] Replace obj->mm.lock with reservation_ww_class
Hi, Chris, On 8/5/20 2:21 PM, Chris Wilson wrote: Long story short, we need to manage evictions using dma_resv & dma_fence tracking. The backing storage will then be managed using the ww_mutex borrowed from (and shared via) obj->base.resv, rather than the current obj->mm.lock. Skipping over the breadcrumbs, While perhaps needed fixes, could we submit them as a separate series, since they, from what I can tell, are not a direct part of the locking rework, and some of them were actually part of a series that Dave NaK'ed and may require additional justification? the first step is to remove the final crutches of struct_mutex from execbuf and to broaden the hold for the dma-resv to guard not just publishing the dma-fences, but for the duration of the execbuf submission (holding all objects and their backing store from the point of acquisition to publishing of the final GPU work, after which the guard is delegated to the dma-fences). This is of course made complicated by our history. On top of the user's objects, we also have the HW/kernel objects with their own lifetimes, and a bunch of auxiliary objects used for working around unhappy HW and for providing the legacy relocation mechanism. We add every auxiliary object to the list of user objects required, and attempt to acquire them en masse. Since all the objects can be known a priori, we can build a list of those objects and pass that to a routine that can resolve the -EDEADLK (and evictions). [To avoid relocations imposing a penalty on sane userspace that avoids them, we do not touch any relocations until necessary, at which point we have to unroll the state, and rebuild a new list with more auxiliary buffers to accommodate the extra copy_from_user]. More examples are included as to how we can break down operations involving multiple objects into an acquire phase prior to those operations, keeping the -EDEADLK handling under control. execbuf is the unique interface in that it deals with multiple user and kernel buffers. After that, we have callers that in principle care about accessing a single buffer, and so can be migrated over to a helper that permits only holding one such buffer at a time. That enables us to swap out obj->mm.lock for obj->base.resv->lock, and use lockdep to spot illegal nesting, and to throw away the temporary pins by replacing them with holding the ww_mutex for the duration instead. What's changed? Some patch splitting and we need to pull in Matthew's patch to map the page directories under the ww_mutex. I would still like to see a justification for the newly introduced async work, as opposed to adding it as an optimizing / regression fixing series following the locking rework. That async work introduces a bunch of code complexity and it would be beneficial to see a discussion of the tradeoffs and how it aligns with the upstream proposed dma-fence annotations. Thanks, Thomas ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
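For readers following the series, the primitive everything above builds on is the standard ww_mutex acquire/backoff loop: take locks in any order within one acquire context, and on -EDEADLK drop everything and restart with the contended lock first. A generic two-lock sketch (not the execbuf code):

static int lock_pair(struct ww_mutex *a, struct ww_mutex *b,
		     struct ww_acquire_ctx *ctx)
{
	struct ww_mutex *first = a, *second = b;
	int err;

	ww_acquire_init(ctx, &reservation_ww_class);
	for (;;) {
		err = ww_mutex_lock_interruptible(first, ctx);
		if (err)
			break;

		err = ww_mutex_lock_interruptible(second, ctx);
		if (!err)
			break;

		ww_mutex_unlock(first);
		if (err != -EDEADLK)
			break;

		/* restart with the contended mutex acquired first */
		swap(first, second);
	}

	if (err)
		ww_acquire_fini(ctx);
	else
		ww_acquire_done(ctx);
	return err;
}

Scaling this from two locks to execbuf's full object list, plus page directories, is what the i915_acquire_ctx in this series wraps.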
[Intel-gfx] ✗ Fi.CI.IGT: failure for HDCP minor refactoring (rev3)
== Series Details == Series: HDCP minor refactoring (rev3) URL : https://patchwork.freedesktop.org/series/77224/ State : failure == Summary == CI Bug Log - changes from CI_DRM_8845_full -> Patchwork_18309_full Summary --- **FAILURE** Serious unknown changes coming with Patchwork_18309_full absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_18309_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_18309_full: ### IGT changes ### Possible regressions * igt@i915_selftest@live@gtt: - shard-glk: [PASS][1] -> [INCOMPLETE][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-glk8/igt@i915_selftest@l...@gtt.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/shard-glk7/igt@i915_selftest@l...@gtt.html Warnings * igt@perf@blocking-parameterized: - shard-glk: [FAIL][3] ([i915#1542]) -> [INCOMPLETE][4] [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-glk9/igt@p...@blocking-parameterized.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/shard-glk8/igt@p...@blocking-parameterized.html Known issues Here are the changes found in Patchwork_18309_full that come from known issues: ### IGT changes ### Issues hit * igt@kms_cursor_crc@pipe-a-cursor-alpha-transparent: - shard-skl: [PASS][5] -> [FAIL][6] ([i915#54]) [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl3/igt@kms_cursor_...@pipe-a-cursor-alpha-transparent.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/shard-skl3/igt@kms_cursor_...@pipe-a-cursor-alpha-transparent.html * igt@kms_cursor_edge_walk@pipe-b-64x64-bottom-edge: - shard-glk: [PASS][7] -> [DMESG-WARN][8] ([i915#1982]) [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-glk1/igt@kms_cursor_edge_w...@pipe-b-64x64-bottom-edge.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/shard-glk1/igt@kms_cursor_edge_w...@pipe-b-64x64-bottom-edge.html * igt@kms_flip@2x-flip-vs-expired-vblank-interruptible@bc-hdmi-a1-hdmi-a2: - shard-glk: [PASS][9] -> [FAIL][10] ([i915#79]) [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-glk4/igt@kms_flip@2x-flip-vs-expired-vblank-interrupti...@bc-hdmi-a1-hdmi-a2.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/shard-glk9/igt@kms_flip@2x-flip-vs-expired-vblank-interrupti...@bc-hdmi-a1-hdmi-a2.html * igt@kms_flip@flip-vs-expired-vblank-interruptible@b-edp1: - shard-skl: [PASS][11] -> [FAIL][12] ([i915#79]) [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl1/igt@kms_flip@flip-vs-expired-vblank-interrupti...@b-edp1.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/shard-skl1/igt@kms_flip@flip-vs-expired-vblank-interrupti...@b-edp1.html * igt@kms_hdr@bpc-switch-suspend: - shard-kbl: [PASS][13] -> [DMESG-WARN][14] ([i915#180]) +6 similar issues [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-kbl1/igt@kms_...@bpc-switch-suspend.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/shard-kbl1/igt@kms_...@bpc-switch-suspend.html * igt@kms_plane_alpha_blend@pipe-b-constant-alpha-min: - shard-skl: [PASS][15] -> [FAIL][16] ([fdo#108145] / [i915#265]) [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl8/igt@kms_plane_alpha_bl...@pipe-b-constant-alpha-min.html [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/shard-skl10/igt@kms_plane_alpha_bl...@pipe-b-constant-alpha-min.html * igt@kms_plane_lowres@pipe-b-tiling-none: - shard-apl: [PASS][17] -> [DMESG-WARN][18] ([i915#1635] / [i915#1982]) [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-apl2/igt@kms_plane_low...@pipe-b-tiling-none.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/shard-apl6/igt@kms_plane_low...@pipe-b-tiling-none.html * igt@kms_plane_scaling@pipe-b-scaler-with-pixel-format: - shard-skl: [PASS][19] -> [DMESG-WARN][20] ([i915#1982]) +12 similar issues [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl7/igt@kms_plane_scal...@pipe-b-scaler-with-pixel-format.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18309/shard-skl5/igt@kms_plane_scal...@pipe-b-scaler-with-pixel-format.html * igt@kms_psr@psr2_sprite_plane_move: - shard-iclb: [PASS][21] -> [SKIP][22] ([fdo#109441]) +1 similar issue [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-iclb2/igt
[Intel-gfx] ✗ Fi.CI.IGT: failure for Replace obj->mm.lock with reservation_ww_class
== Series Details == Series: Replace obj->mm.lock with reservation_ww_class URL : https://patchwork.freedesktop.org/series/80291/ State : failure == Summary == CI Bug Log - changes from CI_DRM_8845_full -> Patchwork_18310_full Summary --- **FAILURE** Serious unknown changes coming with Patchwork_18310_full absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_18310_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_18310_full: ### IGT changes ### Possible regressions * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-mmap-gtt: - shard-tglb: [PASS][1] -> [INCOMPLETE][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-tglb3/igt@kms_frontbuffer_track...@fbc-1p-primscrn-pri-indfb-draw-mmap-gtt.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/shard-tglb3/igt@kms_frontbuffer_track...@fbc-1p-primscrn-pri-indfb-draw-mmap-gtt.html New tests - New tests have been introduced between CI_DRM_8845_full and Patchwork_18310_full: ### New IGT tests (1) ### * igt@i915_selftest@mock@acquire: - Statuses : 7 pass(s) - Exec time: [0.67, 1.79] s Known issues Here are the changes found in Patchwork_18310_full that come from known issues: ### IGT changes ### Issues hit * igt@kms_big_fb@y-tiled-16bpp-rotate-0: - shard-skl: [PASS][3] -> [DMESG-WARN][4] ([i915#1982]) +12 similar issues [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl7/igt@kms_big...@y-tiled-16bpp-rotate-0.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/shard-skl2/igt@kms_big...@y-tiled-16bpp-rotate-0.html * igt@kms_big_fb@y-tiled-64bpp-rotate-0: - shard-glk: [PASS][5] -> [DMESG-FAIL][6] ([i915#118] / [i915#95]) [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-glk3/igt@kms_big...@y-tiled-64bpp-rotate-0.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/shard-glk8/igt@kms_big...@y-tiled-64bpp-rotate-0.html * igt@kms_cursor_crc@pipe-c-cursor-suspend: - shard-kbl: [PASS][7] -> [DMESG-WARN][8] ([i915#180]) +2 similar issues [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-kbl6/igt@kms_cursor_...@pipe-c-cursor-suspend.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/shard-kbl7/igt@kms_cursor_...@pipe-c-cursor-suspend.html * igt@kms_cursor_edge_walk@pipe-b-128x128-bottom-edge: - shard-apl: [PASS][9] -> [DMESG-WARN][10] ([i915#1635] / [i915#1982]) +3 similar issues [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-apl3/igt@kms_cursor_edge_w...@pipe-b-128x128-bottom-edge.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/shard-apl1/igt@kms_cursor_edge_w...@pipe-b-128x128-bottom-edge.html * igt@kms_cursor_legacy@pipe-a-forked-bo: - shard-glk: [PASS][11] -> [DMESG-WARN][12] ([i915#118] / [i915#95]) [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-glk3/igt@kms_cursor_leg...@pipe-a-forked-bo.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/shard-glk8/igt@kms_cursor_leg...@pipe-a-forked-bo.html * igt@kms_flip@flip-vs-expired-vblank-interruptible@b-hdmi-a2: - shard-glk: [PASS][13] -> [FAIL][14] ([i915#79]) [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-glk1/igt@kms_flip@flip-vs-expired-vblank-interrupti...@b-hdmi-a2.html [14]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/shard-glk8/igt@kms_flip@flip-vs-expired-vblank-interrupti...@b-hdmi-a2.html * igt@kms_flip@flip-vs-expired-vblank@c-dp1: - shard-kbl: [PASS][15] -> [FAIL][16] ([i915#79]) [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-kbl2/igt@kms_flip@flip-vs-expired-vbl...@c-dp1.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/shard-kbl4/igt@kms_flip@flip-vs-expired-vbl...@c-dp1.html * igt@kms_flip@flip-vs-wf_vblank-interruptible@a-edp1: - shard-tglb: [PASS][17] -> [DMESG-WARN][18] ([i915#1982]) [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-tglb3/igt@kms_flip@flip-vs-wf_vblank-interrupti...@a-edp1.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18310/shard-tglb2/igt@kms_flip@flip-vs-wf_vblank-interrupti...@a-edp1.html * igt@kms_hdr@bpc-switch-suspend: - shard-skl: [PASS][19] -> [INCOMPLETE][20] ([i915#198]) [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl4/igt@kms_...@bpc-switch-suspend.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwo
Re: [Intel-gfx] [PATCH v4] drm/kmb: Add support for KeemBay Display
Hi Anitha.

On Mon, Aug 03, 2020 at 09:02:24PM +, Chrisanthus, Anitha wrote:
> Hi Sam,
> I installed codespell, but the dictionary.txt in
> usr/share/codespell/dictionary.txt
> seems to be different from yours. Mine is version 1.8. Where can I get the
> dictionary.txt
> that you are using?

I dunno.

$ apt info codespell
Package: codespell
Version: 1.16.0-2
Priority: optional
Section: universe/devel
Origin: Ubuntu
Maintainer: Ubuntu Developers
Original-Maintainer: Debian Python Modules Team
Bugs: https://bugs.launchpad.net/ubuntu/+filebug
Installed-Size: 572 kB
Depends: python3, python3-chardet, python3:any
Homepage: https://github.com/codespell-project/codespell/
Download-Size: 118 kB
APT-Manual-Installed: yes
APT-Sources: http://dk.archive.ubuntu.com/ubuntu focal/universe amd64 Packages
Description: Find and fix common misspellings in text files
 codespell is designed to find and fix common misspellings in text files. It is designed primarily for checking misspelled words in source code, but it can be used with other files as well.

> I have corrected the relevant spelling warnings from your email and have sent
> v5.

The spelling mistakes were the least relevant of the warnings. Please see examples in the following.

> > -:146: CHECK:PARENTHESIS_ALIGNMENT: Alignment should match open
> > parenthesis
> > #146: FILE: drivers/gpu/drm/kmb/kmb_crtc.c:58:
> > +	kmb_clr_bitmask_lcd(kmb, LCD_INT_ENABLE,
> > +			LCD_INT_VERT_COMP);

Here we want LCD_INT_VERT_COMP to be aligned right after the opening '('. It must be indented with a number of tabs followed by the necessary spaces to achieve this indent. Always use tabs for the indent if possible. So in other words, 8 spaces are not OK - use a tab then. Same goes for similar warnings.

> > -:427: CHECK:LINE_SPACING: Please don't use multiple blank lines
> > #427: FILE: drivers/gpu/drm/kmb/kmb_drv.c:74:
> > +
> > +

Do not use two consecutive blank lines.

> > -:463: CHECK:SPACING: spaces preferred around that '/' (ctx:VxV)
> > #463: FILE: drivers/gpu/drm/kmb/kmb_drv.c:110:
> > +	kmb->sys_clk_mhz = clk_get_rate(kmb_clk.clk_pll0)/100;
> >	^

Spaces around all operators - so space before and after '/' here. Same goes for following warnings of the same type.

> > -:688: CHECK:BRACES: Blank lines aren't necessary after an open brace '{'
> > #688: FILE: drivers/gpu/drm/kmb/kmb_drv.c:335:
> > +	if (status & LCD_INT_EOF) {
> > +

As the warning says - no empty line after an opening '{'.

> > -:701: CHECK:CAMELCASE: Avoid CamelCase:
> > #701: FILE: drivers/gpu/drm/kmb/kmb_drv.c:348:
> > +	LCD_LAYERn_DMA_CFG

If you have a reason to use CamelCase then this can be ignored. A good reason could be that this is how it is done in the datasheet. In this case maybe use LCD_LAYER_N_DMA_CFG or similar.

> > -:957: CHECK:BRACES: braces {} should be used on all arms of this statement
> > #957: FILE: drivers/gpu/drm/kmb/kmb_drv.c:604:
> > +	if (adv_bridge == ERR_PTR(-EPROBE_DEFER))
> > [...]
> > +	else if (IS_ERR(adv_bridge)) {
> > [...]

If we use {} in one arm of the statement, use it in all arms. This, like the other tidbits, improves readability. Same for all similar warnings.

> > -:1026: WARNING:UNDOCUMENTED_DT_STRING: DT compatible string
> > "intel,kmb_display" appears un-documented -- check
> > ./Documentation/devicetree/bindings/
> > #1026: FILE: drivers/gpu/drm/kmb/kmb_drv.c:673:
> > +	{.compatible = "intel,kmb_display"},

The binding is missing - we cannot apply a driver for an unknown binding. The binding must be in DT-schema (yaml) format.
> > > > -:1122: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without > > comment > > #1122: FILE: drivers/gpu/drm/kmb/kmb_drv.h:35: > > + spinlock_t irq_lock; Add comment. And consider a more specific name like kmb_irq_lock - allows for easier grepping. > > > > -:1360: CHECK:PREFER_KERNEL_TYPES: Prefer kernel type 'u16' over 'uint16_t' > > #1360: FILE: drivers/gpu/drm/kmb/kmb_dsi.c:95: > > + uint16_t default_bit_rate_mbps; As the warning says. This goes again later. > > -:1947: CHECK:COMPARISON_TO_NULL: Comparison to NULL could be written > > "fg_cfg->sections[i]" > > #1947: FILE: drivers/gpu/drm/kmb/kmb_dsi.c:682: > > + if (fg_cfg->sections[i] != NULL) Hmm, I like the current code. But better please checkpatch here. I did not go through them all. The point is that all the warnings from checkpatch should be considered, and for the most of them they are legit and should be fixed. Sam ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✓ Fi.CI.IGT: success for i915/tgl: Fix TC-cold block/unblock sequence
== Series Details == Series: i915/tgl: Fix TC-cold block/unblock sequence URL : https://patchwork.freedesktop.org/series/80302/ State : success == Summary == CI Bug Log - changes from CI_DRM_8845_full -> Patchwork_18311_full Summary --- **SUCCESS** No regressions found. Known issues Here are the changes found in Patchwork_18311_full that come from known issues: ### IGT changes ### Issues hit * igt@gem_exec_balancer@bonded-early: - shard-kbl: [PASS][1] -> [FAIL][2] ([i915#2079]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-kbl4/igt@gem_exec_balan...@bonded-early.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-kbl1/igt@gem_exec_balan...@bonded-early.html * igt@gem_exec_balancer@nop: - shard-iclb: [PASS][3] -> [INCOMPLETE][4] ([i915#2268]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-iclb2/igt@gem_exec_balan...@nop.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-iclb2/igt@gem_exec_balan...@nop.html * igt@gem_partial_pwrite_pread@writes-after-reads-uncached: - shard-apl: [PASS][5] -> [FAIL][6] ([i915#1635]) [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-apl7/igt@gem_partial_pwrite_pr...@writes-after-reads-uncached.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-apl2/igt@gem_partial_pwrite_pr...@writes-after-reads-uncached.html * igt@kms_big_fb@linear-64bpp-rotate-180: - shard-glk: [PASS][7] -> [DMESG-FAIL][8] ([i915#118] / [i915#95]) [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-glk4/igt@kms_big...@linear-64bpp-rotate-180.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-glk8/igt@kms_big...@linear-64bpp-rotate-180.html * igt@kms_color@pipe-a-gamma: - shard-skl: [PASS][9] -> [FAIL][10] ([i915#71]) [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl3/igt@kms_co...@pipe-a-gamma.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-skl3/igt@kms_co...@pipe-a-gamma.html * igt@kms_cursor_crc@pipe-a-cursor-alpha-transparent: - shard-skl: [PASS][11] -> [FAIL][12] ([i915#54]) [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl3/igt@kms_cursor_...@pipe-a-cursor-alpha-transparent.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-skl3/igt@kms_cursor_...@pipe-a-cursor-alpha-transparent.html * igt@kms_cursor_crc@pipe-c-cursor-suspend: - shard-skl: [PASS][13] -> [INCOMPLETE][14] ([i915#300]) [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl5/igt@kms_cursor_...@pipe-c-cursor-suspend.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-skl1/igt@kms_cursor_...@pipe-c-cursor-suspend.html * igt@kms_cursor_edge_walk@pipe-b-128x128-bottom-edge: - shard-glk: [PASS][15] -> [DMESG-WARN][16] ([i915#1982]) [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-glk6/igt@kms_cursor_edge_w...@pipe-b-128x128-bottom-edge.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-glk3/igt@kms_cursor_edge_w...@pipe-b-128x128-bottom-edge.html * igt@kms_draw_crc@draw-method-xrgb2101010-mmap-wc-untiled: - shard-apl: [PASS][17] -> [DMESG-WARN][18] ([i915#1635] / [i915#1982]) [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-apl2/igt@kms_draw_...@draw-method-xrgb2101010-mmap-wc-untiled.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-apl7/igt@kms_draw_...@draw-method-xrgb2101010-mmap-wc-untiled.html * igt@kms_hdr@bpc-switch-suspend: - shard-kbl: [PASS][19] -> [DMESG-WARN][20] 
([i915#180]) +8 similar issues [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-kbl1/igt@kms_...@bpc-switch-suspend.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-kbl4/igt@kms_...@bpc-switch-suspend.html - shard-skl: [PASS][21] -> [FAIL][22] ([i915#1188]) [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl4/igt@kms_...@bpc-switch-suspend.html [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-skl8/igt@kms_...@bpc-switch-suspend.html * igt@kms_plane_scaling@pipe-b-scaler-with-pixel-format: - shard-skl: [PASS][23] -> [DMESG-WARN][24] ([i915#1982]) +12 similar issues [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8845/shard-skl7/igt@kms_plane_scal...@pipe-b-scaler-with-pixel-format.html [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18311/shard-skl2/igt@kms_plane_scal...@pipe-b-scaler-with-pixel-format.html * igt@kms_psr@psr2_no_drrs: - shard-iclb: [PASS][25] -> [SKIP][26] ([fdo#109441]) [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM
[Intel-gfx] [PATCH v2] drm/i915/gt: Implement WA_1406941453
From: Clint Taylor Enable HW Default flip for small PL. bspec: 52890 bspec: 53508 bspec: 53273 v2: rebase to drm-tip Reviewed-by: Matt Atwood Signed-off-by: Clint Taylor --- drivers/gpu/drm/i915/gt/intel_workarounds.c | 6 ++ drivers/gpu/drm/i915/i915_reg.h | 1 + 2 files changed, 7 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c index cef1c122696f..cb02813c5e92 100644 --- a/drivers/gpu/drm/i915/gt/intel_workarounds.c +++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c @@ -639,6 +639,9 @@ static void tgl_ctx_workarounds_init(struct intel_engine_cs *engine, FF_MODE2_GS_TIMER_MASK | FF_MODE2_TDS_TIMER_MASK, FF_MODE2_GS_TIMER_224 | FF_MODE2_TDS_TIMER_128, 0); + + /* Wa_1406941453:gen12 */ + WA_SET_BIT_MASKED(GEN10_SAMPLER_MODE, ENABLE_SMALLPL); } static void @@ -1522,6 +1525,9 @@ static void icl_whitelist_build(struct intel_engine_cs *engine) whitelist_reg_ext(w, PS_INVOCATION_COUNT, RING_FORCE_TO_NONPRIV_ACCESS_RD | RING_FORCE_TO_NONPRIV_RANGE_4); + + /* Wa_1406941453:gen12 */ + whitelist_reg(w, GEN10_SAMPLER_MODE); break; case VIDEO_DECODE_CLASS: diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 2b403df03404..494b2e1e358e 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -9314,6 +9314,7 @@ enum { #define GEN11_LSN_UNSLCVC_GAFS_HALF_SF_MAXALLOC (1 << 7) #define GEN10_SAMPLER_MODE _MMIO(0xE18C) +#define ENABLE_SMALLPL REG_BIT(15) #define GEN11_SAMPLER_ENABLE_HEADLESS_MSGREG_BIT(5) /* IVYBRIDGE DPF */ -- 2.27.0 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
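A note for readers of the diff: GEN10_SAMPLER_MODE is a masked register, which is why the patch goes through WA_SET_BIT_MASKED() instead of a plain read-modify-write. A sketch of the usual i915 masked-register convention, with the macro bodies simplified from i915_reg.h for illustration:

#include <linux/types.h>

/*
 * Masked registers carry a write-enable mask in the upper 16 bits:
 * only bits whose mask bit is set take the new value, so the write
 * needs no read-modify-write cycle.
 */
#define REG_BIT(n)		(1U << (n))
#define ENABLE_SMALLPL		REG_BIT(15)	/* bit 15 = 0x8000 */
#define _MASKED_BIT_ENABLE(a)	(((a) << 16) | (a))

/*
 * WA_SET_BIT_MASKED(GEN10_SAMPLER_MODE, ENABLE_SMALLPL) therefore
 * records a write of 0x80008000: set bit 15, and mark bit 15 valid.
 */
static const u32 wa_smallpl_val = _MASKED_BIT_ENABLE(ENABLE_SMALLPL);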
[Intel-gfx] ✗ Fi.CI.SPARSE: warning for drm/i915/gt: Implement WA_1406941453 (rev2)
== Series Details == Series: drm/i915/gt: Implement WA_1406941453 (rev2) URL : https://patchwork.freedesktop.org/series/78243/ State : warning == Summary == $ dim sparse --fast origin/drm-tip Sparse version: v0.6.0 Fast mode used, each commit won't be checked separately. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/gt: Implement WA_1406941453 (rev2)
== Series Details == Series: drm/i915/gt: Implement WA_1406941453 (rev2) URL : https://patchwork.freedesktop.org/series/78243/ State : failure == Summary == CI Bug Log - changes from CI_DRM_8846 -> Patchwork_18312 Summary --- **FAILURE** Serious unknown changes coming with Patchwork_18312 absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_18312, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/index.html Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_18312: ### IGT changes ### Possible regressions * igt@kms_chamelium@hdmi-hpd-fast: - fi-icl-u2: [PASS][1] -> [DMESG-WARN][2] +1 similar issue [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/fi-icl-u2/igt@kms_chamel...@hdmi-hpd-fast.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/fi-icl-u2/igt@kms_chamel...@hdmi-hpd-fast.html Known issues Here are the changes found in Patchwork_18312 that come from known issues: ### IGT changes ### Issues hit * igt@i915_selftest@live@execlists: - fi-icl-y: [PASS][3] -> [INCOMPLETE][4] ([i915#2276]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/fi-icl-y/igt@i915_selftest@l...@execlists.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/fi-icl-y/igt@i915_selftest@l...@execlists.html * igt@kms_cursor_legacy@basic-flip-after-cursor-atomic: - fi-icl-u2: [PASS][5] -> [DMESG-WARN][6] ([i915#1982]) +2 similar issues [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/fi-icl-u2/igt@kms_cursor_leg...@basic-flip-after-cursor-atomic.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/fi-icl-u2/igt@kms_cursor_leg...@basic-flip-after-cursor-atomic.html Possible fixes * igt@i915_module_load@reload: - fi-apl-guc: [DMESG-WARN][7] ([i915#1635] / [i915#1982]) -> [PASS][8] [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/fi-apl-guc/igt@i915_module_l...@reload.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/fi-apl-guc/igt@i915_module_l...@reload.html * igt@i915_pm_rpm@basic-pci-d3-state: - fi-byt-j1900: [DMESG-WARN][9] ([i915#1982]) -> [PASS][10] [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/fi-byt-j1900/igt@i915_pm_...@basic-pci-d3-state.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/fi-byt-j1900/igt@i915_pm_...@basic-pci-d3-state.html * igt@i915_pm_rpm@module-reload: - fi-bsw-kefka: [INCOMPLETE][11] ([i915#151] / [i915#1844] / [i915#1909] / [i915#392]) -> [PASS][12] [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/fi-bsw-kefka/igt@i915_pm_...@module-reload.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/fi-bsw-kefka/igt@i915_pm_...@module-reload.html * igt@i915_selftest@live@gt_lrc: - fi-tgl-u2: [DMESG-FAIL][13] ([i915#1233]) -> [PASS][14] [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/fi-tgl-u2/igt@i915_selftest@live@gt_lrc.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/fi-tgl-u2/igt@i915_selftest@live@gt_lrc.html * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-legacy: - fi-icl-u2: [DMESG-WARN][15] ([i915#1982]) -> [PASS][16] +1 similar issue [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/fi-icl-u2/igt@kms_cursor_leg...@basic-busy-flip-before-cursor-legacy.html [16]: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/fi-icl-u2/igt@kms_cursor_leg...@basic-busy-flip-before-cursor-legacy.html Warnings * igt@gem_exec_suspend@basic-s0: - fi-kbl-x1275: [DMESG-WARN][17] ([i915#62] / [i915#92]) -> [DMESG-WARN][18] ([i915#62] / [i915#92] / [i915#95]) +4 similar issues [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/fi-kbl-x1275/igt@gem_exec_susp...@basic-s0.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/fi-kbl-x1275/igt@gem_exec_susp...@basic-s0.html * igt@i915_pm_rpm@module-reload: - fi-kbl-guc: [DMESG-FAIL][19] ([i915#2203]) -> [DMESG-WARN][20] ([i915#2203]) [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/fi-kbl-guc/igt@i915_pm_...@module-reload.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18312/fi-kbl-guc/igt@i915_pm_...@module-reload.html * igt@kms_force_connector_basic@force-connector-state: - fi-kbl-x1275: [DMESG-WARN][21] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][22] ([i915#62] / [i915#92]) +3 similar issues [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8846/f
Re: [Intel-gfx] [PATCH 1/8] drm/atomic-helper: reset vblank on crtc reset
On Thu, Aug 06, 2020 at 03:43:02PM +0900, Tetsuo Handa wrote:
> As of commit 47ec5303d73ea344 ("Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next") on linux.git,
> my VMware environment cannot boot. Do I need to bisect?

That sounds like a good idea, but please start a new thread (not reply to some random existing ones), with maintainers for drivers/gpu/drm/vmwgfx only. Not a massive list of random folks who have no idea what's going on here.

From get_maintainer.pl:

$ scripts/get_maintainer.pl -f drivers/gpu/drm/vmwgfx/
VMware Graphics (supporter:DRM DRIVER FOR VMWARE VIRTUAL GPU)
Roland Scheidegger (supporter:DRM DRIVER FOR VMWARE VIRTUAL GPU)
David Airlie (maintainer:DRM DRIVERS)
Daniel Vetter (maintainer:DRM DRIVERS)
dri-de...@lists.freedesktop.org (open list:DRM DRIVER FOR VMWARE VIRTUAL GPU)
linux-ker...@vger.kernel.org (open list)

Cheers, Daniel

>
> [9.314496][T1] vga16fb: mapped to 0x71050562
> [9.467770][T1] Console: switching to colour frame buffer device 80x30
> [9.632092][T1] fb0: VGA16 VGA frame buffer device
> [9.651768][T1] ACPI: AC Adapter [ACAD] (on-line)
> [9.672544][T1] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
> [9.722373][T1] ACPI: Power Button [PWRF]
> [9.744231][T1] ioatdma: Intel(R) QuickData Technology Driver 5.00
> [9.820147][T1] N_HDLC line discipline registered with maxframe=4096
> [9.835649][T1] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [9.852567][T1] 00:05: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
> [ 10.033372][T1] Cyclades driver 2.6
> [ 10.049928][T1] Initializing Nozomi driver 2.1d
> [ 10.065493][T1] RocketPort device driver module, version 2.09, 12-June-2003
> [ 10.095368][T1] No rocketport ports found; unloading driver
> [ 10.112430][T1] Non-volatile memory driver v1.3
> [ 10.127090][T1] Linux agpgart interface v0.103
> [ 10.144037][T1] agpgart-intel :00:00.0: Intel 440BX Chipset
> [ 10.162275][T1] agpgart-intel :00:00.0: AGP aperture is 256M @ 0x0
> [ 10.181130][T1] [drm] DMA map mode: Caching DMA mappings.
> [ 10.195150][T1] [drm] Capabilities:
> [ 10.208728][T1] [drm] Rect copy.
> [ 10.222772][T1] [drm] Cursor.
> [ 10.235364][T1] [drm] Cursor bypass.
> [ 10.249121][T1] [drm] Cursor bypass 2.
> [ 10.260590][T1] [drm] 8bit emulation.
> [ 10.272220][T1] [drm] Alpha cursor.
> [ 10.284670][T1] [drm] 3D.
> [ 10.295051][T1] [drm] Extended Fifo.
> [ 10.305180][T1] [drm] Multimon.
> [ 10.315506][T1] [drm] Pitchlock.
> [ 10.325167][T1] [drm] Irq mask.
> [ 10.334262][T1] [drm] Display Topology.
> [ 10.343519][T1] [drm] GMR.
> [ 10.352775][T1] [drm] Traces.
> [ 10.362166][T1] [drm] GMR2.
> [ 10.370716][T1] [drm] Screen Object 2.
> [ 10.379220][T1] [drm] Command Buffers.
> [ 10.388489][T1] [drm] Command Buffers 2.
> [ 10.396055][T1] [drm] Guest Backed Resources.
> [ 10.403290][T1] [drm] DX Features.
> [ 10.409911][T1] [drm] HP Command Queue.
> [ 10.417820][T1] [drm] Capabilities2:
> [ 10.424216][T1] [drm] Grow oTable.
> [ 10.430423][T1] [drm] IntraSurface copy.
> [ 10.436371][T1] [drm] Max GMR ids is 64
> [ 10.442651][T1] [drm] Max number of GMR pages is 65536
> [ 10.450317][T1] [drm] Max dedicated hypervisor surface memory is 0 kiB
> [ 10.458809][T1] [drm] Maximum display memory size is 262144 kiB
> [ 10.466330][T1] [drm] VRAM at 0xe800 size is 4096 kiB
> [ 10.474704][T1] [drm] MMIO at 0xfe00 size is 256 kiB
> [ 10.484625][T1] [TTM] Zone kernel: Available graphics memory: 4030538 KiB
> [ 10.500730][T1] [TTM] Zone dma32: Available graphics memory: 2097152 KiB
> [ 10.516851][T1] [TTM] Initializing pool allocator
> [ 10.527542][T1] [TTM] Initializing DMA pool allocator
> [ 10.540197][T1] BUG: kernel NULL pointer dereference, address: 0438
> [ 10.550087][T1] #PF: supervisor read access in kernel mode
> [ 10.550087][T1] #PF: error_code(0x) - not-present page
> [ 10.550087][T1] PGD 0 P4D 0
> [ 10.550087][T1] Oops: [#1] PREEMPT SMP
> [ 10.550087][T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0+ #271
> [ 10.550087][T1] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 02/27/2020
> [ 10.550087][T1] RIP: 0010:drm_dev_has_vblank+0x9/0x20
> [ 10.550087][T1] Code: 5d 41 5e 41 5f e9 e7 fa 01 ff e8 e2 fa 01 ff 45 31 e4 41 8b 5f 48 eb a7 cc cc cc cc cc cc cc cc cc 53 48 89 fb e8 c7 fa 01 ff <8b> 83 38 04 00 00 5b 85 c0 0f 95 c0 c3 66 2e 0f 1f 84 00 00 00 00
> [ 10.550087][T1] RSP: :c9027b80 EFLAGS: 00010293
> [ 10.550087][
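The RIP in the oops sits inside drm_dev_has_vblank(), which only reads dev->num_crtcs, and the faulting address 0438 looks like a field offset from a NULL drm_device pointer. That is consistent with the new vblank reset in the crtc reset helper running before the vmwgfx crtc has a valid dev pointer. A minimal sketch of the guard pattern under discussion, assuming that reading of the oops (illustrative only, not the fix that eventually landed):

#include <drm/drm_crtc.h>
#include <drm/drm_vblank.h>

/*
 * Reset vblank state on crtc reset, but only when the driver has
 * initialized vblank support. This assumes crtc->dev is valid by the
 * time ->reset runs; per the oops above, vmwgfx apparently violates
 * that assumption, so the real fix needs more than this guard.
 */
static void example_crtc_reset(struct drm_crtc *crtc)
{
	if (drm_dev_has_vblank(crtc->dev))
		drm_crtc_vblank_reset(crtc);
}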