Re: [Intel-gfx] [RFC v2 01/22] drm: RFC for Plane Color Hardware Pipeline
On Tue, 12 Oct 2021 19:11:29 + "Shankar, Uma" wrote:

> > -----Original Message-----
> > From: Pekka Paalanen
> > Sent: Tuesday, October 12, 2021 5:30 PM
> > To: Simon Ser
> > Cc: Shankar, Uma ; intel-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org;
> > harry.wentl...@amd.com; ville.syrj...@linux.intel.com; brian.star...@arm.com;
> > sebast...@sebastianwick.net; shashank.sha...@amd.com
> > Subject: Re: [RFC v2 01/22] drm: RFC for Plane Color Hardware Pipeline
> >
> > On Tue, 12 Oct 2021 10:35:37 + Simon Ser wrote:
> >
> > > On Tuesday, October 12th, 2021 at 12:30, Pekka Paalanen wrote:
> > >
> > > > is there a practice of landing proposal documents in the kernel? How
> > > > does that work, will a kernel tree carry the patch files?
> > > > Or should this document be worded like documentation for an accepted
> > > > feature, and then the patches either land or don't?
> > >
> > > Once everyone agrees, the RFC can land. I don't think a kernel tree is
> > > necessary. See:
> > >
> > > https://dri.freedesktop.org/docs/drm/gpu/rfc/index.html
> >
> > Does this mean the RFC doc patch will land, but the code patches will
> > remain in the review cycles waiting for userspace proving vehicles?
> > Rather than e.g. being committed as files that people would need to apply
> > themselves? Or how does one find the code patches corresponding to RFC docs?
>
> As I understand, this section was added to finalize the design and debate on
> the UAPI, structures, headers and design etc. Once a general agreement is in
> place with all the stakeholders, we can have an ack on the design and approach
> and get it merged. This hence serves as an approved reference for the UAPI,
> accepted and agreed by the community at large.
>
> Once the code lands, all the documentation will be added to the right driver
> sections and helpers, like it's been done currently.

I'm just wondering: someone browses a kernel tree, and discovers this RFC doc in there. They want to see or test the latest (WIP) kernel implementation of it. How will they find the code / patches?

Thanks,
pq
[Intel-gfx] [PATCH v3] lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated from memblock, even if stack depot ends up not actually used. The default size of stack_table is 4MB on 32-bit, 8MB on 64-bit.

This is fine for use-cases such as KASAN which is also a config option and has overhead on its own. But it's an issue for functionality that has to be actually enabled on boot (page_owner) or depends on hardware (GPU drivers) and thus the memory might be wasted. This was raised as an issue [1] when attempting to add stackdepot support for SLUB's debug object tracking functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable slub_debug on boot only when needed, or create only specific kmem caches with debugging for testing purposes.

It would thus be more efficient if stackdepot's table was allocated only when actually going to be used. This patch thus makes the allocation (and whole stack_depot_init() call) optional:

- Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current
  well-defined point of allocation as part of mem_init(). Make
  CONFIG_KASAN select this flag.

- Other users have to call stack_depot_init() as part of their own init
  when it's determined that stack depot will actually be used. This may
  depend on both config and runtime conditions. Convert current users
  which are page_owner and several in the DRM subsystem. Same will be
  done for SLUB later.

- Because the init might now be called after the boot-time memblock
  allocation has given all memory to the buddy allocator, change
  stack_depot_init() to allocate stack_table with kvmalloc() when
  memblock is no longer available. Also handle allocation failure by
  disabling stackdepot (could have theoretically happened even with
  memblock allocation previously), and don't unnecessarily align the
  memblock allocation to its own size anymore.
[1] https://lore.kernel.org/all/CAMuHMdW=eovzm1re5fvoen87nkfilmm2+ah7enu2kxehcvb...@mail.gmail.com/

Signed-off-by: Vlastimil Babka
Acked-by: Dmitry Vyukov
Reviewed-by: Marco Elver # stackdepot
Cc: Marco Elver
Cc: Vijayanand Jitta
Cc: Maarten Lankhorst
Cc: Maxime Ripard
Cc: Thomas Zimmermann
Cc: David Airlie
Cc: Daniel Vetter
Cc: Andrey Ryabinin
Cc: Alexander Potapenko
Cc: Andrey Konovalov
Cc: Dmitry Vyukov
Cc: Geert Uytterhoeven
Cc: Oliver Glitta
Cc: Imran Khan
---
Changes in v3:
- stack_depot_init_mutex made static and moved inside stack_depot_init()
  Reported-by: kernel test robot
- use !stack_table condition instead of stack_table == NULL
  reported by checkpatch on freedesktop.org patchwork

 drivers/gpu/drm/drm_dp_mst_topology.c   |  1 +
 drivers/gpu/drm/drm_mm.c                |  4 +++
 drivers/gpu/drm/i915/intel_runtime_pm.c |  3 +++
 include/linux/stackdepot.h              | 25 ---
 init/main.c                             |  2 +-
 lib/Kconfig                             |  4 +++
 lib/Kconfig.kasan                       |  2 +-
 lib/stackdepot.c                        | 33 +
 mm/page_owner.c                         |  2 ++
 9 files changed, 60 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 86d13d6bc463..b0ebdc843a00 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -5493,6 +5493,7 @@ int drm_dp_mst_topology_mgr_init(struct drm_dp_mst_topology_mgr *mgr,
 	mutex_init(&mgr->probe_lock);
 #if IS_ENABLED(CONFIG_DRM_DEBUG_DP_MST_TOPOLOGY_REFS)
 	mutex_init(&mgr->topology_ref_history_lock);
+	stack_depot_init();
 #endif
 	INIT_LIST_HEAD(&mgr->tx_msg_downq);
 	INIT_LIST_HEAD(&mgr->destroy_port_list);
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index 93d48a6f04ab..5916228ea0c9 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -983,6 +983,10 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
 	add_hole(&mm->head_node);
 
 	mm->scan_active = 0;
+
+#ifdef CONFIG_DRM_DEBUG_MM
+	stack_depot_init();
+#endif
 }
 EXPORT_SYMBOL(drm_mm_init);
 
diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
index eaf7688f517d..d083506986e1 100644
--- a/drivers/gpu/drm/i915/intel_runtime_pm.c
+++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
@@ -78,6 +78,9 @@ static void __print_depot_stack(depot_stack_handle_t stack,
 static void init_intel_runtime_pm_wakeref(struct intel_runtime_pm *rpm)
 {
 	spin_lock_init(&rpm->debug.lock);
+
+	if (rpm->available)
+		stack_depot_init();
 }
 
 static noinline depot_stack_handle_t
diff --git a/include/linux/stackdepot.h b/include/linux/stackdepot.h
index 6bb4bc1a5f54..40fc5e92194f 100644
--- a/include/linux/stackdepot.h
+++ b/include/linux/stackdepot.h
@@ -13,6 +13,22 @@
 
 typedef u32 depot_stack_handle_t;
 
+/*
+ * Every user of stack depot has to call this during its own init when it's
+ * decided that
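For context, the allocation logic the commit message describes reduces to roughly the following sketch: memblock while early boot memory is still available, kvmalloc() once the slab allocator is up, and disabling stack depot on allocation failure. Names follow the existing lib/stackdepot.c; the exact zeroing and printouts in the real patch may differ.

int stack_depot_init(void)
{
	static DEFINE_MUTEX(stack_depot_init_mutex);

	mutex_lock(&stack_depot_init_mutex);
	if (!stack_depot_disable && !stack_table) {
		size_t size = STACK_HASH_SIZE * sizeof(struct stack_record *);

		if (slab_is_available())
			/* late init: memblock has handed memory to buddy */
			stack_table = kvzalloc(size, GFP_KERNEL);
		else
			/* early init path; memblock_alloc() returns zeroed memory */
			stack_table = memblock_alloc(size, SMP_CACHE_BYTES);

		if (!stack_table) {
			pr_err("Stack Depot hash table allocation failed, disabling\n");
			stack_depot_disable = true;
			mutex_unlock(&stack_depot_init_mutex);
			return -ENOMEM;
		}
	}
	mutex_unlock(&stack_depot_init_mutex);
	return 0;
}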
[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() (rev3)
== Series Details ==

Series: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() (rev3)
URL   : https://patchwork.freedesktop.org/series/95549/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
50fb572ebbb4 lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()
-:7: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#7:
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated

-:209: CHECK:COMPARISON_TO_NULL: Comparison to NULL could be written "!stack_table"
#209: FILE: lib/stackdepot.c:175:
+	if (!stack_depot_disable && stack_table == NULL) {

total: 0 errors, 1 warnings, 1 checks, 147 lines checked
[Intel-gfx] ✓ Fi.CI.BAT: success for lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() (rev3)
== Series Details ==

Series: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() (rev3)
URL   : https://patchwork.freedesktop.org/series/95549/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10728 -> Patchwork_21326

Summary
---

**SUCCESS**

No regressions found.

External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/index.html

Possible new issues
---

Here are the unknown changes that may have been introduced in Patchwork_21326:

### IGT changes ###

Suppressed

The following results come from untrusted machines, tests, or statuses.
They do not affect the overall result.

  * igt@i915_selftest@live@hangcheck:
    - {fi-jsl-1}: [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/fi-jsl-1/igt@i915_selftest@l...@hangcheck.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/fi-jsl-1/igt@i915_selftest@l...@hangcheck.html

Known issues

Here are the changes found in Patchwork_21326 that come from known issues:

### IGT changes ###

Issues hit

  * igt@amdgpu/amd_basic@cs-gfx:
    - fi-kbl-soraka: NOTRUN -> [SKIP][3] ([fdo#109271]) +3 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/fi-kbl-soraka/igt@amdgpu/amd_ba...@cs-gfx.html

  * igt@kms_flip@basic-flip-vs-modeset@c-dp1:
    - fi-cfl-8109u: [PASS][4] -> [FAIL][5] ([i915#4165])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/fi-cfl-8109u/igt@kms_flip@basic-flip-vs-mode...@c-dp1.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/fi-cfl-8109u/igt@kms_flip@basic-flip-vs-mode...@c-dp1.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2: [PASS][6] -> [DMESG-WARN][7] ([i915#4269])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/fi-cml-u2/igt@kms_frontbuffer_track...@basic.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/fi-cml-u2/igt@kms_frontbuffer_track...@basic.html
    - fi-cfl-8109u: [PASS][8] -> [FAIL][9] ([i915#2546])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/fi-cfl-8109u/igt@kms_frontbuffer_track...@basic.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/fi-cfl-8109u/igt@kms_frontbuffer_track...@basic.html

Possible fixes

  * igt@i915_selftest@live@hangcheck:
    - {fi-hsw-gt1}: [DMESG-WARN][10] ([i915#3303]) -> [PASS][11]
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/fi-hsw-gt1/igt@i915_selftest@l...@hangcheck.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/fi-hsw-gt1/igt@i915_selftest@l...@hangcheck.html

  * igt@i915_selftest@live@perf:
    - {fi-tgl-dsi}: [DMESG-WARN][12] ([i915#2867]) -> [PASS][13] +9 similar issues
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/fi-tgl-dsi/igt@i915_selftest@l...@perf.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/fi-tgl-dsi/igt@i915_selftest@l...@perf.html

  {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE).

[fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
[i915#2546]: https://gitlab.freedesktop.org/drm/intel/issues/2546
[i915#2867]: https://gitlab.freedesktop.org/drm/intel/issues/2867
[i915#3303]: https://gitlab.freedesktop.org/drm/intel/issues/3303
[i915#3970]: https://gitlab.freedesktop.org/drm/intel/issues/3970
[i915#4165]: https://gitlab.freedesktop.org/drm/intel/issues/4165
[i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269

Participating hosts (41 -> 35)
--

Missing (6): fi-ilk-m540 fi-hsw-4200u fi-bsw-cyan fi-apl-guc fi-ctg-p8600 fi-icl-y

Build changes
-

  * Linux: CI_DRM_10728 -> Patchwork_21326

  CI-20190529: 20190529
  CI_DRM_10728: 82a9f298afec66c882e710078138891826ce5e22 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6242: 721fd85ee95225ed5df322f7182bdfa9b86a3e68 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_21326: 50fb572ebbb41c369436391aba246b388d6b0f13 @ git://anongit.freedesktop.org/gfx-ci/linux

== Linux commits ==

50fb572ebbb4 lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/index.html
Re: [Intel-gfx] [PATCH v2] drm/i915: Remove memory frequency calculation
On 2021/10/13 10:54, Matt Roper wrote:

On Tue, Oct 12, 2021 at 06:00:46PM -0700, José Roberto de Souza wrote:

This memory frequency calculation is only used to check whether it is zero, which is not useful as it will never actually be zero.

Also the calculation is wrong: we should be checking another bit to select the appropriate frequency multiplier, while this code is stuck with a fixed multiplier. So drop it as a whole.

v2:
- Also remove memory frequency calculation for gen9 LP platforms

Cc: Yakui Zhao
Cc: Matt Roper
Fixes: f8112cb9574b ("drm/i915/gen11+: Only load DRAM information from pcode")
Signed-off-by: José Roberto de Souza
Reviewed-by: Matt Roper

After removing the check of memory frequency, the EHL SBL can work as expected. Otherwise it will fail some checks in intel_dram_detect because of the incorrect memory frequency calculation.

Add: Tested-by: Zhao Yakui

---
 drivers/gpu/drm/i915/i915_reg.h   |  8
 drivers/gpu/drm/i915/intel_dram.c | 30 ++
 2 files changed, 2 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index a897f4abea0c3..8825f7ac477b6 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -11109,12 +11109,6 @@ enum skl_power_gate {
 #define DC_STATE_DEBUG_MASK_CORES	(1 << 0)
 #define DC_STATE_DEBUG_MASK_MEMORY_UP	(1 << 1)
 
-#define BXT_P_CR_MC_BIOS_REQ_0_0_0	_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x7114)
-#define BXT_REQ_DATA_MASK		0x3F
-#define BXT_DRAM_CHANNEL_ACTIVE_SHIFT	12
-#define BXT_DRAM_CHANNEL_ACTIVE_MASK	(0xF << 12)
-#define BXT_MEMORY_FREQ_MULTIPLIER_HZ	1
-
 #define BXT_D_CR_DRP0_DUNIT8		0x1000
 #define BXT_D_CR_DRP0_DUNIT9		0x1200
 #define BXT_D_CR_DRP0_DUNIT_START	8
@@ -11145,9 +11139,7 @@ enum skl_power_gate {
 #define BXT_DRAM_TYPE_LPDDR4		(0x2 << 22)
 #define BXT_DRAM_TYPE_DDR4		(0x4 << 22)
 
-#define SKL_MEMORY_FREQ_MULTIPLIER_HZ	2
 #define SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU	_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5E04)
-#define SKL_REQ_DATA_MASK		(0xF << 0)
 #define DG1_GEAR_TYPE			REG_BIT(16)
 
 #define SKL_MAD_INTER_CHANNEL_0_0_0_MCHBAR_MCMAIN	_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5000)
diff --git a/drivers/gpu/drm/i915/intel_dram.c b/drivers/gpu/drm/i915/intel_dram.c
index 30a0cab5eff46..0adadfd9528aa 100644
--- a/drivers/gpu/drm/i915/intel_dram.c
+++ b/drivers/gpu/drm/i915/intel_dram.c
@@ -244,7 +244,6 @@ static int
 skl_get_dram_info(struct drm_i915_private *i915)
 {
 	struct dram_info *dram_info = &i915->dram_info;
-	u32 mem_freq_khz, val;
 	int ret;
 
 	dram_info->type = skl_get_dram_type(i915);
@@ -255,17 +254,6 @@ skl_get_dram_info(struct drm_i915_private *i915)
 	if (ret)
 		return ret;
 
-	val = intel_uncore_read(&i915->uncore,
-				SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU);
-	mem_freq_khz = DIV_ROUND_UP((val & SKL_REQ_DATA_MASK) *
-				    SKL_MEMORY_FREQ_MULTIPLIER_HZ, 1000);
-
-	if (dram_info->num_channels * mem_freq_khz == 0) {
-		drm_info(&i915->drm,
-			 "Couldn't get system memory bandwidth\n");
-		return -EINVAL;
-	}
-
 	return 0;
 }
 
@@ -350,24 +338,10 @@ static void bxt_get_dimm_info(struct dram_dimm_info *dimm, u32 val)
 static int bxt_get_dram_info(struct drm_i915_private *i915)
 {
 	struct dram_info *dram_info = &i915->dram_info;
-	u32 dram_channels;
-	u32 mem_freq_khz, val;
-	u8 num_active_channels, valid_ranks = 0;
+	u32 val;
+	u8 valid_ranks = 0;
 	int i;
 
-	val = intel_uncore_read(&i915->uncore, BXT_P_CR_MC_BIOS_REQ_0_0_0);
-	mem_freq_khz = DIV_ROUND_UP((val & BXT_REQ_DATA_MASK) *
-				    BXT_MEMORY_FREQ_MULTIPLIER_HZ, 1000);
-
-	dram_channels = val & BXT_DRAM_CHANNEL_ACTIVE_MASK;
-	num_active_channels = hweight32(dram_channels);
-
-	if (mem_freq_khz * num_active_channels == 0) {
-		drm_info(&i915->drm,
-			 "Couldn't get system memory bandwidth\n");
-		return -EINVAL;
-	}
-
 	/*
	 * Now read each DUNIT8/9/10/11 to check the rank of each dimms.
	 */
-- 
2.33.0
Re: [Intel-gfx] [PATCH v5] drm/i915/gt: move remaining debugfs interfaces into gt
Hi,

sorry, just forgot to add the changelog

On Wed, Oct 13, 2021 at 12:17:38AM +0200, Andi Shyti wrote:
> From: Andi Shyti
>
> The following interfaces:
>
> i915_wedged
> i915_forcewake_user
>
> are dependent on gt values. Put them inside gt/ and drop the
> "i915_" prefix name. This would be the new structure:
>
> dri/0/gt
> |
> +-- forcewake_user
> |
> \-- reset
>
> For backwards compatibility with existing igt (and the slight
> semantic difference between operating on the i915 abi entry
> points and the deep gt info):
>
> dri/0
> |
> +-- i915_wedged
> |
> \-- i915_forcewake_user
>
> remain at the top level.
>
> Signed-off-by: Andi Shyti
> Cc: Tvrtko Ursulin
> Cc: Chris Wilson
> Reviewed-by: Lucas De Marchi
> ---

Changelog:
--

v4 -> v5: https://patchwork.freedesktop.org/patch/458293/
* rename static functions exposed to header files so that they can keep a coherent namespace (thanks Lucas!)
* add Lucas' r-b.

v3 -> v4: https://patchwork.freedesktop.org/patch/458225/
* remove the unnecessary interrupt_info_show() information. They were already removed here by Chris: cf977e18610e6 ("drm/i915/gem: Spring clean debugfs")

v2 -> v3: https://patchwork.freedesktop.org/patch/458108/
* keep the original interfaces as they were (thanks Chris) but implement the functionality inside the gt. The upper level files will call the gt functions (thanks Lucas).

v1 -> v2: https://patchwork.freedesktop.org/patch/456652/
* keep the original interfaces intact (thanks Chris).

Andi
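As an aside, for anyone wondering how such gt-level nodes get created: below is a minimal sketch using only the generic debugfs API, not the i915 helpers the series actually uses; every example_* name is invented.

#include <linux/debugfs.h>
#include <linux/seq_file.h>

/* Illustrative only: a read-only "reset"-style node under dri/0/gt. */
static int example_reset_show(struct seq_file *m, void *data)
{
	seq_puts(m, "gt reset state goes here\n");
	return 0;
}
DEFINE_SHOW_ATTRIBUTE(example_reset);

static void example_register_gt_debugfs(struct dentry *dri_root)
{
	/* Creates the dri/0/gt directory and a "reset" file inside it. */
	struct dentry *gt = debugfs_create_dir("gt", dri_root);

	debugfs_create_file("reset", 0444, gt, NULL, &example_reset_fops);
}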
[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915/gt: move remaining debugfs interfaces into gt (rev12)
== Series Details ==

Series: drm/i915/gt: move remaining debugfs interfaces into gt (rev12)
URL   : https://patchwork.freedesktop.org/series/75333/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10728_full -> Patchwork_21322_full

Summary
---

**FAILURE**

Serious unknown changes coming with Patchwork_21322_full absolutely need to be verified manually.

If you think the reported changes have nothing to do with the changes introduced in Patchwork_21322_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI.

Possible new issues
---

Here are the unknown changes that may have been introduced in Patchwork_21322_full:

### IGT changes ###

Possible regressions

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-kbl: [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-kbl7/igt@kms_frontbuffer_track...@fbc-suspend.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-kbl4/igt@kms_frontbuffer_track...@fbc-suspend.html

Known issues

Here are the changes found in Patchwork_21322_full that come from known issues:

### IGT changes ###

Issues hit

  * igt@gem_ctx_isolation@preservation-s3@bcs0:
    - shard-apl: NOTRUN -> [DMESG-WARN][3] ([i915#180]) +2 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-apl8/igt@gem_ctx_isolation@preservation...@bcs0.html

  * igt@gem_ctx_persistence@engines-mixed-process:
    - shard-snb: NOTRUN -> [SKIP][4] ([fdo#109271] / [i915#1099]) +3 similar issues
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-snb6/igt@gem_ctx_persiste...@engines-mixed-process.html

  * igt@gem_ctx_shared@q-in-order:
    - shard-snb: NOTRUN -> [SKIP][5] ([fdo#109271]) +294 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-snb7/igt@gem_ctx_sha...@q-in-order.html

  * igt@gem_eio@in-flight-suspend:
    - shard-kbl: [PASS][6] -> [DMESG-WARN][7] ([i915#180]) +1 similar issue
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-kbl4/igt@gem_...@in-flight-suspend.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-kbl7/igt@gem_...@in-flight-suspend.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-glk: [PASS][8] -> [FAIL][9] ([i915#2846])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-glk9/igt@gem_exec_f...@basic-deadline.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-glk9/igt@gem_exec_f...@basic-deadline.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-tglb: NOTRUN -> [FAIL][10] ([i915#2842]) +1 similar issue
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-tglb1/igt@gem_exec_fair@basic-none-r...@rcs0.html

  * igt@gem_exec_fair@basic-none-solo@rcs0:
    - shard-glk: [PASS][11] -> [FAIL][12] ([i915#2842])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-glk5/igt@gem_exec_fair@basic-none-s...@rcs0.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-glk6/igt@gem_exec_fair@basic-none-s...@rcs0.html

  * igt@gem_exec_fair@basic-pace@vcs1:
    - shard-iclb: NOTRUN -> [FAIL][13] ([i915#2842])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-iclb1/igt@gem_exec_fair@basic-p...@vcs1.html
    - shard-tglb: [PASS][14] -> [FAIL][15] ([i915#2842])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-tglb5/igt@gem_exec_fair@basic-p...@vcs1.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-tglb2/igt@gem_exec_fair@basic-p...@vcs1.html

  * igt@gem_exec_schedule@u-submit-golden-slice@vecs0:
    - shard-skl: NOTRUN -> [INCOMPLETE][16] ([i915#3797])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-skl7/igt@gem_exec_schedule@u-submit-golden-sl...@vecs0.html

  * igt@gem_exec_whisper@basic-fds-forked:
    - shard-glk: [PASS][17] -> [DMESG-WARN][18] ([i915#118]) +1 similar issue
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-glk6/igt@gem_exec_whis...@basic-fds-forked.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-glk1/igt@gem_exec_whis...@basic-fds-forked.html

  * igt@gem_pxp@reject-modify-context-protection-off-2:
    - shard-tglb: NOTRUN -> [SKIP][19] ([i915#4270])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-tglb8/igt@gem_...@reject-modify-context-protection-off-2.html

  * igt@gem_render_copy@x-tiled-to-vebox-yf-tiled:
    - shard-kbl: NOTRUN -> [SKIP][20] ([fdo#109271]) +100 similar issues
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21322/shard-kbl6/i
Re: [Intel-gfx] [RFC v2 01/22] drm: RFC for Plane Color Hardware Pipeline
On Tue, 12 Oct 2021 20:58:27 + "Shankar, Uma" wrote:

> > -----Original Message-----
> > From: Pekka Paalanen
> > Sent: Tuesday, October 12, 2021 4:01 PM
> > To: Shankar, Uma
> > Cc: intel-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org;
> > harry.wentl...@amd.com; ville.syrj...@linux.intel.com;
> > brian.star...@arm.com; sebast...@sebastianwick.net;
> > shashank.sha...@amd.com
> > Subject: Re: [RFC v2 01/22] drm: RFC for Plane Color Hardware Pipeline
> >
> > On Tue, 7 Sep 2021 03:08:43 +0530 Uma Shankar wrote:
> >
> > > This is a RFC proposal for plane color hardware blocks.
> > > It exposes the property interface to userspace and calls out the
> > > details or interfaces created and the intended purpose.
> > >
> > > Credits: Ville Syrjälä
> > > Signed-off-by: Uma Shankar
> > > ---
> > >  Documentation/gpu/rfc/drm_color_pipeline.rst | 167 +++
> > >  1 file changed, 167 insertions(+)
> > >  create mode 100644 Documentation/gpu/rfc/drm_color_pipeline.rst
> > >
> > > diff --git a/Documentation/gpu/rfc/drm_color_pipeline.rst b/Documentation/gpu/rfc/drm_color_pipeline.rst
> > > new file mode 100644
> > > index ..0d1ca858783b
> > > --- /dev/null
> > > +++ b/Documentation/gpu/rfc/drm_color_pipeline.rst
> > > @@ -0,0 +1,167 @@
> > > +==
> > > +Display Color Pipeline: Proposed DRM Properties

...

> > > +Proposal is to have below properties for a plane:
> > > +
> > > +* Plane Degamma or Pre-Curve:
> > > +	* This will be used to linearize the input framebuffer data.
> > > +	* It will apply the reverse of the color transfer function.
> > > +	* It can be a degamma curve or OETF for HDR.
> >
> > As you want to produce light-linear values, you use EOTF or inverse OETF.
> >
> > The term OETF has a built-in assumption that that happens in a camera:
> > it takes in light and produces an electrical signal. Lately I have
> > personally started talking about non-linear encoding of color values,
> > since EOTF is often associated with displays if nothing else is said
> > (taking in an electrical signal and producing light).
> >
> > So this would be decoding the color values into light-linear color
> > values. That is what an EOTF does, yes, but I feel there is a nuanced
> > difference. A piece of equipment implements an EOTF by turning an
> > electrical signal into light, hence EOTF often refers to specific
> > equipment. You could talk about content EOTF to denote content value
> > encoding, as opposed to output or display EOTF, but that might be
> > confusing if you look at e.g. the diagrams in BT.2100: is it the EOTF or is
> > it the inverse OETF? Is the (inverse?) OOTF included?
> >
> > So I try to side-step those questions by talking about encoding.
>
> The idea here is that the frame buffer presented to the display plane engine
> will be non-linear. So the output of a media decode should result in content
> with EOTF applied.

Hi,

sure, but the question is: which EOTF. There can be many different things called "EOTF" in a single pipeline, and then it's up to the document writer to make the difference between them. Comparing two documents with different conventions causes a lot of confusion in my personal experience, so it is good to define the concepts more carefully.

> So output of a media decode should result in content with EOTF applied.

I suspect you have it backwards. Media decode produces electrical (non-linear) pixel color values. If the EOTF was applied, they would be linear instead (and require more memory to achieve the same visual precision).

If you want to put it this way, you could say "with the inverse EOTF applied", but that might be slightly confusing because it is already baked into the video; it's not something a media decoder has to specifically apply, I think. However, the (inverse) EOTF in this case is the content EOTF, not the display EOTF.

If the content and display EOTF differ, then one must first apply the content EOTF and then the inverse display EOTF to get values that are correctly encoded for the display. (This is necessary but not sufficient in general.) Mind that this is not an OOTF nor an artistic adjustment; this is purely a value encoding conversion.

> Playback transfer function (EOTF): inverse OETF plus rendering intent gamma.

Does "rendering intent gamma" refer to artistic adjustments, not OOTF? cf. BT.2100 Annex 1, "The relationship between the OETF, the EOTF and the OOTF", although I find those diagrams somewhat confusing still. It does not seem to clearly account for the transmission non-linear encoding being different from the display EOTF. Different documents use OOTF to refer to different things. Then there is also the fundamental difference between PQ and HLG systems, where the OOTF is by definition in different places of the camera-transmission-display pipeline.

>
> To make it linear, we should apply the OETF. Confusion is whether OETF is
> equivalent to in
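To make the "purely a value encoding conversion" point concrete, here is a small sketch in C (illustrative only, not part of the RFC; all names are invented): content encoded with the sRGB transfer function is re-encoded for a hypothetical display expecting a pure 2.2 gamma curve. Only the encoding changes; no OOTF and no artistic adjustment is involved.

#include <math.h>

/* Content EOTF: sRGB decoding, non-linear value -> linear light. */
static double srgb_eotf(double v)
{
	return v <= 0.04045 ? v / 12.92 : pow((v + 0.055) / 1.055, 2.4);
}

/* Inverse display EOTF: re-encode linear light for a gamma 2.2 display. */
static double gamma22_inv_eotf(double l)
{
	return pow(l, 1.0 / 2.2);
}

/* First the content EOTF, then the inverse display EOTF. */
static double reencode_for_display(double v)
{
	return gamma22_inv_eotf(srgb_eotf(v));
}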
[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: Remove memory frequency calculation (rev2)
== Series Details ==

Series: drm/i915: Remove memory frequency calculation (rev2)
URL   : https://patchwork.freedesktop.org/series/95748/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10728_full -> Patchwork_21324_full

Summary
---

**FAILURE**

Serious unknown changes coming with Patchwork_21324_full absolutely need to be verified manually.

If you think the reported changes have nothing to do with the changes introduced in Patchwork_21324_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI.

Possible new issues
---

Here are the unknown changes that may have been introduced in Patchwork_21324_full:

### IGT changes ###

Possible regressions

  * igt@kms_cursor_edge_walk@pipe-d-128x128-bottom-edge:
    - shard-tglb: [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-tglb3/igt@kms_cursor_edge_w...@pipe-d-128x128-bottom-edge.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-tglb6/igt@kms_cursor_edge_w...@pipe-d-128x128-bottom-edge.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-kbl: [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-kbl7/igt@kms_frontbuffer_track...@fbc-suspend.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-kbl3/igt@kms_frontbuffer_track...@fbc-suspend.html

Known issues

Here are the changes found in Patchwork_21324_full that come from known issues:

### IGT changes ###

Issues hit

  * igt@gem_ctx_persistence@engines-mixed-process:
    - shard-snb: NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#1099]) +3 similar issues
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-snb2/igt@gem_ctx_persiste...@engines-mixed-process.html

  * igt@gem_ctx_shared@q-in-order:
    - shard-snb: NOTRUN -> [SKIP][6] ([fdo#109271]) +294 similar issues
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-snb5/igt@gem_ctx_sha...@q-in-order.html

  * igt@gem_eio@unwedge-stress:
    - shard-skl: [PASS][7] -> [TIMEOUT][8] ([i915#2369] / [i915#3063])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-skl1/igt@gem_...@unwedge-stress.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-skl1/igt@gem_...@unwedge-stress.html

  * igt@gem_exec_fair@basic-flow@rcs0:
    - shard-tglb: [PASS][9] -> [FAIL][10] ([i915#2842]) +3 similar issues
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-tglb8/igt@gem_exec_fair@basic-f...@rcs0.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-tglb8/igt@gem_exec_fair@basic-f...@rcs0.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-tglb: NOTRUN -> [FAIL][11] ([i915#2842]) +1 similar issue
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-tglb5/igt@gem_exec_fair@basic-none-r...@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs0:
    - shard-kbl: [PASS][12] -> [FAIL][13] ([i915#2842]) +2 similar issues
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-kbl3/igt@gem_exec_fair@basic-n...@vcs0.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-kbl3/igt@gem_exec_fair@basic-n...@vcs0.html

  * igt@gem_exec_fair@basic-pace@vcs1:
    - shard-iclb: NOTRUN -> [FAIL][14] ([i915#2842])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-iclb2/igt@gem_exec_fair@basic-p...@vcs1.html

  * igt@gem_exec_schedule@u-submit-golden-slice@vecs0:
    - shard-skl: NOTRUN -> [INCOMPLETE][15] ([i915#3797])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-skl3/igt@gem_exec_schedule@u-submit-golden-sl...@vecs0.html

  * igt@gem_pxp@reject-modify-context-protection-off-2:
    - shard-tglb: NOTRUN -> [SKIP][16] ([i915#4270]) +1 similar issue
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-tglb3/igt@gem_...@reject-modify-context-protection-off-2.html

  * igt@gem_render_copy@x-tiled-to-vebox-yf-tiled:
    - shard-kbl: NOTRUN -> [SKIP][17] ([fdo#109271]) +125 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-kbl7/igt@gem_render_c...@x-tiled-to-vebox-yf-tiled.html

  * igt@gem_softpin@evict-snoop:
    - shard-tglb: NOTRUN -> [SKIP][18] ([fdo#109312])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-tglb3/igt@gem_soft...@evict-snoop.html

  * igt@gem_userptr_blits@input-checking:
    - shard-apl: NOTRUN -> [DMESG-WARN][19] ([i915#3002])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21324/shard-apl6/igt@gem_userptr_bl...@input-checking.html

  * igt@gem_userptr_blits@unsync-unm
Re: [Intel-gfx] [PATCH v2] drm/i915: Remove memory frequency calculation
On Tue, Oct 12, 2021 at 06:00:46PM -0700, José Roberto de Souza wrote:
> This memory frequency calculation is only used to check if it is zero,
> which is not useful as it will never actually be zero.
>
> Also the calculation is wrong, we should be checking another bit to
> select the appropriate frequency multiplier while this code is stuck
> with a fixed multiplier.

I don't think the alternate ref clock was ever used. At least I don't recall ever seeing it.

The real problem with this is that IIRC this is just the last requested frequency. So on a system with SAGV this will change dynamically.

>
> So here drop it as a whole.

We have a second copy of this in gen6_update_ring_freq(). Rather than removing one and leaving another potentially broken one behind, we should probably just consolidate on a single implementation.

>
> v2:
> - Also remove memory frequency calculation for gen9 LP platforms
>
> Cc: Yakui Zhao
> Cc: Matt Roper
> Fixes: f8112cb9574b ("drm/i915/gen11+: Only load DRAM information from pcode")
> Signed-off-by: José Roberto de Souza
> ---
>  drivers/gpu/drm/i915/i915_reg.h   |  8
>  drivers/gpu/drm/i915/intel_dram.c | 30 ++
>  2 files changed, 2 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index a897f4abea0c3..8825f7ac477b6 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -11109,12 +11109,6 @@ enum skl_power_gate {
>  #define DC_STATE_DEBUG_MASK_CORES	(1 << 0)
>  #define DC_STATE_DEBUG_MASK_MEMORY_UP	(1 << 1)
>
> -#define BXT_P_CR_MC_BIOS_REQ_0_0_0	_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x7114)
> -#define BXT_REQ_DATA_MASK		0x3F
> -#define BXT_DRAM_CHANNEL_ACTIVE_SHIFT	12
> -#define BXT_DRAM_CHANNEL_ACTIVE_MASK	(0xF << 12)
> -#define BXT_MEMORY_FREQ_MULTIPLIER_HZ	1
> -
>  #define BXT_D_CR_DRP0_DUNIT8	0x1000
>  #define BXT_D_CR_DRP0_DUNIT9	0x1200
>  #define BXT_D_CR_DRP0_DUNIT_START	8
> @@ -11145,9 +11139,7 @@ enum skl_power_gate {
>  #define BXT_DRAM_TYPE_LPDDR4	(0x2 << 22)
>  #define BXT_DRAM_TYPE_DDR4	(0x4 << 22)
>
> -#define SKL_MEMORY_FREQ_MULTIPLIER_HZ	2
>  #define SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU	_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5E04)
> -#define SKL_REQ_DATA_MASK	(0xF << 0)
>  #define DG1_GEAR_TYPE	REG_BIT(16)
>
>  #define SKL_MAD_INTER_CHANNEL_0_0_0_MCHBAR_MCMAIN	_MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5000)
> diff --git a/drivers/gpu/drm/i915/intel_dram.c b/drivers/gpu/drm/i915/intel_dram.c
> index 30a0cab5eff46..0adadfd9528aa 100644
> --- a/drivers/gpu/drm/i915/intel_dram.c
> +++ b/drivers/gpu/drm/i915/intel_dram.c
> @@ -244,7 +244,6 @@ static int
>  skl_get_dram_info(struct drm_i915_private *i915)
>  {
>  	struct dram_info *dram_info = &i915->dram_info;
> -	u32 mem_freq_khz, val;
>  	int ret;
>
>  	dram_info->type = skl_get_dram_type(i915);
> @@ -255,17 +254,6 @@ skl_get_dram_info(struct drm_i915_private *i915)
>  	if (ret)
>  		return ret;
>
> -	val = intel_uncore_read(&i915->uncore,
> -				SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU);
> -	mem_freq_khz = DIV_ROUND_UP((val & SKL_REQ_DATA_MASK) *
> -				    SKL_MEMORY_FREQ_MULTIPLIER_HZ, 1000);
> -
> -	if (dram_info->num_channels * mem_freq_khz == 0) {
> -		drm_info(&i915->drm,
> -			 "Couldn't get system memory bandwidth\n");
> -		return -EINVAL;
> -	}
> -
>  	return 0;
>  }
>
> @@ -350,24 +338,10 @@ static void bxt_get_dimm_info(struct dram_dimm_info *dimm, u32 val)
>  static int bxt_get_dram_info(struct drm_i915_private *i915)
>  {
>  	struct dram_info *dram_info = &i915->dram_info;
> -	u32 dram_channels;
> -	u32 mem_freq_khz, val;
> -	u8 num_active_channels, valid_ranks = 0;
> +	u32 val;
> +	u8 valid_ranks = 0;
>  	int i;
>
> -	val = intel_uncore_read(&i915->uncore, BXT_P_CR_MC_BIOS_REQ_0_0_0);
> -	mem_freq_khz = DIV_ROUND_UP((val & BXT_REQ_DATA_MASK) *
> -				    BXT_MEMORY_FREQ_MULTIPLIER_HZ, 1000);
> -
> -	dram_channels = val & BXT_DRAM_CHANNEL_ACTIVE_MASK;
> -	num_active_channels = hweight32(dram_channels);
> -
> -	if (mem_freq_khz * num_active_channels == 0) {
> -		drm_info(&i915->drm,
> -			 "Couldn't get system memory bandwidth\n");
> -		return -EINVAL;
> -	}
> -
>  	/*
>  	 * Now read each DUNIT8/9/10/11 to check the rank of each dimms.
>  	 */
> --
> 2.33.0

-- 
Ville Syrjälä
Intel
[Intel-gfx] [PATCH 0/1] drm/i915: vlv sideband
Three main ideas here:

- vlv sideband only has the name "sideband" in common with the rest of
  intel_sideband.[ch]

- we may need better abstractions on the dependency, this should help a
  little bit; maybe vlv_sideband.[ch] can be turned into that
  abstraction layer

- we probably want to split out sideband registers from i915_reg.h, and
  they could go to vlv_sideband.h or vlv_sideband_reg.h or something

BR,
Jani.

Cc: Lucas De Marchi
Cc: Ville Syrjälä

Jani Nikula (1):
  drm/i915: split out vlv sideband to a separate file

 drivers/gpu/drm/i915/Makefile                  |   1 +
 drivers/gpu/drm/i915/display/g4x_dp.c          |   2 +-
 drivers/gpu/drm/i915/display/g4x_hdmi.c        |   2 +-
 drivers/gpu/drm/i915/display/intel_cdclk.c     |   1 +
 drivers/gpu/drm/i915/display/intel_display.c   |   1 +
 .../drm/i915/display/intel_display_debugfs.c   |   1 -
 .../drm/i915/display/intel_display_power.c     |   4 +-
 drivers/gpu/drm/i915/display/intel_dp.c        |   1 -
 drivers/gpu/drm/i915/display/intel_dpio_phy.c  |   5 +-
 drivers/gpu/drm/i915/display/intel_dpll.c      |   2 +-
 drivers/gpu/drm/i915/display/intel_dsi_vbt.c   |   2 +-
 drivers/gpu/drm/i915/display/vlv_dsi.c         |   2 +-
 drivers/gpu/drm/i915/display/vlv_dsi_pll.c     |   2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c  |   1 +
 drivers/gpu/drm/i915/gt/intel_rps.c            |   1 +
 drivers/gpu/drm/i915/i915_debugfs.c            |   1 -
 drivers/gpu/drm/i915/i915_sysfs.c              |   1 -
 drivers/gpu/drm/i915/intel_pm.c                |   1 +
 drivers/gpu/drm/i915/intel_sideband.c          | 257 -
 drivers/gpu/drm/i915/intel_sideband.h          | 110
 drivers/gpu/drm/i915/vlv_sideband.c            | 266 ++
 drivers/gpu/drm/i915/vlv_sideband.h            | 123
 22 files changed, 405 insertions(+), 382 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/vlv_sideband.c
 create mode 100644 drivers/gpu/drm/i915/vlv_sideband.h

-- 
2.30.2
[Intel-gfx] [PATCH 1/1] drm/i915: split out vlv sideband to a separate file
The VLV/CHV sideband code is pretty distinct from the rest of the sideband code. Split it out to new vlv_sideband.[ch].

Pure code movement with relevant #include changes, and a tiny checkpatch fix on top.

Cc: Lucas De Marchi
Cc: Ville Syrjälä
Signed-off-by: Jani Nikula
---
 drivers/gpu/drm/i915/Makefile                  |   1 +
 drivers/gpu/drm/i915/display/g4x_dp.c          |   2 +-
 drivers/gpu/drm/i915/display/g4x_hdmi.c        |   2 +-
 drivers/gpu/drm/i915/display/intel_cdclk.c     |   1 +
 drivers/gpu/drm/i915/display/intel_display.c   |   1 +
 .../drm/i915/display/intel_display_debugfs.c   |   1 -
 .../drm/i915/display/intel_display_power.c     |   4 +-
 drivers/gpu/drm/i915/display/intel_dp.c        |   1 -
 drivers/gpu/drm/i915/display/intel_dpio_phy.c  |   5 +-
 drivers/gpu/drm/i915/display/intel_dpll.c      |   2 +-
 drivers/gpu/drm/i915/display/intel_dsi_vbt.c   |   2 +-
 drivers/gpu/drm/i915/display/vlv_dsi.c         |   2 +-
 drivers/gpu/drm/i915/display/vlv_dsi_pll.c     |   2 +-
 drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c  |   1 +
 drivers/gpu/drm/i915/gt/intel_rps.c            |   1 +
 drivers/gpu/drm/i915/i915_debugfs.c            |   1 -
 drivers/gpu/drm/i915/i915_sysfs.c              |   1 -
 drivers/gpu/drm/i915/intel_pm.c                |   1 +
 drivers/gpu/drm/i915/intel_sideband.c          | 257 -
 drivers/gpu/drm/i915/intel_sideband.h          | 110
 drivers/gpu/drm/i915/vlv_sideband.c            | 266 ++
 drivers/gpu/drm/i915/vlv_sideband.h            | 123
 22 files changed, 405 insertions(+), 382 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/vlv_sideband.c
 create mode 100644 drivers/gpu/drm/i915/vlv_sideband.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 21b05ed0e4e8..d50d2b144fc6 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -54,6 +54,7 @@ i915-y += i915_drv.o \
 	intel_step.o \
 	intel_uncore.o \
 	intel_wakeref.o \
+	vlv_sideband.o \
 	vlv_suspend.o
 
 # core library code
diff --git a/drivers/gpu/drm/i915/display/g4x_dp.c b/drivers/gpu/drm/i915/display/g4x_dp.c
index 85a09c3e09e8..dc41868d01ef 100644
--- a/drivers/gpu/drm/i915/display/g4x_dp.c
+++ b/drivers/gpu/drm/i915/display/g4x_dp.c
@@ -18,7 +18,7 @@
 #include "intel_hdmi.h"
 #include "intel_hotplug.h"
 #include "intel_pps.h"
-#include "intel_sideband.h"
+#include "vlv_sideband.h"
 
 struct dp_link_dpll {
 	int clock;
diff --git a/drivers/gpu/drm/i915/display/g4x_hdmi.c b/drivers/gpu/drm/i915/display/g4x_hdmi.c
index be352e9f0afc..88c427f3c346 100644
--- a/drivers/gpu/drm/i915/display/g4x_hdmi.c
+++ b/drivers/gpu/drm/i915/display/g4x_hdmi.c
@@ -14,8 +14,8 @@
 #include "intel_fifo_underrun.h"
 #include "intel_hdmi.h"
 #include "intel_hotplug.h"
-#include "intel_sideband.h"
 #include "intel_sdvo.h"
+#include "vlv_sideband.h"
 
 static void intel_hdmi_prepare(struct intel_encoder *encoder,
			       const struct intel_crtc_state *crtc_state)
diff --git a/drivers/gpu/drm/i915/display/intel_cdclk.c b/drivers/gpu/drm/i915/display/intel_cdclk.c
index ecb28e8f1eb6..44bb18773509 100644
--- a/drivers/gpu/drm/i915/display/intel_cdclk.c
+++ b/drivers/gpu/drm/i915/display/intel_cdclk.c
@@ -30,6 +30,7 @@
 #include "intel_display_types.h"
 #include "intel_psr.h"
 #include "intel_sideband.h"
+#include "vlv_sideband.h"
 
 /**
  * DOC: CDCLK / RAWCLK
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 9cf987ee143d..3602fdb2a549 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -109,6 +109,7 @@
 #include "i9xx_plane.h"
 #include "skl_scaler.h"
 #include "skl_universal_plane.h"
+#include "vlv_sideband.h"
 
 static void i9xx_crtc_clock_get(struct intel_crtc *crtc,
				struct intel_crtc_state *pipe_config);
diff --git a/drivers/gpu/drm/i915/display/intel_display_debugfs.c b/drivers/gpu/drm/i915/display/intel_display_debugfs.c
index bc5113589f0a..e04767695530 100644
--- a/drivers/gpu/drm/i915/display/intel_display_debugfs.c
+++ b/drivers/gpu/drm/i915/display/intel_display_debugfs.c
@@ -20,7 +20,6 @@
 #include "intel_hdmi.h"
 #include "intel_pm.h"
 #include "intel_psr.h"
-#include "intel_sideband.h"
 #include "intel_sprite.h"
 
 static inline struct drm_i915_private *node_to_i915(struct drm_info_node *node)
diff --git a/drivers/gpu/drm/i915/display/intel_display_power.c b/drivers/gpu/drm/i915/display/intel_display_power.c
index 06e9879aedd7..709569211c85 100644
--- a/drivers/gpu/drm/i915/display/intel_display_power.c
+++ b/drivers/gpu/drm/i915/display/intel_display_power.c
@@ -3,12 +3,11 @@
  * Copyright © 2019 Intel Corporation
  */
 
-#include "display/intel_crt.h"
-
 #include "i915_drv.h"
 #include "i915_irq.h"
 #include "intel_cdclk.h"
 #include "intel_combo_phy.h"
+#include "intel_crt.h"
 #include "intel_de
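As a rough feel for the interface being moved (the diff above shows mostly the #include churn), here is a sketch of representative declarations that would end up in the new vlv_sideband.h, following the naming in the existing intel_sideband.h; the authoritative list is in the full patch:

/* vlv_sideband.h (sketch of representative declarations) */
#include <linux/types.h>

struct drm_i915_private;
enum pipe;

/* Grab/release the IOSF sideband for a set of ports. */
void vlv_iosf_sb_get(struct drm_i915_private *i915, unsigned long ports);
void vlv_iosf_sb_put(struct drm_i915_private *i915, unsigned long ports);

u32 vlv_punit_read(struct drm_i915_private *i915, u32 addr);
int vlv_punit_write(struct drm_i915_private *i915, u32 addr, u32 val);
u32 vlv_cck_read(struct drm_i915_private *i915, u32 reg);
void vlv_cck_write(struct drm_i915_private *i915, u32 reg, u32 val);
u32 vlv_dpio_read(struct drm_i915_private *i915, enum pipe pipe, int reg);
void vlv_dpio_write(struct drm_i915_private *i915, enum pipe pipe, int reg, u32 val);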
[Intel-gfx] ✗ Fi.CI.IGT: failure for lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() (rev3)
== Series Details ==

Series: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() (rev3)
URL   : https://patchwork.freedesktop.org/series/95549/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10728_full -> Patchwork_21326_full

Summary
---

**FAILURE**

Serious unknown changes coming with Patchwork_21326_full absolutely need to be verified manually.

If you think the reported changes have nothing to do with the changes introduced in Patchwork_21326_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI.

Possible new issues
---

Here are the unknown changes that may have been introduced in Patchwork_21326_full:

### IGT changes ###

Possible regressions

  * igt@i915_pm_rpm@reg-read-ioctl:
    - shard-iclb: [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-iclb1/igt@i915_pm_...@reg-read-ioctl.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-iclb7/igt@i915_pm_...@reg-read-ioctl.html

  * igt@kms_frontbuffer_tracking@fbc-suspend:
    - shard-kbl: [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-kbl7/igt@kms_frontbuffer_track...@fbc-suspend.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-kbl4/igt@kms_frontbuffer_track...@fbc-suspend.html

### Piglit changes ###

Possible regressions

  * spec@glsl-1.50@execution@built-in-functions@gs-op-assign-mult-ivec2-ivec2 (NEW):
    - pig-snb-2600: NOTRUN -> [FAIL][5]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/pig-snb-2600/spec@glsl-1.50@execution@built-in-functi...@gs-op-assign-mult-ivec2-ivec2.html

New tests
-

New tests have been introduced between CI_DRM_10728_full and Patchwork_21326_full:

### New Piglit tests (1) ###

  * spec@glsl-1.50@execution@built-in-functions@gs-op-assign-mult-ivec2-ivec2:
    - Statuses : 1 fail(s)
    - Exec time: [0.19] s

Known issues

Here are the changes found in Patchwork_21326_full that come from known issues:

### IGT changes ###

Issues hit

  * igt@gem_ctx_persistence@engines-mixed-process:
    - shard-snb: NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#1099]) +1 similar issue
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-snb6/igt@gem_ctx_persiste...@engines-mixed-process.html

  * igt@gem_exec_fair@basic-deadline:
    - shard-skl: NOTRUN -> [FAIL][7] ([i915#2846])
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-skl9/igt@gem_exec_f...@basic-deadline.html

  * igt@gem_exec_fair@basic-none-rrul@rcs0:
    - shard-tglb: NOTRUN -> [FAIL][8] ([i915#2842]) +1 similar issue
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-tglb3/igt@gem_exec_fair@basic-none-r...@rcs0.html

  * igt@gem_exec_fair@basic-none-solo@rcs0:
    - shard-glk: [PASS][9] -> [FAIL][10] ([i915#2842]) +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-glk5/igt@gem_exec_fair@basic-none-s...@rcs0.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-glk2/igt@gem_exec_fair@basic-none-s...@rcs0.html

  * igt@gem_exec_fair@basic-none@vcs0:
    - shard-kbl: [PASS][11] -> [FAIL][12] ([i915#2842]) +1 similar issue
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-kbl3/igt@gem_exec_fair@basic-n...@vcs0.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-kbl6/igt@gem_exec_fair@basic-n...@vcs0.html

  * igt@gem_exec_fair@basic-pace@vcs1:
    - shard-iclb: NOTRUN -> [FAIL][13] ([i915#2842])
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-iclb4/igt@gem_exec_fair@basic-p...@vcs1.html
    - shard-tglb: [PASS][14] -> [FAIL][15] ([i915#2842])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-tglb5/igt@gem_exec_fair@basic-p...@vcs1.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-tglb6/igt@gem_exec_fair@basic-p...@vcs1.html

  * igt@gem_exec_schedule@u-submit-golden-slice@vecs0:
    - shard-skl: NOTRUN -> [INCOMPLETE][16] ([i915#3797])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-skl10/igt@gem_exec_schedule@u-submit-golden-sl...@vecs0.html

  * igt@gem_fenced_exec_thrash@2-spare-fences:
    - shard-snb: [PASS][17] -> [INCOMPLETE][18] ([i915#2055])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-snb5/igt@gem_fenced_exec_thr...@2-spare-fences.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21326/shard-snb6/igt@gem_fenced_exec_thr...@2-spare-fences.html

  * igt@gem_huc_copy@huc-copy:
    - shard-tglb: [PASS][19] -> [SKIP][20] ([
[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915: vlv sideband
== Series Details ==

Series: drm/i915: vlv sideband
URL   : https://patchwork.freedesktop.org/series/95764/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
ba91b0757d4b drm/i915: split out vlv sideband to a separate file
-:666: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#666:
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 934 lines checked
Re: [Intel-gfx] [PATCH 0/1] drm/i915: vlv sideband
On Wed, Oct 13, 2021 at 01:11:58PM +0300, Jani Nikula wrote:
> Three main ideas here:
>
> - vlv sideband only has the name "sideband" in common with the rest of
>   intel_sideband.[ch]

I wouldn't put it like that. There are two actual sideband implementations in that file:
- vlv/chv iosf sideband (vlv_sideband)
- lpt/wpt iosf sideband (intel_sbi)

And the third thing in that file is the snb+ pcode mailbox stuff, which has nothing to do with sideband.

-- 
Ville Syrjälä
Intel
[Intel-gfx] [PATCH] drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation.
No memory should be allocated when calling i915_gem_object_wait, because it may be called to idle a BO when evicting memory.

Fix this by using the dma_resv_iter helpers to call i915_gem_object_wait_fence() on each fence, which cleans up the code a lot. Also remove dma_resv_prune, it's questionable.

This will result in the following lockdep splat.

<4> [83.538517] ==
<4> [83.538520] WARNING: possible circular locking dependency detected
<4> [83.538522] 5.15.0-rc5-CI-Trybot_8062+ #1 Not tainted
<4> [83.538525] --
<4> [83.538527] gem_render_line/5242 is trying to acquire lock:
<4> [83.538530] 8275b1e0 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc_track_caller+0x56/0x270
<4> [83.538538] but task is already holding lock:
<4> [83.538540] 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c7/0x970 [i915]
<4> [83.538638] which lock already depends on the new lock.
<4> [83.538642] the existing dependency chain (in reverse order) is:
<4> [83.538645] -> #1 (&vm->mutex/1){+.+.}-{3:3}:
<4> [83.538649]    lock_acquire+0xd3/0x310
<4> [83.538654]    i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915]
<4> [83.538730]    i915_address_space_init+0xf5/0x1b0 [i915]
<4> [83.538794]    ppgtt_init+0x55/0x70 [i915]
<4> [83.538856]    gen8_ppgtt_create+0x44/0x5d0 [i915]
<4> [83.538912]    i915_ppgtt_create+0x28/0xf0 [i915]
<4> [83.538971]    intel_gt_init+0x130/0x3b0 [i915]
<4> [83.539029]    i915_gem_init+0x14b/0x220 [i915]
<4> [83.539100]    i915_driver_probe+0x97e/0xdd0 [i915]
<4> [83.539149]    i915_pci_probe+0x43/0x1d0 [i915]
<4> [83.539197]    pci_device_probe+0x9b/0x110
<4> [83.539201]    really_probe+0x1b0/0x3b0
<4> [83.539205]    __driver_probe_device+0xf6/0x170
<4> [83.539208]    driver_probe_device+0x1a/0x90
<4> [83.539210]    __driver_attach+0x93/0x160
<4> [83.539213]    bus_for_each_dev+0x72/0xc0
<4> [83.539216]    bus_add_driver+0x14b/0x1f0
<4> [83.539220]    driver_register+0x66/0xb0
<4> [83.539222]    hdmi_get_spk_alloc+0x1f/0x50 [snd_hda_codec_hdmi]
<4> [83.539227]    do_one_initcall+0x53/0x2e0
<4> [83.539230]    do_init_module+0x55/0x200
<4> [83.539234]    load_module+0x2700/0x2980
<4> [83.539237]    __do_sys_finit_module+0xaa/0x110
<4> [83.539241]    do_syscall_64+0x37/0xb0
<4> [83.539244]    entry_SYSCALL_64_after_hwframe+0x44/0xae
<4> [83.539247] -> #0 (fs_reclaim){+.+.}-{0:0}:
<4> [83.539251]    validate_chain+0xb37/0x1e70
<4> [83.539254]    __lock_acquire+0x5a1/0xb70
<4> [83.539258]    lock_acquire+0xd3/0x310
<4> [83.539260]    fs_reclaim_acquire+0x9d/0xd0
<4> [83.539264]    __kmalloc_track_caller+0x56/0x270
<4> [83.539267]    krealloc+0x48/0xa0
<4> [83.539270]    dma_resv_get_fences+0x1c3/0x280
<4> [83.539274]    i915_gem_object_wait+0x1ff/0x410 [i915]
<4> [83.539342]    i915_gem_evict_for_node+0x16b/0x440 [i915]
<4> [83.539412]    i915_gem_gtt_reserve+0xff/0x130 [i915]
<4> [83.539482]    i915_vma_pin_ww+0x765/0x970 [i915]
<4> [83.539556]    eb_validate_vmas+0x6fe/0x8e0 [i915]
<4> [83.539626]    i915_gem_do_execbuffer+0x9a6/0x20a0 [i915]
<4> [83.539693]    i915_gem_execbuffer2_ioctl+0x11f/0x2c0 [i915]
<4> [83.539759]    drm_ioctl_kernel+0xac/0x140
<4> [83.539763]    drm_ioctl+0x201/0x3d0
<4> [83.539766]    __x64_sys_ioctl+0x6a/0xa0
<4> [83.539769]    do_syscall_64+0x37/0xb0
<4> [83.539772]    entry_SYSCALL_64_after_hwframe+0x44/0xae
<4> [83.539775] other info that might help us debug this:
<4> [83.539778] Possible unsafe locking scenario:
<4> [83.539781]        CPU0                CPU1
<4> [83.539783]
<4> [83.539785]   lock(&vm->mutex/1);
<4> [83.539788]                            lock(fs_reclaim);
<4> [83.539791]                            lock(&vm->mutex/1);
<4> [83.539794]   lock(fs_reclaim);
<4> [83.539796] *** DEADLOCK ***
<4> [83.539799] 3 locks held by gem_render_line/5242:
<4> [83.539802] #0: c9d4bbf0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_gem_do_execbuffer+0x8e5/0x20a0 [i915]
<4> [83.539870] #1: 88811e48bae8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: eb_validate_vmas+0x81/0x8e0 [i915]
<4> [83.539936] #2: 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c7/0x970 [i915]
<4> [83.540011] stack backtrace:
<4> [83.540014] CPU: 2 PID: 5242 Comm: gem_render_line Not tainted 5.15.0-rc5-CI-Trybot_8062+ #1
<4> [83.540019] Hardware name: Intel(R) Client Systems NUC11TNHi3/NUC11TNBi3, BIOS TNTGL357.0038.2020.1124.1648 11/24/2020
<4> [83.540023] Call Trace:
<4> [83.540026] dump_stack_lvl+0x56/0x7b
<4> [83.540030] check_noncircular+0x12e/0x150
<4> [83.540034] ? _raw_spin_unlock_irqrestore+0x50/0x60
<4> [83.540038] validate_chain+0xb37/0x1e70
<4> [83.540042] __lock_acquire+0x5a1/0xb70
<4> [83.540046] lock_acquire+0
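The shape of the fix, replacing the allocating dma_resv_get_fences() snapshot visible in the splat with the iterator, would be roughly the following sketch. It assumes the dma_resv_iter API as proposed around this time (dma_resv_iter_begin() taking a bool all_fences) and i915's existing i915_gem_object_wait_fence() helper; the real conversion lives in i915_gem_wait.c.

#include <linux/dma-resv.h>

static long
i915_gem_object_wait_reservation(struct dma_resv *resv,
				 unsigned int flags, long timeout)
{
	struct dma_resv_iter cursor;
	struct dma_fence *fence;

	/* Walk the fences without allocating a snapshot array. */
	dma_resv_iter_begin(&cursor, resv, flags & I915_WAIT_ALL);
	dma_resv_for_each_fence_unlocked(&cursor, fence) {
		timeout = i915_gem_object_wait_fence(fence, flags, timeout);
		if (timeout <= 0)
			break;
	}
	dma_resv_iter_end(&cursor);

	return timeout;
}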
Re: [Intel-gfx] [PATCH 0/1] drm/i915: vlv sideband
On Wed, 13 Oct 2021, Ville Syrjälä wrote:
> On Wed, Oct 13, 2021 at 01:11:58PM +0300, Jani Nikula wrote:
>> Three main ideas here:
>>
>> - vlv sideband only has the name "sideband" in common with the rest of
>>   intel_sideband.[ch]
>
> I wouldn't put it like that. There are two actual sideband
> implementations in that file:
> - vlv/chv iosf sideband (vlv_sideband)
> - lpt/wpt iosf sideband (intel_sbi)
>
> And the third thing in that file is the snb+ pcode mailbox stuff,
> which has nothing to do with sideband.

Fair enough... but no opposition to the splitting out of vlv/chv iosf sideband? vlv_sideband.[ch] like here? I'm fine with renaming too.

I can follow up with the lpt/wpt iosf split-out (intel_sbi.[ch]?) and snb+ pcode (intel_pcode.[ch]?).

I think we've just put all of them together way back when this was all probably bundled in i915_drv.c or something...

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Graphics Center
[Intel-gfx] ✓ Fi.CI.BAT: success for drm/i915: vlv sideband
== Series Details ==

Series: drm/i915: vlv sideband
URL   : https://patchwork.freedesktop.org/series/95764/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10728 -> Patchwork_21327

Summary
---

**SUCCESS**

No regressions found.

External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/index.html

Known issues

Here are the changes found in Patchwork_21327 that come from known issues:

### IGT changes ###

Issues hit

  * igt@i915_selftest@live@hangcheck:
    - fi-ivb-3770: [PASS][1] -> [INCOMPLETE][2] ([i915#3303])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/fi-ivb-3770/igt@i915_selftest@l...@hangcheck.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/fi-ivb-3770/igt@i915_selftest@l...@hangcheck.html

  * igt@kms_chamelium@vga-hpd-fast:
    - fi-kbl-guc: NOTRUN -> [SKIP][3] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/fi-kbl-guc/igt@kms_chamel...@vga-hpd-fast.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-cml-u2: [PASS][4] -> [DMESG-WARN][5] ([i915#4269])
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/fi-cml-u2/igt@kms_frontbuffer_track...@basic.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/fi-cml-u2/igt@kms_frontbuffer_track...@basic.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-kbl-guc: NOTRUN -> [SKIP][6] ([fdo#109271] / [i915#533])
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/fi-kbl-guc/igt@kms_pipe_crc_ba...@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-c:
    - fi-kbl-guc: NOTRUN -> [SKIP][7] ([fdo#109271]) +41 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/fi-kbl-guc/igt@kms_pipe_crc_ba...@read-crc-pipe-c.html

  * igt@runner@aborted:
    - fi-ivb-3770: NOTRUN -> [FAIL][8] ([fdo#109271])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/fi-ivb-3770/igt@run...@aborted.html

Possible fixes

  * igt@i915_selftest@live@hangcheck:
    - {fi-hsw-gt1}: [DMESG-WARN][9] ([i915#3303]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/fi-hsw-gt1/igt@i915_selftest@l...@hangcheck.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/fi-hsw-gt1/igt@i915_selftest@l...@hangcheck.html

  * igt@i915_selftest@live@perf:
    - {fi-tgl-dsi}: [DMESG-WARN][11] ([i915#2867]) -> [PASS][12] +9 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/fi-tgl-dsi/igt@i915_selftest@l...@perf.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/fi-tgl-dsi/igt@i915_selftest@l...@perf.html

  {name}: This element is suppressed. This means it is ignored when computing the status of the difference (SUCCESS, WARNING, or FAILURE).

[fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
[fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
[i915#2867]: https://gitlab.freedesktop.org/drm/intel/issues/2867
[i915#3303]: https://gitlab.freedesktop.org/drm/intel/issues/3303
[i915#4269]: https://gitlab.freedesktop.org/drm/intel/issues/4269
[i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533

Participating hosts (41 -> 37)
--

Additional (1): fi-kbl-guc
Missing (5): fi-ilk-m540 fi-hsw-4200u fi-bsw-cyan fi-apl-guc fi-ctg-p8600

Build changes
-

  * Linux: CI_DRM_10728 -> Patchwork_21327

  CI-20190529: 20190529
  CI_DRM_10728: 82a9f298afec66c882e710078138891826ce5e22 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6242: 721fd85ee95225ed5df322f7182bdfa9b86a3e68 @ https://gitlab.freedesktop.org/drm/igt-gpu-tools.git
  Patchwork_21327: ba91b0757d4b185a92e03981ca99df05ca7cea22 @ git://anongit.freedesktop.org/gfx-ci/linux

== Linux commits ==

ba91b0757d4b drm/i915: split out vlv sideband to a separate file

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/index.html
[Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation.
== Series Details ==

Series: drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation.
URL   : https://patchwork.freedesktop.org/series/95765/
State : failure

== Summary ==

  CALL    scripts/checksyscalls.sh
  CALL    scripts/atomic/check-atomics.sh
  DESCEND objtool
  CHK     include/generated/compile.h
make[4]: *** No rule to make target 'drivers/gpu/drm/i915/dma_resv_utils.o', needed by 'drivers/gpu/drm/i915/i915.o'.  Stop.
scripts/Makefile.build:540: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:540: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:540: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:1868: recipe for target 'drivers' failed
make: *** [drivers] Error 2
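The missing object points at a stale Makefile reference: the series removes dma_resv_prune(), which lives in dma_resv_utils.c, but drivers/gpu/drm/i915/Makefile apparently still lists the object. A sketch of the likely fix, with the surrounding Makefile context omitted since it is not visible here:

--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ (context omitted)
-	dma_resv_utils.o \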
Re: [Intel-gfx] [PATCH 0/1] drm/i915: vlv sideband
On Wed, Oct 13, 2021 at 01:47:09PM +0300, Jani Nikula wrote: > On Wed, 13 Oct 2021, Ville Syrjälä wrote: > > On Wed, Oct 13, 2021 at 01:11:58PM +0300, Jani Nikula wrote: > >> Three main ideas here: > >> > >> - vlv sideband only has the name "sideband" in common with the rest of > >> intel_sideband.[ch] > > > > I wouldn't put it like that. There are two actual sideband > > implementations in that file: > > - vlv/chv iosf sideband (vlv_sideband) > > - lpt/wpt iosf sideband (intel_sbi) > > > > And the third thing in that file is the snb+ pcode mailbox stuff, > > which has nothing to do with sideband. > > Fair enough... but no opposition to the splitting out of vlv/chv iosf > sideband? vlv_sideband.[ch] like here? I'm fine with renaming too. > > I can follow up with lpt/wpt iosf split out (intel_sbi.[ch]?) and snb+ > pcode (intel_pcode.[ch]?). Yeah, I guess just full split is the cleanest. Those names seem OK to me. Or I suppose we could rename the intel_sbi stuff to lpt_sbi or something? Might not be worth the hassle. Adding a small comment to intel_sbi.c to document what it's for should be sufficient reminder. > I think we've just put all of them together way back when this was all > probably bundled in i915_drv.c or something... Yeah. I think the common thread was that you need to go through a mailbox, but the file name didn't really reflect that. -- Ville Syrjälä Intel
[Intel-gfx] [PATCH] drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation.
No memory should be allocated when calling i915_gem_object_wait, because it may be called to idle a BO when evicting memory.

Fix this by using dma_resv_iter helpers to call i915_gem_object_wait_fence() on each fence, which cleans up the code a lot. Also remove dma_resv_prune, it's questionable.

This will result in the following lockdep splat:

<4> [83.538517] ==
<4> [83.538520] WARNING: possible circular locking dependency detected
<4> [83.538522] 5.15.0-rc5-CI-Trybot_8062+ #1 Not tainted
<4> [83.538525] --
<4> [83.538527] gem_render_line/5242 is trying to acquire lock:
<4> [83.538530] 8275b1e0 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc_track_caller+0x56/0x270
<4> [83.538538] but task is already holding lock:
<4> [83.538540] 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c7/0x970 [i915]
<4> [83.538638] which lock already depends on the new lock.
<4> [83.538642] the existing dependency chain (in reverse order) is:
<4> [83.538645] -> #1 (&vm->mutex/1){+.+.}-{3:3}:
<4> [83.538649]    lock_acquire+0xd3/0x310
<4> [83.538654]    i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915]
<4> [83.538730]    i915_address_space_init+0xf5/0x1b0 [i915]
<4> [83.538794]    ppgtt_init+0x55/0x70 [i915]
<4> [83.538856]    gen8_ppgtt_create+0x44/0x5d0 [i915]
<4> [83.538912]    i915_ppgtt_create+0x28/0xf0 [i915]
<4> [83.538971]    intel_gt_init+0x130/0x3b0 [i915]
<4> [83.539029]    i915_gem_init+0x14b/0x220 [i915]
<4> [83.539100]    i915_driver_probe+0x97e/0xdd0 [i915]
<4> [83.539149]    i915_pci_probe+0x43/0x1d0 [i915]
<4> [83.539197]    pci_device_probe+0x9b/0x110
<4> [83.539201]    really_probe+0x1b0/0x3b0
<4> [83.539205]    __driver_probe_device+0xf6/0x170
<4> [83.539208]    driver_probe_device+0x1a/0x90
<4> [83.539210]    __driver_attach+0x93/0x160
<4> [83.539213]    bus_for_each_dev+0x72/0xc0
<4> [83.539216]    bus_add_driver+0x14b/0x1f0
<4> [83.539220]    driver_register+0x66/0xb0
<4> [83.539222]    hdmi_get_spk_alloc+0x1f/0x50 [snd_hda_codec_hdmi]
<4> [83.539227]    do_one_initcall+0x53/0x2e0
<4> [83.539230]    do_init_module+0x55/0x200
<4> [83.539234]    load_module+0x2700/0x2980
<4> [83.539237]    __do_sys_finit_module+0xaa/0x110
<4> [83.539241]    do_syscall_64+0x37/0xb0
<4> [83.539244]    entry_SYSCALL_64_after_hwframe+0x44/0xae
<4> [83.539247] -> #0 (fs_reclaim){+.+.}-{0:0}:
<4> [83.539251]    validate_chain+0xb37/0x1e70
<4> [83.539254]    __lock_acquire+0x5a1/0xb70
<4> [83.539258]    lock_acquire+0xd3/0x310
<4> [83.539260]    fs_reclaim_acquire+0x9d/0xd0
<4> [83.539264]    __kmalloc_track_caller+0x56/0x270
<4> [83.539267]    krealloc+0x48/0xa0
<4> [83.539270]    dma_resv_get_fences+0x1c3/0x280
<4> [83.539274]    i915_gem_object_wait+0x1ff/0x410 [i915]
<4> [83.539342]    i915_gem_evict_for_node+0x16b/0x440 [i915]
<4> [83.539412]    i915_gem_gtt_reserve+0xff/0x130 [i915]
<4> [83.539482]    i915_vma_pin_ww+0x765/0x970 [i915]
<4> [83.539556]    eb_validate_vmas+0x6fe/0x8e0 [i915]
<4> [83.539626]    i915_gem_do_execbuffer+0x9a6/0x20a0 [i915]
<4> [83.539693]    i915_gem_execbuffer2_ioctl+0x11f/0x2c0 [i915]
<4> [83.539759]    drm_ioctl_kernel+0xac/0x140
<4> [83.539763]    drm_ioctl+0x201/0x3d0
<4> [83.539766]    __x64_sys_ioctl+0x6a/0xa0
<4> [83.539769]    do_syscall_64+0x37/0xb0
<4> [83.539772]    entry_SYSCALL_64_after_hwframe+0x44/0xae
<4> [83.539775] other info that might help us debug this:
<4> [83.539778] Possible unsafe locking scenario:
<4> [83.539781]        CPU0                CPU1
<4> [83.539783]
<4> [83.539785]   lock(&vm->mutex/1);
<4> [83.539788]                       lock(fs_reclaim);
<4> [83.539791]                       lock(&vm->mutex/1);
<4> [83.539794]   lock(fs_reclaim);
<4> [83.539796] *** DEADLOCK ***
<4> [83.539799] 3 locks held by gem_render_line/5242:
<4> [83.539802] #0: c9d4bbf0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_gem_do_execbuffer+0x8e5/0x20a0 [i915]
<4> [83.539870] #1: 88811e48bae8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: eb_validate_vmas+0x81/0x8e0 [i915]
<4> [83.539936] #2: 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c7/0x970 [i915]
<4> [83.540011] stack backtrace:
<4> [83.540014] CPU: 2 PID: 5242 Comm: gem_render_line Not tainted 5.15.0-rc5-CI-Trybot_8062+ #1
<4> [83.540019] Hardware name: Intel(R) Client Systems NUC11TNHi3/NUC11TNBi3, BIOS TNTGL357.0038.2020.1124.1648 11/24/2020
<4> [83.540023] Call Trace:
<4> [83.540026] dump_stack_lvl+0x56/0x7b
<4> [83.540030] check_noncircular+0x12e/0x150
<4> [83.540034] ? _raw_spin_unlock_irqrestore+0x50/0x60
<4> [83.540038] validate_chain+0xb37/0x1e70
<4> [83.540042] __lock_acquire+0x5a1/0xb70
<4> [83.540046] lock_acquire+0
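For readers following along, a minimal sketch of the dma_resv_iter-based wait the commit message describes, written against the 2021-era iterator API that takes a bool all-fences argument; this illustrates the technique and is not the patch body:

#include <linux/dma-resv.h>
#include <linux/dma-fence.h>

/*
 * Wait on each fence in the reservation object in turn, instead of
 * snapshotting them with dma_resv_get_fences(), which is what does
 * the krealloc() seen in the splat above.
 */
static long sketch_wait_reservation(struct dma_resv *resv, bool wait_all,
				    long timeout)
{
	struct dma_resv_iter cursor;
	struct dma_fence *fence;

	dma_resv_iter_begin(&cursor, resv, wait_all);
	dma_resv_for_each_fence_unlocked(&cursor, fence) {
		timeout = dma_fence_wait_timeout(fence, true, timeout);
		if (timeout <= 0)
			break;
	}
	dma_resv_iter_end(&cursor);

	return timeout;
}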
Re: [Intel-gfx] [PATCH] drm/i915: Prefer struct_size over open coded arithmetic
On Mon, 11 Oct 2021, Len Baker wrote: > Hi, > > On Sun, Oct 03, 2021 at 12:42:58PM +0200, Len Baker wrote: >> As noted in the "Deprecated Interfaces, Language Features, Attributes, >> and Conventions" documentation [1], size calculations (especially >> multiplication) should not be performed in memory allocator (or similar) >> function arguments due to the risk of them overflowing. This could lead >> to values wrapping around and a smaller allocation being made than the >> caller was expecting. Using those allocations could lead to linear >> overflows of heap memory and other misbehaviors. >> >> In this case these are not actually dynamic sizes: all the operands >> involved in the calculation are constant values. However it is better to >> refactor them anyway, just to keep the open-coded math idiom out of >> code. >> >> So, add at the end of the struct i915_syncmap a union with two flexible >> array members (these arrays share the same memory layout). This is >> possible using the new DECLARE_FLEX_ARRAY macro. And then, use the >> struct_size() helper to do the arithmetic instead of the argument >> "size + count * size" in the kmalloc and kzalloc() functions. >> >> Also, take the opportunity to refactor the __sync_seqno and __sync_child >> making them more readable. >> >> This code was detected with the help of Coccinelle and audited and fixed >> manually. >> >> [1] >> https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments >> >> Signed-off-by: Len Baker >> --- >> drivers/gpu/drm/i915/i915_syncmap.c | 12 >> 1 file changed, 8 insertions(+), 4 deletions(-) > > I received a mail telling that this patch doesn't build: > > == Series Details == > > Series: drm/i915: Prefer struct_size over open coded arithmetic > URL : https://patchwork.freedesktop.org/series/95408/ > State : failure > > But it builds without error against linux-next (tag next-20211001). Against > which tree and branch do I need to build? drm-tip [1]. It's a sort of linux-next for graphics. I think there are still some branches that don't feed to linux-next. BR, Jani. [1] https://cgit.freedesktop.org/drm/drm-tip > > Regards, > Len -- Jani Nikula, Intel Open Source Graphics Center
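For reference, a minimal sketch of the flexible-array plus struct_size() idiom the commit message describes; the struct below is illustrative, not the actual i915_syncmap layout:

#include <linux/overflow.h>
#include <linux/slab.h>
#include <linux/stddef.h>
#include <linux/types.h>

struct example_map {
	unsigned int height;
	unsigned int bitmap;
	/* Two views of the same trailing storage, as in the patch. */
	union {
		DECLARE_FLEX_ARRAY(u32, seqno);
		DECLARE_FLEX_ARRAY(struct example_map *, child);
	};
};

static struct example_map *example_alloc(unsigned int count)
{
	struct example_map *p;

	/*
	 * struct_size() computes sizeof(*p) + count * sizeof(u32) with
	 * overflow checking, replacing the open-coded arithmetic.
	 */
	p = kmalloc(struct_size(p, seqno, count), GFP_KERNEL);
	return p;
}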
Re: [Intel-gfx] [PATCH 1/1] RFC : drm/i915: Adding new sysfs frequency attributes
On Fri, 08 Oct 2021, Sujaritha Sundaresan wrote: > This patch adds the following new sysfs frequency attributes; Why? Sysfs is uapi. What's the userspace consumer for these? More comments inline. > - punit_req_freq_mhz > - throttle_reason_status > - throttle_reason_pl1 > - throttle_reason_pl2 > - throttle_reason_pl4 > - throttle_reason_thermal > - throttle_reason_prochot > - throttle_reason_ratl > - throttle_reason_vr_thermalert > - throttle_reason_vr_tdc > > Signed-off-by: Sujaritha Sundaresan > Cc: Dale B Stimson > --- > drivers/gpu/drm/i915/gt/intel_rps.c | 83 + > drivers/gpu/drm/i915/gt/intel_rps.h | 10 +++ > drivers/gpu/drm/i915/i915_reg.h | 11 +++ > drivers/gpu/drm/i915/i915_sysfs.c | 135 > 4 files changed, 239 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c > b/drivers/gpu/drm/i915/gt/intel_rps.c > index 172de6c9f949..c03d99f2608c 100644 > --- a/drivers/gpu/drm/i915/gt/intel_rps.c > +++ b/drivers/gpu/drm/i915/gt/intel_rps.c > @@ -2153,6 +2153,89 @@ u32 intel_rps_read_state_cap(struct intel_rps *rps) > return intel_uncore_read(uncore, GEN6_RP_STATE_CAP); > } > > +static u32 __rps_read_mmio(struct intel_gt *gt, i915_reg_t reg32) > +{ > + intel_wakeref_t wakeref; > + u32 val; > + > + with_intel_runtime_pm(gt->uncore->rpm, wakeref) > + val = intel_uncore_read(gt->uncore, reg32); > + > + return val; > +} > + > +u32 intel_rps_read_throttle_reason_status(struct intel_rps *rps) > +{ > + struct intel_gt *gt = rps_to_gt(rps); > + u32 status = __rps_read_mmio(gt, GT0_PERF_LIMIT_REASONS) & > GT0_PERF_LIMIT_REASONS_MASK; > + > + return status; > +} > + > +u32 intel_rps_read_throttle_reason_pl1(struct intel_rps *rps) > +{ > + struct intel_gt *gt = rps_to_gt(rps); > + u32 pl1 = __rps_read_mmio(gt, GT0_PERF_LIMIT_REASONS) & > POWER_LIMIT_1_MASK; > + > + return pl1; > +} > + > +u32 intel_rps_read_throttle_reason_pl2(struct intel_rps *rps) > +{ > + struct intel_gt *gt = rps_to_gt(rps); > + u32 pl2 = __rps_read_mmio(gt, GT0_PERF_LIMIT_REASONS) & > POWER_LIMIT_2_MASK; > + > + return pl2; > +} > + > +u32 intel_rps_read_throttle_reason_pl4(struct intel_rps *rps) > +{ > + struct intel_gt *gt = rps_to_gt(rps); > + u32 pl4 = __rps_read_mmio(gt, GT0_PERF_LIMIT_REASONS) & > POWER_LIMIT_4_MASK; > + > + return pl4; > +} > + > +u32 intel_rps_read_throttle_reason_thermal(struct intel_rps *rps) > +{ > + struct intel_gt *gt = rps_to_gt(rps); > + u32 thermal = __rps_read_mmio(gt, GT0_PERF_LIMIT_REASONS) & > THERMAL_LIMIT_MASK; > + > + return thermal; > +} > + > +u32 intel_rps_read_throttle_reason_prochot(struct intel_rps *rps) > +{ > + struct intel_gt *gt = rps_to_gt(rps); > + u32 prochot = __rps_read_mmio(gt, GT0_PERF_LIMIT_REASONS) & > PROCHOT_MASK; > + > + return prochot; > +} > + > +u32 intel_rps_read_throttle_reason_ratl(struct intel_rps *rps) > +{ > + struct intel_gt *gt = rps_to_gt(rps); > + u32 ratl = __rps_read_mmio(gt, GT0_PERF_LIMIT_REASONS) & RATL_MASK; > + > + return ratl; > +} > + > +u32 intel_rps_read_throttle_reason_vr_thermalert(struct intel_rps *rps) > +{ > + struct intel_gt *gt = rps_to_gt(rps); > + u32 thermalert = __rps_read_mmio(gt, GT0_PERF_LIMIT_REASONS) & > VR_THERMALERT_MASK; > + > + return thermalert; > +} > + > +u32 intel_rps_read_throttle_reason_vr_tdc(struct intel_rps *rps) > +{ > + struct intel_gt *gt = rps_to_gt(rps); > + u32 tdc = __rps_read_mmio(gt, GT0_PERF_LIMIT_REASONS) & VR_TDC_MASK; > + > + return tdc; > +} > + > /* External interface for intel_ips.ko */ > > static struct drm_i915_private __rcu *ips_mchdev; > diff --git a/drivers/gpu/drm/i915/gt/intel_rps.h > 
b/drivers/gpu/drm/i915/gt/intel_rps.h > index 11960d64ca82..d6ac97f1facd 100644 > --- a/drivers/gpu/drm/i915/gt/intel_rps.h > +++ b/drivers/gpu/drm/i915/gt/intel_rps.h > @@ -42,6 +42,16 @@ u32 intel_rps_get_rpn_frequency(struct intel_rps *rps); > u32 intel_rps_read_punit_req(struct intel_rps *rps); > u32 intel_rps_read_punit_req_frequency(struct intel_rps *rps); > u32 intel_rps_read_state_cap(struct intel_rps *rps); > +u32 intel_rps_read_throttle_reason(struct intel_rps *rps); > +u32 intel_rps_read_throttle_reason_status(struct intel_rps *rps); > +u32 intel_rps_read_throttle_reason_pl1(struct intel_rps *rps); > +u32 intel_rps_read_throttle_reason_pl2(struct intel_rps *rps); > +u32 intel_rps_read_throttle_reason_pl4(struct intel_rps *rps); > +u32 intel_rps_read_throttle_reason_thermal(struct intel_rps *rps); > +u32 intel_rps_read_throttle_reason_prochot(struct intel_rps *rps); > +u32 intel_rps_read_throttle_reason_ratl(struct intel_rps *rps); > +u32 intel_rps_read_throttle_reason_vr_thermalert(struct intel_rps *rps); > +u32 intel_rps_read_throttle_reason_vr_tdc(struct intel_rps *rps); > > void gen5_rps_irq_handler(struct intel_r
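As a side note on the review above, the new accessors all read the same register with a different mask; a hedged sketch of how they could collapse into one helper, reusing __rps_read_mmio() and the mask definitions from the patch (the function name here is hypothetical):

/* Sketch only: one mask-parameterized accessor instead of one
 * function per throttle reason. All masks select bits of the same
 * GT0_PERF_LIMIT_REASONS register read in the patch. */
static u32 intel_rps_read_limit_reasons(struct intel_rps *rps, u32 mask)
{
	struct intel_gt *gt = rps_to_gt(rps);

	return __rps_read_mmio(gt, GT0_PERF_LIMIT_REASONS) & mask;
}

/* e.g. intel_rps_read_limit_reasons(rps, POWER_LIMIT_1_MASK) */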
Re: [Intel-gfx] [PATCH] drm/i915: Prefer struct_size over open coded arithmetic
On Wed, Oct 13, 2021 at 02:24:05PM +0300, Jani Nikula wrote: > On Mon, 11 Oct 2021, Len Baker wrote: > > Hi, > > > > On Sun, Oct 03, 2021 at 12:42:58PM +0200, Len Baker wrote: > >> As noted in the "Deprecated Interfaces, Language Features, Attributes, > >> and Conventions" documentation [1], size calculations (especially > >> multiplication) should not be performed in memory allocator (or similar) > >> function arguments due to the risk of them overflowing. This could lead > >> to values wrapping around and a smaller allocation being made than the > >> caller was expecting. Using those allocations could lead to linear > >> overflows of heap memory and other misbehaviors. > >> > >> In this case these are not actually dynamic sizes: all the operands > >> involved in the calculation are constant values. However it is better to > >> refactor them anyway, just to keep the open-coded math idiom out of > >> code. > >> > >> So, add at the end of the struct i915_syncmap a union with two flexible > >> array members (these arrays share the same memory layout). This is > >> possible using the new DECLARE_FLEX_ARRAY macro. And then, use the > >> struct_size() helper to do the arithmetic instead of the argument > >> "size + count * size" in the kmalloc and kzalloc() functions. > >> > >> Also, take the opportunity to refactor the __sync_seqno and __sync_child > >> making them more readable. > >> > >> This code was detected with the help of Coccinelle and audited and fixed > >> manually. > >> > >> [1] > >> https://www.kernel.org/doc/html/latest/process/deprecated.html#open-coded-arithmetic-in-allocator-arguments > >> > >> Signed-off-by: Len Baker > >> --- > >> drivers/gpu/drm/i915/i915_syncmap.c | 12 > >> 1 file changed, 8 insertions(+), 4 deletions(-) > > > > I received a mail telling that this patch doesn't build: > > > > == Series Details == > > > > Series: drm/i915: Prefer struct_size over open coded arithmetic > > URL : https://patchwork.freedesktop.org/series/95408/ > > State : failure > > > > But it builds without error against linux-next (tag next-20211001). Against > > which tree and branch do I need to build? > > drm-tip [1]. It's a sort of linux-next for graphics. I think there are > still some branches that don't feed to linux-next. Yeah we need to get gt-next in linux-next asap. Joonas promised to send out his patch to make that happen in dim. -Daniel > > BR, > Jani. > > > [1] https://cgit.freedesktop.org/drm/drm-tip > > > > > > Regards, > > Len > > -- > Jani Nikula, Intel Open Source Graphics Center -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 1/1] drm/i915: split out vlv sideband to a separate file
Hi, On 10/13/21 12:11 PM, Jani Nikula wrote: > The VLV/CHV sideband code is pretty distinct from the rest of the > sideband code. Split it out to new vlv_sideband.[ch]. > > Pure code movement with relevant #include changes, and a tiny checkpatch > fix on top. > > Cc: Lucas De Marchi > Cc: Ville Syrjälä > Signed-off-by: Jani Nikula Thanks, patch looks good to me: Reviewed-by: Hans de Goede Feel free to keep the Reviewed-by if you do a new version with the improved commit msg suggested by Ville. Regards, Hans > --- > drivers/gpu/drm/i915/Makefile | 1 + > drivers/gpu/drm/i915/display/g4x_dp.c | 2 +- > drivers/gpu/drm/i915/display/g4x_hdmi.c | 2 +- > drivers/gpu/drm/i915/display/intel_cdclk.c| 1 + > drivers/gpu/drm/i915/display/intel_display.c | 1 + > .../drm/i915/display/intel_display_debugfs.c | 1 - > .../drm/i915/display/intel_display_power.c| 4 +- > drivers/gpu/drm/i915/display/intel_dp.c | 1 - > drivers/gpu/drm/i915/display/intel_dpio_phy.c | 5 +- > drivers/gpu/drm/i915/display/intel_dpll.c | 2 +- > drivers/gpu/drm/i915/display/intel_dsi_vbt.c | 2 +- > drivers/gpu/drm/i915/display/vlv_dsi.c| 2 +- > drivers/gpu/drm/i915/display/vlv_dsi_pll.c| 2 +- > drivers/gpu/drm/i915/gt/intel_gt_pm_debugfs.c | 1 + > drivers/gpu/drm/i915/gt/intel_rps.c | 1 + > drivers/gpu/drm/i915/i915_debugfs.c | 1 - > drivers/gpu/drm/i915/i915_sysfs.c | 1 - > drivers/gpu/drm/i915/intel_pm.c | 1 + > drivers/gpu/drm/i915/intel_sideband.c | 257 - > drivers/gpu/drm/i915/intel_sideband.h | 110 > drivers/gpu/drm/i915/vlv_sideband.c | 266 ++ > drivers/gpu/drm/i915/vlv_sideband.h | 123 > 22 files changed, 405 insertions(+), 382 deletions(-) > create mode 100644 drivers/gpu/drm/i915/vlv_sideband.c > create mode 100644 drivers/gpu/drm/i915/vlv_sideband.h > > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index 21b05ed0e4e8..d50d2b144fc6 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -54,6 +54,7 @@ i915-y += i915_drv.o \ > intel_step.o \ > intel_uncore.o \ > intel_wakeref.o \ > + vlv_sideband.o \ > vlv_suspend.o > > # core library code > diff --git a/drivers/gpu/drm/i915/display/g4x_dp.c > b/drivers/gpu/drm/i915/display/g4x_dp.c > index 85a09c3e09e8..dc41868d01ef 100644 > --- a/drivers/gpu/drm/i915/display/g4x_dp.c > +++ b/drivers/gpu/drm/i915/display/g4x_dp.c > @@ -18,7 +18,7 @@ > #include "intel_hdmi.h" > #include "intel_hotplug.h" > #include "intel_pps.h" > -#include "intel_sideband.h" > +#include "vlv_sideband.h" > > struct dp_link_dpll { > int clock; > diff --git a/drivers/gpu/drm/i915/display/g4x_hdmi.c > b/drivers/gpu/drm/i915/display/g4x_hdmi.c > index be352e9f0afc..88c427f3c346 100644 > --- a/drivers/gpu/drm/i915/display/g4x_hdmi.c > +++ b/drivers/gpu/drm/i915/display/g4x_hdmi.c > @@ -14,8 +14,8 @@ > #include "intel_fifo_underrun.h" > #include "intel_hdmi.h" > #include "intel_hotplug.h" > -#include "intel_sideband.h" > #include "intel_sdvo.h" > +#include "vlv_sideband.h" > > static void intel_hdmi_prepare(struct intel_encoder *encoder, > const struct intel_crtc_state *crtc_state) > diff --git a/drivers/gpu/drm/i915/display/intel_cdclk.c > b/drivers/gpu/drm/i915/display/intel_cdclk.c > index ecb28e8f1eb6..44bb18773509 100644 > --- a/drivers/gpu/drm/i915/display/intel_cdclk.c > +++ b/drivers/gpu/drm/i915/display/intel_cdclk.c > @@ -30,6 +30,7 @@ > #include "intel_display_types.h" > #include "intel_psr.h" > #include "intel_sideband.h" > +#include "vlv_sideband.h" > > /** > * DOC: CDCLK / RAWCLK > diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
> b/drivers/gpu/drm/i915/display/intel_display.c > index 9cf987ee143d..3602fdb2a549 100644 > --- a/drivers/gpu/drm/i915/display/intel_display.c > +++ b/drivers/gpu/drm/i915/display/intel_display.c > @@ -109,6 +109,7 @@ > #include "i9xx_plane.h" > #include "skl_scaler.h" > #include "skl_universal_plane.h" > +#include "vlv_sideband.h" > > static void i9xx_crtc_clock_get(struct intel_crtc *crtc, > struct intel_crtc_state *pipe_config); > diff --git a/drivers/gpu/drm/i915/display/intel_display_debugfs.c > b/drivers/gpu/drm/i915/display/intel_display_debugfs.c > index bc5113589f0a..e04767695530 100644 > --- a/drivers/gpu/drm/i915/display/intel_display_debugfs.c > +++ b/drivers/gpu/drm/i915/display/intel_display_debugfs.c > @@ -20,7 +20,6 @@ > #include "intel_hdmi.h" > #include "intel_pm.h" > #include "intel_psr.h" > -#include "intel_sideband.h" > #include "intel_sprite.h" > > static inline struct drm_i915_private *node_to_i915(struct drm_info_node > *node) > diff --git a/drivers/gpu/drm/i915/display/intel_display_power
Re: [Intel-gfx] mmotm 2021-10-05-19-53 uploaded (drivers/gpu/drm/msm/hdmi/hdmi_phy.o)
On Wed, Oct 13, 2021 at 12:54 PM Arnd Bergmann wrote:
> On Thu, Oct 7, 2021 at 11:51 AM Geert Uytterhoeven wrote:
>
> -msm-$(CONFIG_DRM_FBDEV_EMULATION) += msm_fbdev.o
> -msm-$(CONFIG_COMMON_CLK) += disp/mdp4/mdp4_lvds_pll.o
> -msm-$(CONFIG_COMMON_CLK) += hdmi/hdmi_pll_8960.o
> -msm-$(CONFIG_COMMON_CLK) += hdmi/hdmi_phy_8996.o
> +msm-$(CONFIG_DRM_FBDEV_EMULATION) += msm_fbdev.o \
> +	disp/mdp4/mdp4_lvds_pll.o \
> +	hdmi/hdmi_pll_8960.o \
> +	hdmi/hdmi_phy_8996.o
>
> msm-$(CONFIG_DRM_MSM_HDMI_HDCP) += hdmi/hdmi_hdcp.o

I fixed my local copy now after noticing that these should not go after
CONFIG_DRM_FBDEV_EMULATION but the top-level option:

@@ -23,8 +23,10 @@ msm-y := \
 	hdmi/hdmi_i2c.o \
 	hdmi/hdmi_phy.o \
 	hdmi/hdmi_phy_8960.o \
+	hdmi/hdmi_phy_8996.o \
 	hdmi/hdmi_phy_8x60.o \
 	hdmi/hdmi_phy_8x74.o \
+	hdmi/hdmi_pll_8960.o \
 	edp/edp.o \
 	edp/edp_aux.o \
 	edp/edp_bridge.o \
@@ -37,6 +39,7 @@ msm-y := \
 	disp/mdp4/mdp4_dtv_encoder.o \
 	disp/mdp4/mdp4_lcdc_encoder.o \
 	disp/mdp4/mdp4_lvds_connector.o \
+	disp/mdp4/mdp4_lvds_pll.o \
 	disp/mdp4/mdp4_irq.o \
 	disp/mdp4/mdp4_kms.o \
 	disp/mdp4/mdp4_plane.o \

       Arnd
Re: [Intel-gfx] mmotm 2021-10-05-19-53 uploaded (drivers/gpu/drm/msm/hdmi/hdmi_phy.o)
On Thu, Oct 7, 2021 at 11:51 AM Geert Uytterhoeven wrote:
> On Wed, Oct 6, 2021 at 9:28 AM Christian König wrote:
> > Am 06.10.21 um 09:20 schrieb Stephen Rothwell:
> > > On Tue, 5 Oct 2021 22:48:03 -0700 Randy Dunlap wrote:
> > >> on i386:
> > >>
> > >> ld: drivers/gpu/drm/msm/hdmi/hdmi_phy.o:(.rodata+0x3f0): undefined
> > >> reference to `msm_hdmi_phy_8996_cfg'

I ran into the same thing now as well.

> > E_TEST) && COMMON_CLK
>
> I'd make that:
>
> -	depends on DRM
> +	depends on COMMON_CLK && DRM && IOMMU_SUPPORT
> 	depends on ARCH_QCOM || SOC_IMX5 || COMPILE_TEST
> -	depends on IOMMU_SUPPORT
> -	depends on (OF && COMMON_CLK) || COMPILE_TEST
> +	depends on OF || COMPILE_TEST
>
> to keep a better separation between hard and soft dependencies.
>
> Note that the "depends on OF || COMPILE_TEST" can even be
> deleted, as the dependency on ARCH_QCOM || SOC_IMX5 implies OF.

Looks good to me, I would also drop that last line in this case, and
maybe add this change as building without COMMON_CLK is no longer
possible:

diff --git a/drivers/gpu/drm/msm/Makefile b/drivers/gpu/drm/msm/Makefile
index 904535eda0c4..a5d87e03812f 100644
--- a/drivers/gpu/drm/msm/Makefile
+++ b/drivers/gpu/drm/msm/Makefile
@@ -116,10 +116,10 @@ msm-$(CONFIG_DRM_MSM_DP)	+= dp/dp_aux.o \
 	dp/dp_power.o \
 	dp/dp_audio.o

-msm-$(CONFIG_DRM_FBDEV_EMULATION) += msm_fbdev.o
-msm-$(CONFIG_COMMON_CLK) += disp/mdp4/mdp4_lvds_pll.o
-msm-$(CONFIG_COMMON_CLK) += hdmi/hdmi_pll_8960.o
-msm-$(CONFIG_COMMON_CLK) += hdmi/hdmi_phy_8996.o
+msm-$(CONFIG_DRM_FBDEV_EMULATION) += msm_fbdev.o \
+	disp/mdp4/mdp4_lvds_pll.o \
+	hdmi/hdmi_pll_8960.o \
+	hdmi/hdmi_phy_8996.o

 msm-$(CONFIG_DRM_MSM_HDMI_HDCP) += hdmi/hdmi_hdcp.o

Has anyone submitted a patch already, or should I send the version
that I am using locally now?

       Arnd
Re: [Intel-gfx] [PATCH 1/4] drm: Introduce drm_modeset_lock_ctx_retry()
On Mon, Oct 04, 2021 at 02:15:51PM +0300, Ville Syrjälä wrote: > On Tue, Jul 20, 2021 at 03:44:49PM +0200, Daniel Vetter wrote: > > On Thu, Jul 15, 2021 at 09:49:51PM +0300, Ville Syrjala wrote: > > > From: Ville Syrjälä > > > > > > Quite a few places are hand rolling the modeset lock backoff dance. > > > Let's suck that into a helper macro that is easier to use without > > > forgetting some steps. > > > > > > The main downside is probably that the implementation of > > > drm_with_modeset_lock_ctx() is a bit harder to read than a hand > > > rolled version on account of being split across three functions, > > > but the actual code using it ends up being much simpler. > > > > > > Cc: Sean Paul > > > Cc: Daniel Vetter > > > Signed-off-by: Ville Syrjälä > > > --- > > > drivers/gpu/drm/drm_modeset_lock.c | 44 ++ > > > include/drm/drm_modeset_lock.h | 20 ++ > > > 2 files changed, 64 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/drm_modeset_lock.c > > > b/drivers/gpu/drm/drm_modeset_lock.c > > > index fcfe1a03c4a1..083df96632e8 100644 > > > --- a/drivers/gpu/drm/drm_modeset_lock.c > > > +++ b/drivers/gpu/drm/drm_modeset_lock.c > > > @@ -425,3 +425,47 @@ int drm_modeset_lock_all_ctx(struct drm_device *dev, > > > return 0; > > > } > > > EXPORT_SYMBOL(drm_modeset_lock_all_ctx); > > > + > > > +void _drm_modeset_lock_begin(struct drm_modeset_acquire_ctx *ctx, > > > + struct drm_atomic_state *state, > > > + unsigned int flags, int *ret) > > > +{ > > > + drm_modeset_acquire_init(ctx, flags); > > > + > > > + if (state) > > > + state->acquire_ctx = ctx; > > > + > > > + *ret = -EDEADLK; > > > +} > > > +EXPORT_SYMBOL(_drm_modeset_lock_begin); > > > + > > > +bool _drm_modeset_lock_loop(int *ret) > > > +{ > > > + if (*ret == -EDEADLK) { > > > + *ret = 0; > > > + return true; > > > + } > > > + > > > + return false; > > > +} > > > +EXPORT_SYMBOL(_drm_modeset_lock_loop); > > > + > > > +void _drm_modeset_lock_end(struct drm_modeset_acquire_ctx *ctx, > > > +struct drm_atomic_state *state, > > > +int *ret) > > > +{ > > > + if (*ret == -EDEADLK) { > > > + if (state) > > > + drm_atomic_state_clear(state); > > > + > > > + *ret = drm_modeset_backoff(ctx); > > > + if (*ret == 0) { > > > + *ret = -EDEADLK; > > > + return; > > > + } > > > + } > > > + > > > + drm_modeset_drop_locks(ctx); > > > + drm_modeset_acquire_fini(ctx); > > > +} > > > +EXPORT_SYMBOL(_drm_modeset_lock_end); > > > diff --git a/include/drm/drm_modeset_lock.h > > > b/include/drm/drm_modeset_lock.h > > > index aafd07388eb7..5eaad2533de5 100644 > > > --- a/include/drm/drm_modeset_lock.h > > > +++ b/include/drm/drm_modeset_lock.h > > > @@ -26,6 +26,7 @@ > > > > > > #include > > > > > > +struct drm_atomic_state; > > > struct drm_modeset_lock; > > > > > > /** > > > @@ -203,4 +204,23 @@ modeset_lock_fail: > > > \ > > > if (!drm_drv_uses_atomic_modeset(dev)) \ > > > mutex_unlock(&dev->mode_config.mutex); > > > > > > +void _drm_modeset_lock_begin(struct drm_modeset_acquire_ctx *ctx, > > > + struct drm_atomic_state *state, > > > + unsigned int flags, > > > + int *ret); > > > +bool _drm_modeset_lock_loop(int *ret); > > > +void _drm_modeset_lock_end(struct drm_modeset_acquire_ctx *ctx, > > > +struct drm_atomic_state *state, > > > +int *ret); > > > + > > > +/* > > > + * Note that one must always use "continue" rather than > > > + * "break" or "return" to handle errors within the > > > + * drm_modeset_lock_ctx_retry() block. > > > > I'm not sold on loop macros with these kind of restrictions, C just isn't > > a great language for these. That's why e.g. 
drm_connector_iter doesn't > > give you a macro, but only the begin/next/end function calls explicitly. > > We already use this pattern extensively in i915. Gem ww ctx has one, > power domains/pps/etc. use a similar things. It makes the code pretty nice, > with the slight caveat that an accidental 'break' can ruin your day. But > so can an accidental return with other constructs (and we even had that > happen a few times with the connector iterators), so not a dealbreaker > IMO. > > So if we don't want this drm wide I guess I can propose this just for > i915 since it fits in perfectly there. Well I don't like them for i915 either. And yes C is dangerous, but also C is verbose. I think one lesson from igt is that too many magic block constructs are bad, it's just not how C works. Definitely not in the kernel, where "oops I got it wrong because it was too clever" is bad. > > Yes the macro we have is also not nice, but at least it's a screaming > > macro since it's all uppercase, so
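To make the "continue, not break" caveat concrete, here is a plausible reconstruction of the helper macro from the three exported functions in the patch; the actual macro body is not quoted in this thread, so treat this as a sketch:

/* Hypothetical reconstruction. With a for-loop shape like this,
 * 'break' or 'return' skips _drm_modeset_lock_end(), leaking the
 * acquire context and any held locks, whereas 'continue' reaches
 * the backoff/cleanup path as intended. */
#define drm_modeset_lock_ctx_retry(ctx, state, flags, ret)		\
	for (_drm_modeset_lock_begin((ctx), (state), (flags), &(ret));	\
	     _drm_modeset_lock_loop(&(ret));				\
	     _drm_modeset_lock_end((ctx), (state), &(ret)))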
Re: [Intel-gfx] [RFC 6/8] drm/i915: Make some recently added vfuncs use full scheduling attribute
On Wed, Oct 06, 2021 at 10:12:29AM -0700, Matthew Brost wrote: > On Mon, Oct 04, 2021 at 03:36:48PM +0100, Tvrtko Ursulin wrote: > > From: Tvrtko Ursulin > > > > Code added in 71ed60112d5d ("drm/i915: Add kick_backend function to > > i915_sched_engine") and ee242ca704d3 ("drm/i915/guc: Implement GuC > > priority management") introduced some scheduling related vfuncs which > > take integer request priority as argument. > > > > Make them instead take struct i915_sched_attr, which is the type > > encapsulating this information, so it probably aligns with the design > > better. It definitely enables extending the set of scheduling attributes. > > > > Understand the motivation here but the i915_scheduler is going to > disappear when we move to the DRM scheduler or at least its functionality > of priority inheritance will be pushed into the DRM scheduler. I'd be > very careful making any changes here as the priority in the DRM > scheduler is defined as a single enum: Yeah I'm not sure it makes sense to build this and make the conversion to drm/sched even harder. We've already merged a lot of code with a "we'll totally convert to drm/sched right after" promise, there's not really room for more fun like this built on top of i915-scheduler. -Daniel > > /* These are often used as an (initial) index > * to an array, and as such should start at 0. > */ > enum drm_sched_priority { > DRM_SCHED_PRIORITY_MIN, > DRM_SCHED_PRIORITY_NORMAL, > DRM_SCHED_PRIORITY_HIGH, > DRM_SCHED_PRIORITY_KERNEL, > > DRM_SCHED_PRIORITY_COUNT, > DRM_SCHED_PRIORITY_UNSET = -2 > }; > > Adding a field to the i915_sched_attr is fairly easy as we already have > a structure but changing the DRM scheduler might be a tougher sell. > Anyway, can you make this work without adding the 'nice' field to > i915_sched_attr? Might be worth exploring so when we move to the DRM > scheduler this feature drops in a little cleaner.
> > Matt > > > Signed-off-by: Tvrtko Ursulin > > Cc: Matthew Brost > > Cc: Daniele Ceraolo Spurio > > --- > > drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 +++- > > drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c| 3 ++- > > drivers/gpu/drm/i915/i915_scheduler.c| 4 ++-- > > drivers/gpu/drm/i915/i915_scheduler_types.h | 4 ++-- > > 4 files changed, 9 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > > b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > > index 7147fe80919e..e91d803a6453 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > > @@ -3216,11 +3216,13 @@ static bool can_preempt(struct intel_engine_cs > > *engine) > > return engine->class != RENDER_CLASS; > > } > > > > -static void kick_execlists(const struct i915_request *rq, int prio) > > +static void kick_execlists(const struct i915_request *rq, > > + const struct i915_sched_attr *attr) > > { > > struct intel_engine_cs *engine = rq->engine; > > struct i915_sched_engine *sched_engine = engine->sched_engine; > > const struct i915_request *inflight; > > + const int prio = attr->priority; > > > > /* > > * We only need to kick the tasklet once for the high priority > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > index ba0de35f6323..b5883a4365ca 100644 > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > @@ -2414,9 +2414,10 @@ static void guc_init_breadcrumbs(struct > > intel_engine_cs *engine) > > } > > > > static void guc_bump_inflight_request_prio(struct i915_request *rq, > > - int prio) > > + const struct i915_sched_attr *attr) > > { > > struct intel_context *ce = rq->context; > > + const int prio = attr->priority; > > u8 new_guc_prio = map_i915_prio_to_guc_prio(prio); > > > > /* Short circuit function */ > > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c > > b/drivers/gpu/drm/i915/i915_scheduler.c > > index 762127dd56c5..534bab99fcdc 100644 > > --- a/drivers/gpu/drm/i915/i915_scheduler.c > > +++ b/drivers/gpu/drm/i915/i915_scheduler.c > > @@ -255,7 +255,7 @@ static void __i915_schedule(struct i915_sched_node > > *node, > > > > /* Must be called before changing the nodes priority */ > > if (sched_engine->bump_inflight_request_prio) > > - sched_engine->bump_inflight_request_prio(from, prio); > > + sched_engine->bump_inflight_request_prio(from, attr); > > > > WRITE_ONCE(node->attr.priority, prio); > > > > @@ -280,7 +280,7 @@ static void __i915_schedule(struct i915_sched_node > > *node, > > > > /* Defer (tasklet) submission until after all of
Re: [Intel-gfx] [RFC PATCH] drm: Increase DRM_OBJECT_MAX_PROPERTY by 18.
On Tue, Oct 05, 2021 at 08:51:51AM +0200, Sebastian Andrzej Siewior wrote: > The warning popped up; it says to increase it by the number of occurrences. > I saw it 18 times, so here it is. > It started to show up since commit > 2f425cf5242a0 ("drm: Fix oops in damage self-tests by mocking damage > property") > > Increase DRM_OBJECT_MAX_PROPERTY by 18. > > Signed-off-by: Sebastian Andrzej Siewior Which driver where? Whoever added that into upstream should also have realized this (things will just not work) and include it in there. So if things are tested correctly this should be part of a larger series to add these 18 props somewhere. Also maybe we should just dynamically allocate this array if people have this many properties on their objects. -Daniel > --- > > I have no idea whether this is correct or just a symptom of another > problem. This has been observed with i915 and full debug. > > include/drm/drm_mode_object.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/include/drm/drm_mode_object.h b/include/drm/drm_mode_object.h > index c34a3e8030e12..1e5399e47c3a5 100644 > --- a/include/drm/drm_mode_object.h > +++ b/include/drm/drm_mode_object.h > @@ -60,7 +60,7 @@ struct drm_mode_object { > void (*free_cb)(struct kref *kref); > }; > > -#define DRM_OBJECT_MAX_PROPERTY 24 > +#define DRM_OBJECT_MAX_PROPERTY 42 > /** > * struct drm_object_properties - property tracking for &drm_mode_object > */ > -- > 2.33.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
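A rough sketch of the dynamic-allocation idea floated above, assuming struct drm_object_properties were changed to hold pointers plus a capacity instead of its fixed DRM_OBJECT_MAX_PROPERTY arrays; the function name and layout are hypothetical:

/* Hypothetical: grow the property/value arrays on demand rather than
 * sizing them with DRM_OBJECT_MAX_PROPERTY at compile time. */
static int drm_object_grow_properties(struct drm_object_properties *props,
				      unsigned int want)
{
	struct drm_property **p;
	uint64_t *v;

	p = krealloc_array(props->properties, want, sizeof(*p), GFP_KERNEL);
	if (!p)
		return -ENOMEM;
	props->properties = p;

	v = krealloc_array(props->values, want, sizeof(*v), GFP_KERNEL);
	if (!v)
		return -ENOMEM;
	props->values = v;

	return 0;
}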
Re: [Intel-gfx] [PATCH] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup
On Tue, Oct 05, 2021 at 03:05:25PM +0200, Thomas Hellström wrote: > Hi, Tvrtko, > > On 10/5/21 13:31, Tvrtko Ursulin wrote: > > From: Tvrtko Ursulin > > > > In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa) > > when rendering is done on Intel dgfx and scanout/composition on Intel > > igfx. > > > > Before this patch the driver was not quite ready for that setup, mainly > > because it was able to emit a semaphore wait between the two GPUs, which > > results in deadlocks because semaphore target location in HWSP is neither > > shared between the two, nor mapped in both GGTT spaces. > > > > To fix it the patch adds an additional check to a couple of relevant code > > paths in order to prevent using semaphores for inter-engine > > synchronisation when relevant objects are not in the same GGTT space. > > > > v2: > > * Avoid adding rq->i915. (Chris) > > > > v3: > > * Use GGTT which describes the limit more precisely. > > > > Signed-off-by: Tvrtko Ursulin > > Cc: Daniel Vetter > > Cc: Matthew Auld > > Cc: Thomas Hellström > > An IMO pretty important bugfix. I read up a bit on the previous discussion on this, and from what I understand the other two options were > > 1) Ripping out the semaphore code, > 2) Consider dma-fences from other instances of the same driver as foreign. > > For imported dma-bufs we do 2), but particularly with lmem and p2p that's a > more straightforward decision. > > I don't think 1) is a reasonable approach to fix this bug, (but perhaps as a > general cleanup?), and for 2) yes I guess we might end up doing that, unless > we find some real benefits in treating same-driver-separate-device > dma-fences as local, but for this particular bug, IMO this is a reasonable > fix. The foreign dma-fences have uapi impact, which Tvrtko shrugged off as "it's a good idea", and no, it's really just not. So we still need to do that properly. > Reviewed-by: Thomas Hellström But I'm also ok with just merging this as-is so the situation doesn't become too entertaining. -Daniel > > > > > > > --- > > drivers/gpu/drm/i915/i915_request.c | 12 +++- > > 1 file changed, 11 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_request.c > > b/drivers/gpu/drm/i915/i915_request.c > > index 79da5eca60af..4f189982f67e 100644 > > --- a/drivers/gpu/drm/i915/i915_request.c > > +++ b/drivers/gpu/drm/i915/i915_request.c > > @@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to, > > return 0; > > } > > +static bool > > +can_use_semaphore_wait(struct i915_request *to, struct i915_request *from) > > +{ > > + return to->engine->gt->ggtt == from->engine->gt->ggtt; > > +} > > + > > static int > > emit_semaphore_wait(struct i915_request *to, > > struct i915_request *from, > > @@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to, > > const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask; > > struct i915_sw_fence *wait = &to->submit; > > + if (!can_use_semaphore_wait(to, from)) > > + goto await_fence; > > + > > if (!intel_context_use_semaphores(to->context)) > > goto await_fence; > > @@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request > > *to, > > * immediate execution, and so we must wait until it reaches the > > * active slot.
> > */ > > - if (intel_engine_has_semaphores(to->engine) && > > + if (can_use_semaphore_wait(to, from) && > > + intel_engine_has_semaphores(to->engine) && > > !i915_request_has_initial_breadcrumb(to)) { > > err = __emit_semaphore_wait(to, from, from->fence.seqno - 1); > > if (err < 0) -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 1/4] drm: Introduce drm_modeset_lock_ctx_retry()
On Wed, Oct 13, 2021 at 01:59:47PM +0200, Daniel Vetter wrote: > On Mon, Oct 04, 2021 at 02:15:51PM +0300, Ville Syrjälä wrote: > > On Tue, Jul 20, 2021 at 03:44:49PM +0200, Daniel Vetter wrote: > > > On Thu, Jul 15, 2021 at 09:49:51PM +0300, Ville Syrjala wrote: > > > > From: Ville Syrjälä > > > > > > > > Quite a few places are hand rolling the modeset lock backoff dance. > > > > Let's suck that into a helper macro that is easier to use without > > > > forgetting some steps. > > > > > > > > The main downside is probably that the implementation of > > > > drm_with_modeset_lock_ctx() is a bit harder to read than a hand > > > > rolled version on account of being split across three functions, > > > > but the actual code using it ends up being much simpler. > > > > > > > > Cc: Sean Paul > > > > Cc: Daniel Vetter > > > > Signed-off-by: Ville Syrjälä > > > > --- > > > > drivers/gpu/drm/drm_modeset_lock.c | 44 ++ > > > > include/drm/drm_modeset_lock.h | 20 ++ > > > > 2 files changed, 64 insertions(+) > > > > > > > > diff --git a/drivers/gpu/drm/drm_modeset_lock.c > > > > b/drivers/gpu/drm/drm_modeset_lock.c > > > > index fcfe1a03c4a1..083df96632e8 100644 > > > > --- a/drivers/gpu/drm/drm_modeset_lock.c > > > > +++ b/drivers/gpu/drm/drm_modeset_lock.c > > > > @@ -425,3 +425,47 @@ int drm_modeset_lock_all_ctx(struct drm_device > > > > *dev, > > > > return 0; > > > > } > > > > EXPORT_SYMBOL(drm_modeset_lock_all_ctx); > > > > + > > > > +void _drm_modeset_lock_begin(struct drm_modeset_acquire_ctx *ctx, > > > > +struct drm_atomic_state *state, > > > > +unsigned int flags, int *ret) > > > > +{ > > > > + drm_modeset_acquire_init(ctx, flags); > > > > + > > > > + if (state) > > > > + state->acquire_ctx = ctx; > > > > + > > > > + *ret = -EDEADLK; > > > > +} > > > > +EXPORT_SYMBOL(_drm_modeset_lock_begin); > > > > + > > > > +bool _drm_modeset_lock_loop(int *ret) > > > > +{ > > > > + if (*ret == -EDEADLK) { > > > > + *ret = 0; > > > > + return true; > > > > + } > > > > + > > > > + return false; > > > > +} > > > > +EXPORT_SYMBOL(_drm_modeset_lock_loop); > > > > + > > > > +void _drm_modeset_lock_end(struct drm_modeset_acquire_ctx *ctx, > > > > + struct drm_atomic_state *state, > > > > + int *ret) > > > > +{ > > > > + if (*ret == -EDEADLK) { > > > > + if (state) > > > > + drm_atomic_state_clear(state); > > > > + > > > > + *ret = drm_modeset_backoff(ctx); > > > > + if (*ret == 0) { > > > > + *ret = -EDEADLK; > > > > + return; > > > > + } > > > > + } > > > > + > > > > + drm_modeset_drop_locks(ctx); > > > > + drm_modeset_acquire_fini(ctx); > > > > +} > > > > +EXPORT_SYMBOL(_drm_modeset_lock_end); > > > > diff --git a/include/drm/drm_modeset_lock.h > > > > b/include/drm/drm_modeset_lock.h > > > > index aafd07388eb7..5eaad2533de5 100644 > > > > --- a/include/drm/drm_modeset_lock.h > > > > +++ b/include/drm/drm_modeset_lock.h > > > > @@ -26,6 +26,7 @@ > > > > > > > > #include > > > > > > > > +struct drm_atomic_state; > > > > struct drm_modeset_lock; > > > > > > > > /** > > > > @@ -203,4 +204,23 @@ modeset_lock_fail: > > > > \ > > > > if (!drm_drv_uses_atomic_modeset(dev)) > > > > \ > > > > mutex_unlock(&dev->mode_config.mutex); > > > > > > > > +void _drm_modeset_lock_begin(struct drm_modeset_acquire_ctx *ctx, > > > > +struct drm_atomic_state *state, > > > > +unsigned int flags, > > > > +int *ret); > > > > +bool _drm_modeset_lock_loop(int *ret); > > > > +void _drm_modeset_lock_end(struct drm_modeset_acquire_ctx *ctx, > > > > + struct drm_atomic_state *state, > > > > + int *ret); > > > > + > > > > +/* > > > > + * Note 
that one must always use "continue" rather than > > > > + * "break" or "return" to handle errors within the > > > > + * drm_modeset_lock_ctx_retry() block. > > > > > > I'm not sold on loop macros with these kind of restrictions, C just isn't > > > a great language for these. That's why e.g. drm_connector_iter doesn't > > > give you a macro, but only the begin/next/end function calls explicitly. > > > > We already use this pattern extensively in i915. Gem ww ctx has one, > > power domains/pps/etc. use a similar things. It makes the code pretty nice, > > with the slight caveat that an accidental 'break' can ruin your day. But > > so can an accidental return with other constructs (and we even had that > > happen a few times with the connector iterators), so not a dealbreaker > > IMO. > > > >
Re: [Intel-gfx] [PATCH 03/11] drm/i915: Restructure probe to handle multi-tile platforms
On Fri, 08 Oct 2021, Matt Roper wrote: > On a multi-tile platform, each tile has its own registers + GGTT space, > and BAR 0 is extended to cover all of them. Upcoming patches will start > exposing the tiles as multiple GTs within a single PCI device. In > preparation for supporting such setups, restructure the driver's probe > code a bit. > > Only the primary/root tile is initialized for now; the other tiles will > be detected and plugged in by future patches once the necessary > infrastructure is in place to handle them. > > Original-author: Abdiel Janulgue > Cc: Daniele Ceraolo Spurio > Cc: Matthew Auld > Cc: Joonas Lahtinen > Signed-off-by: Daniele Ceraolo Spurio > Signed-off-by: Tvrtko Ursulin > Signed-off-by: Matt Roper > --- > drivers/gpu/drm/i915/gt/intel_gt.c | 45 > drivers/gpu/drm/i915/gt/intel_gt.h | 3 ++ > drivers/gpu/drm/i915/gt/intel_gt_pm.c| 9 - > drivers/gpu/drm/i915/gt/intel_gt_types.h | 5 +++ > drivers/gpu/drm/i915/i915_drv.c | 20 +-- > drivers/gpu/drm/i915/intel_uncore.c | 12 +++ > drivers/gpu/drm/i915/intel_uncore.h | 3 +- > 7 files changed, 76 insertions(+), 21 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c > b/drivers/gpu/drm/i915/gt/intel_gt.c > index 1cb1948ac959..f4bea1f1de77 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt.c > @@ -900,6 +900,51 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, > i915_reg_t reg) > return intel_uncore_read_fw(gt->uncore, reg); > } > > +static int > +tile_setup(struct intel_gt *gt, unsigned int id, phys_addr_t phys_addr) > +{ > + int ret; > + > + intel_uncore_init_early(gt->uncore, gt->i915); > + > + ret = intel_uncore_setup_mmio(gt->uncore, phys_addr); > + if (ret) > + return ret; > + > + gt->phys_addr = phys_addr; > + > + return 0; > +} > + > +static void tile_cleanup(struct intel_gt *gt) > +{ > + intel_uncore_cleanup_mmio(gt->uncore); > +} > + > +int intel_probe_gts(struct drm_i915_private *i915) > +{ > + struct pci_dev *pdev = to_pci_dev(i915->drm.dev); > + phys_addr_t phys_addr; > + unsigned int mmio_bar; > + int ret; > + > + mmio_bar = GRAPHICS_VER(i915) == 2 ? 1 : 0; > + phys_addr = pci_resource_start(pdev, mmio_bar); > + > + /* We always have at least one primary GT on any device */ > + ret = tile_setup(&i915->gt, 0, phys_addr); > + if (ret) > + return ret; > + > + /* TODO: add more tiles */ > + return 0; > +} > + > +void intel_gts_release(struct drm_i915_private *i915) > +{ > + tile_cleanup(&i915->gt); > +} Please call the functions intel_gt_*. BR, Jani. 
> + > void intel_gt_info_print(const struct intel_gt_info *info, >struct drm_printer *p) > { > diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h > b/drivers/gpu/drm/i915/gt/intel_gt.h > index 74e771871a9b..f4f35a70cbe4 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt.h > +++ b/drivers/gpu/drm/i915/gt/intel_gt.h > @@ -85,6 +85,9 @@ static inline bool intel_gt_needs_read_steering(struct > intel_gt *gt, > > u32 intel_gt_read_register_fw(struct intel_gt *gt, i915_reg_t reg); > > +int intel_probe_gts(struct drm_i915_private *i915); > +void intel_gts_release(struct drm_i915_private *i915); > + > void intel_gt_info_print(const struct intel_gt_info *info, >struct drm_printer *p); > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c > b/drivers/gpu/drm/i915/gt/intel_gt_pm.c > index 524eaf678790..76f498edb0d5 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c > @@ -126,7 +126,14 @@ static const struct intel_wakeref_ops wf_ops = { > > void intel_gt_pm_init_early(struct intel_gt *gt) > { > - intel_wakeref_init(>->wakeref, gt->uncore->rpm, &wf_ops); > + /* > + * We access the runtime_pm structure via gt->i915 here rather than > + * gt->uncore as we do elsewhere in the file because gt->uncore is not > + * yet initialized for all tiles at this point in the driver startup. > + * runtime_pm is per-device rather than per-tile, so this is still the > + * correct structure. > + */ > + intel_wakeref_init(>->wakeref, >->i915->runtime_pm, &wf_ops); > seqcount_mutex_init(>->stats.lock, >->wakeref.mutex); > } > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h > b/drivers/gpu/drm/i915/gt/intel_gt_types.h > index 14216cc471b1..66143316d92e 100644 > --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h > +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h > @@ -180,6 +180,11 @@ struct intel_gt { > > const struct intel_mmio_range *steering_table[NUM_STEERING_TYPES]; > > + /* > + * Base of per-tile GTTMMADR where we can derive the MMIO and the GGTT. > + */ > + phys_addr_t phys_addr; > + > struct intel_gt_info { > intel_engine_mask_t engine_mask;
Re: [Intel-gfx] [RFC PATCH] drm: Increase DRM_OBJECT_MAX_PROPERTY by 18.
On 2021-10-13 14:02:59 [+0200], Daniel Vetter wrote: > On Tue, Oct 05, 2021 at 08:51:51AM +0200, Sebastian Andrzej Siewior wrote: > > The warning popped up; it says to increase it by the number of occurrences. > > I saw it 18 times, so here it is. > > It started to show up since commit > > 2f425cf5242a0 ("drm: Fix oops in damage self-tests by mocking damage > > property") > > > > Increase DRM_OBJECT_MAX_PROPERTY by 18. > > > > Signed-off-by: Sebastian Andrzej Siewior > > Which driver where? Whoever added that into upstream should also have > realized this (things will just not work) and include it in there. So if > things are tested correctly this should be part of a larger series to add > these 18 props somewhere. This is on i915 with full debug. If I remember correctly, it wasn't there before commit c7fcbf2513973 ("drm/plane: check that fb_damage is set up when used") With that commit the box crashed until commit 2f425cf5242a0 ("drm: Fix oops in damage self-tests by mocking damage property") where I then observed this. > Also maybe we should just dynamically allocate this array if people have > this many properties on their objects. > -Daniel Sebastian
Re: [Intel-gfx] [PATCH 1/6] drm/i915: Update dma_fence_work
On Fri, Oct 08, 2021 at 03:35:25PM +0200, Thomas Hellström wrote: > Move the release callback to after fence signaling to align with > what's done for upcoming VM_BIND user-fence signaling. > > Finally call the work callback regardless of whether we have a fence > error or not and update the existing callbacks accordingly. We will > need this to intercept the error for failsafe migration. > > Signed-off-by: Thomas Hellström I think before we make this thing more complex we really should either move this into dma-buf/ as a proper thing, or just open-code. Minimally at least any new async dma_fence worker needs to have dma_fence_begin/end_signalling annotations, or we're just digging a grave here. I'm also not seeing the point in building everything on top of this, for many cases just an open-coded work_struct should be a lot simpler. It's just more to clean up later on, that part is for sure. -Daniel > --- > drivers/gpu/drm/i915/gem/i915_gem_clflush.c | 5 +++ > drivers/gpu/drm/i915/i915_sw_fence_work.c | 36 ++--- > drivers/gpu/drm/i915/i915_sw_fence_work.h | 1 + > drivers/gpu/drm/i915/i915_vma.c | 12 +-- > 4 files changed, 33 insertions(+), 21 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c > b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c > index f0435c6feb68..2143ebaf5b6f 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c > @@ -28,6 +28,11 @@ static void clflush_work(struct dma_fence_work *base) > { > struct clflush *clflush = container_of(base, typeof(*clflush), base); > > + if (base->error) { > + dma_fence_set_error(&base->dma, base->error); > + return; > + } > + > __do_clflush(clflush->obj); > } > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c > b/drivers/gpu/drm/i915/i915_sw_fence_work.c > index 5b33ef23d54c..5b55cddafc9b 100644 > --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c > +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c > @@ -6,21 +6,24 @@ > > #include "i915_sw_fence_work.h" > > -static void fence_complete(struct dma_fence_work *f) > +static void dma_fence_work_complete(struct dma_fence_work *f) > { > + dma_fence_signal(&f->dma); > + > if (f->ops->release) > f->ops->release(f); > - dma_fence_signal(&f->dma); > + > + dma_fence_put(&f->dma); > } > > -static void fence_work(struct work_struct *work) > +static void dma_fence_work_work(struct work_struct *work) > { > struct dma_fence_work *f = container_of(work, typeof(*f), work); > > - f->ops->work(f); > + if (f->ops->work) > + f->ops->work(f); > > - fence_complete(f); > - dma_fence_put(&f->dma); > + dma_fence_work_complete(f); > } > > static int __i915_sw_fence_call > @@ -31,17 +34,13 @@ fence_notify(struct i915_sw_fence *fence, enum > i915_sw_fence_notify state) > switch (state) { > case FENCE_COMPLETE: > if (fence->error) > - dma_fence_set_error(&f->dma, fence->error); > - > - if (!f->dma.error) { > - dma_fence_get(&f->dma); > - if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags)) > - fence_work(&f->work); > - else > - queue_work(system_unbound_wq, &f->work); > - } else { > - fence_complete(f); > - } > + cmpxchg(&f->error, 0, fence->error); > + > + dma_fence_get(&f->dma); > + if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags)) > + dma_fence_work_work(&f->work); > + else > + queue_work(system_unbound_wq, &f->work); > break; > > case FENCE_FREE: > @@ -84,10 +83,11 @@ void dma_fence_work_init(struct dma_fence_work *f, >const struct dma_fence_work_ops *ops) > { > f->ops = ops; > + f->error = 0; > spin_lock_init(&f->lock); > dma_fence_init(&f->dma, 
&fence_ops, &f->lock, 0, 0); > i915_sw_fence_init(&f->chain, fence_notify); > - INIT_WORK(&f->work, fence_work); > + INIT_WORK(&f->work, dma_fence_work_work); > } > > int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal) > diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h > b/drivers/gpu/drm/i915/i915_sw_fence_work.h > index d56806918d13..caa59fb5252b 100644 > --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h > +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h > @@ -24,6 +24,7 @@ struct dma_fence_work_ops { > struct dma_fence_work { > struct dma_fence dma; > spinlock_t lock; > + int error; > > struct i915_sw_fence chain; > struct i915_sw_dma_fence_cb cb; > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c > index 4b7fc4647e46..5123ac28ad9a 100644 > --- a/d
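For readers unfamiliar with the annotations Daniel is asking for: a minimal sketch of an open-coded async worker using dma_fence_begin/end_signalling (the my_async_op structure and names are made up for illustration, this is not code from the patch):

#include <linux/dma-fence.h>
#include <linux/workqueue.h>

struct my_async_op {
        struct dma_fence fence;         /* published to other drivers */
        struct work_struct work;
};

static void my_async_op_work(struct work_struct *work)
{
        struct my_async_op *op = container_of(work, typeof(*op), work);
        bool cookie;

        /* Everything between begin/end is on the fence signalling
         * critical path; lockdep will then flag memory allocations or
         * locks taken in here that could recurse into reclaim. */
        cookie = dma_fence_begin_signalling();
        /* ... do the actual asynchronous processing ... */
        dma_fence_signal(&op->fence);
        dma_fence_end_signalling(cookie);
        dma_fence_put(&op->fence);      /* drop the worker's reference */
}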
Re: [Intel-gfx] [PATCH 4/6] drm/i915: Add a struct dma_fence_work timeline
On Fri, Oct 08, 2021 at 03:35:28PM +0200, Thomas Hellström wrote: > The TTM managers and, possibly, the gtt address space managers will > need to be able to order fences for async operation. > Using dma_fence_is_later() for this will require that the fences we hand > them are from a single fence context and ordered. > > Introduce a struct dma_fence_work_timeline, and a function to attach > struct dma_fence_work to such a timeline in a way that all previous > fences attached to the timeline will be signaled when the latest > attached struct dma_fence_work signals. > > Signed-off-by: Thomas Hellström I'm not understanding why we need this: - if we just want to order dma_fence work, then an ordered workqueue is what we want. Which is why hand-rolling is better than reusing dma_fence_work for absolutely everything. - if we just need to make sure the public fences signal in order, then it's a dma_fence_chain. Definitely no more "it looks like it's shared code but isn't" stuff in i915. -Daniel > --- > drivers/gpu/drm/i915/i915_sw_fence_work.c | 89 ++- > drivers/gpu/drm/i915/i915_sw_fence_work.h | 58 +++ > 2 files changed, 145 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c > b/drivers/gpu/drm/i915/i915_sw_fence_work.c > index 5b55cddafc9b..87cdb3158042 100644 > --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c > +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c > @@ -5,6 +5,66 @@ > */ > > #include "i915_sw_fence_work.h" > +#include "i915_utils.h" > + > +/** > + * dma_fence_work_timeline_attach - Attach a struct dma_fence_work to a > + * timeline. > + * @tl: The timeline to attach to. > + * @f: The struct dma_fence_work. > + * @tl_cb: The i915_sw_dma_fence_cb needed to attach to the > + * timeline. This is typically embedded into the structure that also > + * embeds the struct dma_fence_work. > + * > + * This function takes a timeline reference and associates it with the > + * struct dma_fence_work. That reference is given up when the fence > + * signals. Furthermore it assigns a fence context and a seqno to the > + * dma-fence, and then chains upon the previous fence of the timeline > + * if any, to make sure that the fence signals after that fence. The > + * @tl_cb callback structure is needed for that chaining. Finally > + * the registered last fence of the timeline is replaced by this fence, and > + * the timeline takes a reference on the fence, which is released when > + * the fence signals. 
> + */ > +void dma_fence_work_timeline_attach(struct dma_fence_work_timeline *tl, > + struct dma_fence_work *f, > + struct i915_sw_dma_fence_cb *tl_cb) > +{ > + struct dma_fence *await; > + > + if (tl->ops->get) > + tl->ops->get(tl); > + > + spin_lock(&tl->lock); > + await = tl->last_fence; > + tl->last_fence = dma_fence_get(&f->dma); > + f->dma.seqno = tl->seqno++; > + f->dma.context = tl->context; > + f->tl = tl; > + spin_unlock(&tl->lock); > + > + if (await) { > + __i915_sw_fence_await_dma_fence(&f->chain, await, tl_cb); > + dma_fence_put(await); > + } > +} > + > +static void dma_fence_work_timeline_detach(struct dma_fence_work *f) > +{ > + struct dma_fence_work_timeline *tl = f->tl; > + bool put = false; > + > + spin_lock(&tl->lock); > + if (tl->last_fence == &f->dma) { > + put = true; > + tl->last_fence = NULL; > + } > + spin_unlock(&tl->lock); > + if (tl->ops->put) > + tl->ops->put(tl); > + if (put) > + dma_fence_put(&f->dma); > +} > > static void dma_fence_work_complete(struct dma_fence_work *f) > { > @@ -13,6 +73,9 @@ static void dma_fence_work_complete(struct dma_fence_work > *f) > if (f->ops->release) > f->ops->release(f); > > + if (f->tl) > + dma_fence_work_timeline_detach(f); > + > dma_fence_put(&f->dma); > } > > @@ -53,14 +116,17 @@ fence_notify(struct i915_sw_fence *fence, enum > i915_sw_fence_notify state) > > static const char *get_driver_name(struct dma_fence *fence) > { > - return "dma-fence"; > + struct dma_fence_work *f = container_of(fence, typeof(*f), dma); > + > + return (f->tl && f->tl->ops->name) ? f->tl->ops->name : "dma-fence"; > } > > static const char *get_timeline_name(struct dma_fence *fence) > { > struct dma_fence_work *f = container_of(fence, typeof(*f), dma); > > - return f->ops->name ?: "work"; > + return (f->tl && f->tl->name) ? f->tl->name : > + f->ops->name ?: "work"; > } > > static void fence_release(struct dma_fence *fence) > @@ -84,6 +150,7 @@ void dma_fence_work_init(struct dma_fence_work *f, > { > f->ops = ops; > f->error = 0; > + f->tl = NULL; > spin_lock_init(&f->lock); > dma_fence_init(&f->dma, &fence_ops, &f->lock, 0, 0); > i
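For comparison, a rough sketch of the dma_fence_chain alternative Daniel mentions, assuming a driver-private last-fence pointer and seqno counter that the caller serializes with its own lock (names are illustrative):

#include <linux/dma-fence-chain.h>

/* Publish @fence as the next point on a private timeline. The chain
 * node consumes the references to both the previous point and @fence,
 * and only signals once the previous point has signaled as well. */
static int publish_on_timeline(struct dma_fence **last, u64 *seqno,
                               struct dma_fence *fence)
{
        struct dma_fence_chain *chain = dma_fence_chain_alloc();

        if (!chain)
                return -ENOMEM;

        dma_fence_chain_init(chain, *last, fence, ++*seqno);
        *last = &chain->base;   /* timeline now owns the chain node */
        return 0;
}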
Re: [Intel-gfx] [RFC PATCH] drm: Increase DRM_OBJECT_MAX_PROPERTY by 18.
On Wed, Oct 13, 2021 at 02:35:25PM +0200, Sebastian Andrzej Siewior wrote: > On 2021-10-13 14:02:59 [+0200], Daniel Vetter wrote: > > On Tue, Oct 05, 2021 at 08:51:51AM +0200, Sebastian Andrzej Siewior wrote: > > > The warning popped up, it says to increase it by the number of occurrences. > > > I saw it 18 times so here it is. > > > It started to show up since commit > > > 2f425cf5242a0 ("drm: Fix oops in damage self-tests by mocking damage > > > property") > > > > > > Increase DRM_OBJECT_MAX_PROPERTY by 18. > > > > > > Signed-off-by: Sebastian Andrzej Siewior > > > > Which driver where? Whomever added that into upstream should also have > > realized this (things will just not work) and include it in there. So if > > things are tested correctly this should be part of a larger series to add > > these 18 props somewhere. > > This is on i915 with full debug. If I remember correctly, it wasn't > there before commit > c7fcbf2513973 ("drm/plane: check that fb_damage is set up when used") > > With that commit the box crashed until commit > 2f425cf5242a0 ("drm: Fix oops in damage self-tests by mocking damage > property") > > where I then observed this. Hm there's a pile of commits there, and nothing immediately jumps out. The thing is, 18 is likely way too many, since if e.g. we have a single new property on a plane and that pushes over the limit on all of them, you get iirc 3x4 already simply because we have that many planes. So would be good to know the actual culprit. Can you pls try to bisect the above range, applying the patch as a fixup locally (without commit, that will confuse git bisect a bit I think), so we know what/where went wrong? I'm still confused why this isn't showing up anywhere in our intel ci ... Thanks, Daniel > > > Also maybe we should just dynamically allocate this array if people have > > this many properties on their objects. > > -Daniel > > Sebastian -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
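To make the dynamic-allocation idea at the end concrete, a rough, untested sketch; it assumes the fixed-size arrays in struct drm_object_properties were first converted to pointers plus an alloc count, which is not how the structure looks today:

static int drm_object_grow_properties(struct drm_object_properties *props,
                                      unsigned int min_count)
{
        struct drm_property **p;
        uint64_t *v;

        if (min_count <= props->alloc)
                return 0;

        p = krealloc_array(props->properties, min_count, sizeof(*p),
                           GFP_KERNEL);
        if (!p)
                return -ENOMEM;
        props->properties = p;

        v = krealloc_array(props->values, min_count, sizeof(*v), GFP_KERNEL);
        if (!v)
                return -ENOMEM;
        props->values = v;

        props->alloc = min_count;
        return 0;
}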
Re: [Intel-gfx] [PATCH 1/6] drm/i915: Update dma_fence_work
On 10/13/21 14:41, Daniel Vetter wrote: On Fri, Oct 08, 2021 at 03:35:25PM +0200, Thomas Hellström wrote: Move the release callback to after fence signaling to align with what's done for upcoming VM_BIND user-fence signaling. Finally call the work callback regardless of whether we have a fence error or not and update the existing callbacks accordingly. We will need this to intercept the error for failsafe migration. Signed-off-by: Thomas Hellström I think before we make this thing more complex we really should either move this into dma-buf/ as a proper thing, or just open-code. Minimally at least any new async dma_fence worker needs to have dma_fence_begin/end_signalling annotations, or we're just digging a grave here. I'm also not seeing the point in building everything on top of this, for many cases just an open-coded work_struct should be a lot simpler. It's just more to clean up later on, that part is for sure. -Daniel Yes, I mentioned to Matthew, I'm going to respin this based on our previous discussions. Forgot to mention on the ML. /Thomas --- drivers/gpu/drm/i915/gem/i915_gem_clflush.c | 5 +++ drivers/gpu/drm/i915/i915_sw_fence_work.c | 36 ++--- drivers/gpu/drm/i915/i915_sw_fence_work.h | 1 + drivers/gpu/drm/i915/i915_vma.c | 12 +-- 4 files changed, 33 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c index f0435c6feb68..2143ebaf5b6f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c @@ -28,6 +28,11 @@ static void clflush_work(struct dma_fence_work *base) { struct clflush *clflush = container_of(base, typeof(*clflush), base); + if (base->error) { + dma_fence_set_error(&base->dma, base->error); + return; + } + __do_clflush(clflush->obj); } diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.c b/drivers/gpu/drm/i915/i915_sw_fence_work.c index 5b33ef23d54c..5b55cddafc9b 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence_work.c +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.c @@ -6,21 +6,24 @@ #include "i915_sw_fence_work.h" -static void fence_complete(struct dma_fence_work *f) +static void dma_fence_work_complete(struct dma_fence_work *f) { + dma_fence_signal(&f->dma); + if (f->ops->release) f->ops->release(f); - dma_fence_signal(&f->dma); + + dma_fence_put(&f->dma); } -static void fence_work(struct work_struct *work) +static void dma_fence_work_work(struct work_struct *work) { struct dma_fence_work *f = container_of(work, typeof(*f), work); - f->ops->work(f); + if (f->ops->work) + f->ops->work(f); - fence_complete(f); - dma_fence_put(&f->dma); + dma_fence_work_complete(f); } static int __i915_sw_fence_call @@ -31,17 +34,13 @@ fence_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) switch (state) { case FENCE_COMPLETE: if (fence->error) - dma_fence_set_error(&f->dma, fence->error); - - if (!f->dma.error) { - dma_fence_get(&f->dma); - if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags)) - fence_work(&f->work); - else - queue_work(system_unbound_wq, &f->work); - } else { - fence_complete(f); - } + cmpxchg(&f->error, 0, fence->error); + + dma_fence_get(&f->dma); + if (test_bit(DMA_FENCE_WORK_IMM, &f->dma.flags)) + dma_fence_work_work(&f->work); + else + queue_work(system_unbound_wq, &f->work); break; case FENCE_FREE: @@ -84,10 +83,11 @@ void dma_fence_work_init(struct dma_fence_work *f, const struct dma_fence_work_ops *ops) { f->ops = ops; + f->error = 0; spin_lock_init(&f->lock); dma_fence_init(&f->dma, &fence_ops, &f->lock, 0, 0); 
i915_sw_fence_init(&f->chain, fence_notify); - INIT_WORK(&f->work, fence_work); + INIT_WORK(&f->work, dma_fence_work_work); } int dma_fence_work_chain(struct dma_fence_work *f, struct dma_fence *signal) diff --git a/drivers/gpu/drm/i915/i915_sw_fence_work.h b/drivers/gpu/drm/i915/i915_sw_fence_work.h index d56806918d13..caa59fb5252b 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence_work.h +++ b/drivers/gpu/drm/i915/i915_sw_fence_work.h @@ -24,6 +24,7 @@ struct dma_fence_work_ops { struct dma_fence_work { struct dma_fence dma; spinlock_t lock; + int error; struct i915_sw_fence chain; struct i915_sw_dma_fence_cb cb; diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c i
Re: [Intel-gfx] [PATCH v2] drm/locking: add backtrace for locking contended locks without backoff
On Fri, 01 Oct 2021, Jani Nikula wrote: > If drm_modeset_lock() returns -EDEADLK, the caller is supposed to drop > all currently held locks using drm_modeset_backoff(). Failing to do so > will result in warnings and backtraces on the paths trying to lock a > contended lock. Add support for optionally printing the backtrace on the > path that hit the deadlock and didn't gracefully handle the situation. > > For example, the patch [1] inadvertently dropped the return value check > and error return on replacing calc_watermark_data() with > intel_compute_global_watermarks(). The backtraces on the subsequent > locking paths hitting WARN_ON(ctx->contended) were unhelpful, but adding > the backtrace to the deadlock path produced this helpful printout: > > <7> [98.002465] drm_modeset_lock attempting to lock a contended lock without > backoff: >drm_modeset_lock+0x107/0x130 >drm_atomic_get_plane_state+0x76/0x150 >skl_compute_wm+0x251d/0x2b20 [i915] >intel_atomic_check+0x1942/0x29e0 [i915] >drm_atomic_check_only+0x554/0x910 >drm_atomic_nonblocking_commit+0xe/0x50 >drm_mode_atomic_ioctl+0x8c2/0xab0 >drm_ioctl_kernel+0xac/0x140 > > Add new CONFIG_DRM_DEBUG_MODESET_LOCK to enable modeset lock debugging > with stack depot and trace. > > [1] https://lore.kernel.org/r/20210924114741.15940-4-jani.nik...@intel.com > > v2: > - default y if DEBUG_WW_MUTEX_SLOWPATH (Daniel) > - depends on DEBUG_KERNEL > > Cc: Daniel Vetter > Cc: Dave Airlie > Reviewed-by: Daniel Vetter > Signed-off-by: Jani Nikula Pushed to drm-misc-next, thanks for the review. BR, Jani. > --- > drivers/gpu/drm/Kconfig| 15 + > drivers/gpu/drm/drm_modeset_lock.c | 49 -- > include/drm/drm_modeset_lock.h | 8 + > 3 files changed, 70 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig > index 2a926d0de423..a4c020a9a0eb 100644 > --- a/drivers/gpu/drm/Kconfig > +++ b/drivers/gpu/drm/Kconfig > @@ -100,6 +100,21 @@ config DRM_DEBUG_DP_MST_TOPOLOGY_REFS >This has the potential to use a lot of memory and print some very >large kernel messages. If in doubt, say "N". > > +config DRM_DEBUG_MODESET_LOCK > + bool "Enable backtrace history for lock contention" > + depends on STACKTRACE_SUPPORT > + depends on DEBUG_KERNEL > + depends on EXPERT > + select STACKDEPOT > + default y if DEBUG_WW_MUTEX_SLOWPATH > + help > + Enable debug tracing of failures to gracefully handle drm modeset lock > + contention. A history of each drm modeset lock path hitting -EDEADLK > + will be saved until gracefully handled, and the backtrace will be > + printed when attempting to lock a contended lock. > + > + If in doubt, say "N". 
> + > config DRM_FBDEV_EMULATION > bool "Enable legacy fbdev support for your modesetting driver" > depends on DRM > diff --git a/drivers/gpu/drm/drm_modeset_lock.c > b/drivers/gpu/drm/drm_modeset_lock.c > index bf8a6e823a15..4d32b61fa1fd 100644 > --- a/drivers/gpu/drm/drm_modeset_lock.c > +++ b/drivers/gpu/drm/drm_modeset_lock.c > @@ -25,6 +25,7 @@ > #include > #include > #include > +#include > > /** > * DOC: kms locking > @@ -77,6 +78,45 @@ > > static DEFINE_WW_CLASS(crtc_ww_class); > > +#if IS_ENABLED(CONFIG_DRM_DEBUG_MODESET_LOCK) > +static noinline depot_stack_handle_t __stack_depot_save(void) > +{ > + unsigned long entries[8]; > + unsigned int n; > + > + n = stack_trace_save(entries, ARRAY_SIZE(entries), 1); > + > + return stack_depot_save(entries, n, GFP_NOWAIT | __GFP_NOWARN); > +} > + > +static void __stack_depot_print(depot_stack_handle_t stack_depot) > +{ > + struct drm_printer p = drm_debug_printer("drm_modeset_lock"); > + unsigned long *entries; > + unsigned int nr_entries; > + char *buf; > + > + buf = kmalloc(PAGE_SIZE, GFP_NOWAIT | __GFP_NOWARN); > + if (!buf) > + return; > + > + nr_entries = stack_depot_fetch(stack_depot, &entries); > + stack_trace_snprint(buf, PAGE_SIZE, entries, nr_entries, 2); > + > + drm_printf(&p, "attempting to lock a contended lock without > backoff:\n%s", buf); > + > + kfree(buf); > +} > +#else /* CONFIG_DRM_DEBUG_MODESET_LOCK */ > +static depot_stack_handle_t __stack_depot_save(void) > +{ > + return 0; > +} > +static void __stack_depot_print(depot_stack_handle_t stack_depot) > +{ > +} > +#endif /* CONFIG_DRM_DEBUG_MODESET_LOCK */ > + > /** > * drm_modeset_lock_all - take all modeset locks > * @dev: DRM device > @@ -225,7 +265,9 @@ EXPORT_SYMBOL(drm_modeset_acquire_fini); > */ > void drm_modeset_drop_locks(struct drm_modeset_acquire_ctx *ctx) > { > - WARN_ON(ctx->contended); > + if (WARN_ON(ctx->contended)) > + __stack_depot_print(ctx->stack_depot); > + > while (!list_empty(&ctx->locked)) { > struct drm_modeset_lock *lock; > > @@ -243,7 +285,8 @@ static i
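For context, the graceful handling this patch helps debug is the standard acquire/backoff retry dance, roughly (a generic sketch of the documented pattern, not taken from any particular driver):

        struct drm_modeset_acquire_ctx ctx;
        int ret;

        drm_modeset_acquire_init(&ctx, 0);
retry:
        ret = drm_modeset_lock(&crtc->mutex, &ctx);
        if (ret == -EDEADLK) {
                /* drop all held locks and wait for the contended one */
                ret = drm_modeset_backoff(&ctx);
                if (!ret)
                        goto retry;
        }
        /* ... use the protected state ... */
        drm_modeset_drop_locks(&ctx);
        drm_modeset_acquire_fini(&ctx);

Skipping the -EDEADLK branch is exactly what produces the WARN_ON(ctx->contended) backtraces the new config option explains.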
Re: [Intel-gfx] [PATCH v2] component: do not leave master devres group open after bind
On Wed, Oct 06, 2021 at 04:47:57PM +0300, Kai Vehmanen wrote: > Hi, > > On Tue, 5 Oct 2021, Greg KH wrote: > > > On Wed, Sep 22, 2021 at 11:54:32AM +0300, Kai Vehmanen wrote: > > > In current code, the devres group for aggregate master is left open > > > after call to component_master_add_*(). This leads to problems when the > > > master does further managed allocations on its own. When any > > > participating driver calls component_del(), this leads to immediate > > > release of resources. > [...] > > > the devres group, and by closing the devres group after > > > the master->ops->bind() call is done. This allows devres allocations > > > done by the driver acting as master to be isolated from the binding state > > > of the aggregate driver. This modifies the logic originally introduced in > > > commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master > > > device") > > > > > > BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136 > > > Signed-off-by: Kai Vehmanen > > > Acked-by: Imre Deak > > > Acked-by: Russell King (Oracle) > > > > What commit does this "fix:"? And does it need to go to stable > > kernel(s)? > > I didn't put a "Fixes" on the original commit 9e1ccb4a7700 > ("drivers/base: fix devres handling for master device") as it alone > didn't cause problems. It did open the door for possible devres issues > for anybody calling component_master_add_(). > > On audio side, this surfaced with the more recent commit 3fcaf24e5dce > ("ALSA: hda: Allocate resources with device-managed APIs"). In theory one > could have hit issues already before, but this made it very easy to hit > on actual systems. > > If I'd have to pick one, it would be 9e1ccb4a7700 ("drivers/base: fix > devres handling for master device"). And yes, given comments on this > thread, I'd say this needs to go to stable kernels. Then please add a fixes: line and a cc: stable line and resend. thanks, greg k-h
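Spelled out, the trailers Greg asks for would look like this in the resend (the commit reference is the one Kai identified above):

        Fixes: 9e1ccb4a7700 ("drivers/base: fix devres handling for master device")
        Cc: stable@vger.kernel.org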
[Intel-gfx] ✗ Fi.CI.IGT: failure for drm/i915: vlv sideband
== Series Details == Series: drm/i915: vlv sideband URL : https://patchwork.freedesktop.org/series/95764/ State : failure == Summary == CI Bug Log - changes from CI_DRM_10728_full -> Patchwork_21327_full Summary --- **FAILURE** Serious unknown changes coming with Patchwork_21327_full absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_21327_full, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_21327_full: ### IGT changes ### Possible regressions * igt@kms_frontbuffer_tracking@fbc-suspend: - shard-kbl: [PASS][1] -> [INCOMPLETE][2] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-kbl7/igt@kms_frontbuffer_track...@fbc-suspend.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-kbl2/igt@kms_frontbuffer_track...@fbc-suspend.html Known issues Here are the changes found in Patchwork_21327_full that come from known issues: ### IGT changes ### Issues hit * igt@gem_ctx_isolation@preservation-s3@vcs0: - shard-kbl: [PASS][3] -> [DMESG-WARN][4] ([i915#180]) +5 similar issues [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-kbl1/igt@gem_ctx_isolation@preservation...@vcs0.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-kbl1/igt@gem_ctx_isolation@preservation...@vcs0.html * igt@gem_ctx_persistence@legacy-engines-persistence: - shard-snb: NOTRUN -> [SKIP][5] ([fdo#109271] / [i915#1099]) +2 similar issues [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-snb5/igt@gem_ctx_persiste...@legacy-engines-persistence.html * igt@gem_ctx_shared@q-in-order: - shard-snb: NOTRUN -> [SKIP][6] ([fdo#109271]) +224 similar issues [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-snb5/igt@gem_ctx_sha...@q-in-order.html * igt@gem_eio@unwedge-stress: - shard-skl: [PASS][7] -> [TIMEOUT][8] ([i915#2369] / [i915#3063]) [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-skl1/igt@gem_...@unwedge-stress.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-skl5/igt@gem_...@unwedge-stress.html * igt@gem_exec_fair@basic-deadline: - shard-skl: NOTRUN -> [FAIL][9] ([i915#2846]) [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-skl1/igt@gem_exec_f...@basic-deadline.html * igt@gem_exec_fair@basic-none-rrul@rcs0: - shard-tglb: NOTRUN -> [FAIL][10] ([i915#2842]) [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-tglb2/igt@gem_exec_fair@basic-none-r...@rcs0.html * igt@gem_exec_fair@basic-none-solo@rcs0: - shard-glk: [PASS][11] -> [FAIL][12] ([i915#2842]) [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-glk5/igt@gem_exec_fair@basic-none-s...@rcs0.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-glk1/igt@gem_exec_fair@basic-none-s...@rcs0.html * igt@gem_exec_fair@basic-pace@vcs0: - shard-kbl: [PASS][13] -> [FAIL][14] ([i915#2842]) [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-kbl6/igt@gem_exec_fair@basic-p...@vcs0.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-kbl2/igt@gem_exec_fair@basic-p...@vcs0.html * igt@gem_exec_fair@basic-pace@vcs1: - shard-iclb: NOTRUN -> [FAIL][15] ([i915#2842]) [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-iclb2/igt@gem_exec_fair@basic-p...@vcs1.html - shard-tglb: [PASS][16] -> 
[FAIL][17] ([i915#2842]) [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-tglb5/igt@gem_exec_fair@basic-p...@vcs1.html [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-tglb3/igt@gem_exec_fair@basic-p...@vcs1.html * igt@gem_exec_schedule@u-submit-golden-slice@vecs0: - shard-skl: NOTRUN -> [INCOMPLETE][18] ([i915#3797]) [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-skl8/igt@gem_exec_schedule@u-submit-golden-sl...@vecs0.html * igt@gem_fenced_exec_thrash@2-spare-fences: - shard-snb: [PASS][19] -> [INCOMPLETE][20] ([i915#2055]) [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-snb5/igt@gem_fenced_exec_thr...@2-spare-fences.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21327/shard-snb6/igt@gem_fenced_exec_thr...@2-spare-fences.html * igt@gem_huc_copy@huc-copy: - shard-tglb: [PASS][21] -> [SKIP][22] ([i915#2190]) [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10728/shard-tglb3/igt@gem_huc_c..
Re: [Intel-gfx] [PATCH 03/14] drm/i915/xehpsdv: enforce min GTT alignment
On Mon, Oct 11, 2021 at 09:41:44PM +0530, Ramalingam C wrote: > From: Matthew Auld > > For local-memory objects we need to align the GTT addresses to 64K, both > for the ppgtt and ggtt. > > Signed-off-by: Matthew Auld > Signed-off-by: Stuart Summers > Signed-off-by: Ramalingam C > Cc: Joonas Lahtinen > Cc: Rodrigo Vivi Do we still need this with relocations removed? Userspace is picking all the addresses for us, so all we have to check is whether userspace got it right. -Daniel > --- > drivers/gpu/drm/i915/i915_vma.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c > index 4b7fc4647e46..1ea1fa08efdf 100644 > --- a/drivers/gpu/drm/i915/i915_vma.c > +++ b/drivers/gpu/drm/i915/i915_vma.c > @@ -670,8 +670,13 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 > alignment, u64 flags) > } > > color = 0; > - if (vma->obj && i915_vm_has_cache_coloring(vma->vm)) > - color = vma->obj->cache_level; > + if (vma->obj) { > + if (HAS_64K_PAGES(vma->vm->i915) && > i915_gem_object_is_lmem(vma->obj)) > + alignment = max(alignment, I915_GTT_PAGE_SIZE_64K); > + > + if (i915_vm_has_cache_coloring(vma->vm)) > + color = vma->obj->cache_level; > + } > > if (flags & PIN_OFFSET_FIXED) { > u64 offset = flags & PIN_OFFSET_MASK; > -- > 2.20.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
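For reference, the userspace-facing side of this boils down to a check of roughly this shape (a condensed sketch, not the actual execbuf validation code):

        /* DG2: any object that can be placed in lmem needs a 64K-aligned
         * GTT address, whether softpinned by userspace or picked by the
         * kernel's own allocator. */
        if (i915_gem_object_is_lmem(obj) &&
            !IS_ALIGNED(offset, I915_GTT_PAGE_SIZE_64K))
                return -EINVAL;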
Re: [Intel-gfx] [PATCH] drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation.
Hi Maarten, I love your patch! Yet something to improve: [auto build test ERROR on drm-intel/for-linux-next] [also build test ERROR on drm-tip/drm-tip drm-exynos/exynos-drm-next tegra-drm/drm/tegra/for-next v5.15-rc5 next-20211013] [cannot apply to airlied/drm-next] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/0day-ci/linux/commits/Maarten-Lankhorst/drm-i915-Use-dma_resv_iter-for-waiting-in-i915_gem_object_wait_reservation/20211013-184219 base: git://anongit.freedesktop.org/drm-intel for-linux-next config: x86_64-randconfig-a015-20211013 (attached as .config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 reproduce (this is a W=1 build): # https://github.com/0day-ci/linux/commit/647f0c4c47ffea53967daf523e8b935707e7a586 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Maarten-Lankhorst/drm-i915-Use-dma_resv_iter-for-waiting-in-i915_gem_object_wait_reservation/20211013-184219 git checkout 647f0c4c47ffea53967daf523e8b935707e7a586 # save the attached .config to linux build tree mkdir build_dir make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/gpu/drm/i915/ If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): >> drivers/gpu/drm/i915/gem/i915_gem_shrinker.c:18:10: fatal error: >> dma_resv_utils.h: No such file or directory 18 | #include "dma_resv_utils.h" | ^~ compilation terminated. vim +18 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c 09137e94543761 drivers/gpu/drm/i915/gem/i915_gem_shrinker.c Chris Wilson 2020-07-08 17 6d393ef5ff5cac drivers/gpu/drm/i915/gem/i915_gem_shrinker.c Chris Wilson 2020-12-23 @18 #include "dma_resv_utils.h" be6a0376950475 drivers/gpu/drm/i915/i915_gem_shrinker.c Daniel Vetter 2015-03-18 19 #include "i915_trace.h" be6a0376950475 drivers/gpu/drm/i915/i915_gem_shrinker.c Daniel Vetter 2015-03-18 20 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org .config.gz Description: application/gzip
Re: [Intel-gfx] [PATCH 13/14] drm/i915/uapi: document behaviour for DG2 64K support
On Mon, Oct 11, 2021 at 09:41:54PM +0530, Ramalingam C wrote: > From: Matthew Auld > > On discrete platforms like DG2, we need to support a minimum page size > of 64K when dealing with device local-memory. This is quite tricky for > various reasons, so try to document the new implicit uapi for this. > > Signed-off-by: Matthew Auld > Signed-off-by: Ramalingam C > --- > include/uapi/drm/i915_drm.h | 61 ++--- > 1 file changed, 56 insertions(+), 5 deletions(-) > > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index aa2a7eccfb94..d62e8b7ed8b6 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -1118,10 +1118,16 @@ struct drm_i915_gem_exec_object2 { > /** >* When the EXEC_OBJECT_PINNED flag is specified this is populated by >* the user with the GTT offset at which this object will be pinned. > + * >* When the I915_EXEC_NO_RELOC flag is specified this must contain the >* presumed_offset of the object. > + * >* During execbuffer2 the kernel populates it with the value of the >* current GTT offset of the object, for future presumed_offset writes. > + * > + * See struct drm_i915_gem_create_ext for the rules when dealing with > + * alignment restrictions with I915_MEMORY_CLASS_DEVICE, on devices with > + * minimum page sizes, like DG2. >*/ > __u64 offset; > > @@ -3001,11 +3007,56 @@ struct drm_i915_gem_create_ext { >* I think a heading here (or a bit earlier) about Page alignment would be good. Just mark it up as bold or something (since real sphinx headings won't work). >* The (page-aligned) allocated size for the object will be returned. >* > - * Note that for some devices we have might have further minimum > - * page-size restrictions(larger than 4K), like for device local-memory. > - * However in general the final size here should always reflect any > - * rounding up, if for example using the > I915_GEM_CREATE_EXT_MEMORY_REGIONS > - * extension to place the object in device local-memory. > + * On discrete platforms, starting from DG2, we have to contend with GTT > + * page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE > + * objects. Specifically the hardware only supports 64K or larger GTT > + * page sizes for such memory. The kernel will already ensure that all > + * I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page > + * sizes underneath. > + * > + * Note that the returned size here will always reflect any required > + * rounding up done by the kernel, i.e 4K will now become 64K on devices > + * such as DG2. The GTT alignment will also need be at least 64K for > + * such objects. > + * I think here we should have a "Special DG2 placement restrictions" heading for clarity > + * Note that due to how the hardware implements 64K GTT page support, we > + * have some further complications: > + * > + * 1.) The entire PDE(which covers a 2M virtual address range), must Does this really format into a nice list in the html output? Also not both . and ), usually in text it's just ) > + * contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same > + * PDE is forbidden by the hardware. > + * > + * 2.) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM > + * objects. > + * > + * To handle the above the kernel implements a memory coloring scheme to > + * prevent userspace from mixing I915_MEMORY_CLASS_DEVICE and > + * I915_MEMORY_CLASS_SYSTEM objects in the same PDE. 
If the kernel is > + * ever unable to evict the required pages for the given PDE(different > + * color) when inserting the object into the GTT then it will simply > + * fail the request. > + * > + * Since userspace needs to manage the GTT address space themselves, > + * special care is needed to ensure this doesn't happen. The simplest > + * scheme is to simply align and round up all I915_MEMORY_CLASS_DEVICE > + * objects to 2M, which avoids any issues here. At the very least this > + * is likely needed for objects that can be placed in both > + * I915_MEMORY_CLASS_DEVICE and I915_MEMORY_CLASS_SYSTEM, to avoid > + * potential issues when the kernel needs to migrate the object behind > + * the scenes, since that might also involve evicting other objects. > + * > + * To summarise the GTT rules, on platforms like DG2: > + * > + * 1.) All objects that can be placed in I915_MEMORY_CLASS_DEVICE must > + * have 64K alignment. The kernel will reject this otherwise. > + * > + * 2.) All I915_MEMORY_CLASS_DEVICE objects must never be placed in > + * the same PDE with other I915_MEMORY_CLASS_SYSTEM objects. The > + * kernel will r
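A small userspace-side sketch of the placement rules described above (helper name and shape are illustrative, not part of any API):

#define SZ_64K  (64ull * 1024)
#define SZ_2M   (2ull * 1024 * 1024)

/* 64K is the hard minimum for anything that can land in lmem; padding
 * and aligning to a whole 2M PDE sidesteps the mixed-PDE restriction
 * for objects that can also be placed in system memory. */
static inline uint64_t dg2_gtt_alignment(bool can_be_lmem, bool can_be_smem)
{
        if (can_be_lmem && can_be_smem)
                return SZ_2M;
        return can_be_lmem ? SZ_64K : 4096;
}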
Re: [Intel-gfx] [PATCH 14/14] Doc/gpu/rfc/i915: i915 DG2 uAPI
On Mon, Oct 11, 2021 at 09:41:55PM +0530, Ramalingam C wrote: > Details of the new features getting added as part of DG2 enabling and their > implicit impact on the uAPI. > > Signed-off-by: Ramalingam C > cc: Daniel Vetter > cc: Matthew Auld > --- > Documentation/gpu/rfc/i915_dg2.rst | 47 ++ > Documentation/gpu/rfc/index.rst| 3 ++ > 2 files changed, 50 insertions(+) > create mode 100644 Documentation/gpu/rfc/i915_dg2.rst Please move this and any uapi doc patch this relies on to the front of the series, so it serves as an intro. I think the 64k side looks good with the uapi docs, once it's fully reviewed and acked. What we still need is proper uapi docs for flat CCS. I think for that a separate flat ccs DOC: section would be good, which is then references by the gem_create_ext kerneldoc with a sphinx hyperlink. The other thing that's missing here are the dg2 flat ccs drm_modifiers. So we need another patch for that, which in it's kerneldoc then also links to the flat ccs DOC: section. Finally that flat ccs doc section needs to discuss all the flat ccs issues and uapi we've discussed. That patch needs to be acked both by userspace driver folks, and by compositor folks (because of the modifier uapi aspect). Please cc Pekka and Simon Ser for the compositor acks (but feel free to add more people). -Daniel > > diff --git a/Documentation/gpu/rfc/i915_dg2.rst > b/Documentation/gpu/rfc/i915_dg2.rst > new file mode 100644 > index ..a83ca26cd758 > --- /dev/null > +++ b/Documentation/gpu/rfc/i915_dg2.rst > @@ -0,0 +1,47 @@ > + > +I915 DG2 RFC Section > + > + > +Upstream plan > += > +Plan to upstream the DG2 enabling is: > + > +* Merge basic HW enabling for DG2(Still without pciid) > +* Merge the 64k support for lmem > +* Merge the flat CCS enabling patches > +* Add the pciid for DG2 and enable the DG2 in CI > + > + > +64K page support for lmem > += > +On DG2 hw, local-memory supports minimum GTT page size of 64k only. 4k is > not supported anymore. > + > +DG2 hw dont support the 64k(lmem) and 4k(smem) pages in the same ppgtt Page > table. Refer the > +struct drm_i915_gem_create_ext for the implication of handling the 64k page > size. > + > +.. kernel-doc:: include/uapi/drm/i915_drm.h > +:functions: drm_i915_gem_create_ext > + > + > +flat CCS support for lmem > += > +Gen 12+ devices support 3D surfaces compression and compression formats. > This is > +accomplished by an additional compression control state (CCS) stored for > each surface. > + > +Gen 12 devices(TGL and DG1) stores compression state in a separate region of > memory. > +It is managed by userspace and has an associated set of userspace managed > page tables > +used by hardware for address translation. > + > +In Gen 12.5 devices(XEXPSDV and DG2) Flat CCS is introduced to replace the > userspace > +managed AUX pagetable with the flat indexed region of device memory for > storing the > +compression state > + > +GOP Driver steals a chunk of memory for the CCS surface corresponding to the > entire > +range of local memory. The memory required for the CCS of the entire local > memory is > +1/256 of the main local memory. The Gop driver will also program a secure > register > +(XEHPSDV_FLAT_CCS_BASE_ADDR 0x4910) with this address value. > + > +So the Total local memory available for driver allocation is Total lmem size > - CCS data size > + > +Flat CCS data needs to be cleared when a lmem object is allocated. And CCS > data can > +be copied in and out of CCS region through XY_CTRL_SURF_COPY_BLT. 
> diff --git a/Documentation/gpu/rfc/index.rst b/Documentation/gpu/rfc/index.rst > index 91e93a705230..afb320ed4028 100644 > --- a/Documentation/gpu/rfc/index.rst > +++ b/Documentation/gpu/rfc/index.rst > @@ -20,6 +20,9 @@ host such documentation: > > i915_gem_lmem.rst > > +.. toctree:: > +i915_dg2.rst > + > .. toctree:: > > i915_scheduler.rst > -- > 2.20.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
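To put a number on the 1/256 ratio, a worked example (the 16 GiB figure is purely illustrative):

        uint64_t lmem_size = 16ull << 30;          /* 16 GiB of local memory */
        uint64_t ccs_size  = lmem_size / 256;      /* 64 MiB of CCS state */
        uint64_t usable    = lmem_size - ccs_size; /* left for driver allocations */

so roughly 64 MiB comes off the top of a 16 GiB part before the driver sees any of it.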
Re: [Intel-gfx] [PATCH 00/14] drm/i915/dg2: Enabling 64k page size and flat ccs
On Mon, Oct 11, 2021 at 09:41:41PM +0530, Ramalingam C wrote: > This series introduces the enabling patches for new flat ccs feature and > 64k page support for i915 local memory, along with documentation on the > uAPI impact. > > 64k page support > > > On discrete platforms, starting from DG2, we have to contend with GTT > page size restrictions when dealing with I915_MEMORY_CLASS_DEVICE > objects. Specifically the hardware only supports 64K or larger GTT page > sizes for such memory. The kernel will already ensure that all > I915_MEMORY_CLASS_DEVICE memory is allocated using 64K or larger page > sizes underneath. > > Note that the returned size here will always reflect any required > rounding up done by the kernel, i.e 4K will now become 64K on devices > such as DG2. The GTT alignment will also need be at least 64K for such > objects. > > Note that due to how the hardware implements 64K GTT page support, we > have some further complications: > > 1.) The entire PDE(which covers a 2M virtual address range), must > contain only 64K PTEs, i.e mixing 4K and 64K PTEs in the same PDE is > forbidden by the hardware. > > 2.) We still need to support 4K PTEs for I915_MEMORY_CLASS_SYSTEM > objects. > > To handle the above the kernel implements a memory coloring scheme to > prevent userspace from mixing I915_MEMORY_CLASS_DEVICE and > I915_MEMORY_CLASS_SYSTEM objects in the same PDE. If the kernel is ever > unable to evict the required pages for the given PDE(different color) > when inserting the object into the GTT then it will simply fail the > request. > > Since userspace needs to manage the GTT address space themselves, > special care is needed to ensure this doesn’t happen. The simplest > scheme is to simply align and round up all I915_MEMORY_CLASS_DEVICE > objects to 2M, which avoids any issues here. At the very least this is > likely needed for objects that can be placed in both > I915_MEMORY_CLASS_DEVICE and I915_MEMORY_CLASS_SYSTEM, to avoid > potential issues when the kernel needs to migrate the object behind the > scenes, since that might also involve evicting other objects. > > To summarise the GTT rules, on platforms like DG2: > > 1.) All objects that can be placed in I915_MEMORY_CLASS_DEVICE must have > 64K alignment. The kernel will reject this otherwise. > > 2.) All I915_MEMORY_CLASS_DEVICE objects must never be placed in the > same PDE with other I915_MEMORY_CLASS_SYSTEM objects. The kernel will > reject this otherwise. > > 3.) Objects that can be placed in both I915_MEMORY_CLASS_DEVICE and > I915_MEMORY_CLASS_SYSTEM should probably be aligned and padded out to > 2M. > > Flat CCS: > = > Gen 12+ devices support 3D surfaces compression and compression formats. > This is accomplished by an additional compression control state (CCS) > stored for each surface. > > Gen 12 devices(TGL and DG1) stores compression state in a separate > region of memory. It is managed by userspace and has an associated set > of userspace managed page tables used by hardware for address > translation. > > In Gen 12.5 devices(XEXPSDV and DG2) Flat CCS is introduced to replace > the userspace managed AUX pagetable with the flat indexed region of > device memory for storing the compression state > > GOP Driver steals a chunk of memory for the CCS surface corresponding to > the entire range of local memory. The memory required for the CCS of the > entire local memory is 1/256 of the main local memory. The Gop driver > will also program a secure register (XEHPSDV_FLAT_CCS_BASE_ADDR 0x4910) > with this address value. 
> > TODO: add patches for the flatccs modifiers and kdoc for them. Ah it's here too :-) Since this is uapi we also need link to igts (or at least where the tests are), and to mesa MR (if that hasn't all landed yet). -Daniel > > *** BLURB HERE *** > > Abdiel Janulgue (1): > drm/i915/lmem: Enable lmem for platforms with Flat CCS > > Ayaz A Siddiqui (1): > drm/i915/gt: Clear compress metadata for Gen12.5 >= platforms > > Bommu Krishnaiah (1): > drm/i915: Add vm min alignment support > > CQ Tang (1): > drm/i915/xehpsdv: Add has_flat_ccs to device info > > Matthew Auld (8): > drm/i915/xehpsdv: set min page-size to 64K > drm/i915/xehpsdv: enforce min GTT alignment > drm/i915: enforce min page size for scratch > drm/i915/gtt/xehpsdv: move scratch page to system memory > drm/i915/xehpsdv: support 64K GTT pages > drm/i915/selftests: account for min_alignment in GTT selftests > drm/i915/xehpsdv: implement memory coloring > drm/i915/uapi: document behaviour for DG2 64K support > > Ramalingam C (1): > Doc/gpu/rfc/i915: i915 DG2 uAPI > > Stuart Summers (1): > drm/i915: Add has_64k_pages flag > > Documentation/gpu/rfc/i915_dg2.rst| 47 ++ > Documentation/gpu/rfc/index.rst | 3 + > drivers/gpu/drm/i915/gem/i915_gem_stolen.c| 6 +- > .../gpu/drm/i915/gem/selftests/huge_pages.c | 61 > .../i915/gem/selftest
Re: [Intel-gfx] [PATCH 1/3] drm:Enable buddy allocator support
On Wed, Oct 13, 2021 at 07:05:34PM +0530, Arunpravin wrote: > Port Intel buddy manager to drm root folder One patch to move it 1:1, then follow-up patches to change it. Not everything in one. Also i915 needs to be adopted to use this too, or this just doesn't make sense. I'm also wondering whether we shouldn't have a ttm helper for this readymade so it just glues all in? -Daniel > Implemented range allocation support for the provided order > Implemented TOP-DOWN support > Implemented freeing up unused pages on contiguous allocation > Moved range allocation and freelist pickup into a single function > > Signed-off-by: Arunpravin > --- > drivers/gpu/drm/Makefile| 2 +- > drivers/gpu/drm/drm_buddy.c | 705 > drivers/gpu/drm/drm_drv.c | 3 + > include/drm/drm_buddy.h | 157 > 4 files changed, 866 insertions(+), 1 deletion(-) > create mode 100644 drivers/gpu/drm/drm_buddy.c > create mode 100644 include/drm/drm_buddy.h > > diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile > index a118692a6df7..fe1a2fc09675 100644 > --- a/drivers/gpu/drm/Makefile > +++ b/drivers/gpu/drm/Makefile > @@ -18,7 +18,7 @@ drm-y := drm_aperture.o drm_auth.o drm_cache.o \ > drm_dumb_buffers.o drm_mode_config.o drm_vblank.o \ > drm_syncobj.o drm_lease.o drm_writeback.o drm_client.o \ > drm_client_modeset.o drm_atomic_uapi.o drm_hdcp.o \ > - drm_managed.o drm_vblank_work.o > + drm_managed.o drm_vblank_work.o drm_buddy.o > > drm-$(CONFIG_DRM_LEGACY) += drm_agpsupport.o drm_bufs.o drm_context.o > drm_dma.o \ > drm_legacy_misc.o drm_lock.o drm_memory.o > drm_scatter.o \ > diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c > new file mode 100644 > index ..8cd118574665 > --- /dev/null > +++ b/drivers/gpu/drm/drm_buddy.c > @@ -0,0 +1,705 @@ > +// SPDX-License-Identifier: MIT > +/* > + * Copyright © 2021 Intel Corporation > + */ > + > +#include > +#include > + > +#include > + > +static struct kmem_cache *slab_blocks; > + > +static struct drm_buddy_block *drm_block_alloc(struct drm_buddy_mm *mm, > +struct drm_buddy_block *parent, > +unsigned int order, > +u64 offset) > +{ > + struct drm_buddy_block *block; > + > + BUG_ON(order > DRM_BUDDY_MAX_ORDER); > + > + block = kmem_cache_zalloc(slab_blocks, GFP_KERNEL); > + if (!block) > + return NULL; > + > + block->header = offset; > + block->header |= order; > + block->parent = parent; > + > + BUG_ON(block->header & DRM_BUDDY_HEADER_UNUSED); > + return block; > +} > + > +static void drm_block_free(struct drm_buddy_mm *mm, > +struct drm_buddy_block *block) > +{ > + kmem_cache_free(slab_blocks, block); > +} > + > +static void mark_allocated(struct drm_buddy_block *block) > +{ > + block->header &= ~DRM_BUDDY_HEADER_STATE; > + block->header |= DRM_BUDDY_ALLOCATED; > + > + list_del(&block->link); > +} > + > +static void mark_free(struct drm_buddy_mm *mm, > + struct drm_buddy_block *block) > +{ > + block->header &= ~DRM_BUDDY_HEADER_STATE; > + block->header |= DRM_BUDDY_FREE; > + > + list_add(&block->link, > + &mm->free_list[drm_buddy_block_order(block)]); > +} > + > +static void mark_split(struct drm_buddy_block *block) > +{ > + block->header &= ~DRM_BUDDY_HEADER_STATE; > + block->header |= DRM_BUDDY_SPLIT; > + > + list_del(&block->link); > +} > + > +/** > + * drm_buddy_init - init memory manager > + * > + * @mm: DRM buddy manager to initialize > + * @size: size in bytes to manage > + * @chunk_size: minimum page size in bytes for our allocations > + * > + * Initializes the memory manager and its resources. 
> + * > + * Returns: > + * 0 on success, error code on failure. > + */ > +int drm_buddy_init(struct drm_buddy_mm *mm, u64 size, u64 chunk_size) > +{ > + unsigned int i; > + u64 offset; > + > + if (size < chunk_size) > + return -EINVAL; > + > + if (chunk_size < PAGE_SIZE) > + return -EINVAL; > + > + if (!is_power_of_2(chunk_size)) > + return -EINVAL; > + > + size = round_down(size, chunk_size); > + > + mm->size = size; > + mm->avail = size; > + mm->chunk_size = chunk_size; > + mm->max_order = ilog2(size) - ilog2(chunk_size); > + > + BUG_ON(mm->max_order > DRM_BUDDY_MAX_ORDER); > + > + mm->free_list = kmalloc_array(mm->max_order + 1, > + sizeof(struct list_head), > + GFP_KERNEL); > + if (!mm->free_list) > + return -ENOMEM; > + > + for (i = 0; i <= mm->max_order; ++i) > + INIT_LIST_HEAD(&mm->free_list[i]); > + > + mm->n_roots = hweight
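Going by the interface in the patch, driver-side initialisation would look roughly like this (sizes are illustrative; both arguments must satisfy the checks in drm_buddy_init() quoted above):

        struct drm_buddy_mm mm;
        int err;

        /* manage 1 GiB of VRAM with a 4K minimum allocation granularity */
        err = drm_buddy_init(&mm, 1ull << 30, PAGE_SIZE);
        if (err)
                return err;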
Re: [Intel-gfx] [PATCH] drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation.
On Wed, Oct 13, 2021 at 02:32:03PM +0200, Maarten Lankhorst wrote: > No memory should be allocated when calling i915_gem_object_wait, > because it may be called to idle a BO when evicting memory. > > Fix this by using dma_resv_iter helpers to call > i915_gem_object_wait_fence() on each fence, which cleans up the code a lot. > Also remove dma_resv_prune, it's questionably. > > This will result in the following lockdep splat. > > <4> [83.538517] == > <4> [83.538520] WARNING: possible circular locking dependency detected > <4> [83.538522] 5.15.0-rc5-CI-Trybot_8062+ #1 Not tainted > <4> [83.538525] -- > <4> [83.538527] gem_render_line/5242 is trying to acquire lock: > <4> [83.538530] 8275b1e0 (fs_reclaim){+.+.}-{0:0}, at: > __kmalloc_track_caller+0x56/0x270 > <4> [83.538538] > but task is already holding lock: > <4> [83.538540] 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: > i915_vma_pin_ww+0x1c7/0x970 [i915] > <4> [83.538638] > which lock already depends on the new lock. > <4> [83.538642] > the existing dependency chain (in reverse order) is: > <4> [83.538645] > -> #1 (&vm->mutex/1){+.+.}-{3:3}: > <4> [83.538649]lock_acquire+0xd3/0x310 > <4> [83.538654]i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915] > <4> [83.538730]i915_address_space_init+0xf5/0x1b0 [i915] > <4> [83.538794]ppgtt_init+0x55/0x70 [i915] > <4> [83.538856]gen8_ppgtt_create+0x44/0x5d0 [i915] > <4> [83.538912]i915_ppgtt_create+0x28/0xf0 [i915] > <4> [83.538971]intel_gt_init+0x130/0x3b0 [i915] > <4> [83.539029]i915_gem_init+0x14b/0x220 [i915] > <4> [83.539100]i915_driver_probe+0x97e/0xdd0 [i915] > <4> [83.539149]i915_pci_probe+0x43/0x1d0 [i915] > <4> [83.539197]pci_device_probe+0x9b/0x110 > <4> [83.539201]really_probe+0x1b0/0x3b0 > <4> [83.539205]__driver_probe_device+0xf6/0x170 > <4> [83.539208]driver_probe_device+0x1a/0x90 > <4> [83.539210]__driver_attach+0x93/0x160 > <4> [83.539213]bus_for_each_dev+0x72/0xc0 > <4> [83.539216]bus_add_driver+0x14b/0x1f0 > <4> [83.539220]driver_register+0x66/0xb0 > <4> [83.539222]hdmi_get_spk_alloc+0x1f/0x50 [snd_hda_codec_hdmi] > <4> [83.539227]do_one_initcall+0x53/0x2e0 > <4> [83.539230]do_init_module+0x55/0x200 > <4> [83.539234]load_module+0x2700/0x2980 > <4> [83.539237]__do_sys_finit_module+0xaa/0x110 > <4> [83.539241]do_syscall_64+0x37/0xb0 > <4> [83.539244]entry_SYSCALL_64_after_hwframe+0x44/0xae > <4> [83.539247] > -> #0 (fs_reclaim){+.+.}-{0:0}: > <4> [83.539251]validate_chain+0xb37/0x1e70 > <4> [83.539254]__lock_acquire+0x5a1/0xb70 > <4> [83.539258]lock_acquire+0xd3/0x310 > <4> [83.539260]fs_reclaim_acquire+0x9d/0xd0 > <4> [83.539264]__kmalloc_track_caller+0x56/0x270 > <4> [83.539267]krealloc+0x48/0xa0 > <4> [83.539270]dma_resv_get_fences+0x1c3/0x280 > <4> [83.539274]i915_gem_object_wait+0x1ff/0x410 [i915] > <4> [83.539342]i915_gem_evict_for_node+0x16b/0x440 [i915] > <4> [83.539412]i915_gem_gtt_reserve+0xff/0x130 [i915] > <4> [83.539482]i915_vma_pin_ww+0x765/0x970 [i915] > <4> [83.539556]eb_validate_vmas+0x6fe/0x8e0 [i915] > <4> [83.539626]i915_gem_do_execbuffer+0x9a6/0x20a0 [i915] > <4> [83.539693]i915_gem_execbuffer2_ioctl+0x11f/0x2c0 [i915] > <4> [83.539759]drm_ioctl_kernel+0xac/0x140 > <4> [83.539763]drm_ioctl+0x201/0x3d0 > <4> [83.539766]__x64_sys_ioctl+0x6a/0xa0 > <4> [83.539769]do_syscall_64+0x37/0xb0 > <4> [83.539772]entry_SYSCALL_64_after_hwframe+0x44/0xae > <4> [83.539775] > other info that might help us debug this: > <4> [83.539778] Possible unsafe locking scenario: > <4> [83.539781]CPU0CPU1 > <4> [83.539783] > <4> [83.539785] lock(&vm->mutex/1); > <4> [83.539788]lock(fs_reclaim); 
> <4> [83.539791]lock(&vm->mutex/1); > <4> [83.539794] lock(fs_reclaim); > <4> [83.539796] > *** DEADLOCK *** > <4> [83.539799] 3 locks held by gem_render_line/5242: > <4> [83.539802] #0: c9d4bbf0 > (reservation_ww_class_acquire){+.+.}-{0:0}, at: > i915_gem_do_execbuffer+0x8e5/0x20a0 [i915] > <4> [83.539870] #1: 88811e48bae8 > (reservation_ww_class_mutex){+.+.}-{3:3}, at: eb_validate_vmas+0x81/0x8e0 > [i915] > <4> [83.539936] #2: 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: > i915_vma_pin_ww+0x1c7/0x970 [i915] > <4> [83.540011] > stack backtrace: > <4> [83.540014] CPU: 2 PID: 5242 Comm: gem_render_line Not tainted > 5.15.0-rc5-CI-Trybot_8062+ #1 > <4> [83.540019] Hardware name: Intel(R) Client Systems NUC11TNHi3/NUC11TNBi3, > BIOS TNTGL357.0038.2020.1124.1648 11/24/2020 > <4> [83.540023] Call Trace:
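The allocation-free shape the patch moves to, using the new dma_resv iterators (a condensed sketch, with resv, wait_all, intr and timeout coming from the caller; not the exact i915 code):

        struct dma_resv_iter cursor;
        struct dma_fence *fence;
        long ret = timeout;

        dma_resv_iter_begin(&cursor, resv, wait_all);
        dma_resv_for_each_fence_unlocked(&cursor, fence) {
                /* no krealloc'd fence array, so no fs_reclaim under vm->mutex */
                ret = dma_fence_wait_timeout(fence, intr, ret);
                if (ret <= 0)
                        break;
        }
        dma_resv_iter_end(&cursor);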
Re: [Intel-gfx] [PATCH 03/28] dma-buf: add dma_resv selftest v3
On Tue, Oct 05, 2021 at 01:37:17PM +0200, Christian König wrote: > Just exercising a very minor subset of the functionality, but already > proven useful. > > v2: add missing locking > v3: some more cleanup and consolidation, add unlocked test as well > > Signed-off-by: Christian König Yeah this is great, since if we then get some specific bug later on it's going to be very easy to add the unit test for the precise bug hopefully. I scrolled through, looks correct. Reviewed-by: Daniel Vetter > --- > drivers/dma-buf/Makefile | 3 +- > drivers/dma-buf/selftests.h | 1 + > drivers/dma-buf/st-dma-resv.c | 282 ++ > 3 files changed, 285 insertions(+), 1 deletion(-) > create mode 100644 drivers/dma-buf/st-dma-resv.c > > diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile > index 1ef021273a06..511805dbeb75 100644 > --- a/drivers/dma-buf/Makefile > +++ b/drivers/dma-buf/Makefile > @@ -11,6 +11,7 @@ obj-$(CONFIG_DMABUF_SYSFS_STATS) += dma-buf-sysfs-stats.o > dmabuf_selftests-y := \ > selftest.o \ > st-dma-fence.o \ > - st-dma-fence-chain.o > + st-dma-fence-chain.o \ > + st-dma-resv.o > > obj-$(CONFIG_DMABUF_SELFTESTS) += dmabuf_selftests.o > diff --git a/drivers/dma-buf/selftests.h b/drivers/dma-buf/selftests.h > index bc8cea67bf1e..97d73aaa31da 100644 > --- a/drivers/dma-buf/selftests.h > +++ b/drivers/dma-buf/selftests.h > @@ -12,3 +12,4 @@ > selftest(sanitycheck, __sanitycheck__) /* keep first (igt selfcheck) */ > selftest(dma_fence, dma_fence) > selftest(dma_fence_chain, dma_fence_chain) > +selftest(dma_resv, dma_resv) > diff --git a/drivers/dma-buf/st-dma-resv.c b/drivers/dma-buf/st-dma-resv.c > new file mode 100644 > index ..50d3791ccb8c > --- /dev/null > +++ b/drivers/dma-buf/st-dma-resv.c > @@ -0,0 +1,282 @@ > +/* SPDX-License-Identifier: MIT */ > + > +/* > +* Copyright © 2019 Intel Corporation > +* Copyright © 2021 Advanced Micro Devices, Inc. 
> +*/ > + > +#include > +#include > +#include > + > +#include "selftest.h" > + > +static struct spinlock fence_lock; > + > +static const char *fence_name(struct dma_fence *f) > +{ > + return "selftest"; > +} > + > +static const struct dma_fence_ops fence_ops = { > + .get_driver_name = fence_name, > + .get_timeline_name = fence_name, > +}; > + > +static struct dma_fence *alloc_fence(void) > +{ > + struct dma_fence *f; > + > + f = kmalloc(sizeof(*f), GFP_KERNEL); > + if (!f) > + return NULL; > + > + dma_fence_init(f, &fence_ops, &fence_lock, 0, 0); > + return f; > +} > + > +static int sanitycheck(void *arg) > +{ > + struct dma_resv resv; > + struct dma_fence *f; > + int r; > + > + f = alloc_fence(); > + if (!f) > + return -ENOMEM; > + > + dma_fence_signal(f); > + dma_fence_put(f); > + > + dma_resv_init(&resv); > + r = dma_resv_lock(&resv, NULL); > + if (r) > + pr_err("Resv locking failed\n"); > + else > + dma_resv_unlock(&resv); > + dma_resv_fini(&resv); > + return r; > +} > + > +static int test_signaling(void *arg, bool shared) > +{ > + struct dma_resv resv; > + struct dma_fence *f; > + int r; > + > + f = alloc_fence(); > + if (!f) > + return -ENOMEM; > + > + dma_resv_init(&resv); > + r = dma_resv_lock(&resv, NULL); > + if (r) { > + pr_err("Resv locking failed\n"); > + goto err_free; > + } > + > + if (shared) { > + r = dma_resv_reserve_shared(&resv, 1); > + if (r) { > + pr_err("Resv shared slot allocation failed\n"); > + goto err_unlock; > + } > + > + dma_resv_add_shared_fence(&resv, f); > + } else { > + dma_resv_add_excl_fence(&resv, f); > + } > + > + if (dma_resv_test_signaled(&resv, shared)) { > + pr_err("Resv unexpectedly signaled\n"); > + r = -EINVAL; > + goto err_unlock; > + } > + dma_fence_signal(f); > + if (!dma_resv_test_signaled(&resv, shared)) { > + pr_err("Resv not reporting signaled\n"); > + r = -EINVAL; > + goto err_unlock; > + } > +err_unlock: > + dma_resv_unlock(&resv); > +err_free: > + dma_resv_fini(&resv); > + dma_fence_put(f); > + return r; > +} > + > +static int test_excl_signaling(void *arg) > +{ > + return test_signaling(arg, false); > +} > + > +static int test_shared_signaling(void *arg) > +{ > + return test_signaling(arg, true); > +} > + > +static int test_for_each(void *arg, bool shared) > +{ > + struct dma_resv_iter cursor; > + struct dma_fence *f, *fence; > + struct dma_resv resv; > + int r; > + > + f = alloc_fence(); > + if (!f) > + return -ENOMEM; > + > + dma_resv_init(&resv); > + r = dma_resv_lock(&resv, NULL); > + if (r) { > + pr_err("Resv
Re: [Intel-gfx] [PATCH 11/28] drm/amdgpu: use the new iterator in amdgpu_sync_resv
On Tue, Oct 05, 2021 at 01:37:25PM +0200, Christian König wrote: > Simplifying the code a bit. > > Signed-off-by: Christian König Reviewed-by: Daniel Vetter Yeah these iterators rock :-) -Daniel > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 44 > 1 file changed, 14 insertions(+), 30 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c > index 862eb3c1c4c5..f7d8487799b2 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c > @@ -252,41 +252,25 @@ int amdgpu_sync_resv(struct amdgpu_device *adev, struct > amdgpu_sync *sync, >struct dma_resv *resv, enum amdgpu_sync_mode mode, >void *owner) > { > - struct dma_resv_list *flist; > + struct dma_resv_iter cursor; > struct dma_fence *f; > - unsigned i; > - int r = 0; > + int r; > > if (resv == NULL) > return -EINVAL; > > - /* always sync to the exclusive fence */ > - f = dma_resv_excl_fence(resv); > - dma_fence_chain_for_each(f, f) { > - struct dma_fence_chain *chain = to_dma_fence_chain(f); > - > - if (amdgpu_sync_test_fence(adev, mode, owner, chain ? > -chain->fence : f)) { > - r = amdgpu_sync_fence(sync, f); > - dma_fence_put(f); > - if (r) > - return r; > - break; > - } > - } > - > - flist = dma_resv_shared_list(resv); > - if (!flist) > - return 0; > - > - for (i = 0; i < flist->shared_count; ++i) { > - f = rcu_dereference_protected(flist->shared[i], > - dma_resv_held(resv)); > - > - if (amdgpu_sync_test_fence(adev, mode, owner, f)) { > - r = amdgpu_sync_fence(sync, f); > - if (r) > - return r; > + dma_resv_for_each_fence(&cursor, resv, true, f) { > + dma_fence_chain_for_each(f, f) { > + struct dma_fence_chain *chain = to_dma_fence_chain(f); > + > + if (amdgpu_sync_test_fence(adev, mode, owner, chain ? > +chain->fence : f)) { > + r = amdgpu_sync_fence(sync, f); > + dma_fence_put(f); > + if (r) > + return r; > + break; > + } > } > } > return 0; > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 12/28] drm/amdgpu: use new iterator in amdgpu_ttm_bo_eviction_valuable
On Tue, Oct 05, 2021 at 01:37:26PM +0200, Christian König wrote: > Simplifying the code a bit. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 14 -- > 1 file changed, 4 insertions(+), 10 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index e8d70b6e6737..722e3c9e8882 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -1345,10 +1345,9 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct > ttm_buffer_object *bo, > const struct ttm_place *place) > { > unsigned long num_pages = bo->resource->num_pages; > + struct dma_resv_iter resv_cursor; > struct amdgpu_res_cursor cursor; > - struct dma_resv_list *flist; > struct dma_fence *f; > - int i; > > /* Swapout? */ > if (bo->resource->mem_type == TTM_PL_SYSTEM) > @@ -1362,14 +1361,9 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct > ttm_buffer_object *bo, >* If true, then return false as any KFD process needs all its BOs to >* be resident to run successfully >*/ > - flist = dma_resv_shared_list(bo->base.resv); > - if (flist) { > - for (i = 0; i < flist->shared_count; ++i) { > - f = rcu_dereference_protected(flist->shared[i], > - dma_resv_held(bo->base.resv)); > - if (amdkfd_fence_check_mm(f, current->mm)) > - return false; > - } > + dma_resv_for_each_fence(&resv_cursor, bo->base.resv, true, f) { ^false? At least I'm not seeing the code look at the exclusive fence here. -Daniel > + if (amdkfd_fence_check_mm(f, current->mm)) > + return false; > } > > switch (bo->resource->mem_type) { > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 13/28] drm/amdgpu: use new iterator in amdgpu_vm_prt_fini
On Tue, Oct 05, 2021 at 01:37:27PM +0200, Christian König wrote: > No need to actually allocate an array of fences here. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 26 +- > 1 file changed, 5 insertions(+), 21 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index 6b15cad78de9..e42dd79ed6f4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -2090,30 +2090,14 @@ static void amdgpu_vm_free_mapping(struct > amdgpu_device *adev, > static void amdgpu_vm_prt_fini(struct amdgpu_device *adev, struct amdgpu_vm > *vm) > { > struct dma_resv *resv = vm->root.bo->tbo.base.resv; > - struct dma_fence *excl, **shared; > - unsigned i, shared_count; > - int r; > + struct dma_resv_iter cursor; > + struct dma_fence *fence; > > - r = dma_resv_get_fences(resv, &excl, &shared_count, &shared); > - if (r) { > - /* Not enough memory to grab the fence list, as last resort > - * block for all the fences to complete. > - */ > - dma_resv_wait_timeout(resv, true, false, > - MAX_SCHEDULE_TIMEOUT); > - return; > - } > - > - /* Add a callback for each fence in the reservation object */ > - amdgpu_vm_prt_get(adev); I was confused for a bit why the old code wouldn't leak a refcount for the !excl case, but it's all handled. Not sure amdgpu_vm_add_prt_cb still needs to handle the !fence case, it's a bit of a gotcha but I guess it can happen? Either way, looks correct. Reviewed-by: Daniel Vetter > - amdgpu_vm_add_prt_cb(adev, excl); > - > - for (i = 0; i < shared_count; ++i) { > + dma_resv_for_each_fence(&cursor, resv, true, fence) { > + /* Add a callback for each fence in the reservation object */ > amdgpu_vm_prt_get(adev); > - amdgpu_vm_add_prt_cb(adev, shared[i]); > + amdgpu_vm_add_prt_cb(adev, fence); > } > - > - kfree(shared); > } > > /** > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 03/14] drm/i915/xehpsdv: enforce min GTT alignment
On 13/10/2021 14:38, Daniel Vetter wrote: On Mon, Oct 11, 2021 at 09:41:44PM +0530, Ramalingam C wrote: From: Matthew Auld For local-memory objects we need to align the GTT addresses to 64K, both for the ppgtt and ggtt. Signed-off-by: Matthew Auld Signed-off-by: Stuart Summers Signed-off-by: Ramalingam C Cc: Joonas Lahtinen Cc: Rodrigo Vivi Do we still need this with relocations removed? Userspace is picking all the addresses for us, so all we have to check is whether userspace got it right. Yeah, for OFFSET_FIXED this just validates that the provided address is correctly aligned to 64K, while for the in-kernel insertion stuff we still need to allocate an address that is aligned to 64K. Setting the alignment here handles both cases. -Daniel --- drivers/gpu/drm/i915/i915_vma.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 4b7fc4647e46..1ea1fa08efdf 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -670,8 +670,13 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) } color = 0; - if (vma->obj && i915_vm_has_cache_coloring(vma->vm)) - color = vma->obj->cache_level; + if (vma->obj) { + if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj)) + alignment = max(alignment, I915_GTT_PAGE_SIZE_64K); + + if (i915_vm_has_cache_coloring(vma->vm)) + color = vma->obj->cache_level; + } if (flags & PIN_OFFSET_FIXED) { u64 offset = flags & PIN_OFFSET_MASK; -- 2.20.1
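A condensed sketch of the two cases described above, reusing the names from the patch (min_alignment() itself is invented for illustration):

static u64 min_alignment(const struct i915_vma *vma, u64 alignment)
{
	/* lmem objects on 64K-page platforms must be bound at 64K granularity */
	if (HAS_64K_PAGES(vma->vm->i915) && i915_gem_object_is_lmem(vma->obj))
		alignment = max(alignment, I915_GTT_PAGE_SIZE_64K);
	return alignment;
}

For PIN_OFFSET_FIXED the address comes from userspace and only needs validating, e.g. IS_ALIGNED(offset, min_alignment(vma, alignment)); for in-kernel insertion the raised alignment is handed to the drm_mm search so the allocator picks a conforming address.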
Re: [Intel-gfx] [PATCH 14/28] drm/msm: use new iterator in msm_gem_describe
On Tue, Oct 05, 2021 at 01:37:28PM +0200, Christian König wrote: > Simplifying the code a bit. Also drop the RCU read side lock since the > object is locked anyway. > > Untested since I can't get the driver to compile on !ARM. Cross-compiler install is pretty easy and you should have that for pushing drm changes to drm-misc :-) > Signed-off-by: Christian König Assuming this compiles, it looks correct. Reviewed-by: Daniel Vetter > --- > drivers/gpu/drm/msm/msm_gem.c | 19 +-- > 1 file changed, 5 insertions(+), 14 deletions(-) > > diff --git a/drivers/gpu/drm/msm/msm_gem.c b/drivers/gpu/drm/msm/msm_gem.c > index 40a9863f5951..5bd511f07c07 100644 > --- a/drivers/gpu/drm/msm/msm_gem.c > +++ b/drivers/gpu/drm/msm/msm_gem.c > @@ -880,7 +880,7 @@ void msm_gem_describe(struct drm_gem_object *obj, struct > seq_file *m, > { > struct msm_gem_object *msm_obj = to_msm_bo(obj); > struct dma_resv *robj = obj->resv; > - struct dma_resv_list *fobj; > + struct dma_resv_iter cursor; > struct dma_fence *fence; > struct msm_gem_vma *vma; > uint64_t off = drm_vma_node_start(&obj->vma_node); > @@ -955,22 +955,13 @@ void msm_gem_describe(struct drm_gem_object *obj, > struct seq_file *m, > seq_puts(m, "\n"); > } > > - rcu_read_lock(); > - fobj = dma_resv_shared_list(robj); > - if (fobj) { > - unsigned int i, shared_count = fobj->shared_count; > - > - for (i = 0; i < shared_count; i++) { > - fence = rcu_dereference(fobj->shared[i]); > + dma_resv_for_each_fence(&cursor, robj, true, fence) { > + if (dma_resv_iter_is_exclusive(&cursor)) > + describe_fence(fence, "Exclusive", m); > + else > describe_fence(fence, "Shared", m); > - } > } > > - fence = dma_resv_excl_fence(robj); > - if (fence) > - describe_fence(fence, "Exclusive", m); > - rcu_read_unlock(); > - > msm_gem_unlock(obj); > } > > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 15/28] drm/radeon: use new iterator in radeon_sync_resv
On Tue, Oct 05, 2021 at 01:37:29PM +0200, Christian König wrote: > Simplifying the code a bit. > > Signed-off-by: Christian König Reviewed-by: Daniel Vetter > --- > drivers/gpu/drm/radeon/radeon_sync.c | 22 +++--- > 1 file changed, 3 insertions(+), 19 deletions(-) > > diff --git a/drivers/gpu/drm/radeon/radeon_sync.c > b/drivers/gpu/drm/radeon/radeon_sync.c > index 9257b60144c4..b991ba1bcd51 100644 > --- a/drivers/gpu/drm/radeon/radeon_sync.c > +++ b/drivers/gpu/drm/radeon/radeon_sync.c > @@ -91,33 +91,17 @@ int radeon_sync_resv(struct radeon_device *rdev, >struct dma_resv *resv, >bool shared) > { > - struct dma_resv_list *flist; > - struct dma_fence *f; > + struct dma_resv_iter cursor; > struct radeon_fence *fence; > - unsigned i; > + struct dma_fence *f; > int r = 0; > > - /* always sync to the exclusive fence */ > - f = dma_resv_excl_fence(resv); > - fence = f ? to_radeon_fence(f) : NULL; > - if (fence && fence->rdev == rdev) > - radeon_sync_fence(sync, fence); > - else if (f) > - r = dma_fence_wait(f, true); > - > - flist = dma_resv_shared_list(resv); > - if (shared || !flist || r) > - return r; > - > - for (i = 0; i < flist->shared_count; ++i) { > - f = rcu_dereference_protected(flist->shared[i], > - dma_resv_held(resv)); > + dma_resv_for_each_fence(&cursor, resv, shared, f) { > fence = to_radeon_fence(f); > if (fence && fence->rdev == rdev) > radeon_sync_fence(sync, fence); > else > r = dma_fence_wait(f, true); > - > if (r) > break; > } > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
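Both this radeon conversion and the nouveau one later in the thread share the same shape; a sketch with is_native()/sync_native() standing in for the driver-specific to_radeon_fence()/fctx->sync() style checks:

struct dma_resv_iter cursor;
struct dma_fence *f;
int r = 0;

dma_resv_for_each_fence(&cursor, resv, all_fences, f) {
	if (is_native(f))
		r = sync_native(f);		/* queue a GPU-side wait */
	else
		r = dma_fence_wait(f, true);	/* fall back to a CPU wait */
	if (r)
		break;
}

Foreign fences are waited on synchronously because the hardware cannot be told to wait on them, while fences from the same device are pipelined.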
Re: [Intel-gfx] [PATCH 17/28] drm/i915: use the new iterator in i915_gem_busy_ioctl v2
On Tue, Oct 05, 2021 at 02:44:50PM +0200, Christian König wrote: > Am 05.10.21 um 14:40 schrieb Tvrtko Ursulin: > > > > On 05/10/2021 12:37, Christian König wrote: > > > This makes the function much simpler since the complex > > > retry logic is now handled elsewhere. > > > > > > Signed-off-by: Christian König > > > Reviewed-by: Tvrtko Ursulin > > > > Reminder - r-b was retracted until at least more text is added to commit > > message about pros and cons. But really some discussion needs to be had inside the > > i915 team on the topic. > > Sure, going to move those to a different branch. > > But I really only see the following options: > 1. Grab the lock. > 2. Use the _unlocked variant with get/put. > 3. Add another _rcu iterator just for this case. > > I'm fine with any of them, but Daniel pretty much already rejected #3 and #2/#1 > has more overhead than the original one. Anything that removes open-coded rcu/lockless magic from i915 gets my ack, there's way too much of this everywhere. So on this: Acked-by: Daniel Vetter I've asked Maarten to review the i915 ones for you, please pester him if it's not happening :-) -Daniel > > Regards, > Christian. > > > > > Regards, > > > > Tvrtko > > > > > --- > > > drivers/gpu/drm/i915/gem/i915_gem_busy.c | 35 ++-- > > > 1 file changed, 14 insertions(+), 21 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_busy.c > > > b/drivers/gpu/drm/i915/gem/i915_gem_busy.c > > > index 6234e17259c1..dc72b36dae54 100644 > > > --- a/drivers/gpu/drm/i915/gem/i915_gem_busy.c > > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_busy.c > > > @@ -82,8 +82,8 @@ i915_gem_busy_ioctl(struct drm_device *dev, void > > > *data, > > > { > > > struct drm_i915_gem_busy *args = data; > > > struct drm_i915_gem_object *obj; > > > - struct dma_resv_list *list; > > > - unsigned int seq; > > > + struct dma_resv_iter cursor; > > > + struct dma_fence *fence; > > > int err; > > > err = -ENOENT; > > > @@ -109,27 +109,20 @@ i915_gem_busy_ioctl(struct drm_device *dev, > > > void *data, > > > * to report the overall busyness. This is what the wait-ioctl > > > does. > > > * > > > */ > > > -retry: > > > - seq = raw_read_seqcount(&obj->base.resv->seq); > > > - > > > - /* Translate the exclusive fence to the READ *and* WRITE engine */ > > > - args->busy = > > > busy_check_writer(dma_resv_excl_fence(obj->base.resv)); > > > - > > > - /* Translate shared fences to READ set of engines */ > > > - list = dma_resv_shared_list(obj->base.resv); > > > - if (list) { > > > - unsigned int shared_count = list->shared_count, i; > > > - > > > - for (i = 0; i < shared_count; ++i) { > > > - struct dma_fence *fence = > > > - rcu_dereference(list->shared[i]); > > > - > > > + args->busy = 0; > > > + dma_resv_iter_begin(&cursor, obj->base.resv, true); > > > + dma_resv_for_each_fence_unlocked(&cursor, fence) { > > > + if (dma_resv_iter_is_restarted(&cursor)) > > > + args->busy = 0; > > > + > > > + if (dma_resv_iter_is_exclusive(&cursor)) > > > + /* Translate the exclusive fence to the READ *and* > > > WRITE engine */ > > > + args->busy |= busy_check_writer(fence); > > > + else > > > + /* Translate shared fences to READ set of engines */ > > > args->busy |= busy_check_reader(fence); > > > - } > > > } > > > - > > > - if (args->busy && read_seqcount_retry(&obj->base.resv->seq, seq)) > > > - goto retry; > > > + dma_resv_iter_end(&cursor); > > > err = 0; > > > out: > > > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
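The restart handling is the subtle part of the unlocked iterator; reduced to a sketch, with classify() standing in for the busy_check_reader()/busy_check_writer() calls:

struct dma_resv_iter cursor;
struct dma_fence *fence;
u32 busy = 0;

dma_resv_iter_begin(&cursor, resv, true);
dma_resv_for_each_fence_unlocked(&cursor, fence) {
	/* the fence list changed under us: throw away partial results */
	if (dma_resv_iter_is_restarted(&cursor))
		busy = 0;

	busy |= classify(fence);
}
dma_resv_iter_end(&cursor);

This replaces the hand-rolled seqcount retry loop: the cursor detects concurrent modification itself and restarts the walk, so the only caller obligation is to reset any state accumulated from earlier fences.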
Re: [Intel-gfx] [PATCH 23/28] drm: use new iterator in drm_gem_fence_array_add_implicit v3
On Tue, Oct 05, 2021 at 01:37:37PM +0200, Christian König wrote: > Simplifying the code a bit. > > v2: add missing rcu_read_lock()/unlock() > v3: switch to locked version > > Signed-off-by: Christian König > Reviewed-by: Tvrtko Ursulin Please make sure you also apply this to the new copy of this code in drm/sched. This one here is up for deletion, once I get all the driver conversions I have landed ... -Daniel > --- > drivers/gpu/drm/drm_gem.c | 26 +- > 1 file changed, 5 insertions(+), 21 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c > index 09c820045859..4dcdec6487bb 100644 > --- a/drivers/gpu/drm/drm_gem.c > +++ b/drivers/gpu/drm/drm_gem.c > @@ -1340,31 +1340,15 @@ int drm_gem_fence_array_add_implicit(struct xarray > *fence_array, >struct drm_gem_object *obj, >bool write) > { > - int ret; > - struct dma_fence **fences; > - unsigned int i, fence_count; > - > - if (!write) { > - struct dma_fence *fence = > - dma_resv_get_excl_unlocked(obj->resv); > - > - return drm_gem_fence_array_add(fence_array, fence); > - } > + struct dma_resv_iter cursor; > + struct dma_fence *fence; > + int ret = 0; > > - ret = dma_resv_get_fences(obj->resv, NULL, > - &fence_count, &fences); > - if (ret || !fence_count) > - return ret; > - > - for (i = 0; i < fence_count; i++) { > - ret = drm_gem_fence_array_add(fence_array, fences[i]); > + dma_resv_for_each_fence(&cursor, obj->resv, write, fence) { > + ret = drm_gem_fence_array_add(fence_array, fence); > if (ret) > break; > } > - > - for (; i < fence_count; i++) > - dma_fence_put(fences[i]); > - kfree(fences); > return ret; > } > EXPORT_SYMBOL(drm_gem_fence_array_add_implicit); > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 4/6] drm/i915: Add a struct dma_fence_work timeline
On Wed, 2021-10-13 at 14:43 +0200, Daniel Vetter wrote: > On Fri, Oct 08, 2021 at 03:35:28PM +0200, Thomas Hellström wrote: > > The TTM managers and, possibly, the gtt address space managers will > > need to be able to order fences for async operation. > > Using dma_fence_is_later() for this will require that the fences we > > hand > > them are from a single fence context and ordered. > > > > Introduce a struct dma_fence_work_timeline, and a function to > > attach > > struct dma_fence_work to such a timeline in a way that all previous > > fences attached to the timeline will be signaled when the latest > > attached struct dma_fence_work signals. > > > > Signed-off-by: Thomas Hellström > > I'm not understanding why we need this: > > - if we just want to order dma_fence work, then an ordered workqueue > is > what we want. Which is why hand-rolling is better than reusing > dma_fence_work for absolutely everything. > > - if we just need to make sure the public fences signal in order, > then > it's a dma_fence_chain. Part of the same series that needs reworking. What we need here is a way to coalesce multiple fences from various contexts (including both gpu and work fences) into a single fence and then attach it to a timeline. /Thomas
Re: [Intel-gfx] [PATCH 24/28] drm: use new iterator in drm_gem_plane_helper_prepare_fb v2
On Tue, Oct 05, 2021 at 01:37:38PM +0200, Christian König wrote: > Makes the handling a bit more complex, but avoids the use of > dma_resv_get_excl_unlocked(). > > v2: improve coding and documentation > > Signed-off-by: Christian König > --- > drivers/gpu/drm/drm_gem_atomic_helper.c | 13 +++-- > 1 file changed, 11 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gem_atomic_helper.c > b/drivers/gpu/drm/drm_gem_atomic_helper.c > index e570398abd78..8534f78d4d6d 100644 > --- a/drivers/gpu/drm/drm_gem_atomic_helper.c > +++ b/drivers/gpu/drm/drm_gem_atomic_helper.c > @@ -143,6 +143,7 @@ > */ > int drm_gem_plane_helper_prepare_fb(struct drm_plane *plane, struct > drm_plane_state *state) > { > + struct dma_resv_iter cursor; > struct drm_gem_object *obj; > struct dma_fence *fence; > > @@ -150,9 +151,17 @@ int drm_gem_plane_helper_prepare_fb(struct drm_plane > *plane, struct drm_plane_st > return 0; > > obj = drm_gem_fb_get_obj(state->fb, 0); > - fence = dma_resv_get_excl_unlocked(obj->resv); > - drm_atomic_set_fence_for_plane(state, fence); > + dma_resv_iter_begin(&cursor, obj->resv, false); > + dma_resv_for_each_fence_unlocked(&cursor, fence) { > + /* TODO: We only use the first write fence here and need to fix > + * the drm_atomic_set_fence_for_plane() API to accept more than > + * one. */ I'm confused, right now there is only one write fence. So no need to iterate, and also no need to add a TODO. If/when we add more write fences then I think this needs to be revisited, and ofc then we do need to update the set_fence helpers to carry an entire array of fences. -Daniel > + dma_fence_get(fence); > + break; > + } > + dma_resv_iter_end(&cursor); > > + drm_atomic_set_fence_for_plane(state, fence); > return 0; > } > EXPORT_SYMBOL_GPL(drm_gem_plane_helper_prepare_fb); > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 25/28] drm/nouveau: use the new iterator in nouveau_fence_sync
On Tue, Oct 05, 2021 at 01:37:39PM +0200, Christian König wrote: > Simplifying the code a bit. > > Signed-off-by: Christian König A bit of a tricky conversion since the previous code was clever with the ret handling in the loop, but looks correct. Please mention in the commit message that this code now also waits for all shared fences in all cases. Previously if we found an exclusive fence, we bailed out. That needs to be recorded in the commit message, together with an explainer that de facto too many other drivers have broken this rule already, and so you have to always iterate all fences. With that added: Reviewed-by: Daniel Vetter > --- > drivers/gpu/drm/nouveau/nouveau_fence.c | 48 +++-- > 1 file changed, 12 insertions(+), 36 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c > b/drivers/gpu/drm/nouveau/nouveau_fence.c > index 05d0b3eb3690..26f9299df881 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c > @@ -339,14 +339,15 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool > lazy, bool intr) > } > > int > -nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, > bool exclusive, bool intr) > +nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, > +bool exclusive, bool intr) > { > struct nouveau_fence_chan *fctx = chan->fence; > - struct dma_fence *fence; > struct dma_resv *resv = nvbo->bo.base.resv; > - struct dma_resv_list *fobj; > + struct dma_resv_iter cursor; > + struct dma_fence *fence; > struct nouveau_fence *f; > - int ret = 0, i; > + int ret; > > if (!exclusive) { > ret = dma_resv_reserve_shared(resv, 1); > @@ -355,10 +356,7 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct > nouveau_channel *chan, bool e > return ret; > } > > - fobj = dma_resv_shared_list(resv); > - fence = dma_resv_excl_fence(resv); > - > - if (fence) { > + dma_resv_for_each_fence(&cursor, resv, exclusive, fence) { > struct nouveau_channel *prev = NULL; > bool must_wait = true; > > @@ -366,41 +364,19 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct > nouveau_channel *chan, bool e > if (f) { > rcu_read_lock(); > prev = rcu_dereference(f->channel); > - if (prev && (prev == chan || fctx->sync(f, prev, chan) > == 0)) > + if (prev && (prev == chan || > + fctx->sync(f, prev, chan) == 0)) > must_wait = false; > rcu_read_unlock(); > } > > - if (must_wait) > + if (must_wait) { > ret = dma_fence_wait(fence, intr); > - > - return ret; > - } > - > - if (!exclusive || !fobj) > - return ret; > - > - for (i = 0; i < fobj->shared_count && !ret; ++i) { > - struct nouveau_channel *prev = NULL; > - bool must_wait = true; > - > - fence = rcu_dereference_protected(fobj->shared[i], > - dma_resv_held(resv)); > - > - f = nouveau_local_fence(fence, chan->drm); > - if (f) { > - rcu_read_lock(); > - prev = rcu_dereference(f->channel); > - if (prev && (prev == chan || fctx->sync(f, prev, chan) > == 0)) > - must_wait = false; > - rcu_read_unlock(); > + if (ret) > + return ret; > } > - > - if (must_wait) > - ret = dma_fence_wait(fence, intr); > } > - > - return ret; > + return 0; > } > > void > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 26/28] drm/nouveau: use the new iterator in nv50_wndw_prepare_fb
On Tue, Oct 05, 2021 at 01:37:40PM +0200, Christian König wrote: > Makes the handling a bit more complex, but avoids the use of > dma_resv_get_excl_unlocked(). > > Signed-off-by: Christian König > --- > drivers/gpu/drm/nouveau/dispnv50/wndw.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/nouveau/dispnv50/wndw.c > b/drivers/gpu/drm/nouveau/dispnv50/wndw.c > index 8d048bacd6f0..30712a681e2a 100644 > --- a/drivers/gpu/drm/nouveau/dispnv50/wndw.c > +++ b/drivers/gpu/drm/nouveau/dispnv50/wndw.c > @@ -539,6 +539,8 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct > drm_plane_state *state) > struct nouveau_bo *nvbo; > struct nv50_head_atom *asyh; > struct nv50_wndw_ctxdma *ctxdma; > + struct dma_resv_iter cursor; > + struct dma_fence *fence; > int ret; > > NV_ATOMIC(drm, "%s prepare: %p\n", plane->name, fb); > @@ -561,7 +563,13 @@ nv50_wndw_prepare_fb(struct drm_plane *plane, struct > drm_plane_state *state) > asyw->image.handle[0] = ctxdma->object.handle; > } > > - asyw->state.fence = dma_resv_get_excl_unlocked(nvbo->bo.base.resv); > + dma_resv_iter_begin(&cursor, nvbo->bo.base.resv, false); > + dma_resv_for_each_fence_unlocked(&cursor, fence) { > + /* TODO: We only use the first writer here */ Same thing as with the atomic core helper. This is actually broken, because for atomic we really do _not_ want to wait for any shared fences. Which this will do, if there's no exclusive fence attached. So upgrading my general concern on this and the atomic helper patch to a reject, since I think it's broken. -Daniel > + asyw->state.fence = dma_fence_get(fence); > + break; > + } > + dma_resv_iter_end(&cursor); > asyw->image.offset[0] = nvbo->offset; > > if (wndw->func->prepare) { > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 27/28] drm/etnaviv: use new iterator in etnaviv_gem_describe
On Tue, Oct 05, 2021 at 01:37:41PM +0200, Christian König wrote: > Instead of hand rolling the logic. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/etnaviv/etnaviv_gem.c | 31 ++- > 1 file changed, 11 insertions(+), 20 deletions(-) > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c > b/drivers/gpu/drm/etnaviv/etnaviv_gem.c > index 8f1b5af47dd6..0eeb33de2ff4 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c > @@ -428,19 +428,17 @@ int etnaviv_gem_wait_bo(struct etnaviv_gpu *gpu, struct > drm_gem_object *obj, > static void etnaviv_gem_describe_fence(struct dma_fence *fence, > const char *type, struct seq_file *m) > { > - if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) Yay for removing open-coded tests like this. Drivers really should have no business digging around in fence->flags (i915 is terrible in this regard unfortunately). > - seq_printf(m, "\t%9s: %s %s seq %llu\n", > -type, > -fence->ops->get_driver_name(fence), > -fence->ops->get_timeline_name(fence), > -fence->seqno); > + seq_printf(m, "\t%9s: %s %s seq %llu\n", type, > +fence->ops->get_driver_name(fence), > +fence->ops->get_timeline_name(fence), > +fence->seqno); > } > > static void etnaviv_gem_describe(struct drm_gem_object *obj, struct seq_file > *m) > { > struct etnaviv_gem_object *etnaviv_obj = to_etnaviv_bo(obj); > struct dma_resv *robj = obj->resv; > - struct dma_resv_list *fobj; > + struct dma_resv_iter cursor; > struct dma_fence *fence; > unsigned long off = drm_vma_node_start(&obj->vma_node); > > @@ -449,21 +447,14 @@ static void etnaviv_gem_describe(struct drm_gem_object > *obj, struct seq_file *m) > obj->name, kref_read(&obj->refcount), > off, etnaviv_obj->vaddr, obj->size); > > - rcu_read_lock(); > - fobj = dma_resv_shared_list(robj); > - if (fobj) { > - unsigned int i, shared_count = fobj->shared_count; > - > - for (i = 0; i < shared_count; i++) { > - fence = rcu_dereference(fobj->shared[i]); > + dma_resv_iter_begin(&cursor, robj, true); > + dma_resv_for_each_fence_unlocked(&cursor, fence) { > + if (dma_resv_iter_is_exclusive(&cursor)) > + etnaviv_gem_describe_fence(fence, "Exclusive", m); > + else > etnaviv_gem_describe_fence(fence, "Shared", m); > - } > } > - > - fence = dma_resv_excl_fence(robj); > - if (fence) > - etnaviv_gem_describe_fence(fence, "Exclusive", m); > - rcu_read_unlock(); > + dma_resv_iter_end(&cursor); Reviewed-by: Daniel Vetter Please make sure it compiles on arm before pushing :-) > } > > void etnaviv_gem_describe_objects(struct etnaviv_drm_private *priv, > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 28/28] drm/etnaviv: replace dma_resv_get_excl_unlocked
On Tue, Oct 05, 2021 at 01:37:42PM +0200, Christian König wrote: > We certainly hold the reservation lock here, no need for the RCU dance. > > Signed-off-by: Christian König > --- > drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c > b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c > index 4dd7d9d541c0..7e17bc2b5df1 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c > @@ -195,7 +195,7 @@ static int submit_fence_sync(struct etnaviv_gem_submit > *submit) > if (ret) > return ret; > } else { > - bo->excl = dma_resv_get_excl_unlocked(robj); Maybe have that in the series to sunset dma_resv_get_excl_unlocked()? Just so it makes a bit more sense from a motivation pov. Or explain that in the commit message. Anyway looks correct. Reviewed-by: Daniel Vetter > + bo->excl = dma_fence_get(dma_resv_excl_fence(robj)); > } > > } > -- > 2.25.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
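The two ways of reading the exclusive fence, side by side in a short sketch (robj is a struct dma_resv *, excl a struct dma_fence *):

/* with the reservation lock held, a plain dereference is legal and
 * the caller takes its own reference */
dma_resv_assert_held(robj);
excl = dma_fence_get(dma_resv_excl_fence(robj));

/* without the lock, the _unlocked helper performs the RCU retry
 * dance internally and already returns a reference */
excl = dma_resv_get_excl_unlocked(robj);

Since submit_fence_sync() runs with the reservation held, the first form is sufficient, which is what the patch switches to.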
Re: [Intel-gfx] [PATCH 4/6] drm/i915: Add a struct dma_fence_work timeline
On Wed, Oct 13, 2021 at 04:21:43PM +0200, Thomas Hellström wrote: > On Wed, 2021-10-13 at 14:43 +0200, Daniel Vetter wrote: > > On Fri, Oct 08, 2021 at 03:35:28PM +0200, Thomas Hellström wrote: > > > The TTM managers and, possibly, the gtt address space managers will > > > need to be able to order fences for async operation. > > > Using dma_fence_is_later() for this will require that the fences we > > > hand > > > them are from a single fence context and ordered. > > > > > > Introduce a struct dma_fence_work_timeline, and a function to > > > attach > > > struct dma_fence_work to such a timeline in a way that all previous > > > fences attached to the timeline will be signaled when the latest > > > attached struct dma_fence_work signals. > > > > > > Signed-off-by: Thomas Hellström > > > > I'm not understanding why we need this: > > > > - if we just want to order dma_fence work, then an ordered workqueue > > is > > what we want. Which is why hand-rolling is better than reusing > > dma_fence_work for absolutely everything. > > > > - if we just need to make sure the public fences signal in order, > > then > > it's a dma_fence_chain. > > Part of the same series that needs reworking. > > What we need here is a way to coalesce multiple fences from various > contexts (including both gpu and work fences) into a single fence and > then attach it to a timeline. I thought dma_fence_chain does this for you, including coalescing on the same timeline. Or at least it's supposed to, because if it doesn't you can produce some rather epic chain explosions with vulkan :-) -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 4/6] drm/i915: Add a struct dma_fence_work timeline
On 10/13/21 16:33, Daniel Vetter wrote: On Wed, Oct 13, 2021 at 04:21:43PM +0200, Thomas Hellström wrote: On Wed, 2021-10-13 at 14:43 +0200, Daniel Vetter wrote: On Fri, Oct 08, 2021 at 03:35:28PM +0200, Thomas Hellström wrote: The TTM managers and, possibly, the gtt address space managers will need to be able to order fences for async operation. Using dma_fence_is_later() for this will require that the fences we hand them are from a single fence context and ordered. Introduce a struct dma_fence_work_timeline, and a function to attach struct dma_fence_work to such a timeline in a way that all previous fences attached to the timeline will be signaled when the latest attached struct dma_fence_work signals. Signed-off-by: Thomas Hellström I'm not understanding why we need this: - if we just want to order dma_fence work, then an ordered workqueue is what we want. Which is why hand-rolling is better than reusing dma_fence_work for absolutely everything. - if we just need to make sure the public fences signal in order, then it's a dma_fence_chain. Part of the same series that needs reworking. What we need here is a way to coalesce multiple fences from various contexts (including both gpu and work fences) into a single fence and then attach it to a timeline. I thought dma_fence_chain does this for you, including coalescing on the same timeline. Or at least it's supposed to, because if it doesn't you can produce some rather epic chain explosions with vulkan :-) I'll take a look to see if I can use dma_fence_chain for this case. Thanks, /Thomas -Daniel
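For reference, coalescing fences onto a timeline with dma_fence_chain boils down to something like this minimal sketch (timeline_add() is invented; the caller must serialize updates of the chain head):

static int timeline_add(struct dma_fence **head, struct dma_fence *fence,
			u64 seqno)
{
	struct dma_fence_chain *chain = dma_fence_chain_alloc();

	if (!chain)
		return -ENOMEM;

	/* dma_fence_chain_init() consumes the references to both the
	 * previous head and the new fence */
	dma_fence_chain_init(chain, *head, dma_fence_get(fence), seqno);
	*head = &chain->base;
	return 0;
}

A chain link only signals once its own fence and all previous links have signaled, which is exactly the "all previous fences signal when the latest does" property the timeline patch was after.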
Re: [Intel-gfx] [PATCH 2/6] drm/i915: Introduce refcounted sg-tables
On Fri, Oct 08, 2021 at 03:35:26PM +0200, Thomas Hellström wrote: > As we start to introduce asynchronous failsafe object migration, > where we update the object state and then submit asynchronous > commands we need to record what memory resources are actually used > by various parts of the command stream. Initially for three purposes: > > 1) Error capture. > 2) Asynchronous migration error recovery. > 3) Asynchronous vma bind. > > At the time when these happen, the object state may have been updated > to be several migrations ahead and object sg-tables discarded. > > In order to make it possible to keep sg-tables with memory resource > information for these operations, introduce refcounted sg-tables that > aren't freed until the last user is done with them. > > The alternative would be to reference information sitting on the > corresponding ttm_resources which typically have the same lifetime as > these refcounted sg_tables, but that leads to other awkward constructs: > Due to the design direction chosen for ttm resource managers that would > lead to diamond-style inheritance, the LMEM resources may sometimes be > prematurely freed, and finally the subclassed struct ttm_resource would > have to bleed into the asynchronous vma bind code. On the diamond inheritance I was pondering some more whether we shouldn't just do the classic C union horrors, i.e. struct ttm_resource { /* stuff */ }; struct ttm_drm_mm_resource { struct ttm_resource base; struct drm_mm_node node; }; struct ttm_buddy_resource { struct ttm_resource base; struct drm_buddy_node node; }; Whatever else we have, maybe also integer resources for guc_id. And then the horrors: struct i915_gem_resource { union { struct ttm_resource base; struct ttm_drm_mm_resource drm_mm; struct ttm_buddy_resource buddy; }; /* i915 stuff */ }; BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) != offsetof(struct i915_gem_resource, drm_mm.base)); BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) != offsetof(struct i915_gem_resource, buddy.base)); This is horrible, but also in official C89 and later unions are the only way to do inheritance. The only reason we can do it differently in linux is because we compile with strict aliasing turned off. So I think we can shrug this off as officially sanctioned horrors. There's a small downside with overhead maybe, but I don't think the size difference between the various allocators is big enough that we should care. Plus a pointer to driver stuff to resolve the diamond inheritance through different means isn't free either. But also this is for much later, I think for now refcounting sglists as a standalone thing is ok, since we do seem to need them in a bunch of places. But eventually I do think we should aim to merge them with ttm_resource, if/when those get refcounted.
-Daniel > > Signed-off-by: Thomas Hellström > --- > .../gpu/drm/i915/gem/i915_gem_object_types.h | 3 +- > drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 159 +++--- > drivers/gpu/drm/i915/i915_scatterlist.c | 62 +-- > drivers/gpu/drm/i915/i915_scatterlist.h | 76 - > drivers/gpu/drm/i915/intel_region_ttm.c | 15 +- > drivers/gpu/drm/i915/intel_region_ttm.h | 5 +- > drivers/gpu/drm/i915/selftests/mock_region.c | 12 +- > 7 files changed, 238 insertions(+), 94 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > index 7c3da4e3e737..d600cf7ceb35 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > @@ -485,6 +485,7 @@ struct drm_i915_gem_object { >*/ > struct list_head region_link; > > + struct i915_refct_sgt *rsgt; > struct sg_table *pages; > void *mapping; > > @@ -538,7 +539,7 @@ struct drm_i915_gem_object { > } mm; > > struct { > - struct sg_table *cached_io_st; > + struct i915_refct_sgt *cached_io_rsgt; > struct i915_gem_object_page_iter get_io_page; > struct drm_i915_gem_object *backup; > bool created:1; > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > index 74a1ffd0d7dd..4b4d7457bef9 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c > @@ -34,7 +34,7 @@ > * struct i915_ttm_tt - TTM page vector with additional private information > * @ttm: The base TTM page vector. > * @dev: The struct device used for dma mapping and unmapping. > - * @cached_st: The cached scatter-gather table. > + * @cached_rsgt: The cached scatter-gather table. > * > * Note that DMA may be going on right up to the point where the page- > * vector is unpopulated in delayed
Re: [Intel-gfx] [PATCH 2/6] drm/i915: Introduce refcounted sg-tables
On 10/13/21 16:41, Daniel Vetter wrote: On Fri, Oct 08, 2021 at 03:35:26PM +0200, Thomas Hellström wrote: As we start to introduce asynchronous failsafe object migration, where we update the object state and then submit asynchronous commands we need to record what memory resources are actually used by various parts of the command stream. Initially for three purposes: 1) Error capture. 2) Asynchronous migration error recovery. 3) Asynchronous vma bind. At the time when these happen, the object state may have been updated to be several migrations ahead and object sg-tables discarded. In order to make it possible to keep sg-tables with memory resource information for these operations, introduce refcounted sg-tables that aren't freed until the last user is done with them. The alternative would be to reference information sitting on the corresponding ttm_resources which typically have the same lifetime as these refcounted sg_tables, but that leads to other awkward constructs: Due to the design direction chosen for ttm resource managers that would lead to diamond-style inheritance, the LMEM resources may sometimes be prematurely freed, and finally the subclassed struct ttm_resource would have to bleed into the asynchronous vma bind code. On the diamond inheritance I was pondering some more whether we shouldn't just do the classic C union horrors, i.e. struct ttm_resource { /* stuff */ }; struct ttm_drm_mm_resource { struct ttm_resource base; struct drm_mm_node node; }; struct ttm_buddy_resource { struct ttm_resource base; struct drm_buddy_node node; }; Whatever else we have, maybe also integer resources for guc_id. And then the horrors: struct i915_gem_resource { union { struct ttm_resource base; struct ttm_drm_mm_resource drm_mm; struct ttm_buddy_resource buddy; }; /* i915 stuff */ }; BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) != offsetof(struct i915_gem_resource, drm_mm.base)); BUILD_BUG_ON(offsetof(struct i915_gem_resource, base) != offsetof(struct i915_gem_resource, buddy.base)); This is horrible, but also in official C89 and later unions are the only way to do inheritance. The only reason we can do it differently in linux is because we compile with strict aliasing turned off. So I think we can shrug this off as officially sanctioned horrors. There's a small downside with overhead maybe, but I don't think the size difference between the various allocators is big enough that we should care. Plus a pointer to driver stuff to resolve the diamond inheritance through different means isn't free either. But also this is for much later, I think for now refcounting sglists as a standalone thing is ok, since we do seem to need them in a bunch of places. But eventually I do think we should aim to merge them with ttm_resource, if/when those get refcounted. Yes, this is exactly what was meant by "awkward constructs" in the commit message. My thoughts are still that all this could be avoided by a different design for struct ttm_resource, but I agree we can do with refcounted sg-lists for now, to see where this ends up when all related resource-on-lru stuff lands in TTM. /Thomas
[Intel-gfx] ✗ Fi.CI.BUILD: failure for mmotm 2021-10-05-19-53 uploaded (drivers/gpu/drm/msm/hdmi/hdmi_phy.o) (rev2)
== Series Details == Series: mmotm 2021-10-05-19-53 uploaded (drivers/gpu/drm/msm/hdmi/hdmi_phy.o) (rev2) URL : https://patchwork.freedesktop.org/series/95495/ State : failure == Summary == Applying: mmotm 2021-10-05-19-53 uploaded (drivers/gpu/drm/msm/hdmi/hdmi_phy.o) error: patch failed: drivers/gpu/drm/msm/Makefile:116 error: drivers/gpu/drm/msm/Makefile: patch does not apply error: Did you hand edit your patch? It does not apply to blobs recorded in its index. hint: Use 'git am --show-current-patch=diff' to see the failed patch Using index info to reconstruct a base tree... Patch failed at 0001 mmotm 2021-10-05-19-53 uploaded (drivers/gpu/drm/msm/hdmi/hdmi_phy.o) When you have resolved this problem, run "git am --continue". If you prefer to skip this patch, run "git am --skip" instead. To restore the original branch and stop patching, run "git am --abort".
[Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation. (rev3)
== Series Details == Series: drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation. (rev3) URL : https://patchwork.freedesktop.org/series/95765/ State : failure == Summary == CALL scripts/checksyscalls.sh CALL scripts/atomic/check-atomics.sh DESCEND objtool CHK include/generated/compile.h CC [M] drivers/gpu/drm/i915/gem/i915_gem_shrinker.o drivers/gpu/drm/i915/gem/i915_gem_shrinker.c: In function ‘i915_gem_shrink’: drivers/gpu/drm/i915/gem/i915_gem_shrinker.c:231:4: error: implicit declaration of function ‘dma_resv_prune’; did you mean ‘dma_resv_fini’? [-Werror=implicit-function-declaration] dma_resv_prune(obj->base.resv); ^~ dma_resv_fini cc1: all warnings being treated as errors scripts/Makefile.build:277: recipe for target 'drivers/gpu/drm/i915/gem/i915_gem_shrinker.o' failed make[4]: *** [drivers/gpu/drm/i915/gem/i915_gem_shrinker.o] Error 1 scripts/Makefile.build:540: recipe for target 'drivers/gpu/drm/i915' failed make[3]: *** [drivers/gpu/drm/i915] Error 2 scripts/Makefile.build:540: recipe for target 'drivers/gpu/drm' failed make[2]: *** [drivers/gpu/drm] Error 2 scripts/Makefile.build:540: recipe for target 'drivers/gpu' failed make[1]: *** [drivers/gpu] Error 2 Makefile:1868: recipe for target 'drivers' failed make: *** [drivers] Error 2
Re: [Intel-gfx] [PATCH] drm/i915/display: Remove check for low voltage sku for max dp source rate
On Thu, Oct 07, 2021 at 01:19:25PM +0530, Nautiyal, Ankit K wrote: > > On 10/5/2021 9:01 PM, Imre Deak wrote: > > On Tue, Oct 05, 2021 at 01:34:21PM +0300, Jani Nikula wrote: > > > Cc: Imre, I think you were involved in adding the checks. > > About ADL-S the spec says: > > > > Bspec 53597: > > Combo Port Maximum Speed: > > OEM must use VBT to specify a maximum that is tolerated by the board design. > > > > Combo Port HBR3 support: > > May require retimer on motherboard. The OEM must use VBT to limit the link > > rate to HBR2 if HBR3 not supported by motherboard. > > > > Bspec/49201: > > Combo Port HBR3/6.48GHz support: > > Only supported on SKUs with higher I/O voltage > > > > I take the above meaning that only high voltage SKUs support HBR3 and > > on those SKUs the OEM must limit this to HBR2 if HBR3 would require a > > retimer on the board, but the board doesn't have this. > > > > If the above isn't correct and low voltage SKUs also in fact support > > HBR3 (with retimers if necessary) then this should imo clarified at > > Bspec/49201. The VBT limit could be used then if present, ignoring the > > low voltage SKU readout. > > Thanks Imre for the inputs. > > As you have mentioned note : rate >5.4 G supported only on High voltage I/O, > is mentioned for platforms like ICL, JSL and Display 12 platforms. > > I had again asked the HW team and VBT/GOP team whether we can safely rely on > VBT for the max rate for these platforms, without worrying about the SKU's > IO Voltage, and also requested them to update the Bspec page for the same. > > In response the Bspec pages 49201, 20598 are now updated with the note "OEM > must use VBT to specify a maximum that is tolerated by the board design" for > the rates above 5.4G. Ok, thanks for this, now the spec is closer to the proposed changes. On some platforms it's still unclear if the default max rate in the lack of a VBT limit is HBR2 or HBR3. The ADL-S overview at Bspec/53597 is clear now wrt. this: (*) "May require retimer on motherboard. The OEM must use VBT to limit the link rate to HBR2 if HBR3 not supported by motherboard." ideally it should still clarify if the potential retimer requirement applies to both eDP and DP or only to DP. I still see the followings to adjust in the spec so that it reflects the patch: - ICL - bspec/20584: "Increased IO voltage may be required to support HBR3 for the highest DisplayPort and eDP resolutions." should be changed to (*) above mentioning that HBR3 is only supported on eDP. - bspec/20598: "Combo HBR3: OEM must use VBT to specify a miximum that is tolerated by the board design." The DP/HBR3 support on ICL should be removed. For eDP/HBR3 on ICL the above comment should be changed to (*). - JSL - bspec/32247: "Increased IO voltage may be required to support HBR3 for the highest DisplayPort resolutions." should be removed/changed to (*). - bspec/20598: "OEM must use VBT to specify a miximum that is tolerated by the board design." should be changed to (*). - TGL: - bspec/49201: "Combo HBR3: OEM must use VBT to specify a miximum that is tolerated by the board design." The DP/HBR3 support should be removed, for eDP/HBR3 the above should be changed to (*). - RKL: - bspec/49201, 49204: Remove the RKL tag, since there is a separate page for RKL. - bspec/49202: "Combo HBR3: Only supported on SKUs with higher I/O voltage" should be changed to (*). - ADLS: - bspec/49201, 49204: The ADLS tag should be removed, since there is a separate page for ADLS. 
- bspec/53720: "Combo HBR3: OEM must use VBT to specify a miximum that is tolerated by the board design." should be changed to (*). - DG1: - bspec/49205: "Combo HBR3: Only supported on SKUs with higher I/O voltage" should be changed to (*) above. - DG2: - bspec/53657: For Combo HBR3 (*) should be added. - bspec/54034: For Combo HBR3 (*) should be added. - ADLP: - bspec/49185: "Combo DP/HBR3: OEM must use VBT to specify a miximum that is tolerated by the board design. An external re-timer may be needed." should be changed to (*). Also could you add a debug print with the voltage configuration of combo PHYs somewhere in intel_combo_phy.c? > From what I understand, we can depend upon the VBT's rate, and if there are > some low voltage I/O SKUs that do not support HBR3 rate, it should be > limited by the VBT. > > Thanks & Regards, > > Ankit > > > > BR, > > > Jani. > > > > > > On Tue, 05 Oct 2021, "Nautiyal, Ankit K" > > > wrote: > > > > On 10/5/2021 1:34 PM, Jani Nikula wrote: > > > > > On Tue, 05 Oct 2021, Ankit Nautiyal > > > > > wrote: > > > > > > The low voltage sku check can be ignored as OEMs need to consider > > > > > > that > > > > > > when designing the board and then put any limits in VBT. > > > > > "can" or "must"? > > > > > > > > > > VBT has been notoriously buggy over the
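For the requested debug print, something like the following minimal sketch would do, reusing the VOLTAGE_INFO_* masks that intel_combo_phy.c already decodes for the procmon references (the function name and message format are just placeholders):

static void intel_combo_phy_dbg_voltage(struct drm_i915_private *i915,
					enum phy phy)
{
	u32 val = intel_de_read(i915, ICL_PORT_COMP_DW3(phy));
	const char *voltage;

	switch (val & VOLTAGE_INFO_MASK) {
	case VOLTAGE_INFO_0_85V:
		voltage = "0.85V";
		break;
	case VOLTAGE_INFO_0_95V:
		voltage = "0.95V";
		break;
	case VOLTAGE_INFO_1_05V:
		voltage = "1.05V";
		break;
	default:
		voltage = "reserved";
		break;
	}

	drm_dbg_kms(&i915->drm, "combo PHY %c I/O voltage: %s\n",
		    phy_name(phy), voltage);
}

That would make it possible to correlate a low voltage SKU with a missing VBT limit from a single log.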
Re: [Intel-gfx] [PATCH] drm/i915/display: Remove check for low voltage sku for max dp source rate
On Wed, 13 Oct 2021, Imre Deak wrote: > On Thu, Oct 07, 2021 at 01:19:25PM +0530, Nautiyal, Ankit K wrote: >> >> On 10/5/2021 9:01 PM, Imre Deak wrote: >> > On Tue, Oct 05, 2021 at 01:34:21PM +0300, Jani Nikula wrote: >> > > Cc: Imre, I think you were involved in adding the checks. >> > About ADL-S the spec says: >> > >> > Bspec 53597: >> > Combo Port Maximum Speed: >> > OEM must use VBT to specify a maximum that is tolerated by the board >> > design. >> > >> > Combo Port HBR3 support: >> > May require retimer on motherboard. The OEM must use VBT to limit the link >> > rate to HBR2 if HBR3 not supported by motherboard. >> > >> > Bspec/49201: >> > Combo Port HBR3/6.48GHz support: >> > Only supported on SKUs with higher I/O voltage >> > >> > I take the above meaning that only high voltage SKUs support HBR3 and >> > on those SKUs the OEM must limit this to HBR2 if HBR3 would require a >> > retimer on the board, but the board doesn't have this. >> > >> > If the above isn't correct and low voltage SKUs also in fact support >> > HBR3 (with retimers if necessary) then this should imo clarified at >> > Bspec/49201. The VBT limit could be used then if present, ignoring the >> > low voltage SKU readout. >> >> Thanks Imre for the inputs. >> >> As you have mentioned note : rate >5.4 G supported only on High voltage I/O, >> is mentioned for platforms like ICL, JSL and Display 12 platforms. >> >> I had again asked the HW team and VBT/GOP team whether we can safely rely on >> VBT for the max rate for these platforms, without worrying about the SKU's >> IO Voltage, and also requested them to update the Bspec page for the same. >> >> In response the Bspec pages 49201, 20598 are now updated with the note "OEM >> must use VBT to specify a maximum that is tolerated by the board design" for >> the rates above 5.4G. > > Ok, thanks for this, now the spec is closer to the proposed changes. On > some platforms it's still unclear if the default max rate in the lack of > a VBT limit is HBR2 or HBR3. The ADL-S overview at Bspec/53597 is clear > now wrt. this: > > (*) "May require retimer on motherboard. The OEM must use VBT to limit the > link rate > to HBR2 if HBR3 not supported by motherboard." > > ideally it should still clarify if the potential retimer requirement applies > to > both eDP and DP or only to DP. > > I still see the followings to adjust in the spec so that it reflects > the patch: > > - ICL > - bspec/20584: > "Increased IO voltage may be required to support HBR3 for the highest > DisplayPort > and eDP resolutions." > > should be changed to (*) above mentioning that HBR3 is only supported on > eDP. > > - bspec/20598: > "Combo HBR3: OEM must use VBT to specify a miximum that is tolerated by > the > board design." > > The DP/HBR3 support on ICL should be removed. > > For eDP/HBR3 on ICL the above comment should be changed to (*). > > - JSL > - bspec/32247: > "Increased IO voltage may be required to support HBR3 for the highest > DisplayPort > resolutions." > > should be removed/changed to (*). > > - bspec/20598: > "OEM must use VBT to specify a miximum that is tolerated by the > board design." > > should be changed to (*). > > - TGL: > - bspec/49201: > "Combo HBR3: OEM must use VBT to specify a miximum that is tolerated > by the board design." > > The DP/HBR3 support should be removed, for eDP/HBR3 the above should > be changed to (*). > > - RKL: > - bspec/49201, 49204: > Remove the RKL tag, since there is a separate page for RKL. 
> > - bspec/49202: > "Combo HBR3: Only supported on SKUs with higher I/O voltage" > > should be changed to (*). > > - ADLS: > - bspec/49201, 49204: > The ADLS tag should be removed, since there is a separate page for ADLS. > > - bspec/53720: > "Combo HBR3: OEM must use VBT to specify a miximum that is tolerated by > the > board design." > > should be changed to (*). > > - DG1: > - bspec/49205: > "Combo HBR3: Only supported on SKUs with higher I/O voltage" > > should be changed to (*) above. > > - DG2: > - bspec/53657: > For Combo HBR3 (*) should be added. > > - bspec/54034: > For Combo HBR3 (*) should be added. > > - ADLP: > - bspec/49185: > "Combo DP/HBR3: OEM must use VBT to specify a miximum that is tolerated by > the board design. An external re-timer may be needed." > > should be changed to (*). > > > Also could you add a debug print with the voltage configuration of combo > PHYs somewhere in intel_combo_phy.c? > >> From what I understand, we can depend upon the VBT's rate, and if there are >> some low voltage I/O SKUs that do not support HBR3 rate, it should be >> limited by the VBT. >> >> Thanks & Regards, >> >> Ankit >> >> > > BR, >> > > Jani. >> > > >> > > On Tue, 05 Oct 2021, "Nautiyal, Ankit K" >> > > wrote: >> > > > On 10/5/2021 1:34 PM, Jani Nikula wrote: >> > > > > On Tue, 05 Oct 2021, Ankit Naut
Re: [Intel-gfx] [PATCH] drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation.
On 13/10/2021 15:00, Daniel Vetter wrote: On Wed, Oct 13, 2021 at 02:32:03PM +0200, Maarten Lankhorst wrote: No memory should be allocated when calling i915_gem_object_wait, because it may be called to idle a BO when evicting memory. Fix this by using dma_resv_iter helpers to call i915_gem_object_wait_fence() on each fence, which cleans up the code a lot. Also remove dma_resv_prune, it's questionably. This will result in the following lockdep splat. <4> [83.538517] == <4> [83.538520] WARNING: possible circular locking dependency detected <4> [83.538522] 5.15.0-rc5-CI-Trybot_8062+ #1 Not tainted <4> [83.538525] -- <4> [83.538527] gem_render_line/5242 is trying to acquire lock: <4> [83.538530] 8275b1e0 (fs_reclaim){+.+.}-{0:0}, at: __kmalloc_track_caller+0x56/0x270 <4> [83.538538] but task is already holding lock: <4> [83.538540] 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c7/0x970 [i915] <4> [83.538638] which lock already depends on the new lock. <4> [83.538642] the existing dependency chain (in reverse order) is: <4> [83.538645] -> #1 (&vm->mutex/1){+.+.}-{3:3}: <4> [83.538649]lock_acquire+0xd3/0x310 <4> [83.538654]i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915] <4> [83.538730]i915_address_space_init+0xf5/0x1b0 [i915] <4> [83.538794]ppgtt_init+0x55/0x70 [i915] <4> [83.538856]gen8_ppgtt_create+0x44/0x5d0 [i915] <4> [83.538912]i915_ppgtt_create+0x28/0xf0 [i915] <4> [83.538971]intel_gt_init+0x130/0x3b0 [i915] <4> [83.539029]i915_gem_init+0x14b/0x220 [i915] <4> [83.539100]i915_driver_probe+0x97e/0xdd0 [i915] <4> [83.539149]i915_pci_probe+0x43/0x1d0 [i915] <4> [83.539197]pci_device_probe+0x9b/0x110 <4> [83.539201]really_probe+0x1b0/0x3b0 <4> [83.539205]__driver_probe_device+0xf6/0x170 <4> [83.539208]driver_probe_device+0x1a/0x90 <4> [83.539210]__driver_attach+0x93/0x160 <4> [83.539213]bus_for_each_dev+0x72/0xc0 <4> [83.539216]bus_add_driver+0x14b/0x1f0 <4> [83.539220]driver_register+0x66/0xb0 <4> [83.539222]hdmi_get_spk_alloc+0x1f/0x50 [snd_hda_codec_hdmi] <4> [83.539227]do_one_initcall+0x53/0x2e0 <4> [83.539230]do_init_module+0x55/0x200 <4> [83.539234]load_module+0x2700/0x2980 <4> [83.539237]__do_sys_finit_module+0xaa/0x110 <4> [83.539241]do_syscall_64+0x37/0xb0 <4> [83.539244]entry_SYSCALL_64_after_hwframe+0x44/0xae <4> [83.539247] -> #0 (fs_reclaim){+.+.}-{0:0}: <4> [83.539251]validate_chain+0xb37/0x1e70 <4> [83.539254]__lock_acquire+0x5a1/0xb70 <4> [83.539258]lock_acquire+0xd3/0x310 <4> [83.539260]fs_reclaim_acquire+0x9d/0xd0 <4> [83.539264]__kmalloc_track_caller+0x56/0x270 <4> [83.539267]krealloc+0x48/0xa0 <4> [83.539270]dma_resv_get_fences+0x1c3/0x280 <4> [83.539274]i915_gem_object_wait+0x1ff/0x410 [i915] <4> [83.539342]i915_gem_evict_for_node+0x16b/0x440 [i915] <4> [83.539412]i915_gem_gtt_reserve+0xff/0x130 [i915] <4> [83.539482]i915_vma_pin_ww+0x765/0x970 [i915] <4> [83.539556]eb_validate_vmas+0x6fe/0x8e0 [i915] <4> [83.539626]i915_gem_do_execbuffer+0x9a6/0x20a0 [i915] <4> [83.539693]i915_gem_execbuffer2_ioctl+0x11f/0x2c0 [i915] <4> [83.539759]drm_ioctl_kernel+0xac/0x140 <4> [83.539763]drm_ioctl+0x201/0x3d0 <4> [83.539766]__x64_sys_ioctl+0x6a/0xa0 <4> [83.539769]do_syscall_64+0x37/0xb0 <4> [83.539772]entry_SYSCALL_64_after_hwframe+0x44/0xae <4> [83.539775] other info that might help us debug this: <4> [83.539778] Possible unsafe locking scenario: <4> [83.539781]CPU0CPU1 <4> [83.539783] <4> [83.539785] lock(&vm->mutex/1); <4> [83.539788]lock(fs_reclaim); <4> [83.539791]lock(&vm->mutex/1); <4> [83.539794] lock(fs_reclaim); <4> [83.539796] *** DEADLOCK *** <4> 
[83.539799] 3 locks held by gem_render_line/5242: <4> [83.539802] #0: c9d4bbf0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_gem_do_execbuffer+0x8e5/0x20a0 [i915] <4> [83.539870] #1: 88811e48bae8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: eb_validate_vmas+0x81/0x8e0 [i915] <4> [83.539936] #2: 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c7/0x970 [i915] <4> [83.540011] stack backtrace: <4> [83.540014] CPU: 2 PID: 5242 Comm: gem_render_line Not tainted 5.15.0-rc5-CI-Trybot_8062+ #1 <4> [83.540019] Hardware name: Intel(R) Client Systems NUC11TNHi3/NUC11TNBi3, BIOS TNTGL357.0038.2020.1124.1648 11/24/2020 <4> [83.540023] Call Trace: <4> [83.540026] dump_stack_lvl+0x56/0x7b <4> [83.540030] check_noncircular+0x12e/0x150 <4> [83.540034] ? _raw_spin_unlock_irqrestore+0x50/0x60 <4> [
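The allocation-free shape of the wait, reduced to a sketch (wait_resv() stands in for i915_gem_object_wait_reservation()): dma_resv_get_fences() kreallocs a snapshot array, which is exactly the fs_reclaim edge in the splat above, while the unlocked cursor walks the fence list in place:

static long wait_resv(struct dma_resv *resv, bool intr, long timeout)
{
	struct dma_resv_iter cursor;
	struct dma_fence *fence;

	dma_resv_iter_begin(&cursor, resv, true);
	dma_resv_for_each_fence_unlocked(&cursor, fence) {
		/* returns remaining jiffies, 0 on timeout, negative on error */
		timeout = dma_fence_wait_timeout(fence, intr, timeout);
		if (timeout <= 0)
			break;
	}
	dma_resv_iter_end(&cursor);

	return timeout;
}

No memory is allocated anywhere in this path, so it is safe to call with vm->mutex held on the eviction path.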
Re: [Intel-gfx] [PATCH 0/1] drm/i915: vlv sideband
On Wed, Oct 13, 2021 at 01:47:09PM +0300, Jani Nikula wrote: On Wed, 13 Oct 2021, Ville Syrjälä wrote: On Wed, Oct 13, 2021 at 01:11:58PM +0300, Jani Nikula wrote: Three main ideas here: - vlv sideband only has the name "sideband" in common with the rest of intel_sideband.[ch] I wouldn't put it like that. There are two actual sideband implementations in that file: - vlv/chv iosf sideband (vlv_sideband) - lpt/wpt iosf sideband (intel_sbi) And the third thing in that file is the snb+ pcode mailbox stuff, which has nothing to do with sideband. Fair enough... but no opposition to the splitting out of vlv/chv iosf sideband? vlv_sideband.[ch] like here? I'm fine with renaming too. I can follow up with lpt/wpt iosf split out (intel_sbi.[ch]?) and snb+ pcode (intel_pcode.[ch]?). yeah, I think that if we move intel_pcode.[ch] out, then we probably don't even have to worry about the iosf_* calls for other archs. The common stuff would be in pcode and the others would be compiled out for archs that don't have it (i.e. only x86 adds it). +Siva, who was looking into this iosf abstraction. Lucas De Marchi I think we've just put all of them together way back when this was all probably bundled in i915_drv.c or something... BR, Jani. -- Jani Nikula, Intel Open Source Graphics Center
Re: [Intel-gfx] [RFC 6/8] drm/i915: Make some recently added vfuncs use full scheduling attribute
On 13/10/2021 13:01, Daniel Vetter wrote: On Wed, Oct 06, 2021 at 10:12:29AM -0700, Matthew Brost wrote: On Mon, Oct 04, 2021 at 03:36:48PM +0100, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Code added in 71ed60112d5d ("drm/i915: Add kick_backend function to i915_sched_engine") and ee242ca704d3 ("drm/i915/guc: Implement GuC priority management") introduced some scheduling related vfuncs which take integer request priority as an argument. Make them instead take struct i915_sched_attr, which is the type encapsulating this information, so it probably aligns with the design better. It definitely enables extending the set of scheduling attributes. Understand the motivation here but the i915_scheduler is going to disappear when we move to the DRM scheduler, or at least its functionality of priority inheritance will be pushed into the DRM scheduler. I'd be very careful making any changes here as the priority in the DRM scheduler is defined as a single enum: Yeah I'm not sure it makes sense to build this and make the conversion to drm/sched even harder. We've already merged a lot of code with a "we'll totally convert to drm/sched right after" promise, there's not really room for more fun like this built on top of i915-scheduler. It is not really fun on top of i915-scheduler. It is fun on top of the concept of uapi gem context priority. As long as there is gem context priority, and requests inherit from it, the concept works. This is demonstrated by the fact it ties in with the GuC backend which reduces to three priorities already. It is limited granularity but it does something. Implementation details aside, the key question is the proposal to tie process nice to GPU scheduling priority. There seems to be interest from other parties so there probably is something here. But I do plan to simplify this RFC to not add anything to i915_sched_attr and also drop the task sched attr change notifier. Regards, Tvrtko -Daniel /* These are often used as an (initial) index * to an array, and as such should start at 0. */ enum drm_sched_priority { DRM_SCHED_PRIORITY_MIN, DRM_SCHED_PRIORITY_NORMAL, DRM_SCHED_PRIORITY_HIGH, DRM_SCHED_PRIORITY_KERNEL, DRM_SCHED_PRIORITY_COUNT, DRM_SCHED_PRIORITY_UNSET = -2 }; Adding a field to the i915_sched_attr is fairly easy as we already have a structure but changing the DRM scheduler might be a tougher sell. Anyway, can you make this work without adding the 'nice' field to i915_sched_attr? Might be worth exploring so when we move to the DRM scheduler this feature drops in a little cleaner.
Matt Signed-off-by: Tvrtko Ursulin Cc: Matthew Brost Cc: Daniele Ceraolo Spurio --- drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 4 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c| 3 ++- drivers/gpu/drm/i915/i915_scheduler.c| 4 ++-- drivers/gpu/drm/i915/i915_scheduler_types.h | 4 ++-- 4 files changed, 9 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index 7147fe80919e..e91d803a6453 100644 --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3216,11 +3216,13 @@ static bool can_preempt(struct intel_engine_cs *engine) return engine->class != RENDER_CLASS; } -static void kick_execlists(const struct i915_request *rq, int prio) +static void kick_execlists(const struct i915_request *rq, + const struct i915_sched_attr *attr) { struct intel_engine_cs *engine = rq->engine; struct i915_sched_engine *sched_engine = engine->sched_engine; const struct i915_request *inflight; + const int prio = attr->priority; /* * We only need to kick the tasklet once for the high priority diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index ba0de35f6323..b5883a4365ca 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -2414,9 +2414,10 @@ static void guc_init_breadcrumbs(struct intel_engine_cs *engine) } static void guc_bump_inflight_request_prio(struct i915_request *rq, - int prio) + const struct i915_sched_attr *attr) { struct intel_context *ce = rq->context; + const int prio = attr->priority; u8 new_guc_prio = map_i915_prio_to_guc_prio(prio); /* Short circuit function */ diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 762127dd56c5..534bab99fcdc 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -255,7 +255,7 @@ static void __i915_schedule(struct i915_sched_node *node, /* Must
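To make the granularity argument concrete: any mapping from nice (-20..19) onto the quoted enum collapses to something like this purely illustrative sketch, so almost all of the nice range is lost:

static enum drm_sched_priority nice_to_drm_sched_priority(int nice)
{
	if (nice < 0)
		return DRM_SCHED_PRIORITY_HIGH;		/* boosted tasks */
	if (nice > 0)
		return DRM_SCHED_PRIORITY_MIN;		/* de-prioritized tasks */
	return DRM_SCHED_PRIORITY_NORMAL;
}

Nothing like this exists in the DRM scheduler today; it just illustrates why a full i915_sched_attr (or a wider priority range) carries more information than the three usable levels above.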
Re: [Intel-gfx] [PATCH 1/1] drm/i915: split out vlv sideband to a separate file
On Wed, Oct 13, 2021 at 01:11:59PM +0300, Jani Nikula wrote:
The VLV/CHV sideband code is pretty distinct from the rest of the sideband code. Split it out to new vlv_sideband.[ch]. Pure code movement with relevant #include changes, and a tiny checkpatch fix on top.
Cc: Lucas De Marchi
Cc: Ville Syrjälä
Signed-off-by: Jani Nikula

Acked-by: Lucas De Marchi

thanks
Lucas De Marchi
Re: [Intel-gfx] [PATCH] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup
On 13/10/2021 13:06, Daniel Vetter wrote:
On Tue, Oct 05, 2021 at 03:05:25PM +0200, Thomas Hellström wrote:
Hi, Tvrtko,
On 10/5/21 13:31, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin

In short this makes i915 work for hybrid setups (DRI_PRIME=1 with Mesa) when rendering is done on Intel dgfx and scanout/composition on Intel igfx.

Before this patch the driver was not quite ready for that setup, mainly because it was able to emit a semaphore wait between the two GPUs, which results in deadlocks because the semaphore target location in the HWSP is neither shared between the two, nor mapped in both GGTT spaces.

To fix it the patch adds an additional check to a couple of relevant code paths in order to prevent using semaphores for inter-engine synchronisation when the relevant objects are not in the same GGTT space.

v2:
* Avoid adding rq->i915. (Chris)
v3:
* Use GGTT which describes the limit more precisely.

Signed-off-by: Tvrtko Ursulin
Cc: Daniel Vetter
Cc: Matthew Auld
Cc: Thomas Hellström

An IMO pretty important bugfix. I read up a bit on the previous discussion on this, and from what I understand the other two options were 1) ripping out the semaphore code, 2) considering dma-fences from other instances of the same driver as foreign. For imported dma-bufs we do 2), but particularly with lmem and p2p that's a more straightforward decision. I don't think 1) is a reasonable approach to fix this bug (but perhaps as a general cleanup?), and for 2) yes I guess we might end up doing that, unless we find some real benefits in treating same-driver-separate-device dma-fences as local, but for this particular bug, IMO this is a reasonable fix.

The foreign dma-fences have uapi impact, which Tvrtko shrugged off as "it's a good idea", and no, it's really just not. So we still need to do this properly.

I always said let's merge the fix and discuss it. The fix only improved one failure and did not introduce any new issues you are worried about. They were all already there.

So let's start the discussion: why is it not a good idea to extend the concept of priority inheritance in the hybrid case? Today we can have a high priority compositor waiting for client rendering, or even I915_PRIORITY_DISPLAY which I _think_ somehow ties into page flips with full screen stuff, and with igpu we do priority inheritance in those cases. Why is it a bad idea to do the same in the hybrid setup?

Regards,
Tvrtko

Reviewed-by: Thomas Hellström

But I'm also ok with just merging this as-is so the situation doesn't become too entertaining.
-Daniel --- drivers/gpu/drm/i915/i915_request.c | 12 +++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 79da5eca60af..4f189982f67e 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1145,6 +1145,12 @@ __emit_semaphore_wait(struct i915_request *to, return 0; } +static bool +can_use_semaphore_wait(struct i915_request *to, struct i915_request *from) +{ + return to->engine->gt->ggtt == from->engine->gt->ggtt; +} + static int emit_semaphore_wait(struct i915_request *to, struct i915_request *from, @@ -1153,6 +1159,9 @@ emit_semaphore_wait(struct i915_request *to, const intel_engine_mask_t mask = READ_ONCE(from->engine)->mask; struct i915_sw_fence *wait = &to->submit; + if (!can_use_semaphore_wait(to, from)) + goto await_fence; + if (!intel_context_use_semaphores(to->context)) goto await_fence; @@ -1256,7 +1265,8 @@ __i915_request_await_execution(struct i915_request *to, * immediate execution, and so we must wait until it reaches the * active slot. */ - if (intel_engine_has_semaphores(to->engine) && + if (can_use_semaphore_wait(to, from) && + intel_engine_has_semaphores(to->engine) && !i915_request_has_initial_breadcrumb(to)) { err = __emit_semaphore_wait(to, from, from->fence.seqno - 1); if (err < 0)
Re: [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu
On 13/10/2021 01:56, Umesh Nerlige Ramappa wrote:

With GuC handling scheduling, i915 is not aware of the time that a context is scheduled in and out of the engine. Since i915 pmu relies on this info to provide engine busyness to the user, GuC shares this info with i915 for all engines using shared memory. For each engine, this info contains:

- total busyness: total time that the context was running (total)
- id: id of the running context (id)
- start timestamp: timestamp when the context started running (start)

At the time (now) of sampling the engine busyness, if the id is valid (!= ~0), and start is non-zero, then the context is considered to be active and the engine busyness is calculated using the below equation

	engine busyness = total + (now - start)

All times are obtained from the gt clock base. For inactive contexts, engine busyness is just equal to the total.

The start and total values provided by GuC are 32 bits and wrap around in a few minutes. Since perf pmu provides busyness as 64 bit monotonically increasing values, there is a need for this implementation to account for overflows and extend the time to 64 bits before returning busyness to the user. In order to do that, a worker runs periodically at frequency = 1/8th the time it takes for the timestamp to wrap. As an example, that would be once in 27 seconds for a gt clock frequency of 19.2 MHz.

Note: There might be an overaccounting of busyness due to the fact that GuC may be updating the total and start values while kmd is reading them (i.e. kmd may read the updated total and the stale start). In such a case, the user may see a higher busyness value followed by smaller ones which would eventually catch up to the higher value.

v2: (Tvrtko)
- Include details in commit message
- Move intel engine busyness function into execlist code
- Use union inside engine->stats
- Use natural type for ping delay jiffies
- Drop active_work condition checks
- Use for_each_engine if iterating all engines
- Drop seq locking, use spinlock at guc level to update engine stats
- Document worker specific details

v3: (Tvrtko/Umesh)
- Demarcate guc and execlist stat objects with comments
- Document known over-accounting issue in commit
- Provide a consistent view of guc state
- Add hooks to gt park/unpark for guc busyness
- Stop/start worker in gt park/unpark path
- Drop inline
- Move spinlock and worker inits to guc initialization
- Drop helpers that are called only once

v4: (Tvrtko/Matt/Umesh)
- Drop addressed opens from commit message
- Get runtime pm in ping, remove from the park path
- Use cancel_delayed_work_sync in disable_submission path
- Update stats during reset prepare
- Skip ping if reset in progress
- Explicitly name execlists and guc stats objects
- Since disable_submission is called from many places, move resetting stats to intel_guc_submission_reset_prepare

v5: (Tvrtko)
- Add a trylock helper that does not sleep and synchronize PMU event callbacks and worker with gt reset

Looks good to me now, for some combination of high-level and incomplete low-level review (I did not check the overflow handling or the GuC page layout and flow). Both patches:

Acked-by: Tvrtko Ursulin

Do you have someone available to check the parts I did not and r-b?
Regards, Tvrtko Signed-off-by: John Harrison Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 33 ++- .../drm/i915/gt/intel_execlists_submission.c | 34 +++ drivers/gpu/drm/i915/gt/intel_gt_pm.c | 2 + drivers/gpu/drm/i915/gt/intel_reset.c | 16 ++ drivers/gpu/drm/i915/gt/intel_reset.h | 1 + .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.h| 30 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c| 21 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h| 5 + drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 13 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 267 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 + drivers/gpu/drm/i915/i915_reg.h | 2 + 14 files changed, 427 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 38436f4b5706..6b783fdcba2a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1873,23 +1873,6 @@ void intel_engine_dump(struct intel_engine_cs *engine, intel_engine_print_breadcrumbs(engine, m); } -static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine, - ktime_t *now) -{ - struct intel_engine_execlists_stats *stats = &engine->stats.execlists; - ktime_t total = stats->total; - - /* -* If the engine is executing something at the moment -* add it to the total. -*/ - *now =
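As a sanity check on the numbers in the commit message above: at 19.2 MHz a 32-bit counter wraps every 2^32 / 19.2e6 ≈ 224 seconds, and one eighth of that is the ~27 second ping period quoted. The 32-to-64-bit extension itself can be sketched as follows — a hypothetical helper, not the patch code, under the stated assumption that the counter is sampled at least once per wrap period:

#include <linux/kernel.h> /* lower_32_bits() */

static u64 gt_clk_extend(u64 prev, u32 now)
{
	u64 hi = prev & ~0xffffffffull;

	/* Sampled at least once per wrap, so a single carry is enough. */
	if (now < lower_32_bits(prev))
		hi += 0x100000000ull;

	return hi | now;
}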
Re: [Intel-gfx] [PATCH] drm/i915/dg2: Tile 4 plane format support
On 2021-10-12 at 11:28:45 +0300, Stanislav Lisovskiy wrote: > TileF(Tile4 in bspec) format is 4K tile organized into > 64B subtiles with same basic shape as for legacy TileY > which will be supported by Display13. > > v2: - Fixed wrong case condition(Jani Nikula) > - Increased I915_FORMAT_MOD_F_TILED up to 12(Imre Deak) > > v3: - s/I915_TILING_F/TILING_4/g > - s/I915_FORMAT_MOD_F_TILED/I915_FORMAT_MOD_4_TILED/g > - Removed unneeded fencing code > > Cc: Imre Deak > Cc: Matt Roper > Cc: Maarten Lankhorst > Signed-off-by: Stanislav Lisovskiy > Signed-off-by: Matt Roper > Signed-off-by: Juha-Pekka Heikkilä > --- > drivers/gpu/drm/i915/display/intel_display.c | 2 ++ > drivers/gpu/drm/i915/display/intel_fb.c | 7 > drivers/gpu/drm/i915/display/intel_fbc.c | 1 + > .../drm/i915/display/skl_universal_plane.c| 36 ++- > drivers/gpu/drm/i915/i915_drv.h | 1 + > drivers/gpu/drm/i915/i915_pci.c | 1 + > drivers/gpu/drm/i915/i915_reg.h | 1 + > drivers/gpu/drm/i915/intel_device_info.h | 1 + > drivers/gpu/drm/i915/intel_pm.c | 1 + > include/uapi/drm/drm_fourcc.h | 8 + > 10 files changed, 50 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/i915/display/intel_display.c > b/drivers/gpu/drm/i915/display/intel_display.c > index 4f0badb11bbb..524a20fa67ce 100644 > --- a/drivers/gpu/drm/i915/display/intel_display.c > +++ b/drivers/gpu/drm/i915/display/intel_display.c > @@ -1325,6 +1325,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc, > case DRM_FORMAT_MOD_LINEAR: > case I915_FORMAT_MOD_X_TILED: > case I915_FORMAT_MOD_Y_TILED: > + case I915_FORMAT_MOD_4_TILED: > break; > default: > drm_dbg(&dev_priv->drm, > @@ -9330,6 +9331,7 @@ static int intel_atomic_check_async(struct > intel_atomic_state *state) > case I915_FORMAT_MOD_X_TILED: > case I915_FORMAT_MOD_Y_TILED: > case I915_FORMAT_MOD_Yf_TILED: > + case I915_FORMAT_MOD_4_TILED: > break; > default: > drm_dbg_kms(&i915->drm, > diff --git a/drivers/gpu/drm/i915/display/intel_fb.c > b/drivers/gpu/drm/i915/display/intel_fb.c > index fa1f375e696b..e19739fef825 100644 > --- a/drivers/gpu/drm/i915/display/intel_fb.c > +++ b/drivers/gpu/drm/i915/display/intel_fb.c > @@ -127,6 +127,12 @@ intel_tile_width_bytes(const struct drm_framebuffer *fb, > int color_plane) > return 128; > else > return 512; > + case I915_FORMAT_MOD_4_TILED: > + /* > + * Each 4K tile consists of 64B(8*8) subtiles, with > + * same shape as Y Tile(i.e 4*16B OWords) > + */ > + return 128; > case I915_FORMAT_MOD_Y_TILED_CCS: > if (is_ccs_plane(fb, color_plane)) > return 128; > @@ -305,6 +311,7 @@ unsigned int intel_surf_alignment(const struct > drm_framebuffer *fb, > case I915_FORMAT_MOD_Y_TILED_CCS: > case I915_FORMAT_MOD_Yf_TILED_CCS: > case I915_FORMAT_MOD_Y_TILED: > + case I915_FORMAT_MOD_4_TILED: > case I915_FORMAT_MOD_Yf_TILED: > return 1 * 1024 * 1024; > default: > diff --git a/drivers/gpu/drm/i915/display/intel_fbc.c > b/drivers/gpu/drm/i915/display/intel_fbc.c > index 1f66de77a6b1..f079a771f802 100644 > --- a/drivers/gpu/drm/i915/display/intel_fbc.c > +++ b/drivers/gpu/drm/i915/display/intel_fbc.c > @@ -747,6 +747,7 @@ static bool tiling_is_valid(struct drm_i915_private > *dev_priv, > case DRM_FORMAT_MOD_LINEAR: > case I915_FORMAT_MOD_Y_TILED: > case I915_FORMAT_MOD_Yf_TILED: > + case I915_FORMAT_MOD_4_TILED: > return DISPLAY_VER(dev_priv) >= 9; > case I915_FORMAT_MOD_X_TILED: > return true; > diff --git a/drivers/gpu/drm/i915/display/skl_universal_plane.c > b/drivers/gpu/drm/i915/display/skl_universal_plane.c > index a0e53a3b267a..586aa660ba7a 100644 > --- 
a/drivers/gpu/drm/i915/display/skl_universal_plane.c > +++ b/drivers/gpu/drm/i915/display/skl_universal_plane.c > @@ -207,6 +207,13 @@ static const u64 adlp_step_a_plane_format_modifiers[] = { > DRM_FORMAT_MOD_INVALID > }; > > +static const u64 dg2_plane_format_modifiers[] = { > + I915_FORMAT_MOD_X_TILED, > + I915_FORMAT_MOD_4_TILED, > + DRM_FORMAT_MOD_LINEAR, > + DRM_FORMAT_MOD_INVALID > +}; > + > int skl_format_to_fourcc(int format, bool rgb_order, bool alpha) > { > switch (format) { > @@ -795,6 +802,8 @@ static u32 skl_plane_ctl_tiling(u64 fb_modifier) > return PLANE_CTL_TILED_X; > case I915_FORMAT_MOD_Y_TILED: > return PLANE_CTL_TILED_Y; > + case I915_FORMAT_MOD_4_TILED: > + return PLANE_CTL_TILED_F; > case I915_FORMAT_MOD_Y_TILED_CCS: > case I915
Re: [Intel-gfx] [PATCH] drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation.
On Wed, Oct 13, 2021 at 04:37:03PM +0100, Tvrtko Ursulin wrote: > > On 13/10/2021 15:00, Daniel Vetter wrote: > > On Wed, Oct 13, 2021 at 02:32:03PM +0200, Maarten Lankhorst wrote: > > > No memory should be allocated when calling i915_gem_object_wait, > > > because it may be called to idle a BO when evicting memory. > > > > > > Fix this by using dma_resv_iter helpers to call > > > i915_gem_object_wait_fence() on each fence, which cleans up the code a > > > lot. > > > Also remove dma_resv_prune, it's questionably. > > > > > > This will result in the following lockdep splat. > > > > > > <4> [83.538517] == > > > <4> [83.538520] WARNING: possible circular locking dependency detected > > > <4> [83.538522] 5.15.0-rc5-CI-Trybot_8062+ #1 Not tainted > > > <4> [83.538525] -- > > > <4> [83.538527] gem_render_line/5242 is trying to acquire lock: > > > <4> [83.538530] 8275b1e0 (fs_reclaim){+.+.}-{0:0}, at: > > > __kmalloc_track_caller+0x56/0x270 > > > <4> [83.538538] > > > but task is already holding lock: > > > <4> [83.538540] 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: > > > i915_vma_pin_ww+0x1c7/0x970 [i915] > > > <4> [83.538638] > > > which lock already depends on the new lock. > > > <4> [83.538642] > > > the existing dependency chain (in reverse order) is: > > > <4> [83.538645] > > > -> #1 (&vm->mutex/1){+.+.}-{3:3}: > > > <4> [83.538649]lock_acquire+0xd3/0x310 > > > <4> [83.538654]i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915] > > > <4> [83.538730]i915_address_space_init+0xf5/0x1b0 [i915] > > > <4> [83.538794]ppgtt_init+0x55/0x70 [i915] > > > <4> [83.538856]gen8_ppgtt_create+0x44/0x5d0 [i915] > > > <4> [83.538912]i915_ppgtt_create+0x28/0xf0 [i915] > > > <4> [83.538971]intel_gt_init+0x130/0x3b0 [i915] > > > <4> [83.539029]i915_gem_init+0x14b/0x220 [i915] > > > <4> [83.539100]i915_driver_probe+0x97e/0xdd0 [i915] > > > <4> [83.539149]i915_pci_probe+0x43/0x1d0 [i915] > > > <4> [83.539197]pci_device_probe+0x9b/0x110 > > > <4> [83.539201]really_probe+0x1b0/0x3b0 > > > <4> [83.539205]__driver_probe_device+0xf6/0x170 > > > <4> [83.539208]driver_probe_device+0x1a/0x90 > > > <4> [83.539210]__driver_attach+0x93/0x160 > > > <4> [83.539213]bus_for_each_dev+0x72/0xc0 > > > <4> [83.539216]bus_add_driver+0x14b/0x1f0 > > > <4> [83.539220]driver_register+0x66/0xb0 > > > <4> [83.539222]hdmi_get_spk_alloc+0x1f/0x50 [snd_hda_codec_hdmi] > > > <4> [83.539227]do_one_initcall+0x53/0x2e0 > > > <4> [83.539230]do_init_module+0x55/0x200 > > > <4> [83.539234]load_module+0x2700/0x2980 > > > <4> [83.539237]__do_sys_finit_module+0xaa/0x110 > > > <4> [83.539241]do_syscall_64+0x37/0xb0 > > > <4> [83.539244]entry_SYSCALL_64_after_hwframe+0x44/0xae > > > <4> [83.539247] > > > -> #0 (fs_reclaim){+.+.}-{0:0}: > > > <4> [83.539251]validate_chain+0xb37/0x1e70 > > > <4> [83.539254]__lock_acquire+0x5a1/0xb70 > > > <4> [83.539258]lock_acquire+0xd3/0x310 > > > <4> [83.539260]fs_reclaim_acquire+0x9d/0xd0 > > > <4> [83.539264]__kmalloc_track_caller+0x56/0x270 > > > <4> [83.539267]krealloc+0x48/0xa0 > > > <4> [83.539270]dma_resv_get_fences+0x1c3/0x280 > > > <4> [83.539274]i915_gem_object_wait+0x1ff/0x410 [i915] > > > <4> [83.539342]i915_gem_evict_for_node+0x16b/0x440 [i915] > > > <4> [83.539412]i915_gem_gtt_reserve+0xff/0x130 [i915] > > > <4> [83.539482]i915_vma_pin_ww+0x765/0x970 [i915] > > > <4> [83.539556]eb_validate_vmas+0x6fe/0x8e0 [i915] > > > <4> [83.539626]i915_gem_do_execbuffer+0x9a6/0x20a0 [i915] > > > <4> [83.539693]i915_gem_execbuffer2_ioctl+0x11f/0x2c0 [i915] > > > <4> [83.539759]drm_ioctl_kernel+0xac/0x140 > > > <4> 
[83.539763]drm_ioctl+0x201/0x3d0 > > > <4> [83.539766]__x64_sys_ioctl+0x6a/0xa0 > > > <4> [83.539769]do_syscall_64+0x37/0xb0 > > > <4> [83.539772]entry_SYSCALL_64_after_hwframe+0x44/0xae > > > <4> [83.539775] > > > other info that might help us debug this: > > > <4> [83.539778] Possible unsafe locking scenario: > > > <4> [83.539781]CPU0CPU1 > > > <4> [83.539783] > > > <4> [83.539785] lock(&vm->mutex/1); > > > <4> [83.539788]lock(fs_reclaim); > > > <4> [83.539791]lock(&vm->mutex/1); > > > <4> [83.539794] lock(fs_reclaim); > > > <4> [83.539796] > > > *** DEADLOCK *** > > > <4> [83.539799] 3 locks held by gem_render_line/5242: > > > <4> [83.539802] #0: c9d4bbf0 > > > (reservation_ww_class_acquire){+.+.}-{0:0}, at: > > > i915_gem_do_execbuffer+0x8e5/0x20a0 [i915] > > > <4> [83.539870] #1: 88811e48bae8 > > > (reservation
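For readers following the conversion, the iterator-based wait named in the subject has roughly this shape — a sketch assuming the 5.15-era dma_resv_iter helpers (dma_resv_iter_begin/dma_resv_for_each_fence_unlocked), not Maarten's actual patch body. The point is that fences are visited one by one, so no snapshot array is krealloc'ed and the fs_reclaim dependency in the splat never forms:

#include <linux/dma-resv.h>
#include <linux/dma-fence.h>

static long wait_all_fences(struct dma_resv *resv, bool all_fences,
			    long timeout)
{
	struct dma_resv_iter cursor;
	struct dma_fence *fence;

	dma_resv_iter_begin(&cursor, resv, all_fences);
	dma_resv_for_each_fence_unlocked(&cursor, fence) {
		/* Wait fence by fence; no allocation on this path. */
		timeout = dma_fence_wait_timeout(fence, true, timeout);
		if (timeout <= 0)
			break;
	}
	dma_resv_iter_end(&cursor);

	return timeout;
}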
[Intel-gfx] [PATCH v3] component: do not leave master devres group open after bind
In current code, the devres group for the aggregate master is left open after the call to component_master_add_*(). This leads to problems when the master does further managed allocations on its own. When any participating driver calls component_del(), this leads to immediate release of resources.

This came up when investigating a page fault occurring with i915 DRM driver unbind with the 5.15-rc1 kernel. The following sequence occurs:

  i915_pci_remove()
    -> intel_display_driver_unregister()
      -> i915_audio_component_cleanup()
        -> component_del()
          -> component.c:take_down_master()
            -> hdac_component_master_unbind() [via master->ops->unbind()]
              -> devres_release_group(master->parent, NULL)

With older kernels this has not caused issues, but with the audio driver moving to use managed interfaces for more of its allocations, this no longer works. Devres log shows the following to occur:

component_master_add_with_match()
[ 126.886032] snd_hda_intel 0000:00:1f.3: DEVRES ADD 323ccdc5 devm_component_match_release (24 bytes)
[ 126.886045] snd_hda_intel 0000:00:1f.3: DEVRES ADD 865cdb29 grp< (0 bytes)
[ 126.886049] snd_hda_intel 0000:00:1f.3: DEVRES ADD 1b480725 grp< (0 bytes)

audio driver completes its PCI probe()
[ 126.892238] snd_hda_intel 0000:00:1f.3: DEVRES ADD 1b480725 pcim_iomap_release (48 bytes)

component_del() called at DRM/i915 unbind()
[ 137.579422] i915 0000:00:02.0: DEVRES REL ef44c293 grp< (0 bytes)
[ 137.579445] snd_hda_intel 0000:00:1f.3: DEVRES REL 865cdb29 grp< (0 bytes)
[ 137.579458] snd_hda_intel 0000:00:1f.3: DEVRES REL 1b480725 pcim_iomap_release (48 bytes)

So the "devres_release_group(master->parent, NULL)" ends up freeing the pcim_iomap allocation. Upon next runtime resume, the audio driver will cause a page fault as the iomap alloc was released without the driver knowing about it.

Fix this issue by using the "struct master" pointer as the identifier for the devres group, and by closing the devres group after the master->ops->bind() call is done. This allows devres allocations done by the driver acting as master to be isolated from the binding state of the aggregate driver.
This modifies the logic originally introduced in commit 9e1ccb4a7700 ("drivers/base: fix devres handling for master device") Cc: sta...@vger.kernel.org BugLink: https://gitlab.freedesktop.org/drm/intel/-/issues/4136 Fixes: 9e1ccb4a7700 ("drivers/base: fix devres handling for master device") Signed-off-by: Kai Vehmanen Acked-by: Imre Deak Acked-by: Russell King (Oracle) --- drivers/base/component.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) V3 changes: - address feedback from Greg KH, add a Fixes tag and cc stable V2 changes: - after review form Imre and Russell, removing RFC tag - rebased on top of 5.15-rc2 (V1 was on drm-tip) - CI test results for V1 show that this patch fixes multiple failures in i915 unbind and module reload tests: https://patchwork.freedesktop.org/series/94889/ diff --git a/drivers/base/component.c b/drivers/base/component.c index 5e79299f6c3f..870485cbbb87 100644 --- a/drivers/base/component.c +++ b/drivers/base/component.c @@ -246,7 +246,7 @@ static int try_to_bring_up_master(struct master *master, return 0; } - if (!devres_open_group(master->parent, NULL, GFP_KERNEL)) + if (!devres_open_group(master->parent, master, GFP_KERNEL)) return -ENOMEM; /* Found all components */ @@ -258,6 +258,7 @@ static int try_to_bring_up_master(struct master *master, return ret; } + devres_close_group(master->parent, NULL); master->bound = true; return 1; } @@ -282,7 +283,7 @@ static void take_down_master(struct master *master) { if (master->bound) { master->ops->unbind(master->parent); - devres_release_group(master->parent, NULL); + devres_release_group(master->parent, master); master->bound = false; } } base-commit: 9e1ff307c779ce1f0f810c7ecce3d95bbae40896 -- 2.33.0
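For readers unfamiliar with devres groups, the pattern the fix relies on looks roughly like this — a hypothetical aggregate driver, illustrative only; the function names are made up but devres_open_group()/devres_close_group()/devres_release_group() are the real API:

#include <linux/device.h>

static int example_master_bind(struct device *parent, void *group_id)
{
	/* Allocations made while the group is open belong to it. */
	if (!devres_open_group(parent, group_id, GFP_KERNEL))
		return -ENOMEM;

	/* ... managed allocations for the aggregate device ... */

	/* Close so the master's own later allocations stay outside. */
	devres_close_group(parent, group_id);
	return 0;
}

static void example_master_unbind(struct device *parent, void *group_id)
{
	/* Frees only what was allocated inside the group above. */
	devres_release_group(parent, group_id);
}

Using a unique pointer (the struct master, in the patch) as group_id is what keeps this group distinct from any anonymous groups the driver core may have opened on the same device.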
Re: [Intel-gfx] [PATCH 2/2] drm/i915/pmu: Connect engine busyness stats from GuC to pmu
On Wed, Oct 13, 2021 at 05:06:26PM +0100, Tvrtko Ursulin wrote:
On 13/10/2021 01:56, Umesh Nerlige Ramappa wrote:

With GuC handling scheduling, i915 is not aware of the time that a context is scheduled in and out of the engine. Since i915 pmu relies on this info to provide engine busyness to the user, GuC shares this info with i915 for all engines using shared memory. For each engine, this info contains:

- total busyness: total time that the context was running (total)
- id: id of the running context (id)
- start timestamp: timestamp when the context started running (start)

At the time (now) of sampling the engine busyness, if the id is valid (!= ~0), and start is non-zero, then the context is considered to be active and the engine busyness is calculated using the below equation

	engine busyness = total + (now - start)

All times are obtained from the gt clock base. For inactive contexts, engine busyness is just equal to the total.

The start and total values provided by GuC are 32 bits and wrap around in a few minutes. Since perf pmu provides busyness as 64 bit monotonically increasing values, there is a need for this implementation to account for overflows and extend the time to 64 bits before returning busyness to the user. In order to do that, a worker runs periodically at frequency = 1/8th the time it takes for the timestamp to wrap. As an example, that would be once in 27 seconds for a gt clock frequency of 19.2 MHz.

Note: There might be an overaccounting of busyness due to the fact that GuC may be updating the total and start values while kmd is reading them (i.e. kmd may read the updated total and the stale start). In such a case, the user may see a higher busyness value followed by smaller ones which would eventually catch up to the higher value.

v2: (Tvrtko)
- Include details in commit message
- Move intel engine busyness function into execlist code
- Use union inside engine->stats
- Use natural type for ping delay jiffies
- Drop active_work condition checks
- Use for_each_engine if iterating all engines
- Drop seq locking, use spinlock at guc level to update engine stats
- Document worker specific details

v3: (Tvrtko/Umesh)
- Demarcate guc and execlist stat objects with comments
- Document known over-accounting issue in commit
- Provide a consistent view of guc state
- Add hooks to gt park/unpark for guc busyness
- Stop/start worker in gt park/unpark path
- Drop inline
- Move spinlock and worker inits to guc initialization
- Drop helpers that are called only once

v4: (Tvrtko/Matt/Umesh)
- Drop addressed opens from commit message
- Get runtime pm in ping, remove from the park path
- Use cancel_delayed_work_sync in disable_submission path
- Update stats during reset prepare
- Skip ping if reset in progress
- Explicitly name execlists and guc stats objects
- Since disable_submission is called from many places, move resetting stats to intel_guc_submission_reset_prepare

v5: (Tvrtko)
- Add a trylock helper that does not sleep and synchronize PMU event callbacks and worker with gt reset

Looks good to me now, for some combination of high-level and incomplete low-level review (I did not check the overflow handling or the GuC page layout and flow). Both patches:

Acked-by: Tvrtko Ursulin

Thanks

Do you have someone available to check the parts I did not and r-b?

I will check with Matt/John.
Regards, Umesh Regards, Tvrtko Signed-off-by: John Harrison Signed-off-by: Umesh Nerlige Ramappa --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 33 ++- .../drm/i915/gt/intel_execlists_submission.c | 34 +++ drivers/gpu/drm/i915/gt/intel_gt_pm.c | 2 + drivers/gpu/drm/i915/gt/intel_reset.c | 16 ++ drivers/gpu/drm/i915/gt/intel_reset.h | 1 + .../gpu/drm/i915/gt/uc/abi/guc_actions_abi.h | 1 + drivers/gpu/drm/i915/gt/uc/intel_guc.h| 30 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c| 21 ++ drivers/gpu/drm/i915/gt/uc/intel_guc_ads.h| 5 + drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 13 + .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 267 ++ .../gpu/drm/i915/gt/uc/intel_guc_submission.h | 2 + drivers/gpu/drm/i915/i915_reg.h | 2 + 14 files changed, 427 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 38436f4b5706..6b783fdcba2a 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1873,23 +1873,6 @@ void intel_engine_dump(struct intel_engine_cs *engine, intel_engine_print_breadcrumbs(engine, m); } -static ktime_t __intel_engine_get_busy_time(struct intel_engine_cs *engine, - ktime_t *now) -{ - struct intel_engine_execlists_stats *stats = &engine->stats.execlists; - ktime_t total = stats->total; - - /* -* If the en
Re: [Intel-gfx] [PATCH] drm/i915: Use dma_resv_iter for waiting in i915_gem_object_wait_reservation.
Op 13-10-2021 om 17:37 schreef Tvrtko Ursulin: > > On 13/10/2021 15:00, Daniel Vetter wrote: >> On Wed, Oct 13, 2021 at 02:32:03PM +0200, Maarten Lankhorst wrote: >>> No memory should be allocated when calling i915_gem_object_wait, >>> because it may be called to idle a BO when evicting memory. >>> >>> Fix this by using dma_resv_iter helpers to call >>> i915_gem_object_wait_fence() on each fence, which cleans up the code a lot. >>> Also remove dma_resv_prune, it's questionably. >>> >>> This will result in the following lockdep splat. >>> >>> <4> [83.538517] == >>> <4> [83.538520] WARNING: possible circular locking dependency detected >>> <4> [83.538522] 5.15.0-rc5-CI-Trybot_8062+ #1 Not tainted >>> <4> [83.538525] -- >>> <4> [83.538527] gem_render_line/5242 is trying to acquire lock: >>> <4> [83.538530] 8275b1e0 (fs_reclaim){+.+.}-{0:0}, at: >>> __kmalloc_track_caller+0x56/0x270 >>> <4> [83.538538] >>> but task is already holding lock: >>> <4> [83.538540] 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: >>> i915_vma_pin_ww+0x1c7/0x970 [i915] >>> <4> [83.538638] >>> which lock already depends on the new lock. >>> <4> [83.538642] >>> the existing dependency chain (in reverse order) is: >>> <4> [83.538645] >>> -> #1 (&vm->mutex/1){+.+.}-{3:3}: >>> <4> [83.538649] lock_acquire+0xd3/0x310 >>> <4> [83.538654] i915_gem_shrinker_taints_mutex+0x2d/0x50 [i915] >>> <4> [83.538730] i915_address_space_init+0xf5/0x1b0 [i915] >>> <4> [83.538794] ppgtt_init+0x55/0x70 [i915] >>> <4> [83.538856] gen8_ppgtt_create+0x44/0x5d0 [i915] >>> <4> [83.538912] i915_ppgtt_create+0x28/0xf0 [i915] >>> <4> [83.538971] intel_gt_init+0x130/0x3b0 [i915] >>> <4> [83.539029] i915_gem_init+0x14b/0x220 [i915] >>> <4> [83.539100] i915_driver_probe+0x97e/0xdd0 [i915] >>> <4> [83.539149] i915_pci_probe+0x43/0x1d0 [i915] >>> <4> [83.539197] pci_device_probe+0x9b/0x110 >>> <4> [83.539201] really_probe+0x1b0/0x3b0 >>> <4> [83.539205] __driver_probe_device+0xf6/0x170 >>> <4> [83.539208] driver_probe_device+0x1a/0x90 >>> <4> [83.539210] __driver_attach+0x93/0x160 >>> <4> [83.539213] bus_for_each_dev+0x72/0xc0 >>> <4> [83.539216] bus_add_driver+0x14b/0x1f0 >>> <4> [83.539220] driver_register+0x66/0xb0 >>> <4> [83.539222] hdmi_get_spk_alloc+0x1f/0x50 [snd_hda_codec_hdmi] >>> <4> [83.539227] do_one_initcall+0x53/0x2e0 >>> <4> [83.539230] do_init_module+0x55/0x200 >>> <4> [83.539234] load_module+0x2700/0x2980 >>> <4> [83.539237] __do_sys_finit_module+0xaa/0x110 >>> <4> [83.539241] do_syscall_64+0x37/0xb0 >>> <4> [83.539244] entry_SYSCALL_64_after_hwframe+0x44/0xae >>> <4> [83.539247] >>> -> #0 (fs_reclaim){+.+.}-{0:0}: >>> <4> [83.539251] validate_chain+0xb37/0x1e70 >>> <4> [83.539254] __lock_acquire+0x5a1/0xb70 >>> <4> [83.539258] lock_acquire+0xd3/0x310 >>> <4> [83.539260] fs_reclaim_acquire+0x9d/0xd0 >>> <4> [83.539264] __kmalloc_track_caller+0x56/0x270 >>> <4> [83.539267] krealloc+0x48/0xa0 >>> <4> [83.539270] dma_resv_get_fences+0x1c3/0x280 >>> <4> [83.539274] i915_gem_object_wait+0x1ff/0x410 [i915] >>> <4> [83.539342] i915_gem_evict_for_node+0x16b/0x440 [i915] >>> <4> [83.539412] i915_gem_gtt_reserve+0xff/0x130 [i915] >>> <4> [83.539482] i915_vma_pin_ww+0x765/0x970 [i915] >>> <4> [83.539556] eb_validate_vmas+0x6fe/0x8e0 [i915] >>> <4> [83.539626] i915_gem_do_execbuffer+0x9a6/0x20a0 [i915] >>> <4> [83.539693] i915_gem_execbuffer2_ioctl+0x11f/0x2c0 [i915] >>> <4> [83.539759] drm_ioctl_kernel+0xac/0x140 >>> <4> [83.539763] drm_ioctl+0x201/0x3d0 >>> <4> [83.539766] __x64_sys_ioctl+0x6a/0xa0 >>> <4> [83.539769] do_syscall_64+0x37/0xb0 >>> <4> 
[83.539772] entry_SYSCALL_64_after_hwframe+0x44/0xae >>> <4> [83.539775] >>> other info that might help us debug this: >>> <4> [83.539778] Possible unsafe locking scenario: >>> <4> [83.539781] CPU0 CPU1 >>> <4> [83.539783] >>> <4> [83.539785] lock(&vm->mutex/1); >>> <4> [83.539788] lock(fs_reclaim); >>> <4> [83.539791] lock(&vm->mutex/1); >>> <4> [83.539794] lock(fs_reclaim); >>> <4> [83.539796] >>> *** DEADLOCK *** >>> <4> [83.539799] 3 locks held by gem_render_line/5242: >>> <4> [83.539802] #0: c9d4bbf0 >>> (reservation_ww_class_acquire){+.+.}-{0:0}, at: >>> i915_gem_do_execbuffer+0x8e5/0x20a0 [i915] >>> <4> [83.539870] #1: 88811e48bae8 >>> (reservation_ww_class_mutex){+.+.}-{3:3}, at: eb_validate_vmas+0x81/0x8e0 >>> [i915] >>> <4> [83.539936] #2: 88813471d1e0 (&vm->mutex/1){+.+.}-{3:3}, at: >>> i915_vma_pin_ww+0x1c7/0x970 [i915] >>> <4> [83.540011]
[Intel-gfx] ✗ Fi.CI.BAT: failure for component: do not leave master devres group open after bind (rev3)
== Series Details ==

Series: component: do not leave master devres group open after bind (rev3)
URL : https://patchwork.freedesktop.org/series/94889/
State : failure

== Summary ==

Applying: component: do not leave master devres group open after bind
Using index info to reconstruct a base tree...
M  drivers/base/component.c
Falling back to patching base and 3-way merge...
No changes -- Patch already applied.
Re: [Intel-gfx] [RFC PATCH] drm: Increase DRM_OBJECT_MAX_PROPERTY by 18.
On 2021-10-13 14:57:34 [+0200], Daniel Vetter wrote:
> Hm there's a pile of commits there, and nothing immediately jumps to
> light. The thing is, 18 is likely way too much, since if e.g. we have a
> single new property on a plane and that pushes over the limit on all of
> them, you get iirc 3x4 already simply because we have that many planes.
>
> So would be good to know the actual culprit.
>
> Can you pls try to bisect the above range, applying the patch as a fixup
> locally (without commit, that will confuse git bisect a bit I think), so
> we know what/where went wrong?

c7fcbf2513973 -> does not boot
c7fcbf2513973 + 2f425cf5242a0 -> boots, 18 x DRM_OBJECT_MAX_PROPERTY
6f11f37459d8f -> boots, 0 x DRM_OBJECT_MAX_PROPERTY
6f11f37459d8f + 2f425cf5242a0 -> boots, 18 x DRM_OBJECT_MAX_PROPERTY

> I'm still confused why this isn't showing up anywhere in our intel ci ...
>
> Thanks, Daniel

Sebastian
[Intel-gfx] [PATCH 3/3] drm/amdgpu: Replace drm_mm with drm buddy manager
Add drm buddy allocator support for vram memory management Signed-off-by: Arunpravin --- .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 97 +-- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 251 ++ 3 files changed, 217 insertions(+), 135 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h index acfa207cf970..2c17e948355e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h @@ -30,12 +30,15 @@ #include #include +#include "amdgpu_vram_mgr.h" + /* state back for walking over vram_mgr and gtt_mgr allocations */ struct amdgpu_res_cursor { uint64_tstart; uint64_tsize; uint64_tremaining; - struct drm_mm_node *node; + void*node; + uint32_tmem_type; }; /** @@ -52,27 +55,63 @@ static inline void amdgpu_res_first(struct ttm_resource *res, uint64_t start, uint64_t size, struct amdgpu_res_cursor *cur) { + struct drm_buddy_block *block; + struct list_head *head, *next; struct drm_mm_node *node; - if (!res || res->mem_type == TTM_PL_SYSTEM) { - cur->start = start; - cur->size = size; - cur->remaining = size; - cur->node = NULL; - WARN_ON(res && start + size > res->num_pages << PAGE_SHIFT); - return; - } + if (!res) + goto err_out; BUG_ON(start + size > res->num_pages << PAGE_SHIFT); - node = to_ttm_range_mgr_node(res)->mm_nodes; - while (start >= node->size << PAGE_SHIFT) - start -= node++->size << PAGE_SHIFT; + cur->mem_type = res->mem_type; + + switch (cur->mem_type) { + case TTM_PL_VRAM: + head = &to_amdgpu_vram_mgr_node(res)->blocks; + + block = list_first_entry_or_null(head, +struct drm_buddy_block, +link); + if (!block) + goto err_out; + + while (start >= node_size(block)) { + start -= node_size(block); + + next = block->link.next; + if (next != head) + block = list_entry(next, struct drm_buddy_block, link); + } + + cur->start = node_start(block) + start; + cur->size = min(node_size(block) - start, size); + cur->remaining = size; + cur->node = block; + break; + case TTM_PL_TT: + node = to_ttm_range_mgr_node(res)->mm_nodes; + while (start >= node->size << PAGE_SHIFT) + start -= node++->size << PAGE_SHIFT; + + cur->start = (node->start << PAGE_SHIFT) + start; + cur->size = min((node->size << PAGE_SHIFT) - start, size); + cur->remaining = size; + cur->node = node; + break; + default: + goto err_out; + } - cur->start = (node->start << PAGE_SHIFT) + start; - cur->size = min((node->size << PAGE_SHIFT) - start, size); + return; + +err_out: + cur->start = start; + cur->size = size; cur->remaining = size; - cur->node = node; + cur->node = NULL; + WARN_ON(res && start + size > res->num_pages << PAGE_SHIFT); + return; } /** @@ -85,7 +124,9 @@ static inline void amdgpu_res_first(struct ttm_resource *res, */ static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size) { - struct drm_mm_node *node = cur->node; + struct drm_buddy_block *block; + struct drm_mm_node *node; + struct list_head *next; BUG_ON(size > cur->remaining); @@ -99,9 +140,27 @@ static inline void amdgpu_res_next(struct amdgpu_res_cursor *cur, uint64_t size) return; } - cur->node = ++node; - cur->start = node->start << PAGE_SHIFT; - cur->size = min(node->size << PAGE_SHIFT, cur->remaining); + switch (cur->mem_type) { + case TTM_PL_VRAM: + block = cur->node; + + next = block->link.next; + block = list_entry(next, struct drm_buddy_block, link); + + cur->node = block; + cur->start = node_start(block); + cur->size = min(node_size(block), cur->remaining); + 
break; + case TTM_PL_TT: + node = cur->node; + + cur->node = ++node; + cur->start = node->start << PAGE_SHIFT; + cur->size = min(node->size << PAGE_SHIFT, cur->re
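For context on how the cursor is consumed, here is a sketch of a hypothetical caller walking a resource chunk by chunk with the API shown above — illustrative only, not part of the patch:

/* Assumes the amdgpu_res_cursor.h helpers from this patch. */
struct amdgpu_res_cursor cursor;

amdgpu_res_first(res, 0, res->num_pages << PAGE_SHIFT, &cursor);
while (cursor.remaining) {
	u64 offset = cursor.start;	/* start of this contiguous chunk */
	u64 len = cursor.size;		/* its length in bytes */

	/* ... map or DMA 'len' bytes at 'offset' ... */

	amdgpu_res_next(&cursor, len);
}

The buddy conversion keeps this loop unchanged for callers; only the node type behind cursor.node differs between TTM_PL_VRAM and TTM_PL_TT.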
[Intel-gfx] [PATCH 1/3] drm: Enable buddy allocator support
Port Intel buddy manager to drm root folder Implemented range allocation support for the provided order Implemented TOP-DOWN support Implemented freeing up unused pages on contiguous allocation Moved range allocation and freelist pickup into a single function Signed-off-by: Arunpravin --- drivers/gpu/drm/Makefile| 2 +- drivers/gpu/drm/drm_buddy.c | 705 drivers/gpu/drm/drm_drv.c | 3 + include/drm/drm_buddy.h | 157 4 files changed, 866 insertions(+), 1 deletion(-) create mode 100644 drivers/gpu/drm/drm_buddy.c create mode 100644 include/drm/drm_buddy.h diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile index a118692a6df7..fe1a2fc09675 100644 --- a/drivers/gpu/drm/Makefile +++ b/drivers/gpu/drm/Makefile @@ -18,7 +18,7 @@ drm-y :=drm_aperture.o drm_auth.o drm_cache.o \ drm_dumb_buffers.o drm_mode_config.o drm_vblank.o \ drm_syncobj.o drm_lease.o drm_writeback.o drm_client.o \ drm_client_modeset.o drm_atomic_uapi.o drm_hdcp.o \ - drm_managed.o drm_vblank_work.o + drm_managed.o drm_vblank_work.o drm_buddy.o drm-$(CONFIG_DRM_LEGACY) += drm_agpsupport.o drm_bufs.o drm_context.o drm_dma.o \ drm_legacy_misc.o drm_lock.o drm_memory.o drm_scatter.o \ diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c new file mode 100644 index ..8cd118574665 --- /dev/null +++ b/drivers/gpu/drm/drm_buddy.c @@ -0,0 +1,705 @@ +// SPDX-License-Identifier: MIT +/* + * Copyright © 2021 Intel Corporation + */ + +#include +#include + +#include + +static struct kmem_cache *slab_blocks; + +static struct drm_buddy_block *drm_block_alloc(struct drm_buddy_mm *mm, + struct drm_buddy_block *parent, + unsigned int order, + u64 offset) +{ + struct drm_buddy_block *block; + + BUG_ON(order > DRM_BUDDY_MAX_ORDER); + + block = kmem_cache_zalloc(slab_blocks, GFP_KERNEL); + if (!block) + return NULL; + + block->header = offset; + block->header |= order; + block->parent = parent; + + BUG_ON(block->header & DRM_BUDDY_HEADER_UNUSED); + return block; +} + +static void drm_block_free(struct drm_buddy_mm *mm, + struct drm_buddy_block *block) +{ + kmem_cache_free(slab_blocks, block); +} + +static void mark_allocated(struct drm_buddy_block *block) +{ + block->header &= ~DRM_BUDDY_HEADER_STATE; + block->header |= DRM_BUDDY_ALLOCATED; + + list_del(&block->link); +} + +static void mark_free(struct drm_buddy_mm *mm, + struct drm_buddy_block *block) +{ + block->header &= ~DRM_BUDDY_HEADER_STATE; + block->header |= DRM_BUDDY_FREE; + + list_add(&block->link, + &mm->free_list[drm_buddy_block_order(block)]); +} + +static void mark_split(struct drm_buddy_block *block) +{ + block->header &= ~DRM_BUDDY_HEADER_STATE; + block->header |= DRM_BUDDY_SPLIT; + + list_del(&block->link); +} + +/** + * drm_buddy_init - init memory manager + * + * @mm: DRM buddy manager to initialize + * @size: size in bytes to manage + * @chunk_size: minimum page size in bytes for our allocations + * + * Initializes the memory manager and its resources. + * + * Returns: + * 0 on success, error code on failure. 
+ */ +int drm_buddy_init(struct drm_buddy_mm *mm, u64 size, u64 chunk_size) +{ + unsigned int i; + u64 offset; + + if (size < chunk_size) + return -EINVAL; + + if (chunk_size < PAGE_SIZE) + return -EINVAL; + + if (!is_power_of_2(chunk_size)) + return -EINVAL; + + size = round_down(size, chunk_size); + + mm->size = size; + mm->avail = size; + mm->chunk_size = chunk_size; + mm->max_order = ilog2(size) - ilog2(chunk_size); + + BUG_ON(mm->max_order > DRM_BUDDY_MAX_ORDER); + + mm->free_list = kmalloc_array(mm->max_order + 1, + sizeof(struct list_head), + GFP_KERNEL); + if (!mm->free_list) + return -ENOMEM; + + for (i = 0; i <= mm->max_order; ++i) + INIT_LIST_HEAD(&mm->free_list[i]); + + mm->n_roots = hweight64(size); + + mm->roots = kmalloc_array(mm->n_roots, + sizeof(struct drm_buddy_block *), + GFP_KERNEL); + if (!mm->roots) + goto out_free_list; + + offset = 0; + i = 0; + + /* +* Split into power-of-two blocks, in case we are given a size that is +* not itself a power-of-two. +*/ + do { + struct drm_buddy_block *root; + unsigned int order; +
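A minimal usage sketch of the init path above, with illustrative sizes:

#include <linux/sizes.h>
#include <drm/drm_buddy.h>

struct drm_buddy_mm mm;
int err;

err = drm_buddy_init(&mm, SZ_1G, SZ_4K);
/* On success: mm.max_order = ilog2(SZ_1G) - ilog2(SZ_4K) = 30 - 12 = 18,
 * i.e. the largest single block spans the whole 1 GiB region. */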
[Intel-gfx] [PATCH 2/3] drm/amdgpu: move vram manager defines into a header file
Move vram related defines and inline functions into a separate header file Signed-off-by: Arunpravin --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h | 72 1 file changed, 72 insertions(+) create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h new file mode 100644 index ..fcab6475ccbb --- /dev/null +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h @@ -0,0 +1,72 @@ +/* SPDX-License-Identifier: MIT + * Copyright 2021 Advanced Micro Devices, Inc. + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR + * OTHER DEALINGS IN THE SOFTWARE. + * + */ + +#ifndef __AMDGPU_VRAM_MGR_H__ +#define __AMDGPU_VRAM_MGR_H__ + +#include + +struct amdgpu_vram_mgr_node { + struct ttm_resource base; + struct list_head blocks; + unsigned long flags; +}; + +struct amdgpu_vram_reservation { + uint64_t start; + uint64_t size; + uint64_t min_size; + unsigned long flags; + struct list_head block; + struct list_head node; +}; + +static inline uint64_t node_start(struct drm_buddy_block *block) +{ + return drm_buddy_block_offset(block); +} + +static inline uint64_t node_size(struct drm_buddy_block *block) +{ + return PAGE_SIZE << drm_buddy_block_order(block); +} + +static inline struct amdgpu_vram_mgr_node * +to_amdgpu_vram_mgr_node(struct ttm_resource *res) +{ + return container_of(res, struct amdgpu_vram_mgr_node, base); +} + +static inline struct amdgpu_vram_mgr * +to_vram_mgr(struct ttm_resource_manager *man) +{ + return container_of(man, struct amdgpu_vram_mgr, manager); +} + +static inline struct amdgpu_device * +to_amdgpu_device(struct amdgpu_vram_mgr *mgr) +{ + return container_of(mgr, struct amdgpu_device, mman.vram_mgr); +} + +#endif -- 2.25.1
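Worked example for the two helpers above, with illustrative numbers: for a block at buddy offset 0x100000 with order 4 on 4 KiB pages,

u64 start = node_start(block);	/* 0x100000 */
u64 size = node_size(block);	/* PAGE_SIZE << 4 = 64 KiB */

so a cursor positioned on this block covers the 64 KiB of VRAM starting at 1 MiB.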
Re: [Intel-gfx] [PATCH 23/26] drm/i915: Make request conflict tracking understand parallel submits
On Tue, Oct 12, 2021 at 03:08:05PM -0700, John Harrison wrote: > On 10/4/2021 15:06, Matthew Brost wrote: > > If an object in the excl or shared slot is a composite fence from a > > parallel submit and the current request in the conflict tracking is from > > the same parallel context there is no need to enforce ordering as the > > ordering already implicit. Make the request conflict tracking understand > ordering already -> ordering is already > > > this by comparing the parents parallel fence values and skipping the > parents -> parent's > > > conflict insertion if the values match. > Presumably, this is to cope with the fact that the parallel submit fences do > not look like regular submission fences. And hence the existing code that > says 'new fence belongs to same context as old fence, so safe to ignore' > does not work with parallel submission. However, this change does not appear > to be adding parallel submit support to an existing 'same context' check. It > seems to be a brand new check that does not exist for single submission. > What makes parallel submit different? If we aren't skipping same context > fences for single submits, why do we need it for parallel? Conversely, if we > need it for parallel then why don't we need it for single? > > And if the single submission version is simply somewhere else in the code, > why do the parallel version here instead of at the same place? > > John. > > > > > Signed-off-by: Matthew Brost > > --- > > drivers/gpu/drm/i915/i915_request.c | 43 +++-- > > 1 file changed, 29 insertions(+), 14 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_request.c > > b/drivers/gpu/drm/i915/i915_request.c > > index e9bfa32f9270..cf89624020ad 100644 > > --- a/drivers/gpu/drm/i915/i915_request.c > > +++ b/drivers/gpu/drm/i915/i915_request.c > > @@ -1325,6 +1325,25 @@ i915_request_await_external(struct i915_request *rq, > > struct dma_fence *fence) > > return err; > > } > > +static inline bool is_parallel_rq(struct i915_request *rq) > > +{ > > + return intel_context_is_parallel(rq->context); > > +} > > + > > +static inline struct intel_context *request_to_parent(struct i915_request > > *rq) > > +{ > > + return intel_context_to_parent(rq->context); > > +} > > + > > +static bool is_same_parallel_context(struct i915_request *to, > > +struct i915_request *from) > > +{ > > + if (is_parallel_rq(to)) > Should this not say '&& is_parallel_rq(from)'? > Missed this one. That isn't necessary as if from is not a parallel submit the following compare of parents will always return false. I could add if you insist as either way works. Matt > > + return request_to_parent(to) == request_to_parent(from); > > + > > + return false; > > +} > > + > > int > > i915_request_await_execution(struct i915_request *rq, > > struct dma_fence *fence) > > @@ -1356,11 +1375,14 @@ i915_request_await_execution(struct i915_request > > *rq, > > * want to run our callback in all cases. 
> > */ > > - if (dma_fence_is_i915(fence)) > > + if (dma_fence_is_i915(fence)) { > > + if (is_same_parallel_context(rq, to_request(fence))) > > + continue; > > ret = __i915_request_await_execution(rq, > > to_request(fence)); > > - else > > + } else { > > ret = i915_request_await_external(rq, fence); > > + } > > if (ret < 0) > > return ret; > > } while (--nchild); > > @@ -1461,10 +1483,13 @@ i915_request_await_dma_fence(struct i915_request > > *rq, struct dma_fence *fence) > > fence)) > > continue; > > - if (dma_fence_is_i915(fence)) > > + if (dma_fence_is_i915(fence)) { > > + if (is_same_parallel_context(rq, to_request(fence))) > > + continue; > > ret = i915_request_await_request(rq, to_request(fence)); > > - else > > + } else { > > ret = i915_request_await_external(rq, fence); > > + } > > if (ret < 0) > > return ret; > > @@ -1539,16 +1564,6 @@ i915_request_await_object(struct i915_request *to, > > return ret; > > } > > -static inline bool is_parallel_rq(struct i915_request *rq) > > -{ > > - return intel_context_is_parallel(rq->context); > > -} > > - > > -static inline struct intel_context *request_to_parent(struct i915_request > > *rq) > > -{ > > - return intel_context_to_parent(rq->context); > > -} > > - > > static struct i915_request * > > __i915_request_ensure_parallel_ordering(struct i915_request *rq, > > struct intel_timeline *timeline) >
Re: [Intel-gfx] [PATCH 10/26] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids
On Fri, Oct 08, 2021 at 09:40:43AM -0700, John Harrison wrote: > On 10/7/2021 18:21, Matthew Brost wrote: > > On Thu, Oct 07, 2021 at 03:03:04PM -0700, John Harrison wrote: > > > On 10/4/2021 15:06, Matthew Brost wrote: > > > > Assign contexts in parent-child relationship consecutive guc_ids. This > > > > is accomplished by partitioning guc_id space between ones that need to > > > > be consecutive (1/16 available guc_ids) and ones that do not (15/16 of > > > > available guc_ids). The consecutive search is implemented via the bitmap > > > > API. > > > > > > > > This is a precursor to the full GuC multi-lrc implementation but aligns > > > > to how GuC mutli-lrc interface is defined - guc_ids must be consecutive > > > > when using the GuC multi-lrc interface. > > > > > > > > v2: > > > >(Daniel Vetter) > > > > - Explicitly state why we assign consecutive guc_ids > > > > v3: > > > >(John Harrison) > > > > - Bring back in spin lock > > > > > > > > Signed-off-by: Matthew Brost > > > > --- > > > >drivers/gpu/drm/i915/gt/uc/intel_guc.h| 6 +- > > > >.../gpu/drm/i915/gt/uc/intel_guc_submission.c | 104 > > > > ++ > > > >2 files changed, 86 insertions(+), 24 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > > > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > > > index 25a598e2b6e8..a9f4ec972bfb 100644 > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > > > @@ -76,9 +76,13 @@ struct intel_guc { > > > > */ > > > > spinlock_t lock; > > > > /** > > > > -* @guc_ids: used to allocate new guc_ids > > > > +* @guc_ids: used to allocate new guc_ids, single-lrc > > > > */ > > > > struct ida guc_ids; > > > > + /** > > > > +* @guc_ids_bitmap: used to allocate new guc_ids, > > > > multi-lrc > > > > +*/ > > > > + unsigned long *guc_ids_bitmap; > > > > /** > > > > * @guc_id_list: list of intel_context with valid > > > > guc_ids but no > > > > * refs > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > > index 1f2809187513..79e7732e83b2 100644 > > > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c > > > > @@ -128,6 +128,16 @@ guc_create_virtual(struct intel_engine_cs > > > > **siblings, unsigned int count); > > > >#define GUC_REQUEST_SIZE 64 /* bytes */ > > > > +/* > > > > + * We reserve 1/16 of the guc_ids for multi-lrc as these need to be > > > > contiguous > > > > + * per the GuC submission interface. A different allocation algorithm > > > > is used > > > > + * (bitmap vs. ida) between multi-lrc and single-lrc hence the reason > > > > to > > > > + * partition the guc_id space. We believe the number of multi-lrc > > > > contexts in > > > > + * use should be low and 1/16 should be sufficient. Minimum of 32 > > > > guc_ids for > > > > + * multi-lrc. > > > > + */ > > > > +#define NUMBER_MULTI_LRC_GUC_ID > > > > (GUC_MAX_LRC_DESCRIPTORS / 16) > > > > + > > > >/* > > > > * Below is a set of functions which control the GuC scheduling > > > > state which > > > > * require a lock. 
> > > > @@ -1206,6 +1216,11 @@ int intel_guc_submission_init(struct intel_guc > > > > *guc) > > > > INIT_WORK(&guc->submission_state.destroyed_worker, > > > > destroyed_worker_func); > > > > + guc->submission_state.guc_ids_bitmap = > > > > + bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID, GFP_KERNEL); > > > > + if (!guc->submission_state.guc_ids_bitmap) > > > > + return -ENOMEM; > > > > + > > > > return 0; > > > >} > > > > @@ -1217,6 +1232,7 @@ void intel_guc_submission_fini(struct intel_guc > > > > *guc) > > > > guc_lrc_desc_pool_destroy(guc); > > > > guc_flush_destroyed_contexts(guc); > > > > i915_sched_engine_put(guc->sched_engine); > > > > + bitmap_free(guc->submission_state.guc_ids_bitmap); > > > >} > > > >static inline void queue_request(struct i915_sched_engine > > > > *sched_engine, > > > > @@ -1268,18 +1284,43 @@ static void guc_submit_request(struct > > > > i915_request *rq) > > > > spin_unlock_irqrestore(&sched_engine->lock, flags); > > > >} > > > > -static int new_guc_id(struct intel_guc *guc) > > > > +static int new_guc_id(struct intel_guc *guc, struct intel_context *ce) > > > >{ > > > > - return ida_simple_get(&guc->submission_state.guc_ids, 0, > > > > - GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL | > > > > - __GFP_RETRY_MAYFAIL | __GFP_NOWARN); > > > > + int ret; > > > > + > > > > + GEM_BUG_ON(inte
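The consecutive search being discussed can be sketched with the generic bitmap API — an illustrative helper, not the patch code; the real series does this under the guc submission-state lock:

#include <linux/bitmap.h>
#include <linux/errno.h>

static int alloc_consecutive_guc_ids(unsigned long *bitmap,
				     unsigned int total, unsigned int count)
{
	unsigned long id;

	/* Find 'count' adjacent free ids and claim them, relying on the
	 * caller's locking for exclusion. */
	id = bitmap_find_next_zero_area(bitmap, total, 0, count, 0);
	if (id >= total)
		return -ENOSPC;

	bitmap_set(bitmap, id, count);
	return id;	/* freed later with bitmap_clear(bitmap, id, count) */
}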
[Intel-gfx] [PATCH 1/2] drm: Add Gamma and Degamma LUT sizes props to drm_crtc to validate.
From: Mark Yacoub [Why] 1. drm_atomic_helper_check doesn't check for the LUT sizes of either Gamma or Degamma props in the new CRTC state, allowing any invalid size to be passed on. 2. Each driver has its own LUT size, which could also be different for legacy users. [How] 1. Create |degamma_lut_size| and |gamma_lut_size| to save the LUT sizes assigned by the driver when it's initializing its color and CTM management. 2. Create drm_atomic_helper_check_crtc which is called by drm_atomic_helper_check to check the LUT sizes saved in drm_crtc that they match the sizes in the new CRTC state. 3. Rename older lut checks that test for the color channels to indicate it's a channel check. It's not included in drm_atomic_helper_check_crtc as it's hardware specific and is to be called by the driver. 4. As the LUT size check now happens in drm_atomic_helper_check, remove the lut check in intel_color.c Fixes: igt@kms_color@pipe-A-invalid-gamma-lut-sizes on MTK Tested on Zork(amdgpu) and Jacuzzi(mediatek), volteer(TGL) v1: 1. Fix typos 2. Remove the LUT size check from intel driver 3. Rename old LUT check to indicate it's a channel change Signed-off-by: Mark Yacoub --- drivers/gpu/drm/drm_atomic_helper.c| 60 ++ drivers/gpu/drm/drm_color_mgmt.c | 14 ++--- drivers/gpu/drm/i915/display/intel_color.c | 14 ++--- include/drm/drm_atomic_helper.h| 1 + include/drm/drm_color_mgmt.h | 7 +-- include/drm/drm_crtc.h | 11 6 files changed, 89 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c index bc3487964fb5e..5feb2ad0209c3 100644 --- a/drivers/gpu/drm/drm_atomic_helper.c +++ b/drivers/gpu/drm/drm_atomic_helper.c @@ -929,6 +929,62 @@ drm_atomic_helper_check_planes(struct drm_device *dev, } EXPORT_SYMBOL(drm_atomic_helper_check_planes); +/** + * drm_atomic_helper_check_crtcs - validate state object for CRTC changes + * @state: the driver state object + * + * Check the CRTC state object such as the Gamma/Degamma LUT sizes if the new + * state holds them. + * + * RETURNS: + * Zero for success or -errno + */ +int drm_atomic_helper_check_crtcs(struct drm_atomic_state *state) +{ + struct drm_crtc *crtc; + struct drm_crtc_state *new_crtc_state; + int i; + + for_each_new_crtc_in_state (state, crtc, new_crtc_state, i) { + if (new_crtc_state->color_mgmt_changed && + new_crtc_state->gamma_lut) { + uint64_t supported_lut_size = crtc->gamma_lut_size; + uint32_t supported_legacy_lut_size = crtc->gamma_size; + uint32_t new_state_lut_size = + drm_color_lut_size(new_crtc_state->gamma_lut); + + if (new_state_lut_size != supported_lut_size && + new_state_lut_size != supported_legacy_lut_size) { + drm_dbg_state( + state->dev, + "Invalid Gamma LUT size. Should be %u (or %u for legacy) but got %u.\n", + supported_lut_size, + supported_legacy_lut_size, + new_state_lut_size); + return -EINVAL; + } + } + + if (new_crtc_state->color_mgmt_changed && + new_crtc_state->degamma_lut) { + uint32_t new_state_lut_size = + drm_color_lut_size(new_crtc_state->degamma_lut); + uint64_t supported_lut_size = crtc->degamma_lut_size; + + if (new_state_lut_size != supported_lut_size) { + drm_dbg_state( + state->dev, + "Invalid Degamma LUT size. 
Should be %u but got %u.\n", + supported_lut_size, new_state_lut_size); + return -EINVAL; + } + } + } + + return 0; +} +EXPORT_SYMBOL(drm_atomic_helper_check_crtcs); + /** * drm_atomic_helper_check - validate state object * @dev: DRM device @@ -974,6 +1030,10 @@ int drm_atomic_helper_check(struct drm_device *dev, if (ret) return ret; + ret = drm_atomic_helper_check_crtcs(state); + if (ret) + return ret; + if (state->legacy_cursor_update) state->async_update = !drm_atomic_helper_async_check(dev, state); diff --git a/drivers/gpu/drm/drm_color_mgmt.c b/drivers/gpu/drm/drm_color_mgmt.c index bb14f488c8f6c..e5b820ce823bf 100644 --- a/drivers/gpu/drm/drm_color_mgmt.c +++ b/drivers/gpu/drm/drm_color_mgmt.c @@ -166,6 +166,7 @@ void drm_crtc_enable_color_mgmt(struct d
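For reference, drm_color_lut_size() derives the entry count from the blob length, so the checks above reduce to simple arithmetic — illustrative values, assuming a 256-entry legacy gamma table:

/* sizeof(struct drm_color_lut) is 8 bytes (four __u16 channels). */
struct drm_color_lut lut[256] = {};
struct drm_property_blob *blob;

blob = drm_property_create_blob(dev, sizeof(lut), lut);
/* blob->length / sizeof(*lut) == 256, which
 * drm_atomic_helper_check_crtcs() compares against crtc->gamma_size. */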
[Intel-gfx] [PATCH 2/2] amd/amdgpu_dm: Verify Gamma and Degamma LUT sizes using DRM Core check
From: Mark Yacoub [Why] drm_atomic_helper_check_crtc now verifies both legacy and non-legacy LUT sizes. There is no need to check it within amdgpu_dm_atomic_check. [How] Remove the local call to verify LUT sizes and use DRM Core function instead. Tested on ChromeOS Zork. v1: Remove amdgpu_dm_verify_lut_sizes everywhere. Signed-off-by: Mark Yacoub --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 ++--- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 1 - .../amd/display/amdgpu_dm/amdgpu_dm_color.c | 35 --- 3 files changed, 4 insertions(+), 40 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c index f74663b6b046e..47f8de1cfc3a5 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -10244,6 +10244,10 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev, } } #endif + ret = drm_atomic_helper_check_crtcs(state); + if (ret) + return ret; + for_each_oldnew_crtc_in_state(state, crtc, old_crtc_state, new_crtc_state, i) { dm_old_crtc_state = to_dm_crtc_state(old_crtc_state); @@ -10253,10 +10257,6 @@ static int amdgpu_dm_atomic_check(struct drm_device *dev, dm_old_crtc_state->dsc_force_changed == false) continue; - ret = amdgpu_dm_verify_lut_sizes(new_crtc_state); - if (ret) - goto fail; - if (!new_crtc_state->enable) continue; diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h index fcb9c4a629c32..22730e5542092 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h @@ -617,7 +617,6 @@ void amdgpu_dm_trigger_timing_sync(struct drm_device *dev); #define MAX_COLOR_LEGACY_LUT_ENTRIES 256 void amdgpu_dm_init_color_mod(void); -int amdgpu_dm_verify_lut_sizes(const struct drm_crtc_state *crtc_state); int amdgpu_dm_update_crtc_color_mgmt(struct dm_crtc_state *crtc); int amdgpu_dm_update_plane_color_mgmt(struct dm_crtc_state *crtc, struct dc_plane_state *dc_plane_state); diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c index a022e5bb30a5c..319f8a8a89835 100644 --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_color.c @@ -284,37 +284,6 @@ static int __set_input_tf(struct dc_transfer_func *func, return res ? 0 : -ENOMEM; } -/** - * Verifies that the Degamma and Gamma LUTs attached to the |crtc_state| are of - * the expected size. - * Returns 0 on success. - */ -int amdgpu_dm_verify_lut_sizes(const struct drm_crtc_state *crtc_state) -{ - const struct drm_color_lut *lut = NULL; - uint32_t size = 0; - - lut = __extract_blob_lut(crtc_state->degamma_lut, &size); - if (lut && size != MAX_COLOR_LUT_ENTRIES) { - DRM_DEBUG_DRIVER( - "Invalid Degamma LUT size. Should be %u but got %u.\n", - MAX_COLOR_LUT_ENTRIES, size); - return -EINVAL; - } - - lut = __extract_blob_lut(crtc_state->gamma_lut, &size); - if (lut && size != MAX_COLOR_LUT_ENTRIES && - size != MAX_COLOR_LEGACY_LUT_ENTRIES) { - DRM_DEBUG_DRIVER( - "Invalid Gamma LUT size. Should be %u (or %u for legacy) but got %u.\n", - MAX_COLOR_LUT_ENTRIES, MAX_COLOR_LEGACY_LUT_ENTRIES, - size); - return -EINVAL; - } - - return 0; -} - /** * amdgpu_dm_update_crtc_color_mgmt: Maps DRM color management to DC stream. 
* @crtc: amdgpu_dm crtc state @@ -348,10 +317,6 @@ int amdgpu_dm_update_crtc_color_mgmt(struct dm_crtc_state *crtc) bool is_legacy; int r; - r = amdgpu_dm_verify_lut_sizes(&crtc->base); - if (r) - return r; - degamma_lut = __extract_blob_lut(crtc->base.degamma_lut, &degamma_size); regamma_lut = __extract_blob_lut(crtc->base.gamma_lut, &regamma_size); -- 2.33.0.882.g93a45727a2-goog
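For context on the check being centralized above: a DRM gamma/degamma table arrives as a property blob whose byte length encodes the entry count, and the helper compares that count against the size the driver advertised. Below is a minimal standalone C sketch of that validation under simplified stand-in types (blob and color_lut are not the real drm_property_blob/drm_color_lut) — an illustration of the idea, not the helper's actual implementation.

    /* Standalone sketch (not kernel code); build with: cc -std=c11 lut_check.c */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Simplified stand-in for struct drm_color_lut. */
    struct color_lut { uint16_t red, green, blue, reserved; };

    /* Simplified stand-in for struct drm_property_blob. */
    struct blob { size_t length; /* bytes */ };

    /* Return 0 if the attached LUT (if any) has exactly 'expected' entries. */
    static int check_lut_size(const struct blob *lut, uint32_t expected)
    {
        uint32_t entries;

        if (!lut)
            return 0; /* no LUT attached: nothing to validate */

        entries = (uint32_t)(lut->length / sizeof(struct color_lut));
        if (entries != expected) {
            fprintf(stderr, "Invalid LUT size. Should be %u but got %u.\n",
                    expected, entries);
            return -22; /* -EINVAL */
        }
        return 0;
    }

    int main(void)
    {
        struct blob gamma = { .length = 4096 * sizeof(struct color_lut) };

        printf("%d\n", check_lut_size(&gamma, 4096)); /* 0: size matches */
        printf("%d\n", check_lut_size(&gamma, 256));  /* -22: size mismatch */
        return 0;
    }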
Re: [Intel-gfx] [PATCH 12/26] drm/i915/guc: Implement multi-lrc submission
On Fri, Oct 08, 2021 at 10:20:24AM -0700, John Harrison wrote: > On 10/4/2021 15:06, Matthew Brost wrote: > > Implement multi-lrc submission via a single workqueue entry and single > > H2G. The workqueue entry contains an updated tail value for each > > request, of all the contexts in the multi-lrc submission, and updates > > these values simultaneously. As such, the tasklet and bypass path have > > been updated to coalesce requests into a single submission. > > > > v2: > > (John Harrison) > >- s/wqe/wqi > >- Use FIELD_PREP macros > >- Add GEM_BUG_ONs ensures length fits within field > >- Add comment / white space to intel_guc_write_barrier > > (Kernel test robot) > >- Make need_tasklet a static function > > > > Signed-off-by: Matthew Brost > > --- > > drivers/gpu/drm/i915/gt/uc/intel_guc.c| 26 ++ > > drivers/gpu/drm/i915/gt/uc/intel_guc.h| 8 + > > drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c | 24 +- > > drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 23 +- > > .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 319 -- > > drivers/gpu/drm/i915/i915_request.h | 8 + > > 6 files changed, 335 insertions(+), 73 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.c > > b/drivers/gpu/drm/i915/gt/uc/intel_guc.c > > index 8f8182bf7c11..7191e8439290 100644 > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.c > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.c > > @@ -756,3 +756,29 @@ void intel_guc_load_status(struct intel_guc *guc, > > struct drm_printer *p) > > } > > } > > } > > + > > +void intel_guc_write_barrier(struct intel_guc *guc) > > +{ > > + struct intel_gt *gt = guc_to_gt(guc); > > + > > + if (i915_gem_object_is_lmem(guc->ct.vma->obj)) { > > + /* > > +* Ensure intel_uncore_write_fw can be used rather than > > +* intel_uncore_write. > > +*/ > > + GEM_BUG_ON(guc->send_regs.fw_domains); > > + > > + /* > > +* This register is used by the i915 and GuC for MMIO based > > +* communication. Once we are in this code CTBs are the only > > +* method the i915 uses to communicate with the GuC so it is > > +* safe to write to this register (a value of 0 is NOP for MMIO > > +* communication). If we ever start mixing CTBs and MMIOs a new > > +* register will have to be chosen. > > +*/ > Hmm, missed it before but this comment is very CTB centric and the barrier > function is now being used for parallel submission work queues. Seems like > an extra comment should be added to cover that case. Just something simple > about WQ usage is also guaranteed to be post CTB switch over. > Sure. > > + intel_uncore_write_fw(gt->uncore, GEN11_SOFT_SCRATCH(0), 0); > > + } else { > > + /* wmb() sufficient for a barrier if in smem */ > > + wmb(); > > + } > > +} > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > index a9f4ec972bfb..147f39cc0f2f 100644 > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h > > @@ -46,6 +46,12 @@ struct intel_guc { > > * submitted until the stalled request is processed. 
> > */ > > struct i915_request *stalled_request; > > + enum { > > + STALL_NONE, > > + STALL_REGISTER_CONTEXT, > > + STALL_MOVE_LRC_TAIL, > > + STALL_ADD_REQUEST, > > + } submission_stall_reason; > > /* intel_guc_recv interrupt related state */ > > /** @irq_lock: protects GuC irq state */ > > @@ -361,4 +367,6 @@ void intel_guc_submission_cancel_requests(struct > > intel_guc *guc); > > void intel_guc_load_status(struct intel_guc *guc, struct drm_printer *p); > > +void intel_guc_write_barrier(struct intel_guc *guc); > > + > > #endif > > diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > > b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > > index 20c710a74498..10d1878d2826 100644 > > --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > > +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c > > @@ -377,28 +377,6 @@ static u32 ct_get_next_fence(struct intel_guc_ct *ct) > > return ++ct->requests.last_fence; > > } > > -static void write_barrier(struct intel_guc_ct *ct) > > -{ > > - struct intel_guc *guc = ct_to_guc(ct); > > - struct intel_gt *gt = guc_to_gt(guc); > > - > > - if (i915_gem_object_is_lmem(guc->ct.vma->obj)) { > > - GEM_BUG_ON(guc->send_regs.fw_domains); > > - /* > > -* This register is used by the i915 and GuC for MMIO based > > -* communication. Once we are in this code CTBs are the only > > -* method the i915 uses to communicate with the GuC so it is > > -* safe to write to this register (a value of 0 is NOP for MMIO > > -* communication). If we ever start mixing CTBs and MMIOs a
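On the barrier being reviewed above: the shape of intel_guc_write_barrier() is simply "pick the ordering primitive based on where the buffer lives". What follows is a userspace analogue of that shape only — scratch_reg stands in for SOFT_SCRATCH(0) and atomic_thread_fence approximates wmb(); it is an illustration, not driver code.

    /* Userspace analogue only; build with cc -std=c11. */
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>

    static volatile uint32_t scratch_reg;  /* stands in for SOFT_SCRATCH(0) */

    static void write_barrier(bool buffer_is_device_memory)
    {
        if (buffer_is_device_memory) {
            /* Post a harmless register write so earlier writes to device
             * memory are flushed; 0 is a no-op for the MMIO protocol in
             * the real driver, so only the ordering side effect matters. */
            scratch_reg = 0;
        } else {
            /* System memory: a store fence (wmb() in the kernel) suffices. */
            atomic_thread_fence(memory_order_release);
        }
    }

    int main(void)
    {
        write_barrier(true);
        write_barrier(false);
        return 0;
    }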
Re: [Intel-gfx] [PATCH] drm/i915/uapi: Add comment clarifying purpose of I915_TILING_* values
Looks good to me. Reviewed-by: Caz Yokoyama -caz On Tue, 2021-10-12 at 15:12 -0700, Matt Roper wrote: > The I915_TILING_* values in our uapi header are intended solely for > use > with the old get_tiling/set_tiling ioctls that operate on hardware > de-tiling fences; all other uapi communication about tiling types is > done via framebuffer modifiers rather than with these old values. > > On newer Intel platforms detiling fences no longer exist so the old > get_tiling/set_tiling ioctls are no longer usable and will always > return > -EOPNOTSUPP. This means there's no reason to add new tiling types > (such > as the Tile4 format introduced by Xe_HP) to the uapi header > here. Any > kernel-internal code that needs to represent tiling format should > either > rely on framebuffer modifiers (as the display code does) or use some > kind of non-uapi enum (as the GEM blt selftest now does). > > References: > https://patchwork.freedesktop.org/patch/456656/?series=95308 > Cc: Ville Syrjälä > Signed-off-by: Matt Roper > --- > include/uapi/drm/i915_drm.h | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/include/uapi/drm/i915_drm.h > b/include/uapi/drm/i915_drm.h > index aa2a7eccfb94..9b8e61163c39 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -1522,6 +1522,12 @@ struct drm_i915_gem_caching { > #define I915_TILING_NONE 0 > #define I915_TILING_X 1 > #define I915_TILING_Y 2 > +/* > + * Do not add new tiling types here. The I915_TILING_* values are > for > + * de-tiling fence registers that no longer exist on modern > platforms. Although > + * the hardware may support new types of tiling in general (e.g., > Tile4), we > + * do not need to add them to the uapi that is specific to > now-defunct ioctls. > + */ > #define I915_TILING_LAST I915_TILING_Y > > #define I915_BIT_6_SWIZZLE_NONE 0
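To make the "use framebuffer modifiers instead" guidance concrete, here is a hedged userspace sketch of creating an X-tiled framebuffer through the modifier-aware AddFB2 path. It assumes fd and handle come from earlier device/GEM setup, elides error handling, and builds against libdrm; it is a function-level sketch, not a complete program.

    /* Compile with: cc -c fb_modifier.c $(pkg-config --cflags libdrm) */
    #include <stdint.h>
    #include <xf86drm.h>
    #include <xf86drmMode.h>
    #include <drm_fourcc.h>

    static uint32_t addfb_xtiled(int fd, uint32_t handle, uint32_t width,
                                 uint32_t height, uint32_t pitch)
    {
        uint32_t handles[4] = { handle };
        uint32_t pitches[4] = { pitch };
        uint32_t offsets[4] = { 0 };
        uint64_t modifiers[4] = { I915_FORMAT_MOD_X_TILED };
        uint32_t fb_id = 0;

        /* Tiling travels with the framebuffer as a format modifier;
         * no set_tiling ioctl is involved. */
        if (drmModeAddFB2WithModifiers(fd, width, height, DRM_FORMAT_XRGB8888,
                                       handles, pitches, offsets, modifiers,
                                       &fb_id, DRM_MODE_FB_MODIFIERS))
            return 0; /* creation failed */
        return fb_id;
    }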
[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [v2,1/4] dri: do not check for NULL debugfs dentry
== Series Details == Series: series starting with [v2,1/4] dri: do not check for NULL debugfs dentry URL : https://patchwork.freedesktop.org/series/95794/ State : warning == Summary == $ dim checkpatch origin/drm-tip bb1c720488a1 dri: do not check for NULL debugfs dentry -:93: CHECK:LINE_SPACING: Please don't use multiple blank lines #93: FILE: include/drm/drm_file.h:84: + total: 0 errors, 0 warnings, 1 checks, 73 lines checked 5a230733e5b5 drm/ttm: do not set NULL to debugfs dentry 9a2340e7beba drm/i915/gt: do not check for NULL debugfs dentry 3e3b63e04133 vgaswitcheroo: do not check for NULL debugfs dentry
[Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [v2,1/4] dri: do not check for NULL debugfs dentry
== Series Details == Series: series starting with [v2,1/4] dri: do not check for NULL debugfs dentry URL : https://patchwork.freedesktop.org/series/95794/ State : warning == Summary == $ dim sparse --fast origin/drm-tip Sparse version: v0.6.2 Fast mode used, each commit won't be checked separately. - +./drivers/gpu/drm/amd/amdgpu/../amdgpu/amdgv_sriovmsg.h:314:49: error: static assertion failed: "amd_sriov_msg_vf2pf_info must be 1 KB"
Re: [Intel-gfx] [PATCH 10/26] drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids
On 10/13/2021 11:03, Matthew Brost wrote: On Fri, Oct 08, 2021 at 09:40:43AM -0700, John Harrison wrote: On 10/7/2021 18:21, Matthew Brost wrote: On Thu, Oct 07, 2021 at 03:03:04PM -0700, John Harrison wrote: On 10/4/2021 15:06, Matthew Brost wrote: Assign contexts in parent-child relationship consecutive guc_ids. This is accomplished by partitioning guc_id space between ones that need to be consecutive (1/16 available guc_ids) and ones that do not (15/16 of available guc_ids). The consecutive search is implemented via the bitmap API. This is a precursor to the full GuC multi-lrc implementation but aligns to how GuC mutli-lrc interface is defined - guc_ids must be consecutive when using the GuC multi-lrc interface. v2: (Daniel Vetter) - Explicitly state why we assign consecutive guc_ids v3: (John Harrison) - Bring back in spin lock Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/gt/uc/intel_guc.h| 6 +- .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 104 ++ 2 files changed, 86 insertions(+), 24 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc.h b/drivers/gpu/drm/i915/gt/uc/intel_guc.h index 25a598e2b6e8..a9f4ec972bfb 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc.h +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc.h @@ -76,9 +76,13 @@ struct intel_guc { */ spinlock_t lock; /** -* @guc_ids: used to allocate new guc_ids +* @guc_ids: used to allocate new guc_ids, single-lrc */ struct ida guc_ids; + /** +* @guc_ids_bitmap: used to allocate new guc_ids, multi-lrc +*/ + unsigned long *guc_ids_bitmap; /** * @guc_id_list: list of intel_context with valid guc_ids but no * refs diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c index 1f2809187513..79e7732e83b2 100644 --- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c @@ -128,6 +128,16 @@ guc_create_virtual(struct intel_engine_cs **siblings, unsigned int count); #define GUC_REQUEST_SIZE 64 /* bytes */ +/* + * We reserve 1/16 of the guc_ids for multi-lrc as these need to be contiguous + * per the GuC submission interface. A different allocation algorithm is used + * (bitmap vs. ida) between multi-lrc and single-lrc hence the reason to + * partition the guc_id space. We believe the number of multi-lrc contexts in + * use should be low and 1/16 should be sufficient. Minimum of 32 guc_ids for + * multi-lrc. + */ +#define NUMBER_MULTI_LRC_GUC_ID(GUC_MAX_LRC_DESCRIPTORS / 16) + /* * Below is a set of functions which control the GuC scheduling state which * require a lock. 
@@ -1206,6 +1216,11 @@ int intel_guc_submission_init(struct intel_guc *guc) INIT_WORK(&guc->submission_state.destroyed_worker, destroyed_worker_func); + guc->submission_state.guc_ids_bitmap = + bitmap_zalloc(NUMBER_MULTI_LRC_GUC_ID, GFP_KERNEL); + if (!guc->submission_state.guc_ids_bitmap) + return -ENOMEM; + return 0; } @@ -1217,6 +1232,7 @@ void intel_guc_submission_fini(struct intel_guc *guc) guc_lrc_desc_pool_destroy(guc); guc_flush_destroyed_contexts(guc); i915_sched_engine_put(guc->sched_engine); + bitmap_free(guc->submission_state.guc_ids_bitmap); } static inline void queue_request(struct i915_sched_engine *sched_engine, @@ -1268,18 +1284,43 @@ static void guc_submit_request(struct i915_request *rq) spin_unlock_irqrestore(&sched_engine->lock, flags); } -static int new_guc_id(struct intel_guc *guc) +static int new_guc_id(struct intel_guc *guc, struct intel_context *ce) { - return ida_simple_get(&guc->submission_state.guc_ids, 0, - GUC_MAX_LRC_DESCRIPTORS, GFP_KERNEL | - __GFP_RETRY_MAYFAIL | __GFP_NOWARN); + int ret; + + GEM_BUG_ON(intel_context_is_child(ce)); + + if (intel_context_is_parent(ce)) + ret = bitmap_find_free_region(guc->submission_state.guc_ids_bitmap, + NUMBER_MULTI_LRC_GUC_ID, + order_base_2(ce->parallel.number_children + + 1)); + else + ret = ida_simple_get(&guc->submission_state.guc_ids, +NUMBER_MULTI_LRC_GUC_ID, +GUC_MAX_LRC_DESCRIPTORS, +GFP_KERNEL | __GFP_RETRY_MAYFAIL | +__GFP_NOWARN); + if (unlikely(ret < 0)) + return ret; + + ce->guc_id.id = ret; + return 0;
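To illustrate the consecutive-ID requirement being reviewed above: a parent with N children needs an aligned, power-of-two run of guc_ids, which is what bitmap_find_free_region() provides in the kernel. Below is a purely userspace sketch of that allocation shape — the pool size and helpers are simplified stand-ins, not the i915 code.

    /* Userspace sketch of consecutive guc_id allocation; cc -std=c11. */
    #include <stdint.h>
    #include <stdio.h>

    #define MULTI_LRC_IDS 64   /* stands in for GUC_MAX_LRC_DESCRIPTORS / 16 */

    static uint64_t id_bitmap; /* one bit per multi-lrc guc_id */

    /* Smallest order with (1 << order) >= n, like the kernel helper. */
    static int order_base_2(unsigned int n)
    {
        int order = 0;

        while ((1u << order) < n)
            order++;
        return order;
    }

    /* Find and claim a free, naturally aligned run of (1 << order) bits,
     * mimicking bitmap_find_free_region(); returns the first id or -1. */
    static int alloc_region(int order)
    {
        unsigned int len = 1u << order;
        uint64_t mask = (len == 64) ? ~0ull : ((1ull << len) - 1);

        for (unsigned int pos = 0; pos + len <= MULTI_LRC_IDS; pos += len) {
            if (!(id_bitmap & (mask << pos))) {
                id_bitmap |= mask << pos;
                return (int)pos;
            }
        }
        return -1; /* pool exhausted */
    }

    int main(void)
    {
        /* A parent with 3 children needs 4 consecutive ids -> order 2. */
        int id = alloc_region(order_base_2(3 + 1));

        printf("parent guc_id = %d, children get %d..%d\n", id, id + 1, id + 3);
        return 0;
    }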
Re: [Intel-gfx] [PATCH v2] drm/i915: Remove memory frequency calculation
On Wed, 2021-10-13 at 12:32 +0300, Ville Syrjälä wrote: > On Tue, Oct 12, 2021 at 06:00:46PM -0700, José Roberto de Souza wrote: > > This memory frequency calculated is only used to check if it is zero, > > what is not useful as it will never actually be zero. > > > > Also the calculation is wrong, we should be checking other bit to > > select the appropriate frequency multiplier while this code is stuck > > with a fixed multiplier. > > I don't think the alternate ref clock was ever used. > At least I don't recall ever seeing it. > > The real problem with this is that IIRC this is just the last > requested frequency. So on a system with SAGV this will > change dynamically. > > > > > So here dropping it as whole. > > We have a second copy of this in gen6_update_ring_freq(). Rather > than removing one and leaving another potentially broken one behind we > should probably just consolidate on a single implementation. gen6_update_ring_freq() is related to GPU frequency not memory, don't look related at all to me. > > > > > v2: > > - Also remove memory frequency calculation for gen9 LP platforms > > > > Cc: Yakui Zhao > > Cc: Matt Roper > > Fixes: f8112cb9574b ("drm/i915/gen11+: Only load DRAM information from > > pcode") > > Signed-off-by: José Roberto de Souza > > --- > > drivers/gpu/drm/i915/i915_reg.h | 8 > > drivers/gpu/drm/i915/intel_dram.c | 30 ++ > > 2 files changed, 2 insertions(+), 36 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_reg.h > > b/drivers/gpu/drm/i915/i915_reg.h > > index a897f4abea0c3..8825f7ac477b6 100644 > > --- a/drivers/gpu/drm/i915/i915_reg.h > > +++ b/drivers/gpu/drm/i915/i915_reg.h > > @@ -11109,12 +11109,6 @@ enum skl_power_gate { > > #define DC_STATE_DEBUG_MASK_CORES (1 << 0) > > #define DC_STATE_DEBUG_MASK_MEMORY_UP (1 << 1) > > > > -#define BXT_P_CR_MC_BIOS_REQ_0_0_0 _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x7114) > > -#define BXT_REQ_DATA_MASK 0x3F > > -#define BXT_DRAM_CHANNEL_ACTIVE_SHIFT 12 > > -#define BXT_DRAM_CHANNEL_ACTIVE_MASK (0xF << 12) > > -#define BXT_MEMORY_FREQ_MULTIPLIER_HZ 1 > > - > > #define BXT_D_CR_DRP0_DUNIT8 0x1000 > > #define BXT_D_CR_DRP0_DUNIT9 0x1200 > > #define BXT_D_CR_DRP0_DUNIT_START 8 > > @@ -11145,9 +11139,7 @@ enum skl_power_gate { > > #define BXT_DRAM_TYPE_LPDDR4 (0x2 << 22) > > #define BXT_DRAM_TYPE_DDR4(0x4 << 22) > > > > -#define SKL_MEMORY_FREQ_MULTIPLIER_HZ 2 > > #define SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU _MMIO(MCHBAR_MIRROR_BASE_SNB + > > 0x5E04) > > -#define SKL_REQ_DATA_MASK (0xF << 0) > > #define DG1_GEAR_TYPE REG_BIT(16) > > > > #define SKL_MAD_INTER_CHANNEL_0_0_0_MCHBAR_MCMAIN > > _MMIO(MCHBAR_MIRROR_BASE_SNB + 0x5000) > > diff --git a/drivers/gpu/drm/i915/intel_dram.c > > b/drivers/gpu/drm/i915/intel_dram.c > > index 30a0cab5eff46..0adadfd9528aa 100644 > > --- a/drivers/gpu/drm/i915/intel_dram.c > > +++ b/drivers/gpu/drm/i915/intel_dram.c > > @@ -244,7 +244,6 @@ static int > > skl_get_dram_info(struct drm_i915_private *i915) > > { > > struct dram_info *dram_info = &i915->dram_info; > > - u32 mem_freq_khz, val; > > int ret; > > > > dram_info->type = skl_get_dram_type(i915); > > @@ -255,17 +254,6 @@ skl_get_dram_info(struct drm_i915_private *i915) > > if (ret) > > return ret; > > > > - val = intel_uncore_read(&i915->uncore, > > - SKL_MC_BIOS_DATA_0_0_0_MCHBAR_PCU); > > - mem_freq_khz = DIV_ROUND_UP((val & SKL_REQ_DATA_MASK) * > > - SKL_MEMORY_FREQ_MULTIPLIER_HZ, 1000); > > - > > - if (dram_info->num_channels * mem_freq_khz == 0) { > > - drm_info(&i915->drm, > > -"Couldn't get system memory bandwidth\n"); > > - 
return -EINVAL; > > - } > > - > > return 0; > > } > > > > @@ -350,24 +338,10 @@ static void bxt_get_dimm_info(struct dram_dimm_info > > *dimm, u32 val) > > static int bxt_get_dram_info(struct drm_i915_private *i915) > > { > > struct dram_info *dram_info = &i915->dram_info; > > - u32 dram_channels; > > - u32 mem_freq_khz, val; > > - u8 num_active_channels, valid_ranks = 0; > > + u32 val; > > + u8 valid_ranks = 0; > > int i; > > > > - val = intel_uncore_read(&i915->uncore, BXT_P_CR_MC_BIOS_REQ_0_0_0); > > - mem_freq_khz = DIV_ROUND_UP((val & BXT_REQ_DATA_MASK) * > > - BXT_MEMORY_FREQ_MULTIPLIER_HZ, 1000); > > - > > - dram_channels = val & BXT_DRAM_CHANNEL_ACTIVE_MASK; > > - num_active_channels = hweight32(dram_channels); > > - > > - if (mem_freq_khz * num_active_channels == 0) { > > - drm_info(&i915->drm, > > -"Couldn't get system memory bandwidth\n"); > > - return -EINV
Re: [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Parallel submission aka multi-bb execbuf (rev4)
On 10/12/2021 17:15, Matthew Brost wrote: On Tue, Oct 12, 2021 at 03:15:00PM -0700, John Harrison wrote: On 10/4/2021 15:21, Patchwork wrote: == Series Details == Series: Parallel submission aka multi-bb execbuf (rev4) URL : https://patchwork.freedesktop.org/series/92789/ State : warning == Summary == $ dim checkpatch origin/drm-tip e2a47a99bf9d drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct f83d8f1539fa drm/i915/guc: Take GT PM ref when deregistering context -:79: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'gt' - possible side-effects? #79: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:44: +#define with_intel_gt_pm(gt, tmp) \ + for (tmp = 1, intel_gt_pm_get(gt); tmp; \ +intel_gt_pm_put(gt), tmp = 0) -:79: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'tmp' - possible side-effects? #79: FILE: drivers/gpu/drm/i915/gt/intel_gt_pm.h:44: +#define with_intel_gt_pm(gt, tmp) \ + for (tmp = 1, intel_gt_pm_get(gt); tmp; \ +intel_gt_pm_put(gt), tmp = 0) Not sure what these two are complaining about? But 'gt' and 'tmp' should be wrapped with parentheses when used? Not, sure but I think this one is fine. total: 0 errors, 0 warnings, 2 checks, 290 lines checked 93e5284929b3 drm/i915/guc: Take engine PM when a context is pinned with GuC submission 4dd6554d994d drm/i915/guc: Don't call switch_to_kernel_context with GuC submission 8629b55f536c drm/i915: Add logical engine mapping 8117ec0a1ca7 drm/i915: Expose logical engine instance to user aa8e1eb4dd4e drm/i915/guc: Introduce context parent-child relationship aaf50eacc2fd drm/i915/guc: Add multi-lrc context registration e5f6f50e66d1 drm/i915/guc: Ensure GuC schedule operations do not operate on child contexts adf21ba138f3 drm/i915/guc: Assign contexts in parent-child relationship consecutive guc_ids 40ef33318b81 drm/i915/guc: Implement parallel context pin / unpin functions 1ad560c70346 drm/i915/guc: Implement multi-lrc submission -:364: CHECK:SPACING: spaces preferred around that '*' (ctx:ExV) #364: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:771: + *wqi++ = child->ring->tail / sizeof(u64); ^ This seems like a bogus warning. Agree. total: 0 errors, 0 warnings, 1 checks, 570 lines checked 466c01457dec drm/i915/guc: Insert submit fences between requests in parent-child relationship 2ece815c1f18 drm/i915/guc: Implement multi-lrc reset 7add5784199f drm/i915/guc: Update debugfs for GuC multi-lrc -:23: CHECK:LINE_SPACING: Please don't use multiple blank lines #23: FILE: drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c:3707: + This should be fixed. Done. total: 0 errors, 0 warnings, 1 checks, 67 lines checked 966991d7bbed drm/i915: Fix bug in user proto-context creation that leaked contexts 0eb3d3bf0c84 drm/i915/guc: Connect UAPI to GuC multi-lrc interface 68c6596b649a drm/i915/doc: Update parallel submit doc to point to i915_drm.h -:13: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating? #13: deleted file mode 100644 total: 0 errors, 1 warnings, 0 checks, 10 lines checked 8290f5d15ca2 drm/i915/guc: Add basic GuC multi-lrc selftest -:22: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating? #22: new file mode 100644 These two can be ignored. Agree. total: 0 errors, 1 warnings, 0 checks, 190 lines checked ade3768c42d5 drm/i915/guc: Implement no mid batch preemption for multi-lrc 57882939d788 drm/i915: Multi-BB execbuf -:369: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_i' - possible side-effects? 
#369: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1854: +#define for_each_batch_create_order(_eb, _i) \ + for (_i = 0; _i < (_eb)->num_batches; ++_i) Again, not sure the 'reuse' comment means but should also use '(_i)'? I haven't been able to figure out how to fix these ones. I think you only need () if you dref the variable. The () is to prevent any kind of operator precedence confusion when passing in something more exciting than a simple variable. Doesn't have to be a deref, it could be any operator. Granted, extremely unlikely for this particular macro but generally good practice just in case. E.g. someone passes in weird things like 'a, func()' as '_i'. John. -:371: ERROR:MULTISTATEMENT_MACRO_USE_DO_WHILE: Macros with multiple statements should be enclosed in a do - while loop #371: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1856: +#define for_each_batch_add_order(_eb, _i) \ + BUILD_BUG_ON(!typecheck(int, _i)); \ + for (_i = (_eb)->num_batches - 1; _i >= 0; --_i) This seems bogus. Wrapping it in a do/while will break the purpose! Right. Added the BUILD_BUG_ON here because I did have a bug where I used an unsigned with this macro and that breaks the macro. Matt -:371: CHECK:MACRO_ARG_REUSE: Macro argument reuse '_i' - possible side-effects? #371: FILE: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:1856:
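For anyone puzzling over what the MACRO_ARG_REUSE and parenthesization checks discussed above actually guard against, here is a tiny freestanding example (hypothetical macros, nothing from the series):

    /* Build with cc -std=c11. */
    #include <stdio.h>

    #define BAD_SQR(x)  (x * x)      /* unparenthesized argument: precedence trap */
    #define SAFE_SQR(x) ((x) * (x))  /* parens fix precedence, but note below */

    int main(void)
    {
        int a = 3;

        printf("%d\n", BAD_SQR(a + 1));  /* expands to (a + 1 * a + 1) == 7 */
        printf("%d\n", SAFE_SQR(a + 1)); /* 16, as intended */

        /*
         * Parentheses do not cure argument reuse: SAFE_SQR(i++) expands to
         * ((i++) * (i++)), modifying i twice without sequencing -- undefined
         * behaviour. That double evaluation is what MACRO_ARG_REUSE warns
         * about, and why with_intel_gt_pm()-style macros get flagged even
         * when the arguments actually passed today are harmless.
         */
        return 0;
    }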
Re: [Intel-gfx] [PATCH 23/26] drm/i915: Make request conflict tracking understand parallel submits
On 10/13/2021 10:51, Matthew Brost wrote: On Tue, Oct 12, 2021 at 03:08:05PM -0700, John Harrison wrote: On 10/4/2021 15:06, Matthew Brost wrote: If an object in the excl or shared slot is a composite fence from a parallel submit and the current request in the conflict tracking is from the same parallel context there is no need to enforce ordering as the ordering already implicit. Make the request conflict tracking understand ordering already -> ordering is already this by comparing the parents parallel fence values and skipping the parents -> parent's conflict insertion if the values match. Presumably, this is to cope with the fact that the parallel submit fences do not look like regular submission fences. And hence the existing code that says 'new fence belongs to same context as old fence, so safe to ignore' does not work with parallel submission. However, this change does not appear to be adding parallel submit support to an existing 'same context' check. It seems to be a brand new check that does not exist for single submission. What makes parallel submit different? If we aren't skipping same context fences for single submits, why do we need it for parallel? Conversely, if we need it for parallel then why don't we need it for single? And if the single submission version is simply somewhere else in the code, why do the parallel version here instead of at the same place? John. Signed-off-by: Matthew Brost --- drivers/gpu/drm/i915/i915_request.c | 43 +++-- 1 file changed, 29 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index e9bfa32f9270..cf89624020ad 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1325,6 +1325,25 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence) return err; } +static inline bool is_parallel_rq(struct i915_request *rq) +{ + return intel_context_is_parallel(rq->context); +} + +static inline struct intel_context *request_to_parent(struct i915_request *rq) +{ + return intel_context_to_parent(rq->context); +} + +static bool is_same_parallel_context(struct i915_request *to, +struct i915_request *from) +{ + if (is_parallel_rq(to)) Should this not say '&& is_parallel_rq(from)'? Missed this one. That isn't necessary as if from is not a parallel submit the following compare of parents will always return false. I could add if you insist as either way works. Matt It was more a question of whether req_to_parent() works fine irrespective of whether the rq is a parent, child or single? John. + return request_to_parent(to) == request_to_parent(from); + + return false; +} + int i915_request_await_execution(struct i915_request *rq, struct dma_fence *fence) @@ -1356,11 +1375,14 @@ i915_request_await_execution(struct i915_request *rq, * want to run our callback in all cases. 
*/ - if (dma_fence_is_i915(fence)) + if (dma_fence_is_i915(fence)) { + if (is_same_parallel_context(rq, to_request(fence))) + continue; ret = __i915_request_await_execution(rq, to_request(fence)); - else + } else { ret = i915_request_await_external(rq, fence); + } if (ret < 0) return ret; } while (--nchild); @@ -1461,10 +1483,13 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence) fence)) continue; - if (dma_fence_is_i915(fence)) + if (dma_fence_is_i915(fence)) { + if (is_same_parallel_context(rq, to_request(fence))) + continue; ret = i915_request_await_request(rq, to_request(fence)); - else + } else { ret = i915_request_await_external(rq, fence); + } if (ret < 0) return ret; @@ -1539,16 +1564,6 @@ i915_request_await_object(struct i915_request *to, return ret; } -static inline bool is_parallel_rq(struct i915_request *rq) -{ - return intel_context_is_parallel(rq->context); -} - -static inline struct intel_context *request_to_parent(struct i915_request *rq) -{ - return intel_context_to_parent(rq->context); -} - static struct i915_request * __i915_request_ensure_parallel_ordering(struct i915_request *rq, struct intel_timeline *timeline)
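A compact way to see why the parent comparison works for every parent/child/single pairing: below is a userspace sketch of the presumed request_to_parent() semantics under discussion, with stand-in types rather than the i915 implementation. A child resolves to its parent while a parent or ordinary single-LRC context resolves to itself, so the comparison is safe even when 'from' is not a parallel submit.

    /* Userspace sketch, stand-in types; build with cc -std=c11. */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct ctx {
        struct ctx *parent;  /* non-NULL only for a child context */
        int nchildren;       /* non-zero only for a parent context */
    };

    static bool is_parallel(const struct ctx *c)
    {
        return c->parent || c->nchildren;  /* child or parent */
    }

    /* Child resolves to its parent; parent and single resolve to self. */
    static const struct ctx *to_parent(const struct ctx *c)
    {
        return c->parent ? c->parent : c;
    }

    static bool same_parallel_context(const struct ctx *to,
                                      const struct ctx *from)
    {
        if (is_parallel(to))
            return to_parent(to) == to_parent(from);
        return false;
    }

    int main(void)
    {
        struct ctx parent = { .nchildren = 1 };
        struct ctx child = { .parent = &parent };
        struct ctx single = { 0 };

        /* Child and parent share a canonical parent: fence can be skipped. */
        printf("%d\n", same_parallel_context(&child, &parent));  /* 1 */
        /* A single-lrc 'from' resolves to itself, so the compare is false
         * even without an explicit is_parallel(from) check. */
        printf("%d\n", same_parallel_context(&parent, &single)); /* 0 */
        return 0;
    }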