[Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [01/10] drm/i915: Seal races between async GPU cancellation, retirement and signaling
== Series Details == Series: series starting with [01/10] drm/i915: Seal races between async GPU cancellation, retirement and signaling URL : https://patchwork.freedesktop.org/series/59912/ State : warning == Summary == $ dim sparse origin/drm-tip Sparse version: v0.5.2 Commit: drm/i915: Seal races between async GPU cancellation, retirement and signaling Okay! Commit: drm/i915/gvt: Pin the per-engine GVT shadow contexts Okay! Commit: drm/i915: Export intel_context_instance() -O:drivers/gpu/drm/i915/gt/intel_context.c:131:22: warning: context imbalance in 'intel_context_pin_lock' - wrong count at exit +drivers/gpu/drm/i915/gt/intel_context.c:131:22: warning: context imbalance in 'intel_context_pin_lock' - wrong count at exit +drivers/gpu/drm/i915/gt/intel_context.c:150:6: warning: context imbalance in 'intel_context_pin_unlock' - wrong count at exit Commit: drm/i915/selftests: Use the real kernel context for sseu isolation tests Okay! Commit: drm/i915/selftests: Pass around intel_context for sseu Okay! Commit: drm/i915: Pass intel_context to intel_context_pin_lock() -O:drivers/gpu/drm/i915/gt/intel_context.c:131:22: warning: context imbalance in 'intel_context_pin_lock' - wrong count at exit -O:drivers/gpu/drm/i915/gt/intel_context.c:150:6: warning: context imbalance in 'intel_context_pin_unlock' - wrong count at exit Commit: drm/i915: Split engine setup/init into two phases Okay! Commit: drm/i915: Switch back to an array of logical per-engine HW contexts +./include/linux/overflow.h:285:13: error: incorrect type in conditional +./include/linux/overflow.h:285:13: error: undefined identifier '__builtin_mul_overflow' +./include/linux/overflow.h:285:13:got void +./include/linux/overflow.h:285:13: warning: call with no type! +./include/linux/overflow.h:287:13: error: incorrect type in conditional +./include/linux/overflow.h:287:13: error: undefined identifier '__builtin_add_overflow' +./include/linux/overflow.h:287:13:got void +./include/linux/overflow.h:287:13: warning: call with no type! Commit: drm/i915: Remove intel_context.active_link Okay! Commit: drm/i915: Move i915_request_alloc into selftests/ +./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000 becomes 0) ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915: Allow multiple user handles to the same VM
== Series Details == Series: drm/i915: Allow multiple user handles to the same VM URL : https://patchwork.freedesktop.org/series/59913/ State : failure == Summary == CI Bug Log - changes from CI_DRM_5995 -> Patchwork_12867 Summary --- **FAILURE** Serious unknown changes coming with Patchwork_12867 absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_12867, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. External URL: https://patchwork.freedesktop.org/api/1.0/series/59913/revisions/1/mbox/ Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_12867: ### IGT changes ### Possible regressions * igt@gem_exec_suspend@basic-s3: - fi-apl-guc: NOTRUN -> [DMESG-WARN][1] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12867/fi-apl-guc/igt@gem_exec_susp...@basic-s3.html Known issues Here are the changes found in Patchwork_12867 that come from known issues: ### IGT changes ### Issues hit * igt@i915_selftest@live_hangcheck: - fi-skl-iommu: [PASS][2] -> [INCOMPLETE][3] ([fdo#108602] / [fdo#108744]) [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5995/fi-skl-iommu/igt@i915_selftest@live_hangcheck.html [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12867/fi-skl-iommu/igt@i915_selftest@live_hangcheck.html Possible fixes * igt@i915_module_load@reload: - fi-blb-e6850: [INCOMPLETE][4] ([fdo#107718]) -> [PASS][5] [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5995/fi-blb-e6850/igt@i915_module_l...@reload.html [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12867/fi-blb-e6850/igt@i915_module_l...@reload.html [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718 [fdo#108602]: https://bugs.freedesktop.org/show_bug.cgi?id=108602 [fdo#108744]: https://bugs.freedesktop.org/show_bug.cgi?id=108744 Participating hosts (32 -> 38) -- Additional (12): fi-skl-gvtdvm fi-bsw-n3050 fi-skl-6770hq fi-apl-guc fi-snb-2520m fi-bxt-j4205 fi-icl-u3 fi-pnv-d510 fi-bsw-kefka fi-byt-n2820 fi-byt-clapper fi-skl-6700k2 Missing(6): fi-kbl-soraka fi-ilk-m540 fi-bsw-cyan fi-icl-u2 fi-ctg-p8600 fi-bdw-samus Build changes - * Linux: CI_DRM_5995 -> Patchwork_12867 CI_DRM_5995: 290f6892805fb22466c77e373ffd4c8f9e028f7f @ git://anongit.freedesktop.org/gfx-ci/linux IGT_4963: 11e10bc575516c56978640fcc697c27f277c660a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools Patchwork_12867: 965b86ced4fe3a35cc9cf4db7db6d86864280639 @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == 965b86ced4fe drm/i915: Allow multiple user handles to the same VM == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12867/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [01/10] drm/i915: Seal races between async GPU cancellation, retirement and signaling
== Series Details == Series: series starting with [01/10] drm/i915: Seal races between async GPU cancellation, retirement and signaling URL : https://patchwork.freedesktop.org/series/59912/ State : failure == Summary == CI Bug Log - changes from CI_DRM_5995 -> Patchwork_12868 Summary --- **FAILURE** Serious unknown changes coming with Patchwork_12868 absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_12868, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. External URL: https://patchwork.freedesktop.org/api/1.0/series/59912/revisions/1/mbox/ Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_12868: ### IGT changes ### Possible regressions * igt@gem_exec_suspend@basic-s3: - fi-apl-guc: NOTRUN -> [DMESG-WARN][1] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12868/fi-apl-guc/igt@gem_exec_susp...@basic-s3.html Known issues Here are the changes found in Patchwork_12868 that come from known issues: ### IGT changes ### Possible fixes * igt@i915_module_load@reload: - fi-blb-e6850: [INCOMPLETE][2] ([fdo#107718]) -> [PASS][3] [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5995/fi-blb-e6850/igt@i915_module_l...@reload.html [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12868/fi-blb-e6850/igt@i915_module_l...@reload.html [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718 Participating hosts (32 -> 38) -- Additional (12): fi-bxt-dsi fi-skl-gvtdvm fi-byt-j1900 fi-skl-6770hq fi-apl-guc fi-snb-2520m fi-icl-u3 fi-pnv-d510 fi-bsw-kefka fi-byt-n2820 fi-byt-clapper fi-skl-6700k2 Missing(6): fi-kbl-soraka fi-ilk-m540 fi-hsw-peppy fi-bsw-cyan fi-ctg-p8600 fi-bdw-samus Build changes - * Linux: CI_DRM_5995 -> Patchwork_12868 CI_DRM_5995: 290f6892805fb22466c77e373ffd4c8f9e028f7f @ git://anongit.freedesktop.org/gfx-ci/linux IGT_4963: 11e10bc575516c56978640fcc697c27f277c660a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools Patchwork_12868: a9960e8e0a5bce983982cb1748cc2edaf17a9d2b @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == a9960e8e0a5b drm/i915: Move i915_request_alloc into selftests/ de35fe7b2bdc drm/i915: Remove intel_context.active_link b904aba5b574 drm/i915: Switch back to an array of logical per-engine HW contexts 07503cad8e52 drm/i915: Split engine setup/init into two phases dabea4817a74 drm/i915: Pass intel_context to intel_context_pin_lock() 38d99ba59c32 drm/i915/selftests: Pass around intel_context for sseu 84d02409bb7d drm/i915/selftests: Use the real kernel context for sseu isolation tests 521b9e368bdf drm/i915: Export intel_context_instance() 0ce4990dec50 drm/i915/gvt: Pin the per-engine GVT shadow contexts c9843a19934a drm/i915: Seal races between async GPU cancellation, retirement and signaling == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12868/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for drm/i915/gen11: enable support for headerless msgs (rev4)
== Series Details == Series: drm/i915/gen11: enable support for headerless msgs (rev4) URL : https://patchwork.freedesktop.org/series/59839/ State : warning == Summary == $ dim checkpatch origin/drm-tip 95faef437aa7 drm/i915/gen11: enable support for headerless msgs -:6: WARNING:TYPO_SPELLING: 'preemptable' may be misspelled - perhaps 'preemptible'? #6: Setting bit5 (headerless msg for preemptable GPGPU context) of SAMPLER_MODE -:8: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line) #8: use cases will be affected by this as this change makes both types of message -:26: WARNING:TYPO_SPELLING: 'preemptable' may be misspelled - perhaps 'preemptible'? #26: FILE: drivers/gpu/drm/i915/gt/intel_workarounds.c:579: + /* allow headerless messages for preemptable GPGPU context */ total: 0 errors, 3 warnings, 0 checks, 17 lines checked ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [CI] drm/i915: Move GraphicsTechnology files under gt/
On Wed, 24 Apr 2019, Chris Wilson wrote: > Start partitioning off the code that talks to the hardware (GT) from the > uapi layers and move the device facing code under gt/ > > One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to > subdivide that header and body further (and split out the submission > code from the ringbuffer and logical context handling). This patch aims > to be simple motion so git can fixup inflight patches with little mess. > > Signed-off-by: Chris Wilson > Acked-by: Joonas Lahtinen > Acked-by: Jani Nikula > Acked-by: Rodrigo Vivi Yay! I guess this'll conflict badly with my wip header refactoring, but first come first served. A couple of comments below. I don't mind having them fixed afterwards to avoid several CI rounds and potential conflicts on the huge patch now. > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile > index 53ff209b91bb..40130cf5c003 100644 > --- a/drivers/gpu/drm/i915/Makefile > +++ b/drivers/gpu/drm/i915/Makefile > @@ -35,32 +35,53 @@ subdir-ccflags-y += \ > # Extra header tests > include $(src)/Makefile.header-test > > +subdir-ccflags-y += -I$(src) Just remembered [1], the kbuild folks want to use -I $(srctree)/$(src). I suppose it's by design you don't add gt/ in the header search path, and require explicit "gt/foo.h" in the #includes in files outside the gt/ directory. (I assume files under gt/ can omit the directory part.) I could be persuaded either way. [1] 1553859161-2628-1-git-send-email-yamada.masahiro@socionext.com">http://mid.mail-archive.com/1553859161-2628-1-git-send-email-yamada.masahiro@socionext.com > + > # Please keep these build lists sorted! > > # core driver code > i915-y += i915_drv.o \ > i915_irq.o \ > - i915_memcpy.o \ > - i915_mm.o \ > i915_params.o \ > i915_pci.o \ > - i915_reset.o \ > i915_suspend.o \ > - i915_sw_fence.o \ > - i915_syncmap.o \ > i915_sysfs.o \ > - i915_user_extensions.o \ > intel_csr.o \ > intel_device_info.o \ > intel_pm.o \ > intel_runtime_pm.o \ > - intel_workarounds.o > + intel_uncore.o > + > +# core library code > +i915-y += \ > + i915_memcpy.o \ > + i915_mm.o \ > + i915_sw_fence.o \ > + i915_syncmap.o \ > + i915_user_extensions.o > > i915-$(CONFIG_COMPAT) += i915_ioc32.o > i915-$(CONFIG_DEBUG_FS) += i915_debugfs.o intel_pipe_crc.o > i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o > > -# GEM code > +# "Graphics Technology" (aka we talk to the gpu) > +obj-y += gt/ I wonder if this and the gt/Makefile are subtle enough to warrant a comment. That they are only used for the header tests. It took me a moment to figure this out. > +gt-y += \ > + gt/intel_breadcrumbs.o \ > + gt/intel_context.o \ > + gt/intel_engine_cs.o \ > + gt/intel_hangcheck.o \ > + gt/intel_lrc.o \ > + gt/intel_reset.o \ > + gt/intel_ringbuffer.o \ > + gt/intel_mocs.o \ > + gt/intel_sseu.o \ > + gt/intel_workarounds.o > +gt-$(CONFIG_DRM_I915_SELFTEST) += \ > + gt/mock_engine.o > +i915-y += $(gt-y) > + > +# GEM (Graphics Execution Management) code > i915-y += \ > i915_active.o \ > i915_cmd_parser.o \ > @@ -88,15 +109,6 @@ i915-y += \ > i915_timeline.o \ > i915_trace_points.o \ > i915_vma.o \ > - intel_breadcrumbs.o \ > - intel_context.o \ > - intel_engine_cs.o \ > - intel_hangcheck.o \ > - intel_lrc.o \ > - intel_mocs.o \ > - intel_ringbuffer.o \ > - intel_sseu.o \ > - intel_uncore.o \ > intel_wopcm.o > > # general-purpose microcontroller (GuC) support > diff --git a/drivers/gpu/drm/i915/Makefile.header-test > b/drivers/gpu/drm/i915/Makefile.header-test > index 5bcc78d7ac96..96a5d90629ec 100644 > --- a/drivers/gpu/drm/i915/Makefile.header-test > +++ b/drivers/gpu/drm/i915/Makefile.header-test > @@ -13,13 +13,11 @@ header_test := \ > intel_cdclk.h \ > intel_color.h \ > intel_connector.h \ > - intel_context_types.h \ > intel_crt.h \ > intel_csr.h \ > intel_ddi.h \ > intel_dp.h \ > intel_dvo.h \ > - intel_engine_types.h \ > intel_fbc.h \ > intel_fbdev.h \ > intel_frontbuffer.h \ > @@ -33,9 +31,7 @@ header_test := \ > intel_psr.h \ > intel_sdvo.h \ > intel_sprite.h \ > - intel_sseu.h \ > - intel_tv.h \ > - intel_workarounds_types.h > + intel_tv.h > > quiet_cmd_header_test = HDRTEST $@ >cmd_header_test = echo "\#include \"$( $@ > diff --git a/drivers/gpu/drm/i915/gt/Makefile > b/drivers/gpu/drm/i915/gt/Makefile > new file mode 100644 > index ..1c75b5c9790c > --- /dev/null > +++ b/drivers/gpu/drm/i915/gt/Makefile > @@ -0,0 +1,2 @@ > +# Extra header tests > +include $(src)/Makefile.header-test > diff --git a/drivers/gpu/drm/i915/gt/Makefile.header-test > b/drivers/gpu/drm/i915/gt/
[Intel-gfx] ✓ Fi.CI.IGT: success for series starting with [1/2] drm/i915/icl: Factor out combo PHY lane power setup helper
== Series Details == Series: series starting with [1/2] drm/i915/icl: Factor out combo PHY lane power setup helper URL : https://patchwork.freedesktop.org/series/59893/ State : success == Summary == CI Bug Log - changes from CI_DRM_5991_full -> Patchwork_12863_full Summary --- **SUCCESS** No regressions found. Known issues Here are the changes found in Patchwork_12863_full that come from known issues: ### IGT changes ### Issues hit * igt@i915_pm_rpm@i2c: - shard-iclb: [PASS][1] -> [FAIL][2] ([fdo#104097]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-iclb5/igt@i915_pm_...@i2c.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-iclb3/igt@i915_pm_...@i2c.html * igt@kms_cursor_crc@cursor-64x64-suspend: - shard-skl: [PASS][3] -> [INCOMPLETE][4] ([fdo#104108] / [fdo#107773]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-skl4/igt@kms_cursor_...@cursor-64x64-suspend.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-skl1/igt@kms_cursor_...@cursor-64x64-suspend.html * igt@kms_frontbuffer_tracking@fbc-suspend: - shard-kbl: [PASS][5] -> [INCOMPLETE][6] ([fdo#103665]) [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-kbl7/igt@kms_frontbuffer_track...@fbc-suspend.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-kbl3/igt@kms_frontbuffer_track...@fbc-suspend.html - shard-apl: [PASS][7] -> [DMESG-WARN][8] ([fdo#108566]) +5 similar issues [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-apl1/igt@kms_frontbuffer_track...@fbc-suspend.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-apl3/igt@kms_frontbuffer_track...@fbc-suspend.html * igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-shrfb-draw-pwrite: - shard-iclb: [PASS][9] -> [FAIL][10] ([fdo#103167]) +4 similar issues [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-iclb3/igt@kms_frontbuffer_track...@fbcpsr-1p-offscren-pri-shrfb-draw-pwrite.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-iclb2/igt@kms_frontbuffer_track...@fbcpsr-1p-offscren-pri-shrfb-draw-pwrite.html * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min: - shard-skl: [PASS][11] -> [FAIL][12] ([fdo#108145]) [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-skl8/igt@kms_plane_alpha_bl...@pipe-c-constant-alpha-min.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-skl8/igt@kms_plane_alpha_bl...@pipe-c-constant-alpha-min.html * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc: - shard-skl: [PASS][13] -> [FAIL][14] ([fdo#108145] / [fdo#110403]) [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-skl5/igt@kms_plane_alpha_bl...@pipe-c-coverage-7efc.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-skl2/igt@kms_plane_alpha_bl...@pipe-c-coverage-7efc.html * igt@kms_plane_lowres@pipe-a-tiling-y: - shard-iclb: [PASS][15] -> [FAIL][16] ([fdo#103166]) [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-iclb1/igt@kms_plane_low...@pipe-a-tiling-y.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-iclb1/igt@kms_plane_low...@pipe-a-tiling-y.html * igt@kms_plane_scaling@pipe-c-scaler-with-rotation: - shard-glk: [PASS][17] -> [SKIP][18] ([fdo#109271] / [fdo#109278]) [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-glk9/igt@kms_plane_scal...@pipe-c-scaler-with-rotation.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-glk1/igt@kms_plane_scal...@pipe-c-scaler-with-rotation.html * igt@kms_psr@psr2_primary_mmap_cpu: - shard-iclb: [PASS][19] -> [SKIP][20] ([fdo#109441]) +4 similar issues [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-iclb2/igt@kms_psr@psr2_primary_mmap_cpu.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-iclb3/igt@kms_psr@psr2_primary_mmap_cpu.html * igt@kms_setmode@basic: - shard-apl: [PASS][21] -> [FAIL][22] ([fdo#99912]) [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-apl1/igt@kms_setm...@basic.html [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-apl8/igt@kms_setm...@basic.html * igt@kms_sysfs_edid_timing: - shard-iclb: [PASS][23] -> [FAIL][24] ([fdo#100047]) [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-iclb5/igt@kms_sysfs_edid_timing.html [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12863/shard-iclb3/igt@kms_sysfs_edid_timing.html Possible fixes * igt@gem_tiled_swapping@non-threaded: - shard-glk: [FAIL][25] ([fdo#108686]) -> [PASS][26] [25]:
[Intel-gfx] ✗ Fi.CI.BAT: failure for drm/i915/gen11: enable support for headerless msgs (rev4)
== Series Details == Series: drm/i915/gen11: enable support for headerless msgs (rev4) URL : https://patchwork.freedesktop.org/series/59839/ State : failure == Summary == CI Bug Log - changes from CI_DRM_5995 -> Patchwork_12869 Summary --- **FAILURE** Serious unknown changes coming with Patchwork_12869 absolutely need to be verified manually. If you think the reported changes have nothing to do with the changes introduced in Patchwork_12869, please notify your bug team to allow them to document this new failure mode, which will reduce false positives in CI. External URL: https://patchwork.freedesktop.org/api/1.0/series/59839/revisions/4/mbox/ Possible new issues --- Here are the unknown changes that may have been introduced in Patchwork_12869: ### IGT changes ### Possible regressions * igt@gem_exec_suspend@basic-s3: - fi-apl-guc: NOTRUN -> [DMESG-WARN][1] [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12869/fi-apl-guc/igt@gem_exec_susp...@basic-s3.html Known issues Here are the changes found in Patchwork_12869 that come from known issues: ### IGT changes ### Issues hit * igt@gem_exec_suspend@basic-s4-devices: - fi-blb-e6850: [PASS][2] -> [INCOMPLETE][3] ([fdo#107718]) [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5995/fi-blb-e6850/igt@gem_exec_susp...@basic-s4-devices.html [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12869/fi-blb-e6850/igt@gem_exec_susp...@basic-s4-devices.html * igt@i915_selftest@live_contexts: - fi-bdw-gvtdvm: [PASS][4] -> [DMESG-FAIL][5] ([fdo#110235 ]) [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5995/fi-bdw-gvtdvm/igt@i915_selftest@live_contexts.html [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12869/fi-bdw-gvtdvm/igt@i915_selftest@live_contexts.html [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718 [fdo#110235 ]: https://bugs.freedesktop.org/show_bug.cgi?id=110235 Participating hosts (32 -> 42) -- Additional (15): fi-bxt-dsi fi-skl-gvtdvm fi-bsw-n3050 fi-byt-j1900 fi-skl-6770hq fi-apl-guc fi-snb-2520m fi-byt-clapper fi-bxt-j4205 fi-icl-u3 fi-pnv-d510 fi-icl-y fi-byt-n2820 fi-bsw-kefka fi-skl-6700k2 Missing(5): fi-kbl-soraka fi-ilk-m540 fi-bsw-cyan fi-ctg-p8600 fi-bdw-samus Build changes - * Linux: CI_DRM_5995 -> Patchwork_12869 CI_DRM_5995: 290f6892805fb22466c77e373ffd4c8f9e028f7f @ git://anongit.freedesktop.org/gfx-ci/linux IGT_4963: 11e10bc575516c56978640fcc697c27f277c660a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools Patchwork_12869: 95faef437aa7e6333a2bef8134a5b3eee82f68c1 @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == 95faef437aa7 drm/i915/gen11: enable support for headerless msgs == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12869/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/2] drm/i915/icl: Factor out combo PHY lane power setup helper
On Wed, 24 Apr 2019, Imre Deak wrote: > Factor out the combo PHY lane power configuration code to a separate > helper; it will be also needed by the next patch adding the same > configuration for DDI ports. > > While at it also add support to handle lane reversal which wasn't > needed for DSI, but will be needed by DDI ports. > > Also, remove the macros for the power down flags, they aren't > needed any more since we now calculate the power down mask. Many of > those macro values (mostly the ones for the currently unused reversed > lane configs) were actually undefined in the spec and didn't make much > sense either. Only PWR_DOWN_LN_1 is undefined in the spec. > > Cc: Jani Nikula > Cc: Madhav Chauhan > Cc: Ville Syrjala > Signed-off-by: Imre Deak > --- > drivers/gpu/drm/i915/i915_drv.h| 3 +++ > drivers/gpu/drm/i915/i915_reg.h| 9 - > drivers/gpu/drm/i915/icl_dsi.c | 26 +++--- > drivers/gpu/drm/i915/intel_combo_phy.c | 31 +++ > 4 files changed, 37 insertions(+), 32 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index dc74d33c20aa..1207ef080aae 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -3515,6 +3515,9 @@ void icl_combo_phys_init(struct drm_i915_private > *dev_priv); > void icl_combo_phys_uninit(struct drm_i915_private *dev_priv); > void cnl_combo_phys_init(struct drm_i915_private *dev_priv); > void cnl_combo_phys_uninit(struct drm_i915_private *dev_priv); > +void icl_combo_phy_power_up_lanes(struct drm_i915_private *dev_priv, > + enum port port, int lane_count, > + bool lane_reversal); > > int intel_gpu_freq(struct drm_i915_private *dev_priv, int val); > int intel_freq_opcode(struct drm_i915_private *dev_priv, int val); > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h > index b74824f0b5b1..29f16bc40b0a 100644 > --- a/drivers/gpu/drm/i915/i915_reg.h > +++ b/drivers/gpu/drm/i915/i915_reg.h > @@ -1807,15 +1807,6 @@ enum i915_power_well_id { > #define PG_SEQ_DELAY_OVERRIDE_MASK (3 << 25) > #define PG_SEQ_DELAY_OVERRIDE_SHIFT 25 > #define PG_SEQ_DELAY_OVERRIDE_ENABLE(1 << 24) > -#define PWR_UP_ALL_LANES(0x0 << 4) > -#define PWR_DOWN_LN_3_2_1 (0xe << 4) > -#define PWR_DOWN_LN_3_2 (0xc << 4) > -#define PWR_DOWN_LN_3 (0x8 << 4) > -#define PWR_DOWN_LN_2_1_0 (0x7 << 4) > -#define PWR_DOWN_LN_1_0 (0x3 << 4) > -#define PWR_DOWN_LN_1 (0x2 << 4) > -#define PWR_DOWN_LN_3_1 (0xa << 4) > -#define PWR_DOWN_LN_3_1_0 (0xb << 4) > #define PWR_DOWN_LN_MASK(0xf << 4) > #define PWR_DOWN_LN_SHIFT 4 > > diff --git a/drivers/gpu/drm/i915/icl_dsi.c b/drivers/gpu/drm/i915/icl_dsi.c > index 9d962ea1e635..88959517b668 100644 > --- a/drivers/gpu/drm/i915/icl_dsi.c > +++ b/drivers/gpu/drm/i915/icl_dsi.c > @@ -363,30 +363,10 @@ static void gen11_dsi_power_up_lanes(struct > intel_encoder *encoder) > struct drm_i915_private *dev_priv = to_i915(encoder->base.dev); > struct intel_dsi *intel_dsi = enc_to_intel_dsi(&encoder->base); > enum port port; > - u32 tmp; > - u32 lane_mask; > > - switch (intel_dsi->lane_count) { > - case 1: > - lane_mask = PWR_DOWN_LN_3_1_0; > - break; > - case 2: > - lane_mask = PWR_DOWN_LN_3_1; > - break; > - case 3: > - lane_mask = PWR_DOWN_LN_3; > - break; > - case 4: > - default: > - lane_mask = PWR_UP_ALL_LANES; > - break; > - } > - > - for_each_dsi_port(port, intel_dsi->ports) { > - tmp = I915_READ(ICL_PORT_CL_DW10(port)); > - tmp &= ~PWR_DOWN_LN_MASK; > - I915_WRITE(ICL_PORT_CL_DW10(port), tmp | lane_mask); > - } I look at that and it takes me maybe 10 seconds to figure out what's going on, assuming the PWR_DOWN_LN_* macros are right. > + for_each_dsi_port(port, intel_dsi->ports) > + icl_combo_phy_power_up_lanes(dev_priv, port, > + intel_dsi->lane_count, false); > } > > static void gen11_dsi_config_phy_lanes_sequence(struct intel_encoder > *encoder) > diff --git a/drivers/gpu/drm/i915/intel_combo_phy.c > b/drivers/gpu/drm/i915/intel_combo_phy.c > index 2bf4359d7e41..e3ed584eca47 100644 > --- a/drivers/gpu/drm/i915/intel_combo_phy.c > +++ b/drivers/gpu/drm/i915/intel_combo_phy.c > @@ -203,6 +203,37 @@ static bool icl_combo_phy_verify_state(struct > drm_i915_private *dev_priv, > return ret; > } > > +static uint8_t reverse_nibble_bits(uint8_t val) > +{ > +#define MOVBIT(v, from, to) (!!((v) & BIT(from)) << (to)) > +#define SWPBIT(v, b1, b2) (MOVBIT((v), (b1), (b2)) | MOVBIT((v), (b2), (b1)))
[Intel-gfx] [PATCH v4 1/1] drm/fb-helper: Avoid race with DRM userspace
drm_fb_helper_is_bound() is used to check if DRM userspace is in control. This is done by looking at the fb on the primary plane. By the time fb-helper gets around to committing, it's possible that the facts have changed. Avoid this race by holding the drm_device->master_mutex lock while committing. When DRM userspace does its first open, it will now wait until fb-helper is done. The helper will stay away if there's a master. Locking rule: Always take the fb-helper lock first. v2: - Remove drm_fb_helper_is_bound() (Daniel Vetter) - No need to check fb_helper->dev->master in drm_fb_helper_single_fb_probe(), restore_fbdev_mode() has the check. Suggested-by: Daniel Vetter Signed-off-by: Noralf Trønnes Reviewed-by: Daniel Vetter --- drivers/gpu/drm/drm_auth.c | 20 drivers/gpu/drm/drm_fb_helper.c | 90 - drivers/gpu/drm/drm_internal.h | 2 + 3 files changed, 67 insertions(+), 45 deletions(-) diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c index 1669c42c40ed..db199807b7dc 100644 --- a/drivers/gpu/drm/drm_auth.c +++ b/drivers/gpu/drm/drm_auth.c @@ -368,3 +368,23 @@ void drm_master_put(struct drm_master **master) *master = NULL; } EXPORT_SYMBOL(drm_master_put); + +/* Used by drm_client and drm_fb_helper */ +bool drm_master_internal_acquire(struct drm_device *dev) +{ + mutex_lock(&dev->master_mutex); + if (dev->master) { + mutex_unlock(&dev->master_mutex); + return false; + } + + return true; +} +EXPORT_SYMBOL(drm_master_internal_acquire); + +/* Used by drm_client and drm_fb_helper */ +void drm_master_internal_release(struct drm_device *dev) +{ + mutex_unlock(&dev->master_mutex); +} +EXPORT_SYMBOL(drm_master_internal_release); diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index 2339f0f8f5a8..578428461391 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -44,6 +44,7 @@ #include "drm_crtc_internal.h" #include "drm_crtc_helper_internal.h" +#include "drm_internal.h" static bool drm_fbdev_emulation = true; module_param_named(fbdev_emulation, drm_fbdev_emulation, bool, 0600); @@ -509,7 +510,7 @@ static int restore_fbdev_mode_legacy(struct drm_fb_helper *fb_helper) return ret; } -static int restore_fbdev_mode(struct drm_fb_helper *fb_helper) +static int restore_fbdev_mode_force(struct drm_fb_helper *fb_helper) { struct drm_device *dev = fb_helper->dev; @@ -519,6 +520,21 @@ static int restore_fbdev_mode(struct drm_fb_helper *fb_helper) return restore_fbdev_mode_legacy(fb_helper); } +static int restore_fbdev_mode(struct drm_fb_helper *fb_helper) +{ + struct drm_device *dev = fb_helper->dev; + int ret; + + if (!drm_master_internal_acquire(dev)) + return -EBUSY; + + ret = restore_fbdev_mode_force(fb_helper); + + drm_master_internal_release(dev); + + return ret; +} + /** * drm_fb_helper_restore_fbdev_mode_unlocked - restore fbdev configuration * @fb_helper: driver-allocated fbdev helper, can be NULL @@ -556,34 +572,6 @@ int drm_fb_helper_restore_fbdev_mode_unlocked(struct drm_fb_helper *fb_helper) } EXPORT_SYMBOL(drm_fb_helper_restore_fbdev_mode_unlocked); -static bool drm_fb_helper_is_bound(struct drm_fb_helper *fb_helper) -{ - struct drm_device *dev = fb_helper->dev; - struct drm_crtc *crtc; - int bound = 0, crtcs_bound = 0; - - /* -* Sometimes user space wants everything disabled, so don't steal the -* display if there's a master. -*/ - if (READ_ONCE(dev->master)) - return false; - - drm_for_each_crtc(crtc, dev) { - drm_modeset_lock(&crtc->mutex, NULL); - if (crtc->primary->fb) - crtcs_bound++; - if (crtc->primary->fb == fb_helper->fb) - bound++; - drm_modeset_unlock(&crtc->mutex); - } - - if (bound < crtcs_bound) - return false; - - return true; -} - #ifdef CONFIG_MAGIC_SYSRQ /* * restore fbcon display for all kms driver's using this helper, used for sysrq @@ -604,7 +592,7 @@ static bool drm_fb_helper_force_kernel_mode(void) continue; mutex_lock(&helper->lock); - ret = restore_fbdev_mode(helper); + ret = restore_fbdev_mode_force(helper); if (ret) error = true; mutex_unlock(&helper->lock); @@ -663,20 +651,22 @@ static void dpms_legacy(struct drm_fb_helper *fb_helper, int dpms_mode) static void drm_fb_helper_dpms(struct fb_info *info, int dpms_mode) { struct drm_fb_helper *fb_helper = info->par; + struct drm_device *dev = fb_helper->dev; /* * For each CRTC in this fb, turn the connectors on/off. */ mutex_lock(&fb_helper-
[Intel-gfx] [PATCH v4 0/1] drm/fb-helper: Move modesetting code to drm_client
The Intel CI [1] was not happy with the previous version and I don't know which part it didn't like. So I'll split up the series and feed it piece by piece until I know where the problem is. Noralf. [1] https://patchwork.freedesktop.org/series/58597/ Noralf Trønnes (1): drm/fb-helper: Avoid race with DRM userspace drivers/gpu/drm/drm_auth.c | 20 drivers/gpu/drm/drm_fb_helper.c | 90 - drivers/gpu/drm/drm_internal.h | 2 + 3 files changed, 67 insertions(+), 45 deletions(-) -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✓ Fi.CI.IGT: success for drm/i915: Move GraphicsTechnology files under gt/
== Series Details == Series: drm/i915: Move GraphicsTechnology files under gt/ URL : https://patchwork.freedesktop.org/series/59900/ State : success == Summary == CI Bug Log - changes from CI_DRM_5991_full -> Patchwork_12864_full Summary --- **SUCCESS** No regressions found. Known issues Here are the changes found in Patchwork_12864_full that come from known issues: ### IGT changes ### Issues hit * igt@gem_mmap_gtt@forked-medium-copy: - shard-iclb: [PASS][1] -> [INCOMPLETE][2] ([fdo#107713]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-iclb6/igt@gem_mmap_...@forked-medium-copy.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-iclb1/igt@gem_mmap_...@forked-medium-copy.html * igt@gem_tiled_swapping@non-threaded: - shard-apl: [PASS][3] -> [DMESG-WARN][4] ([fdo#108686]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-apl3/igt@gem_tiled_swapp...@non-threaded.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-apl7/igt@gem_tiled_swapp...@non-threaded.html * igt@i915_pm_rc6_residency@rc6-accuracy: - shard-kbl: [PASS][5] -> [SKIP][6] ([fdo#109271]) [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-kbl6/igt@i915_pm_rc6_reside...@rc6-accuracy.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-kbl2/igt@i915_pm_rc6_reside...@rc6-accuracy.html * igt@kms_flip_tiling@flip-x-tiled: - shard-iclb: [PASS][7] -> [FAIL][8] ([fdo#108303]) [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-iclb8/igt@kms_flip_til...@flip-x-tiled.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-iclb8/igt@kms_flip_til...@flip-x-tiled.html * igt@kms_frontbuffer_tracking@fbc-suspend: - shard-kbl: [PASS][9] -> [INCOMPLETE][10] ([fdo#103665]) [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-kbl7/igt@kms_frontbuffer_track...@fbc-suspend.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-kbl5/igt@kms_frontbuffer_track...@fbc-suspend.html - shard-apl: [PASS][11] -> [DMESG-WARN][12] ([fdo#108566]) +1 similar issue [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-apl1/igt@kms_frontbuffer_track...@fbc-suspend.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-apl4/igt@kms_frontbuffer_track...@fbc-suspend.html * igt@kms_frontbuffer_tracking@fbcpsr-1p-offscren-pri-shrfb-draw-pwrite: - shard-iclb: [PASS][13] -> [FAIL][14] ([fdo#103167]) +3 similar issues [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-iclb3/igt@kms_frontbuffer_track...@fbcpsr-1p-offscren-pri-shrfb-draw-pwrite.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-iclb6/igt@kms_frontbuffer_track...@fbcpsr-1p-offscren-pri-shrfb-draw-pwrite.html * igt@kms_plane_alpha_blend@pipe-c-constant-alpha-min: - shard-skl: [PASS][15] -> [FAIL][16] ([fdo#108145]) [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-skl8/igt@kms_plane_alpha_bl...@pipe-c-constant-alpha-min.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-skl2/igt@kms_plane_alpha_bl...@pipe-c-constant-alpha-min.html * igt@kms_plane_scaling@pipe-c-scaler-with-rotation: - shard-glk: [PASS][17] -> [SKIP][18] ([fdo#109271] / [fdo#109278]) [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-glk9/igt@kms_plane_scal...@pipe-c-scaler-with-rotation.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-glk8/igt@kms_plane_scal...@pipe-c-scaler-with-rotation.html * igt@kms_psr@psr2_primary_mmap_cpu: - shard-iclb: [PASS][19] -> [SKIP][20] ([fdo#109441]) +4 similar issues [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-iclb2/igt@kms_psr@psr2_primary_mmap_cpu.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-iclb3/igt@kms_psr@psr2_primary_mmap_cpu.html * igt@kms_setmode@basic: - shard-apl: [PASS][21] -> [FAIL][22] ([fdo#99912]) [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-apl1/igt@kms_setm...@basic.html [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-apl5/igt@kms_setm...@basic.html Possible fixes * igt@gem_tiled_swapping@non-threaded: - shard-glk: [FAIL][23] ([fdo#108686]) -> [PASS][24] [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5991/shard-glk5/igt@gem_tiled_swapp...@non-threaded.html [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12864/shard-glk3/igt@gem_tiled_swapp...@non-threaded.html * igt@gem_workarounds@suspend-resume: - shard-apl: [DMESG-WARN][25] ([fdo#108566]) -> [PASS][26] +2 similar issues [25]: https://in
[Intel-gfx] ✓ Fi.CI.BAT: success for drm/fb-helper: Move modesetting code to drm_client (rev5)
== Series Details == Series: drm/fb-helper: Move modesetting code to drm_client (rev5) URL : https://patchwork.freedesktop.org/series/58597/ State : success == Summary == CI Bug Log - changes from CI_DRM_5996 -> Patchwork_12870 Summary --- **SUCCESS** No regressions found. External URL: https://patchwork.freedesktop.org/api/1.0/series/58597/revisions/5/mbox/ Known issues Here are the changes found in Patchwork_12870 that come from known issues: ### IGT changes ### Issues hit * igt@i915_selftest@live_contexts: - fi-skl-gvtdvm: [PASS][1] -> [DMESG-FAIL][2] ([fdo#110235 ]) [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5996/fi-skl-gvtdvm/igt@i915_selftest@live_contexts.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12870/fi-skl-gvtdvm/igt@i915_selftest@live_contexts.html * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a: - fi-byt-clapper: [PASS][3] -> [FAIL][4] ([fdo#103191]) [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5996/fi-byt-clapper/igt@kms_pipe_crc_ba...@suspend-read-crc-pipe-a.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12870/fi-byt-clapper/igt@kms_pipe_crc_ba...@suspend-read-crc-pipe-a.html [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191 [fdo#110235 ]: https://bugs.freedesktop.org/show_bug.cgi?id=110235 Participating hosts (40 -> 42) -- Additional (6): fi-bxt-dsi fi-bsw-n3050 fi-byt-j1900 fi-hsw-peppy fi-icl-y fi-bsw-kefka Missing(4): fi-kbl-soraka fi-ctg-p8600 fi-ilk-m540 fi-bdw-samus Build changes - * Linux: CI_DRM_5996 -> Patchwork_12870 CI_DRM_5996: 526cadba413ae78043a74d85f5bc9b6e0a430c38 @ git://anongit.freedesktop.org/gfx-ci/linux IGT_4963: 11e10bc575516c56978640fcc697c27f277c660a @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools Patchwork_12870: fa2af64690cbb9403e4fc947b3ac7b224859071e @ git://anongit.freedesktop.org/gfx-ci/linux == Linux commits == fa2af64690cb drm/fb-helper: Avoid race with DRM userspace == Logs == For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12870/ ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 1/2] drm/i915/icl: Factor out combo PHY lane power setup helper
On Thu, Apr 25, 2019 at 11:30:36AM +0300, Jani Nikula wrote: > On Wed, 24 Apr 2019, Imre Deak wrote: > > Factor out the combo PHY lane power configuration code to a separate > > helper; it will be also needed by the next patch adding the same > > configuration for DDI ports. > > > > While at it also add support to handle lane reversal which wasn't > > needed for DSI, but will be needed by DDI ports. > > > > Also, remove the macros for the power down flags, they aren't > > needed any more since we now calculate the power down mask. Many of > > those macro values (mostly the ones for the currently unused reversed > > lane configs) were actually undefined in the spec and didn't make much > > sense either. > > Only PWR_DOWN_LN_1 is undefined in the spec. Arg, yes I missed the DDI/DSI difference. Not sure how since it's quite close together in the spec and the existing 2 and 3 lane DSI cases in the code should've made it clear that there is a difference. > > > > > Cc: Jani Nikula > > Cc: Madhav Chauhan > > Cc: Ville Syrjala > > Signed-off-by: Imre Deak > > --- > > drivers/gpu/drm/i915/i915_drv.h| 3 +++ > > drivers/gpu/drm/i915/i915_reg.h| 9 - > > drivers/gpu/drm/i915/icl_dsi.c | 26 +++--- > > drivers/gpu/drm/i915/intel_combo_phy.c | 31 +++ > > 4 files changed, 37 insertions(+), 32 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h > > b/drivers/gpu/drm/i915/i915_drv.h > > index dc74d33c20aa..1207ef080aae 100644 > > --- a/drivers/gpu/drm/i915/i915_drv.h > > +++ b/drivers/gpu/drm/i915/i915_drv.h > > @@ -3515,6 +3515,9 @@ void icl_combo_phys_init(struct drm_i915_private > > *dev_priv); > > void icl_combo_phys_uninit(struct drm_i915_private *dev_priv); > > void cnl_combo_phys_init(struct drm_i915_private *dev_priv); > > void cnl_combo_phys_uninit(struct drm_i915_private *dev_priv); > > +void icl_combo_phy_power_up_lanes(struct drm_i915_private *dev_priv, > > + enum port port, int lane_count, > > + bool lane_reversal); > > > > int intel_gpu_freq(struct drm_i915_private *dev_priv, int val); > > int intel_freq_opcode(struct drm_i915_private *dev_priv, int val); > > diff --git a/drivers/gpu/drm/i915/i915_reg.h > > b/drivers/gpu/drm/i915/i915_reg.h > > index b74824f0b5b1..29f16bc40b0a 100644 > > --- a/drivers/gpu/drm/i915/i915_reg.h > > +++ b/drivers/gpu/drm/i915/i915_reg.h > > @@ -1807,15 +1807,6 @@ enum i915_power_well_id { > > #define PG_SEQ_DELAY_OVERRIDE_MASK(3 << 25) > > #define PG_SEQ_DELAY_OVERRIDE_SHIFT 25 > > #define PG_SEQ_DELAY_OVERRIDE_ENABLE (1 << 24) > > -#define PWR_UP_ALL_LANES (0x0 << 4) > > -#define PWR_DOWN_LN_3_2_1 (0xe << 4) > > -#define PWR_DOWN_LN_3_2 (0xc << 4) > > -#define PWR_DOWN_LN_3 (0x8 << 4) > > -#define PWR_DOWN_LN_2_1_0 (0x7 << 4) > > -#define PWR_DOWN_LN_1_0 (0x3 << 4) > > -#define PWR_DOWN_LN_1 (0x2 << 4) > > -#define PWR_DOWN_LN_3_1 (0xa << 4) > > -#define PWR_DOWN_LN_3_1_0 (0xb << 4) > > #define PWR_DOWN_LN_MASK (0xf << 4) > > #define PWR_DOWN_LN_SHIFT 4 > > > > diff --git a/drivers/gpu/drm/i915/icl_dsi.c b/drivers/gpu/drm/i915/icl_dsi.c > > index 9d962ea1e635..88959517b668 100644 > > --- a/drivers/gpu/drm/i915/icl_dsi.c > > +++ b/drivers/gpu/drm/i915/icl_dsi.c > > @@ -363,30 +363,10 @@ static void gen11_dsi_power_up_lanes(struct > > intel_encoder *encoder) > > struct drm_i915_private *dev_priv = to_i915(encoder->base.dev); > > struct intel_dsi *intel_dsi = enc_to_intel_dsi(&encoder->base); > > enum port port; > > - u32 tmp; > > - u32 lane_mask; > > > > - switch (intel_dsi->lane_count) { > > - case 1: > > - lane_mask = PWR_DOWN_LN_3_1_0; > > - break; > > - case 2: > > - lane_mask = PWR_DOWN_LN_3_1; > > - break; > > - case 3: > > - lane_mask = PWR_DOWN_LN_3; > > - break; > > - case 4: > > - default: > > - lane_mask = PWR_UP_ALL_LANES; > > - break; > > - } > > - > > - for_each_dsi_port(port, intel_dsi->ports) { > > - tmp = I915_READ(ICL_PORT_CL_DW10(port)); > > - tmp &= ~PWR_DOWN_LN_MASK; > > - I915_WRITE(ICL_PORT_CL_DW10(port), tmp | lane_mask); > > - } > > I look at that and it takes me maybe 10 seconds to figure out what's > going on, assuming the PWR_DOWN_LN_* macros are right. > > > + for_each_dsi_port(port, intel_dsi->ports) > > + icl_combo_phy_power_up_lanes(dev_priv, port, > > +intel_dsi->lane_count, false); > > } > > > > static void gen11_dsi_config_phy_lanes_sequence(struct intel_encoder > > *encoder) > > diff --git a/drivers/gpu/drm/i915/intel_combo_phy.c > > b/drivers/gpu/drm/i915/intel_combo_phy.c > > index 2bf4359d7e41..e3ed584eca47 1006
[Intel-gfx] [PATCH 42/45] drm/i915: Stop retiring along engine
We no longer track the execution order along the engine and so no longer need to enforce ordering of retire along the engine. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_request.c | 128 +++- 1 file changed, 52 insertions(+), 76 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index d69ea896ef87..89e4bd80e0bc 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -183,72 +183,23 @@ static void free_capture_list(struct i915_request *request) } } -static void __retire_engine_request(struct intel_engine_cs *engine, - struct i915_request *rq) -{ - GEM_TRACE("%s(%s) fence %llx:%lld, current %d\n", - __func__, engine->name, - rq->fence.context, rq->fence.seqno, - hwsp_seqno(rq)); - - GEM_BUG_ON(!i915_request_completed(rq)); - - local_irq_disable(); - - spin_lock(&engine->timeline.lock); - GEM_BUG_ON(!list_is_first(&rq->link, &engine->timeline.requests)); - list_del_init(&rq->link); - spin_unlock(&engine->timeline.lock); - - spin_lock(&rq->lock); - i915_request_mark_complete(rq); - if (!i915_request_signaled(rq)) - dma_fence_signal_locked(&rq->fence); - if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) - i915_request_cancel_breadcrumb(rq); - if (rq->waitboost) { - GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters)); - atomic_dec(&rq->i915->gt_pm.rps.num_waiters); - } - spin_unlock(&rq->lock); - - local_irq_enable(); -} - -static void __retire_engine_upto(struct intel_engine_cs *engine, -struct i915_request *rq) -{ - struct i915_request *tmp; - - if (list_empty(&rq->link)) - return; - - do { - tmp = list_first_entry(&engine->timeline.requests, - typeof(*tmp), link); - - GEM_BUG_ON(tmp->engine != engine); - __retire_engine_request(engine, tmp); - } while (tmp != rq); -} - -static void i915_request_retire(struct i915_request *request) +static bool i915_request_retire(struct i915_request *rq) { struct i915_active_request *active, *next; - GEM_TRACE("%s fence %llx:%lld, current %d\n", - request->engine->name, - request->fence.context, request->fence.seqno, - hwsp_seqno(request)); + lockdep_assert_held(&rq->i915->drm.struct_mutex); + if (!i915_request_completed(rq)) + return false; - lockdep_assert_held(&request->i915->drm.struct_mutex); - GEM_BUG_ON(!i915_sw_fence_signaled(&request->submit)); - GEM_BUG_ON(!i915_request_completed(request)); + GEM_TRACE("%s fence %llx:%lld, current %d\n", + rq->engine->name, + rq->fence.context, rq->fence.seqno, + hwsp_seqno(rq)); - trace_i915_request_retire(request); + GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit)); + trace_i915_request_retire(rq); - advance_ring(request); - free_capture_list(request); + advance_ring(rq); /* * Walk through the active list, calling retire on each. This allows @@ -260,7 +211,7 @@ static void i915_request_retire(struct i915_request *request) * pass along the auxiliary information (to avoid dereferencing * the node after the callback). */ - list_for_each_entry_safe(active, next, &request->active_list, link) { + list_for_each_entry_safe(active, next, &rq->active_list, link) { /* * In microbenchmarks or focusing upon time inside the kernel, * we may spend an inordinate amount of time simply handling @@ -276,18 +227,39 @@ static void i915_request_retire(struct i915_request *request) INIT_LIST_HEAD(&active->link); RCU_INIT_POINTER(active->request, NULL); - active->retire(active, request); + active->retire(active, rq); + } + + local_irq_disable(); + + spin_lock(&rq->engine->timeline.lock); + list_del(&rq->link); + spin_unlock(&rq->engine->timeline.lock); + + spin_lock(&rq->lock); + i915_request_mark_complete(rq); + if (!i915_request_signaled(rq)) + dma_fence_signal_locked(&rq->fence); + if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) + i915_request_cancel_breadcrumb(rq); + if (rq->waitboost) { + GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters)); + atomic_dec(&rq->i915->gt_pm.rps.num_waiters); } + spin_unlock(&rq->lock); + + local_irq_enable(); - i915_request_remove_from_client(request);
[Intel-gfx] [PATCH 39/45] drm/i915: Move object close under its own lock
Use i915_gem_object_lock() to guard the LUT and active reference to allow us to break free of struct_mutex for handling GEM_CLOSE. Testcase: igt/gem_close_race Testcase: igt/gem_exec_parallel Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 76 ++- .../gpu/drm/i915/gem/i915_gem_context_types.h | 12 +-- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 23 -- drivers/gpu/drm/i915/gem/i915_gem_object.c| 38 ++ .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 - .../gpu/drm/i915/gem/selftests/mock_context.c | 1 - drivers/gpu/drm/i915/i915_drv.h | 4 +- drivers/gpu/drm/i915/i915_gem.c | 1 + drivers/gpu/drm/i915/i915_gem_gtt.c | 1 + drivers/gpu/drm/i915/i915_timeline.c | 13 ++-- drivers/gpu/drm/i915/i915_vma.c | 42 ++ drivers/gpu/drm/i915/i915_vma.h | 17 ++--- .../gpu/drm/i915/selftests/mock_gem_device.c | 1 + 13 files changed, 132 insertions(+), 98 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index b61c63618b87..d4f13278d5b6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -95,24 +95,42 @@ void i915_lut_handle_free(struct i915_lut_handle *lut) static void lut_close(struct i915_gem_context *ctx) { - struct i915_lut_handle *lut, *ln; struct radix_tree_iter iter; void __rcu **slot; - list_for_each_entry_safe(lut, ln, &ctx->handles_list, ctx_link) { - list_del(&lut->obj_link); - i915_lut_handle_free(lut); - } - INIT_LIST_HEAD(&ctx->handles_list); + lockdep_assert_held(&ctx->mutex); rcu_read_lock(); radix_tree_for_each_slot(slot, &ctx->handles_vma, &iter, 0) { struct i915_vma *vma = rcu_dereference_raw(*slot); - - radix_tree_iter_delete(&ctx->handles_vma, &iter, slot); - - vma->open_count--; - i915_vma_put(vma); + struct drm_i915_gem_object *obj = vma->obj; + struct i915_lut_handle *lut; + bool found = false; + + rcu_read_unlock(); + i915_gem_object_lock(obj); + list_for_each_entry(lut, &obj->lut_list, obj_link) { + if (lut->ctx != ctx) + continue; + + if (lut->handle != iter.index) + continue; + + list_del(&lut->obj_link); + i915_lut_handle_free(lut); + found = true; + break; + } + i915_gem_object_unlock(obj); + rcu_read_lock(); + + if (found) { + radix_tree_iter_delete(&ctx->handles_vma, &iter, slot); + if (atomic_dec_and_test(&vma->open_count) && + !i915_vma_is_ggtt(vma)) + i915_vma_close(vma); + i915_gem_object_put(obj); + } } rcu_read_unlock(); } @@ -238,15 +256,9 @@ static void free_engines(struct i915_gem_engines *e) __free_engines(e, e->num_engines); } -static void free_engines_rcu(struct work_struct *wrk) +static void free_engines_rcu(struct rcu_head *rcu) { - struct i915_gem_engines *e = - container_of(wrk, struct i915_gem_engines, rcu.work); - struct drm_i915_private *i915 = e->i915; - - mutex_lock(&i915->drm.struct_mutex); - free_engines(e); - mutex_unlock(&i915->drm.struct_mutex); + free_engines(container_of(rcu, struct i915_gem_engines, rcu)); } static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx) @@ -259,7 +271,7 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx) if (!e) return ERR_PTR(-ENOMEM); - e->i915 = ctx->i915; + init_rcu_head(&e->rcu); for_each_engine(engine, ctx->i915, id) { struct intel_context *ce; @@ -347,7 +359,10 @@ void i915_gem_context_release(struct kref *ref) static void context_close(struct i915_gem_context *ctx) { + mutex_lock(&ctx->mutex); + i915_gem_context_set_closed(ctx); + ctx->file_priv = ERR_PTR(-EBADF); /* * This context will never again be assinged to HW, so we can @@ -362,7 +377,7 @@ static void context_close(struct i915_gem_context *ctx) */ lut_close(ctx); - ctx->file_priv = ERR_PTR(-EBADF); + mutex_unlock(&ctx->mutex); i915_gem_context_put(ctx); } @@ -417,7 +432,6 @@ __create_context(struct drm_i915_private *dev_priv) RCU_INIT_POINTER(ctx->engines, e); INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL); - INIT_LIST_HEAD(&ctx->handles_list); INIT_LIST_HEAD(
[Intel-gfx] [PATCH 38/45] drm/i915: Drop the deferred active reference
An old optimisation to reduce the number of atomics per batch sadly relies on struct_mutex for coordination. In order to remove struct_mutex from serialising object/context closing, always taking and releasing an active reference on first use / last use greatly simplifies the locking. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_object.c| 13 +- drivers/gpu/drm/i915/gem/i915_gem_object.h| 24 +-- .../gpu/drm/i915/gem/i915_gem_object_types.h | 8 --- .../gpu/drm/i915/gem/selftests/huge_pages.c | 3 +-- .../i915/gem/selftests/i915_gem_coherency.c | 4 ++-- .../drm/i915/gem/selftests/i915_gem_context.c | 11 + .../drm/i915/gem/selftests/i915_gem_mman.c| 2 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 +- drivers/gpu/drm/i915/gt/intel_ringbuffer.c| 3 +-- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 9 +-- .../gpu/drm/i915/gt/selftest_workarounds.c| 3 --- drivers/gpu/drm/i915/gvt/scheduler.c | 2 +- drivers/gpu/drm/i915/i915_gem_batch_pool.c| 2 +- drivers/gpu/drm/i915/i915_gem_render_state.c | 2 +- drivers/gpu/drm/i915/i915_vma.c | 15 +--- drivers/gpu/drm/i915/selftests/i915_request.c | 8 --- drivers/gpu/drm/i915/selftests/igt_spinner.c | 9 +-- 18 files changed, 26 insertions(+), 96 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 378040732797..b61c63618b87 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -112,7 +112,7 @@ static void lut_close(struct i915_gem_context *ctx) radix_tree_iter_delete(&ctx->handles_vma, &iter, slot); vma->open_count--; - __i915_gem_object_release_unless_active(vma->obj); + i915_vma_put(vma); } rcu_read_unlock(); } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index df79e2eead62..c178cf7614c6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -156,7 +156,7 @@ void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file) list_del(&lut->ctx_link); i915_lut_handle_free(lut); - __i915_gem_object_release_unless_active(obj); + i915_gem_object_put(obj); } mutex_unlock(&i915->drm.struct_mutex); @@ -348,17 +348,6 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj) call_rcu(&obj->rcu, __i915_gem_free_object_rcu); } -void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj) -{ - lockdep_assert_held(&obj->base.dev->struct_mutex); - - if (!i915_gem_object_has_active_reference(obj) && - i915_gem_object_is_active(obj)) - i915_gem_object_set_active_reference(obj); - else - i915_gem_object_put(obj); -} - static inline enum fb_op_origin fb_write_origin(struct drm_i915_gem_object *obj, unsigned int domain) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 23bca003fbfb..dcb765cd97a3 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -161,31 +161,9 @@ i915_gem_object_needs_async_cancel(const struct drm_i915_gem_object *obj) static inline bool i915_gem_object_is_active(const struct drm_i915_gem_object *obj) { - return obj->active_count; + return READ_ONCE(obj->active_count); } -static inline bool -i915_gem_object_has_active_reference(const struct drm_i915_gem_object *obj) -{ - return test_bit(I915_BO_ACTIVE_REF, &obj->flags); -} - -static inline void -i915_gem_object_set_active_reference(struct drm_i915_gem_object *obj) -{ - lockdep_assert_held(&obj->base.dev->struct_mutex); - __set_bit(I915_BO_ACTIVE_REF, &obj->flags); -} - -static inline void -i915_gem_object_clear_active_reference(struct drm_i915_gem_object *obj) -{ - lockdep_assert_held(&obj->base.dev->struct_mutex); - __clear_bit(I915_BO_ACTIVE_REF, &obj->flags); -} - -void __i915_gem_object_release_unless_active(struct drm_i915_gem_object *obj); - static inline bool i915_gem_object_is_framebuffer(const struct drm_i915_gem_object *obj) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 24abdd6999b5..133581339d81 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -120,14 +120,6 @@ struct drm_i915_gem_object { struct list_head batch_pool_link; I915_SELFTEST_DECLARE(struct list_head st_link); - unsigned long flags; - - /** -* Have we taken a reference for the object for incomplete GPU
[Intel-gfx] [PATCH 36/45] drm/i915: Move GEM object busy checking to its own file
Continuing the decluttering of i915_gem.c by moving the object busy checking into its own file. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/Makefile| 1 + drivers/gpu/drm/i915/gem/i915_gem_busy.c | 138 +++ drivers/gpu/drm/i915/i915_gem.c | 128 - 3 files changed, 139 insertions(+), 128 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_busy.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index c65f4829eb83..655616b77abc 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -88,6 +88,7 @@ i915-y += $(gt-y) # GEM (Graphics Execution Management) code obj-y += gem/ gem-y += \ + gem/i915_gem_busy.o \ gem/i915_gem_clflush.o \ gem/i915_gem_context.o \ gem/i915_gem_dmabuf.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_busy.c b/drivers/gpu/drm/i915/gem/i915_gem_busy.c new file mode 100644 index ..5a5eda3003e9 --- /dev/null +++ b/drivers/gpu/drm/i915/gem/i915_gem_busy.c @@ -0,0 +1,138 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2014-2016 Intel Corporation + */ + +#include "gt/intel_engine.h" + +#include "i915_gem_ioctls.h" +#include "i915_gem_object.h" + +static __always_inline u32 __busy_read_flag(u8 id) +{ + if (id == (u8)I915_ENGINE_CLASS_INVALID) + return 0xu; + + GEM_BUG_ON(id >= 16); + return 0x1u << id; +} + +static __always_inline u32 __busy_write_id(u8 id) +{ + /* +* The uABI guarantees an active writer is also amongst the read +* engines. This would be true if we accessed the activity tracking +* under the lock, but as we perform the lookup of the object and +* its activity locklessly we can not guarantee that the last_write +* being active implies that we have set the same engine flag from +* last_read - hence we always set both read and write busy for +* last_write. +*/ + if (id == (u8)I915_ENGINE_CLASS_INVALID) + return 0xu; + + return (id + 1) | __busy_read_flag(id); +} + +static __always_inline unsigned int +__busy_set_if_active(const struct dma_fence *fence, u32 (*flag)(u8 id)) +{ + const struct i915_request *rq; + + /* +* We have to check the current hw status of the fence as the uABI +* guarantees forward progress. We could rely on the idle worker +* to eventually flush us, but to minimise latency just ask the +* hardware. +* +* Note we only report on the status of native fences. +*/ + if (!dma_fence_is_i915(fence)) + return 0; + + /* opencode to_request() in order to avoid const warnings */ + rq = container_of(fence, const struct i915_request, fence); + if (i915_request_completed(rq)) + return 0; + + /* Beware type-expansion follies! */ + BUILD_BUG_ON(!typecheck(u8, rq->engine->uabi_class)); + return flag(rq->engine->uabi_class); +} + +static __always_inline unsigned int +busy_check_reader(const struct dma_fence *fence) +{ + return __busy_set_if_active(fence, __busy_read_flag); +} + +static __always_inline unsigned int +busy_check_writer(const struct dma_fence *fence) +{ + if (!fence) + return 0; + + return __busy_set_if_active(fence, __busy_write_id); +} + +int +i915_gem_busy_ioctl(struct drm_device *dev, void *data, + struct drm_file *file) +{ + struct drm_i915_gem_busy *args = data; + struct drm_i915_gem_object *obj; + struct reservation_object_list *list; + unsigned int seq; + int err; + + err = -ENOENT; + rcu_read_lock(); + obj = i915_gem_object_lookup_rcu(file, args->handle); + if (!obj) + goto out; + + /* +* A discrepancy here is that we do not report the status of +* non-i915 fences, i.e. even though we may report the object as idle, +* a call to set-domain may still stall waiting for foreign rendering. +* This also means that wait-ioctl may report an object as busy, +* where busy-ioctl considers it idle. +* +* We trade the ability to warn of foreign fences to report on which +* i915 engines are active for the object. +* +* Alternatively, we can trade that extra information on read/write +* activity with +* args->busy = +* !reservation_object_test_signaled_rcu(obj->resv, true); +* to report the overall busyness. This is what the wait-ioctl does. +* +*/ +retry: + seq = raw_read_seqcount(&obj->resv->seq); + + /* Translate the exclusive fence to the READ *and* WRITE engine */ + args->busy = busy_check_writer(rcu_dereference(obj->resv->fence_excl)); + + /* Translate shared fences to READ set of engines */ +
[Intel-gfx] [PATCH 43/45] drm/i915: Replace engine->timeline with a plain list
To continue the onslaught of removing the assumption of a global execution ordering, another casualty is the engine->timeline. Without an actual timeline to track, it is overkill and we can replace it with a much less grand plain list. We still need a list of requests inflight, for the simple purpose of finding inflight requests (for retiring, resetting, preemption etc). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine.h| 6 ++ drivers/gpu/drm/i915/gt/intel_engine_cs.c | 62 +++--- drivers/gpu/drm/i915/gt/intel_engine_types.h | 6 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 84 +-- drivers/gpu/drm/i915/gt/intel_reset.c | 10 +-- drivers/gpu/drm/i915/gt/intel_ringbuffer.c| 15 ++-- drivers/gpu/drm/i915/gt/mock_engine.c | 18 ++-- drivers/gpu/drm/i915/gt/selftest_lrc.c| 4 +- drivers/gpu/drm/i915/i915_gpu_error.c | 5 +- drivers/gpu/drm/i915/i915_request.c | 43 +++--- drivers/gpu/drm/i915/i915_request.h | 2 +- drivers/gpu/drm/i915/i915_scheduler.c | 37 drivers/gpu/drm/i915/i915_timeline.c | 1 - drivers/gpu/drm/i915/i915_timeline.h | 19 - drivers/gpu/drm/i915/i915_timeline_types.h| 4 - drivers/gpu/drm/i915/intel_guc_submission.c | 16 ++-- .../gpu/drm/i915/selftests/mock_timeline.c| 1 - 17 files changed, 145 insertions(+), 188 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 89463e18cdcc..1654af266af5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -577,4 +577,10 @@ intel_engine_get_hangcheck_seqno(struct intel_engine_cs *engine) return intel_read_status_page(engine, I915_GEM_HWS_HANGCHECK); } +void intel_engine_init_active(struct intel_engine_cs *engine, + unsigned int subclass); +#define ENGINE_PHYSICAL0 +#define ENGINE_MOCK1 +#define ENGINE_VIRTUAL 2 + #endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 62914e796248..12110e62b3fb 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -548,14 +548,7 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine) if (err) return err; - err = i915_timeline_init(engine->i915, -&engine->timeline, -engine->status_page.vma); - if (err) - goto err_hwsp; - - i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE); - + intel_engine_init_active(engine, ENGINE_PHYSICAL); intel_engine_init_breadcrumbs(engine); intel_engine_init_execlists(engine); intel_engine_init_hangcheck(engine); @@ -568,10 +561,6 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine) intel_sseu_from_device_info(&RUNTIME_INFO(engine->i915)->sseu); return 0; - -err_hwsp: - cleanup_status_page(engine); - return err; } /** @@ -724,6 +713,27 @@ static int pin_context(struct i915_gem_context *ctx, return 0; } +void +intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass) +{ + INIT_LIST_HEAD(&engine->active.requests); + + spin_lock_init(&engine->active.lock); + lockdep_set_subclass(&engine->active.lock, subclass); + + /* +* Due to an interesting quirk in lockdep's internal debug tracking, +* after setting a subclass we must ensure the lock is used. Otherwise, +* nr_unused_locks is incremented once too often. +*/ +#ifdef CONFIG_DEBUG_LOCK_ALLOC + local_irq_disable(); + lock_map_acquire(&engine->active.lock.dep_map); + lock_map_release(&engine->active.lock.dep_map); + local_irq_enable(); +#endif +} + /** * intel_engines_init_common - initialize cengine state which might require hw access * @engine: Engine to initialize. @@ -787,6 +797,8 @@ int intel_engine_init_common(struct intel_engine_cs *engine) */ void intel_engine_cleanup_common(struct intel_engine_cs *engine) { + GEM_BUG_ON(!list_empty(&engine->active.requests)); + cleanup_status_page(engine); intel_engine_fini_breadcrumbs(engine); @@ -801,8 +813,6 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine) intel_context_unpin(engine->kernel_context); GEM_BUG_ON(!llist_empty(&engine->barrier_tasks)); - i915_timeline_fini(&engine->timeline); - intel_wa_list_free(&engine->ctx_wa_list); intel_wa_list_free(&engine->wa_list); intel_wa_list_free(&engine->whitelist); @@ -1402,16 +1412,6 @@ void intel_engine_dump(struct intel_engine_cs *engine, drm_printf(m, "\tRequests:\n"); - rq = list_first_entry(&engine->timeline.request
[Intel-gfx] [PATCH 45/45] drm/i915/execlists: Minimalistic timeslicing
If we have multiple contexts of equal priority pending execution, activate a timer to demote the currently executing context in favour of the next in the queue when that timeslice expires. This enforces fairness between contexts (so long as they allow preemption -- forced preemption, in the future, will kick those who do not obey) and allows us to avoid userspace blocking forward progress with e.g. unbounded MI_SEMAPHORE_WAIT. For the starting point here, we use the jiffie as our timeslice so that we should be reasonably efficient wrt frequent CPU wakeups. Testcase: igt/gem_exec_scheduler/semaphore-resolve Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine_types.h | 6 + drivers/gpu/drm/i915/gt/intel_lrc.c | 111 +++ drivers/gpu/drm/i915/gt/selftest_lrc.c | 3 + drivers/gpu/drm/i915/i915_scheduler.c| 1 + drivers/gpu/drm/i915/i915_scheduler_types.h | 1 + 5 files changed, 122 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 30292e17b827..ca28ffbcf43c 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -12,6 +12,7 @@ #include #include #include +#include #include #include "i915_gem.h" @@ -137,6 +138,11 @@ struct intel_engine_execlists { */ struct tasklet_struct tasklet; + /** +* @timer: kick the current context if its timeslice expires +*/ + struct timer_list timer; + /** * @default_priolist: priority list for I915_PRIORITY_NORMAL */ diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index f3b36a7296e1..e140c2a301a0 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -835,6 +835,82 @@ last_active(const struct intel_engine_execlists *execlists) return *last; } +static void +defer_request(struct i915_request * const rq, struct list_head * const pl) +{ + struct i915_dependency *p; + + /* +* We want to move the interrupted request to the back of +* the round-robin list (i.e. its priority level), but +* in doing so, we must then move all requests that were in +* flight and were waiting for the interrupted request to +* be run after it again. +*/ + list_move_tail(&rq->sched.link, pl); + + list_for_each_entry(p, &rq->sched.waiters_list, wait_link) { + struct i915_request *w = + container_of(p->waiter, typeof(*w), sched); + + if (!i915_sw_fence_done(&w->submit)) + continue; + + /* Leave semaphores spinning on the other engines */ + if (w->engine != rq->engine) + continue; + + /* No waiter should start before the active request completed */ + GEM_BUG_ON(i915_request_started(w)); + + GEM_BUG_ON(rq_prio(w) > rq_prio(rq)); + if (rq_prio(w) < rq_prio(rq)) + continue; + + /* +* This should be very shallow as it is limited by the +* number of requests that can fit in a ring (<64) and +* the number of contexts that can be in flight on this +* engine. +*/ + defer_request(w, pl); + } +} + +static void defer_active(struct intel_engine_cs *engine) +{ + struct i915_request *rq; + + rq = __unwind_incomplete_requests(engine); + if (!rq) + return; + + defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq))); +} + +static bool +need_timeslice(struct intel_engine_cs *engine, const struct i915_request *rq) +{ + int hint; + + if (list_is_last(&rq->sched.link, &engine->active.requests)) + return false; + + hint = max(rq_prio(list_next_entry(rq, sched.link)), + queue_prio(&engine->execlists)); + hint |= __NO_PREEMPTION; + + return hint >= rq_prio(rq); +} + +static bool +enable_timeslice(struct intel_engine_cs *engine) +{ + struct i915_request *last = last_active(&engine->execlists); + + return last && need_timeslice(engine, last); +} + static bool execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; @@ -928,6 +1004,27 @@ static bool execlists_dequeue(struct intel_engine_cs *engine) */ last->hw_context->lrc_desc |= CTX_DESC_FORCE_RESTORE; last = NULL; + } else if (need_timeslice(engine, last) && + !timer_pending(&engine->execlists.timer)) { + GEM_TRACE("%s: expired last=%llx:%lld, prio=%d, hint=%d\n", + engine->name, +
[Intel-gfx] [PATCH 44/45] drm/i915/execlists: Preempt-to-busy
When using a global seqno, we required a precise stop-the-workd event to handle preemption and unwind the global seqno counter. To accomplish this, we would preempt to a special out-of-band context and wait for the machine to report that it was idle. Given an idle machine, we could very precisely see which requests had completed and which we needed to feed back into the run queue. However, now that we have scrapped the global seqno, we no longer need to precisely unwind the global counter and only track requests by their per-context seqno. This allows us to loosely unwind inflight requests while scheduling a preemption, with the enormous caveat that the requests we put back on the run queue are still _inflight_ (until the preemption request is complete). This makes request tracking much more messy, as at any point then we can see a completed request that we believe is not currently scheduled for execution. We also have to be careful not to rewind RING_TAIL past RING_HEAD on preempting to the running context, and for this we use a semaphore to prevent completion of the request before continuing. To accomplish this feat, we change how we track requests scheduled to the HW. Instead of appending our requests onto a single list as we submit, we track each submission to ELSP as its own block. Then upon receiving the CS preemption event, we promote the pending block to the inflight block (discarding what was previously being tracked). As normal CS completion events arrive, we then remove stale entries from the inflight tracker. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/gt/intel_context_types.h | 5 + drivers/gpu/drm/i915/gt/intel_engine.h| 61 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 61 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 52 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 682 -- drivers/gpu/drm/i915/i915_gpu_error.c | 19 +- drivers/gpu/drm/i915/i915_request.c | 6 + drivers/gpu/drm/i915/i915_request.h | 1 + drivers/gpu/drm/i915/i915_scheduler.c | 13 +- drivers/gpu/drm/i915/i915_utils.h | 13 + drivers/gpu/drm/i915/intel_guc_submission.c | 174 ++--- drivers/gpu/drm/i915/selftests/i915_request.c | 8 +- 13 files changed, 480 insertions(+), 617 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 3d0a7af096f6..cf333007a867 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -630,7 +630,7 @@ static void init_contexts(struct drm_i915_private *i915) static bool needs_preempt_context(struct drm_i915_private *i915) { - return HAS_EXECLISTS(i915); + return USES_GUC_SUBMISSION(i915); } int i915_gem_contexts_init(struct drm_i915_private *dev_priv) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 92401d25764b..05e984864098 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -13,6 +13,7 @@ #include #include "i915_active_types.h" +#include "i915_utils.h" #include "intel_sseu.h" struct i915_gem_context; @@ -37,6 +38,10 @@ struct intel_context { struct i915_gem_context *gem_context; struct intel_engine_cs *engine; struct intel_engine_cs *inflight; +#define intel_context_inflight(ce) ptr_mask_bits((ce)->inflight, 2) +#define intel_context_inflight_count(ce) ptr_unmask_bits((ce)->inflight, 2) +#define intel_context_inflight_inc(ce) ptr_count_inc(&(ce)->inflight) +#define intel_context_inflight_dec(ce) ptr_count_dec(&(ce)->inflight) struct list_head signal_link; struct list_head signals; diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 1654af266af5..5d4e9480be04 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -124,71 +124,26 @@ static inline bool __execlists_need_preempt(int prio, int last) return prio > max(I915_PRIORITY_NORMAL - 1, last); } -static inline void -execlists_set_active(struct intel_engine_execlists *execlists, -unsigned int bit) -{ - __set_bit(bit, (unsigned long *)&execlists->active); -} - -static inline bool -execlists_set_active_once(struct intel_engine_execlists *execlists, - unsigned int bit) -{ - return !__test_and_set_bit(bit, (unsigned long *)&execlists->active); -} - -static inline void -execlists_clear_active(struct intel_engine_execlists *execlists, - unsigned int bit) -{ - __clear_bit(bit, (unsigned long *)&execlists->active); -} - -static inline void -execlists_clear_all_active(struct intel_engine_execlists *execlists) +static inline unsigned int +execlists_num_ports(const struct intel_engine_exec
[Intel-gfx] [PATCH 33/45] lockdep: Swap storage for pin_count and refereneces
As a lockmap takes a reference for every ww_mutex used together, this can be an arbitrarily large number and under control of userspace -- easily overflowing the limit of 4096. Signed-off-by: Chris Wilson --- include/linux/lockdep.h | 4 ++-- kernel/locking/lockdep.c | 15 +-- 2 files changed, 11 insertions(+), 8 deletions(-) diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 79c3873d58ac..2dc4a61d7355 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -262,8 +262,8 @@ struct held_lock { unsigned int read:2;/* see lock_acquire() comment */ unsigned int check:1; /* see lock_acquire() comment */ unsigned int hardirqs_off:1; - unsigned int references:12; /* 32 bits */ - unsigned int pin_count; + unsigned int pin_count:12; /* 32 bits */ + unsigned int references; }; /* diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index e221be724fe8..e83bfc520a28 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -3613,9 +3613,9 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass, if (hlock->class_idx == class_idx && nest_lock) { if (hlock->references) { /* -* Check: unsigned int references:12, overflow. +* Check: unsigned int references overflow. */ - if (DEBUG_LOCKS_WARN_ON(hlock->references == (1 << 12)-1)) + if (DEBUG_LOCKS_WARN_ON(hlock->references == UINT_MAX)) return 0; hlock->references++; @@ -4053,11 +4053,14 @@ static struct pin_cookie __lock_pin_lock(struct lockdep_map *lock) if (match_held_lock(hlock, lock)) { /* -* Grab 16bits of randomness; this is sufficient to not -* be guessable and still allows some pin nesting in -* our u32 pin_count. +* Grab 6bits of randomness; this is barely sufficient +* to not be guessable and still allows some 32 levels +* of pin nesting in our u12 pin_count. */ - cookie.val = 1 + (prandom_u32() >> 16); + cookie.val = 1 + (prandom_u32() >> 26); + if (DEBUG_LOCKS_WARN_ON(hlock->pin_count + cookie.val >= 1 << 12)) + return NIL_COOKIE; + hlock->pin_count += cookie.val; return cookie; } -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 37/45] drm/i915: Move GEM client throttling to its own file
Continuing the decluttering of i915_gem.c by moving the client self throttling into its own file. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/Makefile| 1 + drivers/gpu/drm/i915/gem/i915_gem_throttle.c | 74 drivers/gpu/drm/i915/i915_drv.h | 6 -- drivers/gpu/drm/i915/i915_gem.c | 58 --- 4 files changed, 75 insertions(+), 64 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_throttle.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 655616b77abc..1ae067daafa3 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -104,6 +104,7 @@ gem-y += \ gem/i915_gem_shmem.o \ gem/i915_gem_shrinker.o \ gem/i915_gem_stolen.o \ + gem/i915_gem_throttle.o \ gem/i915_gem_tiling.o \ gem/i915_gem_userptr.o \ gem/i915_gem_wait.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_throttle.c b/drivers/gpu/drm/i915/gem/i915_gem_throttle.c new file mode 100644 index ..491bc28c175d --- /dev/null +++ b/drivers/gpu/drm/i915/gem/i915_gem_throttle.c @@ -0,0 +1,74 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2014-2016 Intel Corporation + */ + +#include + +#include + +#include "i915_gem_ioctls.h" +#include "i915_gem_object.h" + +#include "../i915_drv.h" + +/* + * 20ms is a fairly arbitrary limit (greater than the average frame time) + * chosen to prevent the CPU getting more than a frame ahead of the GPU + * (when using lax throttling for the frontbuffer). We also use it to + * offer free GPU waitboosts for severely congested workloads. + */ +#define DRM_I915_THROTTLE_JIFFIES msecs_to_jiffies(20) + +/* + * Throttle our rendering by waiting until the ring has completed our requests + * emitted over 20 msec ago. + * + * Note that if we were to use the current jiffies each time around the loop, + * we wouldn't escape the function with any frames outstanding if the time to + * render a frame was over 20ms. + * + * This should get us reasonable parallelism between CPU and GPU but also + * relatively low latency when blocking on a particular request to finish. + */ +int +i915_gem_throttle_ioctl(struct drm_device *dev, void *data, + struct drm_file *file) +{ + struct drm_i915_file_private *file_priv = file->driver_priv; + unsigned long recent_enough = jiffies - DRM_I915_THROTTLE_JIFFIES; + struct i915_request *request, *target = NULL; + long ret; + + /* ABI: return -EIO if already wedged */ + ret = i915_terminally_wedged(to_i915(dev)); + if (ret) + return ret; + + spin_lock(&file_priv->mm.lock); + list_for_each_entry(request, &file_priv->mm.request_list, client_link) { + if (time_after_eq(request->emitted_jiffies, recent_enough)) + break; + + if (target) { + list_del(&target->client_link); + target->file_priv = NULL; + } + + target = request; + } + if (target) + i915_request_get(target); + spin_unlock(&file_priv->mm.lock); + + if (!target) + return 0; + + ret = i915_request_wait(target, + I915_WAIT_INTERRUPTIBLE, + MAX_SCHEDULE_TIMEOUT); + i915_request_put(target); + + return ret < 0 ? ret : 0; +} diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0d1ce4eb6699..dcfc8a54d63c 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -211,12 +211,6 @@ struct drm_i915_file_private { struct { spinlock_t lock; struct list_head request_list; -/* 20ms is a fairly arbitrary limit (greater than the average frame time) - * chosen to prevent the CPU getting more than a frame ahead of the GPU - * (when using lax throttling for the frontbuffer). We also use it to - * offer free GPU waitboosts for severely congested workloads. - */ -#define DRM_I915_THROTTLE_JIFFIES msecs_to_jiffies(20) } mm; struct idr context_idr; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 2f1e6dd78dc1..2a9e8ecf2926 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -988,57 +988,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915, return 0; } -/* Throttle our rendering by waiting until the ring has completed our requests - * emitted over 20 msec ago. - * - * Note that if we were to use the current jiffies each time around the loop, - * we wouldn't escape the function with any frames outstanding if the time to - * render a frame was over 20ms. - * - * This should get us reasonable parallelism between CPU and GPU but also - * relatively low latency when blocking on a particular request to finish. - *
[Intel-gfx] [PATCH 35/45] drm/i915: Move GEM object waiting to its own file
Continuing the decluttering of i915_gem.c by moving the object wait decomposition into its own file. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_object.h | 8 + drivers/gpu/drm/i915/gem/i915_gem_wait.c | 276 + drivers/gpu/drm/i915/i915_drv.h| 17 -- drivers/gpu/drm/i915/i915_gem.c| 254 --- 5 files changed, 285 insertions(+), 271 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_wait.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index e3069812..c65f4829eb83 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -105,6 +105,7 @@ gem-y += \ gem/i915_gem_stolen.o \ gem/i915_gem_tiling.o \ gem/i915_gem_userptr.o \ + gem/i915_gem_wait.o \ gem/i915_gemfs.o i915-y += \ $(gem-y) \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 509d145d808a..23bca003fbfb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -436,4 +436,12 @@ static inline void __start_cpu_write(struct drm_i915_gem_object *obj) obj->cache_dirty = true; } +int i915_gem_object_wait(struct drm_i915_gem_object *obj, +unsigned int flags, +long timeout); +int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj, + unsigned int flags, + const struct i915_sched_attr *attr); +#define I915_PRIORITY_DISPLAY I915_USER_PRIORITY(I915_PRIORITY_MAX) + #endif diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c new file mode 100644 index ..c1d966af879f --- /dev/null +++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c @@ -0,0 +1,276 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2016 Intel Corporation + */ + +#include + +#include "gt/intel_engine.h" + +#include "i915_gem_ioctls.h" +#include "i915_gem_object.h" + +static long +i915_gem_object_wait_fence(struct dma_fence *fence, + unsigned int flags, + long timeout) +{ + BUILD_BUG_ON(I915_WAIT_INTERRUPTIBLE != 0x1); + + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) + return timeout; + + if (dma_fence_is_i915(fence)) + return i915_request_wait(to_request(fence), flags, timeout); + + return dma_fence_wait_timeout(fence, + flags & I915_WAIT_INTERRUPTIBLE, + timeout); +} + +static long +i915_gem_object_wait_reservation(struct reservation_object *resv, +unsigned int flags, +long timeout) +{ + unsigned int seq = __read_seqcount_begin(&resv->seq); + struct dma_fence *excl; + bool prune_fences = false; + + if (flags & I915_WAIT_ALL) { + struct dma_fence **shared; + unsigned int count, i; + int ret; + + ret = reservation_object_get_fences_rcu(resv, + &excl, &count, &shared); + if (ret) + return ret; + + for (i = 0; i < count; i++) { + timeout = i915_gem_object_wait_fence(shared[i], +flags, timeout); + if (timeout < 0) + break; + + dma_fence_put(shared[i]); + } + + for (; i < count; i++) + dma_fence_put(shared[i]); + kfree(shared); + + /* +* If both shared fences and an exclusive fence exist, +* then by construction the shared fences must be later +* than the exclusive fence. If we successfully wait for +* all the shared fences, we know that the exclusive fence +* must all be signaled. If all the shared fences are +* signaled, we can prune the array and recover the +* floating references on the fences/requests. +*/ + prune_fences = count && timeout >= 0; + } else { + excl = reservation_object_get_excl_rcu(resv); + } + + if (excl && timeout >= 0) + timeout = i915_gem_object_wait_fence(excl, flags, timeout); + + dma_fence_put(excl); + + /* +* Opportunistically prune the fences iff we know they have *all* been +* signaled and that the reservation object has not been changed (i.e. +* no new fences have been added). +*/ + if (prune_fences && !__read_seqcount_ret
[Intel-gfx] [PATCH 31/45] drm/i915: Move more GEM objects under gem/
Continuing the theme of separating out the GEM clutter. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/Makefile | 26 +-- drivers/gpu/drm/i915/Makefile.header-test | 2 - .../gpu/drm/i915/{ => gem}/i915_gem_clflush.c | 27 +++ drivers/gpu/drm/i915/gem/i915_gem_clflush.h | 20 + .../gpu/drm/i915/{ => gem}/i915_gem_context.c | 27 ++- .../gpu/drm/i915/{ => gem}/i915_gem_context.h | 22 + .../i915/{ => gem}/i915_gem_context_types.h | 0 .../gpu/drm/i915/{ => gem}/i915_gem_dmabuf.c | 28 +++- drivers/gpu/drm/i915/gem/i915_gem_domain.c| 2 +- .../drm/i915/{ => gem}/i915_gem_execbuffer.c | 37 --- .../drm/i915/{ => gem}/i915_gem_internal.c| 33 +- drivers/gpu/drm/i915/gem/i915_gem_object.c| 10 - drivers/gpu/drm/i915/{ => gem}/i915_gem_pm.c | 2 +- drivers/gpu/drm/i915/{ => gem}/i915_gem_pm.h | 0 .../drm/i915/{ => gem}/i915_gem_shrinker.c| 25 ++- .../gpu/drm/i915/{ => gem}/i915_gem_stolen.c | 33 -- .../gpu/drm/i915/{ => gem}/i915_gem_tiling.c | 31 +++-- .../gpu/drm/i915/{ => gem}/i915_gem_userptr.c | 30 +++-- drivers/gpu/drm/i915/{ => gem}/i915_gemfs.c | 25 ++- drivers/gpu/drm/i915/gem/i915_gemfs.h | 16 +++ .../{ => gem}/selftests/huge_gem_object.c | 22 + .../drm/i915/gem/selftests/huge_gem_object.h | 27 +++ .../drm/i915/{ => gem}/selftests/huge_pages.c | 34 +- .../{ => gem}/selftests/i915_gem_coherency.c | 26 ++- .../{ => gem}/selftests/i915_gem_context.c| 40 + .../{ => gem}/selftests/i915_gem_dmabuf.c | 26 ++- .../drm/i915/gem/selftests/i915_gem_mman.c| 2 +- .../{ => gem}/selftests/i915_gem_object.c | 28 +++- .../i915/{ => gem}/selftests/igt_gem_utils.c | 2 +- .../i915/{ => gem}/selftests/igt_gem_utils.h | 0 .../i915/{ => gem}/selftests/mock_context.c | 24 ++ .../gpu/drm/i915/gem/selftests/mock_context.h | 24 ++ .../i915/{ => gem}/selftests/mock_dmabuf.c| 22 + .../gpu/drm/i915/gem/selftests/mock_dmabuf.h | 22 + .../{ => gem}/selftests/mock_gem_object.h | 7 ++- drivers/gpu/drm/i915/gt/intel_context.c | 4 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 ++ drivers/gpu/drm/i915/gt/intel_lrc.c | 2 + drivers/gpu/drm/i915/gt/intel_lrc.h | 14 +++--- drivers/gpu/drm/i915/gt/intel_reset.c | 2 + drivers/gpu/drm/i915/gt/intel_ringbuffer.c| 3 ++ drivers/gpu/drm/i915/gt/mock_engine.c | 3 +- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 7 +-- drivers/gpu/drm/i915/gt/selftest_lrc.c| 7 ++- .../gpu/drm/i915/gt/selftest_workarounds.c| 6 ++- drivers/gpu/drm/i915/gvt/mmio_context.c | 1 + drivers/gpu/drm/i915/gvt/scheduler.c | 5 ++- drivers/gpu/drm/i915/i915_debugfs.c | 2 +- drivers/gpu/drm/i915/i915_drv.c | 1 + drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem.c | 11 ++--- drivers/gpu/drm/i915/i915_gem_clflush.h | 36 --- drivers/gpu/drm/i915/i915_gem_evict.c | 2 + drivers/gpu/drm/i915/i915_gemfs.h | 34 -- drivers/gpu/drm/i915/i915_globals.c | 2 +- drivers/gpu/drm/i915/i915_gpu_error.c | 2 + drivers/gpu/drm/i915/i915_perf.c | 2 + drivers/gpu/drm/i915/i915_request.c | 3 ++ drivers/gpu/drm/i915/intel_display.c | 1 - drivers/gpu/drm/i915/intel_guc_submission.c | 2 + drivers/gpu/drm/i915/intel_overlay.c | 2 + .../gpu/drm/i915/selftests/huge_gem_object.h | 45 --- drivers/gpu/drm/i915/selftests/i915_active.c | 2 + drivers/gpu/drm/i915/selftests/i915_gem.c | 6 ++- .../gpu/drm/i915/selftests/i915_gem_evict.c | 6 ++- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 3 +- drivers/gpu/drm/i915/selftests/i915_request.c | 5 ++- .../gpu/drm/i915/selftests/i915_timeline.c| 1 + drivers/gpu/drm/i915/selftests/i915_vma.c | 3 +- .../gpu/drm/i915/selftests/igt_flush_test.c | 2 + drivers/gpu/drm/i915/selftests/igt_spinner.c | 3 +- drivers/gpu/drm/i915/selftests/igt_spinner.h | 2 +- drivers/gpu/drm/i915/selftests/intel_guc.c| 1 + drivers/gpu/drm/i915/selftests/mock_context.h | 42 - drivers/gpu/drm/i915/selftests/mock_dmabuf.h | 41 - .../gpu/drm/i915/selftests/mock_gem_device.c | 5 ++- drivers/gpu/drm/i915/selftests/mock_request.c | 2 +- 77 files changed, 338 insertions(+), 692 deletions(-) rename drivers/gpu/drm/i915/{ => gem}/i915_gem_clflush.c (77%) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_clflush.h rename drivers/gpu/drm/i915/{ => gem}/i915_gem_context.c (98%) rename drivers/gpu/drm/i915/{ => gem}/i915_gem_context.h (85%) rename drivers/gpu/drm/i915/{ => gem}/i915_gem_context_types.
[Intel-gfx] [PATCH 19/45] drm/i915: Load balancing across a virtual engine
Having allowed the user to define a set of engines that they will want to only use, we go one step further and allow them to bind those engines into a single virtual instance. Submitting a batch to the virtual engine will then forward it to any one of the set in a manner as best to distribute load. The virtual engine has a single timeline across all engines (it operates as a single queue), so it is not able to concurrently run batches across multiple engines by itself; that is left up to the user to submit multiple concurrent batches to multiple queues. Multiple users will be load balanced across the system. The mechanism used for load balancing in this patch is a late greedy balancer. When a request is ready for execution, it is added to each engine's queue, and when an engine is ready for its next request it claims it from the virtual engine. The first engine to do so, wins, i.e. the request is executed at the earliest opportunity (idle moment) in the system. As not all HW is created equal, the user is still able to skip the virtual engine and execute the batch on a specific engine, all within the same queue. It will then be executed in order on the correct engine, with execution on other virtual engines being moved away due to the load detection. A couple of areas for potential improvement left! - The virtual engine always take priority over equal-priority tasks. Mostly broken up by applying FQ_CODEL rules for prioritising new clients, and hopefully the virtual and real engines are not then congested (i.e. all work is via virtual engines, or all work is to the real engine). - We require the breadcrumb irq around every virtual engine request. For normal engines, we eliminate the need for the slow round trip via interrupt by using the submit fence and queueing in order. For virtual engines, we have to allow any job to transfer to a new ring, and cannot coalesce the submissions, so require the completion fence instead, forcing the persistent use of interrupts. - We only drip feed single requests through each virtual engine and onto the physical engines, even if there was enough work to fill all ELSP, leaving small stalls with an idle CS event at the end of every request. Could we be greedy and fill both slots? Being lazy is virtuous for load distribution on less-than-full workloads though. Other areas of improvement are more general, such as reducing lock contention, reducing dispatch overhead, looking at direct submission rather than bouncing around tasklets etc. sseu: Lift the restriction to allow sseu to be reconfigured on virtual engines composed of RENDER_CLASS (rcs). v2: macroize check_user_mbz() v3: Cancel virtual engines on wedging v4: Commence commenting v5: Replace 64b sibling_mask with a list of class:instance Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 6 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 8 + drivers/gpu/drm/i915/gt/intel_lrc.c | 656 ++- drivers/gpu/drm/i915/gt/intel_lrc.h | 9 + drivers/gpu/drm/i915/gt/selftest_lrc.c | 180 + drivers/gpu/drm/i915/i915_gem.h | 5 + drivers/gpu/drm/i915/i915_gem_context.c | 118 +++- drivers/gpu/drm/i915/i915_scheduler.c| 18 +- drivers/gpu/drm/i915/i915_timeline_types.h | 1 + include/uapi/drm/i915_drm.h | 39 ++ 10 files changed, 1012 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 4283224249d4..35eef02b3f09 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -290,8 +290,12 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq) break; } list_add(&rq->signal_link, pos); - if (pos == &ce->signals) /* catch transitions from empty list */ + if (pos == &ce->signals) { /* catch transitions from empty */ list_move_tail(&ce->signal_link, &b->signalers); + } else if (ce->engine != rq->engine) { /* virtualised */ + list_move_tail(&ce->signal_link, &b->signalers); + intel_engine_queue_breadcrumbs(rq->engine); + } set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); spin_unlock(&b->irq_lock); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 9d64e33f8427..b07d3685e2cb 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -227,6 +227,7 @@ struct intel_engine_execlists { * @queue: queue of requests, in priority lists */ struct rb_root_cached queue; + struct rb_root_cached virtual; /** * @csb_write: control register for Context Switch buffer @@ -445
[Intel-gfx] [PATCH 02/45] drm/i915/gvt: Pin the per-engine GVT shadow contexts
Our eventual goal is to rid request construction of struct_mutex, with the short term step of lifting the struct_mutex requirements into the higher levels (i.e. the caller must ensure that the context is already pinned into the GTT). In this patch, we pin GVT's shadow context upon allocation and so keep them pinned into the GGTT for as long as the virtual machine is alive, and so we can use the simpler request construction path safe in the knowledge that the hard work is already done. Signed-off-by: Chris Wilson Cc: Zhenyu Wang --- drivers/gpu/drm/i915/gvt/gvt.h | 2 +- drivers/gpu/drm/i915/gvt/kvmgt.c| 2 +- drivers/gpu/drm/i915/gvt/mmio_context.c | 3 +- drivers/gpu/drm/i915/gvt/scheduler.c| 137 4 files changed, 73 insertions(+), 71 deletions(-) diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h index f5a328b5290a..b54f2bdc13a4 100644 --- a/drivers/gpu/drm/i915/gvt/gvt.h +++ b/drivers/gpu/drm/i915/gvt/gvt.h @@ -149,9 +149,9 @@ struct intel_vgpu_submission_ops { struct intel_vgpu_submission { struct intel_vgpu_execlist execlist[I915_NUM_ENGINES]; struct list_head workload_q_head[I915_NUM_ENGINES]; + struct intel_context *shadow[I915_NUM_ENGINES]; struct kmem_cache *workloads; atomic_t running_workload_num; - struct i915_gem_context *shadow_ctx; union { u64 i915_context_pml4; u64 i915_context_pdps[GEN8_3LVL_PDPES]; diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c index a68addf95c23..144301b778df 100644 --- a/drivers/gpu/drm/i915/gvt/kvmgt.c +++ b/drivers/gpu/drm/i915/gvt/kvmgt.c @@ -1576,7 +1576,7 @@ hw_id_show(struct device *dev, struct device_attribute *attr, struct intel_vgpu *vgpu = (struct intel_vgpu *) mdev_get_drvdata(mdev); return sprintf(buf, "%u\n", - vgpu->submission.shadow_ctx->hw_id); + vgpu->submission.shadow[0]->gem_context->hw_id); } return sprintf(buf, "\n"); } diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c index e7e14c842be4..b8823495022b 100644 --- a/drivers/gpu/drm/i915/gvt/mmio_context.c +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c @@ -495,8 +495,7 @@ static void switch_mmio(struct intel_vgpu *pre, * itself. */ if (mmio->in_context && - !is_inhibit_context(intel_context_lookup(s->shadow_ctx, - dev_priv->engine[ring_id]))) + !is_inhibit_context(s->shadow[ring_id])) continue; if (mmio->mask) diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c index 8998fa5ab198..40d9f549a0cd 100644 --- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -36,6 +36,7 @@ #include #include "i915_drv.h" +#include "i915_gem_pm.h" #include "gvt.h" #define RING_CTX_OFF(x) \ @@ -277,18 +278,23 @@ static int shadow_context_status_change(struct notifier_block *nb, return NOTIFY_OK; } -static void shadow_context_descriptor_update(struct intel_context *ce) +static void +shadow_context_descriptor_update(struct intel_context *ce, +struct intel_vgpu_workload *workload) { - u64 desc = 0; - - desc = ce->lrc_desc; + u64 desc = ce->lrc_desc; - /* Update bits 0-11 of the context descriptor which includes flags + /* +* Update bits 0-11 of the context descriptor which includes flags * like GEN8_CTX_* cached in desc_template */ desc &= U64_MAX << 12; desc |= ce->gem_context->desc_template & ((1ULL << 12) - 1); + desc &= ~(0x3 << GEN8_CTX_ADDRESSING_MODE_SHIFT); + desc |= workload->ctx_desc.addressing_mode << + GEN8_CTX_ADDRESSING_MODE_SHIFT; + ce->lrc_desc = desc; } @@ -365,26 +371,22 @@ intel_gvt_workload_req_alloc(struct intel_vgpu_workload *workload) { struct intel_vgpu *vgpu = workload->vgpu; struct intel_vgpu_submission *s = &vgpu->submission; - struct i915_gem_context *shadow_ctx = s->shadow_ctx; struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv; - struct intel_engine_cs *engine = dev_priv->engine[workload->ring_id]; struct i915_request *rq; - int ret = 0; lockdep_assert_held(&dev_priv->drm.struct_mutex); if (workload->req) - goto out; + return 0; - rq = i915_request_alloc(engine, shadow_ctx); + rq = i915_request_create(s->shadow[workload->ring_id]); if (IS_ERR(rq)) { gvt_vgpu_err("fail to allocate gem request\n"); -
[Intel-gfx] [PATCH 40/45] drm/i915: Rename intel_context.active to .inflight
Rename the engine this HW context is currently active upon (that we are flying upon) to disambiguate between the mixture of different active terms (and prevent conflict in future patches). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_context_types.h | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 22 +-- 2 files changed, 12 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index d5a7dbd0daee..47f2970ab5b7 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -36,7 +36,7 @@ struct intel_context { struct i915_gem_context *gem_context; struct intel_engine_cs *engine; - struct intel_engine_cs *active; + struct intel_engine_cs *inflight; struct list_head signal_link; struct list_head signals; diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 524de9eaa44c..b277c0750943 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -459,7 +459,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) __i915_request_unsubmit(rq); unwind_wa_tail(rq); - GEM_BUG_ON(rq->hw_context->active); + GEM_BUG_ON(rq->hw_context->inflight); /* * Push the request back into the queue for later resubmission. @@ -556,11 +556,11 @@ execlists_user_end(struct intel_engine_execlists *execlists) static inline void execlists_context_schedule_in(struct i915_request *rq) { - GEM_BUG_ON(rq->hw_context->active); + GEM_BUG_ON(rq->hw_context->inflight); execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN); intel_engine_context_in(rq->engine); - rq->hw_context->active = rq->engine; + rq->hw_context->inflight = rq->engine; } static void kick_siblings(struct i915_request *rq) @@ -575,7 +575,7 @@ static void kick_siblings(struct i915_request *rq) static inline void execlists_context_schedule_out(struct i915_request *rq, unsigned long status) { - rq->hw_context->active = NULL; + rq->hw_context->inflight = NULL; intel_engine_context_out(rq->engine); execlists_context_status_change(rq, status); trace_i915_request_out(rq); @@ -816,7 +816,7 @@ static bool virtual_matches(const struct virtual_engine *ve, const struct i915_request *rq, const struct intel_engine_cs *engine) { - const struct intel_engine_cs *active; + const struct intel_engine_cs *inflight; if (!(rq->execution_mask & engine->mask)) /* We peeked too soon! */ return false; @@ -830,8 +830,8 @@ static bool virtual_matches(const struct virtual_engine *ve, * we reuse the register offsets). This is a very small * hystersis on the greedy seelction algorithm. */ - active = READ_ONCE(ve->context.active); - if (active && active != engine) + inflight = READ_ONCE(ve->context.inflight); + if (inflight && inflight != engine) return false; return true; @@ -1003,7 +1003,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) u32 *regs = ve->context.lrc_reg_state; unsigned int n; - GEM_BUG_ON(READ_ONCE(ve->context.active)); + GEM_BUG_ON(READ_ONCE(ve->context.inflight)); virtual_update_register_offsets(regs, engine); /* @@ -1478,7 +1478,7 @@ static void execlists_context_unpin(struct intel_context *ce) * had the chance to run yet; let it run before we teardown the * reference it may use. */ - engine = READ_ONCE(ce->active); + engine = READ_ONCE(ce->inflight); if (unlikely(engine)) { unsigned long flags; @@ -1486,7 +1486,7 @@ static void execlists_context_unpin(struct intel_context *ce) process_csb(engine); spin_unlock_irqrestore(&engine->timeline.lock, flags); - GEM_BUG_ON(READ_ONCE(ce->active)); + GEM_BUG_ON(READ_ONCE(ce->inflight)); } i915_gem_context_unpin_hw_id(ce->gem_context); @@ -3087,7 +3087,7 @@ static void virtual_context_destroy(struct kref *kref) unsigned int n; GEM_BUG_ON(ve->request); - GEM_BUG_ON(ve->context.active); + GEM_BUG_ON(ve->context.inflight); for (n = 0; n < ve->num_siblings; n++) { struct intel_engine_cs *sibling = ve->siblings[n]; -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 11/45] drm/i915/execlists: Flush the tasklet on parking
Tidy up the cleanup sequence by always ensure that the tasklet is flushed on parking (before we cleanup). The parking provides a convenient point to ensure that the backend is truly idle. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_lrc.c | 25 +++-- drivers/gpu/drm/i915/intel_guc_submission.c | 1 + 2 files changed, 9 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 01f58a152a9e..ea7794d368d1 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -2331,27 +2331,18 @@ static int gen8_init_rcs_context(struct i915_request *rq) return i915_gem_render_state_emit(rq); } +static void execlists_park(struct intel_engine_cs *engine) +{ + tasklet_kill(&engine->execlists.tasklet); +} + /** * intel_logical_ring_cleanup() - deallocate the Engine Command Streamer * @engine: Engine Command Streamer. */ void intel_logical_ring_cleanup(struct intel_engine_cs *engine) { - struct drm_i915_private *dev_priv; - - /* -* Tasklet cannot be active at this point due intel_mark_active/idle -* so this is just for documentation. -*/ - if (WARN_ON(test_bit(TASKLET_STATE_SCHED, -&engine->execlists.tasklet.state))) - tasklet_kill(&engine->execlists.tasklet); - - dev_priv = engine->i915; - - if (engine->buffer) { - WARN_ON((ENGINE_READ(engine, RING_MI_MODE) & MODE_IDLE) == 0); - } + struct drm_i915_private *i915 = engine->i915; if (engine->cleanup) engine->cleanup(engine); @@ -2361,7 +2352,7 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine) lrc_destroy_wa_ctx(engine); engine->i915 = NULL; - dev_priv->engine[engine->id] = NULL; + i915->engine[engine->id] = NULL; kfree(engine); } @@ -2376,7 +2367,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine) engine->reset.reset = execlists_reset; engine->reset.finish = execlists_reset_finish; - engine->park = NULL; + engine->park = execlists_park; engine->unpark = NULL; engine->flags |= I915_ENGINE_SUPPORTS_STATS; diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c index 4c814344809c..ed94001028f2 100644 --- a/drivers/gpu/drm/i915/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/intel_guc_submission.c @@ -1363,6 +1363,7 @@ static void guc_interrupts_release(struct drm_i915_private *dev_priv) static void guc_submission_park(struct intel_engine_cs *engine) { + tasklet_kill(&engine->execlists.tasklet); intel_engine_unpin_breadcrumbs_irq(engine); engine->flags &= ~I915_ENGINE_NEEDS_BREADCRUMB_TASKLET; } -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 41/45] drm/i915: Keep contexts pinned until after the next kernel context switch
We need to keep the context image pinned in memory until after the GPU has finished writing into it. Since it continues to write as we signal the final breadcrumb, we need to keep it pinned until the request after it is complete. Currently we know the order in which requests execute on each engine, and so to remove that presumption we need to identify a request/context-switch we know must occur after our completion. Any request queued after the signal must imply a context switch, for simplicity we use a fresh request from the kernel context. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 24 ++ drivers/gpu/drm/i915/gem/i915_gem_context.h | 1 - drivers/gpu/drm/i915/gem/i915_gem_pm.c| 17 drivers/gpu/drm/i915/gt/intel_context.c | 80 +++--- drivers/gpu/drm/i915/gt/intel_context.h | 3 + drivers/gpu/drm/i915/gt/intel_context_types.h | 6 +- drivers/gpu/drm/i915/gt/intel_engine.h| 2 - drivers/gpu/drm/i915/gt/intel_engine_cs.c | 23 +- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 + drivers/gpu/drm/i915/gt/intel_engine_types.h | 13 +-- drivers/gpu/drm/i915/gt/intel_lrc.c | 62 ++ drivers/gpu/drm/i915/gt/intel_ringbuffer.c| 44 +- drivers/gpu/drm/i915/gt/mock_engine.c | 11 +-- drivers/gpu/drm/i915/i915_active.c| 81 ++- drivers/gpu/drm/i915/i915_active.h| 5 ++ drivers/gpu/drm/i915/i915_active_types.h | 3 + drivers/gpu/drm/i915/i915_gem.c | 4 - drivers/gpu/drm/i915/i915_request.c | 15 .../gpu/drm/i915/selftests/mock_gem_device.c | 1 - 19 files changed, 213 insertions(+), 184 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index d4f13278d5b6..3d0a7af096f6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -676,17 +676,6 @@ int i915_gem_contexts_init(struct drm_i915_private *dev_priv) return 0; } -void i915_gem_contexts_lost(struct drm_i915_private *dev_priv) -{ - struct intel_engine_cs *engine; - enum intel_engine_id id; - - lockdep_assert_held(&dev_priv->drm.struct_mutex); - - for_each_engine(engine, dev_priv, id) - intel_engine_lost_context(engine); -} - void i915_gem_contexts_fini(struct drm_i915_private *i915) { lockdep_assert_held(&i915->drm.struct_mutex); @@ -1174,10 +1163,6 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu) if (ret) goto out_add; - ret = gen8_emit_rpcs_config(rq, ce, sseu); - if (ret) - goto out_add; - /* * Guarantee context image and the timeline remains pinned until the * modifying request is retired by setting the ce activity tracker. @@ -1185,9 +1170,12 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu) * But we only need to take one pin on the account of it. Or in other * words transfer the pinned ce object to tracked active request. */ - if (!i915_active_request_isset(&ce->active_tracker)) - __intel_context_pin(ce); - __i915_active_request_set(&ce->active_tracker, rq); + GEM_BUG_ON(i915_active_is_idle(&ce->active)); + ret = i915_active_ref(&ce->active, rq->fence.context, rq); + if (ret) + goto out_add; + + ret = gen8_emit_rpcs_config(rq, ce, sseu); out_add: i915_request_add(rq); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h index 630392c77e48..9691dd062f72 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h @@ -134,7 +134,6 @@ static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx) /* i915_gem_context.c */ int __must_check i915_gem_contexts_init(struct drm_i915_private *dev_priv); -void i915_gem_contexts_lost(struct drm_i915_private *dev_priv); void i915_gem_contexts_fini(struct drm_i915_private *dev_priv); int i915_gem_context_open(struct drm_i915_private *i915, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c index 7e3511773fc1..6e8f86de1ed9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c @@ -10,6 +10,22 @@ #include "i915_drv.h" #include "i915_globals.h" +static void call_idle_barriers(struct intel_engine_cs *engine) +{ + struct llist_node *node, *next; + + llist_for_each_safe(node, next, llist_del_all(&engine->barrier_tasks)) { + struct i915_active_request *active = + container_of((struct list_head *)node, +typeof(*active), link); + + INIT_LIST_HEAD(&active->link); + RCU_INIT_POINTER(active->request, NUL
[Intel-gfx] [PATCH 15/45] drm/i915: Restore control over ppgtt for context creation ABI
Having hid the partially exposed new ABI from the PR, put it back again for completion of context recovery. A significant part of context recovery is the ability to reuse as much of the old context as is feasible (to avoid expensive reconstruction). The biggest chunk kept hidden at the moment is fine-control over the ctx->ppgtt (the GPU page tables and associated translation tables and kernel maps), so make control over the ctx->ppgtt explicit. This allows userspace to create and share virtual memory address spaces (within the limits of a single fd) between contexts they own, along with the ability to query the contexts for the vm state. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.c | 2 ++ drivers/gpu/drm/i915/i915_gem_context.c | 5 - include/uapi/drm/i915_drm.h | 15 +++ 3 files changed, 17 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 824409ffd03f..81866bd1b6e4 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -3150,6 +3150,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = { DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW), DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW), }; static struct drm_driver driver = { diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 0d9d435ae398..217d4fe0349d 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -98,7 +98,6 @@ #include "i915_user_extensions.h" #define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE (1 << 1) -#define I915_CONTEXT_PARAM_VM 0x9 #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1 @@ -969,8 +968,6 @@ static int get_ppgtt(struct drm_i915_file_private *file_priv, struct i915_hw_ppgtt *ppgtt; int ret; - return -EINVAL; /* nothing to see here; please move along */ - if (!ctx->ppgtt) return -ENODEV; @@ -1069,8 +1066,6 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv, struct i915_hw_ppgtt *ppgtt, *old; int err; - return -EINVAL; /* nothing to see here; please move along */ - if (args->size) return -EINVAL; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 3a73f5316766..d6ad4a15b2b9 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -355,6 +355,8 @@ typedef struct _drm_i915_sarea { #define DRM_I915_PERF_ADD_CONFIG 0x37 #define DRM_I915_PERF_REMOVE_CONFIG0x38 #define DRM_I915_QUERY 0x39 +#define DRM_I915_GEM_VM_CREATE 0x3a +#define DRM_I915_GEM_VM_DESTROY0x3b /* Must be kept compact -- no holes */ #define DRM_IOCTL_I915_INITDRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t) @@ -415,6 +417,8 @@ typedef struct _drm_i915_sarea { #define DRM_IOCTL_I915_PERF_ADD_CONFIG DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config) #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64) #define DRM_IOCTL_I915_QUERY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query) +#define DRM_IOCTL_I915_GEM_VM_CREATE DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control) +#define DRM_IOCTL_I915_GEM_VM_DESTROY DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control) /* Allow drivers to submit batchbuffers directly to hardware, relying * on the security mechanisms provided by hardware. @@ -1507,6 +1511,17 @@ struct drm_i915_gem_context_param { * On creation, all new contexts are marked as recoverable. */ #define I915_CONTEXT_PARAM_RECOVERABLE 0x8 + + /* +* The id of the associated virtual memory address space (ppGTT) of +* this context. Can be retrieved and passed to another context +* (on the same fd) for both to use the same ppGTT and so share +* address layouts, and avoid reloading the page tables on context +* switches between themselves. +* +* See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY. +*/ +#define I915_CONTEXT_PARAM_VM 0x9 /* Must be kept compact -- no holes and well documented */ __u64 value; -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 17/45] drm/i915: Re-expose SINGLE_TIMELINE flags for context creation
The SINGLE_TIMELINE flag can be used to create a context such that all engine instances within that context share a common timeline. This can be useful for mixing operations between real and virtual engines, or when using a composite context for a single client API context. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_gem_context.c | 4 include/uapi/drm/i915_drm.h | 3 ++- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index b4b7a2eee1c9..d6bea51050c0 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -96,8 +96,6 @@ #include "i915_trace.h" #include "i915_user_extensions.h" -#define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE (1 << 1) - #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1 static struct i915_global_gem_context { @@ -493,8 +491,6 @@ i915_gem_create_context(struct drm_i915_private *dev_priv, unsigned int flags) lockdep_assert_held(&dev_priv->drm.struct_mutex); - BUILD_BUG_ON(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE & -~I915_CONTEXT_CREATE_FLAGS_UNKNOWN); if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE && !HAS_EXECLISTS(dev_priv)) return ERR_PTR(-EINVAL); diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 8e1bb22926e4..7aef672ab3c7 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1469,8 +1469,9 @@ struct drm_i915_gem_context_create_ext { __u32 ctx_id; /* output: id of new context*/ __u32 flags; #define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS (1u << 0) +#define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE (1u << 1) #define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \ - (-(I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS << 1)) + (-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1)) __u64 extensions; }; -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 13/45] drm/i915: Convert inconsistent static engine tables into an init error
Remove the modification of the "constant" device info by promoting the inconsistent intel_engine static table into an initialisation error. Now, if we add a new engine into the device_info, we must first add that engine information into the intel_engines. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 28 --- 1 file changed, 9 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 65cdd0c4accc..862cf1040f88 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -356,15 +356,14 @@ void intel_engines_cleanup(struct drm_i915_private *i915) */ int intel_engines_init_mmio(struct drm_i915_private *i915) { - struct intel_device_info *device_info = mkwrite_device_info(i915); - const unsigned int engine_mask = INTEL_INFO(i915)->engine_mask; - unsigned int mask = 0; unsigned int i; int err; - WARN_ON(engine_mask == 0); - WARN_ON(engine_mask & - GENMASK(BITS_PER_TYPE(mask) - 1, I915_NUM_ENGINES)); + /* We always presume we have at least RCS available for later probing */ + if (GEM_WARN_ON(!HAS_ENGINE(i915, RCS0))) { + err = -ENODEV; + goto cleanup; + } if (i915_inject_load_failure()) return -ENODEV; @@ -376,25 +375,16 @@ int intel_engines_init_mmio(struct drm_i915_private *i915) err = intel_engine_setup(i915, i); if (err) goto cleanup; - - mask |= BIT(i); } - /* -* Catch failures to update intel_engines table when the new engines -* are added to the driver by a warning and disabling the forgotten -* engines. -*/ - if (WARN_ON(mask != engine_mask)) - device_info->engine_mask = mask; - - /* We always presume we have at least RCS available for later probing */ - if (WARN_ON(!HAS_ENGINE(i915, RCS0))) { + /* Catch failures to update intel_engines table for new engines. */ + if (GEM_WARN_ON(INTEL_INFO(i915)->engine_mask >> i)) { err = -ENODEV; goto cleanup; } - RUNTIME_INFO(i915)->num_engines = hweight32(mask); + RUNTIME_INFO(i915)->num_engines = + hweight32(INTEL_INFO(i915)->engine_mask); i915_check_and_clear_faults(i915); -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 09/45] drm/i915: Remove intel_context.active_link
We no longer need to track the active intel_contexts within each engine, allowing us to drop a tricky mutex_lock from inside unpin (which may occur inside fs_reclaim). Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_context.c | 11 +-- drivers/gpu/drm/i915/gt/intel_context_types.h | 1 - drivers/gpu/drm/i915/i915_debugfs.c | 11 +-- drivers/gpu/drm/i915/i915_gem_context.c | 2 -- drivers/gpu/drm/i915/i915_gem_context_types.h | 1 - drivers/gpu/drm/i915/selftests/i915_gem_context.c | 1 - drivers/gpu/drm/i915/selftests/mock_context.c | 1 - 7 files changed, 10 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 5e506e648454..1f1761fc6597 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -49,7 +49,6 @@ int __intel_context_do_pin(struct intel_context *ce) return -EINTR; if (likely(!atomic_read(&ce->pin_count))) { - struct i915_gem_context *ctx = ce->gem_context; intel_wakeref_t wakeref; err = 0; @@ -58,11 +57,7 @@ int __intel_context_do_pin(struct intel_context *ce) if (err) goto err; - i915_gem_context_get(ctx); - - mutex_lock(&ctx->mutex); - list_add(&ce->active_link, &ctx->active_engines); - mutex_unlock(&ctx->mutex); + i915_gem_context_get(ce->gem_context); /* for ctx->ppgtt */ intel_context_get(ce); smp_mb__before_atomic(); /* flush pin before it is visible */ @@ -91,10 +86,6 @@ void intel_context_unpin(struct intel_context *ce) if (likely(atomic_dec_and_test(&ce->pin_count))) { ce->ops->unpin(ce); - mutex_lock(&ce->gem_context->mutex); - list_del(&ce->active_link); - mutex_unlock(&ce->gem_context->mutex); - i915_gem_context_put(ce->gem_context); intel_context_put(ce); } diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 3579c2708321..d5a7dbd0daee 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -38,7 +38,6 @@ struct intel_context { struct intel_engine_cs *engine; struct intel_engine_cs *active; - struct list_head active_link; struct list_head signal_link; struct list_head signals; diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 00d3ff746eb1..466becbb99c6 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -34,6 +34,7 @@ #include "gt/intel_reset.h" +#include "i915_gem_context.h" #include "intel_dp.h" #include "intel_drv.h" #include "intel_fbc.h" @@ -396,14 +397,17 @@ static void print_context_stats(struct seq_file *m, struct i915_gem_context *ctx; list_for_each_entry(ctx, &i915->contexts.list, link) { + struct i915_gem_engines_iter it; struct intel_context *ce; - list_for_each_entry(ce, &ctx->active_engines, active_link) { + for_each_gem_engine(ce, + i915_gem_context_lock_engines(ctx), it) { if (ce->state) per_file_stats(0, ce->state->obj, &kstats); if (ce->ring) per_file_stats(0, ce->ring->vma->obj, &kstats); } + i915_gem_context_unlock_engines(ctx); if (!IS_ERR_OR_NULL(ctx->file_priv)) { struct file_stats stats = { .vm = &ctx->ppgtt->vm, }; @@ -1893,6 +1897,7 @@ static int i915_context_status(struct seq_file *m, void *unused) return ret; list_for_each_entry(ctx, &dev_priv->contexts.list, link) { + struct i915_gem_engines_iter it; struct intel_context *ce; seq_puts(m, "HW context "); @@ -1917,7 +1922,8 @@ static int i915_context_status(struct seq_file *m, void *unused) seq_putc(m, ctx->remap_slice ? 'R' : 'r'); seq_putc(m, '\n'); - list_for_each_entry(ce, &ctx->active_engines, active_link) { + for_each_gem_engine(ce, + i915_gem_context_lock_engines(ctx), it) { seq_printf(m, "%s: ", ce->engine->name); if (ce->state) describe_obj(m, ce->state->obj); @@ -1925,6 +1931,7 @@ static int i915_context_status(struct seq_file *m, void *unused) describe_ctx_ring(m, ce->ring); seq_putc(m, '\n');
[Intel-gfx] [PATCH 07/45] drm/i915: Split engine setup/init into two phases
In the next patch, we require the engine vfuncs setup prior to initialising the pinned kernel contexts, so split the vfunc setup from the engine initialisation and call it earlier. v2: s/setup_xcs/setup_common/ for intel_ring_submission_setup() Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine.h| 8 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 99 drivers/gpu/drm/i915/gt/intel_lrc.c | 74 ++ drivers/gpu/drm/i915/gt/intel_lrc.h | 5 +- drivers/gpu/drm/i915/gt/intel_ringbuffer.c| 232 +- drivers/gpu/drm/i915/gt/intel_workarounds.c | 3 +- drivers/gpu/drm/i915/gt/mock_engine.c | 48 ++-- drivers/gpu/drm/i915/gt/mock_engine.h | 2 + drivers/gpu/drm/i915/i915_gem.c | 6 + .../gpu/drm/i915/selftests/mock_gem_device.c | 12 +- 10 files changed, 245 insertions(+), 244 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index a228dc1774d8..3e53f53bc52b 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -362,14 +362,12 @@ __intel_ring_space(unsigned int head, unsigned int tail, unsigned int size) return (head - tail - CACHELINE_BYTES) & (size - 1); } -int intel_engine_setup_common(struct intel_engine_cs *engine); +int intel_engines_setup(struct drm_i915_private *i915); int intel_engine_init_common(struct intel_engine_cs *engine); void intel_engine_cleanup_common(struct intel_engine_cs *engine); -int intel_init_render_ring_buffer(struct intel_engine_cs *engine); -int intel_init_bsd_ring_buffer(struct intel_engine_cs *engine); -int intel_init_blt_ring_buffer(struct intel_engine_cs *engine); -int intel_init_vebox_ring_buffer(struct intel_engine_cs *engine); +int intel_ring_submission_setup(struct intel_engine_cs *engine); +int intel_ring_submission_init(struct intel_engine_cs *engine); int intel_engine_stop_cs(struct intel_engine_cs *engine); void intel_engine_cancel_stop_cs(struct intel_engine_cs *engine); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 5d26791f8d9e..23da9eebe247 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -50,35 +50,24 @@ struct engine_class_info { const char *name; - int (*init_legacy)(struct intel_engine_cs *engine); - int (*init_execlists)(struct intel_engine_cs *engine); - u8 uabi_class; }; static const struct engine_class_info intel_engine_classes[] = { [RENDER_CLASS] = { .name = "rcs", - .init_execlists = logical_render_ring_init, - .init_legacy = intel_init_render_ring_buffer, .uabi_class = I915_ENGINE_CLASS_RENDER, }, [COPY_ENGINE_CLASS] = { .name = "bcs", - .init_execlists = logical_xcs_ring_init, - .init_legacy = intel_init_blt_ring_buffer, .uabi_class = I915_ENGINE_CLASS_COPY, }, [VIDEO_DECODE_CLASS] = { .name = "vcs", - .init_execlists = logical_xcs_ring_init, - .init_legacy = intel_init_bsd_ring_buffer, .uabi_class = I915_ENGINE_CLASS_VIDEO, }, [VIDEO_ENHANCEMENT_CLASS] = { .name = "vecs", - .init_execlists = logical_xcs_ring_init, - .init_legacy = intel_init_vebox_ring_buffer, .uabi_class = I915_ENGINE_CLASS_VIDEO_ENHANCE, }, }; @@ -400,48 +389,39 @@ int intel_engines_init_mmio(struct drm_i915_private *dev_priv) /** * intel_engines_init() - init the Engine Command Streamers - * @dev_priv: i915 device private + * @i915: i915 device private * * Return: non-zero if the initialization failed. */ -int intel_engines_init(struct drm_i915_private *dev_priv) +int intel_engines_init(struct drm_i915_private *i915) { + int (*init)(struct intel_engine_cs *engine); struct intel_engine_cs *engine; enum intel_engine_id id, err_id; int err; - for_each_engine(engine, dev_priv, id) { - const struct engine_class_info *class_info = - &intel_engine_classes[engine->class]; - int (*init)(struct intel_engine_cs *engine); - - if (HAS_EXECLISTS(dev_priv)) - init = class_info->init_execlists; - else - init = class_info->init_legacy; + if (HAS_EXECLISTS(i915)) + init = intel_execlists_submission_init; + else + init = intel_ring_submission_init; - err = -EINVAL; + for_each_engine(engine, i915, id) { err_id = id; - if (GEM_DEBUG_WARN_ON(!init)) - goto cleanup; - err = init(engine);
[Intel-gfx] [PATCH 10/45] drm/i915: Move i915_request_alloc into selftests/
Having transitioned GEM over to using intel_context as its primary means of tracking the GEM context and engine combined and using i915_request_create(), we can move the older i915_request_alloc() helper function into selftests/ where the remaining users are confined. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 9 +++-- drivers/gpu/drm/i915/gt/selftest_lrc.c| 13 --- .../gpu/drm/i915/gt/selftest_workarounds.c| 15 +++- drivers/gpu/drm/i915/i915_request.c | 38 --- drivers/gpu/drm/i915/i915_request.h | 3 -- drivers/gpu/drm/i915/selftests/huge_pages.c | 3 +- drivers/gpu/drm/i915/selftests/i915_gem.c | 5 ++- .../gpu/drm/i915/selftests/i915_gem_context.c | 13 --- .../gpu/drm/i915/selftests/i915_gem_evict.c | 3 +- drivers/gpu/drm/i915/selftests/i915_request.c | 4 +- .../gpu/drm/i915/selftests/igt_gem_utils.c| 34 + .../gpu/drm/i915/selftests/igt_gem_utils.h| 17 + drivers/gpu/drm/i915/selftests/igt_spinner.c | 3 +- drivers/gpu/drm/i915/selftests/mock_request.c | 3 +- 15 files changed, 89 insertions(+), 75 deletions(-) create mode 100644 drivers/gpu/drm/i915/selftests/igt_gem_utils.c create mode 100644 drivers/gpu/drm/i915/selftests/igt_gem_utils.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index dd8d923aa1c6..58643373495c 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -193,6 +193,7 @@ i915-$(CONFIG_DRM_I915_SELFTEST) += \ selftests/i915_random.o \ selftests/i915_selftest.o \ selftests/igt_flush_test.o \ + selftests/igt_gem_utils.o \ selftests/igt_live_test.o \ selftests/igt_reset.o \ selftests/igt_spinner.o diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 9dece55a091c..dab3d30c9c73 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -29,6 +29,7 @@ #include "i915_selftest.h" #include "selftests/i915_random.h" #include "selftests/igt_flush_test.h" +#include "selftests/igt_gem_utils.h" #include "selftests/igt_reset.h" #include "selftests/igt_wedge_me.h" @@ -175,7 +176,7 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) if (err) goto unpin_vma; - rq = i915_request_alloc(engine, h->ctx); + rq = igt_request_alloc(h->ctx, engine); if (IS_ERR(rq)) { err = PTR_ERR(rq); goto unpin_hws; @@ -455,7 +456,7 @@ static int igt_reset_nop(void *arg) for (i = 0; i < 16; i++) { struct i915_request *rq; - rq = i915_request_alloc(engine, ctx); + rq = igt_request_alloc(ctx, engine); if (IS_ERR(rq)) { err = PTR_ERR(rq); break; @@ -554,7 +555,7 @@ static int igt_reset_nop_engine(void *arg) for (i = 0; i < 16; i++) { struct i915_request *rq; - rq = i915_request_alloc(engine, ctx); + rq = igt_request_alloc(ctx, engine); if (IS_ERR(rq)) { err = PTR_ERR(rq); break; @@ -800,7 +801,7 @@ static int active_engine(void *data) struct i915_request *new; mutex_lock(&engine->i915->drm.struct_mutex); - new = i915_request_alloc(engine, ctx[idx]); + new = igt_request_alloc(ctx[idx], engine); if (IS_ERR(new)) { mutex_unlock(&engine->i915->drm.struct_mutex); err = PTR_ERR(new); diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index cd0551f97c2f..84538f69185b 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -10,6 +10,7 @@ #include "i915_selftest.h" #include "selftests/i915_random.h" #include "selftests/igt_flush_test.h" +#include "selftests/igt_gem_utils.h" #include "selftests/igt_live_test.h" #include "selftests/igt_spinner.h" #include "selftests/mock_context.h" @@ -148,7 +149,7 @@ static int live_busywait_preempt(void *arg) * fails, we hang instead. */ - lo = i915_request_alloc(engine, ctx_lo); + lo = igt_request_alloc(ctx_lo, engine); if (IS_ERR(lo)) { err = PTR_ERR(lo); goto err_vma; @@ -192,7 +193,7 @@ static int live_busywait_preempt(void *arg)
[Intel-gfx] [PATCH 16/45] drm/i915: Allow a context to define its set of engines
Over the last few years, we have debated how to extend the user API to support an increase in the number of engines, that may be sparse and even be heterogeneous within a class (not all video decoders created equal). We settled on using (class, instance) tuples to identify a specific engine, with an API for the user to construct a map of engines to capabilities. Into this picture, we then add a challenge of virtual engines; one user engine that maps behind the scenes to any number of physical engines. To keep it general, we want the user to have full control over that mapping. To that end, we allow the user to constrain a context to define the set of engines that it can access, order fully controlled by the user via (class, instance). With such precise control in context setup, we can continue to use the existing execbuf uABI of specifying a single index; only now it doesn't automagically map onto the engines, it uses the user defined engine map from the context. v2: Fixup freeing of local on success of get_engines() v3: Allow empty engines[] v4: s/nengine/num_engines/ v5: Replace 64 limit on num_engines with a note that execbuf is currently limited to only using the first 64 engines. v6: Actually use the engines_mutex to guard the ctx->engines. Testcase: igt/gem_ctx_engines Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_gem_context.c | 219 +- drivers/gpu/drm/i915/i915_gem_context.h | 18 ++ drivers/gpu/drm/i915/i915_gem_context_types.h | 1 + drivers/gpu/drm/i915/i915_gem_execbuffer.c| 5 +- drivers/gpu/drm/i915/i915_utils.h | 36 +++ include/uapi/drm/i915_drm.h | 31 +++ 6 files changed, 303 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 217d4fe0349d..b4b7a2eee1c9 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -90,7 +90,6 @@ #include #include "gt/intel_lrc_reg.h" -#include "gt/intel_workarounds.h" #include "i915_drv.h" #include "i915_globals.h" @@ -143,13 +142,17 @@ static void lut_close(struct i915_gem_context *ctx) static struct intel_context * lookup_user_engine(struct i915_gem_context *ctx, u16 class, u16 instance) { - struct intel_engine_cs *engine; + if (!i915_gem_context_user_engines(ctx)) { + struct intel_engine_cs *engine; - engine = intel_engine_lookup_user(ctx->i915, class, instance); - if (!engine) - return ERR_PTR(-EINVAL); + engine = intel_engine_lookup_user(ctx->i915, class, instance); + if (!engine) + return ERR_PTR(-EINVAL); + + instance = engine->id; + } - return i915_gem_context_get_engine(ctx, engine->id); + return i915_gem_context_get_engine(ctx, instance); } static inline int new_hw_id(struct drm_i915_private *i915, gfp_t gfp) @@ -257,6 +260,17 @@ static void free_engines(struct i915_gem_engines *e) __free_engines(e, e->num_engines); } +static void free_engines_rcu(struct work_struct *wrk) +{ + struct i915_gem_engines *e = + container_of(wrk, struct i915_gem_engines, rcu.work); + struct drm_i915_private *i915 = e->i915; + + mutex_lock(&i915->drm.struct_mutex); + free_engines(e); + mutex_unlock(&i915->drm.struct_mutex); +} + static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx) { struct intel_engine_cs *engine; @@ -1382,6 +1396,191 @@ static int set_sseu(struct i915_gem_context *ctx, return ret; } +struct set_engines { + struct i915_gem_context *ctx; + struct i915_gem_engines *engines; +}; + +static const i915_user_extension_fn set_engines__extensions[] = { +}; + +static int +set_engines(struct i915_gem_context *ctx, + const struct drm_i915_gem_context_param *args) +{ + struct i915_context_param_engines __user *user = + u64_to_user_ptr(args->value); + struct set_engines set = { .ctx = ctx }; + unsigned int num_engines, n; + u64 extensions; + int err; + + if (!args->size) { /* switch back to legacy user_ring_map */ + if (!i915_gem_context_user_engines(ctx)) + return 0; + + set.engines = default_engines(ctx); + if (IS_ERR(set.engines)) + return PTR_ERR(set.engines); + + goto replace; + } + + BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->engines))); + if (args->size < sizeof(*user) || + !IS_ALIGNED(args->size, sizeof(*user->engines))) { + DRM_DEBUG("Invalid size for engine array: %d\n", + args->size); + return -EINVAL; + } + + /* +* Note that I915_EXEC_RING_MASK limits execbuf to only using th
[Intel-gfx] [PATCH 12/45] drm/i915: Move the engine->destroy() vfunc onto the engine
Make the engine responsible for cleaning itself up! Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine.h | 4 ++ drivers/gpu/drm/i915/gt/intel_engine_cs.c| 63 ++-- drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 30 -- drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 35 +-- drivers/gpu/drm/i915/i915_drv.h | 6 -- drivers/gpu/drm/i915/i915_gem.c | 19 +- 7 files changed, 66 insertions(+), 93 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 3e53f53bc52b..f5b0f27cecb6 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -362,7 +362,11 @@ __intel_ring_space(unsigned int head, unsigned int tail, unsigned int size) return (head - tail - CACHELINE_BYTES) & (size - 1); } +int intel_engines_init_mmio(struct drm_i915_private *i915); int intel_engines_setup(struct drm_i915_private *i915); +int intel_engines_init(struct drm_i915_private *i915); +void intel_engines_cleanup(struct drm_i915_private *i915); + int intel_engine_init_common(struct intel_engine_cs *engine); void intel_engine_cleanup_common(struct intel_engine_cs *engine); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 4e30f3720eb7..65cdd0c4accc 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -303,6 +303,12 @@ intel_engine_setup(struct drm_i915_private *dev_priv, engine->class = info->class; engine->instance = info->instance; + /* +* To be overridden by the backend on setup, however to facilitate +* cleanup on error during setup we always provide the destroy vfunc. +*/ + engine->destroy = (typeof(engine->destroy))kfree; + engine->uabi_class = intel_engine_classes[info->class].uabi_class; engine->context_size = __intel_engine_context_size(dev_priv, @@ -327,18 +333,31 @@ intel_engine_setup(struct drm_i915_private *dev_priv, return 0; } +/** + * intel_engines_cleanup() - free the resources allocated for Command Streamers + * @dev_priv: i915 device private + */ +void intel_engines_cleanup(struct drm_i915_private *i915) +{ + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, i915, id) { + engine->destroy(engine); + i915->engine[id] = NULL; + } +} + /** * intel_engines_init_mmio() - allocate and prepare the Engine Command Streamers * @dev_priv: i915 device private * * Return: non-zero if the initialization failed. */ -int intel_engines_init_mmio(struct drm_i915_private *dev_priv) +int intel_engines_init_mmio(struct drm_i915_private *i915) { - struct intel_device_info *device_info = mkwrite_device_info(dev_priv); - const unsigned int engine_mask = INTEL_INFO(dev_priv)->engine_mask; - struct intel_engine_cs *engine; - enum intel_engine_id id; + struct intel_device_info *device_info = mkwrite_device_info(i915); + const unsigned int engine_mask = INTEL_INFO(i915)->engine_mask; unsigned int mask = 0; unsigned int i; int err; @@ -351,10 +370,10 @@ int intel_engines_init_mmio(struct drm_i915_private *dev_priv) return -ENODEV; for (i = 0; i < ARRAY_SIZE(intel_engines); i++) { - if (!HAS_ENGINE(dev_priv, i)) + if (!HAS_ENGINE(i915, i)) continue; - err = intel_engine_setup(dev_priv, i); + err = intel_engine_setup(i915, i); if (err) goto cleanup; @@ -370,20 +389,19 @@ int intel_engines_init_mmio(struct drm_i915_private *dev_priv) device_info->engine_mask = mask; /* We always presume we have at least RCS available for later probing */ - if (WARN_ON(!HAS_ENGINE(dev_priv, RCS0))) { + if (WARN_ON(!HAS_ENGINE(i915, RCS0))) { err = -ENODEV; goto cleanup; } - RUNTIME_INFO(dev_priv)->num_engines = hweight32(mask); + RUNTIME_INFO(i915)->num_engines = hweight32(mask); - i915_check_and_clear_faults(dev_priv); + i915_check_and_clear_faults(i915); return 0; cleanup: - for_each_engine(engine, dev_priv, id) - kfree(engine); + intel_engines_cleanup(i915); return err; } @@ -397,7 +415,7 @@ int intel_engines_init(struct drm_i915_private *i915) { int (*init)(struct intel_engine_cs *engine); struct intel_engine_cs *engine; - enum intel_engine_id id, err_id; + enum intel_engine_id id; int err; if (HAS_EXECLISTS(i915)) @@ -406,8 +424,6 @@ int intel_engines_init(struct drm_i915_private *i915) init = intel_ring
[Intel-gfx] [PATCH 14/45] drm/i915: Make engine_mask & num_engines static
Having removed the urge to modify the engine_mask at runtime, we can promote the num_engines from a runtime calculation to a static and push it into the device_info tables. Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: Jani Nikula --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 -- drivers/gpu/drm/i915/gt/intel_ringbuffer.c| 2 +- drivers/gpu/drm/i915/gt/selftest_lrc.c| 4 +- drivers/gpu/drm/i915/i915_pci.c | 45 +-- drivers/gpu/drm/i915/intel_device_info.h | 3 +- .../gpu/drm/i915/selftests/i915_gem_context.c | 4 +- drivers/gpu/drm/i915/selftests/i915_request.c | 2 +- 7 files changed, 27 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 862cf1040f88..b2cd6c6785be 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -383,9 +383,6 @@ int intel_engines_init_mmio(struct drm_i915_private *i915) goto cleanup; } - RUNTIME_INFO(i915)->num_engines = - hweight32(INTEL_INFO(i915)->engine_mask); - i915_check_and_clear_faults(i915); return 0; diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c index 7a4804c47466..2edeedb748ed 100644 --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c @@ -1568,7 +1568,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags) struct intel_engine_cs *engine = rq->engine; enum intel_engine_id id; const int num_engines = - IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0; + IS_HSW_GT1(i915) ? INTEL_INFO(i915)->num_engines - 1 : 0; bool force_restore = false; int len; u32 *cs; diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 84538f69185b..6f223f7facde 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1185,7 +1185,7 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags) pr_info("Submitted %lu crescendo:%x requests across %d engines and %d contexts\n", count, flags, - RUNTIME_INFO(smoke->i915)->num_engines, smoke->ncontext); + INTEL_INFO(smoke->i915)->num_engines, smoke->ncontext); return 0; } @@ -1213,7 +1213,7 @@ static int smoke_random(struct preempt_smoke *smoke, unsigned int flags) pr_info("Submitted %lu random:%x requests across %d engines and %d contexts\n", count, flags, - RUNTIME_INFO(smoke->i915)->num_engines, smoke->ncontext); + INTEL_INFO(smoke->i915)->num_engines, smoke->ncontext); return 0; } diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index ffa2ee70a03d..431a4a2c20e1 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -35,6 +35,7 @@ #define PLATFORM(x) .platform = (x) #define GEN(x) .gen = (x), .gen_mask = BIT((x) - 1) +#define ENGINES(x) .engine_mask = (x), .num_engines = hweight8(x) #define I845_PIPE_OFFSETS \ .pipe_offsets = { \ @@ -145,6 +146,7 @@ #define I830_FEATURES \ GEN(2), \ + ENGINES(BIT(RCS0)), \ .is_mobile = 1, \ .num_pipes = 2, \ .display.has_overlay = 1, \ @@ -154,7 +156,6 @@ .gpu_reset_clobbers_display = true, \ .hws_needs_physical = 1, \ .unfenced_needs_alignment = 1, \ - .engine_mask = BIT(RCS0), \ .has_snoop = true, \ .has_coherent_ggtt = false, \ I9XX_PIPE_OFFSETS, \ @@ -164,6 +165,7 @@ #define I845_FEATURES \ GEN(2), \ + ENGINES(BIT(RCS0)), \ .num_pipes = 1, \ .display.has_overlay = 1, \ .display.overlay_needs_physical = 1, \ @@ -171,7 +173,6 @@ .gpu_reset_clobbers_display = true, \ .hws_needs_physical = 1, \ .unfenced_needs_alignment = 1, \ - .engine_mask = BIT(RCS0), \ .has_snoop = true, \ .has_coherent_ggtt = false, \ I845_PIPE_OFFSETS, \ @@ -202,10 +203,10 @@ static const struct intel_device_info intel_i865g_info = { #define GEN3_FEATURES \ GEN(3), \ + ENGINES(BIT(RCS0)), \ .num_pipes = 2, \ .display.has_gmch = 1, \ .gpu_reset_clobbers_display = true, \ - .engine_mask = BIT(RCS0), \ .has_snoop = true, \ .has_coherent_ggtt = true, \ I9XX_PIPE_OFFSETS, \ @@ -286,11 +287,11 @@ static const struct intel_device_info intel_pineview_m_info = { #define GEN4_FEATURES \ GEN(4), \ + ENGINES(BIT(RCS0)), \ .num_pipes = 2, \ .display.has_hotplug = 1, \ .display.has_gmch = 1, \ .gpu_reset_clobbers_display = true, \ - .engine_mask = BIT(RCS0), \ .
[Intel-gfx] [PATCH 27/45] drm/i915: Move shmem object setup to its own file
Split the plain old shmem object into its own file to start decluttering i915_gem.c v2: Lose the confusing, hysterical raisins, suffix of _gtt. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 3 +- drivers/gpu/drm/i915/gem/i915_gem_object.c| 298 +++ drivers/gpu/drm/i915/gem/i915_gem_object.h| 41 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 512 +++ drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +- drivers/gpu/drm/i915/gt/intel_ringbuffer.c| 2 +- drivers/gpu/drm/i915/gvt/cmd_parser.c | 21 +- drivers/gpu/drm/i915/i915_drv.h | 10 - drivers/gpu/drm/i915/i915_gem.c | 826 +- drivers/gpu/drm/i915/i915_perf.c | 2 +- drivers/gpu/drm/i915/intel_fbdev.c| 2 +- drivers/gpu/drm/i915/intel_guc.c | 2 +- drivers/gpu/drm/i915/intel_uc_fw.c| 3 +- drivers/gpu/drm/i915/selftests/huge_pages.c | 6 +- .../gpu/drm/i915/selftests/i915_gem_dmabuf.c | 8 +- .../gpu/drm/i915/selftests/i915_gem_object.c | 4 +- 16 files changed, 883 insertions(+), 861 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_shmem.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index deac659ddef0..447143fadb05 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -87,7 +87,8 @@ i915-y += $(gt-y) # GEM (Graphics Execution Management) code obj-y += gem/ gem-y += \ - gem/i915_gem_object.o + gem/i915_gem_object.o \ + gem/i915_gem_shmem.o i915-y += \ $(gem-y) \ i915_active.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 8179252bb39b..86e7e88817af 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -26,6 +26,7 @@ #include "../i915_drv.h" #include "../i915_globals.h" +#include "../intel_frontbuffer.h" static struct i915_global_object { struct i915_global base; @@ -42,6 +43,64 @@ void i915_gem_object_free(struct drm_i915_gem_object *obj) return kmem_cache_free(global.slab_objects, obj); } +/* some bookkeeping */ +static void i915_gem_info_add_obj(struct drm_i915_private *i915, + u64 size) +{ + spin_lock(&i915->mm.object_stat_lock); + i915->mm.object_count++; + i915->mm.object_memory += size; + spin_unlock(&i915->mm.object_stat_lock); +} + +static void i915_gem_info_remove_obj(struct drm_i915_private *i915, +u64 size) +{ + spin_lock(&i915->mm.object_stat_lock); + i915->mm.object_count--; + i915->mm.object_memory -= size; + spin_unlock(&i915->mm.object_stat_lock); +} + +static void +frontbuffer_retire(struct i915_active_request *active, + struct i915_request *request) +{ + struct drm_i915_gem_object *obj = + container_of(active, typeof(*obj), frontbuffer_write); + + intel_fb_obj_flush(obj, ORIGIN_CS); +} + +void i915_gem_object_init(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_object_ops *ops) +{ + mutex_init(&obj->mm.lock); + + spin_lock_init(&obj->vma.lock); + INIT_LIST_HEAD(&obj->vma.list); + + INIT_LIST_HEAD(&obj->lut_list); + INIT_LIST_HEAD(&obj->batch_pool_link); + + init_rcu_head(&obj->rcu); + + obj->ops = ops; + + reservation_object_init(&obj->__builtin_resv); + obj->resv = &obj->__builtin_resv; + + obj->frontbuffer_ggtt_origin = ORIGIN_GTT; + i915_active_request_init(&obj->frontbuffer_write, +NULL, frontbuffer_retire); + + obj->mm.madv = I915_MADV_WILLNEED; + INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN); + mutex_init(&obj->mm.get_page.lock); + + i915_gem_info_add_obj(to_i915(obj->base.dev), obj->base.size); +} + /** * Mark up the object's coherency levels for a given cache_level * @obj: #drm_i915_gem_object @@ -64,6 +123,245 @@ void i915_gem_object_set_cache_coherency(struct drm_i915_gem_object *obj, !(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE); } +void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file) +{ + struct drm_i915_private *i915 = to_i915(gem->dev); + struct drm_i915_gem_object *obj = to_intel_bo(gem); + struct drm_i915_file_private *fpriv = file->driver_priv; + struct i915_lut_handle *lut, *ln; + + mutex_lock(&i915->drm.struct_mutex); + + list_for_each_entry_safe(lut, ln, &obj->lut_list, obj_link) { + struct i915_gem_context *ctx = lut->ctx; + struct i915_vma *vma; + + GEM_BUG_ON(ctx->file_priv == ERR_PTR(-EBADF)); + if (ctx->file_priv != fpriv) + conti
[Intel-gfx] [PATCH 32/45] drm/i915: Pull scatterlist utils out of i915_gem.h
Out scatterlist utility routines can be pulled out of i915_gem.h for a bit more decluttering. v2: Push I915_GTT_PAGE_SIZE out of i915_scatterlist itself and into the caller. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 1 + drivers/gpu/drm/i915/gem/i915_gem_internal.c | 1 + drivers/gpu/drm/i915/gem/i915_gem_pages.c | 1 + drivers/gpu/drm/i915/gem/i915_gem_phys.c | 1 + drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 1 + drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 1 + .../drm/i915/gem/selftests/huge_gem_object.c | 2 + .../drm/i915/gem/selftests/i915_gem_dmabuf.c | 2 + drivers/gpu/drm/i915/i915_drv.h | 110 --- drivers/gpu/drm/i915/i915_gem.c | 30 + drivers/gpu/drm/i915/i915_gem_fence_reg.c | 2 + drivers/gpu/drm/i915/i915_gem_gtt.c | 3 +- drivers/gpu/drm/i915/i915_gem_gtt.h | 4 +- drivers/gpu/drm/i915/i915_gpu_error.c | 1 + drivers/gpu/drm/i915/i915_scatterlist.c | 39 ++ drivers/gpu/drm/i915/i915_scatterlist.h | 127 ++ drivers/gpu/drm/i915/selftests/i915_vma.c | 1 + drivers/gpu/drm/i915/selftests/scatterlist.c | 1 + 19 files changed, 188 insertions(+), 141 deletions(-) create mode 100644 drivers/gpu/drm/i915/i915_scatterlist.c create mode 100644 drivers/gpu/drm/i915/i915_scatterlist.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index cfefb544b613..8e70d5972195 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -44,6 +44,7 @@ i915-y += i915_drv.o \ i915_irq.o \ i915_params.o \ i915_pci.o \ + i915_scatterlist.o \ i915_suspend.o \ i915_sysfs.o \ intel_csr.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 9b75dd8c267d..4e7efe159531 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -11,6 +11,7 @@ #include "i915_gem_object.h" #include "../i915_drv.h" +#include "../i915_scatterlist.h" static struct drm_i915_gem_object *dma_buf_to_obj(struct dma_buf *buf) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c index d40adb3bbe29..d072d0cbce06 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c @@ -14,6 +14,7 @@ #include "../i915_drv.h" #include "../i915_gem.h" +#include "../i915_scatterlist.h" #include "../i915_utils.h" #define QUIET (__GFP_NORETRY | __GFP_NOWARN) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 452aba0a3359..21635d0ac95e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -7,6 +7,7 @@ #include "i915_gem_object.h" #include "../i915_drv.h" +#include "../i915_scatterlist.h" void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, struct sg_table *pages, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c b/drivers/gpu/drm/i915/gem/i915_gem_phys.c index 1bf3e0afcba2..22d185301578 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c @@ -16,6 +16,7 @@ #include "i915_gem_object.h" #include "../i915_drv.h" +#include "../i915_scatterlist.h" static int i915_gem_object_get_pages_phys(struct drm_i915_gem_object *obj) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 78df86d1db7a..2eee1af88bcb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -10,6 +10,7 @@ #include "i915_gem_object.h" #include "../i915_drv.h" +#include "../i915_scatterlist.h" /* * Move pages to appropriate lru and release the pagevec, decrementing the diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 007a2b8cac8b..0137baa409e1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -15,6 +15,7 @@ #include "i915_gem_ioctls.h" #include "i915_gem_object.h" +#include "../i915_scatterlist.h" #include "../i915_trace.h" #include "../intel_drv.h" diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c index 824f3761314c..c03781f8b435 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_gem_object.c @@ -4,6 +4,8 @@ * Copyright © 2016 Intel Corporation */ +#include "../../i915_scatterlist.h" + #include "huge_gem_object.h" static void huge_free_pages(struct drm_i915_gem_object *obj, diff --git a/drivers/gpu/
[Intel-gfx] [PATCH 05/45] drm/i915/selftests: Pass around intel_context for sseu
Combine the (i915_gem_context, intel_engine) into a single parameter, the intel_context for convenience and later simplification. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- .../gpu/drm/i915/selftests/i915_gem_context.c | 74 +++ 1 file changed, 44 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c index 807644ae6877..8e2a94333559 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c @@ -755,8 +755,7 @@ static struct i915_vma *rpcs_query_batch(struct i915_vma *vma) static int emit_rpcs_query(struct drm_i915_gem_object *obj, - struct i915_gem_context *ctx, - struct intel_engine_cs *engine, + struct intel_context *ce, struct i915_request **rq_out) { struct i915_request *rq; @@ -764,9 +763,9 @@ emit_rpcs_query(struct drm_i915_gem_object *obj, struct i915_vma *vma; int err; - GEM_BUG_ON(!intel_engine_can_store_dword(engine)); + GEM_BUG_ON(!intel_engine_can_store_dword(ce->engine)); - vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL); + vma = i915_vma_instance(obj, &ce->gem_context->ppgtt->vm, NULL); if (IS_ERR(vma)) return PTR_ERR(vma); @@ -784,13 +783,15 @@ emit_rpcs_query(struct drm_i915_gem_object *obj, goto err_vma; } - rq = i915_request_alloc(engine, ctx); + rq = i915_request_create(ce); if (IS_ERR(rq)) { err = PTR_ERR(rq); goto err_batch; } - err = engine->emit_bb_start(rq, batch->node.start, batch->node.size, 0); + err = rq->engine->emit_bb_start(rq, + batch->node.start, batch->node.size, + 0); if (err) goto err_request; @@ -834,8 +835,7 @@ static int __sseu_prepare(struct drm_i915_private *i915, const char *name, unsigned int flags, - struct i915_gem_context *ctx, - struct intel_engine_cs *engine, + struct intel_context *ce, struct igt_spinner **spin) { struct i915_request *rq; @@ -853,7 +853,10 @@ __sseu_prepare(struct drm_i915_private *i915, if (ret) goto err_free; - rq = igt_spinner_create_request(*spin, ctx, engine, MI_NOOP); + rq = igt_spinner_create_request(*spin, + ce->gem_context, + ce->engine, + MI_NOOP); if (IS_ERR(rq)) { ret = PTR_ERR(rq); goto err_fini; @@ -880,8 +883,7 @@ __sseu_prepare(struct drm_i915_private *i915, static int __read_slice_count(struct drm_i915_private *i915, - struct i915_gem_context *ctx, - struct intel_engine_cs *engine, + struct intel_context *ce, struct drm_i915_gem_object *obj, struct igt_spinner *spin, u32 *rpcs) @@ -892,7 +894,7 @@ __read_slice_count(struct drm_i915_private *i915, u32 *buf, val; long ret; - ret = emit_rpcs_query(obj, ctx, engine, &rq); + ret = emit_rpcs_query(obj, ce, &rq); if (ret) return ret; @@ -956,29 +958,28 @@ static int __sseu_finish(struct drm_i915_private *i915, const char *name, unsigned int flags, - struct i915_gem_context *ctx, - struct intel_engine_cs *engine, + struct intel_context *ce, struct drm_i915_gem_object *obj, unsigned int expected, struct igt_spinner *spin) { - unsigned int slices = hweight32(engine->sseu.slice_mask); + unsigned int slices = hweight32(ce->engine->sseu.slice_mask); u32 rpcs = 0; int ret = 0; if (flags & TEST_RESET) { - ret = i915_reset_engine(engine, "sseu"); + ret = i915_reset_engine(ce->engine, "sseu"); if (ret) goto out; } - ret = __read_slice_count(i915, ctx, engine, obj, + ret = __read_slice_count(i915, ce, obj, flags & TEST_RESET ? NULL : spin, &rpcs); ret = __check_rpcs(name, rpcs, ret, expected, "Context", "!"); if (ret) goto out; - ret = __read_slice_count(i915, i915->kernel_context, engine, obj, + ret = __read_slice_count(i915, ce->engine->kernel_context, obj, NULL, &rpcs); ret = __check_rpcs(name, rpcs, ret, slices, "Kernel context", "!"); @@ -993,7 +994,7 @@ __sseu_finish(struct drm_i915_private *i915, if (ret) return ret; -
[Intel-gfx] [PATCH 04/45] drm/i915/selftests: Use the real kernel context for sseu isolation tests
Simply the setup slightly for the sseu selftests to use the actual kernel_context. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- .../gpu/drm/i915/selftests/i915_gem_context.c | 17 - 1 file changed, 4 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c index 71d896bbade2..807644ae6877 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c @@ -957,7 +957,6 @@ __sseu_finish(struct drm_i915_private *i915, const char *name, unsigned int flags, struct i915_gem_context *ctx, - struct i915_gem_context *kctx, struct intel_engine_cs *engine, struct drm_i915_gem_object *obj, unsigned int expected, @@ -979,7 +978,8 @@ __sseu_finish(struct drm_i915_private *i915, if (ret) goto out; - ret = __read_slice_count(i915, kctx, engine, obj, NULL, &rpcs); + ret = __read_slice_count(i915, i915->kernel_context, engine, obj, +NULL, &rpcs); ret = __check_rpcs(name, rpcs, ret, slices, "Kernel context", "!"); out: @@ -1011,22 +1011,17 @@ __sseu_test(struct drm_i915_private *i915, struct intel_sseu sseu) { struct igt_spinner *spin = NULL; - struct i915_gem_context *kctx; int ret; - kctx = kernel_context(i915); - if (IS_ERR(kctx)) - return PTR_ERR(kctx); - ret = __sseu_prepare(i915, name, flags, ctx, engine, &spin); if (ret) - goto out_context; + return ret; ret = __i915_gem_context_reconfigure_sseu(ctx, engine, sseu); if (ret) goto out_spin; - ret = __sseu_finish(i915, name, flags, ctx, kctx, engine, obj, + ret = __sseu_finish(i915, name, flags, ctx, engine, obj, hweight32(sseu.slice_mask), spin); out_spin: @@ -1035,10 +1030,6 @@ __sseu_test(struct drm_i915_private *i915, igt_spinner_fini(spin); kfree(spin); } - -out_context: - kernel_context_close(kctx); - return ret; } -- 2.20.1 ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [PATCH 21/45] drm/i915: Extend execution fence to support a callback
In the next patch, we will want to configure the slave request depending on which physical engine the master request is executed on. For this, we introduce a callback from the execute fence to convey this information. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_request.c | 84 +++-- drivers/gpu/drm/i915/i915_request.h | 4 ++ 2 files changed, 83 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 06c2b89d0b0a..55357b4a5fff 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -38,6 +38,8 @@ struct execute_cb { struct list_head link; struct irq_work work; struct i915_sw_fence *fence; + void (*hook)(struct i915_request *rq, struct dma_fence *signal); + struct i915_request *signal; }; static struct i915_global_request { @@ -329,6 +331,17 @@ static void irq_execute_cb(struct irq_work *wrk) kmem_cache_free(global.slab_execute_cbs, cb); } +static void irq_execute_cb_hook(struct irq_work *wrk) +{ + struct execute_cb *cb = container_of(wrk, typeof(*cb), work); + + cb->hook(container_of(cb->fence, struct i915_request, submit), +&cb->signal->fence); + i915_request_put(cb->signal); + + irq_execute_cb(wrk); +} + static void __notify_execute_cb(struct i915_request *rq) { struct execute_cb *cb; @@ -355,14 +368,19 @@ static void __notify_execute_cb(struct i915_request *rq) } static int -i915_request_await_execution(struct i915_request *rq, -struct i915_request *signal, -gfp_t gfp) +__i915_request_await_execution(struct i915_request *rq, + struct i915_request *signal, + void (*hook)(struct i915_request *rq, + struct dma_fence *signal), + gfp_t gfp) { struct execute_cb *cb; - if (i915_request_is_active(signal)) + if (i915_request_is_active(signal)) { + if (hook) + hook(rq, &signal->fence); return 0; + } cb = kmem_cache_alloc(global.slab_execute_cbs, gfp); if (!cb) @@ -372,8 +390,18 @@ i915_request_await_execution(struct i915_request *rq, i915_sw_fence_await(cb->fence); init_irq_work(&cb->work, irq_execute_cb); + if (hook) { + cb->hook = hook; + cb->signal = i915_request_get(signal); + cb->work.func = irq_execute_cb_hook; + } + spin_lock_irq(&signal->lock); if (i915_request_is_active(signal)) { + if (hook) { + hook(rq, &signal->fence); + i915_request_put(signal); + } i915_sw_fence_complete(cb->fence); kmem_cache_free(global.slab_execute_cbs, cb); } else { @@ -802,7 +830,7 @@ emit_semaphore_wait(struct i915_request *to, return err; /* Only submit our spinner after the signaler is running! */ - err = i915_request_await_execution(to, from, gfp); + err = __i915_request_await_execution(to, from, NULL, gfp); if (err) return err; @@ -923,6 +951,52 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence) return 0; } +int +i915_request_await_execution(struct i915_request *rq, +struct dma_fence *fence, +void (*hook)(struct i915_request *rq, + struct dma_fence *signal)) +{ + struct dma_fence **child = &fence; + unsigned int nchild = 1; + int ret; + + if (dma_fence_is_array(fence)) { + struct dma_fence_array *array = to_dma_fence_array(fence); + + /* XXX Error for signal-on-any fence arrays */ + + child = array->fences; + nchild = array->num_fences; + GEM_BUG_ON(!nchild); + } + + do { + fence = *child++; + if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) + continue; + + /* +* We don't squash repeated fence dependencies here as we +* want to run our callback in all cases. +*/ + + if (dma_fence_is_i915(fence)) + ret = __i915_request_await_execution(rq, +to_request(fence), +hook, +I915_FENCE_GFP); + else + ret = i915_sw_fence_await_dma_fence(&rq->submit, fence, +
[Intel-gfx] [PATCH 18/45] drm/i915: Allow userspace to clone contexts on creation
A usecase arose out of handling context recovery in mesa, whereby they wish to recreate a context with fresh logical state but preserving all other details of the original. Currently, they create a new context and iterate over which bits they want to copy across, but it would much more convenient if they were able to just pass in a target context to clone during creation. This essentially extends the setparam during creation to pull the details from a target context instead of the user supplied parameters. The ideal here is that we don't expose control over anything more than can be obtained via CONTEXT_PARAM. That is userspace retains explicit control over all features, and this api is just convenience. For example, you could replace struct context_param p = { .param = CONTEXT_PARAM_VM }; param.ctx_id = old_id; gem_context_get_param(&p.param); new_id = gem_context_create(); param.ctx_id = new_id; gem_context_set_param(&p.param); gem_vm_destroy(param.value); /* drop the ref to VM_ID handle */ with struct create_ext_param p = { { .name = CONTEXT_CREATE_CLONE }, .clone_id = old_id, .flags = CLONE_FLAGS_VM } new_id = gem_context_create_ext(&p); and not have to worry about stray namespace pollution etc. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_gem_context.c | 206 include/uapi/drm/i915_drm.h | 15 ++ 2 files changed, 221 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index d6bea51050c0..ba7582d955d1 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -1682,8 +1682,214 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data) return ctx_setparam(arg->fpriv, arg->ctx, &local.param); } +static int clone_engines(struct i915_gem_context *dst, +struct i915_gem_context *src) +{ + struct i915_gem_engines *e = i915_gem_context_lock_engines(src); + struct i915_gem_engines *clone; + bool user_engines; + unsigned long n; + + clone = kmalloc(struct_size(e, engines, e->num_engines), GFP_KERNEL); + if (!clone) + goto err_unlock; + + clone->i915 = dst->i915; + for (n = 0; n < e->num_engines; n++) { + if (!e->engines[n]) { + clone->engines[n] = NULL; + continue; + } + + clone->engines[n] = + intel_context_create(dst, e->engines[n]->engine); + if (!clone->engines[n]) { + __free_engines(clone, n); + goto err_unlock; + } + } + clone->num_engines = n; + + user_engines = i915_gem_context_user_engines(src); + i915_gem_context_unlock_engines(src); + + free_engines(dst->engines); + RCU_INIT_POINTER(dst->engines, clone); + if (user_engines) + i915_gem_context_set_user_engines(dst); + else + i915_gem_context_clear_user_engines(dst); + return 0; + +err_unlock: + i915_gem_context_unlock_engines(src); + return -ENOMEM; +} + +static int clone_flags(struct i915_gem_context *dst, + struct i915_gem_context *src) +{ + dst->user_flags = src->user_flags; + return 0; +} + +static int clone_schedattr(struct i915_gem_context *dst, + struct i915_gem_context *src) +{ + dst->sched = src->sched; + return 0; +} + +static int clone_sseu(struct i915_gem_context *dst, + struct i915_gem_context *src) +{ + struct i915_gem_engines *e = i915_gem_context_lock_engines(src); + struct i915_gem_engines *clone; + unsigned long n; + int err; + + clone = dst->engines; /* no locking required; sole access */ + if (e->num_engines != clone->num_engines) { + err = -EINVAL; + goto unlock; + } + + for (n = 0; n < e->num_engines; n++) { + struct intel_context *ce = e->engines[n]; + + if (clone->engines[n]->engine->class != ce->engine->class) { + /* Must have compatible engine maps! */ + err = -EINVAL; + goto unlock; + } + + /* serialises with set_sseu */ + err = intel_context_lock_pinned(ce); + if (err) + goto unlock; + + clone->engines[n]->sseu = ce->sseu; + intel_context_unlock_pinned(ce); + } + + err = 0; +unlock: + i915_gem_context_unlock_engines(src); + return err; +} + +static int clone_timeline(struct i915_gem_context *dst, + struct i915_gem_context *src) +{ +
[Intel-gfx] [PATCH 20/45] drm/i915: Apply an execution_mask to the virtual_engine
Allow the user to direct which physical engines of the virtual engine they wish to execute one, as sometimes it is necessary to override the load balancing algorithm. v2: Only kick the virtual engines on context-out if required Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_lrc.c| 67 + drivers/gpu/drm/i915/gt/selftest_lrc.c | 131 + drivers/gpu/drm/i915/i915_request.c| 1 + drivers/gpu/drm/i915/i915_request.h| 3 + 4 files changed, 202 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index e8faf3417f9c..8acf9d1c6f2f 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -549,6 +549,15 @@ execlists_context_schedule_in(struct i915_request *rq) rq->hw_context->active = rq->engine; } +static void kick_siblings(struct i915_request *rq) +{ + struct virtual_engine *ve = to_virtual_engine(rq->hw_context->engine); + struct i915_request *next = READ_ONCE(ve->request); + + if (next && next->execution_mask & ~rq->execution_mask) + tasklet_schedule(&ve->base.execlists.tasklet); +} + static inline void execlists_context_schedule_out(struct i915_request *rq, unsigned long status) { @@ -556,6 +565,18 @@ execlists_context_schedule_out(struct i915_request *rq, unsigned long status) intel_engine_context_out(rq->engine); execlists_context_status_change(rq, status); trace_i915_request_out(rq); + + /* +* If this is part of a virtual engine, its next request may have +* been blocked waiting for access to the active context. We have +* to kick all the siblings again in case we need to switch (e.g. +* the next request is not runnable on this engine). Hopefully, +* we will already have submitted the next request before the +* tasklet runs and do not need to rebuild each virtual tree +* and kick everyone again. +*/ + if (rq->engine != rq->hw_context->engine) + kick_siblings(rq); } static u64 execlists_update_context(struct i915_request *rq) @@ -783,6 +804,9 @@ static bool virtual_matches(const struct virtual_engine *ve, { const struct intel_engine_cs *active; + if (!(rq->execution_mask & engine->mask)) /* We peeked too soon! */ + return false; + /* * We track when the HW has completed saving the context image * (i.e. when we have seen the final CS event switching out of @@ -3141,12 +3165,44 @@ static const struct intel_context_ops virtual_context_ops = { .destroy = virtual_context_destroy, }; +static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve) +{ + struct i915_request *rq; + intel_engine_mask_t mask; + + rq = READ_ONCE(ve->request); + if (!rq) + return 0; + + /* The rq is ready for submission; rq->execution_mask is now stable. */ + mask = rq->execution_mask; + if (unlikely(!mask)) { + /* Invalid selection, submit to a random engine in error */ + i915_request_skip(rq, -ENODEV); + mask = ve->siblings[0]->mask; + } + + GEM_TRACE("%s: rq=%llx:%lld, mask=%x, prio=%d\n", + ve->base.name, + rq->fence.context, rq->fence.seqno, + mask, ve->base.execlists.queue_priority_hint); + + return mask; +} + static void virtual_submission_tasklet(unsigned long data) { struct virtual_engine * const ve = (struct virtual_engine *)data; const int prio = ve->base.execlists.queue_priority_hint; + intel_engine_mask_t mask; unsigned int n; + rcu_read_lock(); + mask = virtual_submission_mask(ve); + rcu_read_unlock(); + if (unlikely(!mask)) + return; + local_irq_disable(); for (n = 0; READ_ONCE(ve->request) && n < ve->num_siblings; n++) { struct intel_engine_cs *sibling = ve->siblings[n]; @@ -3154,6 +3210,17 @@ static void virtual_submission_tasklet(unsigned long data) struct rb_node **parent, *rb; bool first; + if (unlikely(!(mask & sibling->mask))) { + if (!RB_EMPTY_NODE(&node->rb)) { + spin_lock(&sibling->timeline.lock); + rb_erase_cached(&node->rb, + &sibling->execlists.virtual); + RB_CLEAR_NODE(&node->rb); + spin_unlock(&sibling->timeline.lock); + } + continue; + } + spin_lock(&sibling->timeline.lock); if (!RB_EMPTY_NODE(&node->rb)) { diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index
[Intel-gfx] [PATCH 24/45] drm/i915: Split GEM object type definition to its own header
For convenience in avoiding inline spaghetti, keep the type definition as a separate header. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/Makefile | 1 + drivers/gpu/drm/i915/gem/Makefile.header-test | 16 + .../gpu/drm/i915/gem/i915_gem_object_types.h | 285 + drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + drivers/gpu/drm/i915/i915_drv.h | 3 +- drivers/gpu/drm/i915/i915_gem_batch_pool.h| 3 +- drivers/gpu/drm/i915/i915_gem_gtt.h | 1 + drivers/gpu/drm/i915/i915_gem_object.h| 295 +- 9 files changed, 312 insertions(+), 294 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/Makefile create mode 100644 drivers/gpu/drm/i915/gem/Makefile.header-test create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_object_types.h diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 58643373495c..def781c9ea69 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -85,6 +85,7 @@ gt-$(CONFIG_DRM_I915_SELFTEST) += \ i915-y += $(gt-y) # GEM (Graphics Execution Management) code +obj-y += gem/ i915-y += \ i915_active.o \ i915_cmd_parser.o \ diff --git a/drivers/gpu/drm/i915/gem/Makefile b/drivers/gpu/drm/i915/gem/Makefile new file mode 100644 index ..07e7b8b840ea --- /dev/null +++ b/drivers/gpu/drm/i915/gem/Makefile @@ -0,0 +1 @@ +include $(src)/Makefile.header-test # Extra header tests diff --git a/drivers/gpu/drm/i915/gem/Makefile.header-test b/drivers/gpu/drm/i915/gem/Makefile.header-test new file mode 100644 index ..61e06cbb4b32 --- /dev/null +++ b/drivers/gpu/drm/i915/gem/Makefile.header-test @@ -0,0 +1,16 @@ +# SPDX-License-Identifier: MIT +# Copyright © 2019 Intel Corporation + +# Test the headers are compilable as standalone units +header_test := $(notdir $(wildcard $(src)/*.h)) + +quiet_cmd_header_test = HDRTEST $@ + cmd_header_test = echo "\#include \"$( $@ + +header_test_%.c: %.h + $(call cmd,header_test) + +extra-$(CONFIG_DRM_I915_WERROR) += \ + $(foreach h,$(header_test),$(patsubst %.h,header_test_%.o,$(h))) + +clean-files += $(foreach h,$(header_test),$(patsubst %.h,header_test_%.c,$(h))) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h new file mode 100644 index ..e4b50944f553 --- /dev/null +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -0,0 +1,285 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2016 Intel Corporation + */ + +#ifndef __I915_GEM_OBJECT_TYPES_H__ +#define __I915_GEM_OBJECT_TYPES_H__ + +#include + +#include + +#include "../i915_active.h" +#include "../i915_selftest.h" + +struct drm_i915_gem_object; + +/* + * struct i915_lut_handle tracks the fast lookups from handle to vma used + * for execbuf. Although we use a radixtree for that mapping, in order to + * remove them as the object or context is closed, we need a secondary list + * and a translation entry (i915_lut_handle). + */ +struct i915_lut_handle { + struct list_head obj_link; + struct list_head ctx_link; + struct i915_gem_context *ctx; + u32 handle; +}; + +struct drm_i915_gem_object_ops { + unsigned int flags; +#define I915_GEM_OBJECT_HAS_STRUCT_PAGEBIT(0) +#define I915_GEM_OBJECT_IS_SHRINKABLE BIT(1) +#define I915_GEM_OBJECT_IS_PROXY BIT(2) +#define I915_GEM_OBJECT_ASYNC_CANCEL BIT(3) + + /* Interface between the GEM object and its backing storage. +* get_pages() is called once prior to the use of the associated set +* of pages before to binding them into the GTT, and put_pages() is +* called after we no longer need them. As we expect there to be +* associated cost with migrating pages between the backing storage +* and making them available for the GPU (e.g. clflush), we may hold +* onto the pages after they are no longer referenced by the GPU +* in case they may be used again shortly (for example migrating the +* pages to a different memory domain within the GTT). put_pages() +* will therefore most likely be called when the object itself is +* being released or under memory pressure (where we attempt to +* reap pages for the shrinker). +*/ + int (*get_pages)(struct drm_i915_gem_object *obj); + void (*put_pages)(struct drm_i915_gem_object *obj, + struct sg_table *pages); + + int (*pwrite)(struct drm_i915_gem_object *obj, + const struct drm_i915_gem_pwrite *arg); + + int (*dmabuf_export)(struct drm_i915_gem_object *obj); + void (*release)(struct drm_i915_gem_object *obj); +}; + +struct drm_i915_gem_object { + struct drm_gem_object base; + + const struct drm_i915_gem_object_ops *ops; + +
[Intel-gfx] [PATCH 06/45] drm/i915: Pass intel_context to intel_context_pin_lock()
Move the intel_context_instance() to the caller so that we can decouple ourselves from one context instance per engine. v2: Rename pin_lock() to lock_pinned(), hopefully that is clearer. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_context.c | 26 -- drivers/gpu/drm/i915/gt/intel_context.h | 34 +-- drivers/gpu/drm/i915/i915_gem_context.c | 92 +++ .../gpu/drm/i915/selftests/i915_gem_context.c | 2 +- 4 files changed, 82 insertions(+), 72 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 8b386202b374..15ac99c5dd4a 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -128,32 +128,6 @@ intel_context_instance(struct i915_gem_context *ctx, return intel_context_get(pos); } -struct intel_context * -intel_context_pin_lock(struct i915_gem_context *ctx, - struct intel_engine_cs *engine) - __acquires(ce->pin_mutex) -{ - struct intel_context *ce; - - ce = intel_context_instance(ctx, engine); - if (IS_ERR(ce)) - return ce; - - if (mutex_lock_interruptible(&ce->pin_mutex)) { - intel_context_put(ce); - return ERR_PTR(-EINTR); - } - - return ce; -} - -void intel_context_pin_unlock(struct intel_context *ce) - __releases(ce->pin_mutex) -{ - mutex_unlock(&ce->pin_mutex); - intel_context_put(ce); -} - int __intel_context_do_pin(struct intel_context *ce) { int err; diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index b9a574587eb3..b746add6b71d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -31,25 +31,45 @@ intel_context_lookup(struct i915_gem_context *ctx, struct intel_engine_cs *engine); /** - * intel_context_pin_lock - Stablises the 'pinned' status of the HW context - * @ctx - the parent GEM context - * @engine - the target HW engine + * intel_context_lock_pinned - Stablises the 'pinned' status of the HW context + * @ce - the context * * Acquire a lock on the pinned status of the HW context, such that the context * can neither be bound to the GPU or unbound whilst the lock is held, i.e. * intel_context_is_pinned() remains stable. */ -struct intel_context * -intel_context_pin_lock(struct i915_gem_context *ctx, - struct intel_engine_cs *engine); +static inline int intel_context_lock_pinned(struct intel_context *ce) + __acquires(ce->pin_mutex) +{ + return mutex_lock_interruptible(&ce->pin_mutex); +} +/** + * intel_context_is_pinned - Reports the 'pinned' status + * @ce - the context + * + * While in use by the GPU, the context, along with its ring and page + * tables is pinned into memory and the GTT. + * + * Returns: true if the context is currently pinned for use by the GPU. + */ static inline bool intel_context_is_pinned(struct intel_context *ce) { return atomic_read(&ce->pin_count); } -void intel_context_pin_unlock(struct intel_context *ce); +/** + * intel_context_unlock_pinned - Releases the earlier locking of 'pinned' status + * @ce - the context + * + * Releases the lock earlier acquired by intel_context_unlock_pinned(). + */ +static inline void intel_context_unlock_pinned(struct intel_context *ce) + __releases(ce->pin_mutex) +{ + mutex_unlock(&ce->pin_mutex); +} struct intel_context * __intel_context_insert(struct i915_gem_context *ctx, diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c index 05496ea7a123..d9db3fea151c 100644 --- a/drivers/gpu/drm/i915/i915_gem_context.c +++ b/drivers/gpu/drm/i915/i915_gem_context.c @@ -141,6 +141,18 @@ static void lut_close(struct i915_gem_context *ctx) rcu_read_unlock(); } +static struct intel_context * +lookup_user_engine(struct i915_gem_context *ctx, u16 class, u16 instance) +{ + struct intel_engine_cs *engine; + + engine = intel_engine_lookup_user(ctx->i915, class, instance); + if (!engine) + return ERR_PTR(-EINVAL); + + return intel_context_instance(ctx, engine); +} + static inline int new_hw_id(struct drm_i915_private *i915, gfp_t gfp) { unsigned int max; @@ -1132,19 +1144,17 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu) } static int -__i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx, - struct intel_engine_cs *engine, - struct intel_sseu sseu) +__intel_context_reconfigure_sseu(struct intel_context *ce, +struct intel_sseu sseu) { - struct intel_context *ce; - int ret = 0; + int ret; - GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8); - GEM_BUG_ON(engine->id != RCS0); +
[Intel-gfx] [PATCH 22/45] drm/i915/execlists: Virtual engine bonding
Some users require that when a master batch is executed on one particular engine, a companion batch is run simultaneously on a specific slave engine. For this purpose, we introduce virtual engine bonding, allowing maps of master:slaves to be constructed to constrain which physical engines a virtual engine may select given a fence on a master engine. For the moment, we continue to ignore the issue of preemption deferring the master request for later. Ideally, we would like to then also remove the slave and run something else rather than have it stall the pipeline. With load balancing, we should be able to move workload around it, but there is a similar stall on the master pipeline while it may wait for the slave to be executed. At the cost of more latency for the bonded request, it may be interesting to launch both on their engines in lockstep. (Bubbles abound.) Opens: Also what about bonding an engine as its own master? It doesn't break anything internally, so allow the silliness. v2: Emancipate the bonds v3: Couple in delayed scheduling for the selftests v4: Handle invalid mutually exclusive bonding v5: Mention what the uapi does v6: s/nbond/num_bonds/ Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_engine_types.h | 7 + drivers/gpu/drm/i915/gt/intel_lrc.c | 98 + drivers/gpu/drm/i915/gt/intel_lrc.h | 4 + drivers/gpu/drm/i915/gt/selftest_lrc.c| 191 ++ drivers/gpu/drm/i915/i915_gem_context.c | 84 drivers/gpu/drm/i915/selftests/lib_sw_fence.c | 3 + include/uapi/drm/i915_drm.h | 44 7 files changed, 431 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index b07d3685e2cb..ca8d95a5708d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -405,6 +405,13 @@ struct intel_engine_cs { */ void(*submit_request)(struct i915_request *rq); + /* +* Called on signaling of a SUBMIT_FENCE, passing along the signaling +* request down to the bonded pairs. +*/ + void(*bond_execute)(struct i915_request *rq, + struct dma_fence *signal); + /* * Call when the priority on a request has changed and it and its * dependencies may need rescheduling. Note the request itself may diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 8acf9d1c6f2f..d793d976402d 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -191,6 +191,18 @@ struct virtual_engine { int prio; } nodes[I915_NUM_ENGINES]; + /* +* Keep track of bonded pairs -- restrictions upon on our selection +* of physical engines any particular request may be submitted to. +* If we receive a submit-fence from a master engine, we will only +* use one of sibling_mask physical engines. +*/ + struct ve_bond { + const struct intel_engine_cs *master; + intel_engine_mask_t sibling_mask; + } *bonds; + unsigned int num_bonds; + /* And finally, which physical engines this virtual engine maps onto. */ unsigned int num_siblings; struct intel_engine_cs *siblings[0]; @@ -982,6 +994,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) rb_erase_cached(rb, &execlists->virtual); RB_CLEAR_NODE(rb); + GEM_BUG_ON(!(rq->execution_mask & engine->mask)); rq->engine = engine; if (engine != ve->siblings[0]) { @@ -3093,6 +3106,8 @@ static void virtual_context_destroy(struct kref *kref) if (ve->context.state) __execlists_context_fini(&ve->context); + kfree(ve->bonds); + i915_timeline_fini(&ve->base.timeline); kfree(ve); } @@ -3288,6 +3303,38 @@ static void virtual_submit_request(struct i915_request *rq) tasklet_schedule(&ve->base.execlists.tasklet); } +static struct ve_bond * +virtual_find_bond(struct virtual_engine *ve, + const struct intel_engine_cs *master) +{ + int i; + + for (i = 0; i < ve->num_bonds; i++) { + if (ve->bonds[i].master == master) + return &ve->bonds[i]; + } + + return NULL; +} + +static void +virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) +{ + struct virtual_engine *ve = to_virtual_engine(rq->engine); + struct ve_bond *bond; + + bond = virtual_find_bond(ve, to_request(signal)->engine); + if (bond) { + intel_engine_mask_t old, new, cmp; + + cmp = READ_ONCE(rq->execution_mask); + do { +
[Intel-gfx] [PATCH 08/45] drm/i915: Switch back to an array of logical per-engine HW contexts
We switched to a tree of per-engine HW context to accommodate the introduction of virtual engines. However, we plan to also support multiple instances of the same engine within the GEM context, defeating our use of the engine as a key to looking up the HW context. Just allocate a logical per-engine instance and always use an index into the ctx->engines[]. Later on, this ctx->engines[] may be replaced by a user specified map. v2: Add for_each_gem_engine() helper to iterator within the engines lock v3: intel_context_create_request() helper v4: s/unsigned long/unsigned int/ 4 billion engines is quite enough. v5: Push iterator locking to caller Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_context.c | 112 -- drivers/gpu/drm/i915/gt/intel_context.h | 27 + drivers/gpu/drm/i915/gt/intel_context_types.h | 2 - drivers/gpu/drm/i915/gt/intel_engine_cs.c | 2 +- drivers/gpu/drm/i915/gt/mock_engine.c | 3 +- drivers/gpu/drm/i915/gvt/scheduler.c | 2 +- drivers/gpu/drm/i915/i915_gem.c | 24 ++-- drivers/gpu/drm/i915/i915_gem_context.c | 99 ++-- drivers/gpu/drm/i915/i915_gem_context.h | 58 + drivers/gpu/drm/i915/i915_gem_context_types.h | 40 ++- drivers/gpu/drm/i915/i915_gem_execbuffer.c| 70 +-- drivers/gpu/drm/i915/i915_perf.c | 80 +++-- drivers/gpu/drm/i915/i915_request.c | 15 +-- drivers/gpu/drm/i915/intel_guc_submission.c | 22 ++-- .../gpu/drm/i915/selftests/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/selftests/mock_context.c | 14 ++- 16 files changed, 331 insertions(+), 241 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 15ac99c5dd4a..5e506e648454 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -17,7 +17,7 @@ static struct i915_global_context { struct kmem_cache *slab_ce; } global; -struct intel_context *intel_context_alloc(void) +static struct intel_context *intel_context_alloc(void) { return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL); } @@ -28,104 +28,17 @@ void intel_context_free(struct intel_context *ce) } struct intel_context * -intel_context_lookup(struct i915_gem_context *ctx, +intel_context_create(struct i915_gem_context *ctx, struct intel_engine_cs *engine) { - struct intel_context *ce = NULL; - struct rb_node *p; - - spin_lock(&ctx->hw_contexts_lock); - p = ctx->hw_contexts.rb_node; - while (p) { - struct intel_context *this = - rb_entry(p, struct intel_context, node); - - if (this->engine == engine) { - GEM_BUG_ON(this->gem_context != ctx); - ce = this; - break; - } - - if (this->engine < engine) - p = p->rb_right; - else - p = p->rb_left; - } - spin_unlock(&ctx->hw_contexts_lock); - - return ce; -} - -struct intel_context * -__intel_context_insert(struct i915_gem_context *ctx, - struct intel_engine_cs *engine, - struct intel_context *ce) -{ - struct rb_node **p, *parent; - int err = 0; - - spin_lock(&ctx->hw_contexts_lock); - - parent = NULL; - p = &ctx->hw_contexts.rb_node; - while (*p) { - struct intel_context *this; - - parent = *p; - this = rb_entry(parent, struct intel_context, node); - - if (this->engine == engine) { - err = -EEXIST; - ce = this; - break; - } - - if (this->engine < engine) - p = &parent->rb_right; - else - p = &parent->rb_left; - } - if (!err) { - rb_link_node(&ce->node, parent, p); - rb_insert_color(&ce->node, &ctx->hw_contexts); - } - - spin_unlock(&ctx->hw_contexts_lock); - - return ce; -} - -void __intel_context_remove(struct intel_context *ce) -{ - struct i915_gem_context *ctx = ce->gem_context; - - spin_lock(&ctx->hw_contexts_lock); - rb_erase(&ce->node, &ctx->hw_contexts); - spin_unlock(&ctx->hw_contexts_lock); -} - -struct intel_context * -intel_context_instance(struct i915_gem_context *ctx, - struct intel_engine_cs *engine) -{ - struct intel_context *ce, *pos; - - ce = intel_context_lookup(ctx, engine); - if (likely(ce)) - return intel_context_get(ce); + struct intel_context *ce; ce = intel_context_alloc(); if (!ce) return ERR_PTR(-ENOMEM); intel_con
[Intel-gfx] [PATCH 01/45] drm/i915: Seal races between async GPU cancellation, retirement and signaling
Currently there is an underlying assumption that i915_request_unsubmit() is synchronous wrt the GPU -- that is the request is no longer in flight as we remove it. In the near future that may change, and this may upset our signaling as we can process an interrupt for that request while it is no longer in flight. CPU0CPU1 intel_engine_breadcrumbs_irq (queue request completion) i915_request_cancel_signaling ... ... i915_request_enable_signaling dma_fence_signal Hence in the time it took us to drop the lock to signal the request, a preemption event may have occurred and re-queued the request. In the process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and so reused the rq->signal_link that was in use on CPU0, leading to bad pointer chasing in intel_engine_breadcrumbs_irq. A related issue was that if someone started listening for a signal on a completed but no longer in-flight request, we missed the opportunity to immediately signal that request. Furthermore, as intel_contexts may be immediately released during request retirement, in order to be entirely sure that intel_engine_breadcrumbs_irq may no longer dereference the intel_context (ce->signals and ce->signal_link), we must wait for irq spinlock. In order to prevent the race, we use a bit in the fence.flags to signal the transfer onto the signal list inside intel_engine_breadcrumbs_irq. For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then quickly signals to any outside observer that the fence is indeed signaled. Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking") Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/dma-buf/dma-fence.c | 1 + drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 58 + drivers/gpu/drm/i915/i915_request.c | 1 + 3 files changed, 39 insertions(+), 21 deletions(-) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 3aa8733f832a..9bf06042619a 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -29,6 +29,7 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit); EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal); +EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled); static DEFINE_SPINLOCK(dma_fence_stub_lock); static struct dma_fence dma_fence_stub; diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 3cbffd400b1b..4283224249d4 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -23,6 +23,7 @@ */ #include +#include #include #include "i915_drv.h" @@ -83,6 +84,7 @@ static inline bool __request_completed(const struct i915_request *rq) void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) { struct intel_breadcrumbs *b = &engine->breadcrumbs; + const ktime_t timestamp = ktime_get(); struct intel_context *ce, *cn; struct list_head *pos, *next; LIST_HEAD(signal); @@ -104,6 +106,11 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)); + clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); + + if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, +&rq->fence.flags)) + continue; /* * Queue for execution after dropping the signaling @@ -111,14 +118,6 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) * more signalers to the same context or engine. */ i915_request_get(rq); - - /* -* We may race with direct invocation of -* dma_fence_signal(), e.g. i915_request_retire(), -* so we need to acquire our reference to the request -* before we cancel the breadcrumb. -*/ - clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); list_add_tail(&rq->signal_link, &signal); } @@ -140,8 +139,21 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) list_for_each_safe(pos, next, &signal) { struct i915_request *rq = list_entry(pos, typeof(*rq), signal_link); + struct dma_fence_cb *cur, *tmp; + + trace_dma_fence_signaled(&rq->fence); + + rq->fence.timestamp = timestamp; + set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &rq->fence.flags); + + spin_loc
[Intel-gfx] [PATCH 03/45] drm/i915: Export intel_context_instance()
We want to pass in a intel_context into intel_context_pin() and that requires us to first be able to lookup the intel_context! Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_context.c| 37 +++--- drivers/gpu/drm/i915/gt/intel_context.h| 19 +++ drivers/gpu/drm/i915/gt/intel_engine_cs.c | 8 - drivers/gpu/drm/i915/gt/mock_engine.c | 8 - drivers/gpu/drm/i915/gvt/scheduler.c | 7 +++- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +-- drivers/gpu/drm/i915/i915_perf.c | 21 drivers/gpu/drm/i915/i915_request.c| 11 ++- 8 files changed, 83 insertions(+), 39 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 298e463ad082..8b386202b374 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -104,7 +104,7 @@ void __intel_context_remove(struct intel_context *ce) spin_unlock(&ctx->hw_contexts_lock); } -static struct intel_context * +struct intel_context * intel_context_instance(struct i915_gem_context *ctx, struct intel_engine_cs *engine) { @@ -112,7 +112,7 @@ intel_context_instance(struct i915_gem_context *ctx, ce = intel_context_lookup(ctx, engine); if (likely(ce)) - return ce; + return intel_context_get(ce); ce = intel_context_alloc(); if (!ce) @@ -125,7 +125,7 @@ intel_context_instance(struct i915_gem_context *ctx, intel_context_free(ce); GEM_BUG_ON(intel_context_lookup(ctx, engine) != pos); - return pos; + return intel_context_get(pos); } struct intel_context * @@ -139,30 +139,30 @@ intel_context_pin_lock(struct i915_gem_context *ctx, if (IS_ERR(ce)) return ce; - if (mutex_lock_interruptible(&ce->pin_mutex)) + if (mutex_lock_interruptible(&ce->pin_mutex)) { + intel_context_put(ce); return ERR_PTR(-EINTR); + } return ce; } -struct intel_context * -intel_context_pin(struct i915_gem_context *ctx, - struct intel_engine_cs *engine) +void intel_context_pin_unlock(struct intel_context *ce) + __releases(ce->pin_mutex) { - struct intel_context *ce; - int err; - - ce = intel_context_instance(ctx, engine); - if (IS_ERR(ce)) - return ce; + mutex_unlock(&ce->pin_mutex); + intel_context_put(ce); +} - if (likely(atomic_inc_not_zero(&ce->pin_count))) - return ce; +int __intel_context_do_pin(struct intel_context *ce) +{ + int err; if (mutex_lock_interruptible(&ce->pin_mutex)) - return ERR_PTR(-EINTR); + return -EINTR; if (likely(!atomic_read(&ce->pin_count))) { + struct i915_gem_context *ctx = ce->gem_context; intel_wakeref_t wakeref; err = 0; @@ -172,7 +172,6 @@ intel_context_pin(struct i915_gem_context *ctx, goto err; i915_gem_context_get(ctx); - GEM_BUG_ON(ce->gem_context != ctx); mutex_lock(&ctx->mutex); list_add(&ce->active_link, &ctx->active_engines); @@ -186,11 +185,11 @@ intel_context_pin(struct i915_gem_context *ctx, GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */ mutex_unlock(&ce->pin_mutex); - return ce; + return 0; err: mutex_unlock(&ce->pin_mutex); - return ERR_PTR(err); + return err; } void intel_context_unpin(struct intel_context *ce) diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 60379eb37949..b9a574587eb3 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -49,11 +49,7 @@ intel_context_is_pinned(struct intel_context *ce) return atomic_read(&ce->pin_count); } -static inline void intel_context_pin_unlock(struct intel_context *ce) -__releases(ce->pin_mutex) -{ - mutex_unlock(&ce->pin_mutex); -} +void intel_context_pin_unlock(struct intel_context *ce); struct intel_context * __intel_context_insert(struct i915_gem_context *ctx, @@ -63,7 +59,18 @@ void __intel_context_remove(struct intel_context *ce); struct intel_context * -intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine); +intel_context_instance(struct i915_gem_context *ctx, + struct intel_engine_cs *engine); + +int __intel_context_do_pin(struct intel_context *ce); + +static inline int intel_context_pin(struct intel_context *ce) +{ + if (likely(atomic_inc_not_zero(&ce->pin_count))) + return 0; + + return __intel_context_do_pin(ce); +} static inline void __intel_context_pin(struct intel_context *ce) { diff --git a/drivers/gpu/drm/i91
[Intel-gfx] [PATCH 26/45] drm/i915: Move object->pages API to i915_gem_object.[ch]
Currently the code for manipulating the pages on an object is still residing in i915_gem.c, move it to i915_gem_object.c Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 4 +- .../gpu/drm/i915/{ => gem}/i915_gem_object.c | 5 +- .../gpu/drm/i915/{ => gem}/i915_gem_object.h | 127 +++- drivers/gpu/drm/i915/i915_drv.h | 137 +- drivers/gpu/drm/i915/i915_globals.c | 2 +- drivers/gpu/drm/i915/i915_vma.h | 2 +- drivers/gpu/drm/i915/intel_frontbuffer.h | 2 +- 7 files changed, 141 insertions(+), 138 deletions(-) rename drivers/gpu/drm/i915/{ => gem}/i915_gem_object.c (97%) rename drivers/gpu/drm/i915/{ => gem}/i915_gem_object.h (59%) diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index def781c9ea69..deac659ddef0 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -86,7 +86,10 @@ i915-y += $(gt-y) # GEM (Graphics Execution Management) code obj-y += gem/ +gem-y += \ + gem/i915_gem_object.o i915-y += \ + $(gem-y) \ i915_active.o \ i915_cmd_parser.o \ i915_gem_batch_pool.o \ @@ -99,7 +102,6 @@ i915-y += \ i915_gem_gtt.o \ i915_gem_internal.o \ i915_gem.o \ - i915_gem_object.o \ i915_gem_pm.o \ i915_gem_render_state.o \ i915_gem_shrinker.o \ diff --git a/drivers/gpu/drm/i915/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c similarity index 97% rename from drivers/gpu/drm/i915/i915_gem_object.c rename to drivers/gpu/drm/i915/gem/i915_gem_object.c index ac6a5ab84586..8179252bb39b 100644 --- a/drivers/gpu/drm/i915/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -22,9 +22,10 @@ * */ -#include "i915_drv.h" #include "i915_gem_object.h" -#include "i915_globals.h" + +#include "../i915_drv.h" +#include "../i915_globals.h" static struct i915_global_object { struct i915_global base; diff --git a/drivers/gpu/drm/i915/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h similarity index 59% rename from drivers/gpu/drm/i915/i915_gem_object.h rename to drivers/gpu/drm/i915/gem/i915_gem_object.h index 3666b0c5f6ca..061b20c3da5b 100644 --- a/drivers/gpu/drm/i915/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -13,7 +13,7 @@ #include -#include "gem/i915_gem_object_types.h" +#include "i915_gem_object_types.h" struct drm_i915_gem_object *i915_gem_object_alloc(void); void i915_gem_object_free(struct drm_i915_gem_object *obj); @@ -192,6 +192,131 @@ i915_gem_object_get_tile_row_size(const struct drm_i915_gem_object *obj) int i915_gem_object_set_tiling(struct drm_i915_gem_object *obj, unsigned int tiling, unsigned int stride); +struct scatterlist * +i915_gem_object_get_sg(struct drm_i915_gem_object *obj, + unsigned int n, unsigned int *offset); + +struct page * +i915_gem_object_get_page(struct drm_i915_gem_object *obj, +unsigned int n); + +struct page * +i915_gem_object_get_dirty_page(struct drm_i915_gem_object *obj, + unsigned int n); + +dma_addr_t +i915_gem_object_get_dma_address(struct drm_i915_gem_object *obj, + unsigned long n); + +void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, +struct sg_table *pages, +unsigned int sg_page_sizes); +int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj); + +static inline int __must_check +i915_gem_object_pin_pages(struct drm_i915_gem_object *obj) +{ + might_lock(&obj->mm.lock); + + if (atomic_inc_not_zero(&obj->mm.pages_pin_count)) + return 0; + + return __i915_gem_object_get_pages(obj); +} + +static inline bool +i915_gem_object_has_pages(struct drm_i915_gem_object *obj) +{ + return !IS_ERR_OR_NULL(READ_ONCE(obj->mm.pages)); +} + +static inline void +__i915_gem_object_pin_pages(struct drm_i915_gem_object *obj) +{ + GEM_BUG_ON(!i915_gem_object_has_pages(obj)); + + atomic_inc(&obj->mm.pages_pin_count); +} + +static inline bool +i915_gem_object_has_pinned_pages(struct drm_i915_gem_object *obj) +{ + return atomic_read(&obj->mm.pages_pin_count); +} + +static inline void +__i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) +{ + GEM_BUG_ON(!i915_gem_object_has_pages(obj)); + GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj)); + + atomic_dec(&obj->mm.pages_pin_count); +} + +static inline void +i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) +{ + __i915_gem_object_unpin_pages(obj); +} + +enum i915_mm_subclass { /* lockdep subclass for obj->mm.lock/struct_mutex */ + I915_MM_NORMAL = 0, + I915_MM_SHRINKER /* called "recursively" from dir
Re: [Intel-gfx] [PATCH v4 0/1] drm/fb-helper: Move modesetting code to drm_client
On Thu, Apr 25, 2019 at 10:31 AM Noralf Trønnes wrote: > > The Intel CI [1] was not happy with the previous version and I don't > know which part it didn't like. So I'll split up the series and feed it > piece by piece until I know where the problem is. You can also send stuff to intel-gfx-try...@lists.fd.o for test runs that you don't want the world to see (it's still public, but no one looks really). -Daniel > > Noralf. > > [1] https://patchwork.freedesktop.org/series/58597/ > > Noralf Trønnes (1): > drm/fb-helper: Avoid race with DRM userspace > > drivers/gpu/drm/drm_auth.c | 20 > drivers/gpu/drm/drm_fb_helper.c | 90 - > drivers/gpu/drm/drm_internal.h | 2 + > 3 files changed, 67 insertions(+), 45 deletions(-) > > -- > 2.20.1 > > ___ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 14/45] drm/i915: Make engine_mask & num_engines static
On Thu, 25 Apr 2019, Chris Wilson wrote: > Having removed the urge to modify the engine_mask at runtime, we can > promote the num_engines from a runtime calculation to a static and push > it into the device_info tables. \o/ Acked-by: Jani Nikula > > Signed-off-by: Chris Wilson > Cc: Tvrtko Ursulin > Cc: Jani Nikula > --- > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 -- > drivers/gpu/drm/i915/gt/intel_ringbuffer.c| 2 +- > drivers/gpu/drm/i915/gt/selftest_lrc.c| 4 +- > drivers/gpu/drm/i915/i915_pci.c | 45 +-- > drivers/gpu/drm/i915/intel_device_info.h | 3 +- > .../gpu/drm/i915/selftests/i915_gem_context.c | 4 +- > drivers/gpu/drm/i915/selftests/i915_request.c | 2 +- > 7 files changed, 27 insertions(+), 36 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > index 862cf1040f88..b2cd6c6785be 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > @@ -383,9 +383,6 @@ int intel_engines_init_mmio(struct drm_i915_private *i915) > goto cleanup; > } > > - RUNTIME_INFO(i915)->num_engines = > - hweight32(INTEL_INFO(i915)->engine_mask); > - > i915_check_and_clear_faults(i915); > > return 0; > diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c > b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c > index 7a4804c47466..2edeedb748ed 100644 > --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c > +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c > @@ -1568,7 +1568,7 @@ static inline int mi_set_context(struct i915_request > *rq, u32 flags) > struct intel_engine_cs *engine = rq->engine; > enum intel_engine_id id; > const int num_engines = > - IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0; > + IS_HSW_GT1(i915) ? INTEL_INFO(i915)->num_engines - 1 : 0; > bool force_restore = false; > int len; > u32 *cs; > diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c > b/drivers/gpu/drm/i915/gt/selftest_lrc.c > index 84538f69185b..6f223f7facde 100644 > --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c > +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c > @@ -1185,7 +1185,7 @@ static int smoke_crescendo(struct preempt_smoke *smoke, > unsigned int flags) > > pr_info("Submitted %lu crescendo:%x requests across %d engines and %d > contexts\n", > count, flags, > - RUNTIME_INFO(smoke->i915)->num_engines, smoke->ncontext); > + INTEL_INFO(smoke->i915)->num_engines, smoke->ncontext); > return 0; > } > > @@ -1213,7 +1213,7 @@ static int smoke_random(struct preempt_smoke *smoke, > unsigned int flags) > > pr_info("Submitted %lu random:%x requests across %d engines and %d > contexts\n", > count, flags, > - RUNTIME_INFO(smoke->i915)->num_engines, smoke->ncontext); > + INTEL_INFO(smoke->i915)->num_engines, smoke->ncontext); > return 0; > } > > diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c > index ffa2ee70a03d..431a4a2c20e1 100644 > --- a/drivers/gpu/drm/i915/i915_pci.c > +++ b/drivers/gpu/drm/i915/i915_pci.c > @@ -35,6 +35,7 @@ > > #define PLATFORM(x) .platform = (x) > #define GEN(x) .gen = (x), .gen_mask = BIT((x) - 1) > +#define ENGINES(x) .engine_mask = (x), .num_engines = hweight8(x) > > #define I845_PIPE_OFFSETS \ > .pipe_offsets = { \ > @@ -145,6 +146,7 @@ > > #define I830_FEATURES \ > GEN(2), \ > + ENGINES(BIT(RCS0)), \ > .is_mobile = 1, \ > .num_pipes = 2, \ > .display.has_overlay = 1, \ > @@ -154,7 +156,6 @@ > .gpu_reset_clobbers_display = true, \ > .hws_needs_physical = 1, \ > .unfenced_needs_alignment = 1, \ > - .engine_mask = BIT(RCS0), \ > .has_snoop = true, \ > .has_coherent_ggtt = false, \ > I9XX_PIPE_OFFSETS, \ > @@ -164,6 +165,7 @@ > > #define I845_FEATURES \ > GEN(2), \ > + ENGINES(BIT(RCS0)), \ > .num_pipes = 1, \ > .display.has_overlay = 1, \ > .display.overlay_needs_physical = 1, \ > @@ -171,7 +173,6 @@ > .gpu_reset_clobbers_display = true, \ > .hws_needs_physical = 1, \ > .unfenced_needs_alignment = 1, \ > - .engine_mask = BIT(RCS0), \ > .has_snoop = true, \ > .has_coherent_ggtt = false, \ > I845_PIPE_OFFSETS, \ > @@ -202,10 +203,10 @@ static const struct intel_device_info intel_i865g_info > = { > > #define GEN3_FEATURES \ > GEN(3), \ > + ENGINES(BIT(RCS0)), \ > .num_pipes = 2, \ > .display.has_gmch = 1, \ > .gpu_reset_clobbers_display = true, \ > - .engine_mask = BIT(RCS0), \ > .has_snoop = true, \ > .has_coherent_ggtt = true, \ > I9XX_PIPE_OFFSETS, \ > @@ -286,11 +287,11 @@ static const struct intel_device_info > intel_pineview_m_info = { > > #define GEN4_FEATURES \ >
[Intel-gfx] [PATCH 25/45] drm/i915: Pull GEM ioctls interface to its own file
Declutter i915_drv/gem.h by moving the ioctl API into its own header. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_ioctls.h | 52 ++ drivers/gpu/drm/i915/i915_drv.c| 1 + drivers/gpu/drm/i915/i915_drv.h| 38 drivers/gpu/drm/i915/i915_gem.c| 1 + drivers/gpu/drm/i915/i915_gem_execbuffer.c | 1 + drivers/gpu/drm/i915/i915_gem_tiling.c | 3 ++ drivers/gpu/drm/i915/i915_gem_userptr.c| 12 +++-- 7 files changed, 66 insertions(+), 42 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ioctls.h diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h new file mode 100644 index ..ddc7f2a52b3e --- /dev/null +++ b/drivers/gpu/drm/i915/gem/i915_gem_ioctls.h @@ -0,0 +1,52 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2019 Intel Corporation + */ + +#ifndef I915_GEM_IOCTLS_H +#define I915_GEM_IOCTLS_H + +struct drm_device; +struct drm_file; + +int i915_gem_busy_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_create_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_execbuffer_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_get_tiling_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_madvise_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_mmap_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_pread_ioctl(struct drm_device *dev, void *data, +struct drm_file *file); +int i915_gem_pwrite_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_set_tiling_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data, +struct drm_file *file); +int i915_gem_throttle_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_userptr_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); +int i915_gem_wait_ioctl(struct drm_device *dev, void *data, + struct drm_file *file); + +#endif diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 10d7016e8f06..cd8d0bd38367 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -47,6 +47,7 @@ #include #include +#include "gem/i915_gem_ioctls.h" #include "gt/intel_gt_pm.h" #include "gt/intel_reset.h" #include "gt/intel_workarounds.h" diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0cf87495e11b..d8ad9571b087 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2824,46 +2824,8 @@ ibx_disable_display_interrupt(struct drm_i915_private *dev_priv, u32 bits) } /* i915_gem.c */ -int i915_gem_create_ioctl(struct drm_device *dev, void *data, - struct drm_file *file_priv); -int i915_gem_pread_ioctl(struct drm_device *dev, void *data, -struct drm_file *file_priv); -int i915_gem_pwrite_ioctl(struct drm_device *dev, void *data, - struct drm_file *file_priv); -int i915_gem_mmap_ioctl(struct drm_device *dev, void *data, - struct drm_file *file_priv); -int i915_gem_mmap_gtt_ioctl(struct drm_device *dev, void *data, - struct drm_file *file_priv); -int i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, - struct drm_file *file_priv); -int i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data, -struct drm_file *file_priv); -int i915_gem_execbuffer_ioctl(struct drm_device *dev, void *data, - struct drm_file *file_priv); -int i915_gem_execbuffer2_ioctl(struct drm_device *dev, voi
[Intel-gfx] [PATCH 30/45] drm/i915: Move GEM domain management to its own file
Continuing the decluttering of i915_gem.c, that of the read/write domains, perhaps the biggest of GEM's follies? Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_domain.c| 784 ++ drivers/gpu/drm/i915/gem/i915_gem_object.h| 29 + drivers/gpu/drm/i915/gvt/cmd_parser.c | 4 +- drivers/gpu/drm/i915/gvt/scheduler.c | 6 +- drivers/gpu/drm/i915/i915_cmd_parser.c| 8 +- drivers/gpu/drm/i915/i915_drv.h | 26 - drivers/gpu/drm/i915/i915_gem.c | 777 + drivers/gpu/drm/i915/i915_gem_execbuffer.c| 4 +- drivers/gpu/drm/i915/i915_gem_render_state.c | 4 +- drivers/gpu/drm/i915/selftests/huge_pages.c | 4 +- .../drm/i915/selftests/i915_gem_coherency.c | 8 +- .../gpu/drm/i915/selftests/i915_gem_context.c | 8 +- 13 files changed, 841 insertions(+), 822 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_domain.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index b86af182b1ac..35561c0d80f9 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -87,6 +87,7 @@ i915-y += $(gt-y) # GEM (Graphics Execution Management) code obj-y += gem/ gem-y += \ + gem/i915_gem_domain.o \ gem/i915_gem_object.o \ gem/i915_gem_mman.o \ gem/i915_gem_pages.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c new file mode 100644 index ..eee421e3021c --- /dev/null +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -0,0 +1,784 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2014-2016 Intel Corporation + */ + +#include "i915_gem_ioctls.h" +#include "i915_gem_object.h" + +#include "../i915_drv.h" +#include "../i915_gem_clflush.h" +#include "../i915_gem_gtt.h" +#include "../i915_vma.h" + +#include "../intel_frontbuffer.h" + +static void __i915_gem_object_flush_for_display(struct drm_i915_gem_object *obj) +{ + /* +* We manually flush the CPU domain so that we can override and +* force the flush for the display, and perform it asyncrhonously. +*/ + i915_gem_object_flush_write_domain(obj, ~I915_GEM_DOMAIN_CPU); + if (obj->cache_dirty) + i915_gem_clflush_object(obj, I915_CLFLUSH_FORCE); + obj->write_domain = 0; +} + +void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj) +{ + if (!READ_ONCE(obj->pin_global)) + return; + + mutex_lock(&obj->base.dev->struct_mutex); + __i915_gem_object_flush_for_display(obj); + mutex_unlock(&obj->base.dev->struct_mutex); +} + +/** + * Moves a single object to the WC read, and possibly write domain. + * @obj: object to act on + * @write: ask for write access or read only + * + * This function returns when the move is complete, including waiting on + * flushes to occur. + */ +int +i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write) +{ + int ret; + + lockdep_assert_held(&obj->base.dev->struct_mutex); + + ret = i915_gem_object_wait(obj, + I915_WAIT_INTERRUPTIBLE | + I915_WAIT_LOCKED | + (write ? I915_WAIT_ALL : 0), + MAX_SCHEDULE_TIMEOUT); + if (ret) + return ret; + + if (obj->write_domain == I915_GEM_DOMAIN_WC) + return 0; + + /* Flush and acquire obj->pages so that we are coherent through +* direct access in memory with previous cached writes through +* shmemfs and that our cache domain tracking remains valid. +* For example, if the obj->filp was moved to swap without us +* being notified and releasing the pages, we would mistakenly +* continue to assume that the obj remained out of the CPU cached +* domain. +*/ + ret = i915_gem_object_pin_pages(obj); + if (ret) + return ret; + + i915_gem_object_flush_write_domain(obj, ~I915_GEM_DOMAIN_WC); + + /* Serialise direct access to this object with the barriers for +* coherent writes from the GPU, by effectively invalidating the +* WC domain upon first access. +*/ + if ((obj->read_domains & I915_GEM_DOMAIN_WC) == 0) + mb(); + + /* It should now be out of any other write domains, and we can update +* the domain values for our changes. +*/ + GEM_BUG_ON((obj->write_domain & ~I915_GEM_DOMAIN_WC) != 0); + obj->read_domains |= I915_GEM_DOMAIN_WC; + if (write) { + obj->read_domains = I915_GEM_DOMAIN_WC; + obj->write_domain = I915_GEM_DOMAIN_WC; + obj->mm.dirty = true; + } + + i915_gem_object_unpin_pages(obj); + return 0; +} +
[Intel-gfx] [PATCH 23/45] drm/i915: Allow specification of parallel execbuf
There is a desire to split a task onto two engines and have them run at the same time, e.g. scanline interleaving to spread the workload evenly. Through the use of the out-fence from the first execbuf, we can coordinate secondary execbuf to only become ready simultaneously with the first, so that with all things idle the second execbufs are executed in parallel with the first. The key difference here between the new EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence waits for the completion of the first request (so that all of its rendering results are visible to the second execbuf, the more common userspace fence requirement). Since we only have a single input fence slot, userspace cannot mix an in-fence and a submit-fence. It has to use one or the other! This is not such a harsh requirement, since by virtue of the submit-fence, the secondary execbuf inherit all of the dependencies from the first request, and for the application the dependencies should be common between the primary and secondary execbuf. Suggested-by: Tvrtko Ursulin Testcase: igt/gem_exec_fence/parallel Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/i915_drv.c| 1 + drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 +- include/uapi/drm/i915_drm.h| 17 ++- 3 files changed, 41 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 81866bd1b6e4..10d7016e8f06 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -435,6 +435,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_HAS_EXEC_CAPTURE: case I915_PARAM_HAS_EXEC_BATCH_FIRST: case I915_PARAM_HAS_EXEC_FENCE_ARRAY: + case I915_PARAM_HAS_EXEC_SUBMIT_FENCE: /* For the time being all of these are always true; * if some supported hardware does not have one of these * features this value needs to be provided from diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index d6c5220addd0..7ce25b54c57b 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -2318,6 +2318,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, { struct i915_execbuffer eb; struct dma_fence *in_fence = NULL; + struct dma_fence *exec_fence = NULL; struct sync_file *out_fence = NULL; int out_fence_fd = -1; int err; @@ -2360,11 +2361,24 @@ i915_gem_do_execbuffer(struct drm_device *dev, return -EINVAL; } + if (args->flags & I915_EXEC_FENCE_SUBMIT) { + if (in_fence) { + err = -EINVAL; + goto err_in_fence; + } + + exec_fence = sync_file_get_fence(lower_32_bits(args->rsvd2)); + if (!exec_fence) { + err = -EINVAL; + goto err_in_fence; + } + } + if (args->flags & I915_EXEC_FENCE_OUT) { out_fence_fd = get_unused_fd_flags(O_CLOEXEC); if (out_fence_fd < 0) { err = out_fence_fd; - goto err_in_fence; + goto err_exec_fence; } } @@ -2494,6 +2508,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, goto err_request; } + if (exec_fence) { + err = i915_request_await_execution(eb.request, exec_fence, + eb.engine->bond_execute); + if (err < 0) + goto err_request; + } + if (fences) { err = await_fence_array(&eb, fences); if (err) @@ -2555,6 +2576,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, err_out_fence: if (out_fence_fd != -1) put_unused_fd(out_fence_fd); +err_exec_fence: + dma_fence_put(exec_fence); err_in_fence: dma_fence_put(in_fence); return err; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index c0054074b4c3..ddf50bb7399a 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -604,6 +604,12 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_MMAP_GTT_COHERENT 52 +/* + * Query whether DRM_I915_GEM_EXECBUFFER2 supports coordination of parallel + * execution through use of explicit fence support. + * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT. + */ +#define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53 /* Must be kept compact -- no holes and well documented */ typedef struct drm_i915_getparam { @@ -1126,7 +1132,16 @@ struct drm_i915_gem_execbuffer2 { */ #define I915_EXEC_FENCE_ARRAY (1<<19) -#define __I915_EXEC_UNKNO
[Intel-gfx] [PATCH 34/45] drm/i915: Move GEM object domain management from struct_mutex to local
Use the per-object local lock to control the cache domain of the individual GEM objects, not struct_mutex. This is a huge leap forward for us in terms of object-level synchronisation; execbuffers are coordinated using the ww_mutex and pread/pwrite is finally fully serialised again. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_clflush.c | 4 +- drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c| 10 +- drivers/gpu/drm/i915/gem/i915_gem_domain.c| 70 +- .../gpu/drm/i915/gem/i915_gem_execbuffer.c| 123 -- drivers/gpu/drm/i915/gem/i915_gem_fence.c | 97 ++ drivers/gpu/drm/i915/gem/i915_gem_object.c| 2 + drivers/gpu/drm/i915/gem/i915_gem_object.h| 14 ++ drivers/gpu/drm/i915/gem/i915_gem_pm.c| 7 +- .../gpu/drm/i915/gem/selftests/huge_pages.c | 12 +- .../i915/gem/selftests/i915_gem_coherency.c | 12 ++ .../drm/i915/gem/selftests/i915_gem_context.c | 20 +++ .../drm/i915/gem/selftests/i915_gem_mman.c| 6 + .../drm/i915/gem/selftests/i915_gem_phys.c| 4 +- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 4 + drivers/gpu/drm/i915/gt/selftest_lrc.c| 2 + .../gpu/drm/i915/gt/selftest_workarounds.c| 6 + drivers/gpu/drm/i915/gvt/cmd_parser.c | 2 + drivers/gpu/drm/i915/gvt/scheduler.c | 8 +- drivers/gpu/drm/i915/i915_cmd_parser.c| 23 ++-- drivers/gpu/drm/i915/i915_gem.c | 122 + drivers/gpu/drm/i915/i915_gem_gtt.c | 5 +- drivers/gpu/drm/i915/i915_gem_render_state.c | 2 + drivers/gpu/drm/i915/i915_vma.c | 8 +- drivers/gpu/drm/i915/i915_vma.h | 12 ++ drivers/gpu/drm/i915/intel_display.c | 5 + drivers/gpu/drm/i915/intel_guc_log.c | 6 +- drivers/gpu/drm/i915/intel_overlay.c | 25 ++-- drivers/gpu/drm/i915/intel_uc_fw.c| 6 +- drivers/gpu/drm/i915/selftests/i915_request.c | 4 + drivers/gpu/drm/i915/selftests/igt_spinner.c | 2 + 31 files changed, 444 insertions(+), 180 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_fence.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 8e70d5972195..e3069812 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -93,6 +93,7 @@ gem-y += \ gem/i915_gem_dmabuf.o \ gem/i915_gem_domain.o \ gem/i915_gem_execbuffer.o \ + gem/i915_gem_fence.o \ gem/i915_gem_internal.o \ gem/i915_gem_object.o \ gem/i915_gem_mman.o \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c index 093bfff55a96..efab47250588 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c @@ -96,6 +96,8 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, { struct clflush *clflush; + assert_object_held(obj); + /* * Stolen memory is always coherent with the GPU as it is explicitly * marked as wc by the system, or the system is cache-coherent. @@ -145,9 +147,7 @@ bool i915_gem_clflush_object(struct drm_i915_gem_object *obj, true, I915_FENCE_TIMEOUT, I915_FENCE_GFP); - reservation_object_lock(obj->resv, NULL); reservation_object_add_excl_fence(obj->resv, &clflush->dma); - reservation_object_unlock(obj->resv); i915_sw_fence_commit(&clflush->wait); } else if (obj->mm.pages) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c index 4e7efe159531..50981ea513f0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c @@ -152,7 +152,6 @@ static int i915_gem_dmabuf_mmap(struct dma_buf *dma_buf, struct vm_area_struct * static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_direction direction) { struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf); - struct drm_device *dev = obj->base.dev; bool write = (direction == DMA_BIDIRECTIONAL || direction == DMA_TO_DEVICE); int err; @@ -160,12 +159,12 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_data_dire if (err) return err; - err = i915_mutex_lock_interruptible(dev); + err = i915_gem_object_lock_interruptible(obj); if (err) goto out; err = i915_gem_object_set_to_cpu_domain(obj, write); - mutex_unlock(&dev->struct_mutex); + i915_gem_object_unlock(obj); out: i915_gem_object_unpin_pages(obj); @@ -175,19 +174,18 @@ static int i915_gem_begin_cpu_access(struct dma_buf *dma_buf, enum dma_da
[Intel-gfx] [PATCH 28/45] drm/i915: Move phys objects to its own file
Continuing the decluttering of i915_gem.c, this time the legacy physical object. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 2 + drivers/gpu/drm/i915/gem/i915_gem_object.h| 11 +- .../gpu/drm/i915/gem/i915_gem_object_types.h | 2 + drivers/gpu/drm/i915/gem/i915_gem_pages.c | 510 + drivers/gpu/drm/i915/gem/i915_gem_phys.c | 212 ++ drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 61 ++ .../drm/i915/gem/selftests/i915_gem_phys.c| 80 ++ drivers/gpu/drm/i915/i915_drv.h | 2 - drivers/gpu/drm/i915/i915_gem.c | 699 +- drivers/gpu/drm/i915/i915_gem_shrinker.c | 59 +- .../gpu/drm/i915/selftests/i915_gem_object.c | 54 -- .../drm/i915/selftests/i915_mock_selftests.h | 1 + 12 files changed, 885 insertions(+), 808 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_pages.c create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_phys.c create mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_phys.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 447143fadb05..84277b29389c 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -88,6 +88,8 @@ i915-y += $(gt-y) obj-y += gem/ gem-y += \ gem/i915_gem_object.o \ + gem/i915_gem_pages.o \ + gem/i915_gem_phys.o \ gem/i915_gem_shmem.o i915-y += \ $(gem-y) \ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index ad82303f741a..2e963a593245 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -33,11 +33,17 @@ void __i915_gem_object_release_shmem(struct drm_i915_gem_object *obj, struct sg_table *pages, bool needs_clflush); +int i915_gem_object_attach_phys(struct drm_i915_gem_object *obj, int align); + void i915_gem_close_object(struct drm_gem_object *gem, struct drm_file *file); void i915_gem_free_object(struct drm_gem_object *obj); void i915_gem_flush_free_objects(struct drm_i915_private *i915); +struct sg_table * +__i915_gem_object_unset_pages(struct drm_i915_gem_object *obj); +void i915_gem_object_truncate(struct drm_i915_gem_object *obj); + /** * i915_gem_object_lookup_rcu - look up a temporary GEM object from its handle * @filp: DRM file private date @@ -231,6 +237,8 @@ i915_gem_object_get_dma_address(struct drm_i915_gem_object *obj, void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, struct sg_table *pages, unsigned int sg_page_sizes); + +int i915_gem_object_get_pages(struct drm_i915_gem_object *obj); int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj); static inline int __must_check @@ -286,7 +294,8 @@ enum i915_mm_subclass { /* lockdep subclass for obj->mm.lock/struct_mutex */ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj, enum i915_mm_subclass subclass); -void __i915_gem_object_truncate(struct drm_i915_gem_object *obj); +void i915_gem_object_truncate(struct drm_i915_gem_object *obj); +void i915_gem_object_writeback(struct drm_i915_gem_object *obj); enum i915_map_type { I915_MAP_WB = 0, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index e4b50944f553..24abdd6999b5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -52,6 +52,8 @@ struct drm_i915_gem_object_ops { int (*get_pages)(struct drm_i915_gem_object *obj); void (*put_pages)(struct drm_i915_gem_object *obj, struct sg_table *pages); + void (*truncate)(struct drm_i915_gem_object *obj); + void (*writeback)(struct drm_i915_gem_object *obj); int (*pwrite)(struct drm_i915_gem_object *obj, const struct drm_i915_gem_pwrite *arg); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c new file mode 100644 index ..452aba0a3359 --- /dev/null +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -0,0 +1,510 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2014-2016 Intel Corporation + */ + +#include "i915_gem_object.h" + +#include "../i915_drv.h" + +void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, +struct sg_table *pages, +unsigned int sg_page_sizes) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + unsigned long supported = INTEL_INFO(i915)->page_sizes; + int i; + + lockdep_assert_held(&obj->mm.lock); + + /* Make the pages coherent with the GPU (flushing any swapin). */ + if (
[Intel-gfx] [PATCH 29/45] drm/i915: Move mmap and friends to its own file
Continuing the decluttering of i915_gem.c, now the turn of do_mmap and the faulthandlers Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/Makefile | 1 + drivers/gpu/drm/i915/gem/i915_gem_mman.c | 505 drivers/gpu/drm/i915/gem/i915_gem_object.c| 56 ++ drivers/gpu/drm/i915/gem/i915_gem_object.h| 7 + .../drm/i915/gem/selftests/i915_gem_mman.c| 503 drivers/gpu/drm/i915/i915_drv.h | 1 - drivers/gpu/drm/i915/i915_gem.c | 561 +- drivers/gpu/drm/i915/i915_gem_tiling.c| 2 +- .../gpu/drm/i915/selftests/i915_gem_object.c | 487 --- .../drm/i915/selftests/i915_live_selftests.h | 1 + 10 files changed, 1088 insertions(+), 1036 deletions(-) create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_mman.c create mode 100644 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile index 84277b29389c..b86af182b1ac 100644 --- a/drivers/gpu/drm/i915/Makefile +++ b/drivers/gpu/drm/i915/Makefile @@ -88,6 +88,7 @@ i915-y += $(gt-y) obj-y += gem/ gem-y += \ gem/i915_gem_object.o \ + gem/i915_gem_mman.o \ gem/i915_gem_pages.o \ gem/i915_gem_phys.o \ gem/i915_gem_shmem.o diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c new file mode 100644 index ..1bcc6e1091e9 --- /dev/null +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -0,0 +1,505 @@ +/* + * SPDX-License-Identifier: MIT + * + * Copyright © 2014-2016 Intel Corporation + */ + +#include +#include + +#include "i915_gem_ioctls.h" +#include "i915_gem_object.h" + +#include "../i915_gem_gtt.h" +#include "../i915_vma.h" +#include "../i915_drv.h" +#include "../intel_drv.h" + +static inline bool +__vma_matches(struct vm_area_struct *vma, struct file *filp, + unsigned long addr, unsigned long size) +{ + if (vma->vm_file != filp) + return false; + + return vma->vm_start == addr && + (vma->vm_end - vma->vm_start) == PAGE_ALIGN(size); +} + +/** + * i915_gem_mmap_ioctl - Maps the contents of an object, returning the address + * it is mapped to. + * @dev: drm device + * @data: ioctl data blob + * @file: drm file + * + * While the mapping holds a reference on the contents of the object, it doesn't + * imply a ref on the object itself. + * + * IMPORTANT: + * + * DRM driver writers who look a this function as an example for how to do GEM + * mmap support, please don't implement mmap support like here. The modern way + * to implement DRM mmap support is with an mmap offset ioctl (like + * i915_gem_mmap_gtt) and then using the mmap syscall on the DRM fd directly. + * That way debug tooling like valgrind will understand what's going on, hiding + * the mmap call in a driver private ioctl will break that. The i915 driver only + * does cpu mmaps this way because we didn't know better. + */ +int +i915_gem_mmap_ioctl(struct drm_device *dev, void *data, + struct drm_file *file) +{ + struct drm_i915_gem_mmap *args = data; + struct drm_i915_gem_object *obj; + unsigned long addr; + + if (args->flags & ~(I915_MMAP_WC)) + return -EINVAL; + + if (args->flags & I915_MMAP_WC && !boot_cpu_has(X86_FEATURE_PAT)) + return -ENODEV; + + obj = i915_gem_object_lookup(file, args->handle); + if (!obj) + return -ENOENT; + + /* prime objects have no backing filp to GEM mmap +* pages from. +*/ + if (!obj->base.filp) { + addr = -ENXIO; + goto err; + } + + if (range_overflows(args->offset, args->size, (u64)obj->base.size)) { + addr = -EINVAL; + goto err; + } + + addr = vm_mmap(obj->base.filp, 0, args->size, + PROT_READ | PROT_WRITE, MAP_SHARED, + args->offset); + if (IS_ERR_VALUE(addr)) + goto err; + + if (args->flags & I915_MMAP_WC) { + struct mm_struct *mm = current->mm; + struct vm_area_struct *vma; + + if (down_write_killable(&mm->mmap_sem)) { + addr = -EINTR; + goto err; + } + vma = find_vma(mm, addr); + if (vma && __vma_matches(vma, obj->base.filp, addr, args->size)) + vma->vm_page_prot = + pgprot_writecombine(vm_get_page_prot(vma->vm_flags)); + else + addr = -ENOMEM; + up_write(&mm->mmap_sem); + if (IS_ERR_VALUE(addr)) + goto err; + + /* This may race, but that's ok, it only gets set */ + WRITE_ONCE(obj->frontbuffer_ggtt_origin, ORIGIN_CP
Re: [Intel-gfx] [PATCH 16/17] drm/vkms: Convert to using __drm_atomic_helper_crtc_reset() for reset.
Op 06-03-2019 om 23:43 schreef Rodrigo Siqueira: > On 03/01, Maarten Lankhorst wrote: >> Convert vkms to using __drm_atomic_helper_crtc_reset(), instead of >> writing its own version. Instead of open coding destroy_state(), >> call it directly for freeing the old state. >> >> Signed-off-by: Maarten Lankhorst >> Cc: Rodrigo Siqueira >> Cc: Haneen Mohammed >> Cc: Daniel Vetter >> --- >> drivers/gpu/drm/vkms/vkms_crtc.c | 33 +--- >> 1 file changed, 13 insertions(+), 20 deletions(-) >> >> diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c >> b/drivers/gpu/drm/vkms/vkms_crtc.c >> index 8a9aeb0a9ea8..550888e72c96 100644 >> --- a/drivers/gpu/drm/vkms/vkms_crtc.c >> +++ b/drivers/gpu/drm/vkms/vkms_crtc.c >> @@ -83,26 +83,6 @@ bool vkms_get_vblank_timestamp(struct drm_device *dev, >> unsigned int pipe, >> return true; >> } >> >> -static void vkms_atomic_crtc_reset(struct drm_crtc *crtc) >> -{ >> -struct vkms_crtc_state *vkms_state = NULL; >> - >> -if (crtc->state) { >> -vkms_state = to_vkms_crtc_state(crtc->state); >> -__drm_atomic_helper_crtc_destroy_state(crtc->state); >> -kfree(vkms_state); >> -crtc->state = NULL; >> -} >> - >> -vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL); >> -if (!vkms_state) >> -return; >> -INIT_WORK(&vkms_state->crc_work, vkms_crc_work_handle); >> - >> -crtc->state = &vkms_state->base; >> -crtc->state->crtc = crtc; >> -} >> - >> static struct drm_crtc_state * >> vkms_atomic_crtc_duplicate_state(struct drm_crtc *crtc) >> { >> @@ -135,6 +115,19 @@ static void vkms_atomic_crtc_destroy_state(struct >> drm_crtc *crtc, >> } >> } >> >> +static void vkms_atomic_crtc_reset(struct drm_crtc *crtc) >> +{ >> +struct vkms_crtc_state *vkms_state = >> +kzalloc(sizeof(*vkms_state), GFP_KERNEL); >> + >> +if (crtc->state) >> +vkms_atomic_crtc_destroy_state(crtc, crtc->state); >> + >> +__drm_atomic_helper_crtc_reset(crtc, &vkms_state->base); >> +if (vkms_state) >> +INIT_WORK(&vkms_state->crc_work, vkms_crc_work_handle); >> +} >> + >> static const struct drm_crtc_funcs vkms_crtc_funcs = { >> .set_config = drm_atomic_helper_set_config, >> .destroy= drm_crtc_cleanup, >> -- >> 2.20.1 >> > Hi Maarten, > > First of all, thanks for the patch :) > > I tested it on my VM with the IGT test, and everything looks fine. > > Reviewed-by: Rodrigo Siqueira > Thanks, pushed patches 1-3, and rockchip, tegra, msm, vkms, mali and i915 patches with reviews. :) Will resend the rest of the series and address the feedback. ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 01/29] tracing: Cleanup stack trace code
- Remove the extra array member of stack_dump_trace[] along with the ARRAY_SIZE - 1 initialization for struct stack_trace :: max_entries. Both are historical leftovers of no value. The stack tracer never exceeds the array and there is no extra storage requirement either. - Make variables which are only used in trace_stack.c static. - Simplify the enable/disable logic. - Rename stack_trace_print() as it's using the stack_trace_ namespace. Free the name up for stack trace related functions. Signed-off-by: Thomas Gleixner Reviewed-by: Steven Rostedt --- V3: Remove the -1 init and split the variable declaration as requested by Steven. V2: Add more cleanups and use print_max_stack() as requested by Steven. --- include/linux/ftrace.h | 18 -- kernel/trace/trace_stack.c | 42 +- 2 files changed, 17 insertions(+), 43 deletions(-) --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -241,21 +241,11 @@ static inline void ftrace_free_mem(struc #ifdef CONFIG_STACK_TRACER -#define STACK_TRACE_ENTRIES 500 - -struct stack_trace; - -extern unsigned stack_trace_index[]; -extern struct stack_trace stack_trace_max; -extern unsigned long stack_trace_max_size; -extern arch_spinlock_t stack_trace_max_lock; - extern int stack_tracer_enabled; -void stack_trace_print(void); -int -stack_trace_sysctl(struct ctl_table *table, int write, - void __user *buffer, size_t *lenp, - loff_t *ppos); + +int stack_trace_sysctl(struct ctl_table *table, int write, + void __user *buffer, size_t *lenp, + loff_t *ppos); /* DO NOT MODIFY THIS VARIABLE DIRECTLY! */ DECLARE_PER_CPU(int, disable_stack_tracer); --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -18,30 +18,26 @@ #include "trace.h" -static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES + 1]; -unsigned stack_trace_index[STACK_TRACE_ENTRIES]; +#define STACK_TRACE_ENTRIES 500 + +static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES]; +static unsigned stack_trace_index[STACK_TRACE_ENTRIES]; -/* - * Reserve one entry for the passed in ip. This will allow - * us to remove most or all of the stack size overhead - * added by the stack tracer itself. - */ struct stack_trace stack_trace_max = { - .max_entries= STACK_TRACE_ENTRIES - 1, + .max_entries= STACK_TRACE_ENTRIES, .entries= &stack_dump_trace[0], }; -unsigned long stack_trace_max_size; -arch_spinlock_t stack_trace_max_lock = +static unsigned long stack_trace_max_size; +static arch_spinlock_t stack_trace_max_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; DEFINE_PER_CPU(int, disable_stack_tracer); static DEFINE_MUTEX(stack_sysctl_mutex); int stack_tracer_enabled; -static int last_stack_tracer_enabled; -void stack_trace_print(void) +static void print_max_stack(void) { long i; int size; @@ -61,16 +57,7 @@ void stack_trace_print(void) } } -/* - * When arch-specific code overrides this function, the following - * data should be filled up, assuming stack_trace_max_lock is held to - * prevent concurrent updates. - * stack_trace_index[] - * stack_trace_max - * stack_trace_max_size - */ -void __weak -check_stack(unsigned long ip, unsigned long *stack) +static void check_stack(unsigned long ip, unsigned long *stack) { unsigned long this_size, flags; unsigned long *p, *top, *start; static int tracer_frame; @@ -179,7 +166,7 @@ check_stack(unsigned long ip, unsigned l stack_trace_max.nr_entries = x; if (task_stack_end_corrupted(current)) { - stack_trace_print(); + print_max_stack(); BUG(); } @@ -412,23 +399,21 @@ stack_trace_sysctl(struct ctl_table *tab void __user *buffer, size_t *lenp, loff_t *ppos) { + int was_enabled; int ret; mutex_lock(&stack_sysctl_mutex); + was_enabled = !!stack_tracer_enabled; ret = proc_dointvec(table, write, buffer, lenp, ppos); - if (ret || !write || - (last_stack_tracer_enabled == !!stack_tracer_enabled)) + if (ret || !write || (was_enabled == !!stack_tracer_enabled)) goto out; - last_stack_tracer_enabled = !!stack_tracer_enabled; - if (stack_tracer_enabled) register_ftrace_function(&trace_ops); else unregister_ftrace_function(&trace_ops); - out: mutex_unlock(&stack_sysctl_mutex); return ret; @@ -444,7 +429,6 @@ static __init int enable_stacktrace(char strncpy(stack_trace_filter_buf, str + len, COMMAND_LINE_SIZE); stack_tracer_enabled = 1; - last_stack_tracer_enabled = 1; return 1; } __setup("stacktrace", enable_stacktrace); ___ Intel-gfx
[Intel-gfx] [patch V3 10/29] mm/page_owner: Simplify stack trace handling
Replace the indirection through struct stack_trace by using the storage array based interfaces. The original code in all printing functions is really wrong. It allocates a storage array on stack which is unused because depot_fetch_stack() does not store anything in it. It overwrites the entries pointer in the stack_trace struct so it points to the depot storage. Signed-off-by: Thomas Gleixner Cc: linux...@kvack.org Cc: Mike Rapoport Cc: David Rientjes Cc: Andrew Morton --- mm/page_owner.c | 79 +++- 1 file changed, 28 insertions(+), 51 deletions(-) --- a/mm/page_owner.c +++ b/mm/page_owner.c @@ -58,15 +58,10 @@ static bool need_page_owner(void) static __always_inline depot_stack_handle_t create_dummy_stack(void) { unsigned long entries[4]; - struct stack_trace dummy; + unsigned int nr_entries; - dummy.nr_entries = 0; - dummy.max_entries = ARRAY_SIZE(entries); - dummy.entries = &entries[0]; - dummy.skip = 0; - - save_stack_trace(&dummy); - return depot_save_stack(&dummy, GFP_KERNEL); + nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0); + return stack_depot_save(entries, nr_entries, GFP_KERNEL); } static noinline void register_dummy_stack(void) @@ -120,46 +115,39 @@ void __reset_page_owner(struct page *pag } } -static inline bool check_recursive_alloc(struct stack_trace *trace, - unsigned long ip) +static inline bool check_recursive_alloc(unsigned long *entries, +unsigned int nr_entries, +unsigned long ip) { - int i; + unsigned int i; - if (!trace->nr_entries) - return false; - - for (i = 0; i < trace->nr_entries; i++) { - if (trace->entries[i] == ip) + for (i = 0; i < nr_entries; i++) { + if (entries[i] == ip) return true; } - return false; } static noinline depot_stack_handle_t save_stack(gfp_t flags) { unsigned long entries[PAGE_OWNER_STACK_DEPTH]; - struct stack_trace trace = { - .nr_entries = 0, - .entries = entries, - .max_entries = PAGE_OWNER_STACK_DEPTH, - .skip = 2 - }; depot_stack_handle_t handle; + unsigned int nr_entries; - save_stack_trace(&trace); + nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 2); /* -* We need to check recursion here because our request to stackdepot -* could trigger memory allocation to save new entry. New memory -* allocation would reach here and call depot_save_stack() again -* if we don't catch it. There is still not enough memory in stackdepot -* so it would try to allocate memory again and loop forever. +* We need to check recursion here because our request to +* stackdepot could trigger memory allocation to save new +* entry. New memory allocation would reach here and call +* stack_depot_save_entries() again if we don't catch it. There is +* still not enough memory in stackdepot so it would try to +* allocate memory again and loop forever. */ - if (check_recursive_alloc(&trace, _RET_IP_)) + if (check_recursive_alloc(entries, nr_entries, _RET_IP_)) return dummy_handle; - handle = depot_save_stack(&trace, flags); + handle = stack_depot_save(entries, nr_entries, flags); if (!handle) handle = failure_handle; @@ -337,16 +325,10 @@ print_page_owner(char __user *buf, size_ struct page *page, struct page_owner *page_owner, depot_stack_handle_t handle) { - int ret; - int pageblock_mt, page_mt; + int ret, pageblock_mt, page_mt; + unsigned long *entries; + unsigned int nr_entries; char *kbuf; - unsigned long entries[PAGE_OWNER_STACK_DEPTH]; - struct stack_trace trace = { - .nr_entries = 0, - .entries = entries, - .max_entries = PAGE_OWNER_STACK_DEPTH, - .skip = 0 - }; count = min_t(size_t, count, PAGE_SIZE); kbuf = kmalloc(count, GFP_KERNEL); @@ -375,8 +357,8 @@ print_page_owner(char __user *buf, size_ if (ret >= count) goto err; - depot_fetch_stack(handle, &trace); - ret += snprint_stack_trace(kbuf + ret, count - ret, &trace, 0); + nr_entries = stack_depot_fetch(handle, &entries); + ret += stack_trace_snprint(kbuf + ret, count - ret, entries, nr_entries, 0); if (ret >= count) goto err; @@ -407,14 +389,9 @@ void __dump_page_owner(struct page *page { struct page_ext *page_ext = lookup_page_ext(page); struct page_owner *page_owner; - unsigned l
[Intel-gfx] [patch V3 06/29] latency_top: Simplify stack trace handling
Replace the indirection through struct stack_trace with an invocation of the storage array based interface. Signed-off-by: Thomas Gleixner --- kernel/latencytop.c | 17 ++--- 1 file changed, 2 insertions(+), 15 deletions(-) --- a/kernel/latencytop.c +++ b/kernel/latencytop.c @@ -141,20 +141,6 @@ account_global_scheduler_latency(struct memcpy(&latency_record[i], lat, sizeof(struct latency_record)); } -/* - * Iterator to store a backtrace into a latency record entry - */ -static inline void store_stacktrace(struct task_struct *tsk, - struct latency_record *lat) -{ - struct stack_trace trace; - - memset(&trace, 0, sizeof(trace)); - trace.max_entries = LT_BACKTRACEDEPTH; - trace.entries = &lat->backtrace[0]; - save_stack_trace_tsk(tsk, &trace); -} - /** * __account_scheduler_latency - record an occurred latency * @tsk - the task struct of the task hitting the latency @@ -191,7 +177,8 @@ void __sched lat.count = 1; lat.time = usecs; lat.max = usecs; - store_stacktrace(tsk, &lat); + + stack_trace_save_tsk(tsk, lat.backtrace, LT_BACKTRACEDEPTH, 0); raw_spin_lock_irqsave(&latency_lock, flags); ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 16/29] drm: Simplify stacktrace handling
Replace the indirection through struct stack_trace by using the storage array based interfaces. The original code in all printing functions is really wrong. It allocates a storage array on stack which is unused because depot_fetch_stack() does not store anything in it. It overwrites the entries pointer in the stack_trace struct so it points to the depot storage. Signed-off-by: Thomas Gleixner Acked-by: Daniel Vetter Cc: intel-gfx@lists.freedesktop.org Cc: Joonas Lahtinen Cc: Maarten Lankhorst Cc: dri-de...@lists.freedesktop.org Cc: David Airlie Cc: Jani Nikula Cc: Rodrigo Vivi --- drivers/gpu/drm/drm_mm.c| 22 +++--- drivers/gpu/drm/i915/i915_vma.c | 11 --- drivers/gpu/drm/i915/intel_runtime_pm.c | 21 +++-- 3 files changed, 18 insertions(+), 36 deletions(-) --- a/drivers/gpu/drm/drm_mm.c +++ b/drivers/gpu/drm/drm_mm.c @@ -106,22 +106,19 @@ static noinline void save_stack(struct drm_mm_node *node) { unsigned long entries[STACKDEPTH]; - struct stack_trace trace = { - .entries = entries, - .max_entries = STACKDEPTH, - .skip = 1 - }; + unsigned int n; - save_stack_trace(&trace); + n = stack_trace_save(entries, ARRAY_SIZE(entries), 1); /* May be called under spinlock, so avoid sleeping */ - node->stack = depot_save_stack(&trace, GFP_NOWAIT); + node->stack = stack_depot_save(entries, n, GFP_NOWAIT); } static void show_leaks(struct drm_mm *mm) { struct drm_mm_node *node; - unsigned long entries[STACKDEPTH]; + unsigned long *entries; + unsigned int nr_entries; char *buf; buf = kmalloc(BUFSZ, GFP_KERNEL); @@ -129,19 +126,14 @@ static void show_leaks(struct drm_mm *mm return; list_for_each_entry(node, drm_mm_nodes(mm), node_list) { - struct stack_trace trace = { - .entries = entries, - .max_entries = STACKDEPTH - }; - if (!node->stack) { DRM_ERROR("node [%08llx + %08llx]: unknown owner\n", node->start, node->size); continue; } - depot_fetch_stack(node->stack, &trace); - snprint_stack_trace(buf, BUFSZ, &trace, 0); + nr_entries = stack_depot_fetch(node->stack, &entries); + stack_trace_snprint(buf, BUFSZ, entries, nr_entries, 0); DRM_ERROR("node [%08llx + %08llx]: inserted at\n%s", node->start, node->size, buf); } --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -36,11 +36,8 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason) { - unsigned long entries[12]; - struct stack_trace trace = { - .entries = entries, - .max_entries = ARRAY_SIZE(entries), - }; + unsigned long *entries; + unsigned int nr_entries; char buf[512]; if (!vma->node.stack) { @@ -49,8 +46,8 @@ static void vma_print_allocator(struct i return; } - depot_fetch_stack(vma->node.stack, &trace); - snprint_stack_trace(buf, sizeof(buf), &trace, 0); + nr_entries = stack_depot_fetch(vma->node.stack, &entries); + stack_trace_snprint(buf, sizeof(buf), entries, nr_entries, 0); DRM_DEBUG_DRIVER("vma.node [%08llx + %08llx] %s: inserted at %s\n", vma->node.start, vma->node.size, reason, buf); } --- a/drivers/gpu/drm/i915/intel_runtime_pm.c +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c @@ -60,27 +60,20 @@ static noinline depot_stack_handle_t __save_depot_stack(void) { unsigned long entries[STACKDEPTH]; - struct stack_trace trace = { - .entries = entries, - .max_entries = ARRAY_SIZE(entries), - .skip = 1, - }; + unsigned int n; - save_stack_trace(&trace); - return depot_save_stack(&trace, GFP_NOWAIT | __GFP_NOWARN); + n = stack_trace_save(entries, ARRAY_SIZE(entries), 1); + return stack_depot_save(entries, n, GFP_NOWAIT | __GFP_NOWARN); } static void __print_depot_stack(depot_stack_handle_t stack, char *buf, int sz, int indent) { - unsigned long entries[STACKDEPTH]; - struct stack_trace trace = { - .entries = entries, - .max_entries = ARRAY_SIZE(entries), - }; + unsigned long *entries; + unsigned int nr_entries; - depot_fetch_stack(stack, &trace); - snprint_stack_trace(buf, sz, &trace, indent); + nr_entries = stack_depot_fetch(stack, &entries); + stack_trace_snprint(buf, sz, entries, nr_entries, indent); } static void init_intel_runtime_pm_wakeref(struct drm_i915_private *i915) __
[Intel-gfx] [patch V3 08/29] mm/kmemleak: Simplify stacktrace handling
Replace the indirection through struct stack_trace by using the storage array based interfaces. Signed-off-by: Thomas Gleixner Acked-by: Catalin Marinas Cc: linux...@kvack.org --- mm/kmemleak.c | 24 +++- 1 file changed, 3 insertions(+), 21 deletions(-) --- a/mm/kmemleak.c +++ b/mm/kmemleak.c @@ -410,11 +410,6 @@ static void print_unreferenced(struct se */ static void dump_object_info(struct kmemleak_object *object) { - struct stack_trace trace; - - trace.nr_entries = object->trace_len; - trace.entries = object->trace; - pr_notice("Object 0x%08lx (size %zu):\n", object->pointer, object->size); pr_notice(" comm \"%s\", pid %d, jiffies %lu\n", @@ -424,7 +419,7 @@ static void dump_object_info(struct kmem pr_notice(" flags = 0x%x\n", object->flags); pr_notice(" checksum = %u\n", object->checksum); pr_notice(" backtrace:\n"); - print_stack_trace(&trace, 4); + stack_trace_print(object->trace, object->trace_len, 4); } /* @@ -553,15 +548,7 @@ static struct kmemleak_object *find_and_ */ static int __save_stack_trace(unsigned long *trace) { - struct stack_trace stack_trace; - - stack_trace.max_entries = MAX_TRACE; - stack_trace.nr_entries = 0; - stack_trace.entries = trace; - stack_trace.skip = 2; - save_stack_trace(&stack_trace); - - return stack_trace.nr_entries; + return stack_trace_save(trace, MAX_TRACE, 2); } /* @@ -2019,13 +2006,8 @@ early_param("kmemleak", kmemleak_boot_co static void __init print_log_trace(struct early_log *log) { - struct stack_trace trace; - - trace.nr_entries = log->trace_len; - trace.entries = log->trace; - pr_notice("Early log backtrace:\n"); - print_stack_trace(&trace, 2); + stack_trace_print(log->trace, log->trace_len, 2); } /* ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 12/29] dma/debug: Simplify stracktrace retrieval
Replace the indirection through struct stack_trace with an invocation of the storage array based interface. Signed-off-by: Thomas Gleixner Reviewed-by: Christoph Hellwig Cc: io...@lists.linux-foundation.org Cc: Robin Murphy Cc: Marek Szyprowski --- kernel/dma/debug.c | 14 ++ 1 file changed, 6 insertions(+), 8 deletions(-) --- a/kernel/dma/debug.c +++ b/kernel/dma/debug.c @@ -89,8 +89,8 @@ struct dma_debug_entry { int sg_mapped_ents; enum map_err_types map_err_type; #ifdef CONFIG_STACKTRACE - struct stack_trace stacktrace; - unsigned longst_entries[DMA_DEBUG_STACKTRACE_ENTRIES]; + unsigned intstack_len; + unsigned long stack_entries[DMA_DEBUG_STACKTRACE_ENTRIES]; #endif }; @@ -174,7 +174,7 @@ static inline void dump_entry_trace(stru #ifdef CONFIG_STACKTRACE if (entry) { pr_warning("Mapped at:\n"); - print_stack_trace(&entry->stacktrace, 0); + stack_trace_print(entry->stack_entries, entry->stack_len, 0); } #endif } @@ -704,12 +704,10 @@ static struct dma_debug_entry *dma_entry spin_unlock_irqrestore(&free_entries_lock, flags); #ifdef CONFIG_STACKTRACE - entry->stacktrace.max_entries = DMA_DEBUG_STACKTRACE_ENTRIES; - entry->stacktrace.entries = entry->st_entries; - entry->stacktrace.skip = 1; - save_stack_trace(&entry->stacktrace); + entry->stack_len = stack_trace_save(entry->stack_entries, + ARRAY_SIZE(entry->stack_entries), + 1); #endif - return entry; } ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 07/29] mm/slub: Simplify stack trace retrieval
Replace the indirection through struct stack_trace with an invocation of the storage array based interface. Signed-off-by: Thomas Gleixner Acked-by: Christoph Lameter Cc: Andrew Morton Cc: Pekka Enberg Cc: linux...@kvack.org Cc: David Rientjes --- mm/slub.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) --- a/mm/slub.c +++ b/mm/slub.c @@ -552,18 +552,14 @@ static void set_track(struct kmem_cache if (addr) { #ifdef CONFIG_STACKTRACE - struct stack_trace trace; + unsigned int nr_entries; - trace.nr_entries = 0; - trace.max_entries = TRACK_ADDRS_COUNT; - trace.entries = p->addrs; - trace.skip = 3; metadata_access_enable(); - save_stack_trace(&trace); + nr_entries = stack_trace_save(p->addrs, TRACK_ADDRS_COUNT, 3); metadata_access_disable(); - if (trace.nr_entries < TRACK_ADDRS_COUNT) - p->addrs[trace.nr_entries] = 0; + if (nr_entries < TRACK_ADDRS_COUNT) + p->addrs[nr_entries] = 0; #endif p->addr = addr; p->cpu = smp_processor_id(); ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 18/29] lockdep: Remove save argument from check_prev_add()
There is only one caller which hands in save_trace as function pointer. Signed-off-by: Thomas Gleixner --- kernel/locking/lockdep.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -2158,8 +2158,7 @@ check_deadlock(struct task_struct *curr, */ static int check_prev_add(struct task_struct *curr, struct held_lock *prev, - struct held_lock *next, int distance, struct stack_trace *trace, - int (*save)(struct stack_trace *trace)) + struct held_lock *next, int distance, struct stack_trace *trace) { struct lock_list *uninitialized_var(target_entry); struct lock_list *entry; @@ -2199,11 +2198,11 @@ check_prev_add(struct task_struct *curr, if (unlikely(!ret)) { if (!trace->entries) { /* -* If @save fails here, the printing might trigger -* a WARN but because of the !nr_entries it should -* not do bad things. +* If save_trace fails here, the printing might +* trigger a WARN but because of the !nr_entries it +* should not do bad things. */ - save(trace); + save_trace(trace); } return print_circular_bug(&this, target_entry, next, prev); } @@ -2253,7 +2252,7 @@ check_prev_add(struct task_struct *curr, return print_bfs_bug(ret); - if (!trace->entries && !save(trace)) + if (!trace->entries && !save_trace(trace)) return 0; /* @@ -2318,7 +2317,8 @@ check_prevs_add(struct task_struct *curr * added: */ if (hlock->read != 2 && hlock->check) { - int ret = check_prev_add(curr, hlock, next, distance, &trace, save_trace); + int ret = check_prev_add(curr, hlock, next, distance, +&trace); if (!ret) return 0; ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 15/29] dm persistent data: Simplify stack trace handling
Replace the indirection through struct stack_trace with an invocation of the storage array based interface. This results in less storage space and indirection. Signed-off-by: Thomas Gleixner Cc: dm-de...@redhat.com Cc: Mike Snitzer Cc: Alasdair Kergon --- drivers/md/persistent-data/dm-block-manager.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) --- a/drivers/md/persistent-data/dm-block-manager.c +++ b/drivers/md/persistent-data/dm-block-manager.c @@ -35,7 +35,10 @@ #define MAX_HOLDERS 4 #define MAX_STACK 10 -typedef unsigned long stack_entries[MAX_STACK]; +struct stack_store { + unsigned intnr_entries; + unsigned long entries[MAX_STACK]; +}; struct block_lock { spinlock_t lock; @@ -44,8 +47,7 @@ struct block_lock { struct task_struct *holders[MAX_HOLDERS]; #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING - struct stack_trace traces[MAX_HOLDERS]; - stack_entries entries[MAX_HOLDERS]; + struct stack_store traces[MAX_HOLDERS]; #endif }; @@ -73,7 +75,7 @@ static void __add_holder(struct block_lo { unsigned h = __find_holder(lock, NULL); #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING - struct stack_trace *t; + struct stack_store *t; #endif get_task_struct(task); @@ -81,11 +83,7 @@ static void __add_holder(struct block_lo #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING t = lock->traces + h; - t->nr_entries = 0; - t->max_entries = MAX_STACK; - t->entries = lock->entries[h]; - t->skip = 2; - save_stack_trace(t); + t->nr_entries = stack_trace_save(t->entries, MAX_STACK, 2); #endif } @@ -106,7 +104,8 @@ static int __check_holder(struct block_l DMERR("recursive lock detected in metadata"); #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING DMERR("previously held here:"); - print_stack_trace(lock->traces + i, 4); + stack_trace_print(lock->traces[i].entries, + lock->traces[i].nr_entries, 4); DMERR("subsequent acquisition attempted here:"); dump_stack(); ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 26/29] stacktrace: Remove obsolete functions
No more users of the struct stack_trace based interfaces. Remove them. Remove the macro stubs for !CONFIG_STACKTRACE as well as they are pointless because the storage on the call sites is conditional on CONFIG_STACKTRACE already. No point to be 'smart'. Signed-off-by: Thomas Gleixner --- include/linux/stacktrace.h | 17 - kernel/stacktrace.c| 14 -- 2 files changed, 31 deletions(-) --- a/include/linux/stacktrace.h +++ b/include/linux/stacktrace.h @@ -36,24 +36,7 @@ extern void save_stack_trace_tsk(struct struct stack_trace *trace); extern int save_stack_trace_tsk_reliable(struct task_struct *tsk, struct stack_trace *trace); - -extern void print_stack_trace(struct stack_trace *trace, int spaces); -extern int snprint_stack_trace(char *buf, size_t size, - struct stack_trace *trace, int spaces); - -#ifdef CONFIG_USER_STACKTRACE_SUPPORT extern void save_stack_trace_user(struct stack_trace *trace); -#else -# define save_stack_trace_user(trace) do { } while (0) -#endif - -#else /* !CONFIG_STACKTRACE */ -# define save_stack_trace(trace) do { } while (0) -# define save_stack_trace_tsk(tsk, trace) do { } while (0) -# define save_stack_trace_user(trace) do { } while (0) -# define print_stack_trace(trace, spaces) do { } while (0) -# define snprint_stack_trace(buf, size, trace, spaces) do { } while (0) -# define save_stack_trace_tsk_reliable(tsk, trace) ({ -ENOSYS; }) #endif /* CONFIG_STACKTRACE */ #if defined(CONFIG_STACKTRACE) && defined(CONFIG_HAVE_RELIABLE_STACKTRACE) --- a/kernel/stacktrace.c +++ b/kernel/stacktrace.c @@ -30,12 +30,6 @@ void stack_trace_print(unsigned long *en } EXPORT_SYMBOL_GPL(stack_trace_print); -void print_stack_trace(struct stack_trace *trace, int spaces) -{ - stack_trace_print(trace->entries, trace->nr_entries, spaces); -} -EXPORT_SYMBOL_GPL(print_stack_trace); - /** * stack_trace_snprint - Print the entries in the stack trace into a buffer * @buf: Pointer to the print buffer @@ -72,14 +66,6 @@ int stack_trace_snprint(char *buf, size_ } EXPORT_SYMBOL_GPL(stack_trace_snprint); -int snprint_stack_trace(char *buf, size_t size, - struct stack_trace *trace, int spaces) -{ - return stack_trace_snprint(buf, size, trace->entries, - trace->nr_entries, spaces); -} -EXPORT_SYMBOL_GPL(snprint_stack_trace); - /* * Architectures that do not implement save_stack_trace_*() * get these weak aliases and once-per-bootup warnings ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 19/29] lockdep: Simplify stack trace handling
Replace the indirection through struct stack_trace by using the storage array based interfaces and storing the information is a small lockdep specific data structure. Signed-off-by: Thomas Gleixner Acked-by: Peter Zijlstra (Intel) --- include/linux/lockdep.h |9 +-- kernel/locking/lockdep.c | 55 +++ 2 files changed, 35 insertions(+), 29 deletions(-) --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -66,6 +66,11 @@ struct lock_class_key { extern struct lock_class_key __lockdep_no_validate__; +struct lock_trace { + unsigned intnr_entries; + unsigned intoffset; +}; + #define LOCKSTAT_POINTS4 /* @@ -100,7 +105,7 @@ struct lock_class { * IRQ/softirq usage tracking bits: */ unsigned long usage_mask; - struct stack_trace usage_traces[XXX_LOCK_USAGE_STATES]; + struct lock_trace usage_traces[XXX_LOCK_USAGE_STATES]; /* * Generation counter, when doing certain classes of graph walking, @@ -188,7 +193,7 @@ struct lock_list { struct list_headentry; struct lock_class *class; struct lock_class *links_to; - struct stack_trace trace; + struct lock_trace trace; int distance; /* --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -434,18 +434,14 @@ static void print_lockdep_off(const char #endif } -static int save_trace(struct stack_trace *trace) +static int save_trace(struct lock_trace *trace) { - trace->nr_entries = 0; - trace->max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries; - trace->entries = stack_trace + nr_stack_trace_entries; - - trace->skip = 3; - - save_stack_trace(trace); - - trace->max_entries = trace->nr_entries; + unsigned long *entries = stack_trace + nr_stack_trace_entries; + unsigned int max_entries; + trace->offset = nr_stack_trace_entries; + max_entries = MAX_STACK_TRACE_ENTRIES - nr_stack_trace_entries; + trace->nr_entries = stack_trace_save(entries, max_entries, 3); nr_stack_trace_entries += trace->nr_entries; if (nr_stack_trace_entries >= MAX_STACK_TRACE_ENTRIES-1) { @@ -1196,7 +1192,7 @@ static struct lock_list *alloc_list_entr static int add_lock_to_list(struct lock_class *this, struct lock_class *links_to, struct list_head *head, unsigned long ip, int distance, - struct stack_trace *trace) + struct lock_trace *trace) { struct lock_list *entry; /* @@ -1415,6 +1411,13 @@ static inline int __bfs_backwards(struct * checking. */ +static void print_lock_trace(struct lock_trace *trace, unsigned int spaces) +{ + unsigned long *entries = stack_trace + trace->offset; + + stack_trace_print(entries, trace->nr_entries, spaces); +} + /* * Print a dependency chain entry (this is only done when a deadlock * has been detected): @@ -1427,8 +1430,7 @@ print_circular_bug_entry(struct lock_lis printk("\n-> #%u", depth); print_lock_name(target->class); printk(KERN_CONT ":\n"); - print_stack_trace(&target->trace, 6); - + print_lock_trace(&target->trace, 6); return 0; } @@ -1740,7 +1742,7 @@ static void print_lock_class_header(stru len += printk("%*s %s", depth, "", usage_str[bit]); len += printk(KERN_CONT " at:\n"); - print_stack_trace(class->usage_traces + bit, len); + print_lock_trace(class->usage_traces + bit, len); } } printk("%*s }\n", depth, ""); @@ -1765,7 +1767,7 @@ print_shortest_lock_dependencies(struct do { print_lock_class_header(entry->class, depth); printk("%*s ... acquired at:\n", depth, ""); - print_stack_trace(&entry->trace, 2); + print_lock_trace(&entry->trace, 2); printk("\n"); if (depth == 0 && (entry != root)) { @@ -1878,14 +1880,14 @@ print_bad_irq_dependency(struct task_str print_lock_name(backwards_entry->class); pr_warn("\n... which became %s-irq-safe at:\n", irqclass); - print_stack_trace(backwards_entry->class->usage_traces + bit1, 1); + print_lock_trace(backwards_entry->class->usage_traces + bit1, 1); pr_warn("\nto a %s-irq-unsafe lock:\n", irqclass); print_lock_name(forwards_entry->class); pr_warn("\n... which became %s-irq-unsafe at:\n", irqclass); pr_warn("..."); - print_stack_trace(forwards_entry->class->usage_traces + bit2, 1); + print_lock_trace(forwards_entry->class->usage_traces + bit2, 1);
[Intel-gfx] [patch V3 13/29] btrfs: ref-verify: Simplify stack trace retrieval
Replace the indirection through struct stack_trace with an invocation of the storage array based interface. Signed-off-by: Thomas Gleixner Reviewed-by: Johannes Thumshirn Acked-by: David Sterba Cc: Chris Mason Cc: Josef Bacik Cc: linux-bt...@vger.kernel.org --- fs/btrfs/ref-verify.c | 15 ++- 1 file changed, 2 insertions(+), 13 deletions(-) --- a/fs/btrfs/ref-verify.c +++ b/fs/btrfs/ref-verify.c @@ -205,28 +205,17 @@ static struct root_entry *lookup_root_en #ifdef CONFIG_STACKTRACE static void __save_stack_trace(struct ref_action *ra) { - struct stack_trace stack_trace; - - stack_trace.max_entries = MAX_TRACE; - stack_trace.nr_entries = 0; - stack_trace.entries = ra->trace; - stack_trace.skip = 2; - save_stack_trace(&stack_trace); - ra->trace_len = stack_trace.nr_entries; + ra->trace_len = stack_trace_save(ra->trace, MAX_TRACE, 2); } static void __print_stack_trace(struct btrfs_fs_info *fs_info, struct ref_action *ra) { - struct stack_trace trace; - if (ra->trace_len == 0) { btrfs_err(fs_info, " ref-verify: no stacktrace"); return; } - trace.nr_entries = ra->trace_len; - trace.entries = ra->trace; - print_stack_trace(&trace, 2); + stack_trace_print(ra->trace, ra->trace_len, 2); } #else static void inline __save_stack_trace(struct ref_action *ra) ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 02/29] stacktrace: Provide helpers for common stack trace operations
All operations with stack traces are based on struct stack_trace. That's a horrible construct as the struct is a kitchen sink for input and output. Quite some usage sites embed it into their own data structures which creates weird indirections. There is absolutely no point in doing so. For all use cases a storage array and the number of valid stack trace entries in the array is sufficient. Provide helper functions which avoid the struct stack_trace indirection so the usage sites can be cleaned up. Signed-off-by: Thomas Gleixner --- V3: Fix kernel doc. --- include/linux/stacktrace.h | 27 +++ kernel/stacktrace.c| 170 + 2 files changed, 182 insertions(+), 15 deletions(-) --- a/include/linux/stacktrace.h +++ b/include/linux/stacktrace.h @@ -3,11 +3,26 @@ #define __LINUX_STACKTRACE_H #include +#include struct task_struct; struct pt_regs; #ifdef CONFIG_STACKTRACE +void stack_trace_print(unsigned long *trace, unsigned int nr_entries, + int spaces); +int stack_trace_snprint(char *buf, size_t size, unsigned long *entries, + unsigned int nr_entries, int spaces); +unsigned int stack_trace_save(unsigned long *store, unsigned int size, + unsigned int skipnr); +unsigned int stack_trace_save_tsk(struct task_struct *task, + unsigned long *store, unsigned int size, + unsigned int skipnr); +unsigned int stack_trace_save_regs(struct pt_regs *regs, unsigned long *store, + unsigned int size, unsigned int skipnr); +unsigned int stack_trace_save_user(unsigned long *store, unsigned int size); + +/* Internal interfaces. Do not use in generic code */ struct stack_trace { unsigned int nr_entries, max_entries; unsigned long *entries; @@ -41,4 +56,16 @@ extern void save_stack_trace_user(struct # define save_stack_trace_tsk_reliable(tsk, trace) ({ -ENOSYS; }) #endif /* CONFIG_STACKTRACE */ +#if defined(CONFIG_STACKTRACE) && defined(CONFIG_HAVE_RELIABLE_STACKTRACE) +int stack_trace_save_tsk_reliable(struct task_struct *tsk, unsigned long *store, + unsigned int size); +#else +static inline int stack_trace_save_tsk_reliable(struct task_struct *tsk, + unsigned long *store, + unsigned int size) +{ + return -ENOSYS; +} +#endif + #endif /* __LINUX_STACKTRACE_H */ --- a/kernel/stacktrace.c +++ b/kernel/stacktrace.c @@ -11,35 +11,54 @@ #include #include -void print_stack_trace(struct stack_trace *trace, int spaces) +/** + * stack_trace_print - Print the entries in the stack trace + * @entries: Pointer to storage array + * @nr_entries:Number of entries in the storage array + * @spaces:Number of leading spaces to print + */ +void stack_trace_print(unsigned long *entries, unsigned int nr_entries, + int spaces) { - int i; + unsigned int i; - if (WARN_ON(!trace->entries)) + if (WARN_ON(!entries)) return; - for (i = 0; i < trace->nr_entries; i++) - printk("%*c%pS\n", 1 + spaces, ' ', (void *)trace->entries[i]); + for (i = 0; i < nr_entries; i++) + printk("%*c%pS\n", 1 + spaces, ' ', (void *)entries[i]); +} +EXPORT_SYMBOL_GPL(stack_trace_print); + +void print_stack_trace(struct stack_trace *trace, int spaces) +{ + stack_trace_print(trace->entries, trace->nr_entries, spaces); } EXPORT_SYMBOL_GPL(print_stack_trace); -int snprint_stack_trace(char *buf, size_t size, - struct stack_trace *trace, int spaces) +/** + * stack_trace_snprint - Print the entries in the stack trace into a buffer + * @buf: Pointer to the print buffer + * @size: Size of the print buffer + * @entries: Pointer to storage array + * @nr_entries:Number of entries in the storage array + * @spaces:Number of leading spaces to print + * + * Return: Number of bytes printed. + */ +int stack_trace_snprint(char *buf, size_t size, unsigned long *entries, + unsigned int nr_entries, int spaces) { - int i; - int generated; - int total = 0; + unsigned int generated, i, total = 0; - if (WARN_ON(!trace->entries)) + if (WARN_ON(!entries)) return 0; - for (i = 0; i < trace->nr_entries; i++) { + for (i = 0; i < nr_entries && size; i++) { generated = snprintf(buf, size, "%*c%pS\n", 1 + spaces, ' ', -(void *)trace->entries[i]); +(void *)entries[i]); total += generated; - - /* Assume that generated isn't a negative number */ if (generated >= size) { buf += size;
[Intel-gfx] [patch V3 27/29] lib/stackdepot: Remove obsolete functions
No more users of the struct stack_trace based interfaces. Signed-off-by: Thomas Gleixner Acked-by: Alexander Potapenko --- include/linux/stackdepot.h |4 lib/stackdepot.c | 20 2 files changed, 24 deletions(-) --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -23,13 +23,9 @@ typedef u32 depot_stack_handle_t; -struct stack_trace; - -depot_stack_handle_t depot_save_stack(struct stack_trace *trace, gfp_t flags); depot_stack_handle_t stack_depot_save(unsigned long *entries, unsigned int nr_entries, gfp_t gfp_flags); -void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace); unsigned int stack_depot_fetch(depot_stack_handle_t handle, unsigned long **entries); --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -216,14 +216,6 @@ unsigned int stack_depot_fetch(depot_sta } EXPORT_SYMBOL_GPL(stack_depot_fetch); -void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace) -{ - unsigned int nent = stack_depot_fetch(handle, &trace->entries); - - trace->max_entries = trace->nr_entries = nent; -} -EXPORT_SYMBOL_GPL(depot_fetch_stack); - /** * stack_depot_save - Save a stack trace from an array * @@ -318,15 +310,3 @@ depot_stack_handle_t stack_depot_save(un return retval; } EXPORT_SYMBOL_GPL(stack_depot_save); - -/** - * depot_save_stack - save stack in a stack depot. - * @trace - the stacktrace to save. - * @alloc_flags - flags for allocating additional memory if required. - */ -depot_stack_handle_t depot_save_stack(struct stack_trace *trace, - gfp_t alloc_flags) -{ - return stack_depot_save(trace->entries, trace->nr_entries, alloc_flags); -} -EXPORT_SYMBOL_GPL(depot_save_stack); ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 21/29] tracing: Use percpu stack trace buffer more intelligently
The per cpu stack trace buffer usage pattern is odd at best. The buffer has place for 512 stack trace entries on 64-bit and 1024 on 32-bit. When interrupts or exceptions nest after the per cpu buffer was acquired the stacktrace length is hardcoded to 8 entries. 512/1024 stack trace entries in kernel stacks are unrealistic so the buffer is a complete waste. Split the buffer into 4 nest levels, which are 128/256 entries per level. This allows nesting contexts (interrupts, exceptions) to utilize the cpu buffer for stack retrieval and avoids the fixed length allocation along with the conditional execution pathes. Signed-off-by: Thomas Gleixner Cc: Steven Rostedt --- V3: Limit to 4 nest levels and increase size per level. --- kernel/trace/trace.c | 77 +-- 1 file changed, 39 insertions(+), 38 deletions(-) --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2749,12 +2749,21 @@ trace_function(struct trace_array *tr, #ifdef CONFIG_STACKTRACE -#define FTRACE_STACK_MAX_ENTRIES (PAGE_SIZE / sizeof(unsigned long)) +/* Allow 4 levels of nesting: normal, softirq, irq, NMI */ +#define FTRACE_KSTACK_NESTING 4 + +#define FTRACE_KSTACK_ENTRIES (PAGE_SIZE / FTRACE_KSTACK_NESTING) + struct ftrace_stack { - unsigned long calls[FTRACE_STACK_MAX_ENTRIES]; + unsigned long calls[FTRACE_KSTACK_ENTRIES]; +}; + + +struct ftrace_stacks { + struct ftrace_stack stacks[FTRACE_KSTACK_NESTING]; }; -static DEFINE_PER_CPU(struct ftrace_stack, ftrace_stack); +static DEFINE_PER_CPU(struct ftrace_stacks, ftrace_stacks); static DEFINE_PER_CPU(int, ftrace_stack_reserve); static void __ftrace_trace_stack(struct ring_buffer *buffer, @@ -2763,10 +2772,11 @@ static void __ftrace_trace_stack(struct { struct trace_event_call *call = &event_kernel_stack; struct ring_buffer_event *event; + struct ftrace_stack *fstack; struct stack_entry *entry; struct stack_trace trace; - int use_stack; - int size = FTRACE_STACK_ENTRIES; + int size = FTRACE_KSTACK_ENTRIES; + int stackidx; trace.nr_entries= 0; trace.skip = skip; @@ -2788,29 +2798,32 @@ static void __ftrace_trace_stack(struct */ preempt_disable_notrace(); - use_stack = __this_cpu_inc_return(ftrace_stack_reserve); + stackidx = __this_cpu_inc_return(ftrace_stack_reserve); + + /* This should never happen. If it does, yell once and skip */ + if (WARN_ON_ONCE(stackidx >= FTRACE_KSTACK_NESTING)) + goto out; + /* -* We don't need any atomic variables, just a barrier. -* If an interrupt comes in, we don't care, because it would -* have exited and put the counter back to what we want. -* We just need a barrier to keep gcc from moving things -* around. +* The above __this_cpu_inc_return() is 'atomic' cpu local. An +* interrupt will either see the value pre increment or post +* increment. If the interrupt happens pre increment it will have +* restored the counter when it returns. We just need a barrier to +* keep gcc from moving things around. */ barrier(); - if (use_stack == 1) { - trace.entries = this_cpu_ptr(ftrace_stack.calls); - trace.max_entries = FTRACE_STACK_MAX_ENTRIES; - - if (regs) - save_stack_trace_regs(regs, &trace); - else - save_stack_trace(&trace); - - if (trace.nr_entries > size) - size = trace.nr_entries; - } else - /* From now on, use_stack is a boolean */ - use_stack = 0; + + fstack = this_cpu_ptr(ftrace_stacks.stacks) + (stackidx - 1); + trace.entries = fstack->calls; + trace.max_entries = FTRACE_KSTACK_ENTRIES; + + if (regs) + save_stack_trace_regs(regs, &trace); + else + save_stack_trace(&trace); + + if (trace.nr_entries > size) + size = trace.nr_entries; size *= sizeof(unsigned long); @@ -2820,19 +2833,7 @@ static void __ftrace_trace_stack(struct goto out; entry = ring_buffer_event_data(event); - memset(&entry->caller, 0, size); - - if (use_stack) - memcpy(&entry->caller, trace.entries, - trace.nr_entries * sizeof(unsigned long)); - else { - trace.max_entries = FTRACE_STACK_ENTRIES; - trace.entries = entry->caller; - if (regs) - save_stack_trace_regs(regs, &trace); - else - save_stack_trace(&trace); - } + memcpy(&entry->caller, trace.entries, size); entry->size = trace.nr_entries; __
[Intel-gfx] [patch V3 22/29] tracing: Make ftrace_trace_userstack() static and conditional
It's only used in trace.c and there is absolutely no point in compiling it in when user space stack traces are not supported. Signed-off-by: Thomas Gleixner Reviewed-by: Steven Rostedt --- kernel/trace/trace.c | 14 -- kernel/trace/trace.h |8 2 files changed, 8 insertions(+), 14 deletions(-) --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -159,6 +159,8 @@ static union trace_eval_map_item *trace_ #endif /* CONFIG_TRACE_EVAL_MAP_FILE */ static int tracing_set_tracer(struct trace_array *tr, const char *buf); +static void ftrace_trace_userstack(struct ring_buffer *buffer, + unsigned long flags, int pc); #define MAX_TRACER_SIZE100 static char bootup_tracer_buf[MAX_TRACER_SIZE] __initdata; @@ -2905,9 +2907,10 @@ void trace_dump_stack(int skip) } EXPORT_SYMBOL_GPL(trace_dump_stack); +#ifdef CONFIG_USER_STACKTRACE_SUPPORT static DEFINE_PER_CPU(int, user_stack_count); -void +static void ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, int pc) { struct trace_event_call *call = &event_user_stack; @@ -2958,13 +2961,12 @@ ftrace_trace_userstack(struct ring_buffe out: preempt_enable(); } - -#ifdef UNUSED -static void __trace_userstack(struct trace_array *tr, unsigned long flags) +#else /* CONFIG_USER_STACKTRACE_SUPPORT */ +static void ftrace_trace_userstack(struct ring_buffer *buffer, + unsigned long flags, int pc) { - ftrace_trace_userstack(tr, flags, preempt_count()); } -#endif /* UNUSED */ +#endif /* !CONFIG_USER_STACKTRACE_SUPPORT */ #endif /* CONFIG_STACKTRACE */ --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -782,17 +782,9 @@ void update_max_tr_single(struct trace_a #endif /* CONFIG_TRACER_MAX_TRACE */ #ifdef CONFIG_STACKTRACE -void ftrace_trace_userstack(struct ring_buffer *buffer, unsigned long flags, - int pc); - void __trace_stack(struct trace_array *tr, unsigned long flags, int skip, int pc); #else -static inline void ftrace_trace_userstack(struct ring_buffer *buffer, - unsigned long flags, int pc) -{ -} - static inline void __trace_stack(struct trace_array *tr, unsigned long flags, int skip, int pc) { ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 05/29] proc: Simplify task stack retrieval
Replace the indirection through struct stack_trace with an invocation of the storage array based interface. Signed-off-by: Thomas Gleixner Reviewed-by: Alexey Dobriyan Cc: Andrew Morton --- fs/proc/base.c | 14 +- 1 file changed, 5 insertions(+), 9 deletions(-) --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -407,7 +407,6 @@ static void unlock_trace(struct task_str static int proc_pid_stack(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { - struct stack_trace trace; unsigned long *entries; int err; @@ -430,20 +429,17 @@ static int proc_pid_stack(struct seq_fil if (!entries) return -ENOMEM; - trace.nr_entries= 0; - trace.max_entries = MAX_STACK_TRACE_DEPTH; - trace.entries = entries; - trace.skip = 0; - err = lock_trace(task); if (!err) { - unsigned int i; + unsigned int i, nr_entries; - save_stack_trace_tsk(task, &trace); + nr_entries = stack_trace_save_tsk(task, entries, + MAX_STACK_TRACE_DEPTH, 0); - for (i = 0; i < trace.nr_entries; i++) { + for (i = 0; i < nr_entries; i++) { seq_printf(m, "[<0>] %pB\n", (void *)entries[i]); } + unlock_trace(task); } kfree(entries); ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 11/29] fault-inject: Simplify stacktrace retrieval
Replace the indirection through struct stack_trace with an invocation of the storage array based interface. Signed-off-by: Thomas Gleixner Cc: Akinobu Mita --- lib/fault-inject.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) --- a/lib/fault-inject.c +++ b/lib/fault-inject.c @@ -65,22 +65,16 @@ static bool fail_task(struct fault_attr static bool fail_stacktrace(struct fault_attr *attr) { - struct stack_trace trace; int depth = attr->stacktrace_depth; unsigned long entries[MAX_STACK_TRACE_DEPTH]; - int n; + int n, nr_entries; bool found = (attr->require_start == 0 && attr->require_end == ULONG_MAX); if (depth == 0) return found; - trace.nr_entries = 0; - trace.entries = entries; - trace.max_entries = depth; - trace.skip = 1; - - save_stack_trace(&trace); - for (n = 0; n < trace.nr_entries; n++) { + nr_entries = stack_trace_save(entries, depth, 1); + for (n = 0; n < nr_entries; n++) { if (attr->reject_start <= entries[n] && entries[n] < attr->reject_end) return false; ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 14/29] dm bufio: Simplify stack trace retrieval
Replace the indirection through struct stack_trace with an invocation of the storage array based interface. Signed-off-by: Thomas Gleixner Cc: dm-de...@redhat.com Cc: Mike Snitzer Cc: Alasdair Kergon --- drivers/md/dm-bufio.c | 15 ++- 1 file changed, 6 insertions(+), 9 deletions(-) --- a/drivers/md/dm-bufio.c +++ b/drivers/md/dm-bufio.c @@ -150,7 +150,7 @@ struct dm_buffer { void (*end_io)(struct dm_buffer *, blk_status_t); #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING #define MAX_STACK 10 - struct stack_trace stack_trace; + unsigned int stack_len; unsigned long stack_entries[MAX_STACK]; #endif }; @@ -232,11 +232,7 @@ static DEFINE_MUTEX(dm_bufio_clients_loc #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING static void buffer_record_stack(struct dm_buffer *b) { - b->stack_trace.nr_entries = 0; - b->stack_trace.max_entries = MAX_STACK; - b->stack_trace.entries = b->stack_entries; - b->stack_trace.skip = 2; - save_stack_trace(&b->stack_trace); + b->stack_len = stack_trace_save(b->stack_entries, MAX_STACK, 2); } #endif @@ -438,7 +434,7 @@ static struct dm_buffer *alloc_buffer(st adjust_total_allocated(b->data_mode, (long)c->block_size); #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING - memset(&b->stack_trace, 0, sizeof(b->stack_trace)); + b->stack_len = 0; #endif return b; } @@ -1520,8 +1516,9 @@ static void drop_buffers(struct dm_bufio DMERR("leaked buffer %llx, hold count %u, list %d", (unsigned long long)b->block, b->hold_count, i); #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING - print_stack_trace(&b->stack_trace, 1); - b->hold_count = 0; /* mark unclaimed to avoid BUG_ON below */ + stack_trace_print(b->stack_entries, b->stack_len, 1); + /* mark unclaimed to avoid BUG_ON below */ + b->hold_count = 0; #endif } ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 03/29] lib/stackdepot: Provide functions which operate on plain storage arrays
The struct stack_trace indirection in the stack depot functions is a truly pointless excercise which requires horrible code at the callsites. Provide interfaces based on plain storage arrays. Signed-off-by: Thomas Gleixner Acked-by: Alexander Potapenko --- V3: Fix kernel-doc --- include/linux/stackdepot.h |4 ++ lib/stackdepot.c | 70 - 2 files changed, 55 insertions(+), 19 deletions(-) --- a/include/linux/stackdepot.h +++ b/include/linux/stackdepot.h @@ -26,7 +26,11 @@ typedef u32 depot_stack_handle_t; struct stack_trace; depot_stack_handle_t depot_save_stack(struct stack_trace *trace, gfp_t flags); +depot_stack_handle_t stack_depot_save(unsigned long *entries, + unsigned int nr_entries, gfp_t gfp_flags); void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace); +unsigned int stack_depot_fetch(depot_stack_handle_t handle, + unsigned long **entries); #endif --- a/lib/stackdepot.c +++ b/lib/stackdepot.c @@ -194,40 +194,60 @@ static inline struct stack_record *find_ return NULL; } -void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace) +/** + * stack_depot_fetch - Fetch stack entries from a depot + * + * @handle:Stack depot handle which was returned from + * stack_depot_save(). + * @entries: Pointer to store the entries address + * + * Return: The number of trace entries for this depot. + */ +unsigned int stack_depot_fetch(depot_stack_handle_t handle, + unsigned long **entries) { union handle_parts parts = { .handle = handle }; void *slab = stack_slabs[parts.slabindex]; size_t offset = parts.offset << STACK_ALLOC_ALIGN; struct stack_record *stack = slab + offset; - trace->nr_entries = trace->max_entries = stack->size; - trace->entries = stack->entries; - trace->skip = 0; + *entries = stack->entries; + return stack->size; +} +EXPORT_SYMBOL_GPL(stack_depot_fetch); + +void depot_fetch_stack(depot_stack_handle_t handle, struct stack_trace *trace) +{ + unsigned int nent = stack_depot_fetch(handle, &trace->entries); + + trace->max_entries = trace->nr_entries = nent; } EXPORT_SYMBOL_GPL(depot_fetch_stack); /** - * depot_save_stack - save stack in a stack depot. - * @trace - the stacktrace to save. - * @alloc_flags - flags for allocating additional memory if required. + * stack_depot_save - Save a stack trace from an array + * + * @entries: Pointer to storage array + * @nr_entries:Size of the storage array + * @alloc_flags: Allocation gfp flags * - * Returns the handle of the stack struct stored in depot. + * Return: The handle of the stack struct stored in depot */ -depot_stack_handle_t depot_save_stack(struct stack_trace *trace, - gfp_t alloc_flags) +depot_stack_handle_t stack_depot_save(unsigned long *entries, + unsigned int nr_entries, + gfp_t alloc_flags) { - u32 hash; - depot_stack_handle_t retval = 0; struct stack_record *found = NULL, **bucket; - unsigned long flags; + depot_stack_handle_t retval = 0; struct page *page = NULL; void *prealloc = NULL; + unsigned long flags; + u32 hash; - if (unlikely(trace->nr_entries == 0)) + if (unlikely(nr_entries == 0)) goto fast_exit; - hash = hash_stack(trace->entries, trace->nr_entries); + hash = hash_stack(entries, nr_entries); bucket = &stack_table[hash & STACK_HASH_MASK]; /* @@ -235,8 +255,8 @@ depot_stack_handle_t depot_save_stack(st * The smp_load_acquire() here pairs with smp_store_release() to * |bucket| below. */ - found = find_stack(smp_load_acquire(bucket), trace->entries, - trace->nr_entries, hash); + found = find_stack(smp_load_acquire(bucket), entries, + nr_entries, hash); if (found) goto exit; @@ -264,10 +284,10 @@ depot_stack_handle_t depot_save_stack(st spin_lock_irqsave(&depot_lock, flags); - found = find_stack(*bucket, trace->entries, trace->nr_entries, hash); + found = find_stack(*bucket, entries, nr_entries, hash); if (!found) { struct stack_record *new = - depot_alloc_stack(trace->entries, trace->nr_entries, + depot_alloc_stack(entries, nr_entries, hash, &prealloc, alloc_flags); if (new) { new->next = *bucket; @@ -297,4 +317,16 @@ depot_stack_handle_t depot_save_stack(st fast_exit: return retval; } +EXPORT_SYMBOL_GPL(stack_depot_save); + +/** + * depot
[Intel-gfx] [patch V3 00/29] stacktrace: Consolidate stack trace usage
This is an update to V2 which can be found here: https://lkml.kernel.org/r/20190418084119.056416...@linutronix.de Changes vs. V2: - Fixed the kernel-doc issue pointed out by Mike - Removed the '-1' oddity from the tracer - Restricted the tracer nesting to 4 - Restored the lockdep magic to prevent redundant stack traces - Addressed the small nitpicks here and there - Picked up Acked/Reviewed tags The series is based on: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git core/stacktrace which contains the removal of the inconsistent and pointless ULONG_MAX termination of stacktraces. It's also available from git: git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git WIP.core/stacktrace up to 5160eb663575 ("x86/stacktrace: Use common infrastructure") Delta patch vs. v2 below. Thanks, tglx 8<- diff --git a/include/linux/stacktrace.h b/include/linux/stacktrace.h index 654949dc1c16..f0cfd12cb45e 100644 --- a/include/linux/stacktrace.h +++ b/include/linux/stacktrace.h @@ -32,14 +32,13 @@ unsigned int stack_trace_save_user(unsigned long *store, unsigned int size); * @reliable: True when the stack entry is reliable. Required by * some printk based consumers. * - * Returns:True, if the entry was consumed or skipped + * Return: True, if the entry was consumed or skipped * False, if there is no space left to store */ typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr, bool reliable); /** * arch_stack_walk - Architecture specific function to walk the stack - * @consume_entry: Callback which is invoked by the architecture code for * each entry. * @cookie:Caller supplied pointer which is handed back to @@ -47,10 +46,12 @@ typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr, * @task: Pointer to a task struct, can be NULL * @regs: Pointer to registers, can be NULL * - * @task @regs: - * NULLNULLStack trace from current + * === + * taskregs + * === * taskNULLStack trace from task (can be current) - * NULLregsStack trace starting on regs->stackpointer + * current regsStack trace starting on regs->stackpointer + * === */ void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, struct task_struct *task, struct pt_regs *regs); diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c index 1e8e246edaad..badd77670d00 100644 --- a/kernel/dma/debug.c +++ b/kernel/dma/debug.c @@ -705,7 +705,8 @@ static struct dma_debug_entry *dma_entry_alloc(void) #ifdef CONFIG_STACKTRACE entry->stack_len = stack_trace_save(entry->stack_entries, - ARRAY_SIZE(entry->stack_entries), 1); + ARRAY_SIZE(entry->stack_entries), + 1); #endif return entry; } diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 84423f0bb0b0..45bcaf2e4cb6 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -2160,12 +2160,11 @@ check_deadlock(struct task_struct *curr, struct held_lock *next, */ static int check_prev_add(struct task_struct *curr, struct held_lock *prev, - struct held_lock *next, int distance) + struct held_lock *next, int distance, struct lock_trace *trace) { struct lock_list *uninitialized_var(target_entry); struct lock_list *entry; struct lock_list this; - struct lock_trace trace; int ret; if (!hlock_class(prev)->key || !hlock_class(next)->key) { @@ -2198,8 +2197,17 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev, this.class = hlock_class(next); this.parent = NULL; ret = check_noncircular(&this, hlock_class(prev), &target_entry); - if (unlikely(!ret)) + if (unlikely(!ret)) { + if (!trace->nr_entries) { + /* +* If save_trace fails here, the printing might +* trigger a WARN but because of the !nr_entries it +* should not do bad things. +*/ + save_trace(trace); + } return print_circular_bug(&this, target_entry, next, prev); + } else if (unlikely(ret < 0)) return print_bfs_bug(ret); @@ -2246,7 +2254,7 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev, return print_bfs_bug(ret); - if (!save_trace(&trace)) + if (
[Intel-gfx] [patch V3 23/29] tracing: Simplify stack trace retrieval
Replace the indirection through struct stack_trace by using the storage array based interfaces. Signed-off-by: Thomas Gleixner Reviewed-by: Steven Rostedt (VMware) --- kernel/trace/trace.c | 40 +--- 1 file changed, 13 insertions(+), 27 deletions(-) --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2774,22 +2774,18 @@ static void __ftrace_trace_stack(struct { struct trace_event_call *call = &event_kernel_stack; struct ring_buffer_event *event; + unsigned int size, nr_entries; struct ftrace_stack *fstack; struct stack_entry *entry; - struct stack_trace trace; - int size = FTRACE_KSTACK_ENTRIES; int stackidx; - trace.nr_entries= 0; - trace.skip = skip; - /* * Add one, for this function and the call to save_stack_trace() * If regs is set, then these functions will not be in the way. */ #ifndef CONFIG_UNWINDER_ORC if (!regs) - trace.skip++; + skip++; #endif /* @@ -2816,28 +2812,24 @@ static void __ftrace_trace_stack(struct barrier(); fstack = this_cpu_ptr(ftrace_stacks.stacks) + (stackidx - 1); - trace.entries = fstack->calls; - trace.max_entries = FTRACE_KSTACK_ENTRIES; - - if (regs) - save_stack_trace_regs(regs, &trace); - else - save_stack_trace(&trace); - - if (trace.nr_entries > size) - size = trace.nr_entries; + size = ARRAY_SIZE(fstack->calls); - size *= sizeof(unsigned long); + if (regs) { + nr_entries = stack_trace_save_regs(regs, fstack->calls, + size, skip); + } else { + nr_entries = stack_trace_save(fstack->calls, size, skip); + } + size = nr_entries * sizeof(unsigned long); event = __trace_buffer_lock_reserve(buffer, TRACE_STACK, sizeof(*entry) + size, flags, pc); if (!event) goto out; entry = ring_buffer_event_data(event); - memcpy(&entry->caller, trace.entries, size); - - entry->size = trace.nr_entries; + memcpy(&entry->caller, fstack->calls, size); + entry->size = nr_entries; if (!call_filter_check_discard(call, entry, buffer, event)) __buffer_unlock_commit(buffer, event); @@ -2916,7 +2908,6 @@ ftrace_trace_userstack(struct ring_buffe struct trace_event_call *call = &event_user_stack; struct ring_buffer_event *event; struct userstack_entry *entry; - struct stack_trace trace; if (!(global_trace.trace_flags & TRACE_ITER_USERSTACKTRACE)) return; @@ -2947,12 +2938,7 @@ ftrace_trace_userstack(struct ring_buffe entry->tgid = current->tgid; memset(&entry->caller, 0, sizeof(entry->caller)); - trace.nr_entries= 0; - trace.max_entries = FTRACE_STACK_ENTRIES; - trace.skip = 0; - trace.entries = entry->caller; - - save_stack_trace_user(&trace); + stack_trace_save_user(entry->caller, FTRACE_STACK_ENTRIES); if (!call_filter_check_discard(call, entry, buffer, event)) __buffer_unlock_commit(buffer, event); ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 09/29] mm/kasan: Simplify stacktrace handling
Replace the indirection through struct stack_trace by using the storage array based interfaces. Signed-off-by: Thomas Gleixner Acked-by: Dmitry Vyukov Acked-by: Andrey Ryabinin Cc: Alexander Potapenko Cc: kasan-...@googlegroups.com Cc: linux...@kvack.org --- mm/kasan/common.c | 30 -- mm/kasan/report.c |7 --- 2 files changed, 16 insertions(+), 21 deletions(-) --- a/mm/kasan/common.c +++ b/mm/kasan/common.c @@ -48,34 +48,28 @@ static inline int in_irqentry_text(unsig ptr < (unsigned long)&__softirqentry_text_end); } -static inline void filter_irq_stacks(struct stack_trace *trace) +static inline unsigned int filter_irq_stacks(unsigned long *entries, +unsigned int nr_entries) { - int i; + unsigned int i; - if (!trace->nr_entries) - return; - for (i = 0; i < trace->nr_entries; i++) - if (in_irqentry_text(trace->entries[i])) { + for (i = 0; i < nr_entries; i++) { + if (in_irqentry_text(entries[i])) { /* Include the irqentry function into the stack. */ - trace->nr_entries = i + 1; - break; + return i + 1; } + } + return nr_entries; } static inline depot_stack_handle_t save_stack(gfp_t flags) { unsigned long entries[KASAN_STACK_DEPTH]; - struct stack_trace trace = { - .nr_entries = 0, - .entries = entries, - .max_entries = KASAN_STACK_DEPTH, - .skip = 0 - }; + unsigned int nr_entries; - save_stack_trace(&trace); - filter_irq_stacks(&trace); - - return depot_save_stack(&trace, flags); + nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0); + nr_entries = filter_irq_stacks(entries, nr_entries); + return stack_depot_save(entries, nr_entries, flags); } static inline void set_track(struct kasan_track *track, gfp_t flags) --- a/mm/kasan/report.c +++ b/mm/kasan/report.c @@ -100,10 +100,11 @@ static void print_track(struct kasan_tra { pr_err("%s by task %u:\n", prefix, track->pid); if (track->stack) { - struct stack_trace trace; + unsigned long *entries; + unsigned int nr_entries; - depot_fetch_stack(track->stack, &trace); - print_stack_trace(&trace, 0); + nr_entries = stack_depot_fetch(track->stack, &entries); + stack_trace_print(entries, nr_entries, 0); } else { pr_err("(stack is not available)\n"); } ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 17/29] lockdep: Remove unused trace argument from print_circular_bug()
Signed-off-by: Thomas Gleixner --- kernel/locking/lockdep.c |9 - 1 file changed, 4 insertions(+), 5 deletions(-) --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -1522,10 +1522,9 @@ static inline int class_equal(struct loc } static noinline int print_circular_bug(struct lock_list *this, - struct lock_list *target, - struct held_lock *check_src, - struct held_lock *check_tgt, - struct stack_trace *trace) + struct lock_list *target, + struct held_lock *check_src, + struct held_lock *check_tgt) { struct task_struct *curr = current; struct lock_list *parent; @@ -2206,7 +2205,7 @@ check_prev_add(struct task_struct *curr, */ save(trace); } - return print_circular_bug(&this, target_entry, next, prev, trace); + return print_circular_bug(&this, target_entry, next, prev); } else if (unlikely(ret < 0)) return print_bfs_bug(ret); ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 24/29] tracing: Remove the last struct stack_trace usage
Simplify the stack retrieval code by using the storage array based interface. Signed-off-by: Thomas Gleixner Reviewed-by: Steven Rostedt (VMware) --- kernel/trace/trace_stack.c | 37 - 1 file changed, 16 insertions(+), 21 deletions(-) --- a/kernel/trace/trace_stack.c +++ b/kernel/trace/trace_stack.c @@ -23,11 +23,7 @@ static unsigned long stack_dump_trace[STACK_TRACE_ENTRIES]; static unsigned stack_trace_index[STACK_TRACE_ENTRIES]; -struct stack_trace stack_trace_max = { - .max_entries= STACK_TRACE_ENTRIES, - .entries= &stack_dump_trace[0], -}; - +static unsigned int stack_trace_entries; static unsigned long stack_trace_max_size; static arch_spinlock_t stack_trace_max_lock = (arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED; @@ -44,10 +40,10 @@ static void print_max_stack(void) pr_emerg("DepthSize Location(%d entries)\n" "- \n", - stack_trace_max.nr_entries); + stack_trace_entries); - for (i = 0; i < stack_trace_max.nr_entries; i++) { - if (i + 1 == stack_trace_max.nr_entries) + for (i = 0; i < stack_trace_entries; i++) { + if (i + 1 == stack_trace_entries) size = stack_trace_index[i]; else size = stack_trace_index[i] - stack_trace_index[i+1]; @@ -93,13 +89,12 @@ static void check_stack(unsigned long ip stack_trace_max_size = this_size; - stack_trace_max.nr_entries = 0; - stack_trace_max.skip = 0; - - save_stack_trace(&stack_trace_max); + stack_trace_entries = stack_trace_save(stack_dump_trace, + ARRAY_SIZE(stack_dump_trace) - 1, + 0); /* Skip over the overhead of the stack tracer itself */ - for (i = 0; i < stack_trace_max.nr_entries; i++) { + for (i = 0; i < stack_trace_entries; i++) { if (stack_dump_trace[i] == ip) break; } @@ -108,7 +103,7 @@ static void check_stack(unsigned long ip * Some archs may not have the passed in ip in the dump. * If that happens, we need to show everything. */ - if (i == stack_trace_max.nr_entries) + if (i == stack_trace_entries) i = 0; /* @@ -126,13 +121,13 @@ static void check_stack(unsigned long ip * loop will only happen once. This code only takes place * on a new max, so it is far from a fast path. */ - while (i < stack_trace_max.nr_entries) { + while (i < stack_trace_entries) { int found = 0; stack_trace_index[x] = this_size; p = start; - for (; p < top && i < stack_trace_max.nr_entries; p++) { + for (; p < top && i < stack_trace_entries; p++) { /* * The READ_ONCE_NOCHECK is used to let KASAN know that * this is not a stack-out-of-bounds error. @@ -163,7 +158,7 @@ static void check_stack(unsigned long ip i++; } - stack_trace_max.nr_entries = x; + stack_trace_entries = x; if (task_stack_end_corrupted(current)) { print_max_stack(); @@ -265,7 +260,7 @@ static void * { long n = *pos - 1; - if (n >= stack_trace_max.nr_entries) + if (n >= stack_trace_entries) return NULL; m->private = (void *)n; @@ -329,7 +324,7 @@ static int t_show(struct seq_file *m, vo seq_printf(m, "DepthSize Location" "(%d entries)\n" "- \n", - stack_trace_max.nr_entries); + stack_trace_entries); if (!stack_tracer_enabled && !stack_trace_max_size) print_disabled(m); @@ -339,10 +334,10 @@ static int t_show(struct seq_file *m, vo i = *(long *)v; - if (i >= stack_trace_max.nr_entries) + if (i >= stack_trace_entries) return 0; - if (i + 1 == stack_trace_max.nr_entries) + if (i + 1 == stack_trace_entries) size = stack_trace_index[i]; else size = stack_trace_index[i] - stack_trace_index[i+1]; ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 04/29] backtrace-test: Simplify stack trace handling
Replace the indirection through struct stack_trace by using the storage array based interfaces. Signed-off-by: Thomas Gleixner --- kernel/backtracetest.c | 11 +++ 1 file changed, 3 insertions(+), 8 deletions(-) --- a/kernel/backtracetest.c +++ b/kernel/backtracetest.c @@ -48,19 +48,14 @@ static void backtrace_test_irq(void) #ifdef CONFIG_STACKTRACE static void backtrace_test_saved(void) { - struct stack_trace trace; unsigned long entries[8]; + unsigned int nr_entries; pr_info("Testing a saved backtrace.\n"); pr_info("The following trace is a kernel self test and not a bug!\n"); - trace.nr_entries = 0; - trace.max_entries = ARRAY_SIZE(entries); - trace.entries = entries; - trace.skip = 0; - - save_stack_trace(&trace); - print_stack_trace(&trace, 0); + nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 0); + stack_trace_print(entries, nr_entries, 0); } #else static void backtrace_test_saved(void) ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 28/29] stacktrace: Provide common infrastructure
All architectures which support stacktrace carry duplicated code and do the stack storage and filtering at the architecture side. Provide a consolidated interface with a callback function for consuming the stack entries provided by the architecture specific stack walker. This removes lots of duplicated code and allows to implement better filtering than 'skip number of entries' in the future without touching any architecture specific code. Signed-off-by: Thomas Gleixner Cc: linux-a...@vger.kernel.org --- V3: Fix kernel doc --- include/linux/stacktrace.h | 39 ++ kernel/stacktrace.c| 173 + lib/Kconfig|4 + 3 files changed, 216 insertions(+) --- a/include/linux/stacktrace.h +++ b/include/linux/stacktrace.h @@ -23,6 +23,44 @@ unsigned int stack_trace_save_regs(struc unsigned int stack_trace_save_user(unsigned long *store, unsigned int size); /* Internal interfaces. Do not use in generic code */ +#ifdef CONFIG_ARCH_STACKWALK + +/** + * stack_trace_consume_fn - Callback for arch_stack_walk() + * @cookie:Caller supplied pointer handed back by arch_stack_walk() + * @addr: The stack entry address to consume + * @reliable: True when the stack entry is reliable. Required by + * some printk based consumers. + * + * Return: True, if the entry was consumed or skipped + * False, if there is no space left to store + */ +typedef bool (*stack_trace_consume_fn)(void *cookie, unsigned long addr, + bool reliable); +/** + * arch_stack_walk - Architecture specific function to walk the stack + * @consume_entry: Callback which is invoked by the architecture code for + * each entry. + * @cookie:Caller supplied pointer which is handed back to + * @consume_entry + * @task: Pointer to a task struct, can be NULL + * @regs: Pointer to registers, can be NULL + * + * === + * taskregs + * === + * taskNULLStack trace from task (can be current) + * current regsStack trace starting on regs->stackpointer + * === + */ +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, +struct task_struct *task, struct pt_regs *regs); +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, void *cookie, +struct task_struct *task); +void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie, + const struct pt_regs *regs); + +#else /* CONFIG_ARCH_STACKWALK */ struct stack_trace { unsigned int nr_entries, max_entries; unsigned long *entries; @@ -37,6 +75,7 @@ extern void save_stack_trace_tsk(struct extern int save_stack_trace_tsk_reliable(struct task_struct *tsk, struct stack_trace *trace); extern void save_stack_trace_user(struct stack_trace *trace); +#endif /* !CONFIG_ARCH_STACKWALK */ #endif /* CONFIG_STACKTRACE */ #if defined(CONFIG_STACKTRACE) && defined(CONFIG_HAVE_RELIABLE_STACKTRACE) --- a/kernel/stacktrace.c +++ b/kernel/stacktrace.c @@ -5,6 +5,8 @@ * * Copyright (C) 2006 Red Hat, Inc., Ingo Molnar */ +#include +#include #include #include #include @@ -66,6 +68,175 @@ int stack_trace_snprint(char *buf, size_ } EXPORT_SYMBOL_GPL(stack_trace_snprint); +#ifdef CONFIG_ARCH_STACKWALK + +struct stacktrace_cookie { + unsigned long *store; + unsigned intsize; + unsigned intskip; + unsigned intlen; +}; + +static bool stack_trace_consume_entry(void *cookie, unsigned long addr, + bool reliable) +{ + struct stacktrace_cookie *c = cookie; + + if (c->len >= c->size) + return false; + + if (c->skip > 0) { + c->skip--; + return true; + } + c->store[c->len++] = addr; + return c->len < c->size; +} + +static bool stack_trace_consume_entry_nosched(void *cookie, unsigned long addr, + bool reliable) +{ + if (in_sched_functions(addr)) + return true; + return stack_trace_consume_entry(cookie, addr, reliable); +} + +/** + * stack_trace_save - Save a stack trace into a storage array + * @store: Pointer to storage array + * @size: Size of the storage array + * @skipnr:Number of entries to skip at the start of the stack trace + * + * Return: Number of trace entries stored. + */ +unsigned int stack_trace_save(unsigned long *store, unsigned int size, + unsigned int skipnr) +{ + stack_trace_consume_fn consume_entry = stack_trace_consume_entry; +
[Intel-gfx] [patch V3 25/29] livepatch: Simplify stack trace retrieval
Replace the indirection through struct stack_trace by using the storage array based interfaces. Signed-off-by: Thomas Gleixner Acked-by: Miroslav Benes --- kernel/livepatch/transition.c | 22 +- 1 file changed, 9 insertions(+), 13 deletions(-) --- a/kernel/livepatch/transition.c +++ b/kernel/livepatch/transition.c @@ -202,15 +202,15 @@ void klp_update_patch_state(struct task_ * Determine whether the given stack trace includes any references to a * to-be-patched or to-be-unpatched function. */ -static int klp_check_stack_func(struct klp_func *func, - struct stack_trace *trace) +static int klp_check_stack_func(struct klp_func *func, unsigned long *entries, + unsigned int nr_entries) { unsigned long func_addr, func_size, address; struct klp_ops *ops; int i; - for (i = 0; i < trace->nr_entries; i++) { - address = trace->entries[i]; + for (i = 0; i < nr_entries; i++) { + address = entries[i]; if (klp_target_state == KLP_UNPATCHED) { /* @@ -254,29 +254,25 @@ static int klp_check_stack_func(struct k static int klp_check_stack(struct task_struct *task, char *err_buf) { static unsigned long entries[MAX_STACK_ENTRIES]; - struct stack_trace trace; struct klp_object *obj; struct klp_func *func; - int ret; + int ret, nr_entries; - trace.skip = 0; - trace.nr_entries = 0; - trace.max_entries = MAX_STACK_ENTRIES; - trace.entries = entries; - ret = save_stack_trace_tsk_reliable(task, &trace); + ret = stack_trace_save_tsk_reliable(task, entries, ARRAY_SIZE(entries)); WARN_ON_ONCE(ret == -ENOSYS); - if (ret) { + if (ret < 0) { snprintf(err_buf, STACK_ERR_BUF_SIZE, "%s: %s:%d has an unreliable stack\n", __func__, task->comm, task->pid); return ret; } + nr_entries = ret; klp_for_each_object(klp_transition_patch, obj) { if (!obj->patched) continue; klp_for_each_func(obj, func) { - ret = klp_check_stack_func(func, &trace); + ret = klp_check_stack_func(func, entries, nr_entries); if (ret) { snprintf(err_buf, STACK_ERR_BUF_SIZE, "%s: %s:%d is sleeping on function %s\n", ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 20/29] tracing: Simplify stacktrace retrieval in histograms
The indirection through struct stack_trace is not necessary at all. Use the storage array based interface. Signed-off-by: Thomas Gleixner Tested-by: Tom Zanussi Reviewed-by: Tom Zanussi Acked-by: Steven Rostedt (VMware) --- kernel/trace/trace_events_hist.c | 12 +++- 1 file changed, 3 insertions(+), 9 deletions(-) --- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -5186,7 +5186,6 @@ static void event_hist_trigger(struct ev u64 var_ref_vals[TRACING_MAP_VARS_MAX]; char compound_key[HIST_KEY_SIZE_MAX]; struct tracing_map_elt *elt = NULL; - struct stack_trace stacktrace; struct hist_field *key_field; u64 field_contents; void *key = NULL; @@ -5198,14 +5197,9 @@ static void event_hist_trigger(struct ev key_field = hist_data->fields[i]; if (key_field->flags & HIST_FIELD_FL_STACKTRACE) { - stacktrace.max_entries = HIST_STACKTRACE_DEPTH; - stacktrace.entries = entries; - stacktrace.nr_entries = 0; - stacktrace.skip = HIST_STACKTRACE_SKIP; - - memset(stacktrace.entries, 0, HIST_STACKTRACE_SIZE); - save_stack_trace(&stacktrace); - + memset(entries, 0, HIST_STACKTRACE_SIZE); + stack_trace_save(entries, HIST_STACKTRACE_DEPTH, +HIST_STACKTRACE_SKIP); key = entries; } else { field_contents = key_field->fn(key_field, elt, rbe, rec); ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] [patch V3 29/29] x86/stacktrace: Use common infrastructure
Replace the stack_trace_save*() functions with the new arch_stack_walk() interfaces. Signed-off-by: Thomas Gleixner Cc: linux-a...@vger.kernel.org --- arch/x86/Kconfig |1 arch/x86/kernel/stacktrace.c | 116 +++ 2 files changed, 20 insertions(+), 97 deletions(-) --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -74,6 +74,7 @@ config X86 select ARCH_MIGHT_HAVE_ACPI_PDC if ACPI select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_MIGHT_HAVE_PC_SERIO + select ARCH_STACKWALK select ARCH_SUPPORTS_ACPI select ARCH_SUPPORTS_ATOMIC_RMW select ARCH_SUPPORTS_NUMA_BALANCING if X86_64 --- a/arch/x86/kernel/stacktrace.c +++ b/arch/x86/kernel/stacktrace.c @@ -12,75 +12,31 @@ #include #include -static int save_stack_address(struct stack_trace *trace, unsigned long addr, - bool nosched) -{ - if (nosched && in_sched_functions(addr)) - return 0; - - if (trace->skip > 0) { - trace->skip--; - return 0; - } - - if (trace->nr_entries >= trace->max_entries) - return -1; - - trace->entries[trace->nr_entries++] = addr; - return 0; -} - -static void noinline __save_stack_trace(struct stack_trace *trace, - struct task_struct *task, struct pt_regs *regs, - bool nosched) +void arch_stack_walk(stack_trace_consume_fn consume_entry, void *cookie, +struct task_struct *task, struct pt_regs *regs) { struct unwind_state state; unsigned long addr; - if (regs) - save_stack_address(trace, regs->ip, nosched); + if (regs && !consume_entry(cookie, regs->ip, false)) + return; for (unwind_start(&state, task, regs, NULL); !unwind_done(&state); unwind_next_frame(&state)) { addr = unwind_get_return_address(&state); - if (!addr || save_stack_address(trace, addr, nosched)) + if (!addr || !consume_entry(cookie, addr, false)) break; } } /* - * Save stack-backtrace addresses into a stack_trace buffer. + * This function returns an error if it detects any unreliable features of the + * stack. Otherwise it guarantees that the stack trace is reliable. + * + * If the task is not 'current', the caller *must* ensure the task is inactive. */ -void save_stack_trace(struct stack_trace *trace) -{ - trace->skip++; - __save_stack_trace(trace, current, NULL, false); -} -EXPORT_SYMBOL_GPL(save_stack_trace); - -void save_stack_trace_regs(struct pt_regs *regs, struct stack_trace *trace) -{ - __save_stack_trace(trace, current, regs, false); -} - -void save_stack_trace_tsk(struct task_struct *tsk, struct stack_trace *trace) -{ - if (!try_get_task_stack(tsk)) - return; - - if (tsk == current) - trace->skip++; - __save_stack_trace(trace, tsk, NULL, true); - - put_task_stack(tsk); -} -EXPORT_SYMBOL_GPL(save_stack_trace_tsk); - -#ifdef CONFIG_HAVE_RELIABLE_STACKTRACE - -static int __always_inline -__save_stack_trace_reliable(struct stack_trace *trace, - struct task_struct *task) +int arch_stack_walk_reliable(stack_trace_consume_fn consume_entry, +void *cookie, struct task_struct *task) { struct unwind_state state; struct pt_regs *regs; @@ -117,7 +73,7 @@ static int __always_inline if (!addr) return -EINVAL; - if (save_stack_address(trace, addr, false)) + if (!consume_entry(cookie, addr, false)) return -EINVAL; } @@ -132,32 +88,6 @@ static int __always_inline return 0; } -/* - * This function returns an error if it detects any unreliable features of the - * stack. Otherwise it guarantees that the stack trace is reliable. - * - * If the task is not 'current', the caller *must* ensure the task is inactive. - */ -int save_stack_trace_tsk_reliable(struct task_struct *tsk, - struct stack_trace *trace) -{ - int ret; - - /* -* If the task doesn't have a stack (e.g., a zombie), the stack is -* "reliably" empty. -*/ - if (!try_get_task_stack(tsk)) - return 0; - - ret = __save_stack_trace_reliable(trace, tsk); - - put_task_stack(tsk); - - return ret; -} -#endif /* CONFIG_HAVE_RELIABLE_STACKTRACE */ - /* Userspace stacktrace - based on kernel/trace/trace_sysprof.c */ struct stack_frame_user { @@ -182,15 +112,15 @@ copy_stack_frame(const void __user *fp, return ret; } -static inline void __save_stack_trace_user(struct stack_trace *trace) +void arch_stack_walk_user(stack_trace_consume_fn consume_entry, void *cookie, + c
Re: [Intel-gfx] [patch V3 00/29] stacktrace: Consolidate stack trace usage
* Thomas Gleixner wrote: > - if (unlikely(!ret)) > + if (unlikely(!ret)) { > + if (!trace->nr_entries) { > + /* > + * If save_trace fails here, the printing might > + * trigger a WARN but because of the !nr_entries it > + * should not do bad things. > + */ > + save_trace(trace); > + } > return print_circular_bug(&this, target_entry, next, prev); > + } > else if (unlikely(ret < 0)) > return print_bfs_bug(ret); Just a minor style nit: the 'else' should probably on the same line as the '}' it belongs to, to make it really obvious that the 'if' has an 'else' branch? At that point the condition should probably also use balanced curly braces. Interdiff looks good otherwise. Thanks, Ingo ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 14/45] drm/i915: Make engine_mask & num_engines static
On 25/04/2019 10:19, Chris Wilson wrote: Having removed the urge to modify the engine_mask at runtime, we can promote the num_engines from a runtime calculation to a static and push it into the device_info tables. What about fused off engines (intel_device_info_init_mmio)? I don't see the patch touching that so it doesn't remove the runtime modification of engine_mask and also makes num_engines wrong in those cases. Regards, Tvrtko Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: Jani Nikula --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 3 -- drivers/gpu/drm/i915/gt/intel_ringbuffer.c| 2 +- drivers/gpu/drm/i915/gt/selftest_lrc.c| 4 +- drivers/gpu/drm/i915/i915_pci.c | 45 +-- drivers/gpu/drm/i915/intel_device_info.h | 3 +- .../gpu/drm/i915/selftests/i915_gem_context.c | 4 +- drivers/gpu/drm/i915/selftests/i915_request.c | 2 +- 7 files changed, 27 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 862cf1040f88..b2cd6c6785be 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -383,9 +383,6 @@ int intel_engines_init_mmio(struct drm_i915_private *i915) goto cleanup; } - RUNTIME_INFO(i915)->num_engines = - hweight32(INTEL_INFO(i915)->engine_mask); - i915_check_and_clear_faults(i915); return 0; diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c index 7a4804c47466..2edeedb748ed 100644 --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c @@ -1568,7 +1568,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags) struct intel_engine_cs *engine = rq->engine; enum intel_engine_id id; const int num_engines = - IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0; + IS_HSW_GT1(i915) ? INTEL_INFO(i915)->num_engines - 1 : 0; bool force_restore = false; int len; u32 *cs; diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 84538f69185b..6f223f7facde 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1185,7 +1185,7 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags) pr_info("Submitted %lu crescendo:%x requests across %d engines and %d contexts\n", count, flags, - RUNTIME_INFO(smoke->i915)->num_engines, smoke->ncontext); + INTEL_INFO(smoke->i915)->num_engines, smoke->ncontext); return 0; } @@ -1213,7 +1213,7 @@ static int smoke_random(struct preempt_smoke *smoke, unsigned int flags) pr_info("Submitted %lu random:%x requests across %d engines and %d contexts\n", count, flags, - RUNTIME_INFO(smoke->i915)->num_engines, smoke->ncontext); + INTEL_INFO(smoke->i915)->num_engines, smoke->ncontext); return 0; } diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c index ffa2ee70a03d..431a4a2c20e1 100644 --- a/drivers/gpu/drm/i915/i915_pci.c +++ b/drivers/gpu/drm/i915/i915_pci.c @@ -35,6 +35,7 @@ #define PLATFORM(x) .platform = (x) #define GEN(x) .gen = (x), .gen_mask = BIT((x) - 1) +#define ENGINES(x) .engine_mask = (x), .num_engines = hweight8(x) #define I845_PIPE_OFFSETS \ .pipe_offsets = { \ @@ -145,6 +146,7 @@ #define I830_FEATURES \ GEN(2), \ + ENGINES(BIT(RCS0)), \ .is_mobile = 1, \ .num_pipes = 2, \ .display.has_overlay = 1, \ @@ -154,7 +156,6 @@ .gpu_reset_clobbers_display = true, \ .hws_needs_physical = 1, \ .unfenced_needs_alignment = 1, \ - .engine_mask = BIT(RCS0), \ .has_snoop = true, \ .has_coherent_ggtt = false, \ I9XX_PIPE_OFFSETS, \ @@ -164,6 +165,7 @@ #define I845_FEATURES \ GEN(2), \ + ENGINES(BIT(RCS0)), \ .num_pipes = 1, \ .display.has_overlay = 1, \ .display.overlay_needs_physical = 1, \ @@ -171,7 +173,6 @@ .gpu_reset_clobbers_display = true, \ .hws_needs_physical = 1, \ .unfenced_needs_alignment = 1, \ - .engine_mask = BIT(RCS0), \ .has_snoop = true, \ .has_coherent_ggtt = false, \ I845_PIPE_OFFSETS, \ @@ -202,10 +203,10 @@ static const struct intel_device_info intel_i865g_info = { #define GEN3_FEATURES \ GEN(3), \ + ENGINES(BIT(RCS0)), \ .num_pipes = 2, \ .display.has_gmch = 1, \ .gpu_reset_clobbers_display = true, \ - .engine_mask = BIT(RCS0), \ .has_snoop = true, \ .has_coherent_ggtt = true, \ I9XX_PIPE_OFFSETS, \ @@ -286,11 +287,11 @@ static const struct intel_device_info
Re: [Intel-gfx] [PATCH 14/45] drm/i915: Make engine_mask & num_engines static
Quoting Tvrtko Ursulin (2019-04-25 11:20:47) > > On 25/04/2019 10:19, Chris Wilson wrote: > > Having removed the urge to modify the engine_mask at runtime, we can > > promote the num_engines from a runtime calculation to a static and push > > it into the device_info tables. > > What about fused off engines (intel_device_info_init_mmio)? Didn't notice. :(( Plan B was just to calculate num_engines at point of use, since it is rare enough. Oh well, this pair of patches was just inspired because I touched the cleanup code and not important. -Chris ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH 01/45] drm/i915: Seal races between async GPU cancellation, retirement and signaling
On 25/04/2019 10:19, Chris Wilson wrote: Currently there is an underlying assumption that i915_request_unsubmit() is synchronous wrt the GPU -- that is the request is no longer in flight as we remove it. In the near future that may change, and this may upset our signaling as we can process an interrupt for that request while it is no longer in flight. CPU0CPU1 intel_engine_breadcrumbs_irq (queue request completion) i915_request_cancel_signaling ... ... i915_request_enable_signaling dma_fence_signal Hence in the time it took us to drop the lock to signal the request, a preemption event may have occurred and re-queued the request. In the process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and so reused the rq->signal_link that was in use on CPU0, leading to bad pointer chasing in intel_engine_breadcrumbs_irq. A related issue was that if someone started listening for a signal on a completed but no longer in-flight request, we missed the opportunity to immediately signal that request. Furthermore, as intel_contexts may be immediately released during request retirement, in order to be entirely sure that intel_engine_breadcrumbs_irq may no longer dereference the intel_context (ce->signals and ce->signal_link), we must wait for irq spinlock. In order to prevent the race, we use a bit in the fence.flags to signal the transfer onto the signal list inside intel_engine_breadcrumbs_irq. For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then quickly signals to any outside observer that the fence is indeed signaled. Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking") Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/dma-buf/dma-fence.c | 1 + drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 58 + drivers/gpu/drm/i915/i915_request.c | 1 + 3 files changed, 39 insertions(+), 21 deletions(-) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c index 3aa8733f832a..9bf06042619a 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -29,6 +29,7 @@ EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit); EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal); +EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled); static DEFINE_SPINLOCK(dma_fence_stub_lock); static struct dma_fence dma_fence_stub; diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 3cbffd400b1b..4283224249d4 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -23,6 +23,7 @@ */ #include +#include #include #include "i915_drv.h" @@ -83,6 +84,7 @@ static inline bool __request_completed(const struct i915_request *rq) void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) { struct intel_breadcrumbs *b = &engine->breadcrumbs; + const ktime_t timestamp = ktime_get(); struct intel_context *ce, *cn; struct list_head *pos, *next; LIST_HEAD(signal); @@ -104,6 +106,11 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)); + clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); + + if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, +&rq->fence.flags)) + continue; /* * Queue for execution after dropping the signaling @@ -111,14 +118,6 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) * more signalers to the same context or engine. */ i915_request_get(rq); - - /* -* We may race with direct invocation of -* dma_fence_signal(), e.g. i915_request_retire(), -* so we need to acquire our reference to the request -* before we cancel the breadcrumb. -*/ - clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); list_add_tail(&rq->signal_link, &signal); } @@ -140,8 +139,21 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) list_for_each_safe(pos, next, &signal) { struct i915_request *rq = list_entry(pos, typeof(*rq), signal_link); + struct dma_fence_cb *cur, *tmp; + + trace_dma_fence_signaled(&rq->fence); + + rq->fence.timestamp = timestamp; + set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &rq->fence.f
Re: [Intel-gfx] [PATCH i-g-t] i915/gem_exec_schedule: Exercise resolving of userspace semaphores
Chris Wilson writes: > Check that we can reorder batches around userspace sempahore waits by semaphore > injecting a semaphore that is only released by a later context. > > Signed-off-by: Chris Wilson > --- > tests/i915/gem_exec_schedule.c | 143 + > 1 file changed, 143 insertions(+) > > diff --git a/tests/i915/gem_exec_schedule.c b/tests/i915/gem_exec_schedule.c > index 9a0795281..812197770 100644 > --- a/tests/i915/gem_exec_schedule.c > +++ b/tests/i915/gem_exec_schedule.c > @@ -52,6 +52,15 @@ > #define LOCAL_I915_EXEC_BSD_MASK (3 << LOCAL_I915_EXEC_BSD_SHIFT) > #define ENGINE_MASK (I915_EXEC_RING_MASK | LOCAL_I915_EXEC_BSD_MASK) > > +#define MI_SEMAPHORE_WAIT(0x1c << 23) > +#define MI_SEMAPHORE_POLL (1 << 15) > +#define MI_SEMAPHORE_SAD_GT_SDD (0 << 12) > +#define MI_SEMAPHORE_SAD_GTE_SDD (1 << 12) > +#define MI_SEMAPHORE_SAD_LT_SDD (2 << 12) > +#define MI_SEMAPHORE_SAD_LTE_SDD (3 << 12) > +#define MI_SEMAPHORE_SAD_EQ_SDD (4 << 12) > +#define MI_SEMAPHORE_SAD_NEQ_SDD (5 << 12) > + > IGT_TEST_DESCRIPTION("Check that we can control the order of execution"); > > static inline > @@ -463,6 +472,138 @@ static void semaphore_codependency(int i915) > } > } > > +static unsigned int offset_in_page(void *addr) > +{ > + return (uintptr_t)addr & 4095; > +} > + > +static void semaphore_resolve(int i915) > +{ > + const uint32_t SEMAPHORE_ADDR = 64 << 10; > + uint32_t semaphore, outer, inner, *sema; > + unsigned int engine; > + > + /* > + * Userspace may submit batches that wait upon unresolved > + * semaphores. Ideally, we want to put those blocking batches > + * to the back of the execution queue if we have something else > + * that is ready to run right away. This test exploits a failure > + * to reorder batches around a blocking semaphore by submitting > + * the release of that semaphore from a later context. > + */ > + > + igt_require(gem_scheduler_has_preemption(i915)); > + igt_assert(intel_get_drm_devid(i915) >= 8); > + > + outer = gem_context_create(i915); > + inner = gem_context_create(i915); > + > + semaphore = gem_create(i915, 4096); For the uninitiated, the assumption that first object is always at ppgtt address zero, is not so obvious. But this is not the first test to make that assumption nor the last. > + sema = gem_mmap__wc(i915, semaphore, 0, 4096, PROT_WRITE); > + > + for_each_physical_engine(i915, engine) { > + struct drm_i915_gem_exec_object2 obj[3]; > + struct drm_i915_gem_execbuffer2 eb; > + uint32_t handle, cancel; > + uint32_t *cs, *map; > + igt_spin_t *spin; > + > + if (!gem_can_store_dword(i915, engine)) > + continue; > + > + spin = __igt_spin_new(i915, .engine = engine); > + cancel = *spin->batch; > + igt_spin_end(spin); /* we just want its address for later */ I do see igt_spin_reset in here. And it also makes me now think if that I should check if we reset as a part of end. > + gem_sync(i915, spin->handle); > + *spin->batch = cancel; > + > + handle = gem_create(i915, 4096); > + cs = map = gem_mmap__cpu(i915, handle, 0, 4096, PROT_WRITE); > + > + /* Set semaphore initially to 1 for polling and signaling */ > + *cs++ = MI_STORE_DWORD_IMM; > + *cs++ = SEMAPHORE_ADDR; > + *cs++ = 0; > + *cs++ = 1; > + > + /* Wait until another batch writes to our semaphore */ > + *cs++ = MI_SEMAPHORE_WAIT | > + MI_SEMAPHORE_POLL | > + MI_SEMAPHORE_SAD_EQ_SDD | > + (4 - 2); > + *cs++ = 0; > + *cs++ = SEMAPHORE_ADDR; > + *cs++ = 0; > + > + /* Then cancel the spinner */ > + *cs++ = MI_STORE_DWORD_IMM; > + *cs++ = spin->obj[1].offset + offset_in_page(spin->batch); > + *cs++ = 0; > + *cs++ = MI_BATCH_BUFFER_END; > + > + *cs++ = MI_BATCH_BUFFER_END; > + munmap(map, 4096); > + > + memset(&eb, 0, sizeof(eb)); > + > + /* First up is our spinning semaphore */ > + memset(obj, 0, sizeof(obj)); > + obj[0] = spin->obj[1]; > + obj[1].handle = semaphore; > + obj[1].offset = SEMAPHORE_ADDR; > + obj[1].flags = EXEC_OBJECT_PINNED; > + obj[2].handle = handle; > + eb.buffer_count = 3; > + eb.buffers_ptr = to_user_pointer(obj); > + eb.rsvd1 = outer; > + gem_execbuf(i915, &eb); > + > + /* Then add the GPU hang intermediatory */ > + memset(obj, 0, sizeof(obj)); > + obj[0].handle = handle; > + obj[0]
Re: [Intel-gfx] [PATCH 01/45] drm/i915: Seal races between async GPU cancellation, retirement and signaling
Quoting Tvrtko Ursulin (2019-04-25 11:35:01) > > On 25/04/2019 10:19, Chris Wilson wrote: > > Currently there is an underlying assumption that i915_request_unsubmit() > > is synchronous wrt the GPU -- that is the request is no longer in flight > > as we remove it. In the near future that may change, and this may upset > > our signaling as we can process an interrupt for that request while it > > is no longer in flight. > > > > CPU0 CPU1 > > intel_engine_breadcrumbs_irq > > (queue request completion) > > i915_request_cancel_signaling > > ... ... > > i915_request_enable_signaling > > dma_fence_signal > > > > Hence in the time it took us to drop the lock to signal the request, a > > preemption event may have occurred and re-queued the request. In the > > process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and > > so reused the rq->signal_link that was in use on CPU0, leading to bad > > pointer chasing in intel_engine_breadcrumbs_irq. > > > > A related issue was that if someone started listening for a signal on a > > completed but no longer in-flight request, we missed the opportunity to > > immediately signal that request. > > > > Furthermore, as intel_contexts may be immediately released during > > request retirement, in order to be entirely sure that > > intel_engine_breadcrumbs_irq may no longer dereference the intel_context > > (ce->signals and ce->signal_link), we must wait for irq spinlock. > > > > In order to prevent the race, we use a bit in the fence.flags to signal > > the transfer onto the signal list inside intel_engine_breadcrumbs_irq. > > For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then > > quickly signals to any outside observer that the fence is indeed signaled. > > > > Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context > > interrupt tracking") > > Signed-off-by: Chris Wilson > > Cc: Tvrtko Ursulin > > --- > > drivers/dma-buf/dma-fence.c | 1 + > > drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 58 + > > drivers/gpu/drm/i915/i915_request.c | 1 + > > 3 files changed, 39 insertions(+), 21 deletions(-) > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > > index 3aa8733f832a..9bf06042619a 100644 > > --- a/drivers/dma-buf/dma-fence.c > > +++ b/drivers/dma-buf/dma-fence.c > > @@ -29,6 +29,7 @@ > > > > EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit); > > EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal); > > +EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled); > > > > static DEFINE_SPINLOCK(dma_fence_stub_lock); > > static struct dma_fence dma_fence_stub; > > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > > b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > > index 3cbffd400b1b..4283224249d4 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > > @@ -23,6 +23,7 @@ > >*/ > > > > #include > > +#include > > #include > > > > #include "i915_drv.h" > > @@ -83,6 +84,7 @@ static inline bool __request_completed(const struct > > i915_request *rq) > > void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) > > { > > struct intel_breadcrumbs *b = &engine->breadcrumbs; > > + const ktime_t timestamp = ktime_get(); > > struct intel_context *ce, *cn; > > struct list_head *pos, *next; > > LIST_HEAD(signal); > > @@ -104,6 +106,11 @@ void intel_engine_breadcrumbs_irq(struct > > intel_engine_cs *engine) > > > > GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL, > >&rq->fence.flags)); > > + clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); > > + > > + if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, > > + &rq->fence.flags)) > > + continue; > > > > /* > >* Queue for execution after dropping the signaling > > @@ -111,14 +118,6 @@ void intel_engine_breadcrumbs_irq(struct > > intel_engine_cs *engine) > >* more signalers to the same context or engine. > >*/ > > i915_request_get(rq); > > - > > - /* > > - * We may race with direct invocation of > > - * dma_fence_signal(), e.g. i915_request_retire(), > > - * so we need to acquire our reference to the request > > - * before we cancel the breadcrumb. > > - */ > > - clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); > > list_add_tail(&rq->signal_link, &signal); > > } > > > > @@ -140,8 +139
Re: [Intel-gfx] [PATCH 14/45] drm/i915: Make engine_mask & num_engines static
On 25/04/2019 11:30, Chris Wilson wrote: Quoting Tvrtko Ursulin (2019-04-25 11:20:47) On 25/04/2019 10:19, Chris Wilson wrote: Having removed the urge to modify the engine_mask at runtime, we can promote the num_engines from a runtime calculation to a static and push it into the device_info tables. What about fused off engines (intel_device_info_init_mmio)? Didn't notice. :(( Plan B was just to calculate num_engines at point of use, since it is rare enough. I'd just leave it in runtime info for now. Don't see a benefit of moving it. If I can find my last series which tried to eliminate mkwrite and reduce runtime info, there were some notes in the resulting threads about which fields have what problems associated with them. I only remember the gen field was looking like easy pickings.. Anyways... Oh well, this pair of patches was just inspired because I touched the cleanup code and not important. .. please restrain the inspiration if ahead of media scalability otherwise we will never be done with it! :)) Regards, Tvrtko ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] [PATCH v6 1/3] drm: Add CRTC background color property (v5)
Op 26-02-2019 om 17:17 schreef Matt Roper: > On Tue, Feb 26, 2019 at 08:26:36AM +0100, Maarten Lankhorst wrote: >> Hey, >> >> Op 21-02-2019 om 01:28 schreef Matt Roper: >>> Some display controllers can be programmed to present non-black colors >>> for pixels not covered by any plane (or pixels covered by the >>> transparent regions of higher planes). Compositors that want a UI with >>> a solid color background can potentially save memory bandwidth by >>> setting the CRTC background property and using smaller planes to display >>> the rest of the content. >>> >>> To avoid confusion between different ways of encoding RGB data, we >>> define a standard 64-bit format that should be used for this property's >>> value. Helper functions and macros are provided to generate and dissect >>> values in this standard format with varying component precision values. >>> >>> v2: >>> - Swap internal representation's blue and red bits to make it easier >>>to read if printed out. (Ville) >>> - Document bgcolor property in drm_blend.c. (Sean Paul) >>> - s/background_color/bgcolor/ for consistency between property name and >>>value storage field. (Sean Paul) >>> - Add a convenience function to attach property to a given crtc. >>> >>> v3: >>> - Restructure ARGB component extraction macros to be easier to >>>understand and enclose the parameters in () to avoid calculations >>>if expressions are passed. (Sean Paul) >>> - s/rgba/argb/ in helper function/macro names. Even though the idea is >>>to not worry about the internal representation of the u64, it can >>>still be confusing to look at code that uses 'rgba' terminology, but >>>stores values with argb ordering. (Ville) >>> >>> v4: >>> - Drop the bgcolor_changed flag. (Ville, Brian Starkey) >>> - Clarify in kerneldoc that background color is expected to undergo the >>>same pipe-level degamma/csc/gamma transformations that planes do. >>>(Brian Starkey) >>> - Update kerneldoc to indicate non-opaque colors are allowed, but are >>>generally only useful in special cases such as when writeback >>>connectors are used (Brian Starkey / Eric Anholt) >>> >>> v5: >>> - Set crtc->state->bgcolor to solid black inside >>>drm_crtc_add_bgcolor_property() in case drivers don't do this >>>themselves. (Ville) >>> - Add kerneldoc to drm_crtc_add_bgcolor_property() function. >>> >>> Cc: dri-de...@lists.freedesktop.org >>> Cc: wei.c...@intel.com >>> Cc: harish.krupo@intel.com >>> Cc: Ville Syrjälä >>> Cc: Sean Paul >>> Cc: Brian Starkey >>> Cc: Eric Anholt >>> Cc: Stéphane Marchesin >>> Cc: Daniel Vetter >>> Signed-off-by: Matt Roper >>> Reviewed-by(v2): Sean Paul >>> Reviewed-by: Brian Starkey >> I like how bgcolor is a u64 now, but there is an issue with setting >> crtc->state->bgcolor in attaching the property. >> >> We should do it in atomic core init instead, like we already do for >> connector/plane properties.. >> >> See for example https://patchwork.freedesktop.org/series/52363/ >> >> This was specificallly for the background color proposal, but we >> probalby have to split out the non-trivial conversions of >> __drm_atomic_helper_crtc_reset. > Makes sense. What's the status of that patch? It looks like it has a > lot of a-b's/r-b's but hasn't landed yet. Is there anything blocking > it? If you're still driving it forward, I can wait for that to land and > then build on top of it. > > > Matt > I've split out the patch and landed some of the preparation patches, it should be good enough to unblock this patch series at least now. :) ___ Intel-gfx mailing list Intel-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/intel-gfx
[Intel-gfx] ✓ Fi.CI.IGT: success for series starting with [CI,1/5] drm/i915: Introduce struct intel_wakeref
== Series Details == Series: series starting with [CI,1/5] drm/i915: Introduce struct intel_wakeref URL : https://patchwork.freedesktop.org/series/59904/ State : success == Summary == CI Bug Log - changes from CI_DRM_5992_full -> Patchwork_12865_full Summary --- **SUCCESS** No regressions found. Known issues Here are the changes found in Patchwork_12865_full that come from known issues: ### IGT changes ### Issues hit * igt@gem_ctx_isolation@rcs0-s3: - shard-skl: [PASS][1] -> [INCOMPLETE][2] ([fdo#104108] / [fdo#107773]) +1 similar issue [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-skl2/igt@gem_ctx_isolat...@rcs0-s3.html [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-skl4/igt@gem_ctx_isolat...@rcs0-s3.html * igt@gem_eio@in-flight-suspend: - shard-apl: [PASS][3] -> [DMESG-WARN][4] ([fdo#108566]) +2 similar issues [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-apl8/igt@gem_...@in-flight-suspend.html [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-apl6/igt@gem_...@in-flight-suspend.html * igt@gem_mmap_gtt@forked-medium-copy: - shard-iclb: [PASS][5] -> [INCOMPLETE][6] ([fdo#107713]) +1 similar issue [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-iclb3/igt@gem_mmap_...@forked-medium-copy.html [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-iclb7/igt@gem_mmap_...@forked-medium-copy.html * igt@i915_pm_rc6_residency@rc6-accuracy: - shard-kbl: [PASS][7] -> [SKIP][8] ([fdo#109271]) +1 similar issue [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-kbl5/igt@i915_pm_rc6_reside...@rc6-accuracy.html [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-kbl5/igt@i915_pm_rc6_reside...@rc6-accuracy.html * igt@kms_flip@wf_vblank-ts-check-interruptible: - shard-snb: [PASS][9] -> [INCOMPLETE][10] ([fdo#105411]) [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-snb4/igt@kms_flip@wf_vblank-ts-check-interruptible.html [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-snb1/igt@kms_flip@wf_vblank-ts-check-interruptible.html * igt@kms_frontbuffer_tracking@fbcpsr-tilingchange: - shard-iclb: [PASS][11] -> [FAIL][12] ([fdo#103167]) +5 similar issues [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-iclb8/igt@kms_frontbuffer_track...@fbcpsr-tilingchange.html [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-iclb4/igt@kms_frontbuffer_track...@fbcpsr-tilingchange.html * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes: - shard-skl: [PASS][13] -> [INCOMPLETE][14] ([fdo#104108]) +1 similar issue [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-skl3/igt@kms_pl...@plane-panning-bottom-right-suspend-pipe-b-planes.html [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-skl1/igt@kms_pl...@plane-panning-bottom-right-suspend-pipe-b-planes.html * igt@kms_plane_alpha_blend@pipe-b-coverage-7efc: - shard-skl: [PASS][15] -> [FAIL][16] ([fdo#108145] / [fdo#110403]) [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-skl2/igt@kms_plane_alpha_bl...@pipe-b-coverage-7efc.html [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-skl7/igt@kms_plane_alpha_bl...@pipe-b-coverage-7efc.html * igt@kms_psr2_su@page_flip: - shard-iclb: [PASS][17] -> [SKIP][18] ([fdo#109642]) [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-iclb2/igt@kms_psr2_su@page_flip.html [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-iclb6/igt@kms_psr2_su@page_flip.html * igt@kms_psr@no_drrs: - shard-iclb: [PASS][19] -> [FAIL][20] ([fdo#108341]) [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-iclb6/igt@kms_psr@no_drrs.html [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-iclb1/igt@kms_psr@no_drrs.html * igt@kms_psr@psr2_dpms: - shard-iclb: [PASS][21] -> [SKIP][22] ([fdo#109441]) +1 similar issue [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-iclb2/igt@kms_psr@psr2_dpms.html [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-iclb3/igt@kms_psr@psr2_dpms.html Possible fixes * igt@i915_suspend@fence-restore-tiled2untiled: - shard-apl: [DMESG-WARN][23] ([fdo#108566]) -> [PASS][24] +5 similar issues [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_5992/shard-apl6/igt@i915_susp...@fence-restore-tiled2untiled.html [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12865/shard-apl5/igt@i915_susp...@fence-restore-tiled2untiled.html * igt@kms_cursor_legacy@2x-nonblocking-modeset-vs-cursor-atomic: - s