Re: [RFC][PATCH] drm/amdgpu/powerplay/smu10: Add custom profile
On Tue, 7 Sept 2021 at 19:23, Alex Deucher wrote: > > On Tue, Sep 7, 2021 at 4:53 AM Daniel Gomez wrote: > > > > Add custom power profile mode support on smu10. > > Update workload bit list. > > --- > > > > Hi, > > > > I'm trying to add custom profile for the Raven Ridge but not sure if > > I'd need a different parameter than PPSMC_MSG_SetCustomPolicy to > > configure the custom values. The code seemed to support CUSTOM for > > workload types but it didn't show up in the menu or accept any user > > input parameter. So far, I've added that part but a bit confusing to > > me what is the policy I need for setting these parameters or if it's > > maybe not possible at all. > > > > After applying the changes I'd configure the CUSTOM mode as follows: > > > > echo manual > > > /sys/class/drm/card0/device/hwmon/hwmon1/device/power_dpm_force_performance_level > > echo "6 70 90 0 0" > > > /sys/class/drm/card0/device/hwmon/hwmon1/device/pp_power_profile_mode > > > > Then, using Darren Powell script for testing modes I get the following > > output: > > > > 05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. > > [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] > > [1002:15dd] (rev 83) > > === pp_dpm_sclk === > > 0: 200Mhz > > 1: 400Mhz * > > 2: 1100Mhz > > === pp_dpm_mclk === > > 0: 400Mhz > > 1: 933Mhz * > > 2: 1067Mhz > > 3: 1200Mhz > > === pp_power_profile_mode === > > NUMMODE_NAME BUSY_SET_POINT FPS USE_RLC_BUSY MIN_ACTIVE_LEVEL > > 0 BOOTUP_DEFAULT : 70 60 0 0 > > 1 3D_FULL_SCREEN : 70 60 1 3 > > 2 POWER_SAVING : 90 60 0 0 > > 3 VIDEO : 70 60 0 0 > > 4 VR : 70 90 0 0 > > 5COMPUTE : 30 60 0 6 > > 6 CUSTOM*: 70 90 0 0 > > > > As you can also see in my changes, I've also updated the workload bit > > table but I'm not completely sure about that change. With the tests > > I've done, using bit 5 for the WORKLOAD_PPLIB_CUSTOM_BIT makes the > > gpu sclk locked around ~36%. So, maybe I'm missing a clock limit > > configuraton table somewhere. 
Would you give me some hints to > > proceed with this? > > I don't think APUs support customizing the workloads the same way > dGPUs do. I think they just support predefined profiles. > > Alex Thanks Alex for the quick response. Would it make sense then to remove the custom workload code (PP_SMC_POWER_PROFILE_CUSTOM) from the smu10? That workload was added in this commit: f6f75ebdc06c04d3cfcd100f1b10256a9cdca407 [1] and not use at all in the code as it's limited to PP_SMC_POWER_PROFILE_COMPUTE index. The smu10.h also includes the custom workload bit definition and that was a bit confusing for me to understand if it was half-supported or not possible to use at all as I understood from your comment. Perhaps could also be mentioned (if that's kind of standard) in the documentation[2] so, the custom pp_power_profile_mode is only supported in dGPUs. I can send the patches if it makes sense. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c?id=f6f75ebdc06c04d3cfcd100f1b10256a9cdca407 [2]: https://www.kernel.org/doc/html/latest/gpu/amdgpu.html#pp-power-profile-mode Daniel > > > > > > Thanks in advance, > > Daniel > > > > > > drivers/gpu/drm/amd/pm/inc/smu10.h| 14 +++-- > > .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c | 57 +-- > > .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h | 1 + > > 3 files changed, 61 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/pm/inc/smu10.h > > b/drivers/gpu/drm/amd/pm/inc/smu10.h > > index 9e837a5014c5..b96520528240 100644 > > --- a/drivers/gpu/drm/amd/pm/inc/smu10.h > > +++ b/drivers/gpu/drm/amd/pm/inc/smu10.h > > @@ -136,12 +136,14 @@ > > #define FEATURE_CORE_CSTATES_MASK (1 << FEATURE_CORE_CSTATES_BIT) > > > > /* Workload bits */ > > -#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 0 > > -#define WORKLOAD_PPLIB_VIDEO_BIT 2 > > -#define WORKLOAD_PPLIB_VR_BIT 3 > > -#define WORKLOAD_PPLIB_COMPUTE_BIT4 > > -#define WORKLOAD_PPLIB_CUSTOM_BIT 5 > > 
-#define WORKLOAD_PPLIB_COUNT 6 > > +#define WORKLOAD_DEFAULT_BIT 0 > > +#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1 > > +#define WORKLOAD_PPLIB_POWER_SAVING_BIT 2 > > +#define WORKLOAD_PPLIB_VIDEO_BIT 3 > > +#define WORKLOAD_PPLIB_VR_BIT 4 > > +#define WORKLOAD_PPLIB_COMPUTE_BIT5 > > +#define WORKLOAD_PPLIB_CUSTOM_BIT 6 > > +#define WORKLOAD_PPLIB_COUNT 7 > > > > typedef struct { > > /* MP1_EXT_SCRATCH0 */ > > diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c > > b/drivers/gpu/drm/amd/pm/po
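For reference, the five numbers written to pp_power_profile_mode in the message above ("6 70 90 0 0") line up with the columns of the printed table: the mode index, then BUSY_SET_POINT, FPS, USE_RLC_BUSY and MIN_ACTIVE_LEVEL. A minimal userspace sketch of that mapping follows. The struct custom_profile type and the parse_custom_profile() helper are hypothetical names made up for illustration; this is not how the smu10 code handles its input, and per Alex's reply APUs may only support the predefined profiles anyway.

```c
#include <stdio.h>

/* Hypothetical container; field names follow the table header in the mail. */
struct custom_profile {
	unsigned int mode;             /* profile index, 6 == CUSTOM in the table */
	unsigned int busy_set_point;
	unsigned int fps;
	unsigned int use_rlc_busy;
	unsigned int min_active_level;
};

/*
 * Parse a string such as "6 70 90 0 0".
 * Returns 0 on success, -1 if the string does not hold exactly five numbers.
 */
int parse_custom_profile(const char *s, struct custom_profile *p)
{
	char extra;
	int n = sscanf(s, "%u %u %u %u %u %c",
		       &p->mode, &p->busy_set_point, &p->fps,
		       &p->use_rlc_busy, &p->min_active_level, &extra);

	/* Exactly five conversions: all fields present, nothing trailing. */
	return n == 5 ? 0 : -1;
}
```

Calling parse_custom_profile("6 70 90 0 0", &p) would fill p with the values shown on the CUSTOM row of the table, while a shorter or longer string is rejected.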
Re: [PATCH v10 07/17] dt-bindings: display: mediatek: merge: add additional prop for mt8195
Hi Philipp, Thanks for the reviews. On Wed, 2021-09-08 at 08:39 +0200, Philipp Zabel wrote: > Hi Jason, > > On Wed, 2021-09-08 at 14:03 +0800, jason-jh.lin wrote: > > add MERGE additional properties description for mt8195: > > 1. async clock > > 2. fifo setting enable > > 3. reset controller > > > > Signed-off-by: jason-jh.lin > > --- > > .../display/mediatek/mediatek,merge.yaml | 30 > > +++ > > 1 file changed, 30 insertions(+) > > > > diff --git > > a/Documentation/devicetree/bindings/display/mediatek/mediatek,merge > > .yaml > > b/Documentation/devicetree/bindings/display/mediatek/mediatek,merge > > .yaml > > index 75beeb207ceb..0fe204d9ad2c 100644 > > --- > > a/Documentation/devicetree/bindings/display/mediatek/mediatek,merge > > .yaml > > +++ > > b/Documentation/devicetree/bindings/display/mediatek/mediatek,merge > > .yaml > > @@ -38,6 +38,19 @@ properties: > >clocks: > > items: > >- description: MERGE Clock > > + - description: MERGE Async Clock > > + Controlling the synchronous process between MERGE and > > other display > > + function blocks cross clock domain. > > + > > + mediatek,merge-fifo-en: > > +description: > > + The setting of merge fifo is mainly provided for the display > > latency > > + buffer to ensure that the back-end panel display data will > > not be > > + underrun, a little more data is needed in the fifo. > > + According to the merge fifo settings, when the water level > > is detected > > + to be insufficient, it will trigger RDMA sending ultra and > > preulra > > + command to SMI to speed up the data rate. > > +type: boolean > > > >mediatek,gce-client-reg: > > description: > > @@ -50,6 +63,10 @@ properties: > > $ref: /schemas/types.yaml#/definitions/phandle-array > > maxItems: 1 > > > > + resets: > > +description: reset controller > > + See Documentation/devicetree/bindings/reset/reset.txt for > > details. > > From the example this looks like it could have a maxItems: 1. 
OK, I think it could have a maxItems: 1 in mt8195 because merge1~merge5 only have one async clock.

> > > +
> > required:
> >- compatible
> >- reg
>
> Should the resets property be required for "mediatek,mt8195-disp-merge"?

I think the resets property is not a required property. The reset controller is for the async clock of the MERGE module on vdosys1. The MERGE module on vdosys0 doesn't have an async clock, so it doesn't need the resets property.

Regards,
Jason-JH.Lin

> > > @@ -67,3 +84,16 @@ examples:
> > power-domains = <&spm MT8173_POWER_DOMAIN_MM>;
> > clocks = <&mmsys CLK_MM_DISP_MERGE>;
> > };
> > +
> > +merge5: disp_vpp_merge5@1c11 {
> > +compatible = "mediatek,mt8195-disp-merge";
> > +reg = <0 0x1c11 0 0x1000>;
> > +interrupts = ;
> > +clocks = <&vdosys1 CLK_VDO1_VPP_MERGE4>,
> > + <&vdosys1 CLK_VDO1_MERGE4_DL_ASYNC>;
> > +clock-names = "merge","merge_async";
> > +power-domains = <&spm MT8195_POWER_DOMAIN_VDOSYS1>;
> > +mediatek,gce-client-reg = <&gce1 SUBSYS_1c11 0x
> > 0x1000>;
> > +mediatek,merge-fifo-en = <1>;
> > +resets = <&vdosys1
> > MT8195_VDOSYS1_SW0_RST_B_MERGE4_DL_ASYNC>;
> > +};
>
> regards
> Philipp

--
Jason-JH Lin
Re: Handling DRM master transitions cooperatively
On Tue, 7 Sep 2021 14:42:56 +0200 Hans de Goede wrote: > Hi, > > On 9/7/21 12:07 PM, Pekka Paalanen wrote: > > On Fri, 3 Sep 2021 21:08:21 +0200 > > Dennis Filder wrote: > > > >> Hans de Goede asked me to take a topic from a private discussion here. > >> I must also preface that I'm not a graphics person and my knowledge of > >> DRI/DRM is cursory at best. > >> > >> I initiated the conversation with de Goede after learning that the X > >> server now supports being started with an open DRM file descriptor > >> (this was added for Keith Packard's xlease project). I wondered if > >> that could be used to smoothen the Plymouth->X transition somehow and > >> asked de Goede if there were any such plans. He denied, but mentioned > >> that a new ioctl is in the works to prevent the kernel from wiping the > >> contents of a frame buffer after a device is closed, and that this > >> would help to keep transitions smooth. > > > > Hi, > > > > I believe the kernel is not wiping anything on device close. If > > something in the KMS state is wiped, it originates in userspace: > > > > - Plymouth doing something (e.g. RmFB on an in-use FB will turn the > > output off, you need to be careful to "leak" your FB if you want a > > smooth hand-over) > > The "kernel is not wiping anything on device close" is not true, > when closing /dev/dri/card# any remaining FBs from the app closing > it will be dealt with as if they were RmFB-ed, causing the screen > to show what I call "the fallback fb", at least with the i915 driver. No, that's not what should happen AFAIK. True, all FBs that are not referenced by active CRTCs or planes will get freed, since their refcount drops to zero, but those CRTCs and planes that are active will remain active and therefore keep their reference to the respective FBs and so the FBs remain until replaced or turned off explicitly (by e.g. fbcon if you switch to that rather than another userspace KMS client). I believe that is the whole reason why e.g. 
DRM_IOCTL_MODE_GETFB2 can be useful, otherwise the next KMS client would not have anything to scrape.

danvet, what is the DRM core intention?

Or am I confused because display servers do not tend to close the DRM device fd on switch-out but Plymouth does (too early)?

If so, why can't Plymouth keep the device open longer and quit only when the hand-off is complete? Not quitting too early would be a prerequisite for any explicit hand-off protocol as well.

Thanks,
pq

> > - Xorg doing something (e.g. resetting instead of inheriting KMS state)
> >
> > - Something missed in the hand-off sequence which allows fbcon to
> > momentarily take over between Plymouth and Xorg. This would need to
> > be fixed between Plymouth and Xorg.
> >
> > - Maybe systemd-logind does something odd to the KMS device? It has
> > pretty wild code there. Or maybe it causes fbcon to take over.
> >
> > What is the new ioctl you referred to?
>
> It is an ioctl to mark a FB to not have it auto-removed on device-close,
> instead leaving it in place until some kernel/userspace client
> actively installs another FB. This was proposed by Rob Clark quite
> a while ago, but it never got anywhere because of lack of userspace
> actually interested in using it.
>
> I've been thinking about reviving Rob's patch, since at least for
> plymouth this would be pretty useful to have.
>
> Regards,
>
> Hans
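The reference-counting behaviour Pekka describes in this message can be sketched as a toy model in C. This only illustrates his stated expectation (an FB that an active CRTC still references survives the client closing the device); Hans reports different behaviour with i915, and the thread leaves the question open. The model_fb and model_crtc types and all function names are hypothetical and are not DRM core code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical framebuffer with a plain reference count. */
struct model_fb {
	int refcount;
};

/* Hypothetical CRTC that may be scanning out one framebuffer. */
struct model_crtc {
	struct model_fb *fb;   /* FB currently shown, or NULL */
};

void fb_get(struct model_fb *fb)
{
	fb->refcount++;
}

/* Returns true when the last reference is gone and the FB would be freed. */
bool fb_put(struct model_fb *fb)
{
	return --fb->refcount == 0;
}

/* Showing an FB on a CRTC makes the CRTC take its own reference. */
void crtc_set_fb(struct model_crtc *crtc, struct model_fb *fb)
{
	fb_get(fb);
	crtc->fb = fb;
}

/*
 * Device close in this model: only the client's own reference is dropped.
 * Returns true if that was the last reference.
 */
bool client_close(struct model_fb *fb)
{
	return fb_put(fb);
}
```

In this model an FB that was created but never displayed is freed on close, while the FB on the active CRTC keeps one reference and stays on screen until something replaces it or turns the CRTC off.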
Re: Handling DRM master transitions cooperatively
Hi, On 9/8/21 9:36 AM, Pekka Paalanen wrote: > On Tue, 7 Sep 2021 14:42:56 +0200 > Hans de Goede wrote: > >> Hi, >> >> On 9/7/21 12:07 PM, Pekka Paalanen wrote: >>> On Fri, 3 Sep 2021 21:08:21 +0200 >>> Dennis Filder wrote: >>> Hans de Goede asked me to take a topic from a private discussion here. I must also preface that I'm not a graphics person and my knowledge of DRI/DRM is cursory at best. I initiated the conversation with de Goede after learning that the X server now supports being started with an open DRM file descriptor (this was added for Keith Packard's xlease project). I wondered if that could be used to smoothen the Plymouth->X transition somehow and asked de Goede if there were any such plans. He denied, but mentioned that a new ioctl is in the works to prevent the kernel from wiping the contents of a frame buffer after a device is closed, and that this would help to keep transitions smooth. >>> >>> Hi, >>> >>> I believe the kernel is not wiping anything on device close. If >>> something in the KMS state is wiped, it originates in userspace: >>> >>> - Plymouth doing something (e.g. RmFB on an in-use FB will turn the >>> output off, you need to be careful to "leak" your FB if you want a >>> smooth hand-over) >> >> The "kernel is not wiping anything on device close" is not true, >> when closing /dev/dri/card# any remaining FBs from the app closing >> it will be dealt with as if they were RmFB-ed, causing the screen >> to show what I call "the fallback fb", at least with the i915 driver. > > No, that's not what should happen AFAIK. I'm pretty sure that that is what is happening though. 
But hopefully someone else can either confirm or deny this :)

> True, all FBs that are not referenced by active CRTCs or planes will
> get freed, since their refcount drops to zero, but those CRTCs and
> planes that are active will remain active and therefore keep their
> reference to the respective FBs and so the FBs remain until replaced or
> turned off explicitly (by e.g. fbcon if you switch to that rather than
> another userspace KMS client). I believe that is the whole reason why
> e.g. DRM_IOCTL_MODE_GETFB2 can be useful, otherwise the next KMS client
> would not have anything to scrape.
>
> danvet, what is the DRM core intention?
>
> Or am I confused because display servers do not tend to close the DRM
> device fd on switch-out but Plymouth does (too early)?
>
> If so, why can't Plymouth keep the device open longer and quit only
> when the hand-off is complete? Not quitting too early would be a
> prerequisite for any explicit hand-off protocol as well.

plymouth is actually keeping the device open longer for exactly this reason; the following happens:

1. plymouth starts
2. gdm starts and tells plymouth to "deactivate", which will stop it from making drm ioctls and drop its drm master rights, while keeping the fb around
3. gdm waits for the greeter process to tell it that it has successfully taken over the screen
4. gdm tells plymouth to quit

And something similar is happening on gdm greeter -> gnome user session handover.

But we need the new ioctl at least on shutdown / reboot to avoid the "fallback fb" (typically the EFI/BIOS setup fb which i915 inherited at boot) showing for a brief moment when plymouth quits at shutdown / reboot and there is nothing to hand over the fb to in that case. And the new ioctl would also make the above handover a lot simpler.

And we currently also have a flicker when going from user-session to gdm on logout or from gdm/user-session to plymouth on shutdown/reboot.
Basically we have quite a few transitions and currently only the boot + login path is smooth and the rest needs more work, which either requires a standardized handover method (instead of the current hardcoded plymouth -> gdm stuff), or just allowing the FB to sit around until the next drm-client installs its FB, which would be much more KISS, so that has my preference. And this KISS method will also work with transitions to a new console-owner process which is not aware of any handover protocols, as long as the old process uses the ioctl the transition will be smooth. So e.g. gdm -> i3 on Xorg session will be smooth (1) Regards, Hans 1) I think this actually already is smooth because in this case gdm just sleeps for 5 seconds before killing the greeter I believe, but with the ioctl we could remove this hack
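The four-step plymouth -> gdm hand-over Hans lists in this message can be sketched as a small state machine. This is only a model of the ordering he describes (deactivate first, quit only after the greeter has confirmed it owns the screen); the enum, struct and function names are hypothetical and do not correspond to plymouth or gdm interfaces.

```c
#include <stdbool.h>

/* Hypothetical states of the boot splash; names are illustrative only. */
enum splash_state {
	SPLASH_ACTIVE,       /* drawing and holding DRM master */
	SPLASH_DEACTIVATED,  /* no more ioctls, master dropped, FB kept around */
	SPLASH_QUIT          /* process has exited */
};

struct handoff {
	enum splash_state splash;
	bool greeter_has_screen;
};

/* Step 2: the display manager asks the splash to deactivate. */
bool handoff_deactivate(struct handoff *h)
{
	if (h->splash != SPLASH_ACTIVE)
		return false;
	h->splash = SPLASH_DEACTIVATED;
	return true;
}

/* Step 3: the greeter reports that it has taken over the screen. */
void handoff_greeter_ready(struct handoff *h)
{
	h->greeter_has_screen = true;
}

/* Step 4: quitting is only allowed once the greeter owns the screen. */
bool handoff_quit(struct handoff *h)
{
	if (h->splash != SPLASH_DEACTIVATED || !h->greeter_has_screen)
		return false;
	h->splash = SPLASH_QUIT;
	return true;
}
```

The model also shows why the shutdown path is awkward: with no greeter to report readiness, handoff_quit() never becomes legal, which is the gap the proposed "keep the FB until replaced" ioctl is meant to cover.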
Re: [PATCH v2 3/6] drm/i915 Implement LMEM backup and restore for suspend / resume
Hi, Matt,

Thanks for reviewing.

On 9/7/21 7:37 PM, Matthew Auld wrote:
+ i915_gem_ww_unlock_single(backup);
+ i915_gem_object_put(backup);

I assume we need to set ttm.backup = NULL somewhere here on the failure path, or don't drop the ref? Or at least it looks like potential uaf later?

Yes, I think on failure, we just don't drop the ref here in case something at some point decides to retry. I'll fix up this and other comments.

/Thomas

+
+ return err;
+}
+
Re: [Freedreno] [PATCH 2/3] drm/msm/dpu1: Add MSM8998 to hw catalog
Hi, On Tue, 7 Sept 2021 at 22:13, Jeffrey Hugo wrote: > > On Wed, Sep 1, 2021 at 12:11 PM AngeloGioacchino Del Regno > wrote: > > > > Bringup functionality for MSM8998 in the DPU, driver which is mostly > > the same as SDM845 (just a few variations). > > > > Signed-off-by: AngeloGioacchino Del Regno > > > > I don't seem to see a cover letter for this series. > > Eh, there are a fair number of differences between the MDSS versions > for 8998 and 845. > > Probably a bigger question, why extend the DPU driver for 8998, when > the MDP5 driver already supports it[1]? The MDP/DPU split is pretty > dumb, but I don't see a valid reason for both drivers supporting the > same target/display revision. IMO, if you want this support in DPU, > remove it from MDP5. > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.14&id=d6c7b2284b14c66a268a448a7a8d54f585d38785 I don't think that we should enforce such requirements. Having support both in MDP5 and DPU would allow one to compare those two drivers, performance, features, etc. It might be that all MDP5-supported hardware would be also supported by DPU, thus allowing us to remove the former driver. But until that time I'd suggest leaving support in place. -- With best wishes Dmitry
Re: [PATCH v10 01/17] dt-bindings: arm: mediatek: mmsys: add power and gce properties
Hi Jason, Thank you for your patch. One small comment below. On 8/9/21 8:02, jason-jh.lin wrote: > Power: > 1. Add description for power-domains property. > > GCE: > 1. Add description for mboxes property. > 2. Add description for mediatek,gce-client-reg property. > > Signed-off-by: jason-jh.lin > --- > .../bindings/arm/mediatek/mediatek,mmsys.yaml | 30 ++- > 1 file changed, 29 insertions(+), 1 deletion(-) > > diff --git > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yaml > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yaml > index 2d4ff0ce387b..a2e7bddfed03 100644 > --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yaml > +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yaml > @@ -39,6 +39,30 @@ properties: >reg: > maxItems: 1 > > + power-domains: > +description: > + A phandle and PM domain specifier as defined by bindings > + of the power controller specified by phandle. See > + Documentation/devicetree/bindings/power/power-domain.yaml for details. > + > + mboxes: > +description: > + Using mailbox to communicate with GCE, it should have this > + property and list of phandle, mailbox specifiers. See > + Documentation/devicetree/bindings/mailbox/mtk-gce.txt for details. > +$ref: /schemas/types.yaml#/definitions/phandle-array > + > + mediatek,gce-client-reg: > +description: > + The register of client driver can be configured by gce with 4 arguments > + defined in this property, such as phandle of gce, subsys id, > + register offset and size. > + Each subsys id is mapping to a base address of display function blocks > + register which is defined in the gce header > + include/dt-bindings/gce/-gce.h. > +$ref: /schemas/types.yaml#/definitions/phandle-array > +maxItems: 1 > + >"#clock-cells": > const: 1 > > @@ -53,6 +77,10 @@ examples: >- | > mmsys: syscon@1400 { > compatible = "mediatek,mt8173-mmsys", "syscon"; > -reg = <0x1400 0x1000>; > +reg = <0 0x1400 0 0x1000>; Why this change? 
Thanks, Enric > +power-domains = <&spm MT8173_POWER_DOMAIN_MM>; > #clock-cells = <1>; > +mboxes = <&gce 0 CMDQ_THR_PRIO_HIGHEST>, > + <&gce 1 CMDQ_THR_PRIO_HIGHEST>; > +mediatek,gce-client-reg = <&gce SUBSYS_1400 0 0x1000>; > }; >
Re: [PATCH v4] drm/i915: Use Transparent Hugepages when IOMMU is enabled
On 07/09/2021 12:13, Eero Tamminen wrote: Hi, For completeness sake, it might be worth mentioning specifically what (synthetic) test-cases regress with THP patch. * Skylake GT4e: 20-25% SynMark TexMem* (whereas all MemBW GPU tests either improve or are not affected) * Broxton J4205: 7% MemBW GPU texture 2-3% SynMark TexMem* * Tigerlake-H: 7% MemBW GPU blend Ah right that makes sense. All the entries marker with asterisk under the "with patch" list. Okay if I just add an explanation on what does the asterisk mean for them at a single place? And about the Broxton one. In the bug you put "15-20% MemBW GPU texture" and "10% SynMark TexMem*" so from where are these numbers now? I have no idea why on GEN9 texture accesses regress, but on GEN12 TGL it's render buffer blend that regresses. Blend (read+write) regressing is especially odd, as neither render buffer read nor write regresses. Maybe that is a GEN12 specific driver bug similar to Mesa/i965 bug from few years back in how its shaders access render buffer, that had caused SIMD32 accesses to regress memory BW bound test-cases perf a bit compared to SIMD16? (Blend test is likely to run nowadays as SIMD32.) No idea on this one from me, leaving to more qualified people to comment. Regards, Tvrtko - Eero On 7.9.2021 13.34, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Usage of Transparent Hugepages was disabled in 9987da4b5dcf ("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it appears majority of performance regressions reported with an enabled IOMMU can be almost eliminated by turning them on, lets just do that. To err on the side of safety we keep the current default in cases where IOMMU is not active, and only when it is default to the "huge=within_size" mode. Although there probably would be wins to enable them throughout, more extensive testing across benchmarks and platforms would need to be done. 
With the patch and IOMMU enabled my local testing on a small Skylake part shows OglVSTangent regression being reduced from ~14% (IOMMU on versus IOMMU off) to ~2% (same comparison but with THP on). More detailed testing done in the below referenced Gitlab issue by Eero: Skylake GT4e: Performance drops from enabling IOMMU: 30-35% SynMark CSDof 20-25% Unigine Heaven, MemBW GPU write, SynMark VSTangent ~20% GLB Egypt (1/2 screen window) 10-15% GLB T-Rex (1/2 screen window) 8-10% GfxBench T-Rex, MemBW GPU blit 7-8% SynMark DeferredAA + TerrainFly* + ZBuffer 6-7% GfxBench Manhattan 3.0 + 3.1, SynMark TexMem128 & CSCloth 5-6% GfxBench CarChase, Unigine Valley 3-5% GfxBench Vulkan & GL AztecRuins + ALU2, MemBW GPU texture, SynMark Fill*, Deferred, TerrainPan* 1-2% Most of the other tests With the patch drops become: 20-25% SynMark TexMem* 15-20% GLB Egypt (1/2 screen window) 10-15% GLB T-Rex (1/2 screen window) 4-7% GfxBench T-Rex, GpuTest Triangle 1-8% GfxBench ALU2 (offscreen 1%, onscreen 8%) 3% GfxBench Manhattan 3.0, SynMark CSDof 2-3% Unigine Heaven + Valley, MemBW GPU texture 1-3 GfxBench Manhattan 3.1 + CarChase + Vulkan & GL AztecRuins Broxton: Performance drops from IOMMU, without patch: 30% MemBW GPU write 25% SynMark ZBuffer + Fill* 20% MemBW GPU blit 15% MemBW GPU blend, GpuTest Triangle 10-15% MemBW GPU texture 10% GLB Egypt, Unigine Heaven (had hangs), SynMark TerrainFly* 7-9% GLB T-Rex, GfxBench Manhattan 3.0 + T-Rex, SynMark Deferred* + TexMem* 6-8% GfxBench CarChase, Unigine Valley, SynMark CSCloth + ShMapVsm + TerrainPan* 5-6% GfxBench Manhattan 3.1 + GL AztecRuins, SynMark CSDof + TexFilterTri 2-4% GfxBench ALU2, SynMark DrvRes + GSCloth + ShMapPcf + Batch[0-5] + TexFilterAniso, GpuTest GiMark + 32-bit Julia And with patch: 15-20% MemBW GPU texture 10% SynMark TexMem* 8-9% GLB Egypt (1/2 screen window) 4-5% GLB T-Rex (1/2 screen window) 3-6% GfxBench Manhattan 3.0, GpuTest FurMark, SynMark Deferred + TexFilterTri 3-4% GfxBench Manhattan 3.1 + T-Rex, 
SynMark VSInstancing 2-4% GpuTest Triangle, SynMark DeferredAA 2-3% Unigine Heaven + Valley 1-3% SynMark Terrain* 1-2% GfxBench CarChase, SynMark TexFilterAniso + ZBuffer Tigerlake-H: 20-25% MemBW GPU texture 15-20% GpuTest Triangle 13-15% SynMark TerrainFly* + DeferredAA + HdrBloom 8-10% GfxBench Manhattan 3.1, SynMark TerrainPan* + DrvRes 6-7% GfxBench Manhattan 3.0, SynMark TexMem* 4-8% GLB onscreen Fill + T-Rex + Egypt (more in onscreen than offscreen versions of T-Rex/Egypt) 4-6% GfxBench CarChase + GLES AztecRuins + ALU2, GpuTest 32-bit Julia, SynMark CSDof + DrvState 3-5% GfxBench T-Rex + Egypt, Unigine Heaven + Valley, GpuTest Plot3D 1-7% Media tests 2-3% MemBW GPU blit 1-3%
Re: linux-next: build failure after merge of the drm tree
On Wed, Sep 8, 2021 at 5:14 AM Masahiro Yamada wrote: > > On Mon, Sep 6, 2021 at 4:34 PM Daniel Vetter wrote: > > > > On Mon, Sep 6, 2021 at 12:49 AM Stephen Rothwell > > wrote: > > > Hi all, > > > > > > On Thu, 2 Sep 2021 07:50:38 +1000 Stephen Rothwell > > > wrote: > > > > > > > > On Fri, 20 Aug 2021 15:23:34 +0900 Masahiro Yamada > > > > wrote: > > > > > > > > > > On Fri, Aug 20, 2021 at 11:33 AM Stephen Rothwell > > > > > wrote: > > > > > > > > > > > After merging the drm tree, today's linux-next build (x86_64 > > > allmodconfig) > > > > > > failed like this: > > > > > > > > > > > > In file included from drivers/gpu/drm/i915/i915_debugfs.c:39: > > > > > > drivers/gpu/drm/i915/gt/intel_gt_requests.h:9:10: fatal error: > > > > > > stddef.h: No such file or directory > > > > > > 9 | #include > > > > > > | ^~ > > > > > > > > > > > > Caused by commit > > > > > > > > > > > > 564f963eabd1 ("isystem: delete global -isystem compile option") > > > > > > > > > > > > from the kbuild tree interacting with commit > > > > > > > > > > > > b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to > > > > > > work with GuC") > > > > > > > > > > > > I have applied the following patch for today. > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > This fix-up does not depend on my kbuild tree in any way. > > > > > > > > > > So, the drm maintainer can apply it to his tree. > > > > > > > > > > Perhaps with > > > > > > > > > > Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to > > > > > work with GuC") > > > > > > > > OK, so that didn't happen so I will now apply the merge fix up to the > > > > merge of the kbuild tree. 
> > > > > > > > > > From: Stephen Rothwell > > > > > > Date: Fri, 20 Aug 2021 12:24:19 +1000 > > > > > > Subject: [PATCH] drm/i915: use linux/stddef.h due to "isystem: > > > > > > trim/fixup stdarg.h and other headers" > > > > > > > > > > > > Signed-off-by: Stephen Rothwell > > > > > > --- > > > > > > drivers/gpu/drm/i915/gt/intel_gt_requests.h | 2 +- > > > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h > > > > > > b/drivers/gpu/drm/i915/gt/intel_gt_requests.h > > > > > > index 51dbe0e3294e..d2969f68dd64 100644 > > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h > > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h > > > > > > @@ -6,7 +6,7 @@ > > > > > > #ifndef INTEL_GT_REQUESTS_H > > > > > > #define INTEL_GT_REQUESTS_H > > > > > > > > > > > > -#include > > > > > > +#include > > > > > > > > > > > > struct intel_engine_cs; > > > > > > struct intel_gt; > > > > > > -- > > > > > > 2.32.0 > > > > > > Ping? I am still applying this ... > > > > Apologies, this fell through a lot of cracks. I applied this to drm-next > > now. > > > > Rather, I was planning to apply this fix to my kbuild tree. > > Since you guys did not fix the issue in time, > I ended up with dropping [1] from my pull request. > > I want to get [1] merged in this MW. > > If I postponed it, somebody would add new > or inclusion in the next development > cycle, I will never make it in the mainline. > > [1] > https://lore.kernel.org/linux-kernel/YQhY40teUJcTc5H4@localhost.localdomain/ Yeah no problem if you apply it too. For that: Acked-by: Daniel Vetter I just figured I make sure this is at least not lost. -Daniel > > > > > > > Matt/John, as author/committer it's your job to make sure issues and > > fixes for the stuff you're pushing don't get lost. I'd have expected > > John to apply this to at least drm-intel-gt-next (it's not even > > there). 
> > > > Joonas, I think this is the 2nd or 3rd or so issue this release cycle > > where some compile fix got stuck a bit because drm-intel-gt-next isn't > > in linux-next. Can we please fix that? It probably needs some changes > > to the dim script. > > > > Cheers, Daniel > > -- > > Daniel Vetter > > Software Engineer, Intel Corporation > > http://blog.ffwll.ch > > > > -- > Best Regards > Masahiro Yamada -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH] drm/i915: Get PM ref before accessing HW register
On 08/09/2021 00:27, Vinay Belgaumkar wrote:
Seeing these errors when GT is likely in suspend state-
"RPM wakelock ref not held during HW access"

Ensure GT is awake before trying to access HW registers. Avoid
reading the register if that is not the case.

Signed-off-by: Vinay Belgaumkar
Fixes: 41e5c17ebfc2 ("drm/i915/guc/slpc: Sysfs hooks for SLPC")

Reviewed-by: Tvrtko Ursulin

Regards,

Tvrtko

---
 drivers/gpu/drm/i915/gt/intel_rps.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
index 3489f5f0cac1..e1a198bbd135 100644
--- a/drivers/gpu/drm/i915/gt/intel_rps.c
+++ b/drivers/gpu/drm/i915/gt/intel_rps.c
@@ -1969,8 +1969,14 @@ u32 intel_rps_read_actual_frequency(struct intel_rps *rps)
 u32 intel_rps_read_punit_req(struct intel_rps *rps)
 {
 	struct intel_uncore *uncore = rps_to_uncore(rps);
+	struct intel_runtime_pm *rpm = rps_to_uncore(rps)->rpm;
+	intel_wakeref_t wakeref;
+	u32 freq = 0;
 
-	return intel_uncore_read(uncore, GEN6_RPNSWREQ);
+	with_intel_runtime_pm_if_in_use(rpm, wakeref)
+		freq = intel_uncore_read(uncore, GEN6_RPNSWREQ);
+
+	return freq;
 }
 
 static u32 intel_rps_get_req(u32 pureq)
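The shape of the fix in this patch (take a reference only if the device is already awake, read the register under that reference, otherwise return 0) can be sketched outside the kernel as follows. The fake_dev type and the fake_* functions are hypothetical stand-ins invented for this sketch; they are not the i915 runtime PM API and only imitate the control flow of the patched intel_rps_read_punit_req().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a runtime-PM-managed device; not an i915 type. */
struct fake_dev {
	int usage_count;     /* > 0 while something keeps the device awake */
	uint32_t freq_reg;   /* value the "hardware" register would return */
};

/* Take a reference only if the device is already in use; never wake it. */
bool fake_pm_get_if_in_use(struct fake_dev *dev)
{
	if (dev->usage_count <= 0)
		return false;
	dev->usage_count++;
	return true;
}

/* Drop a reference taken by fake_pm_get_if_in_use(). */
void fake_pm_put(struct fake_dev *dev)
{
	dev->usage_count--;
}

/* Default to 0 and touch the register only while a reference is held. */
uint32_t fake_read_req(struct fake_dev *dev)
{
	uint32_t freq = 0;

	if (fake_pm_get_if_in_use(dev)) {
		freq = dev->freq_reg;
		fake_pm_put(dev);
	}

	return freq;
}
```

The design choice mirrored here is that a sysfs read does not wake a suspended device just to report a value; it reports 0 instead and leaves the usage count unchanged.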
Re: [PATCH] doc: gpu: Add document describing buffer exchange
On Sun, 5 Sep 2021 13:27:42 +0100 Daniel Stone wrote: > Since there's a lot of confusion around this, document both the rules > and the best practice around negotiating, allocating, importing, and > using buffers when crossing context/process/device/subsystem boundaries. > > This ties up all of dmabuf, formats and modifiers, and their usage. > > Signed-off-by: Daniel Stone Hi, I checked the comments from Simon and Bob, and I agree with them. Below are some more from me. There is room for adding a glossary for the terms, like what is the difference between a buffer, pixel buffer and a memory buffer, and things like pixel data, color value, stride, etc. For example: image Conceptually a two-dimensional array of pixels. The pixels may be stored in one or more memory buffers. Has width and height in pixels, pixel format and modifier (implicit or explicit). memory buffer A piece of memory for storing (parts of) pixel data. Has stride and size in bytes and at least one handle in some API. May contain one or more planes. plane A two-dimensional array of some or all of an image's color and alpha channel values. pixel A picture element. Has a single color value which is defined by one or more color channels values, e.g. R, G and B, or Y, Cb and Cr. May also have an alpha value as an additional channel. pixel data Bytes or bits that represent some or all of the color/alpha channel values of a pixel or an image. The data for one pixel may be spread over several planes or memory buffers depending on format and modifier. color value A tuple of numbers, representing a color. Each element in the tuple is a color channel value. color channel One of the dimensions in a color model. For example, RGB model has channels R, G, and B. Alpha channel is sometimes counted as a color channel as well. pixel format A description of how pixel data represents the pixel's color and alpha values. modifier A description of how pixel data is laid out in memory buffers. 
alpha A value that denotes the color coverage in a pixel. Sometimes used for translucency instead. stride > --- > > This is just a quick first draft, inspired by: > https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637 > > It's not complete or perfect, but I'm off to eat a roast then have a > nice walk in the sun, so figured it'd be better to dash it off rather > than let it rot on my hard drive. For a quick draft, this is quite excellent. > > .../gpu/exchanging-pixel-buffers.rst | 285 ++ > Documentation/gpu/index.rst | 1 + > 2 files changed, 286 insertions(+) > create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst > > diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst > b/Documentation/gpu/exchanging-pixel-buffers.rst > new file mode 100644 > index ..75c4de13d5c8 > --- /dev/null > +++ b/Documentation/gpu/exchanging-pixel-buffers.rst > @@ -0,0 +1,285 @@ > +.. Copyright 2021 Collabora Ltd. > + > + > +Exchanging pixel buffers > + > + > +As originally designed, the Linux graphics subsystem had extremely limited > +support for sharing pixel-buffer allocations between processes, devices, and > +subsystems. Modern systems require extensive integration between all three > +classes; this document details how applications and kernel subsystems should > +approach this sharing for two-dimensional image data. > + > +It is written with reference to the DRM subsystem for GPU and display > devices, > +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace > +support, however any other subsystems should also follow this design and > advice. > + > + > +Formats and modifiers > += > + > +Each buffer must have an underlying format. This format describes the data > which > +can be stored and loaded for each pixel. Although each subsystem has its own > +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should > be > +reused wherever possible, as they are the standard descriptions used for > +interchange. 
> +
> +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
> +the translation between one or more pixels in memory, and the color data
> +contained within that memory. The number and type of color channels are
> +described: whether they are RGB or YUV, integer or floating-point, the size
> +of each channel and their locations within the pixel memory, and the
> +relationship between color planes.
> +
> +For example, `DRM_FORMAT_ARGB` describes a format in which each pixel has a
> +single 32-bit value in memory. Alpha, red, green, and blue, color channels are
> +available at 8-bit precision per channel,
Re: [PATCH] doc: gpu: Add document describing buffer exchange
> stride
>

I think what's clear is:

- Per-plane property
- In bytes
- Offset between two consecutive rows

How that applies to weird YUV formats is the tricky question…

> Btw. there was a fun argument whether the same modifier value could
> mean different things on different devices. There were also arguments
> that a certain modifier could reference additional implicit memory on
> the device - memory that can only be accessed by very specific devices.
>
> I think AMLOGIC_FBC_LAYOUT_SCATTER was one of those.

A recent example of this is [1].

[1]: https://patchwork.freedesktop.org/patch/452461/
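The three properties of stride listed above can be sketched in C. This is only an illustration, assuming a single linear plane with a constant number of bytes per pixel; the struct and function names are hypothetical and do not come from any DRM header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one plane of a linear (untiled) image. */
struct plane_layout {
	uint32_t width;  /* in pixels */
	uint32_t height; /* in rows */
	uint32_t stride; /* bytes between the starts of two consecutive rows */
	uint32_t cpp;    /* bytes per pixel in this plane (assumed constant) */
};

/* Byte offset of pixel (x, y) from the start of the plane. */
size_t pixel_offset(const struct plane_layout *p, uint32_t x, uint32_t y)
{
	assert(x < p->width && y < p->height);
	/* Rows may be padded, so step by stride, not by width * cpp. */
	return (size_t)y * p->stride + (size_t)x * p->cpp;
}
```

With a 100-pixel-wide, 4-bytes-per-pixel plane and a 512-byte stride, row 1 starts at byte 512 rather than byte 400. Subsampled or tiled YUV layouts do not fit this simple model, which is exactly the tricky part.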
Re: [PATCH 1/8] drm/i915/xehp: Define compute class and engine
On 07/09/2021 18:19, Matt Roper wrote: Introduce a Compute Command Streamer (CCS), which has access to the media and GPGPU pipelines (but not the 3D pipeline). To begin with, define the compute class/engine common functions, based on the existing render ones. Bspec: 46167, 45544 Original-patch-by: Michel Thierry Cc: Daniele Ceraolo Spurio Cc: Tvrtko Ursulin Cc: Vinay Belgaumkar Cc: Szymon Morek UMD (compute): https://github.com/intel/compute-runtime/pull/451 Signed-off-by: Rodrigo Vivi Signed-off-by: Daniele Ceraolo Spurio Signed-off-by: Aravind Iddamsetty Signed-off-by: Matt Roper --- drivers/gpu/drm/i915/gt/intel_engine_cs.c| 28 drivers/gpu/drm/i915/gt/intel_engine_types.h | 9 ++- drivers/gpu/drm/i915/gt/intel_engine_user.c | 5 +++- drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 13 + drivers/gpu/drm/i915/i915_reg.h | 8 ++ include/uapi/drm/i915_drm.h | 1 + 6 files changed, 57 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 332efea696a5..69944bd8c19d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -153,6 +153,34 @@ static const struct engine_info intel_engines[] = { { .graphics_ver = 12, .base = XEHP_VEBOX4_RING_BASE } }, }, + [CCS0] = { + .class = COMPUTE_CLASS, + .instance = 0, + .mmio_bases = { + { .graphics_ver = 12, .base = GEN12_COMPUTE0_RING_BASE } + } + }, + [CCS1] = { + .class = COMPUTE_CLASS, + .instance = 1, + .mmio_bases = { + { .graphics_ver = 12, .base = GEN12_COMPUTE1_RING_BASE } + } + }, + [CCS2] = { + .class = COMPUTE_CLASS, + .instance = 2, + .mmio_bases = { + { .graphics_ver = 12, .base = GEN12_COMPUTE2_RING_BASE } + } + }, + [CCS3] = { + .class = COMPUTE_CLASS, + .instance = 3, + .mmio_bases = { + { .graphics_ver = 12, .base = GEN12_COMPUTE3_RING_BASE } + } + }, }; /** diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index bfbfe53c23dd..dcb9d8b2362a 
100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -33,7 +33,8 @@ #define VIDEO_ENHANCEMENT_CLASS 2 #define COPY_ENGINE_CLASS 3 #define OTHER_CLASS 4 -#define MAX_ENGINE_CLASS 4 +#define COMPUTE_CLASS 5 +#define MAX_ENGINE_CLASS 5 #define MAX_ENGINE_INSTANCE 7 #define I915_MAX_SLICES 3 @@ -95,6 +96,7 @@ struct i915_ctx_workarounds { #define I915_MAX_VCS 8 #define I915_MAX_VECS 4 +#define I915_MAX_CCS 4 /* * Engine IDs definitions. @@ -117,6 +119,11 @@ enum intel_engine_id { VECS2, VECS3, #define _VECS(n) (VECS0 + (n)) + CCS0, + CCS1, + CCS2, + CCS3, +#define _CCS(n) (CCS0 + (n)) I915_NUM_ENGINES #define INVALID_ENGINE ((enum intel_engine_id)-1) }; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c index 8f8bea08e734..d981621a7c30 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c @@ -47,6 +47,7 @@ static const u8 uabi_classes[] = { [COPY_ENGINE_CLASS] = I915_ENGINE_CLASS_COPY, [VIDEO_DECODE_CLASS] = I915_ENGINE_CLASS_VIDEO, [VIDEO_ENHANCEMENT_CLASS] = I915_ENGINE_CLASS_VIDEO_ENHANCE, + [COMPUTE_CLASS] = I915_ENGINE_CLASS_COMPUTE, }; static int engine_cmp(void *priv, const struct list_head *A, @@ -139,6 +140,7 @@ const char *intel_engine_class_repr(u8 class) [COPY_ENGINE_CLASS] = "bcs", [VIDEO_DECODE_CLASS] = "vcs", [VIDEO_ENHANCEMENT_CLASS] = "vecs", + [COMPUTE_CLASS] = "ccs", }; if (class >= ARRAY_SIZE(uabi_names) || !uabi_names[class]) @@ -162,6 +164,7 @@ static int legacy_ring_idx(const struct legacy_ring *ring) [COPY_ENGINE_CLASS] = { BCS0, 1 }, [VIDEO_DECODE_CLASS] = { VCS0, I915_MAX_VCS }, [VIDEO_ENHANCEMENT_CLASS] = { VECS0, I915_MAX_VECS }, + [COMPUTE_CLASS] = { CCS0, I915_MAX_CCS }, }; if (GEM_DEBUG_WARN_ON(ring->class >= ARRAY_SIZE(map))) @@ -190,7 +193,7 @@ static void add_legacy_ring(struct legacy_ring *ring, void intel_engines_driver_register(struct drm_i915_private *i915) { struct 
legacy_ring ring = {}; - u8 uabi_instances[4] = {}; + u8 uabi_instances[5] = {}; struct list_head *it, *next; struct
Re: Handling DRM master transitions cooperatively
> On Tue, 07 Sep 2021 10:19:03 +
> Simon Ser wrote:
>
> > FWIW, I've just hit a case where a compositor leaves a "rotation" KMS
> > prop set behind, then Xorg tries to startup and fails because it doesn't
> > reset this prop. So none of this is theoretical.
> >
> > I still think a "reset all KMS props to an arbitrary default value" flag
> > in drmModeAtomicCommit is the best way forward. I'm not sure a user-space
> > protocol would help too much.
>
> Hi Simon,
>
> for the "reset KMS state" problem, sure. Thanks for confirming the
> problem, too.
>
> The hand-off problem does need userspace protocol though, so that the
> two parties can negotiate what part of KMS state can be inherited by
> the receiver and who will do the animation from the first to the second
> state in case you want to avoid abrupt changes. It would also be useful
> for a cross-fade as a perhaps more flexible way than the current "leak
> an FB, let the next KMS client scrape it via ioctls and copy it so it
> can be textured from".

The KMS state can be limited to a single FB on the primary plane covering the whole CRTC, with no scaling and no property set other than FB_ID/CRTC_*/SRC_*.

Is it useful to make the previous client perform the animation? I don't really understand the use-case here.

> Userspace protocol is also useful for starting the next KMS client
> first and handing off only later once it's actually running. I'm not
> sure if that is already possible with the session switching stuff, but
> I have a feeling it might be fragile or miss pieces like the next KMS
> client signalling ready before actually switching to it.

Hm, right. I'm not 100% clear if it's possible for the next client to set everything up while the VT is not active. It would help to make logind/seatd give a non-master DRM FD when VT-switched away. Not sure they do it at the moment.
Re: [PATCH 2/8] drm/i915/xehp: CCS shares the render reset domain
On 07/09/2021 18:19, Matt Roper wrote:
> The reset domain is shared between render and all compute engines,
> so resetting one will affect the others.
>
> Note: Before performing a reset on an RCS or CCS engine, the GuC will
> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> impacting other clients (since some shared modules will be reset).
> If other engines are executing non-preemptable workloads, the impact is
> unavoidable and some work may be lost.

Since this talks about engine reset, should this patch add a warning if the same is attempted by i915 on a GuC platform, to document that it is not implemented/supported? Or perhaps doing that later in the series, or in a future series, works better.

Reviewed-by: Tvrtko Ursulin

Regards,

Tvrtko

> Bspec: 52549
> Original-patch-by: Michel Thierry
> Cc: Tvrtko Ursulin
> Cc: Vinay Belgaumkar
> Signed-off-by: Daniele Ceraolo Spurio
> Signed-off-by: Aravind Iddamsetty
> Signed-off-by: Matt Roper
> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 4
>  1 file changed, 4 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 91200c43951f..30598c1d070c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>  		[VECS1] = GEN11_GRDOM_VECS2,
>  		[VECS2] = GEN11_GRDOM_VECS3,
>  		[VECS3] = GEN11_GRDOM_VECS4,
> +		[CCS0] = GEN11_GRDOM_RENDER,
> +		[CCS1] = GEN11_GRDOM_RENDER,
> +		[CCS2] = GEN11_GRDOM_RENDER,
> +		[CCS3] = GEN11_GRDOM_RENDER,
>  	};
>  	struct intel_engine_cs *engine;
>  	intel_engine_mask_t tmp;
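To make the consequence of that table concrete, here is a small standalone sketch; it is not i915 code. The engine IDs and domain bit values are made up for illustration, and only the idea that every CCS entry maps to the render domain is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative engine IDs; the real enum intel_engine_id has more entries. */
enum engine_id { RCS0, BCS0, CCS0, CCS1, CCS2, CCS3, NUM_ENGINES };

/* Illustrative reset-domain bits, not the hardware GEN11_GRDOM_* values. */
#define GRDOM_RENDER (1u << 0)
#define GRDOM_BLT    (1u << 1)

static const uint32_t reset_domain[NUM_ENGINES] = {
	[RCS0] = GRDOM_RENDER,
	[BCS0] = GRDOM_BLT,
	[CCS0] = GRDOM_RENDER,
	[CCS1] = GRDOM_RENDER,
	[CCS2] = GRDOM_RENDER,
	[CCS3] = GRDOM_RENDER,
};

/* OR together the reset domains of every engine set in engine_mask. */
uint32_t reset_mask_for(uint32_t engine_mask)
{
	uint32_t domains = 0;

	assert((engine_mask >> NUM_ENGINES) == 0);
	for (int id = 0; id < NUM_ENGINES; id++)
		if (engine_mask & (1u << id))
			domains |= reset_domain[id];
	return domains;
}

/* Mask of engines that are hit when the given domains are reset. */
uint32_t engines_affected_by(uint32_t domains)
{
	uint32_t engines = 0;

	for (int id = 0; id < NUM_ENGINES; id++)
		if (reset_domain[id] & domains)
			engines |= 1u << id;
	return engines;
}
```

Asking for a reset of CCS2 alone yields the render domain, and resetting that domain touches RCS0 and all four CCS engines in this model, which is why the commit message talks about the other engines being preempted to idle first.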
Re: [PATCH 3/8] drm/i915/xehp: Add Compute CS IRQ handlers
On 07/09/2021 18:19, Matt Roper wrote: Add execlists and GuC interrupts for compute CS into existing IRQ handlers. All compute command streamers belong to the same compute class, so the only change needed to enable their interrupts is to program their GT engine interrupt mask registers. CCS0 shares the register with CCS1, while CCS2 and CCS3 are in a new one. BSpec: 50844, 54029, 54030, 53223, 53224. Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin Cc: Vinay Belgaumkar Signed-off-by: Daniele Ceraolo Spurio Signed-off-by: Aravind Iddamsetty Signed-off-by: Matt Roper --- drivers/gpu/drm/i915/gt/intel_gt_irq.c | 15 ++- drivers/gpu/drm/i915/i915_drv.h| 2 ++ drivers/gpu/drm/i915/i915_reg.h| 3 +++ 3 files changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c b/drivers/gpu/drm/i915/gt/intel_gt_irq.c index b2de83be4d97..612281d47513 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c @@ -96,7 +96,7 @@ gen11_gt_identity_handler(struct intel_gt *gt, const u32 identity) if (unlikely(!intr)) return; - if (class <= COPY_ENGINE_CLASS) + if (class <= COPY_ENGINE_CLASS || class == COMPUTE_CLASS) return gen11_engine_irq_handler(gt, class, instance, intr); if (class == OTHER_CLASS) @@ -178,6 +178,8 @@ void gen11_gt_irq_reset(struct intel_gt *gt) /* Disable RCS, BCS, VCS and VECS class engines. */ intel_uncore_write(uncore, GEN11_RENDER_COPY_INTR_ENABLE, 0); intel_uncore_write(uncore, GEN11_VCS_VECS_INTR_ENABLE,0); + if (CCS_MASK(gt)) + intel_uncore_write(uncore, GEN12_CCS_RSVD_INTR_ENABLE, 0); /* Restore masks irqs on RCS, BCS, VCS and VECS engines. 
*/ intel_uncore_write(uncore, GEN11_RCS0_RSVD_INTR_MASK, ~0); @@ -191,6 +193,10 @@ void gen11_gt_irq_reset(struct intel_gt *gt) intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~0); if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3)) intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~0); + if (HAS_ENGINE(gt, CCS0) || HAS_ENGINE(gt, CCS1)) + intel_uncore_write(uncore, GEN12_CCS0_CCS1_INTR_MASK, ~0); + if (HAS_ENGINE(gt, CCS2) || HAS_ENGINE(gt, CCS3)) + intel_uncore_write(uncore, GEN12_CCS2_CCS3_INTR_MASK, ~0); intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0); intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK, ~0); @@ -218,6 +224,8 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt) /* Enable RCS, BCS, VCS and VECS class interrupts. */ intel_uncore_write(uncore, GEN11_RENDER_COPY_INTR_ENABLE, dmask); intel_uncore_write(uncore, GEN11_VCS_VECS_INTR_ENABLE, dmask); + if (CCS_MASK(gt)) + intel_uncore_write(uncore, GEN12_CCS_RSVD_INTR_ENABLE, smask); /* Unmask irqs on RCS, BCS, VCS and VECS engines. */ intel_uncore_write(uncore, GEN11_RCS0_RSVD_INTR_MASK, ~smask); @@ -231,6 +239,11 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt) intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~dmask); if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3)) intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~dmask); + if (HAS_ENGINE(gt, CCS0) || HAS_ENGINE(gt, CCS1)) + intel_uncore_write(uncore, GEN12_CCS0_CCS1_INTR_MASK, ~dmask); + if (HAS_ENGINE(gt, CCS2) || HAS_ENGINE(gt, CCS3)) + intel_uncore_write(uncore, GEN12_CCS2_CCS3_INTR_MASK, ~dmask); + /* * RPS interrupts will get enabled/disabled on demand when RPS itself * is enabled/disabled. 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 1fd3040b6771..5b6eee5d8ade 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1573,6 +1573,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915, ENGINE_INSTANCES_MASK(gt, VCS0, I915_MAX_VCS) #define VEBOX_MASK(gt) \ ENGINE_INSTANCES_MASK(gt, VECS0, I915_MAX_VECS) +#define CCS_MASK(gt) \ + ENGINE_INSTANCES_MASK(gt, CCS0, I915_MAX_CCS) /* * The Gen7 cmdparser copies the scanned buffer to the ggtt for execution diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 33d6aa0b07c1..31e9c2cc4c0c 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -8139,6 +8139,7 @@ enum { #define GEN11_GPM_WGBOXPERF_INTR_ENABLE _MMIO(0x19003c) #define GEN11_CRYPTO_RSVD_INTR_ENABLE _MMIO(0x190040) #define GEN11_GUNIT_CSME_INTR_ENABLE _MMIO(0x190044) +#define GEN12_CCS_RSVD_INTR_ENABLE _MMIO(0x190048) #define GEN11_RCS0_RSVD_INTR_MASK _MMIO(0x190090) #define GEN11_BCS_RSVD_INTR_MASK _MMIO(0x1900a0) @@ -8152,6 +8153,8 @@ enum { #define GEN11_GPM_WGBOXPERF_INTR_MASK _MMIO(0x1900ec) #define GEN11_
Re: [PATCH v10 01/17] dt-bindings: arm: mediatek: mmsys: add power and gce properties
Hi Enric, Thanks for the reviews. On Wed, 2021-09-08 at 10:32 +0200, Enric Balletbo i Serra wrote: > Hi Jason, > > Thank you for your patch. One small comment below. > > On 8/9/21 8:02, jason-jh.lin wrote: > > Power: > > 1. Add description for power-domains property. > > > > GCE: > > 1. Add description for mboxes property. > > 2. Add description for mediatek,gce-client-reg property. > > > > Signed-off-by: jason-jh.lin > > --- > > .../bindings/arm/mediatek/mediatek,mmsys.yaml | 30 > > ++- > > 1 file changed, 29 insertions(+), 1 deletion(-) > > > > diff --git > > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yam > > l > > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yam > > l > > index 2d4ff0ce387b..a2e7bddfed03 100644 > > --- > > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yam > > l > > +++ > > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yam > > l > > @@ -39,6 +39,30 @@ properties: > >reg: > > maxItems: 1 > > > > + power-domains: > > +description: > > + A phandle and PM domain specifier as defined by bindings > > + of the power controller specified by phandle. See > > + Documentation/devicetree/bindings/power/power-domain.yaml > > for details. > > + > > + mboxes: > > +description: > > + Using mailbox to communicate with GCE, it should have this > > + property and list of phandle, mailbox specifiers. See > > + Documentation/devicetree/bindings/mailbox/mtk-gce.txt for > > details. > > +$ref: /schemas/types.yaml#/definitions/phandle-array > > + > > + mediatek,gce-client-reg: > > +description: > > + The register of client driver can be configured by gce with > > 4 arguments > > + defined in this property, such as phandle of gce, subsys id, > > + register offset and size. > > + Each subsys id is mapping to a base address of display > > function blocks > > + register which is defined in the gce header > > + include/dt-bindings/gce/-gce.h. 
> > +    $ref: /schemas/types.yaml#/definitions/phandle-array
> > +    maxItems: 1
> > +
> >    "#clock-cells":
> >      const: 1
> >
> > @@ -53,6 +77,10 @@ examples:
> >    - |
> >      mmsys: syscon@1400 {
> >          compatible = "mediatek,mt8173-mmsys", "syscon";
> > -        reg = <0x1400 0x1000>;
> > +        reg = <0 0x1400 0 0x1000>;
>
> Why this change?
>
> Thanks,
> Enric
>

I think the first version of this example is not correct. I've checked the first version of mt8173:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64/boot/dts/mediatek/mt8173.dtsi?id=b3a37248415716663ea2d752da4a5f765fc87442

That is because #address-cells and #size-cells of the parent node are defined as 2, e.g.

soc {
	#address-cells = <2>;
	#size-cells = <2>;
	...
};

Regards,
Jason-JH.Lin

>
> > +        power-domains = <&spm MT8173_POWER_DOMAIN_MM>;
> >          #clock-cells = <1>;
> > +        mboxes = <&gce 0 CMDQ_THR_PRIO_HIGHEST>,
> > +                 <&gce 1 CMDQ_THR_PRIO_HIGHEST>;
> > +        mediatek,gce-client-reg = <&gce SUBSYS_1400 0 0x1000>;
> >      };
> >
--
Jason-JH Lin
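As a rough illustration of why the cell counts matter, the sketch below decodes a `reg` property the way a parser would, given the parent's #address-cells and #size-cells. It is plain C working on an array of 32-bit cells that is assumed to be already in host byte order; the function names are hypothetical and this is not the kernel's OF API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Combine one or two cells into one number, most significant cell first. */
static uint64_t read_cells(const uint32_t *cells, unsigned int count)
{
	uint64_t value = 0;

	assert(count >= 1 && count <= 2);
	for (unsigned int i = 0; i < count; i++)
		value = (value << 32) | cells[i];
	return value;
}

/*
 * Decode the first (address, size) pair of a reg property.
 * Returns 0 on success, -1 if the property has too few cells.
 */
int decode_reg(const uint32_t *reg, size_t ncells,
	       unsigned int address_cells, unsigned int size_cells,
	       uint64_t *addr, uint64_t *size)
{
	if (ncells < (size_t)address_cells + size_cells)
		return -1;
	*addr = read_cells(reg, address_cells);
	*size = read_cells(reg + address_cells, size_cells);
	return 0;
}
```

With two address cells and two size cells, the four-cell form `<0 0x1400 0 0x1000>` decodes to address 0x1400 and size 0x1000, while the two-cell form `<0x1400 0x1000>` does not supply enough cells for even one pair.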
Re: [PATCH 4/8] drm/i915/xehp: CCS should use RCS setup functions
On 07/09/2021 18:19, Matt Roper wrote: The compute engine handles the same commands the render engine can (except 3D pipeline), so it makes sense that CCS is more similar to RCS than non-render engines. The CCS context state (lrc) is also similar to the render one, so reuse it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE register. In order to avoid having multiple RCS && CCS checks, add the following engine flag: - I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx. BSpec: 46260 Original-patch-by: Michel Thierry Cc: Tvrtko Ursulin Cc: Daniele Ceraolo Spurio Signed-off-by: Aravind Iddamsetty Signed-off-by: Matt Roper --- drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 8 +--- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 ++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 1 + drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++-- drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +- drivers/gpu/drm/i915/i915_perf.c | 4 ++-- drivers/gpu/drm/i915/i915_reg.h | 2 +- 8 files changed, 19 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index b32f7fed2d9c..fbe10783628b 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -883,7 +883,9 @@ static int igt_shared_ctx_exec(void *arg) return err; } -static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *vma) +static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, + struct i915_vma *vma, + struct intel_engine_cs *engine) { u32 *cmd; @@ -894,7 +896,7 @@ static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *v return PTR_ERR(cmd); *cmd++ = MI_STORE_REGISTER_MEM_GEN8; - *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE); + *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE(engine->mmio_base)); *cmd++ = 
lower_32_bits(vma->node.start); *cmd++ = upper_32_bits(vma->node.start); *cmd = MI_BATCH_BUFFER_END; @@ -955,7 +957,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj, if (err) goto err_vma; - err = rpcs_query_batch(rpcs, vma); + err = rpcs_query_batch(rpcs, vma, ce->engine); if (err) goto err_batch; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 69944bd8c19d..b346b946602d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -205,6 +205,8 @@ u32 intel_engine_context_size(struct intel_gt *gt, u8 class) BUILD_BUG_ON(I915_GTT_PAGE_SIZE != PAGE_SIZE); switch (class) { + case COMPUTE_CLASS: + fallthrough; case RENDER_CLASS: switch (GRAPHICS_VER(gt->i915)) { default: @@ -379,6 +381,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) engine->props.preempt_timeout_ms = 0; + /* features common between engines sharing EUs */ + if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) + engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE; + engine->defaults = engine->props; /* never to change again */ engine->context_size = intel_engine_context_size(gt, engine->class); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index dcb9d8b2362a..30a0c69c36c8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -454,6 +454,7 @@ struct intel_engine_cs { #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6) #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7) #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8) +#define I915_ENGINE_HAS_RCS_REG_STATE BIT(9) unsigned int flags; /* diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c index de5f9c86b9a4..4c600c46414d 100644 --- 
a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c @@ -3406,7 +3406,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) logical_ring_default_vfuncs(engine); logical_ring_default_irqs(engine); - if (engine->class == RENDER_CLASS) + if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) rcs_submission_override(engine); Hm, what do pipe control flushes which relate to 3d pipeline end up doing on CCS engines? Regards, Tvrtko
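The intent of the new flag in the patch can be shown with a reduced sketch: decide once, at engine setup, whether an engine uses the render-style register state, and test the flag everywhere else instead of repeating class comparisons. The class numbers and the bit position follow the defines quoted in this series; the struct and helpers are simplified stand-ins, not the driver's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RENDER_CLASS      0
#define COPY_ENGINE_CLASS 3
#define COMPUTE_CLASS     5
#define MAX_ENGINE_CLASS  5

#define ENGINE_HAS_RCS_REG_STATE (1u << 9)

/* Simplified stand-in for struct intel_engine_cs. */
struct engine {
	uint8_t engine_class;
	unsigned int flags;
};

/* Set the flag once for the classes that share the render register state. */
void engine_setup(struct engine *e, uint8_t engine_class)
{
	assert(engine_class <= MAX_ENGINE_CLASS);
	e->engine_class = engine_class;
	e->flags = 0;
	if (engine_class == RENDER_CLASS || engine_class == COMPUTE_CLASS)
		e->flags |= ENGINE_HAS_RCS_REG_STATE;
}

/* Callers test the flag rather than comparing against each class. */
bool engine_uses_rcs_state(const struct engine *e)
{
	return (e->flags & ENGINE_HAS_RCS_REG_STATE) != 0;
}
```

The benefit is that adding another class with render-like state later only touches the setup function, not every call site.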
[PATCH v2 (repost)] fbmem: don't allow too huge resolutions
syzbot is reporting page fault at vga16fb_fillrect() [1], for
vga16fb_check_var() is failing to detect multiplication overflow.

  if (vxres * vyres > maxmem) {
    vyres = maxmem / vxres;
    if (vyres < yres)
      return -ENOMEM;
  }

Since no module would accept too huge resolutions where multiplication
overflow happens, let's reject in the common path.

Link: https://syzkaller.appspot.com/bug?extid=04168c8063cfdde1db5e [1]
Reported-by: syzbot
Debugged-by: Randy Dunlap
Signed-off-by: Tetsuo Handa
Reviewed-by: Geert Uytterhoeven
---
Changes in v2:
  Use check_mul_overflow(), suggested by Geert Uytterhoeven.

 drivers/video/fbdev/core/fbmem.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 71fb710f1ce3..7420d2c16e47 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -962,6 +962,7 @@ fb_set_var(struct fb_info *info, struct fb_var_screeninfo *var)
 	struct fb_var_screeninfo old_var;
 	struct fb_videomode mode;
 	struct fb_event event;
+	u32 unused;

 	if (var->activate & FB_ACTIVATE_INV_MODE) {
 		struct fb_videomode mode1, mode2;
@@ -1008,6 +1009,11 @@ fb_set_var(struct fb_info *info, struct fb_var_screeninfo *var)
 	if (var->xres < 8 || var->yres < 8)
 		return -EINVAL;

+	/* Too huge resolution causes multiplication overflow. */
+	if (check_mul_overflow(var->xres, var->yres, &unused) ||
+	    check_mul_overflow(var->xres_virtual, var->yres_virtual, &unused))
+		return -EINVAL;
+
 	ret = info->fbops->fb_check_var(var, info);

 	if (ret)
--
2.18.4
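The failure mode is easy to reproduce outside the kernel. The sketch below is userspace C and assumes GCC or Clang for `__builtin_mul_overflow`, used here as a stand-in for the kernel's check_mul_overflow(); the function names and the values in the note are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The wrap-around below relies on 32-bit unsigned arithmetic. */
static_assert(sizeof(unsigned int) == 4, "example assumes 32-bit unsigned int");

/* The vga16fb-style check: the 32-bit product can wrap and look small. */
bool naive_fits(uint32_t vxres, uint32_t vyres, uint32_t maxmem)
{
	return vxres * vyres <= maxmem;
}

/* Reject the request outright when the product does not fit in 32 bits. */
bool checked_fits(uint32_t vxres, uint32_t vyres, uint32_t maxmem)
{
	uint32_t product;

	if (__builtin_mul_overflow(vxres, vyres, &product))
		return false;
	return product <= maxmem;
}
```

For a virtual resolution of 0x10000 by 0x10000 the 32-bit product wraps to 0, so the naive check accepts it for any memory size, while the checked variant rejects it.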
Re: [PATCH] kernel/locking: Add context to ww_mutex_trylock.
On Tue, Sep 07, 2021 at 03:20:44PM +0200, Maarten Lankhorst wrote: > i915 will soon gain an eviction path that trylock a whole lot of locks > for eviction, getting dmesg failures like below: > > BUG: MAX_LOCK_DEPTH too low! > turning off the locking correctness validator. > depth: 48 max: 48! > 48 locks held by i915_selftest/5776: > #0: 888101a79240 (&dev->mutex){}-{3:3}, at: > __driver_attach+0x88/0x160 > #1: c99778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: > i915_vma_pin.constprop.63+0x39/0x1b0 [i915] > #2: 88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: > i915_vma_pin.constprop.63+0x5f/0x1b0 [i915] > #3: 88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: > i915_vma_pin_ww+0x1c4/0x9d0 [i915] > #4: 88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: > i915_gem_evict_something+0x110/0x860 [i915] > #5: 88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: > i915_gem_evict_something+0x110/0x860 [i915] > ... > #46: 88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: > i915_gem_evict_something+0x110/0x860 [i915] > #47: 88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: > i915_gem_evict_something+0x110/0x860 [i915] > INFO: lockdep is turned off. > As an intermediate solution, add an acquire context to ww_mutex_trylock, > which allows us to do proper nesting annotations on the trylocks, making > the above lockdep splat disappear. Fair enough I suppose. > +/** > + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire > context > + * @lock: mutex to lock > + * @ctx: optional w/w acquire context > + * > + * Trylocks a mutex with the optional acquire context; no deadlock detection > is > + * possible. Returns 1 if the mutex has been acquired successfully, 0 > otherwise. > + * > + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a > @ctx is > + * specified, -EALREADY and -EDEADLK handling may happen in calls to > ww_mutex_lock. 
> + * > + * A mutex acquired with this function must be released with ww_mutex_unlock. > + */ > +int __sched > +ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ctx) > +{ > + bool locked; > + > + if (!ctx) > + return mutex_trylock(&ww->base); > + > +#ifdef CONFIG_DEBUG_MUTEXES > + DEBUG_LOCKS_WARN_ON(ww->base.magic != &ww->base); > +#endif > + > + preempt_disable(); > + locked = __mutex_trylock(&ww->base); > + > + if (locked) { > + ww_mutex_set_context_fastpath(ww, ctx); > + mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ctx->dep_map, > _RET_IP_); > + } > + preempt_enable(); > + > + return locked; > +} > +EXPORT_SYMBOL(ww_mutex_trylock); You'll need a similar hunk in ww_rt_mutex.c
Re: [PATCH v2 5/6] drm/i915: Don't back up pinned LMEM context images and rings during suspend
On 06/09/2021 17:55, Thomas Hellström wrote: Pinned context images are now reset during resume. Don't back them up, and assuming that rings can be assumed empty at suspend, don't back them up either. Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that an object is allowed to lose its content on suspend. Signed-off-by: Thomas Hellström --- .../gpu/drm/i915/gem/i915_gem_object_types.h| 17 ++--- drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c | 3 +++ drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++- drivers/gpu/drm/i915/gt/intel_ring.c| 3 ++- 4 files changed, 17 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 734cc8e16481..66123ba46247 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -288,16 +288,19 @@ struct drm_i915_gem_object { I915_SELFTEST_DECLARE(struct list_head st_link); unsigned long flags; -#define I915_BO_ALLOC_CONTIGUOUS BIT(0) -#define I915_BO_ALLOC_VOLATILE BIT(1) -#define I915_BO_ALLOC_CPU_CLEAR BIT(2) -#define I915_BO_ALLOC_USER BIT(3) +#define I915_BO_ALLOC_CONTIGUOUS BIT(0) +#define I915_BO_ALLOC_VOLATILEBIT(1) +#define I915_BO_ALLOC_CPU_CLEAR BIT(2) +#define I915_BO_ALLOC_USERBIT(3) +/* Object may lose its contents on suspend / resume */ + if we can't evict it? +#define I915_BO_ALLOC_PM_VOLATILE BIT(4) #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \ I915_BO_ALLOC_VOLATILE | \ I915_BO_ALLOC_CPU_CLEAR | \ -I915_BO_ALLOC_USER) -#define I915_BO_READONLY BIT(4) -#define I915_TILING_QUIRK_BIT5 /* unknown swizzling; do not release! */ +I915_BO_ALLOC_USER | \ +I915_BO_ALLOC_PM_VOLATILE) +#define I915_BO_READONLY BIT(5) +#define I915_TILING_QUIRK_BIT 6 /* unknown swizzling; do not release! 
*/ /** * @mem_flags - Mutable placement-related flags diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c index 3884bf45dab8..eaceecfc3f19 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c @@ -61,6 +61,9 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region *apply, if (!pm_apply->backup_pinned) return 0; + if (obj->flags & I915_BO_ALLOC_PM_VOLATILE) + return 0; + sys_region = i915->mm.regions[INTEL_REGION_SMEM]; backup = i915_gem_object_create_region(sys_region, obj->base.size, diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 6ba8daea2f56..3ef9eaf8c50e 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -942,7 +942,8 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine) context_size += PAGE_SIZE; } - obj = i915_gem_object_create_lmem(engine->i915, context_size, 0); + obj = i915_gem_object_create_lmem(engine->i915, context_size, + I915_BO_ALLOC_PM_VOLATILE); if (IS_ERR(obj)) obj = i915_gem_object_create_shmem(engine->i915, context_size); if (IS_ERR(obj)) diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index 7c4d5158e03b..2fdd52b62092 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -112,7 +112,8 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size) struct drm_i915_gem_object *obj; struct i915_vma *vma; - obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE); + obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE | + I915_BO_ALLOC_PM_VOLATILE); if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt)) obj = i915_gem_object_create_stolen(i915, size); if (IS_ERR(obj))
Re: [PATCH v2 5/6] drm/i915: Don't back up pinned LMEM context images and rings during suspend
On 06/09/2021 17:55, Thomas Hellström wrote: Pinned context images are now reset during resume. Don't back them up, and assuming that rings can be assumed empty at suspend, don't back them up either. Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that an object is allowed to lose its content on suspend. Signed-off-by: Thomas Hellström --- .../gpu/drm/i915/gem/i915_gem_object_types.h| 17 ++--- drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c | 3 +++ drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++- drivers/gpu/drm/i915/gt/intel_ring.c| 3 ++- 4 files changed, 17 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 734cc8e16481..66123ba46247 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -288,16 +288,19 @@ struct drm_i915_gem_object { I915_SELFTEST_DECLARE(struct list_head st_link); unsigned long flags; -#define I915_BO_ALLOC_CONTIGUOUS BIT(0) -#define I915_BO_ALLOC_VOLATILE BIT(1) -#define I915_BO_ALLOC_CPU_CLEAR BIT(2) -#define I915_BO_ALLOC_USER BIT(3) +#define I915_BO_ALLOC_CONTIGUOUS BIT(0) +#define I915_BO_ALLOC_VOLATILEBIT(1) +#define I915_BO_ALLOC_CPU_CLEAR BIT(2) +#define I915_BO_ALLOC_USERBIT(3) +/* Object may lose its contents on suspend / resume */ +#define I915_BO_ALLOC_PM_VOLATILE BIT(4) PM_SKIP_PINNED? Not sure if that is better. #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \ I915_BO_ALLOC_VOLATILE | \ I915_BO_ALLOC_CPU_CLEAR | \ -I915_BO_ALLOC_USER) -#define I915_BO_READONLY BIT(4) -#define I915_TILING_QUIRK_BIT5 /* unknown swizzling; do not release! */ +I915_BO_ALLOC_USER | \ +I915_BO_ALLOC_PM_VOLATILE) +#define I915_BO_READONLY BIT(5) +#define I915_TILING_QUIRK_BIT 6 /* unknown swizzling; do not release! 
*/ /** * @mem_flags - Mutable placement-related flags diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c index 3884bf45dab8..eaceecfc3f19 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c @@ -61,6 +61,9 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region *apply, if (!pm_apply->backup_pinned) return 0; + if (obj->flags & I915_BO_ALLOC_PM_VOLATILE) + return 0; + sys_region = i915->mm.regions[INTEL_REGION_SMEM]; backup = i915_gem_object_create_region(sys_region, obj->base.size, diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 6ba8daea2f56..3ef9eaf8c50e 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -942,7 +942,8 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine) context_size += PAGE_SIZE; } - obj = i915_gem_object_create_lmem(engine->i915, context_size, 0); + obj = i915_gem_object_create_lmem(engine->i915, context_size, + I915_BO_ALLOC_PM_VOLATILE); if (IS_ERR(obj)) obj = i915_gem_object_create_shmem(engine->i915, context_size); if (IS_ERR(obj)) diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c index 7c4d5158e03b..2fdd52b62092 100644 --- a/drivers/gpu/drm/i915/gt/intel_ring.c +++ b/drivers/gpu/drm/i915/gt/intel_ring.c @@ -112,7 +112,8 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size) struct drm_i915_gem_object *obj; struct i915_vma *vma; - obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE); + obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE | + I915_BO_ALLOC_PM_VOLATILE); if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt)) obj = i915_gem_object_create_stolen(i915, size); if (IS_ERR(obj))
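Two details of the flags hunk are worth spelling out: the new bit has to be part of the I915_BO_ALLOC_FLAGS mask, and the non-allocation flags that follow move up by one so they do not collide with it. The following is a reduced, standalone sketch with bit positions copied from the patch, names shortened, and the backup decision reduced to the single new check; it assumes the object is otherwise a candidate for backup.

```c
#include <assert.h>
#include <stdbool.h>

#define BIT(n) (1ul << (n))

#define BO_ALLOC_CONTIGUOUS  BIT(0)
#define BO_ALLOC_VOLATILE    BIT(1)
#define BO_ALLOC_CPU_CLEAR   BIT(2)
#define BO_ALLOC_USER        BIT(3)
/* Object may lose its contents on suspend / resume */
#define BO_ALLOC_PM_VOLATILE BIT(4)
#define BO_ALLOC_FLAGS (BO_ALLOC_CONTIGUOUS | \
			BO_ALLOC_VOLATILE | \
			BO_ALLOC_CPU_CLEAR | \
			BO_ALLOC_USER | \
			BO_ALLOC_PM_VOLATILE)
#define BO_READONLY          BIT(5)

/* Catch the renumbering mistake at compile time. */
static_assert((BO_ALLOC_FLAGS & BO_READONLY) == 0,
	      "allocation flags must not overlap BO_READONLY");

/* Simplified: back up everything except objects marked PM-volatile. */
bool should_backup(unsigned long flags)
{
	return !(flags & BO_ALLOC_PM_VOLATILE);
}
```

In this model a ring allocated with both VOLATILE and PM_VOLATILE is skipped at suspend, while an ordinary user object is still backed up.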
Re: [PATCH] drm/bridge: ti-sn65dsi83: Check link status register after enabling the bridge
Hi Marek and Andrzej On Tue, 7 Sept 2021 at 22:24, Marek Vasut wrote: > > On 9/7/21 7:29 PM, Andrzej Hajda wrote: > > > > On 07.09.2021 at 16:25, Marek Vasut wrote: > >> On 9/7/21 9:31 AM, Andrzej Hajda wrote: > >>> On 07.09.2021 04:39, Marek Vasut wrote: > In rare cases, the bridge may not start up correctly, which usually > leads to no display output. In case this happens, warn about it in > the kernel log. > > Signed-off-by: Marek Vasut > Cc: Jagan Teki > Cc: Laurent Pinchart > Cc: Linus Walleij > Cc: Robert Foss > Cc: Sam Ravnborg > Cc: dri-devel@lists.freedesktop.org > --- > NOTE: See the following: > https://e2e.ti.com/support/interface-group/interface/f/interface-forum/942005/sn65dsi83-dsi83-lvds-bridge---sporadic-behavior---no-video > > https://community.nxp.com/t5/i-MX-Processors/i-MX8M-MIPI-DSI-Interface-LVDS-Bridge-Initialization/td-p/1156533 > > --- > drivers/gpu/drm/bridge/ti-sn65dsi83.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi83.c > b/drivers/gpu/drm/bridge/ti-sn65dsi83.c > index a32f70bc68ea4..4ea71d7f0bfbc 100644 > --- a/drivers/gpu/drm/bridge/ti-sn65dsi83.c > +++ b/drivers/gpu/drm/bridge/ti-sn65dsi83.c > @@ -520,6 +520,11 @@ static void sn65dsi83_atomic_enable(struct > drm_bridge *bridge, > /* Clear all errors that got asserted during initialization. */ > regmap_read(ctx->regmap, REG_IRQ_STAT, &pval); > regmap_write(ctx->regmap, REG_IRQ_STAT, pval); > >>> > >>> > >>> It does not look like correct error handling, maybe it would be good to > >>> analyze and optionally report 'unexpected' errors here as well. > >> > >> The above is correct -- it clears the status register because the > >> setup might've set random bits in that register. Then we wait a bit, > >> let the link run, and read them again to get the real link status in > >> this new piece of code below, hence the usleep_range there. And then > >> if the link indicates a problem, we know it is a problem.
> > > > > > Usually such registers are cleared at the very beginning of the > > initialization, and tested (via irq handler, or via reading), during > > initialization, if initialization phase goes well. If it is not the case > > forgive me. > > The init just flips the bit at random in the IRQ_STAT register, so no, > that's not really viable here. That's why we clear them at the end, and > then wait a bit, and then check whether something new appeared in them. > > If not, all is great. > > Sure, we could generate an IRQ, but then IRQ line is not always > connected to this chip on all hardware I have available. So this gives > the user at least some indication that something is wrong with their HW. > > + > +usleep_range(1, 12000); > +regmap_read(ctx->regmap, REG_IRQ_STAT, &pval); > +if (pval) > +dev_err(ctx->dev, "Unexpected link status 0x%02x\n", pval); > >>> > >>> > >>> I am not sure what is the case here but it looks like 'we do not know > >>> what is going on, so let's add some diagnostic messages to gather info > >>> and figure it out later'. > >> > >> That's pretty much the case, see the two links above in the NOTE > >> section. If something goes wrong, we print the value for the user > >> (usually developer) so they can fix their problems. We cannot do much > >> better in the attach callback. > >> > >> The issue I ran into (and where this would be helpful information to > >> me during debugging, since the issue happened real seldom, see also > >> the NOTE links above) is that the DSI controller driver started > >> streaming video on the data lanes before the DSI83 had a chance to > >> initialize. This worked most of the time, except for a few exceptions > >> here and there, where the video didn't start. This does set link > >> status bits consistently. In the meantime, I fixed the controller > >> driver (so far downstream, due to ongoing discussion). > > > > > > Maybe drm_connector_set_link_status_property(conn, > > DRM_MODE_LINK_STATUS_BAD) would be useful here.
> > Hmm, this works on connector, the dsi83 is a bridge and it can be stuck > between two other bridges. That doesn't seem like the right tool, no ? > > >>> Whole driver lacks IRQ handler which IMO could perform better diagnosis, > >>> and I guess it could also help in recovery, but this is just my guess. > >>> So if this patch is enough for now you can add: > >> > >> No, IRQ won't help you here, because by the time you get the IRQ, the > >> DSI host already started streaming video on data lanes and you won't > >> be able to correctly reinit the DSI83 unless you communicate to the > >> DSI host that it should switch the data lanes back to LP11. > >> > >> And for that, there is a bigger chunk missing really. What needs to be > >> added is a way for the DSI bridge / panel to commun
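The clear, wait, then re-read sequence Marek describes can be modelled in plain C as follows. This is a rough sketch, not the driver code: struct mock_bridge and its helpers are hypothetical, the status register is assumed to behave as write-1-to-clear (which is what writing the read value back suggests), and the delay is replaced by a mock step that latches whatever error the link would raise in the meantime.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model of the bridge: one write-1-to-clear status register. */
struct mock_bridge {
	uint8_t irq_stat;           /* bits set during init are noise */
	uint8_t fault_after_settle; /* bits a broken link would raise later */
};

static uint8_t mock_read_stat(const struct mock_bridge *b)
{
	return b->irq_stat;
}

/* Writing a 1 to a bit clears it (assumed W1C semantics). */
static void mock_write_stat(struct mock_bridge *b, uint8_t val)
{
	b->irq_stat &= (uint8_t)~val;
}

/* Stands in for usleep_range(): the link runs and may latch new errors. */
static void mock_settle(struct mock_bridge *b)
{
	b->irq_stat |= b->fault_after_settle;
}

/* Returns the status seen after the settle period; 0 means the link is fine. */
static uint8_t check_link_after_enable(struct mock_bridge *b)
{
	uint8_t pval;

	/* Clear everything that got asserted during initialization. */
	pval = mock_read_stat(b);
	mock_write_stat(b, pval);

	mock_settle(b);

	/* Anything set now was raised by the running link, not by init. */
	pval = mock_read_stat(b);
	if (pval)
		fprintf(stderr, "Unexpected link status 0x%02x\n", pval);

	return pval;
}
```

The point of the ordering is that bits present before the clear carry no information, so only bits that reappear after the wait are reported.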
Re: Handling DRM master transitions cooperatively
On Wed, 08 Sep 2021 09:51:54 + Simon Ser wrote: > > On Tue, 07 Sep 2021 10:19:03 + > > Simon Ser wrote: > > > > > FWIW, I've just hit a case where a compositor leaves a "rotation" KMS > > > prop set behind, then Xorg tries to startup and fails because it doesn't > > > reset this prop. So none of this is theoretical. > > > > > > I still think a "reset all KMS props to an arbitrary default value" flag > > > in drmModeAtomicCommit is the best way forward. I'm not sure a user-space > > > protocol would help too much. > > > > Hi Simon, > > > > for the "reset KMS state" problem, sure. Thanks for confirming the > > problem, too. > > > > The hand-off problem does need userspace protocol though, so that the > > two parties can negotiate what part of KMS state can be inherited by > > the receiver and who will do the animation from the first to the second > > state in case you want to avoid abrupt changes. It would also be useful > > for a cross-fade as a perhaps more flexible way than the current "leak > > an FB, let the next KMS client scrape it via ioctls and copy it so it > > can be textured from". > > The KMS state can be limited to single FB on primary plane covering the whole > CRTC, no scaling, no other property set than FB_ID/CRTC_*/SRC_*. > > Is it useful to make the previous client perform the animation? I don't really > understand the use-case here. I guess the use cases are more or less imaginary for now. Imagine one HDR-capable display server handing off to another HDR-capable display server. If the releasing display server does not know the receiving display server understands HDR, the releasing display server might run an animation to turn HDR off - fade to black, for instance, so that the impact from changing from HDR to SDR is minimized. Then the receiving display server sees KMS is in SDR mode, and maybe sets up a black image and then switches back to HDR. If you're happy with fade-to-black on switch, then no problem. 
However, the only way to avoid fade-to-black, or even to do a cross-fade, is some negotiation to see that both sides understand HDR. If the previous FB was rendered for HDR display, you will need to know a lot about it if you want to do a cross-fade that doesn't glitch.

Also, while I don't see why changing between SDR and HDR would require a modeset in KMS, I suppose it might take a moment for the monitor to adapt. It might cause glitches similar to changing video modes.

> > Userspace protocol is also useful for starting the next KMS client
> > first and handing off only later once it's actually running. I'm not
> > sure if that is already possible with the session switching stuff, but
> > I have a feeling it might be fragile or miss pieces like the next KMS
> > client signalling ready before actually switching to it.
>
> Hm, right. I'm not 100% clear if it's possible for the next client to set
> everything up while the VT is not active.
>
> It would help to make logind/seatd give a non-master DRM FD when VT-switched
> away. Not sure they do it atm.

Oh yeah, that may be an obvious gap I missed.

Thanks,
pq
[drm:i915-uncore-vfunc 30/31] drivers/gpu/drm/i915/selftests/mock_uncore.c:47:2: error: implicit declaration of function 'ASSIGN_RAW_WRITE_MMIO_VFUNCS'
tree:   git://people.freedesktop.org/~airlied/linux.git i915-uncore-vfunc
head:   b42168f90718a90b11f2d52306d9aeaa9468
commit: 99aebd17891290abfca80c48eca01f4e02413fb3 [30/31] drm/i915/uncore: constify the register vtables.
config: i386-randconfig-a014-20210908 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 9c476172b93367d2cb88d7d3f4b1b5b456fa6020)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git remote add drm git://people.freedesktop.org/~airlied/linux.git
        git fetch --no-tags drm i915-uncore-vfunc
        git checkout 99aebd17891290abfca80c48eca01f4e02413fb3
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/intel_uncore.c:2630:
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:47:2: error: implicit declaration of function 'ASSIGN_RAW_WRITE_MMIO_VFUNCS' [-Werror,-Wimplicit-function-declaration]
           ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
           ^
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:47:39: error: use of undeclared identifier 'nop'; did you mean 'nopv'?
           ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
                                                ^~~
                                                nopv
   arch/x86/include/asm/hypervisor.h:69:13: note: 'nopv' declared here
   extern bool nopv;
               ^
   In file included from drivers/gpu/drm/i915/intel_uncore.c:2630:
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:48:2: error: implicit declaration of function 'ASSIGN_RAW_READ_MMIO_VFUNCS' [-Werror,-Wimplicit-function-declaration]
           ASSIGN_RAW_READ_MMIO_VFUNCS(uncore, nop);
           ^
   drivers/gpu/drm/i915/selftests/mock_uncore.c:48:2: note: did you mean 'ASSIGN_RAW_WRITE_MMIO_VFUNCS'?
   drivers/gpu/drm/i915/selftests/mock_uncore.c:47:2: note: 'ASSIGN_RAW_WRITE_MMIO_VFUNCS' declared here
           ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
           ^
   drivers/gpu/drm/i915/selftests/mock_uncore.c:48:38: error: use of undeclared identifier 'nop'; did you mean 'nopv'?
           ASSIGN_RAW_READ_MMIO_VFUNCS(uncore, nop);
                                               ^~~
                                               nopv
   arch/x86/include/asm/hypervisor.h:69:13: note: 'nopv' declared here
   extern bool nopv;
               ^
   4 errors generated.

vim +/ASSIGN_RAW_WRITE_MMIO_VFUNCS +47 drivers/gpu/drm/i915/selftests/mock_uncore.c

0757ac8fc7c1da Chris Wilson           2017-04-12  41
d14a701b007063 Chris Wilson           2019-10-08  42  void mock_uncore_init(struct intel_uncore *uncore,
d14a701b007063 Chris Wilson           2019-10-08  43  		      struct drm_i915_private *i915)
0757ac8fc7c1da Chris Wilson           2017-04-12  44  {
d14a701b007063 Chris Wilson           2019-10-08  45  	intel_uncore_init_early(uncore, i915);
d14a701b007063 Chris Wilson           2019-10-08  46
ccb2aceaaa5f92 Daniele Ceraolo Spurio 2019-06-19 @47  	ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
ccb2aceaaa5f92 Daniele Ceraolo Spurio 2019-06-19 @48  	ASSIGN_RAW_READ_MMIO_VFUNCS(uncore, nop);

:: The code at line 47 was first introduced by commit
:: ccb2aceaaa5f9267ef7b485b41ae9be3f04b50d3 drm/i915: use vfuncs for reg_read/write_fw_domains

:: TO: Daniele Ceraolo Spurio
:: CC: Tvrtko Ursulin

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
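A note on why each offending line yields two diagnostics: when a function-like macro is not defined at the point of use, the compiler parses the line as an ordinary function call, so it first complains about the implicit declaration and then tries to evaluate 'nop' as an expression, which fails because no such identifier exists. With a macro in place, 'nop' is never looked up as a name; it is only a prefix pasted onto a function name with ##. The sketch below shows that mechanism with simplified, made-up definitions (struct mock_uncore, nop_write32, nop_read32, and the macro bodies are assumptions for illustration, not the i915 code).

```c
#include <stdint.h>

/* Hypothetical, cut-down vtable; the real one has more entries. */
struct mock_uncore_funcs {
	void (*mmio_writel)(uint32_t reg, uint32_t val);
	uint32_t (*mmio_readl)(uint32_t reg);
};

struct mock_uncore {
	struct mock_uncore_funcs funcs;
};

static uint32_t nop_calls;

static void nop_write32(uint32_t reg, uint32_t val)
{
	(void)reg;
	(void)val;
	nop_calls++;
}

static uint32_t nop_read32(uint32_t reg)
{
	(void)reg;
	nop_calls++;
	return 0;
}

/* 'x' is only a token prefix: nop + _write32 becomes nop_write32. */
#define ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, x) \
	do { (uncore)->funcs.mmio_writel = x##_write32; } while (0)

#define ASSIGN_RAW_READ_MMIO_VFUNCS(uncore, x) \
	do { (uncore)->funcs.mmio_readl = x##_read32; } while (0)

static void mock_uncore_init(struct mock_uncore *uncore)
{
	ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
	ASSIGN_RAW_READ_MMIO_VFUNCS(uncore, nop);
}
```

If the commit under test dropped or renamed these macros without updating the selftest that is included into intel_uncore.c, that would produce exactly this pattern of errors; the report itself does not confirm the cause.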
Re: [PATCH v2 1/3] dt-bindings: msm: dsi: Add MSM8953 dsi phy
On Fri, Sep 03, 2021 at 10:38:42PM +0530, Sireesh Kodali wrote:
> SoCs based on the MSM8953 platform use the 14nm DSI PHY driver
>
> Signed-off-by: Sireesh Kodali
> ---
>  Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml b/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml
> index 72a00cce0147..d2cb19cf71d6 100644
> --- a/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml
> +++ b/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml
> @@ -17,6 +17,8 @@ properties:
>      oneOf:
>        - const: qcom,dsi-phy-14nm
>        - const: qcom,dsi-phy-14nm-660
> +      - const: qcom,dsi-phy-14nm-8953
> +

This is going to conflict with v5.15-rc1, so you'll need to resend it.

>
>    reg:
>      items:
> --
> 2.33.0
[drm:i915-uncore-vfunc 31/31] make[4]: *** No rule to make target 'drivers/gpu/drm/i915/display/intel_display_trace_points.o', needed by 'drivers/gpu/drm/i915/i915.o'.
Hi Dave,

First bad commit (maybe != root cause):

tree:   git://people.freedesktop.org/~airlied/linux.git i915-uncore-vfunc
head:   b42168f90718a90b11f2d52306d9aeaa9468
commit: b42168f90718a90b11f2d52306d9aeaa9468 [31/31] RFC: drm/i915: start splitting trace points
config: x86_64-randconfig-a006-20210908 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        git remote add drm git://people.freedesktop.org/~airlied/linux.git
        git fetch --no-tags drm i915-uncore-vfunc
        git checkout b42168f90718a90b11f2d52306d9aeaa9468
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/i915_irq.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/intel_uncore.o] Error 1
>> make[4]: *** No rule to make target 'drivers/gpu/drm/i915/display/intel_display_trace_points.o', needed by 'drivers/gpu/drm/i915/i915.o'.
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_atomic_plane.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_crtc.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_fifo_underrun.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_frontbuffer.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_fbc.o] Error 1
   make[4]: Target '__build' not remade because of errors.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
Re: [PATCH v2 5/6] drm/i915: Don't back up pinned LMEM context images and rings during suspend
On Wed, 2021-09-08 at 12:07 +0100, Matthew Auld wrote: > On 06/09/2021 17:55, Thomas Hellström wrote: > > Pinned context images are now reset during resume. Don't back them > > up, > > and assuming that rings can be assumed empty at suspend, don't back > > them > > up either. > > > > Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that > > an > > object is allowed to lose its content on suspend. > > > > Signed-off-by: Thomas Hellström > > --- > > .../gpu/drm/i915/gem/i915_gem_object_types.h | 17 ++-- > > - > > drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c | 3 +++ > > drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++- > > drivers/gpu/drm/i915/gt/intel_ring.c | 3 ++- > > 4 files changed, 17 insertions(+), 9 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > > b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > > index 734cc8e16481..66123ba46247 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h > > @@ -288,16 +288,19 @@ struct drm_i915_gem_object { > > I915_SELFTEST_DECLARE(struct list_head st_link); > > > > unsigned long flags; > > -#define I915_BO_ALLOC_CONTIGUOUS BIT(0) > > -#define I915_BO_ALLOC_VOLATILE BIT(1) > > -#define I915_BO_ALLOC_CPU_CLEAR BIT(2) > > -#define I915_BO_ALLOC_USER BIT(3) > > +#define I915_BO_ALLOC_CONTIGUOUS BIT(0) > > +#define I915_BO_ALLOC_VOLATILE BIT(1) > > +#define I915_BO_ALLOC_CPU_CLEAR BIT(2) > > +#define I915_BO_ALLOC_USER BIT(3) > > +/* Object may lose its contents on suspend / resume */ > > +#define I915_BO_ALLOC_PM_VOLATILE BIT(4) > > PM_SKIP_PINNED? Not sure if that is better. I think we could update the comment to say "object is allowed to lose..", I think we could keep PM_VOLATILE to keep it consistent with the ALLOC_VOLATILE flag? /Thomas
Re: [PATCH v2 5/6] drm/i915: Don't back up pinned LMEM context images and rings during suspend
On 08/09/2021 13:26, Thomas Hellström wrote:
> On Wed, 2021-09-08 at 12:07 +0100, Matthew Auld wrote:
> > On 06/09/2021 17:55, Thomas Hellström wrote:
> > > Pinned context images are now reset during resume. Don't back them up,
> > > and assuming that rings can be assumed empty at suspend, don't back them
> > > up either.
> > >
> > > Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that an
> > > object is allowed to lose its content on suspend.
> > >
> > > Signed-off-by: Thomas Hellström
> > > ---
> > >  .../gpu/drm/i915/gem/i915_gem_object_types.h | 17 ++---
> > >  drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c   |  3 +++
> > >  drivers/gpu/drm/i915/gt/intel_lrc.c          |  3 ++-
> > >  drivers/gpu/drm/i915/gt/intel_ring.c         |  3 ++-
> > >  4 files changed, 17 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > index 734cc8e16481..66123ba46247 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > > @@ -288,16 +288,19 @@ struct drm_i915_gem_object {
> > >  	I915_SELFTEST_DECLARE(struct list_head st_link);
> > >
> > >  	unsigned long flags;
> > > -#define I915_BO_ALLOC_CONTIGUOUS BIT(0)
> > > -#define I915_BO_ALLOC_VOLATILE   BIT(1)
> > > -#define I915_BO_ALLOC_CPU_CLEAR  BIT(2)
> > > -#define I915_BO_ALLOC_USER       BIT(3)
> > > +#define I915_BO_ALLOC_CONTIGUOUS  BIT(0)
> > > +#define I915_BO_ALLOC_VOLATILE    BIT(1)
> > > +#define I915_BO_ALLOC_CPU_CLEAR   BIT(2)
> > > +#define I915_BO_ALLOC_USER        BIT(3)
> > > +/* Object may lose its contents on suspend / resume */
> > > +#define I915_BO_ALLOC_PM_VOLATILE BIT(4)
> >
> > PM_SKIP_PINNED? Not sure if that is better.
>
> I think we could update the comment to say "object is allowed to
> lose..", I think we could keep PM_VOLATILE to keep it consistent with
> the ALLOC_VOLATILE flag?

I guess that's the potentially confusing bit. ALLOC_VOLATILE means the pages might be discarded as soon as they become unpinned, without needing to worry about persisting their contents. With PM_VOLATILE I was expecting something similar, where unpinned objects can simply be skipped or ignored during PM.

Anyway, that's just a bikeshed; I think with an improved comment this should be fine.

> /Thomas
Enabling TTM kerneldoc
Last round for this set I think, already got RBs for most patches. Only patch #2 is currently missing anything. Please point out anything which can be quickly improved and keep in mind that it's better to have this enabled with some typos than not enabled at all. Cheers, Christian.
[PATCH 1/8] drm/ttm: remove the outdated kerneldoc section
Clean up to start over with new and more accurate documentation. Signed-off-by: Christian König Reviewed-by: Matthew Auld --- Documentation/gpu/drm-mm.rst | 49 1 file changed, 49 deletions(-) diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index 0198fa43d254..8ca981065e1a 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -30,55 +30,6 @@ The Translation Table Manager (TTM) TTM design background and information belongs here. -TTM initialization --- - -**Warning** -This section is outdated. - -Drivers wishing to support TTM must pass a filled :c:type:`ttm_bo_driver -` structure to ttm_bo_device_init, together with an -initialized global reference to the memory manager. The ttm_bo_driver -structure contains several fields with function pointers for -initializing the TTM, allocating and freeing memory, waiting for command -completion and fence synchronization, and memory migration. - -The :c:type:`struct drm_global_reference ` is made -up of several fields: - -.. code-block:: c - - struct drm_global_reference { - enum ttm_global_types global_type; - size_t size; - void *object; - int (*init) (struct drm_global_reference *); - void (*release) (struct drm_global_reference *); - }; - - -There should be one global reference structure for your memory manager -as a whole, and there will be others for each object created by the -memory manager at runtime. Your global TTM should have a type of -TTM_GLOBAL_TTM_MEM. The size field for the global object should be -sizeof(struct ttm_mem_global), and the init and release hooks should -point at your driver-specific init and release routines, which probably -eventually call ttm_mem_global_init and ttm_mem_global_release, -respectively. - -Once your global TTM accounting structure is set up and initialized by -calling ttm_global_item_ref() on it, you need to create a buffer -object TTM to provide a pool for buffer object allocation by clients and -the kernel itself. 
The type of this object should be -TTM_GLOBAL_TTM_BO, and its size should be sizeof(struct -ttm_bo_global). Again, driver-specific init and release functions may -be provided, likely eventually calling ttm_bo_global_ref_init() and -ttm_bo_global_ref_release(), respectively. Also, like the previous -object, ttm_global_item_ref() is used to create an initial reference -count for the TTM, which will call your initialization function. - -See the radeon_ttm.c file for an example of usage. - The Graphics Execution Manager (GEM) -- 2.25.1
[PATCH 2/8] drm/ttm: add some general module kerneldoc
For now just a brief description of what TTM is all about.

Signed-off-by: Christian König
---
 Documentation/gpu/drm-mm.rst     |  3 ++-
 drivers/gpu/drm/ttm/ttm_module.c | 12 ++++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 8ca981065e1a..6b7717af4f88 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -28,7 +28,8 @@ UMA devices.
 The Translation Table Manager (TTM)
 ===================================
 
-TTM design background and information belongs here.
+.. kernel-doc:: drivers/gpu/drm/ttm/ttm_module.c
+   :doc: TTM
 
 The Graphics Execution Manager (GEM)
 ====================================
diff --git a/drivers/gpu/drm/ttm/ttm_module.c b/drivers/gpu/drm/ttm/ttm_module.c
index 997c458f68a9..6c19290f7ea9 100644
--- a/drivers/gpu/drm/ttm/ttm_module.c
+++ b/drivers/gpu/drm/ttm/ttm_module.c
@@ -39,6 +39,18 @@
 
 #include "ttm_module.h"
 
+/**
+ * DOC: TTM
+ *
+ * TTM is a memory manager for accelerator devices with dedicated memory.
+ *
+ * The basic idea is that resources are grouped together in buffer objects of
+ * certain size and TTM handles lifetime, movement and CPU mappings of those
+ * objects.
+ *
+ * TODO: Add more design background and information here.
+ */
+
 /**
  * ttm_prot_from_caching - Modify the page protection according to the
  * ttm cacing mode
-- 
2.25.1
[PATCH 5/8] drm/ttm: enable TTM resource object kerneldoc v2
Fix the last two remaining warnings and finally enable this. v2: add caching enum link Signed-off-by: Christian König Reviewed-by: Matthew Auld Reviewed-by: Alex Deucher --- Documentation/gpu/drm-mm.rst | 9 + include/drm/ttm/ttm_resource.h | 6 ++ 2 files changed, 11 insertions(+), 4 deletions(-) diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index 3da81b7b4e71..66d24b745c62 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -43,6 +43,15 @@ TTM device object reference .. kernel-doc:: drivers/gpu/drm/ttm/ttm_device.c :export: +TTM resource object reference +- + +.. kernel-doc:: include/drm/ttm/ttm_resource.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/ttm/ttm_resource.c + :export: + The Graphics Execution Manager (GEM) diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h index 32c5edd9e8b5..5952051091cd 100644 --- a/include/drm/ttm/ttm_resource.h +++ b/include/drm/ttm/ttm_resource.h @@ -103,10 +103,7 @@ struct ttm_resource_manager_func { * struct ttm_resource_manager * * @use_type: The memory type is enabled. - * @flags: TTM_MEMTYPE_XX flags identifying the traits of the memory - * managed by this memory type. - * @gpu_offset: If used, the GPU offset of the first managed page of - * fixed memory or the first managed location in an aperture. + * @use_tt: If a TT object should be used for the backing store. * @size: Size of the managed region. * @func: structure pointer implementing the range manager. See above * @move_lock: lock for move fence @@ -144,6 +141,7 @@ struct ttm_resource_manager { * @addr: mapped virtual address * @offset:physical addr * @is_iomem: is this io memory ? + * @caching: See enum ttm_caching * * Structure indicating the bus placement of an object. */ -- 2.25.1
[PATCH 4/8] drm/ttm: enable TTM device object kerneldoc v2
Fix the remaining warnings, switch to inline structure documentation and finally enable this. v2: adjust based on suggestions from Alex Signed-off-by: Christian König Reviewed-by: Matthew Auld --- Documentation/gpu/drm-mm.rst | 9 + include/drm/ttm/ttm_device.h | 72 +++- 2 files changed, 48 insertions(+), 33 deletions(-) diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index f22c9f9a2c0e..3da81b7b4e71 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -34,6 +34,15 @@ The Translation Table Manager (TTM) .. kernel-doc:: include/drm/ttm/ttm_caching.h :internal: +TTM device object reference +--- + +.. kernel-doc:: include/drm/ttm/ttm_device.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/ttm/ttm_device.c + :export: + The Graphics Execution Manager (GEM) diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h index 07d722950d5b..3cc1d9b76131 100644 --- a/include/drm/ttm/ttm_device.h +++ b/include/drm/ttm/ttm_device.h @@ -39,31 +39,23 @@ struct ttm_operation_ctx; /** * struct ttm_global - Buffer object driver global data. - * - * @dummy_read_page: Pointer to a dummy page used for mapping requests - * of unpopulated pages. - * @shrink: A shrink callback object used for buffer object swap. - * @device_list_mutex: Mutex protecting the device list. - * This mutex is held while traversing the device list for pm options. - * @lru_lock: Spinlock protecting the bo subsystem lru lists. - * @device_list: List of buffer object devices. - * @swap_lru: Lru list of buffer objects used for swapping. */ extern struct ttm_global { /** -* Constant after init. +* @dummy_read_page: Pointer to a dummy page used for mapping requests +* of unpopulated pages. Constant after init. */ - struct page *dummy_read_page; /** -* Protected by ttm_global_mutex. +* @device_list: List of buffer object devices. Protected by +* ttm_global_mutex. */ struct list_head device_list; /** -* Internal protection. 
+* @bo_count: Number of buffer objects allocated by devices. */ atomic_t bo_count; } ttm_glob; @@ -230,50 +222,64 @@ struct ttm_device_funcs { /** * struct ttm_device - Buffer object driver device-specific data. - * - * @device_list: Our entry in the global device list. - * @funcs: Function table for the device. - * @sysman: Resource manager for the system domain. - * @man_drv: An array of resource_managers. - * @vma_manager: Address space manager. - * @pool: page pool for the device. - * @dev_mapping: A pointer to the struct address_space representing the - * device address space. - * @wq: Work queue structure for the delayed delete workqueue. */ struct ttm_device { - /* + /** +* @device_list: Our entry in the global device list. * Constant after bo device init */ struct list_head device_list; + + /** +* @funcs: Function table for the device. +* Constant after bo device init +*/ struct ttm_device_funcs *funcs; - /* + /** +* @sysman: Resource manager for the system domain. * Access via ttm_manager_type. */ struct ttm_resource_manager sysman; + + /** +* @man_drv: An array of resource_managers, one per resource type. +*/ struct ttm_resource_manager *man_drv[TTM_NUM_MEM_TYPES]; - /* -* Protected by internal locks. + /** +* @vma_manager: Address space manager for finding BOs to mmap. */ struct drm_vma_offset_manager *vma_manager; + + /** +* @pool: page pool for the device. +*/ struct ttm_pool pool; - /* -* Protection for the per manager LRU and ddestroy lists. + /** +* @lru_lock: Protection for the per manager LRU and ddestroy lists. */ spinlock_t lru_lock; + + /** +* @ddestroy: Destroyed but not yet cleaned up buffer objects. +*/ struct list_head ddestroy; + + /** +* @pinned: Buffer objects which are pinned and so not on any LRU list. +*/ struct list_head pinned; - /* -* Protected by load / firstopen / lastclose /unload sync. + /** +* @dev_mapping: A pointer to the struct address_space for invalidating +* CPU mappings on buffer move. Protected by load/unload sync. 
*/ struct address_space *dev_mapping; - /* -* Internal protection. + /** +* @wq: Work queue structure for the delayed delete workqueue. */ struct delayed_work wq; }; -- 2.25.1
[PATCH 6/8] drm/ttm: enable TTM placement kerneldoc
Fix the last remaining warning and finally enable this.

Signed-off-by: Christian König
Reviewed-by: Matthew Auld
Reviewed-by: Alex Deucher
---
 Documentation/gpu/drm-mm.rst    | 6 ++++++
 include/drm/ttm/ttm_placement.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 66d24b745c62..1c9930fb5e7d 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -43,6 +43,12 @@ TTM device object reference
 .. kernel-doc:: drivers/gpu/drm/ttm/ttm_device.c
    :export:
 
+TTM resource placement reference
+--------------------------------
+
+.. kernel-doc:: include/drm/ttm/ttm_placement.h
+   :internal:
+
 TTM resource object reference
 -----------------------------
 
diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h
index 8995c9e4ec1b..76d1b9119a2b 100644
--- a/include/drm/ttm/ttm_placement.h
+++ b/include/drm/ttm/ttm_placement.h
@@ -58,6 +58,7 @@
  *
  * @fpfn:	first valid page frame number to put the object
  * @lpfn:	last valid page frame number to put the object
+ * @mem_type:	One of TTM_PL_* where the resource should be allocated from.
  * @flags:	memory domain and caching flags for the object
  *
  * Structure indicating a possible place to put an object.
-- 
2.25.1
[PATCH 7/8] drm/ttm: enable TTM TT object kerneldoc v2
Fix the remaining warnings and finally enable this. v2: add caching enum link Signed-off-by: Christian König Reviewed-by: Matthew Auld Reviewed-by: Alex Deucher --- Documentation/gpu/drm-mm.rst | 9 + include/drm/ttm/ttm_tt.h | 11 --- 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst index 1c9930fb5e7d..69c4a20b95d0 100644 --- a/Documentation/gpu/drm-mm.rst +++ b/Documentation/gpu/drm-mm.rst @@ -58,6 +58,15 @@ TTM resource object reference .. kernel-doc:: drivers/gpu/drm/ttm/ttm_resource.c :export: +TTM TT object reference +--- + +.. kernel-doc:: include/drm/ttm/ttm_tt.h + :internal: + +.. kernel-doc:: drivers/gpu/drm/ttm/ttm_tt.c + :export: + The Graphics Execution Manager (GEM) diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h index e402dab1d0f6..b3963ab12e1f 100644 --- a/include/drm/ttm/ttm_tt.h +++ b/include/drm/ttm/ttm_tt.h @@ -54,7 +54,7 @@ struct ttm_operation_ctx; * @dma_address: The DMA (bus) addresses of the pages * @swap_storage: Pointer to shmem struct file for swap storage. * @pages_list: used by some page allocation backend - * @caching: The current caching state of the pages. + * @caching: The current caching state of the pages, see enum ttm_caching. * * This is a structure holding the pages, caching- and aperture binding * status for a buffer object that isn't backed by fixed (VRAM / AGP) @@ -126,8 +126,9 @@ int ttm_sg_tt_init(struct ttm_tt *ttm_dma, struct ttm_buffer_object *bo, void ttm_tt_fini(struct ttm_tt *ttm); /** - * ttm_ttm_destroy: + * ttm_tt_destroy: * + * @bdev: the ttm_device this object belongs to * @ttm: The struct ttm_tt. * * Unbind, unpopulate and destroy common struct ttm_tt. 
@@ -148,15 +149,19 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt *ttm, /** * ttm_tt_populate - allocate pages for a ttm * + * @bdev: the ttm_device this object belongs to * @ttm: Pointer to the ttm_tt structure + * @ctx: operation context for populating the tt object. * * Calls the driver method to allocate pages for a ttm */ -int ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm, struct ttm_operation_ctx *ctx); +int ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm, + struct ttm_operation_ctx *ctx); /** * ttm_tt_unpopulate - free pages from a ttm * + * @bdev: the ttm_device this object belongs to * @ttm: Pointer to the ttm_tt structure * * Calls the driver method to free all pages from a ttm -- 2.25.1
[PATCH 3/8] drm/ttm: add kerneldoc for enum ttm_caching
Briefly describe what this is all about.

Signed-off-by: Christian König
Reviewed-by: Alex Deucher
---
 Documentation/gpu/drm-mm.rst  |  3 +++
 include/drm/ttm/ttm_caching.h | 17 +
 2 files changed, 20 insertions(+)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 6b7717af4f88..f22c9f9a2c0e 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -31,6 +31,9 @@ The Translation Table Manager (TTM)
 .. kernel-doc:: drivers/gpu/drm/ttm/ttm_module.c
    :doc: TTM
+.. kernel-doc:: include/drm/ttm/ttm_caching.h
+   :internal:
+
 The Graphics Execution Manager (GEM)
diff --git a/include/drm/ttm/ttm_caching.h b/include/drm/ttm/ttm_caching.h
index 3c9dd65f5aaf..235a743d90e1 100644
--- a/include/drm/ttm/ttm_caching.h
+++ b/include/drm/ttm/ttm_caching.h
@@ -27,9 +27,26 @@
 #define TTM_NUM_CACHING_TYPES	3
+/**
+ * enum ttm_caching - CPU caching and BUS snooping behavior.
+ */
 enum ttm_caching {
+	/**
+	 * @ttm_uncached: Most defensive option for device mappings,
+	 * don't even allow write combining.
+	 */
 	ttm_uncached,
+
+	/**
+	 * @ttm_write_combined: Don't cache read accesses, but allow at least
+	 * writes to be combined.
+	 */
 	ttm_write_combined,
+
+	/**
+	 * @ttm_cached: Fully cached like normal system memory, requires that
+	 * devices snoop the CPU cache on accesses.
+	 */
 	ttm_cached
 };
-- 
2.25.1
[PATCH 8/8] drm/ttm: enable TTM page pool kerneldoc
Fix the remaining warnings and finally enable this.

Signed-off-by: Christian König
Reviewed-by: Alex Deucher
---
 Documentation/gpu/drm-mm.rst | 9 +
 include/drm/ttm/ttm_pool.h   | 5 +++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 69c4a20b95d0..e0538083a2c0 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -67,6 +67,15 @@ TTM TT object reference
 .. kernel-doc:: drivers/gpu/drm/ttm/ttm_tt.c
    :export:
+TTM page pool reference
+---
+
+.. kernel-doc:: include/drm/ttm/ttm_pool.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/ttm/ttm_pool.c
+   :export:
+
 The Graphics Execution Manager (GEM)
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 4321728bdd11..ef09b23d29e3 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -37,7 +37,7 @@ struct ttm_pool;
 struct ttm_operation_ctx;
 /**
- * ttm_pool_type - Pool for a certain memory type
+ * struct ttm_pool_type - Pool for a certain memory type
  *
  * @pool: the pool we belong to, might be NULL for the global ones
  * @order: the allocation order our pages have
@@ -58,8 +58,9 @@ struct ttm_pool_type {
 };
 /**
- * ttm_pool - Pool for all caching and orders
+ * struct ttm_pool - Pool for all caching and orders
  *
+ * @dev: the device we allocate pages for
  * @use_dma_alloc: if coherent DMA allocations should be used
  * @use_dma32: if GFP_DMA32 should be used
  * @caching: pools for each caching/order
-- 
2.25.1
[drm:i915-vtable-cleanup 12/12] drivers/gpu/drm/i915/display/intel_audio.c:685:13: error: 'ilk_audio_codec_disable' defined but not used
tree:   git://people.freedesktop.org/~airlied/linux.git i915-vtable-cleanup
head:   b0d0061aeef594fc572295c0e3c02ba91596cbf6
commit: b0d0061aeef594fc572295c0e3c02ba91596cbf6 [12/12] drm/i915/display: constify the audio functions
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        git remote add drm git://people.freedesktop.org/~airlied/linux.git
        git fetch --no-tags drm i915-vtable-cleanup
        git checkout b0d0061aeef594fc572295c0e3c02ba91596cbf6
        # save the attached .config to linux build tree
        make W=1 ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

   drivers/gpu/drm/i915/display/intel_audio.c: In function 'intel_audio_codec_enable':
   drivers/gpu/drm/i915/display/intel_audio.c:852:24: error: 'dev_priv->audio_funcs' is a pointer; did you mean to use '->'?
     852 |   dev_priv->audio_funcs.audio_codec_enable(encoder,
         |                        ^
         |                        ->
   drivers/gpu/drm/i915/display/intel_audio.c: In function 'intel_audio_codec_disable':
   drivers/gpu/drm/i915/display/intel_audio.c:897:24: error: 'dev_priv->audio_funcs' is a pointer; did you mean to use '->'?
     897 |   dev_priv->audio_funcs.audio_codec_disable(encoder,
         |                        ^
         |                        ->
   drivers/gpu/drm/i915/display/intel_audio.c: At top level:
   drivers/gpu/drm/i915/display/intel_audio.c:919:46: error: expected '}' before ';' token
     919 |  .audio_codec_enable = g4x_audio_codec_enable;
         |                                              ^
   drivers/gpu/drm/i915/display/intel_audio.c:918:68: note: to match this '{'
     918 | static const struct drm_i915_display_audio_funcs g4x_audio_funcs = {
         |                                                                    ^
   drivers/gpu/drm/i915/display/intel_audio.c:924:46: error: expected '}' before ';' token
     924 |  .audio_codec_enable = ilk_audio_codec_enable;
         |                                              ^
   drivers/gpu/drm/i915/display/intel_audio.c:923:68: note: to match this '{'
     923 | static const struct drm_i915_display_audio_funcs ilk_audio_funcs = {
         |                                                                    ^
   drivers/gpu/drm/i915/display/intel_audio.c:929:46: error: expected '}' before ';' token
     929 |  .audio_codec_enable = hsw_audio_codec_enable;
         |                                              ^
   drivers/gpu/drm/i915/display/intel_audio.c:928:68: note: to match this '{'
     928 | static const struct drm_i915_display_audio_funcs hsw_audio_funcs = {
         |                                                                    ^
>> drivers/gpu/drm/i915/display/intel_audio.c:685:13: error: 'ilk_audio_codec_disable' defined but not used [-Werror=unused-function]
     685 | static void ilk_audio_codec_disable(struct intel_encoder *encoder,
         |             ^~~
>> drivers/gpu/drm/i915/display/intel_audio.c:486:13: error: 'hsw_audio_codec_disable' defined but not used [-Werror=unused-function]
     486 | static void hsw_audio_codec_disable(struct intel_encoder *encoder,
         |             ^~~
>> drivers/gpu/drm/i915/display/intel_audio.c:323:13: error: 'g4x_audio_codec_disable' defined but not used [-Werror=unused-function]
     323 | static void g4x_audio_codec_disable(struct intel_encoder *encoder,
         |             ^~~
   cc1: all warnings being treated as errors

vim +/ilk_audio_codec_disable +685 drivers/gpu/drm/i915/display/intel_audio.c

12e87f23c6278ed drivers/gpu/drm/i915/intel_audio.c Jani Nikula   2016-10-10  485
8ec47de21bfab96 drivers/gpu/drm/i915/intel_audio.c Ville Syrjälä 2017-10-30 @486  static void hsw_audio_codec_disable(struct intel_encoder *encoder,
8ec47de21bfab96 drivers/gpu/drm/i915/intel_audio.c Ville Syrjälä 2017-10-30  487  				    const struct intel_crtc_state *old_crtc_state,
8ec47de21bfab96 drivers/gpu/drm/i915/intel_audio.c Ville Syrjälä 2017-10-30  488  				    const struct drm_connector_state *old_conn_state)
69bfe1a9b4dffca drivers/gpu/drm/i915/intel_audio.c Jani Nikula   2014-10-27  489  {
fac5e23e3c385fd drivers/gpu/drm/i915/intel_audio.c Chris Wilson  2016-07-04  490  	struct drm_i915_private *dev_priv = to_i915(encoder->base.dev);
3904fb78a80da64 drivers/gpu/drm/i915/intel_audio.c Ville Syrjälä 2019-04-30  491  	enum transcoder cpu_transcoder = old_crtc_state->cpu_transcoder;
c25004964c5a8a0 drivers/gpu/drm/i915/intel_audio.c Jani Nikula   2018-06-12  492  	u32 tmp;
69bfe1a9b4dffca drivers/gpu/drm/i915/intel_audio.c Jani Nikula   2014-10-27
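
The report above boils down to two mistakes: the vtable is now reached through a pointer but is still dereferenced with `.`, and the designated initializers end in `;` instead of `,` (which truncates each table and leaves the `*_disable` functions unreferenced). Below is a minimal standalone sketch of the corrected pattern. All `demo_*` names are hypothetical stand-ins for illustration and are not the i915 types or the actual fix that was applied.

```c
#include <stddef.h>

/* Hypothetical stand-in for the encoder state. */
struct demo_encoder {
	int audio_enabled;
};

/* Hypothetical stand-in for the per-platform audio vtable. */
struct demo_audio_funcs {
	void (*audio_codec_enable)(struct demo_encoder *encoder);
	void (*audio_codec_disable)(struct demo_encoder *encoder);
};

struct demo_private {
	/* A pointer to a const table, so every call must go through '->'. */
	const struct demo_audio_funcs *audio_funcs;
};

static void demo_audio_codec_enable(struct demo_encoder *encoder)
{
	encoder->audio_enabled = 1;
}

static void demo_audio_codec_disable(struct demo_encoder *encoder)
{
	encoder->audio_enabled = 0;
}

/* Designated initializers are separated by ',' and never by ';'. */
static const struct demo_audio_funcs demo_funcs_table = {
	.audio_codec_enable = demo_audio_codec_enable,
	.audio_codec_disable = demo_audio_codec_disable,
};

static void demo_codec_enable(struct demo_private *priv,
			      struct demo_encoder *encoder)
{
	if (priv->audio_funcs && priv->audio_funcs->audio_codec_enable)
		priv->audio_funcs->audio_codec_enable(encoder);
}

static void demo_codec_disable(struct demo_private *priv,
			       struct demo_encoder *encoder)
{
	if (priv->audio_funcs && priv->audio_funcs->audio_codec_disable)
		priv->audio_funcs->audio_codec_disable(encoder);
}
```

Because both callbacks are referenced from the table, the unused-function warnings disappear as a side effect once the initializer parses.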
Re: [PATCH 2/8] drm/ttm: add some general module kerneldoc
On Wed, 8 Sept 2021 at 14:29, Christian König wrote:
>
> For now just a brief description of what TTM is all about.
>
> Signed-off-by: Christian König

Reviewed-by: Matthew Auld
Re: [PATCH] drm/msm: Disable frequency clamping on a630
On 08/09/2021 03:21, Bjorn Andersson wrote:
On Mon 09 Aug 10:26 PDT 2021, Akhil P Oommen wrote:
On 8/9/2021 9:48 PM, Caleb Connolly wrote:
On 09/08/2021 17:12, Rob Clark wrote:
On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen wrote:
[..]

I am a bit confused. We don't define a power domain for gpu in dt, correct? Then what exactly does set_opp do here? Do you think this usleep is what is helping here somehow to mask the issue?

The power domains (for cx and gx) are defined in the GMU DT, the OPPs in the GPU DT. For the sake of simplicity I'll refer to the lowest frequency (25700) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as the "min" state, and the highest frequency (71000) and OPP level (RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in sdm845.dtsi under the gpu node.

The new devfreq behaviour unmasks what I think is a driver bug: it inadvertently puts much more strain on the GPU regulators than they usually get. With the new behaviour the GPU jumps from its min state to the max state and back again extremely rapidly under workloads as small as refreshing UI. Where previously the GPU would rarely if ever go above 342MHz when interacting with the device, it now jumps between min and max many times per second.

If my understanding is correct, the current implementation of the GMU set freq is the following:
 - Get OPP for frequency to set
 - Push the frequency to the GMU, immediately updating the core clock
 - Call dev_pm_opp_set_opp() which triggers a notify chain, this winds up somewhere in power management code and causes the gx regulator level to be updated

Nope. dev_pm_opp_set_opp() sets the bandwidth for gpu and nothing else. We were using a different api earlier which got deprecated - dev_pm_opp_set_bw().

On the Lenovo Yoga C630 this is reproduced by starting alacritty and if I'm lucky I manage to hit a few keys before it crashes, so I spent a few hours looking into this as well...

As you say, the dev_pm_opp_set_opp() will only cast an interconnect vote. The opp-level is just there for show and isn't used by anything, at least not on 845.

Furthermore, I'm missing something in my tree, so the interconnect doesn't hit sync_state, and as such we're not actually scaling the buses. So the problem is not that Linux doesn't turn on the buses in time.

So I suspect that the "AHB bus error" isn't saying that we turned off the bus, but rather that the GPU becomes unstable or something of that sort.

Lastly, I reverted 9bc95570175a ("drm/msm: Devfreq tuning") and ran Aquarium for 20 minutes without a problem. I then switched the gpu devfreq governor to "userspace" and ran the following:

  while true; do
    echo 25700 > /sys/class/devfreq/500.gpu/userspace/set_freq
    echo 71000 > /sys/class/devfreq/500.gpu/userspace/set_freq
  done

It took 19 iterations of this loop to crash the GPU.

So the problem doesn't seem to be Rob's change, it's just that prior to it the chance of hitting it is way lower. Question is still what it is that we're triggering.

Do the opp-levels in DTS represent how the hardware behaves? If so then it does just appear to be that whatever is responsible for scaling the GX rail voltage has no time limits and will attempt to switch the regulator between min/max voltage as often as we tell it to, which is probably not something the hardware expected.

Regards,
Bjorn

--
Kind Regards,
Caleb (they/them)
[drm:i915-uncore-vfunc 31/31] drivers/gpu/drm/i915/i915_irq.c:52:10: fatal error: 'display/i915_display_trace.h' file not found
tree:   git://people.freedesktop.org/~airlied/linux.git i915-uncore-vfunc
head:   b42168f90718a90b11f2d52306d9aeaa9468
commit: b42168f90718a90b11f2d52306d9aeaa9468 [31/31] RFC: drm/i915: start splitting trace points
config: i386-randconfig-a014-20210908 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 9c476172b93367d2cb88d7d3f4b1b5b456fa6020)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git remote add drm git://people.freedesktop.org/~airlied/linux.git
        git fetch --no-tags drm i915-uncore-vfunc
        git checkout b42168f90718a90b11f2d52306d9aeaa9468
        # save the attached .config to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=i386 SHELL=/bin/bash drivers/gpu/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot

All errors (new ones prefixed by >>):

   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/i915_irq.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/intel_uncore.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/gt/intel_execlists_submission.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/gt/intel_reset.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/gem/i915_gem_context.o] Error 1
>> make[4]: *** No rule to make target 'drivers/gpu/drm/i915/display/intel_display_trace_points.o', needed by 'drivers/gpu/drm/i915/i915.o'.
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_atomic_plane.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_crtc.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_frontbuffer.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_fifo_underrun.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_fbc.o] Error 1
   make[4]: Target '__build' not remade because of errors.
--
>> drivers/gpu/drm/i915/i915_irq.c:52:10: fatal error: 'display/i915_display_trace.h' file not found
   #include "display/i915_display_trace.h"
            ^~
   1 error generated.
--
>> drivers/gpu/drm/i915/display/intel_atomic_plane.c:38:10: fatal error: 'i915_display_trace.h' file not found
   #include "i915_display_trace.h"
            ^~
   1 error generated.
--
>> drivers/gpu/drm/i915/display/intel_crtc.c:13:10: fatal error: 'i915_display_trace.h' file not found
   #include "i915_display_trace.h"
            ^~
   1 error generated.
--
>> drivers/gpu/drm/i915/display/intel_fbc.c:44:10: fatal error: 'i915_display_trace.h' file not found
   #include "i915_display_trace.h"
            ^~
   1 error generated.
--
>> drivers/gpu/drm/i915/display/intel_fifo_underrun.c:29:10: fatal error: 'i915_display_trace.h' file not found
   #include "i915_display_trace.h"
            ^~
   1 error generated.
--
>> drivers/gpu/drm/i915/display/intel_frontbuffer.c:61:10: fatal error: 'i915_display_trace.h' file not found
   #include "i915_display_trace.h"
            ^~
   1 error generated.

vim +52 drivers/gpu/drm/i915/i915_irq.c

    49
    50	#include "i915_drv.h"
    51	#include "i915_irq.h"
  > 52	#include "display/i915_display_trace.h"
    53	#include "intel_pm.h"
    54

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org
[PATCH] drm/rockchip: Update crtc fixup to account for fractional clk change
From: Chris Morgan

After commit 928f9e268611 ("clk: fractional-divider: Hide clk_fractional_divider_ops from wide audience") was merged, the DSI panel on my Odroid Go Advance stopped working. Upon closer examination of the problem, it looks like the fixup in the rockchip_drm_vop.c file was causing the issue. The changes made to the clk driver appear to change some assumptions made in the fixup.

After debugging the working 5.14 kernel and the no-longer working 5.15 kernel, it looks like this was broken all along but still worked despite the issue, whereas after the fractional clock change it stopped working (it went from sort-of broken to very broken).

In the 5.14 kernel the dclk_vopb_frac was being requested to be set to 17000999 on my board. The clock driver was taking the value of the parent clock and dividing it by the requested value (1700/17000999 = 0), then subtracting 1 from it (making it -1), and running it through fls_long to get 64. It would then subtract the value of fd->mwidth from it to get 48, and then bit shift 17000999 to the left by 48, coming up with a very large number of 7649082492112076800. This resulted in a numerator of 65535 and a denominator of 1 from the clk driver. The driver seemingly would try again and get a correct 1:1 value later, and then move on.
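
The arithmetic in the last paragraph can be replayed in a few lines of standalone C. This is only a sketch of the steps as described in the message, not the clk driver's code: `demo_fls64` and `demo_scaled_rate` are hypothetical helpers, 64-bit `unsigned long` is assumed, and the parent rate of 17000000 Hz is an assumption (the message shows it as 1700); any parent rate below the requested rate gives the same result because the quotient is 0 either way.

```c
#include <stdint.h>

/*
 * 1-based position of the most significant set bit, 0 for x == 0.
 * This mirrors what fls_long() returns on a 64-bit machine.
 */
static unsigned int demo_fls64(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Replay of the scaling step described above. The computed scale is
 * stored in *scale_out when the pointer is non-NULL.
 */
static uint64_t demo_scaled_rate(uint64_t rate, uint64_t parent_rate,
				 unsigned int mwidth, unsigned int *scale_out)
{
	unsigned int scale;

	if (rate == 0)
		return 0;

	/* parent_rate < rate gives a quotient of 0, and 0 - 1 wraps to all ones. */
	scale = demo_fls64(parent_rate / rate - 1);
	if (scale_out)
		*scale_out = scale;

	/* The shift drops every bit of rate above the low (64 - shift) bits. */
	if (scale > mwidth)
		rate <<= scale - mwidth;

	return rate;
}
```

Calling `demo_scaled_rate(17000999, 17000000, 16, &scale)` yields a scale of 64 and a rate of 7649082492112076800, the same values as in the log that follows: only the low 16 bits of 17000999 (0x6A27) survive the shift by 48.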
Output from my 5.14 kernel (with some printfs for good measure):

[2.830066] rockchip-drm display-subsystem: bound ff46.vop (ops vop_component_ops)
[2.839431] rockchip-drm display-subsystem: bound ff45.dsi (ops dw_mipi_dsi_rockchip_ops)
[2.855980] Clock is dclk_vopb_frac
[2.856004] Scale 64, Rate 7649082492112076800, Oldrate 17000999, Parent Rate 1700, Best Numerator 65535, Best Denominator 1, fd->mwidth 16
[2.903529] Clock is dclk_vopb_frac
[2.903556] Scale 0, Rate 1700, Oldrate 1700, Parent Rate 1700, Best Numerator 1, Best Denominator 1, fd->mwidth 16
[2.903579] Clock is dclk_vopb_frac
[2.903583] Scale 0, Rate 1700, Oldrate 1700, Parent Rate 1700, Best Numerator 1, Best Denominator 1, fd->mwidth 16

Contrast this with 5.15 after the clk change, where the rate of 17000999 was getting passed and resulted in numerators/denominators of 17001/17000.

Output from my 5.15 kernel (with some printfs added for good measure):

[2.817571] rockchip-drm display-subsystem: bound ff46.vop (ops vop_component_ops)
[2.826975] rockchip-drm display-subsystem: bound ff45.dsi (ops dw_mipi_dsi_rockchip_ops)
[2.843430] Rate 17000999, Parent Rate 1700, Best Numerator 17018, Best Denominator 17017
[2.891073] Rate 17001000, Parent Rate 1700, Best Numerator 17001, Best Denominator 17000
[2.891269] Rate 17001000, Parent Rate 1700, Best Numerator 17001, Best Denominator 17000
[2.891281] Rate 17001000, Parent Rate 1700, Best Numerator 17001, Best Denominator 17000

After tracing through the code it appeared that this function was adding 999 to the requested frequency because of how the clk driver was rounding/accepting those frequencies. I believe that after the changes made in the commit listed above, the assumptions listed in this driver are no longer true. When I remove the + 999 from the driver the DSI panel begins to work again.
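
To make the unit mismatch concrete: DRM mode clocks are in kHz while the clock framework takes Hz, so the fixup has to multiply by 1000 before asking for a rate. The sketch below shows only that conversion with and without the extra 999; `demo_mode_clock_to_hz` is a hypothetical helper, and the 17000 kHz mode clock is an assumption inferred from the 17000999 request quoted above, not a value stated in the message.

```c
#include <stdint.h>

/*
 * Convert a DRM mode clock (kHz) into the rate (Hz) requested from the
 * clock framework. A non-zero add_999 selects the old behaviour of
 * asking for the top of the 1 kHz window instead of its bottom.
 */
static uint64_t demo_mode_clock_to_hz(uint32_t clock_khz, int add_999)
{
	uint64_t hz = (uint64_t)clock_khz * 1000;

	return add_999 ? hz + 999 : hz;
}
```

Under these assumptions the old fixup requests 17000999 Hz and the patched one requests 17000000 Hz, which is the difference between the 17001/17000 and 1/1 ratios in the logs.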
Output from my 5.15 kernel with 999 removed (printfs added):

[2.852054] rockchip-drm display-subsystem: bound ff46.vop (ops vop_component_ops)
[2.864483] rockchip-drm display-subsystem: bound ff45.dsi (ops dw_mipi_dsi_rockchip_ops)
[2.880869] Clock is dclk_vopb_frac
[2.880892] Rate 1700, Parent Rate 1700, Best Numerator 1, Best Denominator 1
[2.928521] Clock is dclk_vopb_frac
[2.928551] Rate 1700, Parent Rate 1700, Best Numerator 1, Best Denominator 1
[2.928570] Clock is dclk_vopb_frac
[2.928574] Rate 1700, Parent Rate 1700, Best Numerator 1, Best Denominator 1

I have tested the change extensively on my Odroid Go Advance (Rockchip RK3326) and it appears to work well. However, this change will affect all Rockchip SoCs that use this driver so I believe further testing is warranted. Please note that without this change I can confirm at least all PX30s with DSI panels will stop working with the 5.15 kernel.

Signed-off-by: Chris Morgan
---
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 21 +++--
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index ba9e14da41b4..bfef4f52dce6 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -1169,31 +1169,16 @@ static bool vop_crtc_mode_fixup(struct drm_crtc *crtc,
 	 *
 	 * - DRM works in in kHz.
 	 * - Clock framework works in Hz.
-	 * - Rockchip's clock driver picks the clock rate that is the
-	 *   same _OR LOWER_ than the one requested.
 	 *
Re: [PATCH 4/8] drm/i915/xehp: CCS should use RCS setup functions
On 08/09/2021 11:13, Tvrtko Ursulin wrote:
On 07/09/2021 18:19, Matt Roper wrote:

The compute engine handles the same commands the render engine can (except 3D pipeline), so it makes sense that CCS is more similar to RCS than non-render engines. The CCS context state (lrc) is also similar to the render one, so reuse it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE register.

In order to avoid having multiple RCS && CCS checks, add the following engine flag:
 - I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx.

BSpec: 46260
Original-patch-by: Michel Thierry
Cc: Tvrtko Ursulin
Cc: Daniele Ceraolo Spurio
Signed-off-by: Aravind Iddamsetty
Signed-off-by: Matt Roper
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 8 +---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c             | 6 ++
 drivers/gpu/drm/i915/gt/intel_engine_types.h          | 1 +
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c  | 2 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c                   | 4 ++--
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c     | 2 +-
 drivers/gpu/drm/i915/i915_perf.c                      | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h                       | 2 +-
 8 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index b32f7fed2d9c..fbe10783628b 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -883,7 +883,9 @@ static int igt_shared_ctx_exec(void *arg)
 	return err;
 }
-static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *vma)
+static int rpcs_query_batch(struct drm_i915_gem_object *rpcs,
+			    struct i915_vma *vma,
+			    struct intel_engine_cs *engine)
 {
 	u32 *cmd;
@@ -894,7 +896,7 @@ static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *v
 		return PTR_ERR(cmd);
 	*cmd++ = MI_STORE_REGISTER_MEM_GEN8;
-	*cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
+	*cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE(engine->mmio_base));
 	*cmd++ = lower_32_bits(vma->node.start);
 	*cmd++ = upper_32_bits(vma->node.start);
 	*cmd = MI_BATCH_BUFFER_END;
@@ -955,7 +957,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
 	if (err)
 		goto err_vma;
-	err = rpcs_query_batch(rpcs, vma);
+	err = rpcs_query_batch(rpcs, vma, ce->engine);
 	if (err)
 		goto err_batch;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 69944bd8c19d..b346b946602d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -205,6 +205,8 @@ u32 intel_engine_context_size(struct intel_gt *gt, u8 class)
 	BUILD_BUG_ON(I915_GTT_PAGE_SIZE != PAGE_SIZE);
 	switch (class) {
+	case COMPUTE_CLASS:
+		fallthrough;
 	case RENDER_CLASS:
 		switch (GRAPHICS_VER(gt->i915)) {
 		default:
@@ -379,6 +381,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
 		engine->props.preempt_timeout_ms = 0;
+	/* features common between engines sharing EUs */
+	if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS)
+		engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE;
+
 	engine->defaults = engine->props; /* never to change again */
 	engine->context_size = intel_engine_context_size(gt, engine->class);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index dcb9d8b2362a..30a0c69c36c8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -454,6 +454,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
 #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
 #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
+#define I915_ENGINE_HAS_RCS_REG_STATE BIT(9)
 	unsigned int flags;
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de5f9c86b9a4..4c600c46414d 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3406,7 +3406,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine);
-	if (engine->class == RENDER_CLASS)
+	if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)
 		rcs_submission_override(engine);

Hm, what do pipe control flushes which relate to 3d pipeline end up doing on CCS engines?

Right, answer found in the following patch. Ideally the two would swap places in the series so by
Re: [PATCH 6/8] drm/i915/xehp: Define context scheduling attributes in lrc descriptor
On 07/09/2021 18:19, Matt Roper wrote:

In Dual Context mode the EUs are shared between render and compute command streamers. The hardware provides a field in the lrc descriptor to indicate the prioritization of the thread dispatch associated to the corresponding context.

The context priority is set to 'low' at creation time and relies on the existing context priority to set it to low/normal/high.

HSDES: 1604462009
Bspec: 46145, 46260
Original-patch-by: Michel Thierry
Cc: Tvrtko Ursulin
Signed-off-by: Aravind Iddamsetty
Signed-off-by: Prasad Nallani
Signed-off-by: Matt Roper
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c            |  4 +++-
 drivers/gpu/drm/i915/gt/intel_engine_types.h         |  1 +
 drivers/gpu/drm/i915/gt/intel_execlists_submission.c |  6 +-
 drivers/gpu/drm/i915/gt/intel_lrc.h                  | 10 ++
 drivers/gpu/drm/i915/i915_reg.h                      |  4
 5 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index b346b946602d..2f719f0ecac3 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -382,8 +382,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 		engine->props.preempt_timeout_ms = 0;
 	/* features common between engines sharing EUs */
-	if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS)
+	if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) {
 		engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE;
+		engine->flags |= I915_ENGINE_HAS_EU_PRIORITY;
+	}
 	engine->defaults = engine->props; /* never to change again */
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 30a0c69c36c8..00bf0296b28a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -455,6 +455,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
 #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
 #define I915_ENGINE_HAS_RCS_REG_STATE BIT(9)
+#define I915_ENGINE_HAS_EU_PRIORITY BIT(10)
 	unsigned int flags;
 	/*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 4c600c46414d..2b36ec7f3a04 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -662,9 +662,13 @@ static inline void execlists_schedule_out(struct i915_request *rq)
 static u64 execlists_update_context(struct i915_request *rq)
 {
 	struct intel_context *ce = rq->context;
-	u64 desc = ce->lrc.desc;
+	u64 desc;
 	u32 tail, prev;
+	desc = ce->lrc.desc;
+	if (rq->engine->flags & I915_ENGINE_HAS_EU_PRIORITY)
+		desc |= lrc_desc_priority(rq_prio(rq));
+
 	/*
 	 * WaIdleLiteRestore:bdw,skl
 	 *
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index 7f697845c4cf..d3f2096b3d51 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -79,4 +79,14 @@ static inline u32 lrc_get_runtime(const struct intel_context *ce)
 	return READ_ONCE(ce->lrc_reg_state[CTX_TIMESTAMP]);
 }
+static inline u32 lrc_desc_priority(int prio)
+{
+	if (prio > I915_PRIORITY_NORMAL)
+		return GEN12_CTX_PRIORITY_HIGH;
+	else if (prio < I915_PRIORITY_NORMAL)
+		return GEN12_CTX_PRIORITY_LOW;
+	else
+		return GEN12_CTX_PRIORITY_NORMAL;
+}
+
 #endif /* __INTEL_LRC_H__ */
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0bb185ce9529..5b68c02c35af 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -4212,6 +4212,10 @@ enum {
 #define GEN8_CTX_L3LLC_COHERENT (1 << 5)
 #define GEN8_CTX_PRIVILEGE (1 << 8)
 #define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
+#define GEN12_CTX_PRIORITY_MASK REG_GENMASK(10, 9)
+#define GEN12_CTX_PRIORITY_HIGH REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 2)
+#define GEN12_CTX_PRIORITY_NORMAL REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 1)
+#define GEN12_CTX_PRIORITY_LOW REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 0)
 #define GEN8_CTX_ID_SHIFT 32
 #define GEN8_CTX_ID_WIDTH 21

Haven't checked bspec to check the bitfield but the mechanics look good.

Reviewed-by: Tvrtko Ursulin

Regards,

Tvrtko
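
For readers who want to see the resulting bit patterns, the mapping in the patch above can be sketched outside the kernel. The `DEMO_*` macros and `demo_*` functions are hypothetical stand-ins written for this illustration: the field position (bits 10:9) and the encodings 2/1/0 come from the patch, while the value of `DEMO_PRIORITY_NORMAL` is an assumption.

```c
#include <stdint.h>

/* Assumed value of the "normal" scheduling priority, for illustration. */
#define DEMO_PRIORITY_NORMAL		0

/* Two-bit field at bits 10:9 of the descriptor, as in the patch. */
#define DEMO_CTX_PRIORITY_SHIFT		9
#define DEMO_CTX_PRIORITY_MASK		((uint32_t)3 << DEMO_CTX_PRIORITY_SHIFT)
#define DEMO_CTX_PRIORITY_FIELD(v) \
	(((uint32_t)(v) << DEMO_CTX_PRIORITY_SHIFT) & DEMO_CTX_PRIORITY_MASK)

#define DEMO_CTX_PRIORITY_HIGH		DEMO_CTX_PRIORITY_FIELD(2)
#define DEMO_CTX_PRIORITY_NORMAL	DEMO_CTX_PRIORITY_FIELD(1)
#define DEMO_CTX_PRIORITY_LOW		DEMO_CTX_PRIORITY_FIELD(0)

/* Collapse a signed scheduling priority into the three hardware levels. */
static uint32_t demo_lrc_desc_priority(int prio)
{
	if (prio > DEMO_PRIORITY_NORMAL)
		return DEMO_CTX_PRIORITY_HIGH;
	if (prio < DEMO_PRIORITY_NORMAL)
		return DEMO_CTX_PRIORITY_LOW;
	return DEMO_CTX_PRIORITY_NORMAL;
}

/* OR the priority field into a descriptor only on engines that support it. */
static uint64_t demo_update_desc(uint64_t desc, int has_eu_priority, int prio)
{
	if (has_eu_priority)
		desc |= demo_lrc_desc_priority(prio);
	return desc;
}
```

With these definitions the mask is 0x600, and high, normal and low map to 0x400, 0x200 and 0x0 respectively. Because the field is only ORed in, the sketch (like the patch) relies on the cached descriptor having those two bits clear.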
Re: [PATCH 7/8] drm/i915/xehp: Enable ccs/dual-ctx in RCU_MODE
On 07/09/2021 18:19, Matt Roper wrote:

We have to specify in the Render Control Unit Mode register when CCS is enabled.

Bspec: 46034
Original-patch-by: Michel Thierry
Cc: Daniele Ceraolo Spurio
Cc: Tvrtko Ursulin
Cc: Vinay Belgaumkar
Signed-off-by: Daniele Ceraolo Spurio
Signed-off-by: Aravind Iddamsetty
Signed-off-by: Matt Roper
---
 .../drm/i915/gt/intel_execlists_submission.c  | 26 +++
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 26 +++
 drivers/gpu/drm/i915/i915_reg.h               |  3 +++
 3 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 2b36ec7f3a04..046f7da67ba6 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2874,6 +2874,29 @@ static int execlists_resume(struct intel_engine_cs *engine)
 	return 0;
 }
+static int gen12_rcs_resume(struct intel_engine_cs *engine)
+{
+	int ret;
+
+	ret = execlists_resume(engine);
+	if (ret)
+		return ret;
+
+	/*
+	 * Multi Context programming.
+	 * just need to program this register once no matter how many CCS

Just

+	 * engines there are. Since some of the CCS engines might be fused off,
+	 * we can't do this as part of the init of a specific CCS and we do
+	 * it during RCS init instead. RCS and all CCS engines are reset

I don't really understand the "can't" part - clearly it would be doable if a specific vfunc was assigned to one ccs only, the one which is present of course. Not saying that would be nicer since I think it has its own downside.

Perhaps nicest solution is to add an engine flag saying "enables rcu" and then execlists and guc resume check that and do stuff? No strong opinion yet, just discussing.

+	 * together, so post-reset re-init is covered as well.
+	 */
+	if (CCS_MASK(engine->gt))
+		intel_uncore_write(engine->uncore, GEN12_RCU_MODE,
+				   _MASKED_BIT_ENABLE(GEN12_RCU_MODE_CCS_ENABLE));
+
+	return 0;
+}
+
 static void execlists_reset_prepare(struct intel_engine_cs *engine)
 {
 	ENGINE_TRACE(engine, "depth<-%d\n",
@@ -3394,6 +3417,9 @@ static void rcs_submission_override(struct intel_engine_cs *engine)
 		engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
 		break;
 	}
+
+	if (engine->class == RENDER_CLASS)
+		engine->resume = gen12_rcs_resume;
 }
 int intel_execlists_submission_setup(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2f5bf7aa7e3b..db956255d076 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2350,6 +2350,29 @@ static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine)
 	return !sched_engine->tasklet.callback;
 }
+static int gen12_rcs_resume(struct intel_engine_cs *engine)
+{
+	int ret;
+
+	ret = guc_resume(engine);
+	if (ret)
+		return ret;
+
+	/*
+	 * Multi Context programming.
+	 * just need to program this register once no matter how many CCS
+	 * engines there are. Since some of the CCS engines might be fused off,
+	 * we can't do this as part of the init of a specific CCS and we do
+	 * it during RCS init instead. RCS and all CCS engines are reset
+	 * together, so post-reset re-init is covered as well.
+	 */
+	if (CCS_MASK(engine->gt))
+		intel_uncore_write(engine->uncore, GEN12_RCU_MODE,
+				   _MASKED_BIT_ENABLE(GEN12_RCU_MODE_CCS_ENABLE));

Duplicating the write from gen12_rcs_resume looks passable, but together with the whole comment then hmm.. How about a helper is added which both would call? Like intel_engine_enable_rcu_mode() or something?

Regards,

Tvrtko

+
+	return 0;
+}
+
 static void guc_set_default_submission(struct intel_engine_cs *engine)
 {
 	engine->submit_request = guc_submit_request;
@@ -2464,6 +2487,9 @@ static void rcs_submission_override(struct intel_engine_cs *engine)
 		engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
 		break;
 	}
+
+	if (engine->class == RENDER_CLASS)
+		engine->resume = gen12_rcs_resume;
 }
 static inline void guc_default_irqs(struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 5b68c02c35af..57f9456f8c61 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -498,6 +498,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define ECOBITS_PPGTT_CACHE64B (3 << 8)
 #define ECOBITS_PPGTT_CACHE4B
Re: [Intel-gfx] [PATCH 8/8] drm/i915/xehp: Extend uninterruptible OpenCL workloads to CCS
On 07/09/2021 18:19, Matt Roper wrote: From: John Harrison Now that OpenCL workloads can run on the compute engine, we need to set preempt_timeout_ms = 0 on the CCS engines too. Signed-off-by: John Harrison Signed-off-by: Matt Roper --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 2f719f0ecac3..7e6ac0ae1f07 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -377,16 +377,17 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) engine->props.timeslice_duration_ms = CONFIG_DRM_I915_TIMESLICE_DURATION; - /* Override to uninterruptible for OpenCL workloads. */ - if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS) - engine->props.preempt_timeout_ms = 0; - /* features common between engines sharing EUs */ if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) { engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE; engine->flags |= I915_ENGINE_HAS_EU_PRIORITY; } + /* Override to uninterruptible for OpenCL workloads. */ + if (GRAPHICS_VER(i915) == 12 && + engine->flags & I915_ENGINE_HAS_RCS_REG_STATE) + engine->props.preempt_timeout_ms = 0; + engine->defaults = engine->props; /* never to change again */ engine->context_size = intel_engine_context_size(gt, engine->class); Reviewed-by: Tvrtko Ursulin Regards, Tvrtko
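To illustrate why the hunk moves the override below the flags assignment: the new check keys off I915_ENGINE_HAS_RCS_REG_STATE, which is only set a few lines earlier for render and compute engines. The following is a minimal stand-alone sketch of that ordering; the mock_* names and flag values are assumptions made so it compiles outside the driver.

```c
#include <assert.h>

enum mock_engine_class {
	MOCK_RENDER,
	MOCK_VIDEO,
	MOCK_COPY,
	MOCK_COMPUTE,
};

/* Placeholder flag bits, not the driver's values. */
#define MOCK_HAS_RCS_REG_STATE	(1u << 0)
#define MOCK_HAS_EU_PRIORITY	(1u << 1)

struct mock_engine {
	enum mock_engine_class engine_class;
	unsigned int flags;
	unsigned int preempt_timeout_ms;
};

static void mock_engine_setup(struct mock_engine *e, int graphics_ver,
			      unsigned int default_timeout_ms)
{
	assert(e);

	e->flags = 0;
	e->preempt_timeout_ms = default_timeout_ms;

	/* Features common between engines sharing EUs. */
	if (e->engine_class == MOCK_RENDER || e->engine_class == MOCK_COMPUTE) {
		e->flags |= MOCK_HAS_RCS_REG_STATE;
		e->flags |= MOCK_HAS_EU_PRIORITY;
	}

	/*
	 * Must run after the flags are set. Parentheses added for clarity;
	 * '&' already binds tighter than '&&'.
	 */
	if (graphics_ver == 12 && (e->flags & MOCK_HAS_RCS_REG_STATE))
		e->preempt_timeout_ms = 0;
}
```

Had the override stayed above the flags block, the flag test would always see zero and no engine would be made uninterruptible.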
Re: [Freedreno] [PATCH 2/3] drm/msm/dpu1: Add MSM8998 to hw catalog
On Wed, Sep 8, 2021 at 2:26 AM Dmitry Baryshkov wrote:
>
> Hi,
>
> On Tue, 7 Sept 2021 at 22:13, Jeffrey Hugo wrote:
> >
> > On Wed, Sep 1, 2021 at 12:11 PM AngeloGioacchino Del Regno
> > wrote:
> > >
> > > Bringup functionality for MSM8998 in the DPU, driver which is mostly
> > > the same as SDM845 (just a few variations).
> > >
> > > Signed-off-by: AngeloGioacchino Del Regno
> > >
> >
> > I don't seem to see a cover letter for this series.
> >
> > Eh, there are a fair number of differences between the MDSS versions
> > for 8998 and 845.
> >
> > Probably a bigger question, why extend the DPU driver for 8998, when
> > the MDP5 driver already supports it[1]? The MDP/DPU split is pretty
> > dumb, but I don't see a valid reason for both drivers supporting the
> > same target/display revision. IMO, if you want this support in DPU,
> > remove it from MDP5.
> >
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.14&id=d6c7b2284b14c66a268a448a7a8d54f585d38785
>
> I don't think that we should enforce such requirements. Having support
> both in MDP5 and DPU would allow one to compare those two drivers,
> performance, features, etc.
> It might be that all MDP5-supported hardware would be also supported
> by DPU, thus allowing us to remove the former driver. But until that
> time I'd suggest leaving support in place.

Well, then you have a host of problems to solve.

Let's ignore the code duplication for a minute and assume we've gone
with this grand experiment. Two drivers enter, one leaves the victor.

How are the clients supposed to pick which driver to use in the
meantime? We already have one DT binding for 8998 (which the MDP5
driver services). This series proposes a second. If we go forward
with what you propose, we'll have two bindings for the same hardware,
which IMO doesn't make sense in the context of DT, and the reason for
that is to select which driver is "better". Driver selection is not
supposed to be tied to DT like this.

So, some boards think MDP5 is better, and some boards think DPU is
better. At some point, we decide one of the drivers is the clear
winner (let's assume DPU). Then what happens to the existing DTs that
were using the MDP5 description? Are they really compatible with DPU?

From a DT perspective, there should be one description, but then how
do you pick which driver to load? Both can't bind on the single
description, and while you could argue that the users should build one
driver or the other, but not both (thus picking which one at build
time), that doesn't work for distros that want to build both drivers
so that they can support all platforms with a single build (per arch).

From where I sit, your position starts with a good idea, but isn't
fully thought out and leads to problems.

If there is some reason why DPU is better for 8998, please enumerate
it. Does DPU support some config that MDP5 doesn't, which is valuable
to you? I'm ok with ripping out the MDP5 support, the reason I didn't
go with DPU was that the DPU driver was clearly written only for 845
at the time, and needed significant rework to "downgrade" to an
earlier hardware.

However, the "reason" DPU exists separate from MDP5 is the claim that
the MDP hardware underwent a significant rearchitecture, and thus it
was too cumbersome to extend MDP5. While I disagree with the premise,
that "rearch" started with 8998.
Re: [PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr
On 2021-09-08 2:50 a.m., Boris Brezillon wrote:
> On Tue, 7 Sep 2021 14:53:58 -0400
> Andrey Grodzovsky wrote:
>
>> On 2021-06-29 7:24 a.m., Christian König wrote:
>>> On 29.06.21 at 13:18, Boris Brezillon wrote:
>>>> Hi Christian,
>>>>
>>>> On Tue, 29 Jun 2021 13:03:58 +0200
>>>> Christian König wrote:
>>>>
>>>>> On 29.06.21 at 09:34, Boris Brezillon wrote:
>>>>>> Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU
>>>>>> reset. This leads to extra complexity when we need to synchronize
>>>>>> timeout works with the reset work. One solution to address that is to
>>>>>> have an ordered workqueue at the driver level that will be used by the
>>>>>> different schedulers to queue their timeout work. Thanks to the
>>>>>> serialization provided by the ordered workqueue we are guaranteed that
>>>>>> timeout handlers are executed sequentially, and can thus easily reset
>>>>>> the GPU from the timeout handler without extra synchronization.
>>>>> Well, we had already tried this and it didn't work the way it is
>>>>> expected.
>>>>>
>>>>> The major problem is that you not only want to serialize the queue, but
>>>>> rather have a single reset for all queues.
>>>>>
>>>>> Otherwise you schedule multiple resets for each hardware queue. E.g. for
>>>>> your 3 hardware queues you would reset the GPU 3 times if all of them
>>>>> time out at the same time (which is rather likely).
>>>>>
>>>>> Using a single delayed work item doesn't work either because you then
>>>>> only have one timeout.
>>>>>
>>>>> What could be done is to cancel all delayed work items from all stopped
>>>>> schedulers.
>>>> drm_sched_stop() does that already, and since we call drm_sched_stop()
>>>> on all queues in the timeout handler, we end up with only one global
>>>> reset happening even if several queues report a timeout at the same
>>>> time.
>>>
>>> Ah, nice. Yeah, in this case it should indeed work as expected.
>>>
>>> Feel free to add an Acked-by: Christian König to it.
>>>
>>> Regards,
>>> Christian.
>>
>> Seems to me that for this to work we need to change cancel_delayed_work
>> to cancel_delayed_work_sync so not only pending TO handlers are cancelled
>> but also any in progress are waited for and to prevent rearming.
>> Also move it right after kthread_park - before we start touching pending
>> list.
> I'm probably missing something, but I don't really see why this
> specific change would require replacing cancel_delayed_work() calls by
> the sync variant.

I see, I missed the point that since now we have a single threaded
processing and only one TDR handler runs at given time there is no need
to wait for other parallel in flight TDR handlers.

> I mean, if there's a situation where we need to wait
> for in-flight timeout handler to return, it was already the case before
> that patch.

In amdgpu case we avoided that by trylock on a common lock and returning
early in case it was already taken by another TDR handler

> Note that we need to be careful to not call the sync
> variant in helpers that are called from the interrupt handler itself
> to avoid deadlocks (i.e. drm_sched_stop()).

I am not clear here - which interrupt handler is drm_sched_stop
called from ? It's called from TDR work as far as I see in the code.

Andrey
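The "only one global reset" argument from this thread can be modelled in a few lines of plain C. This is a toy simulation, not scheduler code: the mock_* names are invented for illustration, the ordered workqueue is reduced to a loop that runs pending items one at a time, and the stop_cancels_timeout switch exists only to show the contrast with a stop routine that leaves the other timeouts queued.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_QUEUES 3

struct mock_sched {
	bool timeout_pending;	/* models a queued delayed work item */
};

struct mock_gpu {
	struct mock_sched queues[NUM_QUEUES];
	bool stop_cancels_timeout;	/* false: stop leaves timeouts queued */
	unsigned int resets;
};

/* Models the one effect of stopping a scheduler that matters here. */
static void mock_sched_stop(struct mock_gpu *gpu, struct mock_sched *s)
{
	if (gpu->stop_cancels_timeout)
		s->timeout_pending = false;
}

/* Timeout handler: stop every queue, then do one global reset. */
static void mock_timeout_handler(struct mock_gpu *gpu)
{
	for (int i = 0; i < NUM_QUEUES; i++)
		mock_sched_stop(gpu, &gpu->queues[i]);
	gpu->resets++;
}

/* Ordered workqueue: pending items run strictly one after another. */
static unsigned int mock_run_ordered_wq(struct mock_gpu *gpu)
{
	assert(gpu);

	for (int i = 0; i < NUM_QUEUES; i++) {
		if (!gpu->queues[i].timeout_pending)
			continue;
		gpu->queues[i].timeout_pending = false;	/* item dequeued */
		mock_timeout_handler(gpu);
	}
	return gpu->resets;
}
```

With all three queues timing out at once, the first handler cancels the other two items before they run, so the model performs a single reset; with cancellation switched off it performs three, which is the failure mode Christian describes.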
Re: [PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr
On Wed, 8 Sep 2021 10:53:21 -0400
Andrey Grodzovsky wrote:

> > Note that we need to be careful to not call the sync
> > variant in helpers that are called from the interrupt handler itself
> > to avoid deadlocks (i.e. drm_sched_stop()).
>
>
> I am not clear here - which interrupt handler is drm_sched_stop
> called from ? It's called from TDR work as far as I see in the code.

My bad, I meant the timeout handler, not the interrupt handler.
Re: [PATCH] drm/bridge: ti-sn65dsi83: Check link status register after enabling the bridge
On 9/8/21 1:11 PM, Dave Stevenson wrote:
> Hi Marek and Andrzej

Hello Dave,

skipping the protocol discussion, which I hope Andrzej will pick up.

[...]

>>> Usually video transmission starts in crtc->enable (CRTC->Encoder), and
>>> in encoder->enable (encoder->bridge), so in bridges->enable it would be
>>> too late for LP11 state - transmission can be already in progress.
>>>
>>> It shows well that this order of calls does not fit well to DSI, and
>>> probably many other protocols.
>>>
>>> Maybe moving most of the bridge->enable code to bridge->pre_enable would
>>> help, but I am not sure if it will not pose another issues.
>>
>> Yep, that won't work e.g. with the exynos DSIM, because
>> exynos_dsi_set_display_mode() sets the data lanes to LP11.
>
> Isn't the bigger question for SN65DSI8[3|4|5] whether the clock lane
> is running or not in pre_enable?

I think the bigger question really is -- how do we cater for all the
different bridges with different init-time requirements.

>>> This is quick analysis, so please fix me if I am wrong.
>>
>> I pretty much agree that the current state of things does not fit with
>> DSI too well.
>
> That was why I was questioning how DSI was meant to be implemented in
> https://lore.kernel.org/dri-devel/capy8ntbukrksam59y+72dw_6xoekvswpwffzpj3uvge6pv4...@mail.gmail.com/
>
> The need to have the DSI host in a defined idle state (often LP-11,
> but varying whether the clock lane is in HS) before powering up the
> panel/bridge is incredibly common, but currently undefined in DRM.
>
> Taking the SN65DSI83 as an example, the datasheet [1] section 7.4.2
> states that the clock lane must be in HS mode, and data lanes in LP-11
> when coming out of reset. That means that we can't be "enable" as that
> will have the data lanes in HS mode and sending video, and as we can't
> be in "pre_enable" as the DSI PHY will be powered down and so we won't
> have the clock lanes in HS mode.
>
> I've hit a similar one with the Toshiba TC358762 where it seems to get
> upset if it is receiving video data when it gets configured.
> panel-raspberrypi-touchscreen[2] which drives that chip is
> intermittent when using panel enable, whereas panel prepare is
> significantly more reliable but relies on the DSI host being
> initialised to LP-11 by breaking the chain.

Right

To make it worse, not initing the DSI bridge exactly per spec leads to
intermittent failures, not consistently occurring ones.

> Dave
>
> [1] https://www.ti.com/lit/ds/symlink/sn65dsi83.pdf
> [2] https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c

Unrelated to this discussion -- there is a tc358762 driver, driver for
that attiny88 regulator, and driver for the touchscreen chip, on that
rpi 7" display, in upstream. You can use those to replace the composite
panel driver (it works at least against stm32mp1 DSI host with the rpi
7" panel). Sadly there is little documentation for that attiny88
protocol or firmware, that's what I don't like about that panel.
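The SN65DSI83 ordering problem quoted above can be written down as a small check. This is only a model of the argument in the thread, assuming a host whose PHY is powered down at pre_enable and which is already streaming video at enable; the mock_* names are made up for illustration and do not correspond to any DRM API.

```c
#include <assert.h>
#include <stdbool.h>

enum mock_lane_state {
	MOCK_LANE_OFF,	/* PHY powered down */
	MOCK_LANE_LP11,	/* low-power idle state */
	MOCK_LANE_HS,	/* high-speed mode */
};

struct mock_dsi_host {
	enum mock_lane_state clk;
	enum mock_lane_state data;
};

/* Host state when the bridge's pre_enable runs (assumed: PHY still off). */
static struct mock_dsi_host mock_host_at_pre_enable(void)
{
	return (struct mock_dsi_host){ MOCK_LANE_OFF, MOCK_LANE_OFF };
}

/* Host state when the bridge's enable runs (assumed: video is flowing). */
static struct mock_dsi_host mock_host_at_enable(void)
{
	return (struct mock_dsi_host){ MOCK_LANE_HS, MOCK_LANE_HS };
}

/* Requirement cited from the datasheet: clock in HS, data lanes in LP-11. */
static bool mock_sn65dsi83_reset_ok(const struct mock_dsi_host *h)
{
	assert(h);
	return h->clk == MOCK_LANE_HS && h->data == MOCK_LANE_LP11;
}
```

Neither callback sees the required combination under these assumptions, which is why the thread concludes that an explicit, defined idle state for the DSI host is missing from the call order.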
Re: [PATCH] drm/bridge: ti-sn65dsi83: Check link status register after enabling the bridge
On Wed, 8 Sept 2021 at 16:26, Marek Vasut wrote: > > On 9/8/21 1:11 PM, Dave Stevenson wrote: > > Hi Marek and Andrzej > > Hello Dave, > > skipping the protocol discussion, which I hope Andrej will pick up. > > [...] > > >>> Usually video transmission starts in crtc->enable (CRTC->Encoder), and > >>> in encoder->enable (encoder->bridge), so in bridges->enable it would be > >>> too late for LP11 state - transmission can be already in progress. > >>> > >>> It shows well that this order of calls does not fit well to DSI, and > >>> probably many other protocols. > >>> > >>> Maybe moving most of the bridge->enable code to bridge->pre_enable would > >>> help, but I am not sur if it will not pose another issues. > >> > >> Yep, that won't work e.g. with the exynos DSIM, because > >> exynos_dsi_set_display_mode() sets the data lanes to LP11. > > > > Isn't the bigger question for SN65DSI8[3|4|5] whether the clock lane > > is running or not in pre_enable? > > I think the bigger question really is -- how do we cater for all the > different bridges with different init-time requirements. > > >>> This is quick analysis, so please fix me if I am wrong. > >> > >> I pretty much agree that the current state of things does not fit with > >> DSI too well. > > > > That was why I was questioning how DSI was meant to be implemented in > > https://lore.kernel.org/dri-devel/capy8ntbukrksam59y+72dw_6xoekvswpwffzpj3uvge6pv4...@mail.gmail.com/ > > > > The need to have the DSI host in a defined idle state (often LP-11, > > but varying whether the clock lane is in HS) before powering up the > > panel/bridge is incredibly common, but currently undefined in DRM. > > > > Taking the SN65DSI83 as an example, the datasheet [1] section 7.4.2 > > states that the clock lane must be in HS mode, and data lanes in LP-11 > > when coming out of reset. 
That means that we can't be "enable" as that > > will have the data lanes in HS mode and sending video, and as we can't > > be in "pre_enable" as the DSI PHY will be powered down and so we won't > > have the clock lanes in HS mode. > > > > I've hit a similar one with the Toshiba TC358762 where it seems to get > > upset if it is receiving video data when it gets configured. > > panel-raspberrypi-touchscreen[2] which drives that chip is > > intermittent when using panel enable, whereas panel prepare is > > significantly more reliable but relies on the DSI host being > > initialised to LP-11 by breaking the chain. > > Right > > To make it worse, not initing the DSI bridge exactly per spec leads to > intermittent failures, not consistently occuring ones. Yes, I suspect it's been just down to timing as to whether the display side starts producing video data before or after all the configuration has been sent, and I get random LP commands timing out. (We're only dropping to LP in vertical blanking, so there isn't a huge amount of time). > >Dave > > > > [1] https://www.ti.com/lit/ds/symlink/sn65dsi83.pdf > > [2] > > https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c > > Unrelated to this discussion -- there is a tc358762 driver, driver for > that attiny88 regulator, and driver for the touchscreen chip, on that > rpi 7" display, in upstream. You can use those to replace the composite > panel driver (it works at least against stm32mp1 DSI host with the rpi > 7" panel). Sadly there is little documentation for that attiny88 > protocol or firmware, that's what I don't like about that panel. Thank you, I know they exist, and I'm looking at exactly that problem at the moment! panel-raspberrypi-touchscreen doesn't expose any form of regulator control, so trying to hook edt-ft54x6 on for the touchscreen sees it getting the power yanked from under it. I'm trying to switch to those drivers so that the two play nicely. 
The Atmel is a bit nasty in trying to initialise the bridge, panel, and touch all at the same time. The edt-ft54x6 driver generally probes first and powers everything up when the DSI host isn't initialised. This seems to upset the TC358762 and it then won't initialise. It is possible to poke most things manually through the PORTA, PORTB and PORTC commands, but I'm currently failing to create a reliable mechanism :-( I have the advantage that I have the source code for the Atmel (it's not nice) Dave
Re: Handling DRM master transitions cooperatively
On Wed, Sep 08, 2021 at 09:51:54AM +, Simon Ser wrote: > > On Tue, 07 Sep 2021 10:19:03 + > > Simon Ser wrote: > > > > > FWIW, I've just hit a case where a compositor leaves a "rotation" KMS > > > prop set behind, then Xorg tries to startup and fails because it doesn't > > > reset this prop. So none of this is theoretical. > > > > > > I still think a "reset all KMS props to an arbitrary default value" flag > > > in drmModeAtomicCommit is the best way forward. I'm not sure a user-space > > > protocol would help too much. > > > > Hi Simon, > > > > for the "reset KMS state" problem, sure. Thanks for confirming the > > problem, too. > > > > The hand-off problem does need userspace protocol though, so that the > > two parties can negotiate what part of KMS state can be inherited by > > the receiver and who will do the animation from the first to the second > > state in case you want to avoid abrupt changes. It would also be useful > > for a cross-fade as a perhaps more flexible way than the current "leak > > an FB, let the next KMS client scrape it via ioctls and copy it so it > > can be textured from". > > The KMS state can be limited to single FB on primary plane covering the whole > CRTC, no scaling, no other property set than FB_ID/CRTC_*/SRC_*. > > Is it useful to make the previous client perform the animation? I don't really > understand the use-case here. The use case for the animation is e.g. the transition from Plymouth to the display server. Currently it is done as a still frame transition, maybe with a blend-over effect. But with the current design it is not possible to blend Plymouth's animation over into another animation in the display server because the second client lacks the knowledge how to keep it going for a little bit. Another use case is switching between sessions which currently also is only possible as a still frame transition. However, if you wanted to present the session switching by doing e.g. 
a shaking screen animation and blending the old display content over into the new content then the first client would have to render the first half of the animation, and the second client would have to render the second half during which it would then blend away the content of the first screen while blending in its own content and also slowing the shaking to a stop. For that to work the second client would need all the information necessary to render that animation, and also a way to perform the frame-perfect change-over. Granted, that is a very complicated, eye-candy-oriented use case, but it would serve to show-case the potential of the design. Regards.
Re: Handling DRM master transitions cooperatively
On Tue, Sep 07, 2021 at 05:52:41PM +0200, Sebastian Wick wrote: > > On Tue, 07 Sep 2021 10:19:03 + > > Simon Ser wrote: > > > > > FWIW, I've just hit a case where a compositor leaves a "rotation" KMS > > > prop set behind, then Xorg tries to startup and fails because it doesn't > > > reset this prop. So none of this is theoretical. > > > > > > I still think a "reset all KMS props to an arbitrary default value" flag > > > in drmModeAtomicCommit is the best way forward. I'm not sure a user-space > > > protocol would help too much. > > > > Hi Simon, > > > > for the "reset KMS state" problem, sure. Thanks for confirming the > > problem, too. > > > > The hand-off problem does need userspace protocol though, so that the > > two parties can negotiate what part of KMS state can be inherited by > > the receiver and who will do the animation from the first to the second > > state in case you want to avoid abrupt changes. It would also be useful > > for a cross-fade as a perhaps more flexible way than the current "leak > > an FB, let the next KMS client scrape it via ioctls and copy it so it > > can be textured from". > > The state reset already is an implicit protocol. Another IPC mechanism > however could extend it to work the other way around: instead of > inheriting all the state and trying to transition from that to the > second client's desired state the second client would send its own > desired state back to the first (instead of applying it immediately) > which would then try to transition from its own state to the second > state (and if it can't you fall back to the implicit inherited state > protocol). However, this is only an improvement if the first client > knows how to do the transition and the second does not. All in all I > doubt that you can convince most people to add this kind of complexity > just for slightly higher chances of a good transition. 
> > The reset state protocol on the other hand solves real problems and > gives you a good transition as long as the second client knows about the > same properties as the previous one which usually is the case for the > typical bootsplash->login manager->compositor chain. > > Maybe I'm completely missing how such a protocol would work though. The idea was that since you would have to have some IPC mechanism in user space anyway to quickly effect a flicker-free transition from Plymouth to the display manager (since, as de Goede reiterates in the other message, both processes must have the device already open and call drmSetMaster/drmDropMaster coordinatedly) you might just as well look for ways how it could be designed for the benefit of everyone. Using "implicit protocols" for things like this is usually the go-to way, not because it's good design, but because it is easy to implement. But these "implicit protocols" have a tendency to greatly limit what can be done and to not be easily adaptable once the use cases become more complicated or refined, and thus they force contortions on everyone eventually. How such a protocol could look? I don't know. Maybe some DBus interface for a broker/multiplexer for shared devices that would keep track of the current DRM master and tell any process interested in obtaining it what process to talk to. It could then contact it either via DBus or over a separate socket, communicate its capabilities, negotiate the modalities for the transition and acquire the necessary resources in the form of file descriptors passed over DBus/the socket. Then both processes could set themselves up for the transition and effect it, which could involve e.g. unlocking a locked mutex/semaphore in shared memory. Alternatively, the donor could refuse the handover, e.g. if a screen locker is configured to prohibit release of the device. Complexitywise the sky would be the limit, of course, but it needn't be this complicated from the beginning. 
An initial version of such a protocol could be held just as simple as the status quo. As for the point raised by Paalanen that implementing something like this would require a lot of effort I must state that, while certainly true, many of the things I mentioned here are already implemented somehow somewhere. Plymouth has a control socket and protocol with which the state of the splash screen can be controlled from the outside to make the transition to gdm smoother. The xlease project apparently was designed with the intent that DRM devices should be leased (and subleased) out to processes, and cross-process coordination would be governed this way. The kmscon project also had to come up with something to govern device access since it could no longer piggy-back on the VT-API. systemd-logind also draws up a framework for governance over a shared device and how to tie them to sessions/seats (with such peculiarities that you cannot auto-spawn a getty on tty1 since that is "reserved" for Wayland). Then there is the VT console, and probably lots of other little things I don't even know about.
Re: Handling DRM master transitions cooperatively
On Wed, Sep 8, 2021 at 9:36 AM Pekka Paalanen wrote: > > On Tue, 7 Sep 2021 14:42:56 +0200 > Hans de Goede wrote: > > > Hi, > > > > On 9/7/21 12:07 PM, Pekka Paalanen wrote: > > > On Fri, 3 Sep 2021 21:08:21 +0200 > > > Dennis Filder wrote: > > > > > >> Hans de Goede asked me to take a topic from a private discussion here. > > >> I must also preface that I'm not a graphics person and my knowledge of > > >> DRI/DRM is cursory at best. > > >> > > >> I initiated the conversation with de Goede after learning that the X > > >> server now supports being started with an open DRM file descriptor > > >> (this was added for Keith Packard's xlease project). I wondered if > > >> that could be used to smoothen the Plymouth->X transition somehow and > > >> asked de Goede if there were any such plans. He denied, but mentioned > > >> that a new ioctl is in the works to prevent the kernel from wiping the > > >> contents of a frame buffer after a device is closed, and that this > > >> would help to keep transitions smooth. > > > > > > Hi, > > > > > > I believe the kernel is not wiping anything on device close. If > > > something in the KMS state is wiped, it originates in userspace: > > > > > > - Plymouth doing something (e.g. RmFB on an in-use FB will turn the > > > output off, you need to be careful to "leak" your FB if you want a > > > smooth hand-over) > > > > The "kernel is not wiping anything on device close" is not true, > > when closing /dev/dri/card# any remaining FBs from the app closing > > it will be dealt with as if they were RmFB-ed, causing the screen > > to show what I call "the fallback fb", at least with the i915 driver. > > No, that's not what should happen AFAIK. 
> > True, all FBs that are not referenced by active CRTCs or planes will > get freed, since their refcount drops to zero, but those CRTCs and > planes that are active will remain active and therefore keep their > reference to the respective FBs and so the FBs remain until replaced or > turned off explicitly (by e.g. fbcon if you switch to that rather than > another userspace KMS client). I believe that is the whole reason why > e.g. DRM_IOCTL_MODE_GETFB2 can be useful, otherwise the next KMS client > would not have anything to scrape. > > danvet, what is the DRM core intention? Historical accidents mostly. There's two things that foil easy handover to the next compositor: - RMFB instead of CLOSEFB semantics, especially when closing the drmfd. This is uapi, so anything we change needs to be opt-in - Forced fbdev restore on final close of all drm fd. This is only prevented if there's a drm master left around (systemd-logind can keep that instead of forcing the compositor to survive until the other has taken over, which it needs to do anyway to prevent the drm master handover from going sideways). This can be fixed by simply disabling fbdev completely, which you really want to do anyway. Again it's uabi, people will complain if we break this I think. > Or am I confused because display servers do not tend to close the DRM > device fd on switch-out but Plymouth does (too early)? Yeah, that stops both forced restore/disable from kicking in. > If so, why can't Plymouth keep the device open longer and quit only > when the hand-off is complete? Not quitting too early would be a > prerequisite for any explicit hand-off protocol as well. With closefb semantics and fbdev disabled plymouth could quit early, and things still work. -Daniel > > Thanks, > pq > > > > - Xorg doing something (e.g. resetting instead of inheriting KMS state) > > > > > > - Something missed in the hand-off sequence which allows fbcon to > > > momentarily take over between Plymouth and Xorg. 
This would need to > > > be fixed between Plymouth and Xorg. > > > > > > - Maybe systemd-logind does something odd to the KMS device? It has > > > pretty wild code there. Or maybe it causes fbcon to take over. > > > > > > What is the new ioctl you referred to? > > > > It is an ioctl to mark a FB to not have it auto-removed on device-close, > > instead leaving it in place until some some kernel/userspace client > > actively installs another FB. This was proposed by Rob Clark quite > > a while ago, but it never got anywhere because of lack of userspace > > actually interested in using it. > > > > I've been thinking about reviving Rob's patch, since at least for > > plymouth this would be pretty useful to have. > > > > Regards, > > > > Hans > > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [Intel-gfx] [PATCH 1/8] drm/i915/xehp: Define compute class and engine
On Tue, Sep 07, 2021 at 10:19:09AM -0700, Matt Roper wrote: > Introduce a Compute Command Streamer (CCS), which has access to > the media and GPGPU pipelines (but not the 3D pipeline). > > To begin with, define the compute class/engine common functions, based > on the existing render ones. > > Bspec: 46167, 45544 > Original-patch-by: Michel Thierry > Cc: Daniele Ceraolo Spurio > Cc: Tvrtko Ursulin > Cc: Vinay Belgaumkar > Cc: Szymon Morek > UMD (compute): https://github.com/intel/compute-runtime/pull/451 > Signed-off-by: Rodrigo Vivi > Signed-off-by: Daniele Ceraolo Spurio > Signed-off-by: Aravind Iddamsetty > Signed-off-by: Matt Roper > --- > drivers/gpu/drm/i915/gt/intel_engine_cs.c| 28 > drivers/gpu/drm/i915/gt/intel_engine_types.h | 9 ++- > drivers/gpu/drm/i915/gt/intel_engine_user.c | 5 +++- > drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h | 13 + > drivers/gpu/drm/i915/i915_reg.h | 8 ++ > include/uapi/drm/i915_drm.h | 1 + > 6 files changed, 57 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > index 332efea696a5..69944bd8c19d 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > @@ -153,6 +153,34 @@ static const struct engine_info intel_engines[] = { > { .graphics_ver = 12, .base = XEHP_VEBOX4_RING_BASE } > }, > }, > + [CCS0] = { > + .class = COMPUTE_CLASS, > + .instance = 0, > + .mmio_bases = { > + { .graphics_ver = 12, .base = GEN12_COMPUTE0_RING_BASE } > + } > + }, > + [CCS1] = { > + .class = COMPUTE_CLASS, > + .instance = 1, > + .mmio_bases = { > + { .graphics_ver = 12, .base = GEN12_COMPUTE1_RING_BASE } > + } > + }, > + [CCS2] = { > + .class = COMPUTE_CLASS, > + .instance = 2, > + .mmio_bases = { > + { .graphics_ver = 12, .base = GEN12_COMPUTE2_RING_BASE } > + } > + }, > + [CCS3] = { > + .class = COMPUTE_CLASS, > + .instance = 3, > + .mmio_bases = { > + { .graphics_ver = 12, .base = GEN12_COMPUTE3_RING_BASE } > + } > + 
}, > }; > > /** > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h > b/drivers/gpu/drm/i915/gt/intel_engine_types.h > index bfbfe53c23dd..dcb9d8b2362a 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h > @@ -33,7 +33,8 @@ > #define VIDEO_ENHANCEMENT_CLASS 2 > #define COPY_ENGINE_CLASS3 > #define OTHER_CLASS 4 > -#define MAX_ENGINE_CLASS 4 > +#define COMPUTE_CLASS5 > +#define MAX_ENGINE_CLASS 5 > #define MAX_ENGINE_INSTANCE 7 > > #define I915_MAX_SLICES 3 > @@ -95,6 +96,7 @@ struct i915_ctx_workarounds { > > #define I915_MAX_VCS 8 > #define I915_MAX_VECS4 > +#define I915_MAX_CCS 4 > > /* > * Engine IDs definitions. > @@ -117,6 +119,11 @@ enum intel_engine_id { > VECS2, > VECS3, > #define _VECS(n) (VECS0 + (n)) > + CCS0, > + CCS1, > + CCS2, > + CCS3, > +#define _CCS(n) (CCS0 + (n)) > I915_NUM_ENGINES > #define INVALID_ENGINE ((enum intel_engine_id)-1) > }; > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c > b/drivers/gpu/drm/i915/gt/intel_engine_user.c > index 8f8bea08e734..d981621a7c30 100644 > --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c > +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c > @@ -47,6 +47,7 @@ static const u8 uabi_classes[] = { > [COPY_ENGINE_CLASS] = I915_ENGINE_CLASS_COPY, > [VIDEO_DECODE_CLASS] = I915_ENGINE_CLASS_VIDEO, > [VIDEO_ENHANCEMENT_CLASS] = I915_ENGINE_CLASS_VIDEO_ENHANCE, > + [COMPUTE_CLASS] = I915_ENGINE_CLASS_COMPUTE, > }; > > static int engine_cmp(void *priv, const struct list_head *A, > @@ -139,6 +140,7 @@ const char *intel_engine_class_repr(u8 class) > [COPY_ENGINE_CLASS] = "bcs", > [VIDEO_DECODE_CLASS] = "vcs", > [VIDEO_ENHANCEMENT_CLASS] = "vecs", > + [COMPUTE_CLASS] = "ccs", > }; > > if (class >= ARRAY_SIZE(uabi_names) || !uabi_names[class]) > @@ -162,6 +164,7 @@ static int legacy_ring_idx(const struct legacy_ring *ring) > [COPY_ENGINE_CLASS] = { BCS0, 1 }, > [VIDEO_DECODE_CLASS] = { VCS0, I915_MAX_VCS }, > [VIDEO_ENHANCEMENT_CLASS] = { VECS0, 
I915_MAX_VECS }, > + [COMPUTE_CLASS] = { CCS0, I915_MAX_CCS }, > }; > > if (GEM_DEBUG_WARN_ON(ring->class >= ARRAY_SIZE(map))) > @@ -190,7 +193,7 @@ static void add_legacy_ring(struct legacy_ring *ring, > void intel_engines_driver_register(struct drm_i915_private *i915) > { >
Re: [Intel-gfx] [PATCH 2/8] drm/i915/xehp: CCS shares the render reset domain
On Tue, Sep 07, 2021 at 10:19:10AM -0700, Matt Roper wrote: > The reset domain is shared between render and all compute engines, > so resetting one will affect the others. > > Note: Before performing a reset on an RCS or CCS engine, the GuC will > attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid > impacting other clients (since some shared modules will be reset). If > other engines are executing non-preemptable workloads, the impact is > unavoidable and some work may be lost. > > Bspec: 52549 > Original-patch-by: Michel Thierry > Cc: Tvrtko Ursulin > Cc: Vinay Belgaumkar > Signed-off-by: Daniele Ceraolo Spurio > Signed-off-by: Aravind Iddamsetty > Signed-off-by: Matt Roper Do we have igts validating all of this properly? Specifically, that the reset stats are incremented correctly for guilty and victimized contexts, respectively. Such tests are necessary if they don't exist yet. Also you need a patch set here that fixes up the igts which have wrong assumptions about context isolation. -Daniel > --- > drivers/gpu/drm/i915/gt/intel_reset.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c > b/drivers/gpu/drm/i915/gt/intel_reset.c > index 91200c43951f..30598c1d070c 100644 > --- a/drivers/gpu/drm/i915/gt/intel_reset.c > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c > @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt, > [VECS1] = GEN11_GRDOM_VECS2, > [VECS2] = GEN11_GRDOM_VECS3, > [VECS3] = GEN11_GRDOM_VECS4, > + [CCS0] = GEN11_GRDOM_RENDER, > + [CCS1] = GEN11_GRDOM_RENDER, > + [CCS2] = GEN11_GRDOM_RENDER, > + [CCS3] = GEN11_GRDOM_RENDER, > }; > struct intel_engine_cs *engine; > intel_engine_mask_t tmp; > -- > 2.25.4 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
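To make the shared reset domain in the patch above concrete, here is a small userspace sketch. It is only an illustration under assumptions: the engine list is a subset, the GRDOM_* bit values are made up rather than taken from the hardware, and engines_sharing_reset() is a hypothetical helper, not an i915 function.

```c
#include <stdint.h>

/* Hypothetical engine ids; only a subset of the real i915 list. */
enum engine_id { RCS0, VECS0, CCS0, CCS1, CCS2, CCS3, NUM_ENGINES };

/* Illustrative reset-domain bits; the values are NOT the hardware ones. */
#define GRDOM_RENDER (1u << 0)
#define GRDOM_VECS   (1u << 1)

/* Every compute engine maps to the render domain, as in the patch. */
static const uint32_t reset_domain[NUM_ENGINES] = {
    [RCS0]  = GRDOM_RENDER,
    [VECS0] = GRDOM_VECS,
    [CCS0]  = GRDOM_RENDER,
    [CCS1]  = GRDOM_RENDER,
    [CCS2]  = GRDOM_RENDER,
    [CCS3]  = GRDOM_RENDER,
};

/* Return a bitmask of all engines that get reset together with @id. */
static uint32_t engines_sharing_reset(enum engine_id id)
{
    uint32_t mask = 0;
    int i;

    for (i = 0; i < NUM_ENGINES; i++) {
        if (reset_domain[i] & reset_domain[id])
            mask |= 1u << i;
    }
    return mask;
}
```

With this table, resetting any one of RCS0 or CCS0..CCS3 yields the same victim mask, which is the situation the igt reset-stats tests asked for above would have to cover.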
Re: [PATCH] Change igt_log level from IGT_LOG_WARN to IGT_LOG_INFO
On Wed, Sep 08, 2021 at 12:03:56AM +0530, Jeevan B wrote: > change igt_warn to igt_info when unloading the snd module before > unbinding i915 until WA is fixed. > > Signed-off-by: Jeevan B Please submit per https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/CONTRIBUTING.md#sending-patches No one (not even CI) picks up igt patches submitted to dri-devel. -Daniel > --- > tests/core_hotunplug.c | 2 +- > tests/device_reset.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c > index 2d73e27f..b3661668 100644 > --- a/tests/core_hotunplug.c > +++ b/tests/core_hotunplug.c > @@ -164,7 +164,7 @@ static void driver_unbind(struct hotunplug *priv, const > char *prefix, > igt_lsof("/dev/snd"); > igt_skip("Audio is in use, skipping\n"); > } else { > - igt_warn("Preventively unloaded snd_hda_intel\n"); > + igt_info("Preventively unloaded snd_hda_intel\n"); > } > } > > diff --git a/tests/device_reset.c b/tests/device_reset.c > index e6a468e6..982ba5ef 100644 > --- a/tests/device_reset.c > +++ b/tests/device_reset.c > @@ -201,7 +201,7 @@ static void driver_unbind(struct device_fds *dev) > igt_lsof("/dev/snd"); > igt_skip("Audio is in use, skipping\n"); > } else { > - igt_warn("Preventively unloaded snd_hda_intel\n"); > + igt_info("Preventively unloaded snd_hda_intel\n"); > } > } > > -- > 2.19.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH v2 (repost)] fbmem: don't allow too huge resolutions
On Wed, Sep 08, 2021 at 07:27:49PM +0900, Tetsuo Handa wrote: > syzbot is reporting page fault at vga16fb_fillrect() [1], for > vga16fb_check_var() is failing to detect multiplication overflow. > > if (vxres * vyres > maxmem) { > vyres = maxmem / vxres; > if (vyres < yres) > return -ENOMEM; > } > > Since no module would accept too huge resolutions where multiplication > overflow happens, let's reject in the common path. > > Link: https://syzkaller.appspot.com/bug?extid=04168c8063cfdde1db5e [1] > Reported-by: syzbot > Debugged-by: Randy Dunlap > Signed-off-by: Tetsuo Handa > Reviewed-by: Geert Uytterhoeven > --- > Changes in v2: > Use check_mul_overflow(), suggested by Geert Uytterhoeven > . Pushed to drm-misc-next-fixes so it should get into the current merge window. I also added a cc: stable here, I think it's needed. Thanks a lot to both you and Geert for handling this! -Daniel > > drivers/video/fbdev/core/fbmem.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/drivers/video/fbdev/core/fbmem.c > b/drivers/video/fbdev/core/fbmem.c > index 71fb710f1ce3..7420d2c16e47 100644 > --- a/drivers/video/fbdev/core/fbmem.c > +++ b/drivers/video/fbdev/core/fbmem.c > @@ -962,6 +962,7 @@ fb_set_var(struct fb_info *info, struct fb_var_screeninfo > *var) > struct fb_var_screeninfo old_var; > struct fb_videomode mode; > struct fb_event event; > + u32 unused; > > if (var->activate & FB_ACTIVATE_INV_MODE) { > struct fb_videomode mode1, mode2; > @@ -1008,6 +1009,11 @@ fb_set_var(struct fb_info *info, struct > fb_var_screeninfo *var) > if (var->xres < 8 || var->yres < 8) > return -EINVAL; > > + /* Too huge resolution causes multiplication overflow. */ > + if (check_mul_overflow(var->xres, var->yres, &unused) || > + check_mul_overflow(var->xres_virtual, var->yres_virtual, &unused)) > + return -EINVAL; > + > ret = info->fbops->fb_check_var(var, info); > > if (ret) > -- > 2.18.4 > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
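The overflow check added to fb_set_var() above can be tried out in userspace. The following is a minimal sketch, not the kernel code: it assumes the GCC/Clang builtin __builtin_mul_overflow as a stand-in for the kernel's check_mul_overflow(), and the helper names mul_overflows_u32() and resolution_overflows() are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's check_mul_overflow(); assumes the
 * GCC/Clang builtin __builtin_mul_overflow is available. Returns true
 * when a * b does not fit into 32 bits.
 */
static bool mul_overflows_u32(uint32_t a, uint32_t b)
{
    uint32_t unused;

    return __builtin_mul_overflow(a, b, &unused);
}

/* Hypothetical helper mirroring the two checks added to fb_set_var(). */
static bool resolution_overflows(uint32_t xres, uint32_t yres,
                                 uint32_t xres_virtual, uint32_t yres_virtual)
{
    return mul_overflows_u32(xres, yres) ||
           mul_overflows_u32(xres_virtual, yres_virtual);
}
```

A 65536x65536 mode is the smallest square resolution whose product (2^32) no longer fits in a u32, so a driver-side test like `vxres * vyres > maxmem` would see a wrapped value of 0 and wrongly pass; rejecting it in the common path avoids that.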
Re: [Intel-gfx] [PATCH v2] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup
On Thu, Sep 02, 2021 at 04:01:40PM +0100, Tvrtko Ursulin wrote: > > On 02/09/2021 15:33, Daniel Vetter wrote: > > On Tue, Aug 31, 2021 at 02:18:15PM +0100, Tvrtko Ursulin wrote: > > > > > > On 31/08/2021 13:43, Daniel Vetter wrote: > > > > On Tue, Aug 31, 2021 at 10:15:03AM +0100, Tvrtko Ursulin wrote: > > > > > > > > > > On 30/08/2021 09:26, Daniel Vetter wrote: > > > > > > On Fri, Aug 27, 2021 at 03:44:42PM +0100, Tvrtko Ursulin wrote: > > > > > > > > > > > > > > On 27/08/2021 15:39, Tvrtko Ursulin wrote: > > > > > > > > From: Tvrtko Ursulin > > > > > > > > > > > > > > > > In short this makes i915 work for hybrid setups (DRI_PRIME=1 > > > > > > > > with Mesa) > > > > > > > > when rendering is done on Intel dgfx and scanout/composition on > > > > > > > > Intel > > > > > > > > igfx. > > > > > > > > > > > > > > > > Before this patch the driver was not quite ready for that > > > > > > > > setup, mainly > > > > > > > > because it was able to emit a semaphore wait between the two > > > > > > > > GPUs, which > > > > > > > > results in deadlocks because semaphore target location in HWSP > > > > > > > > is neither > > > > > > > > shared between the two, nor mapped in both GGTT spaces. > > > > > > > > > > > > > > > > To fix it the patch adds an additional check to a couple of > > > > > > > > relevant code > > > > > > > > paths in order to prevent using semaphores for inter-engine > > > > > > > > synchronisation between different driver instances. > > > > > > > > > > > > > > > > Patch also moves singly used i915_gem_object_last_write_engine > > > > > > > > to be > > > > > > > > private in its only calling unit (debugfs), while modifying it > > > > > > > > to only > > > > > > > > show activity belonging to the respective driver instance. > > > > > > > > > > > > > > > > What remains in this problem space is the question of the GEM > > > > > > > > busy ioctl. 
> > > > > > > > We have a somewhat ambigous comment there saying only status of > > > > > > > > native > > > > > > > > fences will be reported, which could be interpreted as either > > > > > > > > i915, or > > > > > > > > native to the drm fd. For now I have decided to leave that as > > > > > > > > is, meaning > > > > > > > > any i915 instance activity continues to be reported. > > > > > > > > > > > > > > > > v2: > > > > > > > > * Avoid adding rq->i915. (Chris) > > > > > > > > > > > > > > > > Signed-off-by: Tvrtko Ursulin > > > > > > > > > > > > Can't we just delete semaphore code and done? > > > > > > - GuC won't have it > > > > > > - media team benchmarked on top of softpin media driver, found no > > > > > > difference > > > > > > > > > > You have S-curve for saturated workloads or something else? How > > > > > thorough and > > > > > which media team I guess. > > > > > > > > > > From memory it was a nice win for some benchmarks (non-saturated > > > > > ones), but > > > > > as I have told you previously, we haven't been putting numbers in > > > > > commit > > > > > messages since it wasn't allowed. I may be able to dig out some more > > > > > details > > > > > if I went trawling through GEM channel IRC logs, although probably > > > > > not the > > > > > actual numbers since those were usually on pastebin. Or you go an > > > > > talk with > > > > > Chris since he probably remembers more details. Or you just decide > > > > > you don't > > > > > care and remove it. I wouldn't do that without putting the complete > > > > > story in > > > > > writing, but it's your call after all. > > > > > > > > Media has also changed, they're not using relocations anymore. > > > > > > Meaning you think it changes the benchmarking story? When coupled with > > > removal of GPU relocations then possibly yes. > > > > > > > Unless there's solid data performance tuning of any kind that gets in > > > > the > > > > way simply needs to be removed. 
Yes this is radical, but the codebase is > > > > in a state to require this. > > > > > > > > So either way we'd need to rebenchmark this if it's really required. > > > > Also > > > > > > Therefore can you share what benchmarks have been done or is it secret? > > > As > > > said, I think the non-saturated case was the more interesting one here. > > > > > > > if we really need this code still someone needs to fix the design, the > > > > current code is making layering violations an art form. > > > > > > > > > Anyway, without the debugfs churn it is more or less two line patch > > > > > to fix > > > > > igfx + dgfx hybrid setup. So while mulling it over this could go in. > > > > > I'd > > > > > just refine it to use a GGTT check instead of GT. And unless DG1 ends > > > > > up > > > > > being GuC only. > > > > > > > > The minimal robust fix here is imo to stop us from upcasting dma_fence > > > > to > > > > i915_request if it's not for our device. Not sprinkle code here into the > > > > semaphore code. We shouldn't even get this far with foreign fences. > > > > > > Device check does not w
Re: [PATCH] drm/ttm: provide default page protection for UML
On Sat, Sep 04, 2021 at 11:50:37AM +0800, David Gow wrote: > On Thu, Sep 2, 2021 at 10:46 PM Daniel Vetter wrote: > > > > On Thu, Sep 02, 2021 at 07:19:01AM +0100, Anton Ivanov wrote: > > > On 02/09/2021 06:52, Randy Dunlap wrote: > > > > On 9/1/21 10:48 PM, Anton Ivanov wrote: > > > > > On 02/09/2021 03:01, Randy Dunlap wrote: > > > > > > boot_cpu_data [struct cpuinfo_um (on UML)] does not have a struct > > > > > > member named 'x86', so provide a default page protection mode > > > > > > for CONFIG_UML. > > > > > > > > > > > > Mends this build error: > > > > > > ../drivers/gpu/drm/ttm/ttm_module.c: In function > > > > > > ‘ttm_prot_from_caching’: > > > > > > ../drivers/gpu/drm/ttm/ttm_module.c:59:24: error: ‘struct > > > > > > cpuinfo_um’ has no member named ‘x86’ > > > > > >else if (boot_cpu_data.x86 > 3) > > > > > > ^ > > > > > > > > > > > > Fixes: 3bf3710e3718 ("drm/ttm: Add a generic TTM memcpy move for > > > > > > page-based iomem") > > > > > > Signed-off-by: Randy Dunlap > > > > > > Cc: Thomas Hellström > > > > > > Cc: Christian König > > > > > > Cc: Huang Rui > > > > > > Cc: dri-devel@lists.freedesktop.org > > > > > > Cc: Jeff Dike > > > > > > Cc: Richard Weinberger > > > > > > Cc: Anton Ivanov > > > > > > Cc: linux...@lists.infradead.org > > > > > > Cc: David Airlie > > > > > > Cc: Daniel Vetter > > > > > > --- > > > > > > drivers/gpu/drm/ttm/ttm_module.c |4 > > > > > > 1 file changed, 4 insertions(+) > > > > > > > > > > > > --- linux-next-20210901.orig/drivers/gpu/drm/ttm/ttm_module.c > > > > > > +++ linux-next-20210901/drivers/gpu/drm/ttm/ttm_module.c > > > > > > @@ -53,6 +53,9 @@ pgprot_t ttm_prot_from_caching(enum ttm_ > > > > > > if (caching == ttm_cached) > > > > > > return tmp; > > > > > > +#ifdef CONFIG_UML > > > > > > +tmp = pgprot_noncached(tmp); > > > > > > +#else > > > > > > #if defined(__i386__) || defined(__x86_64__) > > > > > > if (caching == ttm_write_combined) > > > > > > tmp = pgprot_writecombine(tmp); > > > > > > @@ -69,6 +72,7 @@ 
pgprot_t ttm_prot_from_caching(enum ttm_ > > > > > > #if defined(__sparc__) > > > > > > tmp = pgprot_noncached(tmp); > > > > > > #endif > > > > > > +#endif > > > > > > return tmp; > > > > > > } > > > > > > > > > > Patch looks OK. > > > > > > > > > > I have a question though - why all of DRM is not !UML in config. Not > > > > > like we can use them. > > > > > > > > I have no idea about that. > > > > Hopefully one of the (other) UML maintainers can answer you. > > > > > > Touche. > > > > > > We will discuss that and possibly push a patch to !UML that part of the > > > tree. IMHO it is not applicable. > > > > I thought kunit is based on top of uml, and we do want to eventually adopt > > that. Especially for helper libraries like ttm. > > UML is not actually a dependency for KUnit, so it's definitely > possible to test things which aren't compatible with UML. (In fact, > there's even now some tooling support to use qemu instead on a number > of architectures.) > > That being said, the KUnit tooling does use UML by default, so if it's > not too difficult to keep some level of UML support, it'll make it a > little easier (and faster) for people to run any KUnit tests. Yeah my understanding is that uml is the quickest way to spawn a new kernel, which kunit needs to run. And I really do like that idea, because having virtualization support in cloud CI systems (which use containers themselves) is a bit a fun exercise. The less we rely on virtual machines in containers for that, the better. Hence why I really like the uml approach for kunit. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH][next] drm/i915: clean up inconsistent indenting
On Thu, Sep 02, 2021 at 10:57:37PM +0100, Colin King wrote: > From: Colin Ian King > > There is a statement that is indented one character too deeply, > clean this up. > > Signed-off-by: Colin Ian King Queued to drm-intel-gt-next, thanks for the patch. -Daniel > --- > drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > index de5f9c86b9a4..aeb324b701ec 100644 > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c > @@ -2565,7 +2565,7 @@ __execlists_context_pre_pin(struct intel_context *ce, > if (!__test_and_set_bit(CONTEXT_INIT_BIT, &ce->flags)) { > lrc_init_state(ce, engine, *vaddr); > > - __i915_gem_object_flush_map(ce->state->obj, 0, > engine->context_size); > + __i915_gem_object_flush_map(ce->state->obj, 0, > engine->context_size); > } > > return 0; > -- > 2.32.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
Re: [PATCH] drm/sched: Fix drm_sched_fence_free() so it can be passed an uninitialized fence
On Fri, Sep 03, 2021 at 02:05:54PM +0200, Boris Brezillon wrote: > drm_sched_job_cleanup() will pass an uninitialized fence to > drm_sched_fence_free(), which will cause to_drm_sched_fence() to return > a NULL fence object, causing a NULL pointer deref when this NULL object > is passed to kmem_cache_free(). > > Let's create a new drm_sched_fence_free() function that takes a > drm_sched_fence pointer and suffix the old function with _rcu. While at > it, complain if drm_sched_fence_free() is passed an initialized fence > or if drm_sched_fence_free_rcu() is passed an uninitialized fence. > > Fixes: dbe48d030b28 ("drm/sched: Split drm_sched_job_init") > Signed-off-by: Boris Brezillon > --- > Found while debugging another issue in panfrost causing a failure in > the submit ioctl and exercising the error path (path that calls > drm_sched_job_cleanup() on an unarmed job). Reviewed-by: Daniel Vetter I already provided an irc r-b, just here for the record too. -Daniel > --- > drivers/gpu/drm/scheduler/sched_fence.c | 29 - > drivers/gpu/drm/scheduler/sched_main.c | 2 +- > include/drm/gpu_scheduler.h | 2 +- > 3 files changed, 21 insertions(+), 12 deletions(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c > b/drivers/gpu/drm/scheduler/sched_fence.c > index db3fd1303fc4..7fd869520ef2 100644 > --- a/drivers/gpu/drm/scheduler/sched_fence.c > +++ b/drivers/gpu/drm/scheduler/sched_fence.c > @@ -69,19 +69,28 @@ static const char > *drm_sched_fence_get_timeline_name(struct dma_fence *f) > return (const char *)fence->sched->name; > } > > -/** > - * drm_sched_fence_free - free up the fence memory > - * > - * @rcu: RCU callback head > - * > - * Free up the fence memory after the RCU grace period. 
> - */ > -void drm_sched_fence_free(struct rcu_head *rcu) > +static void drm_sched_fence_free_rcu(struct rcu_head *rcu) > { > struct dma_fence *f = container_of(rcu, struct dma_fence, rcu); > struct drm_sched_fence *fence = to_drm_sched_fence(f); > > - kmem_cache_free(sched_fence_slab, fence); > + if (!WARN_ON_ONCE(!fence)) > + kmem_cache_free(sched_fence_slab, fence); > +} > + > +/** > + * drm_sched_fence_free - free up an uninitialized fence > + * > + * @fence: fence to free > + * > + * Free up the fence memory. Should only be used if drm_sched_fence_init() > + * has not been called yet. > + */ > +void drm_sched_fence_free(struct drm_sched_fence *fence) > +{ > + /* This function should not be called if the fence has been > initialized. */ > + if (!WARN_ON_ONCE(fence->sched)) > + kmem_cache_free(sched_fence_slab, fence); > } > > /** > @@ -97,7 +106,7 @@ static void drm_sched_fence_release_scheduled(struct > dma_fence *f) > struct drm_sched_fence *fence = to_drm_sched_fence(f); > > dma_fence_put(fence->parent); > - call_rcu(&fence->finished.rcu, drm_sched_fence_free); > + call_rcu(&fence->finished.rcu, drm_sched_fence_free_rcu); > } > > /** > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index fbbd3b03902f..6987d412a946 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -750,7 +750,7 @@ void drm_sched_job_cleanup(struct drm_sched_job *job) > dma_fence_put(&job->s_fence->finished); > } else { > /* aborted job before committing to run it */ > - drm_sched_fence_free(&job->s_fence->finished.rcu); > + drm_sched_fence_free(job->s_fence); > } > > job->s_fence = NULL; > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > index 7f77a455722c..f011e4c407f2 100644 > --- a/include/drm/gpu_scheduler.h > +++ b/include/drm/gpu_scheduler.h > @@ -509,7 +509,7 @@ struct drm_sched_fence *drm_sched_fence_alloc( > struct drm_sched_entity *s_entity, void *owner); 
> void drm_sched_fence_init(struct drm_sched_fence *fence, > struct drm_sched_entity *entity); > -void drm_sched_fence_free(struct rcu_head *rcu); > +void drm_sched_fence_free(struct drm_sched_fence *fence); > > void drm_sched_fence_scheduled(struct drm_sched_fence *fence); > void drm_sched_fence_finished(struct drm_sched_fence *fence); > -- > 2.31.1 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
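The failure mode described in the commit message above can be modelled in userspace. This is a simplified sketch under assumptions, not the scheduler code: the *_model types, to_sched_fence() and may_free_directly() are hypothetical stand-ins, and the ops-pointer comparison is assumed to reflect how to_drm_sched_fence() tells its two embedded fences apart.

```c
#include <stddef.h>

/* Userspace copy of the usual container_of() idiom. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct fence_ops { const char *name; };
struct dma_fence_model { const struct fence_ops *ops; };

struct sched_fence_model {
    struct dma_fence_model scheduled;
    struct dma_fence_model finished;
    void *sched;    /* stays NULL until the fence is initialized */
};

static const struct fence_ops ops_scheduled = { "scheduled" };
static const struct fence_ops ops_finished  = { "finished" };

/*
 * Upcast helper: it can only identify the container through the ops
 * pointer, so a fence whose ops were never set yields NULL.
 */
static struct sched_fence_model *to_sched_fence(struct dma_fence_model *f)
{
    if (f->ops == &ops_scheduled)
        return container_of(f, struct sched_fence_model, scheduled);
    if (f->ops == &ops_finished)
        return container_of(f, struct sched_fence_model, finished);
    return NULL;
}

/* Model of the new direct free path: valid only before initialization. */
static int may_free_directly(const struct sched_fence_model *fence)
{
    return fence->sched == NULL;
}
```

For a zeroed fence the upcast returns NULL, which is why the cleanup path for an unarmed job has to be handed the drm_sched_fence pointer itself rather than going through the RCU callback.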
Re: [PATCH] drm/i915/request: fix early tracepoints
On Fri, Sep 03, 2021 at 12:24:05PM +0100, Matthew Auld wrote: > Currently we blow up in trace_dma_fence_init, when calling into > get_driver_name or get_timeline_name, since both the engine and context > might be NULL(or contain some garbage address) in the case of newly > allocated slab objects via the request ctor. Note that we also use > SLAB_TYPESAFE_BY_RCU here, which allows requests to be immediately > freed, but delay freeing the underlying page by an RCU grace period. > With this scheme requests can be re-allocated, at the same time as they > are also being read by some lockless RCU lookup mechanism. > > One possible fix, since we don't yet have a fully initialised request > when in the ctor, is just setting the context/engine as NULL and adding > some extra handling in get_driver_name etc. And since the ctor is only > called for new slab objects(i.e allocate new page and call the ctor for > each object) it's safe to reset the context/engine prior to calling into > dma_fence_init, since we can be certain that no one is doing an RCU > lookup which might depend on peeking at the engine/context, like in > active_engine(), since the object can't yet be externally visible. > > In the recycled case(which might also be externally visible) the request > refcount always transitions from 0->1 after we set the context/engine > etc, which should ensure it's valid to dereference the engine for > example, when doing an RCU list-walk, so long as we can also increment > the refcount first. If the refcount is already zero, then the request is > considered complete/released. If it's non-zero, then the request might > be in the process of being re-allocated, or potentially still in flight, > however after successfully incrementing the refcount, it's possible to > carefully inspect the request state, to determine if the request is > still what we were looking for. Note that all externally visible > requests returned to the cache must have zero refcount. 
The commit message here is a bit confusing, since you start out by describing a solution that you're not actually implementing. I usually handle this by putting alternate solutions at the bottom, starting with "An alternate solution would be ..." or so, and then closing with why we don't do that; here it would be that we no longer have a need for these partially set up i915_requests, and therefore just reverting that complication is the simplest solution. > An alternative fix then is to instead move the dma_fence_init out from > the request ctor. Originally this was how it was done, but it was moved > in: > > commit 855e39e65cfc33a73724f1cc644ffc5754864a20 > Author: Chris Wilson > Date: Mon Feb 3 09:41:48 2020 + > > drm/i915: Initialise basic fence before acquiring seqno > > where it looks like intel_timeline_get_seqno() relied on some of the > rq->fence state, but that is no longer the case since: > > commit 12ca695d2c1ed26b2dcbb528b42813bd0f216cfc > Author: Maarten Lankhorst > Date: Tue Mar 23 16:49:50 2021 +0100 > > drm/i915: Do not share hwsp across contexts any more, v8. > > intel_timeline_get_seqno() could also be cleaned up slightly by dropping > the request argument. > > Moving dma_fence_init back out of the ctor, should ensure we have enough > of the request initialised in case of trace_dma_fence_init. > Functionally this should be the same, and is effectively what we were > already open coding before, except now we also assign the fence->lock > and fence->ops, but since these are invariant for recycled > requests(which might be externally visible), and will therefore already > hold the same value, it shouldn't matter. We still leave the > spin_lock_init() in the ctor, since we can't re-init the rq->lock in > case it is already held. Holding rq->lock without having a full reference to it sounds like really bad taste. I think it would be good to have a (kerneldoc) comment next to i915_request.lock about this, with a FIXME. But that's a separate patch.
> Fixes: 855e39e65cfc ("drm/i915: Initialise basic fence before acquiring > seqno") > Signed-off-by: Matthew Auld > Cc: Michael Mason > Cc: Daniel Vetter With the commit message restructured a bit, and assuming this one actually works: Reviewed-by: Daniel Vetter But I'm really not confident :-( -Daniel > --- > drivers/gpu/drm/i915/i915_request.c | 11 ++- > 1 file changed, 2 insertions(+), 9 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_request.c > b/drivers/gpu/drm/i915/i915_request.c > index ce446716d092..79da5eca60af 100644 > --- a/drivers/gpu/drm/i915/i915_request.c > +++ b/drivers/gpu/drm/i915/i915_request.c > @@ -829,8 +829,6 @@ static void __i915_request_ctor(void *arg) > i915_sw_fence_init(&rq->submit, submit_notify); > i915_sw_fence_init(&rq->semaphore, semaphore_notify); > > - dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock, 0, 0); > - > rq->capture_list = NULL; > > init_llist_head(&rq->execute_cb); > @@ -905,
Re: [PATCH v2 1/2] drm: document drm_mode_create_lease object requirements
On Fri, Sep 03, 2021 at 01:00:16PM +, Simon Ser wrote: > validate_lease expects one CRTC, one connector and one plane. > > Signed-off-by: Simon Ser > Cc: Daniel Vetter > Cc: Pekka Paalanen > Cc: Keith Packard Reviewed-by: Daniel Vetter > --- > include/uapi/drm/drm_mode.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h > index 90c55383f1ee..e4a2570a6058 100644 > --- a/include/uapi/drm/drm_mode.h > +++ b/include/uapi/drm/drm_mode.h > @@ -1110,6 +1110,9 @@ struct drm_mode_destroy_blob { > * struct drm_mode_create_lease - Create lease > * > * Lease mode resources, creating another drm_master. > + * > + * The @object_ids array must reference at least one CRTC, one connector and > + * one plane if &DRM_CLIENT_CAP_UNIVERSAL_PLANES is enabled. > */ > struct drm_mode_create_lease { > /** @object_ids: Pointer to array of object ids (__u32) */ > -- > 2.33.0 > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch
[PULL] drm-misc-fixes
Hi Dave and Daniel,

here's this week's PR for drm-misc-fixes. One patch fixes a potential deadlock in TTM, the other enables an additional plane in kmb. I'm slightly unhappy that the latter one ended up in -fixes as it's not a bugfix AFAICT.

Best regards
Thomas

drm-misc-fixes-2021-09-08:
Short summary of fixes pull:

 * kmb: Enable second plane
 * ttm: Fix potential deadlock during swap

The following changes since commit fa0b1ef5f7a694f48e00804a391245f3471aa155:

  drm: Copy drm_wait_vblank to user before returning (2021-08-17 13:56:03 -0400)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-fixes-2021-09-08

for you to fetch changes up to c8704b7ec182f9293e6a994310c7d4203428cdfb:

  drm/kmb: Enable alpha blended second plane (2021-09-07 10:10:30 -0700)

Short summary of fixes pull:

 * kmb: Enable second plane
 * ttm: Fix potential deadlock during swap

Edmund Dea (1):
      drm/kmb: Enable alpha blended second plane

xinhui pan (1):
      drm/ttm: Fix a deadlock if the target BO is not idle during swap

 drivers/gpu/drm/kmb/kmb_drv.c   |  8 ++--
 drivers/gpu/drm/kmb/kmb_drv.h   |  5 +++
 drivers/gpu/drm/kmb/kmb_plane.c | 81 -
 drivers/gpu/drm/kmb/kmb_plane.h |  5 ++-
 drivers/gpu/drm/kmb/kmb_regs.h  |  3 ++
 drivers/gpu/drm/ttm/ttm_bo.c    |  6 +--
 6 files changed, 90 insertions(+), 18 deletions(-)

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
Re: [PATCH v3 4/9] drm/scheduler: Add fence deadline support
On Fri, Sep 03, 2021 at 11:47:55AM -0700, Rob Clark wrote: > From: Rob Clark > > As the finished fence is the one that is exposed to userspace, and > therefore the one that other operations, like atomic update, would > block on, we need to propagate the deadline from from the finished > fence to the actual hw fence. > > v2: Split into drm_sched_fence_set_parent() (ckoenig) > > Signed-off-by: Rob Clark > --- > drivers/gpu/drm/scheduler/sched_fence.c | 34 + > drivers/gpu/drm/scheduler/sched_main.c | 2 +- > include/drm/gpu_scheduler.h | 8 ++ > 3 files changed, 43 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/scheduler/sched_fence.c > b/drivers/gpu/drm/scheduler/sched_fence.c > index bcea035cf4c6..4fc41a71d1c7 100644 > --- a/drivers/gpu/drm/scheduler/sched_fence.c > +++ b/drivers/gpu/drm/scheduler/sched_fence.c > @@ -128,6 +128,30 @@ static void drm_sched_fence_release_finished(struct > dma_fence *f) > dma_fence_put(&fence->scheduled); > } > > +static void drm_sched_fence_set_deadline_finished(struct dma_fence *f, > + ktime_t deadline) > +{ > + struct drm_sched_fence *fence = to_drm_sched_fence(f); > + unsigned long flags; > + > + spin_lock_irqsave(&fence->lock, flags); > + > + /* If we already have an earlier deadline, keep it: */ > + if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) && > + ktime_before(fence->deadline, deadline)) { > + spin_unlock_irqrestore(&fence->lock, flags); > + return; > + } > + > + fence->deadline = deadline; > + set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags); > + > + spin_unlock_irqrestore(&fence->lock, flags); > + > + if (fence->parent) > + dma_fence_set_deadline(fence->parent, deadline); > +} > + > static const struct dma_fence_ops drm_sched_fence_ops_scheduled = { > .get_driver_name = drm_sched_fence_get_driver_name, > .get_timeline_name = drm_sched_fence_get_timeline_name, > @@ -138,6 +162,7 @@ static const struct dma_fence_ops > drm_sched_fence_ops_finished = { > .get_driver_name = 
drm_sched_fence_get_driver_name, > .get_timeline_name = drm_sched_fence_get_timeline_name, > .release = drm_sched_fence_release_finished, > + .set_deadline = drm_sched_fence_set_deadline_finished, > }; > > struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f) > @@ -152,6 +177,15 @@ struct drm_sched_fence *to_drm_sched_fence(struct > dma_fence *f) > } > EXPORT_SYMBOL(to_drm_sched_fence); > > +void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence, > + struct dma_fence *fence) > +{ > + s_fence->parent = dma_fence_get(fence); > + if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, > + &s_fence->finished.flags)) Don't you need the spinlock here too to avoid races? test_bit is very unordered, so guarantees nothing. Spinlock would need to be both around ->parent = and the test_bit. Entirely aside, but there's discussions going on to preallocate the hw fence somehow. If we do that we could make the deadline forwarding lockless here. Having a spinlock just to set the parent is a bit annoying ... Alternative is that you do this locklessly with barriers and a _lot_ of comments. Would be good to benchmark whether the overhead matters though first. 
-Daniel > + dma_fence_set_deadline(fence, s_fence->deadline); > +} > + > struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity > *entity, > void *owner) > { > diff --git a/drivers/gpu/drm/scheduler/sched_main.c > b/drivers/gpu/drm/scheduler/sched_main.c > index 595e47ff7d06..27bf0ac0625f 100644 > --- a/drivers/gpu/drm/scheduler/sched_main.c > +++ b/drivers/gpu/drm/scheduler/sched_main.c > @@ -978,7 +978,7 @@ static int drm_sched_main(void *param) > drm_sched_fence_scheduled(s_fence); > > if (!IS_ERR_OR_NULL(fence)) { > - s_fence->parent = dma_fence_get(fence); > + drm_sched_fence_set_parent(s_fence, fence); > r = dma_fence_add_callback(fence, &sched_job->cb, > drm_sched_job_done_cb); > if (r == -ENOENT) > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h > index 7f77a455722c..158ddd662469 100644 > --- a/include/drm/gpu_scheduler.h > +++ b/include/drm/gpu_scheduler.h > @@ -238,6 +238,12 @@ struct drm_sched_fence { > */ > struct dma_fencefinished; > > + /** > + * @deadline: deadline set on &drm_sched_fence.finished which > + * potentially needs to be propagated to &drm_sched_fence.parent > + */ > + ktime_t deadline; > + > /** > * @parent: the fence returned by &drm_sched_backend_ops.run_
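The deadline forwarding in the patch above follows a "keep the earliest deadline, then pass it on to the hardware fence" rule. Here is a single-threaded userspace sketch of just that rule. It is an illustration under assumptions: all names (hw_fence_model, sched_fence_model, set_deadline(), set_parent()) are hypothetical, deadlines are plain nanosecond integers instead of ktime_t, and the locking that the review asks about is deliberately left out, so it says nothing about the race itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware fence. */
struct hw_fence_model {
    bool has_deadline;
    int64_t deadline_ns;
};

/* Hypothetical stand-in for the scheduler's finished fence. */
struct sched_fence_model {
    bool has_deadline;
    int64_t deadline_ns;
    struct hw_fence_model *parent;  /* NULL until the job has run */
};

static void hw_set_deadline(struct hw_fence_model *hw, int64_t deadline_ns)
{
    hw->has_deadline = true;
    hw->deadline_ns = deadline_ns;
}

/* Keep the earliest deadline seen so far and forward it if possible. */
static void set_deadline(struct sched_fence_model *sf, int64_t deadline_ns)
{
    if (sf->has_deadline && sf->deadline_ns < deadline_ns)
        return; /* an earlier deadline is already recorded */

    sf->deadline_ns = deadline_ns;
    sf->has_deadline = true;

    if (sf->parent)
        hw_set_deadline(sf->parent, deadline_ns);
}

/* Attach the hardware fence and hand over any deadline set before that. */
static void set_parent(struct sched_fence_model *sf, struct hw_fence_model *hw)
{
    sf->parent = hw;
    if (sf->has_deadline)
        hw_set_deadline(hw, sf->deadline_ns);
}
```

In the real code the two entry points can run concurrently, which is why the review suggests covering both the parent assignment and the flag test with the same lock, or documenting a barrier-based scheme.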
Re: [PATCH v3 5/9] drm/msm: Add deadline based boost support
On Fri, Sep 03, 2021 at 11:47:56AM -0700, Rob Clark wrote:
> From: Rob Clark
>
> Signed-off-by: Rob Clark

Why do you need a kthread_work here? Is this just to make sure you're
running at realtime prio? Maybe a comment to that effect would be good.
-Daniel

> ---
>  drivers/gpu/drm/msm/msm_fence.c       | 76 +++
>  drivers/gpu/drm/msm/msm_fence.h       | 20 +++
>  drivers/gpu/drm/msm/msm_gpu.h         |  1 +
>  drivers/gpu/drm/msm/msm_gpu_devfreq.c | 20 +++
>  4 files changed, 117 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> index f2cece542c3f..67c2a96e1c85 100644
> --- a/drivers/gpu/drm/msm/msm_fence.c
> +++ b/drivers/gpu/drm/msm/msm_fence.c
> @@ -8,6 +8,37 @@
>
>  #include "msm_drv.h"
>  #include "msm_fence.h"
> +#include "msm_gpu.h"
> +
> +static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fence);
> +
> +static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
> +{
> +	struct msm_drm_private *priv = fctx->dev->dev_private;
> +	return priv->gpu;
> +}
> +
> +static enum hrtimer_restart deadline_timer(struct hrtimer *t)
> +{
> +	struct msm_fence_context *fctx = container_of(t,
> +			struct msm_fence_context, deadline_timer);
> +
> +	kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
> +
> +	return HRTIMER_NORESTART;
> +}
> +
> +static void deadline_work(struct kthread_work *work)
> +{
> +	struct msm_fence_context *fctx = container_of(work,
> +			struct msm_fence_context, deadline_work);
> +
> +	/* If deadline fence has already passed, nothing to do: */
> +	if (fence_completed(fctx, fctx->next_deadline_fence))
> +		return;
> +
> +	msm_devfreq_boost(fctx2gpu(fctx), 2);
> +}
>
>
>  struct msm_fence_context *
> @@ -26,6 +57,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
>  	fctx->fenceptr = fenceptr;
>  	spin_lock_init(&fctx->spinlock);
>
> +	hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> +	fctx->deadline_timer.function = deadline_timer;
> +
> +	kthread_init_work(&fctx->deadline_work, deadline_work);
> +
> +	fctx->next_deadline = ktime_get();
> +
>  	return fctx;
>  }
>
> @@ -49,6 +87,8 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
>  {
>  	spin_lock(&fctx->spinlock);
>  	fctx->completed_fence = max(fence, fctx->completed_fence);
> +	if (fence_completed(fctx, fctx->next_deadline_fence))
> +		hrtimer_cancel(&fctx->deadline_timer);
>  	spin_unlock(&fctx->spinlock);
>  }
>
> @@ -79,10 +119,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
>  	return fence_completed(f->fctx, f->base.seqno);
>  }
>
> +static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> +{
> +	struct msm_fence *f = to_msm_fence(fence);
> +	struct msm_fence_context *fctx = f->fctx;
> +	unsigned long flags;
> +	ktime_t now;
> +
> +	spin_lock_irqsave(&fctx->spinlock, flags);
> +	now = ktime_get();
> +
> +	if (ktime_after(now, fctx->next_deadline) ||
> +			ktime_before(deadline, fctx->next_deadline)) {
> +		fctx->next_deadline = deadline;
> +		fctx->next_deadline_fence =
> +			max(fctx->next_deadline_fence, (uint32_t)fence->seqno);
> +
> +		/*
> +		 * Set timer to trigger boost 3ms before deadline, or
> +		 * if we are already less than 3ms before the deadline
> +		 * schedule boost work immediately.
> +		 */
> +		deadline = ktime_sub(deadline, ms_to_ktime(3));
> +
> +		if (ktime_after(now, deadline)) {
> +			kthread_queue_work(fctx2gpu(fctx)->worker,
> +					&fctx->deadline_work);
> +		} else {
> +			hrtimer_start(&fctx->deadline_timer, deadline,
> +					HRTIMER_MODE_ABS);
> +		}
> +	}
> +
> +	spin_unlock_irqrestore(&fctx->spinlock, flags);
> +}
> +
>  static const struct dma_fence_ops msm_fence_ops = {
>  	.get_driver_name = msm_fence_get_driver_name,
>  	.get_timeline_name = msm_fence_get_timeline_name,
>  	.signaled = msm_fence_signaled,
> +	.set_deadline = msm_fence_set_deadline,
>  };
>
>  struct dma_fence *
> diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
> index 4783db528bcc..d34e853c555a 100644
> --- a/drivers/gpu/drm/msm/msm_fence.h
> +++ b/drivers/gpu/drm/msm/msm_fence.h
> @@ -50,6 +50,26 @@ struct msm_fence_context {
>  	volatile uint32_t *fenceptr;
>
>  	spinlock_t spinlock;
> +
> +	/*
> +	 * TODO this doesn't really deal with multiple deadlines, like
> +	 * if userspace got multiple frames ahead.. OTOH atomic updates
> +	 * don't queue, so maybe that is
Re: [PATCH v3 8/9] dma-buf/sync_file: Add SET_DEADLINE ioctl
On Fri, Sep 03, 2021 at 11:47:59AM -0700, Rob Clark wrote:
> From: Rob Clark
>
> The initial purpose is for igt tests, but this would also be useful for
> compositors that wait until close to vblank deadline to make decisions
> about which frame to show.
>
> Signed-off-by: Rob Clark

Needs userspace and I think ideally also some igts to make sure it works
and doesn't go boom.
-Daniel

> ---
>  drivers/dma-buf/sync_file.c    | 19 +++
>  include/uapi/linux/sync_file.h | 20
>  2 files changed, 39 insertions(+)
>
> diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> index 394e6e1e9686..f295772d5169 100644
> --- a/drivers/dma-buf/sync_file.c
> +++ b/drivers/dma-buf/sync_file.c
> @@ -459,6 +459,22 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
>  	return ret;
>  }
>
> +static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
> +					unsigned long arg)
> +{
> +	struct sync_set_deadline ts;
> +
> +	if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
> +		return -EFAULT;
> +
> +	if (ts.pad)
> +		return -EINVAL;
> +
> +	dma_fence_set_deadline(sync_file->fence, ktime_set(ts.tv_sec, ts.tv_nsec));
> +
> +	return 0;
> +}
> +
>  static long sync_file_ioctl(struct file *file, unsigned int cmd,
>  			    unsigned long arg)
>  {
> @@ -471,6 +487,9 @@ static long sync_file_ioctl(struct file *file, unsigned int cmd,
>  	case SYNC_IOC_FILE_INFO:
>  		return sync_file_ioctl_fence_info(sync_file, arg);
>
> +	case SYNC_IOC_SET_DEADLINE:
> +		return sync_file_ioctl_set_deadline(sync_file, arg);
> +
>  	default:
>  		return -ENOTTY;
>  	}
> diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
> index ee2dcfb3d660..f67d4ffe7566 100644
> --- a/include/uapi/linux/sync_file.h
> +++ b/include/uapi/linux/sync_file.h
> @@ -67,6 +67,18 @@ struct sync_file_info {
>  	__u64	sync_fence_info;
>  };
>
> +/**
> + * struct sync_set_deadline - set a deadline on a fence
> + * @tv_sec:	seconds elapsed since epoch
> + * @tv_nsec:	nanoseconds elapsed since the time given by the tv_sec
> + * @pad:	must be zero
> + */
> +struct sync_set_deadline {
> +	__s64	tv_sec;
> +	__s32	tv_nsec;
> +	__u32	pad;
> +};
> +
>  #define SYNC_IOC_MAGIC		'>'
>
>  /**
> @@ -95,4 +107,12 @@ struct sync_file_info {
>   */
>  #define SYNC_IOC_FILE_INFO	_IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
>
> +
> +/**
> + * DOC: SYNC_IOC_SET_DEADLINE - set a deadline on a fence
> + *
> + * Allows userspace to set a deadline on a fence, see dma_fence_set_deadline()
> + */
> +#define SYNC_IOC_SET_DEADLINE	_IOW(SYNC_IOC_MAGIC, 5, struct sync_set_deadline)
> +
>  #endif /* _UAPI_LINUX_SYNC_H */
> --
> 2.31.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
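
For readers wondering what a userspace caller of the proposed ioctl might look like, here is a minimal sketch. It copies the struct layout, magic and ioctl number from the patch quoted above. The `rfc_`/`RFC_` prefixed names and the helper `rfc_sync_file_set_deadline()` are hypothetical and are defined locally, because this RFC interface should not be assumed to exist in installed kernel headers. The clock base of the timestamp is also an open point: the struct comment says "since epoch", while the msm patch in this series compares deadlines against ktime_get(), so the sketch leaves the choice of clock to the caller.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Local copy of the layout proposed in the patch (hypothetical names). */
struct rfc_sync_set_deadline {
	int64_t  tv_sec;
	int32_t  tv_nsec;
	uint32_t pad;		/* must be zero, the kernel side returns -EINVAL otherwise */
};

#define RFC_SYNC_IOC_MAGIC		'>'
#define RFC_SYNC_IOC_SET_DEADLINE \
	_IOW(RFC_SYNC_IOC_MAGIC, 5, struct rfc_sync_set_deadline)

/*
 * Ask the kernel to pass a deadline hint to the fence behind sync_fd.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int rfc_sync_file_set_deadline(int sync_fd, int64_t sec, int32_t nsec)
{
	struct rfc_sync_set_deadline ts = {
		.tv_sec = sec,
		.tv_nsec = nsec,
		.pad = 0,
	};

	return ioctl(sync_fd, RFC_SYNC_IOC_SET_DEADLINE, &ts);
}
```

A compositor-style caller would compute the timestamp of the next vblank and call the helper on the sync_file fd it is about to wait on; on a kernel without this patch the call is expected to fail.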
Re: [PATCH 13/14] drm/kmb: Enable alpha blended second plane
Hi

On 03.08.21 at 07:10, Sam Ravnborg wrote:
> Hi Anitha,
>
> On Mon, Aug 02, 2021 at 08:44:26PM +, Chrisanthus, Anitha wrote:
>> Hi Sam,
>> Thanks. Where should this go, drm-misc-fixes or drm-misc-next?
>
> Looks like a drm-misc-next candidate to me.
> It may improve something for existing users, but it does not look like
> it fixes an existing bug.

I found this patch in drm-misc-fixes, although it doesn't look like a
bugfix. It should have gone into drm-misc-next. See [1]. If it indeed
belongs in drm-misc-fixes, it certainly should have contained a Fixes
tag.

Best regards
Thomas

[1] https://drm.pages.freedesktop.org/maintainer-tools/committer-drm-misc.html#where-do-i-apply-my-patch

>
> Sam

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Managing Director: Felix Imendörffer
Re: [PATCH v3 5/9] drm/msm: Add deadline based boost support
On Wed, Sep 8, 2021 at 10:48 AM Daniel Vetter wrote:
>
> On Fri, Sep 03, 2021 at 11:47:56AM -0700, Rob Clark wrote:
> > From: Rob Clark
> >
> > Signed-off-by: Rob Clark
>
> Why do you need a kthread_work here? Is this just to make sure you're
> running at realtime prio? Maybe a comment to that effect would be good.

Mostly because we are already using a kthread_worker for things the GPU
needs to kick off to a different context.. but I think this is something
we'd want at a realtime prio

BR,
-R

> -Daniel
>
> > ---
> >  drivers/gpu/drm/msm/msm_fence.c       | 76 +++
> >  drivers/gpu/drm/msm/msm_fence.h       | 20 +++
> >  drivers/gpu/drm/msm/msm_gpu.h         |  1 +
> >  drivers/gpu/drm/msm/msm_gpu_devfreq.c | 20 +++
> >  4 files changed, 117 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> > index f2cece542c3f..67c2a96e1c85 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.c
> > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > @@ -8,6 +8,37 @@
> >
> >  #include "msm_drv.h"
> >  #include "msm_fence.h"
> > +#include "msm_gpu.h"
> > +
> > +static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t fence);
> > +
> > +static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
> > +{
> > +	struct msm_drm_private *priv = fctx->dev->dev_private;
> > +	return priv->gpu;
> > +}
> > +
> > +static enum hrtimer_restart deadline_timer(struct hrtimer *t)
> > +{
> > +	struct msm_fence_context *fctx = container_of(t,
> > +			struct msm_fence_context, deadline_timer);
> > +
> > +	kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
> > +
> > +	return HRTIMER_NORESTART;
> > +}
> > +
> > +static void deadline_work(struct kthread_work *work)
> > +{
> > +	struct msm_fence_context *fctx = container_of(work,
> > +			struct msm_fence_context, deadline_work);
> > +
> > +	/* If deadline fence has already passed, nothing to do: */
> > +	if (fence_completed(fctx, fctx->next_deadline_fence))
> > +		return;
> > +
> > +	msm_devfreq_boost(fctx2gpu(fctx), 2);
> > +}
> >
> >
> >  struct msm_fence_context *
> > @@ -26,6 +57,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile uint32_t *fenceptr,
> >  	fctx->fenceptr = fenceptr;
> >  	spin_lock_init(&fctx->spinlock);
> >
> > +	hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> > +	fctx->deadline_timer.function = deadline_timer;
> > +
> > +	kthread_init_work(&fctx->deadline_work, deadline_work);
> > +
> > +	fctx->next_deadline = ktime_get();
> > +
> >  	return fctx;
> >  }
> >
> > @@ -49,6 +87,8 @@ void msm_update_fence(struct msm_fence_context *fctx, uint32_t fence)
> >  {
> >  	spin_lock(&fctx->spinlock);
> >  	fctx->completed_fence = max(fence, fctx->completed_fence);
> > +	if (fence_completed(fctx, fctx->next_deadline_fence))
> > +		hrtimer_cancel(&fctx->deadline_timer);
> >  	spin_unlock(&fctx->spinlock);
> >  }
> >
> > @@ -79,10 +119,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
> >  	return fence_completed(f->fctx, f->base.seqno);
> >  }
> >
> > +static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> > +{
> > +	struct msm_fence *f = to_msm_fence(fence);
> > +	struct msm_fence_context *fctx = f->fctx;
> > +	unsigned long flags;
> > +	ktime_t now;
> > +
> > +	spin_lock_irqsave(&fctx->spinlock, flags);
> > +	now = ktime_get();
> > +
> > +	if (ktime_after(now, fctx->next_deadline) ||
> > +			ktime_before(deadline, fctx->next_deadline)) {
> > +		fctx->next_deadline = deadline;
> > +		fctx->next_deadline_fence =
> > +			max(fctx->next_deadline_fence, (uint32_t)fence->seqno);
> > +
> > +		/*
> > +		 * Set timer to trigger boost 3ms before deadline, or
> > +		 * if we are already less than 3ms before the deadline
> > +		 * schedule boost work immediately.
> > +		 */
> > +		deadline = ktime_sub(deadline, ms_to_ktime(3));
> > +
> > +		if (ktime_after(now, deadline)) {
> > +			kthread_queue_work(fctx2gpu(fctx)->worker,
> > +					&fctx->deadline_work);
> > +		} else {
> > +			hrtimer_start(&fctx->deadline_timer, deadline,
> > +					HRTIMER_MODE_ABS);
> > +		}
> > +	}
> > +
> > +	spin_unlock_irqrestore(&fctx->spinlock, flags);
> > +}
> > +
> >  static const struct dma_fence_ops msm_fence_ops = {
> >  	.get_driver_name = msm_fence_get_driver_name,
> >  	.get_timeline_name = msm_fence_get_timeline_name,
> >  	.signaled = msm_fence_signaled,
> > +	.set_deadline = msm_fence_set_deadline,
> >  };
> >
> >  struct dma_fence *
> > diff --git a/d
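
The decision made in msm_fence_set_deadline() (ignore, boost right away, or arm a timer 3 ms ahead of the deadline) can be followed more easily in a small userspace model. This is only a sketch of the logic in the quoted patch: plain nanosecond integers stand in for ktime_t, the returned enum stands in for kthread_queue_work() and hrtimer_start(), and every `model_`/`MODEL_` name is hypothetical. No locking is modeled.

```c
#include <stdint.h>

#define MODEL_BOOST_MARGIN_NS	(3 * 1000000LL)	/* 3 ms, as in the patch comment */

enum model_action {
	MODEL_NONE,		/* new deadline ignored */
	MODEL_BOOST_NOW,	/* patch: kthread_queue_work() */
	MODEL_ARM_TIMER,	/* patch: hrtimer_start() */
};

struct model_fctx {
	int64_t  next_deadline_ns;
	uint32_t next_deadline_fence;
};

static enum model_action model_fctx_set_deadline(struct model_fctx *fctx,
						 int64_t now_ns,
						 int64_t deadline_ns,
						 uint32_t seqno,
						 int64_t *timer_expiry_ns)
{
	/* Act only if the stored deadline has passed or the new one is earlier. */
	if (!(now_ns > fctx->next_deadline_ns ||
	      deadline_ns < fctx->next_deadline_ns))
		return MODEL_NONE;

	fctx->next_deadline_ns = deadline_ns;
	if (seqno > fctx->next_deadline_fence)
		fctx->next_deadline_fence = seqno;

	/* Aim for 3 ms before the deadline. */
	deadline_ns -= MODEL_BOOST_MARGIN_NS;

	if (now_ns > deadline_ns)
		return MODEL_BOOST_NOW;

	*timer_expiry_ns = deadline_ns;
	return MODEL_ARM_TIMER;
}
```

Note that a later deadline arriving while an earlier one is still pending is dropped, which is the limitation the TODO comment in msm_fence.h refers to.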
Re: [PATCH v3 7/9] dma-buf/fence-chain: Add fence deadline support
On Fri, Sep 03, 2021 at 11:47:58AM -0700, Rob Clark wrote:
> From: Rob Clark
>
> Signed-off-by: Rob Clark
> ---
>  drivers/dma-buf/dma-fence-chain.c | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> index 1b4cb3e5cec9..736a9ad3ea6d 100644
> --- a/drivers/dma-buf/dma-fence-chain.c
> +++ b/drivers/dma-buf/dma-fence-chain.c
> @@ -208,6 +208,18 @@ static void dma_fence_chain_release(struct dma_fence *fence)
>  	dma_fence_free(fence);
>  }
>
> +
> +static void dma_fence_chain_set_deadline(struct dma_fence *fence,
> +					 ktime_t deadline)
> +{
> +	dma_fence_chain_for_each(fence, fence) {
> +		struct dma_fence_chain *chain = to_dma_fence_chain(fence);
> +		struct dma_fence *f = chain ? chain->fence : fence;

Doesn't this just end up calling set_deadline on a chain, potentially
resulting in recursion? Also I don't think this should ever happen, why
did you add that?
-Daniel

> +
> +		dma_fence_set_deadline(f, deadline);
> +	}
> +}
> +
>  const struct dma_fence_ops dma_fence_chain_ops = {
>  	.use_64bit_seqno = true,
>  	.get_driver_name = dma_fence_chain_get_driver_name,
> @@ -215,6 +227,7 @@ const struct dma_fence_ops dma_fence_chain_ops = {
>  	.enable_signaling = dma_fence_chain_enable_signaling,
>  	.signaled = dma_fence_chain_signaled,
>  	.release = dma_fence_chain_release,
> +	.set_deadline = dma_fence_chain_set_deadline,
>  };
>  EXPORT_SYMBOL(dma_fence_chain_ops);
>
> --
> 2.31.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH v3 1/9] dma-fence: Add deadline awareness
On Fri, Sep 03, 2021 at 11:47:52AM -0700, Rob Clark wrote:
> From: Rob Clark
>
> Add a way to hint to the fence signaler of an upcoming deadline, such as
> vblank, which the fence waiter would prefer not to miss. This is to aid
> the fence signaler in making power management decisions, like boosting
> frequency as the deadline approaches and awareness of missing deadlines
> so that can be factored in to the frequency scaling.
>
> v2: Drop dma_fence::deadline and related logic to filter duplicate
>     deadlines, to avoid increasing dma_fence size. The fence-context
>     implementation will need similar logic to track deadlines of all
>     the fences on the same timeline. [ckoenig]
>
> Signed-off-by: Rob Clark
> Reviewed-by: Christian König
> Signed-off-by: Rob Clark
> ---
>  drivers/dma-buf/dma-fence.c | 20
>  include/linux/dma-fence.h   | 16
>  2 files changed, 36 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index ce0f5eff575d..1f444863b94d 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -910,6 +910,26 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, uint32_t count,
>  }
>  EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>
> +
> +/**
> + * dma_fence_set_deadline - set desired fence-wait deadline
> + * @fence:    the fence that is to be waited on
> + * @deadline: the time by which the waiter hopes for the fence to be
> + *            signaled
> + *
> + * Inform the fence signaler of an upcoming deadline, such as vblank, by
> + * which point the waiter would prefer the fence to be signaled by. This
> + * is intended to give feedback to the fence signaler to aid in power
> + * management decisions, such as boosting GPU frequency if a periodic
> + * vblank deadline is approaching.
> + */
> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> +{
> +	if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
> +		fence->ops->set_deadline(fence, deadline);
> +}
> +EXPORT_SYMBOL(dma_fence_set_deadline);
> +
>  /**
>   * dma_fence_init - Initialize a custom fence.
>   * @fence: the fence to initialize
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 6ffb4b2c6371..9c809f0d5d0a 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -99,6 +99,7 @@ enum dma_fence_flag_bits {
>  	DMA_FENCE_FLAG_SIGNALED_BIT,
>  	DMA_FENCE_FLAG_TIMESTAMP_BIT,
>  	DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +	DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
>  	DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
>  };
>
> @@ -261,6 +262,19 @@ struct dma_fence_ops {
>  	 */
>  	void (*timeline_value_str)(struct dma_fence *fence,
>  				   char *str, int size);
> +
> +	/**
> +	 * @set_deadline:
> +	 *
> +	 * Callback to allow a fence waiter to inform the fence signaler of an
> +	 * upcoming deadline, such as vblank, by which point the waiter would
> +	 * prefer the fence to be signaled by. This is intended to give feedback
> +	 * to the fence signaler to aid in power management decisions, such as
> +	 * boosting GPU frequency.

Please add here that this callback is called without &dma_fence.lock
held, and that locking is up to callers if they have some state to
manage.

I realized that while scratching some heads over your later patches.
-Daniel

> +	 *
> +	 * This callback is optional.
> +	 */
> +	void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
>  };
>
>  void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> @@ -586,6 +600,8 @@ static inline signed long dma_fence_wait(struct dma_fence *fence, bool intr)
>  	return ret < 0 ? ret : 0;
>  }
>
> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
> +
>  struct dma_fence *dma_fence_get_stub(void);
>  struct dma_fence *dma_fence_allocate_private_stub(void);
>  u64 dma_fence_context_alloc(unsigned num);
> --
> 2.31.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
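
To make the "optional callback" contract concrete, here is a small userspace model of the dispatch done by dma_fence_set_deadline() in the patch above. It is a sketch, not kernel code: the `model_` types and functions are hypothetical stand-ins, deadlines are plain nanosecond integers, and the example callback (keep the earliest deadline seen) is just one possible implementer policy. Per the review comment, a real implementation is called without the fence lock held and has to do its own locking; the single-threaded model has none.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_fence;

struct model_fence_ops {
	/* Optional, like the proposed dma_fence_ops.set_deadline. */
	void (*set_deadline)(struct model_fence *fence, int64_t deadline_ns);
};

struct model_fence {
	const struct model_fence_ops *ops;
	bool signaled;
	bool has_deadline;
	int64_t deadline_ns;
};

/* Example implementer policy: remember only the earliest deadline. */
static void model_keep_earliest(struct model_fence *fence, int64_t deadline_ns)
{
	if (!fence->has_deadline || deadline_ns < fence->deadline_ns) {
		fence->deadline_ns = deadline_ns;
		fence->has_deadline = true;
	}
}

static const struct model_fence_ops model_ops_with_deadline = {
	.set_deadline = model_keep_earliest,
};

static const struct model_fence_ops model_ops_without_deadline = {
	.set_deadline = NULL,
};

/* Mirrors the core helper: skip fences that lack the hook or are signaled. */
static void model_fence_set_deadline(struct model_fence *fence,
				     int64_t deadline_ns)
{
	if (fence->ops->set_deadline && !fence->signaled)
		fence->ops->set_deadline(fence, deadline_ns);
}
```

Waiters can therefore call the helper unconditionally; fences whose driver does not implement the hook, or that have already signaled, simply ignore the hint.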
Re: [PATCH v3 6/9] dma-buf/fence-array: Add fence deadline support
On Fri, Sep 03, 2021 at 11:47:57AM -0700, Rob Clark wrote:
> From: Rob Clark
>
> Signed-off-by: Rob Clark
> ---
>  drivers/dma-buf/dma-fence-array.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/drivers/dma-buf/dma-fence-array.c b/drivers/dma-buf/dma-fence-array.c
> index d3fbd950be94..8d194b09ee3d 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -119,12 +119,23 @@ static void dma_fence_array_release(struct dma_fence *fence)
>  	dma_fence_free(fence);
>  }
>
> +static void dma_fence_array_set_deadline(struct dma_fence *fence,
> +					 ktime_t deadline)
> +{
> +	struct dma_fence_array *array = to_dma_fence_array(fence);
> +	unsigned i;
> +
> +	for (i = 0; i < array->num_fences; ++i)
> +		dma_fence_set_deadline(array->fences[i], deadline);

Hm I wonder whether this can go wrong, and whether we need Christian's
massive fence iterator that I've seen flying around. If you nest these
things too much it could all go wrong I think. I looked at other users
which inspect dma_fence_array and none of them have a risk for unbounded
recursion.

Maybe check with Christian.
-Daniel

> +}
> +
>  const struct dma_fence_ops dma_fence_array_ops = {
>  	.get_driver_name = dma_fence_array_get_driver_name,
>  	.get_timeline_name = dma_fence_array_get_timeline_name,
>  	.enable_signaling = dma_fence_array_enable_signaling,
>  	.signaled = dma_fence_array_signaled,
>  	.release = dma_fence_array_release,
> +	.set_deadline = dma_fence_array_set_deadline,
>  };
>  EXPORT_SYMBOL(dma_fence_array_ops);
>
> --
> 2.31.1
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
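
The recursion concern raised above can be illustrated with a toy model in which a container forwards the deadline to each child by calling the same function again. This is a sketch under stated assumptions: `struct model_node` and `model_node_set_deadline()` are hypothetical, a node with no children plays the role of a plain fence, and whether real dma_fence_array objects can be nested this way is exactly the open question in the review, not something the model settles.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_node {
	struct model_node **children;	/* NULL for a plain (leaf) fence */
	size_t num_children;
	bool has_deadline;
	int64_t deadline_ns;
};

/*
 * Forward the deadline to every leaf below node. Returns the deepest
 * call depth reached, which grows with the nesting depth of containers.
 */
static unsigned int model_node_set_deadline(struct model_node *node,
					    int64_t deadline_ns,
					    unsigned int depth)
{
	unsigned int deepest = depth;
	size_t i;

	if (node->num_children == 0) {
		node->deadline_ns = deadline_ns;
		node->has_deadline = true;
		return depth;
	}

	for (i = 0; i < node->num_children; i++) {
		unsigned int d = model_node_set_deadline(node->children[i],
							 deadline_ns,
							 depth + 1);
		if (d > deepest)
			deepest = d;
	}

	return deepest;
}
```

In the model, stack usage is proportional to how deeply containers are nested; an explicit iterator, as hinted at in the review, would avoid that.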
Re: [PATCH] drm/rockchip: Update crtc fixup to account for fractional clk change
On Wed, Sep 08, 2021 at 08:53:56AM -0500, Chris Morgan wrote:
> From: Chris Morgan
>
> After commit 928f9e268611 ("clk: fractional-divider: Hide
> clk_fractional_divider_ops from wide audience") was merged it appears
> that the DSI panel on my Odroid Go Advance stopped working. Upon closer
> examination of the problem, it looks like the fixup in the
> rockchip_drm_vop.c file was causing the issue. The changes made to the
> clk driver appear to change some assumptions made in the fixup.
>
> After debugging the working 5.14 kernel and the no-longer working
> 5.15 kernel, it looks like this was broken all along but still
> worked, whereas after the fractional clock change it stopped
> working despite the issue (it went from sort-of broken to very broken).
>
> In the 5.14 kernel the dclk_vopb_frac was being requested to be set to
> 17000999 on my board. The clock driver was taking the value of the
> parent clock and attempting to divide the requested value from it
> (1700/17000999 = 0), then subtracting 1 from it (making it -1),
> and running it through fls_long to get 64. It would then subtract
> the value of fd->mwidth from it to get 48, and then bit shift
> 17000999 to the left by 48, coming up with a very large number of
> 7649082492112076800. This resulted in a numerator of 65535 and a
> denominator of 1 from the clk driver. The driver seemingly would
> try again and get a correct 1:1 value later, and then move on.
>
> Output from my 5.14 kernel (with some printfs for good measure):
> [2.830066] rockchip-drm display-subsystem: bound ff46.vop (ops vop_component_ops)
> [2.839431] rockchip-drm display-subsystem: bound ff45.dsi (ops dw_mipi_dsi_rockchip_ops)
> [2.855980] Clock is dclk_vopb_frac
> [2.856004] Scale 64, Rate 7649082492112076800, Oldrate 17000999, Parent Rate 1700, Best Numerator 65535, Best Denominator 1, fd->mwidth 16
> [2.903529] Clock is dclk_vopb_frac
> [2.903556] Scale 0, Rate 1700, Oldrate 1700, Parent Rate 1700, Best Numerator 1, Best Denominator 1, fd->mwidth 16
> [2.903579] Clock is dclk_vopb_frac
> [2.903583] Scale 0, Rate 1700, Oldrate 1700, Parent Rate 1700, Best Numerator 1, Best Denominator 1, fd->mwidth 16
>
> Contrast this with 5.15 after the clk change where the rate of 17000999
> was getting passed and resulted in numerators/denominators of 17001/
> 17000.
>
> Output from my 5.15 kernel (with some printfs added for good measure):
> [2.817571] rockchip-drm display-subsystem: bound ff46.vop (ops vop_component_ops)
> [2.826975] rockchip-drm display-subsystem: bound ff45.dsi (ops dw_mipi_dsi_rockchip_ops)
> [2.843430] Rate 17000999, Parent Rate 1700, Best Numerator 17018, Best Denominator 17017
> [2.891073] Rate 17001000, Parent Rate 1700, Best Numerator 17001, Best Denominator 17000
> [2.891269] Rate 17001000, Parent Rate 1700, Best Numerator 17001, Best Denominator 17000
> [2.891281] Rate 17001000, Parent Rate 1700, Best Numerator 17001, Best Denominator 17000
>
> After tracing through the code it appeared that this function here was
> adding a 999 to the requested frequency because of how the clk driver
> was rounding/accepting those frequencies. I believe after the changes
> made in the commit listed above the assumptions listed in this driver
> are no longer true. When I remove the + 999 from the driver the DSI
> panel begins to work again.
>
> Output from my 5.15 kernel with 999 removed (printfs added):
> [2.852054] rockchip-drm display-subsystem: bound ff46.vop (ops vop_component_ops)
> [2.864483] rockchip-drm display-subsystem: bound ff45.dsi (ops dw_mipi_dsi_rockchip_ops)
> [2.880869] Clock is dclk_vopb_frac
> [2.880892] Rate 1700, Parent Rate 1700, Best Numerator 1, Best Denominator 1
> [2.928521] Clock is dclk_vopb_frac
> [2.928551] Rate 1700, Parent Rate 1700, Best Numerator 1, Best Denominator 1
> [2.928570] Clock is dclk_vopb_frac
> [2.928574] Rate 1700, Parent Rate 1700, Best Numerator 1, Best Denominator 1
>
> I have tested the change extensively on my Odroid Go Advance (Rockchip
> RK3326) and it appears to work well. However, this change will affect
> all Rockchip SoCs that use this driver so I believe further testing
> is warranted. Please note that without this change I can confirm
> at least all PX30s with DSI panels will stop working with the 5.15
> kernel.

To me it all makes a lot of sense, thank you for the deep analysis of
the issue!

In any case I think we will need a Fixes tag to something (either one of
the clk-fractional-divider.c series or a preexisting commit).

Anyway, FWIW,
Reviewed-by: Andy Shevchenko

> Signed-off-by: Chris Morgan
> ---
>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 21 +++--
>  1 file changed, 3 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
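
The pre-scaling arithmetic Chris describes for the 5.14 kernel can be checked with a few lines of userspace C. This is a sketch that models only the steps named in the mail (divide, subtract 1, fls_long, subtract fd->mwidth, shift left), not the clk driver itself. Assumptions: `model_fls_long()` is a stand-in for the kernel's fls_long() on a 64-bit long, the parent rate is taken to be 17000000 Hz (the logs above show it as "1700", which looks truncated, and the 17001/17000 ratio for a rate of 17001000 is consistent with 17000000), and mwidth is 16 as printed in the log. Both function names are hypothetical.

```c
#include <stdint.h>

/*
 * Stand-in for fls_long() with a 64-bit long: 1-based position of the
 * most significant set bit, or 0 if no bit is set.
 */
static unsigned int model_fls_long(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}

	return bits;
}

/*
 * Rate after the pre-scaling step described in the mail. rate must be
 * non-zero. All arithmetic is unsigned 64-bit, so "0 - 1" wraps to all
 * ones and the left shift silently drops the high bits.
 */
static uint64_t model_prescaled_rate(uint64_t rate, uint64_t parent_rate,
				     unsigned int mwidth)
{
	unsigned int scale = model_fls_long(parent_rate / rate - 1);

	if (scale > mwidth)
		rate <<= scale - mwidth;

	return rate;
}
```

With a requested rate of 17000999 the quotient is 0, the subtraction wraps, the scale becomes 64 and the rate is shifted left by 48 bits, which reproduces the 7649082492112076800 seen in the 5.14 log. With a requested rate equal to the parent rate the scale is 0 and the rate is left untouched.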
Re: [PATCH v2 7/7] drm/gud: Add module parameter to control emulation: xrgb8888
Hi

On 07.09.21 at 13:57, Noralf Trønnes wrote:
> For devices that don't support XRGB8888 give the user the ability to
> choose what's most important: Color depth or frames per second.
>
> Add an 'xrgb8888' module parameter to override the emulation format.
>
> Assume the user wants full control if xrgb8888 is set and don't set
> DRM_CAP_DUMB_PREFERRED_DEPTH if RGB565 is supported (AFAIK only X.org
> supports this).

More of a general statement: wouldn't it make more sense to auto-detect
this entirely? The GUD protocol could order the list of supported
formats by preference (maybe it does already). Or you could take the
type of USB connection into account.

Additionally, xrgb8888 is really a fall-back for lazy userspace
programs, but userspace should do better IMHO.

Best regards
Thomas

>
> Signed-off-by: Noralf Trønnes
> ---
>  drivers/gpu/drm/gud/gud_drv.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/gud/gud_drv.c b/drivers/gpu/drm/gud/gud_drv.c
> index 3f9d4b9a1e3d..60d27ee5ddbd 100644
> --- a/drivers/gpu/drm/gud/gud_drv.c
> +++ b/drivers/gpu/drm/gud/gud_drv.c
> @@ -30,6 +30,10 @@
>
>  #include "gud_internal.h"
>
> +static int gud_xrgb8888;
> +module_param_named(xrgb8888, gud_xrgb8888, int, 0644);
> +MODULE_PARM_DESC(xrgb8888, "XRGB8888 emulation format: GUD_PIXEL_FORMAT_* value, 0=auto, -1=disable [default=auto]");
> +
>  /* Only used internally */
>  static const struct drm_format_info gud_drm_format_r1 = {
>  	.format = GUD_DRM_FORMAT_R1,
> @@ -530,12 +534,12 @@ static int gud_probe(struct usb_interface *intf, const struct usb_device_id *id)
>  		case DRM_FORMAT_RGB332:
>  			fallthrough;
>  		case DRM_FORMAT_RGB888:
> -			if (!xrgb8888_emulation_format)
> +			if (!gud_xrgb8888 && !xrgb8888_emulation_format)
>  				xrgb8888_emulation_format = info;
>  			break;
>  		case DRM_FORMAT_RGB565:
>  			rgb565_supported = true;
> -			if (!xrgb8888_emulation_format)
> +			if (!gud_xrgb8888 && !xrgb8888_emulation_format)
>  				xrgb8888_emulation_format = info;
>  			break;
>  		case DRM_FORMAT_XRGB8888:
> @@ -543,6 +547,9 @@ static int gud_probe(struct usb_interface *intf, const struct usb_device_id *id)
>  			break;
>  		}
>
> +		if (gud_xrgb8888 == formats_dev[i])
> +			xrgb8888_emulation_format = info;
> +
>  		fmt_buf_size = drm_format_info_min_pitch(info, 0, drm->mode_config.max_width) *
>  			       drm->mode_config.max_height;
>  		max_buffer_size = max(max_buffer_size, fmt_buf_size);
> @@ -559,7 +566,7 @@ static int gud_probe(struct usb_interface *intf, const struct usb_device_id *id)
>  	}
>
>  	/* Prefer speed over color depth */
> -	if (rgb565_supported)
> +	if (!gud_xrgb8888 && rgb565_supported)
>  		drm->mode_config.preferred_depth = 16;
>
>  	if (!xrgb8888_supported && xrgb8888_emulation_format) {
>

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Managing Director: Felix Imendörffer
Re: [PATCH v3 7/9] dma-buf/fence-chain: Add fence deadline support
On Wed, Sep 8, 2021 at 10:54 AM Daniel Vetter wrote:
>
> On Fri, Sep 03, 2021 at 11:47:58AM -0700, Rob Clark wrote:
> > From: Rob Clark
> >
> > Signed-off-by: Rob Clark
> > ---
> >  drivers/dma-buf/dma-fence-chain.c | 13 +
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> > index 1b4cb3e5cec9..736a9ad3ea6d 100644
> > --- a/drivers/dma-buf/dma-fence-chain.c
> > +++ b/drivers/dma-buf/dma-fence-chain.c
> > @@ -208,6 +208,18 @@ static void dma_fence_chain_release(struct dma_fence *fence)
> >  	dma_fence_free(fence);
> >  }
> >
> > +
> > +static void dma_fence_chain_set_deadline(struct dma_fence *fence,
> > +					 ktime_t deadline)
> > +{
> > +	dma_fence_chain_for_each(fence, fence) {
> > +		struct dma_fence_chain *chain = to_dma_fence_chain(fence);
> > +		struct dma_fence *f = chain ? chain->fence : fence;
>
> Doesn't this just end up calling set_deadline on a chain, potentially
> resulting in recursion? Also I don't think this should ever happen, why
> did you add that?

Tbh the fence-chain was the part I was a bit fuzzy about, and the main
reason I added igt tests. The iteration is similar to how, for ex,
dma_fence_chain_signaled() works, and according to the igt test it does
what was intended.

BR,
-R

> -Daniel
>
> > +
> > +		dma_fence_set_deadline(f, deadline);
> > +	}
> > +}
> > +
> >  const struct dma_fence_ops dma_fence_chain_ops = {
> >  	.use_64bit_seqno = true,
> >  	.get_driver_name = dma_fence_chain_get_driver_name,
> > @@ -215,6 +227,7 @@ const struct dma_fence_ops dma_fence_chain_ops = {
> >  	.enable_signaling = dma_fence_chain_enable_signaling,
> >  	.signaled = dma_fence_chain_signaled,
> >  	.release = dma_fence_chain_release,
> > +	.set_deadline = dma_fence_chain_set_deadline,
> >  };
> >  EXPORT_SYMBOL(dma_fence_chain_ops);
> >
> > --
> > 2.31.1
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Re: [PATCH] doc: gpu: Add document describing buffer exchange
On Sun, Sep 05, 2021 at 01:27:42PM +0100, Daniel Stone wrote:
> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and
> using buffers when crossing context/process/device/subsystem boundaries.
>
> This ties up all of dmabuf, formats and modifiers, and their usage.
>
> Signed-off-by: Daniel Stone
> ---
>
> This is just a quick first draft, inspired by:
>   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637
>
> It's not complete or perfect, but I'm off to eat a roast then have a
> nice walk in the sun, so figured it'd be better to dash it off rather
> than let it rot on my hard drive.
>
>
>  .../gpu/exchanging-pixel-buffers.rst          | 285 ++

I think we should stuff this into the dma-buf.rst page instead of hiding
it in gpu? Maybe then link to it from everywhere, so from the prime
stuff in gpu, and from whatever doc there is for the v4l import/export
ioctls.

>  Documentation/gpu/index.rst                   |   1 +
>  2 files changed, 286 insertions(+)
>  create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst
>
> diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst b/Documentation/gpu/exchanging-pixel-buffers.rst
> new file mode 100644
> index ..75c4de13d5c8
> --- /dev/null
> +++ b/Documentation/gpu/exchanging-pixel-buffers.rst
> @@ -0,0 +1,285 @@
> +.. Copyright 2021 Collabora Ltd.
> +
> +========================
> +Exchanging pixel buffers
> +========================
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and advice.
> +
> +
> +Formats and modifiers
> +=====================
> +
> +Each buffer must have an underlying format. This format describes the data which
> +can be stored and loaded for each pixel. Although each subsystem has its own
> +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should be
> +reused wherever possible, as they are the standard descriptions used for
> +interchange.
> +
> +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
> +the translation between one or more pixels in memory, and the color data
> +contained within that memory. The number and type of color channels are
> +described: whether they are RGB or YUV, integer or floating-point, the size
> +of each channel and their locations within the pixel memory, and the
> +relationship between color planes.
> +
> +For example, `DRM_FORMAT_ARGB8888` describes a format in which each pixel has a
> +single 32-bit value in memory. Alpha, red, green, and blue, color channels are
> +available at 8-bit precision per channel, ordered respectively from most to
> +least significant bits in little-endian storage. As a more complex example,
> +`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
> +stored in separate memory planes, where the chroma plane is stored at half the
> +resolution in both dimensions (i.e. one U/V chroma sample is stored for each 2x2
> +pixel grouping).
> +
> +Format modifiers describe a translation mechanism between these per-pixel memory
> +samples, and the actual memory storage for the buffer. The most straightforward
> +modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel has
> +contiguous storage beginning at (0,0); each pixel's location in memory will be
> +`base + (y * stride) + (x * bpp)`. This is considered the baseline interchange
> +format, and most convenient for CPU access.
> +
> +Modern hardware employs much more sophisticated access mechanisms, typically
> +making use of tiled access and possibly also compression. For example, the
> +`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile in
> +memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
> +stores pixels (4,0) to (7,3) inclusive.
> +
> +Some modifiers may modify the number of memory buffers required to store the
> +data; for example, the `I915_FORMAT_MOD_Y_TILED_CCS` modifier adds a second
> +memory buffer to RGB formats in which it stores data about the status of every
> +tile, notably including whether the tile is fully populated with pixel data, or
> +can be expanded from a single solid color.
> +
> +These extended layouts are highly vendor-spe
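
The two addressing schemes described in the quoted document can be written down as short helpers. This is a sketch under stated assumptions: in the linear formula, "bpp" is read as bytes per pixel, and for the 4x4 tiled case only the tile index is computed, since the quoted text gives the row-major order of tiles but not the order of pixels inside a tile. The function names are hypothetical and the width is assumed to be a multiple of 4.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Byte offset of pixel (x, y) from the buffer base for a linear layout:
 * (y * stride) + (x * bytes_per_pixel), with stride in bytes.
 */
static size_t linear_offset(uint32_t x, uint32_t y, size_t stride,
			    size_t bytes_per_pixel)
{
	return (size_t)y * stride + (size_t)x * bytes_per_pixel;
}

/*
 * Index of the 4x4 tile that contains pixel (x, y), with tiles laid out
 * in row-major order across an image that is width pixels wide.
 */
static size_t tile4x4_index(uint32_t x, uint32_t y, uint32_t width)
{
	size_t tiles_per_row = width / 4;

	return (size_t)(y / 4) * tiles_per_row + x / 4;
}
```

For an 8-pixel-wide image, pixels (0,0) to (3,3) land in tile 0 and pixels (4,0) to (7,3) in tile 1, matching the example in the text.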
[PATCH 1/2] drm/bridge: parade-ps8640: Use regmap APIs
Replace the direct i2c access (i2c_smbus_* functions) with regmap APIs,
which will simplify the future update on ps8640 driver.

Signed-off-by: Philip Chen
---
 drivers/gpu/drm/bridge/parade-ps8640.c | 66 +++---
 1 file changed, 39 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c
index 685e9c38b2db..a16725dbf912 100644
--- a/drivers/gpu/drm/bridge/parade-ps8640.c
+++ b/drivers/gpu/drm/bridge/parade-ps8640.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -64,12 +65,29 @@ struct ps8640 {
 	struct drm_bridge *panel_bridge;
 	struct mipi_dsi_device *dsi;
 	struct i2c_client *page[MAX_DEVS];
+	struct regmap *regmap[MAX_DEVS];
 	struct regulator_bulk_data supplies[2];
 	struct gpio_desc *gpio_reset;
 	struct gpio_desc *gpio_powerdown;
 	bool powered;
 };
 
+static const struct regmap_range ps8640_volatile_ranges[] = {
+	{ .range_min = 0, .range_max = 0xff },
+};
+
+static const struct regmap_access_table ps8640_volatile_table = {
+	.yes_ranges = ps8640_volatile_ranges,
+	.n_yes_ranges = ARRAY_SIZE(ps8640_volatile_ranges),
+};
+
+static const struct regmap_config ps8640_regmap_config = {
+	.reg_bits = 8,
+	.val_bits = 8,
+	.volatile_table = &ps8640_volatile_table,
+	.cache_type = REGCACHE_NONE,
+};
+
 static inline struct ps8640 *bridge_to_ps8640(struct drm_bridge *e)
 {
 	return container_of(e, struct ps8640, bridge);
@@ -78,13 +96,13 @@ static inline struct ps8640 *bridge_to_ps8640(struct drm_bridge *e)
 static int ps8640_bridge_vdo_control(struct ps8640 *ps_bridge,
 				     const enum ps8640_vdo_control ctrl)
 {
-	struct i2c_client *client = ps_bridge->page[PAGE3_DSI_CNTL1];
-	u8 vdo_ctrl_buf[] = { VDO_CTL_ADD, ctrl };
+	struct regmap *map = ps_bridge->regmap[PAGE3_DSI_CNTL1];
+	u8 vdo_ctrl_buf[] = {VDO_CTL_ADD, ctrl};
 	int ret;
 
-	ret = i2c_smbus_write_i2c_block_data(client, PAGE3_SET_ADD,
-					     sizeof(vdo_ctrl_buf),
-					     vdo_ctrl_buf);
+	ret = regmap_bulk_write(map, PAGE3_SET_ADD,
+				vdo_ctrl_buf, sizeof(vdo_ctrl_buf));
+
 	if (ret < 0) {
 		DRM_ERROR("failed to %sable VDO: %d\n",
 			  ctrl == ENABLE ? "en" : "dis", ret);
@@ -96,8 +114,7 @@ static int ps8640_bridge_vdo_control(struct ps8640 *ps_bridge,
 
 static void ps8640_bridge_poweron(struct ps8640 *ps_bridge)
 {
-	struct i2c_client *client = ps_bridge->page[PAGE2_TOP_CNTL];
-	unsigned long timeout;
+	struct regmap *map = ps_bridge->regmap[PAGE2_TOP_CNTL];
 	int ret, status;
 
 	if (ps_bridge->powered)
@@ -121,18 +138,12 @@ static void ps8640_bridge_poweron(struct ps8640 *ps_bridge)
 	 */
 	msleep(200);
 
-	timeout = jiffies + msecs_to_jiffies(200) + 1;
+	ret = regmap_read_poll_timeout(map, PAGE2_GPIO_H, status,
+			status & PS_GPIO9, 20 * 1000, 200 * 1000);
 
-	while (time_is_after_jiffies(timeout)) {
-		status = i2c_smbus_read_byte_data(client, PAGE2_GPIO_H);
-		if (status < 0) {
-			DRM_ERROR("failed read PAGE2_GPIO_H: %d\n", status);
-			goto err_regulators_disable;
-		}
-		if ((status & PS_GPIO9) == PS_GPIO9)
-			break;
-
-		msleep(20);
+	if (ret < 0) {
+		DRM_ERROR("failed read PAGE2_GPIO_H: %d\n", ret);
+		goto err_regulators_disable;
 	}
 
 	msleep(50);
@@ -144,22 +155,15 @@ static void ps8640_bridge_poweron(struct ps8640 *ps_bridge)
 	 * disabled by the manufacturer. Once disabled, all MCS commands are
 	 * ignored by the display interface.
 	 */
-	status = i2c_smbus_read_byte_data(client, PAGE2_MCS_EN);
-	if (status < 0) {
-		DRM_ERROR("failed read PAGE2_MCS_EN: %d\n", status);
-		goto err_regulators_disable;
-	}
 
-	ret = i2c_smbus_write_byte_data(client, PAGE2_MCS_EN,
-					status & ~MCS_EN);
+	ret = regmap_update_bits(map, PAGE2_MCS_EN, MCS_EN, 0);
 	if (ret < 0) {
 		DRM_ERROR("failed write PAGE2_MCS_EN: %d\n", ret);
 		goto err_regulators_disable;
 	}
 
 	/* Switch access edp panel's edid through i2c */
-	ret = i2c_smbus_write_byte_data(client, PAGE2_I2C_BYPASS,
-					I2C_BYPASS_EN);
+	ret = regmap_write(map, PAGE2_I2C_BYPASS, I2C_BYPASS_EN);
 	if (ret < 0) {
 		DRM_ERROR("failed write PAGE2_I2C_BYPASS: %d\n", ret);
 		goto err_regulators_disable;
@@ -361,6 +365,10 @@ static int ps8640_probe(struct i2c_client *client)
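
The MCS_EN hunk above swaps an open-coded read, mask, write sequence for a single regmap_update_bits() call. As a rough userspace illustration of why the two are equivalent, the sketch below mocks an 8-bit register file and implements the read-modify-write rule new = (old & ~mask) | (val & mask). Everything here is hypothetical scaffolding: the mock_* helpers, the register offset and the bit value are invented for the example and are not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

#define REG_MCS_EN 0xf3u	/* hypothetical register offset */
#define MCS_EN     0x01u	/* hypothetical bit value */

static uint8_t regs[256];	/* mock 8-bit register file */

static int mock_read(uint8_t reg, uint8_t *val)
{
	*val = regs[reg];
	return 0;
}

static int mock_write(uint8_t reg, uint8_t val)
{
	regs[reg] = val;
	return 0;
}

/* Read-modify-write: only the bits selected by mask are changed. */
static int mock_update_bits(uint8_t reg, uint8_t mask, uint8_t val)
{
	uint8_t old, updated;
	int ret = mock_read(reg, &old);

	if (ret < 0)
		return ret;

	updated = (uint8_t)((old & ~mask) | (val & mask));
	if (updated == old)
		return 0;	/* nothing to write */

	return mock_write(reg, updated);
}

/* Clearing MCS_EN the way the converted hunk does, on the mock. */
static void example_clear_mcs_en(void)
{
	regs[REG_MCS_EN] = 0xff;
	assert(mock_update_bits(REG_MCS_EN, MCS_EN, 0) == 0);
	assert(regs[REG_MCS_EN] == 0xfe);
}
```

One error path instead of two is the visible gain in the patch; the mock only shows the bit arithmetic, not the I2C transport or locking that the real regmap core provides.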
[PATCH 2/2] drm/bridge: parade-ps8640: Add support for AUX channel
Implement the first version of AUX support, which will be useful as
we expand the driver to support varied use cases.

Signed-off-by: Philip Chen
---
 drivers/gpu/drm/bridge/parade-ps8640.c | 123 +
 1 file changed, 123 insertions(+)

diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c
index a16725dbf912..3f0241a60357 100644
--- a/drivers/gpu/drm/bridge/parade-ps8640.c
+++ b/drivers/gpu/drm/bridge/parade-ps8640.c
@@ -9,15 +9,36 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
+#include
 #include
 #include
 #include
 #include
 
+#define PAGE0_AUXCH_CFG3 0x76
+#define AUXCH_CFG3_RESET 0xff
+#define PAGE0_AUX_ADDR_7_0 0x7d
+#define PAGE0_AUX_ADDR_15_8 0x7e
+#define PAGE0_AUX_ADDR_23_16 0x7f
+#define AUX_ADDR_19_16_MASK GENMASK(3, 0)
+#define AUX_CMD_MASK GENMASK(7, 4)
+#define PAGE0_AUX_LENGTH 0x80
+#define AUX_LENGTH_MASK GENMASK(3, 0)
+#define PAGE0_AUX_WDATA 0x81
+#define PAGE0_AUX_RDATA 0x82
+#define PAGE0_AUX_CTRL 0x83
+#define AUX_START 0x01
+#define PAGE0_AUX_STATUS 0x84
+#define AUX_STATUS_MASK GENMASK(7, 5)
+#define AUX_STATUS_TIMEOUT (0x7 << 5)
+#define AUX_STATUS_DEFER (0x2 << 5)
+#define AUX_STATUS_NACK (0x1 << 5)
+
 #define PAGE2_GPIO_H 0xa7
 #define PS_GPIO9 BIT(1)
 #define PAGE2_I2C_BYPASS 0xea
@@ -63,6 +84,7 @@ enum ps8640_vdo_control {
 struct ps8640 {
 	struct drm_bridge bridge;
 	struct drm_bridge *panel_bridge;
+	struct drm_dp_aux aux;
 	struct mipi_dsi_device *dsi;
 	struct i2c_client *page[MAX_DEVS];
 	struct regmap *regmap[MAX_DEVS];
@@ -93,6 +115,102 @@ static inline struct ps8640 *bridge_to_ps8640(struct drm_bridge *e)
 	return container_of(e, struct ps8640, bridge);
 }
 
+static inline struct ps8640 *aux_to_ps8640(struct drm_dp_aux *aux)
+{
+	return container_of(aux, struct ps8640, aux);
+}
+
+static ssize_t ps8640_aux_transfer(struct drm_dp_aux *aux,
+				   struct drm_dp_aux_msg *msg)
+{
+	struct ps8640 *ps_bridge = aux_to_ps8640(aux);
+	struct i2c_client *client = ps_bridge->page[PAGE0_DP_CNTL];
+	struct regmap *map = ps_bridge->regmap[PAGE0_DP_CNTL];
+	unsigned int len = msg->size;
+	unsigned int data;
+	int ret;
+	u8 request = msg->request &
+		     ~(DP_AUX_I2C_MOT | DP_AUX_I2C_WRITE_STATUS_UPDATE);
+	u8 *buf = msg->buffer;
+	bool is_native_aux = false;
+
+	if (len > DP_AUX_MAX_PAYLOAD_BYTES)
+		return -EINVAL;
+
+	pm_runtime_get_sync(&client->dev);
+
+	switch (request) {
+	case DP_AUX_NATIVE_WRITE:
+	case DP_AUX_NATIVE_READ:
+		is_native_aux = true;
+	case DP_AUX_I2C_WRITE:
+	case DP_AUX_I2C_READ:
+		regmap_write(map, PAGE0_AUXCH_CFG3, AUXCH_CFG3_RESET);
+		break;
+	default:
+		ret = -EINVAL;
+		goto exit;
+	}
+
+	/* Assume it's good */
+	msg->reply = 0;
+
+	data = ((request << 4) & AUX_CMD_MASK) |
+	       ((msg->address >> 16) & AUX_ADDR_19_16_MASK);
+	regmap_write(map, PAGE0_AUX_ADDR_23_16, data);
+	data = (msg->address >> 8) & 0xff;
+	regmap_write(map, PAGE0_AUX_ADDR_15_8, data);
+	data = msg->address & 0xff;
+	regmap_write(map, PAGE0_AUX_ADDR_7_0, msg->address & 0xff);
+
+	data = (len - 1) & AUX_LENGTH_MASK;
+	regmap_write(map, PAGE0_AUX_LENGTH, data);
+
+	if (request == DP_AUX_NATIVE_WRITE || request == DP_AUX_I2C_WRITE) {
+		ret = regmap_noinc_write(map, PAGE0_AUX_WDATA, buf, len);
+		if (ret < 0) {
+			DRM_ERROR("failed to write PAGE0_AUX_WDATA");
+			goto exit;
+		}
+	}
+
+	regmap_write(map, PAGE0_AUX_CTRL, AUX_START);
+
+	regmap_read(map, PAGE0_AUX_STATUS, &data);
+	switch (data & AUX_STATUS_MASK) {
+	case AUX_STATUS_DEFER:
+		if (is_native_aux)
+			msg->reply |= DP_AUX_NATIVE_REPLY_DEFER;
+		else
+			msg->reply |= DP_AUX_I2C_REPLY_DEFER;
+		goto exit;
+	case AUX_STATUS_NACK:
+		if (is_native_aux)
+			msg->reply |= DP_AUX_NATIVE_REPLY_NACK;
+		else
+			msg->reply |= DP_AUX_I2C_REPLY_NACK;
+		goto exit;
+	case AUX_STATUS_TIMEOUT:
+		ret = -ETIMEDOUT;
+		goto exit;
+	}
+
+	if (request == DP_AUX_NATIVE_READ || request == DP_AUX_I2C_READ) {
+		ret = regmap_noinc_read(map, PAGE0_AUX_RDATA, buf, len);
+		if (ret < 0)
+			DRM_ERROR("failed to read PAGE0_AUX_RDATA");
+	}
+
+exit:
+	pm_runtime_mark_last_busy
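
For readers following the register writes in ps8640_aux_transfer(), the sketch below reproduces just the packing arithmetic in plain userspace C: the 4-bit AUX command shares a byte with address bits 19:16, and the length register holds len - 1. The masks are the GENMASK() values from the patch written out as constants. The struct and function names are invented for the example, and the sketch assumes 1 <= len <= 16, so a zero-length (address-only) transaction is out of scope here.

```c
#include <assert.h>
#include <stdint.h>

#define AUX_ADDR_19_16_MASK 0x0fu	/* GENMASK(3, 0) */
#define AUX_CMD_MASK        0xf0u	/* GENMASK(7, 4) */
#define AUX_LENGTH_MASK     0x0fu	/* GENMASK(3, 0) */

/* Values destined for PAGE0_AUX_ADDR_23_16/15_8/7_0 and PAGE0_AUX_LENGTH. */
struct aux_header {
	uint8_t addr_23_16;	/* command in bits 7:4, address 19:16 in bits 3:0 */
	uint8_t addr_15_8;
	uint8_t addr_7_0;
	uint8_t length;		/* payload length minus one */
};

static struct aux_header pack_aux_header(uint8_t request, uint32_t address,
					 unsigned int len)
{
	struct aux_header h;

	assert(len >= 1 && len <= 16);	/* assumption of this sketch */

	h.addr_23_16 = (uint8_t)(((request << 4) & AUX_CMD_MASK) |
				 ((address >> 16) & AUX_ADDR_19_16_MASK));
	h.addr_15_8 = (uint8_t)((address >> 8) & 0xff);
	h.addr_7_0 = (uint8_t)(address & 0xff);
	h.length = (uint8_t)((len - 1) & AUX_LENGTH_MASK);
	return h;
}
```

For instance, a native read (request value 0x9 in the DP AUX encoding) of one byte at DPCD address 0x00202 packs to 0x90, 0x02, 0x02 with a length field of 0.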
Re: [PATCH v3 8/9] dma-buf/sync_file: Add SET_DEADLINE ioctl
On Wed, Sep 8, 2021 at 10:50 AM Daniel Vetter wrote:
>
> On Fri, Sep 03, 2021 at 11:47:59AM -0700, Rob Clark wrote:
> > From: Rob Clark
> >
> > The initial purpose is for igt tests, but this would also be useful for
> > compositors that wait until close to vblank deadline to make decisions
> > about which frame to show.
> >
> > Signed-off-by: Rob Clark
>
> Needs userspace and I think ideally also some igts to make sure it works
> and doesn't go boom.

See cover-letter.. there are igt tests, although currently that is the
only user.

I'd be ok to otherwise initially restrict this and the sw_sync UABI
(CAP_SYS_ADMIN? Or??) until there is a non-igt user, but they are
both needed by the igt tests

BR,
-R

> -Daniel
>
> > ---
> >  drivers/dma-buf/sync_file.c    | 19 +++
> >  include/uapi/linux/sync_file.h | 20
> >  2 files changed, 39 insertions(+)
> >
> > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > index 394e6e1e9686..f295772d5169 100644
> > --- a/drivers/dma-buf/sync_file.c
> > +++ b/drivers/dma-buf/sync_file.c
> > @@ -459,6 +459,22 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
> >  	return ret;
> >  }
> >
> > +static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
> > +					unsigned long arg)
> > +{
> > +	struct sync_set_deadline ts;
> > +
> > +	if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
> > +		return -EFAULT;
> > +
> > +	if (ts.pad)
> > +		return -EINVAL;
> > +
> > +	dma_fence_set_deadline(sync_file->fence, ktime_set(ts.tv_sec, ts.tv_nsec));
> > +
> > +	return 0;
> > +}
> > +
> >  static long sync_file_ioctl(struct file *file, unsigned int cmd,
> >  			    unsigned long arg)
> >  {
> > @@ -471,6 +487,9 @@ static long sync_file_ioctl(struct file *file, unsigned int cmd,
> >  	case SYNC_IOC_FILE_INFO:
> >  		return sync_file_ioctl_fence_info(sync_file, arg);
> >
> > +	case SYNC_IOC_SET_DEADLINE:
> > +		return sync_file_ioctl_set_deadline(sync_file, arg);
> > +
> >  	default:
> >  		return -ENOTTY;
> >  	}
> > diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
> > index ee2dcfb3d660..f67d4ffe7566 100644
> > --- a/include/uapi/linux/sync_file.h
> > +++ b/include/uapi/linux/sync_file.h
> > @@ -67,6 +67,18 @@ struct sync_file_info {
> >  	__u64 sync_fence_info;
> >  };
> >
> > +/**
> > + * struct sync_set_deadline - set a deadline on a fence
> > + * @tv_sec: seconds elapsed since epoch
> > + * @tv_nsec: nanoseconds elapsed since the time given by the tv_sec
> > + * @pad: must be zero
> > + */
> > +struct sync_set_deadline {
> > +	__s64 tv_sec;
> > +	__s32 tv_nsec;
> > +	__u32 pad;
> > +};
> > +
> >  #define SYNC_IOC_MAGIC '>'
> >
> >  /**
> > @@ -95,4 +107,12 @@ struct sync_file_info {
> >   */
> >  #define SYNC_IOC_FILE_INFO _IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
> >
> > +
> > +/**
> > + * DOC: SYNC_IOC_SET_DEADLINE - set a deadline on a fence
> > + *
> > + * Allows userspace to set a deadline on a fence, see dma_fence_set_deadline()
> > + */
> > +#define SYNC_IOC_SET_DEADLINE _IOW(SYNC_IOC_MAGIC, 5, struct sync_set_deadline)
> > +
> >  #endif /* _UAPI_LINUX_SYNC_H */
> > --
> > 2.31.1
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
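
For anyone wanting to poke at the proposed ioctl from userspace, a minimal sketch follows. It mirrors the struct layout and ioctl number from the quoted patch under local rfc_ names so it does not clash with whatever an installed linux/sync_file.h defines; the layout in any merged kernel may differ from this RFC. The helper names are hypothetical, and the clock base the caller should use for "now" is an assumption: the patch passes the value to ktime_set(), so CLOCK_MONOTONIC looks like the natural fit, but the quoted kerneldoc does not say.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <time.h>

/* Local mirror of struct sync_set_deadline as posted in the RFC patch. */
struct rfc_sync_set_deadline {
	int64_t  tv_sec;
	int32_t  tv_nsec;
	uint32_t pad;	/* must be zero, the handler rejects anything else */
};

#define RFC_SYNC_IOC_MAGIC '>'
#define RFC_SYNC_IOC_SET_DEADLINE \
	_IOW(RFC_SYNC_IOC_MAGIC, 5, struct rfc_sync_set_deadline)

/* Build a deadline `ms` milliseconds after `now`, normalising tv_nsec. */
static struct rfc_sync_set_deadline deadline_after(struct timespec now, long ms)
{
	struct rfc_sync_set_deadline d = { 0 };
	long nsec = now.tv_nsec + (ms % 1000) * 1000000L;

	d.tv_sec = (int64_t)now.tv_sec + ms / 1000 + nsec / 1000000000L;
	d.tv_nsec = (int32_t)(nsec % 1000000000L);
	return d;
}

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
static int set_fence_deadline(int fence_fd, struct timespec now, long ms)
{
	struct rfc_sync_set_deadline d = deadline_after(now, ms);

	return ioctl(fence_fd, RFC_SYNC_IOC_SET_DEADLINE, &d);
}
```

A compositor-style caller would read the current time, compute the time left until the next vblank in milliseconds, and call set_fence_deadline() on the sync_file fd it is about to wait on. On a kernel without the patch the ioctl is expected to fail with ENOTTY, per the default case in the quoted sync_file_ioctl().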
Re: [PATCH 1/2] drm/nouveau/ga102-: support ttm buffer moves via copy engine
On Mon, Sep 06, 2021 at 10:56:27AM +1000, Ben Skeggs wrote:
> From: Ben Skeggs
>
> We don't currently have any kind of real acceleration on Ampere GPUs,
> but the TTM memcpy() fallback paths aren't really designed to handle
> copies between different devices, such as on Optimus systems, and
> result in a kernel OOPS.

Is this just for moving a buffer from vram to system memory when you pin
it for dma-buf? I'm kinda lost what you even use ttm bo moves for if
there's no one using the gpu.

Also I guess memcpy goes boom if you can't mmap it because it's outside
the gart? Or just that it's very slow. We're trying to use ttm memcpy as
fallback, so want to know how this can all go wrong :-)
-Daniel

>
> A few options were investigated to try and fix this, but didn't work
> out, and likely would have resulted in a very unpleasant experience
> for users anyway.
>
> This commit adds just enough support for setting up a single channel
> connected to a copy engine, which the kernel can use to accelerate
> the buffer copies between devices. Userspace has no access to this
> incomplete channel support, but it's suitable for TTM's needs.
>
> A more complete implementation of host(fifo) for Ampere GPUs is in
> the works, but the required changes are far too invasive that they
> would be unsuitable to backport to fix this issue on current kernels.
>
> Signed-off-by: Ben Skeggs
> Cc: Lyude Paul
> Cc: Karol Herbst
> Cc: # v5.12+
> ---
>  drivers/gpu/drm/nouveau/include/nvif/class.h  |   2 +
>  .../drm/nouveau/include/nvkm/engine/fifo.h    |   1 +
>  drivers/gpu/drm/nouveau/nouveau_bo.c          |   1 +
>  drivers/gpu/drm/nouveau/nouveau_chan.c        |   6 +-
>  drivers/gpu/drm/nouveau/nouveau_drm.c         |   4 +
>  drivers/gpu/drm/nouveau/nv84_fence.c          |   2 +-
>  .../gpu/drm/nouveau/nvkm/engine/device/base.c |   3 +
>  .../gpu/drm/nouveau/nvkm/engine/fifo/Kbuild   |   1 +
>  .../gpu/drm/nouveau/nvkm/engine/fifo/ga102.c  | 308 ++
>  .../gpu/drm/nouveau/nvkm/subdev/top/ga100.c   |   7 +-
>  10 files changed, 329 insertions(+), 6 deletions(-)
>  create mode 100644 drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga102.c
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvif/class.h b/drivers/gpu/drm/nouveau/include/nvif/class.h
> index c68cc957248e..a582c0cb0cb0 100644
> --- a/drivers/gpu/drm/nouveau/include/nvif/class.h
> +++ b/drivers/gpu/drm/nouveau/include/nvif/class.h
> @@ -71,6 +71,7 @@
>  #define PASCAL_CHANNEL_GPFIFO_A /* cla06f.h */ 0xc06f
>  #define VOLTA_CHANNEL_GPFIFO_A /* clc36f.h */ 0xc36f
>  #define TURING_CHANNEL_GPFIFO_A /* clc36f.h */ 0xc46f
> +#define AMPERE_CHANNEL_GPFIFO_B /* clc36f.h */ 0xc76f
>
>  #define NV50_DISP /* cl5070.h */ 0x5070
>  #define G82_DISP /* cl5070.h */ 0x8270
> @@ -200,6 +201,7 @@
>  #define PASCAL_DMA_COPY_B 0xc1b5
>  #define VOLTA_DMA_COPY_A 0xc3b5
>  #define TURING_DMA_COPY_A 0xc5b5
> +#define AMPERE_DMA_COPY_B 0xc7b5
>
>  #define FERMI_DECOMPRESS 0x90b8
>
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> index 54fab7cc36c1..64ee82c7c1be 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> @@ -77,4 +77,5 @@ int gp100_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct
>  int gp10b_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct nvkm_fifo **);
>  int gv100_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct nvkm_fifo **);
>  int tu102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct nvkm_fifo **);
> +int ga102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, struct nvkm_fifo **);
>  #endif
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 4a7cebac8060..b3e4f555fa05 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -844,6 +844,7 @@ nouveau_bo_move_init(struct nouveau_drm *drm)
>  			    struct ttm_resource *, struct ttm_resource *);
>  		int (*init)(struct nouveau_channel *, u32 handle);
>  	} _methods[] = {
> +		{ "COPY", 4, 0xc7b5, nve0_bo_move_copy, nve0_bo_move_init },
>  		{ "COPY", 4, 0xc5b5, nve0_bo_move_copy, nve0_bo_move_init },
>  		{ "GRCE", 0, 0xc5b5, nve0_bo_move_copy, nvc0_bo_move_init },
>  		{ "COPY", 4, 0xc3b5, nve0_bo_move_copy, nve0_bo_move_i
Re: [PATCH] drm: mxsfb: Fix NULL pointer dereference crash on unload
On Tue, Sep 07, 2021 at 04:49:00AM +0200, Marek Vasut wrote:
> The mxsfb->crtc.funcs may already be NULL when unloading the driver,
> in which case calling mxsfb_irq_disable() via drm_irq_uninstall() from
> mxsfb_unload() leads to NULL pointer dereference.
>
> Since all we care about is masking the IRQ and mxsfb->base is still
> valid, just use that to clear and mask the IRQ.
>
> Fixes: ae1ed00932819 ("drm: mxsfb: Stop using DRM simple display pipeline helper")
> Signed-off-by: Marek Vasut
> Cc: Daniel Abrecht
> Cc: Emil Velikov
> Cc: Laurent Pinchart
> Cc: Sam Ravnborg
> Cc: Stefan Agner

You probably want a drm_atomic_helper_shutdown instead of trying to do all
that manually. We've also added a bunch more devm and drmm_ functions to
automate the cleanup a lot more here, e.g. your drm_mode_config_cleanup is
in the wrong place.

Also I'm confused because I'm not even seeing this function anywhere in
upstream.
-Daniel

> ---
>  drivers/gpu/drm/mxsfb/mxsfb_drv.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/mxsfb/mxsfb_drv.c b/drivers/gpu/drm/mxsfb/mxsfb_drv.c
> index ec0432fe1bdf8..86d78634a9799 100644
> --- a/drivers/gpu/drm/mxsfb/mxsfb_drv.c
> +++ b/drivers/gpu/drm/mxsfb/mxsfb_drv.c
> @@ -173,7 +173,11 @@ static void mxsfb_irq_disable(struct drm_device *drm)
>  	struct mxsfb_drm_private *mxsfb = drm->dev_private;
>
>  	mxsfb_enable_axi_clk(mxsfb);
> -	mxsfb->crtc.funcs->disable_vblank(&mxsfb->crtc);
> +
> +	/* Disable and clear VBLANK IRQ */
> +	writel(CTRL1_CUR_FRAME_DONE_IRQ_EN, mxsfb->base + LCDC_CTRL1 + REG_CLR);
> +	writel(CTRL1_CUR_FRAME_DONE_IRQ, mxsfb->base + LCDC_CTRL1 + REG_CLR);
> +
>  	mxsfb_disable_axi_clk(mxsfb);
>  }
>
> --
> 2.33.0
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap
On Tue, Sep 07, 2021 at 11:28:23AM +0200, Christian König wrote:
> On 07.09.21 at 11:05, Daniel Vetter wrote:
> > On Tue, Sep 07, 2021 at 08:22:20AM +0200, Christian König wrote:
> > > Added a Fixes tag and pushed this to drm-misc-fixes.
> > We're in the merge window, this should have been drm-misc-next-fixes. I'll
> > poke misc maintainers so it's not lost.
>
> Hui? It's a fix for a problem in stable and not in drm-misc-next.

Ah the flow chart is confusing. There is no current -rc, so it's always
-next-fixes. Or you're running the risk that it's lost until after -rc1.
Maybe we should clarify that "is the bug in current -rc?" only applies if
there is a current -rc.

Anyway Thomas sent out a pr, so it's all good.
-Daniel

>
> Christian.
>
> > -Daniel
> >
> > > It will take a while until it cycles back into the development branches, so
> > > feel free to push some version to amd-staging-drm-next as well. Just ping
> > > Alex when you do this.
> > >
> > > Thanks,
> > > Christian.
> > >
> > > On 07.09.21 at 06:08, xinhui pan wrote:
> > > > The ret value might be -EBUSY, caller will think lru lock is still
> > > > locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
> > > > list corruption.
> > > >
> > > > ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
> > > > caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
> > > > be stuck as we actually did not free any BO memory. This usually happens
> > > > when the fence is not signaled for a long time.
> > > >
> > > > Signed-off-by: xinhui pan
> > > > Reviewed-by: Christian König
> > > > ---
> > > >  drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
> > > >  1 file changed, 3 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > index 8d7fd65ccced..23f906941ac9 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > @@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> > > >  	}
> > > >
> > > >  	if (bo->deleted) {
> > > > -		ttm_bo_cleanup_refs(bo, false, false, locked);
> > > > +		ret = ttm_bo_cleanup_refs(bo, false, false, locked);
> > > >  		ttm_bo_put(bo);
> > > > -		return 0;
> > > > +		return ret == -EBUSY ? -ENOSPC : ret;
> > > >  	}
> > > >
> > > >  	ttm_bo_del_from_lru(bo);
> > > > @@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
> > > >  	if (locked)
> > > >  		dma_resv_unlock(bo->base.resv);
> > > >  	ttm_bo_put(bo);
> > > > -	return ret;
> > > > +	return ret == -EBUSY ? -ENOSPC : ret;
> > > >  }
> > > >
> > > >  void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH] kernel/locking: Add context to ww_mutex_trylock.
On Wed, Sep 08, 2021 at 12:14:23PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 07, 2021 at 03:20:44PM +0200, Maarten Lankhorst wrote:
> > i915 will soon gain an eviction path that trylock a whole lot of locks
> > for eviction, getting dmesg failures like below:
> >
> > BUG: MAX_LOCK_DEPTH too low!
> > turning off the locking correctness validator.
> > depth: 48 max: 48!
> > 48 locks held by i915_selftest/5776:
> > #0: 888101a79240 (&dev->mutex){}-{3:3}, at: __driver_attach+0x88/0x160
> > #1: c99778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
> > #2: 88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
> > #3: 88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: i915_vma_pin_ww+0x1c4/0x9d0 [i915]
> > #4: 88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
> > #5: 88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
> > ...
> > #46: 88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
> > #47: 88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: i915_gem_evict_something+0x110/0x860 [i915]
> > INFO: lockdep is turned off.
>
> > As an intermediate solution, add an acquire context to ww_mutex_trylock,
> > which allows us to do proper nesting annotations on the trylocks, making
> > the above lockdep splat disappear.
>
> Fair enough I suppose.

What's maybe missing from the commit message
- we'll probably use this for ttm too eventually
- even when we add full ww_mutex locking we'll still have the trylock
  fastpath. This is because we have a lock inversion against list locks in
  these eviction paths, and the slow path unroll to drop that list lock is
  a bit nasty (and definitely expensive).

iow even long term this here is needed in some form I think.
-Daniel

>
> > +/**
> > + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire context
> > + * @lock: mutex to lock
> > + * @ctx: optional w/w acquire context
> > + *
> > + * Trylocks a mutex with the optional acquire context; no deadlock detection is
> > + * possible. Returns 1 if the mutex has been acquired successfully, 0 otherwise.
> > + *
> > + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a @ctx is
> > + * specified, -EALREADY and -EDEADLK handling may happen in calls to ww_mutex_lock.
> > + *
> > + * A mutex acquired with this function must be released with ww_mutex_unlock.
> > + */
> > +int __sched
> > +ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ctx)
> > +{
> > +	bool locked;
> > +
> > +	if (!ctx)
> > +		return mutex_trylock(&ww->base);
> > +
> > +#ifdef CONFIG_DEBUG_MUTEXES
> > +	DEBUG_LOCKS_WARN_ON(ww->base.magic != &ww->base);
> > +#endif
> > +
> > +	preempt_disable();
> > +	locked = __mutex_trylock(&ww->base);
> > +
> > +	if (locked) {
> > +		ww_mutex_set_context_fastpath(ww, ctx);
> > +		mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ctx->dep_map, _RET_IP_);
> > +	}
> > +	preempt_enable();
> > +
> > +	return locked;
> > +}
> > +EXPORT_SYMBOL(ww_mutex_trylock);
>
> You'll need a similar hunk in ww_rt_mutex.c

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH] drm/plane-helper: fix uninitialized variable reference
On Tue, Sep 07, 2021 at 10:08:36AM -0400, Alex Xu (Hello71) wrote:
> drivers/gpu/drm/drm_plane_helper.c: In function 'drm_primary_helper_update':
> drivers/gpu/drm/drm_plane_helper.c:113:32: error: 'visible' is used uninitialized [-Werror=uninitialized]
>   113 | struct drm_plane_state plane_state = {
>       | ^~~
> drivers/gpu/drm/drm_plane_helper.c:178:14: note: 'visible' was declared here
>   178 | bool visible;
>       | ^~~
> cc1: all warnings being treated as errors
>
> visible is an output, not an input. in practice this use might turn out
> OK but it's still UB.
>
> Fixes: df86af9133 ("drm/plane-helper: Add drm_plane_helper_check_state()")

I need a signed-off-by from you before I can merge this. See

https://dri.freedesktop.org/docs/drm/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin

Patch lgtm otherwise.
-Daniel

> ---
>  drivers/gpu/drm/drm_plane_helper.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/drm_plane_helper.c b/drivers/gpu/drm/drm_plane_helper.c
> index 5b2d0ca03705..838b32b70bce 100644
> --- a/drivers/gpu/drm/drm_plane_helper.c
> +++ b/drivers/gpu/drm/drm_plane_helper.c
> @@ -123,7 +123,6 @@ static int drm_plane_helper_check_update(struct drm_plane *plane,
>  		.crtc_w = drm_rect_width(dst),
>  		.crtc_h = drm_rect_height(dst),
>  		.rotation = rotation,
> -		.visible = *visible,
>  	};
>  	struct drm_crtc_state crtc_state = {
>  		.crtc = crtc,
> --
> 2.33.0
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH v3 7/9] dma-buf/fence-chain: Add fence deadline support
On Wed, Sep 08, 2021 at 11:19:15AM -0700, Rob Clark wrote:
> On Wed, Sep 8, 2021 at 10:54 AM Daniel Vetter wrote:
> >
> > On Fri, Sep 03, 2021 at 11:47:58AM -0700, Rob Clark wrote:
> > > From: Rob Clark
> > >
> > > Signed-off-by: Rob Clark
> > > ---
> > >  drivers/dma-buf/dma-fence-chain.c | 13 +
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/drivers/dma-buf/dma-fence-chain.c b/drivers/dma-buf/dma-fence-chain.c
> > > index 1b4cb3e5cec9..736a9ad3ea6d 100644
> > > --- a/drivers/dma-buf/dma-fence-chain.c
> > > +++ b/drivers/dma-buf/dma-fence-chain.c
> > > @@ -208,6 +208,18 @@ static void dma_fence_chain_release(struct dma_fence *fence)
> > >  	dma_fence_free(fence);
> > >  }
> > >
> > > +
> > > +static void dma_fence_chain_set_deadline(struct dma_fence *fence,
> > > +					 ktime_t deadline)
> > > +{
> > > +	dma_fence_chain_for_each(fence, fence) {
> > > +		struct dma_fence_chain *chain = to_dma_fence_chain(fence);
> > > +		struct dma_fence *f = chain ? chain->fence : fence;
> >
> > Doesn't this just end up calling set_deadline on a chain, potentially
> > resulting in recursion? Also I don't think this should ever happen, why
> > did you add that?
>
> Tbh the fence-chain was the part I was a bit fuzzy about, and the main
> reason I added igt tests. The iteration is similar to how, for ex,
> dma_fence_chain_signaled() works, and according to the igt test it does
> what was intended

Huh indeed. Maybe something we should fix, like why does the
dma_fence_chain_for_each not give you the upcast chain pointer ... I guess
this also needs more Christian and less me.
-Daniel

>
> BR,
> -R
>
> > -Daniel
> >
> > > +
> > > +		dma_fence_set_deadline(f, deadline);
> > > +	}
> > > +}
> > > +
> > >  const struct dma_fence_ops dma_fence_chain_ops = {
> > >  	.use_64bit_seqno = true,
> > >  	.get_driver_name = dma_fence_chain_get_driver_name,
> > > @@ -215,6 +227,7 @@ const struct dma_fence_ops dma_fence_chain_ops = {
> > >  	.enable_signaling = dma_fence_chain_enable_signaling,
> > >  	.signaled = dma_fence_chain_signaled,
> > >  	.release = dma_fence_chain_release,
> > > +	.set_deadline = dma_fence_chain_set_deadline,
> > >  };
> > >  EXPORT_SYMBOL(dma_fence_chain_ops);
> > >
> > > --
> > > 2.31.1
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Re: [PATCH v3 8/9] dma-buf/sync_file: Add SET_DEADLINE ioctl
On Wed, Sep 08, 2021 at 11:23:42AM -0700, Rob Clark wrote:
> On Wed, Sep 8, 2021 at 10:50 AM Daniel Vetter wrote:
> >
> > On Fri, Sep 03, 2021 at 11:47:59AM -0700, Rob Clark wrote:
> > > From: Rob Clark
> > >
> > > The initial purpose is for igt tests, but this would also be useful for
> > > compositors that wait until close to vblank deadline to make decisions
> > > about which frame to show.
> > >
> > > Signed-off-by: Rob Clark
> >
> > Needs userspace and I think ideally also some igts to make sure it works
> > and doesn't go boom.
>
> See cover-letter.. there are igt tests, although currently that is the
> only user.

Ah sorry missed that. It would be good to record that in the commit too
that adds the uapi. git blame doesn't find cover letters at all, unlike on
gitlab where you get the MR request with everything.

Ok there is the Link: thing, but since that only points at the last
version all the interesting discussion is still usually lost, so I tend to
not bother looking there.

> I'd be ok to otherwise initially restrict this and the sw_sync UABI
> (CAP_SYS_ADMIN? Or??) until there is a non-igt user, but they are
> both needed by the igt tests

Hm really awkward, uapi for igts in cross vendor stuff like this isn't
great. I think hiding it in vgem is semi-ok (we have fences there
already). But it's all a bit silly ...

For the tests, should we instead have a selftest/Kunit thing to exercise
this stuff? igt probably not quite the right thing. Or combine with a page
flip if you want to test msm.
-Daniel

>
> BR,
> -R
>
> > -Daniel
> >
> > > ---
> > >  drivers/dma-buf/sync_file.c    | 19 +++
> > >  include/uapi/linux/sync_file.h | 20
> > >  2 files changed, 39 insertions(+)
> > >
> > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > index 394e6e1e9686..f295772d5169 100644
> > > --- a/drivers/dma-buf/sync_file.c
> > > +++ b/drivers/dma-buf/sync_file.c
> > > @@ -459,6 +459,22 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
> > >  	return ret;
> > >  }
> > >
> > > +static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
> > > +					unsigned long arg)
> > > +{
> > > +	struct sync_set_deadline ts;
> > > +
> > > +	if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
> > > +		return -EFAULT;
> > > +
> > > +	if (ts.pad)
> > > +		return -EINVAL;
> > > +
> > > +	dma_fence_set_deadline(sync_file->fence, ktime_set(ts.tv_sec, ts.tv_nsec));
> > > +
> > > +	return 0;
> > > +}
> > > +
> > >  static long sync_file_ioctl(struct file *file, unsigned int cmd,
> > >  			    unsigned long arg)
> > >  {
> > > @@ -471,6 +487,9 @@ static long sync_file_ioctl(struct file *file, unsigned int cmd,
> > >  	case SYNC_IOC_FILE_INFO:
> > >  		return sync_file_ioctl_fence_info(sync_file, arg);
> > >
> > > +	case SYNC_IOC_SET_DEADLINE:
> > > +		return sync_file_ioctl_set_deadline(sync_file, arg);
> > > +
> > >  	default:
> > >  		return -ENOTTY;
> > >  	}
> > > diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
> > > index ee2dcfb3d660..f67d4ffe7566 100644
> > > --- a/include/uapi/linux/sync_file.h
> > > +++ b/include/uapi/linux/sync_file.h
> > > @@ -67,6 +67,18 @@ struct sync_file_info {
> > >  	__u64 sync_fence_info;
> > >  };
> > >
> > > +/**
> > > + * struct sync_set_deadline - set a deadline on a fence
> > > + * @tv_sec: seconds elapsed since epoch
> > > + * @tv_nsec: nanoseconds elapsed since the time given by the tv_sec
> > > + * @pad: must be zero
> > > + */
> > > +struct sync_set_deadline {
> > > +	__s64 tv_sec;
> > > +	__s32 tv_nsec;
> > > +	__u32 pad;
> > > +};
> > > +
> > >  #define SYNC_IOC_MAGIC '>'
> > >
> > >  /**
> > > @@ -95,4 +107,12 @@ struct sync_file_info {
> > >   */
> > >  #define SYNC_IOC_FILE_INFO _IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
> > >
> > > +
> > > +/**
> > > + * DOC: SYNC_IOC_SET_DEADLINE - set a deadline on a fence
> > > + *
> > > + * Allows userspace to set a deadline on a fence, see dma_fence_set_deadline()
> > > + */
> > > +#define SYNC_IOC_SET_DEADLINE _IOW(SYNC_IOC_MAGIC, 5, struct sync_set_deadline)
> > > +
> > >  #endif /* _UAPI_LINUX_SYNC_H */
> > > --
> > > 2.31.1
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
[PATCH 0/2] drm/i915/gt: Locking splats PREEMPT_RT
Clark Williams reported two issues with the i915 driver running on
PREEMPT_RT. While #1 looks simple I have no idea about #2 thus the RFC.

Sebastian
[RFC PATCH 2/2] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()
execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().

execlists_dequeue_irq() and execlists_dequeue() each have only one caller.
If intel_engine_cs::active::lock is acquired and released with the _irq
suffix then it behaves almost as if execlists_dequeue() would be invoked
with disabled interrupts. The difference is the last part of the
function, which is then invoked with enabled interrupts.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock to the end of the function as I didn't find
anything that would acquire the lock again.

Reported-by: Clark Williams
Signed-off-by: Sebastian Andrzej Siewior
---
 .../drm/i915/gt/intel_execlists_submission.c    | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index fc77592d88a96..2ec1dd352960b 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1265,7 +1265,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
-	spin_lock(&engine->active.lock);
+	spin_lock_irq(&engine->active.lock);
 
 	/*
 	 * If the queue is higher priority than the last
@@ -1365,7 +1365,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				 * Even if ELSP[1] is occupied and not worthy
 				 * of timeslices, our queue might be.
 				 */
-				spin_unlock(&engine->active.lock);
+				spin_unlock_irq(&engine->active.lock);
 				return;
 			}
 		}
@@ -1391,7 +1391,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 			if (last && !can_merge_rq(last, rq)) {
 				spin_unlock(&ve->base.active.lock);
-				spin_unlock(&engine->active.lock);
+				spin_unlock_irq(&engine->active.lock);
 				return; /* leave this for another sibling */
 			}
 
@@ -1552,7 +1552,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * interrupt for secondary ports).
 	 */
 	execlists->queue_priority_hint = queue_prio(execlists);
-	spin_unlock(&engine->active.lock);
+	spin_unlock_irq(&engine->active.lock);
 
 	/*
 	 * We can skip poking the HW if we ended up with exactly the same set
@@ -1578,13 +1578,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	}
 }
 
-static void execlists_dequeue_irq(struct intel_engine_cs *engine)
-{
-	local_irq_disable(); /* Suspend interrupts across request submission */
-	execlists_dequeue(engine);
-	local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
-}
-
 static void clear_ports(struct i915_request **ports, int count)
 {
 	memset_p((void **)ports, NULL, count);
@@ -2377,7 +2370,7 @@ static void execlists_submission_tasklet(struct tasklet_struct *t)
 	}
 
 	if (!engine->execlists.pending[0]) {
-		execlists_dequeue_irq(engine);
+		execlists_dequeue(engine);
 		start_timeslice(engine);
 	}
 
-- 
2.33.0
[PATCH 1/2] drm/i915/gt: Queue and wait for the irq_work item.
Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users have spinlock_t locking in the callback function.
These locks are then turned into sleeping locks which cannot be
acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irq_work will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.

Reported-by: Clark Williams
Signed-off-by: Sebastian Andrzej Siewior
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb2..594dec2f76954 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -318,10 +318,9 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 	/* Kick the work once more to drain the signalers, and disarm the irq */
 	irq_work_sync(&b->irq_work);
 	while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
-		local_irq_disable();
-		signal_irq_work(&b->irq_work);
-		local_irq_enable();
+		irq_work_queue(&b->irq_work);
 		cond_resched();
+		irq_work_sync(&b->irq_work);
 	}
 }
 
-- 
2.33.0
[PATCH] drm/nouveau/nvkm: Replace -ENOSYS with -ENODEV
nvkm test builds fail with the following error.

drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c: In function 'nvkm_control_mthd_pstate_info':
drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c:60:35: error: overflow in conversion from 'int' to '__s8' {aka 'signed char'} changes value from '-251' to '5'

The code builds on most architectures, but fails on parisc where ENOSYS
is defined as 251. Replace the error code with -ENODEV (-19). The actual
error code does not really matter and is not passed to userspace - it just
has to be negative.

Fixes: 7238eca4cf18 ("drm/nouveau: expose pstate selection per-power source in sysfs")
Signed-off-by: Guenter Roeck
---
 drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c b/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c
index b0ece71aefde..ce774579c89d 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c
@@ -57,7 +57,7 @@ nvkm_control_mthd_pstate_info(struct nvkm_control *ctrl, void *data, u32 size)
 		args->v0.count = 0;
 		args->v0.ustate_ac = NVIF_CONTROL_PSTATE_INFO_V0_USTATE_DISABLE;
 		args->v0.ustate_dc = NVIF_CONTROL_PSTATE_INFO_V0_USTATE_DISABLE;
-		args->v0.pwrsrc = -ENOSYS;
+		args->v0.pwrsrc = -ENODEV;
 		args->v0.pstate = NVIF_CONTROL_PSTATE_INFO_V0_PSTATE_UNKNOWN;
 	}
 
-- 
2.33.0
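
To make the arithmetic behind the compiler error concrete, here is a small standalone sketch. The two errno values are hard-coded from the commit message above (251 for ENOSYS on parisc, 19 for ENODEV) instead of coming from <errno.h>, so the demo behaves the same on any host. store_pwrsrc() is a made-up stand-in for the assignment to the __s8 field, and the wrap-around relies on the modulo-256 narrowing that GCC and Clang implement for out-of-range conversions to signed char.

```c
/* Values as stated in the commit message, not read from <errno.h>. */
#define PARISC_ENOSYS	251
#define GENERIC_ENODEV	19

/*
 * Stand-in for "args->v0.pwrsrc = -ERRNO;" where pwrsrc is an __s8.
 * Anything outside -128..127 does not survive the narrowing.
 */
static signed char store_pwrsrc(int value)
{
	return (signed char)value;
}

/* The field only has to be negative, so this is the property that matters. */
static int pwrsrc_is_error(signed char pwrsrc)
{
	return pwrsrc < 0;
}
```

With these definitions store_pwrsrc(-PARISC_ENOSYS) comes out as 5, which matches the '-251' to '5' in the diagnostic and is no longer negative, while store_pwrsrc(-GENERIC_ENODEV) stays at -19.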
Re: [PATCH 13/14] drm/kmb: Enable alpha blended second plane
Hi Thomas,

On Wed, Sep 08, 2021 at 07:50:42PM +0200, Thomas Zimmermann wrote:
> Hi
>
> On 03.08.21 at 07:10, Sam Ravnborg wrote:
> > Hi Anitha,
> >
> > On Mon, Aug 02, 2021 at 08:44:26PM +, Chrisanthus, Anitha wrote:
> > > Hi Sam,
> > > Thanks. Where should this go, drm-misc-fixes or drm-misc-next?
> >
> > Looks like a drm-misc-next candidate to me.
> > I may improve something for existing users, but it does not look like it
> > fixes an existing bug.
>
> I found this patch in drm-misc-fixes, although it doesn't look like a
> bugfix. It should have gone into drm-misc-next. See [1]. If it indeed
> belongs into drm-misc-fixes, it certainly should have contained a Fixes tag.

The patch fixes some warnings that have become errors in the last week.
Anitha pinged me about it, but I failed to follow up.
So in the end it was applied to shut up the warning => errors.

	Sam
[drm:i915-uncore-vfunc 30/31] drivers/gpu/drm/i915/selftests/mock_uncore.c:47:2: error: implicit declaration of function 'ASSIGN_RAW_WRITE_MMIO_VFUNCS'; did you mean 'MMIO_RAW_WRITE_VFUNCS'?
tree: git://people.freedesktop.org/~airlied/linux.git i915-uncore-vfunc head: b42168f90718a90b11f2d52306d9aeaa9468 commit: 99aebd17891290abfca80c48eca01f4e02413fb3 [30/31] drm/i915/uncore: constify the register vtables. config: i386-allyesconfig (attached as .config) compiler: gcc-9 (Debian 9.3.0-22) 9.3.0 reproduce (this is a W=1 build): git remote add drm git://people.freedesktop.org/~airlied/linux.git git fetch --no-tags drm i915-uncore-vfunc git checkout 99aebd17891290abfca80c48eca01f4e02413fb3 # save the attached .config to linux build tree make W=1 ARCH=i386 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot All errors (new ones prefixed by >>): In file included from drivers/gpu/drm/i915/intel_uncore.c:2630: drivers/gpu/drm/i915/selftests/mock_uncore.c: In function 'mock_uncore_init': >> drivers/gpu/drm/i915/selftests/mock_uncore.c:47:2: error: implicit >> declaration of function 'ASSIGN_RAW_WRITE_MMIO_VFUNCS'; did you mean >> 'MMIO_RAW_WRITE_VFUNCS'? [-Werror=implicit-function-declaration] 47 | ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop); | ^~~~ | MMIO_RAW_WRITE_VFUNCS >> drivers/gpu/drm/i915/selftests/mock_uncore.c:47:39: error: 'nop' undeclared >> (first use in this function); did you mean 'nopv'? 
47 | ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop); | ^~~ | nopv drivers/gpu/drm/i915/selftests/mock_uncore.c:47:39: note: each undeclared identifier is reported only once for each function it appears in >> drivers/gpu/drm/i915/selftests/mock_uncore.c:48:2: error: implicit >> declaration of function 'ASSIGN_RAW_READ_MMIO_VFUNCS' >> [-Werror=implicit-function-declaration] 48 | ASSIGN_RAW_READ_MMIO_VFUNCS(uncore, nop); | ^~~ At top level: >> drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: error: 'nop_read64' >> defined but not used [-Werror=unused-function] 36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) { return 0; } | ^~~~ drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: note: in definition of macro '__nop_read' 36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) { return 0; } | ^~~~ >> drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: error: 'nop_read32' >> defined but not used [-Werror=unused-function] 36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) { return 0; } | ^~~~ drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: note: in definition of macro '__nop_read' 36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) { return 0; } | ^~~~ >> drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: error: 'nop_read16' >> defined but not used [-Werror=unused-function] 36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) { return 0; } | ^~~~ drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: note: in definition of macro '__nop_read' 36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) { return 0; } | ^~~~ >> drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: error: 'nop_read8' >> defined but not used [-Werror=unused-function] 36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) { return 0; } | ^~~~ drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: note: in definition of macro '__nop_read' 36 | nop_read##x(struct 
intel_uncore *uncore, i915_reg_t reg, bool trace) { return 0; } | ^~~~ >> drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: error: 'nop_write32' >> defined but not used [-Werror=unused-function] 29 | nop_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, bool trace) { } | ^ drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: note: in definition of macro '__nop_write' 29 | nop_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, bool trace) { } | ^ >> drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: error: 'nop_write16' >> defined but not used [-Werror=unused-function] 29 | nop_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, bool trace) { } | ^ drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: note: in definition of macro '__nop_write' 29 | nop_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, bool trace) { } | ^ >> drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: error: 'nop_write8' >> defined but not used [-Werror=unused-function] 29 | nop_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, bool trace) { } | ^ drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: note: in def
Re: [PATCH v3 8/9] dma-buf/sync_file: Add SET_DEADLINE ioctl
On Wed, Sep 8, 2021 at 11:49 AM Daniel Vetter wrote:
>
> On Wed, Sep 08, 2021 at 11:23:42AM -0700, Rob Clark wrote:
> > On Wed, Sep 8, 2021 at 10:50 AM Daniel Vetter wrote:
> > >
> > > On Fri, Sep 03, 2021 at 11:47:59AM -0700, Rob Clark wrote:
> > > > From: Rob Clark
> > > >
> > > > The initial purpose is for igt tests, but this would also be useful for
> > > > compositors that wait until close to vblank deadline to make decisions
> > > > about which frame to show.
> > > >
> > > > Signed-off-by: Rob Clark
> > >
> > > Needs userspace and I think ideally also some igts to make sure it works
> > > and doesn't go boom.
> >
> > See cover-letter.. there are igt tests, although currently that is the
> > only user.
>
> Ah sorry missed that. It would be good to record that in the commit too
> that adds the uapi. git blame doesn't find cover letters at all, unlike on
> gitlab where you get the MR request with everything.
>
> Ok there is the Link: thing, but since that only points at the last
> version all the interesting discussion is still usually lost, so I tend to
> not bother looking there.
>
> > I'd be ok to otherwise initially restrict this and the sw_sync UABI
> > (CAP_SYS_ADMIN? Or??) until there is a non-igt user, but they are
> > both needed by the igt tests
>
> Hm really awkward, uapi for igts in cross vendor stuff like this isn't
> great. I think hiding it in vgem is semi-ok (we have fences there
> already). But it's all a bit silly ...
>
> For the tests, should we instead have a selftest/Kunit thing to exercise
> this stuff? igt probably not quite the right thing. Or combine with a page
> flip if you want to test msm.

Hmm, IIRC we have used CONFIG_BROKEN or something along those lines
for UABI in other places where we weren't willing to commit to yet?

I suppose if we had to I could make this a sw_sync ioctl instead. But
OTOH there are kind of a limited # of ways this ioctl could look. And
we already know that at least some wayland compositors are going to
want this.

I guess I can look at non-igt options. But the igt test is already a
pretty convenient way to contrive situations (like loops, which is a
thing I need to add)

BR,
-R

> -Daniel
>
> >
> > BR,
> > -R
> >
> > > -Daniel
> > >
> > > > ---
> > > >  drivers/dma-buf/sync_file.c    | 19 +++++++++++++++++++
> > > >  include/uapi/linux/sync_file.h | 20 ++++++++++++++++++++
> > > >  2 files changed, 39 insertions(+)
> > > >
> > > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > > index 394e6e1e9686..f295772d5169 100644
> > > > --- a/drivers/dma-buf/sync_file.c
> > > > +++ b/drivers/dma-buf/sync_file.c
> > > > @@ -459,6 +459,22 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
> > > >  	return ret;
> > > >  }
> > > >
> > > > +static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
> > > > +					unsigned long arg)
> > > > +{
> > > > +	struct sync_set_deadline ts;
> > > > +
> > > > +	if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
> > > > +		return -EFAULT;
> > > > +
> > > > +	if (ts.pad)
> > > > +		return -EINVAL;
> > > > +
> > > > +	dma_fence_set_deadline(sync_file->fence, ktime_set(ts.tv_sec, ts.tv_nsec));
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +
> > > >  static long sync_file_ioctl(struct file *file, unsigned int cmd,
> > > >  			    unsigned long arg)
> > > >  {
> > > > @@ -471,6 +487,9 @@ static long sync_file_ioctl(struct file *file, unsigned int cmd,
> > > >  	case SYNC_IOC_FILE_INFO:
> > > >  		return sync_file_ioctl_fence_info(sync_file, arg);
> > > >
> > > > +	case SYNC_IOC_SET_DEADLINE:
> > > > +		return sync_file_ioctl_set_deadline(sync_file, arg);
> > > > +
> > > >  	default:
> > > >  		return -ENOTTY;
> > > >  	}
> > > > diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
> > > > index ee2dcfb3d660..f67d4ffe7566 100644
> > > > --- a/include/uapi/linux/sync_file.h
> > > > +++ b/include/uapi/linux/sync_file.h
> > > > @@ -67,6 +67,18 @@ struct sync_file_info {
> > > >  	__u64	sync_fence_info;
> > > >  };
> > > >
> > > > +/**
> > > > + * struct sync_set_deadline - set a deadline on a fence
> > > > + * @tv_sec:	seconds elapsed since epoch
> > > > + * @tv_nsec:	nanoseconds elapsed since the time given by the tv_sec
> > > > + * @pad:	must be zero
> > > > + */
> > > > +struct sync_set_deadline {
> > > > +	__s64	tv_sec;
> > > > +	__s32	tv_nsec;
> > > > +	__u32	pad;
> > > > +};
> > > > +
> > > >  #define SYNC_IOC_MAGIC		'>'
> > > >
> > > >  /**
> > > > @@ -95,4 +107,12 @@ struct sync_file_info {
> > > >   */
> > > >  #define SYNC_IOC_FILE_INFO	_IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
> > > >
> > > > +
> > > > +/**
> > > > + * DOC: SYNC_IOC_SET_DEADLINE - set a deadline on a fence
> > > > + *
Re: [PATCH 2/8] drm/i915/xehp: CCS shares the render reset domain
On Wed, Sep 08, 2021 at 11:07:07AM +0100, Tvrtko Ursulin wrote:
>
> On 07/09/2021 18:19, Matt Roper wrote:
> > The reset domain is shared between render and all compute engines,
> > so resetting one will affect the others.
> >
> > Note: Before performing a reset on an RCS or CCS engine, the GuC will
> > attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> > impacting other clients (since some shared modules will be reset). If
> > other engines are executing non-preemptable workloads, the impact is
> > unavoidable and some work may be lost.
>
> Since here it talks about engine reset, should this patch add warning if
> same is attempted by i915 on a GuC platform - to document it is not

Did you mean "on a *non* GuC platform" here? We aren't going to have
compute engine support on any platforms where GuC submission isn't the
default operating model, so the only way to get compute engines +
execlist submission is to force an override via module parameters (e.g.,
enable_guc=0). Doing so will taint the kernel, so I think the current
consensus from offline discussion is that the user has already put
themselves into a configuration where it's easier than usual to shoot
themselves in the foot; it's not too much different than the kind of
trouble a user could get themselves into if they loaded the driver with
hangcheck disabled or something.


Matt

> implemented/supported? Or perhaps later in the series, or future series
> works better.
>
> Reviewed-by: Tvrtko Ursulin
>
> Regards,
>
> Tvrtko
>
> > Bspec: 52549
> > Original-patch-by: Michel Thierry
> > Cc: Tvrtko Ursulin
> > Cc: Vinay Belgaumkar
> > Signed-off-by: Daniele Ceraolo Spurio
> > Signed-off-by: Aravind Iddamsetty
> > Signed-off-by: Matt Roper
> > ---
> >  drivers/gpu/drm/i915/gt/intel_reset.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index 91200c43951f..30598c1d070c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
> >  		[VECS1] = GEN11_GRDOM_VECS2,
> >  		[VECS2] = GEN11_GRDOM_VECS3,
> >  		[VECS3] = GEN11_GRDOM_VECS4,
> > +		[CCS0]  = GEN11_GRDOM_RENDER,
> > +		[CCS1]  = GEN11_GRDOM_RENDER,
> > +		[CCS2]  = GEN11_GRDOM_RENDER,
> > +		[CCS3]  = GEN11_GRDOM_RENDER,
> >  	};
> >  	struct intel_engine_cs *engine;
> >  	intel_engine_mask_t tmp;
> >

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795
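
The side effect discussed in this thread (resetting one compute engine also hits render) follows directly from the shape of the lookup table in the quoted hunk. The sketch below models that with a reduced, made-up set of engines and reset bits; none of the names or values are the real i915 definitions, and reset_mask_for() and engines_hit_by() are illustrative helpers, not driver functions.

```c
#include <stdint.h>

/* Hypothetical engine ids, for illustration only (not the i915 enum). */
enum engine_id { RCS0, VCS0, VECS0, CCS0, CCS1, NUM_ENGINES };

/* Hypothetical reset-domain bits (not the GEN11_GRDOM_* register values). */
#define GRDOM_RENDER	(1u << 0)
#define GRDOM_MEDIA	(1u << 1)
#define GRDOM_VECS	(1u << 2)

/* Per-engine reset domain, mirroring the table layout in the patch. */
static const uint32_t reset_domain[NUM_ENGINES] = {
	[RCS0]  = GRDOM_RENDER,
	[VCS0]  = GRDOM_MEDIA,
	[VECS0] = GRDOM_VECS,
	[CCS0]  = GRDOM_RENDER,	/* compute shares the render domain */
	[CCS1]  = GRDOM_RENDER,
};

/* OR together the reset domains of every engine selected in engine_mask. */
static uint32_t reset_mask_for(uint32_t engine_mask)
{
	uint32_t hw_mask = 0;

	for (int id = 0; id < NUM_ENGINES; id++)
		if (engine_mask & (1u << id))
			hw_mask |= reset_domain[id];

	return hw_mask;
}

/* Engines whose domain is covered by hw_mask, i.e. that get reset too. */
static uint32_t engines_hit_by(uint32_t hw_mask)
{
	uint32_t hit = 0;

	for (int id = 0; id < NUM_ENGINES; id++)
		if (reset_domain[id] & hw_mask)
			hit |= 1u << id;

	return hit;
}
```

Asking for a reset of CCS1 alone yields only the render bit, and engines_hit_by() on that mask reports RCS0, CCS0 and CCS1 together. That is the collateral the commit message says the GuC tries to limit by preempting the non-hung engines to idle first.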
Re: [PATCH v3 8/9] dma-buf/sync_file: Add SET_DEADLINE ioctl
On Wed, Sep 8, 2021 at 9:36 PM Rob Clark wrote:
> On Wed, Sep 8, 2021 at 11:49 AM Daniel Vetter wrote:
> > On Wed, Sep 08, 2021 at 11:23:42AM -0700, Rob Clark wrote:
> > > On Wed, Sep 8, 2021 at 10:50 AM Daniel Vetter wrote:
> > > >
> > > > On Fri, Sep 03, 2021 at 11:47:59AM -0700, Rob Clark wrote:
> > > > > From: Rob Clark
> > > > >
> > > > > The initial purpose is for igt tests, but this would also be useful for
> > > > > compositors that wait until close to vblank deadline to make decisions
> > > > > about which frame to show.
> > > > >
> > > > > Signed-off-by: Rob Clark
> > > >
> > > > Needs userspace and I think ideally also some igts to make sure it works
> > > > and doesn't go boom.
> > >
> > > See cover-letter.. there are igt tests, although currently that is the
> > > only user.
> >
> > Ah sorry missed that. It would be good to record that in the commit too
> > that adds the uapi. git blame doesn't find cover letters at all, unlike on
> > gitlab where you get the MR request with everything.
> >
> > Ok there is the Link: thing, but since that only points at the last
> > version all the interesting discussion is still usually lost, so I tend to
> > not bother looking there.
> >
> > > I'd be ok to otherwise initially restrict this and the sw_sync UABI
> > > (CAP_SYS_ADMIN? Or??) until there is a non-igt user, but they are
> > > both needed by the igt tests
> >
> > Hm really awkward, uapi for igts in cross vendor stuff like this isn't
> > great. I think hiding it in vgem is semi-ok (we have fences there
> > already). But it's all a bit silly ...
> >
> > For the tests, should we instead have a selftest/Kunit thing to exercise
> > this stuff? igt probably not quite the right thing. Or combine with a page
> > flip if you want to test msm.
>
> Hmm, IIRC we have used CONFIG_BROKEN or something along those lines
> for UABI in other places where we weren't willing to commit to yet?
>
> I suppose if we had to I could make this a sw_sync ioctl instead. But
> OTOH there are kind of a limited # of ways this ioctl could look. And
> we already know that at least some wayland compositors are going to
> want this.

Hm I was trying to think up a few ways this could work, but didn't come up
with anything reasonable. Forcing the compositor to boost the entire chain
(for gl composited primary plane fallback) is something the kernel can
easily do too. Also only makes sense for priority boost, not so much for
clock boosting, since clock boosting only really needs the final element
to be boosted.

> I guess I can look at non-igt options. But the igt test is already a
> pretty convenient way to contrive situations (like loops, which is a
> thing I need to add)

Yeah it's definitely very useful for testing ...

One option could be a hacky debugfs interface, where you write a fd number
and deadline and the debugfs read function does the deadline setting.
Horrible, but since it's debugfs no one ever cares. That's at least where
we're hiding all the i915 hacks that igts need.
-Daniel

> BR,
> -R
>
> > -Daniel
> >
> > >
> > > BR,
> > > -R
> > >
> > > > -Daniel
> > > >
> > > > > ---
> > > > >  drivers/dma-buf/sync_file.c    | 19 +++++++++++++++++++
> > > > >  include/uapi/linux/sync_file.h | 20 ++++++++++++++++++++
> > > > >  2 files changed, 39 insertions(+)
> > > > >
> > > > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > > > index 394e6e1e9686..f295772d5169 100644
> > > > > --- a/drivers/dma-buf/sync_file.c
> > > > > +++ b/drivers/dma-buf/sync_file.c
> > > > > @@ -459,6 +459,22 @@ static long sync_file_ioctl_fence_info(struct sync_file *sync_file,
> > > > >  	return ret;
> > > > >  }
> > > > >
> > > > > +static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
> > > > > +					unsigned long arg)
> > > > > +{
> > > > > +	struct sync_set_deadline ts;
> > > > > +
> > > > > +	if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
> > > > > +		return -EFAULT;
> > > > > +
> > > > > +	if (ts.pad)
> > > > > +		return -EINVAL;
> > > > > +
> > > > > +	dma_fence_set_deadline(sync_file->fence, ktime_set(ts.tv_sec, ts.tv_nsec));
> > > > > +
> > > > > +	return 0;
> > > > > +}
> > > > > +
> > > > >  static long sync_file_ioctl(struct file *file, unsigned int cmd,
> > > > >  			    unsigned long arg)
> > > > >  {
> > > > > @@ -471,6 +487,9 @@ static long sync_file_ioctl(struct file *file, unsigned int cmd,
> > > > >  	case SYNC_IOC_FILE_INFO:
> > > > >  		return sync_file_ioctl_fence_info(sync_file, arg);
> > > > >
> > > > > +	case SYNC_IOC_SET_DEADLINE:
> > > > > +		return sync_file_ioctl_set_deadline(sync_file, arg);
> > > > > +
> > > > >  	default:
> > > > >  		return -ENOTTY;
> > > > >  	}
> > > > > diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
> > > > > inde
Re: [PATCH v3 10/16] drm/panel-simple: Non-eDP panels don't need "HPD" handling
Hi,

On Sun, Sep 5, 2021 at 11:46 AM Sam Ravnborg wrote:
>
> On Wed, Sep 01, 2021 at 01:19:28PM -0700, Douglas Anderson wrote:
> > All of the "HPD" handling added to panel-simple recently was for eDP
> > panels. Remove it from panel-simple now that panel-simple-edp handles
> > eDP panels. The "prepare_to_enable" delay only makes sense in the
> > context of HPD, so remove it too. No non-eDP panels used it anyway.
> >
> > Signed-off-by: Douglas Anderson
>
> Maybe merge this with the patch that moved all the functionality
> from panel-simple to panel-edp?

Unless you feel strongly about it, I'm going to keep it separate still
in the next version. To try to make diffing easier, I tried hard to
make the minimal changes in the "split the driver in two" patch.

-Doug
Re: [PATCH] drm/bridge: ti-sn65dsi83: Check link status register after enabling the bridge
W dniu 08.09.2021 o 13:11, Dave Stevenson pisze: > Hi Marek and Andrzej > > On Tue, 7 Sept 2021 at 22:24, Marek Vasut wrote: >> On 9/7/21 7:29 PM, Andrzej Hajda wrote: >>> W dniu 07.09.2021 o 16:25, Marek Vasut pisze: On 9/7/21 9:31 AM, Andrzej Hajda wrote: > On 07.09.2021 04:39, Marek Vasut wrote: >> In rare cases, the bridge may not start up correctly, which usually >> leads to no display output. In case this happens, warn about it in >> the kernel log. >> >> Signed-off-by: Marek Vasut >> Cc: Jagan Teki >> Cc: Laurent Pinchart >> Cc: Linus Walleij >> Cc: Robert Foss >> Cc: Sam Ravnborg >> Cc: dri-devel@lists.freedesktop.org >> --- >> NOTE: See the following: >> https://e2e.ti.com/support/interface-group/interface/f/interface-forum/942005/sn65dsi83-dsi83-lvds-bridge---sporadic-behavior---no-video >> >> https://community.nxp.com/t5/i-MX-Processors/i-MX8M-MIPI-DSI-Interface-LVDS-Bridge-Initialization/td-p/1156533 >> >> --- >> drivers/gpu/drm/bridge/ti-sn65dsi83.c | 5 + >> 1 file changed, 5 insertions(+) >> >> diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi83.c >> b/drivers/gpu/drm/bridge/ti-sn65dsi83.c >> index a32f70bc68ea4..4ea71d7f0bfbc 100644 >> --- a/drivers/gpu/drm/bridge/ti-sn65dsi83.c >> +++ b/drivers/gpu/drm/bridge/ti-sn65dsi83.c >> @@ -520,6 +520,11 @@ static void sn65dsi83_atomic_enable(struct >> drm_bridge *bridge, >> /* Clear all errors that got asserted during initialization. */ >> regmap_read(ctx->regmap, REG_IRQ_STAT, &pval); >> regmap_write(ctx->regmap, REG_IRQ_STAT, pval); > > It does not look as correct error handling, maybe it would be good to > analyze and optionally report 'unexpected' errors here as well. The above is correct -- it clears the status register because the setup might've set random bits in that register. Then we wait a bit, let the link run, and read them again to get the real link status in this new piece of code below, hence the usleep_range there. And then if the link indicates a problem, we know it is a problem. 
>>> >>> Usually such registers are cleared on very beginning of the >>> initialization, and tested (via irq handler, or via reading), during >>> initalization, if initialization phase goes well. If it is not the case >>> forgive me. >> The init just flips the bit at random in the IRQ_STAT register, so no, >> that's not really viable here. That's why we clear them at the end, and >> then wait a bit, and then check whether something new appeared in them. >> >> If not, all is great. >> >> Sure, we could generate an IRQ, but then IRQ line is not always >> connected to this chip on all hardware I have available. So this gives >> the user at least some indication that something is wrong with their HW. >> >> + >> +usleep_range(1, 12000); >> +regmap_read(ctx->regmap, REG_IRQ_STAT, &pval); >> +if (pval) >> +dev_err(ctx->dev, "Unexpected link status 0x%02x\n", pval); > > I am not sure what is the case here but it looks like 'we do not know > what is going on, so let's add some diagnostic messages to gather info > and figure it out later'. That's pretty much the case, see the two links above in the NOTE section. If something goes wrong, we print the value for the user (usually developer) so they can fix their problems. We cannot do much better in the attach callback. The issue I ran into (and where this would be helpful information to me during debugging, since the issue happened real seldom, see also the NOTE links above) is that the DSI controller driver started streaming video on the data lanes before the DSI83 had a chance to initialize. This worked most of the time, except for a few exceptions here and there, where the video didn't start. This does set link status bits consistently. In the meantime, I fixed the controller driver (so far downstream, due to ongoing discussion). >>> >>> Maybe drm_connector_set_link_status_property(conn, >>> DRM_MODE_LINK_STATUS_BAD) would be usefule here. 
>> Hmm, this works on connector, the dsi83 is a bridge and it can be stuck >> between two other bridges. That doesn't seem like the right tool, no ? >> > Whole driver lacks IRQ handler which IMO could perform better diagnosis, > and I guess it could also help in recovery, but this is just my guess. > So if this patch is enough for now you can add: No, IRQ won't help you here, because by the time you get the IRQ, the DSI host already started streaming video on data lanes and you won't be able to correctly reinit the DSI83 unless you communicate to the DSI host that it should switch the data lanes back to LP11. And for that, there is a bigger chunk missing really. What needs to be added is a way for th
Re: [PATCH] drm: mxsfb: Fix NULL pointer dereference crash on unload
On 9/8/21 8:24 PM, Daniel Vetter wrote:
> On Tue, Sep 07, 2021 at 04:49:00AM +0200, Marek Vasut wrote:
> > The mxsfb->crtc.funcs may already be NULL when unloading the driver,
> > in which case calling mxsfb_irq_disable() via drm_irq_uninstall() from
> > mxsfb_unload() leads to NULL pointer dereference.
> >
> > Since all we care about is masking the IRQ and mxsfb->base is still
> > valid, just use that to clear and mask the IRQ.
> >
> > Fixes: ae1ed00932819 ("drm: mxsfb: Stop using DRM simple display pipeline helper")
> > Signed-off-by: Marek Vasut
> > Cc: Daniel Abrecht
> > Cc: Emil Velikov
> > Cc: Laurent Pinchart
> > Cc: Sam Ravnborg
> > Cc: Stefan Agner
>
> You probably want a drm_atomic_helper_shutdown instead of trying to do all
> that manually. We've also added a bunch more devm and drmm_ functions to
> automate the cleanup a lot more here, e.g. your drm_mode_config_cleanup is
> in the wrong place.
>
> Also I'm confused because I'm not even seeing this function anywhere in
> upstream.

It is still here:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/gpu/drm/mxsfb/mxsfb_drv.c#n171
as of:
999569d59a0aa ("Add linux-next specific files for 20210908")

Is there some other tree I should be looking at?