Re: [RFC][PATCH] drm/amdgpu/powerplay/smu10: Add custom profile

2021-09-08 Thread Daniel Gomez
On Tue, 7 Sept 2021 at 19:23, Alex Deucher  wrote:
>
> On Tue, Sep 7, 2021 at 4:53 AM Daniel Gomez  wrote:
> >
> > Add custom power profile mode support on smu10.
> > Update workload bit list.
> > ---
> >
> > Hi,
> >
> > I'm trying to add custom profile for the Raven Ridge but not sure if
> > I'd need a different parameter than PPSMC_MSG_SetCustomPolicy to
> > configure the custom values. The code seemed to support CUSTOM for
> > workload types but it didn't show up in the menu or accept any user
> > input parameter. So far, I've added that part but a bit confusing to
> > me what is the policy I need for setting these parameters or if it's
> > maybe not possible at all.
> >
> > After applying the changes I'd configure the CUSTOM mode as follows:
> >
> > echo manual > /sys/class/drm/card0/device/hwmon/hwmon1/device/power_dpm_force_performance_level
> > echo "6 70 90 0 0" > /sys/class/drm/card0/device/hwmon/hwmon1/device/pp_power_profile_mode
> >
> > Then, using Darren Powell script for testing modes I get the following
> > output:
> >
> > 05:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. 
> > [AMD/ATI] Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] 
> > [1002:15dd] (rev 83)
> > === pp_dpm_sclk ===
> > 0: 200Mhz
> > 1: 400Mhz *
> > 2: 1100Mhz
> > === pp_dpm_mclk ===
> > 0: 400Mhz
> > 1: 933Mhz *
> > 2: 1067Mhz
> > 3: 1200Mhz
> > === pp_power_profile_mode ===
> > NUM        MODE_NAME BUSY_SET_POINT FPS USE_RLC_BUSY MIN_ACTIVE_LEVEL
> >   0 BOOTUP_DEFAULT :             70  60            0                0
> >   1 3D_FULL_SCREEN :             70  60            1                3
> >   2   POWER_SAVING :             90  60            0                0
> >   3          VIDEO :             70  60            0                0
> >   4             VR :             70  90            0                0
> >   5        COMPUTE :             30  60            0                6
> >   6         CUSTOM*:             70  90            0                0
> >
> > As you can also see in my changes, I've also updated the workload bit
> > table but I'm not completely sure about that change. With the tests
> > I've done, using bit 5 for the WORKLOAD_PPLIB_CUSTOM_BIT makes the
> > gpu sclk locked around ~36%. So, maybe I'm missing a clock limit
> > configuraton table somewhere. Would you give me some hints to
> > proceed with this?
>
> I don't think APUs support customizing the workloads the same way
> dGPUs do.  I think they just support predefined profiles.
>
> Alex


Thanks, Alex, for the quick response. Would it make sense then to remove
the custom workload code (PP_SMC_POWER_PROFILE_CUSTOM) from smu10?
That workload was added in commit
f6f75ebdc06c04d3cfcd100f1b10256a9cdca407 [1] but is not used at all in the
code, as it is limited to the PP_SMC_POWER_PROFILE_COMPUTE index.
smu10.h also includes the custom workload bit definition, which made it
hard for me to tell whether the mode was half-supported or, as I
understand from your comment, not usable at all.

Perhaps the documentation [2] could also mention (if that is the general
rule) that the custom pp_power_profile_mode is only supported on
dGPUs.

I can send the patches if it makes sense.

[1]: 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c?id=f6f75ebdc06c04d3cfcd100f1b10256a9cdca407
[2]: 
https://www.kernel.org/doc/html/latest/gpu/amdgpu.html#pp-power-profile-mode

Daniel

>
>
> >
> > Thanks in advance,
> > Daniel
> >
> >
> >  drivers/gpu/drm/amd/pm/inc/smu10.h| 14 +++--
> >  .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c  | 57 +--
> >  .../drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h  |  1 +
> >  3 files changed, 61 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/pm/inc/smu10.h 
> > b/drivers/gpu/drm/amd/pm/inc/smu10.h
> > index 9e837a5014c5..b96520528240 100644
> > --- a/drivers/gpu/drm/amd/pm/inc/smu10.h
> > +++ b/drivers/gpu/drm/amd/pm/inc/smu10.h
> > @@ -136,12 +136,14 @@
> >  #define FEATURE_CORE_CSTATES_MASK (1 << FEATURE_CORE_CSTATES_BIT)
> >
> >  /* Workload bits */
> > -#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 0
> > -#define WORKLOAD_PPLIB_VIDEO_BIT          2
> > -#define WORKLOAD_PPLIB_VR_BIT             3
> > -#define WORKLOAD_PPLIB_COMPUTE_BIT        4
> > -#define WORKLOAD_PPLIB_CUSTOM_BIT         5
> > -#define WORKLOAD_PPLIB_COUNT              6
> > +#define WORKLOAD_DEFAULT_BIT              0
> > +#define WORKLOAD_PPLIB_FULL_SCREEN_3D_BIT 1
> > +#define WORKLOAD_PPLIB_POWER_SAVING_BIT   2
> > +#define WORKLOAD_PPLIB_VIDEO_BIT          3
> > +#define WORKLOAD_PPLIB_VR_BIT             4
> > +#define WORKLOAD_PPLIB_COMPUTE_BIT        5
> > +#define WORKLOAD_PPLIB_CUSTOM_BIT         6
> > +#define WORKLOAD_PPLIB_COUNT              7
> >
> >  typedef struct {
> > /* MP1_EXT_SCRATCH0 */
> > diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.c 
> > b/drivers/gpu/drm/amd/pm/po

Re: [PATCH v10 07/17] dt-bindings: display: mediatek: merge: add additional prop for mt8195

2021-09-08 Thread Jason-JH Lin
Hi Philipp,

Thanks for the reviews.

On Wed, 2021-09-08 at 08:39 +0200, Philipp Zabel wrote:
> Hi Jason,
> 
> On Wed, 2021-09-08 at 14:03 +0800, jason-jh.lin wrote:
> > add MERGE additional properties description for mt8195:
> > 1. async clock
> > 2. fifo setting enable
> > 3. reset controller
> > 
> > Signed-off-by: jason-jh.lin 
> > ---
> >  .../display/mediatek/mediatek,merge.yaml  | 30
> > +++
> >  1 file changed, 30 insertions(+)
> > 
> > diff --git
> > a/Documentation/devicetree/bindings/display/mediatek/mediatek,merge
> > .yaml
> > b/Documentation/devicetree/bindings/display/mediatek/mediatek,merge
> > .yaml
> > index 75beeb207ceb..0fe204d9ad2c 100644
> > ---
> > a/Documentation/devicetree/bindings/display/mediatek/mediatek,merge
> > .yaml
> > +++
> > b/Documentation/devicetree/bindings/display/mediatek/mediatek,merge
> > .yaml
> > @@ -38,6 +38,19 @@ properties:
> >clocks:
> >  items:
> >- description: MERGE Clock
> > +  - description: MERGE Async Clock
> > +  Controlling the synchronous process between MERGE and
> > other display
> > +  function blocks cross clock domain.
> > +
> > +  mediatek,merge-fifo-en:
> > +description:
> > +  The setting of merge fifo is mainly provided for the display
> > latency
> > +  buffer to ensure that the back-end panel display data will
> > not be
> > +  underrun, a little more data is needed in the fifo.
> > +  According to the merge fifo settings, when the water level
> > is detected
> > +  to be insufficient, it will trigger RDMA sending ultra and
> > preulra
> > +  command to SMI to speed up the data rate.
> > +type: boolean
> >  
> >mediatek,gce-client-reg:
> >  description:
> > @@ -50,6 +63,10 @@ properties:
> >  $ref: /schemas/types.yaml#/definitions/phandle-array
> >  maxItems: 1
> >  
> > +  resets:
> > +description: reset controller
> > +  See Documentation/devicetree/bindings/reset/reset.txt for
> > details.
> 
> From the example this looks like it could have a maxItems: 1.

OK, I think it could have a maxItems: 1 in mt8195 because
merge1~merge5 only have one async clock.

> 
> > +
> >  required:
> >- compatible
> >- reg
> 
> Should the resets property be required for "mediatek,mt8195-disp-
> merge"?

I think the resets property is not a required property.
The reset controller is for async clock of MERGE module on vdosys1.
MERGE module on vdosys0 doesn't have async clock, so it doesn't need to
add the resets property.

Regards,
Jason-JH.Lin
> 
> > @@ -67,3 +84,16 @@ examples:
> >  power-domains = <&spm MT8173_POWER_DOMAIN_MM>;
> >  clocks = <&mmsys CLK_MM_DISP_MERGE>;
> >  };
> > +
> > +merge5: disp_vpp_merge5@1c11 {
> > +compatible = "mediatek,mt8195-disp-merge";
> > +reg = <0 0x1c11 0 0x1000>;
> > +interrupts = ;
> > +clocks = <&vdosys1 CLK_VDO1_VPP_MERGE4>,
> > + <&vdosys1 CLK_VDO1_MERGE4_DL_ASYNC>;
> > +clock-names = "merge","merge_async";
> > +power-domains = <&spm MT8195_POWER_DOMAIN_VDOSYS1>;
> > +mediatek,gce-client-reg = <&gce1 SUBSYS_1c11 0x
> > 0x1000>;
> > +mediatek,merge-fifo-en = <1>;
> > +resets = <&vdosys1
> > MT8195_VDOSYS1_SW0_RST_B_MERGE4_DL_ASYNC>;
> > +};
> 
> regards
> Philipp
-- 
Jason-JH Lin 



Re: Handling DRM master transitions cooperatively

2021-09-08 Thread Pekka Paalanen
On Tue, 7 Sep 2021 14:42:56 +0200
Hans de Goede  wrote:

> Hi,
> 
> On 9/7/21 12:07 PM, Pekka Paalanen wrote:
> > On Fri, 3 Sep 2021 21:08:21 +0200
> > Dennis Filder  wrote:
> >   
> >> Hans de Goede asked me to take a topic from a private discussion here.
> >> I must also preface that I'm not a graphics person and my knowledge of
> >> DRI/DRM is cursory at best.
> >>
> >> I initiated the conversation with de Goede after learning that the X
> >> server now supports being started with an open DRM file descriptor
> >> (this was added for Keith Packard's xlease project).  I wondered if
> >> that could be used to smoothen the Plymouth->X transition somehow and
> >> asked de Goede if there were any such plans.  He denied, but mentioned
> >> that a new ioctl is in the works to prevent the kernel from wiping the
> >> contents of a frame buffer after a device is closed, and that this
> >> would help to keep transitions smooth.  
> > 
> > Hi,
> > 
> > I believe the kernel is not wiping anything on device close. If
> > something in the KMS state is wiped, it originates in userspace:
> > 
> > - Plymouth doing something (e.g. RmFB on an in-use FB will turn the
> >   output off, you need to be careful to "leak" your FB if you want a
> >   smooth hand-over)  
> 
> The "kernel is not wiping anything on device close" is not true,
> when closing /dev/dri/card# any remaining FBs from the app closing
> it will be dealt with as if they were RmFB-ed, causing the screen
> to show what I call "the fallback fb", at least with the i915 driver.

No, that's not what should happen AFAIK.

True, all FBs that are not referenced by active CRTCs or planes will
get freed, since their refcount drops to zero, but those CRTCs and
planes that are active will remain active and therefore keep their
reference to the respective FBs and so the FBs remain until replaced or
turned off explicitly (by e.g. fbcon if you switch to that rather than
another userspace KMS client). I believe that is the whole reason why
e.g. DRM_IOCTL_MODE_GETFB2 can be useful, otherwise the next KMS client
would not have anything to scrape.

danvet, what is the DRM core intention?

Or am I confused because display servers do not tend to close the DRM
device fd on switch-out but Plymouth does (too early)?

If so, why can't Plymouth keep the device open longer and quit only
when the hand-off is complete? Not quitting too early would be a
prerequisite for any explicit hand-off protocol as well.
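
The reference-counting behaviour described above can be sketched as a toy
model. This only illustrates the model as I describe it (which is disputed
elsewhere in this thread); it is not a statement of what the kernel actually
does. All toy_* names are hypothetical and are not DRM structures or
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real DRM structures. */
struct toy_fb {
    int refcount;           /* one reference per holder (client file, plane) */
};

struct toy_plane {
    struct toy_fb *fb;      /* framebuffer currently scanned out, or NULL */
};

/* Take an additional reference on a framebuffer. */
static void toy_fb_get(struct toy_fb *fb)
{
    fb->refcount++;
}

/* Drop one reference; returns true when the framebuffer would be freed. */
static bool toy_fb_put(struct toy_fb *fb)
{
    assert(fb->refcount > 0);
    fb->refcount--;
    return fb->refcount == 0;
}

/* A client creates an FB: the client's open file holds the first reference. */
static void toy_fb_create(struct toy_fb *fb)
{
    fb->refcount = 1;
}

/*
 * Put fb on a plane (or turn the plane off with fb == NULL). The plane takes
 * its own reference and drops the one it held on the previous framebuffer.
 */
static void toy_plane_set_fb(struct toy_plane *plane, struct toy_fb *fb)
{
    if (fb)
        toy_fb_get(fb);
    if (plane->fb)
        toy_fb_put(plane->fb);
    plane->fb = fb;
}

/*
 * The client closes the device: in this model only the file's reference is
 * dropped. Returns true if that freed the framebuffer.
 */
static bool toy_client_close(struct toy_fb *fb)
{
    return toy_fb_put(fb);
}
```

In this model an FB that is not on any plane is freed when its client closes
the device, while an FB that is still being scanned out survives until the
plane is given another FB or is turned off.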


Thanks,
pq

> > - Xorg doing something (e.g. resetting instead of inheriting KMS state)
> > 
> > - Something missed in the hand-off sequence which allows fbcon to
> >   momentarily take over between Plymouth and Xorg. This would need to
> >   be fixed between Plymouth and Xorg.
> > 
> > - Maybe systemd-logind does something odd to the KMS device? It has
> >   pretty wild code there. Or maybe it causes fbcon to take over.
> > 
> > What is the new ioctl you referred to?  
> 
> It is an ioctl to mark a FB to not have it auto-removed on device-close,
> instead leaving it in place until some some kernel/userspace client
> actively installs another FB. This was proposed by Rob Clark quite
> a while ago, but it never got anywhere because of lack of userspace
> actually interested in using it.
> 
> I've been thinking about reviving Rob's patch, since at least for
> plymouth this would be pretty useful to have.
> 
> Regards,
> 
> Hans
> 





Re: Handling DRM master transitions cooperatively

2021-09-08 Thread Hans de Goede
Hi,

On 9/8/21 9:36 AM, Pekka Paalanen wrote:
> On Tue, 7 Sep 2021 14:42:56 +0200
> Hans de Goede  wrote:
> 
>> Hi,
>>
>> On 9/7/21 12:07 PM, Pekka Paalanen wrote:
>>> On Fri, 3 Sep 2021 21:08:21 +0200
>>> Dennis Filder  wrote:
>>>   
>>>> Hans de Goede asked me to take a topic from a private discussion here.
>>>> I must also preface that I'm not a graphics person and my knowledge of
>>>> DRI/DRM is cursory at best.
>>>>
>>>> I initiated the conversation with de Goede after learning that the X
>>>> server now supports being started with an open DRM file descriptor
>>>> (this was added for Keith Packard's xlease project).  I wondered if
>>>> that could be used to smoothen the Plymouth->X transition somehow and
>>>> asked de Goede if there were any such plans.  He denied, but mentioned
>>>> that a new ioctl is in the works to prevent the kernel from wiping the
>>>> contents of a frame buffer after a device is closed, and that this
>>>> would help to keep transitions smooth.
>>>
>>> Hi,
>>>
>>> I believe the kernel is not wiping anything on device close. If
>>> something in the KMS state is wiped, it originates in userspace:
>>>
>>> - Plymouth doing something (e.g. RmFB on an in-use FB will turn the
>>>   output off, you need to be careful to "leak" your FB if you want a
>>>   smooth hand-over)  
>>
>> The "kernel is not wiping anything on device close" is not true,
>> when closing /dev/dri/card# any remaining FBs from the app closing
>> it will be dealt with as if they were RmFB-ed, causing the screen
>> to show what I call "the fallback fb", at least with the i915 driver.
> 
> No, that's not what should happen AFAIK.

I'm pretty sure that that is what is happening though.

But hopefully someone else can either confirm or deny this :)

> True, all FBs that are not referenced by active CRTCs or planes will
> get freed, since their refcount drops to zero, but those CRTCs and
> planes that are active will remain active and therefore keep their
> reference to the respective FBs and so the FBs remain until replaced or
> turned off explicitly (by e.g. fbcon if you switch to that rather than
> another userspace KMS client). I believe that is the whole reason why
> e.g. DRM_IOCTL_MODE_GETFB2 can be useful, otherwise the next KMS client
> would not have anything to scrape.
> 
> danvet, what is the DRM core intention?
> 
> Or am I confused because display servers do not tend to close the DRM
> device fd on switch-out but Plymouth does (too early)?
> 
> If so, why can't Plymouth keep the device open longer and quit only
> when the hand-off is complete? Not quitting too early would be a
> prerequisite for any explicit hand-off protocol as well.

plymouth is actually keeping the device open longer for exactly this
reason, the following happens:

1. plymouth starts
2. gdm starts and tells plymouth to "deactivate" which will stop it from
making drm ioctls and drop its drm master rights, while keeping the fb around
3. gdm waits for the greeter process to tell it that it has successfully
taken over the screen
4. gdm tells plymouth to quit

And something similar is happening on gdm greeter -> gnome user session
handover.
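
As a rough illustration only, the ordering of the four steps above can be
written down as a tiny state machine. This is not plymouth or gdm code; every
name in it is made up for the sketch, and the rule that an early quit request
is refused is my way of encoding "gdm waits" in the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the boot splash side of the handover. */
enum splash_state { SPLASH_ACTIVE, SPLASH_DEACTIVATED, SPLASH_QUIT };

struct handover {
    enum splash_state splash;
    bool splash_is_master;      /* splash currently holds DRM master */
    bool splash_fb_alive;       /* splash framebuffer is still around */
    bool greeter_owns_screen;   /* greeter reported a successful takeover */
};

/* Step 1: the splash starts, is DRM master and shows its framebuffer. */
static void handover_init(struct handover *h)
{
    assert(h != NULL);
    h->splash = SPLASH_ACTIVE;
    h->splash_is_master = true;
    h->splash_fb_alive = true;
    h->greeter_owns_screen = false;
}

/* Step 2: "deactivate" drops master but deliberately keeps the fb. */
static void splash_deactivate(struct handover *h)
{
    if (h->splash != SPLASH_ACTIVE)
        return;
    h->splash = SPLASH_DEACTIVATED;
    h->splash_is_master = false;
}

/* Step 3: the greeter reports that it has taken over the screen. */
static void greeter_took_over(struct handover *h)
{
    h->greeter_owns_screen = true;
}

/*
 * Step 4: quit. In this model the request is refused (returns false) until
 * the greeter owns the screen, so the fb never disappears too early.
 */
static bool splash_quit(struct handover *h)
{
    if (!h->greeter_owns_screen)
        return false;
    h->splash = SPLASH_QUIT;
    h->splash_is_master = false;
    h->splash_fb_alive = false;
    return true;
}
```

The point of the sketch is that the fb outlives the master rights: between
steps 2 and 4 nobody on the splash side issues ioctls, yet the old picture
stays on screen.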

But we need the new ioctl at least on shutdown / reboot to avoid the
"fallback fb" (typically the EFI/BIOS setup fb which i915 inherited at boot)
showing for a brief moment when plymouth quits at shutdown / reboot and
there is nothing to hand-over the fb to in that case.

And the new ioctl would also make the above handover a lot simpler.

And we currently also have a flicker when going from user-session to
gdm on logout or from gdm/user-session to plymouth on shutdown/reboot.

Basically we have quite a few transitions and currently only the
boot + login path is smooth and the rest needs more work, which either
requires a standardized handover method (instead of the current
hardcoded plymouth -> gdm stuff), or just allowing the FB to sit
around until the next drm-client installs its FB, which would be
much more KISS, so that has my preference.

And this KISS method will also work with transitions to a new
console-owner process which is not aware of any handover protocols,
as long as the old process uses the ioctl the transition will be
smooth. So e.g. gdm -> i3 on Xorg session will be smooth (1)

Regards,

Hans


1) I think this actually already is smooth because in this case gdm
just sleeps for 5 seconds before killing the greeter I believe, but
with the ioctl we could remove this hack




Re: [PATCH v2 3/6] drm/i915 Implement LMEM backup and restore for suspend / resume

2021-09-08 Thread Thomas Hellström

Hi, Matt,

Thanks for reviewing.

On 9/7/21 7:37 PM, Matthew Auld wrote:



>> +    i915_gem_ww_unlock_single(backup);
>> +    i915_gem_object_put(backup);
>
> I assume we need to set ttm.backup = NULL somewhere here on the
> failure path, or don't drop the ref? Or at least it looks like
> potential uaf later?


Yes, I think on failure, we just don't drop the ref here in case 
something at some point decides to retry.
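
To make the two safe options concrete, here is a small stand-alone model. It
is not the i915 code; the toy_* types and the policy enum are invented for
illustration. What has to be avoided is the third combination: keeping the
pointer while dropping the reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a refcounted object and the object owning it. */
struct toy_obj {
    int refcount;
};

struct toy_owner {
    struct toy_obj *backup;     /* plays the role of a backup pointer */
};

enum fail_policy {
    FAIL_CLEAR_AND_PUT,         /* forget the backup and drop the ref */
    FAIL_KEEP_REF               /* keep both so that a retry can reuse it */
};

static void toy_obj_put(struct toy_obj *obj)
{
    assert(obj->refcount > 0);
    obj->refcount--;
}

/*
 * Finish a backup attempt whose copy step returned err (the copy itself is
 * not modelled). On success the owner keeps the pointer and the reference.
 */
static int toy_backup_finish(struct toy_owner *owner, int err,
                             enum fail_policy policy)
{
    if (!err)
        return 0;

    if (policy == FAIL_CLEAR_AND_PUT) {
        struct toy_obj *backup = owner->backup;

        owner->backup = NULL;
        toy_obj_put(backup);
    }
    /* FAIL_KEEP_REF: nothing to undo, pointer and reference stay paired. */
    return err;
}

/* Invariant: a non-NULL backup pointer must still be backed by a reference. */
static bool toy_owner_is_consistent(const struct toy_owner *owner)
{
    return owner->backup == NULL || owner->backup->refcount > 0;
}
```

Both policies keep the invariant; the retry-friendly one is FAIL_KEEP_REF.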


I'll fix up this and other comments.

/Thomas





>> +
>> +    return err;
>> +}
>> +


Re: [Freedreno] [PATCH 2/3] drm/msm/dpu1: Add MSM8998 to hw catalog

2021-09-08 Thread Dmitry Baryshkov
Hi,

On Tue, 7 Sept 2021 at 22:13, Jeffrey Hugo  wrote:
>
> On Wed, Sep 1, 2021 at 12:11 PM AngeloGioacchino Del Regno
>  wrote:
> >
> > Bringup functionality for MSM8998 in the DPU, driver which is mostly
> > the same as SDM845 (just a few variations).
> >
> > Signed-off-by: AngeloGioacchino Del Regno 
> > 
>
> I don't seem to see a cover letter for this series.
>
> Eh, there are a fair number of differences between the MDSS versions
> for 8998 and 845.
>
> Probably a bigger question, why extend the DPU driver for 8998, when
> the MDP5 driver already supports it[1]?  The MDP/DPU split is pretty
> dumb, but I don't see a valid reason for both drivers supporting the
> same target/display revision.  IMO, if you want this support in DPU,
> remove it from MDP5.
>
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.14&id=d6c7b2284b14c66a268a448a7a8d54f585d38785

I don't think that we should enforce such requirements. Having support
both in MDP5 and DPU would allow one to compare those two drivers,
performance, features, etc.
It might be that all MDP5-supported hardware would be also supported
by DPU, thus allowing us to remove the former driver. But until that
time I'd suggest leaving support in place.

-- 
With best wishes
Dmitry


Re: [PATCH v10 01/17] dt-bindings: arm: mediatek: mmsys: add power and gce properties

2021-09-08 Thread Enric Balletbo i Serra
Hi Jason,

Thank you for your patch. One small comment below.

On 8/9/21 8:02, jason-jh.lin wrote:
> Power:
> 1. Add description for power-domains property.
> 
> GCE:
> 1. Add description for mboxes property.
> 2. Add description for mediatek,gce-client-reg property.
> 
> Signed-off-by: jason-jh.lin 
> ---
>  .../bindings/arm/mediatek/mediatek,mmsys.yaml | 30 ++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git 
> a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yaml 
> b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yaml
> index 2d4ff0ce387b..a2e7bddfed03 100644
> --- a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yaml
> +++ b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yaml
> @@ -39,6 +39,30 @@ properties:
>reg:
>  maxItems: 1
>  
> +  power-domains:
> +description:
> +  A phandle and PM domain specifier as defined by bindings
> +  of the power controller specified by phandle. See
> +  Documentation/devicetree/bindings/power/power-domain.yaml for details.
> +
> +  mboxes:
> +description:
> +  Using mailbox to communicate with GCE, it should have this
> +  property and list of phandle, mailbox specifiers. See
> +  Documentation/devicetree/bindings/mailbox/mtk-gce.txt for details.
> +$ref: /schemas/types.yaml#/definitions/phandle-array
> +
> +  mediatek,gce-client-reg:
> +description:
> +  The register of client driver can be configured by gce with 4 arguments
> +  defined in this property, such as phandle of gce, subsys id,
> +  register offset and size.
> +  Each subsys id is mapping to a base address of display function blocks
> +  register which is defined in the gce header
> +  include/dt-bindings/gce/-gce.h.
> +$ref: /schemas/types.yaml#/definitions/phandle-array
> +maxItems: 1
> +
>"#clock-cells":
>  const: 1
>  
> @@ -53,6 +77,10 @@ examples:
>- |
>  mmsys: syscon@1400 {
>  compatible = "mediatek,mt8173-mmsys", "syscon";
> -reg = <0x1400 0x1000>;
> +reg = <0 0x1400 0 0x1000>;

Why this change?

Thanks,
  Enric


> +power-domains = <&spm MT8173_POWER_DOMAIN_MM>;
>  #clock-cells = <1>;
> +mboxes = <&gce 0 CMDQ_THR_PRIO_HIGHEST>,
> + <&gce 1 CMDQ_THR_PRIO_HIGHEST>;
> +mediatek,gce-client-reg = <&gce SUBSYS_1400 0 0x1000>;
>  };
> 


Re: [PATCH v4] drm/i915: Use Transparent Hugepages when IOMMU is enabled

2021-09-08 Thread Tvrtko Ursulin



On 07/09/2021 12:13, Eero Tamminen wrote:
> Hi,
>
> For completeness sake, it might be worth mentioning specifically what
> (synthetic) test-cases regress with THP patch.
>
> * Skylake GT4e:
>    20-25% SynMark TexMem*
>    (whereas all MemBW GPU tests either improve or are not affected)
>
> * Broxton J4205:
>    7% MemBW GPU texture
>    2-3% SynMark TexMem*
>
> * Tigerlake-H:
>    7% MemBW GPU blend

Ah right, that makes sense. All the entries marked with an asterisk under
the "with patch" list. Okay if I just add an explanation of what the
asterisk means for them in a single place?

And about the Broxton one. In the bug you put "15-20% MemBW GPU texture"
and "10% SynMark TexMem*", so where are these numbers from now?

> I have no idea why on GEN9 texture accesses regress, but on GEN12 TGL
> it's render buffer blend that regresses.
>
> Blend (read+write) regressing is especially odd, as neither render
> buffer read nor write regresses.
>
> Maybe that is a GEN12 specific driver bug similar to Mesa/i965 bug from
> few years back in how its shaders access render buffer, that had caused
> SIMD32 accesses to regress memory BW bound test-cases perf a bit
> compared to SIMD16?
>
> (Blend test is likely to run nowadays as SIMD32.)

No idea on this one from me, leaving to more qualified people to comment.

Regards,

Tvrtko

>  - Eero
>
> On 7.9.2021 13.34, Tvrtko Ursulin wrote:

>> From: Tvrtko Ursulin
>>
>> Usage of Transparent Hugepages was disabled in 9987da4b5dcf
>> ("drm/i915: Disable THP until we have a GPU read BW W/A"), but since it
>> appears majority of performance regressions reported with an enabled IOMMU
>> can be almost eliminated by turning them on, lets just do that.
>>
>> To err on the side of safety we keep the current default in cases where
>> IOMMU is not active, and only when it is default to the "huge=within_size"
>> mode. Although there probably would be wins to enable them throughout,
>> more extensive testing across benchmarks and platforms would need to be
>> done.
>>
>> With the patch and IOMMU enabled my local testing on a small Skylake part
>> shows OglVSTangent regression being reduced from ~14% (IOMMU on versus
>> IOMMU off) to ~2% (same comparison but with THP on).
>>
>> More detailed testing done in the below referenced Gitlab issue by Eero:
>>
>> Skylake GT4e:
>>
>> Performance drops from enabling IOMMU:
>>
>>  30-35% SynMark CSDof
>>  20-25% Unigine Heaven, MemBW GPU write, SynMark VSTangent
>>  ~20% GLB Egypt  (1/2 screen window)
>>  10-15% GLB T-Rex (1/2 screen window)
>>  8-10% GfxBench T-Rex, MemBW GPU blit
>>  7-8% SynMark DeferredAA + TerrainFly* + ZBuffer
>>  6-7% GfxBench Manhattan 3.0 + 3.1, SynMark TexMem128 & CSCloth
>>  5-6% GfxBench CarChase, Unigine Valley
>>  3-5% GfxBench Vulkan & GL AztecRuins + ALU2, MemBW GPU texture,
>>   SynMark Fill*, Deferred, TerrainPan*
>>  1-2% Most of the other tests
>>
>> With the patch drops become:
>>
>>  20-25% SynMark TexMem*
>>  15-20% GLB Egypt (1/2 screen window)
>>  10-15% GLB T-Rex (1/2 screen window)
>>  4-7% GfxBench T-Rex, GpuTest Triangle
>>  1-8% GfxBench ALU2 (offscreen 1%, onscreen 8%)
>>  3% GfxBench Manhattan 3.0, SynMark CSDof
>>  2-3% Unigine Heaven + Valley, MemBW GPU texture
>>  1-3% GfxBench Manhattan 3.1 + CarChase + Vulkan & GL AztecRuins
>>
>> Broxton:
>>
>> Performance drops from IOMMU, without patch:
>>
>>  30% MemBW GPU write
>>  25% SynMark ZBuffer + Fill*
>>  20% MemBW GPU blit
>>  15% MemBW GPU blend, GpuTest Triangle
>>  10-15% MemBW GPU texture
>>  10% GLB Egypt, Unigine Heaven (had hangs), SynMark TerrainFly*
>>  7-9% GLB T-Rex, GfxBench Manhattan 3.0 + T-Rex,
>>   SynMark Deferred* + TexMem*
>>  6-8% GfxBench CarChase, Unigine Valley,
>>   SynMark CSCloth + ShMapVsm + TerrainPan*
>>  5-6% GfxBench Manhattan 3.1 + GL AztecRuins,
>>   SynMark CSDof + TexFilterTri
>>  2-4% GfxBench ALU2, SynMark DrvRes + GSCloth + ShMapPcf + Batch[0-5] +
>>   TexFilterAniso, GpuTest GiMark + 32-bit Julia
>>
>> And with patch:
>>
>>  15-20% MemBW GPU texture
>>  10% SynMark TexMem*
>>  8-9% GLB Egypt (1/2 screen window)
>>  4-5% GLB T-Rex (1/2 screen window)
>>  3-6% GfxBench Manhattan 3.0, GpuTest FurMark,
>>   SynMark Deferred + TexFilterTri
>>  3-4% GfxBench Manhattan 3.1 + T-Rex, SynMark VSInstancing
>>  2-4% GpuTest Triangle, SynMark DeferredAA
>>  2-3% Unigine Heaven + Valley
>>  1-3% SynMark Terrain*
>>  1-2% GfxBench CarChase, SynMark TexFilterAniso + ZBuffer
>>
>> Tigerlake-H:
>>
>>  20-25% MemBW GPU texture
>>  15-20% GpuTest Triangle
>>  13-15% SynMark TerrainFly* + DeferredAA + HdrBloom
>>  8-10% GfxBench Manhattan 3.1, SynMark TerrainPan* + DrvRes
>>  6-7% GfxBench Manhattan 3.0, SynMark TexMem*
>>  4-8% GLB onscreen Fill + T-Rex + Egypt (more in onscreen than
>>   offscreen versions of T-Rex/Egypt)
>>  4-6% GfxBench CarChase + GLES AztecRuins + ALU2, GpuTest 32-bit Julia,
>>   SynMark CSDof + DrvState
>>  3-5% GfxBench T-Rex + Egypt, Unigine Heaven + Valley, GpuTest Plot3D
>>  1-7% Media tests
>>  2-3% MemBW GPU blit
>>  1-3%

Re: linux-next: build failure after merge of the drm tree

2021-09-08 Thread Daniel Vetter
On Wed, Sep 8, 2021 at 5:14 AM Masahiro Yamada  wrote:
>
> On Mon, Sep 6, 2021 at 4:34 PM Daniel Vetter  wrote:
> >
> > On Mon, Sep 6, 2021 at 12:49 AM Stephen Rothwell  
> > wrote:
> > > Hi all,
> > >
> > > On Thu, 2 Sep 2021 07:50:38 +1000 Stephen Rothwell 
> > >  wrote:
> > > >
> > > > On Fri, 20 Aug 2021 15:23:34 +0900 Masahiro Yamada 
> > > >  wrote:
> > > > >
> > > > > On Fri, Aug 20, 2021 at 11:33 AM Stephen Rothwell 
> > > > >  wrote:
> > > > > >
> > > > > > > After merging the drm tree, today's linux-next build (x86_64 allmodconfig)
> > > > > > failed like this:
> > > > > >
> > > > > > In file included from drivers/gpu/drm/i915/i915_debugfs.c:39:
> > > > > > > drivers/gpu/drm/i915/gt/intel_gt_requests.h:9:10: fatal error: stddef.h: No such file or directory
> > > > > > >     9 | #include <stddef.h>
> > > > > > >       |          ^~~~~~~~~~
> > > > > >
> > > > > > Caused by commit
> > > > > >
> > > > > >   564f963eabd1 ("isystem: delete global -isystem compile option")
> > > > > >
> > > > > > from the kbuild tree interacting with commit
> > > > > >
> > > > > >   b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to 
> > > > > > work with GuC")
> > > > > >
> > > > > > I have applied the following patch for today.
> > > > >
> > > > >
> > > > > Thanks.
> > > > >
> > > > > This fix-up does not depend on my kbuild tree in any way.
> > > > >
> > > > > So, the drm maintainer can apply it to his tree.
> > > > >
> > > > > Perhaps with
> > > > >
> > > > > Fixes: b97060a99b01 ("drm/i915/guc: Update intel_gt_wait_for_idle to
> > > > > work with GuC")
> > > >
> > > > OK, so that didn't happen so I will now apply the merge fix up to the
> > > > merge of the kbuild tree.
> > > >
> > > > > > From: Stephen Rothwell 
> > > > > > Date: Fri, 20 Aug 2021 12:24:19 +1000
> > > > > > Subject: [PATCH] drm/i915: use linux/stddef.h due to "isystem: 
> > > > > > trim/fixup stdarg.h and other headers"
> > > > > >
> > > > > > Signed-off-by: Stephen Rothwell 
> > > > > > ---
> > > > > >  drivers/gpu/drm/i915/gt/intel_gt_requests.h | 2 +-
> > > > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.h 
> > > > > > b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > > > > > index 51dbe0e3294e..d2969f68dd64 100644
> > > > > > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > > > > > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.h
> > > > > > @@ -6,7 +6,7 @@
> > > > > >  #ifndef INTEL_GT_REQUESTS_H
> > > > > >  #define INTEL_GT_REQUESTS_H
> > > > > >
> > > > > > > -#include <stddef.h>
> > > > > > > +#include <linux/stddef.h>
> > > > > >
> > > > > >  struct intel_engine_cs;
> > > > > >  struct intel_gt;
> > > > > > --
> > > > > > 2.32.0
> > >
> > > Ping?  I am still applying this ...
> >
> > Apologies, this fell through a lot of cracks. I applied this to drm-next 
> > now.
>
>
>
> Rather, I was planning to apply this fix to my kbuild tree.
>
> Since you guys did not fix the issue in time,
> I ended up with dropping [1] from my pull request.
>
> I want to get [1] merged in this MW.
>
> If I postponed it, somebody would add new
>  or  inclusion in the next development
> cycle, I will never make it in the mainline.
>
> [1] 
> https://lore.kernel.org/linux-kernel/YQhY40teUJcTc5H4@localhost.localdomain/

Yeah no problem if you apply it too. For that:

Acked-by: Daniel Vetter 

I just figured I'd make sure this at least doesn't get lost.
-Daniel

>
>
>
>
>
> > Matt/John, as author/committer it's your job to make sure issues and
> > fixes for the stuff you're pushing don't get lost. I'd have expected
> > John to apply this to at least drm-intel-gt-next (it's not even
> > there).
> >
> > Joonas, I think this is the 2nd or 3rd or so issue this release cycle
> > where some compile fix got stuck a bit because drm-intel-gt-next isn't
> > in linux-next. Can we please fix that? It probably needs some changes
> > to the dim script.
> >
> > Cheers, Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
>
>
>
> --
> Best Regards
> Masahiro Yamada



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH] drm/i915: Get PM ref before accessing HW register

2021-09-08 Thread Tvrtko Ursulin



On 08/09/2021 00:27, Vinay Belgaumkar wrote:

> Seeing these errors when GT is likely in suspend state-
> "RPM wakelock ref not held during HW access"
>
> Ensure GT is awake before trying to access HW registers. Avoid
> reading the register if that is not the case.
>
> Signed-off-by: Vinay Belgaumkar


Fixes: 41e5c17ebfc2 ("drm/i915/guc/slpc: Sysfs hooks for SLPC")
Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
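
For readers not familiar with the idiom, the guarded read in the patch quoted
below boils down to "take a reference only if the device is already awake,
otherwise skip the hardware access and report 0". A stand-alone model of that
idea follows; it does not use the i915 helpers, and all toy_* names are
invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model; not the i915 runtime PM implementation. */
struct toy_device {
    int usage_count;        /* > 0 means the device is awake and in use */
    uint32_t reg_value;     /* content of the register we want to read */
    int unguarded_reads;    /* counts reads done while the device slept */
};

/* Take a reference only if somebody else already keeps the device awake. */
static bool toy_pm_get_if_in_use(struct toy_device *dev)
{
    if (dev->usage_count == 0)
        return false;
    dev->usage_count++;
    return true;
}

static void toy_pm_put(struct toy_device *dev)
{
    assert(dev->usage_count > 0);
    dev->usage_count--;
}

/* Raw register read; flags the access if nobody holds a reference. */
static uint32_t toy_hw_read(struct toy_device *dev)
{
    if (dev->usage_count == 0)
        dev->unguarded_reads++;
    return dev->reg_value;
}

/* Guarded read: returns 0 without touching the hardware when suspended. */
static uint32_t toy_read_req(struct toy_device *dev)
{
    uint32_t freq = 0;

    if (toy_pm_get_if_in_use(dev)) {
        freq = toy_hw_read(dev);
        toy_pm_put(dev);
    }
    return freq;
}
```

The trade-off is that a caller cannot tell "register reads 0" from "device
was asleep", which seems acceptable for a sysfs readout.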


> ---
>   drivers/gpu/drm/i915/gt/intel_rps.c | 8 +++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> index 3489f5f0cac1..e1a198bbd135 100644
> --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> @@ -1969,8 +1969,14 @@ u32 intel_rps_read_actual_frequency(struct intel_rps *rps)
>   u32 intel_rps_read_punit_req(struct intel_rps *rps)
>   {
>   	struct intel_uncore *uncore = rps_to_uncore(rps);
> +	struct intel_runtime_pm *rpm = rps_to_uncore(rps)->rpm;
> +	intel_wakeref_t wakeref;
> +	u32 freq = 0;
>
> -	return intel_uncore_read(uncore, GEN6_RPNSWREQ);
> +	with_intel_runtime_pm_if_in_use(rpm, wakeref)
> +		freq = intel_uncore_read(uncore, GEN6_RPNSWREQ);
> +
> +	return freq;
>   }
>
>   static u32 intel_rps_get_req(u32 pureq)




Re: [PATCH] doc: gpu: Add document describing buffer exchange

2021-09-08 Thread Pekka Paalanen
On Sun,  5 Sep 2021 13:27:42 +0100
Daniel Stone  wrote:

> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and
> using buffers when crossing context/process/device/subsystem boundaries.
> 
> This ties up all of dmabuf, formats and modifiers, and their usage.
> 
> Signed-off-by: Daniel Stone 

Hi,

I checked the comments from Simon and Bob, and I agree with them. Below
are some more from me.

There is room for adding a glossary for the terms, like what is the
difference between a buffer, pixel buffer and a memory buffer, and
things like pixel data, color value, stride, etc.

For example:

image
Conceptually a two-dimensional array of pixels. The pixels may
be stored in one or more memory buffers. Has width and height
in pixels, pixel format and modifier (implicit or explicit).

memory buffer
A piece of memory for storing (parts of) pixel data. Has stride
and size in bytes and at least one handle in some API. May
contain one or more planes.

plane
A two-dimensional array of some or all of an image's color and
alpha channel values.

pixel
A picture element. Has a single color value which is defined by
one or more color channel values, e.g. R, G and B, or Y, Cb
and Cr. May also have an alpha value as an additional
channel.

pixel data
Bytes or bits that represent some or all of the color/alpha
channel values of a pixel or an image. The data for one pixel
may be spread over several planes or memory buffers depending
on format and modifier.

color value
A tuple of numbers, representing a color. Each element in the
tuple is a color channel value.

color channel
One of the dimensions in a color model. For example, the RGB model
has channels R, G, and B. The alpha channel is sometimes counted as
a color channel as well.

pixel format
A description of how pixel data represents the pixel's color
and alpha values.

modifier
A description of how pixel data is laid out in memory buffers.

alpha
A value that denotes the color coverage in a pixel. Sometimes
used for translucency instead.

stride



> ---
> 
> This is just a quick first draft, inspired by:
>   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637
> 
> It's not complete or perfect, but I'm off to eat a roast then have a
> nice walk in the sun, so figured it'd be better to dash it off rather
> than let it rot on my hard drive.

For a quick draft, this is quite excellent.

> 
>  .../gpu/exchanging-pixel-buffers.rst  | 285 ++
>  Documentation/gpu/index.rst   |   1 +
>  2 files changed, 286 insertions(+)
>  create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst
> 
> diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst 
> b/Documentation/gpu/exchanging-pixel-buffers.rst
> new file mode 100644
> index ..75c4de13d5c8
> --- /dev/null
> +++ b/Documentation/gpu/exchanging-pixel-buffers.rst
> @@ -0,0 +1,285 @@
> +.. Copyright 2021 Collabora Ltd.
> +
> +
> +Exchanging pixel buffers
> +
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display 
> devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and 
> advice.
> +
> +
> +Formats and modifiers
> +=
> +
> +Each buffer must have an underlying format. This format describes the data 
> which
> +can be stored and loaded for each pixel. Although each subsystem has its own
> +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should 
> be
> +reused wherever possible, as they are the standard descriptions used for
> +interchange.
> +
> +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
> +the translation between one or more pixels in memory, and the color data
> +contained within that memory. The number and type of color channels are
> +described: whether they are RGB or YUV, integer or floating-point, the size
> +of each channel and their locations within the pixel memory, and the
> +relationship between color planes.
> +
> +For example, `DRM_FORMAT_ARGB` describes a format in which each pixel 
> has a
> +single 32-bit value in memory. Alpha, red, green, and blue, color channels 
> are
> +available at 8-bit precision per channel,

Re: [PATCH] doc: gpu: Add document describing buffer exchange

2021-09-08 Thread Simon Ser
> stride
>   

I think what's clear is:

- Per-plane property
- In bytes
- Offset between two consecutive rows

How that applies to weird YUV formats is the tricky question…
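
For the simple case, the three points above can be sketched as follows. This assumes a linear (untiled) layout with a whole number of bytes per pixel; `struct plane_layout` and `pixel_offset()` are hypothetical names used only for illustration, and the sketch deliberately does not try to answer the subsampled-YUV question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one plane inside a memory buffer. */
struct plane_layout {
	size_t offset;          /* start of the plane within the buffer, in bytes */
	size_t stride;          /* bytes between the starts of consecutive rows */
	size_t bytes_per_pixel; /* only meaningful for linear packed formats */
};

/* Byte offset of pixel (x, y) within the buffer, for linear layouts only. */
static size_t pixel_offset(const struct plane_layout *p, size_t x, size_t y)
{
	assert(x * p->bytes_per_pixel < p->stride); /* x must lie within one row */
	return p->offset + y * p->stride + x * p->bytes_per_pixel;
}
```

Note that the stride is allowed to be larger than width times bytes per pixel, which is why it has to be carried per plane rather than derived from the format.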

> Btw. there was a fun argument whether the same modifier value could
> mean different things on different devices. There were also arguments
> that a certain modifier could reference additional implicit memory on
> the device - memory that can only be accessed by very specific devices.
>
> I think AMLOGIC_FBC_LAYOUT_SCATTER was one of those.

A recent example of this is [1].

[1]: https://patchwork.freedesktop.org/patch/452461/


Re: [PATCH 1/8] drm/i915/xehp: Define compute class and engine

2021-09-08 Thread Tvrtko Ursulin



On 07/09/2021 18:19, Matt Roper wrote:

Introduce a Compute Command Streamer (CCS), which has access to
the media and GPGPU pipelines (but not the 3D pipeline).

To begin with, define the compute class/engine common functions, based
on the existing render ones.
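
As a rough illustration of what "define the class/engine" amounts to, here is a cut-down, stand-alone sketch of an engine table indexed by engine ID plus an instance-mask helper. The toy enum, `engine_instance()` and `ccs_instances()` are hypothetical; only the class numbers and the `CCS0 + (n)` idea are taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

enum engine_class { RENDER_CLASS = 0, COMPUTE_CLASS = 5 };

/* Toy engine ID list; the real enum has many more entries. */
enum engine_id { RCS0, CCS0, CCS1, CCS2, CCS3, NUM_ENGINES };
#define CCS(n)  (CCS0 + (n))
#define MAX_CCS 4

struct engine_info {
	uint8_t engine_class;
	uint8_t instance;
};

/* One table entry per engine ID, using designated initializers. */
static const struct engine_info engines[NUM_ENGINES] = {
	[RCS0] = { .engine_class = RENDER_CLASS,  .instance = 0 },
	[CCS0] = { .engine_class = COMPUTE_CLASS, .instance = 0 },
	[CCS1] = { .engine_class = COMPUTE_CLASS, .instance = 1 },
	[CCS2] = { .engine_class = COMPUTE_CLASS, .instance = 2 },
	[CCS3] = { .engine_class = COMPUTE_CLASS, .instance = 3 },
};

/* Look up the per-class instance number of an engine ID. */
static uint8_t engine_instance(enum engine_id id)
{
	assert(id < NUM_ENGINES);
	return engines[id].instance;
}

/* Extract the compute instances from a mask with one bit per engine ID. */
static uint32_t ccs_instances(uint32_t engine_mask)
{
	return (engine_mask >> CCS0) & ((1u << MAX_CCS) - 1);
}
```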

Bspec: 46167, 45544
Original-patch-by: Michel Thierry
Cc: Daniele Ceraolo Spurio 
Cc: Tvrtko Ursulin 
Cc: Vinay Belgaumkar 
Cc: Szymon Morek 
UMD (compute): https://github.com/intel/compute-runtime/pull/451
Signed-off-by: Rodrigo Vivi 
Signed-off-by: Daniele Ceraolo Spurio 
Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Matt Roper 
---
  drivers/gpu/drm/i915/gt/intel_engine_cs.c| 28 
  drivers/gpu/drm/i915/gt/intel_engine_types.h |  9 ++-
  drivers/gpu/drm/i915/gt/intel_engine_user.c  |  5 +++-
  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h  | 13 +
  drivers/gpu/drm/i915/i915_reg.h  |  8 ++
  include/uapi/drm/i915_drm.h  |  1 +
  6 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 332efea696a5..69944bd8c19d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -153,6 +153,34 @@ static const struct engine_info intel_engines[] = {
{ .graphics_ver = 12, .base = XEHP_VEBOX4_RING_BASE }
},
},
+   [CCS0] = {
+   .class = COMPUTE_CLASS,
+   .instance = 0,
+   .mmio_bases = {
+   { .graphics_ver = 12, .base = GEN12_COMPUTE0_RING_BASE }
+   }
+   },
+   [CCS1] = {
+   .class = COMPUTE_CLASS,
+   .instance = 1,
+   .mmio_bases = {
+   { .graphics_ver = 12, .base = GEN12_COMPUTE1_RING_BASE }
+   }
+   },
+   [CCS2] = {
+   .class = COMPUTE_CLASS,
+   .instance = 2,
+   .mmio_bases = {
+   { .graphics_ver = 12, .base = GEN12_COMPUTE2_RING_BASE }
+   }
+   },
+   [CCS3] = {
+   .class = COMPUTE_CLASS,
+   .instance = 3,
+   .mmio_bases = {
+   { .graphics_ver = 12, .base = GEN12_COMPUTE3_RING_BASE }
+   }
+   },
  };
  
  /**

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index bfbfe53c23dd..dcb9d8b2362a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -33,7 +33,8 @@
  #define VIDEO_ENHANCEMENT_CLASS   2
  #define COPY_ENGINE_CLASS 3
  #define OTHER_CLASS   4
-#define MAX_ENGINE_CLASS   4
+#define COMPUTE_CLASS  5
+#define MAX_ENGINE_CLASS   5
  #define MAX_ENGINE_INSTANCE   7
  
  #define I915_MAX_SLICES	3

@@ -95,6 +96,7 @@ struct i915_ctx_workarounds {
  
  #define I915_MAX_VCS	8

  #define I915_MAX_VECS 4
+#define I915_MAX_CCS   4
  
  /*

   * Engine IDs definitions.
@@ -117,6 +119,11 @@ enum intel_engine_id {
VECS2,
VECS3,
  #define _VECS(n) (VECS0 + (n))
+   CCS0,
+   CCS1,
+   CCS2,
+   CCS3,
+#define _CCS(n) (CCS0 + (n))
I915_NUM_ENGINES
  #define INVALID_ENGINE ((enum intel_engine_id)-1)
  };
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 8f8bea08e734..d981621a7c30 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -47,6 +47,7 @@ static const u8 uabi_classes[] = {
[COPY_ENGINE_CLASS] = I915_ENGINE_CLASS_COPY,
[VIDEO_DECODE_CLASS] = I915_ENGINE_CLASS_VIDEO,
[VIDEO_ENHANCEMENT_CLASS] = I915_ENGINE_CLASS_VIDEO_ENHANCE,
+   [COMPUTE_CLASS] = I915_ENGINE_CLASS_COMPUTE,
  };
  
  static int engine_cmp(void *priv, const struct list_head *A,

@@ -139,6 +140,7 @@ const char *intel_engine_class_repr(u8 class)
[COPY_ENGINE_CLASS] = "bcs",
[VIDEO_DECODE_CLASS] = "vcs",
[VIDEO_ENHANCEMENT_CLASS] = "vecs",
+   [COMPUTE_CLASS] = "ccs",
};
  
  	if (class >= ARRAY_SIZE(uabi_names) || !uabi_names[class])

@@ -162,6 +164,7 @@ static int legacy_ring_idx(const struct legacy_ring *ring)
[COPY_ENGINE_CLASS] = { BCS0, 1 },
[VIDEO_DECODE_CLASS] = { VCS0, I915_MAX_VCS },
[VIDEO_ENHANCEMENT_CLASS] = { VECS0, I915_MAX_VECS },
+   [COMPUTE_CLASS] = { CCS0, I915_MAX_CCS },
};
  
  	if (GEM_DEBUG_WARN_ON(ring->class >= ARRAY_SIZE(map)))

@@ -190,7 +193,7 @@ static void add_legacy_ring(struct legacy_ring *ring,
  void intel_engines_driver_register(struct drm_i915_private *i915)
  {
struct legacy_ring ring = {};
-   u8 uabi_instances[4] = {};
+   u8 uabi_instances[5] = {};
struct list_head *it, *next;
struct 

Re: Handling DRM master transitions cooperatively

2021-09-08 Thread Simon Ser
> On Tue, 07 Sep 2021 10:19:03 +
> Simon Ser  wrote:
>
> > FWIW, I've just hit a case where a compositor leaves a "rotation" KMS
> > prop set behind, then Xorg tries to startup and fails because it doesn't
> > reset this prop. So none of this is theoretical.
> >
> > I still think a "reset all KMS props to an arbitrary default value" flag
> > in drmModeAtomicCommit is the best way forward. I'm not sure a user-space
> > protocol would help too much.
>
> Hi Simon,
>
> for the "reset KMS state" problem, sure. Thanks for confirming the
> problem, too.
>
> The hand-off problem does need userspace protocol though, so that the
> two parties can negotiate what part of KMS state can be inherited by
> the receiver and who will do the animation from the first to the second
> state in case you want to avoid abrupt changes. It would also be useful
> for a cross-fade as a perhaps more flexible way than the current "leak
> an FB, let the next KMS client scrape it via ioctls and copy it so it
> can be textured from".

The KMS state can be limited to a single FB on the primary plane covering the
whole CRTC, no scaling, and no properties set other than FB_ID/CRTC_*/SRC_*.

Is it useful to make the previous client perform the animation? I don't really
understand the use-case here.

> Userspace protocol is also useful for starting the next KMS client
> first and handing off only later once it's actually running. I'm not
> sure if that is already possible with the session switching stuff, but
> I have a feeling it might be fragile or miss pieces like the next KMS
> client signalling ready before actually switching to it.

Hm, right. I'm not 100% clear if it's possible for the next client to set
everything up while the VT is not active.

It would help to make logind/seatd give a non-master DRM FD when VT-switched
away. Not sure they do it atm.


Re: [PATCH 2/8] drm/i915/xehp: CCS shares the render reset domain

2021-09-08 Thread Tvrtko Ursulin



On 07/09/2021 18:19, Matt Roper wrote:

The reset domain is shared between render and all compute engines,
so resetting one will affect the others.

Note:  Before performing a reset on an RCS or CCS engine, the GuC will
attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
impacting other clients (since some shared modules will be reset).  If
other engines are executing non-preemptable workloads, the impact is
unavoidable and some work may be lost.


Since this talks about engine reset, should this patch add a warning if the
same is attempted by i915 on a GuC platform, to document that it is not
implemented/supported? Or perhaps doing that later in the series, or in a
future series, works better.


Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko


Bspec: 52549
Original-patch-by: Michel Thierry
Cc: Tvrtko Ursulin 
Cc: Vinay Belgaumkar 
Signed-off-by: Daniele Ceraolo Spurio 
Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Matt Roper 
---
  drivers/gpu/drm/i915/gt/intel_reset.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
b/drivers/gpu/drm/i915/gt/intel_reset.c
index 91200c43951f..30598c1d070c 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
[VECS1] = GEN11_GRDOM_VECS2,
[VECS2] = GEN11_GRDOM_VECS3,
[VECS3] = GEN11_GRDOM_VECS4,
+   [CCS0] = GEN11_GRDOM_RENDER,
+   [CCS1] = GEN11_GRDOM_RENDER,
+   [CCS2] = GEN11_GRDOM_RENDER,
+   [CCS3] = GEN11_GRDOM_RENDER,
};
struct intel_engine_cs *engine;
intel_engine_mask_t tmp;
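
To make the consequence of the shared domain concrete, here is a small stand-alone sketch that computes which engines are hit when a given set of engines is reset. The toy engine list, the `DOM_*` bits and `affected_engines()` are hypothetical and only model the mapping idea from the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy engine list; the real driver has many more engines. */
enum engine_id { RCS0, BCS0, CCS0, CCS1, NUM_ENGINES };

#define DOM_RENDER (1u << 0)
#define DOM_BLT    (1u << 1)

/* Engine -> reset domain; compute engines share the render domain. */
static const uint32_t engine_domain[NUM_ENGINES] = {
	[RCS0] = DOM_RENDER,
	[BCS0] = DOM_BLT,
	[CCS0] = DOM_RENDER,
	[CCS1] = DOM_RENDER,
};

/* Engines that end up reset when the engines in 'engine_mask' are reset. */
static uint32_t affected_engines(uint32_t engine_mask)
{
	uint32_t domains = 0, affected = 0;

	assert((engine_mask >> NUM_ENGINES) == 0); /* no unknown engines */

	/* Collect the reset domains of all requested engines. */
	for (int i = 0; i < NUM_ENGINES; i++)
		if (engine_mask & (1u << i))
			domains |= engine_domain[i];

	/* Every engine sharing one of those domains is affected. */
	for (int i = 0; i < NUM_ENGINES; i++)
		if (engine_domain[i] & domains)
			affected |= 1u << i;

	return affected;
}
```

Resetting only CCS1 in this model reports RCS0, CCS0 and CCS1 as affected, which is the collateral damage the commit message warns about.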



Re: [PATCH 3/8] drm/i915/xehp: Add Compute CS IRQ handlers

2021-09-08 Thread Tvrtko Ursulin



On 07/09/2021 18:19, Matt Roper wrote:

Add execlists and GuC interrupts for compute CS into existing IRQ handlers.

All compute command streamers belong to the same compute class, so the
only change needed to enable their interrupts is to program their GT engine
interrupt mask registers.

CCS0 shares the register with CCS1, while CCS2 and CCS3 are in a new one.
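
A hedged sketch of the register sharing described here: it assumes each 32-bit mask register carries two engines, one per 16-bit half, which is how the `dmask`/`smask` naming in the hunks below reads. The half-register layout and the helper `ccs_mask_reg_index()` are assumptions for illustration, not taken from the Bspec.

```c
#include <assert.h>
#include <stdint.h>

/* Same interrupt bits replicated into both 16-bit halves ("double" mask). */
static uint32_t dmask(uint16_t irqs)
{
	return ((uint32_t)irqs << 16) | irqs;
}

/* Interrupt bits in the upper half only ("single" mask). */
static uint32_t smask(uint16_t irqs)
{
	return (uint32_t)irqs << 16;
}

/* Which of the two hypothetical CCS mask registers covers an instance. */
static unsigned int ccs_mask_reg_index(unsigned int instance)
{
	assert(instance < 4); /* CCS0..CCS3 */
	return instance / 2;
}
```

Writing `~dmask(irqs)` to a mask register then unmasks the same interrupts for both engines in the pair, while `~0` masks everything.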

BSpec: 50844, 54029, 54030, 53223, 53224.
Original-patch-by: Michel Thierry
Cc: Tvrtko Ursulin 
Cc: Vinay Belgaumkar 
Signed-off-by: Daniele Ceraolo Spurio 
Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Matt Roper 
---
  drivers/gpu/drm/i915/gt/intel_gt_irq.c | 15 ++-
  drivers/gpu/drm/i915/i915_drv.h|  2 ++
  drivers/gpu/drm/i915/i915_reg.h|  3 +++
  3 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_irq.c 
b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
index b2de83be4d97..612281d47513 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_irq.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_irq.c
@@ -96,7 +96,7 @@ gen11_gt_identity_handler(struct intel_gt *gt, const u32 
identity)
if (unlikely(!intr))
return;
  
-	if (class <= COPY_ENGINE_CLASS)

+   if (class <= COPY_ENGINE_CLASS || class == COMPUTE_CLASS)
return gen11_engine_irq_handler(gt, class, instance, intr);
  
  	if (class == OTHER_CLASS)

@@ -178,6 +178,8 @@ void gen11_gt_irq_reset(struct intel_gt *gt)
/* Disable RCS, BCS, VCS and VECS class engines. */
intel_uncore_write(uncore, GEN11_RENDER_COPY_INTR_ENABLE, 0);
intel_uncore_write(uncore, GEN11_VCS_VECS_INTR_ENABLE,0);
+   if (CCS_MASK(gt))
+   intel_uncore_write(uncore, GEN12_CCS_RSVD_INTR_ENABLE, 0);
  
  	/* Restore masks irqs on RCS, BCS, VCS and VECS engines. */

intel_uncore_write(uncore, GEN11_RCS0_RSVD_INTR_MASK,   ~0);
@@ -191,6 +193,10 @@ void gen11_gt_irq_reset(struct intel_gt *gt)
intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~0);
if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3))
intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~0);
+   if (HAS_ENGINE(gt, CCS0) || HAS_ENGINE(gt, CCS1))
+   intel_uncore_write(uncore, GEN12_CCS0_CCS1_INTR_MASK, ~0);
+   if (HAS_ENGINE(gt, CCS2) || HAS_ENGINE(gt, CCS3))
+   intel_uncore_write(uncore, GEN12_CCS2_CCS3_INTR_MASK, ~0);
  
  	intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0);

intel_uncore_write(uncore, GEN11_GPM_WGBOXPERF_INTR_MASK,  ~0);
@@ -218,6 +224,8 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt)
/* Enable RCS, BCS, VCS and VECS class interrupts. */
intel_uncore_write(uncore, GEN11_RENDER_COPY_INTR_ENABLE, dmask);
intel_uncore_write(uncore, GEN11_VCS_VECS_INTR_ENABLE, dmask);
+   if (CCS_MASK(gt))
+   intel_uncore_write(uncore, GEN12_CCS_RSVD_INTR_ENABLE, smask);
  
  	/* Unmask irqs on RCS, BCS, VCS and VECS engines. */

intel_uncore_write(uncore, GEN11_RCS0_RSVD_INTR_MASK, ~smask);
@@ -231,6 +239,11 @@ void gen11_gt_irq_postinstall(struct intel_gt *gt)
intel_uncore_write(uncore, GEN11_VECS0_VECS1_INTR_MASK, ~dmask);
if (HAS_ENGINE(gt, VECS2) || HAS_ENGINE(gt, VECS3))
intel_uncore_write(uncore, GEN12_VECS2_VECS3_INTR_MASK, ~dmask);
+   if (HAS_ENGINE(gt, CCS0) || HAS_ENGINE(gt, CCS1))
+   intel_uncore_write(uncore, GEN12_CCS0_CCS1_INTR_MASK, ~dmask);
+   if (HAS_ENGINE(gt, CCS2) || HAS_ENGINE(gt, CCS3))
+   intel_uncore_write(uncore, GEN12_CCS2_CCS3_INTR_MASK, ~dmask);
+
/*
 * RPS interrupts will get enabled/disabled on demand when RPS itself
 * is enabled/disabled.
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1fd3040b6771..5b6eee5d8ade 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1573,6 +1573,8 @@ IS_SUBPLATFORM(const struct drm_i915_private *i915,
ENGINE_INSTANCES_MASK(gt, VCS0, I915_MAX_VCS)
  #define VEBOX_MASK(gt) \
ENGINE_INSTANCES_MASK(gt, VECS0, I915_MAX_VECS)
+#define CCS_MASK(gt) \
+   ENGINE_INSTANCES_MASK(gt, CCS0, I915_MAX_CCS)
  
  /*

   * The Gen7 cmdparser copies the scanned buffer to the ggtt for execution
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 33d6aa0b07c1..31e9c2cc4c0c 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -8139,6 +8139,7 @@ enum {
  #define GEN11_GPM_WGBOXPERF_INTR_ENABLE   _MMIO(0x19003c)
  #define GEN11_CRYPTO_RSVD_INTR_ENABLE _MMIO(0x190040)
  #define GEN11_GUNIT_CSME_INTR_ENABLE  _MMIO(0x190044)
+#define GEN12_CCS_RSVD_INTR_ENABLE _MMIO(0x190048)
  
  #define GEN11_RCS0_RSVD_INTR_MASK	_MMIO(0x190090)

  #define GEN11_BCS_RSVD_INTR_MASK  _MMIO(0x1900a0)
@@ -8152,6 +8153,8 @@ enum {
  #define GEN11_GPM_WGBOXPERF_INTR_MASK _MMIO(0x1900ec)
  #define GEN11_

Re: [PATCH v10 01/17] dt-bindings: arm: mediatek: mmsys: add power and gce properties

2021-09-08 Thread Jason-JH Lin
Hi Enric,

Thanks for the reviews.

On Wed, 2021-09-08 at 10:32 +0200, Enric Balletbo i Serra wrote:
> Hi Jason,
> 
> Thank you for your patch. One small comment below.
> 
> On 8/9/21 8:02, jason-jh.lin wrote:
> > Power:
> > 1. Add description for power-domains property.
> > 
> > GCE:
> > 1. Add description for mboxes property.
> > 2. Add description for mediatek,gce-client-reg property.
> > 
> > Signed-off-by: jason-jh.lin 
> > ---
> >  .../bindings/arm/mediatek/mediatek,mmsys.yaml | 30
> > ++-
> >  1 file changed, 29 insertions(+), 1 deletion(-)
> > 
> > diff --git
> > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yam
> > l
> > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yam
> > l
> > index 2d4ff0ce387b..a2e7bddfed03 100644
> > ---
> > a/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yam
> > l
> > +++
> > b/Documentation/devicetree/bindings/arm/mediatek/mediatek,mmsys.yam
> > l
> > @@ -39,6 +39,30 @@ properties:
> >reg:
> >  maxItems: 1
> >  
> > +  power-domains:
> > +description:
> > +  A phandle and PM domain specifier as defined by bindings
> > +  of the power controller specified by phandle. See
> > +  Documentation/devicetree/bindings/power/power-domain.yaml
> > for details.
> > +
> > +  mboxes:
> > +description:
> > +  Using mailbox to communicate with GCE, it should have this
> > +  property and list of phandle, mailbox specifiers. See
> > +  Documentation/devicetree/bindings/mailbox/mtk-gce.txt for
> > details.
> > +$ref: /schemas/types.yaml#/definitions/phandle-array
> > +
> > +  mediatek,gce-client-reg:
> > +description:
> > +  The register of client driver can be configured by gce with
> > 4 arguments
> > +  defined in this property, such as phandle of gce, subsys id,
> > +  register offset and size.
> > +  Each subsys id is mapping to a base address of display
> > function blocks
> > +  register which is defined in the gce header
> > +  include/dt-bindings/gce/-gce.h.
> > +$ref: /schemas/types.yaml#/definitions/phandle-array
> > +maxItems: 1
> > +
> >"#clock-cells":
> >  const: 1
> >  
> > @@ -53,6 +77,10 @@ examples:
> >- |
> >  mmsys: syscon@1400 {
> >  compatible = "mediatek,mt8173-mmsys", "syscon";
> > -reg = <0x1400 0x1000>;
> > +reg = <0 0x1400 0 0x1000>;
> 
> Why this change?
> 
> Thanks,
>   Enric
> 

I think the first version of this example is not correct.
I've checked the first version of mt8173.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/arm64/boot/dts/mediatek/mt8173.dtsi?id=b3a37248415716663ea2d752da4a5f765fc87442

This is because #address-cells and #size-cells of the parent node are defined as 2.

e.g.

soc {
#address-cells = <2>;
#size-cells = <2>;
...

};
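
To spell out why the cell counts matter: with #address-cells = <2> and #size-cells = <2>, one reg entry is four 32-bit cells, two for the address and two for the size. A small sketch of that decoding, with hypothetical helper names and made-up values:

```c
#include <assert.h>
#include <stdint.h>

struct reg_entry {
	uint64_t addr;
	uint64_t size;
};

/* Combine 'ncells' 32-bit cells (most significant first) into one value. */
static uint64_t read_cells(const uint32_t *cells, unsigned int ncells)
{
	uint64_t v = 0;

	assert(ncells >= 1 && ncells <= 2); /* sketch handles 32/64-bit only */
	for (unsigned int i = 0; i < ncells; i++)
		v = (v << 32) | cells[i];
	return v;
}

/* Decode one reg entry according to the parent's cell counts. */
static struct reg_entry parse_reg(const uint32_t *reg,
				  unsigned int address_cells,
				  unsigned int size_cells)
{
	struct reg_entry e;

	e.addr = read_cells(reg, address_cells);
	e.size = read_cells(reg + address_cells, size_cells);
	return e;
}
```

With two cells each, a two-cell reg would be read as an address with no size at all, which is why the example needs the extra zero cells.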


Regards,
Jason-JH.Lin

> 
> > +power-domains = <&spm MT8173_POWER_DOMAIN_MM>;
> >  #clock-cells = <1>;
> > +mboxes = <&gce 0 CMDQ_THR_PRIO_HIGHEST>,
> > + <&gce 1 CMDQ_THR_PRIO_HIGHEST>;
> > +mediatek,gce-client-reg = <&gce SUBSYS_1400 0 0x1000>;
> >  };
> > 
-- 
Jason-JH Lin 



Re: [PATCH 4/8] drm/i915/xehp: CCS should use RCS setup functions

2021-09-08 Thread Tvrtko Ursulin



On 07/09/2021 18:19, Matt Roper wrote:

The compute engine handles the same commands the render engine can
(except 3D pipeline), so it makes sense that CCS is more similar to RCS
than non-render engines.

The CCS context state (lrc) is also similar to the render one, so reuse
it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE
register.

In order to avoid having multiple RCS && CCS checks, add the following
engine flag:
  - I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state ctx.
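
The flag approach can be sketched in isolation as below: the class comparison happens once at setup, and later code tests the flag instead of repeating "render or compute". The struct and function names are hypothetical; only the class numbers and the BIT(9) position are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BIT(n) (1u << (n))
#define ENGINE_HAS_RCS_REG_STATE BIT(9)

enum engine_class {
	RENDER_CLASS = 0,
	COPY_ENGINE_CLASS = 3,
	COMPUTE_CLASS = 5,
};

/* Cut-down engine descriptor for the sketch. */
struct engine {
	enum engine_class engine_class;
	unsigned int flags;
};

/* Decide once, at setup time, which engines use the render-style state. */
static void engine_setup(struct engine *e, enum engine_class c)
{
	assert(e != NULL);
	e->engine_class = c;
	e->flags = 0;
	if (c == RENDER_CLASS || c == COMPUTE_CLASS)
		e->flags |= ENGINE_HAS_RCS_REG_STATE;
}

/* Later users test the flag rather than comparing classes again. */
static bool uses_rcs_reg_state(const struct engine *e)
{
	return (e->flags & ENGINE_HAS_RCS_REG_STATE) != 0;
}
```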

BSpec: 46260
Original-patch-by: Michel Thierry
Cc: Tvrtko Ursulin 
Cc: Daniele Ceraolo Spurio 
Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Matt Roper 
---
  drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 8 +---
  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 ++
  drivers/gpu/drm/i915/gt/intel_engine_types.h  | 1 +
  drivers/gpu/drm/i915/gt/intel_execlists_submission.c  | 2 +-
  drivers/gpu/drm/i915/gt/intel_lrc.c   | 4 ++--
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +-
  drivers/gpu/drm/i915/i915_perf.c  | 4 ++--
  drivers/gpu/drm/i915/i915_reg.h   | 2 +-
  8 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
index b32f7fed2d9c..fbe10783628b 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -883,7 +883,9 @@ static int igt_shared_ctx_exec(void *arg)
return err;
  }
  
-static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *vma)

+static int rpcs_query_batch(struct drm_i915_gem_object *rpcs,
+   struct i915_vma *vma,
+   struct intel_engine_cs *engine)
  {
u32 *cmd;
  
@@ -894,7 +896,7 @@ static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct i915_vma *v

return PTR_ERR(cmd);
  
  	*cmd++ = MI_STORE_REGISTER_MEM_GEN8;

-   *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
+   *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE(engine->mmio_base));
*cmd++ = lower_32_bits(vma->node.start);
*cmd++ = upper_32_bits(vma->node.start);
*cmd = MI_BATCH_BUFFER_END;
@@ -955,7 +957,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
if (err)
goto err_vma;
  
-	err = rpcs_query_batch(rpcs, vma);

+   err = rpcs_query_batch(rpcs, vma, ce->engine);
if (err)
goto err_batch;
  
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index 69944bd8c19d..b346b946602d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -205,6 +205,8 @@ u32 intel_engine_context_size(struct intel_gt *gt, u8 class)
BUILD_BUG_ON(I915_GTT_PAGE_SIZE != PAGE_SIZE);
  
  	switch (class) {

+   case COMPUTE_CLASS:
+   fallthrough;
case RENDER_CLASS:
switch (GRAPHICS_VER(gt->i915)) {
default:
@@ -379,6 +381,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum 
intel_engine_id id)
if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
engine->props.preempt_timeout_ms = 0;
  
+	/* features common between engines sharing EUs */

+   if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS)
+   engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE;
+
engine->defaults = engine->props; /* never to change again */
  
  	engine->context_size = intel_engine_context_size(gt, engine->class);

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index dcb9d8b2362a..30a0c69c36c8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -454,6 +454,7 @@ struct intel_engine_cs {
  #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
  #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
  #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
+#define I915_ENGINE_HAS_RCS_REG_STATE  BIT(9)
unsigned int flags;
  
  	/*

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de5f9c86b9a4..4c600c46414d 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3406,7 +3406,7 @@ int intel_execlists_submission_setup(struct 
intel_engine_cs *engine)
logical_ring_default_vfuncs(engine);
logical_ring_default_irqs(engine);
  
-	if (engine->class == RENDER_CLASS)

+   if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)
rcs_submission_override(engine);


Hm, what do pipe control flushes which relate to 3d pipeline end up 
doing on CCS engines?


Regards,

Tvrtko

 

[PATCH v2 (repost)] fbmem: don't allow too huge resolutions

2021-09-08 Thread Tetsuo Handa
syzbot is reporting a page fault at vga16fb_fillrect() [1], because
vga16fb_check_var() fails to detect multiplication overflow.

  if (vxres * vyres > maxmem) {
vyres = maxmem / vxres;
if (vyres < yres)
  return -ENOMEM;
  }

Since no module would accept too huge resolutions where multiplication
overflow happens, let's reject in the common path.
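
A user-space sketch of the failure mode and of the proposed check follows. It uses the GCC/Clang builtin `__builtin_mul_overflow()` in place of the kernel's check_mul_overflow(); the function names `exceeds_unchecked()` and `check_resolution()` are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Unchecked form, as in the snippet above: the product wraps modulo 2^32. */
static bool exceeds_unchecked(uint32_t vxres, uint32_t vyres, uint32_t maxmem)
{
	return vxres * vyres > maxmem;
}

/* Checked form: reject any resolution whose product does not fit in 32 bits. */
static int check_resolution(uint32_t xres, uint32_t yres)
{
	uint32_t unused;

	if (__builtin_mul_overflow(xres, yres, &unused))
		return -EINVAL;
	return 0;
}
```

With vxres = vyres = 0x10000 the unchecked product wraps to 0, so the "too big" test passes silently, while the checked form rejects the mode.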

Link: https://syzkaller.appspot.com/bug?extid=04168c8063cfdde1db5e [1]
Reported-by: syzbot 
Debugged-by: Randy Dunlap 
Signed-off-by: Tetsuo Handa 
Reviewed-by: Geert Uytterhoeven 
---
Changes in v2:
  Use check_mul_overflow(), suggested by Geert Uytterhoeven 
.

 drivers/video/fbdev/core/fbmem.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 71fb710f1ce3..7420d2c16e47 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -962,6 +962,7 @@ fb_set_var(struct fb_info *info, struct fb_var_screeninfo 
*var)
struct fb_var_screeninfo old_var;
struct fb_videomode mode;
struct fb_event event;
+   u32 unused;
 
if (var->activate & FB_ACTIVATE_INV_MODE) {
struct fb_videomode mode1, mode2;
@@ -1008,6 +1009,11 @@ fb_set_var(struct fb_info *info, struct 
fb_var_screeninfo *var)
if (var->xres < 8 || var->yres < 8)
return -EINVAL;
 
+   /* Too huge resolution causes multiplication overflow. */
+   if (check_mul_overflow(var->xres, var->yres, &unused) ||
+   check_mul_overflow(var->xres_virtual, var->yres_virtual, &unused))
+   return -EINVAL;
+
ret = info->fbops->fb_check_var(var, info);
 
if (ret)
-- 
2.18.4




Re: [PATCH] kernel/locking: Add context to ww_mutex_trylock.

2021-09-08 Thread Peter Zijlstra
On Tue, Sep 07, 2021 at 03:20:44PM +0200, Maarten Lankhorst wrote:
> i915 will soon gain an eviction path that trylocks a whole lot of locks
> for eviction, producing dmesg failures like the one below:
> 
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.
> depth: 48  max: 48!
> 48 locks held by i915_selftest/5776:
>  #0: 888101a79240 (&dev->mutex){}-{3:3}, at: 
> __driver_attach+0x88/0x160
>  #1: c99778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: 
> i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
>  #2: 88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
>  #3: 88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: 
> i915_vma_pin_ww+0x1c4/0x9d0 [i915]
>  #4: 88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> i915_gem_evict_something+0x110/0x860 [i915]
>  #5: 88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> i915_gem_evict_something+0x110/0x860 [i915]
> ...
>  #46: 88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> i915_gem_evict_something+0x110/0x860 [i915]
>  #47: 88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> i915_gem_evict_something+0x110/0x860 [i915]
> INFO: lockdep is turned off.

> As an intermediate solution, add an acquire context to ww_mutex_trylock,
> which allows us to do proper nesting annotations on the trylocks, making
> the above lockdep splat disappear.

Fair enough I suppose.

> +/**
> + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire 
> context
> + * @lock: mutex to lock
> + * @ctx: optional w/w acquire context
> + *
> + * Trylocks a mutex with the optional acquire context; no deadlock detection 
> is
> + * possible. Returns 1 if the mutex has been acquired successfully, 0 
> otherwise.
> + *
> + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a 
> @ctx is
> + * specified, -EALREADY and -EDEADLK handling may happen in calls to 
> ww_mutex_lock.
> + *
> + * A mutex acquired with this function must be released with ww_mutex_unlock.
> + */
> +int __sched
> +ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ctx)
> +{
> + bool locked;
> +
> + if (!ctx)
> + return mutex_trylock(&ww->base);
> +
> +#ifdef CONFIG_DEBUG_MUTEXES
> + DEBUG_LOCKS_WARN_ON(ww->base.magic != &ww->base);
> +#endif
> +
> + preempt_disable();
> + locked = __mutex_trylock(&ww->base);
> +
> + if (locked) {
> + ww_mutex_set_context_fastpath(ww, ctx);
> + mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ctx->dep_map, 
> _RET_IP_);
> + }
> + preempt_enable();
> +
> + return locked;
> +}
> +EXPORT_SYMBOL(ww_mutex_trylock);

You'll need a similar hunk in ww_rt_mutex.c
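
For readers following along, here is a single-threaded toy model of what passing a context to the trylock buys: the lock remembers its acquire context, so a later blocking-lock attempt with the same context can report -EALREADY. This is not the kernel implementation; `toy_ww_lock`, `toy_ctx` and all functions are hypothetical, and the -EBUSY return merely stands in for the real wait/wound handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_ctx {
	int acquired;              /* number of locks held under this context */
};

struct toy_ww_lock {
	bool held;
	struct toy_ctx *ctx;       /* NULL when taken without a context */
};

/* Try to take the lock; record the context if one was given. */
static bool toy_trylock(struct toy_ww_lock *l, struct toy_ctx *ctx)
{
	if (l->held)
		return false;
	l->held = true;
	l->ctx = ctx;
	if (ctx)
		ctx->acquired++;
	return true;
}

/* Stand-in for the blocking lock path; never actually waits. */
static int toy_lock(struct toy_ww_lock *l, struct toy_ctx *ctx)
{
	if (l->held && ctx && l->ctx == ctx)
		return -EALREADY;  /* same context already owns it */
	if (l->held)
		return -EBUSY;     /* real code would wait or back off */
	toy_trylock(l, ctx);
	return 0;
}

/* Release the lock and update the owning context, if any. */
static void toy_unlock(struct toy_ww_lock *l)
{
	assert(l->held);
	if (l->ctx)
		l->ctx->acquired--;
	l->held = false;
	l->ctx = NULL;
}
```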


Re: [PATCH v2 5/6] drm/i915: Don't back up pinned LMEM context images and rings during suspend

2021-09-08 Thread Matthew Auld

On 06/09/2021 17:55, Thomas Hellström wrote:

Pinned context images are now reset during resume. Don't back them up,
and, since rings can be assumed empty at suspend, don't back them
up either.

Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that an
object is allowed to lose its content on suspend.

Signed-off-by: Thomas Hellström 
---
  .../gpu/drm/i915/gem/i915_gem_object_types.h| 17 ++---
  drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c  |  3 +++
  drivers/gpu/drm/i915/gt/intel_lrc.c |  3 ++-
  drivers/gpu/drm/i915/gt/intel_ring.c|  3 ++-
  4 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 734cc8e16481..66123ba46247 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -288,16 +288,19 @@ struct drm_i915_gem_object {
I915_SELFTEST_DECLARE(struct list_head st_link);
  
  	unsigned long flags;

-#define I915_BO_ALLOC_CONTIGUOUS BIT(0)
-#define I915_BO_ALLOC_VOLATILE   BIT(1)
-#define I915_BO_ALLOC_CPU_CLEAR  BIT(2)
-#define I915_BO_ALLOC_USER   BIT(3)
+#define I915_BO_ALLOC_CONTIGUOUS  BIT(0)
+#define I915_BO_ALLOC_VOLATILEBIT(1)
+#define I915_BO_ALLOC_CPU_CLEAR   BIT(2)
+#define I915_BO_ALLOC_USERBIT(3)
+/* Object may lose its contents on suspend / resume */


+ if we can't evict it?


+#define I915_BO_ALLOC_PM_VOLATILE BIT(4)
  #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 I915_BO_ALLOC_VOLATILE | \
 I915_BO_ALLOC_CPU_CLEAR | \
-I915_BO_ALLOC_USER)
-#define I915_BO_READONLY BIT(4)
-#define I915_TILING_QUIRK_BIT5 /* unknown swizzling; do not release! */
+I915_BO_ALLOC_USER | \
+I915_BO_ALLOC_PM_VOLATILE)
+#define I915_BO_READONLY  BIT(5)
+#define I915_TILING_QUIRK_BIT 6 /* unknown swizzling; do not release! */
  
  	/**

 * @mem_flags - Mutable placement-related flags
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 3884bf45dab8..eaceecfc3f19 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -61,6 +61,9 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region 
*apply,
if (!pm_apply->backup_pinned)
return 0;
  
+	if (obj->flags & I915_BO_ALLOC_PM_VOLATILE)

+   return 0;
+
sys_region = i915->mm.regions[INTEL_REGION_SMEM];
backup = i915_gem_object_create_region(sys_region,
   obj->base.size,
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 6ba8daea2f56..3ef9eaf8c50e 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -942,7 +942,8 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
 		context_size += PAGE_SIZE;
 	}
 
-	obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
+	obj = i915_gem_object_create_lmem(engine->i915, context_size,
+					  I915_BO_ALLOC_PM_VOLATILE);
 	if (IS_ERR(obj))
 		obj = i915_gem_object_create_shmem(engine->i915, context_size);
 	if (IS_ERR(obj))
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 7c4d5158e03b..2fdd52b62092 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -112,7 +112,8 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
 
-	obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE);
+	obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE |
+					  I915_BO_ALLOC_PM_VOLATILE);
 	if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt))
 		obj = i915_gem_object_create_stolen(i915, size);
 	if (IS_ERR(obj))



Re: [PATCH v2 5/6] drm/i915: Don't back up pinned LMEM context images and rings during suspend

2021-09-08 Thread Matthew Auld

On 06/09/2021 17:55, Thomas Hellström wrote:

Pinned context images are now reset during resume. Don't back them up,
and assuming that rings can be assumed empty at suspend, don't back them
up either.

Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that an
object is allowed to lose its content on suspend.

Signed-off-by: Thomas Hellström 
---
  .../gpu/drm/i915/gem/i915_gem_object_types.h| 17 ++---
  drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c  |  3 +++
  drivers/gpu/drm/i915/gt/intel_lrc.c |  3 ++-
  drivers/gpu/drm/i915/gt/intel_ring.c|  3 ++-
  4 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 734cc8e16481..66123ba46247 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -288,16 +288,19 @@ struct drm_i915_gem_object {
 	I915_SELFTEST_DECLARE(struct list_head st_link);
 
 	unsigned long flags;
-#define I915_BO_ALLOC_CONTIGUOUS BIT(0)
-#define I915_BO_ALLOC_VOLATILE   BIT(1)
-#define I915_BO_ALLOC_CPU_CLEAR  BIT(2)
-#define I915_BO_ALLOC_USER       BIT(3)
+#define I915_BO_ALLOC_CONTIGUOUS  BIT(0)
+#define I915_BO_ALLOC_VOLATILE    BIT(1)
+#define I915_BO_ALLOC_CPU_CLEAR   BIT(2)
+#define I915_BO_ALLOC_USER        BIT(3)
+/* Object may lose its contents on suspend / resume */
+#define I915_BO_ALLOC_PM_VOLATILE BIT(4)

PM_SKIP_PINNED? Not sure if that is better.

 #define I915_BO_ALLOC_FLAGS (I915_BO_ALLOC_CONTIGUOUS | \
 			     I915_BO_ALLOC_VOLATILE | \
 			     I915_BO_ALLOC_CPU_CLEAR | \
-			     I915_BO_ALLOC_USER)
-#define I915_BO_READONLY         BIT(4)
-#define I915_TILING_QUIRK_BIT    5 /* unknown swizzling; do not release! */
+			     I915_BO_ALLOC_USER | \
+			     I915_BO_ALLOC_PM_VOLATILE)
+#define I915_BO_READONLY          BIT(5)
+#define I915_TILING_QUIRK_BIT     6 /* unknown swizzling; do not release! */
 
 	/**
 	 * @mem_flags - Mutable placement-related flags
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
index 3884bf45dab8..eaceecfc3f19 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c
@@ -61,6 +61,9 @@ static int i915_ttm_backup(struct i915_gem_apply_to_region *apply,
 	if (!pm_apply->backup_pinned)
 		return 0;
 
+	if (obj->flags & I915_BO_ALLOC_PM_VOLATILE)
+		return 0;
+
 	sys_region = i915->mm.regions[INTEL_REGION_SMEM];
 	backup = i915_gem_object_create_region(sys_region,
 					       obj->base.size,
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 6ba8daea2f56..3ef9eaf8c50e 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -942,7 +942,8 @@ __lrc_alloc_state(struct intel_context *ce, struct intel_engine_cs *engine)
 		context_size += PAGE_SIZE;
 	}
 
-	obj = i915_gem_object_create_lmem(engine->i915, context_size, 0);
+	obj = i915_gem_object_create_lmem(engine->i915, context_size,
+					  I915_BO_ALLOC_PM_VOLATILE);
 	if (IS_ERR(obj))
 		obj = i915_gem_object_create_shmem(engine->i915, context_size);
 	if (IS_ERR(obj))
diff --git a/drivers/gpu/drm/i915/gt/intel_ring.c b/drivers/gpu/drm/i915/gt/intel_ring.c
index 7c4d5158e03b..2fdd52b62092 100644
--- a/drivers/gpu/drm/i915/gt/intel_ring.c
+++ b/drivers/gpu/drm/i915/gt/intel_ring.c
@@ -112,7 +112,8 @@ static struct i915_vma *create_ring_vma(struct i915_ggtt *ggtt, int size)
 	struct drm_i915_gem_object *obj;
 	struct i915_vma *vma;
 
-	obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE);
+	obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_VOLATILE |
+					  I915_BO_ALLOC_PM_VOLATILE);
 	if (IS_ERR(obj) && i915_ggtt_has_aperture(ggtt))
 		obj = i915_gem_object_create_stolen(i915, size);
 	if (IS_ERR(obj))



Re: [PATCH] drm/bridge: ti-sn65dsi83: Check link status register after enabling the bridge

2021-09-08 Thread Dave Stevenson
Hi Marek and Andrzej

On Tue, 7 Sept 2021 at 22:24, Marek Vasut  wrote:
>
> On 9/7/21 7:29 PM, Andrzej Hajda wrote:
> >
> > On 07.09.2021 at 16:25, Marek Vasut wrote:
> >> On 9/7/21 9:31 AM, Andrzej Hajda wrote:
> >>> On 07.09.2021 04:39, Marek Vasut wrote:
>  In rare cases, the bridge may not start up correctly, which usually
>  leads to no display output. In case this happens, warn about it in
>  the kernel log.
> 
>  Signed-off-by: Marek Vasut 
>  Cc: Jagan Teki 
>  Cc: Laurent Pinchart 
>  Cc: Linus Walleij 
>  Cc: Robert Foss 
>  Cc: Sam Ravnborg 
>  Cc: dri-devel@lists.freedesktop.org
>  ---
>  NOTE: See the following:
>  https://e2e.ti.com/support/interface-group/interface/f/interface-forum/942005/sn65dsi83-dsi83-lvds-bridge---sporadic-behavior---no-video
> 
>  https://community.nxp.com/t5/i-MX-Processors/i-MX8M-MIPI-DSI-Interface-LVDS-Bridge-Initialization/td-p/1156533
> 
>  ---
>  drivers/gpu/drm/bridge/ti-sn65dsi83.c | 5 +
>  1 file changed, 5 insertions(+)
> 
>  diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi83.c
>  b/drivers/gpu/drm/bridge/ti-sn65dsi83.c
>  index a32f70bc68ea4..4ea71d7f0bfbc 100644
>  --- a/drivers/gpu/drm/bridge/ti-sn65dsi83.c
>  +++ b/drivers/gpu/drm/bridge/ti-sn65dsi83.c
>  @@ -520,6 +520,11 @@ static void sn65dsi83_atomic_enable(struct
>  drm_bridge *bridge,
>  /* Clear all errors that got asserted during initialization. */
>  regmap_read(ctx->regmap, REG_IRQ_STAT, &pval);
>  regmap_write(ctx->regmap, REG_IRQ_STAT, pval);
> >>>
> >>>
> >>> This does not look like correct error handling; maybe it would be good to
> >>> analyze and optionally report 'unexpected' errors here as well.
> >>
> >> The above is correct -- it clears the status register because the
> >> setup might've set random bits in that register. Then we wait a bit,
> >> let the link run, and read them again to get the real link status in
> >> this new piece of code below, hence the usleep_range there. And then
> >> if the link indicates a problem, we know it is a problem.
> >
> >
> > > Usually such registers are cleared at the very beginning of
> > > initialization and then tested (via an IRQ handler or by reading them)
> > > during initialization, to see whether the initialization phase went well.
> > > If that is not the case here, forgive me.
>
> The init just flips bits at random in the IRQ_STAT register, so no,
> that's not really viable here. That's why we clear them at the end, and
> then wait a bit, and then check whether something new appeared in them.
>
> If not, all is great.
>
> Sure, we could generate an IRQ, but the IRQ line is not always
> connected to this chip on all the hardware I have available. So this gives
> the user at least some indication that something is wrong with their HW.
>
>  +
>  +usleep_range(1, 12000);
>  +regmap_read(ctx->regmap, REG_IRQ_STAT, &pval);
>  +if (pval)
>  +dev_err(ctx->dev, "Unexpected link status 0x%02x\n", pval);
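
The clear, wait, re-read sequence discussed above can be sketched in a self-contained way. This is only a model, not the driver code: `struct sketch_bridge` and all `sketch_*` helpers are hypothetical, the status register is assumed to be write-1-to-clear (which is what reading the value and writing it back implies), and the `settle` callback stands in for the `usleep_range()` delay during which real hardware may raise new bits.

```c
#include <stdint.h>

/* Hypothetical model of a write-1-to-clear status register. */
struct sketch_bridge {
	uint8_t irq_stat;
};

static uint8_t sketch_read_stat(const struct sketch_bridge *b)
{
	return b->irq_stat;
}

/* Writing a 1 to a bit clears it; bits written as 0 are left alone. */
static void sketch_write_stat(struct sketch_bridge *b, uint8_t val)
{
	b->irq_stat &= (uint8_t)~val;
}

/* Example settle hook: pretend the link raised bit 0 while we waited. */
static void sketch_settle_faulty(struct sketch_bridge *b)
{
	b->irq_stat |= 0x01;
}

/*
 * Clear whatever initialization left behind, let the link settle, then
 * read again. Any bit set at that point was raised by the running link.
 */
static uint8_t sketch_check_link(struct sketch_bridge *b,
				 void (*settle)(struct sketch_bridge *))
{
	uint8_t pval = sketch_read_stat(b);

	sketch_write_stat(b, pval);	/* drop stale init-time bits */
	if (settle)
		settle(b);		/* hardware may raise new bits here */
	return sketch_read_stat(b);	/* non-zero: bits raised after the clear */
}
```

With a register that starts out full of init-time noise, the function returns 0 when nothing new shows up during the settle period, and returns only the newly raised bits otherwise, which is the value the patch prints.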
> >>>
> >>>
> >>> I am not sure what is the case here but it looks like 'we do not know
> >>> what is going on, so let's add some diagnostic messages to gather info
> >>> and figure it out later'.
> >>
> >> That's pretty much the case, see the two links above in the NOTE
> >> section. If something goes wrong, we print the value for the user
> >> (usually developer) so they can fix their problems. We cannot do much
> >> better in the attach callback.
> >>
> >> The issue I ran into (and where this would be helpful information to
> >> me during debugging, since the issue happened very rarely, see also
> >> the NOTE links above) is that the DSI controller driver started
> >> streaming video on the data lanes before the DSI83 had a chance to
> >> initialize. This worked most of the time, except for a few exceptions
> >> here and there, where the video didn't start. This does set link
> >> status bits consistently. In the meantime, I fixed the controller
> >> driver (so far downstream, due to ongoing discussion).
> >
> >
> > Maybe drm_connector_set_link_status_property(conn,
> > > DRM_MODE_LINK_STATUS_BAD) would be useful here.
>
> Hmm, this works on a connector; the dsi83 is a bridge and it can be stuck
> between two other bridges. That doesn't seem like the right tool, no?
>
> >>> The whole driver lacks an IRQ handler, which IMO could perform better
> >>> diagnosis, and I guess it could also help in recovery, but this is just
> >>> my guess.
> >>> So if this patch is enough for now you can add:
> >>
> >> No, IRQ won't help you here, because by the time you get the IRQ, the
> >> DSI host already started streaming video on data lanes and you won't
> >> be able to correctly reinit the DSI83 unless you communicate to the
> >> DSI host that it should switch the data lanes back to LP11.
> >>
> >> And for that, there is a bigger chunk missing really. What needs to be
> >> added is a way for the DSI bridge / panel to commun

Re: Handling DRM master transitions cooperatively

2021-09-08 Thread Pekka Paalanen
On Wed, 08 Sep 2021 09:51:54 +
Simon Ser  wrote:

> > On Tue, 07 Sep 2021 10:19:03 +
> > Simon Ser  wrote:
> >  
> > > FWIW, I've just hit a case where a compositor leaves a "rotation" KMS
> > > prop set behind, then Xorg tries to start up and fails because it doesn't
> > > reset this prop. So none of this is theoretical.
> > >
> > > I still think a "reset all KMS props to an arbitrary default value" flag
> > > in drmModeAtomicCommit is the best way forward. I'm not sure a user-space
> > > protocol would help too much.  
> >
> > Hi Simon,
> >
> > for the "reset KMS state" problem, sure. Thanks for confirming the
> > problem, too.
> >
> > The hand-off problem does need userspace protocol though, so that the
> > two parties can negotiate what part of KMS state can be inherited by
> > the receiver and who will do the animation from the first to the second
> > state in case you want to avoid abrupt changes. It would also be useful
> > for a cross-fade as a perhaps more flexible way than the current "leak
> > an FB, let the next KMS client scrape it via ioctls and copy it so it
> > can be textured from".  
> 
> The KMS state can be limited to a single FB on the primary plane covering the
> whole CRTC, with no scaling and no properties set other than
> FB_ID/CRTC_*/SRC_*.
> 
> Is it useful to make the previous client perform the animation? I don't really
> understand the use-case here.

I guess the use cases are more or less imaginary for now.

Imagine one HDR-capable display server handing off to another
HDR-capable display server. If the releasing display server does not
know the receiving display server understands HDR, the releasing
display server might run an animation to turn HDR off - fade to black,
for instance, so that the impact from changing from HDR to SDR is
minimized. Then the receiving display server sees KMS is in SDR mode,
and maybe sets up a black image and then switches back to HDR.

If you're happy with fade-to-black on switch, then no problem. However,
the only way to avoid fade-to-black, or even to do a cross-fade, is
some negotiation to see that both sides understand HDR.
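
To illustrate what such a negotiation would have to decide, here is a purely hypothetical sketch. No such userspace protocol exists today; the capability bits, the enum and `sketch_negotiate()` are all made up for this example. The only idea taken from the discussion is that HDR state can be inherited only if both sides advertise that they understand it.

```c
#include <stdint.h>

/* Hypothetical capability bits a display server might advertise. */
#define SKETCH_CAP_HDR        (1u << 0)
#define SKETCH_CAP_CROSS_FADE (1u << 1)

enum sketch_handoff {
	SKETCH_FADE_TO_BLACK,	/* releaser drops to SDR and blanks first */
	SKETCH_KEEP_HDR,	/* HDR state is inherited, hard cut */
	SKETCH_KEEP_HDR_FADE,	/* HDR state is inherited and cross-faded */
};

/*
 * Pick a hand-off strategy from the capabilities both sides share.
 * Anything not understood by both parties cannot be inherited.
 */
static enum sketch_handoff sketch_negotiate(uint32_t releaser_caps,
					    uint32_t receiver_caps)
{
	uint32_t common = releaser_caps & receiver_caps;

	if (!(common & SKETCH_CAP_HDR))
		return SKETCH_FADE_TO_BLACK;
	if (common & SKETCH_CAP_CROSS_FADE)
		return SKETCH_KEEP_HDR_FADE;
	return SKETCH_KEEP_HDR;
}
```

Without any protocol the releasing server has to assume the receiver's capability set is empty, which always lands in the fade-to-black case.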

If the previous FB was rendered for an HDR display, you will need to know
a lot about it if you want to do a cross-fade that doesn't glitch.

Also, while I don't see why changing between SDR and HDR would require
a modeset in KMS, I suppose it might take a moment for the monitor to
adapt. It might cause glitches similar to changing video modes.

> > Userspace protocol is also useful for starting the next KMS client
> > first and handing off only later once it's actually running. I'm not
> > sure if that is already possible with the session switching stuff, but
> > I have a feeling it might be fragile or miss pieces like the next KMS
> > client signalling ready before actually switching to it.  
> 
> Hm, right. I'm not 100% clear if it's possible for the next client to set
> everything up while the VT is not active.
> 
> It would help to make logind/seatd give a non-master DRM FD when VT-switched
> away. Not sure they do it atm.

Oh yeah, that may be an obvious gap I missed.


Thanks,
pq




[drm:i915-uncore-vfunc 30/31] drivers/gpu/drm/i915/selftests/mock_uncore.c:47:2: error: implicit declaration of function 'ASSIGN_RAW_WRITE_MMIO_VFUNCS'

2021-09-08 Thread kernel test robot
tree:   git://people.freedesktop.org/~airlied/linux.git i915-uncore-vfunc
head:   b42168f90718a90b11f2d52306d9aeaa9468
commit: 99aebd17891290abfca80c48eca01f4e02413fb3 [30/31] drm/i915/uncore: constify the register vtables.
config: i386-randconfig-a014-20210908 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 9c476172b93367d2cb88d7d3f4b1b5b456fa6020)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git remote add drm git://people.freedesktop.org/~airlied/linux.git
        git fetch --no-tags drm i915-uncore-vfunc
        git checkout 99aebd17891290abfca80c48eca01f4e02413fb3
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/intel_uncore.c:2630:
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:47:2: error: implicit declaration of function 'ASSIGN_RAW_WRITE_MMIO_VFUNCS' [-Werror,-Wimplicit-function-declaration]
           ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
           ^
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:47:39: error: use of undeclared identifier 'nop'; did you mean 'nopv'?
           ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
                                                ^~~
                                                nopv
   arch/x86/include/asm/hypervisor.h:69:13: note: 'nopv' declared here
   extern bool nopv;
               ^
   In file included from drivers/gpu/drm/i915/intel_uncore.c:2630:
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:48:2: error: implicit declaration of function 'ASSIGN_RAW_READ_MMIO_VFUNCS' [-Werror,-Wimplicit-function-declaration]
           ASSIGN_RAW_READ_MMIO_VFUNCS(uncore, nop);
           ^
   drivers/gpu/drm/i915/selftests/mock_uncore.c:48:2: note: did you mean 'ASSIGN_RAW_WRITE_MMIO_VFUNCS'?
   drivers/gpu/drm/i915/selftests/mock_uncore.c:47:2: note: 'ASSIGN_RAW_WRITE_MMIO_VFUNCS' declared here
           ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
           ^
   drivers/gpu/drm/i915/selftests/mock_uncore.c:48:38: error: use of undeclared identifier 'nop'; did you mean 'nopv'?
           ASSIGN_RAW_READ_MMIO_VFUNCS(uncore, nop);
                                               ^~~
                                               nopv
   arch/x86/include/asm/hypervisor.h:69:13: note: 'nopv' declared here
   extern bool nopv;
               ^
   4 errors generated.


vim +/ASSIGN_RAW_WRITE_MMIO_VFUNCS +47 drivers/gpu/drm/i915/selftests/mock_uncore.c

0757ac8fc7c1da Chris Wilson           2017-04-12  41  
d14a701b007063 Chris Wilson           2019-10-08  42  void mock_uncore_init(struct intel_uncore *uncore,
d14a701b007063 Chris Wilson           2019-10-08  43  		      struct drm_i915_private *i915)
0757ac8fc7c1da Chris Wilson           2017-04-12  44  {
d14a701b007063 Chris Wilson           2019-10-08  45  	intel_uncore_init_early(uncore, i915);
d14a701b007063 Chris Wilson           2019-10-08  46  
ccb2aceaaa5f92 Daniele Ceraolo Spurio 2019-06-19 @47  	ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
ccb2aceaaa5f92 Daniele Ceraolo Spurio 2019-06-19 @48  	ASSIGN_RAW_READ_MMIO_VFUNCS(uncore, nop);

:: The code at line 47 was first introduced by commit
:: ccb2aceaaa5f9267ef7b485b41ae9be3f04b50d3 drm/i915: use vfuncs for reg_read/write_fw_domains

:: TO: Daniele Ceraolo Spurio 
:: CC: Tvrtko Ursulin 

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH v2 1/3] dt-bindings: msm: dsi: Add MSM8953 dsi phy

2021-09-08 Thread Rob Herring
On Fri, Sep 03, 2021 at 10:38:42PM +0530, Sireesh Kodali wrote:
> SoCs based on the MSM8953 platform use the 14nm DSI PHY driver
> 
> Signed-off-by: Sireesh Kodali 
> ---
>  Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml 
> b/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml
> index 72a00cce0147..d2cb19cf71d6 100644
> --- a/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml
> +++ b/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml
> @@ -17,6 +17,8 @@ properties:
>  oneOf:
>- const: qcom,dsi-phy-14nm
>- const: qcom,dsi-phy-14nm-660
> +  - const: qcom,dsi-phy-14nm-8953
> +

This is going to conflict with v5.15-rc1, so you'll need to resend it.

>  
>reg:
>  items:
> -- 
> 2.33.0
> 
> 


[drm:i915-uncore-vfunc 31/31] make[4]: *** No rule to make target 'drivers/gpu/drm/i915/display/intel_display_trace_points.o', needed by 'drivers/gpu/drm/i915/i915.o'.

2021-09-08 Thread kernel test robot
Hi Dave,

First bad commit (maybe != root cause):

tree:   git://people.freedesktop.org/~airlied/linux.git i915-uncore-vfunc
head:   b42168f90718a90b11f2d52306d9aeaa9468
commit: b42168f90718a90b11f2d52306d9aeaa9468 [31/31] RFC: drm/i915: start splitting trace points
config: x86_64-randconfig-a006-20210908 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
        git remote add drm git://people.freedesktop.org/~airlied/linux.git
        git fetch --no-tags drm i915-uncore-vfunc
        git checkout b42168f90718a90b11f2d52306d9aeaa9468
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/i915_irq.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/intel_uncore.o] Error 1
>> make[4]: *** No rule to make target 'drivers/gpu/drm/i915/display/intel_display_trace_points.o', needed by 'drivers/gpu/drm/i915/i915.o'.
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_atomic_plane.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_crtc.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_fifo_underrun.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_frontbuffer.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/display/intel_fbc.o] Error 1
   make[4]: Target '__build' not remade because of errors.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH v2 5/6] drm/i915: Don't back up pinned LMEM context images and rings during suspend

2021-09-08 Thread Thomas Hellström
On Wed, 2021-09-08 at 12:07 +0100, Matthew Auld wrote:
> On 06/09/2021 17:55, Thomas Hellström wrote:
> > Pinned context images are now reset during resume. Don't back them up,
> > and assuming that rings can be assumed empty at suspend, don't back them
> > up either.
> > 
> > Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that an
> > object is allowed to lose its content on suspend.
> > 
> > Signed-off-by: Thomas Hellström 
> > ---
> >   .../gpu/drm/i915/gem/i915_gem_object_types.h    | 17 ++---
> >   drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c  |  3 +++
> >   drivers/gpu/drm/i915/gt/intel_lrc.c |  3 ++-
> >   drivers/gpu/drm/i915/gt/intel_ring.c    |  3 ++-
> >   4 files changed, 17 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > index 734cc8e16481..66123ba46247 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> > @@ -288,16 +288,19 @@ struct drm_i915_gem_object {
> > I915_SELFTEST_DECLARE(struct list_head st_link);
> >   
> > unsigned long flags;
> > -#define I915_BO_ALLOC_CONTIGUOUS BIT(0)
> > -#define I915_BO_ALLOC_VOLATILE   BIT(1)
> > -#define I915_BO_ALLOC_CPU_CLEAR  BIT(2)
> > -#define I915_BO_ALLOC_USER   BIT(3)
> > +#define I915_BO_ALLOC_CONTIGUOUS  BIT(0)
> > +#define I915_BO_ALLOC_VOLATILE    BIT(1)
> > +#define I915_BO_ALLOC_CPU_CLEAR   BIT(2)
> > +#define I915_BO_ALLOC_USER    BIT(3)
> > +/* Object may lose its contents on suspend / resume */
> > +#define I915_BO_ALLOC_PM_VOLATILE BIT(4)

> 
> PM_SKIP_PINNED? Not sure if that is better.

I think we could update the comment to say "object is allowed to
lose..." and keep PM_VOLATILE, for consistency with the ALLOC_VOLATILE
flag?

/Thomas




Re: [PATCH v2 5/6] drm/i915: Don't back up pinned LMEM context images and rings during suspend

2021-09-08 Thread Matthew Auld

On 08/09/2021 13:26, Thomas Hellström wrote:

On Wed, 2021-09-08 at 12:07 +0100, Matthew Auld wrote:

On 06/09/2021 17:55, Thomas Hellström wrote:

Pinned context images are now reset during resume. Don't back them up,
and assuming that rings can be assumed empty at suspend, don't back them
up either.

Introduce a new object flag, I915_BO_ALLOC_PM_VOLATILE meaning that an
object is allowed to lose its content on suspend.

Signed-off-by: Thomas Hellström 
---
   .../gpu/drm/i915/gem/i915_gem_object_types.h    | 17 ++---
   drivers/gpu/drm/i915/gem/i915_gem_ttm_pm.c  |  3 +++
   drivers/gpu/drm/i915/gt/intel_lrc.c |  3 ++-
   drivers/gpu/drm/i915/gt/intel_ring.c    |  3 ++-
   4 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 734cc8e16481..66123ba46247 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -288,16 +288,19 @@ struct drm_i915_gem_object {
 I915_SELFTEST_DECLARE(struct list_head st_link);
   
 unsigned long flags;

-#define I915_BO_ALLOC_CONTIGUOUS BIT(0)
-#define I915_BO_ALLOC_VOLATILE   BIT(1)
-#define I915_BO_ALLOC_CPU_CLEAR  BIT(2)
-#define I915_BO_ALLOC_USER   BIT(3)
+#define I915_BO_ALLOC_CONTIGUOUS  BIT(0)
+#define I915_BO_ALLOC_VOLATILE    BIT(1)
+#define I915_BO_ALLOC_CPU_CLEAR   BIT(2)
+#define I915_BO_ALLOC_USER    BIT(3)
+/* Object may lose its contents on suspend / resume */
+#define I915_BO_ALLOC_PM_VOLATILE BIT(4)




PM_SKIP_PINNED? Not sure if that is better.


I think we could update the comment to say "object is allowed to
lose..." and keep PM_VOLATILE, for consistency with the ALLOC_VOLATILE
flag?


I guess that's the potentially confusing bit. ALLOC_VOLATILE means the
pages might be discarded as soon as they become unpinned, without
needing to worry about persisting their contents. With PM_VOLATILE I was
expecting something similar, where unpinned objects can simply be skipped
or ignored during pm. Anyway, that's just a bikeshed; I think with an
improved comment this should be fine.
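
To make the distinction concrete, here is a minimal, self-contained sketch of how the two flags read. Everything prefixed `sketch_` or `SKETCH_` is hypothetical: it only mirrors the bit positions and the early return that the patch adds to i915_ttm_backup(), it is not the i915 implementation, and the `pinned` field is an assumption made for illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the i915 allocation flags in the patch. */
#define SKETCH_BIT(n)            (1UL << (n))
#define SKETCH_ALLOC_VOLATILE    SKETCH_BIT(1) /* pages may be dropped once unpinned */
#define SKETCH_ALLOC_PM_VOLATILE SKETCH_BIT(4) /* content may be lost across suspend */

/* Hypothetical, heavily simplified object; not struct drm_i915_gem_object. */
struct sketch_obj {
	unsigned long flags;
	bool pinned;
};

/*
 * Decide whether the suspend path should copy this object's content to
 * system memory. As in the hunk for i915_ttm_backup(), the flag alone
 * is enough to skip the backup; the pin state is not consulted here.
 */
static bool sketch_needs_pm_backup(const struct sketch_obj *obj)
{
	if (obj->flags & SKETCH_ALLOC_PM_VOLATILE)
		return false;
	return true;
}

/*
 * ALLOC_VOLATILE, by contrast, only says something about unpinned pages:
 * they may be discarded as soon as the object is no longer pinned.
 */
static bool sketch_may_discard_pages(const struct sketch_obj *obj)
{
	return (obj->flags & SKETCH_ALLOC_VOLATILE) && !obj->pinned;
}
```

In this reading a pinned ring carrying both flags keeps its pages while it is pinned, yet is still skipped by the suspend backup, which is where the two meanings of "volatile" diverge.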




/Thomas




Enabling TTM kerneldoc

2021-09-08 Thread Christian König
Last round for this set I think, already got RBs for most patches.

Only patch #2 is currently missing anything.

Please point out anything which can be quickly improved and keep in mind
that it's better to have this enabled with some typos than not enabled
at all.

Cheers,
Christian.




[PATCH 1/8] drm/ttm: remove the outdated kerneldoc section

2021-09-08 Thread Christian König
Clean up to start over with new and more accurate documentation.

Signed-off-by: Christian König 
Reviewed-by: Matthew Auld 
---
 Documentation/gpu/drm-mm.rst | 49 
 1 file changed, 49 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 0198fa43d254..8ca981065e1a 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -30,55 +30,6 @@ The Translation Table Manager (TTM)
 
 TTM design background and information belongs here.
 
-TTM initialization
-------------------
-
-**Warning**
-This section is outdated.
-
-Drivers wishing to support TTM must pass a filled :c:type:`ttm_bo_driver
-` structure to ttm_bo_device_init, together with an
-initialized global reference to the memory manager.  The ttm_bo_driver
-structure contains several fields with function pointers for
-initializing the TTM, allocating and freeing memory, waiting for command
-completion and fence synchronization, and memory migration.
-
-The :c:type:`struct drm_global_reference ` is made
-up of several fields:
-
-.. code-block:: c
-
-  struct drm_global_reference {
-  enum ttm_global_types global_type;
-  size_t size;
-  void *object;
-  int (*init) (struct drm_global_reference *);
-  void (*release) (struct drm_global_reference *);
-  };
-
-
-There should be one global reference structure for your memory manager
-as a whole, and there will be others for each object created by the
-memory manager at runtime. Your global TTM should have a type of
-TTM_GLOBAL_TTM_MEM. The size field for the global object should be
-sizeof(struct ttm_mem_global), and the init and release hooks should
-point at your driver-specific init and release routines, which probably
-eventually call ttm_mem_global_init and ttm_mem_global_release,
-respectively.
-
-Once your global TTM accounting structure is set up and initialized by
-calling ttm_global_item_ref() on it, you need to create a buffer
-object TTM to provide a pool for buffer object allocation by clients and
-the kernel itself. The type of this object should be
-TTM_GLOBAL_TTM_BO, and its size should be sizeof(struct
-ttm_bo_global). Again, driver-specific init and release functions may
-be provided, likely eventually calling ttm_bo_global_ref_init() and
-ttm_bo_global_ref_release(), respectively. Also, like the previous
-object, ttm_global_item_ref() is used to create an initial reference
-count for the TTM, which will call your initialization function.
-
-See the radeon_ttm.c file for an example of usage.
-
 The Graphics Execution Manager (GEM)
 ====================================
 
-- 
2.25.1



[PATCH 2/8] drm/ttm: add some general module kerneldoc

2021-09-08 Thread Christian König
For now just a brief description of what TTM is all about.

Signed-off-by: Christian König 
---
 Documentation/gpu/drm-mm.rst |  3 ++-
 drivers/gpu/drm/ttm/ttm_module.c | 12 
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 8ca981065e1a..6b7717af4f88 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -28,7 +28,8 @@ UMA devices.
 The Translation Table Manager (TTM)
 ===================================
 
-TTM design background and information belongs here.
+.. kernel-doc:: drivers/gpu/drm/ttm/ttm_module.c
+   :doc: TTM
 
 The Graphics Execution Manager (GEM)
 ====================================
diff --git a/drivers/gpu/drm/ttm/ttm_module.c b/drivers/gpu/drm/ttm/ttm_module.c
index 997c458f68a9..6c19290f7ea9 100644
--- a/drivers/gpu/drm/ttm/ttm_module.c
+++ b/drivers/gpu/drm/ttm/ttm_module.c
@@ -39,6 +39,18 @@
 
 #include "ttm_module.h"
 
+/**
+ * DOC: TTM
+ *
+ * TTM is a memory manager for accelerator devices with dedicated memory.
+ *
+ * The basic idea is that resources are grouped together in buffer objects of
+ * certain size and TTM handles lifetime, movement and CPU mappings of those
+ * objects.
+ *
+ * TODO: Add more design background and information here.
+ */
+
 /**
  * ttm_prot_from_caching - Modify the page protection according to the
  * ttm cacing mode
-- 
2.25.1



[PATCH 5/8] drm/ttm: enable TTM resource object kerneldoc v2

2021-09-08 Thread Christian König
Fix the last two remaining warnings and finally enable this.

v2: add caching enum link

Signed-off-by: Christian König 
Reviewed-by: Matthew Auld 
Reviewed-by: Alex Deucher 
---
 Documentation/gpu/drm-mm.rst   | 9 +
 include/drm/ttm/ttm_resource.h | 6 ++
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 3da81b7b4e71..66d24b745c62 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -43,6 +43,15 @@ TTM device object reference
 .. kernel-doc:: drivers/gpu/drm/ttm/ttm_device.c
:export:
 
+TTM resource object reference
+-----------------------------
+
+.. kernel-doc:: include/drm/ttm/ttm_resource.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/ttm/ttm_resource.c
+   :export:
+
 The Graphics Execution Manager (GEM)
 ====================================
 
diff --git a/include/drm/ttm/ttm_resource.h b/include/drm/ttm/ttm_resource.h
index 32c5edd9e8b5..5952051091cd 100644
--- a/include/drm/ttm/ttm_resource.h
+++ b/include/drm/ttm/ttm_resource.h
@@ -103,10 +103,7 @@ struct ttm_resource_manager_func {
  * struct ttm_resource_manager
  *
  * @use_type: The memory type is enabled.
- * @flags: TTM_MEMTYPE_XX flags identifying the traits of the memory
- * managed by this memory type.
- * @gpu_offset: If used, the GPU offset of the first managed page of
- * fixed memory or the first managed location in an aperture.
+ * @use_tt: If a TT object should be used for the backing store.
  * @size: Size of the managed region.
  * @func: structure pointer implementing the range manager. See above
  * @move_lock: lock for move fence
@@ -144,6 +141,7 @@ struct ttm_resource_manager {
  * @addr:  mapped virtual address
  * @offset:physical addr
  * @is_iomem:  is this io memory ?
+ * @caching:   See enum ttm_caching
  *
  * Structure indicating the bus placement of an object.
  */
-- 
2.25.1



[PATCH 4/8] drm/ttm: enable TTM device object kerneldoc v2

2021-09-08 Thread Christian König
Fix the remaining warnings, switch to inline structure documentation
and finally enable this.

v2: adjust based on suggestions from Alex
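
For readers unfamiliar with the inline style, here is a small sketch of what "inline structure documentation" means in kernel-doc terms. The struct and its members are hypothetical and are not taken from TTM; only the comment layout is the point.

```c
#include <stddef.h>

/**
 * struct sketch_device - Hypothetical device-specific data.
 *
 * Members are documented inline, next to their declarations, instead of
 * in one long list in this header comment.
 */
struct sketch_device {
	/**
	 * @id: Index of this device. Constant after init.
	 */
	unsigned int id;

	/**
	 * @name: Human-readable name, may be NULL.
	 */
	const char *name;

	/** @bo_count: Number of buffer objects currently allocated. */
	size_t bo_count;
};
```

The benefit is that locking and lifetime notes such as "Constant after init" end up in the generated documentation for the member they apply to, rather than in plain comments that kernel-doc ignores.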

Signed-off-by: Christian König 
Reviewed-by: Matthew Auld 
---
 Documentation/gpu/drm-mm.rst |  9 +
 include/drm/ttm/ttm_device.h | 72 +++-
 2 files changed, 48 insertions(+), 33 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index f22c9f9a2c0e..3da81b7b4e71 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -34,6 +34,15 @@ The Translation Table Manager (TTM)
 .. kernel-doc:: include/drm/ttm/ttm_caching.h
:internal:
 
+TTM device object reference
+---------------------------
+
+.. kernel-doc:: include/drm/ttm/ttm_device.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/ttm/ttm_device.c
+   :export:
+
 The Graphics Execution Manager (GEM)
 ====================================
 
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 07d722950d5b..3cc1d9b76131 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -39,31 +39,23 @@ struct ttm_operation_ctx;
 
 /**
  * struct ttm_global - Buffer object driver global data.
- *
- * @dummy_read_page: Pointer to a dummy page used for mapping requests
- * of unpopulated pages.
- * @shrink: A shrink callback object used for buffer object swap.
- * @device_list_mutex: Mutex protecting the device list.
- * This mutex is held while traversing the device list for pm options.
- * @lru_lock: Spinlock protecting the bo subsystem lru lists.
- * @device_list: List of buffer object devices.
- * @swap_lru: Lru list of buffer objects used for swapping.
  */
 extern struct ttm_global {
 
/**
-* Constant after init.
+* @dummy_read_page: Pointer to a dummy page used for mapping requests
+* of unpopulated pages. Constant after init.
 */
-
struct page *dummy_read_page;
 
/**
-* Protected by ttm_global_mutex.
+* @device_list: List of buffer object devices. Protected by
+* ttm_global_mutex.
 */
struct list_head device_list;
 
/**
-* Internal protection.
+* @bo_count: Number of buffer objects allocated by devices.
 */
atomic_t bo_count;
 } ttm_glob;
@@ -230,50 +222,64 @@ struct ttm_device_funcs {
 
 /**
  * struct ttm_device - Buffer object driver device-specific data.
- *
- * @device_list: Our entry in the global device list.
- * @funcs: Function table for the device.
- * @sysman: Resource manager for the system domain.
- * @man_drv: An array of resource_managers.
- * @vma_manager: Address space manager.
- * @pool: page pool for the device.
- * @dev_mapping: A pointer to the struct address_space representing the
- * device address space.
- * @wq: Work queue structure for the delayed delete workqueue.
  */
 struct ttm_device {
-   /*
+   /**
+* @device_list: Our entry in the global device list.
 * Constant after bo device init
 */
struct list_head device_list;
+
+   /**
+* @funcs: Function table for the device.
+* Constant after bo device init
+*/
struct ttm_device_funcs *funcs;
 
-   /*
+   /**
+* @sysman: Resource manager for the system domain.
 * Access via ttm_manager_type.
 */
struct ttm_resource_manager sysman;
+
+   /**
+* @man_drv: An array of resource_managers, one per resource type.
+*/
struct ttm_resource_manager *man_drv[TTM_NUM_MEM_TYPES];
 
-   /*
-* Protected by internal locks.
+   /**
+* @vma_manager: Address space manager for finding BOs to mmap.
 */
struct drm_vma_offset_manager *vma_manager;
+
+   /**
+* @pool: page pool for the device.
+*/
struct ttm_pool pool;
 
-   /*
-* Protection for the per manager LRU and ddestroy lists.
+   /**
+* @lru_lock: Protection for the per manager LRU and ddestroy lists.
 */
spinlock_t lru_lock;
+
+   /**
+* @ddestroy: Destroyed but not yet cleaned up buffer objects.
+*/
struct list_head ddestroy;
+
+   /**
+* @pinned: Buffer objects which are pinned and so not on any LRU list.
+*/
struct list_head pinned;
 
-   /*
-* Protected by load / firstopen / lastclose /unload sync.
+   /**
+* @dev_mapping: A pointer to the struct address_space for invalidating
+* CPU mappings on buffer move. Protected by load/unload sync.
 */
struct address_space *dev_mapping;
 
-   /*
-* Internal protection.
+   /**
+* @wq: Work queue structure for the delayed delete workqueue.
 */
struct delayed_work wq;
 };
-- 
2.25.1



[PATCH 6/8] drm/ttm: enable TTM placement kerneldoc

2021-09-08 Thread Christian König
Fix the last remaining warning and finally enable this.

Signed-off-by: Christian König 
Reviewed-by: Matthew Auld 
Reviewed-by: Alex Deucher 
---
 Documentation/gpu/drm-mm.rst| 6 ++
 include/drm/ttm/ttm_placement.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 66d24b745c62..1c9930fb5e7d 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -43,6 +43,12 @@ TTM device object reference
 .. kernel-doc:: drivers/gpu/drm/ttm/ttm_device.c
:export:
 
+TTM resource placement reference
+--------------------------------
+
+.. kernel-doc:: include/drm/ttm/ttm_placement.h
+   :internal:
+
 TTM resource object reference
 -----------------------------
 
diff --git a/include/drm/ttm/ttm_placement.h b/include/drm/ttm/ttm_placement.h
index 8995c9e4ec1b..76d1b9119a2b 100644
--- a/include/drm/ttm/ttm_placement.h
+++ b/include/drm/ttm/ttm_placement.h
@@ -58,6 +58,7 @@
  *
  * @fpfn:  first valid page frame number to put the object
  * @lpfn:  last valid page frame number to put the object
+ * @mem_type:  One of TTM_PL_* where the resource should be allocated from.
  * @flags: memory domain and caching flags for the object
  *
  * Structure indicating a possible place to put an object.
-- 
2.25.1



[PATCH 7/8] drm/ttm: enable TTM TT object kerneldoc v2

2021-09-08 Thread Christian König
Fix the remaining warnings and finally enable this.

v2: add caching enum link

Signed-off-by: Christian König 
Reviewed-by: Matthew Auld 
Reviewed-by: Alex Deucher 
---
 Documentation/gpu/drm-mm.rst |  9 +
 include/drm/ttm/ttm_tt.h | 11 ---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 1c9930fb5e7d..69c4a20b95d0 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -58,6 +58,15 @@ TTM resource object reference
 .. kernel-doc:: drivers/gpu/drm/ttm/ttm_resource.c
:export:
 
+TTM TT object reference
+-----------------------
+
+.. kernel-doc:: include/drm/ttm/ttm_tt.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/ttm/ttm_tt.c
+   :export:
+
 The Graphics Execution Manager (GEM)
 
 
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index e402dab1d0f6..b3963ab12e1f 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -54,7 +54,7 @@ struct ttm_operation_ctx;
  * @dma_address: The DMA (bus) addresses of the pages
  * @swap_storage: Pointer to shmem struct file for swap storage.
  * @pages_list: used by some page allocation backend
- * @caching: The current caching state of the pages.
+ * @caching: The current caching state of the pages, see enum ttm_caching.
  *
  * This is a structure holding the pages, caching- and aperture binding
  * status for a buffer object that isn't backed by fixed (VRAM / AGP)
@@ -126,8 +126,9 @@ int ttm_sg_tt_init(struct ttm_tt *ttm_dma, struct 
ttm_buffer_object *bo,
 void ttm_tt_fini(struct ttm_tt *ttm);
 
 /**
- * ttm_ttm_destroy:
+ * ttm_tt_destroy:
  *
+ * @bdev: the ttm_device this object belongs to
  * @ttm: The struct ttm_tt.
  *
  * Unbind, unpopulate and destroy common struct ttm_tt.
@@ -148,15 +149,19 @@ int ttm_tt_swapout(struct ttm_device *bdev, struct ttm_tt 
*ttm,
 /**
  * ttm_tt_populate - allocate pages for a ttm
  *
+ * @bdev: the ttm_device this object belongs to
  * @ttm: Pointer to the ttm_tt structure
+ * @ctx: operation context for populating the tt object.
  *
  * Calls the driver method to allocate pages for a ttm
  */
-int ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm, struct 
ttm_operation_ctx *ctx);
+int ttm_tt_populate(struct ttm_device *bdev, struct ttm_tt *ttm,
+   struct ttm_operation_ctx *ctx);
 
 /**
  * ttm_tt_unpopulate - free pages from a ttm
  *
+ * @bdev: the ttm_device this object belongs to
  * @ttm: Pointer to the ttm_tt structure
  *
  * Calls the driver method to free all pages from a ttm
-- 
2.25.1



[PATCH 3/8] drm/ttm: add kerneldoc for enum ttm_caching

2021-09-08 Thread Christian König
Briefly describe what this is all about.

Signed-off-by: Christian König 
Reviewed-by: Alex Deucher 
---
 Documentation/gpu/drm-mm.rst  |  3 +++
 include/drm/ttm/ttm_caching.h | 17 +
 2 files changed, 20 insertions(+)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 6b7717af4f88..f22c9f9a2c0e 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -31,6 +31,9 @@ The Translation Table Manager (TTM)
 .. kernel-doc:: drivers/gpu/drm/ttm/ttm_module.c
:doc: TTM
 
+.. kernel-doc:: include/drm/ttm/ttm_caching.h
+   :internal:
+
 The Graphics Execution Manager (GEM)
 
 
diff --git a/include/drm/ttm/ttm_caching.h b/include/drm/ttm/ttm_caching.h
index 3c9dd65f5aaf..235a743d90e1 100644
--- a/include/drm/ttm/ttm_caching.h
+++ b/include/drm/ttm/ttm_caching.h
@@ -27,9 +27,26 @@
 
 #define TTM_NUM_CACHING_TYPES  3
 
+/**
+ * enum ttm_caching - CPU caching and BUS snooping behavior.
+ */
 enum ttm_caching {
+   /**
+* @ttm_uncached: Most defensive option for device mappings,
+* don't even allow write combining.
+*/
ttm_uncached,
+
+   /**
+* @ttm_write_combined: Don't cache read accesses, but allow at least
+* writes to be combined.
+*/
ttm_write_combined,
+
+   /**
+* @ttm_cached: Fully cached like normal system memory, requires that
+* devices snoop the CPU cache on accesses.
+*/
ttm_cached
 };
 
-- 
2.25.1



[PATCH 8/8] drm/ttm: enable TTM page pool kerneldoc

2021-09-08 Thread Christian König
Fix the remaining warnings and finally enable this.

Signed-off-by: Christian König 
Reviewed-by: Alex Deucher 
---
 Documentation/gpu/drm-mm.rst | 9 +
 include/drm/ttm/ttm_pool.h   | 5 +++--
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index 69c4a20b95d0..e0538083a2c0 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -67,6 +67,15 @@ TTM TT object reference
 .. kernel-doc:: drivers/gpu/drm/ttm/ttm_tt.c
:export:
 
+TTM page pool reference
+-----------------------
+
+.. kernel-doc:: include/drm/ttm/ttm_pool.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/ttm/ttm_pool.c
+   :export:
+
 The Graphics Execution Manager (GEM)
 
 
diff --git a/include/drm/ttm/ttm_pool.h b/include/drm/ttm/ttm_pool.h
index 4321728bdd11..ef09b23d29e3 100644
--- a/include/drm/ttm/ttm_pool.h
+++ b/include/drm/ttm/ttm_pool.h
@@ -37,7 +37,7 @@ struct ttm_pool;
 struct ttm_operation_ctx;
 
 /**
- * ttm_pool_type - Pool for a certain memory type
+ * struct ttm_pool_type - Pool for a certain memory type
  *
  * @pool: the pool we belong to, might be NULL for the global ones
  * @order: the allocation order our pages have
@@ -58,8 +58,9 @@ struct ttm_pool_type {
 };
 
 /**
- * ttm_pool - Pool for all caching and orders
+ * struct ttm_pool - Pool for all caching and orders
  *
+ * @dev: the device we allocate pages for
  * @use_dma_alloc: if coherent DMA allocations should be used
  * @use_dma32: if GFP_DMA32 should be used
  * @caching: pools for each caching/order
-- 
2.25.1



[drm:i915-vtable-cleanup 12/12] drivers/gpu/drm/i915/display/intel_audio.c:685:13: error: 'ilk_audio_codec_disable' defined but not used

2021-09-08 Thread kernel test robot
tree:   git://people.freedesktop.org/~airlied/linux.git i915-vtable-cleanup
head:   b0d0061aeef594fc572295c0e3c02ba91596cbf6
commit: b0d0061aeef594fc572295c0e3c02ba91596cbf6 [12/12] drm/i915/display: 
constify the audio functions
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
git remote add drm git://people.freedesktop.org/~airlied/linux.git
git fetch --no-tags drm i915-vtable-cleanup
git checkout b0d0061aeef594fc572295c0e3c02ba91596cbf6
# save the attached .config to linux build tree
make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/gpu/drm/i915/display/intel_audio.c: In function 
'intel_audio_codec_enable':
   drivers/gpu/drm/i915/display/intel_audio.c:852:24: error: 
'dev_priv->audio_funcs' is a pointer; did you mean to use '->'?
 852 |   dev_priv->audio_funcs.audio_codec_enable(encoder,
 |^
 |->
   drivers/gpu/drm/i915/display/intel_audio.c: In function 
'intel_audio_codec_disable':
   drivers/gpu/drm/i915/display/intel_audio.c:897:24: error: 
'dev_priv->audio_funcs' is a pointer; did you mean to use '->'?
 897 |   dev_priv->audio_funcs.audio_codec_disable(encoder,
 |^
 |->
   drivers/gpu/drm/i915/display/intel_audio.c: At top level:
   drivers/gpu/drm/i915/display/intel_audio.c:919:46: error: expected '}' 
before ';' token
 919 |  .audio_codec_enable = g4x_audio_codec_enable;
 |  ^
   drivers/gpu/drm/i915/display/intel_audio.c:918:68: note: to match this '{'
 918 | static const struct drm_i915_display_audio_funcs g4x_audio_funcs = {
 |^
   drivers/gpu/drm/i915/display/intel_audio.c:924:46: error: expected '}' 
before ';' token
 924 |  .audio_codec_enable = ilk_audio_codec_enable;
 |  ^
   drivers/gpu/drm/i915/display/intel_audio.c:923:68: note: to match this '{'
 923 | static const struct drm_i915_display_audio_funcs ilk_audio_funcs = {
 |^
   drivers/gpu/drm/i915/display/intel_audio.c:929:46: error: expected '}' 
before ';' token
 929 |  .audio_codec_enable = hsw_audio_codec_enable;
 |  ^
   drivers/gpu/drm/i915/display/intel_audio.c:928:68: note: to match this '{'
 928 | static const struct drm_i915_display_audio_funcs hsw_audio_funcs = {
 |^
>> drivers/gpu/drm/i915/display/intel_audio.c:685:13: error: 
>> 'ilk_audio_codec_disable' defined but not used [-Werror=unused-function]
 685 | static void ilk_audio_codec_disable(struct intel_encoder *encoder,
 | ^~~
>> drivers/gpu/drm/i915/display/intel_audio.c:486:13: error: 
>> 'hsw_audio_codec_disable' defined but not used [-Werror=unused-function]
 486 | static void hsw_audio_codec_disable(struct intel_encoder *encoder,
 | ^~~
>> drivers/gpu/drm/i915/display/intel_audio.c:323:13: error: 
>> 'g4x_audio_codec_disable' defined but not used [-Werror=unused-function]
 323 | static void g4x_audio_codec_disable(struct intel_encoder *encoder,
 | ^~~
   cc1: all warnings being treated as errors
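
The "is a pointer" and "expected '}' before ';'" errors above are typical when a function table is converted into a pointer to a const struct. The following standalone sketch shows the pattern that compiles cleanly. All names (`audio_funcs`, `demo_dev`, `demo_*`) are hypothetical and are not the i915 types. The "defined but not used" errors are probably a knock-on effect: once the initializer fails to parse at the semicolon, the disable callbacks are never referenced.

```c
#include <stddef.h>

struct demo_dev;

/* Hypothetical function table, loosely modelled on a per-platform vtable. */
struct audio_funcs {
	int (*codec_enable)(struct demo_dev *dev);
	int (*codec_disable)(struct demo_dev *dev);
};

struct demo_dev {
	const struct audio_funcs *audio_funcs; /* a pointer, so use '->' */
	int enabled;
};

static int demo_codec_enable(struct demo_dev *dev)
{
	dev->enabled = 1;
	return 0;
}

static int demo_codec_disable(struct demo_dev *dev)
{
	dev->enabled = 0;
	return 0;
}

/* Designated initializers are separated by commas, not semicolons. */
static const struct audio_funcs demo_audio_funcs = {
	.codec_enable = demo_codec_enable,
	.codec_disable = demo_codec_disable,
};

static void demo_dev_init(struct demo_dev *dev)
{
	dev->audio_funcs = &demo_audio_funcs;
	dev->enabled = 0;
}

/* Call through the const table with '->' on both levels. */
static int demo_audio_enable(struct demo_dev *dev)
{
	if (dev->audio_funcs && dev->audio_funcs->codec_enable)
		return dev->audio_funcs->codec_enable(dev);
	return -1;
}

static int demo_audio_disable(struct demo_dev *dev)
{
	if (dev->audio_funcs && dev->audio_funcs->codec_disable)
		return dev->audio_funcs->codec_disable(dev);
	return -1;
}
```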


vim +/ilk_audio_codec_disable +685 drivers/gpu/drm/i915/display/intel_audio.c

12e87f23c6278ed drivers/gpu/drm/i915/intel_audio.c Jani Nikula   
2016-10-10  485  
8ec47de21bfab96 drivers/gpu/drm/i915/intel_audio.c Ville Syrjälä 
2017-10-30 @486  static void hsw_audio_codec_disable(struct intel_encoder 
*encoder,
8ec47de21bfab96 drivers/gpu/drm/i915/intel_audio.c Ville Syrjälä 
2017-10-30  487const struct intel_crtc_state 
*old_crtc_state,
8ec47de21bfab96 drivers/gpu/drm/i915/intel_audio.c Ville Syrjälä 
2017-10-30  488const struct drm_connector_state 
*old_conn_state)
69bfe1a9b4dffca drivers/gpu/drm/i915/intel_audio.c Jani Nikula   
2014-10-27  489  {
fac5e23e3c385fd drivers/gpu/drm/i915/intel_audio.c Chris Wilson  
2016-07-04  490struct drm_i915_private *dev_priv = 
to_i915(encoder->base.dev);
3904fb78a80da64 drivers/gpu/drm/i915/intel_audio.c Ville Syrjälä 
2019-04-30  491enum transcoder cpu_transcoder = 
old_crtc_state->cpu_transcoder;
c25004964c5a8a0 drivers/gpu/drm/i915/intel_audio.c Jani Nikula   
2018-06-12  492u32 tmp;
69bfe1a9b4dffca drivers/gpu/drm/i915/intel_audio.c Jani Nikula   
2014-10-27

Re: [PATCH 2/8] drm/ttm: add some general module kerneldoc

2021-09-08 Thread Matthew Auld
On Wed, 8 Sept 2021 at 14:29, Christian König
 wrote:
>
> For now just a brief description of what TTM is all about.
>
> Signed-off-by: Christian König 
Reviewed-by: Matthew Auld 


Re: [PATCH] drm/msm: Disable frequency clamping on a630

2021-09-08 Thread Caleb Connolly




On 08/09/2021 03:21, Bjorn Andersson wrote:

On Mon 09 Aug 10:26 PDT 2021, Akhil P Oommen wrote:


On 8/9/2021 9:48 PM, Caleb Connolly wrote:



On 09/08/2021 17:12, Rob Clark wrote:

On Mon, Aug 9, 2021 at 7:52 AM Akhil P Oommen
 wrote:

[..]

I am a bit confused. We don't define a power domain for the GPU in DT,
correct? Then what exactly does set_opp do here? Do you think this usleep is
what is somehow helping to mask the issue here?

The power domains (for cx and gx) are defined in the GMU DT, the OPPs in
the GPU DT. For the sake of simplicity I'll refer to the lowest
frequency (25700) and OPP level (RPMH_REGULATOR_LEVEL_LOW_SVS) as
the "min" state, and the highest frequency (71000) and OPP level
(RPMH_REGULATOR_LEVEL_TURBO_L1) as the "max" state. These are defined in
sdm845.dtsi under the gpu node.

The new devfreq behaviour unmasks what I think is a driver bug: it
inadvertently puts much more strain on the GPU regulators than they
usually get. With the new behaviour the GPU jumps from its min state to
the max state and back again extremely rapidly under workloads as small
as refreshing the UI. Where previously the GPU would rarely if ever go above
342MHz when interacting with the device, it now jumps between min and
max many times per second.

If my understanding is correct, the current implementation of the GMU
set freq is the following:
   - Get OPP for frequency to set
   - Push the frequency to the GMU - immediately updating the core clock
   - Call dev_pm_opp_set_opp() which triggers a notify chain, this winds
up somewhere in power management code and causes the gx regulator level
to be updated


Nope. dev_pm_opp_set_opp() sets the bandwidth for gpu and nothing else. We
were using a different api earlier which got deprecated -
dev_pm_opp_set_bw().



On the Lenovo Yoga C630 this is reproduced by starting alacritty; if
I'm lucky I manage to hit a few keys before it crashes, so I spent a
few hours looking into this as well...

As you say, dev_pm_opp_set_opp() will only cast an interconnect vote.
The opp-level is just there for show and isn't used by anything, at
least not on 845.

Furthermore, I'm missing something in my tree, so the interconnect
doesn't hit sync_state, and as such we're not actually scaling the
buses. So the problem is not that Linux doesn't turn on the buses in
time.

So I suspect that the "AHB bus error" isn't saying that we turned off
the bus, but rather that the GPU becomes unstable or something of that
sort.


Lastly, I reverted 9bc95570175a ("drm/msm: Devfreq tuning") and ran
Aquarium for 20 minutes without a problem. I then switched the gpu
devfreq governor to "userspace" and ran the following:

while true; do
   echo 25700 > /sys/class/devfreq/500.gpu/userspace/set_freq
   echo 71000 > /sys/class/devfreq/500.gpu/userspace/set_freq
done

It took 19 iterations of this loop to crash the GPU.

So the problem doesn't seem to be Rob's change; it's just that prior to
it the chance of hitting it was much lower. The question is still what it is
that we're triggering.


Do the opp-levels in DTS represent how the hardware behaves? If so, then
it does just appear that whatever is responsible for scaling the GX rail
voltage has no time limits and will attempt to switch the regulator
between min/max voltage as often as we tell it to, which is probably not
something the hardware expected.


Regards,
Bjorn



--
Kind Regards,
Caleb (they/them)


[drm:i915-uncore-vfunc 31/31] drivers/gpu/drm/i915/i915_irq.c:52:10: fatal error: 'display/i915_display_trace.h' file not found

2021-09-08 Thread kernel test robot
tree:   git://people.freedesktop.org/~airlied/linux.git i915-uncore-vfunc
head:   b42168f90718a90b11f2d52306d9aeaa9468
commit: b42168f90718a90b11f2d52306d9aeaa9468 [31/31] RFC: drm/i915: start 
splitting trace points
config: i386-randconfig-a014-20210908 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 
9c476172b93367d2cb88d7d3f4b1b5b456fa6020)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git remote add drm git://people.freedesktop.org/~airlied/linux.git
git fetch --no-tags drm i915-uncore-vfunc
git checkout b42168f90718a90b11f2d52306d9aeaa9468
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir 
ARCH=i386 SHELL=/bin/bash drivers/gpu/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   make[4]: *** [scripts/Makefile.build:271: drivers/gpu/drm/i915/i915_irq.o] 
Error 1
   make[4]: *** [scripts/Makefile.build:271: 
drivers/gpu/drm/i915/intel_uncore.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: 
drivers/gpu/drm/i915/gt/intel_execlists_submission.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: 
drivers/gpu/drm/i915/gt/intel_reset.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: 
drivers/gpu/drm/i915/gem/i915_gem_context.o] Error 1
>> make[4]: *** No rule to make target 
>> 'drivers/gpu/drm/i915/display/intel_display_trace_points.o', needed by 
>> 'drivers/gpu/drm/i915/i915.o'.
   make[4]: *** [scripts/Makefile.build:271: 
drivers/gpu/drm/i915/display/intel_atomic_plane.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: 
drivers/gpu/drm/i915/display/intel_crtc.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: 
drivers/gpu/drm/i915/display/intel_frontbuffer.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: 
drivers/gpu/drm/i915/display/intel_fifo_underrun.o] Error 1
   make[4]: *** [scripts/Makefile.build:271: 
drivers/gpu/drm/i915/display/intel_fbc.o] Error 1
   make[4]: Target '__build' not remade because of errors.
--
>> drivers/gpu/drm/i915/i915_irq.c:52:10: fatal error: 
>> 'display/i915_display_trace.h' file not found
   #include "display/i915_display_trace.h"
^~
   1 error generated.
--
>> drivers/gpu/drm/i915/display/intel_atomic_plane.c:38:10: fatal error: 
>> 'i915_display_trace.h' file not found
   #include "i915_display_trace.h"
^~
   1 error generated.
--
>> drivers/gpu/drm/i915/display/intel_crtc.c:13:10: fatal error: 
>> 'i915_display_trace.h' file not found
   #include "i915_display_trace.h"
^~
   1 error generated.
--
>> drivers/gpu/drm/i915/display/intel_fbc.c:44:10: fatal error: 
>> 'i915_display_trace.h' file not found
   #include "i915_display_trace.h"
^~
   1 error generated.
--
>> drivers/gpu/drm/i915/display/intel_fifo_underrun.c:29:10: fatal error: 
>> 'i915_display_trace.h' file not found
   #include "i915_display_trace.h"
^~
   1 error generated.
--
>> drivers/gpu/drm/i915/display/intel_frontbuffer.c:61:10: fatal error: 
>> 'i915_display_trace.h' file not found
   #include "i915_display_trace.h"
^~
   1 error generated.


vim +52 drivers/gpu/drm/i915/i915_irq.c

49  
50  #include "i915_drv.h"
51  #include "i915_irq.h"
  > 52  #include "display/i915_display_trace.h"
53  #include "intel_pm.h"
54  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




[PATCH] drm/rockchip: Update crtc fixup to account for fractional clk change

2021-09-08 Thread Chris Morgan
From: Chris Morgan 

After commit 928f9e268611 ("clk: fractional-divider: Hide
clk_fractional_divider_ops from wide audience") was merged, the DSI
panel on my Odroid Go Advance stopped working. Upon closer examination,
it looks like the fixup in rockchip_drm_vop.c was causing the issue.
The changes made to the clk driver appear to invalidate some
assumptions made in the fixup.

After debugging the working 5.14 kernel and the no-longer working
5.15 kernel, it looks like the fixup was broken all along but happened
to work anyway, whereas after the fractional clock change it stopped
working entirely (it went from sort-of broken to very broken).

In the 5.14 kernel the dclk_vopb_frac was being requested to be set to
17000999 on my board. The clock driver was taking the value of the
parent clock and dividing it by the requested value
(1700/17000999 = 0), then subtracting 1 from the result (making it -1),
and running that through fls_long to get 64. It would then subtract
the value of fd->mwidth from it to get 48, and then bit shift
17000999 to the left by 48, coming up with a very large number of
7649082492112076800. This resulted in a numerator of 65535 and a
denominator of 1 from the clk driver. The driver seemingly would
try again and get a correct 1:1 value later, and then move on.
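
The arithmetic described above can be reproduced in user space. This is a minimal sketch, not the clk driver code: it assumes a 64-bit unsigned long (modelled with `uint64_t`), `fls64_sketch()` is a stand-in for the kernel's `fls_long()`, and the function names are hypothetical. The parent rate used when exercising it (17000000 Hz) is also an assumption; any parent rate below the requested rate gives the same zero quotient.

```c
#include <stdint.h>

/*
 * Stand-in for fls_long() on a 64-bit machine: 1-based index of the
 * highest set bit, or 0 if no bit is set.
 */
static unsigned int fls64_sketch(uint64_t x)
{
	unsigned int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Scale as described in the text: fls(parent / rate - 1).
 * When parent < rate the quotient is 0 and "0 - 1" wraps around to the
 * all-ones value, so the result is 64.
 */
static unsigned int frac_scale(uint64_t parent_rate, uint64_t rate)
{
	return fls64_sketch(parent_rate / rate - 1);
}

/*
 * Shift the requested rate left by (scale - mwidth) when scale exceeds
 * mwidth. Assumes mwidth >= 1 so the shift count stays below 64.
 */
static uint64_t frac_scaled_rate(uint64_t parent_rate, uint64_t rate,
				 unsigned int mwidth)
{
	unsigned int scale = frac_scale(parent_rate, rate);

	if (scale > mwidth)
		rate <<= scale - mwidth;
	return rate;
}
```

With a requested rate of 17000999, an mwidth of 16 and the assumed parent rate, `frac_scale()` gives 64 and `frac_scaled_rate()` gives 7649082492112076800, matching the first log line below. When the requested rate equals the parent rate the scale is 0 and the rate is left untouched.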

Output from my 5.14 kernel (with some printfs for good measure):
[2.830066] rockchip-drm display-subsystem: bound ff46.vop (ops 
vop_component_ops)
[2.839431] rockchip-drm display-subsystem: bound ff45.dsi (ops 
dw_mipi_dsi_rockchip_ops)
[2.855980] Clock is dclk_vopb_frac
[2.856004] Scale 64, Rate 7649082492112076800, Oldrate 17000999, Parent 
Rate 1700, Best Numerator 65535, Best Denominator 1, fd->mwidth 16
[2.903529] Clock is dclk_vopb_frac
[2.903556] Scale 0, Rate 1700, Oldrate 1700, Parent Rate 1700, 
Best Numerator 1, Best Denominator 1, fd->mwidth 16
[2.903579] Clock is dclk_vopb_frac
[2.903583] Scale 0, Rate 1700, Oldrate 1700, Parent Rate 1700, 
Best Numerator 1, Best Denominator 1, fd->mwidth 16

Contrast this with 5.15 after the clk change, where the rate of 17000999
was getting passed and resulted in numerators/denominators of
17001/17000.

Output from my 5.15 kernel (with some printfs added for good measure):
[2.817571] rockchip-drm display-subsystem: bound ff46.vop (ops 
vop_component_ops)
[2.826975] rockchip-drm display-subsystem: bound ff45.dsi (ops 
dw_mipi_dsi_rockchip_ops)
[2.843430] Rate 17000999, Parent Rate 1700, Best Numerator 17018, Best 
Denominator 17017
[2.891073] Rate 17001000, Parent Rate 1700, Best Numerator 17001, Best 
Denominator 17000
[2.891269] Rate 17001000, Parent Rate 1700, Best Numerator 17001, Best 
Denominator 17000
[2.891281] Rate 17001000, Parent Rate 1700, Best Numerator 17001, Best 
Denominator 17000

After tracing through the code it appeared that this function here was
adding 999 to the requested frequency because of how the clk driver
was rounding/accepting those frequencies. I believe that after the changes
made in the commit listed above the assumptions made in this driver
no longer hold. When I remove the + 999 from the driver the DSI
panel begins to work again.
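
The unit mismatch behind the + 999 can be sketched as follows. DRM mode clocks are in kHz while the clock framework works in Hz, so the old fixup asked for the top of the 1 kHz window. The helper names are hypothetical, the 17000 kHz mode clock is inferred from the 17000999 value in the logs, and the round-up conversion back to kHz is an assumption about how the result is fed back into the mode.

```c
#include <stdint.h>

/* Rate requested with the old fixup: the top of the 1 kHz window. */
static uint64_t request_hz_with_fixup(uint32_t mode_clock_khz)
{
	return (uint64_t)mode_clock_khz * 1000 + 999;
}

/* Rate requested once the + 999 is removed. */
static uint64_t request_hz_plain(uint32_t mode_clock_khz)
{
	return (uint64_t)mode_clock_khz * 1000;
}

/* Convert a rate in Hz back to kHz, rounding up (assumed behaviour). */
static uint32_t hz_to_khz_round_up(uint64_t hz)
{
	return (uint32_t)((hz + 999) / 1000);
}
```

If the clock driver hands back 17000999 Hz unchanged, rounding up yields 17001 kHz, which would explain the later requests for 17001000 Hz in the 5.15 log. Without the fixup the request is exactly 17000000 Hz and the round trip is lossless.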

Output from my 5.15 kernel with 999 removed (printfs added):
[2.852054] rockchip-drm display-subsystem: bound ff46.vop (ops 
vop_component_ops)
[2.864483] rockchip-drm display-subsystem: bound ff45.dsi (ops 
dw_mipi_dsi_rockchip_ops)
[2.880869] Clock is dclk_vopb_frac
[2.880892] Rate 1700, Parent Rate 1700, Best Numerator 1, Best 
Denominator 1
[2.928521] Clock is dclk_vopb_frac
[2.928551] Rate 1700, Parent Rate 1700, Best Numerator 1, Best 
Denominator 1
[2.928570] Clock is dclk_vopb_frac
[2.928574] Rate 1700, Parent Rate 1700, Best Numerator 1, Best 
Denominator 1

I have tested the change extensively on my Odroid Go Advance (Rockchip
RK3326) and it appears to work well. However, this change will affect
all Rockchip SoCs that use this driver so I believe further testing
is warranted. Please note that without this change I can confirm
at least all PX30s with DSI panels will stop working with the 5.15
kernel.

Signed-off-by: Chris Morgan 
---
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 21 +++--
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c 
b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index ba9e14da41b4..bfef4f52dce6 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -1169,31 +1169,16 @@ static bool vop_crtc_mode_fixup(struct drm_crtc *crtc,
 *
 * - DRM works in in kHz.
 * - Clock framework works in Hz.
-* - Rockchip's clock driver picks the clock rate that is the
-*   same _OR LOWER_ than the one requested.
 *
  

Re: [PATCH 4/8] drm/i915/xehp: CCS should use RCS setup functions

2021-09-08 Thread Tvrtko Ursulin



On 08/09/2021 11:13, Tvrtko Ursulin wrote:


On 07/09/2021 18:19, Matt Roper wrote:

The compute engine handles the same commands the render engine can
(except 3D pipeline), so it makes sense that CCS is more similar to RCS
than non-render engines.

The CCS context state (lrc) is also similar to the render one, so reuse
it. Note that the compute engine has its own CTX_R_PWR_CLK_STATE
register.

In order to avoid having multiple RCS && CCS checks, add the following
engine flag:
  - I915_ENGINE_HAS_RCS_REG_STATE - use the render (larger) reg state 
ctx.
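
The idea of replacing repeated class comparisons with a capability flag can be sketched in isolation. Everything prefixed `demo_` or `DEMO_` below is hypothetical and only mirrors the shape of the change; it is not the i915 code.

```c
#include <stdbool.h>

#define DEMO_BIT(n)			(1u << (n))
#define DEMO_ENGINE_HAS_RCS_REG_STATE	DEMO_BIT(9)

enum demo_engine_class {
	DEMO_RENDER_CLASS,
	DEMO_VIDEO_CLASS,
	DEMO_COPY_CLASS,
	DEMO_COMPUTE_CLASS,
};

struct demo_engine {
	enum demo_engine_class engine_class;
	unsigned int flags;
};

/* Decide once, at setup time, which engines share the render state layout. */
static void demo_engine_setup(struct demo_engine *engine,
			      enum demo_engine_class engine_class)
{
	engine->engine_class = engine_class;
	engine->flags = 0;

	if (engine_class == DEMO_RENDER_CLASS ||
	    engine_class == DEMO_COMPUTE_CLASS)
		engine->flags |= DEMO_ENGINE_HAS_RCS_REG_STATE;
}

/* Callers test the flag instead of repeating the class comparison. */
static bool demo_engine_uses_rcs_state(const struct demo_engine *engine)
{
	return (engine->flags & DEMO_ENGINE_HAS_RCS_REG_STATE) != 0;
}
```

With this shape, adding another engine class that shares the render register state means touching only the setup function, not every call site.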


BSpec: 46260
Original-patch-by: Michel Thierry
Cc: Tvrtko Ursulin 
Cc: Daniele Ceraolo Spurio 
Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Matt Roper 
---
  drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c | 8 +---
  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 6 ++
  drivers/gpu/drm/i915/gt/intel_engine_types.h  | 1 +
  drivers/gpu/drm/i915/gt/intel_execlists_submission.c  | 2 +-
  drivers/gpu/drm/i915/gt/intel_lrc.c   | 4 ++--
  drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 2 +-
  drivers/gpu/drm/i915/i915_perf.c  | 4 ++--
  drivers/gpu/drm/i915/i915_reg.h   | 2 +-
  8 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c

index b32f7fed2d9c..fbe10783628b 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c
@@ -883,7 +883,9 @@ static int igt_shared_ctx_exec(void *arg)
  return err;
  }
-static int rpcs_query_batch(struct drm_i915_gem_object *rpcs, struct 
i915_vma *vma)

+static int rpcs_query_batch(struct drm_i915_gem_object *rpcs,
+    struct i915_vma *vma,
+    struct intel_engine_cs *engine)
  {
  u32 *cmd;
@@ -894,7 +896,7 @@ static int rpcs_query_batch(struct 
drm_i915_gem_object *rpcs, struct i915_vma *v

  return PTR_ERR(cmd);
  *cmd++ = MI_STORE_REGISTER_MEM_GEN8;
-    *cmd++ = i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
+    *cmd++ = 
i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE(engine->mmio_base));

  *cmd++ = lower_32_bits(vma->node.start);
  *cmd++ = upper_32_bits(vma->node.start);
  *cmd = MI_BATCH_BUFFER_END;
@@ -955,7 +957,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj,
  if (err)
  goto err_vma;
-    err = rpcs_query_batch(rpcs, vma);
+    err = rpcs_query_batch(rpcs, vma, ce->engine);
  if (err)
  goto err_batch;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
b/drivers/gpu/drm/i915/gt/intel_engine_cs.c

index 69944bd8c19d..b346b946602d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -205,6 +205,8 @@ u32 intel_engine_context_size(struct intel_gt *gt, 
u8 class)

  BUILD_BUG_ON(I915_GTT_PAGE_SIZE != PAGE_SIZE);
  switch (class) {
+    case COMPUTE_CLASS:
+    fallthrough;
  case RENDER_CLASS:
  switch (GRAPHICS_VER(gt->i915)) {
  default:
@@ -379,6 +381,10 @@ static int intel_engine_setup(struct intel_gt 
*gt, enum intel_engine_id id)

  if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
  engine->props.preempt_timeout_ms = 0;
+    /* features common between engines sharing EUs */
+    if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS)
+    engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE;
+
  engine->defaults = engine->props; /* never to change again */
  engine->context_size = intel_engine_context_size(gt, 
engine->class);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h

index dcb9d8b2362a..30a0c69c36c8 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -454,6 +454,7 @@ struct intel_engine_cs {
  #define I915_ENGINE_HAS_RELATIVE_MMIO BIT(6)
  #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
  #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
+#define I915_ENGINE_HAS_RCS_REG_STATE  BIT(9)
  unsigned int flags;
  /*
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c

index de5f9c86b9a4..4c600c46414d 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -3406,7 +3406,7 @@ int intel_execlists_submission_setup(struct 
intel_engine_cs *engine)

  logical_ring_default_vfuncs(engine);
  logical_ring_default_irqs(engine);
-    if (engine->class == RENDER_CLASS)
+    if (engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)
  rcs_submission_override(engine);


Hm, what do pipe control flushes which relate to 3d pipeline end up 
doing on CCS engines?


Right, answer found in the following patch.

Ideally the two would swap places in the series so by 

Re: [PATCH 6/8] drm/i915/xehp: Define context scheduling attributes in lrc descriptor

2021-09-08 Thread Tvrtko Ursulin



On 07/09/2021 18:19, Matt Roper wrote:

In Dual Context mode the EUs are shared between render and compute
command streamers. The hardware provides a field in the lrc descriptor
to indicate the prioritization of the thread dispatch associated to the
corresponding context.

The context priority is set to 'low' at creation time and relies on the
existing context priority to set it to low/normal/high.

HSDES: 1604462009
Bspec: 46145, 46260
Original-patch-by: Michel Thierry
Cc: Tvrtko Ursulin 
Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Prasad Nallani 
Signed-off-by: Matt Roper 
---
  drivers/gpu/drm/i915/gt/intel_engine_cs.c|  4 +++-
  drivers/gpu/drm/i915/gt/intel_engine_types.h |  1 +
  drivers/gpu/drm/i915/gt/intel_execlists_submission.c |  6 +-
  drivers/gpu/drm/i915/gt/intel_lrc.h  | 10 ++
  drivers/gpu/drm/i915/i915_reg.h  |  4 
  5 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index b346b946602d..2f719f0ecac3 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -382,8 +382,10 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
engine->props.preempt_timeout_ms = 0;
  
  	/* features common between engines sharing EUs */

-   if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS)
+   if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) {
engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE;
+   engine->flags |= I915_ENGINE_HAS_EU_PRIORITY;
+   }
  
  	engine->defaults = engine->props; /* never to change again */
  
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h

index 30a0c69c36c8..00bf0296b28a 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -455,6 +455,7 @@ struct intel_engine_cs {
  #define I915_ENGINE_REQUIRES_CMD_PARSER BIT(7)
  #define I915_ENGINE_WANT_FORCED_PREEMPTION BIT(8)
  #define I915_ENGINE_HAS_RCS_REG_STATE  BIT(9)
+#define I915_ENGINE_HAS_EU_PRIORITY    BIT(10)
unsigned int flags;
  
  	/*

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 4c600c46414d..2b36ec7f3a04 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -662,9 +662,13 @@ static inline void execlists_schedule_out(struct i915_request *rq)
  static u64 execlists_update_context(struct i915_request *rq)
  {
struct intel_context *ce = rq->context;
-   u64 desc = ce->lrc.desc;
+   u64 desc;
u32 tail, prev;
  
+	desc = ce->lrc.desc;

+   if (rq->engine->flags & I915_ENGINE_HAS_EU_PRIORITY)
+   desc |= lrc_desc_priority(rq_prio(rq));
+
/*
 * WaIdleLiteRestore:bdw,skl
 *
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index 7f697845c4cf..d3f2096b3d51 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -79,4 +79,14 @@ static inline u32 lrc_get_runtime(const struct intel_context *ce)
return READ_ONCE(ce->lrc_reg_state[CTX_TIMESTAMP]);
  }
  
+static inline u32 lrc_desc_priority(int prio)

+{
+   if (prio > I915_PRIORITY_NORMAL)
+   return GEN12_CTX_PRIORITY_HIGH;
+   else if (prio < I915_PRIORITY_NORMAL)
+   return GEN12_CTX_PRIORITY_LOW;
+   else
+   return GEN12_CTX_PRIORITY_NORMAL;
+}
+
  #endif /* __INTEL_LRC_H__ */
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 0bb185ce9529..5b68c02c35af 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -4212,6 +4212,10 @@ enum {
  #define GEN8_CTX_L3LLC_COHERENT (1 << 5)
  #define GEN8_CTX_PRIVILEGE (1 << 8)
  #define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
+#define GEN12_CTX_PRIORITY_MASK REG_GENMASK(10, 9)
+#define GEN12_CTX_PRIORITY_HIGH REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 2)
+#define GEN12_CTX_PRIORITY_NORMAL REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 1)
+#define GEN12_CTX_PRIORITY_LOW REG_FIELD_PREP(GEN12_CTX_PRIORITY_MASK, 0)
  
  #define GEN8_CTX_ID_SHIFT 32

  #define GEN8_CTX_ID_WIDTH 21



I haven't checked the bitfield against bspec, but the mechanics look good.

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko
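
For readers following the mechanics, here is a minimal userspace sketch of the priority mapping in the patch above. It is not the driver code: the macro and function names are local stand-ins for REG_GENMASK()/REG_FIELD_PREP() and lrc_desc_priority(), and PRIORITY_NORMAL being 0 is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for the kernel's REG_GENMASK(10, 9) / REG_FIELD_PREP(). */
#define CTX_PRIORITY_SHIFT    9
#define CTX_PRIORITY_MASK     (UINT32_C(0x3) << CTX_PRIORITY_SHIFT) /* bits 10:9 */
#define CTX_PRIORITY_FIELD(v) \
	(((uint32_t)(v) << CTX_PRIORITY_SHIFT) & CTX_PRIORITY_MASK)

#define CTX_PRIORITY_LOW      CTX_PRIORITY_FIELD(0)
#define CTX_PRIORITY_NORMAL   CTX_PRIORITY_FIELD(1)
#define CTX_PRIORITY_HIGH     CTX_PRIORITY_FIELD(2)

/* Assumption for this sketch: the "normal" scheduler priority is 0. */
#define PRIORITY_NORMAL 0

/* Collapse an arbitrary scheduler priority into the three hardware levels. */
static uint32_t desc_priority(int prio)
{
	if (prio > PRIORITY_NORMAL)
		return CTX_PRIORITY_HIGH;
	if (prio < PRIORITY_NORMAL)
		return CTX_PRIORITY_LOW;
	return CTX_PRIORITY_NORMAL;
}

/*
 * OR the priority into a descriptor. Because LOW encodes as 0, a plain OR
 * only works if bits 10:9 of the incoming descriptor are already clear.
 */
static uint64_t desc_with_priority(uint64_t desc, int prio)
{
	assert((desc & CTX_PRIORITY_MASK) == 0);
	return desc | desc_priority(prio);
}
```

The assert documents the one precondition the OR relies on: the stored descriptor must not already carry a priority, otherwise a later "low" could never clear an earlier "high".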



Re: [PATCH 7/8] drm/i915/xehp: Enable ccs/dual-ctx in RCU_MODE

2021-09-08 Thread Tvrtko Ursulin



On 07/09/2021 18:19, Matt Roper wrote:

We have to specify in the Render Control Unit Mode register
when CCS is enabled.

Bspec: 46034
Original-patch-by: Michel Thierry
Cc: Daniele Ceraolo Spurio 
Cc: Tvrtko Ursulin 
Cc: Vinay Belgaumkar 
Signed-off-by: Daniele Ceraolo Spurio 
Signed-off-by: Aravind Iddamsetty 
Signed-off-by: Matt Roper 
---
  .../drm/i915/gt/intel_execlists_submission.c  | 26 +++
  .../gpu/drm/i915/gt/uc/intel_guc_submission.c | 26 +++
  drivers/gpu/drm/i915/i915_reg.h   |  3 +++
  3 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index 2b36ec7f3a04..046f7da67ba6 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -2874,6 +2874,29 @@ static int execlists_resume(struct intel_engine_cs *engine)
return 0;
  }
  
+static int gen12_rcs_resume(struct intel_engine_cs *engine)

+{
+   int ret;
+
+   ret = execlists_resume(engine);
+   if (ret)
+   return ret;
+
+   /*
+* Multi Context programming.
+* just need to program this register once no matter how many CCS


Just


+* engines there are. Since some of the CCS engines might be fused off,
+* we can't do this as part of the init of a specific CCS and we do
+* it during RCS init instead. RCS and all CCS engines are reset


I don't really understand the "can't" part - clearly it would be doable 
if a specific vfunc was assigned to one ccs only, the one which is 
present of course. Not saying that would be nicer since I think it has 
its own downsides.


Perhaps the nicest solution is to add an engine flag saying "enables rcu" 
and then have the execlists and guc resume paths check that and do the programming?


No strong opinion yet, just discussing.


+* together, so post-reset re-init is covered as well.
+*/
+   if (CCS_MASK(engine->gt))
+   intel_uncore_write(engine->uncore, GEN12_RCU_MODE,
+  _MASKED_BIT_ENABLE(GEN12_RCU_MODE_CCS_ENABLE));
+
+   return 0;
+}
+
  static void execlists_reset_prepare(struct intel_engine_cs *engine)
  {
ENGINE_TRACE(engine, "depth<-%d\n",
@@ -3394,6 +3417,9 @@ static void rcs_submission_override(struct intel_engine_cs *engine)
engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
break;
}
+
+   if (engine->class == RENDER_CLASS)
+   engine->resume = gen12_rcs_resume;
  }
  
  int intel_execlists_submission_setup(struct intel_engine_cs *engine)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 2f5bf7aa7e3b..db956255d076 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -2350,6 +2350,29 @@ static bool guc_sched_engine_disabled(struct i915_sched_engine *sched_engine)
return !sched_engine->tasklet.callback;
  }
  
+static int gen12_rcs_resume(struct intel_engine_cs *engine)

+{
+   int ret;
+
+   ret = guc_resume(engine);
+   if (ret)
+   return ret;
+
+   /*
+* Multi Context programming.
+* just need to program this register once no matter how many CCS
+* engines there are. Since some of the CCS engines might be fused off,
+* we can't do this as part of the init of a specific CCS and we do
+* it during RCS init instead. RCS and all CCS engines are reset
+* together, so post-reset re-init is covered as well.
+*/
+   if (CCS_MASK(engine->gt))
+   intel_uncore_write(engine->uncore, GEN12_RCU_MODE,
+  _MASKED_BIT_ENABLE(GEN12_RCU_MODE_CCS_ENABLE));


Duplicating the write from gen12_rcs_resume would be passable on its own, 
but duplicating the whole comment as well, hmm.. How about adding a helper 
which both would call? Like intel_engine_enable_rcu_mode() or something?


Regards,

Tvrtko
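
A rough userspace sketch of what such a shared helper could look like, assuming the masked-register convention used by _MASKED_BIT_ENABLE() (upper 16 bits select which of the lower 16 bits get written). Everything here is a mock: struct fake_gt, enable_rcu_mode() and the RCU_MODE_CCS_ENABLE value are hypothetical placeholders, since the real register definition is in the truncated i915_reg.h hunk.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder bit; the real value lives in i915_reg.h. */
#define RCU_MODE_CCS_ENABLE (UINT32_C(1) << 0)

/* Masked-register convention: upper 16 bits flag which lower bits to write. */
#define MASKED_BIT_ENABLE(bit) (((uint32_t)(bit) << 16) | (uint32_t)(bit))

struct fake_gt {
	uint32_t ccs_mask; /* CCS engines that are not fused off */
	uint32_t rcu_mode; /* stand-in for the RCU_MODE register */
	int writes;        /* number of register writes, for inspection */
};

/* Apply a masked write: only bits flagged in the upper half are updated. */
static void masked_write(uint32_t *reg, uint32_t val)
{
	uint32_t mask = val >> 16;

	assert(reg);
	*reg = (*reg & ~mask) | (val & mask);
}

/* One helper carrying the programming (and the long comment) in one place. */
static void enable_rcu_mode(struct fake_gt *gt)
{
	if (!gt->ccs_mask)
		return; /* no compute engines present, nothing to program */

	masked_write(&gt->rcu_mode, MASKED_BIT_ENABLE(RCU_MODE_CCS_ENABLE));
	gt->writes++;
}

/* Both backends' RCS resume hooks reduce to a call to the helper. */
static int execlists_rcs_resume(struct fake_gt *gt)
{
	enable_rcu_mode(gt);
	return 0;
}

static int guc_rcs_resume(struct fake_gt *gt)
{
	enable_rcu_mode(gt);
	return 0;
}
```

With this shape the explanation of why the register is programmed from the RCS path would only need to be written once, above the helper.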


+
+   return 0;
+}
+
  static void guc_set_default_submission(struct intel_engine_cs *engine)
  {
engine->submit_request = guc_submit_request;
@@ -2464,6 +2487,9 @@ static void rcs_submission_override(struct intel_engine_cs *engine)
engine->emit_fini_breadcrumb = gen8_emit_fini_breadcrumb_rcs;
break;
}
+
+   if (engine->class == RENDER_CLASS)
+   engine->resume = gen12_rcs_resume;
  }
  
  static inline void guc_default_irqs(struct intel_engine_cs *engine)

diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 5b68c02c35af..57f9456f8c61 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -498,6 +498,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
  #define   ECOBITS_PPGTT_CACHE64B  (3 << 8)
  #define   ECOBITS_PPGTT_CACHE4B  

Re: [Intel-gfx] [PATCH 8/8] drm/i915/xehp: Extend uninterruptible OpenCL workloads to CCS

2021-09-08 Thread Tvrtko Ursulin



On 07/09/2021 18:19, Matt Roper wrote:

From: John Harrison 

Now that OpenCL workloads can run on the compute engine, we need to set
preempt_timeout_ms = 0 on the CCS engines too.

Signed-off-by: John Harrison 
Signed-off-by: Matt Roper 
---
  drivers/gpu/drm/i915/gt/intel_engine_cs.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 2f719f0ecac3..7e6ac0ae1f07 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -377,16 +377,17 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
engine->props.timeslice_duration_ms =
CONFIG_DRM_I915_TIMESLICE_DURATION;
  
-	/* Override to uninterruptible for OpenCL workloads. */

-   if (GRAPHICS_VER(i915) == 12 && engine->class == RENDER_CLASS)
-   engine->props.preempt_timeout_ms = 0;
-
/* features common between engines sharing EUs */
if (engine->class == RENDER_CLASS || engine->class == COMPUTE_CLASS) {
engine->flags |= I915_ENGINE_HAS_RCS_REG_STATE;
engine->flags |= I915_ENGINE_HAS_EU_PRIORITY;
}
  
+	/* Override to uninterruptible for OpenCL workloads. */

+   if (GRAPHICS_VER(i915) == 12 &&
+   engine->flags & I915_ENGINE_HAS_RCS_REG_STATE)
+   engine->props.preempt_timeout_ms = 0;
+
engine->defaults = engine->props; /* never to change again */
  
  	engine->context_size = intel_engine_context_size(gt, engine->class);




Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko


Re: [Freedreno] [PATCH 2/3] drm/msm/dpu1: Add MSM8998 to hw catalog

2021-09-08 Thread Jeffrey Hugo
On Wed, Sep 8, 2021 at 2:26 AM Dmitry Baryshkov
 wrote:
>
> Hi,
>
> On Tue, 7 Sept 2021 at 22:13, Jeffrey Hugo  wrote:
> >
> > On Wed, Sep 1, 2021 at 12:11 PM AngeloGioacchino Del Regno
> >  wrote:
> > >
> > > > Bringup functionality for MSM8998 in the DPU driver, which is mostly
> > > the same as SDM845 (just a few variations).
> > >
> > > Signed-off-by: AngeloGioacchino Del Regno 
> > > 
> >
> > I don't seem to see a cover letter for this series.
> >
> > Eh, there are a fair number of differences between the MDSS versions
> > for 8998 and 845.
> >
> > Probably a bigger question, why extend the DPU driver for 8998, when
> > the MDP5 driver already supports it[1]?  The MDP/DPU split is pretty
> > dumb, but I don't see a valid reason for both drivers supporting the
> > same target/display revision.  IMO, if you want this support in DPU,
> > remove it from MDP5.
> >
> > [1] 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.14&id=d6c7b2284b14c66a268a448a7a8d54f585d38785
>
> I don't think that we should enforce such requirements. Having support
> both in MDP5 and DPU would allow one to compare those two drivers,
> performance, features, etc.
> It might be that all MDP5-supported hardware would be also supported
> by DPU, thus allowing us to remove the former driver. But until that
> time I'd suggest leaving support in place.

Well, then you have a host of problems to solve.

Let's ignore the code duplication for a minute and assume we've gone
with this grand experiment.  Two drivers enter, one leaves the victor.

How are the clients supposed to pick which driver to use in the
meantime?  We already have one DT binding for 8998 (which the MDP5 driver
services).  This series proposes a second.  If we go forward with what
you propose, we'll have two bindings for the same hardware, which IMO
doesn't make sense in the context of DT, and the reason for that is to
select which driver is "better".  Driver selection is not supposed to
be tied to DT like this.

So, some boards think MDP5 is better, and some boards think DPU is
better.  At some point, we decide one of the drivers is the clear
winner (let's assume DPU).  Then what happens to the existing DTs that
were using the MDP5 description?  Are they really compatible with DPU?

From a DT perspective, there should be one description, but then how
do you pick which driver to load?  Both can't bind on the single
description, and while you could argue that the users should build one
driver or the other, but not both (thus picking which one at build
time), that doesn't work for distros that want to build both drivers
so that they can support all platforms with a single build (per arch).

From where I sit, your position starts with a good idea, but isn't
fully thought out and leads to problems.

If there is some reason why DPU is better for 8998, please enumerate
it.  Does DPU support some config that MDP5 doesn't, which is valuable
to you?  I'm ok with ripping out the MDP5 support, the reason I didn't
go with DPU was that the DPU driver was clearly written only for 845
at the time, and needed significant rework to "downgrade" to an
earlier hardware.  However, the "reason" DPU exists separate from MDP5
is the claim that the MDP hardware underwent a significant
rearchitecture, and thus it was too cumbersome to extend MDP5.  While
I disagree with the premise, that "rearch" started with 8998.


Re: [PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr

2021-09-08 Thread Andrey Grodzovsky



On 2021-09-08 2:50 a.m., Boris Brezillon wrote:

On Tue, 7 Sep 2021 14:53:58 -0400
Andrey Grodzovsky  wrote:


On 2021-06-29 7:24 a.m., Christian König wrote:


Am 29.06.21 um 13:18 schrieb Boris Brezillon:

Hi Christian,

On Tue, 29 Jun 2021 13:03:58 +0200
Christian König  wrote:
  

Am 29.06.21 um 09:34 schrieb Boris Brezillon:

Mali Midgard/Bifrost GPUs have 3 hardware queues but only a global GPU
reset. This leads to extra complexity when we need to synchronize timeout
works with the reset work. One solution to address that is to have an
ordered workqueue at the driver level that will be used by the different
schedulers to queue their timeout work. Thanks to the serialization
provided by the ordered workqueue we are guaranteed that timeout
handlers are executed sequentially, and can thus easily reset the GPU
from the timeout handler without extra synchronization.

Well, we had already tried this and it didn't work the way it was
expected to.

The major problem is that you not only want to serialize the queue, but
rather have a single reset for all queues.

Otherwise you schedule multiple resets for each hardware queue. E.g. for
your 3 hardware queues you would reset the GPU 3 times if all of them
time out at the same time (which is rather likely).

Using a single delayed work item doesn't work either because you then
only have one timeout.

What could be done is to cancel all delayed work items from all stopped
schedulers.

drm_sched_stop() does that already, and since we call drm_sched_stop()
on all queues in the timeout handler, we end up with only one global
reset happening even if several queues report a timeout at the same
time.

Ah, nice. Yeah, in this case it should indeed work as expected.

Feel free to add an Acked-by: Christian König to it.

Regards,
Christian.
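
The single-reset behaviour described above can be modelled in a few lines of plain C. This is a toy simulation under stated assumptions, not scheduler code: struct fake_sched, sched_stop() and run_ordered_workqueue() are invented names that only mimic the two properties being discussed (handlers run one at a time, and stopping a scheduler cancels its pending timeout work).

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_QUEUES 3

struct fake_sched {
	bool timeout_pending; /* models a queued delayed timeout work item */
};

struct fake_gpu {
	struct fake_sched sched[NUM_QUEUES];
	int resets; /* number of global GPU resets performed */
};

/* Models drm_sched_stop() cancelling the scheduler's pending timeout work. */
static void sched_stop(struct fake_sched *s)
{
	s->timeout_pending = false;
}

/* Timeout handler: stop every queue, then do the one global reset. */
static void timeout_handler(struct fake_gpu *gpu)
{
	assert(gpu);
	for (int i = 0; i < NUM_QUEUES; i++)
		sched_stop(&gpu->sched[i]);
	gpu->resets++;
}

/* Models the ordered workqueue: pending items run strictly one by one. */
static void run_ordered_workqueue(struct fake_gpu *gpu)
{
	for (int i = 0; i < NUM_QUEUES; i++) {
		if (!gpu->sched[i].timeout_pending)
			continue; /* cancelled by an earlier handler */
		gpu->sched[i].timeout_pending = false;
		timeout_handler(gpu);
	}
}

/* All queues report a timeout at the same time; return the reset count. */
static int simulate_simultaneous_timeouts(void)
{
	struct fake_gpu gpu = { .resets = 0 };

	for (int i = 0; i < NUM_QUEUES; i++)
		gpu.sched[i].timeout_pending = true;
	run_ordered_workqueue(&gpu);
	return gpu.resets;
}
```

In this model the first handler to run cancels the other two pending items, so three simultaneous timeouts produce one reset. Without the serialization, the handlers could overlap and the cancellation would race with handlers already running.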


Seems to me that for this to work we need to change cancel_delayed_work
to cancel_delayed_work_sync, so that not only are pending TO handlers
cancelled but any in-progress ones are also waited for, and to prevent
rearming. Also move it right after kthread_park, before we start touching
the pending list.

I'm probably missing something, but I don't really see why this
specific change would require replacing cancel_delayed_work() calls by
the sync variant.



I see, I missed the point that since we now have single-threaded 
processing and only one TDR handler runs at a given time, there is no 
need to wait for other parallel in-flight TDR handlers.




I mean, if there's a situation where we need to wait
for in-flight timeout handler to return, it was already the case before
that patch.



In the amdgpu case we avoided that by doing a trylock on a common lock
and returning early in case it was already taken by another TDR handler.
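
As an illustration of the trylock pattern mentioned here, a minimal single-threaded sketch using a C11 atomic_flag as the common lock. The names (reset_lock, tdr_handler) are made up for the example and this is not the amdgpu code; the nested call merely stands in for a second handler firing while a reset is in progress.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag reset_lock = ATOMIC_FLAG_INIT;
static int resets_done;

/* Returns true if the lock was free and is now held by the caller. */
static bool reset_trylock(void)
{
	return !atomic_flag_test_and_set(&reset_lock);
}

static void reset_unlock(void)
{
	atomic_flag_clear(&reset_lock);
}

/*
 * Returns true if this handler performed the reset, false if it bailed
 * out early because another handler already owns the reset.
 */
static bool tdr_handler(bool simulate_concurrent_timeout)
{
	if (!reset_trylock())
		return false;

	if (simulate_concurrent_timeout) {
		/* A second handler firing now must back off, not wait. */
		bool second_ran = tdr_handler(false);

		assert(!second_ran);
	}

	resets_done++;
	reset_unlock();
	return true;
}
```

The early return is what removes the need to wait for in-flight handlers: a loser of the trylock does no work at all, so there is nothing to synchronize against.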



Note that we need to be careful to not call the sync
variant in helpers that are called from the interrupt handler itself
to avoid deadlocks (i.e. drm_sched_stop()).



I am not clear here - which interrupt handler is drm_sched_stop
called from ? It's called from TDR work as far as I see in the code.

Andrey




Re: [PATCH v5 02/16] drm/sched: Allow using a dedicated workqueue for the timeout/fault tdr

2021-09-08 Thread Boris Brezillon
On Wed, 8 Sep 2021 10:53:21 -0400
Andrey Grodzovsky  wrote:

> > Note that we need to be careful to not call the sync
> > variant in helpers that are called from the interrupt handler itself
> > to avoid deadlocks (i.e. drm_sched_stop()).  
> 
> 
> I am not clear here - which interrupt handler is drm_sched_stop
> called from ? It's called from TDR work as far as I see in the code.

My bad, I meant the timeout handler, not the interrupt handler.



Re: [PATCH] drm/bridge: ti-sn65dsi83: Check link status register after enabling the bridge

2021-09-08 Thread Marek Vasut

On 9/8/21 1:11 PM, Dave Stevenson wrote:

Hi Marek and Andrzej


Hello Dave,

skipping the protocol discussion, which I hope Andrzej will pick up.

[...]


Usually video transmission starts in crtc->enable (CRTC->Encoder), and
in encoder->enable (encoder->bridge), so in bridges->enable it would be
too late for LP11 state - transmission can be already in progress.

It shows well that this order of calls does not fit well to DSI, and
probably many other protocols.

Maybe moving most of the bridge->enable code to bridge->pre_enable would
help, but I am not sure whether it would cause other issues.


Yep, that won't work e.g. with the exynos DSIM, because
exynos_dsi_set_display_mode() sets the data lanes to LP11.


Isn't the bigger question for SN65DSI8[3|4|5] whether the clock lane
is running or not in pre_enable?


I think the bigger question really is -- how do we cater for all the 
different bridges with different init-time requirements.



This is a quick analysis, so please correct me if I am wrong.


I pretty much agree that the current state of things does not fit with
DSI too well.


That was why I was questioning how DSI was meant to be implemented in
https://lore.kernel.org/dri-devel/capy8ntbukrksam59y+72dw_6xoekvswpwffzpj3uvge6pv4...@mail.gmail.com/

The need to have the DSI host in a defined idle state (often LP-11,
but varying whether the clock lane is in HS) before powering up the
panel/bridge is incredibly common, but currently undefined in DRM.

Taking the SN65DSI83 as an example, the datasheet [1] section 7.4.2
states that the clock lane must be in HS mode, and data lanes in LP-11
when coming out of reset. That means that we can't be in "enable", as that
will have the data lanes in HS mode and sending video, and we can't
be in "pre_enable", as the DSI PHY will be powered down and so we won't
have the clock lane in HS mode.
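
The bind described in this paragraph can be written down as a small check. This is only a model of the argument, assuming the lane states per stage are as stated in the mail; STAGE_HOST_IDLE is a hypothetical extra stage that does not correspond to any existing DRM hook, and all names are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum lane_state { LANE_OFF, LANE_LP11, LANE_HS };

struct dsi_host_state {
	enum lane_state clk;
	enum lane_state data;
};

enum bringup_stage {
	STAGE_PRE_ENABLE, /* per the mail: DSI PHY still powered down */
	STAGE_ENABLE,     /* per the mail: data lanes in HS, video flowing */
	STAGE_HOST_IDLE   /* hypothetical: host up, no video yet */
};

/* Lane states the host is assumed to present at each stage. */
static struct dsi_host_state host_state_at(enum bringup_stage stage)
{
	switch (stage) {
	case STAGE_PRE_ENABLE:
		return (struct dsi_host_state){ LANE_OFF, LANE_OFF };
	case STAGE_ENABLE:
		return (struct dsi_host_state){ LANE_HS, LANE_HS };
	case STAGE_HOST_IDLE:
		return (struct dsi_host_state){ LANE_HS, LANE_LP11 };
	}
	assert(!"unknown stage");
	return (struct dsi_host_state){ LANE_OFF, LANE_OFF };
}

/* Requirement quoted from the datasheet: clock in HS, data in LP-11. */
static bool bridge_reset_release_ok(struct dsi_host_state s)
{
	return s.clk == LANE_HS && s.data == LANE_LP11;
}
```

Neither of the two existing stages satisfies the check, which is the gap the thread is pointing at; only a state between them would.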

I've hit a similar one with the Toshiba TC358762 where it seems to get
upset if it is receiving video data when it gets configured.
panel-raspberrypi-touchscreen[2] which drives that chip is
intermittent when using panel enable, whereas panel prepare is
significantly more reliable but relies on the DSI host being
initialised to LP-11 by breaking the chain.


Right

To make it worse, not initing the DSI bridge exactly per spec leads to 
intermittent failures, not consistently occurring ones.



   Dave

[1] https://www.ti.com/lit/ds/symlink/sn65dsi83.pdf
[2] 
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c


Unrelated to this discussion -- there is a tc358762 driver, driver for 
that attiny88 regulator, and driver for the touchscreen chip, on that 
rpi 7" display, in upstream. You can use those to replace the composite 
panel driver (it works at least against stm32mp1 DSI host with the rpi 
7" panel). Sadly there is little documentation for that attiny88 
protocol or firmware, that's what I don't like about that panel.


Re: [PATCH] drm/bridge: ti-sn65dsi83: Check link status register after enabling the bridge

2021-09-08 Thread Dave Stevenson
On Wed, 8 Sept 2021 at 16:26, Marek Vasut  wrote:
>
> On 9/8/21 1:11 PM, Dave Stevenson wrote:
> > Hi Marek and Andrzej
>
> Hello Dave,
>
> skipping the protocol discussion, which I hope Andrzej will pick up.
>
> [...]
>
> >>> Usually video transmission starts in crtc->enable (CRTC->Encoder), and
> >>> in encoder->enable (encoder->bridge), so in bridges->enable it would be
> >>> too late for LP11 state - transmission can be already in progress.
> >>>
> >>> It shows well that this order of calls does not fit well to DSI, and
> >>> probably many other protocols.
> >>>
> >>> Maybe moving most of the bridge->enable code to bridge->pre_enable would
> >>> help, but I am not sure whether it would cause other issues.
> >>
> >> Yep, that won't work e.g. with the exynos DSIM, because
> >> exynos_dsi_set_display_mode() sets the data lanes to LP11.
> >
> > Isn't the bigger question for SN65DSI8[3|4|5] whether the clock lane
> > is running or not in pre_enable?
>
> I think the bigger question really is -- how do we cater for all the
> different bridges with different init-time requirements.
>
> >>> This is a quick analysis, so please correct me if I am wrong.
> >>
> >> I pretty much agree that the current state of things does not fit with
> >> DSI too well.
> >
> > That was why I was questioning how DSI was meant to be implemented in
> > https://lore.kernel.org/dri-devel/capy8ntbukrksam59y+72dw_6xoekvswpwffzpj3uvge6pv4...@mail.gmail.com/
> >
> > The need to have the DSI host in a defined idle state (often LP-11,
> > but varying whether the clock lane is in HS) before powering up the
> > panel/bridge is incredibly common, but currently undefined in DRM.
> >
> > Taking the SN65DSI83 as an example, the datasheet [1] section 7.4.2
> > states that the clock lane must be in HS mode, and data lanes in LP-11
> > > when coming out of reset. That means that we can't be in "enable", as that
> > > will have the data lanes in HS mode and sending video, and we can't
> > > be in "pre_enable", as the DSI PHY will be powered down and so we won't
> > > have the clock lane in HS mode.
> >
> > I've hit a similar one with the Toshiba TC358762 where it seems to get
> > upset if it is receiving video data when it gets configured.
> > panel-raspberrypi-touchscreen[2] which drives that chip is
> > intermittent when using panel enable, whereas panel prepare is
> > significantly more reliable but relies on the DSI host being
> > initialised to LP-11 by breaking the chain.
>
> Right
>
> To make it worse, not initing the DSI bridge exactly per spec leads to
> intermittent failures, not consistently occurring ones.

Yes, I suspect it's been just down to timing as to whether the display
side starts producing video data before or after all the configuration
has been sent, and I get random LP commands timing out. (We're only
dropping to LP in vertical blanking, so there isn't a huge amount of
time).

> >Dave
> >
> > [1] https://www.ti.com/lit/ds/symlink/sn65dsi83.pdf
> > [2] 
> > https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/panel/panel-raspberrypi-touchscreen.c
>
> Unrelated to this discussion -- there is a tc358762 driver, driver for
> that attiny88 regulator, and driver for the touchscreen chip, on that
> rpi 7" display, in upstream. You can use those to replace the composite
> panel driver (it works at least against stm32mp1 DSI host with the rpi
> 7" panel). Sadly there is little documentation for that attiny88
> protocol or firmware, that's what I don't like about that panel.

Thank you, I know they exist, and I'm looking at exactly that problem
at the moment!

panel-raspberrypi-touchscreen doesn't expose any form of regulator
control, so trying to hook edt-ft54x6 on for the touchscreen sees it
getting the power yanked from under it. I'm trying to switch to those
drivers so that the two play nicely.

The Atmel is a bit nasty in trying to initialise the bridge, panel,
and touch all at the same time. The edt-ft54x6 driver generally probes
first and powers everything up when the DSI host isn't initialised.
This seems to upset the TC358762 and it then won't initialise.
It is possible to poke most things manually through the PORTA, PORTB
and PORTC commands, but I'm currently failing to create a reliable
mechanism :-( I have the advantage that I have the source code for the
Atmel (it's not nice).

  Dave


Re: Handling DRM master transitions cooperatively

2021-09-08 Thread Dennis Filder
On Wed, Sep 08, 2021 at 09:51:54AM +, Simon Ser wrote:
> > On Tue, 07 Sep 2021 10:19:03 +
> > Simon Ser  wrote:
> >
> > > FWIW, I've just hit a case where a compositor leaves a "rotation" KMS
> > > prop set behind, then Xorg tries to startup and fails because it doesn't
> > > reset this prop. So none of this is theoretical.
> > >
> > > I still think a "reset all KMS props to an arbitrary default value" flag
> > > in drmModeAtomicCommit is the best way forward. I'm not sure a user-space
> > > protocol would help too much.
> >
> > Hi Simon,
> >
> > for the "reset KMS state" problem, sure. Thanks for confirming the
> > problem, too.
> >
> > The hand-off problem does need userspace protocol though, so that the
> > two parties can negotiate what part of KMS state can be inherited by
> > the receiver and who will do the animation from the first to the second
> > state in case you want to avoid abrupt changes. It would also be useful
> > for a cross-fade as a perhaps more flexible way than the current "leak
> > an FB, let the next KMS client scrape it via ioctls and copy it so it
> > can be textured from".
>
> The KMS state can be limited to single FB on primary plane covering the whole
> CRTC, no scaling, no other property set than FB_ID/CRTC_*/SRC_*.
>
> Is it useful to make the previous client perform the animation? I don't really
> understand the use-case here.

The use case for the animation is e.g. the transition from Plymouth to
the display server.  Currently it is done as a still frame transition,
maybe with a blend-over effect.  But with the current design it is not
possible to blend Plymouth's animation over into another animation in
the display server because the second client lacks the knowledge how
to keep it going for a little bit.

Another use case is switching between sessions which currently also is
only possible as a still frame transition.  However, if you wanted to
present the session switching by doing e.g. a shaking screen animation
and blending the old display content over into the new content then
the first client would have to render the first half of the animation,
and the second client would have to render the second half during
which it would then blend away the content of the first screen while
blending in its own content and also slowing the shaking to a stop.
For that to work the second client would need all the information
necessary to render that animation, and also a way to perform the
frame-perfect change-over.

Granted, that is a very complicated, eye-candy-oriented use case, but
it would serve to show-case the potential of the design.

Regards.


Re: Handling DRM master transitions cooperatively

2021-09-08 Thread Dennis Filder
On Tue, Sep 07, 2021 at 05:52:41PM +0200, Sebastian Wick wrote:
> > On Tue, 07 Sep 2021 10:19:03 +
> > Simon Ser  wrote:
> >
> > > FWIW, I've just hit a case where a compositor leaves a "rotation" KMS
> > > prop set behind, then Xorg tries to startup and fails because it doesn't
> > > reset this prop. So none of this is theoretical.
> > >
> > > I still think a "reset all KMS props to an arbitrary default value" flag
> > > in drmModeAtomicCommit is the best way forward. I'm not sure a user-space
> > > protocol would help too much.
> >
> > Hi Simon,
> >
> > for the "reset KMS state" problem, sure. Thanks for confirming the
> > problem, too.
> >
> > The hand-off problem does need userspace protocol though, so that the
> > two parties can negotiate what part of KMS state can be inherited by
> > the receiver and who will do the animation from the first to the second
> > state in case you want to avoid abrupt changes. It would also be useful
> > for a cross-fade as a perhaps more flexible way than the current "leak
> > an FB, let the next KMS client scrape it via ioctls and copy it so it
> > can be textured from".
>
> The state reset already is an implicit protocol. Another IPC mechanism
> however could extend it to work the other way around: instead of
> inheriting all the state and trying to transition from that to the
> second client's desired state the second client would send its own
> desired state back to the first (instead of applying it immediately)
> which would then try to transition from its own state to the second
> state (and if it can't you fall back to the implicit inherited state
> protocol). However, this is only an improvement if the first client
> knows how to do the transition and the second does not. All in all I
> doubt that you can convince most people to add this kind of complexity
> just for slightly higher chances of a good transition.
>
> The reset state protocol on the other hand solves real problems and
> gives you a good transition as long as the second client knows about the
> same properties as the previous one which usually is the case for the
> typical bootsplash->login manager->compositor chain.
>
> Maybe I'm completely missing how such a protocol would work though.

The idea was that since you would have to have some IPC mechanism in
user space anyway to quickly effect a flicker-free transition from
Plymouth to the display manager (since, as de Goede reiterates in the
other message, both processes must have the device already open and
call drmSetMaster/drmDropMaster in a coordinated fashion) you might just as well
look for ways how it could be designed for the benefit of everyone.
Using "implicit protocols" for things like this is usually the go-to
way, not because it's good design, but because it is easy to
implement.  But these "implicit protocols" have a tendency to greatly
limit what can be done and to not be easily adaptable once the use
cases become more complicated or refined, and thus they force
contortions on everyone eventually.

How could such a protocol look?  I don't know.  Maybe some DBus
interface for a broker/multiplexer for shared devices that would keep
track of the current DRM master and tell any process interested in
obtaining it what process to talk to.  It could then contact it either
via DBus or over a separate socket, communicate its capabilities,
negotiate the modalities for the transition and acquire the necessary
resources in the form of file descriptors passed over DBus/the socket.
Then both processes could set themselves up for the transition and
effect it, which could involve e.g. unlocking a locked mutex/semaphore
in shared memory.  Alternatively, the donor could refuse the handover,
e.g. if a screen locker is configured to prohibit release of the
device.  Complexitywise the sky would be the limit, of course, but it
needn't be this complicated from the beginning.  An initial version of
such a protocol could be kept just as simple as the status quo.
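
To make the outline above a little more concrete, here is a purely hypothetical sketch of the decision a donor might take when asked to hand over the device. No such protocol exists; every type, flag and function name below is invented for illustration, and the "implicit" outcome stands for today's behaviour of simply inheriting whatever KMS state was left behind.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transition capabilities a client could advertise. */
#define TRANSITION_CROSSFADE (1u << 0)
#define TRANSITION_ANIMATED  (1u << 1)

enum handover_result {
	HANDOVER_REFUSED,   /* donor keeps the device */
	HANDOVER_IMPLICIT,  /* no common ground: fall back to the status quo */
	HANDOVER_NEGOTIATED /* both sides agreed on a transition */
};

struct kms_client {
	bool prohibits_release;   /* e.g. a screen locker is active */
	unsigned int transitions; /* bitmask of TRANSITION_* it understands */
};

/* Decide how (or whether) the donor hands the device to the receiver. */
static enum handover_result request_handover(const struct kms_client *donor,
					     const struct kms_client *receiver)
{
	assert(donor && receiver);

	if (donor->prohibits_release)
		return HANDOVER_REFUSED;
	if (!(donor->transitions & receiver->transitions))
		return HANDOVER_IMPLICIT;
	return HANDOVER_NEGOTIATED;
}
```

The point of the fallback branch is the one made in the paragraph: an initial version need not do more than what happens today, and richer transitions only kick in when both sides advertise them.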

As for the point raised by Paalanen that implementing something like
this would require a lot of effort I must state that, while certainly
true, many of the things I mentioned here are already implemented
somehow somewhere.  Plymouth has a control socket and protocol with
which the state of the splash screen can be controlled from the
outside to make the transition to gdm smoother.  The xlease project
apparently was designed with the intent that DRM devices should be
leased (and subleased) out to processes, and cross-process
coordination would be governed this way.  The kmscon project also had
to come up with something to govern device access since it could no
longer piggy-back on the VT-API.  systemd-logind also draws up a
framework for governance over shared devices and how to tie them to
sessions/seats (with such peculiarities that you cannot auto-spawn a
getty on tty1 since that is "reserved" for Wayland).  Then there is
the VT console, and probably lots of other little things I don't even
know about.

Re: Handling DRM master transitions cooperatively

2021-09-08 Thread Daniel Vetter
On Wed, Sep 8, 2021 at 9:36 AM Pekka Paalanen  wrote:
>
> On Tue, 7 Sep 2021 14:42:56 +0200
> Hans de Goede  wrote:
>
> > Hi,
> >
> > On 9/7/21 12:07 PM, Pekka Paalanen wrote:
> > > On Fri, 3 Sep 2021 21:08:21 +0200
> > > Dennis Filder  wrote:
> > >
> > >> Hans de Goede asked me to take a topic from a private discussion here.
> > >> I must also preface that I'm not a graphics person and my knowledge of
> > >> DRI/DRM is cursory at best.
> > >>
> > >> I initiated the conversation with de Goede after learning that the X
> > >> server now supports being started with an open DRM file descriptor
> > >> (this was added for Keith Packard's xlease project).  I wondered if
> > >> that could be used to smoothen the Plymouth->X transition somehow and
> > >> asked de Goede if there were any such plans.  He denied, but mentioned
> > >> that a new ioctl is in the works to prevent the kernel from wiping the
> > >> contents of a frame buffer after a device is closed, and that this
> > >> would help to keep transitions smooth.
> > >
> > > Hi,
> > >
> > > I believe the kernel is not wiping anything on device close. If
> > > something in the KMS state is wiped, it originates in userspace:
> > >
> > > - Plymouth doing something (e.g. RmFB on an in-use FB will turn the
> > >   output off, you need to be careful to "leak" your FB if you want a
> > >   smooth hand-over)
> >
> > The "kernel is not wiping anything on device close" is not true,
> > when closing /dev/dri/card# any remaining FBs from the app closing
> > it will be dealt with as if they were RmFB-ed, causing the screen
> > to show what I call "the fallback fb", at least with the i915 driver.
>
> No, that's not what should happen AFAIK.
>
> True, all FBs that are not referenced by active CRTCs or planes will
> get freed, since their refcount drops to zero, but those CRTCs and
> planes that are active will remain active and therefore keep their
> reference to the respective FBs and so the FBs remain until replaced or
> turned off explicitly (by e.g. fbcon if you switch to that rather than
> another userspace KMS client). I believe that is the whole reason why
> e.g. DRM_IOCTL_MODE_GETFB2 can be useful, otherwise the next KMS client
> would not have anything to scrape.
>
> danvet, what is the DRM core intention?

Historical accidents mostly. There are two things that foil easy
handover to the next compositor:
- RMFB instead of CLOSEFB semantics, especially when closing the
drmfd. This is uapi, so anything we change needs to be opt-in
- Forced fbdev restore on final close of all drm fds. This is only
prevented if there's a drm master left around (systemd-logind can keep
that instead of forcing the compositor to survive until the other has
taken over, which it needs to do anyway to prevent the drm master
handover from going sideways). This can be fixed by simply disabling
fbdev completely, which you really want to do anyway. Again it's uabi,
people will complain if we break this I think.
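
To illustrate the first point, here is a deliberately tiny userspace
model of the difference between RMFB-style and CLOSEFB-style removal.
It is a sketch under simplifying assumptions: one plane, a plain
integer refcount, and all toy_* names are invented here; none of this
is the real DRM core code.

```c
#include <stddef.h>

struct toy_fb {
	int refcount;		/* one ref per handle, one per scanout user */
};

struct toy_plane {
	struct toy_fb *fb;	/* NULL means the plane is off */
};

/* Point the plane at a new fb (or NULL), moving the scanout reference. */
static void toy_plane_set_fb(struct toy_plane *plane, struct toy_fb *fb)
{
	if (plane->fb)
		plane->fb->refcount--;
	plane->fb = fb;
	if (fb)
		fb->refcount++;
}

/* RMFB-like: drop the handle AND forcibly detach the fb from the plane,
 * so the screen no longer shows it. */
static void toy_rmfb(struct toy_plane *plane, struct toy_fb *fb)
{
	if (plane->fb == fb)
		toy_plane_set_fb(plane, NULL);
	fb->refcount--;		/* the handle reference */
}

/* CLOSEFB-like: drop only the handle; an active plane keeps the fb alive. */
static void toy_closefb(struct toy_fb *fb)
{
	fb->refcount--;		/* the handle reference */
}
```

In this model, a client that calls toy_rmfb() on the fb currently being
scanned out leaves the plane off, while toy_closefb() leaves the plane
pointing at the fb with one remaining reference, which is what the next
KMS client would want to inherit.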

> Or am I confused because display servers do not tend to close the DRM
> device fd on switch-out but Plymouth does (too early)?

Yeah, that stops both forced restore/disable from kicking in.

> If so, why can't Plymouth keep the device open longer and quit only
> when the hand-off is complete? Not quitting too early would be a
> prerequisite for any explicit hand-off protocol as well.

With closefb semantics and fbdev disabled plymouth could quit early,
and things still work.
-Daniel

>
> Thanks,
> pq
>
> > > - Xorg doing something (e.g. resetting instead of inheriting KMS state)
> > >
> > > - Something missed in the hand-off sequence which allows fbcon to
> > >   momentarily take over between Plymouth and Xorg. This would need to
> > >   be fixed between Plymouth and Xorg.
> > >
> > > - Maybe systemd-logind does something odd to the KMS device? It has
> > >   pretty wild code there. Or maybe it causes fbcon to take over.
> > >
> > > What is the new ioctl you referred to?
> >
> > It is an ioctl to mark a FB to not have it auto-removed on device-close,
> > instead leaving it in place until some kernel/userspace client
> > actively installs another FB. This was proposed by Rob Clark quite
> > a while ago, but it never got anywhere because of lack of userspace
> > actually interested in using it.
> >
> > I've been thinking about reviving Rob's patch, since at least for
> > plymouth this would be pretty useful to have.
> >
> > Regards,
> >
> > Hans
> >
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 1/8] drm/i915/xehp: Define compute class and engine

2021-09-08 Thread Daniel Vetter
On Tue, Sep 07, 2021 at 10:19:09AM -0700, Matt Roper wrote:
> Introduce a Compute Command Streamer (CCS), which has access to
> the media and GPGPU pipelines (but not the 3D pipeline).
> 
> To begin with, define the compute class/engine common functions, based
> on the existing render ones.
> 
> Bspec: 46167, 45544
> Original-patch-by: Michel Thierry
> Cc: Daniele Ceraolo Spurio 
> Cc: Tvrtko Ursulin 
> Cc: Vinay Belgaumkar 
> Cc: Szymon Morek 
> UMD (compute): https://github.com/intel/compute-runtime/pull/451
> Signed-off-by: Rodrigo Vivi 
> Signed-off-by: Daniele Ceraolo Spurio 
> Signed-off-by: Aravind Iddamsetty 
> Signed-off-by: Matt Roper 
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_cs.c| 28 
>  drivers/gpu/drm/i915/gt/intel_engine_types.h |  9 ++-
>  drivers/gpu/drm/i915/gt/intel_engine_user.c  |  5 +++-
>  drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h  | 13 +
>  drivers/gpu/drm/i915/i915_reg.h  |  8 ++
>  include/uapi/drm/i915_drm.h  |  1 +
>  6 files changed, 57 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 332efea696a5..69944bd8c19d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -153,6 +153,34 @@ static const struct engine_info intel_engines[] = {
>   { .graphics_ver = 12, .base = XEHP_VEBOX4_RING_BASE }
>   },
>   },
> + [CCS0] = {
> + .class = COMPUTE_CLASS,
> + .instance = 0,
> + .mmio_bases = {
> + { .graphics_ver = 12, .base = GEN12_COMPUTE0_RING_BASE }
> + }
> + },
> + [CCS1] = {
> + .class = COMPUTE_CLASS,
> + .instance = 1,
> + .mmio_bases = {
> + { .graphics_ver = 12, .base = GEN12_COMPUTE1_RING_BASE }
> + }
> + },
> + [CCS2] = {
> + .class = COMPUTE_CLASS,
> + .instance = 2,
> + .mmio_bases = {
> + { .graphics_ver = 12, .base = GEN12_COMPUTE2_RING_BASE }
> + }
> + },
> + [CCS3] = {
> + .class = COMPUTE_CLASS,
> + .instance = 3,
> + .mmio_bases = {
> + { .graphics_ver = 12, .base = GEN12_COMPUTE3_RING_BASE }
> + }
> + },
>  };
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
> b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index bfbfe53c23dd..dcb9d8b2362a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -33,7 +33,8 @@
>  #define VIDEO_ENHANCEMENT_CLASS  2
>  #define COPY_ENGINE_CLASS3
>  #define OTHER_CLASS  4
> -#define MAX_ENGINE_CLASS 4
> +#define COMPUTE_CLASS5
> +#define MAX_ENGINE_CLASS 5
>  #define MAX_ENGINE_INSTANCE  7
>  
>  #define I915_MAX_SLICES  3
> @@ -95,6 +96,7 @@ struct i915_ctx_workarounds {
>  
>  #define I915_MAX_VCS 8
>  #define I915_MAX_VECS4
> +#define I915_MAX_CCS 4
>  
>  /*
>   * Engine IDs definitions.
> @@ -117,6 +119,11 @@ enum intel_engine_id {
>   VECS2,
>   VECS3,
>  #define _VECS(n) (VECS0 + (n))
> + CCS0,
> + CCS1,
> + CCS2,
> + CCS3,
> +#define _CCS(n) (CCS0 + (n))
>   I915_NUM_ENGINES
>  #define INVALID_ENGINE ((enum intel_engine_id)-1)
>  };
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c 
> b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> index 8f8bea08e734..d981621a7c30 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
> @@ -47,6 +47,7 @@ static const u8 uabi_classes[] = {
>   [COPY_ENGINE_CLASS] = I915_ENGINE_CLASS_COPY,
>   [VIDEO_DECODE_CLASS] = I915_ENGINE_CLASS_VIDEO,
>   [VIDEO_ENHANCEMENT_CLASS] = I915_ENGINE_CLASS_VIDEO_ENHANCE,
> + [COMPUTE_CLASS] = I915_ENGINE_CLASS_COMPUTE,
>  };
>  
>  static int engine_cmp(void *priv, const struct list_head *A,
> @@ -139,6 +140,7 @@ const char *intel_engine_class_repr(u8 class)
>   [COPY_ENGINE_CLASS] = "bcs",
>   [VIDEO_DECODE_CLASS] = "vcs",
>   [VIDEO_ENHANCEMENT_CLASS] = "vecs",
> + [COMPUTE_CLASS] = "ccs",
>   };
>  
>   if (class >= ARRAY_SIZE(uabi_names) || !uabi_names[class])
> @@ -162,6 +164,7 @@ static int legacy_ring_idx(const struct legacy_ring *ring)
>   [COPY_ENGINE_CLASS] = { BCS0, 1 },
>   [VIDEO_DECODE_CLASS] = { VCS0, I915_MAX_VCS },
>   [VIDEO_ENHANCEMENT_CLASS] = { VECS0, I915_MAX_VECS },
> + [COMPUTE_CLASS] = { CCS0, I915_MAX_CCS },
>   };
>  
>   if (GEM_DEBUG_WARN_ON(ring->class >= ARRAY_SIZE(map)))
> @@ -190,7 +193,7 @@ static void add_legacy_ring(struct legacy_ring *ring,
>  void intel_engines_driver_register(struct drm_i915_private *i915)
>  {
> 

Re: [Intel-gfx] [PATCH 2/8] drm/i915/xehp: CCS shares the render reset domain

2021-09-08 Thread Daniel Vetter
On Tue, Sep 07, 2021 at 10:19:10AM -0700, Matt Roper wrote:
> The reset domain is shared between render and all compute engines,
> so resetting one will affect the others.
> 
> Note:  Before performing a reset on an RCS or CCS engine, the GuC will
> attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> impacting other clients (since some shared modules will be reset).  If
> other engines are executing non-preemptable workloads, the impact is
> unavoidable and some work may be lost.
> 
> Bspec: 52549
> Original-patch-by: Michel Thierry
> Cc: Tvrtko Ursulin 
> Cc: Vinay Belgaumkar 
> Signed-off-by: Daniele Ceraolo Spurio 
> Signed-off-by: Aravind Iddamsetty 
> Signed-off-by: Matt Roper 

Do we have igts validating this all properly?

Specifically that the reset stats are incremented correctly for guilty
and victimized contexts respectively.

This is necessary if it doesn't exist yet.

Also you need a patch set here that fixes up the igts which have wrong
assumptions about context isolation.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_reset.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
> b/drivers/gpu/drm/i915/gt/intel_reset.c
> index 91200c43951f..30598c1d070c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
>   [VECS1] = GEN11_GRDOM_VECS2,
>   [VECS2] = GEN11_GRDOM_VECS3,
>   [VECS3] = GEN11_GRDOM_VECS4,
> + [CCS0] = GEN11_GRDOM_RENDER,
> + [CCS1] = GEN11_GRDOM_RENDER,
> + [CCS2] = GEN11_GRDOM_RENDER,
> + [CCS3] = GEN11_GRDOM_RENDER,
>   };
>   struct intel_engine_cs *engine;
>   intel_engine_mask_t tmp;
> -- 
> 2.25.4
> 
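
As a rough illustration of what "shared reset domain" means for
clients, the following toy table maps engines to reset domains and
computes which engines are hit when one of them is reset. The engine
list, the toy_* names and the two-domain split are assumptions made up
for this sketch; the real mapping is the GEN11_GRDOM_* table in the
patch above.

```c
enum toy_engine { TOY_RCS0, TOY_VCS0, TOY_CCS0, TOY_CCS1, TOY_NUM_ENGINES };
enum toy_domain { TOY_GRDOM_RENDER, TOY_GRDOM_MEDIA };

/* Which reset domain each engine belongs to (toy data). */
static const enum toy_domain toy_engine_domain[TOY_NUM_ENGINES] = {
	[TOY_RCS0] = TOY_GRDOM_RENDER,
	[TOY_VCS0] = TOY_GRDOM_MEDIA,
	[TOY_CCS0] = TOY_GRDOM_RENDER,	/* compute shares the render domain */
	[TOY_CCS1] = TOY_GRDOM_RENDER,
};

/* Bitmask of all engines affected when 'hung' is reset, including itself. */
static unsigned int toy_reset_victims(enum toy_engine hung)
{
	unsigned int mask = 0;
	int i;

	for (i = 0; i < TOY_NUM_ENGINES; i++)
		if (toy_engine_domain[i] == toy_engine_domain[hung])
			mask |= 1u << i;

	return mask;
}
```

Resetting TOY_CCS0 in this model also takes out TOY_RCS0 and TOY_CCS1,
which is the case where the reset stats for guilty and victim contexts
need to be told apart.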

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] Change igt_log level from IGT_LOG_WARN to IGT_LOG_INFO

2021-09-08 Thread Daniel Vetter
On Wed, Sep 08, 2021 at 12:03:56AM +0530, Jeevan B wrote:
> change igt_warn to igt_info when unloading the snd module before
> unbinding i915 until WA is fixed.
> 
> Signed-off-by: Jeevan B 

Please submit per

https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/blob/master/CONTRIBUTING.md#sending-patches

No one (not even CI) picks up igt patches submitted to dri-devel.
-Daniel

> ---
>  tests/core_hotunplug.c | 2 +-
>  tests/device_reset.c   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/core_hotunplug.c b/tests/core_hotunplug.c
> index 2d73e27f..b3661668 100644
> --- a/tests/core_hotunplug.c
> +++ b/tests/core_hotunplug.c
> @@ -164,7 +164,7 @@ static void driver_unbind(struct hotunplug *priv, const 
> char *prefix,
>   igt_lsof("/dev/snd");
>   igt_skip("Audio is in use, skipping\n");
>   } else {
> - igt_warn("Preventively unloaded snd_hda_intel\n");
> + igt_info("Preventively unloaded snd_hda_intel\n");
>   }
>   }
>  
> diff --git a/tests/device_reset.c b/tests/device_reset.c
> index e6a468e6..982ba5ef 100644
> --- a/tests/device_reset.c
> +++ b/tests/device_reset.c
> @@ -201,7 +201,7 @@ static void driver_unbind(struct device_fds *dev)
>   igt_lsof("/dev/snd");
>   igt_skip("Audio is in use, skipping\n");
>   } else {
> - igt_warn("Preventively unloaded snd_hda_intel\n");
> + igt_info("Preventively unloaded snd_hda_intel\n");
>   }
>   }
>  
> -- 
> 2.19.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v2 (repost)] fbmem: don't allow too huge resolutions

2021-09-08 Thread Daniel Vetter
On Wed, Sep 08, 2021 at 07:27:49PM +0900, Tetsuo Handa wrote:
> syzbot is reporting page fault at vga16fb_fillrect() [1], for
> vga16fb_check_var() is failing to detect multiplication overflow.
> 
>   if (vxres * vyres > maxmem) {
> vyres = maxmem / vxres;
> if (vyres < yres)
>   return -ENOMEM;
>   }
> 
> Since no module would accept too huge resolutions where multiplication
> overflow happens, let's reject in the common path.
> 
> Link: https://syzkaller.appspot.com/bug?extid=04168c8063cfdde1db5e [1]
> Reported-by: syzbot 
> Debugged-by: Randy Dunlap 
> Signed-off-by: Tetsuo Handa 
> Reviewed-by: Geert Uytterhoeven 
> ---
> Changes in v2:
>   Use check_mul_overflow(), suggested by Geert Uytterhoeven 
> .

Pushed to drm-misc-next-fixes so it should get into current merge window.

I also added a cc: stable here, I think it's needed.

Thanks a lot to both you and Geert for handling this!
-Daniel

> 
>  drivers/video/fbdev/core/fbmem.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/video/fbdev/core/fbmem.c 
> b/drivers/video/fbdev/core/fbmem.c
> index 71fb710f1ce3..7420d2c16e47 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -962,6 +962,7 @@ fb_set_var(struct fb_info *info, struct fb_var_screeninfo 
> *var)
>   struct fb_var_screeninfo old_var;
>   struct fb_videomode mode;
>   struct fb_event event;
> + u32 unused;
>  
>   if (var->activate & FB_ACTIVATE_INV_MODE) {
>   struct fb_videomode mode1, mode2;
> @@ -1008,6 +1009,11 @@ fb_set_var(struct fb_info *info, struct 
> fb_var_screeninfo *var)
>   if (var->xres < 8 || var->yres < 8)
>   return -EINVAL;
>  
> + /* Too huge resolution causes multiplication overflow. */
> + if (check_mul_overflow(var->xres, var->yres, &unused) ||
> + check_mul_overflow(var->xres_virtual, var->yres_virtual, &unused))
> + return -EINVAL;
> +
>   ret = info->fbops->fb_check_var(var, info);
>  
>   if (ret)
> -- 
> 2.18.4
> 
> 
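
For readers who want to see why the original vga16fb check misses the
overflow, here is a small userspace sketch. It uses the GCC/Clang
builtin __builtin_mul_overflow() as a stand-in for the kernel's
check_mul_overflow(); the function names naive_fits() and
resolution_overflows() are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* The pattern from the quoted vga16fb snippet: the 32-bit product wraps
 * around before it is compared, so a huge resolution can look small. */
static bool naive_fits(uint32_t vxres, uint32_t vyres, uint32_t maxmem)
{
	return vxres * vyres <= maxmem;
}

/* Overflow-aware check: true if xres * yres does not fit in 32 bits. */
static bool resolution_overflows(uint32_t xres, uint32_t yres)
{
	uint32_t unused;

	return __builtin_mul_overflow(xres, yres, &unused);
}
```

With xres = yres = 65536 the product is 2^32, which wraps to 0 in
32-bit arithmetic, so naive_fits() accepts it for any maxmem while
resolution_overflows() reports the problem.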

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH v2] drm/i915: Handle Intel igfx + Intel dgfx hybrid graphics setup

2021-09-08 Thread Daniel Vetter
On Thu, Sep 02, 2021 at 04:01:40PM +0100, Tvrtko Ursulin wrote:
> 
> On 02/09/2021 15:33, Daniel Vetter wrote:
> > On Tue, Aug 31, 2021 at 02:18:15PM +0100, Tvrtko Ursulin wrote:
> > > 
> > > On 31/08/2021 13:43, Daniel Vetter wrote:
> > > > On Tue, Aug 31, 2021 at 10:15:03AM +0100, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 30/08/2021 09:26, Daniel Vetter wrote:
> > > > > > On Fri, Aug 27, 2021 at 03:44:42PM +0100, Tvrtko Ursulin wrote:
> > > > > > > 
> > > > > > > On 27/08/2021 15:39, Tvrtko Ursulin wrote:
> > > > > > > > From: Tvrtko Ursulin 
> > > > > > > > 
> > > > > > > > In short this makes i915 work for hybrid setups (DRI_PRIME=1 
> > > > > > > > with Mesa)
> > > > > > > > when rendering is done on Intel dgfx and scanout/composition on 
> > > > > > > > Intel
> > > > > > > > igfx.
> > > > > > > > 
> > > > > > > > Before this patch the driver was not quite ready for that 
> > > > > > > > setup, mainly
> > > > > > > > because it was able to emit a semaphore wait between the two 
> > > > > > > > GPUs, which
> > > > > > > > results in deadlocks because semaphore target location in HWSP 
> > > > > > > > is neither
> > > > > > > > shared between the two, nor mapped in both GGTT spaces.
> > > > > > > > 
> > > > > > > > To fix it the patch adds an additional check to a couple of 
> > > > > > > > relevant code
> > > > > > > > paths in order to prevent using semaphores for inter-engine
> > > > > > > > synchronisation between different driver instances.
> > > > > > > > 
> > > > > > > > Patch also moves singly used i915_gem_object_last_write_engine 
> > > > > > > > to be
> > > > > > > > private in its only calling unit (debugfs), while modifying it 
> > > > > > > > to only
> > > > > > > > show activity belonging to the respective driver instance.
> > > > > > > > 
> > > > > > > > What remains in this problem space is the question of the GEM 
> > > > > > > > busy ioctl.
> > > > > > > > > We have a somewhat ambiguous comment there saying only status of
> > > > > > > > native
> > > > > > > > fences will be reported, which could be interpreted as either 
> > > > > > > > i915, or
> > > > > > > > native to the drm fd. For now I have decided to leave that as 
> > > > > > > > is, meaning
> > > > > > > > any i915 instance activity continues to be reported.
> > > > > > > > 
> > > > > > > > v2:
> > > > > > > >  * Avoid adding rq->i915. (Chris)
> > > > > > > > 
> > > > > > > > Signed-off-by: Tvrtko Ursulin 
> > > > > > 
> > > > > > Can't we just delete semaphore code and done?
> > > > > > - GuC won't have it
> > > > > > - media team benchmarked on top of softpin media driver, found no
> > > > > >  difference
> > > > > 
> > > > > You have S-curve for saturated workloads or something else? How 
> > > > > thorough and
> > > > > which media team I guess.
> > > > > 
> > > > >   From memory it was a nice win for some benchmarks (non-saturated 
> > > > > ones), but
> > > > > as I have told you previously, we haven't been putting numbers in 
> > > > > commit
> > > > > messages since it wasn't allowed. I may be able to dig out some more 
> > > > > details
> > > > > if I went trawling through GEM channel IRC logs, although probably 
> > > > > not the
> > > > > > actual numbers since those were usually on pastebin. Or you go and
> > > > > talk with
> > > > > Chris since he probably remembers more details. Or you just decide 
> > > > > you don't
> > > > > care and remove it. I wouldn't do that without putting the complete 
> > > > > story in
> > > > > writing, but it's your call after all.
> > > > 
> > > > Media has also changed, they're not using relocations anymore.
> > > 
> > > Meaning you think it changes the benchmarking story? When coupled with
> > > removal of GPU relocations then possibly yes.
> > > 
> > > > Unless there's solid data performance tuning of any kind that gets in 
> > > > the
> > > > way simply needs to be removed. Yes this is radical, but the codebase is
> > > > in a state to require this.
> > > > 
> > > > So either way we'd need to rebenchmark this if it's really required. 
> > > > Also
> > > 
> > > Therefore can you share what benchmarks have been done or is it secret?  
> > > As
> > > said, I think the non-saturated case was the more interesting one here.
> > > 
> > > > if we really need this code still someone needs to fix the design, the
> > > > current code is making layering violations an art form.
> > > > 
> > > > > Anyway, without the debugfs churn it is more or less two line patch 
> > > > > to fix
> > > > > igfx + dgfx hybrid setup. So while mulling it over this could go in. 
> > > > > I'd
> > > > > just refine it to use a GGTT check instead of GT. And unless DG1 ends 
> > > > > up
> > > > > being GuC only.
> > > > 
> > > > The minimal robust fix here is imo to stop us from upcasting dma_fence 
> > > > to
> > > > i915_request if it's not for our device. Not sprinkle code here into the
> > > > semaphore code. We shouldn't even get this far with foreign fences.
> > > 
> > > Device check does not w

Re: [PATCH] drm/ttm: provide default page protection for UML

2021-09-08 Thread Daniel Vetter
On Sat, Sep 04, 2021 at 11:50:37AM +0800, David Gow wrote:
> On Thu, Sep 2, 2021 at 10:46 PM Daniel Vetter  wrote:
> >
> > On Thu, Sep 02, 2021 at 07:19:01AM +0100, Anton Ivanov wrote:
> > > On 02/09/2021 06:52, Randy Dunlap wrote:
> > > > On 9/1/21 10:48 PM, Anton Ivanov wrote:
> > > > > On 02/09/2021 03:01, Randy Dunlap wrote:
> > > > > > boot_cpu_data [struct cpuinfo_um (on UML)] does not have a struct
> > > > > > member named 'x86', so provide a default page protection mode
> > > > > > for CONFIG_UML.
> > > > > >
> > > > > > Mends this build error:
> > > > > > ../drivers/gpu/drm/ttm/ttm_module.c: In function
> > > > > > ‘ttm_prot_from_caching’:
> > > > > > ../drivers/gpu/drm/ttm/ttm_module.c:59:24: error: ‘struct
> > > > > > cpuinfo_um’ has no member named ‘x86’
> > > > > >else if (boot_cpu_data.x86 > 3)
> > > > > >  ^
> > > > > >
> > > > > > Fixes: 3bf3710e3718 ("drm/ttm: Add a generic TTM memcpy move for
> > > > > > page-based iomem")
> > > > > > Signed-off-by: Randy Dunlap 
> > > > > > Cc: Thomas Hellström 
> > > > > > Cc: Christian König 
> > > > > > Cc: Huang Rui 
> > > > > > Cc: dri-devel@lists.freedesktop.org
> > > > > > Cc: Jeff Dike 
> > > > > > Cc: Richard Weinberger 
> > > > > > Cc: Anton Ivanov 
> > > > > > Cc: linux...@lists.infradead.org
> > > > > > Cc: David Airlie 
> > > > > > Cc: Daniel Vetter 
> > > > > > ---
> > > > > >   drivers/gpu/drm/ttm/ttm_module.c |4 
> > > > > >   1 file changed, 4 insertions(+)
> > > > > >
> > > > > > --- linux-next-20210901.orig/drivers/gpu/drm/ttm/ttm_module.c
> > > > > > +++ linux-next-20210901/drivers/gpu/drm/ttm/ttm_module.c
> > > > > > @@ -53,6 +53,9 @@ pgprot_t ttm_prot_from_caching(enum ttm_
> > > > > >   if (caching == ttm_cached)
> > > > > >   return tmp;
> > > > > > +#ifdef CONFIG_UML
> > > > > > +tmp = pgprot_noncached(tmp);
> > > > > > +#else
> > > > > >   #if defined(__i386__) || defined(__x86_64__)
> > > > > >   if (caching == ttm_write_combined)
> > > > > >   tmp = pgprot_writecombine(tmp);
> > > > > > @@ -69,6 +72,7 @@ pgprot_t ttm_prot_from_caching(enum ttm_
> > > > > >   #if defined(__sparc__)
> > > > > >   tmp = pgprot_noncached(tmp);
> > > > > >   #endif
> > > > > > +#endif
> > > > > >   return tmp;
> > > > > >   }
> > > > >
> > > > > Patch looks OK.
> > > > >
> > > > > I have a question though - why all of DRM is not !UML in config. Not
> > > > > like we can use them.
> > > >
> > > > I have no idea about that.
> > > > Hopefully one of the (other) UML maintainers can answer you.
> > >
> > > Touche.
> > >
> > > We will discuss that and possibly push a patch to !UML that part of the
> > > tree. IMHO it is not applicable.
> >
> > I thought kunit is based on top of uml, and we do want to eventually adopt
> > that. Especially for helper libraries like ttm.
> 
> UML is not actually a dependency for KUnit, so it's definitely
> possible to test things which aren't compatible with UML. (In fact,
> there's even now some tooling support to use qemu instead on a number
> of architectures.)
> 
> That being said, the KUnit tooling does use UML by default, so if it's
> not too difficult to keep some level of UML support, it'll make it a
> little easier (and faster) for people to run any KUnit tests.

Yeah my understanding is that uml is the quickest way to spawn a new
kernel, which kunit needs to run. And I really do like that idea, because
having virtualization support in cloud CI systems (which use containers
themselves) is a bit of a fun exercise. The less we rely on virtual machines
in containers for that, the better.

Hence why I really like the uml approach for kunit.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH][next] drm/i915: clean up inconsistent indenting

2021-09-08 Thread Daniel Vetter
On Thu, Sep 02, 2021 at 10:57:37PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> There is a statement that is indented one character too deeply,
> clean this up.
> 
> Signed-off-by: Colin Ian King 

Queued to drm-intel-gt-next, thanks for the patch.
-Daniel

> ---
>  drivers/gpu/drm/i915/gt/intel_execlists_submission.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index de5f9c86b9a4..aeb324b701ec 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -2565,7 +2565,7 @@ __execlists_context_pre_pin(struct intel_context *ce,
>   if (!__test_and_set_bit(CONTEXT_INIT_BIT, &ce->flags)) {
>   lrc_init_state(ce, engine, *vaddr);
>  
> -  __i915_gem_object_flush_map(ce->state->obj, 0, 
> engine->context_size);
> + __i915_gem_object_flush_map(ce->state->obj, 0, 
> engine->context_size);
>   }
>  
>   return 0;
> -- 
> 2.32.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] drm/sched: Fix drm_sched_fence_free() so it can be passed an uninitialized fence

2021-09-08 Thread Daniel Vetter
On Fri, Sep 03, 2021 at 02:05:54PM +0200, Boris Brezillon wrote:
> drm_sched_job_cleanup() will pass an uninitialized fence to
> drm_sched_fence_free(), which will cause to_drm_sched_fence() to return
> a NULL fence object, causing a NULL pointer deref when this NULL object
> is passed to kmem_cache_free().
> 
> Let's create a new drm_sched_fence_free() function that takes a
> drm_sched_fence pointer and suffix the old function with _rcu. While at
> it, complain if drm_sched_fence_free() is passed an initialized fence
> or if drm_sched_fence_free_rcu() is passed an uninitialized fence.
> 
> Fixes: dbe48d030b28 ("drm/sched: Split drm_sched_job_init")
> Signed-off-by: Boris Brezillon 
> ---
> Found while debugging another issue in panfrost causing a failure in
> the submit ioctl and exercising the error path (path that calls
> drm_sched_job_cleanup() on an unarmed job).

Reviewed-by: Daniel Vetter 

I already provided an irc r-b, just here for the record too.
-Daniel

> ---
>  drivers/gpu/drm/scheduler/sched_fence.c | 29 -
>  drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
>  include/drm/gpu_scheduler.h |  2 +-
>  3 files changed, 21 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c 
> b/drivers/gpu/drm/scheduler/sched_fence.c
> index db3fd1303fc4..7fd869520ef2 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -69,19 +69,28 @@ static const char 
> *drm_sched_fence_get_timeline_name(struct dma_fence *f)
>   return (const char *)fence->sched->name;
>  }
>  
> -/**
> - * drm_sched_fence_free - free up the fence memory
> - *
> - * @rcu: RCU callback head
> - *
> - * Free up the fence memory after the RCU grace period.
> - */
> -void drm_sched_fence_free(struct rcu_head *rcu)
> +static void drm_sched_fence_free_rcu(struct rcu_head *rcu)
>  {
>   struct dma_fence *f = container_of(rcu, struct dma_fence, rcu);
>   struct drm_sched_fence *fence = to_drm_sched_fence(f);
>  
> - kmem_cache_free(sched_fence_slab, fence);
> + if (!WARN_ON_ONCE(!fence))
> + kmem_cache_free(sched_fence_slab, fence);
> +}
> +
> +/**
> + * drm_sched_fence_free - free up an uninitialized fence
> + *
> + * @fence: fence to free
> + *
> + * Free up the fence memory. Should only be used if drm_sched_fence_init()
> + * has not been called yet.
> + */
> +void drm_sched_fence_free(struct drm_sched_fence *fence)
> +{
> + /* This function should not be called if the fence has been 
> initialized. */
> + if (!WARN_ON_ONCE(fence->sched))
> + kmem_cache_free(sched_fence_slab, fence);
>  }
>  
>  /**
> @@ -97,7 +106,7 @@ static void drm_sched_fence_release_scheduled(struct 
> dma_fence *f)
>   struct drm_sched_fence *fence = to_drm_sched_fence(f);
>  
>   dma_fence_put(fence->parent);
> - call_rcu(&fence->finished.rcu, drm_sched_fence_free);
> + call_rcu(&fence->finished.rcu, drm_sched_fence_free_rcu);
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index fbbd3b03902f..6987d412a946 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -750,7 +750,7 @@ void drm_sched_job_cleanup(struct drm_sched_job *job)
>   dma_fence_put(&job->s_fence->finished);
>   } else {
>   /* aborted job before committing to run it */
> - drm_sched_fence_free(&job->s_fence->finished.rcu);
> + drm_sched_fence_free(job->s_fence);
>   }
>  
>   job->s_fence = NULL;
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 7f77a455722c..f011e4c407f2 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -509,7 +509,7 @@ struct drm_sched_fence *drm_sched_fence_alloc(
>   struct drm_sched_entity *s_entity, void *owner);
>  void drm_sched_fence_init(struct drm_sched_fence *fence,
> struct drm_sched_entity *entity);
> -void drm_sched_fence_free(struct rcu_head *rcu);
> +void drm_sched_fence_free(struct drm_sched_fence *fence);
>  
>  void drm_sched_fence_scheduled(struct drm_sched_fence *fence);
>  void drm_sched_fence_finished(struct drm_sched_fence *fence);
> -- 
> 2.31.1
> 
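
The failure mode described in the commit message can be modelled in
userspace roughly as follows. This sketch assumes that the real
to_drm_sched_fence() identifies a scheduler fence by comparing the
embedded fence's ops pointer, which is what the "uninitialized fence
yields NULL" description suggests; all toy_* names are invented for
illustration.

```c
#include <stddef.h>

struct toy_ops { const char *name; };

static const struct toy_ops toy_ops_scheduled = { "scheduled" };
static const struct toy_ops toy_ops_finished = { "finished" };

struct toy_dma_fence {
	const struct toy_ops *ops;	/* NULL until the fence is initialised */
};

struct toy_sched_fence {
	struct toy_dma_fence scheduled;
	struct toy_dma_fence finished;
};

#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Upcast an embedded fence to its container, or NULL if the ops pointer
 * does not identify it as one of ours (e.g. it was never initialised). */
static struct toy_sched_fence *to_toy_sched_fence(struct toy_dma_fence *f)
{
	if (f->ops == &toy_ops_scheduled)
		return toy_container_of(f, struct toy_sched_fence, scheduled);
	if (f->ops == &toy_ops_finished)
		return toy_container_of(f, struct toy_sched_fence, finished);
	return NULL;
}
```

A cleanup path that only has an uninitialised embedded fence therefore
gets NULL back from the upcast, which is why the patch passes the
container pointer directly instead of going through the rcu head.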

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] drm/i915/request: fix early tracepoints

2021-09-08 Thread Daniel Vetter
On Fri, Sep 03, 2021 at 12:24:05PM +0100, Matthew Auld wrote:
> Currently we blow up in trace_dma_fence_init, when calling into
> get_driver_name or get_timeline_name, since both the engine and context
> might be NULL(or contain some garbage address) in the case of newly
> allocated slab objects via the request ctor. Note that we also use
> SLAB_TYPESAFE_BY_RCU here, which allows requests to be immediately
> freed, but delay freeing the underlying page by an RCU grace period.
> With this scheme requests can be re-allocated, at the same time as they
> are also being read by some lockless RCU lookup mechanism.
> 
> One possible fix, since we don't yet have a fully initialised request
> when in the ctor, is just setting the context/engine as NULL and adding
> some extra handling in get_driver_name etc. And since the ctor is only
> called for new slab objects(i.e allocate new page and call the ctor for
> each object) it's safe to reset the context/engine prior to calling into
> dma_fence_init, since we can be certain that no one is doing an RCU
> lookup which might depend on peeking at the engine/context, like in
> active_engine(), since the object can't yet be externally visible.
> 
> In the recycled case(which might also be externally visible) the request
> refcount always transitions from 0->1 after we set the context/engine
> etc, which should ensure it's valid to dereference the engine for
> example, when doing an RCU list-walk, so long as we can also increment
> the refcount first. If the refcount is already zero, then the request is
> considered complete/released.  If it's non-zero, then the request might
> be in the process of being re-allocated, or potentially still in flight,
> however after successfully incrementing the refcount, it's possible to
> carefully inspect the request state, to determine if the request is
> still what we were looking for. Note that all externally visible
> requests returned to the cache must have zero refcount.

The commit message here is a bit confusing, since you start out by
describing a solution that you're not actually implementing. I usually
do this by putting alternate solutions at the bottom, starting with "An
alternate solution would be ..." or so.

And then closing with why we don't do that, here it would be that we do
no longer have a need for these partially set up i915_requests, and
therefore just reverting that complication is the simplest solution.

> An alternative fix then is to instead move the dma_fence_init out from
> the request ctor. Originally this was how it was done, but it was moved
> in:
> 
> commit 855e39e65cfc33a73724f1cc644ffc5754864a20
> Author: Chris Wilson 
> Date:   Mon Feb 3 09:41:48 2020 +0000
> 
> drm/i915: Initialise basic fence before acquiring seqno
> 
> where it looks like intel_timeline_get_seqno() relied on some of the
> rq->fence state, but that is no longer the case since:
> 
> commit 12ca695d2c1ed26b2dcbb528b42813bd0f216cfc
> Author: Maarten Lankhorst 
> Date:   Tue Mar 23 16:49:50 2021 +0100
> 
> drm/i915: Do not share hwsp across contexts any more, v8.
> 
> intel_timeline_get_seqno() could also be cleaned up slightly by dropping
> the request argument.
> 
> Moving dma_fence_init back out of the ctor, should ensure we have enough
> of the request initialised in case of trace_dma_fence_init.
> Functionally this should be the same, and is effectively what we were
> already open coding before, except now we also assign the fence->lock
> and fence->ops, but since these are invariant for recycled
> requests(which might be externally visible), and will therefore already
> hold the same value, it shouldn't matter. We still leave the
> spin_lock_init() in the ctor, since we can't re-init the rq->lock in
> case it is already held.

Holding rq->lock without having a full reference to it sounds like really
bad taste. I think it would be good to have a (kerneldoc) comment next to
i915_request.lock about this, with a FIXME. But separate patch.

> Fixes: 855e39e65cfc ("drm/i915: Initialise basic fence before acquiring seqno")
> Signed-off-by: Matthew Auld 
> Cc: Michael Mason 
> Cc: Daniel Vetter 

With the commit message restructured a bit, and assuming this one actually
works:

Reviewed-by: Daniel Vetter 

But I'm really not confident :-(
-Daniel
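
The recycled-request rule quoted above (pin the object by raising a non-zero refcount, then re-check that it is still the request you wanted) can be sketched in userspace C. This is only an illustration of the general "get unless zero" pattern, not the i915 code: struct request_sketch, context_id and the helper names are hypothetical, and C11 atomics stand in for the kernel's refcount primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a recycled, locklessly visible request. */
struct request_sketch {
	atomic_int refcount;	/* 0 means complete/released */
	int context_id;		/* identity a lookup must re-check */
};

/* Take a reference only if the object is still live (refcount != 0). */
static bool request_get_unless_zero(struct request_sketch *rq)
{
	int old = atomic_load(&rq->refcount);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&rq->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void request_put(struct request_sketch *rq)
{
	int old = atomic_fetch_sub(&rq->refcount, 1);

	assert(old > 0);
	(void)old;
}

/* Lockless lookup: pin first, then confirm the identity still matches. */
static struct request_sketch *
request_lookup(struct request_sketch *rq, int wanted_context)
{
	if (!request_get_unless_zero(rq))
		return NULL;		/* already released */

	if (rq->context_id != wanted_context) {
		request_put(rq);	/* recycled for another user */
		return NULL;
	}
	return rq;
}
```

The identity check only means something after the reference has been taken, which is why the sketch never inspects context_id on an unpinned object.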

> ---
>  drivers/gpu/drm/i915/i915_request.c | 11 ++-
>  1 file changed, 2 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c 
> b/drivers/gpu/drm/i915/i915_request.c
> index ce446716d092..79da5eca60af 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -829,8 +829,6 @@ static void __i915_request_ctor(void *arg)
>   i915_sw_fence_init(&rq->submit, submit_notify);
>   i915_sw_fence_init(&rq->semaphore, semaphore_notify);
>  
> - dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock, 0, 0);
> -
>   rq->capture_list = NULL;
>  
>   init_llist_head(&rq->execute_cb);
> @@ -905,

Re: [PATCH v2 1/2] drm: document drm_mode_create_lease object requirements

2021-09-08 Thread Daniel Vetter
On Fri, Sep 03, 2021 at 01:00:16PM +0000, Simon Ser wrote:
> validate_lease expects one CRTC, one connector and one plane.
> 
> Signed-off-by: Simon Ser 
> Cc: Daniel Vetter 
> Cc: Pekka Paalanen 
> Cc: Keith Packard 

Reviewed-by: Daniel Vetter 

> ---
>  include/uapi/drm/drm_mode.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
> index 90c55383f1ee..e4a2570a6058 100644
> --- a/include/uapi/drm/drm_mode.h
> +++ b/include/uapi/drm/drm_mode.h
> @@ -1110,6 +1110,9 @@ struct drm_mode_destroy_blob {
>   * struct drm_mode_create_lease - Create lease
>   *
>   * Lease mode resources, creating another drm_master.
> + *
> + * The @object_ids array must reference at least one CRTC, one connector and
> + * one plane if &DRM_CLIENT_CAP_UNIVERSAL_PLANES is enabled.
>   */
>  struct drm_mode_create_lease {
>   /** @object_ids: Pointer to array of object ids (__u32) */
> -- 
> 2.33.0
> 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PULL] drm-misc-fixes

2021-09-08 Thread Thomas Zimmermann
Hi Dave and Daniel,

here's this week's PR for drm-misc-fixes. One patch fixes a potential deadlock
in TTM, the other enables an additional plane in kmb. I'm slightly unhappy
that the latter ended up in -fixes, as it's not a bugfix AFAICT.

Best regards
Thomas

drm-misc-fixes-2021-09-08:
Short summary of fixes pull:

 * kmb: Enable second plane
 * ttm: Fix potential deadlock during swap

The following changes since commit fa0b1ef5f7a694f48e00804a391245f3471aa155:

  drm: Copy drm_wait_vblank to user before returning (2021-08-17 13:56:03 -0400)

are available in the Git repository at:

  git://anongit.freedesktop.org/drm/drm-misc tags/drm-misc-fixes-2021-09-08

for you to fetch changes up to c8704b7ec182f9293e6a994310c7d4203428cdfb:

  drm/kmb: Enable alpha blended second plane (2021-09-07 10:10:30 -0700)


Short summary of fixes pull:

 * kmb: Enable second plane
 * ttm: Fix potential deadlock during swap


Edmund Dea (1):
  drm/kmb: Enable alpha blended second plane

xinhui pan (1):
  drm/ttm: Fix a deadlock if the target BO is not idle during swap

 drivers/gpu/drm/kmb/kmb_drv.c   |  8 ++--
 drivers/gpu/drm/kmb/kmb_drv.h   |  5 +++
 drivers/gpu/drm/kmb/kmb_plane.c | 81 -
 drivers/gpu/drm/kmb/kmb_plane.h |  5 ++-
 drivers/gpu/drm/kmb/kmb_regs.h  |  3 ++
 drivers/gpu/drm/ttm/ttm_bo.c|  6 +--
 6 files changed, 90 insertions(+), 18 deletions(-)

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


Re: [PATCH v3 4/9] drm/scheduler: Add fence deadline support

2021-09-08 Thread Daniel Vetter
On Fri, Sep 03, 2021 at 11:47:55AM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> As the finished fence is the one that is exposed to userspace, and
> therefore the one that other operations, like atomic update, would
> block on, we need to propagate the deadline from the finished
> fence to the actual hw fence.
> 
> v2: Split into drm_sched_fence_set_parent() (ckoenig)
> 
> Signed-off-by: Rob Clark 
> ---
>  drivers/gpu/drm/scheduler/sched_fence.c | 34 +
>  drivers/gpu/drm/scheduler/sched_main.c  |  2 +-
>  include/drm/gpu_scheduler.h |  8 ++
>  3 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_fence.c 
> b/drivers/gpu/drm/scheduler/sched_fence.c
> index bcea035cf4c6..4fc41a71d1c7 100644
> --- a/drivers/gpu/drm/scheduler/sched_fence.c
> +++ b/drivers/gpu/drm/scheduler/sched_fence.c
> @@ -128,6 +128,30 @@ static void drm_sched_fence_release_finished(struct 
> dma_fence *f)
>   dma_fence_put(&fence->scheduled);
>  }
>  
> +static void drm_sched_fence_set_deadline_finished(struct dma_fence *f,
> +   ktime_t deadline)
> +{
> + struct drm_sched_fence *fence = to_drm_sched_fence(f);
> + unsigned long flags;
> +
> + spin_lock_irqsave(&fence->lock, flags);
> +
> + /* If we already have an earlier deadline, keep it: */
> + if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags) &&
> + ktime_before(fence->deadline, deadline)) {
> + spin_unlock_irqrestore(&fence->lock, flags);
> + return;
> + }
> +
> + fence->deadline = deadline;
> + set_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT, &f->flags);
> +
> + spin_unlock_irqrestore(&fence->lock, flags);
> +
> + if (fence->parent)
> + dma_fence_set_deadline(fence->parent, deadline);
> +}
> +
>  static const struct dma_fence_ops drm_sched_fence_ops_scheduled = {
>   .get_driver_name = drm_sched_fence_get_driver_name,
>   .get_timeline_name = drm_sched_fence_get_timeline_name,
> @@ -138,6 +162,7 @@ static const struct dma_fence_ops 
> drm_sched_fence_ops_finished = {
>   .get_driver_name = drm_sched_fence_get_driver_name,
>   .get_timeline_name = drm_sched_fence_get_timeline_name,
>   .release = drm_sched_fence_release_finished,
> + .set_deadline = drm_sched_fence_set_deadline_finished,
>  };
>  
>  struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f)
> @@ -152,6 +177,15 @@ struct drm_sched_fence *to_drm_sched_fence(struct 
> dma_fence *f)
>  }
>  EXPORT_SYMBOL(to_drm_sched_fence);
>  
> +void drm_sched_fence_set_parent(struct drm_sched_fence *s_fence,
> + struct dma_fence *fence)
> +{
> + s_fence->parent = dma_fence_get(fence);
> + if (test_bit(DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
> +  &s_fence->finished.flags))

Don't you need the spinlock here too to avoid races? test_bit is very
unordered, so guarantees nothing. Spinlock would need to be both around
->parent = and the test_bit.

Entirely aside, but there are discussions going on to preallocate the hw
fence somehow. If we do that we could make the deadline forwarding
lockless here. Having a spinlock just to set the parent is a bit annoying
...

The alternative is to do this locklessly with barriers and a _lot_ of
comments. It would be good to benchmark first whether the overhead
matters, though.
-Daniel
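
A minimal userspace sketch of the locked variant suggested above, where both the parent assignment and the has-deadline check sit under the same lock. Everything here is an assumption for illustration: the *_sketch types, the atomic_flag used as a stand-in spinlock, and forwarding to the parent while still holding the lock (a simplification so the sketch itself is race free; the patch forwards after unlocking).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct hw_fence_sketch {
	bool has_deadline;
	int64_t deadline_ns;
};

struct sched_fence_sketch {
	atomic_flag lock;		/* stand-in for the fence spinlock */
	bool has_deadline;
	int64_t deadline_ns;
	struct hw_fence_sketch *parent;	/* NULL until the job has been run */
};

static void sched_fence_sketch_init(struct sched_fence_sketch *f)
{
	atomic_flag_clear(&f->lock);
	f->has_deadline = false;
	f->deadline_ns = 0;
	f->parent = NULL;
}

static void sketch_lock(struct sched_fence_sketch *f)
{
	while (atomic_flag_test_and_set_explicit(&f->lock, memory_order_acquire))
		;	/* spin */
}

static void sketch_unlock(struct sched_fence_sketch *f)
{
	atomic_flag_clear_explicit(&f->lock, memory_order_release);
}

/* Keep the earliest deadline seen on the hardware fence. */
static void hw_set_deadline(struct hw_fence_sketch *hw, int64_t deadline_ns)
{
	assert(hw != NULL);
	if (!hw->has_deadline || deadline_ns < hw->deadline_ns) {
		hw->deadline_ns = deadline_ns;
		hw->has_deadline = true;
	}
}

static void sched_fence_set_deadline(struct sched_fence_sketch *f,
				     int64_t deadline_ns)
{
	sketch_lock(f);
	/* Only record and forward a deadline that is new or earlier. */
	if (!f->has_deadline || deadline_ns < f->deadline_ns) {
		f->deadline_ns = deadline_ns;
		f->has_deadline = true;
		if (f->parent)
			hw_set_deadline(f->parent, deadline_ns);
	}
	sketch_unlock(f);
}

static void sched_fence_set_parent(struct sched_fence_sketch *f,
				   struct hw_fence_sketch *hw)
{
	sketch_lock(f);
	/* Assignment and flag check are atomic with respect to set_deadline. */
	f->parent = hw;
	if (f->has_deadline)
		hw_set_deadline(hw, f->deadline_ns);
	sketch_unlock(f);
}
```

With both paths serialized, whichever of set_deadline and set_parent runs second is guaranteed to see the other's update, so the deadline cannot be lost between them.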

> + dma_fence_set_deadline(fence, s_fence->deadline);
> +}
> +
>  struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity 
> *entity,
> void *owner)
>  {
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
> b/drivers/gpu/drm/scheduler/sched_main.c
> index 595e47ff7d06..27bf0ac0625f 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -978,7 +978,7 @@ static int drm_sched_main(void *param)
>   drm_sched_fence_scheduled(s_fence);
>  
>   if (!IS_ERR_OR_NULL(fence)) {
> - s_fence->parent = dma_fence_get(fence);
> + drm_sched_fence_set_parent(s_fence, fence);
>   r = dma_fence_add_callback(fence, &sched_job->cb,
>  drm_sched_job_done_cb);
>   if (r == -ENOENT)
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 7f77a455722c..158ddd662469 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -238,6 +238,12 @@ struct drm_sched_fence {
>   */
>   struct dma_fence        finished;
>  
> + /**
> +  * @deadline: deadline set on &drm_sched_fence.finished which
> +  * potentially needs to be propagated to &drm_sched_fence.parent
> +  */
> + ktime_t deadline;
> +
>  /**
>   * @parent: the fence returned by &drm_sched_backend_ops.run_

Re: [PATCH v3 5/9] drm/msm: Add deadline based boost support

2021-09-08 Thread Daniel Vetter
On Fri, Sep 03, 2021 at 11:47:56AM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> Signed-off-by: Rob Clark 

Why do you need a kthread_work here? Is this just to make sure you're
running at realtime prio? Maybe a comment to that effect would be good.
-Daniel

> ---
>  drivers/gpu/drm/msm/msm_fence.c   | 76 +++
>  drivers/gpu/drm/msm/msm_fence.h   | 20 +++
>  drivers/gpu/drm/msm/msm_gpu.h |  1 +
>  drivers/gpu/drm/msm/msm_gpu_devfreq.c | 20 +++
>  4 files changed, 117 insertions(+)
> 
> diff --git a/drivers/gpu/drm/msm/msm_fence.c b/drivers/gpu/drm/msm/msm_fence.c
> index f2cece542c3f..67c2a96e1c85 100644
> --- a/drivers/gpu/drm/msm/msm_fence.c
> +++ b/drivers/gpu/drm/msm/msm_fence.c
> @@ -8,6 +8,37 @@
>  
>  #include "msm_drv.h"
>  #include "msm_fence.h"
> +#include "msm_gpu.h"
> +
> +static inline bool fence_completed(struct msm_fence_context *fctx, uint32_t 
> fence);
> +
> +static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
> +{
> + struct msm_drm_private *priv = fctx->dev->dev_private;
> + return priv->gpu;
> +}
> +
> +static enum hrtimer_restart deadline_timer(struct hrtimer *t)
> +{
> + struct msm_fence_context *fctx = container_of(t,
> + struct msm_fence_context, deadline_timer);
> +
> + kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
> +
> + return HRTIMER_NORESTART;
> +}
> +
> +static void deadline_work(struct kthread_work *work)
> +{
> + struct msm_fence_context *fctx = container_of(work,
> + struct msm_fence_context, deadline_work);
> +
> + /* If deadline fence has already passed, nothing to do: */
> + if (fence_completed(fctx, fctx->next_deadline_fence))
> + return;
> +
> + msm_devfreq_boost(fctx2gpu(fctx), 2);
> +}
>  
>  
>  struct msm_fence_context *
> @@ -26,6 +57,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile 
> uint32_t *fenceptr,
>   fctx->fenceptr = fenceptr;
>   spin_lock_init(&fctx->spinlock);
>  
> + hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
> + fctx->deadline_timer.function = deadline_timer;
> +
> + kthread_init_work(&fctx->deadline_work, deadline_work);
> +
> + fctx->next_deadline = ktime_get();
> +
>   return fctx;
>  }
>  
> @@ -49,6 +87,8 @@ void msm_update_fence(struct msm_fence_context *fctx, 
> uint32_t fence)
>  {
>   spin_lock(&fctx->spinlock);
>   fctx->completed_fence = max(fence, fctx->completed_fence);
> + if (fence_completed(fctx, fctx->next_deadline_fence))
> + hrtimer_cancel(&fctx->deadline_timer);
>   spin_unlock(&fctx->spinlock);
>  }
>  
> @@ -79,10 +119,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
>   return fence_completed(f->fctx, f->base.seqno);
>  }
>  
> +static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> +{
> + struct msm_fence *f = to_msm_fence(fence);
> + struct msm_fence_context *fctx = f->fctx;
> + unsigned long flags;
> + ktime_t now;
> +
> + spin_lock_irqsave(&fctx->spinlock, flags);
> + now = ktime_get();
> +
> + if (ktime_after(now, fctx->next_deadline) ||
> + ktime_before(deadline, fctx->next_deadline)) {
> + fctx->next_deadline = deadline;
> + fctx->next_deadline_fence =
> + max(fctx->next_deadline_fence, (uint32_t)fence->seqno);
> +
> + /*
> +  * Set timer to trigger boost 3ms before deadline, or
> +  * if we are already less than 3ms before the deadline
> +  * schedule boost work immediately.
> +  */
> + deadline = ktime_sub(deadline, ms_to_ktime(3));
> +
> + if (ktime_after(now, deadline)) {
> + kthread_queue_work(fctx2gpu(fctx)->worker,
> + &fctx->deadline_work);
> + } else {
> + hrtimer_start(&fctx->deadline_timer, deadline,
> + HRTIMER_MODE_ABS);
> + }
> + }
> +
> + spin_unlock_irqrestore(&fctx->spinlock, flags);
> +}
> +
>  static const struct dma_fence_ops msm_fence_ops = {
>   .get_driver_name = msm_fence_get_driver_name,
>   .get_timeline_name = msm_fence_get_timeline_name,
>   .signaled = msm_fence_signaled,
> + .set_deadline = msm_fence_set_deadline,
>  };
>  
>  struct dma_fence *
> diff --git a/drivers/gpu/drm/msm/msm_fence.h b/drivers/gpu/drm/msm/msm_fence.h
> index 4783db528bcc..d34e853c555a 100644
> --- a/drivers/gpu/drm/msm/msm_fence.h
> +++ b/drivers/gpu/drm/msm/msm_fence.h
> @@ -50,6 +50,26 @@ struct msm_fence_context {
>   volatile uint32_t *fenceptr;
>  
>   spinlock_t spinlock;
> +
> + /*
> +  * TODO this doesn't really deal with multiple deadlines, like
> +  * if userspace got multiple frames ahead.. OTOH atomic updates
> +  * don't queue, so maybe that is 

Re: [PATCH v3 8/9] dma-buf/sync_file: Add SET_DEADLINE ioctl

2021-09-08 Thread Daniel Vetter
On Fri, Sep 03, 2021 at 11:47:59AM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> The initial purpose is for igt tests, but this would also be useful for
> compositors that wait until close to vblank deadline to make decisions
> about which frame to show.
> 
> Signed-off-by: Rob Clark 

Needs userspace and I think ideally also some igts to make sure it works
and doesn't go boom.
-Daniel

> ---
>  drivers/dma-buf/sync_file.c| 19 +++
>  include/uapi/linux/sync_file.h | 20 
>  2 files changed, 39 insertions(+)
> 
> diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> index 394e6e1e9686..f295772d5169 100644
> --- a/drivers/dma-buf/sync_file.c
> +++ b/drivers/dma-buf/sync_file.c
> @@ -459,6 +459,22 @@ static long sync_file_ioctl_fence_info(struct sync_file 
> *sync_file,
>   return ret;
>  }
>  
> +static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
> + unsigned long arg)
> +{
> + struct sync_set_deadline ts;
> +
> + if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
> + return -EFAULT;
> +
> + if (ts.pad)
> + return -EINVAL;
> +
> + dma_fence_set_deadline(sync_file->fence, ktime_set(ts.tv_sec, 
> ts.tv_nsec));
> +
> + return 0;
> +}
> +
>  static long sync_file_ioctl(struct file *file, unsigned int cmd,
>   unsigned long arg)
>  {
> @@ -471,6 +487,9 @@ static long sync_file_ioctl(struct file *file, unsigned 
> int cmd,
>   case SYNC_IOC_FILE_INFO:
>   return sync_file_ioctl_fence_info(sync_file, arg);
>  
> + case SYNC_IOC_SET_DEADLINE:
> + return sync_file_ioctl_set_deadline(sync_file, arg);
> +
>   default:
>   return -ENOTTY;
>   }
> diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
> index ee2dcfb3d660..f67d4ffe7566 100644
> --- a/include/uapi/linux/sync_file.h
> +++ b/include/uapi/linux/sync_file.h
> @@ -67,6 +67,18 @@ struct sync_file_info {
>   __u64   sync_fence_info;
>  };
>  
> +/**
> + * struct sync_set_deadline - set a deadline on a fence
> + * @tv_sec:  seconds elapsed since epoch
> + * @tv_nsec: nanoseconds elapsed since the time given by the tv_sec
> + * @pad: must be zero
> + */
> +struct sync_set_deadline {
> + __s64   tv_sec;
> + __s32   tv_nsec;
> + __u32   pad;
> +};
> +
>  #define SYNC_IOC_MAGIC   '>'
>  
>  /**
> @@ -95,4 +107,12 @@ struct sync_file_info {
>   */
>  #define SYNC_IOC_FILE_INFO   _IOWR(SYNC_IOC_MAGIC, 4, struct sync_file_info)
>  
> +
> +/**
> + * DOC: SYNC_IOC_SET_DEADLINE - set a deadline on a fence
> + *
> + * Allows userspace to set a deadline on a fence, see dma_fence_set_deadline()
> + */
> +#define SYNC_IOC_SET_DEADLINE        _IOW(SYNC_IOC_MAGIC, 5, struct sync_set_deadline)
> +
>  #endif /* _UAPI_LINUX_SYNC_H */
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
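
For readers following the proposed uAPI, here is a hedged sketch of how userspace might fill the argument structure from a nanosecond timestamp. The struct below is a local mirror of the layout quoted in the patch, not a merged header, and deadline_from_ns is a hypothetical helper; no ioctl is issued because the interface was still under review at this point.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the layout proposed in the quoted patch. */
struct sync_set_deadline_sketch {
	int64_t  tv_sec;
	int32_t  tv_nsec;
	uint32_t pad;	/* the quoted ioctl handler rejects non-zero pad */
};

#define NSEC_PER_SEC_SKETCH 1000000000LL

/* Split a non-negative nanosecond timestamp into seconds and nanoseconds. */
static struct sync_set_deadline_sketch deadline_from_ns(int64_t deadline_ns)
{
	struct sync_set_deadline_sketch d;

	assert(deadline_ns >= 0);
	memset(&d, 0, sizeof(d));	/* also clears pad */
	d.tv_sec = deadline_ns / NSEC_PER_SEC_SKETCH;
	d.tv_nsec = (int32_t)(deadline_ns % NSEC_PER_SEC_SKETCH);
	return d;
}
```

Zeroing the whole structure first matters because the quoted handler returns -EINVAL when pad is non-zero.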


Re: [PATCH 13/14] drm/kmb: Enable alpha blended second plane

2021-09-08 Thread Thomas Zimmermann

Hi

On 03.08.21 at 07:10, Sam Ravnborg wrote:
> Hi Anitha,
>
> On Mon, Aug 02, 2021 at 08:44:26PM +0000, Chrisanthus, Anitha wrote:
>> Hi Sam,
>> Thanks. Where should this go, drm-misc-fixes or drm-misc-next?
>
> Looks like a drm-misc-next candidate to me.
> It may improve something for existing users, but it does not look like it
> fixes an existing bug.

I found this patch in drm-misc-fixes, although it doesn't look like a
bugfix. It should have gone into drm-misc-next. See [1]. If it indeed
belongs in drm-misc-fixes, it certainly should have contained a Fixes tag.

Best regards
Thomas

[1] https://drm.pages.freedesktop.org/maintainer-tools/committer-drm-misc.html#where-do-i-apply-my-patch

>
>	Sam



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer



Re: [PATCH v3 5/9] drm/msm: Add deadline based boost support

2021-09-08 Thread Rob Clark
On Wed, Sep 8, 2021 at 10:48 AM Daniel Vetter  wrote:
>
> On Fri, Sep 03, 2021 at 11:47:56AM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Signed-off-by: Rob Clark 
>
> Why do you need a kthread_work here? Is this just to make sure you're
> running at realtime prio? Maybe a comment to that effect would be good.

Mostly because we are already using a kthread_worker for things the
GPU needs to kick off to a different context.. but I think this is
something we'd want at a realtime prio

BR,
-R
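
The timing decision in the quoted patch below (arm a timer 3 ms ahead of the deadline, or queue the boost work right away if that point has already passed) can be sketched as a pure function. This is an illustrative userspace model only: struct boost_plan and plan_boost are hypothetical names, and plain nanosecond integers stand in for ktime_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BOOST_LEAD_NS (3 * 1000000LL)	/* 3 ms, as in the quoted patch */

struct boost_plan {
	bool immediate;			/* queue the boost work now */
	int64_t timer_expiry_ns;	/* absolute expiry, valid if !immediate */
};

static struct boost_plan plan_boost(int64_t now_ns, int64_t deadline_ns)
{
	struct boost_plan plan;
	int64_t trigger_ns;

	/* This sketch assumes non-negative monotonic timestamps. */
	assert(now_ns >= 0 && deadline_ns >= 0);

	trigger_ns = deadline_ns - BOOST_LEAD_NS;

	/* Mirrors ktime_after(now, deadline - 3ms) in the patch. */
	plan.immediate = now_ns > trigger_ns;
	plan.timer_expiry_ns = plan.immediate ? 0 : trigger_ns;
	return plan;
}
```

Keeping the decision separate from the timer and worker plumbing makes the boundary case easy to check: when now equals the trigger point exactly, the timer path is taken.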

> -Daniel
>
> > ---
> >  drivers/gpu/drm/msm/msm_fence.c   | 76 +++
> >  drivers/gpu/drm/msm/msm_fence.h   | 20 +++
> >  drivers/gpu/drm/msm/msm_gpu.h |  1 +
> >  drivers/gpu/drm/msm/msm_gpu_devfreq.c | 20 +++
> >  4 files changed, 117 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/msm/msm_fence.c 
> > b/drivers/gpu/drm/msm/msm_fence.c
> > index f2cece542c3f..67c2a96e1c85 100644
> > --- a/drivers/gpu/drm/msm/msm_fence.c
> > +++ b/drivers/gpu/drm/msm/msm_fence.c
> > @@ -8,6 +8,37 @@
> >
> >  #include "msm_drv.h"
> >  #include "msm_fence.h"
> > +#include "msm_gpu.h"
> > +
> > +static inline bool fence_completed(struct msm_fence_context *fctx, 
> > uint32_t fence);
> > +
> > +static struct msm_gpu *fctx2gpu(struct msm_fence_context *fctx)
> > +{
> > + struct msm_drm_private *priv = fctx->dev->dev_private;
> > + return priv->gpu;
> > +}
> > +
> > +static enum hrtimer_restart deadline_timer(struct hrtimer *t)
> > +{
> > + struct msm_fence_context *fctx = container_of(t,
> > + struct msm_fence_context, deadline_timer);
> > +
> > + kthread_queue_work(fctx2gpu(fctx)->worker, &fctx->deadline_work);
> > +
> > + return HRTIMER_NORESTART;
> > +}
> > +
> > +static void deadline_work(struct kthread_work *work)
> > +{
> > + struct msm_fence_context *fctx = container_of(work,
> > + struct msm_fence_context, deadline_work);
> > +
> > + /* If deadline fence has already passed, nothing to do: */
> > + if (fence_completed(fctx, fctx->next_deadline_fence))
> > + return;
> > +
> > + msm_devfreq_boost(fctx2gpu(fctx), 2);
> > +}
> >
> >
> >  struct msm_fence_context *
> > @@ -26,6 +57,13 @@ msm_fence_context_alloc(struct drm_device *dev, volatile 
> > uint32_t *fenceptr,
> >   fctx->fenceptr = fenceptr;
> >   spin_lock_init(&fctx->spinlock);
> >
> > + hrtimer_init(&fctx->deadline_timer, CLOCK_MONOTONIC, 
> > HRTIMER_MODE_ABS);
> > + fctx->deadline_timer.function = deadline_timer;
> > +
> > + kthread_init_work(&fctx->deadline_work, deadline_work);
> > +
> > + fctx->next_deadline = ktime_get();
> > +
> >   return fctx;
> >  }
> >
> > @@ -49,6 +87,8 @@ void msm_update_fence(struct msm_fence_context *fctx, 
> > uint32_t fence)
> >  {
> >   spin_lock(&fctx->spinlock);
> >   fctx->completed_fence = max(fence, fctx->completed_fence);
> > + if (fence_completed(fctx, fctx->next_deadline_fence))
> > + hrtimer_cancel(&fctx->deadline_timer);
> >   spin_unlock(&fctx->spinlock);
> >  }
> >
> > @@ -79,10 +119,46 @@ static bool msm_fence_signaled(struct dma_fence *fence)
> >   return fence_completed(f->fctx, f->base.seqno);
> >  }
> >
> > +static void msm_fence_set_deadline(struct dma_fence *fence, ktime_t 
> > deadline)
> > +{
> > + struct msm_fence *f = to_msm_fence(fence);
> > + struct msm_fence_context *fctx = f->fctx;
> > + unsigned long flags;
> > + ktime_t now;
> > +
> > + spin_lock_irqsave(&fctx->spinlock, flags);
> > + now = ktime_get();
> > +
> > + if (ktime_after(now, fctx->next_deadline) ||
> > + ktime_before(deadline, fctx->next_deadline)) {
> > + fctx->next_deadline = deadline;
> > + fctx->next_deadline_fence =
> > + max(fctx->next_deadline_fence, 
> > (uint32_t)fence->seqno);
> > +
> > + /*
> > +  * Set timer to trigger boost 3ms before deadline, or
> > +  * if we are already less than 3ms before the deadline
> > +  * schedule boost work immediately.
> > +  */
> > + deadline = ktime_sub(deadline, ms_to_ktime(3));
> > +
> > + if (ktime_after(now, deadline)) {
> > + kthread_queue_work(fctx2gpu(fctx)->worker,
> > + &fctx->deadline_work);
> > + } else {
> > + hrtimer_start(&fctx->deadline_timer, deadline,
> > + HRTIMER_MODE_ABS);
> > + }
> > + }
> > +
> > + spin_unlock_irqrestore(&fctx->spinlock, flags);
> > +}
> > +
> >  static const struct dma_fence_ops msm_fence_ops = {
> >   .get_driver_name = msm_fence_get_driver_name,
> >   .get_timeline_name = msm_fence_get_timeline_name,
> >   .signaled = msm_fence_signaled,
> > + .set_deadline = msm_fence_set_deadline,
> >  };
> >
> >  struct dma_fence *
> > diff --git a/d

Re: [PATCH v3 7/9] dma-buf/fence-chain: Add fence deadline support

2021-09-08 Thread Daniel Vetter
On Fri, Sep 03, 2021 at 11:47:58AM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> Signed-off-by: Rob Clark 
> ---
>  drivers/dma-buf/dma-fence-chain.c | 13 +
>  1 file changed, 13 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-fence-chain.c 
> b/drivers/dma-buf/dma-fence-chain.c
> index 1b4cb3e5cec9..736a9ad3ea6d 100644
> --- a/drivers/dma-buf/dma-fence-chain.c
> +++ b/drivers/dma-buf/dma-fence-chain.c
> @@ -208,6 +208,18 @@ static void dma_fence_chain_release(struct dma_fence 
> *fence)
>   dma_fence_free(fence);
>  }
>  
> +
> +static void dma_fence_chain_set_deadline(struct dma_fence *fence,
> +  ktime_t deadline)
> +{
> + dma_fence_chain_for_each(fence, fence) {
> + struct dma_fence_chain *chain = to_dma_fence_chain(fence);
> + struct dma_fence *f = chain ? chain->fence : fence;

Doesn't this just end up calling set_deadline on a chain, potentially
resulting in recursion? Also I don't think this should ever happen, why
did you add that?
-Daniel

> +
> + dma_fence_set_deadline(f, deadline);
> + }
> +}
> +
>  const struct dma_fence_ops dma_fence_chain_ops = {
>   .use_64bit_seqno = true,
>   .get_driver_name = dma_fence_chain_get_driver_name,
> @@ -215,6 +227,7 @@ const struct dma_fence_ops dma_fence_chain_ops = {
>   .enable_signaling = dma_fence_chain_enable_signaling,
>   .signaled = dma_fence_chain_signaled,
>   .release = dma_fence_chain_release,
> + .set_deadline = dma_fence_chain_set_deadline,
>  };
>  EXPORT_SYMBOL(dma_fence_chain_ops);
>  
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v3 1/9] dma-fence: Add deadline awareness

2021-09-08 Thread Daniel Vetter
On Fri, Sep 03, 2021 at 11:47:52AM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> Add a way to hint to the fence signaler of an upcoming deadline, such as
> vblank, which the fence waiter would prefer not to miss.  This is to aid
> the fence signaler in making power management decisions, like boosting
> frequency as the deadline approaches and awareness of missing deadlines
> so that can be factored in to the frequency scaling.
> 
> v2: Drop dma_fence::deadline and related logic to filter duplicate
> deadlines, to avoid increasing dma_fence size.  The fence-context
> implementation will need similar logic to track deadlines of all
> the fences on the same timeline.  [ckoenig]
> 
> Signed-off-by: Rob Clark 
> Reviewed-by: Christian König 
> Signed-off-by: Rob Clark 
> ---
>  drivers/dma-buf/dma-fence.c | 20 
>  include/linux/dma-fence.h   | 16 
>  2 files changed, 36 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
> index ce0f5eff575d..1f444863b94d 100644
> --- a/drivers/dma-buf/dma-fence.c
> +++ b/drivers/dma-buf/dma-fence.c
> @@ -910,6 +910,26 @@ dma_fence_wait_any_timeout(struct dma_fence **fences, 
> uint32_t count,
>  }
>  EXPORT_SYMBOL(dma_fence_wait_any_timeout);
>  
> +
> +/**
> + * dma_fence_set_deadline - set desired fence-wait deadline
> + * @fence:the fence that is to be waited on
> + * @deadline: the time by which the waiter hopes for the fence to be
> + *signaled
> + *
> + * Inform the fence signaler of an upcoming deadline, such as vblank, by
> + * which point the waiter would prefer the fence to be signaled by.  This
> + * is intended to give feedback to the fence signaler to aid in power
> + * management decisions, such as boosting GPU frequency if a periodic
> + * vblank deadline is approaching.
> + */
> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
> +{
> + if (fence->ops->set_deadline && !dma_fence_is_signaled(fence))
> + fence->ops->set_deadline(fence, deadline);
> +}
> +EXPORT_SYMBOL(dma_fence_set_deadline);
> +
>  /**
>   * dma_fence_init - Initialize a custom fence.
>   * @fence: the fence to initialize
> diff --git a/include/linux/dma-fence.h b/include/linux/dma-fence.h
> index 6ffb4b2c6371..9c809f0d5d0a 100644
> --- a/include/linux/dma-fence.h
> +++ b/include/linux/dma-fence.h
> @@ -99,6 +99,7 @@ enum dma_fence_flag_bits {
>   DMA_FENCE_FLAG_SIGNALED_BIT,
>   DMA_FENCE_FLAG_TIMESTAMP_BIT,
>   DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> + DMA_FENCE_FLAG_HAS_DEADLINE_BIT,
>   DMA_FENCE_FLAG_USER_BITS, /* must always be last member */
>  };
>  
> @@ -261,6 +262,19 @@ struct dma_fence_ops {
>*/
>   void (*timeline_value_str)(struct dma_fence *fence,
>  char *str, int size);
> +
> + /**
> +  * @set_deadline:
> +  *
> +  * Callback to allow a fence waiter to inform the fence signaler of an
> +  * upcoming deadline, such as vblank, by which point the waiter would
> +  * prefer the fence to be signaled by.  This is intended to give 
> feedback
> +  * to the fence signaler to aid in power management decisions, such as
> +  * boosting GPU frequency.

Please add here that this callback is called without &dma_fence.lock held,
and that locking is up to callers if they have some state to manage.

I realized that while scratching some heads over your later patches.
-Daniel

> +  *
> +  * This callback is optional.
> +  */
> + void (*set_deadline)(struct dma_fence *fence, ktime_t deadline);
>  };
>  
>  void dma_fence_init(struct dma_fence *fence, const struct dma_fence_ops *ops,
> @@ -586,6 +600,8 @@ static inline signed long dma_fence_wait(struct dma_fence 
> *fence, bool intr)
>   return ret < 0 ? ret : 0;
>  }
>  
> +void dma_fence_set_deadline(struct dma_fence *fence, ktime_t deadline);
> +
>  struct dma_fence *dma_fence_get_stub(void);
>  struct dma_fence *dma_fence_allocate_private_stub(void);
>  u64 dma_fence_context_alloc(unsigned num);
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
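
A small userspace model of the dispatch rule in the quoted dma_fence_set_deadline(): the hint is forwarded only when the fence's ops provide the optional callback and the fence has not signaled yet. The *_sketch types and record_deadline are hypothetical; as noted in the review, the core takes no lock around the callback, so an implementation with state to protect would need its own locking, which this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct fence_sketch;

struct fence_ops_sketch {
	/* Optional: may be NULL when the signaler ignores deadlines. */
	void (*set_deadline)(struct fence_sketch *f, int64_t deadline_ns);
};

struct fence_sketch {
	const struct fence_ops_sketch *ops;
	bool signaled;
	int64_t deadline_ns;
	unsigned int hint_count;	/* how often the callback ran */
};

/* Forward the hint only if it can still matter and someone listens. */
static void fence_set_deadline(struct fence_sketch *f, int64_t deadline_ns)
{
	assert(f != NULL && f->ops != NULL);
	if (f->ops->set_deadline && !f->signaled)
		f->ops->set_deadline(f, deadline_ns);
}

/* Example callback: remember the hint and count the invocation. */
static void record_deadline(struct fence_sketch *f, int64_t deadline_ns)
{
	f->deadline_ns = deadline_ns;
	f->hint_count++;
}

static const struct fence_ops_sketch ops_with_hint = {
	.set_deadline = record_deadline,
};

static const struct fence_ops_sketch ops_without_hint = {
	.set_deadline = NULL,
};
```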


Re: [PATCH v3 6/9] dma-buf/fence-array: Add fence deadline support

2021-09-08 Thread Daniel Vetter
On Fri, Sep 03, 2021 at 11:47:57AM -0700, Rob Clark wrote:
> From: Rob Clark 
> 
> Signed-off-by: Rob Clark 
> ---
>  drivers/dma-buf/dma-fence-array.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/dma-buf/dma-fence-array.c 
> b/drivers/dma-buf/dma-fence-array.c
> index d3fbd950be94..8d194b09ee3d 100644
> --- a/drivers/dma-buf/dma-fence-array.c
> +++ b/drivers/dma-buf/dma-fence-array.c
> @@ -119,12 +119,23 @@ static void dma_fence_array_release(struct dma_fence 
> *fence)
>   dma_fence_free(fence);
>  }
>  
> +static void dma_fence_array_set_deadline(struct dma_fence *fence,
> +  ktime_t deadline)
> +{
> + struct dma_fence_array *array = to_dma_fence_array(fence);
> + unsigned i;
> +
> + for (i = 0; i < array->num_fences; ++i)
> + dma_fence_set_deadline(array->fences[i], deadline);

Hm I wonder whether this can go wrong, and whether we need Christian's
massive fence iterator that I've seen flying around. If you nest these
things too much it could all go wrong I think. I looked at other users
which inspect dma_fence_array and none of them have a risk for unbounded
recursion.

Maybe check with Christian.
-Daniel


> +}
> +
>  const struct dma_fence_ops dma_fence_array_ops = {
>   .get_driver_name = dma_fence_array_get_driver_name,
>   .get_timeline_name = dma_fence_array_get_timeline_name,
>   .enable_signaling = dma_fence_array_enable_signaling,
>   .signaled = dma_fence_array_signaled,
>   .release = dma_fence_array_release,
> + .set_deadline = dma_fence_array_set_deadline,
>  };
>  EXPORT_SYMBOL(dma_fence_array_ops);
>  
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
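
To make the recursion concern above concrete, here is a hedged userspace sketch in which a container forwards a deadline to its children and reports how deep the forwarding went. struct node_sketch and forward_deadline are hypothetical; whether nested containers actually occur in practice is exactly the open question in the review and is not settled by this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Either a leaf fence or a container of further nodes. */
struct node_sketch {
	bool is_array;
	int64_t deadline_ns;		/* used by leaves only */
	struct node_sketch **children;	/* used by containers only */
	size_t num_children;
};

/*
 * Forward the deadline to every leaf below 'n' and return the deepest
 * call level reached. 'depth' is the level of the current call.
 */
static unsigned int forward_deadline(struct node_sketch *n,
				     int64_t deadline_ns, unsigned int depth)
{
	unsigned int deepest = depth;
	size_t i;

	assert(n != NULL);
	if (!n->is_array) {
		n->deadline_ns = deadline_ns;
		return deepest;
	}

	for (i = 0; i < n->num_children; i++) {
		unsigned int sub = forward_deadline(n->children[i],
						    deadline_ns, depth + 1);
		if (sub > deepest)
			deepest = sub;
	}
	return deepest;
}
```

The call depth grows by one per level of nesting, so the stack use is bounded only if the nesting is.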


Re: [PATCH] drm/rockchip: Update crtc fixup to account for fractional clk change

2021-09-08 Thread Andy Shevchenko
On Wed, Sep 08, 2021 at 08:53:56AM -0500, Chris Morgan wrote:
> From: Chris Morgan 
> 
> After commit 928f9e268611 ("clk: fractional-divider: Hide
> clk_fractional_divider_ops from wide audience") was merged it appears
> that the DSI panel on my Odroid Go Advance stopped working. Upon closer
> examination of the problem, it looks like the fixup in the
> rockchip_drm_vop.c file was causing the issue. The changes made to the
> clk driver appear to change some assumptions made in the fixup.
> 
> After debugging the working 5.14 kernel and the no-longer working
> 5.15 kernel, it looks like this was broken all along but still
> worked, whereas after the fractional clock change it stopped
> working despite the issue (it went from sort-of broken to very broken).
> 
> In the 5.14 kernel the dclk_vopb_frac was being requested to be set to
> 17000999 on my board. The clock driver was taking the value of the
> parent clock and attempting to divide the requested value from it
> (17000000/17000999 = 0), then subtracting 1 from it (making it -1),
> and running it through fls_long to get 64. It would then subtract
> the value of fd->mwidth from it to get 48, and then bit shift
> 17000999 to the left by 48, coming up with a very large number of
> 7649082492112076800. This resulted in a numerator of 65535 and a
> denominator of 1 from the clk driver. The driver seemingly would
> try again and get a correct 1:1 value later, and then move on.
> 
> Output from my 5.14 kernel (with some printfs for good measure):
> [2.830066] rockchip-drm display-subsystem: bound ff46.vop (ops 
> vop_component_ops)
> [2.839431] rockchip-drm display-subsystem: bound ff45.dsi (ops 
> dw_mipi_dsi_rockchip_ops)
> [2.855980] Clock is dclk_vopb_frac
> [2.856004] Scale 64, Rate 7649082492112076800, Oldrate 17000999, Parent 
> Rate 1700, Best Numerator 65535, Best Denominator 1, fd->mwidth 16
> [2.903529] Clock is dclk_vopb_frac
> [2.903556] Scale 0, Rate 1700, Oldrate 1700, Parent Rate 
> 1700, Best Numerator 1, Best Denominator 1, fd->mwidth 16
> [2.903579] Clock is dclk_vopb_frac
> [2.903583] Scale 0, Rate 1700, Oldrate 1700, Parent Rate 
> 1700, Best Numerator 1, Best Denominator 1, fd->mwidth 16
> 
> Contrast this with 5.15 after the clk change where the rate of 17000999
> was getting passed and resulted in numerators/denominators of 17001/
> 17000.
> 
> Output from my 5.15 kernel (with some printfs added for good measure):
> [2.817571] rockchip-drm display-subsystem: bound ff46.vop (ops 
> vop_component_ops)
> [2.826975] rockchip-drm display-subsystem: bound ff45.dsi (ops 
> dw_mipi_dsi_rockchip_ops)
> [2.843430] Rate 17000999, Parent Rate 1700, Best Numerator 17018, 
> Best Denominator 17017
> [2.891073] Rate 17001000, Parent Rate 1700, Best Numerator 17001, 
> Best Denominator 17000
> [2.891269] Rate 17001000, Parent Rate 1700, Best Numerator 17001, 
> Best Denominator 17000
> [2.891281] Rate 17001000, Parent Rate 1700, Best Numerator 17001, 
> Best Denominator 17000
> 
> After tracing through the code it appeared that this function here was
> adding a 999 to the requested frequency because of how the clk driver
> was rounding/accepting those frequencies. I believe after the changes
> made in the commit listed above the assumptions listed in this driver
> are no longer true. When I remove the + 999 from the driver the DSI
> panel begins to work again.
> 
> Output from my 5.15 kernel with 999 removed (printfs added):
> [2.852054] rockchip-drm display-subsystem: bound ff46.vop (ops 
> vop_component_ops)
> [2.864483] rockchip-drm display-subsystem: bound ff45.dsi (ops 
> dw_mipi_dsi_rockchip_ops)
> [2.880869] Clock is dclk_vopb_frac
> [2.880892] Rate 1700, Parent Rate 1700, Best Numerator 1, Best 
> Denominator 1
> [2.928521] Clock is dclk_vopb_frac
> [2.928551] Rate 1700, Parent Rate 1700, Best Numerator 1, Best 
> Denominator 1
> [2.928570] Clock is dclk_vopb_frac
> [2.928574] Rate 1700, Parent Rate 1700, Best Numerator 1, Best 
> Denominator 1
> 
> I have tested the change extensively on my Odroid Go Advance (Rockchip
> RK3326) and it appears to work well. However, this change will affect
> all Rockchip SoCs that use this driver so I believe further testing
> is warranted. Please note that without this change I can confirm
> at least all PX30s with DSI panels will stop working with the 5.15
> kernel.

To me it all makes a lot of sense, thank you for the deep analysis of the issue!
In any case I think we will need a Fixes tag to something (either one of the
clk-fractional-divider.c series or something preexisting).
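
For reference, the 5.14 arithmetic described above can be reproduced in
user space. This is only a sketch under stated assumptions: fls64_model()
and scale_rate() are made-up names rather than the clk driver's code, and
the parent rate of 17000000 used in the note below is an assumption (the
log above shows a shortened value). Any parent rate below the requested
rate gives the same result.

```c
#include <stdint.h>

/*
 * Position of the highest set bit, counting from 1; 0 when x == 0.
 * User-space stand-in for the kernel's fls_long() on a 64-bit machine.
 */
static unsigned int fls64_model(uint64_t x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/*
 * Model of the scaling step as described in the report.
 * rate must be non-zero. The computed scale is returned via scale_out.
 */
static uint64_t scale_rate(uint64_t rate, uint64_t parent_rate,
			   unsigned int mwidth, unsigned int *scale_out)
{
	/* parent_rate < rate gives a quotient of 0, so "- 1" wraps to all ones. */
	unsigned int scale = fls64_model(parent_rate / rate - 1);

	if (scale > mwidth)
		rate <<= scale - mwidth;	/* the upper bits are shifted out */

	*scale_out = scale;
	return rate;
}
```

With rate 17000999, parent rate 17000000 and mwidth 16 this gives a scale
of 64 and a shifted rate of 7649082492112076800, matching the first log
line. With rate equal to the parent rate the scale is 0 and the rate is
left alone.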

Anyway, FWIW,
Reviewed-by: Andy Shevchenko 

> Signed-off-by: Chris Morgan 
> ---
>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 21 +++--
>  1 file changed, 3 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c

Re: [PATCH v2 7/7] drm/gud: Add module parameter to control emulation: xrgb8888

2021-09-08 Thread Thomas Zimmermann

Hi

On 07.09.21 at 13:57, Noralf Trønnes wrote:

For devices that don't support XRGB give the user the ability to
choose what's most important: Color depth or frames per second.

Add an 'xrgb' module parameter to override the emulation format.

Assume the user wants full control if xrgb is set and don't set
DRM_CAP_DUMB_PREFERRED_DEPTH if RGB565 is supported (AFAIK only X.org
supports this).


More of a general statement: wouldn't it make more sense to auto-detect 
this entirely? The GUD protocol could order the list of supported 
formats by preference (maybe it does already). Or you could take the 
type of USB connection into account.


Additionally, xrgb is really a fall-back for lazy userspace 
programs, but userspace should do better IMHO.


Best regards
Thomas



Signed-off-by: Noralf Trønnes 
---
  drivers/gpu/drm/gud/gud_drv.c | 13 ++---
  1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/gud/gud_drv.c b/drivers/gpu/drm/gud/gud_drv.c
index 3f9d4b9a1e3d..60d27ee5ddbd 100644
--- a/drivers/gpu/drm/gud/gud_drv.c
+++ b/drivers/gpu/drm/gud/gud_drv.c
@@ -30,6 +30,10 @@
  
  #include "gud_internal.h"
  
+static int gud_xrgb;
+module_param_named(xrgb, gud_xrgb, int, 0644);
+MODULE_PARM_DESC(xrgb, "XRGB emulation format: GUD_PIXEL_FORMAT_* value, 
0=auto, -1=disable [default=auto]");
+
  /* Only used internally */
  static const struct drm_format_info gud_drm_format_r1 = {
.format = GUD_DRM_FORMAT_R1,
@@ -530,12 +534,12 @@ static int gud_probe(struct usb_interface *intf, const 
struct usb_device_id *id)
case DRM_FORMAT_RGB332:
fallthrough;
case DRM_FORMAT_RGB888:
-   if (!xrgb_emulation_format)
+   if (!gud_xrgb && !xrgb_emulation_format)
xrgb_emulation_format = info;
break;
case DRM_FORMAT_RGB565:
rgb565_supported = true;
-   if (!xrgb_emulation_format)
+   if (!gud_xrgb && !xrgb_emulation_format)
xrgb_emulation_format = info;
break;
case DRM_FORMAT_XRGB:
@@ -543,6 +547,9 @@ static int gud_probe(struct usb_interface *intf, const 
struct usb_device_id *id)
break;
}
  
+		if (gud_xrgb == formats_dev[i])
+   xrgb_emulation_format = info;
+
fmt_buf_size = drm_format_info_min_pitch(info, 0, 
drm->mode_config.max_width) *
   drm->mode_config.max_height;
max_buffer_size = max(max_buffer_size, fmt_buf_size);
@@ -559,7 +566,7 @@ static int gud_probe(struct usb_interface *intf, const 
struct usb_device_id *id)
}
  
  	/* Prefer speed over color depth */
-   if (rgb565_supported)
+   if (!gud_xrgb && rgb565_supported)
drm->mode_config.preferred_depth = 16;
  
  	if (!xrgb_supported && xrgb_emulation_format) {




--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer




Re: [PATCH v3 7/9] dma-buf/fence-chain: Add fence deadline support

2021-09-08 Thread Rob Clark
On Wed, Sep 8, 2021 at 10:54 AM Daniel Vetter  wrote:
>
> On Fri, Sep 03, 2021 at 11:47:58AM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > Signed-off-by: Rob Clark 
> > ---
> >  drivers/dma-buf/dma-fence-chain.c | 13 +
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/drivers/dma-buf/dma-fence-chain.c 
> > b/drivers/dma-buf/dma-fence-chain.c
> > index 1b4cb3e5cec9..736a9ad3ea6d 100644
> > --- a/drivers/dma-buf/dma-fence-chain.c
> > +++ b/drivers/dma-buf/dma-fence-chain.c
> > @@ -208,6 +208,18 @@ static void dma_fence_chain_release(struct dma_fence 
> > *fence)
> >   dma_fence_free(fence);
> >  }
> >
> > +
> > +static void dma_fence_chain_set_deadline(struct dma_fence *fence,
> > +  ktime_t deadline)
> > +{
> > + dma_fence_chain_for_each(fence, fence) {
> > + struct dma_fence_chain *chain = to_dma_fence_chain(fence);
> > + struct dma_fence *f = chain ? chain->fence : fence;
>
> Doesn't this just end up calling set_deadline on a chain, potentially
> resulting in recursion? Also I don't think this should ever happen, why
> did you add that?

Tbh the fence-chain was the part I was a bit fuzzy about, and the main
reason I added igt tests.  The iteration is similar to how, for ex,
dma_fence_chain_signaled() works, and according to the igt test it does
what was intended.
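
Roughly, the intent can be modelled in user space as below. This is a
sketch only, not the kernel code: fence_model, chain_node_model and both
helpers are made-up names, reference counting and the skipping of
signaled links are left out, and keeping the earliest deadline is an
assumption of the model. The point is that the walk is a flat loop over
the links that forwards the deadline to each wrapped fence, so nothing
recurses through the chain's own callback.

```c
#include <stddef.h>
#include <stdint.h>

struct fence_model {
	int64_t deadline_ns;	/* earliest deadline requested so far */
	int has_deadline;	/* non-zero once a deadline has been set */
};

struct chain_node_model {
	struct fence_model *fence;	/* fence wrapped by this link */
	struct chain_node_model *prev;	/* older link, NULL at the end */
};

/* Record the deadline on one fence, keeping the earliest one seen. */
static void fence_set_deadline(struct fence_model *f, int64_t deadline_ns)
{
	if (!f->has_deadline || deadline_ns < f->deadline_ns) {
		f->deadline_ns = deadline_ns;
		f->has_deadline = 1;
	}
}

/*
 * Walk the links iteratively, newest to oldest, and forward the deadline
 * to each wrapped fence. Returns the number of fences that were visited.
 */
static size_t chain_set_deadline(struct chain_node_model *head,
				 int64_t deadline_ns)
{
	size_t visited = 0;

	for (struct chain_node_model *c = head; c; c = c->prev) {
		if (c->fence) {
			fence_set_deadline(c->fence, deadline_ns);
			visited++;
		}
	}
	return visited;
}
```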

BR,
-R

> -Daniel
>
> > +
> > + dma_fence_set_deadline(f, deadline);
> > + }
> > +}
> > +
> >  const struct dma_fence_ops dma_fence_chain_ops = {
> >   .use_64bit_seqno = true,
> >   .get_driver_name = dma_fence_chain_get_driver_name,
> > @@ -215,6 +227,7 @@ const struct dma_fence_ops dma_fence_chain_ops = {
> >   .enable_signaling = dma_fence_chain_enable_signaling,
> >   .signaled = dma_fence_chain_signaled,
> >   .release = dma_fence_chain_release,
> > + .set_deadline = dma_fence_chain_set_deadline,
> >  };
> >  EXPORT_SYMBOL(dma_fence_chain_ops);
> >
> > --
> > 2.31.1
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [PATCH] doc: gpu: Add document describing buffer exchange

2021-09-08 Thread Daniel Vetter
On Sun, Sep 05, 2021 at 01:27:42PM +0100, Daniel Stone wrote:
> Since there's a lot of confusion around this, document both the rules
> and the best practice around negotiating, allocating, importing, and
> using buffers when crossing context/process/device/subsystem boundaries.
> 
> This ties up all of dmabuf, formats and modifiers, and their usage.
> 
> Signed-off-by: Daniel Stone 
> ---
> 
> This is just a quick first draft, inspired by:
>   https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3197#note_1048637
> 
> It's not complete or perfect, but I'm off to eat a roast then have a
> nice walk in the sun, so figured it'd be better to dash it off rather
> than let it rot on my hard drive.
> 
> 
>  .../gpu/exchanging-pixel-buffers.rst  | 285 ++

I think we should stuff this into the dma-buf.rst page instead of hiding
it in gpu?

Maybe then link to it from everywhere, so from the prime stuff in gpu,
and from whatever doc there is for the v4l import/export ioctls.

>  Documentation/gpu/index.rst   |   1 +
>  2 files changed, 286 insertions(+)
>  create mode 100644 Documentation/gpu/exchanging-pixel-buffers.rst
> 
> diff --git a/Documentation/gpu/exchanging-pixel-buffers.rst 
> b/Documentation/gpu/exchanging-pixel-buffers.rst
> new file mode 100644
> index ..75c4de13d5c8
> --- /dev/null
> +++ b/Documentation/gpu/exchanging-pixel-buffers.rst
> @@ -0,0 +1,285 @@
> +.. Copyright 2021 Collabora Ltd.
> +
> +
> +Exchanging pixel buffers
> +
> +
> +As originally designed, the Linux graphics subsystem had extremely limited
> +support for sharing pixel-buffer allocations between processes, devices, and
> +subsystems. Modern systems require extensive integration between all three
> +classes; this document details how applications and kernel subsystems should
> +approach this sharing for two-dimensional image data.
> +
> +It is written with reference to the DRM subsystem for GPU and display 
> devices,
> +V4L2 for media devices, and also to Vulkan, EGL and Wayland, for userspace
> +support, however any other subsystems should also follow this design and 
> advice.
> +
> +
> +Formats and modifiers
> +=
> +
> +Each buffer must have an underlying format. This format describes the data 
> which
> +can be stored and loaded for each pixel. Although each subsystem has its own
> +format descriptions (e.g. V4L2 and fbdev), the `DRM_FORMAT_*` tokens should 
> be
> +reused wherever possible, as they are the standard descriptions used for
> +interchange.
> +
> +Each `DRM_FORMAT_*` token describes the per-pixel data available, in terms of
> +the translation between one or more pixels in memory, and the color data
> +contained within that memory. The number and type of color channels are
> +described: whether they are RGB or YUV, integer or floating-point, the size
> +of each channel and their locations within the pixel memory, and the
> +relationship between color planes.
> +
> +For example, `DRM_FORMAT_ARGB` describes a format in which each pixel 
> has a
> +single 32-bit value in memory. Alpha, red, green, and blue color channels 
> are
> +available at 8-bit precision per channel, ordered respectively from most to
> +least significant bits in little-endian storage. As a more complex example,
> +`DRM_FORMAT_NV12` describes a format in which luma and chroma YUV samples are
> +stored in separate memory planes, where the chroma plane is stored at half 
> the
> +resolution in both dimensions (i.e. one U/V chroma sample is stored for each 
> 2x2
> +pixel grouping).
> +
> +Format modifiers describe a translation mechanism between these per-pixel 
> memory
> +samples, and the actual memory storage for the buffer. The most 
> straightforward
> +modifier is `DRM_FORMAT_MOD_LINEAR`, describing a scheme in which each pixel 
> has
> +contiguous storage beginning at (0,0); each pixel's location in memory will 
> be
> +`base + (y * stride) + (x * bpp)`. This is considered the baseline 
> interchange
> +format, and most convenient for CPU access.
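
As an aside, the linear addressing rule quoted above can be written out
as a small helper. This is only a sketch: linear_offset() is a made-up
name, stride is taken to be in bytes, and reading "bpp" as bytes per
pixel is an assumption on my part.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Byte offset of pixel (x, y) from the start of a linear buffer.
 * stride is the distance in bytes between the starts of two rows.
 */
static size_t linear_offset(uint32_t x, uint32_t y, size_t stride,
			    size_t bytes_per_pixel)
{
	return (size_t)y * stride + (size_t)x * bytes_per_pixel;
}
```

For a 4-byte-per-pixel format with a 256-byte stride, pixel (3,2) lands
at byte 2 * 256 + 3 * 4 = 524 from the base address.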
> +
> +Modern hardware employs much more sophisticated access mechanisms, typically
> +making use of tiled access and possibly also compression. For example, the
> +`DRM_FORMAT_MOD_VIVANTE_TILED` modifier describes memory storage where pixels
> +are stored in 4x4 blocks arranged in row-major ordering, i.e. the first tile 
> in
> +memory stores pixels (0,0) to (3,3) inclusive, and the second tile in memory
> +stores pixels (4,0) to (7,3) inclusive.
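
The 4x4 tiling described above can be sketched the same way. Assumptions
are flagged loudly here: tiled4x4_index() is a made-up name, the result is
a pixel index rather than a byte offset, the row width is rounded up to a
whole number of tiles, and the row-major order of pixels inside one tile
is my guess, since the text only specifies the order of the tiles
themselves.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Index (in pixels) of (x, y) in a buffer made of 4x4 tiles laid out in
 * row-major tile order. Pixels inside a tile are assumed row-major too.
 */
static size_t tiled4x4_index(uint32_t x, uint32_t y, uint32_t width)
{
	uint32_t tiles_per_row = (width + 3) / 4;	/* round up to whole tiles */
	size_t tile = (size_t)(y / 4) * tiles_per_row + x / 4;

	return tile * 16 + (size_t)(y % 4) * 4 + (x % 4);
}
```

For an 8-pixel-wide buffer, (0,0) to (3,3) cover indices 0 to 15 and
(4,0) to (7,3) cover indices 16 to 31, which matches the first and second
tile in the description.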
> +
> +Some modifiers may modify the number of memory buffers required to store the
> +data; for example, the `I915_FORMAT_MOD_Y_TILED_CCS` modifier adds a second
> +memory buffer to RGB formats in which it stores data about the status of 
> every
> +tile, notably including whether the tile is fully populated with pixel data, 
> or
> +can be expanded from a single solid color.
> +
> +These extended layouts are highly vendor-spe

[PATCH 1/2] drm/bridge: parade-ps8640: Use regmap APIs

2021-09-08 Thread Philip Chen
Replace the direct i2c access (i2c_smbus_* functions) with regmap APIs,
which will simplify future updates to the ps8640 driver.

Signed-off-by: Philip Chen 
---

 drivers/gpu/drm/bridge/parade-ps8640.c | 66 +++---
 1 file changed, 39 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c 
b/drivers/gpu/drm/bridge/parade-ps8640.c
index 685e9c38b2db..a16725dbf912 100644
--- a/drivers/gpu/drm/bridge/parade-ps8640.c
+++ b/drivers/gpu/drm/bridge/parade-ps8640.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -64,12 +65,29 @@ struct ps8640 {
struct drm_bridge *panel_bridge;
struct mipi_dsi_device *dsi;
struct i2c_client *page[MAX_DEVS];
+   struct regmap   *regmap[MAX_DEVS];
struct regulator_bulk_data supplies[2];
struct gpio_desc *gpio_reset;
struct gpio_desc *gpio_powerdown;
bool powered;
 };
 
+static const struct regmap_range ps8640_volatile_ranges[] = {
+   { .range_min = 0, .range_max = 0xff },
+};
+
+static const struct regmap_access_table ps8640_volatile_table = {
+   .yes_ranges = ps8640_volatile_ranges,
+   .n_yes_ranges = ARRAY_SIZE(ps8640_volatile_ranges),
+};
+
+static const struct regmap_config ps8640_regmap_config = {
+   .reg_bits = 8,
+   .val_bits = 8,
+   .volatile_table = &ps8640_volatile_table,
+   .cache_type = REGCACHE_NONE,
+};
+
 static inline struct ps8640 *bridge_to_ps8640(struct drm_bridge *e)
 {
return container_of(e, struct ps8640, bridge);
@@ -78,13 +96,13 @@ static inline struct ps8640 *bridge_to_ps8640(struct 
drm_bridge *e)
 static int ps8640_bridge_vdo_control(struct ps8640 *ps_bridge,
 const enum ps8640_vdo_control ctrl)
 {
-   struct i2c_client *client = ps_bridge->page[PAGE3_DSI_CNTL1];
-   u8 vdo_ctrl_buf[] = { VDO_CTL_ADD, ctrl };
+   struct regmap *map = ps_bridge->regmap[PAGE3_DSI_CNTL1];
+   u8 vdo_ctrl_buf[] = {VDO_CTL_ADD, ctrl};
int ret;
 
-   ret = i2c_smbus_write_i2c_block_data(client, PAGE3_SET_ADD,
-sizeof(vdo_ctrl_buf),
-vdo_ctrl_buf);
+   ret = regmap_bulk_write(map, PAGE3_SET_ADD,
+   vdo_ctrl_buf, sizeof(vdo_ctrl_buf));
+
if (ret < 0) {
DRM_ERROR("failed to %sable VDO: %d\n",
  ctrl == ENABLE ? "en" : "dis", ret);
@@ -96,8 +114,7 @@ static int ps8640_bridge_vdo_control(struct ps8640 
*ps_bridge,
 
 static void ps8640_bridge_poweron(struct ps8640 *ps_bridge)
 {
-   struct i2c_client *client = ps_bridge->page[PAGE2_TOP_CNTL];
-   unsigned long timeout;
+   struct regmap *map = ps_bridge->regmap[PAGE2_TOP_CNTL];
int ret, status;
 
if (ps_bridge->powered)
@@ -121,18 +138,12 @@ static void ps8640_bridge_poweron(struct ps8640 
*ps_bridge)
 */
msleep(200);
 
-   timeout = jiffies + msecs_to_jiffies(200) + 1;
+   ret = regmap_read_poll_timeout(map, PAGE2_GPIO_H, status,
+   status & PS_GPIO9, 20 * 1000, 200 * 1000);
 
-   while (time_is_after_jiffies(timeout)) {
-   status = i2c_smbus_read_byte_data(client, PAGE2_GPIO_H);
-   if (status < 0) {
-   DRM_ERROR("failed read PAGE2_GPIO_H: %d\n", status);
-   goto err_regulators_disable;
-   }
-   if ((status & PS_GPIO9) == PS_GPIO9)
-   break;
-
-   msleep(20);
+   if (ret < 0) {
+   DRM_ERROR("failed read PAGE2_GPIO_H: %d\n", ret);
+   goto err_regulators_disable;
}
 
msleep(50);
@@ -144,22 +155,15 @@ static void ps8640_bridge_poweron(struct ps8640 
*ps_bridge)
 * disabled by the manufacturer. Once disabled, all MCS commands are
 * ignored by the display interface.
 */
-   status = i2c_smbus_read_byte_data(client, PAGE2_MCS_EN);
-   if (status < 0) {
-   DRM_ERROR("failed read PAGE2_MCS_EN: %d\n", status);
-   goto err_regulators_disable;
-   }
 
-   ret = i2c_smbus_write_byte_data(client, PAGE2_MCS_EN,
-   status & ~MCS_EN);
+   ret = regmap_update_bits(map, PAGE2_MCS_EN, MCS_EN, 0);
if (ret < 0) {
DRM_ERROR("failed write PAGE2_MCS_EN: %d\n", ret);
goto err_regulators_disable;
}
 
/* Switch access edp panel's edid through i2c */
-   ret = i2c_smbus_write_byte_data(client, PAGE2_I2C_BYPASS,
-   I2C_BYPASS_EN);
+   ret = regmap_write(map, PAGE2_I2C_BYPASS, I2C_BYPASS_EN);
if (ret < 0) {
DRM_ERROR("failed write PAGE2_I2C_BYPASS: %d\n", ret);
goto err_regulators_disable;
@@ -361,6 +365,10 @@ static int ps8640_probe(struct i2c_client *client)

[PATCH 2/2] drm/bridge: parade-ps8640: Add support for AUX channel

2021-09-08 Thread Philip Chen
Implement the first version of AUX support, which will be useful as
we expand the driver to support varied use cases.

Signed-off-by: Philip Chen 
---

 drivers/gpu/drm/bridge/parade-ps8640.c | 123 +
 1 file changed, 123 insertions(+)

diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c 
b/drivers/gpu/drm/bridge/parade-ps8640.c
index a16725dbf912..3f0241a60357 100644
--- a/drivers/gpu/drm/bridge/parade-ps8640.c
+++ b/drivers/gpu/drm/bridge/parade-ps8640.c
@@ -9,15 +9,36 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
 
+#define PAGE0_AUXCH_CFG3   0x76
+#define  AUXCH_CFG3_RESET  0xff
+#define PAGE0_AUX_ADDR_7_0 0x7d
+#define PAGE0_AUX_ADDR_15_8 0x7e
+#define PAGE0_AUX_ADDR_23_16   0x7f
+#define  AUX_ADDR_19_16_MASK   GENMASK(3, 0)
+#define  AUX_CMD_MASK  GENMASK(7, 4)
+#define PAGE0_AUX_LENGTH   0x80
+#define  AUX_LENGTH_MASK   GENMASK(3, 0)
+#define PAGE0_AUX_WDATA 0x81
+#define PAGE0_AUX_RDATA 0x82
+#define PAGE0_AUX_CTRL 0x83
+#define  AUX_START 0x01
+#define PAGE0_AUX_STATUS   0x84
+#define  AUX_STATUS_MASK   GENMASK(7, 5)
+#define  AUX_STATUS_TIMEOUT (0x7 << 5)
+#define  AUX_STATUS_DEFER  (0x2 << 5)
+#define  AUX_STATUS_NACK   (0x1 << 5)
+
 #define PAGE2_GPIO_H   0xa7
 #define  PS_GPIO9  BIT(1)
 #define PAGE2_I2C_BYPASS   0xea
@@ -63,6 +84,7 @@ enum ps8640_vdo_control {
 struct ps8640 {
struct drm_bridge bridge;
struct drm_bridge *panel_bridge;
+   struct drm_dp_aux aux;
struct mipi_dsi_device *dsi;
struct i2c_client *page[MAX_DEVS];
struct regmap   *regmap[MAX_DEVS];
@@ -93,6 +115,102 @@ static inline struct ps8640 *bridge_to_ps8640(struct 
drm_bridge *e)
return container_of(e, struct ps8640, bridge);
 }
 
+static inline struct ps8640 *aux_to_ps8640(struct drm_dp_aux *aux)
+{
+   return container_of(aux, struct ps8640, aux);
+}
+
+static ssize_t ps8640_aux_transfer(struct drm_dp_aux *aux,
+  struct drm_dp_aux_msg *msg)
+{
+   struct ps8640 *ps_bridge = aux_to_ps8640(aux);
+   struct i2c_client *client = ps_bridge->page[PAGE0_DP_CNTL];
+   struct regmap *map = ps_bridge->regmap[PAGE0_DP_CNTL];
+   unsigned int len = msg->size;
+   unsigned int data;
+   int ret;
+   u8 request = msg->request &
+~(DP_AUX_I2C_MOT | DP_AUX_I2C_WRITE_STATUS_UPDATE);
+   u8 *buf = msg->buffer;
+   bool is_native_aux = false;
+
+   if (len > DP_AUX_MAX_PAYLOAD_BYTES)
+   return -EINVAL;
+
+   pm_runtime_get_sync(&client->dev);
+
+   switch (request) {
+   case DP_AUX_NATIVE_WRITE:
+   case DP_AUX_NATIVE_READ:
+   is_native_aux = true;
+   case DP_AUX_I2C_WRITE:
+   case DP_AUX_I2C_READ:
+   regmap_write(map, PAGE0_AUXCH_CFG3, AUXCH_CFG3_RESET);
+   break;
+   default:
+   ret = -EINVAL;
+   goto exit;
+   }
+
+   /* Assume it's good */
+   msg->reply = 0;
+
+   data = ((request << 4) & AUX_CMD_MASK) |
+  ((msg->address >> 16) & AUX_ADDR_19_16_MASK);
+   regmap_write(map, PAGE0_AUX_ADDR_23_16, data);
+   data = (msg->address >> 8) & 0xff;
+   regmap_write(map, PAGE0_AUX_ADDR_15_8, data);
+   data = msg->address & 0xff;
+   regmap_write(map, PAGE0_AUX_ADDR_7_0, msg->address & 0xff);
+
+   data = (len - 1) & AUX_LENGTH_MASK;
+   regmap_write(map, PAGE0_AUX_LENGTH, data);
+
+   if (request == DP_AUX_NATIVE_WRITE || request == DP_AUX_I2C_WRITE) {
+   ret = regmap_noinc_write(map, PAGE0_AUX_WDATA, buf, len);
+   if (ret < 0) {
+   DRM_ERROR("failed to write PAGE0_AUX_WDATA");
+   goto exit;
+   }
+   }
+
+   regmap_write(map, PAGE0_AUX_CTRL, AUX_START);
+
+   regmap_read(map, PAGE0_AUX_STATUS, &data);
+   switch (data & AUX_STATUS_MASK) {
+   case AUX_STATUS_DEFER:
+   if (is_native_aux)
+   msg->reply |= DP_AUX_NATIVE_REPLY_DEFER;
+   else
+   msg->reply |= DP_AUX_I2C_REPLY_DEFER;
+   goto exit;
+   case AUX_STATUS_NACK:
+   if (is_native_aux)
+   msg->reply |= DP_AUX_NATIVE_REPLY_NACK;
+   else
+   msg->reply |= DP_AUX_I2C_REPLY_NACK;
+   goto exit;
+   case AUX_STATUS_TIMEOUT:
+   ret = -ETIMEDOUT;
+   goto exit;
+   }
+
+   if (request == DP_AUX_NATIVE_READ || request == DP_AUX_I2C_READ) {
+   ret = regmap_noinc_read(map, PAGE0_AUX_RDATA, buf, len);
+   if (ret < 0)
+   DRM_ERROR("failed to read PAGE0_AUX_RDATA");
+   }
+
+exit:
+   pm_runtime_mark_last_busy

Re: [PATCH v3 8/9] dma-buf/sync_file: Add SET_DEADLINE ioctl

2021-09-08 Thread Rob Clark
On Wed, Sep 8, 2021 at 10:50 AM Daniel Vetter  wrote:
>
> On Fri, Sep 03, 2021 at 11:47:59AM -0700, Rob Clark wrote:
> > From: Rob Clark 
> >
> > The initial purpose is for igt tests, but this would also be useful for
> > compositors that wait until close to vblank deadline to make decisions
> > about which frame to show.
> >
> > Signed-off-by: Rob Clark 
>
> Needs userspace and I think ideally also some igts to make sure it works
> and doesn't go boom.

See cover-letter.. there are igt tests, although currently that is the
only user.

I'd be ok to otherwise initially restrict this and the sw_sync UABI
(CAP_SYS_ADMIN?  Or??) until there is a non-igt user, but they are
both needed by the igt tests
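
For anyone wanting to poke at it from userspace, a minimal sketch of a
caller follows. It assumes the uapi exactly as proposed in this patch,
which is not in any released header, so the struct and ioctl number are
redefined locally under made-up *_sketch names; sync_set_deadline() is a
hypothetical helper, and the clock the kernel compares the deadline
against is not settled by this sketch.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <time.h>

/* Mirrors the struct proposed in this patch; not in released headers. */
struct sync_set_deadline_sketch {
	int64_t  tv_sec;
	int32_t  tv_nsec;
	uint32_t pad;	/* must be zero, the patch rejects anything else */
};

#define SYNC_IOC_MAGIC_SKETCH	'>'
#define SYNC_IOC_SET_DEADLINE_SKETCH \
	_IOW(SYNC_IOC_MAGIC_SKETCH, 5, struct sync_set_deadline_sketch)

/*
 * Ask for the fence behind fence_fd to signal by the given time.
 * Returns the ioctl() result: 0 on success, -1 with errno set on error.
 */
static int sync_set_deadline(int fence_fd, const struct timespec *deadline)
{
	struct sync_set_deadline_sketch arg;

	memset(&arg, 0, sizeof(arg));	/* also clears pad */
	arg.tv_sec = deadline->tv_sec;
	arg.tv_nsec = (int32_t)deadline->tv_nsec;

	return ioctl(fence_fd, SYNC_IOC_SET_DEADLINE_SKETCH, &arg);
}
```

On a kernel without this patch the call is expected to fail, since
sync_file_ioctl() returns -ENOTTY for unknown commands.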

BR,
-R

> -Daniel
>
> > ---
> >  drivers/dma-buf/sync_file.c| 19 +++
> >  include/uapi/linux/sync_file.h | 20 
> >  2 files changed, 39 insertions(+)
> >
> > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > index 394e6e1e9686..f295772d5169 100644
> > --- a/drivers/dma-buf/sync_file.c
> > +++ b/drivers/dma-buf/sync_file.c
> > @@ -459,6 +459,22 @@ static long sync_file_ioctl_fence_info(struct 
> > sync_file *sync_file,
> >   return ret;
> >  }
> >
> > +static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
> > + unsigned long arg)
> > +{
> > + struct sync_set_deadline ts;
> > +
> > + if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
> > + return -EFAULT;
> > +
> > + if (ts.pad)
> > + return -EINVAL;
> > +
> > + dma_fence_set_deadline(sync_file->fence, ktime_set(ts.tv_sec, 
> > ts.tv_nsec));
> > +
> > + return 0;
> > +}
> > +
> >  static long sync_file_ioctl(struct file *file, unsigned int cmd,
> >   unsigned long arg)
> >  {
> > @@ -471,6 +487,9 @@ static long sync_file_ioctl(struct file *file, unsigned 
> > int cmd,
> >   case SYNC_IOC_FILE_INFO:
> >   return sync_file_ioctl_fence_info(sync_file, arg);
> >
> > + case SYNC_IOC_SET_DEADLINE:
> > + return sync_file_ioctl_set_deadline(sync_file, arg);
> > +
> >   default:
> >   return -ENOTTY;
> >   }
> > diff --git a/include/uapi/linux/sync_file.h b/include/uapi/linux/sync_file.h
> > index ee2dcfb3d660..f67d4ffe7566 100644
> > --- a/include/uapi/linux/sync_file.h
> > +++ b/include/uapi/linux/sync_file.h
> > @@ -67,6 +67,18 @@ struct sync_file_info {
> >   __u64   sync_fence_info;
> >  };
> >
> > +/**
> > + * struct sync_set_deadline - set a deadline on a fence
> > + * @tv_sec:  seconds elapsed since epoch
> > + * @tv_nsec: nanoseconds elapsed since the time given by the tv_sec
> > + * @pad: must be zero
> > + */
> > +struct sync_set_deadline {
> > + __s64   tv_sec;
> > + __s32   tv_nsec;
> > + __u32   pad;
> > +};
> > +
> >  #define SYNC_IOC_MAGIC   '>'
> >
> >  /**
> > @@ -95,4 +107,12 @@ struct sync_file_info {
> >   */
> >  #define SYNC_IOC_FILE_INFO   _IOWR(SYNC_IOC_MAGIC, 4, struct 
> > sync_file_info)
> >
> > +
> > +/**
> > + * DOC: SYNC_IOC_SET_DEADLINE - set a deadline on a fence
> > + *
> > + * Allows userspace to set a deadline on a fence, see 
> > dma_fence_set_deadline()
> > + */
> > +#define SYNC_IOC_SET_DEADLINE _IOW(SYNC_IOC_MAGIC, 5, struct 
> > sync_set_deadline)
> > +
> >  #endif /* _UAPI_LINUX_SYNC_H */
> > --
> > 2.31.1
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch


Re: [PATCH 1/2] drm/nouveau/ga102-: support ttm buffer moves via copy engine

2021-09-08 Thread Daniel Vetter
On Mon, Sep 06, 2021 at 10:56:27AM +1000, Ben Skeggs wrote:
> From: Ben Skeggs 
> 
> We don't currently have any kind of real acceleration on Ampere GPUs,
> but the TTM memcpy() fallback paths aren't really designed to handle
> copies between different devices, such as on Optimus systems, and
> result in a kernel OOPS.

Is this just for moving a buffer from vram to system memory when you pin
it for dma-buf? I'm kinda lost what you even use ttm bo moves for if
there's no one using the gpu.

Also I guess memcpy goes boom if you can't mmap it because it's outside
the gart? Or just that it's very slow. We're trying to use ttm memcpy as
fallback, so want to know how this can all go wrong :-)
-Daniel

> 
> A few options were investigated to try and fix this, but didn't work
> out, and likely would have resulted in a very unpleasant experience
> for users anyway.
> 
> This commit adds just enough support for setting up a single channel
> connected to a copy engine, which the kernel can use to accelerate
> the buffer copies between devices.  Userspace has no access to this
> incomplete channel support, but it's suitable for TTM's needs.
> 
> A more complete implementation of host(fifo) for Ampere GPUs is in
> the works, but the required changes are too invasive to be suitable
> for backporting to fix this issue on current kernels.
> 
> Signed-off-by: Ben Skeggs 
> Cc: Lyude Paul 
> Cc: Karol Herbst 
> Cc:  # v5.12+
> ---
>  drivers/gpu/drm/nouveau/include/nvif/class.h  |   2 +
>  .../drm/nouveau/include/nvkm/engine/fifo.h|   1 +
>  drivers/gpu/drm/nouveau/nouveau_bo.c  |   1 +
>  drivers/gpu/drm/nouveau/nouveau_chan.c|   6 +-
>  drivers/gpu/drm/nouveau/nouveau_drm.c |   4 +
>  drivers/gpu/drm/nouveau/nv84_fence.c  |   2 +-
>  .../gpu/drm/nouveau/nvkm/engine/device/base.c |   3 +
>  .../gpu/drm/nouveau/nvkm/engine/fifo/Kbuild   |   1 +
>  .../gpu/drm/nouveau/nvkm/engine/fifo/ga102.c  | 308 ++
>  .../gpu/drm/nouveau/nvkm/subdev/top/ga100.c   |   7 +-
>  10 files changed, 329 insertions(+), 6 deletions(-)
>  create mode 100644 drivers/gpu/drm/nouveau/nvkm/engine/fifo/ga102.c
> 
> diff --git a/drivers/gpu/drm/nouveau/include/nvif/class.h 
> b/drivers/gpu/drm/nouveau/include/nvif/class.h
> index c68cc957248e..a582c0cb0cb0 100644
> --- a/drivers/gpu/drm/nouveau/include/nvif/class.h
> +++ b/drivers/gpu/drm/nouveau/include/nvif/class.h
> @@ -71,6 +71,7 @@
>  #define PASCAL_CHANNEL_GPFIFO_A   /* cla06f.h */ 
> 0xc06f
>  #define VOLTA_CHANNEL_GPFIFO_A/* clc36f.h */ 
> 0xc36f
>  #define TURING_CHANNEL_GPFIFO_A   /* clc36f.h */ 
> 0xc46f
> +#define AMPERE_CHANNEL_GPFIFO_B   /* clc36f.h */ 
> 0xc76f
>  
>  #define NV50_DISP /* cl5070.h */ 
> 0x5070
>  #define G82_DISP  /* cl5070.h */ 
> 0x8270
> @@ -200,6 +201,7 @@
>  #define PASCAL_DMA_COPY_B
> 0xc1b5
>  #define VOLTA_DMA_COPY_A 
> 0xc3b5
>  #define TURING_DMA_COPY_A
> 0xc5b5
> +#define AMPERE_DMA_COPY_B
> 0xc7b5
>  
>  #define FERMI_DECOMPRESS 
> 0x90b8
>  
> diff --git a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h 
> b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> index 54fab7cc36c1..64ee82c7c1be 100644
> --- a/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> +++ b/drivers/gpu/drm/nouveau/include/nvkm/engine/fifo.h
> @@ -77,4 +77,5 @@ int gp100_fifo_new(struct nvkm_device *, enum 
> nvkm_subdev_type, int inst, struct
>  int gp10b_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> struct nvkm_fifo **);
>  int gv100_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> struct nvkm_fifo **);
>  int tu102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> struct nvkm_fifo **);
> +int ga102_fifo_new(struct nvkm_device *, enum nvkm_subdev_type, int inst, 
> struct nvkm_fifo **);
>  #endif
> diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
> b/drivers/gpu/drm/nouveau/nouveau_bo.c
> index 4a7cebac8060..b3e4f555fa05 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_bo.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
> @@ -844,6 +844,7 @@ nouveau_bo_move_init(struct nouveau_drm *drm)
>   struct ttm_resource *, struct ttm_resource *);
>   int (*init)(struct nouveau_channel *, u32 handle);
>   } _methods[] = {
> + {  "COPY", 4, 0xc7b5, nve0_bo_move_copy, nve0_bo_move_init },
>   {  "COPY", 4, 0xc5b5, nve0_bo_move_copy, nve0_bo_move_init },
>   {  "GRCE", 0, 0xc5b5, nve0_bo_move_copy, nvc0_bo_move_init },
>   {  "COPY", 4, 0xc3b5, nve0_bo_move_copy, nve0_bo_move_i

Re: [PATCH] drm: mxsfb: Fix NULL pointer dereference crash on unload

2021-09-08 Thread Daniel Vetter
On Tue, Sep 07, 2021 at 04:49:00AM +0200, Marek Vasut wrote:
> The mxsfb->crtc.funcs may already be NULL when unloading the driver,
> in which case calling mxsfb_irq_disable() via drm_irq_uninstall() from
> mxsfb_unload() leads to NULL pointer dereference.
> 
> Since all we care about is masking the IRQ and mxsfb->base is still
> valid, just use that to clear and mask the IRQ.
> 
> Fixes: ae1ed00932819 ("drm: mxsfb: Stop using DRM simple display pipeline 
> helper")
> Signed-off-by: Marek Vasut 
> Cc: Daniel Abrecht 
> Cc: Emil Velikov 
> Cc: Laurent Pinchart 
> Cc: Sam Ravnborg 
> Cc: Stefan Agner 

You probably want a drm_atomic_helper_shutdown instead of trying to do all
that manually. We've also added a bunch more devm and drmm_ functions to
automate the cleanup a lot more here, e.g. your drm_mode_config_cleanup is
in the wrong place.

Also I'm confused because I'm not even seeing this function anywhere in
upstream.
-Daniel

> ---
>  drivers/gpu/drm/mxsfb/mxsfb_drv.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/mxsfb/mxsfb_drv.c 
> b/drivers/gpu/drm/mxsfb/mxsfb_drv.c
> index ec0432fe1bdf8..86d78634a9799 100644
> --- a/drivers/gpu/drm/mxsfb/mxsfb_drv.c
> +++ b/drivers/gpu/drm/mxsfb/mxsfb_drv.c
> @@ -173,7 +173,11 @@ static void mxsfb_irq_disable(struct drm_device *drm)
>   struct mxsfb_drm_private *mxsfb = drm->dev_private;
>  
>   mxsfb_enable_axi_clk(mxsfb);
> - mxsfb->crtc.funcs->disable_vblank(&mxsfb->crtc);
> +
> + /* Disable and clear VBLANK IRQ */
> + writel(CTRL1_CUR_FRAME_DONE_IRQ_EN, mxsfb->base + LCDC_CTRL1 + REG_CLR);
> + writel(CTRL1_CUR_FRAME_DONE_IRQ, mxsfb->base + LCDC_CTRL1 + REG_CLR);
> +
>   mxsfb_disable_axi_clk(mxsfb);
>  }
>  
> -- 
> 2.33.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [resend PATCH] drm/ttm: Fix a deadlock if the target BO is not idle during swap

2021-09-08 Thread Daniel Vetter
On Tue, Sep 07, 2021 at 11:28:23AM +0200, Christian König wrote:
> Am 07.09.21 um 11:05 schrieb Daniel Vetter:
> > On Tue, Sep 07, 2021 at 08:22:20AM +0200, Christian König wrote:
> > > Added a Fixes tag and pushed this to drm-misc-fixes.
> > We're in the merge window, this should have been drm-misc-next-fixes. I'll
> > poke misc maintainers so it's not lost.
> 
> Hui? It's a fix for a problem in stable and not in drm-misc-next.

Ah the flow chart is confusing. There is no current -rc, so it's always
-next-fixes. Or you're running the risk that it's lost until after -rc1.
Maybe we should clarify that "is the bug in current -rc?" only applies if
there is a current -rc.

Anyway Thomas sent out a pr, so it's all good.
-Daniel

> 
> Christian.
> 
> > -Daniel
> > 
> > > It will take a while until it cycles back into the development branches, 
> > > so
> > > feel free to push some version to amd-staging-drm-next as well. Just ping
> > > Alex when you do this.
> > > 
> > > Thanks,
> > > Christian.
> > > 
> > > Am 07.09.21 um 06:08 schrieb xinhui pan:
> > > > The ret value might be -EBUSY, caller will think lru lock is still
> > > > locked but actually NOT. So return -ENOSPC instead. Otherwise we hit
> > > > list corruption.
> > > > 
> > > > ttm_bo_cleanup_refs might fail too if BO is not idle. If we return 0,
> > > > caller(ttm_tt_populate -> ttm_global_swapout ->ttm_device_swapout) will
> > > > be stuck as we actually did not free any BO memory. This usually happens
> > > > when the fence is not signaled for a long time.
> > > > 
> > > > Signed-off-by: xinhui pan 
> > > > Reviewed-by: Christian König 
> > > > ---
> > > >drivers/gpu/drm/ttm/ttm_bo.c | 6 +++---
> > > >1 file changed, 3 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > index 8d7fd65ccced..23f906941ac9 100644
> > > > --- a/drivers/gpu/drm/ttm/ttm_bo.c
> > > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> > > > @@ -1152,9 +1152,9 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, 
> > > > struct ttm_operation_ctx *ctx,
> > > > }
> > > > if (bo->deleted) {
> > > > -   ttm_bo_cleanup_refs(bo, false, false, locked);
> > > > +   ret = ttm_bo_cleanup_refs(bo, false, false, locked);
> > > > ttm_bo_put(bo);
> > > > -   return 0;
> > > > +   return ret == -EBUSY ? -ENOSPC : ret;
> > > > }
> > > > ttm_bo_del_from_lru(bo);
> > > > @@ -1208,7 +1208,7 @@ int ttm_bo_swapout(struct ttm_buffer_object *bo, 
> > > > struct ttm_operation_ctx *ctx,
> > > > if (locked)
> > > > dma_resv_unlock(bo->base.resv);
> > > > ttm_bo_put(bo);
> > > > -   return ret;
> > > > +   return ret == -EBUSY ? -ENOSPC : ret;
> > > >}
> > > >void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] kernel/locking: Add context to ww_mutex_trylock.

2021-09-08 Thread Daniel Vetter
On Wed, Sep 08, 2021 at 12:14:23PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 07, 2021 at 03:20:44PM +0200, Maarten Lankhorst wrote:
> > i915 will soon gain an eviction path that trylock a whole lot of locks
> > for eviction, getting dmesg failures like below:
> > 
> > BUG: MAX_LOCK_DEPTH too low!
> > turning off the locking correctness validator.
> > depth: 48  max: 48!
> > 48 locks held by i915_selftest/5776:
> >  #0: 888101a79240 (&dev->mutex){}-{3:3}, at: 
> > __driver_attach+0x88/0x160
> >  #1: c99778c0 (reservation_ww_class_acquire){+.+.}-{0:0}, at: 
> > i915_vma_pin.constprop.63+0x39/0x1b0 [i915]
> >  #2: 88800cf74de8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> > i915_vma_pin.constprop.63+0x5f/0x1b0 [i915]
> >  #3: 88810c7f9e38 (&vm->mutex/1){+.+.}-{3:3}, at: 
> > i915_vma_pin_ww+0x1c4/0x9d0 [i915]
> >  #4: 88810bad5768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> > i915_gem_evict_something+0x110/0x860 [i915]
> >  #5: 88810bad60e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> > i915_gem_evict_something+0x110/0x860 [i915]
> > ...
> >  #46: 88811964d768 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> > i915_gem_evict_something+0x110/0x860 [i915]
> >  #47: 88811964e0e8 (reservation_ww_class_mutex){+.+.}-{3:3}, at: 
> > i915_gem_evict_something+0x110/0x860 [i915]
> > INFO: lockdep is turned off.
> 
> > As an intermediate solution, add an acquire context to ww_mutex_trylock,
> > which allows us to do proper nesting annotations on the trylocks, making
> > the above lockdep splat disappear.
> 
> Fair enough I suppose.

What's maybe missing from the commit message
- we'll probably use this for ttm too eventually
- even when we add full ww_mutex locking we'll still have the trylock
  fastpath. This is because we have a lock inversion against list locks in
  these eviction paths, and the slow path unroll to drop that list lock is
  a bit nasty (and definitely expensive).

iow even long term this here is needed in some form I think.
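
To illustrate the trylock fastpath point, here is a minimal user-space sketch. It assumes a toy lock built on C11 atomic_flag rather than ww_mutex, and every toy_* name is hypothetical, not kernel API. The walk never blocks on an object lock, which is what keeps the list-lock inversion out of the picture:

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object guarded by its own lock. */
struct toy_obj {
	atomic_flag locked;	/* set while some owner holds the object */
	int evicted;		/* set once toy_evict() reclaimed it */
};

static void toy_init(struct toy_obj *obj)
{
	atomic_flag_clear(&obj->locked);
	obj->evicted = 0;
}

/* Non-blocking acquire: returns 1 on success, 0 if already held. */
static int toy_trylock(struct toy_obj *obj)
{
	return !atomic_flag_test_and_set(&obj->locked);
}

static void toy_unlock(struct toy_obj *obj)
{
	atomic_flag_clear(&obj->locked);
}

/*
 * Walk the candidates as if a list lock were held. Blocking on an
 * object lock here would invert the lock order, so contended objects
 * are skipped instead of waited for.
 */
static size_t toy_evict(struct toy_obj *objs, size_t count)
{
	size_t freed = 0;

	for (size_t i = 0; i < count; i++) {
		if (!toy_trylock(&objs[i]))
			continue;	/* busy: leave it for a slow path */
		objs[i].evicted = 1;
		freed++;
		toy_unlock(&objs[i]);
	}
	return freed;
}
```

A real slow path would have to drop the list lock before sleeping on the contended object; the sketch leaves that out on purpose.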
-Daniel

> 
> > +/**
> > + * ww_mutex_trylock - tries to acquire the w/w mutex with optional acquire 
> > context
> > + * @lock: mutex to lock
> > + * @ctx: optional w/w acquire context
> > + *
> > + * Trylocks a mutex with the optional acquire context; no deadlock 
> > detection is
> > + * possible. Returns 1 if the mutex has been acquired successfully, 0 
> > otherwise.
> > + *
> > + * Unlike ww_mutex_lock, no deadlock handling is performed. However, if a 
> > @ctx is
> > + * specified, -EALREADY and -EDEADLK handling may happen in calls to 
> > ww_mutex_lock.
> > + *
> > + * A mutex acquired with this function must be released with 
> > ww_mutex_unlock.
> > + */
> > +int __sched
> > +ww_mutex_trylock(struct ww_mutex *ww, struct ww_acquire_ctx *ctx)
> > +{
> > +   bool locked;
> > +
> > +   if (!ctx)
> > +   return mutex_trylock(&ww->base);
> > +
> > +#ifdef CONFIG_DEBUG_MUTEXES
> > +   DEBUG_LOCKS_WARN_ON(ww->base.magic != &ww->base);
> > +#endif
> > +
> > +   preempt_disable();
> > +   locked = __mutex_trylock(&ww->base);
> > +
> > +   if (locked) {
> > +   ww_mutex_set_context_fastpath(ww, ctx);
> > +   mutex_acquire_nest(&ww->base.dep_map, 0, 1, &ctx->dep_map, 
> > _RET_IP_);
> > +   }
> > +   preempt_enable();
> > +
> > +   return locked;
> > +}
> > +EXPORT_SYMBOL(ww_mutex_trylock);
> 
> You'll need a similar hunk in ww_rt_mutex.c

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] drm/plane-helper: fix uninitialized variable reference

2021-09-08 Thread Daniel Vetter
On Tue, Sep 07, 2021 at 10:08:36AM -0400, Alex Xu (Hello71) wrote:
> drivers/gpu/drm/drm_plane_helper.c: In function 'drm_primary_helper_update':
> drivers/gpu/drm/drm_plane_helper.c:113:32: error: 'visible' is used 
> uninitialized [-Werror=uninitialized]
>   113 | struct drm_plane_state plane_state = {
>   |^~~
> drivers/gpu/drm/drm_plane_helper.c:178:14: note: 'visible' was declared here
>   178 | bool visible;
>   |  ^~~
> cc1: all warnings being treated as errors
> 
> visible is an output, not an input. in practice this use might turn out
> OK but it's still UB.
> 
> Fixes: df86af9133 ("drm/plane-helper: Add drm_plane_helper_check_state()")

I need a signed-off-by from you before I can merge this. See

https://dri.freedesktop.org/docs/drm/process/submitting-patches.html#sign-your-work-the-developer-s-certificate-of-origin

Patch lgtm otherwise.
-Daniel

> ---
>  drivers/gpu/drm/drm_plane_helper.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_plane_helper.c 
> b/drivers/gpu/drm/drm_plane_helper.c
> index 5b2d0ca03705..838b32b70bce 100644
> --- a/drivers/gpu/drm/drm_plane_helper.c
> +++ b/drivers/gpu/drm/drm_plane_helper.c
> @@ -123,7 +123,6 @@ static int drm_plane_helper_check_update(struct drm_plane 
> *plane,
>   .crtc_w = drm_rect_width(dst),
>   .crtc_h = drm_rect_height(dst),
>   .rotation = rotation,
> - .visible = *visible,
>   };
>   struct drm_crtc_state crtc_state = {
>   .crtc = crtc,
> -- 
> 2.33.0
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v3 7/9] dma-buf/fence-chain: Add fence deadline support

2021-09-08 Thread Daniel Vetter
On Wed, Sep 08, 2021 at 11:19:15AM -0700, Rob Clark wrote:
> On Wed, Sep 8, 2021 at 10:54 AM Daniel Vetter  wrote:
> >
> > On Fri, Sep 03, 2021 at 11:47:58AM -0700, Rob Clark wrote:
> > > From: Rob Clark 
> > >
> > > Signed-off-by: Rob Clark 
> > > ---
> > >  drivers/dma-buf/dma-fence-chain.c | 13 +
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/drivers/dma-buf/dma-fence-chain.c 
> > > b/drivers/dma-buf/dma-fence-chain.c
> > > index 1b4cb3e5cec9..736a9ad3ea6d 100644
> > > --- a/drivers/dma-buf/dma-fence-chain.c
> > > +++ b/drivers/dma-buf/dma-fence-chain.c
> > > @@ -208,6 +208,18 @@ static void dma_fence_chain_release(struct dma_fence 
> > > *fence)
> > >   dma_fence_free(fence);
> > >  }
> > >
> > > +
> > > +static void dma_fence_chain_set_deadline(struct dma_fence *fence,
> > > +  ktime_t deadline)
> > > +{
> > > + dma_fence_chain_for_each(fence, fence) {
> > > + struct dma_fence_chain *chain = to_dma_fence_chain(fence);
> > > + struct dma_fence *f = chain ? chain->fence : fence;
> >
> > > Doesn't this just end up calling set_deadline on a chain, potentially
> > resulting in recursion? Also I don't think this should ever happen, why
> > did you add that?
> 
> Tbh the fence-chain was the part I was a bit fuzzy about, and the main
> reason I added igt tests.  The iteration is similar to how, for ex,
> dma_fence_chain_signaled() works, and according to the igt test it does
> what was intended

Huh indeed. Maybe something we should fix, like why does the
dma_fence_chain_for_each not give you the upcast chain pointer ... I guess
this also needs more Christian and less me.
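
For anyone following along, a minimal user-space model of the iteration under discussion. It assumes a toy fence type (all toy_* names are hypothetical, this is not the dma_fence API): each chain link forwards the deadline to the fence it wraps, and a plain fence at the end of the chain receives it directly, so the chain's own set_deadline is never re-entered:

```c
#include <stddef.h>

/* Hypothetical fence: either a plain fence or a chain link. */
struct toy_fence {
	int is_chain;			/* 1 if this node is a chain link */
	struct toy_fence *wrapped;	/* fence carried by a chain link */
	struct toy_fence *prev;		/* older link, or the final plain fence */
	long long deadline_ns;		/* 0 means no deadline recorded */
};

/* Record the deadline on a single (non-chain) fence. */
static void toy_set_deadline(struct toy_fence *fence, long long deadline_ns)
{
	fence->deadline_ns = deadline_ns;
}

/*
 * Walk from the newest link towards the oldest. For a chain link the
 * deadline goes to the wrapped fence; a plain fence ends the walk and
 * gets the deadline itself.
 */
static void toy_chain_set_deadline(struct toy_fence *head, long long deadline_ns)
{
	for (struct toy_fence *it = head; it; it = it->is_chain ? it->prev : NULL) {
		struct toy_fence *f = it->is_chain ? it->wrapped : it;

		toy_set_deadline(f, deadline_ns);
	}
}
```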
-Daniel

> 
> BR,
> -R
> 
> > -Daniel
> >
> > > +
> > > + dma_fence_set_deadline(f, deadline);
> > > + }
> > > +}
> > > +
> > >  const struct dma_fence_ops dma_fence_chain_ops = {
> > >   .use_64bit_seqno = true,
> > >   .get_driver_name = dma_fence_chain_get_driver_name,
> > > @@ -215,6 +227,7 @@ const struct dma_fence_ops dma_fence_chain_ops = {
> > >   .enable_signaling = dma_fence_chain_enable_signaling,
> > >   .signaled = dma_fence_chain_signaled,
> > >   .release = dma_fence_chain_release,
> > > + .set_deadline = dma_fence_chain_set_deadline,
> > >  };
> > >  EXPORT_SYMBOL(dma_fence_chain_ops);
> > >
> > > --
> > > 2.31.1
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v3 8/9] dma-buf/sync_file: Add SET_DEADLINE ioctl

2021-09-08 Thread Daniel Vetter
On Wed, Sep 08, 2021 at 11:23:42AM -0700, Rob Clark wrote:
> On Wed, Sep 8, 2021 at 10:50 AM Daniel Vetter  wrote:
> >
> > On Fri, Sep 03, 2021 at 11:47:59AM -0700, Rob Clark wrote:
> > > From: Rob Clark 
> > >
> > > The initial purpose is for igt tests, but this would also be useful for
> > > compositors that wait until close to vblank deadline to make decisions
> > > about which frame to show.
> > >
> > > Signed-off-by: Rob Clark 
> >
> > Needs userspace and I think ideally also some igts to make sure it works
> > and doesn't go boom.
> 
> See cover-letter.. there are igt tests, although currently that is the
> only user.

Ah sorry missed that. It would be good to record that in the commit too
that adds the uapi. git blame doesn't find cover letters at all, unlike on
gitlab where you get the MR request with everything.

Ok there is the Link: thing, but since that only points at the last
version all the interesting discussion is still usually lost, so I tend to
not bother looking there.

> I'd be ok to otherwise initially restrict this and the sw_sync UABI
> (CAP_SYS_ADMIN?  Or??) until there is a non-igt user, but they are
> both needed by the igt tests

Hm really awkward, uapi for igts in cross vendor stuff like this isn't
great. I think hiding it in vgem is semi-ok (we have fences there
already). But it's all a bit silly ...

For the tests, should we instead have a selftest/Kunit thing to exercise
this stuff? igt probably not quite the right thing. Or combine with a page
flip if you want to test msm.
-Daniel

> 
> BR,
> -R
> 
> > -Daniel
> >
> > > ---
> > >  drivers/dma-buf/sync_file.c| 19 +++
> > >  include/uapi/linux/sync_file.h | 20 
> > >  2 files changed, 39 insertions(+)
> > >
> > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > index 394e6e1e9686..f295772d5169 100644
> > > --- a/drivers/dma-buf/sync_file.c
> > > +++ b/drivers/dma-buf/sync_file.c
> > > @@ -459,6 +459,22 @@ static long sync_file_ioctl_fence_info(struct 
> > > sync_file *sync_file,
> > >   return ret;
> > >  }
> > >
> > > +static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
> > > + unsigned long arg)
> > > +{
> > > + struct sync_set_deadline ts;
> > > +
> > > + if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
> > > + return -EFAULT;
> > > +
> > > + if (ts.pad)
> > > + return -EINVAL;
> > > +
> > > + dma_fence_set_deadline(sync_file->fence, ktime_set(ts.tv_sec, 
> > > ts.tv_nsec));
> > > +
> > > + return 0;
> > > +}
> > > +
> > >  static long sync_file_ioctl(struct file *file, unsigned int cmd,
> > >   unsigned long arg)
> > >  {
> > > @@ -471,6 +487,9 @@ static long sync_file_ioctl(struct file *file, 
> > > unsigned int cmd,
> > >   case SYNC_IOC_FILE_INFO:
> > >   return sync_file_ioctl_fence_info(sync_file, arg);
> > >
> > > + case SYNC_IOC_SET_DEADLINE:
> > > + return sync_file_ioctl_set_deadline(sync_file, arg);
> > > +
> > >   default:
> > >   return -ENOTTY;
> > >   }
> > > diff --git a/include/uapi/linux/sync_file.h 
> > > b/include/uapi/linux/sync_file.h
> > > index ee2dcfb3d660..f67d4ffe7566 100644
> > > --- a/include/uapi/linux/sync_file.h
> > > +++ b/include/uapi/linux/sync_file.h
> > > @@ -67,6 +67,18 @@ struct sync_file_info {
> > >   __u64   sync_fence_info;
> > >  };
> > >
> > > +/**
> > > + * struct sync_set_deadline - set a deadline on a fence
> > > + * @tv_sec:  seconds elapsed since epoch
> > > + * @tv_nsec: nanoseconds elapsed since the time given by the tv_sec
> > > + * @pad: must be zero
> > > + */
> > > +struct sync_set_deadline {
> > > + __s64   tv_sec;
> > > + __s32   tv_nsec;
> > > + __u32   pad;
> > > +};
> > > +
> > >  #define SYNC_IOC_MAGIC   '>'
> > >
> > >  /**
> > > @@ -95,4 +107,12 @@ struct sync_file_info {
> > >   */
> > >  #define SYNC_IOC_FILE_INFO   _IOWR(SYNC_IOC_MAGIC, 4, struct 
> > > sync_file_info)
> > >
> > > +
> > > +/**
> > > + * DOC: SYNC_IOC_SET_DEADLINE - set a deadline on a fence
> > > + *
> > > + * Allows userspace to set a deadline on a fence, see 
> > > dma_fence_set_deadline()
> > > + */
> > > > +#define SYNC_IOC_SET_DEADLINE   _IOW(SYNC_IOC_MAGIC, 5, struct 
> > > sync_set_deadline)
> > > +
> > >  #endif /* _UAPI_LINUX_SYNC_H */
> > > --
> > > 2.31.1
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH 0/2] drm/i915/gt: Locking splats PREEMPT_RT

2021-09-08 Thread Sebastian Andrzej Siewior
Clark Williams reported two issues with the i915 driver running on
PREEMPT_RT. While #1 looks simple I have no idea about #2 thus the RFC.

Sebastian



[RFC PATCH 2/2] drm/i915/gt: Use spin_lock_irq() instead of local_irq_disable() + spin_lock()

2021-09-08 Thread Sebastian Andrzej Siewior
execlists_dequeue() is invoked from a function which uses
local_irq_disable() to disable interrupts so the spin_lock() behaves
like spin_lock_irq().
This breaks PREEMPT_RT because local_irq_disable() + spin_lock() is not
the same as spin_lock_irq().

execlists_dequeue_irq() and execlists_dequeue() each have only one
caller. If intel_engine_cs::active::lock is acquired and released with the
_irq suffix then it behaves almost as if execlists_dequeue() would be
invoked with disabled interrupts. The difference is the last part of the
function which is then invoked with enabled interrupts.
I can't tell if this makes a difference. From looking at it, it might
work to move the last unlock to the end of the function as I didn't find
anything that would acquire the lock again.

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
---
 .../drm/i915/gt/intel_execlists_submission.c| 17 +
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index fc77592d88a96..2ec1dd352960b 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -1265,7 +1265,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * and context switches) submission.
 */
 
-   spin_lock(&engine->active.lock);
+   spin_lock_irq(&engine->active.lock);
 
/*
 * If the queue is higher priority than the last
@@ -1365,7 +1365,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * Even if ELSP[1] is occupied and not worthy
 * of timeslices, our queue might be.
 */
-   spin_unlock(&engine->active.lock);
+   spin_unlock_irq(&engine->active.lock);
return;
}
}
@@ -1391,7 +1391,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 
if (last && !can_merge_rq(last, rq)) {
spin_unlock(&ve->base.active.lock);
-   spin_unlock(&engine->active.lock);
+   spin_unlock_irq(&engine->active.lock);
return; /* leave this for another sibling */
}
 
@@ -1552,7 +1552,7 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
 * interrupt for secondary ports).
 */
execlists->queue_priority_hint = queue_prio(execlists);
-   spin_unlock(&engine->active.lock);
+   spin_unlock_irq(&engine->active.lock);
 
/*
 * We can skip poking the HW if we ended up with exactly the same set
@@ -1578,13 +1578,6 @@ static void execlists_dequeue(struct intel_engine_cs 
*engine)
}
 }
 
-static void execlists_dequeue_irq(struct intel_engine_cs *engine)
-{
-   local_irq_disable(); /* Suspend interrupts across request submission */
-   execlists_dequeue(engine);
-   local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
-}
-
 static void clear_ports(struct i915_request **ports, int count)
 {
memset_p((void **)ports, NULL, count);
@@ -2377,7 +2370,7 @@ static void execlists_submission_tasklet(struct 
tasklet_struct *t)
}
 
if (!engine->execlists.pending[0]) {
-   execlists_dequeue_irq(engine);
+   execlists_dequeue(engine);
start_timeslice(engine);
}
 
-- 
2.33.0



[PATCH 1/2] drm/i915/gt: Queue and wait for the irq_work item.

2021-09-08 Thread Sebastian Andrzej Siewior
Disabling interrupts and invoking the irq_work function directly breaks
on PREEMPT_RT.
PREEMPT_RT does not invoke all irq_work from hardirq context because
some of the users have spinlock_t locking in the callback function.
These locks are then turned into sleeping locks which cannot be
acquired with disabled interrupts.

Using irq_work_queue() has the benefit that the irqwork will be invoked
in the regular context. In general there is "no" delay between enqueuing
the callback and its invocation because the interrupt is raised right
away on architectures which support it (which includes x86).

Use irq_work_queue() + irq_work_sync() instead of invoking the callback
directly.

Reported-by: Clark Williams 
Signed-off-by: Sebastian Andrzej Siewior 
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c 
b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 38cc42783dfb2..594dec2f76954 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -318,10 +318,9 @@ void __intel_breadcrumbs_park(struct intel_breadcrumbs *b)
/* Kick the work once more to drain the signalers, and disarm the irq */
irq_work_sync(&b->irq_work);
while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) {
-   local_irq_disable();
-   signal_irq_work(&b->irq_work);
-   local_irq_enable();
+   irq_work_queue(&b->irq_work);
cond_resched();
+   irq_work_sync(&b->irq_work);
}
 }
 
-- 
2.33.0



[PATCH] drm/nouveau/nvkm: Replace -ENOSYS with -ENODEV

2021-09-08 Thread Guenter Roeck
nvkm test builds fail with the following error.

drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c:
In function 'nvkm_control_mthd_pstate_info':
drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c:60:35: error:
overflow in conversion from 'int' to '__s8' {aka 'signed char'}
changes value from '-251' to '5'

The code builds on most architectures, but fails on parisc where ENOSYS
is defined as 251. Replace the error code with -ENODEV (-19). The actual
error code does not really matter and is not passed to userspace - it
just has to be negative.
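
The arithmetic behind the compiler error can be checked in user space. This is a small sketch with hypothetical helper names, using the values from the error message (-251 for parisc ENOSYS, -19 for ENODEV); it is not part of the patch:

```c
#include <stdint.h>

/* Returns 1 if value is representable in a signed 8-bit field. */
static int fits_s8(int value)
{
	return value >= INT8_MIN && value <= INT8_MAX;
}

/* What an 8-bit two's-complement field holds after storing value. */
static int wrap_to_s8(int value)
{
	int low = (uint8_t)value;	/* well-defined modulo-256 reduction */

	return low >= 128 ? low - 256 : low;
}
```

With these helpers, fits_s8(-251) is 0 and wrap_to_s8(-251) is 5, matching the diagnostic, while -19 fits and is stored unchanged.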

Fixes: 7238eca4cf18 ("drm/nouveau: expose pstate selection per-power source in 
sysfs")
Signed-off-by: Guenter Roeck 
---
 drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c
index b0ece71aefde..ce774579c89d 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/ctrl.c
@@ -57,7 +57,7 @@ nvkm_control_mthd_pstate_info(struct nvkm_control *ctrl, void 
*data, u32 size)
args->v0.count = 0;
args->v0.ustate_ac = NVIF_CONTROL_PSTATE_INFO_V0_USTATE_DISABLE;
args->v0.ustate_dc = NVIF_CONTROL_PSTATE_INFO_V0_USTATE_DISABLE;
-   args->v0.pwrsrc = -ENOSYS;
+   args->v0.pwrsrc = -ENODEV;
args->v0.pstate = NVIF_CONTROL_PSTATE_INFO_V0_PSTATE_UNKNOWN;
}
 
-- 
2.33.0



Re: [PATCH 13/14] drm/kmb: Enable alpha blended second plane

2021-09-08 Thread Sam Ravnborg
Hi Thomas,

On Wed, Sep 08, 2021 at 07:50:42PM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 03.08.21 um 07:10 schrieb Sam Ravnborg:
> > Hi Anitha,
> > 
> > On Mon, Aug 02, 2021 at 08:44:26PM +, Chrisanthus, Anitha wrote:
> > > Hi Sam,
> > > Thanks. Where should this go, drm-misc-fixes or drm-misc-next?
> > 
> > Looks like a drm-misc-next candidate to me.
> > It may improve something for existing users, but it does not look like it
> > fixes an existing bug.
> 
> I found this patch in drm-misc-fixes, although it doesn't look like a
> bugfix. It should have gone into drm-misc-next. See [1]. If it indeed
> belongs into drm-misc-fixes, it certainly should have contained a Fixes tag.

The patch fixes some warnings that have become errors in the last week.
Anitha pinged me about it, but I failed to follow up. So in the end it
was applied to shut up the warning => errors.

Sam


[drm:i915-uncore-vfunc 30/31] drivers/gpu/drm/i915/selftests/mock_uncore.c:47:2: error: implicit declaration of function 'ASSIGN_RAW_WRITE_MMIO_VFUNCS'; did you mean 'MMIO_RAW_WRITE_VFUNCS'?

2021-09-08 Thread kernel test robot
tree:   git://people.freedesktop.org/~airlied/linux.git i915-uncore-vfunc
head:   b42168f90718a90b11f2d52306d9aeaa9468
commit: 99aebd17891290abfca80c48eca01f4e02413fb3 [30/31] drm/i915/uncore: 
constify the register vtables.
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
git remote add drm git://people.freedesktop.org/~airlied/linux.git
git fetch --no-tags drm i915-uncore-vfunc
git checkout 99aebd17891290abfca80c48eca01f4e02413fb3
# save the attached .config to linux build tree
make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   In file included from drivers/gpu/drm/i915/intel_uncore.c:2630:
   drivers/gpu/drm/i915/selftests/mock_uncore.c: In function 'mock_uncore_init':
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:47:2: error: implicit 
>> declaration of function 'ASSIGN_RAW_WRITE_MMIO_VFUNCS'; did you mean 
>> 'MMIO_RAW_WRITE_VFUNCS'? [-Werror=implicit-function-declaration]
  47 |  ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
 |  ^~~~
 |  MMIO_RAW_WRITE_VFUNCS
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:47:39: error: 'nop' undeclared 
>> (first use in this function); did you mean 'nopv'?
  47 |  ASSIGN_RAW_WRITE_MMIO_VFUNCS(uncore, nop);
 |   ^~~
 |   nopv
   drivers/gpu/drm/i915/selftests/mock_uncore.c:47:39: note: each undeclared 
identifier is reported only once for each function it appears in
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:48:2: error: implicit 
>> declaration of function 'ASSIGN_RAW_READ_MMIO_VFUNCS' 
>> [-Werror=implicit-function-declaration]
  48 |  ASSIGN_RAW_READ_MMIO_VFUNCS(uncore, nop);
 |  ^~~
   At top level:
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: error: 'nop_read64' 
>> defined but not used [-Werror=unused-function]
  36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) 
{ return 0; }
 | ^~~~
   drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: note: in definition of 
macro '__nop_read'
  36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) 
{ return 0; }
 | ^~~~
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: error: 'nop_read32' 
>> defined but not used [-Werror=unused-function]
  36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) 
{ return 0; }
 | ^~~~
   drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: note: in definition of 
macro '__nop_read'
  36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) 
{ return 0; }
 | ^~~~
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: error: 'nop_read16' 
>> defined but not used [-Werror=unused-function]
  36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) 
{ return 0; }
 | ^~~~
   drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: note: in definition of 
macro '__nop_read'
  36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) 
{ return 0; }
 | ^~~~
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: error: 'nop_read8' 
>> defined but not used [-Werror=unused-function]
  36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) 
{ return 0; }
 | ^~~~
   drivers/gpu/drm/i915/selftests/mock_uncore.c:36:1: note: in definition of 
macro '__nop_read'
  36 | nop_read##x(struct intel_uncore *uncore, i915_reg_t reg, bool trace) 
{ return 0; }
 | ^~~~
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: error: 'nop_write32' 
>> defined but not used [-Werror=unused-function]
  29 | nop_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, 
bool trace) { }
 | ^
   drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: note: in definition of 
macro '__nop_write'
  29 | nop_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, 
bool trace) { }
 | ^
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: error: 'nop_write16' 
>> defined but not used [-Werror=unused-function]
  29 | nop_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, 
bool trace) { }
 | ^
   drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: note: in definition of 
macro '__nop_write'
  29 | nop_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, 
bool trace) { }
 | ^
>> drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: error: 'nop_write8' 
>> defined but not used [-Werror=unused-function]
  29 | nop_write##x(struct intel_uncore *uncore, i915_reg_t reg, u##x val, 
bool trace) { }
 | ^
   drivers/gpu/drm/i915/selftests/mock_uncore.c:29:1: note: in def

Re: [PATCH v3 8/9] dma-buf/sync_file: Add SET_DEADLINE ioctl

2021-09-08 Thread Rob Clark
On Wed, Sep 8, 2021 at 11:49 AM Daniel Vetter  wrote:
>
> On Wed, Sep 08, 2021 at 11:23:42AM -0700, Rob Clark wrote:
> > On Wed, Sep 8, 2021 at 10:50 AM Daniel Vetter  wrote:
> > >
> > > On Fri, Sep 03, 2021 at 11:47:59AM -0700, Rob Clark wrote:
> > > > From: Rob Clark 
> > > >
> > > > The initial purpose is for igt tests, but this would also be useful for
> > > > compositors that wait until close to vblank deadline to make decisions
> > > > about which frame to show.
> > > >
> > > > Signed-off-by: Rob Clark 
> > >
> > > Needs userspace and I think ideally also some igts to make sure it works
> > > and doesn't go boom.
> >
> > See cover-letter.. there are igt tests, although currently that is the
> > only user.
>
> Ah sorry missed that. It would be good to record that in the commit too
> that adds the uapi. git blame doesn't find cover letters at all, unlike on
> gitlab where you get the MR request with everything.
>
> Ok there is the Link: thing, but since that only points at the last
> version all the interesting discussion is still usually lost, so I tend to
> not bother looking there.
>
> > I'd be ok to otherwise initially restrict this and the sw_sync UABI
> > (CAP_SYS_ADMIN?  Or??) until there is a non-igt user, but they are
> > both needed by the igt tests
>
> Hm really awkward, uapi for igts in cross vendor stuff like this isn't
> great. I think hiding it in vgem is semi-ok (we have fences there
> already). But it's all a bit silly ...
>
> For the tests, should we instead have a selftest/Kunit thing to exercise
> this stuff? igt probably not quite the right thing. Or combine with a page
> flip if you want to test msm.

Hmm, IIRC we have used CONFIG_BROKEN or something along those lines
for UABI in other places where we weren't willing to commit to yet?

I suppose if we had to I could make this a sw_sync ioctl instead.  But
OTOH there are kind of a limited # of ways this ioctl could look.  And
we already know that at least some wayland compositors are going to
want this.
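
To make the "limited # of ways" point concrete, here is a user-space sketch of the argument as proposed in the quoted patch. The toy_* names are hypothetical; the struct only mirrors the quoted layout, and the conversion does no overflow handling, so it is not a stand-in for ktime_set():

```c
#include <errno.h>
#include <stdint.h>

/* User-space mirror of the proposed struct sync_set_deadline layout. */
struct toy_set_deadline {
	int64_t  tv_sec;	/* seconds */
	int32_t  tv_nsec;	/* nanoseconds past tv_sec */
	uint32_t pad;		/* must be zero */
};

/*
 * Validate the argument the way the quoted handler does (nonzero pad
 * is rejected) and flatten it to nanoseconds. No overflow handling.
 */
static int toy_deadline_to_ns(const struct toy_set_deadline *ts, int64_t *ns)
{
	if (ts->pad)
		return -EINVAL;
	*ns = ts->tv_sec * INT64_C(1000000000) + ts->tv_nsec;
	return 0;
}
```

The explicit pad keeps the struct at 16 bytes with no implicit padding, and rejecting a nonzero pad leaves room to give the field a meaning later.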

I guess I can look at non-igt options.  But the igt test is already a
pretty convenient way to contrive situations (like loops, which is a
thing I need to add)

BR,
-R


> -Daniel
>
> >
> > BR,
> > -R
> >
> > > -Daniel
> > >
> > > > ---
> > > >  drivers/dma-buf/sync_file.c| 19 +++
> > > >  include/uapi/linux/sync_file.h | 20 
> > > >  2 files changed, 39 insertions(+)
> > > >
> > > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > > index 394e6e1e9686..f295772d5169 100644
> > > > --- a/drivers/dma-buf/sync_file.c
> > > > +++ b/drivers/dma-buf/sync_file.c
> > > > @@ -459,6 +459,22 @@ static long sync_file_ioctl_fence_info(struct 
> > > > sync_file *sync_file,
> > > >   return ret;
> > > >  }
> > > >
> > > > +static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
> > > > + unsigned long arg)
> > > > +{
> > > > + struct sync_set_deadline ts;
> > > > +
> > > > + if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
> > > > + return -EFAULT;
> > > > +
> > > > + if (ts.pad)
> > > > + return -EINVAL;
> > > > +
> > > > + dma_fence_set_deadline(sync_file->fence, ktime_set(ts.tv_sec, 
> > > > ts.tv_nsec));
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > >  static long sync_file_ioctl(struct file *file, unsigned int cmd,
> > > >   unsigned long arg)
> > > >  {
> > > > @@ -471,6 +487,9 @@ static long sync_file_ioctl(struct file *file, 
> > > > unsigned int cmd,
> > > >   case SYNC_IOC_FILE_INFO:
> > > >   return sync_file_ioctl_fence_info(sync_file, arg);
> > > >
> > > > + case SYNC_IOC_SET_DEADLINE:
> > > > + return sync_file_ioctl_set_deadline(sync_file, arg);
> > > > +
> > > >   default:
> > > >   return -ENOTTY;
> > > >   }
> > > > diff --git a/include/uapi/linux/sync_file.h 
> > > > b/include/uapi/linux/sync_file.h
> > > > index ee2dcfb3d660..f67d4ffe7566 100644
> > > > --- a/include/uapi/linux/sync_file.h
> > > > +++ b/include/uapi/linux/sync_file.h
> > > > @@ -67,6 +67,18 @@ struct sync_file_info {
> > > >   __u64   sync_fence_info;
> > > >  };
> > > >
> > > > +/**
> > > > + * struct sync_set_deadline - set a deadline on a fence
> > > > + * @tv_sec:  seconds elapsed since epoch
> > > > + * @tv_nsec: nanoseconds elapsed since the time given by the tv_sec
> > > > + * @pad: must be zero
> > > > + */
> > > > +struct sync_set_deadline {
> > > > + __s64   tv_sec;
> > > > + __s32   tv_nsec;
> > > > + __u32   pad;
> > > > +};
> > > > +
> > > >  #define SYNC_IOC_MAGIC   '>'
> > > >
> > > >  /**
> > > > @@ -95,4 +107,12 @@ struct sync_file_info {
> > > >   */
> > > >  #define SYNC_IOC_FILE_INFO   _IOWR(SYNC_IOC_MAGIC, 4, struct 
> > > > sync_file_info)
> > > >
> > > > +
> > > > +/**
> > > > + * DOC: SYNC_IOC_SET_DEADLINE - set a deadline on a fence
> > > > + *
> > 
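
For readers skimming the quoted hunk, here is a minimal userspace sketch of what the handler does with the struct. It is only an illustration: the struct name sync_set_deadline_sketch and the helper deadline_to_ns() are hypothetical, and the arithmetic mirrors what ktime_set() computes in the kernel while ignoring its overflow clamp.

```c
#include <errno.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/* Hypothetical userspace mirror of the struct in the quoted uapi hunk. */
struct sync_set_deadline_sketch {
	int64_t  tv_sec;
	int32_t  tv_nsec;
	uint32_t pad;
};

/*
 * Mirrors the checks in the quoted handler: reject non-zero padding,
 * then fold seconds and nanoseconds into a single nanosecond count.
 * Returns 0 on success or -EINVAL when pad is set.
 */
static int deadline_to_ns(const struct sync_set_deadline_sketch *ts,
			  int64_t *out_ns)
{
	if (ts->pad)
		return -EINVAL;

	*out_ns = ts->tv_sec * NSEC_PER_SEC + ts->tv_nsec;
	return 0;
}
```

Rejecting a non-zero pad is what lets the field be given a meaning later without breaking callers that already pass zero.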

Re: [PATCH 2/8] drm/i915/xehp: CCS shares the render reset domain

2021-09-08 Thread Matt Roper
On Wed, Sep 08, 2021 at 11:07:07AM +0100, Tvrtko Ursulin wrote:
> 
> On 07/09/2021 18:19, Matt Roper wrote:
> > The reset domain is shared between render and all compute engines,
> > so resetting one will affect the others.
> > 
> > Note:  Before performing a reset on an RCS or CCS engine, the GuC will
> > attempt to preempt-to-idle the other non-hung RCS/CCS engines to avoid
> > impacting other clients (since some shared modules will be reset).  If
> > other engines are executing non-preemptable workloads, the impact is
> > unavoidable and some work may be lost.
> 
> Since here it talks about engine reset, should this patch add warning if
> same is attempted by i915 on a GuC platform - to document it is not

Did you mean "on a *non* GuC platform" here?  We aren't going to have
compute engine support on any platforms where GuC submission isn't the
default operating model, so the only way to get compute engines +
execlist submission is to force an override via module parameters (e.g.,
enable_guc=0).  Doing so will taint the kernel, so I think the current
consensus from offline discussion is that the user has already put
themselves into a configuration where it's easier than usual to shoot
themselves in the foot; it's not too much different than the kind of
trouble a user could get themselves into if they loaded the driver with
hangcheck disabled or something.


Matt

> implemented/supported? Or perhaps later in the series, or future series
> works better.
> 
> Reviewed-by: Tvrtko Ursulin 
> 
> Regards,
> 
> Tvrtko
> 
> > Bspec: 52549
> > Original-patch-by: Michel Thierry
> > Cc: Tvrtko Ursulin 
> > Cc: Vinay Belgaumkar 
> > Signed-off-by: Daniele Ceraolo Spurio 
> > Signed-off-by: Aravind Iddamsetty 
> > Signed-off-by: Matt Roper 
> > ---
> >   drivers/gpu/drm/i915/gt/intel_reset.c | 4 
> >   1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c 
> > b/drivers/gpu/drm/i915/gt/intel_reset.c
> > index 91200c43951f..30598c1d070c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_reset.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_reset.c
> > @@ -507,6 +507,10 @@ static int gen11_reset_engines(struct intel_gt *gt,
> > [VECS1] = GEN11_GRDOM_VECS2,
> > [VECS2] = GEN11_GRDOM_VECS3,
> > [VECS3] = GEN11_GRDOM_VECS4,
> > +   [CCS0] = GEN11_GRDOM_RENDER,
> > +   [CCS1] = GEN11_GRDOM_RENDER,
> > +   [CCS2] = GEN11_GRDOM_RENDER,
> > +   [CCS3] = GEN11_GRDOM_RENDER,
> > };
> > struct intel_engine_cs *engine;
> > intel_engine_mask_t tmp;
> > 

-- 
Matt Roper
Graphics Software Engineer
VTT-OSGC Platform Enablement
Intel Corporation
(916) 356-2795


Re: [PATCH v3 8/9] dma-buf/sync_file: Add SET_DEADLINE ioctl

2021-09-08 Thread Daniel Vetter
On Wed, Sep 8, 2021 at 9:36 PM Rob Clark  wrote:
> On Wed, Sep 8, 2021 at 11:49 AM Daniel Vetter  wrote:
> > On Wed, Sep 08, 2021 at 11:23:42AM -0700, Rob Clark wrote:
> > > On Wed, Sep 8, 2021 at 10:50 AM Daniel Vetter  wrote:
> > > >
> > > > On Fri, Sep 03, 2021 at 11:47:59AM -0700, Rob Clark wrote:
> > > > > From: Rob Clark 
> > > > >
> > > > > The initial purpose is for igt tests, but this would also be useful 
> > > > > for
> > > > > compositors that wait until close to vblank deadline to make decisions
> > > > > about which frame to show.
> > > > >
> > > > > Signed-off-by: Rob Clark 
> > > >
> > > > Needs userspace and I think ideally also some igts to make sure it works
> > > > and doesn't go boom.
> > >
> > > See cover-letter.. there are igt tests, although currently that is the
> > > only user.
> >
> > Ah sorry missed that. It would be good to record that in the commit too
> > that adds the uapi. git blame doesn't find cover letters at all, unlike on
> > gitlab where you get the MR request with everything.
> >
> > Ok there is the Link: thing, but since that only points at the last
> > version all the interesting discussion is still usually lost, so I tend to
> > not bother looking there.
> >
> > > I'd be ok to otherwise initially restrict this and the sw_sync UABI
> > > (CAP_SYS_ADMIN?  Or??) until there is a non-igt user, but they are
> > > both needed by the igt tests
> >
> > Hm really awkward, uapi for igts in cross vendor stuff like this isn't
> > great. I think hiding it in vgem is semi-ok (we have fences there
> > already). But it's all a bit silly ...
> >
> > For the tests, should we instead have a selftest/Kunit thing to exercise
> > this stuff? igt probably not quite the right thing. Or combine with a page
> > flip if you want to test msm.
>
> Hmm, IIRC we have used CONFIG_BROKEN or something along those lines
> for UABI in other places where we weren't willing to commit to yet?
>
> I suppose if we had to I could make this a sw_sync ioctl instead.  But
> OTOH there are kind of a limited # of ways this ioctl could look.  And
> we already know that at least some wayland compositors are going to
> want this.

Hm I was trying to think up a few ways this could work, but didn't
come up with anything reasonable. Forcing the compositor to boost the
entire chain (for gl composited primary plane fallback) is something
the kernel can easily do too. Also only makes sense for priority
boost, not so much for clock boosting, since clock boosting only
really needs the final element to be boosted.

> I guess I can look at non-igt options.  But the igt test is already a
> pretty convenient way to contrive situations (like loops, which is a
> thing I need to add)

Yeah it's definitely very useful for testing ... One option could be a
hacky debugfs interface, where you write a fd number and deadline and
the debugfs read function does the deadline setting. Horrible, but
since it's debugfs no one ever cares. That's at least where we're
hiding all the i915 hacks that igts need.
-Daniel
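
A rough sketch of the parsing half of such a debugfs hack, written as plain userspace C so it can be tried in isolation. Everything here is an assumption: the "<fd> <deadline_ns>" text format, the struct deadline_request and the function name are made up for illustration, and no such interface exists in the patch.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical request written to the debugfs file. */
struct deadline_request {
	int     fd;           /* sync_file fd to set the deadline on */
	int64_t deadline_ns;  /* absolute deadline in nanoseconds */
};

/*
 * Parse "<fd> <deadline_ns>" with optional trailing whitespace.
 * Returns 0 on success, -EINVAL on malformed or negative input.
 */
static int parse_deadline_request(const char *buf,
				  struct deadline_request *req)
{
	int fd;
	int64_t ns;
	char trailing;

	/* The extra " %c" only matches when junk follows the two numbers. */
	if (sscanf(buf, "%d %" SCNd64 " %c", &fd, &ns, &trailing) != 2)
		return -EINVAL;

	if (fd < 0 || ns < 0)
		return -EINVAL;

	req->fd = fd;
	req->deadline_ns = ns;
	return 0;
}
```

A real implementation would then look up the fence behind the fd and pass the deadline on; that part is left out because it depends on how the debugfs file would be wired up.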

> BR,
> -R
>
>
> > -Daniel
> >
> > >
> > > BR,
> > > -R
> > >
> > > > -Daniel
> > > >
> > > > > ---
> > > > >  drivers/dma-buf/sync_file.c| 19 +++
> > > > >  include/uapi/linux/sync_file.h | 20 
> > > > >  2 files changed, 39 insertions(+)
> > > > >
> > > > > diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
> > > > > index 394e6e1e9686..f295772d5169 100644
> > > > > --- a/drivers/dma-buf/sync_file.c
> > > > > +++ b/drivers/dma-buf/sync_file.c
> > > > > @@ -459,6 +459,22 @@ static long sync_file_ioctl_fence_info(struct 
> > > > > sync_file *sync_file,
> > > > >   return ret;
> > > > >  }
> > > > >
> > > > > +static int sync_file_ioctl_set_deadline(struct sync_file *sync_file,
> > > > > + unsigned long arg)
> > > > > +{
> > > > > + struct sync_set_deadline ts;
> > > > > +
> > > > > + if (copy_from_user(&ts, (void __user *)arg, sizeof(ts)))
> > > > > + return -EFAULT;
> > > > > +
> > > > > + if (ts.pad)
> > > > > + return -EINVAL;
> > > > > +
> > > > > + dma_fence_set_deadline(sync_file->fence, ktime_set(ts.tv_sec, 
> > > > > ts.tv_nsec));
> > > > > +
> > > > > + return 0;
> > > > > +}
> > > > > +
> > > > >  static long sync_file_ioctl(struct file *file, unsigned int cmd,
> > > > >   unsigned long arg)
> > > > >  {
> > > > > @@ -471,6 +487,9 @@ static long sync_file_ioctl(struct file *file, 
> > > > > unsigned int cmd,
> > > > >   case SYNC_IOC_FILE_INFO:
> > > > >   return sync_file_ioctl_fence_info(sync_file, arg);
> > > > >
> > > > > + case SYNC_IOC_SET_DEADLINE:
> > > > > + return sync_file_ioctl_set_deadline(sync_file, arg);
> > > > > +
> > > > >   default:
> > > > >   return -ENOTTY;
> > > > >   }
> > > > > diff --git a/include/uapi/linux/sync_file.h 
> > > > > b/include/uapi/linux/sync_file.h
> > > > > inde

Re: [PATCH v3 10/16] drm/panel-simple: Non-eDP panels don't need "HPD" handling

2021-09-08 Thread Doug Anderson
Hi,

On Sun, Sep 5, 2021 at 11:46 AM Sam Ravnborg  wrote:
>
> On Wed, Sep 01, 2021 at 01:19:28PM -0700, Douglas Anderson wrote:
> > All of the "HPD" handling added to panel-simple recently was for eDP
> > panels. Remove it from panel-simple now that panel-simple-edp handles
> > eDP panels. The "prepare_to_enable" delay only makes sense in the
> > context of HPD, so remove it too. No non-eDP panels used it anyway.
> >
> > Signed-off-by: Douglas Anderson 
>
> Maybe merge this with the patch that moved all the functionality
> from panel-simple to panel-edp?

Unless you feel strongly about it, I'm going to keep it separate still
in the next version. To try to make diffing easier, I tried hard to
make the minimal changes in the "split the driver in two" patch.

-Doug


Re: [PATCH] drm/bridge: ti-sn65dsi83: Check link status register after enabling the bridge

2021-09-08 Thread Andrzej Hajda


W dniu 08.09.2021 o 13:11, Dave Stevenson pisze:
> Hi Marek and Andrzej
>
> On Tue, 7 Sept 2021 at 22:24, Marek Vasut  wrote:
>> On 9/7/21 7:29 PM, Andrzej Hajda wrote:
>>> W dniu 07.09.2021 o 16:25, Marek Vasut pisze:
>>>> On 9/7/21 9:31 AM, Andrzej Hajda wrote:
>>>>> On 07.09.2021 04:39, Marek Vasut wrote:
>>>>>> In rare cases, the bridge may not start up correctly, which usually
>>>>>> leads to no display output. In case this happens, warn about it in
>>>>>> the kernel log.
>>>>>>
>>>>>> Signed-off-by: Marek Vasut 
>>>>>> Cc: Jagan Teki 
>>>>>> Cc: Laurent Pinchart 
>>>>>> Cc: Linus Walleij 
>>>>>> Cc: Robert Foss 
>>>>>> Cc: Sam Ravnborg 
>>>>>> Cc: dri-devel@lists.freedesktop.org
>>>>>> ---
>>>>>> NOTE: See the following:
>>>>>> https://e2e.ti.com/support/interface-group/interface/f/interface-forum/942005/sn65dsi83-dsi83-lvds-bridge---sporadic-behavior---no-video
>>>>>>
>>>>>> https://community.nxp.com/t5/i-MX-Processors/i-MX8M-MIPI-DSI-Interface-LVDS-Bridge-Initialization/td-p/1156533
>>>>>>
>>>>>> ---
>>>>>>  drivers/gpu/drm/bridge/ti-sn65dsi83.c | 5 +
>>>>>>  1 file changed, 5 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi83.c
>>>>>> b/drivers/gpu/drm/bridge/ti-sn65dsi83.c
>>>>>> index a32f70bc68ea4..4ea71d7f0bfbc 100644
>>>>>> --- a/drivers/gpu/drm/bridge/ti-sn65dsi83.c
>>>>>> +++ b/drivers/gpu/drm/bridge/ti-sn65dsi83.c
>>>>>> @@ -520,6 +520,11 @@ static void sn65dsi83_atomic_enable(struct
>>>>>> drm_bridge *bridge,
>>>>>>  /* Clear all errors that got asserted during initialization. */
>>>>>>  regmap_read(ctx->regmap, REG_IRQ_STAT, &pval);
>>>>>>  regmap_write(ctx->regmap, REG_IRQ_STAT, pval);
>>>>>
>>>>> It does not look as correct error handling, maybe it would be good to
>>>>> analyze and optionally report 'unexpected' errors here as well.
>>>> The above is correct -- it clears the status register because the
>>>> setup might've set random bits in that register. Then we wait a bit,
>>>> let the link run, and read them again to get the real link status in
>>>> this new piece of code below, hence the usleep_range there. And then
>>>> if the link indicates a problem, we know it is a problem.
>>>
>>> Usually such registers are cleared on very beginning of the
>>> initialization, and tested (via irq handler, or via reading), during
>>> initialization, if initialization phase goes well. If it is not the case
>>> forgive me.
>> The init just flips the bit at random in the IRQ_STAT register, so no,
>> that's not really viable here. That's why we clear them at the end, and
>> then wait a bit, and then check whether something new appeared in them.
>>
>> If not, all is great.
>>
>> Sure, we could generate an IRQ, but then IRQ line is not always
>> connected to this chip on all hardware I have available. So this gives
>> the user at least some indication that something is wrong with their HW.
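
The clear, wait, then re-read sequence described above can be sketched against a simulated register. This is a sketch under stated assumptions, not the driver: struct fake_bridge, the write-1-to-clear behaviour of the status register and the settle callbacks are all hypothetical stand-ins for the regmap accesses and the usleep_range() call in the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the bridge's IRQ status register. */
struct fake_bridge {
	uint8_t irq_stat;
};

static uint8_t reg_read(const struct fake_bridge *b)
{
	return b->irq_stat;
}

/* Assumed write-1-to-clear: each bit set in val clears that status bit. */
static void reg_write_w1c(struct fake_bridge *b, uint8_t val)
{
	b->irq_stat &= (uint8_t)~val;
}

/* Settle callbacks: a healthy link latches nothing, a faulty one sets bit 0. */
static void settle_clean(struct fake_bridge *b)
{
	(void)b;
}

static void settle_faulty(struct fake_bridge *b)
{
	b->irq_stat |= 0x01;
}

/*
 * Clear whatever the init sequence latched, let the link run, then read
 * the register again. A non-zero return value is a fresh error.
 */
static uint8_t check_link_after_enable(struct fake_bridge *b,
				       void (*settle)(struct fake_bridge *))
{
	uint8_t pval = reg_read(b);

	reg_write_w1c(b, pval);
	settle(b);	/* stands in for the sleep in the driver */

	return reg_read(b);
}
```

The point of clearing first is that stale bits from initialization never reach the final read, so any value reported afterwards reflects the running link only.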
>>
>>>>>> +
>>>>>> +usleep_range(1, 12000);
>>>>>> +regmap_read(ctx->regmap, REG_IRQ_STAT, &pval);
>>>>>> +if (pval)
>>>>>> +dev_err(ctx->dev, "Unexpected link status 0x%02x\n", pval);
>>>>>
>>>>> I am not sure what is the case here but it looks like 'we do not know
>>>>> what is going on, so let's add some diagnostic messages to gather info
>>>>> and figure it out later'.
>>>> That's pretty much the case, see the two links above in the NOTE
>>>> section. If something goes wrong, we print the value for the user
>>>> (usually developer) so they can fix their problems. We cannot do much
>>>> better in the attach callback.
>>>>
>>>> The issue I ran into (and where this would be helpful information to
>>>> me during debugging, since the issue happened real seldom, see also
>>>> the NOTE links above) is that the DSI controller driver started
>>>> streaming video on the data lanes before the DSI83 had a chance to
>>>> initialize. This worked most of the time, except for a few exceptions
>>>> here and there, where the video didn't start. This does set link
>>>> status bits consistently. In the meantime, I fixed the controller
>>>> driver (so far downstream, due to ongoing discussion).
>>>
>>> Maybe drm_connector_set_link_status_property(conn,
>>> DRM_MODE_LINK_STATUS_BAD) would be useful here.
>> Hmm, this works on connector, the dsi83 is a bridge and it can be stuck
>> between two other bridges. That doesn't seem like the right tool, no ?
>>
>>>>> Whole driver lacks IRQ handler which IMO could perform better diagnosis,
>>>>> and I guess it could also help in recovery, but this is just my guess.
>>>>> So if this patch is enough for now you can add:
>>>> No, IRQ won't help you here, because by the time you get the IRQ, the
>>>> DSI host already started streaming video on data lanes and you won't
>>>> be able to correctly reinit the DSI83 unless you communicate to the
>>>> DSI host that it should switch the data lanes back to LP11.
>>>>
>>>> And for that, there is a bigger chunk missing really. What needs to be
>>>> added is a way for th

Re: [PATCH] drm: mxsfb: Fix NULL pointer dereference crash on unload

2021-09-08 Thread Marek Vasut

On 9/8/21 8:24 PM, Daniel Vetter wrote:

> On Tue, Sep 07, 2021 at 04:49:00AM +0200, Marek Vasut wrote:
>> The mxsfb->crtc.funcs may already be NULL when unloading the driver,
>> in which case calling mxsfb_irq_disable() via drm_irq_uninstall() from
>> mxsfb_unload() leads to NULL pointer dereference.
>>
>> Since all we care about is masking the IRQ and mxsfb->base is still
>> valid, just use that to clear and mask the IRQ.
>>
>> Fixes: ae1ed00932819 ("drm: mxsfb: Stop using DRM simple display pipeline helper")
>> Signed-off-by: Marek Vasut 
>> Cc: Daniel Abrecht 
>> Cc: Emil Velikov 
>> Cc: Laurent Pinchart 
>> Cc: Sam Ravnborg 
>> Cc: Stefan Agner 
>
> You probably want a drm_atomic_helper_shutdown instead of trying to do all
> that manually. We've also added a bunch more devm and drmm_ functions to
> automate the cleanup a lot more here, e.g. your drm_mode_config_cleanup is
> in the wrong place.
>
> Also I'm confused because I'm not even seeing this function anywhere in
> upstream.


It is still here:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/drivers/gpu/drm/mxsfb/mxsfb_drv.c#n171
as of:
999569d59a0aa ("Add linux-next specific files for 20210908")

Is there some other tree I should be looking at ?

