date:20220301

https://bugzilla.kernel.org/show_bug.cgi?id=215511

Philipp Riederer (pr_ker...@tum.fail) changed:

   What|Removed |Added

 CC||pr_ker...@tum.fail

--- Comment #7 from Philipp Riederer (pr_ker...@tum.fail) ---
Hi!

My Lenovo T14s (AMD) crashes with a panic (https://imgur.com/a/P6Twvov) when I
unplug/replug any monitor. This also happens when waking from DPMS.

I have bisected the issue to the same 0f591d17e36e08313b0c440b99b0e57b47e01a9a
as Jose. The patch (that is already mainlined, if I see that correctly) does
not help.

I have tried all kernel up to 5.15.24 -- I cannot try 5.16 as I use zfs as root
device the and zfs module is not (yet) compatible with 5.16.

Is there anything you would like me to try or should my issue be fixed in
5.16+?

Cheers,
Philipp

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] dt-bindings: Another pass removing cases of 'allOf' containing a '$ref'

2022-03-01 Thread Lee Jones

On Mon, 28 Feb 2022, Rob Herring wrote:

> Another pass at removing unnecessary use of 'allOf' with a '$ref'.
> 
> json-schema versions draft7 and earlier have a weird behavior in that
> any keywords combined with a '$ref' are ignored (silently). The correct
> form was to put a '$ref' under an 'allOf'. This behavior is now changed
> in the 2019-09 json-schema spec and '$ref' can be mixed with other
> keywords.
> 
> Cc: Krzysztof Kozlowski 
> Cc: Thierry Reding 
> Cc: Sam Ravnborg 
> Cc: Dmitry Torokhov 
> Cc: Pavel Machek 
> Cc: Lee Jones 
> Cc: Guenter Roeck 
> Cc: Miquel Raynal 
> Cc: Richard Weinberger 
> Cc: Vignesh Raghavendra 
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 
> Cc: Kishon Vijay Abraham I 
> Cc: Vinod Koul 
> Cc: Sebastian Reichel 
> Cc: Bjorn Andersson 
> Cc: Mathieu Poirier 
> Cc: Mark Brown 
> Cc: Greg Kroah-Hartman 
> Cc: Laurent Pinchart 
> Cc: dri-devel@lists.freedesktop.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-in...@vger.kernel.org
> Cc: linux-l...@vger.kernel.org
> Cc: linux-...@lists.infradead.org
> Cc: net...@vger.kernel.org
> Cc: linux-...@lists.infradead.org
> Cc: linux...@vger.kernel.org
> Cc: linux-remotep...@vger.kernel.org
> Cc: alsa-de...@alsa-project.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Signed-off-by: Rob Herring 
> ---
>  .../bindings/connector/usb-connector.yaml |  3 +--
>  .../bindings/display/brcm,bcm2711-hdmi.yaml   |  3 +--
>  .../bindings/display/bridge/adi,adv7511.yaml  |  5 ++---
>  .../bindings/display/bridge/synopsys,dw-hdmi.yaml |  5 ++---
>  .../bindings/display/panel/display-timings.yaml   |  3 +--
>  .../devicetree/bindings/display/ste,mcde.yaml |  4 ++--
>  .../devicetree/bindings/input/adc-joystick.yaml   |  9 -
>  .../bindings/leds/cznic,turris-omnia-leds.yaml|  3 +--
>  .../devicetree/bindings/leds/leds-lp50xx.yaml |  3 +--

>  .../devicetree/bindings/mfd/google,cros-ec.yaml   | 12 

Go for it.

Acked-by: Lee Jones 

>  .../devicetree/bindings/mtd/nand-controller.yaml  |  8 +++-
>  .../bindings/mtd/rockchip,nand-controller.yaml|  3 +--
>  .../devicetree/bindings/net/ti,cpsw-switch.yaml   |  3 +--
>  .../bindings/phy/phy-stm32-usbphyc.yaml   |  3 +--
>  .../bindings/power/supply/sbs,sbs-manager.yaml|  4 +---
>  .../bindings/remoteproc/ti,k3-r5f-rproc.yaml  |  3 +--
>  .../devicetree/bindings/soc/ti/ti,pruss.yaml  | 15 +++
>  .../devicetree/bindings/sound/st,stm32-sai.yaml   |  3 +--
>  .../devicetree/bindings/sound/tlv320adcx140.yaml  | 13 ++---
>  .../devicetree/bindings/spi/spi-controller.yaml   |  4 +---
>  .../devicetree/bindings/usb/st,stusb160x.yaml |  4 +---
>  21 files changed, 39 insertions(+), 74 deletions(-)

-- 
Lee Jones [李琼斯]
Principal Technical Lead - Developer Services
Linaro.org │ Open source software for Arm SoCs
Follow Linaro: Facebook | Twitter | Blog

Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP

2022-03-01 Thread Lucas Stach

Hi Marek, hi Liu,

Am Dienstag, dem 01.03.2022 um 10:44 +0800 schrieb Liu Ying:
> On Mon, 2022-02-28 at 16:34 +0100, Marek Vasut wrote:
> > On 2/28/22 09:18, Liu Ying wrote:
> > 
> > Hi,
> 
> Hi,
> 
> > 
> > > > > On Mon, 2022-02-28 at 01:45 +0100, Marek Vasut wrote:
> > > > > > Add compatible string for i.MX8MP LCDIF variant. This is called 
> > > > > > LCDIFv3
> > > > > > and is completely different from the LCDIFv3 found in i.MX23 in 
> > > > > > that it
> > > > > 
> > > > > In i.MX23 reference manual, there is no LCDIFv3 found, but only LCDIF.
> > > > 
> > > > See i.MX23 HW_LCDIF_VERSION MAJOR=0x3 , that's LCDIF V3 . MX28 has LCDIF
> > > > V4 .
> > > 
> > > Ok, got it now. AFAIK, the SoC design team calls i.MX8MP display
> > > controller as 'LCDIFv3'. Those in other SoCs are called 'LCDIF'.  There
> > > is not even a register in i.MX8MP display controller to decribe the
> > > version.
> > 
> > We also don't have a version register on MX6SX and we call it LCDIF V6 
> > in the driver. The naming scheme is confusing.
> 
> It looks ok for the current mxsfb drm driver to use its own version
> tracking mechanism to distinguish kinda small difference across LCDIF
> variants.  However, LCDIFv3 in i.MX8mp is a totally different IP, which
> does not apply to the tracking mechanism.
> 
> > 
> > > > > > has a completely scrambled register layout compared to all previous 
> > > > > > LCDIF
> > > > > 
> > > > > It looks like no single register of i.MX8MP LCDIFv3 overlaps with
> > > > > registers in other i.MX2x/6x/7x/8x LCDIFs. The LCDIFv3 block diagram 
> > > > > is
> > > > > totally different from the LCDIF block diagram, according to the SoC
> > > > > reference manuals. LCDIFv3 supports SHADOW_EN bit to update horizontal
> > > > > and vertical size of graphic, position of graphic on the panel, 
> > > > > address
> > > > > of graphic in memory and color formats or color palettes, which is not
> > > > > supported by LCDIF and impacts display driver control mechanism
> > > > > considerably. LCDIF supports DOTCLK interface, MPU interface and VSYNC
> > > > > interface, while LCDIFv3 only supports parallel output as a 
> > > > > counterpart
> > > > > of the DOTCLK interface.
> > > > > 
> > > > > Generally speaking, LCDIFv3 is just a new display IP which happens to
> > > > > have the word 'LCDIF' in its name.  Although both of LCDIFv3 and LCDIF
> > > > > are display controllers for scanning out frames onto display devices, 
> > > > > I
> > > > > don't think they are in one family.
> > > > > 
> > > > > So, LCDIFv3 deserves a new separate dt-binding, IMO.
> > > > 
> > > > It seems to me a lot of those bits just map to their previous
> > > > equivalents in older LCDIF, others were dropped, so this is some sort of
> > > > new LCDIF mutation, is it not ?
> > > 
> > > I say 'LCDIFv3' and 'LCDIF' are totally two IPs, if I compare the names
> > > of registers and the names of register bits .
> > > 
> > > > I am aware NXP has a separate driver in its downstream, but I'm not
> > > > convinced the duplication of boilerplate code by introducing a separate
> > > > driver for what looks like another LCDIF variant is the right approach.
> > > 
> > > Hmmm, given the two IPs, I think there should be separate drivers.
> > >   With one single driver, there would be too many 'if/else' checks to
> > > separate the logics for the IPs, just like Patch 9/9 does.  The
> > > boilerplate code to do things like registering a drm device is
> > > acceptable, IMO.
> > > 
> > > Aside from that, with separate drivers, we don't have to test too many
> > > SoCs if we only want to touch either 'LCDIFv3' or 'LCDIF'.
> > 
> > But then, with two drivers, you also might miss fixes which get applied 
> > to one driver and not the other, eventually the two drivers will diverge 
> > and that's not good.
> 
> Given the two totally different IPs, I don't see bugs of IP control
> logics should be fixed for both drivers. Naturally, the two would
> diverge due to different HWs. Looking at Patch 9/9, it basically
> squashes code to control LCDIFv3 into the mxsfb drm driver with
> 'if/else' checks(barely no common control code), which is hard to
> maintain and not able to achieve good scalability for both 'LCDIFv3'
> and 'LCDIF'.

I tend to agree with Liu here. Writing a DRM driver isn't that much
boilerplate anymore with all the helpers we have available in the
framework today.

The IP is so different from the currently supported LCDIF controllers
that I think trying to support this one in the existing driver actually
increases the chances to break something when modifying the driver in
the future. Not everyone is able to test all LCDIF versions. My vote is
on having a separate driver for the i.MX8MP LCDIF.

Regards,
Lucas

Re: [PATCH] dt-bindings: Another pass removing cases of 'allOf' containing a '$ref'

2022-03-01 Thread Vinod Koul

On 28-02-22, 15:38, Rob Herring wrote:
> Another pass at removing unnecessary use of 'allOf' with a '$ref'.
> 
> json-schema versions draft7 and earlier have a weird behavior in that
> any keywords combined with a '$ref' are ignored (silently). The correct
> form was to put a '$ref' under an 'allOf'. This behavior is now changed
> in the 2019-09 json-schema spec and '$ref' can be mixed with other
> keywords.

...

>  .../bindings/connector/usb-connector.yaml |  3 +--
>  .../bindings/display/brcm,bcm2711-hdmi.yaml   |  3 +--
>  .../bindings/display/bridge/adi,adv7511.yaml  |  5 ++---
>  .../bindings/display/bridge/synopsys,dw-hdmi.yaml |  5 ++---
>  .../bindings/display/panel/display-timings.yaml   |  3 +--
>  .../devicetree/bindings/display/ste,mcde.yaml |  4 ++--
>  .../devicetree/bindings/input/adc-joystick.yaml   |  9 -
>  .../bindings/leds/cznic,turris-omnia-leds.yaml|  3 +--
>  .../devicetree/bindings/leds/leds-lp50xx.yaml |  3 +--
>  .../devicetree/bindings/mfd/google,cros-ec.yaml   | 12 
>  .../devicetree/bindings/mtd/nand-controller.yaml  |  8 +++-
>  .../bindings/mtd/rockchip,nand-controller.yaml|  3 +--
>  .../devicetree/bindings/net/ti,cpsw-switch.yaml   |  3 +--
>  .../bindings/phy/phy-stm32-usbphyc.yaml   |  3 +--

Acked-By: Vinod Koul 

-- 
~Vinod

Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP

2022-03-01 Thread Marek Vasut


On 3/1/22 11:04, Lucas Stach wrote:

Hi,

[...]


Given the two totally different IPs, I don't see bugs of IP control
logics should be fixed for both drivers. Naturally, the two would
diverge due to different HWs. Looking at Patch 9/9, it basically
squashes code to control LCDIFv3 into the mxsfb drm driver with
'if/else' checks(barely no common control code), which is hard to
maintain and not able to achieve good scalability for both 'LCDIFv3'
and 'LCDIF'.


I tend to agree with Liu here. Writing a DRM driver isn't that much
boilerplate anymore with all the helpers we have available in the
framework today.


I did write a separate driver for this IP before I spent time merging 
them into single driver, that's when I realized a single driver is much 
better and discarded the separate driver idea.



The IP is so different from the currently supported LCDIF controllers
that I think trying to support this one in the existing driver actually
increases the chances to break something when modifying the driver in
the future. Not everyone is able to test all LCDIF versions. My vote is
on having a separate driver for the i.MX8MP LCDIF.


If you look at both controllers, it is clear it is still the LCDIF 
behind, even the CSC that is bolted on would suggest that.


I am also not happy when I look at the amount of duplication a separate 
driver would create, it will be some 50% of the code that would be just 
duplicated.


[...]

Re: [PATCH] [v2] Kbuild: move to -std=gnu11

2022-03-01 Thread Miguel Ojeda

On Mon, Feb 28, 2022 at 11:32 AM Arnd Bergmann  wrote:
>
> -under ``-std=gnu89`` [gcc-c-dialect-options]_: the GNU dialect of ISO C90
> -(including some C99 features). ``clang`` [clang]_ is also supported, see
> +under ``-std=gnu11`` [gcc-c-dialect-options]_: the GNU dialect of ISO C11
> +(including some C17 features). ``clang`` [clang]_ is also supported, see

I think the "(including some C17)" bit would not make much sense
anymore. There were no major changes in C17 and GCC implements
`-std=c11` and `-std=c17` as basically the same thing according to the
docs (and GNU extensions apply equally to both, I would assume).

When I wrote the "(including some C99 features)" I meant that GCC
implemented some C99 features as extensions in C90 mode, and the
kernel used some of those (e.g. the now gone VLAs).

With that changed, for `programming-language.rst`:

Reviewed-by: Miguel Ojeda 

Cheers,
Miguel

Re: [Intel-gfx] [PATCH 1/3] drm/i915/guc: Limit scheduling properties to avoid overflow




On 28/02/2022 18:32, John Harrison wrote:

On 2/28/2022 08:11, Tvrtko Ursulin wrote:

On 25/02/2022 17:39, John Harrison wrote:

On 2/25/2022 09:06, Tvrtko Ursulin wrote:


On 24/02/2022 19:19, John Harrison wrote:

[snip]


./gt/uc/intel_guc_fwif.h: u32 execution_quantum;

./gt/uc/intel_guc_submission.c: desc->execution_quantum = 
engine->props.timeslice_duration_ms * 1000;


./gt/intel_engine_types.h:  unsigned long 
timeslice_duration_ms;


timeslice_store/preempt_timeout_store:
err = kstrtoull(buf, 0, &duration);

So both kconfig and sysfs can already overflow GuC, not only 
because of tick conversion internally but because at backend 
level nothing was done for assigning 64-bit into 32-bit. Or I 
failed to find where it is handled.
That's why I'm adding this range check to make sure we don't 
allow overflows.


Yes and no, this fixes it, but the first bug was not only due 
GuC internal tick conversion. It was present ever since the u64 
from i915 was shoved into u32 sent to GuC. So even if GuC used 
the value without additional multiplication, bug was be there. 
My point being when GuC backend was added timeout_ms values 
should have been limited/clamped to U32_MAX. The tick discovery 
is additional limit on top.
I'm not disagreeing. I'm just saying that the truncation wasn't 
noticed until I actually tried using very long timeouts to debug 
a particular problem. Now that it is noticed, we need some method 
of range checking and this simple clamp solves all the truncation 
problems.


Agreed in principle, just please mention in the commit message all 
aspects of the problem.


I think we can get away without a Fixes: tag since it requires 
user fiddling to break things in unexpected ways.


I would though put in a code a clamping which expresses both, 
something like min(u32, ..GUC LIMIT..). So the full story is 
documented forever. Or "if > u32 || > ..GUC LIMIT..) return 
-EINVAL". Just in case GuC limit one day changes but u32 stays. 
Perhaps internal ticks go away or anything and we are left with 
plain 1:1 millisecond relationship.
Can certainly add a comment along the lines of "GuC API only takes 
a 32bit field but that is further reduced to GUC_LIMIT due to 
internal calculations which would otherwise overflow".


But if the GuC limit is > u32 then, by definition, that means the 
GuC API has changed to take a u64 instead of a u32. So there will 
no u32 truncation any more. So I'm not seeing a need to explicitly 
test the integer size when the value check covers that.


Hmm I was thinking if the internal conversion in the GuC fw changes 
so that GUC_POLICY_MAX_PREEMPT_TIMEOUT_MS goes above u32, then to be 
extra safe by documenting in code there is the additional limit of 
the data structure field. Say the field was changed to take some 
unit larger than a millisecond. Then the check against the GuC MAX 
limit define would not be enough, unless that would account both for 
internal implementation and u32 in the protocol. Maybe that is 
overdefensive but I don't see that it harms. 50-50, but it's do it 
once and forget so I'd do it.

Huh?

How can the limit be greater than a u32 if the interface only takes a 
u32? By definition the limit would be clamped to u32 size.


If you mean that the GuC policy is in different units and those units 
might not overflow but ms units do, then actually that is already the 
case. The GuC works in us not ms. That's part of why the wrap around 
is so low, we have to multiply by 1000 before sending to GuC. 
However, that is actually irrelevant because the comparison is being 
done on the i915 side in i915's units. We have to scale the GuC limit 
to match what i915 is using. And the i915 side is u64 so if the 
scaling to i915 numbers overflows a u32 then who cares because that 
comparison can be done at 64 bits wide.


If the units change then that is a backwards breaking API change that 
will require a manual driver code update. You can't just recompile 
with a new header and magically get an ms to us or ms to s conversion 
in your a = b assignment. The code will need to be changed to do the 
new unit conversion (note we already convert from ms to us, the GuC 
API is all expressed in us). And that code change will mean having to 
revisit any and all scaling, type conversions, etc. I.e. any 
pre-existing checks will not necessarily be valid and will need to be 
re-visted anyway. But as above, any scaling to GuC units has to be 
incorporated into the limit already because otherwise the limit would 
not fit in the GuC's own API.


Yes I get that, I was just worried that u32 field in the protocol and 
GUC_POLICY_MAX_EXEC_QUANTUM_MS defines are separate in the source code 
and then how to protect against forgetting to update both in sync.


Like if the protocol was changed to take nanoseconds, and firmware 
implementation changed to support the full range, but define 
left/forgotten at 100s. That would then overflow u32.
Huh? If the API was updated to 'sup

Re: [PATCH v5 5/7] drm/aspeed: Add reset and clock for AST2600

2022-03-01 Thread Joel Stanley

On Tue, 1 Mar 2022 at 07:00, Tommy Huang  wrote:
>
> Hi Joel,
>
> It seems that the reset control could keep original code behavior.
> Just change the reset define in the .dtsi file from ASPEED_RESET_CRT1 
> into ASPEED_RESET_GRAPHICS.

Right, because the ASPEED_RESET_CRT reset is released by the
ASPEED_CLK_GATE_D1CLK line?

include/dt-bindings/clock/ast2600-clock.h:#define ASPEED_RESET_CRT
 13

drivers/clk/clk-ast2600.c:  /*
clk rst  name   parent   flags */
drivers/clk/clk-ast2600.c:  [ASPEED_CLK_GATE_D1CLK] = {
10, 13, "d1clk-gate",   "d1clk", 0 >



>
> By the way, the HW controller states and FW programming register will 
> be reset by CRT reset line.
> And another part HW controller states will be reset by engine reset 
> line.

Thanks. Can we include that in the commit message for the device tree change?

>
> Thanks,
>
> By Tommy
>
> > -Original Message-
> > From: Joel Stanley 
> > Sent: Monday, February 28, 2022 5:51 PM
> > To: Tommy Huang 
> > Cc: David Airlie ; Daniel Vetter ; Rob
> > Herring ; Andrew Jeffery ;
> > linux-aspeed ; open list:DRM DRIVERS
> > ; devicetree ;
> > Linux ARM ; Linux Kernel Mailing List
> > ; BMC-SW 
> > Subject: Re: [PATCH v5 5/7] drm/aspeed: Add reset and clock for AST2600
> >
> > On Wed, 8 Dec 2021 at 01:34, Tommy Haung
> >  wrote:
> > >
> > > From: tommy-huang 
> > >
> > > Add more reset and clock select code for AST2600.
> > > The gfx_flags parameter was added for chip caps idenified.
> >
> > Can you tell me a bit more about the two reset lines:
> >
> > What is the CRT reset line controlling?
> >
> > What does the engine reset line control?
> >
> > Can we use devm_reset_control_array_get() to get whichever are specified in
> > the device tree, so we don't need to have different logic for the 2600 and
> > earlier chips?
> >
> >
> >
> > >
> > > Signed-off-by: tommy-huang 
> > > ---
> > >  drivers/gpu/drm/aspeed/aspeed_gfx.h  | 16 +++-
> > >  drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c | 16 
> > > drivers/gpu/drm/aspeed/aspeed_gfx_drv.c  | 50
> > ++--
> > >  3 files changed, 77 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/aspeed/aspeed_gfx.h
> > > b/drivers/gpu/drm/aspeed/aspeed_gfx.h
> > > index 4e6a442c3886..2c733225d3c7 100644
> > > --- a/drivers/gpu/drm/aspeed/aspeed_gfx.h
> > > +++ b/drivers/gpu/drm/aspeed/aspeed_gfx.h
> > > @@ -8,7 +8,8 @@ struct aspeed_gfx {
> > > struct drm_device   drm;
> > > void __iomem*base;
> > > struct clk  *clk;
> > > -   struct reset_control*rst;
> > > +   struct reset_control*rst_crt;
> > > +   struct reset_control*rst_engine;
> > > struct regmap   *scu;
> > >
> > > u32 dac_reg;
> > > @@ -16,6 +17,7 @@ struct aspeed_gfx {
> > > u32 vga_scratch_reg;
> > > u32 throd_val;
> > > u32 scan_line_max;
> > > +   u32 flags;
> > >
> > > struct drm_simple_display_pipe  pipe;
> > > struct drm_connectorconnector;
> > > @@ -106,3 +108,15 @@ int aspeed_gfx_create_output(struct drm_device
> > > *drm);
> > >  /* CRT_THROD */
> > >  #define CRT_THROD_LOW(x)   (x)
> > >  #define CRT_THROD_HIGH(x)  ((x) << 8)
> > > +
> > > +/* SCU control */
> > > +#define SCU_G6_CLK_COURCE  0x300
> > > +
> > > +/* GFX FLAGS */
> > > +#define RESET_MASK BIT(0)
> > > +#define RESET_G6   BIT(0)
> > > +#define CLK_MASK   BIT(4)
> > > +#define CLK_G6 BIT(4)
> > > +
> > > +#define G6_CLK_MASK(BIT(8) | BIT(9) | BIT(10))
> > > +#define G6_USB_40_CLK  BIT(9)
> > > diff --git a/drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c
> > > b/drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c
> > > index 827e62c1daba..e0975ecda92d 100644
> > > --- a/drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c
> > > +++ b/drivers/gpu/drm/aspeed/aspeed_gfx_crtc.c
> > > @@ -77,6 +77,18 @@ static void aspeed_gfx_disable_controller(struct
> > aspeed_gfx *priv)
> > > regmap_update_bits(priv->scu, priv->dac_reg, BIT(16), 0);  }
> > >
> > > +static void aspeed_gfx_set_clk(struct aspeed_gfx *priv) {
> > > +   switch (priv->flags & CLK_MASK) {
> > > +   case CLK_G6:
> > > +   regmap_update_bits(priv->scu, SCU_G6_CLK_COURCE,
> > G6_CLK_MASK, 0x0);
> > > +   regmap_update_bits(priv->scu, SCU_G6_CLK_COURCE,
> > G6_CLK_MASK, G6_USB_40_CLK);
> > > +   break;
> > > +   default:
> > > +   break;
> > > +   }
> > > +}
> > > +
> > >  static void aspeed_gfx_crtc_mode_set_nofb(struct aspeed_gfx *priv)  {
> > > struct drm_

Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP

2022-03-01 Thread Lucas Stach

Am Dienstag, dem 01.03.2022 um 11:19 +0100 schrieb Marek Vasut:
> On 3/1/22 11:04, Lucas Stach wrote:
> 
> Hi,
> 
> [...]
> 
> > > Given the two totally different IPs, I don't see bugs of IP control
> > > logics should be fixed for both drivers. Naturally, the two would
> > > diverge due to different HWs. Looking at Patch 9/9, it basically
> > > squashes code to control LCDIFv3 into the mxsfb drm driver with
> > > 'if/else' checks(barely no common control code), which is hard to
> > > maintain and not able to achieve good scalability for both 'LCDIFv3'
> > > and 'LCDIF'.
> > 
> > I tend to agree with Liu here. Writing a DRM driver isn't that much
> > boilerplate anymore with all the helpers we have available in the
> > framework today.
> 
> I did write a separate driver for this IP before I spent time merging 
> them into single driver, that's when I realized a single driver is much 
> better and discarded the separate driver idea.
> 
> > The IP is so different from the currently supported LCDIF controllers
> > that I think trying to support this one in the existing driver actually
> > increases the chances to break something when modifying the driver in
> > the future. Not everyone is able to test all LCDIF versions. My vote is
> > on having a separate driver for the i.MX8MP LCDIF.
> 
> If you look at both controllers, it is clear it is still the LCDIF 
> behind, even the CSC that is bolted on would suggest that.

Yes, but from a driver PoV what you care about is not really the
hardware blocks used to implement something, but the programming model,
i.e. the register interface exposed to software.

> 
> I am also not happy when I look at the amount of duplication a separate 
> driver would create, it will be some 50% of the code that would be just 
> duplicated.
> 
Yea, the duplicated code is still significant, as the HW itself is so
simple. However, if you find yourself in the situation where basically
every actual register access in the driver ends up being in a "if (some
HW rev) ... " clause, i still think it would be better to have a
separate driver, as the programming interface is just different.

Regards,
Lucas

Re: [PATCH 2/6] treewide: remove using list iterator after loop body as a ptr

2022-03-01 Thread Jakob Koschel




> On 1. Mar 2022, at 01:41, Linus Torvalds  
> wrote:
> 
> On Mon, Feb 28, 2022 at 1:47 PM Jakob Koschel  wrote:
>> 
>> The goal of this is to get compiler warnings right? This would indeed be 
>> great.
> 
> Yes, so I don't mind having a one-time patch that has been gathered
> using some automated checker tool, but I don't think that works from a
> long-term maintenance perspective.
> 
> So if we have the basic rule being "don't use the loop iterator after
> the loop has finished, because it can cause all kinds of subtle
> issues", then in _addition_ to fixing the existing code paths that
> have this issue, I really would want to (a) get a compiler warning for
> future cases and (b) make it not actually _work_ for future cases.
> 
> Because otherwise it will just happen again.
> 
>> Changing the list_for_each_entry() macro first will break all of those cases
>> (e.g. the ones using 'list_entry_is_head()).
> 
> So I have no problems with breaking cases that we basically already
> have a patch for due to  your automated tool. There were certainly
> more than a handful, but it didn't look _too_ bad to just make the
> rule be "don't use the iterator after the loop".
> 
> Of course, that's just based on that patch of yours. Maybe there are a
> ton of other cases that your patch didn't change, because they didn't
> match your trigger case, so I may just be overly optimistic here.

Based on the coccinelle script there are ~480 cases that need fixing
in total. I'll now finish all of them and then split them by
submodules as Greg suggested and repost a patch set per submodule.
Sounds good?

> 
> But basically to _me_, the important part is that the end result is
> maintainable longer-term. I'm more than happy to have a one-time patch
> to fix a lot of dubious cases if we can then have clean rules going
> forward.
> 
>> I assumed it is better to fix those cases first and then have a simple
>> coccinelle script changing the macro + moving the iterator into the scope
>> of the macro.
> 
> So that had been another plan of mine, until I actually looked at
> changing the macro. In the one case I looked at, it was ugly beyond
> belief.
> 
> It turns out that just syntactically, it's really nice to give the
> type of the iterator from outside the way we do now. Yeah, it may be a
> bit odd, and maybe it's partly because I'm so used to the
> "list_for_each_list_entry()" syntax, but moving the type into the loop
> construct really made it nasty - either one very complex line, or
> having to split it over two lines which was even worse.
> 
> Maybe the place I looked at just happened to have a long typename, but
> it's basically always going to be a struct, so it's never a _simple_
> type. And it just looked very odd adn unnatural to have the type as
> one of the "arguments" to that list_for_each_entry() macro.
> 
> So yes, initially my idea had been to just move the iterator entirely
> inside the macro. But specifying the type got so ugly that I think
> that
> 
>typeof (pos) pos
> 
> trick inside the macro really ends up giving us the best of all worlds:
> 
> (a) let's us keep the existing syntax and code for all the nice cases
> that did everything inside the loop anyway
> 
> (b) gives us a nice warning for any normal use-after-loop case
> (unless you explicitly initialized it like that
> sgx_mmu_notifier_release() function did for no good reason
> 
> (c) also guarantees that even if you don't get a warning,
> non-converted (or newly written) bad code won't actually _work_
> 
> so you end up getting the new rules without any ambiguity or mistaken
> 
>> With this you are no longer able to set the 'outer' pos within the list
>> iterator loop body or am I missing something?
> 
> Correct. Any assignment inside the loop will be entirely just to the
> local loop case. So any "break;" out of the loop will have to set
> another variable - like your updated patch did.
> 
>> I fail to see how this will make most of the changes in this
>> patch obsolete (if that was the intention).
> 
> I hope my explanation above clarifies my thinking: I do not dislike
> your patch, and in fact your patch is indeed required to make the new
> semantics work.

ok it's all clear now, thanks for clarifying.
I've defined all the 'tmp' iterator variables uninitialized so applying
your patch on top of that later will just give the nice compiler warning 
if they are used past the loop body.

> 
> What I disliked was always the maintainability of your patch - making
> the rules be something that isn't actually visible in the source code,
> and letting the old semantics still work as well as they ever did, and
> having to basically run some verification pass to find bad users.

Since this patch is not a complete list of cases that need fixing (30%)
I haven't included the actual change of moving the iterator variable
into the loop and thought that would be a second step coming after this
is merged.

With these changes alone, yes you still rely on manu

Re: [Intel-gfx] [PATCH 2/3] drm/i915/gt: Make the heartbeat play nice with long pre-emption timeouts




I'll trim it a bit again..

On 28/02/2022 18:55, John Harrison wrote:

On 2/28/2022 09:12, Tvrtko Ursulin wrote:

On 25/02/2022 18:48, John Harrison wrote:

On 2/25/2022 10:14, Tvrtko Ursulin wrote:


[snip]

Your only objection is that ends up with too long total time before 
reset? Or something else as well?
An unnecessarily long total heartbeat timeout is the main objection. 
(2.5 + 12) * 5 = 72.5 seconds. That is a massive change from the 
current 12.5s.


If we are happy with that huge increase then fine. But I'm pretty 
sure you are going to get a lot more bug reports about hung systems 
not recovering. 10-20s is just about long enough for someone to wait 
before leaning on the power button of their machine. Over a minute is 
not. That kind of delay is going to cause support issues.


Sorry I wrote 12s, while you actually said tP * 12, so 7.68s, chosen 
just so it is longer than tH * 3?


And how do you keep coming up with factor of five? Isn't it four 
periods before "heartbeat stopped"? (Prio normal, hearbeat, barrier 
and then reset.)

Prio starts at low not normal.


Right, slipped my mind since I only keep seeing that one priority ladder 
block in intel_engine_heartbeat.c/heartbeat()..


From the point of view of user experience I agree reasonable 
responsiveness is needed before user "reaches for the power button".


In your proposal we are talking about 3 * 2.5s + 2 * 7.5s, so 22.5s.

Question of workloads.. what is the actual preempt timeout compute is 
happy with? And I don't mean compute setups with disabled hangcheck, 
which you say they want anyway, but if we target defaults for end 
users. Do we have some numbers on what they are likely to run?
Not that I have ever seen. This is all just finger in the air stuff. I 
don't recall if we invented the number and the compute people agreed 
with it or if they proposed the number to us.


Yeah me neither. And found nothing in my email archives. :(

Thinking about it today I don't see that disabled timeout is a practical 
default.


With it, if users have something un-preemptable to run (assuming prio 
normal), it would get killed after ~13s (5 * 2.5).


If we go for my scheme it gets killed in ~17.5s (3 * (2.5 + 2.5) + 2.5 
(third pulse triggers preempt timeout)).


And if we go for your scheme it gets killed in ~22.5s (4 * 2.5 + 2 * 3 * 
2.5).


If I did not confuse any calculation this time round, then the 
differences for default case for normal priority sound pretty immaterial 
to me.


What if we gave them a default of 2.5s? That would be 4 * (2.5s + 
2.5s) = 20s worst case until reset, comparable to your proposal. Are 
there realistic workloads which are non-preemptable for 2.5s? I am 
having hard time imagining someone would run them on a general purpose 
desktop since it would mean frozen and unusable UI anyway.


Advantage still being in my mind that there would be no fudging of 
timeouts during driver load and heartbeat periods depending on 
priority. To me it feels more plausible to account for preempt timeout 
in heartbeat pulses that to calculate preempt timeout to be longer 
than hearbeat pulses, just to avoid races between the two.
Except that when the user asks for a heartbeat period of 2.5s you are 
actually setting it to 5s. How is that not a major fudge that is totally 
disregarding the user's request?


This is indeed the core question. My thinking:

It is not defined in the heartbeat ABI that N pulses should do anything, 
just that they are periodic pulses which check the health of an engine.


If we view user priority as not under our control then we can say that 
any heartbeat pulse can trigger preempt timeout and we should let it do 
that.


From that it follows that it is justified to account for preempt 
timeout in the decision when to schedule heartbeat pulses and that it is 
legitimate to do it for all of them.


It also avoids the double reset problem, regardless of the backend and 
regardless of how the user configured the timeouts. Without the need to 
fudge them neither during driver load or during sysfs store.


User has configured that heartbeat pulses should be sent every N 
seconds, yes, but I think we are free to account for inherent hardware 
and software latencies in our calculations. Especially since other than 
flawed Gen12 RCS, other engines will be much closer to the configured 
period.


It is just the same as user asking for preempt timeout N and we say on 
driver load "oh no you won't get it". Same for heartbeats, they said 
2.5s, we said 2.5s + broken engine factor...


I don't see a problem there. Nothing timing sensitive relies on the 
heartbeat interval nor we provided any guarantees.


That patch from Chris for instance AFAIR accounted for scheduling or 
context switch latencies. Because what is the point of sending further 
elevated priority pulses if we did not leave enough time to the engine 
to schedule them in, react with preemption, or signalling completion?


Persistence itself can sta

[PATCH] drm/panfrost: cleanup comments

2022-03-01 Thread trix

From: Tom Rix 

For spdx
change tab to space delimiter
Use // for *.c

Replacements
commited to committed, use multiline comment style
regsiters to registers
initialze to initialize

Signed-off-by: Tom Rix 
---
 drivers/gpu/drm/panfrost/panfrost_drv.c  | 2 +-
 drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c | 2 +-
 drivers/gpu/drm/panfrost/panfrost_issues.h   | 6 --
 drivers/gpu/drm/panfrost/panfrost_mmu.c  | 2 +-
 drivers/gpu/drm/panfrost/panfrost_regs.h | 2 +-
 5 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c 
b/drivers/gpu/drm/panfrost/panfrost_drv.c
index 96bb5a4656278..94b6f0a19c83a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c
+++ b/drivers/gpu/drm/panfrost/panfrost_drv.c
@@ -562,7 +562,7 @@ static int panfrost_probe(struct platform_device *pdev)
 
pfdev->coherent = device_get_dma_attr(&pdev->dev) == DEV_DMA_COHERENT;
 
-   /* Allocate and initialze the DRM device. */
+   /* Allocate and initialize the DRM device. */
ddev = drm_dev_alloc(&panfrost_drm_driver, &pdev->dev);
if (IS_ERR(ddev))
return PTR_ERR(ddev);
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c 
b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
index b0142341e2235..77e7cb6d1ae3b 100644
--- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
+++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: GPL-2.0 */
+// SPDX-License-Identifier: GPL-2.0
 /* Copyright (C) 2019 Arm Ltd.
  *
  * Based on msm_gem_freedreno.c:
diff --git a/drivers/gpu/drm/panfrost/panfrost_issues.h 
b/drivers/gpu/drm/panfrost/panfrost_issues.h
index 8e59d765bf19f..4e7cf979ee67a 100644
--- a/drivers/gpu/drm/panfrost/panfrost_issues.h
+++ b/drivers/gpu/drm/panfrost/panfrost_issues.h
@@ -13,8 +13,10 @@
  * to care about.
  */
 enum panfrost_hw_issue {
-   /* Need way to guarantee that all previously-translated memory accesses
-* are commited */
+   /*
+* Need way to guarantee that all previously-translated memory accesses
+* are committed
+*/
HW_ISSUE_6367,
 
/* On job complete with non-done the cache is not flushed */
diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c 
b/drivers/gpu/drm/panfrost/panfrost_mmu.c
index 39562f2d11a47..d3f82b26a631d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c
+++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c
@@ -1,4 +1,4 @@
-// SPDX-License-Identifier:GPL-2.0
+// SPDX-License-Identifier: GPL-2.0
 /* Copyright 2019 Linaro, Ltd, Rob Herring  */
 
 #include 
diff --git a/drivers/gpu/drm/panfrost/panfrost_regs.h 
b/drivers/gpu/drm/panfrost/panfrost_regs.h
index 6c5a11ef1ee87..efe4b75149d35 100644
--- a/drivers/gpu/drm/panfrost/panfrost_regs.h
+++ b/drivers/gpu/drm/panfrost/panfrost_regs.h
@@ -292,7 +292,7 @@
 #define AS_FAULTADDRESS_LO(as) (MMU_AS(as) + 0x20) /* (RO) Fault 
Address for address space n, low word */
 #define AS_FAULTADDRESS_HI(as) (MMU_AS(as) + 0x24) /* (RO) Fault 
Address for address space n, high word */
 #define AS_STATUS(as)  (MMU_AS(as) + 0x28) /* (RO) Status 
flags for address space n */
-/* Additional Bifrost AS regsiters */
+/* Additional Bifrost AS registers */
 #define AS_TRANSCFG_LO(as) (MMU_AS(as) + 0x30) /* (RW) Translation 
table configuration for address space n, low word */
 #define AS_TRANSCFG_HI(as) (MMU_AS(as) + 0x34) /* (RW) Translation 
table configuration for address space n, high word */
 #define AS_FAULTEXTRA_LO(as)   (MMU_AS(as) + 0x38) /* (RO) Secondary 
fault address for address space n, low word */
-- 
2.26.3

[GIT PULL v2] drm/tegra: Changes for v5.18-rc1

2022-03-01 Thread Thierry Reding

Hi Dave, Daniel,

The following changes since commit 8913e1aea4b32a866343b14e565c62cec54f3f78:

  drm/tegra: dpaux: Populate AUX bus (2022-02-23 13:00:37 +0100)

are available in the Git repository at:

  https://gitlab.freedesktop.org/drm/tegra.git tags/drm/tegra/for-5.18-rc1

for you to fetch changes up to cf5086d35d8c7c2b9cb1ca34590097a5f2f8b588:

  drm/tegra: Support YVYU, VYUY and YU24 formats (2022-03-01 11:13:09 +0100)

Thanks,
Thierry


drm/tegra: Changes for v5.18-rc1

This contains a couple more minor fixes that didn't seem urgent enough
for v5.17. On top of that this improves YUV format support on older
chips.


Christophe JAILLET (2):
  gpu: host1x: Fix an error handling path in 'host1x_probe()'
  gpu: host1x: Fix a memory leak in 'host1x_remove()'

Dmitry Osipenko (1):
  drm/tegra: Use dev_err_probe()

Miaoqian Lin (1):
  drm/tegra: Fix reference leak in tegra_dsi_ganged_probe

Thierry Reding (3):
  drm/tegra: Fix planar formats on Tegra186 and later
  drm/tegra: Support semi-planar formats on Tegra114+
  drm/tegra: Support YVYU, VYUY and YU24 formats

chiminghao (1):
  drm/tegra: dpaux: Remove unneeded variable

 drivers/gpu/drm/tegra/dc.c| 50 ++---
 drivers/gpu/drm/tegra/dc.h|  7 +
 drivers/gpu/drm/tegra/dpaux.c |  3 +-
 drivers/gpu/drm/tegra/dsi.c   |  4 ++-
 drivers/gpu/drm/tegra/hdmi.c  | 34 ++--
 drivers/gpu/drm/tegra/hub.c   | 24 --
 drivers/gpu/drm/tegra/plane.c | 73 ++-
 drivers/gpu/drm/tegra/plane.h |  2 +-
 drivers/gpu/host1x/dev.c  |  8 +++--
 9 files changed, 140 insertions(+), 65 deletions(-)

Re: [GIT PULL] drm/tegra: Changes for v5.18-rc1

2022-03-01 Thread Thierry Reding

On Mon, Feb 28, 2022 at 04:51:22PM +1000, Dave Airlie wrote:
> Hi Thierry,
> 
> dim: d65e338027e7 ("gpu: host1x: Fix an error handling path in
> 'host1x_probe()'"): SHA1 in fixes line not found:
> dim: e3166698a8a0 ("drm/tegra: Implement buffer object cache")
> 
> not the same as
> 
>  1f39b1dfa53c84b56d7ad37fed44afda7004959d
> Author: Thierry Reding 
> Date:   Fri Feb 7 16:50:52 2020 +0100
> 
> drm/tegra: Implement buffer object cache
> 
> Please fix that up.

Good catch. I sent up an updated version of the PR.

Thanks,
Thierry


signature.asc
Description: PGP signature

Re: [PATCH] drm/bridge: it6505: Fix the read buffer array bound

Applied to drm-misc-next.

Re: [PATCH] drm/bridge: anx7625: Fix release wrong workqueue

Applied to drm-misc-next.

Re: [PATCH v2] drm/bridge: Clear the DP_AUX_I2C_MOT bit passed in aux read command.

Applied to drm-misc-next.

Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP

2022-03-01 Thread Adam Ford

On Tue, Mar 1, 2022 at 5:05 AM Lucas Stach  wrote:
>
> Am Dienstag, dem 01.03.2022 um 11:19 +0100 schrieb Marek Vasut:
> > On 3/1/22 11:04, Lucas Stach wrote:
> >
> > Hi,
> >
> > [...]
> >
> > > > Given the two totally different IPs, I don't see bugs of IP control
> > > > logics should be fixed for both drivers. Naturally, the two would
> > > > diverge due to different HWs. Looking at Patch 9/9, it basically
> > > > squashes code to control LCDIFv3 into the mxsfb drm driver with
> > > > 'if/else' checks(barely no common control code), which is hard to
> > > > maintain and not able to achieve good scalability for both 'LCDIFv3'
> > > > and 'LCDIF'.
> > >
> > > I tend to agree with Liu here. Writing a DRM driver isn't that much
> > > boilerplate anymore with all the helpers we have available in the
> > > framework today.
> >
> > I did write a separate driver for this IP before I spent time merging
> > them into single driver, that's when I realized a single driver is much
> > better and discarded the separate driver idea.
> >
> > > The IP is so different from the currently supported LCDIF controllers
> > > that I think trying to support this one in the existing driver actually
> > > increases the chances to break something when modifying the driver in
> > > the future. Not everyone is able to test all LCDIF versions. My vote is
> > > on having a separate driver for the i.MX8MP LCDIF.
> >
> > If you look at both controllers, it is clear it is still the LCDIF
> > behind, even the CSC that is bolted on would suggest that.
>
> Yes, but from a driver PoV what you care about is not really the
> hardware blocks used to implement something, but the programming model,
> i.e. the register interface exposed to software.
>
> >
> > I am also not happy when I look at the amount of duplication a separate
> > driver would create, it will be some 50% of the code that would be just
> > duplicated.
> >
> Yea, the duplicated code is still significant, as the HW itself is so
> simple. However, if you find yourself in the situation where basically
> every actual register access in the driver ends up being in a "if (some
> HW rev) ... " clause, i still think it would be better to have a
> separate driver, as the programming interface is just different.

I tend to agree with Marek on this one.  We have an instance where the
blk-ctrl and the GPC driver between 8m, mini, nano, plus are close,
but different enough where each SoC has it's own set of tables and
some checks.   Lucas created the framework, and others adapted it for
various SoC's.  If there really is nearly 50% common code for the
LCDIF, why not either leave the driver as one or split the common code
into its own driver like lcdif-common and then have smaller drivers
that handle their specific variations.

adam
>
> Regards,
> Lucas
>
>

Re: [PATCH] drm/bridge: chipone-icn6211: switch to devm_drm_of_get_bridge

Applied to drm-misc-next.

Re: [PATCH 1/9] dt-bindings: mxsfb: Add compatible for i.MX8MP

2022-03-01 Thread Lucas Stach

Am Dienstag, dem 01.03.2022 um 07:03 -0600 schrieb Adam Ford:
> On Tue, Mar 1, 2022 at 5:05 AM Lucas Stach  wrote:
> > 
> > Am Dienstag, dem 01.03.2022 um 11:19 +0100 schrieb Marek Vasut:
> > > On 3/1/22 11:04, Lucas Stach wrote:
> > > 
> > > Hi,
> > > 
> > > [...]
> > > 
> > > > > Given the two totally different IPs, I don't see bugs of IP control
> > > > > logics should be fixed for both drivers. Naturally, the two would
> > > > > diverge due to different HWs. Looking at Patch 9/9, it basically
> > > > > squashes code to control LCDIFv3 into the mxsfb drm driver with
> > > > > 'if/else' checks(barely no common control code), which is hard to
> > > > > maintain and not able to achieve good scalability for both 'LCDIFv3'
> > > > > and 'LCDIF'.
> > > > 
> > > > I tend to agree with Liu here. Writing a DRM driver isn't that much
> > > > boilerplate anymore with all the helpers we have available in the
> > > > framework today.
> > > 
> > > I did write a separate driver for this IP before I spent time merging
> > > them into single driver, that's when I realized a single driver is much
> > > better and discarded the separate driver idea.
> > > 
> > > > The IP is so different from the currently supported LCDIF controllers
> > > > that I think trying to support this one in the existing driver actually
> > > > increases the chances to break something when modifying the driver in
> > > > the future. Not everyone is able to test all LCDIF versions. My vote is
> > > > on having a separate driver for the i.MX8MP LCDIF.
> > > 
> > > If you look at both controllers, it is clear it is still the LCDIF
> > > behind, even the CSC that is bolted on would suggest that.
> > 
> > Yes, but from a driver PoV what you care about is not really the
> > hardware blocks used to implement something, but the programming model,
> > i.e. the register interface exposed to software.
> > 
> > > 
> > > I am also not happy when I look at the amount of duplication a separate
> > > driver would create, it will be some 50% of the code that would be just
> > > duplicated.
> > > 
> > Yea, the duplicated code is still significant, as the HW itself is so
> > simple. However, if you find yourself in the situation where basically
> > every actual register access in the driver ends up being in a "if (some
> > HW rev) ... " clause, i still think it would be better to have a
> > separate driver, as the programming interface is just different.
> 
> I tend to agree with Marek on this one.  We have an instance where the
> blk-ctrl and the GPC driver between 8m, mini, nano, plus are close,
> but different enough where each SoC has it's own set of tables and
> some checks.   Lucas created the framework, and others adapted it for
> various SoC's.  If there really is nearly 50% common code for the
> LCDIF, why not either leave the driver as one or split the common code
> into its own driver like lcdif-common and then have smaller drivers
> that handle their specific variations.

I don't know exactly how the standalone driver looks like, but I guess
the overlap is not really in any real HW specific parts, but the common
DRM boilerplate, so there isn't much point in creating a common lcdif
driver.

As you brought up the blk-ctrl as an example: I'm all for supporting
slightly different hardware in the same driver, as long as the HW
interface is close enough. But then I also opted for a separate 8MP
blk-ctrl driver for those blk-ctrls that differ significantly from the
others, as I think it would make the common driver unmaintainable
trying to support all the different variants in one driver.

Regards,
Lucas

Re: [PATCH v7 10/24] drm/rockchip: dw_hdmi: Add support for hclk

2022-03-01 Thread Dmitry Osipenko

On 3/1/22 11:37, Sascha Hauer wrote:
> On Tue, Mar 01, 2022 at 01:56:59AM +0300, Dmitry Osipenko wrote:
>> On 2/28/22 17:19, Sascha Hauer wrote:
>>> On Fri, Feb 25, 2022 at 02:11:54PM +0100, Sascha Hauer wrote:
 On Fri, Feb 25, 2022 at 12:41:23PM +, Robin Murphy wrote:
> On 2022-02-25 11:10, Dmitry Osipenko wrote:
>> 25.02.2022 13:49, Sascha Hauer пишет:
>>> On Fri, Feb 25, 2022 at 01:26:14PM +0300, Dmitry Osipenko wrote:
 25.02.2022 10:51, Sascha Hauer пишет:
> The rk3568 HDMI has an additional clock that needs to be enabled for 
> the
> HDMI controller to work. The purpose of that clock is not clear. It is
> named "hclk" in the downstream driver, so use the same name.
>
> Signed-off-by: Sascha Hauer 
> ---
>
> Notes:
>  Changes since v5:
>  - Use devm_clk_get_optional rather than devm_clk_get
>
>   drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 16 
>   1 file changed, 16 insertions(+)
>
> diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c 
> b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
> index fe4f9556239ac..c6c00e8779ab5 100644
> --- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
> +++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
> @@ -76,6 +76,7 @@ struct rockchip_hdmi {
>   const struct rockchip_hdmi_chip_data *chip_data;
>   struct clk *ref_clk;
>   struct clk *grf_clk;
> + struct clk *hclk_clk;
>   struct dw_hdmi *hdmi;
>   struct regulator *avdd_0v9;
>   struct regulator *avdd_1v8;
> @@ -229,6 +230,14 @@ static int rockchip_hdmi_parse_dt(struct 
> rockchip_hdmi *hdmi)
>   return PTR_ERR(hdmi->grf_clk);
>   }
> + hdmi->hclk_clk = devm_clk_get_optional(hdmi->dev, "hclk");
> + if (PTR_ERR(hdmi->hclk_clk) == -EPROBE_DEFER) {

 Have you tried to investigate the hclk? I'm still thinking that's not
 only HDMI that needs this clock and then the hardware description
 doesn't look correct.
>>>
>>> I am still not sure what you mean. Yes, it's not only the HDMI that
>>> needs this clock. The VOP2 needs it as well and the driver handles that.
>>
>> I'm curious whether DSI/DP also need that clock to be enabled. If they
>> do, then you aren't modeling h/w properly AFAICS.
>
> Assuming nobody at Rockchip decided to make things needlessly inconsistent
> with previous SoCs, HCLK_VOP should be the clock for the VOP's AHB slave
> interface. Usually, if that affected anything other than accessing VOP
> registers, indeed it would smell of something being wrong in the clock 
> tree,
> but in this case I'd also be suspicious of whether it might have ended up
> clocking related GRF registers as well (either directly, or indirectly via
> some gate that the clock driver hasn't modelled yet).

 Ok, I am beginning to understand. I verified that hdmi, mipi and dp are
 hanging when HCLK_VOP is disabled by disabling that clock via sysfs
 using CLOCK_ALLOW_WRITE_DEBUGFS. When it's disabled then the registers
 of that units can't be accessed. However, when I disable HCLK_VOP by
 directly writing to the gate bit RK3568_CLKGATE_CON(20) then only
 accessing VOP registers hangs, the other units stay functional.
 So it seems it must be the parent clock which must be enabled. The
 parent clock is hclk_vo. This clock should be handled as part of the
 RK3568_PD_VO power domain:

power-domain@RK3568_PD_VO {
 reg = ;
 clocks = <&cru HCLK_VO>,
  <&cru PCLK_VO>,
  <&cru ACLK_VOP_PRE>;
  pm_qos = <&qos_hdcp>,
   <&qos_vop_m0>,
   <&qos_vop_m1>;
  #power-domain-cells = <0>;
 };
>>>
>>> Forget this. The clocks in this node are only enabled during enabling or
>>> disabling the power domain, they are disabled again immediately afterwards.
>>>
>>> OK, I need HCLK_VO to access the HDMI registers. I verified that by
>>> disabling HCLK_VO at register level (CRU_GATE_CON(20) BIT(1)). The
>>> HDMI registers become inaccessible then. This means I'll replace
>>> HCLK_VOP in the HDMI node with HCLK_VO. Does this sound sane?
>>
>> The RK3568_PD_VO already has HCLK_VO and the domain should be
>> auto-enabled before HDMI registers are accessed,
> 
> As said, the clocks given in the power domain are only enabled during
> the process of enabling/disabling the power domain and are disabled
> again directly afterwards:
> 
>>  if (rockchip_pmu_domain_is_on(pd) != power_on) {
> 
> They are enabled here:
> 
>>  ret = clk_bulk_enable(pd

Re: [PATCH v7 10/24] drm/rockchip: dw_hdmi: Add support for hclk

2022-03-01 Thread Robin Murphy


On 2022-02-28 14:19, Sascha Hauer wrote:

On Fri, Feb 25, 2022 at 02:11:54PM +0100, Sascha Hauer wrote:

On Fri, Feb 25, 2022 at 12:41:23PM +, Robin Murphy wrote:

On 2022-02-25 11:10, Dmitry Osipenko wrote:

25.02.2022 13:49, Sascha Hauer пишет:

On Fri, Feb 25, 2022 at 01:26:14PM +0300, Dmitry Osipenko wrote:

25.02.2022 10:51, Sascha Hauer пишет:

The rk3568 HDMI has an additional clock that needs to be enabled for the
HDMI controller to work. The purpose of that clock is not clear. It is
named "hclk" in the downstream driver, so use the same name.

Signed-off-by: Sascha Hauer 
---

Notes:
  Changes since v5:
  - Use devm_clk_get_optional rather than devm_clk_get

   drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c | 16 
   1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c 
b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
index fe4f9556239ac..c6c00e8779ab5 100644
--- a/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
+++ b/drivers/gpu/drm/rockchip/dw_hdmi-rockchip.c
@@ -76,6 +76,7 @@ struct rockchip_hdmi {
const struct rockchip_hdmi_chip_data *chip_data;
struct clk *ref_clk;
struct clk *grf_clk;
+   struct clk *hclk_clk;
struct dw_hdmi *hdmi;
struct regulator *avdd_0v9;
struct regulator *avdd_1v8;
@@ -229,6 +230,14 @@ static int rockchip_hdmi_parse_dt(struct rockchip_hdmi 
*hdmi)
return PTR_ERR(hdmi->grf_clk);
}
+   hdmi->hclk_clk = devm_clk_get_optional(hdmi->dev, "hclk");
+   if (PTR_ERR(hdmi->hclk_clk) == -EPROBE_DEFER) {


Have you tried to investigate the hclk? I'm still thinking that's not
only HDMI that needs this clock and then the hardware description
doesn't look correct.


I am still not sure what you mean. Yes, it's not only the HDMI that
needs this clock. The VOP2 needs it as well and the driver handles that.


I'm curious whether DSI/DP also need that clock to be enabled. If they
do, then you aren't modeling h/w properly AFAICS.


Assuming nobody at Rockchip decided to make things needlessly inconsistent
with previous SoCs, HCLK_VOP should be the clock for the VOP's AHB slave
interface. Usually, if that affected anything other than accessing VOP
registers, indeed it would smell of something being wrong in the clock tree,
but in this case I'd also be suspicious of whether it might have ended up
clocking related GRF registers as well (either directly, or indirectly via
some gate that the clock driver hasn't modelled yet).


Ok, I am beginning to understand. I verified that hdmi, mipi and dp are
hanging when HCLK_VOP is disabled by disabling that clock via sysfs
using CLOCK_ALLOW_WRITE_DEBUGFS. When it's disabled then the registers
of that units can't be accessed. However, when I disable HCLK_VOP by
directly writing to the gate bit RK3568_CLKGATE_CON(20) then only
accessing VOP registers hangs, the other units stay functional.
So it seems it must be the parent clock which must be enabled. The
parent clock is hclk_vo. This clock should be handled as part of the
RK3568_PD_VO power domain:

power-domain@RK3568_PD_VO {
 reg = ;
 clocks = <&cru HCLK_VO>,
  <&cru PCLK_VO>,
  <&cru ACLK_VOP_PRE>;
  pm_qos = <&qos_hdcp>,
   <&qos_vop_m0>,
   <&qos_vop_m1>;
  #power-domain-cells = <0>;
 };


Forget this. The clocks in this node are only enabled during enabling or
disabling the power domain, they are disabled again immediately afterwards.

OK, I need HCLK_VO to access the HDMI registers. I verified that by
disabling HCLK_VO at register level (CRU_GATE_CON(20) BIT(1)). The
HDMI registers become inaccessible then. This means I'll replace
HCLK_VOP in the HDMI node with HCLK_VO. Does this sound sane?


Well, it's still a mystery hack overall, and in some ways it seems even 
more suspect to be claiming a whole branch of the clock tree rather than 
a leaf gate with a specific purpose. I'm really starting to think that 
the underlying issue here is a bug in the clock driver, or a hardware 
mishap that should logically be worked around by the clock driver, 
rather than individual the consumers.


Does it work if you hack the clock driver to think that PCLK_VO is a 
child of HCLK_VO? Even if that's not technically true, it would seem to 
effectively match the observed behaviour (i.e. all 3 things whose 
register access apparently *should* be enabled by a gate off PCLK_VO, 
seem to also require HCLK_VO).


Thanks,
Robin.

[PATCH v2 2/8] drm: bridge: nwl-dsi: Switch to devm_drm_of_get_bridge

devm_drm_of_get_bridge is capable of looking up the downstream
bridge and panel and trying to add a panel bridge if the panel
is found.

Replace explicit finding calls with devm_drm_of_get_bridge.

Cc: Guido Günther 
Signed-off-by: Jagan Teki 
---
Changes for v2:
- split the patch

 drivers/gpu/drm/bridge/nwl-dsi.c | 18 +++---
 1 file changed, 3 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/bridge/nwl-dsi.c b/drivers/gpu/drm/bridge/nwl-dsi.c
index 30aacd939dc3..c9e108a7eca2 100644
--- a/drivers/gpu/drm/bridge/nwl-dsi.c
+++ b/drivers/gpu/drm/bridge/nwl-dsi.c
@@ -916,22 +916,10 @@ static int nwl_dsi_bridge_attach(struct drm_bridge 
*bridge,
 {
struct nwl_dsi *dsi = bridge_to_dsi(bridge);
struct drm_bridge *panel_bridge;
-   struct drm_panel *panel;
-   int ret;
-
-   ret = drm_of_find_panel_or_bridge(dsi->dev->of_node, 1, 0, &panel,
- &panel_bridge);
-   if (ret)
-   return ret;
-
-   if (panel) {
-   panel_bridge = drm_panel_bridge_add(panel);
-   if (IS_ERR(panel_bridge))
-   return PTR_ERR(panel_bridge);
-   }
 
-   if (!panel_bridge)
-   return -EPROBE_DEFER;
+   panel_bridge = devm_drm_of_get_bridge(dsi->dev, dsi->dev->of_node, 1, 
0);
+   if (IS_ERR(panel_bridge))
+   return PTR_ERR(panel_bridge);
 
return drm_bridge_attach(bridge->encoder, panel_bridge, bridge, flags);
 }
-- 
2.25.1

[PATCH v2 1/8] Revert "drm/bridge: dw-mipi-dsi: Find the possible DSI devices"

This reverts commit c206c7faeb3263a7cc7b4de443a3877cd7a5e74b.

In order to avoid any probe ordering issues, the I2C based downstream
bridge drivers now register and attach the DSI devices at the probe
instead of doing it on drm_bridge_function.attach().

Examples of those commits are:

commit <6ef7ee48765f> ("drm/bridge: sn65dsi83: Register and attach our
DSI device at probe")
commit  ("drm/bridge: lt8912b: Register and attach our DSI
device at probe")
commit <864c49a31d6b> ("drm/bridge: adv7511: Register and attach our DSI
device at probe")

dw-mipi-dsi has panel or bridge finding code based on previous downstream
bridges, so revert the same and make the panel or bridge funding in host
attach as before.

Signed-off-by: Jagan Teki 
---
Changes for v2:
- none

 drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c | 58 +--
 1 file changed, 15 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c 
b/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c
index 11d20b8638cd..1cc912b6e1f8 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c
@@ -246,7 +246,6 @@ struct dw_mipi_dsi {
 
struct clk *pclk;
 
-   bool device_found;
unsigned int lane_mbps; /* per lane */
u32 channel;
u32 lanes;
@@ -310,37 +309,13 @@ static inline u32 dsi_read(struct dw_mipi_dsi *dsi, u32 
reg)
return readl(dsi->base + reg);
 }
 
-static int dw_mipi_dsi_panel_or_bridge(struct dw_mipi_dsi *dsi,
-  struct device_node *node)
-{
-   struct drm_bridge *bridge;
-   struct drm_panel *panel;
-   int ret;
-
-   ret = drm_of_find_panel_or_bridge(node, 1, 0, &panel, &bridge);
-   if (ret)
-   return ret;
-
-   if (panel) {
-   bridge = drm_panel_bridge_add_typed(panel,
-   DRM_MODE_CONNECTOR_DSI);
-   if (IS_ERR(bridge))
-   return PTR_ERR(bridge);
-   }
-
-   dsi->panel_bridge = bridge;
-
-   if (!dsi->panel_bridge)
-   return -EPROBE_DEFER;
-
-   return 0;
-}
-
 static int dw_mipi_dsi_host_attach(struct mipi_dsi_host *host,
   struct mipi_dsi_device *device)
 {
struct dw_mipi_dsi *dsi = host_to_dsi(host);
const struct dw_mipi_dsi_plat_data *pdata = dsi->plat_data;
+   struct drm_bridge *bridge;
+   struct drm_panel *panel;
int ret;
 
if (device->lanes > dsi->plat_data->max_data_lanes) {
@@ -354,14 +329,22 @@ static int dw_mipi_dsi_host_attach(struct mipi_dsi_host 
*host,
dsi->format = device->format;
dsi->mode_flags = device->mode_flags;
 
-   if (!dsi->device_found) {
-   ret = dw_mipi_dsi_panel_or_bridge(dsi, host->dev->of_node);
-   if (ret)
-   return ret;
+   ret = drm_of_find_panel_or_bridge(host->dev->of_node, 1, 0,
+ &panel, &bridge);
+   if (ret)
+   return ret;
 
-   dsi->device_found = true;
+   if (panel) {
+   bridge = drm_panel_bridge_add_typed(panel,
+   DRM_MODE_CONNECTOR_DSI);
+   if (IS_ERR(bridge))
+   return PTR_ERR(bridge);
}
 
+   dsi->panel_bridge = bridge;
+
+   drm_bridge_add(&dsi->bridge);
+
if (pdata->host_ops && pdata->host_ops->attach) {
ret = pdata->host_ops->attach(pdata->priv_data, device);
if (ret < 0)
@@ -1021,16 +1004,6 @@ static int dw_mipi_dsi_bridge_attach(struct drm_bridge 
*bridge,
/* Set the encoder type as caller does not know it */
bridge->encoder->encoder_type = DRM_MODE_ENCODER_DSI;
 
-   if (!dsi->device_found) {
-   int ret;
-
-   ret = dw_mipi_dsi_panel_or_bridge(dsi, dsi->dev->of_node);
-   if (ret)
-   return ret;
-
-   dsi->device_found = true;
-   }
-
/* Attach the panel-bridge to the dsi bridge */
return drm_bridge_attach(bridge->encoder, dsi->panel_bridge, bridge,
 flags);
@@ -1217,7 +1190,6 @@ __dw_mipi_dsi_probe(struct platform_device *pdev,
 #ifdef CONFIG_OF
dsi->bridge.of_node = pdev->dev.of_node;
 #endif
-   drm_bridge_add(&dsi->bridge);
 
return dsi;
 }
-- 
2.25.1

[PATCH v2 3/8] drm: mediatek: mtk_dsi: Switch to devm_drm_of_get_bridge

devm_drm_of_get_bridge is capable of looking up the downstream
bridge and panel and trying to add a panel bridge if the panel
is found.

Replace explicit finding calls with devm_drm_of_get_bridge.

Cc: Chun-Kuang Hu 
Cc: Philipp Zabel 
Signed-off-by: Jagan Teki 
---
Changes for v2:
- split the patch

 drivers/gpu/drm/mediatek/mtk_dsi.c | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/mediatek/mtk_dsi.c 
b/drivers/gpu/drm/mediatek/mtk_dsi.c
index 5d90d2eb0019..a1b3e1f4b497 100644
--- a/drivers/gpu/drm/mediatek/mtk_dsi.c
+++ b/drivers/gpu/drm/mediatek/mtk_dsi.c
@@ -1004,7 +1004,6 @@ static int mtk_dsi_probe(struct platform_device *pdev)
 {
struct mtk_dsi *dsi;
struct device *dev = &pdev->dev;
-   struct drm_panel *panel;
struct resource *regs;
int irq_num;
int ret;
@@ -1021,17 +1020,10 @@ static int mtk_dsi_probe(struct platform_device *pdev)
return ret;
}
 
-   ret = drm_of_find_panel_or_bridge(dev->of_node, 0, 0,
- &panel, &dsi->next_bridge);
-   if (ret)
+   dsi->next_bridge = devm_drm_of_get_bridge(dev, dev->of_node, 0, 0);
+   if (IS_ERR(dsi->next_bridge)) {
+   ret = PTR_ERR(dsi->next_bridge);
goto err_unregister_host;
-
-   if (panel) {
-   dsi->next_bridge = devm_drm_panel_bridge_add(dev, panel);
-   if (IS_ERR(dsi->next_bridge)) {
-   ret = PTR_ERR(dsi->next_bridge);
-   goto err_unregister_host;
-   }
}
 
dsi->driver_data = of_device_get_match_data(dev);
-- 
2.25.1

[PATCH v2 4/8] drm: bridge: dw-mipi-dsi: Switch to devm_drm_of_get_bridge

devm_drm_of_get_bridge is capable of looking up the downstream
bridge and panel and trying to add a panel bridge if the panel
is found.

Replace explicit finding calls with devm_drm_of_get_bridge.

Signed-off-by: Jagan Teki 
---
Changes for v2:
- split the patch

 drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c 
b/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c
index 1cc912b6e1f8..b2efecf7d160 100644
--- a/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c
+++ b/drivers/gpu/drm/bridge/synopsys/dw-mipi-dsi.c
@@ -315,7 +315,6 @@ static int dw_mipi_dsi_host_attach(struct mipi_dsi_host 
*host,
struct dw_mipi_dsi *dsi = host_to_dsi(host);
const struct dw_mipi_dsi_plat_data *pdata = dsi->plat_data;
struct drm_bridge *bridge;
-   struct drm_panel *panel;
int ret;
 
if (device->lanes > dsi->plat_data->max_data_lanes) {
@@ -329,17 +328,9 @@ static int dw_mipi_dsi_host_attach(struct mipi_dsi_host 
*host,
dsi->format = device->format;
dsi->mode_flags = device->mode_flags;
 
-   ret = drm_of_find_panel_or_bridge(host->dev->of_node, 1, 0,
- &panel, &bridge);
-   if (ret)
-   return ret;
-
-   if (panel) {
-   bridge = drm_panel_bridge_add_typed(panel,
-   DRM_MODE_CONNECTOR_DSI);
-   if (IS_ERR(bridge))
-   return PTR_ERR(bridge);
-   }
+   bridge = devm_drm_of_get_bridge(dsi->dev, dsi->dev->of_node, 1, 0);
+   if (IS_ERR(bridge))
+   return PTR_ERR(bridge);
 
dsi->panel_bridge = bridge;
 
-- 
2.25.1

[PATCH v2 5/8] drm: bridge: nxp-ptn3460: Switch to devm_drm_of_get_bridge

devm_drm_of_get_bridge is capable of looking up the downstream
bridge and panel and trying to add a panel bridge if the panel
is found.

Replace explicit finding calls with devm_drm_of_get_bridge.

Signed-off-by: Jagan Teki 
---
Changes for v2:
- split the patch

 drivers/gpu/drm/bridge/nxp-ptn3460.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/bridge/nxp-ptn3460.c 
b/drivers/gpu/drm/bridge/nxp-ptn3460.c
index e941c1132598..1ab91f4e057b 100644
--- a/drivers/gpu/drm/bridge/nxp-ptn3460.c
+++ b/drivers/gpu/drm/bridge/nxp-ptn3460.c
@@ -263,7 +263,6 @@ static int ptn3460_probe(struct i2c_client *client,
struct device *dev = &client->dev;
struct ptn3460_bridge *ptn_bridge;
struct drm_bridge *panel_bridge;
-   struct drm_panel *panel;
int ret;
 
ptn_bridge = devm_kzalloc(dev, sizeof(*ptn_bridge), GFP_KERNEL);
@@ -271,11 +270,7 @@ static int ptn3460_probe(struct i2c_client *client,
return -ENOMEM;
}
 
-   ret = drm_of_find_panel_or_bridge(dev->of_node, 0, 0, &panel, NULL);
-   if (ret)
-   return ret;
-
-   panel_bridge = devm_drm_panel_bridge_add(dev, panel);
+   panel_bridge = devm_drm_of_get_bridge(dev, dev->of_node, 0, 0);
if (IS_ERR(panel_bridge))
return PTR_ERR(panel_bridge);
 
-- 
2.25.1

[PATCH v2 6/8] drm: bridge: parade-ps8622: Switch to devm_drm_of_get_bridge

devm_drm_of_get_bridge is capable of looking up the downstream
bridge and panel and trying to add a panel bridge if the panel
is found.

Replace explicit finding calls with devm_drm_of_get_bridge.

Signed-off-by: Jagan Teki 
---
Changes for v2:
- split the patch

 drivers/gpu/drm/bridge/parade-ps8622.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/bridge/parade-ps8622.c 
b/drivers/gpu/drm/bridge/parade-ps8622.c
index 614b19f0f1b7..37b308850b4e 100644
--- a/drivers/gpu/drm/bridge/parade-ps8622.c
+++ b/drivers/gpu/drm/bridge/parade-ps8622.c
@@ -452,18 +452,13 @@ static int ps8622_probe(struct i2c_client *client,
struct device *dev = &client->dev;
struct ps8622_bridge *ps8622;
struct drm_bridge *panel_bridge;
-   struct drm_panel *panel;
int ret;
 
ps8622 = devm_kzalloc(dev, sizeof(*ps8622), GFP_KERNEL);
if (!ps8622)
return -ENOMEM;
 
-   ret = drm_of_find_panel_or_bridge(dev->of_node, 0, 0, &panel, NULL);
-   if (ret)
-   return ret;
-
-   panel_bridge = devm_drm_panel_bridge_add(dev, panel);
+   panel_bridge = devm_drm_of_get_bridge(dev, dev->of_node, 0, 0);
if (IS_ERR(panel_bridge))
return PTR_ERR(panel_bridge);
 
-- 
2.25.1

[PATCH v2 7/8] drm: bridge: anx7625: Switch to devm_drm_of_get_bridge

devm_drm_of_get_bridge is capable of looking up the downstream
bridge and panel and trying to add a panel bridge if the panel
is found.

Replace explicit finding calls with devm_drm_of_get_bridge.

Signed-off-by: Jagan Teki 
---
Changes for v2:
- split the patch

 drivers/gpu/drm/bridge/analogix/anx7625.c | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c 
b/drivers/gpu/drm/bridge/analogix/anx7625.c
index 9aab879a8851..f7c911104464 100644
--- a/drivers/gpu/drm/bridge/analogix/anx7625.c
+++ b/drivers/gpu/drm/bridge/analogix/anx7625.c
@@ -1606,8 +1606,6 @@ static int anx7625_parse_dt(struct device *dev,
struct anx7625_platform_data *pdata)
 {
struct device_node *np = dev->of_node, *ep0;
-   struct drm_panel *panel;
-   int ret;
int bus_type, mipi_lanes;
 
anx7625_get_swing_setting(dev, pdata);
@@ -1644,16 +1642,7 @@ static int anx7625_parse_dt(struct device *dev,
if (of_property_read_bool(np, "analogix,audio-enable"))
pdata->audio_en = 1;
 
-   ret = drm_of_find_panel_or_bridge(np, 1, 0, &panel, NULL);
-   if (ret < 0) {
-   if (ret == -ENODEV)
-   return 0;
-   return ret;
-   }
-   if (!panel)
-   return -ENODEV;
-
-   pdata->panel_bridge = devm_drm_panel_bridge_add(dev, panel);
+   pdata->panel_bridge = devm_drm_of_get_bridge(dev, dev->of_node, 1, 0);
if (IS_ERR(pdata->panel_bridge))
return PTR_ERR(pdata->panel_bridge);
DRM_DEV_DEBUG_DRIVER(dev, "get panel node.\n");
-- 
2.25.1

[PATCH v2 8/8] drm: bridge: anx7625: Switch to devm_drm_of_get_bridge

devm_drm_of_get_bridge is capable of looking up the downstream
bridge and panel and trying to add a panel bridge if the panel
is found.

Replace explicit finding calls with devm_drm_of_get_bridge.

Cc: Linus Walleij 
Signed-off-by: Jagan Teki 
---
Changes for v2:
- split the patch

 drivers/gpu/drm/mcde/mcde_dsi.c | 39 +
 1 file changed, 5 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/mcde/mcde_dsi.c b/drivers/gpu/drm/mcde/mcde_dsi.c
index 5651734ce977..9371349b8b25 100644
--- a/drivers/gpu/drm/mcde/mcde_dsi.c
+++ b/drivers/gpu/drm/mcde/mcde_dsi.c
@@ -1073,9 +1073,7 @@ static int mcde_dsi_bind(struct device *dev, struct 
device *master,
struct drm_device *drm = data;
struct mcde *mcde = to_mcde(drm);
struct mcde_dsi *d = dev_get_drvdata(dev);
-   struct device_node *child;
-   struct drm_panel *panel = NULL;
-   struct drm_bridge *bridge = NULL;
+   struct drm_bridge *bridge;
 
if (!of_get_available_child_count(dev->of_node)) {
dev_info(dev, "unused DSI interface\n");
@@ -1100,37 +1098,10 @@ static int mcde_dsi_bind(struct device *dev, struct 
device *master,
return PTR_ERR(d->lp_clk);
}
 
-   /* Look for a panel as a child to this node */
-   for_each_available_child_of_node(dev->of_node, child) {
-   panel = of_drm_find_panel(child);
-   if (IS_ERR(panel)) {
-   dev_err(dev, "failed to find panel try bridge (%ld)\n",
-   PTR_ERR(panel));
-   panel = NULL;
-
-   bridge = of_drm_find_bridge(child);
-   if (!bridge) {
-   dev_err(dev, "failed to find bridge\n");
-   return -EINVAL;
-   }
-   }
-   }
-   if (panel) {
-   bridge = drm_panel_bridge_add_typed(panel,
-   DRM_MODE_CONNECTOR_DSI);
-   if (IS_ERR(bridge)) {
-   dev_err(dev, "error adding panel bridge\n");
-   return PTR_ERR(bridge);
-   }
-   dev_info(dev, "connected to panel\n");
-   d->panel = panel;
-   } else if (bridge) {
-   /* TODO: AV8100 HDMI encoder goes here for example */
-   dev_info(dev, "connected to non-panel bridge (unsupported)\n");
-   return -ENODEV;
-   } else {
-   dev_err(dev, "no panel or bridge\n");
-   return -ENODEV;
+   bridge = devm_drm_of_get_bridge(dev, dev->of_node, 0, 0);
+   if (IS_ERR(bridge)) {
+   dev_err(dev, "error to get bridge\n");
+   return PTR_ERR(bridge);
}
 
d->bridge_out = bridge;
-- 
2.25.1

[Bug 215511] Dual monitor with amd 5700 causes system to hang at startup.

https://bugzilla.kernel.org/show_bug.cgi?id=215511

--- Comment #8 from Alex Deucher (alexdeuc...@gmail.com) ---
(In reply to Philipp Riederer from comment #7)
> Hi!
> 
> My Lenovo T14s (AMD) crashes with a panic (https://imgur.com/a/P6Twvov) when
> I unplug/replug any monitor. This also happens when waking from DPMS.
> 
> I have bisected the issue to the same
> 0f591d17e36e08313b0c440b99b0e57b47e01a9a as Jose. The patch (that is already
> mainlined, if I see that correctly) does not help.
> 
> I have tried all kernel up to 5.15.24 -- I cannot try 5.16 as I use zfs as
> root device the and zfs module is not (yet) compatible with 5.16.
> 
> Is there anything you would like me to try or should my issue be fixed in
> 5.16+?

Please open a new ticket as this is a different issue.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] drm/i915: Add a DP1.2 compatible way to read LTTPR capabilities

2022-03-01 Thread Ville Syrjälä

On Mon, Feb 28, 2022 at 10:12:34PM +0200, Imre Deak wrote:
> At least some DELL monitors (P2715Q) with DPCD_REV 1.2 return corrupted
> DPCD register values when reading from the 0xF- LTTPR range with an
> AUX transaction block size bigger than 1. The DP standard requires 0 to
> be returned - as for any other reserved/invalid addresses - but these
> monitors return the DPCD_REV register value repeated in each byte of the
> read buffer. This will in turn corrupt the values returned by the LTTPRs
> between the source and the monitor: LTTPRs must adjust the values they
> read from the downstream DPRX, for instance left-shift/init the
> downstream DP_PHY_REPEATER_CNT value. Since the value returned by the
> monitor's DPRX is non-zero the adjusted values will be corrupt.
> 
> Reading the LTTPR registers one-by-one instead of reading all of them
> with a single AUX transfer works around the issue.
> 
> According to the DP standard's 0xF register description:
> "LTTPR-related registers at DPCD Addresses Fh through F02FFh are
> valid only for DPCD r1.4 (or higher)." While it's unclear if DPCD r1.4
> refers to the DPCD_REV or to the
> LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV register (tickets filed
> at the VESA site to clarify this haven't been addressed), one
> possibility is that it's a restriction due to non-compliant monitors
> described above. Disabling the non-transparent LTTPR mode for all such
> monitors is not a viable solution: the transparent LTTPR mode has its
> own issue causing link training failures and this would affect a lot of
> monitors in use with DPCD_REV < 1.4. Instead this patch works around
> the problem by reading the LTTPR common and PHY cap registers one-by-one
> for any monitor with a DPCD_REV < 1.4.
> 
> The standard requires the DPCD capabilites to be read after the LTTPR
> common capabilities are read, so re-read the DPCD capabilities after
> the LTTPR common and PHY caps were read out.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/4531
> Signed-off-by: Imre Deak 
> ---
>  drivers/gpu/drm/dp/drm_dp.c   | 58 ---
>  .../drm/i915/display/intel_dp_link_training.c | 30 +++---
>  include/drm/dp/drm_dp_helper.h|  2 +
>  3 files changed, 59 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/gpu/drm/dp/drm_dp.c b/drivers/gpu/drm/dp/drm_dp.c
> index 703972ae14c64..f3950d42980f9 100644
> --- a/drivers/gpu/drm/dp/drm_dp.c
> +++ b/drivers/gpu/drm/dp/drm_dp.c
> @@ -2390,9 +2390,36 @@ int drm_dp_dsc_sink_supported_input_bpcs(const u8 
> dsc_dpcd[DP_DSC_RECEIVER_CAP_S
>  }
>  EXPORT_SYMBOL(drm_dp_dsc_sink_supported_input_bpcs);
>  
> +static int drm_dp_read_lttpr_regs(struct drm_dp_aux *aux, const u8 
> dpcd[DP_RECEIVER_CAP_SIZE], int address,
> +   u8 *buf, int buf_size)
> +{
> + /*
> +  * Some monitors with a DPCD_REV < 0x14 return corrupted values when
> +  * reading from the 0xF- range with a block size bigger than 1.
> +  */

This sounds really scary. Have we checked what other registers might
end up corrupted? Eg. couple of rounds of comparing full dd bs=1 vs. 
dd bs=16.

> + int block_size = dpcd[DP_DPCD_REV] < 0x14 ? 1 : buf_size;
> + int offset = 0;
> + int ret;
> +
> + while (offset < buf_size) {

Can we use a for loop?

> + ret = drm_dp_dpcd_read(aux,
> +address + offset,
> +&buf[offset], block_size);
> + if (ret < 0)
> + return ret;
> +
> + WARN_ON(ret != block_size);
> +
> + offset += block_size;
> + }
> +
> + return 0;
> +}
> +

-- 
Ville Syrjälä
Intel

Re: [PATCH] drm/i915: Depend on !PREEMPT_RT.

On 28/02/2022 10:35, Sebastian Andrzej Siewior wrote:

On 2022-02-28 10:10:48 [+], Tvrtko Ursulin wrote:

Hi,

Could you paste a link to the queue of i915 patches pending for a quick
overview of how much work there is and in what areas?

Last post to the list:

https://https://lkml.kernel.org/r/.kernel.org/all/20211214140301.520464-1-bige...@linutronix.de/

or if you look at the DRM section in

https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches/series?h=v5.17-rc6-rt10-patches#n156

Thanks!

you see:
0003-drm-i915-Use-preempt_disable-enable_rt-where-recomme.patch
0004-drm-i915-Don-t-disable-interrupts-on-PREEMPT_RT-duri.patch

Two for the display folks.

0005-drm-i915-Don-t-check-for-atomic-context-on-PREEMPT_R.patch

What do preempt_disable/enable do on PREEMPT_RT? Thinking if instead the
solution could be to always force the !ATOMIC path (for the whole
_wait_for_atomic macro) on PREEMPT_RT.

0006-drm-i915-Disable-tracing-points-on-PREEMPT_RT.patch

If the issue is only with certain trace points why disable all?

0007-drm-i915-skip-DRM_I915_LOW_LEVEL_TRACEPOINTS-with-NO.patch

Didn't quite fully understand, why is this not fixable? Especially
thinking if the option of not blanket disabling all tracepoints in the
previous patch.

0008-drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch

Not sure about why cond_resched was put between irq_work_queue and
irq_work_sync - would it not be like-for-like change to have the two
together? Commit message makes me think _queue already starts the
handler on x86 at least.

0009-drm-i915-gt-Use-spin_lock_irq-instead-of-local_irq_d.patch

I think this is okay. The part after the unlock is serialized by the
tasklet already.

Slight doubt due the comment:

local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */

Makes me want to think about it harder but not now.

Another thing to check is if execlists_context_status_change still needs
the atomic notifier chain with this change.

0010-drm-i915-Drop-the-irqs_disabled-check.patch

LGTM.

Revert-drm-i915-Depend-on-PREEMPT_RT.patch

Okay.

And finally for this very patch (the thread I am replying to):

Acked-by: Tvrtko Ursulin

Regards,

Tvrtko

and you could view them from

https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/tree/patches?h=v5.17-rc6-rt10-patches

Also, I assume due absence of ARCH_SUPPORTS_RT being defined by any arch,
that something more is not yet ready?

Correct. Looking at what I have queued for the next merge window I have
less than 20 patches (excluding i915 and printk) before ARCH_SUPPORTS_RT
can be enabled for x86-64.

Regards,

Tvrtko

Sebastian

[Bug 215511] Dual monitor with amd 5700 causes system to hang at startup.

https://bugzilla.kernel.org/show_bug.cgi?id=215511

--- Comment #9 from Philipp Riederer (pr_ker...@tum.fail) ---
Certainly. Thank you!

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 215648] New: amdgpu: Changing monitor configuration (plug/unplug/wake from DPMS) causes kernel panic

https://bugzilla.kernel.org/show_bug.cgi?id=215648

Bug ID: 215648
   Summary: amdgpu: Changing monitor configuration
(plug/unplug/wake from DPMS) causes kernel panic
   Product: Drivers
   Version: 2.5
Kernel Version: 5.15.12
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Video(DRI - non Intel)
  Assignee: drivers_video-...@kernel-bugs.osdl.org
  Reporter: pr_ker...@tum.fail
Regression: No

Hi!

My Lenovo T14s (AMD) crashes with a panic (https://imgur.com/a/P6Twvov) when I
unplug/replug any monitor. This also happens when an external display wakes
from DPMS.

I have bisected the issue to the same 0f591d17e36e08313b0c440b99b0e57b47e01a9a
as Jose Mestre did in #215511. The patch proposed there (that is already
mainlined, if I see that correctly) does not help. Alex Deucher asked me to
open this as a new bug.

I have tried all kernel up to 5.15.24 -- I cannot try 5.16 as I use zfs as root
device the and zfs module is not (yet) compatible with 5.16.

Is there anything you would like me to try or should my issue be fixed in
5.16+?

Cheers,
Philipp

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v7, 07/15] media: mtk-vcodec: Refactor supported vdec formats and framesizes

2022-03-01 Thread Nicolas Dufresne

Le mercredi 23 février 2022 à 11:40 +0800, Yunfei Dong a écrit :
> Supported output and capture format types for mt8192 are different
> with mt8183. Needs to get format types according to decoder capability.

This patch is both refactoring and changing the behaviour. Can you please split
the non-functional changes from the functional one. This ensure we can proceed
with a good review of the functional changes.

regards,
Nicolas

> 
> Signed-off-by: Yunfei Dong 
> ---
>  .../platform/mtk-vcodec/mtk_vcodec_dec.c  |   8 +-
>  .../mtk-vcodec/mtk_vcodec_dec_stateful.c  |  13 +-
>  .../mtk-vcodec/mtk_vcodec_dec_stateless.c | 117 +-
>  .../platform/mtk-vcodec/mtk_vcodec_drv.h  |  13 +-
>  4 files changed, 107 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> index 304f5afbd419..bae43938ee37 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec.c
> @@ -26,7 +26,7 @@ mtk_vdec_find_format(struct v4l2_format *f,
>   const struct mtk_video_fmt *fmt;
>   unsigned int k;
>  
> - for (k = 0; k < dec_pdata->num_formats; k++) {
> + for (k = 0; k < *dec_pdata->num_formats; k++) {
>   fmt = &dec_pdata->vdec_formats[k];
>   if (fmt->fourcc == f->fmt.pix_mp.pixelformat)
>   return fmt;
> @@ -525,7 +525,7 @@ static int vidioc_enum_framesizes(struct file *file, void 
> *priv,
>   if (fsize->index != 0)
>   return -EINVAL;
>  
> - for (i = 0; i < dec_pdata->num_framesizes; ++i) {
> + for (i = 0; i < *dec_pdata->num_framesizes; ++i) {
>   if (fsize->pixel_format != dec_pdata->vdec_framesizes[i].fourcc)
>   continue;
>  
> @@ -564,7 +564,7 @@ static int vidioc_enum_fmt(struct v4l2_fmtdesc *f, void 
> *priv,
>   const struct mtk_video_fmt *fmt;
>   int i, j = 0;
>  
> - for (i = 0; i < dec_pdata->num_formats; i++) {
> + for (i = 0; i < *dec_pdata->num_formats; i++) {
>   if (output_queue &&
>   dec_pdata->vdec_formats[i].type != MTK_FMT_DEC)
>   continue;
> @@ -577,7 +577,7 @@ static int vidioc_enum_fmt(struct v4l2_fmtdesc *f, void 
> *priv,
>   ++j;
>   }
>  
> - if (i == dec_pdata->num_formats)
> + if (i == *dec_pdata->num_formats)
>   return -EINVAL;
>  
>   fmt = &dec_pdata->vdec_formats[i];
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateful.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateful.c
> index 7966c132be8f..3f33beb9c551 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateful.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateful.c
> @@ -37,7 +37,9 @@ static const struct mtk_video_fmt mtk_video_formats[] = {
>   },
>  };
>  
> -#define NUM_FORMATS ARRAY_SIZE(mtk_video_formats)
> +static const unsigned int num_supported_formats =
> + ARRAY_SIZE(mtk_video_formats);
> +
>  #define DEFAULT_OUT_FMT_IDX 0
>  #define DEFAULT_CAP_FMT_IDX 3
>  
> @@ -59,7 +61,8 @@ static const struct mtk_codec_framesizes 
> mtk_vdec_framesizes[] = {
>   },
>  };
>  
> -#define NUM_SUPPORTED_FRAMESIZE ARRAY_SIZE(mtk_vdec_framesizes)
> +static const unsigned int num_supported_framesize =
> + ARRAY_SIZE(mtk_vdec_framesizes);
>  
>  /*
>   * This function tries to clean all display buffers, the buffers will return
> @@ -235,7 +238,7 @@ static void mtk_vdec_update_fmt(struct mtk_vcodec_ctx 
> *ctx,
>   unsigned int k;
>  
>   dst_q_data = &ctx->q_data[MTK_Q_DATA_DST];
> - for (k = 0; k < NUM_FORMATS; k++) {
> + for (k = 0; k < num_supported_formats; k++) {
>   fmt = &mtk_video_formats[k];
>   if (fmt->fourcc == pixelformat) {
>   mtk_v4l2_debug(1, "Update cap fourcc(%d -> %d)",
> @@ -617,11 +620,11 @@ const struct mtk_vcodec_dec_pdata mtk_vdec_8173_pdata = 
> {
>   .ctrls_setup = mtk_vcodec_dec_ctrls_setup,
>   .vdec_vb2_ops = &mtk_vdec_frame_vb2_ops,
>   .vdec_formats = mtk_video_formats,
> - .num_formats = NUM_FORMATS,
> + .num_formats = &num_supported_formats,
>   .default_out_fmt = &mtk_video_formats[DEFAULT_OUT_FMT_IDX],
>   .default_cap_fmt = &mtk_video_formats[DEFAULT_CAP_FMT_IDX],
>   .vdec_framesizes = mtk_vdec_framesizes,
> - .num_framesizes = NUM_SUPPORTED_FRAMESIZE,
> + .num_framesizes = &num_supported_framesize,
>   .worker = mtk_vdec_worker,
>   .flush_decoder = mtk_vdec_flush_decoder,
>   .is_subdev_supported = false,
> diff --git a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c 
> b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> index 6d481410bf89..e51d935bd21d 100644
> --- a/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stateless.c
> +++ b/drivers/media/platform/mtk-vcodec/mtk_vcodec_dec_stat

Re: [v2 2/4] drm/edid: parse multiple CEA extension block

2022-03-01 Thread Ville Syrjälä

On Tue, Mar 01, 2022 at 04:12:14PM +0800, Lee Shawn C wrote:
> Try to find and parse more CEA ext blocks if edid->extensions
> is greater than one.
> 
> v2: split prvious patch to two. And do CEA block parsing
> in this one.
> 
> Cc: Jani Nikula 
> Cc: Ville Syrjala 
> Cc: Ankit Nautiyal 
> Signed-off-by: Lee Shawn C 
> ---
>  drivers/gpu/drm/drm_edid.c | 70 +-
>  1 file changed, 38 insertions(+), 32 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c
> index 375e70d9de86..e2cfde02f837 100644
> --- a/drivers/gpu/drm/drm_edid.c
> +++ b/drivers/gpu/drm/drm_edid.c
> @@ -4319,47 +4319,53 @@ static void drm_parse_y420cmdb_bitmap(struct 
> drm_connector *connector,
>  static int
>  add_cea_modes(struct drm_connector *connector, struct edid *edid)
>  {
> - const u8 *cea, *db, *hdmi = NULL, *video = NULL;
> - u8 dbl, hdmi_len, video_len = 0;
> + const u8 *cea;
>   int modes = 0, ext_index = 0;
>  
> - cea = drm_find_cea_extension(edid, &ext_index);
> - if (cea && cea_revision(cea) >= 3) {
> - int i, start, end;
> + for (;;) {
> + cea = drm_find_cea_extension(edid, &ext_index);
> + if (!cea)
> + break;
>  
> - if (cea_db_offsets(cea, &start, &end))
> - return 0;
> + if (cea && cea_revision(cea) >= 3) {
> + const u8 *db, *hdmi = NULL, *video = NULL;
> + u8 dbl, hdmi_len, video_len = 0;
> + int i, start, end;

This amount of indentation is pretty horrible. Pls reverse
this if-clause to get rid of one level. Should also make
the diff much nicer.

>  
> - for_each_cea_db(cea, i, start, end) {
> - db = &cea[i];
> - dbl = cea_db_payload_len(db);
> + if (cea_db_offsets(cea, &start, &end))
> + continue;
>  
> - if (cea_db_tag(db) == VIDEO_BLOCK) {
> - video = db + 1;
> - video_len = dbl;
> - modes += do_cea_modes(connector, video, dbl);
> - } else if (cea_db_is_hdmi_vsdb(db)) {
> - hdmi = db;
> - hdmi_len = dbl;
> - } else if (cea_db_is_y420vdb(db)) {
> - const u8 *vdb420 = &db[2];
> -
> - /* Add 4:2:0(only) modes present in EDID */
> - modes += do_y420vdb_modes(connector,
> -   vdb420,
> -   dbl - 1);
> + for_each_cea_db(cea, i, start, end) {
> + db = &cea[i];
> + dbl = cea_db_payload_len(db);
> +
> + if (cea_db_tag(db) == VIDEO_BLOCK) {
> + video = db + 1;
> + video_len = dbl;
> + modes += do_cea_modes(connector, video, 
> dbl);
> + } else if (cea_db_is_hdmi_vsdb(db)) {
> + hdmi = db;
> + hdmi_len = dbl;
> + } else if (cea_db_is_y420vdb(db)) {
> + const u8 *vdb420 = &db[2];
> +
> + /* Add 4:2:0(only) modes present in 
> EDID */
> + modes += do_y420vdb_modes(connector,
> +   vdb420,
> +   dbl - 1);
> + }
>   }
> +
> + /*
> +  * We parse the HDMI VSDB after having added the cea 
> modes as we will
> +  * be patching their flags when the sink supports 
> stereo 3D.
> +  */
> + if (hdmi)
> + modes += do_hdmi_vsdb_modes(connector, hdmi, 
> hdmi_len, video,
> + video_len);
>   }
>   }
>  
> - /*
> -  * We parse the HDMI VSDB after having added the cea modes as we will
> -  * be patching their flags when the sink supports stereo 3D.
> -  */
> - if (hdmi)
> - modes += do_hdmi_vsdb_modes(connector, hdmi, hdmi_len, video,
> - video_len);
> -
>   return modes;
>  }
>  
> -- 
> 2.31.1

-- 
Ville Syrjälä
Intel

Re: [PATCH 3/3] drm/msm: Expose client engine utilization via fdinfo




On 28/02/2022 16:01, Rob Clark wrote:

On Mon, Feb 28, 2022 at 6:33 AM Tvrtko Ursulin
 wrote:



On 25/02/2022 22:14, Rob Clark wrote:

On Fri, Feb 25, 2022 at 12:25 PM Rob Clark  wrote:


From: Rob Clark 

Similar to AMD commit
874442541133 ("drm/amdgpu: Add show_fdinfo() interface"), using the
infrastructure added in previous patches, we add basic client info
and GPU engine utilisation for msm.

Example output:

  # cat /proc/`pgrep glmark2`/fdinfo/6
  pos:0
  flags:  0242
  mnt_id: 21
  ino:162
  drm-driver: msm
  drm-client-id:  7
  drm-engine-gpu: 1734371319 ns
  drm-cycles-gpu: 1153645024


Nice, so my vendor agnostic actually worked (with that single fixup of
accounting for the fact pdev tag is optional)?


Note that it might be useful to have a standardized way to report # of
cycles and max freq, so userspace tool can derive %utilization in
addition to just %busy


How do you define %utilisation vs %busy - I don't exactly follow since I
see the two as same?


so, say you are running at 50% of max clk, and gpu is busy 70% of the
time.  The utilization is only 35% because the gpu could scale up the
clk to get more work done.


Got it, thanks. I don't think we have the equivalent on the i915 side 
(we do have global frequency reporting via perf/PMU). In general things 
like these I imagined would be defined as driver specific tags. If you 
look at the drm-usage-stats.rst in my series, there is a "Driver 
specific implementations" section in there which links to the i915 doc 
section.


I've also put a "big fat" comment into the i915 fdinfo fops vfunc 
pointing back to drm-usage-stats.rst which I think is useful when large 
teams work on a driver. Not sure if that applies to msm so just mentioning.


Since this all works for you, would you mind applying your ack against 
20220222140422.1121163-9-tvrtko.ursu...@linux.intel.com?


I need to get some updates r-b's for my series and then I submit it 
again to Dave and Daniel for final acks.


After that, for a 2nd/follow-up phase, I plan to re-surrect the amdgpu 
patch I had to make it compliant to common format, plus document the 
option of engine utilisation tag being in percentage units as exposed 
from that driver. And extending gputop to support that as well.


Regards,

Tvrtko


Looking at your patch I guess I don't understand the difference between
'elapsed' and 'cycles' inside your retire_submit(). Both are scoped to a
single context and are not global? If 'elapsed' is time context has
spent on the GPU, cycles isn't the same just in a different unit?


Correct, we capture (from GPU cmdstream) two counters both before and
after a submit (aka execbuf) runs, one is a fixed-rate counter, which
gives us elapsed time.  The second is a counter that increments every
clk cycle, which gives us the # of cycles.  With the two values, we
can calculate GPU frequency.

BR,
-R


Regards,

Tvrtko

Re: [PATCH v7, 03/15] media: mtk-vcodec: get capture queue buffer size from scp

2022-03-01 Thread Nicolas Dufresne

Thanks for your patch, though perhaps it could be improved, see comment below.

Le mercredi 23 février 2022 à 11:39 +0800, Yunfei Dong a écrit :
> Different capture buffer format has different buffer size, need to get
> real buffer size according to buffer type from scp.
> 
> Signed-off-by: Yunfei Dong 
> ---
>  .../media/platform/mtk-vcodec/vdec_ipi_msg.h  | 36 ++
>  .../media/platform/mtk-vcodec/vdec_vpu_if.c   | 49 +++
>  .../media/platform/mtk-vcodec/vdec_vpu_if.h   | 15 ++
>  3 files changed, 100 insertions(+)
> 
> diff --git a/drivers/media/platform/mtk-vcodec/vdec_ipi_msg.h 
> b/drivers/media/platform/mtk-vcodec/vdec_ipi_msg.h
> index bf54d6d9a857..47070be2a991 100644
> --- a/drivers/media/platform/mtk-vcodec/vdec_ipi_msg.h
> +++ b/drivers/media/platform/mtk-vcodec/vdec_ipi_msg.h
> @@ -20,6 +20,7 @@ enum vdec_ipi_msgid {
>   AP_IPIMSG_DEC_RESET = 0xA004,
>   AP_IPIMSG_DEC_CORE = 0xA005,
>   AP_IPIMSG_DEC_CORE_END = 0xA006,
> + AP_IPIMSG_DEC_GET_PARAM = 0xA007,
>  
>   VPU_IPIMSG_DEC_INIT_ACK = 0xB000,
>   VPU_IPIMSG_DEC_START_ACK = 0xB001,
> @@ -28,6 +29,7 @@ enum vdec_ipi_msgid {
>   VPU_IPIMSG_DEC_RESET_ACK = 0xB004,
>   VPU_IPIMSG_DEC_CORE_ACK = 0xB005,
>   VPU_IPIMSG_DEC_CORE_END_ACK = 0xB006,
> + VPU_IPIMSG_DEC_GET_PARAM_ACK = 0xB007,
>  };
>  
>  /**
> @@ -114,4 +116,38 @@ struct vdec_vpu_ipi_init_ack {
>   uint32_t inst_id;
>  };
>  
> +/**
> + * struct vdec_ap_ipi_get_param - for AP_IPIMSG_DEC_GET_PARAM
> + * @msg_id   : AP_IPIMSG_DEC_GET_PARAM
> + * @inst_id : instance ID. Used if the ABI version >= 2.
> + * @data : picture information
> + * @param_type   : get param type
> + * @codec_type   : Codec fourcc
> + */
> +struct vdec_ap_ipi_get_param {
> + u32 msg_id;
> + u32 inst_id;
> + u32 data[4];
> + u32 param_type;
> + u32 codec_type;
> +};
> +
> +/**
> + * struct vdec_vpu_ipi_get_param_ack - for VPU_IPIMSG_DEC_GET_PARAM_ACK
> + * @msg_id   : VPU_IPIMSG_DEC_GET_PARAM_ACK
> + * @status   : VPU execution result
> + * @ap_inst_addr : AP vcodec_vpu_inst instance address
> + * @data : picture information from SCP.
> + * @param_type   : get param type
> + * @reserved : reserved param
> + */
> +struct vdec_vpu_ipi_get_param_ack {
> + u32 msg_id;
> + s32 status;
> + u64 ap_inst_addr;
> + u32 data[4];
> + u32 param_type;
> + u32 reserved;
> +};
> +
>  #endif
> diff --git a/drivers/media/platform/mtk-vcodec/vdec_vpu_if.c 
> b/drivers/media/platform/mtk-vcodec/vdec_vpu_if.c
> index 7210061c772f..35f4d5583084 100644
> --- a/drivers/media/platform/mtk-vcodec/vdec_vpu_if.c
> +++ b/drivers/media/platform/mtk-vcodec/vdec_vpu_if.c
> @@ -6,6 +6,7 @@
>  
>  #include "mtk_vcodec_drv.h"
>  #include "mtk_vcodec_util.h"
> +#include "vdec_drv_if.h"
>  #include "vdec_ipi_msg.h"
>  #include "vdec_vpu_if.h"
>  #include "mtk_vcodec_fw.h"
> @@ -54,6 +55,26 @@ static void handle_init_ack_msg(const struct 
> vdec_vpu_ipi_init_ack *msg)
>   }
>  }
>  
> +static void handle_get_param_msg_ack(const struct vdec_vpu_ipi_get_param_ack 
> *msg)
> +{
> + struct vdec_vpu_inst *vpu = (struct vdec_vpu_inst *)
> + (unsigned long)msg->ap_inst_addr;
> +
> + mtk_vcodec_debug(vpu, "+ ap_inst_addr = 0x%llx", msg->ap_inst_addr);
> +
> + /* param_type is enum vdec_get_param_type */
> + switch (msg->param_type) {
> + case GET_PARAM_PIC_INFO:
> + vpu->fb_sz[0] = msg->data[0];
> + vpu->fb_sz[1] = msg->data[1];
> + break;
> + default:
> + mtk_vcodec_err(vpu, "invalid get param type=%d", 
> msg->param_type);
> + vpu->failure = 1;
> + break;
> + }
> +}
> +
>  /*
>   * vpu_dec_ipi_handler - Handler for VPU ipi message.
>   *
> @@ -89,6 +110,9 @@ static void vpu_dec_ipi_handler(void *data, unsigned int 
> len, void *priv)
>   case VPU_IPIMSG_DEC_CORE_END_ACK:
>   break;
>  
> + case VPU_IPIMSG_DEC_GET_PARAM_ACK:
> + handle_get_param_msg_ack(data);
> + break;
>   default:
>   mtk_vcodec_err(vpu, "invalid msg=%X", msg->msg_id);
>   break;
> @@ -217,6 +241,31 @@ int vpu_dec_start(struct vdec_vpu_inst *vpu, uint32_t 
> *data, unsigned int len)
>   return err;
>  }
>  
> +int vpu_dec_get_param(struct vdec_vpu_inst *vpu, uint32_t *data,
> +   unsigned int len, unsigned int param_type)
> +{
> + struct vdec_ap_ipi_get_param msg;
> + int err;
> +
> + mtk_vcodec_debug_enter(vpu);
> +
> + if (len > ARRAY_SIZE(msg.data)) {
> + mtk_vcodec_err(vpu, "invalid len = %d\n", len);
> + return -EINVAL;
> + }
> +
> + memset(&msg, 0, sizeof(msg));
> + msg.msg_id = AP_IPIMSG_DEC_GET_PARAM;
> + msg.inst_id = vpu->inst_id;
> + memcpy(msg.data, data, sizeof(unsigned int) * len);

Re: [PATCH] [v2] Kbuild: move to -std=gnu11

2022-03-01 Thread Arnd Bergmann

On Tue, Mar 1, 2022 at 11:43 AM Miguel Ojeda
 wrote:
>
> On Mon, Feb 28, 2022 at 11:32 AM Arnd Bergmann  wrote:
> >
> > -under ``-std=gnu89`` [gcc-c-dialect-options]_: the GNU dialect of ISO C90
> > -(including some C99 features). ``clang`` [clang]_ is also supported, see
> > +under ``-std=gnu11`` [gcc-c-dialect-options]_: the GNU dialect of ISO C11
> > +(including some C17 features). ``clang`` [clang]_ is also supported, see
>
> I think the "(including some C17)" bit would not make much sense
> anymore. There were no major changes in C17 and GCC implements
> `-std=c11` and `-std=c17` as basically the same thing according to the
> docs (and GNU extensions apply equally to both, I would assume).

Ok, changed now.

> When I wrote the "(including some C99 features)" I meant that GCC
> implemented some C99 features as extensions in C90 mode, and the
> kernel used some of those (e.g. the now gone VLAs).

I suppose it's still true for some c2x features (static_assert, fallthrough,
binary literals, ...), but it seems easier to just leave it out.

> With that changed, for `programming-language.rst`:
>
> Reviewed-by: Miguel Ojeda 

Thanks.

Re: [PATCH] [v2] Kbuild: move to -std=gnu11

2022-03-01 Thread Arnd Bergmann

On Mon, Feb 28, 2022 at 10:41 PM Fangrui Song  wrote:
> >
> >More precisely, the semantics of "extern inline" functions changed
> >between ISO C90 and ISO C99.
>
> Perhaps a clearer explanation to readers is: "extern inline" and "inline" swap
> semantics with gnu_inline (-fgnu89-inline or __attribute__((__gnu_inline__))).
>
> >That's the only concern I have, which I doubt is an issue. The kernel
> >is already covered by the function attribute as you note.
> >
> >Just to have some measure:
> >$ git grep -rn "extern inline" | wc -l
> >116
>
> "^inline" behaves like C99+ "extern inline"
>
> Agree this is handled by
>
>  #define inline inline __gnu_inline __inline_maybe_unused notrace
>

Ok, I've reworded it again, but kept it a bit shorter, I don't think we
need the full explanation in this patch description in the end.

Thanks,

  Arnd

Re: [PATCH] drm/vc4: add tracepoints for CL submissions

2022-03-01 Thread Melissa Wen

On 02/25, Maxime Ripard wrote:
> Hi Melissa,
> 
> On Tue, Feb 01, 2022 at 08:26:51PM -0100, Melissa Wen wrote:
> > Trace submit_cl_ioctl and related IRQs for CL submission and bin/render
> > jobs execution. It might be helpful to get a rendering timeline and
> > track job throttling.
> > 
> > Signed-off-by: Melissa Wen 
> 
> I'm not really sure what to do about this patch to be honest.
> 
> My understanding is that tracepoints are considered as userspace ABI,
> but I can't really judge whether your traces are good enough or if it's
> something that will bit us some time down the road.

Hi Maxime,

Thanks for taking a look at this patch.

So, I followed the same path of tracing CL submissions on v3d. I mean,
tracking submit_cl ioctl, points when a job (bin/render) starts it
execution, and irqs of completion (bin/render job). We used it to
examine job throttling when running Chromium and, therefore, in addition
to have the timeline of jobs execution, I show some data submitted in
the ioctl to make connections. I think these tracers might be useful for
some investigation in the future, but I'm also not sure if all data are
necessary to be displayed.

Melissa

> 
> Emma, Daniel, David, any insight?
> 
> Thanks,
> Maxime

signature.asc
Description: PGP signature

[Bug 215648] amdgpu: Changing monitor configuration (plug/unplug/wake from DPMS) causes kernel panic

https://bugzilla.kernel.org/show_bug.cgi?id=215648

Alex Deucher (alexdeuc...@gmail.com) changed:

   What|Removed |Added

 CC||alexdeuc...@gmail.com

--- Comment #1 from Alex Deucher (alexdeuc...@gmail.com) ---
Can you attach your full dmesg output when the issue happens?  Is it actually a
segfault or just a warning?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] drm/i915: Depend on !PREEMPT_RT.

2022-03-01 Thread Sebastian Andrzej Siewior

On 2022-03-01 14:27:18 [+], Tvrtko Ursulin wrote:
> > you see:
> > 0003-drm-i915-Use-preempt_disable-enable_rt-where-recomme.patch
> > 0004-drm-i915-Don-t-disable-interrupts-on-PREEMPT_RT-duri.patch
> 
> Two for the display folks.
> 
> > 0005-drm-i915-Don-t-check-for-atomic-context-on-PREEMPT_R.patch
> 
> What do preempt_disable/enable do on PREEMPT_RT? Thinking if instead the
> solution could be to always force the !ATOMIC path (for the whole
> _wait_for_atomic macro) on PREEMPT_RT.

Could be one way to handle it. But please don't disable preemption and
or interrupts for longer period of time as all of it increases the
overall latency.

Side note: All of these patches is a collection over time. I personally
have only a single i7-sandybridge with i915 and here I don't really
enter all the possible paths here. People report, I patch and look
around and then they are quiet so I assume that it is working.

> > 0006-drm-i915-Disable-tracing-points-on-PREEMPT_RT.patch
> 
> If the issue is only with certain trace points why disable all?

It is a class and it is easier that way.

> > 0007-drm-i915-skip-DRM_I915_LOW_LEVEL_TRACEPOINTS-with-NO.patch
> 
> Didn't quite fully understand, why is this not fixable? Especially thinking
> if the option of not blanket disabling all tracepoints in the previous
> patch.

The problem is that you can't acquire that lock from within that
trace-point on PREEMPT_RT. On !RT it is possible but it is also
problematic because LOCKDEP does not see possible dead locks unless that
trace-point is enabled.

I've been talking to Steven (after
https://lkml.kernel.org/r/20211214115837.6f33a...@gandalf.local.home)
and he wants to come up with something where you can pass a lock as
argument to the tracing-API. That way the lock can be acquired before
the trace event is invoked and lockdep will see it even if the trace
event is disabled.
So there is an idea how to get it to work eventually without disabling
it in the long term.

Making the register a raw_spinlock_t would solve problem immediately but
I am a little worried given the increased latency in a quick test:
   https://lore.kernel.org/all/20211006164628.s2mtsdd2jdbfy...@linutronix.de/

also, this one single hardware but the upper limit atomic-polls is high.

> > 0008-drm-i915-gt-Queue-and-wait-for-the-irq_work-item.patch
> 
> Not sure about why cond_resched was put between irq_work_queue and
> irq_work_sync - would it not be like-for-like change to have the two
> together? 

maybe it loops for a while and an additional scheduling would be nice.

> Commit message makes me think _queue already starts the handler on
> x86 at least.

Yes, irq_work_queue() triggers the IRQ right away on x86,
irq_work_sync() would wait for it to happen in case it did not happen.
On architectures which don't provide an IRQ-work interrupt, it is
delayed to the HZ tick timer interrupt. So this serves also as an
example in case someone want to copy the code ;)

> > 0009-drm-i915-gt-Use-spin_lock_irq-instead-of-local_irq_d.patch
> 
> I think this is okay. The part after the unlock is serialized by the tasklet
> already.
> 
> Slight doubt due the comment:
> 
>   local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
> 
> Makes me want to think about it harder but not now.

Clark reported it and confirmed that the warning is gone on RT and
everything appears to work ;)
PREEMPT_RT wise there is no synchronisation vs irq_work other than an
actual lock (in case it is needed).

> Another thing to check is if execlists_context_status_change still needs the
> atomic notifier chain with this change.
> 
> > 0010-drm-i915-Drop-the-irqs_disabled-check.patch
> 
> LGTM.

Do you want me to repost that one?

> > Revert-drm-i915-Depend-on-PREEMPT_RT.patch
> 
> Okay.
> 
> And finally for this very patch (the thread I am replying to):
> 
> Acked-by: Tvrtko Ursulin 

Thanks.

> Regards,
> 
> Tvrtko

Sebastian

Re: [PATCH v5 2/7] drm/i915: Prepare for multiple GTs

2022-03-01 Thread Andrzej Hajda





On 17.02.2022 15:41, Andi Shyti wrote:

From: Tvrtko Ursulin 

On a multi-tile platform, each tile has its own registers + GGTT
space, and BAR 0 is extended to cover all of them.

Up to four GTs are supported in i915->gt[], with slot zero
shadowing the existing i915->gt0 to enable source compatibility
with legacy driver paths. A for_each_gt macro is added to iterate
over the GTs and will be used by upcoming patches that convert
various parts of the driver to be multi-gt aware.

Only the primary/root tile is initialized for now; the other
tiles will be detected and plugged in by future patches once the
necessary infrastructure is in place to handle them.

Signed-off-by: Abdiel Janulgue 
Signed-off-by: Daniele Ceraolo Spurio 
Signed-off-by: Tvrtko Ursulin 
Signed-off-by: Matt Roper 
Signed-off-by: Andi Shyti 
Cc: Daniele Ceraolo Spurio 
Cc: Joonas Lahtinen 
Cc: Matthew Auld 
Reviewed-by: Matt Roper 
---
  drivers/gpu/drm/i915/gt/intel_gt.c| 135 --
  drivers/gpu/drm/i915/gt/intel_gt.h|  16 ++-
  drivers/gpu/drm/i915/gt/intel_gt_pm.c |   9 +-
  drivers/gpu/drm/i915/gt/intel_gt_types.h  |   7 +
  drivers/gpu/drm/i915/i915_driver.c|  29 ++--
  drivers/gpu/drm/i915/i915_drv.h   |   6 +
  drivers/gpu/drm/i915/intel_memory_region.h|   3 +
  drivers/gpu/drm/i915/intel_uncore.c   |  12 +-
  drivers/gpu/drm/i915/intel_uncore.h   |   3 +-
  .../gpu/drm/i915/selftests/mock_gem_device.c  |   7 +-
  10 files changed, 182 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.c 
b/drivers/gpu/drm/i915/gt/intel_gt.c
index db171e85f4df..8c64b81e9ec9 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt.c
@@ -29,7 +29,7 @@
  #include "intel_uncore.h"
  #include "shmem_utils.h"
  
-void __intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915)

+static void __intel_gt_init_early(struct intel_gt *gt)
  {
spin_lock_init(>->irq_lock);
  
@@ -51,19 +51,29 @@ void __intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915)

intel_rps_init_early(>->rps);
  }
  
-void intel_gt_init_early(struct intel_gt *gt, struct drm_i915_private *i915)

+/* Preliminary initialization of Tile 0 */
+void intel_root_gt_init_early(struct drm_i915_private *i915)
  {
+   struct intel_gt *gt = to_gt(i915);
+
gt->i915 = i915;
gt->uncore = &i915->uncore;
+
+   __intel_gt_init_early(gt);
  }
  
-int intel_gt_probe_lmem(struct intel_gt *gt)

+static int intel_gt_probe_lmem(struct intel_gt *gt)
  {
struct drm_i915_private *i915 = gt->i915;
+   unsigned int instance = gt->info.id;
struct intel_memory_region *mem;
int id;
int err;
  
+	id = INTEL_REGION_LMEM_0 + instance;

+   if (drm_WARN_ON(&i915->drm, id >= INTEL_REGION_STOLEN_SMEM))


Do we need to check id correctness? wouldn't be enough to check it on 
initialization of gt->info.id.
If yes, maybe (id > INTEL_REGION_LMEM_3) would be more readable, or 
(info.id < MAX_GT),  up to you.



+   return -ENODEV;
+
mem = intel_gt_setup_lmem(gt);
if (mem == ERR_PTR(-ENODEV))
mem = intel_gt_setup_fake_lmem(gt);
@@ -78,9 +88,8 @@ int intel_gt_probe_lmem(struct intel_gt *gt)
return err;
}
  
-	id = INTEL_REGION_LMEM_0;

-
mem->id = id;
+   mem->instance = instance;
  
  	intel_memory_region_set_name(mem, "local%u", mem->instance);
  
@@ -795,16 +804,21 @@ void intel_gt_driver_release(struct intel_gt *gt)

intel_gt_fini_buffer_pool(gt);
  }
  
-void intel_gt_driver_late_release(struct intel_gt *gt)

+void intel_gt_driver_late_release(struct drm_i915_private *i915)
  {
+   struct intel_gt *gt;
+   unsigned int id;
+
/* We need to wait for inflight RCU frees to release their grip */
rcu_barrier();
  
-	intel_uc_driver_late_release(>->uc);

-   intel_gt_fini_requests(gt);
-   intel_gt_fini_reset(gt);
-   intel_gt_fini_timelines(gt);
-   intel_engines_free(gt);
+   for_each_gt(gt, i915, id) {
+   intel_uc_driver_late_release(>->uc);
+   intel_gt_fini_requests(gt);
+   intel_gt_fini_reset(gt);
+   intel_gt_fini_timelines(gt);
+   intel_engines_free(gt);
+   }
  }
  
  /**

@@ -913,6 +927,105 @@ u32 intel_gt_read_register_fw(struct intel_gt *gt, 
i915_reg_t reg)
return intel_uncore_read_fw(gt->uncore, reg);
  }
  
+static int intel_gt_tile_setup(struct intel_gt *gt, phys_addr_t phys_addr)

+{
+   unsigned int id = gt->info.id;
+   int ret;
+
+   if (id) {
+   struct intel_uncore_mmio_debug *mmio_debug;
+   struct intel_uncore *uncore;
+
+   uncore = kzalloc(sizeof(*uncore), GFP_KERNEL);
+   if (!gt->uncore)
+   return -ENOMEM;


s/gt->uncore/uncore/


+
+   mmio_debug = kza

Re: [PATCH v5 1/7] drm/i915: Rename INTEL_REGION_LMEM with INTEL_REGION_LMEM_0

2022-03-01 Thread Andrzej Hajda





On 17.02.2022 15:41, Andi Shyti wrote:

With the upcoming multitile support each tile will have its own
local memory. Mark the current LMEM with the suffix '0' to
emphasise that it belongs to the root tile.

Suggested-by: Michal Wajdeczko 
Signed-off-by: Andi Shyti 

Reviewed-by: Andrzej Hajda 

Regards
Andrzej

Re: [Intel-gfx] [PATCH v5 3/7] drm/i915/gt: add gt_is_root() helper

2022-03-01 Thread Andrzej Hajda





On 28.02.2022 21:02, Michal Wajdeczko wrote:


On 17.02.2022 15:41, Andi Shyti wrote:

The "gt_is_root(struct intel_gt *gt)" helper return true if the
gt is the root gt, which means that its id is 0. Return false
otherwise.

Suggested-by: Michal Wajdeczko 
Signed-off-by: Andi Shyti 
---
  drivers/gpu/drm/i915/gt/intel_gt.h | 5 +
  1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt.h 
b/drivers/gpu/drm/i915/gt/intel_gt.h
index 915d6192079b..f17f51e2d394 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt.h
@@ -19,6 +19,11 @@ struct drm_printer;
  ##__VA_ARGS__);   \
  } while (0)
  
+static inline bool gt_is_root(struct intel_gt *gt)

+{
+   return !gt->info.id;
+}
+

we could squash this patch with prev one, where it can be used in:

  intel_gt_tile_cleanup(struct intel_gt *gt)
  {
intel_uncore_cleanup_mmio(gt->uncore);

-   if (gt->info.id) {
+   if (!gt_is_root(gt)) {
kfree(gt->uncore);
kfree(gt);
}
  }


It can be used in intel_gt_tile_setup as well, and then you can remove 
id var.




or just use it this way in this patch, with that:

Reviewed-by: Michal Wajdeczko 

Accordingly:

Reviewed-by: Andrzej Hajda 

Regards
Andrzej




  static inline struct intel_gt *uc_to_gt(struct intel_uc *uc)
  {
return container_of(uc, struct intel_gt, uc);

[PATCH v1 0/3] Ingenic DRM bridge_atomic_enable proposal

Hello, this is a set of patches to allow the upstreaming of the
NV3052C panel found in the Anbernic RG350M mips gaming handheld.

It was never upstreamed so far due to a longstanding graphical
bug, which I propose to solve by introducing ingenic_drm_bridge_atomic_enable
in the drm driver so the CRTC can be enabled after the panel itself slept
out, and not before as it used to.

After the drm change, 2 of the existing panels have to be modified accordingly 
to introduce missing .enable and .disable in their code.

Christophe Branchereau (3):
  drm/ingenic : add ingenic_drm_bridge_atomic_enable
  drm/panel: Add panel driver for NewVision NV3052C based LCDs
  drm/panel : innolux-ej030na and abt-y030xx067a : add .enable and
.disable

 drivers/gpu/drm/ingenic/ingenic-drm-drv.c |  19 +-
 drivers/gpu/drm/panel/Kconfig |   9 +
 drivers/gpu/drm/panel/Makefile|   1 +
 drivers/gpu/drm/panel/panel-abt-y030xx067a.c  |  23 +-
 drivers/gpu/drm/panel/panel-innolux-ej030na.c |  31 +-
 .../gpu/drm/panel/panel-newvision-nv3052c.c   | 504 ++
 6 files changed, 575 insertions(+), 12 deletions(-)
 create mode 100644 drivers/gpu/drm/panel/panel-newvision-nv3052c.c

-- 
2.34.1

[PATCH v1 1/3] drm/ingenic : add ingenic_drm_bridge_atomic_enable

This allows the CRTC to be enabled after panels have slept out,
and before their display is turned on, solving a graphical bug
on the newvision nv3502c

Signed-off-by: Christophe Branchereau 
---
 drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c 
b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
index dcf44cb00821..51512f41263e 100644
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
+++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c
@@ -226,6 +226,18 @@ static int ingenic_drm_update_pixclk(struct notifier_block 
*nb,
}
 }
 
+static void ingenic_drm_bridge_atomic_enable(struct drm_bridge *bridge,
+struct drm_bridge_state 
*old_bridge_state)
+{
+   struct ingenic_drm *priv = drm_device_get_priv(bridge->dev);
+
+   regmap_write(priv->map, JZ_REG_LCD_STATE, 0);
+
+   regmap_update_bits(priv->map, JZ_REG_LCD_CTRL,
+  JZ_LCD_CTRL_ENABLE | JZ_LCD_CTRL_DISABLE,
+  JZ_LCD_CTRL_ENABLE);
+}
+
 static void ingenic_drm_crtc_atomic_enable(struct drm_crtc *crtc,
   struct drm_atomic_state *state)
 {
@@ -237,17 +249,11 @@ static void ingenic_drm_crtc_atomic_enable(struct 
drm_crtc *crtc,
if (WARN_ON(IS_ERR(priv_state)))
return;
 
-   regmap_write(priv->map, JZ_REG_LCD_STATE, 0);
-
/* Set addresses of our DMA descriptor chains */
next_id = priv_state->use_palette ? HWDESC_PALETTE : 0;
regmap_write(priv->map, JZ_REG_LCD_DA0, dma_hwdesc_addr(priv, next_id));
regmap_write(priv->map, JZ_REG_LCD_DA1, dma_hwdesc_addr(priv, 1));
 
-   regmap_update_bits(priv->map, JZ_REG_LCD_CTRL,
-  JZ_LCD_CTRL_ENABLE | JZ_LCD_CTRL_DISABLE,
-  JZ_LCD_CTRL_ENABLE);
-
drm_crtc_vblank_on(crtc);
 }
 
@@ -968,6 +974,7 @@ static const struct drm_encoder_helper_funcs 
ingenic_drm_encoder_helper_funcs =
 
 static const struct drm_bridge_funcs ingenic_drm_bridge_funcs = {
.attach = ingenic_drm_bridge_attach,
+   .atomic_enable  = ingenic_drm_bridge_atomic_enable,
.atomic_check   = ingenic_drm_bridge_atomic_check,
.atomic_reset   = drm_atomic_helper_bridge_reset,
.atomic_duplicate_state = drm_atomic_helper_bridge_duplicate_state,
-- 
2.34.1

[PATCH v1 2/3] drm/panel: Add panel driver for NewVision NV3052C based LCDs

This driver supports the NewVision NV3052C based LCDs. Right now, it
only supports the LeadTek LTK035C5444T 2.4" 640x480 TFT LCD panel, which
can be found in the Anbernic RG-350M handheld console.

Signed-off-by: Christophe Branchereau 
---
 drivers/gpu/drm/panel/Kconfig |   9 +
 drivers/gpu/drm/panel/Makefile|   1 +
 .../gpu/drm/panel/panel-newvision-nv3052c.c   | 504 ++
 3 files changed, 514 insertions(+)
 create mode 100644 drivers/gpu/drm/panel/panel-newvision-nv3052c.c

diff --git a/drivers/gpu/drm/panel/Kconfig b/drivers/gpu/drm/panel/Kconfig
index bb2e47229c68..40084f709789 100644
--- a/drivers/gpu/drm/panel/Kconfig
+++ b/drivers/gpu/drm/panel/Kconfig
@@ -283,6 +283,15 @@ config DRM_PANEL_NEC_NL8048HL11
  panel (found on the Zoom2/3/3630 SDP boards). To compile this driver
  as a module, choose M here.
 
+config DRM_PANEL_NEWVISION_NV3052C
+   tristate "NewVision NV3052C RGB/SPI panel"
+   depends on OF && SPI
+   depends on BACKLIGHT_CLASS_DEVICE
+   select DRM_MIPI_DBI
+   help
+ Say Y here if you want to enable support for the panels built
+ around the NewVision NV3052C display controller.
+
 config DRM_PANEL_NOVATEK_NT35510
tristate "Novatek NT35510 RGB panel driver"
depends on OF
diff --git a/drivers/gpu/drm/panel/Makefile b/drivers/gpu/drm/panel/Makefile
index 5740911f637c..42a7ab54234b 100644
--- a/drivers/gpu/drm/panel/Makefile
+++ b/drivers/gpu/drm/panel/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_DRM_PANEL_LEADTEK_LTK500HD1829) += 
panel-leadtek-ltk500hd1829.o
 obj-$(CONFIG_DRM_PANEL_LG_LB035Q02) += panel-lg-lb035q02.o
 obj-$(CONFIG_DRM_PANEL_LG_LG4573) += panel-lg-lg4573.o
 obj-$(CONFIG_DRM_PANEL_NEC_NL8048HL11) += panel-nec-nl8048hl11.o
+obj-$(CONFIG_DRM_PANEL_NEWVISION_NV3052C) += panel-newvision-nv3052c.o
 obj-$(CONFIG_DRM_PANEL_NOVATEK_NT35510) += panel-novatek-nt35510.o
 obj-$(CONFIG_DRM_PANEL_NOVATEK_NT35560) += panel-novatek-nt35560.o
 obj-$(CONFIG_DRM_PANEL_NOVATEK_NT35950) += panel-novatek-nt35950.o
diff --git a/drivers/gpu/drm/panel/panel-newvision-nv3052c.c 
b/drivers/gpu/drm/panel/panel-newvision-nv3052c.c
new file mode 100644
index ..08a3b33bcce0
--- /dev/null
+++ b/drivers/gpu/drm/panel/panel-newvision-nv3052c.c
@@ -0,0 +1,504 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NevVision NV3052C IPS LCD panel driver
+ *
+ * Copyright (C) 2020, Paul Cercueil 
+ * Copyright (C) 2022, Christophe Branchereau 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+#include 
+
+struct nv3052c_panel_info {
+   const struct drm_display_mode *display_modes;
+   unsigned int num_modes;
+   u16 width_mm, height_mm;
+   u32 bus_format, bus_flags;
+};
+
+struct nv3052c {
+   struct device *dev;
+   struct drm_panel panel;
+   struct mipi_dbi dbi;
+
+   const struct nv3052c_panel_info *panel_info;
+
+   struct regulator *supply;
+   struct gpio_desc *reset_gpio;
+};
+
+struct nv3052c_reg {
+   u8 cmd;
+   u8 val;
+};
+
+static const struct nv3052c_reg nv3052c_panel_regs[] = {
+   { 0xff, 0x30 },
+   { 0xff, 0x52 },
+   { 0xff, 0x01 },
+   { 0xe3, 0x00 },
+   { 0x40, 0x00 },
+   { 0x03, 0x40 },
+   { 0x04, 0x00 },
+   { 0x05, 0x03 },
+   { 0x08, 0x00 },
+   { 0x09, 0x07 },
+   { 0x0a, 0x01 },
+   { 0x0b, 0x32 },
+   { 0x0c, 0x32 },
+   { 0x0d, 0x0b },
+   { 0x0e, 0x00 },
+   { 0x23, 0xa0 },
+
+   { 0x24, 0x0c },
+   { 0x25, 0x06 },
+   { 0x26, 0x14 },
+   { 0x27, 0x14 },
+
+   { 0x38, 0xcc },
+   { 0x39, 0xd7 },
+   { 0x3a, 0x4a },
+
+   { 0x28, 0x40 },
+   { 0x29, 0x01 },
+   { 0x2a, 0xdf },
+   { 0x49, 0x3c },
+   { 0x91, 0x77 },
+   { 0x92, 0x77 },
+   { 0xa0, 0x55 },
+   { 0xa1, 0x50 },
+   { 0xa4, 0x9c },
+   { 0xa7, 0x02 },
+   { 0xa8, 0x01 },
+   { 0xa9, 0x01 },
+   { 0xaa, 0xfc },
+   { 0xab, 0x28 },
+   { 0xac, 0x06 },
+   { 0xad, 0x06 },
+   { 0xae, 0x06 },
+   { 0xaf, 0x03 },
+   { 0xb0, 0x08 },
+   { 0xb1, 0x26 },
+   { 0xb2, 0x28 },
+   { 0xb3, 0x28 },
+   { 0xb4, 0x33 },
+   { 0xb5, 0x08 },
+   { 0xb6, 0x26 },
+   { 0xb7, 0x08 },
+   { 0xb8, 0x26 },
+   { 0xf0, 0x00 },
+   { 0xf6, 0xc0 },
+
+   { 0xff, 0x30 },
+   { 0xff, 0x52 },
+   { 0xff, 0x02 },
+   { 0xb0, 0x0b },
+   { 0xb1, 0x16 },
+   { 0xb2, 0x17 },
+   { 0xb3, 0x2c },
+   { 0xb4, 0x32 },
+   { 0xb5, 0x3b },
+   { 0xb6, 0x29 },
+   { 0xb7, 0x40 },
+   { 0xb8, 0x0d },
+   { 0xb9, 0x05 },
+   { 0xba, 0x12 },
+   { 0xbb, 0x10 },
+   { 0xbc, 0x12 },
+   { 0xbd, 0x15 },
+   { 0xbe, 0x19 },
+   { 0xbf, 0x0e },
+   { 0xc0, 0x16 },
+   { 0xc1, 0x0a },
+   { 0xd0, 0x0c },
+   { 0xd1, 0x17 }

[PATCH v1 3/3] drm/panel : innolux-ej030na and abt-y030xx067a : add .enable and .disable

Following the introduction of bridge_atomic_enable in the ingenic
drm driver, the crtc is enabled between .prepare and .enable, if
it exists.

Add it so the backlight is only enabled after the crtc is, to avoid
graphical issues.

Signed-off-by: Christophe Branchereau 
---
 drivers/gpu/drm/panel/panel-abt-y030xx067a.c  | 23 --
 drivers/gpu/drm/panel/panel-innolux-ej030na.c | 31 ---
 2 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/panel/panel-abt-y030xx067a.c 
b/drivers/gpu/drm/panel/panel-abt-y030xx067a.c
index f043b484055b..b5736344e3ec 100644
--- a/drivers/gpu/drm/panel/panel-abt-y030xx067a.c
+++ b/drivers/gpu/drm/panel/panel-abt-y030xx067a.c
@@ -183,8 +183,6 @@ static int y030xx067a_prepare(struct drm_panel *panel)
goto err_disable_regulator;
}
 
-   msleep(120);
-
return 0;
 
 err_disable_regulator:
@@ -202,6 +200,25 @@ static int y030xx067a_unprepare(struct drm_panel *panel)
return 0;
 }
 
+static int y030xx067a_enable(struct drm_panel *panel)
+{
+   if (panel->backlight) {
+   /* Wait for the picture to be ready before enabling backlight */
+   msleep(120);
+   }
+
+   return 0;
+}
+
+static int y030xx067a_disable(struct drm_panel *panel)
+{
+   struct y030xx067a *priv = to_y030xx067a(panel);
+
+   regmap_clear_bits(priv->map, 0x06, REG06_XPSAVE);
+
+   return 0;
+}
+
 static int y030xx067a_get_modes(struct drm_panel *panel,
struct drm_connector *connector)
 {
@@ -239,6 +256,8 @@ static int y030xx067a_get_modes(struct drm_panel *panel,
 static const struct drm_panel_funcs y030xx067a_funcs = {
.prepare= y030xx067a_prepare,
.unprepare  = y030xx067a_unprepare,
+   .enable = y030xx067a_enable,
+   .disable= y030xx067a_disable,
.get_modes  = y030xx067a_get_modes,
 };
 
diff --git a/drivers/gpu/drm/panel/panel-innolux-ej030na.c 
b/drivers/gpu/drm/panel/panel-innolux-ej030na.c
index c558de3f99be..6de7370185cd 100644
--- a/drivers/gpu/drm/panel/panel-innolux-ej030na.c
+++ b/drivers/gpu/drm/panel/panel-innolux-ej030na.c
@@ -80,8 +80,6 @@ static const struct reg_sequence ej030na_init_sequence[] = {
{ 0x47, 0x08 },
{ 0x48, 0x0f },
{ 0x49, 0x0f },
-
-   { 0x2b, 0x01 },
 };
 
 static int ej030na_prepare(struct drm_panel *panel)
@@ -109,8 +107,6 @@ static int ej030na_prepare(struct drm_panel *panel)
goto err_disable_regulator;
}
 
-   msleep(120);
-
return 0;
 
 err_disable_regulator:
@@ -128,6 +124,31 @@ static int ej030na_unprepare(struct drm_panel *panel)
return 0;
 }
 
+static int ej030na_enable(struct drm_panel *panel)
+{
+   struct ej030na *priv = to_ej030na(panel);
+
+   /* standby off */
+   regmap_write(priv->map, 0x2b, 0x01);
+
+   if (panel->backlight) {
+   /* Wait for the picture to be ready before enabling backlight */
+   msleep(120);
+   }
+
+   return 0;
+}
+
+static int ej030na_disable(struct drm_panel *panel)
+{
+   struct ej030na *priv = to_ej030na(panel);
+
+   /* standby on */
+   regmap_write(priv->map, 0x2b, 0x00);
+
+   return 0;
+}
+
 static int ej030na_get_modes(struct drm_panel *panel,
 struct drm_connector *connector)
 {
@@ -165,6 +186,8 @@ static int ej030na_get_modes(struct drm_panel *panel,
 static const struct drm_panel_funcs ej030na_funcs = {
.prepare= ej030na_prepare,
.unprepare  = ej030na_unprepare,
+   .enable = ej030na_enable,
+   .disable= ej030na_disable,
.get_modes  = ej030na_get_modes,
 };
 
-- 
2.34.1

Re: [PATCH] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-01 Thread Felix Kuehling




Am 2022-03-01 um 03:03 schrieb David Hildenbrand:

On 28.02.22 21:34, Alex Sierra wrote:

DEVICE_COHERENT pages introduce a subtle distinction in the way
"normal" pages can be used by various callers throughout the kernel.
They behave like normal pages for purposes of mapping in CPU page
tables, and for COW. But they do not support LRU lists, NUMA
migration or THP. Therefore we split vm_normal_page into two
functions vm_normal_any_page and vm_normal_lru_page. The latter will
only return pages that can be put on an LRU list and that support
NUMA migration and THP.

Why not s/vm_normal_any_page/vm_normal_page/ and avoid code churn?


I don't care much, personally, what names we end up with. I'll wait for 
a consensus here.






We also introduced a FOLL_LRU flag that adds the same behaviour to
follow_page and related APIs, to allow callers to specify that they
expect to put pages on an LRU list.

[...]

-#define FOLL_WRITE 0x01/* check pte is writable */
-#define FOLL_TOUCH 0x02/* mark page accessed */
-#define FOLL_GET   0x04/* do get_page on page */
-#define FOLL_DUMP  0x08/* give error on hole if it would be zero */
-#define FOLL_FORCE 0x10/* get_user_pages read/write w/o permission */
-#define FOLL_NOWAIT0x20/* if a disk transfer is needed, start the IO
-* and return without waiting upon it */
-#define FOLL_POPULATE  0x40/* fault in pages (with FOLL_MLOCK) */
-#define FOLL_NOFAULT   0x80/* do not fault in pages */
-#define FOLL_HWPOISON  0x100   /* check page is hwpoisoned */
-#define FOLL_NUMA  0x200   /* force NUMA hinting page fault */
-#define FOLL_MIGRATION 0x400   /* wait for page to replace migration entry */
-#define FOLL_TRIED 0x800   /* a retry, previous pass started an IO */
-#define FOLL_MLOCK 0x1000  /* lock present pages */
-#define FOLL_REMOTE0x2000  /* we are working on non-current tsk/mm */
-#define FOLL_COW   0x4000  /* internal GUP flag */
-#define FOLL_ANON  0x8000  /* don't do file mappings */
-#define FOLL_LONGTERM  0x1 /* mapping lifetime is indefinite: see below */
-#define FOLL_SPLIT_PMD 0x2 /* split huge pmd before returning */
-#define FOLL_PIN   0x4 /* pages must be released via unpin_user_page */
-#define FOLL_FAST_ONLY 0x8 /* gup_fast: prevent fall-back to slow gup */
+#define FOLL_WRITE 0x01 /* check pte is writable */
+#define FOLL_TOUCH 0x02 /* mark page accessed */
+#define FOLL_GET   0x04 /* do get_page on page */
+#define FOLL_DUMP  0x08 /* give error on hole if it would be zero */
+#define FOLL_FORCE 0x10 /* get_user_pages read/write w/o permission */
+#define FOLL_NOWAIT0x20 /* if a disk transfer is needed, start the IO
+ * and return without waiting upon it */
+#define FOLL_POPULATE  0x40 /* fault in pages (with FOLL_MLOCK) */
+#define FOLL_NOFAULT   0x80 /* do not fault in pages */
+#define FOLL_HWPOISON  0x100/* check page is hwpoisoned */
+#define FOLL_NUMA  0x200/* force NUMA hinting page fault */
+#define FOLL_MIGRATION 0x400/* wait for page to replace migration entry */
+#define FOLL_TRIED 0x800/* a retry, previous pass started an IO */
+#define FOLL_MLOCK 0x1000   /* lock present pages */
+#define FOLL_REMOTE0x2000   /* we are working on non-current tsk/mm */
+#define FOLL_COW   0x4000   /* internal GUP flag */
+#define FOLL_ANON  0x8000   /* don't do file mappings */
+#define FOLL_LONGTERM  0x1  /* mapping lifetime is indefinite: see below */
+#define FOLL_SPLIT_PMD 0x2  /* split huge pmd before returning */
+#define FOLL_PIN   0x4  /* pages must be released via unpin_user_page 
*/
+#define FOLL_FAST_ONLY 0x8  /* gup_fast: prevent fall-back to slow gup */
+#define FOLL_LRU   0x10 /* return only LRU (anon or page cache) */
  

Can we minimize code churn, please?


OK. I guess we could unindent the FOLL_LRU number to avoid changing all 
the comments.







if (PageReserved(page))
diff --git a/mm/migrate.c b/mm/migrate.c
index c31d04b46a5e..17d049311b78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct *mm, 
unsigned long addr,
goto out;
  
  	/* FOLL_DUMP to ignore special (like zero) pages */

-   follflags = FOLL_GET | FOLL_DUMP;
+   follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
page = follow_page(vma, addr, follflags);

Why wouldn't we want to dump DEVICE_COHERENT pages? This looks wrong.


This function later calls isolate_lru_page, which is something you can't 
do with a device page.


Regards,
  Felix

[PATCH v4 6/9] arm64: tegra: Add Host1x context stream IDs on Tegra186+

From: Mikko Perttunen 

Add Host1x context stream IDs on systems that support Host1x context
isolation. Host1x and attached engines can use these stream IDs to
allow isolation between memory used by different processes.

The specified stream IDs must match those configured by the hypervisor,
if one is present.

Signed-off-by: Mikko Perttunen 
---
v2:
* Added context devices on T194.
* Use iommu-map instead of custom property.
v4:
* Remove memory-contexts subnode.
---
 arch/arm64/boot/dts/nvidia/tegra186.dtsi | 11 +++
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 11 +++
 2 files changed, 22 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra186.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
index c91afff1b757..1b71cba0df06 100644
--- a/arch/arm64/boot/dts/nvidia/tegra186.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra186.dtsi
@@ -1406,6 +1406,17 @@ host1x@13e0 {
 
iommus = <&smmu TEGRA186_SID_HOST1X>;
 
+   /* Context isolation domains */
+   iommu-map = <
+   0 &smmu TEGRA186_SID_HOST1X_CTX0 1
+   1 &smmu TEGRA186_SID_HOST1X_CTX1 1
+   2 &smmu TEGRA186_SID_HOST1X_CTX2 1
+   3 &smmu TEGRA186_SID_HOST1X_CTX3 1
+   4 &smmu TEGRA186_SID_HOST1X_CTX4 1
+   5 &smmu TEGRA186_SID_HOST1X_CTX5 1
+   6 &smmu TEGRA186_SID_HOST1X_CTX6 1
+   7 &smmu TEGRA186_SID_HOST1X_CTX7 1>;
+
dpaux1: dpaux@1504 {
compatible = "nvidia,tegra186-dpaux";
reg = <0x1504 0x1>;
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 2d48c3715fc6..eb0d2ba89cb1 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1686,6 +1686,17 @@ host1x@13e0 {
interconnect-names = "dma-mem";
iommus = <&smmu TEGRA194_SID_HOST1X>;
 
+   /* Context isolation domains */
+   iommu-map = <
+   0 &smmu TEGRA194_SID_HOST1X_CTX0 1
+   1 &smmu TEGRA194_SID_HOST1X_CTX1 1
+   2 &smmu TEGRA194_SID_HOST1X_CTX2 1
+   3 &smmu TEGRA194_SID_HOST1X_CTX3 1
+   4 &smmu TEGRA194_SID_HOST1X_CTX4 1
+   5 &smmu TEGRA194_SID_HOST1X_CTX5 1
+   6 &smmu TEGRA194_SID_HOST1X_CTX6 1
+   7 &smmu TEGRA194_SID_HOST1X_CTX7 1>;
+
nvdec@1514 {
compatible = "nvidia,tegra194-nvdec";
reg = <0x1514 0x0004>;
-- 
2.35.0

[PATCH v4 0/9] Host1x context isolation support

From: Mikko Perttunen 

***
New in v4:

Addressed review comments. See individual patches.
***

***
New in v3:

Added device tree bindings for new property.
***

***
New in v2:

Added support for Tegra194
Use standard iommu-map property instead of custom mechanism
***

This series adds support for Host1x 'context isolation'. Since
when programming engines through Host1x, userspace can program in
any addresses it wants, we need some way to isolate the engines'
memory spaces. Traditionally this has either been done imperfectly
with a single shared IOMMU domain, or by copying and verifying the
programming command stream at submit time (Host1x firewall).

Since Tegra186 there is a privileged (only usable by kernel)
Host1x opcode that allows setting the stream ID sent by the engine
to the SMMU. So, by allocating a number of context banks and stream
IDs for this purpose, and using this opcode at the beginning of
each job, we can implement isolation. Due to the limited number of
context banks only each process gets its own context, and not
each channel.

This feature also allows sharing engines among multiple VMs when
used with Host1x's hardware virtualization support - up to 8 VMs
can be configured with a subset of allowed stream IDs, enforced
at hardware level.

To implement this, this series adds a new host1x context bus, which
will contain the 'struct device's corresponding to each context
bank / stream ID, changes to device tree and SMMU code to allow
registering the devices and using the bus, as well as the Host1x
stream ID programming code and support in TegraDRM.

-
Merging notes
-

The changes to DT bindings should be applied on top of Thierry's patch
'dt-bindings: display: tegra: Convert to json-schema'.

Thanks,
Mikko

Mikko Perttunen (9):
  dt-bindings: host1x: Add iommu-map property
  gpu: host1x: Add context bus
  gpu: host1x: Add context device management code
  gpu: host1x: Program context stream ID on submission
  iommu/arm-smmu: Attach to host1x context device bus
  arm64: tegra: Add Host1x context stream IDs on Tegra186+
  drm/tegra: falcon: Set DMACTX field on DMA transactions
  drm/tegra: Support context isolation
  drm/tegra: vic: Implement get_streamid_offset

 .../display/tegra/nvidia,tegra20-host1x.yaml  |   5 +
 arch/arm64/boot/dts/nvidia/tegra186.dtsi  |  11 ++
 arch/arm64/boot/dts/nvidia/tegra194.dtsi  |  11 ++
 drivers/gpu/Makefile  |   3 +-
 drivers/gpu/drm/tegra/drm.h   |   2 +
 drivers/gpu/drm/tegra/falcon.c|   8 +
 drivers/gpu/drm/tegra/falcon.h|   1 +
 drivers/gpu/drm/tegra/submit.c|  21 ++-
 drivers/gpu/drm/tegra/uapi.c  |  45 -
 drivers/gpu/drm/tegra/vic.c   |  68 +++-
 drivers/gpu/host1x/Kconfig|   5 +
 drivers/gpu/host1x/Makefile   |   2 +
 drivers/gpu/host1x/context.c  | 160 ++
 drivers/gpu/host1x/context.h  |  27 +++
 drivers/gpu/host1x/context_bus.c  |  31 
 drivers/gpu/host1x/dev.c  |  12 +-
 drivers/gpu/host1x/dev.h  |   2 +
 drivers/gpu/host1x/hw/channel_hw.c|  52 +-
 drivers/gpu/host1x/hw/host1x06_hardware.h |  10 ++
 drivers/gpu/host1x/hw/host1x07_hardware.h |  10 ++
 drivers/iommu/arm/arm-smmu/arm-smmu.c |  13 ++
 include/linux/host1x.h|  22 +++
 include/linux/host1x_context_bus.h|  15 ++
 23 files changed, 518 insertions(+), 18 deletions(-)
 create mode 100644 drivers/gpu/host1x/context.c
 create mode 100644 drivers/gpu/host1x/context.h
 create mode 100644 drivers/gpu/host1x/context_bus.c
 create mode 100644 include/linux/host1x_context_bus.h

-- 
2.35.0

[PATCH v4 2/9] gpu: host1x: Add context bus

From: Mikko Perttunen 

The context bus is a "dummy" bus that contains struct devices that
correspond to IOMMU contexts assigned through Host1x to processes.

Even when host1x itself is built as a module, the bus is registered
in built-in code so that the built-in ARM SMMU driver is able to
reference it.

Signed-off-by: Mikko Perttunen 
---
v4:
* Export bus as GPL
---
 drivers/gpu/Makefile   |  3 +--
 drivers/gpu/host1x/Kconfig |  5 +
 drivers/gpu/host1x/Makefile|  1 +
 drivers/gpu/host1x/context_bus.c   | 31 ++
 include/linux/host1x_context_bus.h | 15 +++
 5 files changed, 53 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/host1x/context_bus.c
 create mode 100644 include/linux/host1x_context_bus.h

diff --git a/drivers/gpu/Makefile b/drivers/gpu/Makefile
index 835c88318cec..8997f0096545 100644
--- a/drivers/gpu/Makefile
+++ b/drivers/gpu/Makefile
@@ -2,7 +2,6 @@
 # drm/tegra depends on host1x, so if both drivers are built-in care must be
 # taken to initialize them in the correct order. Link order is the only way
 # to ensure this currently.
-obj-$(CONFIG_TEGRA_HOST1X) += host1x/
-obj-y  += drm/ vga/
+obj-y  += host1x/ drm/ vga/
 obj-$(CONFIG_IMX_IPUV3_CORE)   += ipu-v3/
 obj-$(CONFIG_TRACE_GPU_MEM)+= trace/
diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
index 6815b4db17c1..1861a8180d3f 100644
--- a/drivers/gpu/host1x/Kconfig
+++ b/drivers/gpu/host1x/Kconfig
@@ -1,8 +1,13 @@
 # SPDX-License-Identifier: GPL-2.0-only
+
+config TEGRA_HOST1X_CONTEXT_BUS
+   bool
+
 config TEGRA_HOST1X
tristate "NVIDIA Tegra host1x driver"
depends on ARCH_TEGRA || (ARM && COMPILE_TEST)
select DMA_SHARED_BUFFER
+   select TEGRA_HOST1X_CONTEXT_BUS
select IOMMU_IOVA
help
  Driver for the NVIDIA Tegra host1x hardware.
diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index d2b6f7de0498..c891a3e33844 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -18,3 +18,4 @@ host1x-y = \
hw/host1x07.o
 
 obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
+obj-$(CONFIG_TEGRA_HOST1X_CONTEXT_BUS) += context_bus.o
diff --git a/drivers/gpu/host1x/context_bus.c b/drivers/gpu/host1x/context_bus.c
new file mode 100644
index ..b0d35b2bbe89
--- /dev/null
+++ b/drivers/gpu/host1x/context_bus.c
@@ -0,0 +1,31 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021, NVIDIA Corporation.
+ */
+
+#include 
+#include 
+
+struct bus_type host1x_context_device_bus_type = {
+   .name = "host1x-context",
+};
+EXPORT_SYMBOL_GPL(host1x_context_device_bus_type);
+
+static int __init host1x_context_device_bus_init(void)
+{
+   int err;
+
+   if (!of_machine_is_compatible("nvidia,tegra186") &&
+   !of_machine_is_compatible("nvidia,tegra194") &&
+   !of_machine_is_compatible("nvidia,tegra234"))
+   return 0;
+
+   err = bus_register(&host1x_context_device_bus_type);
+   if (err < 0) {
+   pr_err("bus type registration failed: %d\n", err);
+   return err;
+   }
+
+   return 0;
+}
+postcore_initcall(host1x_context_device_bus_init);
diff --git a/include/linux/host1x_context_bus.h 
b/include/linux/host1x_context_bus.h
new file mode 100644
index ..72462737a6db
--- /dev/null
+++ b/include/linux/host1x_context_bus.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2021, NVIDIA Corporation. All rights reserved.
+ */
+
+#ifndef __LINUX_HOST1X_CONTEXT_BUS_H
+#define __LINUX_HOST1X_CONTEXT_BUS_H
+
+#include 
+
+#ifdef CONFIG_TEGRA_HOST1X_CONTEXT_BUS
+extern struct bus_type host1x_context_device_bus_type;
+#endif
+
+#endif
-- 
2.35.0

[PATCH v4 9/9] drm/tegra: vic: Implement get_streamid_offset

From: Mikko Perttunen 

Implement the get_streamid_offset required for supporting context
isolation. Since old firmware cannot support context isolation
without hacks that we don't want to implement, check the firmware
binary to see if context isolation should be enabled.

Signed-off-by: Mikko Perttunen 
---
v4:
* Add locking in vic_load_firmware
* Return -EOPNOTSUPP if context isolation is not available
* Update for changed get_streamid_offset declaration
* Add comment noting that vic_load_firmware is safe to call
  without the hardware being powered on
---
 drivers/gpu/drm/tegra/vic.c | 68 -
 1 file changed, 60 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/tegra/vic.c b/drivers/gpu/drm/tegra/vic.c
index 1e342fa3d27b..61eb0407de2a 100644
--- a/drivers/gpu/drm/tegra/vic.c
+++ b/drivers/gpu/drm/tegra/vic.c
@@ -38,6 +38,8 @@ struct vic {
struct clk *clk;
struct reset_control *rst;
 
+   bool can_use_context;
+
/* Platform configuration */
const struct vic_config *config;
 };
@@ -229,28 +231,38 @@ static int vic_load_firmware(struct vic *vic)
 {
struct host1x_client *client = &vic->client.base;
struct tegra_drm *tegra = vic->client.drm;
+   static DEFINE_MUTEX(lock);
+   u32 fce_bin_data_offset;
dma_addr_t iova;
size_t size;
void *virt;
int err;
 
-   if (vic->falcon.firmware.virt)
-   return 0;
+   mutex_lock(&lock);
+
+   if (vic->falcon.firmware.virt) {
+   err = 0;
+   goto unlock;
+   }
 
err = falcon_read_firmware(&vic->falcon, vic->config->firmware);
if (err < 0)
-   return err;
+   goto unlock;
 
size = vic->falcon.firmware.size;
 
if (!client->group) {
virt = dma_alloc_coherent(vic->dev, size, &iova, GFP_KERNEL);
-   if (!virt)
-   return -ENOMEM;
+   if (!virt) {
+   err = -ENOMEM;
+   goto unlock;
+   }
} else {
virt = tegra_drm_alloc(tegra, size, &iova);
-   if (IS_ERR(virt))
-   return PTR_ERR(virt);
+   if (IS_ERR(virt)) {
+   err = PTR_ERR(virt);
+   goto unlock;
+   }
}
 
vic->falcon.firmware.virt = virt;
@@ -277,7 +289,28 @@ static int vic_load_firmware(struct vic *vic)
vic->falcon.firmware.phys = phys;
}
 
-   return 0;
+   /*
+* Check if firmware is new enough to not require mapping firmware
+* to data buffer domains.
+*/
+   fce_bin_data_offset = *(u32 *)(virt + VIC_UCODE_FCE_DATA_OFFSET);
+
+   if (!vic->config->supports_sid) {
+   vic->can_use_context = false;
+   } else if (fce_bin_data_offset != 0x0 && fce_bin_data_offset != 
0xa5a5a5a5) {
+   /*
+* Firmware will access FCE through STREAMID0, so context
+* isolation cannot be used.
+*/
+   vic->can_use_context = false;
+   dev_warn_once(vic->dev, "context isolation disabled due to old 
firmware\n");
+   } else {
+   vic->can_use_context = true;
+   }
+
+unlock:
+   mutex_unlock(&lock);
+   return err;
 
 cleanup:
if (!client->group)
@@ -285,6 +318,7 @@ static int vic_load_firmware(struct vic *vic)
else
tegra_drm_free(tegra, size, virt, iova);
 
+   mutex_unlock(&lock);
return err;
 }
 
@@ -358,10 +392,28 @@ static void vic_close_channel(struct tegra_drm_context 
*context)
host1x_channel_put(context->channel);
 }
 
+static int vic_get_streamid_offset(struct tegra_drm_client *client, u32 
*offset)
+{
+   struct vic *vic = to_vic(client);
+   int err;
+
+   /* This doesn't access HW so it's safe to call without powering up. */
+   err = vic_load_firmware(vic);
+   if (err < 0)
+   return err;
+
+   if (!vic->can_use_context)
+   return -EOPNOTSUPP;
+
+   *offset = 0x30;
+   return 0;
+}
+
 static const struct tegra_drm_client_ops vic_ops = {
.open_channel = vic_open_channel,
.close_channel = vic_close_channel,
.submit = tegra_drm_submit,
+   .get_streamid_offset = vic_get_streamid_offset,
 };
 
 #define NVIDIA_TEGRA_124_VIC_FIRMWARE "nvidia/tegra124/vic03_ucode.bin"
-- 
2.35.0

[PATCH v4 4/9] gpu: host1x: Program context stream ID on submission

From: Mikko Perttunen 

Add code to do stream ID switching at the beginning of a job. The
stream ID is switched to the stream ID specified by the context
passed in the job structure.

Before switching the stream ID, an OP_DONE wait is done on the
channel's engine to ensure that there is no residual ongoing
work that might do DMA using the new stream ID.

Signed-off-by: Mikko Perttunen 
---
v4:
* Rename job->context to job->memory_context for clarity
---
 drivers/gpu/host1x/hw/channel_hw.c| 52 +--
 drivers/gpu/host1x/hw/host1x06_hardware.h | 10 +
 drivers/gpu/host1x/hw/host1x07_hardware.h | 10 +
 include/linux/host1x.h|  4 ++
 4 files changed, 72 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/host1x/hw/channel_hw.c 
b/drivers/gpu/host1x/hw/channel_hw.c
index 6b40e9af1e88..f84caf06621a 100644
--- a/drivers/gpu/host1x/hw/channel_hw.c
+++ b/drivers/gpu/host1x/hw/channel_hw.c
@@ -180,6 +180,45 @@ static void host1x_enable_gather_filter(struct 
host1x_channel *ch)
 #endif
 }
 
+static void host1x_channel_program_engine_streamid(struct host1x_job *job)
+{
+#if HOST1X_HW >= 6
+   u32 fence;
+
+   if (!job->memory_context)
+   return;
+
+   fence = host1x_syncpt_incr_max(job->syncpt, 1);
+
+   /* First, increment a syncpoint on OP_DONE condition.. */
+
+   host1x_cdma_push(&job->channel->cdma,
+   host1x_opcode_nonincr(HOST1X_UCLASS_INCR_SYNCPT, 1),
+   HOST1X_UCLASS_INCR_SYNCPT_INDX_F(job->syncpt->id) |
+   HOST1X_UCLASS_INCR_SYNCPT_COND_F(1));
+
+   /* Wait for syncpoint to increment */
+
+   host1x_cdma_push(&job->channel->cdma,
+   host1x_opcode_setclass(HOST1X_CLASS_HOST1X,
+   host1x_uclass_wait_syncpt_r(), 1),
+   host1x_class_host_wait_syncpt(job->syncpt->id, fence));
+
+   /*
+* Now that we know the engine is idle, return to class and
+* change stream ID.
+*/
+
+   host1x_cdma_push(&job->channel->cdma,
+   host1x_opcode_setclass(job->class, 0, 0),
+   HOST1X_OPCODE_NOP);
+
+   host1x_cdma_push(&job->channel->cdma,
+   host1x_opcode_setpayload(job->memory_context->stream_id),
+   host1x_opcode_setstreamid(job->engine_streamid_offset / 4));
+#endif
+}
+
 static int channel_submit(struct host1x_job *job)
 {
struct host1x_channel *ch = job->channel;
@@ -236,18 +275,23 @@ static int channel_submit(struct host1x_job *job)
if (sp->base)
synchronize_syncpt_base(job);
 
-   syncval = host1x_syncpt_incr_max(sp, user_syncpt_incrs);
-
host1x_hw_syncpt_assign_to_channel(host, sp, ch);
 
-   job->syncpt_end = syncval;
-
/* add a setclass for modules that require it */
if (job->class)
host1x_cdma_push(&ch->cdma,
 host1x_opcode_setclass(job->class, 0, 0),
 HOST1X_OPCODE_NOP);
 
+   /*
+* Ensure engine DMA is idle and set new stream ID. May increment
+* syncpt max.
+*/
+   host1x_channel_program_engine_streamid(job);
+
+   syncval = host1x_syncpt_incr_max(sp, user_syncpt_incrs);
+   job->syncpt_end = syncval;
+
submit_gathers(job, syncval - user_syncpt_incrs);
 
/* end CDMA submit & stash pinned hMems into sync queue */
diff --git a/drivers/gpu/host1x/hw/host1x06_hardware.h 
b/drivers/gpu/host1x/hw/host1x06_hardware.h
index 01a142a09800..5d515745eee7 100644
--- a/drivers/gpu/host1x/hw/host1x06_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x06_hardware.h
@@ -127,6 +127,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned 
offset, unsigned count)
return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_setstreamid(unsigned streamid)
+{
+   return (7 << 28) | streamid;
+}
+
+static inline u32 host1x_opcode_setpayload(unsigned payload)
+{
+   return (9 << 28) | payload;
+}
+
 static inline u32 host1x_opcode_gather_wide(unsigned count)
 {
return (12 << 28) | count;
diff --git a/drivers/gpu/host1x/hw/host1x07_hardware.h 
b/drivers/gpu/host1x/hw/host1x07_hardware.h
index e6582172ebfd..82c0cc9bb0b5 100644
--- a/drivers/gpu/host1x/hw/host1x07_hardware.h
+++ b/drivers/gpu/host1x/hw/host1x07_hardware.h
@@ -127,6 +127,16 @@ static inline u32 host1x_opcode_gather_incr(unsigned 
offset, unsigned count)
return (6 << 28) | (offset << 16) | BIT(15) | BIT(14) | count;
 }
 
+static inline u32 host1x_opcode_setstreamid(unsigned streamid)
+{
+   return (7 << 28) | streamid;
+}
+
+static inline u32 host1x_opcode_setpayload(unsigned payload)
+{
+   return (9 << 28) | payload;
+}
+
 static inline u32 host1x_opcode_gather_wide(unsigned count)
 {
return (12 << 28) | count;
diff --git a/include/linux/host1x.h b/include/linux/host1x.h
index 9dbdc4b0bb82..2793131e8c96 100644
--- a/include/

[PATCH v4 7/9] drm/tegra: falcon: Set DMACTX field on DMA transactions

From: Mikko Perttunen 

The DMACTX field determines which context, as specified in the
TRANSCFG register, is used. While during boot it doesn't matter
which is used, later on it matters and this value is reused by
the firmware.

Signed-off-by: Mikko Perttunen 
---
 drivers/gpu/drm/tegra/falcon.c | 8 
 drivers/gpu/drm/tegra/falcon.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/tegra/falcon.c b/drivers/gpu/drm/tegra/falcon.c
index 223ab2ceb7e6..8bdb72f08f58 100644
--- a/drivers/gpu/drm/tegra/falcon.c
+++ b/drivers/gpu/drm/tegra/falcon.c
@@ -48,6 +48,14 @@ static int falcon_copy_chunk(struct falcon *falcon,
if (target == FALCON_MEMORY_IMEM)
cmd |= FALCON_DMATRFCMD_IMEM;
 
+   /*
+* Use second DMA context (i.e. the one for firmware). Strictly
+* speaking, at this point both DMA contexts point to the firmware
+* stream ID, but this register's value will be reused by the firmware
+* for later DMA transactions, so we need to use the correct value.
+*/
+   cmd |= FALCON_DMATRFCMD_DMACTX(1);
+
falcon_writel(falcon, offset, FALCON_DMATRFMOFFS);
falcon_writel(falcon, base, FALCON_DMATRFFBOFFS);
falcon_writel(falcon, cmd, FALCON_DMATRFCMD);
diff --git a/drivers/gpu/drm/tegra/falcon.h b/drivers/gpu/drm/tegra/falcon.h
index c56ee32d92ee..1955cf11a8a6 100644
--- a/drivers/gpu/drm/tegra/falcon.h
+++ b/drivers/gpu/drm/tegra/falcon.h
@@ -50,6 +50,7 @@
 #define FALCON_DMATRFCMD_IDLE  (1 << 1)
 #define FALCON_DMATRFCMD_IMEM  (1 << 4)
 #define FALCON_DMATRFCMD_SIZE_256B (6 << 8)
+#define FALCON_DMATRFCMD_DMACTX(v) (((v) & 0x7) << 12)
 
 #define FALCON_DMATRFFBOFFS0x111c
 
-- 
2.35.0

[PATCH v4 3/9] gpu: host1x: Add context device management code

From: Mikko Perttunen 

Add code to register context devices from device tree, allocate them
out and manage their refcounts.

Signed-off-by: Mikko Perttunen 
---
v2:
* Directly set DMA mask instead of inheriting from Host1x.
* Use iommu-map instead of custom DT property.
v4:
* Use u64 instead of dma_addr_t for DMA mask
* Use unsigned ints for indexes and adjust error handling flow
* Parse iommu-map property at top level host1x DT node
* Use separate DMA mask per device
* Export symbols as GPL
---
 drivers/gpu/host1x/Makefile  |   1 +
 drivers/gpu/host1x/context.c | 160 +++
 drivers/gpu/host1x/context.h |  27 ++
 drivers/gpu/host1x/dev.c |  12 ++-
 drivers/gpu/host1x/dev.h |   2 +
 include/linux/host1x.h   |  18 
 6 files changed, 219 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/host1x/context.c
 create mode 100644 drivers/gpu/host1x/context.h

diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
index c891a3e33844..8a65e13d113a 100644
--- a/drivers/gpu/host1x/Makefile
+++ b/drivers/gpu/host1x/Makefile
@@ -10,6 +10,7 @@ host1x-y = \
debug.o \
mipi.o \
fence.o \
+   context.o \
hw/host1x01.o \
hw/host1x02.o \
hw/host1x04.o \
diff --git a/drivers/gpu/host1x/context.c b/drivers/gpu/host1x/context.c
new file mode 100644
index ..c5889be64634
--- /dev/null
+++ b/drivers/gpu/host1x/context.c
@@ -0,0 +1,160 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2021, NVIDIA Corporation.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "context.h"
+#include "dev.h"
+
+int host1x_context_list_init(struct host1x *host1x)
+{
+   struct host1x_context_list *cdl = &host1x->context_list;
+   struct device_node *node = host1x->dev->of_node;
+   struct host1x_context *ctx;
+   unsigned int i;
+   int err;
+
+   cdl->devs = NULL;
+   cdl->len = 0;
+   mutex_init(&cdl->lock);
+
+   err = of_property_count_u32_elems(node, "iommu-map");
+   if (err < 0)
+   return 0;
+
+   cdl->devs = kcalloc(err, sizeof(*cdl->devs), GFP_KERNEL);
+   if (!cdl->devs)
+   return -ENOMEM;
+   cdl->len = err / 4;
+
+   for (i = 0; i < cdl->len; i++) {
+   struct iommu_fwspec *fwspec;
+
+   ctx = &cdl->devs[i];
+
+   ctx->host = host1x;
+
+   device_initialize(&ctx->dev);
+
+   /*
+* Due to an issue with T194 NVENC, only 38 bits can be used.
+* Anyway, 256GiB of IOVA ought to be enough for anyone.
+*/
+   ctx->dma_mask = DMA_BIT_MASK(38);
+   ctx->dev.dma_mask = &ctx->dma_mask;
+   ctx->dev.coherent_dma_mask = ctx->dma_mask;
+   dev_set_name(&ctx->dev, "host1x-ctx.%d", i);
+   ctx->dev.bus = &host1x_context_device_bus_type;
+   ctx->dev.parent = host1x->dev;
+
+   dma_set_max_seg_size(&ctx->dev, UINT_MAX);
+
+   err = device_add(&ctx->dev);
+   if (err) {
+   dev_err(host1x->dev, "could not add context device %d: 
%d\n", i, err);
+   goto del_devices;
+   }
+
+   err = of_dma_configure_id(&ctx->dev, node, true, &i);
+   if (err) {
+   dev_err(host1x->dev, "IOMMU configuration failed for 
context device %d: %d\n",
+   i, err);
+   device_del(&ctx->dev);
+   goto del_devices;
+   }
+
+   fwspec = dev_iommu_fwspec_get(&ctx->dev);
+   if (!fwspec) {
+   dev_err(host1x->dev, "Context device %d has no 
IOMMU!\n", i);
+   device_del(&ctx->dev);
+   goto del_devices;
+   }
+
+   ctx->stream_id = fwspec->ids[0] & 0x;
+   }
+
+   return 0;
+
+del_devices:
+   while (i--)
+   device_del(&cdl->devs[i].dev);
+
+   kfree(cdl->devs);
+   cdl->len = 0;
+
+   return err;
+}
+
+void host1x_context_list_free(struct host1x_context_list *cdl)
+{
+   unsigned int i;
+
+   for (i = 0; i < cdl->len; i++)
+   device_del(&cdl->devs[i].dev);
+
+   kfree(cdl->devs);
+   cdl->len = 0;
+}
+
+struct host1x_context *host1x_context_alloc(struct host1x *host1x,
+   struct pid *pid)
+{
+   struct host1x_context_list *cdl = &host1x->context_list;
+   struct host1x_context *free = NULL;
+   int i;
+
+   if (!cdl->len)
+   return ERR_PTR(-EOPNOTSUPP);
+
+   mutex_lock(&cdl->lock);
+
+   for (i = 0; i < cdl->len; i++) {
+   struct host1x_context *cd = &cdl->devs[i];
+
+   if (cd->owner == pid) {
+   refcount_inc(&cd->ref);
+

[PATCH v4 8/9] drm/tegra: Support context isolation

From: Mikko Perttunen 

For engines that support context isolation, allocate a context when
opening a channel, and set up stream ID offset and context fields
when submitting a job.

Signed-off-by: Mikko Perttunen 
---
v4:
* Separate error and output values in get_streamid_offset API
* Improve error handling
* Rename job->context to job->memory_context for clarity
---
 drivers/gpu/drm/tegra/drm.h|  2 ++
 drivers/gpu/drm/tegra/submit.c | 21 +++-
 drivers/gpu/drm/tegra/uapi.c   | 45 --
 3 files changed, 65 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/tegra/drm.h b/drivers/gpu/drm/tegra/drm.h
index fc0a19554eac..608daba01587 100644
--- a/drivers/gpu/drm/tegra/drm.h
+++ b/drivers/gpu/drm/tegra/drm.h
@@ -80,6 +80,7 @@ struct tegra_drm_context {
 
/* Only used by new UAPI. */
struct xarray mappings;
+   struct host1x_context *memory_context;
 };
 
 struct tegra_drm_client_ops {
@@ -91,6 +92,7 @@ struct tegra_drm_client_ops {
int (*submit)(struct tegra_drm_context *context,
  struct drm_tegra_submit *args, struct drm_device *drm,
  struct drm_file *file);
+   int (*get_streamid_offset)(struct tegra_drm_client *client, u32 
*offset);
 };
 
 int tegra_drm_submit(struct tegra_drm_context *context,
diff --git a/drivers/gpu/drm/tegra/submit.c b/drivers/gpu/drm/tegra/submit.c
index 6d6dd8c35475..fd0ba09e552a 100644
--- a/drivers/gpu/drm/tegra/submit.c
+++ b/drivers/gpu/drm/tegra/submit.c
@@ -498,6 +498,9 @@ static void release_job(struct host1x_job *job)
struct tegra_drm_submit_data *job_data = job->user_data;
u32 i;
 
+   if (job->memory_context)
+   host1x_context_put(job->memory_context);
+
for (i = 0; i < job_data->num_used_mappings; i++)
tegra_drm_mapping_put(job_data->used_mappings[i].mapping);
 
@@ -588,11 +591,24 @@ int tegra_drm_ioctl_channel_submit(struct drm_device 
*drm, void *data,
goto put_job;
}
 
+   if (context->memory_context && 
context->client->ops->get_streamid_offset) {
+   int offset;
+
+   err = 
context->client->ops->get_streamid_offset(context->client, &offset);
+   if (!err) {
+   job->memory_context = context->memory_context;
+   job->engine_streamid_offset = offset;
+   host1x_context_get(job->memory_context);
+   } else if (err != -EOPNOTSUPP) {
+   goto unpin_job;
+   }
+   }
+
/* Boot engine. */
err = pm_runtime_resume_and_get(context->client->base.dev);
if (err < 0) {
SUBMIT_ERR(context, "could not power up engine: %d", err);
-   goto unpin_job;
+   goto put_memory_context;
}
 
job->user_data = job_data;
@@ -627,6 +643,9 @@ int tegra_drm_ioctl_channel_submit(struct drm_device *drm, 
void *data,
 
goto put_job;
 
+put_memory_context:
+   if (job->memory_context)
+   host1x_context_put(job->memory_context);
 unpin_job:
host1x_job_unpin(job);
 put_job:
diff --git a/drivers/gpu/drm/tegra/uapi.c b/drivers/gpu/drm/tegra/uapi.c
index 9ab9179d2026..475febb6d86b 100644
--- a/drivers/gpu/drm/tegra/uapi.c
+++ b/drivers/gpu/drm/tegra/uapi.c
@@ -33,6 +33,9 @@ static void tegra_drm_channel_context_close(struct 
tegra_drm_context *context)
struct tegra_drm_mapping *mapping;
unsigned long id;
 
+   if (context->memory_context)
+   host1x_context_put(context->memory_context);
+
xa_for_each(&context->mappings, id, mapping)
tegra_drm_mapping_put(mapping);
 
@@ -72,6 +75,7 @@ static struct tegra_drm_client *tegra_drm_find_client(struct 
tegra_drm *tegra, u
 
 int tegra_drm_ioctl_channel_open(struct drm_device *drm, void *data, struct 
drm_file *file)
 {
+   struct host1x *host = tegra_drm_to_host1x(drm->dev_private);
struct tegra_drm_file *fpriv = file->driver_priv;
struct tegra_drm *tegra = drm->dev_private;
struct drm_tegra_channel_open *args = data;
@@ -102,10 +106,38 @@ int tegra_drm_ioctl_channel_open(struct drm_device *drm, 
void *data, struct drm_
}
}
 
+   /* Only allocate context if the engine supports context isolation. */
+   if (client->ops->get_streamid_offset) {
+   u32 offset;
+
+   err = client->ops->get_streamid_offset(client, &offset);
+   if (!err) {
+   context->memory_context =
+   host1x_context_alloc(host, 
get_task_pid(current, PIDTYPE_TGID));
+   } else if (err == -EOPNOTSUPP) {
+   context->memory_context = NULL;
+   } else {
+   goto put_channel;
+   }
+
+   if (IS_ERR(context->memory_context)) {
+   if (PTR_ERR(

[PATCH v4 5/9] iommu/arm-smmu: Attach to host1x context device bus

From: Mikko Perttunen 

Set itself as the IOMMU for the host1x context device bus, containing
"dummy" devices used for Host1x context isolation.

Signed-off-by: Mikko Perttunen 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 4bc75c4ce402..23082675d542 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -39,6 +39,7 @@
 
 #include 
 #include 
+#include 
 
 #include "arm-smmu.h"
 
@@ -2051,8 +2052,20 @@ static int arm_smmu_bus_init(struct iommu_ops *ops)
goto err_reset_pci_ops;
}
 #endif
+#ifdef CONFIG_TEGRA_HOST1X_CONTEXT_BUS
+   if (!iommu_present(&host1x_context_device_bus_type)) {
+   err = bus_set_iommu(&host1x_context_device_bus_type, ops);
+   if (err)
+   goto err_reset_fsl_mc_ops;
+   }
+#endif
+
return 0;
 
+err_reset_fsl_mc_ops: __maybe_unused;
+#ifdef CONFIG_FSL_MC_BUS
+   bus_set_iommu(&fsl_mc_bus_type, NULL);
+#endif
 err_reset_pci_ops: __maybe_unused;
 #ifdef CONFIG_PCI
bus_set_iommu(&pci_bus_type, NULL);
-- 
2.35.0

[PATCH v4 1/9] dt-bindings: host1x: Add iommu-map property

From: Mikko Perttunen 

Add schema information for specifying context stream IDs. This uses
the standard iommu-map property.

Signed-off-by: Mikko Perttunen 
---
v3:
* New patch
v4:
* Remove memory-contexts subnode.
---
 .../bindings/display/tegra/nvidia,tegra20-host1x.yaml| 5 +
 1 file changed, 5 insertions(+)

diff --git 
a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml 
b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
index 4fd513efb0f7..0adeb03b9e3a 100644
--- a/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
+++ b/Documentation/devicetree/bindings/display/tegra/nvidia,tegra20-host1x.yaml
@@ -144,6 +144,11 @@ allOf:
 reset-names:
   maxItems: 1
 
+iommu-map:
+  description: Specification of stream IDs available for memory 
context device
+use. Should be a mapping of IDs 0..n to IOMMU entries 
corresponding to
+usable stream IDs.
+
   required:
 - reg-names
 
-- 
2.35.0

Re: [PATCH] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-01 Thread David Hildenbrand



>>
>>> if (PageReserved(page))
>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>> index c31d04b46a5e..17d049311b78 100644
>>> --- a/mm/migrate.c
>>> +++ b/mm/migrate.c
>>> @@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct 
>>> *mm, unsigned long addr,
>>> goto out;
>>>   
>>> /* FOLL_DUMP to ignore special (like zero) pages */
>>> -   follflags = FOLL_GET | FOLL_DUMP;
>>> +   follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
>>> page = follow_page(vma, addr, follflags);
>> Why wouldn't we want to dump DEVICE_COHERENT pages? This looks wrong.
> 
> This function later calls isolate_lru_page, which is something you can't 
> do with a device page.
> 

Then, that code might require care instead. We most certainly don't want
to have random memory holes in a dump just because some anonymous memory
was migrated to DEVICE_COHERENT.

-- 
Thanks,

David / dhildenb

Re: [PATCH] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-01 Thread Felix Kuehling


Am 2022-03-01 um 11:22 schrieb David Hildenbrand:

if (PageReserved(page))
diff --git a/mm/migrate.c b/mm/migrate.c
index c31d04b46a5e..17d049311b78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct *mm, 
unsigned long addr,
goto out;
   
   	/* FOLL_DUMP to ignore special (like zero) pages */

-   follflags = FOLL_GET | FOLL_DUMP;
+   follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
page = follow_page(vma, addr, follflags);

Why wouldn't we want to dump DEVICE_COHERENT pages? This looks wrong.

This function later calls isolate_lru_page, which is something you can't
do with a device page.


Then, that code might require care instead. We most certainly don't want
to have random memory holes in a dump just because some anonymous memory
was migrated to DEVICE_COHERENT.

I don't think this code is for core dumps. The call chain I see is

SYSCALL_DEFINE6(move_pages, ...) -> kernel_move_pages -> do_pages_move 
-> add_page_for_migration


As far as I can tell this is for NUMA migrations.

Regards,
  Felix

Re: [PATCH] mm: split vm_normal_pages for LRU and non-LRU handling

2022-03-01 Thread David Hildenbrand

On 01.03.22 17:30, Felix Kuehling wrote:
> Am 2022-03-01 um 11:22 schrieb David Hildenbrand:
>   if (PageReserved(page))
> diff --git a/mm/migrate.c b/mm/migrate.c
> index c31d04b46a5e..17d049311b78 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1614,7 +1614,7 @@ static int add_page_for_migration(struct mm_struct 
> *mm, unsigned long addr,
>   goto out;
>
>   /* FOLL_DUMP to ignore special (like zero) pages */
> - follflags = FOLL_GET | FOLL_DUMP;
> + follflags = FOLL_GET | FOLL_DUMP | FOLL_LRU;
>   page = follow_page(vma, addr, follflags);
 Why wouldn't we want to dump DEVICE_COHERENT pages? This looks wrong.
>>> This function later calls isolate_lru_page, which is something you can't
>>> do with a device page.
>>>
>> Then, that code might require care instead. We most certainly don't want
>> to have random memory holes in a dump just because some anonymous memory
>> was migrated to DEVICE_COHERENT.
> I don't think this code is for core dumps. The call chain I see is
> 
> SYSCALL_DEFINE6(move_pages, ...) -> kernel_move_pages -> do_pages_move 
> -> add_page_for_migration
> 

Ah, sorry, I got mislead by FOLL_DUMP and thought we'd be in
get_dump_page() . As you said, nothing to do.

-- 
Thanks,

David / dhildenb

[PATCH 00/12] use dynamic-debug under drm.debug api

hi Jason, Greg, Daniel, DRM-everyone

drm.debug api provides ~23 macros to issue 10 categories of debug
messages, each enabled by a bit in /sys/module/drm/parameters/debug.
drm_debug_enabled(category) tests these bits at runtime; while cheap
individually, the costs accumulate.

Daniel,
I think this revision addresses most of your early review, a lot has
changed since.  Heres the link:
https://patchwork.freedesktop.org/patch/443989/

For CONFIG_DRM_USE_DYNAMIC_DEBUG=y, this patchset obsoletes those
runtime tests (inside drm_*dbg) by wrapping the 2 fns in one of the
dynamic_func_call* Factory macros.  The config dependence is due to
the .data footprint cost of the tables; AMDGPU has ~4k callsites, at
56 bytes each.

This patchset creates entries in /proc/dynamic_debug/control for each
callsite, and each has .class_id = macros' category.  Those entries,
and a new query keyword, allow (1st):

  # 1=DRM_UT_KMS (iirc)
  #> echo "module drm class 1 +p  > /proc/dynamic_debug/control

Then equivalently:
  # except it also clears other flags
  #> echo 0x01 > /sys/module/drm/parameters/debug

series overview:

dyndbg:
 - fix a bug in dyndbg static_key toggling, @stable cc'd
 - adds support for distinct classes to dyndbg (new,unused feature)
 - add DECLARE_DYNAMIC_DEBUG_CLASSBITS macro and callbacks
   to implement bitmap -> classid sysfs knob
dyndbg:
 - drops exported fn: dynamic_debug_exec_queries()
   any potential users would just use macro, or a tweak on it.
 - improve info-msg to print both "old -> new" flags
drm:
 - adapts drm debug category to dyndbg.class_id
 - wraps drm_*dbg() in a dyndbg Factory macro to get NOOP optimized debugs
   this disconnects drm.debug sysfs knob
 - uses DECLARE_DYNAMIC_DEBUG_CLASSBITS macro
   this reconnects sysfs knob

This could be -v12, but the focus and subject has wandered a bit, and
patchwork CI had multiple different notions of the version.

Noteworthy changes:

- no tracefs stuff here, refocus

In contrast, the previous drm.debug approach:

- replaced drm_dbg & drm_devdbg with calls to pr_debug & dev_dbg
  this preserved the optional decorations: module:function:line:

- used DRM_UT_CORE => "drm:core:" prefix-string, cpp cat'd to formats
  this made sites selectable by matching to that format prefix

This version:

- .class_id is easier to explain, and no config/format-string diffs

- wraps drm_dbg & drm_devdbg callsites for jumplabel enablement
  efficiency was original goal.

- loses the optional decorations.
  drm has its own logmsg standards, doesn't need decorations slapped on
  later: could recast flags for drm specific decorations

This is based on 5.17-rc4, for no particular reason.

Its also here: in (dd-drm branch)
  ghlinux-rohttps://github.com/jimc/linux.git (fetch)


Jim Cromie (13):
  dyndbg: fix static_branch manipulation @stable
  dyndbg: add class_id field and query support
  dyndbg: add DEFINE_DYNAMIC_DEBUG_CLASSBITS macro and callbacks
  dyndbg: drop EXPORTed dynamic_debug_exec_queries
  dyndbg: improve change-info to have old and new
  dyndbg: abstract dyndbg_site_is_printing
  drm_print: condense enum drm_debug_category
  drm_print: interpose drm_*dbg with forwarding macros
  drm_print: wrap drm_*_dbg in dyndbg jumplabel
  drm_print: refine drm_debug_enabled for dyndbg+jump-label
  drm_print: prefer bare printk KERN_DEBUG on generic fn
  drm_print: add _ddebug desc to drm_*dbg prototypes
  drm_print: use DEFINE_DYNAMIC_DEBUG_CLASSBITS for drm.debug

 .../admin-guide/dynamic-debug-howto.rst   |   7 +
 drivers/gpu/drm/Kconfig   |  12 ++
 drivers/gpu/drm/Makefile  |   2 +
 drivers/gpu/drm/drm_print.c   |  56 ---
 include/drm/drm_print.h   |  80 +++---
 include/linux/dynamic_debug.h | 113 +++---
 lib/dynamic_debug.c   | 140 ++
 7 files changed, 323 insertions(+), 87 deletions(-)

-- 
2.35.1

[PATCH 02/13] dyndbg: add class_id field and query support

DRM defines/uses 10 enum drm_debug_category's to create exclusive
classes of debug messages.  To support this directly in dynamic-debug,
add the following:

- struct _ddebug.class_id:4 - 4 bits is enough
- define _DPRINTK_SITE_UNCLASSED 15 - see below

and the query support:
- struct _ddebug_query.class_id
- ddebug_parse_query- looks for "class" keyword, then calls..
- parse_class   - accepts uint: 0-15, sets query.class_id and marker
- vpr_info_dq   - displays new field
- ddebug_proc_show  - append column with "cls:%d" if !UNCLASSED

With the patch, this command enables current/unclassed callsites:

  #> echo drm class 15 +p > /proc/dynamic_debug/control

parse_class() accepts 0 .. _DPRINTK_SITE_UNCLASSED, which allows the
>control interface to explicitly manipulate unclassed callsites.

After parsing keywords, ddebug_parse_query() sets .class_id=15, iff it
wasnt explicitly set.  This allows future classed/categorized
callsites to be untouched by legacy (class unaware) queries.

DEFINE_DYNAMIC_DEBUG_METADATA gets _CLS(cls,) suffix and arg, and
initializes the new .class_id=cls field.  The old name gets the default.

Then, these _CLS(cls,...) modifications are repeated up through the
stack of *dynamic_func_call* macros that use the METADATA initializer,
so as to actually supply the category into it.

NOTES:

_DPRINTK_SITE_UNCLASSED: this symbol is used to initialize all
existing/unclassed pr-debug callsites.  Normally, the default would be
zero, but DRM_UT_CORE "uses" that value, in the sense that 0 is
exposed as a bit position in drm.debug.  Using 15 allows identity
mapping from category to class, avoiding fiddly offsets.

CC: Rasmus Villemoes 
Signed-off-by: Jim Cromie 

fixup class-id preset

fix2
---
 .../admin-guide/dynamic-debug-howto.rst   |  7 +++
 include/linux/dynamic_debug.h | 54 ++-
 lib/dynamic_debug.c   | 48 ++---
 3 files changed, 88 insertions(+), 21 deletions(-)

diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst 
b/Documentation/admin-guide/dynamic-debug-howto.rst
index a89cfa083155..8ef8d7dcd140 100644
--- a/Documentation/admin-guide/dynamic-debug-howto.rst
+++ b/Documentation/admin-guide/dynamic-debug-howto.rst
@@ -35,6 +35,7 @@ Dynamic debug has even more useful features:
- line number (including ranges of line numbers)
- module name
- format string
+   - class number:0-15
 
  * Provides a debugfs control file: ``/dynamic_debug/control``
which can be read to display the complete list of known debug
@@ -143,6 +144,7 @@ against.  Possible keywords are:::
 'module' string |
 'format' string |
 'line' line-range
+'class' integer:[0-15]
 
   line-range ::= lineno |
 '-'lineno |
@@ -217,6 +219,11 @@ line
line -1605  // the 1605 lines from line 1 to line 1605
line 1600-  // all lines from line 1600 to the end of the file
 
+class
+This expects a single integer in range: 0-15.
+15 is used/reserved for existing/unclassed callsites,
+and is defaulted in unless specified to >control
+
 The flags specification comprises a change operation followed
 by one or more flag characters.  The change operation is one
 of the characters::
diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index dce631e678dd..d4b48f3cc6e8 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -6,6 +6,8 @@
 #include 
 #endif
 
+#include 
+
 /*
  * An instance of this structure is created in a special
  * ELF section at every dynamic debug callsite.  At runtime,
@@ -21,6 +23,9 @@ struct _ddebug {
const char *filename;
const char *format;
unsigned int lineno:18;
+#define CLS_BITS 4
+   unsigned int class_id:CLS_BITS;
+#define _DPRINTK_SITE_UNCLASSED((1 << CLS_BITS) - 1)
/*
 * The flags field controls the behaviour at the callsite.
 * The bits here are changed dynamically when the user
@@ -87,7 +92,7 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor,
 const struct ib_device *ibdev,
 const char *fmt, ...);
 
-#define DEFINE_DYNAMIC_DEBUG_METADATA(name, fmt)   \
+#define DEFINE_DYNAMIC_DEBUG_METADATA_CLS(name, cls, fmt)  \
static struct _ddebug  __aligned(8) \
__section("__dyndbg") name = {  \
.modname = KBUILD_MODNAME,  \
@@ -96,8 +101,14 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor,
.format = (fmt),\
.lineno = __LINE__, \
.flags = _DPRINTK_FLAGS_DEFAULT,\
+   .class_id = cls,\
_DPRINTK_KEY_INIT

[PATCH 01/13] dyndbg: fix static_branch manipulation

In https://lore.kernel.org/lkml/20211209150910.ga23...@axis.com/

Vincent's patch commented on, and worked around, a bug toggling
static_branch's, when a 2nd PRINTK-ish flag was added.  The bug
results in a premature static_branch_disable when the 1st of 2 flags
was disabled.

The cited commit computed newflags, but then in the JUMP_LABEL block,
did not use that result, but used just one of the terms in it.  Using
newflags instead made the code work properly.

This is Vincents test-case, reduced.  It needs the 2nd flag to work
properly, but it's explanatory here.

pt_test() {
echo 5 > /sys/module/dynamic_debug/verbose

site="module tcp" # just one callsite
echo " $site =_ " > /proc/dynamic_debug/control # clear it

# A B ~A ~B
for flg in +T +p "-T #broke here" -p; do
echo " $site $flg " > /proc/dynamic_debug/control
done;

# A B ~B ~A
for flg in +T +p "-p #broke here" -T; do
echo " $site $flg " > /proc/dynamic_debug/control
done
}
pt_test

Fixes: 84da83a6ffc0 dyndbg: combine flags & mask into a struct, simplify with it
CC: vincent.whitchu...@axis.com
CC: sta...@vger.kernel.org
Signed-off-by: Jim Cromie 
---
 lib/dynamic_debug.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index dd7f56af9aed..a56c1286ffa4 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -211,10 +211,11 @@ static int ddebug_change(const struct ddebug_query *query,
continue;
 #ifdef CONFIG_JUMP_LABEL
if (dp->flags & _DPRINTK_FLAGS_PRINT) {
-   if (!(modifiers->flags & _DPRINTK_FLAGS_PRINT))
+   if (!(newflags & _DPRINTK_FLAGS_PRINT))

static_branch_disable(&dp->key.dd_key_true);
-   } else if (modifiers->flags & _DPRINTK_FLAGS_PRINT)
+   } else if (newflags & _DPRINTK_FLAGS_PRINT) {
static_branch_enable(&dp->key.dd_key_true);
+   }
 #endif
dp->flags = newflags;
v4pr_info("changed %s:%d [%s]%s =%s\n",
-- 
2.35.1

[PATCH 04/13] dyndbg: drop EXPORTed dynamic_debug_exec_queries

This exported fn is effectively obsoleted by Commit:HEAD~2, so remove it.

The export was added here:
  commit a2d375eda771 ("dyndbg: refine export, rename to 
dynamic_debug_exec_queries()")
  commit 4c0d77828d4f ("dyndbg: export ddebug_exec_queries")

Its intent was to allow drm.debug to use the exported function to
implement its drm.debug bitmap api using dynamic_debug.

Instead, HEAD~2 implements the bitmap inside dyndbg, and exposes it in
a macro declarator, and HEAD~1 uses the macro to connect __drm_debug
to the supporting callbacks.

Since there are no other expected users, and any prospects would
likely reuse the bitmap or a straightforward extension of it, we can
drop this function until its really needed.

This also drops the CONFIG_DYNAMIC_DEBUG=N stub-func, and its
pr_warn(), which I avoided in 2012, then added in 2020 :-/

Signed-off-by: Jim Cromie 
---
 include/linux/dynamic_debug.h |  9 -
 lib/dynamic_debug.c   | 29 -
 2 files changed, 38 deletions(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index e83c4e36ad29..664bb83778d2 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -60,9 +60,6 @@ struct _ddebug {
 
 #if defined(CONFIG_DYNAMIC_DEBUG_CORE)
 
-/* exported for module authors to exercise >control */
-int dynamic_debug_exec_queries(const char *query, const char *modname);
-
 int ddebug_add_module(struct _ddebug *tab, unsigned int n,
const char *modname);
 extern int ddebug_remove_module(const char *mod_name);
@@ -253,12 +250,6 @@ static inline int ddebug_dyndbg_module_param_cb(char 
*param, char *val,
rowsize, groupsize, buf, len, ascii);   \
} while (0)
 
-static inline int dynamic_debug_exec_queries(const char *query, const char 
*modname)
-{
-   pr_warn("kernel not built with CONFIG_DYNAMIC_DEBUG_CORE\n");
-   return 0;
-}
-
 struct kernel_param;
 static inline int param_set_dyndbg_classbits(const char *instr, const struct 
kernel_param *kp)
 { return 0; }
diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 7eb1c31f870d..60b2572e64f0 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -582,35 +582,6 @@ static int ddebug_exec_queries(char *query, const char 
*modname)
return nfound;
 }
 
-/**
- * dynamic_debug_exec_queries - select and change dynamic-debug prints
- * @query: query-string described in admin-guide/dynamic-debug-howto
- * @modname: string containing module name, usually &module.mod_name
- *
- * This uses the >/proc/dynamic_debug/control reader, allowing module
- * authors to modify their dynamic-debug callsites. The modname is
- * canonically struct module.mod_name, but can also be null or a
- * module-wildcard, for example: "drm*".
- */
-int dynamic_debug_exec_queries(const char *query, const char *modname)
-{
-   int rc;
-   char *qry; /* writable copy of query */
-
-   if (!query) {
-   pr_err("non-null query/command string expected\n");
-   return -EINVAL;
-   }
-   qry = kstrndup(query, PAGE_SIZE, GFP_KERNEL);
-   if (!qry)
-   return -ENOMEM;
-
-   rc = ddebug_exec_queries(qry, modname);
-   kfree(qry);
-   return rc;
-}
-EXPORT_SYMBOL_GPL(dynamic_debug_exec_queries);
-
 #ifdef CONFIG_MODULES
 #define KP_MOD_NAME kp->mod->name
 #else
-- 
2.35.1

[PATCH 03/13] dyndbg: add DEFINE_DYNAMIC_DEBUG_CLASSBITS macro and callbacks

DEFINE_DYNAMIC_DEBUG_CLASSBITS(fsname, var, bitmap_desc, classes..)
allows users to create a drm.debug style (bitmap) sysfs interface, to
control sets of pr_debug's according to their .class_id's

This wraps existing "class" keyword and behavior:

   echo "module drm -p ; module drm class 0 +p ; module drm class 2 +p" >control

With the macro in use, this is equivalent:

   echo 0x05 > /sys/module/drm/parameters/debug

To use:

DEFINE_DYNAMIC_DEBUG_CLASSBITS(debug, __drm_debug, "pmfl",
"drm.debug - bits => categories:",
/* vector of uint:4 symbols, ala enum drm_debug_category, 15 is EOL */
DRM_UT_CORE,
DRM_UT_DRIVER,
DRM_UT_KMS ... );

The 3rd arg is a string with any of the dyndbg.flags [pmflt_]+
Full exposure of the flags here lets the module author:

- fully customize/take-over the decorations of enabled sites.
  generally leaving decorations to user is preferred.

- aim the debug-stream:
  now printk, later tracefs.
  using both together means more work (p or T, in practice)
  iface doesn't care about new flags added later

- declare 2 separate sysfs-knobs, one each for p, T, if desired.

- decorations are per callsite,
  shared across sysfs-knobs for any controlled classes

To support the macro, the patch adds:

 - int param_set_dyndbg_classbits()
 - int param_get_dyndbg_classbits()
 - struct kernel_param_ops param_ops_dyndbg_classbits

Following the model of kernel/params.c STANDARD_PARAM_DEFS, these are
non-static and exported.

get/set use an augmented kernel_param; the arg refs a new struct
dyndbg_bitmap_param containing:

A- the vector of classes (drm.debug "categories") being controlled

This in-line vector of constants (uint [0-14]) specifies a sequence of
controlling bits (by position, starting at 0) with the values naming
the class_id's mapped to that bit.

A value of _DPRINTK_SITE_UNCLASSED terminates the vector processing by
param_set_dyndbg_classbits(), and is appended by the macro to insure a
defined termination after max 15 classes are applied.

Technically, the vector is a flex-array, but its size is practically
limited to max 15 in length (repeats are pointless).

B- a pointer to the user module's ulong holding the bits/state.

By accessing client's bit-state, we coordinate with existing code
that still uses drm_debug_enabled(), so they work unchanged.
The change to ulong allows use of BIT() etc.

NOTES:

_DPRINTK_SITE_UNCLASSED = 15, ie 2**CLS_BITS-1, deserves special
mention; it already marks all existing pr-debug callsites, only the
new drm.debug callsites are initialized to other (DRM_UT_*) values.

_DPRINTK_SITE_UNCLASSED is used in param_set_dyndbg_classbits() to
limit the range of bits that are processed to what fits in uint:4.
It also terminates the class-id list passed into the macro, so dont
use it halfway through your list of classes-to-control.

param_set_dyndbg_classbits() compares new vs old bits, and only
updates each class on changes.  This maximally preserves the
underlying state, which may have been customized at some point via
`echo $cmd >control`.  So if users really want to know that all
prdbgs are set precisely, they must pre-clear then set.

Identity mapping in (A) is encouraged.  Your vector should exclude 15,
if used, it terminates the list prematurely; any following bit
mappings will be ignored (it is the default/non category).

The whole (A) vector/list passed to the macro is:

- not strictly needed: 0-N bits are scanned, only N is needed in the
  macro interface to do this, not the whole list.

- 0-N list allows juggling the bit->class map
  Identity map is preferred.
  15 terminates list if used. (macro impl does this)

That said, (A) is self-documenting; the explicit list is greppable,
'DRM_UT_*' provides lots of clues.  Further, it could be upgraded,
something like:

  _pick_sym_(DRM_UT_CORE, "mumble something useful about CORE debug")

_pick_sym_(a,b) a   // gives us what we need here
_pick_help_(a,b) #a " : " b // mod-info fodder

Signed-off-by: Jim Cromie 
---
 include/linux/dynamic_debug.h | 50 +++
 lib/dynamic_debug.c   | 77 +++
 2 files changed, 127 insertions(+)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index d4b48f3cc6e8..e83c4e36ad29 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -209,6 +209,10 @@ void __dynamic_ibdev_dbg(struct _ddebug *descriptor,
   KERN_DEBUG, prefix_str, prefix_type, \
   rowsize, groupsize, buf, len, ascii)
 
+struct kernel_param;
+int param_set_dyndbg_classbits(const char *instr, const struct kernel_param 
*kp);
+int param_get_dyndbg_classbits(char *buffer, const struct kernel_param *kp);
+
 #else /* !CONFIG_DYNAMIC_DEBUG_CORE */
 
 #include 
@@ -255,6 +259,52 @@ static inline int dynamic_debug_exec_queries(const char 
*query, const char *modn
return 0;
 }
 
+struct kern

[PATCH 05/13] dyndbg: improve change-info to have old and new

move site.flag update after the v4pr_info("change") message, and
improve the message to print both old and new flag values.

Heres new form:
  dyndbg: changed net/ipv4/tcp.c:2424 [tcp]tcp_recvmsg_locked pT -> _

Signed-off-by: Jim Cromie 
---
 lib/dynamic_debug.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/lib/dynamic_debug.c b/lib/dynamic_debug.c
index 60b2572e64f0..ab93b370d489 100644
--- a/lib/dynamic_debug.c
+++ b/lib/dynamic_debug.c
@@ -158,7 +158,7 @@ static int ddebug_change(const struct ddebug_query *query,
struct ddebug_table *dt;
unsigned int newflags;
unsigned int nfound = 0;
-   struct flagsbuf fbuf;
+   struct flagsbuf fbuf, nbuf;
 
/* search for matching ddebugs */
mutex_lock(&ddebug_lock);
@@ -223,11 +223,12 @@ static int ddebug_change(const struct ddebug_query *query,
static_branch_enable(&dp->key.dd_key_true);
}
 #endif
+   v4pr_info("changed %s:%d [%s]%s %s -> %s\n",
+ trim_prefix(dp->filename), dp->lineno,
+ dt->mod_name, dp->function,
+ ddebug_describe_flags(dp->flags, &fbuf),
+ ddebug_describe_flags(newflags, &nbuf));
dp->flags = newflags;
-   v4pr_info("changed %s:%d [%s]%s =%s\n",
-trim_prefix(dp->filename), dp->lineno,
-dt->mod_name, dp->function,
-ddebug_describe_flags(dp->flags, &fbuf));
}
}
mutex_unlock(&ddebug_lock);
-- 
2.35.1

[PATCH 08/13] drm_print: interpose drm_*dbg with forwarding macros

drm_dev_dbg() & drm_dbg() sit below the categorized layer of the DRM
debug API, and implement most of it.  These are good places to insert
dynamic-debug jump-label mechanics, allowing DRM to avoid the runtime
cost of drm_debug_enabled().

Set up for this by changing the func names by adding '__' prefixes,
and define forwarding macros to the new names.

no functional changes.

memory cost baseline: (unchanged)
bash-5.1# drms_load
[9.220389] dyndbg:   1 debug prints in module drm
[9.224426] ACPI: bus type drm_connector registered
[9.302192] dyndbg:   2 debug prints in module ttm
[9.305033] dyndbg:   8 debug prints in module video
[9.627563] dyndbg: 127 debug prints in module i915
[9.721505] AMD-Vi: AMD IOMMUv2 functionality not available on this system - 
This is not a bug.
[   10.091345] dyndbg: 2196 debug prints in module amdgpu
[   10.106589] [drm] amdgpu kernel modesetting enabled.
[   10.107270] amdgpu: CRAT table not found
[   10.107926] amdgpu: Virtual CRAT table created for CPU
[   10.108398] amdgpu: Topology: Add CPU node
[   10.168507] dyndbg:   3 debug prints in module wmi
[   10.329587] dyndbg:   3 debug prints in module nouveau

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/drm_print.c | 10 +-
 include/drm/drm_print.h |  9 +++--
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index f783d4963d4b..e45ba224e57c 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -256,8 +256,8 @@ void drm_dev_printk(const struct device *dev, const char 
*level,
 }
 EXPORT_SYMBOL(drm_dev_printk);
 
-void drm_dev_dbg(const struct device *dev, enum drm_debug_category category,
-const char *format, ...)
+void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category,
+  const char *format, ...)
 {
struct va_format vaf;
va_list args;
@@ -278,9 +278,9 @@ void drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
 
va_end(args);
 }
-EXPORT_SYMBOL(drm_dev_dbg);
+EXPORT_SYMBOL(__drm_dev_dbg);
 
-void __drm_dbg(enum drm_debug_category category, const char *format, ...)
+void ___drm_dbg(enum drm_debug_category category, const char *format, ...)
 {
struct va_format vaf;
va_list args;
@@ -297,7 +297,7 @@ void __drm_dbg(enum drm_debug_category category, const char 
*format, ...)
 
va_end(args);
 }
-EXPORT_SYMBOL(__drm_dbg);
+EXPORT_SYMBOL(___drm_dbg);
 
 void __drm_err(const char *format, ...)
 {
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index b3b470440e46..4bed99326631 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -334,7 +334,7 @@ __printf(3, 4)
 void drm_dev_printk(const struct device *dev, const char *level,
const char *format, ...);
 __printf(3, 4)
-void drm_dev_dbg(const struct device *dev, enum drm_debug_category category,
+void __drm_dev_dbg(const struct device *dev, enum drm_debug_category category,
 const char *format, ...);
 
 /**
@@ -383,6 +383,9 @@ void drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
}   \
 })
 
+#define drm_dev_dbg(dev, cat, fmt, ...)\
+   __drm_dev_dbg(dev, cat, fmt, ##__VA_ARGS__)
+
 /**
  * DRM_DEV_DEBUG() - Debug output for generic drm code
  *
@@ -484,10 +487,12 @@ void drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
  */
 
 __printf(2, 3)
-void __drm_dbg(enum drm_debug_category category, const char *format, ...);
+void ___drm_dbg(enum drm_debug_category category, const char *format, ...);
 __printf(1, 2)
 void __drm_err(const char *format, ...);
 
+#define __drm_dbg(fmt, ...)___drm_dbg(fmt, ##__VA_ARGS__)
+
 /* Macros to make printk easier */
 
 #define _DRM_PRINTK(once, level, fmt, ...) \
-- 
2.35.1

[PATCH 07/13] drm_print: condense enum drm_debug_category

enum drm_debug_category has 10 "classes", explicitly initialized with
0x-bitmasks which could be simplified as BIT(X)s.  But lets go
further: use natural enumeration (int, starting at 0), and do the
BIT(cat) in drm_debug_enabled(cat) at runtime.

While this slightly pessimizes the bit-test, the category now fits in
4 bits, allowing it in struct _ddebug.class_id:4.  This sets us up to
adapt drm to use dyndbg with JUMP_LABEL, thus avoiding all those
bit-tests anyway.

Signed-off-by: Jim Cromie 
---
 include/drm/drm_print.h | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 22fabdeed297..b3b470440e46 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -279,49 +279,49 @@ enum drm_debug_category {
 * @DRM_UT_CORE: Used in the generic drm code: drm_ioctl.c, drm_mm.c,
 * drm_memory.c, ...
 */
-   DRM_UT_CORE = 0x01,
+   DRM_UT_CORE,
/**
 * @DRM_UT_DRIVER: Used in the vendor specific part of the driver: i915,
 * radeon, ... macro.
 */
-   DRM_UT_DRIVER   = 0x02,
+   DRM_UT_DRIVER,
/**
 * @DRM_UT_KMS: Used in the modesetting code.
 */
-   DRM_UT_KMS  = 0x04,
+   DRM_UT_KMS,
/**
 * @DRM_UT_PRIME: Used in the prime code.
 */
-   DRM_UT_PRIME= 0x08,
+   DRM_UT_PRIME,
/**
 * @DRM_UT_ATOMIC: Used in the atomic code.
 */
-   DRM_UT_ATOMIC   = 0x10,
+   DRM_UT_ATOMIC,
/**
 * @DRM_UT_VBL: Used for verbose debug message in the vblank code.
 */
-   DRM_UT_VBL  = 0x20,
+   DRM_UT_VBL,
/**
 * @DRM_UT_STATE: Used for verbose atomic state debugging.
 */
-   DRM_UT_STATE= 0x40,
+   DRM_UT_STATE,
/**
 * @DRM_UT_LEASE: Used in the lease code.
 */
-   DRM_UT_LEASE= 0x80,
+   DRM_UT_LEASE,
/**
 * @DRM_UT_DP: Used in the DP code.
 */
-   DRM_UT_DP   = 0x100,
+   DRM_UT_DP,
/**
 * @DRM_UT_DRMRES: Used in the drm managed resources code.
 */
-   DRM_UT_DRMRES   = 0x200,
+   DRM_UT_DRMRES
 };
 
 static inline bool drm_debug_enabled(enum drm_debug_category category)
 {
-   return unlikely(__drm_debug & category);
+   return unlikely(__drm_debug & BIT(category));
 }
 
 /*
-- 
2.35.1

[PATCH 06/13] dyndbg: abstract dyndbg_site_is_printing

Hide flags test in a macro.
no functional changes.

Signed-off-by: Jim Cromie 
---
 include/linux/dynamic_debug.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/dynamic_debug.h b/include/linux/dynamic_debug.h
index 664bb83778d2..106065244f73 100644
--- a/include/linux/dynamic_debug.h
+++ b/include/linux/dynamic_debug.h
@@ -56,7 +56,7 @@ struct _ddebug {
 #endif
 } __attribute__((aligned(8)));
 
-
+#define dyndbg_site_is_printing(desc)  (desc->flags & _DPRINTK_FLAGS_PRINT)
 
 #if defined(CONFIG_DYNAMIC_DEBUG_CORE)
 
-- 
2.35.1

[PATCH 09/13] drm_print: wrap drm_*_dbg in dyndbg jumplabel

For CONFIG_DRM_USE_DYNAMIC_DEBUG=y, wrap drm_dbg() & drm_dev_dbg() in
one of dyndbg's Factory macros: _dynamic_func_call_no_desc().
This makes the (~4000) callsites controllable, typically by class:

  # 0 is DRM_UT_CORE
  #> echo module drm class 0 +p > /proc/dynamic_debug/control

 =N: keeps direct forwarding: drm_*_dbg -> __drm_*_dbg()

I added the CONFIG_DRM_USE_DYNAMIC_DEBUG item because of the .data
footprint cost of per-callsite control; 56 bytes/site * ~2k,4k
callsites (for i915, amdgpu), which is significant enough that a user
might not want it.  Using CONFIG_DYNAMIC_DEBUG_CORE only eliminates
the builtin portion, leaving only drm modules, but still 200k of
module data is a lot.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/Kconfig  | 12 
 drivers/gpu/drm/Makefile |  2 ++
 include/drm/drm_print.h  | 12 
 3 files changed, 26 insertions(+)

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index b1f22e457fd0..ec14a1cd4449 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -63,6 +63,18 @@ config DRM_DEBUG_MM
 
  If in doubt, say "N".
 
+config DRM_USE_DYNAMIC_DEBUG
+   bool "use dynamic debug to implement drm.debug"
+   default y
+   depends on DRM
+   depends on DYNAMIC_DEBUG || DYNAMIC_DEBUG_CORE
+   depends on JUMP_LABEL
+   help
+ Use dynamic-debug to avoid drm_debug_enabled() runtime overheads.
+ Due to callsite counts in DRM drivers (~4k in amdgpu) and 56
+ bytes per callsite, the .data costs can be substantial, and
+ are therefore configurable.
+
 config DRM_DEBUG_SELFTEST
tristate "kselftests for DRM"
depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 301a44dc18e3..24e6410d6c0e 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -3,6 +3,8 @@
 # Makefile for the drm device driver.  This driver provides support for the
 # Direct Rendering Infrastructure (DRI) in XFree86 4.1.0 and higher.
 
+CFLAGS-$(CONFIG_DRM_USE_DYNAMIC_DEBUG) += -DDYNAMIC_DEBUG_MODULE
+
 drm-y   := drm_aperture.o drm_auth.o drm_cache.o \
drm_file.o drm_gem.o drm_ioctl.o \
drm_drv.o \
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 4bed99326631..06f0ee06be1f 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -383,8 +383,14 @@ void __drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
}   \
 })
 
+#if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
 #define drm_dev_dbg(dev, cat, fmt, ...)\
__drm_dev_dbg(dev, cat, fmt, ##__VA_ARGS__)
+#else
+#define drm_dev_dbg(dev, cat, fmt, ...)\
+   _dynamic_func_call_no_desc(fmt, __drm_dev_dbg,  \
+  dev, cat, fmt, ##__VA_ARGS__)
+#endif
 
 /**
  * DRM_DEV_DEBUG() - Debug output for generic drm code
@@ -491,7 +497,13 @@ void ___drm_dbg(enum drm_debug_category category, const 
char *format, ...);
 __printf(1, 2)
 void __drm_err(const char *format, ...);
 
+#if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
 #define __drm_dbg(fmt, ...)___drm_dbg(fmt, ##__VA_ARGS__)
+#else
+#define __drm_dbg(cat, fmt, ...)   \
+   _dynamic_func_call_no_desc(fmt, ___drm_dbg, \
+  cat, fmt, ##__VA_ARGS__)
+#endif
 
 /* Macros to make printk easier */
 
-- 
2.35.1

[PATCH 11/13] drm_print: prefer bare printk KERN_DEBUG on generic fn

drm_print.c calls pr_debug() just once, from __drm_printfn_debug(),
which is a generic/service fn.  The callsite is compile-time enabled
by DEBUG in both DYNAMIC_DEBUG=y/n builds.

For dyndbg builds, reverting this callsite back to bare printk is
correcting a few anti-features:

1- callsite is generic, serves multiple drm users.
   its hardwired on currently
   could accidentally: #> echo -p > /proc/dynamic_debug/control

2- optional "decorations" by dyndbg are unhelpful/misleading
   they describe only the generic site, not end users

IOW, 1,2 are unhelpful at best, and possibly confusing.

reverting yields a nominal data and text shrink:

   textdata bss dec hex filename
 462583   36604   54592 553779   87333 
/lib/modules/5.16.0-rc4-lm1-8-ged3eac8ceeea/kernel/drivers/gpu/drm/drm.ko
 462515   36532   54592 553639   872a7 
/lib/modules/5.16.0-rc4-lm1-9-g6ce0b88d2539-dirty/kernel/drivers/gpu/drm/drm.ko

NB: this was noticed using _drm_debug_enabled(), added earlier.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/drm_print.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index 92e6e18026da..24c57b92dc69 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -23,8 +23,6 @@
  * Rob Clark 
  */
 
-#define DEBUG /* for pr_debug() */
-
 #include 
 
 #include 
@@ -162,7 +160,8 @@ EXPORT_SYMBOL(__drm_printfn_info);
 
 void __drm_printfn_debug(struct drm_printer *p, struct va_format *vaf)
 {
-   pr_debug("%s %pV", p->prefix, vaf);
+   /* pr_debug callsite decorations are unhelpful here */
+   printk(KERN_DEBUG "%s %pV", p->prefix, vaf);
 }
 EXPORT_SYMBOL(__drm_printfn_debug);
 
-- 
2.35.1

[PATCH 10/13] drm_print: refine drm_debug_enabled for dyndbg+jump-label

In order to use dynamic-debug's jump-label optimization in drm-debug,
its clarifying to refine drm_debug_enabled into 3 uses:

1.   drm_debug_enabled - legacy, public
2. __drm_debug_enabled - optimized for dyndbg jump-label enablement.
3.  _drm_debug_enabled - pr_debug instrumented, observable

1. The legacy version always checks the bits.

2. is privileged, for use by __drm_dbg(), __drm_dev_dbg(), which do an
early return unless the category is enabled (free of call/NOOP side
effects).  For dyndbg builds, debug callsites are selectively
"pre-enabled", so __drm_debug_enabled() short-circuits to true there.
Remaining callers of 1 may be able to use 2, case by case.

3. is 1st wrapped in a macro, with a pr_debug, which reports each
usage in /proc/dynamic_debug/control, making it observable in the
logs.  The macro lets the pr_debug see the real caller, not an inline
function.

When plugged into 1, it identified ~10 remaining callers of the
function, leading to the follow-on cleanup patch, and would allow
activating the pr_debugs, estimating the callrate, and the potential
savings by using the wrapper macro.  It is unused ATM, but it fills
out the picture.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/drm_print.c |  4 ++--
 include/drm/drm_print.h | 28 
 2 files changed, 30 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index e45ba224e57c..92e6e18026da 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -262,7 +262,7 @@ void __drm_dev_dbg(const struct device *dev, enum 
drm_debug_category category,
struct va_format vaf;
va_list args;
 
-   if (!drm_debug_enabled(category))
+   if (!__drm_debug_enabled(category))
return;
 
va_start(args, format);
@@ -285,7 +285,7 @@ void ___drm_dbg(enum drm_debug_category category, const 
char *format, ...)
struct va_format vaf;
va_list args;
 
-   if (!drm_debug_enabled(category))
+   if (!__drm_debug_enabled(category))
return;
 
va_start(args, format);
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 06f0ee06be1f..38ef044d786e 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -319,11 +319,39 @@ enum drm_debug_category {
DRM_UT_DRMRES
 };
 
+/*
+ * 3 name flavors of drm_debug_enabled:
+ *   drm_debug_enabled - public/legacy, always checks bits
+ *  _drm_debug_enabled - instrumented to observe call-rates, est overheads.
+ * __drm_debug_enabled - privileged - knows jump-label state, can short-circuit
+ */
 static inline bool drm_debug_enabled(enum drm_debug_category category)
 {
return unlikely(__drm_debug & BIT(category));
 }
 
+/*
+ * Wrap fn in macro, so that the pr_debug sees the actual caller, not
+ * the inline fn.  Using this name creates a callsite entry / control
+ * point in /proc/dynamic_debug/control.
+ */
+#define _drm_debug_enabled(category)   \
+   ({  \
+   pr_debug("todo: maybe avoid via dyndbg\n"); \
+   drm_debug_enabled(category);\
+   })
+
+#if defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
+/*
+ * dyndbg is wrapping the drm.debug API, so as to avoid the runtime
+ * bit-test overheads of drm_debug_enabled() in those api calls.
+ * In this case, executed callsites are known enabled, so true.
+ */
+#define __drm_debug_enabled(category)  true
+#else
+#define __drm_debug_enabled(category)  drm_debug_enabled(category)
+#endif
+
 /*
  * struct device based logging
  *
-- 
2.35.1

[PATCH 13/13] drm_print: use DEFINE_DYNAMIC_DEBUG_CLASSBITS for drm.debug

if CONFIG_DRM_USE_DYNAMIC_DEBUG=y, use new macro to create the sysfs
bitmap to control drm.debug callsites.

DEFINE_DYNAMIC_DEBUG_CLASSBITS( debug, __drm_debug, "p",
"drm.debug - control summary",
/* inline vector of _ddebug.class_id's to be controlled, max 14 vals */
DRM_UT_CORE,
DRM_UT_DRIVER,
DRM_UT_KMS,
DRM_UT_PRIME,
DRM_UT_ATOMIC,
DRM_UT_VBL,
DRM_UT_STATE,
DRM_UT_LEASE,
DRM_UT_DP,
DRM_UT_DRMRES );

NOTES:

The @_flgs used here is "p", so this bitmap enables input to syslog
only, matching legacy behavior.

Also, no "fmlt" decorator flags are used here; that is discouraged, as
it then toggles those flags along with the "p".  This would overwrite
any customizations a user added since the sysfs-knob was last used.
Still, there may be cases/reasons.

_ddebug.class_id is uint:4, values 0-14 are valid. 15 is reserved for
non-classified callsites (regular pr_debugs).  Using it terminates the
scan, don't use it halfway through your list.

Signed-off-by: Jim Cromie 
---
 drivers/gpu/drm/drm_print.c | 20 ++--
 include/drm/drm_print.h |  4 ++--
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_print.c b/drivers/gpu/drm/drm_print.c
index c9b2a2ab0d3d..d916daa384e5 100644
--- a/drivers/gpu/drm/drm_print.c
+++ b/drivers/gpu/drm/drm_print.c
@@ -38,7 +38,7 @@
  * __drm_debug: Enable debug output.
  * Bitmask of DRM_UT_x. See include/drm/drm_print.h for details.
  */
-unsigned int __drm_debug;
+unsigned long __drm_debug;
 EXPORT_SYMBOL(__drm_debug);
 
 MODULE_PARM_DESC(debug, "Enable debug output, where each bit enables a debug 
category.\n"
@@ -50,7 +50,23 @@ MODULE_PARM_DESC(debug, "Enable debug output, where each bit 
enables a debug cat
 "\t\tBit 5 (0x20)  will enable VBL messages (vblank code)\n"
 "\t\tBit 7 (0x80)  will enable LEASE messages (leasing code)\n"
 "\t\tBit 8 (0x100) will enable DP messages (displayport code)");
-module_param_named(debug, __drm_debug, int, 0600);
+
+#if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
+module_param_named(debug, __drm_debug, ulong, 0600);
+#else
+DEFINE_DYNAMIC_DEBUG_CLASSBITS(debug, __drm_debug, "p",
+   "enable drm.debug categories - 1 bit per category",
+   DRM_UT_CORE,
+   DRM_UT_DRIVER,
+   DRM_UT_KMS,
+   DRM_UT_PRIME,
+   DRM_UT_ATOMIC,
+   DRM_UT_VBL,
+   DRM_UT_STATE,
+   DRM_UT_LEASE,
+   DRM_UT_DP,
+   DRM_UT_DRMRES);
+#endif
 
 void __drm_puts_coredump(struct drm_printer *p, const char *str)
 {
diff --git a/include/drm/drm_print.h b/include/drm/drm_print.h
index 13d52b60f388..419140bf992d 100644
--- a/include/drm/drm_print.h
+++ b/include/drm/drm_print.h
@@ -36,7 +36,7 @@
 #include 
 
 /* Do *not* use outside of drm_print.[ch]! */
-extern unsigned int __drm_debug;
+extern unsigned long __drm_debug;
 
 /**
  * DOC: print
@@ -527,7 +527,7 @@ __printf(1, 2)
 void __drm_err(const char *format, ...);
 
 #if !defined(CONFIG_DRM_USE_DYNAMIC_DEBUG)
-#define __drm_dbg(fmt, ...)___drm_dbg(NULL, fmt, ##__VA_ARGS__)
+#define __drm_dbg(cat, fmt, ...)   ___drm_dbg(NULL, cat, fmt, 
##__VA_ARGS__)
 #else
 #define __drm_dbg(cat, fmt, ...)   \
_dynamic_func_call_cls(cat, fmt, ___drm_dbg,\
-- 
2.35.1

[PATCH 12/13] drm_print: add _ddebug desc to drm_*dbg prototypes