Re: [PATCH 2/2] drm/ttm: Fix swapout in ttm_tt_populate

2021-04-28 Thread Christian König

That is already fixed upstream.

On 28.04.21 at 07:33, Felix Kuehling wrote:

ttm_bo_swapout returns a non-0 value on success. Don't treat that as an
error in ttm_tt_populate.

Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/ttm/ttm_tt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 5d8820725b75..1858a7fb9169 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -326,7 +326,7 @@ int ttm_tt_populate(struct ttm_device *bdev,
   ttm_dma32_pages_limit) {
  
 		ret = ttm_bo_swapout(ctx, GFP_KERNEL);
-		if (ret)
+		if (ret < 0)
 			goto error;
 	}
  




Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Christian König

On 28.04.21 at 07:33, Felix Kuehling wrote:

SG BOs do not occupy space that is managed by TTM. So do not evict them.

This fixes unexpected evictions of KFD's userptr BOs. KFD only expects
userptr "evictions" in the form of MMU notifiers.


NAK, SG BOs also account for the memory the GPU can currently access.

We can ignore them for the allocated memory, but not for the GTT domain.

Christian.



Signed-off-by: Felix Kuehling 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 4 ++++
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index de1ec838cf8b..0b953654fdbf 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -655,6 +655,10 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
 		list_for_each_entry(bo, &man->lru[i], lru) {
 			bool busy;
 
+			/* Don't evict SG BOs */
+			if (bo->ttm && bo->ttm->sg)
+				continue;
+
 			if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
 							    &busy)) {
 				if (busy && !busy_bo && ticket !=




Re: Display notch support

2021-04-28 Thread Simon Ser
> A solution to make this configuration generic and exposed by the kernel
> would standardise this across Linux

Having a KMS property for this makes sense to me.

Chatting with Jani on IRC, it doesn't seem like there's any EDID or
DisplayID block for this.

Note, Android exposes a data structure [1] with:

- Margin of the cut-out for each edge of the screen
- One rectangle per edge describing the cut-out region
- Size of the curved area for each edge of a waterfall display

I haven't found anything describing the rounded corners of the display.

[1]: https://developer.android.com/reference/android/view/DisplayCutout
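
To make that concrete, if this were exposed as a KMS blob property, the
layout could look roughly like the sketch below. This is purely hypothetical
(the struct names, fields and the property itself), not existing uAPI:

/*
 * Hypothetical blob layout for a "panel cutout" connector property,
 * loosely mirroring the Android DisplayCutout fields above.
 */
#include <stdint.h>

struct panel_cutout_rect {
	uint32_t x1, y1, x2, y2;   /* cut-out rectangle, panel coordinates */
};

struct panel_cutout_blob {
	uint32_t version;          /* blob format version */
	uint32_t inset_top;        /* safe-area margin per edge, in pixels */
	uint32_t inset_bottom;
	uint32_t inset_left;
	uint32_t inset_right;
	uint32_t waterfall_top;    /* curved "waterfall" area size per edge */
	uint32_t waterfall_bottom;
	uint32_t waterfall_left;
	uint32_t waterfall_right;
	uint32_t num_cutouts;      /* one rectangle per notch/hole-punch */
	struct panel_cutout_rect cutouts[];
};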


Re: New warnings with gcc-11

2021-04-28 Thread Jani Nikula
On Tue, 27 Apr 2021, Linus Torvalds  wrote:
> I've updated to Fedora 34 on one of my machines, and it causes a lot
> of i915 warnings like
>
>   drivers/gpu/drm/i915/intel_pm.c: In function ‘ilk_setup_wm_latency’:
>   drivers/gpu/drm/i915/intel_pm.c:3059:9: note: referencing argument 3
> of type ‘const u16 *’ {aka ‘const short unsigned int *’}
>   drivers/gpu/drm/i915/intel_pm.c:2994:13: note: in a call to function
> ‘intel_print_wm_latency’
>
> and the reason is that gcc now seems to look at the argument array
> size more, and notices that

Arnd Bergmann reported some of these a while back. I think we have some
of them fixed in our -next already, but not all. Thanks for the
reminder.
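
For reference, the pattern gcc-11 trips over boils down to this stand-alone
sketch (made-up names, not the actual i915 code):

/* wm.c -- compile with: gcc-11 -Wall -O2 -c wm.c */
#include <stdio.h>

typedef unsigned short u16;

/* The parameter declaration promises an array of 8 elements... */
static void print_wm_latency(const char *name, const u16 wm[8])
{
	int level;

	for (level = 0; level < 8; level++)
		printf("%s WM%d latency %u\n", name, level, wm[level]);
}

void setup_wm_latency(void)
{
	u16 wm[5] = { 0 };	/* ...but the caller only has 5 elements. */

	/* gcc-11 warns here about reading past the end of 'wm'. */
	print_wm_latency("Primary", wm);
}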

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: Display notch support

2021-04-28 Thread Pekka Paalanen
On Wed, 28 Apr 2021 07:21:28 +
Simon Ser  wrote:

> > A solution to make this configuration generic and exposed by the kernel
> > would standardise this across Linux  
> 
> Having a KMS property for this makes sense to me.
> 
> Chatting with Jani on IRC, it doesn't seem like there's any EDID or
> DisplayID block for this.
> 
> Note, Android exposes a data structure [1] with:
> 
> - Margin of the cut-out for each edge of the screen
> - One rectangle per edge describing the cut-out region
> - Size of the curved area for each edge of a waterfall display
> 
> I haven't found anything describing the rounded corners of the display.
> 
> [1]: https://developer.android.com/reference/android/view/DisplayCutout

Hi,

I'm kind of worried whether you can design a description structure that
would be good for a long time. That list already looks quite
complicated. Add also watch-like devices with circular displays.

Would the kernel itself use this information at all?

If not, is there not a policy that DT is not a userspace configuration
store?

You mentioned the panel orientation property, but that is used by the
kernel for fbcon or something, is it not? Maybe as the default value
for the CRTC rotation property which actually turns the image?

Assuming that you succeed in describing these non-usable, funny
(waterfall edge), funny2 (e.g. behind a shade or filter so visible but
not normal), funny3 (e.g. phone button area with maybe tactile
markings), and normal areas, how would userspace handle this
information?

Funny2 and funny3 are hypothetical but maybe not too far-fetched.

Is there any provision for generic userspace to handle this generically?

This seems more like a job for the hypothetical liboutput, just like
recognising HMDs (yes, I know, kernel does that already, but there is a
point that kernel may not want to put fbcon on a HMD).


Thanks,
pq




Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Felix Kuehling

On 2021-04-28 at 3:04 a.m., Christian König wrote:
> On 28.04.21 at 07:33, Felix Kuehling wrote:
>> SG BOs do not occupy space that is managed by TTM. So do not evict them.
>>
>> This fixes unexpected evictions of KFD's userptr BOs. KFD only expects
>> userptr "evictions" in the form of MMU notifiers.
>
> NAK, SG BOs also account for the memory the GPU can currently access. 
>
> We can ignore them for the allocated memory, but not for the GTT domain.
Hmm, the only reason I found this problem is that I am now testing with
IOMMU enabled. Evicting the userptr BO destroys the DMA mapping. Without
IOMMU-enforced device isolation I was blissfully unaware that the
userptr BOs were being evicted. The GPUVM mappings were unaffected and
just worked without problems. Having to evict these BOs is crippling
KFD's ability to map system memory for GPU access, once again.

I think this affects not only userptr BOs but also DMABuf imports for
BOs shared between multiple GPUs.

The GTT size limitation is entirely artificial. And the only reason I
know of for keeping it limited to the VRAM size is to work around some
OOM issues with GTT BOs. Applying this to userptrs and DMABuf imports
makes no sense. But I understand that the way TTM manages the GTT domain
there is no easy fix for this. Maybe we'd have to create a new domain
for validating SG BOs that's separate from GTT, so that TTM would not
try to allocate GTT space for them.

Failing that, I'd probably have to abandon userptr BOs altogether and
switch system memory mappings over to using the new SVM API on systems
where it is available.

Regards,
  Felix


>
> Christian.
>
>>
>> Signed-off-by: Felix Kuehling 
>> ---
>>   drivers/gpu/drm/ttm/ttm_bo.c | 4 
>>   1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
>> index de1ec838cf8b..0b953654fdbf 100644
>> --- a/drivers/gpu/drm/ttm/ttm_bo.c
>> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
>> @@ -655,6 +655,10 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>>   list_for_each_entry(bo, &man->lru[i], lru) {
>>   bool busy;
>>   +    /* Don't evict SG BOs */
>> +    if (bo->ttm && bo->ttm->sg)
>> +    continue;
>> +
>>   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
>>   &busy)) {
>>   if (busy && !busy_bo && ticket !=
>


Re: [PATCH V2 2/2] drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver

2021-04-28 Thread Frieder Schrempf

On 22.04.21 00:31, Marek Vasut wrote:

Add driver for TI SN65DSI83 Single-link DSI to Single-link LVDS bridge
and TI SN65DSI84 Single-link DSI to Dual-link or 2x Single-link LVDS
bridge. TI SN65DSI85 is unsupported due to lack of hardware to test on,
but easy to add.

The driver operates the chip via the I2C bus. Currently the LVDS clock is
always derived from the DSI clock lane, which is the usual mode of operation.
Support for clock from an external oscillator is not implemented, but it is
easy to add if ever needed. Only the RGB888 pixel format is implemented;
LVDS666 is not supported, but could be added if needed.

Signed-off-by: Marek Vasut 
Cc: Douglas Anderson 
Cc: Jagan Teki 
Cc: Laurent Pinchart 
Cc: Linus Walleij 
Cc: Philippe Schenker 
Cc: Sam Ravnborg 
Cc: Stephen Boyd 
Cc: Valentin Raevsky 
To: dri-devel@lists.freedesktop.org
Tested-by: Loic Poulain 
---
V2: - Use dev_err_probe()
 - Set REG_RC_RESET as volatile
 - Wait for PLL stabilization by polling REG_RC_LVDS_PLL
 - Use ctx->mode = *adj instead of *mode in sn65dsi83_mode_set
 - Add tested DSI84 support in dual-link mode
 - Correctly set VCOM
 - Fill in missing DSI CHB and LVDS CHB bits from DSI84 and DSI85
   datasheets; with that, all the reserved bits make far more sense,
   as the DSI83 and DSI84 seem to be reduced versions of DSI85
---
  drivers/gpu/drm/bridge/Kconfig|  10 +
  drivers/gpu/drm/bridge/Makefile   |   1 +
  drivers/gpu/drm/bridge/ti-sn65dsi83.c | 617 ++
  3 files changed, 628 insertions(+)
  create mode 100644 drivers/gpu/drm/bridge/ti-sn65dsi83.c


[...]

+static int sn65dsi83_probe(struct i2c_client *client,
+  const struct i2c_device_id *id)
+{
+   struct device *dev = &client->dev;
+   enum sn65dsi83_model model;
+   struct sn65dsi83 *ctx;
+   int ret;
+
+   ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
+   if (!ctx)
+   return -ENOMEM;
+
+   ctx->dev = dev;
+
+   if (dev->of_node)
+   model = (enum sn65dsi83_model)of_device_get_match_data(dev);
+   else
+   model = id->driver_data;
+
+   /* Default to dual-link LVDS on all but DSI83. */
+   if (model != MODEL_SN65DSI83)
+   ctx->lvds_dual_link = true;


What if I use the DSI84 with a single link LVDS? I can't see any way to 
configure that right now.



+
+   ctx->enable_gpio = devm_gpiod_get(ctx->dev, "enable", GPIOD_OUT_LOW);
+   if (IS_ERR(ctx->enable_gpio))
+   return PTR_ERR(ctx->enable_gpio);
+
+   ret = sn65dsi83_parse_dt(ctx);
+   if (ret)
+   return ret;
+
+   ctx->regmap = devm_regmap_init_i2c(client, &sn65dsi83_regmap_config);
+   if (IS_ERR(ctx->regmap))
+   return PTR_ERR(ctx->regmap);
+
+   dev_set_drvdata(dev, ctx);
+   i2c_set_clientdata(client, ctx);
+
+   ctx->bridge.funcs = &sn65dsi83_funcs;
+   ctx->bridge.of_node = dev->of_node;
+   drm_bridge_add(&ctx->bridge);
+
+   return 0;
+}



Re: Display notch support

2021-04-28 Thread Simon Ser
On Wednesday, April 28th, 2021 at 9:44 AM, Pekka Paalanen  
wrote:

> I'm kind of worried whether you can design a description structure that
> would be good for a long time. That list already looks quite
> complicated. Add also watch-like devices with circular displays.
>
> Would the kernel itself use this information at all?

fbcon might want to letter-box its output to make sure it's not
obscured behind a cut-out area.

> If not, is there not a policy that DT is not a userspace configuration
> store?
>
> You mentioned the panel orientation property, but that is used by the
> kernel for fbcon or something, is it not? Maybe as the default value
> for the CRTC rotation property which actually turns the image?

I wonder if fbcon uses it at all. In general CRTC rotation is not
well-supported by HW drivers, at least for linear buffers. CRTC
rotation is just an optimization.

> Assuming that you succeed in describing these non-usable, funny
> (waterfall edge), funny2 (e.g. behind a shade or filter so visible but
> not normal), funny3 (e.g. phone button area with maybe tactile
> markings), and normal areas, how would userspace handle this
> information?
>
> Funny2 and funny3 are hypothetical but maybe not too far-fetched.
>
> Is there any provision for generic userspace to handle this generically?

I think the main use-case here is to make sure there's nothing important
being cut out on screen. I agree we still don't know how the hw will
evolve and might design an API which is too restricted. But building
something that ends up too complicated and too generic wouldn't be
great either.


Re: [RFC PATCH 0/3] A drm_plane API to support HDR planes

2021-04-28 Thread Shashank Sharma
Hello Harry,

Many of us in the mail chain have discussed this before: what is the right
way to blend and tone map an SDR and an HDR buffer from the same or different
color spaces, and what kind of DRM plane properties will be needed.

As you can see from the previous comments, the majority of the decision
making will happen in the compositor, as it's the only SW unit which has the
overall picture clear.

Reference: 
(https://lists.freedesktop.org/archives/wayland-devel/2019-January/039808.html )

If we take a systematic approach to how we build such a blending policy, it
will look like this:


- The compositor needs to understand the following values for each of the buffers:
    - Color space or gamut: BT2020/sRGB/DCI-P3/BT709/BT601 etc.
    - Color format (RGB/YCbCr) and subsampling (444/422/420)
    - Tone (SDR/HDR_A/HDR_B)

- Then the compositor needs to understand the capabilities of the output
  display, as this will be a clamping value:
    - Output gamut support (BT2020/sRGB/DCI-P3)
    - Output max luminance of the monitor in nits (even in case of HDR content
      to HDR display)

Based on all this information above, the compositor needs to set a blending
target, which contains the following:
    - Output color space of the blended output: say BT2020
    - Output luminance of the blended output: match content, if the monitor can
      support it
    - Output color format of the blended output: say YCbCr 4:2:0

Let's assume the compositor prepares a blending policy with output as:
    - Output luminance: HDR 500 nits
    - Output color space: BT2020
    - Output color format: RGB888
    - Output curve: ST2084

Assuming these details, a compositor will look for DRM color properties like
these:

1. Degamma plane property: to make buffers linear for gamut mapping
2. Gamut mapping plane property: to gamut map an sRGB buffer to the BT2020 color space
3. Color space conversion plane property: to convert from YCbCr to RGB
4. Tone mapping plane property: to tone map an SDR buffer S2H and an HDR buffer H2H
5. Gamma plane/CRTC property: to re-apply the output ST2084 curve


We will also need connector/CRTC properties to set AVI info-frames accordingly.

A high level block diagram for blending on a generic HW should look like this:

/*
 * Plane 1 (SDR content):
 *   SDR 200 nits, BT709, RGB888, non-linear (2.2)
 *     -> Degamma 2.2               -> SDR 200 nits, BT709,  RGB888, linear
 *     -> Gamut mapping 709->2020   -> SDR 200 nits, BT2020, RGB888, linear
 *     -> Tone mapping S2H 200->500 -> HDR 500 nits, BT2020, RGB888, linear
 *     -> Gamma ST2084              -> HDR 500 nits, BT2020, RGB888, ST2084
 *
 * Plane 2 (HDR content):
 *   HDR 600 nits, BT2020, YCbCr 4:2:0, non-linear
 *     -> Degamma (OETF ST2084)     -> HDR 600 nits, BT2020, YCbCr 4:2:0, linear
 *     -> CSC YCbCr->RGB            -> HDR 600 nits, BT2020, RGB888, linear
 *     -> Tone mapping H2H 600->500 -> HDR 500 nits, BT2020, RGB888, linear
 *     -> Gamma ST2084              -> HDR 500 nits, BT2020, RGB888, ST2084
 */
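
To illustrate the userspace side, below is a rough libdrm sketch of
programming such per-plane properties in one atomic commit. The property
names ("DEGAMMA_LUT", "GAMUT_CTM", "TONE_MAP", "GAMMA_LUT") and the
get_prop_id() helper are placeholders for whatever this series ends up
exposing; only the atomic plumbing is existing API:

#include <errno.h>
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/* Hypothetical helper: look up a property id by name on a KMS object. */
extern uint32_t get_prop_id(int fd, uint32_t obj_id, const char *name);

static int set_plane_hdr_pipeline(int fd, uint32_t plane_id,
				  uint32_t degamma_blob, uint32_t ctm_blob,
				  uint32_t tonemap_blob, uint32_t gamma_blob)
{
	drmModeAtomicReq *req = drmModeAtomicAlloc();
	int ret;

	if (!req)
		return -ENOMEM;

	/* Each step of the pipeline above maps to one plane property. */
	drmModeAtomicAddProperty(req, plane_id,
				 get_prop_id(fd, plane_id, "DEGAMMA_LUT"), degamma_blob);
	drmModeAtomicAddProperty(req, plane_id,
				 get_prop_id(fd, plane_id, "GAMUT_CTM"), ctm_blob);
	drmModeAtomicAddProperty(req, plane_id,
				 get_prop_id(fd, plane_id, "TONE_MAP"), tonemap_blob);
	drmModeAtomicAddProperty(req, plane_id,
				 get_prop_id(fd, plane_id, "GAMMA_LUT"), gamma_blob);

	/* TEST_ONLY lets the compositor probe whether the HW can do this. */
	ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_TEST_ONLY, NULL);
	drmModeAtomicFree(req);
	return ret;
}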


Hope this helps to refine the series.


Regards

Shashank

On 27/04/21 20:20, Pekka Paalanen wrote:
> On Mon, 26 Apr 2021 13:38:49 -0400
> Harry Wentland  wrote:
>
>> ## Introduction
>>
>> We are looking to enable HDR support for a couple of single-plane and
>> multi-plane scenarios. To do this effectively we recommend new
>> interfaces to drm_plane. Below I'll give a bit of background on HDR
>> and why we propose these interfaces.
>>
>>
>> ## Defining a pixel's luminance
>>
>> Currently the luminance space of pixels in a framebuffer/plane
>> presented to the display is not well defined. It's usually assumed to
>> be in a 2.2 or 2.4 gamma space and has no mapping to an absolute
>> luminance value but is interpreted in relative terms.
>>
>> Luminance can be measured and described in absolute terms as candela
>> per meter squared, or cd/m2, or nits. Even though a pixel value can
>> be mapped to luminance in a linear fashion to do so without losing a
>> lot of detail requires 16-bpc color depth. The reason for this is
>> that human perception can distinguish roughly between a 0.5-1%
>> luminance delta. A linear representation is suboptimal, wasting
>> precision in the h

Re: [PATCH V2 1/2] dt-bindings: drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 bindings

2021-04-28 Thread Frieder Schrempf

On 22.04.21 00:31, Marek Vasut wrote:

Add DT binding document for TI SN65DSI83 and SN65DSI84 DSI to LVDS bridge.

Signed-off-by: Marek Vasut 
Cc: Douglas Anderson 
Cc: Jagan Teki 
Cc: Laurent Pinchart 
Cc: Linus Walleij 
Cc: Rob Herring 
Cc: Sam Ravnborg 
Cc: Stephen Boyd 
Cc: devicet...@vger.kernel.org
To: dri-devel@lists.freedesktop.org
---
V2: Add compatible string for SN65DSI84, since this is now tested on it
---
  .../bindings/display/bridge/ti,sn65dsi83.yaml | 134 ++
  1 file changed, 134 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml

diff --git a/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml b/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml
new file mode 100644
index ..42d11b46a1eb
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/bridge/ti,sn65dsi83.yaml
@@ -0,0 +1,134 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/bridge/ti,sn65dsi83.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: SN65DSI83 and SN65DSI84 DSI to LVDS bridge chip
+
+maintainers:
+  - Marek Vasut 
+
+description: |
+  Texas Instruments SN65DSI83 1x Single-link MIPI DSI
+  to 1x Single-link LVDS
+  https://www.ti.com/lit/gpn/sn65dsi83
+  Texas Instruments SN65DSI84 1x Single-link MIPI DSI
+  to 1x Dual-link or 2x Single-link LVDS
+  https://www.ti.com/lit/gpn/sn65dsi84
+
+properties:
+  compatible:
+    oneOf:
+      - const: ti,sn65dsi83
+      - const: ti,sn65dsi84
+
+  reg:
+    const: 0x2d


There is a strapping pin to select the last bit of the address, so apart 
from 0x2d also 0x2c is valid here.



+
+  enable-gpios:
+    maxItems: 1
+    description: GPIO specifier for bridge_en pin (active high).
+
+  ports:
+    type: object
+    additionalProperties: false
+
+    properties:
+      "#address-cells":
+        const: 1
+
+      "#size-cells":
+        const: 0
+
+      port@0:
+        type: object
+        additionalProperties: false
+
+        description:
+          Video port for MIPI DSI input
+
+        properties:
+          reg:
+            const: 0
+
+          endpoint:
+            type: object
+            additionalProperties: false
+            properties:
+              remote-endpoint: true
+              data-lanes:
+                description: array of physical DSI data lane indexes.
+
+        required:
+          - reg
+
+      port@1:
+        type: object
+        additionalProperties: false
+
+        description:
+          Video port for LVDS output (panel or bridge).
+
+        properties:
+          reg:
+            const: 1
+
+          endpoint:
+            type: object
+            additionalProperties: false
+            properties:
+              remote-endpoint: true
+
+        required:
+          - reg
+
+    required:
+      - "#address-cells"
+      - "#size-cells"
+      - port@0
+      - port@1
+
+required:
+  - compatible
+  - reg
+  - enable-gpios
+  - ports
+
+additionalProperties: false
+
+examples:
+  - |
+    #include <dt-bindings/gpio/gpio.h>
+
+    i2c {
+      #address-cells = <1>;
+      #size-cells = <0>;
+
+      bridge@2d {
+        compatible = "ti,sn65dsi83";
+        reg = <0x2d>;
+
+        enable-gpios = <&gpio2 1 GPIO_ACTIVE_HIGH>;
+
+        ports {
+          #address-cells = <1>;
+          #size-cells = <0>;
+
+          port@0 {
+            reg = <0>;
+            endpoint {
+              remote-endpoint = <&dsi0_out>;
+              data-lanes = <1 2 3 4>;
+            };
+          };
+
+          port@1 {
+            reg = <1>;
+            endpoint {
+              remote-endpoint = <&panel_in_lvds>;
+            };
+          };
+        };
+      };
+    };




Re: [PATCH V2 2/2] drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver

2021-04-28 Thread Frieder Schrempf

On 28.04.21 09:51, Frieder Schrempf wrote:

On 22.04.21 00:31, Marek Vasut wrote:

Add driver for TI SN65DSI83 Single-link DSI to Single-link LVDS bridge
and TI SN65DSI84 Single-link DSI to Dual-link or 2x Single-link LVDS
bridge. TI SN65DSI85 is unsupported due to lack of hardware to test on,
but easy to add.

The driver operates the chip via I2C bus. Currently the LVDS clock are
always derived from DSI clock lane, which is the usual mode of operation.
Support for clock from external oscillator is not implemented, but it is
easy to add if ever needed. Only RGB888 pixel format is implemented, the
LVDS666 is not supported, but could be added if needed.

Signed-off-by: Marek Vasut 
Cc: Douglas Anderson 
Cc: Jagan Teki 
Cc: Laurent Pinchart 
Cc: Linus Walleij 
Cc: Philippe Schenker 
Cc: Sam Ravnborg 
Cc: Stephen Boyd 
Cc: Valentin Raevsky 
To: dri-devel@lists.freedesktop.org
Tested-by: Loic Poulain 
---
V2: - Use dev_err_probe()
 - Set REG_RC_RESET as volatile
 - Wait for PLL stabilization by polling REG_RC_LVDS_PLL
 - Use ctx->mode = *adj instead of *mode in sn65dsi83_mode_set
 - Add tested DSI84 support in dual-link mode
 - Correctly set VCOM
 - Fill in missing DSI CHB and LVDS CHB bits from DSI84 and DSI85
   datasheets, with that all the reserved bits make far more sense
   as the DSI83 and DSI84 seems to be reduced version of DSI85
---
  drivers/gpu/drm/bridge/Kconfig    |  10 +
  drivers/gpu/drm/bridge/Makefile   |   1 +
  drivers/gpu/drm/bridge/ti-sn65dsi83.c | 617 ++
  3 files changed, 628 insertions(+)
  create mode 100644 drivers/gpu/drm/bridge/ti-sn65dsi83.c


[...]

+static int sn65dsi83_probe(struct i2c_client *client,
+   const struct i2c_device_id *id)
+{
+    struct device *dev = &client->dev;
+    enum sn65dsi83_model model;
+    struct sn65dsi83 *ctx;
+    int ret;
+
+    ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
+    if (!ctx)
+    return -ENOMEM;
+
+    ctx->dev = dev;
+
+    if (dev->of_node)
+    model = (enum sn65dsi83_model)of_device_get_match_data(dev);
+    else
+    model = id->driver_data;
+
+    /* Default to dual-link LVDS on all but DSI83. */
+    if (model != MODEL_SN65DSI83)
+    ctx->lvds_dual_link = true;


What if I use the DSI84 with a single link LVDS? I can't see any way to 
configure that right now.


I just saw the note in the header of the driver that says that single 
link mode is unsupported for the DSI84.


I have hardware with a single link display and if I set 
ctx->lvds_dual_link = false it works just fine.


How is this supposed to be selected? Does it need an extra devicetree 
property? And would you mind adding single-link support in the next 
version or do you prefer adding it in a follow-up patch?





+
+    ctx->enable_gpio = devm_gpiod_get(ctx->dev, "enable", GPIOD_OUT_LOW);
+    if (IS_ERR(ctx->enable_gpio))
+    return PTR_ERR(ctx->enable_gpio);
+
+    ret = sn65dsi83_parse_dt(ctx);
+    if (ret)
+    return ret;
+
+    ctx->regmap = devm_regmap_init_i2c(client, &sn65dsi83_regmap_config);
+    if (IS_ERR(ctx->regmap))
+    return PTR_ERR(ctx->regmap);
+
+    dev_set_drvdata(dev, ctx);
+    i2c_set_clientdata(client, ctx);
+
+    ctx->bridge.funcs = &sn65dsi83_funcs;
+    ctx->bridge.of_node = dev->of_node;
+    drm_bridge_add(&ctx->bridge);
+
+    return 0;
+}



Re: [PATCH 1/1] i915/query: Correlate engine and cpu timestamps with better accuracy

2021-04-28 Thread Jani Nikula
On Tue, 27 Apr 2021, Umesh Nerlige Ramappa  
wrote:
> Perf measurements rely on CPU and engine timestamps to correlate
> events of interest across these time domains. Current mechanisms get
> these timestamps separately and the calculated delta between these
> timestamps lack enough accuracy.
>
> To improve the accuracy of these time measurements to within a few us,
> add a query that returns the engine and cpu timestamps captured as
> close to each other as possible.

Cc: dri-devel, Jason and Daniel for review.

>
> v2: (Tvrtko)
> - document clock reference used
> - return cpu timestamp always
> - capture cpu time just before lower dword of cs timestamp
>
> v3: (Chris)
> - use uncore-rpm
> - use __query_cs_timestamp helper
>
> v4: (Lionel)
> - Kernel perf subsytem allows users to specify the clock id to be used
>   in perf_event_open. This clock id is used by the perf subsystem to
>   return the appropriate cpu timestamp in perf events. Similarly, let
>   the user pass the clockid to this query so that cpu timestamp
>   corresponds to the clock id requested.
>
> v5: (Tvrtko)
> - Use normal ktime accessors instead of fast versions
> - Add more uApi documentation
>
> v6: (Lionel)
> - Move switch out of spinlock
>
> v7: (Chris)
> - cs_timestamp is a misnomer, use cs_cycles instead
> - return the cs cycle frequency as well in the query
>
> v8:
> - Add platform and engine specific checks
>
> v9: (Lionel)
> - Return 2 cpu timestamps in the query - captured before and after the
>   register read
>
> v10: (Chris)
> - Use local_clock() to measure time taken to read lower dword of
>   register and return it to user.
>
> v11: (Jani)
> - IS_GEN deprecated. User GRAPHICS_VER instead.
>
> Signed-off-by: Umesh Nerlige Ramappa 
> ---
>  drivers/gpu/drm/i915/i915_query.c | 145 ++
>  include/uapi/drm/i915_drm.h   |  48 ++
>  2 files changed, 193 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c
> index fed337ad7b68..2594b93901ac 100644
> --- a/drivers/gpu/drm/i915/i915_query.c
> +++ b/drivers/gpu/drm/i915/i915_query.c
> @@ -6,6 +6,8 @@
>  
>  #include 
>  
> +#include "gt/intel_engine_pm.h"
> +#include "gt/intel_engine_user.h"
>  #include "i915_drv.h"
>  #include "i915_perf.h"
>  #include "i915_query.h"
> @@ -90,6 +92,148 @@ static int query_topology_info(struct drm_i915_private 
> *dev_priv,
>   return total_length;
>  }
>  
> +typedef u64 (*__ktime_func_t)(void);
> +static __ktime_func_t __clock_id_to_func(clockid_t clk_id)
> +{
> + /*
> +  * Use logic same as the perf subsystem to allow user to select the
> +  * reference clock id to be used for timestamps.
> +  */
> + switch (clk_id) {
> + case CLOCK_MONOTONIC:
> + return &ktime_get_ns;
> + case CLOCK_MONOTONIC_RAW:
> + return &ktime_get_raw_ns;
> + case CLOCK_REALTIME:
> + return &ktime_get_real_ns;
> + case CLOCK_BOOTTIME:
> + return &ktime_get_boottime_ns;
> + case CLOCK_TAI:
> + return &ktime_get_clocktai_ns;
> + default:
> + return NULL;
> + }
> +}
> +
> +static inline int
> +__read_timestamps(struct intel_uncore *uncore,
> +   i915_reg_t lower_reg,
> +   i915_reg_t upper_reg,
> +   u64 *cs_ts,
> +   u64 *cpu_ts,
> +   __ktime_func_t cpu_clock)
> +{
> + u32 upper, lower, old_upper, loop = 0;
> +
> + upper = intel_uncore_read_fw(uncore, upper_reg);
> + do {
> + cpu_ts[1] = local_clock();
> + cpu_ts[0] = cpu_clock();
> + lower = intel_uncore_read_fw(uncore, lower_reg);
> + cpu_ts[1] = local_clock() - cpu_ts[1];
> + old_upper = upper;
> + upper = intel_uncore_read_fw(uncore, upper_reg);
> + } while (upper != old_upper && loop++ < 2);
> +
> + *cs_ts = (u64)upper << 32 | lower;
> +
> + return 0;
> +}
> +
> +static int
> +__query_cs_cycles(struct intel_engine_cs *engine,
> +   u64 *cs_ts, u64 *cpu_ts,
> +   __ktime_func_t cpu_clock)
> +{
> + struct intel_uncore *uncore = engine->uncore;
> + enum forcewake_domains fw_domains;
> + u32 base = engine->mmio_base;
> + intel_wakeref_t wakeref;
> + int ret;
> +
> + fw_domains = intel_uncore_forcewake_for_reg(uncore,
> + RING_TIMESTAMP(base),
> + FW_REG_READ);
> +
> + with_intel_runtime_pm(uncore->rpm, wakeref) {
> + spin_lock_irq(&uncore->lock);
> + intel_uncore_forcewake_get__locked(uncore, fw_domains);
> +
> + ret = __read_timestamps(uncore,
> + RING_TIMESTAMP(base),
> + RING_TIMESTAMP_UDW(base),
> + cs_ts,
> + cpu_ts,
> +   

Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Christian König

On 28.04.21 at 09:49, Felix Kuehling wrote:

On 2021-04-28 at 3:04 a.m., Christian König wrote:

On 28.04.21 at 07:33, Felix Kuehling wrote:

SG BOs do not occupy space that is managed by TTM. So do not evict them.

This fixes unexpected evictions of KFD's userptr BOs. KFD only expects
userptr "evictions" in the form of MMU notifiers.

NAK, SG BOs also account for the memory the GPU can currently access.

We can ignore them for the allocated memory, but not for the GTT domain.

Hmm, the only reason I found this problem is, that I am now testing with
IOMMU enabled. Evicting the userptr BO destroys the DMA mapping. Without
IOMMU-enforced device isolation I was blissfully unaware that the
userptr BOs were being evicted. The GPUVM mappings were unaffected and
just worked without problems. Having to evict these BOs is crippling
KFD's ability to map system memory for GPU access, once again.

I think this affects not only userptr BOs but also DMABuf imports for
BOs shared between multiple GPUs.


Correct, yes.


The GTT size limitation is entirely artificial. And the only reason I
know of for keeping it limited to the VRAM size is to work around some
OOM issues with GTT BOs. Applying this to userptrs and DMABuf imports
makes no sense. But I understand that the way TTM manages the GTT domain
there is no easy fix for this. Maybe we'd have to create a new domain
for validating SG BOs that's separate from GTT, so that TTM would not
try to allocate GTT space for them.


Well, that contradicts what the GTT domain is all about.

It should limit the amount of system memory the GPU can access at the
same time. This includes imported DMA-bufs as well as userptrs.


That the GPUVM mappings are still there is certainly a bug we should 
look into, but in general if we don't want that limitation we need to 
increase the GTT size and not work around it.


But increasing the GTT size in turn has a huge negative impact on OOM
situations, up to the point that the OOM killer can't work any more.



Failing that, I'd probably have to abandon userptr BOs altogether and
switch system memory mappings over to using the new SVM API on systems
where it is avaliable.


Well as long as that provides the necessary functionality through HMM it 
would be an option.


Regards,
Christian.



Regards,
   Felix



Christian.


Signed-off-by: Felix Kuehling 
---
   drivers/gpu/drm/ttm/ttm_bo.c | 4 
   1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index de1ec838cf8b..0b953654fdbf 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -655,6 +655,10 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
   list_for_each_entry(bo, &man->lru[i], lru) {
   bool busy;
   +    /* Don't evict SG BOs */
+    if (bo->ttm && bo->ttm->sg)
+    continue;
+
   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
   &busy)) {
   if (busy && !busy_bo && ticket !=




Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Michel Dänzer
On 2021-04-28 8:59 a.m., Christian König wrote:
> Hi Dave,
> 
> On 27.04.21 at 21:23, Marek Olšák wrote:
>> Supporting interop with any device is always possible. It depends on which 
>> drivers we need to interoperate with and update them. We've already found 
>> the path forward for amdgpu. We just need to find out how many other drivers 
>> need to be updated and evaluate the cost/benefit aspect.
>>
>> Marek
>>
>> On Tue, Apr 27, 2021 at 2:38 PM Dave Airlie > > wrote:
>>
>> On Tue, 27 Apr 2021 at 22:06, Christian König
>> > > wrote:
>> >
>> > Correct, we wouldn't have synchronization between device with and 
>> without user queues any more.
>> >
>> > That could only be a problem for A+I Laptops.
>>
>> Since I think you mentioned you'd only be enabling this on newer
>> chipsets, won't it be a problem for A+A where one A is a generation
>> behind the other?
>>
> 
> Crap, that is a good point as well.
> 
>>
>> I'm not really liking where this is going btw, seems like a ill
>> thought out concept, if AMD is really going down the road of designing
>> hw that is currently Linux incompatible, you are going to have to
>> accept a big part of the burden in bringing this support in to more
>> than just amd drivers for upcoming generations of gpu.
>>
> 
> Well we don't really like that either, but we have no other option as far as 
> I can see.

I don't really understand what "future hw may remove support for kernel queues" 
means exactly. While the per-context queues can be mapped to userspace 
directly, they don't *have* to be, do they? I.e. the kernel driver should be 
able to either intercept userspace access to the queues, or in the worst case 
do it all itself, and provide the existing synchronization semantics as needed?

Surely there are resource limits for the per-context queues, so the kernel 
driver needs to do some kind of virtualization / multi-plexing anyway, or we'll 
get sad user faces when there's no queue available for .

I'm probably missing something though, awaiting enlightenment. :)


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [PATCH V2 2/2] drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver

2021-04-28 Thread Loic Poulain
On Wed, 28 Apr 2021 at 10:13, Frieder Schrempf
 wrote:
>
> On 28.04.21 09:51, Frieder Schrempf wrote:
> > On 22.04.21 00:31, Marek Vasut wrote:
> >> Add driver for TI SN65DSI83 Single-link DSI to Single-link LVDS bridge
> >> and TI SN65DSI84 Single-link DSI to Dual-link or 2x Single-link LVDS
> >> bridge. TI SN65DSI85 is unsupported due to lack of hardware to test on,
> >> but easy to add.
> >>
> >> The driver operates the chip via I2C bus. Currently the LVDS clock are
> >> always derived from DSI clock lane, which is the usual mode of operation.
> >> Support for clock from external oscillator is not implemented, but it is
> >> easy to add if ever needed. Only RGB888 pixel format is implemented, the
> >> LVDS666 is not supported, but could be added if needed.
> >>
> >> Signed-off-by: Marek Vasut 
> >> Cc: Douglas Anderson 
> >> Cc: Jagan Teki 
> >> Cc: Laurent Pinchart 
> >> Cc: Linus Walleij 
> >> Cc: Philippe Schenker 
> >> Cc: Sam Ravnborg 
> >> Cc: Stephen Boyd 
> >> Cc: Valentin Raevsky 
> >> To: dri-devel@lists.freedesktop.org
> >> Tested-by: Loic Poulain 
> >> ---
> >> V2: - Use dev_err_probe()
> >>  - Set REG_RC_RESET as volatile
> >>  - Wait for PLL stabilization by polling REG_RC_LVDS_PLL
> >>  - Use ctx->mode = *adj instead of *mode in sn65dsi83_mode_set
> >>  - Add tested DSI84 support in dual-link mode
> >>  - Correctly set VCOM
> >>  - Fill in missing DSI CHB and LVDS CHB bits from DSI84 and DSI85
> >>datasheets, with that all the reserved bits make far more sense
> >>as the DSI83 and DSI84 seems to be reduced version of DSI85
> >> ---
> >>   drivers/gpu/drm/bridge/Kconfig|  10 +
> >>   drivers/gpu/drm/bridge/Makefile   |   1 +
> >>   drivers/gpu/drm/bridge/ti-sn65dsi83.c | 617 ++
> >>   3 files changed, 628 insertions(+)
> >>   create mode 100644 drivers/gpu/drm/bridge/ti-sn65dsi83.c
> >>
> > [...]
> >> +static int sn65dsi83_probe(struct i2c_client *client,
> >> +   const struct i2c_device_id *id)
> >> +{
> >> +struct device *dev = &client->dev;
> >> +enum sn65dsi83_model model;
> >> +struct sn65dsi83 *ctx;
> >> +int ret;
> >> +
> >> +ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
> >> +if (!ctx)
> >> +return -ENOMEM;
> >> +
> >> +ctx->dev = dev;
> >> +
> >> +if (dev->of_node)
> >> +model = (enum sn65dsi83_model)of_device_get_match_data(dev);
> >> +else
> >> +model = id->driver_data;
> >> +
> >> +/* Default to dual-link LVDS on all but DSI83. */
> >> +if (model != MODEL_SN65DSI83)
> >> +ctx->lvds_dual_link = true;
> >
> > What if I use the DSI84 with a single link LVDS? I can't see any way to
> > configure that right now.

I assume the simplest way would be to use the "ti,sn65dsi83"
compatible string in your dts, since the way you wired it is
'compatible' with sn65dsi83, right?

>
> I just saw the note in the header of the driver that says that single
> link mode is unsupported for the DSI84.
>
> I have hardware with a single link display and if I set
> ctx->lvds_dual_link = false it works just fine.
>
> How is this supposed to be selected? Does it need an extra devicetree
> property? And would you mind adding single-link support in the next
> version or do you prefer adding it in a follow-up patch?

If this has to be supported I think the proper way would be to support
two output ports in the dts (e.g. lvds0_out, lvds1_out), in the same
way as supported by the 'advantech,idk-2121wr' panel.
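
A rough sketch of how the driver could then derive single vs. dual link
from the graph, assuming a hypothetical port@2 for the second LVDS channel
(not part of the current binding):

#include <linux/of.h>
#include <linux/of_graph.h>

/* Sketch only: dual-link if the DT describes a second LVDS output port. */
static void sn65dsi83_detect_dual_link(struct sn65dsi83 *ctx)
{
	struct device_node *port2;

	ctx->lvds_dual_link = false;

	port2 = of_graph_get_port_by_id(ctx->dev->of_node, 2);
	if (port2) {
		ctx->lvds_dual_link = true;
		of_node_put(port2);
	}
}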

Regards,
Loic


Re: [PATCH V2 2/2] drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver

2021-04-28 Thread Neil Armstrong
On 28/04/2021 11:26, Loic Poulain wrote:
> On Wed, 28 Apr 2021 at 10:13, Frieder Schrempf
>  wrote:
>>
>> On 28.04.21 09:51, Frieder Schrempf wrote:
>>> On 22.04.21 00:31, Marek Vasut wrote:
 Add driver for TI SN65DSI83 Single-link DSI to Single-link LVDS bridge
 and TI SN65DSI84 Single-link DSI to Dual-link or 2x Single-link LVDS
 bridge. TI SN65DSI85 is unsupported due to lack of hardware to test on,
 but easy to add.

 The driver operates the chip via I2C bus. Currently the LVDS clock are
 always derived from DSI clock lane, which is the usual mode of operation.
 Support for clock from external oscillator is not implemented, but it is
 easy to add if ever needed. Only RGB888 pixel format is implemented, the
 LVDS666 is not supported, but could be added if needed.

 Signed-off-by: Marek Vasut 
 Cc: Douglas Anderson 
 Cc: Jagan Teki 
 Cc: Laurent Pinchart 
 Cc: Linus Walleij 
 Cc: Philippe Schenker 
 Cc: Sam Ravnborg 
 Cc: Stephen Boyd 
 Cc: Valentin Raevsky 
 To: dri-devel@lists.freedesktop.org
 Tested-by: Loic Poulain 
 ---
 V2: - Use dev_err_probe()
  - Set REG_RC_RESET as volatile
  - Wait for PLL stabilization by polling REG_RC_LVDS_PLL
  - Use ctx->mode = *adj instead of *mode in sn65dsi83_mode_set
  - Add tested DSI84 support in dual-link mode
  - Correctly set VCOM
  - Fill in missing DSI CHB and LVDS CHB bits from DSI84 and DSI85
datasheets, with that all the reserved bits make far more sense
as the DSI83 and DSI84 seems to be reduced version of DSI85
 ---
   drivers/gpu/drm/bridge/Kconfig|  10 +
   drivers/gpu/drm/bridge/Makefile   |   1 +
   drivers/gpu/drm/bridge/ti-sn65dsi83.c | 617 ++
   3 files changed, 628 insertions(+)
   create mode 100644 drivers/gpu/drm/bridge/ti-sn65dsi83.c

>>> [...]
 +static int sn65dsi83_probe(struct i2c_client *client,
 +   const struct i2c_device_id *id)
 +{
 +struct device *dev = &client->dev;
 +enum sn65dsi83_model model;
 +struct sn65dsi83 *ctx;
 +int ret;
 +
 +ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
 +if (!ctx)
 +return -ENOMEM;
 +
 +ctx->dev = dev;
 +
 +if (dev->of_node)
 +model = (enum sn65dsi83_model)of_device_get_match_data(dev);
 +else
 +model = id->driver_data;
 +
 +/* Default to dual-link LVDS on all but DSI83. */
 +if (model != MODEL_SN65DSI83)
 +ctx->lvds_dual_link = true;
>>>
>>> What if I use the DSI84 with a single link LVDS? I can't see any way to
>>> configure that right now.
> 
> I assume the simplest way would be to use the "ti,sn65dsi83"
> compatible string in your dts, since the way you wired it is
> 'compatible' with sn65dsi83, right?

No, this isn't the right way to do it. If sn65dsi84 is supported and the
bindings only support a single LVDS link, the driver must only support single
link on sn65dsi84, or the dual-link LVDS must be added to the bindings only
for sn65dsi84.

> 
>>
>> I just saw the note in the header of the driver that says that single
>> link mode is unsupported for the DSI84.
>>
>> I have hardware with a single link display and if I set
>> ctx->lvds_dual_link = false it works just fine.
>>
>> How is this supposed to be selected? Does it need an extra devicetree
>> property? And would you mind adding single-link support in the next
>> version or do you prefer adding it in a follow-up patch?
> 
> If this has to be supported I think the proper way would be to support
> two output ports in the dts (e.g. lvds0_out, lvds1_out), in the same
> way as supported by the 'advantech,idk-2121wr' panel.

Yes, this is why I asked to have the dual-link lvds in the bindings.

Neil

> 
> Regards,
> Loic


[PATCHv2] drm/omap: Fix issue with clocks left on after resume

2021-04-28 Thread Tony Lindgren
On resume, dispc pm_runtime_force_resume() is not enabling the hardware
because we pass the pm_runtime_need_not_resume() test, as the device is
suspended with no child devices.

As the resume continues, omap_atomic_commit_tail() calls dispc_runtime_get(),
which calls rpm_resume(), enabling the hardware and increasing the child_count
of its parent device.

But at this point device_complete() has not yet been called for dispc. So
when omap_atomic_commit_tail() calls dispc_runtime_put(), it won't idle
the hardware as rpm_suspend() returns -EBUSY, and the clocks are left on
after resume. The parent child count is not decremented, as the -EBUSY
cannot be easily handled until later on, after device_complete().

This can be easily seen for example after suspending Beagleboard-X15 with
no displays connected, and by reading the CM_DSS_DSS_CLKCTRL register at
0x4a009120 after resume. After a suspend and resume cycle, it shows a
value of 0x00040102 instead of 0x0007 like it should.

Let's fix the issue by calling dispc_runtime_suspend() and
dispc_runtime_resume() directly from dispc_suspend() and dispc_resume().
This leaves out the PM runtime related issues for system suspend.

We could handle the issue by adding more calls to dispc_runtime_get()
and dispc_runtime_put() from omap_drm_suspend() and omap_drm_resume()
as suggested by Tomi Valkeinen .
But that would just add more inter-component calls and more dependencies
to PM runtime for system suspend, and does not make things easier in the
long run.

See also earlier commit 88d26136a256 ("PM: Prevent runtime suspend during
system resume") and commit ca8199f13498 ("drm/msm/dpu: ensure device
suspend happens during PM sleep") for more information.

Fixes: ecfdedd7da5d ("drm/omap: force runtime PM suspend on system suspend")
Signed-off-by: Tony Lindgren 
---

Changes since v1:
- Updated the description for a typo noticed by Tomi
- Added more info about what all goes wrong

---
 drivers/gpu/drm/omapdrm/dss/dispc.c | 27 ++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c b/drivers/gpu/drm/omapdrm/dss/dispc.c
--- a/drivers/gpu/drm/omapdrm/dss/dispc.c
+++ b/drivers/gpu/drm/omapdrm/dss/dispc.c
@@ -182,6 +182,7 @@ struct dispc_device {
const struct dispc_features *feat;
 
bool is_enabled;
+   bool needs_resume;
 
struct regmap *syscon_pol;
u32 syscon_pol_offset;
@@ -4887,10 +4888,34 @@ static int dispc_runtime_resume(struct device *dev)
return 0;
 }
 
+static int dispc_suspend(struct device *dev)
+{
+   struct dispc_device *dispc = dev_get_drvdata(dev);
+
+   if (!dispc->is_enabled)
+   return 0;
+
+   dispc->needs_resume = true;
+
+   return dispc_runtime_suspend(dev);
+}
+
+static int dispc_resume(struct device *dev)
+{
+   struct dispc_device *dispc = dev_get_drvdata(dev);
+
+   if (!dispc->needs_resume)
+   return 0;
+
+   dispc->needs_resume = false;
+
+   return dispc_runtime_resume(dev);
+}
+
 static const struct dev_pm_ops dispc_pm_ops = {
.runtime_suspend = dispc_runtime_suspend,
.runtime_resume = dispc_runtime_resume,
-   SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, pm_runtime_force_resume)
+   SET_LATE_SYSTEM_SLEEP_PM_OPS(dispc_suspend, dispc_resume)
 };
 
 struct platform_driver omap_dispchw_driver = {
-- 
2.31.1


Re: [PATCH] drm/omap: Fix issue with clocks left on after resume

2021-04-28 Thread Tony Lindgren
* Tony Lindgren  [210427 10:54]:
> * Tony Lindgren  [210427 10:12]:
> > * Tomi Valkeinen  [210427 08:47]:
> > > If I understand right, this is only an issue when the dss was not enabled
> > > before the system suspend? And as the dispc is not enabled at suspend,
> > > pm_runtime_force_suspend and pm_runtime_force_resume don't really do
> > > anything. At resume, the DRM resume functionality causes omapdrm to call
> > > pm_runtime_get and put, and this somehow causes the dss to stay enabled.
> > 
> > We do have dss enabled at system suspend from omap_atomic_comit_tail()
> > until pm_runtime_force_suspend(). Then we have pm_runtime_force_resume()
> > enable it.
> 
> Sorry I already forgot that pm_runtime_force_resume() is not enabling
> it because pm_runtime_need_not_resume().. It's the omapdrm calling
> pm_runtime_get() that enables the hardware on resume.
> 
> > Then on resume PM runtime prevents disable of the hardware on resume path
> > until after device_complete(). Until then we have rpm_suspend() return
> > -EBUSY, and so the parent child_count is not going to get decreased.
> > Something would have to handle the -EBUSY error here it seems.

I sent out v2 patch with an updated description.

Regards,

Tony


Re: [PATCH V2 2/2] drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver

2021-04-28 Thread Jagan Teki
On Wed, Apr 28, 2021 at 2:54 PM Neil Armstrong  wrote:
>
> On 28/04/2021 11:26, Loic Poulain wrote:
> > On Wed, 28 Apr 2021 at 10:13, Frieder Schrempf
> >  wrote:
> >>
> >> On 28.04.21 09:51, Frieder Schrempf wrote:
> >>> On 22.04.21 00:31, Marek Vasut wrote:
>  Add driver for TI SN65DSI83 Single-link DSI to Single-link LVDS bridge
>  and TI SN65DSI84 Single-link DSI to Dual-link or 2x Single-link LVDS
>  bridge. TI SN65DSI85 is unsupported due to lack of hardware to test on,
>  but easy to add.
> 
>  The driver operates the chip via I2C bus. Currently the LVDS clock are
>  always derived from DSI clock lane, which is the usual mode of operation.
>  Support for clock from external oscillator is not implemented, but it is
>  easy to add if ever needed. Only RGB888 pixel format is implemented, the
>  LVDS666 is not supported, but could be added if needed.
> 
>  Signed-off-by: Marek Vasut 
>  Cc: Douglas Anderson 
>  Cc: Jagan Teki 
>  Cc: Laurent Pinchart 
>  Cc: Linus Walleij 
>  Cc: Philippe Schenker 
>  Cc: Sam Ravnborg 
>  Cc: Stephen Boyd 
>  Cc: Valentin Raevsky 
>  To: dri-devel@lists.freedesktop.org
>  Tested-by: Loic Poulain 
>  ---
>  V2: - Use dev_err_probe()
>   - Set REG_RC_RESET as volatile
>   - Wait for PLL stabilization by polling REG_RC_LVDS_PLL
>   - Use ctx->mode = *adj instead of *mode in sn65dsi83_mode_set
>   - Add tested DSI84 support in dual-link mode
>   - Correctly set VCOM
>   - Fill in missing DSI CHB and LVDS CHB bits from DSI84 and DSI85
> datasheets, with that all the reserved bits make far more sense
> as the DSI83 and DSI84 seems to be reduced version of DSI85
>  ---
>    drivers/gpu/drm/bridge/Kconfig|  10 +
>    drivers/gpu/drm/bridge/Makefile   |   1 +
>    drivers/gpu/drm/bridge/ti-sn65dsi83.c | 617 ++
>    3 files changed, 628 insertions(+)
>    create mode 100644 drivers/gpu/drm/bridge/ti-sn65dsi83.c
> 
> >>> [...]
>  +static int sn65dsi83_probe(struct i2c_client *client,
>  +   const struct i2c_device_id *id)
>  +{
>  +struct device *dev = &client->dev;
>  +enum sn65dsi83_model model;
>  +struct sn65dsi83 *ctx;
>  +int ret;
>  +
>  +ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
>  +if (!ctx)
>  +return -ENOMEM;
>  +
>  +ctx->dev = dev;
>  +
>  +if (dev->of_node)
>  +model = (enum sn65dsi83_model)of_device_get_match_data(dev);
>  +else
>  +model = id->driver_data;
>  +
>  +/* Default to dual-link LVDS on all but DSI83. */
>  +if (model != MODEL_SN65DSI83)
>  +ctx->lvds_dual_link = true;
> >>>
> >>> What if I use the DSI84 with a single link LVDS? I can't see any way to
> >>> configure that right now.
> >
> > I assume the simplest way would be to use the "ti,sn65dsi83"
> > compatible string in your dts, since the way you wired it is
> > 'compatible' with sn65dsi83, right?
>
> No this isn't the right way to to, if sn65dsi84 is supported and the bindings 
> only support single lvds link,
> the driver must only support single link on sn65dsi84, or add the dual link 
> lvds in the bindings only for sn65dsi84.
>
> >
> >>
> >> I just saw the note in the header of the driver that says that single
> >> link mode is unsupported for the DSI84.
> >>
> >> I have hardware with a single link display and if I set
> >> ctx->lvds_dual_link = false it works just fine.
> >>
> >> How is this supposed to be selected? Does it need an extra devicetree
> >> property? And would you mind adding single-link support in the next
> >> version or do you prefer adding it in a follow-up patch?
> >
> > If this has to be supported I think the proper way would be to support
> > two output ports in the dts (e.g. lvds0_out, lvds1_out), in the same
> > way as supported by the 'advantech,idk-2121wr' panel.
>
> Yes, this is why I asked to have the dual-link lvds in the bindings.

Agreed with Neil, this is what we discussed on my v3. Each of these 3
chips has its own compatible and supporting dual-link lvds and
dual-link dsi as to be done by 84/85 and 85 respectively.

Maybe I can push my configuration changes in gist if required?

Jagan.


Re: [PATCH v5 05/16] swiotlb: Add restricted DMA pool initialization

2021-04-28 Thread Steven Price

On 26/04/2021 17:37, Claire Chang wrote:

On Fri, Apr 23, 2021 at 7:34 PM Steven Price  wrote:

[...]


But even then if it's not and we have the situation where debugfs==NULL
then the debugfs_create_dir() here will cause a subsequent attempt in
swiotlb_create_debugfs() to fail (directory already exists) leading to
mem->debugfs being assigned an error value. I suspect the creation of
the debugfs directory needs to be separated from io_tlb_default_mem
being set.


debugfs creation should move into the if (!mem) {...} above to avoid
duplication.
I think having a separated struct dentry pointer for the default
debugfs should be enough?

if (!debugfs)
 debugfs = debugfs_create_dir("swiotlb", NULL);
swiotlb_create_debugfs(mem, rmem->name, debugfs);


Yes that looks like a good solution to me. Although I'd name the 
variable something a bit more descriptive than just "debugfs" e.g. 
"debugfs_dir" or "debugfs_root".


Thanks,

Steve


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Daniel Vetter
On Wed, Apr 28, 2021 at 08:59:47AM +0200, Christian König wrote:
> Hi Dave,
> 
> On 27.04.21 at 21:23, Marek Olšák wrote:
> > Supporting interop with any device is always possible. It depends on
> > which drivers we need to interoperate with and update them. We've
> > already found the path forward for amdgpu. We just need to find out how
> > many other drivers need to be updated and evaluate the cost/benefit
> > aspect.
> > 
> > Marek
> > 
> > On Tue, Apr 27, 2021 at 2:38 PM Dave Airlie  > > wrote:
> > 
> > On Tue, 27 Apr 2021 at 22:06, Christian König
> >  > > wrote:
> > >
> > > Correct, we wouldn't have synchronization between device with
> > and without user queues any more.
> > >
> > > That could only be a problem for A+I Laptops.
> > 
> > Since I think you mentioned you'd only be enabling this on newer
> > chipsets, won't it be a problem for A+A where one A is a generation
> > behind the other?
> > 
> 
> Crap, that is a good point as well.
> 
> > 
> > I'm not really liking where this is going btw, seems like a ill
> > thought out concept, if AMD is really going down the road of designing
> > hw that is currently Linux incompatible, you are going to have to
> > accept a big part of the burden in bringing this support in to more
> > than just amd drivers for upcoming generations of gpu.
> > 
> 
> Well we don't really like that either, but we have no other option as far as
> I can see.
> 
> I have a couple of ideas how to handle this in the kernel without
> dma_fences, but it always require more or less changes to all existing
> drivers.

Yeah one horrible idea is to essentially do the plan we hashed out for
adding userspace fences to drm_syncobj timelines. And then add drm_syncobj
as another implicit fencing thing to dma-buf.

But:
- This is horrible. We're all agreeing that implicit sync is not a great
  idea, building an entire new world on this flawed thing doesn't sound
  like a good path forward.

- It's kernel uapi, so it's going to be forever.

- It's only fixing the correctness issue, since you have to stall for
  future/indefinite fences at the beginning of the CS ioctl. Or at the
  beginning of the atomic modeset ioctl, which kinda defeats the point of
  nonblocking.

- You still have to touch all kmd drivers.

- For performance, you still have to glue a submit thread onto all gl
  drivers.

It is horrendous.
-Daniel

> 
> Christian.
> 
> > 
> > Dave.
> > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Daniel Vetter
On Wed, Apr 28, 2021 at 11:07:09AM +0200, Michel Dänzer wrote:
> On 2021-04-28 8:59 a.m., Christian König wrote:
> > Hi Dave,
> > 
> > On 27.04.21 at 21:23, Marek Olšák wrote:
> >> Supporting interop with any device is always possible. It depends on which 
> >> drivers we need to interoperate with and update them. We've already found 
> >> the path forward for amdgpu. We just need to find out how many other 
> >> drivers need to be updated and evaluate the cost/benefit aspect.
> >>
> >> Marek
> >>
> >> On Tue, Apr 27, 2021 at 2:38 PM Dave Airlie  >> > wrote:
> >>
> >> On Tue, 27 Apr 2021 at 22:06, Christian König
> >>  >> > wrote:
> >> >
> >> > Correct, we wouldn't have synchronization between device with and 
> >> without user queues any more.
> >> >
> >> > That could only be a problem for A+I Laptops.
> >>
> >> Since I think you mentioned you'd only be enabling this on newer
> >> chipsets, won't it be a problem for A+A where one A is a generation
> >> behind the other?
> >>
> > 
> > Crap, that is a good point as well.
> > 
> >>
> >> I'm not really liking where this is going btw, seems like a ill
> >> thought out concept, if AMD is really going down the road of designing
> >> hw that is currently Linux incompatible, you are going to have to
> >> accept a big part of the burden in bringing this support in to more
> >> than just amd drivers for upcoming generations of gpu.
> >>
> > 
> > Well we don't really like that either, but we have no other option as far 
> > as I can see.
> 
> I don't really understand what "future hw may remove support for kernel
> queues" means exactly. While the per-context queues can be mapped to
> userspace directly, they don't *have* to be, do they? I.e. the kernel
> driver should be able to either intercept userspace access to the
> queues, or in the worst case do it all itself, and provide the existing
> synchronization semantics as needed?
> 
> Surely there are resource limits for the per-context queues, so the
> kernel driver needs to do some kind of virtualization / multi-plexing
> anyway, or we'll get sad user faces when there's no queue available for
> .
> 
> I'm probably missing something though, awaiting enlightenment. :)

Yeah in all this discussion what's unclear to me is, is this a hard amdgpu
requirement going forward, in which case you need a time machine and lots
of people to retroactively fix this because this aint fast to get fixed.

Or is this just musings for an ecosystem that better fits current&future
hw, for which I think we all agree where the rough direction is?

The former is quite a glorious situation, and I'm with Dave here that if
your hw engineers really removed the bit to not map the ringbuffers to
userspace, then amd gets to eat a big chunk of the cost here.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Daniel Vetter
On Tue, Apr 27, 2021 at 06:27:27PM +, Simon Ser wrote:
> On Tuesday, April 27th, 2021 at 8:01 PM, Alex Deucher  
> wrote:
> 
> > It's an upcoming requirement for windows[1], so you are likely to
> > start seeing this across all GPU vendors that support windows. I
> > think the timing depends on how quickly the legacy hardware support
> > sticks around for each vendor.
> 
> Hm, okay.
> 
> Will using the existing explicit synchronization APIs make it work
> properly? (e.g. IN_FENCE_FD + OUT_FENCE_PTR in KMS, EGL_KHR_fence_sync +
> EGL_ANDROID_native_fence_sync + EGL_KHR_wait_sync in EGL)

If you have hw which really _only_ supports userspace direct submission
(i.e. the ringbuffer has to be in the same gpu vm as everything else by
design, and can't be protected at all with e.g. read-only pte entries)
then all that stuff would be broken.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Daniel Vetter
On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:
> On Tue, Apr 27, 2021 at 1:35 PM Simon Ser  wrote:
> >
> > On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach 
> >  wrote:
> >
> > > > Ok. So that would only make the following use cases broken for now:
> > > >
> > > > - amd render -> external gpu
> > > > - amd video encode -> network device
> > >
> > > FWIW, "only" breaking amd render -> external gpu will make us pretty
> > > unhappy
> >
> > I concur. I have quite a few users with a multi-GPU setup involving
> > AMD hardware.
> >
> > Note, if this brokenness can't be avoided, I'd prefer a to get a clear
> > error, and not bad results on screen because nothing is synchronized
> > anymore.
> 
> It's an upcoming requirement for windows[1], so you are likely to
> start seeing this across all GPU vendors that support windows.  I
> think the timing depends on how quickly the legacy hardware support
> sticks around for each vendor.

Yeah but hw scheduling doesn't mean the hw has to be constructed to not
support isolating the ringbuffer at all.

E.g. even if the hw loses the bit to put the ringbuffer outside of the
userspace gpu vm, if you have pagetables I'm seriously hoping you have r/o
pte flags. Otherwise the entire "share address space with cpu side,
seamlessly" thing is out of the window.

And with that r/o bit on the ringbuffer you can once more force submit
through kernel space, and all the legacy dma_fence based stuff keeps
working. And we don't have to invent some horrendous userspace fence based
implicit sync mechanism in the kernel, but can instead do this transition
properly with drm_syncobj timeline explicit sync and protocol reving.

At least I think you'd have to work extra hard to create a gpu which
cannot possibly be intercepted by the kernel, even when it's designed to
support userspace direct submit only.

Or are your hw engineers more creative here and we're screwed?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 08/21] drm/i915/gem: Disallow bonding of virtual engines

2021-04-28 Thread Daniel Vetter
On Tue, Apr 27, 2021 at 08:51:08AM -0500, Jason Ekstrand wrote:
> On Fri, Apr 23, 2021 at 5:31 PM Jason Ekstrand  wrote:
> >
> > This adds a bunch of complexity which the media driver has never
> > actually used.  The media driver does technically bond a balanced engine
> > to another engine but the balanced engine only has one engine in the
> > sibling set.  This doesn't actually result in a virtual engine.
> >
> > Unless some userspace badly wants it, there's no good reason to support
> > this case.  This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op.  We
> > leave the validation code in place in case we ever decide we want to do
> > something interesting with the bonding information.
> >
> > Signed-off-by: Jason Ekstrand 
> > ---
> >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  18 +-
> >  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|   2 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h  |   7 -
> >  .../drm/i915/gt/intel_execlists_submission.c  | 100 
> >  .../drm/i915/gt/intel_execlists_submission.h  |   4 -
> >  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 229 --
> >  6 files changed, 7 insertions(+), 353 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > index e8179918fa306..5f8d0faf783aa 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > @@ -1553,6 +1553,12 @@ set_engines__bond(struct i915_user_extension __user 
> > *base, void *data)
> > }
> > virtual = set->engines->engines[idx]->engine;
> >
> > +   if (intel_engine_is_virtual(virtual)) {
> > +   drm_dbg(&i915->drm,
> > +   "Bonding with virtual engines not allowed\n");
> > +   return -EINVAL;
> > +   }
> > +
> > err = check_user_mbz(&ext->flags);
> > if (err)
> > return err;
> > @@ -1593,18 +1599,6 @@ set_engines__bond(struct i915_user_extension __user 
> > *base, void *data)
> > n, ci.engine_class, ci.engine_instance);
> > return -EINVAL;
> > }
> > -
> > -   /*
> > -* A non-virtual engine has no siblings to choose between; 
> > and
> > -* a submit fence will always be directed to the one engine.
> > -*/
> > -   if (intel_engine_is_virtual(virtual)) {
> > -   err = intel_virtual_engine_attach_bond(virtual,
> > -  master,
> > -  bond);
> > -   if (err)
> > -   return err;
> > -   }
> > }
> >
> > return 0;
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> > b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index d640bba6ad9ab..efb2fa3522a42 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> > if (args->flags & I915_EXEC_FENCE_SUBMIT)
> > err = i915_request_await_execution(eb.request,
> >in_fence,
> > -  
> > eb.engine->bond_execute);
> > +  NULL);
> > else
> > err = i915_request_await_dma_fence(eb.request,
> >in_fence);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
> > b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 883bafc449024..68cfe5080325c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -446,13 +446,6 @@ struct intel_engine_cs {
> >  */
> > void(*submit_request)(struct i915_request *rq);
> >
> > -   /*
> > -* Called on signaling of a SUBMIT_FENCE, passing along the 
> > signaling
> > -* request down to the bonded pairs.
> > -*/
> > -   void(*bond_execute)(struct i915_request *rq,
> > -   struct dma_fence *signal);
> > -
> > /*
> >  * Call when the priority on a request has changed and it and its
> >  * dependencies may need rescheduling. Note the request itself may
> > diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> > b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > index de124870af44d..b6e2b59f133b7 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> > @@ -181,18 +181,6 @@ struct virtual_engine

Re: [Intel-gfx] [PATCH 09/21] drm/i915/gem: Disallow creating contexts with too many engines

2021-04-28 Thread Daniel Vetter
On Fri, Apr 23, 2021 at 05:31:19PM -0500, Jason Ekstrand wrote:
> There's no sense in allowing userspace to create more engines than it
> can possibly access via execbuf.
> 
> Signed-off-by: Jason Ekstrand 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 5f8d0faf783aa..ecb3bf5369857 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1640,11 +1640,10 @@ set_engines(struct i915_gem_context *ctx,
>   return -EINVAL;
>   }
>  
> - /*
> -  * Note that I915_EXEC_RING_MASK limits execbuf to only using the
> -  * first 64 engines defined here.
> -  */
>   num_engines = (args->size - sizeof(*user)) / sizeof(*user->engines);

Maybe add a comment like /* RING_MASK has no shift, so can be used
directly here */ since I had to check that :-)

Same story about igt testcases needed, just to be sure.

Reviewed-by: Daniel Vetter 
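
To illustrate the point (a sketch against the current uapi header, not part of
the patch): the engine index travels in the low bits of the execbuf flags with
no shift, so a context engine map with more than I915_EXEC_RING_MASK + 1
entries would have slots that execbuf could never address.

#include <stdint.h>
#include <drm/i915_drm.h>

/* Sketch: execbuf selects the engine through the low bits of flags. */
static uint64_t execbuf_flags_for_engine(unsigned int engine_idx)
{
        return (uint64_t)(engine_idx & I915_EXEC_RING_MASK);
}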

> + if (num_engines > I915_EXEC_RING_MASK + 1)
> + return -EINVAL;
> +
>   set.engines = alloc_engines(num_engines);
>   if (!set.engines)
>   return -ENOMEM;
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 11/21] drm/i915: Stop manually RCU banging in reset_stats_ioctl

2021-04-28 Thread Daniel Vetter
On Fri, Apr 23, 2021 at 05:31:21PM -0500, Jason Ekstrand wrote:
> As far as I can tell, the only real reason for this is to avoid taking a
> reference to the i915_gem_context.  The cost of those two atomics
> probably pales in comparison to the cost of the ioctl itself so we're
> really not buying ourselves anything here.  We're about to make context
> lookup a tiny bit more complicated, so let's get rid of the one hand-
> rolled case.

I think the historical reason here is that i965_brw checks this before
every execbuf call, at least for arb_robustness contexts with the right
flag. But we've fixed that hotpath problem by adding non-recoverable
contexts. The kernel will tell you now automatically, for proper userspace
at least (I checked iris and anv, assuming I got it correct), and
reset_stats ioctl isn't a hot path worth micro-optimizing anymore.

With that bit of more context added to the commit message:

Reviewed-by: Daniel Vetter 
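
For context, a minimal userspace sketch of the query being discussed (error
handling elided; assumes an open DRM fd and a context id from context
creation):

#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Sketch: ARB_robustness-style query of per-context reset statistics. */
static int query_reset_stats(int drm_fd, uint32_t ctx_id,
                             struct drm_i915_reset_stats *stats)
{
        memset(stats, 0, sizeof(*stats));
        stats->ctx_id = ctx_id;

        /* On success the kernel fills in reset_count, batch_active and
         * batch_pending. */
        return drmIoctl(drm_fd, DRM_IOCTL_I915_GET_RESET_STATS, stats);
}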

> 
> Signed-off-by: Jason Ekstrand 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 13 -
>  drivers/gpu/drm/i915/i915_drv.h |  8 +---
>  2 files changed, 5 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index ecb3bf5369857..941fbf78267b4 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -2090,16 +2090,13 @@ int i915_gem_context_reset_stats_ioctl(struct 
> drm_device *dev,
>   struct drm_i915_private *i915 = to_i915(dev);
>   struct drm_i915_reset_stats *args = data;
>   struct i915_gem_context *ctx;
> - int ret;
>  
>   if (args->flags || args->pad)
>   return -EINVAL;
>  
> - ret = -ENOENT;
> - rcu_read_lock();
> - ctx = __i915_gem_context_lookup_rcu(file->driver_priv, args->ctx_id);
> + ctx = i915_gem_context_lookup(file->driver_priv, args->ctx_id);
>   if (!ctx)
> - goto out;
> + return -ENOENT;
>  
>   /*
>* We opt for unserialised reads here. This may result in tearing
> @@ -2116,10 +2113,8 @@ int i915_gem_context_reset_stats_ioctl(struct 
> drm_device *dev,
>   args->batch_active = atomic_read(&ctx->guilty_count);
>   args->batch_pending = atomic_read(&ctx->active_count);
>  
> - ret = 0;
> -out:
> - rcu_read_unlock();
> - return ret;
> + i915_gem_context_put(ctx);
> + return 0;
>  }
>  
>  /* GEM context-engines iterator: for_each_gem_engine() */
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 0b44333eb7033..8571c5c1509a7 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1840,19 +1840,13 @@ struct drm_gem_object *i915_gem_prime_import(struct 
> drm_device *dev,
>  
>  struct dma_buf *i915_gem_prime_export(struct drm_gem_object *gem_obj, int 
> flags);
>  
> -static inline struct i915_gem_context *
> -__i915_gem_context_lookup_rcu(struct drm_i915_file_private *file_priv, u32 
> id)
> -{
> - return xa_load(&file_priv->context_xa, id);
> -}
> -
>  static inline struct i915_gem_context *
>  i915_gem_context_lookup(struct drm_i915_file_private *file_priv, u32 id)
>  {
>   struct i915_gem_context *ctx;
>  
>   rcu_read_lock();
> - ctx = __i915_gem_context_lookup_rcu(file_priv, id);
> + ctx = xa_load(&file_priv->context_xa, id);
>   if (ctx && !kref_get_unless_zero(&ctx->ref))
>   ctx = NULL;
>   rcu_read_unlock();
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Christian König

Am 28.04.21 um 12:05 schrieb Daniel Vetter:

On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:

On Tue, Apr 27, 2021 at 1:35 PM Simon Ser  wrote:

On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach  
wrote:


Ok. So that would only make the following use cases broken for now:

- amd render -> external gpu
- amd video encode -> network device

FWIW, "only" breaking amd render -> external gpu will make us pretty
unhappy

I concur. I have quite a few users with a multi-GPU setup involving
AMD hardware.

Note, if this brokenness can't be avoided, I'd prefer a to get a clear
error, and not bad results on screen because nothing is synchronized
anymore.

It's an upcoming requirement for windows[1], so you are likely to
start seeing this across all GPU vendors that support windows.  I
think the timing depends on how quickly the legacy hardware support
sticks around for each vendor.

Yeah but hw scheduling doesn't mean the hw has to be constructed to not
support isolating the ringbuffer at all.

E.g. even if the hw loses the bit to put the ringbuffer outside of the
userspace gpu vm, if you have pagetables I'm seriously hoping you have r/o
pte flags. Otherwise the entire "share address space with cpu side,
seamlessly" thing is out of the window.

And with that r/o bit on the ringbuffer you can once more force submit
through kernel space, and all the legacy dma_fence based stuff keeps
working. And we don't have to invent some horrendous userspace fence based
implicit sync mechanism in the kernel, but can instead do this transition
properly with drm_syncobj timeline explicit sync and protocol reving.

At least I think you'd have to work extra hard to create a gpu which
cannot possibly be intercepted by the kernel, even when it's designed to
support userspace direct submit only.

Or are your hw engineers more creative here and we're screwed?


The upcoming hardware generation will have this hardware scheduler as a 
must have, but there are certain ways we can still stick to the old 
approach:


1. The new hardware scheduler currently still supports kernel queues 
which essentially is the same as the old hardware ring buffer.


2. Mapping the top level ring buffer into the VM at least partially 
solves the problem. This way you can't manipulate the ring buffer 
content, but the location for the fence must still be writeable.


For now and the next hardware we are safe to support the old submission 
model, but the functionality of kernel queues will sooner or later go 
away if it is only for Linux.


So we need to work on something which works in the long term and get us 
away from this implicit sync.


Christian.


-Daniel




Re: [Intel-gfx] [PATCH 09/21] drm/i915/gem: Disallow creating contexts with too many engines

2021-04-28 Thread Tvrtko Ursulin



On 28/04/2021 11:16, Daniel Vetter wrote:

On Fri, Apr 23, 2021 at 05:31:19PM -0500, Jason Ekstrand wrote:

There's no sense in allowing userspace to create more engines than it
can possibly access via execbuf.

Signed-off-by: Jason Ekstrand 
---
  drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++
  1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5f8d0faf783aa..ecb3bf5369857 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1640,11 +1640,10 @@ set_engines(struct i915_gem_context *ctx,
return -EINVAL;
}
  
-	/*

-* Note that I915_EXEC_RING_MASK limits execbuf to only using the
-* first 64 engines defined here.
-*/
num_engines = (args->size - sizeof(*user)) / sizeof(*user->engines);


Maybe add a comment like /* RING_MASK has not shift, so can be used
directly here */ since I had to check that :-)

Same story about igt testcases needed, just to be sure.

Reviewed-by: Daniel Vetter 


I am not sure about the churn vs benefit ratio here. There are also 
patches which extend the engine selection field in execbuf2 over the 
unused constant bits (with an explicit flag). So churn upstream and 
churn internally (if interesting) for not much benefit.


Regards,

Tvrtko


+   if (num_engines > I915_EXEC_RING_MASK + 1)
+   return -EINVAL;
+
set.engines = alloc_engines(num_engines);
if (!set.engines)
return -ENOMEM;
--
2.31.1






[PATCH v2] drm/bridge: anx7625: Fix power on delay

2021-04-28 Thread Hsin-Yi Wang
From the anx7625 spec, the delay between powering on power supplies and gpio
should be larger than 10ms.

Fixes: 6c744983004e ("drm/bridge: anx7625: disable regulators when power off")
Signed-off-by: Hsin-Yi Wang 
Reviewed-by: Neil Armstrong 
---
v1->v2: Extend sleep range a bit as the regulator on some device takes
more time to be powered on after regulator_enable() is called.
---
 drivers/gpu/drm/bridge/analogix/anx7625.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c 
b/drivers/gpu/drm/bridge/analogix/anx7625.c
index 23283ba0c4f9..b4e349ca38fe 100644
--- a/drivers/gpu/drm/bridge/analogix/anx7625.c
+++ b/drivers/gpu/drm/bridge/analogix/anx7625.c
@@ -893,7 +893,7 @@ static void anx7625_power_on(struct anx7625_data *ctx)
usleep_range(2000, 2100);
}
 
-   usleep_range(4000, 4100);
+   usleep_range(11000, 12000);
 
/* Power on pin enable */
gpiod_set_value(ctx->pdata.gpio_p_on, 1);
-- 
2.31.1.498.g6c1eba8ee3d-goog



Re: [PATCH v6 0/2] Add support for ANX7688

2021-04-28 Thread Dafna Hirschfeld

Hi, pinging here, can one of the kernel bridge maintainers review this patchset?

Thanks,
Dafna

On 09.04.21 18:19, Dafna Hirschfeld wrote:

ANX7688 is a typec port controller that also converts HDMI to DP.
ANX7688 is found on Acer Chromebook R13 (elm) and on Pine64 PinePhone.

On Acer Chromebook R13 (elm), the device is powered up and controlled by the
Embedded Controller. Therefore its operation is transparent
to the SoC. It is used in elm only as a display bridge driver.
The bridge driver only reads some values using i2c and uses them to
implement the mode_fixup cb.
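
For readers who don't know that part of the bridge API, such a callback could
look roughly like the sketch below (illustrative only, with made-up example_*
names and a hypothetical max_pclk_khz value read over i2c at probe time; this
is not the actual cros-ec-anx7688 code):

#include <drm/drm_bridge.h>
#include <drm/drm_modes.h>

struct example_bridge {
        struct drm_bridge bridge;
        unsigned int max_pclk_khz;      /* hypothetical limit read over i2c */
};

static bool example_bridge_mode_fixup(struct drm_bridge *bridge,
                                      const struct drm_display_mode *mode,
                                      struct drm_display_mode *adjusted_mode)
{
        struct example_bridge *ctx =
                container_of(bridge, struct example_bridge, bridge);

        /* mode->clock is in kHz; reject modes the converter can't handle */
        if (mode->clock > ctx->max_pclk_khz)
                return false;

        return true;
}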

On v5 we added the full dt-binding of the generic Analogix anx7688 device.
The problem is that for elm, most of the fields are not needed since
the anx7688 sits behind the EC. After a discussion on v5 (see [1])
we decided to go back to the original approach and send the dt binding
as specific to the elm. So in this version we rename the device to 
cros_ec_anx7688
and use the compatible 'google,cros-ec-anx7688'.

[1] 
https://patchwork.kernel.org/project/dri-devel/patch/20210305124351.15079-3-dafna.hirschf...@collabora.com/

Changes since v5:
* treat the device as a specific combination of an ANX7688 behind the EC and
call it 'cros-ec-anx7688'

Changes since v4:
In v4 of this set, the device was added as an 'mfd' device
and an additional 'bridge' device for the HDMI-DP conversion, see [2].

[2] https://lkml.org/lkml/2020/3/18/64

Dafna Hirschfeld (1):
   dt-bindings: display: add google,cros-ec-anx7688.yaml

Enric Balletbo i Serra (1):
   drm/bridge: Add ChromeOS EC ANX7688 bridge driver support

  .../bridge/google,cros-ec-anx7688.yaml|  82 
  drivers/gpu/drm/bridge/Kconfig|  12 ++
  drivers/gpu/drm/bridge/Makefile   |   1 +
  drivers/gpu/drm/bridge/cros-ec-anx7688.c  | 191 ++
  4 files changed, 286 insertions(+)
  create mode 100644 
Documentation/devicetree/bindings/display/bridge/google,cros-ec-anx7688.yaml
  create mode 100644 drivers/gpu/drm/bridge/cros-ec-anx7688.c




Re: Display notch support

2021-04-28 Thread Daniel Vetter
On Wed, Apr 28, 2021 at 10:44:03AM +0300, Pekka Paalanen wrote:
> On Wed, 28 Apr 2021 07:21:28 +
> Simon Ser  wrote:
> 
> > > A solution to make this configuration generic and exposed by the kernel
> > > would standardise this across Linux  
> > 
> > Having a KMS property for this makes sense to me.
> > 
> > Chatting with Jani on IRC, it doesn't seem like there's any EDID or
> > DisplayID block for this.
> > 
> > Note, Android exposes a data structure [1] with:
> > 
> > - Margin of the cut-out for each edge of the screen
> > - One rectangle per edge describing the cut-out region
> > - Size of the curved area for each edge of a waterfall display
> > 
> > I haven't found anything describing the rounded corners of the display.
> > 
> > [1]: https://developer.android.com/reference/android/view/DisplayCutout
> 
> Hi,
> 
> I'm kind of worried whether you can design a description structure that
> would be good for a long time. That list already looks quite
> complicated. Add also watch-like devices with circular displays.
> 
> Would the kernel itself use this information at all?
> 
> If not, is there not a policy that DT is not a userspace configuration
> store?

If someone is sufficiently bored it would make sense to teach fbcon (but
not fbdev I guess for full sized boot splash) to avoid the edges/corners
for output.

But also fbcon/fbdev is I think finally dead on Android, so the
intersection of people who care about cut-outs and fbcon is likely 0.

Otherwise I can't think of anything.

> You mentioned the panel orientation property, but that is used by the
> kernel for fbcon or something, is it not? Maybe as the default value
> for the CRTC rotation property which actually turns the image?
> 
> Assuming that you succeed in describing these non-usable, funny
> (waterfall edge), funny2 (e.g. behind a shade or filter so visible but
> not normal), funny3 (e.g. phone button area with maybe tactile
> markings), and normal areas, how would userspace handle this
> information?
> 
> Funny2 and funny3 are hypothetical but maybe not too far-fetched.
> 
> Is there any provision for generic userspace to handle this generically?
> 
> This seems more like a job for the hypothetical liboutput, just like
> recognising HMDs (yes, I know, kernel does that already, but there is a
> point that kernel may not want to put fbcon on a HMD).

I think the desktop linux solution would be hwdb entries, except we've
never done this for anything display related. So yeah liboutput sounds
about right for this :-)

Btw on fbcon on HMD, I thought we're already taking care of that?
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCHv2 1/6] drm: drm_bridge: add connector_attach/detach bridge ops

2021-04-28 Thread Hans Verkuil
On 16/04/2021 09:46, Tomi Valkeinen wrote:
> Hi Hans,
> 
> On 02/03/2021 18:23, Hans Verkuil wrote:
>> Add bridge connector_attach/detach ops. These ops are called when a
>> bridge is attached or detached to a drm_connector. These ops can be
>> used to register and unregister an HDMI CEC adapter for a bridge that
>> supports CEC.
>>
>> Signed-off-by: Hans Verkuil 
>> ---
>>   drivers/gpu/drm/drm_bridge_connector.c |  9 +
>>   include/drm/drm_bridge.h   | 27 ++
>>   2 files changed, 36 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_bridge_connector.c 
>> b/drivers/gpu/drm/drm_bridge_connector.c
>> index 791379816837..07db71d4f5b3 100644
>> --- a/drivers/gpu/drm/drm_bridge_connector.c
>> +++ b/drivers/gpu/drm/drm_bridge_connector.c
>> @@ -203,6 +203,11 @@ static void drm_bridge_connector_destroy(struct 
>> drm_connector *connector)
>>   {
>>  struct drm_bridge_connector *bridge_connector =
>>  to_drm_bridge_connector(connector);
>> +struct drm_bridge *bridge;
>> +
>> +drm_for_each_bridge_in_chain(bridge_connector->encoder, bridge)
>> +if (bridge->funcs->connector_detach)
>> +bridge->funcs->connector_detach(bridge, connector);
>>   
>>  if (bridge_connector->bridge_hpd) {
>>  struct drm_bridge *hpd = bridge_connector->bridge_hpd;
>> @@ -375,6 +380,10 @@ struct drm_connector *drm_bridge_connector_init(struct 
>> drm_device *drm,
>>  connector->polled = DRM_CONNECTOR_POLL_CONNECT
>>| DRM_CONNECTOR_POLL_DISCONNECT;
>>   
>> +drm_for_each_bridge_in_chain(encoder, bridge)
>> +if (bridge->funcs->connector_attach)
>> +bridge->funcs->connector_attach(bridge, connector);
>> +
>>  return connector;
>>   }
>>   EXPORT_SYMBOL_GPL(drm_bridge_connector_init);
>> diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
>> index 2195daa289d2..3320a6ebd253 100644
>> --- a/include/drm/drm_bridge.h
>> +++ b/include/drm/drm_bridge.h
>> @@ -629,6 +629,33 @@ struct drm_bridge_funcs {
>>   * the DRM_BRIDGE_OP_HPD flag in their &drm_bridge->ops.
>>   */
>>  void (*hpd_disable)(struct drm_bridge *bridge);
>> +
>> +/**
>> + * @connector_attach:
>> + *
>> + * This callback is invoked whenever our bridge is being attached to a
>> + * &drm_connector. This is where an HDMI CEC adapter can be registered.
>> + * Note that this callback expects that this op always succeeds. Since
>> + * HDMI CEC support is an optional feature, any failure to register a
>> + * CEC adapter must be ignored since video output will still work
>> + * without CEC.
>> + *
> 
> Even if CEC support is optional, the callback itself is generic. 
> Wouldn't it be better to make this function return an error, and for 
> CEC, just return 0 if CEC won't get registered correctly?

I'll do that.

> 
> Also, I personally like things to fail if something doesn't go right, 
> instead of continuing, if that thing is never supposed to happen in 
> normal situations. E.g. if CEC registration fails because we're out of 
> memory, I think the op should fail too.

If that happens you have no video output. And that's a lot more important
than CEC! As you suggested, I'll have the cec connector_attach just return
0.
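
For illustration, such a callback could look roughly like this (made-up foo_*
names, written against the int-returning variant discussed above, which may
still change; the cec_* calls are the existing CEC framework API):

#include <linux/err.h>
#include <drm/drm_bridge.h>
#include <media/cec.h>

struct foo_bridge {
        struct drm_bridge bridge;
        struct device *dev;
        struct cec_adapter *cec_adap;
};

static const struct cec_adap_ops foo_cec_adap_ops = {
        /* adap_enable, adap_log_addrs, adap_transmit not shown */
};

static int foo_connector_attach(struct drm_bridge *bridge,
                                struct drm_connector *connector)
{
        struct foo_bridge *foo = container_of(bridge, struct foo_bridge, bridge);

        foo->cec_adap = cec_allocate_adapter(&foo_cec_adap_ops, foo, "foo",
                                             CEC_CAP_DEFAULTS, 1);
        if (IS_ERR(foo->cec_adap)) {
                /* CEC is optional: warn and keep video output working */
                dev_warn(foo->dev, "failed to allocate CEC adapter, continuing without CEC\n");
                foo->cec_adap = NULL;
                return 0;
        }

        if (cec_register_adapter(foo->cec_adap, foo->dev)) {
                cec_delete_adapter(foo->cec_adap);
                foo->cec_adap = NULL;
        }

        return 0;
}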

Regards,

Hans

> 
>   Tomi
> 



Re: Display notch support

2021-04-28 Thread Jani Nikula
On Wed, 28 Apr 2021, Daniel Vetter  wrote:
> On Wed, Apr 28, 2021 at 10:44:03AM +0300, Pekka Paalanen wrote:
>> This seems more like a job for the hypothetical liboutput, just like
>> recognising HMDs (yes, I know, kernel does that already, but there is a
>> point that kernel may not want to put fbcon on a HMD).
>
> I think the desktop linux solution would be hwdb entries, except we've
> never done this for anything display related. So yeah liboutput sounds
> about right for this :-)
>
> Btw on fbcon on HMD, I thought we're already taking care of that?

This is a bit off-topic, but DisplayID 2.0 defines primary use cases for
head mounted VR and AR, so we wouldn't have to quirk them.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Daniel Vetter
On Wed, Apr 28, 2021 at 12:31:09PM +0200, Christian König wrote:
> Am 28.04.21 um 12:05 schrieb Daniel Vetter:
> > On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:
> > > On Tue, Apr 27, 2021 at 1:35 PM Simon Ser  wrote:
> > > > On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach 
> > > >  wrote:
> > > > 
> > > > > > Ok. So that would only make the following use cases broken for now:
> > > > > > 
> > > > > > - amd render -> external gpu
> > > > > > - amd video encode -> network device
> > > > > FWIW, "only" breaking amd render -> external gpu will make us pretty
> > > > > unhappy
> > > > I concur. I have quite a few users with a multi-GPU setup involving
> > > > AMD hardware.
> > > > 
> > > > Note, if this brokenness can't be avoided, I'd prefer a to get a clear
> > > > error, and not bad results on screen because nothing is synchronized
> > > > anymore.
> > > It's an upcoming requirement for windows[1], so you are likely to
> > > start seeing this across all GPU vendors that support windows.  I
> > > think the timing depends on how quickly the legacy hardware support
> > > sticks around for each vendor.
> > Yeah but hw scheduling doesn't mean the hw has to be constructed to not
> > support isolating the ringbuffer at all.
> > 
> > E.g. even if the hw loses the bit to put the ringbuffer outside of the
> > userspace gpu vm, if you have pagetables I'm seriously hoping you have r/o
> > pte flags. Otherwise the entire "share address space with cpu side,
> > seamlessly" thing is out of the window.
> > 
> > And with that r/o bit on the ringbuffer you can once more force submit
> > through kernel space, and all the legacy dma_fence based stuff keeps
> > working. And we don't have to invent some horrendous userspace fence based
> > implicit sync mechanism in the kernel, but can instead do this transition
> > properly with drm_syncobj timeline explicit sync and protocol reving.
> > 
> > At least I think you'd have to work extra hard to create a gpu which
> > cannot possibly be intercepted by the kernel, even when it's designed to
> > support userspace direct submit only.
> > 
> > Or are your hw engineers more creative here and we're screwed?
> 
> The upcoming hardware generation will have this hardware scheduler as a
> must have, but there are certain ways we can still stick to the old
> approach:
> 
> 1. The new hardware scheduler currently still supports kernel queues which
> essentially is the same as the old hardware ring buffer.
> 
> 2. Mapping the top level ring buffer into the VM at least partially solves
> the problem. This way you can't manipulate the ring buffer content, but the
> location for the fence must still be writeable.

Yeah allowing userspace to lie about completion fences in this model is
ok. Though I haven't thought through full consequences of that, but I
think it's not any worse than userspace lying about which buffers/address
it uses in the current model - we rely on hw vm ptes to catch that stuff.

Also it might be good to switch to a non-recoverable ctx model for these.
That's already what we do in i915 (opt-in, but all current umd use that
mode). So any hang/watchdog just kills the entire ctx and you don't have
to worry about userspace doing something funny with its ringbuffer.
Simplifies everything.
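
(For reference, a minimal sketch of the userspace opt-in, using the existing
context-param uapi; error handling elided:)

#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Sketch: mark a GEM context as non-recoverable, so a GPU hang kills the
 * whole context instead of replaying its ringbuffer. */
static int ctx_set_non_recoverable(int drm_fd, uint32_t ctx_id)
{
        struct drm_i915_gem_context_param p = {
                .ctx_id = ctx_id,
                .param  = I915_CONTEXT_PARAM_RECOVERABLE,
                .value  = 0,
        };

        return drmIoctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
}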

Also ofc userspace fencing still disallowed, but since userspace would
queue up all writes to its ringbuffer through the drm/scheduler, we'd
handle dependencies through that still. Not great, but workable.

Thinking about this, not even mapping the ringbuffer r/o is required, it's
just that we must queue things through the kernel to resolve dependencies
and everything without breaking dma_fence. If userspace lies, tdr will
shoot it and the kernel stops running that context entirely.

So I think even if we have hw with 100% userspace submit model only we
should be still fine. It's ofc silly, because instead of using userspace
fences and gpu semaphores the hw scheduler understands we still take the
detour through drm/scheduler, but at least it's not a break-the-world
event.

Or do I miss something here?

> For now and the next hardware we are safe to support the old submission
> model, but the functionality of kernel queues will sooner or later go away
> if it is only for Linux.
> 
> So we need to work on something which works in the long term and get us away
> from this implicit sync.

Yeah I think we have pretty clear consensus on that goal, just no one yet
volunteered to get going with the winsys/wayland work to plumb drm_syncobj
through, and the kernel/mesa work to make that optionally a userspace
fence underneath. And it's for sure a lot of work.
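
(Roughly, the userspace side of that plumbing would look like the sketch
below; the drmSyncobj* calls exist in libdrm today, only the wrapper and its
use in a protocol are made up. Create a syncobj, export it as an fd, let the
winsys protocol carry the fd to the compositor, which imports it on its own
device fd.)

#include <stdint.h>
#include <xf86drm.h>

static int export_syncobj_fd(int render_fd, uint32_t *handle_out)
{
        uint32_t handle;
        int syncobj_fd;

        if (drmSyncobjCreate(render_fd, 0, &handle))
                return -1;
        if (drmSyncobjHandleToFD(render_fd, handle, &syncobj_fd))
                return -1;

        *handle_out = handle;
        return syncobj_fd;      /* to be sent over the winsys protocol */
}
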
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Daniel Vetter
On Wed, Apr 28, 2021 at 02:21:54PM +0200, Daniel Vetter wrote:
> On Wed, Apr 28, 2021 at 12:31:09PM +0200, Christian König wrote:
> > Am 28.04.21 um 12:05 schrieb Daniel Vetter:
> > > On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:
> > > > On Tue, Apr 27, 2021 at 1:35 PM Simon Ser  wrote:
> > > > > On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach 
> > > > >  wrote:
> > > > > 
> > > > > > > Ok. So that would only make the following use cases broken for 
> > > > > > > now:
> > > > > > > 
> > > > > > > - amd render -> external gpu
> > > > > > > - amd video encode -> network device
> > > > > > FWIW, "only" breaking amd render -> external gpu will make us pretty
> > > > > > unhappy
> > > > > I concur. I have quite a few users with a multi-GPU setup involving
> > > > > AMD hardware.
> > > > > 
> > > > > Note, if this brokenness can't be avoided, I'd prefer a to get a clear
> > > > > error, and not bad results on screen because nothing is synchronized
> > > > > anymore.
> > > > It's an upcoming requirement for windows[1], so you are likely to
> > > > start seeing this across all GPU vendors that support windows.  I
> > > > think the timing depends on how quickly the legacy hardware support
> > > > sticks around for each vendor.
> > > Yeah but hw scheduling doesn't mean the hw has to be constructed to not
> > > support isolating the ringbuffer at all.
> > > 
> > > E.g. even if the hw loses the bit to put the ringbuffer outside of the
> > > userspace gpu vm, if you have pagetables I'm seriously hoping you have r/o
> > > pte flags. Otherwise the entire "share address space with cpu side,
> > > seamlessly" thing is out of the window.
> > > 
> > > And with that r/o bit on the ringbuffer you can once more force submit
> > > through kernel space, and all the legacy dma_fence based stuff keeps
> > > working. And we don't have to invent some horrendous userspace fence based
> > > implicit sync mechanism in the kernel, but can instead do this transition
> > > properly with drm_syncobj timeline explicit sync and protocol reving.
> > > 
> > > At least I think you'd have to work extra hard to create a gpu which
> > > cannot possibly be intercepted by the kernel, even when it's designed to
> > > support userspace direct submit only.
> > > 
> > > Or are your hw engineers more creative here and we're screwed?
> > 
> > The upcoming hardware generation will have this hardware scheduler as a
> > must have, but there are certain ways we can still stick to the old
> > approach:
> > 
> > 1. The new hardware scheduler currently still supports kernel queues which
> > essentially is the same as the old hardware ring buffer.
> > 
> > 2. Mapping the top level ring buffer into the VM at least partially solves
> > the problem. This way you can't manipulate the ring buffer content, but the
> > location for the fence must still be writeable.
> 
> Yeah allowing userspace to lie about completion fences in this model is
> ok. Though I haven't thought through full consequences of that, but I
> think it's not any worse than userspace lying about which buffers/address
> it uses in the current model - we rely on hw vm ptes to catch that stuff.
> 
> Also it might be good to switch to a non-recoverable ctx model for these.
> That's already what we do in i915 (opt-in, but all current umd use that
> mode). So any hang/watchdog just kills the entire ctx and you don't have
> to worry about userspace doing something funny with its ringbuffer.
> Simplifies everything.
> 
> Also ofc userspace fencing still disallowed, but since userspace would
> queue up all writes to its ringbuffer through the drm/scheduler, we'd
> handle dependencies through that still. Not great, but workable.
> 
> Thinking about this, not even mapping the ringbuffer r/o is required, it's
> just that we must queue things through the kernel to resolve dependencies
> and everything without breaking dma_fence. If userspace lies, tdr will
> shoot it and the kernel stops running that context entirely.
> 
> So I think even if we have hw with 100% userspace submit model only we
> should be still fine. It's ofc silly, because instead of using userspace
> fences and gpu semaphores the hw scheduler understands we still take the
> detour through drm/scheduler, but at least it's not a break-the-world
> event.

Also no page fault support, userptr invalidates still stall until
end-of-batch instead of just preempting it, and all that too. But I mean
there needs to be some motivation to fix this and roll out explicit sync
:-)
-Daniel

> 
> Or do I miss something here?
> 
> > For now and the next hardware we are safe to support the old submission
> > model, but the functionality of kernel queues will sooner or later go away
> > if it is only for Linux.
> > 
> > So we need to work on something which works in the long term and get us away
> > from this implicit sync.
> 
> Yeah I think we have pretty clear consensus on that goal, just no one yet
> volunteered to get going with the winsys/way

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Simon Ser
On Wednesday, April 28th, 2021 at 2:21 PM, Daniel Vetter  
wrote:

> Yeah I think we have pretty clear consensus on that goal, just no one yet
> volunteered to get going with the winsys/wayland work to plumb drm_syncobj
> through, and the kernel/mesa work to make that optionally a userspace
> fence underneath. And it's for sure a lot of work.

I'm interested in helping with the winsys/wayland bits, assuming the
following:

- We are pretty confident that drm_syncobj won't be superseded by
  something else in the near future. It seems to me like a lot of
  effort has gone into plumbing sync_file stuff all over, and it
  already needs replacing (I mean, it'll keep working, but we have a
  better replacement now. So compositors which have decided to ignore
  explicit sync for all this time won't have to do the work twice.)
- Plumbing drm_syncobj solves the synchronization issues with upcoming
  AMD hardware, and all of this works fine in cross-vendor multi-GPU
  setups.
- Someone is willing to spend a bit of time bearing with me and
  explaining how this all works. (I only know about sync_file for now,
  I'll start reading the Vulkan bits.)

Are these points something we can agree on?

Thanks,

Simon


[Bug 212871] New: AMD Radeon Pro VEGA 20 (Aka Vega12) - Glitch and freeze on any kernel and/or distro.

2021-04-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=212871

Bug ID: 212871
   Summary: AMD Radeon Pro VEGA 20 (Aka Vega12) - Glitch and
freeze on any kernel and/or distro.
   Product: Drivers
   Version: 2.5
Kernel Version: Any
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: blocking
  Priority: P1
 Component: Video(DRI - non Intel)
  Assignee: drivers_video-...@kernel-bugs.osdl.org
  Reporter: rodrigo.lug...@icloud.com
Regression: No

I have a MacBook Pro with a Vega 20, which uses the amdgpu vega12 firmware, and
when I boot any distro the graphics glitch and the computer freezes.
If I install amdgpu pro on Ubuntu it works flawlessly. Would you guys help me
debug this and fix it for upstream?

Let me know what I can send to complement the information required for
analysis, like logs or dmesg. I would be very happy to help and participate in
this.

Please, excuse me if this is not the right place for me to ask this kind of
thing, and please if you can, kindly redirect me to the right place.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Alex Deucher
On Wed, Apr 28, 2021 at 6:31 AM Christian König
 wrote:
>
> Am 28.04.21 um 12:05 schrieb Daniel Vetter:
> > On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:
> >> On Tue, Apr 27, 2021 at 1:35 PM Simon Ser  wrote:
> >>> On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach 
> >>>  wrote:
> >>>
> > Ok. So that would only make the following use cases broken for now:
> >
> > - amd render -> external gpu
> > - amd video encode -> network device
>  FWIW, "only" breaking amd render -> external gpu will make us pretty
>  unhappy
> >>> I concur. I have quite a few users with a multi-GPU setup involving
> >>> AMD hardware.
> >>>
> >>> Note, if this brokenness can't be avoided, I'd prefer a to get a clear
> >>> error, and not bad results on screen because nothing is synchronized
> >>> anymore.
> >> It's an upcoming requirement for windows[1], so you are likely to
> >> start seeing this across all GPU vendors that support windows.  I
> >> think the timing depends on how quickly the legacy hardware support
> >> sticks around for each vendor.
> > Yeah but hw scheduling doesn't mean the hw has to be constructed to not
> > support isolating the ringbuffer at all.
> >
> > E.g. even if the hw loses the bit to put the ringbuffer outside of the
> > userspace gpu vm, if you have pagetables I'm seriously hoping you have r/o
> > pte flags. Otherwise the entire "share address space with cpu side,
> > seamlessly" thing is out of the window.
> >
> > And with that r/o bit on the ringbuffer you can once more force submit
> > through kernel space, and all the legacy dma_fence based stuff keeps
> > working. And we don't have to invent some horrendous userspace fence based
> > implicit sync mechanism in the kernel, but can instead do this transition
> > properly with drm_syncobj timeline explicit sync and protocol reving.
> >
> > At least I think you'd have to work extra hard to create a gpu which
> > cannot possibly be intercepted by the kernel, even when it's designed to
> > support userspace direct submit only.
> >
> > Or are your hw engineers more creative here and we're screwed?
>
> The upcoming hardware generation will have this hardware scheduler as a
> must have, but there are certain ways we can still stick to the old
> approach:
>
> 1. The new hardware scheduler currently still supports kernel queues
> which essentially is the same as the old hardware ring buffer.
>
> 2. Mapping the top level ring buffer into the VM at least partially
> solves the problem. This way you can't manipulate the ring buffer
> content, but the location for the fence must still be writeable.
>
> For now and the next hardware we are safe to support the old submission
> model, but the functionality of kernel queues will sooner or later go
> away if it is only for Linux.

Even if it didn't go away completely, no one else will be using it.
This leaves a lot of under-validated execution paths that lead to
subtle bugs.  When everyone else moved to KIQ for queue management, we
stuck with MMIO for a while in Linux and we ran into tons of subtle
bugs that disappeared when we moved to KIQ.  There were lots of
assumptions about how software would use different firmware interfaces
or not which impacted lots of interactions with clock and powergating
to name a few.  On top of that, you need to use the scheduler to
utilize stuff like preemption properly.  Also, if you want to do stuff
like gang scheduling (UMD scheduling multiple queues together), it's
really hard to do with kernel software schedulers.

Alex

>
> So we need to work on something which works in the long term and get us
> away from this implicit sync.
>
> Christian.
>
> > -Daniel
>


[PATCH v3 4/4] staging: fbtft: Update TODO

2021-04-28 Thread Andy Shevchenko
Now, after a few fixes, we may consider the conversion to
the GPIO descriptor API done.

Signed-off-by: Andy Shevchenko 
---
 drivers/staging/fbtft/TODO | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/staging/fbtft/TODO b/drivers/staging/fbtft/TODO
index a9f4802bb6be..e72a08bf221c 100644
--- a/drivers/staging/fbtft/TODO
+++ b/drivers/staging/fbtft/TODO
@@ -1,8 +1,3 @@
-* convert all uses of the old GPIO API from  to the
-  GPIO descriptor API in  and look up GPIO
-  lines from device tree, ACPI or board files, board files should
-  use 
-
 * convert all these over to drm_simple_display_pipe and submit for inclusion
   into the DRM subsystem under drivers/gpu/drm - fbdev doesn't take any new
   drivers anymore.
-- 
2.30.2



[PATCH v3 3/4] staging: fbtft: Don't spam logs when probe is deferred

2021-04-28 Thread Andy Shevchenko
When requesting a GPIO line, the probe can be deferred.
In such a case, don't spam the logs with an error message.
This can be achieved by switching to dev_err_probe().

Signed-off-by: Andy Shevchenko 
---
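The general shape of the idiom, for reviewers unfamiliar with the helper (a
generic sketch, not this driver's code): dev_err_probe() returns the error
passed to it, logs at error level for real failures, and only emits a debug
message and records the deferral reason for -EPROBE_DEFER, so a call site can
log and propagate in one statement.

#include <linux/device.h>
#include <linux/err.h>
#include <linux/gpio/consumer.h>

/* Generic sketch of the dev_err_probe() idiom. */
static int example_get_reset_gpio(struct device *dev, struct gpio_desc **out)
{
        *out = devm_gpiod_get_optional(dev, "reset", GPIOD_OUT_LOW);
        if (IS_ERR(*out))
                return dev_err_probe(dev, PTR_ERR(*out),
                                     "Failed to request reset GPIO\n");
        return 0;
}
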
 drivers/staging/fbtft/fbtft-core.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/staging/fbtft/fbtft-core.c 
b/drivers/staging/fbtft/fbtft-core.c
index 67c3b1975a4d..a564907c4fa1 100644
--- a/drivers/staging/fbtft/fbtft-core.c
+++ b/drivers/staging/fbtft/fbtft-core.c
@@ -75,20 +75,16 @@ static int fbtft_request_one_gpio(struct fbtft_par *par,
  struct gpio_desc **gpiop)
 {
struct device *dev = par->info->device;
-   int ret = 0;
 
*gpiop = devm_gpiod_get_index_optional(dev, name, index,
   GPIOD_OUT_LOW);
-   if (IS_ERR(*gpiop)) {
-   ret = PTR_ERR(*gpiop);
-   dev_err(dev,
-   "Failed to request %s GPIO: %d\n", name, ret);
-   return ret;
-   }
+   if (IS_ERR(*gpiop))
+   dev_err_probe(dev, PTR_ERR(*gpiop), "Failed to request %s 
GPIO\n", name);
+
fbtft_par_dbg(DEBUG_REQUEST_GPIOS, par, "%s: '%s' GPIO\n",
  __func__, name);
 
-   return ret;
+   return 0;
 }
 
 static int fbtft_request_gpios(struct fbtft_par *par)
-- 
2.30.2



[PATCH v3 0/4] staging: fbtft: Fixing GPIO handling issues

2021-04-28 Thread Andy Shevchenko
This series fixes a number of GPIO handling issues after converting this driver
to use descriptors.

The series has been tested on an HX8347d display with a parallel interface. Without
the first patch it's not working.

In v3:
 - added staging prefix (Fabio)
 - slightly amended commit message in the patch 1
 - added Rb tag (Phil)
 - dropped Fixes tag from the patch 2 (Greg)

Andy Shevchenko (4):
  staging: fbtft: Rectify GPIO handling
  staging: fbtft: Replace custom ->reset() with generic one
  staging: fbtft: Don't spam logs when probe is deferred
  staging: fbtft: Update TODO

 drivers/staging/fbtft/TODO |  5 -
 drivers/staging/fbtft/fb_agm1264k-fl.c | 30 +++---
 drivers/staging/fbtft/fb_bd663474.c|  4 
 drivers/staging/fbtft/fb_ili9163.c |  4 
 drivers/staging/fbtft/fb_ili9320.c |  1 -
 drivers/staging/fbtft/fb_ili9325.c |  4 
 drivers/staging/fbtft/fb_ili9340.c |  1 -
 drivers/staging/fbtft/fb_s6d1121.c |  4 
 drivers/staging/fbtft/fb_sh1106.c  |  1 -
 drivers/staging/fbtft/fb_ssd1289.c |  4 
 drivers/staging/fbtft/fb_ssd1325.c |  2 --
 drivers/staging/fbtft/fb_ssd1331.c |  6 ++
 drivers/staging/fbtft/fb_ssd1351.c |  1 -
 drivers/staging/fbtft/fb_upd161704.c   |  4 
 drivers/staging/fbtft/fb_watterott.c   |  1 -
 drivers/staging/fbtft/fbtft-bus.c  |  3 +--
 drivers/staging/fbtft/fbtft-core.c | 25 +
 drivers/staging/fbtft/fbtft-io.c   | 12 +--
 18 files changed, 27 insertions(+), 85 deletions(-)

-- 
2.30.2



[PATCH v3 1/4] staging: fbtft: Rectify GPIO handling

2021-04-28 Thread Andy Shevchenko
The infamous commit c440eee1a7a1 ("Staging: staging: fbtft: Switch to
the GPIO descriptor interface") broke GPIO handling completely.
It has already four commits to rectify and it seems not enough.
In order to fix the mess here we:

  1) Set default to "inactive" for all requested pins

  2) Fix CS#, RD#, and WR# pins polarity since it's active low
 and GPIO descriptor interface takes it into consideration
 from the Device Tree or ACPI

  3) Consolidate chip activation (CS# assertion) under default
 ->reset() callback

To summarize the expectations about polarity for GPIOs:

   RD#  Low
   WR#  Low
   CS#  Low
   RESET#   Low
   DC or RS High
   RW   High
   Data 0 .. 15 High

See also Adafruit learning course [1] for the example of the schematics.

While at it, drop unneeded NULL checks, since GPIO API is tolerant to that.

[1]: 
https://learn.adafruit.com/adafruit-2-8-and-3-2-color-tft-touchscreen-breakout-v2/downloads

Fixes: 92e3e884887c ("Staging: staging: fbtft: Fix GPIO handling")
Fixes: b918d1c27066 ("Staging: staging: fbtft: Fix reset assertion when using 
gpio descriptor")
Fixes: dbc4f989c878 ("Staging: staging: fbtft: Fix probing of gpio descriptor")
Fixes: c440eee1a7a1 ("Staging: staging: fbtft: Switch to the gpio descriptor 
interface")
Cc: Jan Sebastian Götte 
Cc: Nishad Kamdar 
Signed-off-by: Andy Shevchenko 
Reviewed-by: Phil Reid 
---
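As a reference for reviewers, the convention this series moves to, in sketch
form (generic example, not part of the patch): the polarity lives in the
DT/ACPI description (e.g. a reset line flagged active-low there), and the
driver only deals in logical levels.

#include <linux/delay.h>
#include <linux/gpio/consumer.h>

/* Sketch: with descriptors, "1" means "assert"; gpiolib drives the
 * physical level according to the firmware-described polarity, so an
 * active-low RESET# line is driven low here. */
static void example_assert_reset(struct gpio_desc *reset)
{
        gpiod_set_value(reset, 1);      /* assert RESET# */
        usleep_range(20, 40);
        gpiod_set_value(reset, 0);      /* deassert */
        msleep(120);
}
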
 drivers/staging/fbtft/fb_agm1264k-fl.c | 20 ++--
 drivers/staging/fbtft/fb_bd663474.c|  4 
 drivers/staging/fbtft/fb_ili9163.c |  4 
 drivers/staging/fbtft/fb_ili9320.c |  1 -
 drivers/staging/fbtft/fb_ili9325.c |  4 
 drivers/staging/fbtft/fb_ili9340.c |  1 -
 drivers/staging/fbtft/fb_s6d1121.c |  4 
 drivers/staging/fbtft/fb_sh1106.c  |  1 -
 drivers/staging/fbtft/fb_ssd1289.c |  4 
 drivers/staging/fbtft/fb_ssd1325.c |  2 --
 drivers/staging/fbtft/fb_ssd1331.c |  6 ++
 drivers/staging/fbtft/fb_ssd1351.c |  1 -
 drivers/staging/fbtft/fb_upd161704.c   |  4 
 drivers/staging/fbtft/fb_watterott.c   |  1 -
 drivers/staging/fbtft/fbtft-bus.c  |  3 +--
 drivers/staging/fbtft/fbtft-core.c | 13 ++---
 drivers/staging/fbtft/fbtft-io.c   | 12 ++--
 17 files changed, 25 insertions(+), 60 deletions(-)

diff --git a/drivers/staging/fbtft/fb_agm1264k-fl.c 
b/drivers/staging/fbtft/fb_agm1264k-fl.c
index ec97ad27..b545c2ca80a4 100644
--- a/drivers/staging/fbtft/fb_agm1264k-fl.c
+++ b/drivers/staging/fbtft/fb_agm1264k-fl.c
@@ -84,9 +84,9 @@ static void reset(struct fbtft_par *par)
 
dev_dbg(par->info->device, "%s()\n", __func__);
 
-   gpiod_set_value(par->gpio.reset, 0);
-   udelay(20);
gpiod_set_value(par->gpio.reset, 1);
+   udelay(20);
+   gpiod_set_value(par->gpio.reset, 0);
mdelay(120);
 }
 
@@ -194,12 +194,12 @@ static void write_reg8_bus8(struct fbtft_par *par, int 
len, ...)
/* select chip */
if (*buf) {
/* cs1 */
-   gpiod_set_value(par->CS0, 1);
-   gpiod_set_value(par->CS1, 0);
-   } else {
-   /* cs0 */
gpiod_set_value(par->CS0, 0);
gpiod_set_value(par->CS1, 1);
+   } else {
+   /* cs0 */
+   gpiod_set_value(par->CS0, 1);
+   gpiod_set_value(par->CS1, 0);
}
 
gpiod_set_value(par->RS, 0); /* RS->0 (command mode) */
@@ -397,8 +397,8 @@ static int write_vmem(struct fbtft_par *par, size_t offset, 
size_t len)
}
kfree(convert_buf);
 
-   gpiod_set_value(par->CS0, 1);
-   gpiod_set_value(par->CS1, 1);
+   gpiod_set_value(par->CS0, 0);
+   gpiod_set_value(par->CS1, 0);
 
return ret;
 }
@@ -419,10 +419,10 @@ static int write(struct fbtft_par *par, void *buf, size_t 
len)
for (i = 0; i < 8; ++i)
gpiod_set_value(par->gpio.db[i], data & (1 << i));
/* set E */
-   gpiod_set_value(par->EPIN, 1);
+   gpiod_set_value(par->EPIN, 0);
udelay(5);
/* unset E - write */
-   gpiod_set_value(par->EPIN, 0);
+   gpiod_set_value(par->EPIN, 1);
udelay(1);
}
 
diff --git a/drivers/staging/fbtft/fb_bd663474.c 
b/drivers/staging/fbtft/fb_bd663474.c
index e2c7646588f8..1629c2c440a9 100644
--- a/drivers/staging/fbtft/fb_bd663474.c
+++ b/drivers/staging/fbtft/fb_bd663474.c
@@ -12,7 +12,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include "fbtft.h"
@@ -24,9 +23,6 @@
 
 static int init_display(struct fbtft_par *par)
 {
-   if (par->gpio.cs)
-   gpiod_set_value(par->gpio.cs, 0);  /* Activate chip */
-
par->fbtftops.reset(par);
 
/* Initialization sequence from Lib_UTFT */
diff --git a/drivers/staging/fbtft/fb_ili9163.c 
b/drivers

[PATCH v3 2/4] staging: fbtft: Replace custom ->reset() with generic one

2021-04-28 Thread Andy Shevchenko
The custom ->reset() repeats the generic one, replace it.

Note, in newer kernels the context of the function is a sleeping one,
it's fine to switch over to the sleeping functions. Keeping the reset
line asserted longer than 20 microseconds is also okay, it's an idling
state of the hardware.

Signed-off-by: Andy Shevchenko 
---
 drivers/staging/fbtft/fb_agm1264k-fl.c | 14 --
 1 file changed, 14 deletions(-)

diff --git a/drivers/staging/fbtft/fb_agm1264k-fl.c 
b/drivers/staging/fbtft/fb_agm1264k-fl.c
index b545c2ca80a4..207d578547cd 100644
--- a/drivers/staging/fbtft/fb_agm1264k-fl.c
+++ b/drivers/staging/fbtft/fb_agm1264k-fl.c
@@ -77,19 +77,6 @@ static int init_display(struct fbtft_par *par)
return 0;
 }
 
-static void reset(struct fbtft_par *par)
-{
-   if (!par->gpio.reset)
-   return;
-
-   dev_dbg(par->info->device, "%s()\n", __func__);
-
-   gpiod_set_value(par->gpio.reset, 1);
-   udelay(20);
-   gpiod_set_value(par->gpio.reset, 0);
-   mdelay(120);
-}
-
 /* Check if all necessary GPIOS defined */
 static int verify_gpios(struct fbtft_par *par)
 {
@@ -439,7 +426,6 @@ static struct fbtft_display display = {
.set_addr_win = set_addr_win,
.verify_gpios = verify_gpios,
.request_gpios_match = request_gpios_match,
-   .reset = reset,
.write = write,
.write_register = write_reg8_bus8,
.write_vmem = write_vmem,
-- 
2.30.2



[PATCH] drivers: i2c: i2c-core-smbus.c: Fix alignment of comment

2021-04-28 Thread Shubhankar Kuranagatti
Multi-line comments have been aligned starting with a *.
The closing */ has been shifted to a new line.
A single space has been replaced with a tab.
This is done to maintain code uniformity.

Signed-off-by: Shubhankar Kuranagatti 
---
 drivers/i2c/i2c-core-smbus.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/i2c/i2c-core-smbus.c b/drivers/i2c/i2c-core-smbus.c
index d2d32c0fd8c3..205750518c21 100644
--- a/drivers/i2c/i2c-core-smbus.c
+++ b/drivers/i2c/i2c-core-smbus.c
@@ -66,10 +66,11 @@ static inline void i2c_smbus_add_pec(struct i2c_msg *msg)
 }
 
 /* Return <0 on CRC error
-   If there was a write before this read (most cases) we need to take the
-   partial CRC from the write part into account.
-   Note that this function does modify the message (we need to decrease the
-   message length to hide the CRC byte from the caller). */
+ * If there was a write before this read (most cases) we need to take the
+ * partial CRC from the write part into account.
+ * Note that this function does modify the message (we need to decrease the
+ * message length to hide the CRC byte from the caller).
+ */
 static int i2c_smbus_check_pec(u8 cpec, struct i2c_msg *msg)
 {
u8 rpec = msg->buf[--msg->len];
@@ -113,7 +114,7 @@ EXPORT_SYMBOL(i2c_smbus_read_byte);
 s32 i2c_smbus_write_byte(const struct i2c_client *client, u8 value)
 {
return i2c_smbus_xfer(client->adapter, client->addr, client->flags,
- I2C_SMBUS_WRITE, value, I2C_SMBUS_BYTE, NULL);
+   I2C_SMBUS_WRITE, value, I2C_SMBUS_BYTE, NULL);
 }
 EXPORT_SYMBOL(i2c_smbus_write_byte);
 
@@ -387,7 +388,8 @@ static s32 i2c_smbus_xfer_emulated(struct i2c_adapter 
*adapter, u16 addr,
if (read_write == I2C_SMBUS_READ) {
msg[1].flags |= I2C_M_RECV_LEN;
msg[1].len = 1; /* block length will be added by
-  the underlying bus driver */
+* the underlying bus driver
+*/
i2c_smbus_try_get_dmabuf(&msg[1], 0);
} else {
msg[0].len = data->block[0] + 2;
@@ -418,7 +420,8 @@ static s32 i2c_smbus_xfer_emulated(struct i2c_adapter 
*adapter, u16 addr,
 
msg[1].flags |= I2C_M_RECV_LEN;
msg[1].len = 1; /* block length will be added by
-  the underlying bus driver */
+* the underlying bus driver
+*/
i2c_smbus_try_get_dmabuf(&msg[1], 0);
break;
case I2C_SMBUS_I2C_BLOCK_DATA:
-- 
2.17.1
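
For reference, this is the multi-line comment shape the patch converts to,
as described in the kernel's coding-style documentation (the comment text
itself is only an illustration):

/*
 * The preferred kernel style for long (multi-line) comments:
 * the opening slash-star goes on a line of its own, each
 * following line starts with an aligned asterisk, and the
 * closing star-slash also gets a line of its own.
 */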



[Bug 212871] AMD Radeon Pro VEGA 20 (Aka Vega12) - Glitch and freeze on any kernel and/or distro.

2021-04-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=212871

Alex Deucher (alexdeuc...@gmail.com) changed:

   What|Removed |Added

 CC||alexdeuc...@gmail.com

--- Comment #1 from Alex Deucher (alexdeuc...@gmail.com) ---
amdgpu pro uses the same driver as upstream, just packaged so that you can
install it on enterprise distros, so the code is the same.  What driver package
version did you use?  What upstream kernels have you tried?  Please include the
dmesg output from the working and non-working cases.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 212871] AMD Radeon Pro VEGA 20 (Aka Vega12) - Glitch and freeze on any kernel and/or distro.

2021-04-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=212871

--- Comment #3 from Alex Deucher (alexdeuc...@gmail.com) ---
Note that you don't need to file two bugs.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 212871] AMD Radeon Pro VEGA 20 (Aka Vega12) - Glitch and freeze on any kernel and/or distro.

2021-04-28 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=212871

--- Comment #2 from Alex Deucher (alexdeuc...@gmail.com) ---
Also filed as: https://gitlab.freedesktop.org/drm/amd/-/issues/1582

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Christian König

Am 28.04.21 um 14:26 schrieb Daniel Vetter:

On Wed, Apr 28, 2021 at 02:21:54PM +0200, Daniel Vetter wrote:

On Wed, Apr 28, 2021 at 12:31:09PM +0200, Christian König wrote:

Am 28.04.21 um 12:05 schrieb Daniel Vetter:

On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:

On Tue, Apr 27, 2021 at 1:35 PM Simon Ser  wrote:

On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach  
wrote:


Ok. So that would only make the following use cases broken for now:

- amd render -> external gpu
- amd video encode -> network device

FWIW, "only" breaking amd render -> external gpu will make us pretty
unhappy

I concur. I have quite a few users with a multi-GPU setup involving
AMD hardware.

Note, if this brokenness can't be avoided, I'd prefer to get a clear
error, and not bad results on screen because nothing is synchronized
anymore.

It's an upcoming requirement for windows[1], so you are likely to
start seeing this across all GPU vendors that support windows.  I
think the timing depends on how quickly the legacy hardware support
sticks around for each vendor.

Yeah but hw scheduling doesn't mean the hw has to be constructed to not
support isolating the ringbuffer at all.

E.g. even if the hw loses the bit to put the ringbuffer outside of the
userspace gpu vm, if you have pagetables I'm seriously hoping you have r/o
pte flags. Otherwise the entire "share address space with cpu side,
seamlessly" thing is out of the window.

And with that r/o bit on the ringbuffer you can once more force submit
through kernel space, and all the legacy dma_fence based stuff keeps
working. And we don't have to invent some horrendous userspace fence based
implicit sync mechanism in the kernel, but can instead do this transition
properly with drm_syncobj timeline explicit sync and protocol reving.

At least I think you'd have to work extra hard to create a gpu which
cannot possibly be intercepted by the kernel, even when it's designed to
support userspace direct submit only.

Or are your hw engineers more creative here and we're screwed?

The upcoming hardware generation will have this hardware scheduler as a
must have, but there are certain ways we can still stick to the old
approach:

1. The new hardware scheduler currently still supports kernel queues which
essentially is the same as the old hardware ring buffer.

2. Mapping the top level ring buffer into the VM at least partially solves
the problem. This way you can't manipulate the ring buffer content, but the
location for the fence must still be writeable.

Yeah allowing userspace to lie about completion fences in this model is
ok. Though I haven't thought through full consequences of that, but I
think it's not any worse than userspace lying about which buffers/address
it uses in the current model - we rely on hw vm ptes to catch that stuff.

Also it might be good to switch to a non-recoverable ctx model for these.
That's already what we do in i915 (opt-in, but all current umd use that
mode). So any hang/watchdog just kills the entire ctx and you don't have
to worry about userspace doing something funny with its ringbuffer.
Simplifies everything.

Also ofc userspace fencing still disallowed, but since userspace would
queue up all writes to its ringbuffer through the drm/scheduler, we'd
handle dependencies through that still. Not great, but workable.

Thinking about this, not even mapping the ringbuffer r/o is required, it's
just that we must queue things through the kernel to resolve dependencies
and everything without breaking dma_fence. If userspace lies, tdr will
shoot it and the kernel stops running that context entirely.


Thinking more about that approach I don't think that it will work correctly.

See we not only need to write the fence as signal that an IB is 
submitted, but also adjust a bunch of privileged hardware registers.


When userspace could do that from its IBs as well then there is nothing 
blocking it from reprogramming the page table base address for example.


We could do those writes with the CPU as well, but that would be a huge 
performance drop because of the additional latency.


Christian.



So I think even if we have hw with 100% userspace submit model only we
should be still fine. It's ofc silly, because instead of using userspace
fences and gpu semaphores the hw scheduler understands we still take the
detour through drm/scheduler, but at least it's not a break-the-world
event.

Also no page fault support, userptr invalidates still stall until
end-of-batch instead of just preempting it, and all that too. But I mean
there needs to be some motivation to fix this and roll out explicit sync
:-)
-Daniel


Or do I miss something here?


For now and the next hardware we are save to support the old submission
model, but the functionality of kernel queues will sooner or later go away
if it is only for Linux.

So we need to work on something which works in the long term and get us away
from this implicit sync.

Yeah I think we have pretty clear consensus

[PATCHv3 1/6] drm: drm_bridge: add connector_attach/detach bridge ops

2021-04-28 Thread Hans Verkuil
Add bridge connector_attach/detach ops. These ops are called when a
bridge is attached or detached to a drm_connector. These ops can be
used to register and unregister an HDMI CEC adapter for a bridge that
supports CEC.

Signed-off-by: Hans Verkuil 
---
 drivers/gpu/drm/drm_bridge_connector.c | 25 +++-
 include/drm/drm_bridge.h   | 27 ++
 2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_bridge_connector.c 
b/drivers/gpu/drm/drm_bridge_connector.c
index 791379816837..0676677badfe 100644
--- a/drivers/gpu/drm/drm_bridge_connector.c
+++ b/drivers/gpu/drm/drm_bridge_connector.c
@@ -203,6 +203,11 @@ static void drm_bridge_connector_destroy(struct 
drm_connector *connector)
 {
struct drm_bridge_connector *bridge_connector =
to_drm_bridge_connector(connector);
+   struct drm_bridge *bridge;
+
+   drm_for_each_bridge_in_chain(bridge_connector->encoder, bridge)
+   if (bridge->funcs->connector_detach)
+   bridge->funcs->connector_detach(bridge, connector);
 
if (bridge_connector->bridge_hpd) {
struct drm_bridge *hpd = bridge_connector->bridge_hpd;
@@ -318,6 +323,7 @@ struct drm_connector *drm_bridge_connector_init(struct 
drm_device *drm,
struct i2c_adapter *ddc = NULL;
struct drm_bridge *bridge;
int connector_type;
+   int ret;
 
bridge_connector = kzalloc(sizeof(*bridge_connector), GFP_KERNEL);
if (!bridge_connector)
@@ -375,6 +381,23 @@ struct drm_connector *drm_bridge_connector_init(struct 
drm_device *drm,
connector->polled = DRM_CONNECTOR_POLL_CONNECT
  | DRM_CONNECTOR_POLL_DISCONNECT;
 
-   return connector;
+   ret = 0;
+   /* call connector_attach for all bridges */
+   drm_for_each_bridge_in_chain(encoder, bridge) {
+   if (!bridge->funcs->connector_attach)
+   continue;
+   ret = bridge->funcs->connector_attach(bridge, connector);
+   if (ret)
+   break;
+   }
+   if (!ret)
+   return connector;
+
+   /* on error, detach any previously successfully attached connectors */
+   list_for_each_entry_continue_reverse(bridge, &(encoder)->bridge_chain,
+chain_node)
+   if (bridge->funcs->connector_detach)
+   bridge->funcs->connector_detach(bridge, connector);
+   return ERR_PTR(ret);
 }
 EXPORT_SYMBOL_GPL(drm_bridge_connector_init);
diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
index 2195daa289d2..333fbc3a03e9 100644
--- a/include/drm/drm_bridge.h
+++ b/include/drm/drm_bridge.h
@@ -629,6 +629,33 @@ struct drm_bridge_funcs {
 * the DRM_BRIDGE_OP_HPD flag in their &drm_bridge->ops.
 */
void (*hpd_disable)(struct drm_bridge *bridge);
+
+   /**
+* @connector_attach:
+*
+* This callback is invoked whenever our bridge is being attached to a
+* &drm_connector. This is where an HDMI CEC adapter can be registered.
+*
+* The @connector_attach callback is optional.
+*
+* RETURNS:
+*
+* Zero on success, error code on failure.
+*/
+   int (*connector_attach)(struct drm_bridge *bridge,
+   struct drm_connector *conn);
+
+   /**
+* @connector_detach:
+*
+* This callback is invoked whenever our bridge is being detached from a
+* &drm_connector. This is where an HDMI CEC adapter can be
+* unregistered.
+*
+* The @connector_detach callback is optional.
+*/
+   void (*connector_detach)(struct drm_bridge *bridge,
+struct drm_connector *conn);
 };
 
 /**
-- 
2.30.2
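
For a driver other than the omapdrm ones converted later in this series,
wiring up the new callbacks looks roughly like this (all 'my_bridge' names
and the CEC helpers are made-up placeholders, not an existing driver):

static int my_bridge_connector_attach(struct drm_bridge *bridge,
				      struct drm_connector *conn)
{
	struct my_bridge *ctx = bridge_to_my_bridge(bridge);

	/* Register resources tied to the connector, e.g. a CEC adapter. */
	return my_bridge_cec_register(ctx, conn);
}

static void my_bridge_connector_detach(struct drm_bridge *bridge,
				       struct drm_connector *conn)
{
	struct my_bridge *ctx = bridge_to_my_bridge(bridge);

	/* Undo whatever connector_attach set up. */
	my_bridge_cec_unregister(ctx);
}

static const struct drm_bridge_funcs my_bridge_funcs = {
	.attach		  = my_bridge_attach,	/* existing callback, unchanged */
	.connector_attach = my_bridge_connector_attach,
	.connector_detach = my_bridge_connector_detach,
};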



[PATCHv3 2/6] drm/omapdrm/dss/hdmi4: switch to the connector bridge ops

2021-04-28 Thread Hans Verkuil
Implement the new connector_attach/detach bridge ops. This makes it
possible to associate a CEC adapter with a drm connector, which helps
userspace determine which cec device node belongs to which drm connector.

Signed-off-by: Hans Verkuil 
Reviewed-by: Tomi Valkeinen 
---
 drivers/gpu/drm/omapdrm/dss/hdmi4.c | 28 ++---
 drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c |  9 +---
 drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h |  7 ---
 3 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4.c 
b/drivers/gpu/drm/omapdrm/dss/hdmi4.c
index 35b750cebaeb..e29d4d186265 100644
--- a/drivers/gpu/drm/omapdrm/dss/hdmi4.c
+++ b/drivers/gpu/drm/omapdrm/dss/hdmi4.c
@@ -482,6 +482,23 @@ static struct edid *hdmi4_bridge_get_edid(struct 
drm_bridge *bridge,
return edid;
 }
 
+static int hdmi4_bridge_connector_attach(struct drm_bridge *bridge,
+struct drm_connector *conn)
+{
+   struct omap_hdmi *hdmi = drm_bridge_to_hdmi(bridge);
+
+   hdmi4_cec_init(hdmi->pdev, &hdmi->core, &hdmi->wp, conn);
+   return 0;
+}
+
+static void hdmi4_bridge_connector_detach(struct drm_bridge *bridge,
+ struct drm_connector *conn)
+{
+   struct omap_hdmi *hdmi = drm_bridge_to_hdmi(bridge);
+
+   hdmi4_cec_uninit(&hdmi->core);
+}
+
 static const struct drm_bridge_funcs hdmi4_bridge_funcs = {
.attach = hdmi4_bridge_attach,
.mode_set = hdmi4_bridge_mode_set,
@@ -492,6 +509,8 @@ static const struct drm_bridge_funcs hdmi4_bridge_funcs = {
.atomic_disable = hdmi4_bridge_disable,
.hpd_notify = hdmi4_bridge_hpd_notify,
.get_edid = hdmi4_bridge_get_edid,
+   .connector_attach = hdmi4_bridge_connector_attach,
+   .connector_detach = hdmi4_bridge_connector_detach,
 };
 
 static void hdmi4_bridge_init(struct omap_hdmi *hdmi)
@@ -647,14 +666,10 @@ static int hdmi4_bind(struct device *dev, struct device 
*master, void *data)
if (r)
goto err_runtime_put;
 
-   r = hdmi4_cec_init(hdmi->pdev, &hdmi->core, &hdmi->wp);
-   if (r)
-   goto err_pll_uninit;
-
r = hdmi_audio_register(hdmi);
if (r) {
DSSERR("Registering HDMI audio failed\n");
-   goto err_cec_uninit;
+   goto err_pll_uninit;
}
 
hdmi->debugfs = dss_debugfs_create_file(dss, "hdmi", hdmi_dump_regs,
@@ -664,8 +679,6 @@ static int hdmi4_bind(struct device *dev, struct device 
*master, void *data)
 
return 0;
 
-err_cec_uninit:
-   hdmi4_cec_uninit(&hdmi->core);
 err_pll_uninit:
hdmi_pll_uninit(&hdmi->pll);
 err_runtime_put:
@@ -682,7 +695,6 @@ static void hdmi4_unbind(struct device *dev, struct device 
*master, void *data)
if (hdmi->audio_pdev)
platform_device_unregister(hdmi->audio_pdev);
 
-   hdmi4_cec_uninit(&hdmi->core);
hdmi_pll_uninit(&hdmi->pll);
 }
 
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c 
b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c
index 43592c1cf081..80ec52c9c846 100644
--- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c
+++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c
@@ -335,10 +335,10 @@ void hdmi4_cec_set_phys_addr(struct hdmi_core_data *core, 
u16 pa)
 }
 
 int hdmi4_cec_init(struct platform_device *pdev, struct hdmi_core_data *core,
- struct hdmi_wp_data *wp)
+  struct hdmi_wp_data *wp, struct drm_connector *conn)
 {
-   const u32 caps = CEC_CAP_TRANSMIT | CEC_CAP_LOG_ADDRS |
-CEC_CAP_PASSTHROUGH | CEC_CAP_RC;
+   const u32 caps = CEC_CAP_DEFAULTS | CEC_CAP_CONNECTOR_INFO;
+   struct cec_connector_info conn_info;
int ret;
 
core->adap = cec_allocate_adapter(&hdmi_cec_adap_ops, core,
@@ -346,6 +346,8 @@ int hdmi4_cec_init(struct platform_device *pdev, struct 
hdmi_core_data *core,
ret = PTR_ERR_OR_ZERO(core->adap);
if (ret < 0)
return ret;
+   cec_fill_conn_info_from_drm(&conn_info, conn);
+   cec_s_conn_info(core->adap, &conn_info);
core->wp = wp;
 
/* Disable clock initially, hdmi_cec_adap_enable() manages it */
@@ -354,6 +356,7 @@ int hdmi4_cec_init(struct platform_device *pdev, struct 
hdmi_core_data *core,
ret = cec_register_adapter(core->adap, &pdev->dev);
if (ret < 0) {
cec_delete_adapter(core->adap);
+   core->adap = NULL;
return ret;
}
return 0;
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h 
b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h
index 0292337c97cc..b59a54c3040e 100644
--- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h
+++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h
@@ -29,7 +29,7 @@ struct platform_device;
 void hdmi4_cec_set_phys_addr(struct hdmi_core_data *core, u16 pa);
 void hdmi4_cec_irq(struct hdmi_core_data *core);
 int hdmi4_cec_init(struct platform_device *pdev, struct hd

[PATCHv3 6/6] drm/omapdrm/dss/hdmi5: add CEC support

2021-04-28 Thread Hans Verkuil
Add HDMI CEC support for OMAP5.

Signed-off-by: Hans Verkuil 
Reviewed-by: Tomi Valkeinen 
---
 drivers/gpu/drm/omapdrm/Kconfig  |   8 +
 drivers/gpu/drm/omapdrm/Makefile |   1 +
 drivers/gpu/drm/omapdrm/dss/hdmi.h   |   1 +
 drivers/gpu/drm/omapdrm/dss/hdmi5.c  |  64 +--
 drivers/gpu/drm/omapdrm/dss/hdmi5_cec.c  | 209 +++
 drivers/gpu/drm/omapdrm/dss/hdmi5_cec.h  |  42 +
 drivers/gpu/drm/omapdrm/dss/hdmi5_core.c |  35 +++-
 drivers/gpu/drm/omapdrm/dss/hdmi5_core.h |  33 +++-
 8 files changed, 374 insertions(+), 19 deletions(-)
 create mode 100644 drivers/gpu/drm/omapdrm/dss/hdmi5_cec.c
 create mode 100644 drivers/gpu/drm/omapdrm/dss/hdmi5_cec.h

diff --git a/drivers/gpu/drm/omapdrm/Kconfig b/drivers/gpu/drm/omapdrm/Kconfig
index e7281da5bc6a..08866ac7d869 100644
--- a/drivers/gpu/drm/omapdrm/Kconfig
+++ b/drivers/gpu/drm/omapdrm/Kconfig
@@ -80,6 +80,14 @@ config OMAP5_DSS_HDMI
  Definition Multimedia Interface. See http://www.hdmi.org/ for HDMI
  specification.
 
+config OMAP5_DSS_HDMI_CEC
+   bool "Enable HDMI CEC support for OMAP5"
+   depends on OMAP5_DSS_HDMI
+   select CEC_CORE
+   default y
+   help
+ When selected the HDMI transmitter will support the CEC feature.
+
 config OMAP2_DSS_SDI
bool "SDI support"
default n
diff --git a/drivers/gpu/drm/omapdrm/Makefile b/drivers/gpu/drm/omapdrm/Makefile
index 21e8277ff88f..0732bd2dae1e 100644
--- a/drivers/gpu/drm/omapdrm/Makefile
+++ b/drivers/gpu/drm/omapdrm/Makefile
@@ -29,6 +29,7 @@ omapdrm-$(CONFIG_OMAP2_DSS_HDMI_COMMON) += dss/hdmi_common.o 
dss/hdmi_wp.o \
 omapdrm-$(CONFIG_OMAP4_DSS_HDMI) += dss/hdmi4.o dss/hdmi4_core.o
 omapdrm-$(CONFIG_OMAP4_DSS_HDMI_CEC) += dss/hdmi4_cec.o
 omapdrm-$(CONFIG_OMAP5_DSS_HDMI) += dss/hdmi5.o dss/hdmi5_core.o
+omapdrm-$(CONFIG_OMAP5_DSS_HDMI_CEC) += dss/hdmi5_cec.o
 ccflags-$(CONFIG_OMAP2_DSS_DEBUG) += -DDEBUG
 
 obj-$(CONFIG_DRM_OMAP) += omapdrm.o
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi.h 
b/drivers/gpu/drm/omapdrm/dss/hdmi.h
index c4a4e07f0b99..72d8ae441da6 100644
--- a/drivers/gpu/drm/omapdrm/dss/hdmi.h
+++ b/drivers/gpu/drm/omapdrm/dss/hdmi.h
@@ -261,6 +261,7 @@ struct hdmi_core_data {
struct hdmi_wp_data *wp;
unsigned int core_pwr_cnt;
struct cec_adapter *adap;
+   struct clk *cec_clk;
 };
 
 static inline void hdmi_write_reg(void __iomem *base_addr, const u32 idx,
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi5.c 
b/drivers/gpu/drm/omapdrm/dss/hdmi5.c
index 65085d886da5..11941d7b1d81 100644
--- a/drivers/gpu/drm/omapdrm/dss/hdmi5.c
+++ b/drivers/gpu/drm/omapdrm/dss/hdmi5.c
@@ -29,12 +29,14 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
 
 #include "omapdss.h"
 #include "hdmi5_core.h"
+#include "hdmi5_cec.h"
 #include "dss.h"
 
 static int hdmi_runtime_get(struct omap_hdmi *hdmi)
@@ -105,6 +107,9 @@ static irqreturn_t hdmi_irq_handler(int irq, void *data)
hdmi_wp_set_phy_pwr(wp, HDMI_PHYPWRCMD_LDOON);
}
 
+   if (irqstatus & HDMI_IRQ_CORE)
+   hdmi5_core_handle_irqs(&hdmi->core);
+
return IRQ_HANDLED;
 }
 
@@ -112,9 +117,12 @@ static int hdmi_power_on_core(struct omap_hdmi *hdmi)
 {
int r;
 
+   if (hdmi->core.core_pwr_cnt++)
+   return 0;
+
r = regulator_enable(hdmi->vdda_reg);
if (r)
-   return r;
+   goto err_reg_enable;
 
r = hdmi_runtime_get(hdmi);
if (r)
@@ -129,12 +137,17 @@ static int hdmi_power_on_core(struct omap_hdmi *hdmi)
 
 err_runtime_get:
regulator_disable(hdmi->vdda_reg);
+err_reg_enable:
+   hdmi->core.core_pwr_cnt--;
 
return r;
 }
 
 static void hdmi_power_off_core(struct omap_hdmi *hdmi)
 {
+   if (--hdmi->core.core_pwr_cnt)
+   return;
+
hdmi->core_enabled = false;
 
hdmi_runtime_put(hdmi);
@@ -168,9 +181,9 @@ static int hdmi_power_on_full(struct omap_hdmi *hdmi)
pc, &hdmi_cinfo);
 
/* disable and clear irqs */
-   hdmi_wp_clear_irqenable(&hdmi->wp, 0x);
+   hdmi_wp_clear_irqenable(&hdmi->wp, ~HDMI_IRQ_CORE);
hdmi_wp_set_irqstatus(&hdmi->wp,
-   hdmi_wp_get_irqstatus(&hdmi->wp));
+   hdmi_wp_get_irqstatus(&hdmi->wp) & ~HDMI_IRQ_CORE);
 
r = dss_pll_enable(&hdmi->pll.pll);
if (r) {
@@ -225,7 +238,7 @@ static int hdmi_power_on_full(struct omap_hdmi *hdmi)
 
 static void hdmi_power_off_full(struct omap_hdmi *hdmi)
 {
-   hdmi_wp_clear_irqenable(&hdmi->wp, 0x);
+   hdmi_wp_clear_irqenable(&hdmi->wp, ~HDMI_IRQ_CORE);
 
hdmi_wp_video_stop(&hdmi->wp);
 
@@ -273,11 +286,11 @@ static void hdmi_stop_audio_stream(struct omap_hdmi *hd)
REG_FLD_MOD(hd->wp.base, HDMI_WP_SYSCONFIG, hd->wp_idlemode, 3, 2);
 }
 
-static int hdmi_core_enable(struct omap_hdmi *hdmi)
+int hdmi5_core_enable(struct omap_hdmi *hdmi)
 {
int r = 0;

[PATCHv3 4/6] dt-bindings: display: ti: ti, omap5-dss.txt: add cec clock

2021-04-28 Thread Hans Verkuil
The cec clock is required as well in order to support HDMI CEC,
document this.

Signed-off-by: Hans Verkuil 
Reviewed-by: Tomi Valkeinen 
Acked-by: Rob Herring 
---
 Documentation/devicetree/bindings/display/ti/ti,omap5-dss.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/devicetree/bindings/display/ti/ti,omap5-dss.txt 
b/Documentation/devicetree/bindings/display/ti/ti,omap5-dss.txt
index 20861218649f..c321c67472f0 100644
--- a/Documentation/devicetree/bindings/display/ti/ti,omap5-dss.txt
+++ b/Documentation/devicetree/bindings/display/ti/ti,omap5-dss.txt
@@ -89,8 +89,8 @@ Required properties:
 - interrupts: the HDMI interrupt line
 - ti,hwmods: "dss_hdmi"
 - vdda-supply: vdda power supply
-- clocks: handles to fclk and pll clock
-- clock-names: "fck", "sys_clk"
+- clocks: handles to fclk, pll and cec clock
+- clock-names: "fck", "sys_clk", "cec"
 
 Optional nodes:
 - Video port for HDMI output
-- 
2.30.2



[PATCHv3 3/6] drm/omapdrm/dss/hdmi4: simplify CEC Phys Addr handling

2021-04-28 Thread Hans Verkuil
Switch to using cec_s_phys_addr_from_edid() instead of a two-step process
of calling cec_get_edid_phys_addr() followed by cec_s_phys_addr().

Signed-off-by: Hans Verkuil 
Reviewed-by: Tomi Valkeinen 
---
 drivers/gpu/drm/omapdrm/dss/hdmi4.c | 13 ++---
 drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c |  4 ++--
 drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h |  5 +++--
 3 files changed, 7 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4.c 
b/drivers/gpu/drm/omapdrm/dss/hdmi4.c
index e29d4d186265..40f791c668f4 100644
--- a/drivers/gpu/drm/omapdrm/dss/hdmi4.c
+++ b/drivers/gpu/drm/omapdrm/dss/hdmi4.c
@@ -432,7 +432,7 @@ static void hdmi4_bridge_hpd_notify(struct drm_bridge 
*bridge,
struct omap_hdmi *hdmi = drm_bridge_to_hdmi(bridge);
 
if (status == connector_status_disconnected)
-   hdmi4_cec_set_phys_addr(&hdmi->core, CEC_PHYS_ADDR_INVALID);
+   hdmi4_cec_set_phys_addr(&hdmi->core, NULL);
 }
 
 static struct edid *hdmi4_bridge_get_edid(struct drm_bridge *bridge,
@@ -440,7 +440,6 @@ static struct edid *hdmi4_bridge_get_edid(struct drm_bridge 
*bridge,
 {
struct omap_hdmi *hdmi = drm_bridge_to_hdmi(bridge);
struct edid *edid = NULL;
-   unsigned int cec_addr;
bool need_enable;
int r;
 
@@ -466,15 +465,7 @@ static struct edid *hdmi4_bridge_get_edid(struct 
drm_bridge *bridge,
hdmi_runtime_put(hdmi);
mutex_unlock(&hdmi->lock);
 
-   if (edid && edid->extensions) {
-   unsigned int len = (edid->extensions + 1) * EDID_LENGTH;
-
-   cec_addr = cec_get_edid_phys_addr((u8 *)edid, len, NULL);
-   } else {
-   cec_addr = CEC_PHYS_ADDR_INVALID;
-   }
-
-   hdmi4_cec_set_phys_addr(&hdmi->core, cec_addr);
+   hdmi4_cec_set_phys_addr(&hdmi->core, edid);
 
if (need_enable)
hdmi4_core_disable(&hdmi->core);
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c 
b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c
index 80ec52c9c846..cf406d86c845 100644
--- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c
+++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c
@@ -329,9 +329,9 @@ static const struct cec_adap_ops hdmi_cec_adap_ops = {
.adap_transmit = hdmi_cec_adap_transmit,
 };
 
-void hdmi4_cec_set_phys_addr(struct hdmi_core_data *core, u16 pa)
+void hdmi4_cec_set_phys_addr(struct hdmi_core_data *core, struct edid *edid)
 {
-   cec_s_phys_addr(core->adap, pa, false);
+   cec_s_phys_addr_from_edid(core->adap, edid);
 }
 
 int hdmi4_cec_init(struct platform_device *pdev, struct hdmi_core_data *core,
diff --git a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h 
b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h
index b59a54c3040e..16bf259643b7 100644
--- a/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h
+++ b/drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h
@@ -26,13 +26,14 @@ struct platform_device;
 
 /* HDMI CEC funcs */
 #ifdef CONFIG_OMAP4_DSS_HDMI_CEC
-void hdmi4_cec_set_phys_addr(struct hdmi_core_data *core, u16 pa);
+void hdmi4_cec_set_phys_addr(struct hdmi_core_data *core, struct edid *edid);
 void hdmi4_cec_irq(struct hdmi_core_data *core);
 int hdmi4_cec_init(struct platform_device *pdev, struct hdmi_core_data *core,
   struct hdmi_wp_data *wp, struct drm_connector *conn);
 void hdmi4_cec_uninit(struct hdmi_core_data *core);
 #else
-static inline void hdmi4_cec_set_phys_addr(struct hdmi_core_data *core, u16 pa)
+static inline void hdmi4_cec_set_phys_addr(struct hdmi_core_data *core,
+  struct edid *edid)
 {
 }
 
-- 
2.30.2
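
The helper folds the EDID parsing and the invalid-address case into one
call; the resulting usage pattern in a hotplug path is simply (assuming
'adap' is the CEC adapter and 'edid' the EDID just read):

	/* On hotplug / successful EDID read: */
	cec_s_phys_addr_from_edid(adap, edid);

	/* On disconnect: passing NULL sets CEC_PHYS_ADDR_INVALID internally. */
	cec_s_phys_addr_from_edid(adap, NULL);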



[PATCHv3 5/6] dra7.dtsi/omap5.dtsi: add cec clock

2021-04-28 Thread Hans Verkuil
Add cec clock to the dra7 and omap5 device trees.

Signed-off-by: Hans Verkuil 
Acked-by: Tony Lindgren 
Reviewed-by: Tomi Valkeinen 
---
 arch/arm/boot/dts/dra7.dtsi  | 5 +++--
 arch/arm/boot/dts/omap5.dtsi | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index ce1194744f84..efe579ddb324 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -879,8 +879,9 @@ hdmi: encoder@0 {
interrupts = ;
status = "disabled";
clocks = <&dss_clkctrl 
DRA7_DSS_DSS_CORE_CLKCTRL 9>,
-<&dss_clkctrl 
DRA7_DSS_DSS_CORE_CLKCTRL 10>;
-   clock-names = "fck", "sys_clk";
+<&dss_clkctrl 
DRA7_DSS_DSS_CORE_CLKCTRL 10>,
+<&dss_clkctrl 
DRA7_DSS_DSS_CORE_CLKCTRL 11>;
+   clock-names = "fck", "sys_clk", 
"cec";
dmas = <&sdma_xbar 76>;
dma-names = "audio_tx";
};
diff --git a/arch/arm/boot/dts/omap5.dtsi b/arch/arm/boot/dts/omap5.dtsi
index e025b7c9a357..6726e1f1b07c 100644
--- a/arch/arm/boot/dts/omap5.dtsi
+++ b/arch/arm/boot/dts/omap5.dtsi
@@ -586,8 +586,9 @@ hdmi: encoder@0 {
interrupts = ;
status = "disabled";
clocks = <&dss_clkctrl 
OMAP5_DSS_CORE_CLKCTRL 9>,
-<&dss_clkctrl 
OMAP5_DSS_CORE_CLKCTRL 10>;
-   clock-names = "fck", "sys_clk";
+<&dss_clkctrl 
OMAP5_DSS_CORE_CLKCTRL 10>,
+<&dss_clkctrl 
OMAP5_DSS_CORE_CLKCTRL 11>;
+   clock-names = "fck", "sys_clk", 
"cec";
dmas = <&sdma 76>;
dma-names = "audio_tx";
};
-- 
2.30.2



[PATCHv3 0/6] drm/omap: hdmi: improve hdmi4 CEC, add CEC for hdmi5

2021-04-28 Thread Hans Verkuil
This series improves the drm_bridge support for CEC by introducing two
new bridge ops in the first patch, and using those in the second patch.

This makes it possible to call cec_s_conn_info() and set
CEC_CAP_CONNECTOR_INFO for the CEC adapter, so userspace can associate
the CEC adapter with the corresponding DRM connector.

The third patch simplifies CEC physical address handling by using the
cec_s_phys_addr_from_edid helper function that didn't exist when this
code was originally written.

The fourth patch adds the cec clock to ti,omap5-dss.txt.

The fifth patch the missing cec clock to the dra7 and omap5 device tree,
and the last patch adds CEC support to the OMAP5 driver.

Tested with a Pandaboard and a Beagle X15 board.

Regards,

Hans

Changes since v2:

- connector_attach can now return an error. If an error is
  returned then connector_detach is called in reverse order
  to clean up any previous connector_attach calls.

- connector_attach in hdmi4 and hdmi5 now return 0.

Changes since v1:

- as per suggestion from Laurent, changed cec_init/exit to
  connector_attach/_detach which are just called for all
  bridges. The DRM_BRIDGE_OP_CEC was dropped.

- added patch to add the cec clock to ti,omap5-dss.txt

- swapped the order of the last two patches

- incorporated Tomi's suggestions for the hdmi5 CEC support.

Hans Verkuil (6):
  drm: drm_bridge: add connector_attach/detach bridge ops
  drm/omapdrm/dss/hdmi4: switch to the connector bridge ops
  drm/omapdrm/dss/hdmi4: simplify CEC Phys Addr handling
  dt-bindings: display: ti: ti,omap5-dss.txt: add cec clock
  dra7.dtsi/omap5.dtsi: add cec clock
  drm/omapdrm/dss/hdmi5: add CEC support

 .../bindings/display/ti/ti,omap5-dss.txt  |   4 +-
 arch/arm/boot/dts/dra7.dtsi   |   5 +-
 arch/arm/boot/dts/omap5.dtsi  |   5 +-
 drivers/gpu/drm/drm_bridge_connector.c|  25 ++-
 drivers/gpu/drm/omapdrm/Kconfig   |   8 +
 drivers/gpu/drm/omapdrm/Makefile  |   1 +
 drivers/gpu/drm/omapdrm/dss/hdmi.h|   1 +
 drivers/gpu/drm/omapdrm/dss/hdmi4.c   |  41 ++--
 drivers/gpu/drm/omapdrm/dss/hdmi4_cec.c   |  13 +-
 drivers/gpu/drm/omapdrm/dss/hdmi4_cec.h   |  12 +-
 drivers/gpu/drm/omapdrm/dss/hdmi5.c   |  64 +-
 drivers/gpu/drm/omapdrm/dss/hdmi5_cec.c   | 209 ++
 drivers/gpu/drm/omapdrm/dss/hdmi5_cec.h   |  42 
 drivers/gpu/drm/omapdrm/dss/hdmi5_core.c  |  35 ++-
 drivers/gpu/drm/omapdrm/dss/hdmi5_core.h  |  33 ++-
 include/drm/drm_bridge.h  |  27 +++
 16 files changed, 470 insertions(+), 55 deletions(-)
 create mode 100644 drivers/gpu/drm/omapdrm/dss/hdmi5_cec.c
 create mode 100644 drivers/gpu/drm/omapdrm/dss/hdmi5_cec.h

-- 
2.30.2
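
With CEC_CAP_CONNECTOR_INFO set, userspace can then map a /dev/cecX node
back to its DRM connector along these lines (a minimal sketch, error
handling trimmed and the device path hard-coded for the example):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/cec.h>

int main(void)
{
	struct cec_caps caps;
	struct cec_connector_info conn;
	int fd = open("/dev/cec0", O_RDWR);

	if (fd < 0)
		return 1;

	if (ioctl(fd, CEC_ADAP_G_CAPS, &caps) == 0 &&
	    (caps.capabilities & CEC_CAP_CONNECTOR_INFO) &&
	    ioctl(fd, CEC_ADAP_G_CONNECTOR_INFO, &conn) == 0 &&
	    conn.type == CEC_CONNECTOR_TYPE_DRM)
		printf("DRM card %u, connector id %u\n",
		       conn.drm.card_no, conn.drm.connector_id);

	return 0;
}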



Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Daniel Vetter
On Wed, Apr 28, 2021 at 03:11:27PM +0200, Christian König wrote:
> Am 28.04.21 um 14:26 schrieb Daniel Vetter:
> > On Wed, Apr 28, 2021 at 02:21:54PM +0200, Daniel Vetter wrote:
> > > On Wed, Apr 28, 2021 at 12:31:09PM +0200, Christian König wrote:
> > > > Am 28.04.21 um 12:05 schrieb Daniel Vetter:
> > > > > On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:
> > > > > > On Tue, Apr 27, 2021 at 1:35 PM Simon Ser  
> > > > > > wrote:
> > > > > > > On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach 
> > > > > > >  wrote:
> > > > > > > 
> > > > > > > > > Ok. So that would only make the following use cases broken 
> > > > > > > > > for now:
> > > > > > > > > 
> > > > > > > > > - amd render -> external gpu
> > > > > > > > > - amd video encode -> network device
> > > > > > > > FWIW, "only" breaking amd render -> external gpu will make us 
> > > > > > > > pretty
> > > > > > > > unhappy
> > > > > > > I concur. I have quite a few users with a multi-GPU setup 
> > > > > > > involving
> > > > > > > AMD hardware.
> > > > > > > 
> > > > > > > Note, if this brokenness can't be avoided, I'd prefer to get a 
> > > > > > > clear
> > > > > > > error, and not bad results on screen because nothing is 
> > > > > > > synchronized
> > > > > > > anymore.
> > > > > > It's an upcoming requirement for windows[1], so you are likely to
> > > > > > start seeing this across all GPU vendors that support windows.  I
> > > > > > think the timing depends on how quickly the legacy hardware support
> > > > > > sticks around for each vendor.
> > > > > Yeah but hw scheduling doesn't mean the hw has to be constructed to 
> > > > > not
> > > > > support isolating the ringbuffer at all.
> > > > > 
> > > > > E.g. even if the hw loses the bit to put the ringbuffer outside of the
> > > > > userspace gpu vm, if you have pagetables I'm seriously hoping you 
> > > > > have r/o
> > > > > pte flags. Otherwise the entire "share address space with cpu side,
> > > > > seamlessly" thing is out of the window.
> > > > > 
> > > > > And with that r/o bit on the ringbuffer you can once more force submit
> > > > > through kernel space, and all the legacy dma_fence based stuff keeps
> > > > > working. And we don't have to invent some horrendous userspace fence 
> > > > > based
> > > > > implicit sync mechanism in the kernel, but can instead do this 
> > > > > transition
> > > > > properly with drm_syncobj timeline explicit sync and protocol reving.
> > > > > 
> > > > > At least I think you'd have to work extra hard to create a gpu which
> > > > > cannot possibly be intercepted by the kernel, even when it's designed 
> > > > > to
> > > > > support userspace direct submit only.
> > > > > 
> > > > > Or are your hw engineers more creative here and we're screwed?
> > > > The upcoming hardware generation will have this hardware scheduler as a
> > > > must have, but there are certain ways we can still stick to the old
> > > > approach:
> > > > 
> > > > 1. The new hardware scheduler currently still supports kernel queues 
> > > > which
> > > > essentially is the same as the old hardware ring buffer.
> > > > 
> > > > 2. Mapping the top level ring buffer into the VM at least partially 
> > > > solves
> > > > the problem. This way you can't manipulate the ring buffer content, but 
> > > > the
> > > > location for the fence must still be writeable.
> > > Yeah allowing userspace to lie about completion fences in this model is
> > > ok. Though I haven't thought through full consequences of that, but I
> > > think it's not any worse than userspace lying about which buffers/address
> > > it uses in the current model - we rely on hw vm ptes to catch that stuff.
> > > 
> > > Also it might be good to switch to a non-recoverable ctx model for these.
> > > That's already what we do in i915 (opt-in, but all current umd use that
> > > mode). So any hang/watchdog just kills the entire ctx and you don't have
> > > to worry about userspace doing something funny with its ringbuffer.
> > > Simplifies everything.
> > > 
> > > Also ofc userspace fencing still disallowed, but since userspace would
> > > queue up all writes to its ringbuffer through the drm/scheduler, we'd
> > > handle dependencies through that still. Not great, but workable.
> > > 
> > > Thinking about this, not even mapping the ringbuffer r/o is required, it's
> > > just that we must queue things through the kernel to resolve dependencies
> > > and everything without breaking dma_fence. If userspace lies, tdr will
> > > shoot it and the kernel stops running that context entirely.
> 
> Thinking more about that approach I don't think that it will work correctly.
> 
> See we not only need to write the fence as signal that an IB is submitted,
> but also adjust a bunch of privileged hardware registers.
> 
> When userspace could do that from its IBs as well then there is nothing
> blocking it from reprogramming the page table base address for example.
> 
> We could do those writes with the CPU as well, but that would be

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Christian König

Am 28.04.21 um 15:34 schrieb Daniel Vetter:

On Wed, Apr 28, 2021 at 03:11:27PM +0200, Christian König wrote:

Am 28.04.21 um 14:26 schrieb Daniel Vetter:

On Wed, Apr 28, 2021 at 02:21:54PM +0200, Daniel Vetter wrote:

On Wed, Apr 28, 2021 at 12:31:09PM +0200, Christian König wrote:

Am 28.04.21 um 12:05 schrieb Daniel Vetter:

On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:

On Tue, Apr 27, 2021 at 1:35 PM Simon Ser  wrote:

On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach  
wrote:


Ok. So that would only make the following use cases broken for now:

- amd render -> external gpu
- amd video encode -> network device

FWIW, "only" breaking amd render -> external gpu will make us pretty
unhappy

I concur. I have quite a few users with a multi-GPU setup involving
AMD hardware.

Note, if this brokenness can't be avoided, I'd prefer to get a clear
error, and not bad results on screen because nothing is synchronized
anymore.

It's an upcoming requirement for windows[1], so you are likely to
start seeing this across all GPU vendors that support windows.  I
think the timing depends on how quickly the legacy hardware support
sticks around for each vendor.

Yeah but hw scheduling doesn't mean the hw has to be constructed to not
support isolating the ringbuffer at all.

E.g. even if the hw loses the bit to put the ringbuffer outside of the
userspace gpu vm, if you have pagetables I'm seriously hoping you have r/o
pte flags. Otherwise the entire "share address space with cpu side,
seamlessly" thing is out of the window.

And with that r/o bit on the ringbuffer you can once more force submit
through kernel space, and all the legacy dma_fence based stuff keeps
working. And we don't have to invent some horrendous userspace fence based
implicit sync mechanism in the kernel, but can instead do this transition
properly with drm_syncobj timeline explicit sync and protocol reving.

At least I think you'd have to work extra hard to create a gpu which
cannot possibly be intercepted by the kernel, even when it's designed to
support userspace direct submit only.

Or are your hw engineers more creative here and we're screwed?

The upcoming hardware generation will have this hardware scheduler as a
must have, but there are certain ways we can still stick to the old
approach:

1. The new hardware scheduler currently still supports kernel queues which
essentially is the same as the old hardware ring buffer.

2. Mapping the top level ring buffer into the VM at least partially solves
the problem. This way you can't manipulate the ring buffer content, but the
location for the fence must still be writeable.

Yeah allowing userspace to lie about completion fences in this model is
ok. Though I haven't thought through full consequences of that, but I
think it's not any worse than userspace lying about which buffers/address
it uses in the current model - we rely on hw vm ptes to catch that stuff.

Also it might be good to switch to a non-recoverable ctx model for these.
That's already what we do in i915 (opt-in, but all current umd use that
mode). So any hang/watchdog just kills the entire ctx and you don't have
to worry about userspace doing something funny with its ringbuffer.
Simplifies everything.

Also ofc userspace fencing still disallowed, but since userspace would
queue up all writes to its ringbuffer through the drm/scheduler, we'd
handle dependencies through that still. Not great, but workable.

Thinking about this, not even mapping the ringbuffer r/o is required, it's
just that we must queue things through the kernel to resolve dependencies
and everything without breaking dma_fence. If userspace lies, tdr will
shoot it and the kernel stops running that context entirely.

Thinking more about that approach I don't think that it will work correctly.

See we not only need to write the fence as signal that an IB is submitted,
but also adjust a bunch of privileged hardware registers.

When userspace could do that from its IBs as well then there is nothing
blocking it from reprogramming the page table base address for example.

We could do those writes with the CPU as well, but that would be a huge
performance drop because of the additional latency.

That's not what I'm suggesting. I'm suggesting you have the queue and
everything in userspace, like in windows. Fences are exactly handled like
on windows too. The difference is:

- All new additions to the ringbuffer are done through a kernel ioctl
   call, using the drm/scheduler to resolve dependencies.

- Memory management is also done like today in that ioctl.

- TDR makes sure that if userspace abuses the contract (which it can, but
   it can do that already today because there's also no command parser to
   e.g. stop gpu semaphores) the entire context is shot and terminally
   killed. Userspace has to then set up a new one. This isn't how amdgpu
   recovery works right now, but i915 supports it and I think it's also the
   better model for userspace error recovery

Re: [PATCH v2 3/4] drm/msm: get rid of msm_iomap_size

2021-04-28 Thread Dmitry Baryshkov

On 28/04/2021 05:47, Bjorn Andersson wrote:

On Mon 26 Apr 19:18 CDT 2021, Dmitry Baryshkov wrote:
[..]

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 92fe844b517b..be578fc4e54f 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -124,7 +124,7 @@ struct clk *msm_clk_get(struct platform_device *pdev, const 
char *name)
  }
  
  static void __iomem *_msm_ioremap(struct platform_device *pdev, const char *name,

- const char *dbgname, bool quiet)
+ const char *dbgname, bool quiet, phys_addr_t 
*psize)


size_t sounds like a better fit for psize...


I was trying to select between size_t and phys_addr_t, settling on the 
latter one because it is used for resource size.



--
With best wishes
Dmitry


Re: [PATCH v2 3/4] drm/msm: get rid of msm_iomap_size

2021-04-28 Thread Bjorn Andersson
On Wed 28 Apr 08:41 CDT 2021, Dmitry Baryshkov wrote:

> On 28/04/2021 05:47, Bjorn Andersson wrote:
> > On Mon 26 Apr 19:18 CDT 2021, Dmitry Baryshkov wrote:
> > [..]
> > > diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> > > index 92fe844b517b..be578fc4e54f 100644
> > > --- a/drivers/gpu/drm/msm/msm_drv.c
> > > +++ b/drivers/gpu/drm/msm/msm_drv.c
> > > @@ -124,7 +124,7 @@ struct clk *msm_clk_get(struct platform_device *pdev, 
> > > const char *name)
> > >   }
> > >   static void __iomem *_msm_ioremap(struct platform_device *pdev, const 
> > > char *name,
> > > -   const char *dbgname, bool quiet)
> > > +   const char *dbgname, bool quiet, phys_addr_t 
> > > *psize)
> > 
> > size_t sounds like a better fit for psize...
> 
> I was trying to select between size_t and phys_addr_t, settling on the
> latter one because it is used for resource size.
> 

I always thought resource_size_t was an alias for size_t, now I know :)

That said, I still think that size_t (in line with resource_size_t)
gives a better hint about what the parameter represents...

Regards,
Bjorn


Re: [Intel-gfx] [PATCH 09/21] drm/i915/gem: Disallow creating contexts with too many engines

2021-04-28 Thread Daniel Vetter
On Wed, Apr 28, 2021 at 11:42:31AM +0100, Tvrtko Ursulin wrote:
> 
> On 28/04/2021 11:16, Daniel Vetter wrote:
> > On Fri, Apr 23, 2021 at 05:31:19PM -0500, Jason Ekstrand wrote:
> > > There's no sense in allowing userspace to create more engines than it
> > > can possibly access via execbuf.
> > > 
> > > Signed-off-by: Jason Ekstrand 
> > > ---
> > >   drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++
> > >   1 file changed, 3 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> > > b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > index 5f8d0faf783aa..ecb3bf5369857 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> > > @@ -1640,11 +1640,10 @@ set_engines(struct i915_gem_context *ctx,
> > >   return -EINVAL;
> > >   }
> > > - /*
> > > -  * Note that I915_EXEC_RING_MASK limits execbuf to only using the
> > > -  * first 64 engines defined here.
> > > -  */
> > >   num_engines = (args->size - sizeof(*user)) / 
> > > sizeof(*user->engines);
> > 
> > Maybe add a comment like /* RING_MASK has no shift, so can be used
> > directly here */ since I had to check that :-)
> > 
> > Same story about igt testcases needed, just to be sure.
> > 
> > Reviewed-by: Daniel Vetter 
> 
> I am not sure about the churn vs benefit ratio here. There are also patches
> which extend the engine selection field in execbuf2 over the unused
> constants bits (with an explicit flag). So churn upstream and churn in
> internal (if interesting) for not much benefit.

This isn't churn.

This is "lock done uapi properly".
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
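
For context, I915_EXEC_RING_MASK is 0x3f in include/uapi/drm/i915_drm.h,
i.e. execbuf2 can only address engine indices 0..63. The restriction the
patch introduces therefore boils down to a check of this shape (a
paraphrase for illustration, not the verbatim hunk):

	if (num_engines > I915_EXEC_RING_MASK + 1)
		return -EINVAL;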


Re: [PATCH v2 3/4] drm/msm: get rid of msm_iomap_size

2021-04-28 Thread Dmitry Baryshkov

On 28/04/2021 16:59, Bjorn Andersson wrote:

On Wed 28 Apr 08:41 CDT 2021, Dmitry Baryshkov wrote:


On 28/04/2021 05:47, Bjorn Andersson wrote:

On Mon 26 Apr 19:18 CDT 2021, Dmitry Baryshkov wrote:
[..]

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 92fe844b517b..be578fc4e54f 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -124,7 +124,7 @@ struct clk *msm_clk_get(struct platform_device *pdev, const 
char *name)
   }
   static void __iomem *_msm_ioremap(struct platform_device *pdev, const char 
*name,
- const char *dbgname, bool quiet)
+ const char *dbgname, bool quiet, phys_addr_t 
*psize)


size_t sounds like a better fit for psize...


I was trying to select between size_t and phys_addr_t, settling on the
latter one because it is used for resource size.



I always thought resource_size_t was an alias for size_t, now I know :)

That said, I still think that size_t (in line with resource_size_t)
gives a better hint about what the parameter represents...


Indeed, I'll change that in the next version.



Regards,
Bjorn




--
With best wishes
Dmitry
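
To illustrate the suggestion, the helper would take roughly this shape with
a size_t out-parameter (a simplified sketch of the idea, not the actual
patch; the debugfs bookkeeping done via dbgname in the real driver is
omitted):

static void __iomem *_msm_ioremap(struct platform_device *pdev, const char *name,
				  const char *dbgname, bool quiet, size_t *psize)
{
	struct resource *res = name ?
		platform_get_resource_byname(pdev, IORESOURCE_MEM, name) :
		platform_get_resource(pdev, IORESOURCE_MEM, 0);
	void __iomem *ptr;

	if (!res) {
		if (!quiet)
			DRM_DEV_ERROR(&pdev->dev, "failed to get memory resource: %s\n", name);
		return ERR_PTR(-EINVAL);
	}

	ptr = devm_ioremap(&pdev->dev, res->start, resource_size(res));
	if (!ptr)
		return ERR_PTR(-ENOMEM);

	if (psize)
		*psize = resource_size(res);	/* the mapped region size fits in size_t */

	return ptr;
}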


Re: [PATCHv2] drm/omap: Fix issue with clocks left on after resume

2021-04-28 Thread Laurent Pinchart
Hi Tony,

Thank you for the patch.

On Wed, Apr 28, 2021 at 12:25:00PM +0300, Tony Lindgren wrote:
> On resume, dispc pm_runtime_force_resume() is not enabling the hardware
> because we pass the pm_runtime_need_not_resume() test, as the device is suspended
> with no child devices.
> 
> As the resume continues, omap_atomic_commit_tail() calls dispc_runtime_get()
> that calls rpm_resume() enabling the hardware, and increasing child_count
> for it's parent device.
> 
> But at this point device_complete() has not yet been called for dispc. So
> when omap_atomic_commit_tail() calls dispc_runtime_put(), it won't idle
> the hardware as rpm_suspend() returns -EBUSY, and the clocks are left on
> after resume. The parent child count is not decremented as the -EBUSY
> cannot be easily handled until later on after device_complete().
> 
> This can be easily seen for example after suspending Beagleboard-X15 with
> no displays connected, and by reading the CM_DSS_DSS_CLKCTRL register at
> 0x4a009120 after resume. After a suspend and resume cycle, it shows a
> value of 0x00040102 instead of 0x0007 like it should.
> 
> Let's fix the issue by calling dispc_runtime_suspend() and
> dispc_runtime_resume() directly from dispc_suspend() and dispc_resume().
> This leaves out the PM runtime related issues for system suspend.
> 
> We could handle the issue by adding more calls to dispc_runtime_get()
> and dispc_runtime_put() from omap_drm_suspend() and omap_drm_resume()
> as suggested by Tomi Valkeinen .
> But that would just add more inter-component calls and more dependencies
> to PM runtime for system suspend and does not make things easier in the
> long run.

Based on my experience on the camera and display side with devices that
are made of multiple components, suspend and resume are best handled in
a controlled way by the top-level driver. Otherwise you end up having
different components suspending and resuming in random orders, and
that's a recipe for failure.

Can we get the omapdrm suspend/resume to run first/last, and
stop/restart the whole device from there ?

> See also earlier commit 88d26136a256 ("PM: Prevent runtime suspend during
> system resume") and commit ca8199f13498 ("drm/msm/dpu: ensure device
> suspend happens during PM sleep") for more information.
> 
> Fixes: ecfdedd7da5d ("drm/omap: force runtime PM suspend on system suspend")
> Signed-off-by: Tony Lindgren 
> ---
> 
> Changes since v1:
> - Updated the description for a typo noticed by Tomi
> - Added more info about what all goes wrong
> 
> ---
>  drivers/gpu/drm/omapdrm/dss/dispc.c | 27 ++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/omapdrm/dss/dispc.c 
> b/drivers/gpu/drm/omapdrm/dss/dispc.c
> --- a/drivers/gpu/drm/omapdrm/dss/dispc.c
> +++ b/drivers/gpu/drm/omapdrm/dss/dispc.c
> @@ -182,6 +182,7 @@ struct dispc_device {
>   const struct dispc_features *feat;
>  
>   bool is_enabled;
> + bool needs_resume;
>  
>   struct regmap *syscon_pol;
>   u32 syscon_pol_offset;
> @@ -4887,10 +4888,34 @@ static int dispc_runtime_resume(struct device *dev)
>   return 0;
>  }
>  
> +static int dispc_suspend(struct device *dev)
> +{
> + struct dispc_device *dispc = dev_get_drvdata(dev);
> +
> + if (!dispc->is_enabled)
> + return 0;
> +
> + dispc->needs_resume = true;
> +
> + return dispc_runtime_suspend(dev);
> +}
> +
> +static int dispc_resume(struct device *dev)
> +{
> + struct dispc_device *dispc = dev_get_drvdata(dev);
> +
> + if (!dispc->needs_resume)
> + return 0;
> +
> + dispc->needs_resume = false;
> +
> + return dispc_runtime_resume(dev);
> +}
> +
>  static const struct dev_pm_ops dispc_pm_ops = {
>   .runtime_suspend = dispc_runtime_suspend,
>   .runtime_resume = dispc_runtime_resume,
> - SET_LATE_SYSTEM_SLEEP_PM_OPS(pm_runtime_force_suspend, 
> pm_runtime_force_resume)
> + SET_LATE_SYSTEM_SLEEP_PM_OPS(dispc_suspend, dispc_resume)
>  };
>  
>  struct platform_driver omap_dispchw_driver = {

-- 
Regards,

Laurent Pinchart
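
For reference, the "suspend/resume driven from the top-level DRM device"
approach typically looks like the sketch below, built on the generic atomic
helpers. Whether omapdrm can adopt it directly is exactly the open question
here; the function names and the assumption that dev_get_drvdata() hands
back the drm_device are illustrative only:

static int omap_drm_system_suspend(struct device *dev)
{
	struct drm_device *drm = dev_get_drvdata(dev);

	/* Disables all CRTCs and saves the atomic state for resume. */
	return drm_mode_config_helper_suspend(drm);
}

static int omap_drm_system_resume(struct device *dev)
{
	struct drm_device *drm = dev_get_drvdata(dev);

	return drm_mode_config_helper_resume(drm);
}

static SIMPLE_DEV_PM_OPS(omap_drm_pm_ops,
			 omap_drm_system_suspend, omap_drm_system_resume);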


Re: [PATCH V2 2/2] drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver

2021-04-28 Thread Marek Vasut

On 4/28/21 11:24 AM, Neil Armstrong wrote:
[...]


+static int sn65dsi83_probe(struct i2c_client *client,
+   const struct i2c_device_id *id)
+{
+struct device *dev = &client->dev;
+enum sn65dsi83_model model;
+struct sn65dsi83 *ctx;
+int ret;
+
+ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
+if (!ctx)
+return -ENOMEM;
+
+ctx->dev = dev;
+
+if (dev->of_node)
+model = (enum sn65dsi83_model)of_device_get_match_data(dev);
+else
+model = id->driver_data;
+
+/* Default to dual-link LVDS on all but DSI83. */
+if (model != MODEL_SN65DSI83)
+ctx->lvds_dual_link = true;


What if I use the DSI84 with a single link LVDS? I can't see any way to
configure that right now.


I assume the simplest way would be to use the "ti,sn65dsi83"
compatible string in your dts, since the way you wired it is
'compatible' with sn65dsi83, right?


No, this isn't the right way to do it: if sn65dsi84 is supported and the
bindings only support a single LVDS link, then the driver must only support
single-link on sn65dsi84, or dual-link LVDS must be added to the bindings
only for sn65dsi84.


The driver has a comment about what is supported and tested, as Frieder 
already pointed out:


Currently supported:
- SN65DSI83
  = 1x Single-link DSI ~ 1x Single-link LVDS
  - Supported
  - Single-link LVDS mode tested
- SN65DSI84
  = 1x Single-link DSI ~ 2x Single-link or 1x Dual-link LVDS
  - Supported
  - Dual-link LVDS mode tested
  - 2x Single-link LVDS mode unsupported
(should be easy to add by someone who has the HW)
- SN65DSI85
  = 2x Single-link or 1x Dual-link DSI ~ 2x Single-link or 1x Dual-link 
LVDS

  - Unsupported
(should be easy to add by someone who has the HW)

So,
DSI83 is always single-link DSI, single-link LVDS.
DSI84 is always single-link DSI, and currently always dual-link LVDS.

The DSI83 can do one thing on the LVDS end:
- 1x single link LVDS

The DSI84 can do two things on the LVDS end:
- 1x single link LVDS
- 1x dual link LVDS

There is also some sort of mention in the DSI84 datasheet about 2x 
single link LVDS, but I suspect that might be copied from DSI85 
datasheet instead, which would make sense. The other option is that it 
behaves as a mirror (i.e. same pixels are scanned out of LVDS channel A 
and B). Either option can be added by either adding a DT property which 
would enable the mirror mode, or new port linking the LVDS endpoint to 
the same panel twice, and/or two new ports for DSI85, so there is no 
problem to extend the bindings without breaking them. So for now I would 
ignore this mode.


So ultimately, what we have to sort out is the 1x single / 1x dual link 
LVDS mode setting on DSI84. Frieder already pointed out how the driver 
can be tweaked to support the single-link mode on DSI84, so now we need 
to tie it into DT bindings.


Currently, neither the LVDS panels in upstream in panel-simple nor 
lvds.yaml provide any indication that the panel is single-link or 
dual-link. Those dual-link LVDS panels seem to always set 2x pixel clock 
and let the bridge somehow sort it out.


Maybe that isn't always the best approach; maybe we should add a new
DRM_BUS_FLAG for those panels and handle the flag in the bridge driver?
Such a new flag could be added over time to panels where applicable, so
old setups won't be broken by it either; they will just ignore the new
flag and work as before.



I just saw the note in the header of the driver that says that single
link mode is unsupported for the DSI84.

I have hardware with a single link display and if I set
ctx->lvds_dual_link = false it works just fine.

How is this supposed to be selected? Does it need an extra devicetree
property? And would you mind adding single-link support in the next
version or do you prefer adding it in a follow-up patch?


If this has to be supported I think the proper way would be to support
two output ports in the dts (e.g. lvds0_out, lvds1_out), in the same
way as supported by the 'advantech,idk-2121wr' panel.


Yes, this is why I asked to have the dual-link lvds in the bindings.


Maybe it shouldn't really be in the bindings, see above.
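
A sketch of that idea, purely for illustration: neither the flag name nor
its bit value exist upstream at the time of writing, and the helper name is
made up.

/* Hypothetical addition to the DRM_BUS_FLAG_* set in drm_connector.h: */
#define DRM_BUS_FLAG_LVDS_DUAL_LINK	BIT(16)

/* The panel driver would set it in drm_display_info.bus_flags, and the
 * bridge would then pick the LVDS link mode from the attached display:
 */
static void sn65dsi8x_set_link_mode(struct sn65dsi83 *ctx,
				    const struct drm_display_info *info)
{
	ctx->lvds_dual_link = !!(info->bus_flags & DRM_BUS_FLAG_LVDS_DUAL_LINK);
}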


Re: [PATCH V2 2/2] drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 driver

2021-04-28 Thread Marek Vasut

On 4/28/21 11:49 AM, Jagan Teki wrote:
[...]


I just saw the note in the header of the driver that says that single
link mode is unsupported for the DSI84.

I have hardware with a single link display and if I set
ctx->lvds_dual_link = false it works just fine.

How is this supposed to be selected? Does it need an extra devicetree
property? And would you mind adding single-link support in the next
version or do you prefer adding it in a follow-up patch?


If this has to be supported I think the proper way would be to support
two output ports in the dts (e.g. lvds0_out, lvds1_out), in the same
way as supported by the 'advantech,idk-2121wr' panel.


Yes, this is why I asked to have the dual-link lvds in the bindings.


Agreed with Neil, this is what we discussed on my v3. Each of these 3
chips has its own compatible, and support for dual-link LVDS and
dual-link DSI is to be handled by the 84/85 and the 85 respectively.


I have a counter-proposal to this single/dual link LVDS panel handling:
maybe this should really be done using a DRM_BUS_FLAG added to the panel
to indicate whether it is single or dual link. Then the bridge can
figure that out without any extra DT props.



Maybe I can push my configuration changes in gist if required?


Please summarize the v3 discussion, yes.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH V2 1/2] dt-bindings: drm/bridge: ti-sn65dsi83: Add TI SN65DSI83 and SN65DSI84 bindings

2021-04-28 Thread Marek Vasut

On 4/28/21 9:56 AM, Frieder Schrempf wrote:
[...]

+properties:
+  compatible:
+    oneOf:
+  - const: ti,sn65dsi83
+  - const: ti,sn65dsi84
+
+  reg:
+    const: 0x2d


There is a strapping pin to select the last bit of the address, so apart
from 0x2d, 0x2c is also valid here.


Fixed, along with the other DT details pointed out by Laurent, thanks.
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Intel-gfx] [PATCH 09/21] drm/i915/gem: Disallow creating contexts with too many engines

2021-04-28 Thread Tvrtko Ursulin



On 28/04/2021 15:02, Daniel Vetter wrote:

On Wed, Apr 28, 2021 at 11:42:31AM +0100, Tvrtko Ursulin wrote:


On 28/04/2021 11:16, Daniel Vetter wrote:

On Fri, Apr 23, 2021 at 05:31:19PM -0500, Jason Ekstrand wrote:

There's no sense in allowing userspace to create more engines than it
can possibly access via execbuf.

Signed-off-by: Jason Ekstrand 
---
   drivers/gpu/drm/i915/gem/i915_gem_context.c | 7 +++
   1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5f8d0faf783aa..ecb3bf5369857 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1640,11 +1640,10 @@ set_engines(struct i915_gem_context *ctx,
return -EINVAL;
}
-   /*
-* Note that I915_EXEC_RING_MASK limits execbuf to only using the
-* first 64 engines defined here.
-*/
num_engines = (args->size - sizeof(*user)) / sizeof(*user->engines);


Maybe add a comment like /* RING_MASK has no shift, so can be used
directly here */ since I had to check that :-)

Same story about igt testcases needed, just to be sure.

Reviewed-by: Daniel Vetter 


I am not sure about the churn vs benefit ratio here. There are also patches
which extend the engine selection field in execbuf2 over the unused
constant bits (with an explicit flag). So churn upstream and churn
internally (if interesting) for not much benefit.


This isn't churn.

This is "lock done uapi properly".


IMO it is a "meh" patch. It doesn't fix any problems, and it will create
work for other people and man hours that no one will ever properly
account for.


The number of contexts in the engine map should not really be tied to
execbuf2, as is demonstrated by the incoming work to address more than
63 engines, either as an extension to execbuf2 or a future execbuf3.


Regards,

Tvrtko
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Daniel Vetter
On Wed, Apr 28, 2021 at 03:37:49PM +0200, Christian König wrote:
> Am 28.04.21 um 15:34 schrieb Daniel Vetter:
> > On Wed, Apr 28, 2021 at 03:11:27PM +0200, Christian König wrote:
> > > Am 28.04.21 um 14:26 schrieb Daniel Vetter:
> > > > On Wed, Apr 28, 2021 at 02:21:54PM +0200, Daniel Vetter wrote:
> > > > > On Wed, Apr 28, 2021 at 12:31:09PM +0200, Christian König wrote:
> > > > > > Am 28.04.21 um 12:05 schrieb Daniel Vetter:
> > > > > > > On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:
> > > > > > > > On Tue, Apr 27, 2021 at 1:35 PM Simon Ser  
> > > > > > > > wrote:
> > > > > > > > > On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach 
> > > > > > > > >  wrote:
> > > > > > > > > 
> > > > > > > > > > > Ok. So that would only make the following use cases 
> > > > > > > > > > > broken for now:
> > > > > > > > > > > 
> > > > > > > > > > > - amd render -> external gpu
> > > > > > > > > > > - amd video encode -> network device
> > > > > > > > > > FWIW, "only" breaking amd render -> external gpu will make 
> > > > > > > > > > us pretty
> > > > > > > > > > unhappy
> > > > > > > > > I concur. I have quite a few users with a multi-GPU setup 
> > > > > > > > > involving
> > > > > > > > > AMD hardware.
> > > > > > > > > 
> > > > > > > > > Note, if this brokenness can't be avoided, I'd prefer to 
> > > > > > > > > get a clear
> > > > > > > > > error, and not bad results on screen because nothing is 
> > > > > > > > > synchronized
> > > > > > > > > anymore.
> > > > > > > > It's an upcoming requirement for windows[1], so you are likely 
> > > > > > > > to
> > > > > > > > start seeing this across all GPU vendors that support windows.  
> > > > > > > > I
> > > > > > > > think the timing depends on how quickly the legacy hardware 
> > > > > > > > support
> > > > > > > > sticks around for each vendor.
> > > > > > > Yeah but hw scheduling doesn't mean the hw has to be constructed 
> > > > > > > to not
> > > > > > > support isolating the ringbuffer at all.
> > > > > > > 
> > > > > > > E.g. even if the hw loses the bit to put the ringbuffer outside 
> > > > > > > of the
> > > > > > > userspace gpu vm, if you have pagetables I'm seriously hoping you 
> > > > > > > have r/o
> > > > > > > pte flags. Otherwise the entire "share address space with cpu 
> > > > > > > side,
> > > > > > > seamlessly" thing is out of the window.
> > > > > > > 
> > > > > > > And with that r/o bit on the ringbuffer you can once more force 
> > > > > > > submit
> > > > > > > through kernel space, and all the legacy dma_fence based stuff 
> > > > > > > keeps
> > > > > > > working. And we don't have to invent some horrendous userspace 
> > > > > > > fence based
> > > > > > > implicit sync mechanism in the kernel, but can instead do this 
> > > > > > > transition
> > > > > > > properly with drm_syncobj timeline explicit sync and protocol 
> > > > > > > reving.
> > > > > > > 
> > > > > > > At least I think you'd have to work extra hard to create a gpu 
> > > > > > > which
> > > > > > > cannot possibly be intercepted by the kernel, even when it's 
> > > > > > > designed to
> > > > > > > support userspace direct submit only.
> > > > > > > 
> > > > > > > Or are your hw engineers more creative here and we're screwed?
> > > > > > The upcoming hardware generation will have this hardware scheduler 
> > > > > > as a
> > > > > > must have, but there are certain ways we can still stick to the old
> > > > > > approach:
> > > > > > 
> > > > > > 1. The new hardware scheduler currently still supports kernel 
> > > > > > queues which
> > > > > > essentially is the same as the old hardware ring buffer.
> > > > > > 
> > > > > > 2. Mapping the top level ring buffer into the VM at least partially 
> > > > > > solves
> > > > > > the problem. This way you can't manipulate the ring buffer content, 
> > > > > > but the
> > > > > > location for the fence must still be writeable.
> > > > > Yeah allowing userspace to lie about completion fences in this model 
> > > > > is
> > > > > ok. Though I haven't thought through full consequences of that, but I
> > > > > think it's not any worse than userspace lying about which 
> > > > > buffers/address
> > > > > it uses in the current model - we rely on hw vm ptes to catch that 
> > > > > stuff.
> > > > > 
> > > > > Also it might be good to switch to a non-recoverable ctx model for 
> > > > > these.
> > > > > That's already what we do in i915 (opt-in, but all current umd use 
> > > > > that
> > > > > mode). So any hang/watchdog just kills the entire ctx and you don't 
> > > > > have
> > > > > to worry about userspace doing something funny with it's ringbuffer.
> > > > > Simplifies everything.
> > > > > 
> > > > > Also ofc userspace fencing still disallowed, but since userspace would
> > > > > queue up all writes to its ringbuffer through the drm/scheduler, we'd
> > > > > handle dependencies through that still. Not great, but workable.
> > > > > 
> > > > > Thinking about this, not even mapping the ringbuffer r/o is required,

Re: [PATCH 12/21] drm/i915/gem: Add a separate validate_priority helper

2021-04-28 Thread Daniel Vetter
On Fri, Apr 23, 2021 at 05:31:22PM -0500, Jason Ekstrand wrote:

Maybe explain that you pull this out since with the proto context there
will be two paths to set this, one for the proto context, the other for
a context already finalized and executing batches?

With that: Reviewed-by: Daniel Vetter 

> Signed-off-by: Jason Ekstrand 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 42 +
>  1 file changed, 27 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 941fbf78267b4..e5efd22c89ba2 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -169,6 +169,28 @@ lookup_user_engine(struct i915_gem_context *ctx,
>   return i915_gem_context_get_engine(ctx, idx);
>  }
>  
> +static int validate_priority(struct drm_i915_private *i915,
> +  const struct drm_i915_gem_context_param *args)
> +{
> + s64 priority = args->value;
> +
> + if (args->size)
> + return -EINVAL;
> +
> + if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
> + return -ENODEV;
> +
> + if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
> + priority < I915_CONTEXT_MIN_USER_PRIORITY)
> + return -EINVAL;
> +
> + if (priority > I915_CONTEXT_DEFAULT_PRIORITY &&
> + !capable(CAP_SYS_NICE))
> + return -EPERM;
> +
> + return 0;
> +}
> +
>  static struct i915_address_space *
>  context_get_vm_rcu(struct i915_gem_context *ctx)
>  {
> @@ -1744,23 +1766,13 @@ static void __apply_priority(struct intel_context 
> *ce, void *arg)
>  static int set_priority(struct i915_gem_context *ctx,
>   const struct drm_i915_gem_context_param *args)
>  {
> - s64 priority = args->value;
> -
> - if (args->size)
> - return -EINVAL;
> -
> - if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
> - return -ENODEV;
> -
> - if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
> - priority < I915_CONTEXT_MIN_USER_PRIORITY)
> - return -EINVAL;
> + int err;
>  
> - if (priority > I915_CONTEXT_DEFAULT_PRIORITY &&
> - !capable(CAP_SYS_NICE))
> - return -EPERM;
> + err = validate_priority(ctx->i915, args);
> + if (err)
> + return err;
>  
> - ctx->sched.priority = priority;
> + ctx->sched.priority = args->value;
>   context_apply_all(ctx, __apply_priority, ctx);
>  
>   return 0;
> -- 
> 2.31.1
> 
> ___
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-04-28 Thread Christian König

Am 28.04.21 um 16:34 schrieb Daniel Vetter:

On Wed, Apr 28, 2021 at 03:37:49PM +0200, Christian König wrote:

Am 28.04.21 um 15:34 schrieb Daniel Vetter:

On Wed, Apr 28, 2021 at 03:11:27PM +0200, Christian König wrote:

Am 28.04.21 um 14:26 schrieb Daniel Vetter:

On Wed, Apr 28, 2021 at 02:21:54PM +0200, Daniel Vetter wrote:

On Wed, Apr 28, 2021 at 12:31:09PM +0200, Christian König wrote:

Am 28.04.21 um 12:05 schrieb Daniel Vetter:

On Tue, Apr 27, 2021 at 02:01:20PM -0400, Alex Deucher wrote:

On Tue, Apr 27, 2021 at 1:35 PM Simon Ser  wrote:

On Tuesday, April 27th, 2021 at 7:31 PM, Lucas Stach  
wrote:


Ok. So that would only make the following use cases broken for now:

- amd render -> external gpu
- amd video encode -> network device

FWIW, "only" breaking amd render -> external gpu will make us pretty
unhappy

I concur. I have quite a few users with a multi-GPU setup involving
AMD hardware.

Note, if this brokenness can't be avoided, I'd prefer to get a clear
error, and not bad results on screen because nothing is synchronized
anymore.

It's an upcoming requirement for windows[1], so you are likely to
start seeing this across all GPU vendors that support windows.  I
think the timing depends on how quickly the legacy hardware support
sticks around for each vendor.

Yeah but hw scheduling doesn't mean the hw has to be constructed to not
support isolating the ringbuffer at all.

E.g. even if the hw loses the bit to put the ringbuffer outside of the
userspace gpu vm, if you have pagetables I'm seriously hoping you have r/o
pte flags. Otherwise the entire "share address space with cpu side,
seamlessly" thing is out of the window.

And with that r/o bit on the ringbuffer you can once more force submit
through kernel space, and all the legacy dma_fence based stuff keeps
working. And we don't have to invent some horrendous userspace fence based
implicit sync mechanism in the kernel, but can instead do this transition
properly with drm_syncobj timeline explicit sync and protocol reving.

At least I think you'd have to work extra hard to create a gpu which
cannot possibly be intercepted by the kernel, even when it's designed to
support userspace direct submit only.

Or are your hw engineers more creative here and we're screwed?

The upcoming hardware generation will have this hardware scheduler as a
must have, but there are certain ways we can still stick to the old
approach:

1. The new hardware scheduler currently still supports kernel queues which
essentially is the same as the old hardware ring buffer.

2. Mapping the top level ring buffer into the VM at least partially solves
the problem. This way you can't manipulate the ring buffer content, but the
location for the fence must still be writeable.

Yeah allowing userspace to lie about completion fences in this model is
ok. Though I haven't thought through full consequences of that, but I
think it's not any worse than userspace lying about which buffers/address
it uses in the current model - we rely on hw vm ptes to catch that stuff.

Also it might be good to switch to a non-recoverable ctx model for these.
That's already what we do in i915 (opt-in, but all current umd use that
mode). So any hang/watchdog just kills the entire ctx and you don't have
to worry about userspace doing something funny with it's ringbuffer.
Simplifies everything.

Also ofc userspace fencing still disallowed, but since userspace would
queue up all writes to its ringbuffer through the drm/scheduler, we'd
handle dependencies through that still. Not great, but workable.

Thinking about this, not even mapping the ringbuffer r/o is required, it's
just that we must queue things through the kernel to resolve dependencies
and everything without breaking dma_fence. If userspace lies, tdr will
shoot it and the kernel stops running that context entirely.

Thinking more about that approach I don't think that it will work correctly.

See, we not only need to write the fence as a signal that an IB is submitted,
but also adjust a bunch of privileged hardware registers.

If userspace could do that from its IBs as well, then there would be nothing
blocking it from reprogramming the page table base address, for example.

We could do those writes with the CPU as well, but that would be a huge
performance drop because of the additional latency.

That's not what I'm suggesting. I'm suggesting you have the queue and
everything in userspace, like in windows. Fences are exactly handled like
on windows too. The difference is:

- All new additions to the ringbuffer are done through a kernel ioctl
call, using the drm/scheduler to resolve dependencies.

- Memory management is also done like today in that ioctl.

- TDR makes sure that if userspace abuses the contract (which it can, but
it can do that already today because there's also no command parser to
e.g. stop gpu semaphores) the entire context is shot and terminally
killed. Userspace has to then set up a new one. This isn't how amdgpu
  

[PATCH v5 00/27] RFC Support hot device unplug in amdgpu

2021-04-28 Thread Andrey Grodzovsky
Until now, extracting a card, either by physical removal (e.g. an eGPU
with a Thunderbolt connection) or by emulation through sysfs
(/sys/bus/pci/devices/device_id/remove), would cause random crashes in
user apps. The random crashes in apps were mostly due to an app having
mapped a device backed BO into its address space and still trying to
access the BO while the backing device was gone.
To answer this first problem Christian suggested to fix the handling of
mapped memory in the clients when the device goes away by forcibly
unmapping all buffers the user processes have, by clearing their
respective VMAs mapping the device BOs. Then, when the VMAs try to fill
in the page tables again, we check in the fault handler if the device
is removed and, if so, return an error. This generates a SIGBUS to the
application, which can then cleanly terminate. This was indeed done,
but it in turn created kernel OOPSes: while the app was terminating
because of the SIGBUS it would trigger a use-after-free in the driver
by accessing device structures that were already released by the pci
remove sequence. This was handled by introducing a 'flush' sequence
during device removal where we wait for the drm file reference to drop
to 0, meaning all user clients directly using this device have
terminated.

v2:
Based on discussions in the mailing list with Daniel and Pekka [1] and
on the document produced by Pekka from those discussions [2], the whole
approach of returning SIGBUS and waiting for all user clients having
CPU mappings of device BOs to die was dropped.
Instead, as per the document's suggestion, the device structures are
kept alive until the last reference to the device is dropped by a user
client, and in the meanwhile all existing and new CPU mappings of the
BOs belonging to the device, directly or by dma-buf import, are
rerouted to a per user process dummy rw page. Also, I skipped the
'Requirements for KMS UAPI' section of [2] since I am trying to get the
minimal set of requirements that still gives a useful solution to work,
and this is the 'Requirements for Render and Cross-Device UAPI'
section, so my test case is removing a secondary device, which is
render only and is not involved in KMS.

v3:
More updates following comments on v2, such as removing the loop to
find the DRM file when rerouting page faults to the dummy page, getting
rid of unnecessary sysfs handling refactoring, and moving the
prevention of GPU recovery post device unplug from amdgpu to the
scheduler layer.
On top of that, added unplug support for IOMMU-enabled systems.

v4:
Drop the last sysfs hack and use a sysfs default attribute.
Guard against write accesses after device removal to avoid modifying
released memory.
Update dummy page handling to on-demand allocation and release through
the drm managed framework.
Add a return value to the scheduler job TO handler (by Luben Tuikov)
and use this in amdgpu to prevent GPU recovery post device unplug.
Also rebase on top of drm-misc-next instead of amd-staging-drm-next.

v5:
The most significant change in this series is the improved protection
against the kernel driver accessing MMIO ranges that were allocated for
the device once the device is gone. To do this, first a patch
'drm/amdgpu: Unmap all MMIO mappings' is introduced.
This patch unmaps all MMIO mapped into the kernel address space in the
form of BARs and kernel BOs with CPU visible VRAM mappings.
This helped to discover multiple such access points, because a page
fault would be immediately generated on access. Most of them were
solved by moving HW fini code into the pci_remove stage (patch
'drm/amdgpu: Add early fini callback'), and for some that were harder
to unwind drm_dev_enter/exit scoping was used. In addition, all the
IOCTLs and all background work and timers are now protected with
drm_dev_enter/exit at their root, so that after drm_dev_unplug has
finished none of them run anymore and the pci_remove thread is the only
executing thread which might touch the HW. To prevent deadlocks in that
case against threads stuck on various HW or SW fences, the patches
'drm/amdgpu: Finalise device fences on device remove' and 'drm/amdgpu:
Add rw_sem to pushing job into sched queue' take care of force
signaling all such existing fences and rejecting any newly added ones.

With these patches I am able to gracefully remove the secondary card
using the sysfs remove hook while glxgears is running off of the
secondary card (DRI_PRIME=1), without kernel oopses or hangs, and keep
working with the primary card or soft reset the device without hangs
or oopses.
Also, as per Daniel's comment, I added 3 tests to the IGT [4]
core_hotunplug test suite - remove device while commands are submitted,
exported BO and exported fence (not pushed yet).
It is also now possible to plug the device back in after unplug.
Also, some users can now successfully use those patches with eGPU
boxes [3].
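
For reference, the drm_dev_enter/exit guard used at the root of IOCTLs,
timers and background work throughout the series follows this pattern
(sketch only, the work item used here is made up):

static void amdgpu_example_work_handler(struct work_struct *work)
{
	struct amdgpu_device *adev =
		container_of(work, struct amdgpu_device, example_work);
	int idx;

	/* Once drm_dev_unplug() has completed this fails and we bail out,
	 * so only the pci_remove thread can still touch the hardware. */
	if (!drm_dev_enter(adev_to_drm(adev), &idx))
		return;

	/* ... MMIO and other HW access happens only inside this section ... */

	drm_dev_exit(idx);
}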




TODOs for followup work:
Convert AMDGPU code

[PATCH v5 03/27] drm/amdgpu: Split amdgpu_device_fini into early and late

2021-04-28 Thread Andrey Grodzovsky
Some of the work in amdgpu_device_fini, such as disabling HW interrupts
and finalizing pending fences, must be done right away on
pci_remove, while most of the work that relates to finalizing and
releasing driver data structures can be kept until the
drm_driver.release hook is called, i.e. when the last device
reference is dropped.

v4: Change function prefixes early->hw and late->sw

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  6 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  7 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 26 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h|  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
 drivers/gpu/drm/amd/amdgpu/cik_ih.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/cz_ih.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/si_ih.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/vega20_ih.c |  2 +-
 17 files changed, 79 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 1af2fa1591fd..fddb82897e5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1073,7 +1073,9 @@ static inline struct amdgpu_device 
*amdgpu_ttm_adev(struct ttm_device *bdev)
 
 int amdgpu_device_init(struct amdgpu_device *adev,
   uint32_t flags);
-void amdgpu_device_fini(struct amdgpu_device *adev);
+void amdgpu_device_fini_hw(struct amdgpu_device *adev);
+void amdgpu_device_fini_sw(struct amdgpu_device *adev);
+
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
 
 void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
@@ -1289,6 +1291,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device *dev);
 int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv);
 void amdgpu_driver_postclose_kms(struct drm_device *dev,
 struct drm_file *file_priv);
+void amdgpu_driver_release_kms(struct drm_device *dev);
+
 int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
 int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
 int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 6447cd6ca5a8..8d22b79fc1cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3590,14 +3590,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  * Tear down the driver info (all asics).
  * Called at driver shutdown.
  */
-void amdgpu_device_fini(struct amdgpu_device *adev)
+void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 {
dev_info(adev->dev, "amdgpu: finishing device.\n");
flush_delayed_work(&adev->delayed_init_work);
adev->shutdown = true;
 
-   kfree(adev->pci_state);
-
/* make sure IB test finished before entering exclusive mode
 * to avoid preemption on IB test
 * */
@@ -3614,11 +3612,24 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
else
drm_atomic_helper_shutdown(adev_to_drm(adev));
}
-   amdgpu_fence_driver_fini(adev);
+   amdgpu_fence_driver_fini_hw(adev);
+
if (adev->pm_sysfs_en)
amdgpu_pm_sysfs_fini(adev);
+   if (adev->ucode_sysfs_en)
+   amdgpu_ucode_sysfs_fini(adev);
+   sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
+
+
amdgpu_fbdev_fini(adev);
+
+   amdgpu_irq_fini_hw(adev);
+}
+
+void amdgpu_device_fini_sw(struct amdgpu_device *adev)
+{
amdgpu_device_ip_fini(adev);
+   amdgpu_fence_driver_fini_sw(adev);
release_firmware(adev->firmware.gpu_info_fw);
adev->firmware.gpu_info_fw = NULL;
adev->accel_working = false;
@@ -3647,14 +3658,13 @@ void amdgpu_device_fini(struct amdgpu_device *adev)
adev->rmmio = NULL;
amdgpu_device_doorbell_fini(adev);
 
-   if (adev->ucode_sysfs_en)
-   amdgpu_ucode_sysfs_fini(adev);
-
-   sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
if (IS_ENABLED(CONFIG_PERF_EVENTS))
amdgpu_pmu_fini(adev);
if (adev->mman.discovery_bin)
amdgpu_discovery_fini(adev);
+
+   kfree(adev->pci_state);
+
 }
 
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 671ec1002230..54cb5ee2f563 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_d

[PATCH v5 08/27] PCI: add support for dev_groups to struct pci_device_driver

2021-04-28 Thread Andrey Grodzovsky
This is an exact copy of the 'USB: add support for dev_groups to
struct usb_device_driver' patch by Greg, but just for
the PCI case.

Signed-off-by: Andrey Grodzovsky 
Suggested-by: Greg Kroah-Hartman 
---
 drivers/pci/pci-driver.c | 1 +
 include/linux/pci.h  | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index ec44a79e951a..3a72352aa5cf 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1385,6 +1385,7 @@ int __pci_register_driver(struct pci_driver *drv, struct 
module *owner,
drv->driver.owner = owner;
drv->driver.mod_name = mod_name;
drv->driver.groups = drv->groups;
+   drv->driver.dev_groups = drv->dev_groups;
 
spin_lock_init(&drv->dynids.lock);
INIT_LIST_HEAD(&drv->dynids.list);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 86c799c97b77..b57755b03009 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -858,6 +858,8 @@ struct module;
  * number of VFs to enable via sysfs "sriov_numvfs" file.
  * @err_handler: See Documentation/PCI/pci-error-recovery.rst
  * @groups:Sysfs attribute groups.
+ * @dev_groups: Attributes attached to the device that will be
+ *  created once it is bound to the driver.
  * @driver:Driver model structure.
  * @dynids:List of dynamically added device IDs.
  */
@@ -873,6 +875,7 @@ struct pci_driver {
int  (*sriov_configure)(struct pci_dev *dev, int num_vfs); /* On PF */
const struct pci_error_handlers *err_handler;
const struct attribute_group **groups;
+   const struct attribute_group **dev_groups;
struct device_driverdriver;
struct pci_dynids   dynids;
 };
-- 
2.25.1
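
For completeness, a PCI driver would then be able to hang per-device
sysfs attributes off the new field roughly like this (illustrative
sketch, all names are made up):

static ssize_t example_attr_show(struct device *dev,
				 struct device_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "%d\n", 42);
}
static DEVICE_ATTR_RO(example_attr);

static struct attribute *example_dev_attrs[] = {
	&dev_attr_example_attr.attr,
	NULL,
};
ATTRIBUTE_GROUPS(example_dev);

static struct pci_driver example_pci_driver = {
	.name		= "example",
	/* Attributes are created when the device is bound to the driver
	 * and removed again when it is unbound. */
	.dev_groups	= example_dev_groups,
	/* .id_table, .probe, .remove, ... as usual */
};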

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 05/27] drm/amdgpu: Add early fini callback

2021-04-28 Thread Andrey Grodzovsky
Use it to call display code dependent on device->drv_data
before it's set to NULL on device unplug.

v5: Move HW finalization into this callback to prevent MMIO accesses
post pci remove.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 59 +--
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 12 +++-
 drivers/gpu/drm/amd/include/amd_shared.h  |  2 +
 3 files changed, 52 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8d22b79fc1cd..46d646c40338 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2536,34 +2536,26 @@ static int amdgpu_device_ip_late_init(struct 
amdgpu_device *adev)
return 0;
 }
 
-/**
- * amdgpu_device_ip_fini - run fini for hardware IPs
- *
- * @adev: amdgpu_device pointer
- *
- * Main teardown pass for hardware IPs.  The list of all the hardware
- * IPs that make up the asic is walked and the hw_fini and sw_fini callbacks
- * are run.  hw_fini tears down the hardware associated with each IP
- * and sw_fini tears down any software state associated with each IP.
- * Returns 0 on success, negative error code on failure.
- */
-static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
+static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)
 {
int i, r;
 
-   if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
-   amdgpu_virt_release_ras_err_handler_data(adev);
+   for (i = 0; i < adev->num_ip_blocks; i++) {
+   if (!adev->ip_blocks[i].version->funcs->early_fini)
+   continue;
 
-   amdgpu_ras_pre_fini(adev);
+   r = adev->ip_blocks[i].version->funcs->early_fini((void *)adev);
+   if (r) {
+   DRM_DEBUG("early_fini of IP block <%s> failed %d\n",
+ adev->ip_blocks[i].version->funcs->name, r);
+   }
+   }
 
-   if (adev->gmc.xgmi.num_physical_nodes > 1)
-   amdgpu_xgmi_remove_device(adev);
+   amdgpu_amdkfd_suspend(adev, false);
 
amdgpu_device_set_pg_state(adev, AMD_PG_STATE_UNGATE);
amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
 
-   amdgpu_amdkfd_device_fini(adev);
-
/* need to disable SMC first */
for (i = 0; i < adev->num_ip_blocks; i++) {
if (!adev->ip_blocks[i].status.hw)
@@ -2594,6 +2586,33 @@ static int amdgpu_device_ip_fini(struct amdgpu_device 
*adev)
adev->ip_blocks[i].status.hw = false;
}
 
+   return 0;
+}
+
+/**
+ * amdgpu_device_ip_fini - run fini for hardware IPs
+ *
+ * @adev: amdgpu_device pointer
+ *
+ * Main teardown pass for hardware IPs.  The list of all the hardware
+ * IPs that make up the asic is walked and the hw_fini and sw_fini callbacks
+ * are run.  hw_fini tears down the hardware associated with each IP
+ * and sw_fini tears down any software state associated with each IP.
+ * Returns 0 on success, negative error code on failure.
+ */
+static int amdgpu_device_ip_fini(struct amdgpu_device *adev)
+{
+   int i, r;
+
+   if (amdgpu_sriov_vf(adev) && adev->virt.ras_init_done)
+   amdgpu_virt_release_ras_err_handler_data(adev);
+
+   amdgpu_ras_pre_fini(adev);
+
+   if (adev->gmc.xgmi.num_physical_nodes > 1)
+   amdgpu_xgmi_remove_device(adev);
+
+   amdgpu_amdkfd_device_fini_sw(adev);
 
for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
if (!adev->ip_blocks[i].status.sw)
@@ -3624,6 +3643,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
amdgpu_fbdev_fini(adev);
 
amdgpu_irq_fini_hw(adev);
+
+   amdgpu_device_ip_fini_early(adev);
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 55e39b462a5e..c0b9abb773a4 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1170,6 +1170,15 @@ static int amdgpu_dm_init(struct amdgpu_device *adev)
return -EINVAL;
 }
 
+static int amdgpu_dm_early_fini(void *handle)
+{
+   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+
+   amdgpu_dm_audio_fini(adev);
+
+   return 0;
+}
+
 static void amdgpu_dm_fini(struct amdgpu_device *adev)
 {
int i;
@@ -1178,8 +1187,6 @@ static void amdgpu_dm_fini(struct amdgpu_device *adev)
drm_encoder_cleanup(&adev->dm.mst_encoders[i].base);
}
 
-   amdgpu_dm_audio_fini(adev);
-
amdgpu_dm_destroy_drm_device(&adev->dm);
 
 #ifdef CONFIG_DRM_AMD_DC_HDCP
@@ -2194,6 +2201,7 @@ static const struct amd_ip_funcs amdgpu_dm_funcs = {
.late_init = dm_late_init,
.sw_init = dm_sw_init,
.sw_fini = dm_sw_fini,
+   .early_fini = amdgpu_dm_early_f

[PATCH v5 01/27] drm/ttm: Remap all page faults to per process dummy page.

2021-04-28 Thread Andrey Grodzovsky
On device removal reroute all CPU mappings to dummy page.

v3:
Remove loop to find DRM file and instead access it
by vma->vm_file->private_data. Move dummy page installation
into a separate function.

v4:
Map the entire BO's VA space to an on-demand allocated dummy page
on the first fault for that BO.

v5: Remove duplicate return.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 79 -
 include/drm/ttm/ttm_bo_api.h|  2 +
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index b31b18058965..8b8300551a7f 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -34,6 +34,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -380,19 +382,94 @@ vm_fault_t ttm_bo_vm_fault_reserved(struct vm_fault *vmf,
 }
 EXPORT_SYMBOL(ttm_bo_vm_fault_reserved);
 
+static void ttm_bo_release_dummy_page(struct drm_device *dev, void *res)
+{
+   struct page *dummy_page = (struct page *)res;
+
+   __free_page(dummy_page);
+}
+
+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot)
+{
+   struct vm_area_struct *vma = vmf->vma;
+   struct ttm_buffer_object *bo = vma->vm_private_data;
+   struct drm_device *ddev = bo->base.dev;
+   vm_fault_t ret = VM_FAULT_NOPAGE;
+   unsigned long address = vma->vm_start;
+   unsigned long num_prefault = (vma->vm_end - vma->vm_start) >> 
PAGE_SHIFT;
+   unsigned long pfn;
+   struct page *page;
+   int i;
+
+   /*
+* Wait for buffer data in transit, due to a pipelined
+* move.
+*/
+   ret = ttm_bo_vm_fault_idle(bo, vmf);
+   if (unlikely(ret != 0))
+   return ret;
+
+   /* Allocate a new dummy page to map the whole VA range in this VMA to it */
+   page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+   if (!page)
+   return VM_FAULT_OOM;
+
+   pfn = page_to_pfn(page);
+
+   /*
+* Prefault the entire VMA range right away to avoid further faults
+*/
+   for (i = 0; i < num_prefault; ++i) {
+
+   if (unlikely(address >= vma->vm_end))
+   break;
+
+   if (vma->vm_flags & VM_MIXEDMAP)
+   ret = vmf_insert_mixed_prot(vma, address,
+   __pfn_to_pfn_t(pfn, 
PFN_DEV),
+   prot);
+   else
+   ret = vmf_insert_pfn_prot(vma, address, pfn, prot);
+
+   /* Never error on prefaulted PTEs */
+   if (unlikely((ret & VM_FAULT_ERROR))) {
+   if (i == 0)
+   return VM_FAULT_NOPAGE;
+   else
+   break;
+   }
+
+   address += PAGE_SIZE;
+   }
+
+   /* Set the page to be freed using drmm release action */
+   if (drmm_add_action_or_reset(ddev, ttm_bo_release_dummy_page, page))
+   return VM_FAULT_OOM;
+
+   return ret;
+}
+EXPORT_SYMBOL(ttm_bo_vm_dummy_page);
+
 vm_fault_t ttm_bo_vm_fault(struct vm_fault *vmf)
 {
struct vm_area_struct *vma = vmf->vma;
pgprot_t prot;
struct ttm_buffer_object *bo = vma->vm_private_data;
+   struct drm_device *ddev = bo->base.dev;
vm_fault_t ret;
+   int idx;
 
ret = ttm_bo_vm_reserve(bo, vmf);
if (ret)
return ret;
 
prot = vma->vm_page_prot;
-   ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
+   if (drm_dev_enter(ddev, &idx)) {
+   ret = ttm_bo_vm_fault_reserved(vmf, prot, 
TTM_BO_VM_NUM_PREFAULT, 1);
+   drm_dev_exit(idx);
+   } else {
+   ret = ttm_bo_vm_dummy_page(vmf, prot);
+   }
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
return ret;
 
diff --git a/include/drm/ttm/ttm_bo_api.h b/include/drm/ttm/ttm_bo_api.h
index 3587f660e8f4..dbb00e495cb4 100644
--- a/include/drm/ttm/ttm_bo_api.h
+++ b/include/drm/ttm/ttm_bo_api.h
@@ -635,4 +635,6 @@ int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned 
long addr,
 void *buf, int len, int write);
 bool ttm_bo_delayed_delete(struct ttm_device *bdev, bool remove_all);
 
+vm_fault_t ttm_bo_vm_dummy_page(struct vm_fault *vmf, pgprot_t prot);
+
 #endif
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 11/27] drm/sched: Make timeout timer rearm conditional.

2021-04-28 Thread Andrey Grodzovsky
We don't want to rearm the timer if the driver hook reports
that the device is gone.

v5: Update drm_gpu_sched_stat values in code.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/scheduler/sched_main.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index d82a7ebf6099..908b0b56032d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -314,6 +314,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 {
struct drm_gpu_scheduler *sched;
struct drm_sched_job *job;
+   enum drm_gpu_sched_stat status = DRM_GPU_SCHED_STAT_NOMINAL;
 
sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
 
@@ -331,7 +332,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
list_del_init(&job->list);
spin_unlock(&sched->job_list_lock);
 
-   job->sched->ops->timedout_job(job);
+   status = job->sched->ops->timedout_job(job);
 
/*
 * Guilty job did complete and hence needs to be manually 
removed
@@ -345,9 +346,11 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
spin_unlock(&sched->job_list_lock);
}
 
-   spin_lock(&sched->job_list_lock);
-   drm_sched_start_timeout(sched);
-   spin_unlock(&sched->job_list_lock);
+   if (status != DRM_GPU_SCHED_STAT_ENODEV) {
+   spin_lock(&sched->job_list_lock);
+   drm_sched_start_timeout(sched);
+   spin_unlock(&sched->job_list_lock);
+   }
 }
 
  /**
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 04/27] drm/amdkfd: Split kfd suspend from device exit

2021-04-28 Thread Andrey Grodzovsky
Helps to expedite HW related stuff to amdgpu_pci_remove.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_device.c| 3 ++-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index c5343a5eecbe..9edb35ba181b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -169,7 +169,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
}
 }
 
-void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev)
+void amdgpu_amdkfd_device_fini_sw(struct amdgpu_device *adev)
 {
if (adev->kfd.dev) {
kgd2kfd_device_exit(adev->kfd.dev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index a81d9cacf9b8..c51001602a68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -126,7 +126,7 @@ void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
const void *ih_ring_entry);
 void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev);
-void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
+void amdgpu_amdkfd_device_fini_sw(struct amdgpu_device *adev);
 int amdgpu_amdkfd_submit_ib(struct kgd_dev *kgd, enum kgd_engine_type engine,
uint32_t vmid, uint64_t gpu_addr,
uint32_t *ib_cmd, uint32_t ib_len);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 72c893fff61a..1bb8bc6d85f5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -833,10 +833,11 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
return kfd->init_complete;
 }
 
+
+
 void kgd2kfd_device_exit(struct kfd_dev *kfd)
 {
if (kfd->init_complete) {
-   kgd2kfd_suspend(kfd, false);
device_queue_manager_uninit(kfd->dqm);
kfd_interrupt_exit(kfd);
kfd_topology_remove_device(kfd);
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 07/27] drm/amdgpu: Remap all page faults to per process dummy page.

2021-04-28 Thread Andrey Grodzovsky
On device removal reroute all CPU mappings to dummy page
per drm_file instance or imported GEM object.

v4:
Update for modified ttm_bo_vm_dummy_page

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 21 -
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index a785acc09f20..93163b220e46 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -49,6 +49,7 @@
 
 #include 
 #include 
+#include 
 
 #include "amdgpu.h"
 #include "amdgpu_object.h"
@@ -1982,18 +1983,28 @@ void amdgpu_ttm_set_buffer_funcs_status(struct 
amdgpu_device *adev, bool enable)
 static vm_fault_t amdgpu_ttm_fault(struct vm_fault *vmf)
 {
struct ttm_buffer_object *bo = vmf->vma->vm_private_data;
+   struct drm_device *ddev = bo->base.dev;
vm_fault_t ret;
+   int idx;
 
ret = ttm_bo_vm_reserve(bo, vmf);
if (ret)
return ret;
 
-   ret = amdgpu_bo_fault_reserve_notify(bo);
-   if (ret)
-   goto unlock;
+   if (drm_dev_enter(ddev, &idx)) {
+   ret = amdgpu_bo_fault_reserve_notify(bo);
+   if (ret) {
+   drm_dev_exit(idx);
+   goto unlock;
+   }
 
-   ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
-  TTM_BO_VM_NUM_PREFAULT, 1);
+ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
+   TTM_BO_VM_NUM_PREFAULT, 1);
+
+drm_dev_exit(idx);
+   } else {
+   ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot);
+   }
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
return ret;
 
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 14/27] drm/amdgpu: Fix hang on device removal.

2021-04-28 Thread Andrey Grodzovsky
If the device is removed while commands are in flight, you cannot wait
to flush the HW fences on a ring since the device is gone.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index fd9282637549..2670201e78d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -37,6 +37,7 @@
 #include 
 
 #include 
+#include 
 
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
@@ -525,8 +526,7 @@ int amdgpu_fence_driver_init(struct amdgpu_device *adev)
  */
 void amdgpu_fence_driver_fini_hw(struct amdgpu_device *adev)
 {
-   unsigned i, j;
-   int r;
+   int i, r;
 
for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
struct amdgpu_ring *ring = adev->rings[i];
@@ -539,11 +539,15 @@ void amdgpu_fence_driver_fini_hw(struct amdgpu_device 
*adev)
if (!ring->no_scheduler)
drm_sched_fini(&ring->sched);
 
-   r = amdgpu_fence_wait_empty(ring);
-   if (r) {
-   /* no need to trigger GPU reset as we are unloading */
+   /* You can't wait for HW to signal if it's gone */
+   if (!drm_dev_is_unplugged(&adev->ddev))
+   r = amdgpu_fence_wait_empty(ring);
+   else
+   r = -ENODEV;
+   /* no need to trigger GPU reset as we are unloading */
+   if (r)
amdgpu_fence_driver_force_completion(ring);
-   }
+
if (ring->fence_drv.irq_src)
amdgpu_irq_put(adev, ring->fence_drv.irq_src,
   ring->fence_drv.irq_type);
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 02/27] drm/ttm: Expose ttm_tt_unpopulate for driver use

2021-04-28 Thread Andrey Grodzovsky
It's needed to drop IOMMU backed pages on device unplug
before the device's IOMMU group is released.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/ttm/ttm_tt.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 48c407cff112..f2ce1b372096 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -431,3 +431,4 @@ void ttm_tt_mgr_init(unsigned long num_pages, unsigned long 
num_dma32_pages)
if (!ttm_dma32_pages_limit)
ttm_dma32_pages_limit = num_dma32_pages;
 }
+EXPORT_SYMBOL(ttm_tt_unpopulate);
-- 
2.25.1
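
The intended use in a driver teardown path is roughly this (sketch; the
amdgpu user of this export appears in another patch of the series):

/* Called while tearing the device down, before it leaves its IOMMU group. */
static void example_drop_dma_mappings(struct amdgpu_bo *bo)
{
	/* Only BOs that still have a populated TT carry DMA mappings. */
	if (bo->tbo.ttm)
		ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
}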

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 20/27] drm: Scope all DRM IOCTLs with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
With this, calling drm_dev_unplug will flush and block
all in-flight IOCTLs.

Also, add a feature flag such that if the device supports graceful
unplug we enclose the entire IOCTL in an SRCU critical section.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/drm_ioctl.c | 15 +--
 include/drm/drm_drv.h   |  6 ++
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c
index d273d1a8603a..5882ef2183bb 100644
--- a/drivers/gpu/drm/drm_ioctl.c
+++ b/drivers/gpu/drm/drm_ioctl.c
@@ -815,7 +815,7 @@ long drm_ioctl(struct file *filp,
const struct drm_ioctl_desc *ioctl = NULL;
drm_ioctl_t *func;
unsigned int nr = DRM_IOCTL_NR(cmd);
-   int retcode = -EINVAL;
+   int idx, retcode = -EINVAL;
char stack_kdata[128];
char *kdata = NULL;
unsigned int in_size, out_size, drv_size, ksize;
@@ -884,7 +884,18 @@ long drm_ioctl(struct file *filp,
if (ksize > in_size)
memset(kdata + in_size, 0, ksize - in_size);
 
-   retcode = drm_ioctl_kernel(filp, func, kdata, ioctl->flags);
+   if (drm_core_check_feature(dev, DRIVER_HOTUNPLUG_SUPPORT)) {
+   if (drm_dev_enter(dev, &idx)) {
+   retcode = drm_ioctl_kernel(filp, func, kdata, 
ioctl->flags);
+   drm_dev_exit(idx);
+   } else {
+   retcode = -ENODEV;
+   goto err_i1;
+   }
+   } else {
+   retcode = drm_ioctl_kernel(filp, func, kdata, ioctl->flags);
+   }
+
if (copy_to_user((void __user *)arg, kdata, out_size) != 0)
retcode = -EFAULT;
 
diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h
index b439ae1921b8..63e05cec46c1 100644
--- a/include/drm/drm_drv.h
+++ b/include/drm/drm_drv.h
@@ -94,6 +94,12 @@ enum drm_driver_feature {
 * synchronization of command submission.
 */
DRIVER_SYNCOBJ_TIMELINE = BIT(6),
+   /**
+* @DRIVER_HOTUNPLUG_SUPPORT:
+*
+* Driver supports graceful removal (hot-unplug).
+*/
+   DRIVER_HOTUNPLUG_SUPPORT = BIT(7),
 
/* IMPORTANT: Below are all the legacy flags, add new ones above. */
 
-- 
2.25.1
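
A driver opting in would advertise the new capability in its feature
mask along these lines (sketch only; whether and where amdgpu sets it
is handled elsewhere in the series):

static const struct drm_driver example_driver = {
	.driver_features = DRIVER_GEM | DRIVER_RENDER | DRIVER_SYNCOBJ |
			   DRIVER_HOTUNPLUG_SUPPORT,
	/* ... fops, ioctl tables, etc. unchanged ... */
};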

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 26/27] drm/amd/display: Remove superfluous drm_mode_config_cleanup

2021-04-28 Thread Andrey Grodzovsky
It's already being released by DRM core through devm

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index c0b9abb773a4..b9aa15f22cfc 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -3590,7 +3590,6 @@ static int amdgpu_dm_initialize_drm_device(struct 
amdgpu_device *adev)
 
 static void amdgpu_dm_destroy_drm_device(struct amdgpu_display_manager *dm)
 {
-   drm_mode_config_cleanup(dm->ddev);
drm_atomic_private_obj_fini(&dm->atomic_obj);
return;
 }
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 18/27] drm/sched: Expose drm_sched_entity_kill_jobs

2021-04-28 Thread Andrey Grodzovsky
Will be used to complete all scheduled fences on device
removal.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/scheduler/sched_entity.c | 3 ++-
 include/drm/gpu_scheduler.h  | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index cb58f692dad9..9ff4bfd8f548 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -219,7 +219,7 @@ static void drm_sched_entity_kill_jobs_cb(struct dma_fence 
*f,
  * Makes sure that all remaining jobs in an entity are killed before it is
  * destroyed.
  */
-static void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
+void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity)
 {
struct drm_sched_job *job;
int r;
@@ -249,6 +249,7 @@ static void drm_sched_entity_kill_jobs(struct 
drm_sched_entity *entity)
DRM_ERROR("fence add callback failed (%d)\n", r);
}
 }
+EXPORT_SYMBOL(drm_sched_entity_kill_jobs);
 
 /**
  * drm_sched_entity_cleanup - Destroy a context entity
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index f888b5e9583a..9601d5b966ba 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -339,6 +339,7 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
  unsigned int num_sched_list,
  atomic_t *guilty);
 long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout);
+void drm_sched_entity_kill_jobs(struct drm_sched_entity *entity);
 void drm_sched_entity_fini(struct drm_sched_entity *entity);
 void drm_sched_entity_destroy(struct drm_sched_entity *entity);
 void drm_sched_entity_select_rq(struct drm_sched_entity *entity);
-- 
2.25.1
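
The expected call site looks roughly like this (sketch with made-up
driver structures; the actual hook-up lives elsewhere in the series):

struct example_entity {
	struct drm_sched_entity base;
	struct list_head node;		/* linked on example_device.entities */
};

static void example_kill_pending_jobs(struct example_device *edev)
{
	struct example_entity *e;

	/* Force-complete every job still queued on an entity; the scheduler
	 * threads have already been stopped at this point. */
	list_for_each_entry(e, &edev->entities, node)
		drm_sched_entity_kill_jobs(&e->base);
}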

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 15/27] drm/scheduler: Fix hang when sched_entity released

2021-04-28 Thread Andrey Grodzovsky
Problem: If the scheduler is already stopped by the time sched_entity
is released and the entity's job_queue is not empty, I encountered
a hang in drm_sched_entity_flush. This is because drm_sched_entity_is_idle
never becomes true.

Fix: In drm_sched_fini detach all sched_entities from the
scheduler's run queues. This will satisfy drm_sched_entity_is_idle.
Also wake up all those processes stuck in sched_entity flushing,
as the scheduler main thread which would wake them up is stopped by now.

v2:
Reverse order of drm_sched_rq_remove_entity and marking
s_entity as stopped to prevent reinsertion back to the rq due
to a race.

v3:
Drop drm_sched_rq_remove_entity, only modify entity->stopped
and check for it in drm_sched_entity_is_idle

Signed-off-by: Andrey Grodzovsky 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/scheduler/sched_entity.c |  3 ++-
 drivers/gpu/drm/scheduler/sched_main.c   | 24 
 2 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index f0790e9471d1..cb58f692dad9 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -116,7 +116,8 @@ static bool drm_sched_entity_is_idle(struct 
drm_sched_entity *entity)
rmb(); /* for list_empty to work without lock */
 
if (list_empty(&entity->list) ||
-   spsc_queue_count(&entity->job_queue) == 0)
+   spsc_queue_count(&entity->job_queue) == 0 ||
+   entity->stopped)
return true;
 
return false;
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 908b0b56032d..ba087354d0a8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -897,9 +897,33 @@ EXPORT_SYMBOL(drm_sched_init);
  */
 void drm_sched_fini(struct drm_gpu_scheduler *sched)
 {
+   struct drm_sched_entity *s_entity;
+   int i;
+
if (sched->thread)
kthread_stop(sched->thread);
 
+   for (i = DRM_SCHED_PRIORITY_COUNT - 1; i >= DRM_SCHED_PRIORITY_MIN; 
i--) {
+   struct drm_sched_rq *rq = &sched->sched_rq[i];
+
+   if (!rq)
+   continue;
+
+   spin_lock(&rq->lock);
+   list_for_each_entry(s_entity, &rq->entities, list)
+   /*
+* Prevents reinsertion and marks job_queue as idle,
+* it will removed from rq in drm_sched_entity_fini
+* eventually
+*/
+   s_entity->stopped = true;
+   spin_unlock(&rq->lock);
+
+   }
+
+   /* Wakeup everyone stuck in drm_sched_entity_flush for this scheduler */
+   wake_up_all(&sched->job_scheduled);
+
/* Confirm no work left behind accessing device structures */
cancel_delayed_work_sync(&sched->work_tdr);
 
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 17/27] drm/amdgpu: Add rw_sem to pushing job into sched queue

2021-04-28 Thread Andrey Grodzovsky
Will later be used to block further submissions once the device is
removed. Also complete the scheduler fences if scheduling failed
due to submission being blocked.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  3 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 13 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 14 +-
 4 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 3e4755fc10c8..0db0ba4fba89 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1057,6 +1057,9 @@ struct amdgpu_device {
 
struct list_headdevice_bo_list;
 
+   boolstop_job_submissions;
+   struct rw_semaphore sched_fence_completion_sem;
+
/* List of all MMIO BOs */
struct list_headmmio_list;
struct mutexmmio_list_lock;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 3e240b952e79..ac092a5eb4e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1256,7 +1256,18 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 
trace_amdgpu_cs_ioctl(job);
amdgpu_vm_bo_trace_cs(&fpriv->vm, &p->ticket);
-   drm_sched_entity_push_job(&job->base, entity);
+
+   down_read(&p->adev->sched_fence_completion_sem);
+   if (!p->adev->stop_job_submissions) {
+   drm_sched_entity_push_job(&job->base, entity);
+   } else {
+   dma_fence_set_error(&job->base.s_fence->scheduled, -ENODEV);
+   dma_fence_set_error(&job->base.s_fence->finished, -ENODEV);
+   dma_fence_signal(&job->base.s_fence->scheduled);
+   dma_fence_signal(&job->base.s_fence->finished);
+   }
+
+   up_read(&p->adev->sched_fence_completion_sem);
 
amdgpu_vm_move_to_lru_tail(p->adev, &fpriv->vm);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3ddad6cba62d..33e8e9e1d1fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3302,6 +3302,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
init_rwsem(&adev->reset_sem);
mutex_init(&adev->psp.mutex);
mutex_init(&adev->notifier_lock);
+   init_rwsem(&adev->sched_fence_completion_sem);
 
r = amdgpu_device_check_arguments(adev);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index d33e6d97cc89..26d8b79ea165 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -162,6 +162,7 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct 
drm_sched_entity *entity,
  void *owner, struct dma_fence **f)
 {
int r;
+   struct amdgpu_ring *ring = to_amdgpu_ring(job->base.sched);
 
if (!f)
return -EINVAL;
@@ -172,7 +173,18 @@ int amdgpu_job_submit(struct amdgpu_job *job, struct 
drm_sched_entity *entity,
 
*f = dma_fence_get(&job->base.s_fence->finished);
amdgpu_job_free_resources(job);
-   drm_sched_entity_push_job(&job->base, entity);
+
+   down_read(&ring->adev->sched_fence_completion_sem);
+   if (!ring->adev->stop_job_submissions) {
+   drm_sched_entity_push_job(&job->base, entity);
+   } else {
+   dma_fence_set_error(&job->base.s_fence->scheduled, -ENODEV);
+   dma_fence_set_error(&job->base.s_fence->finished, -ENODEV);
+   dma_fence_signal(&job->base.s_fence->scheduled);
+   dma_fence_signal(&job->base.s_fence->finished);
+
+   }
+   up_read(&ring->adev->sched_fence_completion_sem);
 
return 0;
 }
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-04-28 Thread Andrey Grodzovsky
Handle all DMA IOMMU group related dependencies before the
group is removed.

v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |  3 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c|  9 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  2 ++
 drivers/gpu/drm/amd/amdgpu/cik_ih.c|  1 -
 drivers/gpu/drm/amd/amdgpu/cz_ih.c |  1 -
 drivers/gpu/drm/amd/amdgpu/iceland_ih.c|  1 -
 drivers/gpu/drm/amd/amdgpu/navi10_ih.c |  3 ---
 drivers/gpu/drm/amd/amdgpu/si_ih.c |  1 -
 drivers/gpu/drm/amd/amdgpu/tonga_ih.c  |  1 -
 drivers/gpu/drm/amd/amdgpu/vega10_ih.c |  3 ---
 14 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index fddb82897e5d..30a24db5f4d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1054,6 +1054,8 @@ struct amdgpu_device {
 
boolin_pci_err_recovery;
struct pci_saved_state  *pci_state;
+
+   struct list_headdevice_bo_list;
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 46d646c40338..91594ddc2459 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -70,6 +70,7 @@
 #include 
 #include 
 
+
 MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -3211,7 +3212,6 @@ static const struct attribute *amdgpu_dev_attributes[] = {
NULL
 };
 
-
 /**
  * amdgpu_device_init - initialize the driver
  *
@@ -3316,6 +3316,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 
INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
 
+   INIT_LIST_HEAD(&adev->device_bo_list);
+
adev->gfx.gfx_off_req_count = 1;
adev->pm.ac_power = power_supply_is_system_supplied() > 0;
 
@@ -3601,6 +3603,28 @@ int amdgpu_device_init(struct amdgpu_device *adev,
return r;
 }
 
+static void amdgpu_clear_dma_mappings(struct amdgpu_device *adev)
+{
+   struct amdgpu_bo *bo = NULL;
+
+   /*
+* Unmaps all DMA mappings before device will be removed from it's
+* IOMMU group otherwise in case of IOMMU enabled system a crash
+* will happen.
+*/
+
+   spin_lock(&adev->mman.bdev.lru_lock);
+   while (!list_empty(&adev->device_bo_list)) {
+   bo = list_first_entry(&adev->device_bo_list, struct amdgpu_bo, 
bo);
+   list_del_init(&bo->bo);
+   spin_unlock(&adev->mman.bdev.lru_lock);
+   if (bo->tbo.ttm)
+   ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
+   spin_lock(&adev->mman.bdev.lru_lock);
+   }
+   spin_unlock(&adev->mman.bdev.lru_lock);
+}
+
 /**
  * amdgpu_device_fini - tear down the driver
  *
@@ -3639,12 +3663,15 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
amdgpu_ucode_sysfs_fini(adev);
sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
 
-
amdgpu_fbdev_fini(adev);
 
amdgpu_irq_fini_hw(adev);
 
amdgpu_device_ip_fini_early(adev);
+
+   amdgpu_clear_dma_mappings(adev);
+
+   amdgpu_gart_dummy_page_fini(adev);
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index fde2d899b2c4..49cdcaf8512d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device 
*adev)
  *
  * Frees the dummy page used by the driver (all asics).
  */
-static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
+void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
 {
if (!adev->dummy_page_addr)
return;
@@ -397,5 +397,4 @@ void amdgpu_gart_fini(struct amdgpu_device *adev)
vfree(adev->gart.pages);
adev->gart.pages = NULL;
 #endif
-   amdgpu_gart_dummy_page_fini(adev);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
index afa2e2877d87..5678d9c105ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h
@@ -61,6 +61,7 @@ int amdgpu_gart_table_vram_pin(struct amdgpu_device *adev);
 void amdgpu_gart_table_vram_unpin(struct amdgpu_device *adev);
 int amdgpu_gart_init(struct a

[PATCH v5 09/27] drm/amdgpu: Move some sysfs attrs creation to default_attr

2021-04-28 Thread Andrey Grodzovsky
This allows removing the explicit creation and destruction
of those attrs and thereby avoids warnings during device
finalization post physical device extraction.

v5: Use newly added pci_driver.dev_groups directly

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c | 17 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 13 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c  | 25 
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 14 ---
 4 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
index 86add0f4ea4d..0346e124ab8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atombios.c
@@ -1953,6 +1953,15 @@ static ssize_t amdgpu_atombios_get_vbios_version(struct 
device *dev,
 static DEVICE_ATTR(vbios_version, 0444, amdgpu_atombios_get_vbios_version,
   NULL);
 
+static struct attribute *amdgpu_vbios_version_attrs[] = {
+   &dev_attr_vbios_version.attr,
+   NULL
+};
+
+const struct attribute_group amdgpu_vbios_version_attr_group = {
+   .attrs = amdgpu_vbios_version_attrs
+};
+
 /**
  * amdgpu_atombios_fini - free the driver info and callbacks for atombios
  *
@@ -1972,7 +1981,6 @@ void amdgpu_atombios_fini(struct amdgpu_device *adev)
adev->mode_info.atom_context = NULL;
kfree(adev->mode_info.atom_card_info);
adev->mode_info.atom_card_info = NULL;
-   device_remove_file(adev->dev, &dev_attr_vbios_version);
 }
 
 /**
@@ -1989,7 +1997,6 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
 {
struct card_info *atom_card_info =
kzalloc(sizeof(struct card_info), GFP_KERNEL);
-   int ret;
 
if (!atom_card_info)
return -ENOMEM;
@@ -2027,12 +2034,6 @@ int amdgpu_atombios_init(struct amdgpu_device *adev)
amdgpu_atombios_allocate_fb_scratch(adev);
}
 
-   ret = device_create_file(adev->dev, &dev_attr_vbios_version);
-   if (ret) {
-   DRM_ERROR("Failed to create device file for VBIOS version\n");
-   return ret;
-   }
-
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 54cb5ee2f563..f799c40d7e72 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1605,6 +1605,18 @@ static struct pci_error_handlers amdgpu_pci_err_handler 
= {
.resume = amdgpu_pci_resume,
 };
 
+extern const struct attribute_group amdgpu_vram_mgr_attr_group;
+extern const struct attribute_group amdgpu_gtt_mgr_attr_group;
+extern const struct attribute_group amdgpu_vbios_version_attr_group;
+
+static const struct attribute_group *amdgpu_sysfs_groups[] = {
+   &amdgpu_vram_mgr_attr_group,
+   &amdgpu_gtt_mgr_attr_group,
+   &amdgpu_vbios_version_attr_group,
+   NULL,
+};
+
+
 static struct pci_driver amdgpu_kms_pci_driver = {
.name = DRIVER_NAME,
.id_table = pciidlist,
@@ -1613,6 +1625,7 @@ static struct pci_driver amdgpu_kms_pci_driver = {
.shutdown = amdgpu_pci_shutdown,
.driver.pm = &amdgpu_pm_ops,
.err_handler = &amdgpu_pci_err_handler,
+   .dev_groups = amdgpu_sysfs_groups,
 };
 
 static int __init amdgpu_init(void)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 8980329cded0..3b7150e1c5ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -77,6 +77,16 @@ static DEVICE_ATTR(mem_info_gtt_total, S_IRUGO,
 static DEVICE_ATTR(mem_info_gtt_used, S_IRUGO,
   amdgpu_mem_info_gtt_used_show, NULL);
 
+static struct attribute *amdgpu_gtt_mgr_attributes[] = {
+   &dev_attr_mem_info_gtt_total.attr,
+   &dev_attr_mem_info_gtt_used.attr,
+   NULL
+};
+
+const struct attribute_group amdgpu_gtt_mgr_attr_group = {
+   .attrs = amdgpu_gtt_mgr_attributes
+};
+
 static const struct ttm_resource_manager_func amdgpu_gtt_mgr_func;
 /**
  * amdgpu_gtt_mgr_init - init GTT manager and DRM MM
@@ -91,7 +101,6 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, uint64_t 
gtt_size)
struct amdgpu_gtt_mgr *mgr = &adev->mman.gtt_mgr;
struct ttm_resource_manager *man = &mgr->manager;
uint64_t start, size;
-   int ret;
 
man->use_tt = true;
man->func = &amdgpu_gtt_mgr_func;
@@ -104,17 +113,6 @@ int amdgpu_gtt_mgr_init(struct amdgpu_device *adev, 
uint64_t gtt_size)
spin_lock_init(&mgr->lock);
atomic64_set(&mgr->available, gtt_size >> PAGE_SHIFT);
 
-   ret = device_create_file(adev->dev, &dev_attr_mem_info_gtt_total);
-   if (ret) {
-   DRM_ERROR("Failed to create device file mem_info_gtt_total\n");
-   return ret;
-   }
-   ret = device_create_file(adev->dev

[PATCH v5 16/27] drm/amdgpu: Unmap all MMIO mappings

2021-04-28 Thread Andrey Grodzovsky
Access to those mappings must be prevented post pci_remove

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  5 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 28 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  5 +++
 4 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 30a24db5f4d1..3e4755fc10c8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1056,6 +1056,11 @@ struct amdgpu_device {
struct pci_saved_state  *pci_state;
 
struct list_headdevice_bo_list;
+
+   /* List of all MMIO BOs */
+   struct list_headmmio_list;
+   struct mutexmmio_list_lock;
+
 };
 
 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 22b09c4db255..3ddad6cba62d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3320,6 +3320,9 @@ int amdgpu_device_init(struct amdgpu_device *adev,
INIT_LIST_HEAD(&adev->shadow_list);
mutex_init(&adev->shadow_list_lock);
 
+   INIT_LIST_HEAD(&adev->mmio_list);
+   mutex_init(&adev->mmio_list_lock);
+
INIT_DELAYED_WORK(&adev->delayed_init_work,
  amdgpu_device_delayed_init_work_handler);
INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
@@ -3636,6 +3639,36 @@ static void amdgpu_clear_dma_mappings(struct 
amdgpu_device *adev)
spin_unlock(&adev->mman.bdev.lru_lock);
 }
 
+static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
+{
+   struct amdgpu_bo *bo;
+
+   /* Clear all CPU mappings pointing to this device */
+   unmap_mapping_range(adev->ddev.anon_inode->i_mapping, 0, 0, 1);
+
+   /* Unmap all MMIO mapped kernel BOs */
+   mutex_lock(&adev->mmio_list_lock);
+   list_for_each_entry(bo, &adev->mmio_list, mmio_list) {
+   amdgpu_bo_kunmap(bo);
+   if (*bo->kmap_ptr)
+   *bo->kmap_ptr = NULL;
+   }
+   mutex_unlock(&adev->mmio_list_lock);
+
+   /* Unmap all mapped bars - Doorbell, registers and VRAM */
+   amdgpu_device_doorbell_fini(adev);
+
+   iounmap(adev->rmmio);
+   adev->rmmio = NULL;
+   if (adev->mman.aper_base_kaddr)
+   iounmap(adev->mman.aper_base_kaddr);
+   adev->mman.aper_base_kaddr = NULL;
+
+   /* Memory manager related */
+   arch_phys_wc_del(adev->gmc.vram_mtrr);
+   arch_io_free_memtype_wc(adev->gmc.aper_base, adev->gmc.aper_size);
+}
+
 /**
  * amdgpu_device_fini - tear down the driver
  *
@@ -3683,6 +3716,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
amdgpu_clear_dma_mappings(adev);
 
amdgpu_gart_dummy_page_fini(adev);
+
+   amdgpu_device_unmap_mmio(adev);
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
@@ -3713,9 +3748,6 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
if (adev->rio_mem)
pci_iounmap(adev->pdev, adev->rio_mem);
adev->rio_mem = NULL;
-   iounmap(adev->rmmio);
-   adev->rmmio = NULL;
-   amdgpu_device_doorbell_fini(adev);
 
if (IS_ENABLED(CONFIG_PERF_EVENTS))
amdgpu_pmu_fini(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 62d829f5e62c..9b05e3b96fa0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -531,6 +531,9 @@ static int amdgpu_bo_do_create(struct amdgpu_device *adev,
return -ENOMEM;
drm_gem_private_object_init(adev_to_drm(adev), &bo->tbo.base, size);
INIT_LIST_HEAD(&bo->shadow_list);
+
+   INIT_LIST_HEAD(&bo->mmio_list);
+
bo->vm_bo = NULL;
bo->preferred_domains = bp->preferred_domain ? bp->preferred_domain :
bp->domain;
@@ -774,9 +777,21 @@ int amdgpu_bo_kmap(struct amdgpu_bo *bo, void **ptr)
if (r)
return r;
 
-   if (ptr)
+   if (bo->kmap.bo_kmap_type == ttm_bo_map_iomap) {
+   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+
+   mutex_lock(&adev->mmio_list_lock);
+   list_add_tail(&bo->mmio_list, &adev->mmio_list);
+   mutex_unlock(&adev->mmio_list_lock);
+   }
+
+   if (ptr) {
*ptr = amdgpu_bo_kptr(bo);
 
+   if (bo->kmap.bo_kmap_type == ttm_bo_map_iomap)
+   bo->kmap_ptr = ptr;
+   }
+
return 0;
 }
 
@@ -804,8 +819,17 @@ void *amdgpu_bo_kptr(struct amdgpu_bo *bo)
  */
 void amdgpu_bo_kunmap(struct amdgpu_bo *bo)
 {
-   if (bo->kmap.bo)
+   struct amdgpu_device *a

[PATCH v5 13/27] drm/amdgpu: When finalizing the fence driver, stop the scheduler first.

2021-04-28 Thread Andrey Grodzovsky
There is no point in calling amdgpu_fence_wait_empty before stopping the
SW scheduler; otherwise there is always a chance another job sneaks
in after the wait.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 34d51e962799..fd9282637549 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -533,6 +533,12 @@ void amdgpu_fence_driver_fini_hw(struct amdgpu_device 
*adev)
 
if (!ring || !ring->fence_drv.initialized)
continue;
+
+   /* Stop any new job submissions from sched before flushing the 
ring */
+   /* TODO Handle amdgpu_job_submit_direct and 
amdgpu_amdkfd_submit_ib */
+   if (!ring->no_scheduler)
+   drm_sched_fini(&ring->sched);
+
r = amdgpu_fence_wait_empty(ring);
if (r) {
/* no need to trigger GPU reset as we are unloading */
@@ -541,8 +547,7 @@ void amdgpu_fence_driver_fini_hw(struct amdgpu_device *adev)
if (ring->fence_drv.irq_src)
amdgpu_irq_put(adev, ring->fence_drv.irq_src,
   ring->fence_drv.irq_type);
-   if (!ring->no_scheduler)
-   drm_sched_fini(&ring->sched);
+
del_timer_sync(&ring->fence_drv.fallback_timer);
}
 }
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 10/27] drm/amdgpu: Guard against write accesses after device removal

2021-04-28 Thread Andrey Grodzovsky
This should prevent writing to memory or IO ranges possibly
already allocated for other uses after our device is removed.

v5:
Protect more places where memcpy_to/from_io takes place
Protect IB submissions

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  75 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c   |   9 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c| 228 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c   | 115 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h   |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |  70 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  49 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c   |  31 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c   |  11 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c   |  22 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|   7 +-
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c|  44 ++--
 drivers/gpu/drm/amd/amdgpu/psp_v12_0.c|   8 +-
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c |   8 +-
 drivers/gpu/drm/amd/amdgpu/vce_v4_0.c |  26 +-
 drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c |  22 +-
 .../drm/amd/pm/powerplay/smumgr/smu7_smumgr.c |   2 +
 17 files changed, 425 insertions(+), 305 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 91594ddc2459..22b09c4db255 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -71,6 +71,8 @@
 #include 
 
 
+#include 
+
 MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
 MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -279,48 +281,55 @@ void amdgpu_device_vram_access(struct amdgpu_device 
*adev, loff_t pos,
unsigned long flags;
uint32_t hi = ~0;
uint64_t last;
+   int idx;
 
+   if (drm_dev_enter(&adev->ddev, &idx)) {
 
 #ifdef CONFIG_64BIT
-   last = min(pos + size, adev->gmc.visible_vram_size);
-   if (last > pos) {
-   void __iomem *addr = adev->mman.aper_base_kaddr + pos;
-   size_t count = last - pos;
-
-   if (write) {
-   memcpy_toio(addr, buf, count);
-   mb();
-   amdgpu_asic_flush_hdp(adev, NULL);
-   } else {
-   amdgpu_asic_invalidate_hdp(adev, NULL);
-   mb();
-   memcpy_fromio(buf, addr, count);
-   }
+   last = min(pos + size, adev->gmc.visible_vram_size);
+   if (last > pos) {
+   void __iomem *addr = adev->mman.aper_base_kaddr + pos;
+   size_t count = last - pos;
+
+   if (write) {
+   memcpy_toio(addr, buf, count);
+   mb();
+   amdgpu_asic_flush_hdp(adev, NULL);
+   } else {
+   amdgpu_asic_invalidate_hdp(adev, NULL);
+   mb();
+   memcpy_fromio(buf, addr, count);
+   }
 
-   if (count == size)
-   return;
+   if (count == size) {
+   drm_dev_exit(idx);
+   return;
+   }
 
-   pos += count;
-   buf += count / 4;
-   size -= count;
-   }
+   pos += count;
+   buf += count / 4;
+   size -= count;
+   }
 #endif
 
-   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
-   for (last = pos + size; pos < last; pos += 4) {
-   uint32_t tmp = pos >> 31;
+   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
+   for (last = pos + size; pos < last; pos += 4) {
+   uint32_t tmp = pos >> 31;
 
-   WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)pos) | 0x8000);
-   if (tmp != hi) {
-   WREG32_NO_KIQ(mmMM_INDEX_HI, tmp);
-   hi = tmp;
+   WREG32_NO_KIQ(mmMM_INDEX, ((uint32_t)pos) | 0x8000);
+   if (tmp != hi) {
+   WREG32_NO_KIQ(mmMM_INDEX_HI, tmp);
+   hi = tmp;
+   }
+   if (write)
+   WREG32_NO_KIQ(mmMM_DATA, *buf++);
+   else
+   *buf++ = RREG32_NO_KIQ(mmMM_DATA);
}
-   if (write)
-   WREG32_NO_KIQ(mmMM_DATA, *buf++);
-   else
-   *buf++ = RREG32_NO_KIQ(mmMM_DATA);
+   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
+
+   drm_dev_exit(idx);
}
-  

[PATCH v5 21/27] drm/amdgpu: Add support for hot-unplug feature at DRM level.

2021-04-28 Thread Andrey Grodzovsky
To allow scoping DRM IOCTLs with drm_dev_enter/exit.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 8a19b8dd02ee..d0f34f230ef3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1579,7 +1579,7 @@ static const struct drm_driver amdgpu_kms_driver = {
DRIVER_ATOMIC |
DRIVER_GEM |
DRIVER_RENDER | DRIVER_MODESET | DRIVER_SYNCOBJ |
-   DRIVER_SYNCOBJ_TIMELINE,
+   DRIVER_SYNCOBJ_TIMELINE | DRIVER_HOTUNPLUG_SUPPORT,
.open = amdgpu_driver_open_kms,
.postclose = amdgpu_driver_postclose_kms,
.lastclose = amdgpu_driver_lastclose_kms,
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 23/27] drm/amd/powerplay: Scope all PM queued work with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
To allow completion of queued work and block further HW accesses post
device PCI remove.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 44 +--
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 26 +++---
 2 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 8fb12afe3c96..649e10d52d17 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -31,6 +31,7 @@
 #include "amdgpu_display.h"
 #include "hwmgr.h"
 #include 
+#include 
 
 #define WIDTH_4K 3840
 
@@ -1316,29 +1317,36 @@ void amdgpu_dpm_thermal_work_handler(struct work_struct 
*work)
/* switch to the thermal state */
enum amd_pm_state_type dpm_state = POWER_STATE_TYPE_INTERNAL_THERMAL;
int temp, size = sizeof(temp);
+   int idx;
 
if (!adev->pm.dpm_enabled)
return;
 
-   if (!amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GPU_TEMP,
-   (void *)&temp, &size)) {
-   if (temp < adev->pm.dpm.thermal.min_temp)
-   /* switch back the user state */
-   dpm_state = adev->pm.dpm.user_state;
-   } else {
-   if (adev->pm.dpm.thermal.high_to_low)
-   /* switch back the user state */
-   dpm_state = adev->pm.dpm.user_state;
-   }
-   mutex_lock(&adev->pm.mutex);
-   if (dpm_state == POWER_STATE_TYPE_INTERNAL_THERMAL)
-   adev->pm.dpm.thermal_active = true;
-   else
-   adev->pm.dpm.thermal_active = false;
-   adev->pm.dpm.state = dpm_state;
-   mutex_unlock(&adev->pm.mutex);
+   if (drm_dev_enter(&adev->ddev, &idx)) {
 
-   amdgpu_pm_compute_clocks(adev);
+   if (!amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GPU_TEMP,
+   (void *)&temp, &size)) {
+   if (temp < adev->pm.dpm.thermal.min_temp)
+   /* switch back the user state */
+   dpm_state = adev->pm.dpm.user_state;
+   } else {
+   if (adev->pm.dpm.thermal.high_to_low)
+   /* switch back the user state */
+   dpm_state = adev->pm.dpm.user_state;
+   }
+   mutex_lock(&adev->pm.mutex);
+   if (dpm_state == POWER_STATE_TYPE_INTERNAL_THERMAL)
+   adev->pm.dpm.thermal_active = true;
+   else
+   adev->pm.dpm.thermal_active = false;
+   adev->pm.dpm.state = dpm_state;
+   mutex_unlock(&adev->pm.mutex);
+
+   amdgpu_pm_compute_clocks(adev);
+
+   drm_dev_exit(idx);
+
+   }
 }
 
 static struct amdgpu_ps *amdgpu_dpm_pick_power_state(struct amdgpu_device 
*adev,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index d143ef1b460b..f034c8a5eb44 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+#include 
+
 #include "amdgpu.h"
 #include "amdgpu_smu.h"
 #include "smu_internal.h"
@@ -904,21 +906,35 @@ static void smu_throttling_logging_work_fn(struct 
work_struct *work)
 {
struct smu_context *smu = container_of(work, struct smu_context,
   throttling_logging_work);
+   int idx;
+
+
+   if (drm_dev_enter(&smu->adev->ddev, &idx)) {
+
+   smu_log_thermal_throttling(smu);
 
-   smu_log_thermal_throttling(smu);
+   drm_dev_exit(idx);
+   }
 }
 
 static void smu_interrupt_work_fn(struct work_struct *work)
 {
struct smu_context *smu = container_of(work, struct smu_context,
   interrupt_work);
+   int idx;
 
-   mutex_lock(&smu->mutex);
+   if (drm_dev_enter(&smu->adev->ddev, &idx)) {
 
-   if (smu->ppt_funcs && smu->ppt_funcs->interrupt_work)
-   smu->ppt_funcs->interrupt_work(smu);
+   mutex_lock(&smu->mutex);
 
-   mutex_unlock(&smu->mutex);
+   if (smu->ppt_funcs && smu->ppt_funcs->interrupt_work)
+   smu->ppt_funcs->interrupt_work(smu);
+
+   mutex_unlock(&smu->mutex);
+
+   drm_dev_exit(idx);
+
+   }
 }
 
 static int smu_sw_init(void *handle)
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 25/27] drm/amdgpu: Scope all amdgpu queued work with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
To allow completion of queued work and block further HW accesses post
device PCI remove.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 11 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 26 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c   | 28 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 55 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c| 43 ---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c| 30 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c| 61 --
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 10 +++-
 9 files changed, 189 insertions(+), 104 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 9edb35ba181b..f942496c2b35 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -30,6 +30,7 @@
 #include 
 #include "amdgpu_xgmi.h"
 #include 
+#include 
 
 /* Total memory size in system memory and all GPU VRAM. Used to
  * estimate worst case amount of memory to reserve for page tables
@@ -223,9 +224,15 @@ int amdgpu_amdkfd_post_reset(struct amdgpu_device *adev)
 void amdgpu_amdkfd_gpu_reset(struct kgd_dev *kgd)
 {
struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
+   int idx;
 
-   if (amdgpu_device_should_recover_gpu(adev))
-   amdgpu_device_gpu_recover(adev, NULL);
+   if (drm_dev_enter(&adev->ddev, &idx)) {
+
+   if (amdgpu_device_should_recover_gpu(adev))
+   amdgpu_device_gpu_recover(adev, NULL);
+
+   drm_dev_exit(idx);
+   }
 }
 
 int amdgpu_amdkfd_alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 55afc11c17e6..c30e0b0596a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2674,24 +2674,35 @@ static void 
amdgpu_device_delayed_init_work_handler(struct work_struct *work)
 {
struct amdgpu_device *adev =
container_of(work, struct amdgpu_device, 
delayed_init_work.work);
-   int r;
+   int r, idx;
 
-   r = amdgpu_ib_ring_tests(adev);
-   if (r)
-   DRM_ERROR("ib ring test failed (%d).\n", r);
+   if (drm_dev_enter(&adev->ddev, &idx)) {
+   r = amdgpu_ib_ring_tests(adev);
+   if (r)
+   DRM_ERROR("ib ring test failed (%d).\n", r);
+
+   drm_dev_exit(idx);
+   }
 }
 
 static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
 {
struct amdgpu_device *adev =
container_of(work, struct amdgpu_device, 
gfx.gfx_off_delay_work.work);
+   int idx;
+
+   if (drm_dev_enter(&adev->ddev, &idx)) {
+
+   mutex_lock(&adev->gfx.gfx_off_mutex);
+   if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
+   if (!amdgpu_dpm_set_powergating_by_smu(adev, 
AMD_IP_BLOCK_TYPE_GFX, true))
+   adev->gfx.gfx_off_state = true;
+   }
+   mutex_unlock(&adev->gfx.gfx_off_mutex);
+
+   drm_dev_exit(idx);
 
-   mutex_lock(&adev->gfx.gfx_off_mutex);
-   if (!adev->gfx.gfx_off_state && !adev->gfx.gfx_off_req_count) {
-   if (!amdgpu_dpm_set_powergating_by_smu(adev, 
AMD_IP_BLOCK_TYPE_GFX, true))
-   adev->gfx.gfx_off_state = true;
}
-   mutex_unlock(&adev->gfx.gfx_off_mutex);
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
index a922154953a7..5eda0d0fc974 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
@@ -188,8 +188,15 @@ static void amdgpu_irq_handle_ih1(struct work_struct *work)
 {
struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
  irq.ih1_work);
+   int idx;
 
-   amdgpu_ih_process(adev, &adev->irq.ih1);
+   if (drm_dev_enter(&adev->ddev, &idx)) {
+
+   amdgpu_ih_process(adev, &adev->irq.ih1);
+
+   drm_dev_exit(idx);
+
+   }
 }
 
 /**
@@ -203,8 +210,14 @@ static void amdgpu_irq_handle_ih2(struct work_struct *work)
 {
struct amdgpu_device *adev = container_of(work, struct amdgpu_device,
  irq.ih2_work);
+   int idx;
+
+   if (drm_dev_enter(&adev->ddev, &idx)) {
+
+   amdgpu_ih_process(adev, &adev->irq.ih2);
 
-   amdgpu_ih_process(adev, &adev->irq.ih2);
+   drm_dev_exit(idx);
+   }
 }
 
 /**
@@ -218,8 +231,15 @@ static void amdgpu_irq_handle_ih_soft(struct work_struct 
*work)
 {
struct amdgpu_device *adev = container_of(work, struct amdgpu_device,

[PATCH v5 24/27] drm/amdkfd: Scope all KFD queued work with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
To allow completion of queued work and block further HW accesses post
device PCI remove.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
index bc47f6a44456..563f02ab5b95 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c
@@ -43,8 +43,10 @@
 #include 
 #include 
 #include 
+#include 
 #include "kfd_priv.h"
 
+
 #define KFD_IH_NUM_ENTRIES 8192
 
 static void interrupt_wq(struct work_struct *);
@@ -145,15 +147,21 @@ static void interrupt_wq(struct work_struct *work)
struct kfd_dev *dev = container_of(work, struct kfd_dev,
interrupt_work);
uint32_t ih_ring_entry[KFD_MAX_RING_ENTRY_SIZE];
+   int idx;
 
if (dev->device_info->ih_ring_entry_size > sizeof(ih_ring_entry)) {
dev_err_once(kfd_chardev(), "Ring entry too small\n");
return;
}
 
-   while (dequeue_ih_ring_entry(dev, ih_ring_entry))
-   dev->device_info->event_interrupt_class->interrupt_wq(dev,
-   ih_ring_entry);
+   if (drm_dev_enter(dev->ddev, &idx)) {
+
+   while (dequeue_ih_ring_entry(dev, ih_ring_entry))
+   
dev->device_info->event_interrupt_class->interrupt_wq(dev,
+   
ih_ring_entry);
+
+   drm_dev_exit(idx);
+   }
 }
 
 bool interrupt_is_wanted(struct kfd_dev *dev,
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 19/27] drm/amdgpu: Finalize device fences on device remove.

2021-04-28 Thread Andrey Grodzovsky
Make sure all fences dependent on the HW being present are force signaled
when handling device removal. This helps later to scope all HW
accessing code such as IOCTLs in drm_dev_enter/exit and use
drm_dev_unplug as a synchronization point past which we know the HW
will not be accessed anymore outside of the PCI remove driver callback.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 98 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +--
 4 files changed, 103 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0db0ba4fba89..df6c5ed676b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1374,6 +1374,8 @@ void amdgpu_pci_resume(struct pci_dev *pdev);
 bool amdgpu_device_cache_pci_state(struct pci_dev *pdev);
 bool amdgpu_device_load_pci_state(struct pci_dev *pdev);
 
+void amdgpu_finilize_device_fences(struct drm_device *dev);
+
 #include "amdgpu_object.h"
 
 static inline bool amdgpu_is_tmz(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 33e8e9e1d1fe..55afc11c17e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3692,15 +3692,12 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
amdgpu_virt_fini_data_exchange(adev);
}
 
-   /* disable all interrupts */
-   amdgpu_irq_disable_all(adev);
if (adev->mode_info.mode_config_initialized){
if (!amdgpu_device_has_dc_support(adev))
drm_helper_force_disable_all(adev_to_drm(adev));
else
drm_atomic_helper_shutdown(adev_to_drm(adev));
}
-   amdgpu_fence_driver_fini_hw(adev);
 
if (adev->pm_sysfs_en)
amdgpu_pm_sysfs_fini(adev);
@@ -4567,14 +4564,19 @@ static bool amdgpu_device_lock_adev(struct 
amdgpu_device *adev,
return true;
 }
 
-static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
+static void amdgpu_device_unlock_adev_imp(struct amdgpu_device *adev, bool 
skip_in_gpu_reset)
 {
amdgpu_vf_error_trans_all(adev);
adev->mp1_state = PP_MP1_STATE_NONE;
-   atomic_set(&adev->in_gpu_reset, 0);
+   !skip_in_gpu_reset ? atomic_set(&adev->in_gpu_reset, 0) : 0;
up_write(&adev->reset_sem);
 }
 
+static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)
+{
+   amdgpu_device_unlock_adev_imp(adev, false);
+}
+
 /*
  * to lockup a list of amdgpu devices in a hive safely, if not a hive
  * with multiple nodes, it will be similar as amdgpu_device_lock_adev.
@@ -5321,3 +5323,89 @@ bool amdgpu_device_load_pci_state(struct pci_dev *pdev)
 }
 
 
+static void amdgpu_finilize_schedulded_fences(struct amdgpu_ctx_mgr *mgr)
+{
+   struct amdgpu_ctx *ctx;
+   struct idr *idp;
+   uint32_t id, i, j;
+
+   idp = &mgr->ctx_handles;
+
+   idr_for_each_entry(idp, ctx, id) {
+   for (i = 0; i < AMDGPU_HW_IP_NUM; ++i) {
+   for (j = 0; j < amdgpu_ctx_num_entities[i]; ++j) {
+   struct drm_sched_entity *entity;
+
+   if (!ctx->entities[i][j])
+   continue;
+
+   entity = &ctx->entities[i][j]->entity;
+   drm_sched_entity_kill_jobs(entity);
+   }
+   }
+   }
+}
+
+/**
+ * amdgpu_finilize_device_fences() - Finilize all device fences
+ * @pdev: pointer to PCI device
+ *
+ * Will disable and finilise ISRs and will signal all fences
+ * that might hang if HW is gone
+ */
+void amdgpu_finilize_device_fences(struct drm_device *dev)
+{
+   struct amdgpu_device *adev = drm_to_adev(dev);
+   struct drm_file *file;
+
+   /*
+*  Block TDRs from further execution by setting adev->in_gpu_reset
+*  instead of holding full reset lock in order to not deadlock
+*  further ahead against any thread locking the reset lock when we
+*  wait for it's completion
+*/
+   while (!amdgpu_device_lock_adev(adev, NULL))
+   amdgpu_cancel_all_tdr(adev);
+   amdgpu_device_unlock_adev_imp(adev, true);
+
+
+   /* disable all HW interrupts */
+   amdgpu_irq_disable_all(adev);
+
+   /* stop and flush all in flight HW interrupts handlers */
+   disable_irq(pci_irq_vector(adev->pdev, 0));
+
+   /*
+* Stop SW GPU schedulers and force completion on all HW fences. Since
+* in the prev. step all ISRs were disabled and completed the
+* HW fence array is idle (no insertions or extractions) and so it's
+* safe to iterate it bellow.
+* After this step all HW fences

[PATCH v5 12/27] drm/amdgpu: Prevent any job recoveries after device is unplugged.

2021-04-28 Thread Andrey Grodzovsky
Return DRM_GPU_SCHED_STAT_ENODEV back to the scheduler when the device
is not present so the timeout timer will not be rearmed.

v5: Update to match updated return values in enum drm_gpu_sched_stat

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 759b34799221..d33e6d97cc89 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -25,6 +25,8 @@
 #include 
 #include 
 
+#include 
+
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 
@@ -34,6 +36,15 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
struct amdgpu_job *job = to_amdgpu_job(s_job);
struct amdgpu_task_info ti;
struct amdgpu_device *adev = ring->adev;
+   int idx;
+
+   if (!drm_dev_enter(&adev->ddev, &idx)) {
+   DRM_INFO("%s - device unplugged skipping recovery on 
scheduler:%s",
+__func__, s_job->sched->name);
+
+   /* Effectively the job is aborted as the device is gone */
+   return DRM_GPU_SCHED_STAT_ENODEV;
+   }
 
memset(&ti, 0, sizeof(struct amdgpu_task_info));
 
@@ -41,7 +52,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
{
DRM_ERROR("ring %s timeout, but soft recovered\n",
  s_job->sched->name);
-   return DRM_GPU_SCHED_STAT_NOMINAL;
+   goto exit;
}
 
amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
@@ -53,13 +64,15 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct 
drm_sched_job *s_job)
 
if (amdgpu_device_should_recover_gpu(ring->adev)) {
amdgpu_device_gpu_recover(ring->adev, job);
-   return DRM_GPU_SCHED_STAT_NOMINAL;
} else {
drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
adev->virt.tdr_debug = true;
-   return DRM_GPU_SCHED_STAT_NOMINAL;
}
+
+exit:
+   drm_dev_exit(idx);
+   return DRM_GPU_SCHED_STAT_NOMINAL;
 }
 
 int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH v5 22/27] drm/amd/display: Scope all DM queued work with drm_dev_enter/exit

2021-04-28 Thread Andrey Grodzovsky
To allow completion of queued work and block further HW accesses post
device PCI remove.

Signed-off-by: Andrey Grodzovsky 
---
 .../amd/display/amdgpu_dm/amdgpu_dm_hdcp.c| 124 +++---
 .../drm/amd/display/amdgpu_dm/amdgpu_dm_irq.c |  24 +++-
 2 files changed, 98 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
index 0cdbfcd475ec..81ea5a1ea46b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
@@ -28,6 +28,7 @@
 #include "amdgpu_dm.h"
 #include "dm_helpers.h"
 #include 
+#include 
 #include "hdcp_psp.h"
 
 /*
@@ -260,20 +261,27 @@ void hdcp_handle_cpirq(struct hdcp_workqueue *hdcp_work, 
unsigned int link_index
 static void event_callback(struct work_struct *work)
 {
struct hdcp_workqueue *hdcp_work;
+   int idx;
 
hdcp_work = container_of(to_delayed_work(work), struct hdcp_workqueue,
  callback_dwork);
 
-   mutex_lock(&hdcp_work->mutex);
+   if (drm_dev_enter(hdcp_work->aconnector->base.dev, &idx)) {
 
-   cancel_delayed_work(&hdcp_work->callback_dwork);
+   mutex_lock(&hdcp_work->mutex);
 
-   mod_hdcp_process_event(&hdcp_work->hdcp, MOD_HDCP_EVENT_CALLBACK,
-  &hdcp_work->output);
+   cancel_delayed_work(&hdcp_work->callback_dwork);
+
+   mod_hdcp_process_event(&hdcp_work->hdcp, 
MOD_HDCP_EVENT_CALLBACK,
+  &hdcp_work->output);
 
-   process_output(hdcp_work);
+   process_output(hdcp_work);
 
-   mutex_unlock(&hdcp_work->mutex);
+   mutex_unlock(&hdcp_work->mutex);
+
+   drm_dev_exit(idx);
+
+   }
 
 
 }
@@ -284,34 +292,41 @@ static void event_property_update(struct work_struct 
*work)
struct amdgpu_dm_connector *aconnector = hdcp_work->aconnector;
struct drm_device *dev = hdcp_work->aconnector->base.dev;
long ret;
+   int idx;
+
+   if (drm_dev_enter(dev, &idx)) {
+
+   drm_modeset_lock(&dev->mode_config.connection_mutex, NULL);
+   mutex_lock(&hdcp_work->mutex);
 
-   drm_modeset_lock(&dev->mode_config.connection_mutex, NULL);
-   mutex_lock(&hdcp_work->mutex);
 
+   if (aconnector->base.state->commit) {
+   ret = 
wait_for_completion_interruptible_timeout(&aconnector->base.state->commit->hw_done,
 10 * HZ);
 
-   if (aconnector->base.state->commit) {
-   ret = 
wait_for_completion_interruptible_timeout(&aconnector->base.state->commit->hw_done,
 10 * HZ);
+   if (ret == 0) {
+   DRM_ERROR("HDCP state unknown! Setting it to 
DESIRED");
+   hdcp_work->encryption_status = 
MOD_HDCP_ENCRYPTION_STATUS_HDCP_OFF;
+   }
+   }
 
-   if (ret == 0) {
-   DRM_ERROR("HDCP state unknown! Setting it to DESIRED");
-   hdcp_work->encryption_status = 
MOD_HDCP_ENCRYPTION_STATUS_HDCP_OFF;
+   if (hdcp_work->encryption_status != 
MOD_HDCP_ENCRYPTION_STATUS_HDCP_OFF) {
+   if (aconnector->base.state->hdcp_content_type == 
DRM_MODE_HDCP_CONTENT_TYPE0 &&
+   hdcp_work->encryption_status <= 
MOD_HDCP_ENCRYPTION_STATUS_HDCP2_TYPE0_ON)
+   
drm_hdcp_update_content_protection(&aconnector->base, 
DRM_MODE_CONTENT_PROTECTION_ENABLED);
+   else if (aconnector->base.state->hdcp_content_type == 
DRM_MODE_HDCP_CONTENT_TYPE1 &&
+hdcp_work->encryption_status == 
MOD_HDCP_ENCRYPTION_STATUS_HDCP2_TYPE1_ON)
+   
drm_hdcp_update_content_protection(&aconnector->base, 
DRM_MODE_CONTENT_PROTECTION_ENABLED);
+   } else {
+   drm_hdcp_update_content_protection(&aconnector->base, 
DRM_MODE_CONTENT_PROTECTION_DESIRED);
}
-   }
 
-   if (hdcp_work->encryption_status != 
MOD_HDCP_ENCRYPTION_STATUS_HDCP_OFF) {
-   if (aconnector->base.state->hdcp_content_type == 
DRM_MODE_HDCP_CONTENT_TYPE0 &&
-   hdcp_work->encryption_status <= 
MOD_HDCP_ENCRYPTION_STATUS_HDCP2_TYPE0_ON)
-   drm_hdcp_update_content_protection(&aconnector->base, 
DRM_MODE_CONTENT_PROTECTION_ENABLED);
-   else if (aconnector->base.state->hdcp_content_type == 
DRM_MODE_HDCP_CONTENT_TYPE1 &&
-hdcp_work->encryption_status == 
MOD_HDCP_ENCRYPTION_STATUS_HDCP2_TYPE1_ON)
-   drm_hdcp_update_content_protection(&aconnector->base, 
DRM_MODE_CONTENT_PROTECTION_ENABLED);
-   } else {
-   drm_hdcp_update_content_protection(&aconnector->base, 
DRM_MODE_CONTENT_PROTECTION_DESIRED);
-   }
 

[PATCH v5 27/27] drm/amdgpu: Verify DMA operations from device are done

2021-04-28 Thread Andrey Grodzovsky
In case device removal is just simulated via sysfs, verify that the
device doesn't keep doing DMA to the released memory after
pci_remove is done.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index d0f34f230ef3..f3e8fbde62a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -1258,7 +1258,13 @@ amdgpu_pci_remove(struct pci_dev *pdev)
drm_dev_unplug(dev);
amdgpu_driver_unload_kms(dev);
 
+   /*
+* Flush any in flight DMA operations from device.
+* Clear the Bus Master Enable bit and then wait on the PCIe Device
+* StatusTransactions Pending bit.
+*/
pci_disable_device(pdev);
+   pci_wait_for_pending_transaction(pdev);
 }
 
 static void
-- 
2.25.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 1/2] drm/ttm: Don't evict SG BOs

2021-04-28 Thread Felix Kuehling
On 2021-04-28 at 5:05 a.m., Christian König wrote:
> On 28.04.21 at 09:49, Felix Kuehling wrote:
>> On 2021-04-28 at 3:04 a.m., Christian König wrote:
>>> On 28.04.21 at 07:33, Felix Kuehling wrote:
 SG BOs do not occupy space that is managed by TTM. So do not evict
 them.

 This fixes unexpected evictions of KFD's userptr BOs. KFD only expects
 userptr "evictions" in the form of MMU notifiers.
>>> NAK, SG BOs also account for the memory the GPU can currently access.
>>>
>>> We can ignore them for the allocated memory, but not for the GTT
>>> domain.
>> Hmm, the only reason I found this problem is, that I am now testing with
>> IOMMU enabled. Evicting the userptr BO destroys the DMA mapping. Without
>> IOMMU-enforced device isolation I was blissfully unaware that the
>> userptr BOs were being evicted. The GPUVM mappings were unaffected and
>> just worked without problems. Having to evict these BOs is crippling
>> KFD's ability to map system memory for GPU access, once again.
>>
>> I think this affects not only userptr BOs but also DMABuf imports for
>> BOs shared between multiple GPUs.
>
> Correct, yes.
>
>> The GTT size limitation is entirely artificial. And the only reason I
>> know of for keeping it limited to the VRAM size is to work around some
>> OOM issues with GTT BOs. Applying this to userptrs and DMABuf imports
>> makes no sense. But I understand that the way TTM manages the GTT domain
>> there is no easy fix for this. Maybe we'd have to create a new domain
>> for validating SG BOs that's separate from GTT, so that TTM would not
>> try to allocate GTT space for them.
>
> Well that contradicts what the GTT domain is all about.
>
> It should limit the amount of system memory the GPU can access at the
> same time. This includes imported DMA-bufs as well as userptrs.

Hmm, I was missing something. The amdgpu_gtt_mgr doesn't actually
allocate space for many BOs:

    if (!place->lpfn) {
            mem->mm_node = NULL;
            mem->start = AMDGPU_BO_INVALID_OFFSET;
            return 0;
    }

I think our userptr BOs don't have mm_nodes and don't use GTT space. So
I could add a check for that to amdgpu_ttm_bo_eviction_valuable.
Evicting a BO that doesn't have an mm_node is not valuable because it
cannot free up any space.
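
Roughly something like this is what I have in mind -- just a sketch to
illustrate the idea; the field names (bo->mem.mem_type, bo->mem.mm_node)
and the fallback to the generic helper are assumed from the snippet
above and would need double-checking against the real amdgpu_ttm.c:

    static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
                                                const struct ttm_place *place)
    {
            /* BOs that got no mm_node from amdgpu_gtt_mgr (SG/userptr BOs)
             * occupy no managed GTT space, so evicting them cannot free
             * anything up.
             */
            if (bo->mem.mem_type == TTM_PL_TT && !bo->mem.mm_node)
                    return false;

            /* otherwise fall back to the generic checks */
            return ttm_bo_eviction_valuable(bo, place);
    }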


>
> That the GPUVM mappings are still there is certainly a bug we should
> look into, but in general if we don't want that limitation we need to
> increase the GTT size and not work around it.

I can fix that by adding the KFD eviction fence to userptr BOs. But
given the above suggestion, I think this would never be triggered by
ttm_mem_evict_first. Also not by ttm_bo_swapout, because SG BOs are
never added to the swap_lru (for good reason).


>
> But increasing the GTT size in turn has a huge negative impact on
> OOM situations up to the point that the OOM killer can't work any more.
>
>> Failing that, I'd probably have to abandon userptr BOs altogether and
>> switch system memory mappings over to using the new SVM API on systems
>> where it is available.
>
> Well as long as that provides the necessary functionality through HMM
> it would be an option.
Just another way of circumventing "It should limit the amount of system
memory the GPU can access at the same time," a premise I disagree with
in case of userptrs and HMM. Both use pageable, unpinned memory. Both
can cause the GPU to be preempted in case of MMU interval notifiers.
Statically limiting the amount of pageable memory accessible to GTT is
redundant and overly limiting.

Regards,
  Felix


>
> Regards,
> Christian.
>
>>
>> Regards,
>>    Felix
>>
>>
>>> Christian.
>>>
 Signed-off-by: Felix Kuehling 
 ---
    drivers/gpu/drm/ttm/ttm_bo.c | 4 
    1 file changed, 4 insertions(+)

 diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
 b/drivers/gpu/drm/ttm/ttm_bo.c
 index de1ec838cf8b..0b953654fdbf 100644
 --- a/drivers/gpu/drm/ttm/ttm_bo.c
 +++ b/drivers/gpu/drm/ttm/ttm_bo.c
 @@ -655,6 +655,10 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
    list_for_each_entry(bo, &man->lru[i], lru) {
    bool busy;
    +    /* Don't evict SG BOs */
 +    if (bo->ttm && bo->ttm->sg)
 +    continue;
 +
    if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
    &busy)) {
    if (busy && !busy_bo && ticket !=
>
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v5 19/27] drm/amdgpu: Finalize device fences on device remove.

2021-04-28 Thread Andrey Grodzovsky




On 2021-04-28 11:11 a.m., Andrey Grodzovsky wrote:

Make sure all fences dependent on the HW being present are force signaled
when handling device removal. This helps later to scope all HW
accessing code such as IOCTLs in drm_dev_enter/exit and use
drm_dev_unplug as a synchronization point past which we know the HW
will not be accessed anymore outside of the PCI remove driver callback.

Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 98 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  6 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 12 +--
  4 files changed, 103 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0db0ba4fba89..df6c5ed676b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1374,6 +1374,8 @@ void amdgpu_pci_resume(struct pci_dev *pdev);
  bool amdgpu_device_cache_pci_state(struct pci_dev *pdev);
  bool amdgpu_device_load_pci_state(struct pci_dev *pdev);
  
+void amdgpu_finilize_device_fences(struct drm_device *dev);

+
  #include "amdgpu_object.h"
  
  static inline bool amdgpu_is_tmz(struct amdgpu_device *adev)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 33e8e9e1d1fe..55afc11c17e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3692,15 +3692,12 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
amdgpu_virt_fini_data_exchange(adev);
}
  
-	/* disable all interrupts */

-   amdgpu_irq_disable_all(adev);
if (adev->mode_info.mode_config_initialized){
if (!amdgpu_device_has_dc_support(adev))
drm_helper_force_disable_all(adev_to_drm(adev));
else
drm_atomic_helper_shutdown(adev_to_drm(adev));
}
-   amdgpu_fence_driver_fini_hw(adev);
  
  	if (adev->pm_sysfs_en)

amdgpu_pm_sysfs_fini(adev);
@@ -4567,14 +4564,19 @@ static bool amdgpu_device_lock_adev(struct 
amdgpu_device *adev,
return true;
  }
  
-static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)

+static void amdgpu_device_unlock_adev_imp(struct amdgpu_device *adev, bool 
skip_in_gpu_reset)
  {
amdgpu_vf_error_trans_all(adev);
adev->mp1_state = PP_MP1_STATE_NONE;
-   atomic_set(&adev->in_gpu_reset, 0);
+   !skip_in_gpu_reset ? atomic_set(&adev->in_gpu_reset, 0) : 0;
up_write(&adev->reset_sem);
  }
  
+static void amdgpu_device_unlock_adev(struct amdgpu_device *adev)

+{
+   amdgpu_device_unlock_adev_imp(adev, false);
+}
+
  /*
   * to lockup a list of amdgpu devices in a hive safely, if not a hive
   * with multiple nodes, it will be similar as amdgpu_device_lock_adev.
@@ -5321,3 +5323,89 @@ bool amdgpu_device_load_pci_state(struct pci_dev *pdev)
  }
  
  
+static void amdgpu_finilize_schedulded_fences(struct amdgpu_ctx_mgr *mgr)

+{
+   struct amdgpu_ctx *ctx;
+   struct idr *idp;
+   uint32_t id, i, j;
+
+   idp = &mgr->ctx_handles;
+
+   idr_for_each_entry(idp, ctx, id) {
+   for (i = 0; i < AMDGPU_HW_IP_NUM; ++i) {
+   for (j = 0; j < amdgpu_ctx_num_entities[i]; ++j) {
+   struct drm_sched_entity *entity;
+
+   if (!ctx->entities[i][j])
+   continue;
+
+   entity = &ctx->entities[i][j]->entity;
+   drm_sched_entity_kill_jobs(entity);
+   }
+   }
+   }
+}
+
+/**
+ * amdgpu_finilize_device_fences() - Finilize all device fences
+ * @pdev: pointer to PCI device
+ *
+ * Will disable and finilise ISRs and will signal all fences
+ * that might hang if HW is gone
+ */
+void amdgpu_finilize_device_fences(struct drm_device *dev)
+{
+   struct amdgpu_device *adev = drm_to_adev(dev);
+   struct drm_file *file;
+
+   /*
+*  Block TDRs from further execution by setting adev->in_gpu_reset
+*  instead of holding full reset lock in order to not deadlock
+*  further ahead against any thread locking the reset lock when we
+*  wait for it's completion
+*/
+   while (!amdgpu_device_lock_adev(adev, NULL))
+   amdgpu_cancel_all_tdr(adev);
+   amdgpu_device_unlock_adev_imp(adev, true);
+
+
+   /* disable all HW interrupts */
+   amdgpu_irq_disable_all(adev);
+
+   /* stop and flush all in flight HW interrupts handlers */
+   disable_irq(pci_irq_vector(adev->pdev, 0));
+
+   /*
+* Stop SW GPU schedulers and force completion on all HW fences. Since
+* in the prev. step all ISRs were disabled and completed the
+* HW fence array is idle (no insertions or extractions) and so it's
+ 
