Re: [PATCH v6 00/15] Restricted DMA

2021-05-18 Thread Claire Chang
v7: https://lore.kernel.org/patchwork/cover/1431031/

On Mon, May 10, 2021 at 5:50 PM Claire Chang  wrote:
>
> From: Claire Chang 
>
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
>
> For example, we plan to use the PCI-e bus for Wi-Fi, and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote Wi-Fi exploits: [1a] and [1b] show a full
> chain of exploits; see also [2] and [3]).
>
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
>
> [1a] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> [1b] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> [2] https://blade.tencent.com/en/advisories/qualpwn/
> [3] 
> https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> [4] 
> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>
> v6:
> Address the comments in v5
>
> v5:
> Rebase on latest linux-next
> https://lore.kernel.org/patchwork/cover/1416899/
>
> v4:
> - Fix spinlock bad magic
> - Use rmem->name for debugfs entry
> - Address the comments in v3
> https://lore.kernel.org/patchwork/cover/1378113/
>
> v3:
> Using only one reserved memory region for both streaming DMA and memory
> allocation.
> https://lore.kernel.org/patchwork/cover/1360992/
>
> v2:
> Building on top of swiotlb.
> https://lore.kernel.org/patchwork/cover/1280705/
>
> v1:
> Using dma_map_ops.
> https://lore.kernel.org/patchwork/cover/1271660/
>
> Claire Chang (15):
>   swiotlb: Refactor swiotlb init functions
>   swiotlb: Refactor swiotlb_create_debugfs
>   swiotlb: Add DMA_RESTRICTED_POOL
>   swiotlb: Add restricted DMA pool initialization
>   swiotlb: Add a new get_io_tlb_mem getter
>   swiotlb: Update is_swiotlb_buffer to add a struct device argument
>   swiotlb: Update is_swiotlb_active to add a struct device argument
>   swiotlb: Bounce data from/to restricted DMA pool if available
>   swiotlb: Move alloc_size to find_slots
>   swiotlb: Refactor swiotlb_tbl_unmap_single
>   dma-direct: Add a new wrapper __dma_direct_free_pages()
>   swiotlb: Add restricted DMA alloc/free support.
>   dma-direct: Allocate memory from restricted DMA pool if available
>   dt-bindings: of: Add restricted DMA pool
>   of: Add plumbing for restricted DMA pool
>
>  .../reserved-memory/reserved-memory.txt   |  27 ++
>  drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
>  drivers/gpu/drm/nouveau/nouveau_ttm.c |   2 +-
>  drivers/iommu/dma-iommu.c |  12 +-
>  drivers/of/address.c  |  25 ++
>  drivers/of/device.c   |   3 +
>  drivers/of/of_private.h   |   5 +
>  drivers/pci/xen-pcifront.c|   2 +-
>  drivers/xen/swiotlb-xen.c |   2 +-
>  include/linux/device.h|   4 +
>  include/linux/swiotlb.h   |  41 ++-
>  kernel/dma/Kconfig|  14 +
>  kernel/dma/direct.c   |  63 +++--
>  kernel/dma/direct.h   |   9 +-
>  kernel/dma/swiotlb.c  | 242 +-
>  15 files changed, 356 insertions(+), 97 deletions(-)
>
> --
> 2.31.1.607.g51e8a6a459-goog
>


[PATCH v1 2/3] drm: panel-simple: Add support for the Innolux G070Y2-T02 panel

2021-05-18 Thread Oleksij Rempel
Add compatible and timings for the Innolux G070Y2-T02 panel. It is a 7"
WVGA (800x480) TFT LCD panel with a TTL interface and a backlight unit.

Co-developed-by: Robin van der Gracht 
Signed-off-by: Robin van der Gracht 
Signed-off-by: Oleksij Rempel 
---
 drivers/gpu/drm/panel/panel-simple.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/panel/panel-simple.c 
b/drivers/gpu/drm/panel/panel-simple.c
index be312b5c04dd..f79d97115f8f 100644
--- a/drivers/gpu/drm/panel/panel-simple.c
+++ b/drivers/gpu/drm/panel/panel-simple.c
@@ -2323,6 +2323,19 @@ static const struct panel_desc innolux_g070y2_l01 = {
.connector_type = DRM_MODE_CONNECTOR_LVDS,
 };
 
+static const struct panel_desc innolux_g070y2_t02 = {
+   .modes = &innolux_at070tn92_mode,
+   .num_modes = 1,
+   .bpc = 8,
+   .size = {
+   .width = 152,
+   .height = 92,
+   },
+   .bus_format = MEDIA_BUS_FMT_RGB888_1X24,
+   .bus_flags = DRM_BUS_FLAG_DE_HIGH | DRM_BUS_FLAG_PIXDATA_DRIVE_POSEDGE,
+   .connector_type = DRM_MODE_CONNECTOR_DPI,
+};
+
 static const struct display_timing innolux_g101ice_l01_timing = {
	.pixelclock = { 60400000, 71100000, 74700000 },
.hactive = { 1280, 1280, 1280 },
@@ -4344,6 +4357,9 @@ static const struct of_device_id platform_of_match[] = {
}, {
.compatible = "innolux,g070y2-l01",
.data = &innolux_g070y2_l01,
+   }, {
+   .compatible = "innolux,g070y2-t02",
+   .data = &innolux_g070y2_t02,
}, {
.compatible = "innolux,g101ice-l01",
.data = &innolux_g101ice_l01
-- 
2.29.2



[PATCH v1 0/3] add innolux,g070y2-t02 support for the Protonic VT7 board

2021-05-18 Thread Oleksij Rempel
Add Innolux G070Y2-T02 panel support for the Protonic VT7 board.

Oleksij Rempel (3):
  dt-bindings: display: simple: add Innolux G070Y2-T02 panel
  drm: panel-simple: Add support for the Innolux G070Y2-T02 panel
  ARM: dts: imx6dl-prtvt7: Add display and panel nodes

 .../bindings/display/panel/panel-simple.yaml  |  2 +
 arch/arm/boot/dts/imx6dl-prtvt7.dts   | 47 +++
 drivers/gpu/drm/panel/panel-simple.c  | 16 +++
 3 files changed, 65 insertions(+)

-- 
2.29.2



[PATCH v1 1/3] dt-bindings: display: simple: add Innolux G070Y2-T02 panel

2021-05-18 Thread Oleksij Rempel
Add a binding for the Innolux G070Y2-T02 panel. It is a 7" WVGA (800x480)
TFT LCD panel with a TTL interface and a backlight unit.

Signed-off-by: Oleksij Rempel 
---
 .../devicetree/bindings/display/panel/panel-simple.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml 
b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
index b3797ba2698b..c06633264e5c 100644
--- a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
+++ b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
@@ -154,6 +154,8 @@ properties:
   - innolux,at070tn92
 # Innolux G070Y2-L01 7" WVGA (800x480) TFT LCD panel
   - innolux,g070y2-l01
+# Innolux G070Y2-T02 7" WVGA (800x480) TFT LCD TTL panel
+  - innolux,g070y2-t02
 # Innolux Corporation 10.1" G101ICE-L01 WXGA (1280x800) LVDS panel
   - innolux,g101ice-l01
 # Innolux Corporation 12.1" WXGA (1280x800) TFT LCD panel
-- 
2.29.2



[PATCH v1 3/3] ARM: dts: imx6dl-prtvt7: Add display and panel nodes

2021-05-18 Thread Oleksij Rempel
Add Innolux G070Y2-T02 panel to the Protonic VT7 board.

Signed-off-by: Robin van der Gracht 
Signed-off-by: Oleksij Rempel 
---
 arch/arm/boot/dts/imx6dl-prtvt7.dts | 47 +
 1 file changed, 47 insertions(+)

diff --git a/arch/arm/boot/dts/imx6dl-prtvt7.dts 
b/arch/arm/boot/dts/imx6dl-prtvt7.dts
index ae6da241f13e..156a5c5c0dc1 100644
--- a/arch/arm/boot/dts/imx6dl-prtvt7.dts
+++ b/arch/arm/boot/dts/imx6dl-prtvt7.dts
@@ -31,6 +31,30 @@ backlight_lcd: backlight-lcd {
enable-gpios = <&gpio4 28 GPIO_ACTIVE_HIGH>;
};
 
+   display {
+   compatible = "fsl,imx-parallel-display";
+   pinctrl-0 = <&pinctrl_ipu1_disp>;
+   pinctrl-names = "default";
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   port@0 {
+   reg = <0>;
+
+   display_in: endpoint {
+   remote-endpoint = <&ipu1_di0_disp0>;
+   };
+   };
+
+   port@1 {
+   reg = <1>;
+
+   display_out: endpoint {
+   remote-endpoint = <&panel_in>;
+   };
+   };
+   };
+
keys {
compatible = "gpio-keys";
autorepeat;
@@ -138,6 +162,18 @@ led-debug0 {
};
};
 
+   panel {
+   compatible = "innolux,g070y2-t02";
+   backlight = <&backlight_lcd>;
+   power-supply = <&reg_3v3>;
+
+   port {
+   panel_in: endpoint {
+   remote-endpoint = <&display_out>;
+   };
+   };
+   };
+
reg_bl_12v0: regulator-bl-12v0 {
compatible = "regulator-fixed";
pinctrl-names = "default";
@@ -156,6 +192,13 @@ reg_1v8: regulator-1v8 {
	regulator-max-microvolt = <1800000>;
};
 
+   reg_3v3: regulator-3v3 {
+   compatible = "regulator-fixed";
+   regulator-name = "3v3";
+   regulator-min-microvolt = <3300000>;
+   regulator-max-microvolt = <3300000>;
+   };
+
sound {
compatible = "simple-audio-card";
simple-audio-card,name = "prti6q-sgtl5000";
@@ -260,6 +303,10 @@ &ipu1 {
status = "okay";
 };
 
+&ipu1_di0_disp0 {
+   remote-endpoint = <&display_in>;
+};
+
 &pwm1 {
#pwm-cells = <2>;
pinctrl-names = "default";
-- 
2.29.2



Re: [PATCH] drm: bridge: cdns-mhdp8546: Fix PM reference leak in cdns_mhdp_probe()

2021-05-18 Thread Johan Hovold
On Mon, May 17, 2021 at 11:27:38AM +0200, Robert Foss wrote:
> Hey Yu,
> 
> On Mon, 17 May 2021 at 10:08, Yu Kuai  wrote:
> >
> > pm_runtime_get_sync() will increment the PM usage counter even when it
> > fails. Forgetting the matching put operation will result in a reference
> > leak here. Fix it by replacing it with pm_runtime_resume_and_get() to
> > keep the usage counter balanced.
> >
> > Reported-by: Hulk Robot 
> > Signed-off-by: Yu Kuai 
> > ---
> >  drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c 
> > b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> > index 0cd8f40fb690..305489d48c16 100644
> > --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> > +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> > @@ -2478,7 +2478,7 @@ static int cdns_mhdp_probe(struct platform_device 
> > *pdev)
> > clk_prepare_enable(clk);
> >
> > pm_runtime_enable(dev);
> > -   ret = pm_runtime_get_sync(dev);
> > +   ret = pm_runtime_resume_and_get(dev);
> > if (ret < 0) {
> > dev_err(dev, "pm_runtime_get_sync failed\n");
> > pm_runtime_disable(dev);
> 
> The code is correct as it is. If pm_runtime_get_sync() fails and
> increments[1] the pm.usage_count variable, that isn't a problem since
> pm_runtime_disable() disables pm, and resets pm.usage_count variable
> to zero[2].

No it doesn't; pm_runtime_disable() does not reset the counter and you
still need to decrement the usage count when pm_runtime_get_sync()
fails.

> [1] 
> https://elixir.bootlin.com/linux/latest/source/include/linux/pm_runtime.h#L384
> [2] 
> https://elixir.bootlin.com/linux/latest/source/drivers/base/power/runtime.c#L1383

Johan


Re: [RFC PATCH 0/3] A drm_plane API to support HDR planes

2021-05-18 Thread Pekka Paalanen
On Mon, 17 May 2021 15:39:03 -0400
Vitaly Prosyak  wrote:

> On 2021-05-17 12:48 p.m., Sebastian Wick wrote:
> > On 2021-05-17 10:57, Pekka Paalanen wrote:  
> >> On Fri, 14 May 2021 17:05:11 -0400
> >> Harry Wentland  wrote:
> >>  
> >>> On 2021-04-27 10:50 a.m., Pekka Paalanen wrote:  
> >>> > On Mon, 26 Apr 2021 13:38:49 -0400
> >>> > Harry Wentland  wrote:  
> >>
> >> ...
> >>  
> >>> >> ## Mastering Luminances
> >>> >>
> >>> >> Now we are able to use the PQ 2084 EOTF to define the luminance of
> >>> >> pixels in absolute terms. Unfortunately we're again presented with
> >>> >> physical limitations of the display technologies on the market   
> >>> today.  
> >>> >> Here are a few examples of luminance ranges of displays.
> >>> >>
> >>> >> | Display  | Luminance range in nits |
> >>> >> |  | --- |
> >>> >> | Typical PC display   | 0.3 - 200 |
> >>> >> | Excellent LCD HDTV   | 0.3 - 400 |
> >>> >> | HDR LCD w/ local dimming | 0.05 - 1,500 |
> >>> >>
> >>> >> Since no display can currently show the full 0.0005 to 10,000 nits
> >>> >> luminance range the display will need to tonemap the HDR content,   
> >>> i.e  
> >>> >> to fit the content within a display's capabilities. To assist with
> >>> >> tonemapping HDR content is usually accompanied with a metadata that
> >>> >> describes (among other things) the minimum and maximum mastering
> >>> >> luminance, i.e. the maximum and minimum luminance of the display   
> >>> that  
> >>> >> was used to master the HDR content.
> >>> >>
> >>> >> The HDR metadata is currently defined on the drm_connector via the
> >>> >> hdr_output_metadata blob property.
> >>> >>
> >>> >> It might be useful to define per-plane hdr metadata, as different
> >>> >> planes might have been mastered differently.  
> >>> >
> >>> > I don't think this would directly help with the dynamic range   
> >>> blending  
> >>> > problem. You still need to establish the mapping between the optical
> >>> > values from two different EOTFs and dynamic ranges. Or can you know
> >>> > which optical values match the mastering display maximum and minimum
> >>> > luminances for not-PQ?
> >>> >  
> >>>
> >>> My understanding of this is probably best illustrated by this example:
> >>>
> >>> Assume HDR was mastered on a display with a maximum white level of 500
> >>> nits and played back on a display that supports a max white level of 
> >>> 400
> >>> nits. If you know the mastering white level of 500 you know that 
> >>> this is
> >>> the maximum value you need to compress down to 400 nits, allowing 
> >>> you to
> >>> use the full extent of the 400 nits panel.  
> >>
> >> Right, but in the kernel, where do you get these nits values from?
> >>
> >> hdr_output_metadata blob is infoframe data to the monitor. I think this
> >> should be independent of the metadata used for color transformations in
> >> the display pipeline before the monitor.
> >>
> >> EDID may tell us the monitor HDR metadata, but again what is used in
> >> the color transformations should be independent, because EDIDs lie,
> >> lighting environments change, and users have different preferences.
> >>
> >> What about black levels?
> >>
> >> Do you want to do black level adjustment?
> >>
> >> How exactly should the compression work?
> >>
> >> Where do you map the mid-tones?
> >>
> >> What if the end user wants something different?  
> >
> > I suspect that this is not about tone mapping at all. The use cases
> > listed always have the display in PQ mode and just assume that no
> > content exceeds the PQ limitations. Then you can simply bring all
> > content to the color space with a matrix multiplication and then map the
> > linear light content somewhere into the PQ range. Tone mapping is
> > performed in the display only.

The use cases do use the word "desktop" though. Harry, could you expand
on this, are you seeking a design that is good for generic desktop
compositors too, or one that is more tailored to "embedded" video
player systems taking the most advantage of (potentially
fixed-function) hardware?

What matrix would one choose? Which render intent would it
correspond to?

If you need to adapt different dynamic ranges into the blending dynamic
range, would a simple linear transformation really be enough?

> > From a generic wayland compositor point of view this is uninteresting.
> >  
> It is a compositor's decision whether or not to provide the metadata 
> property to the kernel. The metadata can be available from one or 
> multiple clients, or most likely not available at all.
> 
> Compositors may put a display in HDR10 mode (where the PQ 2084 inverse 
> EOTF and tone mapping occur in the display) or NATIVE mode, not attach 
> any metadata to the connector, and do tone mapping in the compositor.
> 
> It is all about user preference or compositor design, or a combination 
> of both options.

Indeed. The thing here is that you cannot just add KMS UAPI, you also
need to have the FOSS userspace to go with it. So you n

Re: [RFC PATCH v2 0/6] A drm_plane API to support HDR planes

2021-05-18 Thread Pekka Paalanen
On Fri, 14 May 2021 17:07:14 -0400
Harry Wentland  wrote:

> We are looking to enable HDR support for a couple of single-plane and
> multi-plane scenarios. To do this effectively we recommend new interfaces
> to drm_plane. The first patch gives a bit of background on HDR and why we
> propose these interfaces.
> 
> v2:

For everyone's information, the discussion is still on-going in the
first email thread.


Thanks,
pq




[PATCH v2 00/15] drm/i915: Move LMEM (VRAM) management over to TTM

2021-05-18 Thread Thomas Hellström
This is an initial patch series to move discrete memory management over to
TTM. It will be followed up shortly with adding more functionality.

The buddy allocator is temporarily removed along with its selftests. It is
replaced with the TTM range manager, and some selftests are adjusted to
account for the introduced fragmentation. Work is ongoing to reintroduce
the buddy allocator as a TTM resource manager.

A new memcpy ttm move is introduced that uses kmap_local() functionality
rather than vmap(). Among other things stated in the patch commit message,
it helps us deal with page-based LMEM memory. It is generic enough to
replace the ttm memcpy move with some additional work if so desired. On x86
it also enables prefetching reads from write-combined memory.

Finally, the old i915 gem object LMEM backend is replaced with an
i915 gem object TTM backend, and some additional i915 gem object ops are
introduced to support the added functionality.
Currently it is used only to support management and eviction of the LMEM
region, but work is underway to extend the support to system memory. In this
way we use TTM the way it was originally intended, having the GPU binding
taken care of by driver code.

Intention is to follow up with
- System memory support
- Pipelined accelerated moves / migration
- Re-added buddy allocator in the TTM framework

v2:
- Add patches to move pagefaulting over to TTM
- Break out TTM changes to separate patches
- Address various review comments as detailed in the affected patches

Cc: Christian König 

Maarten Lankhorst (4):
  drm/i915: Disable mmap ioctl for gen12+
  drm/ttm: Add BO and offset arguments for vm_access and vm_fault ttm
handlers.
  drm/i915: Use ttm mmap handling for ttm bo's.
  drm/i915/ttm: Add io sgt caching to i915_ttm_io_mem_pfn

Thomas Hellström (11):
  drm/i915: Untangle the vma pages_mutex
  drm/i915: Don't free shared locks while shared
  drm/i915: Fix i915_sg_page_sizes to record dma segments rather than
physical pages
  drm/ttm: Export functions to initialize and finalize the ttm range
manager standalone
  drm/i915/ttm: Initialize the ttm device and memory managers
  drm/i915/ttm: Embed a ttm buffer object in the i915 gem object
  drm/ttm: Export ttm_bo_tt_destroy()
  drm/i915/ttm: Add a generic TTM memcpy move for page-based iomem
  drm/ttm, drm/amdgpu: Allow the driver some control over swapping
  drm/i915/ttm: Introduce a TTM i915 gem object backend
  drm/i915/lmem: Verify checks for lmem residency

 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   |   8 +-
 drivers/gpu/drm/i915/Kconfig  |   1 +
 drivers/gpu/drm/i915/Makefile |   4 +-
 drivers/gpu/drm/i915/display/intel_display.c  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c|   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c  |  71 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.h  |   5 -
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  24 +-
 drivers/gpu/drm/i915/gem/i915_gem_mman.h  |   2 +
 drivers/gpu/drm/i915/gem/i915_gem_object.c| 149 +++-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|  19 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  39 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_phys.c  |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_region.c| 126 +--
 drivers/gpu/drm/i915/gem/i915_gem_region.h|   4 -
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|   9 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 626 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |  48 ++
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.c   | 194 +
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.h   | 107 +++
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   2 +-
 drivers/gpu/drm/i915/gt/intel_ggtt.c  |  19 +-
 drivers/gpu/drm/i915/gt/intel_gt.c|   2 -
 drivers/gpu/drm/i915/gt/intel_gtt.c   |  45 +-
 drivers/gpu/drm/i915/gt/intel_gtt.h   |  30 +-
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |   2 +-
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  30 +-
 drivers/gpu/drm/i915/i915_buddy.c | 435 --
 drivers/gpu/drm/i915/i915_buddy.h | 131 ---
 drivers/gpu/drm/i915/i915_drv.c   |  13 +
 drivers/gpu/drm/i915/i915_drv.h   |   7 +-
 drivers/gpu/drm/i915/i915_gem.c   |   6 +-
 drivers/gpu/drm/i915/i915_globals.c   |   1 -
 drivers/gpu/drm/i915/i915_globals.h   |   1 -
 drivers/gpu/drm/i915/i915_scatterlist.c   |  70 ++
 drivers/gpu/drm/i915/i915_scatterlist.h   |  20 +-
 drivers/gpu/drm/i915/i915_vma.c   |  33 +-
 drivers/gpu/drm/i915/intel_memory_region.c| 181 ++--
 drivers/gpu/drm/i915/intel_memory_region.h|  45 +-
 drivers/gpu/drm/i915/intel_region_ttm.c   | 246 ++
 drivers/gpu/drm/i915/intel_region_ttm.h   |  32 +
 drivers/gpu/drm/i

[PATCH v2 01/15] drm/i915: Untangle the vma pages_mutex

2021-05-18 Thread Thomas Hellström
From: Thomas Hellström 

Any sleeping dma_resv lock taken while the vma pages_mutex is held
will cause a lockdep splat.
Move the i915_gem_object_pin_pages() call out of the pages_mutex
critical section.

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/i915_vma.c | 33 +++--
 1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index a6cd0fa62847..7b1c0f4e60d7 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -800,32 +800,37 @@ static bool try_qad_pin(struct i915_vma *vma, unsigned 
int flags)
 static int vma_get_pages(struct i915_vma *vma)
 {
int err = 0;
+   bool pinned_pages = false;
 
if (atomic_add_unless(&vma->pages_count, 1, 0))
return 0;
 
+   if (vma->obj) {
+   err = i915_gem_object_pin_pages(vma->obj);
+   if (err)
+   return err;
+   pinned_pages = true;
+   }
+
/* Allocations ahoy! */
-   if (mutex_lock_interruptible(&vma->pages_mutex))
-   return -EINTR;
+   if (mutex_lock_interruptible(&vma->pages_mutex)) {
+   err = -EINTR;
+   goto unpin;
+   }
 
if (!atomic_read(&vma->pages_count)) {
-   if (vma->obj) {
-   err = i915_gem_object_pin_pages(vma->obj);
-   if (err)
-   goto unlock;
-   }
-
err = vma->ops->set_pages(vma);
-   if (err) {
-   if (vma->obj)
-   i915_gem_object_unpin_pages(vma->obj);
+   if (err)
goto unlock;
-   }
+   pinned_pages = false;
}
atomic_inc(&vma->pages_count);
 
 unlock:
mutex_unlock(&vma->pages_mutex);
+unpin:
+   if (pinned_pages)
+   __i915_gem_object_unpin_pages(vma->obj);
 
return err;
 }
@@ -838,10 +843,10 @@ static void __vma_put_pages(struct i915_vma *vma, 
unsigned int count)
if (atomic_sub_return(count, &vma->pages_count) == 0) {
vma->ops->clear_pages(vma);
GEM_BUG_ON(vma->pages);
-   if (vma->obj)
-   i915_gem_object_unpin_pages(vma->obj);
}
mutex_unlock(&vma->pages_mutex);
+   if (vma->obj)
+   i915_gem_object_unpin_pages(vma->obj);
 }
 
 static void vma_put_pages(struct i915_vma *vma)
-- 
2.31.1



[PATCH v2 03/15] drm/i915: Fix i915_sg_page_sizes to record dma segments rather than physical pages

2021-05-18 Thread Thomas Hellström
All users of this function actually want the dma segment sizes, but that's
not what's calculated. Fix that and rename the function to
i915_sg_dma_sizes to reflect what's calculated.

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c  |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_phys.c|  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c |  2 +-
 drivers/gpu/drm/i915/i915_scatterlist.h | 16 
 4 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c 
b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
index ccede73c6465..616c3a2f1baf 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
@@ -209,7 +209,7 @@ static int i915_gem_object_get_pages_dmabuf(struct 
drm_i915_gem_object *obj)
if (IS_ERR(pages))
return PTR_ERR(pages);
 
-   sg_page_sizes = i915_sg_page_sizes(pages->sgl);
+   sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
 
__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_phys.c 
b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
index 81dc2bf59bc3..36f373dc493c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_phys.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_phys.c
@@ -208,7 +208,7 @@ static int i915_gem_object_shmem_to_phys(struct 
drm_i915_gem_object *obj)
 
 err_xfer:
if (!IS_ERR_OR_NULL(pages)) {
-   unsigned int sg_page_sizes = i915_sg_page_sizes(pages->sgl);
+   unsigned int sg_page_sizes = i915_sg_dma_sizes(pages->sgl);
 
__i915_gem_object_set_pages(obj, pages, sg_page_sizes);
}
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index a657b99ec760..602f0ed983ec 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -173,7 +173,7 @@ static int i915_gem_userptr_get_pages(struct 
drm_i915_gem_object *obj)
goto err;
}
 
-   sg_page_sizes = i915_sg_page_sizes(st->sgl);
+   sg_page_sizes = i915_sg_dma_sizes(st->sgl);
 
__i915_gem_object_set_pages(obj, st, sg_page_sizes);
 
diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h 
b/drivers/gpu/drm/i915/i915_scatterlist.h
index 9cb26a224034..b96baad66a3a 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.h
+++ b/drivers/gpu/drm/i915/i915_scatterlist.h
@@ -101,15 +101,23 @@ static inline struct scatterlist *__sg_next(struct 
scatterlist *sg)
 (((__iter).curr += PAGE_SIZE) >= (__iter).max) ?   \
 (__iter) = __sgt_iter(__sg_next((__iter).sgp), false), 0 : 0)
 
-static inline unsigned int i915_sg_page_sizes(struct scatterlist *sg)
+/**
+ * i915_sg_dma_sizes - Record the dma segment sizes of a scatterlist
+ * @sg: The scatterlist
+ *
+ * Return: An unsigned int with segment sizes logically or'ed together.
+ * A caller can use this information to determine what hardware page table
+ * entry sizes can be used to map the memory represented by the scatterlist.
+ */
+static inline unsigned int i915_sg_dma_sizes(struct scatterlist *sg)
 {
unsigned int page_sizes;
 
page_sizes = 0;
-   while (sg) {
+   while (sg && sg_dma_len(sg)) {
GEM_BUG_ON(sg->offset);
-   GEM_BUG_ON(!IS_ALIGNED(sg->length, PAGE_SIZE));
-   page_sizes |= sg->length;
+   GEM_BUG_ON(!IS_ALIGNED(sg_dma_len(sg), PAGE_SIZE));
+   page_sizes |= sg_dma_len(sg);
sg = __sg_next(sg);
}
 
-- 
2.31.1



[PATCH v2 04/15] drm/ttm: Export functions to initialize and finalize the ttm range manager standalone

2021-05-18 Thread Thomas Hellström
i915 mock selftests are run without the device set up. To be able to run
the region-related mock selftests, export functions that allow the TTM
range manager to be set up without a device to attach it to.

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_range_manager.c | 55 +
 include/drm/ttm/ttm_bo_driver.h | 23 +++
 2 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c
index b9d5da6e6a81..6957dfb0cf5a 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -125,55 +125,76 @@ static const struct ttm_resource_manager_func 
ttm_range_manager_func = {
.debug = ttm_range_man_debug
 };
 
-int ttm_range_man_init(struct ttm_device *bdev,
-  unsigned type, bool use_tt,
-  unsigned long p_size)
+struct ttm_resource_manager *
+ttm_range_man_init_standalone(unsigned long size, bool use_tt)
 {
struct ttm_resource_manager *man;
struct ttm_range_manager *rman;
 
rman = kzalloc(sizeof(*rman), GFP_KERNEL);
if (!rman)
-   return -ENOMEM;
+   return ERR_PTR(-ENOMEM);
 
man = &rman->manager;
man->use_tt = use_tt;
 
man->func = &ttm_range_manager_func;
 
-   ttm_resource_manager_init(man, p_size);
+   ttm_resource_manager_init(man, size);
 
-   drm_mm_init(&rman->mm, 0, p_size);
+   drm_mm_init(&rman->mm, 0, size);
spin_lock_init(&rman->lock);
 
-   ttm_set_driver_manager(bdev, type, &rman->manager);
+   return man;
+}
+EXPORT_SYMBOL(ttm_range_man_init_standalone);
+
+int ttm_range_man_init(struct ttm_device *bdev,
+  unsigned int type, bool use_tt,
+  unsigned long p_size)
+{
+   struct ttm_resource_manager *man;
+
+   man = ttm_range_man_init_standalone(p_size, use_tt);
+   if (IS_ERR(man))
+   return PTR_ERR(man);
+
ttm_resource_manager_set_used(man, true);
+   ttm_set_driver_manager(bdev, type, man);
+
return 0;
 }
 EXPORT_SYMBOL(ttm_range_man_init);
 
+void ttm_range_man_fini_standalone(struct ttm_resource_manager *man)
+{
+   struct ttm_range_manager *rman = to_range_manager(man);
+   struct drm_mm *mm = &rman->mm;
+
+   spin_lock(&rman->lock);
+   drm_mm_clean(mm);
+   drm_mm_takedown(mm);
+   spin_unlock(&rman->lock);
+
+   ttm_resource_manager_cleanup(man);
+   kfree(rman);
+}
+EXPORT_SYMBOL(ttm_range_man_fini_standalone);
+
 int ttm_range_man_fini(struct ttm_device *bdev,
   unsigned type)
 {
struct ttm_resource_manager *man = ttm_manager_type(bdev, type);
-   struct ttm_range_manager *rman = to_range_manager(man);
-   struct drm_mm *mm = &rman->mm;
int ret;
 
ttm_resource_manager_set_used(man, false);
-
ret = ttm_resource_manager_evict_all(bdev, man);
if (ret)
return ret;
 
-   spin_lock(&rman->lock);
-   drm_mm_clean(mm);
-   drm_mm_takedown(mm);
-   spin_unlock(&rman->lock);
-
-   ttm_resource_manager_cleanup(man);
ttm_set_driver_manager(bdev, type, NULL);
-   kfree(rman);
+   ttm_range_man_fini_standalone(man);
+
return 0;
 }
 EXPORT_SYMBOL(ttm_range_man_fini);
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index dbccac957f8f..734b1712ea72 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -321,6 +321,20 @@ int ttm_range_man_init(struct ttm_device *bdev,
   unsigned type, bool use_tt,
   unsigned long p_size);
 
+/**
+ * ttm_range_man_init_standalone - Initialize a ttm range manager without
+ * device interaction.
+ * @size: Size of the area to be managed in pages.
+ * @use_tt: The memory type requires tt backing.
+ *
+ * This function is intended for selftests. It initializes a range manager
+ * without any device interaction.
+ *
+ * Return: pointer to a range manager on success. Error pointer on failure.
+ */
+struct ttm_resource_manager *
+ttm_range_man_init_standalone(unsigned long size, bool use_tt);
+
 /**
  * ttm_range_man_fini
  *
@@ -332,4 +346,13 @@ int ttm_range_man_init(struct ttm_device *bdev,
 int ttm_range_man_fini(struct ttm_device *bdev,
   unsigned type);
 
+/**
+ * ttm_range_man_fini_standalone
+ * @man: The range manager
+ *
+ * Tear down a range manager initialized with
+ * ttm_range_manager_init_standalone().
+ */
+void ttm_range_man_fini_standalone(struct ttm_resource_manager *man);
+
 #endif
-- 
2.31.1



[PATCH v2 07/15] drm/ttm: Export ttm_bo_tt_destroy()

2021-05-18 Thread Thomas Hellström
For the upcoming kmapping i915 memcpy_move, export ttm_bo_tt_destroy().
A future change might be to move the new memcpy_move into ttm, replacing
the old ioremapping one.

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/ttm/ttm_bo.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ca1b098b6a56..4479c55aaa1d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1221,3 +1221,4 @@ void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
ttm_tt_destroy(bo->bdev, bo->ttm);
bo->ttm = NULL;
 }
+EXPORT_SYMBOL(ttm_bo_tt_destroy);
-- 
2.31.1



[PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Thomas Hellström
The internal ttm_bo_util memcpy uses vmap functionality. While it might be
possible to use it for copying in and out of sglist-represented io memory,
using the io_mem_reserve() / io_mem_free() callbacks, that would cause
problems with fault(). Instead, implement a method that maps page-by-page
using kmap_local() semantics. As additional benefits, we avoid the
occasional global TLB flushes of vmap() and don't consume vmap space,
eliminating a critical point of failure. With a slight change of semantics
we could also push the memcpy out async for testing and async driver
development purposes, since there is no memory allocation going on that
could violate the dma_fence lockdep rules.

For copies from iomem, use the WC prefetching memcpy variant for
additional speed.

Note that drivers that don't want to use struct io_mapping but rely on
memremap functionality, and that don't want to use scatterlists for
VRAM, may well define specialized (hopefully reusable) iterators for
their particular environment.
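The iomap iterator in this patch caches the current scatterlist segment so
that monotonically increasing page indices resolve without rescanning the
list, and restarts the walk only when an index moves backwards. A small
userspace model of that caching scheme (plain arrays stand in for a real
scatterlist; all names are illustrative, not the actual i915 API):

```c
/* Userspace model of the iterator cache: a monotonic walk over
 * variable-length segments, restarting from the first segment when the
 * requested page index moves backwards. */
#include <assert.h>
#include <stddef.h>

struct seg { size_t pages; };         /* one scatterlist entry, in pages */

struct iter_cache {
    size_t seg_idx;                   /* current segment, or (size_t)-1 */
    size_t start;                     /* first page index of the segment */
    size_t end;                       /* one past its last page index */
};

/* Return the segment covering page index i, updating the cache. */
static size_t lookup(struct iter_cache *c, const struct seg *segs,
                     size_t nsegs, size_t i)
{
    if (i < c->start) {               /* moved backwards: reset the walk */
        c->seg_idx = (size_t)-1;
        c->start = c->end = 0;
    }
    while (i >= c->end) {             /* advance segment by segment */
        c->seg_idx = (c->seg_idx == (size_t)-1) ? 0 : c->seg_idx + 1;
        assert(c->seg_idx < nsegs);
        c->start = c->end;
        c->end += segs[c->seg_idx].pages;
    }
    return c->seg_idx;
}
```

Sequential copies hit the cached segment; only a rewind pays the cost of
walking from the list head again, which matches the retry loop in the patch.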

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
v2:
- Move new TTM exports to a separate commit. (Reported by Christian König)
- Avoid having the iterator init functions inline. (Reported by Jani Nikula)
- Remove a stray comment.
---
 drivers/gpu/drm/i915/Makefile |   1 +
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.c   | 194 ++
 .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.h   | 107 ++
 3 files changed, 302 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index cb8823570996..958ccc1edfed 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -155,6 +155,7 @@ gem-y += \
gem/i915_gem_stolen.o \
gem/i915_gem_throttle.o \
gem/i915_gem_tiling.o \
+   gem/i915_gem_ttm_bo_util.o \
gem/i915_gem_userptr.o \
gem/i915_gem_wait.o \
gem/i915_gemfs.o
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
new file mode 100644
index ..5f347a85bf44
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+/**
+ * DOC: Usage and intentions.
+ *
+ * This file contains functionality that we might want to move into
+ * ttm_bo_util.c if there is a common interest.
+ * Currently a kmap_local only memcpy with support for page-based iomem 
regions,
+ * and fast memcpy from write-combined memory.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "i915_memcpy.h"
+
+#include "gem/i915_gem_ttm_bo_util.h"
+
+static void i915_ttm_kmap_iter_tt_kmap_local(struct i915_ttm_kmap_iter *iter,
+struct dma_buf_map *dmap,
+pgoff_t i)
+{
+   struct i915_ttm_kmap_iter_tt *iter_tt =
+   container_of(iter, typeof(*iter_tt), base);
+
+   dma_buf_map_set_vaddr(dmap, kmap_local_page(iter_tt->tt->pages[i]));
+}
+
+static void i915_ttm_kmap_iter_iomap_kmap_local(struct i915_ttm_kmap_iter 
*iter,
+   struct dma_buf_map *dmap,
+   pgoff_t i)
+{
+   struct i915_ttm_kmap_iter_iomap *iter_io =
+   container_of(iter, typeof(*iter_io), base);
+   void __iomem *addr;
+
+retry:
+   while (i >= iter_io->cache.end) {
+   iter_io->cache.sg = iter_io->cache.sg ?
+   sg_next(iter_io->cache.sg) : iter_io->st->sgl;
+   iter_io->cache.i = iter_io->cache.end;
+   iter_io->cache.end += sg_dma_len(iter_io->cache.sg) >>
+   PAGE_SHIFT;
+   iter_io->cache.offs = sg_dma_address(iter_io->cache.sg) -
+   iter_io->start;
+   }
+
+   if (i < iter_io->cache.i) {
+   iter_io->cache.end = 0;
+   iter_io->cache.sg = NULL;
+   goto retry;
+   }
+
+   addr = io_mapping_map_local_wc(iter_io->iomap, iter_io->cache.offs +
+  (((resource_size_t)i - iter_io->cache.i)
+   << PAGE_SHIFT));
+   dma_buf_map_set_vaddr_iomem(dmap, addr);
+}
+
+static const struct i915_ttm_kmap_iter_ops i915_ttm_kmap_iter_tt_ops = {
+   .kmap_local = i915_ttm_kmap_iter_tt_kmap_local
+};
+
+static const struct i915_ttm_kmap_iter_ops i915_ttm_kmap_iter_io_ops = {
+   .kmap_local =  i915_ttm_kmap_iter_iomap_kmap_local
+};
+
+static void kunmap_local_dma_buf_map(struct dma_buf_map *map)
+{
+   if (map->is_iomem)
+   io_mapping_unmap_local(map->vaddr_iomem);
+   else
+   kunmap_local(map->vaddr);
+}
+
+/**
+

[PATCH v2 05/15] drm/i915/ttm Initialize the ttm device and memory managers

2021-05-18 Thread Thomas Hellström
Temporarily remove the buddy allocator and related selftests
and hook up the TTM range manager for i915 regions.

Also modify the mock region selftests somewhat to account for a
fragmenting manager.

Signed-off-by: Thomas Hellström 
---
v2:
- Fix an error unwind in lmem_get_pages() (Reported by Matthew Auld)
- Break out and modify usage of i915_sg_dma_sizes() (Reported by Matthew Auld)
- Break out TTM changes to a separate patch (Reported by Christian König)
---
 drivers/gpu/drm/i915/Kconfig  |   1 +
 drivers/gpu/drm/i915/Makefile |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c  |  59 +-
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |   3 +-
 drivers/gpu/drm/i915/gem/i915_gem_region.c| 120 ---
 drivers/gpu/drm/i915/gem/i915_gem_region.h|   4 -
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.c|  10 +-
 drivers/gpu/drm/i915/gem/i915_gem_stolen.h|   9 +-
 drivers/gpu/drm/i915/gt/intel_gt.c|   2 -
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |  27 +-
 drivers/gpu/drm/i915/i915_buddy.c | 435 --
 drivers/gpu/drm/i915/i915_buddy.h | 131 ---
 drivers/gpu/drm/i915/i915_drv.c   |   8 +
 drivers/gpu/drm/i915/i915_drv.h   |   7 +-
 drivers/gpu/drm/i915/i915_gem.c   |   1 +
 drivers/gpu/drm/i915/i915_globals.c   |   1 -
 drivers/gpu/drm/i915/i915_globals.h   |   1 -
 drivers/gpu/drm/i915/i915_scatterlist.c   |  70 ++
 drivers/gpu/drm/i915/i915_scatterlist.h   |   4 +
 drivers/gpu/drm/i915/intel_memory_region.c| 180 ++--
 drivers/gpu/drm/i915/intel_memory_region.h|  44 +-
 drivers/gpu/drm/i915/intel_region_ttm.c   | 245 ++
 drivers/gpu/drm/i915/intel_region_ttm.h   |  29 +
 drivers/gpu/drm/i915/selftests/i915_buddy.c   | 789 --
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 -
 .../drm/i915/selftests/intel_memory_region.c  | 133 +--
 drivers/gpu/drm/i915/selftests/mock_region.c  |  50 +-
 29 files changed, 622 insertions(+), 1754 deletions(-)
 delete mode 100644 drivers/gpu/drm/i915/i915_buddy.c
 delete mode 100644 drivers/gpu/drm/i915/i915_buddy.h
 create mode 100644 drivers/gpu/drm/i915/intel_region_ttm.c
 create mode 100644 drivers/gpu/drm/i915/intel_region_ttm.h
 delete mode 100644 drivers/gpu/drm/i915/selftests/i915_buddy.c

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 1e1cb245fca7..b63d374dff23 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -26,6 +26,7 @@ config DRM_I915
select SND_HDA_I915 if SND_HDA_CORE
select CEC_CORE if CEC_NOTIFIER
select VMAP_PFN
+   select DRM_TTM
help
  Choose this option if you have a system that has "Intel Graphics
  Media Accelerator" or "HD Graphics" integrated graphics,
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index d0d936d9137b..cb8823570996 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -50,6 +50,7 @@ i915-y += i915_drv.o \
  intel_memory_region.o \
  intel_pch.o \
  intel_pm.o \
+ intel_region_ttm.o \
  intel_runtime_pm.o \
  intel_sideband.o \
  intel_step.o \
@@ -160,7 +161,6 @@ gem-y += \
 i915-y += \
  $(gem-y) \
  i915_active.o \
- i915_buddy.o \
  i915_cmd_parser.o \
  i915_gem_evict.o \
  i915_gem_gtt.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index f44bdd08f7cb..3b4aa28a076d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -4,16 +4,71 @@
  */
 
 #include "intel_memory_region.h"
+#include "intel_region_ttm.h"
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_lmem.h"
 #include "i915_drv.h"
 
+static void lmem_put_pages(struct drm_i915_gem_object *obj,
+  struct sg_table *pages)
+{
+   intel_region_ttm_node_free(obj->mm.region, obj->mm.st_mm_node);
+   obj->mm.dirty = false;
+   sg_free_table(pages);
+   kfree(pages);
+}
+
+static int lmem_get_pages(struct drm_i915_gem_object *obj)
+{
+   unsigned int flags;
+   struct sg_table *pages;
+
+   flags = I915_ALLOC_MIN_PAGE_SIZE;
+   if (obj->flags & I915_BO_ALLOC_CONTIGUOUS)
+   flags |= I915_ALLOC_CONTIGUOUS;
+
+   obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region,
+obj->base.size,
+flags);
+   if (IS_ERR(obj->mm.st_mm_node))
+   return PTR_ERR(obj->mm.st_mm_node);
+
+   /* Range manager is always contiguous */
+   if (obj->mm.region->is_range_manager)
+   obj->flags |= I915_BO_ALLOC_CONTIGUO

[PATCH v2 02/15] drm/i915: Don't free shared locks while shared

2021-05-18 Thread Thomas Hellström
We are currently sharing the VM reservation locks across a number of
gem objects with page-table memory. Since TTM will individualize the
reservation locks when freeing objects, including accessing the shared
locks, make sure that the shared locks are not freed until that is done.
For PPGTT we add an additional refcount; for GGTT we take additional
measures to make sure objects sharing the GGTT reservation lock are
freed at GGTT takedown.
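The lifetime rule described above can be sketched in userspace: several
objects share one reservation lock, and a reference count keeps the shared
lock alive until the last sharer has dropped it, mirroring the new
i915_vm_resv_get()/i915_vm_resv_put() helpers. struct shared_resv below is
a simplified stand-in for the VM's dma_resv, not the real type:

```c
#include <assert.h>
#include <stdlib.h>

struct shared_resv {
    int lock_state;  /* stand-in for the dma_resv lock itself */
    int refcount;    /* sharers still holding a reference */
};

static struct shared_resv *vm_resv_create(void)
{
    struct shared_resv *r = calloc(1, sizeof(*r));
    r->refcount = 1;  /* the VM itself holds the first reference */
    return r;
}

static struct shared_resv *vm_resv_get(struct shared_resv *r)
{
    r->refcount++;
    return r;
}

/* Drop one reference; return 1 when the shared lock was actually freed. */
static int vm_resv_put(struct shared_resv *r)
{
    if (--r->refcount)
        return 0;
    free(r);  /* only now is it safe to tear down the shared lock */
    return 1;
}
```

An object freed after the VM thus never touches a stale lock: its put only
decrements, and the final free happens at VM (or late GGTT) takedown.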

Signed-off-by: Thomas Hellström 
---
v2: Try harder to make sure objects sharing the GGTT reservation lock are
freed at GGTT takedown.
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c|  3 ++
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
 drivers/gpu/drm/i915/gt/intel_ggtt.c  | 19 ++--
 drivers/gpu/drm/i915/gt/intel_gtt.c   | 45 +++
 drivers/gpu/drm/i915/gt/intel_gtt.h   | 30 -
 drivers/gpu/drm/i915/gt/intel_ppgtt.c |  2 +-
 drivers/gpu/drm/i915/i915_drv.c   |  5 +++
 7 files changed, 92 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 28144410df86..abadf0994ad0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -252,6 +252,9 @@ static void __i915_gem_free_objects(struct drm_i915_private 
*i915,
if (obj->mm.n_placements > 1)
kfree(obj->mm.placements);
 
+   if (obj->resv_shared_from)
+   i915_vm_resv_put(obj->resv_shared_from);
+
/* But keep the pointer alive for RCU-protected lookups */
call_rcu(&obj->rcu, __i915_gem_free_object_rcu);
cond_resched();
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 0727d0c76aa0..450340a73186 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -149,6 +149,7 @@ struct drm_i915_gem_object {
 * when i915_gem_ww_ctx_backoff() or i915_gem_ww_ctx_fini() are called.
 */
struct list_head obj_link;
+   struct dma_resv *resv_shared_from;
 
union {
struct rcu_head rcu;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 35069ca5d7de..10c23a749a95 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -746,7 +746,6 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
 
mutex_unlock(&ggtt->vm.mutex);
i915_address_space_fini(&ggtt->vm);
-   dma_resv_fini(&ggtt->vm.resv);
 
arch_phys_wc_del(ggtt->mtrr);
 
@@ -768,6 +767,19 @@ void i915_ggtt_driver_release(struct drm_i915_private 
*i915)
ggtt_cleanup_hw(ggtt);
 }
 
+/**
+ * i915_ggtt_driver_late_release - Cleanup of GGTT that needs to be done after
+ * all free objects have been drained.
+ * @i915: i915 device
+ */
+void i915_ggtt_driver_late_release(struct drm_i915_private *i915)
+{
+   struct i915_ggtt *ggtt = &i915->ggtt;
+
+   GEM_WARN_ON(kref_read(&ggtt->vm.resv_ref) != 1);
+   dma_resv_fini(&ggtt->vm._resv);
+}
+
 static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
 {
snb_gmch_ctl >>= SNB_GMCH_GGMS_SHIFT;
@@ -829,6 +841,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 
size)
return -ENOMEM;
}
 
+   kref_init(&ggtt->vm.resv_ref);
ret = setup_scratch_page(&ggtt->vm);
if (ret) {
drm_err(&i915->drm, "Scratch setup failed\n");
@@ -1135,7 +1148,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct 
intel_gt *gt)
ggtt->vm.gt = gt;
ggtt->vm.i915 = i915;
ggtt->vm.dma = i915->drm.dev;
-   dma_resv_init(&ggtt->vm.resv);
+   dma_resv_init(&ggtt->vm._resv);
 
if (INTEL_GEN(i915) <= 5)
ret = i915_gmch_probe(ggtt);
@@ -1144,7 +1157,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct 
intel_gt *gt)
else
ret = gen8_gmch_probe(ggtt);
if (ret) {
-   dma_resv_fini(&ggtt->vm.resv);
+   dma_resv_fini(&ggtt->vm._resv);
return ret;
}
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
b/drivers/gpu/drm/i915/gt/intel_gtt.c
index 9b98f9d9faa3..695b22b17644 100644
--- a/drivers/gpu/drm/i915/gt/intel_gtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_gtt.c
@@ -22,8 +22,11 @@ struct drm_i915_gem_object *alloc_pt_lmem(struct 
i915_address_space *vm, int sz)
 * object underneath, with the idea that one object_lock() will lock
 * them all at once.
 */
-   if (!IS_ERR(obj))
-   obj->base.resv = &vm->resv;
+   if (!IS_ERR(obj)) {
+   obj->base.resv = i915_vm_resv_get(vm);
+   obj->resv_shared_from = obj->base.resv;
+   }
+
return obj;
 }
 
@@ -40,8 +43,11 

[PATCH v2 06/15] drm/i915/ttm: Embed a ttm buffer object in the i915 gem object

2021-05-18 Thread Thomas Hellström
Embed a struct ttm_buffer_object into the i915 gem object, making sure
we alias the gem object part. It's a bit unfortunate that the
struct ttm_buffer_object embeds a gem object, since we otherwise could
make the TTM part private to the TTM backend and use the usual
i915 gem object for the other backends.
To make this a bit more storage efficient for the other backends,
we'd have to use a pointer for the gem object, which would require
a lot of changes in the driver. We postpone that for later.
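The aliasing trick can be modeled in a few lines of standard C11: two
"object" types share their base member through an anonymous union, and a
compile-time assert guarantees the base lands at the same offset in both
views, just like the BUILD_BUG_ON() in the patch. The struct names below
are illustrative, not the real i915/TTM types:

```c
#include <assert.h>
#include <stddef.h>

struct gem_object { int handle; };
struct ttm_object { struct gem_object base; int ttm_state; };

struct driver_object {
    union {
        struct gem_object base;            /* generic view */
        struct ttm_object __do_not_access; /* TTM view; use accessors */
    };
    int driver_state;
};

/* Same check as the patch's BUILD_BUG_ON(), in standard C11. */
_Static_assert(offsetof(struct driver_object, base) ==
               offsetof(struct driver_object, __do_not_access.base),
               "gem base must alias the ttm base");

/* Accessor: reach the TTM view without spelling the union member name. */
static struct ttm_object *to_ttm(struct driver_object *obj)
{
    return &obj->__do_not_access;
}
```

Because both union members start with the same base struct, code written
against either view reads and writes the same underlying fields.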

Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.c   |  7 +++
 drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 12 +++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index abadf0994ad0..c8953e3f5c70 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -62,6 +62,13 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
  const struct drm_i915_gem_object_ops *ops,
  struct lock_class_key *key, unsigned flags)
 {
+   /*
+* A gem object is embedded both in a struct ttm_buffer_object :/ and
+* in a drm_i915_gem_object. Make sure they are aliased.
+*/
+   BUILD_BUG_ON(offsetof(typeof(*obj), base) !=
+offsetof(typeof(*obj), __do_not_access.base));
+
spin_lock_init(&obj->vma.lock);
INIT_LIST_HEAD(&obj->vma.list);
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index dbd7fffe956e..98f69d8fd37d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -10,6 +10,7 @@
 #include 
 
 #include 
+#include 
 #include 
 
 #include "i915_active.h"
@@ -99,7 +100,16 @@ struct i915_gem_object_page_iter {
 };
 
 struct drm_i915_gem_object {
-   struct drm_gem_object base;
+   /*
+* We might have reason to revisit the below since it wastes
+* a lot of space for non-ttm gem objects.
+* In any case, always use the accessors for the ttm_buffer_object
+* when accessing it.
+*/
+   union {
+   struct drm_gem_object base;
+   struct ttm_buffer_object __do_not_access;
+   };
 
const struct drm_i915_gem_object_ops *ops;
 
-- 
2.31.1



[PATCH v2 13/15] drm/ttm: Add BO and offset arguments for vm_access and vm_fault ttm handlers.

2021-05-18 Thread Thomas Hellström
From: Maarten Lankhorst 

This allows other drivers that may not set up the vma in the same way
to use the ttm bo helpers.

Also clarify the documentation a bit, especially related to VM_FAULT_RETRY.
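The new @mmap_base argument matters because the fault address is
VMA-relative, while the bo's pages are indexed from the start of its mmap
node. A rough arithmetic sketch of the translation, mirroring the
page_offset computation in ttm_bo_vm_fault_reserved(); the fixed 4K page
size and parameter names are illustrative, not the real TTM signature:

```c
#include <assert.h>

#define PAGE_SHIFT 12  /* assume 4K pages for this sketch */

/* Translate a faulting address to a page index within the bo. */
static unsigned long bo_page_offset(unsigned long fault_address,
                                    unsigned long vma_start,
                                    unsigned long vma_pgoff,
                                    unsigned long mmap_base)
{
    unsigned long vma_page = (fault_address - vma_start) >> PAGE_SHIFT;

    /* vma_pgoff says where this VMA starts inside the mmap space;
     * subtracting mmap_base rebases that onto the bo itself. */
    return vma_page + vma_pgoff - mmap_base;
}
```

A driver whose vma setup does not use drm_vma_node_start() can now pass
whatever base its own offset scheme uses.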

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  4 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c  |  4 +-
 drivers/gpu/drm/radeon/radeon_ttm.c|  4 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c| 84 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |  8 ++-
 include/drm/ttm/ttm_bo_api.h   |  9 ++-
 6 files changed, 75 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d5a9d7a88315..89dafe14f828 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1919,7 +1919,9 @@ static vm_fault_t amdgpu_ttm_fault(struct vm_fault *vmf)
if (ret)
goto unlock;
 
-   ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
+   ret = ttm_bo_vm_fault_reserved(bo, vmf,
+  drm_vma_node_start(&bo->base.vma_node),
+  vmf->vma->vm_page_prot,
   TTM_BO_VM_NUM_PREFAULT, 1);
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
return ret;
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index b81ae90b8449..555fb6d8be8b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -144,7 +144,9 @@ static vm_fault_t nouveau_ttm_fault(struct vm_fault *vmf)
 
nouveau_bo_del_io_reserve_lru(bo);
prot = vm_get_page_prot(vma->vm_flags);
-   ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
+   ret = ttm_bo_vm_fault_reserved(bo, vmf,
+  drm_vma_node_start(&bo->base.vma_node),
+  prot, TTM_BO_VM_NUM_PREFAULT, 1);
nouveau_bo_add_io_reserve_lru(bo);
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
return ret;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 3361d11769a2..ba48a2acdef0 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -816,7 +816,9 @@ static vm_fault_t radeon_ttm_fault(struct vm_fault *vmf)
if (ret)
goto unlock_resv;
 
-   ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
+   ret = ttm_bo_vm_fault_reserved(bo, vmf,
+  drm_vma_node_start(&bo->base.vma_node),
+  vmf->vma->vm_page_prot,
   TTM_BO_VM_NUM_PREFAULT, 1);
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
goto unlock_mclk;
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index b31b18058965..ed00ccf1376e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -42,7 +42,7 @@
 #include 
 
 static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
-   struct vm_fault *vmf)
+  struct vm_fault *vmf)
 {
vm_fault_t ret = 0;
int err = 0;
@@ -122,7 +122,8 @@ static unsigned long ttm_bo_io_mem_pfn(struct 
ttm_buffer_object *bo,
  * Return:
  *0 on success and the bo was reserved.
  *VM_FAULT_RETRY if blocking wait.
- *VM_FAULT_NOPAGE if blocking wait and retrying was not allowed.
+ *VM_FAULT_NOPAGE if blocking wait and retrying was not allowed, or wait 
interrupted.
+ *VM_FAULT_SIGBUS if wait on bo->moving failed for reason other than a 
signal.
  */
 vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
 struct vm_fault *vmf)
@@ -254,7 +255,9 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault 
*vmf,
 
 /**
  * ttm_bo_vm_fault_reserved - TTM fault helper
+ * @bo: The buffer object
  * @vmf: The struct vm_fault given as argument to the fault callback
+ * @mmap_base: The base of the mmap, to which the @vmf fault is relative to.
  * @prot: The page protection to be used for this memory area.
  * @num_prefault: Maximum number of prefault pages. The caller may want to
  * specify this based on madvice settings and the size of the GPU object
@@ -265,19 +268,28 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault 
*vmf,
  * memory backing the buffer object, and then returns a return code
  * instructing the caller to retry the page access.
  *
+ * This function ensures any pipelined wait is finished.
+ *
+ * WARNING:
+ * On VM_FAULT_RETRY, the bo will be unlocked by this function when
+ * #FAULT_FLAG_RETRY_NOWAIT is not set inside @vmf->flags. In this
+ * case, the caller should 

[PATCH v2 09/15] drm/ttm, drm/amdgpu: Allow the driver some control over swapping

2021-05-18 Thread Thomas Hellström
We are calling the eviction_valuable driver callback at eviction time to
determine whether we actually can evict a buffer object.
The upcoming i915 TTM backend needs the same functionality for swapout,
and that might actually be beneficial to other drivers as well.

Add an eviction_valuable call also in the swapout path. Try to keep the
current behaviour for all drivers by returning true if the buffer object
is already in the TTM_PL_SYSTEM placement. We change behaviour for the
case where a buffer object is in a TT backed placement when swapped out,
in which case the driver's normal eviction_valuable path is run.

Finally export ttm_tt_unpopulate() and don't swap out bos
that are not populated. This allows a driver to purge a bo at
swapout time if its content is no longer valuable rather than to
have TTM swap the contents out.
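The behaviour change above reduces to a small decision function: a bo
already in system placement stays swappable as before, and anything else
is subject to the driver callback. A userspace model, with illustrative
enum values and function names rather than the real TTM types:

```c
#include <assert.h>
#include <stdbool.h>

enum placement { PL_SYSTEM, PL_TT, PL_VRAM };

/* Driver hook: here, a driver that never considers TT bos evictable. */
static bool driver_eviction_valuable(enum placement mem_type)
{
    return mem_type != PL_TT;
}

/* Common helper: keep the old behaviour for PL_SYSTEM, otherwise ask
 * the driver, as the new eviction_valuable call at swapout time does. */
static bool eviction_valuable(enum placement mem_type)
{
    if (mem_type == PL_SYSTEM)
        return true;
    return driver_eviction_valuable(mem_type);
}
```

Existing drivers keep their semantics because the short-circuit on system
placement fires before the callback is consulted.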

Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  4 +++
 drivers/gpu/drm/ttm/ttm_bo.c| 41 +++--
 drivers/gpu/drm/ttm/ttm_tt.c|  4 +++
 3 files changed, 33 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 8c7ec09eb1a4..d5a9d7a88315 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1399,6 +1399,10 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct 
ttm_buffer_object *bo,
struct dma_fence *f;
int i;
 
+   /* Swapout? */
+   if (bo->mem.mem_type == TTM_PL_SYSTEM)
+   return true;
+
if (bo->type == ttm_bo_type_kernel &&
!amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo)))
return false;
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 4479c55aaa1d..6a3f3112f62a 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -531,6 +531,10 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
 bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
  const struct ttm_place *place)
 {
+   dma_resv_assert_held(bo->base.resv);
+   if (bo->mem.mem_type == TTM_PL_SYSTEM)
+   return true;
+
/* Don't evict this BO if it's outside of the
 * requested placement range
 */
@@ -553,7 +557,9 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
  * b. Otherwise, trylock it.
  */
 static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
-   struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
+  struct ttm_operation_ctx *ctx,
+  const struct ttm_place *place,
+  bool *locked, bool *busy)
 {
bool ret = false;
 
@@ -571,6 +577,12 @@ static bool ttm_bo_evict_swapout_allowable(struct 
ttm_buffer_object *bo,
*busy = !ret;
}
 
+   if (ret && place && !bo->bdev->funcs->eviction_valuable(bo, place)) {
+   ret = false;
+   if (locked)
+   dma_resv_unlock(bo->base.resv);
+   }
+
return ret;
 }
 
@@ -625,20 +637,14 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
list_for_each_entry(bo, &man->lru[i], lru) {
bool busy;
 
-   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
-   &busy)) {
+   if (!ttm_bo_evict_swapout_allowable(bo, ctx, place,
+   &locked, &busy)) {
if (busy && !busy_bo && ticket !=
dma_resv_locking_ctx(bo->base.resv))
busy_bo = bo;
continue;
}
 
-   if (place && !bdev->funcs->eviction_valuable(bo,
- place)) {
-   if (locked)
-   dma_resv_unlock(bo->base.resv);
-   continue;
-   }
if (!ttm_bo_get_unless_zero(bo)) {
if (locked)
dma_resv_unlock(bo->base.resv);
@@ -1138,10 +1144,18 @@ EXPORT_SYMBOL(ttm_bo_wait);
 int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx *ctx,
   gfp_t gfp_flags)
 {
+   struct ttm_place place = {};
bool locked;
int ret;
 
-   if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked, NULL))
+   /*
+* While the bo may already reside in SYSTEM placement, set
+* SYSTEM as new placement to cover also the move further below.
+* The driver may use the fact that we're moving from SYSTEM
+

[PATCH v2 14/15] drm/i915: Use ttm mmap handling for ttm bo's.

2021-05-18 Thread Thomas Hellström
From: Maarten Lankhorst 

Use the ttm handlers for servicing page faults, and vm_access.

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  17 ++-
 drivers/gpu/drm/i915/gem/i915_gem_mman.h  |   2 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 105 +-
 4 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 65db290efd16..2bf89349dde9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -19,6 +19,7 @@
 #include "i915_gem_mman.h"
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
+#include "i915_gem_ttm.h"
 #include "i915_vma.h"
 
 static inline bool
@@ -789,7 +790,7 @@ i915_gem_mmap_offset_ioctl(struct drm_device *dev, void 
*data,
return __assign_mmap_offset(file, args->handle, type, &args->offset);
 }
 
-static void vm_open(struct vm_area_struct *vma)
+void i915_gem_mmap_vm_open(struct vm_area_struct *vma)
 {
struct i915_mmap_offset *mmo = vma->vm_private_data;
struct drm_i915_gem_object *obj = mmo->obj;
@@ -798,7 +799,7 @@ static void vm_open(struct vm_area_struct *vma)
i915_gem_object_get(obj);
 }
 
-static void vm_close(struct vm_area_struct *vma)
+void i915_gem_mmap_vm_close(struct vm_area_struct *vma)
 {
struct i915_mmap_offset *mmo = vma->vm_private_data;
struct drm_i915_gem_object *obj = mmo->obj;
@@ -810,15 +811,15 @@ static void vm_close(struct vm_area_struct *vma)
 static const struct vm_operations_struct vm_ops_gtt = {
.fault = vm_fault_gtt,
.access = vm_access,
-   .open = vm_open,
-   .close = vm_close,
+   .open = i915_gem_mmap_vm_open,
+   .close = i915_gem_mmap_vm_close,
 };
 
 static const struct vm_operations_struct vm_ops_cpu = {
.fault = vm_fault_cpu,
.access = vm_access,
-   .open = vm_open,
-   .close = vm_close,
+   .open = i915_gem_mmap_vm_open,
+   .close = i915_gem_mmap_vm_close,
 };
 
 static int singleton_release(struct inode *inode, struct file *file)
@@ -953,6 +954,10 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct 
*vma)
}
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
 
+   /* override ops per-object if desired */
+   if (obj->ops->mmap_ops)
+   vma->vm_ops = obj->ops->mmap_ops;
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
index efee9e0d2508..e5bd02a6db12 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
@@ -28,5 +28,7 @@ void __i915_gem_object_release_mmap_gtt(struct 
drm_i915_gem_object *obj);
 void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
 
 void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);
+void i915_gem_mmap_vm_open(struct vm_area_struct *vma);
+void i915_gem_mmap_vm_close(struct vm_area_struct *vma);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index b350765e1935..31d828e91cf4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -79,6 +79,7 @@ struct drm_i915_gem_object_ops {
void (*delayed_free)(struct drm_i915_gem_object *obj);
void (*release)(struct drm_i915_gem_object *obj);
 
+   const struct vm_operations_struct *mmap_ops;
const char *name; /* friendly name for debug, e.g. lockdep classes */
 };
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index 790f5ec45c4d..fe9ac50b2470 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -14,6 +14,7 @@
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_ttm.h"
 #include "gem/i915_gem_ttm_bo_util.h"
+#include "gem/i915_gem_mman.h"
 
 #define I915_PL_LMEM0 TTM_PL_PRIV
 #define I915_PL_SYSTEM TTM_PL_SYSTEM
@@ -345,6 +346,44 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, 
bool evict,
return 0;
 }
 
+static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct 
ttm_resource *mem)
+{
+   if (mem->mem_type < I915_PL_LMEM0)
+   return 0;
+
+   /* We may need to revisit this later, but this allows all caching to be 
used in mmap */
+   mem->bus.caching = ttm_cached;
+   mem->bus.is_iomem = true;
+
+   return 0;
+}
+
+static unsigned long i915_ttm_io_mem_pfn(struct ttm_buffer_object *bo,
+unsigned long page_offset)
+{
+   struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
+   struct sg_table *sgt = obj->ttm.cached_io_st;
+   struct scatterlist *sg;
+   unsigned int i;
+
+   GEM_WARN_ON(bo->ttm);
+
+   for_each_sgtable_dma

[PATCH v2 12/15] drm/i915: Disable mmap ioctl for gen12+

2021-05-18 Thread Thomas Hellström
From: Maarten Lankhorst 

The platform should exclusively use mmap_offset, which is one less path
to worry about for discrete.
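The gate itself is tiny: refuse the legacy mmap ioctl on anything newer
than TGL-LP (gen 12 Tigerlake). A simplified stand-in for the
INTEL_GEN()/IS_TIGERLAKE() check, with plain parameters instead of the
real device struct:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int mmap_ioctl_allowed(int gen, bool is_tigerlake)
{
    /* Disallowed for all platforms after TGL-LP; this also covers
     * every platform with local memory. */
    if (gen >= 12 && !is_tigerlake)
        return -EOPNOTSUPP;
    return 0;
}
```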

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_mman.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 8598a1c78a4c..65db290efd16 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -56,10 +56,17 @@ int
 i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
struct drm_file *file)
 {
+   struct drm_i915_private *i915 = to_i915(dev);
struct drm_i915_gem_mmap *args = data;
struct drm_i915_gem_object *obj;
unsigned long addr;
 
+   /* mmap ioctl is disallowed for all platforms after TGL-LP.  This also
+* covers all platforms with local memory.
+*/
+   if (INTEL_GEN(i915) >= 12 && !IS_TIGERLAKE(i915))
+   return -EOPNOTSUPP;
+
if (args->flags & ~(I915_MMAP_WC))
return -EINVAL;
 
-- 
2.31.1



[PATCH v2 10/15] drm/i915/ttm: Introduce a TTM i915 gem object backend

2021-05-18 Thread Thomas Hellström
Most logical place to introduce TTM buffer objects is as an i915
gem object backend. We need to add some ops to account for added
functionality like delayed delete and LRU list manipulation.

Initially we support only LMEM and SYSTEM memory, but SYSTEM
(which in this case means evicted LMEM objects) is not
visible to i915 GEM yet. The plan is to move the i915 gem system region
over to the TTM system memory type in upcoming patches.

We set up GPU bindings directly both from LMEM and from the system region,
as there is no need to use the legacy TTM_TT memory type. We reserve
that for future porting of GGTT bindings to TTM.

Remove the old lmem backend.

Signed-off-by: Thomas Hellström 
---
v2:
- Break out needed TTM functionality to a separate patch (Reported by
Christian König).
- Fix an unhandled error (Reported by Matthew Auld and Maarten Lankhorst)
- Remove a stray leftover sg_table allocation (Reported by Matthew Auld)
- Use ttm_tt_unpopulate() rather than ttm_tt_destroy() in the purge path
  as some TTM functionality relies on having a ttm_tt present for !is_iomem.
---
 drivers/gpu/drm/i915/Makefile |   1 +
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c  |  84 ---
 drivers/gpu/drm/i915/gem/i915_gem_lmem.h  |   5 -
 drivers/gpu/drm/i915/gem/i915_gem_object.c| 125 +++--
 drivers/gpu/drm/i915/gem/i915_gem_object.h|   9 +
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  18 +
 drivers/gpu/drm/i915/gem/i915_gem_region.c|   6 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 519 ++
 drivers/gpu/drm/i915/gem/i915_gem_ttm.h   |  48 ++
 drivers/gpu/drm/i915/gt/intel_region_lmem.c   |   3 +-
 drivers/gpu/drm/i915/i915_gem.c   |   5 +-
 drivers/gpu/drm/i915/intel_memory_region.c|   1 -
 drivers/gpu/drm/i915/intel_memory_region.h|   1 -
 drivers/gpu/drm/i915/intel_region_ttm.c   |   5 +-
 drivers/gpu/drm/i915/intel_region_ttm.h   |   7 +-
 15 files changed, 696 insertions(+), 141 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.c
 create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 958ccc1edfed..ef0d884a9e2d 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -155,6 +155,7 @@ gem-y += \
gem/i915_gem_stolen.o \
gem/i915_gem_throttle.o \
gem/i915_gem_tiling.o \
+   gem/i915_gem_ttm.o \
gem/i915_gem_ttm_bo_util.o \
gem/i915_gem_userptr.o \
gem/i915_gem_wait.o \
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index 3b4aa28a076d..2b8cd15de1d9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -4,74 +4,10 @@
  */
 
 #include "intel_memory_region.h"
-#include "intel_region_ttm.h"
 #include "gem/i915_gem_region.h"
 #include "gem/i915_gem_lmem.h"
 #include "i915_drv.h"
 
-static void lmem_put_pages(struct drm_i915_gem_object *obj,
-  struct sg_table *pages)
-{
-   intel_region_ttm_node_free(obj->mm.region, obj->mm.st_mm_node);
-   obj->mm.dirty = false;
-   sg_free_table(pages);
-   kfree(pages);
-}
-
-static int lmem_get_pages(struct drm_i915_gem_object *obj)
-{
-   unsigned int flags;
-   struct sg_table *pages;
-
-   flags = I915_ALLOC_MIN_PAGE_SIZE;
-   if (obj->flags & I915_BO_ALLOC_CONTIGUOUS)
-   flags |= I915_ALLOC_CONTIGUOUS;
-
-   obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region,
-obj->base.size,
-flags);
-   if (IS_ERR(obj->mm.st_mm_node))
-   return PTR_ERR(obj->mm.st_mm_node);
-
-   /* Range manager is always contigous */
-   if (obj->mm.region->is_range_manager)
-   obj->flags |= I915_BO_ALLOC_CONTIGUOUS;
-   pages = intel_region_ttm_node_to_st(obj->mm.region, obj->mm.st_mm_node);
-   if (IS_ERR(pages)) {
-   intel_region_ttm_node_free(obj->mm.region, obj->mm.st_mm_node);
-   return PTR_ERR(pages);
-   }
-
-   __i915_gem_object_set_pages(obj, pages, i915_sg_dma_sizes(pages->sgl));
-
-   if (obj->flags & I915_BO_ALLOC_CPU_CLEAR) {
-   void __iomem *vaddr =
-   i915_gem_object_lmem_io_map(obj, 0, obj->base.size);
-
-   if (!vaddr) {
-   struct sg_table *pages =
-   __i915_gem_object_unset_pages(obj);
-
-   if (!IS_ERR_OR_NULL(pages))
-   lmem_put_pages(obj, pages);
-   }
-
-   memset_io(vaddr, 0, obj->base.size);
-   io_mapping_unmap(vaddr);
-   }
-
-   return 0;
-}
-
-const struct drm_i915_gem_object_ops i915_gem_lmem_obj_ops = {
-   .name = "i915_gem_object_lmem",
-   .flags =

[PATCH v2 15/15] drm/i915/ttm: Add io sgt caching to i915_ttm_io_mem_pfn

2021-05-18 Thread Thomas Hellström
From: Maarten Lankhorst 

Instead of walking the sg table manually, use our caching helpers
to do the sgt caching. To prevent lifetime issues of ttm_bo vs
i915_gem_object, we will use a separate member, instead of re-using
the dma page member.
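As a rough illustration of what such a caching page iterator buys (an illustrative standalone sketch; `seg`, `page_iter`, and `iter_lookup` are hypothetical names, not the i915 helpers, which operate on scatterlists):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of a cached page iterator: instead of walking the
 * segment list from scratch on every lookup, remember the last segment
 * and the page offset it starts at, so monotonically increasing lookups
 * (the common fault pattern) advance in O(1) amortized time.
 */
struct seg {
	uint64_t start;	/* first pfn of this segment */
	uint64_t len;	/* length in pages */
};

struct page_iter {
	const struct seg *segs;
	int nsegs;
	int idx;	/* cached segment index */
	uint64_t base;	/* first page offset covered by segs[idx] */
};

static uint64_t iter_lookup(struct page_iter *it, uint64_t n)
{
	/* Going backwards: restart the walk from the first segment. */
	if (n < it->base) {
		it->idx = 0;
		it->base = 0;
	}
	/* Advance the cached position to the segment containing page n. */
	while (n >= it->base + it->segs[it->idx].len) {
		it->base += it->segs[it->idx].len;
		it->idx++;
	}
	return it->segs[it->idx].start + (n - it->base);
}
```

A separate iterator member per use (dma pages vs. io pages) keeps each cached position valid for its own address space, which is the lifetime concern the patch addresses.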

Signed-off-by: Maarten Lankhorst 
---
 drivers/gpu/drm/i915/gem/i915_gem_object.h|  6 +--
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
 drivers/gpu/drm/i915/gem/i915_gem_pages.c |  3 +-
 drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 46 ++-
 4 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index a3ad8cf4eefd..ff59e6c640e6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -342,14 +342,14 @@ struct scatterlist *
 __i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
 struct i915_gem_object_page_iter *iter,
 unsigned int n,
-unsigned int *offset, bool allow_alloc);
+unsigned int *offset, bool allow_alloc, bool dma);
 
 static inline struct scatterlist *
 i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
   unsigned int n,
   unsigned int *offset, bool allow_alloc)
 {
-   return __i915_gem_object_get_sg(obj, &obj->mm.get_page, n, offset, 
allow_alloc);
+   return __i915_gem_object_get_sg(obj, &obj->mm.get_page, n, offset, 
allow_alloc, false);
 }
 
 static inline struct scatterlist *
@@ -357,7 +357,7 @@ i915_gem_object_get_sg_dma(struct drm_i915_gem_object *obj,
   unsigned int n,
   unsigned int *offset, bool allow_alloc)
 {
-   return __i915_gem_object_get_sg(obj, &obj->mm.get_dma_page, n, offset, 
allow_alloc);
+   return __i915_gem_object_get_sg(obj, &obj->mm.get_dma_page, n, offset, 
allow_alloc, true);
 }
 
 struct page *
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 31d828e91cf4..828310802b9f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -324,6 +324,7 @@ struct drm_i915_gem_object {
 
struct {
struct sg_table *cached_io_st;
+   struct i915_gem_object_page_iter get_io_page;
} ttm;
 
/** Record of address bit 17 of each page at last unbind. */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 62ee2185a41b..577352b4f2f6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -465,9 +465,8 @@ __i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
 struct i915_gem_object_page_iter *iter,
 unsigned int n,
 unsigned int *offset,
-bool allow_alloc)
+bool allow_alloc, bool dma)
 {
-   const bool dma = iter == &obj->mm.get_dma_page;
struct scatterlist *sg;
unsigned int idx, count;
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
index fe9ac50b2470..1eaefb89e859 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -167,11 +167,20 @@ static int i915_ttm_move_notify(struct ttm_buffer_object 
*bo)
 
 static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)
 {
-   if (obj->ttm.cached_io_st) {
-   sg_free_table(obj->ttm.cached_io_st);
-   kfree(obj->ttm.cached_io_st);
-   obj->ttm.cached_io_st = NULL;
-   }
+   struct radix_tree_iter iter;
+   void __rcu **slot;
+
+   if (!obj->ttm.cached_io_st)
+   return;
+
+   rcu_read_lock();
+   radix_tree_for_each_slot(slot, &obj->ttm.get_io_page.radix, &iter, 0)
+   radix_tree_delete(&obj->ttm.get_io_page.radix, iter.index);
+   rcu_read_unlock();
+
+   sg_free_table(obj->ttm.cached_io_st);
+   kfree(obj->ttm.cached_io_st);
+   obj->ttm.cached_io_st = NULL;
 }
 
 static void i915_ttm_purge(struct drm_i915_gem_object *obj)
@@ -340,8 +349,11 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, 
bool evict,
i915_ttm_move_memcpy(bo, new_mem, new_iter, old_iter);
i915_ttm_free_cached_io_st(obj);
 
-   if (!new_man->use_tt)
+   if (!new_man->use_tt) {
obj->ttm.cached_io_st = new_st;
+   obj->ttm.get_io_page.sg_pos = new_st->sgl;
+   obj->ttm.get_io_page.sg_idx = 0;
+   }
 
return 0;
 }
@@ -362,26 +374,15 @@ static unsigned long i915_ttm_io_mem_pfn(struct 
ttm_buffer_object *bo,
 unsigned long page_offset)
 {
struct drm_i915_gem_object *obj = i915_ttm_to_gem(bo);
-  

[PATCH v2 11/15] drm/i915/lmem: Verify checks for lmem residency

2021-05-18 Thread Thomas Hellström
Since objects can be migrated or evicted when not pinned or locked,
update the checks for lmem residency or future residency so that
the value returned is not immediately stale.

Signed-off-by: Thomas Hellström 
---
v2: Simplify i915_gem_object_migratable() (Reported by Matthew Auld)
---
 drivers/gpu/drm/i915/display/intel_display.c |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_lmem.c | 42 +++-
 drivers/gpu/drm/i915/gem/i915_gem_object.c   | 18 +
 drivers/gpu/drm/i915/gem/i915_gem_object.h   |  4 ++
 4 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c 
b/drivers/gpu/drm/i915/display/intel_display.c
index de1f13d203b5..b95def2d5af3 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -11615,7 +11615,7 @@ intel_user_framebuffer_create(struct drm_device *dev,
 
/* object is backed with LMEM for discrete */
i915 = to_i915(obj->base.dev);
-   if (HAS_LMEM(i915) && !i915_gem_object_is_lmem(obj)) {
+   if (HAS_LMEM(i915) && !i915_gem_object_validates_to_lmem(obj)) {
/* object is "remote", not in local memory */
i915_gem_object_put(obj);
return ERR_PTR(-EREMOTE);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c 
b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
index 2b8cd15de1d9..d539dffa1554 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_lmem.c
@@ -23,10 +23,50 @@ i915_gem_object_lmem_io_map(struct drm_i915_gem_object *obj,
return io_mapping_map_wc(&obj->mm.region->iomap, offset, size);
 }
 
+/**
+ * i915_gem_object_validates_to_lmem - Whether the object is resident in
+ * lmem when pages are present.
+ * @obj: The object to check.
+ *
+ * Migratable objects' residency may change from under us if the object is
+ * not pinned or locked. This function is intended to be used to check whether
+ * the object can only reside in lmem when pages are present.
+ *
+ * Return: Whether the object is always resident in lmem when pages are
+ * present.
+ */
+bool i915_gem_object_validates_to_lmem(struct drm_i915_gem_object *obj)
+{
+   struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+
+   return !i915_gem_object_migratable(obj) &&
+   mr && (mr->type == INTEL_MEMORY_LOCAL ||
+  mr->type == INTEL_MEMORY_STOLEN_LOCAL);
+}
+
+/**
+ * i915_gem_object_is_lmem - Whether the object is resident in
+ * lmem
+ * @obj: The object to check.
+ *
+ * Even if an object is allowed to migrate and change memory region,
+ * this function checks whether it will always be present in lmem when
+ * valid *or* if that's not the case, whether it's currently resident in lmem.
+ * For migratable and evictable objects, the latter only makes sense when
+ * the object is locked.
+ *
+ * Return: Whether the object is migratable but resident in lmem, or not
+ * migratable and will be present in lmem when valid.
+ */
 bool i915_gem_object_is_lmem(struct drm_i915_gem_object *obj)
 {
-   struct intel_memory_region *mr = obj->mm.region;
+   struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
 
+#ifdef CONFIG_LOCKDEP
+   if (i915_gem_object_migratable(obj) &&
+   i915_gem_object_evictable(obj))
+   assert_object_held(obj);
+#endif
return mr && (mr->type == INTEL_MEMORY_LOCAL ||
  mr->type == INTEL_MEMORY_STOLEN_LOCAL);
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 8580996107ce..8484d8940531 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -457,6 +457,24 @@ bool i915_gem_object_evictable(struct drm_i915_gem_object 
*obj)
return pin_count == 0;
 }
 
+/**
+ * i915_gem_object_migratable - Whether the object is migratable out of the
+ * current region.
+ * @obj: Pointer to the object.
+ *
+ * Return: Whether the object is allowed to be resident in other
+ * regions than the current while pages are present.
+ */
+bool i915_gem_object_migratable(struct drm_i915_gem_object *obj)
+{
+   struct intel_memory_region *mr = READ_ONCE(obj->mm.region);
+
+   if (!mr)
+   return false;
+
+   return obj->mm.n_placements > 1;
+}
+
 void i915_gem_init__objects(struct drm_i915_private *i915)
 {
INIT_WORK(&i915->mm.free_work, __i915_gem_free_work);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index ae5930e307d5..a3ad8cf4eefd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -596,6 +596,10 @@ void __i915_gem_free_object(struct drm_i915_gem_object 
*obj);
 
 bool i915_gem_object_evictable(struct drm_i915_gem_object *obj);
 
+bool i915_gem_object_migratable(struct drm_i915_gem_object *obj);
+
+bool i915_gem_object_validates_to_lmem(struct 

Re: [Intel-gfx] [PATCH v2 12/15] drm/i915: Disable mmap ioctl for gen12+

2021-05-18 Thread Thomas Hellström



On 5/18/21 10:26 AM, Thomas Hellström wrote:

From: Maarten Lankhorst 

The paltform should exclusively use mmap_offset, one less path to worry
about for discrete.

s/paltform/platform/


Signed-off-by: Maarten Lankhorst 


Otherwise,

Reviewed-by: Thomas Hellström 



---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 8598a1c78a4c..65db290efd16 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -56,10 +56,17 @@ int
  i915_gem_mmap_ioctl(struct drm_device *dev, void *data,
struct drm_file *file)
  {
+   struct drm_i915_private *i915 = to_i915(dev);
struct drm_i915_gem_mmap *args = data;
struct drm_i915_gem_object *obj;
unsigned long addr;
  
+	/* mmap ioctl is disallowed for all platforms after TGL-LP.  This also
+* covers all platforms with local memory.
+*/
+   if (INTEL_GEN(i915) >= 12 && !IS_TIGERLAKE(i915))
+   return -EOPNOTSUPP;
+
if (args->flags & ~(I915_MMAP_WC))
return -EINVAL;
  


Re: [Intel-gfx] [PATCH v2 03/15] drm/i915: Fix i915_sg_page_sizes to record dma segments rather than physical pages

2021-05-18 Thread Matthew Auld
On Tue, 18 May 2021 at 09:27, Thomas Hellström
 wrote:
>
> All users of this function actually want the dma segment sizes, but that's
> not what's calculated. Fix that and rename the function to
> i915_sg_dma_sizes to reflect what's calculated.
>
> Signed-off-by: Thomas Hellström 
Reviewed-by: Matthew Auld 
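For context, the renamed helper's intent — recording dma segment lengths rather than physical page sizes — can be sketched standalone (a hypothetical analog, not the kernel function; the real code walks a scatterlist and ORs together `sg_dma_len()` of each entry):

```c
#include <assert.h>

/* Hypothetical standalone analog of what the renamed helper computes:
 * OR together the dma segment lengths to get a mask of the segment
 * sizes present in the mapping.  This sketch takes a plain array of
 * lengths instead of a scatterlist.
 */
static unsigned int sg_dma_size_mask(const unsigned int *dma_len, int n)
{
	unsigned int sizes = 0;

	for (int i = 0; i < n; i++)
		sizes |= dma_len[i];

	return sizes;
}
```

With segments of 4 KiB and 64 KiB the mask has both size bits set, which is what downstream page-size selection cares about.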


Re: [Intel-gfx] [PATCH v2 13/15] drm/ttm: Add BO and offset arguments for vm_access and vm_fault ttm handlers.

2021-05-18 Thread Thomas Hellström

+ Christian König

On 5/18/21 10:26 AM, Thomas Hellström wrote:

From: Maarten Lankhorst 

This allows other drivers that may not setup the vma in the same way
to use the ttm bo helpers.

Also clarify the documentation a bit, especially related to VM_FAULT_RETRY.

Signed-off-by: Maarten Lankhorst 


Lgtm. Reviewed-by: Thomas Hellström 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  4 +-
  drivers/gpu/drm/nouveau/nouveau_ttm.c  |  4 +-
  drivers/gpu/drm/radeon/radeon_ttm.c|  4 +-
  drivers/gpu/drm/ttm/ttm_bo_vm.c| 84 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |  8 ++-
  include/drm/ttm/ttm_bo_api.h   |  9 ++-
  6 files changed, 75 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index d5a9d7a88315..89dafe14f828 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1919,7 +1919,9 @@ static vm_fault_t amdgpu_ttm_fault(struct vm_fault *vmf)
if (ret)
goto unlock;
  
-	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
+   ret = ttm_bo_vm_fault_reserved(bo, vmf,
+  drm_vma_node_start(&bo->base.vma_node),
+  vmf->vma->vm_page_prot,
   TTM_BO_VM_NUM_PREFAULT, 1);
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
return ret;
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index b81ae90b8449..555fb6d8be8b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -144,7 +144,9 @@ static vm_fault_t nouveau_ttm_fault(struct vm_fault *vmf)
  
   	nouveau_bo_del_io_reserve_lru(bo);
prot = vm_get_page_prot(vma->vm_flags);
-   ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, 1);
+   ret = ttm_bo_vm_fault_reserved(bo, vmf,
+  drm_vma_node_start(&bo->base.vma_node),
+  prot, TTM_BO_VM_NUM_PREFAULT, 1);
nouveau_bo_add_io_reserve_lru(bo);
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
return ret;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 3361d11769a2..ba48a2acdef0 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -816,7 +816,9 @@ static vm_fault_t radeon_ttm_fault(struct vm_fault *vmf)
if (ret)
goto unlock_resv;
  
-	ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
+   ret = ttm_bo_vm_fault_reserved(bo, vmf,
+  drm_vma_node_start(&bo->base.vma_node),
+  vmf->vma->vm_page_prot,
   TTM_BO_VM_NUM_PREFAULT, 1);
if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT))
goto unlock_mclk;
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index b31b18058965..ed00ccf1376e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -42,7 +42,7 @@
  #include 
  
  static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
-   struct vm_fault *vmf)
+  struct vm_fault *vmf)
  {
vm_fault_t ret = 0;
int err = 0;
@@ -122,7 +122,8 @@ static unsigned long ttm_bo_io_mem_pfn(struct 
ttm_buffer_object *bo,
   * Return:
   *0 on success and the bo was reserved.
   *VM_FAULT_RETRY if blocking wait.
- *VM_FAULT_NOPAGE if blocking wait and retrying was not allowed.
+ *VM_FAULT_NOPAGE if blocking wait and retrying was not allowed, or wait 
interrupted.
+ *VM_FAULT_SIGBUS if wait on bo->moving failed for reason other than a 
signal.
   */
  vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
 struct vm_fault *vmf)
@@ -254,7 +255,9 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault 
*vmf,
  
  /**
   * ttm_bo_vm_fault_reserved - TTM fault helper
+ * @bo: The buffer object
   * @vmf: The struct vm_fault given as argument to the fault callback
+ * @mmap_base: The base of the mmap, to which the @vmf fault is relative to.
   * @prot: The page protection to be used for this memory area.
   * @num_prefault: Maximum number of prefault pages. The caller may want to
   * specify this based on madvice settings and the size of the GPU object
@@ -265,19 +268,28 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct vm_fault 
*vmf,
   * memory backing the buffer object, and then returns a return code
   * instructing the caller to retry the page access.
   *
+ * This function ensures any pipelined wait is finished.
+ *
+ * WARNING:
+ * On VM_FAULT_RETRY, the bo will be unl

Re: [PATCH v2 04/15] drm/ttm: Export functions to initialize and finalize the ttm range manager standalone

2021-05-18 Thread Daniel Vetter
On Tue, May 18, 2021 at 10:26:50AM +0200, Thomas Hellström wrote:
> i915 mock selftests are run without the device set up. In order to be able
> to run the region related mock selftests, export functions in order for the
> TTM range manager to be set up without a device to attach it to.
> 
> Cc: Christian König 
> Signed-off-by: Thomas Hellström 
> ---
>  drivers/gpu/drm/ttm/ttm_range_manager.c | 55 +
>  include/drm/ttm/ttm_bo_driver.h | 23 +++
>  2 files changed, 61 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
> b/drivers/gpu/drm/ttm/ttm_range_manager.c
> index b9d5da6e6a81..6957dfb0cf5a 100644
> --- a/drivers/gpu/drm/ttm/ttm_range_manager.c
> +++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
> @@ -125,55 +125,76 @@ static const struct ttm_resource_manager_func 
> ttm_range_manager_func = {
>   .debug = ttm_range_man_debug
>  };
>  
> -int ttm_range_man_init(struct ttm_device *bdev,
> -unsigned type, bool use_tt,
> -unsigned long p_size)
> +struct ttm_resource_manager *
> +ttm_range_man_init_standalone(unsigned long size, bool use_tt)
>  {
>   struct ttm_resource_manager *man;
>   struct ttm_range_manager *rman;
>  
>   rman = kzalloc(sizeof(*rman), GFP_KERNEL);
>   if (!rman)
> - return -ENOMEM;
> + return ERR_PTR(-ENOMEM);
>  
>   man = &rman->manager;
>   man->use_tt = use_tt;
>  
>   man->func = &ttm_range_manager_func;
>  
> - ttm_resource_manager_init(man, p_size);
> + ttm_resource_manager_init(man, size);
>  
> - drm_mm_init(&rman->mm, 0, p_size);
> + drm_mm_init(&rman->mm, 0, size);
>   spin_lock_init(&rman->lock);
>  
> - ttm_set_driver_manager(bdev, type, &rman->manager);
> + return man;
> +}
> +EXPORT_SYMBOL(ttm_range_man_init_standalone);
> +
> +int ttm_range_man_init(struct ttm_device *bdev,
> +unsigned int type, bool use_tt,
> +unsigned long p_size)
> +{
> + struct ttm_resource_manager *man;
> +
> + man = ttm_range_man_init_standalone(p_size, use_tt);
> + if (IS_ERR(man))
> + return PTR_ERR(man);
> +
>   ttm_resource_manager_set_used(man, true);
> + ttm_set_driver_manager(bdev, type, man);
> +
>   return 0;
>  }
>  EXPORT_SYMBOL(ttm_range_man_init);
>  
> +void ttm_range_man_fini_standalone(struct ttm_resource_manager *man)
> +{
> + struct ttm_range_manager *rman = to_range_manager(man);
> + struct drm_mm *mm = &rman->mm;
> +
> + spin_lock(&rman->lock);
> + drm_mm_clean(mm);
> + drm_mm_takedown(mm);
> + spin_unlock(&rman->lock);
> +
> + ttm_resource_manager_cleanup(man);
> + kfree(rman);
> +}
> +EXPORT_SYMBOL(ttm_range_man_fini_standalone);
> +
>  int ttm_range_man_fini(struct ttm_device *bdev,
>  unsigned type)
>  {
>   struct ttm_resource_manager *man = ttm_manager_type(bdev, type);
> - struct ttm_range_manager *rman = to_range_manager(man);
> - struct drm_mm *mm = &rman->mm;
>   int ret;
>  
>   ttm_resource_manager_set_used(man, false);
> -
>   ret = ttm_resource_manager_evict_all(bdev, man);
>   if (ret)
>   return ret;
>  
> - spin_lock(&rman->lock);
> - drm_mm_clean(mm);
> - drm_mm_takedown(mm);
> - spin_unlock(&rman->lock);
> -
> - ttm_resource_manager_cleanup(man);
>   ttm_set_driver_manager(bdev, type, NULL);
> - kfree(rman);
> + ttm_range_man_fini_standalone(man);
> +
>   return 0;
>  }
>  EXPORT_SYMBOL(ttm_range_man_fini);
> diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
> index dbccac957f8f..734b1712ea72 100644
> --- a/include/drm/ttm/ttm_bo_driver.h
> +++ b/include/drm/ttm/ttm_bo_driver.h
> @@ -321,6 +321,20 @@ int ttm_range_man_init(struct ttm_device *bdev,
>  unsigned type, bool use_tt,
>  unsigned long p_size);
>  
> +/**
> + * ttm_range_man_init_standalone - Initialize a ttm range manager without
> + * device interaction.
> + * @size: Size of the area to be managed in pages.
> + * @use_tt: The memory type requires tt backing.
> + *
> + * This function is intended for selftests. It initializes a range manager
> + * without any device interaction.
> + *
> + * Return: pointer to a range manager on success. Error pointer on failure.
> + */

Kerneldoc is great and I'm happy you're updating them (Christian's not so
much good for this), but I think would be good to go one step further with
a prep patch:

- Make sure ttm_bo_driver.h is appropriately included in
  Documentation/gpu/drm-mm.rst.

- Fix up any kerneldoc fallout. Specifically I think common usage at least
  is that for non-inline functions, the kerneldoc is in the .c file, not
  in the headers.

But also this might be way too much work since ttm hasn't been properly
kerneldoc-ified, so maybe later.
-Daniel

> +struct ttm_resource_manager *
> +ttm_r

Re: [Intel-gfx] [PATCH v2 05/15] drm/i915/ttm Initialize the ttm device and memory managers

2021-05-18 Thread Matthew Auld
On Tue, 18 May 2021 at 09:27, Thomas Hellström
 wrote:
>
> Temporarily remove the buddy allocator and related selftests
> and hook up the TTM range manager for i915 regions.
>
> Also modify the mock region selftests somewhat to account for a
> fragmenting manager.
>
> Signed-off-by: Thomas Hellström 
> ---
> v2:
> - Fix an error unwind in lmem_get_pages() (Reported by Matthew Auld)
> - Break out and modify usage of i915_sg_dma_sizes() (Reported by Matthew Auld)
> - Break out TTM changes to a separate patch (Reported by Christian König)
> ---



>
> +/**
> + * i915_sg_from_mm_node - Create an sg_table from a struct drm_mm_node
> + * @node: The drm_mm_node.
> + * @region_start: An offset to add to the dma addresses of the sg list.
> + *
> + * Create a struct sg_table, initializing it from a struct drm_mm_node,
> + * taking a maximum segment length into account, splitting into segments
> + * if necessary.
> + *
> + * Return: A pointer to a kmalloced struct sg_table on success, negative
> + * error code cast to an error pointer on failure.
> + */
> +struct sg_table *i915_sg_from_mm_node(const struct drm_mm_node *node,
> + u64 region_start)
> +{
> +   const u64 max_segment = SZ_1G; /* Do we have a limit on this? */

For lmem just INT_MAX I think, which is the limit of an sg entry, but it
really doesn't matter at this point; this should be totally fine for now.

Assuming CI is happy,
Reviewed-by: Matthew Auld 

Also we could maybe fling this series with the HAX autoprobing patch
for DG1 at trybot, just to see where we are?
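The splitting-into-segments behaviour described in the kerneldoc above can be sketched in isolation (a hypothetical helper, not the i915 code, which builds an actual sg_table from the drm_mm_node):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the segment splitting described above: a single
 * contiguous block of `size` bytes is chopped into chunks no larger than
 * `max_segment`, as an sg-table builder would when a node exceeds the
 * maximum scatterlist segment length.  Returns the number of chunks.
 */
static int split_into_segments(uint64_t size, uint64_t max_segment,
			       uint64_t *lens, int max_lens)
{
	int n = 0;

	while (size && n < max_lens) {
		uint64_t len = size < max_segment ? size : max_segment;

		lens[n++] = len;
		size -= len;
	}
	return n;
}
```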


[PATCH 2/5] video: fbdev: ssd1307fb: Simplify ssd1307fb_update_display()

2021-05-18 Thread Geert Uytterhoeven
Simplify the nested loops to handle conversion from linear frame buffer
to ssd1307 page layout:
  1. Move last page handling one level up, as the value of "m" is the
 same inside a page,
  2. array->data[] is filled linearly, so there is no need to
 recalculate array_idx over and over again; a simple increment is
 sufficient.
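The resulting packing (linear 1-bpp frame buffer to SSD1307 vertical 8-pixel pages) can be sketched standalone; names and parameters here are illustrative, not the driver's API:

```c
#include <assert.h>
#include <stdint.h>

/* Standalone sketch of the conversion done in ssd1307fb_update_display():
 * the frame buffer is 1 bpp, row-major, LSB-first within each byte; the
 * controller wants "pages" where each output byte is a vertical strip of
 * up to 8 pixels, with bit k of the byte holding row (8 * page + k).
 */
static void pack_pages(const uint8_t *vmem, unsigned int width,
		       unsigned int height, unsigned int line_length,
		       uint8_t *out)
{
	unsigned int pages = (height + 7) / 8;
	unsigned int array_idx = 0;

	for (unsigned int i = 0; i < pages; i++) {
		unsigned int m = 8;

		/* Last page may be partial */
		if (i + 1 == pages && height % 8)
			m = height % 8;
		for (unsigned int j = 0; j < width; j++) {
			uint8_t data = 0;

			for (unsigned int k = 0; k < m; k++) {
				uint8_t byte = vmem[(8 * i + k) * line_length + j / 8];
				uint8_t bit = (byte >> (j % 8)) & 1;

				data |= bit << k;
			}
			out[array_idx++] = data;
		}
	}
}
```

Note how `m` and the output index depend only on the page, matching points 1 and 2 above.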

Signed-off-by: Geert Uytterhoeven 
---
 drivers/video/fbdev/ssd1307fb.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/video/fbdev/ssd1307fb.c b/drivers/video/fbdev/ssd1307fb.c
index e6b6263e3bef847f..6d7bd025bca1a175 100644
--- a/drivers/video/fbdev/ssd1307fb.c
+++ b/drivers/video/fbdev/ssd1307fb.c
@@ -158,6 +158,7 @@ static int ssd1307fb_update_display(struct ssd1307fb_par 
*par)
u8 *vmem = par->info->screen_buffer;
unsigned int line_length = par->info->fix.line_length;
unsigned int pages = DIV_ROUND_UP(par->height, 8);
+   u32 array_idx = 0;
int ret, i, j, k;
 
array = ssd1307fb_alloc_array(par->width * pages, SSD1307FB_DATA);
@@ -194,19 +195,21 @@ static int ssd1307fb_update_display(struct ssd1307fb_par 
*par)
 */
 
for (i = 0; i < pages; i++) {
+   int m = 8;
+
+   /* Last page may be partial */
+   if (i + 1 == pages && par->height % 8)
+   m = par->height % 8;
for (j = 0; j < par->width; j++) {
-   int m = 8;
-   u32 array_idx = i * par->width + j;
-   array->data[array_idx] = 0;
-   /* Last page may be partial */
-   if (i + 1 == pages && par->height % 8)
-   m = par->height % 8;
+   u8 data = 0;
+
for (k = 0; k < m; k++) {
u8 byte = vmem[(8 * i + k) * line_length +
   j / 8];
u8 bit = (byte >> (j % 8)) & 1;
-   array->data[array_idx] |= bit << k;
+   data |= bit << k;
}
+   array->data[array_idx++] = data;
}
}
 
-- 
2.25.1



[PATCH 3/5] video: fbdev: ssd1307fb: Extract ssd1307fb_set_address_range()

2021-05-18 Thread Geert Uytterhoeven
Extract the code to set the column and page ranges into a helper
function.

Signed-off-by: Geert Uytterhoeven 
---
 drivers/video/fbdev/ssd1307fb.c | 61 +++--
 1 file changed, 36 insertions(+), 25 deletions(-)

diff --git a/drivers/video/fbdev/ssd1307fb.c b/drivers/video/fbdev/ssd1307fb.c
index 6d7bd025bca1a175..cfa27ea0feab4f01 100644
--- a/drivers/video/fbdev/ssd1307fb.c
+++ b/drivers/video/fbdev/ssd1307fb.c
@@ -152,6 +152,38 @@ static inline int ssd1307fb_write_cmd(struct i2c_client 
*client, u8 cmd)
return ret;
 }
 
+static int ssd1307fb_set_address_range(struct ssd1307fb_par *par, u8 col_start,
+  u8 cols, u8 page_start, u8 pages)
+{
+   u8 col_end = col_start + cols - 1;
+   u8 page_end = page_start + pages - 1;
+   int ret;
+
+   /* Set column range */
+   ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_COL_RANGE);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, col_start);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, col_end);
+   if (ret < 0)
+   return ret;
+
+   /* Set page range */
+   ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_PAGE_RANGE);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, page_start);
+   if (ret < 0)
+   return ret;
+
+   return ssd1307fb_write_cmd(par->client, page_end);
+}
+
 static int ssd1307fb_update_display(struct ssd1307fb_par *par)
 {
struct ssd1307fb_array *array;
@@ -461,31 +493,10 @@ static int ssd1307fb_init(struct ssd1307fb_par *par)
if (ret < 0)
return ret;
 
-   /* Set column range */
-   ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_COL_RANGE);
-   if (ret < 0)
-   return ret;
-
-   ret = ssd1307fb_write_cmd(par->client, par->col_offset);
-   if (ret < 0)
-   return ret;
-
-   ret = ssd1307fb_write_cmd(par->client, par->col_offset + par->width - 
1);
-   if (ret < 0)
-   return ret;
-
-   /* Set page range */
-   ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_PAGE_RANGE);
-   if (ret < 0)
-   return ret;
-
-   ret = ssd1307fb_write_cmd(par->client, par->page_offset);
-   if (ret < 0)
-   return ret;
-
-   ret = ssd1307fb_write_cmd(par->client,
- par->page_offset +
- DIV_ROUND_UP(par->height, 8) - 1);
+   /* Set column and page range */
+   ret = ssd1307fb_set_address_range(par, par->col_offset, par->width,
+ par->page_offset,
+ DIV_ROUND_UP(par->height, 8));
if (ret < 0)
return ret;
 
-- 
2.25.1



[PATCH] dt-bindings: display: ssd1307fb: Convert to json-schema

2021-05-18 Thread Geert Uytterhoeven
Convert the Solomon SSD1307 Framebuffer Device Tree binding
documentation to json-schema.

Fix the spelling of the "pwms" property.
Document default values.
Make properties with default values not required.

Signed-off-by: Geert Uytterhoeven 
---
I have listed Maxime as the maintainer, as he wrote the original driver
and bindings.  Maxime: Please scream if this is inappropriate ;-)
---
 .../bindings/display/solomon,ssd1307fb.yaml   | 166 ++
 .../devicetree/bindings/display/ssd1307fb.txt |  60 ---
 2 files changed, 166 insertions(+), 60 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/display/solomon,ssd1307fb.yaml
 delete mode 100644 Documentation/devicetree/bindings/display/ssd1307fb.txt

diff --git a/Documentation/devicetree/bindings/display/solomon,ssd1307fb.yaml 
b/Documentation/devicetree/bindings/display/solomon,ssd1307fb.yaml
new file mode 100644
index ..bd632d86a4f814a0
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/solomon,ssd1307fb.yaml
@@ -0,0 +1,166 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/solomon,ssd1307fb.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Solomon SSD1307 OLED Controller Framebuffer
+
+maintainers:
+  - Maxime Ripard 
+
+properties:
+  compatible:
+    enum:
+      - solomon,ssd1305fb-i2c
+      - solomon,ssd1306fb-i2c
+      - solomon,ssd1307fb-i2c
+      - solomon,ssd1309fb-i2c
+
+  reg:
+    maxItems: 1
+
+  pwms:
+    maxItems: 1
+
+  reset-gpios:
+    maxItems: 1
+
+  vbat-supply:
+    description: The supply for VBAT
+
+  solomon,height:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    default: 16
+    description:
+      Height in pixels of the screen driven by the controller
+
+  solomon,width:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    default: 96
+    description:
+      Width in pixels of the screen driven by the controller
+
+  solomon,page-offset:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    default: 1
+    description:
+      Offset of pages (bands of 8 pixels) that the screen is mapped to
+
+  solomon,segment-no-remap:
+    type: boolean
+    description:
+      Display needs normal (non-inverted) data column to segment mapping
+
+  solomon,col-offset:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    default: 0
+    description:
+      Offset of columns (COL/SEG) that the screen is mapped to
+
+  solomon,com-seq:
+    type: boolean
+    description:
+      Display uses sequential COM pin configuration
+
+  solomon,com-lrremap:
+    type: boolean
+    description:
+      Display uses left-right COM pin remap
+
+  solomon,com-invdir:
+    type: boolean
+    description:
+      Display uses inverted COM pin scan direction
+
+  solomon,com-offset:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    default: 0
+    description:
+      Number of the COM pin wired to the first display line
+
+  solomon,prechargep1:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    default: 2
+    description:
+      Length of deselect period (phase 1) in clock cycles
+
+  solomon,prechargep2:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    default: 2
+    description:
+      Length of precharge period (phase 2) in clock cycles.  The higher the
+      capacitance of the OLED's pixels, the higher this value needs to be.
+
+  solomon,dclk-div:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    minimum: 1
+    maximum: 16
+    description:
+      Clock divisor.  The default value is controller-dependent.
+
+  solomon,dclk-frq:
+    $ref: /schemas/types.yaml#/definitions/uint32
+    minimum: 0
+    maximum: 15
+    description:
+      Clock frequency; a higher value means a higher frequency.
+      The default value is controller-dependent.
+
+  solomon,lookup-table:
+    $ref: /schemas/types.yaml#/definitions/uint8-array
+    maxItems: 4
+    description:
+      8-bit value array of current drive pulse widths for BANK0, and colors A,
+      B, and C.  Each value is in the range of 31 to 63 for pulse widths of 32
+      to 64.  Color D is always width 64.
+
+  solomon,area-color-enable:
+    type: boolean
+    description:
+      Display uses color mode
+
+  solomon,low-power:
+    type: boolean
+    description:
+      Display runs in low power mode
+
+required:
+  - compatible
+  - reg
+
+if:
+  properties:
+    compatible:
+      contains:
+        const: solomon,ssd1307fb-i2c
+then:
+  required:
+    - pwms
+
+additionalProperties: false
+
+examples:
+  - |
+    i2c1 {
+        #address-cells = <1>;
+        #size-cells = <0>;
+
+        ssd1307: oled@3c {
+            compatible = "solomon,ssd1307fb-i2c";
+            reg = <0x3c>;
+            pwms = <&pwm 4 3000>;
+            reset-gpios = <&gpio2 7>;
+        };
+
+        ssd1306: oled@3d {
+            compatible = "solomon,ssd1306fb-i2c";
+            reg = <0x3d>;
+        };
+    };

[PATCH 1/5] video: fbdev: ssd1307fb: Propagate errors via ssd1307fb_update_display()

2021-05-18 Thread Geert Uytterhoeven
Make ssd1307fb_update_display() return an error code, so callers that
can handle failures can propagate it.

Signed-off-by: Geert Uytterhoeven 
---
 drivers/video/fbdev/ssd1307fb.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/video/fbdev/ssd1307fb.c b/drivers/video/fbdev/ssd1307fb.c
index eda448b7a0c9d8ce..e6b6263e3bef847f 100644
--- a/drivers/video/fbdev/ssd1307fb.c
+++ b/drivers/video/fbdev/ssd1307fb.c
@@ -152,17 +152,17 @@ static inline int ssd1307fb_write_cmd(struct i2c_client *client, u8 cmd)
return ret;
 }
 
-static void ssd1307fb_update_display(struct ssd1307fb_par *par)
+static int ssd1307fb_update_display(struct ssd1307fb_par *par)
 {
struct ssd1307fb_array *array;
u8 *vmem = par->info->screen_buffer;
unsigned int line_length = par->info->fix.line_length;
unsigned int pages = DIV_ROUND_UP(par->height, 8);
-   int i, j, k;
+   int ret, i, j, k;
 
array = ssd1307fb_alloc_array(par->width * pages, SSD1307FB_DATA);
if (!array)
-   return;
+   return -ENOMEM;
 
/*
 * The screen is divided in pages, each having a height of 8
@@ -210,8 +210,9 @@ static void ssd1307fb_update_display(struct ssd1307fb_par *par)
}
}
 
-   ssd1307fb_write_array(par->client, array, par->width * pages);
+   ret = ssd1307fb_write_array(par->client, array, par->width * pages);
kfree(array);
+   return ret;
 }
 
 
@@ -222,6 +223,7 @@ static ssize_t ssd1307fb_write(struct fb_info *info, const char __user *buf,
unsigned long total_size;
unsigned long p = *ppos;
void *dst;
+   int ret;
 
total_size = info->fix.smem_len;
 
@@ -239,7 +241,9 @@ static ssize_t ssd1307fb_write(struct fb_info *info, const char __user *buf,
if (copy_from_user(dst, buf, count))
return -EFAULT;
 
-   ssd1307fb_update_display(par);
+   ret = ssd1307fb_update_display(par);
+   if (ret < 0)
+   return ret;
 
*ppos += count;
 
@@ -483,7 +487,9 @@ static int ssd1307fb_init(struct ssd1307fb_par *par)
return ret;
 
/* Clear the screen */
-   ssd1307fb_update_display(par);
+   ret = ssd1307fb_update_display(par);
+   if (ret < 0)
+   return ret;
 
/* Turn on the display */
ret = ssd1307fb_write_cmd(par->client, SSD1307FB_DISPLAY_ON);
-- 
2.25.1



[PATCH 0/5] video: fbdev: ssd1307fb: Optimizations and improvements

2021-05-18 Thread Geert Uytterhoeven
Hi all,

This patch series optimizes console operations on ssd1307fb, after the
customary fixes and cleanups.

Currently, each screen update triggers an I2C transfer of all screen
data, up to 1 KiB of data for a 128x64 display, which takes at least 20
ms in Fast mode.  While many displays are smaller, and thus require less
data to be transferred, 20 ms is still an optimistic value, as the
actual data transfer may be much slower, especially on bitbanged I2C
drivers.  After this series, the amount of data transferred is reduced, as
fillrect, copyarea, and imageblit only update the rectangle that
changed.

This has been tested on an Adafruit FeatherWing OLED with an SSD1306
controller and a 128x32 OLED, connected to an OrangeCrab ECP5 FPGA board
running a 64 MHz VexRiscv RISC-V softcore, where it reduced the CPU
usage for blinking the cursor from more than 70% to ca. 10%.

Thanks for your comments!

Geert Uytterhoeven (5):
  video: fbdev: ssd1307fb: Propagate errors via
ssd1307fb_update_display()
  video: fbdev: ssd1307fb: Simplify ssd1307fb_update_display()
  video: fbdev: ssd1307fb: Extract ssd1307fb_set_address_range()
  video: fbdev: ssd1307fb: Optimize screen updates
  video: fbdev: ssd1307fb: Cache address ranges

 drivers/video/fbdev/ssd1307fb.c | 143 +---
 1 file changed, 96 insertions(+), 47 deletions(-)

-- 
2.25.1

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH 4/5] video: fbdev: ssd1307fb: Optimize screen updates

2021-05-18 Thread Geert Uytterhoeven
Currently, each screen update triggers an I2C transfer of all screen
data, up to 1 KiB of data for a 128x64 display, which takes at least 20
ms in Fast mode.

Reduce the amount of transferred data by only updating the rectangle
that changed.  Remove the call to ssd1307fb_set_address_range() during
initialization, as ssd1307fb_update_rect() now takes care of that.

Note that for now the optimized operation is only used for fillrect,
copyarea, and imageblit, which are used by fbcon.

Signed-off-by: Geert Uytterhoeven 
---
 drivers/video/fbdev/ssd1307fb.c | 43 -
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/drivers/video/fbdev/ssd1307fb.c b/drivers/video/fbdev/ssd1307fb.c
index cfa27ea0feab4f01..8e3d4be74723b9bf 100644
--- a/drivers/video/fbdev/ssd1307fb.c
+++ b/drivers/video/fbdev/ssd1307fb.c
@@ -184,16 +184,18 @@ static int ssd1307fb_set_address_range(struct ssd1307fb_par *par, u8 col_start,
return ssd1307fb_write_cmd(par->client, page_end);
 }
 
-static int ssd1307fb_update_display(struct ssd1307fb_par *par)
+static int ssd1307fb_update_rect(struct ssd1307fb_par *par, unsigned int x,
+unsigned int y, unsigned int width,
+unsigned int height)
 {
struct ssd1307fb_array *array;
u8 *vmem = par->info->screen_buffer;
unsigned int line_length = par->info->fix.line_length;
-   unsigned int pages = DIV_ROUND_UP(par->height, 8);
+   unsigned int pages = DIV_ROUND_UP(height + y % 8, 8);
u32 array_idx = 0;
int ret, i, j, k;
 
-   array = ssd1307fb_alloc_array(par->width * pages, SSD1307FB_DATA);
+   array = ssd1307fb_alloc_array(width * pages, SSD1307FB_DATA);
if (!array)
return -ENOMEM;
 
@@ -226,13 +228,18 @@ static int ssd1307fb_update_display(struct ssd1307fb_par *par)
 *  (5) A4 B4 C4 D4 E4 F4 G4 H4
 */
 
-   for (i = 0; i < pages; i++) {
+   ret = ssd1307fb_set_address_range(par, par->col_offset + x, width,
+ par->page_offset + y / 8, pages);
+   if (ret < 0)
+   goto out_free;
+
+   for (i = y / 8; i < y / 8 + pages; i++) {
int m = 8;
 
/* Last page may be partial */
-   if (i + 1 == pages && par->height % 8)
+   if (8 * (i + 1) > par->height)
m = par->height % 8;
-   for (j = 0; j < par->width; j++) {
+   for (j = x; j < x + width; j++) {
u8 data = 0;
 
for (k = 0; k < m; k++) {
@@ -245,11 +252,17 @@ static int ssd1307fb_update_display(struct ssd1307fb_par *par)
}
}
 
-   ret = ssd1307fb_write_array(par->client, array, par->width * pages);
+   ret = ssd1307fb_write_array(par->client, array, width * pages);
+
+out_free:
kfree(array);
return ret;
 }
 
+static int ssd1307fb_update_display(struct ssd1307fb_par *par)
+{
+   return ssd1307fb_update_rect(par, 0, 0, par->width, par->height);
+}
 
 static ssize_t ssd1307fb_write(struct fb_info *info, const char __user *buf,
size_t count, loff_t *ppos)
@@ -299,21 +312,24 @@ static void ssd1307fb_fillrect(struct fb_info *info, const struct fb_fillrect *r
 {
struct ssd1307fb_par *par = info->par;
sys_fillrect(info, rect);
-   ssd1307fb_update_display(par);
+   ssd1307fb_update_rect(par, rect->dx, rect->dy, rect->width,
+ rect->height);
 }
 
 static void ssd1307fb_copyarea(struct fb_info *info, const struct fb_copyarea *area)
 {
struct ssd1307fb_par *par = info->par;
sys_copyarea(info, area);
-   ssd1307fb_update_display(par);
+   ssd1307fb_update_rect(par, area->dx, area->dy, area->width,
+ area->height);
 }
 
 static void ssd1307fb_imageblit(struct fb_info *info, const struct fb_image *image)
 {
struct ssd1307fb_par *par = info->par;
sys_imageblit(info, image);
-   ssd1307fb_update_display(par);
+   ssd1307fb_update_rect(par, image->dx, image->dy, image->width,
+ image->height);
 }
 
 static const struct fb_ops ssd1307fb_ops = {
@@ -493,13 +509,6 @@ static int ssd1307fb_init(struct ssd1307fb_par *par)
if (ret < 0)
return ret;
 
-   /* Set column and page range */
-   ret = ssd1307fb_set_address_range(par, par->col_offset, par->width,
- par->page_offset,
- DIV_ROUND_UP(par->height, 8));
-   if (ret < 0)
-   return ret;
-
/* Clear the screen */
ret = ssd1307fb_update_display(par);
if (ret < 0)
-- 
2.25.1



[PATCH 5/5] video: fbdev: ssd1307fb: Cache address ranges

2021-05-18 Thread Geert Uytterhoeven
Cache the column and page ranges, to avoid doing unneeded I2C transfers
when the values haven't changed.

Signed-off-by: Geert Uytterhoeven 
---
 drivers/video/fbdev/ssd1307fb.c | 52 +++--
 1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/drivers/video/fbdev/ssd1307fb.c b/drivers/video/fbdev/ssd1307fb.c
index 8e3d4be74723b9bf..23b43ce479898813 100644
--- a/drivers/video/fbdev/ssd1307fb.c
+++ b/drivers/video/fbdev/ssd1307fb.c
@@ -82,6 +82,11 @@ struct ssd1307fb_par {
struct regulator *vbat_reg;
u32 vcomh;
u32 width;
+   /* Cached address ranges */
+   u8 col_start;
+   u8 col_end;
+   u8 page_start;
+   u8 page_end;
 };
 
 struct ssd1307fb_array {
@@ -160,28 +165,43 @@ static int ssd1307fb_set_address_range(struct 
ssd1307fb_par *par, u8 col_start,
int ret;
 
/* Set column range */
-   ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_COL_RANGE);
-   if (ret < 0)
-   return ret;
+   if (col_start != par->col_start || col_end != par->col_end) {
+   ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_COL_RANGE);
+   if (ret < 0)
+   return ret;
 
-   ret = ssd1307fb_write_cmd(par->client, col_start);
-   if (ret < 0)
-   return ret;
+   ret = ssd1307fb_write_cmd(par->client, col_start);
+   if (ret < 0)
+   return ret;
 
-   ret = ssd1307fb_write_cmd(par->client, col_end);
-   if (ret < 0)
-   return ret;
+   ret = ssd1307fb_write_cmd(par->client, col_end);
+   if (ret < 0)
+   return ret;
+
+   par->col_start = col_start;
+   par->col_end = col_end;
+   }
 
/* Set page range */
-   ret = ssd1307fb_write_cmd(par->client, SSD1307FB_SET_PAGE_RANGE);
-   if (ret < 0)
-   return ret;
+   if (page_start != par->page_start || page_end != par->page_end) {
+   ret = ssd1307fb_write_cmd(par->client,
+ SSD1307FB_SET_PAGE_RANGE);
+   if (ret < 0)
+   return ret;
 
-   ret = ssd1307fb_write_cmd(par->client, page_start);
-   if (ret < 0)
-   return ret;
+   ret = ssd1307fb_write_cmd(par->client, page_start);
+   if (ret < 0)
+   return ret;
+
+   ret = ssd1307fb_write_cmd(par->client, page_end);
+   if (ret < 0)
+   return ret;
 
-   return ssd1307fb_write_cmd(par->client, page_end);
+   par->page_start = page_start;
+   par->page_end = page_end;
+   }
+
+   return 0;
 }
 
 static int ssd1307fb_update_rect(struct ssd1307fb_par *par, unsigned int x,
-- 
2.25.1



Re: [PATCH 0/7] Per client engine busyness

2021-05-18 Thread Tvrtko Ursulin



On 17/05/2021 20:03, Simon Ser wrote:

On Monday, May 17th, 2021 at 8:16 PM, Nieto, David M  
wrote:


Btw is DRM_MAJOR 226 considered uapi? I don't see it in uapi headers.


It's not in the headers, but it's de facto uAPI, as seen in libdrm:

 > git grep 226
 xf86drm.c
 99:#define DRM_MAJOR 226 /* Linux */


I suspected it would be yes, thanks.

I was just wondering if stat(2) and a chrdev major check would be a 
solid criteria to more efficiently (compared to parsing the text 
content) detect drm files while walking procfs.


Regards,

Tvrtko


Re: [Intel-gfx] [PATCH v2 05/15] drm/i915/ttm Initialize the ttm device and memory managers

2021-05-18 Thread Matthew Auld
On Tue, 18 May 2021 at 09:27, Thomas Hellström
 wrote:
>
> Temporarily remove the buddy allocator and related selftests
> and hook up the TTM range manager for i915 regions.
>
> Also modify the mock region selftests somewhat to account for a
> fragmenting manager.
>
> Signed-off-by: Thomas Hellström 
> ---
> v2:
> - Fix an error unwind in lmem_get_pages() (Reported by Matthew Auld)
> - Break out and modify usage of i915_sg_dma_sizes() (Reported by Matthew Auld)
> - Break out TTM changes to a separate patch (Reported by Christian König)
> ---



> +
> +static int mock_region_get_pages(struct drm_i915_gem_object *obj)
> +{
> +   unsigned int flags;
> +   struct sg_table *pages;
> +
> +   flags = I915_ALLOC_MIN_PAGE_SIZE;
> +   if (obj->flags & I915_BO_ALLOC_CONTIGUOUS)
> +   flags |= I915_ALLOC_CONTIGUOUS;
> +
> +   obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region,
> +obj->base.size,
> +flags);
> +   if (IS_ERR(obj->mm.st_mm_node))
> +   return PTR_ERR(obj->mm.st_mm_node);
> +
> +   pages = intel_region_ttm_node_to_st(obj->mm.region, 
> obj->mm.st_mm_node);
> +   if (IS_ERR(pages))
> +   return PTR_ERR(pages);

Needs some onion?


Re: [Intel-gfx] [PATCH v2 05/15] drm/i915/ttm Initialize the ttm device and memory managers

2021-05-18 Thread Thomas Hellström



On 5/18/21 11:09 AM, Matthew Auld wrote:

On Tue, 18 May 2021 at 09:27, Thomas Hellström
 wrote:

Temporarily remove the buddy allocator and related selftests
and hook up the TTM range manager for i915 regions.

Also modify the mock region selftests somewhat to account for a
fragmenting manager.

Signed-off-by: Thomas Hellström 
---
v2:
- Fix an error unwind in lmem_get_pages() (Reported by Matthew Auld)
- Break out and modify usage of i915_sg_dma_sizes() (Reported by Matthew Auld)
- Break out TTM changes to a separate patch (Reported by Christian König)
---




+
+static int mock_region_get_pages(struct drm_i915_gem_object *obj)
+{
+   unsigned int flags;
+   struct sg_table *pages;
+
+   flags = I915_ALLOC_MIN_PAGE_SIZE;
+   if (obj->flags & I915_BO_ALLOC_CONTIGUOUS)
+   flags |= I915_ALLOC_CONTIGUOUS;
+
+   obj->mm.st_mm_node = intel_region_ttm_node_alloc(obj->mm.region,
+obj->base.size,
+flags);
+   if (IS_ERR(obj->mm.st_mm_node))
+   return PTR_ERR(obj->mm.st_mm_node);
+
+   pages = intel_region_ttm_node_to_st(obj->mm.region, obj->mm.st_mm_node);
+   if (IS_ERR(pages))
+   return PTR_ERR(pages);

Needs some onion?


Ah, yes.

/Thomas




Re: [PATCH 0/7] Per client engine busyness

2021-05-18 Thread Daniel Stone
Hi,

On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
 wrote:
> I was just wondering if stat(2) and a chrdev major check would be a
> solid criteria to more efficiently (compared to parsing the text
> content) detect drm files while walking procfs.

Maybe I'm missing something, but is the per-PID walk actually a
measurable performance issue rather than just a bit unpleasant?

Cheers,
Daniel


Re: [Intel-gfx] [PATCH v2 14/15] drm/i915: Use ttm mmap handling for ttm bo's.

2021-05-18 Thread Thomas Hellström



On 5/18/21 10:27 AM, Thomas Hellström wrote:

From: Maarten Lankhorst 

Use the ttm handlers for servicing page faults, and vm_access.

Signed-off-by: Maarten Lankhorst 


LGTM. Just need to make sure we don't forget about the caching.

Reviewed-by: Thomas Hellström 




---
  drivers/gpu/drm/i915/gem/i915_gem_mman.c  |  17 ++-
  drivers/gpu/drm/i915/gem/i915_gem_mman.h  |   2 +
  .../gpu/drm/i915/gem/i915_gem_object_types.h  |   1 +
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 105 +-
  4 files changed, 118 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
index 65db290efd16..2bf89349dde9 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
@@ -19,6 +19,7 @@
  #include "i915_gem_mman.h"
  #include "i915_trace.h"
  #include "i915_user_extensions.h"
+#include "i915_gem_ttm.h"
  #include "i915_vma.h"
  
  static inline bool

@@ -789,7 +790,7 @@ i915_gem_mmap_offset_ioctl(struct drm_device *dev, void 
*data,
return __assign_mmap_offset(file, args->handle, type, &args->offset);
  }
  
-static void vm_open(struct vm_area_struct *vma)

+void i915_gem_mmap_vm_open(struct vm_area_struct *vma)
  {
struct i915_mmap_offset *mmo = vma->vm_private_data;
struct drm_i915_gem_object *obj = mmo->obj;
@@ -798,7 +799,7 @@ static void vm_open(struct vm_area_struct *vma)
i915_gem_object_get(obj);
  }
  
-static void vm_close(struct vm_area_struct *vma)

+void i915_gem_mmap_vm_close(struct vm_area_struct *vma)
  {
struct i915_mmap_offset *mmo = vma->vm_private_data;
struct drm_i915_gem_object *obj = mmo->obj;
@@ -810,15 +811,15 @@ static void vm_close(struct vm_area_struct *vma)
  static const struct vm_operations_struct vm_ops_gtt = {
.fault = vm_fault_gtt,
.access = vm_access,
-   .open = vm_open,
-   .close = vm_close,
+   .open = i915_gem_mmap_vm_open,
+   .close = i915_gem_mmap_vm_close,
  };
  
  static const struct vm_operations_struct vm_ops_cpu = {

.fault = vm_fault_cpu,
.access = vm_access,
-   .open = vm_open,
-   .close = vm_close,
+   .open = i915_gem_mmap_vm_open,
+   .close = i915_gem_mmap_vm_close,
  };
  
  static int singleton_release(struct inode *inode, struct file *file)

@@ -953,6 +954,10 @@ int i915_gem_mmap(struct file *filp, struct vm_area_struct 
*vma)
}
vma->vm_page_prot = pgprot_decrypted(vma->vm_page_prot);
  
+	/* override ops per-object if desired */

+   if (obj->ops->mmap_ops)
+   vma->vm_ops = obj->ops->mmap_ops;
+
return 0;
  }
  
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.h b/drivers/gpu/drm/i915/gem/i915_gem_mman.h

index efee9e0d2508..e5bd02a6db12 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_mman.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.h
@@ -28,5 +28,7 @@ void __i915_gem_object_release_mmap_gtt(struct 
drm_i915_gem_object *obj);
  void i915_gem_object_release_mmap_gtt(struct drm_i915_gem_object *obj);
  
  void i915_gem_object_release_mmap_offset(struct drm_i915_gem_object *obj);

+void i915_gem_mmap_vm_open(struct vm_area_struct *vma);
+void i915_gem_mmap_vm_close(struct vm_area_struct *vma);
  
  #endif

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index b350765e1935..31d828e91cf4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -79,6 +79,7 @@ struct drm_i915_gem_object_ops {
void (*delayed_free)(struct drm_i915_gem_object *obj);
void (*release)(struct drm_i915_gem_object *obj);
  
+	const struct vm_operations_struct *mmap_ops;

const char *name; /* friendly name for debug, e.g. lockdep classes */
  };
  
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c

index 790f5ec45c4d..fe9ac50b2470 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -14,6 +14,7 @@
  #include "gem/i915_gem_region.h"
  #include "gem/i915_gem_ttm.h"
  #include "gem/i915_gem_ttm_bo_util.h"
+#include "gem/i915_gem_mman.h"
  
  #define I915_PL_LMEM0 TTM_PL_PRIV

  #define I915_PL_SYSTEM TTM_PL_SYSTEM
@@ -345,6 +346,44 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, 
bool evict,
return 0;
  }
  
+static int i915_ttm_io_mem_reserve(struct ttm_device *bdev, struct ttm_resource *mem)

+{
+   if (mem->mem_type < I915_PL_LMEM0)
+   return 0;
+
+   /* We may need to revisit this later, but this allows all caching to be 
used in mmap */
+   mem->bus.caching = ttm_cached;
+   mem->bus.is_iomem = true;
+
+   return 0;
+}
+
+static unsigned long i915_ttm_io_mem_pfn(struct ttm_buffer_object *bo,
+unsigned long page_offset)
+{
+   struct drm_i915_gem_object *obj = i

[PATCH] drm/bridge: ti-sn65dsi86: fix a ternary type promotion bug

2021-05-18 Thread Dan Carpenter
The ti_sn_aux_transfer() function returns ssize_t (signed long).  It's
supposed to return negative error codes or the number of bytes
transferred.  The "ret" variable is int and the "len" variable is
unsigned int.

The problem is that with a ternary like this, the negative int is first
type promoted to unsigned int to match "len"; at this point it is a high
positive value.  Then when it is converted to ssize_t (s64) it remains a
high positive value instead of being sign extended and becoming negative
again.

Fix this by removing the ternary.

Fixes: b137406d9679 ("drm/bridge: ti-sn65dsi86: If refclk, DP AUX can happen w/out pre-enable")
Signed-off-by: Dan Carpenter 
---
 drivers/gpu/drm/bridge/ti-sn65dsi86.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
index bb0a0e1c6341..45a2969afb2b 100644
--- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c
+++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
@@ -1042,7 +1042,9 @@ static ssize_t ti_sn_aux_transfer(struct drm_dp_aux *aux,
pm_runtime_mark_last_busy(pdata->dev);
pm_runtime_put_autosuspend(pdata->dev);
 
-   return ret ? ret : len;
+   if (ret)
+   return ret;
+   return len;
 }
 
 static int ti_sn_bridge_parse_dsi_host(struct ti_sn65dsi86 *pdata)
-- 
2.30.2



Re: [Intel-gfx] [PATCH v2 15/15] drm/i915/ttm: Add io sgt caching to i915_ttm_io_mem_pfn

2021-05-18 Thread Thomas Hellström



On 5/18/21 10:27 AM, Thomas Hellström wrote:

From: Maarten Lankhorst 

Instead of walking the sg table manually, use our caching helpers
to do the sgt caching. To prevent lifetime issues of ttm_bo vs
i915_gem_object, we will use a separate member, instead of re-using
the dma page member.

Signed-off-by: Maarten Lankhorst 
---
  drivers/gpu/drm/i915/gem/i915_gem_object.h|  6 +--
  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
  drivers/gpu/drm/i915/gem/i915_gem_pages.c |  3 +-
  drivers/gpu/drm/i915/gem/i915_gem_ttm.c   | 46 ++-
  4 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index a3ad8cf4eefd..ff59e6c640e6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -342,14 +342,14 @@ struct scatterlist *
  __i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
 struct i915_gem_object_page_iter *iter,
 unsigned int n,
-unsigned int *offset, bool allow_alloc);
+unsigned int *offset, bool allow_alloc, bool dma);
  
  static inline struct scatterlist *

  i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
   unsigned int n,
   unsigned int *offset, bool allow_alloc)
  {
-   return __i915_gem_object_get_sg(obj, &obj->mm.get_page, n, offset, 
allow_alloc);
+   return __i915_gem_object_get_sg(obj, &obj->mm.get_page, n, offset, 
allow_alloc, false);
  }
  
  static inline struct scatterlist *

@@ -357,7 +357,7 @@ i915_gem_object_get_sg_dma(struct drm_i915_gem_object *obj,
   unsigned int n,
   unsigned int *offset, bool allow_alloc)
  {
-   return __i915_gem_object_get_sg(obj, &obj->mm.get_dma_page, n, offset, 
allow_alloc);
+   return __i915_gem_object_get_sg(obj, &obj->mm.get_dma_page, n, offset, 
allow_alloc, true);
  }
  
  struct page *

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 31d828e91cf4..828310802b9f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -324,6 +324,7 @@ struct drm_i915_gem_object {
  
  	struct {

struct sg_table *cached_io_st;
+   struct i915_gem_object_page_iter get_io_page;
} ttm;
  
  	/** Record of address bit 17 of each page at last unbind. */

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
index 62ee2185a41b..577352b4f2f6 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c
@@ -465,9 +465,8 @@ __i915_gem_object_get_sg(struct drm_i915_gem_object *obj,
 struct i915_gem_object_page_iter *iter,
 unsigned int n,
 unsigned int *offset,
-bool allow_alloc)
+bool allow_alloc, bool dma)
  {
-   const bool dma = iter == &obj->mm.get_dma_page;
struct scatterlist *sg;
unsigned int idx, count;
  
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c

index fe9ac50b2470..1eaefb89e859 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
@@ -167,11 +167,20 @@ static int i915_ttm_move_notify(struct ttm_buffer_object 
*bo)
  
  static void i915_ttm_free_cached_io_st(struct drm_i915_gem_object *obj)

  {
-   if (obj->ttm.cached_io_st) {
-   sg_free_table(obj->ttm.cached_io_st);
-   kfree(obj->ttm.cached_io_st);
-   obj->ttm.cached_io_st = NULL;
-   }
+   struct radix_tree_iter iter;
+   void __rcu **slot;
+
+   if (!obj->ttm.cached_io_st)
+   return;
+
+   rcu_read_lock();
+   radix_tree_for_each_slot(slot, &obj->ttm.get_io_page.radix, &iter, 0)
+   radix_tree_delete(&obj->ttm.get_io_page.radix, iter.index);
+   rcu_read_unlock();
+
+   sg_free_table(obj->ttm.cached_io_st);
+   kfree(obj->ttm.cached_io_st);
+   obj->ttm.cached_io_st = NULL;
  }
  
  static void i915_ttm_purge(struct drm_i915_gem_object *obj)

@@ -340,8 +349,11 @@ static int i915_ttm_move(struct ttm_buffer_object *bo, 
bool evict,
i915_ttm_move_memcpy(bo, new_mem, new_iter, old_iter);
i915_ttm_free_cached_io_st(obj);
  
-	if (!new_man->use_tt)

+   if (!new_man->use_tt) {
obj->ttm.cached_io_st = new_st;
+   obj->ttm.get_io_page.sg_pos = new_st->sgl;
+   obj->ttm.get_io_page.sg_idx = 0;
+   }
  
  	return 0;

  }
@@ -362,26 +374,15 @@ static unsigned long i915_ttm_io_mem_pfn(struct 
ttm_buffer_object *bo,
 unsigned long page_offset)
  

Re: [PATCH 0/7] Per client engine busyness

2021-05-18 Thread Tvrtko Ursulin



On 17/05/2021 19:02, Nieto, David M wrote:

[AMD Official Use Only]


The format is simple:

:  %


Hm what time period does the percent relate to?

The i915 implementation uses accumulated nanoseconds active. That way 
who reads the file can calculate the percentage relative to the time 
period between two reads of the file.



we also have entries for the memory mapped:
mem  :  KiB


Okay so in general key values per line in text format. Colon as delimiter.

What common fields could be useful between different drivers and what 
common naming scheme, in order to enable as easy as possible creation of 
a generic top-like tool?


driver: 
pdev: 
ring-: N 
...
mem-: N 
...

What else?
Is ring a good common name? We actually more use engine in i915 but I am 
not really bothered about it.


Aggregated GPU usage could be easily and generically done by userspace 
by adding all rings and normalizing.


On my submission
https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html
I added a python script to print out the info. It has a CPU usage lower
than top, for example.


To be absolutely honest, I agree that there is an overhead, but it might 
not be as much as you fear.


For me more the issue is that the extra number of operations grows with 
the number of open files on the system, which has no relation to the 
number of drm clients.


Extra so if the monitoring tool wants to show _only_ DRM processes: then 
the cost scales with the total number of processes times the total number 
of files on the server.


This design inefficiency bothers me yes. This is somewhat alleviated by 
the proposal from Chris 
(https://patchwork.freedesktop.org/patch/419042/?series=86692&rev=1) 
although there are downsides there as well. Like needing to keep a map 
of pids to drm files in drivers.


Btw what do you do in that tool for same fd in a multi-threaded process
or so? Do you show duplicate entries or detect and ignore? I guess I did 
not figure out if you show by pid/tgid or by fd.


Regards,

Tvrtko



*From:* Tvrtko Ursulin 
*Sent:* Monday, May 17, 2021 9:00 AM
*To:* Nieto, David M ; Daniel Vetter 
; Koenig, Christian 
*Cc:* Alex Deucher ; Intel Graphics Development 
; Maling list - DRI developers 


*Subject:* Re: [PATCH 0/7] Per client engine busyness

On 17/05/2021 15:39, Nieto, David M wrote:

[AMD Official Use Only]


Maybe we could try to standardize how the different submission ring 
usage gets exposed in the fdinfo? We went the simple way of just 
adding name and index, but if someone has a suggestion on how else we 
could format them so there is commonality across vendors we could just 
amend those.


Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
   - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM devices.

Btw is DRM_MAJOR 226 considered uapi? I don't see it in uapi headers.

I’d really like to have the process managers tools display GPU usage 
regardless of what vendor is installed.


Definitely.

Regards,

Tvrtko


Re: [PATCH 0/7] Per client engine busyness

2021-05-18 Thread Tvrtko Ursulin



On 18/05/2021 10:16, Daniel Stone wrote:

Hi,

On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
 wrote:

I was just wondering if stat(2) and a chrdev major check would be a
solid criteria to more efficiently (compared to parsing the text
content) detect drm files while walking procfs.


Maybe I'm missing something, but is the per-PID walk actually a
measurable performance issue rather than just a bit unpleasant?


Per pid and per each open fd.

As said in the other thread what bothers me a bit in this scheme is that 
the cost of obtaining GPU usage scales based on non-GPU criteria.


For use case of a top-like tool which shows all processes this is a 
smaller additional cost, but then for a gpu-top like tool it is somewhat 
higher.


Regards,

Tvrtko


[V2][PATCH 0/2] drm: xlnx: add some functions

2021-05-18 Thread quanyang . wang
From: Quanyang Wang 

Hi all,

The patch "drm: xlnx: add is_layer_vid() to simplify the code" simplifies
the code which judges the layer type.

The patch "drm: xlnx: consolidate the functions which programming 
AUDIO_VIDEO_SELECT register"
consolidates the code that configures vid/gfx/audio to output different
modes (live/mem/disable/tpg) into one function,
"zynqmp_disp_avbuf_output_select".

Changelogs:

 V1 ---> V2:
 - As per Paul's comments, add "const" to the argument "layer" of the
 function is_layer_vid, and just return the result of "==" operator, and
 add Acked-by from Paul. 
 - As per Paul's comments, fix some pattern errors and use FIELD_PREP()
 macro instead of *_SHIFT and use GENMASK/BIT to create *_MASK macros.

Thanks,
Quanyang


Quanyang Wang (2):
  drm: xlnx: add is_layer_vid() to simplify the code
  drm: xlnx: consolidate the functions which programming
AUDIO_VIDEO_SELECT register

 drivers/gpu/drm/xlnx/zynqmp_disp.c  | 193 ++--
 drivers/gpu/drm/xlnx/zynqmp_disp_regs.h |  23 +--
 2 files changed, 121 insertions(+), 95 deletions(-)

-- 
2.25.1



[V2][PATCH 1/2] drm: xlnx: add is_layer_vid() to simplify the code

2021-05-18 Thread quanyang . wang
From: Quanyang Wang 

Add a new function is_layer_vid() to simplify the code that
judges if a layer is the video layer.

Acked-by: Paul Cercueil 
Signed-off-by: Quanyang Wang 
---
 drivers/gpu/drm/xlnx/zynqmp_disp.c | 39 +-
 1 file changed, 22 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/xlnx/zynqmp_disp.c 
b/drivers/gpu/drm/xlnx/zynqmp_disp.c
index 109d627968ac..eefb278e24c6 100644
--- a/drivers/gpu/drm/xlnx/zynqmp_disp.c
+++ b/drivers/gpu/drm/xlnx/zynqmp_disp.c
@@ -434,30 +434,35 @@ static void zynqmp_disp_avbuf_write(struct 
zynqmp_disp_avbuf *avbuf,
writel(val, avbuf->base + reg);
 }
 
+static bool is_layer_vid(const struct zynqmp_disp_layer *layer)
+{
+   return layer->id == ZYNQMP_DISP_LAYER_VID;
+}
+
 /**
  * zynqmp_disp_avbuf_set_format - Set the input format for a layer
  * @avbuf: Audio/video buffer manager
- * @layer: The layer ID
+ * @layer: The layer
  * @fmt: The format information
  *
  * Set the video buffer manager format for @layer to @fmt.
  */
 static void zynqmp_disp_avbuf_set_format(struct zynqmp_disp_avbuf *avbuf,
-enum zynqmp_disp_layer_id layer,
+struct zynqmp_disp_layer *layer,
 const struct zynqmp_disp_format *fmt)
 {
unsigned int i;
u32 val;
 
val = zynqmp_disp_avbuf_read(avbuf, ZYNQMP_DISP_AV_BUF_FMT);
-   val &= layer == ZYNQMP_DISP_LAYER_VID
+   val &= is_layer_vid(layer)
? ~ZYNQMP_DISP_AV_BUF_FMT_NL_VID_MASK
: ~ZYNQMP_DISP_AV_BUF_FMT_NL_GFX_MASK;
val |= fmt->buf_fmt;
zynqmp_disp_avbuf_write(avbuf, ZYNQMP_DISP_AV_BUF_FMT, val);
 
for (i = 0; i < ZYNQMP_DISP_AV_BUF_NUM_SF; i++) {
-   unsigned int reg = layer == ZYNQMP_DISP_LAYER_VID
+   unsigned int reg = is_layer_vid(layer)
 ? ZYNQMP_DISP_AV_BUF_VID_COMP_SF(i)
 : ZYNQMP_DISP_AV_BUF_GFX_COMP_SF(i);
 
@@ -573,19 +578,19 @@ static void zynqmp_disp_avbuf_disable_audio(struct 
zynqmp_disp_avbuf *avbuf)
 /**
  * zynqmp_disp_avbuf_enable_video - Enable a video layer
  * @avbuf: Audio/video buffer manager
- * @layer: The layer ID
+ * @layer: The layer
  * @mode: Operating mode of layer
  *
  * Enable the video/graphics buffer for @layer.
  */
 static void zynqmp_disp_avbuf_enable_video(struct zynqmp_disp_avbuf *avbuf,
-  enum zynqmp_disp_layer_id layer,
+  struct zynqmp_disp_layer *layer,
   enum zynqmp_disp_layer_mode mode)
 {
u32 val;
 
val = zynqmp_disp_avbuf_read(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT);
-   if (layer == ZYNQMP_DISP_LAYER_VID) {
+   if (is_layer_vid(layer)) {
val &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_VID1_MASK;
if (mode == ZYNQMP_DISP_LAYER_NONLIVE)
val |= ZYNQMP_DISP_AV_BUF_OUTPUT_VID1_MEM;
@@ -605,17 +610,17 @@ static void zynqmp_disp_avbuf_enable_video(struct 
zynqmp_disp_avbuf *avbuf,
 /**
  * zynqmp_disp_avbuf_disable_video - Disable a video layer
  * @avbuf: Audio/video buffer manager
- * @layer: The layer ID
+ * @layer: The layer
  *
  * Disable the video/graphics buffer for @layer.
  */
 static void zynqmp_disp_avbuf_disable_video(struct zynqmp_disp_avbuf *avbuf,
-   enum zynqmp_disp_layer_id layer)
+   struct zynqmp_disp_layer *layer)
 {
u32 val;
 
val = zynqmp_disp_avbuf_read(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT);
-   if (layer == ZYNQMP_DISP_LAYER_VID) {
+   if (is_layer_vid(layer)) {
val &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_VID1_MASK;
val |= ZYNQMP_DISP_AV_BUF_OUTPUT_VID1_NONE;
} else {
@@ -807,7 +812,7 @@ static void zynqmp_disp_blend_layer_set_csc(struct 
zynqmp_disp_blend *blend,
}
}
 
-   if (layer->id == ZYNQMP_DISP_LAYER_VID)
+   if (is_layer_vid(layer))
reg = ZYNQMP_DISP_V_BLEND_IN1CSC_COEFF(0);
else
reg = ZYNQMP_DISP_V_BLEND_IN2CSC_COEFF(0);
@@ -818,7 +823,7 @@ static void zynqmp_disp_blend_layer_set_csc(struct 
zynqmp_disp_blend *blend,
zynqmp_disp_blend_write(blend, reg + 8, coeffs[i + swap[2]]);
}
 
-   if (layer->id == ZYNQMP_DISP_LAYER_VID)
+   if (is_layer_vid(layer))
reg = ZYNQMP_DISP_V_BLEND_IN1CSC_OFFSET(0);
else
reg = ZYNQMP_DISP_V_BLEND_IN2CSC_OFFSET(0);
@@ -1025,7 +1030,7 @@ zynqmp_disp_layer_find_format(struct zynqmp_disp_layer 
*layer,
  */
 static void zynqmp_disp_layer_enable(struct zynqmp_disp_layer *layer)
 {
-   zynqmp_disp_avbuf_enable_video(&layer->disp->avbuf, layer->id,
+   zynqmp_disp_avbuf_enable_video(&layer->disp->avbuf, layer,
   ZY

[V2][PATCH 2/2] drm: xlnx: consolidate the functions which programming AUDIO_VIDEO_SELECT register

2021-05-18 Thread quanyang . wang
From: Quanyang Wang 

For now, the functions zynqmp_disp_avbuf_enable/disable_audio and
zynqmp_disp_avbuf_enable/disable_video all program the register
AV_BUF_OUTPUT_AUDIO_VIDEO_SELECT to select the output for audio or video.
In the future, many DRM properties (like video_tpg, audio_tpg,
audio_pl, etc.) will also need to access it. So let's introduce some
enum types and consolidate the code to unify handling of this register.

Signed-off-by: Quanyang Wang 
---
 drivers/gpu/drm/xlnx/zynqmp_disp.c  | 168 ++--
 drivers/gpu/drm/xlnx/zynqmp_disp_regs.h |  23 +---
 2 files changed, 106 insertions(+), 85 deletions(-)

diff --git a/drivers/gpu/drm/xlnx/zynqmp_disp.c 
b/drivers/gpu/drm/xlnx/zynqmp_disp.c
index eefb278e24c6..3672d2f5665b 100644
--- a/drivers/gpu/drm/xlnx/zynqmp_disp.c
+++ b/drivers/gpu/drm/xlnx/zynqmp_disp.c
@@ -102,12 +102,39 @@ enum zynqmp_disp_layer_id {
 
 /**
  * enum zynqmp_disp_layer_mode - Layer mode
- * @ZYNQMP_DISP_LAYER_NONLIVE: non-live (memory) mode
+ * @ZYNQMP_DISP_LAYER_MEM: memory mode
  * @ZYNQMP_DISP_LAYER_LIVE: live (stream) mode
+ * @ZYNQMP_DISP_LAYER_TPG: tpg mode (only for video layer)
+ * @ZYNQMP_DISP_LAYER_DISABLE: disable mode
  */
 enum zynqmp_disp_layer_mode {
-   ZYNQMP_DISP_LAYER_NONLIVE,
-   ZYNQMP_DISP_LAYER_LIVE
+   ZYNQMP_DISP_LAYER_MEM,
+   ZYNQMP_DISP_LAYER_LIVE,
+   ZYNQMP_DISP_LAYER_TPG,
+   ZYNQMP_DISP_LAYER_DISABLE
+};
+
+enum avbuf_vid_mode {
+   VID_MODE_LIVE,
+   VID_MODE_MEM,
+   VID_MODE_TPG,
+   VID_MODE_NONE
+};
+
+enum avbuf_gfx_mode {
+   GFX_MODE_DISABLE,
+   GFX_MODE_MEM,
+   GFX_MODE_LIVE,
+   GFX_MODE_NONE
+};
+
+enum avbuf_aud_mode {
+   AUD1_MODE_LIVE,
+   AUD1_MODE_MEM,
+   AUD1_MODE_TPG,
+   AUD1_MODE_DISABLE,
+   AUD2_MODE_DISABLE,
+   AUD2_MODE_ENABLE
 };
 
 /**
@@ -542,92 +569,102 @@ static void zynqmp_disp_avbuf_disable_channels(struct 
zynqmp_disp_avbuf *avbuf)
 }
 
 /**
- * zynqmp_disp_avbuf_enable_audio - Enable audio
+ * zynqmp_disp_avbuf_output_select - Select the buffer manager outputs
  * @avbuf: Audio/video buffer manager
+ * @layer: The layer
+ * @mode: The mode for this layer
  *
- * Enable all audio buffers with a non-live (memory) source.
+ * Select the buffer manager outputs for @layer.
  */
-static void zynqmp_disp_avbuf_enable_audio(struct zynqmp_disp_avbuf *avbuf)
+static void zynqmp_disp_avbuf_output_select(struct zynqmp_disp_avbuf *avbuf,
+  struct zynqmp_disp_layer *layer,
+  u32 mode)
 {
-   u32 val;
+   u32 reg;
 
-   val = zynqmp_disp_avbuf_read(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT);
-   val &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_MASK;
-   val |= ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_MEM;
-   val |= ZYNQMP_DISP_AV_BUF_OUTPUT_AUD2_EN;
-   zynqmp_disp_avbuf_write(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT, val);
+   reg = zynqmp_disp_avbuf_read(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT);
+
+   /* Select audio mode when the layer is NULL */
+   if (layer == NULL) {
+   if (mode >= AUD2_MODE_DISABLE) {
+   reg &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_AUD2_MASK;
+   reg |= FIELD_PREP(ZYNQMP_DISP_AV_BUF_OUTPUT_AUD2_MASK,
+   (mode - AUD2_MODE_DISABLE));
+   } else {
+   reg &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_MASK;
+   reg |= FIELD_PREP(ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_MASK, 
mode);
+   }
+   } else if (is_layer_vid(layer)) {
+   reg &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_VID1_MASK;
+   reg |= FIELD_PREP(ZYNQMP_DISP_AV_BUF_OUTPUT_VID1_MASK, mode);
+   } else {
+   reg &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_VID2_MASK;
+   reg |= FIELD_PREP(ZYNQMP_DISP_AV_BUF_OUTPUT_VID2_MASK, mode);
+   }
+
+   zynqmp_disp_avbuf_write(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT, reg);
 }
 
 /**
- * zynqmp_disp_avbuf_disable_audio - Disable audio
+ * zynqmp_disp_avbuf_enable_audio - Enable audio
  * @avbuf: Audio/video buffer manager
  *
- * Disable all audio buffers.
+ * Enable all audio buffers.
  */
-static void zynqmp_disp_avbuf_disable_audio(struct zynqmp_disp_avbuf *avbuf)
+static void zynqmp_disp_avbuf_enable_audio(struct zynqmp_disp_avbuf *avbuf)
 {
-   u32 val;
-
-   val = zynqmp_disp_avbuf_read(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT);
-   val &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_MASK;
-   val |= ZYNQMP_DISP_AV_BUF_OUTPUT_AUD1_DISABLE;
-   val &= ~ZYNQMP_DISP_AV_BUF_OUTPUT_AUD2_EN;
-   zynqmp_disp_avbuf_write(avbuf, ZYNQMP_DISP_AV_BUF_OUTPUT, val);
+   zynqmp_disp_avbuf_output_select(avbuf, NULL, AUD1_MODE_MEM);
+   zynqmp_disp_avbuf_output_select(avbuf, NULL, AUD2_MODE_ENABLE);
 }
 
 /**
- * zynqmp_disp_avbuf_enable_video - Enable a video layer
+ * zynqmp_disp_avbuf_disable_audio - Disable audio
  * @avbuf: Audio/video buffer manager
- * @layer: Th

Re: [PATCH 1/3] drm/virtio: Fixes a potential NULL pointer dereference on probe failure

2021-05-18 Thread Gerd Hoffmann
On Mon, May 17, 2021 at 04:49:11PM +0800, Xie Yongji wrote:
> The dev->dev_private might not be allocated if virtio_gpu_pci_quirk()
> or virtio_gpu_init() failed. In this case, we should avoid the cleanup
> in virtio_gpu_release().

Pushed all three to drm-misc-next.

thanks,
  Gerd



[PATCH v16 2/4] dt-bindings: msm: dsi: add yaml schemas for DSI bindings

2021-05-18 Thread Krishna Manikandan
Add YAML schema for the device tree bindings for DSI.

Signed-off-by: Krishna Manikandan 

Changes in v1:
- Separate dsi controller bindings to a separate patch (Stephen Boyd)
- Merge dsi-common-controller.yaml and dsi-controller-main.yaml to
  a single file (Stephen Boyd)
- Drop supply entries and definitions from properties (Stephen Boyd)
- Modify phy-names property for dsi controller (Stephen Boyd)
- Remove boolean from description (Stephen Boyd)
- Drop pinctrl properties as they are standard entries (Stephen Boyd)
- Modify the description for ports property and keep the reference
  to the generic binding where this is defined (Stephen Boyd)
- Add description to clock names (Stephen Boyd)
- Correct the indentation (Stephen Boyd)
- Drop the label for display dt nodes and correct the node
  name (Stephen Boyd)

Changes in v2:
- Drop maxItems for clock (Stephen Boyd)
- Drop qcom,mdss-mdp-transfer-time-us as it is not used in upstream
  dt file (Stephen Boyd)
- Keep child node directly under soc node (Stephen Boyd)
- Drop qcom,sync-dual-dsi as it is not used in upstream dt

Changes in v3:
- Add description for register property (Stephen Boyd)

Changes in v4:
- Add maxItems for phys property (Stephen Boyd)
- Add maxItems for reg property (Stephen Boyd)
- Add reference for data-lanes property (Stephen Boyd)
- Remove soc from example (Stephen Boyd)

Changes in v5:
- Modify title and description (Stephen Boyd)
- Add required properties for ports node (Stephen Boyd)
- Add data-lanes in the example (Stephen Boyd)
- Drop qcom,master-dsi property (Stephen Boyd)

Changes in v6:
- Add required properties for port@0, port@1 and corresponding
  endpoints (Stephen Boyd)
- Add address-cells and size-cells for ports (Stephen Boyd)
- Use additionalProperties instead of unevaluatedProperties (Stephen Boyd)

Changes in v7:
- Add reference for ports and data-lanes (Rob Herring)
- Add maxItems and minItems for data-lanes (Rob Herring)
---
 .../bindings/display/msm/dsi-controller-main.yaml  | 209 +
 .../devicetree/bindings/display/msm/dsi.txt| 249 -
 2 files changed, 209 insertions(+), 249 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/display/msm/dsi-controller-main.yaml
 delete mode 100644 Documentation/devicetree/bindings/display/msm/dsi.txt

diff --git 
a/Documentation/devicetree/bindings/display/msm/dsi-controller-main.yaml 
b/Documentation/devicetree/bindings/display/msm/dsi-controller-main.yaml
new file mode 100644
index 000..80f5218
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/msm/dsi-controller-main.yaml
@@ -0,0 +1,209 @@
+# SPDX-License-Identifier: GPL-2.0-only or BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/msm/dsi-controller-main.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm Display DSI controller
+
+maintainers:
+  - Krishna Manikandan 
+
+allOf:
+  - $ref: "../dsi-controller.yaml#"
+
+properties:
+  compatible:
+items:
+  - const: qcom,mdss-dsi-ctrl
+
+  reg:
+maxItems: 1
+
+  reg-names:
+const: dsi_ctrl
+
+  interrupts:
+maxItems: 1
+
+  clocks:
+items:
+  - description: Display byte clock
+  - description: Display byte interface clock
+  - description: Display pixel clock
+  - description: Display escape clock
+  - description: Display AHB clock
+  - description: Display AXI clock
+
+  clock-names:
+items:
+  - const: byte
+  - const: byte_intf
+  - const: pixel
+  - const: core
+  - const: iface
+  - const: bus
+
+  phys:
+maxItems: 1
+
+  phy-names:
+const: dsi
+
+  "#address-cells": true
+
+  "#size-cells": true
+
+  syscon-sfpb:
+description: A phandle to mmss_sfpb syscon node (only for DSIv2).
+$ref: "/schemas/types.yaml#/definitions/phandle"
+
+  qcom,dual-dsi-mode:
+type: boolean
+description: |
+  Indicates if the DSI controller is driving a panel which needs
+  2 DSI links.
+
+  power-domains:
+maxItems: 1
+
+  operating-points-v2: true
+
+  ports:
+$ref: "/schemas/graph.yaml#/properties/ports"
+description: |
+  Contains DSI controller input and output ports as children, each
+  containing one endpoint subnode.
+
+properties:
+  port@0:
+$ref: "/schemas/graph.yaml#/properties/port"
+description: |
+  Input endpoints of the controller.
+
+properties:
+  reg:
+const: 0
+
+  endpoint:
+type: object
+properties:
+  remote-endpoint: true
+  data-lanes:
+$ref: "../../media/video-interfaces.yaml"
+description: |
+  This describes how the physical DSI data lanes are mapped
+  to the logical lanes on the given platform. The value 
con

[PATCH v16 4/4] dt-bindings: msm/dp: Add bindings of MSM DisplayPort controller

2021-05-18 Thread Krishna Manikandan
Add bindings for Snapdragon DisplayPort controller driver.

Signed-off-by: Chandan Uddaraju 
Signed-off-by: Vara Reddy 
Signed-off-by: Tanmay Shah 
Signed-off-by: Kuogee Hsieh 
Signed-off-by: Krishna Manikandan 

Changes in V2:
-Provide details about sel-gpio

Changes in V4:
-Provide details about max dp lanes
-Change the commit text

Changes in V5:
-moved dp.txt to yaml file

Changes in v6:
- Squash all AUX LUT properties into one pattern Property
- Make aux-cfg[0-9]-settings properties optional
- Remove PLL/PHY bindings from DP controller dts
- Add DP clocks description
- Remove _clk suffix from clock names
- Rename pixel clock to stream_pixel
- Remove redundant bindings (GPIO, PHY, HDCP clock, etc..)
- Fix indentation
- Add Display Port as interface of DPU in DPU bindings
  and add port mapping accordingly.

Chages in v7:
- Add dp-controller.yaml file common between multiple SOC
- Rename dp-sc7180.yaml to dp-controller-sc7180.yaml
- change compatible string and add SOC name to it.
- Remove Root clock generator for pixel clock
- Add assigned-clocks and assigned-clock-parents bindings
- Remove redundant properties, descriptions and blank lines
- Add DP port in DPU bindings
- Update depends-on tag in commit message and rebase change accordingly

Changes in v8:
- Add MDSS AHB clock in bindings

Changes in v9:
- Remove redundant reg-name property
- Change assigned-clocks and assigned-clocks-parents counts to 2
- Use IRQ flags in example dts

Changes in v10:
- Change title of this patch as it does not contain PLL bindings anymore
- Remove redundant properties
- Remove use of IRQ flag
- Fix ports property

Changes in v11:
- add ports required of both #address-cells and  #size-cells
- add required operating-points-v2
- add required #sound-dai-cells
- add required power-domains
- update maintainer list

Changes in v12:
- remove soc node from examples (Stephen Boyd)
- split dpu-sc7180.yaml changes to separate patch (Stephen Boyd)

Changes in v13:
- add assigned-clocks
- add assigned-clock-parents

Changes in v14:
- add reference for ports (Rob Herring)
---
 .../bindings/display/msm/dp-controller.yaml| 159 +
 1 file changed, 159 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/display/msm/dp-controller.yaml

diff --git a/Documentation/devicetree/bindings/display/msm/dp-controller.yaml 
b/Documentation/devicetree/bindings/display/msm/dp-controller.yaml
new file mode 100644
index 000..bcce567
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/msm/dp-controller.yaml
@@ -0,0 +1,159 @@
+# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/msm/dp-controller.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: MSM Display Port Controller
+
+maintainers:
+  - Kuogee Hsieh 
+
+description: |
+  Device tree bindings for DisplayPort host controller for MSM targets
+  that are compatible with VESA DisplayPort interface specification.
+
+properties:
+  compatible:
+enum:
+  - qcom,sc7180-dp
+
+  reg:
+maxItems: 1
+
+  interrupts:
+maxItems: 1
+
+  clocks:
+items:
+  - description: AHB clock to enable register access
+  - description: Display Port AUX clock
+  - description: Display Port Link clock
+  - description: Link interface clock between DP and PHY
+  - description: Display Port Pixel clock
+
+  clock-names:
+items:
+  - const: core_iface
+  - const: core_aux
+  - const: ctrl_link
+  - const: ctrl_link_iface
+  - const: stream_pixel
+
+  assigned-clocks:
+items:
+  - description: link clock source
+  - description: pixel clock source
+
+  assigned-clock-parents:
+items:
+  - description: phy 0 parent
+  - description: phy 1 parent
+
+  phys:
+maxItems: 1
+
+  phy-names:
+items:
+  - const: dp
+
+  operating-points-v2:
+maxItems: 1
+
+  power-domains:
+maxItems: 1
+
+  "#sound-dai-cells":
+const: 0
+
+  ports:
+$ref: /schemas/graph.yaml#/properties/ports
+properties:
+  "#address-cells":
+const: 1
+
+  "#size-cells":
+const: 0
+
+  port@0:
+$ref: /schemas/graph.yaml#/properties/port
+description: Input endpoint of the controller
+
+  port@1:
+$ref: /schemas/graph.yaml#/properties/port
+description: Output endpoint of the controller
+
+required:
+  - "#address-cells"
+  - "#size-cells"
+
+additionalProperties: false
+
+
+required:
+  - compatible
+  - reg
+  - interrupts
+  - clocks
+  - clock-names
+  - phys
+  - phy-names
+  - "#sound-dai-cells"
+  - power-domains
+  - ports
+
+additionalProperties: false
+
+examples:
+  - |
+#include 
+#include 
+#include 
+#include 
+
+displayport-controller@ae9 {
+compatible = "qcom,sc7180-dp";
+reg = <0xae9 0x1400>;
+interrupt-parent = <&mdss>;
+interrupts = <12>;
+clocks = <&

[PATCH v16 3/4] dt-bindings: msm: dsi: add yaml schemas for DSI PHY bindings

2021-05-18 Thread Krishna Manikandan
Add YAML schema for the device tree bindings for DSI PHY.

Signed-off-by: Krishna Manikandan 

Changes in v1:
   - Merge dsi-phy.yaml and dsi-phy-10nm.yaml (Stephen Boyd)
   - Remove qcom,dsi-phy-regulator-ldo-mode (Stephen Boyd)
   - Add clock cells properly (Stephen Boyd)
   - Remove unnecessary decription from clock names (Stephen Boyd)
   - Add pin names for the supply entries for 10nm phy which is
 used in sc7180 and sdm845 (Stephen Boyd)
   - Remove unused header files from examples (Stephen Boyd)
   - Drop labels for display nodes and correct node name (Stephen Boyd)

Changes in v2:
   - Drop maxItems for clock (Stephen Boyd)
   - Add vdds supply pin information for sdm845 (Stephen Boyd)
   - Add examples for 14nm, 20nm and 28nm phy yaml files (Stephen Boyd)
   - Keep child nodes directly under soc node (Stephen Boyd)

Changes in v3:
   - Use a separate yaml file to describe the common properties
 for all the dsi phy versions (Stephen Boyd)
   - Remove soc from examples (Stephen Boyd)
   - Add description for register property

Changes in v4:
   - Modify the title for all the phy versions (Stephen Boyd)
   - Drop description for all the phy versions (Stephen Boyd)
   - Modify the description for register property (Stephen Boyd)

Changes in v5:
   - Remove unused properties from common dsi phy file
   - Add clock-cells and phy-cells to required property
 list (Stephen Boyd)

Changes in v6:
   - Add proper compatible string in example
---
 .../bindings/display/msm/dsi-phy-10nm.yaml | 68 +
 .../bindings/display/msm/dsi-phy-14nm.yaml | 66 
 .../bindings/display/msm/dsi-phy-20nm.yaml | 71 ++
 .../bindings/display/msm/dsi-phy-28nm.yaml | 68 +
 .../bindings/display/msm/dsi-phy-common.yaml   | 40 
 5 files changed, 313 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/display/msm/dsi-phy-10nm.yaml
 create mode 100644 
Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml
 create mode 100644 
Documentation/devicetree/bindings/display/msm/dsi-phy-20nm.yaml
 create mode 100644 
Documentation/devicetree/bindings/display/msm/dsi-phy-28nm.yaml
 create mode 100644 
Documentation/devicetree/bindings/display/msm/dsi-phy-common.yaml

diff --git a/Documentation/devicetree/bindings/display/msm/dsi-phy-10nm.yaml 
b/Documentation/devicetree/bindings/display/msm/dsi-phy-10nm.yaml
new file mode 100644
index 000..4a26bef
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/msm/dsi-phy-10nm.yaml
@@ -0,0 +1,68 @@
+# SPDX-License-Identifier: GPL-2.0-only or BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/msm/dsi-phy-10nm.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm Display DSI 10nm PHY
+
+maintainers:
+  - Krishna Manikandan 
+
+allOf:
+  - $ref: dsi-phy-common.yaml#
+
+properties:
+  compatible:
+oneOf:
+  - const: qcom,dsi-phy-10nm
+  - const: qcom,dsi-phy-10nm-8998
+
+  reg:
+items:
+  - description: dsi phy register set
+  - description: dsi phy lane register set
+  - description: dsi pll register set
+
+  reg-names:
+items:
+  - const: dsi_phy
+  - const: dsi_phy_lane
+  - const: dsi_pll
+
+  vdds-supply:
+description: |
+  Connected to DSI0_MIPI_DSI_PLL_VDDA0P9 pin for sc7180 target and
+  connected to VDDA_MIPI_DSI_0_PLL_0P9 pin for sdm845 target
+
+required:
+  - compatible
+  - reg
+  - reg-names
+  - vdds-supply
+
+unevaluatedProperties: false
+
+examples:
+  - |
+ #include 
+ #include 
+
+ dsi-phy@ae94400 {
+ compatible = "qcom,dsi-phy-10nm";
+ reg = <0x0ae94400 0x200>,
+   <0x0ae94600 0x280>,
+   <0x0ae94a00 0x1e0>;
+ reg-names = "dsi_phy",
+ "dsi_phy_lane",
+ "dsi_pll";
+
+ #clock-cells = <1>;
+ #phy-cells = <0>;
+
+ vdds-supply = <&vdda_mipi_dsi0_pll>;
+ clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>,
+  <&rpmhcc RPMH_CXO_CLK>;
+ clock-names = "iface", "ref";
+ };
+...
diff --git a/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml 
b/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml
new file mode 100644
index 000..72a00cc
--- /dev/null
+++ b/Documentation/devicetree/bindings/display/msm/dsi-phy-14nm.yaml
@@ -0,0 +1,66 @@
+# SPDX-License-Identifier: GPL-2.0-only or BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/display/msm/dsi-phy-14nm.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Qualcomm Display DSI 14nm PHY
+
+maintainers:
+  - Krishna Manikandan 
+
+allOf:
+  - $ref: dsi-phy-common.yaml#
+
+properties:
+  compatible:
+oneOf:
+  - const: qcom,dsi-phy-14nm
+  - const: qcom,dsi-phy-14nm-660
+
+  reg:
+items:
+  - description: dsi phy register set
+  - description: dsi phy

[PATCH v16 1/4] dt-bindings: msm: disp: add yaml schemas for DPU bindings

2021-05-18 Thread Krishna Manikandan
MSM Mobile Display Subsystem (MDSS) encapsulates sub-blocks
like the DPU display controller, DSI, etc. Add YAML schema
for DPU device tree bindings.

Signed-off-by: Krishna Manikandan 

Changes in v2:
- Changed dpu to DPU (Sam Ravnborg)
- Fixed indentation issues (Sam Ravnborg)
- Added empty line between different properties (Sam Ravnborg)
- Replaced reference txt files with  their corresponding
  yaml files (Sam Ravnborg)
- Modified the file to use "|" only when it is
  necessary (Sam Ravnborg)

Changes in v3:
- Corrected the license used (Rob Herring)
- Added maxItems for properties (Rob Herring)
- Dropped generic descriptions (Rob Herring)
- Added ranges property (Rob Herring)
- Corrected the indentation (Rob Herring)
- Added additionalProperties (Rob Herring)
- Split dsi file into two, one for dsi controller
  and another one for dsi phy per target (Rob Herring)
- Corrected description for pinctrl-names (Rob Herring)
- Corrected the examples used in yaml file (Rob Herring)
- Delete dsi.txt and dpu.txt (Rob Herring)

Changes in v4:
- Move schema up by one level (Rob Herring)
- Add patternProperties for mdp node (Rob Herring)
- Corrected description of some properties (Rob Herring)

Changes in v5:
- Correct the indentation (Rob Herring)
- Remove unnecessary description from properties (Rob Herring)
- Correct the number of interconnect entries (Rob Herring)
- Add interconnect names for sc7180 (Rob Herring)
- Add description for ports (Rob Herring)
- Remove common properties (Rob Herring)
- Add unevaluatedProperties (Rob Herring)
- Reference existing dsi controller yaml in the common
  dsi controller file (Rob Herring)
- Correct the description of clock names to include only the
  clocks that are required (Rob Herring)
- Remove properties which are already covered under the common
  binding (Rob Herring)
- Add dsi phy supply nodes which are required for sc7180 and
  sdm845 targets (Rob Herring)
- Add type ref for syscon-sfpb (Rob Herring)

Changes in v6:
- Fixed errors during dt_binding_check (Rob Herring)
- Add maxItems for phys and phys-names (Rob Herring)
- Use unevaluatedProperties wherever required (Rob Herring)
- Removed interrupt controller from required properties for
  dsi controller (Rob Herring)
- Add constraints for dsi-phy reg-names based on the compatible
  phy version (Rob Herring)
- Add constraints for dsi-phy supply nodes based on the
  compatible phy version (Rob Herring)

Changes in v7:
- Add default value for qcom,mdss-mdp-transfer-time-us (Rob Herring)
- Modify the schema for data-lanes (Rob Herring)
- Split the phy schema into separate schemas based on
  the phy version (Rob Herring)

Changes in v8:
- Resolve merge conflicts with latest dsi.txt file
- Include dp yaml change also in the same series

Changes in v9:
- Combine target specific dsi controller yaml files
  to a single yaml file (Rob Herring)
- Combine target specific dsi phy yaml files into a
  single yaml file (Rob Herring)
- Use unevaluatedProperties and additionalProperties
  wherever required
- Remove duplicate properties from common yaml files

Changes in v10:
- Split the patch into separate patches for DPU, DSI and
  PHY (Stephen Boyd)
- Drop unnecessary fullstop (Stephen Boyd)
- Add newline wherever required (Stephen Boyd)
- Add description for clock used (Stephen Boyd)
- Modify the description for interconnect entries  (Stephen Boyd)
- Drop assigned clock entries as it a generic property (Stephen Boyd)
- Correct the definition for interrupts (Stephen Boyd)
- Drop clock names from required properties (Stephen Boyd)
- Drop labels for display nodes from example (Stephen Boyd)
- Drop flags from interrupts entries (Stephen Boyd)

Changes in v11:
- Drop maxItems for clocks (Stephen Boyd)

Changes in v12:
- Add description for register property (Stephen Boyd)
- Add maxItems for interrupts (Stephen Boyd)
- Add description for iommus property (Stephen Boyd)
- Add description for interconnects (Stephen Boyd)
- Change display node name to display_controller (Stephen Boyd)

Changes in v13:
- Add maxItems for reg property (Stephen Boyd)
- Add ranges property in example (Stephen Boyd)
- Modify description for iommus property (Stephen Boyd)
- Add Dp bindings for ports in the same patch (Stephen Boyd)
- Remove soc from examples and change address and size cells
  accordingly (Stephen Boyd)
- Add reference for ports

Changes in v14:
- Modify title for SC7180 and SDM845 yaml files (Stephen Boyd)
- Add required list for display-controller node (Stephen Boyd)

Changes in v16:
- Add reference for port (Rob Herring)
- Make additionalProperties as false (Rob Herring)
---
 .../bindings/display/msm/dpu-

Re: [RFC] Add DMA_RESV_USAGE flags

2021-05-18 Thread Daniel Vetter
On Tue, May 18, 2021 at 7:59 AM Daniel Vetter  wrote:
>
> On Tue, May 18, 2021 at 12:49 AM Jason Ekstrand  wrote:
> >
> > On Mon, May 17, 2021 at 3:15 PM Daniel Vetter  wrote:
> > >
> > > On Mon, May 17, 2021 at 9:38 PM Christian König
> > >  wrote:
> > > >
> > > > Am 17.05.21 um 17:04 schrieb Daniel Vetter:
> > > > > On Mon, May 17, 2021 at 04:11:18PM +0200, Christian König wrote:
> > > > >> We had a long outstanding problem in amdgpu that buffers exported to
> > > > >> user drivers by DMA-buf serialize all command submissions using them.
> > > > >>
> > > > >> In other words we can't compose the buffer with different engines and
> > > > >> then send it to another driver for display further processing.
> > > > >>
> > > > >> This was added to work around the fact that i915 didn't want to 
> > > > >> wait
> > > > >> for shared fences in the dma_resv objects before displaying a buffer.
> > > > >>
> > > > >> Since this problem is now causing issues with Vulkan we need to find 
> > > > >> a
> > > > >> better solution for that.
> > > > >>
> > > > >> The patch set here tries to do this by adding a usage flag to the
> > > > >> shared fences noting when and how they should participate in implicit
> > > > >> synchronization.
> > > > > So the way this is fixed in every other vulkan driver is that vulkan
> > > > > userspace sets flags in the CS ioctl when it wants to synchronize with
> > > > > implicit sync. This gets you mostly there. Last time I checked amdgpu
> > > > > isn't doing this, and yes that's broken.
> > > >
> > > > And exactly that is a really bad approach as far as I can see. The
> > > > Vulkan stack on top simply doesn't know when to set this flag during CS.
> > >
> > > Adding Jason for the Vulkan side of things, because this isn't how I
> > > understand this works.
> > >
> > > But purely from a kernel pov your patches are sketchy for two reasons:
> > >
> > > - we reinstate the amdgpu special case of not setting exclusive fences
> > >
> > > - you only fix the single special case of i915 display, nothing else
> > >
> > > That's not how a cross driver interface works. And if you'd do this
> > > properly, you'd be back to all the same sync fun you've originally had,
> > > with all the same fallout.
> >
> > I think I'm starting to see what Christian is trying to do here and I
> > think there likely is a real genuine problem here.  I'm not convinced
> > this is 100% of a solution but there might be something real.  Let me
> > see if I can convince you or if I just make a hash of things. :-)
> >
> > The problem, once again, comes down to memory fencing vs. execution
> > fencing and the way that we've unfortunately tied them together in the
> > kernel.  With the current architecture, the only way to get proper
> > write-fence semantics for implicit sync is to take an exclusive fence
> > on the buffer.  This implies two things:
> >
> >  1. You have to implicitly wait on EVERY fence on the buffer before
> > you can start your write-fenced operation
> >
> >  2. No one else can start ANY operation which accesses that buffer
> > until you're done.
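> > A toy model of these two rules (an illustration only, not the kernel's
> > dma_resv API) might look like:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of dma_resv-style implicit sync: a buffer carries at most
 * one exclusive ("write") fence plus some number of shared ("read")
 * fences.  Names here are made up for illustration.
 */
struct toy_resv {
	bool excl_active;   /* an exclusive fence is pending */
	int  shared_count;  /* number of pending shared fences */
};

/* Rule 1: a write must wait for EVERY pending fence on the buffer. */
static bool can_start_write(const struct toy_resv *r)
{
	return !r->excl_active && r->shared_count == 0;
}

/* Rule 2: while a write is pending, nothing else may start. */
static bool can_start_read(const struct toy_resv *r)
{
	return !r->excl_active;
}
```

So even an unrelated shared fence (e.g. driver A's memory fence) stalls
driver B's write, and driver B's write then stalls everyone else.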
> >
> > Let's say that you have a buffer which is shared between two drivers A
> > and B and let's say driver A has thrown a fence on it just to ensure
> > that the BO doesn't get swapped out to disk until it's at a good
> > stopping point.  Then driver B comes along and wants to throw a
> > write-fence on it.  Suddenly, your memory fence from driver A causes
> > driver B to have to stall waiting for a "good" time to throw in a
> > fence.  It sounds like this is the sort of scenario that Christian is
> > running into.  And, yes, with certain Vulkan drivers being a bit
> > sloppy about exactly when they throw in write fences, I could see it
> > being a real problem.
>
> Yes this is a potential problem, and on the i915 side we need to do
> some shuffling here most likely. Especially due to discrete, but the
> problem is pre-existing. tbh I forgot about the implications here
> until I pondered this again yesterday evening.
>
> But afaiui the amdgpu code and winsys in mesa, this isn't (yet) the
> problem amd vk drivers have. The issue is that with amdgpu, all you
> supply are the following bits at CS time:
> - list of always mapped private buffers, which is implicit and O(1) in
> the kernel fastpath
> - additional list of shared buffers that are used by the current CS
>
> I didn't check how exactly that works wrt winsys buffer ownership, but
> the thing is that on the kernel side _any_ buffer in there is treated
> as an implicitly synced write. Which means if you render your winsys
> with a bunch of command submission split over 3d and compute pipes,
> you end up with horrendous amounts of oversync.
>
> The reason for this is that amdgpu decided to go with a different
> implicit sync model than everyone else:
> - within a drm file everything is unsynced and left to userspace to
> handle, amdgpu.ko only ever sets the shared fence slots.
> - this means the exclusive slot reall

Re: [PATCH 19/27] drm/i915/gem: Use the proto-context to handle create parameters

2021-05-18 Thread Jani Nikula
On Mon, 17 May 2021, Daniel Vetter  wrote:
> On Mon, May 17, 2021 at 7:05 PM Jason Ekstrand  wrote:
>>
>> On Mon, May 17, 2021 at 8:40 AM Daniel Vetter  wrote:
>> >
>> > On Fri, May 14, 2021 at 02:13:57PM -0500, Jason Ekstrand wrote:
>> > > I can add those.  I just don't know where to put it.  We don't have an
>> > > i915_gem_vm.h.  Suggestions?
>> >
>> > gt/intel_gtt.h seems to be the header for i915_address_space stuff. Also
>> > contains the i915_vma_ops but not i915_vma.
>> >
>> > It's a pretty good mess, but probably the best place for now for these :-/
>>
>> The one for contexts is in i915_drv.h so I put the VM one there too.
>> Feel free to tell me to move it.  I don't care where it goes.
>
> i915_drv.h is the og dumping ground and needs to die. Everything in
> there needs to be moved out/split/whatever for better code
> organization. If we have a place already that fits better (even if
> maybe misnamed) it's better to put it there.

I haven't really codified this anywhere, but this is what I've been
trying to drive:

* All functions in a .c file are declared in the corresponding .h
  file. 1:1 relationship.

* Have _types.h headers separately for defining types that lead to deep
  include chains. (We have this in part because we have absolutely
  everything in struct drm_i915_private, and everything needs everything
  else to look inside i915.)

* Minimize includes from headers. Prefer forward declarations where
  possible. Prefer specific includes over generic includes.

* Each header is self-contained (this is build-tested with
  CONFIG_DRM_I915_WERROR=y).

* Avoid static inlines unless you have a performance need.

* Don't have any externs. Interfaces over data; data is not an
  interface.

* Prefix functions in a file according to the filename. intel_foo.[ch]
  would have functions intel_foo_*(). Ditto i915_bar.[ch] and
  i915_bar_*(). (Avoid non-static platform specific functions, but if
  you have them, you'd name them e.g. skl_foo_*().)

Basically the rationale is to bring more order to the chaos we've had
for a long time. It's not so much about pedantry over naming as about
the secondary effect: making people think about where they put stuff
and how it's all grouped together.

IMO it's also easier to add file.[ch] and nuke it later than add stuff
to some of our huge files and then clean it up later.


BR,
Jani.

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [PATCH] drm/i915: only disable default vga device

2021-05-18 Thread Emil Velikov
Hi Ville,

On Mon, 17 May 2021 at 18:24, Ville Syrjälä
 wrote:
>
> On Sun, May 16, 2021 at 06:14:32PM +0100, Emil Velikov wrote:
> > From: Vivek Das Mohapatra 
> >
> > This patch is to do with seamless handover, e.g. when the sequence is
> > bootloader → plymouth → desktop.
> >
> > It switches the vga arbiter from the "other" GPU to the default one
> > (intel in this case), so the driver can issue some io().
>
> I don't understand what this commit message is trying to say.
>
A bunch of context is lost due to the patch's age, so I'm not 100% sure of
the actual hardware setup where this occurs.
Does the following make sense?

Currently on dual GPU systems, we do not get seamless handover as the
output flickers during the transition bootloader -> plymouth ->
desktop.
This happens as a result of switching (via the VGA arbiter) from the
"other" GPU back to the default i915 one and issuing io() commands.

-Emil


Re: [PATCH v2 01/15] drm/i915: Untangle the vma pages_mutex

2021-05-18 Thread Maarten Lankhorst


Hey,

This needs a small fix, otherwise looks good.

Op 18-05-2021 om 10:26 schreef Thomas Hellström:
> From: Thomas Hellström 
>
> Any sleeping dma_resv lock taken while the vma pages_mutex is held
> will cause a lockdep splat.
> Move the i915_gem_object_pin_pages() call out of the pages_mutex
> critical section.
>
> Signed-off-by: Thomas Hellström 
> ---
>  drivers/gpu/drm/i915/i915_vma.c | 33 +++--
>  1 file changed, 19 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
> index a6cd0fa62847..7b1c0f4e60d7 100644
> --- a/drivers/gpu/drm/i915/i915_vma.c
> +++ b/drivers/gpu/drm/i915/i915_vma.c
> @@ -800,32 +800,37 @@ static bool try_qad_pin(struct i915_vma *vma, unsigned 
> int flags)
>  static int vma_get_pages(struct i915_vma *vma)
>  {
>   int err = 0;
> + bool pinned_pages = false;
>  
>   if (atomic_add_unless(&vma->pages_count, 1, 0))
>   return 0;
>  
> + if (vma->obj) {
> + err = i915_gem_object_pin_pages(vma->obj);
> + if (err)
> + return err;
> + pinned_pages = true;
> + }
> +
>   /* Allocations ahoy! */
> - if (mutex_lock_interruptible(&vma->pages_mutex))
> - return -EINTR;
> + if (mutex_lock_interruptible(&vma->pages_mutex)) {
> + err = -EINTR;
> + goto unpin;
> + }
>  
>   if (!atomic_read(&vma->pages_count)) {
> - if (vma->obj) {
> - err = i915_gem_object_pin_pages(vma->obj);
> - if (err)
> - goto unlock;
> - }
> -
>   err = vma->ops->set_pages(vma);
> - if (err) {
> - if (vma->obj)
> - i915_gem_object_unpin_pages(vma->obj);
> + if (err)
>   goto unlock;
> - }
> + pinned_pages = false;
>   }
>   atomic_inc(&vma->pages_count);
>  
>  unlock:
>   mutex_unlock(&vma->pages_mutex);
> +unpin:
> + if (pinned_pages)
> + __i915_gem_object_unpin_pages(vma->obj);
>  
>   return err;
>  }
> @@ -838,10 +843,10 @@ static void __vma_put_pages(struct i915_vma *vma, 
> unsigned int count)
>   if (atomic_sub_return(count, &vma->pages_count) == 0) {
>   vma->ops->clear_pages(vma);
>   GEM_BUG_ON(vma->pages);
> - if (vma->obj)
> - i915_gem_object_unpin_pages(vma->obj);
>   }
>   mutex_unlock(&vma->pages_mutex);
> + if (vma->obj)
> + i915_gem_object_unpin_pages(vma->obj);

You're unconditionally unpinning pages here; if pages_count didn't drop to
0, we will go negative.

With that fixed:

Reviewed-by: Maarten Lankhorst 



[PATCH 2/3] drm: clarify and linkify DRM_CLIENT_CAP_WRITEBACK_CONNECTORS docs

2021-05-18 Thread Simon Ser
Make it clear that the client is responsible for enabling ATOMIC
prior to enabling WRITEBACK_CONNECTORS. Linkify the reference to
ATOMIC.

Signed-off-by: Simon Ser 
Cc: Daniel Vetter 
Cc: Daniel Stone 
Cc: Pekka Paalanen 
---
 include/uapi/drm/drm.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 1c947227f72b..87878aea4526 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -812,8 +812,8 @@ struct drm_get_cap {
  * DRM_CLIENT_CAP_WRITEBACK_CONNECTORS
  *
  * If set to 1, the DRM core will expose special connectors to be used for
- * writing back to memory the scene setup in the commit. Depends on client
- * also supporting DRM_CLIENT_CAP_ATOMIC
+ * writing back to memory the scene setup in the commit. The client must enable
+ * &DRM_CLIENT_CAP_ATOMIC first.
  */
 #define DRM_CLIENT_CAP_WRITEBACK_CONNECTORS5
 
-- 
2.31.1




[PATCH 3/3] drm: document minimum kernel version for DRM_CLIENT_CAP_*

2021-05-18 Thread Simon Ser
The kernel versions including the following commits are referenced:

DRM_CLIENT_CAP_STEREO_3D
61d8e3282541 ("drm: Add a STEREO_3D capability to the SET_CLIENT_CAP ioctl")

DRM_CLIENT_CAP_UNIVERSAL_PLANES
681e7ec73044 ("drm: Allow userspace to ask for universal plane list (v2)")
c7dbc6c9ae5c ("drm: Remove command line guard for universal planes")

DRM_CLIENT_CAP_ATOMIC
88a48e297b3a ("drm: add atomic properties")
8b72ce158cf0 ("drm: Always enable atomic API")

DRM_CLIENT_CAP_ASPECT_RATIO
7595bda2fb43 ("drm: Add DRM client cap for aspect-ratio")

DRM_CLIENT_CAP_WRITEBACK_CONNECTORS
d67b6a206507 ("drm: writeback: Add client capability for exposing writeback 
connectors")

Signed-off-by: Simon Ser 
Cc: Daniel Vetter 
Cc: Daniel Stone 
Cc: Pekka Paalanen 
---
 include/uapi/drm/drm.h | 17 +
 1 file changed, 17 insertions(+)

diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 87878aea4526..ec2b122cdcc5 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -780,6 +780,8 @@ struct drm_get_cap {
  * If set to 1, the DRM core will expose the stereo 3D capabilities of the
  * monitor by advertising the supported 3D layouts in the flags of struct
  * drm_mode_modeinfo. See ``DRM_MODE_FLAG_3D_*``.
+ *
+ * This capability is always supported starting from kernel version 3.13.
  */
 #define DRM_CLIENT_CAP_STEREO_3D   1
 
@@ -788,6 +790,9 @@ struct drm_get_cap {
  *
  * If set to 1, the DRM core will expose all planes (overlay, primary, and
  * cursor) to userspace.
+ *
+ * This capability has been introduced in kernel version 3.15. Starting from
+ * kernel version 3.17, this capability is always supported.
  */
 #define DRM_CLIENT_CAP_UNIVERSAL_PLANES  2
 
@@ -797,6 +802,13 @@ struct drm_get_cap {
  * If set to 1, the DRM core will expose atomic properties to userspace. This
  * implicitly enables &DRM_CLIENT_CAP_UNIVERSAL_PLANES and
  * &DRM_CLIENT_CAP_ASPECT_RATIO.
+ *
+ * If the driver doesn't support atomic mode-setting, enabling this capability
+ * will fail with -EOPNOTSUPP.
+ *
+ * This capability has been introduced in kernel version 4.0. Starting from
+ * kernel version 4.2, this capability is always supported for atomic-capable
+ * drivers.
  */
 #define DRM_CLIENT_CAP_ATOMIC  3
 
@@ -805,6 +817,8 @@ struct drm_get_cap {
  *
  * If set to 1, the DRM core will provide aspect ratio information in modes.
  * See ``DRM_MODE_FLAG_PIC_AR_*``.
+ *
+ * This capability is always supported starting from kernel version 4.18.
  */
 #define DRM_CLIENT_CAP_ASPECT_RATIO4
 
@@ -814,6 +828,9 @@ struct drm_get_cap {
  * If set to 1, the DRM core will expose special connectors to be used for
  * writing back to memory the scene setup in the commit. The client must enable
  * &DRM_CLIENT_CAP_ATOMIC first.
+ *
+ * This capability is always supported for atomic-capable drivers starting from
+ * kernel version 4.19.
  */
 #define DRM_CLIENT_CAP_WRITEBACK_CONNECTORS5
 
-- 
2.31.1




[PATCH 1/3] drm: reference mode flags in DRM_CLIENT_CAP_* docs

2021-05-18 Thread Simon Ser
In the docs for DRM_CLIENT_CAP_STEREO_3D and
DRM_CLIENT_CAP_ASPECT_RATIO, reference the DRM_MODE_FLAG_* defines
that get set when the cap is enabled.

Signed-off-by: Simon Ser 
Cc: Daniel Vetter 
Cc: Daniel Stone 
Cc: Pekka Paalanen 
---
 include/uapi/drm/drm.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index 67b94bc3c885..1c947227f72b 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -777,9 +777,9 @@ struct drm_get_cap {
 /**
  * DRM_CLIENT_CAP_STEREO_3D
  *
- * if set to 1, the DRM core will expose the stereo 3D capabilities of the
+ * If set to 1, the DRM core will expose the stereo 3D capabilities of the
  * monitor by advertising the supported 3D layouts in the flags of struct
- * drm_mode_modeinfo.
+ * drm_mode_modeinfo. See ``DRM_MODE_FLAG_3D_*``.
  */
 #define DRM_CLIENT_CAP_STEREO_3D   1
 
@@ -804,6 +804,7 @@ struct drm_get_cap {
  * DRM_CLIENT_CAP_ASPECT_RATIO
  *
  * If set to 1, the DRM core will provide aspect ratio information in modes.
+ * See ``DRM_MODE_FLAG_PIC_AR_*``.
  */
 #define DRM_CLIENT_CAP_ASPECT_RATIO4
 
-- 
2.31.1




Re: [PATCH] drm/i915: only disable default vga device

2021-05-18 Thread Ville Syrjälä
On Tue, May 18, 2021 at 12:09:56PM +0100, Emil Velikov wrote:
> Hi Ville,
> 
> On Mon, 17 May 2021 at 18:24, Ville Syrjälä
>  wrote:
> >
> > On Sun, May 16, 2021 at 06:14:32PM +0100, Emil Velikov wrote:
> > > From: Vivek Das Mohapatra 
> > >
> > > This patch is to do with seamless handover, e.g. when the sequence is
> > > bootloader → plymouth → desktop.
> > >
> > > It switches the vga arbiter from the "other" GPU to the default one
> > > (intel in this case), so the driver can issue some io().
> >
> > I don't understand what this commit message is trying to say.
> >
> A bunch of context is lost due to the patch's age, so I'm not 100% sure of
> the actual hardware setup where this occurs.
> Does the following make sense?
> 
> Currently on dual GPU systems, we do not get seamless handover as the
> output flickers during the transition bootloader -> plymouth ->
> desktop.
> This happens as a result of switching (via the VGA arbiter) from the
> "other" GPU back to the default i915 one and issuing io() commands.

Hmm. Does this work?

--- a/drivers/gpu/drm/i915/display/intel_vga.c
+++ b/drivers/gpu/drm/i915/display/intel_vga.c
@@ -29,6 +29,9 @@ void intel_vga_disable(struct drm_i915_private *dev_priv)
i915_reg_t vga_reg = intel_vga_cntrl_reg(dev_priv);
u8 sr1;
 
+   if (intel_de_read(dev_priv, vga_reg) & VGA_DISP_DISABLE)
+   return;
+
/* WaEnableVGAAccessThroughIOPort:ctg,elk,ilk,snb,ivb,vlv,hsw */
vga_get_uninterruptible(pdev, VGA_RSRC_LEGACY_IO);
outb(SR01, VGA_SR_INDEX);

-- 
Ville Syrjälä
Intel


Re: [PATCH v2 02/15] drm/i915: Don't free shared locks while shared

2021-05-18 Thread Maarten Lankhorst
Op 18-05-2021 om 10:26 schreef Thomas Hellström:
> We are currently sharing the VM reservation locks across a number of
> gem objects with page-table memory. Since TTM will individualize the
> reservation locks when freeing objects, including accessing the shared
> locks, make sure that the shared locks are not freed until that is done.
> For PPGTT we add an additional refcount; for GGTT we take additional
> measures to make sure objects sharing the GGTT reservation lock are
> freed at GGTT takedown.
>
> Signed-off-by: Thomas Hellström 
> ---
> v2: Try harder to make sure objects sharing the GGTT reservation lock are
> freed at GGTT takedown.
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c|  3 ++
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
>  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 19 ++--
>  drivers/gpu/drm/i915/gt/intel_gtt.c   | 45 +++
>  drivers/gpu/drm/i915/gt/intel_gtt.h   | 30 -
>  drivers/gpu/drm/i915/gt/intel_ppgtt.c |  2 +-
>  drivers/gpu/drm/i915/i915_drv.c   |  5 +++
>  7 files changed, 92 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index 28144410df86..abadf0994ad0 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -252,6 +252,9 @@ static void __i915_gem_free_objects(struct 
> drm_i915_private *i915,
>   if (obj->mm.n_placements > 1)
>   kfree(obj->mm.placements);
>  
> + if (obj->resv_shared_from)
> + i915_vm_resv_put(obj->resv_shared_from);
> +
>   /* But keep the pointer alive for RCU-protected lookups */
>   call_rcu(&obj->rcu, __i915_gem_free_object_rcu);
>   cond_resched();
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 0727d0c76aa0..450340a73186 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -149,6 +149,7 @@ struct drm_i915_gem_object {
>* when i915_gem_ww_ctx_backoff() or i915_gem_ww_ctx_fini() are called.
>*/
>   struct list_head obj_link;
> + struct dma_resv *resv_shared_from;

Since this can only be a vm object, would it make sense to make this a pointer 
to the vm address space, so we can call vm_resv_put on it directly?

The current pointer type and name makes it look generic, but if you try to use 
it with anything but an address space, it will blow up.

Otherwise looks good. I guess we cannot force all bo's to be deleted before the 
vm is freed. :-)

So with that fixed

Reviewed-by: Maarten Lankhorst 

~Maarten

>   union {
>   struct rcu_head rcu;
> diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c 
> b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> index 35069ca5d7de..10c23a749a95 100644
> --- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
> +++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
> @@ -746,7 +746,6 @@ static void ggtt_cleanup_hw(struct i915_ggtt *ggtt)
>  
>   mutex_unlock(&ggtt->vm.mutex);
>   i915_address_space_fini(&ggtt->vm);
> - dma_resv_fini(&ggtt->vm.resv);
>  
>   arch_phys_wc_del(ggtt->mtrr);
>  
> @@ -768,6 +767,19 @@ void i915_ggtt_driver_release(struct drm_i915_private 
> *i915)
>   ggtt_cleanup_hw(ggtt);
>  }
>  
> +/**
> + * i915_ggtt_driver_late_release - Cleanup of GGTT that needs to be done 
> after
> + * all free objects have been drained.
> + * @i915: i915 device
> + */
> +void i915_ggtt_driver_late_release(struct drm_i915_private *i915)
> +{
> + struct i915_ggtt *ggtt = &i915->ggtt;
> +
> + GEM_WARN_ON(kref_read(&ggtt->vm.resv_ref) != 1);
> + dma_resv_fini(&ggtt->vm._resv);
> +}
> +
>  static unsigned int gen6_get_total_gtt_size(u16 snb_gmch_ctl)
>  {
>   snb_gmch_ctl >>= SNB_GMCH_GGMS_SHIFT;
> @@ -829,6 +841,7 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 
> size)
>   return -ENOMEM;
>   }
>  
> + kref_init(&ggtt->vm.resv_ref);
>   ret = setup_scratch_page(&ggtt->vm);
>   if (ret) {
>   drm_err(&i915->drm, "Scratch setup failed\n");
> @@ -1135,7 +1148,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct 
> intel_gt *gt)
>   ggtt->vm.gt = gt;
>   ggtt->vm.i915 = i915;
>   ggtt->vm.dma = i915->drm.dev;
> - dma_resv_init(&ggtt->vm.resv);
> + dma_resv_init(&ggtt->vm._resv);
>  
>   if (INTEL_GEN(i915) <= 5)
>   ret = i915_gmch_probe(ggtt);
> @@ -1144,7 +1157,7 @@ static int ggtt_probe_hw(struct i915_ggtt *ggtt, struct 
> intel_gt *gt)
>   else
>   ret = gen8_gmch_probe(ggtt);
>   if (ret) {
> - dma_resv_fini(&ggtt->vm.resv);
> + dma_resv_fini(&ggtt->vm._resv);
>   return ret;
>   }
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c 
> 

Re: [PATCH v2 01/15] drm/i915: Untangle the vma pages_mutex

2021-05-18 Thread Thomas Hellström



On 5/18/21 1:12 PM, Maarten Lankhorst wrote:

Hey,

This needs a small fix, otherwise looks good.

Op 18-05-2021 om 10:26 schreef Thomas Hellström:

From: Thomas Hellström 

Any sleeping dma_resv lock taken while the vma pages_mutex is held
will cause a lockdep splat.
Move the i915_gem_object_pin_pages() call out of the pages_mutex
critical section.

Signed-off-by: Thomas Hellström 
---
  drivers/gpu/drm/i915/i915_vma.c | 33 +++--
  1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index a6cd0fa62847..7b1c0f4e60d7 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -800,32 +800,37 @@ static bool try_qad_pin(struct i915_vma *vma, unsigned 
int flags)
  static int vma_get_pages(struct i915_vma *vma)
  {
int err = 0;
+   bool pinned_pages = false;
  
  	if (atomic_add_unless(&vma->pages_count, 1, 0))
  		return 0;
  
+	if (vma->obj) {

+   err = i915_gem_object_pin_pages(vma->obj);
+   if (err)
+   return err;
+   pinned_pages = true;
+   }
+
/* Allocations ahoy! */
-   if (mutex_lock_interruptible(&vma->pages_mutex))
-   return -EINTR;
+   if (mutex_lock_interruptible(&vma->pages_mutex)) {
+   err = -EINTR;
+   goto unpin;
+   }
  
  	if (!atomic_read(&vma->pages_count)) {

-   if (vma->obj) {
-   err = i915_gem_object_pin_pages(vma->obj);
-   if (err)
-   goto unlock;
-   }
-
err = vma->ops->set_pages(vma);
-   if (err) {
-   if (vma->obj)
-   i915_gem_object_unpin_pages(vma->obj);
+   if (err)
goto unlock;
-   }
+   pinned_pages = false;
}
atomic_inc(&vma->pages_count);
  
  unlock:

mutex_unlock(&vma->pages_mutex);
+unpin:
+   if (pinned_pages)
+   __i915_gem_object_unpin_pages(vma->obj);
  
  	return err;

  }
@@ -838,10 +843,10 @@ static void __vma_put_pages(struct i915_vma *vma, 
unsigned int count)
if (atomic_sub_return(count, &vma->pages_count) == 0) {
vma->ops->clear_pages(vma);
GEM_BUG_ON(vma->pages);
-   if (vma->obj)
-   i915_gem_object_unpin_pages(vma->obj);
}
mutex_unlock(&vma->pages_mutex);
+   if (vma->obj)
+   i915_gem_object_unpin_pages(vma->obj);

You're unconditionally unpinning pages here; if pages_count didn't drop to
0, we will go negative.

With that fixed:

Reviewed-by: Maarten Lankhorst 


Ah yes, thanks. That was a leftover from an earlier version...

/Thomas




Re: [Intel-gfx] [PATCH v2 02/15] drm/i915: Don't free shared locks while shared

2021-05-18 Thread Intel



On 5/18/21 1:18 PM, Maarten Lankhorst wrote:

Op 18-05-2021 om 10:26 schreef Thomas Hellström:

We are currently sharing the VM reservation locks across a number of
gem objects with page-table memory. Since TTM will individualize the
reservation locks when freeing objects, including accessing the shared
locks, make sure that the shared locks are not freed until that is done.
For PPGTT we add an additional refcount; for GGTT we take additional
measures to make sure objects sharing the GGTT reservation lock are
freed at GGTT takedown.

Signed-off-by: Thomas Hellström 
---
v2: Try harder to make sure objects sharing the GGTT reservation lock are
freed at GGTT takedown.
---
  drivers/gpu/drm/i915/gem/i915_gem_object.c|  3 ++
  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  1 +
  drivers/gpu/drm/i915/gt/intel_ggtt.c  | 19 ++--
  drivers/gpu/drm/i915/gt/intel_gtt.c   | 45 +++
  drivers/gpu/drm/i915/gt/intel_gtt.h   | 30 -
  drivers/gpu/drm/i915/gt/intel_ppgtt.c |  2 +-
  drivers/gpu/drm/i915/i915_drv.c   |  5 +++
  7 files changed, 92 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 28144410df86..abadf0994ad0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -252,6 +252,9 @@ static void __i915_gem_free_objects(struct drm_i915_private 
*i915,
if (obj->mm.n_placements > 1)
kfree(obj->mm.placements);
  
+		if (obj->resv_shared_from)

+   i915_vm_resv_put(obj->resv_shared_from);
+
/* But keep the pointer alive for RCU-protected lookups */
call_rcu(&obj->rcu, __i915_gem_free_object_rcu);
cond_resched();
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 0727d0c76aa0..450340a73186 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -149,6 +149,7 @@ struct drm_i915_gem_object {
 * when i915_gem_ww_ctx_backoff() or i915_gem_ww_ctx_fini() are called.
 */
struct list_head obj_link;
+   struct dma_resv *resv_shared_from;

Since this can only be a vm object, would it make sense to make this a pointer 
to the vm address space, so we can call vm_resv_put on it directly?

The current pointer type and name makes it look generic, but if you try to use 
it with anything but an address space, it will blow up.

Otherwise looks good. I guess we cannot force all bo's to be deleted before the 
vm is freed. :-)

So with that fixed

Reviewed-by: Maarten Lankhorst 


Thanks, I'll take a look at that.

Thomas




Re: [PATCH v2 06/15] drm/i915/ttm: Embed a ttm buffer object in the i915 gem object

2021-05-18 Thread Maarten Lankhorst
Op 18-05-2021 om 10:26 schreef Thomas Hellström:
> Embed a struct ttm_buffer_object into the i915 gem object, making sure
> we alias the gem object part. It's a bit unfortunate that the
> struct ttm_buffer_object embeds a gem object since we otherwise could
> make the TTM part private to the TTM backend, and use the usual
> i915 gem object for the other backends.
> To make this a bit more storage efficient for the other backends,
> we'd have to use a pointer for the gem object which would require
> a lot of changes in the driver. We postpone that for later.
>
> Signed-off-by: Thomas Hellström 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_object.c   |  7 +++
>  drivers/gpu/drm/i915/gem/i915_gem_object_types.h | 12 +++-
>  2 files changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> index abadf0994ad0..c8953e3f5c70 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
> @@ -62,6 +62,13 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj,
> const struct drm_i915_gem_object_ops *ops,
> struct lock_class_key *key, unsigned flags)
>  {
> + /*
> +  * A gem object is embedded both in a struct ttm_buffer_object :/ and
> +  * in a drm_i915_gem_object. Make sure they are aliased.
> +  */
> + BUILD_BUG_ON(offsetof(typeof(*obj), base) !=
> +  offsetof(typeof(*obj), __do_not_access.base));
> +
>   spin_lock_init(&obj->vma.lock);
>   INIT_LIST_HEAD(&obj->vma.list);
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index dbd7fffe956e..98f69d8fd37d 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -10,6 +10,7 @@
>  #include 
>  
>  #include 
> +#include 
>  #include 
>  
>  #include "i915_active.h"
> @@ -99,7 +100,16 @@ struct i915_gem_object_page_iter {
>  };
>  
>  struct drm_i915_gem_object {
> - struct drm_gem_object base;
> + /*
> +  * We might have reason to revisit the below since it wastes
> +  * a lot of space for non-ttm gem objects.
> +  * In any case, always use the accessors for the ttm_buffer_object
> +  * when accessing it.
> +  */
> + union {
> + struct drm_gem_object base;
> + struct ttm_buffer_object __do_not_access;
> + };
>  
>   const struct drm_i915_gem_object_ops *ops;
>  

Considering Dave did roughly the same in his patches, I don't think there's a 
better way to do this.

Although he just wrapped base.base using sizes. This works too. It probably 
needs someone else's r-b too, to ensure this is allowed.

Acked-by: Maarten Lankhorst 



Re: [PATCH v2 07/15] drm/ttm: Export ttm_bo_tt_destroy()

2021-05-18 Thread Maarten Lankhorst
Op 18-05-2021 om 10:26 schreef Thomas Hellström:
> For the upcoming kmapping i915 memcpy_move, export ttm_bo_tt_destroy().
> A future change might be to move the new memcpy_move into ttm, replacing
> the old ioremapping one.
>
> Cc: Christian König 
> Signed-off-by: Thomas Hellström 
> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index ca1b098b6a56..4479c55aaa1d 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -1221,3 +1221,4 @@ void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
>   ttm_tt_destroy(bo->bdev, bo->ttm);
>   bo->ttm = NULL;
>  }
> +EXPORT_SYMBOL(ttm_bo_tt_destroy);

Looks sane, could we reorder the patches to put all ttm changes first?

Reviewed-by: Maarten Lankhorst 



Re: [PATCH v2 04/15] drm/ttm: Export functions to initialize and finalize the ttm range manager standalone

2021-05-18 Thread Christian König

Am 18.05.21 um 10:26 schrieb Thomas Hellström:

i915 mock selftests are run without the device set up. In order to be able
to run the region related mock selftests, export functions in order for the
TTM range manager to be set up without a device to attach it to.


From the code it looks good, but to be honest I don't think that this 
makes much sense from the organizational point of view.


If a self test exercises internals of TTM it should be moved into TTM as 
well.


Christian.



Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
  drivers/gpu/drm/ttm/ttm_range_manager.c | 55 +
  include/drm/ttm/ttm_bo_driver.h | 23 +++
  2 files changed, 61 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_range_manager.c 
b/drivers/gpu/drm/ttm/ttm_range_manager.c
index b9d5da6e6a81..6957dfb0cf5a 100644
--- a/drivers/gpu/drm/ttm/ttm_range_manager.c
+++ b/drivers/gpu/drm/ttm/ttm_range_manager.c
@@ -125,55 +125,76 @@ static const struct ttm_resource_manager_func 
ttm_range_manager_func = {
.debug = ttm_range_man_debug
  };
  
-int ttm_range_man_init(struct ttm_device *bdev,

-  unsigned type, bool use_tt,
-  unsigned long p_size)
+struct ttm_resource_manager *
+ttm_range_man_init_standalone(unsigned long size, bool use_tt)
  {
struct ttm_resource_manager *man;
struct ttm_range_manager *rman;
  
  	rman = kzalloc(sizeof(*rman), GFP_KERNEL);
 	if (!rman)
-		return -ENOMEM;
+		return ERR_PTR(-ENOMEM);
  
  	man = &rman->manager;

man->use_tt = use_tt;
  
  	man->func = &ttm_range_manager_func;
  
-	ttm_resource_manager_init(man, p_size);

+   ttm_resource_manager_init(man, size);
  
-	drm_mm_init(&rman->mm, 0, p_size);

+   drm_mm_init(&rman->mm, 0, size);
spin_lock_init(&rman->lock);
  
-	ttm_set_driver_manager(bdev, type, &rman->manager);

+   return man;
+}
+EXPORT_SYMBOL(ttm_range_man_init_standalone);
+
+int ttm_range_man_init(struct ttm_device *bdev,
+  unsigned int type, bool use_tt,
+  unsigned long p_size)
+{
+   struct ttm_resource_manager *man;
+
+   man = ttm_range_man_init_standalone(p_size, use_tt);
+   if (IS_ERR(man))
+   return PTR_ERR(man);
+
ttm_resource_manager_set_used(man, true);
+   ttm_set_driver_manager(bdev, type, man);
+
return 0;
  }
  EXPORT_SYMBOL(ttm_range_man_init);
  
+void ttm_range_man_fini_standalone(struct ttm_resource_manager *man)

+{
+   struct ttm_range_manager *rman = to_range_manager(man);
+   struct drm_mm *mm = &rman->mm;
+
+   spin_lock(&rman->lock);
+   drm_mm_clean(mm);
+   drm_mm_takedown(mm);
+   spin_unlock(&rman->lock);
+
+   ttm_resource_manager_cleanup(man);
+   kfree(rman);
+}
+EXPORT_SYMBOL(ttm_range_man_fini_standalone);
+
  int ttm_range_man_fini(struct ttm_device *bdev,
   unsigned type)
  {
struct ttm_resource_manager *man = ttm_manager_type(bdev, type);
-   struct ttm_range_manager *rman = to_range_manager(man);
-   struct drm_mm *mm = &rman->mm;
int ret;
  
  	ttm_resource_manager_set_used(man, false);

-
ret = ttm_resource_manager_evict_all(bdev, man);
if (ret)
return ret;
  
-	spin_lock(&rman->lock);

-   drm_mm_clean(mm);
-   drm_mm_takedown(mm);
-   spin_unlock(&rman->lock);
-
-   ttm_resource_manager_cleanup(man);
ttm_set_driver_manager(bdev, type, NULL);
-   kfree(rman);
+   ttm_range_man_fini_standalone(man);
+
return 0;
  }
  EXPORT_SYMBOL(ttm_range_man_fini);
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index dbccac957f8f..734b1712ea72 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -321,6 +321,20 @@ int ttm_range_man_init(struct ttm_device *bdev,
   unsigned type, bool use_tt,
   unsigned long p_size);
  
+/**

+ * ttm_range_man_init_standalone - Initialize a ttm range manager without
+ * device interaction.
+ * @size: Size of the area to be managed in pages.
+ * @use_tt: The memory type requires tt backing.
+ *
+ * This function is intended for selftests. It initializes a range manager
+ * without any device interaction.
+ *
+ * Return: pointer to a range manager on success. Error pointer on failure.
+ */
+struct ttm_resource_manager *
+ttm_range_man_init_standalone(unsigned long size, bool use_tt);
+
  /**
   * ttm_range_man_fini
   *
@@ -332,4 +346,13 @@ int ttm_range_man_init(struct ttm_device *bdev,
  int ttm_range_man_fini(struct ttm_device *bdev,
   unsigned type);
  
+/**
+ * ttm_range_man_fini_standalone
+ * @man: The range manager
+ *
+ * Tear down a range manager initialized with
+ * ttm_range_man_init_standalone().
+ */
+void ttm_range_man_fini_standalone(struct ttm_resource_manager *man);

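ttm_range_man_init() above consumes the standalone constructor's return value through the kernel's error-pointer convention (IS_ERR()/PTR_ERR()): a small negative errno is encoded in an otherwise invalid pointer value, so one return value carries either a valid object or an error code. A minimal userspace sketch of that convention, assuming nothing beyond the C standard library (the real macros live in include/linux/err.h; range_man_alloc() is a toy stand-in, not the TTM function):

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a small negative errno in an otherwise invalid pointer value. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Toy analogue of ttm_range_man_init_standalone(): allocate or fail. */
static void *range_man_alloc(unsigned long size)
{
	void *man;

	if (size == 0)
		return ERR_PTR(-EINVAL);	/* invalid argument, nothing allocated */
	man = malloc(size);
	if (!man)
		return ERR_PTR(-ENOMEM);
	return man;
}
```

A caller then mirrors the `if (IS_ERR(man)) return PTR_ERR(man);` shape seen in the patch.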
Re: [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Christian König




Am 18.05.21 um 10:26 schrieb Thomas Hellström:

The internal ttm_bo_util memcpy uses vmap functionality, and while it
might be possible to use it for copying in and out of
sglist-represented io memory, using io_mem_reserve() / io_mem_free()
callbacks, that would cause problems with fault().
Instead, implement a method mapping page-by-page using kmap_local()
semantics. As an additional benefit we then avoid the occasional global
TLB flushes of vmap() and the consumption of vmap space, eliminating a
critical point of failure, and with a slight change of semantics we could
also push the memcpy out async for testing and async driver development
purposes. Pushing out async can be done since there is no memory allocation
going on that could violate the dma_fence lockdep rules.

For copies from iomem, use the WC prefetching memcpy variant for
additional speed.

Note that drivers that don't want to use struct io_mapping but rely on
memremap functionality, and that don't want to use scatterlists for
VRAM, may well define specialized (hopefully reusable) iterators for
their particular environment.


In general yes please since I have that as TODO for TTM for a very long 
time.


But I would prefer to fix the implementation in TTM instead and give it 
proper cursor handling.


Amdgpu is also using page based iomem and we are having similar 
workarounds in place there as well.


I think it makes sense to unify this inside TTM and remove the old 
memcpy util function when done.


Regards,
Christian.



Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
v2:
- Move new TTM exports to a separate commit. (Reported by Christian König)
- Avoid having the iterator init functions inline. (Reported by Jani Nikula)
- Remove a stray comment.
---
  drivers/gpu/drm/i915/Makefile |   1 +
  .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.c   | 194 ++
  .../gpu/drm/i915/gem/i915_gem_ttm_bo_util.h   | 107 ++
  3 files changed, 302 insertions(+)
  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
  create mode 100644 drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index cb8823570996..958ccc1edfed 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -155,6 +155,7 @@ gem-y += \
gem/i915_gem_stolen.o \
gem/i915_gem_throttle.o \
gem/i915_gem_tiling.o \
+   gem/i915_gem_ttm_bo_util.o \
gem/i915_gem_userptr.o \
gem/i915_gem_wait.o \
gem/i915_gemfs.o
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c 
b/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
new file mode 100644
index ..5f347a85bf44
--- /dev/null
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_bo_util.c
@@ -0,0 +1,194 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2021 Intel Corporation
+ */
+
+/**
+ * DOC: Usage and intentions.
+ *
+ * This file contains functionality that we might want to move into
+ * ttm_bo_util.c if there is a common interest.
+ * Currently a kmap_local-only memcpy with support for page-based iomem regions,
+ * and fast memcpy from write-combined memory.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include "i915_memcpy.h"
+
+#include "gem/i915_gem_ttm_bo_util.h"
+
+static void i915_ttm_kmap_iter_tt_kmap_local(struct i915_ttm_kmap_iter *iter,
+struct dma_buf_map *dmap,
+pgoff_t i)
+{
+   struct i915_ttm_kmap_iter_tt *iter_tt =
+   container_of(iter, typeof(*iter_tt), base);
+
+   dma_buf_map_set_vaddr(dmap, kmap_local_page(iter_tt->tt->pages[i]));
+}
+
+static void i915_ttm_kmap_iter_iomap_kmap_local(struct i915_ttm_kmap_iter 
*iter,
+   struct dma_buf_map *dmap,
+   pgoff_t i)
+{
+   struct i915_ttm_kmap_iter_iomap *iter_io =
+   container_of(iter, typeof(*iter_io), base);
+   void __iomem *addr;
+
+retry:
+   while (i >= iter_io->cache.end) {
+   iter_io->cache.sg = iter_io->cache.sg ?
+   sg_next(iter_io->cache.sg) : iter_io->st->sgl;
+   iter_io->cache.i = iter_io->cache.end;
+   iter_io->cache.end += sg_dma_len(iter_io->cache.sg) >>
+   PAGE_SHIFT;
+   iter_io->cache.offs = sg_dma_address(iter_io->cache.sg) -
+   iter_io->start;
+   }
+
+   if (i < iter_io->cache.i) {
+   iter_io->cache.end = 0;
+   iter_io->cache.sg = NULL;
+   goto retry;
+   }
+
+   addr = io_mapping_map_local_wc(iter_io->iomap, iter_io->cache.offs +
+  (((resource_size_t)i - iter_io->cache.i)
+   << PAGE_SHIFT));
+   dma_buf_map_set_vaddr_iomem(dmap, addr);
+}
+
+static const struct i91
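The retry loop in i915_ttm_kmap_iter_iomap_kmap_local() above is a cached scatterlist walk: forward lookups advance the cached segment in amortized O(1), while a backwards jump resets the cache and rescans from the first segment. A self-contained userspace sketch of the same caching pattern, with a hypothetical struct seg standing in for a scatterlist entry:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one scatterlist segment: start pfn + length. */
struct seg {
	unsigned long start_pfn;
	unsigned long npages;
};

/* Cache of which segment covers buffer pages [i, end). */
struct seg_cache {
	const struct seg *sg;	/* current segment, NULL means "restart" */
	unsigned long i;
	unsigned long end;
};

/*
 * Resolve buffer page index @idx to a pfn. Forward walks advance the
 * cached segment; a backwards jump resets the cache and rescans from
 * the first segment, mirroring the retry loop in the iterator above.
 */
static unsigned long lookup_pfn(const struct seg *segs, size_t nsegs,
				struct seg_cache *c, unsigned long idx)
{
retry:
	while (idx >= c->end) {
		c->sg = c->sg ? c->sg + 1 : segs;
		assert(c->sg < segs + nsegs);	/* idx must be in range */
		c->i = c->end;
		c->end += c->sg->npages;
	}

	if (idx < c->i) {
		c->end = 0;
		c->sg = NULL;
		goto retry;
	}

	return c->sg->start_pfn + (idx - c->i);
}
```

The backwards-jump case is rare in a linear copy, which is why paying a full rescan there is acceptable.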

Re: [PATCH v8 1/8] mm: Remove special swap entry functions

2021-05-18 Thread Alistair Popple
On Tuesday, 18 May 2021 12:17:32 PM AEST Peter Xu wrote:
> On Wed, Apr 07, 2021 at 06:42:31PM +1000, Alistair Popple wrote:
> > +static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
> > +{
> > + struct page *p = pfn_to_page(swp_offset(entry));
> > +
> > + /*
> > +  * Any use of migration entries may only occur while the
> > +  * corresponding page is locked
> > +  */
> > + BUG_ON(is_migration_entry(entry) && !PageLocked(p));
> > +
> > + return p;
> > +}
> 
> Would swap_pfn_entry_to_page() be slightly better?
> 
> The thing is it's very easy to read pfn_*() as a function to take a pfn as
> parameter...
> 
> Since I'm also recently working on some swap-related new ptes [1], I'm
> thinking whether we could name these swap entries as "swap XXX entries". 
> Say, "swap hwpoison entry", "swap pfn entry" (which is a superset of "swap
> migration entry", "swap device exclusive entry", ...).  That's where I came
> with the above swap_pfn_entry_to_page(), then below will be naturally
> is_swap_pfn_entry().

Equally though "hwpoison swap entry", "pfn swap entry", "migration swap 
entry", etc. also makes sense (at least to me), but does that not fit in as 
well with your series? I haven't looked too deeply at your series but have 
been meaning to so thanks for the pointer.

> No strong opinion as this is already a v8 series (and sorry to chime in this
> late), just to raise this up.

No worries, it's good timing as I was about to send a v9 which was just a 
rebase anyway. I am hoping to try and get this accepted for the next merge 
window but I will wait before sending v9 to see if anyone else has thoughts on 
the naming here.

I don't have a particularly strong opinion either, and your justification is 
more thought than I gave to naming these originally so am happy to rename if 
it's more readable or fits better with your series.

Thanks.

 - Alistair

> [1] https://lore.kernel.org/lkml/20210427161317.50682-1-pet...@redhat.com/
> 
> Thanks,
> 
> > +
> > +/*
> > + * A pfn swap entry is a special type of swap entry that always has a pfn
> > stored + * in the swap offset. They are used to represent unaddressable
> > device memory + * and to restrict access to a page undergoing migration.
> > + */
> > +static inline bool is_pfn_swap_entry(swp_entry_t entry)
> > +{
> > + return is_migration_entry(entry) || is_device_private_entry(entry);
> > +}
> 
> --
> Peter Xu






Re: [Intel-gfx] [PATCH v2 13/15] drm/ttm: Add BO and offset arguments for vm_access and vm_fault ttm handlers.

2021-05-18 Thread Christian König

Can you send me the patch directly and not just on CC?

Thanks,
Christian.

Am 18.05.21 um 10:59 schrieb Thomas Hellström:

+ Christian König

On 5/18/21 10:26 AM, Thomas Hellström wrote:

From: Maarten Lankhorst 

This allows other drivers that may not setup the vma in the same way
to use the ttm bo helpers.

Also clarify the documentation a bit, especially related to 
VM_FAULT_RETRY.


Signed-off-by: Maarten Lankhorst 


Lgtm. Reviewed-by: Thomas Hellström 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c    |  4 +-
  drivers/gpu/drm/nouveau/nouveau_ttm.c  |  4 +-
  drivers/gpu/drm/radeon/radeon_ttm.c    |  4 +-
  drivers/gpu/drm/ttm/ttm_bo_vm.c    | 84 +-
  drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c |  8 ++-
  include/drm/ttm/ttm_bo_api.h   |  9 ++-
  6 files changed, 75 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c

index d5a9d7a88315..89dafe14f828 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1919,7 +1919,9 @@ static vm_fault_t amdgpu_ttm_fault(struct 
vm_fault *vmf)

  if (ret)
  goto unlock;
  -    ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
+    ret = ttm_bo_vm_fault_reserved(bo, vmf,
+ drm_vma_node_start(&bo->base.vma_node),
+   vmf->vma->vm_page_prot,
 TTM_BO_VM_NUM_PREFAULT, 1);
  if (ret == VM_FAULT_RETRY && !(vmf->flags & 
FAULT_FLAG_RETRY_NOWAIT))

  return ret;
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c

index b81ae90b8449..555fb6d8be8b 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -144,7 +144,9 @@ static vm_fault_t nouveau_ttm_fault(struct 
vm_fault *vmf)

    nouveau_bo_del_io_reserve_lru(bo);
  prot = vm_get_page_prot(vma->vm_flags);
-    ret = ttm_bo_vm_fault_reserved(vmf, prot, 
TTM_BO_VM_NUM_PREFAULT, 1);

+    ret = ttm_bo_vm_fault_reserved(bo, vmf,
+ drm_vma_node_start(&bo->base.vma_node),
+   prot, TTM_BO_VM_NUM_PREFAULT, 1);
  nouveau_bo_add_io_reserve_lru(bo);
  if (ret == VM_FAULT_RETRY && !(vmf->flags & 
FAULT_FLAG_RETRY_NOWAIT))

  return ret;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c

index 3361d11769a2..ba48a2acdef0 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -816,7 +816,9 @@ static vm_fault_t radeon_ttm_fault(struct 
vm_fault *vmf)

  if (ret)
  goto unlock_resv;
  -    ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot,
+    ret = ttm_bo_vm_fault_reserved(bo, vmf,
+ drm_vma_node_start(&bo->base.vma_node),
+   vmf->vma->vm_page_prot,
 TTM_BO_VM_NUM_PREFAULT, 1);
  if (ret == VM_FAULT_RETRY && !(vmf->flags & 
FAULT_FLAG_RETRY_NOWAIT))

  goto unlock_mclk;
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c 
b/drivers/gpu/drm/ttm/ttm_bo_vm.c

index b31b18058965..ed00ccf1376e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -42,7 +42,7 @@
  #include 
    static vm_fault_t ttm_bo_vm_fault_idle(struct ttm_buffer_object *bo,
-    struct vm_fault *vmf)
+   struct vm_fault *vmf)
  {
  vm_fault_t ret = 0;
  int err = 0;
@@ -122,7 +122,8 @@ static unsigned long ttm_bo_io_mem_pfn(struct 
ttm_buffer_object *bo,

   * Return:
   *    0 on success and the bo was reserved.
   *    VM_FAULT_RETRY if blocking wait.
- *    VM_FAULT_NOPAGE if blocking wait and retrying was not allowed.
+ *    VM_FAULT_NOPAGE if blocking wait and retrying was not allowed, 
or wait interrupted.
+ *    VM_FAULT_SIGBUS if wait on bo->moving failed for reason other 
than a signal.

   */
  vm_fault_t ttm_bo_vm_reserve(struct ttm_buffer_object *bo,
   struct vm_fault *vmf)
@@ -254,7 +255,9 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct 
vm_fault *vmf,

    /**
   * ttm_bo_vm_fault_reserved - TTM fault helper
+ * @bo: The buffer object
   * @vmf: The struct vm_fault given as argument to the fault callback
+ * @mmap_base: The base of the mmap, to which the @vmf fault is 
relative to.

   * @prot: The page protection to be used for this memory area.
   * @num_prefault: Maximum number of prefault pages. The caller may 
want to
   * specify this based on madvice settings and the size of the GPU 
object
@@ -265,19 +268,28 @@ static vm_fault_t ttm_bo_vm_insert_huge(struct 
vm_fault *vmf,

   * memory backing the buffer object, and then returns a return code
   * instructing the caller to retry the page access.
   *
+ * This function ensures any pipelined wait is finished.
+ *
+ * WARNING:
+ * On VM_FAULT_RETRY, the bo will be unlocked by this function when
+ * #FAULT_FLAG_RETRY_NOWAIT is not set inside @vmf->flags. In this
+ * case, the caller should not unlo

Re: [PATCH v2 07/15] drm/ttm: Export ttm_bo_tt_destroy()

2021-05-18 Thread Christian König

Am 18.05.21 um 10:26 schrieb Thomas Hellström:

For the upcoming kmapping i915 memcpy_move, export ttm_bo_tt_destroy().
A future change might be to move the new memcpy_move into ttm, replacing
the old ioremapping one.


Well this is an upfront NAK for that approach.

I've worked quite hard to not export those functions any more since this 
is not something drivers should be allowed to mess with.


If you need this we should probably move the functionality into TTM instead.

Regards,
Christian.



Cc: Christian König 
Signed-off-by: Thomas Hellström 
---
  drivers/gpu/drm/ttm/ttm_bo.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index ca1b098b6a56..4479c55aaa1d 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -1221,3 +1221,4 @@ void ttm_bo_tt_destroy(struct ttm_buffer_object *bo)
ttm_tt_destroy(bo->bdev, bo->ttm);
bo->ttm = NULL;
  }
+EXPORT_SYMBOL(ttm_bo_tt_destroy);




Re: [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Thomas Hellström



On 5/18/21 1:55 PM, Christian König wrote:



Am 18.05.21 um 10:26 schrieb Thomas Hellström:

The internal ttm_bo_util memcpy uses vmap functionality, and while it
might be possible to use it for copying in and out of
sglist-represented io memory, using io_mem_reserve() / io_mem_free()
callbacks, that would cause problems with fault().
Instead, implement a method mapping page-by-page using kmap_local()
semantics. As an additional benefit we then avoid the occasional global
TLB flushes of vmap() and the consumption of vmap space, eliminating a
critical point of failure, and with a slight change of semantics we could
also push the memcpy out async for testing and async driver development
purposes. Pushing out async can be done since there is no memory allocation
going on that could violate the dma_fence lockdep rules.

For copies from iomem, use the WC prefetching memcpy variant for
additional speed.

Note that drivers that don't want to use struct io_mapping but rely on
memremap functionality, and that don't want to use scatterlists for
VRAM, may well define specialized (hopefully reusable) iterators for
their particular environment.


In general yes please since I have that as TODO for TTM for a very 
long time.


But I would prefer to fix the implementation in TTM instead and give 
it proper cursor handling.


Amdgpu is also using page based iomem and we are having similar 
workarounds in place there as well.


I think it makes sense to unify this inside TTM and remove the old 
memcpy util function when done.


Regards,
Christian.


Christian,

I was thinking when we replace the bo.mem with a pointer (and perhaps 
have a driver callback to allocate the bo->mem,
we could perhaps embed a struct ttm_kmap_iter and use it for all mapping 
in one way or another). That would mean perhaps land this is i915 now 
and sort out the unification once the struct ttm_resource, struct 
ttm_buffer_object separation has landed?


/Thomas




Re: [PATCH 0/7] Per client engine busyness

2021-05-18 Thread Christian König

Am 18.05.21 um 11:35 schrieb Tvrtko Ursulin:


On 17/05/2021 19:02, Nieto, David M wrote:

[AMD Official Use Only]


The format is simple:

:  %


Hm what time period does the percent relate to?

The i915 implementation uses accumulated nanoseconds active. That way
whoever reads the file can calculate the percentage relative to the time
period between two reads of the file.


That sounds much saner to me as well. The percentage calculation inside 
the kernel looks suspiciously misplaced.
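With accumulated busy-nanoseconds exported as described above, the percentage becomes a pure userspace computation over two samples. A minimal sketch (the struct and field names are illustrative, not part of any proposed uapi):

```c
#include <assert.h>
#include <stdint.h>

/* One read of an engine's stats: wall clock and accumulated busy time. */
struct sample {
	uint64_t wall_ns;	/* monotonic clock at read time */
	uint64_t busy_ns;	/* accumulated engine-active nanoseconds */
};

/* Busyness over the window between two reads, in percent. */
static double busy_percent(struct sample a, struct sample b)
{
	uint64_t wall = b.wall_ns - a.wall_ns;

	if (!wall)
		return 0.0;
	return 100.0 * (double)(b.busy_ns - a.busy_ns) / (double)wall;
}
```

A top-like tool would read the fdinfo file periodically and feed consecutive samples into a computation like this, which keeps all time-window policy out of the kernel.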





we also have entries for the memory mapped:
mem  :  KiB


Okay so in general key values per line in text format. Colon as 
delimiter.


What common fields could be useful between different drivers and what 
common naming scheme, in order to enable as easy as possible creation 
of a generic top-like tool?


driver: 
pdev: 
ring-: N 
...
mem-: N 
...

What else?
Is ring a good common name? We actually more use engine in i915 but I 
am not really bothered about it.


I would prefer engine as well. We are currently in the process of moving
away from kernel rings, so that notion doesn't make much sense to carry
forward.


Christian.



Aggregated GPU usage could be easily and generically done by userspace 
by adding all rings and normalizing.


On my submission
https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html
I added a python script to print out the info. It has a lower CPU usage
than top, for example.


To be absolutely honest, I agree that there is an overhead, but it
might not be as much as you fear.


For me the bigger issue is that the extra number of operations grows
with the number of open files on the system, which has no relation to
the number of drm clients.


Extra so if the monitoring tool wants to show _only_ DRM processes.
Then the cost scales with the total number of processes times the total
number of files on the server.


This design inefficiency bothers me, yes. This is somewhat alleviated
by the proposal from Chris
(https://patchwork.freedesktop.org/patch/419042/?series=86692&rev=1),
although there are downsides there as well, like needing to keep a map
of pids to drm files in drivers.


Btw what do you do in that tool for same fd in a multi-threaded process
or so? Do you show duplicate entries or detect and ignore? I guess I 
did not figure out if you show by pid/tgid or by fd.


Regards,

Tvrtko



*From:* Tvrtko Ursulin 
*Sent:* Monday, May 17, 2021 9:00 AM
*To:* Nieto, David M ; Daniel Vetter 
; Koenig, Christian 
*Cc:* Alex Deucher ; Intel Graphics 
Development ; Maling list - DRI 
developers 

*Subject:* Re: [PATCH 0/7] Per client engine busyness

On 17/05/2021 15:39, Nieto, David M wrote:

[AMD Official Use Only]


Maybe we could try to standardize how the different submission ring
usage gets exposed in the fdinfo? We went the simple way of just
adding name and index, but if someone has a suggestion on how else
we could format them so there is commonality across vendors we could
just amend those.


Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
the link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM 
devices.


Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.

I’d really like to have the process managers tools display GPU usage 
regardless of what vendor is 

[Bug 213129] New: Tiny font

2021-05-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213129

Bug ID: 213129
   Summary: Tiny font
   Product: Drivers
   Version: 2.5
Kernel Version: Tiny font during kernel boot with multihead
  Hardware: All
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: Video(DRI - non Intel)
  Assignee: drivers_video-...@kernel-bugs.osdl.org
  Reporter: thoralf.dass...@gmail.com
Regression: No

Created attachment 296833
  --> https://bugzilla.kernel.org/attachment.cgi?id=296833&action=edit
relev_out

The following issue started with the 5.10.33 vanilla kernel (currently I am on
5.10.37 vanilla). I get my kernels (incl. firmware, modules and headers)
through my distro.

My graphics card is a Radeon RX 560, on which I use all three heads:
DisplayPort 1.2, HDMI and DVI.


Here is the issue:
--
At the beginning of the kernel boot process, the frame buffer fonts on all
three heads are normal size. But as soon as amdgpu takes over during the boot
process, the font on the DisplayPort changes to this:
1) the fontsize goes tiny;
2) only the upper-left quadrant of DisplayPort is used.

Fonts on the other two heads are unaffected.

When I disconnect the cables from HDMI and DVI, so that I only use DisplayPort,
then fonts remain of a good size and the full screen area is used.

I attach the relevant output of lspci and dmesg, with all heads connected.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Christian König

Am 18.05.21 um 14:04 schrieb Thomas Hellström:


On 5/18/21 1:55 PM, Christian König wrote:



Am 18.05.21 um 10:26 schrieb Thomas Hellström:

The internal ttm_bo_util memcpy uses vmap functionality, and while it
might be possible to use it for copying in and out of
sglist-represented io memory, using io_mem_reserve() / io_mem_free()
callbacks, that would cause problems with fault().
Instead, implement a method mapping page-by-page using kmap_local()
semantics. As an additional benefit we then avoid the occasional global
TLB flushes of vmap() and the consumption of vmap space, eliminating a
critical point of failure, and with a slight change of semantics we could
also push the memcpy out async for testing and async driver development
purposes. Pushing out async can be done since there is no memory allocation
going on that could violate the dma_fence lockdep rules.

For copies from iomem, use the WC prefetching memcpy variant for
additional speed.

Note that drivers that don't want to use struct io_mapping but rely on
memremap functionality, and that don't want to use scatterlists for
VRAM, may well define specialized (hopefully reusable) iterators for
their particular environment.


In general yes please since I have that as TODO for TTM for a very 
long time.


But I would prefer to fix the implementation in TTM instead and give 
it proper cursor handling.


Amdgpu is also using page based iomem and we are having similar 
workarounds in place there as well.


I think it makes sense to unify this inside TTM and remove the old 
memcpy util function when done.


Regards,
Christian.


Christian,

I was thinking when we replace the bo.mem with a pointer (and perhaps 
have a driver callback to allocate the bo->mem,
we could perhaps embed a struct ttm_kmap_iter and use it for all 
mapping in one way or another). That would mean perhaps land this is 
i915 now and sort out the unification once the struct ttm_resource, 
struct ttm_buffer_object separation has landed?


That stuff is ready, reviewed and I'm just waiting for some amdgpu 
changes to land in drm-misc-next to push it.


But yes in general an iterator for the resource object sounds like the 
right plan to me as well.


Christian.



/Thomas






[PATCH -next] drm: Fix missing unlock and free on error in drm_legacy_addbufs_pci()

2021-05-18 Thread Zou Wei
Add the missing unlock and free before return from function
drm_legacy_addbufs_pci() in the error handling case.

Fixes: 70556e24e18e ("drm: remove usage of drm_pci_alloc/free")
Reported-by: Hulk Robot 
Signed-off-by: Zou Wei 
---
 drivers/gpu/drm/drm_bufs.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_bufs.c b/drivers/gpu/drm/drm_bufs.c
index 4805726..c23d7f7 100644
--- a/drivers/gpu/drm/drm_bufs.c
+++ b/drivers/gpu/drm/drm_bufs.c
@@ -984,8 +984,16 @@ int drm_legacy_addbufs_pci(struct drm_device *dev,
 
while (entry->buf_count < count) {
dmah = kmalloc(sizeof(drm_dma_handle_t), GFP_KERNEL);
-   if (!dmah)
+   if (!dmah) {
+   /* Set count correctly so we free the proper amount. */
+   entry->buf_count = count;
+   entry->seg_count = count;
+   drm_cleanup_buf_error(dev, entry);
+   kfree(temp_pagelist);
+   mutex_unlock(&dev->struct_mutex);
+   atomic_dec(&dev->buf_alloc);
return -ENOMEM;
+   }
 
dmah->size = total;
dmah->vaddr = dma_alloc_coherent(dev->dev,
-- 
2.6.2
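The fix above has to repeat every unwind step inline at the new failure site; kernel code more commonly centralizes such unwinding behind goto labels so each failure path shares a single cleanup tail. A self-contained sketch of that pattern with toy resources (not the actual drm_bufs.c code):

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Allocate two buffers; on any failure, unwind already-acquired
 * resources in reverse order through shared goto labels.
 */
static int setup(size_t n, char **a_out, char **b_out)
{
	char *a, *b;
	int ret = -ENOMEM;

	a = malloc(n);
	if (!a)
		goto out;
	b = malloc(n);
	if (!b)
		goto err_free_a;

	*a_out = a;
	*b_out = b;
	return 0;

err_free_a:
	free(a);
out:
	return ret;
}
```

Each new resource adds one label, so later failure sites never have to restate the earlier cleanup steps, which is exactly the duplication the patch above had to write out by hand.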



Re: [Intel-gfx] [PATCH v2 09/15] drm/ttm, drm/amdgpu: Allow the driver some control over swapping

2021-05-18 Thread Maarten Lankhorst
Op 18-05-2021 om 10:26 schreef Thomas Hellström:
> We are calling the eviction_valuable driver callback at eviction time to
> determine whether we actually can evict a buffer object.
> The upcoming i915 TTM backend needs the same functionality for swapout,
> and that might actually be beneficial to other drivers as well.
>
> Add an eviction_valuable call also in the swapout path. Try to keep the
> current behaviour for all drivers by returning true if the buffer object
> is already in the TTM_PL_SYSTEM placement. We change behaviour for the
> case where a buffer object is in a TT backed placement when swapped out,
> in which case the driver's normal eviction_valuable path is run.
>
> Finally export ttm_tt_unpopulate() and don't swap out bos
> that are not populated. This allows a driver to purge a bo at
> swapout time if its content is no longer valuable rather than to
> have TTM swap the contents out.
>
> Cc: Christian König 
> Signed-off-by: Thomas Hellström 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  4 +++
>  drivers/gpu/drm/ttm/ttm_bo.c| 41 +++--
>  drivers/gpu/drm/ttm/ttm_tt.c|  4 +++
>  3 files changed, 33 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 8c7ec09eb1a4..d5a9d7a88315 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1399,6 +1399,10 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct 
> ttm_buffer_object *bo,
>   struct dma_fence *f;
>   int i;
>  
> + /* Swapout? */
> + if (bo->mem.mem_type == TTM_PL_SYSTEM)
> + return true;
> +
>   if (bo->type == ttm_bo_type_kernel &&
>   !amdgpu_vm_evictable(ttm_to_amdgpu_bo(bo)))
>   return false;
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
> index 4479c55aaa1d..6a3f3112f62a 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -531,6 +531,10 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo,
>  bool ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
> const struct ttm_place *place)
>  {
> + dma_resv_assert_held(bo->base.resv);
> + if (bo->mem.mem_type == TTM_PL_SYSTEM)
> + return true;
> +
>   /* Don't evict this BO if it's outside of the
>* requested placement range
>*/
> @@ -553,7 +557,9 @@ EXPORT_SYMBOL(ttm_bo_eviction_valuable);
>   * b. Otherwise, trylock it.
>   */
>  static bool ttm_bo_evict_swapout_allowable(struct ttm_buffer_object *bo,
> - struct ttm_operation_ctx *ctx, bool *locked, bool *busy)
> +struct ttm_operation_ctx *ctx,
> +const struct ttm_place *place,
> +bool *locked, bool *busy)
>  {
>   bool ret = false;
>  
> @@ -571,6 +577,12 @@ static bool ttm_bo_evict_swapout_allowable(struct 
> ttm_buffer_object *bo,
>   *busy = !ret;
>   }
>  
> + if (ret && place && !bo->bdev->funcs->eviction_valuable(bo, place)) {
> + ret = false;
> + if (locked)
> + dma_resv_unlock(bo->base.resv);
> + }

Probably meant to check and clear *locked here?

With that fixed:

Reviewed-by: Maarten Lankhorst 

> +
>   return ret;
>  }
>  
> @@ -625,20 +637,14 @@ int ttm_mem_evict_first(struct ttm_device *bdev,
>   list_for_each_entry(bo, &man->lru[i], lru) {
>   bool busy;
>  
> - if (!ttm_bo_evict_swapout_allowable(bo, ctx, &locked,
> - &busy)) {
> + if (!ttm_bo_evict_swapout_allowable(bo, ctx, place,
> + &locked, &busy)) {
>   if (busy && !busy_bo && ticket !=
>   dma_resv_locking_ctx(bo->base.resv))
>   busy_bo = bo;
>   continue;
>   }
>  
> - if (place && !bdev->funcs->eviction_valuable(bo,
> -   place)) {
> - if (locked)
> - dma_resv_unlock(bo->base.resv);
> - continue;
> - }
>   if (!ttm_bo_get_unless_zero(bo)) {
>   if (locked)
>   dma_resv_unlock(bo->base.resv);
> @@ -1138,10 +1144,18 @@ EXPORT_SYMBOL(ttm_bo_wait);
>  int ttm_bo_swapout(struct ttm_buffer_object *bo, struct ttm_operation_ctx 
> *ctx,
>  gfp_t gfp_flags)
>  {
> + struct ttm_place place = {};
>   bool locked;
>   int ret;
>  
> - if (!ttm_bo_evict_

[PATCH v4 1/3] drm/dp_mst: Add self-tests for up requests

2021-05-18 Thread Sam McNally
Up requests are decoded by drm_dp_sideband_parse_req(), which operates
on a drm_dp_sideband_msg_rx, unlike down requests. Expand the existing
self-test helper sideband_msg_req_encode_decode() to copy the message
contents and length from a drm_dp_sideband_msg_tx to
drm_dp_sideband_msg_rx and use the parse function under test in place of
decode.

Add support for currently-supported up requests to
drm_dp_dump_sideband_msg_req_body(); add support to
drm_dp_encode_sideband_req() to allow encoding for the self-tests.

Add self-tests for CONNECTION_STATUS_NOTIFY and RESOURCE_STATUS_NOTIFY.

Signed-off-by: Sam McNally 
---

Changes in v4:
- New in v4

 drivers/gpu/drm/drm_dp_mst_topology.c |  54 ++-
 .../gpu/drm/drm_dp_mst_topology_internal.h|   4 +
 .../drm/selftests/test-drm_dp_mst_helper.c| 147 --
 3 files changed, 190 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c 
b/drivers/gpu/drm/drm_dp_mst_topology.c
index 54604633e65c..573f39a3dc16 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -442,6 +442,37 @@ drm_dp_encode_sideband_req(const struct 
drm_dp_sideband_msg_req_body *req,
idx++;
}
break;
+   case DP_CONNECTION_STATUS_NOTIFY: {
+   const struct drm_dp_connection_status_notify *msg;
+
+   msg = &req->u.conn_stat;
+   buf[idx] = (msg->port_number & 0xf) << 4;
+   idx++;
+   memcpy(&raw->msg[idx], msg->guid, 16);
+   idx += 16;
+   raw->msg[idx] = 0;
+   raw->msg[idx] |= msg->legacy_device_plug_status ? BIT(6) : 0;
+   raw->msg[idx] |= msg->displayport_device_plug_status ? BIT(5) : 
0;
+   raw->msg[idx] |= msg->message_capability_status ? BIT(4) : 0;
+   raw->msg[idx] |= msg->input_port ? BIT(3) : 0;
+   raw->msg[idx] |= FIELD_PREP(GENMASK(2, 0), 
msg->peer_device_type);
+   idx++;
+   break;
+   }
+   case DP_RESOURCE_STATUS_NOTIFY: {
+   const struct drm_dp_resource_status_notify *msg;
+
+   msg = &req->u.resource_stat;
+   buf[idx] = (msg->port_number & 0xf) << 4;
+   idx++;
+   memcpy(&raw->msg[idx], msg->guid, 16);
+   idx += 16;
+   buf[idx] = (msg->available_pbn & 0xff00) >> 8;
+   idx++;
+   buf[idx] = (msg->available_pbn & 0xff);
+   idx++;
+   break;
+   }
}
raw->cur_len = idx;
 }
@@ -672,6 +703,22 @@ drm_dp_dump_sideband_msg_req_body(const struct 
drm_dp_sideband_msg_req_body *req
  req->u.enc_status.stream_behavior,
  req->u.enc_status.valid_stream_behavior);
break;
+   case DP_CONNECTION_STATUS_NOTIFY:
+   P("port=%d guid=%*ph legacy=%d displayport=%d messaging=%d input=%d peer_type=%d",
+ req->u.conn_stat.port_number,
+ (int)ARRAY_SIZE(req->u.conn_stat.guid), req->u.conn_stat.guid,
+ req->u.conn_stat.legacy_device_plug_status,
+ req->u.conn_stat.displayport_device_plug_status,
+ req->u.conn_stat.message_capability_status,
+ req->u.conn_stat.input_port,
+ req->u.conn_stat.peer_device_type);
+   break;
+   case DP_RESOURCE_STATUS_NOTIFY:
+   P("port=%d guid=%*ph pbn=%d",
+ req->u.resource_stat.port_number,
+ (int)ARRAY_SIZE(req->u.resource_stat.guid), req->u.resource_stat.guid,
+ req->u.resource_stat.available_pbn);
+   break;
default:
P("???\n");
break;
@@ -1116,9 +1163,9 @@ static bool drm_dp_sideband_parse_resource_status_notify(const struct drm_dp_mst
return false;
 }
 
-static bool drm_dp_sideband_parse_req(const struct drm_dp_mst_topology_mgr *mgr,
- struct drm_dp_sideband_msg_rx *raw,
- struct drm_dp_sideband_msg_req_body *msg)
+bool drm_dp_sideband_parse_req(const struct drm_dp_mst_topology_mgr *mgr,
+  struct drm_dp_sideband_msg_rx *raw,
+  struct drm_dp_sideband_msg_req_body *msg)
 {
memset(msg, 0, sizeof(*msg));
msg->req_type = (raw->msg[0] & 0x7f);
@@ -1134,6 +1181,7 @@ static bool drm_dp_sideband_parse_req(const struct drm_dp_mst_topology_mgr *mgr,
return false;
}
 }
+EXPORT_SYMBOL_FOR_TESTS_ONLY(drm_dp_sideband_parse_req);
 
 static void build_dpcd_write(struct drm_dp_sideband_msg_tx *msg,
 u8 port_num, u32 offset, u8 num_bytes, u8 *bytes)
diff --git a/drivers/gpu/drm/drm_dp_mst_topology_internal.h b/drivers/gpu/drm/drm_dp_mst_topology_internal.h
index eeda9a61c657..0356a2e0dba1 100644
--
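The two encoders added above share a common body layout: the port number in the upper nibble of the first byte, a 16-byte GUID, then message-specific fields. A standalone sketch of the RESOURCE_STATUS_NOTIFY layout (a Python toy model of the byte packing for illustration, not the kernel code itself):

```python
def encode_resource_status_notify(port_number: int, guid: bytes,
                                  available_pbn: int) -> bytes:
    """Toy model of the sideband body packed above: port nibble,
    16-byte GUID, 16-bit big-endian available PBN."""
    assert len(guid) == 16
    buf = bytearray()
    buf.append((port_number & 0xF) << 4)        # port number, upper nibble
    buf += guid                                  # 16-byte GUID
    buf.append((available_pbn & 0xFF00) >> 8)    # PBN, high byte first
    buf.append(available_pbn & 0xFF)             # PBN, low byte
    return bytes(buf)

body = encode_resource_status_notify(5, bytes(range(16)), 0x1234)
```

The 19-byte result mirrors what the `DP_RESOURCE_STATUS_NOTIFY` case writes into `raw->msg`, which is what the new self-tests round-trip through the parser.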

[PATCH v4 2/3] drm/dp_mst: Add support for sink event notify messages

2021-05-18 Thread Sam McNally
Sink event notify messages are used for MST CEC IRQs. Add parsing
support for sink event notify messages in preparation for handling MST
CEC IRQs.

Signed-off-by: Sam McNally 
---

Changes in v4:
- Changed logging to use drm_dbg_kms()
- Added self-test

 drivers/gpu/drm/drm_dp_mst_topology.c | 57 ++-
 .../drm/selftests/test-drm_dp_mst_helper.c|  8 +++
 include/drm/drm_dp_mst_helper.h   | 14 +
 3 files changed, 78 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 573f39a3dc16..29aad3b6b31a 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -473,6 +473,20 @@ drm_dp_encode_sideband_req(const struct drm_dp_sideband_msg_req_body *req,
idx++;
break;
}
+   case DP_SINK_EVENT_NOTIFY: {
+   const struct drm_dp_sink_event_notify *msg;
+
+   msg = &req->u.sink_event;
+   buf[idx] = (msg->port_number & 0xf) << 4;
+   idx++;
+   memcpy(&raw->msg[idx], msg->guid, 16);
+   idx += 16;
+   buf[idx] = (msg->event_id & 0xff00) >> 8;
+   idx++;
+   buf[idx] = (msg->event_id & 0xff);
+   idx++;
+   break;
+   }
}
raw->cur_len = idx;
 }
@@ -719,6 +733,12 @@ drm_dp_dump_sideband_msg_req_body(const struct drm_dp_sideband_msg_req_body *req
  (int)ARRAY_SIZE(req->u.resource_stat.guid), req->u.resource_stat.guid,
  req->u.resource_stat.available_pbn);
break;
+   case DP_SINK_EVENT_NOTIFY:
+   P("port=%d guid=%*ph event=%d",
+ req->u.sink_event.port_number,
+ (int)ARRAY_SIZE(req->u.sink_event.guid), req->u.sink_event.guid,
+ req->u.sink_event.event_id);
+   break;
default:
P("???\n");
break;
@@ -1163,6 +1183,30 @@ static bool drm_dp_sideband_parse_resource_status_notify(const struct drm_dp_mst
return false;
 }
 
+static bool drm_dp_sideband_parse_sink_event_notify(const struct drm_dp_mst_topology_mgr *mgr,
+   struct drm_dp_sideband_msg_rx *raw,
+   struct drm_dp_sideband_msg_req_body *msg)
+{
+   int idx = 1;
+
+   msg->u.sink_event.port_number = (raw->msg[idx] & 0xf0) >> 4;
+   idx++;
+   if (idx > raw->curlen)
+   goto fail_len;
+
+   memcpy(msg->u.sink_event.guid, &raw->msg[idx], 16);
+   idx += 16;
+   if (idx > raw->curlen)
+   goto fail_len;
+
+   msg->u.sink_event.event_id = (raw->msg[idx] << 8) | (raw->msg[idx + 1]);
+   idx++;
+   return true;
+fail_len:
+   drm_dbg_kms(mgr->dev, "sink event notify parse length fail %d %d\n", idx, raw->curlen);
+   return false;
+}
+
 bool drm_dp_sideband_parse_req(const struct drm_dp_mst_topology_mgr *mgr,
   struct drm_dp_sideband_msg_rx *raw,
   struct drm_dp_sideband_msg_req_body *msg)
@@ -1175,6 +1219,8 @@ bool drm_dp_sideband_parse_req(const struct drm_dp_mst_topology_mgr *mgr,
return drm_dp_sideband_parse_connection_status_notify(mgr, raw, msg);
case DP_RESOURCE_STATUS_NOTIFY:
return drm_dp_sideband_parse_resource_status_notify(mgr, raw, msg);
+   case DP_SINK_EVENT_NOTIFY:
+   return drm_dp_sideband_parse_sink_event_notify(mgr, raw, msg);
default:
drm_err(mgr->dev, "Got unknown request 0x%02x (%s)\n",
msg->req_type, drm_dp_mst_req_type_str(msg->req_type));
@@ -4106,6 +4152,8 @@ drm_dp_mst_process_up_req(struct drm_dp_mst_topology_mgr *mgr,
guid = msg->u.conn_stat.guid;
else if (msg->req_type == DP_RESOURCE_STATUS_NOTIFY)
guid = msg->u.resource_stat.guid;
+   else if (msg->req_type == DP_SINK_EVENT_NOTIFY)
+   guid = msg->u.sink_event.guid;
 
if (guid)
mstb = drm_dp_get_mst_branch_device_by_guid(mgr, guid);
@@ -4177,7 +4225,8 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
drm_dp_sideband_parse_req(mgr, &mgr->up_req_recv, &up_req->msg);
 
if (up_req->msg.req_type != DP_CONNECTION_STATUS_NOTIFY &&
-   up_req->msg.req_type != DP_RESOURCE_STATUS_NOTIFY) {
+   up_req->msg.req_type != DP_RESOURCE_STATUS_NOTIFY &&
+   up_req->msg.req_type != DP_SINK_EVENT_NOTIFY) {
drm_dbg_kms(mgr->dev, "Received unknown up req type, ignoring: %x\n",
up_req->msg.req_type);
kfree(up_req);
@@ -4205,6 +4254,12 @@ static int drm_dp_mst_handle_up_req(struct drm_dp_mst_topology_mgr *mgr)
drm_dbg_kms(mgr->dev, "Got RSN: pn: %d avail_pbn %d\n",
   

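The sink event notify parser added above pulls a port nibble, a 16-byte GUID, and a 16-bit big-endian event ID out of the raw sideband body, bailing out when the message is shorter than expected. A standalone sketch of the same logic (a Python toy model for illustration, not the kernel code; note the bounds checks here are done up front, whereas the kernel version checks curlen after each increment):

```python
def parse_sink_event_notify(raw: bytes):
    """Toy model of the parser: byte 0 carries the request type, then
    port nibble, 16-byte GUID, 16-bit big-endian event ID.
    Returns None on a short message, mirroring the fail_len path."""
    idx = 1
    if idx + 1 > len(raw):
        return None
    port_number = (raw[idx] & 0xF0) >> 4
    idx += 1
    if idx + 16 > len(raw):
        return None
    guid = raw[idx:idx + 16]
    idx += 16
    if idx + 2 > len(raw):
        return None
    event_id = (raw[idx] << 8) | raw[idx + 1]
    return port_number, guid, event_id

# Request type byte, port 5 in the upper nibble, zero GUID, event 0x0102.
msg = bytes([0x02, 0x50]) + bytes(16) + bytes([0x01, 0x02])
```

A truncated body (for example one cut off inside the GUID) takes the early-return path, which is what the length-failure debug message in the patch reports.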
[PATCH v4 3/3] drm_dp_cec: add MST support

2021-05-18 Thread Sam McNally
With DP v2.0 errata E5, CEC tunneling can be supported through an MST
topology.

When tunneling CEC through an MST port, CEC IRQs are delivered via a
sink event notify message; when a sink event notify message is received,
trigger CEC IRQ handling - ESI1 is not used for remote CEC IRQs so its
value is not checked.

Register and unregister for all MST connectors, ensuring their
drm_dp_aux_cec struct won't be accessed uninitialized.

Reviewed-by: Hans Verkuil 
Signed-off-by: Sam McNally 
---

Changes in v4:
- Removed use of work queues
- Updated checks of aux.transfer to accept aux.is_remote

Changes in v3:
- Fixed whitespace in drm_dp_cec_mst_irq_work()
- Moved drm_dp_cec_mst_set_edid_work() with the other set_edid functions

Changes in v2:
- Used aux->is_remote instead of aux->cec.is_mst, removing the need for
  the previous patch in the series
- Added a defensive check for null edid in the deferred set_edid work,
  in case the edid is no longer valid at that point

 drivers/gpu/drm/drm_dp_cec.c  | 20 
 drivers/gpu/drm/drm_dp_mst_topology.c | 24 
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/drm_dp_cec.c b/drivers/gpu/drm/drm_dp_cec.c
index 3ab2609f9ec7..1abd3f4654dc 100644
--- a/drivers/gpu/drm/drm_dp_cec.c
+++ b/drivers/gpu/drm/drm_dp_cec.c
@@ -14,6 +14,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Unfortunately it turns out that we have a chicken-and-egg situation
@@ -245,13 +246,22 @@ void drm_dp_cec_irq(struct drm_dp_aux *aux)
int ret;
 
/* No transfer function was set, so not a DP connector */
-   if (!aux->transfer)
+   if (!aux->transfer && !aux->is_remote)
return;
 
mutex_lock(&aux->cec.lock);
if (!aux->cec.adap)
goto unlock;
 
+   if (aux->is_remote) {
+   /*
+* For remote connectors, CEC IRQ is triggered by an explicit
+* message so ESI1 is not involved.
+*/
+   drm_dp_cec_handle_irq(aux);
+   goto unlock;
+   }
+
ret = drm_dp_dpcd_readb(aux, DP_DEVICE_SERVICE_IRQ_VECTOR_ESI1,
&cec_irq);
if (ret < 0 || !(cec_irq & DP_CEC_IRQ))
@@ -307,7 +317,7 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
u8 cap;
 
/* No transfer function was set, so not a DP connector */
-   if (!aux->transfer)
+   if (!aux->transfer && !aux->is_remote)
return;
 
 #ifndef CONFIG_MEDIA_CEC_RC
@@ -375,6 +385,7 @@ void drm_dp_cec_set_edid(struct drm_dp_aux *aux, const struct edid *edid)
 unlock:
mutex_unlock(&aux->cec.lock);
 }
+
 EXPORT_SYMBOL(drm_dp_cec_set_edid);
 
 /*
@@ -383,7 +394,7 @@ EXPORT_SYMBOL(drm_dp_cec_set_edid);
 void drm_dp_cec_unset_edid(struct drm_dp_aux *aux)
 {
/* No transfer function was set, so not a DP connector */
-   if (!aux->transfer)
+   if (!aux->transfer && !aux->is_remote)
return;
 
cancel_delayed_work_sync(&aux->cec.unregister_work);
@@ -393,6 +404,7 @@ void drm_dp_cec_unset_edid(struct drm_dp_aux *aux)
goto unlock;
 
cec_phys_addr_invalidate(aux->cec.adap);
+
/*
 * We're done if we want to keep the CEC device
 * (drm_dp_cec_unregister_delay is >= NEVER_UNREG_DELAY) or if the
@@ -428,7 +440,7 @@ void drm_dp_cec_register_connector(struct drm_dp_aux *aux,
   struct drm_connector *connector)
 {
WARN_ON(aux->cec.adap);
-   if (WARN_ON(!aux->transfer))
+   if (WARN_ON(!aux->transfer && !aux->is_remote))
return;
aux->cec.connector = connector;
INIT_DELAYED_WORK(&aux->cec.unregister_work,
diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c b/drivers/gpu/drm/drm_dp_mst_topology.c
index 29aad3b6b31a..5612caf9fb49 100644
--- a/drivers/gpu/drm/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/drm_dp_mst_topology.c
@@ -2359,6 +2359,8 @@ static void build_mst_prop_path(const struct drm_dp_mst_branch *mstb,
 int drm_dp_mst_connector_late_register(struct drm_connector *connector,
   struct drm_dp_mst_port *port)
 {
+   drm_dp_cec_register_connector(&port->aux, connector);
+
drm_dbg_kms(port->mgr->dev, "registering %s remote bus for %s\n",
port->aux.name, connector->kdev->kobj.name);
 
@@ -2382,6 +2384,8 @@ void drm_dp_mst_connector_early_unregister(struct drm_connector *connector,
drm_dbg_kms(port->mgr->dev, "unregistering %s remote bus for %s\n",
port->aux.name, connector->kdev->kobj.name);
drm_dp_aux_unregister_devnode(&port->aux);
+
+   drm_dp_cec_unregister_connector(&port->aux);
 }
 EXPORT_SYMBOL(drm_dp_mst_connector_early_unregister);
 
@@ -2682,6 +2686,21 @@ drm_dp_mst_handle_conn_stat(struct drm_dp_mst_branch *mstb,
queue_work(system_lon

Re: [PATCH] drm/i915: only disable default vga device

2021-05-18 Thread Emil Velikov
On Tue, 18 May 2021 at 12:17, Ville Syrjälä
 wrote:
>
> On Tue, May 18, 2021 at 12:09:56PM +0100, Emil Velikov wrote:
> > Hi Ville,
> >
> > On Mon, 17 May 2021 at 18:24, Ville Syrjälä
> >  wrote:
> > >
> > > On Sun, May 16, 2021 at 06:14:32PM +0100, Emil Velikov wrote:
> > > > From: Vivek Das Mohapatra 
> > > >
> > > > This patch is to do with seamless handover, eg when the sequence is
> > > > bootloader → plymouth → desktop.
> > > >
> > > > It switches the vga arbiter from the "other" GPU to the default one
> > > > (intel in this case), so the driver can issue some io().
> > >
> > > I don't understand what this commit message is trying to say.
> > >
> > Bunch of context is lost due to the patch age, so I'm not 100% sure of
> > the actual hardware setup where this occurs.
> > Does the following make sense?
> >
> > Currently on dual GPU systems, we do not get seamless handover as the
> > output flickers during the transition bootloader -> plymouth ->
> > desktop.
> > This happens as a result of switching (via the VGA arbiter) from the
> > "other" GPU back to the default i915 one and issuing io() commands.
>
> Hmm. Does this work?
>
Thanks, I'll give it a try. Might need a few days to find the right
hardware/software combination.

-Emil


Re: [PATCH v7 02/10] dt-bindings: display: simple: List hpd properties in panel-simple

2021-05-18 Thread Rob Herring
On Mon, May 17, 2021 at 3:09 PM Douglas Anderson  wrote:
>
> These are described in panel-common.yaml but if I don't list them in
> panel-simple then I get yells when running 'dt_binding_check' in a
> future patch. List them along with other properties that seem to be
> listed in panel-simple for similar reasons.

If you have HPD, is it still a simple panel? I don't see this as an
omission because the use of these properties for simple panels was
never documented IIRC.

Not saying we can't add them, but justify it as an addition, not just
fixing a warning.

>
> Signed-off-by: Douglas Anderson 
> ---
> I didn't spend tons of time digging to see if there was supposed to be
> a better way of doing this. If there is, feel free to yell.

That's the right way to do it unless you want to allow all common
properties, then we'd use unevaluatedProperties instead of
additionalProperties.


>
> Changes in v7:
> - List hpd properties bindings patch new for v7.
>
>  .../devicetree/bindings/display/panel/panel-simple.yaml | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
> index b3797ba2698b..4a0a5e1ee252 100644
> --- a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
> +++ b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
> @@ -298,6 +298,8 @@ properties:
>enable-gpios: true
>port: true
>power-supply: true
> +  no-hpd: true
> +  hpd-gpios: true
>
>  additionalProperties: false
>
> --
> 2.31.1.751.gd2f1c929bd-goog
>


Re: [RFC] Add DMA_RESV_USAGE flags

2021-05-18 Thread Christian König

Hi Jason & Daniel,

Am 18.05.21 um 07:59 schrieb Daniel Vetter:

On Tue, May 18, 2021 at 12:49 AM Jason Ekstrand  wrote:

On Mon, May 17, 2021 at 3:15 PM Daniel Vetter  wrote:

On Mon, May 17, 2021 at 9:38 PM Christian König
 wrote:

Am 17.05.21 um 17:04 schrieb Daniel Vetter:

On Mon, May 17, 2021 at 04:11:18PM +0200, Christian König wrote:

We had a long outstanding problem in amdgpu that buffers exported to
user drivers by DMA-buf serialize all command submissions using them.

In other words we can't compose the buffer with different engines and
then send it to another driver for display further processing.

This was added to work around the fact that i915 didn't want to wait
for shared fences in the dma_resv objects before displaying a buffer.

Since this problem is now causing issues with Vulkan we need to find a
better solution for that.

The patch set here tries to do this by adding a usage flag to the
shared fences noting when and how they should participate in implicit
synchronization.

So the way this is fixed in every other vulkan driver is that vulkan
userspace sets flags in the CS ioctl when it wants to synchronize with
implicit sync. This gets you mostly there. Last time I checked amdgpu
isn't doing this, and yes that's broken.

And exactly that is a really bad approach as far as I can see. The
Vulkan stack on top simply doesn't know when to set this flag during CS.

Adding Jason for the Vulkan side of things, because this isn't how I
understand this works.

But purely from a kernel pov your patches are sketchy for two reasons:

- we reinstate the amdgpu special case of not setting exclusive fences

- you only fix the single special case of i915 display, nothing else

That's not how a cross driver interface works. And if you'd do this
properly, you'd be back to all the same sync fun you originally had,
with all the same fallout.

I think I'm starting to see what Christian is trying to do here and I
think there likely is a real genuine problem here.  I'm not convinced
this is 100% of a solution but there might be something real.  Let me
see if I can convince you or if I just make a hash of things. :-)

The problem, once again, comes down to memory fencing vs. execution
fencing and the way that we've unfortunately tied them together in the
kernel.  With the current architecture, the only way to get proper
write-fence semantics for implicit sync is to take an exclusive fence
on the buffer.  This implies two things:

  1. You have to implicitly wait on EVERY fence on the buffer before
you can start your write-fenced operation

  2. No one else can start ANY operation which accesses that buffer
until you're done.
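The two implications above can be sketched as a toy model (hypothetical Python; the real kernel object is struct dma_resv with its shared and exclusive fence slots, this only models the semantics being discussed):

```python
class ToyResv:
    """Toy model of a reservation object: a list of shared (read) fences
    and a single exclusive (write) fence slot."""
    def __init__(self):
        self.shared = []        # read/metadata fences
        self.exclusive = None   # write fence

    def add_shared(self, fence):
        self.shared.append(fence)

    def add_exclusive(self, fence):
        # Implication 1: the write must wait on EVERY existing fence ...
        deps = list(self.shared) + ([self.exclusive] if self.exclusive else [])
        # ... and implication 2: it replaces them all, so any later
        # operation on this buffer must wait on this single fence.
        self.shared = []
        self.exclusive = fence
        return deps

resv = ToyResv()
resv.add_shared("memory-fence-from-driver-A")   # e.g. an eviction/TLB fence
deps = resv.add_exclusive("write-fence-from-driver-B")
```

In this model a pure memory-management fence from driver A ends up in the dependency set of driver B's write fence, which is exactly the unwanted coupling of memory fencing and execution fencing described above.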


Yes, exactly that. You absolutely nailed it.

I unfortunately also have a 3rd use case:

3. Operations which shouldn't participate in any syncing, but only 
affect the memory management.


This is basically our heavyweight TLB flush after unmapping the BO from 
somebody's page tables. Nobody should ever be concerned about it for any 
form of synchronization, but memory management is not allowed to reuse or 
move the buffer before the operation is completed.




Let's say that you have a buffer which is shared between two drivers A
and B and let's say driver A has thrown a fence on it just to ensure
that the BO doesn't get swapped out to disk until it's at a good
stopping point.  Then driver B comes along and wants to throw a
write-fence on it.  Suddenly, your memory fence from driver A causes
driver B to have to stall waiting for a "good" time to throw in a
fence.  It sounds like this is the sort of scenario that Christian is
running into.  And, yes, with certain Vulkan drivers being a bit
sloppy about exactly when they throw in write fences, I could see it
being a real problem.

Yes this is a potential problem, and on the i915 side we need to do
some shuffling here most likely. Especially due to discrete, but the
problem is pre-existing. tbh I forgot about the implications here
until I pondered this again yesterday evening.

But afaiui the amdgpu code and winsys in mesa, this isn't (yet) the
problem amd vk drivers have. The issue is that with amdgpu, all you
supply are the following bits at CS time:
- list of always mapped private buffers, which is implicit and O(1) in
the kernel fastpath
- additional list of shared buffers that are used by the current CS

I didn't check how exactly that works wrt winsys buffer ownership, but
the thing is that on the kernel side _any_ buffer in there is treated
as an implicitly sync'ed write. Which means if you render your winsys
with a bunch of command submission split over 3d and compute pipes,
you end up with horrendous amounts of oversync.


What are you talking about? We have no sync at all for submissions from 
the same client.



The reason for this is that amdgpu decided to go with a different
implicit sync model than everyone else:
- within an drm file everything is unsynced and left to userspace to
handle, amdgpu.ko only ever sets the share

Re: [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Thomas Hellström



On 5/18/21 2:09 PM, Christian König wrote:

Am 18.05.21 um 14:04 schrieb Thomas Hellström:


On 5/18/21 1:55 PM, Christian König wrote:



Am 18.05.21 um 10:26 schrieb Thomas Hellström:

The internal ttm_bo_util memcpy uses vmap functionality, and while it
probably might be possible to use it for copying in- and out of
sglist represented io memory, using io_mem_reserve() / io_mem_free()
callbacks, that would cause problems with fault().
Instead, implement a method mapping page-by-page using kmap_local()
semantics. As an additional benefit we then avoid the occasional 
global
TLB flushes of vmap() and consuming vmap space, elimination of a 
critical
point of failure and with a slight change of semantics we could 
also push

the memcpy out async for testing and async driver development purposes.
Pushing out async can be done since there is no memory allocation 
going on

that could violate the dma_fence lockdep rules.

For copies from iomem, use the WC prefetching memcpy variant for
additional speed.

Note that drivers that don't want to use struct io_mapping but 
relies on

memremap functionality, and that don't want to use scatterlists for
VRAM may well define specialized (hopefully reusable) iterators for 
their

particular environment.


In general yes please since I have that as TODO for TTM for a very 
long time.


But I would prefer to fix the implementation in TTM instead and give 
it proper cursor handling.


Amdgpu is also using page based iomem and we are having similar 
workarounds in place there as well.


I think it makes sense to unify this inside TTM and remove the old 
memcpy util function when done.


Regards,
Christian.


Christian,

I was thinking that when we replace bo.mem with a pointer (and perhaps 
have a driver callback to allocate the bo->mem), we could embed a 
struct ttm_kmap_iter and use it for all mapping in one way or another. 
That would mean perhaps landing this in i915 now and sorting out the 
unification once the struct ttm_resource / struct ttm_buffer_object 
separation has landed?


That stuff is ready, reviewed and I'm just waiting for some amdgpu 
changes to land in drm-misc-next to push it.


But yes in general an iterator for the resource object sounds like the 
right plan to me as well.


Christian.


OK, so then are you OK with landing this in i915 for now? That would 
also ofc mean the export you NAK'd but strictly for this memcpy use 
until we merge it with TTM?


/Thomas





/Thomas






Re: [PATCH v5 1/2] drm/bridge: anx7625: refactor power control to use runtime PM framework

2021-05-18 Thread Robert Foss
Series applied to drm-misc-next

https://cgit.freedesktop.org/drm/drm-misc/commit/?id=60487584a79abd763570b54d59e6aad586d64c7b


Re: [PATCH 09/11] dma-buf: add shared fence usage flags

2021-05-18 Thread Christian König

Am 17.05.21 um 22:36 schrieb Daniel Vetter:

On Mon, May 17, 2021 at 04:11:27PM +0200, Christian König wrote:

Add usage flags for shared fences and improve the documentation.

This allows drivers to better specify what the shared fences
are doing with the resource.

Signed-off-by: Christian König 
diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 16b869d9b1d6..c9bbc4630afc 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -700,7 +700,7 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo,
return ret;
}
  
-	dma_resv_add_shared_fence(bo->base.resv, fence);
+   dma_resv_add_shared(bo->base.resv, fence, DMA_RESV_USAGE_RW);

Entirely aside, but I ended up scratching my head a lot for why exactly
this here is a shared fence, and why that's ok. Since just looking at this
it seems like waiting for the memory allocation to actually be owned by
this driver is optional.

Is this ok because the next thing we'll do is a move, which will then set
the exclusive fence here. Which will then wait on the shared one here, so
it doesn't matter? Or well, allows us to pipeline the eviction of ttm_man
against whatever might be currently keeping the bo busy in it's current
place?


Yes, exactly that.

We just need to make sure that the new BO location isn't used before the 
fence is completed, but we can't use the exclusive slot because we have 
no guarantee at all that the move fence signals in the right order.


Regards,
Christian.



Might be good candidate to explain this in a comment or something like
that.
-Daniel




Re: [PATCH] drm: bridge: cdns-mhdp8546: Fix PM reference leak in cdns_mhdp_probe()

2021-05-18 Thread Robert Foss
On Tue, 18 May 2021 at 09:42, Johan Hovold  wrote:
>
> On Mon, May 17, 2021 at 11:27:38AM +0200, Robert Foss wrote:
> > Hey Yu,
> >
> > On Mon, 17 May 2021 at 10:08, Yu Kuai  wrote:
> > >
> > > pm_runtime_get_sync() will increment the PM usage counter even when it
> > > fails. Forgetting the put operation will result in a reference leak here.
> > > Fix it by replacing it with pm_runtime_resume_and_get() to keep the usage
> > > counter balanced.
> > >
> > > Reported-by: Hulk Robot 
> > > Signed-off-by: Yu Kuai 
> > > ---
> > >  drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> > > index 0cd8f40fb690..305489d48c16 100644
> > > --- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> > > +++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
> > > @@ -2478,7 +2478,7 @@ static int cdns_mhdp_probe(struct platform_device *pdev)
> > > clk_prepare_enable(clk);
> > >
> > > pm_runtime_enable(dev);
> > > -   ret = pm_runtime_get_sync(dev);
> > > +   ret = pm_runtime_resume_and_get(dev);
> > > if (ret < 0) {
> > > dev_err(dev, "pm_runtime_get_sync failed\n");

This error message is a bit confusing now; could you update it?

> > > pm_runtime_disable(dev);
> >
> > The code is correct as it is. If pm_runtime_get_sync() fails and
> > increments[1] the pm.usage_count variable, that isn't a problem since
> > pm_runtime_disable() disables pm, and resets pm.usage_count variable
> > to zero[2].
>
> No it doesn't; pm_runtime_disable() does not reset the counter and you
> still need to decrement the usage count when pm_runtime_get_sync()
> fails.

Thanks for chiming in Johan, you're absolutely right and I must have
misread something.

With the above fix, feel free to add my r-b.

Reviewed-by: Robert Foss 


Re: [PATCH] dt-bindings: display: bridge: lvds-codec: Fix spacing

2021-05-18 Thread Rob Herring
On Sun, May 16, 2021 at 12:48:05AM +0300, Laurent Pinchart wrote:
> Hi Marek,
> 
> Thank you for the patch.
> 
> On Sat, May 15, 2021 at 10:39:32PM +0200, Marek Vasut wrote:
> > Add missing spaces to make the diagrams readable, no functional change.
> 
> Looks better indeed. The patch view looks bad though, because of the
> tabs. Maybe you could replace them with spaces, while at it ?

It's best to not have tabs in yaml. And if we ever generate any 
documentation out of the schema, the tabs would probably cause issues.

Rob


Re: [PATCH v2 04/15] drm/ttm: Export functions to initialize and finalize the ttm range manager standalone

2021-05-18 Thread Thomas Hellström



On 5/18/21 1:51 PM, Christian König wrote:

Am 18.05.21 um 10:26 schrieb Thomas Hellström:
i915 mock selftests are run without the device set up. In order to be 
able
to run the region related mock selftests, export functions in order 
for the

TTM range manager to be set up without a device to attach it to.


From the code it looks good, but to be honest I don't think that this 
makes much sense from the organizational point of view.


If a self test exercises internals of TTM it should be moved into TTM 
as well.


This particular selftest actually exercises i915 memory regions, which are 
a level above TTM, but the memory regions are backed by TTM. Since they 
are mock selftests they don't have a TTM device. For the buddy allocator 
the situation would be the same, but there we have selftests that 
exercise the allocator standalone, and those would probably fit best 
into a TTM selftest infrastructure.


Although in this particular case, we could of course add a mock TTM 
device and be done. Pls let me know what you think.


/Thomas




Re: [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Christian König

Am 18.05.21 um 14:52 schrieb Thomas Hellström:


On 5/18/21 2:09 PM, Christian König wrote:

Am 18.05.21 um 14:04 schrieb Thomas Hellström:


On 5/18/21 1:55 PM, Christian König wrote:



Am 18.05.21 um 10:26 schrieb Thomas Hellström:

The internal ttm_bo_util memcpy uses vmap functionality, and while it
probably might be possible to use it for copying in- and out of
sglist represented io memory, using io_mem_reserve() / io_mem_free()
callbacks, that would cause problems with fault().
Instead, implement a method mapping page-by-page using kmap_local()
semantics. As an additional benefit we then avoid the occasional 
global
TLB flushes of vmap() and consuming vmap space, elimination of a 
critical
point of failure and with a slight change of semantics we could 
also push
the memcpy out async for testing and async driver development purposes.
Pushing out async can be done since there is no memory allocation 
going on

that could violate the dma_fence lockdep rules.

For copies from iomem, use the WC prefetching memcpy variant for
additional speed.

Note that drivers that don't want to use struct io_mapping but 
relies on

memremap functionality, and that don't want to use scatterlists for
VRAM may well define specialized (hopefully reusable) iterators 
for their

particular environment.


In general yes please since I have that as TODO for TTM for a very 
long time.


But I would prefer to fix the implementation in TTM instead and 
give it proper cursor handling.


Amdgpu is also using page based iomem and we are having similar 
workarounds in place there as well.


I think it makes sense to unify this inside TTM and remove the old 
memcpy util function when done.


Regards,
Christian.


Christian,

I was thinking that when we replace bo.mem with a pointer (and perhaps 
have a driver callback to allocate the bo->mem), we could embed a 
struct ttm_kmap_iter and use it for all mapping in one way or another. 
That would mean perhaps landing this in i915 now and sorting out the 
unification once the struct ttm_resource / struct ttm_buffer_object 
separation has landed?


That stuff is ready, reviewed and I'm just waiting for some amdgpu 
changes to land in drm-misc-next to push it.


But yes in general an iterator for the resource object sounds like 
the right plan to me as well.


Christian.


OK, so then are you OK with landing this in i915 for now? That would 
also ofc mean the export you NAK'd but strictly for this memcpy use 
until we merge it with TTM?


Well you can of course prototype that in i915, but I really don't want 
to export the TT functions upstream.


Can we cleanly move that functionality into TTM instead?

Christian.




/Thomas





/Thomas








Re: [PATCH v2 04/15] drm/ttm: Export functions to initialize and finalize the ttm range manager standalone

2021-05-18 Thread Christian König




Am 18.05.21 um 15:06 schrieb Thomas Hellström:


On 5/18/21 1:51 PM, Christian König wrote:

Am 18.05.21 um 10:26 schrieb Thomas Hellström:
i915 mock selftests are run without the device set up. In order to 
be able
to run the region related mock selftests, export functions in order 
for the

TTM range manager to be set up without a device to attach it to.


From the code it looks good, but to be honest I don't think that this 
makes much sense from the organizational point of view.


If a self test exercises internals of TTM it should be moved into TTM 
as well.


This particular selftest actually exercises i915 memory regions, which are 
a level above TTM, but the memory regions are backed by TTM. Since 
they are mock selftests they don't have a TTM device. For the buddy 
allocator the situation would be the same, but there we have selftests 
that exercise the allocator standalone, and those would probably fit 
best into a TTM selftest infrastructure.


Although in this particular case, we could of course add a mock TTM 
device and be done. Pls let me know what you think.


Well if you can do that with a dummy device then that would certainly be 
the preferred option.


Christian.



/Thomas






Re: [PATCH v8 5/8] mm: Device exclusive memory access

2021-05-18 Thread Alistair Popple
On Tuesday, 18 May 2021 12:08:34 PM AEST Peter Xu wrote:
> > v6:
> > * Fixed a bisectablity issue due to incorrectly applying the rename of
> > 
> >   migrate_pgmap_owner to the wrong patches for Nouveau and hmm_test.
> > 
> > v5:
> > * Renamed range->migrate_pgmap_owner to range->owner.
> 
> May be nicer to mention this rename in commit message (or make it a separate
> patch)?

Ok, I think if it needs to be explicitly called out in the commit message it 
should be a separate patch. Originally I thought the change was _just_ small 
enough to include here, but this patch has since grown so I'll split it out.
 


> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 0e25d829f742..3a1ce4ef9276 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -193,6 +193,10 @@ int page_referenced(struct page *, int is_locked,
> > 
> >  bool try_to_migrate(struct page *page, enum ttu_flags flags);
> >  bool try_to_unmap(struct page *, enum ttu_flags flags);
> > 
> > +int make_device_exclusive_range(struct mm_struct *mm, unsigned long start,
> > + unsigned long end, struct page **pages,
> > + void *arg);
> > +
> > 
> >  /* Avoid racy checks */
> >  #define PVMW_SYNC  (1 << 0)
> >  /* Look for migarion entries rather than present PTEs */
> > 
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index 516104b9334b..7a3c260146df 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -63,9 +63,11 @@ static inline int current_is_kswapd(void)
> > 
> >   * to a special SWP_DEVICE_* entry.
> >   */
> 
> Should we add another short description for the newly added two types?
> Otherwise the reader could get confused assuming the above comment is
> explaining all four types, while it is for SWP_DEVICE_{READ|WRITE} only?

Good idea, I can see how that could be confusing. Will add a short 
description.



> > diff --git a/mm/memory.c b/mm/memory.c
> > index 3a5705cfc891..556ff396f2e9 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -700,6 +700,84 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
> >  }
> >  #endif
> > 
> > +static void restore_exclusive_pte(struct vm_area_struct *vma,
> > +   struct page *page, unsigned long address,
> > +   pte_t *ptep)
> > +{
> > + pte_t pte;
> > + swp_entry_t entry;
> > +
> > + pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot)));
> > + if (pte_swp_soft_dirty(*ptep))
> > + pte = pte_mksoft_dirty(pte);
> > +
> > + entry = pte_to_swp_entry(*ptep);
> > + if (pte_swp_uffd_wp(*ptep))
> > + pte = pte_mkuffd_wp(pte);
> > + else if (is_writable_device_exclusive_entry(entry))
> > + pte = maybe_mkwrite(pte_mkdirty(pte), vma);
> > +
> > + set_pte_at(vma->vm_mm, address, ptep, pte);
> > +
> > + /*
> > +  * No need to take a page reference as one was already
> > +  * created when the swap entry was made.
> > +  */
> > + if (PageAnon(page))
> > + page_add_anon_rmap(page, vma, address, false);
> > + else
> > + page_add_file_rmap(page, false);
> > +
> > + if (vma->vm_flags & VM_LOCKED)
> > + mlock_vma_page(page);
> > +
> > + /*
> > +  * No need to invalidate - it was non-present before. However
> > +  * secondary CPUs may have mappings that need invalidating.
> > +  */
> > + update_mmu_cache(vma, address, ptep);
> > +}
> > +
> > +/*
> > + * Tries to restore an exclusive pte if the page lock can be acquired
> > + * without sleeping. Returns 0 on success or -EBUSY if the page could
> > + * not be locked or the entry no longer points at locked_page in which
> > + * case locked_page should be locked before retrying the call.
> > + */
> > +static unsigned long
> > +try_restore_exclusive_pte(struct mm_struct *src_mm, pte_t *src_pte,
> > +   struct vm_area_struct *vma, unsigned long addr,
> > +   struct page **locked_page)
> > +{
> > + swp_entry_t entry = pte_to_swp_entry(*src_pte);
> > + struct page *page = pfn_swap_entry_to_page(entry);
> > +
> > + if (*locked_page) {
> > + /* The entry changed, retry */
> > + if (unlikely(*locked_page != page)) {
> > + unlock_page(*locked_page);
> > + put_page(*locked_page);
> > + *locked_page = page;
> > + return -EBUSY;
> > + }
> > + restore_exclusive_pte(vma, page, addr, src_pte);
> > + unlock_page(page);
> > + put_page(page);
> > + *locked_page = NULL;
> > + return 0;
> > + }
> > +
> > + if (trylock_page(page)) {
> > + restore_exclusive_pte(vma, page, addr, src_pte);
> > + unlock_page(page);
> > + return 0;
> > + }

Re: [PATCH] drm/bridge: ti-sn65dsi86: fix a ternary type promotion bug

2021-05-18 Thread Robert Foss
Hey Dan,

Thanks for submitting this.

On Tue, 18 May 2021 at 11:20, Dan Carpenter  wrote:
>
> The ti_sn_aux_transfer() function returns ssize_t (signed long).  It's
> supposed to return negative error codes or the number of bytes
> transferred.  The "ret" variable is int and the "len" variable is
> unsigned int.
>
> The problem is that with a ternary like this, the negative int is first
> type promoted to unsigned int to match "len"; at this point it is a high
> positive value.  Then when it is type promoted to ssize_t (s64) it
> remains a high positive value instead of sign extending and becoming
> negative again.
>
> Fix this by removing the ternary.
>
> Fixes: b137406d9679 ("drm/bridge: ti-sn65dsi86: If refclk, DP AUX can happen w/out pre-enable")
> Signed-off-by: Dan Carpenter 
> ---
>  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c 
> b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> index bb0a0e1c6341..45a2969afb2b 100644
> --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> @@ -1042,7 +1042,9 @@ static ssize_t ti_sn_aux_transfer(struct drm_dp_aux *aux,
> pm_runtime_mark_last_busy(pdata->dev);
> pm_runtime_put_autosuspend(pdata->dev);
>
> -   return ret ? ret : len;
> +   if (ret)
> +   return ret;
> +   return len;
>  }
>

Reviewed-by: Robert Foss 

Applying to drm-misc-fixes.


Re: [PATCH V4 1/2] dt-bindings: display: bridge: lvds-codec: Document pixel data sampling edge select

2021-05-18 Thread Rob Herring
On Sat, May 15, 2021 at 10:43:15PM +0200, Marek Vasut wrote:
> The OnSemi FIN3385 Parallel-to-LVDS encoder has a dedicated input line to
> select input pixel data sampling edge. Add DT property "pclk-sample", not
> the same as the one used by display timings but rather the same as used by
> media, to define the pixel data sampling edge.
> 
> Signed-off-by: Marek Vasut 
> Cc: Alexandre Torgue 
> Cc: Andrzej Hajda 
> Cc: Antonio Borneo 
> Cc: Benjamin Gaignard 
> Cc: Biju Das 
> Cc: Laurent Pinchart 
> Cc: Maxime Coquelin 
> Cc: Philippe Cornu 
> Cc: Rob Herring 
> Cc: Sam Ravnborg 
> Cc: Vincent Abriou 
> Cc: Yannick Fertre 
> Cc: devicet...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-st...@st-md-mailman.stormreply.com
> To: dri-devel@lists.freedesktop.org
> ---
> V4: New patch split from combined V3
> ---
>  .../bindings/display/bridge/lvds-codec.yaml| 18 ++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/display/bridge/lvds-codec.yaml 
> b/Documentation/devicetree/bindings/display/bridge/lvds-codec.yaml
> index 304a1367faaa..f4dd16bd69d2 100644
> --- a/Documentation/devicetree/bindings/display/bridge/lvds-codec.yaml
> +++ b/Documentation/devicetree/bindings/display/bridge/lvds-codec.yaml
> @@ -64,6 +64,14 @@ properties:
>- port@0
>- port@1
>  
> +  pclk-sample:
> +description:
> +  Data sampling on rising or falling edge.
> +enum:
> +  - 0  # Falling edge
> +  - 1  # Rising edge
> +default: 0

This is already defined in video-interfaces.yaml, why are you redefining 
it here?

It's also defined to be an endpoint property, so move it there and 
reference video-interfaces.yaml.

> +
>powerdown-gpios:
>  description:
>The GPIO used to control the power down line of this device.
> @@ -71,6 +79,16 @@ properties:
>  
>power-supply: true
>  
> +if:
> +  not:
> +properties:
> +  compatible:
> +contains:
> +  const: lvds-encoder
> +then:
> +  properties:
> +pclk-sample: false

This constraint would be difficult to express with the property in an 
endpoint. I'd just drop it.

Rob


Re: [PATCH AUTOSEL 5.12 5/5] tty: vt: always invoke vc->vc_sw->con_resize callback

2021-05-18 Thread Sasha Levin

On Tue, May 18, 2021 at 07:45:59AM +0200, Greg KH wrote:

On Mon, May 17, 2021 at 06:35:24PM -0700, Linus Torvalds wrote:

On Mon, May 17, 2021 at 6:09 PM Sasha Levin  wrote:
>
> From: Tetsuo Handa 
>
> [ Upstream commit ffb324e6f874121f7dce5bdae5e05d02baae7269 ]

So I think the commit is fine, and yes, it should be applied to
stable, but it's one of those "there were three different patches in
as many days to fix the problem, and this is the right one, but maybe
stable should hold off for a while to see that there aren't any
problem reports".

I don't think there will be any problems from this, but while the
patch is tiny, it's conceptually quite a big change to something that
people haven't really touched for a long time.

So use your own judgement, but it might be a good idea to wait a week
before backporting this to see if anything screams.


I was going to wait a few weeks for this, and the other vt patches that
were marked with cc: stable@ before queueing them up.


I'll drop it from my queue then.

--
Thanks,
Sasha


Re: [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Thomas Hellström



On 5/18/21 3:08 PM, Christian König wrote:

On 18.05.21 at 14:52, Thomas Hellström wrote:


On 5/18/21 2:09 PM, Christian König wrote:

On 18.05.21 at 14:04, Thomas Hellström wrote:


On 5/18/21 1:55 PM, Christian König wrote:



On 18.05.21 at 10:26, Thomas Hellström wrote:
The internal ttm_bo_util memcpy uses vmap functionality, and while it 
might be possible to use it for copying in and out of sglist-represented 
io memory, using io_mem_reserve() / io_mem_free() callbacks, that would 
cause problems with fault(). Instead, implement a method mapping 
page-by-page using kmap_local() semantics. As an additional benefit we 
then avoid the occasional global TLB flushes of vmap() and consuming 
vmap space, eliminate a critical point of failure, and with a slight 
change of semantics we could also push the memcpy out async for testing 
and async driver development purposes. Pushing out async can be done 
since there is no memory allocation going on that could violate the 
dma_fence lockdep rules.

For copies from iomem, use the WC prefetching memcpy variant for
additional speed.

Note that drivers that don't want to use struct io_mapping but rely on 
memremap functionality, and that don't want to use scatterlists for 
VRAM, may well define specialized (hopefully reusable) iterators for 
their particular environment.
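
For reference, the page-by-page scheme could look roughly like this. A 
sketch only; the iterator struct and ops names here are hypothetical, 
invented for illustration, not the actual patch:

```c
/*
 * Hypothetical sketch: copy a resource one page at a time through
 * short-lived local mappings instead of vmap()ing the whole buffer.
 */
struct ttm_kmap_iter;

struct ttm_kmap_iter_ops {
	/* Map page @i with kmap_local()-like semantics. */
	void *(*map_local)(struct ttm_kmap_iter *iter, pgoff_t i);
	void (*unmap_local)(struct ttm_kmap_iter *iter, void *vaddr);
};

struct ttm_kmap_iter {
	const struct ttm_kmap_iter_ops *ops;
};

static void ttm_move_memcpy(struct ttm_kmap_iter *dst,
			    struct ttm_kmap_iter *src,
			    pgoff_t num_pages)
{
	pgoff_t i;

	for (i = 0; i < num_pages; ++i) {
		void *d = dst->ops->map_local(dst, i);
		void *s = src->ops->map_local(src, i);

		/* A WC-prefetching memcpy variant would go here for
		 * copies from iomem; plain memcpy shown for brevity. */
		memcpy(d, s, PAGE_SIZE);

		src->ops->unmap_local(src, s);
		dst->ops->unmap_local(dst, d);
	}
}
```

Since nothing in the copy loop allocates memory, such a loop could also 
be pushed out async without tripping the dma_fence lockdep rules.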


In general yes please since I have that as TODO for TTM for a very 
long time.


But I would prefer to fix the implementation in TTM instead and 
give it proper cursor handling.


Amdgpu is also using page based iomem and we are having similar 
workarounds in place there as well.


I think it makes sense to unify this inside TTM and remove the old 
memcpy util function when done.


Regards,
Christian.


Christian,

I was thinking, when we replace bo.mem with a pointer (and perhaps have 
a driver callback to allocate the bo->mem), we could perhaps embed a 
struct ttm_kmap_iter and use it for all mapping in one way or another. 
That would mean perhaps landing this in i915 now and sorting out the 
unification once the struct ttm_resource, struct ttm_buffer_object 
separation has landed?


That stuff is ready, reviewed and I'm just waiting for some amdgpu 
changes to land in drm-misc-next to push it.


But yes in general an iterator for the resource object sounds like 
the right plan to me as well.


Christian.


OK, so then are you OK with landing this in i915 for now? That would 
also ofc mean the export you NAK'd but strictly for this memcpy use 
until we merge it with TTM?


Well you can of course prototype that in i915, but I really don't want 
to export the TT functions upstream.


I understand, I once had the same thoughts trying to avoid that as far 
as possible, so this function was actually then added to the ttm_bo 
interface (hence the awkward naming) as a helper for drivers 
implementing move(), essentially a very special case of 
ttm_bo_move_accel_cleanup(), but anyway, see below:




Can we cleanly move that functionality into TTM instead?


I'll take a look at that, but I think we'd initially be having iterators 
mimicking the current move_memcpy() for the linear iomem !WC cases, 
hope that's OK.

/Thomas




Christian.




/Thomas





/Thomas








Re: [RFC] Add DMA_RESV_USAGE flags

2021-05-18 Thread Daniel Stone
On Tue, 18 May 2021 at 13:49, Christian König
 wrote:
> On 18.05.21 at 07:59, Daniel Vetter wrote:
> > First step in fixing that is (and frankly was since years) to fix the
> > amdgpu CS so winsys can pass along a bunch of flags about which CS
> > should actually set the exclusive fence, so that you stop oversyncing
> > so badly. Ofc old userspace needs to keep oversyncing forever, no way
> > to fix that.
>
> Exactly that is what we don't want to do because the winsys has no idea
> when to sync and when not to sync.

Hey, we're typing that out as fast as we can ... it's just that you
keep reinventing sync primitives faster than we can ship support for
them :P


Re: [PATCH v2 08/15] drm/i915/ttm Add a generic TTM memcpy move for page-based iomem

2021-05-18 Thread Christian König

On 18.05.21 at 15:24, Thomas Hellström wrote:


On 5/18/21 3:08 PM, Christian König wrote:

On 18.05.21 at 14:52, Thomas Hellström wrote:


On 5/18/21 2:09 PM, Christian König wrote:

On 18.05.21 at 14:04, Thomas Hellström wrote:


On 5/18/21 1:55 PM, Christian König wrote:



On 18.05.21 at 10:26, Thomas Hellström wrote:
The internal ttm_bo_util memcpy uses vmap functionality, and while it 
might be possible to use it for copying in and out of sglist-represented 
io memory, using io_mem_reserve() / io_mem_free() callbacks, that would 
cause problems with fault(). Instead, implement a method mapping 
page-by-page using kmap_local() semantics. As an additional benefit we 
then avoid the occasional global TLB flushes of vmap() and consuming 
vmap space, eliminate a critical point of failure, and with a slight 
change of semantics we could also push the memcpy out async for testing 
and async driver development purposes. Pushing out async can be done 
since there is no memory allocation going on that could violate the 
dma_fence lockdep rules.

For copies from iomem, use the WC prefetching memcpy variant for
additional speed.

Note that drivers that don't want to use struct io_mapping but rely on 
memremap functionality, and that don't want to use scatterlists for 
VRAM, may well define specialized (hopefully reusable) iterators for 
their particular environment.


In general yes please since I have that as TODO for TTM for a 
very long time.


But I would prefer to fix the implementation in TTM instead and 
give it proper cursor handling.


Amdgpu is also using page based iomem and we are having similar 
workarounds in place there as well.


I think it makes sense to unify this inside TTM and remove the 
old memcpy util function when done.


Regards,
Christian.


Christian,

I was thinking, when we replace bo.mem with a pointer (and perhaps have 
a driver callback to allocate the bo->mem), we could perhaps embed a 
struct ttm_kmap_iter and use it for all mapping in one way or another. 
That would mean perhaps landing this in i915 now and sorting out the 
unification once the struct ttm_resource, struct ttm_buffer_object 
separation has landed?


That stuff is ready, reviewed and I'm just waiting for some amdgpu 
changes to land in drm-misc-next to push it.


But yes in general an iterator for the resource object sounds like 
the right plan to me as well.


Christian.


OK, so then are you OK with landing this in i915 for now? That would 
also ofc mean the export you NAK'd but strictly for this memcpy use 
until we merge it with TTM?


Well you can of course prototype that in i915, but I really don't 
want to export the TT functions upstream.


I understand, I once had the same thoughts trying to avoid that as far 
as possible, so this function was actually then added to the ttm_bo 
interface (hence the awkward naming) as a helper for drivers 
implementing move(), essentially a very special case of 
ttm_bo_move_accel_cleanup(), but anyway, see below:




Can we cleanly move that functionality into TTM instead?


I'll take a look at that, but I think we'd initially be having iterators 
mimicking the current move_memcpy() for the linear iomem !WC cases, 
hope that's OK.


Yeah, that's perfectly fine with me. I can tackle cleaning up all 
drivers and move over to the new implementation when that is fully complete.


As I said, we already have the same problem in amdgpu and only solved it 
by avoiding memcpy altogether.


Christian.



/Thomas




Christian.




/Thomas





/Thomas










Re: [PATCH] drm/bridge: ti-sn65dsi86: fix a ternary type promotion bug

2021-05-18 Thread Robert Foss
Since the commit it fixes is in drm-misc-next, I applied it there instead.

On Tue, 18 May 2021 at 15:20, Robert Foss  wrote:
>
> Hey Dan,
>
> Thanks for submitting this.
>
> On Tue, 18 May 2021 at 11:20, Dan Carpenter  wrote:
> >
> > The ti_sn_aux_transfer() function returns ssize_t (signed long).  It's
> > supposed to return negative error codes or the number of bytes
> > transferred.  The "ret" variable is int and the "len" variable is
> > unsigned int.
> >
> > The problem is that with a ternary like this, the negative int is first
> > type promoted to unsigned int to match "len"; at this point it is a high
> > positive value.  Then when it is type promoted to ssize_t (s64) it
> > remains a high positive value instead of sign extending and becoming
> > negative again.
> >
> > Fix this by removing the ternary.
> >
> > Fixes: b137406d9679 ("drm/bridge: ti-sn65dsi86: If refclk, DP AUX can happen w/out pre-enable")
> > Signed-off-by: Dan Carpenter 
> > ---
> >  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c 
> > b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> > index bb0a0e1c6341..45a2969afb2b 100644
> > --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> > +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> > @@ -1042,7 +1042,9 @@ static ssize_t ti_sn_aux_transfer(struct drm_dp_aux *aux,
> > pm_runtime_mark_last_busy(pdata->dev);
> > pm_runtime_put_autosuspend(pdata->dev);
> >
> > -   return ret ? ret : len;
> > +   if (ret)
> > +   return ret;
> > +   return len;
> >  }
> >
>
> Reviewed-by: Robert Foss 
>
> Applying to drm-misc-fixes.


Re: [Intel-gfx] [PATCH 4/4] i915: fix remap_io_sg to verify the pgprot

2021-05-18 Thread Thomas Hellström



On 5/18/21 3:24 PM, Christoph Hellwig wrote:

On Tue, May 18, 2021 at 08:46:44AM +0200, Thomas Hellström wrote:

And worse, if we prefault a user-space buffer object map using
remap_io_sg() and then zap some ptes using madvise(), the next time those
ptes are accessed, we'd trigger a new call to remap_io_sg() which would now
find already populated ptes. While the old code looks to just silently
overwrite those, it looks like the new code would BUG in remap_pte_range()?

How can you zap the PTEs using madvise?


Hmm, that's not possible with VM_PFNMAP. My bad. Should be OK then.

/Thomas



