Re: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support

2020-11-25 Thread Jason Gunthorpe
On Tue, Nov 24, 2020 at 04:16:58PM +0100, Daniel Vetter wrote:

> Compute is the worst, because opencl is widely considered a mistake (maybe
> opencl 3 is better, but nvidia is stuck on 1.2). The actually used stuff is
> cuda (nvidia-only), rocm (amd-only) and now with intel also playing we
> have xe (intel-only).

> It's pretty glorious :-/

I enjoyed how the Intel version of CUDA is called "OneAPI" not "Third
API" ;)

Hopefully xe compute won't leave a lot of half-finished, abandoned
kernel code like Xeon Phi did :(

> Also I think we discussed this already, but for actual p2p the intel
> patches aren't in upstream yet. We have some internally, but with very
> broken locking (in the process of getting fixed up, but it's taking time).

Someone needs to say this test works on a real system with an
unpatched upstream driver.

I thought AMD had the needed parts merged?

Jason
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


RE: Linux 5.10-rc4; graphics alignment

2020-11-25 Thread David Laight
From: David Laight
> Sent: 20 November 2020 15:39
> 
> From: Thomas Zimmermann
> > Sent: 20 November 2020 13:42
> ...
> > I did a diff from v5.10-rc4 to drm-tip to look for suspicious changes.
> > Some candidates are
> >
> >8e3784dfef8a ("drm/ast: Reload gamma LUT after changing primary
> > plane's color format")
> 
> Ok, that one fixes the screen colours (etc).
> So 8e3784dfef8a was good and then HEAD^ was bad.
> 
> I might try to bisect the breakage.
> 
> The stack splat is entirely different.
> I'll try to bisect that on Linus's tree.

The good news is I'm not getting the stack splat on rc5.
I'm not sure I can be bothered to find out when :-)

Applying 8e3784dfef8a to rc5 by hand also fixes the display colours.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Miguel Ojeda
On Tue, Nov 24, 2020 at 11:24 PM Finn Thain  wrote:
>
> These statements are not "missing" unless you presume that code written
> before the latest de facto language spec was written should somehow be
> held to that spec.

There is no "language spec" the kernel adheres to. Even if it did,
kernel code is not frozen. If an improvement is found, it should be
applied.

> If the 'fallthrough' statement is not part of the latest draft spec then
> we should ask why not before we embrace it. Being that the kernel still
> prefers -std=gnu89 you might want to consider what has prevented
> -std=gnu99 or -std=gnu2x etc.

The C standard has nothing to do with this. We have used compiler
extensions of several kinds for many years. Even discounting those
extensions, the kernel does not conform to standard C anyway, due to
e.g. strict aliasing. I am not sure what you are trying to argue here.

But, since you insist: yes, the `fallthrough` attribute is in the
current C2x draft.

Cheers,
Miguel


[PATCHv10 5/9] drm/msm: rearrange the gpu_rmw() function

2020-11-25 Thread Sai Prakash Ranjan
From: Sharat Masetty 

The register read-modify-write construct is generic enough
that it can be used by other subsystems as needed; create
a more generic rmw() function and have gpu_rmw() use
this new function.

Signed-off-by: Sharat Masetty 
Reviewed-by: Jordan Crouse 
Signed-off-by: Sai Prakash Ranjan 
---
 drivers/gpu/drm/msm/msm_drv.c | 8 
 drivers/gpu/drm/msm/msm_drv.h | 1 +
 drivers/gpu/drm/msm/msm_gpu.h | 5 +
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 49685571dc0e..a1e22b974b77 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -180,6 +180,14 @@ u32 msm_readl(const void __iomem *addr)
return val;
 }
 
+void msm_rmw(void __iomem *addr, u32 mask, u32 or)
+{
+   u32 val = msm_readl(addr);
+
+   val &= ~mask;
+   msm_writel(val | or, addr);
+}
+
 struct msm_vblank_work {
struct work_struct work;
int crtc_id;
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index b9dd8f8f4887..655b3b0424a1 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -478,6 +478,7 @@ void __iomem *msm_ioremap_quiet(struct platform_device 
*pdev, const char *name,
const char *dbgname);
 void msm_writel(u32 data, void __iomem *addr);
 u32 msm_readl(const void __iomem *addr);
+void msm_rmw(void __iomem *addr, u32 mask, u32 or);
 
 struct msm_gpu_submitqueue;
 int msm_submitqueue_init(struct drm_device *drm, struct msm_file_private *ctx);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 6c9e1fdc1a76..b2b419277953 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -246,10 +246,7 @@ static inline u32 gpu_read(struct msm_gpu *gpu, u32 reg)
 
 static inline void gpu_rmw(struct msm_gpu *gpu, u32 reg, u32 mask, u32 or)
 {
-   uint32_t val = gpu_read(gpu, reg);
-
-   val &= ~mask;
-   gpu_write(gpu, reg, val | or);
+   msm_rmw(gpu->mmio + (reg << 2), mask, or);
 }
 
 static inline u64 gpu_read64(struct msm_gpu *gpu, u32 lo, u32 hi)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
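For reference, the bit semantics being factored out can be sketched independently of MMIO. The standalone rmw() below is illustrative, not the driver's msm_rmw(), which reads and writes through __iomem pointers:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Standalone sketch of the read-modify-write semantics factored out by
 * this patch: clear the bits in @mask, then OR in @bits.  Mirrors
 * msm_rmw()'s "val = (val & ~mask) | or", but on a plain value rather
 * than an __iomem register so it can run anywhere.
 */
static uint32_t rmw(uint32_t val, uint32_t mask, uint32_t bits)
{
	return (val & ~mask) | bits;
}
```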



Re: [Freedreno] [PATCH 3/3] drm/msm/dpu: add support for clk and bw scaling for display

2020-11-25 Thread kalyan_t

On 2020-11-08 23:25, Amit Pundir wrote:

On Tue, 4 Aug 2020 at 21:09, Rob Clark  wrote:


On Thu, Jul 16, 2020 at 4:36 AM Kalyan Thota  
wrote:

>
> This change adds support to scale src clk and bandwidth as
> per composition requirements.
>
> Interconnect registration for bw has been moved to mdp
> device node from mdss to facilitate the scaling.
>
> Changes in v1:
>  - Address armv7 compilation issues with the patch (Rob)
>
> Signed-off-by: Kalyan Thota 

Reviewed-by: Rob Clark 



Hi Kalyan, Rob,

This patch broke the display on the PocoF1 phone
(sdm845-xiaomi-beryllium.dts) running AOSP.
I can boot to UI but the display is frozen soon after that and
dmesg is full of following errors:

[drm:dpu_core_perf_crtc_update:397] [dpu error]crtc-65: failed to
update bus bw vote
[drm:dpu_core_perf_crtc_check:203] [dpu error]exceeds bandwidth:
7649746kb > 680kb
[drm:dpu_crtc_atomic_check:969] [dpu error]crtc65 failed performance 
check -7

[drm:dpu_core_perf_crtc_check:203] [dpu error]exceeds bandwidth:
7649746kb > 680kb
[drm:dpu_crtc_atomic_check:969] [dpu error]crtc65 failed performance 
check -7

[drm:dpu_core_perf_crtc_check:203] [dpu error]exceeds bandwidth:
7649746kb > 680kb
[drm:dpu_crtc_atomic_check:969] [dpu error]crtc65 failed performance 
check -7


Here is the full dmesg https://pastebin.ubuntu.com/p/PcSdNgMnYw/.
Georgi pointed out following patch but it didn't help,
https://lore.kernel.org/dri-devel/20201027102304.945424-1-dmitry.barysh...@linaro.org/
Am I missing any other followup fix?

Regards,
Amit Pundir


Hi Amit,

Apologies for the delay.

I have gone through the logs and referred to the below panel file for 
the timings.

https://github.com/Matheus-Garbelini/Kernel-Sphinx-Pocophone-F1/blob/master/arch/arm64/boot/dts/qcom/dsi-panel-tianma-fhd-nt36672a-video.dtsi

If the above is the correct file, then the below could be the possible
root cause:


The panel's vertical back porch and pulse width are small, which causes
the per-layer prefill bw requirement to shoot up, since we currently do
not consider the front porch in the calculation. Can you please try the
patch attached to this email and give me your feedback? I'll then post
it as a formal change.
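The divisor selection in the attached patch (blanking alone, blanking plus front porch, or the latency-line count itself) can be sketched as a standalone helper; the name prefill_divisor is hypothetical, for illustration only:

```c
#include <assert.h>

/*
 * Hypothetical helper (name is illustrative) mirroring the divisor
 * selection in the attached dpu_plane patch.  The prefill bandwidth is
 * divided by the number of line-times available to fill the hw latency
 * buffers:
 *  - vertical blanking (vbp + vpw) alone, when it is large enough;
 *  - blanking plus front porch (vfp), when even prefetch cannot cover
 *    all the latency lines;
 *  - otherwise the latency-line count itself, since prefetch fills the
 *    remainder during the front porch.
 */
static unsigned int prefill_divisor(unsigned int vbp, unsigned int vpw,
				    unsigned int vfp,
				    unsigned int hw_latency_lines)
{
	if (vbp + vpw > hw_latency_lines)
		return vbp + vpw;
	if (vbp + vpw + vfp < hw_latency_lines)
		return vbp + vpw + vfp;
	return hw_latency_lines;
}
```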


Thanks,
Kalyan

_

Freedreno mailing list
freedr...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/freedreno
From 028fb47ccc5a3f8f8e51513bd2719aa14c68ac09 Mon Sep 17 00:00:00 2001
From: Kalyan Thota 
Date: Tue, 24 Nov 2020 02:39:52 -0800
Subject: [PATCH] drm: msm: dpu: consider front porch in the prefill
 calculation

In case of panels with a low vertical back porch and pulse width,
the prefill bw will increase, as we will have less time to fetch
and fill all the hw latency buffers.

For example: if hw_latency_lines = 24 and vbp+pw = 10, then we need
to fetch 24 lines of data in 10 line times. This will increase the
prefill bw requirement.

DPU hw can also fetch data during the front porch, provided prefetch
is enabled. Use the front porch in the prefill calculation as well,
since the driver enables prefetch when the blanking is not sufficient
to fill the latency lines.

Signed-off-by: Kalyan Thota 
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
index 7ea90d2..315b999 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c
@@ -151,7 +151,7 @@ static void _dpu_plane_calc_bw(struct drm_plane *plane,
 	u64 plane_bw;
 	u32 hw_latency_lines;
 	u64 scale_factor;
-	int vbp, vpw;
+	int vbp, vpw, vfp;
 
 	pstate = to_dpu_plane_state(plane->state);
 	mode = &plane->state->crtc->mode;
@@ -164,6 +164,7 @@ static void _dpu_plane_calc_bw(struct drm_plane *plane,
 	fps = drm_mode_vrefresh(mode);
 	vbp = mode->vtotal - mode->vsync_end;
 	vpw = mode->vsync_end - mode->vsync_start;
+	vfp = mode->vsync_start - mode->vdisplay;
 	hw_latency_lines =  dpu_kms->catalog->perf.min_prefill_lines;
 	scale_factor = src_height > dst_height ?
 		mult_frac(src_height, 1, dst_height) : 1;
@@ -176,7 +177,13 @@ static void _dpu_plane_calc_bw(struct drm_plane *plane,
 		src_width * hw_latency_lines * fps * fmt->bpp *
 		scale_factor * mode->vtotal;
 
-	do_div(plane_prefill_bw, (vbp+vpw));
+	if ((vbp+vpw) > hw_latency_lines)
+		do_div(plane_prefill_bw, (vbp+vpw));
+	else if ((vbp+vpw+vfp) < hw_latency_lines)
+		do_div(plane_prefill_bw, (vbp+vpw+vfp));
+	else
+		do_div(plane_prefill_bw, hw_latency_lines);
+
 
 	pstate->plane_fetch_bw = max(plane_bw, plane_prefill_bw);
 }
-- 
2.7.4



Re: [PATCH rdma-core 1/5] verbs: Support dma-buf based memory region

2020-11-25 Thread Yishai Hadas

On 11/23/2020 7:53 PM, Jianxin Xiong wrote:

Add new API function and new provider method for registering dma-buf
based memory region. Update the man page and bump the API version.

Signed-off-by: Jianxin Xiong 
---
  kernel-headers/rdma/ib_user_ioctl_cmds.h | 14 
  libibverbs/cmd_mr.c  | 38 
  libibverbs/driver.h  |  7 ++
  libibverbs/dummy_ops.c   | 11 +
  libibverbs/libibverbs.map.in |  6 +
  libibverbs/man/ibv_reg_mr.3  | 21 --
  libibverbs/verbs.c   | 19 
  libibverbs/verbs.h   | 10 +
  8 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/kernel-headers/rdma/ib_user_ioctl_cmds.h 
b/kernel-headers/rdma/ib_user_ioctl_cmds.h
index 7968a18..dafc7eb 100644
--- a/kernel-headers/rdma/ib_user_ioctl_cmds.h
+++ b/kernel-headers/rdma/ib_user_ioctl_cmds.h
@@ -1,5 +1,6 @@
  /*
   * Copyright (c) 2018, Mellanox Technologies inc.  All rights reserved.
+ * Copyright (c) 2020, Intel Corporation. All rights reserved.
   *
   * This software is available to you under a choice of one of two
   * licenses.  You may choose to be licensed under the terms of the GNU
@@ -251,6 +252,7 @@ enum uverbs_methods_mr {
UVERBS_METHOD_MR_DESTROY,
UVERBS_METHOD_ADVISE_MR,
UVERBS_METHOD_QUERY_MR,
+   UVERBS_METHOD_REG_DMABUF_MR,
  };
  
  enum uverbs_attrs_mr_destroy_ids {

@@ -272,6 +274,18 @@ enum uverbs_attrs_query_mr_cmd_attr_ids {
UVERBS_ATTR_QUERY_MR_RESP_IOVA,
  };
  
+enum uverbs_attrs_reg_dmabuf_mr_cmd_attr_ids {

+   UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+   UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE,
+   UVERBS_ATTR_REG_DMABUF_MR_OFFSET,
+   UVERBS_ATTR_REG_DMABUF_MR_LENGTH,
+   UVERBS_ATTR_REG_DMABUF_MR_IOVA,
+   UVERBS_ATTR_REG_DMABUF_MR_FD,
+   UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS,
+   UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY,
+   UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY,
+};
+
  enum uverbs_attrs_create_counters_cmd_attr_ids {
UVERBS_ATTR_CREATE_COUNTERS_HANDLE,
  };
diff --git a/libibverbs/cmd_mr.c b/libibverbs/cmd_mr.c
index 42dbe42..91ce2ef 100644
--- a/libibverbs/cmd_mr.c
+++ b/libibverbs/cmd_mr.c
@@ -1,5 +1,6 @@
  /*
   * Copyright (c) 2018 Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2020 Intel Corporation.  All rights reserved.
   *
   * This software is available to you under a choice of one of two
   * licenses.  You may choose to be licensed under the terms of the GNU
@@ -116,3 +117,40 @@ int ibv_cmd_query_mr(struct ibv_pd *pd, struct verbs_mr 
*vmr,
return 0;
  }
  
+int ibv_cmd_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, size_t length,

+ uint64_t iova, int fd, int access,
+ struct verbs_mr *vmr)
+{
+   DECLARE_COMMAND_BUFFER(cmdb, UVERBS_OBJECT_MR,
+  UVERBS_METHOD_REG_DMABUF_MR,
+  9);
+   struct ib_uverbs_attr *handle;
+   uint32_t lkey, rkey;
+   int ret;
+
+   handle = fill_attr_out_obj(cmdb, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
+   fill_attr_out_ptr(cmdb, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY, &lkey);
+   fill_attr_out_ptr(cmdb, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY, &rkey);
+
+   fill_attr_in_obj(cmdb, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE, pd->handle);
+   fill_attr_in_uint64(cmdb, UVERBS_ATTR_REG_DMABUF_MR_OFFSET, offset);
+   fill_attr_in_uint64(cmdb, UVERBS_ATTR_REG_DMABUF_MR_LENGTH, length);
+   fill_attr_in_uint64(cmdb, UVERBS_ATTR_REG_DMABUF_MR_IOVA, iova);
+   fill_attr_in_uint32(cmdb, UVERBS_ATTR_REG_DMABUF_MR_FD, fd);
+   fill_attr_in_uint32(cmdb, UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS, 
access);
+
+   ret = execute_ioctl(pd->context, cmdb);
+   if (ret)
+   return errno;
+
+   vmr->ibv_mr.handle =
+   read_attr_obj(UVERBS_ATTR_REG_DMABUF_MR_HANDLE, handle);
+   vmr->ibv_mr.context = pd->context;
+   vmr->ibv_mr.lkey= lkey;
+   vmr->ibv_mr.rkey= rkey;
+   vmr->ibv_mr.pd   = pd;
+   vmr->ibv_mr.addr= (void *)offset;
+   vmr->ibv_mr.length  = length;
+   vmr->mr_type= IBV_MR_TYPE_MR;
+   return 0;
+}
diff --git a/libibverbs/driver.h b/libibverbs/driver.h
index ab80f4b..d6a9d0a 100644
--- a/libibverbs/driver.h
+++ b/libibverbs/driver.h
@@ -2,6 +2,7 @@
   * Copyright (c) 2004, 2005 Topspin Communications.  All rights reserved.
   * Copyright (c) 2005, 2006 Cisco Systems, Inc.  All rights reserved.
   * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
   *
   * This software is available to you under a choice of one of two
   * licenses.  You may choose to be licensed under the terms of the GNU
@@ -373,6 +374,9 @@ struct verbs_context_ops {
struct ibv_mr *(*

[PATCHv10 0/9] System Cache support for GPU and required SMMU support

2020-11-25 Thread Sai Prakash Ranjan
Some hardware variants contain a system cache, or last-level
cache (LLC). This cache is typically a large block which is shared
by multiple clients on the SoC. The GPU uses the system cache to cache
both the GPU data buffers (like textures) as well as the SMMU
pagetables. This helps with improved render performance as well as
lower power consumption by reducing the bus traffic to system memory.

The system cache architecture allows the cache to be split into slices
which can then be used by multiple SoC clients. This patch series is an
effort to enable and use two of those slices preallocated for the GPU,
one for the GPU data buffers and another for the GPU SMMU hardware
pagetables.

Patches 1-7 add system cache support in the SMMU and GPU drivers.
Patches 8 and 9 are minor cleanups for the arm-smmu implementation.

Changes in v10:
 * Fix non-strict mode domain attr handling (Will)
 * Split the domain attribute patch into two (Will)

Changes in v9:
 * Change name from domain_attr_io_pgtbl_cfg to io_pgtable_domain_attr (Will)
 * Modify comment for the quirk as suggested (Will)
 * Compare with IO_PGTABLE_QUIRK_NON_STRICT for non-strict mode (Will)

Changes in v8:
 * Introduce a generic domain attribute for pagetable config (Will)
 * Rename quirk to more generic IO_PGTABLE_QUIRK_ARM_OUTER_WBWA (Will)
 * Move non-strict mode to use new struct domain_attr_io_pgtbl_config (Will)

Changes in v7:
 * Squash Jordan's patch to support MMU500 targets
 * Rebase on top of for-joerg/arm-smmu/updates and Jordan's short series for 
adreno-smmu impl

Changes in v6:
 * Move table to arm-smmu-qcom (Robin)

Changes in v5:
 * Drop cleanup of blank lines since it was intentional (Robin)
 * Rebase again on top of msm-next-pgtables as it moves pretty fast

Changes in v4:
 * Drop IOMMU_SYS_CACHE prot flag
 * Rebase on top of 
https://gitlab.freedesktop.org/drm/msm/-/tree/msm-next-pgtables

Changes in v3:
 * Fix domain attribute setting to before iommu_attach_device()
 * Fix few code style and checkpatch warnings
 * Rebase on top of Jordan's latest split pagetables and per-instance
   pagetables support

Changes in v2:
 * Addressed review comments and rebased on top of Jordan's split
   pagetables series

Jordan Crouse (1):
  drm/msm/a6xx: Add support for using system cache on MMU500 based
targets

Sai Prakash Ranjan (6):
  iommu/io-pgtable: Add a domain attribute for pagetable configuration
  iommu/io-pgtable-arm: Add support to use system cache
  iommu/arm-smmu: Add support for pagetable config domain attribute
  iommu/arm-smmu: Move non-strict mode to use io_pgtable_domain_attr
  iommu: arm-smmu-impl: Use table to list QCOM implementations
  iommu: arm-smmu-impl: Add a space before open parenthesis

Sharat Masetty (2):
  drm/msm: rearrange the gpu_rmw() function
  drm/msm/a6xx: Add support for using system cache(LLC)

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c  | 109 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h  |   5 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c|  17 
 drivers/gpu/drm/msm/msm_drv.c  |   8 ++
 drivers/gpu/drm/msm/msm_drv.h  |   1 +
 drivers/gpu/drm/msm/msm_gpu.h  |   5 +-
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  11 +--
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c |  21 +++-
 drivers/iommu/arm/arm-smmu/arm-smmu.c  |  33 ++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |   3 +-
 drivers/iommu/io-pgtable-arm.c |  10 +-
 include/linux/io-pgtable.h |   8 ++
 include/linux/iommu.h  |   1 +
 13 files changed, 205 insertions(+), 27 deletions(-)


base-commit: a29bbb0861f487a5e144dc997a9f71a36c7a2404
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH] drm/i915: fix error return code in check_partial_mapping()

2020-11-25 Thread Luo Meng
Fix to return a negative error code from the error handling case
instead of 0 in function check_partial_mapping(), as done elsewhere
in this function.

Fixes: 07e98eb0a174 ("drm/i915/selftests: Tighten the timeout testing for 
partial mmaps")
Reported-by: Hulk Robot 
Signed-off-by: Luo Meng 
---
 drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c 
b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
index d27d87a678c8..3f5e7d0a3c53 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
@@ -137,8 +137,10 @@ static int check_partial_mapping(struct 
drm_i915_gem_object *obj,
i915_vma_unpin_iomap(vma);
 
offset = tiled_offset(tile, page << PAGE_SHIFT);
-   if (offset >= obj->base.size)
+   if (offset >= obj->base.size) {
+   err = -EINVAL;
goto out;
+   }
 
intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt);
 
-- 
2.25.4
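The class of bug fixed here, an error path that jumps to the cleanup label without setting the error code, can be shown in a self-contained sketch (names are illustrative, and -EINVAL is written as -22 so the sketch stands alone):

```c
#include <assert.h>

/*
 * Sketch of the bug class fixed here (function and values are
 * illustrative): an error path that jumps to the cleanup label without
 * setting err first would fall through and return 0, reporting success.
 * -EINVAL is written as -22 so the sketch is self-contained.
 */
static int check_offset(unsigned long offset, unsigned long size)
{
	int err = 0;

	if (offset >= size) {
		err = -22;	/* without this assignment, 0 leaks out */
		goto out;
	}
	/* ... exercise the mapping ... */
out:
	return err;
}
```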



Re: [REGRESSION] omapdrm/N900 display broken

2020-11-25 Thread Ivaylo Dimitrov

Hi,

On 25.08.20 at 16:16, Tomi Valkeinen wrote:

Hi Laurent,

On 23/08/2020 19:26, Aaro Koskinen wrote:

Hi,

On Tue, Aug 04, 2020 at 03:39:37PM +0300, Tomi Valkeinen wrote:

On 04/08/2020 15:13, Tomi Valkeinen wrote:



Can you try to pinpoint a bit where the hang happens? Maybe add
DRM/omapdrm debug prints, or perhaps sysrq works and it shows a lock
that's in deadlock.


Also, one data point would be to disable venc, e.g. set venc status to
"disabled" in dts.


Disabling venc makes no difference.

The hang happens in drm_fb_helper_initial_config(). I followed the
"HANG DEBUGGING" tips in the function comment text and enabled
fb.lockless_register_fb=1 to get more (serial) console output.

Now I get this:

[6.514739] omapdss_dss 4805.dss: supply vdda_video not found, using 
dummy regulator
[6.566375] DSS: OMAP DSS rev 2.0
[6.571807] omapdss_dss 4805.dss: bound 48050400.dispc (ops 
dispc_component_ops)
[6.580749] omapdrm omapdrm.0: DMM not available, disable DMM support
[6.587982] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[6.626617] [ cut here ]
[6.631774] WARNING: CPU: 0 PID: 18 at drivers/gpu/drm/drm_bridge.c:708 
drm_atomic_helper_commit_modeset_enables+0x134/0x268
[6.643768] Modules linked in:
[6.647033] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G U
5.8.0-omap3-los_16068+-4-g2e7d4a7efefd-dirty #2
[6.658966] Hardware name: Nokia RX-51 board
[6.663635] Workqueue: events deferred_probe_work_func
[6.669097] [] (unwind_backtrace) from [] 
(show_stack+0x10/0x14)
[6.677429] [] (show_stack) from [] (__warn+0xbc/0xd4)
[6.684844] [] (__warn) from [] 
(warn_slowpath_fmt+0x60/0xb8)
[6.692901] [] (warn_slowpath_fmt) from [] 
(drm_atomic_helper_commit_modeset_enables+0x134/0x268)
[6.704254] [] (drm_atomic_helper_commit_modeset_enables) from 
[] (omap_atomic_commit_tail+0xb4/0xc0)
[6.715972] [] (omap_atomic_commit_tail) from [] 
(commit_tail+0x9c/0x1a8)
[6.725128] [] (commit_tail) from [] 
(drm_atomic_helper_commit+0x134/0x158)
[6.734466] [] (drm_atomic_helper_commit) from [] 
(drm_client_modeset_commit_atomic+0x16c/0x208)
[6.745727] [] (drm_client_modeset_commit_atomic) from 
[] (drm_client_modeset_commit_locked+0x58/0x184)
[6.757629] [] (drm_client_modeset_commit_locked) from 
[] (drm_client_modeset_commit+0x24/0x40)
[6.768798] [] (drm_client_modeset_commit) from [] 
(__drm_fb_helper_restore_fbdev_mode_unlocked+0xa0/0xc8)
[6.780975] [] (__drm_fb_helper_restore_fbdev_mode_unlocked) from 
[] (drm_fb_helper_set_par+0x38/0x64)
[6.792785] [] (drm_fb_helper_set_par) from [] 
(fbcon_init+0x3d4/0x568)
[6.801757] [] (fbcon_init) from [] 
(visual_init+0xb8/0xfc)
[6.809631] [] (visual_init) from [] 
(do_bind_con_driver+0x1e0/0x3bc)
[6.818267] [] (do_bind_con_driver) from [] 
(do_take_over_console+0x138/0x1d8)
[6.827880] [] (do_take_over_console) from [] 
(do_fbcon_takeover+0x74/0xd4)
[6.837219] [] (do_fbcon_takeover) from [] 
(register_framebuffer+0x204/0x2d8)
[6.846740] [] (register_framebuffer) from [] 
(__drm_fb_helper_initial_config_and_unlock+0x3a4/0x554)
[6.858459] [] (__drm_fb_helper_initial_config_and_unlock) from 
[] (omap_fbdev_init+0x84/0xbc)
[6.869537] [] (omap_fbdev_init) from [] 
(pdev_probe+0x580/0x7d8)
[6.877807] [] (pdev_probe) from [] 
(platform_drv_probe+0x48/0x98)


Laurent, does this ring any bells? The WARN comes in 
drm_atomic_bridge_chain_enable() when
drm_atomic_get_old_bridge_state() returns null for (presumably) sdi bridge.

I'm not sure why the bridge state would not be there.

Aaro, you can probably debug easier if you disable CONFIG_FRAMEBUFFER_CONSOLE, 
or even
CONFIG_DRM_FBDEV_EMULATION.

  Tomi



Is there any progress on this issue? I tried 5.9.1 and still nothing is
displayed.


Regards,
Ivo


[PATCHv10 2/9] iommu/io-pgtable-arm: Add support to use system cache

2020-11-25 Thread Sai Prakash Ranjan
Add a quirk IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to override
the outer-cacheability attributes set in the TCR for a
non-coherent page table walker when using system cache.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/io-pgtable-arm.c | 10 --
 include/linux/io-pgtable.h |  4 
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index a7a9bc08dcd1..7c9ea9d7874a 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -761,7 +761,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, 
void *cookie)
 
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NON_STRICT |
-   IO_PGTABLE_QUIRK_ARM_TTBR1))
+   IO_PGTABLE_QUIRK_ARM_TTBR1 |
+   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
return NULL;
 
data = arm_lpae_alloc_pgtable(cfg);
@@ -773,10 +774,15 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, 
void *cookie)
tcr->sh = ARM_LPAE_TCR_SH_IS;
tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
+   if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
+   goto out_free_data;
} else {
tcr->sh = ARM_LPAE_TCR_SH_OS;
tcr->irgn = ARM_LPAE_TCR_RGN_NC;
-   tcr->orgn = ARM_LPAE_TCR_RGN_NC;
+   if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
+   tcr->orgn = ARM_LPAE_TCR_RGN_NC;
+   else
+   tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
}
 
tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 215fd9d69540..fb4d5a763e0c 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -86,6 +86,9 @@ struct io_pgtable_cfg {
 *
 * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table
 *  for use in the upper half of a split address space.
+*
+* IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
+*  attributes set in the TCR for a non-coherent page-table walker.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -93,6 +96,7 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_ARM_MTK_EXTBIT(3)
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
+   #define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
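The TCR outer-cacheability selection in the patch above can be sketched as a small pure function; the names and encodings below are illustrative stand-ins, not the real TCR field values:

```c
#include <assert.h>

/* Illustrative stand-ins, not the real TCR field encodings. */
#define QUIRK_ARM_OUTER_WBWA	(1u << 6)
#define RGN_NC			0	/* non-cacheable */
#define RGN_WBWA		1	/* write-back, write-allocate */

/*
 * Sketch of the outer-cacheability (ORGN) selection in the patch above:
 * a coherent walker is already write-back, so combining it with the
 * quirk is rejected (returns -1, mirroring the goto out_free_data), and
 * a non-coherent walker gets WBWA only when the quirk is set.
 */
static int tcr_orgn(int coherent, unsigned long quirks)
{
	if (coherent)
		return (quirks & QUIRK_ARM_OUTER_WBWA) ? -1 : RGN_WBWA;

	return (quirks & QUIRK_ARM_OUTER_WBWA) ? RGN_WBWA : RGN_NC;
}
```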



Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Miguel Ojeda
On Mon, Nov 23, 2020 at 9:38 PM James Bottomley
 wrote:
>
> So you think a one line patch should take one minute to produce ... I
> really don't think that's grounded in reality.

No, I have not said that. Please don't put words in my mouth (again).

I have said *authoring* lines of *this* kind takes a minute per line.
Specifically: lines fixing the fallthrough warning mechanically and
repeatedly where the compiler tells you to, and doing so full-time for
a month.

For instance, take the following one from Gustavo. Are you really
saying it takes 12 minutes (your number) to write that `break;`?

diff --git a/drivers/gpu/drm/via/via_irq.c b/drivers/gpu/drm/via/via_irq.c
index 24cc445169e2..a3e0fb5b8671 100644
--- a/drivers/gpu/drm/via/via_irq.c
+++ b/drivers/gpu/drm/via/via_irq.c
@@ -364,6 +364,7 @@ int via_wait_irq(struct drm_device *dev, void
*data, struct drm_file *file_priv)
irqwait->request.sequence +=
atomic_read(&cur_irq->irq_received);
irqwait->request.type &= ~_DRM_VBLANK_RELATIVE;
+   break;
case VIA_IRQ_ABSOLUTE:
break;
default:
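For background, the `fallthrough` statement under discussion can be shown in a self-contained sketch. In the kernel it is a macro from <linux/compiler_attributes.h>; a stand-in definition is provided here so the example compiles anywhere:

```c
#include <assert.h>

/*
 * Stand-in for the kernel's fallthrough macro (normally from
 * <linux/compiler_attributes.h>), so this example is self-contained.
 */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough	__attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough	do {} while (0)	/* no-op fallback */
#endif

static int classify(int type)
{
	int flags = 0;

	switch (type) {
	case 1:
		flags |= 1;
		/* case 1 implies case 2: the fall-through is deliberate */
		fallthrough;
	case 2:
		flags |= 2;
		break;
	default:
		break;
	}
	return flags;
}
```

With the annotation in place, compilers building with -Wimplicit-fallthrough stay silent here but still warn on unannotated fall-throughs.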

>  I suppose a one line
> patch only takes a minute to merge with b4 if no-one reviews or tests
> it, but that's not really desirable.

I have not said that either. I said reviewing and merging those are
noise compared to any complex patch. Testing should be done by the
author comparing codegen.

> Part of what I'm trying to measure is the "and useful" bit because
> that's not a given.

It is useful since it makes intent clear. It also catches actual bugs,
which is even more valuable.

> Well, you know, subsystems are very different in terms of the amount of
> patches a maintainer has to process per release cycle of the kernel.
> If a maintainer is close to capacity, additional patches, however
> trivial, become a problem.  If a maintainer has spare cycles, trivial
> patches may look easy.

First of all, voluntary maintainers choose their own workload.
Furthermore, we already measure capacity in the `MAINTAINERS` file:
maintainers can state they can only handle a few patches. Finally, if
someone does not have time for a trivial patch, they are very unlikely
to have any time to review big ones.

> You seem to be saying that because you find it easy to merge trivial
> patches, everyone should.

Again, I have not said anything of the sort.

Cheers,
Miguel


Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Miguel Ojeda
On Wed, Nov 25, 2020 at 12:53 AM Finn Thain  wrote:
>
> I'm saying that supporting the official language spec makes more sense
> than attempting to support a multitude of divergent interpretations of the
> spec (i.e. gcc, clang, coverity etc.)

Making the kernel strictly conforming is a ship that sailed long ago,
for several reasons. Anyway, supporting several compilers and other
tools, regardless of extensions, is valuable.

> I'm also saying that the reason why we use -std=gnu89 is that existing
> code was written in that language, not in ad hoc languages comprised of
> collections of extensions that change with every release.

No, we aren't particularly tied to `gnu89` or anything like that. We
could actually go for `gnu11` already, since the minimum GCC and Clang
support it. Even if a bit of code needs fixing, that shouldn't be a
problem if someone puts the work.

In other words, the kernel code is not frozen, nor are the features it
uses from compilers. They do, in fact, change from time to time.

> Thank you for checking. I found a free version that's only 6 weeks old:

You're welcome! There are quite a few new attributes coming, mostly
following C++ ones.

> It will be interesting to see whether 6.7.11.5 changes once the various
> implementations reach agreement.

Not sure what you mean. The standard does not evolve through
implementations' agreement (although standardizing existing practice
is one of the best arguments to back a change).

Cheers,
Miguel


[PATCH 2/2] dt-bindings: display/panel: add Innolux N125HCE-GN1

2020-11-25 Thread Lukas F. Hartmann
The Innolux N125HCE-GN1 display is used in the MNT Reform 2.0 laptop,
attached via eDP to a SN65DSI86 MIPI-DSI to eDP bridge. This patch
contains the DT binding for "innolux,n125hce-gn1".

Signed-off-by: Lukas F. Hartmann 
---
 .../devicetree/bindings/display/panel/panel-simple.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml 
b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
index edb53ab0d..03b3e0b9d 100644
--- a/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
+++ b/Documentation/devicetree/bindings/display/panel/panel-simple.yaml
@@ -160,6 +160,8 @@ properties:
 # Innolux Corporation 11.6" WXGA (1366x768) TFT LCD panel
   - innolux,n116bge
 # InnoLux 15.6" WXGA TFT LCD panel
+  - innolux,n125hce-gn1
+# InnoLux 13.3" FHD (1920x1080) eDP TFT LCD panel
   - innolux,n156bge-l21
 # Innolux Corporation 7.0" WSVGA (1024x600) TFT LCD panel
   - innolux,zj070na-01p
-- 
2.28.0



Re: [Freedreno] [PATCH 3/3] drm/msm/dpu: add support for clk and bw scaling for display

2020-11-25 Thread Amit Pundir
Hi Kalyan,

On Tue, 24 Nov 2020 at 18:27,  wrote:
>
> On 2020-11-08 23:25, Amit Pundir wrote:
> > On Tue, 4 Aug 2020 at 21:09, Rob Clark  wrote:
> >>
> >> On Thu, Jul 16, 2020 at 4:36 AM Kalyan Thota 
> >> wrote:
> >> >
> >> > This change adds support to scale src clk and bandwidth as
> >> > per composition requirements.
> >> >
> >> > Interconnect registration for bw has been moved to mdp
> >> > device node from mdss to facilitate the scaling.
> >> >
> >> > Changes in v1:
> >> >  - Address armv7 compilation issues with the patch (Rob)
> >> >
> >> > Signed-off-by: Kalyan Thota 
> >>
> >> Reviewed-by: Rob Clark 
> >>
> >
> > Hi Kalyan, Rob,
> >
> > This patch broke the display on the PocoF1 phone
> > (sdm845-xiaomi-beryllium.dts) running AOSP.
> > I can boot to UI but the display is frozen soon after that and
> > dmesg is full of following errors:
> >
> > [drm:dpu_core_perf_crtc_update:397] [dpu error]crtc-65: failed to
> > update bus bw vote
> > [drm:dpu_core_perf_crtc_check:203] [dpu error]exceeds bandwidth:
> > 7649746kb > 680kb
> > [drm:dpu_crtc_atomic_check:969] [dpu error]crtc65 failed performance
> > check -7
> > [drm:dpu_core_perf_crtc_check:203] [dpu error]exceeds bandwidth:
> > 7649746kb > 680kb
> > [drm:dpu_crtc_atomic_check:969] [dpu error]crtc65 failed performance
> > check -7
> > [drm:dpu_core_perf_crtc_check:203] [dpu error]exceeds bandwidth:
> > 7649746kb > 680kb
> > [drm:dpu_crtc_atomic_check:969] [dpu error]crtc65 failed performance
> > check -7
> >
> > Here is the full dmesg https://pastebin.ubuntu.com/p/PcSdNgMnYw/.
> > Georgi pointed out following patch but it didn't help,
> > https://lore.kernel.org/dri-devel/20201027102304.945424-1-dmitry.barysh...@linaro.org/
> > Am I missing any other followup fix?
> >
> > Regards,
> > Amit Pundir
> > __
>
> Hi Amit,
>
> Apologies for the delay.

No worries at all.

>
> I have gone through the logs and referred to the below panel file for
> the timings.
> https://github.com/Matheus-Garbelini/Kernel-Sphinx-Pocophone-F1/blob/master/arch/arm64/boot/dts/qcom/dsi-panel-tianma-fhd-nt36672a-video.dtsi
>
> If the above is the correct file, then the following could be the root
> cause.
>
> The panel's back porch and pulse width are small, which causes the
> per-layer prefill bandwidth requirement to shoot up, as we currently do
> not consider the front porch in the calculation. Can you please try the
> patch attached to this email as a solution and give me your feedback?
> I'll then post it as a formal change.

The attached patch worked for me. Thanks a lot for looking closely
into this issue.

Regards,
Amit Pundir

>
> Thanks,
> Kalyan
>
> _
> > Freedreno mailing list
> > freedr...@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/freedreno
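Kalyan's prefill explanation above can be sketched numerically. This is an illustrative model only, not the actual dpu_core_perf math: the layer's data must arrive within the prefill window, so a short window (small back porch and pulse width) inflates the bandwidth vote, and counting the front porch lines as part of the window lowers it.

```c
#include <assert.h>

/*
 * Illustrative model only (not the real DPU formula): required prefill
 * bandwidth scales inversely with the number of lines available to
 * prefill. The data must arrive in (prefill_lines / vtotal) of a frame
 * period, so a larger window (e.g. vbp + vpw + vfp instead of just
 * vbp + vpw) reduces the per-layer vote.
 */
static unsigned long long prefill_bw(unsigned long long frame_bytes,
				     unsigned int fps, unsigned int vtotal,
				     unsigned int prefill_lines)
{
	return frame_bytes * fps * vtotal / prefill_lines;
}
```

With a panel like the one linked above (tiny vbp/vpw), prefill_lines is small and the vote can blow past the 6.8 GB/s cap seen in Amit's log; enlarging the window with the front porch lines brings it back under.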


Re: [Intel-gfx] [PATCH] dma-buf/dma-resv: Respect num_fences when initializing the shared fence list.

2020-11-25 Thread Thomas Hellström


On 11/24/20 12:57 PM, Maarten Lankhorst wrote:

We hardcode the maximum number of shared fences to 4, instead of
respecting num_fences. Use a minimum of 4, but more if num_fences
is higher.

This seems to have been an oversight when first implementing the
api.

Fixes: 04a5faa8cbe5 ("reservation: update api and add some helpers")
Cc:  # v3.17+
Reported-by: Niranjana Vishwanathapura 
Signed-off-by: Maarten Lankhorst 
---
  drivers/dma-buf/dma-resv.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
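The semantics of the fix can be sketched as follows. This is an illustration of the intended behaviour, not the actual one-line diff (the function name is made up; the real change is presumably something like `max(num_fences, 4u)`):

```c
#include <assert.h>

/* Hypothetical helper: the shared-fence list should hold at least 4
 * entries, but honour a larger num_fences instead of silently capping
 * the caller's request at 4. */
static unsigned int shared_list_capacity(unsigned int num_fences)
{
	return num_fences > 4 ? num_fences : 4;
}
```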


Reviewed-by: Thomas Hellström 




[PATCHv10 1/9] iommu/io-pgtable: Add a domain attribute for pagetable configuration

2020-11-25 Thread Sai Prakash Ranjan
Add a new iommu domain attribute, DOMAIN_ATTR_IO_PGTABLE_CFG, for
pagetable configuration. Initially it will be used to set quirks,
such as those for the system cache (aka last level cache), so that
client drivers like the GPU can set the right attributes for caching
the hardware pagetables into the system cache. Later it can be
extended to include other pagetable configuration data.

Signed-off-by: Sai Prakash Ranjan 
---
 include/linux/io-pgtable.h | 4 
 include/linux/iommu.h  | 1 +
 2 files changed, 5 insertions(+)

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 4cde111e425b..215fd9d69540 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -208,6 +208,10 @@ struct io_pgtable {
 
 #define io_pgtable_ops_to_pgtable(x) container_of((x), struct io_pgtable, ops)
 
+struct io_pgtable_domain_attr {
+   unsigned long quirks;
+};
+
 static inline void io_pgtable_tlb_flush_all(struct io_pgtable *iop)
 {
iop->cfg.tlb->tlb_flush_all(iop->cookie);
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index b95a6f8db6ff..ffaa389ea128 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -118,6 +118,7 @@ enum iommu_attr {
DOMAIN_ATTR_FSL_PAMUV1,
DOMAIN_ATTR_NESTING,/* two stages of translation */
DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+   DOMAIN_ATTR_IO_PGTABLE_CFG,
DOMAIN_ATTR_MAX,
 };
 
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv10 3/9] iommu/arm-smmu: Add support for pagetable config domain attribute

2020-11-25 Thread Sai Prakash Ranjan
Add support for the domain attribute DOMAIN_ATTR_IO_PGTABLE_CFG to
get/set pagetable configuration data. Initially this will be used to
set quirks; later it can be extended to include other pagetable
configuration data.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 20 
 drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 +
 2 files changed, 21 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 0f28a8614da3..4b9b10fe50ed 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -789,6 +789,9 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
if (smmu_domain->non_strict)
pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
+   if (smmu_domain->pgtbl_cfg.quirks)
+   pgtbl_cfg.quirks |= smmu_domain->pgtbl_cfg.quirks;
+
pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
if (!pgtbl_ops) {
ret = -ENOMEM;
@@ -1511,6 +1514,12 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
case DOMAIN_ATTR_NESTING:
*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
return 0;
+   case DOMAIN_ATTR_IO_PGTABLE_CFG: {
+   struct io_pgtable_domain_attr *pgtbl_cfg = data;
+   *pgtbl_cfg = smmu_domain->pgtbl_cfg;
+
+   return 0;
+   }
default:
return -ENODEV;
}
@@ -1551,6 +1560,17 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
else
smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
break;
+   case DOMAIN_ATTR_IO_PGTABLE_CFG: {
+   struct io_pgtable_domain_attr *pgtbl_cfg = data;
+
+   if (smmu_domain->smmu) {
+   ret = -EPERM;
+   goto out_unlock;
+   }
+
+   smmu_domain->pgtbl_cfg = *pgtbl_cfg;
+   break;
+   }
default:
ret = -ENODEV;
}
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index 04288b6fc619..bb5a419f240f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -364,6 +364,7 @@ enum arm_smmu_domain_stage {
 struct arm_smmu_domain {
struct arm_smmu_device  *smmu;
struct io_pgtable_ops   *pgtbl_ops;
+   struct io_pgtable_domain_attr   pgtbl_cfg;
const struct iommu_flush_ops*flush_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH 1/2] panel-simple: add Innolux N125HCE-GN1

2020-11-25 Thread Lukas F. Hartmann
The Innolux N125HCE-GN1 display is used in the MNT Reform 2.0 laptop,
attached via eDP to a SN65DSI86 MIPI-DSI to eDP bridge.

Signed-off-by: Lukas F. Hartmann 
---
 drivers/gpu/drm/panel/panel-simple.c | 28 
 1 file changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c
index 2be358fb4..774acab52 100644
--- a/drivers/gpu/drm/panel/panel-simple.c
+++ b/drivers/gpu/drm/panel/panel-simple.c
@@ -2263,6 +2263,31 @@ static const struct panel_desc innolux_n116bge = {
},
 };
 
+static const struct drm_display_mode innolux_n125hce_gn1_mode = {
+   .clock = 162000,
+   .hdisplay = 1920,
+   .hsync_start = 1920 + 40,
+   .hsync_end = 1920 + 40 + 40,
+   .htotal = 1920 + 40 + 40 + 80,
+   .vdisplay = 1080,
+   .vsync_start = 1080 + 4,
+   .vsync_end = 1080 + 4 + 4,
+   .vtotal = 1080 + 4 + 4 + 24,
+};
+
+static const struct panel_desc innolux_n125hce_gn1 = {
+   .modes = &innolux_n125hce_gn1_mode,
+   .num_modes = 1,
+   .bpc = 8,
+   .size = {
+   .width = 276,
+   .height = 155,
+   },
+   .bus_format = MEDIA_BUS_FMT_RGB888_1X24,
+   .bus_flags = DRM_BUS_FLAG_DATA_MSB_TO_LSB,
+   .connector_type = DRM_MODE_CONNECTOR_eDP,
+};
+
 static const struct drm_display_mode innolux_n156bge_l21_mode = {
.clock = 69300,
.hdisplay = 1366,
@@ -4092,6 +4117,9 @@ static const struct of_device_id platform_of_match[] = {
}, {
.compatible = "innolux,n116bge",
.data = &innolux_n116bge,
+   }, {
+   .compatible = "innolux,n125hce-gn1",
+   .data = &innolux_n125hce_gn1,
}, {
.compatible = "innolux,n156bge-l21",
.data = &innolux_n156bge_l21,
-- 
2.28.0
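As a quick sanity check of the timings in innolux_n125hce_gn1_mode above, the implied refresh rate follows from clock / (htotal * vtotal), remembering that `clock` in struct drm_display_mode is in kHz:

```c
#include <assert.h>

/* drm_display_mode.clock is given in kHz */
static long refresh_hz(long clock_khz, long htotal, long vtotal)
{
	return clock_khz * 1000 / (htotal * vtotal);
}
```

A 162000 kHz pixel clock with htotal = 2080 and vtotal = 1112 works out to roughly 70 Hz.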



Re: [PATCH 2/3] mm: Extract might_alloc() debug check

2020-11-25 Thread Jason Gunthorpe
On Tue, Nov 24, 2020 at 03:34:11PM +0100, Daniel Vetter wrote:
> On Fri, Nov 20, 2020 at 02:07:19PM -0400, Jason Gunthorpe wrote:
> > On Fri, Nov 20, 2020 at 10:54:43AM +0100, Daniel Vetter wrote:
> > > diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> > > index d5ece7a9a403..f94405d43fd1 100644
> > > +++ b/include/linux/sched/mm.h
> > > @@ -180,6 +180,22 @@ static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
> > >  static inline void fs_reclaim_release(gfp_t gfp_mask) { }
> > >  #endif
> > >  
> > > +/**
> > > + * might_alloc - Marks possible allocation sites
> > > + * @gfp_mask: gfp_t flags that would be use to allocate
> > > + *
> > > + * Similar to might_sleep() and other annotations this can be used in functions
> > > + * that might allocate, but often don't. Compiles to nothing without
> > > + * CONFIG_LOCKDEP. Includes a conditional might_sleep() if @gfp allows blocking.
> > > + */
> > > +static inline void might_alloc(gfp_t gfp_mask)
> > > +{
> > > + fs_reclaim_acquire(gfp_mask);
> > > + fs_reclaim_release(gfp_mask);
> > > +
> > > + might_sleep_if(gfpflags_allow_blocking(gfp_mask));
> > > +}
> > 
> > Reviewed-by: Jason Gunthorpe 
> > 
> > Oh, I just had a another thread with Matt about xarray, this would be
> > perfect to add before xas_nomem():
> 
> Yeah I think there's plenty of places where this will be useful. Want to
> slap a sob onto this diff so I can include it for the next round, or will
> you or Matt send this out when my might_alloc has landed?

When this is merged I can do this - just wanted to point out that the
API is good and useful.

Jason
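For readers unfamiliar with the helper under discussion: a userspace model of might_alloc()'s behaviour looks roughly like this. The kernel version also pokes lockdep via fs_reclaim_acquire()/fs_reclaim_release(), elided here, and the GFP flag value below is illustrative only:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;
#define __GFP_DIRECT_RECLAIM	0x400u	/* value illustrative */
#define GFP_KERNEL		__GFP_DIRECT_RECLAIM
#define GFP_ATOMIC		0u

static bool gfpflags_allow_blocking(gfp_t gfp)
{
	return gfp & __GFP_DIRECT_RECLAIM;
}

static int might_sleep_calls;	/* stand-in for the real might_sleep_if() */

static void might_alloc(gfp_t gfp_mask)
{
	/* fs_reclaim_acquire(gfp_mask); fs_reclaim_release(gfp_mask); */
	if (gfpflags_allow_blocking(gfp_mask))
		might_sleep_calls++;
}
```

A caller such as xas_nomem() would invoke might_alloc(gfp) at its entry point, so atomic-context callers get flagged even on paths that do not actually end up allocating.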


Re: [PATCHv9 3/8] iommu/arm-smmu: Move non-strict mode to use io_pgtable_domain_attr

2020-11-25 Thread Sai Prakash Ranjan

On 2020-11-25 03:09, Will Deacon wrote:

On Mon, Nov 23, 2020 at 10:35:56PM +0530, Sai Prakash Ranjan wrote:

Now that we have a struct io_pgtable_domain_attr with quirks,
use that for non_strict mode as well thereby removing the need
for more members of arm_smmu_domain in the future.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 8 +++-
 drivers/iommu/arm/arm-smmu/arm-smmu.h | 1 -
 2 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 4b9b10fe50ed..f56f266ebdf7 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -786,9 +786,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,

goto out_clear_smmu;
}

-   if (smmu_domain->non_strict)
-   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
-
if (smmu_domain->pgtbl_cfg.quirks)
pgtbl_cfg.quirks |= smmu_domain->pgtbl_cfg.quirks;

@@ -1527,7 +1524,8 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,

case IOMMU_DOMAIN_DMA:
switch (attr) {
case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
+   if (smmu_domain->pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT)
+   *(int *)data = smmu_domain->pgtbl_cfg.quirks;


I still don't think this is right :(

We need to set *data to 1 or 0 depending on whether or not the non-strict
quirk is set, i.e.:
quirk is set, i.e:

	bool non_strict = smmu_domain->pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_NON_STRICT;

	*(int *)data = non_strict;

Your code above leaves *data uninitialised if non_strict is not set.


Ugh, sorry. I should have looked at this some more before hurrying
to post; will fix it.




return 0;
default:
return -ENODEV;
@@ -1578,7 +1576,7 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,

case IOMMU_DOMAIN_DMA:
switch (attr) {
case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   smmu_domain->non_strict = *(int *)data;
+   smmu_domain->pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;


And this is broken because if *data is 0, then you _set_ the quirk, which is
the opposite of what we should be doing.

In other words, although the implementation has changed, the semantics have
not.



Will fix this to have the quirk set only when *data is 1 and cleared
when it is 0.
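Putting Will's two points together, the intended semantics look roughly like this. This is a self-contained sketch, not the final patch; the quirk's bit position below is illustrative, not the real io-pgtable.h value:

```c
#include <assert.h>

#define IO_PGTABLE_QUIRK_NON_STRICT	(1UL << 4)	/* bit illustrative */

/* get: always report 1 or 0, never leave the output uninitialised */
static int get_non_strict(unsigned long quirks)
{
	return (quirks & IO_PGTABLE_QUIRK_NON_STRICT) ? 1 : 0;
}

/* set: honour *data == 0 by clearing the quirk rather than setting it */
static unsigned long set_non_strict(unsigned long quirks, int data)
{
	if (data)
		quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
	else
		quirks &= ~IO_PGTABLE_QUIRK_NON_STRICT;
	return quirks;
}
```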


Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a 
member

of Code Aurora Forum, hosted by The Linux Foundation


[PATCHv10 6/9] drm/msm/a6xx: Add support for using system cache(LLC)

2020-11-25 Thread Sai Prakash Ranjan
From: Sharat Masetty 

The last level system cache can be partitioned into 32 different
slices, of which the GPU has two slices preallocated. One slice is
used for caching GPU buffers and the other for caching the GPU SMMU
pagetables. This patch talks to the core system cache driver to
acquire the slice handles, configures the SCIDs for those slices, and
activates and deactivates the slices upon GPU power collapse and
restore.

Some support from the IOMMU driver is also needed to set the right
TCR attributes so the system cache can be used. The GPU can then
override a few cacheability parameters, which it does to change
write-allocate to write-no-allocate, as the GPU hardware does not
benefit much from it.

DOMAIN_ATTR_IO_PGTABLE_CFG is another domain level attribute used
by the IOMMU driver for pagetable configuration which will be used
to set a quirk initially to set the right attributes to cache the
hardware pagetables into the system cache.

Signed-off-by: Sharat Masetty 
[saiprakash.ranjan: fix to set attr before device attach to iommu and rebase]
Signed-off-by: Sai Prakash Ranjan 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 83 +
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h   |  4 ++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 17 +
 3 files changed, 104 insertions(+)
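The SCID programming described in the commit message amounts to replicating one 5-bit slice ID across five fields of a register, as a6xx_llc_activate() in the patch below does; a self-contained sketch:

```c
#include <assert.h>

/* Sketch of the GPU SCID packing: one 5-bit slice ID replicated into
 * five consecutive 5-bit fields (bits 0-24) of SYSTEM_CACHE_CNTL_1. */
static unsigned int pack_gpu_scid(unsigned int scid)
{
	scid &= 0x1f;
	return (scid << 0) | (scid << 5) | (scid << 10) |
	       (scid << 15) | (scid << 20);
}
```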

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 948f3656c20c..95c98c642876 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -8,7 +8,9 @@
 #include "a6xx_gpu.h"
 #include "a6xx_gmu.xml.h"
 
+#include 
 #include 
+#include 
 
 #define GPU_PAS_ID 13
 
@@ -1022,6 +1024,79 @@ static irqreturn_t a6xx_irq(struct msm_gpu *gpu)
return IRQ_HANDLED;
 }
 
+static void a6xx_llc_rmw(struct a6xx_gpu *a6xx_gpu, u32 reg, u32 mask, u32 or)
+{
+   return msm_rmw(a6xx_gpu->llc_mmio + (reg << 2), mask, or);
+}
+
+static void a6xx_llc_write(struct a6xx_gpu *a6xx_gpu, u32 reg, u32 value)
+{
+   return msm_writel(value, a6xx_gpu->llc_mmio + (reg << 2));
+}
+
+static void a6xx_llc_deactivate(struct a6xx_gpu *a6xx_gpu)
+{
+   llcc_slice_deactivate(a6xx_gpu->llc_slice);
+   llcc_slice_deactivate(a6xx_gpu->htw_llc_slice);
+}
+
+static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
+{
+   u32 cntl1_regval = 0;
+
+   if (IS_ERR(a6xx_gpu->llc_mmio))
+   return;
+
+   if (!llcc_slice_activate(a6xx_gpu->llc_slice)) {
+   u32 gpu_scid = llcc_get_slice_id(a6xx_gpu->llc_slice);
+
+   gpu_scid &= 0x1f;
+   cntl1_regval = (gpu_scid << 0) | (gpu_scid << 5) | (gpu_scid << 10) |
+  (gpu_scid << 15) | (gpu_scid << 20);
+   }
+
+   if (!llcc_slice_activate(a6xx_gpu->htw_llc_slice)) {
+   u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice);
+
+   gpuhtw_scid &= 0x1f;
+   cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid);
+   }
+
+   if (cntl1_regval) {
+   /*
+* Program the slice IDs for the various GPU blocks and GPU MMU
+* pagetables
+*/
+   a6xx_llc_write(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval);
+
+   /*
+* Program cacheability overrides to not allocate cache lines on
+* a write miss
+*/
+   a6xx_llc_rmw(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03);
+   }
+}
+
+static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
+{
+   llcc_slice_putd(a6xx_gpu->llc_slice);
+   llcc_slice_putd(a6xx_gpu->htw_llc_slice);
+}
+
+static void a6xx_llc_slices_init(struct platform_device *pdev,
+   struct a6xx_gpu *a6xx_gpu)
+{
+   a6xx_gpu->llc_mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
+   if (IS_ERR(a6xx_gpu->llc_mmio))
+   return;
+
+   a6xx_gpu->llc_slice = llcc_slice_getd(LLCC_GPU);
+   a6xx_gpu->htw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
+
+   if (IS_ERR(a6xx_gpu->llc_slice) && IS_ERR(a6xx_gpu->htw_llc_slice))
+   a6xx_gpu->llc_mmio = ERR_PTR(-EINVAL);
+}
+
 static int a6xx_pm_resume(struct msm_gpu *gpu)
 {
struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
@@ -1038,6 +1113,8 @@ static int a6xx_pm_resume(struct msm_gpu *gpu)
 
msm_gpu_resume_devfreq(gpu);
 
+   a6xx_llc_activate(a6xx_gpu);
+
return 0;
 }
 
@@ -1048,6 +1125,8 @@ static int a6xx_pm_suspend(struct msm_gpu *gpu)
 
trace_msm_gpu_suspend(0);
 
+   a6xx_llc_deactivate(a6xx_gpu);
+
devfreq_suspend_device(gpu->devfreq.devfreq);
 
return a6xx_gmu_stop(a6xx_gpu);
@@ -1091,6 +1170,8 @@ static void a6xx_destroy(struct msm_gpu *gpu)
drm_gem_object_put(a6xx_gpu->shadow_bo);
}
 
+   a6xx_llc_slices_destroy(a6xx_gpu);
+
a6xx_gmu_remove(a6xx_gpu);
 
adreno_gpu_cleanup(adreno_

Re: [PATCH v2 2/9] misc: Add Xilinx AI engine device driver

2020-11-25 Thread Wendy Liang



On 11/19/20 12:12 PM, Dave Airlie wrote:

diff --git a/MAINTAINERS b/MAINTAINERS
index 5cc595a..40e3351 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -19283,6 +19283,14 @@ T: git https://github.com/Xilinx/linux-xlnx.git
  F: Documentation/devicetree/bindings/phy/xlnx,zynqmp-psgtr.yaml
  F: drivers/phy/xilinx/phy-zynqmp.c

+XILINX AI ENGINE DRIVER
+M: Wendy Liang 
+S: Supported
+F: Documentation/devicetree/bindings/soc/xilinx/xlnx,ai-engine.yaml
+F: drivers/misc/xilinx-ai-engine/
+F: include/linux/xlnx-ai-engine.h
+F: include/uapi/linux/xlnx-ai-engine.h
+
  XILLYBUS DRIVER
  M: Eli Billauer 
  L: linux-ker...@vger.kernel.org
diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig
index fafa8b0..0b8ce4d 100644
--- a/drivers/misc/Kconfig
+++ b/drivers/misc/Kconfig
@@ -444,6 +444,18 @@ config XILINX_SDFEC

   If unsure, say N.

+config XILINX_AIE
+   tristate "Xilinx AI engine"
+   depends on ARM64 || COMPILE_TEST
+   help
+ This option enables support for the Xilinx AI engine driver.
+ One Xilinx AI engine device can have multiple partitions (groups of
+ AI engine tiles). Xilinx AI engine device driver instance manages
+ AI engine partitions. User application access its partitions through
+ AI engine partition instance file operations.
+
+ If unsure, say N
+
  config MISC_RTSX
 tristate
 default MISC_RTSX_PCI || MISC_RTSX_USB
diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile
index d23231e..2176b18 100644
--- a/drivers/misc/Makefile
+++ b/drivers/misc/Makefile
@@ -57,3 +57,4 @@ obj-$(CONFIG_HABANA_AI)   += habanalabs/
  obj-$(CONFIG_UACCE)+= uacce/
  obj-$(CONFIG_XILINX_SDFEC) += xilinx_sdfec.o
  obj-$(CONFIG_HISI_HIKEY_USB)   += hisi_hikey_usb.o
+obj-$(CONFIG_XILINX_AIE)   += xilinx-ai-engine/
diff --git a/drivers/misc/xilinx-ai-engine/Makefile b/drivers/misc/xilinx-ai-engine/Makefile
new file mode 100644
index 000..7827a0a
--- /dev/null
+++ b/drivers/misc/xilinx-ai-engine/Makefile
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for Xilinx AI engine device driver
+#
+
+obj-$(CONFIG_XILINX_AIE)   += xilinx-aie.o
+
+xilinx-aie-$(CONFIG_XILINX_AIE) := ai-engine-aie.o \
+  ai-engine-dev.o \
+  ai-engine-part.o \
+  ai-engine-res.o
diff --git a/drivers/misc/xilinx-ai-engine/ai-engine-aie.c b/drivers/misc/xilinx-ai-engine/ai-engine-aie.c
new file mode 100644
index 000..319260f
--- /dev/null
+++ b/drivers/misc/xilinx-ai-engine/ai-engine-aie.c
@@ -0,0 +1,115 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Xilinx AI Engine driver AIE device specific implementation
+ *
+ * Copyright (C) 2020 Xilinx, Inc.
+ */
+
+#include 
+
+#include "ai-engine-internal.h"
+
+#define AIE_ARRAY_SHIFT30U
+#define AIE_COL_SHIFT  23U
+#define AIE_ROW_SHIFT  18U
+
+/*
+ * Registers offsets
+ */
+#define AIE_SHIMNOC_L2INTR_MASK_REGOFF 0x00015000U
+#define AIE_SHIMNOC_L2INTR_INTR_REGOFF 0x00015010U
+#define AIE_SHIMNOC_DMA_BD0_ADDRLOW_REGOFF 0x0001d000U
+#define AIE_SHIMNOC_DMA_BD15_PACKET_REGOFF 0x0001d13cU
+#define AIE_SHIMNOC_AXIMM_REGOFF   0x0001e020U
+#define AIE_SHIMPL_L1INTR_MASK_A_REGOFF0x00035000U
+#define AIE_SHIMPL_L1INTR_BLOCK_NORTH_B_REGOFF 0x00035050U
+#define AIE_SHIMPL_CLKCNTR_REGOFF  0x00036040U
+#define AIE_SHIMPL_RESET_REGOFF0x0003604cU
+#define AIE_TILE_CORE_CLKCNTR_REGOFF   0x00036040U
+
+static const struct aie_tile_regs aie_kernel_regs[] = {
+   /* SHIM AXI MM Config */
+   {.attribute = AIE_TILE_TYPE_SHIMNOC << AIE_REGS_ATTR_TILE_TYPE_SHIFT,
+.soff = AIE_SHIMNOC_AXIMM_REGOFF,
+.eoff = AIE_SHIMNOC_AXIMM_REGOFF,
+   },
+   /* SHIM DMA ADDRESS range */
+   {.attribute = AIE_TILE_TYPE_SHIMNOC << AIE_REGS_ATTR_TILE_TYPE_SHIFT,
+.soff = AIE_SHIMNOC_DMA_BD0_ADDRLOW_REGOFF,
+.eoff = AIE_SHIMNOC_DMA_BD15_PACKET_REGOFF,
+   },
+   /* SHIM 2nd level interrupt controller */
+   {.attribute = AIE_TILE_TYPE_SHIMNOC << AIE_REGS_ATTR_TILE_TYPE_SHIFT,
+.soff = AIE_SHIMNOC_L2INTR_MASK_REGOFF,
+.eoff = AIE_SHIMNOC_L2INTR_INTR_REGOFF,
+   },
+   /* SHIM 1st level interrupt controller */
+   {.attribute = (AIE_TILE_TYPE_SHIMPL | AIE_TILE_TYPE_SHIMNOC) <<
+ AIE_REGS_ATTR_TILE_TYPE_SHIFT,
+.soff = AIE_SHIMPL_L1INTR_MASK_A_REGOFF,
+.eoff = AIE_SHIMPL_L1INTR_BLOCK_NORTH_B_REGOFF,
+   },
+   /* SHIM reset Enable */
+   {.attribute = (AIE_TILE_TYPE_SHIMPL | AIE_TILE_TYPE_SHIMNOC) <<
+ AIE_REGS_ATTR_TILE_TYPE_SHIFT,
+.soff = AIE_SHIMPL_RESET_REGOFF,
+.eoff = AIE_SHIMPL_RESET_REGOFF,
+   },
+   /* SHIM clock control */
+   {.attribute 

Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Finn Thain


On Wed, 25 Nov 2020, Miguel Ojeda wrote:

> 
> The C standard has nothing to do with this. We use compiler extensions 
> of several kinds, for many years. Even discounting those extensions, the 
> kernel is not even conforming to C due to e.g. strict aliasing. I am not 
> sure what you are trying to argue here.
> 

I'm saying that supporting the official language spec makes more sense 
than attempting to support a multitude of divergent interpretations of the 
spec (e.g. gcc, clang, coverity, etc.).

I'm also saying that the reason why we use -std=gnu89 is that existing 
code was written in that language, not in ad hoc languages comprised of 
collections of extensions that change with every release.

> But, since you insist: yes, the `fallthrough` attribute is in the 
> current C2x draft.
> 

Thank you for checking. I found a free version that's only 6 weeks old:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2583.pdf

It will be interesting to see whether 6.7.11.5 changes once the various 
implementations reach agreement.
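For reference, the construct that section 6.7.11.5 of the draft standardises is the fallthrough attribute. A minimal example using the GNU statement-attribute spelling, which the kernel's `fallthrough` macro expands to on recent compilers (C2x spells it `[[fallthrough]]`):

```c
#include <assert.h>

static int classify(int x)
{
	int score = 0;

	switch (x) {
	case 2:
		score++;
		__attribute__((fallthrough));	/* deliberate: 2 also counts as 1 */
	case 1:
		score++;
		break;
	default:
		break;
	}
	return score;
}
```

Without the annotation, compilers building with -Wimplicit-fallthrough would warn on the fall-through from case 2 into case 1.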


[PATCH] Revert "i2c: qcom-geni: Disable DMA processing on the Lenovo Yoga C630"

2020-11-25 Thread Bjorn Andersson
A combination of recent bug fixes by Doug Anderson and the proper
definition of iommu streams means that this hack is no longer needed.
Let's clean up the code by reverting '127068abe85b ("i2c: qcom-geni:
Disable DMA processing on the Lenovo Yoga C630")'.

Signed-off-by: Bjorn Andersson 
---
 drivers/i2c/busses/i2c-qcom-geni.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-qcom-geni.c b/drivers/i2c/busses/i2c-qcom-geni.c
index dce75b85253c..046d241183c5 100644
--- a/drivers/i2c/busses/i2c-qcom-geni.c
+++ b/drivers/i2c/busses/i2c-qcom-geni.c
@@ -353,13 +353,11 @@ static int geni_i2c_rx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 {
dma_addr_t rx_dma;
unsigned long time_left;
-   void *dma_buf = NULL;
+   void *dma_buf;
struct geni_se *se = &gi2c->se;
size_t len = msg->len;
 
-   if (!of_machine_is_compatible("lenovo,yoga-c630"))
-   dma_buf = i2c_get_dma_safe_msg_buf(msg, 32);
-
+   dma_buf = i2c_get_dma_safe_msg_buf(msg, 32);
if (dma_buf)
geni_se_select_mode(se, GENI_SE_DMA);
else
@@ -394,13 +392,11 @@ static int geni_i2c_tx_one_msg(struct geni_i2c_dev *gi2c, struct i2c_msg *msg,
 {
dma_addr_t tx_dma;
unsigned long time_left;
-   void *dma_buf = NULL;
+   void *dma_buf;
struct geni_se *se = &gi2c->se;
size_t len = msg->len;
 
-   if (!of_machine_is_compatible("lenovo,yoga-c630"))
-   dma_buf = i2c_get_dma_safe_msg_buf(msg, 32);
-
+   dma_buf = i2c_get_dma_safe_msg_buf(msg, 32);
if (dma_buf)
geni_se_select_mode(se, GENI_SE_DMA);
else
-- 
2.29.2



[PATCHv10 9/9] iommu: arm-smmu-impl: Add a space before open parenthesis

2020-11-25 Thread Sai Prakash Ranjan
Fix the checkpatch warning for space required before the open
parenthesis.

Signed-off-by: Sai Prakash Ranjan 
Acked-by: Will Deacon 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index 26e2734eb4d7..136872e77195 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -12,7 +12,7 @@
 
 static int arm_smmu_gr0_ns(int offset)
 {
-   switch(offset) {
+   switch (offset) {
case ARM_SMMU_GR0_sCR0:
case ARM_SMMU_GR0_sACR:
case ARM_SMMU_GR0_sGFSR:
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



Re: [PATCH v11 1/4] RDMA/umem: Support importing dma-buf as user memory region

2020-11-25 Thread Jason Gunthorpe
On Tue, Nov 24, 2020 at 06:24:43PM +, Xiong, Jianxin wrote:
> > From: Christoph Hellwig 
> > Sent: Tuesday, November 24, 2020 1:34 AM
> > To: Xiong, Jianxin 
> > Cc: linux-r...@vger.kernel.org; dri-devel@lists.freedesktop.org; Doug 
> > Ledford ; Jason Gunthorpe ;
> > Leon Romanovsky ; Sumit Semwal ; 
> > Christian Koenig ; Vetter,
> > Daniel 
> > Subject: Re: [PATCH v11 1/4] RDMA/umem: Support importing dma-buf as user 
> > memory region
> > 
> > As these are mostly trivial wrappers around the EXPORT_SYMBOL_GPL dmabuf 
> > exports please stick to that export style.
> > 
> > > +++ b/drivers/infiniband/core/umem_dmabuf.h
> > > @@ -0,0 +1,11 @@
> > > +/* SPDX-License-Identifier: (GPL-2.0 OR BSD-3-Clause) */
> > > +/*
> > > + * Copyright (c) 2020 Intel Corporation. All rights reserved.
> > > + */
> > > +
> > > +#ifndef UMEM_DMABUF_H
> > > +#define UMEM_DMABUF_H
> > > +
> > > +void ib_umem_dmabuf_release(struct ib_umem_dmabuf *umem_dmabuf);
> > > +
> > > +#endif /* UMEM_DMABUF_H */
> > 
> > Does this really need a separate header?
> 
> The symbol doesn't need to be exported; otherwise it could be put into
> "ib_umem.h". Although the prototype could be put directly into the file
> where it is called, using a separate header file provides a cleaner
> interface.

It is fine to put this single symbol in ib_umem.h

Thanks
Jason


[RFC] dcss: fix attaching to sn65dsi86 bridge

2020-11-25 Thread Lukas F. Hartmann
The sn65dsi86 DSI to eDP bridge driver does not support attaching
without a drm connector. This patch makes the attachment work. Required
for the display chain in MNT Reform 2.0 (DCSS->NWL DSI->SN65DSI86->eDP).

Signed-off-by: Lukas F. Hartmann 
---
 drivers/gpu/drm/imx/dcss/dcss-kms.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/imx/dcss/dcss-kms.c b/drivers/gpu/drm/imx/dcss/dcss-kms.c
index 135a62366..4967f828b 100644
--- a/drivers/gpu/drm/imx/dcss/dcss-kms.c
+++ b/drivers/gpu/drm/imx/dcss/dcss-kms.c
@@ -82,6 +82,7 @@ static int dcss_kms_bridge_connector_init(struct dcss_kms_dev *kms)
struct drm_crtc *crtc = (struct drm_crtc *)&kms->crtc;
struct drm_panel *panel;
struct drm_bridge *bridge;
+   struct drm_connector_list_iter iter;
int ret;

ret = drm_of_find_panel_or_bridge(ddev->dev->of_node, 0, 0,
@@ -104,19 +105,19 @@ static int dcss_kms_bridge_connector_init(struct dcss_kms_dev *kms)
return ret;
}

-   ret = drm_bridge_attach(encoder, bridge, NULL,
-   DRM_BRIDGE_ATTACH_NO_CONNECTOR);
+   ret = drm_bridge_attach(encoder, bridge, NULL, 0);
if (ret < 0) {
dev_err(ddev->dev, "Unable to attach bridge %pOF\n",
bridge->of_node);
return ret;
}

-   kms->connector = drm_bridge_connector_init(ddev, encoder);
-   if (IS_ERR(kms->connector)) {
-   dev_err(ddev->dev, "Unable to create bridge connector.\n");
-   return PTR_ERR(kms->connector);
-   }
+   /*
+* This hack to look up the connector is copied from mxsfb.
+*/
+   drm_connector_list_iter_begin(ddev, &iter);
+   kms->connector = drm_connector_list_iter_next(&iter);
+   drm_connector_list_iter_end(&iter);

drm_connector_attach_encoder(kms->connector, encoder);

--
2.28.0


Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-11-25 Thread Christophe Leroy



On 21/05/2020 at 12:38, Christophe Leroy wrote:



On 21/05/2020 at 09:02, Michael Ellerman wrote:

Arnd Bergmann  writes:

On Wed, Apr 8, 2020 at 2:04 PM Michael Ellerman  wrote:

Benjamin Herrenschmidt  writes:

On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:

Benjamin Herrenschmidt  writes:

IBM still put 40x cores inside POWER chips no ?


Oh yeah that's true. I guess most folks don't know that, or that they
run RHEL on them.


Is there a reason for not having those dts files in mainline then?
If nothing else, it would document what machines are still being
used with future kernels.


Sorry that part was a joke :D  Those chips don't run Linux.



Nice to know :)

What's the plan then, do we still want to keep 40x in the kernel?

If yes, is it ok to drop the oldies anyway as done in my series 
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=172630 ?


(Note that this series will conflict with my series on hugepages on 8xx due to the 
PTE_ATOMIC_UPDATES stuff. I can rebase the 40x modernisation series on top of the 8xx hugepages 
series if it is worth it)




Do we still want to keep 40x in the kernel? We don't even have a running 40x QEMU machine as far as 
I know.


I'm asking because I'd like to drop the non-CONFIG_VMAP_STACK code to simplify and ease stuff (code 
that works with vmalloc'ed stacks also works with stacks in linear memory), but I can't do it 
because 40x doesn't have VMAP_STACK, and should I implement it for 40x, I have no means to test it.


So it would ease things if we could drop 40x completely, unless someone there has a 40x platform to 
test stuff.


Thanks
Christophe


Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Finn Thain
On Tue, 24 Nov 2020, Kees Cook wrote:

> On Mon, Nov 23, 2020 at 08:31:30AM -0800, James Bottomley wrote:
> > Really, no ... something which produces no improvement has no value at 
> > all ... we really shouldn't be wasting maintainer time with it because 
> > it has a cost to merge.  I'm not sure we understand where the balance 
> > lies in value vs cost to merge but I am confident in the zero value 
> > case.
> 
> What? We can't measure how many future bugs aren't introduced because 
> the kernel requires explicit case flow-control statements for all new 
> code.
> 

These statements are not "missing" unless you presume that code written 
before the latest de facto language spec was written should somehow be 
held to that spec.

If the 'fallthrough' statement is not part of the latest draft spec then 
we should ask why not before we embrace it. Being that the kernel still 
prefers -std=gnu89 you might want to consider what has prevented 
-std=gnu99 or -std=gnu2x etc.

> We already enable -Wimplicit-fallthrough globally, so that's not the 
> discussion. The issue is that Clang is (correctly) even more strict than 
> GCC for this, so these are the remaining ones to fix for full Clang 
> coverage too.
> 

Seems to me you should be patching the compiler.

When you have consensus among the language lawyers you'll have more 
credibility with those being subjected to enforcement.
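For readers skimming the thread, the construct under discussion is the statement-level fall-through marker (the kernel wraps the compiler attribute in a `fallthrough` macro). A minimal, standalone sketch — not taken from the patch series:

```c
#include <assert.h>

/* Sums the case bodies from 'start' downward. Each deliberate
 * fall-through is marked explicitly; with -Wimplicit-fallthrough the
 * compiler warns wherever such a marker (or a break) is missing. */
static int cascade_sum(int start)
{
	int total = 0;

	switch (start) {
	case 3:
		total += 3;
		__attribute__((fallthrough));
	case 2:
		total += 2;
		__attribute__((fallthrough));
	case 1:
		total += 1;
		break;
	default:
		break;
	}
	return total;
}
```

Here `cascade_sum(3)` returns 6 because control deliberately falls through all three cases; removing one of the markers changes nothing at run time, which is exactly why the compilers now ask for an explicit statement instead of silence.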


nwl-dsi: fixup mode only for LCDIF input, not DCSS

2020-11-25 Thread Lukas F. Hartmann
The fixup of HSYNC and VSYNC should not be done when the input source is
DCSS; otherwise the internal display does not work on the MNT Reform 2 (an open-hardware
laptop based on the NXP i.MX8M, using DCSS->DSI->eDP for the internal display).

Signed-off-by: Lukas F. Hartmann 
---
 drivers/gpu/drm/bridge/nwl-dsi.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/bridge/nwl-dsi.c b/drivers/gpu/drm/bridge/nwl-dsi.c
index 66b67402f..6735ab2a2 100644
--- a/drivers/gpu/drm/bridge/nwl-dsi.c
+++ b/drivers/gpu/drm/bridge/nwl-dsi.c
@@ -807,10 +807,16 @@ static bool nwl_dsi_bridge_mode_fixup(struct drm_bridge *bridge,
  const struct drm_display_mode *mode,
  struct drm_display_mode *adjusted_mode)
 {
-   /* At least LCDIF + NWL needs active high sync */
-   adjusted_mode->flags |= (DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_PVSYNC);
-   adjusted_mode->flags &= ~(DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_NVSYNC);
+   struct device_node *remote;
+   struct nwl_dsi *dsi = bridge_to_dsi(bridge);
+
+   remote = of_graph_get_remote_node(dsi->dev->of_node, 0,
+   NWL_DSI_ENDPOINT_LCDIF);
+   if (remote) {
+   /* At least LCDIF + NWL needs active high sync */
+   adjusted_mode->flags |= (DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_PVSYNC);
+   adjusted_mode->flags &= ~(DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_NVSYNC);
+   }

return true;
 }
--
2.28.0


Re: [PATCHv9 2/8] iommu/arm-smmu: Add domain attribute for pagetable configuration

2020-11-25 Thread Sai Prakash Ranjan

On 2020-11-25 03:11, Will Deacon wrote:

On Mon, Nov 23, 2020 at 10:35:55PM +0530, Sai Prakash Ranjan wrote:

Add iommu domain attribute for pagetable configuration which
initially will be used to set quirks like for system cache aka
last level cache to be used by client drivers like GPU to set
right attributes for caching the hardware pagetables into the
system cache and later can be extended to include other page
table configuration data.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 20 
 drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 +
 include/linux/io-pgtable.h|  4 
 include/linux/iommu.h |  1 +
 4 files changed, 26 insertions(+)


Given that we're heading for a v10 to address my comments on patch 3,
then I guess you may as well split this into two patches so that I can
share just the attribute with Rob rather than the driver parts.

Please keep it all as one series though, with the common parts at the
beginning, and I'll figure it out.



Ok I will split up and send v10.

Thanks,
Sai

--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


[PATCHv10 8/9] iommu: arm-smmu-impl: Use table to list QCOM implementations

2020-11-25 Thread Sai Prakash Ranjan
Use table and of_match_node() to match qcom implementation
instead of multiple of_device_compatible() calls for each
QCOM SMMU implementation.

Signed-off-by: Sai Prakash Ranjan 
Acked-by: Will Deacon 
---
 drivers/iommu/arm/arm-smmu/arm-smmu-impl.c |  9 +
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 21 -
 drivers/iommu/arm/arm-smmu/arm-smmu.h  |  1 -
 3 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
index 7fed89c9d18a..26e2734eb4d7 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-impl.c
@@ -214,14 +214,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
if (of_device_is_compatible(np, "nvidia,tegra194-smmu"))
return nvidia_smmu_impl_init(smmu);
 
-   if (of_device_is_compatible(np, "qcom,sdm845-smmu-500") ||
-   of_device_is_compatible(np, "qcom,sc7180-smmu-500") ||
-   of_device_is_compatible(np, "qcom,sm8150-smmu-500") ||
-   of_device_is_compatible(np, "qcom,sm8250-smmu-500"))
-   return qcom_smmu_impl_init(smmu);
-
-   if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
-   return qcom_adreno_smmu_impl_init(smmu);
+   smmu = qcom_smmu_impl_init(smmu);
 
if (of_device_is_compatible(np, "marvell,ap806-smmu-500"))
smmu->impl = &mrvl_mmu500_impl;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index d0636c803a36..add1859b2899 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -318,12 +318,23 @@ static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
return &qsmmu->smmu;
 }
 
+static const struct of_device_id __maybe_unused qcom_smmu_impl_of_match[] = {
+   { .compatible = "qcom,sc7180-smmu-500" },
+   { .compatible = "qcom,sdm845-smmu-500" },
+   { .compatible = "qcom,sm8150-smmu-500" },
+   { .compatible = "qcom,sm8250-smmu-500" },
+   { }
+};
+
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
 {
-   return qcom_smmu_create(smmu, &qcom_smmu_impl);
-}
+   const struct device_node *np = smmu->dev->of_node;
 
-struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu)
-{
-   return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
+   if (of_match_node(qcom_smmu_impl_of_match, np))
+   return qcom_smmu_create(smmu, &qcom_smmu_impl);
+
+   if (of_device_is_compatible(np, "qcom,adreno-smmu"))
+   return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
+
+   return smmu;
 }
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index cb7ca3a444c9..d2a2d1bc58ba 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -523,7 +523,6 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *nvidia_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
-struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);
 
 void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
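As an aside, the pattern introduced above — one sentinel-terminated table walked by `of_match_node()` instead of chained `of_device_is_compatible()` calls — can be sketched standalone. The `match_compatible()` helper below is hypothetical, with plain `strcmp()` standing in for the OF machinery:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sentinel-terminated table, mirroring the { } terminator in the
 * qcom_smmu_impl_of_match table from the patch. */
static const char *const qcom_compatibles[] = {
	"qcom,sc7180-smmu-500",
	"qcom,sdm845-smmu-500",
	"qcom,sm8150-smmu-500",
	"qcom,sm8250-smmu-500",
	NULL,
};

/* One table walk replaces a chain of per-string compatibility checks;
 * returns 1 on a match, 0 otherwise. */
static int match_compatible(const char *compat)
{
	for (size_t i = 0; qcom_compatibles[i] != NULL; i++)
		if (strcmp(qcom_compatibles[i], compat) == 0)
			return 1;
	return 0;
}
```

The win is that adding a new SoC becomes a one-line table entry rather than another condition in an ever-growing `if`.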



Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Miguel Ojeda
On Tue, Nov 24, 2020 at 1:58 AM Finn Thain  wrote:
>
> What I meant was that you've used pessimism as if it was fact.

"future mistakes that it might prevent" is neither pessimism nor states a fact.

> For example, "There is no way to guess what the effect would be if the
> compiler trained programmers to add a knee-jerk 'break' statement to avoid
> a warning".

It is only knee-jerk if you think you are infallible.

> Moreover, what I meant was that preventing programmer mistakes is a
> problem to be solved by development tools

This warning comes from a development tool -- the compiler.

> The idea that retro-fitting new
> language constructs onto mature code is somehow necessary to "prevent
> future mistakes" is entirely questionable.

The kernel is not a frozen codebase.

Further, "mature code vs. risk of change" arguments don't apply here
because the semantics of the program and binary output isn't changing.

> Sure. And if you put -Wimplicit-fallthrough into the Makefile and if that
> leads to well-intentioned patches that cause regressions, it is partly on
> you.

Again: adding a `fallthrough` does not change the program semantics.
If you are a maintainer and want to cross-check, compare the codegen.

> Have you ever considered the overall cost of the countless
> -Wpresume-incompetence flags?

Yeah: negative. On the other hand, the overall cost of the countless
-fI-am-infallible flags is very noticeable.

> Perhaps you pay the power bill for a build farm that produces logs that
> no-one reads? Perhaps you've run git bisect, knowing that the compiler
> messages are not interesting? Or compiled software in using a language
> that generates impenetrable messages? If so, here's a tip:
>
> # grep CFLAGS /etc/portage/make.conf
> CFLAGS="... -Wno-all -Wno-extra ..."
> CXXFLAGS="${CFLAGS}"
>
> Now allow me some pessimism: the hardware upgrades, gigawatt hours and
> wait time attributable to obligatory static analyses are a net loss.

If you really believe compiler warnings and static analysis are
useless and costly, I think there is not much point in continuing the
discussion.

> No, it's not for me to prove that such patches don't affect code
> generation. That's for the patch author and (unfortunately) for reviewers.

I was not asking you to prove it. I am stating that proving it is very easy.

Cheers,
Miguel


Re: [PATCH v6 17/17] RFC: mm: add mmu_notifier argument to follow_pfn

2020-11-25 Thread Jason Gunthorpe
On Tue, Nov 24, 2020 at 03:28:14PM +0100, Daniel Vetter wrote:
> On Fri, Nov 20, 2020 at 02:30:29PM -0400, Jason Gunthorpe wrote:
> > On Thu, Nov 19, 2020 at 03:41:46PM +0100, Daniel Vetter wrote:
> > > @@ -4805,21 +4824,15 @@ EXPORT_SYMBOL(follow_pte_pmd);
> > >   * Return: zero and the pfn at @pfn on success, -ve otherwise.
> > >   */
> > >  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
> > > - unsigned long *pfn)
> > > + unsigned long *pfn, struct mmu_notifier *subscription)
> > >  {
> > > - int ret = -EINVAL;
> > > - spinlock_t *ptl;
> > > - pte_t *ptep;
> > > + if (WARN_ON(!subscription->mm))
> > > + return -EINVAL;
> > >  
> > > + if (WARN_ON(subscription->mm != vma->vm_mm))
> > > + return -EINVAL;
> > 
> > These two things are redundant right? vma->vm_mm != NULL?
> 
> Yup, will remove.
> 
> > BTW, why do we even have this for nommu? If the only caller is kvm,
> > can you even compile kvm on nommu??
> 
> Kinda makes sense, but I have no idea how to make sure with compile
> testing this is really the case. And I didn't see any hard evidence in
> Kconfig or Makefile that mmu notifiers requires CONFIG_MMU. So not sure
> what to do here.

It looks like only some arches have selectable CONFIG_MMU: arm,
m68k, microblaze, riscv, sh

If we look at arches that work with HAVE_KVM, I only see: arm64, mips,
powerpc, s390, x86

So my conclusion is there is no intersection between !MMU and HAVE_KVM?

> Should I just remove the nommu version of follow_pfn and see what happens?
> We can't remove it earlier since it's still used by other
> subsystems.

This is what I was thinking might work

Jason


Re: [PATCH rdma-core v2 2/6] verbs: Support dma-buf based memory region

2020-11-25 Thread Yishai Hadas

On 11/24/2020 11:38 PM, Jianxin Xiong wrote:

Add new API function and new provider method for registering dma-buf
based memory region. Update the man page and bump the API version.

Signed-off-by: Jianxin Xiong 


I don't see that this V2 fixes the notes that I published on V1 (fork
handling, man page fix).




---
  debian/libibverbs1.symbols   |  2 ++
  libibverbs/CMakeLists.txt|  2 +-
  libibverbs/cmd_mr.c  | 38 ++
  libibverbs/driver.h  |  7 +++
  libibverbs/dummy_ops.c   | 11 +++
  libibverbs/libibverbs.map.in |  6 ++
  libibverbs/man/ibv_reg_mr.3  | 21 +++--
  libibverbs/verbs.c   | 17 +
  libibverbs/verbs.h   | 11 +++
  9 files changed, 112 insertions(+), 3 deletions(-)

diff --git a/debian/libibverbs1.symbols b/debian/libibverbs1.symbols
index 9130f41..fcf4d87 100644
--- a/debian/libibverbs1.symbols
+++ b/debian/libibverbs1.symbols
@@ -9,6 +9,7 @@ libibverbs.so.1 libibverbs1 #MINVER#
   IBVERBS_1.9@IBVERBS_1.9 30
   IBVERBS_1.10@IBVERBS_1.10 31
   IBVERBS_1.11@IBVERBS_1.11 32
+ IBVERBS_1.12@IBVERBS_1.12 33
   (symver)IBVERBS_PRIVATE_33 33
   _ibv_query_gid_ex@IBVERBS_1.11 32
   _ibv_query_gid_table@IBVERBS_1.11 32
@@ -99,6 +100,7 @@ libibverbs.so.1 libibverbs1 #MINVER#
   ibv_rate_to_mbps@IBVERBS_1.1 1.1.8
   ibv_rate_to_mult@IBVERBS_1.0 1.1.6
   ibv_read_sysfs_file@IBVERBS_1.0 1.1.6
+ ibv_reg_dmabuf_mr@IBVERBS_1.12 33
   ibv_reg_mr@IBVERBS_1.0 1.1.6
   ibv_reg_mr@IBVERBS_1.1 1.1.6
   ibv_reg_mr_iova@IBVERBS_1.7 25
diff --git a/libibverbs/CMakeLists.txt b/libibverbs/CMakeLists.txt
index 0fe4256..d075225 100644
--- a/libibverbs/CMakeLists.txt
+++ b/libibverbs/CMakeLists.txt
@@ -21,7 +21,7 @@ configure_file("libibverbs.map.in"
  
  rdma_library(ibverbs "${CMAKE_CURRENT_BINARY_DIR}/libibverbs.map"

# See Documentation/versioning.md
-  1 1.11.${PACKAGE_VERSION}
+  1 1.12.${PACKAGE_VERSION}
all_providers.c
cmd.c
cmd_ah.c
diff --git a/libibverbs/cmd_mr.c b/libibverbs/cmd_mr.c
index 42dbe42..95ed2d1 100644
--- a/libibverbs/cmd_mr.c
+++ b/libibverbs/cmd_mr.c
@@ -1,5 +1,6 @@
  /*
   * Copyright (c) 2018 Mellanox Technologies, Ltd.  All rights reserved.
+ * Copyright (c) 2020 Intel Corporation.  All rights reserved.
   *
   * This software is available to you under a choice of one of two
   * licenses.  You may choose to be licensed under the terms of the GNU
@@ -116,3 +117,40 @@ int ibv_cmd_query_mr(struct ibv_pd *pd, struct verbs_mr *vmr,
return 0;
  }
  
+int ibv_cmd_reg_dmabuf_mr(struct ibv_pd *pd, uint64_t offset, size_t length,

+ uint64_t iova, int fd, int access,
+ struct verbs_mr *vmr)
+{
+   DECLARE_COMMAND_BUFFER(cmdb, UVERBS_OBJECT_MR,
+  UVERBS_METHOD_REG_DMABUF_MR,
+  9);
+   struct ib_uverbs_attr *handle;
+   uint32_t lkey, rkey;
+   int ret;
+
+   handle = fill_attr_out_obj(cmdb, UVERBS_ATTR_REG_DMABUF_MR_HANDLE);
+   fill_attr_out_ptr(cmdb, UVERBS_ATTR_REG_DMABUF_MR_RESP_LKEY, &lkey);
+   fill_attr_out_ptr(cmdb, UVERBS_ATTR_REG_DMABUF_MR_RESP_RKEY, &rkey);
+
+   fill_attr_in_obj(cmdb, UVERBS_ATTR_REG_DMABUF_MR_PD_HANDLE, pd->handle);
+   fill_attr_in_uint64(cmdb, UVERBS_ATTR_REG_DMABUF_MR_OFFSET, offset);
+   fill_attr_in_uint64(cmdb, UVERBS_ATTR_REG_DMABUF_MR_LENGTH, length);
+   fill_attr_in_uint64(cmdb, UVERBS_ATTR_REG_DMABUF_MR_IOVA, iova);
+   fill_attr_in_uint32(cmdb, UVERBS_ATTR_REG_DMABUF_MR_FD, fd);
+   fill_attr_in_uint32(cmdb, UVERBS_ATTR_REG_DMABUF_MR_ACCESS_FLAGS, access);
+
+   ret = execute_ioctl(pd->context, cmdb);
+   if (ret)
+   return errno;
+
+   vmr->ibv_mr.handle = read_attr_obj(UVERBS_ATTR_REG_DMABUF_MR_HANDLE,
+  handle);
+   vmr->ibv_mr.context = pd->context;
+   vmr->ibv_mr.lkey = lkey;
+   vmr->ibv_mr.rkey = rkey;
+   vmr->ibv_mr.pd = pd;
+   vmr->ibv_mr.addr = (void *)offset;
+   vmr->ibv_mr.length = length;
+   vmr->mr_type = IBV_MR_TYPE_MR;
+   return 0;
+}
diff --git a/libibverbs/driver.h b/libibverbs/driver.h
index ab80f4b..d6a9d0a 100644
--- a/libibverbs/driver.h
+++ b/libibverbs/driver.h
@@ -2,6 +2,7 @@
   * Copyright (c) 2004, 2005 Topspin Communications.  All rights reserved.
   * Copyright (c) 2005, 2006 Cisco Systems, Inc.  All rights reserved.
   * Copyright (c) 2005 PathScale, Inc.  All rights reserved.
+ * Copyright (c) 2020 Intel Corporation. All rights reserved.
   *
   * This software is available to you under a choice of one of two
   * licenses.  You may choose to be licensed under the terms of the GNU
@@ -373,6 +374,9 @@ struct verbs_context_ops {
struct ibv_mr *(*reg_dm_mr)(struct ibv_pd *pd, struct ibv_dm *dm,
uint64_t dm_offset, size_t length,
unsigned int

[PATCHv10 7/9] drm/msm/a6xx: Add support for using system cache on MMU500 based targets

2020-11-25 Thread Sai Prakash Ranjan
From: Jordan Crouse 

GPU targets with an MMU-500 attached have a slightly different process for
enabling system cache. Use the compatible string on the IOMMU phandle
to see if an MMU-500 is attached and modify the programming sequence
accordingly.

Signed-off-by: Jordan Crouse 
Signed-off-by: Sai Prakash Ranjan 
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 46 +--
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  1 +
 2 files changed, 37 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 95c98c642876..3f8b92da8cba 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1042,6 +1042,8 @@ static void a6xx_llc_deactivate(struct a6xx_gpu *a6xx_gpu)
 
 static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
 {
+   struct adreno_gpu *adreno_gpu = &a6xx_gpu->base;
+   struct msm_gpu *gpu = &adreno_gpu->base;
u32 cntl1_regval = 0;
 
if (IS_ERR(a6xx_gpu->llc_mmio))
@@ -1055,11 +1057,17 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
   (gpu_scid << 15) | (gpu_scid << 20);
}
 
+   /*
+* For targets with a MMU500, activate the slice but don't program the
+* register.  The XBL will take care of that.
+*/
if (!llcc_slice_activate(a6xx_gpu->htw_llc_slice)) {
-   u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice);
+   if (!a6xx_gpu->have_mmu500) {
+   u32 gpuhtw_scid = llcc_get_slice_id(a6xx_gpu->htw_llc_slice);
 
-   gpuhtw_scid &= 0x1f;
-   cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid);
+   gpuhtw_scid &= 0x1f;
+   cntl1_regval |= FIELD_PREP(GENMASK(29, 25), gpuhtw_scid);
+   }
}
 
if (cntl1_regval) {
@@ -1067,13 +1075,20 @@ static void a6xx_llc_activate(struct a6xx_gpu *a6xx_gpu)
 * Program the slice IDs for the various GPU blocks and GPU MMU
 * pagetables
 */
-   a6xx_llc_write(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval);
-
-   /*
-* Program cacheability overrides to not allocate cache lines on
-* a write miss
-*/
-   a6xx_llc_rmw(a6xx_gpu, REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03);
+   if (a6xx_gpu->have_mmu500)
+   gpu_rmw(gpu, REG_A6XX_GBIF_SCACHE_CNTL1, GENMASK(24, 0),
+   cntl1_regval);
+   else {
+   a6xx_llc_write(a6xx_gpu,
+   REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_1, cntl1_regval);
+
+   /*
+* Program cacheability overrides to not allocate cache
+* lines on a write miss
+*/
+   a6xx_llc_rmw(a6xx_gpu,
+   REG_A6XX_CX_MISC_SYSTEM_CACHE_CNTL_0, 0xF, 0x03);
+   }
}
 }
 
@@ -1086,10 +1101,21 @@ static void a6xx_llc_slices_destroy(struct a6xx_gpu *a6xx_gpu)
 static void a6xx_llc_slices_init(struct platform_device *pdev,
struct a6xx_gpu *a6xx_gpu)
 {
+   struct device_node *phandle;
+
a6xx_gpu->llc_mmio = msm_ioremap(pdev, "cx_mem", "gpu_cx");
if (IS_ERR(a6xx_gpu->llc_mmio))
return;
 
+   /*
+* There is a different programming path for targets with an mmu500
+* attached, so detect if that is the case
+*/
+   phandle = of_parse_phandle(pdev->dev.of_node, "iommus", 0);
+   a6xx_gpu->have_mmu500 = (phandle &&
+   of_device_is_compatible(phandle, "arm,mmu-500"));
+   of_node_put(phandle);
+
a6xx_gpu->llc_slice = llcc_slice_getd(LLCC_GPU);
a6xx_gpu->htw_llc_slice = llcc_slice_getd(LLCC_GPUHTW);
 
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 9e6079af679c..e793d329e77b 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -32,6 +32,7 @@ struct a6xx_gpu {
void __iomem *llc_mmio;
void *llc_slice;
void *htw_llc_slice;
+   bool have_mmu500;
 };
 
 #define to_a6xx_gpu(x) container_of(x, struct a6xx_gpu, base)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCHv10 4/9] iommu/arm-smmu: Move non-strict mode to use io_pgtable_domain_attr

2020-11-25 Thread Sai Prakash Ranjan
Now that we have a struct io_pgtable_domain_attr with quirks,
use that for non_strict mode as well thereby removing the need
for more members of arm_smmu_domain in the future.

Signed-off-by: Sai Prakash Ranjan 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 15 +--
 drivers/iommu/arm/arm-smmu/arm-smmu.h |  1 -
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 4b9b10fe50ed..d8979bb71fc0 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -786,9 +786,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
goto out_clear_smmu;
}
 
-   if (smmu_domain->non_strict)
-   pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
-
if (smmu_domain->pgtbl_cfg.quirks)
pgtbl_cfg.quirks |= smmu_domain->pgtbl_cfg.quirks;
 
@@ -1526,9 +1523,12 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
break;
case IOMMU_DOMAIN_DMA:
switch (attr) {
-   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   *(int *)data = smmu_domain->non_strict;
+   case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE: {
+   bool non_strict = smmu_domain->pgtbl_cfg.quirks &
+ IO_PGTABLE_QUIRK_NON_STRICT;
+   *(int *)data = non_strict;
return 0;
+   }
default:
return -ENODEV;
}
@@ -1578,7 +1578,10 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
case IOMMU_DOMAIN_DMA:
switch (attr) {
case DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE:
-   smmu_domain->non_strict = *(int *)data;
+   if (*(int *)data)
+   smmu_domain->pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+   else
+   smmu_domain->pgtbl_cfg.quirks &= ~IO_PGTABLE_QUIRK_NON_STRICT;
break;
default:
ret = -ENODEV;
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.h b/drivers/iommu/arm/arm-smmu/arm-smmu.h
index bb5a419f240f..cb7ca3a444c9 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.h
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.h
@@ -368,7 +368,6 @@ struct arm_smmu_domain {
const struct iommu_flush_ops*flush_ops;
struct arm_smmu_cfg cfg;
enum arm_smmu_domain_stage  stage;
-   boolnon_strict;
struct mutexinit_mutex; /* Protects smmu pointer */
spinlock_t  cb_lock; /* Serialises ATS1* ops and TLB syncs */
struct iommu_domain domain;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
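The set_attr/get_attr change above is the standard set/clear/test bitmask idiom applied to the quirks field. A standalone sketch, with a hypothetical `QUIRK_NON_STRICT` bit standing in for `IO_PGTABLE_QUIRK_NON_STRICT`:

```c
#include <assert.h>

/* Hypothetical quirk bit; stands in for IO_PGTABLE_QUIRK_NON_STRICT. */
#define QUIRK_NON_STRICT (1u << 0)

/* Setting stores the bit, clearing removes it, without touching any
 * other quirk bits already recorded in the field. */
static unsigned int set_non_strict(unsigned int quirks, int enable)
{
	if (enable)
		quirks |= QUIRK_NON_STRICT;
	else
		quirks &= ~QUIRK_NON_STRICT;
	return quirks;
}

/* The getter reports the bit as a plain boolean, as the reworked
 * DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE read path does. */
static int get_non_strict(unsigned int quirks)
{
	return (quirks & QUIRK_NON_STRICT) != 0;
}
```

Storing the flag in the shared quirks word is what removes the need for the dedicated `non_strict` member dropped by this patch.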



Re: Linux 5.10-rc4; graphics alignment

2020-11-25 Thread Thomas Zimmermann

Hi

On 24.11.20 at 17:27, David Laight wrote:

From: David Laight

Sent: 20 November 2020 15:39

From: Thomas Zimmermann

Sent: 20 November 2020 13:42

...

I did a diff from v5.10-rc4 to drm-tip to look for suspicious changes.
Some candidates are

8e3784dfef8a ("drm/ast: Reload gamma LUT after changing primary
plane's color format")


Ok, that one fixes the screen colours (etc).
So 8e3784dfef8a was good and then HEAD^ was bad.

I might try to bisect the breakage.

The stack splat is entirely different.
I'll try to bisect that on Linus's tree.


The good news is I'm not getting the stack splat on rc5.
I'm not sure I can be bothered to find out when :-)

Applying 8e3784dfef8a to rc5 by hand also fixes the display colours.


I've added this commit to drm-misc-fixes and it should show up in the 
upstream kernel soonish.


Best regards
Thomas



David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


OpenPGP_0x680DC11D530B7A23.asc
Description: application/pgp-keys


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Thomas Zimmermann

Hi

On 24.11.20 at 15:06, Christian König wrote:

On 24.11.20 at 14:56, Thomas Zimmermann wrote:

Hi

On 24.11.20 at 14:36, Christian König wrote:

On 24.11.20 at 13:15, Thomas Zimmermann wrote:

[SNIP]
First I wanted to put this into drm_gem_ttm_vmap/vunmap(), but 
then wondered why ttm_bo_vmap() doe not acquire the lock 
internally? I'd expect that vmap/vunmap are close together and 
do not overlap for the same BO. 


We have use cases like the following during command submission:

1. lock
2. map
3. copy parts of the BO content somewhere else or patch it with 
additional information

4. unmap
5. submit BO to the hardware
6. add hardware fence to the BO to make sure it doesn't move
7. unlock

That use case won't be possible with vmap/vunmap if we move the 
lock/unlock into it and I hope to replace the kmap/kunmap 
functions with them in the near term.


Otherwise, acquiring the reservation lock would require another 
ref-counting variable or per-driver code.


Hui, why that? Just put this into drm_gem_ttm_vmap/vunmap() 
helper as you initially planned.


Given your example above, step one would acquire the lock, and 
step two would also acquire the lock as part of the vmap 
implementation. Wouldn't this fail (At least during unmap or 
unlock steps) ?


Oh, so you want to nest them? No, that is a rather bad no-go.


I don't want to nest/overlap them. My question was whether that 
would be required. Apparently not.


While the console's BO is being set for scanout, it's protected from 
movement via the pin/unpin implementation, right?


Yes, correct.

The driver does not acquire the resv lock for longer periods. I'm 
asking because this would prevent any console-buffer updates while 
the console is being displayed.


Correct as well, we only hold the lock for things like command 
submission, pinning, unpinning etc etc




Thanks for answering my questions.





You need to make sure that the lock is only taken from the FB path 
which wants to vmap the object.


Why don't you lock the GEM object from the caller in the generic FB 
implementation?


With the current blitter code, it breaks abstraction. if vmap/vunmap 
hold the lock implicitly, things would be easier.


Do you have a link to the code?


It's the damage blitter in the fbdev code. [1] While it flushes the 
shadow buffer into the BO, the BO has to be kept in place. I already 
changed it to lock struct drm_fb_helper.lock, but I don't think this 
is enough. TTM could still evict the BO concurrently.


Yeah, that's correct.

But I still don't fully understand the problem. You just need to change 
the code like this:


     mutex_lock(&fb_helper->lock);
     dma_resv_lock(buffer->gem->resv, NULL);

     ret = drm_client_buffer_vmap(buffer, &map);
     if (ret)
     goto out;

     dst = map;
     drm_fb_helper_damage_blit_real(fb_helper, clip, &dst);

     drm_client_buffer_vunmap(buffer);

out:
     dma_resv_unlock(buffer->gem->resv);
     mutex_unlock(&fb_helper->lock);



Yes, that's the code I had in mind.
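A standalone sketch of the ordering being agreed on here, with hypothetical stubs recording each step (none of these are the real DRM/TTM calls; the comments name what each stub stands for):

```c
#include <assert.h>
#include <string.h>

static char trace[128];

/* Record each step so the ordering can be inspected afterwards. */
static void step(const char *name)
{
	strcat(trace, name);
	strcat(trace, ";");
}

/* The damage-blit path: take the helper lock, then the reservation
 * lock, vmap, blit, vunmap, and release both locks in reverse order. */
static const char *damage_blit(void)
{
	trace[0] = '\0';
	step("helper_lock");   /* mutex_lock(&fb_helper->lock)   */
	step("resv_lock");     /* dma_resv_lock(gem->resv, NULL) */
	step("vmap");          /* drm_client_buffer_vmap()       */
	step("blit");          /* flush shadow buffer into BO    */
	step("vunmap");        /* drm_client_buffer_vunmap()     */
	step("resv_unlock");   /* dma_resv_unlock()              */
	step("helper_unlock"); /* mutex_unlock()                 */
	return trace;
}
```

Holding the reservation lock across the whole map/blit/unmap span is what keeps TTM from evicting the BO mid-blit.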



You could abstract that in drm_client functions as well, but I don't 
really see the value in that.


The fbdev code tries hard to not use GEM directly, but to wrap 
everything behind client interfaces. I'm not sure if I like that, but 
for now I'd stick to this design.


Best regards
Thomas



Regards,
Christian.

There's no recursion taking place, so I guess the reservation lock 
could be acquired/release in drm_client_buffer_vmap/vunmap(), or a 
separate pair of DRM client functions could do the locking.


Best regards
Thomas

[1] 
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/drm_fb_helper.c?id=ac60f3f3090115d21f028bffa2dcfb67f695c4f2#n394 





Please note that the reservation lock you need to take here is part 
of the GEM object.


Usually we design things in the way that the code needs to take a 
lock which protects an object, then do some operations with the 
object and then release the lock again.


Having in the lock inside the operation can be done as well, but 
returning with it is kind of unusual design.


Sorry for the noob questions. I'm still trying to understand the 
implications of acquiring these locks.


Well this is the reservation lock of the GEM object we are talking 
about here. We need to take that for a couple of different 
operations, vmap/vunmap doesn't sound like a special case to me.


Regards,
Christian.



Best regards
Thomas








--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer


OpenPGP_0x680DC11D530B7A23.asc
Description: application/pgp-keys

Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Thomas Zimmermann

Hi

On 24.11.20 at 15:09, Daniel Vetter wrote:

On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:

Hi

On 24.11.20 at 14:36, Christian König wrote:

On 24.11.20 at 13:15, Thomas Zimmermann wrote:

[SNIP]

First I wanted to put this into
drm_gem_ttm_vmap/vunmap(), but then wondered why
ttm_bo_vmap() does not acquire the lock internally?
I'd expect that vmap/vunmap are close together and
do not overlap for the same BO.


We have use cases like the following during command submission:

1. lock
2. map
3. copy parts of the BO content somewhere else or patch
it with additional information
4. unmap
5. submit BO to the hardware
6. add hardware fence to the BO to make sure it doesn't move
7. unlock

That use case won't be possible with vmap/vunmap if we
move the lock/unlock into it and I hope to replace the
kmap/kunmap functions with them in the near term.


Otherwise, acquiring the reservation lock would
require another ref-counting variable or per-driver
code.


Hui, why that? Just put this into
drm_gem_ttm_vmap/vunmap() helper as you initially
planned.


Given your example above, step one would acquire the lock,
and step two would also acquire the lock as part of the vmap
implementation. Wouldn't this fail (At least during unmap or
unlock steps) ?


Oh, so you want to nest them? No, that is a rather bad no-go.


I don't want to nest/overlap them. My question was whether that
would be required. Apparently not.

While the console's BO is being set for scanout, it's protected from
movement via the pin/unpin implementation, right?


Yes, correct.


The driver does not acquire the resv lock for longer periods. I'm
asking because this would prevent any console-buffer updates while
the console is being displayed.


Correct as well, we only hold the lock for things like command
submission, pinning, unpinning etc etc



Thanks for answering my questions.





You need to make sure that the lock is only taken from the FB
path which wants to vmap the object.

Why don't you lock the GEM object from the caller in the generic
FB implementation?


With the current blitter code, it breaks abstraction. If vmap/vunmap
hold the lock implicitly, things would be easier.


Do you have a link to the code?


It's the damage blitter in the fbdev code. [1] While it flushes the shadow
buffer into the BO, the BO has to be kept in place. I already changed it to
lock struct drm_fb_helper.lock, but I don't think this is enough. TTM could
still evict the BO concurrently.


So I'm not sure this is actually a problem: ttm could try to concurrently
evict the buffer we pinned into vram, and then just skip to the next one.

Plus atm generic fbdev isn't used on any chip where we really care about
those last few MiB of VRAM being usable for command submission (well atm
there's no driver using it).


Well, this is the patchset for radeon. If it works out, amdgpu and 
nouveau are natural next choices. Especially radeon and nouveau support 
cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly 
matter.




Having the buffer pinned into system memory and trying to do a concurrent
modeset that tries to pull it in is the hard failure mode. And holding
fb_helper.lock fully prevents that.

So not really clear on what failure mode you're seeing here?


Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or Wayland 
is running.) The fbdev BO is a few MiBs and not in use, so TTM would 
want to evict it if memory gets tight.


What I have in mind is a concurrent modeset that requires the memory. 
If we do a concurrent damage blit without protecting against eviction, 
things go boom. Same for concurrent 3d graphics with textures, model 
data, etc.


Best regards
Thomas




There's no recursion taking place, so I guess the reservation lock could be
acquired/released in drm_client_buffer_vmap/vunmap(), or a separate pair of
DRM client functions could do the locking.


Given how this "do the right locking" is a can of worms (and I think it's
worse than what you dug out already) I think the fb_helper.lock hack is
perfectly good enough.

I'm also somewhat worried that starting to use dma_resv lock in generic
code, while many helpers/drivers still have their hand-rolled locking,
will make conversion over to dma_resv needlessly more complicated.
-Daniel



Best regards
Thomas

[1] 
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/drm_fb_helper.c?id=ac60f3f3090115d21f028bffa2dcfb67f695c4f2#n394



Please note that the reservation lock you need to take here is part of
the GEM object.

Usually we design things in the way that the code needs to take a lock
which protects an object, then do some operations with the object and
then release the lock again.

Having the lock taken inside the operation can be done as well, but
returning with it held is kind of an unusual design.


Sorry for the noob questions. I'm still trying to understand the
implications of acquiring these locks.


Well this is the reservation lock of the GEM

Re: nwl-dsi: fixup mode only for LCDIF input, not DCSS

2020-11-25 Thread Guido Günther
Hi Lukas,
On Tue, Nov 24, 2020 at 06:12:17PM +0100, Lukas F. Hartmann wrote:
> The fixup of HSYNC and VSYNC should not be done when the input source is
> DCSS; otherwise the internal display does not work on MNT Reform 2 (open hardware 
> laptop based on NXP i.MX8M using DCSS->DSI->eDP for internal display).
> 
> Signed-off-by: Lukas F. Hartmann 
> ---
>  drivers/gpu/drm/bridge/nwl-dsi.c | 12 +---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/bridge/nwl-dsi.c 
> b/drivers/gpu/drm/bridge/nwl-dsi.c
> index 66b67402f..6735ab2a2 100644
> --- a/drivers/gpu/drm/bridge/nwl-dsi.c
> +++ b/drivers/gpu/drm/bridge/nwl-dsi.c
> @@ -807,10 +807,16 @@ static bool nwl_dsi_bridge_mode_fixup(struct drm_bridge 
> *bridge,
> const struct drm_display_mode *mode,
> struct drm_display_mode *adjusted_mode)
>  {
> - /* At least LCDIF + NWL needs active high sync */
> - adjusted_mode->flags |= (DRM_MODE_FLAG_PHSYNC | DRM_MODE_FLAG_PVSYNC);
> - adjusted_mode->flags &= ~(DRM_MODE_FLAG_NHSYNC | DRM_MODE_FLAG_NVSYNC);
> + struct device_node *remote;
> + struct nwl_dsi *dsi = bridge_to_dsi(bridge);
> +
> + remote = of_graph_get_remote_node(dsi->dev->of_node, 0,
> + NWL_DSI_ENDPOINT_LCDIF);
> + if (remote) {
> + /* At least LCDIF + NWL needs active high sync */
> + adjusted_mode->flags |= (DRM_MODE_FLAG_PHSYNC | 
> DRM_MODE_FLAG_PVSYNC);
> + adjusted_mode->flags &= ~(DRM_MODE_FLAG_NHSYNC | 
> DRM_MODE_FLAG_NVSYNC);
> + }

When submitting the NWL driver I was told not to change properties based
on the endpoint. The argument is that this breaks when putting the
bridge into another chain and that there might be other bridges in
between. Maybe Laurent and Andrzej have a suggestion?

I intend to respin the input mux bridge
(https://lore.kernel.org/dri-devel/cover.1589548223.git@sigxcpu.org/)
at some point but even then we need to carry over the flags, so any
input on how that should best be done would be welcome.

Cheers,
 -- Guido

> 
>   return true;
>  }
> --
> 2.28.0
> 


Re: [PATCH v6 17/17] RFC: mm: add mmu_notifier argument to follow_pfn

2020-11-25 Thread Daniel Vetter
On Wed, Nov 25, 2020 at 9:13 AM Jason Gunthorpe  wrote:
>
> On Tue, Nov 24, 2020 at 03:28:14PM +0100, Daniel Vetter wrote:
> > On Fri, Nov 20, 2020 at 02:30:29PM -0400, Jason Gunthorpe wrote:
> > > On Thu, Nov 19, 2020 at 03:41:46PM +0100, Daniel Vetter wrote:
> > > > @@ -4805,21 +4824,15 @@ EXPORT_SYMBOL(follow_pte_pmd);
> > > >   * Return: zero and the pfn at @pfn on success, -ve otherwise.
> > > >   */
> > > >  int follow_pfn(struct vm_area_struct *vma, unsigned long address,
> > > > - unsigned long *pfn)
> > > > + unsigned long *pfn, struct mmu_notifier *subscription)
> > > >  {
> > > > - int ret = -EINVAL;
> > > > - spinlock_t *ptl;
> > > > - pte_t *ptep;
> > > > + if (WARN_ON(!subscription->mm))
> > > > + return -EINVAL;
> > > >
> > > > + if (WARN_ON(subscription->mm != vma->vm_mm))
> > > > + return -EINVAL;
> > >
> > > These two things are redundant right? vma->vm_mm != NULL?
> >
> > Yup, will remove.
> >
> > > BTW, why do we even have this for nommu? If the only caller is kvm,
> > > can you even compile kvm on nommu??
> >
> > Kinda makes sense, but I have no idea how to make sure with compile
> > testing this is really the case. And I didn't see any hard evidence in
> > Kconfig or Makefile that mmu notifiers require CONFIG_MMU. So not sure
> > what to do here.
>
> It looks like only some arches have selectable CONFIG_MMU: arm,
> m68k, microblaze, riscv, sh
>
> If we look at arches that work with HAVE_KVM, I only see: arm64, mips,
> powerpc, s390, x86
>
> So my conclusion is there is no intersection between !MMU and HAVE_KVM?
>
> > Should I just remove the nommu version of follow_pfn and see what happens?
> > We can't remove it earlier since it's still used by other
> > subsystems.
>
> This is what I was thinking might work

Makes sense, I'll do that for the next round.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v3 10/12] drm/amdgpu: Avoid sysfs dirs removal post device unplug

2020-11-25 Thread Daniel Vetter
On Tue, Nov 24, 2020 at 11:27 PM Andrey Grodzovsky
 wrote:
>
>
> On 11/24/20 9:49 AM, Daniel Vetter wrote:
> > On Sat, Nov 21, 2020 at 12:21:20AM -0500, Andrey Grodzovsky wrote:
> >> Avoids NULL ptr due to kobj->sd being unset on device removal.
> >>
> >> Signed-off-by: Andrey Grodzovsky 
> >> ---
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c   | 4 +++-
> >>   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 4 +++-
> >>   2 files changed, 6 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> index caf828a..812e592 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> >> @@ -27,6 +27,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>
> >>   #include "amdgpu.h"
> >>   #include "amdgpu_ras.h"
> >> @@ -1043,7 +1044,8 @@ static int 
> >> amdgpu_ras_sysfs_remove_feature_node(struct amdgpu_device *adev)
> >>  .attrs = attrs,
> >>  };
> >>
> >> -sysfs_remove_group(&adev->dev->kobj, &group);
> >> +if (!drm_dev_is_unplugged(&adev->ddev))
> >> +sysfs_remove_group(&adev->dev->kobj, &group);
> > This looks wrong. sysfs, like any other interface, should be
> > unconditionally thrown out when we do the drm_dev_unregister. Whether
> > hotunplugged or not shouldn't matter at all. Either this isn't needed at all,
> > or something is wrong with the ordering here. But definitely fishy.
> > -Daniel
>
>
> So technically this is needed because kobject's sysfs directory entry kobj->sd
> is set to NULL
> on device removal (from sysfs_remove_dir), but because we don't finalize the
> device
> until the last reference to the drm file is dropped (which can happen later) we end up
> calling sysfs_remove_file/dir after
> this pointer is NULL. sysfs_remove_file checks for NULL and aborts, while
> sysfs_remove_dir
> does not, and that is why I guard against calls to sysfs_remove_dir.
> But indeed the whole approach in the driver is incorrect, as Greg pointed out 
> -
> we should use
> default group attributes instead of explicit calls to the sysfs interface, and 
> this
> would save those troubles.
> But again, the issue here is scope of work: converting all of amdgpu to 
> default
> group attributes is a somewhat
> lengthy process with extra testing, as the entire driver is papered with sysfs
> references and seems to me more of a standalone
> cleanup, just like switching to devm_ and drmm_ work. To me at least it seems
> that it makes more sense
> to finalize and push the hot unplug patches so that this new functionality can
> be part of the driver sooner
> and then incrementally improve it by working on those other topics. Just as
> devm_/drmm_ I also added sysfs cleanup
> to my TODO list in the RFC patch.

Hm, whether you solve this with the default group stuff to
auto-remove, or remove explicitly at the right time doesn't matter
much. The underlying problem you have here is that it's done way too
late. sysfs removal (like all uapi interfaces) needs to be removed as
part of drm_dev_unregister. I guess aside from the split into fini_hw
and fini_sw, you also need an unregister_late callback (like we have
already for drm_connector, so that e.g. backlight and similar stuff
can be unregistered).

Papering over the underlying bug like this doesn't really fix much,
the lifetimes are still wrong.
-Daniel

>
> Andrey
>
>
> >
> >>
> >>  return 0;
> >>   }
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> index 2b7c90b..54331fc 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> >> @@ -24,6 +24,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>
> >>   #include "amdgpu.h"
> >>   #include "amdgpu_ucode.h"
> >> @@ -464,7 +465,8 @@ int amdgpu_ucode_sysfs_init(struct amdgpu_device *adev)
> >>
> >>   void amdgpu_ucode_sysfs_fini(struct amdgpu_device *adev)
> >>   {
> >> -sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> >> +if (!drm_dev_is_unplugged(&adev->ddev))
> >> +sysfs_remove_group(&adev->dev->kobj, &fw_attr_group);
> >>   }
> >>
> >>   static int amdgpu_ucode_init_single_fw(struct amdgpu_device *adev,
> >> --
> >> 2.7.4
> >>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [REGRESSION] omapdrm/N900 display broken

2020-11-25 Thread Daniel Vetter
On Tue, Aug 25, 2020 at 3:16 PM Tomi Valkeinen  wrote:
>
> Hi Laurent,
>
> On 23/08/2020 19:26, Aaro Koskinen wrote:
> > Hi,
> >
> > On Tue, Aug 04, 2020 at 03:39:37PM +0300, Tomi Valkeinen wrote:
> >> On 04/08/2020 15:13, Tomi Valkeinen wrote:
> >
> >>> Can you try to pinpoint a bit where the hang happens? Maybe add
> >>> DRM/omapdrm debug prints, or perhaps sysrq works and it shows a lock
> >>> that's in deadlock.
> >>
> >> Also, one data point would be to disable venc, e.g. set venc status to
> >> "disabled" in dts.
> >
> > Disabling venc makes no difference.
> >
> > The hang happens in drm_fb_helper_initial_config(). I followed the
> > "HANG DEBUGGING" tips in the function comment text and enabled
> > fb.lockless_register_fb=1 to get more (serial) console output.
> >
> > Now I get this:
> >
> > [6.514739] omapdss_dss 4805.dss: supply vdda_video not found, using 
> > dummy regulator
> > [6.566375] DSS: OMAP DSS rev 2.0
> > [6.571807] omapdss_dss 4805.dss: bound 48050400.dispc (ops 
> > dispc_component_ops)
> > [6.580749] omapdrm omapdrm.0: DMM not available, disable DMM support
> > [6.587982] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> > [6.626617] [ cut here ]
> > [6.631774] WARNING: CPU: 0 PID: 18 at drivers/gpu/drm/drm_bridge.c:708 
> > drm_atomic_helper_commit_modeset_enables+0x134/0x268
> > [6.643768] Modules linked in:
> > [6.647033] CPU: 0 PID: 18 Comm: kworker/0:1 Tainted: G U
> > 5.8.0-omap3-los_16068+-4-g2e7d4a7efefd-dirty #2
> > [6.658966] Hardware name: Nokia RX-51 board
> > [6.663635] Workqueue: events deferred_probe_work_func
> > [6.669097] [] (unwind_backtrace) from [] 
> > (show_stack+0x10/0x14)
> > [6.677429] [] (show_stack) from [] 
> > (__warn+0xbc/0xd4)
> > [6.684844] [] (__warn) from [] 
> > (warn_slowpath_fmt+0x60/0xb8)
> > [6.692901] [] (warn_slowpath_fmt) from [] 
> > (drm_atomic_helper_commit_modeset_enables+0x134/0x268)
> > [6.704254] [] (drm_atomic_helper_commit_modeset_enables) from 
> > [] (omap_atomic_commit_tail+0xb4/0xc0)
> > [6.715972] [] (omap_atomic_commit_tail) from [] 
> > (commit_tail+0x9c/0x1a8)
> > [6.725128] [] (commit_tail) from [] 
> > (drm_atomic_helper_commit+0x134/0x158)
> > [6.734466] [] (drm_atomic_helper_commit) from [] 
> > (drm_client_modeset_commit_atomic+0x16c/0x208)
> > [6.745727] [] (drm_client_modeset_commit_atomic) from 
> > [] (drm_client_modeset_commit_locked+0x58/0x184)
> > [6.757629] [] (drm_client_modeset_commit_locked) from 
> > [] (drm_client_modeset_commit+0x24/0x40)
> > [6.768798] [] (drm_client_modeset_commit) from [] 
> > (__drm_fb_helper_restore_fbdev_mode_unlocked+0xa0/0xc8)
> > [6.780975] [] (__drm_fb_helper_restore_fbdev_mode_unlocked) 
> > from [] (drm_fb_helper_set_par+0x38/0x64)
> > [6.792785] [] (drm_fb_helper_set_par) from [] 
> > (fbcon_init+0x3d4/0x568)
> > [6.801757] [] (fbcon_init) from [] 
> > (visual_init+0xb8/0xfc)
> > [6.809631] [] (visual_init) from [] 
> > (do_bind_con_driver+0x1e0/0x3bc)
> > [6.818267] [] (do_bind_con_driver) from [] 
> > (do_take_over_console+0x138/0x1d8)
> > [6.827880] [] (do_take_over_console) from [] 
> > (do_fbcon_takeover+0x74/0xd4)
> > [6.837219] [] (do_fbcon_takeover) from [] 
> > (register_framebuffer+0x204/0x2d8)
> > [6.846740] [] (register_framebuffer) from [] 
> > (__drm_fb_helper_initial_config_and_unlock+0x3a4/0x554)
> > [6.858459] [] (__drm_fb_helper_initial_config_and_unlock) 
> > from [] (omap_fbdev_init+0x84/0xbc)
> > [6.869537] [] (omap_fbdev_init) from [] 
> > (pdev_probe+0x580/0x7d8)
> > [6.877807] [] (pdev_probe) from [] 
> > (platform_drv_probe+0x48/0x98)
>
> Laurent, does this ring any bells? The WARN comes in 
> drm_atomic_bridge_chain_enable() when
> drm_atomic_get_old_bridge_state() returns null for (presumably) sdi bridge.
>
> I'm not sure why the bridge state would not be there.

Lack of state on first modeset usually means your
drm_mode_config_reset didn't create one. Or whatever it is you're
using. I didn't look whether you're wiring this up correctly or not.
We might even want to add a ->reset function to
drm_private_state_funcs to make this work for everyone.
-Daniel

> Aaro, you can probably debug easier if you disable 
> CONFIG_FRAMEBUFFER_CONSOLE, or even
> CONFIG_DRM_FBDEV_EMULATION.
>
>  Tomi
>
> --
> Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
> Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH] drm/ast: Fixed CVE for DP501

2020-11-25 Thread KuoHsiang Chou
[Bug][DP501]
1. For security reasons, P2A has to be disabled per the CVE advisory.
2. The framebuffer reserves its last 2 MiB for the DP501 image.
3. If P2A is disallowed, the default "ioremap()" behavior is non-cached
   and can serve as an alternative way to access the DP501 image.
---
 drivers/gpu/drm/ast/ast_dp501.c | 131 +++-
 drivers/gpu/drm/ast/ast_drv.h   |   2 +
 drivers/gpu/drm/ast/ast_main.c  |  12 +++
 drivers/gpu/drm/ast/ast_mm.c|   1 +
 4 files changed, 110 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_dp501.c b/drivers/gpu/drm/ast/ast_dp501.c
index 88121c0e0d05..7640364ef2bc 100644
--- a/drivers/gpu/drm/ast/ast_dp501.c
+++ b/drivers/gpu/drm/ast/ast_dp501.c
@@ -189,6 +189,8 @@ bool ast_backup_fw(struct drm_device *dev, u8 *addr, u32 
size)
u32 i, data;
u32 boot_address;

+   if (ast->config_mode != ast_use_p2a) return false;
+
data = ast_mindwm(ast, 0x1e6e2100) & 0x01;
if (data) {
boot_address = get_fw_base(ast);
@@ -207,6 +209,8 @@ static bool ast_launch_m68k(struct drm_device *dev)
u8 *fw_addr = NULL;
u8 jreg;

+   if (ast->config_mode != ast_use_p2a) return false;
+
data = ast_mindwm(ast, 0x1e6e2100) & 0x01;
if (!data) {

@@ -272,24 +276,51 @@ u8 ast_get_dp501_max_clk(struct drm_device *dev)
u32 boot_address, offset, data;
u8 linkcap[4], linkrate, linklanes, maxclk = 0xff;

-   boot_address = get_fw_base(ast);
-
-   /* validate FW version */
-   offset = 0xf000;
-   data = ast_mindwm(ast, boot_address + offset);
-   if ((data & 0xf0) != 0x10) /* version: 1x */
-   return maxclk;
-
-   /* Read Link Capability */
-   offset  = 0xf014;
-   *(u32 *)linkcap = ast_mindwm(ast, boot_address + offset);
-   if (linkcap[2] == 0) {
-   linkrate = linkcap[0];
-   linklanes = linkcap[1];
-   data = (linkrate == 0x0a) ? (90 * linklanes) : (54 * linklanes);
-   if (data > 0xff)
-   data = 0xff;
-   maxclk = (u8)data;
+   if (ast->config_mode == ast_use_p2a) {
+   boot_address = get_fw_base(ast);
+
+   /* validate FW version */
+   offset = 0xf000;
+   data = ast_mindwm(ast, boot_address + offset);
+   if ((data & 0xf0) != 0x10) /* version: 1x */
+   return maxclk;
+
+   /* Read Link Capability */
+   offset  = 0xf014;
+   *(u32 *)linkcap = ast_mindwm(ast, boot_address + offset);
+   if (linkcap[2] == 0) {
+   linkrate = linkcap[0];
+   linklanes = linkcap[1];
+   data = (linkrate == 0x0a) ? (90 * linklanes) : (54 * 
linklanes);
+   if (data > 0xff)
+   data = 0xff;
+   maxclk = (u8)data;
+   }
+   }
+   else {
+   if (!ast->reservedbuffer) return 65;/* 1024x768 as default 
*/
+
+   /* dummy read */
+   offset = 0x;
+   data = *(u32 *) (ast->reservedbuffer + offset);
+
+   /* validate FW version */
+   offset = 0xf000;
+   data = *(u32 *) (ast->reservedbuffer + offset);
+   if ((data & 0xf0) != 0x10) /* version: 1x */
+   return maxclk;
+
+   /* Read Link Capability */
+   offset  = 0xf014;
+   *(u32 *)linkcap = *(u32 *) (ast->reservedbuffer + offset);
+   if (linkcap[2] == 0) {
+   linkrate = linkcap[0];
+   linklanes = linkcap[1];
+   data = (linkrate == 0x0a) ? (90 * linklanes) : (54 * 
linklanes);
+   if (data > 0xff)
+   data = 0xff;
+   maxclk = (u8)data;
+   }
}
return maxclk;
 }
@@ -299,25 +330,53 @@ bool ast_dp501_read_edid(struct drm_device *dev, u8 
*ediddata)
struct ast_private *ast = to_ast_private(dev);
u32 i, boot_address, offset, data;

-   boot_address = get_fw_base(ast);
-
-   /* validate FW version */
-   offset = 0xf000;
-   data = ast_mindwm(ast, boot_address + offset);
-   if ((data & 0xf0) != 0x10)
-   return false;
-
-   /* validate PnP Monitor */
-   offset = 0xf010;
-   data = ast_mindwm(ast, boot_address + offset);
-   if (!(data & 0x01))
-   return false;
+   if (ast->config_mode == ast_use_p2a) {
+   boot_address = get_fw_base(ast);

-   /* Read EDID */
-   offset = 0xf020;
-   for (i = 0; i < 128; i += 4) {
-   data = ast_mindwm(ast, boot_address + offset + i);
-   *(u32 *)(ediddata + i) = data;
+   /* validate FW version */
+   

Re: [Intel-gfx] [PATCH] drm/i915: fix error return code in check_partial_mapping()

2020-11-25 Thread Chris Wilson
Quoting Luo Meng (2020-11-25 01:29:38)
> Fix to return a negative error code from the error handling case
> instead of 0 in function check_partial_mapping(), as done elsewhere
> in this function.

It's not an error, just the end of testing.
-Chris


Re: [PATCH 1/6] drm/scheduler: "node" --> "list"

2020-11-25 Thread Christian König

Am 25.11.20 um 04:17 schrieb Luben Tuikov:

Rename "node" to "list" in struct drm_sched_job,
in order to make it consistent with what we see
being used throughout gpu_scheduler.h, for
instance in struct drm_sched_entity, as well as
the rest of DRM and the kernel.

Signed-off-by: Luben Tuikov 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  6 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  2 +-
  drivers/gpu/drm/scheduler/sched_main.c  | 23 +++--
  include/drm/gpu_scheduler.h |  4 ++--
  5 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 5c1f3725c741..8358cae0b5a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1427,7 +1427,7 @@ static void amdgpu_ib_preempt_job_recovery(struct 
drm_gpu_scheduler *sched)
struct dma_fence *fence;
  
  	spin_lock(&sched->job_list_lock);

-   list_for_each_entry(s_job, &sched->ring_mirror_list, node) {
+   list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
fence = sched->ops->run_job(s_job);
dma_fence_put(fence);
}
@@ -1459,10 +1459,10 @@ static void amdgpu_ib_preempt_mark_partial_job(struct 
amdgpu_ring *ring)
  
  no_preempt:

spin_lock(&sched->job_list_lock);
-   list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, node) {
+   list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, list) {
if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
/* remove job from ring_mirror_list */
-   list_del_init(&s_job->node);
+   list_del_init(&s_job->list);
sched->ops->free_job(s_job);
continue;
}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7560b05e4ac1..4df6de81cd41 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4128,7 +4128,7 @@ bool amdgpu_device_has_job_running(struct amdgpu_device 
*adev)
  
  		spin_lock(&ring->sched.job_list_lock);

job = list_first_entry_or_null(&ring->sched.ring_mirror_list,
-   struct drm_sched_job, node);
+   struct drm_sched_job, list);
spin_unlock(&ring->sched.job_list_lock);
if (job)
return true;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index dcfe8a3b03ff..aca52a46b93d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -271,7 +271,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
drm_gpu_scheduler *sched)
}
  
  	/* Signal all jobs already scheduled to HW */

-   list_for_each_entry(s_job, &sched->ring_mirror_list, node) {
+   list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
struct drm_sched_fence *s_fence = s_job->s_fence;
  
  		dma_fence_set_error(&s_fence->finished, -EHWPOISON);

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index c6332d75025e..c52eba407ebd 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -272,7 +272,7 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
struct drm_gpu_scheduler *sched = s_job->sched;
  
  	spin_lock(&sched->job_list_lock);

-   list_add_tail(&s_job->node, &sched->ring_mirror_list);
+   list_add_tail(&s_job->list, &sched->ring_mirror_list);
drm_sched_start_timeout(sched);
spin_unlock(&sched->job_list_lock);
  }
@@ -287,7 +287,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
/* Protects against concurrent deletion in drm_sched_get_cleanup_job */
spin_lock(&sched->job_list_lock);
job = list_first_entry_or_null(&sched->ring_mirror_list,
-  struct drm_sched_job, node);
+  struct drm_sched_job, list);
  
  	if (job) {

/*
@@ -295,7 +295,7 @@ static void drm_sched_job_timedout(struct work_struct *work)
 * drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
 * is parked at which point it's safe.
 */
-   list_del_init(&job->node);
+   list_del_init(&job->list);
spin_unlock(&sched->job_list_lock);
  
  		job->sched->ops->timedout_job(job);

@@ -392,7 +392,7 @@ void drm_sched_stop(struct drm_gpu_scheduler *sched, struct 
drm_sched_job *bad)
 * Add at the head of the queue to reflect it was the ea

Re: [PATCH 1/3] drm/virtio: virtio_{blah} --> virtio_gpu_{blah}

2020-11-25 Thread Anthoine Bourgeois

On Mon, Nov 23, 2020 at 06:19:00PM -0800, Gurchetan Singh wrote:

virtio_gpu typically uses the prefix virtio_gpu, but there are
a few places where the virtio prefix is used.  Modify this for
consistency.

Signed-off-by: Gurchetan Singh 

Reviewed-by: Anthoine Bourgeois 

---
drivers/gpu/drm/virtio/virtgpu_debugfs.c | 24 ++
drivers/gpu/drm/virtio/virtgpu_fence.c   | 32 +---
2 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_debugfs.c 
b/drivers/gpu/drm/virtio/virtgpu_debugfs.c
index 5fefc88d47e4..c2b20e0ee030 100644
--- a/drivers/gpu/drm/virtio/virtgpu_debugfs.c
+++ b/drivers/gpu/drm/virtio/virtgpu_debugfs.c
@@ -28,14 +28,13 @@

#include "virtgpu_drv.h"

-static void virtio_add_bool(struct seq_file *m, const char *name,
-   bool value)
+static void virtio_gpu_add_bool(struct seq_file *m, const char *name,
+   bool value)
{
seq_printf(m, "%-16s : %s\n", name, value ? "yes" : "no");
}

-static void virtio_add_int(struct seq_file *m, const char *name,
-  int value)
+static void virtio_gpu_add_int(struct seq_file *m, const char *name, int value)
{
seq_printf(m, "%-16s : %d\n", name, value);
}
@@ -45,13 +44,16 @@ static int virtio_gpu_features(struct seq_file *m, void 
*data)
struct drm_info_node *node = (struct drm_info_node *)m->private;
struct virtio_gpu_device *vgdev = node->minor->dev->dev_private;

-   virtio_add_bool(m, "virgl", vgdev->has_virgl_3d);
-   virtio_add_bool(m, "edid", vgdev->has_edid);
-   virtio_add_bool(m, "indirect", vgdev->has_indirect);
-   virtio_add_bool(m, "resource uuid", vgdev->has_resource_assign_uuid);
-   virtio_add_bool(m, "blob resources", vgdev->has_resource_blob);
-   virtio_add_int(m, "cap sets", vgdev->num_capsets);
-   virtio_add_int(m, "scanouts", vgdev->num_scanouts);
+   virtio_gpu_add_bool(m, "virgl", vgdev->has_virgl_3d);
+   virtio_gpu_add_bool(m, "edid", vgdev->has_edid);
+   virtio_gpu_add_bool(m, "indirect", vgdev->has_indirect);
+
+   virtio_gpu_add_bool(m, "resource uuid",
+   vgdev->has_resource_assign_uuid);
+
+   virtio_gpu_add_bool(m, "blob resources", vgdev->has_resource_blob);
+   virtio_gpu_add_int(m, "cap sets", vgdev->num_capsets);
+   virtio_gpu_add_int(m, "scanouts", vgdev->num_scanouts);
if (vgdev->host_visible_region.len) {
seq_printf(m, "%-16s : 0x%lx +0x%lx\n", "host visible region",
   (unsigned long)vgdev->host_visible_region.addr,
diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index 728ca36f6327..586034c90587 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -27,22 +27,22 @@

#include "virtgpu_drv.h"

-#define to_virtio_fence(x) \
+#define to_virtio_gpu_fence(x) \
container_of(x, struct virtio_gpu_fence, f)

-static const char *virtio_get_driver_name(struct dma_fence *f)
+static const char *virtio_gpu_get_driver_name(struct dma_fence *f)
{
return "virtio_gpu";
}

-static const char *virtio_get_timeline_name(struct dma_fence *f)
+static const char *virtio_gpu_get_timeline_name(struct dma_fence *f)
{
return "controlq";
}

-static bool virtio_fence_signaled(struct dma_fence *f)
+static bool virtio_gpu_fence_signaled(struct dma_fence *f)
{
-   struct virtio_gpu_fence *fence = to_virtio_fence(f);
+   struct virtio_gpu_fence *fence = to_virtio_gpu_fence(f);

if (WARN_ON_ONCE(fence->f.seqno == 0))
/* leaked fence outside driver before completing
@@ -53,25 +53,26 @@ static bool virtio_fence_signaled(struct dma_fence *f)
return false;
}

-static void virtio_fence_value_str(struct dma_fence *f, char *str, int size)
+static void virtio_gpu_fence_value_str(struct dma_fence *f, char *str, int 
size)
{
snprintf(str, size, "%llu", f->seqno);
}

-static void virtio_timeline_value_str(struct dma_fence *f, char *str, int size)
+static void virtio_gpu_timeline_value_str(struct dma_fence *f, char *str,
+ int size)
{
-   struct virtio_gpu_fence *fence = to_virtio_fence(f);
+   struct virtio_gpu_fence *fence = to_virtio_gpu_fence(f);

snprintf(str, size, "%llu",
 (u64)atomic64_read(&fence->drv->last_fence_id));
}

-static const struct dma_fence_ops virtio_fence_ops = {
-   .get_driver_name = virtio_get_driver_name,
-   .get_timeline_name   = virtio_get_timeline_name,
-   .signaled= virtio_fence_signaled,
-   .fence_value_str = virtio_fence_value_str,
-   .timeline_value_str  = virtio_timeline_value_str,
+static const struct dma_fence_ops virtio_gpu_fence_ops = {
+   .get_driver_name = virtio_gpu_get_driver_name,
+   .get_timeline_name   = virtio_gpu_

Re: [PATCH 2/6] gpu/drm: ring_mirror_list --> pending_list

2020-11-25 Thread Christian König

Am 25.11.20 um 04:17 schrieb Luben Tuikov:

Rename "ring_mirror_list" to "pending_list",
to describe what something is, not what it does,
how it's used, or how the hardware implements it.

This also abstracts the actual hardware
implementation, i.e. how the low-level driver
communicates with the device it drives, ring, CAM,
etc., shouldn't be exposed to DRM.

The pending_list keeps submitted jobs, which are
out of our control. Usually this means they are
pending execution in hardware, but "pending" is
the more general (inclusive) definition.

Signed-off-by: Luben Tuikov 


In general the rename is a good idea, but I think we should try to 
remove this linked list entirely.


As the original name described, this is essentially a ring buffer; there 
is no reason I can see to use a linked list here except for the 
add/remove madness we currently have.


Anyway patch is Acked-by: Christian König  for 
now.


Regards,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  2 +-
  drivers/gpu/drm/scheduler/sched_main.c  | 34 ++---
  include/drm/gpu_scheduler.h | 10 +++---
  5 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
index 8358cae0b5a4..db77a5bdfa45 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
@@ -1427,7 +1427,7 @@ static void amdgpu_ib_preempt_job_recovery(struct 
drm_gpu_scheduler *sched)
struct dma_fence *fence;
  
  	spin_lock(&sched->job_list_lock);

-   list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
+   list_for_each_entry(s_job, &sched->pending_list, list) {
fence = sched->ops->run_job(s_job);
dma_fence_put(fence);
}
@@ -1459,7 +1459,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct 
amdgpu_ring *ring)
  
  no_preempt:

spin_lock(&sched->job_list_lock);
-   list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, list) {
+   list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
/* remove job from ring_mirror_list */
list_del_init(&s_job->list);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 4df6de81cd41..fbae600aa5f9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4127,8 +4127,8 @@ bool amdgpu_device_has_job_running(struct amdgpu_device 
*adev)
continue;
  
  		spin_lock(&ring->sched.job_list_lock);

-   job = list_first_entry_or_null(&ring->sched.ring_mirror_list,
-   struct drm_sched_job, list);
+   job = list_first_entry_or_null(&ring->sched.pending_list,
+  struct drm_sched_job, list);
spin_unlock(&ring->sched.job_list_lock);
if (job)
return true;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index aca52a46b93d..ff48101bab55 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -271,7 +271,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
drm_gpu_scheduler *sched)
}
  
  	/* Signal all jobs already scheduled to HW */

-   list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
+   list_for_each_entry(s_job, &sched->pending_list, list) {
struct drm_sched_fence *s_fence = s_job->s_fence;
  
  		dma_fence_set_error(&s_fence->finished, -EHWPOISON);

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index c52eba407ebd..b694df12aaba 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -198,7 +198,7 @@ EXPORT_SYMBOL(drm_sched_dependency_optimized);
  static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
  {
if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
-   !list_empty(&sched->ring_mirror_list))
+   !list_empty(&sched->pending_list))
schedule_delayed_work(&sched->work_tdr, sched->timeout);
  }
  
@@ -258,7 +258,7 @@ void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,

  {
spin_lock(&sched->job_list_lock);
  
-	if (list_empty(&sched->ring_mirror_list))

+   if (list_empty(&sched->pending_list))
cancel_delayed_work(&sched->work_tdr);
else
mod_delayed_work(system_wq, &sched->work_tdr, remaining);
@@ -272,7 +272,7 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
struct drm_gpu_scheduler *s

Re: [PATCH 2/3] drm/virtio: rework virtio_fence_signaled

2020-11-25 Thread Anthoine Bourgeois

On Mon, Nov 23, 2020 at 06:19:01PM -0800, Gurchetan Singh wrote:

virtio_gpu_fence_event_process sets the last_fence_id and
subsequently calls dma_fence_signal_locked(..).

dma_fence_signal_locked(..) sets DMA_FENCE_FLAG_SIGNALED_BIT,
which is actually checked before &dma_fence_ops.(*signaled) is
called.

The check for last_fence_id is therefore a bit redundant, and
it will not be sufficient to check the last_fence_id for multiple
synchronization timelines.  Remove it.

Signed-off-by: Gurchetan Singh 

Reviewed-by: Anthoine Bourgeois 

---
drivers/gpu/drm/virtio/virtgpu_fence.c | 12 
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index 586034c90587..b35fcd1d02d7 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -42,14 +42,10 @@ static const char *virtio_gpu_get_timeline_name(struct 
dma_fence *f)

static bool virtio_gpu_fence_signaled(struct dma_fence *f)
{
-   struct virtio_gpu_fence *fence = to_virtio_gpu_fence(f);
-
-   if (WARN_ON_ONCE(fence->f.seqno == 0))
-   /* leaked fence outside driver before completing
-* initialization with virtio_gpu_fence_emit */
-   return false;
-   if (atomic64_read(&fence->drv->last_fence_id) >= fence->f.seqno)
-   return true;
+   /* leaked fence outside driver before completing
+* initialization with virtio_gpu_fence_emit.
+*/
+   WARN_ON_ONCE(f->seqno == 0);
return false;
}

--
2.29.2.454.gaff20da3a2-goog

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel



Re: [PATCH 3/3] drm/virtio: consider dma-fence context when signaling

2020-11-25 Thread Anthoine Bourgeois

On Mon, Nov 23, 2020 at 06:28:17PM -0800, Gurchetan Singh wrote:

This is an incremental refactor towards multiple dma-fence contexts
in virtio-gpu.  Since all fences are still allocated using
&virtio_gpu_fence_driver.context, nothing should break and every
processed fence will be signaled.

The overall idea is every 3D context can allocate a number of
dma-fence contexts.  Each dma-fence context refers to its own
timeline.

For example, consider the following case where virgl submits
commands to the GPU (fence ids 1, 3) and does a metadata query with
the CPU (fence id 5).  In a different process, gfxstream submits
commands to the GPU (fence ids 2, 4).

fence_id (&dma_fence.seqno)   | 1 2 3 4 5
--|---
fence_ctx 0 (virgl gpu)   | 1   3
fence_ctx 1 (virgl metadata query)| 5
fence_ctx 2 (gfxstream gpu)   |   2   4

With multiple fence contexts, we can wait for the metadata query
to finish without waiting for the virgl gpu to finish.  virgl gpu
does not have to wait for gfxstream gpu.  The fence id still is the
monotonically increasing sequence number, but it's only relevant to
the specific dma-fence context.

To fully enable this feature, we'll need to:
 - have each 3d context allocate a number of fence contexts. Not
   too hard with explicit context initialization on the horizon.
 - have guest userspace specify fence context when performing
   ioctls.
 - tag each fence emitted to the host with the fence context
   information.  virtio_gpu_ctrl_hdr has padding + flags available,
   so that should be easy.

This change goes in the direction specified above, by:
 - looking up the virtgpu_fence given a fence_id
 - signalling all prior fences in a given context
 - signalling current fence

v2: fix grammar in comment

Signed-off-by: Gurchetan Singh 

Reviewed-by: Anthoine Bourgeois 

---
drivers/gpu/drm/virtio/virtgpu_drv.h   |  1 +
drivers/gpu/drm/virtio/virtgpu_fence.c | 39 --
2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 6a232553c99b..d9dbc4f258f3 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.h
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.h
@@ -136,6 +136,7 @@ struct virtio_gpu_fence_driver {

struct virtio_gpu_fence {
struct dma_fence f;
+   uint64_t fence_id;
struct virtio_gpu_fence_driver *drv;
struct list_head node;
};
diff --git a/drivers/gpu/drm/virtio/virtgpu_fence.c 
b/drivers/gpu/drm/virtio/virtgpu_fence.c
index b35fcd1d02d7..d28e25e8409b 100644
--- a/drivers/gpu/drm/virtio/virtgpu_fence.c
+++ b/drivers/gpu/drm/virtio/virtgpu_fence.c
@@ -51,7 +51,7 @@ static bool virtio_gpu_fence_signaled(struct dma_fence *f)

static void virtio_gpu_fence_value_str(struct dma_fence *f, char *str, int size)
{
-   snprintf(str, size, "%llu", f->seqno);
+   snprintf(str, size, "[%llu, %llu]", f->context, f->seqno);
}

static void virtio_gpu_timeline_value_str(struct dma_fence *f, char *str,
@@ -99,7 +99,7 @@ void virtio_gpu_fence_emit(struct virtio_gpu_device *vgdev,
unsigned long irq_flags;

spin_lock_irqsave(&drv->lock, irq_flags);
-   fence->f.seqno = ++drv->current_fence_id;
+   fence->fence_id = fence->f.seqno = ++drv->current_fence_id;
dma_fence_get(&fence->f);
list_add_tail(&fence->node, &drv->fences);
spin_unlock_irqrestore(&drv->lock, irq_flags);
@@ -107,24 +107,45 @@ void virtio_gpu_fence_emit(struct virtio_gpu_device 
*vgdev,
trace_dma_fence_emit(&fence->f);

cmd_hdr->flags |= cpu_to_le32(VIRTIO_GPU_FLAG_FENCE);
-   cmd_hdr->fence_id = cpu_to_le64(fence->f.seqno);
+   cmd_hdr->fence_id = cpu_to_le64(fence->fence_id);
}

void virtio_gpu_fence_event_process(struct virtio_gpu_device *vgdev,
u64 fence_id)
{
struct virtio_gpu_fence_driver *drv = &vgdev->fence_drv;
-   struct virtio_gpu_fence *fence, *tmp;
+   struct virtio_gpu_fence *signaled, *curr, *tmp;
unsigned long irq_flags;

spin_lock_irqsave(&drv->lock, irq_flags);
atomic64_set(&vgdev->fence_drv.last_fence_id, fence_id);
-   list_for_each_entry_safe(fence, tmp, &drv->fences, node) {
-   if (fence_id < fence->f.seqno)
+   list_for_each_entry_safe(curr, tmp, &drv->fences, node) {
+   if (fence_id != curr->fence_id)
continue;
-   dma_fence_signal_locked(&fence->f);
-   list_del(&fence->node);
-   dma_fence_put(&fence->f);
+
+   signaled = curr;
+
+   /*
+* Signal any fences with a strictly smaller sequence number
+* than the current signaled fence.
+*/
+   list_for_each_entry_safe(curr, tmp, &drv->fences, node) {
+   /* dma-fence contexts must match */
+   if (signaled->f.context != curr

Re: [Intel-gfx] [PATCH] dma-buf/dma-resv: Respect num_fences when initializing the shared fence list.

2020-11-25 Thread Maarten Lankhorst
Op 24-11-2020 om 14:10 schreef Thomas Hellström:
>
> On 11/24/20 12:57 PM, Maarten Lankhorst wrote:
>> We hardcode the maximum number of shared fences to 4, instead of
>> respecting num_fences. Use a minimum of 4, but more if num_fences
>> is higher.
>>
>> This seems to have been an oversight when first implementing the
>> api.
>>
>> Fixes: 04a5faa8cbe5 ("reservation: update api and add some helpers")
>> Cc:  # v3.17+
>> Reported-by: Niranjana Vishwanathapura 
>> Signed-off-by: Maarten Lankhorst 
>> ---
>>   drivers/dma-buf/dma-resv.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
> Reviewed-by: Thomas Hellström 
>
>
Thanks, pushed!



Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Christian König

Am 25.11.20 um 04:17 schrieb Luben Tuikov:

The job timeout handler now returns a status
indicating to the DRM layer whether the job
was successfully cancelled or whether more time
should be given to the job to complete.

Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
  include/drm/gpu_scheduler.h | 13 ++---
  2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ff48101bab55..81b73790ecc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  
-static void amdgpu_job_timedout(struct drm_sched_job *s_job)

+static int amdgpu_job_timedout(struct drm_sched_job *s_job)
  {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
{
DRM_ERROR("ring %s timeout, but soft recovered\n",
  s_job->sched->name);
-   return;
+   return 0;
}
  
  	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);

@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
  
  	if (amdgpu_device_should_recover_gpu(ring->adev)) {

amdgpu_device_gpu_recover(ring->adev, job);
+   return 0;
} else {
drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
adev->virt.tdr_debug = true;
+   return 1;
}
  }
  
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h

index 2e0c368e19f6..61f7121e1c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
  
  	/**

- * @timedout_job: Called when a job has taken too long to execute,
- * to trigger GPU recovery.
+* @timedout_job: Called when a job has taken too long to execute,
+* to trigger GPU recovery.
+*
+* Return 0, if the job has been aborted successfully and will
+* never be heard of from the device. Return non-zero if the
+* job wasn't able to be aborted, i.e. if more time should be
+* given to this job. The result is not "bool" as this
+* function is not a predicate, although its result may seem
+* as one.


I think the whole approach of timing out a job needs to be rethought. 
What's timing out here is the hardware engine, not the job.


So we should also not have the job as parameter here. Maybe we should 
make that the fence we are waiting for instead.



 */
-   void (*timedout_job)(struct drm_sched_job *sched_job);
+   int (*timedout_job)(struct drm_sched_job *sched_job);


I would either return an error code, boolean or enum here. But not use a 
number without a define.


Regards,
Christian.

  
  	/**

   * @free_job: Called once the job's finished fence has been signaled




Re: [PATCH 4/6] drm/scheduler: Essentialize the job done callback

2020-11-25 Thread Christian König

Am 25.11.20 um 04:17 schrieb Luben Tuikov:

The job done callback is called from various
places, in two roles: as the job-done handler
and as a fence callback.

Essentialize the callback into an atomic
function that just completes the job, and a
second function with the fence-callback
prototype which calls the first to complete
the job.

This is used in later patches by the completion
code.

Signed-off-by: Luben Tuikov 


Reviewed-by: Christian König 


---
  drivers/gpu/drm/scheduler/sched_main.c | 73 ++
  1 file changed, 40 insertions(+), 33 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index b694df12aaba..3eb7618a627d 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -60,8 +60,6 @@
  #define to_drm_sched_job(sched_job)   \
container_of((sched_job), struct drm_sched_job, queue_node)
  
-static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb);

-
  /**
   * drm_sched_rq_init - initialize a given run queue struct
   *
@@ -162,6 +160,40 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
return NULL;
  }
  
+/**

+ * drm_sched_job_done - complete a job
+ * @s_job: pointer to the job which is done
+ *
+ * Finish the job's fence and wake up the worker thread.
+ */
+static void drm_sched_job_done(struct drm_sched_job *s_job)
+{
+   struct drm_sched_fence *s_fence = s_job->s_fence;
+   struct drm_gpu_scheduler *sched = s_fence->sched;
+
+   atomic_dec(&sched->hw_rq_count);
+   atomic_dec(&sched->score);
+
+   trace_drm_sched_process_job(s_fence);
+
+   dma_fence_get(&s_fence->finished);
+   drm_sched_fence_finished(s_fence);
+   dma_fence_put(&s_fence->finished);
+   wake_up_interruptible(&sched->wake_up_worker);
+}
+
+/**
+ * drm_sched_job_done_cb - the callback for a done job
+ * @f: fence
+ * @cb: fence callbacks
+ */
+static void drm_sched_job_done_cb(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+   struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, 
cb);
+
+   drm_sched_job_done(s_job);
+}
+
  /**
   * drm_sched_dependency_optimized
   *
@@ -473,14 +505,14 @@ void drm_sched_start(struct drm_gpu_scheduler *sched, 
bool full_recovery)
  
  		if (fence) {

r = dma_fence_add_callback(fence, &s_job->cb,
-  drm_sched_process_job);
+  drm_sched_job_done_cb);
if (r == -ENOENT)
-   drm_sched_process_job(fence, &s_job->cb);
+   drm_sched_job_done(s_job);
else if (r)
DRM_ERROR("fence add callback failed (%d)\n",
  r);
} else
-   drm_sched_process_job(NULL, &s_job->cb);
+   drm_sched_job_done(s_job);
}
  
  	if (full_recovery) {

@@ -635,31 +667,6 @@ drm_sched_select_entity(struct drm_gpu_scheduler *sched)
return entity;
  }
  
-/**

- * drm_sched_process_job - process a job
- *
- * @f: fence
- * @cb: fence callbacks
- *
- * Called after job has finished execution.
- */
-static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
-{
-   struct drm_sched_job *s_job = container_of(cb, struct drm_sched_job, 
cb);
-   struct drm_sched_fence *s_fence = s_job->s_fence;
-   struct drm_gpu_scheduler *sched = s_fence->sched;
-
-   atomic_dec(&sched->hw_rq_count);
-   atomic_dec(&sched->score);
-
-   trace_drm_sched_process_job(s_fence);
-
-   dma_fence_get(&s_fence->finished);
-   drm_sched_fence_finished(s_fence);
-   dma_fence_put(&s_fence->finished);
-   wake_up_interruptible(&sched->wake_up_worker);
-}
-
  /**
   * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
   *
@@ -809,9 +816,9 @@ static int drm_sched_main(void *param)
if (!IS_ERR_OR_NULL(fence)) {
s_fence->parent = dma_fence_get(fence);
r = dma_fence_add_callback(fence, &sched_job->cb,
-  drm_sched_process_job);
+  drm_sched_job_done_cb);
if (r == -ENOENT)
-   drm_sched_process_job(fence, &sched_job->cb);
+   drm_sched_job_done(sched_job);
else if (r)
DRM_ERROR("fence add callback failed (%d)\n",
  r);
@@ -820,7 +827,7 @@ static int drm_sched_main(void *param)
if (IS_ERR(fence))
dma_fence_set_error(&s_fence->finished, 
PTR_ERR(fence));
  
-			drm_sched_process_job(NULL, &sched_job->cb);

+  

Re: [PATCH 5/6] drm/amdgpu: Don't hardcode thread name length

2020-11-25 Thread Christian König

Am 25.11.20 um 04:17 schrieb Luben Tuikov:

Introduce a macro DRM_THREAD_NAME_LEN
and use that to define ring name size,
instead of hardcoding it to 16.

Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +-
  include/drm/gpu_scheduler.h  | 2 ++
  2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 7112137689db..bbd46c6dec65 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -230,7 +230,7 @@ struct amdgpu_ring {
unsignedwptr_offs;
unsignedfence_offs;
uint64_tcurrent_ctx;
-   charname[16];
+   charname[DRM_THREAD_NAME_LEN];
u32 trail_seq;
unsignedtrail_fence_offs;
u64 trail_fence_gpu_addr;
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 61f7121e1c19..3a5686c3b5e9 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -30,6 +30,8 @@
  
  #define MAX_WAIT_SCHED_ENTITY_Q_EMPTY msecs_to_jiffies(1000)
  
+#define DRM_THREAD_NAME_LEN TASK_COMM_LEN

+


The thread name is an amdgpu specific thing. I don't think we should 
have that in the scheduler.


And why do you use TASK_COMM_LEN here? That is completely unrelated stuff.

Regards,
Christian.


  struct drm_gpu_scheduler;
  struct drm_sched_rq;
  




Re: [PATCH 6/6] drm/sched: Make use of a "done" thread

2020-11-25 Thread Christian König

Am 25.11.20 um 04:17 schrieb Luben Tuikov:

Add a "done" list to which all completed jobs are added
to be freed. The drm_sched_job_done() callback is the
producer of jobs to this list.

Add a "done" thread which consumes from the done list
and frees up jobs. Now, the main scheduler thread only
pushes jobs to the GPU and the "done" thread frees them
up, on the way out of the GPU when they've completed
execution.


Well there are quite a number of problems in this patch.

From a design point of view, I think we should be getting rid of the 
linked list, not extending its use. And we also don't want to offload 
the freeing of jobs into a different thread, because that could 
potentially mean that it is executed on a different CPU.


Then one obvious problem seems to be that you don't take into account 
that we moved the job freeing into the scheduler thread to make sure 
that this is suspended while the scheduler thread is stopped. This 
behavior is now completely gone, e.g. the delete thread keeps running 
while the scheduler thread is stopped.


A few more comments below.


Make use of the status returned by the GPU driver
timeout handler to decide whether to leave the job in
the pending list, or to send it off to the done list.
If a job is done, it is added to the done list and the
done thread woken up. If a job needs more time, it is
left on the pending list and the timeout timer
restarted.

Eliminate the polling mechanism of picking out done
jobs from the pending list, i.e. eliminate
drm_sched_get_cleanup_job(). Now the main scheduler
thread only pushes jobs down to the GPU.

Various other optimizations to the GPU scheduler
and job recovery are possible with this format.

Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/scheduler/sched_main.c | 173 +
  include/drm/gpu_scheduler.h|  14 ++
  2 files changed, 101 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 3eb7618a627d..289ae68cd97f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -164,7 +164,8 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
   * drm_sched_job_done - complete a job
   * @s_job: pointer to the job which is done
   *
- * Finish the job's fence and wake up the worker thread.
+ * Finish the job's fence, move it to the done list,
+ * and wake up the done thread.
   */
  static void drm_sched_job_done(struct drm_sched_job *s_job)
  {
@@ -179,7 +180,12 @@ static void drm_sched_job_done(struct drm_sched_job *s_job)
dma_fence_get(&s_fence->finished);
drm_sched_fence_finished(s_fence);
dma_fence_put(&s_fence->finished);
-   wake_up_interruptible(&sched->wake_up_worker);
+
+   spin_lock(&sched->job_list_lock);
+   list_move(&s_job->list, &sched->done_list);
+   spin_unlock(&sched->job_list_lock);
+
+   wake_up_interruptible(&sched->done_wait_q);


How is the worker thread then woken up to push new jobs to the hardware?


  }
  
  /**

@@ -221,11 +227,10 @@ bool drm_sched_dependency_optimized(struct dma_fence* 
fence,
  EXPORT_SYMBOL(drm_sched_dependency_optimized);
  
  /**

- * drm_sched_start_timeout - start timeout for reset worker
- *
- * @sched: scheduler instance to start the worker for
+ * drm_sched_start_timeout - start a timeout timer
+ * @sched: scheduler instance whose job we're timing
   *
- * Start the timeout for the given scheduler.
+ * Start a timeout timer for the given scheduler.
   */
  static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
  {
@@ -305,8 +310,8 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
  
  	spin_lock(&sched->job_list_lock);

list_add_tail(&s_job->list, &sched->pending_list);
-   drm_sched_start_timeout(sched);
spin_unlock(&sched->job_list_lock);
+   drm_sched_start_timeout(sched);


This looks wrong, the drm_sched_start_timeout() function used to need 
the lock. Why should that have changed?



  }
  
  static void drm_sched_job_timedout(struct work_struct *work)

@@ -316,37 +321,30 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
  
  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  
-	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */

spin_lock(&sched->job_list_lock);
job = list_first_entry_or_null(&sched->pending_list,
   struct drm_sched_job, list);
+   spin_unlock(&sched->job_list_lock);
  
  	if (job) {

-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-   list_del_init(&job->list);
-   spin_unlock(&sched->job_list_lock);
+   int res;
  
-		job->sched->ops->timedout_job(job);

+   

Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Christian König

Am 25.11.20 um 09:37 schrieb Thomas Zimmermann:

Hi

Am 24.11.20 um 15:09 schrieb Daniel Vetter:

On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:

Hi

Am 24.11.20 um 14:36 schrieb Christian König:

Am 24.11.20 um 13:15 schrieb Thomas Zimmermann:

[SNIP]

First I wanted to put this into
drm_gem_ttm_vmap/vunmap(), but then wondered why
ttm_bo_vmap() does not acquire the lock internally?
I'd expect that vmap/vunmap are close together and
do not overlap for the same BO.


We have use cases like the following during command submission:

1. lock
2. map
3. copy parts of the BO content somewhere else or patch
it with additional information
4. unmap
5. submit BO to the hardware
6. add hardware fence to the BO to make sure it doesn't move
7. unlock

That use case won't be possible with vmap/vunmap if we
move the lock/unlock into it and I hope to replace the
kmap/kunmap functions with them in the near term.


Otherwise, acquiring the reservation lock would
require another ref-counting variable or per-driver
code.


Hui, why that? Just put this into
drm_gem_ttm_vmap/vunmap() helper as you initially
planned.


Given your example above, step one would acquire the lock,
and step two would also acquire the lock as part of the vmap
implementation. Wouldn't this fail (At least during unmap or
unlock steps) ?


Oh, so you want to nest them? No, that is a rather bad no-go.


I don't want to nest/overlap them. My question was whether that
would be required. Apparently not.

While the console's BO is being set for scanout, it's protected from
movement via the pin/unpin implementation, right?


Yes, correct.


The driver does not acquire the resv lock for longer periods. I'm
asking because this would prevent any console-buffer updates while
the console is being displayed.


Correct as well, we only hold the lock for things like command
submission, pinning, unpinning etc etc



Thanks for answering my questions.





You need to make sure that the lock is only taken from the FB
path which wants to vmap the object.

Why don't you lock the GEM object from the caller in the generic
FB implementation?


With the current blitter code, it breaks abstraction. If vmap/vunmap
held the lock implicitly, things would be easier.


Do you have a link to the code?


It's the damage blitter in the fbdev code. [1] While it flushes the 
shadow
buffer into the BO, the BO has to be kept in place. I already 
changed it to
lock struct drm_fb_helper.lock, but I don't think this is enough. 
TTM could

still evict the BO concurrently.


So I'm not sure this is actually a problem: ttm could try to 
concurrently
evict the buffer we pinned into vram, and then just skip to the next 
one.


Plus atm generic fbdev isn't used on any chip where we really care about
those last few MB of VRAM being usable for command submission (well atm
there's no driver using it).


Well, this is the patchset for radeon. If it works out, amdgpu and 
nouveau are natural next choices. Especially radeon and nouveau 
support cards with low- to medium-sized VRAM. The MiBs wasted on fbdev 
certainly matter.




Having the buffer pinned into system memory and trying to do a 
concurrent

modeset that tries to pull it in is the hard failure mode. And holding
fb_helper.lock fully prevents that.

So not really clear on what failure mode you're seeing here?


Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or 
Wayland is running.) The fbdev BO is a few MiBs and not in use, so TTM 
would want to evict it if memory gets tight.


What I have in mind is a concurrent modeset that requires the memory. 
If we do a concurrent damage blit without protecting against eviction, 
things go boom. Same for concurrent 3d graphics with textures, model 
data, etc.


Completely agree.

This needs proper lock protection of the memory-mapped buffer. Relying 
on some other code not running because we have some third-party locks 
taken is not sufficient here.


Regards,
Christian.



Best regards
Thomas



There's no recursion taking place, so I guess the reservation lock 
could be
acquired/released in drm_client_buffer_vmap/vunmap(), or a separate 
pair of

DRM client functions could do the locking.


Given how this "do the right locking" is a can of worms (and I think 
it's

worse than what you dug out already) I think the fb_helper.lock hack is
perfectly good enough.

I'm also somewhat worried that starting to use dma_resv lock in generic
code, while many helpers/drivers still have their hand-rolled locking,
will make conversion over to dma_resv needlessly more complicated.
-Daniel



Best regards
Thomas

[1] 
https://cgit.freedesktop.org/drm/drm-tip/tree/drivers/gpu/drm/drm_fb_helper.c?id=ac60f3f3090115d21f028bffa2dcfb67f695c4f2#n394




Please note that the reservation lock you need to take here is part of
the GEM object.

Usually we design things in the way that the code needs to take a lock
which protects an object, then do some operations with the object and
then relea

Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Daniel Vetter
On Wed, Nov 25, 2020 at 11:13:13AM +0100, Christian König wrote:
> Am 25.11.20 um 09:37 schrieb Thomas Zimmermann:
> > Hi
> > 
> > Am 24.11.20 um 15:09 schrieb Daniel Vetter:
> > > On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:
> > > > Hi
> > > > 
> > > > Am 24.11.20 um 14:36 schrieb Christian König:
> > > > > Am 24.11.20 um 13:15 schrieb Thomas Zimmermann:
> > > > > > [SNIP]
> > > > > > > > > > First I wanted to put this into
> > > > > > > > > > drm_gem_ttm_vmap/vunmap(), but then wondered why
> > > > > > > > > > ttm_bo_vmap() does not acquire the lock internally?
> > > > > > > > > > I'd expect that vmap/vunmap are close together and
> > > > > > > > > > do not overlap for the same BO.
> > > > > > > > > 
> > > > > > > > > We have use cases like the following during command 
> > > > > > > > > submission:
> > > > > > > > > 
> > > > > > > > > 1. lock
> > > > > > > > > 2. map
> > > > > > > > > 3. copy parts of the BO content somewhere else or patch
> > > > > > > > > it with additional information
> > > > > > > > > 4. unmap
> > > > > > > > > 5. submit BO to the hardware
> > > > > > > > > 6. add hardware fence to the BO to make sure it doesn't move
> > > > > > > > > 7. unlock
> > > > > > > > > 
> > > > > > > > > That use case won't be possible with vmap/vunmap if we
> > > > > > > > > move the lock/unlock into it and I hope to replace the
> > > > > > > > > kmap/kunmap functions with them in the near term.
> > > > > > > > > 
> > > > > > > > > > Otherwise, acquiring the reservation lock would
> > > > > > > > > > require another ref-counting variable or per-driver
> > > > > > > > > > code.
> > > > > > > > > 
> > > > > > > > > Hui, why that? Just put this into
> > > > > > > > > drm_gem_ttm_vmap/vunmap() helper as you initially
> > > > > > > > > planned.
> > > > > > > > 
> > > > > > > > Given your example above, step one would acquire the lock,
> > > > > > > > and step two would also acquire the lock as part of the vmap
> > > > > > > > implementation. Wouldn't this fail (At least during unmap or
> > > > > > > > unlock steps) ?
> > > > > > > 
> > > > > > > Oh, so you want to nest them? No, that is a rather bad no-go.
> > > > > > 
> > > > > > I don't want to nest/overlap them. My question was whether that
> > > > > > would be required. Apparently not.
> > > > > > 
> > > > > > While the console's BO is being set for scanout, it's protected from
> > > > > > movement via the pin/unpin implementation, right?
> > > > > 
> > > > > Yes, correct.
> > > > > 
> > > > > > The driver does not acquire the resv lock for longer periods. I'm
> > > > > > asking because this would prevent any console-buffer updates while
> > > > > > the console is being displayed.
> > > > > 
> > > > > Correct as well, we only hold the lock for things like command
> > > > > submission, pinning, unpinning etc etc
> > > > > 
> > > > 
> > > > Thanks for answering my questions.
> > > > 
> > > > > > 
> > > > > > > 
> > > > > > > You need to make sure that the lock is only taken from the FB
> > > > > > > path which wants to vmap the object.
> > > > > > > 
> > > > > > > Why don't you lock the GEM object from the caller in the generic
> > > > > > > FB implementation?
> > > > > > 
> > > > > > With the current blitter code, it breaks abstraction. if vmap/vunmap
> > > > > > hold the lock implicitly, things would be easier.
> > > > > 
> > > > > Do you have a link to the code?
> > > > 
> > > > It's the damage blitter in the fbdev code. [1] While it flushes
> > > > the shadow
> > > > buffer into the BO, the BO has to be kept in place. I already
> > > > changed it to
> > > > lock struct drm_fb_helper.lock, but I don't think this is
> > > > enough. TTM could
> > > > still evict the BO concurrently.
> > > 
> > > So I'm not sure this is actually a problem: ttm could try to
> > > concurrently
> > > evict the buffer we pinned into vram, and then just skip to the next
> > > one.
> > > 
> > > Plus atm generic fbdev isn't used on any chip where we really care about
> > > that last few mb of vram being useable for command submission (well atm
> > > there's no driver using it).
> > 
> > Well, this is the patchset for radeon. If it works out, amdgpu and
> > nouveau are natural next choices. Especially radeon and nouveau support
> > cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly
> > matter.
> > 
> > > 
> > > Having the buffer pinned into system memory and trying to do a
> > > concurrent
> > > modeset that tries to pull it in is the hard failure mode. And holding
> > > fb_helper.lock fully prevents that.
> > > 
> > > So not really clear on what failure mode you're seeing here?
> > 
> > Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or Wayland
> > is running.) The fbdev BO is a few MiBs and not in use, so TTM would
> > want to evict it if memory gets tight.
> > 
> > What I have in mind is a concurrent modeset that requires the memory. If
> > we do a concurrent damage blit without protecting against eviction,
> > things go

Re: [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Andy Shevchenko
On Mon, Nov 23, 2020 at 10:39 PM James Bottomley
 wrote:
> On Mon, 2020-11-23 at 19:56 +0100, Miguel Ojeda wrote:
> > On Mon, Nov 23, 2020 at 4:58 PM James Bottomley
> >  wrote:

...

> > But if we do the math, for an author, at even 1 minute per line
> > change and assuming nothing can be automated at all, it would take 1
> > month of work. For maintainers, a couple of trivial lines is noise
> > compared to many other patches.
>
> So you think a one line patch should take one minute to produce ... I
> really don't think that's grounded in reality.  I suppose a one line
> patch only takes a minute to merge with b4 if no-one reviews or tests
> it, but that's not really desirable.

In my practice most of the one line patches were either to fix or to
introduce quite interesting issues.
1 minute is 2-3 orders of magnitude less than what's usually needed for such patches.
That's why I don't like churn produced by people who often haven't even
compiled their useful contributions.

-- 
With Best Regards,
Andy Shevchenko
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

2020-11-25 Thread Daniel Vetter
On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > 
> > On 11/24/20 2:41 AM, Christian König wrote:
> > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> > > > 
> > > > On 11/23/20 3:41 PM, Christian König wrote:
> > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> > > > > > 
> > > > > > On 11/23/20 3:20 PM, Christian König wrote:
> > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> > > > > > > > 
> > > > > > > > On 11/25/20 5:42 AM, Christian König wrote:
> > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > It's needed to drop iommu backed pages on device unplug
> > > > > > > > > > before device's IOMMU group is released.
> > > > > > > > > 
> > > > > > > > > It would be cleaner if we could do the whole
> > > > > > > > > handling in TTM. I also need to double check
> > > > > > > > > what you are doing with this function.
> > > > > > > > > 
> > > > > > > > > Christian.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology
> > > > > > > > notifier per device." to see
> > > > > > > > how i use it. I don't see why this should go
> > > > > > > > into TTM mid-layer - the stuff I do inside
> > > > > > > > is vendor specific and also I don't think TTM is
> > > > > > > > explicitly aware of IOMMU ?
> > > > > > > > Do you mean you prefer the IOMMU notifier to be
> > > > > > > > registered from within TTM
> > > > > > > > and then use a hook to call into vendor specific handler ?
> > > > > > > 
> > > > > > > No, that is really vendor specific.
> > > > > > > 
> > > > > > > What I meant is to have a function like
> > > > > > > ttm_resource_manager_evict_all() which you only need
> > > > > > > to call and all tt objects are unpopulated.
> > > > > > 
> > > > > > 
> > > > > > So instead of this BO list i create and later iterate in
> > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > within
> > > > > > TTM with a single function ? Makes much more sense.
> > > > > 
> > > > > Yes, exactly.
> > > > > 
> > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > actually not the best idea, we should now check the
> > > > > pin_count instead. This way we could also have a list of the
> > > > > pinned BOs in TTM.
> > > > 
> > > > 
> > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > the unpinned BOs and this new function for the pinned ones  ?
> > > > It's probably a good idea to combine both iterations into this
> > > > new function to cover all the BOs allocated on the device.
> > > 
> > > Yes, that's what I had in my mind as well.
> > > 
> > > > 
> > > > 
> > > > > 
> > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > could have unforeseen consequences.
> > > > 
> > > > 
> > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > accesses once we drop all the DMA backing pages for a particular
> > > > BO ? Because for user mappings
> > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > nothing was done for in kernel CPU mappings.
> > > 
> > > Yes exactly that.
> > > 
> > > In other words what happens if we free the ring buffer while the
> > > kernel still writes to it?
> > > 
> > > Christian.
> > 
> > 
> > While we can't control user application accesses to the mapped buffers
> > explicitly and hence we use page fault rerouting
> > I am thinking that in this  case we may be able to sprinkle
> > drm_dev_enter/exit in any such sensitive place were we might
> > CPU access a DMA buffer from the kernel ?
> 
> Yes, I fear we are going to need that.

Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
could stuff this into begin/end_cpu_access (but only for the kernel, so a
bit tricky)?

btw the other issue with dma-buf (and even worse with dma_fence) is
refcounting of the underlying drm_device. I'd expect that all your
callbacks go boom if the dma_buf outlives your drm_device. That part isn't
yet solved in your series here.
-Daniel

> 
> > Things like CPU page table updates, ring buffer accesses and FW memcpy ?
> > Is there other places ?
> 
> Puh, good question. I have no idea.
> 
> > Another point is that at this point the driver shouldn't access any such
> > buffers as we are at the process finishing the device.
> > AFAIK there is no page fault mechanism for kernel mappings so I don't
> > think there is anything else to do ?
> 
> Well there is a page fault handler for kernel mappings, but that one just
> prints the stack trace into the system log and calls BUG(); :)
> 
> Long story short we need to avoid any access to released pages after unplug.
> No matter if it's from the kernel or userspace.
> 
> Regards,
> Christian.
> 
> > 
> > Andrey
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
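The "unpopulate everything on unplug" helper discussed in this thread can be sketched as a toy model. `ttm_tt_unpopulate_all` is a hypothetical name; real code would live in TTM and walk the LRU and pinned lists under the proper locks:

```python
# Toy sketch: on device unplug, walk both the LRU of unpinned BOs and the
# pinned list, dropping their (IOMMU-backed) pages. Illustrative only.
class Bo:
    def __init__(self, pinned=False):
        self.pinned = pinned
        self.populated = True

def ttm_tt_unpopulate_all(lru, pinned_list):
    # Cover every BO allocated on the device, pinned or not.
    for bo in list(lru) + list(pinned_list):
        bo.populated = False   # drop the backing pages

lru = [Bo(), Bo()]
pinned = [Bo(pinned=True)]
ttm_tt_unpopulate_all(lru, pinned)
assert all(not bo.populated for bo in lru + pinned)
print('ok')
```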

Re: [PATCH v3 08/12] drm/amdgpu: Split amdgpu_device_fini into early and late

2020-11-25 Thread Daniel Vetter
On Tue, Nov 24, 2020 at 10:51:57AM -0500, Andrey Grodzovsky wrote:
> 
> On 11/24/20 9:53 AM, Daniel Vetter wrote:
> > On Sat, Nov 21, 2020 at 12:21:18AM -0500, Andrey Grodzovsky wrote:
> > > Some of the stuff in amdgpu_device_fini such as HW interrupts
> > > disable and pending fences finalization must be done right away on
> > > pci_remove while most of the stuff which relates to finalizing and
> > > releasing driver data structures can be kept until
> > > drm_driver.release hook is called, i.e. when the last device
> > > reference is dropped.
> > > 
> > Uh fini_late and fini_early are rather meaningless namings, since it's not
> > clear why there's a split. If you used drm_connector_funcs as inspiration,
> > that's kinda not good because 'register' itself is a reserved keyword.
> > That's why we had to add late_ prefix, could as well have used
> > C_sucks_ as prefix :-) And then the early_unregister for consistency.
> > 
> > I think fini_hw and fini_sw (or maybe fini_drm) would be a lot clearer
> > about what they're doing.
> > 
> > I still strongly recommend that you cut over as much as possible of the
> > fini_hw work to devm_ and for the fini_sw/drm stuff there's drmm_
> > -Daniel
> 
> 
> Definitely, and I put it in a TODO list in the RFC patch. Also, as I
> mentioned before -
> I just prefer to leave it for a follow up work because it's non trivial and
> requires shuffling
> a lot of stuff around in the driver. I was thinking of committing the work
> in incremental steps -
> so it's easier to merge it and control for breakages.

Yeah doing devm/drmm conversion later on makes sense. I'd still try to
have better names than what you're currently going with. A few of these
will likely stick around for very long, not just interim.
-Daniel

> 
> Andrey
> 
> 
> > 
> > > Signed-off-by: Andrey Grodzovsky 
> > > ---
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  6 +-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|  7 ++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  | 15 ++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c| 24 +++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_irq.h|  1 +
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 12 +++-
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c|  3 +++
> > >   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  3 ++-
> > >   9 files changed, 65 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > index 83ac06a..6243f6d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > > @@ -1063,7 +1063,9 @@ static inline struct amdgpu_device 
> > > *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
> > >   int amdgpu_device_init(struct amdgpu_device *adev,
> > >  uint32_t flags);
> > > -void amdgpu_device_fini(struct amdgpu_device *adev);
> > > +void amdgpu_device_fini_early(struct amdgpu_device *adev);
> > > +void amdgpu_device_fini_late(struct amdgpu_device *adev);
> > > +
> > >   int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
> > >   void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> > > @@ -1275,6 +1277,8 @@ void amdgpu_driver_lastclose_kms(struct drm_device 
> > > *dev);
> > >   int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file 
> > > *file_priv);
> > >   void amdgpu_driver_postclose_kms(struct drm_device *dev,
> > >struct drm_file *file_priv);
> > > +void amdgpu_driver_release_kms(struct drm_device *dev);
> > > +
> > >   int amdgpu_device_ip_suspend(struct amdgpu_device *adev);
> > >   int amdgpu_device_suspend(struct drm_device *dev, bool fbcon);
> > >   int amdgpu_device_resume(struct drm_device *dev, bool fbcon);
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > index 2f60b70..797d94d 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > > @@ -3557,14 +3557,12 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> > >* Tear down the driver info (all asics).
> > >* Called at driver shutdown.
> > >*/
> > > -void amdgpu_device_fini(struct amdgpu_device *adev)
> > > +void amdgpu_device_fini_early(struct amdgpu_device *adev)
> > >   {
> > >   dev_info(adev->dev, "amdgpu: finishing device.\n");
> > >   flush_delayed_work(&adev->delayed_init_work);
> > >   adev->shutdown = true;
> > > - kfree(adev->pci_state);
> > > -
> > >   /* make sure IB test finished before entering exclusive mode
> > >* to avoid preemption on IB test
> > >* */
> > > @@ -3581,11 +3579,18 @@ void amdgpu_device_fini(struct amdgpu_device 
> > > *adev)
> > >   else
> > >   drm_atomic_h

Re: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support

2020-11-25 Thread Daniel Vetter
On Tue, Nov 24, 2020 at 06:45:06PM +, Xiong, Jianxin wrote:
> > -Original Message-
> > From: Daniel Vetter 
> > Sent: Tuesday, November 24, 2020 7:17 AM
> > To: Jason Gunthorpe 
> > Cc: Xiong, Jianxin ; Leon Romanovsky 
> > ; linux-r...@vger.kernel.org; dri-
> > de...@lists.freedesktop.org; Doug Ledford ; Vetter, 
> > Daniel ; Christian Koenig
> > 
> > Subject: Re: [PATCH rdma-core 3/5] pyverbs: Add dma-buf based MR support
> > 
> > On Mon, Nov 23, 2020 at 02:05:04PM -0400, Jason Gunthorpe wrote:
> > > On Mon, Nov 23, 2020 at 09:53:02AM -0800, Jianxin Xiong wrote:
> > >
> > > > +cdef class DmaBuf:
> > > > +def __init__(self, size, unit=0):
> > > > +"""
> > > > +Allocate DmaBuf object from a GPU device. This is done through 
> > > > the
> > > > +DRI device interface (/dev/dri/card*). Usually this
> > > > +requires the
> > 
> > Please use /dev/dri/renderD* instead. That's the interface meant for 
> > unpriviledged rendering access. card* is the legacy interface with
> > backwards compat galore, don't use.
> > 
> > Specifically if you do this on a gpu which also has display (maybe some 
> > testing on a local developer machine, no idea ...) then you mess with
> > compositors and stuff.
> > 
> > Also wherever you copied this from, please also educate those teams that 
> > using /dev/dri/card* for rendering stuff is a Bad Idea (tm)
> 
> /dev/dri/renderD* is not always available (e.g. for many iGPUs) and doesn't 
> support
> mode setting commands (including dumb_buf). The original intention here is to
> have something to support the new tests added, not for general compute. 

Not having dumb_buf available is a feature. So even more reasons to use
that.

Also note that amdgpu has killed card* access pretty much, it's for
modesetting only.

> > > > +effective user id being root or being a member of the 'video' 
> > > > group.
> > > > +:param size: The size (in number of bytes) of the buffer.
> > > > +:param unit: The unit number of the GPU to allocate the buffer 
> > > > from.
> > > > +:return: The newly created DmaBuf object on success.
> > > > +"""
> > > > +self.dmabuf_mrs = weakref.WeakSet()
> > > > +self.dri_fd = open('/dev/dri/card'+str(unit), O_RDWR)
> > > > +
> > > > +args = bytearray(32)
> > > > +pack_into('=iiiiiiq', args, 0, 1, size, 8, 0, 0, 0, 0)
> > > > +ioctl(self.dri_fd, DRM_IOCTL_MODE_CREATE_DUMB, args)
> > > > +a, b, c, d, self.handle, e, self.size = unpack('=iiiiiiq', args)
> > 
> > Yeah no, don't allocate render buffers with create_dumb. Every time this 
> > comes up I'm wondering whether we should just completely
> > disable dma-buf operations on these. Dumb buffers are explicitly only for 
> > software rendering for display purposes when the gpu userspace
> > stack isn't fully running yet, aka boot splash.
> > 
> > And yes I know there's endless amounts of abuse of that stuff floating 
> > around, especially on arm-soc/android systems.
> 
> One alternative is to use the GEM_CREATE method which can be done via the 
> renderD*
> device, but the command is vendor specific, so the logic is a little bit more 
> complex. 

Yup. I guess the most minimal thing is to have a per-vendor (you can ask
drm for the driver name to match the right one) callback here to allocate
buffers correctly. Might be less churn than trying to pull in vulkan or
something like that.

It's at least what we're doing in igt for testing drm drivers (although
most of the generic igt tests for display, so dumb_buffer fallback is
available).

DRM_IOCTL_VERSION is the thing you'd need here, struct drm_version.name
has the field for figuring out which driver it is.

Also drivers without render node support won't ever be in the same system
as an rdma card and actually useful (because well they're either very old,
or display-only). So not an issue I think.

> > > > +
> > > > +args = bytearray(12)
> > > > +pack_into('=iii', args, 0, self.handle, O_RDWR, 0)
> > > > +ioctl(self.dri_fd, DRM_IOCTL_PRIME_HANDLE_TO_FD, args)
> > > > +a, b, self.fd = unpack('=iii', args)
> > > > +
> > > > +args = bytearray(16)
> > > > +pack_into('=iiq', args, 0, self.handle, 0, 0)
> > > > +ioctl(self.dri_fd, DRM_IOCTL_MODE_MAP_DUMB, args);
> > > > +a, b, self.map_offset = unpack('=iiq', args);
> > >
> > > Wow, OK
> > >
> > > Is it worth using ctypes here instead? Can you at least add a comment
> > > before each pack specifying the 'struct XXX' this is following?
> > >
> > > Does this work with normal Intel GPUs, like in a Laptop? AMD too?
> > >
> > > Christian, I would be very happy to hear from you that this entire
> > > work is good for AMD as well
> > 
> > I think the smallest generic interface for allocating gpu buffers which are 
> > more useful than the stuff you get from CREATE_DUMB is gbm.
> > That's used by compositors to get bare metal opengl going on

Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Christian König

Am 25.11.20 um 11:36 schrieb Daniel Vetter:

On Wed, Nov 25, 2020 at 11:13:13AM +0100, Christian König wrote:

Am 25.11.20 um 09:37 schrieb Thomas Zimmermann:

Hi

Am 24.11.20 um 15:09 schrieb Daniel Vetter:

On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:

Hi

Am 24.11.20 um 14:36 schrieb Christian König:

Am 24.11.20 um 13:15 schrieb Thomas Zimmermann:

[SNIP]

First I wanted to put this into
drm_gem_ttm_vmap/vunmap(), but then wondered why
ttm_bo_vmap() does not acquire the lock internally?
I'd expect that vmap/vunmap are close together and
do not overlap for the same BO.

We have use cases like the following during command submission:

1. lock
2. map
3. copy parts of the BO content somewhere else or patch
it with additional information
4. unmap
5. submit BO to the hardware
6. add hardware fence to the BO to make sure it doesn't move
7. unlock

That use case won't be possible with vmap/vunmap if we
move the lock/unlock into it and I hope to replace the
kmap/kunmap functions with them in the near term.


Otherwise, acquiring the reservation lock would
require another ref-counting variable or per-driver
code.

Hui, why that? Just put this into
drm_gem_ttm_vmap/vunmap() helper as you initially
planned.

Given your example above, step one would acquire the lock,
and step two would also acquire the lock as part of the vmap
implementation. Wouldn't this fail (At least during unmap or
unlock steps) ?

Oh, so you want to nest them? No, that is a rather bad no-go.

I don't want to nest/overlap them. My question was whether that
would be required. Apparently not.

While the console's BO is being set for scanout, it's protected from
movement via the pin/unpin implementation, right?

Yes, correct.


The driver does not acquire the resv lock for longer periods. I'm
asking because this would prevent any console-buffer updates while
the console is being displayed.

Correct as well, we only hold the lock for things like command
submission, pinning, unpinning etc etc


Thanks for answering my questions.


You need to make sure that the lock is only taken from the FB
path which wants to vmap the object.

Why don't you lock the GEM object from the caller in the generic
FB implementation?

With the current blitter code, it breaks abstraction. if vmap/vunmap
hold the lock implicitly, things would be easier.

Do you have a link to the code?

It's the damage blitter in the fbdev code. [1] While it flushes
the shadow
buffer into the BO, the BO has to be kept in place. I already
changed it to
lock struct drm_fb_helper.lock, but I don't think this is
enough. TTM could
still evict the BO concurrently.

So I'm not sure this is actually a problem: ttm could try to
concurrently
evict the buffer we pinned into vram, and then just skip to the next
one.

Plus atm generic fbdev isn't used on any chip where we really care about
that last few mb of vram being useable for command submission (well atm
there's no driver using it).

Well, this is the patchset for radeon. If it works out, amdgpu and
nouveau are natural next choices. Especially radeon and nouveau support
cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly
matter.


Having the buffer pinned into system memory and trying to do a
concurrent
modeset that tries to pull it in is the hard failure mode. And holding
fb_helper.lock fully prevents that.

So not really clear on what failure mode you're seeing here?

Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or Wayland
is running.) The fbdev BO is a few MiBs and not in use, so TTM would
want to evict it if memory gets tight.

What I have in mind is a concurrent modeset that requires the memory. If
we do a concurrent damage blit without protecting against eviction,
things go boom. Same for concurrent 3d graphics with textures, model
data, etc.

Completely agree.

This needs proper lock protection of the memory mapped buffer. Relying on
that some other code isn't run because we have some third part locks taken
is not sufficient here.

We are still protected by the pin count in this scenario. Plus, with
current drivers we always pin the fbdev buffer into vram, so occasionally
failing to move it out isn't a regression.

So I'm still not seeing how this can go boom.


Well as far as I understand it the pin count is zero for this buffer in 
this case here :)


I might be wrong on this because I don't know the FB code at all, but 
Thomas seems to be pretty clear that this is the shadow buffer which is 
not scanned out from.


Regards,
Christian.



Now long term it'd be nice to cut everything over to dma_resv locking, but
the issue there is that beyond ttm, none of the helpers (and few of the
drivers) use dma_resv. So this is a fairly big uphill battle. Quick
interim fix seems like the right solution to me.
-Daniel


Regards,
Christian.


Best regards
Thomas


There's no recursion taking place, so I guess the reservation
lock could be
acquired/release in drm_client_buffer_vmap/vunmap()

Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Steven Price

On 25/11/2020 03:17, Luben Tuikov wrote:

The job timeout handler now returns status
indicating back to the DRM layer whether the job
was successfully cancelled or whether more time
should be given to the job to complete.


I'm not sure I understand in what circumstances you would want to give 
the job more time to complete. Could you expand on that?


One thing we're missing at the moment in Panfrost is the ability to 
suspend ("soft stop" is the Mali jargon) a job and pick something else 
to run. The proprietary driver stack uses this to avoid timing out long 
running jobs while still allowing other processes to have time on the 
GPU. But this interface as it stands doesn't seem to provide that.


As the kernel test robot has already pointed out - you'll need to at the 
very least update the other uses of this interface.


Steve



Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
  include/drm/gpu_scheduler.h | 13 ++---
  2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ff48101bab55..81b73790ecc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  
-static void amdgpu_job_timedout(struct drm_sched_job *s_job)

+static int amdgpu_job_timedout(struct drm_sched_job *s_job)
  {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
{
DRM_ERROR("ring %s timeout, but soft recovered\n",
  s_job->sched->name);
-   return;
+   return 0;
}
  
  	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);

@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
  
  	if (amdgpu_device_should_recover_gpu(ring->adev)) {

amdgpu_device_gpu_recover(ring->adev, job);
+   return 0;
} else {
drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
adev->virt.tdr_debug = true;
+   return 1;
}
  }
  
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h

index 2e0c368e19f6..61f7121e1c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
  
  	/**

- * @timedout_job: Called when a job has taken too long to execute,
- * to trigger GPU recovery.
+* @timedout_job: Called when a job has taken too long to execute,
+* to trigger GPU recovery.
+*
+* Return 0, if the job has been aborted successfully and will
+* never be heard of from the device. Return non-zero if the
+* job wasn't able to be aborted, i.e. if more time should be
+* given to this job. The result is not "bool" as this
+* function is not a predicate, although its result may seem
+* as one.
 */
-   void (*timedout_job)(struct drm_sched_job *sched_job);
+   int (*timedout_job)(struct drm_sched_job *sched_job);
  
  	/**

   * @free_job: Called once the job's finished fence has been signaled



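The contract the patch introduces, a timeout handler that reports whether the job was actually aborted, can be sketched as follows. Names mirror the patch loosely; this is a model of the control flow, not kernel code:

```python
# Status-returning timeout handler: 0 means the job was aborted and can be
# reaped; non-zero means give it more time and re-arm the timer.
DONE, EXTEND = 0, 1

def scheduler_handle_timeout(job, timedout_job):
    status = timedout_job(job)
    if status == DONE:
        return 'reap'          # move the job to the done list
    return 'rearm-timer'       # leave it on the pending list, restart timeout

recovered = lambda job: DONE        # e.g. amdgpu: GPU recovery succeeded
still_running = lambda job: EXTEND  # e.g. a job that should get more time

assert scheduler_handle_timeout({}, recovered) == 'reap'
assert scheduler_handle_timeout({}, still_running) == 'rearm-timer'
print('ok')
```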


Re: [PATCH 6/6] drm/sched: Make use of a "done" thread

2020-11-25 Thread Steven Price

On 25/11/2020 03:17, Luben Tuikov wrote:

Add a "done" list to which all completed jobs are added
to be freed. The drm_sched_job_done() callback is the
producer of jobs to this list.

Add a "done" thread which consumes from the done list
and frees up jobs. Now, the main scheduler thread only
pushes jobs to the GPU and the "done" thread frees them
up, on the way out of the GPU when they've completed
execution.


Generally I'd be in favour of a "done thread" as I think there are some 
murky corners of Panfrost's locking that would be helped by deferring 
the free_job() callback.


But I think you're trying to do too much in one patch here. And as 
Christian has pointed out there's some dodgy looking changes to locking 
which aren't explained.


Steve



Make use of the status returned by the GPU driver
timeout handler to decide whether to leave the job in
the pending list, or to send it off to the done list.
If a job is done, it is added to the done list and the
done thread woken up. If a job needs more time, it is
left on the pending list and the timeout timer
restarted.

Eliminate the polling mechanism of picking out done
jobs from the pending list, i.e. eliminate
drm_sched_get_cleanup_job(). Now the main scheduler
thread only pushes jobs down to the GPU.

Various other optimizations to the GPU scheduler
and job recovery are possible with this format.

Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/scheduler/sched_main.c | 173 +
  include/drm/gpu_scheduler.h|  14 ++
  2 files changed, 101 insertions(+), 86 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 3eb7618a627d..289ae68cd97f 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -164,7 +164,8 @@ drm_sched_rq_select_entity(struct drm_sched_rq *rq)
   * drm_sched_job_done - complete a job
   * @s_job: pointer to the job which is done
   *
- * Finish the job's fence and wake up the worker thread.
+ * Finish the job's fence, move it to the done list,
+ * and wake up the done thread.
   */
  static void drm_sched_job_done(struct drm_sched_job *s_job)
  {
@@ -179,7 +180,12 @@ static void drm_sched_job_done(struct drm_sched_job *s_job)
dma_fence_get(&s_fence->finished);
drm_sched_fence_finished(s_fence);
dma_fence_put(&s_fence->finished);
-   wake_up_interruptible(&sched->wake_up_worker);
+
+   spin_lock(&sched->job_list_lock);
+   list_move(&s_job->list, &sched->done_list);
+   spin_unlock(&sched->job_list_lock);
+
+   wake_up_interruptible(&sched->done_wait_q);
  }
  
  /**

@@ -221,11 +227,10 @@ bool drm_sched_dependency_optimized(struct dma_fence* 
fence,
  EXPORT_SYMBOL(drm_sched_dependency_optimized);
  
  /**

- * drm_sched_start_timeout - start timeout for reset worker
- *
- * @sched: scheduler instance to start the worker for
+ * drm_sched_start_timeout - start a timeout timer
+ * @sched: scheduler instance whose job we're timing
   *
- * Start the timeout for the given scheduler.
+ * Start a timeout timer for the given scheduler.
   */
  static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
  {
@@ -305,8 +310,8 @@ static void drm_sched_job_begin(struct drm_sched_job *s_job)
  
  	spin_lock(&sched->job_list_lock);

list_add_tail(&s_job->list, &sched->pending_list);
-   drm_sched_start_timeout(sched);
spin_unlock(&sched->job_list_lock);
+   drm_sched_start_timeout(sched);
  }
  
  static void drm_sched_job_timedout(struct work_struct *work)

@@ -316,37 +321,30 @@ static void drm_sched_job_timedout(struct work_struct 
*work)
  
  	sched = container_of(work, struct drm_gpu_scheduler, work_tdr.work);
  
-	/* Protects against concurrent deletion in drm_sched_get_cleanup_job */

spin_lock(&sched->job_list_lock);
job = list_first_entry_or_null(&sched->pending_list,
   struct drm_sched_job, list);
+   spin_unlock(&sched->job_list_lock);
  
  	if (job) {

-   /*
-* Remove the bad job so it cannot be freed by concurrent
-* drm_sched_cleanup_jobs. It will be reinserted back after 
sched->thread
-* is parked at which point it's safe.
-*/
-   list_del_init(&job->list);
-   spin_unlock(&sched->job_list_lock);
+   int res;
  
-		job->sched->ops->timedout_job(job);

+   job->job_status |= DRM_JOB_STATUS_TIMEOUT;
+   res = job->sched->ops->timedout_job(job);
+   if (res == 0) {
+   /* The job is out of the device.
+*/
+   spin_lock(&sched->job_list_lock);
+   list_move(&job->list, &sched->done_list);
+   spin_unlock(&sched->job_list_lock);
  
-		/*

-* Guilty job did complete and hence needs to be manually 
removed
- 

Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Lucas Stach
On Wednesday, 25.11.2020 at 11:04, Steven Price wrote:
> On 25/11/2020 03:17, Luben Tuikov wrote:
> > The job timeout handler now returns status
> > indicating back to the DRM layer whether the job
> > was successfully cancelled or whether more time
> > should be given to the job to complete.
> 
> I'm not sure I understand in what circumstances you would want to give 
> the job more time to complete. Could you expand on that?

On etnaviv we don't have the ability to preempt a running job, but we
can look at the GPU state to determine if it's still making progress
with the current job, so we want to extend the timeout in that case to
not kill a long running but valid job.

Regards,
Lucas

> One thing we're missing at the moment in Panfrost is the ability to 
> suspend ("soft stop" is the Mali jargon) a job and pick something else 
> to run. The proprietary driver stack uses this to avoid timing out long 
> running jobs while still allowing other processes to have time on the 
> GPU. But this interface as it stands doesn't seem to provide that.
> 
> As the kernel test robot has already pointed out - you'll need to at the 
> very least update the other uses of this interface.
> 
> Steve
> 
> > Signed-off-by: Luben Tuikov 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
> >   include/drm/gpu_scheduler.h | 13 ++---
> >   2 files changed, 14 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index ff48101bab55..81b73790ecc6 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -28,7 +28,7 @@
> >   #include "amdgpu.h"
> >   #include "amdgpu_trace.h"
> >   
> > -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > +static int amdgpu_job_timedout(struct drm_sched_job *s_job)
> >   {
> > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > struct amdgpu_job *job = to_amdgpu_job(s_job);
> > @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > *s_job)
> > amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
> > {
> > DRM_ERROR("ring %s timeout, but soft recovered\n",
> >   s_job->sched->name);
> > -   return;
> > +   return 0;
> > }
> >   
> > amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
> > @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > *s_job)
> >   
> > if (amdgpu_device_should_recover_gpu(ring->adev)) {
> > amdgpu_device_gpu_recover(ring->adev, job);
> > +   return 0;
> > } else {
> > drm_sched_suspend_timeout(&ring->sched);
> > if (amdgpu_sriov_vf(adev))
> > adev->virt.tdr_debug = true;
> > +   return 1;
> > }
> >   }
> >   
> > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > index 2e0c368e19f6..61f7121e1c19 100644
> > --- a/include/drm/gpu_scheduler.h
> > +++ b/include/drm/gpu_scheduler.h
> > @@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
> > struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
> >   
> > /**
> > - * @timedout_job: Called when a job has taken too long to execute,
> > - * to trigger GPU recovery.
> > +* @timedout_job: Called when a job has taken too long to execute,
> > +* to trigger GPU recovery.
> > +*
> > +* Return 0, if the job has been aborted successfully and will
> > +* never be heard of from the device. Return non-zero if the
> > +* job wasn't able to be aborted, i.e. if more time should be
> > +* given to this job. The result is not "bool" as this
> > +* function is not a predicate, although its result may seem
> > +* as one.
> >  */
> > -   void (*timedout_job)(struct drm_sched_job *sched_job);
> > +   int (*timedout_job)(struct drm_sched_job *sched_job);
> >   
> > /**
> >* @free_job: Called once the job's finished fence has been 
> > signaled
> > 

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Steven Price

On 25/11/2020 11:15, Lucas Stach wrote:

On Wednesday, 25.11.2020 at 11:04, Steven Price wrote:

On 25/11/2020 03:17, Luben Tuikov wrote:

The job timeout handler now returns status
indicating back to the DRM layer whether the job
was successfully cancelled or whether more time
should be given to the job to complete.


I'm not sure I understand in what circumstances you would want to give
the job more time to complete. Could you expand on that?


On etnaviv we don't have the ability to preempt a running job, but we
can look at the GPU state to determine if it's still making progress
with the current job, so we want to extend the timeout in that case to
not kill a long running but valid job.


Ok, fair enough. Although from my experience (on Mali) jobs very rarely 
"get stuck" it's just that their run time can be excessive[1] causing 
other processes to not make forward progress. So I'd expect the timeout 
to be set based on how long a job can run before you need to stop it to 
allow other processes to run their jobs.


But I'm not familiar with etnaviv so perhaps stuck jobs are actually a 
thing there.


Thanks,

Steve

[1] Also on Mali it's quite possible to create an infinite duration job 
which appears to be making forward progress, so in that case our measure 
of progress isn't useful against these malicious jobs.



Regards,
Lucas


One thing we're missing at the moment in Panfrost is the ability to
suspend ("soft stop" is the Mali jargon) a job and pick something else
to run. The propitiatory driver stack uses this to avoid timing out long
running jobs while still allowing other processes to have time on the
GPU. But this interface as it stands doesn't seem to provide that.

As the kernel test robot has already pointed out - you'll need to at the
very least update the other uses of this interface.

Steve


Signed-off-by: Luben Tuikov 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
   include/drm/gpu_scheduler.h | 13 ++---
   2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index ff48101bab55..81b73790ecc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
   #include "amdgpu.h"
   #include "amdgpu_trace.h"
   
-static void amdgpu_job_timedout(struct drm_sched_job *s_job)

+static int amdgpu_job_timedout(struct drm_sched_job *s_job)
   {
struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
amdgpu_ring_soft_recovery(ring, job->vmid, s_job->s_fence->parent)) 
{
DRM_ERROR("ring %s timeout, but soft recovered\n",
  s_job->sched->name);
-   return;
+   return 0;
}
   
   	amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);

@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct drm_sched_job *s_job)
   
   	if (amdgpu_device_should_recover_gpu(ring->adev)) {

amdgpu_device_gpu_recover(ring->adev, job);
+   return 0;
} else {
drm_sched_suspend_timeout(&ring->sched);
if (amdgpu_sriov_vf(adev))
adev->virt.tdr_debug = true;
+   return 1;
}
   }
   
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h

index 2e0c368e19f6..61f7121e1c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
   
   	/**

- * @timedout_job: Called when a job has taken too long to execute,
- * to trigger GPU recovery.
+* @timedout_job: Called when a job has taken too long to execute,
+* to trigger GPU recovery.
+*
+* Return 0, if the job has been aborted successfully and will
+* never be heard of from the device. Return non-zero if the
+* job wasn't able to be aborted, i.e. if more time should be
+* given to this job. The result is not "bool" as this
+* function is not a predicate, although its result may seem
+* as one.
 */
-   void (*timedout_job)(struct drm_sched_job *sched_job);
+   int (*timedout_job)(struct drm_sched_job *sched_job);
   
   	/**

* @free_job: Called once the job's finished fence has been signaled







Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Thomas Zimmermann

Hi

On 25.11.20 at 11:36, Daniel Vetter wrote:

On Wed, Nov 25, 2020 at 11:13:13AM +0100, Christian König wrote:

On 25.11.20 at 09:37, Thomas Zimmermann wrote:

Hi

On 24.11.20 at 15:09, Daniel Vetter wrote:

On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:

Hi

On 24.11.20 at 14:36, Christian König wrote:

On 24.11.20 at 13:15, Thomas Zimmermann wrote:

[SNIP]

First I wanted to put this into
drm_gem_ttm_vmap/vunmap(), but then wondered why
ttm_bo_vmap() doe not acquire the lock internally?
I'd expect that vmap/vunmap are close together and
do not overlap for the same BO.


We have use cases like the following during command submission:

1. lock
2. map
3. copy parts of the BO content somewhere else or patch
it with additional information
4. unmap
5. submit BO to the hardware
6. add hardware fence to the BO to make sure it doesn't move
7. unlock

That use case won't be possible with vmap/vunmap if we
move the lock/unlock into it and I hope to replace the
kmap/kunmap functions with them in the near term.


Otherwise, acquiring the reservation lock would
require another ref-counting variable or per-driver
code.


Hui, why that? Just put this into
drm_gem_ttm_vmap/vunmap() helper as you initially
planned.


Given your example above, step one would acquire the lock,
and step two would also acquire the lock as part of the vmap
implementation. Wouldn't this fail (At least during unmap or
unlock steps) ?


Oh, so you want to nest them? No, that is a rather bad no-go.


I don't want to nest/overlap them. My question was whether that
would be required. Apparently not.

While the console's BO is being set for scanout, it's protected from
movement via the pin/unpin implementation, right?


Yes, correct.


The driver does not acquire the resv lock for longer periods. I'm
asking because this would prevent any console-buffer updates while
the console is being displayed.


Correct as well, we only hold the lock for things like command
submission, pinning, unpinning etc etc



Thanks for answering my questions.





You need to make sure that the lock is only taken from the FB
path which wants to vmap the object.

Why don't you lock the GEM object from the caller in the generic
FB implementation?


With the current blitter code, it breaks abstraction. If vmap/vunmap
held the lock implicitly, things would be easier.


Do you have a link to the code?


It's the damage blitter in the fbdev code. [1] While it flushes
the shadow
buffer into the BO, the BO has to be kept in place. I already
changed it to
lock struct drm_fb_helper.lock, but I don't think this is
enough. TTM could
still evict the BO concurrently.


So I'm not sure this is actually a problem: ttm could try to
concurrently
evict the buffer we pinned into vram, and then just skip to the next
one.

Plus atm generic fbdev isn't used on any chip where we really care about
that last few mb of vram being useable for command submission (well atm
there's no driver using it).


Well, this is the patchset for radeon. If it works out, amdgpu and
nouveau are natural next choices. Especially radeon and nouveau support
cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly
matter.



Having the buffer pinned into system memory and trying to do a
concurrent
modeset that tries to pull it in is the hard failure mode. And holding
fb_helper.lock fully prevents that.

So not really clear on what failure mode you're seeing here?


Imagine the fbdev BO is in VRAM, but not pinned. (Maybe Xorg or Wayland
is running.) The fbdev BO is a few MiBs and not in use, so TTM would
want to evict it if memory gets tight.

What I have in mind is a concurrent modeset that requires the memory. If
we do a concurrent damage blit without protecting against eviction,
things go boom. Same for concurrent 3d graphics with textures, model
data, etc.


Completely agree.

This needs proper lock protection of the memory mapped buffer. Relying on
that some other code isn't run because we have some third part locks taken
is not sufficient here.


We are still protected by the pin count in this scenario. Plus, with
current drivers we always pin the fbdev buffer into vram, so occasionally
failing to move it out isn't a regression.


Why is this protected by the pin count? The counter should be zero in 
this scenario. Otherwise, we could not evict the fbdev BO on all those 
systems where that's a hard requirement (e.g., ast).


The pin count is currently maintained by the vmap implementation in vram 
helpers. Calling vmap is an implicit pin; calling vunmap is an implicit 
unpin. This prevents eviction in the damage worker. But now I was told 
than pinning is only for BOs that are controlled by userspace and 
internal users should acquire the resv lock. So vram helpers have to be 
fixed, actually.


In vram helpers, unmapping does not mean eviction. The unmap operation 
only marks the BO as unmappable. The real unmap happens when the 
eviction takes place. This avoids many

Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Lucas Stach
On Wednesday, 25.11.2020 at 11:22, Steven Price wrote:
> On 25/11/2020 11:15, Lucas Stach wrote:
> > On Wednesday, 25.11.2020 at 11:04, Steven Price wrote:
> > > On 25/11/2020 03:17, Luben Tuikov wrote:
> > > > The job timeout handler now returns status
> > > > indicating back to the DRM layer whether the job
> > > > was successfully cancelled or whether more time
> > > > should be given to the job to complete.
> > > 
> > > I'm not sure I understand in what circumstances you would want to give
> > > the job more time to complete. Could you expand on that?
> > 
> > On etnaviv we don't have the ability to preempt a running job, but we
> > can look at the GPU state to determine if it's still making progress
> > with the current job, so we want to extend the timeout in that case to
> > not kill a long running but valid job.
> 
> Ok, fair enough. Although from my experience (on Mali) jobs very rarely 
> "get stuck" it's just that their run time can be excessive[1] causing 
> other processes to not make forward progress. So I'd expect the timeout 
> to be set based on how long a job can run before you need to stop it to 
> allow other processes to run their jobs.

Yea, we might want to kill the job eventually, but people tend to get
very angry if their use-case gets broken just because the userspace
driver manages to put enough blits in one job to run over the 500ms
timeout we allow for a job and the kernel then just hard-kills the job.

In an ideal world we would just preempt the job and allow something
else to run for a while, but without proper preemption support in HW
that's not an option right now.

> But I'm not familiar with etnaviv so perhaps stuck jobs are actually a 
> thing there.

It happens from time to time when our understanding of the HW isn't
complete and the userspace driver manages to create command streams
with missing semaphores between HW engines. ;)

Regards,
Lucas

> Thanks,
> 
> Steve
> 
> [1] Also on Mali it's quite possible to create an infinite duration job 
> which appears to be making forward progress, so in that case our measure 
> of progress isn't useful against these malicious jobs.
> 
> > Regards,
> > Lucas
> > 
> > > One thing we're missing at the moment in Panfrost is the ability to
> > > suspend ("soft stop" is the Mali jargon) a job and pick something else
> > > to run. The proprietary driver stack uses this to avoid timing out long
> > > running jobs while still allowing other processes to have time on the
> > > GPU. But this interface as it stands doesn't seem to provide that.
> > > 
> > > As the kernel test robot has already pointed out - you'll need to at the
> > > very least update the other uses of this interface.
> > > 
> > > Steve
> > > 
> > > > Signed-off-by: Luben Tuikov 
> > > > ---
> > > >drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
> > > >include/drm/gpu_scheduler.h | 13 ++---
> > > >2 files changed, 14 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
> > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > index ff48101bab55..81b73790ecc6 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > > > @@ -28,7 +28,7 @@
> > > >#include "amdgpu.h"
> > > >#include "amdgpu_trace.h"
> > > >
> > > > -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
> > > > +static int amdgpu_job_timedout(struct drm_sched_job *s_job)
> > > >{
> > > > struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
> > > > struct amdgpu_job *job = to_amdgpu_job(s_job);
> > > > @@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct drm_sched_job 
> > > > *s_job)
> > > > amdgpu_ring_soft_recovery(ring, job->vmid, 
> > > > s_job->s_fence->parent)) {
> > > > DRM_ERROR("ring %s timeout, but soft recovered\n",
> > > >   s_job->sched->name);
> > > > -   return;
> > > > +   return 0;
> > > > }
> > > >
> > > > amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
> > > > @@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct 
> > > > drm_sched_job *s_job)
> > > >
> > > > if (amdgpu_device_should_recover_gpu(ring->adev)) {
> > > > amdgpu_device_gpu_recover(ring->adev, job);
> > > > +   return 0;
> > > > } else {
> > > > drm_sched_suspend_timeout(&ring->sched);
> > > > if (amdgpu_sriov_vf(adev))
> > > > adev->virt.tdr_debug = true;
> > > > +   return 1;
> > > > }
> > > >}
> > > >
> > > > diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> > > > index 2e0c368e19f6..61f7121e1c19 100644
> > > > --- a/include/drm/gpu_scheduler.h
> > > > +++ b/include/drm/gpu_scheduler.h
> > > > @@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
> > > > str

RE: [PATCH] video: hyperv_fb: Directly use the MMIO VRAM

2020-11-25 Thread Dexuan Cui
Hi Wei Liu,
Please do not pick up this patch, because actually MMIO VRAM cannot work
with fb_deferred_io.

Previously I didn't test Xorg -- sorry. As soon as I tested it, I got the below
warning and the Xorg program terminated immediately:

[   28.148432] WARNING: CPU: 19 PID: 1410 at mm/vmalloc.c:383 
vmalloc_to_page+0x14b/0x150
...
[   28.192959] CPU: 19 PID: 1410 Comm: Xorg Tainted: GE 
5.10.0-rc1+ #4
...
[   28.208720] RIP: 0010:vmalloc_to_page+0x14b/0x150
...
[   28.299231] Call Trace:
[   28.301428]  fb_deferred_io_fault+0x3a/0xa0
[   28.305276]  __do_fault+0x36/0x120
[   28.308276]  handle_mm_fault+0x1144/0x1950
[   28.311963]  exc_page_fault+0x290/0x510
[   28.315551]  ? asm_exc_page_fault+0x8/0x30
[   28.319186]  asm_exc_page_fault+0x1e/0x30
[   28.322969] RIP: 0033:0x7fbeda3ec2f5

The issue is that fb_deferred_io_page() requires that the PFN be backed by a
struct page, but it looks like the MMIO address is not backed by a struct page.

So I have to drop this patch. 
Thanks Wei Hu and Michael for pointing this out!

FYI: drivers/video/fbdev/core/fb_defio.c:
static struct page *fb_deferred_io_page(struct fb_info *info, unsigned long 
offs)
{
void *screen_base = (void __force *) info->screen_base;
struct page *page;

if (is_vmalloc_addr(screen_base + offs))
page = vmalloc_to_page(screen_base + offs);
else
page = pfn_to_page((info->fix.smem_start + offs) >> PAGE_SHIFT);

return page;
}

/* this is to find and return the vmalloc-ed fb pages */
static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
{
unsigned long offset;
struct page *page;
struct fb_info *info = vmf->vma->vm_private_data;

offset = vmf->pgoff << PAGE_SHIFT;
if (offset >= info->fix.smem_len)
return VM_FAULT_SIGBUS;

page = fb_deferred_io_page(info, offset);
if (!page)
return VM_FAULT_SIGBUS;

Thanks,
-- Dexuan


Re: [PATCH 3/6] drm/scheduler: Job timeout handler returns status

2020-11-25 Thread Christian König

On 25.11.20 at 12:04, Steven Price wrote:

On 25/11/2020 03:17, Luben Tuikov wrote:

The job timeout handler now returns status
indicating back to the DRM layer whether the job
was successfully cancelled or whether more time
should be given to the job to complete.


I'm not sure I understand in what circumstances you would want to give 
the job more time to complete. Could you expand on that?


One thing we're missing at the moment in Panfrost is the ability to 
suspend ("soft stop" is the Mali jargon) a job and pick something else 
to run. The proprietary driver stack uses this to avoid timing out 
long running jobs while still allowing other processes to have time on 
the GPU. But this interface as it stands doesn't seem to provide that.


On AMD hardware we call this IB preemption and it is indeed not handled 
very well by the scheduler at the moment.


See how the amdgpu code messes with the preempted IBs to restart them 
for example.


Christian.



As the kernel test robot has already pointed out - you'll need to at 
the very least update the other uses of this interface.


Steve



Signed-off-by: Luben Tuikov 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  6 --
  include/drm/gpu_scheduler.h | 13 ++---
  2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

index ff48101bab55..81b73790ecc6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -28,7 +28,7 @@
  #include "amdgpu.h"
  #include "amdgpu_trace.h"
  -static void amdgpu_job_timedout(struct drm_sched_job *s_job)
+static int amdgpu_job_timedout(struct drm_sched_job *s_job)
  {
  struct amdgpu_ring *ring = to_amdgpu_ring(s_job->sched);
  struct amdgpu_job *job = to_amdgpu_job(s_job);
@@ -41,7 +41,7 @@ static void amdgpu_job_timedout(struct 
drm_sched_job *s_job)
  amdgpu_ring_soft_recovery(ring, job->vmid, 
s_job->s_fence->parent)) {

  DRM_ERROR("ring %s timeout, but soft recovered\n",
    s_job->sched->name);
-    return;
+    return 0;
  }
    amdgpu_vm_get_task_info(ring->adev, job->pasid, &ti);
@@ -53,10 +53,12 @@ static void amdgpu_job_timedout(struct 
drm_sched_job *s_job)

    if (amdgpu_device_should_recover_gpu(ring->adev)) {
  amdgpu_device_gpu_recover(ring->adev, job);
+    return 0;
  } else {
  drm_sched_suspend_timeout(&ring->sched);
  if (amdgpu_sriov_vf(adev))
  adev->virt.tdr_debug = true;
+    return 1;
  }
  }
  diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 2e0c368e19f6..61f7121e1c19 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -230,10 +230,17 @@ struct drm_sched_backend_ops {
  struct dma_fence *(*run_job)(struct drm_sched_job *sched_job);
    /**
- * @timedout_job: Called when a job has taken too long to 
execute,

- * to trigger GPU recovery.
+ * @timedout_job: Called when a job has taken too long to execute,
+ * to trigger GPU recovery.
+ *
+ * Return 0, if the job has been aborted successfully and will
+ * never be heard of from the device. Return non-zero if the
+ * job wasn't able to be aborted, i.e. if more time should be
+ * given to this job. The result is not "bool" as this
+ * function is not a predicate, although its result may seem
+ * as one.
   */
-    void (*timedout_job)(struct drm_sched_job *sched_job);
+    int (*timedout_job)(struct drm_sched_job *sched_job);
    /**
   * @free_job: Called once the job's finished fence has been 
signaled








Re: [PATCH v4 55/80] drm/panel: panel-dsi-cm: use MIPI_DCS_GET_ERROR_COUNT_ON_DSI

2020-11-25 Thread Tomi Valkeinen
On 24/11/2020 18:38, Sam Ravnborg wrote:

>>> IMO you should get all the patches at least up to and including this patch applied.
>>> They are all reviewed/acked. And then you have a much smaller stack of
>>> patches to spam us with.
>>
>> Yes, I think that makes sense. I did not want to merge them earlier, as with 
>> the v3, I could not get
>> videomode panels work at all (while cmd mode panels did work). So I was not 
>> sure if something is
>> totally silly and broken in the series.
>>
>> Now that I can get video mode panels work with some hacks on top, I'm fine 
>> with merging these.
>>
>> But it's too late for 5.11, as we need testing and work on the video mode 
>> panels. So targeting 5.12.
> Obviously your call, but I see no reason to wait for working videomode
> panles if what you have now do not introduce any (known) regressions.
> 
> ofc I assume videomode panels is something new and not something that worked
> before.
It gets a bit muddy here. The omap dsi host driver has had videomode support 
for a long time, but
there has been no upstream videomode panel drivers (omapdrm specific drivers, 
as omapdrm had its own
panel framework) and no board dts files using it.

I have a board with a custom made DSI videomode panel setup, but it's broken 
(cable, I think) and
works only randomly. I have an old 4.14 based branch with a hacky panel driver 
and dts file which
get the panel working. I don't know if videomode works on current upstream, or 
whether it has been broken
between 4.14 and current upstream, as the 4.14 panel driver doesn't work 
without modifications on
current upstream.

In this series we convert the omap dsi host driver to be a proper DRM citizen, 
removing support for
omapdrm specific panels, so new DRM panel drivers are needed to replace the 
omapdrm specific ones.

With this series applied, and adding a new panel driver and dts changes, 
videomode works (Nikolaus
confirmed that his panel works. Mine doesn't, as afaics it needs more finetuned 
initialization which
may not be possible with the current DRM bridge/panel callbacks. But mine works 
with some hacks).
But I'm sure in the middle of this series videomode won't work.

So, I think one can argue that this causes regressions in the middle of the 
series to non-upstream
panel drivers, but at the end of the series, they probably work, presuming you 
have a new DRM panel
driver for it.

 Tomi

-- 
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki


Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

2020-11-25 Thread Christian König

On 25.11.20 at 11:40, Daniel Vetter wrote:

On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:

On 24.11.20 at 17:22, Andrey Grodzovsky wrote:

On 11/24/20 2:41 AM, Christian König wrote:

On 23.11.20 at 22:08, Andrey Grodzovsky wrote:

On 11/23/20 3:41 PM, Christian König wrote:

On 23.11.20 at 21:38, Andrey Grodzovsky wrote:

On 11/23/20 3:20 PM, Christian König wrote:

On 23.11.20 at 21:05, Andrey Grodzovsky wrote:

On 11/25/20 5:42 AM, Christian König wrote:

On 21.11.20 at 06:21, Andrey Grodzovsky wrote:

It's needed to drop iommu backed pages on device unplug
before device's IOMMU group is released.

It would be cleaner if we could do the whole
handling in TTM. I also need to double check
what you are doing with this function.

Christian.


Check patch "drm/amdgpu: Register IOMMU topology
notifier per device." to see
how I use it. I don't see why this should go
into TTM mid-layer - the stuff I do inside
is vendor specific and also I don't think TTM is
explicitly aware of IOMMU ?
Do you mean you prefer the IOMMU notifier to be
registered from within TTM
and then use a hook to call into vendor specific handler ?

No, that is really vendor specific.

What I meant is to have a function like
ttm_resource_manager_evict_all() which you only need
to call and all tt objects are unpopulated.


So instead of this BO list i create and later iterate in
amdgpu from the IOMMU patch you just want to do it
within
TTM with a single function ? Makes much more sense.

Yes, exactly.

The list_empty() checks we have in TTM for the LRU are
actually not the best idea, we should now check the
pin_count instead. This way we could also have a list of the
pinned BOs in TTM.


So from my IOMMU topology handler I will iterate the TTM LRU for
the unpinned BOs and this new function for the pinned ones  ?
It's probably a good idea to combine both iterations into this
new function to cover all the BOs allocated on the device.

Yes, that's what I had in my mind as well.




BTW: Have you thought about what happens when we unpopulate
a BO while we still try to use a kernel mapping for it? That
could have unforeseen consequences.


Are you asking what happens to kmap or vmap style mapped CPU
accesses once we drop all the DMA backing pages for a particular
BO ? Because for user mappings
(mmap) we took care of this with dummy page reroute but indeed
nothing was done for in kernel CPU mappings.

Yes exactly that.

In other words what happens if we free the ring buffer while the
kernel still writes to it?

Christian.


While we can't control user application accesses to the mapped buffers
explicitly (hence the page fault rerouting), I am thinking that in this
case we may be able to sprinkle drm_dev_enter/exit in any such sensitive
place where we might CPU-access a DMA buffer from the kernel?

Yes, I fear we are going to need that.

Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
could stuff this into begin/end_cpu_access (but only for the kernel, so a
bit tricky)?


Oh very very good point! I haven't thought about DMA-buf mmaps in this 
context yet.




btw the other issue with dma-buf (and even worse with dma_fence) is
refcounting of the underlying drm_device. I'd expect that all your
callbacks go boom if the dma_buf outlives your drm_device. That part isn't
yet solved in your series here.


Well thinking more about this, it seems to be a another really good 
argument why mapping pages from DMA-bufs into application address space 
directly is a very bad idea :)


But yes, we essentially can't remove the device as long as there is a 
DMA-buf with mappings. No idea how to clean that one up.


Christian.


-Daniel


Things like CPU page table updates, ring buffer accesses and FW memcpy ?
Is there other places ?

Puh, good question. I have no idea.


Another point is that at this stage the driver shouldn't access any such
buffers, as we are in the process of finishing the device.
AFAIK there is no page fault mechanism for kernel mappings, so I don't
think there is anything else to do?

Well there is a page fault handler for kernel mappings, but that one just
prints the stack trace into the system log and calls BUG(); :)

Long story short we need to avoid any access to released pages after unplug.
No matter if it's from the kernel or userspace.

Regards,
Christian.


Andrey
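The drm_dev_enter()/drm_dev_exit() guard discussed in this thread can be sketched with a userspace analog. The names below are invented for illustration — the real API uses SRCU and lives in the DRM core — but the shape of a guarded in-kernel CPU access is the same:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace analog of guarding a CPU access with
 * drm_dev_enter()/drm_dev_exit(); all names are invented. */
struct toy_dev {
    bool unplugged;
    int ring[4];        /* stands in for a kernel-mapped ring buffer */
};

/* Returns true if the access may proceed (device still present). */
static bool toy_dev_enter(struct toy_dev *dev)
{
    return !dev->unplugged;
}

static void toy_dev_exit(struct toy_dev *dev)
{
    (void)dev;          /* the real API drops an SRCU read lock here */
}

/* Guarded CPU write: skipped entirely once the device is gone, so the
 * backing pages are never touched after they may have been released. */
static bool toy_write_ring(struct toy_dev *dev, int idx, int val)
{
    if (!toy_dev_enter(dev))
        return false;
    dev->ring[idx] = val;
    toy_dev_exit(dev);
    return true;
}
```

Every "sensitive place" that CPU-accesses a DMA buffer from the kernel would get such a bracket, turning a post-unplug access into a clean early return instead of a fault.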


___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH 1/7] drm/radeon: stop using pages with drm_prime_sg_to_page_addr_arrays v2

2020-11-25 Thread Christian König
This is deprecated.

v2: also use ttm_sg_tt_init to avoid allocating the page array.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 0ca381b95d3d..5d00b3dff388 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -401,8 +401,8 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_bo_device 
*bdev, struct ttm_tt *
if (r)
goto release_sg;
 
-   drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
-gtt->ttm.dma_address, ttm->num_pages);
+   drm_prime_sg_to_page_addr_arrays(ttm->sg, NULL, gtt->ttm.dma_address,
+ttm->num_pages);
 
return 0;
 
@@ -542,7 +542,7 @@ static struct ttm_tt *radeon_ttm_tt_create(struct 
ttm_buffer_object *bo,
else
caching = ttm_cached;
 
-   if (ttm_dma_tt_init(>t->ttm, bo, page_flags, caching)) {
+   if (ttm_sg_tt_init(>t->ttm, bo, page_flags, caching)) {
kfree(gtt);
return NULL;
}
@@ -580,8 +580,9 @@ static int radeon_ttm_tt_populate(struct ttm_bo_device 
*bdev,
}
 
if (slave && ttm->sg) {
-   drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
-gtt->ttm.dma_address, 
ttm->num_pages);
+   drm_prime_sg_to_page_addr_arrays(ttm->sg, NULL,
+gtt->ttm.dma_address,
+ttm->num_pages);
return 0;
}
 
-- 
2.25.1



[PATCH 5/7] drm/qxl: switch to ttm_sg_tt_init

2020-11-25 Thread Christian König
The function qxl_gem_prime_import_sg_table is not fully implemented.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/qxl/qxl_ttm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/qxl/qxl_ttm.c b/drivers/gpu/drm/qxl/qxl_ttm.c
index 128c38c8a837..d42e750cbdd3 100644
--- a/drivers/gpu/drm/qxl/qxl_ttm.c
+++ b/drivers/gpu/drm/qxl/qxl_ttm.c
@@ -115,7 +115,7 @@ static struct ttm_tt *qxl_ttm_tt_create(struct 
ttm_buffer_object *bo,
ttm = kzalloc(sizeof(struct ttm_tt), GFP_KERNEL);
if (ttm == NULL)
return NULL;
-   if (ttm_dma_tt_init(ttm, bo, page_flags, ttm_cached)) {
+   if (ttm_sg_tt_init(ttm, bo, page_flags, ttm_cached)) {
kfree(ttm);
return NULL;
}
-- 
2.25.1



[PATCH 4/7] drm/vmwgfx: switch to ttm_sg_tt_init

2020-11-25 Thread Christian König
According to Daniel, VMWGFX doesn't support DMA-buf anyway.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c 
b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
index 6a04261ce760..1c75f73538c0 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c
@@ -611,8 +611,8 @@ static struct ttm_tt *vmw_ttm_tt_create(struct 
ttm_buffer_object *bo,
vmw_be->mob = NULL;
 
if (vmw_be->dev_priv->map_mode == vmw_dma_alloc_coherent)
-   ret = ttm_dma_tt_init(&vmw_be->dma_ttm, bo, page_flags,
- ttm_cached);
+   ret = ttm_sg_tt_init(&vmw_be->dma_ttm, bo, page_flags,
+ttm_cached);
else
ret = ttm_tt_init(&vmw_be->dma_ttm, bo, page_flags,
  ttm_cached);
-- 
2.25.1



[PATCH 3/7] drm/nouveau: stop using pages with drm_prime_sg_to_page_addr_arrays v2

2020-11-25 Thread Christian König
This is deprecated; also drop the comment about faults.

v2: also use ttm_sg_tt_init to avoid allocating the page array.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c| 6 +++---
 drivers/gpu/drm/nouveau/nouveau_sgdma.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index 7aa4286784ae..c30f088cefcc 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -1235,9 +1235,9 @@ nouveau_ttm_tt_populate(struct ttm_bo_device *bdev,
return 0;
 
if (slave && ttm->sg) {
-   /* make userspace faulting work */
-   drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
-ttm_dma->dma_address, 
ttm->num_pages);
+   drm_prime_sg_to_page_addr_arrays(ttm->sg, NULL,
+ttm_dma->dma_address,
+ttm->num_pages);
return 0;
}
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_sgdma.c 
b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
index a2e23fd4906c..1cf52635ea74 100644
--- a/drivers/gpu/drm/nouveau/nouveau_sgdma.c
+++ b/drivers/gpu/drm/nouveau/nouveau_sgdma.c
@@ -84,7 +84,7 @@ nouveau_sgdma_create_ttm(struct ttm_buffer_object *bo, 
uint32_t page_flags)
if (!nvbe)
return NULL;
 
-   if (ttm_dma_tt_init(&nvbe->ttm, bo, page_flags, caching)) {
+   if (ttm_sg_tt_init(&nvbe->ttm, bo, page_flags, caching)) {
kfree(nvbe);
return NULL;
}
-- 
2.25.1



[PATCH 2/7] drm/amdgpu: stop using pages with drm_prime_sg_to_page_addr_arrays

2020-11-25 Thread Christian König
This is deprecated.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c438d290a6db..02748e030322 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -918,8 +918,8 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_bo_device 
*bdev,
goto release_sg;
 
/* convert SG to linear array of pages and dma addresses */
-   drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
-gtt->ttm.dma_address, ttm->num_pages);
+   drm_prime_sg_to_page_addr_arrays(ttm->sg, NULL, gtt->ttm.dma_address,
+ttm->num_pages);
 
return 0;
 
@@ -1264,7 +1264,7 @@ static int amdgpu_ttm_tt_populate(struct ttm_bo_device 
*bdev,
ttm->sg = sgt;
}
 
-   drm_prime_sg_to_page_addr_arrays(ttm->sg, ttm->pages,
+   drm_prime_sg_to_page_addr_arrays(ttm->sg, NULL,
 gtt->ttm.dma_address,
 ttm->num_pages);
return 0;
-- 
2.25.1



[PATCH 6/7] drm/ttm: nuke ttm_dma_tt_init

2020-11-25 Thread Christian König
Not used any more.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_tt.c | 13 -
 include/drm/ttm/ttm_tt.h |  2 --
 2 files changed, 15 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index da9eeffe0c6d..77ba784425dd 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -162,19 +162,6 @@ void ttm_tt_fini(struct ttm_tt *ttm)
 }
 EXPORT_SYMBOL(ttm_tt_fini);
 
-int ttm_dma_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
-   uint32_t page_flags, enum ttm_caching caching)
-{
-   ttm_tt_init_fields(ttm, bo, page_flags, caching);
-
-   if (ttm_dma_tt_alloc_page_directory(ttm)) {
-   pr_err("Failed allocating page table\n");
-   return -ENOMEM;
-   }
-   return 0;
-}
-EXPORT_SYMBOL(ttm_dma_tt_init);
-
 int ttm_sg_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
   uint32_t page_flags, enum ttm_caching caching)
 {
diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
index da27e9d8fa64..6c8eb9a4de81 100644
--- a/include/drm/ttm/ttm_tt.h
+++ b/include/drm/ttm/ttm_tt.h
@@ -99,8 +99,6 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool 
zero_alloc);
  */
 int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
uint32_t page_flags, enum ttm_caching caching);
-int ttm_dma_tt_init(struct ttm_tt *ttm_dma, struct ttm_buffer_object *bo,
-   uint32_t page_flags, enum ttm_caching caching);
 int ttm_sg_tt_init(struct ttm_tt *ttm_dma, struct ttm_buffer_object *bo,
   uint32_t page_flags, enum ttm_caching caching);
 
-- 
2.25.1



[PATCH 7/7] drm/prime: split array import functions v4

2020-11-25 Thread Christian König
Mapping the imported pages of a DMA-buf into a userspace process
doesn't work as expected.

But we have recurring requests for this approach, so split the
functions for this and document that dma_buf_mmap() needs to be used
instead.

v2: split it into two functions
v3: rebased on latest changes
v4: update commit message a bit

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  9 ++-
 drivers/gpu/drm/drm_prime.c | 64 +
 drivers/gpu/drm/etnaviv/etnaviv_gem_prime.c |  3 +-
 drivers/gpu/drm/mediatek/mtk_drm_gem.c  |  2 +-
 drivers/gpu/drm/msm/msm_gem.c   |  2 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c|  5 +-
 drivers/gpu/drm/omapdrm/omap_gem.c  |  3 +-
 drivers/gpu/drm/radeon/radeon_ttm.c |  9 ++-
 drivers/gpu/drm/vgem/vgem_drv.c |  3 +-
 drivers/gpu/drm/xen/xen_drm_front_gem.c |  4 +-
 include/drm/drm_prime.h |  7 ++-
 11 files changed, 60 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 02748e030322..a34fedcc8b61 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -918,8 +918,8 @@ static int amdgpu_ttm_tt_pin_userptr(struct ttm_bo_device 
*bdev,
goto release_sg;
 
/* convert SG to linear array of pages and dma addresses */
-   drm_prime_sg_to_page_addr_arrays(ttm->sg, NULL, gtt->ttm.dma_address,
-ttm->num_pages);
+   drm_prime_sg_to_dma_addr_array(ttm->sg, gtt->ttm.dma_address,
+  ttm->num_pages);
 
return 0;
 
@@ -1264,9 +1264,8 @@ static int amdgpu_ttm_tt_populate(struct ttm_bo_device 
*bdev,
ttm->sg = sgt;
}
 
-   drm_prime_sg_to_page_addr_arrays(ttm->sg, NULL,
-gtt->ttm.dma_address,
-ttm->num_pages);
+   drm_prime_sg_to_dma_addr_array(ttm->sg, gtt->ttm.dma_address,
+  ttm->num_pages);
return 0;
}
 
diff --git a/drivers/gpu/drm/drm_prime.c b/drivers/gpu/drm/drm_prime.c
index 7db55fce35d8..683aa29ecd3b 100644
--- a/drivers/gpu/drm/drm_prime.c
+++ b/drivers/gpu/drm/drm_prime.c
@@ -978,44 +978,58 @@ struct drm_gem_object *drm_gem_prime_import(struct 
drm_device *dev,
 EXPORT_SYMBOL(drm_gem_prime_import);
 
 /**
- * drm_prime_sg_to_page_addr_arrays - convert an sg table into a page array
+ * drm_prime_sg_to_page_array - convert an sg table into a page array
  * @sgt: scatter-gather table to convert
- * @pages: optional array of page pointers to store the page array in
- * @addrs: optional array to store the dma bus address of each page
+ * @pages: array of page pointers to store the pages in
+ * @max_entries: size of the passed-in array
+ *
+ * Exports an sg table into an array of pages.
+ *
+ * This function is deprecated and strongly discouraged to be used.
+ * The page array is only useful for page faults and those can corrupt fields
+ * in the struct page if they are not handled by the exporting driver.
+ */
+int __deprecated drm_prime_sg_to_page_array(struct sg_table *sgt,
+   struct page **pages,
+   int max_entries)
+{
+   struct sg_page_iter page_iter;
+   struct page **p = pages;
+
+   for_each_sgtable_page(sgt, &page_iter, 0) {
+   if (WARN_ON(p - pages >= max_entries))
+   return -1;
+   *p++ = sg_page_iter_page(&page_iter);
+   }
+   return 0;
+}
+EXPORT_SYMBOL(drm_prime_sg_to_page_array);
+
+/**
+ * drm_prime_sg_to_dma_addr_array - convert an sg table into a dma addr array
+ * @sgt: scatter-gather table to convert
+ * @addrs: array to store the dma bus address of each page
  * @max_entries: size of both the passed-in arrays
  *
- * Exports an sg table into an array of pages and addresses. This is currently
- * required by the TTM driver in order to do correct fault handling.
+ * Exports an sg table into an array of addresses.
  *
- * Drivers can use this in their &drm_driver.gem_prime_import_sg_table
+ * Drivers should use this in their &drm_driver.gem_prime_import_sg_table
  * implementation.
  */
-int drm_prime_sg_to_page_addr_arrays(struct sg_table *sgt, struct page **pages,
-dma_addr_t *addrs, int max_entries)
+int drm_prime_sg_to_dma_addr_array(struct sg_table *sgt, dma_addr_t *addrs,
+  int max_entries)
 {
struct sg_dma_page_iter dma_iter;
-   struct sg_page_iter page_iter;
-   struct page **p = pages;
dma_addr_t *a = addrs;
 
-   if (pages) {
-   for_each_sgtable_page(sgt, &page_iter, 0) {
-   if (WARN_ON(p - 

Re: [PATCHv10 0/9] System Cache support for GPU and required SMMU support

2020-11-25 Thread Will Deacon
On Wed, 25 Nov 2020 12:30:09 +0530, Sai Prakash Ranjan wrote:
> Some hardware variants contain a system cache or the last level
> cache (LLC). This cache is typically a large block which is shared
> by multiple clients on the SoC. The GPU uses the system cache to cache
> both the GPU data buffers (like textures) as well as the SMMU pagetables.
> This helps with improved render performance as well as lower power
> consumption by reducing the bus traffic to the system memory.
> 
> [...]

Applied the SMMU bits to arm64 (for-next/iommu/arm-smmu), thanks!

[3/9] iommu/arm-smmu: Add support for pagetable config domain attribute
  https://git.kernel.org/arm64/c/c99110a865a3
[4/9] iommu/arm-smmu: Move non-strict mode to use io_pgtable_domain_attr
  https://git.kernel.org/arm64/c/12bc36793fd6

[8/9] iommu: arm-smmu-impl: Use table to list QCOM implementations
  https://git.kernel.org/arm64/c/00597f9ff5ec
[9/9] iommu: arm-smmu-impl: Add a space before open parenthesis
  https://git.kernel.org/arm64/c/7f575a6087f4

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCHv10 0/9] System Cache support for GPU and required SMMU support

2020-11-25 Thread Will Deacon
On Wed, 25 Nov 2020 12:30:09 +0530, Sai Prakash Ranjan wrote:
> Some hardware variants contain a system cache or the last level
> cache (LLC). This cache is typically a large block which is shared
> by multiple clients on the SoC. The GPU uses the system cache to cache
> both the GPU data buffers (like textures) as well as the SMMU pagetables.
> This helps with improved render performance as well as lower power
> consumption by reducing the bus traffic to the system memory.
> 
> [...]

Applied first two patches on a shared branch for Rob:

arm64 (for-next/iommu/io-pgtable-domain-attr), thanks!

[1/9] iommu/io-pgtable: Add a domain attribute for pagetable configuration
  https://git.kernel.org/arm64/c/a7656ecf825a
[2/9] iommu/io-pgtable-arm: Add support to use system cache
  https://git.kernel.org/arm64/c/e67890c97944

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev


Re: [PATCH 01/15] drm/amdgpu: Remove references to struct drm_device.pdev

2020-11-25 Thread Alex Deucher
On Tue, Nov 24, 2020 at 6:38 AM Thomas Zimmermann  wrote:
>
> Using struct drm_device.pdev is deprecated. Convert amdgpu to struct
> drm_device.dev. No functional changes.
>
> Signed-off-by: Thomas Zimmermann 
> Cc: Alex Deucher 
> Cc: Christian König 

There are a few unrelated whitespace changes.  Other than that, patch is:
Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 23 ++---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_display.c |  3 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  1 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c  |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 10 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c |  2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 10 -
>  7 files changed, 25 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 7560b05e4ac1..d61715133825 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -1404,9 +1404,9 @@ static void amdgpu_switcheroo_set_state(struct pci_dev 
> *pdev,
> /* don't suspend or resume card normally */
> dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
>
> -   pci_set_power_state(dev->pdev, PCI_D0);
> -   amdgpu_device_load_pci_state(dev->pdev);
> -   r = pci_enable_device(dev->pdev);
> +   pci_set_power_state(pdev, PCI_D0);
> +   amdgpu_device_load_pci_state(pdev);
> +   r = pci_enable_device(pdev);
> if (r)
> DRM_WARN("pci_enable_device failed (%d)\n", r);
> amdgpu_device_resume(dev, true);
> @@ -1418,10 +1418,10 @@ static void amdgpu_switcheroo_set_state(struct 
> pci_dev *pdev,
> drm_kms_helper_poll_disable(dev);
> dev->switch_power_state = DRM_SWITCH_POWER_CHANGING;
> amdgpu_device_suspend(dev, true);
> -   amdgpu_device_cache_pci_state(dev->pdev);
> +   amdgpu_device_cache_pci_state(pdev);
> /* Shut down the device */
> -   pci_disable_device(dev->pdev);
> -   pci_set_power_state(dev->pdev, PCI_D3cold);
> +   pci_disable_device(pdev);
> +   pci_set_power_state(pdev, PCI_D3cold);
> dev->switch_power_state = DRM_SWITCH_POWER_OFF;
> }
>  }
> @@ -1684,8 +1684,7 @@ static void amdgpu_device_enable_virtual_display(struct 
> amdgpu_device *adev)
> adev->enable_virtual_display = false;
>
> if (amdgpu_virtual_display) {
> -   struct drm_device *ddev = adev_to_drm(adev);
> -   const char *pci_address_name = pci_name(ddev->pdev);
> +   const char *pci_address_name = pci_name(adev->pdev);
> char *pciaddstr, *pciaddstr_tmp, *pciaddname_tmp, *pciaddname;
>
> pciaddstr = kstrdup(amdgpu_virtual_display, GFP_KERNEL);
> @@ -3375,7 +3374,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> }
> }
>
> -   pci_enable_pcie_error_reporting(adev->ddev.pdev);
> +   pci_enable_pcie_error_reporting(adev->pdev);
>
> /* Post card if necessary */
> if (amdgpu_device_need_post(adev)) {
> @@ -4922,8 +4921,8 @@ pci_ers_result_t amdgpu_pci_error_detected(struct 
> pci_dev *pdev, pci_channel_sta
> case pci_channel_io_normal:
> return PCI_ERS_RESULT_CAN_RECOVER;
> /* Fatal error, prepare for slot reset */
> -   case pci_channel_io_frozen:
> -   /*
> +   case pci_channel_io_frozen:
> +   /*
>  * Cancel and wait for all TDRs in progress if failing to
>  * set  adev->in_gpu_reset in amdgpu_device_lock_adev
>  *
> @@ -5014,7 +5013,7 @@ pci_ers_result_t amdgpu_pci_slot_reset(struct pci_dev 
> *pdev)
> goto out;
> }
>
> -   adev->in_pci_err_recovery = true;
> +   adev->in_pci_err_recovery = true;
> r = amdgpu_device_pre_asic_reset(adev, NULL, &need_full_reset);
> adev->in_pci_err_recovery = false;
> if (r)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> index 2e8a8b57639f..77974c3981fa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
> @@ -721,13 +721,14 @@ amdgpu_display_user_framebuffer_create(struct 
> drm_device *dev,
>struct drm_file *file_priv,
>const struct drm_mode_fb_cmd2 
> *mode_cmd)
>  {
> +   struct amdgpu_device *adev = drm_to_adev(dev);
> struct drm_gem_object *obj;
> struct amdgpu_framebuffer *amdgpu_fb;
> int ret;
>
> obj = drm_gem_object_lookup(file_priv, mode_cmd->handles[0])

Re: [PATCH 11/15] drm/radeon: Remove references to struct drm_device.pdev

2020-11-25 Thread Alex Deucher
On Tue, Nov 24, 2020 at 6:39 AM Thomas Zimmermann  wrote:
>
> Using struct drm_device.pdev is deprecated. Convert radeon to struct
> drm_device.dev. No functional changes.
>
> Signed-off-by: Thomas Zimmermann 
> Cc: Alex Deucher 
> Cc: Christian König 

There are a few unrelated whitespace changes.  Other than that, patch is:
Acked-by: Alex Deucher 

> ---
>  drivers/gpu/drm/radeon/atombios_encoders.c|  6 +-
>  drivers/gpu/drm/radeon/r100.c | 27 +++---
>  drivers/gpu/drm/radeon/radeon.h   | 32 +++
>  drivers/gpu/drm/radeon/radeon_atombios.c  | 89 ++-
>  drivers/gpu/drm/radeon/radeon_bios.c  |  6 +-
>  drivers/gpu/drm/radeon/radeon_combios.c   | 55 ++--
>  drivers/gpu/drm/radeon/radeon_cs.c|  3 +-
>  drivers/gpu/drm/radeon/radeon_device.c| 17 ++--
>  drivers/gpu/drm/radeon/radeon_display.c   |  2 +-
>  drivers/gpu/drm/radeon/radeon_drv.c   |  3 +-
>  drivers/gpu/drm/radeon/radeon_fb.c|  2 +-
>  drivers/gpu/drm/radeon/radeon_gem.c   |  6 +-
>  drivers/gpu/drm/radeon/radeon_i2c.c   |  2 +-
>  drivers/gpu/drm/radeon/radeon_irq_kms.c   |  2 +-
>  drivers/gpu/drm/radeon/radeon_kms.c   | 20 ++---
>  .../gpu/drm/radeon/radeon_legacy_encoders.c   |  6 +-
>  drivers/gpu/drm/radeon/rs780_dpm.c|  7 +-
>  17 files changed, 144 insertions(+), 141 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/atombios_encoders.c 
> b/drivers/gpu/drm/radeon/atombios_encoders.c
> index cc5ee1b3af84..a9ae8b6c5991 100644
> --- a/drivers/gpu/drm/radeon/atombios_encoders.c
> +++ b/drivers/gpu/drm/radeon/atombios_encoders.c
> @@ -2065,9 +2065,9 @@ atombios_apply_encoder_quirks(struct drm_encoder 
> *encoder,
> struct radeon_crtc *radeon_crtc = to_radeon_crtc(encoder->crtc);
>
> /* Funky macbooks */
> -   if ((dev->pdev->device == 0x71C5) &&
> -   (dev->pdev->subsystem_vendor == 0x106b) &&
> -   (dev->pdev->subsystem_device == 0x0080)) {
> +   if ((rdev->pdev->device == 0x71C5) &&
> +   (rdev->pdev->subsystem_vendor == 0x106b) &&
> +   (rdev->pdev->subsystem_device == 0x0080)) {
> if (radeon_encoder->devices & ATOM_DEVICE_LCD1_SUPPORT) {
> uint32_t lvtma_bit_depth_control = 
> RREG32(AVIVO_LVTMA_BIT_DEPTH_CONTROL);
>
> diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
> index 24c8db673931..984eeb893d76 100644
> --- a/drivers/gpu/drm/radeon/r100.c
> +++ b/drivers/gpu/drm/radeon/r100.c
> @@ -2611,7 +2611,6 @@ int r100_asic_reset(struct radeon_device *rdev, bool 
> hard)
>
>  void r100_set_common_regs(struct radeon_device *rdev)
>  {
> -   struct drm_device *dev = rdev->ddev;
> bool force_dac2 = false;
> u32 tmp;
>
> @@ -2629,7 +2628,7 @@ void r100_set_common_regs(struct radeon_device *rdev)
>  * don't report it in the bios connector
>  * table.
>  */
> -   switch (dev->pdev->device) {
> +   switch (rdev->pdev->device) {
> /* RN50 */
> case 0x515e:
> case 0x5969:
> @@ -2639,17 +2638,17 @@ void r100_set_common_regs(struct radeon_device *rdev)
> case 0x5159:
> case 0x515a:
> /* DELL triple head servers */
> -   if ((dev->pdev->subsystem_vendor == 0x1028 /* DELL */) &&
> -   ((dev->pdev->subsystem_device == 0x016c) ||
> -(dev->pdev->subsystem_device == 0x016d) ||
> -(dev->pdev->subsystem_device == 0x016e) ||
> -(dev->pdev->subsystem_device == 0x016f) ||
> -(dev->pdev->subsystem_device == 0x0170) ||
> -(dev->pdev->subsystem_device == 0x017d) ||
> -(dev->pdev->subsystem_device == 0x017e) ||
> -(dev->pdev->subsystem_device == 0x0183) ||
> -(dev->pdev->subsystem_device == 0x018a) ||
> -(dev->pdev->subsystem_device == 0x019a)))
> +   if ((rdev->pdev->subsystem_vendor == 0x1028 /* DELL */) &&
> +   ((rdev->pdev->subsystem_device == 0x016c) ||
> +(rdev->pdev->subsystem_device == 0x016d) ||
> +(rdev->pdev->subsystem_device == 0x016e) ||
> +(rdev->pdev->subsystem_device == 0x016f) ||
> +(rdev->pdev->subsystem_device == 0x0170) ||
> +(rdev->pdev->subsystem_device == 0x017d) ||
> +(rdev->pdev->subsystem_device == 0x017e) ||
> +(rdev->pdev->subsystem_device == 0x0183) ||
> +(rdev->pdev->subsystem_device == 0x018a) ||
> +(rdev->pdev->subsystem_device == 0x019a)))
> force_dac2 = true;
> break;
> }
> @@ -2797,7 +2796,7 @@ void r100_vram_init_sizes(struct radeon_device *rdev)
> rdev->mc.real_

[PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo kmalloc_array creation

2020-11-25 Thread Colin King
From: Colin Ian King 

An incorrect sizeof() is being used: sizeof((*data)->bps_bo) is not
correct; it should be sizeof(*(*data)->bps_bo). It just so happens
to work because the sizes are the same. Fix it.

Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 5278a159cf35 ("drm/amdgpu: support reserve bad page for virt (v3)")
Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 2d51b7694d1f..df15d33e3c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -283,7 +283,7 @@ static int amdgpu_virt_init_ras_err_handler_data(struct 
amdgpu_device *adev)
return -ENOMEM;
 
bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL);
-   bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), 
GFP_KERNEL);
+   bps_bo = kmalloc_array(align_space, sizeof(*(*data)->bps_bo), 
GFP_KERNEL);
 
if (!bps || !bps_bo) {
kfree(bps);
-- 
2.29.2
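The class of bug fixed above is easy to demonstrate in standalone C: sizeof on the member yields the size of the pointer itself, while sizeof on the dereferenced member yields the size of one element. The two only coincide when the element happens to be pointer-sized — as with bps_bo here, which is why the buggy form "happened to work". The struct and member names below are invented, not taken from amdgpu:

```c
#include <stddef.h>

/* Invented types illustrating the sizeof() pitfall. */
struct item { char payload[64]; };

struct table {
    struct item *entries;   /* points to an array of 64-byte items */
    struct item **refs;     /* points to an array of item pointers */
};

/* Bug pattern: sizeof on the member is the pointer's own size. */
static size_t per_entry_buggy(void)
{
    return sizeof(((struct table *)0)->entries);    /* pointer size */
}

/* Fixed pattern: sizeof on the dereferenced member is one element. */
static size_t per_entry_fixed(void)
{
    return sizeof(*((struct table *)0)->entries);   /* 64 bytes */
}

/* For an array-of-pointers member, both expressions are one pointer
 * wide, so the bug is invisible until the element type changes. */
static size_t per_ref_buggy(void)
{
    return sizeof(((struct table *)0)->refs);
}

static size_t per_ref_fixed(void)
{
    return sizeof(*((struct table *)0)->refs);
}
```

A kmalloc_array(n, per_entry_buggy(), ...) would under-allocate by a factor of eight on LP64 for the 64-byte element case, while the refs case allocates the same amount either way.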



Re: [PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo kmalloc_array creation

2020-11-25 Thread Christian König

Am 25.11.20 um 15:18 schrieb Colin King:

From: Colin Ian King 

An incorrect sizeof() is being used, sizeof((*data)->bps_bo) is not
correct, it should be sizeof(*(*data)->bps_bo).  It just so happens
to work because the sizes are the same.  Fix it.

Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 5278a159cf35 ("drm/amdgpu: support reserve bad page for virt (v3)")
Signed-off-by: Colin Ian King 


Acked-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 2d51b7694d1f..df15d33e3c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -283,7 +283,7 @@ static int amdgpu_virt_init_ras_err_handler_data(struct 
amdgpu_device *adev)
return -ENOMEM;
  
  	bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL);

-   bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), 
GFP_KERNEL);
+   bps_bo = kmalloc_array(align_space, sizeof(*(*data)->bps_bo), 
GFP_KERNEL);
  
  	if (!bps || !bps_bo) {

kfree(bps);




[PATCH] drm/radeon: fix check order in radeon_bo_move

2020-11-25 Thread Christian König
Reorder the code to fix the check for whether blitting is available.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/radeon/radeon_ttm.c | 54 +
 1 file changed, 24 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c 
b/drivers/gpu/drm/radeon/radeon_ttm.c
index 0ca381b95d3d..2b598141225f 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -216,27 +216,15 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, 
bool evict,
struct ttm_resource *old_mem = &bo->mem;
int r;
 
-   if ((old_mem->mem_type == TTM_PL_SYSTEM &&
-new_mem->mem_type == TTM_PL_VRAM) ||
-   (old_mem->mem_type == TTM_PL_VRAM &&
-new_mem->mem_type == TTM_PL_SYSTEM)) {
-   hop->fpfn = 0;
-   hop->lpfn = 0;
-   hop->mem_type = TTM_PL_TT;
-   hop->flags = 0;
-   return -EMULTIHOP;
-   }
-
if (new_mem->mem_type == TTM_PL_TT) {
r = radeon_ttm_tt_bind(bo->bdev, bo->ttm, new_mem);
if (r)
return r;
}
-   radeon_bo_move_notify(bo, evict, new_mem);
 
r = ttm_bo_wait_ctx(bo, ctx);
if (r)
-   goto fail;
+   return r;
 
/* Can't move a pinned BO */
rbo = container_of(bo, struct radeon_bo, tbo);
@@ -246,12 +234,12 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, 
bool evict,
rdev = radeon_get_rdev(bo->bdev);
if (old_mem->mem_type == TTM_PL_SYSTEM && bo->ttm == NULL) {
ttm_bo_move_null(bo, new_mem);
-   return 0;
+   goto out;
}
if (old_mem->mem_type == TTM_PL_SYSTEM &&
new_mem->mem_type == TTM_PL_TT) {
ttm_bo_move_null(bo, new_mem);
-   return 0;
+   goto out;
}
 
if (old_mem->mem_type == TTM_PL_TT &&
@@ -259,31 +247,37 @@ static int radeon_bo_move(struct ttm_buffer_object *bo, 
bool evict,
radeon_ttm_tt_unbind(bo->bdev, bo->ttm);
ttm_resource_free(bo, &bo->mem);
ttm_bo_assign_mem(bo, new_mem);
-   return 0;
+   goto out;
}
-   if (!rdev->ring[radeon_copy_ring_index(rdev)].ready ||
-   rdev->asic->copy.copy == NULL) {
-   /* use memcpy */
-   goto memcpy;
+   if (rdev->ring[radeon_copy_ring_index(rdev)].ready &&
+   rdev->asic->copy.copy != NULL) {
+   if ((old_mem->mem_type == TTM_PL_SYSTEM &&
+new_mem->mem_type == TTM_PL_VRAM) ||
+   (old_mem->mem_type == TTM_PL_VRAM &&
+new_mem->mem_type == TTM_PL_SYSTEM)) {
+   hop->fpfn = 0;
+   hop->lpfn = 0;
+   hop->mem_type = TTM_PL_TT;
+   hop->flags = 0;
+   return -EMULTIHOP;
+   }
+
+   r = radeon_move_blit(bo, evict, new_mem, old_mem);
+   } else {
+   r = -ENODEV;
}
 
-   r = radeon_move_blit(bo, evict, new_mem, old_mem);
if (r) {
-memcpy:
r = ttm_bo_move_memcpy(bo, ctx, new_mem);
-   if (r) {
-   goto fail;
-   }
+   if (r)
+   return r;
}
 
+out:
/* update statistics */
atomic64_add((u64)bo->num_pages << PAGE_SHIFT, &rdev->num_bytes_moved);
+   radeon_bo_move_notify(bo, evict, new_mem);
return 0;
-fail:
-   swap(*new_mem, bo->mem);
-   radeon_bo_move_notify(bo, false, new_mem);
-   swap(*new_mem, bo->mem);
-   return r;
 }
 
 static int radeon_ttm_io_mem_reserve(struct ttm_bo_device *bdev, struct 
ttm_resource *mem)
-- 
2.25.1



RE: [PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo kmalloc_array creation

2020-11-25 Thread Chen, Guchun
[AMD Public Use]

Reviewed-by: Guchun Chen 

Regards,
Guchun

-Original Message-
From: Colin King  
Sent: Wednesday, November 25, 2020 10:18 PM
To: Deucher, Alexander ; Koenig, Christian 
; David Airlie ; Daniel Vetter 
; Zhou1, Tao ; Chen, Guchun 
; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org
Cc: kernel-janit...@vger.kernel.org; linux-ker...@vger.kernel.org
Subject: [PATCH][next] drm/amdgpu: Fix sizeof() mismatch in bps_bo 
kmalloc_array creation

From: Colin Ian King 

An incorrect sizeof() is being used, sizeof((*data)->bps_bo) is not correct, it 
should be sizeof(*(*data)->bps_bo).  It just so happens to work because the 
sizes are the same.  Fix it.

Addresses-Coverity: ("Sizeof not portable (SIZEOF_MISMATCH)")
Fixes: 5278a159cf35 ("drm/amdgpu: support reserve bad page for virt (v3)")
Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 2d51b7694d1f..df15d33e3c5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -283,7 +283,7 @@ static int amdgpu_virt_init_ras_err_handler_data(struct 
amdgpu_device *adev)
return -ENOMEM;
 
bps = kmalloc_array(align_space, sizeof((*data)->bps), GFP_KERNEL);
-   bps_bo = kmalloc_array(align_space, sizeof((*data)->bps_bo), 
GFP_KERNEL);
+   bps_bo = kmalloc_array(align_space, sizeof(*(*data)->bps_bo), 
+GFP_KERNEL);
 
if (!bps || !bps_bo) {
kfree(bps);
--
2.29.2


[pull] amdgpu drm-fixes-5.10

2020-11-25 Thread Alex Deucher
Hi Dave, Daniel,

Fixes for 5.10.

The following changes since commit 6600f9d52213b5c3455481b5c9e61cf5e305c0e6:

  Merge tag 'drm-intel-fixes-2020-11-19' of 
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes (2020-11-20 11:21:54 
+1000)

are available in the Git repository at:

  git://people.freedesktop.org/~agd5f/linux tags/amd-drm-fixes-5.10-2020-11-25

for you to fetch changes up to 60734bd54679d7998a24a257b0403f7644005572:

  drm/amdgpu: update golden setting for sienna_cichlid (2020-11-24 12:33:07 
-0500)


amd-drm-fixes-5.10-2020-11-25:

amdgpu:
- Runtime pm fix
- SI UVD suspend/resume fix
- HDCP fix for headless cards
- Sienna Cichlid golden register update


Kenneth Feng (1):
  drm/amd/amdgpu: fix null pointer in runtime pm

Likun Gao (1):
  drm/amdgpu: update golden setting for sienna_cichlid

Rodrigo Siqueira (1):
  drm/amd/display: Avoid HDCP initialization in devices without output

Sonny Jiang (2):
  drm/amdgpu: fix SI UVD firmware validate resume fail
  drm/amdgpu: fix a page fault

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c|  2 ++
 drivers/gpu/drm/amd/amdgpu/uvd_v3_1.c | 20 +++-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  2 +-
 5 files changed, 17 insertions(+), 12 deletions(-)


Re: [Intel-wired-lan] [PATCH 000/141] Fix fall-through warnings for Clang

2020-11-25 Thread Jakub Kicinski
On Wed, 25 Nov 2020 04:24:27 -0800 Nick Desaulniers wrote:
> I even agree that most of the churn comes from
> 
> case 0:
>   ++x;
> default:
>   break;

And just to spell it out,

case ENUM_VALUE1:
bla();
break;
case ENUM_VALUE2:
bla();
default:
break;

is a fairly idiomatic way of indicating that not all values of the enum
are expected to be handled by the switch statement. 

I really hope the Clang folks are reasonable and merge your patch.

> If trivial patches are adding too much to your workload, consider
> training a co-maintainer or asking for help from one of your reviewers
> whom you trust.  I don't doubt it's hard to find maintainers, but
> existing maintainers should go out of their way to entrust
> co-maintainers especially when they find their workload becomes too
> high.  And reviewing/picking up trivial patches is probably a great
> way to get started.  If we allow too much knowledge of any one
> subsystem to collect with one maintainer, what happens when that
> maintainer leaves the community (which, given a finite lifespan, is an
> inevitability)?

The burn out point is about enjoying your work and feeling that it
matters. It really doesn't make much difference if you're doing
something you don't like for 12 hours every day or only in shifts with
another maintainer. You'll dislike it either way.

Applying a real patch set and then getting a few follow ups the next day
for trivial coding things like fallthrough missing or static missing,
just because I didn't have the full range of compilers to check with
before applying makes me feel pretty shitty, like I'm not doing a good
job. YMMV.


[PATCH v4 1/3] mm: Track mmu notifiers in fs_reclaim_acquire/release

2020-11-25 Thread Daniel Vetter
fs_reclaim_acquire/release nicely catch recursion issues when
allocating GFP_KERNEL memory against shrinkers (which gpu drivers tend
to use to keep the excessive caches in check). For mmu notifier
recursions we do have lockdep annotations since 23b68395c7c7
("mm/mmu_notifiers: add a lockdep map for invalidate_range_start/end").

But these only fire if a path actually results in some pte
invalidation - for most small allocations that's very rarely the case.
The other trouble is that pte invalidation can happen any time when
__GFP_RECLAIM is set. Which means only really GFP_ATOMIC is a safe
choice, GFP_NOIO isn't good enough to avoid potential mmu notifier
recursion.

I was pondering whether we should just do the general annotation, but
there's always the risk for false positives. Plus I'm assuming that
the core fs and io code is a lot better reviewed and tested than
random mmu notifier code in drivers. Hence I decided to only
annotate that specific case.

Furthermore even if we'd create a lockdep map for direct reclaim, we'd
still need to explicitly pull in the mmu notifier map - there are a lot
more places that do pte invalidation than just direct reclaim, and these
two contexts aren't the same.

Note that the mmu notifiers needing their own independent lockdep map
is also the reason we can't hold them from fs_reclaim_acquire to
fs_reclaim_release - it would nest with the acquisition in the pte
invalidation code, causing a lockdep splat. And we can't remove the
annotations from pte invalidation and all the other places since
they're called from many other places than page reclaim. Hence we can
only do the equivalent of might_lock, but on the raw lockdep map.

With this we can also remove the lockdep priming added in 66204f1d2d1b
("mm/mmu_notifiers: prime lockdep") since the new annotations are
strictly more powerful.

v2: Review from Thomas Hellstrom:
- unbotch the fs_reclaim context check, I accidentally inverted it,
  but it didn't blow up because I inverted it immediately
- fix compiling for !CONFIG_MMU_NOTIFIER

v3: Unbreak the PF_MEMALLOC_ context flags. Thanks to Qian for the
report and Dave for explaining what I failed to see.

Reviewed-by: Jason Gunthorpe 
Cc: linux-fsde...@vger.kernel.org
Cc: Dave Chinner 
Cc: Qian Cai 
Cc: linux-...@vger.kernel.org
Cc: Thomas Hellström (Intel) 
Cc: Andrew Morton 
Cc: Jason Gunthorpe 
Cc: linux...@kvack.org
Cc: linux-r...@vger.kernel.org
Cc: Maarten Lankhorst 
Cc: Christian König 
Cc: "Matthew Wilcox (Oracle)" 
Signed-off-by: Daniel Vetter 
---
 mm/mmu_notifier.c |  7 ---
 mm/page_alloc.c   | 31 ---
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5654dd19addc..61ee40ed804e 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -612,13 +612,6 @@ int __mmu_notifier_register(struct mmu_notifier 
*subscription,
mmap_assert_write_locked(mm);
BUG_ON(atomic_read(&mm->mm_users) <= 0);
 
-   if (IS_ENABLED(CONFIG_LOCKDEP)) {
-   fs_reclaim_acquire(GFP_KERNEL);
-   lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
-   lock_map_release(&__mmu_notifier_invalidate_range_start_map);
-   fs_reclaim_release(GFP_KERNEL);
-   }
-
if (!mm->notifier_subscriptions) {
/*
 * kmalloc cannot be called under mm_take_all_locks(), but we
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 23f5066bd4a5..ff0f9a84b8de 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -4264,10 +4265,8 @@ should_compact_retry(struct alloc_context *ac, unsigned 
int order, int alloc_fla
 static struct lockdep_map __fs_reclaim_map =
STATIC_LOCKDEP_MAP_INIT("fs_reclaim", &__fs_reclaim_map);
 
-static bool __need_fs_reclaim(gfp_t gfp_mask)
+static bool __need_reclaim(gfp_t gfp_mask)
 {
-   gfp_mask = current_gfp_context(gfp_mask);
-
/* no reclaim without waiting on it */
if (!(gfp_mask & __GFP_DIRECT_RECLAIM))
return false;
@@ -4276,10 +4275,6 @@ static bool __need_fs_reclaim(gfp_t gfp_mask)
if (current->flags & PF_MEMALLOC)
return false;
 
-   /* We're only interested __GFP_FS allocations for now */
-   if (!(gfp_mask & __GFP_FS))
-   return false;
-
if (gfp_mask & __GFP_NOLOCKDEP)
return false;
 
@@ -4298,15 +4293,29 @@ void __fs_reclaim_release(void)
 
 void fs_reclaim_acquire(gfp_t gfp_mask)
 {
-   if (__need_fs_reclaim(gfp_mask))
-   __fs_reclaim_acquire();
+   gfp_mask = current_gfp_context(gfp_mask);
+
+   if (__need_reclaim(gfp_mask)) {
+   if (gfp_mask & __GFP_FS)
+   __fs_reclaim_acquire();
+
+#ifdef CONFIG_MMU_NOTIFIER
+   lock_map_acquire(&__mmu_notifier_invalidate_range_start_map);
+   lock_map_release(&

[PATCH v4 2/3] mm: Extract might_alloc() debug check

2020-11-25 Thread Daniel Vetter
Extracted from slab.h, which seems to have the most complete version
including the correct might_sleep() check. Roll it out to slob.c.

Motivated by a discussion with Paul about possibly changing call_rcu
behaviour to allocate memory, but only roughly every 500th call.

There are a lot fewer places in the kernel that care about whether
allocating memory is allowed or not (due to deadlocks with reclaim
code) than places that care whether sleeping is allowed. But debugging
these also tends to be a lot harder, so nice descriptive checks could
come in handy. I might have some use eventually for annotations in
drivers/gpu.

Note that unlike fs_reclaim_acquire/release, gfpflags_allow_blocking
does not consult the PF_MEMALLOC flags. But there is no flag
equivalent for GFP_NOWAIT, hence this check can't go wrong due to
memalloc_no*_save/restore contexts. Willy is working on a patch series
which might change this:

https://lore.kernel.org/linux-mm/20200625113122.7540-7-wi...@infradead.org/

I think best would be if that updates gfpflags_allow_blocking(), since
there's a ton of callers all over the place for that already.

v2: Fix typos in kerneldoc (Randy)

Acked-by: Vlastimil Babka 
Acked-by: Paul E. McKenney 
Reviewed-by: Jason Gunthorpe 
Cc: Randy Dunlap 
Cc: Paul E. McKenney 
Cc: Christoph Lameter 
Cc: Pekka Enberg 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Andrew Morton 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Vlastimil Babka 
Cc: Mathieu Desnoyers 
Cc: Sebastian Andrzej Siewior 
Cc: Michel Lespinasse 
Cc: Daniel Vetter 
Cc: Waiman Long 
Cc: Thomas Gleixner 
Cc: Randy Dunlap 
Cc: linux...@kvack.org
Cc: linux-fsde...@vger.kernel.org
Cc: Dave Chinner 
Cc: Qian Cai 
Cc: linux-...@vger.kernel.org
Cc: "Matthew Wilcox (Oracle)" 
Signed-off-by: Daniel Vetter 
---
 include/linux/sched/mm.h | 16 
 mm/slab.h|  5 +
 mm/slob.c|  6 ++
 3 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index d5ece7a9a403..a11a61b5226f 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -180,6 +180,22 @@ static inline void fs_reclaim_acquire(gfp_t gfp_mask) { }
 static inline void fs_reclaim_release(gfp_t gfp_mask) { }
 #endif
 
+/**
+ * might_alloc - Mark possible allocation sites
+ * @gfp_mask: gfp_t flags that would be used to allocate
+ *
+ * Similar to might_sleep() and other annotations, this can be used in 
functions
+ * that might allocate, but often don't. Compiles to nothing without
+ * CONFIG_LOCKDEP. Includes a conditional might_sleep() if @gfp allows 
blocking.
+ */
+static inline void might_alloc(gfp_t gfp_mask)
+{
+   fs_reclaim_acquire(gfp_mask);
+   fs_reclaim_release(gfp_mask);
+
+   might_sleep_if(gfpflags_allow_blocking(gfp_mask));
+}
+
 /**
  * memalloc_noio_save - Marks implicit GFP_NOIO allocation scope.
  *
diff --git a/mm/slab.h b/mm/slab.h
index 6d7c6a5056ba..37b981247e5d 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -500,10 +500,7 @@ static inline struct kmem_cache 
*slab_pre_alloc_hook(struct kmem_cache *s,
 {
flags &= gfp_allowed_mask;
 
-   fs_reclaim_acquire(flags);
-   fs_reclaim_release(flags);
-
-   might_sleep_if(gfpflags_allow_blocking(flags));
+   might_alloc(flags);
 
if (should_failslab(s, flags))
return NULL;
diff --git a/mm/slob.c b/mm/slob.c
index 7cc9805c8091..8d4bfa46247f 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -474,8 +474,7 @@ __do_kmalloc_node(size_t size, gfp_t gfp, int node, 
unsigned long caller)
 
gfp &= gfp_allowed_mask;
 
-   fs_reclaim_acquire(gfp);
-   fs_reclaim_release(gfp);
+   might_alloc(gfp);
 
if (size < PAGE_SIZE - minalign) {
int align = minalign;
@@ -597,8 +596,7 @@ static void *slob_alloc_node(struct kmem_cache *c, gfp_t 
flags, int node)
 
flags &= gfp_allowed_mask;
 
-   fs_reclaim_acquire(flags);
-   fs_reclaim_release(flags);
+   might_alloc(flags);
 
if (c->size < PAGE_SIZE) {
b = slob_alloc(c->size, flags, c->align, node, 0);
-- 
2.29.2



[PATCH v4 0/3] mmu_notifier vs fs_reclaim lockdep annotations

2020-11-25 Thread Daniel Vetter
Hi all,

Just resending with the polish applied, no functional changes at all.

Previous versions:

v3: 
https://lore.kernel.org/dri-devel/20201120095445.1195585-1-daniel.vet...@ffwll.ch/
v2: 
https://lore.kernel.org/dri-devel/20200610194101.1668038-1-daniel.vet...@ffwll.ch/

Changes since v3:
- more acks/r-b
- typos in the kerneldoc fixed

Changes since v2:
- Now hopefully the bug that bombed xfs is fixed.
- With unit-tests (that's the part I really wanted and never got to)
- might_alloc() helper thrown in for good.

I think if we have an ack/review from fs-devel this should be good to
land. Last version that landed in -mm (v2) broke xfs pretty badly.

Unfortunately I don't have a working email from Qian anymore, who reported
the xfs issue. Maybe Dave Chinner instead?

Cheers, Daniel

Daniel Vetter (3):
  mm: Track mmu notifiers in fs_reclaim_acquire/release
  mm: Extract might_alloc() debug check
  locking/selftests: Add testcases for fs_reclaim

 include/linux/sched/mm.h | 16 ++
 lib/locking-selftest.c   | 47 
 mm/mmu_notifier.c|  7 --
 mm/page_alloc.c  | 31 --
 mm/slab.h|  5 +
 mm/slob.c|  6 ++---
 6 files changed, 86 insertions(+), 26 deletions(-)

-- 
2.29.2



[PATCH] drm/ttm: don't set page->mapping

2020-11-25 Thread Daniel Vetter
Random observation while trying to review Christian's patch series to
stop looking at struct page for dma-buf imports.

This was originally added in

commit 58aa6622d32af7d2c08d45085f44c54554a16ed7
Author: Thomas Hellstrom 
Date:   Fri Jan 3 11:47:23 2014 +0100

drm/ttm: Correctly set page mapping and -index members

Needed for some vm operations; most notably unmap_mapping_range() with
even_cows = 0.

Signed-off-by: Thomas Hellstrom 
Reviewed-by: Brian Paul 

but we do not have a single caller of unmap_mapping_range with
even_cows == 0. And none of the gem drivers do this, so it's another
small thing we could standardize between drm and ttm drivers.

Plus I don't really see a need for unmap_mapping_range where we don't
want to indiscriminately shoot down all ptes.

Cc: Thomas Hellstrom 
Cc: Brian Paul 
Signed-off-by: Daniel Vetter 
Cc: Christian Koenig 
Cc: Huang Rui 
---
 drivers/gpu/drm/ttm/ttm_tt.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index da9eeffe0c6d..5b2eb6d58bb7 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -284,17 +284,6 @@ int ttm_tt_swapout(struct ttm_bo_device *bdev, struct 
ttm_tt *ttm)
return ret;
 }
 
-static void ttm_tt_add_mapping(struct ttm_bo_device *bdev, struct ttm_tt *ttm)
-{
-   pgoff_t i;
-
-   if (ttm->page_flags & TTM_PAGE_FLAG_SG)
-   return;
-
-   for (i = 0; i < ttm->num_pages; ++i)
-   ttm->pages[i]->mapping = bdev->dev_mapping;
-}
-
 int ttm_tt_populate(struct ttm_bo_device *bdev,
struct ttm_tt *ttm, struct ttm_operation_ctx *ctx)
 {
@@ -313,7 +302,6 @@ int ttm_tt_populate(struct ttm_bo_device *bdev,
if (ret)
return ret;
 
-   ttm_tt_add_mapping(bdev, ttm);
ttm->page_flags |= TTM_PAGE_FLAG_PRIV_POPULATED;
if (unlikely(ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) {
ret = ttm_tt_swapin(ttm);
-- 
2.29.2



[PATCH v4 3/3] locking/selftests: Add testcases for fs_reclaim

2020-11-25 Thread Daniel Vetter
Since I butchered this I figured it's better to make sure we have testcases
for this now. Since we only have a locking context for __GFP_FS that's
the only thing we're testing right now.

Acked-by: Peter Zijlstra (Intel) 
Cc: linux-fsde...@vger.kernel.org
Cc: Dave Chinner 
Cc: Qian Cai 
Cc: linux-...@vger.kernel.org
Cc: Thomas Hellström (Intel) 
Cc: Andrew Morton 
Cc: Jason Gunthorpe 
Cc: linux...@kvack.org
Cc: linux-r...@vger.kernel.org
Cc: Maarten Lankhorst 
Cc: Christian König 
Cc: "Matthew Wilcox (Oracle)" 
Signed-off-by: Daniel Vetter 
Cc: Peter Zijlstra 
Cc: Ingo Molnar 
Cc: Will Deacon 
Cc: linux-ker...@vger.kernel.org
---
 lib/locking-selftest.c | 47 ++
 1 file changed, 47 insertions(+)

diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index a899b3f0e2e5..ad47c3358e30 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -15,6 +15,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -2357,6 +2358,50 @@ static void queued_read_lock_tests(void)
pr_cont("\n");
 }
 
+static void fs_reclaim_correct_nesting(void)
+{
+   fs_reclaim_acquire(GFP_KERNEL);
+   might_alloc(GFP_NOFS);
+   fs_reclaim_release(GFP_KERNEL);
+}
+
+static void fs_reclaim_wrong_nesting(void)
+{
+   fs_reclaim_acquire(GFP_KERNEL);
+   might_alloc(GFP_KERNEL);
+   fs_reclaim_release(GFP_KERNEL);
+}
+
+static void fs_reclaim_protected_nesting(void)
+{
+   unsigned int flags;
+
+   fs_reclaim_acquire(GFP_KERNEL);
+   flags = memalloc_nofs_save();
+   might_alloc(GFP_KERNEL);
+   memalloc_nofs_restore(flags);
+   fs_reclaim_release(GFP_KERNEL);
+}
+
+static void fs_reclaim_tests(void)
+{
+   printk("  \n");
+   printk("  | fs_reclaim tests |\n");
+   printk("  \n");
+
+   print_testname("correct nesting");
+   dotest(fs_reclaim_correct_nesting, SUCCESS, 0);
+   pr_cont("\n");
+
+   print_testname("wrong nesting");
+   dotest(fs_reclaim_wrong_nesting, FAILURE, 0);
+   pr_cont("\n");
+
+   print_testname("protected nesting");
+   dotest(fs_reclaim_protected_nesting, SUCCESS, 0);
+   pr_cont("\n");
+}
+
 void locking_selftest(void)
 {
/*
@@ -2478,6 +2523,8 @@ void locking_selftest(void)
if (IS_ENABLED(CONFIG_QUEUED_RWLOCKS))
queued_read_lock_tests();
 
+   fs_reclaim_tests();
+
if (unexpected_testcase_failures) {

printk("-\n");
debug_locks = 0;
-- 
2.29.2



Re: [PATCH] drm/ttm: don't set page->mapping

2020-11-25 Thread Daniel Vetter
On Wed, Nov 25, 2020 at 5:25 PM Daniel Vetter  wrote:
>
> Random observation while trying to review Christian's patch series to
> stop looking at struct page for dma-buf imports.
>
> This was originally added in
>
> commit 58aa6622d32af7d2c08d45085f44c54554a16ed7
> Author: Thomas Hellstrom 
> Date:   Fri Jan 3 11:47:23 2014 +0100
>
> drm/ttm: Correctly set page mapping and -index members
>
> Needed for some vm operations; most notably unmap_mapping_range() with
> even_cows = 0.
>
> Signed-off-by: Thomas Hellstrom 
> Reviewed-by: Brian Paul 
>
> but we do not have a single caller of unmap_mapping_range with
> even_cows == 0. And all the gem drivers don't do this, so another
> small thing we could standardize between drm and ttm drivers.
>
> Plus I don't really see a need for unamp_mapping_range where we don't
> want to indiscriminately shoot down all ptes.
>
> Cc: Thomas Hellstrom 
> Cc: Brian Paul 
> Signed-off-by: Daniel Vetter 
> Cc: Christian Koenig 
> Cc: Huang Rui 

Apologies again, this shouldn't have been included. But at least I
have an idea now why this patch somehow was included in the git
send-email. Lovely interface :-/
-Daniel

> ---
>  drivers/gpu/drm/ttm/ttm_tt.c | 12 
>  1 file changed, 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> index da9eeffe0c6d..5b2eb6d58bb7 100644
> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> @@ -284,17 +284,6 @@ int ttm_tt_swapout(struct ttm_bo_device *bdev, struct 
> ttm_tt *ttm)
> return ret;
>  }
>
> -static void ttm_tt_add_mapping(struct ttm_bo_device *bdev, struct ttm_tt 
> *ttm)
> -{
> -   pgoff_t i;
> -
> -   if (ttm->page_flags & TTM_PAGE_FLAG_SG)
> -   return;
> -
> -   for (i = 0; i < ttm->num_pages; ++i)
> -   ttm->pages[i]->mapping = bdev->dev_mapping;
> -}
> -
>  int ttm_tt_populate(struct ttm_bo_device *bdev,
> struct ttm_tt *ttm, struct ttm_operation_ctx *ctx)
>  {
> @@ -313,7 +302,6 @@ int ttm_tt_populate(struct ttm_bo_device *bdev,
> if (ret)
> return ret;
>
> -   ttm_tt_add_mapping(bdev, ttm);
> ttm->page_flags |= TTM_PAGE_FLAG_PRIV_POPULATED;
> if (unlikely(ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) {
> ret = ttm_tt_swapin(ttm);
> --
> 2.29.2
>


-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 4/7] drm/radeon: Pin buffers while they are vmap'ed

2020-11-25 Thread Daniel Vetter
On Wed, Nov 25, 2020 at 12:38:01PM +0100, Thomas Zimmermann wrote:
> Hi
> 
> Am 25.11.20 um 11:36 schrieb Daniel Vetter:
> > On Wed, Nov 25, 2020 at 11:13:13AM +0100, Christian König wrote:
> > > Am 25.11.20 um 09:37 schrieb Thomas Zimmermann:
> > > > Hi
> > > > 
> > > > Am 24.11.20 um 15:09 schrieb Daniel Vetter:
> > > > > On Tue, Nov 24, 2020 at 02:56:51PM +0100, Thomas Zimmermann wrote:
> > > > > > Hi
> > > > > > 
> > > > > > Am 24.11.20 um 14:36 schrieb Christian König:
> > > > > > > Am 24.11.20 um 13:15 schrieb Thomas Zimmermann:
> > > > > > > > [SNIP]
> > > > > > > > > > > > First I wanted to put this into
> > > > > > > > > > > > drm_gem_ttm_vmap/vunmap(), but then wondered why
> > > > > > > > > > > > ttm_bo_vmap() does not acquire the lock internally?
> > > > > > > > > > > > I'd expect that vmap/vunmap are close together and
> > > > > > > > > > > > do not overlap for the same BO.
> > > > > > > > > > > 
> > > > > > > > > > > We have use cases like the following during command 
> > > > > > > > > > > submission:
> > > > > > > > > > > 
> > > > > > > > > > > 1. lock
> > > > > > > > > > > 2. map
> > > > > > > > > > > 3. copy parts of the BO content somewhere else or patch
> > > > > > > > > > > it with additional information
> > > > > > > > > > > 4. unmap
> > > > > > > > > > > 5. submit BO to the hardware
> > > > > > > > > > > 6. add hardware fence to the BO to make sure it doesn't 
> > > > > > > > > > > move
> > > > > > > > > > > 7. unlock
> > > > > > > > > > > 
> > > > > > > > > > > That use case won't be possible with vmap/vunmap if we
> > > > > > > > > > > move the lock/unlock into it and I hope to replace the
> > > > > > > > > > > kmap/kunmap functions with them in the near term.
> > > > > > > > > > > 
> > > > > > > > > > > > Otherwise, acquiring the reservation lock would
> > > > > > > > > > > > require another ref-counting variable or per-driver
> > > > > > > > > > > > code.
> > > > > > > > > > > 
> > > > > > > > > > > Hui, why that? Just put this into
> > > > > > > > > > > drm_gem_ttm_vmap/vunmap() helper as you initially
> > > > > > > > > > > planned.
> > > > > > > > > > 
> > > > > > > > > > Given your example above, step one would acquire the lock,
> > > > > > > > > > and step two would also acquire the lock as part of the vmap
> > > > > > > > > > implementation. Wouldn't this fail (At least during unmap or
> > > > > > > > > > unlock steps) ?
> > > > > > > > > 
> > > > > > > > > Oh, so you want to nest them? No, that is a rather bad no-go.
> > > > > > > > 
> > > > > > > > I don't want to nest/overlap them. My question was whether that
> > > > > > > > would be required. Apparently not.
> > > > > > > > 
> > > > > > > > While the console's BO is being set for scanout, it's protected 
> > > > > > > > from
> > > > > > > > movement via the pin/unpin implementation, right?
> > > > > > > 
> > > > > > > Yes, correct.
> > > > > > > 
> > > > > > > > The driver does not acquire the resv lock for longer periods. 
> > > > > > > > I'm
> > > > > > > > asking because this would prevent any console-buffer updates 
> > > > > > > > while
> > > > > > > > the console is being displayed.
> > > > > > > 
> > > > > > > Correct as well, we only hold the lock for things like command
> > > > > > > submission, pinning, unpinning etc etc
> > > > > > > 
> > > > > > 
> > > > > > Thanks for answering my questions.
> > > > > > 
> > > > > > > > 
> > > > > > > > > 
> > > > > > > > > You need to make sure that the lock is only taken from the FB
> > > > > > > > > path which wants to vmap the object.
> > > > > > > > > 
> > > > > > > > > Why don't you lock the GEM object from the caller in the 
> > > > > > > > > generic
> > > > > > > > > FB implementation?
> > > > > > > > 
> > > > > > > > With the current blitter code, it breaks abstraction. if 
> > > > > > > > vmap/vunmap
> > > > > > > > hold the lock implicitly, things would be easier.
> > > > > > > 
> > > > > > > Do you have a link to the code?
> > > > > > 
> > > > > > It's the damage blitter in the fbdev code. [1] While it flushes
> > > > > > the shadow
> > > > > > buffer into the BO, the BO has to be kept in place. I already
> > > > > > changed it to
> > > > > > lock struct drm_fb_helper.lock, but I don't think this is
> > > > > > enough. TTM could
> > > > > > still evict the BO concurrently.
> > > > > 
> > > > > So I'm not sure this is actually a problem: ttm could try to
> > > > > concurrently
> > > > > evict the buffer we pinned into vram, and then just skip to the next
> > > > > one.
> > > > > 
> > > > > Plus atm generic fbdev isn't used on any chip where we really care 
> > > > > about
> > > > > that last few mb of vram being useable for command submission (well 
> > > > > atm
> > > > > there's no driver using it).
> > > > 
> > > > Well, this is the patchset for radeon. If it works out, amdgpu and
> > > > nouveau are natural next choices. Especially radeon and nouveau support
> > > > cards with low- to medium-sized VRAM. The MiBs wasted on fbdev certainly
> > > > matter.
> 

Re: [PATCH v3 05/12] drm/ttm: Expose ttm_tt_unpopulate for driver use

2020-11-25 Thread Daniel Vetter
On Wed, Nov 25, 2020 at 01:57:40PM +0100, Christian König wrote:
> Am 25.11.20 um 11:40 schrieb Daniel Vetter:
> > On Tue, Nov 24, 2020 at 05:44:07PM +0100, Christian König wrote:
> > > Am 24.11.20 um 17:22 schrieb Andrey Grodzovsky:
> > > > On 11/24/20 2:41 AM, Christian König wrote:
> > > > > Am 23.11.20 um 22:08 schrieb Andrey Grodzovsky:
> > > > > > On 11/23/20 3:41 PM, Christian König wrote:
> > > > > > > Am 23.11.20 um 21:38 schrieb Andrey Grodzovsky:
> > > > > > > > On 11/23/20 3:20 PM, Christian König wrote:
> > > > > > > > > Am 23.11.20 um 21:05 schrieb Andrey Grodzovsky:
> > > > > > > > > > On 11/25/20 5:42 AM, Christian König wrote:
> > > > > > > > > > > Am 21.11.20 um 06:21 schrieb Andrey Grodzovsky:
> > > > > > > > > > > > It's needed to drop iommu backed pages on device unplug
> > > > > > > > > > > > before device's IOMMU group is released.
> > > > > > > > > > > It would be cleaner if we could do the whole
> > > > > > > > > > > handling in TTM. I also need to double check
> > > > > > > > > > > what you are doing with this function.
> > > > > > > > > > > 
> > > > > > > > > > > Christian.
> > > > > > > > > > 
> > > > > > > > > > Check patch "drm/amdgpu: Register IOMMU topology
> > > > > > > > > > notifier per device." to see
> > > > > > > > > > how i use it. I don't see why this should go
> > > > > > > > > > into TTM mid-layer - the stuff I do inside
> > > > > > > > > > is vendor specific and also I don't think TTM is
> > > > > > > > > > explicitly aware of IOMMU ?
> > > > > > > > > > Do you mean you prefer the IOMMU notifier to be
> > > > > > > > > > registered from within TTM
> > > > > > > > > > and then use a hook to call into vendor specific handler ?
> > > > > > > > > No, that is really vendor specific.
> > > > > > > > > 
> > > > > > > > > What I meant is to have a function like
> > > > > > > > > ttm_resource_manager_evict_all() which you only need
> > > > > > > > > to call and all tt objects are unpopulated.
> > > > > > > > 
> > > > > > > > So instead of this BO list i create and later iterate in
> > > > > > > > amdgpu from the IOMMU patch you just want to do it
> > > > > > > > within
> > > > > > > > TTM with a single function ? Makes much more sense.
> > > > > > > Yes, exactly.
> > > > > > > 
> > > > > > > The list_empty() checks we have in TTM for the LRU are
> > > > > > > actually not the best idea, we should now check the
> > > > > > > pin_count instead. This way we could also have a list of the
> > > > > > > pinned BOs in TTM.
> > > > > > 
> > > > > > So from my IOMMU topology handler I will iterate the TTM LRU for
> > > > > > the unpinned BOs and this new function for the pinned ones  ?
> > > > > > It's probably a good idea to combine both iterations into this
> > > > > > new function to cover all the BOs allocated on the device.
> > > > > Yes, that's what I had in my mind as well.
> > > > > 
> > > > > > 
> > > > > > > BTW: Have you thought about what happens when we unpopulate
> > > > > > > a BO while we still try to use a kernel mapping for it? That
> > > > > > > could have unforeseen consequences.
> > > > > > 
> > > > > > Are you asking what happens to kmap or vmap style mapped CPU
> > > > > > accesses once we drop all the DMA backing pages for a particular
> > > > > > BO ? Because for user mappings
> > > > > > (mmap) we took care of this with dummy page reroute but indeed
> > > > > > nothing was done for in kernel CPU mappings.
> > > > > Yes exactly that.
> > > > > 
> > > > > In other words what happens if we free the ring buffer while the
> > > > > kernel still writes to it?
> > > > > 
> > > > > Christian.
> > > > 
> > > > While we can't control user application accesses to the mapped buffers
> > > > explicitly and hence we use page fault rerouting
> > > > I am thinking that in this  case we may be able to sprinkle
> > > > drm_dev_enter/exit in any such sensitive place were we might
> > > > CPU access a DMA buffer from the kernel ?
> > > Yes, I fear we are going to need that.
> > Uh ... problem is that dma_buf_vmap are usually permanent things. Maybe we
> > could stuff this into begin/end_cpu_access (but only for the kernel, so a
> > bit tricky)?
> 
> Oh very very good point! I haven't thought about DMA-buf mmaps in this
> context yet.
> 
> 
> > btw the other issue with dma-buf (and even worse with dma_fence) is
> > refcounting of the underlying drm_device. I'd expect that all your
> > callbacks go boom if the dma_buf outlives your drm_device. That part isn't
> > yet solved in your series here.
> 
> Well thinking more about this, it seems to be a another really good argument
> why mapping pages from DMA-bufs into application address space directly is a
> very bad idea :)
> 
> But yes, we essentially can't remove the device as long as there is a
> DMA-buf with mappings. No idea how to clean that one up.

drm_dev_get/put in drm_prime helpers should get us like 90% there I think.

The even more worrying thing is random dma_fence attached to the dma_resv
object. We could try to cle

Re: [PATCH 2/6] gpu/drm: ring_mirror_list --> pending_list

2020-11-25 Thread Luben Tuikov
On 2020-11-25 04:47, Christian König wrote:
> Am 25.11.20 um 04:17 schrieb Luben Tuikov:
>> Rename "ring_mirror_list" to "pending_list",
>> to describe what something is, not what it does,
>> how it's used, or how the hardware implements it.
>>
>> This also abstracts the actual hardware
>> implementation, i.e. how the low-level driver
>> communicates with the device it drives, ring, CAM,
>> etc., shouldn't be exposed to DRM.
>>
>> The pending_list keeps jobs submitted, which are
>> out of our control. Usually this means they are
>> pending execution status in hardware, but the
>> latter definition is a more general (inclusive)
>> definition.
>>
>> Signed-off-by: Luben Tuikov 
> 
> In general the rename is a good idea, but I think we should try to 
> remove this linked list in general.
> 
> As the original name described this is essentially a ring buffer, there is 
> no reason I can see to use a linked list here except for the add/remove 
> madness we currently have.
> 
> Anyway patch is Acked-by: Christian König  for 
> now.

Thanks for the Ack Christian.

Well this list is there now and I don't want to change too many
things or this patch would get out of hand.

Yeah, in the future, perhaps we can overhaul and change this. For now
this is a minimal rename-only patch.

Thanks,
Luben

> 
> Regards,
> Christian.
> 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  4 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  2 +-
>>   drivers/gpu/drm/scheduler/sched_main.c  | 34 ++---
>>   include/drm/gpu_scheduler.h | 10 +++---
>>   5 files changed, 27 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> index 8358cae0b5a4..db77a5bdfa45 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
>> @@ -1427,7 +1427,7 @@ static void amdgpu_ib_preempt_job_recovery(struct 
>> drm_gpu_scheduler *sched)
>>  struct dma_fence *fence;
>>   
>>  spin_lock(&sched->job_list_lock);
>> -list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
>> +list_for_each_entry(s_job, &sched->pending_list, list) {
>>  fence = sched->ops->run_job(s_job);
>>  dma_fence_put(fence);
>>  }
>> @@ -1459,7 +1459,7 @@ static void amdgpu_ib_preempt_mark_partial_job(struct 
>> amdgpu_ring *ring)
>>   
>>   no_preempt:
>>  spin_lock(&sched->job_list_lock);
>> -list_for_each_entry_safe(s_job, tmp, &sched->ring_mirror_list, list) {
>> +list_for_each_entry_safe(s_job, tmp, &sched->pending_list, list) {
>>  if (dma_fence_is_signaled(&s_job->s_fence->finished)) {
>>  /* remove job from ring_mirror_list */
>>  list_del_init(&s_job->list);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 4df6de81cd41..fbae600aa5f9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4127,8 +4127,8 @@ bool amdgpu_device_has_job_running(struct 
>> amdgpu_device *adev)
>>  continue;
>>   
>>  spin_lock(&ring->sched.job_list_lock);
>> -job = list_first_entry_or_null(&ring->sched.ring_mirror_list,
>> -struct drm_sched_job, list);
>> +job = list_first_entry_or_null(&ring->sched.pending_list,
>> +   struct drm_sched_job, list);
>>  spin_unlock(&ring->sched.job_list_lock);
>>  if (job)
>>  return true;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> index aca52a46b93d..ff48101bab55 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>> @@ -271,7 +271,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
>> drm_gpu_scheduler *sched)
>>  }
>>   
>>  /* Signal all jobs already scheduled to HW */
>> -list_for_each_entry(s_job, &sched->ring_mirror_list, list) {
>> +list_for_each_entry(s_job, &sched->pending_list, list) {
>>  struct drm_sched_fence *s_fence = s_job->s_fence;
>>   
>>  dma_fence_set_error(&s_fence->finished, -EHWPOISON);
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>> b/drivers/gpu/drm/scheduler/sched_main.c
>> index c52eba407ebd..b694df12aaba 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -198,7 +198,7 @@ EXPORT_SYMBOL(drm_sched_dependency_optimized);
>>   static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
>>   {
>>  if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>> -!list_empty(&sched->ring_mirror_list))
>> +!list_empty(&sched->pending_list))
>>  schedul
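
[Editor's note: Christian's observation above — that the pending list is essentially a ring buffer — can be sketched as below. All names (`pending_ring`, `pending_push`, `pending_pop`) are hypothetical, not the actual DRM scheduler API, and the sketch assumes jobs retire in submission order, which is what makes tail-only removal work.]

```c
#include <assert.h>
#include <stddef.h>

#define PENDING_RING_SIZE 16U /* must be a power of two for the index mask */

/* Hypothetical stand-in for struct drm_sched_job. */
struct sched_job {
	int id;
};

/* Fixed-size ring of in-flight jobs, replacing the linked pending_list. */
struct pending_ring {
	struct sched_job *slots[PENDING_RING_SIZE];
	unsigned int head; /* next slot to submit into */
	unsigned int tail; /* oldest job still pending */
};

/* Submit: append at head. Unsigned head/tail wrap naturally; the
 * difference head - tail is always the number of pending jobs. */
static int pending_push(struct pending_ring *r, struct sched_job *job)
{
	if (r->head - r->tail == PENDING_RING_SIZE)
		return -1; /* ring full */
	r->slots[r->head++ & (PENDING_RING_SIZE - 1)] = job;
	return 0;
}

/* Retire: jobs complete in submission order, so removal is always from
 * the tail -- the add/remove pattern that makes a ring buffer fit. */
static struct sched_job *pending_pop(struct pending_ring *r)
{
	if (r->head == r->tail)
		return NULL; /* empty */
	return r->slots[r->tail++ & (PENDING_RING_SIZE - 1)];
}
```

With this layout, iteration for timeout handling or job signalling would walk the indices from tail to head instead of a list, and no per-job list head is needed.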

[pull] amdgpu, radeon drm-next-5.11

2020-11-25 Thread Alex Deucher
Hi Dave, Daniel,

More updates for 5.11.

The following changes since commit 178631700f9dc40df754acbe766b55753ddcbfec:

  drm/amd/pm: fix spelling mistakes in dev_warn messages (2020-11-17 14:07:26 
-0500)

are available in the Git repository at:

  git://people.freedesktop.org/~agd5f/linux tags/amd-drm-next-5.11-2020-11-25

for you to fetch changes up to beaff108e1bf1e38c9def60dd09f7a4ed7910481:

  drm/amd/powerplay: fix spelling mistake "smu_state_memroy_block" -> 
"smu_state_memory_block" (2020-11-24 12:09:54 -0500)


amd-drm-next-5.11-2020-11-25:

amdgpu:
- Updates for Navy Flounder
- Updates for Dimgrey Cavefish
- Updates for Vangogh
- Add experimental support for VCN decode software ring
- Only register VGA devices with the VGA arbiter
- Clang warning fixes
- Add software IH handling
- Add cursor validation
- More W=1 fixes

radeon:
- More W=1 fixes


Alex Deucher (1):
  drm/amdgpu: only register VGA devices with the VGA arbiter

Aric Cyr (1):
  drm/amd/display: 3.2.113

Ashley Thomas (1):
  drm/amd/display: Source minimum HBlank support

Bernard Zhao (2):
  amdgpu/amdgpu_ids: fix kmalloc_array not uses number as first arg
  amd/amdgpu: use kmalloc_array to replace kmalloc with multiply

Bhawanpreet Lakha (3):
  drm/amd/display: Add display only once.
  drm/amd/display: Add comments to hdcp property change code
  drm/amd/display: Add DPCS regs for dcn302 link encoder

Camille Cho (1):
  drm/amd/display: To update backlight restore mechanism

Charlene Liu (1):
  drm/amd/display: add i2c speed arbitration for dc_i2c and hdcp_i2c

Chris Park (1):
  drm/amd/display: Update panel register

Christian König (7):
  drm/amdgpu: drop leading zeros from the gmc9 fault address
  drm/amdgpu: cleanup gmc_v10_0_process_interrupt a bit
  drm/amdgpu: add infrastructure for soft IH ring
  drm/amdgpu: enabled software IH ring for Vega
  drm/amdgpu: make sure retry faults are handled in a work item on Vega
  drm/amdgpu: enabled software IH ring for Navi
  drm/amdgpu: implement retry fault handling for Navi

Colin Ian King (1):
  drm/amd/powerplay: fix spelling mistake "smu_state_memroy_block" -> 
"smu_state_memory_block"

Eric Yang (1):
  drm/amd/display: expose clk_mgr functions for reuse

Gustavo A. R. Silva (4):
  drm/amdgpu: Fix fall-through warnings for Clang
  drm/radeon: Fix fall-through warnings for Clang
  drm/amd/display: Fix fall-through warnings for Clang
  drm/amd/pm: Fix fall-through warnings for Clang

Jacky Liao (3):
  drm/amd/display: Add DMCU memory low power support
  drm/amd/display: Add BLNDGAM memory shutdown support
  drm/amd/display: Add GAMCOR memory shutdown support

James Zhu (5):
  drm/amdgpu/vcn: refactor dec message functions
  drm/amdgpu/vcn: update header to support dec software ring
  drm/amdgpu/vcn: add test for dec software ring
  drm/amdgpu/vcn3.0: add dec software ring vm functions to support
  drm/amdgpu/vcn3.0: add software ring share memory support

Jiansong Chen (1):
  drm/amdgpu: update GC golden setting for navy_flounder

Jinzhou Su (1):
  drm/amdgpu: Add gfx doorbell setting for Vangogh

Kenneth Feng (2):
  drm/amd/amdgpu: fix null pointer in runtime pm
  drm/amd/amdgpu: skip unload message in reset

Lee Jones (27):
  drm/radeon/radeon_device: Consume our own header where the prototypes are 
located
  drm/amd/amdgpu/amdgpu_ttm: Add description for 'page_flags'
  drm/amd/amdgpu/amdgpu_ib: Provide docs for 'amdgpu_ib_schedule()'s 'job' 
param
  drm/amd/amdgpu/cik_ih: Supply description for 'ih' in 'cik_ih_{get, 
set}_wptr()'
  drm/amd/amdgpu/amdgpu_virt: Correct possible copy/paste or doc-rot 
misnaming issue
  drm/amd/amdgpu/uvd_v4_2: Fix some kernel-doc misdemeanours
  drm/amd/amdgpu/dce_v8_0: Supply description for 'async'
  drm/amd/amdgpu/cik_sdma: Supply some missing function param descriptions
  drm/amd/amdgpu/gfx_v7_0: Clean-up a bunch of kernel-doc related issues
  drm/amd/amdgpu/si_dma: Fix a bunch of function documentation issues
  drm/amd/amdgpu/gfx_v6_0: Supply description for 
'gfx_v6_0_ring_test_ib()'s 'timeout' param
  drm/amd/amdgpu/uvd_v3_1: Fix-up some documentation issues
  drm/amd/amdgpu/dce_v6_0: Fix formatting and missing parameter description 
issues
  drm/amd/include/vega20_ip_offset: Mark top-level IP_BASE definition as 
__maybe_unused
  drm/amd/include/navi10_ip_offset: Mark top-level IP_BASE as __maybe_unused
  drm/amd/include/arct_ip_offset: Mark top-level IP_BASE definition as 
__maybe_unused
  drm/amd/include/navi14_ip_offset: Mark top-level IP_BASE as __maybe_unused
  drm/amd/include/navi12_ip_offset: Mark top-level IP_BASE as __maybe_unused
  drm/amd/include/sienna_cichlid_ip_offset: Mark top-level IP_BASE 
