Re: [PATCH next 09/11] tree-wide: minmax: Replace all the uses of max() for array sizes with max_const()
On Sun, Jan 28, 2024 at 07:34:23PM +, David Laight wrote: > These are the only uses of max() that require a constant value > from constant parameters. > There don't seem to be any similar uses of min(). > > Replacing the max() by max_const() lets min()/max() be simplified > speeding up compilation. > > max_const() will convert enums to int (or unsigned int) so that the > casts added by max_t() are no longer needed. > > Signed-off-by: David Laight > --- For > fs/btrfs/tree-checker.c| 2 +- Acked-by: David Sterba
Re: [PATCH 2/2] drm/bridge: samsung-dsim: Fix porch calculation rounding
On 25.01.24 19:44, Adam Ford wrote: > On Mon, Dec 11, 2023 at 9:33 PM Adam Ford wrote: >> >> When using video sync pulses, the HFP, HBP, and HSA are divided between >> the available lanes if there is more than one lane. For certain >> timings and lane configurations, the HFP may not be evenly divisible. >> If the HFP is rounded down, it ends up being too small which can cause >> some monitors to not sync properly. In these instances, adjust htotal >> and hsync to round the HFP up, and recalculate the htotal. >> >> Tested-by: Frieder Schrempf # Kontron BL >> i.MX8MM with HDMI monitor >> Signed-off-by: Adam Ford > > Gentle nudge on this one. Basically this fixes an issue with the 8MP, > but it's still unknown why it doesn't work on 8MM or 8MN, but Frieder > confirmed there are no regressions on 8MM or 8MN. I only tested two specific display setups on i.MX8MM. So of course I can't confirm the absence of regressions in general. Anyway, I think this should be applied.
RE: [PATCH linux-next/master 1/1] media: staging/imx7: Add handling for the abnormal interrupt BIT_STATFF_INT
@rmf...@gmail.com@Laurent Pinchart@martin.kepplin...@puri.sm@ker...@puri.sm@linux-me...@vger.kernel.org Any feedback? Thanks B&R. Name: Alice Yuan Address: No.192, Zhangjiang Liangjing Road, Pudong New Area, Shanghai > -Original Message- > From: Alice Yuan > Sent: 8 December 2023 10:15 > To: rmf...@gmail.com; Laurent Pinchart > ; martin.kepplin...@puri.sm; > ker...@puri.sm; linux-me...@vger.kernel.org > Cc: LnxRevLi ; linux-de...@linux.nxdi.nxp.com; > ker...@pengutronix.de; dl-linux-imx ; > dri-devel@lists.freedesktop.org; linux-arm-ker...@lists.infradead.org; > patchwork-...@pengutronix.de > Subject: [PATCH linux-next/master 1/1] media: staging/imx7: Add handling for the > abnormal interrupt BIT_STATFF_INT > > From: "alice.yuan" > > When doing 2592x1944 resolution capture, we found that sometimes > "BIT_STATFF_INT" is abnormal: the stat FIFO is not full, but DMA transfer > done interrupts are still generated, and at that point we get broken frames. > > From the i.MX8MM reference manual, the STATFIFO full interrupt > status indicates that the amount of data in the STATFIFO has reached the trigger > level. BIT_STATFF_INT is clearly defined: 0: STATFIFO is not full, 1: > STATFIFO is full. > > So we should handle the abnormal interrupts when BIT_STATFF_INT is 0 and the > stat FIFO is not full: drop the broken frames and recover from the > abnormal status. 
> > Signed-off-by: alice.yuan > --- > drivers/media/platform/nxp/imx7-media-csi.c | 25 - > 1 file changed, 19 insertions(+), 6 deletions(-) > > diff --git a/drivers/media/platform/nxp/imx7-media-csi.c > b/drivers/media/platform/nxp/imx7-media-csi.c > index 15049c6aab37..9012b155c2d7 100644 > --- a/drivers/media/platform/nxp/imx7-media-csi.c > +++ b/drivers/media/platform/nxp/imx7-media-csi.c > @@ -249,6 +249,7 @@ struct imx7_csi { > bool is_streaming; > int buf_num; > u32 frame_sequence; > + int frame_skip; > > bool last_eof; > struct completion last_eof_completion; @@ -686,6 +687,7 @@ static > void imx7_csi_enable(struct imx7_csi *csi) > imx7_csi_dmareq_rff_enable(csi); > imx7_csi_hw_enable(csi); > > + csi->frame_skip = 0; > if (csi->model == IMX7_CSI_IMX8MQ) > imx7_csi_baseaddr_switch_on_second_frame(csi); > } > @@ -764,6 +766,12 @@ static irqreturn_t imx7_csi_irq_handler(int irq, void > *data) > imx7_csi_error_recovery(csi); > } > > + if (!(status & BIT_STATFF_INT)) { > + dev_warn(csi->dev, "Stat fifo is not full\n"); > + imx7_csi_error_recovery(csi); > + csi->frame_skip++; > + } > + > if (status & BIT_ADDR_CH_ERR_INT) { > imx7_csi_hw_disable(csi); > > @@ -790,14 +798,19 @@ static irqreturn_t imx7_csi_irq_handler(int irq, void > *data) > > if ((status & BIT_DMA_TSF_DONE_FB1) || > (status & BIT_DMA_TSF_DONE_FB2)) { > - imx7_csi_vb2_buf_done(csi); > - > - if (csi->last_eof) { > - complete(&csi->last_eof_completion); > - csi->last_eof = false; > + if (csi->frame_skip) { > + dev_warn(csi->dev, "skip frame: %d\n", csi->frame_skip); > + csi->frame_skip--; > + goto out; > + } else { > + imx7_csi_vb2_buf_done(csi); > + if (csi->last_eof) { > + complete(&csi->last_eof_completion); > + csi->last_eof = false; > + } > } > } > - > +out: > spin_unlock(&csi->irqlock); > > return IRQ_HANDLED; > -- > 2.37.1
[PATCH] drm/i915: use READ_ONCE() to read vma->iomap in concurrent environment
In i915_vma_pin_iomap(), vma->iomap is read using READ_ONCE() at line 562 562ptr = READ_ONCE(vma->iomap); but is read directly at line 597 592if (unlikely(cmpxchg(&vma->iomap, NULL, ptr))) { 593if (page_unmask_bits(ptr)) 594__i915_gem_object_release_map(vma->obj); 595else 596io_mapping_unmap(ptr); 597ptr = vma->iomap; There is a similar patch: https://github.com/torvalds/linux/commit/c1c0ce31b2420d5c173228a2132a492ede03d81f That patch found two reads of the same variable where one was protected and the other was not, and added READ_ONCE() to protect the unprotected read. Signed-off-by: linke li --- drivers/gpu/drm/i915/i915_vma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index d09aad34ba37..9fcc11db0505 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -594,7 +594,7 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma) __i915_gem_object_release_map(vma->obj); else io_mapping_unmap(ptr); - ptr = vma->iomap; + ptr = READ_ONCE(vma->iomap); } } -- 2.39.3 (Apple Git-145)
Re: [PATCH RFC for upstream 1/4] dt-bindings: display: panel-simple: add ETML1010G3DRA
Hi Conor, On Fri, 2024-01-26 at 16:27 +, Conor Dooley wrote: > Hey, > > On Fri, Jan 26, 2024 at 09:57:23AM +0100, Yannic Moog wrote: > > Add Emerging Display Technology Corp. etml1010g3dra 10.1" LCD-TFT LVDS > > panel compatible string. > > > > Signed-off-by: Yannic Moog > > > [PATCH RFC for upstream 1/4] > > The "for upstream" here is not really relevant, what else would the > patch be for? I sent this for internal review first and forgot to remove the tags, sorry. Yannic > > Acked-by: Conor Dooley > > Thanks, > Conor. >
Re: [PATCH v3 1/3] drm/atomic: Allow drivers to write their own plane check for async flips
On Sun, 28 Jan 2024 18:25:13 -0300 André Almeida wrote: > Some hardware are more flexible on what they can flip asynchronously, so > rework the plane check so drivers can implement their own check, lifting > up some of the restrictions. > > Signed-off-by: André Almeida > --- > v3: no changes > > drivers/gpu/drm/drm_atomic_uapi.c | 62 ++- > include/drm/drm_atomic_uapi.h | 12 ++ > include/drm/drm_plane.h | 5 +++ > 3 files changed, 62 insertions(+), 17 deletions(-) > > diff --git a/drivers/gpu/drm/drm_atomic_uapi.c > b/drivers/gpu/drm/drm_atomic_uapi.c > index aee4a65d4959..6d5b9fec90c7 100644 > --- a/drivers/gpu/drm/drm_atomic_uapi.c > +++ b/drivers/gpu/drm/drm_atomic_uapi.c > @@ -620,7 +620,7 @@ static int drm_atomic_plane_set_property(struct drm_plane > *plane, > return 0; > } > > -static int > +int > drm_atomic_plane_get_property(struct drm_plane *plane, > const struct drm_plane_state *state, > struct drm_property *property, uint64_t *val) > @@ -683,6 +683,7 @@ drm_atomic_plane_get_property(struct drm_plane *plane, > > return 0; > } > +EXPORT_SYMBOL(drm_atomic_plane_get_property); > > static int drm_atomic_set_writeback_fb_for_connector( > struct drm_connector_state *conn_state, > @@ -1026,18 +1027,54 @@ int drm_atomic_connector_commit_dpms(struct > drm_atomic_state *state, > return ret; > } > > -static int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t > prop_value, > +int drm_atomic_check_prop_changes(int ret, uint64_t old_val, uint64_t > prop_value, Hi, should the word "async" be somewhere in the function name? 
>struct drm_property *prop) > { > if (ret != 0 || old_val != prop_value) { > drm_dbg_atomic(prop->dev, > -"[PROP:%d:%s] No prop can be changed during > async flip\n", > +"[PROP:%d:%s] This prop cannot be changed during > async flip\n", > prop->base.id, prop->name); > return -EINVAL; > } > > return 0; > } > +EXPORT_SYMBOL(drm_atomic_check_prop_changes); > + > +/* plane changes may have exceptions, so we have a special function for them > */ > +static int drm_atomic_check_plane_changes(struct drm_property *prop, > + struct drm_plane *plane, > + struct drm_plane_state *plane_state, > + struct drm_mode_object *obj, > + u64 prop_value, u64 old_val) > +{ > + struct drm_mode_config *config = &plane->dev->mode_config; > + int ret; > + > + if (plane->funcs->check_async_props) > + return plane->funcs->check_async_props(prop, plane, plane_state, > + obj, prop_value, > old_val); Is it really ok to allow drivers to opt-in to also *reject* the FB_ID and IN_FENCE_FD props on async commits? Either intentionally or by accident. > + > + /* > + * if you are trying to change something other than the FB ID, your > + * change will be either rejected or ignored, so we can stop the check > + * here > + */ > + if (prop != config->prop_fb_id) { > + ret = drm_atomic_plane_get_property(plane, plane_state, > + prop, &old_val); > + return drm_atomic_check_prop_changes(ret, old_val, prop_value, > prop); When I first read this code, it seemed to say: "If the prop is not FB_ID, then do the usual prop change checking that happens on all changes, not only async.". Reading this patch a few more times over, I finally realized drm_atomic_check_prop_changes() is about async specifically. 
> + } > + > + if (plane_state->plane->type != DRM_PLANE_TYPE_PRIMARY) { > + drm_dbg_atomic(prop->dev, > +"[OBJECT:%d] Only primary planes can be changed > during async flip\n", > +obj->id); > + return -EINVAL; > + } > + > + return 0; > +} > > int drm_atomic_set_property(struct drm_atomic_state *state, > struct drm_file *file_priv, > @@ -1100,7 +1137,6 @@ int drm_atomic_set_property(struct drm_atomic_state > *state, > case DRM_MODE_OBJECT_PLANE: { > struct drm_plane *plane = obj_to_plane(obj); > struct drm_plane_state *plane_state; > - struct drm_mode_config *config = &plane->dev->mode_config; > > plane_state = drm_atomic_get_plane_state(state, plane); > if (IS_ERR(plane_state)) { > @@ -1108,19 +1144,11 @@ int drm_atomic_set_property(struct drm_atomic_state > *state, > break; > } > > - if (async_flip && prop != config->prop_fb_id) { > -
Re: [PATCH v19 22/30] drm/shmem-helper: Add common memory shrinker
On Mon, 29 Jan 2024 09:16:04 +0300 Dmitry Osipenko wrote: > On 1/26/24 21:12, Boris Brezillon wrote: > > On Fri, 26 Jan 2024 19:27:49 +0300 > > Dmitry Osipenko wrote: > > > >> On 1/26/24 12:55, Boris Brezillon wrote: > >>> On Fri, 26 Jan 2024 00:56:47 +0300 > >>> Dmitry Osipenko wrote: > >>> > On 1/25/24 13:19, Boris Brezillon wrote: > > On Fri, 5 Jan 2024 21:46:16 +0300 > > Dmitry Osipenko wrote: > > > >> +static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object > >> *shmem) > >> +{ > >> + return (shmem->madv >= 0) && shmem->base.funcs->evict && > >> + refcount_read(&shmem->pages_use_count) && > >> + !refcount_read(&shmem->pages_pin_count) && > >> + !shmem->base.dma_buf && !shmem->base.import_attach && > >> + !shmem->evicted; > > > > Are we missing > > > > && dma_resv_test_signaled(shmem->base.resv, > > DMA_RESV_USAGE_BOOKKEEP) > > > > to make sure the GPU is done using the BO? > > The same applies to drm_gem_shmem_is_purgeable() BTW. > > > > If you don't want to do this test here, we need a way to let drivers > > provide a custom is_{evictable,purgeable}() test. > > > > I guess we should also expose drm_gem_shmem_shrinker_update_lru_locked() > > to let drivers move the GEMs that were used most recently (those > > referenced by a GPU job) at the end of the evictable LRU. > > We have the signaled-check in the common drm_gem_evict() helper: > > https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/gpu/drm/drm_gem.c#L1496 > > >>> > >>> Ah, indeed. I'll need DMA_RESV_USAGE_BOOKKEEP instead of > >>> DMA_RESV_USAGE_READ in panthor, but I can add it in the driver specific > >>> > >>> ->evict() hook (though that means calling dma_resv_test_signaled() > >>> twice, which is not great, oh well). > >> > >> Maybe we should change drm_gem_evict() to use BOOKKEEP. The > >> test_signaled(BOOKKEEP) should be a "stronger" check than > >> test_signaled(READ)? > > > > It is, just wondering if some users have a good reason to want > > READ here. 
> > > >> > >>> The problem about the evictable LRU remains though: we need a way to let > >>> drivers put their BOs at the end of the list when the BO has been used > >>> by the GPU, don't we? > >> > >> If BO is use, then it won't be evicted, while idling BOs will be > >> evicted. Hence, the used BOs will be naturally moved down the LRU list > >> each time shrinker is invoked. > >> > > > > That only do the trick if the BOs being used most often are busy when > > the shrinker kicks in though. Let's take this scenario: > > > > > > BO 1BO 2 > > shinker > > > > busy > > idle (first-pos-in-evictable-LRU) > > > > busy > > idle (second-pos-in-evictable-LRU) > > > > busy > > idle > > > > busy > > idle > > > > busy > > idle > > > > > > find a BO to evict > > > > pick BO 2 > > > > busy (swapin) > > idle > > > > If the LRU had been updated at each busy event, BO 1 should have > > been picked for eviction. But we evicted the BO that was first > > recorded idle instead of the one that was least recently > > recorded busy. > > You have to swapin(BO) every time BO goes to busy state, and swapin does > drm_gem_lru_move_tail(BO). Hence, each time BO goes idle->busy, it's moved > down the LRU list. Ah, that's the bit I was missing. It makes sense now. I guess that's good enough for now, we can sort out the BOOKKEEP vs READ in a follow-up series. Reviewed-by: Boris Brezillon
Re: [PATCH v19 22/30] drm/shmem-helper: Add common memory shrinker
On Fri, 5 Jan 2024 21:46:16 +0300 Dmitry Osipenko wrote: > +/** > + * drm_gem_shmem_swapin_locked() - Moves shmem GEM back to memory and enables > + * hardware access to the memory. > + * @shmem: shmem GEM object > + * > + * This function moves shmem GEM back to memory if it was previously evicted > + * by the memory shrinker. The GEM is ready to use on success. > + * > + * Returns: > + * 0 on success or a negative error code on failure. > + */ > +int drm_gem_shmem_swapin_locked(struct drm_gem_shmem_object *shmem) > +{ > + int err; > + > + dma_resv_assert_held(shmem->base.resv); > + > + if (!shmem->evicted) > + return 0; Shouldn't we have a drm_gem_shmem_shrinker_update_lru_locked() even if the object wasn't evicted, such that idle->busy transition moves the BO to the list tail? > + > + err = drm_gem_shmem_acquire_pages(shmem); > + if (err) > + return err; > + > + shmem->evicted = false; > + > + drm_gem_shmem_shrinker_update_lru_locked(shmem); > + > + return 0; > +} > +EXPORT_SYMBOL_GPL(drm_gem_shmem_swapin_locked); > +
Re: [PATCH v19 22/30] drm/shmem-helper: Add common memory shrinker
On Mon, 29 Jan 2024 09:55:11 +0100 Boris Brezillon wrote: > On Mon, 29 Jan 2024 09:16:04 +0300 > Dmitry Osipenko wrote: > > > On 1/26/24 21:12, Boris Brezillon wrote: > > > On Fri, 26 Jan 2024 19:27:49 +0300 > > > Dmitry Osipenko wrote: > > > > > >> On 1/26/24 12:55, Boris Brezillon wrote: > > >>> On Fri, 26 Jan 2024 00:56:47 +0300 > > >>> Dmitry Osipenko wrote: > > >>> > > On 1/25/24 13:19, Boris Brezillon wrote: > > > On Fri, 5 Jan 2024 21:46:16 +0300 > > > Dmitry Osipenko wrote: > > > > > >> +static bool drm_gem_shmem_is_evictable(struct drm_gem_shmem_object > > >> *shmem) > > >> +{ > > >> +return (shmem->madv >= 0) && shmem->base.funcs->evict && > > >> +refcount_read(&shmem->pages_use_count) && > > >> +!refcount_read(&shmem->pages_pin_count) && > > >> +!shmem->base.dma_buf && !shmem->base.import_attach && > > >> +!shmem->evicted; > > > > > > Are we missing > > > > > > && dma_resv_test_signaled(shmem->base.resv, > > > DMA_RESV_USAGE_BOOKKEEP) > > > > > > to make sure the GPU is done using the BO? > > > The same applies to drm_gem_shmem_is_purgeable() BTW. > > > > > > If you don't want to do this test here, we need a way to let drivers > > > provide a custom is_{evictable,purgeable}() test. > > > > > > I guess we should also expose > > > drm_gem_shmem_shrinker_update_lru_locked() > > > to let drivers move the GEMs that were used most recently (those > > > referenced by a GPU job) at the end of the evictable LRU. > > > > We have the signaled-check in the common drm_gem_evict() helper: > > > > https://elixir.bootlin.com/linux/v6.8-rc1/source/drivers/gpu/drm/drm_gem.c#L1496 > > > > >>> > > >>> Ah, indeed. I'll need DMA_RESV_USAGE_BOOKKEEP instead of > > >>> DMA_RESV_USAGE_READ in panthor, but I can add it in the driver specific > > >>> > > >>> ->evict() hook (though that means calling dma_resv_test_signaled() > > >>> twice, which is not great, oh well). > > >> > > >> Maybe we should change drm_gem_evict() to use BOOKKEEP. 
The > > >> test_signaled(BOOKKEEP) should be a "stronger" check than > > >> test_signaled(READ)? > > > > > > It is, just wondering if some users have a good reason to want > > > READ here. > > > > > >> > > >>> The problem about the evictable LRU remains though: we need a way to let > > >>> drivers put their BOs at the end of the list when the BO has been used > > >>> by the GPU, don't we? > > >> > > >> If BO is use, then it won't be evicted, while idling BOs will be > > >> evicted. Hence, the used BOs will be naturally moved down the LRU list > > >> each time shrinker is invoked. > > >> > > > > > > That only do the trick if the BOs being used most often are busy when > > > the shrinker kicks in though. Let's take this scenario: > > > > > > > > > BO 1 BO 2 > > > shinker > > > > > > busy > > > idle (first-pos-in-evictable-LRU) > > > > > > busy > > > idle (second-pos-in-evictable-LRU) > > > > > > busy > > > idle > > > > > > busy > > > idle > > > > > > busy > > > idle > > > > > > > > > find a BO to evict > > > > > > pick BO 2 > > > > > > busy (swapin) > > > idle > > > > > > If the LRU had been updated at each busy event, BO 1 should have > > > been picked for eviction. But we evicted the BO that was first > > > recorded idle instead of the one that was least recently > > > recorded busy. > > > > You have to swapin(BO) every time BO goes to busy state, and swapin does > > drm_gem_lru_move_tail(BO). Hence, each time BO goes idle->busy, it's moved > > down the LRU list. > > Ah, that's the bit I was missing. It makes sense now. I guess that's > good enough for now, we can sort out the BOOKKEEP vs READ in a > follow-up series. On second look, it seems drm_gem_shmem_swapin_locked() doesn't call drm_gem_shmem_shrinker_update_lru_locked() if the BO was already resident? Is there something else I'm overlooking here? > > Reviewed-by: Boris Brezillon
Re: [PATCH next 10/11] block: Use a boolean expression instead of max() on booleans
On Sun, 28 Jan 2024, David Laight wrote: > blk_stack_limits() contains: > t->zoned = max(t->zoned, b->zoned); > These are bool, so it is just a bitwise or. Should be a logical or, really. And || in code. BR, Jani. > However it generates: > error: comparison of constant ‘0’ with boolean expression is always true > [-Werror=bool-compare] > inside the signedness check that max() does unless a '+ 0' is added. > It is a shame the compiler generates this warning for code that will > be optimised away. > > Change so that the extra '+ 0' can be removed. > > Signed-off-by: David Laight > --- > block/blk-settings.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/block/blk-settings.c b/block/blk-settings.c > index 06ea91e51b8b..9ca21fea039d 100644 > --- a/block/blk-settings.c > +++ b/block/blk-settings.c > @@ -688,7 +688,7 @@ int blk_stack_limits(struct queue_limits *t, struct > queue_limits *b, > b->max_secure_erase_sectors); > t->zone_write_granularity = max(t->zone_write_granularity, > b->zone_write_granularity); > - t->zoned = max(t->zoned, b->zoned); > + t->zoned = t->zoned | b->zoned; > return ret; > } > EXPORT_SYMBOL(blk_stack_limits); -- Jani Nikula, Intel
Re: [PATCH RESEND] drm/virtio: set segment size for virtio_gpu device
On 1/23/24 21:14, Sebastian Ott wrote: > drm/virtio: set segment size for virtio_gpu device > > Set the segment size of the virtio_gpu device to the value > used by the drm helpers when allocating sg lists to fix the > following complaint from DMA_API debug code: > DMA-API: virtio-pci :07:00.0: mapping sg segment longer than device > claims to support [len=262144] [max=65536] > > Signed-off-by: Sebastian Ott > --- > drivers/gpu/drm/virtio/virtgpu_drv.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c > b/drivers/gpu/drm/virtio/virtgpu_drv.c > index 4334c7608408..74b2cb3295af 100644 > --- a/drivers/gpu/drm/virtio/virtgpu_drv.c > +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c > @@ -94,6 +94,7 @@ static int virtio_gpu_probe(struct virtio_device *vdev) > goto err_free; > } > > + dma_set_max_seg_size(dev->dev, dma_max_mapping_size(dev->dev) ? : > UINT_MAX); > ret = virtio_gpu_init(vdev, dev); > if (ret) > goto err_free; Added stable tag and applied to misc-fixes. Please use `git send-email` next time. Thanks -- Best regards, Dmitry
Re: [PATCH] drm: bridge: samsung-dsim: Don't use FORCE_STOP_STATE
On 26.01.24 19:28, Dave Airlie wrote: > Just FYI this conflictted pretty heavily with drm-misc-next changes in > the same area, someone should check drm-tip has the correct > resolution, I'm not really sure what is definitely should be. > > Dave. Thanks! I took a quick look at what is now in Linus' tree and it looks correct to me. The only thing I'm missing is my Reviewed-by tag which got lost somewhere, but I can get over that.
RE: [PATCH next 10/11] block: Use a boolean expression instead of max() on booleans
From: Jani Nikula > Sent: 29 January 2024 09:08 > > On Sun, 28 Jan 2024, David Laight wrote: > > blk_stack_limits() contains: > > t->zoned = max(t->zoned, b->zoned); > > These are bool, so it is just a bitwise or. > > Should be a logical or, really. And || in code. Not really, bitwise is fine for bool (especially for 'or') and generates better code. David - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Re: [PATCH v5 0/3] drm/i915: Fix VMA UAF on destroy against deactivate race
Hi Janusz, There seems to be a regression in CI related to this: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129026v2/bat-dg1-7/igt@gem_lmem_swapping@random-engi...@lmem0.html#dmesg-warnings1053 Please have a look. Regards, Nirmoy On 1/24/2024 6:13 PM, Janusz Krzysztofik wrote: Object debugging tools were sporadically reporting illegal attempts to free a still active i915 VMA object when parking a GT believed to be idle. [161.359441] ODEBUG: free active (active state 0) object: 88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915] [161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0 ... [161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1-CI_DRM_13375-g003f860e5577+ #1 [161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022 [161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915] [161.360592] RIP: 0010:debug_print_object+0x80/0xb0 ... [161.361347] debug_object_free+0xeb/0x110 [161.361362] i915_active_fini+0x14/0x130 [i915] [161.361866] release_references+0xfe/0x1f0 [i915] [161.362543] i915_vma_parked+0x1db/0x380 [i915] [161.363129] __gt_park+0x121/0x230 [i915] [161.363515] intel_wakeref_put_last+0x1f/0x70 [i915] That has been tracked down to be happening when another thread is deactivating the VMA inside __active_retire() helper, after the VMA's active counter has been already decremented to 0, but before deactivation of the VMA's object is reported to the object debugging tool. We could prevent from that race by serializing i915_active_fini() with __active_retire() via ref->tree_lock, but that wouldn't stop the VMA from being used, e.g. from __i915_vma_retire() called at the end of __active_retire(), after that VMA has been already freed by a concurrent i915_vma_destroy() on return from the i915_active_fini(). 
Then, we should rather fix the issue at the VMA level, not in i915_active. Since __i915_vma_parked() is called from __gt_park() on last put of the GT's wakeref, the issue could be addressed by holding the GT wakeref long enough for __active_retire() to complete before that wakeref is released and the GT parked. A VMA associated with a request doesn't acquire a GT wakeref by itself. Instead, it depends on a wakeref held directly by the request's active intel_context for a GT associated with its VM, and indirectly on that intel_context's engine wakeref if the engine belongs to the same GT as the VMA's VM. Those wakerefs are released asynchronously to VMA deactivation. In case of single-GT platforms, at least one of those wakerefs is usually held long enough for the request's VMA to be deactivated on time, before it is destroyed on last put of its VM GT wakeref. However, on multi-GT platforms, a request may use a VMA from a GT other than the one that hosts the request's engine, in which case it is protected only by the intel_context's VM GT wakeref. There was an attempt to fix the issue on 2-GT Meteor Lake by acquiring an extra wakeref for a Primary GT from i915_gem_do_execbuffer() -- see commit f56fe3e91787 ("drm/i915: Fix a VMA UAF for multi-gt platform"). However, that fix proved insufficient -- the issue was still reported by CI. That wakeref was released on exit from i915_gem_do_execbuffer(), potentially before completion of the request and deactivation of its associated VMAs. Moreover, CI reports indicated that single-GT platforms also suffered sporadically from the same race. I believe the issue was introduced by commit d93939730347 ("drm/i915: Remove the vma refcount") which moved a call to i915_active_fini() from a dropped i915_vma_release(), called on last put of the removed VMA kref, to i915_vma_parked() processing path called on last put of a GT wakeref. 
However, its visibility to the object debugging tool was suppressed by a bug in i915_active that was fixed two weeks later with commit e92eb246feb9 ("drm/i915/active: Fix missing debug object activation"). Fix the issue by getting a wakeref for the VMA's GT when activating it, and putting that wakeref only after the VMA is deactivated. However, exclude global GTT from that processing path, otherwise the GPU never goes idle. Since __i915_vma_retire() may be called from atomic contexts, use async variant of wakeref put. Also, to avoid circular locking dependency, take care of acquiring the wakeref before VM mutex when both are needed. Having that fixed, stop explicitly acquiring the extra GT0 wakeref from inside i915_gem_do_execbuffer(), and also drop an extra call to i915_active_wait(), introduced by commit 7a2280e8dcd2 ("drm/i915: Wait for active retire before i915_active_fini()") as another insufficient fix for this UAF race. v5: Replace "tile" with "GT" across commit descriptions (Rodrigo), - reword commit message and description of patch 2 reusing relevant chunks moved there from commit description
Re: [PATCH v19 22/30] drm/shmem-helper: Add common memory shrinker
On 1/29/24 12:01, Boris Brezillon wrote: > On Fri, 5 Jan 2024 21:46:16 +0300 > Dmitry Osipenko wrote: > >> +/** >> + * drm_gem_shmem_swapin_locked() - Moves shmem GEM back to memory and >> enables >> + * hardware access to the memory. >> + * @shmem: shmem GEM object >> + * >> + * This function moves shmem GEM back to memory if it was previously evicted >> + * by the memory shrinker. The GEM is ready to use on success. >> + * >> + * Returns: >> + * 0 on success or a negative error code on failure. >> + */ >> +int drm_gem_shmem_swapin_locked(struct drm_gem_shmem_object *shmem) >> +{ >> +int err; >> + >> +dma_resv_assert_held(shmem->base.resv); >> + >> +if (!shmem->evicted) >> +return 0; > > Shouldn't we have a drm_gem_shmem_shrinker_update_lru_locked() even if > the object wasn't evicted, such that idle->busy transition moves the BO > to the list tail? Seems so, good catch. I'll double-check and remove it in the next version. -- Best regards, Dmitry
Re: [RFC PATCH 0/2] drm/amd/display: switch amdgpu_dm_connector to
On Fri, 26 Jan 2024, Mario Limonciello wrote: > On 1/26/2024 10:28, Melissa Wen wrote: >> Hi, >> >> I'm debugging a null-pointer dereference when running >> igt@kms_connector_force_edid and the way I found to solve the bug is to >> stop using the raw edid handler in amdgpu_connector_funcs_force and >> create_eml_sink in favor of managing resources via struct drm_edid helpers >> (Patch 1). The proper solution seems to be to switch amdgpu_dm_connector >> from struct edid to struct drm_edid and avoid the usage of different >> approaches in the driver (Patch 2). However, doing it implies a good >> amount of work and validation, therefore I decided to send this RFC >> first to collect opinions and check if there is any parallel work on >> this side. It's a work in progress. >> >> The null-pointer error triggered by the igt@kms_connector_force_edid test >> was introduced by: >> - e54ed41620f ("drm/amd/display: Remove unwanted drm edid references") >> >> You can check the error trace in the first patch. >> >> This series was tested with kms_hdmi_inject and kms_force_connector. No >> null-pointer error, kms_hdmi_inject is successful and kms_force_connector >> is successful after the second execution - the force-edid subtest >> still fails in the first run (I'm still investigating). >> >> There are also a couple of cast warnings to be addressed - I'm looking >> for the best replacement. >> >> I appreciate any feedback and testing. > > So I'm actually a little bit worried by hardcoding EDID_LENGTH in this > series. > > I have some other patches that I'm posting later on that let you get the > EDID from the _DDC BIOS method too. My observation was that the EDID can be > anywhere up to 512 bytes according to the ACPI spec. > > An earlier version of my patch was using EDID_LENGTH when fetching it > and the EDID checksum failed. > > I'll CC you on the post, we probably want to get your changes and mine > merged together. 
One of the main points of struct drm_edid is that it tracks the allocation size separately. We should simply not trust edid->extensions, because most of the time it originates from outside the kernel. Using drm_edid and immediately drm_edid_raw() falls short. That function should only be used during migration to help. And yeah, it also means EDID parsing should be done in drm_edid.c, and not spread out all over the subsystem. BR, Jani. > >> >> Melissa >> >> Melissa Wen (2): >>drm/amd/display: fix null-pointer dereference on edid reading >>drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid >> >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 78 ++- >> .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 4 +- >> .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 9 ++- >> .../display/amdgpu_dm/amdgpu_dm_mst_types.c | 23 +++--- >> 4 files changed, 60 insertions(+), 54 deletions(-) >> > -- Jani Nikula, Intel
[PATCH RESEND v7 0/7] Add support for XLCDC to sam9x7 SoC family.
This patch series aims to add support for the XLCDC IP of the sam9x7 SoC family to the DRM subsystem. XLCDC IP has additional registers and new configuration bits compared to the existing register set of HLCDC IP. The new compatible string "microchip,sam9x75-xlcdc" is defined for the sam9x75 variant of the sam9x7 SoC family. The is_xlcdc flag under driver data and IP-specific driver ops help to differentiate the XLCDC and existing HLCDC code within the same driver. changes in v7: * LCDC IP driver ops functions are declared static and their declarations are removed. changes in v6: * Fixed cosmetic defects. * Added comments for readability. changes in v5: * return value of regmap_read_poll_timeout is checked in the failure case. * HLCDC and XLCDC specific driver functions are now invoked using their IP-specific driver ops w/o the need of checking the is_xlcdc flag. * Removed empty spaces and blank lines. changes in v4: * fixed kernel warnings reported by kernel test robot. changes in v3: * Removed de-referencing the value of the is_xlcdc flag multiple times in a single function. * Removed cpu_relax() call when using regmap_read_poll_timeout. * Updated xfactor and yfactor equations using shift operators * Defined CSC coefficients in an array for code readability. changes in v2: * Changed the driver compatible name from "microchip,sam9x7-xlcdc" to "microchip,sam9x75-xlcdc". * Moved is_xlcdc flag to driver data. * Removed unused macro definitions. * Added Co-developed-by tags * Replaced regmap_read() with regmap_read_poll_timeout() call * Split code into two helpers for code readability. 
Durai Manickam KR (1):
  drm: atmel-hlcdc: Define SAM9X7 SoC XLCDC specific registers

Manikandan Muralidharan (6):
  drm: atmel-hlcdc: add flag and driver ops to differentiate XLCDC and HLCDC IP
  drm: atmel-hlcdc: add LCD controller layer definition for sam9x75
  drm: atmel_hlcdc: Add support for XLCDC in atmel LCD driver
  drm: atmel-hlcdc: add DPI mode support for XLCDC
  drm: atmel-hlcdc: add vertical and horizontal scaling support for XLCDC
  drm: atmel-hlcdc: add support for DSI output formats

 .../gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c | 175 +++--
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c | 105 +
 drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h | 86
 .../gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c | 369 +++---
 include/linux/mfd/atmel-hlcdc.h | 10 +
 5 files changed, 647 insertions(+), 98 deletions(-)
--
2.25.1
[PATCH RESEND v7 1/7] drm: atmel-hlcdc: add flag and driver ops to differentiate XLCDC and HLCDC IP
Add an is_xlcdc flag and LCD IP specific ops in driver data to differentiate XLCDC and HLCDC code within the atmel-hlcdc driver files. Signed-off-by: Manikandan Muralidharan --- drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h | 37 1 file changed, 37 insertions(+) diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h index 5b5c774e0edf..d5e01ff8c7f9 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h @@ -177,6 +177,9 @@ struct atmel_hlcdc_layer_cfg_layout { int csc; }; +struct atmel_hlcdc_plane_state; +struct atmel_hlcdc_dc; + /** * Atmel HLCDC DMA descriptor structure * @@ -288,6 +291,36 @@ atmel_hlcdc_layer_to_plane(struct atmel_hlcdc_layer *layer) return container_of(layer, struct atmel_hlcdc_plane, layer); } +/** + * struct atmel_lcdc_dc_ops - describes atmel_lcdc ops group + * to differentiate HLCDC and XLCDC IP code support. + * @plane_setup_scaler: update the vertical and horizontal scaling factors + * @update_lcdc_buffers: update each LCDC layer's DMA registers. + * @lcdc_atomic_disable: disable LCDC interrupts and layers + * @lcdc_update_general_settings: update each LCDC layer's general + * configuration register. + * @lcdc_atomic_update: enable the LCDC layers and interrupts. + * @lcdc_csc_init: update the color space conversion coefficients of + * the High-end overlay register. + * @lcdc_irq_dbg: raise an alert in case of interrupt overrun in any LCDC layer.
+ */ +struct atmel_lcdc_dc_ops { + void (*plane_setup_scaler)(struct atmel_hlcdc_plane *plane, + struct atmel_hlcdc_plane_state *state); + void (*update_lcdc_buffers)(struct atmel_hlcdc_plane *plane, + struct atmel_hlcdc_plane_state *state, + u32 sr, int i); + void (*lcdc_atomic_disable)(struct atmel_hlcdc_plane *plane); + void (*lcdc_update_general_settings)(struct atmel_hlcdc_plane *plane, +struct atmel_hlcdc_plane_state *state); + void (*lcdc_atomic_update)(struct atmel_hlcdc_plane *plane, + struct atmel_hlcdc_dc *dc); + void (*lcdc_csc_init)(struct atmel_hlcdc_plane *plane, + const struct atmel_hlcdc_layer_desc *desc); + void (*lcdc_irq_dbg)(struct atmel_hlcdc_plane *plane, +const struct atmel_hlcdc_layer_desc *desc); +}; + /** * Atmel HLCDC Display Controller description structure. * @@ -304,8 +337,10 @@ atmel_hlcdc_layer_to_plane(struct atmel_hlcdc_layer *layer) * @conflicting_output_formats: true if RGBXXX output formats conflict with * each other. * @fixed_clksrc: true if clock source is fixed + * @is_xlcdc: true if XLCDC IP is supported * @layers: a layer description table describing available layers * @nlayers: layer description table size + * @ops: atmel lcdc dc ops */ struct atmel_hlcdc_dc_desc { int min_width; @@ -317,8 +352,10 @@ struct atmel_hlcdc_dc_desc { int max_hpw; bool conflicting_output_formats; bool fixed_clksrc; + bool is_xlcdc; const struct atmel_hlcdc_layer_desc *layers; int nlayers; + const struct atmel_lcdc_dc_ops *ops; }; /** -- 2.25.1
[PATCH RESEND v7 2/7] drm: atmel-hlcdc: add LCD controller layer definition for sam9x75
Add the LCD controller layer definition and descriptor structure for sam9x75 for the following layers: - Base Layer - Overlay1 Layer - Overlay2 Layer - High End Overlay Signed-off-by: Manikandan Muralidharan --- drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c | 97 1 file changed, 97 insertions(+) diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c index fa0f9a93d50d..d30aec174aa2 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c @@ -462,6 +462,99 @@ static const struct atmel_hlcdc_dc_desc atmel_hlcdc_dc_sam9x60 = { .layers = atmel_hlcdc_sam9x60_layers, }; +static const struct atmel_hlcdc_layer_desc atmel_xlcdc_sam9x75_layers[] = { + { + .name = "base", + .formats = &atmel_hlcdc_plane_rgb_formats, + .regs_offset = 0x60, + .id = 0, + .type = ATMEL_HLCDC_BASE_LAYER, + .cfgs_offset = 0x1c, + .layout = { + .xstride = { 2 }, + .default_color = 3, + .general_config = 4, + .disc_pos = 5, + .disc_size = 6, + }, + .clut_offset = 0x700, + }, + { + .name = "overlay1", + .formats = &atmel_hlcdc_plane_rgb_formats, + .regs_offset = 0x160, + .id = 1, + .type = ATMEL_HLCDC_OVERLAY_LAYER, + .cfgs_offset = 0x1c, + .layout = { + .pos = 2, + .size = 3, + .xstride = { 4 }, + .pstride = { 5 }, + .default_color = 6, + .chroma_key = 7, + .chroma_key_mask = 8, + .general_config = 9, + }, + .clut_offset = 0xb00, + }, + { + .name = "overlay2", + .formats = &atmel_hlcdc_plane_rgb_formats, + .regs_offset = 0x260, + .id = 2, + .type = ATMEL_HLCDC_OVERLAY_LAYER, + .cfgs_offset = 0x1c, + .layout = { + .pos = 2, + .size = 3, + .xstride = { 4 }, + .pstride = { 5 }, + .default_color = 6, + .chroma_key = 7, + .chroma_key_mask = 8, + .general_config = 9, + }, + .clut_offset = 0xf00, + }, + { + .name = "high-end-overlay", + .formats = &atmel_hlcdc_plane_rgb_and_yuv_formats, + .regs_offset = 0x360, + .id = 3, + .type = ATMEL_HLCDC_OVERLAY_LAYER, + .cfgs_offset = 0x30, + .layout = { + .pos = 2, + 
.size = 3, + .memsize = 4, + .xstride = { 5, 7 }, + .pstride = { 6, 8 }, + .default_color = 9, + .chroma_key = 10, + .chroma_key_mask = 11, + .general_config = 12, + .csc = 16, + .scaler_config = 23, + }, + .clut_offset = 0x1300, + }, +}; + +static const struct atmel_hlcdc_dc_desc atmel_xlcdc_dc_sam9x75 = { + .min_width = 0, + .min_height = 0, + .max_width = 2048, + .max_height = 2048, + .max_spw = 0xff, + .max_vpw = 0xff, + .max_hpw = 0x3ff, + .fixed_clksrc = true, + .is_xlcdc = true, + .nlayers = ARRAY_SIZE(atmel_xlcdc_sam9x75_layers), + .layers = atmel_xlcdc_sam9x75_layers, +}; + static const struct of_device_id atmel_hlcdc_of_match[] = { { .compatible = "atmel,at91sam9n12-hlcdc", @@ -487,6 +580,10 @@ static const struct of_device_id atmel_hlcdc_of_match[] = { .compatible = "microchip,sam9x60-hlcdc", .data = &atmel_hlcdc_dc_sam9x60, }, + { + .compatible = "microchip,sam9x75-xlcdc", + .data = &atmel_xlcdc_dc_sam9x75, + }, { /* sentinel */ }, }; MODULE_DEVICE_TABLE(of, atmel_hlcdc_of_match); -- 2.25.1
[PATCH RESEND v7 3/7] drm: atmel-hlcdc: Define SAM9X7 SoC XLCDC specific registers
From: Durai Manickam KR The register address of the XLCDC IP used in SAM9X7 SoC family are different from the previous HLCDC. Defining those address space with valid macros. Signed-off-by: Durai Manickam KR [manikanda...@microchip.com: Remove unused macro definitions] Signed-off-by: Manikandan Muralidharan --- drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h | 42 include/linux/mfd/atmel-hlcdc.h | 10 + 2 files changed, 52 insertions(+) diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h index d5e01ff8c7f9..e6b4b0bb11b9 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h @@ -15,6 +15,7 @@ #include +/* LCD controller common registers */ #define ATMEL_HLCDC_LAYER_CHER 0x0 #define ATMEL_HLCDC_LAYER_CHDR 0x4 #define ATMEL_HLCDC_LAYER_CHSR 0x8 @@ -128,6 +129,47 @@ #define ATMEL_HLCDC_MAX_LAYERS 6 +/* XLCDC controller specific registers */ +#define ATMEL_XLCDC_LAYER_ENR 0x10 +#define ATMEL_XLCDC_LAYER_EN BIT(0) + +#define ATMEL_XLCDC_LAYER_IER 0x0 +#define ATMEL_XLCDC_LAYER_IDR 0x4 +#define ATMEL_XLCDC_LAYER_ISR 0xc +#define ATMEL_XLCDC_LAYER_OVR_IRQ(p) BIT(2 + (8 * (p))) + +#define ATMEL_XLCDC_LAYER_PLANE_ADDR(p)(((p) * 0x4) + 0x18) + +#define ATMEL_XLCDC_LAYER_DMA_CFG 0 + +#define ATMEL_XLCDC_LAYER_DMA BIT(0) +#define ATMEL_XLCDC_LAYER_REP BIT(1) +#define ATMEL_XLCDC_LAYER_DISCEN BIT(4) + +#define ATMEL_XLCDC_LAYER_SFACTC_A0_MULT_AS(4 << 6) +#define ATMEL_XLCDC_LAYER_SFACTA_ONE BIT(9) +#define ATMEL_XLCDC_LAYER_DFACTC_M_A0_MULT_AS (6 << 11) +#define ATMEL_XLCDC_LAYER_DFACTA_ONE BIT(14) + +#define ATMEL_XLCDC_LAYER_A0_SHIFT 16 +#define ATMEL_XLCDC_LAYER_A0(x)\ + ((x) << ATMEL_XLCDC_LAYER_A0_SHIFT) + +#define ATMEL_XLCDC_LAYER_VSCALER_LUMA_ENABLE BIT(0) +#define ATMEL_XLCDC_LAYER_VSCALER_CHROMA_ENABLEBIT(1) +#define ATMEL_XLCDC_LAYER_HSCALER_LUMA_ENABLE BIT(4) +#define ATMEL_XLCDC_LAYER_HSCALER_CHROMA_ENABLEBIT(5) + +#define ATMEL_XLCDC_LAYER_VXSYCFG_ONE BIT(0) 
+#define ATMEL_XLCDC_LAYER_VXSYTAP2_ENABLE BIT(4) +#define ATMEL_XLCDC_LAYER_VXSCCFG_ONE BIT(16) +#define ATMEL_XLCDC_LAYER_VXSCTAP2_ENABLE BIT(20) + +#define ATMEL_XLCDC_LAYER_HXSYCFG_ONE BIT(0) +#define ATMEL_XLCDC_LAYER_HXSYTAP2_ENABLE BIT(4) +#define ATMEL_XLCDC_LAYER_HXSCCFG_ONE BIT(16) +#define ATMEL_XLCDC_LAYER_HXSCTAP2_ENABLE BIT(20) + /** * Atmel HLCDC Layer registers layout structure * diff --git a/include/linux/mfd/atmel-hlcdc.h b/include/linux/mfd/atmel-hlcdc.h index a186119a49b5..80d675a03b39 100644 --- a/include/linux/mfd/atmel-hlcdc.h +++ b/include/linux/mfd/atmel-hlcdc.h @@ -22,6 +22,8 @@ #define ATMEL_HLCDC_DITHER BIT(6) #define ATMEL_HLCDC_DISPDLYBIT(7) #define ATMEL_HLCDC_MODE_MASK GENMASK(9, 8) +#define ATMEL_XLCDC_MODE_MASK GENMASK(10, 8) +#define ATMEL_XLCDC_DPIBIT(11) #define ATMEL_HLCDC_PP BIT(10) #define ATMEL_HLCDC_VSPSU BIT(12) #define ATMEL_HLCDC_VSPHO BIT(13) @@ -34,6 +36,12 @@ #define ATMEL_HLCDC_IDR0x30 #define ATMEL_HLCDC_IMR0x34 #define ATMEL_HLCDC_ISR0x38 +#define ATMEL_XLCDC_ATTRE 0x3c + +#define ATMEL_XLCDC_BASE_UPDATEBIT(0) +#define ATMEL_XLCDC_OVR1_UPDATEBIT(1) +#define ATMEL_XLCDC_OVR3_UPDATEBIT(2) +#define ATMEL_XLCDC_HEO_UPDATE BIT(3) #define ATMEL_HLCDC_CLKPOL BIT(0) #define ATMEL_HLCDC_CLKSEL BIT(2) @@ -48,6 +56,8 @@ #define ATMEL_HLCDC_DISP BIT(2) #define ATMEL_HLCDC_PWMBIT(3) #define ATMEL_HLCDC_SIPBIT(4) +#define ATMEL_XLCDC_SD BIT(5) +#define ATMEL_XLCDC_CM BIT(6) #define ATMEL_HLCDC_SOFBIT(0) #define ATMEL_HLCDC_SYNCDISBIT(1) -- 2.25.1
[PATCH RESEND v7 4/7] drm: atmel_hlcdc: Add support for XLCDC in atmel LCD driver
XLCDC in SAM9X7 has different sets of registers and additional configuration bits when compared to previous HLCDC IP. Read/write operation on the controller registers is now separated using the XLCDC status flag and with HLCDC and XLCDC IP specific ops. HEO scaling, window resampling, Alpha blending, YUV-to-RGB conversion in XLCDC is derived and handled using additional configuration bits and registers. Writing one to the Enable fields of each layer in LCD_ATTRE is required to reflect the values set in Configuration, FBA, Enable registers of each layer. Signed-off-by: Manikandan Muralidharan Co-developed-by: Hari Prasath Gujulan Elango Signed-off-by: Hari Prasath Gujulan Elango Co-developed-by: Durai Manickam KR Signed-off-by: Durai Manickam KR --- .../gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c| 33 +- drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c | 6 + drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h | 3 + .../gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c | 349 +++--- 4 files changed, 329 insertions(+), 62 deletions(-) diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c index cc5cf4c2faf7..1ac31c0c474a 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c @@ -79,6 +79,7 @@ static void atmel_hlcdc_crtc_mode_set_nofb(struct drm_crtc *c) unsigned int mask = ATMEL_HLCDC_CLKDIV_MASK | ATMEL_HLCDC_CLKPOL; unsigned int cfg = 0; int div, ret; + bool is_xlcdc = crtc->dc->desc->is_xlcdc; /* get encoder from crtc */ drm_for_each_encoder(en_iter, ddev) { @@ -164,10 +165,10 @@ static void atmel_hlcdc_crtc_mode_set_nofb(struct drm_crtc *c) state = drm_crtc_state_to_atmel_hlcdc_crtc_state(c->state); cfg = state->output_mode << 8; - if (adj->flags & DRM_MODE_FLAG_NVSYNC) + if (!is_xlcdc && (adj->flags & DRM_MODE_FLAG_NVSYNC)) cfg |= ATMEL_HLCDC_VSPOL; - if (adj->flags & DRM_MODE_FLAG_NHSYNC) + if (!is_xlcdc && (adj->flags & DRM_MODE_FLAG_NHSYNC)) cfg |= ATMEL_HLCDC_HSPOL; 
regmap_update_bits(regmap, ATMEL_HLCDC_CFG(5), @@ -202,6 +203,20 @@ static void atmel_hlcdc_crtc_atomic_disable(struct drm_crtc *c, pm_runtime_get_sync(dev->dev); + if (crtc->dc->desc->is_xlcdc) { + regmap_write(regmap, ATMEL_HLCDC_DIS, ATMEL_XLCDC_CM); + if (regmap_read_poll_timeout(regmap, ATMEL_HLCDC_SR, status, +!(status & ATMEL_XLCDC_CM), +10, 1000)) + dev_warn(dev->dev, "Atmel LCDC status register CMSTS timeout\n"); + + regmap_write(regmap, ATMEL_HLCDC_DIS, ATMEL_XLCDC_SD); + if (regmap_read_poll_timeout(regmap, ATMEL_HLCDC_SR, status, +status & ATMEL_XLCDC_SD, +10, 1000)) + dev_warn(dev->dev, "Atmel LCDC status register SDSTS timeout\n"); + } + regmap_write(regmap, ATMEL_HLCDC_DIS, ATMEL_HLCDC_DISP); while (!regmap_read(regmap, ATMEL_HLCDC_SR, &status) && (status & ATMEL_HLCDC_DISP)) @@ -256,6 +271,20 @@ static void atmel_hlcdc_crtc_atomic_enable(struct drm_crtc *c, !(status & ATMEL_HLCDC_DISP)) cpu_relax(); + if (crtc->dc->desc->is_xlcdc) { + regmap_write(regmap, ATMEL_HLCDC_EN, ATMEL_XLCDC_CM); + if (regmap_read_poll_timeout(regmap, ATMEL_HLCDC_SR, status, +status & ATMEL_XLCDC_CM, +10, 1000)) + dev_warn(dev->dev, "Atmel LCDC status register CMSTS timeout\n"); + + regmap_write(regmap, ATMEL_HLCDC_EN, ATMEL_XLCDC_SD); + if (regmap_read_poll_timeout(regmap, ATMEL_HLCDC_SR, status, +!(status & ATMEL_XLCDC_SD), +10, 1000)) + dev_warn(dev->dev, "Atmel LCDC status register SDSTS timeout\n"); + } + pm_runtime_put_sync(dev->dev); } diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c index d30aec174aa2..18a3a95f94be 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c @@ -58,6 +58,7 @@ static const struct atmel_hlcdc_dc_desc atmel_hlcdc_dc_at91sam9n12 = { .conflicting_output_formats = true, .nlayers = ARRAY_SIZE(atmel_hlcdc_at91sam9n12_layers), .layers = atmel_hlcdc_at91sam9n12_layers, + .ops = &atmel_hlcdc_ops, }; static const struct atmel_hlcdc_layer_desc 
atmel_hlcdc_at91sam9x5_layers[] = { @@ -151,6 +152,7 @@ static const struct atmel_hlcdc_dc_desc atmel_hlcdc_dc_at91sam9x5 = {
[PATCH RESEND v7 6/7] drm: atmel-hlcdc: add vertical and horizontal scaling support for XLCDC
Update the LCDC_HEOCFG30 and LCDC_HEOCFG31 registers of the XLCDC IP, which support vertical and horizontal scaling with bilinear and bicubic coefficient taps for the Chroma and Luma components of the pixel. Signed-off-by: Manikandan Muralidharan --- drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c | 2 ++ drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h | 4 .../gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c | 20 +++ 3 files changed, 26 insertions(+) diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c index 18a3a95f94be..debd4bf3e1b0 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.c @@ -541,6 +541,8 @@ static const struct atmel_hlcdc_layer_desc atmel_xlcdc_sam9x75_layers[] = { .general_config = 12, .csc = 16, .scaler_config = 23, + .vxs_config = 30, + .hxs_config = 31, }, .clut_offset = 0x1300, }, diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h index 1ef15f2d536c..216beaf1da0e 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_dc.h @@ -198,6 +198,8 @@ * @disc_pos: discard area position register * @disc_size: discard area size register * @csc: color space conversion register + * @vxs_config: vertical scaler filter taps control register + * @hxs_config: horizontal scaler filter taps control register */ struct atmel_hlcdc_layer_cfg_layout { int xstride[ATMEL_HLCDC_LAYER_MAX_PLANES]; @@ -217,6 +219,8 @@ struct atmel_hlcdc_layer_cfg_layout { int disc_pos; int disc_size; int csc; + int vxs_config; + int hxs_config; }; struct atmel_hlcdc_plane_state; diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c index 59ddd743ce92..a527badf865d 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_plane.c @@ -966,6 +966,26 @@ static void xlcdc_csc_init(struct
atmel_hlcdc_plane *plane, desc->layout.csc + i, xlcdc_csc_coeffs[i]); } + + if (desc->layout.vxs_config && desc->layout.hxs_config) { + /* +* Updating vxs.config and hxs.config fixes the +* Green Color Issue in SAM9X7 EGT Video Player App +*/ + atmel_hlcdc_layer_write_cfg(&plane->layer, + desc->layout.vxs_config, + ATMEL_XLCDC_LAYER_VXSYCFG_ONE | + ATMEL_XLCDC_LAYER_VXSYTAP2_ENABLE | + ATMEL_XLCDC_LAYER_VXSCCFG_ONE | + ATMEL_XLCDC_LAYER_VXSCTAP2_ENABLE); + + atmel_hlcdc_layer_write_cfg(&plane->layer, + desc->layout.hxs_config, + ATMEL_XLCDC_LAYER_HXSYCFG_ONE | + ATMEL_XLCDC_LAYER_HXSYTAP2_ENABLE | + ATMEL_XLCDC_LAYER_HXSCCFG_ONE | + ATMEL_XLCDC_LAYER_HXSCTAP2_ENABLE); + } } static int atmel_hlcdc_plane_init_properties(struct atmel_hlcdc_plane *plane) -- 2.25.1
[PATCH RESEND v7 5/7] drm: atmel-hlcdc: add DPI mode support for XLCDC
Add support for Display Pixel Interface (DPI) Compatible Mode support in atmel-hlcdc driver for XLCDC IP along with legacy pixel mapping. DPI mode BIT is configured in LCDC_CFG5 register. Signed-off-by: Manikandan Muralidharan [durai.manicka...@microchip.com: update DPI mode bit using is_xlcdc flag] Signed-off-by: Durai Manickam KR --- .../gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c| 21 +-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c index 1ac31c0c474a..1899be2eb6a3 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c @@ -30,10 +30,12 @@ * * @base: base CRTC state * @output_mode: RGBXXX output mode + * @dpi: output DPI mode */ struct atmel_hlcdc_crtc_state { struct drm_crtc_state base; unsigned int output_mode; + u8 dpi; }; static inline struct atmel_hlcdc_crtc_state * @@ -164,6 +166,8 @@ static void atmel_hlcdc_crtc_mode_set_nofb(struct drm_crtc *c) state = drm_crtc_state_to_atmel_hlcdc_crtc_state(c->state); cfg = state->output_mode << 8; + if (is_xlcdc) + cfg |= state->dpi << 11; if (!is_xlcdc && (adj->flags & DRM_MODE_FLAG_NVSYNC)) cfg |= ATMEL_HLCDC_VSPOL; @@ -176,7 +180,9 @@ static void atmel_hlcdc_crtc_mode_set_nofb(struct drm_crtc *c) ATMEL_HLCDC_VSPDLYS | ATMEL_HLCDC_VSPDLYE | ATMEL_HLCDC_DISPPOL | ATMEL_HLCDC_DISPDLY | ATMEL_HLCDC_VSPSU | ATMEL_HLCDC_VSPHO | - ATMEL_HLCDC_GUARDTIME_MASK | ATMEL_HLCDC_MODE_MASK, + ATMEL_HLCDC_GUARDTIME_MASK | + (is_xlcdc ? 
ATMEL_XLCDC_MODE_MASK | + ATMEL_XLCDC_DPI : ATMEL_HLCDC_MODE_MASK), cfg); clk_disable_unprepare(crtc->dc->hlcdc->sys_clk); @@ -374,7 +380,15 @@ static int atmel_hlcdc_crtc_select_output_mode(struct drm_crtc_state *state) hstate = drm_crtc_state_to_atmel_hlcdc_crtc_state(state); hstate->output_mode = fls(output_fmts) - 1; - + if (crtc->dc->desc->is_xlcdc) { + /* check if MIPI DPI bit needs to be set */ + if (fls(output_fmts) > 3) { + hstate->output_mode -= 4; + hstate->dpi = 1; + } else { + hstate->dpi = 0; + } + } return 0; } @@ -478,6 +492,7 @@ static struct drm_crtc_state * atmel_hlcdc_crtc_duplicate_state(struct drm_crtc *crtc) { struct atmel_hlcdc_crtc_state *state, *cur; + struct atmel_hlcdc_crtc *c = drm_crtc_to_atmel_hlcdc_crtc(crtc); if (WARN_ON(!crtc->state)) return NULL; @@ -489,6 +504,8 @@ atmel_hlcdc_crtc_duplicate_state(struct drm_crtc *crtc) cur = drm_crtc_state_to_atmel_hlcdc_crtc_state(crtc->state); state->output_mode = cur->output_mode; + if (c->dc->desc->is_xlcdc) + state->dpi = cur->dpi; return &state->base; } -- 2.25.1
[PATCH RESEND v7 7/7] drm: atmel-hlcdc: add support for DSI output formats
Add support for the following DPI mode if the encoder type is DSI as per the XLCDC IP datasheet: - 16BPPCFG1 - 16BPPCFG2 - 16BPPCFG3 - 18BPPCFG1 - 18BPPCFG2 - 24BPP Signed-off-by: Manikandan Muralidharan [durai.manicka...@microchip.com: update output format using is_xlcdc flag] Signed-off-by: Durai Manickam KR --- .../gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c| 123 +- 1 file changed, 88 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c index 1899be2eb6a3..6f529769b036 100644 --- a/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c +++ b/drivers/gpu/drm/atmel-hlcdc/atmel_hlcdc_crtc.c @@ -295,11 +295,18 @@ static void atmel_hlcdc_crtc_atomic_enable(struct drm_crtc *c, } -#define ATMEL_HLCDC_RGB444_OUTPUT BIT(0) -#define ATMEL_HLCDC_RGB565_OUTPUT BIT(1) -#define ATMEL_HLCDC_RGB666_OUTPUT BIT(2) -#define ATMEL_HLCDC_RGB888_OUTPUT BIT(3) -#define ATMEL_HLCDC_OUTPUT_MODE_MASK GENMASK(3, 0) +#define ATMEL_HLCDC_RGB444_OUTPUT BIT(0) +#define ATMEL_HLCDC_RGB565_OUTPUT BIT(1) +#define ATMEL_HLCDC_RGB666_OUTPUT BIT(2) +#define ATMEL_HLCDC_RGB888_OUTPUT BIT(3) +#define ATMEL_HLCDC_DPI_RGB565C1_OUTPUTBIT(4) +#define ATMEL_HLCDC_DPI_RGB565C2_OUTPUTBIT(5) +#define ATMEL_HLCDC_DPI_RGB565C3_OUTPUTBIT(6) +#define ATMEL_HLCDC_DPI_RGB666C1_OUTPUTBIT(7) +#define ATMEL_HLCDC_DPI_RGB666C2_OUTPUTBIT(8) +#define ATMEL_HLCDC_DPI_RGB888_OUTPUT BIT(9) +#define ATMEL_HLCDC_OUTPUT_MODE_MASK GENMASK(3, 0) +#define ATMEL_XLCDC_OUTPUT_MODE_MASK GENMASK(9, 0) static int atmel_hlcdc_connector_output_mode(struct drm_connector_state *state) { @@ -313,53 +320,99 @@ static int atmel_hlcdc_connector_output_mode(struct drm_connector_state *state) if (!encoder) encoder = connector->encoder; - switch (atmel_hlcdc_encoder_get_bus_fmt(encoder)) { - case 0: - break; - case MEDIA_BUS_FMT_RGB444_1X12: - return ATMEL_HLCDC_RGB444_OUTPUT; - case MEDIA_BUS_FMT_RGB565_1X16: - return ATMEL_HLCDC_RGB565_OUTPUT; - case 
MEDIA_BUS_FMT_RGB666_1X18: - return ATMEL_HLCDC_RGB666_OUTPUT; - case MEDIA_BUS_FMT_RGB888_1X24: - return ATMEL_HLCDC_RGB888_OUTPUT; - default: - return -EINVAL; - } - - for (j = 0; j < info->num_bus_formats; j++) { - switch (info->bus_formats[j]) { - case MEDIA_BUS_FMT_RGB444_1X12: - supported_fmts |= ATMEL_HLCDC_RGB444_OUTPUT; + if (encoder->encoder_type == DRM_MODE_ENCODER_DSI) { + /* +* atmel-hlcdc to support DSI formats with DSI video pipeline +* when DRM_MODE_ENCODER_DSI type is set by +* connector driver component. +*/ + switch (atmel_hlcdc_encoder_get_bus_fmt(encoder)) { + case 0: break; case MEDIA_BUS_FMT_RGB565_1X16: - supported_fmts |= ATMEL_HLCDC_RGB565_OUTPUT; - break; + return ATMEL_HLCDC_DPI_RGB565C1_OUTPUT; case MEDIA_BUS_FMT_RGB666_1X18: - supported_fmts |= ATMEL_HLCDC_RGB666_OUTPUT; - break; + return ATMEL_HLCDC_DPI_RGB666C1_OUTPUT; + case MEDIA_BUS_FMT_RGB666_1X24_CPADHI: + return ATMEL_HLCDC_DPI_RGB666C2_OUTPUT; case MEDIA_BUS_FMT_RGB888_1X24: - supported_fmts |= ATMEL_HLCDC_RGB888_OUTPUT; - break; + return ATMEL_HLCDC_DPI_RGB888_OUTPUT; default: + return -EINVAL; + } + + for (j = 0; j < info->num_bus_formats; j++) { + switch (info->bus_formats[j]) { + case MEDIA_BUS_FMT_RGB565_1X16: + supported_fmts |= + ATMEL_HLCDC_DPI_RGB565C1_OUTPUT; + break; + case MEDIA_BUS_FMT_RGB666_1X18: + supported_fmts |= + ATMEL_HLCDC_DPI_RGB666C1_OUTPUT; + break; + case MEDIA_BUS_FMT_RGB666_1X24_CPADHI: + supported_fmts |= + ATMEL_HLCDC_DPI_RGB666C2_OUTPUT; + break; + case MEDIA_BUS_FMT_RGB888_1X24: + supported_fmts |= + ATMEL_HLCDC_DPI_RGB888_OUTPUT; + break; +
Re: [PATCH v4 00/14] drm: Add a driver for CSF-based Mali GPUs
Hi Boris: Thanks for your great work. One thing to please take note of: the commit (arm64: dts: rockchip: rk3588: Add GPU nodes) in [1] seems to remove the "disabled" status of usb_host2_xhci; this may cause a boot issue on some boards that use the combphy2_psu phy for other functions. https://gitlab.freedesktop.org/panfrost/linux/-/commit/bf5b542294894219c2d366f6dd0d32a9dcf17252 At 2024-01-23 00:30:31, "Boris Brezillon" wrote: >Hello, > >This is the 4th version of the kernel driver for Mali CSF-based GPUs. > >A branch based on drm-misc-next and containing all the dependencies >that are not yet available in drm-misc-next is here[1], and another [2] >containing extra patches to have things working on rk3588. The CSF >firmware binary can be found here[3], and should be placed under >/lib/firmware/arm/mali/arch10.8/mali_csffw.bin. > >The mesa MR adding v10 support on top of panthor is available here [4]. > >Steve, I intentionally dropped your R-b on "drm/panthor: Add the heap >logical block" and "drm/panthor: Add the scheduler logical block" >because the tiler-OOM handling changed enough to require a new review >IMHO. > >Regarding the GPL2+MIT relicensing, I collected Clément's R-b for the >devfreq code, but am still lacking Alexey Sheplyakov for some bits in >panthor_gpu.c. The rest of the code is either new, or covered by the >Linaro, Arm and Collabora acks. > >And here is a non-exhaustive changelog, check each commit for a detailed >changelog. > >v4: >- Fix various bugs in the VM logic >- Address comments from Steven, Liviu, Ketil and Chris >- Move tiler OOM handling out of the scheduler interrupt handling path > so we can properly recover when the system runs out of memory, and > panthor is blocked trying to allocate heap chunks >- Rework the heap locking to support concurrent chunk allocation. Not > sure if this is supposed to happen, but we need to be robust against > userspace passing the same heap context to two scheduling groups. 
> Wasn't needed before the tiler_oom rework, because heap allocation > was serialized by the scheduler lock. >- Make kernel BO destruction robust to NULL/ERR pointers > >v3: >- Quite a few changes at the MMU/sched level to fix some > race conditions and deadlocks >- Addition of a sync-only VM_BIND operation (to support > vkQueueSparseBind with zero commands). >- Addition of a VM_GET_STATE ioctl >- Various cosmetic changes (see the commit changelogs for more details) >- Various fixes (see the commit changelogs for more details) > >v2: >- Rename the driver (pancsf -> panthor) >- Split the commit adding the driver to ease review >- Use drm_sched for dependency tracking/job submission >- Add a VM_BIND ioctl >- Add the concept of exclusive VM for BOs that are only ever mapped to a > single VM >- Document the code and uAPI >- Add a DT binding doc > >Regards, > >Boris > >[1]https://gitlab.freedesktop.org/panfrost/linux/-/tree/panthor-v4 >[2]https://gitlab.freedesktop.org/panfrost/linux/-/tree/panthor-v4+rk3588 >[3]https://gitlab.com/firefly-linux/external/libmali/-/raw/firefly/firmware/g610/mali_csffw.bin >[4]https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26358 > >Boris Brezillon (13): > drm/panthor: Add uAPI > drm/panthor: Add GPU register definitions > drm/panthor: Add the device logical block > drm/panthor: Add the GPU logical block > drm/panthor: Add GEM logical block > drm/panthor: Add the devfreq logical block > drm/panthor: Add the MMU/VM logical block > drm/panthor: Add the FW logical block > drm/panthor: Add the heap logical block > drm/panthor: Add the scheduler logical block > drm/panthor: Add the driver frontend block > drm/panthor: Allow driver compilation > drm/panthor: Add an entry to MAINTAINERS > >Liviu Dudau (1): > dt-bindings: gpu: mali-valhall-csf: Add support for Arm Mali CSF GPUs > > .../bindings/gpu/arm,mali-valhall-csf.yaml| 147 + > Documentation/gpu/driver-uapi.rst |5 + > MAINTAINERS | 11 + > drivers/gpu/drm/Kconfig |2 + > 
drivers/gpu/drm/Makefile |1 + > drivers/gpu/drm/panthor/Kconfig | 23 + > drivers/gpu/drm/panthor/Makefile | 15 + > drivers/gpu/drm/panthor/panthor_devfreq.c | 283 ++ > drivers/gpu/drm/panthor/panthor_devfreq.h | 25 + > drivers/gpu/drm/panthor/panthor_device.c | 544 +++ > drivers/gpu/drm/panthor/panthor_device.h | 393 ++ > drivers/gpu/drm/panthor/panthor_drv.c | 1470 +++ > drivers/gpu/drm/panthor/panthor_fw.c | 1334 +++ > drivers/gpu/drm/panthor/panthor_fw.h | 504 +++ > drivers/gpu/drm/panthor/panthor_gem.c | 228 ++ > drivers/gpu/drm/panthor/panthor_gem.h | 144 + > drivers/gpu/drm/panthor/panthor_gpu.c | 482 +++ > drivers/gpu/drm/panthor/panthor_gpu.h | 52 + > drivers/gpu/drm/panthor/panthor_heap.c| 596 +++ > drivers/gpu/drm/panthor/panthor_heap.h| 39 +
Re: [PATCH 2/2] drm/amd: Fetch the EDID from _DDC if available for eDP
On Fri, 26 Jan 2024, Mario Limonciello wrote: > Some manufacturers have intentionally put an EDID that differs from > the EDID on the internal panel on laptops. > > Attempt to fetch this EDID if it exists and prefer it over the EDID > that is provided by the panel. > > Signed-off-by: Mario Limonciello > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ > drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 30 +++ > .../gpu/drm/amd/amdgpu/amdgpu_connectors.c| 5 > .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 - > .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 7 +++-- > 5 files changed, 49 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > index c5f3859fd682..99abe12567a4 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h > @@ -1520,6 +1520,7 @@ int amdgpu_acpi_get_mem_info(struct amdgpu_device > *adev, int xcc_id, > > void amdgpu_acpi_get_backlight_caps(struct amdgpu_dm_backlight_caps *caps); > bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev); > +void *amdgpu_acpi_edid(struct amdgpu_device *adev, struct drm_connector > *connector); > void amdgpu_acpi_detect(void); > void amdgpu_acpi_release(void); > #else > @@ -1537,6 +1538,7 @@ static inline int amdgpu_acpi_get_mem_info(struct > amdgpu_device *adev, > } > static inline void amdgpu_acpi_fini(struct amdgpu_device *adev) { } > static inline bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev) > { return false; } > +static inline void *amdgpu_acpi_edid(struct amdgpu_device *adev, struct > drm_connector *connector) { return NULL; } > static inline void amdgpu_acpi_detect(void) { } > static inline void amdgpu_acpi_release(void) { } > static inline bool amdgpu_acpi_is_power_shift_control_supported(void) { > return false; } > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c > index e550067e5c5d..c106335f1f22 100644 > --- 
a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c > @@ -1380,6 +1380,36 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device > *adev) > #endif > } > > +/** > + * amdgpu_acpi_edid > + * @adev: amdgpu_device pointer > + * @connector: drm_connector pointer > + * > + * Returns the EDID used for the internal panel if present, NULL otherwise. > + */ > +void * > +amdgpu_acpi_edid(struct amdgpu_device *adev, struct drm_connector *connector) > +{ > + struct drm_device *ddev = adev_to_drm(adev); > + struct acpi_device *acpidev = ACPI_COMPANION(ddev->dev); > + void *edid; > + int r; > + > + if (!acpidev) > + return NULL; > + > + if (connector->connector_type != DRM_MODE_CONNECTOR_eDP) > + return NULL; > + > + r = acpi_video_get_edid(acpidev, ACPI_VIDEO_DISPLAY_LCD, -1, &edid); > + if (r < 0) { > + DRM_DEBUG_DRIVER("Failed to get EDID from ACPI: %d\n", r); > + return NULL; > + } > + > + return kmemdup(edid, r, GFP_KERNEL); > +} > + > /* > * amdgpu_acpi_detect - detect ACPI ATIF/ATCS methods > * > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c > index 9caba10315a8..c7e1563a46d3 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c > @@ -278,6 +278,11 @@ static void amdgpu_connector_get_edid(struct > drm_connector *connector) > struct amdgpu_device *adev = drm_to_adev(dev); > struct amdgpu_connector *amdgpu_connector = > to_amdgpu_connector(connector); > > + if (amdgpu_connector->edid) > + return; > + > + /* if the BIOS specifies the EDID via _DDC, prefer this */ > + amdgpu_connector->edid = amdgpu_acpi_edid(adev, connector); Imagine the EDID returned by acpi_video_get_edid() has edid->extensions bigger than 4. Of course it should not, but you have no guarantees, and it originates outside of the kernel. The real fix is to have the function return a struct drm_edid which tracks the allocation size separately. 
Unfortunately, it requires a bunch of changes along the way. We've mostly done it in i915, and I've sent a series to do this in drm/bridge [1]. Bottom line, we should stop using struct edid in drivers. They'll all parse the info differently, and from what I've seen, often wrong. BR, Jani. [1] https://patchwork.freedesktop.org/series/128149/ > if (amdgpu_connector->edid) > return; > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > index cd98b3565178..1faa21f542bd 100644 > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c > @@ -6562,17 +6562,23 @@ static void amdgpu_dm_connector_funcs_force(struct > drm_connector *connector) > { >
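Jani's point about the untrusted extension count can be made concrete. The sketch below is a standalone userspace model, not the proposed kernel fix (which is to return a struct drm_edid that tracks the allocation size separately): it clamps the size implied by the EDID header against the number of bytes the firmware actually handed back, so a bogus edid->extensions can never cause an out-of-bounds read.

```c
#include <assert.h>
#include <stddef.h>

#define EDID_BLOCK_SIZE 128

/* Return the usable EDID size in bytes, or 0 if we did not even get a
 * full base block.  Byte 126 of the base block is the extension count;
 * it comes from outside the kernel, so never trust it beyond what the
 * firmware actually returned. */
static size_t edid_usable_size(const unsigned char *edid, size_t returned_len)
{
	size_t claimed;

	if (returned_len < EDID_BLOCK_SIZE)
		return 0;

	claimed = ((size_t)edid[126] + 1) * EDID_BLOCK_SIZE;

	/* Trust the smaller of the two sizes. */
	return claimed <= returned_len ? claimed : returned_len;
}
```

With this shape, a header claiming five extensions while the ACPI call only returned 256 bytes is silently clamped instead of being read past the allocation.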
RE: [PATCH next 10/11] block: Use a boolean expression instead of max() on booleans
On Mon, 29 Jan 2024, David Laight wrote:
> From: Jani Nikula
>> Sent: 29 January 2024 09:08
>>
>> On Sun, 28 Jan 2024, David Laight wrote:
>> > blk_stack_limits() contains:
>> >     t->zoned = max(t->zoned, b->zoned);
>> > These are bool, so it is just a bitwise or.
>>
>> Should be a logical or, really. And || in code.
>
> Not really, bitwise is fine for bool (especially for 'or')
> and generates better code.

Logical operations on booleans are more readable for humans than bitwise ones. And semantically correct: with a = b || c you know what happens regardless of the types in question; with a = b | c you have to look up the types to know what's going on.

To me, better code only matters if it's a hotpath. That said, not my area of maintenance, so *shrug*.

BR, Jani.

--
Jani Nikula, Intel
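For C99 bools specifically, all three spellings compute the same value, so the disagreement above is purely about readability and generated code. A standalone sanity check (max_val is a stand-in for the kernel's max(), which additionally does type checking):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's max(); the real macro also checks types. */
#define max_val(a, b) ((a) > (b) ? (a) : (b))

static bool or_bitwise(bool a, bool b) { return a | b; }
static bool or_logical(bool a, bool b) { return a || b; }
static bool or_max(bool a, bool b)     { return max_val(a, b); }
```

All four input combinations agree across the three forms, which is why the choice comes down to style rather than behaviour.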
Re: Making drm_gpuvm work across gpu devices
Am 26.01.24 um 21:13 schrieb Zeng, Oak:

-Original Message-
From: Christian König
Sent: Friday, January 26, 2024 5:10 AM
To: Zeng, Oak; David Airlie
Cc: Ghimiray, Himal Prasad; thomas.hellst...@linux.intel.com; Winiarski, Michal; Felix Kuehling; Welty, Brian; Shah, Ankur N; dri-devel@lists.freedesktop.org; intel...@lists.freedesktop.org; Gupta, saurabhg; Danilo Krummrich; Daniel Vetter; Brost, Matthew; Bommu, Krishnaiah; Vishwanathapura, Niranjana
Subject: Re: Making drm_gpuvm work across gpu devices

Hi Oak,

you can still use SVM, but it should not be a design criterion for the kernel UAPI. In other words, the UAPI should be designed in such a way that the GPU virtual address can be equal to the CPU virtual address of a buffer, but can also be different, to support use cases where this isn't the case.

Terminology:

SVM: any technology which can achieve a shared virtual address space between CPU and devices. The virtual address space can be managed by user space or kernel space. Intel implemented an SVM based on the BO-centric GPU driver (gem-create, vm-bind), where the virtual address space is managed by the UMD.

System allocator: another way to implement SVM. The user just uses malloc'ed memory for GPU submission. The virtual address space is managed by Linux core mm. In practice, we leverage HMM to implement the system allocator. This article describes the details of all those different models: https://developer.nvidia.com/blog/simplifying-gpu-application-development-with-heterogeneous-memory-management/

Our programming model allows a mixed use of the system allocator (even though system allocator is ) and traditional vm_bind (where cpu address can != gpu address). Let me re-post the pseudo code:

 1. Fd0 = open("/dev/dri/render0")
 2. Fd1 = open("/dev/dri/render1")
 3. Fd3 = open("/dev/dri/xe-svm")
 4. Gpu_Vm0 = xe_vm_create(fd0)
 5. Gpu_Vm1 = xe_vm_create(fd1)
 6. Queue0 = xe_exec_queue_create(fd0, gpu_vm0)
 7. Queue1 = xe_exec_queue_create(fd1, gpu_vm1)
 8. ptr = malloc()
 9. bo = xe_bo_create(fd0)
10. Vm_bind(bo, gpu_vm0, va) // va is from UMD; the cpu can access bo with the same or a different va. It is UMD's responsibility that va doesn't conflict with malloc'ed ptrs.
11. Xe_exec(queue0, ptr) // submit a gpu job which uses ptr, on card0
12. Xe_exec(queue1, ptr) // submit a gpu job which uses ptr, on card1
13. Xe_exec(queue0, va) // submit a gpu job which uses va, on card0

In the above code, the va used in vm_bind (line 10, Intel's API to bind an object to a va for GPU access) can be different from the CPU address when the CPU accesses the same object. But whenever the user uses a malloc'ed ptr for GPU submission (lines 11 and 12, the so-called system allocator), it implies that the CPU and GPUs use the same ptr for access. In the above vm_bind, it is the user/UMD's responsibility to guarantee that the vm_bind va doesn't conflict with malloc'ed ptrs; otherwise it is treated as a programming error.

I think this design still meets your design restrictions.

Well, why do you need this "Fd3 = open("/dev/dri/xe-svm")"? As far as I see, fd3 isn't used anywhere. What you can do is bind parts of your process address space to your driver connections (fd1, fd2 etc.) with a vm_bind(), but this should *not* come about by implicitly using some other file descriptor in the process. As far as I can see, this design is exactly what failed so badly with KFD.

Regards,
Christian.

Additionally to what Dave wrote, I can summarize a few things I have learned while working on the AMD GPU drivers in the last decade or so:

1. Userspace requirements are *not* relevant for UAPI or even more general kernel driver design.
2. What should be done is to look at the hardware capabilities and try to expose those in a safe manner to userspace.
3. The userspace requirements are then used to validate the kernel driver and especially the UAPI design, to ensure that nothing was missed.

The consequence of this is that nobody should ever use things like Cuda, Vulkan, OpenCL, OpenGL etc. as an argument to propose a certain UAPI design.
What should be done instead is to say: My hardware works in this and that way -> we want to expose it like this -> because that enables us to implement the high level API in this and that way. Only this gives then a complete picture of how things interact together and allows the kernel community to influence and validate the design. What you described above is mainly bottom up. I know other people do top down, or whole system vertical HW-SW co-design. I don't have strong opinion here. Regards, Oak This doesn't mean that you need to throw away everything, but it gives a clear restriction that designs are not nailed in stone and for example you can't use something like a waterfall model. Going to answer on your other questions separately. Regards, Christian. Am 25.01.24 um 06:25 schrieb Zeng, Oak: Hi Dave, Let me step back. When I wrote " shared virtual address spac
[PATCH v2 0/1] Implement device_attach for virtio gpu
To realize dGPU prime feature for virtio gpu, we are trying to let dGPU import vram object of virtio gpu. As vram objects don't have backing pages and thus can't implement the drm_gem_object_funcs.get_sg_table callback, this removes calling drm_gem_map_dma_buf in virtgpu_gem_map_dma_buf and implement virtgpu specific map/unmap/attach callbacks to support both of shmem objects and vram objects. Changes from v1 to v2: -Reimplement virtgpu_gem_device_attach() -Remove calling drm dma-buf funcs in virtgpu callbacks and reimplement virtgpu specific dma-buf callbacks. Julia Zhang (1): drm/virtio: Implement device_attach drivers/gpu/drm/virtio/virtgpu_prime.c | 40 +++--- 1 file changed, 36 insertions(+), 4 deletions(-) -- 2.34.1
Re: [PATCH next 10/11] block: Use a boolean expression instead of max() on booleans
On Mon, Jan 29, 2024 at 09:22:40AM +, David Laight wrote: > From: Jani Nikula > > Sent: 29 January 2024 09:08 > > > > On Sun, 28 Jan 2024, David Laight wrote: > > > blk_stack_limits() contains: > > > t->zoned = max(t->zoned, b->zoned); > > > These are bool, so it is just a bitwise or. > > > > Should be a logical or, really. And || in code. > > Not really, bitwise is fine for bool (especially for 'or') > and generates better code. For | vs || the type doesn't make a difference... It makes a difference for AND. "0x1 & 0x10" vs "0x1 && 0x10". regards, dan carpenter
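Dan's example is easy to demonstrate in isolation: with non-bool operands, the two AND spellings disagree whenever the set bits don't overlap, while for OR only the resulting *value* differs (0x11 vs 1), not its truthiness. A standalone check:

```c
#include <assert.h>
#include <stdbool.h>

/* Bitwise AND tests for overlapping bits; logical AND tests that both
 * operands are non-zero.  For bool operands the two coincide; for plain
 * ints they do not. */
static bool and_bitwise(int a, int b) { return a & b; }
static bool and_logical(int a, int b) { return a && b; }
```

This is the case Jani's readability argument is really about: with `||` and `&&` the result is predictable without knowing the operand types.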
Re: Making drm_gpuvm work across gpu devices
Well, Daniel and Dave noted it as well, so I'm just repeating it: your design choices are not an argument to get something upstream. It's the job of the maintainers and, in the end, of Linus to judge whether something is acceptable or not.

As far as I can see, a good part of this idea has been exercised at length with KFD, and it turned out to not be the best approach. So from what I've seen, the design you outlined is extremely unlikely to go upstream.

Regards,
Christian.

Am 27.01.24 um 03:21 schrieb Zeng, Oak:

Regarding the idea of expanding userptr to support migration, we explored this idea a long time ago. It provides similar functions to the system allocator, but its interface is not as convenient as the system allocator's. Besides the shared virtual address space, another benefit of a system allocator is that you can offload a cpu program to the gpu more easily; you don't need to call a driver specific API (such as register_userptr and vm_bind in this case) for memory allocation. We also scoped the implementation. It turned out to be big, and not as beautiful as hmm. That's why we gave up this approach.

From: Christian König
Sent: Friday, January 26, 2024 7:52 AM
To: Thomas Hellström; Daniel Vetter
Cc: Brost, Matthew; Felix Kuehling; Welty, Brian; Ghimiray, Himal Prasad; Zeng, Oak; Gupta, saurabhg; Danilo Krummrich; dri-devel@lists.freedesktop.org; Bommu, Krishnaiah; Dave Airlie; Vishwanathapura, Niranjana; intel...@lists.freedesktop.org
Subject: Re: Making drm_gpuvm work across gpu devices

Am 26.01.24 um 09:21 schrieb Thomas Hellström:

Hi, all

On Thu, 2024-01-25 at 19:32 +0100, Daniel Vetter wrote:
On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote:
Am 23.01.24 um 20:37 schrieb Zeng, Oak:

[SNIP] Yes, most APIs are per-device based. One exception I know of is actually the kfd SVM API. If you look at the svm_ioctl function, it is per-process based. Each kfd_process represents a process across N gpu devices.

Yeah, and that was a big mistake in my opinion.
We should really not do that ever again. Need to say, kfd SVM represent a shared virtual address space across CPU and all GPU devices on the system. This is by the definition of SVM (shared virtual memory). This is very different from our legacy gpu *device* driver which works for only one device (i.e., if you want one device to access another device's memory, you will have to use dma-buf export/import etc). Exactly that thinking is what we have currently found as blocker for a virtualization projects. Having SVM as device independent feature which somehow ties to the process address space turned out to be an extremely bad idea. The background is that this only works for some use cases but not all of them. What's working much better is to just have a mirror functionality which says that a range A..B of the process address space is mapped into a range C..D of the GPU address space. Those ranges can then be used to implement the SVM feature required for higher level APIs and not something you need at the UAPI or even inside the low level kernel memory management. When you talk about migrating memory to a device you also do this on a per device basis and *not* tied to the process address space. If you then get crappy performance because userspace gave contradicting information where to migrate memory then that's a bug in userspace and not something the kernel should try to prevent somehow. [SNIP] I think if you start using the same drm_gpuvm for multiple devices you will sooner or later start to run into the same mess we have seen with KFD, where we moved more and more functionality from the KFD to the DRM render node because we found that a lot of the stuff simply doesn't work correctly with a single object to maintain the state. As I understand it, KFD is designed to work across devices. A single pseudo /dev/kfd device represent all hardware gpu devices. 
That is why during kfd open, many pdd (process device data) is created, each for one hardware device for this process. Yes, I'm perfectly aware of that. And I can only r
Re: [PATCH] drm/etnaviv: fix DMA direction handling for cached read/write buffers
Hi Lucas, On Fri, 26 Jan 2024 at 17:00, Lucas Stach wrote: > The dma sync operation needs to be done with DMA_BIDIRECTIONAL when > the BO is prepared for both read and write operations. With the > current inverted if ladder it would only be synced for DMA_FROM_DEVICE. > > [...] > > static inline enum dma_data_direction etnaviv_op_to_dma_dir(u32 op) > { > - if (op & ETNA_PREP_READ) > + if (op & (ETNA_PREP_READ | ETNA_PREP_WRITE)) > + return DMA_BIDIRECTIONAL; This test will always be true for _either_ read or write. Cheers, Daniel
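Daniel's observation can be checked outside the driver. The flag values and helpers below are a hedged model, not the real etnaviv UAPI: the test as posted fires whenever *either* bit is set, so a read-only prep still returns DMA_BIDIRECTIONAL; requiring both bits restores the intended three-way mapping.

```c
#include <assert.h>

/* Illustrative values; the real flags live in the etnaviv UAPI header. */
#define ETNA_PREP_READ  0x01
#define ETNA_PREP_WRITE 0x02

enum dma_dir { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* The hunk as posted: the condition is true for either bit alone. */
static enum dma_dir dir_as_posted(unsigned int op)
{
	if (op & (ETNA_PREP_READ | ETNA_PREP_WRITE))
		return DMA_BIDIRECTIONAL;
	if (op & ETNA_PREP_WRITE)
		return DMA_TO_DEVICE;
	return DMA_FROM_DEVICE;
}

/* Intended behaviour: bidirectional only when BOTH bits are set. */
static enum dma_dir dir_intended(unsigned int op)
{
	unsigned int both = ETNA_PREP_READ | ETNA_PREP_WRITE;

	if ((op & both) == both)
		return DMA_BIDIRECTIONAL;
	if (op & ETNA_PREP_WRITE)
		return DMA_TO_DEVICE;
	return DMA_FROM_DEVICE;
}
```

The `(op & mask) == mask` comparison is the usual idiom for "all of these bits are set", as opposed to `op & mask` which means "any of these bits is set".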
[PATCH v2 1/1] drm/virtio: Implement device_attach
As vram objects don't have backing pages and thus can't implement the drm_gem_object_funcs.get_sg_table callback, this removes the drm dma-buf callbacks in virtgpu_gem_map_dma_buf()/virtgpu_gem_unmap_dma_buf() and implements virtgpu specific map/unmap/attach callbacks to support both shmem objects and vram objects. Signed-off-by: Julia Zhang --- drivers/gpu/drm/virtio/virtgpu_prime.c | 40 +++--- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c b/drivers/gpu/drm/virtio/virtgpu_prime.c index 44425f20d91a..b490a5343b06 100644 --- a/drivers/gpu/drm/virtio/virtgpu_prime.c +++ b/drivers/gpu/drm/virtio/virtgpu_prime.c @@ -49,11 +49,26 @@ virtgpu_gem_map_dma_buf(struct dma_buf_attachment *attach, { struct drm_gem_object *obj = attach->dmabuf->priv; struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj); + struct sg_table *sgt; + int ret; if (virtio_gpu_is_vram(bo)) return virtio_gpu_vram_map_dma_buf(bo, attach->dev, dir); - return drm_gem_map_dma_buf(attach, dir); + sgt = drm_prime_pages_to_sg(obj->dev, + to_drm_gem_shmem_obj(obj)->pages, + obj->size >> PAGE_SHIFT); + if (IS_ERR(sgt)) + return sgt; + + ret = dma_map_sgtable(attach->dev, sgt, dir, DMA_ATTR_SKIP_CPU_SYNC); + if (ret) { + sg_free_table(sgt); + kfree(sgt); + return ERR_PTR(ret); + } + + return sgt; } static void virtgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach, @@ -63,12 +78,29 @@ static void virtgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach, struct drm_gem_object *obj = attach->dmabuf->priv; struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj); + if (!sgt) + return; + if (virtio_gpu_is_vram(bo)) { virtio_gpu_vram_unmap_dma_buf(attach->dev, sgt, dir); - return; + } else { + dma_unmap_sgtable(attach->dev, sgt, dir, DMA_ATTR_SKIP_CPU_SYNC); + sg_free_table(sgt); + kfree(sgt); } +} + +static int virtgpu_gem_device_attach(struct dma_buf *dma_buf, +struct dma_buf_attachment *attach) +{ + struct drm_gem_object *obj = attach->dmabuf->priv; + 
struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj); + int ret = 0; + + if (!virtio_gpu_is_vram(bo) && obj->funcs->pin) + ret = obj->funcs->pin(obj); - drm_gem_unmap_dma_buf(attach, sgt, dir); + return ret; } static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops = { @@ -83,7 +115,7 @@ static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops = { .vmap = drm_gem_dmabuf_vmap, .vunmap = drm_gem_dmabuf_vunmap, }, - .device_attach = drm_gem_map_attach, + .device_attach = virtgpu_gem_device_attach, .get_uuid = virtgpu_virtio_get_uuid, }; -- 2.34.1
Re: [PATCH] drm: bridge: samsung-dsim: Don't use FORCE_STOP_STATE
Just FYI, this conflicted pretty heavily with drm-misc-next changes in the same area; someone should check that drm-tip has the correct resolution, I'm not really sure what it definitely should be.

FWIW, this looks rather messy now. The drm-tip doesn't build. There was a new call to samsung_dsim_set_stop_state() introduced in commit b2fe2292624ac (drm: bridge: samsung-dsim: enter display mode in the enable() callback). Also, merge commit 663a907e199b (Merge remote-tracking branch 'drm-misc/drm-misc-next' into drm-tip) is broken because it completely removes samsung_dsim_atomic_disable(). Dunno what's going on there.

I'm just about to look at getting it to compile again and I'm trying to retest it.

-michael
Re: [PATCH 03/19] drm/i915/dp: Add support to notify MST connectors to retry modesets
On Tue, 2024-01-23 at 12:28 +0200, Imre Deak wrote: > On shared (Thunderbolt) links with DP tunnels, the modeset may need > to > be retried on all connectors on the link due to a link BW limitation > arising only after the atomic check phase. To support this add a > helper > function queuing a work to retry the modeset on a given port's > connector > and at the same time any MST connector with streams through the same > port. A follow-up change enabling the DP tunnel Bandwidth Allocation > Mode will take this into use. > > Signed-off-by: Imre Deak > --- > drivers/gpu/drm/i915/display/intel_display.c | 5 +- > drivers/gpu/drm/i915/display/intel_dp.c | 55 > ++- > drivers/gpu/drm/i915/display/intel_dp.h | 8 +++ > .../drm/i915/display/intel_dp_link_training.c | 3 +- > drivers/gpu/drm/i915/display/intel_dp_mst.c | 2 + > 5 files changed, 67 insertions(+), 6 deletions(-) > > diff --git a/drivers/gpu/drm/i915/display/intel_display.c > b/drivers/gpu/drm/i915/display/intel_display.c > index a92e959c8ac7b..0caebbb3e2dbb 100644 > --- a/drivers/gpu/drm/i915/display/intel_display.c > +++ b/drivers/gpu/drm/i915/display/intel_display.c > @@ -8060,8 +8060,9 @@ void intel_hpd_poll_fini(struct > drm_i915_private *i915) > /* Kill all the work that may have been queued by hpd. 
*/ > drm_connector_list_iter_begin(&i915->drm, &conn_iter); > for_each_intel_connector_iter(connector, &conn_iter) { > - if (connector->modeset_retry_work.func) > - cancel_work_sync(&connector- > >modeset_retry_work); > + if (connector->modeset_retry_work.func && > + cancel_work_sync(&connector->modeset_retry_work)) > + drm_connector_put(&connector->base); > if (connector->hdcp.shim) { > cancel_delayed_work_sync(&connector- > >hdcp.check_work); > cancel_work_sync(&connector->hdcp.prop_work); > diff --git a/drivers/gpu/drm/i915/display/intel_dp.c > b/drivers/gpu/drm/i915/display/intel_dp.c > index ab415f41924d7..4e36c2c39888e 100644 > --- a/drivers/gpu/drm/i915/display/intel_dp.c > +++ b/drivers/gpu/drm/i915/display/intel_dp.c > @@ -2837,6 +2837,50 @@ intel_dp_audio_compute_config(struct > intel_encoder *encoder, > intel_dp_is_uhbr(pipe_config) > ; > } > > +void intel_dp_queue_modeset_retry_work(struct intel_connector > *connector) > +{ > + struct drm_i915_private *i915 = to_i915(connector->base.dev); > + > + drm_connector_get(&connector->base); > + if (!queue_work(i915->unordered_wq, &connector- > >modeset_retry_work)) > + drm_connector_put(&connector->base); > +} > + > +void > +intel_dp_queue_modeset_retry_for_link(struct intel_atomic_state > *state, > + struct intel_encoder *encoder, > + const struct intel_crtc_state > *crtc_state, > + const struct > drm_connector_state *conn_state) > +{ > + struct drm_i915_private *i915 = to_i915(crtc_state- > >uapi.crtc->dev); > + struct intel_connector *connector; > + struct intel_digital_connector_state *iter_conn_state; > + struct intel_dp *intel_dp; > + int i; > + > + if (conn_state) { > + connector = to_intel_connector(conn_state- > >connector); > + intel_dp_queue_modeset_retry_work(connector); > + > + return; > + } > + > + if (drm_WARN_ON(&i915->drm, > + !intel_crtc_has_type(crtc_state, > INTEL_OUTPUT_DP_MST))) > + return; > + > + intel_dp = enc_to_intel_dp(encoder); > + > + for_each_new_intel_connector_in_state(state, 
connector, > iter_conn_state, i) { > + (void)iter_conn_state; Checked iter_conn_state->base->crtc documentation: @crtc: CRTC to connect connector to, NULL if disabled. Do we need to check if connector is "disabled" or is it impossible scenario? BR, Jouni Högander > + > + if (connector->mst_port != intel_dp) > + continue; > + > + intel_dp_queue_modeset_retry_work(connector); > + } > +} > + > int > intel_dp_compute_config(struct intel_encoder *encoder, > struct intel_crtc_state *pipe_config, > @@ -6436,6 +6480,14 @@ static void > intel_dp_modeset_retry_work_fn(struct work_struct *work) > mutex_unlock(&connector->dev->mode_config.mutex); > /* Send Hotplug uevent so userspace can reprobe */ > drm_kms_helper_connector_hotplug_event(connector); > + > + drm_connector_put(connector); > +} > + > +void intel_dp_init_modeset_retry_work(struct intel_connector > *connector) > +{ > + INIT_WORK(&connector->modeset_retry_work, > + intel_dp_modeset_retry_work_fn); > } > > bool > @@ -6452,8 +6504,7 @@ intel_dp_init_connector(
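The reference-counting contract in the patch (take a connector reference before queuing; drop it again if queue_work() reports the work was already pending; the work function drops it when it runs) can be modelled outside the kernel. Everything below is a toy single-threaded sketch whose names merely mirror the kernel API, not the i915 code itself:

```c
#include <assert.h>
#include <stdbool.h>

struct conn { int refs; bool pending; };

static void get(struct conn *c) { c->refs++; }
static void put(struct conn *c) { c->refs--; }

/* Like queue_work(): returns false when the work was already pending. */
static bool queue_work_model(struct conn *c)
{
	if (c->pending)
		return false;
	c->pending = true;
	return true;
}

static void queue_modeset_retry(struct conn *c)
{
	get(c);
	if (!queue_work_model(c))
		put(c); /* already queued: that earlier queue holds the ref */
}

static void work_fn(struct conn *c)
{
	c->pending = false;
	put(c); /* drop the reference taken at queue time */
}
```

The invariant the pattern preserves: exactly one extra reference per pending work item, no matter how many times the queue helper is called before the work runs. The cancel path in intel_hpd_poll_fini() follows the same rule, dropping the reference only when cancel_work_sync() actually cancelled a pending work.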
Re: [PATCH] drm: bridge: samsung-dsim: Don't use FORCE_STOP_STATE
> Also, merge commit 663a907e199b (Merge remote-tracking branch 'drm-misc/drm-misc-next' into drm-tip) is broken because it completely removes samsung_dsim_atomic_disable(). Dunno what's going on there.

Actually, that merge commit looks even worse. It somehow folds the original samsung_dsim_atomic_disable() into samsung_dsim_atomic_enable(). Therefore, the enable op will now clear the DSIM_STATE_VIDOUT_AVAILABLE flag?! It will also never be set. Dunno how to proceed here.

-michael
Re: (subset) [PATCH RFC for upstream 0/4] Add support for ETML1010G3DRA LVDS display on phyBOARD-Pollux
Hi, On Fri, 26 Jan 2024 09:57:22 +0100, Yannic Moog wrote: > This series adds the LVDS panel support in simple-panel, introduces > device tree support for the LVDS panel on the phyBOARD-Pollux and > enables the FSL_LDB config option needed for the driver for the imx8mp > ldb bridge. > Thanks, Applied to https://anongit.freedesktop.org/git/drm/drm-misc.git (drm-misc-next) [1/4] dt-bindings: display: panel-simple: add ETML1010G3DRA https://cgit.freedesktop.org/drm/drm-misc/commit/?id=7a61bbc10a7b2734fbffa4438340b6878cce2e5c [2/4] drm/panel: simple: Add EDT ETML1010G3DRA panel https://cgit.freedesktop.org/drm/drm-misc/commit/?id=aeb262c353354eab81ab0d3242afa70984b7dc34 -- Neil
Re: [PATCH 1/8] video: Add helpers for decoding screen_info
Thomas Zimmermann writes:

Hello Thomas,

> The plain values as stored in struct screen_info need to be decoded
> before being used. Add helpers that decode the type of video output
> and the framebuffer I/O aperture.
>
> Old or non-x86 systems may not set the type of video directly, but
> only indicate the presence by storing 0x01 in orig_video_isVGA. The
> decoding logic in screen_info_video_type() takes this into account.

I always disliked how the orig_video_isVGA variable lost its meaning.

> It then follows similar code in vgacon's vgacon_startup() to detect
> the video type from the given values.
>
> A call to screen_info_resources() returns all known resources of the
> given screen_info. The resources' values have been taken from existing
> code in vgacon and vga16fb. These drivers can later be converted to
> use the new interfaces.
>
> Signed-off-by: Thomas Zimmermann
> ---

Thanks for doing this! It's quite useful to have these helpers, since, as you mention, the screen_info data decoding is complex and the variables used to store the video type and modes are confusing / misleading.

Reviewed-by: Javier Martinez Canillas

I just have a few comments below:

> +static inline bool __screen_info_has_ega_gfx(unsigned int mode)
> +{
> +	switch (mode) {
> +	case 0x0d: /* 320x200-4 */
> +	case 0x0e: /* 640x200-4 */
> +	case 0x0f: /* 640x350-1 */
> +	case 0x10: /* 640x350-4 */

I wonder if it makes sense to define some constant macros for these modes? I know that check_mode_supported() in drivers/video/fbdev/vga16fb.c also uses magic numbers, but I believe it would improve readability.
> +int screen_info_resources(const struct screen_info *si, struct resource *r, > size_t num) > +{ > + struct resource *pos = r; > + unsigned int type = screen_info_video_type(si); > + u64 base, size; > + > + switch (type) { > + case VIDEO_TYPE_MDA: > + if (num > 0) > + resource_init_io_named(pos++, 0x3b0, 12, "mda"); I see that drivers/video/fbdev/i740_reg.h has a #define MDA_BASE 0x3B0. Maybe move to a header in include/video along with the other constants mentioned above ? > + if (num > 1) > + resource_init_io_named(pos++, 0x3bf, 0x01, "mda"); > + if (num > 2) > + resource_init_mem_named(pos++, 0xb, 0x2000, "mda"); Same for these start addresses. I see that are also used by mdacon_startup() in drivers/video/console/mdacon.c, so some constants defined somewhere might be useful for that console driver too. The comment also applies to all the other start addresses, since I see that those are used by other drivers (i810, vgacon, etc). -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
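Javier's suggestion might look like the following. The constant names are hypothetical (nothing like them exists in include/video today); the switch mirrors __screen_info_has_ega_gfx() from the patch, with the magic BIOS mode numbers given names:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical named constants for the BIOS mode numbers; values match
 * the comments in the patch (mode number -> resolution-bpp). */
#define VIDEO_MODE_EGA_320X200_4  0x0d
#define VIDEO_MODE_EGA_640X200_4  0x0e
#define VIDEO_MODE_EGA_640X350_1  0x0f
#define VIDEO_MODE_EGA_640X350_4  0x10
#define VIDEO_MODE_VGA_640X480_4  0x12
#define VIDEO_MODE_VGA_320X200_8  0x13
#define VIDEO_MODE_VESA_800X600_4 0x6a

static bool has_ega_gfx(unsigned int mode)
{
	switch (mode) {
	case VIDEO_MODE_EGA_320X200_4:
	case VIDEO_MODE_EGA_640X200_4:
	case VIDEO_MODE_EGA_640X350_1:
	case VIDEO_MODE_EGA_640X350_4:
		return true;
	default:
		return false;
	}
}
```

Both the helper and vga16fb's check_mode_supported() could then share the same definitions, which is the readability win being suggested.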
[PATCH RESEND v3 1/3] drm/stm: dsi: use new SYSTEM_SLEEP_PM_OPS() macro
Use SYSTEM_SLEEP_PM_OPS() instead of the old SET_SYSTEM_SLEEP_PM_OPS(). This means we don't need __maybe_unused on the functions.

Signed-off-by: Raphael Gallais-Pou
---
 drivers/gpu/drm/stm/dw_mipi_dsi-stm.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c b/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c
index d5f8c923d7bc..b1aee43d51e9 100644
--- a/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c
+++ b/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c
@@ -544,7 +544,7 @@ static void dw_mipi_dsi_stm_remove(struct platform_device *pdev)
 	regulator_disable(dsi->vdd_supply);
 }
 
-static int __maybe_unused dw_mipi_dsi_stm_suspend(struct device *dev)
+static int dw_mipi_dsi_stm_suspend(struct device *dev)
 {
 	struct dw_mipi_dsi_stm *dsi = dw_mipi_dsi_stm_plat_data.priv_data;
 
@@ -556,7 +556,7 @@ static int __maybe_unused dw_mipi_dsi_stm_suspend(struct device *dev)
 	return 0;
 }
 
-static int __maybe_unused dw_mipi_dsi_stm_resume(struct device *dev)
+static int dw_mipi_dsi_stm_resume(struct device *dev)
 {
 	struct dw_mipi_dsi_stm *dsi = dw_mipi_dsi_stm_plat_data.priv_data;
 	int ret;
@@ -580,8 +580,8 @@ static int __maybe_unused dw_mipi_dsi_stm_resume(struct device *dev)
 }
 
 static const struct dev_pm_ops dw_mipi_dsi_stm_pm_ops = {
-	SET_SYSTEM_SLEEP_PM_OPS(dw_mipi_dsi_stm_suspend,
-				dw_mipi_dsi_stm_resume)
+	SYSTEM_SLEEP_PM_OPS(dw_mipi_dsi_stm_suspend,
+			    dw_mipi_dsi_stm_resume)
 };
 
 static struct platform_driver dw_mipi_dsi_stm_driver = {
-- 
2.25.1
[PATCH RESEND v3 2/3] drm/stm: dsi: add pm runtime ops
From: Yannick Fertre Update control of clocks and supply thanks to the PM runtime mechanism to avoid kernel crash during a system suspend. Signed-off-by: Yannick Fertre Signed-off-by: Raphael Gallais-Pou --- Changes in v2: - Changed SET_RUNTIME_PM_OPS to RUNTIME_PM_OPS and removed __maybe_unused --- drivers/gpu/drm/stm/dw_mipi_dsi-stm.c | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c b/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c index b1aee43d51e9..82fff9e84345 100644 --- a/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c +++ b/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include @@ -77,6 +78,7 @@ enum dsi_color { struct dw_mipi_dsi_stm { void __iomem *base; struct clk *pllref_clk; + struct clk *pclk; struct dw_mipi_dsi *dsi; u32 hw_version; int lane_min_kbps; @@ -443,7 +445,6 @@ static int dw_mipi_dsi_stm_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; struct dw_mipi_dsi_stm *dsi; - struct clk *pclk; int ret; dsi = devm_kzalloc(dev, sizeof(*dsi), GFP_KERNEL); @@ -483,21 +484,21 @@ static int dw_mipi_dsi_stm_probe(struct platform_device *pdev) goto err_clk_get; } - pclk = devm_clk_get(dev, "pclk"); - if (IS_ERR(pclk)) { - ret = PTR_ERR(pclk); + dsi->pclk = devm_clk_get(dev, "pclk"); + if (IS_ERR(dsi->pclk)) { + ret = PTR_ERR(dsi->pclk); DRM_ERROR("Unable to get peripheral clock: %d\n", ret); goto err_dsi_probe; } - ret = clk_prepare_enable(pclk); + ret = clk_prepare_enable(dsi->pclk); if (ret) { DRM_ERROR("%s: Failed to enable peripheral clk\n", __func__); goto err_dsi_probe; } dsi->hw_version = dsi_read(dsi, DSI_VERSION) & VERSION; - clk_disable_unprepare(pclk); + clk_disable_unprepare(dsi->pclk); if (dsi->hw_version != HWVER_130 && dsi->hw_version != HWVER_131) { ret = -ENODEV; @@ -551,6 +552,7 @@ static int dw_mipi_dsi_stm_suspend(struct device *dev) DRM_DEBUG_DRIVER("\n"); clk_disable_unprepare(dsi->pllref_clk); + 
clk_disable_unprepare(dsi->pclk); regulator_disable(dsi->vdd_supply); return 0; @@ -569,8 +571,16 @@ static int dw_mipi_dsi_stm_resume(struct device *dev) return ret; } + ret = clk_prepare_enable(dsi->pclk); + if (ret) { + regulator_disable(dsi->vdd_supply); + DRM_ERROR("Failed to enable pclk: %d\n", ret); + return ret; + } + ret = clk_prepare_enable(dsi->pllref_clk); if (ret) { + clk_disable_unprepare(dsi->pclk); regulator_disable(dsi->vdd_supply); DRM_ERROR("Failed to enable pllref_clk: %d\n", ret); return ret; @@ -582,6 +592,8 @@ static int dw_mipi_dsi_stm_resume(struct device *dev) static const struct dev_pm_ops dw_mipi_dsi_stm_pm_ops = { SYSTEM_SLEEP_PM_OPS(dw_mipi_dsi_stm_suspend, dw_mipi_dsi_stm_resume) + RUNTIME_PM_OPS(dw_mipi_dsi_stm_suspend, + dw_mipi_dsi_stm_resume, NULL) }; static struct platform_driver dw_mipi_dsi_stm_driver = { -- 2.25.1
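The resume path above has to unwind in reverse order on failure: if pllref_clk fails to enable, pclk and the regulator must be switched off again. A standalone model of that ordering (counters stand in for the regulator and the two clocks; the goto-label style shown is the common kernel idiom, though the patch unwinds inline):

```c
#include <assert.h>

static int vdd, pclk, pllref;

static int enable_ok(int *r) { (*r)++; return 0; }
static void disable(int *r)  { (*r)--; }

/* Model of the resume path: each failure undoes everything enabled so
 * far, in reverse order. fail_pllref simulates clk_prepare_enable()
 * failing on the last resource. */
static int resume(int fail_pllref)
{
	int ret;

	ret = enable_ok(&vdd);
	if (ret)
		return ret;

	ret = enable_ok(&pclk);
	if (ret)
		goto err_vdd;

	ret = fail_pllref ? -1 : enable_ok(&pllref);
	if (ret)
		goto err_pclk;

	return 0;

err_pclk:
	disable(&pclk);
err_vdd:
	disable(&vdd);
	return ret;
}
```

On success all three counters end up at 1; on a simulated pllref failure everything is back at 0, which is exactly the invariant the error paths in the patch maintain.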
[PATCH RESEND v3 3/3] drm/stm: dsi: expose DSI PHY internal clock
             |\
pll4_p_ck -->| 1 \
             |    |--> dsi_k
ck_dsi_phy ->| 0 /
             |/
           DSISRC

A DSI clock is missing in the clock framework. Looking at the clk_summary, it appears that 'ck_dsi_phy' is not implemented, since the DSI kernel clock is based on the internal DSI PLL. The common clock driver can not directly expose this 'ck_dsi_phy' clock because it does not share any registers with the DSI. Thus it needs to be done directly within the DSI phy driver.

Signed-off-by: Raphael Gallais-Pou
---
Changes in v3:
- Fix smatch warning: .../dw_mipi_dsi-stm.c:719 dw_mipi_dsi_stm_probe() warn: 'dsi->pclk' from clk_prepare_enable() not released on lines: 719.
---
 drivers/gpu/drm/stm/dw_mipi_dsi-stm.c | 247 ++++++++++++++++++++++++---
 1 file changed, 216 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c b/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c index 82fff9e84345..b20123854c4a 100644 --- a/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c +++ b/drivers/gpu/drm/stm/dw_mipi_dsi-stm.c @@ -7,7 +7,9 @@ */ #include +#include #include +#include #include #include #include @@ -77,9 +79,12 @@ enum dsi_color { struct dw_mipi_dsi_stm { void __iomem *base; + struct device *dev; struct clk *pllref_clk; struct clk *pclk; + struct clk_hw txbyte_clk; struct dw_mipi_dsi *dsi; + struct dw_mipi_dsi_plat_data pdata; u32 hw_version; int lane_min_kbps; int lane_max_kbps; @@ -196,29 +201,198 @@ static int dsi_pll_get_params(struct dw_mipi_dsi_stm *dsi, return 0; } -static int dw_mipi_dsi_phy_init(void *priv_data) +#define clk_to_dw_mipi_dsi_stm(clk) \ + container_of(clk, struct dw_mipi_dsi_stm, txbyte_clk) + +static void dw_mipi_dsi_clk_disable(struct clk_hw *clk) { - struct dw_mipi_dsi_stm *dsi = priv_data; + struct dw_mipi_dsi_stm *dsi = clk_to_dw_mipi_dsi_stm(clk); + + DRM_DEBUG_DRIVER("\n"); + + /* Disable the DSI PLL */ + dsi_clear(dsi, DSI_WRPCR, WRPCR_PLLEN); + + /* Disable the regulator */ + dsi_clear(dsi, DSI_WRPCR, WRPCR_REGEN | WRPCR_BGREN); +} + +static int dw_mipi_dsi_clk_enable(struct clk_hw *clk) +{ + 
 	struct dw_mipi_dsi_stm *dsi = clk_to_dw_mipi_dsi_stm(clk);
 	u32 val;
 	int ret;

+	DRM_DEBUG_DRIVER("\n");
+
 	/* Enable the regulator */
 	dsi_set(dsi, DSI_WRPCR, WRPCR_REGEN | WRPCR_BGREN);
-	ret = readl_poll_timeout(dsi->base + DSI_WISR, val, val & WISR_RRS,
-				 SLEEP_US, TIMEOUT_US);
+	ret = readl_poll_timeout_atomic(dsi->base + DSI_WISR, val, val & WISR_RRS,
+					SLEEP_US, TIMEOUT_US);
 	if (ret)
 		DRM_DEBUG_DRIVER("!TIMEOUT! waiting REGU, let's continue\n");

 	/* Enable the DSI PLL & wait for its lock */
 	dsi_set(dsi, DSI_WRPCR, WRPCR_PLLEN);
-	ret = readl_poll_timeout(dsi->base + DSI_WISR, val, val & WISR_PLLLS,
-				 SLEEP_US, TIMEOUT_US);
+	ret = readl_poll_timeout_atomic(dsi->base + DSI_WISR, val, val & WISR_PLLLS,
+					SLEEP_US, TIMEOUT_US);
 	if (ret)
 		DRM_DEBUG_DRIVER("!TIMEOUT! waiting PLL, let's continue\n");

 	return 0;
 }

+static int dw_mipi_dsi_clk_is_enabled(struct clk_hw *hw)
+{
+	struct dw_mipi_dsi_stm *dsi = clk_to_dw_mipi_dsi_stm(hw);
+
+	return dsi_read(dsi, DSI_WRPCR) & WRPCR_PLLEN;
+}
+
+static unsigned long dw_mipi_dsi_clk_recalc_rate(struct clk_hw *hw,
+						 unsigned long parent_rate)
+{
+	struct dw_mipi_dsi_stm *dsi = clk_to_dw_mipi_dsi_stm(hw);
+	unsigned int idf, ndiv, odf, pll_in_khz, pll_out_khz;
+	u32 val;
+
+	DRM_DEBUG_DRIVER("\n");
+
+	pll_in_khz = (unsigned int)(parent_rate / 1000);
+
+	val = dsi_read(dsi, DSI_WRPCR);
+
+	idf = (val & WRPCR_IDF) >> 11;
+	if (!idf)
+		idf = 1;
+	ndiv = (val & WRPCR_NDIV) >> 2;
+	odf = int_pow(2, (val & WRPCR_ODF) >> 16);
+
+	/* Get the adjusted pll out value */
+	pll_out_khz = dsi_pll_get_clkout_khz(pll_in_khz, idf, ndiv, odf);
+
+	return (unsigned long)pll_out_khz * 1000;
+}
+
+static long dw_mipi_dsi_clk_round_rate(struct clk_hw *hw, unsigned long rate,
+				       unsigned long *parent_rate)
+{
+	struct dw_mipi_dsi_stm *dsi = clk_to_dw_mipi_dsi_stm(hw);
+	unsigned int idf, ndiv, odf, pll_in_khz, pll_out_khz;
+	int ret;
+
+	DRM_DEBUG_DRIVER("\n");
+
+	pll_in_khz = (unsigned int)(*parent_rate / 1000);
+
+	/* Compute best pll parameters */
+	idf = 0;
+	ndiv = 0;
+	odf = 0;
+
+	ret = dsi_pll_get_params(dsi, pll_in_khz, rate / 1000,
+				 &idf, &ndiv, &odf);
[PATCH RESEND v3 0/3] Update STM DSI PHY driver
This patch series adds several features to the dw-mipi-dsi PHY driver that are missing or need updating. The first patch updates a PM macro. The second patch adds runtime PM support to the driver. The third patch adds a clock provider generated by the PHY itself. As explained in the commit log of the second patch, a clock declaration is missing; since this clock is the parent of 'dsi_k', it leads to an orphan clock. Most importantly, this patch anticipates future versions of the DSI PHY and its inclusion within the display subsystem and the DRM framework. The last patch fixes a corner case introduced previously: since 'dsi' and 'dsi_k' are gated by the same bit of the same register, both are referenced as the peripheral clock in the device-tree. --- Changes in v3-resend: - Removed last patch as it has been merged https://lore.kernel.org/lkml/bf49f4c9-9e81-4c91-972d-13782d996...@foss.st.com/ Changes in v3: - Fix smatch warning (disable dsi->pclk when clk_register fails) Changes in v2: - Added patch 1/4 to use SYSTEM_SLEEP_PM_OPS instead of the old macro and removed __maybe_unused accordingly - Changed SET_RUNTIME_PM_OPS to RUNTIME_PM_OPS Raphael Gallais-Pou (3): drm/stm: dsi: use new SYSTEM_SLEEP_PM_OPS() macro drm/stm: dsi: expose DSI PHY internal clock Yannick Fertre (1): drm/stm: dsi: add pm runtime ops drivers/gpu/drm/stm/dw_mipi_dsi-stm.c | 279 ++ 1 file changed, 238 insertions(+), 41 deletions(-) -- 2.25.1
Re: [PATCH v4 00/14] drm: Add a driver for CSF-based Mali GPUs
On Mon, 29 Jan 2024 17:20:47 +0800 (CST) "Andy Yan" wrote: > Hi Boris: > > Thanks for your great work. > > One thing to take note of: > commit (arm64: dts: rockchip: rk3588: Add GPU nodes) in [1] seems to remove the > "disabled" status > of usb_host2_xhci, which may cause a boot issue on some boards that use > combphy2_psu phy for other functions. Oops, should be fixed in https://gitlab.freedesktop.org/panfrost/linux/-/commits/panthor-next+rk3588 now. Thanks, Boris
Re: [PATCH 03/19] drm/i915/dp: Add support to notify MST connectors to retry modesets
On Mon, Jan 29, 2024 at 12:36:12PM +0200, Hogander, Jouni wrote: > On Tue, 2024-01-23 at 12:28 +0200, Imre Deak wrote: > > [...] > > +void > > +intel_dp_queue_modeset_retry_for_link(struct intel_atomic_state *state, > > + struct intel_encoder *encoder, > > + const struct intel_crtc_state > > *crtc_state, > > + const struct drm_connector_state > > *conn_state) > > +{ > > + struct drm_i915_private *i915 = to_i915(crtc_state->uapi.crtc->dev); > > + struct intel_connector *connector; > > + struct intel_digital_connector_state *iter_conn_state; > > + struct intel_dp *intel_dp; > > + int i; > > + > > + if (conn_state) { > > + connector = to_intel_connector(conn_state->connector); > > + intel_dp_queue_modeset_retry_work(connector); > > + > > + return; > > + } > > + > > + if (drm_WARN_ON(&i915->drm, > > + !intel_crtc_has_type(crtc_state, > > INTEL_OUTPUT_DP_MST))) > > + return; > > + > > + intel_dp = enc_to_intel_dp(encoder); > > + > > + for_each_new_intel_connector_in_state(state, connector, > > iter_conn_state, i) { > > + (void)iter_conn_state; > > Checked iter_conn_state->base->crtc documentation: > > @crtc: CRTC to connect connector to, NULL if disabled. > > Do we need to check if connector is "disabled" or is it impossible > scenario? Yes, it does show if the connector is disabled and it would make sense to not notify those. However the check for that would be racy, at least during a non-blocking commit, but I think also in general where userspace could be in the middle of enabling this connector. The point of the notification is that userspace re-checks the mode it wants on each MST connector to be enabled, so to prevent that it would miss the re-check on connectors with a pending enabling like above, the notification is simply sent to all the connectors in the MST topology. 
> > BR, > > Jouni Högander > > > > + > > + if (connector->mst_port != intel_dp) > > + continue; > > + > > + intel_dp_queue_modeset_retry_work(connector); > > + } > > +} > > + > > int > > intel_dp_compute_config(struct intel_encoder *encoder, > > struct intel_crtc_state *pipe_config, > > @@ -6436,6 +6480,14 @@ static void > > intel_dp_modeset_retry_work_fn(struct work_struct *work) > > mutex_unlock(&connector->dev->mode_config.mutex); > > /* Send Hotplug uevent so userspace can reprobe */ > > drm_kms_helper_connector_hotplug_event(connector); > > + > > + drm_connector_put(connector); > > +} > > + > > +void intel_dp_init_modeset_retry_work(struct intel_connector > > *connector) > > +{ > > + INIT_WORK(&connector->modeset_retry_work, > > + intel_dp_modeset_retry_work_fn); > > } > > > > bool > > @@ -6452,8 +6504,7 @@ intel_dp_init_connector(struct > > intel_digital_port *dig_port, > > int type; > > > > /* Initialize the work for modeset in case of link train > > failure */ > > - INIT_WORK(&intel_connector->modeset_retry_work, > > - intel_dp_modeset_retry_work_fn); > > + intel_dp_init_modeset_retry_work(intel_connector); > > > > if (drm_WARN(dev, dig_port->max_lanes < 1, > > "Not enough lanes (%d) for DP on > > [ENCODER:%d:%s]\n", > > diff --git a/drivers/gpu/drm/i915/display/intel_dp.h > > b/drivers/gpu/drm/i915/display/intel_dp.h > > index 530cc97bc42f4..105c2086310db 100644 > > --- a/drivers/gpu/drm/i915/display/intel_dp.h > > +++ b/drivers/gpu/drm/i915/display/intel_dp.h > > @@ -23,6 +23,8 @@ struct intel_digital_port; > > struct intel_dp; > > struct intel_encoder; > > > > +struct work_struct; > > + > > struct link_config_limits { > > int min_rate, max_rate; > > int min_lane_count, max_lane_count; > > @@ -43,6 +45,12 @@ void intel_dp_adjust_compliance_config(struct > > intel_dp *intel_dp, > > bool intel_dp_limited_color_range(const struct intel_crtc_state > > *crtc_state, > > const struct drm_connector_state > > *conn_state); > > int intel_dp_min_bpp(enum 
intel_output_format output_format); > > +void intel_dp_init_modeset_retry_work(struct intel_connector > > *connector); > > +void intel_dp_queue_modeset_retry_work(struct intel_connector > > *connector); > > +void intel_dp_queue_modeset_retry_for_link(struct intel_atomic_state > > *state, > > + struct intel_encoder > > *encoder, > > + const struct > > intel_crtc_state *crtc_state, > > + const struct > > drm_connector_state *conn_state); > > bool intel_dp_init_connector(struct intel_digital_port *dig_port, > > st
Re: Re: Re: [PATCH 3/5] drm/ttm: replace busy placement with flags v6
On Fri, 2024-01-26 at 16:22 -0600, Lucas De Marchi wrote: > On Fri, Jan 26, 2024 at 04:16:58PM -0600, Lucas De Marchi wrote: > > On Thu, Jan 18, 2024 at 05:38:16PM +0100, Thomas Hellström wrote: > > > > > > On 1/17/24 13:27, Thomas Hellström wrote: > > > > > > > > On 1/17/24 11:47, Thomas Hellström wrote: > > > > > Hi, Christian > > > > > > > > > > Xe changes look good. Will send the series to xe ci to check > > > > > for > > > > > regressions. > > > > > > > > Hmm, there are some checkpatch warnings about author / SOB > > > > email > > > > mismatch, > > > > > > With those fixed, this patch is > > > > > > Reviewed-by: Thomas Hellström > > > > > > it actually broke drm-tip now that this is merged: > > > > ../drivers/gpu/drm/xe/xe_bo.c:41:10: error: ‘struct ttm_placement’ > > has no member named ‘num_busy_placement’; did you mean > > ‘num_placement’ > > 41 | .num_busy_placement = 1, > > | ^~ > > | num_placement > > ../drivers/gpu/drm/xe/xe_bo.c:41:31: error: excess elements in > > struct initializer [-Werror] > > 41 | .num_busy_placement = 1, > > | ^ > > > > > > Apparently a conflict with another patch that got applied a few > > days > > ago: a201c6ee37d6 ("drm/xe/bo: Evict VRAM to TT rather than to > > system") > > oh, no... apparently that commit is from a long time ago. The > problem > was that drm-misc-next was not yet in sync with drm-next. Thomas, do > you > have a fixup for this to put in rerere? > > Lucas De Marchi I added this as a manual fixup and ran some quick igt tests. Seems to work.
Re: [PATCH v5 0/3] drm/i915: Fix VMA UAF on destroy against deactivate race
Hi Nirmoy, On Monday, 29 January 2024 10:24:07 CET Nirmoy Das wrote: > Hi Janusz, > > There seems to be a regression in CI related to this: > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_129026v2/bat-dg1-7/ igt@gem_lmem_swapping@random-engi...@lmem0.html#dmesg-warnings1053 > > Please have a look. Yes, that's a problem, the series can't be merged in its current shape. However, I'm not sure if that's really a regression, or rather an exposure of another, already existing issue. It looks like a race between two concurrent calls to our __active_retire() used in DMA fence callbacks. I'm going to verify some ideas for a fix on trybot. Thanks, Janusz > > > Regards, > > Nirmoy > > On 1/24/2024 6:13 PM, Janusz Krzysztofik wrote: > > Object debugging tools were sporadically reporting illegal attempts to > > free a still active i915 VMA object when parking a GT believed to be idle. > > > > [161.359441] ODEBUG: free active (active state 0) object: 88811643b958 object type: i915_active hint: __i915_vma_active+0x0/0x50 [i915] > > [161.360082] WARNING: CPU: 5 PID: 276 at lib/debugobjects.c:514 debug_print_object+0x80/0xb0 > > ... > > [161.360304] CPU: 5 PID: 276 Comm: kworker/5:2 Not tainted 6.5.0-rc1- CI_DRM_13375-g003f860e5577+ #1 > > [161.360314] Hardware name: Intel Corporation Rocket Lake Client Platform/ RocketLake S UDIMM 6L RVP, BIOS RKLSFWI1.R00.3173.A03.2204210138 04/21/2022 > > [161.360322] Workqueue: i915-unordered __intel_wakeref_put_work [i915] > > [161.360592] RIP: 0010:debug_print_object+0x80/0xb0 > > ... 
> > [161.361347] debug_object_free+0xeb/0x110 > > [161.361362] i915_active_fini+0x14/0x130 [i915] > > [161.361866] release_references+0xfe/0x1f0 [i915] > > [161.362543] i915_vma_parked+0x1db/0x380 [i915] > > [161.363129] __gt_park+0x121/0x230 [i915] > > [161.363515] intel_wakeref_put_last+0x1f/0x70 [i915] > > > > That has been tracked down to be happening when another thread is > > deactivating the VMA inside __active_retire() helper, after the VMA's > > active counter has been already decremented to 0, but before deactivation > > of the VMA's object is reported to the object debugging tool. > > > > We could prevent from that race by serializing i915_active_fini() with > > __active_retire() via ref->tree_lock, but that wouldn't stop the VMA from > > being used, e.g. from __i915_vma_retire() called at the end of > > __active_retire(), after that VMA has been already freed by a concurrent > > i915_vma_destroy() on return from the i915_active_fini(). Then, we should > > rather fix the issue at the VMA level, not in i915_active. > > > > Since __i915_vma_parked() is called from __gt_park() on last put of the > > GT's wakeref, the issue could be addressed by holding the GT wakeref long > > enough for __active_retire() to complete before that wakeref is released > > and the GT parked. > > > > A VMA associated with a request doesn't acquire a GT wakeref by itself. > > Instead, it depends on a wakeref held directly by the request's active > > intel_context for a GT associated with its VM, and indirectly on that > > intel_context's engine wakeref if the engine belongs to the same GT as the > > VMA's VM. Those wakerefs are released asynchronously to VMA deactivation. > > > > In case of single-GT platforms, at least one of those wakerefs is usually > > held long enough for the request's VMA to be deactivated on time, before > > it is destroyed on last put of its VM GT wakeref. 
However, on multi-GT > > platforms, a request may use a VMA from a GT other than the one that hosts > > the request's engine, then it is protected only with the intel_context's > > VM GT wakeref. > > > > There was an attempt to fix the issue on 2-GT Meteor Lake by acquiring an > > extra wakeref for a Primary GT from i915_gem_do_execbuffer() -- see commit > > f56fe3e91787 ("drm/i915: Fix a VMA UAF for multi-gt platform"). However, > > that fix proved insufficient -- the issue was still reported by CI. > > That wakeref was released on exit from i915_gem_do_execbuffer(), then > > potentially before completion of the request and deactivation of its > > associated VMAs. Moreover, CI reports indicated that single-GT platforms > > also suffered sporadically from the same race. > > > > I believe the issue was introduced by commit d93939730347 ("drm/i915: > > Remove the vma refcount") which moved a call to i915_active_fini() from > > a dropped i915_vma_release(), called on last put of the removed VMA kref, > > to i915_vma_parked() processing path called on last put of a GT wakeref. > > However, its visibility to the object debugging tool was suppressed by a > > bug in i915_active that was fixed two weeks later with commit e92eb246feb9 > > ("drm/i915/active: Fix missing debug object activation"). > > > > Fix the issue by getting a wakeref for the VMA's GT when activating it, > > and putting that wakeref only after the VMA is deactivated. However, > > exclude global GTT from that processing path, otherwise the GP
Re: [PATCH 2/8] video: Provide screen_info_get_pci_dev() to find screen_info's PCI device
Thomas Zimmermann writes: > Add screen_info_get_pci_dev() to find the PCI device of an instance > of screen_info. Does nothing on systems without PCI bus. > > Signed-off-by: Thomas Zimmermann > --- [...] > +struct pci_dev *screen_info_pci_dev(const struct screen_info *si) > +{ > + struct resource res[SCREEN_INFO_MAX_RESOURCES]; > + size_t i, numres; > + int ret; > + > + ret = screen_info_resources(si, res, ARRAY_SIZE(res)); > + if (ret < 0) > + return ERR_PTR(ret); > + numres = ret; > + I would just drop the ret variable and assign the screen_info_resources() return value to numres. I think that makes the code easier to follow. Reviewed-by: Javier Martinez Canillas -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [PATCH v2 10/10] drm/vboxvideo: fix mapping leaks
Hi Philipp, On 1/23/24 10:43, Philipp Stanner wrote: > When the PCI devres API was introduced to this driver, it was wrongly > assumed that initializing the device with pcim_enable_device() instead > of pci_enable_device() will make all PCI functions managed. > > This is wrong and was caused by the quite confusing devres API for PCI > in which some, but not all, functions become managed that way. > > The function pci_iomap_range() is never managed. > > Replace pci_iomap_range() with the actually managed function > pcim_iomap_range(). > > Additionally, add a call to pcim_request_region() to ensure exclusive > access to BAR 0. I'm a bit worried about this last change. There might be issues where the pcim_request_region() fails due to e.g. a conflict with the simplefb / simpledrm code. There is a drm_aperture_remove_conflicting_pci_framebuffers() call done before hw_init() gets called, but still this has been known to cause issues in the past. Can you split out the adding of the pcim_request_region() into a separate patch and *not* mark that separate patch for stable ? Regards, Hans > > CC: # v5.10+ > Fixes: 8558de401b5f ("drm/vboxvideo: use managed pci functions") > Signed-off-by: Philipp Stanner > --- > drivers/gpu/drm/vboxvideo/vbox_main.c | 24 +--- > 1 file changed, 13 insertions(+), 11 deletions(-) > > diff --git a/drivers/gpu/drm/vboxvideo/vbox_main.c > b/drivers/gpu/drm/vboxvideo/vbox_main.c > index 42c2d8a99509..7f686a0190e6 100644 > --- a/drivers/gpu/drm/vboxvideo/vbox_main.c > +++ b/drivers/gpu/drm/vboxvideo/vbox_main.c > @@ -42,12 +42,11 @@ static int vbox_accel_init(struct vbox_private *vbox) > /* Take a command buffer for each screen from the end of usable VRAM. 
*/ > vbox->available_vram_size -= vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE; > > - vbox->vbva_buffers = pci_iomap_range(pdev, 0, > - vbox->available_vram_size, > - vbox->num_crtcs * > - VBVA_MIN_BUFFER_SIZE); > - if (!vbox->vbva_buffers) > - return -ENOMEM; > + vbox->vbva_buffers = pcim_iomap_range( > + pdev, 0, vbox->available_vram_size, > + vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE); > + if (IS_ERR(vbox->vbva_buffers)) > + return PTR_ERR(vbox->vbva_buffers); > > for (i = 0; i < vbox->num_crtcs; ++i) { > vbva_setup_buffer_context(&vbox->vbva_info[i], > @@ -115,12 +114,15 @@ int vbox_hw_init(struct vbox_private *vbox) > > DRM_INFO("VRAM %08x\n", vbox->full_vram_size); > > + ret = pcim_request_region(pdev, 0, "vboxvideo"); > + if (ret) > + return ret; > + > /* Map guest-heap at end of vram */ > - vbox->guest_heap = > - pci_iomap_range(pdev, 0, GUEST_HEAP_OFFSET(vbox), > - GUEST_HEAP_SIZE); > - if (!vbox->guest_heap) > - return -ENOMEM; > + vbox->guest_heap = pcim_iomap_range(pdev, 0, > + GUEST_HEAP_OFFSET(vbox), GUEST_HEAP_SIZE); > + if (IS_ERR(vbox->guest_heap)) > + return PTR_ERR(vbox->guest_heap); > > /* Create guest-heap mem-pool use 2^4 = 16 byte chunks */ > vbox->guest_pool = devm_gen_pool_create(vbox->ddev.dev, 4, -1,
Re: [PATCH 3/8] firmware/sysfb: Set firmware-framebuffer parent device
Thomas Zimmermann writes: > Set the firmware framebuffer's parent device, which usually is the > graphics hardware's physical device. Integrates the framebuffer in > the Linux device hierarchy and lets Linux handle dependencies among > devices. For example, the graphics hardware won't be suspended while > the firmware device is still active. > > Signed-off-by: Thomas Zimmermann > --- > drivers/firmware/sysfb.c | 11 ++- > drivers/firmware/sysfb_simplefb.c | 5 - > include/linux/sysfb.h | 3 ++- > 3 files changed, 16 insertions(+), 3 deletions(-) > > diff --git a/drivers/firmware/sysfb.c b/drivers/firmware/sysfb.c > index 19706bd2642a..8a42da3f67a9 100644 > --- a/drivers/firmware/sysfb.c > +++ b/drivers/firmware/sysfb.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > #include > #include > #include > @@ -72,6 +73,8 @@ EXPORT_SYMBOL_GPL(sysfb_disable); > static __init int sysfb_init(void) > { > const struct screen_info *si = &screen_info; > + struct device *parent = NULL; > + struct pci_dev *pparent; Maybe pci_parent? It's easier to read than pparent IMO. Reviewed-by: Javier Martinez Canillas -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [PATCH 4/8] fbdev/efifb: Remove PM for parent device
Thomas Zimmermann writes: > The EFI device has the correct parent device set. This allows Linux > to handle the power management internally. Hence, remove the manual > PM management for the parent device from efifb. > > Signed-off-by: Thomas Zimmermann > --- Reviewed-by: Javier Martinez Canillas -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [PATCH 5/8] firmware/sysfb: Create firmware device only for enabled PCI devices
Thomas Zimmermann writes: > Test if the firmware framebuffer's parent PCI device, if any, has > been enabled. If not, the firmware framebuffer is most likely not > working. Hence, do not create a device for the firmware framebuffer > on disabled PCI devices. > > So far, efifb tracked the status of the PCI parent device internally > and did not bind if it was disabled. This patch implements the > functionality for all firmware framebuffers. > > Signed-off-by: Thomas Zimmermann > --- [...] > > +static __init bool sysfb_pci_dev_is_enabled(struct pci_dev *pdev) > +{ > +#if defined(CONFIG_PCI) > + /* > + * TODO: Try to integrate this code into the PCI subsystem > + */ > + int ret; > + u16 command; > + > + ret = pci_read_config_word(pdev, PCI_COMMAND, &command); > + if (ret != PCIBIOS_SUCCESSFUL) > + return false; > + if (!(command & PCI_COMMAND_MEMORY)) > + return false; > + return true; > +#else > + // Getting here without PCI support is probably a bug. > + return false; Should we warn before return in this case ? Reviewed-by: Javier Martinez Canillas -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [PATCH 6/8] fbdev/efifb: Do not track parent device status
Thomas Zimmermann writes: > There will be no EFI framebuffer device for disabled parent devices > and thus we never probe efifb in that case. Hence remove the tracking > code from efifb. > > Signed-off-by: Thomas Zimmermann > --- Nice cleanup. Reviewed-by: Javier Martinez Canillas -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [PATCH 7/8] firmware/sysfb: Update screen_info for relocated EFI framebuffers
Thomas Zimmermann writes: > On ARM PCI systems, the PCI hierarchy might be reconfigured during > boot and the firmware framebuffer might move as a result of that. > The values in screen_info will then be invalid. > > Work around this problem by tracking the framebuffer's initial > location before it gets relocated; then fix the screen_info state > between relocation and creating the firmware framebuffer's device. > > This functionality has been lifted from efifb. See the commit message > of commit 55d728a40d36 ("efi/fb: Avoid reconfiguration of BAR that > covers the framebuffer") for more information. > > Signed-off-by: Thomas Zimmermann > --- [...] > #if defined(CONFIG_PCI) Shouldn't this be && !defined(CONFIG_X86)? Or maybe && defined(CONFIG_ARM64), although I don't know if the same also applies to other EFI platforms (e.g. CONFIG_RISCV). Reviewed-by: Javier Martinez Canillas -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [PATCH 7/8] firmware/sysfb: Update screen_info for relocated EFI framebuffers
Javier Martinez Canillas writes: > Thomas Zimmermann writes: > >> On ARM PCI systems, the PCI hierarchy might be reconfigured during >> boot and the firmware framebuffer might move as a result of that. >> The values in screen_info will then be invalid. >> >> Work around this problem by tracking the framebuffer's initial >> location before it get relocated; then fix the screen_info state >> between reloaction and creating the firmware framebuffer's device. >> >> This functionality has been lifted from efifb. See the commit message >> of commit 55d728a40d36 ("efi/fb: Avoid reconfiguration of BAR that >> covers the framebuffer") for more information. >> >> Signed-off-by: Thomas Zimmermann >> --- > > [...] > >> #if defined(CONFIG_PCI) > > Shouldn't this be && !defined(CONFIG_X86) ? Or maybe && > defined(CONFIG_ARM64), although I don't know if the same > also applies to other EFI platforms (e.g: CONFIG_RISCV). > Answering my own question, the !defined(CONFIG_X86) was dropped in the commit dcf8f5ce3165 ("drivers/fbdev/efifb: Allow BAR to be moved instead of claiming it"). The rationale is explained in that commit message: While this is less likely to occur on x86, given that the firmware's PCI resource allocation is more likely to be preserved, this is a worthwhile sanity check to have in place, and so let's remove the preprocessor conditional that makes it !X86 only. So it is OK to just guard with #if defined(CONFIG_PCI). -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [PATCH 8/8] fbdev/efifb: Remove framebuffer relocation tracking
Thomas Zimmermann writes: > If the firmware framebuffer has been relocated, the sysfb code > fixes the screen_info state before it creates the framebuffer's > platform device. Efifb will automatically receive a screen_info > with updated values. Hence remove the tracking from efifb. > > Signed-off-by: Thomas Zimmermann > --- Reviewed-by: Javier Martinez Canillas -- Best regards, Javier Martinez Canillas Core Platforms Red Hat
Re: [Linaro-mm-sig] [PATCH 2/3] udmabuf: Sync buffer mappings for attached devices
On 26.01.24 at 18:24, Andrew Davis wrote: On 1/25/24 2:30 PM, Daniel Vetter wrote: On Tue, Jan 23, 2024 at 04:12:26PM -0600, Andrew Davis wrote: Currently this driver creates a SGT table using the CPU as the target device, then performs the dma_sync operations against that SGT. This is backwards to how DMA-BUFs are supposed to behave. This may have worked for the case where these buffers were given only back to the same CPU that produced them as in the QEMU case. And only then because the original author had the dma_sync operations also backwards, syncing for the "device" on begin_cpu. This was noticed and "fixed" in this patch[0]. That then meant we were sync'ing from the CPU to the CPU using a pseudo-device "miscdevice". Which then caused another issue due to the miscdevice not having a proper DMA mask (and why should it, the CPU is not a DMA device). The fix for that was an even more egregious hack[1] that declares the CPU is coherent with itself and can access its own memory space. Unwind all this and perform the correct action by doing the dma_sync operations for each device currently attached to the backing buffer. [0] commit 1ffe09590121 ("udmabuf: fix dma-buf cpu access") [1] commit 9e9fa6a9198b ("udmabuf: Set the DMA mask for the udmabuf device (v2)") Signed-off-by: Andrew Davis So yeah the above hacks are terrible, but I don't think this is better. What you're doing now is that you're potentially doing the flushing multiple times, so if you have a lot of importers with live mappings this is a performance regression. I'd take lower performing but correct than fast and broken. :) Syncing for CPU/device is about making sure the CPU/device can see the data produced by the other. Some devices might be dma-coherent and syncing for them would be a NOP, but we can't know that here in this driver. Let's say we have two attached devices, one that is cache coherent and one that isn't. 
If we only sync for first attached device then that is converted to a NOP and we never flush like the second device needed. Same is true for devices behind IOMMU or with an L3 cache when syncing in the other direction for CPU. So we have to sync for all attached devices to ensure we get even the lowest common denominator device sync'd. It is up to the DMA-API layer to decide which syncs need to actually do something. If all attached devices are coherent then all syncs will be NOPs and we have no performance penalty. It's probably time to bite the bullet and teach the dma-api about flushing for multiple devices. Or some way we can figure out which is the one device we need to pick which gives us the right amount of flushing. Seems like a constraint solving micro-optimization. The DMA-API layer would have to track which buffers have already been flushed from CPU cache and also track that nothing has been written into those caches since that point, only then could it skip the flush. But that is already the point of the dirty bit in the caches themselves, cleaning already clean cache lines is essentially free in hardware. And so is invalidating lines, it is just flipping a bit. Well to separate the functionality a bit. What the DMA-API should provide is abstracting how the platform does flushing and invalidation of caches and the information which devices uses which caches and what needs to be flushed/invalidated to allow access between devices and the CPU. In other words what's necessary is the following: 1. sync device to cpu 2. sync cpu to device 3. sync device to device 1 and 2 are already present and implemented for years, but 3 is missing together with some of the necessary infrastructure to actually implement this. E.g. we don't know which devices write into which caches etc... On top of this we need the functionality to track who has accessed which piece of data and what DMA-API functions needs to be called to make things work for a specific use case. 
But this is then DMA-buf, I/O layer drivers etc.. and should not belong into the DMA-API. I also strongly think that putting the SWIOTLB bounce buffer functionality into the DMA-API was not the right choice. Regards, Christian. Andrew Cheers, Sima --- drivers/dma-buf/udmabuf.c | 41 +++ 1 file changed, 16 insertions(+), 25 deletions(-) diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c index 3a23f0a7d112a..ab6764322523c 100644 --- a/drivers/dma-buf/udmabuf.c +++ b/drivers/dma-buf/udmabuf.c @@ -26,8 +26,6 @@ MODULE_PARM_DESC(size_limit_mb, "Max size of a dmabuf, in megabytes. Default is struct udmabuf { pgoff_t pagecount; struct page **pages; - struct sg_table *sg; - struct miscdevice *device; struct list_head attachments; struct mutex lock; }; @@ -169,12 +167,8 @@ static void unmap_udmabuf(struct dma_buf_attachment *at, static void release_udmabuf(struct dma_buf *buf) { struct udmabuf *ubuf = buf->priv; - struct de
[PATCH] backlight: ktz8866: Correct the check for of_property_read_u32
of_property_read_u32() returns 0 on success, so the checks must be inverted for the intended branch to be taken.

Fixes: f8449c8f7355 ("backlight: ktz8866: Add support for Kinetic KTZ8866 backlight")
Signed-off-by: Jianhua Lu
---
 drivers/video/backlight/ktz8866.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/video/backlight/ktz8866.c b/drivers/video/backlight/ktz8866.c
index 9c980f2571ee..014877b5a984 100644
--- a/drivers/video/backlight/ktz8866.c
+++ b/drivers/video/backlight/ktz8866.c
@@ -97,20 +97,20 @@ static void ktz8866_init(struct ktz8866 *ktz)
 {
 	unsigned int val = 0;

-	if (of_property_read_u32(ktz->client->dev.of_node, "current-num-sinks", &val))
+	if (!of_property_read_u32(ktz->client->dev.of_node, "current-num-sinks", &val))
 		ktz8866_write(ktz, BL_EN, BIT(val) - 1);
 	else
 		/* Enable all 6 current sinks if the number of current sinks isn't specified. */
 		ktz8866_write(ktz, BL_EN, BIT(6) - 1);

-	if (of_property_read_u32(ktz->client->dev.of_node, "kinetic,current-ramp-delay-ms", &val)) {
+	if (!of_property_read_u32(ktz->client->dev.of_node, "kinetic,current-ramp-delay-ms", &val)) {
 		if (val <= 128)
 			ktz8866_write(ktz, BL_CFG2, BIT(7) | (ilog2(val) << 3) | PWM_HYST);
 		else
 			ktz8866_write(ktz, BL_CFG2, BIT(7) | ((5 + val / 64) << 3) | PWM_HYST);
 	}

-	if (of_property_read_u32(ktz->client->dev.of_node, "kinetic,led-enable-ramp-delay-ms", &val)) {
+	if (!of_property_read_u32(ktz->client->dev.of_node, "kinetic,led-enable-ramp-delay-ms", &val)) {
 		if (val == 0)
 			ktz8866_write(ktz, BL_DIMMING, 0);
 		else {
-- 
2.43.0
Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure
On 27.01.24 at 17:50, Jonathan Cameron wrote: + iio_buffer_dmabuf_put(attach); + +out_dmabuf_put: + dma_buf_put(dmabuf); As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have. I'm working on the patches right now, just one quick question. Having a __free(dma_buf_put) requires that dma_buf_put is first "registered" as a freeing function using DEFINE_FREE() in , which has not been done yet. That would mean carrying a dma-buf specific patch in your tree, are you OK with that? Needs an ACK from appropriate maintainer, but otherwise I'm fine doing so. Alternative is to circle back to this later after this code is upstream. Separate patches for that please, the autocleanup feature is so new that I'm not 100% convinced that everything works out smoothly from the start. Regards, Christian. Cheers, -Paul
Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure
Hi Christian, On Monday, 29 January 2024 at 13:52 +0100, Christian König wrote: > On 27.01.24 at 17:50, Jonathan Cameron wrote: > > > > > + iio_buffer_dmabuf_put(attach); > > > > > + > > > > > +out_dmabuf_put: > > > > > + dma_buf_put(dmabuf); > > > > As below. Feels like a __free(dma_buf_put) bit of magic would > > > > be a > > > > nice to have. > > > I'm working on the patches right now, just one quick question. > > > > > > Having a __free(dma_buf_put) requires that dma_buf_put is first > > > "registered" as a freeing function using DEFINE_FREE() in > > > > > buf.h>, which has not been done yet. > > > > > > That would mean carrying a dma-buf specific patch in your tree, > > > are you > > > OK with that? > > Needs an ACK from appropriate maintainer, but otherwise I'm fine > > doing > > so. Alternative is to circle back to this later after this code is > > upstream. > > Separate patches for that please, the autocleanup feature is so new > that > I'm not 100% convinced that everything works out smoothly from the > start. Separate patches is a given, did you mean outside this patchset? Because I can send a separate patchset that introduces scope-based management for dma_fence and dma_buf, but then it won't have users. Cheers, -Paul
Re: [PATCH v2 10/10] drm/vboxvideo: fix mapping leaks
Hi, On Mon, 2024-01-29 at 12:15 +0100, Hans de Goede wrote: > Hi Philipp, > > On 1/23/24 10:43, Philipp Stanner wrote: > > When the PCI devres API was introduced to this driver, it was > > wrongly > > assumed that initializing the device with pcim_enable_device() > > instead > > of pci_enable_device() will make all PCI functions managed. > > > > This is wrong and was caused by the quite confusing devres API for > > PCI > > in which some, but not all, functions become managed that way. > > > > The function pci_iomap_range() is never managed. > > > > Replace pci_iomap_range() with the actually managed function > > pcim_iomap_range(). > > > > Additionally, add a call to pcim_request_region() to ensure > > exclusive > > access to BAR 0. > > I'm a bit worried about this last change. There might be > issues where the pcim_request_region() fails due to > e.g. a conflict with the simplefb / simpledrm code. > > There is a drm_aperture_remove_conflicting_pci_framebuffers() > call done before hw_init() gets called, but still this > has been known to cause issues in the past. > > Can you split out the adding of the pcim_request_region() > into a separate patch and *not* mark that separate patch > for stable ? Yes, that sounds reasonable. I'll split it out and deal with it once I send the other DRM patches from my backlog. Greetings, P. 
> > Regards, > > Hans > > > > > > > > > CC: # v5.10+ > > Fixes: 8558de401b5f ("drm/vboxvideo: use managed pci functions") > > Signed-off-by: Philipp Stanner > > --- > > drivers/gpu/drm/vboxvideo/vbox_main.c | 24 +-- > > - > > 1 file changed, 13 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/gpu/drm/vboxvideo/vbox_main.c > > b/drivers/gpu/drm/vboxvideo/vbox_main.c > > index 42c2d8a99509..7f686a0190e6 100644 > > --- a/drivers/gpu/drm/vboxvideo/vbox_main.c > > +++ b/drivers/gpu/drm/vboxvideo/vbox_main.c > > @@ -42,12 +42,11 @@ static int vbox_accel_init(struct vbox_private > > *vbox) > > /* Take a command buffer for each screen from the end of > > usable VRAM. */ > > vbox->available_vram_size -= vbox->num_crtcs * > > VBVA_MIN_BUFFER_SIZE; > > > > - vbox->vbva_buffers = pci_iomap_range(pdev, 0, > > - vbox- > > >available_vram_size, > > - vbox->num_crtcs * > > - VBVA_MIN_BUFFER_SIZE); > > - if (!vbox->vbva_buffers) > > - return -ENOMEM; > > + vbox->vbva_buffers = pcim_iomap_range( > > + pdev, 0, vbox->available_vram_size, > > + vbox->num_crtcs * VBVA_MIN_BUFFER_SIZE); > > + if (IS_ERR(vbox->vbva_buffers)) > > + return PTR_ERR(vbox->vbva_buffers); > > > > for (i = 0; i < vbox->num_crtcs; ++i) { > > vbva_setup_buffer_context(&vbox->vbva_info[i], > > @@ -115,12 +114,15 @@ int vbox_hw_init(struct vbox_private *vbox) > > > > DRM_INFO("VRAM %08x\n", vbox->full_vram_size); > > > > + ret = pcim_request_region(pdev, 0, "vboxvideo"); > > + if (ret) > > + return ret; > > + > > /* Map guest-heap at end of vram */ > > - vbox->guest_heap = > > - pci_iomap_range(pdev, 0, GUEST_HEAP_OFFSET(vbox), > > - GUEST_HEAP_SIZE); > > - if (!vbox->guest_heap) > > - return -ENOMEM; > > + vbox->guest_heap = pcim_iomap_range(pdev, 0, > > + GUEST_HEAP_OFFSET(vbox), GUEST_HEAP_SIZE); > > + if (IS_ERR(vbox->guest_heap)) > > + return PTR_ERR(vbox->guest_heap); > > > > /* Create guest-heap mem-pool use 2^4 = 16 byte chunks */ > > vbox->guest_pool = devm_gen_pool_create(vbox->ddev.dev, 4, > > 
-1, >
Re: [PATCH v2 1/1] drm/virtio: Implement device_attach
Am 29.01.24 um 11:31 schrieb Julia Zhang: Vram objects don't have backing pages and thus can't implement the drm_gem_object_funcs.get_sg_table callback. This removes the drm dma-buf callbacks in virtgpu_gem_map_dma_buf()/virtgpu_gem_unmap_dma_buf() and implements virtgpu-specific map/unmap/attach callbacks to support both shmem objects and vram objects. Signed-off-by: Julia Zhang I need to find more time to look into the code, but offhand I would say that this is the correct solution. Regards, Christian. --- drivers/gpu/drm/virtio/virtgpu_prime.c | 40 +++--- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/virtio/virtgpu_prime.c b/drivers/gpu/drm/virtio/virtgpu_prime.c index 44425f20d91a..b490a5343b06 100644 --- a/drivers/gpu/drm/virtio/virtgpu_prime.c +++ b/drivers/gpu/drm/virtio/virtgpu_prime.c @@ -49,11 +49,26 @@ virtgpu_gem_map_dma_buf(struct dma_buf_attachment *attach, { struct drm_gem_object *obj = attach->dmabuf->priv; struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj); + struct sg_table *sgt; + int ret; if (virtio_gpu_is_vram(bo)) return virtio_gpu_vram_map_dma_buf(bo, attach->dev, dir); - return drm_gem_map_dma_buf(attach, dir); + sgt = drm_prime_pages_to_sg(obj->dev, + to_drm_gem_shmem_obj(obj)->pages, + obj->size >> PAGE_SHIFT); + if (IS_ERR(sgt)) + return sgt; + + ret = dma_map_sgtable(attach->dev, sgt, dir, DMA_ATTR_SKIP_CPU_SYNC); + if (ret) { + sg_free_table(sgt); + kfree(sgt); + return ERR_PTR(ret); + } + + return sgt; } static void virtgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach, @@ -63,12 +78,29 @@ static void virtgpu_gem_unmap_dma_buf(struct dma_buf_attachment *attach, struct drm_gem_object *obj = attach->dmabuf->priv; struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj); + if (!sgt) + return; + if (virtio_gpu_is_vram(bo)) { virtio_gpu_vram_unmap_dma_buf(attach->dev, sgt, dir); - return; + } else { + dma_unmap_sgtable(attach->dev, sgt, dir, DMA_ATTR_SKIP_CPU_SYNC); + sg_free_table(sgt); + 
kfree(sgt); } +} + +static int virtgpu_gem_device_attach(struct dma_buf *dma_buf, +struct dma_buf_attachment *attach) +{ + struct drm_gem_object *obj = attach->dmabuf->priv; + struct virtio_gpu_object *bo = gem_to_virtio_gpu_obj(obj); + int ret = 0; + + if (!virtio_gpu_is_vram(bo) && obj->funcs->pin) + ret = obj->funcs->pin(obj); - drm_gem_unmap_dma_buf(attach, sgt, dir); + return ret; } static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops = { @@ -83,7 +115,7 @@ static const struct virtio_dma_buf_ops virtgpu_dmabuf_ops = { .vmap = drm_gem_dmabuf_vmap, .vunmap = drm_gem_dmabuf_vunmap, }, - .device_attach = drm_gem_map_attach, + .device_attach = virtgpu_gem_device_attach, .get_uuid = virtgpu_virtio_get_uuid, };
Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure
Am 29.01.24 um 14:06 schrieb Paul Cercueil: Hi Christian, Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit : Am 27.01.24 um 17:50 schrieb Jonathan Cameron: + iio_buffer_dmabuf_put(attach); + +out_dmabuf_put: + dma_buf_put(dmabuf); As below. Feels like a __free(dma_buf_put) bit of magic would be a nice to have. I'm working on the patches right now, just one quick question. Having a __free(dma_buf_put) requires that dma_buf_put is first "registered" as a freeing function using DEFINE_FREE() in , which has not been done yet. That would mean carrying a dma-buf specific patch in your tree, are you OK with that? Needs an ACK from appropriate maintainer, but otherwise I'm fine doing so. Alternative is to circle back to this later after this code is upstream. Separate patches for that please, the autocleanup feature is so new that I'm not 100% convinced that everything works out smoothly from the start. Separate patches is a given, did you mean outside this patchset? Because I can send a separate patchset that introduces scope-based management for dma_fence and dma_buf, but then it won't have users. Outside of the patchset, this is essentially brand new stuff. IIRC we have quite a number of dma_fence selftests and sw_sync which is basically code inside the drivers/dma-buf directory only there for testing DMA-buf functionality. Convert those over as well and I'm more than happy to upstream this change. Thanks, Christian. Cheers, -Paul
[PATCH 0/5] drm/msm: Add display support for X1E80100
This patchset adds display support for X1E80100. The support for embedded DisplayPort on this platform will not be enabled using the connector type from driver match data, but through an 'is-edp' property via DT. That subsequent work will be part of a separate patchset. Signed-off-by: Abel Vesa --- Abel Vesa (4): dt-bindings: display/msm: document MDSS on X1E80100 dt-bindings: display/msm: Document the DPU for X1E80100 drm/msm: mdss: Add X1E80100 support drm/msm/dpu: Add X1E80100 support Abhinav Kumar (1): drm/msm/dp: Try looking for link-frequencies into the port@0's endpoint first .../bindings/display/msm/qcom,sm8650-dpu.yaml | 5 +- .../bindings/display/msm/qcom,x1e80100-mdss.yaml | 249 .../drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h | 449 + drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 2 + drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h | 1 + drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c| 1 + drivers/gpu/drm/msm/dp/dp_parser.c | 6 +- drivers/gpu/drm/msm/msm_mdss.c | 10 + 8 files changed, 721 insertions(+), 2 deletions(-) --- base-commit: 6776c8d0924953c6bbd4920d8408f4c1d898af71 change-id: 20231201-x1e80100-display-a46324400baf Best regards, -- Abel Vesa
[PATCH 1/5] dt-bindings: display/msm: document MDSS on X1E80100
Document the MDSS hardware found on the Qualcomm X1E80100 platform. Signed-off-by: Abel Vesa --- .../bindings/display/msm/qcom,x1e80100-mdss.yaml | 249 + 1 file changed, 249 insertions(+) diff --git a/Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.yaml b/Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.yaml new file mode 100644 index ..eaa91f7d61ac --- /dev/null +++ b/Documentation/devicetree/bindings/display/msm/qcom,x1e80100-mdss.yaml @@ -0,0 +1,249 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/display/msm/qcom,x1e80100-mdss.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Qualcomm X1E80100 Display MDSS + +maintainers: + - Abel Vesa + +description: + X1E80100 MSM Mobile Display Subsystem(MDSS), which encapsulates sub-blocks like + DPU display controller, DP interfaces, etc. + +$ref: /schemas/display/msm/mdss-common.yaml# + +properties: + compatible: +const: qcom,x1e80100-mdss + + clocks: +items: + - description: Display AHB + - description: Display hf AXI + - description: Display core + + iommus: +maxItems: 1 + + interconnects: +maxItems: 3 + + interconnect-names: +maxItems: 3 + +patternProperties: + "^display-controller@[0-9a-f]+$": +type: object +properties: + compatible: +const: qcom,x1e80100-dpu + + "^displayport-controller@[0-9a-f]+$": +type: object +properties: + compatible: +const: qcom,x1e80100-dp + + "^phy@[0-9a-f]+$": +type: object +properties: + compatible: +const: qcom,x1e80100-dp-phy + +required: + - compatible + +unevaluatedProperties: false + +examples: + - | +#include +#include +#include + +display-subsystem@ae0 { +compatible = "qcom,x1e80100-mdss"; +reg = <0x0ae0 0x1000>; +reg-names = "mdss"; + +interconnects = <&mmss_noc MASTER_MDP 0 &gem_noc SLAVE_LLCC 0>, +<&mc_virt MASTER_LLCC 0 &mc_virt SLAVE_EBI1 0>; +<&gem_noc MASTER_APPSS_PROC 0 &config_noc SLAVE_DISPLAY_CFG 0>; +interconnect-names = "mdp0-mem", "mdp1-mem", "cpu-cfg"; 
+ +resets = <&dispcc_core_bcr>; + +power-domains = <&dispcc_gdsc>; + +clocks = <&dispcc DISP_CC_MDSS_AHB_CLK>, + <&gcc GCC_DISP_AHB_CLK>, + <&gcc GCC_DISP_HF_AXI_CLK>, + <&dispcc DISP_CC_MDSS_MDP_CLK>; +clock-names = "iface", "bus", "nrt_bus", "core"; + +interrupts = ; +interrupt-controller; +#interrupt-cells = <1>; + +iommus = <&apps_smmu 0x1c00 0x2>; + +#address-cells = <1>; +#size-cells = <1>; +ranges; + +display-controller@ae01000 { +compatible = "qcom,x1e80100-dpu"; +reg = <0x0ae01000 0x8f000>, + <0x0aeb 0x2008>; +reg-names = "mdp", "vbif"; + +clocks = <&gcc_axi_clk>, + <&dispcc_ahb_clk>, + <&dispcc_mdp_lut_clk>, + <&dispcc_mdp_clk>, + <&dispcc_mdp_vsync_clk>; +clock-names = "nrt_bus", + "iface", + "lut", + "core", + "vsync"; + +assigned-clocks = <&dispcc_mdp_vsync_clk>; +assigned-clock-rates = <1920>; + +operating-points-v2 = <&mdp_opp_table>; +power-domains = <&rpmhpd RPMHPD_MMCX>; + +interrupt-parent = <&mdss>; +interrupts = <0>; + +ports { +#address-cells = <1>; +#size-cells = <0>; + +port@0 { +reg = <0>; +dpu_intf1_out: endpoint { +remote-endpoint = <&dsi0_in>; +}; +}; + +port@1 { +reg = <1>; +dpu_intf2_out: endpoint { +remote-endpoint = <&dsi1_in>; +}; +}; +}; + +mdp_opp_table: opp-table { +compatible = "operating-points-v2"; + +opp-2 { +opp-hz = /bits/ 64 <2>; +required-opps = <&rpmhpd_opp_low_svs>; +}; + +opp-32500 { +opp-hz = /bits/ 64 <32500>; +required-opps = <&rpmhpd_opp_svs>; +}; + +opp-37500 { +opp-hz = /bits/ 64 <37500>; +required-opps = <&rpmhpd_opp_svs_l1>; +}; + +opp-51400 { +opp-hz = /bits/ 64 <51400>; +required-opps = <&rpmhpd_opp_nom>; +}; +
[PATCH 2/5] dt-bindings: display/msm: Document the DPU for X1E80100
Document the DPU for Qualcomm X1E80100 platform in the SM8650 schema, as they are similar. Signed-off-by: Abel Vesa --- Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml b/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml index a01d15a03317..f84fa6d5e6a2 100644 --- a/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml +++ b/Documentation/devicetree/bindings/display/msm/qcom,sm8650-dpu.yaml @@ -13,7 +13,10 @@ $ref: /schemas/display/msm/dpu-common.yaml# properties: compatible: -const: qcom,sm8650-dpu +items: + - enum: + - qcom,sm8650-dpu + - qcom,x1e80100-dpu reg: items: -- 2.34.1
[PATCH 3/5] drm/msm: mdss: Add X1E80100 support
Add support for MDSS on X1E80100. Signed-off-by: Abel Vesa --- drivers/gpu/drm/msm/msm_mdss.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c index 455b2e3a0cdd..eddf7fdbb60a 100644 --- a/drivers/gpu/drm/msm/msm_mdss.c +++ b/drivers/gpu/drm/msm/msm_mdss.c @@ -564,6 +564,15 @@ static const struct msm_mdss_data sdm670_data = { .highest_bank_bit = 1, }; +static const struct msm_mdss_data x1e80100_data = { + .ubwc_enc_version = UBWC_4_0, + .ubwc_dec_version = UBWC_4_3, + .ubwc_swizzle = 6, + .ubwc_static = 1, + .highest_bank_bit = 2, + .macrotile_mode = 1, +}; + static const struct msm_mdss_data sdm845_data = { .ubwc_enc_version = UBWC_2_0, .ubwc_dec_version = UBWC_2_0, @@ -655,6 +664,7 @@ static const struct of_device_id mdss_dt_match[] = { { .compatible = "qcom,sm8450-mdss", .data = &sm8350_data }, { .compatible = "qcom,sm8550-mdss", .data = &sm8550_data }, { .compatible = "qcom,sm8650-mdss", .data = &sm8550_data}, + { .compatible = "qcom,x1e80100-mdss", .data = &x1e80100_data}, {} }; MODULE_DEVICE_TABLE(of, mdss_dt_match); -- 2.34.1
[PATCH 5/5] drm/msm/dpu: Add X1E80100 support
Add definitions for the display hardware used on the Qualcomm X1E80100 platform. Co-developed-by: Abhinav Kumar Signed-off-by: Abhinav Kumar Signed-off-by: Abel Vesa --- .../drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h | 449 + drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 2 + drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h | 1 + drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c| 1 + 4 files changed, 453 insertions(+) diff --git a/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h new file mode 100644 index ..d4f1fbfa420a --- /dev/null +++ b/drivers/gpu/drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h @@ -0,0 +1,449 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (c) 2023, Linaro Limited + */ + +#ifndef _DPU_9_2_X1E80100_H +#define _DPU_9_2_X1E80100_H + +static const struct dpu_caps x1e80100_dpu_caps = { + .max_mixer_width = DEFAULT_DPU_OUTPUT_LINE_WIDTH, + .max_mixer_blendstages = 0xb, + .has_src_split = true, + .has_dim_layer = true, + .has_idle_pc = true, + .has_3d_merge = true, + .max_linewidth = 5120, + .pixel_ram_size = DEFAULT_PIXEL_RAM_SIZE, +}; + +static const struct dpu_mdp_cfg x1e80100_mdp = { + .name = "top_0", + .base = 0, .len = 0x494, + .features = BIT(DPU_MDP_PERIPH_0_REMOVED), + .clk_ctrls = { + [DPU_CLK_CTRL_REG_DMA] = { .reg_off = 0x2bc, .bit_off = 20 }, + }, +}; + +/* FIXME: get rid of DPU_CTL_SPLIT_DISPLAY in favour of proper ACTIVE_CTL support */ +static const struct dpu_ctl_cfg x1e80100_ctl[] = { + { + .name = "ctl_0", .id = CTL_0, + .base = 0x15000, .len = 0x290, + .features = CTL_SM8550_MASK | BIT(DPU_CTL_SPLIT_DISPLAY), + .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 9), + }, { + .name = "ctl_1", .id = CTL_1, + .base = 0x16000, .len = 0x290, + .features = CTL_SM8550_MASK | BIT(DPU_CTL_SPLIT_DISPLAY), + .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 10), + }, { + .name = "ctl_2", .id = CTL_2, + .base = 0x17000, .len = 0x290, + .features = CTL_SM8550_MASK, + .intr_start = 
DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 11), + }, { + .name = "ctl_3", .id = CTL_3, + .base = 0x18000, .len = 0x290, + .features = CTL_SM8550_MASK, + .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 12), + }, { + .name = "ctl_4", .id = CTL_4, + .base = 0x19000, .len = 0x290, + .features = CTL_SM8550_MASK, + .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 13), + }, { + .name = "ctl_5", .id = CTL_5, + .base = 0x1a000, .len = 0x290, + .features = CTL_SM8550_MASK, + .intr_start = DPU_IRQ_IDX(MDP_SSPP_TOP0_INTR2, 23), + }, +}; + +static const struct dpu_sspp_cfg x1e80100_sspp[] = { + { + .name = "sspp_0", .id = SSPP_VIG0, + .base = 0x4000, .len = 0x344, + .features = VIG_SDM845_MASK, + .sblk = &dpu_vig_sblk_qseed3_3_2, + .xin_id = 0, + .type = SSPP_TYPE_VIG, + }, { + .name = "sspp_1", .id = SSPP_VIG1, + .base = 0x6000, .len = 0x344, + .features = VIG_SDM845_MASK, + .sblk = &dpu_vig_sblk_qseed3_3_2, + .xin_id = 4, + .type = SSPP_TYPE_VIG, + }, { + .name = "sspp_2", .id = SSPP_VIG2, + .base = 0x8000, .len = 0x344, + .features = VIG_SDM845_MASK, + .sblk = &dpu_vig_sblk_qseed3_3_2, + .xin_id = 8, + .type = SSPP_TYPE_VIG, + }, { + .name = "sspp_3", .id = SSPP_VIG3, + .base = 0xa000, .len = 0x344, + .features = VIG_SDM845_MASK, + .sblk = &dpu_vig_sblk_qseed3_3_2, + .xin_id = 12, + .type = SSPP_TYPE_VIG, + }, { + .name = "sspp_8", .id = SSPP_DMA0, + .base = 0x24000, .len = 0x344, + .features = DMA_SDM845_MASK, + .sblk = &dpu_dma_sblk, + .xin_id = 1, + .type = SSPP_TYPE_DMA, + }, { + .name = "sspp_9", .id = SSPP_DMA1, + .base = 0x26000, .len = 0x344, + .features = DMA_SDM845_MASK, + .sblk = &dpu_dma_sblk, + .xin_id = 5, + .type = SSPP_TYPE_DMA, + }, { + .name = "sspp_10", .id = SSPP_DMA2, + .base = 0x28000, .len = 0x344, + .features = DMA_SDM845_MASK, + .sblk = &dpu_dma_sblk, + .xin_id = 9, + .type = SSPP_TYPE_DMA, +
[PATCH 4/5] drm/msm/dp: Try looking for link-frequencies into the port@0's endpoint first
From: Abhinav Kumar On platforms where the endpoint used is on port@0, looking for port@1 instead results in just ignoring the max link-frequencies altogether. Look at port@0 first, then, if not found, look for port@1. Signed-off-by: Abhinav Kumar Signed-off-by: Abel Vesa --- drivers/gpu/drm/msm/dp/dp_parser.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/msm/dp/dp_parser.c b/drivers/gpu/drm/msm/dp/dp_parser.c index 7032dcc8842b..eec5b8b83f4b 100644 --- a/drivers/gpu/drm/msm/dp/dp_parser.c +++ b/drivers/gpu/drm/msm/dp/dp_parser.c @@ -97,7 +97,11 @@ static u32 dp_parser_link_frequencies(struct device_node *of_node) u64 frequency = 0; int cnt; - endpoint = of_graph_get_endpoint_by_regs(of_node, 1, 0); /* port@1 */ + endpoint = of_graph_get_endpoint_by_regs(of_node, 0, 0); /* port@0 */ + + if (!endpoint) + endpoint = of_graph_get_endpoint_by_regs(of_node, 1, 0); /* port@1 */ + if (!endpoint) return 0; -- 2.34.1
Re: Implement per-key keyboard backlight as auxdisplay?
Hi, On 1/19/24 21:15, Pavel Machek wrote: > Hi! > 2. Implement per-key keyboards as auxdisplay - Pro: - Already has a concept for led positions - Is conceptually closer to "multiple leds forming a singular entity" - Con: - No preexisting UPower support - No concept for special hardware lighting modes - No support for arbitrary led outlines yet (e.g. ISO style enter-key) >>> >>> Please do this one. >> >> Ok, so based on the discussion so far and Pavel's feedback lets try to >> design a custom userspace API for this. I do not believe that auxdisplay >> is a good fit because: > > Ok, so lets call this a "display". These days, framebuffers and drm > handles displays. My proposal is to use similar API as other displays. > >> So my proposal would be an ioctl interface (ioctl only no r/w) >> using /dev/rgbkbd0 /dev/rgbkbd1, etc. registered as a misc chardev. >> >> For per key controllable rgb LEDs we need to discuss a coordinate >> system. I propose using a fixed size of 16 rows of 64 keys, >> so 64x16 in standard WxH notation. >> >> And then storing RGB in separate bytes, so userspace will then >> always send a buffer of 192 bytes per line (64x3) x 14 rows >> = 3072 bytes. With the kernel driver ignoring parts of >> the buffer where there are no actual keys. > > That's really really weird interface. If you are doing RGB888 64x14, > lets make it a ... display? :-). > > ioctl always sending 3072 bytes is really a hack. > > Small displays exist and are quite common, surely we'd handle this as > a display: > https://pajenicko.cz/displeje/graficky-oled-displej-0-66-64x48-i2c-bily-wemos-d1-mini > It is 64x48. This is indeed a display and should use display APIs > And then there's this: > https://pajenicko.cz/displeje/maticovy-8x8-led-displej-s-radicem-max7219 > and this: > https://pajenicko.cz/displeje/maticovy-8x32-led-displej-s-radicem-max7219 > > One of them is 8x8. > > Surely those should be displays, too? 
The 8x8 one not really; the other one could be used to scroll some text on, but cannot display images, so not really displays IMHO. Anyway, we are talking about keyboards here, and those do not have a regular x-y grid like your examples above, so they certainly do not count as displays. See the long discussion earlier in the thread. Regards, Hans
Re: Implement per-key keyboard backlight as auxdisplay?
Hi Werner, On 1/19/24 17:04, Werner Sembach wrote: > Am 19.01.24 um 09:44 schrieb Hans de Goede: >> So my proposal would be an ioctl interface (ioctl only no r/w) >> using /dev/rgbkbd0 /dev/rgbkbd1, etc. registered as a misc chardev. >> >> For per key controllable rgb LEDs we need to discuss a coordinate >> system. I propose using a fixed size of 16 rows of 64 keys, >> so 64x16 in standard WxH notation. >> >> And then storing RGB in separate bytes, so userspace will then >> always send a buffer of 192 bytes per line (64x3) x 14 rows >> = 3072 bytes. With the kernel driver ignoring parts of >> the buffer where there are no actual keys. > Just to be sure: the "14 rows" is a typo? And should be 16 rows? Yes that should be 16. >> This way we can address all the possible keys in the various >> standard layouts in one standard way and then the drivers can >> just skip keys which are not there when preparing the buffer >> to send to the hw / fw. > > Some remarks here: > > - Some keyboards might have two or more leds for big keys like (iso-)enter, > shift, capslock, num+, etc. that in theory are individually controllable by > the firmware. In Windows drivers this is usually abstracted away, but could > be interesting for effects (e.g. if the top of iso-enter is separate from the > bottom of iso-enter like with one of our devices). > > - In combination with this: The driver might not be able to tell if the > actual physical keyboard is ISO or ANSI, so it might not be able to > correctly assign the leds around enter as being an own key or being > part of ANSI- or ISO-enter. > > - Should the interface have different addresses for the different enter and > num+ styles (or even the different length shifts and spacebars)? > > One idea for this: Actually assign 1 value per line for tall keys per line, 3 > (or maybe even 4, to have one spare) values per line for wide keys and 6 (or > 8) values for space. 
e.g.: That sounds workable. OTOH, combined with your remarks about also supporting lightbars, I'm starting to think that we need to just punt this to userspace. So basically change things from trying to present a standardized address space where, say, the 'Q' key is always in the same place, to just modeling a keyboard as a string of LEDs (1-dimensional, so an array), and leave mapping which address in the array is which key to userspace; then userspace can have json or whatever files for this per keyboard. This keeps the kernel interface much more KISS which I think is what we need to strive for. So instead of having /dev/rgbkbd we get a /dev/rgbledstring and then that can be used for rgb-kbds and also your lightbar example as well as actual RGB LED strings, which depending on the controller may also have zones / effects, etc. just like the keyboards. > - Right shift would have 3 values in row 10. The first value might be the > left side of shift or the additional ABNT/JIS key. The 2nd Key might be the > left side or middle of shift and the third key might be the right side of > shift or the only value for the whole key. The additional ABNT/JIS key still > also has a dedicated value which is used by drivers which can differentiate > between physical layouts. > > - Enter would have 3 values in row 8 and 3 values in row 9. With the same > disambiguation as the additional ABNT/JIS but this time for ansi-/ and iso-# > > - Num+ would have 2 values, one row 8 and one in row 9. The one in row 9 > might control the whole key or might just control the lower half. The one in > row 8 might be another key or the upper half > > For the left half of the main block the leftmost value should be the "might > be the only relevant"-value while the rightmost value should be the "might > be another key"-value. For the right side of the main block this should be > swapped. 
> Unused values should be adjacent to the "might be another key"-value, e.g.:
>
> |                                   | Left shift value 1    | Left shift value 2     | Left shift value 3 | Left shift value 4 | 102nd key value |
> | ISO/ANSI aware                    | Left shift color      | Unused                 | Unused             | Unused             | 102nd key color |
> | ISO non aware 1 led under shift   | Left shift color      | Unused                 | Unused             | 102nd key color    | Unused          |
> | ANSI non aware 1 led under shift  | Left shift color      | Unused                 | Unused             | Unused             | Unused          |
> | ISO non aware 2 leds under shift  | Left shift left color | Left shift right color | Unused             | 102nd key color    | Unused          |
> | ANSI non aware 2 leds under shift | Left shift left color | Left shift right color | Unused             | Unused             | Unused          |
> | ISO non aware 3 leds under shift  | Left shift left color | Left shift middle co
Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure
Le lundi 29 janvier 2024 à 14:17 +0100, Christian König a écrit : > Am 29.01.24 um 14:06 schrieb Paul Cercueil: > > Hi Christian, > > > > Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit : > > > Am 27.01.24 um 17:50 schrieb Jonathan Cameron: > > > > > > > + iio_buffer_dmabuf_put(attach); > > > > > > > + > > > > > > > +out_dmabuf_put: > > > > > > > + dma_buf_put(dmabuf); > > > > > > As below. Feels like a __free(dma_buf_put) bit of magic > > > > > > would > > > > > > be a > > > > > > nice to have. > > > > > I'm working on the patches right now, just one quick > > > > > question. > > > > > > > > > > Having a __free(dma_buf_put) requires that dma_buf_put is > > > > > first > > > > > "registered" as a freeing function using DEFINE_FREE() in > > > > > > > > > buf.h>, which has not been done yet. > > > > > > > > > > That would mean carrying a dma-buf specific patch in your > > > > > tree, > > > > > are you > > > > > OK with that? > > > > Needs an ACK from appropriate maintainer, but otherwise I'm > > > > fine > > > > doing > > > > so. Alternative is to circle back to this later after this > > > > code is > > > > upstream. > > > Separate patches for that please, the autocleanup feature is so > > > new > > > that > > > I'm not 100% convinced that everything works out smoothly from > > > the > > > start. > > Separate patches is a given, did you mean outside this patchset? > > Because I can send a separate patchset that introduces scope-based > > management for dma_fence and dma_buf, but then it won't have users. > > Outside of the patchset, this is essentially brand new stuff. > > IIRC we have quite a number of dma_fence selftests and sw_sync which > is > basically code inside the drivers/dma-buf directory only there for > testing DMA-buf functionality. > > Convert those over as well and I'm more than happy to upstream this > change. 
Well there is very little to convert there; you can use scope-based management when the unref is done in all exit points of the functional block, and the only place I could find that does that in drivers/dma-buf/ was in dma_fence_chain_enable_signaling() in dma-fence-chain.c. Cheers, -Paul
Re: [PATCH 1/2] ACPI: video: Handle fetching EDID that is longer than 256 bytes
On Fri, Jan 26, 2024 at 7:55 PM Mario Limonciello wrote: > > The ACPI specification allows for an EDID to be up to 512 bytes but > the _DDC EDID fetching code will only try up to 256 bytes. > > Modify the code to instead start at 512 bytes and work its way > down. > > Link: > https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device > Signed-off-by: Mario Limonciello > --- > drivers/acpi/acpi_video.c | 23 --- > 1 file changed, 16 insertions(+), 7 deletions(-) > > diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c > index 62f4364e4460..b3b15dd4755d 100644 > --- a/drivers/acpi/acpi_video.c > +++ b/drivers/acpi/acpi_video.c > @@ -624,6 +624,10 @@ acpi_video_device_EDID(struct acpi_video_device *device, > arg0.integer.value = 1; > else if (length == 256) > arg0.integer.value = 2; > + else if (length == 384) > + arg0.integer.value = 3; > + else if (length == 512) > + arg0.integer.value = 4; It looks like switch () would be somewhat better. Or maybe even arg0.integer.value = length / 128; The validation could be added too: if (arg0.integer.value > 4 || arg0.integer.value * 128 != length) return -EINVAL; but it is pointless, because the caller is never passing an invalid number to it AFAICS. 
> else > return -EINVAL; > > @@ -1443,7 +1447,7 @@ int acpi_video_get_edid(struct acpi_device *device, int > type, int device_id, > > for (i = 0; i < video->attached_count; i++) { > video_device = video->attached_array[i].bind_info; > - length = 256; > + length = 512; > > if (!video_device) > continue; > @@ -1478,13 +1482,18 @@ int acpi_video_get_edid(struct acpi_device *device, > int type, int device_id, > > if (ACPI_FAILURE(status) || !buffer || > buffer->type != ACPI_TYPE_BUFFER) { > - length = 128; > - status = acpi_video_device_EDID(video_device, &buffer, > - length); > - if (ACPI_FAILURE(status) || !buffer || > - buffer->type != ACPI_TYPE_BUFFER) { > - continue; > + while (length) { I would prefer a do {} while () loop here, which could include the first invocation of acpi_video_device_EDID() too (and reduce code duplication a bit). > + length -= 128; > + status = acpi_video_device_EDID(video_device, > &buffer, > + length); No line break, please. > + if (ACPI_FAILURE(status) || !buffer || > + buffer->type != ACPI_TYPE_BUFFER) { > + continue; > + } > + break; > } > + if (!length) > + continue; > } > > *edid = buffer->buffer.pointer; > --
Re: [PATCH v5 5/8] iio: core: Add new DMABUF interface infrastructure
Le lundi 29 janvier 2024 à 14:32 +0100, Paul Cercueil a écrit : > Le lundi 29 janvier 2024 à 14:17 +0100, Christian König a écrit : > > Am 29.01.24 um 14:06 schrieb Paul Cercueil: > > > Hi Christian, > > > > > > Le lundi 29 janvier 2024 à 13:52 +0100, Christian König a écrit : > > > > Am 27.01.24 um 17:50 schrieb Jonathan Cameron: > > > > > > > > + iio_buffer_dmabuf_put(attach); > > > > > > > > + > > > > > > > > +out_dmabuf_put: > > > > > > > > + dma_buf_put(dmabuf); > > > > > > > As below. Feels like a __free(dma_buf_put) bit of magic > > > > > > > would > > > > > > > be a > > > > > > > nice to have. > > > > > > I'm working on the patches right now, just one quick > > > > > > question. > > > > > > > > > > > > Having a __free(dma_buf_put) requires that dma_buf_put is > > > > > > first > > > > > > "registered" as a freeing function using DEFINE_FREE() in > > > > > > > > > > > buf.h>, which has not been done yet. > > > > > > > > > > > > That would mean carrying a dma-buf specific patch in your > > > > > > tree, > > > > > > are you > > > > > > OK with that? > > > > > Needs an ACK from appropriate maintainer, but otherwise I'm > > > > > fine > > > > > doing > > > > > so. Alternative is to circle back to this later after this > > > > > code is > > > > > upstream. > > > > Separate patches for that please, the autocleanup feature is so > > > > new > > > > that > > > > I'm not 100% convinced that everything works out smoothly from > > > > the > > > > start. > > > Separate patches is a given, did you mean outside this patchset? > > > Because I can send a separate patchset that introduces scope- > > > based > > > management for dma_fence and dma_buf, but then it won't have > > > users. > > > > Outside of the patchset, this is essentially brand new stuff. > > > > IIRC we have quite a number of dma_fence selftests and sw_sync > > which > > is > > basically code inside the drivers/dma-buf directory only there for > > testing DMA-buf functionality. 
> > Convert those over as well and I'm more than happy to upstream this change. Well there is very little to convert there; you can use scope-based management when the unref is done in all exit points of the functional block, and the only place I could find that does that in drivers/dma-buf/ was in dma_fence_chain_enable_signaling() in dma-fence-chain.c. Actually - not even that, since it doesn't call dma_fence_get() and dma_fence_put() on the same fence. So I cannot use it anywhere in drivers/dma-buf/. -Paul
Re: Re: Re: [PATCH 1/3] bits: introduce fixed-type genmasks
On Wed, Jan 24, 2024 at 07:27:58AM -0800, Yury Norov wrote: On Wed, Jan 24, 2024 at 08:03:53AM -0600, Lucas De Marchi wrote: On Wed, Jan 24, 2024 at 09:58:26AM +0200, Jani Nikula wrote: > On Tue, 23 Jan 2024, Lucas De Marchi wrote: > > From: Yury Norov > > > > Generalize __GENMASK() to support different types, and implement > > fixed-types versions of GENMASK() based on it. The fixed-type version > > allows more strict checks to the min/max values accepted, which is > > useful for defining registers like implemented by i915 and xe drivers > > with their REG_GENMASK*() macros. > > Mmh, the commit message says the fixed-type version allows more strict > checks, but none are actually added. GENMASK_INPUT_CHECK() remains the > same. > > Compared to the i915 and xe versions, this is more lax now. You could > specify GENMASK_U32(63,32) without complaints. Doing this on top of this series: -#define XELPDP_PORT_M2P_COMMAND_TYPE_MASK REG_GENMASK(30, 27) +#define XELPDP_PORT_M2P_COMMAND_TYPE_MASK REG_GENMASK(62, 32) and I do get a build failure: ../drivers/gpu/drm/i915/display/intel_cx0_phy.c: In function ‘__intel_cx0_read_once’: ../include/linux/bits.h:41:31: error: left shift count >= width of type [-Werror=shift-count-overflow] 41 | (((t)~0ULL - ((t)(1) << (l)) + 1) & \ | ^~ I would better include this in commit message to avoid people's confusion. If it comes to v2, can you please do it and mention that this trick relies on shift-count-overflow compiler check? either that or an explicit check as it was suggested. What's your preference? Lucas De Marchi Thanks, Yury
Re: Making drm_gpuvm work across gpu devices
On 2024-01-25 13:32, Daniel Vetter wrote: On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote: Am 23.01.24 um 20:37 schrieb Zeng, Oak: [SNIP] Yes most API are per device based. One exception I know is actually the kfd SVM API. If you look at the svm_ioctl function, it is per-process based. Each kfd_process represent a process across N gpu devices. Yeah and that was a big mistake in my opinion. We should really not do that ever again. Need to say, kfd SVM represent a shared virtual address space across CPU and all GPU devices on the system. This is by the definition of SVM (shared virtual memory). This is very different from our legacy gpu *device* driver which works for only one device (i.e., if you want one device to access another device's memory, you will have to use dma-buf export/import etc). Exactly that thinking is what we have currently found as blocker for a virtualization projects. Having SVM as device independent feature which somehow ties to the process address space turned out to be an extremely bad idea. The background is that this only works for some use cases but not all of them. What's working much better is to just have a mirror functionality which says that a range A..B of the process address space is mapped into a range C..D of the GPU address space. Those ranges can then be used to implement the SVM feature required for higher level APIs and not something you need at the UAPI or even inside the low level kernel memory management. When you talk about migrating memory to a device you also do this on a per device basis and *not* tied to the process address space. If you then get crappy performance because userspace gave contradicting information where to migrate memory then that's a bug in userspace and not something the kernel should try to prevent somehow. 
[SNIP] I think if you start using the same drm_gpuvm for multiple devices you will sooner or later start to run into the same mess we have seen with KFD, where we moved more and more functionality from the KFD to the DRM render node because we found that a lot of the stuff simply doesn't work correctly with a single object to maintain the state. As I understand it, KFD is designed to work across devices. A single pseudo /dev/kfd device represent all hardware gpu devices. That is why during kfd open, many pdd (process device data) is created, each for one hardware device for this process. Yes, I'm perfectly aware of that. And I can only repeat myself that I see this design as a rather extreme failure. And I think it's one of the reasons why NVidia is so dominant with Cuda. This whole approach KFD takes was designed with the idea of extending the CPU process into the GPUs, but this idea only works for a few use cases and is not something we should apply to drivers in general. A very good example are virtualization use cases where you end up with CPU address != GPU address because the VAs are actually coming from the guest VM and not the host process. SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should not have any influence on the design of the kernel UAPI. If you want to do something similar as KFD for Xe I think you need to get explicit permission to do this from Dave and Daniel and maybe even Linus. I think the one and only one exception where an SVM uapi like in kfd makes sense, is if the _hardware_ itself, not the software stack defined semantics that you've happened to build on top of that hw, enforces a 1:1 mapping with the cpu process address space. Which means your hardware is using PASID, IOMMU based translation, PCI-ATS (address translation services) or whatever your hw calls it and has _no_ device-side pagetables on top. 
Which from what I've seen all devices with device-memory have, simply because they need some place to store whether that memory is currently in device memory or should be translated using PASID. Currently there's no gpu that works with PASID only, but there are some on-cpu-die accelerator things that do work like that. Maybe in the future there will be some accelerators that are fully cpu cache coherent (including atomics) with something like CXL, and the on-device memory is managed as normal system memory with struct page as ZONE_DEVICE and accelerator va -> physical address translation is only done with PASID ... but for now I haven't seen that, definitely not in upstream drivers. And the moment you have some per-device pagetables or per-device memory management of some sort (like using gpuva mgr) then I'm 100% agreeing with Christian that the kfd SVM model is too strict and not a great idea. That basically means, without ATS/PRI+PASID you cannot implement a unified memory programming model, where GPUs or accelerators access virtual addresses without pre-registering them with an SVM API call. Unified memory is a feature implemented by the KFD SVM API and used by ROCm. This is used e.g. to implement OpenMP USM (unified shared memory). It'
Re: [PATCH 4/5] drm/msm/dp: Try looking for link-frequencies into the port@0's endpoint first
On Mon, 29 Jan 2024 at 15:19, Abel Vesa wrote: > > From: Abhinav Kumar > > On platforms where the endpoint used is on port@0, looking for port@1 > instead results in just ignoring the max link-frequencies altogether. > Look at port@0 first, then, if not found, look for port@1. NAK. Platforms do not "use port@0". It is for the connection between DPU and DP, while the link-frequencies property is for the link between DP controller and the actual display. > > Signed-off-by: Abhinav Kumar > Signed-off-by: Abel Vesa > --- > drivers/gpu/drm/msm/dp/dp_parser.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/msm/dp/dp_parser.c > b/drivers/gpu/drm/msm/dp/dp_parser.c > index 7032dcc8842b..eec5b8b83f4b 100644 > --- a/drivers/gpu/drm/msm/dp/dp_parser.c > +++ b/drivers/gpu/drm/msm/dp/dp_parser.c > @@ -97,7 +97,11 @@ static u32 dp_parser_link_frequencies(struct device_node > *of_node) > u64 frequency = 0; > int cnt; > > - endpoint = of_graph_get_endpoint_by_regs(of_node, 1, 0); /* port@1 */ > + endpoint = of_graph_get_endpoint_by_regs(of_node, 0, 0); /* port@0 */ > + > + if (!endpoint) > + endpoint = of_graph_get_endpoint_by_regs(of_node, 1, 0); /* > port@1 */ > + > if (!endpoint) > return 0; > > > -- > 2.34.1 > -- With best wishes Dmitry
Re: Re: Re: [PATCH 1/3] bits: introduce fixed-type genmasks
On Mon, Jan 29, 2024 at 08:49:35AM -0600, Lucas De Marchi wrote: > On Wed, Jan 24, 2024 at 07:27:58AM -0800, Yury Norov wrote: > > On Wed, Jan 24, 2024 at 08:03:53AM -0600, Lucas De Marchi wrote: > > > On Wed, Jan 24, 2024 at 09:58:26AM +0200, Jani Nikula wrote: > > > > On Tue, 23 Jan 2024, Lucas De Marchi wrote: > > > > > From: Yury Norov > > > > > > > > > > Generalize __GENMASK() to support different types, and implement > > > > > fixed-types versions of GENMASK() based on it. The fixed-type version > > > > > allows more strict checks to the min/max values accepted, which is > > > > > useful for defining registers like implemented by i915 and xe drivers > > > > > with their REG_GENMASK*() macros. > > > > > > > > Mmh, the commit message says the fixed-type version allows more strict > > > > checks, but none are actually added. GENMASK_INPUT_CHECK() remains the > > > > same. > > > > > > > > Compared to the i915 and xe versions, this is more lax now. You could > > > > specify GENMASK_U32(63,32) without complaints. > > > > > > Doing this on top of the this series: > > > > > > -#define XELPDP_PORT_M2P_COMMAND_TYPE_MASKREG_GENMASK(30, > > > 27) > > > +#define XELPDP_PORT_M2P_COMMAND_TYPE_MASKREG_GENMASK(62, > > > 32) > > > > > > and I do get a build failure: > > > > > > ../drivers/gpu/drm/i915/display/intel_cx0_phy.c: In function > > > ‘__intel_cx0_read_once’: > > > ../include/linux/bits.h:41:31: error: left shift count >= width of type > > > [-Werror=shift-count-overflow] > > >41 | (((t)~0ULL - ((t)(1) << (l)) + 1) & \ > > > | ^~ > > > > I would better include this in commit message to avoid people's > > confusion. If it comes to v2, can you please do it and mention that > > this trick relies on shift-count-overflow compiler check? > > either that or an explicit check as it was suggested. What's your > preference? Let's put a comment in the code. 
An argument that shift-count-overflow may be disabled sounds more like a speculation unless we have a solid example of a build system where the error is disabled for a good sane reason, but possible GENMASK() overflow is still considered dangerous. GENMASK() is all about bit shifts, so shift-related error is something I'd expect when using GENMASK(). Also, the macro is widely used in the kernel: yury:linux$ git grep GENMASK | wc -l 26879 Explicit check would add pressure on the compiler for nothing. Thanks, Yury
Re: [PATCH 3/5] drm/msm: mdss: Add X1E80100 support
On Mon, 29 Jan 2024 at 15:19, Abel Vesa wrote: > > Add support for MDSS on X1E80100. > > Signed-off-by: Abel Vesa > --- > drivers/gpu/drm/msm/msm_mdss.c | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c > index 455b2e3a0cdd..eddf7fdbb60a 100644 > --- a/drivers/gpu/drm/msm/msm_mdss.c > +++ b/drivers/gpu/drm/msm/msm_mdss.c > @@ -564,6 +564,15 @@ static const struct msm_mdss_data sdm670_data = { > .highest_bank_bit = 1, > }; > > +static const struct msm_mdss_data x1e80100_data = { > + .ubwc_enc_version = UBWC_4_0, > + .ubwc_dec_version = UBWC_4_3, > + .ubwc_swizzle = 6, > + .ubwc_static = 1, > + .highest_bank_bit = 2, > + .macrotile_mode = 1, Missing .reg_bus_bw, LGTM otherwise > +}; > + > static const struct msm_mdss_data sdm845_data = { > .ubwc_enc_version = UBWC_2_0, > .ubwc_dec_version = UBWC_2_0, > @@ -655,6 +664,7 @@ static const struct of_device_id mdss_dt_match[] = { > { .compatible = "qcom,sm8450-mdss", .data = &sm8350_data }, > { .compatible = "qcom,sm8550-mdss", .data = &sm8550_data }, > { .compatible = "qcom,sm8650-mdss", .data = &sm8550_data}, > + { .compatible = "qcom,x1e80100-mdss", .data = &x1e80100_data}, > {} > }; > MODULE_DEVICE_TABLE(of, mdss_dt_match); > > -- > 2.34.1 > -- With best wishes Dmitry
Re: [PATCH 5/5] drm/msm/dpu: Add X1E80100 support
On Mon, 29 Jan 2024 at 15:19, Abel Vesa wrote: > > Add definitions for the display hardware used on the Qualcomm X1E80100 > platform. > > Co-developed-by: Abhinav Kumar > Signed-off-by: Abhinav Kumar > Signed-off-by: Abel Vesa > --- > .../drm/msm/disp/dpu1/catalog/dpu_9_2_x1e80100.h | 449 > + > drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 2 + > drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h | 1 + > drivers/gpu/drm/msm/disp/dpu1/dpu_kms.c| 1 + > 4 files changed, 453 insertions(+) > Reviewed-by: Dmitry Baryshkov -- With best wishes Dmitry
Re: [PATCH v4 0/8] drm/amd/display: Introduce KUnit to Display Mode Library
That we include so many C files with relative paths seems to be odd. Apart from that looks good to me. Christian. Am 26.01.24 um 16:48 schrieb Rodrigo Siqueira: In 2022, we got a great patchset from a GSoC project introducing unit tests to the amdgpu display. Since version 3, this effort was put on hold, and now I'm attempting to revive it. I'll add part of the original cover letter at the bottom of this cover letter, but you can read all the original messages at: https://lore.kernel.org/amd-gfx/20220912155919.39877-1-mairaca...@riseup.net/ Anyway, this new version changes are: - Rebase and adjust conflicts. - Rewrite part of the dc_dmub_srv_test to represent a real scenario that simulates some parameter configuration for using 4k144 and 4k240 displays. Thanks Siqueira Original cover letter Hello, This series is version 3 of the introduction of unit testing to the AMDPGU driver [1]. Our main goal is to bring unit testing to the AMD display driver; in particular, we'll focus on the Display Mode Library (DML) for DCN2.0, DMUB, and some of the DCE functions. This implementation intends to help developers to recognize bugs before they are merged into the mainline and also makes it possible for future code refactors of the AMD display driver. For the implementation of the tests, we decided to go with the Kernel Unit Testing Framework (KUnit). KUnit makes it possible to run test suites on kernel boot or load the tests as a module. It reports all test case results through a TAP (Test Anything Protocol) in the kernel log. Moreover, KUnit unifies the test structure and provides tools to simplify the testing for developers and CI systems. In regards to CI pipelines, we believe kunit_tool [2] provides ease of use, but we are also working on integrating KUnit into IGT [3]. Since the second version, we've chosen a mix of approaches to integrate KUnit tests into amdgpu: 1. Tests that use static functions are included through guards [4]. 2. 
Tests without static functions are included through a Makefile. We understand that testing static functions is not ideal, but taking into consideration that this driver relies heavily on static functions with complex behavior which would benefit from unit testing, otherwise, black-box tested through public functions with dozens of arguments and sometimes high cyclomatic complexity. The first seven patches represent what we intend to do for the rest of the DML modules: systematic testing of the DML functions, especially mathematically complicated functions. Also, it shows how simple it is to add new tests to the DML. Among the tests, we highlight the dcn20_fpu_test, which, had it existed then, could catch the defects introduced to dcn20_fpu.c by 8861c27a6c [5] later fixed by 9ad5d02c2a [6]. In this series, there's also an example of how unit tests can help avoid regressions and keep track of changes in behavior. [..] Isabella Basso (1): drm/amd/display: Introduce KUnit tests to display_rq_dlg_calc_20 Magali Lemes (1): drm/amd/display: Introduce KUnit tests for dcn20_fpu Maíra Canal (5): drm/amd/display: Introduce KUnit tests to the bw_fixed library drm/amd/display: Introduce KUnit tests to the display_mode_vba library drm/amd/display: Introduce KUnit to dcn20/display_mode_vba_20 library drm/amd/display: Introduce KUnit tests to dc_dmub_srv library Documentation/gpu: Add Display Core Unit Test documentation Tales Aparecida (1): drm/amd/display: Introduce KUnit tests for fixed31_32 library .../gpu/amdgpu/display/display-test.rst | 88 ++ Documentation/gpu/amdgpu/display/index.rst| 1 + drivers/gpu/drm/amd/display/Kconfig | 52 + drivers/gpu/drm/amd/display/Makefile | 2 +- drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c | 4 + .../dc/dml/dcn20/display_mode_vba_20.c| 4 + .../dc/dml/dcn20/display_rq_dlg_calc_20.c | 4 + drivers/gpu/drm/amd/display/tests/Makefile| 18 + .../display/tests/dc/basics/fixpt31_32_test.c | 232 + .../amd/display/tests/dc/dc_dmub_srv_test.c | 159 
.../tests/dc/dml/calcs/bw_fixed_test.c| 323 +++ .../tests/dc/dml/dcn20/dcn20_fpu_test.c | 561 +++ .../dc/dml/dcn20/display_mode_vba_20_test.c | 888 ++ .../dml/dcn20/display_rq_dlg_calc_20_test.c | 124 +++ .../tests/dc/dml/display_mode_vba_test.c | 741 +++ 15 files changed, 3200 insertions(+), 1 deletion(-) create mode 100644 Documentation/gpu/amdgpu/display/display-test.rst create mode 100644 drivers/gpu/drm/amd/display/tests/Makefile create mode 100644 drivers/gpu/drm/amd/display/tests/dc/basics/fixpt31_32_test.c create mode 100644 drivers/gpu/drm/amd/display/tests/dc/dc_dmub_srv_test.c create mode 100644 drivers/gpu/drm/amd/display/tests/dc/dml/calcs/bw_fixed_test.c create mode 100644 drivers/gpu/drm/amd/display/tests/dc/dml/dcn20/dcn20_fpu_test.c create mode 100644 drivers/gpu/drm/amd/
Re: Making drm_gpuvm work across gpu devices
Am 29.01.24 um 16:03 schrieb Felix Kuehling: On 2024-01-25 13:32, Daniel Vetter wrote: On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote: Am 23.01.24 um 20:37 schrieb Zeng, Oak: [SNIP] Yes most API are per device based. One exception I know is actually the kfd SVM API. If you look at the svm_ioctl function, it is per-process based. Each kfd_process represent a process across N gpu devices. Yeah and that was a big mistake in my opinion. We should really not do that ever again. Need to say, kfd SVM represent a shared virtual address space across CPU and all GPU devices on the system. This is by the definition of SVM (shared virtual memory). This is very different from our legacy gpu *device* driver which works for only one device (i.e., if you want one device to access another device's memory, you will have to use dma-buf export/import etc). Exactly that thinking is what we have currently found as blocker for a virtualization projects. Having SVM as device independent feature which somehow ties to the process address space turned out to be an extremely bad idea. The background is that this only works for some use cases but not all of them. What's working much better is to just have a mirror functionality which says that a range A..B of the process address space is mapped into a range C..D of the GPU address space. Those ranges can then be used to implement the SVM feature required for higher level APIs and not something you need at the UAPI or even inside the low level kernel memory management. When you talk about migrating memory to a device you also do this on a per device basis and *not* tied to the process address space. If you then get crappy performance because userspace gave contradicting information where to migrate memory then that's a bug in userspace and not something the kernel should try to prevent somehow. 
[SNIP] I think if you start using the same drm_gpuvm for multiple devices you will sooner or later start to run into the same mess we have seen with KFD, where we moved more and more functionality from the KFD to the DRM render node because we found that a lot of the stuff simply doesn't work correctly with a single object to maintain the state. As I understand it, KFD is designed to work across devices. A single pseudo /dev/kfd device represent all hardware gpu devices. That is why during kfd open, many pdd (process device data) is created, each for one hardware device for this process. Yes, I'm perfectly aware of that. And I can only repeat myself that I see this design as a rather extreme failure. And I think it's one of the reasons why NVidia is so dominant with Cuda. This whole approach KFD takes was designed with the idea of extending the CPU process into the GPUs, but this idea only works for a few use cases and is not something we should apply to drivers in general. A very good example are virtualization use cases where you end up with CPU address != GPU address because the VAs are actually coming from the guest VM and not the host process. SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should not have any influence on the design of the kernel UAPI. If you want to do something similar as KFD for Xe I think you need to get explicit permission to do this from Dave and Daniel and maybe even Linus. I think the one and only one exception where an SVM uapi like in kfd makes sense, is if the _hardware_ itself, not the software stack defined semantics that you've happened to build on top of that hw, enforces a 1:1 mapping with the cpu process address space. Which means your hardware is using PASID, IOMMU based translation, PCI-ATS (address translation services) or whatever your hw calls it and has _no_ device-side pagetables on top. 
Which from what I've seen all devices with device-memory have, simply because they need some place to store whether that memory is currently in device memory or should be translated using PASID. Currently there's no gpu that works with PASID only, but there are some on-cpu-die accelerator things that do work like that. Maybe in the future there will be some accelerators that are fully cpu cache coherent (including atomics) with something like CXL, and the on-device memory is managed as normal system memory with struct page as ZONE_DEVICE and accelerator va -> physical address translation is only done with PASID ... but for now I haven't seen that, definitely not in upstream drivers. And the moment you have some per-device pagetables or per-device memory management of some sort (like using gpuva mgr) then I'm 100% agreeing with Christian that the kfd SVM model is too strict and not a great idea. That basically means, without ATS/PRI+PASID you cannot implement a unified memory programming model, where GPUs or accelerators access virtual addresses without pre-registering them with an SVM API call. Unified memory is a feature implemented by the KFD SVM API and
Re: [PATCH] drm: bridge: samsung-dsim: Don't use FORCE_STOP_STATE
Just FYI this conflicted pretty heavily with drm-misc-next changes in the same area, someone should check drm-tip has the correct resolution, I'm not really sure what it definitely should be. FWIW, this looks rather messy now. The drm-tip doesn't build. There was a new call to samsung_dsim_set_stop_state() introduced in commit b2fe2292624ac (drm: bridge: samsung-dsim: enter display mode in the enable() callback). I had a closer look at the latest linux-next (where somehow my patch made it into) and tried to apply commit b2fe2292624ac (drm: bridge: samsung-dsim: enter display mode in the enable() callback). It looks like only the following hunk is still needed from that patch. Everything else is covered by this fixes patch. Dario, could you rebase your commit onto this patch? I had a quick test with this change and it seems to work fine for our case. --snip-- diff --git a/drivers/gpu/drm/bridge/samsung-dsim.c b/drivers/gpu/drm/bridge/samsung-dsim.c index 63a1a0c88be4..92755c90e7d2 100644 --- a/drivers/gpu/drm/bridge/samsung-dsim.c +++ b/drivers/gpu/drm/bridge/samsung-dsim.c @@ -1498,6 +1498,8 @@ static void samsung_dsim_atomic_disable(struct drm_bridge *bridge, if (!(dsi->state & DSIM_STATE_ENABLED)) return; + samsung_dsim_set_display_enable(dsi, false); + dsi->state &= ~DSIM_STATE_VIDOUT_AVAILABLE; } @@ -1506,8 +1508,6 @@ static void samsung_dsim_atomic_post_disable(struct drm_bridge *bridge, { struct samsung_dsim *dsi = bridge_to_dsi(bridge); - samsung_dsim_set_display_enable(dsi, false); - dsi->state &= ~DSIM_STATE_ENABLED; pm_runtime_put_sync(dsi->dev); } --snip-- -michael
Re: [PATCH 2/2] drm/amd: Fetch the EDID from _DDC if available for eDP
On 1/29/2024 03:39, Jani Nikula wrote: On Fri, 26 Jan 2024, Mario Limonciello wrote: Some manufacturers have intentionally put an EDID that differs from the EDID on the internal panel on laptops. Attempt to fetch this EDID if it exists and prefer it over the EDID that is provided by the panel. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 30 +++ .../gpu/drm/amd/amdgpu/amdgpu_connectors.c| 5 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 - .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 7 +++-- 5 files changed, 49 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h index c5f3859fd682..99abe12567a4 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h @@ -1520,6 +1520,7 @@ int amdgpu_acpi_get_mem_info(struct amdgpu_device *adev, int xcc_id, void amdgpu_acpi_get_backlight_caps(struct amdgpu_dm_backlight_caps *caps); bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev); +void *amdgpu_acpi_edid(struct amdgpu_device *adev, struct drm_connector *connector); void amdgpu_acpi_detect(void); void amdgpu_acpi_release(void); #else @@ -1537,6 +1538,7 @@ static inline int amdgpu_acpi_get_mem_info(struct amdgpu_device *adev, } static inline void amdgpu_acpi_fini(struct amdgpu_device *adev) { } static inline bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev) { return false; } +static inline void *amdgpu_acpi_edid(struct amdgpu_device *adev, struct drm_connector *connector) { return NULL; } static inline void amdgpu_acpi_detect(void) { } static inline void amdgpu_acpi_release(void) { } static inline bool amdgpu_acpi_is_power_shift_control_supported(void) { return false; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c index e550067e5c5d..c106335f1f22 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c +++ 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c @@ -1380,6 +1380,36 @@ bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev) #endif } +/** + * amdgpu_acpi_edid + * @adev: amdgpu_device pointer + * @connector: drm_connector pointer + * + * Returns the EDID used for the internal panel if present, NULL otherwise. + */ +void * +amdgpu_acpi_edid(struct amdgpu_device *adev, struct drm_connector *connector) +{ + struct drm_device *ddev = adev_to_drm(adev); + struct acpi_device *acpidev = ACPI_COMPANION(ddev->dev); + void *edid; + int r; + + if (!acpidev) + return NULL; + + if (connector->connector_type != DRM_MODE_CONNECTOR_eDP) + return NULL; + + r = acpi_video_get_edid(acpidev, ACPI_VIDEO_DISPLAY_LCD, -1, &edid); + if (r < 0) { + DRM_DEBUG_DRIVER("Failed to get EDID from ACPI: %d\n", r); + return NULL; + } + + return kmemdup(edid, r, GFP_KERNEL); +} + /* * amdgpu_acpi_detect - detect ACPI ATIF/ATCS methods * diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c index 9caba10315a8..c7e1563a46d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c @@ -278,6 +278,11 @@ static void amdgpu_connector_get_edid(struct drm_connector *connector) struct amdgpu_device *adev = drm_to_adev(dev); struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector); + if (amdgpu_connector->edid) + return; + + /* if the BIOS specifies the EDID via _DDC, prefer this */ + amdgpu_connector->edid = amdgpu_acpi_edid(adev, connector); Imagine the EDID returned by acpi_video_get_edid() has edid->extensions bigger than 4. Of course it should not, but you have no guarantees, and it originates outside of the kernel. The real fix is to have the function return a struct drm_edid which tracks the allocation size separately. Unfortunately, it requires a bunch of changes along the way. We've mostly done it in i915, and I've sent a series to do this in drm/bridge [1]. 
Bottom line, we should stop using struct edid in drivers. They'll all parse the info differently, and from what I've seen, often wrong. Thanks for the feedback. In that case this specific change should probably rebase on the Melissa's work https://lore.kernel.org/amd-gfx/20240126163429.56714-1-m...@igalia.com/ after she takes into account the feedback. Let me ask you this though - do you think that after that's done should we let all drivers get EDID from BIOS as a priority? Or would you prefer that this is unique to amdgpu? Something like: 1) If user specifies on kernel command line and puts an EDID in /lib/firmware use that. 2) If BIOS has EDID in _DDC and it's eDP panel, use that. 3) Get panel EDID. BR, Jani. [1] https
Re: [PATCH 1/2] ACPI: video: Handle fetching EDID that is longer than 256 bytes
On 1/29/2024 07:54, Rafael J. Wysocki wrote: On Fri, Jan 26, 2024 at 7:55 PM Mario Limonciello wrote: The ACPI specification allows for an EDID to be up to 512 bytes but the _DDC EDID fetching code will only try up to 256 bytes. Modify the code to start at 512 bytes and work its way down instead. Link: https://uefi.org/htmlspecs/ACPI_Spec_6_4_html/Apx_B_Video_Extensions/output-device-specific-methods.html#ddc-return-the-edid-for-this-device Signed-off-by: Mario Limonciello --- drivers/acpi/acpi_video.c | 23 --- 1 file changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/acpi/acpi_video.c b/drivers/acpi/acpi_video.c index 62f4364e4460..b3b15dd4755d 100644 --- a/drivers/acpi/acpi_video.c +++ b/drivers/acpi/acpi_video.c @@ -624,6 +624,10 @@ acpi_video_device_EDID(struct acpi_video_device *device, arg0.integer.value = 1; else if (length == 256) arg0.integer.value = 2; + else if (length == 384) + arg0.integer.value = 3; + else if (length == 512) + arg0.integer.value = 4; It looks like switch () would be somewhat better. Or maybe even arg0.integer.value = length / 128; The validation could be added too: if (arg0.integer.value > 4 || arg0.integer.value * 128 != length) return -EINVAL; but it is pointless, because the caller is never passing an invalid number to it AFAICS. Thanks. I'll swap over to one of these suggestions. I will also split this patch separately from the other as the other will take some time with refactoring necessary in DRM that will take a cycle or two.
else return -EINVAL; @@ -1443,7 +1447,7 @@ int acpi_video_get_edid(struct acpi_device *device, int type, int device_id, for (i = 0; i < video->attached_count; i++) { video_device = video->attached_array[i].bind_info; - length = 256; + length = 512; if (!video_device) continue; @@ -1478,13 +1482,18 @@ int acpi_video_get_edid(struct acpi_device *device, int type, int device_id, if (ACPI_FAILURE(status) || !buffer || buffer->type != ACPI_TYPE_BUFFER) { - length = 128; - status = acpi_video_device_EDID(video_device, &buffer, - length); - if (ACPI_FAILURE(status) || !buffer || - buffer->type != ACPI_TYPE_BUFFER) { - continue; + while (length) { I would prefer a do {} while () loop here, which could include the first invocation of acpi_video_device_EDID() too (and reduce code duplication a bit). + length -= 128; + status = acpi_video_device_EDID(video_device, &buffer, + length); No line break, please. + if (ACPI_FAILURE(status) || !buffer || + buffer->type != ACPI_TYPE_BUFFER) { + continue; + } + break; } + if (!length) + continue; } *edid = buffer->buffer.pointer; --
Re: Making drm_gpuvm work across gpu devices
On 2024-01-29 10:33, Christian König wrote: Am 29.01.24 um 16:03 schrieb Felix Kuehling: On 2024-01-25 13:32, Daniel Vetter wrote: On Wed, Jan 24, 2024 at 09:33:12AM +0100, Christian König wrote: Am 23.01.24 um 20:37 schrieb Zeng, Oak: [SNIP] Yes most API are per device based. One exception I know is actually the kfd SVM API. If you look at the svm_ioctl function, it is per-process based. Each kfd_process represent a process across N gpu devices. Yeah and that was a big mistake in my opinion. We should really not do that ever again. Need to say, kfd SVM represent a shared virtual address space across CPU and all GPU devices on the system. This is by the definition of SVM (shared virtual memory). This is very different from our legacy gpu *device* driver which works for only one device (i.e., if you want one device to access another device's memory, you will have to use dma-buf export/import etc). Exactly that thinking is what we have currently found as blocker for a virtualization projects. Having SVM as device independent feature which somehow ties to the process address space turned out to be an extremely bad idea. The background is that this only works for some use cases but not all of them. What's working much better is to just have a mirror functionality which says that a range A..B of the process address space is mapped into a range C..D of the GPU address space. Those ranges can then be used to implement the SVM feature required for higher level APIs and not something you need at the UAPI or even inside the low level kernel memory management. When you talk about migrating memory to a device you also do this on a per device basis and *not* tied to the process address space. If you then get crappy performance because userspace gave contradicting information where to migrate memory then that's a bug in userspace and not something the kernel should try to prevent somehow. 
[SNIP] I think if you start using the same drm_gpuvm for multiple devices you will sooner or later start to run into the same mess we have seen with KFD, where we moved more and more functionality from the KFD to the DRM render node because we found that a lot of the stuff simply doesn't work correctly with a single object to maintain the state. As I understand it, KFD is designed to work across devices. A single pseudo /dev/kfd device represent all hardware gpu devices. That is why during kfd open, many pdd (process device data) is created, each for one hardware device for this process. Yes, I'm perfectly aware of that. And I can only repeat myself that I see this design as a rather extreme failure. And I think it's one of the reasons why NVidia is so dominant with Cuda. This whole approach KFD takes was designed with the idea of extending the CPU process into the GPUs, but this idea only works for a few use cases and is not something we should apply to drivers in general. A very good example are virtualization use cases where you end up with CPU address != GPU address because the VAs are actually coming from the guest VM and not the host process. SVM is a high level concept of OpenCL, Cuda, ROCm etc.. This should not have any influence on the design of the kernel UAPI. If you want to do something similar as KFD for Xe I think you need to get explicit permission to do this from Dave and Daniel and maybe even Linus. I think the one and only one exception where an SVM uapi like in kfd makes sense, is if the _hardware_ itself, not the software stack defined semantics that you've happened to build on top of that hw, enforces a 1:1 mapping with the cpu process address space. Which means your hardware is using PASID, IOMMU based translation, PCI-ATS (address translation services) or whatever your hw calls it and has _no_ device-side pagetables on top. 
Which from what I've seen all devices with device-memory have, simply because they need some place to store whether that memory is currently in device memory or should be translated using PASID. Currently there's no gpu that works with PASID only, but there are some on-cpu-die accelerator things that do work like that. Maybe in the future there will be some accelerators that are fully cpu cache coherent (including atomics) with something like CXL, and the on-device memory is managed as normal system memory with struct page as ZONE_DEVICE and accelerator va -> physical address translation is only done with PASID ... but for now I haven't seen that, definitely not in upstream drivers. And the moment you have some per-device pagetables or per-device memory management of some sort (like using gpuva mgr) then I'm 100% agreeing with Christian that the kfd SVM model is too strict and not a great idea. That basically means, without ATS/PRI+PASID you cannot implement a unified memory programming model, where GPUs or accelerators access virtual addresses without pre-registering them with an SVM API call. Unified memo
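The "mirror" model described above — a process VA range A..B mapped into a GPU VA range C..D, per device — can be illustrated with a trivial translation helper. This is purely a sketch of the concept; the struct and function names are invented for the example and imply no kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* One per-device mirror entry: CPU range [cpu_start, cpu_start + len)
 * maps to the GPU range starting at gpu_start.  Illustrative only. */
struct va_mirror {
	uint64_t cpu_start;
	uint64_t gpu_start;
	uint64_t len;
};

/* Translate a CPU VA through the mirror; returns 1 on success and
 * stores the GPU VA, or 0 when the address is outside the range. */
static int mirror_translate(const struct va_mirror *m, uint64_t cpu_va,
			    uint64_t *gpu_va)
{
	if (cpu_va < m->cpu_start || cpu_va >= m->cpu_start + m->len)
		return 0;

	*gpu_va = m->gpu_start + (cpu_va - m->cpu_start);
	return 1;
}

/* Demo mapping with made-up addresses. */
static const struct va_mirror demo_mirror = {
	.cpu_start = 0x1000,
	.gpu_start = 0x80000000ULL,
	.len = 0x2000,
};

static uint64_t demo_gpu_va;
```

The point of the sketch is that nothing here ties the GPU range to the CPU process address space as a whole — a virtualization layer can plug guest VAs into cpu_start just as easily.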
Re: [PATCH 2/2] drm/amd: Fetch the EDID from _DDC if available for eDP
On Mon, 29 Jan 2024, Mario Limonciello wrote: > On 1/29/2024 03:39, Jani Nikula wrote: >> On Fri, 26 Jan 2024, Mario Limonciello wrote: >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c >>> index 9caba10315a8..c7e1563a46d3 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c >>> @@ -278,6 +278,11 @@ static void amdgpu_connector_get_edid(struct >>> drm_connector *connector) >>> struct amdgpu_device *adev = drm_to_adev(dev); >>> struct amdgpu_connector *amdgpu_connector = >>> to_amdgpu_connector(connector); >>> >>> + if (amdgpu_connector->edid) >>> + return; >>> + >>> + /* if the BIOS specifies the EDID via _DDC, prefer this */ >>> + amdgpu_connector->edid = amdgpu_acpi_edid(adev, connector); >> >> Imagine the EDID returned by acpi_video_get_edid() has edid->extensions >> bigger than 4. Of course it should not, but you have no guarantees, and >> it originates outside of the kernel. >> >> The real fix is to have the function return a struct drm_edid which >> tracks the allocation size separately. Unfortunately, it requires a >> bunch of changes along the way. We've mostly done it in i915, and I've >> sent a series to do this in drm/bridge [1]. Looking at it again, perhaps the ACPI code should just return a blob, and the drm code should have a helper to wrap that around struct drm_edid, so that the ACPI code does not have to depend on drm. Basic idea remains. >> Bottom line, we should stop using struct edid in drivers. They'll all >> parse the info differently, and from what I've seen, often wrong. >> >> > > Thanks for the feedback. In that case this specific change should > probably rebase on the Melissa's work > https://lore.kernel.org/amd-gfx/20240126163429.56714-1-m...@igalia.com/ > after she takes into account the feedback. 
> > Let me ask you this though - do you think that after that's done should > we let all drivers get EDID from BIOS as a priority? Or would you > prefer that this is unique to amdgpu? If the reason for having this is that the panel EDID contains some garbage, that's certainly not unique to amdgpu... :p > Something like: > > 1) If user specifies on kernel command line and puts an EDID in > /lib/firmware use that. > 2) If BIOS has EDID in _DDC and it's eDP panel, use that. I think we should also look into this. We currently don't do this, and it might help with some machines. However, gut feeling says it's probably better to keep this as a per driver decision instead of trying to bolt it into drm helpers. BR, Jani. > 3) Get panel EDID. > -- Jani Nikula, Intel
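The priority order proposed above (user-supplied firmware EDID, then ACPI _DDC for eDP panels, then the panel's own EDID) boils down to a simple selection policy. The enum and function below are hypothetical names sketching that policy, not driver code.

```c
#include <assert.h>

/* Illustrative EDID sources, in the priority order discussed. */
enum edid_source { EDID_NONE, EDID_FIRMWARE, EDID_ACPI_DDC, EDID_PANEL };

static enum edid_source pick_edid_source(int have_firmware_override,
					 int have_acpi_ddc, int is_edp,
					 int have_panel_edid)
{
	if (have_firmware_override)
		return EDID_FIRMWARE;	/* 1) user-supplied EDID wins */
	if (have_acpi_ddc && is_edp)
		return EDID_ACPI_DDC;	/* 2) BIOS _DDC, eDP only */
	if (have_panel_edid)
		return EDID_PANEL;	/* 3) fall back to the panel */
	return EDID_NONE;
}
```

Whether step 2 lives in a drm helper or stays a per-driver decision is exactly the open question in the thread.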
Re: [PATCH] drm/xe: Fix a build error
Am 27.01.24 um 16:53 schrieb Oak Zeng: This fixes a build failure on drm-tip. This issue was introduced during merge of "drm/ttm: replace busy placement with flags v6". For some reason, the xe_bo.c part of above change is not merged. Manually merge the missing part to drm_tip Mhm, I provided this as manual fixup for drm-tip in this rerere commit: commit afc5797e8c03bed3ec47a34f2bc3cf03fce24411 Author: Christian König Date: Thu Jan 25 10:44:54 2024 +0100 2024y-01m-25d-09h-44m-07s UTC: drm-tip rerere cache update git version 2.34.1 And for me compiling xe in drm-tip worked fine after that. No idea why that didn't work for you. Anyway feel free to add my rb to this patch here if it helps in any way. Regards, Christian. Signed-off-by: Oak Zeng --- drivers/gpu/drm/xe/xe_bo.c | 33 +++-- 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 686d716c5581..d6a193060cc0 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -38,22 +38,26 @@ static const struct ttm_place sys_placement_flags = { static struct ttm_placement sys_placement = { .num_placement = 1, .placement = &sys_placement_flags, - .num_busy_placement = 1, - .busy_placement = &sys_placement_flags, }; -static const struct ttm_place tt_placement_flags = { - .fpfn = 0, - .lpfn = 0, - .mem_type = XE_PL_TT, - .flags = 0, +static const struct ttm_place tt_placement_flags[] = { + { + .fpfn = 0, + .lpfn = 0, + .mem_type = XE_PL_TT, + .flags = TTM_PL_FLAG_DESIRED, + }, + { + .fpfn = 0, + .lpfn = 0, + .mem_type = XE_PL_SYSTEM, + .flags = TTM_PL_FLAG_FALLBACK, + } }; static struct ttm_placement tt_placement = { - .num_placement = 1, - .placement = &tt_placement_flags, - .num_busy_placement = 1, - .busy_placement = &sys_placement_flags, + .num_placement = 2, + .placement = tt_placement_flags, }; bool mem_type_is_vram(u32 mem_type) @@ -230,8 +234,6 @@ static int __xe_bo_placement_for_flags(struct xe_device *xe, struct xe_bo *bo, 
bo->placement = (struct ttm_placement) { .num_placement = c, .placement = bo->placements, - .num_busy_placement = c, - .busy_placement = bo->placements, }; return 0; @@ -251,7 +253,6 @@ static void xe_evict_flags(struct ttm_buffer_object *tbo, /* Don't handle scatter gather BOs */ if (tbo->type == ttm_bo_type_sg) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } @@ -1391,8 +1392,6 @@ static int __xe_bo_fixed_placement(struct xe_device *xe, bo->placement = (struct ttm_placement) { .num_placement = 1, .placement = place, - .num_busy_placement = 1, - .busy_placement = place, }; return 0; @@ -2150,9 +2149,7 @@ int xe_bo_migrate(struct xe_bo *bo, u32 mem_type) xe_place_from_ttm_type(mem_type, &requested); placement.num_placement = 1; - placement.num_busy_placement = 1; placement.placement = &requested; - placement.busy_placement = &requested; /* * Stolen needs to be handled like below VRAM handling if we ever need
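For context on what the patch above converts to: with "drm/ttm: replace busy placement with flags v6", the separate busy_placement list is replaced by per-entry flags, where TTM_PL_FLAG_DESIRED entries are tried first and TTM_PL_FLAG_FALLBACK entries only when the desired ones cannot be satisfied. The snippet below models that two-pass selection with made-up names; it is not TTM code.

```c
#include <assert.h>

#define PL_FLAG_DESIRED  (1 << 0)	/* models TTM_PL_FLAG_DESIRED */
#define PL_FLAG_FALLBACK (1 << 1)	/* models TTM_PL_FLAG_FALLBACK */

struct demo_place {
	int mem_type;
	unsigned int flags;
};

/* Mirrors the shape of tt_placement_flags[] from the patch: TT is
 * desired, SYSTEM is the fallback.  Memory type numbers are made up. */
static const struct demo_place demo_tt_places[] = {
	{ .mem_type = 1, .flags = PL_FLAG_DESIRED },	/* "TT" */
	{ .mem_type = 0, .flags = PL_FLAG_FALLBACK },	/* "SYSTEM" */
};

/* Two-pass selection: skip FALLBACK entries on the first pass, then
 * retry with only FALLBACK entries.  avail_mask has one bit per
 * memory type that currently has room. */
static int pick_placement(const struct demo_place *p, int n,
			  unsigned int avail_mask)
{
	int i;

	for (i = 0; i < n; i++)
		if (!(p[i].flags & PL_FLAG_FALLBACK) &&
		    (avail_mask & (1u << p[i].mem_type)))
			return p[i].mem_type;

	for (i = 0; i < n; i++)
		if ((p[i].flags & PL_FLAG_FALLBACK) &&
		    (avail_mask & (1u << p[i].mem_type)))
			return p[i].mem_type;

	return -1;
}
```

This is why a single two-entry placement array can replace the old pair of placement/busy_placement lists.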
Re: [PATCH] drm: bridge: samsung-dsim: Don't use FORCE_STOP_STATE
On 29.01.24 10:20, Frieder Schrempf wrote: > On 26.01.24 19:28, Dave Airlie wrote: >> Just FYI this conflicted pretty heavily with drm-misc-next changes in >> the same area, someone should check drm-tip has the correct >> resolution, I'm not really sure what it definitely should be. >> >> Dave. > > Thanks! I took a quick look at what is now in Linus' tree and it looks > correct to me. The only thing I'm missing is my Reviewed-by tag which > got lost somewhere, but I can get over that. Apparently I missed the point here. I was looking at the wrong trees (drm-next and master instead of drm-misc-next and drm-tip). Sorry for the noise. Michael already pointed out the correct details.
Re: [PATCH 2/2] drm/amd: Fetch the EDID from _DDC if available for eDP
On 1/29/2024 10:46, Jani Nikula wrote: On Mon, 29 Jan 2024, Mario Limonciello wrote: On 1/29/2024 03:39, Jani Nikula wrote: On Fri, 26 Jan 2024, Mario Limonciello wrote: diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c index 9caba10315a8..c7e1563a46d3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c @@ -278,6 +278,11 @@ static void amdgpu_connector_get_edid(struct drm_connector *connector) struct amdgpu_device *adev = drm_to_adev(dev); struct amdgpu_connector *amdgpu_connector = to_amdgpu_connector(connector); + if (amdgpu_connector->edid) + return; + + /* if the BIOS specifies the EDID via _DDC, prefer this */ + amdgpu_connector->edid = amdgpu_acpi_edid(adev, connector); Imagine the EDID returned by acpi_video_get_edid() has edid->extensions bigger than 4. Of course it should not, but you have no guarantees, and it originates outside of the kernel. The real fix is to have the function return a struct drm_edid which tracks the allocation size separately. Unfortunately, it requires a bunch of changes along the way. We've mostly done it in i915, and I've sent a series to do this in drm/bridge [1]. Looking at it again, perhaps the ACPI code should just return a blob, and the drm code should have a helper to wrap that around struct drm_edid, so that the ACPI code does not have to depend on drm. Basic idea remains. I'd ideally like to split this stuff and Melissa's rework to be independent if possible. I'll see if that's actually feasible. Bottom line, we should stop using struct edid in drivers. They'll all parse the info differently, and from what I've seen, often wrong. Thanks for the feedback. In that case this specific change should probably rebase on the Melissa's work https://lore.kernel.org/amd-gfx/20240126163429.56714-1-m...@igalia.com/ after she takes into account the feedback. 
Let me ask you this though - do you think that after that's done should we let all drivers get EDID from BIOS as a priority? Or would you prefer that this is unique to amdgpu? If the reason for having this is that the panel EDID contains some garbage, that's certainly not unique to amdgpu... :p OK; maybe a helper in DRM that wraps the ACPI code then and amdgpu will use the helper for this series. I'm also thinking it makes sense to have a new /proc/cmdline setup option to ignore the BIOS for EDID. I'm hoping that since Windows uses _DDC that BIOS will be higher quality; but you know; BIOS =) Something like: 1) If user specifies on kernel command line and puts an EDID in /lib/firmware use that. 2) If BIOS has EDID in _DDC and it's eDP panel, use that. I think we should also look into this. We currently don't do this, and it might help with some machines. However, gut feeling says it's probably better to keep this as a per driver decision instead of trying to bolt it into drm helpers. OK; I'll wire up the helper and if you want to use in the future you can too then. BR, Jani. 3) Get panel EDID.
[PATCH v6 0/6] iio: new DMABUF based API, v6
Hi Jonathan,

This is the v6 of my patchset that introduces a new interface based on DMABUF objects. The code was updated quite a bit, using the feedback on the list for this patchset but also the feedback I received on the FunctionFS patchset that I'm working on upstreaming in parallel [1], where the DMABUF handling code is very similar. See below for the full changelog.

I decided to drop the scope-based memory management for dma_buf and I hope you are OK with that. Christian wants the patch(es) to support scope-based memory management in dma-buf as a separate patchset; once it's in, I will gladly send a follow-up patch to use __free() where it makes sense.

For performance numbers, I'll point you to the cover letter for my v5 patchset [2].

This patchset was based on next-20240129.

Cheers,
-Paul

[1] https://lore.kernel.org/all/20230322092118.9213-1-p...@crapouillou.net/
[2] https://lore.kernel.org/linux-iio/219abc43b4fdd4a13b307ed2efaa0e6869e68e3f.ca...@gmail.com/T/

---
Changelog:

* [2/6]:
  - Use new prototype for axi_dmac_alloc_desc() as it changed upstream

* [3/6]:
  - Remove dead code in iio_dma_resv_lock()
  - Fix non-block actually blocking
  - Cache dma_buf_attachment instead of mapping/unmapping it for every transfer
  - Return -EINVAL instead of IIO_IOCTL_UNHANDLED for unknown ioctl
  - Make .block_enqueue() callback take a dma_fence pointer, which will be passed to iio_buffer_signal_dmabuf_done() instead of the dma_buf_attachment; and remove the backpointer from the priv structure to the dma_fence.
  - Use dma_fence_begin/end_signalling in the dma_fence critical sections
  - Unref dma_fence and dma_buf_attachment in worker, because they might try to lock the dma_resv, which would deadlock.
  - Add buffer ops to lock/unlock the queue. This is motivated by the fact that once the dma_fence has been installed, we cannot lock anything anymore - so the queue must be locked before the dma_fence is installed.
  - Use 'long retl' variable to handle the return value of dma_resv_wait_timeout()
  - Protect dmabufs list access with a mutex
  - Rework iio_buffer_find_attachment() to use the internal dmabufs list, instead of messing with dmabufs private data.
  - Add an atomically-increasing sequence number for fences

* [4/6]:
  - Update iio_dma_buffer_enqueue_dmabuf() to take a dma_fence pointer
  - Pass that dma_fence pointer along to iio_buffer_signal_dmabuf_done()
  - Add iio_dma_buffer_lock_queue() / iio_dma_buffer_unlock_queue()
  - Do not lock the queue in iio_dma_buffer_enqueue_dmabuf(). The caller will ensure that it has been locked already.
  - Replace "int += bool;" by "if (bool) int++;"
  - Use dma_fence_begin/end_signalling in the dma_fence critical sections
  - Use one "num_dmabufs" field instead of one "num_blocks" and one "num_fileio_blocks". Make it an atomic_t, which makes it possible to decrement it atomically in iio_buffer_block_release() without having to lock the queue mutex; and in turn, it means that we don't need to use iio_buffer_block_put_atomic() everywhere to avoid locking the queue mutex twice.
  - Use cleanup.h guard(mutex) when possible
  - Explicitly list all states in the switch in iio_dma_can_enqueue_block()
  - Rename iio_dma_buffer_fileio_mode() to iio_dma_buffer_can_use_fileio(), and add a comment explaining why it cannot race vs. DMABUF.

* [5/6]:
  - Populate .lock_queue / .unlock_queue callbacks
  - Switch to atomic memory allocations in .submit_queue, because of the dma_fence critical section
  - Make sure that the size of the scatterlist is enough

---
Paul Cercueil (6):
  dmaengine: Add API function dmaengine_prep_slave_dma_vec()
  dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
  iio: core: Add new DMABUF interface infrastructure
  iio: buffer-dma: Enable support for DMABUFs
  iio: buffer-dmaengine: Support new DMABUF based userspace API
  Documentation: iio: Document high-speed DMABUF based API

 Documentation/iio/dmabuf_api.rst              |  54 ++
 Documentation/iio/index.rst                   |   2 +
 drivers/dma/dma-axi-dmac.c                    |  40 ++
 drivers/iio/buffer/industrialio-buffer-dma.c  | 181 ++-
 .../buffer/industrialio-buffer-dmaengine.c    |  58 ++-
 drivers/iio/industrialio-buffer.c             | 462 ++
 include/linux/dmaengine.h                     |  25 +
 include/linux/iio/buffer-dma.h                |  31 ++
 include/linux/iio/buffer_impl.h               |  33 ++
 include/uapi/linux/iio/buffer.h               |  22 +
 10 files changed, 891 insertions(+), 17 deletions(-)
 create mode 100644 Documentation/iio/dmabuf_api.rst

-- 
2.43.0
[PATCH v6 2/6] dmaengine: dma-axi-dmac: Implement device_prep_slave_dma_vec
Add implementation of the .device_prep_slave_dma_vec() callback. Signed-off-by: Paul Cercueil --- v3: New patch v5: Implement .device_prep_slave_dma_vec() instead of v3's .device_prep_slave_dma_array(). v6: Use new prototype for axi_dmac_alloc_desc() as it changed upstream. --- drivers/dma/dma-axi-dmac.c | 40 ++ 1 file changed, 40 insertions(+) diff --git a/drivers/dma/dma-axi-dmac.c b/drivers/dma/dma-axi-dmac.c index 4e339c04fc1e..276856a1742d 100644 --- a/drivers/dma/dma-axi-dmac.c +++ b/drivers/dma/dma-axi-dmac.c @@ -620,6 +620,45 @@ static struct axi_dmac_sg *axi_dmac_fill_linear_sg(struct axi_dmac_chan *chan, return sg; } +static struct dma_async_tx_descriptor * +axi_dmac_prep_slave_dma_vec(struct dma_chan *c, const struct dma_vec *vecs, + size_t nb, enum dma_transfer_direction direction, + unsigned long flags) +{ + struct axi_dmac_chan *chan = to_axi_dmac_chan(c); + struct axi_dmac_desc *desc; + unsigned int num_sgs = 0; + struct axi_dmac_sg *dsg; + size_t i; + + if (direction != chan->direction) + return NULL; + + for (i = 0; i < nb; i++) + num_sgs += DIV_ROUND_UP(vecs[i].len, chan->max_length); + + desc = axi_dmac_alloc_desc(chan, num_sgs); + if (!desc) + return NULL; + + dsg = desc->sg; + + for (i = 0; i < nb; i++) { + if (!axi_dmac_check_addr(chan, vecs[i].addr) || + !axi_dmac_check_len(chan, vecs[i].len)) { + kfree(desc); + return NULL; + } + + dsg = axi_dmac_fill_linear_sg(chan, direction, vecs[i].addr, 1, + vecs[i].len, dsg); + } + + desc->cyclic = false; + + return vchan_tx_prep(&chan->vchan, &desc->vdesc, flags); +} + static struct dma_async_tx_descriptor *axi_dmac_prep_slave_sg( struct dma_chan *c, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction direction, @@ -1055,6 +1094,7 @@ static int axi_dmac_probe(struct platform_device *pdev) dma_dev->device_tx_status = dma_cookie_status; dma_dev->device_issue_pending = axi_dmac_issue_pending; dma_dev->device_prep_slave_sg = axi_dmac_prep_slave_sg; + 
dma_dev->device_prep_slave_dma_vec = axi_dmac_prep_slave_dma_vec; dma_dev->device_prep_dma_cyclic = axi_dmac_prep_dma_cyclic; dma_dev->device_prep_interleaved_dma = axi_dmac_prep_interleaved; dma_dev->device_terminate_all = axi_dmac_terminate_all; -- 2.43.0
[PATCH v6 1/6] dmaengine: Add API function dmaengine_prep_slave_dma_vec()
This function can be used to initiate a scatter-gather DMA transfer, where the address and size of each segment is located in one entry of the dma_vec array. The major difference with dmaengine_prep_slave_sg() is that it supports specifying the lengths of each DMA transfer; as trying to override the length of the transfer with dmaengine_prep_slave_sg() is a very tedious process. The introduction of a new API function is also justified by the fact that scatterlists are on their way out. Note that dmaengine_prep_interleaved_dma() is not helpful either in that case, as it assumes that the address of each segment will be higher than the one of the previous segment, which we just cannot guarantee in case of a scatter-gather transfer. Signed-off-by: Paul Cercueil --- v3: New patch v5: Replace with function dmaengine_prep_slave_dma_vec(), and struct 'dma_vec'. Note that at some point we will need to support cyclic transfers using dmaengine_prep_slave_dma_vec(). Maybe with a new "flags" parameter to the function? --- include/linux/dmaengine.h | 25 + 1 file changed, 25 insertions(+) diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h index 3df70d6131c8..ee5931ddb42f 100644 --- a/include/linux/dmaengine.h +++ b/include/linux/dmaengine.h @@ -160,6 +160,16 @@ struct dma_interleaved_template { struct data_chunk sgl[]; }; +/** + * struct dma_vec - DMA vector + * @addr: Bus address of the start of the vector + * @len: Length in bytes of the DMA vector + */ +struct dma_vec { + dma_addr_t addr; + size_t len; +}; + /** * enum dma_ctrl_flags - DMA flags to augment operation preparation, * control completion, and communicate status. 
@@ -910,6 +920,10 @@ struct dma_device { struct dma_async_tx_descriptor *(*device_prep_dma_interrupt)( struct dma_chan *chan, unsigned long flags); + struct dma_async_tx_descriptor *(*device_prep_slave_dma_vec)( + struct dma_chan *chan, const struct dma_vec *vecs, + size_t nents, enum dma_transfer_direction direction, + unsigned long flags); struct dma_async_tx_descriptor *(*device_prep_slave_sg)( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction direction, @@ -972,6 +986,17 @@ static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_single( dir, flags, NULL); } +static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_dma_vec( + struct dma_chan *chan, const struct dma_vec *vecs, size_t nents, + enum dma_transfer_direction dir, unsigned long flags) +{ + if (!chan || !chan->device || !chan->device->device_prep_slave_dma_vec) + return NULL; + + return chan->device->device_prep_slave_dma_vec(chan, vecs, nents, + dir, flags); +} + static inline struct dma_async_tx_descriptor *dmaengine_prep_slave_sg( struct dma_chan *chan, struct scatterlist *sgl, unsigned int sg_len, enum dma_transfer_direction dir, unsigned long flags) -- 2.43.0
[PATCH v6 3/6] iio: core: Add new DMABUF interface infrastructure
Add the necessary infrastructure to the IIO core to support a new optional DMABUF based interface.

With this new interface, DMABUF objects (externally created) can be attached to an IIO buffer, and subsequently used for data transfer. A userspace application can then use this interface to share DMABUF objects between several interfaces, allowing it to transfer data in a zero-copy fashion, for instance between IIO and the USB stack. The userspace application can also memory-map the DMABUF objects, and access the sample data directly. The advantage of doing this vs. the read() interface is that it avoids an extra copy of the data between the kernel and userspace. This is particularly useful for high-speed devices which produce several megabytes or even gigabytes of data per second.

As part of the interface, 3 new IOCTLs have been added:

IIO_BUFFER_DMABUF_ATTACH_IOCTL(int fd):
 Attach the DMABUF object identified by the given file descriptor to the buffer.

IIO_BUFFER_DMABUF_DETACH_IOCTL(int fd):
 Detach the DMABUF object identified by the given file descriptor from the buffer. Note that closing the IIO buffer's file descriptor will automatically detach all previously attached DMABUF objects.

IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *):
 Request a data transfer to/from the given DMABUF object. Its file descriptor, as well as the transfer size and flags are provided in the "iio_dmabuf" structure.

These three IOCTLs have to be performed on the IIO buffer's file descriptor, obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.

Signed-off-by: Paul Cercueil --- v2: Only allow the new IOCTLs on the buffer FD created with IIO_BUFFER_GET_FD_IOCTL(). v3: - Get rid of the old IOCTLs. The IIO subsystem does not create or manage DMABUFs anymore, and only attaches/detaches externally created DMABUFs. - Add IIO_BUFFER_DMABUF_CYCLIC to the supported flags.
v5: - Use dev_err() instead of pr_err() - Inline to_iio_dma_fence() - Add comment to explain why we unref twice when detaching dmabuf - Remove TODO comment. It is actually safe to free the file's private data even when transfers are still pending because it won't be accessed. - Fix documentation of new fields in struct iio_buffer_access_funcs - iio_dma_resv_lock() does not need to be exported, make it static v6: - Remove dead code in iio_dma_resv_lock() - Fix non-block actually blocking - Cache dma_buf_attachment instead of mapping/unmapping it for every transfer - Return -EINVAL instead of IIO_IOCTL_UNHANDLED for unknown ioctl - Make .block_enqueue() callback take a dma_fence pointer, which will be passed to iio_buffer_signal_dmabuf_done() instead of the dma_buf_attachment; and remove the backpointer from the priv structure to the dma_fence. - Use dma_fence_begin/end_signalling in the dma_fence critical sections - Unref dma_fence and dma_buf_attachment in worker, because they might try to lock the dma_resv, which would deadlock. - Add buffer ops to lock/unlock the queue. This is motivated by the fact that once the dma_fence has been installed, we cannot lock anything anymore - so the queue must be locked before the dma_fence is installed. - Use 'long retl' variable to handle the return value of dma_resv_wait_timeout() - Protect dmabufs list access with a mutex - Rework iio_buffer_find_attachment() to use the internal dmabufs list, instead of messing with dmabufs private data. - Add an atomically-increasing sequence number for fences --- drivers/iio/industrialio-buffer.c | 462 ++ include/linux/iio/buffer_impl.h | 33 +++ include/uapi/linux/iio/buffer.h | 22 ++ 3 files changed, 517 insertions(+) diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c index b581a7e80566..0e63a09fa90a 100644 --- a/drivers/iio/industrialio-buffer.c +++ b/drivers/iio/industrialio-buffer.c @@ -9,14 +9,19 @@ * - Better memory allocation techniques? 
* - Alternative access techniques? */ +#include #include #include #include #include +#include +#include +#include #include #include #include #include +#include #include #include @@ -28,6 +33,32 @@ #include #include +#define DMABUF_ENQUEUE_TIMEOUT_MS 5000 + +struct iio_dma_fence; + +struct iio_dmabuf_priv { + struct list_head entry; + struct kref ref; + + struct iio_buffer *buffer; + struct iio_dma_buffer_block *block; + + u64 context; + spinlock_t lock; + + struct dma_buf_attachment *attach; + struct sg_table *sgt; + enum dma_data_direction dir; + atomic_t seqno; +}; + +struct iio_dma_fence { + struct dma_fence base; + struct iio_dmabuf_priv *priv; + struct work_struct work; +}; + static const char * const iio_endian_prefix[] = { [IIO_BE] = "be", [IIO_LE] = "le", @@ -332,6 +3
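A minimal userspace view of the three IOCTLs described in the commit message: the struct layout and ioctl numbers below are copied from this series' include/uapi/linux/iio/buffer.h (spelled with stdint types instead of __u32/__u64), so treat them as belonging to the patchset rather than a released kernel; enqueue_dmabuf() is an illustrative call sequence only, with no real device assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/ioctl.h>
#include <sys/ioctl.h>

/* Mirror of the uapi struct from the patchset (stdint spellings). */
struct iio_dmabuf {
	uint32_t fd;		/* DMABUF file descriptor */
	uint32_t flags;		/* 0, or IIO_BUFFER_DMABUF_CYCLIC */
	uint64_t bytes_used;	/* number of bytes to transfer */
};

#define IIO_BUFFER_DMABUF_ATTACH_IOCTL	_IOW('i', 0x92, int)
#define IIO_BUFFER_DMABUF_DETACH_IOCTL	_IOW('i', 0x93, int)
#define IIO_BUFFER_DMABUF_ENQUEUE_IOCTL	_IOW('i', 0x94, struct iio_dmabuf)

/* Illustrative flow: attach a DMABUF to the buffer fd (itself
 * obtained via IIO_BUFFER_GET_FD_IOCTL()), then enqueue a transfer.
 * Returns -1 on ioctl failure, the raw ioctl result otherwise. */
static int enqueue_dmabuf(int buffer_fd, int dmabuf_fd, uint64_t nbytes)
{
	struct iio_dmabuf req = {
		.fd = (uint32_t)dmabuf_fd,
		.flags = 0,
		.bytes_used = nbytes,
	};

	if (ioctl(buffer_fd, IIO_BUFFER_DMABUF_ATTACH_IOCTL, &dmabuf_fd) < 0)
		return -1;

	return ioctl(buffer_fd, IIO_BUFFER_DMABUF_ENQUEUE_IOCTL, &req);
}
```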
[PATCH v6 5/6] iio: buffer-dmaengine: Support new DMABUF based userspace API
Use the functions provided by the buffer-dma core to implement the DMABUF userspace API in the buffer-dmaengine IIO buffer implementation.

Since we want to be able to transfer an arbitrary number of bytes and not necessarily the full DMABUF, the associated scatterlist is converted to an array of DMA addresses + lengths, which is then passed to dmaengine_prep_slave_dma_array().

Signed-off-by: Paul Cercueil

---
v3: Use the new dmaengine_prep_slave_dma_array(), and adapt the code to work with the new functions introduced in industrialio-buffer-dma.c.

v5: - Use the new dmaengine_prep_slave_dma_vec().
    - Restrict to input buffers, since output buffers are not yet supported by IIO buffers.

v6: - Populate .lock_queue / .unlock_queue callbacks
    - Switch to atomic memory allocations in .submit_queue, because of the dma_fence critical section
    - Make sure that the size of the scatterlist is enough
---
 .../buffer/industrialio-buffer-dmaengine.c | 58 +--
 1 file changed, 52 insertions(+), 6 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dmaengine.c b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
index 45fe7d0d42ee..c4cfdb0c1231 100644
--- a/drivers/iio/buffer/industrialio-buffer-dmaengine.c
+++ b/drivers/iio/buffer/industrialio-buffer-dmaengine.c
@@ -64,15 +64,54 @@ static int iio_dmaengine_buffer_submit_block(struct iio_dma_buffer_queue *queue,
 	struct dmaengine_buffer *dmaengine_buffer =
 		iio_buffer_to_dmaengine_buffer(&queue->buffer);
 	struct dma_async_tx_descriptor *desc;
+	struct scatterlist *sgl;
+	struct dma_vec *vecs;
 	dma_cookie_t cookie;
+	size_t len_total;
+	size_t max_size;
+	unsigned int i;
+	int nents;
 
-	block->bytes_used = min(block->size, dmaengine_buffer->max_size);
-	block->bytes_used = round_down(block->bytes_used,
-			dmaengine_buffer->align);
+	if (queue->buffer.direction != IIO_BUFFER_DIRECTION_IN) {
+		/* We do not yet support output buffers.
		 */
+		return -EINVAL;
+	}
 
-	desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
-		block->phys_addr, block->bytes_used, DMA_DEV_TO_MEM,
-		DMA_PREP_INTERRUPT);
+	if (block->sg_table) {
+		sgl = block->sg_table->sgl;
+		nents = sg_nents_for_len(sgl, block->bytes_used);
+		if (nents < 0)
+			return nents;
+
+		vecs = kmalloc_array(nents, sizeof(*vecs), GFP_ATOMIC);
+		if (!vecs)
+			return -ENOMEM;
+
+		len_total = block->bytes_used;
+
+		for (i = 0; i < nents; i++) {
+			vecs[i].addr = sg_dma_address(sgl);
+			vecs[i].len = min(sg_dma_len(sgl), len_total);
+			len_total -= vecs[i].len;
+
+			sgl = sg_next(sgl);
+		}
+
+		desc = dmaengine_prep_slave_dma_vec(dmaengine_buffer->chan,
+						    vecs, nents, DMA_DEV_TO_MEM,
+						    DMA_PREP_INTERRUPT);
+		kfree(vecs);
+	} else {
+		max_size = min(block->size, dmaengine_buffer->max_size);
+		max_size = round_down(max_size, dmaengine_buffer->align);
+		block->bytes_used = max_size;
+
+		desc = dmaengine_prep_slave_single(dmaengine_buffer->chan,
+						   block->phys_addr,
+						   block->bytes_used,
+						   DMA_DEV_TO_MEM,
+						   DMA_PREP_INTERRUPT);
+	}
 
 	if (!desc)
 		return -ENOMEM;
@@ -120,6 +159,13 @@ static const struct iio_buffer_access_funcs iio_dmaengine_buffer_ops = {
 	.data_available = iio_dma_buffer_data_available,
 	.release = iio_dmaengine_buffer_release,
 
+	.enqueue_dmabuf = iio_dma_buffer_enqueue_dmabuf,
+	.attach_dmabuf = iio_dma_buffer_attach_dmabuf,
+	.detach_dmabuf = iio_dma_buffer_detach_dmabuf,
+
+	.lock_queue = iio_dma_buffer_lock_queue,
+	.unlock_queue = iio_dma_buffer_unlock_queue,
+
 	.modes = INDIO_BUFFER_HARDWARE,
 	.flags = INDIO_BUFFER_FLAG_FIXED_WATERMARK,
};
-- 
2.43.0
[PATCH v6 4/6] iio: buffer-dma: Enable support for DMABUFs
Implement iio_dma_buffer_attach_dmabuf(), iio_dma_buffer_detach_dmabuf() and iio_dma_buffer_transfer_dmabuf(), which can then be used by the IIO DMA buffer implementations.

Signed-off-by: Paul Cercueil

---
v3: Update code to provide the functions that will be used as callbacks for the new IOCTLs.

v6: - Update iio_dma_buffer_enqueue_dmabuf() to take a dma_fence pointer
    - Pass that dma_fence pointer along to iio_buffer_signal_dmabuf_done()
    - Add iio_dma_buffer_lock_queue() / iio_dma_buffer_unlock_queue()
    - Do not lock the queue in iio_dma_buffer_enqueue_dmabuf(). The caller will ensure that it has been locked already.
    - Replace "int += bool;" by "if (bool) int++;"
    - Use dma_fence_begin/end_signalling in the dma_fence critical sections
    - Use one "num_dmabufs" field instead of one "num_blocks" and one "num_fileio_blocks". Make it an atomic_t, which makes it possible to decrement it atomically in iio_buffer_block_release() without having to lock the queue mutex; and in turn, it means that we don't need to use iio_buffer_block_put_atomic() everywhere to avoid locking the queue mutex twice.
    - Use cleanup.h guard(mutex) when possible
    - Explicitly list all states in the switch in iio_dma_can_enqueue_block()
    - Rename iio_dma_buffer_fileio_mode() to iio_dma_buffer_can_use_fileio(), and add a comment explaining why it cannot race vs. DMABUF.
---
 drivers/iio/buffer/industrialio-buffer-dma.c | 181 +--
 include/linux/iio/buffer-dma.h               |  31 
 2 files changed, 201 insertions(+), 11 deletions(-)

diff --git a/drivers/iio/buffer/industrialio-buffer-dma.c b/drivers/iio/buffer/industrialio-buffer-dma.c
index 5610ba67925e..c0f539af98f9 100644
--- a/drivers/iio/buffer/industrialio-buffer-dma.c
+++ b/drivers/iio/buffer/industrialio-buffer-dma.c
@@ -4,6 +4,8 @@
  * Author: Lars-Peter Clausen
  */
+#include +#include #include #include #include
@@ -14,6 +16,8 @@ #include #include #include +#include +#include #include #include
@@ -94,13 +98,18 @@ static void iio_buffer_block_release(struct kref *kref)
 {
 	struct iio_dma_buffer_block *block = container_of(kref,
 		struct iio_dma_buffer_block, kref);
+	struct iio_dma_buffer_queue *queue = block->queue;
 
-	WARN_ON(block->state != IIO_BLOCK_STATE_DEAD);
+	WARN_ON(block->fileio && block->state != IIO_BLOCK_STATE_DEAD);
 
-	dma_free_coherent(block->queue->dev, PAGE_ALIGN(block->size),
-			  block->vaddr, block->phys_addr);
+	if (block->fileio) {
+		dma_free_coherent(queue->dev, PAGE_ALIGN(block->size),
+				  block->vaddr, block->phys_addr);
+	} else {
+		atomic_dec(&queue->num_dmabufs);
+	}
 
-	iio_buffer_put(&block->queue->buffer);
+	iio_buffer_put(&queue->buffer);
 	kfree(block);
 }
@@ -163,7 +172,7 @@ static struct iio_dma_buffer_queue *iio_buffer_to_queue(struct iio_buffer *buf)
 }
 
 static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
-	struct iio_dma_buffer_queue *queue, size_t size)
+	struct iio_dma_buffer_queue *queue, size_t size, bool fileio)
 {
 	struct iio_dma_buffer_block *block;
@@ -171,13 +180,16 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
 	if (!block)
 		return NULL;
 
-	block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
-		&block->phys_addr, GFP_KERNEL);
-	if (!block->vaddr) {
-		kfree(block);
-		return NULL;
+	if (fileio) {
+		block->vaddr = dma_alloc_coherent(queue->dev, PAGE_ALIGN(size),
+			&block->phys_addr, GFP_KERNEL);
+		if (!block->vaddr) {
+			kfree(block);
+			return NULL;
+		}
 	}
 
+	block->fileio = fileio;
 	block->size = size;
 	block->state = IIO_BLOCK_STATE_DONE;
 	block->queue = queue;
@@ -186,6 +198,9 @@ static struct iio_dma_buffer_block *iio_dma_buffer_alloc_block(
 	iio_buffer_get(&queue->buffer);
 
+	if (!fileio)
+		atomic_inc(&queue->num_dmabufs);
+
 	return block;
 }
@@ -206,13 +221,21 @@ void iio_dma_buffer_block_done(struct iio_dma_buffer_block *block)
 {
 	struct iio_dma_buffer_queue *queue = block->queue;
 	unsigned long flags;
+	bool cookie;
+
+	cookie = dma_fence_begin_signalling();
 
 	spin_lock_irqsave(&queue->list_lock, flags);
 	_iio_dma_buffer_block_done(block);
 	spin_unlock_irqrestore(&queue->list_lock, flags);
 
+	if (!block->fileio)
+		iio_buffer_signal_dmabuf_done(block->fence, 0);
+
 	iio_buffer_block_put_atomic(block);
 	wake_up_interruptible_poll(&queue->buffer.pollq, EPOLLIN | EPOLLRDN
[PATCH v6 6/6] Documentation: iio: Document high-speed DMABUF based API
Document the new DMABUF based API.

Signed-off-by: Paul Cercueil

---
v2: - Explicitly state that the new interface is optional and is not implemented by all drivers.
    - The IOCTLs can now only be called on the buffer FD returned by IIO_BUFFER_GET_FD_IOCTL.
    - Move the page up a bit in the index since it is core stuff and not driver-specific.

v3: Update the documentation to reflect the new API.

v5: Use description lists for the documentation of the three new IOCTLs instead of abusing subsections.
---
 Documentation/iio/dmabuf_api.rst | 54 
 Documentation/iio/index.rst      |  2 ++
 2 files changed, 56 insertions(+)
 create mode 100644 Documentation/iio/dmabuf_api.rst

diff --git a/Documentation/iio/dmabuf_api.rst b/Documentation/iio/dmabuf_api.rst
new file mode 100644
index ..1cd6cd51a582
--- /dev/null
+++ b/Documentation/iio/dmabuf_api.rst
@@ -0,0 +1,54 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+===================================
+High-speed DMABUF interface for IIO
+===================================
+
+1. Overview
+===========
+
+The Industrial I/O subsystem supports access to buffers through a
+file-based interface, with read() and write() access calls through the
+IIO device's dev node.
+
+It additionally supports a DMABUF based interface, where the userspace
+can attach DMABUF objects (externally created) to an IIO buffer, and
+subsequently use them for data transfers.
+
+A userspace application can then use this interface to share DMABUF
+objects between several interfaces, allowing it to transfer data in a
+zero-copy fashion, for instance between IIO and the USB stack.
+
+The userspace application can also memory-map the DMABUF objects, and
+access the sample data directly. The advantage of doing this vs. the
+read() interface is that it avoids an extra copy of the data between the
+kernel and userspace. This is particularly useful for high-speed devices
+which produce several megabytes or even gigabytes of data per second.
+
+It does however increase the userspace-kernelspace synchronization
+overhead, as the DMA_BUF_SYNC_START and DMA_BUF_SYNC_END IOCTLs have to
+be used for data integrity.
+
+2. User API
+===========
+
+As part of this interface, three new IOCTLs have been added. These three
+IOCTLs have to be performed on the IIO buffer's file descriptor,
+obtained using the IIO_BUFFER_GET_FD_IOCTL() ioctl.
+
+  ``IIO_BUFFER_DMABUF_ATTACH_IOCTL(int)``
+    Attach the DMABUF object, identified by its file descriptor, to the
+    IIO buffer. Returns zero on success, and a negative errno value on
+    error.
+
+  ``IIO_BUFFER_DMABUF_DETACH_IOCTL(int)``
+    Detach the given DMABUF object, identified by its file descriptor,
+    from the IIO buffer. Returns zero on success, and a negative errno
+    value on error.
+
+    Note that closing the IIO buffer's file descriptor will
+    automatically detach all previously attached DMABUF objects.
+
+  ``IIO_BUFFER_DMABUF_ENQUEUE_IOCTL(struct iio_dmabuf *iio_dmabuf)``
+    Enqueue a previously attached DMABUF object to the buffer queue.
+    Enqueued DMABUFs will be read from (if output buffer) or written to
+    (if input buffer) as long as the buffer is enabled.
diff --git a/Documentation/iio/index.rst b/Documentation/iio/index.rst
index 1b7292c58cd0..3eae8fcb1938 100644
--- a/Documentation/iio/index.rst
+++ b/Documentation/iio/index.rst
@@ -9,6 +9,8 @@ Industrial I/O
 
   iio_configfs
 
+   dmabuf_api
+
 ep93xx_adc
 bno055
-- 
2.43.0
[PATCH 6.7 251/346] drm: Fix TODO list mentioning non-KMS drivers
6.7-stable review patch. If anyone has any objections, please let me know.

--

From: Thomas Zimmermann

commit 9cf5ca1f485cae406968947a92bf304603999fa1 upstream.

Non-KMS drivers have been removed from DRM. Update the TODO list accordingly.

Signed-off-by: Thomas Zimmermann
Fixes: a276afc19eec ("drm: Remove some obsolete drm pciids(tdfx, mga, i810, savage, r128, sis, via)")
Cc: Cai Huoqing
Cc: Daniel Vetter
Cc: Dave Airlie
Cc: Thomas Zimmermann
Cc: Maarten Lankhorst
Cc: Maxime Ripard
Cc: David Airlie
Cc: Daniel Vetter
Cc: Jonathan Corbet
Cc: dri-devel@lists.freedesktop.org
Cc: # v6.3+
Cc: linux-...@vger.kernel.org
Reviewed-by: David Airlie
Reviewed-by: Daniel Vetter
Acked-by: Alex Deucher
Link: https://patchwork.freedesktop.org/patch/msgid/20231122122449.11588-3-tzimmerm...@suse.de
Signed-off-by: Greg Kroah-Hartman
---
 Documentation/gpu/todo.rst | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

--- a/Documentation/gpu/todo.rst
+++ b/Documentation/gpu/todo.rst
@@ -337,8 +337,8 @@ connector register/unregister fixes
 
 Level: Intermediate
 
-Remove load/unload callbacks from all non-DRIVER_LEGACY drivers
+Remove load/unload callbacks
+
 The load/unload callbacks in struct &drm_driver are very much midlayers, plus
 for historical reasons they get the ordering wrong (and we can't fix that)
@@ -347,8 +347,7 @@ between setting up the &drm_driver struc
 
 - Rework drivers to no longer use the load/unload callbacks, directly coding the
   load/unload sequence into the driver's probe function.
 
-- Once all non-DRIVER_LEGACY drivers are converted, disallow the load/unload
-  callbacks for all modern drivers.
+- Once all drivers are converted, remove the load/unload callbacks.
 
 Contact: Daniel Vetter
[PATCH 6.7 253/346] drm: Disable the cursor plane on atomic contexts with virtualized drivers
6.7-stable review patch. If anyone has any objections, please let me know. -- From: Zack Rusin commit 4e3b70da64a53784683cfcbac2deda5d6e540407 upstream. Cursor planes on virtualized drivers have special meaning and require that the clients handle them in specific ways, e.g. the cursor plane should react to the mouse movement the way a mouse cursor would be expected to and the client is required to set hotspot properties on it in order for the mouse events to be routed correctly. This breaks the contract as specified by the "universal planes". Fix it by disabling the cursor planes on virtualized drivers while adding a foundation on top of which it's possible to special case mouse cursor planes for clients that want it. Disabling the cursor planes makes some kms compositors which were broken, e.g. Weston, fallback to software cursor which works fine or at least better than currently while having no effect on others, e.g. gnome-shell or kwin, which put virtualized drivers on a deny-list when running in atomic context to make them fallback to legacy kms and avoid this issue. 
Signed-off-by: Zack Rusin
Fixes: 681e7ec73044 ("drm: Allow userspace to ask for universal plane list (v2)")
Cc: # v5.4+
Cc: Maarten Lankhorst
Cc: Maxime Ripard
Cc: Thomas Zimmermann
Cc: David Airlie
Cc: Daniel Vetter
Cc: Dave Airlie
Cc: Gerd Hoffmann
Cc: Hans de Goede
Cc: Gurchetan Singh
Cc: Chia-I Wu
Cc: dri-devel@lists.freedesktop.org
Cc: virtualizat...@lists.linux-foundation.org
Cc: spice-de...@lists.freedesktop.org
Acked-by: Pekka Paalanen
Reviewed-by: Javier Martinez Canillas
Acked-by: Simon Ser
Signed-off-by: Javier Martinez Canillas
Link: https://patchwork.freedesktop.org/patch/msgid/20231023074613.41327-2-aest...@redhat.com
Signed-off-by: Greg Kroah-Hartman
---
 drivers/gpu/drm/drm_plane.c          | 13 +
 drivers/gpu/drm/qxl/qxl_drv.c        |  2 +-
 drivers/gpu/drm/vboxvideo/vbox_drv.c |  2 +-
 drivers/gpu/drm/virtio/virtgpu_drv.c |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |  2 +-
 include/drm/drm_drv.h                |  9 +
 include/drm/drm_file.h               | 12 
 7 files changed, 38 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/drm_plane.c
+++ b/drivers/gpu/drm/drm_plane.c
@@ -678,6 +678,19 @@ int drm_mode_getplane_res(struct drm_dev
 		    !file_priv->universal_planes)
 			continue;
 
+		/*
+		 * If we're running on a virtualized driver then,
+		 * unless userspace advertizes support for the
+		 * virtualized cursor plane, disable cursor planes
+		 * because they'll be broken due to missing cursor
+		 * hotspot info.
+		 */
+		if (plane->type == DRM_PLANE_TYPE_CURSOR &&
+		    drm_core_check_feature(dev, DRIVER_CURSOR_HOTSPOT) &&
+		    file_priv->atomic &&
+		    !file_priv->supports_virtualized_cursor_plane)
+			continue;
+
 		if (drm_lease_held(file_priv, plane->base.id)) {
 			if (count < plane_resp->count_planes &&
 			    put_user(plane->base.id, plane_ptr + count))
--- a/drivers/gpu/drm/qxl/qxl_drv.c
+++ b/drivers/gpu/drm/qxl/qxl_drv.c
@@ -285,7 +285,7 @@ static const struct drm_ioctl_desc qxl_i
 };
 
 static struct drm_driver qxl_driver = {
-	.driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC,
+	.driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_CURSOR_HOTSPOT,
 
 	.dumb_create = qxl_mode_dumb_create,
 	.dumb_map_offset = drm_gem_ttm_dumb_map_offset,
--- a/drivers/gpu/drm/vboxvideo/vbox_drv.c
+++ b/drivers/gpu/drm/vboxvideo/vbox_drv.c
@@ -182,7 +182,7 @@ DEFINE_DRM_GEM_FOPS(vbox_fops);
 
 static const struct drm_driver driver = {
 	.driver_features =
-	    DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC,
+	    DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC | DRIVER_CURSOR_HOTSPOT,
 
 	.fops = &vbox_fops,
 	.name = DRIVER_NAME,
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -177,7 +177,7 @@ static const struct drm_driver driver =
 	 * out via drm_device::driver_features:
 	 */
 	.driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | DRIVER_ATOMIC |
-			   DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE,
+			   DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE | DRIVER_CURSOR_HOTSPOT,
 
 	.open = virtio_gpu_driver_open,
 	.postclose = virtio_gpu_driver_postclose,
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -1611,7 +1611,7 @@ static const struct file_operations vmwg
 
 static const struct drm_driver driver = {
 	.driver_features =
-	    DRIVER_MODESET | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_GEM,
+	    DRIVER_MODESET | DRIVER_RENDER | DRI
Re: [PATCH] drm/sched: Drain all entities in DRM sched run job worker
On 2024-01-29 02:44, Christian König wrote:
> Am 26.01.24 um 17:29 schrieb Matthew Brost:
>> On Fri, Jan 26, 2024 at 11:32:57AM +0100, Christian König wrote:
>>> Am 25.01.24 um 18:30 schrieb Matthew Brost:
>>>> On Thu, Jan 25, 2024 at 04:12:58PM +0100, Christian König wrote:
>>>>> Am 24.01.24 um 22:08 schrieb Matthew Brost:
>>>>>> All entities must be drained in the DRM scheduler run job worker to
>>>>>> avoid the following case. An entity found that is ready, no job found
>>>>>> ready on entity, and run job worker goes idle with other entities + jobs
>>>>>> ready. Draining all ready entities (i.e. loop over all ready entities)
>>>>>> in the run job worker ensures all jobs that are ready will be scheduled.
>>>>> That doesn't make sense. drm_sched_select_entity() only returns entities
>>>>> which are "ready", e.g. have a job to run.
>>>> That is what I thought too, hence my original design, but it is not
>>>> exactly true. Let me explain.
>>>>
>>>> drm_sched_select_entity() returns an entity with a non-empty spsc queue
>>>> (job in queue) and no *current* waiting dependencies [1]. Dependencies
>>>> for an entity can be added when drm_sched_entity_pop_job() is called
>>>> [2][3], returning a NULL job. Thus we can get into a scenario where two
>>>> entities A and B both have jobs and no current dependencies. A's job is
>>>> waiting on B's job, entity A gets selected first, a dependency gets
>>>> installed in drm_sched_entity_pop_job(), run work goes idle, and now we
>>>> deadlock.
>>> And here is the real problem. run work doesn't go idle in that moment.
>>>
>>> drm_sched_run_job_work() should restart itself until there is either no
>>> more space in the ring buffer or it can't find a ready entity any more.
>>>
>>> At least that was the original design when that was all still driven by
>>> a kthread.
>>>
>>> It can perfectly be that we messed this up when switching from kthread
>>> to a work item.
>>>
>> Right, that's what this patch does - the run worker does not go idle
>> until no ready entities are found.
>> That was incorrect in the original patch and fixed here. Do you have
>> any issues with this fix? It has been tested three times and clearly
>> fixes the issue.
>
> Ah! Yes, in this case that patch here is a little bit ugly as well.
>
> The original idea was that run_job restarts so that we are able to pause
> the submission thread without searching for an entity to submit more.
>
> I strongly suggest to replace the while loop with a call to
> drm_sched_run_job_queue() so that when the entity can't provide a job we
> just restart the queuing work.

I agree with Christian. This more closely preserves the original design of the GPU schedulers, so we should go with that.
-- 
Regards,
Luben