Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Christian König
Unfortunately, as I pointed out to Daniel as well, this won't work 100% 
reliably either.


See, the signal on the ring buffer needs to be protected from manipulation 
by userspace so that we can guarantee that the hardware has really 
finished executing when it fires.


Protecting memory by immediate page table updates is a good first step, 
but unfortunately not sufficient (and we would need to restructure large 
parts of the driver to make this happen).


On older hardware we often had the situation that, for reliable 
invalidation, we needed the guarantee that every previous operation had 
finished executing. It's not so much of a problem when the next 
operation has already started, since then we have the opportunity to do 
things in between the last and the next operation. Just see cache 
invalidation and VM switching, for example.


In addition to that, it doesn't really buy us anything, e.g. there is not 
much advantage to this. Writing the ring buffer in userspace and then 
ringing the doorbell in the kernel has the same overhead as doing 
everything in the kernel in the first place.


Christian.

Am 04.05.21 um 05:11 schrieb Marek Olšák:

Proposal for a new CS ioctl, kernel pseudo code:

lock(&global_lock);
serial = get_next_serial(dev);
add_wait_command(ring, serial - 1);
add_exec_cmdbuf(ring, user_cmdbuf);
add_signal_command(ring, serial);
*ring->doorbell = FIRE;
unlock(&global_lock);

See? Just like userspace submit, but in the kernel without 
concurrency/preemption. Is this now safe enough for dma_fence?
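Purely as an illustration, the same body with the locking rationale spelled out; get_next_serial() and the add_*_command() helpers are hypothetical, mirroring the pseudo code above rather than any existing driver API:

/* Illustrative sketch only, hypothetical helpers. */
static int cs_submit(struct drm_device *dev, struct ring *ring,
                     void *user_cmdbuf)
{
        u64 serial;

        /* One global lock means the wait/exec/signal triplet reaches the
         * ring without any concurrency or preemption in between. */
        mutex_lock(&dev->cs_lock);

        serial = get_next_serial(dev);
        /* Order against the previous submission... */
        add_wait_command(ring, serial - 1);
        /* ...execute the user's commands... */
        add_exec_cmdbuf(ring, user_cmdbuf);
        /* ...and signal our serial afterwards, so the dma_fence tied to
         * `serial` can only complete once the hardware has executed it. */
        add_signal_command(ring, serial);

        *ring->doorbell = FIRE;
        mutex_unlock(&dev->cs_lock);

        return 0;
}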


Marek

On Mon, May 3, 2021 at 4:36 PM Marek Olšák > wrote:


What about direct submit from the kernel where the process still
has write access to the GPU ring buffer but doesn't use it? I
think that solves your preemption example, but leaves a potential
backdoor for a process to overwrite the signal commands, which
shouldn't be a problem since we are OK with timeouts.

Marek

On Mon, May 3, 2021 at 11:23 AM Jason Ekstrand <ja...@jlekstrand.net> wrote:

On Mon, May 3, 2021 at 10:16 AM Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl> wrote:
>
> On Mon, May 3, 2021 at 5:00 PM Jason Ekstrand <ja...@jlekstrand.net> wrote:
> >
> > Sorry for the top-post but there's no good thing to reply to here...
> >
> > One of the things pointed out to me recently by Daniel Vetter that I
> > didn't fully understand before is that dma_buf has a very subtle
> > second requirement beyond finite time completion:  Nothing required
> > for signaling a dma-fence can allocate memory. Why?  Because the act
> > of allocating memory may wait on your dma-fence.  This, as it turns
> > out, is a massively more strict requirement than finite time
> > completion and, I think, throws out all of the proposals we have so far.
> >
> > Take, for instance, Marek's proposal for userspace involvement with
> > dma-fence by asking the kernel for a next serial and the kernel
> > trusting userspace to signal it.  That doesn't work at all if
> > allocating memory to trigger a dma-fence can blow up.  There's simply
> > no way for the kernel to trust userspace to not do ANYTHING which
> > might allocate memory.  I don't even think there's a way userspace can
> > trust itself there.  It also blows up my plan of moving the fences to
> > transition boundaries.
> >
> > Not sure where that leaves us.
>
> Honestly the more I look at things I think userspace-signalable fences
> with a timeout sound like they are a valid solution for these issues.
> Especially since (as has been mentioned countless times in this email
> thread) userspace already has a lot of ways to cause timeouts and or
> GPU hangs through GPU work already.
>
> Adding a timeout on the signaling side of a dma_fence would ensure:
>
> - The dma_fence signals in finite time
> - If the timeout case does not allocate memory then memory allocation
>   is not a blocker for signaling.
>
> Of course you lose the full dependency graph and we need to make sure
> garbage collection of fences works correctly when we have cycles.
> However, the latter sounds very doable and the first sounds like it is
> to some extent inevitable.
>
> I feel like I'm missing some requirement here given that we
> immediately went to much more complicated things but can't find it.
> Thoughts?

Timeouts are sufficient to protect the kernel but they make the fences unpredictable and u
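For reference, a minimal sketch of the timeout-backed fence Bas describes above could look roughly like this; the struct and the arm/complete helpers are hypothetical, only dma_fence_init()/dma_fence_signal() and the timer API are existing kernel interfaces:

#include <linux/dma-fence.h>
#include <linux/timer.h>

/* Hypothetical userspace-signalable fence with a kernel timeout fallback. */
struct user_fence {
        struct dma_fence base;
        spinlock_t lock;
        struct timer_list timeout;
};

static const char *uf_driver_name(struct dma_fence *f) { return "user"; }
static const char *uf_timeline_name(struct dma_fence *f) { return "user"; }

static const struct dma_fence_ops user_fence_ops = {
        .get_driver_name = uf_driver_name,
        .get_timeline_name = uf_timeline_name,
};

/* Timeout path: allocates nothing, it only flags an error and signals. */
static void uf_timeout(struct timer_list *t)
{
        struct user_fence *uf = from_timer(uf, t, timeout);

        dma_fence_set_error(&uf->base, -ETIMEDOUT);
        dma_fence_signal(&uf->base);
}

static void user_fence_arm(struct user_fence *uf, u64 context, u64 seqno,
                           unsigned long timeout_ms)
{
        spin_lock_init(&uf->lock);
        dma_fence_init(&uf->base, &user_fence_ops, &uf->lock, context, seqno);
        timer_setup(&uf->timeout, uf_timeout, 0);
        mod_timer(&uf->timeout, jiffies + msecs_to_jiffies(timeout_ms));
}

/* Happy path, called when userspace reports completion (ioctl, doorbell
 * interrupt, ...): only signal here if the timeout had not fired yet. */
static void user_fence_complete(struct user_fence *uf)
{
        if (del_timer(&uf->timeout))
                dma_fence_signal(&uf->base);
}

Whether the dependency-graph and garbage-collection concerns raised above can be solved on top of something like this is exactly the open question in this thread.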

Re: [PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-05-04 Thread Christian König

Am 03.05.21 um 22:43 schrieb Andrey Grodzovsky:



On 2021-04-29 3:08 a.m., Christian König wrote:

Am 28.04.21 um 17:11 schrieb Andrey Grodzovsky:

Handle all DMA IOMMU group related dependencies before the
group is removed.

v5: Drop IOMMU notifier and switch to lockless call to 
ttm_tt_unpopulate


Maybe split that up into more patches.



Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |  3 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    |  9 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  2 ++
  drivers/gpu/drm/amd/amdgpu/cik_ih.c    |  1 -
  drivers/gpu/drm/amd/amdgpu/cz_ih.c |  1 -
  drivers/gpu/drm/amd/amdgpu/iceland_ih.c    |  1 -
  drivers/gpu/drm/amd/amdgpu/navi10_ih.c |  3 ---
  drivers/gpu/drm/amd/amdgpu/si_ih.c |  1 -
  drivers/gpu/drm/amd/amdgpu/tonga_ih.c  |  1 -
  drivers/gpu/drm/amd/amdgpu/vega10_ih.c |  3 ---
  14 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index fddb82897e5d..30a24db5f4d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1054,6 +1054,8 @@ struct amdgpu_device {
  bool    in_pci_err_recovery;
  struct pci_saved_state  *pci_state;
+
+    struct list_head    device_bo_list;
  };
  static inline struct amdgpu_device *drm_to_adev(struct drm_device 
*ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 46d646c40338..91594ddc2459 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -70,6 +70,7 @@
  #include 
  #include 
+
  MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -3211,7 +3212,6 @@ static const struct attribute *amdgpu_dev_attributes[] = {
  NULL
  };
-
  /**
   * amdgpu_device_init - initialize the driver
   *
@@ -3316,6 +3316,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+    INIT_LIST_HEAD(&adev->device_bo_list);
+
  adev->gfx.gfx_off_req_count = 1;
  adev->pm.ac_power = power_supply_is_system_supplied() > 0;
@@ -3601,6 +3603,28 @@ int amdgpu_device_init(struct amdgpu_device *adev,
  return r;
  }
+static void amdgpu_clear_dma_mappings(struct amdgpu_device *adev)
+{
+    struct amdgpu_bo *bo = NULL;
+
+    /*
+     * Unmap all DMA mappings before the device is removed from its
+     * IOMMU group, otherwise on an IOMMU-enabled system a crash
+     * will happen.
+     */
+
+    spin_lock(&adev->mman.bdev.lru_lock);
+    while (!list_empty(&adev->device_bo_list)) {
+        bo = list_first_entry(&adev->device_bo_list, struct amdgpu_bo, bo);
+        list_del_init(&bo->bo);
+        spin_unlock(&adev->mman.bdev.lru_lock);
+        if (bo->tbo.ttm)
+            ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
+        spin_lock(&adev->mman.bdev.lru_lock);
+    }
+    spin_unlock(&adev->mman.bdev.lru_lock);


Can you try to use the same approach as amdgpu_gtt_mgr_recover() 
instead of adding something to the BO?


Christian.


Are you sure that dma mappings limit themselves only to GTT BOs
which have allocated mm nodes?


Yes, you would also need the system domain BOs. But those can be put on 
a similar list.



Otherwise we will crash and burn
on a missing IOMMU group when unmapping post device remove.
It's a problem for me to test this, as on the 5.12 kernel I don't crash even
when removing this entire patch. Looks like iommu_dma_unmap_page
was changed since 5.9 when I introduced this patch.


Do we really still need that stuff then? What exactly has changed?

Christian.



Andrey




+}
+
  /**
   * amdgpu_device_fini - tear down the driver
   *
@@ -3639,12 +3663,15 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
  amdgpu_ucode_sysfs_fini(adev);
  sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
-
  amdgpu_fbdev_fini(adev);
  amdgpu_irq_fini_hw(adev);
  amdgpu_device_ip_fini_early(adev);
+
+    amdgpu_clear_dma_mappings(adev);
+
+    amdgpu_gart_dummy_page_fini(adev);
  }
  void amdgpu_device_fini_sw(struct amdgpu_device *adev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
index fde2d899b2c4..49cdcaf8512d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
@@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device *adev)
   *
   * Frees the dummy page used by the driver (all asics).
   */
-sta

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Daniel Vetter
On Tue, May 04, 2021 at 09:01:23AM +0200, Christian König wrote:
> Unfortunately as I pointed out to Daniel as well this won't work 100%
> reliable either.

You're claiming this, but there's no clear reason why really, and you
didn't reply to my last mail on that sub-thread, so I really don't get
where exactly you're seeing a problem.

> See, the signal on the ring buffer needs to be protected from manipulation by
> userspace so that we can guarantee that the hardware has really finished
> executing when it fires.

Nope you don't. Userspace is already allowed to submit all kinds of random
garbage, the only thing the kernel has to guarantee is:
- the dma-fence DAG stays a DAG
- dma-fence completes in finite time

Everything else is not the kernel's problem, and if userspace mixes stuff
up like manipulates the seqno, that's ok. It can do that kind of garbage
already.

> Protecting memory by immediate page table updates is a good first step, but
> unfortunately not sufficient (and we would need to restructure large parts
> of the driver to make this happen).

This is why you need the unload-fence on top, because indeed you can't
just rely on the fences created from the userspace ring, those are
unreliable for memory management.

btw I thought some more, and I think it's probably best if we only attach
the unload-fence in the ->move(_notify) callbacks. Kinda like we already
do for async copy jobs. So the overall buffer move sequence would be:

1. wait for (untrusted for kernel, but necessary for userspace
correctness) fake dma-fence that rely on the userspace ring

2. unload ctx

3. copy buffer

Ofc 2&3 would be done async behind a dma_fence.
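As a rough sketch of that sequence in a move/move_notify style hook (the unload_ctx_async()/copy_buffer_async() helpers and the bo fields are hypothetical; only dma_fence_wait_timeout() is existing API):

/* Illustrative only. */
static void bo_move_notify(struct bo *bo)
{
        long ret;

        /* 1. Wait for the untrusted fence derived from the userspace ring.
         *    It only matters for userspace correctness, so waiting with a
         *    timeout is acceptable for the kernel. */
        ret = dma_fence_wait_timeout(bo->user_fence, false,
                                     msecs_to_jiffies(100));
        if (ret == 0)
                pr_debug("userspace fence timed out, moving anyway\n");

        /* 2. + 3. Unload the context and copy the buffer, both queued
         *    asynchronously behind a real, kernel-controlled dma_fence. */
        bo->unload_fence = unload_ctx_async(bo->ctx);
        bo->move_fence = copy_buffer_async(bo, bo->unload_fence);

        /* Memory management only ever waits on bo->move_fence, which does
         * not depend on anything userspace can block. */
}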

> On older hardware we often had the situation that for reliable invalidation
> we need the guarantee that every previous operation has finished executing.
> It's not so much of a problem when the next operation has already started,
> since then we had the opportunity to do things in between the last and the
> next operation. Just see cache invalidation and VM switching for example.

If you have gpu page faults you generally have synchronous tlb
invalidation, so this also shouldn't be a big problem. Combined with the
unload fence at least. If you don't have synchronous tlb invalidate it
gets a bit more nasty and you need to force a preemption to a kernel
context which has the required flushes across all the caches. Slightly
nasty, but the exact same thing would be required for handling page faults
anyway with the direct userspace submit model.

Again I'm not seeing a problem.

> Additional to that it doesn't really buy us anything, e.g. there is not much
> advantage to this. Writing the ring buffer in userspace and then ringing in
> the kernel has the same overhead as doing everything in the kernel in the
> first place.

It gets you dma-fence backwards compat without having to rewrite the
entire userspace ecosystem. Also since you have the hw already designed
for ringbuffer in userspace it would be silly to copy that through the cs
ioctl, that's just overhead.

Also I thought the problem you're having is that all the kernel ringbuf
stuff is going away, so the old cs ioctl won't work anymore for sure?

Maybe also pick up that other subthread which ended with my last reply.

Cheers, Daniel


> 
> Christian.
> 
> Am 04.05.21 um 05:11 schrieb Marek Olšák:
> > Proposal for a new CS ioctl, kernel pseudo code:
> > 
> > lock(&global_lock);
> > serial = get_next_serial(dev);
> > add_wait_command(ring, serial - 1);
> > add_exec_cmdbuf(ring, user_cmdbuf);
> > add_signal_command(ring, serial);
> > *ring->doorbell = FIRE;
> > unlock(&global_lock);
> > 
> > See? Just like userspace submit, but in the kernel without
> > concurrency/preemption. Is this now safe enough for dma_fence?
> > 
> > Marek
> > 
> > On Mon, May 3, 2021 at 4:36 PM Marek Olšák  > > wrote:
> > 
> > What about direct submit from the kernel where the process still
> > has write access to the GPU ring buffer but doesn't use it? I
> > think that solves your preemption example, but leaves a potential
> > backdoor for a process to overwrite the signal commands, which
> > shouldn't be a problem since we are OK with timeouts.
> > 
> > Marek
> > 
> > On Mon, May 3, 2021 at 11:23 AM Jason Ekstrand
> > mailto:ja...@jlekstrand.net>> wrote:
> > 
> > On Mon, May 3, 2021 at 10:16 AM Bas Nieuwenhuizen <b...@basnieuwenhuizen.nl> wrote:
> > >
> > > On Mon, May 3, 2021 at 5:00 PM Jason Ekstrand <ja...@jlekstrand.net> wrote:
> > > >
> > > > Sorry for the top-post but there's no good thing to reply to here...
> > > >
> > > > One of the things pointed out to me recently by Daniel Vetter that I
> > > > didn't fully understand before is that dma_buf has a very subtle
> > > > second requirement beyond finite time completion:  Nothing
> >   

Re: [Intel-gfx] [PATCH 08/21] drm/i915/gem: Disallow bonding of virtual engines

2021-05-04 Thread Daniel Vetter
On Sat, May 01, 2021 at 10:17:46AM -0700, Matthew Brost wrote:
> On Fri, Apr 30, 2021 at 12:11:07PM +0200, Daniel Vetter wrote:
> > On Thu, Apr 29, 2021 at 09:03:48PM -0700, Matthew Brost wrote:
> > > On Thu, Apr 29, 2021 at 02:14:19PM +0200, Daniel Vetter wrote:
> > > > On Wed, Apr 28, 2021 at 01:17:27PM -0500, Jason Ekstrand wrote:
> > > > > On Wed, Apr 28, 2021 at 1:02 PM Matthew Brost 
> > > > >  wrote:
> > > > > >
> > > > > > On Wed, Apr 28, 2021 at 12:46:07PM -0500, Jason Ekstrand wrote:
> > > > > > > On Wed, Apr 28, 2021 at 12:26 PM Matthew Brost 
> > > > > > >  wrote:
> > > > > > > > Jumping on here mid-thread. For what it is worth, to make 
> > > > > > > > execlists work
> > > > > > > > with the upcoming parallel submission extension I leveraged 
> > > > > > > > some of the
> > > > > > > > existing bonding code so I wouldn't be too eager to delete this 
> > > > > > > > code
> > > > > > > > until that lands.
> > > > > > >
> > > > > > > Mind being a bit more specific about that?  The motivation for 
> > > > > > > this
> > > > > > > patch is that the current bonding handling and uAPI is, well, 
> > > > > > > very odd
> > > > > > > and confusing IMO.  It doesn't let you create sets of bonded 
> > > > > > > engines.
> > > > > > > Instead you create engines and then bond them together after the 
> > > > > > > fact.
> > > > > > > I didn't want to blindly duplicate those oddities with the 
> > > > > > > proto-ctx
> > > > > > > stuff unless they were useful.  With parallel submit, I would 
> > > > > > > expect
> > > > > > > we want a more explicit API where you specify a set of engine
> > > > > > > class/instance pairs to bond together into a single engine 
> > > > > > > similar to
> > > > > > > how the current balancing API works.
> > > > > > >
> > > > > > > Of course, that's all focused on the API and not the internals.  
> > > > > > > But,
> > > > > > > again, I'm not sure how we want things to look internally.  What 
> > > > > > > we've
> > > > > > > got now doesn't seem great for the GuC submission model but I'm 
> > > > > > > very
> > > > > > > much not the expert there.  I don't want to be working at cross
> > > > > > > purposes to you and I'm happy to leave bits if you think they're
> > > > > > > useful.  But I thought I was clearing things away so that you can 
> > > > > > > put
> > > > > > > in what you actually want for GuC/parallel submit.
> > > > > > >
> > > > > >
> > > > > > Removing all the UAPI things are fine but I wouldn't delete some of 
> > > > > > the
> > > > > > internal stuff (e.g. intel_virtual_engine_attach_bond, bond
> > > > > > intel_context_ops, the hook for a submit fence, etc...) as that will
> > > > > > still likely be used for the new parallel submission interface with
> > > > > > execlists. As you say the new UAPI wont allow crazy configurations,
> > > > > > only simple ones.
> > > > > 
> > > > > I'm fine with leaving some of the internal bits for a little while if
> > > > > it makes pulling the GuC scheduler in easier.  I'm just a bit
> > > > > skeptical of why you'd care about SUBMIT_FENCE. :-)  Daniel, any
> > > > > thoughts?
> > > > 
> > > > Yeah I'm also wondering why we need this. Essentially your insight (and
> > > > Tony Ye from media team confirmed) is that media umd never uses bonded 
> > > > on
> > > > virtual engines.
> > > >
> > > 
> > > Well you should use virtual engines with the parallel submission interface 
> > > if you are using it correctly.
> > > 
> > > e.g. You want a 2 wide parallel submission and there are 4 engine
> > > instances.
> > > 
> > > You'd create 2 VEs:
> > > 
> > > A: 0, 2
> > > B: 1, 3
> > > set_parallel
> > 
> > So tbh I'm not really liking this part. At least my understanding is that
> > with GuC this is really one overall virtual engine, backed by a multi-lrc.
> > 
> > So it should fill one engine slot, not fill multiple virtual engines and
> > then be an awkward thing wrapped on top.
> > 
> > I think (but maybe my understanding of GuC and the parallel submit execbuf
> > interface is wrong) that the parallel engine should occupy a single VE
> > slot, not require additional VE just for fun (maybe the execlist backend
> > would require that internally, but that should not leak into the higher
> > levels, much less the uapi). And you submit your multi-batch execbuf on
> > that single parallel VE, which then gets passed to GuC as a multi-LRC.
> > Internally in the backend there's a bit of fan-out to put the right
> > MI_BB_START into the right rings and all that, but again I think that
> > should be backend concerns.
> > 
> > Or am I missing something big here?
> 
> Unfortunately that is not how the interface works. The user must
> configure the engine set with either physical or virtual engines which
> determine the valid placements of each BB (LRC, ring, whatever we want
> to call it) and call the set parallel extension which validates the engine
> layout. After that the engines are ready to be used with multi-BB
> submission in a single IOCTL. 
> 

Re: [PATCH v3 10/11] drm: Use state helper instead of the plane state pointer

2021-05-04 Thread Daniel Vetter
On Fri, Apr 30, 2021 at 09:44:42AM -0700, Rob Clark wrote:
> On Thu, Apr 8, 2021 at 6:20 AM Maxime Ripard  wrote:
> >
> > Hi Stephen,
> >
> > On Tue, Mar 30, 2021 at 11:56:15AM -0700, Stephen Boyd wrote:
> > > Quoting Maxime Ripard (2021-03-30 08:35:27)
> > > > Hi Stephen,
> > > >
> > > > On Mon, Mar 29, 2021 at 06:52:01PM -0700, Stephen Boyd wrote:
> > > > > Trimming Cc list way down, sorry if that's too much.
> > > > >
> > > > > Quoting Maxime Ripard (2021-02-19 04:00:30)
> > > > > > Many drivers reference the plane->state pointer in order to get the
> > > > > > current plane state in their atomic_update or atomic_disable hooks,
> > > > > > which would be the new plane state in the global atomic state since
> > > > > > _swap_state happened when those hooks are run.
> > > > >
> > > > > Does this mean drm_atomic_helper_swap_state()?
> > > >
> > > > Yep. Previous to that call in drm_atomic_helper_commit, plane->state is
> > > > the state currently programmed in the hardware, so the old state (that's
> > > > the case you have with atomic_check for example)
> > > >
> > > > Once drm_atomic_helper_swap_state has run, plane->state is now the state
> > > > that needs to be programmed into the hardware, so the new state.
> > >
> > > Ok, and I suppose that is called by drm_atomic_helper_commit()?
> >
> > Yep :)
> >
> > > So presumably a modeset is causing this? I get the NULL pointer around
> > > the time we switch from the splash screen to the login screen. I think
> > > there's a modeset during that transition.
> >
> > It's very likely yeah. I really don't get how that pointer could be null
> > though :/
> 
> So I think I see what is going on.. the issue is the CRTC has changed,
> but not the plane, so there is no new-state for the plane.

Yeah you're not allowed to touch an object's hw state in ->atomic_commit
without acquiring its state in atomic_check. Otherwise the
synchronization across commits that the atomic helpers provide goes boom.

> But dpu_crtc_atomic_flush() iterates over all the attached planes,
> calling dpu_plane_restore() which leads into
> dpu_plane_atomic_update().. this is kinda dpu breaking the rules..

You're probably missing a drm_atomic_add_affected_planes() somewhere.
Without looking at the code at least, it might be that if you just blindly
do that you take too many states by default and oversynchronize across
multiple crtc, which isn't great. But better than getting the rules wrong
:-)
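Something along these lines in the crtc atomic_check, so the plane states get pulled into the same commit (the dpu function name here is just a guess for illustration):

/* Sketch: make sure every plane on this CRTC has a new state in the
 * commit before atomic_flush walks them. Function name is hypothetical. */
static int dpu_crtc_atomic_check(struct drm_crtc *crtc,
                                 struct drm_atomic_state *state)
{
        int ret;

        ret = drm_atomic_add_affected_planes(state, crtc);
        if (ret)
                return ret;

        /* ... existing dpu checks ... */
        return 0;
}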

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Heads up to maintainers] Re: [PATCH v8 1/1] drm/drm_mst: Use Extended Base Receiver Capability DPCD space

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 11:33:33AM +0300, Jani Nikula wrote:
> On Fri, 30 Apr 2021, Jani Nikula  wrote:
> > On Thu, 29 Apr 2021, Lyude Paul  wrote:
> >> JFYI Jani and Ben: I will be pushing this patch to drm-misc-next sometime
> >> today if there's no objections
> >
> > Thanks for the heads-up, I think this breaks i915. See my review
> > comments elsewhere in the thread.
> 
> Looks like this was merged anyway.

Yeah, in general the rule of thumb is to let cross-driver stuff soak for a week
(assuming it's correctly cc'ed and all that already). I think that's the sweet
spot between maintainers who complain that it's too short and others
complaining it's too quick :-)
-Daniel

> 
> 98025a62cb00 ("drm/dp_mst: Use Extended Base Receiver Capability DPCD space")
> 
> I'm not happy how this played out.
> 
> 1) You need to Cc relevant people
> 
> 2) You need to get the ack before merging changes
> 
> 3) You need to give people more than a day to react, with time zones and
> all; I replied as soon as I saw the heads-up, but it was already too
> late
> 
> It's broken on i915, and perhaps that could be fixed.
> 
> However I also think using DP spec rate codes and calling them "rate" is
> a bad interface, especially when the unit breaks down with DP 2.0 rate
> codes. It's confusing and it's not future proof. Fixing that afterwards
> falls to whoever comes next to pick up the pieces.
> 
> I'd rather just see this reverted and redone.
> 
> 
> BR,
> Jani.
> 
> 
> >
> > BR,
> > Jani.
> >
> >
> >>
> >> On Wed, 2021-04-28 at 19:43 -0400, Nikola Cornij wrote:
> >>> [why]
> >>> DP 1.4a spec mandates that if DP_EXTENDED_RECEIVER_CAP_FIELD_PRESENT is
> >>> set, Extended Base Receiver Capability DPCD space must be used. Without
> >>> doing that, the three DPCD values that differ will be wrong, leading to
> >>> incorrect or limited functionality. MST link rate, for example, could
> >>> have a lower value. Also, Synaptics quirk wouldn't work out well when
> >>> Extended DPCD was not read, resulting in no DSC for such hubs.
> >>> 
> >>> [how]
> >>> Modify MST topology manager to use the values from Extended DPCD where
> >>> applicable.
> >>> 
> >>> To prevent regression on the sources that have a lower maximum link rate
> >>> capability than MAX_LINK_RATE from Extended DPCD, have the drivers
> >>> supply maximum lane count and rate at initialization time.
> >>> 
> >>> This also reverts 'commit 2dcab875e763 ("Revert drm/dp_mst: Retrieve
> >>> extended DPCD caps for topology manager")', bringing the change back to
> >>> the original 'commit ad44c03208e4 ("drm/dp_mst: Retrieve extended DPCD
> >>> caps for topology manager")'.
> >>> 
> >>> Signed-off-by: Nikola Cornij 
> >>> ---
> >>>  .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |  5 +++
> >>>  .../gpu/drm/amd/display/dc/core/dc_link_dp.c  | 18 ++
> >>>  drivers/gpu/drm/amd/display/dc/dc_link.h  |  2 ++
> >>>  drivers/gpu/drm/drm_dp_mst_topology.c | 33 ---
> >>>  drivers/gpu/drm/i915/display/intel_dp_mst.c   |  6 +++-
> >>>  drivers/gpu/drm/nouveau/dispnv50/disp.c   |  3 +-
> >>>  drivers/gpu/drm/radeon/radeon_dp_mst.c    |  7 
> >>>  include/drm/drm_dp_mst_helper.h   | 12 ++-
> >>>  8 files changed, 71 insertions(+), 15 deletions(-)
> >>> 
> >>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> >>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> >>> index 997567f6f0ba..b7e01b6fb328 100644
> >>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> >>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
> >>> @@ -429,6 +429,8 @@ void amdgpu_dm_initialize_dp_connector(struct
> >>> amdgpu_display_manager *dm,
> >>>    struct amdgpu_dm_connector
> >>> *aconnector,
> >>>    int link_index)
> >>>  {
> >>> +   struct dc_link_settings max_link_enc_cap = {0};
> >>> +
> >>> aconnector->dm_dp_aux.aux.name =
> >>> kasprintf(GFP_KERNEL, "AMDGPU DM aux hw bus %d",
> >>>   link_index);
> >>> @@ -443,6 +445,7 @@ void amdgpu_dm_initialize_dp_connector(struct
> >>> amdgpu_display_manager *dm,
> >>> if (aconnector->base.connector_type == DRM_MODE_CONNECTOR_eDP)
> >>> return;
> >>>  
> >>> +   dc_link_dp_get_max_link_enc_cap(aconnector->dc_link,
> >>> &max_link_enc_cap);
> >>> aconnector->mst_mgr.cbs = &dm_mst_cbs;
> >>> drm_dp_mst_topology_mgr_init(
> >>> &aconnector->mst_mgr,
> >>> @@ -450,6 +453,8 @@ void amdgpu_dm_initialize_dp_connector(struct
> >>> amdgpu_display_manager *dm,
> >>> &aconnector->dm_dp_aux.aux,
> >>> 16,
> >>> 4,
> >>> +   max_link_enc_cap.lane_count,
> >>> +   max_link_enc_cap.link_rate,
> >>> aconnector->connector_id);
> >>>  
> >>> drm_connector_attach_dp_subconnector_property(&aconnect

Re: [PATCH 5/9] drm/i915: Associate ACPI connector nodes with connector entries

2021-05-04 Thread Andy Shevchenko
On Monday, May 3, 2021, Hans de Goede  wrote:

> From: Heikki Krogerus 
>
> On Intel platforms we know that the ACPI connector device
> node order will follow the order the driver (i915) decides.
> The decision is made using the custom Intel ACPI OpRegion
> (intel_opregion.c), though the driver does not actually know
> that the values it sends to ACPI there are used for
> associating a device node for the connectors, and assigning
> address for them.
>
> In reality that custom Intel ACPI OpRegion actually violates
> ACPI specification (we supply dynamic information to objects
> that are defined static, for example _ADR), however, it
> makes assigning correct connector node for a connector entry
> straightforward (it's one-on-one mapping).
>
> Signed-off-by: Heikki Krogerus 
> [hdego...@redhat.com: Move intel_acpi_assign_connector_fwnodes() to
>  intel_acpi.c]
> Signed-off-by: Hans de Goede 
> ---
>  drivers/gpu/drm/i915/display/intel_acpi.c| 40 
>  drivers/gpu/drm/i915/display/intel_acpi.h|  3 ++
>  drivers/gpu/drm/i915/display/intel_display.c |  1 +
>  3 files changed, 44 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/display/intel_acpi.c
> b/drivers/gpu/drm/i915/display/intel_acpi.c
> index 833d0c1be4f1..9f266dfda7dd 100644
> --- a/drivers/gpu/drm/i915/display/intel_acpi.c
> +++ b/drivers/gpu/drm/i915/display/intel_acpi.c
> @@ -263,3 +263,43 @@ void intel_acpi_device_id_update(struct
> drm_i915_private *dev_priv)
> }
> drm_connector_list_iter_end(&conn_iter);
>  }
> +
> +/* NOTE: The connector order must be final before this is called. */
> +void intel_acpi_assign_connector_fwnodes(struct drm_i915_private *i915)
> +{
> +   struct drm_connector_list_iter conn_iter;
> +   struct drm_device *drm_dev = &i915->drm;
> +   struct device *kdev = &drm_dev->pdev->dev;
> +   struct fwnode_handle *fwnode = NULL;
> +   struct drm_connector *connector;
> +   struct acpi_device *adev;
> +
> +   drm_connector_list_iter_begin(drm_dev, &conn_iter);
> +   drm_for_each_connector_iter(connector, &conn_iter) {
> +   /* Always getting the next, even when the last was not used. */
> +   fwnode = device_get_next_child_node(kdev, fwnode);
> +   if (!fwnode)
> +   break;



Who drops the reference count on the fwnode?

I’m in the middle of a pile of fixes for fwnode refcounting when
for_each_child or get_next_child is used. So, please double check you drop
a reference.


> +
> +   switch (connector->connector_type) {
> +   case DRM_MODE_CONNECTOR_LVDS:
> +   case DRM_MODE_CONNECTOR_eDP:
> +   case DRM_MODE_CONNECTOR_DSI:
> +   /*
> +    * Integrated displays have a specific address 0x1f on
> +    * most Intel platforms, but not on all of them.
> +    */
> +   adev = acpi_find_child_device(ACPI_COMPANION(kdev), 0x1f, 0);
> +   if (adev) {
> +   connector->fwnode = acpi_fwnode_handle(adev);
> +   break;
> +   }
> +   fallthrough;
> +   default:
> +   connector->fwnode = fwnode;
> +   break;
> +   }
> +   }
> +   drm_connector_list_iter_end(&conn_iter);
> +}
> diff --git a/drivers/gpu/drm/i915/display/intel_acpi.h
> b/drivers/gpu/drm/i915/display/intel_acpi.h
> index e8b068661d22..d2435691f4b5 100644
> --- a/drivers/gpu/drm/i915/display/intel_acpi.h
> +++ b/drivers/gpu/drm/i915/display/intel_acpi.h
> @@ -12,11 +12,14 @@ struct drm_i915_private;
>  void intel_register_dsm_handler(void);
>  void intel_unregister_dsm_handler(void);
>  void intel_acpi_device_id_update(struct drm_i915_private *i915);
> +void intel_acpi_assign_connector_fwnodes(struct drm_i915_private *i915);
>  #else
>  static inline void intel_register_dsm_handler(void) { return; }
>  static inline void intel_unregister_dsm_handler(void) { return; }
>  static inline
>  void intel_acpi_device_id_update(struct drm_i915_private *i915) {
> return; }
> +static inline
> +void intel_acpi_assign_connector_fwnodes(struct drm_i915_private *i915)
> { return; }
>  #endif /* CONFIG_ACPI */
>
>  #endif /* __INTEL_ACPI_H__ */
> diff --git a/drivers/gpu/drm/i915/display/intel_display.c
> b/drivers/gpu/drm/i915/display/intel_display.c
> index 828ef4c5625f..87cad549632c 100644
> --- a/drivers/gpu/drm/i915/display/intel_display.c
> +++ b/drivers/gpu/drm/i915/display/intel_display.c
> @@ -14970,6 +14970,7 @@ int intel_modeset_init_nogem(struct
> drm_i915_private *i915)
>
> drm_modeset_lock_all(dev);
> intel_modeset_setup_hw_state(dev, dev->mode_config.acquire_ctx);
> +   intel_acpi_assign_connector_fwnodes(i915);
> drm_modeset_unlock

Re: [PATCH 2/9] drm/connector: Add a fwnode pointer to drm_connector and register with ACPI

2021-05-04 Thread Andy Shevchenko
On Monday, May 3, 2021, Hans de Goede  wrote:

> Add a fwnode pointer to struct drm_connector and register an acpi_bus_type
> for the connectors with the ACPI subsystem (when CONFIG_ACPI is enabled).
>
> The adding of the fwnode pointer allows drivers to associate a fwnode
> that represents a connector with that connector.
>
> When the new fwnode pointer points to an ACPI-companion, then the new
> acpi_bus_type will cause the ACPI subsys to bind the device instantiated
> for the connector with the fwnode by calling acpi_bind_one(). This will
> result in a firmware_node symlink under /sys/class/card#-/
> which helps to verify that the fwnode-s and connectors are properly
> matched.
>
> Co-authored-by: Heikki Krogerus 



The official tag is Co-developed-by.


> Signed-off-by: Heikki Krogerus 
> Signed-off-by: Hans de Goede 
> ---
>  drivers/gpu/drm/drm_sysfs.c | 37 +
>  include/drm/drm_connector.h |  2 ++
>  2 files changed, 39 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_sysfs.c b/drivers/gpu/drm/drm_sysfs.c
> index 553024bcda8a..12cc649c44f0 100644
> --- a/drivers/gpu/drm/drm_sysfs.c
> +++ b/drivers/gpu/drm/drm_sysfs.c
> @@ -10,6 +10,7 @@
>   * Copyright (c) 2003-2004 IBM Corp.
>   */
>
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -56,6 +57,39 @@ static struct device_type drm_sysfs_device_connector = {
>
>  struct class *drm_class;
>
> +#ifdef CONFIG_ACPI
> +static bool drm_connector_acpi_bus_match(struct device *dev)
> +{
> +   return dev->type == &drm_sysfs_device_connector;
> +}
> +
> +static struct acpi_device *drm_connector_acpi_find_companion(struct
> device *dev)
> +{
> +   struct drm_connector *connector = to_drm_connector(dev);
> +
> +   return to_acpi_device_node(connector->fwnode);
> +}
> +
> +static struct acpi_bus_type drm_connector_acpi_bus = {
> +   .name = "drm_connector",
> +   .match = drm_connector_acpi_bus_match,
> +   .find_companion = drm_connector_acpi_find_companion,
> +};
> +
> +static void drm_sysfs_acpi_register(void)
> +{
> +   register_acpi_bus_type(&drm_connector_acpi_bus);
> +}
> +
> +static void drm_sysfs_acpi_unregister(void)
> +{
> +   unregister_acpi_bus_type(&drm_connector_acpi_bus);
> +}
> +#else
> +static void drm_sysfs_acpi_register(void) { }
> +static void drm_sysfs_acpi_unregister(void) { }
> +#endif
> +
>  static char *drm_devnode(struct device *dev, umode_t *mode)
>  {
> return kasprintf(GFP_KERNEL, "dri/%s", dev_name(dev));
> @@ -89,6 +123,8 @@ int drm_sysfs_init(void)
> }
>
> drm_class->devnode = drm_devnode;
> +
> +   drm_sysfs_acpi_register();
> return 0;
>  }
>
> @@ -101,6 +137,7 @@ void drm_sysfs_destroy(void)
>  {
> if (IS_ERR_OR_NULL(drm_class))
> return;
> +   drm_sysfs_acpi_unregister();
> class_remove_file(drm_class, &class_attr_version.attr);
> class_destroy(drm_class);
> drm_class = NULL;
> diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h
> index 0261801af62c..d20bfd7576ed 100644
> --- a/include/drm/drm_connector.h
> +++ b/include/drm/drm_connector.h
> @@ -1254,6 +1254,8 @@ struct drm_connector {
> struct device *kdev;
> /** @attr: sysfs attributes */
> struct device_attribute *attr;
> +   /** @fwnode: associated fwnode supplied by platform firmware */
> +   struct fwnode_handle *fwnode;
>
> /**
>  * @head:
> --
> 2.31.1
>
>

-- 
With Best Regards,
Andy Shevchenko
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 3/9] drm/connector: Add drm_connector_find_by_fwnode() function (v2)

2021-05-04 Thread Andy Shevchenko
On Monday, May 3, 2021, Hans de Goede  wrote:

> Add a function to find a connector based on a fwnode.
>
> This will be used by the new drm_connector_oob_hotplug_event()
> function which is added by the next patch in this patch-set.
>
> Changes in v2:
> - Complete rewrite to use a global connector list in drm_connector.c
>   rather then using a class-dev-iter in drm_sysfs.c
>
> Signed-off-by: Hans de Goede 
> ---
>  drivers/gpu/drm/drm_connector.c | 50 +
>  drivers/gpu/drm/drm_crtc_internal.h |  1 +
>  include/drm/drm_connector.h |  8 +
>  3 files changed, 59 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_connector.c b/drivers/gpu/drm/drm_connector.c
> index 87c68563e6c3..ef759d6add81 100644
> --- a/drivers/gpu/drm/drm_connector.c
> +++ b/drivers/gpu/drm/drm_connector.c
> @@ -66,6 +66,14 @@
>   * support can instead use e.g. drm_helper_hpd_irq_event().
>   */
>
> +/*
> + * Global connector list for drm_connector_find_by_fwnode().
> + * Note drm_connector_[un]register() first take connector->lock and then
> + * take the connector_list_lock.
> + */
> +static DEFINE_MUTEX(connector_list_lock);
> +static LIST_HEAD(connector_list);
> +
>  struct drm_conn_prop_enum_list {
> int type;
> const char *name;
> @@ -267,6 +275,7 @@ int drm_connector_init(struct drm_device *dev,
> goto out_put_type_id;
> }
>
> +   INIT_LIST_HEAD(&connector->global_connector_list_entry);
> INIT_LIST_HEAD(&connector->probed_modes);
> INIT_LIST_HEAD(&connector->modes);
> mutex_init(&connector->mutex);
> @@ -540,6 +549,9 @@ int drm_connector_register(struct drm_connector *connector)
> drm_privacy_screen_register_notifier(connector->privacy_screen,
> &connector->privacy_screen_notifier);
>
> +   mutex_lock(&connector_list_lock);
> +   list_add_tail(&connector->global_connector_list_entry,
> &connector_list);
> +   mutex_unlock(&connector_list_lock);
> goto unlock;
>
>  err_debugfs:
> @@ -568,6 +580,10 @@ void drm_connector_unregister(struct drm_connector
> *connector)
> return;
> }
>
> +   mutex_lock(&connector_list_lock);
> +   list_del_init(&connector->global_connector_list_entry);
> +   mutex_unlock(&connector_list_lock);
> +
> if (connector->privacy_screen)
> drm_privacy_screen_unregister_notifier(
> connector->privacy_screen,
> @@ -2676,6 +2692,40 @@ int drm_mode_getconnector(struct drm_device *dev,
> void *data,
> return ret;
>  }
>
> +/**
> + * drm_connector_find_by_fwnode - Find a connector based on the
> associated fwnode
> + * @fwnode: fwnode for which to find the matching drm_connector
> + *
> + * This functions looks up a drm_connector based on its associated
> fwnode. When
> + * a connector is found a reference to the connector is returned. The
> caller must
> + * call drm_connector_put() to release this reference when it is done
> with the
> + * connector.
> + *
> + * Returns: A reference to the found connector or an ERR_PTR().
> + */
> +struct drm_connector *drm_connector_find_by_fwnode(struct fwnode_handle
> *fwnode)
> +{
> +   struct drm_connector *connector, *found = ERR_PTR(-ENODEV);
> +
> +   if (!fwnode)
> +   return ERR_PTR(-ENODEV);
> +
> +   mutex_lock(&connector_list_lock);
> +
> +   list_for_each_entry(connector, &connector_list,
> global_connector_list_entry) {
> +   if (connector->fwnode == fwnode ||
> +   (connector->fwnode && connector->fwnode->secondary ==
> fwnode)) {
> +   drm_connector_get(connector);
> +   found = connector;
> +   break;
> +   }
> +   }
> +
> +   mutex_unlock(&connector_list_lock);
> +
> +   return found;



If I am not mistaken you can replace this with a

return list_entry_is_head();

call and remove the additional boolean variable.


> +}
> +
>
>  /**
>   * DOC: Tile group
> diff --git a/drivers/gpu/drm/drm_crtc_internal.h
> b/drivers/gpu/drm/drm_crtc_internal.h
> index 54d4cf1233e9..6e28fc00a740 100644
> --- a/drivers/gpu/drm/drm_crtc_internal.h
> +++ b/drivers/gpu/drm/drm_crtc_internal.h
> @@ -185,6 +185,7 @@ int drm_connector_set_obj_prop(struct drm_mode_object
> *obj,
>  int drm_connector_create_standard_properties(struct drm_device *dev);
>  const char *drm_get_connector_force_name(enum drm_connector_force force);
>  void drm_connector_free_work_fn(struct work_struct *work);
> +struct drm_connector *drm_connector_find_by_fwnode(struct fwnode_handle
> *fwnode);
>
>  /* IOCTL */
>  int drm_connector_property_set_ioctl(struct drm_device *dev,
> diff --git a/include/drm/drm_connector.h b/include/drm/drm_connector.h
> index d20bfd7576ed..ae377354e48e 100644
> --- a/include/drm/drm_connector.h
> +++ b/include/drm/drm_connector.h
> @@ -1267,6 +1267,1

NVIDIA GPU fallen off the bus after exiting s2idle

2021-05-04 Thread Chris Chiu
Hi,
We have some Intel laptops (11th generation CPU) with NVIDIA GPU
suffering the same GPU falling off the bus problem while exiting
s2idle with external display connected. These laptops connect the
external display via the HDMI/DisplayPort on a USB Type-C interfaced
dock. If we enter and exit s2idle with the dock connected, the NVIDIA
GPU (confirmed on 10de:24b6 and 10de:25b8) and the PCIe port can come
back to D0 w/o problem. If we enter the s2idle, disconnect the dock,
then exit the s2idle, both external display and the panel will remain
with no output. The dmesg as follows shows the "nvidia :01:00.0:
can't change power state from D3cold to D0 (config space
inaccessible)" due to the following ACPI error
[ 154.446781]
[ 154.446783]
[ 154.446783] Initialized Local Variables for Method [IPCS]:
[ 154.446784] Local0: 9863e365  Integer 09C5
[ 154.446790]
[ 154.446791] Initialized Arguments for Method [IPCS]: (7 arguments
defined for method invocation)
[ 154.446792] Arg0: 25568fbd  Integer 00AC
[ 154.446795] Arg1: 9ef30e76  Integer 
[ 154.446798] Arg2: fdf820f0  Integer 0010
[ 154.446801] Arg3: 9fc2a088  Integer 0001
[ 154.446804] Arg4: 3a3418f7  Integer 0001
[ 154.446807] Arg5: 20c4b87c  Integer 
[ 154.446810] Arg6: 8b965a8a  Integer 
[ 154.446813]
[ 154.446815] ACPI Error: Aborting method \IPCS due to previous error
(AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446824] ACPI Error: Aborting method \MCUI due to previous error
(AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446829] ACPI Error: Aborting method \SPCX due to previous error
(AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446835] ACPI Error: Aborting method \_SB.PC00.PGSC due to
previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446841] ACPI Error: Aborting method \_SB.PC00.PGON due to
previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446846] ACPI Error: Aborting method \_SB.PC00.PEG1.NPON due to
previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446852] ACPI Error: Aborting method \_SB.PC00.PEG1.PG01._ON due
to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
[ 154.446860] acpi device:02: Failed to change power state to D0
[ 154.690760] video LNXVIDEO:00: Cannot transition to power state D0
for parent in (unknown)

The IPCS is the last function called from \_SB.PC00.PEG1.PG01._ON
which we expect it to prepare everything before bringing back the
NVIDIA GPU but it's stuck in the infinite loop as described below.
Please refer to
https://gist.github.com/mschiu77/fa4f5a97297749d0d66fe60c1d421c44 for
the full DSDT.dsl.
   While (One)
{
If ((!IBSY || (IERR == One)))
{
Break
}

If ((Local0 > TMOV))
{
RPKG [Zero] = 0x03
Return (RPKG) /* \IPCS.RPKG */
}

Sleep (One)
Local0++
}

And the upstream PCIe port of the NVIDIA GPU seems to become inaccessible,
as the following messages show.
[ 292.746508] pcieport :00:01.0: waiting 100 ms for downstream
link, after activation
[ 292.882296] pci :01:00.0: waiting additional 100 ms to become accessible
[ 316.876997] pci :01:00.0: can't change power state from D3cold
to D0 (config space inaccessible)

IPCS is the Intel reference code, and we don't really know why the
never-ending loop happens just because we unplug the dock while the
system still stays in s2idle. Can anyone from Intel suggest what
happens here?

And one thing also worth mentioning, if we unplug the display cable
from the dock before entering the s2idle, NVIDIA GPU can come back w/o
problem even if we disconnect the dock before exiting s2idle. Here's
the lspci information
https://gist.github.com/mschiu77/0bfc439d15d52d20de0129b1b2a86dc4 and
the dmesg log with ACPI trace_state enabled and dynamic debug on for
drivers/pci/pci.c, drivers/acpi/device_pm.c for the whole s2idle
enter/exit with IPCS timeout.

Any suggestion would be appreciated. Thanks.

Chris
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Christian König

Am 04.05.21 um 09:32 schrieb Daniel Vetter:

On Tue, May 04, 2021 at 09:01:23AM +0200, Christian König wrote:

Unfortunately as I pointed out to Daniel as well this won't work 100%
reliable either.

You're claiming this, but there's no clear reason why really, and you
did't reply to my last mail on that sub-thread, so I really don't get
where exactly you're seeing a problem.


Yeah, it's rather hard to explain without pointing out how the hardware 
works in detail.



See, the signal on the ring buffer needs to be protected from manipulation by
userspace so that we can guarantee that the hardware has really finished
executing when it fires.

Nope you don't. Userspace is already allowed to submit all kinds of random
garbage, the only thing the kernel has to guarnatee is:
- the dma-fence DAG stays a DAG
- dma-fence completes in finite time

Everything else is not the kernel's problem, and if userspace mixes stuff
up like manipulates the seqno, that's ok. It can do that kind of garbage
already.


Protecting memory by immediate page table updates is a good first step, but
unfortunately not sufficient (and we would need to restructure large parts
of the driver to make this happen).

This is why you need the unload-fence on top, because indeed you can't
just rely on the fences created from the userspace ring, those are
unreliable for memory management.


And exactly that's the problem! We can't provide a reliable unload-fence 
and the user fences are unreliable for that.


I've talked this through at length with our hardware/firmware guy last 
Thursday but couldn't find a solution either.


We can have a preemption fence for the kernel which says: Hey, this queue 
was scheduled away, you can touch its hardware descriptor, control 
registers, page tables, TLB, memory, GWS, GDS, OA etc etc etc... again. 
But that one is only triggered on preemption, and then we have the same 
ordering problems once more.


Or we can have an end-of-operation fence for userspace which says: Hey, 
this queue has finished its batch of execution. But this one is 
manipulable from userspace, both to finish too early (very very bad for 
invalidations and memory management) and to finish too late/never (deadlock 
prone but fixable by timeout).


What we could do is to use the preemption fence to emulate the unload 
fence, e.g. something like:

1. Preempt the queue in fixed intervals (let's say 100ms).
2. While preempted check if we have reached the checkpoint in question 
by looking at the hardware descriptor.

3. If we have reached the checkpoint signal the unload fence.
4. If we haven't reached the checkpoint resume the queue again.

The problem is that this might introduce a delay of up to 100ms before 
signaling the unload fence, and preempt/resume has such a hefty overhead 
that we waste a horrible amount of time on it.
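As a rough sketch of that polling scheme (every helper here, preempt_queue()/resume_queue()/checkpoint_reached(), is hypothetical):

/* Hypothetical sketch of emulating an unload fence via periodic
 * preemption, per steps 1-4 above. */
static void unload_fence_poll(struct work_struct *work)
{
        struct queue *q = container_of(work, struct queue, poll_work.work);

        /* 1. Preempt the queue so its hardware descriptor is stable. */
        preempt_queue(q);

        /* 2./3. If the checkpoint has been reached, signal the unload fence. */
        if (checkpoint_reached(q->hw_desc, q->checkpoint)) {
                dma_fence_signal(q->unload_fence);
                return;
        }

        /* 4. Otherwise resume and poll again in 100ms; this interval is where
         *    the up-to-100ms signaling latency comes from, and every round
         *    trip pays the preempt/resume overhead. */
        resume_queue(q);
        schedule_delayed_work(&q->poll_work, msecs_to_jiffies(100));
}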




btw I thought some more, and I think it's probably best if we only attach
the unload-fence in the ->move(_notify) callbacks. Kinda like we already
do for async copy jobs. So the overall buffer move sequence would be:

1. wait for (untrusted for kernel, but necessary for userspace
correctness) fake dma-fence that rely on the userspace ring

2. unload ctx

3. copy buffer

Ofc 2&3 would be done async behind a dma_fence.


On older hardware we often had the situation that for reliable invalidation
we need the guarantee that every previous operation has finished executing.
It's not so much of a problem when the next operation has already started,
since then we had the opportunity to do things in between the last and the
next operation. Just see cache invalidation and VM switching for example.

If you have gpu page faults you generally have synchronous tlb
invalidation,


Please tell that to our hardware engineers :)

We have two modes of operation, see the whole XNACK on/off discussion on 
the amd-gfx mailing list.



so this also shouldn't be a big problem. Combined with the
unload fence at least. If you don't have synchronous tlb invalidate it
gets a bit more nasty and you need to force a preemption to a kernel
context which has the required flushes across all the caches. Slightly
nasty, but the exact same thing would be required for handling page faults
anyway with the direct userspace submit model.

Again I'm not seeing a problem.


Additional to that it doesn't really buy us anything, e.g. there is not much
advantage to this. Writing the ring buffer in userspace and then ringing in
the kernel has the same overhead as doing everything in the kernel in the
first place.

It gets you dma-fence backwards compat without having to rewrite the
entire userspace ecosystem. Also since you have the hw already designed
for ringbuffer in userspace it would be silly to copy that through the cs
ioctl, that's just overhead.

Also I thought the problem you're having is that all the kernel ringbuf
stuff is going away, so the old cs ioctl wont work anymore for sure?


We still have a bit more time for this. As I learned from our firmware 
engineer 

[PATCH][next] drm/nouveau/nvkm: Fix spelling mistake "endianess" -> "endianness"

2021-05-04 Thread Colin King
From: Colin Ian King 

There is a spelling mistake in a nvdev_error message. Fix it.

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/nouveau/nvkm/engine/device/base.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c 
b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
index b930f539feec..68d58d52eeef 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/device/base.c
@@ -2891,7 +2891,7 @@ nvkm_device_ctor(const struct nvkm_device_func *func,
/* switch mmio to cpu's native endianness */
if (!nvkm_device_endianness(device)) {
nvdev_error(device,
-   "Couldn't switch GPU to CPUs endianess\n");
+   "Couldn't switch GPU to CPUs endianness\n");
ret = -ENOSYS;
goto done;
}
-- 
2.30.2

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [PATCH 3/6] drm/i915: Add a separate low-level helper for masked workarounds

2021-05-04 Thread Tvrtko Ursulin


On 01/05/2021 07:55, Lucas De Marchi wrote:

On Thu, Apr 29, 2021 at 10:12:51AM +0100, Tvrtko Ursulin wrote:

From: Tvrtko Ursulin 

We distinguish masked registers from other workarounds by the mask (clr)
being zero for the former.


the difference is more about the fact that those calls used _MASKED_*
macros to prepare the upper 16 bits than about clr being 0.

clr is zero only because for masked registers we don't care about
clearing the value since all the bits in the mask will be written.
More below.


Yes, but we not only don't care, we really don't want to do rmw. We have 
two separate paths on the apply side which are picked based on clr being 
zero or not.



To avoid callers of the low-level wa_add having to know that, and be
passing this zero explicitly, add a wa_masked_add low-level helper
which embeds this knowledge.

Signed-off-by: Tvrtko Ursulin 
---
drivers/gpu/drm/i915/gt/intel_workarounds.c | 56 +
1 file changed, 34 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_workarounds.c b/drivers/gpu/drm/i915/gt/intel_workarounds.c
index 62cb9ee5bfc3..a7abf9ca78ec 100644
--- a/drivers/gpu/drm/i915/gt/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/gt/intel_workarounds.c
@@ -162,6 +162,18 @@ static void wa_add(struct i915_wa_list *wal, i915_reg_t reg,

_wa_add(wal, &wa);
}

+static void wa_masked_add(struct i915_wa_list *wal, i915_reg_t reg,
+  u32 set, u32 read_mask)
+{
+    struct i915_wa wa = {
+    .reg  = reg,
+    .set  = set,
+    .read = read_mask,
+    };
+
+    _wa_add(wal, &wa);
+}


I think this would be better together with the other wa_masked_*
functions. If not only by the name, but also because we have a comment
there:

/*
 * WA operations on "masked register". A masked register has the upper 16 bits
 * documented as "masked" in b-spec. Its purpose is to allow writing to just a
 * portion of the register without a rmw: you simply write in the upper 16 bits
 * the mask of bits you are going to modify.
 *
 * The wa_masked_* family of functions already does the necessary operations to
 * calculate the mask based on the parameters passed, so user only has to
 * provide the lower 16 bits of that register.
 */
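(For reference, the i915 masked-register macros boil down to roughly the following; the upper 16 bits of the written value act as a per-bit write enable, which is why no read-modify-write is needed:

#define _MASKED_FIELD(mask, value)  (((mask) << 16) | (value))
#define _MASKED_BIT_ENABLE(a)       _MASKED_FIELD((a), (a))
#define _MASKED_BIT_DISABLE(a)      _MASKED_FIELD((a), 0)

/* e.g. _MASKED_BIT_ENABLE(BIT(3)) == 0x00080008: only bit 3 changes,
 * every other bit of the register is left untouched by the hardware. */

The real definitions add type and build-time checks on top.)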


Yep thanks.




+
static void
wa_write_clr_set(struct i915_wa_list *wal, i915_reg_t reg, u32 clear, u32 set)
{
@@ -200,20 +212,20 @@ wa_write_clr(struct i915_wa_list *wal, i915_reg_t reg, u32 clr)

static void
wa_masked_en(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
{
-    wa_add(wal, reg, 0, _MASKED_BIT_ENABLE(val), val);
+    wa_masked_add(wal, reg, _MASKED_BIT_ENABLE(val), val);


for me it feels weird that now we have to use wa_masked_add() *and* at the
same time use _MASKED_BIT_ENABLE(). This is not the case when we are
using wa_masked_en(), for example.

and as I said, the clr bits could be anything since they don't really
matter. The biggest value added by the wa_masked_* variant is the use of
_MASKED_* where needed.


Yes I wasn't fully happy with it.

How about both wa_add and wa_masked_add get a single or double 
underscore prefix? That would signify them being low-level and justify 
the need for explicitly using _MASKED_BIT_ENABLE?


Regards,

Tvrtko



Lucas De Marchi


}

static void
wa_masked_dis(struct i915_wa_list *wal, i915_reg_t reg, u32 val)
{
-    wa_add(wal, reg, 0, _MASKED_BIT_DISABLE(val), val);
+    wa_masked_add(wal, reg, _MASKED_BIT_DISABLE(val), val);
}

static void
wa_masked_field_set(struct i915_wa_list *wal, i915_reg_t reg,
    u32 mask, u32 val)
{
-    wa_add(wal, reg, 0, _MASKED_FIELD(mask, val), mask);
+    wa_masked_add(wal, reg, _MASKED_FIELD(mask, val), mask);
}

static void gen6_ctx_workarounds_init(struct intel_engine_cs *engine,
@@ -836,10 +848,10 @@ hsw_gt_workarounds_init(struct drm_i915_private *i915, struct i915_wa_list *wal)
/* L3 caching of data atomics doesn't work -- disable it. */
wa_write(wal, HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE);

-    wa_add(wal,
-           HSW_ROW_CHICKEN3, 0,
-           _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE),
-           0 /* XXX does this reg exist? */);
+    wa_masked_add(wal,
+                  HSW_ROW_CHICKEN3,
+                  _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE),
+                  0 /* XXX does this reg exist? */);

/* WaVSRefCountFullforceMissDisable:hsw */
wa_write_clr(wal, GEN7_FF_THREAD_MODE, GEN7_FF_VS_REF_CNT_FFME);
@@ -1947,10 +1959,10 @@ rcs_engine_wa_init(struct intel_engine_cs *engine, struct i915_wa_list *wal)
 * disable bit, which we don't touch here, but it's good
 * to keep in mind (see 3DSTATE_PS and 3DSTATE_WM).
 */
-    wa_add(wal, GEN7_GT_MODE, 0,
-           _MASKED_FIELD(GEN6_WIZ_HASHING_MASK,
-                         GEN6_WIZ_HASHING_16x4),
-           GEN6_WIZ_HASHING_16x4);
+    wa_masked_field_set(wal,
+                        GEN7_GT_MODE,
+                        GE

Re: [PATCHv3 1/6] drm: drm_bridge: add connector_attach/detach bridge ops

2021-05-04 Thread Tomi Valkeinen

On 28/04/2021 16:25, Hans Verkuil wrote:

Add bridge connector_attach/detach ops. These ops are called when a
bridge is attached to or detached from a drm_connector. These ops can be
used to register and unregister an HDMI CEC adapter for a bridge that
supports CEC.

Signed-off-by: Hans Verkuil 
---
  drivers/gpu/drm/drm_bridge_connector.c | 25 +++-
  include/drm/drm_bridge.h   | 27 ++
  2 files changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_bridge_connector.c 
b/drivers/gpu/drm/drm_bridge_connector.c
index 791379816837..0676677badfe 100644
--- a/drivers/gpu/drm/drm_bridge_connector.c
+++ b/drivers/gpu/drm/drm_bridge_connector.c
@@ -203,6 +203,11 @@ static void drm_bridge_connector_destroy(struct 
drm_connector *connector)
  {
struct drm_bridge_connector *bridge_connector =
to_drm_bridge_connector(connector);
+   struct drm_bridge *bridge;
+
+   drm_for_each_bridge_in_chain(bridge_connector->encoder, bridge)
+   if (bridge->funcs->connector_detach)
+   bridge->funcs->connector_detach(bridge, connector);
  
  	if (bridge_connector->bridge_hpd) {

struct drm_bridge *hpd = bridge_connector->bridge_hpd;
@@ -318,6 +323,7 @@ struct drm_connector *drm_bridge_connector_init(struct 
drm_device *drm,
struct i2c_adapter *ddc = NULL;
struct drm_bridge *bridge;
int connector_type;
+   int ret;
  
  	bridge_connector = kzalloc(sizeof(*bridge_connector), GFP_KERNEL);

if (!bridge_connector)
@@ -375,6 +381,23 @@ struct drm_connector *drm_bridge_connector_init(struct 
drm_device *drm,
connector->polled = DRM_CONNECTOR_POLL_CONNECT
  | DRM_CONNECTOR_POLL_DISCONNECT;
  
-	return connector;

+   ret = 0;
+   /* call connector_attach for all bridges */
+   drm_for_each_bridge_in_chain(encoder, bridge) {
+   if (!bridge->funcs->connector_attach)
+   continue;
+   ret = bridge->funcs->connector_attach(bridge, connector);
+   if (ret)
+   break;
+   }
+   if (!ret)
+   return connector;
+
+   /* on error, detach any previously successfully attached connectors */
+   list_for_each_entry_continue_reverse(bridge, &(encoder)->bridge_chain,


No need for the parentheses around encoder here.


+chain_node)
+   if (bridge->funcs->connector_detach)
+   bridge->funcs->connector_detach(bridge, connector);
+   return ERR_PTR(ret);
  }
  EXPORT_SYMBOL_GPL(drm_bridge_connector_init);
diff --git a/include/drm/drm_bridge.h b/include/drm/drm_bridge.h
index 2195daa289d2..333fbc3a03e9 100644
--- a/include/drm/drm_bridge.h
+++ b/include/drm/drm_bridge.h
@@ -629,6 +629,33 @@ struct drm_bridge_funcs {
 * the DRM_BRIDGE_OP_HPD flag in their &drm_bridge->ops.
 */
void (*hpd_disable)(struct drm_bridge *bridge);
+
+   /**
+* @connector_attach:
+*
+* This callback is invoked whenever our bridge is being attached to a
+* &drm_connector. This is where an HDMI CEC adapter can be registered.
+*
+* The @connector_attach callback is optional.
+*
+* RETURNS:
+*
+* Zero on success, error code on failure.
+*/
+   int (*connector_attach)(struct drm_bridge *bridge,
+   struct drm_connector *conn);
+
+   /**
+* @connector_detach:
+*
+* This callback is invoked whenever our bridge is being detached from a
+* &drm_connector. This is where an HDMI CEC adapter can be
+* unregistered.
+*
+* The @connector_detach callback is optional.
+*/
+   void (*connector_detach)(struct drm_bridge *bridge,
+struct drm_connector *conn);
  };
  
  /**




Reviewed-by: Tomi Valkeinen 

I can take this series as it's mostly omapdrm, but we'll need a 
reviewed-by/acked-by from a maintainer for this patch.
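As a usage illustration (not part of this series, hypothetical driver names,
CEC registration reduced to stubs), a bridge driver would wire the new ops up
roughly like this:

static int my_bridge_connector_attach(struct drm_bridge *bridge,
				      struct drm_connector *conn)
{
	struct my_bridge *mb = bridge_to_my_bridge(bridge);

	/* register the HDMI CEC adapter for this connector */
	return my_bridge_cec_register(mb, conn);
}

static void my_bridge_connector_detach(struct drm_bridge *bridge,
				       struct drm_connector *conn)
{
	struct my_bridge *mb = bridge_to_my_bridge(bridge);

	/* undo whatever connector_attach() set up */
	my_bridge_cec_unregister(mb, conn);
}

static const struct drm_bridge_funcs my_bridge_funcs = {
	/* ... other ops ... */
	.connector_attach = my_bridge_connector_attach,
	.connector_detach = my_bridge_connector_detach,
};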


 Tomi


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Daniel Vetter
On Tue, May 4, 2021 at 10:09 AM Christian König
 wrote:
>
> Am 04.05.21 um 09:32 schrieb Daniel Vetter:
> > On Tue, May 04, 2021 at 09:01:23AM +0200, Christian König wrote:
> >> Unfortunately as I pointed out to Daniel as well this won't work 100%
> >> reliable either.
> > You're claiming this, but there's no clear reason why really, and you
> > didn't reply to my last mail on that sub-thread, so I really don't get
> > where exactly you're seeing a problem.
>
> Yeah, it's rather hard to explain without pointing out how the hardware
> works in detail.
>
> >> See the signal on the ring buffer needs to be protected by manipulation 
> >> from
> >> userspace so that we can guarantee that the hardware really has finished
> >> executing when it fires.
> > Nope you don't. Userspace is already allowed to submit all kinds of random
> > garbage, the only thing the kernel has to guarantee is:
> > - the dma-fence DAG stays a DAG
> > - dma-fence completes in finite time
> >
> > Everything else is not the kernel's problem, and if userspace mixes stuff
> > up like manipulates the seqno, that's ok. It can do that kind of garbage
> > already.
> >
> >> Protecting memory by immediate page table updates is a good first step, but
> >> unfortunately not sufficient (and we would need to restructure large parts
> >> of the driver to make this happen).
> > This is why you need the unload-fence on top, because indeed you can't
> > just rely on the fences created from the userspace ring, those are
> > unreliable for memory management.
>
> And exactly that's the problem! We can't provide a reliable unload-fence
> and the user fences are unreliable for that.
>
> I've talked this through at length with our hardware/firmware guy last
> Thursday but couldn't find a solution either.
>
> We can have a preemption fence for the kernel which says: Hey this queue
> was scheduled away you can touch its hardware descriptor, control
> registers, page tables, TLB, memory, GWS, GDS, OA etc etc etc... again.
> But that one is only triggered on preemption and then we have the same
> ordering problems once more.
>
> Or we can have an end-of-operation fence for userspace which says: Hey
> this queue has finished its batch of execution, but this one is
> manipulable from userspace to either finish too early (very very bad for
> invalidations and memory management) or too late/never (deadlock
> prone but fixable by timeout).
>
> What we could do is to use the preemption fence to emulate the unload
> fence, e.g. something like:
> 1. Preempt the queue in fixed intervals (let's say 100ms).
> 2. While preempted check if we have reached the checkpoint in question
> by looking at the hardware descriptor.
> 3. If we have reached the checkpoint signal the unload fence.
> 4. If we haven't reached the checkpoint resume the queue again.
>
> The problem is that this might introduce a maximum of 100ms delay before
> signaling the unload fence and preempt/resume has such a hefty overhead
> that we waste a horrible amount of time on it.
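A rough sketch of the checkpoint-polling scheme quoted above (hypothetical
helper names, purely illustrative):

/* emulate an unload fence by periodically suspending the queue */
static void emulate_unload_fence(struct user_queue *q, u64 checkpoint,
				 struct dma_fence *unload_fence)
{
	for (;;) {
		msleep(100);				/* 1. preempt in fixed intervals */
		suspend_user_queue(q);

		/* 2. while preempted, check the hw descriptor */
		if (hw_descriptor_read_rptr(q) >= checkpoint) {
			dma_fence_signal(unload_fence);	/* 3. checkpoint reached */
			resume_user_queue(q);
			break;
		}

		resume_user_queue(q);			/* 4. not reached, try again */
	}
}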

So your hw can preempt? That's good enough.

The unload fence is just
1. wait for all dma_fence that are based on the userspace ring. This
is unreliable, but we don't care because tdr will make it reliable.
And once tdr shot down a context we'll force-unload and thrash it
completely, which solves the problem.
2. preempt the context, which /should/ now be stuck waiting for more
commands to be stuffed into the ringbuffer. Which means your
preemption is hopefully fast enough to not matter. If your hw takes
forever to preempt an idle ring, I can't help you :-)

Also, if userspace lies to us and keeps pushing crap into the ring
after it's supposed to be idle: Userspace is already allowed to waste
gpu time. If you're too worried about this set a fairly aggressive
preempt timeout on the unload fence, and kill the context if it takes
longer than what preempting an idle ring should take (because that
would indicate broken/evil userspace).

Again, I'm not seeing the problem. Except if your hw is really
completely busted to the point where it can't even support userspace
ringbuffers properly and with sufficient performance :-P

Of course if you issue the preempt context request before the
userspace fences have finished (or tdr cleaned up the mess) like you
do in your proposal, then it will be ridiculously expensive and/or
won't work. So just don't do that.
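Pinned down as pseudo-code (hypothetical helpers, just to make the proposed
ordering explicit):

/* unload fence for a buffer move, per the scheme above */
static struct dma_fence *build_unload_fence(struct user_ctx *ctx)
{
	/*
	 * 1. Wait for all dma_fences derived from the userspace ring.
	 *    Unreliable on their own, but TDR times out and force-unloads
	 *    the context if userspace never completes them.
	 */
	wait_for_userspace_ring_fences(ctx);

	/*
	 * 2. Preempt the context. It should be idle by now, stuck waiting
	 *    for more commands, so preemption should be quick. Returns a
	 *    fence that signals once preemption has completed.
	 */
	return preempt_ctx_async(ctx);
}

The ->move(_notify) path would then wait on that fence before doing the copy,
with the unload and the copy done async behind a regular dma_fence as in the
quoted sequence below.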

> > btw I thought some more, and I think it's probably best if we only attach
> > the unload-fence in the ->move(_notify) callbacks. Kinda like we already
> > do for async copy jobs. So the overall buffer move sequence would be:
> >
> > 1. wait for (untrusted for kernel, but necessary for userspace
> > correctness) fake dma-fence that rely on the userspace ring
> >
> > 2. unload ctx
> >
> > 3. copy buffer
> >
> > Ofc 2&3 would be done async behind a dma_fence.
> >
> >> On older hardware we often had the situation that for reliable invalidation
> >> we need the guarantee that every previous oper

Re: [PATCH 02/27] drm/i915: Stop storing the ring size in the ring pointer

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:23AM -0500, Jason Ekstrand wrote:
> Previously, we were storing the ring size in the ring pointer before it
> was actually allocated.  We would then guard setting the ring size on
> checking for CONTEXT_ALLOC_BIT.  This is error-prone at best and really
> only saves us a few bytes on something that already burns at least 4K.
> Instead, this patch adds a new ring_size field and makes everything use
> that.
> 
> Signed-off-by: Jason Ekstrand 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 3 +--
>  drivers/gpu/drm/i915/gt/intel_context.c   | 3 ++-
>  drivers/gpu/drm/i915/gt/intel_context.h   | 5 -
>  drivers/gpu/drm/i915/gt/intel_context_types.h | 1 +
>  drivers/gpu/drm/i915/gt/intel_lrc.c   | 2 +-
>  drivers/gpu/drm/i915/gt/selftest_execlists.c  | 2 +-
>  drivers/gpu/drm/i915/gt/selftest_mocs.c   | 2 +-
>  drivers/gpu/drm/i915/gt/selftest_timeline.c   | 2 +-
>  drivers/gpu/drm/i915/gvt/scheduler.c  | 7 ++-
>  9 files changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index e52b85b8f923d..2ba4c7e4011b4 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -211,8 +211,7 @@ static void intel_context_set_gem(struct intel_context 
> *ce,
>   GEM_BUG_ON(rcu_access_pointer(ce->gem_context));
>   RCU_INIT_POINTER(ce->gem_context, ctx);
>  
> - if (!test_bit(CONTEXT_ALLOC_BIT, &ce->flags))
> - ce->ring = __intel_context_ring_size(SZ_16K);
> + ce->ring_size = SZ_16K;
>  
>   if (rcu_access_pointer(ctx->vm)) {
>   struct i915_address_space *vm;
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c 
> b/drivers/gpu/drm/i915/gt/intel_context.c
> index 17cf2640b082b..342fa7daa08b5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -372,7 +372,8 @@ intel_context_init(struct intel_context *ce, struct 
> intel_engine_cs *engine)
>   ce->engine = engine;
>   ce->ops = engine->cops;
>   ce->sseu = engine->sseu;
> - ce->ring = __intel_context_ring_size(SZ_4K);
> + ce->ring = NULL;
> + ce->ring_size = SZ_4K;
>  
>   ewma_runtime_init(&ce->runtime.avg);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h 
> b/drivers/gpu/drm/i915/gt/intel_context.h
> index f83a73a2b39fc..b10cbe8fee992 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context.h
> @@ -175,11 +175,6 @@ int intel_context_prepare_remote_request(struct 
> intel_context *ce,
>  
>  struct i915_request *intel_context_create_request(struct intel_context *ce);
>  
> -static inline struct intel_ring *__intel_context_ring_size(u64 sz)
> -{
> - return u64_to_ptr(struct intel_ring, sz);
> -}
> -
>  static inline bool intel_context_is_barrier(const struct intel_context *ce)
>  {
>   return test_bit(CONTEXT_BARRIER_BIT, &ce->flags);
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h 
> b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index ed8c447a7346b..90026c1771055 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -82,6 +82,7 @@ struct intel_context {
>   spinlock_t signal_lock; /* protects signals, the list of requests */
>  
>   struct i915_vma *state;
> + u32 ring_size;
>   struct intel_ring *ring;
>   struct intel_timeline *timeline;
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index e86897cde9846..63193c80fb117 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -845,7 +845,7 @@ int lrc_alloc(struct intel_context *ce, struct 
> intel_engine_cs *engine)
>   if (IS_ERR(vma))
>   return PTR_ERR(vma);
>  
> - ring = intel_engine_create_ring(engine, (unsigned long)ce->ring);
> + ring = intel_engine_create_ring(engine, ce->ring_size);
>   if (IS_ERR(ring)) {
>   err = PTR_ERR(ring);
>   goto err_vma;
> diff --git a/drivers/gpu/drm/i915/gt/selftest_execlists.c 
> b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> index 1081cd36a2bd3..01d9896dd4844 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_execlists.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_execlists.c
> @@ -2793,7 +2793,7 @@ static int __live_preempt_ring(struct intel_engine_cs 
> *engine,
>   goto err_ce;
>   }
>  
> - tmp->ring = __intel_context_ring_size(ring_sz);
> + tmp->ring_size = ring_sz;
>  
>   err = intel_context_pin(tmp);
>   if (err) {
> diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c 
> b/drivers/gpu/drm/i915/gt/selftest_mocs.c
> index e55a887d11e2b..f343fa5fd986f 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
> +++ b/drivers/gpu/drm/i915/gt/self

Re: [Intel-gfx] [PATCH 03/27] drm/i915: Drop I915_CONTEXT_PARAM_NO_ZEROMAP

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:24AM -0500, Jason Ekstrand wrote:
> The idea behind this param is to support OpenCL drivers with relocations
> because OpenCL reserves 0x0 for NULL and, if we placed memory there, it
> would confuse CL kernels.  It was originally sent out as part of a patch
> series including libdrm [1] and Beignet [2] support.  However, the
> libdrm and Beignet patches never landed in their respective upstream
> projects so this API has never been used.  It's never been used in Mesa
> or any other driver, either.
> 
> Dropping this API allows us to delete a small bit of code.
> 
> [1]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067030.html
> [2]: https://lists.freedesktop.org/archives/intel-gfx/2015-May/067031.html
> 
> Signed-off-by: Jason Ekstrand 

Hm I forgot to r-b this last time around.

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c  | 16 ++--
>  .../gpu/drm/i915/gem/i915_gem_context_types.h|  1 -
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c   |  8 
>  include/uapi/drm/i915_drm.h  |  4 
>  4 files changed, 6 insertions(+), 23 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 2ba4c7e4011b4..44841db04301b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1921,15 +1921,6 @@ static int ctx_setparam(struct drm_i915_file_private 
> *fpriv,
>   int ret = 0;
>  
>   switch (args->param) {
> - case I915_CONTEXT_PARAM_NO_ZEROMAP:
> - if (args->size)
> - ret = -EINVAL;
> - else if (args->value)
> - set_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
> - else
> - clear_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
> - break;
> -
>   case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
>   if (args->size)
>   ret = -EINVAL;
> @@ -1979,6 +1970,7 @@ static int ctx_setparam(struct drm_i915_file_private 
> *fpriv,
>   ret = set_persistence(ctx, args);
>   break;
>  
> + case I915_CONTEXT_PARAM_NO_ZEROMAP:
>   case I915_CONTEXT_PARAM_BAN_PERIOD:
>   case I915_CONTEXT_PARAM_RINGSIZE:
>   default:
> @@ -2359,11 +2351,6 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
> *dev, void *data,
>   return -ENOENT;
>  
>   switch (args->param) {
> - case I915_CONTEXT_PARAM_NO_ZEROMAP:
> - args->size = 0;
> - args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
> - break;
> -
>   case I915_CONTEXT_PARAM_GTT_SIZE:
>   args->size = 0;
>   rcu_read_lock();
> @@ -2411,6 +2398,7 @@ int i915_gem_context_getparam_ioctl(struct drm_device 
> *dev, void *data,
>   args->value = i915_gem_context_is_persistent(ctx);
>   break;
>  
> + case I915_CONTEXT_PARAM_NO_ZEROMAP:
>   case I915_CONTEXT_PARAM_BAN_PERIOD:
>   case I915_CONTEXT_PARAM_RINGSIZE:
>   default:
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
> index 340473aa70de0..5ae71ec936f7c 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
> @@ -129,7 +129,6 @@ struct i915_gem_context {
>* @user_flags: small set of booleans controlled by the user
>*/
>   unsigned long user_flags;
> -#define UCONTEXT_NO_ZEROMAP  0
>  #define UCONTEXT_NO_ERROR_CAPTURE1
>  #define UCONTEXT_BANNABLE2
>  #define UCONTEXT_RECOVERABLE 3
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 297143511f99b..b812f313422a9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -290,7 +290,6 @@ struct i915_execbuffer {
>   struct intel_context *reloc_context;
>  
>   u64 invalid_flags; /** Set of execobj.flags that are invalid */
> - u32 context_flags; /** Set of execobj.flags to insert from the ctx */
>  
>   u64 batch_len; /** Length of batch within object */
>   u32 batch_start_offset; /** Location within object of batch */
> @@ -541,9 +540,6 @@ eb_validate_vma(struct i915_execbuffer *eb,
>   entry->flags |= EXEC_OBJECT_NEEDS_GTT | 
> __EXEC_OBJECT_NEEDS_MAP;
>   }
>  
> - if (!(entry->flags & EXEC_OBJECT_PINNED))
> - entry->flags |= eb->context_flags;
> -
>   return 0;
>  }
>  
> @@ -750,10 +746,6 @@ static int eb_select_context(struct i915_execbuffer *eb)
>   if (rcu_access_pointer(ctx->vm))
>   eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT;
>  
> - eb->context_flags = 0;
> - if (test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_fla

Re: [Intel-gfx] [PATCH 06/27] drm/i915: Drop the CONTEXT_CLONE API

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:27AM -0500, Jason Ekstrand wrote:
> This API allows one context to grab bits out of another context upon
> creation.  It can be used as a short-cut for setparam(getparam()) for
> things like I915_CONTEXT_PARAM_VM.  However, it's never been used by any
> real userspace.  It's used by a few IGT tests and that's it.  Since it
> doesn't add any real value (most of the stuff you can CLONE you can copy
> in other ways), drop it.
> 
> There is one thing that this API allows you to clone which you cannot
> clone via getparam/setparam: timelines.  However, timelines are an
> implementation detail of i915 and not really something that needs to be
> exposed to userspace.  Also, sharing timelines between contexts isn't
> obviously useful and supporting it has the potential to complicate i915
> internally.  It also doesn't add any functionality that the client can't
> get in other ways.  If a client really wants a shared timeline, they can
> use a syncobj and set it as an in and out fence on every submit.
> 
> Signed-off-by: Jason Ekstrand 
> Cc: Tvrtko Ursulin 

This ain't a gitlab MR, so please include a per-patch (and also per-revision)
changelog summary here. With that added:

Reviewed-by: Daniel Vetter 
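As an aside, the shared-timeline alternative mentioned in the commit message
(a syncobj used as in- and out-fence on every submit) would look roughly like
this from userspace. This is only a sketch against the existing
I915_EXEC_FENCE_ARRAY uAPI, with error handling and buffer setup omitted:

	struct drm_syncobj_create create = { 0 };
	drmIoctl(fd, DRM_IOCTL_SYNCOBJ_CREATE, &create);

	struct drm_i915_gem_exec_fence fence = {
		.handle = create.handle,
		/* wait for the previous submit, signal when this one completes
		 * (the very first submit would use I915_EXEC_FENCE_SIGNAL only) */
		.flags  = I915_EXEC_FENCE_WAIT | I915_EXEC_FENCE_SIGNAL,
	};

	struct drm_i915_gem_execbuffer2 execbuf = {
		.buffers_ptr   = (uintptr_t)objects,
		.buffer_count  = num_objects,
		.flags         = exec_flags | I915_EXEC_FENCE_ARRAY,
		.cliprects_ptr = (uintptr_t)&fence,
		.num_cliprects = 1,
	};
	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);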
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 199 +-
>  .../drm/i915/gt/intel_execlists_submission.c  |  28 ---
>  .../drm/i915/gt/intel_execlists_submission.h  |   3 -
>  include/uapi/drm/i915_drm.h   |  16 +-
>  4 files changed, 6 insertions(+), 240 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index d6f342e605254..308a63f778faf 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -1958,207 +1958,14 @@ static int create_setparam(struct 
> i915_user_extension __user *ext, void *data)
>   return ctx_setparam(arg->fpriv, arg->ctx, &local.param);
>  }
>  
> -static int clone_engines(struct i915_gem_context *dst,
> -  struct i915_gem_context *src)
> +static int invalid_ext(struct i915_user_extension __user *ext, void *data)
>  {
> - struct i915_gem_engines *clone, *e;
> - bool user_engines;
> - unsigned long n;
> -
> - e = __context_engines_await(src, &user_engines);
> - if (!e)
> - return -ENOENT;
> -
> - clone = alloc_engines(e->num_engines);
> - if (!clone)
> - goto err_unlock;
> -
> - for (n = 0; n < e->num_engines; n++) {
> - struct intel_engine_cs *engine;
> -
> - if (!e->engines[n]) {
> - clone->engines[n] = NULL;
> - continue;
> - }
> - engine = e->engines[n]->engine;
> -
> - /*
> -  * Virtual engines are singletons; they can only exist
> -  * inside a single context, because they embed their
> -  * HW context... As each virtual context implies a single
> -  * timeline (each engine can only dequeue a single request
> -  * at any time), it would be surprising for two contexts
> -  * to use the same engine. So let's create a copy of
> -  * the virtual engine instead.
> -  */
> - if (intel_engine_is_virtual(engine))
> - clone->engines[n] =
> - intel_execlists_clone_virtual(engine);
> - else
> - clone->engines[n] = intel_context_create(engine);
> - if (IS_ERR_OR_NULL(clone->engines[n])) {
> - __free_engines(clone, n);
> - goto err_unlock;
> - }
> -
> - intel_context_set_gem(clone->engines[n], dst);
> - }
> - clone->num_engines = n;
> - i915_sw_fence_complete(&e->fence);
> -
> - /* Serialised by constructor */
> - engines_idle_release(dst, rcu_replace_pointer(dst->engines, clone, 1));
> - if (user_engines)
> - i915_gem_context_set_user_engines(dst);
> - else
> - i915_gem_context_clear_user_engines(dst);
> - return 0;
> -
> -err_unlock:
> - i915_sw_fence_complete(&e->fence);
> - return -ENOMEM;
> -}
> -
> -static int clone_flags(struct i915_gem_context *dst,
> -struct i915_gem_context *src)
> -{
> - dst->user_flags = src->user_flags;
> - return 0;
> -}
> -
> -static int clone_schedattr(struct i915_gem_context *dst,
> -struct i915_gem_context *src)
> -{
> - dst->sched = src->sched;
> - return 0;
> -}
> -
> -static int clone_sseu(struct i915_gem_context *dst,
> -   struct i915_gem_context *src)
> -{
> - struct i915_gem_engines *e = i915_gem_context_lock_engines(src);
> - struct i915_gem_engines *clone;
> - unsigned long n;
> - int err;
> -
> - /* no locking required; sole access under constructor*/
> - clone = __

Re: [PATCH 11/27] drm/i915/request: Remove the hook from await_execution

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:32AM -0500, Jason Ekstrand wrote:
> This was only ever used for FENCE_SUBMIT automatic engine selection
> which was removed in the previous commit.
> 
> Signed-off-by: Jason Ekstrand 

I really like how this is now split up; it's much more decipherable what's going
on. For the three patches leading up to here, including this one:

Reviewed-by: Daniel Vetter 

> ---
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  3 +-
>  drivers/gpu/drm/i915/i915_request.c   | 42 ---
>  drivers/gpu/drm/i915/i915_request.h   |  4 +-
>  3 files changed, 9 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index efb2fa3522a42..7024adcd5cf15 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -3473,8 +3473,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   if (in_fence) {
>   if (args->flags & I915_EXEC_FENCE_SUBMIT)
>   err = i915_request_await_execution(eb.request,
> -in_fence,
> -NULL);
> +in_fence);
>   else
>   err = i915_request_await_dma_fence(eb.request,
>  in_fence);
> diff --git a/drivers/gpu/drm/i915/i915_request.c 
> b/drivers/gpu/drm/i915/i915_request.c
> index bec9c3652188b..7e00218b8c105 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -49,7 +49,6 @@
>  struct execute_cb {
>   struct irq_work work;
>   struct i915_sw_fence *fence;
> - void (*hook)(struct i915_request *rq, struct dma_fence *signal);
>   struct i915_request *signal;
>  };
>  
> @@ -180,17 +179,6 @@ static void irq_execute_cb(struct irq_work *wrk)
>   kmem_cache_free(global.slab_execute_cbs, cb);
>  }
>  
> -static void irq_execute_cb_hook(struct irq_work *wrk)
> -{
> - struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
> -
> - cb->hook(container_of(cb->fence, struct i915_request, submit),
> -  &cb->signal->fence);
> - i915_request_put(cb->signal);
> -
> - irq_execute_cb(wrk);
> -}
> -
>  static __always_inline void
>  __notify_execute_cb(struct i915_request *rq, bool (*fn)(struct irq_work 
> *wrk))
>  {
> @@ -517,17 +505,12 @@ static bool __request_in_flight(const struct 
> i915_request *signal)
>  static int
>  __await_execution(struct i915_request *rq,
> struct i915_request *signal,
> -   void (*hook)(struct i915_request *rq,
> -struct dma_fence *signal),
> gfp_t gfp)
>  {
>   struct execute_cb *cb;
>  
> - if (i915_request_is_active(signal)) {
> - if (hook)
> - hook(rq, &signal->fence);
> + if (i915_request_is_active(signal))
>   return 0;
> - }
>  
>   cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
>   if (!cb)
> @@ -537,12 +520,6 @@ __await_execution(struct i915_request *rq,
>   i915_sw_fence_await(cb->fence);
>   init_irq_work(&cb->work, irq_execute_cb);
>  
> - if (hook) {
> - cb->hook = hook;
> - cb->signal = i915_request_get(signal);
> - cb->work.func = irq_execute_cb_hook;
> - }
> -
>   /*
>* Register the callback first, then see if the signaler is already
>* active. This ensures that if we race with the
> @@ -1253,7 +1230,7 @@ emit_semaphore_wait(struct i915_request *to,
>   goto await_fence;
>  
>   /* Only submit our spinner after the signaler is running! */
> - if (__await_execution(to, from, NULL, gfp))
> + if (__await_execution(to, from, gfp))
>   goto await_fence;
>  
>   if (__emit_semaphore_wait(to, from, from->fence.seqno))
> @@ -1284,16 +1261,14 @@ static int intel_timeline_sync_set_start(struct 
> intel_timeline *tl,
>  
>  static int
>  __i915_request_await_execution(struct i915_request *to,
> -struct i915_request *from,
> -void (*hook)(struct i915_request *rq,
> - struct dma_fence *signal))
> +struct i915_request *from)
>  {
>   int err;
>  
>   GEM_BUG_ON(intel_context_is_barrier(from->context));
>  
>   /* Submit both requests at the same time */
> - err = __await_execution(to, from, hook, I915_FENCE_GFP);
> + err = __await_execution(to, from, I915_FENCE_GFP);
>   if (err)
>   return err;
>  
> @@ -1406,9 +1381,7 @@ i915_request_await_external(struct i915_request *rq, 
> struct dma_fence *fence)
>  
>  int
>  i915_request_await_execution(struct i915_request *rq,
> -  str

Re: [PATCH 10/27] drm/i915/gem: Remove engine auto-magic with FENCE_SUBMIT

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:31AM -0500, Jason Ekstrand wrote:
> Even though FENCE_SUBMIT is only documented to wait until the request in
> the in-fence starts instead of waiting until it completes, it has a bit
> more magic than that.  If FENCE_SUBMIT is used to submit something to a
> balanced engine, we would wait to assign engines until the primary
> request was ready to start and then attempt to assign it to a different
> engine than the primary.  There is an IGT test which exercises this by
> submitting a primary batch to a specific VCS and then using FENCE_SUBMIT
> to submit a secondary which can run on any VCS and have i915 figure out
> which VCS to run it on such that they can run in parallel.
> 
> However, this functionality has never been used in the real world.  The
> media driver (the only user of FENCE_SUBMIT) always picks exactly two
> physical engines to bond and never asks us to pick which to use.

Maybe reference the specific igt you're breaking (and removing in the igt
series to match this) here. Just for the record and all that.
-Daniel

> 
> Signed-off-by: Jason Ekstrand 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c  |  2 +-
>  drivers/gpu/drm/i915/gt/intel_engine_types.h|  7 ---
>  .../drm/i915/gt/intel_execlists_submission.c| 17 -
>  3 files changed, 1 insertion(+), 25 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index d640bba6ad9ab..efb2fa3522a42 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -3474,7 +3474,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   if (args->flags & I915_EXEC_FENCE_SUBMIT)
>   err = i915_request_await_execution(eb.request,
>  in_fence,
> -
> eb.engine->bond_execute);
> +NULL);
>   else
>   err = i915_request_await_dma_fence(eb.request,
>  in_fence);
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
> b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 883bafc449024..68cfe5080325c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -446,13 +446,6 @@ struct intel_engine_cs {
>*/
>   void(*submit_request)(struct i915_request *rq);
>  
> - /*
> -  * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> -  * request down to the bonded pairs.
> -  */
> - void(*bond_execute)(struct i915_request *rq,
> - struct dma_fence *signal);
> -
>   /*
>* Call when the priority on a request has changed and it and its
>* dependencies may need rescheduling. Note the request itself may
> diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c 
> b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> index 14378b28169b7..635d6d2494d26 100644
> --- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> +++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
> @@ -3547,22 +3547,6 @@ static void virtual_submit_request(struct i915_request 
> *rq)
>   spin_unlock_irqrestore(&ve->base.active.lock, flags);
>  }
>  
> -static void
> -virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> -{
> - intel_engine_mask_t allowed, exec;
> -
> - allowed = ~to_request(signal)->engine->mask;
> -
> - /* Restrict the bonded request to run on only the available engines */
> - exec = READ_ONCE(rq->execution_mask);
> - while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed))
> - ;
> -
> - /* Prevent the master from being re-run on the bonded engines */
> - to_request(signal)->execution_mask &= ~allowed;
> -}
> -
>  struct intel_context *
>  intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>  unsigned int count)
> @@ -3616,7 +3600,6 @@ intel_execlists_create_virtual(struct intel_engine_cs 
> **siblings,
>  
>   ve->base.schedule = i915_schedule;
>   ve->base.submit_request = virtual_submit_request;
> - ve->base.bond_execute = virtual_bond_execute;
>  
>   INIT_LIST_HEAD(virtual_queue(ve));
>   ve->base.execlists.queue_priority_hint = INT_MIN;
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 15/27] drm/i915: Add gem/i915_gem_context.h to the docs

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:36AM -0500, Jason Ekstrand wrote:
> In order to prevent kernel doc warnings, also fill out docs for any
> missing fields and fix those that forgot the "@".
> 
> Signed-off-by: Jason Ekstrand 
> ---
>  Documentation/gpu/i915.rst|  2 +
>  .../gpu/drm/i915/gem/i915_gem_context_types.h | 43 ---
>  2 files changed, 38 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
> index 486c720f38907..0529e5183982e 100644
> --- a/Documentation/gpu/i915.rst
> +++ b/Documentation/gpu/i915.rst
> @@ -422,6 +422,8 @@ Batchbuffer Parsing
>  User Batchbuffer Execution
>  --
>  
> +.. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_context_types.h
> +
>  .. kernel-doc:: drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> :doc: User command execution
>  
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h 
> b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
> index df76767f0c41b..5f0673a2129f9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
> @@ -30,19 +30,39 @@ struct i915_address_space;
>  struct intel_timeline;
>  struct intel_ring;
>  
> +/**
> + * struct i915_gem_engines - A set of engines
> + */
>  struct i915_gem_engines {
>   union {
> + /** @link: Link in i915_gem_context::stale::engines */
>   struct list_head link;
> +
> + /** @rcu: RCU to use when freeing */
>   struct rcu_head rcu;
>   };
> +
> + /** @fence: Fence used for delayed destruction of engines */
>   struct i915_sw_fence fence;

I got derailed a bit appreciating the lifetime complexity here as
expressed in the callbacks for this. I hope this all simplifies?

Anyway patch looks good, or at least better than what we had.

Reviewed-by: Daniel Vetter 

> +
> + /** @ctx: i915_gem_context backpointer */
>   struct i915_gem_context *ctx;
> +
> + /** @num_engines: Number of engines in this set */
>   unsigned int num_engines;
> +
> + /** @engines: Array of engines */
>   struct intel_context *engines[];
>  };
>  
> +/**
> + * struct i915_gem_engines_iter - Iterator for an i915_gem_engines set
> + */
>  struct i915_gem_engines_iter {
> + /** @idx: Index into i915_gem_engines::engines */
>   unsigned int idx;
> +
> + /** @engines: Engine set being iterated */
>   const struct i915_gem_engines *engines;
>  };
>  
> @@ -53,10 +73,10 @@ struct i915_gem_engines_iter {
>   * logical hardware state for a particular client.
>   */
>  struct i915_gem_context {
> - /** i915: i915 device backpointer */
> + /** @i915: i915 device backpointer */
>   struct drm_i915_private *i915;
>  
> - /** file_priv: owning file descriptor */
> + /** @file_priv: owning file descriptor */
>   struct drm_i915_file_private *file_priv;
>  
>   /**
> @@ -81,7 +101,9 @@ struct i915_gem_context {
>* CONTEXT_USER_ENGINES flag is set).
>*/
>   struct i915_gem_engines __rcu *engines;
> - struct mutex engines_mutex; /* guards writes to engines */
> +
> + /** @engines_mutex: guards writes to engines */
> + struct mutex engines_mutex;
>  
>   /**
>* @syncobj: Shared timeline syncobj
> @@ -118,7 +140,7 @@ struct i915_gem_context {
>*/
>   struct pid *pid;
>  
> - /** link: place with &drm_i915_private.context_list */
> + /** @link: place with &drm_i915_private.context_list */
>   struct list_head link;
>  
>   /**
> @@ -153,11 +175,13 @@ struct i915_gem_context {
>  #define CONTEXT_CLOSED   0
>  #define CONTEXT_USER_ENGINES 1
>  
> + /** @mutex: guards everything that isn't engines or handles_vma */
>   struct mutex mutex;
>  
> + /** @sched: scheduler parameters */
>   struct i915_sched_attr sched;
>  
> - /** guilty_count: How many times this context has caused a GPU hang. */
> + /** @guilty_count: How many times this context has caused a GPU hang. */
>   atomic_t guilty_count;
>   /**
>* @active_count: How many times this context was active during a GPU
> @@ -171,15 +195,17 @@ struct i915_gem_context {
>   unsigned long hang_timestamp[2];
>  #define CONTEXT_FAST_HANG_JIFFIES (120 * HZ) /* 3 hangs within 120s? Banned! 
> */
>  
> - /** remap_slice: Bitmask of cache lines that need remapping */
> + /** @remap_slice: Bitmask of cache lines that need remapping */
>   u8 remap_slice;
>  
>   /**
> -  * handles_vma: rbtree to look up our context specific obj/vma for
> +  * @handles_vma: rbtree to look up our context specific obj/vma for
>* the user handle. (user handles are per fd, but the binding is
>* per vm, which may be one per context or shared with the global GTT)
>*/
>   struct radix_tree_root handles_vma;
> +
> + /** @lut_mutex: Locks handles_vma */

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Christian König

Am 04.05.21 um 10:27 schrieb Daniel Vetter:

On Tue, May 4, 2021 at 10:09 AM Christian König
 wrote:

Am 04.05.21 um 09:32 schrieb Daniel Vetter:

On Tue, May 04, 2021 at 09:01:23AM +0200, Christian König wrote:

Unfortunately as I pointed out to Daniel as well this won't work 100%
reliable either.

You're claiming this, but there's no clear reason why really, and you
didn't reply to my last mail on that sub-thread, so I really don't get
where exactly you're seeing a problem.

Yeah, it's rather hard to explain without pointing out how the hardware
works in detail.


See the signal on the ring buffer needs to be protected by manipulation from
userspace so that we can guarantee that the hardware really has finished
executing when it fires.

Nope you don't. Userspace is already allowed to submit all kinds of random
garbage, the only thing the kernel has to guarantee is:
- the dma-fence DAG stays a DAG
- dma-fence completes in finite time

Everything else is not the kernel's problem, and if userspace mixes stuff
up like manipulates the seqno, that's ok. It can do that kind of garbage
already.


Protecting memory by immediate page table updates is a good first step, but
unfortunately not sufficient (and we would need to restructure large parts
of the driver to make this happen).

This is why you need the unload-fence on top, because indeed you can't
just rely on the fences created from the userspace ring, those are
unreliable for memory management.

And exactly that's the problem! We can't provide a reliable unload-fence
and the user fences are unreliable for that.

I've talked this through at length with our hardware/firmware guy last
Thursday but couldn't find a solution either.

We can have a preemption fence for the kernel which says: Hey this queue
was scheduled away you can touch its hardware descriptor, control
registers, page tables, TLB, memory, GWS, GDS, OA etc etc etc... again.
But that one is only triggered on preemption and then we have the same
ordering problems once more.

Or we can have an end-of-operation fence for userspace which says: Hey
this queue has finished its batch of execution, but this one is
manipulable from userspace to either finish too early (very very bad for
invalidations and memory management) or too late/never (deadlock
prone but fixable by timeout).

What we could do is to use the preemption fence to emulate the unload
fence, e.g. something like:
1. Preempt the queue in fixed intervals (let's say 100ms).
2. While preempted check if we have reached the checkpoint in question
by looking at the hardware descriptor.
3. If we have reached the checkpoint signal the unload fence.
4. If we haven't reached the checkpoint resume the queue again.

The problem is that this might introduce a maximum of 100ms delay before
signaling the unload fence and preempt/resume has such a hefty overhead
that we waste a horrible amount of time on it.

So your hw can preempt? That's good enough.

The unload fence is just
1. wait for all dma_fence that are based on the userspace ring. This
is unreliable, but we don't care because tdr will make it reliable.
And once tdr shot down a context we'll force-unload and thrash it
completely, which solves the problem.
2. preempt the context, which /should/ now be stuck waiting for more
commands to be stuffed into the ringbuffer. Which means your
preemption is hopefully fast enough to not matter. If your hw takes
forever to preempt an idle ring, I can't help you :-)


Yeah, it just takes too long for the preemption to complete to be really 
useful for the feature we are discussing here.


As I said when the kernel requests to preempt a queue we can easily 
expect a timeout of ~100ms until that comes back. For compute that is 
even in the multiple seconds range.


The "preemption" feature is really called suspend and made just for the 
case when we want to put a process to sleep or need to forcefully kill 
it for misbehavior or stuff like that. It is not meant to be used in 
normal operation.


If we only attach it on ->move then yeah maybe a last resort possibility 
to do it this way, but I think in that case we could rather stick with 
kernel submissions.



Also, if userspace lies to us and keeps pushing crap into the ring
after it's supposed to be idle: Userspace is already allowed to waste
gpu time. If you're too worried about this set a fairly aggressive
preempt timeout on the unload fence, and kill the context if it takes
longer than what preempting an idle ring should take (because that
would indicate broken/evil userspace).


I think you have the wrong expectation here. It is perfectly valid and 
expected for userspace to keep writing commands into the ring buffer.


After all when one frame is completed they want to immediately start 
rendering the next one.



Again, I'm not seeing the problem. Except if your hw is really
completely busted to the point where it can't even support userspace
ringbuffers properly and with sufficient performance :-P

Of co

Re: [PATCH] drm/ttm: fix warning in new sys man

2021-05-04 Thread Matthew Auld

On 03/05/2021 15:27, Christian König wrote:

Include the header for the prototype.

Signed-off-by: Christian König 
Reported-by: kernel test robot 

Reviewed-by: Matthew Auld 


Re: [PATCH 2/4] Add missing check

2021-05-04 Thread Ville Syrjälä
On Mon, May 03, 2021 at 08:21:46PM +0200, Werner Sembach wrote:
> Add a missing check that could otherwise allow an unachievable mode to be
> validated.
> 
> Signed-off-by: Werner Sembach 
> ---
> 
> >From 54fa706f0a5f260a32af5d18b9622ceebb94c12e Mon Sep 17 00:00:00 2001
> From: Werner Sembach 
> Date: Mon, 3 May 2021 14:42:36 +0200
> Subject: [PATCH 2/4] Add missing check

I guess you did something a bit wonky with git format-patch/send-mail?

> 
> ---
>  drivers/gpu/drm/i915/display/intel_hdmi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c 
> b/drivers/gpu/drm/i915/display/intel_hdmi.c
> index 576d3d910d06..ce165ef28e88 100644
> --- a/drivers/gpu/drm/i915/display/intel_hdmi.c
> +++ b/drivers/gpu/drm/i915/display/intel_hdmi.c
> @@ -1913,7 +1913,7 @@ intel_hdmi_mode_valid(struct drm_connector *connector,
>   clock *= 2;
>   }
>  
> - if (drm_mode_is_420_only(&connector->display_info, mode))
> + if (connector->ycbcr_420_allowed && 
> drm_mode_is_420_only(&connector->display_info, mode))

This one shouldn't be necessary. drm_mode_validate_ycbcr420() has
already checked it for us.

>   clock /= 2;
>  
>   status = intel_hdmi_mode_clock_valid(hdmi, clock, has_hdmi_sink);
> -- 
> 2.25.1

-- 
Ville Syrjälä
Intel


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Daniel Vetter
On Tue, May 04, 2021 at 11:14:06AM +0200, Christian König wrote:
> Am 04.05.21 um 10:27 schrieb Daniel Vetter:
> > On Tue, May 4, 2021 at 10:09 AM Christian König
> >  wrote:
> > > Am 04.05.21 um 09:32 schrieb Daniel Vetter:
> > > > On Tue, May 04, 2021 at 09:01:23AM +0200, Christian König wrote:
> > > > > Unfortunately as I pointed out to Daniel as well this won't work 100%
> > > > > reliable either.
> > > > You're claiming this, but there's no clear reason why really, and you
> > > > did't reply to my last mail on that sub-thread, so I really don't get
> > > > where exactly you're seeing a problem.
> > > Yeah, it's rather hard to explain without pointing out how the hardware
> > > works in detail.
> > > 
> > > > > See the signal on the ring buffer needs to be protected by 
> > > > > manipulation from
> > > > > userspace so that we can guarantee that the hardware really has 
> > > > > finished
> > > > > executing when it fires.
> > > > Nope you don't. Userspace is already allowed to submit all kinds of 
> > > > random
> > > > garbage, the only thing the kernel has to guarantee is:
> > > > - the dma-fence DAG stays a DAG
> > > > - dma-fence completes in finite time
> > > > 
> > > > Everything else is not the kernel's problem, and if userspace mixes 
> > > > stuff
> > > > up like manipulates the seqno, that's ok. It can do that kind of garbage
> > > > already.
> > > > 
> > > > > Protecting memory by immediate page table updates is a good first 
> > > > > step, but
> > > > > unfortunately not sufficient (and we would need to restructure large 
> > > > > parts
> > > > > of the driver to make this happen).
> > > > This is why you need the unload-fence on top, because indeed you can't
> > > > just rely on the fences created from the userspace ring, those are
> > > > unreliable for memory management.
> > > And exactly that's the problem! We can't provide a reliable unload-fence
> > > and the user fences are unreliable for that.
> > > 
> > > I've talked this through at length with our hardware/firmware guy last
> > > Thursday but couldn't find a solution either.
> > > 
> > > We can have a preemption fence for the kernel which says: Hey this queue
> > > was scheduled away you can touch its hardware descriptor, control
> > > registers, page tables, TLB, memory, GWS, GDS, OA etc etc etc... again.
> > > But that one is only triggered on preemption and then we have the same
> > > ordering problems once more.
> > > 
> > > Or we can have an end-of-operation fence for userspace which says: Hey
> > > this queue has finished its batch of execution, but this one is
> > > manipulable from userspace to either finish too early (very very bad for
> > > invalidations and memory management) or too late/never (deadlock
> > > prone but fixable by timeout).
> > > 
> > > What we could do is to use the preemption fence to emulate the unload
> > > fence, e.g. something like:
> > > 1. Preempt the queue in fixed intervals (let's say 100ms).
> > > 2. While preempted check if we have reached the checkpoint in question
> > > by looking at the hardware descriptor.
> > > 3. If we have reached the checkpoint signal the unload fence.
> > > 4. If we haven't reached the checkpoint resume the queue again.
> > > 
> > > The problem is that this might introduce a maximum of 100ms delay before
> > > signaling the unload fence and preempt/resume has such a hefty overhead
> > > that we waste a horrible amount of time on it.
> > So your hw can preempt? That's good enough.
> > 
> > The unload fence is just
> > 1. wait for all dma_fence that are based on the userspace ring. This
> > is unreliable, but we don't care because tdr will make it reliable.
> > And once tdr shot down a context we'll force-unload and thrash it
> > completely, which solves the problem.
> > 2. preempt the context, which /should/ now be stuck waiting for more
> > commands to be stuffed into the ringbuffer. Which means your
> > preemption is hopefully fast enough to not matter. If your hw takes
> > forever to preempt an idle ring, I can't help you :-)
> 
> Yeah, it just takes too long for the preemption to complete to be really
> useful for the feature we are discussing here.
> 
> As I said when the kernel requests to preempt a queue we can easily expect a
> timeout of ~100ms until that comes back. For compute that is even in the
> multiple seconds range.

100ms for preempting an idle request sounds like broken hw to me. Of
course preempting something that actually runs takes a while, that's
nothing new. But it's also not the thing we're talking about here. Is this
100ms an actual number from hw for an actually idle ringbuffer?

> The "preemption" feature is really called suspend and made just for the case
> when we want to put a process to sleep or need to forcefully kill it for
> misbehavior or stuff like that. It is not meant to be used in normal
> operation.
> 
> If we only attach it on ->move then yeah maybe a last resort possibility to
> do it this way, but I think in that case we co

Re: [PATCH 3/4] Restructure output format computation for better expandability

2021-05-04 Thread Ville Syrjälä
On Mon, May 03, 2021 at 08:21:47PM +0200, Werner Sembach wrote:
> Couples the decision between RGB and YCbCr420 mode with the check whether the port
> clock can achieve the required frequency. Other checks and configuration steps
> that were previously done in between can also be done before or after.
> 
> This allows for a cleaner implementation of retrying different color
> encodings.
> 
> Signed-off-by: Werner Sembach 
> ---
> 
> >From 57e42ec6e34ac32da29eb7bc3c691cbeb2534396 Mon Sep 17 00:00:00 2001
> From: Werner Sembach 
> Date: Mon, 3 May 2021 15:30:40 +0200
> Subject: [PATCH 3/4] Restructure output format computation for better
>  expandability
> 
> ---
>  drivers/gpu/drm/i915/display/intel_hdmi.c | 57 +++
>  1 file changed, 26 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c 
> b/drivers/gpu/drm/i915/display/intel_hdmi.c
> index ce165ef28e88..e2553ac6fd13 100644
> --- a/drivers/gpu/drm/i915/display/intel_hdmi.c
> +++ b/drivers/gpu/drm/i915/display/intel_hdmi.c
> @@ -1999,29 +1999,6 @@ static bool hdmi_deep_color_possible(const struct 
> intel_crtc_state *crtc_state,
> INTEL_OUTPUT_FORMAT_YCBCR420);
>  }
>  
> -static int
> -intel_hdmi_ycbcr420_config(struct intel_crtc_state *crtc_state,
> -const struct drm_connector_state *conn_state)
> -{
> - struct drm_connector *connector = conn_state->connector;
> - struct drm_i915_private *i915 = to_i915(connector->dev);
> - const struct drm_display_mode *adjusted_mode =
> - &crtc_state->hw.adjusted_mode;
> -
> - if (!drm_mode_is_420_only(&connector->display_info, adjusted_mode))
> - return 0;
> -
> - if (!connector->ycbcr_420_allowed) {
> - drm_err(&i915->drm,
> - "Platform doesn't support YCBCR420 output\n");
> - return -EINVAL;
> - }
> -
> - crtc_state->output_format = INTEL_OUTPUT_FORMAT_YCBCR420;
> -
> - return intel_pch_panel_fitting(crtc_state, conn_state);
> -}
> -
>  static int intel_hdmi_compute_bpc(struct intel_encoder *encoder,
> struct intel_crtc_state *crtc_state,
> int clock)
> @@ -2128,6 +2105,24 @@ static bool intel_hdmi_has_audio(struct intel_encoder 
> *encoder,
>   return intel_conn_state->force_audio == HDMI_AUDIO_ON;
>  }
>  
> +int intel_hdmi_compute_output_format(struct intel_encoder *encoder,
> +  struct intel_crtc_state *crtc_state,
> +  const struct drm_connector_state 
> *conn_state)
> +{
> + const struct drm_connector *connector = conn_state->connector;
> + const struct drm_display_mode *adjusted_mode = 
> &crtc_state->hw.adjusted_mode;
> + int ret;
> +
> + if (connector->ycbcr_420_allowed && 
> drm_mode_is_420_only(&connector->display_info, adjusted_mode))
> + crtc_state->output_format = INTEL_OUTPUT_FORMAT_YCBCR420;
> + else
> + crtc_state->output_format = INTEL_OUTPUT_FORMAT_RGB;

Slight change in behaviour here since we used to reject 420_only modes
if ycbcr_420_allowed wasn't set. But I think this should be OK, and in
fact I believe the DP counterpart code always used an RGB fallback
rather than failing. So this lines up better with that.

Needs at least a note in the commit message to indicate that
there is a functional change buried within. Though it would be
better to split this functional change into a separate prep patch.

> +
> + ret = intel_hdmi_compute_clock(encoder, crtc_state);
> +
> + return ret;
> +}
> +
>  int intel_hdmi_compute_config(struct intel_encoder *encoder,
> struct intel_crtc_state *pipe_config,
> struct drm_connector_state *conn_state)
> @@ -2152,23 +2147,23 @@ int intel_hdmi_compute_config(struct intel_encoder 
> *encoder,
>   if (adjusted_mode->flags & DRM_MODE_FLAG_DBLCLK)
>   pipe_config->pixel_multiplier = 2;
>  
> - ret = intel_hdmi_ycbcr420_config(pipe_config, conn_state);
> - if (ret)
> - return ret;
> -
> - pipe_config->limited_color_range =
> - intel_hdmi_limited_color_range(pipe_config, conn_state);
> -
>   if (HAS_PCH_SPLIT(dev_priv) && !HAS_DDI(dev_priv))
>   pipe_config->has_pch_encoder = true;
>  
>   pipe_config->has_audio =
>   intel_hdmi_has_audio(encoder, pipe_config, conn_state);
>  
> - ret = intel_hdmi_compute_clock(encoder, pipe_config);
> + ret = intel_hdmi_compute_output_format(encoder, pipe_config, 
> conn_state);
>   if (ret)
>   return ret;
>  
> + ret = intel_pch_panel_fitting(pipe_config, conn_state);
> + if (ret)
> + return ret;

We probably want to still wrap this call in a
if (crtc_state->output_format == INTEL_OUTPUT_FORMAT_YCBCR420) {...}

In theory calling intel_pch_panel_fi

Re: [PATCH 4/4] Use YCbCr420 as fallback when RGB fails

2021-05-04 Thread Ville Syrjälä
On Mon, May 03, 2021 at 08:21:48PM +0200, Werner Sembach wrote:
> When encoder validation of a display mode fails, retry with less bandwidth
> heavy YCbCr420 color mode, if available. This enables some HDMI 1.4 setups
> to support 4k60Hz output, which previously failed silently.
> 
> AMDGPU had nearly the exact same issue. This problem description is
> therefore copied from my commit message of the AMDGPU patch.
> 
> On some setups, while the monitor and the gpu support display modes with
> pixel clocks of up to 600MHz, the link encoder might not. This prevents
> YCbCr444 and RGB encoding for 4k60Hz, but YCbCr420 encoding might still be
> possible. However, which color mode is used is decided before the link
> encoder capabilities are checked. This patch fixes the problem by retrying
> to find a display mode with YCbCr420 enforced and using it, if it is
> valid.
> 
> Signed-off-by: Werner Sembach 
> ---
> 
> >From 4ea0c8839b47e846d46c613e38af475231994f0f Mon Sep 17 00:00:00 2001
> From: Werner Sembach 
> Date: Mon, 3 May 2021 16:23:17 +0200
> Subject: [PATCH 4/4] Use YCbCr420 as fallback when RGB fails
> 
> ---
>  drivers/gpu/drm/i915/display/intel_hdmi.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/display/intel_hdmi.c 
> b/drivers/gpu/drm/i915/display/intel_hdmi.c
> index e2553ac6fd13..20c800f2ed60 100644
> --- a/drivers/gpu/drm/i915/display/intel_hdmi.c
> +++ b/drivers/gpu/drm/i915/display/intel_hdmi.c
> @@ -1913,7 +1913,7 @@ intel_hdmi_mode_valid(struct drm_connector *connector,
>   clock *= 2;
>   }
>  
> - if (connector->ycbcr_420_allowed && 
> drm_mode_is_420_only(&connector->display_info, mode))
> + if (connector->ycbcr_420_allowed && 
> drm_mode_is_420(&connector->display_info, mode))
>   clock /= 2;

This is too early. We want to keep clock as is for checking whether RGB
output is possible with 420_also modes.

So the structure you had in your original patch was the correct way to
go about it. Which I think was something along the lines of:

if (420_only)
clock /= 2;

status = intel_hdmi_mode_clock_valid()
if (status != OK) {
if (420_only || !420_also || !420_allowed)
return status;

clock /= 2;
status = intel_hdmi_mode_clock_valid()
}
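Spelled out a bit more (same structure, illustrative only, not a tested patch):

	if (drm_mode_is_420_only(&connector->display_info, mode))
		clock /= 2;

	status = intel_hdmi_mode_clock_valid(hdmi, clock, has_hdmi_sink);
	if (status != MODE_OK) {
		if (drm_mode_is_420_only(&connector->display_info, mode) ||
		    !connector->ycbcr_420_allowed ||
		    !drm_mode_is_420_also(&connector->display_info, mode))
			return status;

		/* RGB didn't fit, retry with the halved YCbCr420 clock */
		clock /= 2;
		status = intel_hdmi_mode_clock_valid(hdmi, clock, has_hdmi_sink);
	}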


>  
>   status = intel_hdmi_mode_clock_valid(hdmi, clock, has_hdmi_sink);
> @@ -2119,6 +2119,14 @@ int intel_hdmi_compute_output_format(struct 
> intel_encoder *encoder,
>   crtc_state->output_format = INTEL_OUTPUT_FORMAT_RGB;
>  
>   ret = intel_hdmi_compute_clock(encoder, crtc_state);
> + if (ret) {
> + if (crtc_state->output_format != INTEL_OUTPUT_FORMAT_YCBCR420 ||
> + connector->ycbcr_420_allowed ||
> + drm_mode_is_420_also(&connector->display_info, 
> adjusted_mode)) {

That needs s/||/&&/ or we flip the conditions around to:

if (ret) {
if (output_format == 420 || !420_allowed || !420_also)
return ret;

output_format = 420;
...
}

which would have the benefit of avoiding the extra indent level.
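I.e. something along these lines (sketch only, reusing the names from the patch):

	ret = intel_hdmi_compute_clock(encoder, crtc_state);
	if (ret) {
		if (crtc_state->output_format == INTEL_OUTPUT_FORMAT_YCBCR420 ||
		    !connector->ycbcr_420_allowed ||
		    !drm_mode_is_420_also(&connector->display_info, adjusted_mode))
			return ret;

		/* RGB clock didn't fit, fall back to YCbCr420 and retry */
		crtc_state->output_format = INTEL_OUTPUT_FORMAT_YCBCR420;
		ret = intel_hdmi_compute_clock(encoder, crtc_state);
	}

	return ret;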

> + crtc_state->output_format = 
> INTEL_OUTPUT_FORMAT_YCBCR420;
> + ret = intel_hdmi_compute_clock(encoder, crtc_state);
> + }
> + }
>  
>   return ret;
>  }
> -- 
> 2.25.1

-- 
Ville Syrjälä
Intel


[PATCH] drm: Use drm_mode_is_420_only() instead of open coding it

2021-05-04 Thread Ville Syrjala
From: Ville Syrjälä 

Replace the open coded drm_mode_is_420_only() with the real thing.

No functional changes.

Cc: Werner Sembach 
Signed-off-by: Ville Syrjälä 
---
 drivers/gpu/drm/drm_modes.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_modes.c b/drivers/gpu/drm/drm_modes.c
index 33a93fa24eb1..12fcbb7ce179 100644
--- a/drivers/gpu/drm/drm_modes.c
+++ b/drivers/gpu/drm/drm_modes.c
@@ -1176,16 +1176,11 @@ enum drm_mode_status
 drm_mode_validate_ycbcr420(const struct drm_display_mode *mode,
   struct drm_connector *connector)
 {
-   u8 vic = drm_match_cea_mode(mode);
-   enum drm_mode_status status = MODE_OK;
-   struct drm_hdmi_info *hdmi = &connector->display_info.hdmi;
-
-   if (test_bit(vic, hdmi->y420_vdb_modes)) {
-   if (!connector->ycbcr_420_allowed)
-   status = MODE_NO_420;
-   }
+   if (!connector->ycbcr_420_allowed &&
+   drm_mode_is_420_only(&connector->display_info, mode))
+   return MODE_NO_420;
 
-   return status;
+   return MODE_OK;
 }
 EXPORT_SYMBOL(drm_mode_validate_ycbcr420);
 
-- 
2.26.3



Re: [Intel-gfx] [PATCH resend 2/2] drm/i915/display: Make vlv_find_free_pps() skip pipes which are in use for non DP purposes

2021-05-04 Thread Ville Syrjälä
On Wed, Mar 24, 2021 at 04:37:14PM +0200, Ville Syrjälä wrote:
> On Wed, Mar 24, 2021 at 03:10:59PM +0100, Hans de Goede wrote:
> > Hi,
> > 
> > On 3/24/21 3:02 PM, Ville Syrjälä wrote:
> > > On Tue, Mar 23, 2021 at 11:39:09AM +0100, Hans de Goede wrote:
> > >> Hi,
> > >>
> > >> On 3/2/21 3:51 PM, Ville Syrjälä wrote:
> > >>> On Tue, Mar 02, 2021 at 01:00:40PM +0100, Hans de Goede wrote:
> >  As explained by a long comment block, on VLV intel_setup_outputs()
> >  sometimes thinks there might be an eDP panel connected while there is 
> >  none.
> >  In this case intel_setup_outputs() will call intel_dp_init() to check.
> > 
> >  In this scenario vlv_find_free_pps() ends up selecting pipe A for the 
> >  pps,
> >  even though this might be in use for non DP purposes. When this is the 
> >  case
> >  then the assert_pipe() in vlv_force_pll_on() will fail when called from
> >  vlv_power_sequencer_kick().
> > >>>
> > >>> The idea is that you *can* select a PPS from a pipe used for a non-DP
> > >>> port since those don't care about the PPS stuff. So this doesn't seem
> > >>> correct.
> > >>
> > >> They may not care about the PPS stuff, but as the WARN / backtrace
> > >> shows if the DPLL_VCO_ENABLE bit is not already set for the pipe, while
> > >> the pipe is "otherwise" in use then vlv_force_pll_on() becomes unhappy
> > >> triggering the WARN.
> > >>
> > >>> a) I would like to see the VBT for this machine
> > >>
> > >> https://fedorapeople.org/~jwrdegoede/voyo-winpad-a15-vbt
> > >>
> > >>> b) I wonder if the DSI PLL is sufficient for getting the PPS going?
> > >>
> > >> I have no idea, I just noticed the WARN / backtrace and this seemed
> > >> like a reasonable way to deal with it. With that said I'm fine with 
> > >> fixing
> > >> this a different way.
> > >>
> > >>> c) If we do need the normal DPLL is there any harm to DSI in enabling 
> > >>> it?
> > >>
> > >> I would assume this increases power-consumption and DSI panels are almost
> > >> always used in battery powered devices.
> > > 
> > > This is just used while probing the panel, so power consumption is
> > > not a concern.
> > 
> > Sorry I misinterpreted what you wrote, I interpreted it as have the DSI
> > code enable it to avoid this problem. I see now that that is not what
> > you meant.
> > 
> > >> Also this would impact all BYT/CHT devices, possibly triggering unwanted
> > >> side-effects. Whereas the proposed fix below is much more narrowly
> > >> targeted at the problem. It might not be the most pretty fix but AFAICT
> > >> it has a low risk of causing regressions.
> > > 
> > > It rather significantly changes the logic of the workaround, potentially
> > > causing us to not find a free PPS at all. Eg. if you were to boot with
> > > a VLV with pipe A -> eDP B + eDP C inactive + pipe B -> VGA then your
> > > change would cause us to not find the free pipe B PPS for probing eDP C,
> > > and in the end we'd get a WARN and fall back to pipe A PPS which would
> > > clobber the actually in use pipe A PPS.
> > 
> > I would welcome, and will happily test, another fix for this. ATM we
> > have a WARN triggering on actual hardware (and not just in a hypothetical
> > example) and I would like to see that WARN fixed. If you can come up with
> > a better fix I would be happy to test.
> 
> Well, I think there are a couple of things we want to experiment with:
> 
> a) Just skip the asserts and see if enabling the DPLL/poking the PPS
>perturbs the DSI output in any way.
> 
> --- a/drivers/gpu/drm/i915/display/intel_dpll.c
> +++ b/drivers/gpu/drm/i915/display/intel_dpll.c
> @@ -1467,7 +1467,7 @@ void vlv_enable_pll(struct intel_crtc *crtc,
>   struct drm_i915_private *dev_priv = to_i915(crtc->base.dev);
>   enum pipe pipe = crtc->pipe;
>  
> - assert_pipe_disabled(dev_priv, pipe_config->cpu_transcoder);
> + //assert_pipe_disabled(dev_priv, pipe_config->cpu_transcoder);
>  
>   /* PLL is protected by panel, make sure we can write it */
>   assert_panel_unlocked(dev_priv, pipe);
> @@ -1800,7 +1800,7 @@ void vlv_disable_pll(struct drm_i915_private *dev_priv, 
> enum pipe pipe)
>   u32 val;
>  
>   /* Make sure the pipe isn't still relying on us */
> - assert_pipe_disabled(dev_priv, (enum transcoder)pipe);
> + //assert_pipe_disabled(dev_priv, (enum transcoder)pipe);
>  
>   val = DPLL_INTEGRATED_REF_CLK_VLV |
>   DPLL_REF_CLK_ENABLE_VLV | DPLL_VGA_MODE_DIS;
> --- a/drivers/gpu/drm/i915/display/intel_pps.c
> +++ b/drivers/gpu/drm/i915/display/intel_pps.c
> @@ -110,6 +110,8 @@ vlv_power_sequencer_kick(struct intel_dp *intel_dp)
>   intel_de_write(dev_priv, intel_dp->output_reg, DP & ~DP_PORT_EN);
>   intel_de_posting_read(dev_priv, intel_dp->output_reg);
>  
> + msleep(1000); // just to make sure we keep angering DSI for a bit longer
> +
>   if (!pll_enabled) {
>   vlv_force_pll_off(dev_priv, pipe);
>  

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Christian König

Am 04.05.21 um 11:47 schrieb Daniel Vetter:

[SNIP]

Yeah, it just takes too long for the preemption to complete to be really
useful for the feature we are discussing here.

As I said when the kernel requests to preempt a queue we can easily expect a
timeout of ~100ms until that comes back. For compute that is even in the
multiple seconds range.

100ms for preempting an idle request sounds like broken hw to me. Of
course preempting something that actually runs takes a while, that's
nothing new. But it's also not the thing we're talking about here. Is this
100ms actual numbers from hw for an actual idle ringbuffer?


Well 100ms is just an example of the scheduler granularity. Let me 
explain in a wider context.


The hardware can have X queues mapped at the same time and every Y time 
interval the hardware scheduler checks if those queues have changed and 
only if they have changed the necessary steps to reload them are started.


Multiple queues can be rendering at the same time, so you can have X as 
a high priority queue active and just waiting for a signal to start and 
the client rendering one frame after another and a third background 
compute task mining bitcoins for you.


As long as everything is static this is perfectly performant. Adding a 
queue to the list of active queues is also relatively simple, but taking 
one down requires you to wait until we are sure the hardware has seen 
the change and reloaded the queues.


Think of it as an RCU grace period. This is simply not something which 
is made to be used constantly, but rather just at process termination.
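
To make the analogy a bit more concrete, the teardown side looks conceptually like this (the names and the completion mechanism are made up purely for illustration, this is not the real firmware interface):

	/* Illustrative pseudo-code only. */
	static int unmap_user_queue(struct hw_sched *sched, struct user_queue *q)
	{
		/* Taking the queue out of the runlist is the cheap part. */
		list_del(&q->runlist_entry);
		submit_runlist_to_hw(sched);		/* made-up helper */

		/*
		 * Now wait until the hardware scheduler's next scan interval has
		 * passed and it has acknowledged the new runlist.  This is the
		 * "grace period" and can easily be in the ~100ms range, or in the
		 * seconds range for long running compute waves.
		 */
		if (!wait_for_completion_timeout(&sched->runlist_reloaded,
						 msecs_to_jiffies(1000)))
			return -ETIMEDOUT;

		return 0;
	}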



The "preemption" feature is really called suspend and made just for the case
when we want to put a process to sleep or need to forcefully kill it for
misbehavior or stuff like that. It is not meant to be used in normal
operation.

If we only attach it on ->move then yeah maybe a last resort possibility to
do it this way, but I think in that case we could rather stick with kernel
submissions.

Well this is a hybrid userspace ring + kernel augmented submit mode, so you
can keep dma-fences working. Because the dma-fence stuff won't work with
pure userspace submit, I think that conclusion is rather solid. Once more
even after this long thread here.


When assisted with unload fences, then yes. Problem is that I can't see 
how we could implement those performant currently.



Also, if userspace lies to us and keeps pushing crap into the ring
after it's supposed to be idle: Userspace is already allowed to waste
gpu time. If you're too worried about this set a fairly aggressive
preempt timeout on the unload fence, and kill the context if it takes
longer than what preempting an idle ring should take (because that
would indicate broken/evil userspace).

I think you have the wrong expectation here. It is perfectly valid and
expected for userspace to keep writing commands into the ring buffer.

After all when one frame is completed they want to immediately start
rendering the next one.

Sure, for the true userspace direct submit model. But with that you don't
get dma-fence, which means this gpu will not work for 3d accel on any
current linux desktop.


I'm not sure of that. I've looked a bit into how we could add user 
fences to dma_resv objects and that isn't that hard after all.



Which sucks, hence some hybrid model of using the userspace ring and
kernel augmented submit is needed. Which was my idea.


Yeah, I think when our firmware folks would really remove the kernel 
queue and we still don't have





[SNIP]
Can't find that of hand either, but see the amdgpu_noretry module option.

It basically tells the hardware if retry page faults should be supported or
not because this whole TLB shutdown thing when they are supported is
extremely costly.

Hm so synchronous tlb shootdown is a lot more costly when you allow
retrying of page faults?


Partially correct, yes.

See when you have retry page faults enabled and unmap something you need 
to make sure that everybody which could have potentially translated that 
page and has a TLB is either invalidated or waited until the access is 
completed.


Since every CU could be using that memory location, this takes ages to 
complete compared to the normal invalidation where you just invalidate 
the L1/L2 and are done.


Additional to that the recovery adds some extra overhead to every memory 
access, so even without a fault you are quite a bit slower if this is 
enabled.



That sounds bad, because for full hmm mode you need to be able to retry
pagefaults. Well at least the PASID/ATS/IOMMU side will do that, and might just
hang your gpu for a long time while it's waiting for the va->pa lookup
response to return. So retrying lookups shouldn't be any different really.

And you also need fairly fast synchronous tlb shootdown for hmm. So if
your hw has a problem with both together that sounds bad.


Completely agree. And since it was my job to validate the implementation 
on Vega10 I was also the first one to reali

Re: [led-backlight] default-brightness-level issue

2021-05-04 Thread pgeiem
‐‐‐ Original Message ‐‐‐
On Thursday, April 29, 2021 2:07 PM, Daniel Thompson 
 wrote:

> On Thu, Apr 29, 2021 at 11:31:20AM +, pgeiem wrote:
>
> > On Thursday, April 29, 2021 1:00 PM, Daniel Thompson 
> > daniel.thomp...@linaro.org wrote:
> >
> > > On Fri, Apr 23, 2021 at 01:04:23PM +, pgeiem wrote:
> > >
> > > > Dear all,
> > > > On a custom board I have a simple DPI panel. The panel's backlight is
> > > > driven with an I2C led driver (PCA9632). The led-backlight driver is used
> > > > to manage this as a backlight.
> > > > When using brightness-levels and default-brightness-level the
> > > > backlight stay turned-off even if manually trying to set a different
> > > > index value to brightness through sysfs.
> > > > I traced this issue as follow: When led_bl_update_status() is called
> > > > the brightness value is returned from backlight_get_brightness() which
> > > > call backlight_is_blank(). In my case backlight_is_blank() return true
> > > > due to props.power = FB_BLANK_POWERDOWN which is != FB_BLANK_UNBLANK.
> > > > I traced why at startup props.power is FB_BLANK_POWERDOWN and found
> > > > that in led_bl_probe() when a default brightness is set (>0)
> > > > props.power is set to FB_BLANK_POWERDOWN which seems incorrect to me.
> > > > I made the small change below and default brightness is correctly used
> > > > at startup. I am not really sure this is an issue and if my change is
> > > > correct or if I am doing something incorrect somewhere else. So I
> > > > first would like to get your opinion on this and if correct will send
> > > > a patch.
> > >
> > > Makes sense. Please send this as a patch.
> > > Daniel.
> >
> > I finally believe this was correct.
> > A link between the panel and the backlight was missing in dts. With this 
> > link the backlight is turned on (props.power = FB_BLANK_UNBLANK) from 
> > drm_panel_enable function after enabling the panel.
> > Let me know if you disagree and still believe I should send the patch.
>
> The original logic is definitely wrong.
>
> However that doesn't mean the driver will be fully correct after fixing,
> since it does look like it is missing the logic found in other similar
> drivers (pwm-bl.c and gpio_backlight.c) to handle the various different
> cases w.r.t. handover from a backlight left active by the bootloader.
>
> Daniel.

Ok, you are right, I will see if I can port the logic from the other backlight 
drivers to this one.
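
For reference, the probe-time change I described earlier was roughly this (a sketch only, the exact surrounding code in led_bl_probe() may differ a bit):

	/* In led_bl_probe(), when setting up the initial backlight properties: */
	memset(&props, 0, sizeof(struct backlight_properties));
	props.type = BACKLIGHT_TYPE_RAW;
	props.max_brightness = max_brightness;
	props.brightness = default_brightness;
	/* Start unblanked when a non-zero default brightness was given,
	 * instead of unconditionally powering the backlight down. */
	props.power = default_brightness ? FB_BLANK_UNBLANK : FB_BLANK_POWERDOWN;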

Patrick


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Daniel Vetter
On Tue, May 4, 2021 at 12:53 PM Christian König
 wrote:
>
> Am 04.05.21 um 11:47 schrieb Daniel Vetter:
> > [SNIP]
> >> Yeah, it just takes too long for the preemption to complete to be really
> >> useful for the feature we are discussing here.
> >>
> >> As I said when the kernel requests to preempt a queue we can easily expect 
> >> a
> >> timeout of ~100ms until that comes back. For compute that is even in the
> >> multiple seconds range.
> > 100ms for preempting an idle request sounds like broken hw to me. Of
> > course preempting something that actually runs takes a while, that's
> > nothing new. But it's also not the thing we're talking about here. Is this
> > 100ms actual numbers from hw for an actual idle ringbuffer?
>
> Well 100ms is just an example of the scheduler granularity. Let me
> explain in a wider context.
>
> The hardware can have X queues mapped at the same time and every Y time
> interval the hardware scheduler checks if those queues have changed and
> only if they have changed the necessary steps to reload them are started.
>
> Multiple queues can be rendering at the same time, so you can have X as
> a high priority queue active and just waiting for a signal to start and
> the client rendering one frame after another and a third background
> compute task mining bitcoins for you.
>
> As long as everything is static this is perfectly performant. Adding a
> queue to the list of active queues is also relatively simple, but taking
> one down requires you to wait until we are sure the hardware has seen
> the change and reloaded the queues.
>
> Think of it as an RCU grace period. This is simply not something which
> is made to be used constantly, but rather just at process termination.

Uh ... that indeed sounds rather broken.

Otoh it's just a dma_fence that we'd inject as this unload-fence. So
by and large everyone should already be able to cope with it taking a
bit longer. So from a design pov I don't see a huge problem, but I
guess you guys won't be happy since it means on amd hw there will be
random unsightly stalls in desktop linux usage.

> >> The "preemption" feature is really called suspend and made just for the 
> >> case
> >> when we want to put a process to sleep or need to forcefully kill it for
> >> misbehavior or stuff like that. It is not meant to be used in normal
> >> operation.
> >>
> >> If we only attach it on ->move then yeah maybe a last resort possibility to
> >> do it this way, but I think in that case we could rather stick with kernel
> >> submissions.
> > Well this is a hybrid userspace ring + kernel augmented submit mode, so you
> > can keep dma-fences working. Because the dma-fence stuff won't work with
> > pure userspace submit, I think that conclusion is rather solid. Once more
> > even after this long thread here.
>
> When assisted with unload fences, then yes. Problem is that I can't see
> how we could implement those performant currently.

Is there really no way to fix fw here? Like if process start/teardown
takes 100ms, that's going to suck no matter what.

> >>> Also, if userspace lies to us and keeps pushing crap into the ring
> >>> after it's supposed to be idle: Userspace is already allowed to waste
> >>> gpu time. If you're too worried about this set a fairly aggressive
> >>> preempt timeout on the unload fence, and kill the context if it takes
> >>> longer than what preempting an idle ring should take (because that
> >>> would indicate broken/evil userspace).
> >> I think you have the wrong expectation here. It is perfectly valid and
> >> expected for userspace to keep writing commands into the ring buffer.
> >>
> >> After all when one frame is completed they want to immediately start
> >> rendering the next one.
> > Sure, for the true userspace direct submit model. But with that you don't
> > get dma-fence, which means this gpu will not work for 3d accel on any
> > current linux desktop.
>
> I'm not sure of that. I've looked a bit into how we could add user
> fences to dma_resv objects and that isn't that hard after all.

I think as a proof of concept it's fine, but as an actual solution ...
pls no. Two reasons:
- implicit sync is bad
- this doesn't fix anything for explicit sync using dma_fence in terms
of sync_file or drm_syncobj.

So if we go with the route of papering over this in the kernel, then
it'll be a ton more work than just hacking something into dma_resv.

> > Which sucks, hence some hybrid model of using the userspace ring and
> > kernel augmented submit is needed. Which was my idea.
>
> Yeah, I think when our firmware folks would really remove the kernel
> queue and we still don't have

Yeah I think kernel queue can be removed. But the price is that you
need reasonable fast preempt of idle contexts.

I really can't understand how this can take multiple ms, something
feels very broken in the design of the fw (since obviously the hw can
preempt an idle context to another one pretty fast, or you'd render
any multi-client desktop as a slideshow).

Re: [PATCH 3/9] drm/connector: Add drm_connector_find_by_fwnode() function (v2)

2021-05-04 Thread Hans de Goede
Hi,

On 5/4/21 10:00 AM, Andy Shevchenko wrote:
> 
> 
> On Monday, May 3, 2021, Hans de Goede  > wrote:
> 
> Add a function to find a connector based on a fwnode.
> 
> This will be used by the new drm_connector_oob_hotplug_event()
> function which is added by the next patch in this patch-set.
> 
> Changes in v2:
> - Complete rewrite to use a global connector list in drm_connector.c
>   rather than using a class-dev-iter in drm_sysfs.c
> 
> Signed-off-by: Hans de Goede  >
> ---
>  drivers/gpu/drm/drm_connector.c     | 50 +
>  drivers/gpu/drm/drm_crtc_internal.h |  1 +
>  include/drm/drm_connector.h         |  8 +
>  3 files changed, 59 insertions(+)
> 
> diff --git a/drivers/gpu/drm/drm_connector.c 
> b/drivers/gpu/drm/drm_connector.c
> index 87c68563e6c3..ef759d6add81 100644
> --- a/drivers/gpu/drm/drm_connector.c
> +++ b/drivers/gpu/drm/drm_connector.c
> @@ -66,6 +66,14 @@
>   * support can instead use e.g. drm_helper_hpd_irq_event().
>   */
> 
> +/*
> + * Global connector list for drm_connector_find_by_fwnode().
> + * Note drm_connector_[un]register() first take connector->lock and then
> + * take the connector_list_lock.
> + */
> +static DEFINE_MUTEX(connector_list_lock);
> +static LIST_HEAD(connector_list);
> +
>  struct drm_conn_prop_enum_list {
>         int type;
>         const char *name;
> @@ -267,6 +275,7 @@ int drm_connector_init(struct drm_device *dev,
>                 goto out_put_type_id;
>         }
> 
> +       INIT_LIST_HEAD(&connector->global_connector_list_entry);
>         INIT_LIST_HEAD(&connector->probed_modes);
>         INIT_LIST_HEAD(&connector->modes);
>         mutex_init(&connector->mutex);
> @@ -540,6 +549,9 @@ int drm_connector_register(struct drm_connector 
> *connector)
>                 
> drm_privacy_screen_register_notifier(connector->privacy_screen,
>                                            
> &connector->privacy_screen_notifier);
> 
> +       mutex_lock(&connector_list_lock);
> +       list_add_tail(&connector->global_connector_list_entry, 
> &connector_list);
> +       mutex_unlock(&connector_list_lock);
>         goto unlock;
> 
>  err_debugfs:
> @@ -568,6 +580,10 @@ void drm_connector_unregister(struct drm_connector 
> *connector)
>                 return;
>         }
> 
> +       mutex_lock(&connector_list_lock);
> +       list_del_init(&connector->global_connector_list_entry);
> +       mutex_unlock(&connector_list_lock);
> +
>         if (connector->privacy_screen)
>                 drm_privacy_screen_unregister_notifier(
>                                         connector->privacy_screen,
> @@ -2676,6 +2692,40 @@ int drm_mode_getconnector(struct drm_device *dev, 
> void *data,
>         return ret;
>  }
> 
> +/**
> + * drm_connector_find_by_fwnode - Find a connector based on the 
> associated fwnode
> + * @fwnode: fwnode for which to find the matching drm_connector
> + *
> + * This functions looks up a drm_connector based on its associated 
> fwnode. When
> + * a connector is found a reference to the connector is returned. The 
> caller must
> + * call drm_connector_put() to release this reference when it is done 
> with the
> + * connector.
> + *
> + * Returns: A reference to the found connector or an ERR_PTR().
> + */
> +struct drm_connector *drm_connector_find_by_fwnode(struct fwnode_handle 
> *fwnode)
> +{
> +       struct drm_connector *connector, *found = ERR_PTR(-ENODEV);
> +
> +       if (!fwnode)
> +               return ERR_PTR(-ENODEV);
> +
> +       mutex_lock(&connector_list_lock);
> +
> +       list_for_each_entry(connector, &connector_list, 
> global_connector_list_entry) {
> +               if (connector->fwnode == fwnode ||
> +                   (connector->fwnode && connector->fwnode->secondary == 
> fwnode)) {
> +                       drm_connector_get(connector);
> +                       found = connector;
> +                       break;
> +               }
> +       }
> +
> +       mutex_unlock(&connector_list_lock);
> +
> +       return found;
> 
> 
> 
> If I am not mistaken you can replace this with
> 
> return list_entry_is_head();
> 
> call and remove additional Boolean variable.

Found is not a boolean, it is a pointer to the found connector (or 
ERR_PTR(-ENODEV)).

Regards,

Hans


>  
> 
> +}
> +
> 
>  /**
>   * DOC: Tile group
> diff --git a/drivers/gpu/drm/drm_crtc_internal.h 
> b/drivers/gpu/drm/drm_crtc_internal.h
> index 54d4cf1233e9..6e28fc00a740 100644
> --- a/drivers/gpu/drm/drm_crtc_internal.h
> +++ b/drivers/gpu/drm/drm_crtc_internal.h
> @@ -1

Re: [RFC] CRIU support for ROCm

2021-05-04 Thread Adrian Reber
On Mon, May 03, 2021 at 02:21:53PM -0400, Felix Kuehling wrote:
> Am 2021-05-01 um 1:03 p.m. schrieb Adrian Reber:
> > On Fri, Apr 30, 2021 at 09:57:45PM -0400, Felix Kuehling wrote:
> >> We have been working on a prototype supporting CRIU (Checkpoint/Restore
> >> In Userspace) for accelerated compute applications running on AMD GPUs
> >> using ROCm (Radeon Open Compute Platform). We're happy to finally share
> >> this work publicly to solicit feedback and advice. The end-goal is to
> >> get this work included upstream in Linux and CRIU. A short whitepaper
> >> describing our design and intention can be found on Github:
> >> https://github.com/RadeonOpenCompute/criu/tree/criu-dev/test/others/ext-kfd/README.md
> >>
> >> We have RFC patch series for the kernel (based on Alex Deucher's
> >> amd-staging-drm-next branch) and for CRIU including a new plugin and a
> >> few core CRIU changes. I will send those to the respective mailing lists
> >> separately in a minute. They can also be found on Github.
> >>
> >> CRIU+plugin: https://github.com/RadeonOpenCompute/criu/commits/criu-dev
> >> Kernel (KFD):
> >> 
> >> https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/commits/fxkamd/criu-wip
> >>
> >> At this point this is very much a work in progress and not ready for
> >> upstream inclusion. There are still several missing features, known
> >> issues, and open questions that we would like to start addressing with
> >> your feedback.
> >>
> >> What's working and tested at this point:
> >>
> >>   * Checkpoint and restore accelerated machine learning apps: PyTorch
> >> running Bert on systems with 1 or 2 GPUs (MI50 or MI100), 100%
> >> unmodified user mode stack
> >>   * Checkpoint on one system, restore on a different system
> >>   * Checkpoint on one GPU, restore on a different GPU
> > This is very impressive. As far as I know this is the first larger
> > plugin written for CRIU and publicly published. It is also the first GPU
> > supported and people have been asking this for many years. It is in fact
> > the first hardware device supported through a plugin.
> >
> >> Major Known issues:
> >>
> >>   * The KFD ioctl API is not final: Needs a complete redesign to allow
> >> future extension without breaking the ABI
> >>   * Very slow: Need to implement DMA to dump VRAM contents
> >>
> >> Missing or incomplete features:
> >>
> >>   * Support for the new KFD SVM API
> >>   * Check device topology during restore
> >>   * Checkpoint and restore multiple processes
> >>   * Support for applications using Mesa for video decode/encode
> >>   * Testing with more different GPUs and workloads
> >>
> >> Big Open questions:
> >>
> >>   * What's the preferred way to publish our CRIU plugin? In-tree or
> >> out-of-tree?
> > I would do it in-tree.
> >
> >>   * What's the preferred way to distribute our CRIU plugin? Source?
> >> Binary .so? Whole CRIU? Just in-box support?
> > As you are planning to publish the source I would make it part of the
> > CRIU repository and this way it will find its way to the packages in the
> > different distributions.
> 
> Thanks. These are the answers I was hoping for.
> 
> 
> >
> > Does the plugin require any additional dependencies? If there is no
> > additional dependency to a library the plugin can be easily be part of
> > the existing packages.
> 
> The DMA solution we're considering for saving VRAM contents would add a
> dependency on libdrm and libdrm-amdgpu.

For the CRIU packages I am maintaining I would probably put the plugin
in a sub-package so that not all users of the CRIU package have to
install the mentioned libraries.

> >>   * If our plugin can be upstreamed in the CRIU tree, what would be the
> >> right directory?
> > I would just put it into criu/plugins/
> 
> Sounds good.
> 
> >
> > It would also be good to have your patchset submitted as a PR on github
> > to have our normal CI test coverage of the changes.
> 
> We'll probably have to recreate our repository to start as a fork of the
> upstream CRIU repository, so that we can easily send pull-requests.
> We're not going to be ready for upstreaming for a few more months,
> probably. Do you want to get occasional pull requests anyway, just to
> run CI on our work-in-progress code?

If you run it early through our CI it might make it easier for you to
see what it might break. Also, if your patches include fixes which are
not directly related to your plugin, it might make sense to submit those
patches earlier to reduce the size of the final patch. But this is up to
you.

Adrian


Re: [PATCH 3/9] drm/connector: Add drm_connector_find_by_fwnode() function (v2)

2021-05-04 Thread Andy Shevchenko
On Tue, May 4, 2021 at 2:53 PM Hans de Goede  wrote:
> On 5/4/21 10:00 AM, Andy Shevchenko wrote:
> > On Monday, May 3, 2021, Hans de Goede  > > wrote:

...

> > +struct drm_connector *drm_connector_find_by_fwnode(struct 
> > fwnode_handle *fwnode)
> > +{
> > +   struct drm_connector *connector, *found = ERR_PTR(-ENODEV);
> > +
> > +   if (!fwnode)
> > +   return ERR_PTR(-ENODEV);
> > +
> > +   mutex_lock(&connector_list_lock);
> > +
> > +   list_for_each_entry(connector, &connector_list, 
> > global_connector_list_entry) {
> > +   if (connector->fwnode == fwnode ||
> > +   (connector->fwnode && connector->fwnode->secondary 
> > == fwnode)) {
> > +   drm_connector_get(connector);
> > +   found = connector;
> > +   break;
> > +   }
> > +   }
> > +
> > +   mutex_unlock(&connector_list_lock);
> > +
> > +   return found;
> >
> > If I am not mistaken you can replace this with
> >
> > return list_entry_is_head();
> >
> > call and remove additional Boolean variable.
>
> Found is not a boolean, it is a pointer to the found connector (or 
> ERR_PTR(-ENODEV)).

Ah, perhaps give it a better name? `match`?

And to the initial topic, it's either an additional variable or
additional branch in this case. I think additional branch (taking into
account the length of the line or amount of lines) doesn't buy us
anything.

> > +}
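
For reference, the pattern I mean looks roughly like this (just a sketch; whether it actually reads better than the extra pointer is exactly the trade-off above):

	struct drm_connector *connector;

	mutex_lock(&connector_list_lock);
	list_for_each_entry(connector, &connector_list, global_connector_list_entry) {
		if (connector->fwnode == fwnode ||
		    (connector->fwnode && connector->fwnode->secondary == fwnode)) {
			drm_connector_get(connector);
			break;
		}
	}
	mutex_unlock(&connector_list_lock);

	/* The cursor points at the list head iff nothing matched. */
	if (list_entry_is_head(connector, &connector_list, global_connector_list_entry))
		return ERR_PTR(-ENODEV);

	return connector;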

-- 
With Best Regards,
Andy Shevchenko


Re: [PATCH] drm: Use drm_mode_is_420_only() instead of open coding it

2021-05-04 Thread Jani Nikula
On Tue, 04 May 2021, Ville Syrjala  wrote:
> From: Ville Syrjälä 
>
> Replace the open coded drm_mode_is_420_only() with the real thing.
>
> No functional changes.
>
> Cc: Werner Sembach 
> Signed-off-by: Ville Syrjälä 

Reviewed-by: Jani Nikula 

> ---
>  drivers/gpu/drm/drm_modes.c | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_modes.c b/drivers/gpu/drm/drm_modes.c
> index 33a93fa24eb1..12fcbb7ce179 100644
> --- a/drivers/gpu/drm/drm_modes.c
> +++ b/drivers/gpu/drm/drm_modes.c
> @@ -1176,16 +1176,11 @@ enum drm_mode_status
>  drm_mode_validate_ycbcr420(const struct drm_display_mode *mode,
>  struct drm_connector *connector)
>  {
> - u8 vic = drm_match_cea_mode(mode);
> - enum drm_mode_status status = MODE_OK;
> - struct drm_hdmi_info *hdmi = &connector->display_info.hdmi;
> -
> - if (test_bit(vic, hdmi->y420_vdb_modes)) {
> - if (!connector->ycbcr_420_allowed)
> - status = MODE_NO_420;
> - }
> + if (!connector->ycbcr_420_allowed &&
> + drm_mode_is_420_only(&connector->display_info, mode))
> + return MODE_NO_420;
>  
> - return status;
> + return MODE_OK;
>  }
>  EXPORT_SYMBOL(drm_mode_validate_ycbcr420);

-- 
Jani Nikula, Intel Open Source Graphics Center


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Christian König

Am 04.05.21 um 13:13 schrieb Daniel Vetter:

On Tue, May 4, 2021 at 12:53 PM Christian König
 wrote:

Am 04.05.21 um 11:47 schrieb Daniel Vetter:

[SNIP]

Yeah, it just takes too long for the preemption to complete to be really
useful for the feature we are discussing here.

As I said when the kernel requests to preempt a queue we can easily expect a
timeout of ~100ms until that comes back. For compute that is even in the
multiple seconds range.

100ms for preempting an idle request sounds like broken hw to me. Of
course preempting something that actually runs takes a while, that's
nothing new. But it's also not the thing we're talking about here. Is this
100ms actual numbers from hw for an actual idle ringbuffer?

Well 100ms is just an example of the scheduler granularity. Let me
explain in a wider context.

The hardware can have X queues mapped at the same time and every Y time
interval the hardware scheduler checks if those queues have changed and
only if they have changed the necessary steps to reload them are started.

Multiple queues can be rendering at the same time, so you can have X as
a high priority queue active and just waiting for a signal to start and
the client rendering one frame after another and a third background
compute task mining bitcoins for you.

As long as everything is static this is perfectly performant. Adding a
queue to the list of active queues is also relatively simple, but taking
one down requires you to wait until we are sure the hardware has seen
the change and reloaded the queues.

Think of it as an RCU grace period. This is simply not something which
is made to be used constantly, but rather just at process termination.

Uh ... that indeed sounds rather broken.


Well I wouldn't call it broken. It's just not made for the use case we 
are trying to abuse it for.



Otoh it's just a dma_fence that we'd inject as this unload-fence.


Yeah, exactly that's why it isn't much of a problem for process 
termination or freeing memory.



So by and large everyone should already be able to cope with it taking a
bit longer. So from a design pov I don't see a huge problem, but I
guess you guys won't be happy since it means on amd hw there will be
random unsightly stalls in desktop linux usage.


The "preemption" feature is really called suspend and made just for the case
when we want to put a process to sleep or need to forcefully kill it for
misbehavior or stuff like that. It is not meant to be used in normal
operation.

If we only attach it on ->move then yeah maybe a last resort possibility to
do it this way, but I think in that case we could rather stick with kernel
submissions.

Well this is a hybrid userspace ring + kernel augmented submit mode, so you
can keep dma-fences working. Because the dma-fence stuff won't work with
pure userspace submit, I think that conclusion is rather solid. Once more
even after this long thread here.

When assisted with unload fences, then yes. Problem is that I can't see
how we could implement those performant currently.

Is there really no way to fix fw here? Like if process start/teardown
takes 100ms, that's going to suck no matter what.


As I said adding the queue is unproblematic and teardown just results in 
a bit more waiting to free things up.


More problematic are overcommit/swapping and OOM situations, which need to 
wait for the hw scheduler to come back and tell us that the queue is now 
unmapped.



Also, if userspace lies to us and keeps pushing crap into the ring
after it's supposed to be idle: Userspace is already allowed to waste
gpu time. If you're too worried about this set a fairly aggressive
preempt timeout on the unload fence, and kill the context if it takes
longer than what preempting an idle ring should take (because that
would indicate broken/evil userspace).

I think you have the wrong expectation here. It is perfectly valid and
expected for userspace to keep writing commands into the ring buffer.

After all when one frame is completed they want to immediately start
rendering the next one.

Sure, for the true userspace direct submit model. But with that you don't
get dma-fence, which means this gpu will not work for 3d accel on any
current linux desktop.

I'm not sure of that. I've looked a bit into how we could add user
fences to dma_resv objects and that isn't that hard after all.

I think as a proof of concept it's fine, but as an actual solution ...
pls no. Two reasons:
- implicit sync is bad


Well can't disagree with that :) But I think we can't avoid supporting it.


- this doesn't fix anything for explicit sync using dma_fence in terms
of sync_file or drm_syncobj.


Exactly.

Whether we do implicit sync or explicit sync is orthogonal to the problem 
that sync must be made reliable somehow.


So when we sync and timeout the waiter should just continue, but whoever 
failed to signal will be punished.


But since this isn't solved on Windows I don't see how we can solve it 
on Linux either.



So if we go with the route of 

Re: [PATCH 4/9] drm/connector: Add support for out-of-band hotplug notification

2021-05-04 Thread Imre Deak
On Mon, May 03, 2021 at 11:00:20AM +0300, Heikki Krogerus wrote:
> Hi Hans,
> 
> On Wed, Apr 28, 2021 at 11:52:52PM +0200, Hans de Goede wrote:
> > +/**
> > + * struct drm_connector_oob_hotplug_event_data: OOB hotplug event data
> > + *
> > + * Contains data about out-of-band hotplug events, signalled through
> > + * drm_connector_oob_hotplug_event().
> > + */
> > +struct drm_connector_oob_hotplug_event_data {
> > +   /**
> > +* @connected: New connected status for the connector.
> > +*/
> > +   bool connected;
> > +   /**
> > +* @dp_lanes: Number of available displayport lanes, 0 if unknown.
> > +*/
> > +   int dp_lanes;
> > +   /**
> > +* @orientation: Connector orientation.
> > +*/
> > +   enum typec_orientation orientation;
> > +};
> 
> I don't think the orientation is relevant. It will always be "normal"
> from the DP PoV after muxing, no?
> 
> I'm also not sure those details are enough in the long run. Based on
> what I've understood from our graphics team guys, for example knowing
> if multi-function is preferred may be important in some cases.

Combo PHY ports - which is what this patchset is adding the notification
for - can only reverse the lane assignment. TypeC PHY ports (on ICL+)
have a more C-type aware mux in the SoC (FIA) as well, so in theory we
could have a system based on such platforms with an external mux only
switching between the USB, DP, USB+DP (MFD) modes, but leaving the plug
orientation specific muxing up to the FIA. The graphics driver is not
involved in programming the FIA though, it's done by a firmware
component, so I don't think this configuration needs to get passed.

Yes, the driver needs to know if the PD controller configured the sink
in the MFD mode (DP+USB) or in the DP-only mode. For that the number of
lanes assigned to DP is enough.

> +Imre.
> 
> All of that, and more, is already available in the Configuration VDO and
> Status VDO that we have negotiated with the DP partner. Both those
> VDOs are part of struct typec_displayport_data. I think we should
> simply supply that structure to the DRM code instead of picking those
> details out of it...
> 
> >  /**
> >   * struct drm_tv_connector_state - TV connector related states
> >   * @subconnector: selected subconnector
> > @@ -1110,6 +1132,15 @@ struct drm_connector_funcs {
> >  */
> > void (*atomic_print_state)(struct drm_printer *p,
> >const struct drm_connector_state *state);
> > +
> > +   /**
> > +* @oob_hotplug_event:
> > +*
> > +* This will get called when a hotplug-event for a drm-connector
> > +* has been received from a source outside the display driver / device.
> > +*/
> > +   void (*oob_hotplug_event)(struct drm_connector *connector,
> > + struct drm_connector_oob_hotplug_event_data 
> > *data);
> 
> So I would not try to generalise this like that. This callback should
> be USB Type-C DP altmode specific:
> 
>   void (*oob_hotplug_event)(struct drm_connector *connector,
>   struct typec_displayport_data *data);
> 
> Or like this if the orientation can really be reversed after muxing:
> 
>   void (*oob_hotplug_event)(struct drm_connector *connector,
> struct typec_altmode *altmode,
>   struct typec_displayport_data *data);
> 
> You can now check the orientation separately with
> typec_altmode_get_orientation() if necessary.
> 
> 
> thanks,
> 
> -- 
> heikki


Re: remove the nvlink2 pci_vfio subdriver v2

2021-05-04 Thread Greg Kroah-Hartman
On Tue, May 04, 2021 at 02:22:36PM +0200, Greg Kurz wrote:
> On Fri, 26 Mar 2021 07:13:09 +0100
> Christoph Hellwig  wrote:
> 
> > Hi all,
> > 
> > the nvlink2 vfio subdriver is a weird beast.  It supports a hardware
> > feature without any open source component - what would normally be
> > the normal open source userspace that we require for kernel drivers,
> > although in this particular case user space could of course be a
> > kernel driver in a VM.  It also happens to be a complete mess that
> > does not properly bind to PCI IDs, is hacked into the vfio_pci driver
> > and also pulls in over 1000 lines of code always built into powerpc
> > kernels that have Power NV support enabled.  Because of all these
> > issues and the lack of breaking userspace when it is removed I think
> > the best idea is to simply kill it.
> > 
> > Changes since v1:
> >  - document the removed subtypes as reserved
> >  - add the ACK from Greg
> > 
> > Diffstat:
> >  arch/powerpc/platforms/powernv/npu-dma.c |  705 
> > ---
> >  b/arch/powerpc/include/asm/opal.h|3 
> >  b/arch/powerpc/include/asm/pci-bridge.h  |1 
> >  b/arch/powerpc/include/asm/pci.h |7 
> >  b/arch/powerpc/platforms/powernv/Makefile|2 
> >  b/arch/powerpc/platforms/powernv/opal-call.c |2 
> >  b/arch/powerpc/platforms/powernv/pci-ioda.c  |  185 ---
> >  b/arch/powerpc/platforms/powernv/pci.c   |   11 
> >  b/arch/powerpc/platforms/powernv/pci.h   |   17 
> >  b/arch/powerpc/platforms/pseries/pci.c   |   23 
> >  b/drivers/vfio/pci/Kconfig   |6 
> >  b/drivers/vfio/pci/Makefile  |1 
> >  b/drivers/vfio/pci/vfio_pci.c|   18 
> >  b/drivers/vfio/pci/vfio_pci_private.h|   14 
> >  b/include/uapi/linux/vfio.h  |   38 -
> 
> 
> Hi Christoph,
> 
> FYI, these uapi changes break build of QEMU.

What uapi changes?

What exactly breaks?

Why does QEMU require kernel driver stuff?

thanks,

greg k-h


Re: [RFC] CRIU support for ROCm

2021-05-04 Thread Daniel Vetter
On Fri, Apr 30, 2021 at 09:57:45PM -0400, Felix Kuehling wrote:
> We have been working on a prototype supporting CRIU (Checkpoint/Restore
> In Userspace) for accelerated compute applications running on AMD GPUs
> using ROCm (Radeon Open Compute Platform). We're happy to finally share
> this work publicly to solicit feedback and advice. The end-goal is to
> get this work included upstream in Linux and CRIU. A short whitepaper
> describing our design and intention can be found on Github:
> https://github.com/RadeonOpenCompute/criu/tree/criu-dev/test/others/ext-kfd/README.md.
>
> We have RFC patch series for the kernel (based on Alex Deucher's
> amd-staging-drm-next branch) and for CRIU including a new plugin and a
> few core CRIU changes. I will send those to the respective mailing lists
> separately in a minute. They can also be found on Github.
>
> CRIU+plugin: https://github.com/RadeonOpenCompute/criu/commits/criu-dev
> Kernel (KFD):
> 
> https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver/commits/fxkamd/criu-wip
>
> At this point this is very much a work in progress and not ready for
> upstream inclusion. There are still several missing features, known
> issues, and open questions that we would like to start addressing with
> your feedback.

Since the thread is a bit split I'm dumping the big thoughts here on this
RFC.

We've discussed this in the past, but I'm once more (insert meme here)
asking whether continuing to walk down the amdgpu vs amdkfd split is
really the right choice. It starts to feel a bit much like sunk cost
fallacy ...

- From the big thread we're having right now on dri-devel it's clear that
  3d will also move towards more and more a userspace submit model. But
  due to backwards compat issues it will be a mixed model, and in some
  cases we need to pick at runtime which model we're using. A hard split
  between the amdgpu and the amdkfd world gets in the way here.

- There's use-cases for doing compute in vulkan (that was a discussion
  from Feb that I kicked again in private, since I think still
  unresolved). So you need a vulkan stack that runs on both amdgpu and
  amdvlk.

- Maybe not yet on amd's radar, but there's a lot of cloud computing. And
  maybe they also want CRIU for migrating their containers around. So that
  means CRIU for amdgpu too, not just amdkfd.

- What's much worse, and I don't think anyone in amd has realized this yet
  (at least not in a public thread I've seen): in vulkan you need to be
  able to switch from compute mode to dma-fence mode after
  pipelines/devices have been created already. This is because winsys are
  only initialized in a second step; until that's done you have to
  pessimistically assume that the user does pure compute. What's worse, for
  buffer sharing you don't even have a clear signal on this stuff. So
  either

  - you figure out how to migrate all the buffers and state from amdkfd to
amdgpu at runtime, and duplicate all the features. Which is rather
pointless.

  - or you duplicate all the compute features to amdgpu so that vk can use
them, and still reasonably easy migrate to winsys/dma-fence mode,
which makes amdkfd rather redundant.

  I've discussed this problem extensively with Jason Ekstrand, and it's
  really nasty.

So purely from a technical pov, only looking at the AMD perspective here,
this doesn't make much sense to me. The only reason to keep doubling down
on amdkfd I'm seeing is that you've built your compute rocm stack on top
of it, and because of that the only option is to keep doing that. Which
stops making sense eventually, and we're getting to that point for sure.

The other side is a bit the upstream side, but that's a lot smaller:

- vulkan compute is one of the more reasonable ways to get cross vendor
  compute ecosystem off the ground. At least from what I know from
  background chatter, which you guys probably haven't all heard. amdkfd
  being the single very odd driver here requiring entirely different uapi
  for compute mode is not going to be great.

- CRIU will need new access rights handling (for the save/restore/resume
  stuff you're adding). Generally we standardize access rights checks
  across drivers, and leave everything else to render drivers (command
  submission, memory management, ...). By adding CRIU support to amdkfd
  we pretty much guarantee that we won't be able to standardize CRIU access
  rights across drivers. Which just plains sucks from an
  upstream/cross-vendor ecosystem pov.

And yes we'd still need a per-driver criu plugin in userspace, but the
same is true for amdvlk/radv/anv/ and all the other drivers we have:
Driver is different, access right management is still the same.

And secondly, just because nvidia refuses to collaborate in any
standards around gpu compute doesn't mean that's a good reason for us to
do the same in upstream.

Thirdly, it sounds like this is the first device-driver CRIU support, so I
think we need a solid agreement/standard

Re: remove the nvlink2 pci_vfio subdriver v2

2021-05-04 Thread Cornelia Huck
On Tue, 4 May 2021 15:00:39 +0200
Christoph Hellwig  wrote:

> On Tue, May 04, 2021 at 02:59:07PM +0200, Greg Kroah-Hartman wrote:
> > > Hi Christoph,
> > > 
> > > FYI, these uapi changes break build of QEMU.  
> > 
> > What uapi changes?
> > 
> > What exactly breaks?
> > 
> > Why does QEMU require kernel driver stuff?  
> 
> Looks like it pulls in the uapi struct definitions unconditionally
> instead of having a local copy.  We could fix that by just putting
> them back, but to me this seems like a rather broken configuration
> in qemu when it pulls in headers from the running/installed kernel
> without any feature checks before using them.
> 

It is not pulling them from the installed kernel, but from a
development version to get new definitions. Removing things in the
kernel requires workarounds in QEMU until it can remove those things as
well. It is not a dumb update...



[RFC] Implicit vs explicit user fence sync

2021-05-04 Thread Christian König
Hi guys,

with this patch set I want to look into how much additional work it would 
be to support implicit sync compared to only explicit sync.

Turned out that this is much simpler than expected since the only addition is 
that before a command submission or flip the kernel and classic drivers would 
need to wait for the user fence to signal before taking any locks.

For this prototype this patch set doesn't implement any user fence 
synchronization at all, but just assumes that faulting user pages is sufficient 
to make sure that we can wait for user space to finish submitting the work. If 
necessary this can be made even more strict; the only use case I could find 
which blocks this is the radeon driver, and that should be handleable.

This of course doesn't give you the same semantic as the classic implicit sync 
to guarantee that you have exclusive access to a buffer, but this is also not 
necessary.

So I think the conclusion should be that we don't need to concentrate on 
implicit vs. explicit sync, but rather how to get the synchronization and 
timeout signalling figured out in general.

Regards,
Christian.




[PATCH 02/12] RDMA/mlx5: add DMA-buf user fence support

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/infiniband/hw/mlx5/odp.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/infiniband/hw/mlx5/odp.c b/drivers/infiniband/hw/mlx5/odp.c
index b103555b1f5d..6b4d980c02e8 100644
--- a/drivers/infiniband/hw/mlx5/odp.c
+++ b/drivers/infiniband/hw/mlx5/odp.c
@@ -804,6 +804,10 @@ static int pagefault_dmabuf_mr(struct mlx5_ib_mr *mr, 
size_t bcnt,
if (flags & MLX5_PF_FLAGS_ENABLE)
xlt_flags |= MLX5_IB_UPD_XLT_ENABLE;
 
+   err = dma_resv_sync_user_fence(umem_dmabuf->attach->dmabuf->resv);
+   if (err)
+   return err;
+
dma_resv_lock(umem_dmabuf->attach->dmabuf->resv, NULL);
err = ib_umem_dmabuf_map_pages(umem_dmabuf);
if (err) {
-- 
2.25.1



[PATCH 01/12] dma-buf: add interface for user fence synchronization

2021-05-04 Thread Christian König
This is a RFC/WIP patch which just adds the interface and lockdep
annotation without any actual implementation.

Signed-off-by: Christian König 
---
 drivers/dma-buf/dma-resv.c | 18 ++
 include/linux/dma-resv.h   |  1 +
 2 files changed, 19 insertions(+)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 6ddbeb5dfbf6..e0305424957b 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -681,3 +681,21 @@ bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool 
test_all)
return ret;
 }
 EXPORT_SYMBOL_GPL(dma_resv_test_signaled_rcu);
+
+/**
+ * dma_resv_sync_user_fence - block for user fences to signal
+ *
+ * @obj: The DMA resv object with the user fence attached
+ *
+ * To make sure we have proper synchronization between accesses block for user
+ * fences before starting a dma_fence based operation on the buffer.
+ */
+int dma_resv_sync_user_fence(struct dma_resv *obj)
+{
+   might_fault();
+
+   /* TODO: Actually come up with an implementation for this! */
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(dma_resv_sync_user_fence);
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index d44a77e8a7e3..c525a36be900 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -289,5 +289,6 @@ long dma_resv_wait_timeout_rcu(struct dma_resv *obj, bool 
wait_all, bool intr,
   unsigned long timeout);
 
 bool dma_resv_test_signaled_rcu(struct dma_resv *obj, bool test_all);
+int dma_resv_sync_user_fence(struct dma_resv *obj);
 
 #endif /* _LINUX_RESERVATION_H */
-- 
2.25.1
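
Just to illustrate where the blocking would eventually happen, one possible shape of the missing implementation could be something like the following. The user fence representation (a seqno that userspace writes to mapped memory, looked up through an invented dma_resv_get_user_fence() helper) is purely made up here and not part of any existing API:

	int dma_resv_sync_user_fence(struct dma_resv *obj)
	{
		u64 __user *addr;	/* made up: where userspace writes the seqno */
		u64 wait_value;		/* made up: value that marks completion */
		u64 current_value;

		might_fault();

		/* Made-up helper: fetch a user fence attached to the object, if any. */
		if (!dma_resv_get_user_fence(obj, &addr, &wait_value))
			return 0;

		for (;;) {
			if (get_user(current_value, addr))
				return -EFAULT;
			if (current_value >= wait_value)
				return 0;
			if (signal_pending(current))
				return -ERESTARTSYS;
			cond_resched();
		}
	}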



[PATCH 03/12] drm/amdgpu: add DMA-buf user fence support

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 6 ++
 2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index b5c766998045..afd58c6d88a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -534,6 +534,13 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
return r;
}
 
+   /* Sync to user fences */
+   amdgpu_bo_list_for_each_entry(e, p->bo_list) {
+   r = dma_resv_sync_user_fence(e->tv.bo->base.resv);
+   if (r)
+   return r;
+   }
+
/* One for TTM and one for the CS job */
amdgpu_bo_list_for_each_entry(e, p->bo_list)
e->tv.num_shared = 2;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index 9a2f811450ed..3edd6dbae71f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -181,6 +181,12 @@ int amdgpu_display_crtc_page_flip_target(struct drm_crtc 
*crtc,
obj = fb->obj[0];
new_abo = gem_to_amdgpu_bo(obj);
 
+   r = dma_resv_sync_user_fence(obj->resv);
+   if (unlikely(r)) {
+   DRM_ERROR("failed to wait for user fence before flip\n");
+   goto cleanup;
+   }
+
/* pin the new buffer */
r = amdgpu_bo_reserve(new_abo, false);
if (unlikely(r != 0)) {
-- 
2.25.1



[PATCH 04/12] drm/gem: add DMA-buf user fence support for the atomic helper

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/drm_gem_atomic_helper.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/drm_gem_atomic_helper.c 
b/drivers/gpu/drm/drm_gem_atomic_helper.c
index a005c5a0ba46..fe0d18486643 100644
--- a/drivers/gpu/drm/drm_gem_atomic_helper.c
+++ b/drivers/gpu/drm/drm_gem_atomic_helper.c
@@ -142,11 +142,15 @@ int drm_gem_plane_helper_prepare_fb(struct drm_plane 
*plane, struct drm_plane_st
 {
struct drm_gem_object *obj;
struct dma_fence *fence;
+   int ret;
 
if (!state->fb)
return 0;
 
obj = drm_gem_fb_get_obj(state->fb, 0);
+   ret = dma_resv_sync_user_fence(obj->resv);
+   if (ret)
+   return ret;
fence = dma_resv_get_excl_rcu(obj->resv);
drm_atomic_set_fence_for_plane(state, fence);
 
-- 
2.25.1



[PATCH 06/12] drm/i915: add DMA-buf user fence support

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c 
b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 5964e67c7d36..24c575d762db 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -872,6 +872,12 @@ static int eb_lookup_vmas(struct i915_execbuffer *eb)
goto err;
}
 
+   err = dma_resv_sync_user_fence(vma->obj->base.resv);
+   if (unlikely(err)) {
+   i915_vma_put(vma);
+   goto err;
+   }
+
eb_add_vma(eb, i, batch, vma);
 
if (i915_gem_object_is_userptr(vma->obj)) {
-- 
2.25.1



[PATCH 05/12] drm/etnaviv: add DMA-buf user fence support

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c | 23 ++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c 
b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
index d05c35994579..2e440674ca5b 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c
@@ -108,6 +108,21 @@ static int submit_lookup_objects(struct etnaviv_gem_submit 
*submit,
return ret;
 }
 
+static int submit_sync_user(struct etnaviv_gem_submit *submit)
+{
+   unsigned int i;
+   int ret;
+
+   for (i = 0; i < submit->nr_bos; i++) {
+   struct drm_gem_object *obj = &submit->bos[i].obj->base;
+
+   ret = dma_resv_sync_user_fence(obj->resv);
+   if (ret)
+   return ret;
+   }
+   return 0;
+}
+
 static void submit_unlock_object(struct etnaviv_gem_submit *submit, int i)
 {
if (submit->bos[i].flags & BO_LOCKED) {
@@ -518,8 +533,6 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void 
*data,
}
}
 
-   ww_acquire_init(&ticket, &reservation_ww_class);
-
submit = submit_create(dev, gpu, args->nr_bos, args->nr_pmrs);
if (!submit) {
ret = -ENOMEM;
@@ -541,6 +554,12 @@ int etnaviv_ioctl_gem_submit(struct drm_device *dev, void 
*data,
if (ret)
goto err_submit_objects;
 
+   ret = submit_sync_user(submit);
+   if (ret)
+   goto err_submit_objects;
+
+   ww_acquire_init(&ticket, &reservation_ww_class);
+
if ((priv->mmu_global->version != ETNAVIV_IOMMU_V2) &&
!etnaviv_cmd_validate_one(gpu, stream, args->stream_size / 4,
  relocs, args->nr_relocs)) {
-- 
2.25.1



[PATCH 09/12] drm/nouveau: add DMA-buf user fence support

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/nouveau/nouveau_gem.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c 
b/drivers/gpu/drm/nouveau/nouveau_gem.c
index a70e82413fa7..e349a8b32549 100644
--- a/drivers/gpu/drm/nouveau/nouveau_gem.c
+++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
@@ -552,6 +552,7 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
 struct validate_op *op, bool *apply_relocs)
 {
struct nouveau_cli *cli = nouveau_cli(file_priv);
+   unsigned int i;
int ret;
 
INIT_LIST_HEAD(&op->list);
@@ -559,6 +560,17 @@ nouveau_gem_pushbuf_validate(struct nouveau_channel *chan,
if (nr_buffers == 0)
return 0;
 
+   for (i = 0; i < nr_buffers; i++) {
+   struct drm_nouveau_gem_pushbuf_bo *b = &pbbo[i];
+   struct drm_gem_object *gem;
+
+   gem = drm_gem_object_lookup(file_priv, b->handle);
+   if (!gem)
+   return -ENOENT;
+   dma_resv_sync_user_fence(gem->resv);
+   drm_gem_object_put(gem);
+   }
+
ret = validate_init(chan, file_priv, pbbo, nr_buffers, op);
if (unlikely(ret)) {
if (ret != -ERESTARTSYS)
-- 
2.25.1



[PATCH 11/12] drm/radeon: add DMA-buf user fence support

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/radeon/radeon_cs.c  | 6 ++
 drivers/gpu/drm/radeon/radeon_display.c | 4 
 2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c 
b/drivers/gpu/drm/radeon/radeon_cs.c
index 059431689c2d..fb0e238535f3 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -189,6 +189,12 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser 
*p)
  priority);
}
 
+   for (i = 0; i < p->nrelocs; i++) {
+   r = dma_resv_sync_user_fence(p->relocs[i].tv.bo->base.resv);
+   if (r)
+   return r;
+   }
+
radeon_cs_buckets_get_list(&buckets, &p->validated);
 
if (p->cs_flags & RADEON_CS_USE_VM)
diff --git a/drivers/gpu/drm/radeon/radeon_display.c 
b/drivers/gpu/drm/radeon/radeon_display.c
index 652af7a134bd..75ebd2338809 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -519,6 +519,10 @@ static int radeon_crtc_page_flip_target(struct drm_crtc 
*crtc,
DRM_DEBUG_DRIVER("flip-ioctl() cur_rbo = %p, new_rbo = %p\n",
 work->old_rbo, new_rbo);
 
+   r = dma_resv_sync_user_fence(new_rbo->tbo.base.resv);
+   if (unlikely(r != 0))
+   goto cleanup;
+
r = radeon_bo_reserve(new_rbo, false);
if (unlikely(r != 0)) {
DRM_ERROR("failed to reserve new rbo buffer before flip\n");
-- 
2.25.1



[PATCH 07/12] drm/lima: add DMA-buf user fence support

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/lima/lima_gem.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/lima/lima_gem.c b/drivers/gpu/drm/lima/lima_gem.c
index de62966243cd..d3d68218568d 100644
--- a/drivers/gpu/drm/lima/lima_gem.c
+++ b/drivers/gpu/drm/lima/lima_gem.c
@@ -321,6 +321,12 @@ int lima_gem_submit(struct drm_file *file, struct 
lima_submit *submit)
goto err_out0;
}
 
+   err = dma_resv_sync_user_fence(obj->resv);
+   if (err) {
+   drm_gem_object_put(obj);
+   goto err_out0;
+   }
+
bo = to_lima_bo(obj);
 
/* increase refcnt of gpu va map to prevent unmapped when 
executing,
-- 
2.25.1



[PATCH 08/12] drm/msm: add DMA-buf user fence support

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/msm/msm_gem_submit.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c 
b/drivers/gpu/drm/msm/msm_gem_submit.c
index 5480852bdeda..a77389ce23d0 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -285,6 +285,20 @@ static int submit_lock_objects(struct msm_gem_submit 
*submit)
return ret;
 }
 
+static int submit_sync_user_fence(struct msm_gem_submit *submit)
+{
+   int i, ret;
+
+   for (i = 0; i < submit->nr_bos; i++) {
+   struct msm_gem_object *msm_obj = submit->bos[i].obj;
+
+   ret = dma_resv_sync_user_fence(msm_obj->base.resv);
+   if (ret)
+   return ret;
+   }
+   return 0;
+}
+
 static int submit_fence_sync(struct msm_gem_submit *submit, bool no_implicit)
 {
int i, ret = 0;
@@ -769,6 +783,10 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void 
*data,
 */
pm_runtime_get_sync(&gpu->pdev->dev);
 
+   ret = submit_sync_user_fence(submit);
+   if (ret)
+   goto out;
+
/* copy_*_user while holding a ww ticket upsets lockdep */
ww_acquire_init(&submit->ticket, &reservation_ww_class);
has_ww_ticket = true;
-- 
2.25.1



[PATCH 12/12] drm/v3d: add DMA-buf user fence support

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/v3d/v3d_gem.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/v3d/v3d_gem.c b/drivers/gpu/drm/v3d/v3d_gem.c
index 4eb354226972..7c45292c641c 100644
--- a/drivers/gpu/drm/v3d/v3d_gem.c
+++ b/drivers/gpu/drm/v3d/v3d_gem.c
@@ -345,6 +345,12 @@ v3d_lookup_bos(struct drm_device *dev,
}
spin_unlock(&file_priv->table_lock);
 
+   for (i = 0; i < job->bo_count; i++) {
+   ret = dma_resv_sync_user_fence(job->bo[i]->resv);
+   if (ret)
+   break;
+   }
+
 fail:
kvfree(handles);
return ret;
-- 
2.25.1



[PATCH 10/12] drm/panfrost: add DMA-buf user fence support

2021-05-04 Thread Christian König
Just add the call before taking locks.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/panfrost/panfrost_job.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c 
b/drivers/gpu/drm/panfrost/panfrost_job.c
index 6003cfeb1322..9174ceb1d16d 100644
--- a/drivers/gpu/drm/panfrost/panfrost_job.c
+++ b/drivers/gpu/drm/panfrost/panfrost_job.c
@@ -216,6 +216,18 @@ static void panfrost_attach_object_fences(struct 
drm_gem_object **bos,
dma_resv_add_excl_fence(bos[i]->resv, fence);
 }
 
+static int panfrost_sync_user_fences(struct drm_gem_object **bos, int bo_count)
+{
+   int i, ret;
+
+   for (i = 0; i < bo_count; i++) {
+   ret = dma_resv_sync_user_fence(bos[i]->resv);
+   if (ret)
+   return ret;
+   }
+   return 0;
+}
+
 int panfrost_job_push(struct panfrost_job *job)
 {
struct panfrost_device *pfdev = job->pfdev;
@@ -224,6 +236,10 @@ int panfrost_job_push(struct panfrost_job *job)
struct ww_acquire_ctx acquire_ctx;
int ret = 0;
 
+   ret = panfrost_sync_user_fences(job->bos, job->bo_count);
+   if (ret)
+   return ret;
+
mutex_lock(&pfdev->sched_lock);
 
ret = drm_gem_lock_reservations(job->bos, job->bo_count,
-- 
2.25.1



Re: remove the nvlink2 pci_vfio subdriver v2

2021-05-04 Thread Greg Kroah-Hartman
On Tue, May 04, 2021 at 03:20:34PM +0200, Greg Kurz wrote:
> On Tue, 4 May 2021 14:59:07 +0200
> Greg Kroah-Hartman  wrote:
> 
> > On Tue, May 04, 2021 at 02:22:36PM +0200, Greg Kurz wrote:
> > > On Fri, 26 Mar 2021 07:13:09 +0100
> > > Christoph Hellwig  wrote:
> > > 
> > > > Hi all,
> > > > 
> > > > the nvlink2 vfio subdriver is a weird beast.  It supports a hardware
> > > > feature without any open source component - what would normally be
> > > > the normal open source userspace that we require for kernel drivers,
> > > > although in this particular case user space could of course be a
> > > > kernel driver in a VM.  It also happens to be a complete mess that
> > > > does not properly bind to PCI IDs, is hacked into the vfio_pci driver
> > > > and also pulls in over 1000 lines of code always built into powerpc
> > > > kernels that have Power NV support enabled.  Because of all these
> > > > issues, and since removing it does not break userspace, I think
> > > > the best idea is to simply kill it.
> > > > 
> > > > Changes since v1:
> > > >  - document the removed subtypes as reserved
> > > >  - add the ACK from Greg
> > > > 
> > > > Diffstat:
> > > >  arch/powerpc/platforms/powernv/npu-dma.c |  705 
> > > > ---
> > > >  b/arch/powerpc/include/asm/opal.h|3 
> > > >  b/arch/powerpc/include/asm/pci-bridge.h  |1 
> > > >  b/arch/powerpc/include/asm/pci.h |7 
> > > >  b/arch/powerpc/platforms/powernv/Makefile|2 
> > > >  b/arch/powerpc/platforms/powernv/opal-call.c |2 
> > > >  b/arch/powerpc/platforms/powernv/pci-ioda.c  |  185 ---
> > > >  b/arch/powerpc/platforms/powernv/pci.c   |   11 
> > > >  b/arch/powerpc/platforms/powernv/pci.h   |   17 
> > > >  b/arch/powerpc/platforms/pseries/pci.c   |   23 
> > > >  b/drivers/vfio/pci/Kconfig   |6 
> > > >  b/drivers/vfio/pci/Makefile  |1 
> > > >  b/drivers/vfio/pci/vfio_pci.c|   18 
> > > >  b/drivers/vfio/pci/vfio_pci_private.h|   14 
> > > >  b/include/uapi/linux/vfio.h  |   38 -
> > > 
> > > 
> > > Hi Christoph,
> > > 
> > > FYI, these uapi changes break build of QEMU.
> > 
> > What uapi changes?
> > 
> 
> All macros and structure definitions that are being removed
> from include/uapi/linux/vfio.h by patch 1.
> 
> > What exactly breaks?
> > 
> 
> These macros and types are used by the current QEMU code base.
> Next time the QEMU source tree updates its copy of the kernel
> headers, the compilation of affected code will fail.

So does QEMU use this api that is being removed, or does it just have
some odd build artifacts of the uapi things?

What exactly are the error messages here?

And if we put the uapi .h file stuff back, is that sufficient for qemu
to work, as it should be checking at runtime what the kernel has / has
not anyway, right?
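
In practice that runtime check is just the usual region probe, something
along these lines (illustrative snippet only, not QEMU's actual code):

#include <sys/ioctl.h>
#include <linux/vfio.h>

	struct vfio_region_info info = {
		.argsz = sizeof(info),
		.index = region_index,	/* index of the region being probed */
	};

	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &info) < 0 || !info.size)
		return;	/* kernel doesn't expose the region, skip that code path */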

thanks,

greg k-h


Re: [PATCH 8/8] drm/modifiers: Enforce consistency between the cap and IN_FORMATS

2021-05-04 Thread Pekka Paalanen
On Tue, 27 Apr 2021 11:20:18 +0200
Daniel Vetter  wrote:

> It's very confusing for userspace to have to deal with inconsistencies
> here, and some drivers screwed this up a bit. Most just omitted the
> format list when they meant to say that only the linear modifier is
> allowed, but some also meant that only implied modifiers are
> acceptable (because actually none of the planes registered supported
> modifiers).
> 
> Now that this is all done consistently across all drivers, document
> the rules and enforce it in the drm core.
> 
> v2:
> - Make the capability a link (Simon)
> - Note that all is lost before 5.1.
> 
> Acked-by: Maxime Ripard 
> Cc: Simon Ser 
> Reviewed-by: Lucas Stach 
> Cc: Pekka Paalanen 
> Signed-off-by: Daniel Vetter 
> Cc: Maarten Lankhorst 
> Cc: Maxime Ripard 
> Cc: Thomas Zimmermann 
> Cc: David Airlie 
> Cc: Daniel Vetter 
> ---
>  drivers/gpu/drm/drm_plane.c   | 18 +-
>  include/drm/drm_mode_config.h |  2 ++
>  2 files changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c
> index 0dd43882fe7c..20c7a1665414 100644
> --- a/drivers/gpu/drm/drm_plane.c
> +++ b/drivers/gpu/drm/drm_plane.c
> @@ -128,6 +128,13 @@
>   * pairs supported by this plane. The blob is a struct
>   * drm_format_modifier_blob. Without this property the plane doesn't
>   * support buffers with modifiers. Userspace cannot change this property.
> + *
> + * Note that userspace can check the &DRM_CAP_ADDFB2_MODIFIERS driver
> + * capability for general modifier support. If this flag is set then 
> every
> + * plane will have the IN_FORMATS property, even when it only supports
> + * DRM_FORMAT_MOD_LINEAR. Before linux kernel release v5.1 there have 
> been
> + * various bugs in this area with inconsistencies between the capability
> + * flag and per-plane properties.
>   */
>  
>  static unsigned int drm_num_planes(struct drm_device *dev)
> @@ -277,8 +284,14 @@ static int __drm_universal_plane_init(struct drm_device 
> *dev,
>   format_modifier_count++;
>   }
>  
> - if (format_modifier_count)
> + /* autoset the cap and check for consistency across all planes */
> + if (format_modifier_count) {
> + WARN_ON(!config->allow_fb_modifiers &&
> + !list_empty(&config->plane_list));
>   config->allow_fb_modifiers = true;
> + } else {
> + WARN_ON(config->allow_fb_modifiers);
> + }
>  
>   plane->modifier_count = format_modifier_count;
>   plane->modifiers = kmalloc_array(format_modifier_count,
> @@ -360,6 +373,9 @@ static int __drm_universal_plane_init(struct drm_device 
> *dev,
>   * drm_universal_plane_init() to let the DRM managed resource infrastructure
>   * take care of cleanup and deallocation.
>   *
> + * Drivers supporting modifiers must set @format_modifiers on all their 
> planes,
> + * even those that only support DRM_FORMAT_MOD_LINEAR.
> + *
>   * Returns:
>   * Zero on success, error code on failure.
>   */
> diff --git a/include/drm/drm_mode_config.h b/include/drm/drm_mode_config.h
> index ab424ddd7665..1ddf7783fdf7 100644
> --- a/include/drm/drm_mode_config.h
> +++ b/include/drm/drm_mode_config.h
> @@ -909,6 +909,8 @@ struct drm_mode_config {
>* @allow_fb_modifiers:
>*
>* Whether the driver supports fb modifiers in the ADDFB2.1 ioctl call.
> +  * Note that drivers should not set this directly, it is automatically
> +  * set in drm_universal_plane_init().
>*
>* IMPORTANT:
>*

I can only say about the doc parts, but:

Acked-by: Pekka Paalanen 

For patches 2 and 5 too, on the grounds that the idea is good.
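
As a data point for userspace, the capability check this documents boils
down to something like the following (minimal libdrm sketch):

#include <xf86drm.h>

	uint64_t cap = 0;

	if (drmGetCap(fd, DRM_CAP_ADDFB2_MODIFIERS, &cap) == 0 && cap) {
		/* every plane exposes IN_FORMATS, even if it only lists LINEAR */
	}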


Thanks,
pq




Re: [PATCH 8/8] drm/modifiers: Enforce consistency between the cap and IN_FORMATS

2021-05-04 Thread Emil Velikov
Hi Daniel,

Thanks for the extra clarification.

On Tue, 27 Apr 2021 at 13:22, Daniel Vetter  wrote:
>
> On Tue, Apr 27, 2021 at 12:32:19PM +0100, Emil Velikov wrote:
> > Hi Daniel,
> >
> > On Tue, 27 Apr 2021 at 10:20, Daniel Vetter  wrote:
> >
> > > @@ -360,6 +373,9 @@ static int __drm_universal_plane_init(struct 
> > > drm_device *dev,
> > >   * drm_universal_plane_init() to let the DRM managed resource 
> > > infrastructure
> > >   * take care of cleanup and deallocation.
> > >   *
> > > + * Drivers supporting modifiers must set @format_modifiers on all their 
> > > planes,
> > > + * even those that only support DRM_FORMAT_MOD_LINEAR.
> > > + *
> > The comment says "must", yet we have an "if (format_modifiers)" in the 
> > codebase.
> > Shouldn't we add a WARN_ON() + return -EINVAL (or similar) so people
> > can see and fix their drivers?
>
> This is a must only for drivers supporting modifiers, not all drivers.
> Hence the check in the if. I did add WARN_ON for the combos that get stuff
> wrong though (like only supply one side of the modifier info, not both).
>
Hmm you're spot on - the arm/malidp patch threw me off for a minute.

> > As a follow-up one could even go a step further, by erroring out when
> > the driver hasn't provided valid modifier(s) and even removing
> > config::allow_fb_modifiers altogether.
>
> Well that currently only exists to avoid walking the plane list (which we
> need to do for validation that all planes are the same). It's quite tricky
> code for tiny benefit, so I don't think it's worth it trying to remove
> allow_fb_modifiers completely.
>
Pardon if I'm saying something painfully silly - it's been a while
since I've looked closely at KMS.

From some grepping around, removing ::allow_fb_modifiers would be OK
although it's a secondary goal. It feels like the bigger win will be
simpler modifier handling in DRM.

In particular, one could always "inject" the linear modifier within
drm_universal_plane_init() and always expose DRM_CAP_ADDFB2_MODIFIERS.
Some drivers mxsfb, mgag200, stm and likely others already advertise
the CAP, even though they seemingly lack any modifiers.
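
Roughly something like this in the core init path (sketch only, not a
tested patch):

	/* fall back to a linear-only list when a driver passes no modifiers,
	 * so IN_FORMATS and the CAP can be exposed unconditionally
	 */
	static const uint64_t default_modifiers[] = {
		DRM_FORMAT_MOD_LINEAR,
		DRM_FORMAT_MOD_INVALID
	};

	if (!format_modifiers)
		format_modifiers = default_modifiers;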

The linear/invalid cargo-cult to drm_universal_plane_init() seems
strong and this series adds even more.

Another plus of always exposing the CAP is that one could mandate (or
nuke) the optional .format_mod_supported hook that you/Ville discussed
earlier[1].
Currently things are weird, since it's required to create the IN_FORMATS
blob, yet drivers lack it while simultaneously exposing the CAP to
userspace.

One such example is exynos... Although it recently dropped
`allow_fb_modifiers = true` and there are no modifiers passed to
drm_universal_plane_init(), so the CAP is no longer supported.
Inki, you might want to check if that broke your userspace.


TL;DR: There _might_ be value in simplifying the modifier handling
_after_ these fixes land.


[1] 
https://lore.kernel.org/dri-devel/cakmk7ugnp5us8kfffnpwq7g4b0-b2q-m7deqz_rphtcrh_q...@mail.gmail.com/

> > Although for stable - this series + WARN_ON (no return since it might
> > break buggy drivers) sounds good.
> >
> > > @@ -909,6 +909,8 @@ struct drm_mode_config {
> > >  * @allow_fb_modifiers:
> > >  *
> > >  * Whether the driver supports fb modifiers in the ADDFB2.1 ioctl 
> > > call.
> > > +* Note that drivers should not set this directly, it is 
> > > automatically
> > > +* set in drm_universal_plane_init().
> > >  *
> > >  * IMPORTANT:
> > >  *
> > The new note and the existing IMPORTANT are in a weird mix.
> > Quoting the latter since it doesn't show in the diff.
> >
> > If this is set the driver must fill out the full implicit modifier
> > information in their &drm_mode_config_funcs.fb_create hook for legacy
> > userspace which does not set modifiers. Otherwise the GETFB2 ioctl is
> > broken for modifier aware userspace.
> >
> > In particular:
> > As the new note says "don't set it" and the existing note one says "if
> > it's set". Yet no drivers do "if (config->allow_fb_modifiers)".
> >
> > Sadly, nothing comes to mind atm wrt alternative wording.
>
> Yeah it's a bit disappointing.
>
> > With the WARN_ON() added or s/must/should/ in the documentation, the series 
> > is:
>
> With my clarification, can you please recheck whether as-is it's not
> correct?
>
Indeed - with the series as-is my RB stands.

Thanks
-Emil


Re: [RFC] Implicit vs explicit user fence sync

2021-05-04 Thread Daniel Vetter
Hi Christian,

On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:
> Hi guys,
> 
> with this patch set I want to look into how much more additional work it
> would be to support implicit sync compared to only explicit sync.
> 
> Turned out that this is much simpler than expected since the only
> addition is that before a command submission or flip the kernel and
> classic drivers would need to wait for the user fence to signal before
> taking any locks.

It's a lot more I think
- sync_file/drm_syncobj still need to be supported somehow
- we need userspace to handle the stall in a submit thread at least
- there's nothing here that sets the sync object
- implicit sync isn't just execbuf, it's everything. E.g. the various
  wait_bo ioctl also need to keep working, including timeout and
  everything
- we can't stall in atomic kms where you're currently stalling, that's for
  sure. The uapi says "we're not stalling for fences in there", and you're
  breaking that.
- ... at this point I stopped pondering but there's definitely more

Imo the only way we'll even get the complete is if we do the following:
1. roll out implicit sync with userspace fences on a driver-by-driver basis
   1a. including all the winsys/modeset stuff
2. roll out support for userspace fences to drm_syncobj timeline for
   interop, both across process/userspace and across drivers
   2a. including all the winsys/modeset stuff, but hopefully that's
   largely solved with 1. already.
3. only then try to figure out how to retroshoehorn this into implicit
   sync, and whether that even makes sense.

Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
interop fun across drivers) is just praying that this time around we're
not collectively idiots and can correctly predict the future. That never
worked :-)

> For this prototype this patch set doesn't implement any user fence
> synchronization at all, but just assumes that faulting user pages is
> sufficient to make sure that we can wait for user space to finish
> submitting the work. If necessary this can be made even more strict, the
> only use case I could find which blocks this is the radeon driver and
> that should be handle able.
> 
> This of course doesn't give you the same semantic as the classic
> implicit sync to guarantee that you have exclusive access to a buffers,
> but this is also not necessary.
> 
> So I think the conclusion should be that we don't need to concentrate on
> implicit vs. explicit sync, but rather how to get the synchronization
> and timeout signalling figured out in general.

I'm not sure what exactly you're proving here aside from "it's possible to
roll out a function with ill-defined semantics to all drivers". This
really is a lot harder than just this one function and just this one patch
set.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: remove the nvlink2 pci_vfio subdriver v2

2021-05-04 Thread Daniel Vetter
On Tue, May 04, 2021 at 03:20:34PM +0200, Greg Kurz wrote:
> On Tue, 4 May 2021 14:59:07 +0200
> Greg Kroah-Hartman  wrote:
> 
> > On Tue, May 04, 2021 at 02:22:36PM +0200, Greg Kurz wrote:
> > > On Fri, 26 Mar 2021 07:13:09 +0100
> > > Christoph Hellwig  wrote:
> > > 
> > > > Hi all,
> > > > 
> > > > the nvlink2 vfio subdriver is a weird beast.  It supports a hardware
> > > > feature without any open source component - what would normally be
> > > > the normal open source userspace that we require for kernel drivers,
> > > > although in this particular case user space could of course be a
> > > > kernel driver in a VM.  It also happens to be a complete mess that
> > > > does not properly bind to PCI IDs, is hacked into the vfio_pci driver
> > > > and also pulls in over 1000 lines of code always built into powerpc
> > > > kernels that have Power NV support enabled.  Because of all these
> > > > issues, and since removing it does not break userspace, I think
> > > > the best idea is to simply kill it.
> > > > 
> > > > Changes since v1:
> > > >  - document the removed subtypes as reserved
> > > >  - add the ACK from Greg
> > > > 
> > > > Diffstat:
> > > >  arch/powerpc/platforms/powernv/npu-dma.c |  705 
> > > > ---
> > > >  b/arch/powerpc/include/asm/opal.h|3 
> > > >  b/arch/powerpc/include/asm/pci-bridge.h  |1 
> > > >  b/arch/powerpc/include/asm/pci.h |7 
> > > >  b/arch/powerpc/platforms/powernv/Makefile|2 
> > > >  b/arch/powerpc/platforms/powernv/opal-call.c |2 
> > > >  b/arch/powerpc/platforms/powernv/pci-ioda.c  |  185 ---
> > > >  b/arch/powerpc/platforms/powernv/pci.c   |   11 
> > > >  b/arch/powerpc/platforms/powernv/pci.h   |   17 
> > > >  b/arch/powerpc/platforms/pseries/pci.c   |   23 
> > > >  b/drivers/vfio/pci/Kconfig   |6 
> > > >  b/drivers/vfio/pci/Makefile  |1 
> > > >  b/drivers/vfio/pci/vfio_pci.c|   18 
> > > >  b/drivers/vfio/pci/vfio_pci_private.h|   14 
> > > >  b/include/uapi/linux/vfio.h  |   38 -
> > > 
> > > 
> > > Hi Christoph,
> > > 
> > > FYI, these uapi changes break build of QEMU.
> > 
> > What uapi changes?
> > 
> 
> All macros and structure definitions that are being removed
> from include/uapi/linux/vfio.h by patch 1.

Just my 2cents from drm (where we deprecate old gunk uapi quite often):
Imo it's best to keep the uapi headers as-is, but exchange the
documentation with a big "this is removed, never use again" warning:

- it occasionally serves as a good lesson for how to not do uapi (whatever
  the reasons really are in the specific case)

- it's good to know which uapi numbers (like parameter extensions or
  whatever they are in this case) are de facto reserved, because there are
  binaries (qemu in this case) that have code acting on them out there.

The only exception where we completely nuke the structs and #defines is
when uapi has only been used by testcases. Which we know, since we de facto
limit our stable uapi guarantee to the canonical open&upstream userspace
drivers only (for at least the driver-specific stuff, the cross-driver
interfaces are hopeless).
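
Concretely that can be as simple as leaving a tombstone in the header
(the name below is just a placeholder, not the actual vfio define):

	/*
	 * Subtype 1 was used by the removed nvlink2 support.  Reserved, do
	 * not reuse: shipped userspace (e.g. QEMU) still has code acting on it.
	 */
	/* #define VFIO_REGION_SUBTYPE_FOO	(1)	removed, reserved */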

Anyway feel free to ignore since this might be different than drivers/gpu.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [RFC] Implicit vs explicit user fence sync

2021-05-04 Thread Christian König

Hi Daniel,

On 04.05.21 at 16:15, Daniel Vetter wrote:

Hi Christian,

On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:

Hi guys,

with this patch set I want to look into how much more additional work it
would be to support implicit sync compared to only explicit sync.

Turned out that this is much simpler than expected since the only
addition is that before a command submission or flip the kernel and
classic drivers would need to wait for the user fence to signal before
taking any locks.

It's a lot more I think
- sync_file/drm_syncobj still need to be supported somehow


You need that with explicit fences as well.

I'm just concentrating on what extra burden implicit sync would get us.


- we need userspace to handle the stall in a submit thread at least
- there's nothing here that sets the sync object
- implicit sync isn't just execbuf, it's everything. E.g. the various
   wait_bo ioctl also need to keep working, including timeout and
   everything


Good point, but that should be relatively easy to add as well.


- we can't stall in atomic kms where you're currently stalling, that's for
   sure. The uapi says "we're not stalling for fences in there", and you're
   breaking that.


Again as far as I can see we run into the same problem with explicit sync.

So the question is where could we block for atomic modeset for user 
fences in general?



- ... at this point I stopped pondering but there's definitely more

Imo the only way we'll even get the complete is if we do the following:
1. roll out implicit sync with userspace fences on a driver-by-driver basis
1a. including all the winsys/modeset stuff


Completely agree, that's why I've split that up into individual patches.

I'm also fine if drivers can just opt out of user fence based 
synchronization and we return an error from dma_buf_dynamic_attach() if 
some driver says it can't handle that.



2. roll out support for userspace fences to drm_syncobj timeline for
interop, both across process/userspace and across drivers
2a. including all the winsys/modeset stuff, but hopefully that's
largely solved with 1. already.


Correct, but again we need this for explicit fencing as well.


3. only then try to figure out how to retroshoehorn this into implicit
sync, and whether that even makes sense.

Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
interop fun across drivers) is just praying that this time around we're
not collectively idiots and can correctly predict the future. That never
worked :-)


For this prototype this patch set doesn't implement any user fence
synchronization at all, but just assumes that faulting user pages is
sufficient to make sure that we can wait for user space to finish
submitting the work. If necessary this can be made even more strict, the
only use case I could find which blocks this is the radeon driver and
that should be handle able.

This of course doesn't give you the same semantic as the classic
implicit sync to guarantee that you have exclusive access to a buffers,
but this is also not necessary.

So I think the conclusion should be that we don't need to concentrate on
implicit vs. explicit sync, but rather how to get the synchronization
and timeout signalling figured out in general.

I'm not sure what exactly you're proving here aside from "it's possible to
roll out a function with ill-defined semantics to all drivers". This
really is a lot harder than just this one function and just this one patch
set.


No it isn't. The hard part is getting the user sync stuff up in general.

Adding implicit synchronization on top of that is then rather trivial.

Christian.


-Daniel




[PATCH] fbmem: Mark proc_fb_seq_ops as __maybe_unused

2021-05-04 Thread Guenter Roeck
With CONFIG_PROC_FS=n and -Werror, 0-day reports:

drivers/video/fbdev/core/fbmem.c:736:36: error:
'proc_fb_seq_ops' defined but not used

Mark it as __maybe_unused.

Reported-by: kernel test robot 
Signed-off-by: Guenter Roeck 
---
 drivers/video/fbdev/core/fbmem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 372b52a2befa..52c606c0f8a2 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -733,7 +733,7 @@ static int fb_seq_show(struct seq_file *m, void *v)
return 0;
 }
 
-static const struct seq_operations proc_fb_seq_ops = {
+static const struct seq_operations proc_fb_seq_ops __maybe_unused = {
.start  = fb_seq_start,
.next   = fb_seq_next,
.stop   = fb_seq_stop,
-- 
2.25.1



[PATCH] drm/bridge: ti-sn65dsi86: Remove __exit from GPIO sub-driver remove helper

2021-05-04 Thread Douglas Anderson
The ti_sn_gpio_unregister() is not just called from the remove path
but also from the error handling of the init path. That means it can't
have the __exit annotation.

Fixes: bf73537f411b ("drm/bridge: ti-sn65dsi86: Break GPIO and MIPI-to-eDP 
bridge into sub-drivers")
Reported-by: kernel test robot 
Signed-off-by: Douglas Anderson 
---

 drivers/gpu/drm/bridge/ti-sn65dsi86.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c 
b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
index db027528febd..bb0a0e1c6341 100644
--- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c
+++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
@@ -1251,7 +1251,7 @@ static int __init ti_sn_gpio_register(void)
return auxiliary_driver_register(&ti_sn_gpio_driver);
 }
 
-static void __exit ti_sn_gpio_unregister(void)
+static void ti_sn_gpio_unregister(void)
 {
auxiliary_driver_unregister(&ti_sn_gpio_driver);
 }
-- 
2.31.1.527.g47e6f16901-goog



Re: [PATCH 4/9] drm/connector: Add support for out-of-band hotplug notification

2021-05-04 Thread Heikki Krogerus
On Mon, May 03, 2021 at 04:35:29PM +0200, Hans de Goede wrote:
> Hi,
> 
> On 5/3/21 10:00 AM, Heikki Krogerus wrote:
> > Hi Hans,
> > 
> > On Wed, Apr 28, 2021 at 11:52:52PM +0200, Hans de Goede wrote:
> >> +/**
> >> + * struct drm_connector_oob_hotplug_event_data: OOB hotplug event data
> >> + *
> >> + * Contains data about out-of-band hotplug events, signalled through
> >> + * drm_connector_oob_hotplug_event().
> >> + */
> >> +struct drm_connector_oob_hotplug_event_data {
> >> +  /**
> >> +   * @connected: New connected status for the connector.
> >> +   */
> >> +  bool connected;
> >> +  /**
> >> +   * @dp_lanes: Number of available displayport lanes, 0 if unknown.
> >> +   */
> >> +  int dp_lanes;
> >> +  /**
> >> +   * @orientation: Connector orientation.
> >> +   */
> >> +  enum typec_orientation orientation;
> >> +};
> > 
> > I don't think the orientation is relevant. It will always be "normal"
> > from DP PoV after muxing, no?
> 
> That is what I thought too, but during the discussion of my previous attempt
> at this one of the i915 devs mentioned that in some cases the muxes manage
> to swap the lane order when the connector is upside-down and at least the
> Intel GPUs can correct for this on the GPU side, so they asked for this
> info to be included.
> 
> > I'm also not sure those details are enough in the long run. Based on
> > what I've understood from our graphics team guys, for example knowing
> > if multi-function is preferred may be important in some cases.
> 
> The current data being passed is just intended as a starting point,
> this is purely a kernel internal API so we can easily add more
> data to the struct. As I mentioned in the cover-letter the current
> oob_hotplug handler which the i915 patch adds to the i915 driver does
> not actually do anything with the data.  ATM it is purely there to
> demonstrate that the ability to pass relevant data is there now
> (which was an issue with the previous attempt). I believe the current
> code is fine as a PoC of "pass event data" once GPU drivers actually
> start doing something with the data we can extend or outright replace
> it without issues.

Ah, if there is nothing using that information yet, then just don't
pass it at all for now. As you said, it's kernel internal API, we can
change it later if needed.

> > All of that, and more, is already available in the Configuration VDO
> > Status VDO that we have negotiated with the DP partner. Both those
> > VDOs are part of struct typec_displayport_data. I think we should
> > simply supply that structure to the DRM code instead of picking those
> > details out of it...
> 
> I'm not sure I like the idea of passing the raw VDO, but if the
> DRM folks think that would be useful we can certainly add it.

Why are you against passing all the data that we have? What is the
benefit in picking only certain details out of an object that has a
standard format, and constructing a customised object for those
details instead?


thanks,

-- 
heikki


Re: [PATCH 5/9] drm/i915: Associate ACPI connector nodes with connector entries

2021-05-04 Thread Heikki Krogerus
Hi Andy,

> > +/* NOTE: The connector order must be final before this is called. */
> > +void intel_acpi_assign_connector_fwnodes(struct drm_i915_private *i915)
> > +{
> > +   struct drm_connector_list_iter conn_iter;
> > +   struct drm_device *drm_dev = &i915->drm;
> > +   struct device *kdev = &drm_dev->pdev->dev;
> > +   struct fwnode_handle *fwnode = NULL;
> > +   struct drm_connector *connector;
> > +   struct acpi_device *adev;
> > +
> > +   drm_connector_list_iter_begin(drm_dev, &conn_iter);
> > +   drm_for_each_connector_iter(connector, &conn_iter) {
> > +   /* Always getting the next, even when the last was not
> > used. */
> > +   fwnode = device_get_next_child_node(kdev, fwnode);
> > +   if (!fwnode)
> > +   break;
> 
> Who is dropping reference counting on fwnode ?
> 
> I’m in the middle of a pile of fixes for fwnode refcounting when
> for_each_child or get_next_child is used. So, please double check you drop
> a reference.

Sorry Andy. This patch is from a time before the software nodes
implementation of the get_next_child callback handled the ref counting
properly.
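
For reference, the refcount-safe pattern when walking child nodes and
bailing out early looks roughly like this (sketch, "done" is a placeholder
condition):

	struct fwnode_handle *child = NULL;

	while ((child = device_get_next_child_node(kdev, child))) {
		if (done) {
			fwnode_handle_put(child);	/* drop the ref the iterator took */
			break;
		}
		/* ... use child; the next iteration consumes this reference ... */
	}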

Br,

-- 
heikki


Re: [PATCH 8/8] drm/modifiers: Enforce consistency between the cap and IN_FORMATS

2021-05-04 Thread Simon Ser
Continuing on that idea to push for enabling the cap in more cases: do
we have a policy to require new drivers to always support modifiers?

That would be nice, even if it's just about enabling LINEAR.
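
For a new driver that really only supports linear, that would just mean
passing a two-entry modifier list at plane init time, e.g. (sketch;
"formats" and "plane_funcs" stand in for the driver's usual tables):

	static const uint64_t linear_only_modifiers[] = {
		DRM_FORMAT_MOD_LINEAR,
		DRM_FORMAT_MOD_INVALID	/* terminator */
	};

	ret = drm_universal_plane_init(dev, plane, possible_crtcs, &plane_funcs,
				       formats, ARRAY_SIZE(formats),
				       linear_only_modifiers,
				       DRM_PLANE_TYPE_PRIMARY, NULL);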


Re: [PATCH 4/9] drm/connector: Add support for out-of-band hotplug notification (v2)

2021-05-04 Thread Heikki Krogerus
> +/**
> + * drm_connector_oob_hotplug_event - Report out-of-band hotplug event to 
> connector
> + * @connector_fwnode: fwnode of the connector to report the event on
> + * @data: data related to the event
> + *
> + * On some hardware a hotplug event notification may come from outside the 
> display
> + * driver / device. An example of this is some USB Type-C setups where the 
> hardware
> + * muxes the DisplayPort data and aux-lines but does not pass the altmode HPD
> + * status bit to the GPU's DP HPD pin.
> + *
> + * This function can be used to report these out-of-band events after 
> obtaining
> + * a drm_connector reference through calling drm_connector_find_by_fwnode().
> + */
> +void drm_connector_oob_hotplug_event(struct fwnode_handle *connector_fwnode,
> +  struct 
> drm_connector_oob_hotplug_event_data *data)
> +{
> + struct drm_connector *connector;
> +
> + connector = drm_connector_find_by_fwnode(connector_fwnode);
> + if (IS_ERR(connector))
> + return;
> +
> + if (connector->funcs->oob_hotplug_event)
> + connector->funcs->oob_hotplug_event(connector, data);
> +
> + drm_connector_put(connector);
> +}
> +EXPORT_SYMBOL(drm_connector_oob_hotplug_event);

So it does look like the "data" parameter is not needed at all:

void drm_connector_oob_hotplug_event(struct fwnode_handle *connector_fwnode)
{
struct drm_connector *connector;

connector = drm_connector_find_by_fwnode(connector_fwnode);
if (IS_ERR(connector))
return;

if (connector->funcs->oob_hotplug_event)
connector->funcs->oob_hotplug_event(connector);

drm_connector_put(connector);
}
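
and the Type-C altmode side then simply does something like (sketch, the
field name is made up):

	/* on an altmode HPD status change */
	drm_connector_oob_hotplug_event(dp->connector_fwnode);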

thanks,

-- 
heikki


Re: [RFC] Implicit vs explicit user fence sync

2021-05-04 Thread Daniel Vetter
On Tue, May 04, 2021 at 04:26:42PM +0200, Christian König wrote:
> Hi Daniel,
> 
> On 04.05.21 at 16:15, Daniel Vetter wrote:
> > Hi Christian,
> > 
> > On Tue, May 04, 2021 at 03:27:17PM +0200, Christian König wrote:
> > > Hi guys,
> > > 
> > > with this patch set I want to look into how much more additional work it
> > > would be to support implicit sync compared to only explicit sync.
> > > 
> > > Turned out that this is much simpler than expected since the only
> > > addition is that before a command submission or flip the kernel and
> > > classic drivers would need to wait for the user fence to signal before
> > > taking any locks.
> > It's a lot more I think
> > - sync_file/drm_syncobj still need to be supported somehow
> 
> You need that with explicit fences as well.
> 
> I'm just concentrating on what extra burden implicit sync would get us.

It's not just implicit sync. Currently the best approach we have for
explicit sync is hiding them in drm_syncobj. Because for that all the work
with intentional stall points and userspace submit thread already exists.

None of this work has been done for sync_file. And looking at how much
work it was to get drm_syncobj going, that will be anything but easy.

> > - we need userspace to handle the stall in a submit thread at least
> > - there's nothing here that sets the sync object
> > - implicit sync isn't just execbuf, it's everything. E.g. the various
> >wait_bo ioctl also need to keep working, including timeout and
> >everything
> 
> Good point, but that should be relatively easily to add as well.
> 
> > - we can't stall in atomic kms where you're currently stalling, that's for
> >sure. The uapi says "we're not stalling for fences in there", and you're
> >breaking that.
> 
> Again as far as I can see we run into the same problem with explicit sync.
> 
> So the question is where could we block for atomic modeset for user fences
> in general?

Nah, I have an idea. But it only works if userspace is aware, because the
rules are essentially:

- when you supply a userspace in-fence, then you only get a userspace
  out-fence
- mixing in fences between dma-fence and user fence is ok
- mixing out fences isn't
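
In check terms that's roughly (pseudo-code, helper names made up):

	if (atomic_state_has_user_in_fence(state) &&
	    atomic_state_requests_sync_file_out_fence(state))	/* OUT_FENCE_PTR */
		return -EINVAL;	/* can't promise a dma_fence-backed out-fence */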

And we currently do have sync_file out fence. So it's not possible to
support implicit user fence in atomic in a way which doesn't break the
uapi somewhere.

Doing the explicit user fence support first will make that very obvious.

And that's just the one ioctl I know is big trouble, I'm sure we'll find
more funny corner cases when we roll out explicit user fencing.

Another one that looks very sketchy right now is buffer sharing between
different userspace drivers, like compute <-> media (if you have some
fancy AI pipeline in your media workload, as an example).

> > - ... at this point I stopped pondering but there's definitely more
> > 
> > Imo the only way we'll even get the complete is if we do the following:
> > 1. roll out implicit sync with userspace fences on a driver-by-driver basis
s/implicit/explicit/

But I think you got that.

> > 1a. including all the winsys/modeset stuff
> 
> Completely agree, that's why I've split that up into individual patches.
> 
> I'm also fine if drivers can just opt out of user fence based
> synchronization and we return an error from dma_buf_dynamic_attach() if some
> driver says it can't handle that.

Yeah, but that boils down to us just breaking those use-cases. Which is
exactly what you're trying to avoid by rolling out implicit user fence I
think.

> > 2. roll out support for userspace fences to drm_syncobj timeline for
> > interop, both across process/userspace and across drivers
> > 2a. including all the winsys/modeset stuff, but hopefully that's
> > largely solved with 1. already.
> 
> Correct, but again we need this for explicit fencing as well.
> 
> > 3. only then try to figure out how to retroshoehorn this into implicit
> > sync, and whether that even makes sense.
> > 
> > Because doing 3 before we've done 1&2 for at least 2 drivers (2 because
> > interop fun across drivers) is just praying that this time around we're
> > not collectively idiots and can correctly predict the future. That never
> > worked :-)
> > 
> > > For this prototype this patch set doesn't implement any user fence
> > > synchronization at all, but just assumes that faulting user pages is
> > > sufficient to make sure that we can wait for user space to finish
> > > submitting the work. If necessary this can be made even more strict, the
> > > only use case I could find which blocks this is the radeon driver and
> > > that should be handle able.
> > > 
> > > This of course doesn't give you the same semantic as the classic
> > > implicit sync to guarantee that you have exclusive access to a buffers,
> > > but this is also not necessary.
> > > 
> > > So I think the conclusion should be that we don't need to concentrate on
> > > implicit vs. explicit sync, but rather how to get the synchronization
> > > and timeout signalling figured out in general.

Re: [PATCH 0/9] drm + usb-type-c: Add support for out-of-band hotplug notification (v2)

2021-05-04 Thread Heikki Krogerus
On Mon, May 03, 2021 at 05:46:38PM +0200, Hans de Goede wrote:
> Hi All,
> 
> Here is v2 of my work on making DP over Type-C work on devices where the
> Type-C controller does not drive the HPD pin on the GPU, but instead
> we need to forward HPD events from the Type-C controller to the DRM driver.
> 
> Changes in v2:
> - Replace the bogus "drm/connector: Make the drm_sysfs connector->kdev
>   device hold a reference to the connector" patch with:
>   "drm/connector: Give connector sysfs devices there own device_type"
>   the new patch is a dep for patch 2/9 see the patches
> 
> - Stop using a class-dev-iter, instead add a global connector list
>   to drm_connector.c and use that to find the connector by the fwnode,
>   similar to how we already do this in drm_panel.c and drm_bridge.c
> 
> - Make drm_connector_oob_hotplug_event() take a fwnode pointer as
>   argument, rather than a drm_connector pointer and let it do the
>   lookup itself. This allows making drm_connector_find_by_fwnode() a
>   drm-internal function and avoids code outside the drm subsystem
>   potentially holding on to a drm_connector reference for a longer
>   period.
> 
> This series not only touches drm subsys files but it also touches
> drivers/usb/typec/altmodes/typec_displayport.c, a file that usually
> does not see a whole lot of changes. So I believe it would be best
> to just merge the entire series through drm-misc, assuming we can
> get an ack from Greg for merging the typec_displayport.c changes
> this way.
> 
> ### 
> 
> As already mentioned in the v1 cover-letter this series replaces
> a previous attempt from quite some time ago. 
> For anyone interested here are the old (2019!) patches for this:
> 
> https://patchwork.freedesktop.org/patch/288491/
> https://patchwork.freedesktop.org/patch/288493/
> https://patchwork.freedesktop.org/patch/288495/
> 
> Last time I posted this the biggest change requested was for more info to
> be included in the event sent to the DRM subsystem, specifically sending
> the following info was requested:
> 
> 1. Which DP connector on the GPU the event is for
> 2. How many lanes are available
> 3. Connector orientation
> 
> This series is basically an entirely new approach, which no longer
> uses the notifier framework at all. Instead the Type-C code looks up
> a connector based on a fwnode (this was suggested by Heikki Krogerus)
> and then calls a new oob_hotplug_event drm_connector_func directly
> on the connector, passing the requested info as argument.
> 
> Info such as the orientation and the number of dp-lanes is now passed
> to the drm_connector_oob_hotplug_event() function as requested in the
> review of the old code, but nothing is done with it for now.
> Using this info falls well outside of my knowledge of the i915 driver
> so this is left to a follow-up patch (I will be available to test
> patches for this).

Thanks for taking care of these! It's really great that you spent the
time to do this series. I'm already thinking about what we can add
after these are in. I think support for re-configuration, so support
for changing the pin-configuration at runtime is going to be needed
soon after these. But first things first (sorry, I'm getting ahead of
myself).


thanks,

-- 
heikki


Re: [PATCH] drm/bridge: ti-sn65dsi86: Remove __exit from GPIO sub-driver remove helper

2021-05-04 Thread Robert Foss
Hey Douglas,

On Tue, 4 May 2021 at 16:39, Douglas Anderson  wrote:

> The ti_sn_gpio_unregister() is not just called from the remove path
> but also from the error handling of the init path. That means it can't
> have the __exit annotation.
>
> Fixes: bf73537f411b ("drm/bridge: ti-sn65dsi86: Break GPIO and MIPI-to-eDP
> bridge into sub-drivers")
> Reported-by: kernel test robot 
> Signed-off-by: Douglas Anderson 
> ---
>
>  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> index db027528febd..bb0a0e1c6341 100644
> --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
> @@ -1251,7 +1251,7 @@ static int __init ti_sn_gpio_register(void)
> return auxiliary_driver_register(&ti_sn_gpio_driver);
>  }
>
> -static void __exit ti_sn_gpio_unregister(void)
> +static void ti_sn_gpio_unregister(void)
>  {
> auxiliary_driver_unregister(&ti_sn_gpio_driver);
>  }
> --
> 2.31.1.527.g47e6f16901-goog
>
>
Reviewed-by: Robert Foss 


Re: remove the nvlink2 pci_vfio subdriver v2

2021-05-04 Thread Alex Williamson
On Tue, 4 May 2021 16:11:31 +0200
Greg Kurz  wrote:

> On Tue, 4 May 2021 15:30:15 +0200
> Greg Kroah-Hartman  wrote:
> 
> > On Tue, May 04, 2021 at 03:20:34PM +0200, Greg Kurz wrote:  
> > > On Tue, 4 May 2021 14:59:07 +0200
> > > Greg Kroah-Hartman  wrote:
> > >   
> > > > On Tue, May 04, 2021 at 02:22:36PM +0200, Greg Kurz wrote:  
> > > > > On Fri, 26 Mar 2021 07:13:09 +0100
> > > > > Christoph Hellwig  wrote:
> > > > >   
> > > > > > Hi all,
> > > > > > 
> > > > > > the nvlink2 vfio subdriver is a weird beast.  It supports a hardware
> > > > > > feature without any open source component - what would normally be
> > > > > > the normal open source userspace that we require for kernel drivers,
> > > > > > although in this particular case user space could of course be a
> > > > > > kernel driver in a VM.  It also happens to be a complete mess that
> > > > > > does not properly bind to PCI IDs, is hacked into the vfio_pci 
> > > > > > driver
> > > > > > and also pulls in over 1000 lines of code always built into powerpc
> > > > > > kernels that have Power NV support enabled.  Because of all these
> > > > > > issues, and since removing it does not break userspace, I think
> > > > > > the best idea is to simply kill it.
> > > > > > 
> > > > > > Changes since v1:
> > > > > >  - document the removed subtypes as reserved
> > > > > >  - add the ACK from Greg
> > > > > > 
> > > > > > Diffstat:
> > > > > >  arch/powerpc/platforms/powernv/npu-dma.c |  705 
> > > > > > ---
> > > > > >  b/arch/powerpc/include/asm/opal.h|3 
> > > > > >  b/arch/powerpc/include/asm/pci-bridge.h  |1 
> > > > > >  b/arch/powerpc/include/asm/pci.h |7 
> > > > > >  b/arch/powerpc/platforms/powernv/Makefile|2 
> > > > > >  b/arch/powerpc/platforms/powernv/opal-call.c |2 
> > > > > >  b/arch/powerpc/platforms/powernv/pci-ioda.c  |  185 ---
> > > > > >  b/arch/powerpc/platforms/powernv/pci.c   |   11 
> > > > > >  b/arch/powerpc/platforms/powernv/pci.h   |   17 
> > > > > >  b/arch/powerpc/platforms/pseries/pci.c   |   23 
> > > > > >  b/drivers/vfio/pci/Kconfig   |6 
> > > > > >  b/drivers/vfio/pci/Makefile  |1 
> > > > > >  b/drivers/vfio/pci/vfio_pci.c|   18 
> > > > > >  b/drivers/vfio/pci/vfio_pci_private.h|   14 
> > > > > >  b/include/uapi/linux/vfio.h  |   38 -  
> > > > > 
> > > > > 
> > > > > Hi Christoph,
> > > > > 
> > > > > FYI, these uapi changes break build of QEMU.  
> > > > 
> > > > What uapi changes?
> > > >   
> > > 
> > > All macros and structure definitions that are being removed
> > > from include/uapi/linux/vfio.h by patch 1.
> > >   
> > > > What exactly breaks?
> > > >   
> > > 
> > > These macros and types are used by the current QEMU code base.
> > > Next time the QEMU source tree updates its copy of the kernel
> > > headers, the compilation of affected code will fail.  
> > 
> > So does QEMU use this api that is being removed, or does it just have
> > some odd build artifacts of the uapi things?
> >   
> 
> These are region subtype definitions and associated capabilities.
> QEMU basically gets information on VFIO regions from the kernel
> driver and for those regions with a nvlink2 subtype, it tries
> to extract some more nvlink2 related info.


Urgh, let's put the uapi header back in place with a deprecation
notice.  Userspace should never have a dependency on the existence of a
given region, but clearly will have code to parse the data structure
describing that region.  I'll post a patch.  Thanks,

Alex



Re: [PATCH 4/9] drm/connector: Add support for out-of-band hotplug notification

2021-05-04 Thread Hans de Goede
Hi,

On 5/4/21 4:52 PM, Heikki Krogerus wrote:
> On Mon, May 03, 2021 at 04:35:29PM +0200, Hans de Goede wrote:
>> Hi,
>>
>> On 5/3/21 10:00 AM, Heikki Krogerus wrote:
>>> Hi Hans,
>>>
>>> On Wed, Apr 28, 2021 at 11:52:52PM +0200, Hans de Goede wrote:
 +/**
 + * struct drm_connector_oob_hotplug_event_data: OOB hotplug event data
 + *
 + * Contains data about out-of-band hotplug events, signalled through
 + * drm_connector_oob_hotplug_event().
 + */
 +struct drm_connector_oob_hotplug_event_data {
 +  /**
 +   * @connected: New connected status for the connector.
 +   */
 +  bool connected;
 +  /**
 +   * @dp_lanes: Number of available displayport lanes, 0 if unknown.
 +   */
 +  int dp_lanes;
 +  /**
 +   * @orientation: Connector orientation.
 +   */
 +  enum typec_orientation orientation;
 +};
>>>
>>> I don't think the orientation is relevant. It will always be "normal"
>>> from DP PoV after muxing, no?
>>
>> That is what I thought too, but during the discussion of my previous attempt
>> at this one of the i915 devs mentioned that in some cases the muxes manage
>> to swap the lane order when the connector is upside-down and at least the
>> Intel GPUs can correct for this on the GPU side, so they asked for this
>> info to be included.
>>
>>> I'm also not sure those details are enough in the long run. Based on
>>> what I've understood from our graphics team guys, for example knowing
>>> if multi-function is preferred may be important in some cases.
>>
>> The current data being passed is just intended as a starting point,
>> this is purely a kernel internal API so we can easily add more
>> data to the struct. As I mentioned in the cover-letter the current
>> oob_hotplug handler which the i915 patch adds to the i915 driver does
>> not actually do anything with the data.  ATM it is purely there to
>> demonstrate that the ability to pass relevant data is there now
>> (which was an issue with the previous attempt). I believe the current
>> code is fine as a PoC of "pass event data" once GPU drivers actually
>> start doing something with the data we can extend or outright replace
>> it without issues.
> 
> Ah, if there is nothing using that information yet, then just don't
> pass it at all for now. As you said, it's kernel internal API, we can
> change it later if needed.
> 
>>> All of that, and more, is already available in the Configuration VDO
>>> Status VDO that we have negotiated with the DP partner. Both those
>>> VDOs are part of struct typec_displayport_data. I think we should
>>> simply supply that structure to the DRM code instead of picking those
>>> details out of it...
>>
>> I'm not sure I like the idea of passing the raw VDO, but if the
>> DRM folks think that would be useful we can certainly add it.
> 
> Why are you against passing all the data that we have? What is the
> benefit in picking only certain details out of an object that has a
> standard format, and constructing a customised object for those
> details instead?

The VDO is Type-C specific and the drm_connector_oob_hotplug_event()
is intended to be a generic API.

There are other OOB event sources, e.g. the drivers/acpi/acpi_video.c
code receives hotplug events for connectors on powered-down GPUs
on dual/hybrid GPU laptops. ATM the GPU drivers register an ACPI
notifier to catch these; and there are no immediate plans to change
this, but this does illustrate how OOB hotplug notification is not
just a Type-C thing, whereas the VDO and its format very much
are Type-C things.

Regards,

Hans



Re: [PATCH 4/9] drm/connector: Add support for out-of-band hotplug notification (v2)

2021-05-04 Thread Hans de Goede
Hi,

On 5/4/21 5:10 PM, Heikki Krogerus wrote:
>> +/**
>> + * drm_connector_oob_hotplug_event - Report out-of-band hotplug event to 
>> connector
>> + * @connector: connector to report the event on
>> + * @data: data related to the event
>> + *
>> + * On some hardware a hotplug event notification may come from outside the 
>> display
>> + * driver / device. An example of this is some USB Type-C setups where the 
>> hardware
>> + * muxes the DisplayPort data and aux-lines but does not pass the altmode 
>> HPD
>> + * status bit to the GPU's DP HPD pin.
>> + *
>> + * This function can be used to report these out-of-band events after 
>> obtaining
>> + * a drm_connector reference through calling drm_connector_find_by_fwnode().
>> + */
>> +void drm_connector_oob_hotplug_event(struct fwnode_handle *connector_fwnode,
>> + struct 
>> drm_connector_oob_hotplug_event_data *data)
>> +{
>> +struct drm_connector *connector;
>> +
>> +connector = drm_connector_find_by_fwnode(connector_fwnode);
>> +if (IS_ERR(connector))
>> +return;
>> +
>> +if (connector->funcs->oob_hotplug_event)
>> +connector->funcs->oob_hotplug_event(connector, data);
>> +
>> +drm_connector_put(connector);
>> +}
>> +EXPORT_SYMBOL(drm_connector_oob_hotplug_event);
> 
> So it does looks like the "data" parameter is not needed at all:

Well Imre did indicate that having the number of lanes is useful, so
for the next version I'll drop the orientation but I plan to keep
the number of lanes if that is ok with you.

Not passing along this info was one of the reasons why my
previous attempt at this was nacked, so dropping it altogether
feels wrong.

Regards,

Hans



Re: [PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-05-04 Thread Andrey Grodzovsky



On 2021-05-04 3:03 a.m., Christian König wrote:

On 03.05.21 at 22:43, Andrey Grodzovsky wrote:



On 2021-04-29 3:08 a.m., Christian König wrote:

On 28.04.21 at 17:11, Andrey Grodzovsky wrote:

Handle all DMA IOMMU group related dependencies before the
group is removed.

v5: Drop IOMMU notifier and switch to lockless call to 
ttm_tt_unpopulate


Maybe split that up into more patches.



Signed-off-by: Andrey Grodzovsky 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h    |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 
--

  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |  3 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c    |  9 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  2 ++
  drivers/gpu/drm/amd/amdgpu/cik_ih.c    |  1 -
  drivers/gpu/drm/amd/amdgpu/cz_ih.c |  1 -
  drivers/gpu/drm/amd/amdgpu/iceland_ih.c    |  1 -
  drivers/gpu/drm/amd/amdgpu/navi10_ih.c |  3 ---
  drivers/gpu/drm/amd/amdgpu/si_ih.c |  1 -
  drivers/gpu/drm/amd/amdgpu/tonga_ih.c  |  1 -
  drivers/gpu/drm/amd/amdgpu/vega10_ih.c |  3 ---
  14 files changed, 56 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h

index fddb82897e5d..30a24db5f4d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1054,6 +1054,8 @@ struct amdgpu_device {
  bool    in_pci_err_recovery;
  struct pci_saved_state  *pci_state;
+
+    struct list_head    device_bo_list;
  };
  static inline struct amdgpu_device *drm_to_adev(struct drm_device 
*ddev)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

index 46d646c40338..91594ddc2459 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -70,6 +70,7 @@
  #include 
  #include 
+
  MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
@@ -3211,7 +3212,6 @@ static const struct attribute 
*amdgpu_dev_attributes[] = {

  NULL
  };
-
  /**
   * amdgpu_device_init - initialize the driver
   *
@@ -3316,6 +3316,8 @@ int amdgpu_device_init(struct amdgpu_device 
*adev,

  INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
+    INIT_LIST_HEAD(&adev->device_bo_list);
+
  adev->gfx.gfx_off_req_count = 1;
  adev->pm.ac_power = power_supply_is_system_supplied() > 0;
@@ -3601,6 +3603,28 @@ int amdgpu_device_init(struct amdgpu_device 
*adev,

  return r;
  }
+static void amdgpu_clear_dma_mappings(struct amdgpu_device *adev)
+{
+    struct amdgpu_bo *bo = NULL;
+
+    /*
+ * Unmaps all DMA mappings before device will be removed from it's
+ * IOMMU group otherwise in case of IOMMU enabled system a crash
+ * will happen.
+ */
+
+    spin_lock(&adev->mman.bdev.lru_lock);
+    while (!list_empty(&adev->device_bo_list)) {
+    bo = list_first_entry(&adev->device_bo_list, struct 
amdgpu_bo, bo);

+    list_del_init(&bo->bo);
+    spin_unlock(&adev->mman.bdev.lru_lock);
+    if (bo->tbo.ttm)
+    ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);
+    spin_lock(&adev->mman.bdev.lru_lock);
+    }
+    spin_unlock(&adev->mman.bdev.lru_lock);


Can you try to use the same approach as amdgpu_gtt_mgr_recover() 
instead of adding something to the BO?


Christian.


Are you sure that dma mappings limit themselves only to GTT BOs
which have allocated mm nodes?


Yes, you would also need the system domain BOs. But those can be put on 
a similar list.


What list? Those BOs don't have a ttm_resource_manager and so no
drm_mm_node list they are all bound to. Should I maintain a list for them
specifically for the unmap purpose?




Otherwise we will crash and burn
on a missing IOMMU group when unmapping post device remove.
It's a problem for me to test this, as on the 5.12 kernel I don't crash even
when removing this entire patch.  Looks like iommu_dma_unmap_page
was changed since 5.9, when I introduced this patch.


Do we really still need that stuff then? What exactly has changed?


At first I assumed it was because of this change: 'iommu: Allow the
dma-iommu api to use bounce buffers',
which changed iommu_dma_unmap_page to call __iommu_dma_unmap_swiotlb
instead of __iommu_dma_unmap directly. But then I looked inside
__iommu_dma_unmap_swiotlb and it still calls __iommu_dma_unmap
eventually. So maybe the fact that I moved the amd_ip_funcs.hw_fini
call to inside amdgpu_pci_remove helps.

Andrey




Christian.



Andrey




+}
+
  /**
   * amdgpu_device_fini - tear down the driver
   *
@@ -3639,12 +3663,15 @@ void amdgpu_device_fini_hw(struct 
amdgpu_device *adev)

  amdgpu_ucode_sysfs_fini(adev);
  sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attribute

Re: [PATCH 8/8] drm/modifiers: Enforce consistency between the cap an IN_FORMATS

2021-05-04 Thread Emil Velikov
On Tue, 4 May 2021 at 15:58, Simon Ser  wrote:
>
> Continuing on that idea to push for enabling the cap in more cases: do
> we have a policy to require new drivers to always support modifiers?
>
> That would be nice, even if it's just about enabling LINEAR.

Sounds perfectly reasonable IMHO. I think we ought to document this
policy (requirement ideally) somewhere - say alongside the "all new
KMS drivers must support atomic modeset", which lives in ...

-Emil


Re: remove the nvlink2 pci_vfio subdriver v2

2021-05-04 Thread Jason Gunthorpe
On Tue, May 04, 2021 at 04:23:40PM +0200, Daniel Vetter wrote:

> Just my 2cents from drm (where we deprecate old gunk uapi quite often):
> Imo it's best to keep the uapi headers as-is, but exchange the
> documentation with a big "this is removed, never use again" warning:

We in RDMA have been doing the opposite, the uapi headers are supposed
to reflect the current kernel. This helps make the kernel
understandable.

When userspace needs backwards compat to ABI that the current kernel
doesn't support then userspace has distinct copies of that information
in some compat location. It has happened a few times over the last 15
years.

We keep full copies of the current kernel headers in the userspace
source tree, when the kernel headers change in a compile incompatible
way we fix everything while updating to the new kernel headers.

> - it's good to know which uapi numbers (like parameter extensions or
>   whatever they are in this case) are defacto reserved, because there are
>   binaries (qemu in this) that have code acting on them out there.

Numbers and things get marked reserved or the like

> Anyway feel free to ignore since this might be different than drivers/gpu.

AFAIK drivers/gpu has a lot wider userspace, rdma manages this OK
because we only have one library package that provides the user/kernel
interface.
 
Jason


Re: [PATCH] drm/bridge: ti-sn65dsi86: Remove __exit from GPIO sub-driver remove helper

2021-05-04 Thread Robert Foss
Merged to drm-misc-next.

On Tue, 4 May 2021 at 17:22, Robert Foss  wrote:

> Hey Douglas,
>
> On Tue, 4 May 2021 at 16:39, Douglas Anderson 
> wrote:
>
>> The ti_sn_gpio_unregister() is not just called from the remove path
>> but also from the error handling of the init path. That means it can't
>> have the __exit annotation.
>>
>> Fixes: bf73537f411b ("drm/bridge: ti-sn65dsi86: Break GPIO and
>> MIPI-to-eDP bridge into sub-drivers")
>> Reported-by: kernel test robot 
>> Signed-off-by: Douglas Anderson 
>> ---
>>
>>  drivers/gpu/drm/bridge/ti-sn65dsi86.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/bridge/ti-sn65dsi86.c
>> b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
>> index db027528febd..bb0a0e1c6341 100644
>> --- a/drivers/gpu/drm/bridge/ti-sn65dsi86.c
>> +++ b/drivers/gpu/drm/bridge/ti-sn65dsi86.c
>> @@ -1251,7 +1251,7 @@ static int __init ti_sn_gpio_register(void)
>> return auxiliary_driver_register(&ti_sn_gpio_driver);
>>  }
>>
>> -static void __exit ti_sn_gpio_unregister(void)
>> +static void ti_sn_gpio_unregister(void)
>>  {
>> auxiliary_driver_unregister(&ti_sn_gpio_driver);
>>  }
>> --
>> 2.31.1.527.g47e6f16901-goog
>>
>>
> Reviewed-by: Robert Foss 
>


Re: [PATCH 16/27] drm/i915/gem: Add an intermediate proto_context struct

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:37AM -0500, Jason Ekstrand wrote:
> The current context uAPI allows for two methods of setting context
> parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM.  The
> former is allowed to be called at any time while the latter happens as
> part of GEM_CONTEXT_CREATE.  Currently, everything settable via one is
> settable via the other.  While some params are fairly simple and setting
> them on a live context is harmless, such as the context priority, others are
> far trickier such as the VM or the set of engines.  In order to swap out
> the VM, for instance, we have to delay until all current in-flight work
> is complete, swap in the new VM, and then continue.  This leads to a
> plethora of potential race conditions we'd really rather avoid.
> 
> Unfortunately, both methods of setting the VM and engine set are in
> active use today so we can't simply disallow setting the VM or engine
> set via SET_CONTEXT_PARAM.  In order to work around this wart, this
> commit adds a proto-context struct which contains all the context create
> parameters.
> 
> Signed-off-by: Jason Ekstrand 

Per-patch changelog pls.

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 145 ++
>  .../gpu/drm/i915/gem/i915_gem_context_types.h |  22 +++
>  .../gpu/drm/i915/gem/selftests/mock_context.c |  16 +-
>  3 files changed, 153 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 4835991898ac9..10bd1b6dd1774 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -191,6 +191,97 @@ static int validate_priority(struct drm_i915_private 
> *i915,
>   return 0;
>  }
>  
> +static void proto_context_close(struct i915_gem_proto_context *pc)
> +{
> + if (pc->vm)
> + i915_vm_put(pc->vm);
> + kfree(pc);
> +}
> +
> +static int proto_context_set_persistence(struct drm_i915_private *i915,
> +  struct i915_gem_proto_context *pc,
> +  bool persist)
> +{
> + if (persist) {
> + /*
> +  * Only contexts that are short-lived [that will expire or be
> +  * reset] are allowed to survive past termination. We require
> +  * hangcheck to ensure that the persistent requests are healthy.
> +  */
> + if (!i915->params.enable_hangcheck)
> + return -EINVAL;
> +
> + __set_bit(UCONTEXT_PERSISTENCE, &pc->user_flags);

Ok so I looked, and the reason __set_bit and friends exist is for endless
bitfields, i.e. where user_flags is actually a dynamically sized array.

Given that this is complete overkill here I think fully open-coding the
bitops is the right bikeshed color choice. So

user_flags |= UCONTEXT_PERSISTENCE;

> + } else {
> + /* To cancel a context we use "preempt-to-idle" */
> + if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
> + return -ENODEV;
> +
> + /*
> +  * If the cancel fails, we then need to reset, cleanly!
> +  *
> +  * If the per-engine reset fails, all hope is lost! We resort
> +  * to a full GPU reset in that unlikely case, but realistically
> +  * if the engine could not reset, the full reset does not fare
> +  * much better. The damage has been done.
> +  *
> +  * However, if we cannot reset an engine by itself, we cannot
> +  * cleanup a hanging persistent context without causing
> +  * colateral damage, and we should not pretend we can by
> +  * exposing the interface.
> +  */
> + if (!intel_has_reset_engine(&i915->gt))
> + return -ENODEV;
> +
> + __clear_bit(UCONTEXT_PERSISTENCE, &pc->user_flags);

user_flags &= ~UCONTEXT_PERSISTENCE;

Similar for all the others.
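
For readers following along: __set_bit()/__clear_bit() are meant for arbitrarily
long bitmaps (arrays of unsigned long), which is overkill for a single flags word.
A minimal standalone sketch of the open-coded alternative suggested here, assuming
the UCONTEXT_* values stay bit numbers (as they are used with __set_bit() today);
the struct and names below are illustrative, not the actual i915 code:

/* Sketch only: plain bitwise ops on a single flags word. */
#define SKETCH_UCONTEXT_PERSISTENCE	3	/* illustrative bit number */

struct sketch_proto_context {
	unsigned long user_flags;
};

static void sketch_set_persistence(struct sketch_proto_context *pc, bool persist)
{
	if (persist)
		pc->user_flags |= BIT(SKETCH_UCONTEXT_PERSISTENCE);	/* set the bit */
	else
		pc->user_flags &= ~BIT(SKETCH_UCONTEXT_PERSISTENCE);	/* clear the bit */
}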

> + }
> +
> + return 0;
> +}
> +
> +static struct i915_gem_proto_context *
> +proto_context_create(struct drm_i915_private *i915, unsigned int flags)
> +{
> + struct i915_gem_proto_context *pc, *err;
> +
> + pc = kzalloc(sizeof(*pc), GFP_KERNEL);
> + if (!pc)
> + return ERR_PTR(-ENOMEM);
> +
> + if (HAS_FULL_PPGTT(i915)) {
> + struct i915_ppgtt *ppgtt;
> +
> + ppgtt = i915_ppgtt_create(&i915->gt);
> + if (IS_ERR(ppgtt)) {
> + drm_dbg(&i915->drm, "PPGTT setup failed (%ld)\n",
> + PTR_ERR(ppgtt));
> + err = ERR_CAST(ppgtt);
> + goto proto_close;
> + }
> + pc->vm = &ppgtt->vm;

I'm not understanding why we're creating the default vm as part of the
proto context? If we end up setting one this is kinda just a wasted
conditional

Re: [Intel-gfx] [PATCH 17/27] drm/i915/gem: Rework error handling in default_engines

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:38AM -0500, Jason Ekstrand wrote:
> Since free_engines works for partially constructed engine sets, we can
> use the usual goto pattern.
> 
> Signed-off-by: Jason Ekstrand 

I guess subsequent patches apply the same for the set_engines command and
__free_engines disappears? Otherwise feels a bit silly.

Anyway looks correct.

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 10bd1b6dd1774..ce729e640bbf7 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -420,7 +420,7 @@ static struct i915_gem_engines *default_engines(struct 
> i915_gem_context *ctx)
>  {
>   const struct intel_gt *gt = &ctx->i915->gt;
>   struct intel_engine_cs *engine;
> - struct i915_gem_engines *e;
> + struct i915_gem_engines *e, *err;
>   enum intel_engine_id id;
>  
>   e = alloc_engines(I915_NUM_ENGINES);
> @@ -438,18 +438,21 @@ static struct i915_gem_engines *default_engines(struct 
> i915_gem_context *ctx)
>  
>   ce = intel_context_create(engine);
>   if (IS_ERR(ce)) {
> - __free_engines(e, e->num_engines + 1);
> - return ERR_CAST(ce);
> + err = ERR_CAST(ce);
> + goto free_engines;
>   }
>  
>   intel_context_set_gem(ce, ctx);
>  
>   e->engines[engine->legacy_idx] = ce;
> - e->num_engines = max(e->num_engines, engine->legacy_idx);
> + e->num_engines = max(e->num_engines, engine->legacy_idx + 1);
>   }
> - e->num_engines++;
>  
>   return e;
> +
> +free_engines:
> + free_engines(e);
> + return err;
>  }
>  
>  void i915_gem_context_release(struct kref *ref)
> -- 
> 2.31.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH] fbmem: Mark proc_fb_seq_ops as __maybe_unused

2021-05-04 Thread Daniel Vetter
On Tue, May 4, 2021 at 4:29 PM Guenter Roeck  wrote:
>
> With CONFIG_PROC_FS=n and -Werror, 0-day reports:
>
> drivers/video/fbdev/core/fbmem.c:736:36: error:
> 'proc_fb_seq_ops' defined but not used
>
> Mark it as __maybe_unused.
>
> Reported-by: kernel test robot 
> Signed-off-by: Guenter Roeck 

Queued up for -rc1 in drm-misc-next-fixes, thanks for the patch.
-Daniel

> ---
>  drivers/video/fbdev/core/fbmem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/video/fbdev/core/fbmem.c 
> b/drivers/video/fbdev/core/fbmem.c
> index 372b52a2befa..52c606c0f8a2 100644
> --- a/drivers/video/fbdev/core/fbmem.c
> +++ b/drivers/video/fbdev/core/fbmem.c
> @@ -733,7 +733,7 @@ static int fb_seq_show(struct seq_file *m, void *v)
> return 0;
>  }
>
> -static const struct seq_operations proc_fb_seq_ops = {
> +static const struct __maybe_unused seq_operations proc_fb_seq_ops = {
> .start  = fb_seq_start,
> .next   = fb_seq_next,
> .stop   = fb_seq_stop,
> --
> 2.25.1
>



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: remove the nvlink2 pci_vfio subdriver v2

2021-05-04 Thread Daniel Vetter
On Tue, May 04, 2021 at 12:53:27PM -0300, Jason Gunthorpe wrote:
> On Tue, May 04, 2021 at 04:23:40PM +0200, Daniel Vetter wrote:
> 
> > Just my 2cents from drm (where we deprecate old gunk uapi quite often):
> > Imo it's best to keep the uapi headers as-is, but exchange the
> > documentation with a big "this is removed, never use again" warning:
> 
> We in RDMA have been doing the opposite, the uapi headers are supposed
> to reflect the current kernel. This helps make the kernel
> understandable.
> 
> When userspace needs backwards compat to ABI that the current kernel
> doesn't support then userspace has distinct copies of that information
> in some compat location. It has happened a few times over the last 15
> years.
> 
> We keep full copies of the current kernel headers in the userspace
> source tree, when the kernel headers change in a compile incompatible
> way we fix everything while updating to the new kernel headers.

Yeah we do the same since forever (it's either from libdrm package, or
directly in the corresponding userspace header). So largely include/uapi
is for documentation

> > - it's good to know which uapi numbers (like parameter extensions or
> >   whatever they are in this case) are defacto reserved, because there are
> >   binaries (qemu in this) that have code acting on them out there.
> 
> Numbers and things get marked reserved or the like
> 
> > Anyway feel free to ignore since this might be different than drivers/gpu.
> 
> AFAIK drivers/gpu has a lot wider userspace, rdma manages this OK
> because we only have one library package that provides the user/kernel
> interface.

But since we have so many projects we've started asking all the userspace
projects to directly take the kernel ones (after the make step to filter
them) so that there's only one source of truth. And also to make sure they
don't merge stuff before the kernel side is reviewed&landed. Which also
means we can't ditch anything userspace might still need on older trees
and stuff.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


[PATCH] drm/i915: drop the __i915_active_call pointer packing

2021-05-04 Thread Matthew Auld
We use some of the lower bits of the retire function pointer for
potential flags, which is quite thorny, since the caller needs to
remember to give the function the correct alignment with
__i915_active_call, otherwise we might incorrectly unpack the pointer
and jump to some garbage address later. Instead of all this let's just
pass the flags along as a separate parameter.

Suggested-by: Ville Syrjälä 
Suggested-by: Daniel Vetter 
References: ca419f407b43 ("drm/i915: Fix crash in auto_retire")
References: d8e44e4dd221 ("drm/i915/overlay: Fix active retire callback 
alignment")
References: fd5f262db118 ("drm/i915/selftests: Fix active retire callback 
alignment")
Signed-off-by: Matthew Auld 
---
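
(Generic illustration, not the i915 code: the thorny part being removed is the
classic trick of stashing flag bits in the low bits of an aligned function
pointer, which is what __i915_active_call existed to guarantee. A minimal sketch
of the before/after:)

/* Old style: flags packed into the low bits of a sufficiently aligned pointer. */
typedef void (*retire_fn)(void *ref);

#define RETIRE_FLAG_SLEEPS	0x1UL	/* must fit in the pointer's alignment bits */

static inline retire_fn unpack_retire(unsigned long packed, unsigned long *flags)
{
	*flags = packed & RETIRE_FLAG_SLEEPS;			/* recover the flag */
	return (retire_fn)(packed & ~RETIRE_FLAG_SLEEPS);	/* recover the pointer */
}

/* New style: no alignment requirement, flags travel as a separate field. */
struct active_sketch {
	retire_fn retire;
	unsigned long flags;	/* e.g. I915_ACTIVE_RETIRE_SLEEPS */
};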
 drivers/gpu/drm/i915/display/intel_frontbuffer.c   |  4 ++--
 drivers/gpu/drm/i915/display/intel_overlay.c   |  5 ++---
 drivers/gpu/drm/i915/gem/i915_gem_context.c|  3 +--
 drivers/gpu/drm/i915/gt/gen6_ppgtt.c   |  2 +-
 drivers/gpu/drm/i915/gt/intel_context.c|  3 +--
 drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c   |  2 +-
 drivers/gpu/drm/i915/gt/intel_gt_buffer_pool.c |  3 +--
 drivers/gpu/drm/i915/gt/intel_timeline.c   |  4 ++--
 drivers/gpu/drm/i915/gt/mock_engine.c  |  2 +-
 .../gpu/drm/i915/gt/selftest_engine_heartbeat.c|  4 ++--
 drivers/gpu/drm/i915/i915_active.c | 14 +-
 drivers/gpu/drm/i915/i915_active.h | 11 ++-
 drivers/gpu/drm/i915/i915_active_types.h   |  5 -
 drivers/gpu/drm/i915/i915_vma.c|  3 +--
 drivers/gpu/drm/i915/selftests/i915_active.c   |  4 ++--
 15 files changed, 28 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_frontbuffer.c 
b/drivers/gpu/drm/i915/display/intel_frontbuffer.c
index 8161d49e78ba..8e75debcce1a 100644
--- a/drivers/gpu/drm/i915/display/intel_frontbuffer.c
+++ b/drivers/gpu/drm/i915/display/intel_frontbuffer.c
@@ -211,7 +211,6 @@ static int frontbuffer_active(struct i915_active *ref)
return 0;
 }
 
-__i915_active_call
 static void frontbuffer_retire(struct i915_active *ref)
 {
struct intel_frontbuffer *front =
@@ -266,7 +265,8 @@ intel_frontbuffer_get(struct drm_i915_gem_object *obj)
atomic_set(&front->bits, 0);
i915_active_init(&front->write,
 frontbuffer_active,
-i915_active_may_sleep(frontbuffer_retire));
+frontbuffer_retire,
+I915_ACTIVE_RETIRE_SLEEPS);
 
spin_lock(&i915->fb_tracking.lock);
if (rcu_access_pointer(obj->frontbuffer)) {
diff --git a/drivers/gpu/drm/i915/display/intel_overlay.c 
b/drivers/gpu/drm/i915/display/intel_overlay.c
index 428819ba18dd..f1e04c1535c7 100644
--- a/drivers/gpu/drm/i915/display/intel_overlay.c
+++ b/drivers/gpu/drm/i915/display/intel_overlay.c
@@ -383,8 +383,7 @@ static void intel_overlay_off_tail(struct intel_overlay 
*overlay)
i830_overlay_clock_gating(dev_priv, true);
 }
 
-__i915_active_call static void
-intel_overlay_last_flip_retire(struct i915_active *active)
+static void intel_overlay_last_flip_retire(struct i915_active *active)
 {
struct intel_overlay *overlay =
container_of(active, typeof(*overlay), last_flip);
@@ -1401,7 +1400,7 @@ void intel_overlay_setup(struct drm_i915_private 
*dev_priv)
overlay->saturation = 146;
 
i915_active_init(&overlay->last_flip,
-NULL, intel_overlay_last_flip_retire);
+NULL, intel_overlay_last_flip_retire, 0);
 
ret = get_registers(overlay, OVERLAY_NEEDS_PHYSICAL(dev_priv));
if (ret)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index fd8ee52e17a4..188dee13e017 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -1046,7 +1046,6 @@ struct context_barrier_task {
void *data;
 };
 
-__i915_active_call
 static void cb_retire(struct i915_active *base)
 {
struct context_barrier_task *cb = container_of(base, typeof(*cb), base);
@@ -1080,7 +1079,7 @@ static int context_barrier_task(struct i915_gem_context 
*ctx,
if (!cb)
return -ENOMEM;
 
-   i915_active_init(&cb->base, NULL, cb_retire);
+   i915_active_init(&cb->base, NULL, cb_retire, 0);
err = i915_active_acquire(&cb->base);
if (err) {
kfree(cb);
diff --git a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c 
b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
index 21b1085769be..1aee5e6b1b23 100644
--- a/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
+++ b/drivers/gpu/drm/i915/gt/gen6_ppgtt.c
@@ -343,7 +343,7 @@ static struct i915_vma *pd_vma_create(struct gen6_ppgtt 
*ppgtt, int size)
if (!vma)
return ERR_PTR(-ENOMEM);
 
-   i915_active_init(&vma->active, NULL, NULL);
+   i915_active_init(&vma->active, NULL, NULL, 0);
 
 

Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Daniel Vetter
On Tue, May 04, 2021 at 02:48:35PM +0200, Christian König wrote:
> Am 04.05.21 um 13:13 schrieb Daniel Vetter:
> > On Tue, May 4, 2021 at 12:53 PM Christian König
> >  wrote:
> > > Am 04.05.21 um 11:47 schrieb Daniel Vetter:
> > > > [SNIP]
> > > > > Yeah, it just takes too long for the preemption to complete to be 
> > > > > really
> > > > > useful for the feature we are discussing here.
> > > > > 
> > > > > As I said when the kernel requests to preempt a queue we can easily 
> > > > > expect a
> > > > > timeout of ~100ms until that comes back. For compute that is even in 
> > > > > the
> > > > > multiple seconds range.
> > > > 100ms for preempting an idle request sounds like broken hw to me. Of
> > > > course preemting something that actually runs takes a while, that's
> > > > nothing new. But it's also not the thing we're talking about here. Is 
> > > > this
> > > > 100ms actual numbers from hw for an actual idle ringbuffer?
> > > Well 100ms is just an example of the scheduler granularity. Let me
> > > explain in a wider context.
> > > 
> > > The hardware can have X queues mapped at the same time and every Y time
> > > interval the hardware scheduler checks if those queues have changed and
> > > only if they have changed the necessary steps to reload them are started.
> > > 
> > > Multiple queues can be rendering at the same time, so you can have X as
> > > a high priority queue active and just waiting for a signal to start and
> > > the client rendering one frame after another and a third background
> > > compute task mining bitcoins for you.
> > > 
> > > As long as everything is static this is perfectly performant. Adding a
> > > queue to the list of active queues is also relatively simple, but taking
> > > one down requires you to wait until we are sure the hardware has seen
> > > the change and reloaded the queues.
> > > 
> > > Think of it as an RCU grace period. This is simply not something which
> > > is made to be used constantly, but rather just at process termination.
> > Uh ... that indeed sounds rather broken.
> 
> Well I wouldn't call it broken. It's just not made for the use case we are
> trying to abuse it for.
> 
> > Otoh it's just a dma_fence that'd we'd inject as this unload-fence.
> 
> Yeah, exactly that's why it isn't much of a problem for process termination
> or freeing memory.

Ok so your hw really hates the unload fence. On ours the various queues
are a bit more explicit, so largely unload/preempt is the same as context
switch and pretty quick. Afaik at least.

Still baffled that you can't fix this in fw, but oh well. Judging from how
fast our fw team moves I'm not surprised :-/

Anyway so next plan: Make this work exactly like hmm:
1. wait for the user fence as a dma-fence fake thing, tdr makes this safe
2. remove pte
3. do synchronous tlb flush

Tada, no more 100ms stall in your buffer move callbacks. And feel free to
pack up 2&3 into an async worker or something if it takes too long and
treating it as a bo move dma_fence is better. Also that way you might be
able to batch up the tlb flushing if it's too damn expensive, by
collecting them all under a single dma_fence (and starting a new tlb flush
cycle every time ->enable_signalling gets called).

As long as you nack any gpu faults and don't try to fill them for these
legacy contexts that support dma-fence there's no harm in using the hw
facilities.

Ofc if you're now telling me your synchronous tlb flush is also 100ms,
then maybe just throw the hw out the window, and accept that the
millisecond anything evicts anything (good luck with userptr) the screen
freezes for a bit.
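
Sketched as pseudocode, that flow would look roughly like the following; every
helper name here is a hypothetical placeholder, not an existing amdgpu/ttm
function:

/* Hypothetical sketch of the hmm-style flow above; all helpers are placeholders. */
static int evict_range_sketch(struct gpu_vm_sketch *vm, u64 start, u64 size)
{
	int ret;

	/* 1. Wait for the userspace fence as a dma-fence-like thing; the TDR
	 *    timeout is what keeps this safe against stuck or malicious userspace. */
	ret = wait_user_fence_timeout(vm, start, size, TDR_TIMEOUT_MS);
	if (ret)
		return ret;

	/* 2. Remove the PTEs covering the range. */
	clear_ptes(vm, start, size);

	/* 3. Synchronous TLB flush. Steps 2 and 3 could also be pushed into an
	 *    async worker behind a bo-move dma_fence, with flushes batched under
	 *    a single fence whose ->enable_signaling starts a new flush cycle. */
	tlb_flush_sync(vm);

	return 0;
}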

> > So by and large everyone should already be able to cope with it taking a
> > bit longer. So from a design pov I don't see a huge problem, but I
> > guess you guys wont be happy since it means on amd hw there will be
> > random unsightly stalls in desktop linux usage.
> > 
> > > > > The "preemption" feature is really called suspend and made just for 
> > > > > the case
> > > > > when we want to put a process to sleep or need to forcefully kill it 
> > > > > for
> > > > > misbehavior or stuff like that. It is not meant to be used in normal
> > > > > operation.
> > > > > 
> > > > > If we only attach it on ->move then yeah maybe a last resort 
> > > > > possibility to
> > > > > do it this way, but I think in that case we could rather stick with 
> > > > > kernel
> > > > > submissions.
> > > > Well this is a hybrid userspace ring + kernel augmented submit mode, so 
> > > > you
> > > > can keep dma-fences working. Because the dma-fence stuff wont work with
> > > > pure userspace submit, I think that conclusion is rather solid. Once 
> > > > more
> > > > even after this long thread here.
> > > When assisted with unload fences, then yes. Problem is that I can't see
> > > how we could implement those performant currently.
> > Is there really no way to fix fw here? Like if process start/teardown
> > takes 100ms, that's going to suck no matter what.
> 
> As I said add

Re: remove the nvlink2 pci_vfio subdriver v2

2021-05-04 Thread Greg Kurz
On Tue, 4 May 2021 14:59:07 +0200
Greg Kroah-Hartman  wrote:

> On Tue, May 04, 2021 at 02:22:36PM +0200, Greg Kurz wrote:
> > On Fri, 26 Mar 2021 07:13:09 +0100
> > Christoph Hellwig  wrote:
> > 
> > > Hi all,
> > > 
> > > the nvlink2 vfio subdriver is a weird beast.  It supports a hardware
> > > feature without any open source component - what would normally be
> > > the normal open source userspace that we require for kernel drivers,
> > > although in this particular case user space could of course be a
> > > kernel driver in a VM.  It also happens to be a complete mess that
> > > does not properly bind to PCI IDs, is hacked into the vfio_pci driver
> > > and also pulls in over 1000 lines of code always built into powerpc
> > > kernels that have Power NV support enabled.  Because of all these
> > > issues and the lack of breaking userspace when it is removed I think
> > > the best idea is to simply kill it.
> > > 
> > > Changes since v1:
> > >  - document the removed subtypes as reserved
> > >  - add the ACK from Greg
> > > 
> > > Diffstat:
> > >  arch/powerpc/platforms/powernv/npu-dma.c |  705 
> > > ---
> > >  b/arch/powerpc/include/asm/opal.h|3 
> > >  b/arch/powerpc/include/asm/pci-bridge.h  |1 
> > >  b/arch/powerpc/include/asm/pci.h |7 
> > >  b/arch/powerpc/platforms/powernv/Makefile|2 
> > >  b/arch/powerpc/platforms/powernv/opal-call.c |2 
> > >  b/arch/powerpc/platforms/powernv/pci-ioda.c  |  185 ---
> > >  b/arch/powerpc/platforms/powernv/pci.c   |   11 
> > >  b/arch/powerpc/platforms/powernv/pci.h   |   17 
> > >  b/arch/powerpc/platforms/pseries/pci.c   |   23 
> > >  b/drivers/vfio/pci/Kconfig   |6 
> > >  b/drivers/vfio/pci/Makefile  |1 
> > >  b/drivers/vfio/pci/vfio_pci.c|   18 
> > >  b/drivers/vfio/pci/vfio_pci_private.h|   14 
> > >  b/include/uapi/linux/vfio.h  |   38 -
> > 
> > 
> > Hi Christoph,
> > 
> > FYI, these uapi changes break build of QEMU.
> 
> What uapi changes?
> 

All macros and structure definitions that are being removed
from include/uapi/linux/vfio.h by patch 1.

> What exactly breaks?
> 

These macros and types are used by the current QEMU code base.
Next time the QEMU source tree updates its copy of the kernel
headers, the compilation of affected code will fail.

> Why does QEMU require kernel driver stuff?
> 

Not sure I understand the question... is there a problem
with QEMU using an already published uapi?

> thanks,
> 
> greg k-h



Re: remove the nvlink2 pci_vfio subdriver v2

2021-05-04 Thread Greg Kurz
On Tue, 4 May 2021 15:30:15 +0200
Greg Kroah-Hartman  wrote:

> On Tue, May 04, 2021 at 03:20:34PM +0200, Greg Kurz wrote:
> > On Tue, 4 May 2021 14:59:07 +0200
> > Greg Kroah-Hartman  wrote:
> > 
> > > On Tue, May 04, 2021 at 02:22:36PM +0200, Greg Kurz wrote:
> > > > On Fri, 26 Mar 2021 07:13:09 +0100
> > > > Christoph Hellwig  wrote:
> > > > 
> > > > > Hi all,
> > > > > 
> > > > > the nvlink2 vfio subdriver is a weird beast.  It supports a hardware
> > > > > feature without any open source component - what would normally be
> > > > > the normal open source userspace that we require for kernel drivers,
> > > > > although in this particular case user space could of course be a
> > > > > kernel driver in a VM.  It also happens to be a complete mess that
> > > > > does not properly bind to PCI IDs, is hacked into the vfio_pci driver
> > > > > and also pulls in over 1000 lines of code always built into powerpc
> > > > > kernels that have Power NV support enabled.  Because of all these
> > > > > issues and the lack of breaking userspace when it is removed I think
> > > > > the best idea is to simply kill it.
> > > > > 
> > > > > Changes since v1:
> > > > >  - document the removed subtypes as reserved
> > > > >  - add the ACK from Greg
> > > > > 
> > > > > Diffstat:
> > > > >  arch/powerpc/platforms/powernv/npu-dma.c |  705 
> > > > > ---
> > > > >  b/arch/powerpc/include/asm/opal.h|3 
> > > > >  b/arch/powerpc/include/asm/pci-bridge.h  |1 
> > > > >  b/arch/powerpc/include/asm/pci.h |7 
> > > > >  b/arch/powerpc/platforms/powernv/Makefile|2 
> > > > >  b/arch/powerpc/platforms/powernv/opal-call.c |2 
> > > > >  b/arch/powerpc/platforms/powernv/pci-ioda.c  |  185 ---
> > > > >  b/arch/powerpc/platforms/powernv/pci.c   |   11 
> > > > >  b/arch/powerpc/platforms/powernv/pci.h   |   17 
> > > > >  b/arch/powerpc/platforms/pseries/pci.c   |   23 
> > > > >  b/drivers/vfio/pci/Kconfig   |6 
> > > > >  b/drivers/vfio/pci/Makefile  |1 
> > > > >  b/drivers/vfio/pci/vfio_pci.c|   18 
> > > > >  b/drivers/vfio/pci/vfio_pci_private.h|   14 
> > > > >  b/include/uapi/linux/vfio.h  |   38 -
> > > > 
> > > > 
> > > > Hi Christoph,
> > > > 
> > > > FYI, these uapi changes break build of QEMU.
> > > 
> > > What uapi changes?
> > > 
> > 
> > All macros and structure definitions that are being removed
> > from include/uapi/linux/vfio.h by patch 1.
> > 
> > > What exactly breaks?
> > > 
> > 
> > These macros and types are used by the current QEMU code base.
> > Next time the QEMU source tree updates its copy of the kernel
> > headers, the compilation of affected code will fail.
> 
> So does QEMU use this api that is being removed, or does it just have
> some odd build artifacts of the uapi things?
> 

These are region subtype definitions and associated capabilities.
QEMU basically gets information on VFIO regions from the kernel
driver, and for those regions with an nvlink2 subtype it tries
to extract some more nvlink2-related info.

> What exactly is the error messages here?
> 

[55/143] Compiling C object libqemu-ppc64-softmmu.fa.p/hw_vfio_pci-quirks.c.o
FAILED: libqemu-ppc64-softmmu.fa.p/hw_vfio_pci-quirks.c.o 
cc -Ilibqemu-ppc64-softmmu.fa.p -I. -I../.. -Itarget/ppc -I../../target/ppc 
-I../../capstone/include/capstone -Iqapi -Itrace -Iui -Iui/shader 
-I/usr/include/pixman-1 -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include 
-fdiagnostics-color=auto -pipe -Wall -Winvalid-pch -Werror -std=gnu99 -O2 -g 
-isystem /home/greg/Work/qemu/qemu-virtiofs/linux-headers -isystem 
linux-headers -iquote . -iquote /home/greg/Work/qemu/qemu-virtiofs -iquote 
/home/greg/Work/qemu/qemu-virtiofs/include -iquote 
/home/greg/Work/qemu/qemu-virtiofs/disas/libvixl -iquote 
/home/greg/Work/qemu/qemu-virtiofs/tcg/ppc -iquote 
/home/greg/Work/qemu/qemu-virtiofs/accel/tcg -pthread -U_FORTIFY_SOURCE 
-D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE 
-Wstrict-prototypes -Wredundant-decls -Wundef -Wwrite-strings 
-Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv 
-Wold-style-declaration -Wold-style-definition -Wtype-limits -Wformat-security 
-Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs 
-Wendif-labels -Wexpansion-to-defined -Wimplicit-fallthrough=2 
-Wno-missing-include-dirs -Wno-shift-negative-value -Wno-psabi 
-fstack-protector-strong -fPIC -isystem../../linux-headers 
-isystemlinux-headers -DNEED_CPU_H 
'-DCONFIG_TARGET="ppc64-softmmu-config-target.h"' 
'-DCONFIG_DEVICES="ppc64-softmmu-config-devices.h"' -MD -MQ 
libqemu-ppc64-softmmu.fa.p/hw_vfio_pci-quirks.c.o -MF 
libqemu-ppc64-softmmu.fa.p/hw_vfio_pci-quirks.c.o.d -o 
libqemu-ppc64-softmmu.fa.p/hw_vfio_pci-quirks.c.o -c ../../hw/vfio/pci-quirks.c
../../hw/vfio/pci-quirks.c: In function ‘vfio_pci_nvidia_v100_ram_init’:
../../hw/vfio/pci-quirks.c:1597:36

Re: remove the nvlink2 pci_vfio subdriver v2

2021-05-04 Thread Greg Kurz
On Fri, 26 Mar 2021 07:13:09 +0100
Christoph Hellwig  wrote:

> Hi all,
> 
> the nvlink2 vfio subdriver is a weird beast.  It supports a hardware
> feature without any open source component - what would normally be
> the normal open source userspace that we require for kernel drivers,
> although in this particular case user space could of course be a
> kernel driver in a VM.  It also happens to be a complete mess that
> does not properly bind to PCI IDs, is hacked into the vfio_pci driver
> and also pulls in over 1000 lines of code always built into powerpc
> kernels that have Power NV support enabled.  Because of all these
> issues and the lack of breaking userspace when it is removed I think
> the best idea is to simply kill it.
> 
> Changes since v1:
>  - document the removed subtypes as reserved
>  - add the ACK from Greg
> 
> Diffstat:
>  arch/powerpc/platforms/powernv/npu-dma.c |  705 
> ---
>  b/arch/powerpc/include/asm/opal.h|3 
>  b/arch/powerpc/include/asm/pci-bridge.h  |1 
>  b/arch/powerpc/include/asm/pci.h |7 
>  b/arch/powerpc/platforms/powernv/Makefile|2 
>  b/arch/powerpc/platforms/powernv/opal-call.c |2 
>  b/arch/powerpc/platforms/powernv/pci-ioda.c  |  185 ---
>  b/arch/powerpc/platforms/powernv/pci.c   |   11 
>  b/arch/powerpc/platforms/powernv/pci.h   |   17 
>  b/arch/powerpc/platforms/pseries/pci.c   |   23 
>  b/drivers/vfio/pci/Kconfig   |6 
>  b/drivers/vfio/pci/Makefile  |1 
>  b/drivers/vfio/pci/vfio_pci.c|   18 
>  b/drivers/vfio/pci/vfio_pci_private.h|   14 
>  b/include/uapi/linux/vfio.h  |   38 -


Hi Christoph,

FYI, these uapi changes break build of QEMU.

I guess QEMU people should take some action before this percolates
to the QEMU source tree.

Cc'ing relevant QEMU lists to bring the discussion there.

Cheers,

--
Greg

>  drivers/vfio/pci/vfio_pci_nvlink2.c  |  490 --
>  16 files changed, 12 insertions(+), 1511 deletions(-)



Re: [PATCH v5 06/27] drm/amdgpu: Handle IOMMU enabled case.

2021-05-04 Thread Felix Kuehling

Am 2021-04-28 um 11:11 a.m. schrieb Andrey Grodzovsky:
> Handle all DMA IOMMU group related dependencies before the
> group is removed.
>
> v5: Drop IOMMU notifier and switch to lockless call to ttm_tt_unpopulate
>
> Signed-off-by: Andrey Grodzovsky 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 31 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c   |  3 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.h   |  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c|  9 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 13 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  2 ++
>  drivers/gpu/drm/amd/amdgpu/cik_ih.c|  1 -
>  drivers/gpu/drm/amd/amdgpu/cz_ih.c |  1 -
>  drivers/gpu/drm/amd/amdgpu/iceland_ih.c|  1 -
>  drivers/gpu/drm/amd/amdgpu/navi10_ih.c |  3 ---
>  drivers/gpu/drm/amd/amdgpu/si_ih.c |  1 -
>  drivers/gpu/drm/amd/amdgpu/tonga_ih.c  |  1 -
>  drivers/gpu/drm/amd/amdgpu/vega10_ih.c |  3 ---
>  14 files changed, 56 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index fddb82897e5d..30a24db5f4d1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1054,6 +1054,8 @@ struct amdgpu_device {
>  
>   boolin_pci_err_recovery;
>   struct pci_saved_state  *pci_state;
> +
> + struct list_headdevice_bo_list;
>  };
>  
>  static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 46d646c40338..91594ddc2459 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -70,6 +70,7 @@
>  #include 
>  #include 
>  
> +
>  MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
> @@ -3211,7 +3212,6 @@ static const struct attribute *amdgpu_dev_attributes[] 
> = {
>   NULL
>  };
>  
> -
>  /**
>   * amdgpu_device_init - initialize the driver
>   *
> @@ -3316,6 +3316,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>  
>   INIT_WORK(&adev->xgmi_reset_work, amdgpu_device_xgmi_reset_func);
>  
> + INIT_LIST_HEAD(&adev->device_bo_list);
> +
>   adev->gfx.gfx_off_req_count = 1;
>   adev->pm.ac_power = power_supply_is_system_supplied() > 0;
>  
> @@ -3601,6 +3603,28 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   return r;
>  }
>  
> +static void amdgpu_clear_dma_mappings(struct amdgpu_device *adev)
> +{
> + struct amdgpu_bo *bo = NULL;
> +
> + /*
> +  * Unmaps all DMA mappings before device will be removed from it's
> +  * IOMMU group otherwise in case of IOMMU enabled system a crash
> +  * will happen.
> +  */
> +
> + spin_lock(&adev->mman.bdev.lru_lock);
> + while (!list_empty(&adev->device_bo_list)) {
> + bo = list_first_entry(&adev->device_bo_list, struct amdgpu_bo, 
> bo);
> + list_del_init(&bo->bo);
> + spin_unlock(&adev->mman.bdev.lru_lock);
> + if (bo->tbo.ttm)
> + ttm_tt_unpopulate(bo->tbo.bdev, bo->tbo.ttm);

I have a patch pending (reviewed by Christian) that moves the
dma-unmapping to amdgpu_ttm_backend_unbind. With that patch,
ttm_tt_unpopulate would no longer be the right way to remove the DMA
mapping.

Maybe I'd need to add a check in ttm_tt_unpopulate to call
backend_unbind first, if necessary. Or is there some other mechanism
that moves the BO to the CPU domain before unpopulating it?

Regards,
  Felix


> + spin_lock(&adev->mman.bdev.lru_lock);
> + }
> + spin_unlock(&adev->mman.bdev.lru_lock);
> +}
> +
>  /**
>   * amdgpu_device_fini - tear down the driver
>   *
> @@ -3639,12 +3663,15 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>   amdgpu_ucode_sysfs_fini(adev);
>   sysfs_remove_files(&adev->dev->kobj, amdgpu_dev_attributes);
>  
> -
>   amdgpu_fbdev_fini(adev);
>  
>   amdgpu_irq_fini_hw(adev);
>  
>   amdgpu_device_ip_fini_early(adev);
> +
> + amdgpu_clear_dma_mappings(adev);
> +
> + amdgpu_gart_dummy_page_fini(adev);
>  }
>  
>  void amdgpu_device_fini_sw(struct amdgpu_device *adev)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> index fde2d899b2c4..49cdcaf8512d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c
> @@ -92,7 +92,7 @@ static int amdgpu_gart_dummy_page_init(struct amdgpu_device 
> *adev)
>   *
>   * Frees the dummy page used by the driver (all asics).
>   */
> -static void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
> +void amdgpu_gart_dummy_page_fini(struct amdgpu_device *adev)
> 

Re: i.MX53 error during GPU use

2021-05-04 Thread Otavio Salvador
Hello Rob,

Em sex., 23 de abr. de 2021 às 11:35, Rob Clark  escreveu:
> On Fri, Apr 23, 2021 at 4:58 AM Otavio Salvador
>  wrote:
> > We found this error when using Freedreno driver on an i.MX53 device
> > with Wayland. Any idea how to fix this?
> >
> > [   32.414110] [drm:msm_ioctl_gem_submit] *ERROR* invalid cmdstream size: 0
>
> The invalid cmdstream size is some sort of userspace error
>
> > [   39.177075]
> > [   39.178617] ==
> > [   39.184804] WARNING: possible circular locking dependency detected
> > [   39.190997] 5.10.31+g7ae1de1d2bd3 #1 Not tainted
> > [   39.195619] --
>
> But possibly it is triggering the lockdep anger?  It looks like the
> gem locking re-work landed in v5.11.. any chance you can try a newer
> kernel?

Sure; we tried the 5.12.1 Linux kernel and it "worked". We have used
the following versions:

- Linux kernel 5.12.1
- mesa 21.0.3
- libdrm 2.4.105

It improved a lot and it opens. We now have some rendering issues:

https://photos.app.goo.gl/fBKoe5C8tsq4xU556

and an error in serial console:

[  262.319890] schedule_timeout: wrong timeout value bf946f6e
[  262.325845] CPU: 0 PID: 216 Comm: eadedCompositor Not tainted
5.12.1+g1a5fea11bc2f #1
[  262.333727] Hardware name: Freescale i.MX53 (Device Tree Support)
[  262.339854] [] (unwind_backtrace) from []
(show_stack+0x10/0x14)
[  262.347659] [] (show_stack) from []
(dump_stack+0xdc/0x104)
[  262.355007] [] (dump_stack) from []
(schedule_timeout+0xf0/0x128)
[  262.362875] [] (schedule_timeout) from []
(msm_wait_fence+0x1c0/0x320)
[  262.371190] [] (msm_wait_fence) from []
(msm_ioctl_wait_fence+0xa8/0x154)
[  262.379749] [] (msm_ioctl_wait_fence) from []
(drm_ioctl+0x1f0/0x3dc)
[  262.387966] [] (drm_ioctl) from []
(sys_ioctl+0x3cc/0xbac)
[  262.395226] [] (sys_ioctl) from []
(ret_fast_syscall+0x0/0x2c)
[  262.402829] Exception stack(0xc315ffa8 to 0xc315fff0)
[  262.407911] ffa0:    abc10840 0010
40206447 abc10840 0020
[  262.416118] ffc0:  abc10840 40206447 0036 afd32cb0
abc108b8  abc1087c
[  262.424320] ffe0: b075aef0 abc10804 b0740214 b40a11fc

Any idea what might be causing it?




--
Otavio Salvador O.S. Systems
http://www.ossystems.com.br  http://code.ossystems.com.br
Mobile: +55 (53) 9 9981-7854  Mobile: +1 (347) 903-9750


Re: [Mesa-dev] [RFC] Linux Graphics Next: Explicit fences everywhere and no BO fences - initial proposal

2021-05-04 Thread Marek Olšák
I see some mentions of XNACK and recoverable page faults. Note that all
gaming AMD hw that has userspace queues doesn't have XNACK, so there is no
overhead in compute units. My understanding is that recoverable page faults
are still supported without XNACK, but instead of the compute unit
replaying the faulting instruction, the L1 cache does that. Anyway, the
point is that XNACK is totally irrelevant here.

Marek

On Tue., May 4, 2021, 08:48 Christian König, <
ckoenig.leichtzumer...@gmail.com> wrote:

> Am 04.05.21 um 13:13 schrieb Daniel Vetter:
> > On Tue, May 4, 2021 at 12:53 PM Christian König
> >  wrote:
> >> Am 04.05.21 um 11:47 schrieb Daniel Vetter:
> >>> [SNIP]
>  Yeah, it just takes too long for the preemption to complete to be
> really
>  useful for the feature we are discussing here.
> 
>  As I said when the kernel requests to preempt a queue we can easily
> expect a
>  timeout of ~100ms until that comes back. For compute that is even in
> the
>  multiple seconds range.
> >>> 100ms for preempting an idle request sounds like broken hw to me. Of
> >>> course preemting something that actually runs takes a while, that's
> >>> nothing new. But it's also not the thing we're talking about here. Is
> this
> >>> 100ms actual numbers from hw for an actual idle ringbuffer?
> >> Well 100ms is just an example of the scheduler granularity. Let me
> >> explain in a wider context.
> >>
> >> The hardware can have X queues mapped at the same time and every Y time
> >> interval the hardware scheduler checks if those queues have changed and
> >> only if they have changed the necessary steps to reload them are
> started.
> >>
> >> Multiple queues can be rendering at the same time, so you can have X as
> >> a high priority queue active and just waiting for a signal to start and
> >> the client rendering one frame after another and a third background
> >> compute task mining bitcoins for you.
> >>
> >> As long as everything is static this is perfectly performant. Adding a
> >> queue to the list of active queues is also relatively simple, but taking
> >> one down requires you to wait until we are sure the hardware has seen
> >> the change and reloaded the queues.
> >>
> >> Think of it as an RCU grace period. This is simply not something which
> >> is made to be used constantly, but rather just at process termination.
> > Uh ... that indeed sounds rather broken.
>
> Well I wouldn't call it broken. It's just not made for the use case we
> are trying to abuse it for.
>
> > Otoh it's just a dma_fence that'd we'd inject as this unload-fence.
>
> Yeah, exactly that's why it isn't much of a problem for process
> termination or freeing memory.
>
> > So by and large everyone should already be able to cope with it taking a
> > bit longer. So from a design pov I don't see a huge problem, but I
> > guess you guys wont be happy since it means on amd hw there will be
> > random unsightly stalls in desktop linux usage.
> >
>  The "preemption" feature is really called suspend and made just for
> the case
>  when we want to put a process to sleep or need to forcefully kill it
> for
>  misbehavior or stuff like that. It is not meant to be used in normal
>  operation.
> 
>  If we only attach it on ->move then yeah maybe a last resort
> possibility to
>  do it this way, but I think in that case we could rather stick with
> kernel
>  submissions.
> >>> Well this is a hybrid userspace ring + kernel augmented submit mode, so
> you
> >>> can keep dma-fences working. Because the dma-fence stuff wont work with
> >>> pure userspace submit, I think that conclusion is rather solid. Once
> more
> >>> even after this long thread here.
> >> When assisted with unload fences, then yes. Problem is that I can't see
> >> how we could implement those performant currently.
> > Is there really no way to fix fw here? Like if process start/teardown
> > takes 100ms, that's going to suck no matter what.
>
> As I said adding the queue is unproblematic and teardown just results in
> a bit more waiting to free things up.
>
> Problematic is more overcommit swapping and OOM situations which need to
> wait for the hw scheduler to come back and tell us that the queue is now
> unmapped.
>
> > Also, if userspace lies to us and keeps pushing crap into the ring
> > after it's supposed to be idle: Userspace is already allowed to waste
> > gpu time. If you're too worried about this set a fairly aggressive
> > preempt timeout on the unload fence, and kill the context if it takes
> > longer than what preempting an idle ring should take (because that
> > would indicate broken/evil userspace).
>  I think you have the wrong expectation here. It is perfectly valid and
>  expected for userspace to keep writing commands into the ring buffer.
> 
>  After all when one frame is completed they want to immediately start
>  rendering the next one.
> >>> Sure, for the true userspace direct su

ERROR: modpost: "drm_display_mode_to_videomode" [drivers/gpu/drm/bridge/lontium-lt8912b.ko] undefined!

2021-05-04 Thread Michal Suchánek
Hello,

I get errors about missing symbol in the lontium-lt8912b module.

Is the problem self-evident or do you need the config as well?

I don't need the driver for anything, it was just auto-enabled because
it's new and the change has not been reviewed.

Thanks

Michal
> 
> Last output:
>   WRAParch/powerpc/boot/zImage.maple
>   WRAParch/powerpc/boot/zImage.pseries
> make[2]: *** Deleting file 'modules-only.symvers'
>   MODPOST modules-only.symvers
> ERROR: modpost: "drm_display_mode_to_videomode" 
> [drivers/gpu/drm/bridge/lontium-lt8912b.ko] undefined!
> make[2]: *** [../scripts/Makefile.modpost:150: modules-only.symvers] Error 1
> make[1]: *** 
> [/home/abuild/rpmbuild/BUILD/kernel-vanilla-5.12.0.13670.g5e321ded302d/linux-5.12-13670-g5e321ded302d/Makefile:1770:
>  modules] Error 2
> make: *** [../Makefile:215: __sub-make] Error 2
> error: Bad exit status from /var/tmp/rpm-tmp.q1oSIp (%build)


Re: ERROR: modpost: "drm_display_mode_to_videomode" [drivers/gpu/drm/bridge/lontium-lt8912b.ko] undefined!

2021-05-04 Thread Adrien Grassein
Hello,

I think this is self-evident but could you please send the config to confirm?

Thanks,

Le mar. 4 mai 2021 à 20:30, Michal Suchánek  a écrit :
>
> Hello,
>
> I get errors about missing symbol in the lontium-lt8912b module.
>
> Is the problem self-evident or do you need the config as well?
>
> I don't need the driver for anything, it was just auto-enabled because
> it's new and the change has not been reviewed.
>
> Thanks
>
> Michal
> >
> > Last output:
> >   WRAParch/powerpc/boot/zImage.maple
> >   WRAParch/powerpc/boot/zImage.pseries
> > make[2]: *** Deleting file 'modules-only.symvers'
> >   MODPOST modules-only.symvers
> > ERROR: modpost: "drm_display_mode_to_videomode" 
> > [drivers/gpu/drm/bridge/lontium-lt8912b.ko] undefined!
> > make[2]: *** [../scripts/Makefile.modpost:150: modules-only.symvers] Error 1
> > make[1]: *** 
> > [/home/abuild/rpmbuild/BUILD/kernel-vanilla-5.12.0.13670.g5e321ded302d/linux-5.12-13670-g5e321ded302d/Makefile:1770:
> >  modules] Error 2
> > make: *** [../Makefile:215: __sub-make] Error 2
> > error: Bad exit status from /var/tmp/rpm-tmp.q1oSIp (%build)


Re: [Intel-gfx] [PATCH 18/27] drm/i915/gem: Optionally set SSEU in intel_context_set_gem

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:39AM -0500, Jason Ekstrand wrote:
> For now this is a no-op because everyone passes in a null SSEU but it
> lets us get some of the error handling and selftest refactoring plumbed
> through.
> 
> Signed-off-by: Jason Ekstrand 

it is a bit icky that intel_context_set_gem also sets the sseu, feels a
bit like a layering violation, but welp I couldn't come up with a better
idea either.

Reviewed-by: Daniel Vetter 

> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 41 +++
>  .../gpu/drm/i915/gem/selftests/mock_context.c |  6 ++-
>  2 files changed, 36 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index ce729e640bbf7..6dd50d669c5b9 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -320,9 +320,12 @@ context_get_vm_rcu(struct i915_gem_context *ctx)
>   } while (1);
>  }
>  
> -static void intel_context_set_gem(struct intel_context *ce,
> -   struct i915_gem_context *ctx)
> +static int intel_context_set_gem(struct intel_context *ce,
> +  struct i915_gem_context *ctx,
> +  struct intel_sseu sseu)
>  {
> + int ret = 0;
> +
>   GEM_BUG_ON(rcu_access_pointer(ce->gem_context));
>   RCU_INIT_POINTER(ce->gem_context, ctx);
>  
> @@ -349,6 +352,12 @@ static void intel_context_set_gem(struct intel_context 
> *ce,
>  
>   intel_context_set_watchdog_us(ce, (u64)timeout_ms * 1000);
>   }
> +
> + /* A valid SSEU has no zero fields */
> + if (sseu.slice_mask && !WARN_ON(ce->engine->class != RENDER_CLASS))
> + ret = intel_context_reconfigure_sseu(ce, sseu);
> +
> + return ret;
>  }
>  
>  static void __free_engines(struct i915_gem_engines *e, unsigned int count)
> @@ -416,7 +425,8 @@ static struct i915_gem_engines *alloc_engines(unsigned 
> int count)
>   return e;
>  }
>  
> -static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx)
> +static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx,
> + struct intel_sseu rcs_sseu)
>  {
>   const struct intel_gt *gt = &ctx->i915->gt;
>   struct intel_engine_cs *engine;
> @@ -429,6 +439,8 @@ static struct i915_gem_engines *default_engines(struct 
> i915_gem_context *ctx)
>  
>   for_each_engine(engine, gt, id) {
>   struct intel_context *ce;
> + struct intel_sseu sseu = {};
> + int ret;
>  
>   if (engine->legacy_idx == INVALID_ENGINE)
>   continue;
> @@ -442,10 +454,18 @@ static struct i915_gem_engines *default_engines(struct 
> i915_gem_context *ctx)
>   goto free_engines;
>   }
>  
> - intel_context_set_gem(ce, ctx);
> -
>   e->engines[engine->legacy_idx] = ce;
>   e->num_engines = max(e->num_engines, engine->legacy_idx + 1);
> +
> + if (engine->class == RENDER_CLASS)
> + sseu = rcs_sseu;
> +
> + ret = intel_context_set_gem(ce, ctx, sseu);
> + if (ret) {
> + err = ERR_PTR(ret);
> + goto free_engines;
> + }
> +
>   }
>  
>   return e;
> @@ -759,6 +779,7 @@ __create_context(struct drm_i915_private *i915,
>  {
>   struct i915_gem_context *ctx;
>   struct i915_gem_engines *e;
> + struct intel_sseu null_sseu = {};
>   int err;
>   int i;
>  
> @@ -776,7 +797,7 @@ __create_context(struct drm_i915_private *i915,
>   INIT_LIST_HEAD(&ctx->stale.engines);
>  
>   mutex_init(&ctx->engines_mutex);
> - e = default_engines(ctx);
> + e = default_engines(ctx, null_sseu);
>   if (IS_ERR(e)) {
>   err = PTR_ERR(e);
>   goto err_free;
> @@ -1544,6 +1565,7 @@ set_engines__load_balance(struct i915_user_extension 
> __user *base, void *data)
>   struct intel_engine_cs *stack[16];
>   struct intel_engine_cs **siblings;
>   struct intel_context *ce;
> + struct intel_sseu null_sseu = {};
>   u16 num_siblings, idx;
>   unsigned int n;
>   int err;
> @@ -1616,7 +1638,7 @@ set_engines__load_balance(struct i915_user_extension 
> __user *base, void *data)
>   goto out_siblings;
>   }
>  
> - intel_context_set_gem(ce, set->ctx);
> + intel_context_set_gem(ce, set->ctx, null_sseu);
>  
>   if (cmpxchg(&set->engines->engines[idx], NULL, ce)) {
>   intel_context_put(ce);
> @@ -1724,6 +1746,7 @@ set_engines(struct i915_gem_context *ctx,
>   struct drm_i915_private *i915 = ctx->i915;
>   struct i915_context_param_engines __user *user =
>   u64_to_user_ptr(args->value);
> + struct intel_sseu null_sseu = {};
>   struct set_engines set = { .ctx = ctx };
>   unsigned 

Re: [PATCH 0/2] drm/radeon: Fix off-by-one power_state index heap overwrite

2021-05-04 Thread Alex Deucher
On Mon, May 3, 2021 at 1:06 AM Kees Cook  wrote:
>
> Hi,
>
> This is an attempt at fixing a bug[1] uncovered by the relocation of
> the slab freelist pointer offset, as well as some related clean-ups.
>
> I don't have hardware to do runtime testing, but it builds. ;)
>
> -Kees
>
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=211537
>
> Kees Cook (2):
>   drm/radeon: Fix off-by-one power_state index heap overwrite
>   drm/radeon: Avoid power table parsing memory leaks

Applied.  Thanks!

Alex

>
>  drivers/gpu/drm/radeon/radeon_atombios.c | 26 
>  1 file changed, 18 insertions(+), 8 deletions(-)
>
> --
> 2.25.1
>


Re: [PATCH] drm/amd/pm: initialize variable

2021-05-04 Thread Alex Deucher
On Fri, Apr 30, 2021 at 2:05 PM  wrote:
>
> From: Tom Rix 
>
> Static analysis reports this problem
>
> amdgpu_pm.c:478:16: warning: The right operand of '<' is a garbage value
>   for (i = 0; i < data.nums; i++) {
>                 ^ ~
>
> In some cases data is not set.  Initialize to 0 and flag not setting
> data as an error with the existing check.
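As a rough illustration of the path the analyzer is flagging (the callback
name below is an assumption; it is not shown in the quoted hunk):

	struct pp_states_info data;		/* before the patch: stack garbage */

	if (pp_funcs->get_pp_num_states)
		amdgpu_dpm_get_pp_num_states(adev, &data);	/* only path that sets data.nums */

	for (i = 0; i < data.nums; i++)		/* garbage loop bound when the callback is absent */
		/* match pm against data.states[i] */;

With data = {0}, data.nums reads as 0 on that path, the loop is skipped and
the existing "state not found" check reports the error instead.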
>
> Signed-off-by: Tom Rix 

Applied.  Thanks!

Alex


> ---
>  drivers/gpu/drm/amd/pm/amdgpu_pm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
> b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> index 4e459ef632ef..9a54066ec0af 100644
> --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> @@ -451,7 +451,7 @@ static ssize_t amdgpu_get_pp_cur_state(struct device *dev,
> struct drm_device *ddev = dev_get_drvdata(dev);
> struct amdgpu_device *adev = drm_to_adev(ddev);
> const struct amd_pm_funcs *pp_funcs = adev->powerplay.pp_funcs;
> -   struct pp_states_info data;
> +   struct pp_states_info data = {0};
> enum amd_pm_state_type pm = 0;
> int i = 0, ret = 0;
>
> --
> 2.26.3
>


Re: ERROR: modpost: "drm_display_mode_to_videomode" [drivers/gpu/drm/bridge/lontium-lt8912b.ko] undefined!

2021-05-04 Thread Adrien Grassein
Ok thanks,

I will investigate this.
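For what it's worth, drm_display_mode_to_videomode() is only compiled into
drm_modes.c when CONFIG_VIDEOMODE_HELPERS is set, so a config without it can
hit exactly this modpost error. The likely fix is a Kconfig select on the
bridge driver; a sketch (Kconfig symbol name assumed):

config DRM_LONTIUM_LT8912B
	tristate "Lontium LT8912B DSI/HDMI bridge"
	...
	select VIDEOMODE_HELPERS	# builds drm_display_mode_to_videomode()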

On Tue, May 4, 2021 at 9:04 PM Michal Suchánek  wrote:
>
> Hello,
>
> I have only one from ppc64, the other architectures don't have the
> problem or fail earlier.
>
> Thanks
>
> Michal
>
> On Tue, May 04, 2021 at 08:45:01PM +0200, Adrien Grassein wrote:
> > Hello,
> >
> > I think this is self-evident but could you please send the config to 
> > confirm?
> >
> > Thanks,
> >
> > On Tue, May 4, 2021 at 8:30 PM Michal Suchánek  wrote:
> > >
> > > Hello,
> > >
> > > I get errors about missing symbol in the lontium-lt8912b module.
> > >
> > > Is the problem self-evident or do you need the config as well?
> > >
> > > I don't need the driver for anything, it was just auto-enabled because
> > > it's new and the change has not been reviewed.
> > >
> > > Thanks
> > >
> > > Michal
> > > >
> > > > Last output:
> > > >   WRAP    arch/powerpc/boot/zImage.maple
> > > >   WRAP    arch/powerpc/boot/zImage.pseries
> > > > make[2]: *** Deleting file 'modules-only.symvers'
> > > >   MODPOST modules-only.symvers
> > > > ERROR: modpost: "drm_display_mode_to_videomode" 
> > > > [drivers/gpu/drm/bridge/lontium-lt8912b.ko] undefined!
> > > > make[2]: *** [../scripts/Makefile.modpost:150: modules-only.symvers] 
> > > > Error 1
> > > > make[1]: *** 
> > > > [/home/abuild/rpmbuild/BUILD/kernel-vanilla-5.12.0.13670.g5e321ded302d/linux-5.12-13670-g5e321ded302d/Makefile:1770:
> > > >  modules] Error 2
> > > > make: *** [../Makefile:215: __sub-make] Error 2
> > > > error: Bad exit status from /var/tmp/rpm-tmp.q1oSIp (%build)


Re: [PATCH v4] drm/amd/amdgpu/amdgpu_drv.c: Replace drm_modeset_lock_all with drm_modeset_lock

2021-05-04 Thread Alex Deucher
On Tue, Apr 27, 2021 at 5:45 AM Fabio M. De Francesco
 wrote:
>
> drm_modeset_lock_all() is not needed here, so it is replaced with
> drm_modeset_lock(). The crtc list around which we are looping never
> changes, therefore the only lock we need is to protect access to
> crtc->state.
>
> Suggested-by: Daniel Vetter 
> Suggested-by: Matthew Wilcox 
> Signed-off-by: Fabio M. De Francesco 
> Reviewed-by: Matthew Wilcox (Oracle) 

Applied.  Thanks!

Alex


> ---
>
> Changes from v3: CC'ed more (previously missing) maintainers.
> Changes from v2: Drop file name from the Subject. Cc'ed all maintainers.
> Changes from v1: Removed unnecessary braces around single statement
> block.
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 80130c1c0c68..39204dbc168b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -1595,17 +1595,15 @@ static int amdgpu_pmops_runtime_idle(struct device 
> *dev)
> if (amdgpu_device_has_dc_support(adev)) {
> struct drm_crtc *crtc;
>
> -   drm_modeset_lock_all(drm_dev);
> -
> drm_for_each_crtc(crtc, drm_dev) {
> -   if (crtc->state->active) {
> +   drm_modeset_lock(&crtc->mutex, NULL);
> +   if (crtc->state->active)
> ret = -EBUSY;
> +   drm_modeset_unlock(&crtc->mutex);
> +   if (ret < 0)
> break;
> -   }
> }
>
> -   drm_modeset_unlock_all(drm_dev);
> -
> } else {
> struct drm_connector *list_connector;
> struct drm_connector_list_iter iter;
> --
> 2.31.1
>


Re: 16 bpc fixed point (RGBA16) framebuffer support for core and AMD.

2021-05-04 Thread Alex Deucher
On Wed, Apr 28, 2021 at 5:21 PM Alex Deucher  wrote:
>
> On Tue, Apr 20, 2021 at 5:25 PM Alex Deucher  wrote:
> >
> > On Fri, Apr 16, 2021 at 12:29 PM Mario Kleiner
> >  wrote:
> > >
> > > Friendly ping to the AMD people. Nicholas, Harry, Alex, any feedback?
> > > Would be great to get this in sooner than later.
> > >
> >
> > No objections from me.
> >
>
> I don't have any objections to merging this.  Are the IGT tests available?
>

Any preference on whether I merge this through the AMD tree or drm-misc?

Alex


> Alex
>
> > Alex
> >
> >
> > > Thanks and have a nice weekend,
> > > -mario
> > >
> > > On Fri, Mar 19, 2021 at 10:03 PM Mario Kleiner
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > this patch series adds the fourcc's for 16 bit fixed point unorm
> > > > framebuffers to the core, and then an implementation for AMD gpu's
> > > > with DisplayCore.
> > > >
> > > > This is intended to allow for pageflipping to, and direct scanout of,
> > > > Vulkan swapchain images in the format VK_FORMAT_R16G16B16A16_UNORM.
> > > > I have patched AMD's GPUOpen amdvlk OSS driver to enable this format
> > > > for swapchains, mapping to DRM_FORMAT_XBGR16161616:
> > > > Link: 
> > > > https://github.com/kleinerm/pal/commit/a25d4802074b13a8d5f7edc96ae45469ecbac3c4
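For context, this is a 64 bpp packed layout, 16 bits per channel with an
unused X filler instead of alpha. The core definition added by the series
looks roughly like this sketch (the exact fourcc characters here are an
assumption; the merged drm_fourcc.h is authoritative):

/* [63:0] x:B:G:R 16:16:16:16 little endian */
#define DRM_FORMAT_XBGR16161616	fourcc_code('X', 'B', '4', '8')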
> > > >
> > > > My main motivation for this is squeezing every bit of precision
> > > > out of the hardware for scientific and medical research applications,
> > > > where fp16 in the unorm range is limited to ~11 bpc effective linear
> > > > precision in the upper half [0.5;1.0] of the unorm range, although
> > > > the hardware could do at least 12 bpc.
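To make that precision claim concrete, a standalone back-of-the-envelope
check (plain userspace C, not driver code): fp16 carries 10 explicit
mantissa bits, so within the binade [0.5, 1.0) its step is 2^-11 of full
range (~11 bpc), while 16-bit unorm steps uniformly at 1/65535 and the
display hardware dithers that down to ~12 bpc on the wire.

#include <math.h>
#include <stdio.h>

int main(void)
{
	double fp16_step = ldexp(1.0, -11);	/* fp16 step inside [0.5, 1.0) */
	double unorm16_step = 1.0 / 65535.0;	/* 16-bit unorm step */

	printf("fp16 in [0.5,1.0): step %.3g -> ~%.0f bpc\n",
	       fp16_step, -log2(fp16_step));
	printf("unorm16:           step %.3g -> ~%.0f bpc\n",
	       unorm16_step, log2(65536.0));
	printf("hw output target:  ~12 bpc (1/4096 steps)\n");
	return 0;
}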
> > > >
> > > > It has been successfully tested on AMD RavenRidge (DCN-1), and with
> > > > Polaris11 (DCE-11.2). Up to two displays were active on RavenRidge
> > > > (DP 2560x1440@144Hz + HDMI 2560x1440@120Hz), the maximum supported
> > > > on my hw, both running at 10 bpc DP output depth.
> > > >
> > > > Up to three displays were active on the Polaris (DP 2560x1440@144Hz +
> > > > 2560x1440@100Hz USB-C DP-altMode-to-HDMI converter + eDP 2880x1800@60Hz
> > > > Apple Retina panel), all running at 10 bpc output depth.
> > > >
> > > > No malfunctions, visual artifacts or other oddities were observed
> > > > (apart from an adventurous mess of cables and adapters on my desk),
> > > > suggesting it works.
> > > >
> > > > I used my automatic photometer measurement procedure to verify the
> > > > effective output precision of 10 bpc DP native signal + spatial
> > > > dithering in the gpu as enabled by the amdgpu driver. Results show
> > > > the expected 12 bpc precision i hoped for -- the current upper limit
> > > > for AMD display hw afaik.
> > > >
> > > > So it seems to work in the way i hoped :).
> > > >
> > > > Some open questions wrt. AMD DC, to be addressed in this patch series, 
> > > > or follow up
> > > > patches if necessary:
> > > >
> > > > - For the atomic check for plane scaling, the current patch will
> > > > apply the same hw limits as for other rgb fixed point fb's, e.g.,
> > > > for 8 bpc rgb8. Is this correct? Or would we need to use the fp16
> > > > limits, because this is also a 64 bpp format? Or something new
> > > > entirely?
> > > >
> > > > - I haven't added the new fourcc to the DCC tables yet. Should i?
> > > >
> > > > - I had to change an assert for DCE to allow 36bpp linebuffers (patch 
> > > > 4/5).
> > > > It looks to me as if that assert was inconsistent with other places
> > > > in the driver where COLOR_DEPTH121212 is supported, and looking at
> > > > the code, the change seems harmless. At least on DCE-11.2 the change
> > > > didn't cause any noticeable (by myself) or measurable (by my equipment)
> > > > problems on any of the 3 connected displays.
> > > >
> > > > - Related to that change, while i needed to increase lb pixelsize to 
> > > > 36bpp
> > > > to get > 10 bpc effective precision on DCN, i didn't need to do that
> > > > on DCE. Also no change of lb pixelsize was needed on either DCN or DCe
> > > > to get > 10 bpc precision for fp16 framebuffers, so something seems to
> > > > behave differently for floating point 16 vs. fixed point 16. This all
> > > > seems to suggest one could leave lb pixelsize at the old 30 bpp value
> > > > on at least DCE-11.2 and still get the > 10 bpc precision if one wanted
> > > > to avoid the changes of patch 4/5.
> > > >
> > > > Thanks,
> > > > -mario
> > > >
> > > >


Re: [PATCH 19/27] drm/i915/gem: Use the proto-context to handle create parameters

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:40AM -0500, Jason Ekstrand wrote:
> This means that the proto-context needs to grow support for engine
> configuration information as well as setparam logic.  Fortunately, we'll
> be deleting a lot of setparam logic on the primary context shortly so it
> will hopefully balance out.
> 
> Signed-off-by: Jason Ekstrand 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 546 +-
>  .../gpu/drm/i915/gem/i915_gem_context_types.h |  58 ++
>  2 files changed, 587 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 6dd50d669c5b9..aa4edfbf7ed48 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -193,8 +193,15 @@ static int validate_priority(struct drm_i915_private 
> *i915,
>  
>  static void proto_context_close(struct i915_gem_proto_context *pc)
>  {
> + int i;
> +
>   if (pc->vm)
>   i915_vm_put(pc->vm);
> + if (pc->user_engines) {
> + for (i = 0; i < pc->num_user_engines; i++)
> + kfree(pc->user_engines[i].siblings);
> + kfree(pc->user_engines);

free_engines(&pc->user_engines);

Maybe even stuff that if check into free_engines. Except I realized this
is proto engines here now :-(
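A sketch of folding both into one helper for the proto-context case (helper
name invented here):

static void proto_context_free_engines(struct i915_gem_proto_context *pc)
{
	int i;

	if (!pc->user_engines)
		return;

	for (i = 0; i < pc->num_user_engines; i++)
		kfree(pc->user_engines[i].siblings);
	kfree(pc->user_engines);
	pc->user_engines = NULL;
}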

> + }
>   kfree(pc);
>  }
>  
> @@ -248,6 +255,9 @@ proto_context_create(struct drm_i915_private *i915, 
> unsigned int flags)
>   if (!pc)
>   return ERR_PTR(-ENOMEM);
>  
> + pc->num_user_engines = -1;
> + pc->user_engines = NULL;
> +
>   if (HAS_FULL_PPGTT(i915)) {
>   struct i915_ppgtt *ppgtt;
>  
> @@ -282,6 +292,439 @@ proto_context_create(struct drm_i915_private *i915, 
> unsigned int flags)
>   return err;
>  }
>  
> +static int set_proto_ctx_vm(struct drm_i915_file_private *fpriv,
> + struct i915_gem_proto_context *pc,
> + const struct drm_i915_gem_context_param *args)
> +{
> + struct i915_address_space *vm;
> +
> + if (args->size)
> + return -EINVAL;
> +
> + if (!pc->vm)
> + return -ENODEV;
> +
> + if (upper_32_bits(args->value))
> + return -ENOENT;
> +
> + rcu_read_lock();
> + vm = xa_load(&fpriv->vm_xa, args->value);
> + if (vm && !kref_get_unless_zero(&vm->ref))
> + vm = NULL;
> + rcu_read_unlock();

vm lookup helpers would be nice I guess, but perhaps something that
vm_bind patches should do.
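A sketch of what such a helper could look like (name invented), just
wrapping the xa_load() + kref_get_unless_zero() dance from the hunk above:

static struct i915_address_space *
i915_gem_vm_lookup(struct drm_i915_file_private *fpriv, u32 id)
{
	struct i915_address_space *vm;

	rcu_read_lock();
	vm = xa_load(&fpriv->vm_xa, id);
	if (vm && !kref_get_unless_zero(&vm->ref))
		vm = NULL;
	rcu_read_unlock();

	return vm;	/* caller owns a reference, or NULL */
}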

> + if (!vm)
> + return -ENOENT;
> +
> + i915_vm_put(pc->vm);

Ah I guess I've found why you went with "pc->vm is always set". *shrug*

> + pc->vm = vm;
> +
> + return 0;
> +}
> +
> +struct set_proto_ctx_engines {
> + struct drm_i915_private *i915;
> + unsigned num_engines;
> + struct i915_gem_proto_engine *engines;
> +};
> +
> +static int
> +set_proto_ctx_engines_balance(struct i915_user_extension __user *base,
> +   void *data)
> +{
> + struct i915_context_engines_load_balance __user *ext =
> + container_of_user(base, typeof(*ext), base);
> + const struct set_proto_ctx_engines *set = data;
> + struct drm_i915_private *i915 = set->i915;
> + struct intel_engine_cs **siblings;
> + u16 num_siblings, idx;
> + unsigned int n;
> + int err;
> +
> + if (!HAS_EXECLISTS(i915))
> + return -ENODEV;
> +
> + if (intel_uc_uses_guc_submission(&i915->gt.uc))
> + return -ENODEV; /* not implemented yet */
> +
> + if (get_user(idx, &ext->engine_index))
> + return -EFAULT;
> +
> + if (idx >= set->num_engines) {
> + drm_dbg(&i915->drm, "Invalid placement value, %d >= %d\n",
> + idx, set->num_engines);
> + return -EINVAL;
> + }
> +
> + idx = array_index_nospec(idx, set->num_engines);
> + if (set->engines[idx].type != I915_GEM_ENGINE_TYPE_INVALID) {
> + drm_dbg(&i915->drm,
> + "Invalid placement[%d], already occupied\n", idx);
> + return -EEXIST;
> + }
> +
> + if (get_user(num_siblings, &ext->num_siblings))
> + return -EFAULT;
> +
> + err = check_user_mbz(&ext->flags);
> + if (err)
> + return err;
> +
> + err = check_user_mbz(&ext->mbz64);
> + if (err)
> + return err;
> +
> + if (num_siblings == 0)
> + return 0;

You deleted the on-stack siblings micro-optimization.

I'm shocked.
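For reference, the pattern being mourned, roughly as it still looks in the
existing set_engines__load_balance(): a small on-stack array with a heap
fallback for unusually wide balanced sets (sketch):

	struct intel_engine_cs *stack[16];
	struct intel_engine_cs **siblings = stack;

	if (num_siblings > ARRAY_SIZE(stack)) {
		siblings = kmalloc_array(num_siblings, sizeof(*siblings),
					 GFP_KERNEL);
		if (!siblings)
			return -ENOMEM;
	}
	/* ... fill and use siblings[0..num_siblings-1] ... */
	if (siblings != stack)
		kfree(siblings);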

> +
> + siblings = kmalloc_array(num_siblings, sizeof(*siblings), GFP_KERNEL);

If you want to pay back your micro-opt budget: GFP_TEMPORARY.

But then I realized much wiser heads than me removed this in 2017 from the
kernel! That commit is a rather interesting story btw, if you're bored:

commit 0ee931c4e31a5efb134c76440405e9219f896e33
Author: Michal Hocko 
Date:   Wed Sep 13 16:28

Re: [Intel-gfx] [PATCH 22/27] drm/i915/gem: Delay context creation

2021-05-04 Thread Daniel Vetter
On Mon, May 03, 2021 at 10:57:43AM -0500, Jason Ekstrand wrote:
> The current context uAPI allows for two methods of setting context
> parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM.  The
> former is allowed to be called at any time while the latter happens as
> part of GEM_CONTEXT_CREATE.  Currently, everything settable via one is
> settable via the other.  While some params are fairly simple and setting
> them on a live context is harmless, such as the context priority, others are
> far trickier such as the VM or the set of engines.  In order to swap out
> the VM, for instance, we have to delay until all current in-flight work
> is complete, swap in the new VM, and then continue.  This leads to a
> plethora of potential race conditions we'd really rather avoid.
> 
> In previous patches, we added a i915_gem_proto_context struct which is
> capable of storing and tracking all such create parameters.  This commit
> delays the creation of the actual context until after the client is done
> configuring it with SET_CONTEXT_PARAM.  From the perspective of the
> client, it has the same u32 context ID the whole time.  From the
> perspective of i915, however, it's an i915_gem_proto_context right up
> until the point where we attempt to do something which the proto-context
> can't handle at which point the real context gets created.
> 
> This is accomplished via a little xarray dance.  When GEM_CONTEXT_CREATE
> is called, we create a proto-context, reserve a slot in context_xa but
> leave it NULL, and store the proto-context in the corresponding slot in
> proto_context_xa.  Then, whenever we go to look up a context, we first
> check context_xa.  If it's there, we return the i915_gem_context and
> we're done.  If it's not, we look in proto_context_xa and, if we find it
> there, we create the actual context and kill the proto-context.
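Simplified pseudocode of that lookup dance (locking and error paths trimmed;
lookup_ctx is a made-up wrapper name, lazy_create_context_locked is from the
patch):

struct i915_gem_context *
lookup_ctx(struct drm_i915_file_private *fpriv, u32 id)
{
	struct i915_gem_context *ctx;
	struct i915_gem_proto_context *pc;

	ctx = xa_load(&fpriv->context_xa, id);
	if (ctx)
		return i915_gem_context_get(ctx);	/* fast path: real context */

	mutex_lock(&fpriv->proto_context_lock);
	pc = xa_load(&fpriv->proto_context_xa, id);
	ctx = pc ? lazy_create_context_locked(fpriv, pc, id) : ERR_PTR(-ENOENT);
	mutex_unlock(&fpriv->proto_context_lock);

	return ctx;
}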
> 
> In order for this dance to work properly, everything which ever touches
> a proto-context is guarded by drm_i915_file_private::proto_context_lock,
> including context creation.  Yes, this means context creation now takes
> a giant global lock but it can't really be helped and that should never
> be on any driver's fast-path anyway.
> 
> Signed-off-by: Jason Ekstrand 
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   | 211 ++
>  drivers/gpu/drm/i915/gem/i915_gem_context.h   |   3 +
>  .../gpu/drm/i915/gem/i915_gem_context_types.h |  54 +
>  .../gpu/drm/i915/gem/selftests/mock_context.c |   5 +-
>  drivers/gpu/drm/i915/i915_drv.h   |  24 +-
>  5 files changed, 239 insertions(+), 58 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c 
> b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> index 06d413eef01a3..f0e7ce6b979b4 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
> @@ -292,6 +292,42 @@ proto_context_create(struct drm_i915_private *i915, 
> unsigned int flags)
>   return err;
>  }
>  
> +static int proto_context_register_locked(struct drm_i915_file_private *fpriv,
> +  struct i915_gem_proto_context *pc,
> +  u32 *id)
> +{
> + int ret;
> + void *old;
> +
> + lockdep_assert_held(&fpriv->proto_context_lock);
> +
> + ret = xa_alloc(&fpriv->context_xa, id, NULL, xa_limit_32b, GFP_KERNEL);
> + if (ret)
> + return ret;
> +
> + old = xa_store(&fpriv->proto_context_xa, *id, pc, GFP_KERNEL);
> + if (xa_is_err(old)) {
> + xa_erase(&fpriv->context_xa, *id);
> + return xa_err(old);
> + }
> + GEM_BUG_ON(old);

A bit brutal, since worst case we just leaked something. I'd only go with
WARN_ON. This isn't userspace, dying should be optional to make debugging
easier (paranoid people reboot the machine on both anyway).
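i.e. something along these lines (sketch):

	old = xa_store(&fpriv->proto_context_xa, *id, pc, GFP_KERNEL);
	if (xa_is_err(old)) {
		xa_erase(&fpriv->context_xa, *id);
		return xa_err(old);
	}
	WARN_ON(old);	/* scream, but keep the machine running */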

> +
> + return 0;
> +}
> +
> +static int proto_context_register(struct drm_i915_file_private *fpriv,
> +   struct i915_gem_proto_context *pc,
> +   u32 *id)
> +{
> + int ret;
> +
> + mutex_lock(&fpriv->proto_context_lock);
> + ret = proto_context_register_locked(fpriv, pc, id);
> + mutex_unlock(&fpriv->proto_context_lock);
> +
> + return ret;
> +}
> +
>  static int set_proto_ctx_vm(struct drm_i915_file_private *fpriv,
>   struct i915_gem_proto_context *pc,
>   const struct drm_i915_gem_context_param *args)
> @@ -1452,12 +1488,12 @@ void i915_gem_init__contexts(struct drm_i915_private 
> *i915)
>   init_contexts(&i915->gem.contexts);
>  }
>  
> -static int gem_context_register(struct i915_gem_context *ctx,
> - struct drm_i915_file_private *fpriv,
> - u32 *id)
> +static void gem_context_register(struct i915_gem_context *ctx,
> +  struct drm_i915_file_private *fpriv,
> +  u32 id)
>  {

Re: [Intel-gfx] [PATCH 22/27] drm/i915/gem: Delay context creation

2021-05-04 Thread Daniel Vetter
On Tue, May 04, 2021 at 03:38:06AM +0800, kernel test robot wrote:
> Hi Jason,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on drm-intel/for-linux-next]
> [also build test ERROR on drm-tip/drm-tip drm-exynos/exynos-drm-next 
> next-20210503]
> [cannot apply to tegra-drm/drm/tegra/for-next v5.12]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:
> https://github.com/0day-ci/linux/commits/Jason-Ekstrand/drm-i915-gem-ioctl-clean-ups-v5/20210504-000132
> base:   git://anongit.freedesktop.org/drm-intel for-linux-next
> config: i386-randconfig-r013-20210503 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> reproduce (this is a W=1 build):
> # 
> https://github.com/0day-ci/linux/commit/66ce6ce447515a302711a65f731d1e6f66abdcdc
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review 
> Jason-Ekstrand/drm-i915-gem-ioctl-clean-ups-v5/20210504-000132
> git checkout 66ce6ce447515a302711a65f731d1e6f66abdcdc
> # save the attached .config to linux build tree
> make W=1 W=1 ARCH=i386 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> 
> All errors (new ones prefixed by >>):
> 
> >> drivers/gpu/drm/i915/gem/i915_gem_context.c:2543:1: error: no previous 
> >> prototype for 'lazy_create_context_locked' [-Werror=missing-prototypes]
> 2543 | lazy_create_context_locked(struct drm_i915_file_private *file_priv,
>  | ^~
>cc1: all warnings being treated as errors

Ah you missed the static, and I missed that in review. That one should be
fixed :-)
-Daniel
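The corresponding fixup is just the storage-class specifier on the
definition, i.e.:

-struct i915_gem_context *
+static struct i915_gem_context *
 lazy_create_context_locked(struct drm_i915_file_private *file_priv,
			    struct i915_gem_proto_context *pc, u32 id)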

> 
> 
> vim +/lazy_create_context_locked +2543 
> drivers/gpu/drm/i915/gem/i915_gem_context.c
> 
>   2541
>   2542struct i915_gem_context *
> > 2543lazy_create_context_locked(struct drm_i915_file_private 
> > *file_priv,
>   2544   struct i915_gem_proto_context *pc, 
> u32 id)
>   2545{
>   2546struct i915_gem_context *ctx;
>   2547void *old;
>   2548
>   2549lockdep_assert_held(&file_priv->proto_context_lock);
>   2550
>   2551ctx = i915_gem_create_context(file_priv->dev_priv, pc);
>   2552if (IS_ERR(ctx))
>   2553return ctx;
>   2554
>   2555gem_context_register(ctx, file_priv, id);
>   2556
>   2557old = xa_erase(&file_priv->proto_context_xa, id);
>   2558GEM_BUG_ON(old != pc);
>   2559proto_context_close(pc);
>   2560
>   2561/* One for the xarray and one for the caller */
>   2562return i915_gem_context_get(ctx);
>   2563}
>   2564
> 
> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Intel-gfx] [PATCH 23/27] drm/i915/gem: Don't allow changing the VM on running contexts

2021-05-04 Thread Daniel Vetter
On Tue, May 04, 2021 at 02:52:03AM +0800, kernel test robot wrote:
> Hi Jason,
> 
> Thank you for the patch! Perhaps something to improve:
> 
> [auto build test WARNING on drm-intel/for-linux-next]
> [also build test WARNING on drm-tip/drm-tip drm-exynos/exynos-drm-next 
> next-20210503]
> [cannot apply to tegra-drm/drm/tegra/for-next drm/drm-next v5.12]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:
> https://github.com/0day-ci/linux/commits/Jason-Ekstrand/drm-i915-gem-ioctl-clean-ups-v5/20210504-000132
> base:   git://anongit.freedesktop.org/drm-intel for-linux-next
> config: i386-randconfig-s002-20210503 (attached as .config)
> compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
> reproduce:
> # apt-get install sparse
> # sparse version: v0.6.3-341-g8af24329-dirty
> # 
> https://github.com/0day-ci/linux/commit/6af12f5ca765ecd59075344f3be4c4c0b68ef95e
> git remote add linux-review https://github.com/0day-ci/linux
> git fetch --no-tags linux-review 
> Jason-Ekstrand/drm-i915-gem-ioctl-clean-ups-v5/20210504-000132
> git checkout 6af12f5ca765ecd59075344f3be4c4c0b68ef95e
> # save the attached .config to linux build tree
> make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 
> ARCH=i386 
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 

Just from staring at this I have no idea, so I guess you have to
reproduce. What sparse does is primarily check these special bit and
pointer values the kernel has, like __rcu and __user, where you need
special functions to access them. But since the code is using
rcu_dereference I have no idea what the complaint is about.
-Daniel
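For context, the usual shape of this particular sparse complaint is a
comparison of an __rcu-annotated pointer against a plain pointer without
going through an accessor; illustrative only, not the exact i915 line:

	struct i915_address_space __rcu *annotated;	/* e.g. ctx->vm */
	struct i915_address_space *plain;

	if (annotated == plain)		/* sparse: different address spaces */
		/* ... */;

	/* rcu_access_pointer() strips the __rcu annotation for a plain
	 * comparison and does not require rcu_read_lock() to be held. */
	if (rcu_access_pointer(annotated) == plain)
		/* ... */;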

> 
> 
> sparse warnings: (new ones prefixed by >>)
>drivers/gpu/drm/i915/gt/intel_reset.c:1329:5: sparse: sparse: context 
> imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic 
> block
>drivers/gpu/drm/i915/gt/intel_reset.c: note: in included file:
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse: sparse: 
> >> incompatible types in comparison expression (different address spaces):
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse:struct 
> >> i915_address_space [noderef] __rcu *
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse:struct 
> >> i915_address_space *
> --
>drivers/gpu/drm/i915/gt/intel_execlists_submission.c: note: in included 
> file (through drivers/gpu/drm/i915/selftests/igt_spinner.h, 
> drivers/gpu/drm/i915/gt/selftest_execlists.c):
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse: sparse: 
> >> incompatible types in comparison expression (different address spaces):
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse:struct 
> >> i915_address_space [noderef] __rcu *
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse:struct 
> >> i915_address_space *
> --
>drivers/gpu/drm/i915/gem/i915_gem_object.c: note: in included file:
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse: sparse: 
> >> incompatible types in comparison expression (different address spaces):
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse:struct 
> >> i915_address_space [noderef] __rcu *
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse:struct 
> >> i915_address_space *
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse: sparse: 
> >> incompatible types in comparison expression (different address spaces):
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse:struct 
> >> i915_address_space [noderef] __rcu *
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse:struct 
> >> i915_address_space *
>drivers/gpu/drm/i915/gem/i915_gem_context.h:154:16: sparse: sparse: 
> incompatible types in comparison expression (different address spaces):
>drivers/gpu/drm/i915/gem/i915_gem_context.h:154:16: sparse:struct 
> i915_address_space [noderef] __rcu *
>drivers/gpu/drm/i915/gem/i915_gem_context.h:154:16: sparse:struct 
> i915_address_space *
> --
>drivers/gpu/drm/i915/i915_gem_gtt.c: note: in included file (through 
> drivers/gpu/drm/i915/selftests/i915_gem_gtt.c):
> >> drivers/gpu/drm/i915/gem/i915_gem_context.h:163:14: sparse: sparse: 
> >> incompatible types in comparison expression (different address spaces):
> >>
