On Wed, 20 Dec 2023 20:59:43 +0100
Ketil Johnsen <ketil.john...@arm.com> wrote:

> > +/**
> > + * cs_slot_sync_queue_state_locked() - Synchronize the queue slot state
> > + * @ptdev: Device.
> > + * @csg_id: Group slot.
> > + * @cs_id: Queue slot.
> > + *
> > + * Queue state is updated on group suspend or STATUS_UPDATE event.
> > + */
> > +static void
> > +cs_slot_sync_queue_state_locked(struct panthor_device *ptdev, u32 csg_id, u32 cs_id)
> > +{
> > +   struct panthor_group *group = ptdev->scheduler->csg_slots[csg_id].group;
> > +   struct panthor_queue *queue = group->queues[cs_id];
> > +   struct panthor_fw_cs_iface *cs_iface =
> > +           panthor_fw_get_cs_iface(group->ptdev, csg_id, cs_id);
> > +
> > +   u32 status_wait_cond;
> > +
> > +   switch (cs_iface->output->status_blocked_reason) {
> > +   case CS_STATUS_BLOCKED_REASON_UNBLOCKED:
> > +   if (queue->iface.input->insert == queue->iface.output->extract &&
> > +               cs_iface->output->status_scoreboards == 0)
> > +                   group->idle_queues |= BIT(cs_id);
> > +           break;
> > +
> > +   case CS_STATUS_BLOCKED_REASON_SYNC_WAIT:
> > +           drm_WARN_ON(&ptdev->base, !list_empty(&group->wait_node));  
> 
> I think we should remove this drm_WARN_ON(). In my user-submission 
> experiments, I keep hitting this warning because I'm a bit slow to 
> signal a Mali sync object. In other words, I'm keeping a stream blocked 
> for a while.
> 
> It is quite common to get two rapid job IRQs, e.g. one for a global 
> event and one for a particular CSG event. Depending on the timing of the 
> scheduled work handling those IRQs, I quite often end up with two 
> tick_work() runs being scheduled and executed as a result. Both of 
> these will see the same stream as CS_STATUS_BLOCKED_REASON_SYNC_WAIT, 
> and hence the second will trigger the drm_WARN_ON(), as the first run 
> already added the group to the waiting list.
> 
> I'm pretty sure we can hit this drm_WARN_ON() when user space starts 
> making use of multiple streams per group as well, since two or more 
> streams of the same group could both be in 
> CS_STATUS_BLOCKED_REASON_SYNC_WAIT, thus running into the same issue.

It makes total sense, I'll drop the WARN_ON().

> 
> > +           list_move_tail(&group->wait_node, &group->ptdev->scheduler->groups.waiting);
> > +           group->blocked_queues |= BIT(cs_id);
> > +           queue->syncwait.gpu_va = cs_iface->output->status_wait_sync_ptr;
> > +           queue->syncwait.ref = cs_iface->output->status_wait_sync_value;
> > +           status_wait_cond = cs_iface->output->status_wait & CS_STATUS_WAIT_SYNC_COND_MASK;
> > +           queue->syncwait.gt = status_wait_cond == CS_STATUS_WAIT_SYNC_COND_GT;
> > +           if (cs_iface->output->status_wait & CS_STATUS_WAIT_SYNC_64B) {
> > +                   u64 sync_val_hi = cs_iface->output->status_wait_sync_value_hi;
> > +
> > +                   queue->syncwait.sync64 = true;
> > +                   queue->syncwait.ref |= sync_val_hi << 32;
> > +           } else {
> > +                   queue->syncwait.sync64 = false;
> > +           }
> > +           break;
> > +
> > +   default:
> > +           /* Other reasons are not blocking. Consider the queue as runnable
> > +            * in those cases.
> > +            */
> > +           break;
> > +   }
> > +}  
