Hi Michał,

Please look at the function below:

static int ena_com_wait_and_process_admin_cq_polling(
		struct ena_comp_ctx *comp_ctx,
		struct ena_com_admin_queue *admin_queue)
{
	unsigned long flags = 0;
	u64 start_time;
	int ret;

	start_time = ENA_GET_SYSTEM_USECS();

	while (comp_ctx->status == ENA_CMD_SUBMITTED) {
		if ((ENA_GET_SYSTEM_USECS() - start_time) >
		    ADMIN_CMD_TIMEOUT_US) {
			ena_trc_err("Wait for completion (polling) timeout\n");
			/* ENA didn't have any completion */
			ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
			admin_queue->stats.no_completion++;
			admin_queue->running_state = false;
			ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);

			ret = ENA_COM_TIMER_EXPIRED;
			goto err;
		}

		/* <-- completion handling runs under q_lock */
		ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
		ena_com_handle_admin_completion(admin_queue);
		ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
	}

	if (unlikely(comp_ctx->status == ENA_CMD_ABORTED)) {
		ena_trc_err("Command was aborted\n");
		ENA_SPINLOCK_LOCK(admin_queue->q_lock, flags);
		admin_queue->stats.aborted_cmd++;
		ENA_SPINLOCK_UNLOCK(admin_queue->q_lock, flags);
		ret = ENA_COM_NO_DEVICE;
		goto err;
	}

	ENA_ASSERT(comp_ctx->status == ENA_CMD_COMPLETED,
		   "Invalid comp status %d\n", comp_ctx->status);

	ret = ena_com_comp_status_to_errno(comp_ctx->comp_status);
err:
	/* <-- release is NOT under q_lock */
	comp_ctxt_release(admin_queue, comp_ctx);
	return ret;
}

This is a case where two threads are executing admin commands. The
occupied flag is set to false in comp_ctxt_release(). Suppose there are
two consumers of completion contexts, C1 and C2. C1 holds a completion
context, and the same completion context can be handed to C2 before C1
has reset the occupied flag. This is because
ena_com_handle_admin_completion() is called under the spinlock, while
comp_ctxt_release() is not.
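
To make the idea concrete, here is a minimal sketch of what I mean by releasing the context under the same lock that is used to claim it. It is not the driver code: the demo_* types and functions are hypothetical stand-ins, a pthread mutex plays the role of ENA_SPINLOCK_LOCK/UNLOCK on q_lock, and the C1/C2 interleaving is replayed in a single thread so the outcome is deterministic.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the ENA structures; not the driver's types. */
struct demo_comp_ctx {
	bool occupied;                /* set while a consumer owns the context */
};

struct demo_admin_queue {
	pthread_mutex_t q_lock;       /* plays the role of admin_queue->q_lock */
	struct demo_comp_ctx ctx;     /* one slot is enough to show the window */
	unsigned int occupied_errors; /* claims rejected because of the flag */
};

static void demo_queue_init(struct demo_admin_queue *q)
{
	pthread_mutex_init(&q->q_lock, NULL);
	q->ctx.occupied = false;
	q->occupied_errors = 0;
}

/* Claim the context under the lock; NULL means it is still occupied. */
static struct demo_comp_ctx *demo_get_comp_ctx(struct demo_admin_queue *q)
{
	struct demo_comp_ctx *ctx = NULL;

	pthread_mutex_lock(&q->q_lock);
	if (q->ctx.occupied) {
		q->occupied_errors++;
	} else {
		q->ctx.occupied = true;
		ctx = &q->ctx;
	}
	pthread_mutex_unlock(&q->q_lock);
	return ctx;
}

/* Assumed change: clear the flag under the same lock used to claim it. */
static void demo_release_comp_ctx(struct demo_admin_queue *q,
				  struct demo_comp_ctx *ctx)
{
	pthread_mutex_lock(&q->q_lock);
	ctx->occupied = false;
	pthread_mutex_unlock(&q->q_lock);
}

/*
 * Replay the C1/C2 interleaving in one thread and return how many
 * claims were rejected because the context was still occupied.
 */
static unsigned int demo_replay(void)
{
	struct demo_admin_queue q;
	struct demo_comp_ctx *c1, *c2;

	demo_queue_init(&q);

	c1 = demo_get_comp_ctx(&q);    /* C1 submits a command */
	assert(c1 != NULL);

	c2 = demo_get_comp_ctx(&q);    /* C2 arrives before C1 has released */
	assert(c2 == NULL);

	demo_release_comp_ctx(&q, c1); /* C1 releases under the lock */

	c2 = demo_get_comp_ctx(&q);    /* C2 retries and gets the context */
	assert(c2 != NULL);
	demo_release_comp_ctx(&q, c2);

	pthread_mutex_destroy(&q.q_lock);
	return q.occupied_errors;
}
```

With this ordering the claim and the release are serialized by the same lock, so a consumer either sees the context as occupied or gets it only after the previous owner has cleared the flag. Whether the real driver needs this depends on how the command ID is chosen at submit time, which the sketch does not model.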

Thanks,
Param

On Thu, Oct 24, 2019 at 2:09 PM Michał Krawczyk <m...@semihalf.com> wrote:

> On Sat, 19 Oct 2019 at 20:26, kumaraparameshwaran rathinavel
> <kumaraparames...@gmail.com> wrote:
> >
> > Hi All,
> >
> > In the ENA poll mode driver I see that every request in the admin
> > queue is associated with a completion context, and this is
> > preallocated during the device initialisation. When the completion
> > context is used, we check whether occupied is true: in the 16.X
> > version we assert if the occupied flag is set, and in the latest
> > version I see that this is an error log. But there is a time window
> > in which the completion context is available to the other consumer
> > while the old consumer has not yet set occupied to false. The new
> > consumer holds the admin queue lock to get the completion context,
> > but the update by the old consumer to clear the occupied flag is not
> > done under the lock. So should we make sure that the new consumer
> > gets the completion context only when the occupied flag is set to
> > false? Any thoughts on this?
>
> Hi Param,
>
> Both the producer and the consumer are holding the spinlock while
> getting the completion context. If you see any situation where it
> isn't (besides the release function), please let me know.
> As it is protected by the lock, returning an error while the completion
> context is occupied (and it shouldn't be) is fine, as it will stop the
> admin queue and allow the DPDK user application to execute the reset
> of the device.
>
> Thanks,
> Michal
>
> > If required I can try to make a patch where the completion context
> > would be available only after setting the occupied flag to false.
> >
> > Thanks,
> > Param.