at 1:51 PM, Klaus Jensen <i...@irrelevant.dk> wrote:

> On Jul  6 19:34, Jinhao Fan wrote:
>> at 2:43 AM, Keith Busch <kbu...@kernel.org> wrote:
>> 
>>> On Tue, Jul 05, 2022 at 07:11:36PM +0200, Klaus Jensen wrote:
>>>> On Jul  5 22:24, Jinhao Fan wrote:
>>>>> @@ -1374,7 +1374,14 @@ static void nvme_enqueue_req_completion(NvmeCQueue 
>>>>> *cq, NvmeRequest *req)
>>>>> 
>>>>>    QTAILQ_REMOVE(&req->sq->out_req_list, req, entry);
>>>>>    QTAILQ_INSERT_TAIL(&cq->req_list, req, entry);
>>>>> -    timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 500);
>>>>> +
>>>>> +    if (req->sq->ioeventfd_enabled) {
>>>>> +        /* Post CQE directly since we are in main loop thread */
>>>>> +        nvme_post_cqes(cq);
>>>>> +    } else {
>>>>> +        /* Schedule the timer to post CQE later since we are in vcpu 
>>>>> thread */
>>>>> +        timer_mod(cq->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 
>>>>> 500);
>>>>> +    }
>>>> 
>>>> Actually, we are only in the vcpu thread if we come here from
>>>> nvme_process_db that in very rare circumstances may enqueue the
>>>> completion of an AER due to an invalid doorbell write.
>>>> 
>>>> In general, nvme_enqueue_req_completion is only ever called from the
>>>> main iothread. Which actually causes me to wonder why we defer this work
>>>> in the first place. It does have the benefit that we queue up several
>>>> completions before posting them in one go and raising the interrupt.
>>>> But I wonder if that could be handled better.
>>> 
>>> I think the timer is used because of the cq_full condition. We need to 
>>> restart
>>> completions when it becomes not full, which requires a doorbell write. 
>>> Having
>>> everyone from the main iothread use the same timer as the doorbell handler 
>>> just
>>> ensures single threaded list access.
>> 
>> Could we let nvme_process_aers register another timer/BH to trigger
>> nvme_enqueue_req_completion in the iothread? In this way we won’t need the
>> timer_mod in nvme_enqueue_req_completion.
> 
> Yes, we could have process_aers in a timer. Which would probably be
> preferable in order to limit the amount of work the mmio handler is
> doing in that rare case. However, its such a rare case (only misbehaving
> drivers) that it's probably not worth optimizing for.

I think putting nvme_process_aers in a timer can help make sure
nvme_enqueue_req_completion is only called in an iothread context. Currently
it can be called either in iothread or vcpu thread. So
nvme_enqueue_req_completion has to infer its context, in my patch using the
cq->ioeventfd_enabled flag. This approach is probably error-prone. Honestly
I am not sure whether cq->ioeventfd_enabled is enough to guarantee we are in
iothread.

>> We can also avoid some potential currency problems because CQ is only
>> modified in the iothread.
> 
> There are currently no concurrency problems because of the Big QEMU
> Lock. When the mmio handler is running, the vcpu holds the BQL (and
> whenever the main iothread is running, it is holding the BQL).

Will this still hold when we move on to iothreads?

> 
>> BTW, are there any reason that we must use timers (not BH) here? Also why do
>> we choose to delay for 500ns?
> 
> No particular reason. do not see any reason why this could not be bottom
> halfs. This will likely change into bhs when we add iothread support
> anyway.

I will try BH when I start the iothread work.


Reply via email to