On Tue, May 21, 2019 at 06:05:31AM -0400, Michael S. Tsirkin wrote: > On Tue, May 21, 2019 at 11:44:07AM +0200, Stefano Garzarella wrote: > > Hi Micheal, Jason, > > as suggested by Stefan, I'm checking if we have some races in the > > virtio-vsock driver. We found some races in the .probe() and .remove() > > with the upper layer (socket) and I'll fix it. > > > > Now my attention is on the bottom layer (virtio device) and my question is: > > during the .remove() of virtio-vsock driver (virtio_vsock_remove), could > > happen > > that an IRQ comes and one of our callback (e.g. virtio_vsock_rx_done()) is > > executed, queueing new works? > > > > I tried to follow the code in both cases (device unplugged or module > > removed) > > and maybe it couldn't happen because we remove it from bus's knowledge, > > but I'm not sure and your advice would be very helpful. > > > > Thanks in advance, > > Stefano > > > Great question! This should be better documented: patches welcome!
When I'm clear, I'll be happy to document this. > > Here's my understanding: > > > A typical removal flow works like this: > > - prevent linux from sending new kick requests to device > and flush such outstanding requests if any > (device can still send notifications to linux) > > - call > vi->vdev->config->reset(vi->vdev); > this will flush all device writes and interrupts. > device will not use any more buffers. > previously outstanding callbacks might still be active. > > - Then call > vdev->config->del_vqs(vdev); > to flush outstanding callbacks if any. Thanks for sharing these useful information. So, IIUC between step 1 (e.g. in virtio-vsock we flush all work-queues) and step 2, new IRQs could happen, and in the virtio-vsock driver new work will be queued. In order to handle this case, I'm thinking to add a new variable 'work_enabled' in the struct virtio_vsock, put it to false at the start of the .remove(), then call synchronize_rcu() before to flush all work queues and use an helper function virtio_transport_queue_work() to queue a new work, where the check of work_enabled and the queue_work are in the RCU read critical section. Here a pseudo code to explain better the idea: virtio_vsock_remove() { vsock->work_enabled = false; /* Wait for other CPUs to finish to queue works */ synchronize_rcu(); flush_works(); vdev->config->reset(vdev); ... vdev->config->del_vqs(vdev); } virtio_vsock_queue_work(vsock, work) { rcu_read_lock(); if (!vsock->work_enabled) { goto out; } queue_work(virtio_vsock_workqueue, work); out: rcu_read_unlock(); } Do you think can work? Please tell me if there is a better way to handle this case. Thanks, Stefano