On 9/26/23 00:24, Michael S. Tsirkin wrote: > On Tue, Sep 26, 2023 at 12:13:11AM +0200, Ilya Maximets wrote: >> On 9/25/23 23:24, Michael S. Tsirkin wrote: >>> On Mon, Sep 25, 2023 at 10:58:05PM +0200, Ilya Maximets wrote: >>>> On 9/25/23 17:38, Stefan Hajnoczi wrote: >>>>> On Mon, 25 Sept 2023 at 11:36, Ilya Maximets <i.maxim...@ovn.org> wrote: >>>>>> >>>>>> On 9/25/23 17:12, Stefan Hajnoczi wrote: >>>>>>> On Mon, 25 Sept 2023 at 11:02, Ilya Maximets <i.maxim...@ovn.org> wrote: >>>>>>>> >>>>>>>> On 9/25/23 16:23, Stefan Hajnoczi wrote: >>>>>>>>> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets <i.maxim...@ovn.org> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> We do not need the most up to date number of heads, we only want to >>>>>>>>>> know if there is at least one. >>>>>>>>>> >>>>>>>>>> Use shadow variable as long as it is not equal to the last available >>>>>>>>>> index checked. This avoids expensive qatomic dereference of the >>>>>>>>>> RCU-protected memory region cache as well as the memory access itself >>>>>>>>>> and the subsequent memory barrier. >>>>>>>>>> >>>>>>>>>> The change improves performance of the af-xdp network backend by >>>>>>>>>> 2-3%. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org> >>>>>>>>>> --- >>>>>>>>>> hw/virtio/virtio.c | 10 +++++++++- >>>>>>>>>> 1 file changed, 9 insertions(+), 1 deletion(-) >>>>>>>>>> >>>>>>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c >>>>>>>>>> index 309038fd46..04bf7cc977 100644 >>>>>>>>>> --- a/hw/virtio/virtio.c >>>>>>>>>> +++ b/hw/virtio/virtio.c >>>>>>>>>> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const >>>>>>>>>> VirtQueueElement *elem, >>>>>>>>>> /* Called within rcu_read_lock(). */ >>>>>>>>>> static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx) >>>>>>>>>> { >>>>>>>>>> - uint16_t num_heads = vring_avail_idx(vq) - idx; >>>>>>>>>> + uint16_t num_heads; >>>>>>>>>> + >>>>>>>>>> + if (vq->shadow_avail_idx != idx) { >>>>>>>>>> + num_heads = vq->shadow_avail_idx - idx; >>>>>>>>>> + >>>>>>>>>> + return num_heads; >>>>>>>>> >>>>>>>>> This still needs to check num_heads > vq->vring.num and return -EINVAL >>>>>>>>> as is done below. >>>>>>>> >>>>>>>> Hmm, yeas, you're right. If the value was incorrect initially, the >>>>>>>> shadow >>>>>>>> will be incorrect. However, I think we should just not return here in >>>>>>>> this >>>>>>>> case and let vring_avail_idx() to grab an actual new value below. >>>>>>>> Otherwise >>>>>>>> we may never break out of this error. >>>>>>>> >>>>>>>> Does that make sense? >>>>>>> >>>>>>> No, because virtio_error() marks the device as broken. The device >>>>>>> requires a reset in order to function again. Fetching >>>>>>> vring_avail_idx() again won't help. >>>>>> >>>>>> OK, I see. In this case we're talking about situation where >>>>>> vring_avail_idx() was called in some other place and stored a bad value >>>>>> in the shadow variable, then virtqueue_num_heads() got called. Right? >>>> >>>> Hmm, I suppose we also need a read barrier after all even if we use >>>> a shadow index. Assuming the index is correct, but the shadow variable >>>> was updated by a call outside of this function, then we may miss a >>>> barrier and read the descriptor out of order, in theory. Read barrier >>>> is going to be a compiler barrier on x86, so the performance gain from >>>> this patch should still be mostly there. I'll test that. >>> >>> I can't say I understand generally. shadow is under qemu control, >>> I don't think it can be updated concurrently by multiple CPUs. >> >> It can't, I agree. Scenario I'm thinking about is the following: >> >> 1. vring_avail_idx() is called from one of the places other than >> virtqueue_num_heads(). Shadow is updated with the current value. >> Some users of vring_avail_idx() do not use barriers after the call. >> >> 2. virtqueue_split_get_avail_bytes() is called. >> >> 3. virtqueue_split_get_avail_bytes() calls virtqueue_num_heads(). >> >> 4. virtqueue_num_heads() checks the shadow and returns early. >> >> 5. virtqueue_split_get_avail_bytes() calls vring_split_desc_read() and >> reads the descriptor. >> >> If between steps 1 and 5 we do not have a read barrier, we potentially >> risk reading descriptor data that is not yet fully written, because >> there is no guarantee that reading the last_avail_idx on step 1 wasn't >> reordered with the descriptor read. >> >> In current code we always have smp_rmb() in virtqueue_num_heads(). >> But if we return from this function without a barrier, we may have an >> issue, IIUC. >> >> I agree that it's kind of a very unlikely scenario and we will probably >> have a control dependency between steps 1 and 5 that will prevent the >> issue, but it might be safer to just have an explicit barrier in >> virtqueue_num_heads(). >> >> Does that make sense? Or am I missing something else here? > > Aha, got it. Good point, yes. Pls document in a code comment.
Done. Posted v2: https://lore.kernel.org/qemu-devel/20230927135157.2316982-1-i.maxim...@ovn.org/ > > >>> >>> >>>>>> >>>>>> AFAIU, we can still just fall through here and let vring_avail_idx() >>>>>> to read the index again and fail the existing check. That would happen >>>>>> today without this patch applied. >>>>> >>>>> Yes, that is fine. >>>>> >>>>>> >>>>>> I'm jut trying to avoid duplication of the virtio_error call, i.e.: >>>>>> >>>>>> if (vq->shadow_avail_idx != idx) { >>>>>> num_heads = vq->shadow_avail_idx - idx; >>>>>> >>>>>> /* Check it isn't doing very strange things with descriptor >>>>>> numbers. */ >>>>>> if (num_heads > vq->vring.num) { >>>>>> virtio_error(vq->vdev, "Guest moved used index from %u to >>>>>> %u", >>>>>> idx, vq->shadow_avail_idx); >>>>>> return -EINVAL; >>>>>> } >>>>>> return num_heads; >>>>>> } >>>>>> >>>>>> vs >>>>>> >>>>>> if (vq->shadow_avail_idx != idx) { >>>>>> num_heads = vq->shadow_avail_idx - idx; >>>>>> >>>>>> /* Only use the shadow value if it was good initially. */ >>>>>> if (num_heads <= vq->vring.num) { >>>>>> return num_heads; >>>>>> } >>>>>> } >>>>>> >>>>>> >>>>>> What do you think? >>>>> >>>>> Sounds good. >>>>> >>>>>> >>>>>> Best regards, Ilya Maximets. >>> >