On 9/26/23 00:24, Michael S. Tsirkin wrote:
> On Tue, Sep 26, 2023 at 12:13:11AM +0200, Ilya Maximets wrote:
>> On 9/25/23 23:24, Michael S. Tsirkin wrote:
>>> On Mon, Sep 25, 2023 at 10:58:05PM +0200, Ilya Maximets wrote:
>>>> On 9/25/23 17:38, Stefan Hajnoczi wrote:
>>>>> On Mon, 25 Sept 2023 at 11:36, Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>>
>>>>>> On 9/25/23 17:12, Stefan Hajnoczi wrote:
>>>>>>> On Mon, 25 Sept 2023 at 11:02, Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>>>>
>>>>>>>> On 9/25/23 16:23, Stefan Hajnoczi wrote:
>>>>>>>>> On Fri, 25 Aug 2023 at 13:04, Ilya Maximets <i.maxim...@ovn.org> wrote:
>>>>>>>>>>
>>>>>>>>>> We do not need the most up-to-date number of heads; we only want to
>>>>>>>>>> know if there is at least one.
>>>>>>>>>>
>>>>>>>>>> Use shadow variable as long as it is not equal to the last available
>>>>>>>>>> index checked.  This avoids expensive qatomic dereference of the
>>>>>>>>>> RCU-protected memory region cache as well as the memory access itself
>>>>>>>>>> and the subsequent memory barrier.
>>>>>>>>>>
>>>>>>>>>> The change improves performance of the af-xdp network backend by 2-3%.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Ilya Maximets <i.maxim...@ovn.org>
>>>>>>>>>> ---
>>>>>>>>>>  hw/virtio/virtio.c | 10 +++++++++-
>>>>>>>>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
>>>>>>>>>> index 309038fd46..04bf7cc977 100644
>>>>>>>>>> --- a/hw/virtio/virtio.c
>>>>>>>>>> +++ b/hw/virtio/virtio.c
>>>>>>>>>> @@ -999,7 +999,15 @@ void virtqueue_push(VirtQueue *vq, const VirtQueueElement *elem,
>>>>>>>>>>  /* Called within rcu_read_lock().  */
>>>>>>>>>>  static int virtqueue_num_heads(VirtQueue *vq, unsigned int idx)
>>>>>>>>>>  {
>>>>>>>>>> -    uint16_t num_heads = vring_avail_idx(vq) - idx;
>>>>>>>>>> +    uint16_t num_heads;
>>>>>>>>>> +
>>>>>>>>>> +    if (vq->shadow_avail_idx != idx) {
>>>>>>>>>> +        num_heads = vq->shadow_avail_idx - idx;
>>>>>>>>>> +
>>>>>>>>>> +        return num_heads;
>>>>>>>>>
>>>>>>>>> This still needs to check num_heads > vq->vring.num and return -EINVAL
>>>>>>>>> as is done below.
>>>>>>>>
>>>>>>>> Hmm, yeah, you're right.  If the value was incorrect initially, the
>>>>>>>> shadow will be incorrect.  However, I think we should just not return
>>>>>>>> here in this case and let vring_avail_idx() grab an actual new value
>>>>>>>> below.  Otherwise we may never break out of this error.
>>>>>>>>
>>>>>>>> Does that make sense?
>>>>>>>
>>>>>>> No, because virtio_error() marks the device as broken. The device
>>>>>>> requires a reset in order to function again. Fetching
>>>>>>> vring_avail_idx() again won't help.
>>>>>>
>>>>>> OK, I see.  In this case we're talking about situation where
>>>>>> vring_avail_idx() was called in some other place and stored a bad value
>>>>>> in the shadow variable, then virtqueue_num_heads() got called.  Right?
>>>>
>>>> Hmm, I suppose we also need a read barrier after all even if we use
>>>> a shadow index.  Assuming the index is correct, but the shadow variable
>>>> was updated by a call outside of this function, then we may miss a
>>>> barrier and read the descriptor out of order, in theory.  Read barrier
>>>> is going to be a compiler barrier on x86, so the performance gain from
>>>> this patch should still be mostly there.  I'll test that.
>>>
>>> I can't say I understand generally. shadow is under qemu control,
>>> I don't think it can be updated concurrently by multiple CPUs.
>>
>> It can't, I agree.  Scenario I'm thinking about is the following:
>>
>> 1. vring_avail_idx() is called from one of the places other than
>>    virtqueue_num_heads().  Shadow is updated with the current value.
>>    Some users of vring_avail_idx() do not use barriers after the call.
>>
>> 2. virtqueue_split_get_avail_bytes() is called.
>>
>> 3. virtqueue_split_get_avail_bytes() calls virtqueue_num_heads().
>>
>> 4. virtqueue_num_heads() checks the shadow and returns early.
>>
>> 5. virtqueue_split_get_avail_bytes() calls vring_split_desc_read() and
>>    reads the descriptor.
>>
>> If between steps 1 and 5 we do not have a read barrier, we potentially
>> risk reading descriptor data that is not yet fully written, because
>> there is no guarantee that reading the last_avail_idx on step 1 wasn't
>> reordered with the descriptor read.
>>
>> In current code we always have smp_rmb() in virtqueue_num_heads().
>> But if we return from this function without a barrier, we may have an
>> issue, IIUC.
>>
>> I agree that it's kind of a very unlikely scenario and we will probably
>> have a control dependency between steps 1 and 5 that will prevent the
>> issue, but it might be safer to just have an explicit barrier in
>> virtqueue_num_heads().
>>
>> Does that make sense?  Or am I missing something else here?
> 
> Aha, got it. Good point, yes. Pls document in a code comment.

Done.  Posted v2:
https://lore.kernel.org/qemu-devel/20230927135157.2316982-1-i.maxim...@ovn.org/
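
[Editor's note: the reordering hazard discussed above can be sketched in
miniature.  This is not the actual QEMU code — `VirtQueueSketch`,
`num_heads_fast_path()`, and the `smp_rmb()` stand-in below are simplified,
hypothetical names for illustration only — it just shows where a read
barrier would sit on the shadow fast path so that a descriptor read cannot
be reordered before the avail-index load from step 1 of the scenario.]

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for QEMU's smp_rmb(); on x86 this compiles down to a
 * compiler barrier, which is why the fast path stays cheap. */
#define smp_rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)

/* Hypothetical, minimal stand-in for VirtQueue with only the field
 * under discussion. */
typedef struct {
    uint16_t shadow_avail_idx;  /* last avail index read from the guest */
} VirtQueueSketch;

/* Shadow fast path only: even when the cached index says heads are
 * available, a read barrier must still execute before any descriptor
 * read, because the shadow may have been refreshed by a
 * vring_avail_idx() call that was not followed by a barrier. */
static int num_heads_fast_path(VirtQueueSketch *vq, uint16_t idx)
{
    if (vq->shadow_avail_idx != idx) {
        uint16_t num_heads = (uint16_t)(vq->shadow_avail_idx - idx);

        /* Order the earlier avail-index load before descriptor reads. */
        smp_rmb();
        return num_heads;
    }
    return 0;  /* the real code re-reads the avail index here */
}
```

The barrier pairs with the guest's write barrier between filling in a
descriptor and publishing the new avail index.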

> 
> 
>>>
>>>
>>>>>>
>>>>>> AFAIU, we can still just fall through here and let vring_avail_idx()
>>>>>> read the index again and fail the existing check.  That would happen
>>>>>> today without this patch applied.
>>>>>
>>>>> Yes, that is fine.
>>>>>
>>>>>>
>>>>>> I'm just trying to avoid duplication of the virtio_error call, i.e.:
>>>>>>
>>>>>>     if (vq->shadow_avail_idx != idx) {
>>>>>>         num_heads = vq->shadow_avail_idx - idx;
>>>>>>
>>>>>>         /* Check it isn't doing very strange things with descriptor numbers. */
>>>>>>         if (num_heads > vq->vring.num) {
>>>>>>             virtio_error(vq->vdev, "Guest moved used index from %u to %u",
>>>>>>                          idx, vq->shadow_avail_idx);
>>>>>>             return -EINVAL;
>>>>>>         }
>>>>>>         return num_heads;
>>>>>>     }
>>>>>>
>>>>>> vs
>>>>>>
>>>>>>     if (vq->shadow_avail_idx != idx) {
>>>>>>         num_heads = vq->shadow_avail_idx - idx;
>>>>>>
>>>>>>         /* Only use the shadow value if it was good initially. */
>>>>>>         if (num_heads <= vq->vring.num) {
>>>>>>             return num_heads;
>>>>>>         }
>>>>>>     }
>>>>>>
>>>>>>
>>>>>> What do you think?
>>>>>
>>>>> Sounds good.
>>>>>
>>>>>>
>>>>>> Best regards, Ilya Maximets.
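
[Editor's note: the agreed-upon second variant can be exercised in
isolation.  This is a hedged sketch, not QEMU code — `vq_state` and
`shadow_num_heads()` are hypothetical names, and the slow path that would
re-read vring_avail_idx() is reduced to a sentinel return.  It shows why
the `num_heads <= vq->vring.num` guard is needed while uint16_t
arithmetic still handles normal index wraparound on its own.]

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant VirtQueue fields. */
struct vq_state {
    uint16_t shadow_avail_idx;  /* cached guest avail index */
    unsigned int num;           /* ring size, e.g. 256 */
};

/* Sketch of the "only use the shadow value if it was good" variant. */
static int shadow_num_heads(const struct vq_state *vq, uint16_t idx)
{
    if (vq->shadow_avail_idx != idx) {
        /* uint16_t subtraction is wraparound-safe by itself... */
        uint16_t num_heads = (uint16_t)(vq->shadow_avail_idx - idx);

        /* ...but a jump larger than the ring means the shadow was bad;
         * fall through so the caller re-reads the real avail index and
         * the existing check raises virtio_error(). */
        if (num_heads <= vq->num) {
            return num_heads;
        }
    }
    return -1;  /* sentinel: take the slow path */
}
```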
>>>
> 

