Hi,
> -----Original Message-----
> From: Bie, Tiwei
> Sent: Monday, December 3, 2018 10:23 PM
> To: Wang, Xiao W <[email protected]>
> Cc: [email protected]; [email protected]; Wang, Zhihong
> <[email protected]>; Ye, Xiaolong <[email protected]>
> Subject: Re: [PATCH 2/9] vhost: provide helpers for virtio ring relay
>
> On Wed, Nov 28, 2018 at 05:46:00PM +0800, Xiao Wang wrote:
> [...]
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Synchronize the available ring from guest to mediate ring, help to
> > + * check desc validity to protect against malicious guest driver.
> > + *
> > + * @param vid
> > + * vhost device id
> > + * @param qid
> > + * vhost queue id
> > + * @param m_vring
> > + * mediate virtio ring pointer
> > + * @return
> > + * number of synced available entries on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring);
> > +
> > +/**
> > + * @warning
> > + * @b EXPERIMENTAL: this API may change without prior notice
> > + *
> > + * Synchronize the used ring from mediate ring to guest, log dirty
> > + * page for each Rx buffer used.
> > + *
> > + * @param vid
> > + * vhost device id
> > + * @param qid
> > + * vhost queue id
> > + * @param m_vring
> > + * mediate virtio ring pointer
> > + * @return
> > + * number of synced used entries on success, -1 on failure
> > + */
> > +int __rte_experimental
> > +rte_vdpa_relay_used_ring(int vid, int qid, struct vring *m_vring);
>
> Above APIs are split ring specific. We also need to take
> packed ring into consideration.
After some study of the current packed ring description, several ideas:
1. These APIs are helpers for setting up a mediate relay layer that does
dirty page logging, and we may not need this kind of ring relay for the
packed ring at all. The goal of the mediate SW layer is to help the
device do dirty page logging, so this SW-assisted vDPA tries to find a
way to intercept the frontend-backend communication. As you can see in
this patch set, SW captures the device interrupt, then parses the vring
and logs dirty pages afterwards. We set up the mediate vring to make
sure the relay SW can intercept the device interrupt; this way we
control the mediate vring's interrupt suppression structure.
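To sketch the interrupt path described above (struct and function names
are illustrative, not the patch code):

```c
#include <stdint.h>

#define VRING_AVAIL_F_NO_INTERRUPT 1

/* Header of the split ring's avail ring, where the interrupt
 * suppression flag lives. */
struct avail_hdr {
	uint16_t flags;
	uint16_t idx;
};

/* Relay side: keep interrupts enabled on the mediate copy so the
 * device always kicks the relay first. */
static inline void mediate_enable_interrupts(struct avail_hdr *mediate)
{
	mediate->flags &= ~VRING_AVAIL_F_NO_INTERRUPT;
}

/* After logging, forward the interrupt to the guest only if the
 * guest's own ring did not suppress it. */
static inline int relay_should_notify_guest(const struct avail_hdr *guest)
{
	return !(guest->flags & VRING_AVAIL_F_NO_INTERRUPT);
}
```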
2. One new point about the packed ring is that it separates the event
suppression structure from the descriptor ring. In that case we can
just set up a mediate event suppression structure to intercept event
notifications.
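A rough sketch of that idea (structure layout per the virtio 1.1 spec;
the helper is illustrative only):

```c
#include <stdint.h>

#define RING_EVENT_FLAGS_ENABLE  0x0	/* always notify */
#define RING_EVENT_FLAGS_DISABLE 0x1	/* never notify */

/* Standalone event suppression area of a packed virtqueue. */
struct vring_packed_desc_event {
	uint16_t off_wrap;	/* desc offset (15 bits) + wrap counter */
	uint16_t flags;		/* RING_EVENT_FLAGS_* */
};

/* The relay hands the device its own copy kept in "enable" mode, so
 * every device event is intercepted before being relayed to the guest. */
static inline void mediate_event_enable(struct vring_packed_desc_event *ev)
{
	ev->flags = RING_EVENT_FLAGS_ENABLE;
}
```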
BTW, one troublesome point I find about the packed ring is that it is
hard for a mediate SW to handle the "buffer id" quickly. The guest
virtio driver understands this id well and keeps internal info about
each id, e.g. chain length, but the relay SW has to parse the packed
ring again, which is not efficient.
3. With a split vring, the relay SW reuses the guest desc ring, and the
desc is not written by DMA, so there is no logging for the desc. But
with a packed vring the desc is written by DMA, so logging the desc
ring is a new requirement.
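That new desc logging could look roughly like this (the 4K page size,
the gpa-based interface, and all names are assumptions for
illustration):

```c
#include <stddef.h>
#include <stdint.h>

#define RELAY_PAGE_SIZE 4096ULL

/* When the device DMA-writes a used element back into the packed desc
 * ring, mark every guest page that write touches as dirty. Returns the
 * number of pages logged; log_page may be NULL. */
static int log_desc_writeback(uint64_t desc_ring_gpa, uint32_t desc_idx,
			      uint32_t desc_size,
			      void (*log_page)(uint64_t pfn))
{
	uint64_t start = desc_ring_gpa + (uint64_t)desc_idx * desc_size;
	uint64_t end = start + desc_size;
	uint64_t gpa;
	int pages = 0;

	for (gpa = start & ~(RELAY_PAGE_SIZE - 1); gpa < end;
	     gpa += RELAY_PAGE_SIZE) {
		if (log_page != NULL)
			log_page(gpa / RELAY_PAGE_SIZE);
		pages++;
	}
	return pages;
}
```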
The packed ring is quite different; it may need a very different
mechanism rather than following a vring relay API. Also, from a
testing point of view, if we come up with a new, efficient
implementation for packed ring vDPA, it is hard to test without HW.
Testing needs HW that supports packed ring DMA and the
get_vring_base/set_vring_base interface.
>
> > #endif /* _RTE_VDPA_H_ */
> [...]
> > diff --git a/lib/librte_vhost/vdpa.c b/lib/librte_vhost/vdpa.c
> > index e7d849ee0..e41117776 100644
> > --- a/lib/librte_vhost/vdpa.c
> > +++ b/lib/librte_vhost/vdpa.c
> > @@ -122,3 +122,176 @@ rte_vdpa_get_device_num(void)
> > {
> > return vdpa_device_num;
> > }
> > +
> > +static int
> > +invalid_desc_check(struct virtio_net *dev, struct vhost_virtqueue *vq,
> > + uint64_t desc_iova, uint64_t desc_len, uint8_t perm)
> > +{
> > + uint64_t desc_addr, desc_chunck_len;
> > +
> > + while (desc_len) {
> > + desc_chunck_len = desc_len;
> > + desc_addr = vhost_iova_to_vva(dev, vq,
> > + desc_iova,
> > + &desc_chunck_len,
> > + perm);
> > +
> > + if (!desc_addr)
> > + return -1;
> > +
> > + desc_len -= desc_chunck_len;
> > + desc_iova += desc_chunck_len;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +int
> > +rte_vdpa_relay_avail_ring(int vid, int qid, struct vring *m_vring)
> > +{
> > + struct virtio_net *dev = get_device(vid);
> > + uint16_t idx, idx_m, desc_id;
> > + struct vring_desc desc;
> > + struct vhost_virtqueue *vq;
> > + struct vring_desc *desc_ring;
> > + struct vring_desc *idesc = NULL;
> > + uint64_t dlen;
> > + int ret;
> > +
> > + if (!dev)
> > + return -1;
> > +
> > + vq = dev->virtqueue[qid];
>
> Better to also validate qid.
>
> > + idx = vq->avail->idx;
> > + idx_m = m_vring->avail->idx;
> > + ret = idx - idx_m;
>
> Need to cast (idx - idx_m) to uint16_t.
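Good catch; a minimal illustration of the wrap case (the helper name
and the values are made up):

```c
#include <stdint.h>

/* The cast makes the subtraction well-defined across the 65535 -> 0
 * wrap of the free-running avail index. */
static inline uint16_t avail_entries(uint16_t idx, uint16_t idx_m)
{
	return (uint16_t)(idx - idx_m);
}
```

With idx = 2 (guest side already wrapped) and idx_m = 65534, the plain
int subtraction yields -65532, while the cast recovers the real count
of 4.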
>
> > +
> > + while (idx_m != idx) {
> > + /* avail entry copy */
> > + desc_id = vq->avail->ring[idx_m % vq->size];
>
> idx_m & (vq->size - 1) should be faster.
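Right; that works because split-ring sizes are required by the virtio
spec to be powers of two, e.g.:

```c
#include <stdint.h>

/* Equivalent to idx % size only when size is a power of two, which
 * the virtio spec guarantees for split-ring sizes. */
static inline uint16_t ring_slot(uint16_t idx, uint16_t size)
{
	return idx & (size - 1);
}
```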
>
> > + m_vring->avail->ring[idx_m % vq->size] = desc_id;
> > + desc_ring = vq->desc;
> > +
> > + if (vq->desc[desc_id].flags & VRING_DESC_F_INDIRECT) {
> > + dlen = vq->desc[desc_id].len;
> > + desc_ring = (struct vring_desc *)(uintptr_t)
> > + vhost_iova_to_vva(dev, vq, vq->desc[desc_id].addr,
>
> The indent needs to be fixed.
>
> > + &dlen,
> > + VHOST_ACCESS_RO);
> > + if (unlikely(!desc_ring))
> > + return -1;
> > +
> > + if (unlikely(dlen < vq->desc[idx].len)) {
> > + idesc = alloc_copy_ind_table(dev, vq,
> > + vq->desc[idx].addr, vq->desc[idx].len);
> > + if (unlikely(!idesc))
> > + return -1;
> > +
> > + desc_ring = idesc;
> > + }
> > +
> > + desc_id = 0;
> > + }
> > +
> > + /* check if the buf addr is within the guest memory */
> > + do {
> > + desc = desc_ring[desc_id];
> > + if (invalid_desc_check(dev, vq, desc.addr, desc.len,
> > + VHOST_ACCESS_RW))
>
> Should check with < 0, otherwise should return bool.
>
> We may just have RO access.
The desc may refer to a transmit buffer as well as a receive buffer.
Agreed on the comments and the nice catches elsewhere above; will send
a new version.
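For illustration (the constants are assumed to mirror the vhost/virtio
definitions, and the helper is not in the patch): the direction is
encoded per descriptor, so the permission to check could even be
derived from VRING_DESC_F_WRITE instead of being fixed:

```c
#include <stdint.h>

#define VRING_DESC_F_WRITE 2	/* device writes this buffer (Rx) */

#define VHOST_ACCESS_RO 1
#define VHOST_ACCESS_WO 2

/* Rx buffers are DMA-written, so they need write access; Tx buffers
 * only need read access. */
static inline uint8_t desc_access_perm(uint16_t desc_flags)
{
	return (desc_flags & VRING_DESC_F_WRITE) ?
			VHOST_ACCESS_WO : VHOST_ACCESS_RO;
}
```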
[...]
BRs,
Xiao