Hi Maxime,

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coque...@redhat.com]
> Sent: Friday, May 18, 2018 9:51 PM
> On 05/18/2018 03:01 PM, Dariusz Stojaczyk wrote:
> > rte_vhost is not vhost-user spec compliant. Some vhost drivers have
> > already been confirmed not to work with rte_vhost. virtio-user-scsi-pci
> > in QEMU 2.12 doesn't fully initialize its management queues at the
> > SeaBIOS stage. This is perfectly fine from the vhost-user spec
> > perspective, but doesn't meet rte_vhost expectations. rte_vhost waits
> > for all queues to be fully initialized before it allows the entire
> > device to be processed. Fixing rte_vhost directly would require quite
> > a large amount of changes, which would completely break backwards
> > compatibility.
> >
> > This rte_vhost2 library is intended to smooth out the transition.
> > It exposes a low-level API for implementing new vhost-user slaves.
> > The existing rte_vhost is about to be refactored to use the rte_vhost2
> > library underneath, and demanding backends could now use rte_vhost2
> > directly.
>
> I like the idea, and the proposed way to smooth the transition.
>
> I will certainly have other comments later, but please find below
> the ones I have for the moment.
>
> <snip>
>
> > +
> > +/**
> > + * Registers a new vhost target accepting remote connections.
> > + * Multiple transports are available. It is possible to create a
> > + * vhost-user Unix domain socket polling local connections, or to
> > + * connect to a physical Virtio device and install an interrupt
> > + * handler.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * \param trtype type of the transport used, e.g. "vhost-user",
> > + * "PCI-vhost-user", "PCI-vDPA".
> > + * \param trid identifier of the device. For PCI this would be the
> > + * BDF address, for vhost-user the socket name.
> > + * \param trflags additional options for the specified transport
> > + * \param trctx additional data for the specified transport. Can be
> > + * NULL.
> > + * \param tgt_ops callbacks to be called upon reaching specific
> > + * initialization states.
> > + * \param features supported Virtio features. To be negotiated with
> > + * the driver ones. rte_vhost2 will append a couple of generic
> > + * feature bits which are required by the Virtio spec. TODO list
> > + * these features here
> > + * \return 0 on success, negative errno otherwise
> > + */
> > +int rte_vhost2_tgt_register(const char *trtype, const char *trid,
> > +		uint64_t trflags, void *trctx,
> > +		struct rte_vhost2_tgt_ops *tgt_ops,
> > +		uint64_t features);
>
> Couldn't the register API also pass the vdev?
> Doing this, the backend could have rte_vhost2_dev in its device
> struct.
Please notice the register API is for registering targets, not devices.
A single vhost-user server target can spawn multiple devices - one for
each connection. I know the nomenclature is different from rte_vhost,
but since each connection uses its own (virt)queues, it makes sense to
call things this way.

Initially I thought about adding some rte_vhost2_tgt struct declaration
that the register function would return, but later came to the
conclusion that it would only complicate things for the library user.
A parent struct that would keep an rte_vhost2_tgt * needs to contain
`const char *trtype` and `const char *trid` anyway, so it's just easier
to use these two strings for target identification.

> <snip>
>
> > +/**
> > + * Bypass VIRTIO_F_IOMMU_PLATFORM and translate gpa directly.
> > + *
> > + * This function is thread-safe.
> > + *
> > + * \param mem vhost device memory
> > + * \param gpa guest physical address
> > + * \param len length of the memory to translate (in bytes). If the
> > + * requested memory chunk crosses a memory region boundary, *len will
> > + * be set to the remaining, maximum length of virtually contiguous
> > + * memory. In such a case the user will be required to call another
> > + * gpa_to_vva(gpa + *len).
> > + * \return vhost virtual address or NULL if the requested `gpa` is
> > + * not mapped.
> > + */
> > +static inline void *
> > +rte_vhost2_gpa_to_vva(struct rte_vhost2_memory *mem, uint64_t gpa,
> > +		uint64_t *len)
> > +{
> > +	struct rte_vhost2_mem_region *r;
> > +	uint32_t i;
> > +
> > +	for (i = 0; i < mem->nregions; i++) {
> > +		r = &mem->regions[i];
> > +		if (gpa >= r->guest_phys_addr &&
> > +				gpa < r->guest_phys_addr + r->size) {
> > +
> > +			if (unlikely(*len > r->guest_phys_addr +
> > +					r->size - gpa)) {
> > +				*len = r->guest_phys_addr + r->size - gpa;
> > +			}
> > +
> > +			return (void *)(uintptr_t)(gpa -
> > +					r->guest_phys_addr +
> > +					r->host_user_addr);
> > +		}
> > +	}
> > +	*len = 0;
> > +
> > +	return NULL;
> > +}
>
> Maybe we could take the opportunity to only have
> rte_vhost2_iova_to_vva.

Good idea; I will remove it in v3.

Thanks,
D.