[dpdk-dev] [PATCH v6 2/2] vhost: Add VHOST PMD

Tetsuya Mukawa Wed, 3 Feb 2016 11:13:44 +0900

On 2016/02/03 8:43, Ferruh Yigit wrote:
> On Tue, Feb 02, 2016 at 08:18:42PM +0900, Tetsuya Mukawa wrote:
>> diff --git a/doc/guides/nics/index.rst b/doc/guides/nics/index.rst
>> index 33c9cea..5819cdb 100644
>> --- a/doc/guides/nics/index.rst
>> +++ b/doc/guides/nics/index.rst
>> @@ -47,6 +47,7 @@ Network Interface Controller Drivers
>>      nfp
>>      szedata2
>>      virtio
>> +    vhost
>>      vmxnet3
>>      pcap_ring
>>  
>> diff --git a/doc/guides/rel_notes/release_2_2.rst 
>> b/doc/guides/rel_notes/release_2_2.rst
> Should this be 2.3 release notes?


Hi Ferruh,

Thanks for your checking.
Yes, it should be. I will fix it.

>> +
>> +struct pmd_internal {
>> +    TAILQ_ENTRY(pmd_internal) next;
>> +    char *dev_name;
>> +    char *iface_name;
>> +    unsigned nb_rx_queues;
>> +    unsigned nb_tx_queues;
>> +    uint8_t port_id;
> nb_rx_queuues, nb_tx_queues and port_id are duplicated in rte_eth_dev_data, 
> there is no reason to not use them but create new ones.
> But you may need to keep list of eth devices instead of internals_list for 
> this update, not sure.
> please check: http://dpdk.org/dev/patchwork/patch/10284/

It seems I wrongly understand how to use queues in virtual PMD, then
duplicates some values.
Sure, I will follow above patch, and fix all same issues in my patch.
Also will check Null PMD fixing.

>> +    for (i = 0; i < internal->nb_rx_queues; i++) {
>> +            vq = internal->rx_vhost_queues[i];
>> +            if (vq == NULL)
>> +                    continue;
>> +            vq->device = dev;
>> +            vq->internal = internal;
>> +            rte_vhost_enable_guest_notification(
>> +                            dev, vq->virtqueue_id, 0);
> syntax: no line wrap required here

Will fix.

>
>> +    }
>> +    for (i = 0; i < internal->nb_tx_queues; i++) {
>> +            vq = internal->tx_vhost_queues[i];
>> +            if (vq == NULL)
>> +                    continue;
>> +            vq->device = dev;
>> +            vq->internal = internal;
>> +            rte_vhost_enable_guest_notification(
>> +                            dev, vq->virtqueue_id, 0);
> syntax: no line wrap required here

Will fix it also.

>> +
>> +static int
>> +vring_state_changed(struct virtio_net *dev, uint16_t vring, int enable)
>> +{
>> +    struct rte_vhost_vring_state *state;
>> +    struct pmd_internal *internal;
>> +#ifdef RTE_LIBRTE_VHOST_NUMA
>> +    int newnode, ret;
>> +#endif
>> +
>> +    if (dev == NULL) {
>> +            RTE_LOG(ERR, PMD, "Invalid argument\n");
>> +            return -1;
>> +    }
>> +
>> +    internal = find_internal_resource(dev->ifname);
> Can find_internal_resource() return NULL here?

Will add null pointer checking here.

>> +    state = vring_states[internal->port_id];
>> +    if (!state) {
>> +            RTE_LOG(ERR, PMD, "Unused port\n");
>> +            return -1;
>> +    }
>> +
>> +#ifdef RTE_LIBRTE_VHOST_NUMA
>> +    ret  = get_mempolicy(&newnode, NULL, 0, dev,
>> +                    MPOL_F_NODE | MPOL_F_ADDR);
>> +    if (ret < 0) {
>> +            RTE_LOG(ERR, PMD, "Unknow numa node\n");
>> +            return -1;
>> +    }
>> +
>> +    rte_eth_devices[internal->port_id].data->numa_node = newnode;
> Isn't dev->priv already has eth_dev, do we need to access to eth_dev as 
> rte_eth_devices[...]
> valid for above, instead of find_internal_resource() can't we use 
> dev->priv->data->dev_private

'dev->priv' will be filled in new_device(). And this event will be come
before calling new_device().
Because of this, the function accesses rte_eth_devices[].

>
>> +static int
>> +eth_rx_queue_setup(struct rte_eth_dev *dev, uint16_t rx_queue_id,
>> +               uint16_t nb_rx_desc __rte_unused,
>> +               unsigned int socket_id,
>> +               const struct rte_eth_rxconf *rx_conf __rte_unused,
>> +               struct rte_mempool *mb_pool)
>> +{
>> +    struct pmd_internal *internal = dev->data->dev_private;
>> +    struct vhost_queue *vq;
>> +
>> +    rte_free(internal->rx_vhost_queues[rx_queue_id]);
> Why free here, initially this value already should be NULL?
> If possible to call queue_setup multiple times, does make sense to free in 
> rx/tx_queue_release() functions?

Yes, you are right.
I forgot to delete debug code.
Will remove it.

>> +
>> +    vq = rte_zmalloc_socket(NULL, sizeof(struct vhost_queue),
>> +                    RTE_CACHE_LINE_SIZE, socket_id);
>> +    if (vq == NULL) {
>> +            RTE_LOG(ERR, PMD, "Failed to allocate memory for rx queue\n");
>> +            return -ENOMEM;
>> +    }
> Other vPMDs use static memory for queues in internals struct to escape from 
> allocate/free with a cost of memory consumption
> Just another option if you prefer.

This is because queues may be in different numa node in vhost PMD case.

>> +    dev_info->min_rx_bufsize = 0;
>> +}
>> +
>> +static void
>> +eth_stats_get(struct rte_eth_dev *dev, struct rte_eth_stats *igb_stats)
> why name is igb_stats, historical?

Yeah, it's came from copy and paste.
I will fix it and below issues you  pointed out.
I appreciate your carefully reviewing!

Tetsuya

>> +static int
>> +eth_dev_vhost_create(const char *name, int index,
>> +                 char *iface_name,
>> +                 int16_t queues,
>> +                 const unsigned numa_node)
>> +{
>> +    struct rte_eth_dev_data *data = NULL;
>> +    struct pmd_internal *internal = NULL;
>> +    struct rte_eth_dev *eth_dev = NULL;
>> +    struct ether_addr *eth_addr = NULL;
>> +    struct rte_vhost_vring_state *vring_state = NULL;
>> +
>> +    RTE_LOG(INFO, PMD, "Creating VHOST-USER backend on numa socket %u\n",
>> +            numa_node);
>> +
>> +    /* now do all data allocation - for eth_dev structure, dummy pci driver
>> +     * and internal (private) data
>> +     */
>> +    data = rte_zmalloc_socket(name, sizeof(*data), 0, numa_node);
>> +    if (data == NULL)
>> +            goto error;
>> +
>> +    internal = rte_zmalloc_socket(name, sizeof(*internal), 0, numa_node);
>> +    if (internal == NULL)
>> +            goto error;
>> +
>> +    eth_addr = rte_zmalloc_socket(name, sizeof(*eth_addr), 0, numa_node);
>> +    if (eth_addr == NULL)
>> +            goto error;
>> +    *eth_addr = base_eth_addr;
>> +    eth_addr->addr_bytes[5] = index;
>> +
>> +    /* reserve an ethdev entry */
>> +    eth_dev = rte_eth_dev_allocate(name, RTE_ETH_DEV_VIRTUAL);
>> +    if (eth_dev == NULL)
>> +            goto error;
>> +
>> +    TAILQ_INIT(&(eth_dev->link_intr_cbs));
>> +
>> +    /* now put it all together
>> +     * - store queue data in internal,
>> +     * - store numa_node info in ethdev data
>> +     * - point eth_dev_data to internals
>> +     * - and point eth_dev structure to new eth_dev_data structure
>> +     */
>> +    internal->nb_rx_queues = queues;
>> +    internal->nb_tx_queues = queues;
>> +    internal->dev_name = strdup(name);
>> +    if (internal->dev_name == NULL)
>> +            goto error;
> eth_dev successfully allocated, do we need to do something in error case?
>
>> +    internal->iface_name = strdup(iface_name);
>> +    if (internal->iface_name == NULL) {
>> +            free(internal->dev_name);
>> +            goto error;
>> +    }
>> +    internal->port_id = eth_dev->data->port_id;
>> +
>> +    vring_state = rte_zmalloc_socket(name,
>> +                    sizeof(*vring_state), 0, numa_node);
>> +    if (vring_state == NULL)
> free dev_name & iface_name.
>
>> +            goto error;
>> +    rte_spinlock_init(&vring_state->lock);
>> +    vring_states[eth_dev->data->port_id] = vring_state;
>> +
>> +    pthread_mutex_lock(&internal_list_lock);
>> +    TAILQ_INSERT_TAIL(&internals_list, internal, next);
>> +    pthread_mutex_unlock(&internal_list_lock);
>> +
>> +    data->dev_private = internal;
>> +    data->port_id = eth_dev->data->port_id;
>> +    memmove(data->name, eth_dev->data->name, sizeof(data->name));
>> +    data->nb_rx_queues = queues;
>> +    data->nb_tx_queues = queues;
>> +    data->dev_link = pmd_link;
>> +    data->mac_addrs = eth_addr;
>> +
>> +    /* We'll replace the 'data' originally allocated by eth_dev. So the
>> +     * vhost PMD resources won't be shared between multi processes.
>> +     */
>> +    eth_dev->data = data;
>> +    eth_dev->dev_ops = &ops;
>> +    eth_dev->driver = NULL;
>> +    eth_dev->data->dev_flags =
>> +            RTE_ETH_DEV_DETACHABLE | RTE_ETH_DEV_INTR_LSC;
>> +    eth_dev->data->kdrv = RTE_KDRV_NONE;
>> +    eth_dev->data->drv_name = internal->dev_name;
>> +    eth_dev->data->numa_node = numa_node;
> Cosmetics but you can access as data->xxx instead of eth_dev->data->xxx
>
>> +static int
>> +rte_pmd_vhost_devinit(const char *name, const char *params)
>> +{
>> +    struct rte_kvargs *kvlist = NULL;
>> +    int ret = 0;
>> +    int index;
>> +    char *iface_name;
>> +    uint16_t queues;
>> +
>> +    RTE_LOG(INFO, PMD, "Initializing pmd_vhost for %s\n", name);
>> +
>> +    if (strlen(name) < strlen("eth_vhost"))
>> +            return -1;
> No need to do this check, rte_eal_vdev_init() already checks name and this 
> functions called only if there is a match.
>
>> +
>> +    index = strtol(name + strlen("eth_vhost"), NULL, 0);
>> +    if (errno == ERANGE)
>> +            return -1;
> Does device name has to contain integer, does "eth_vhostA" valid name? If so 
> does it make sense to use port_id instead of index?
>
>> +
>> +    kvlist = rte_kvargs_parse(params, valid_arguments);
>> +    if (kvlist == NULL)
>> +            return -1;
>> +
>> +    if (rte_kvargs_count(kvlist, ETH_VHOST_IFACE_ARG) == 1) {
>> +            ret = rte_kvargs_process(kvlist, ETH_VHOST_IFACE_ARG,
>> +                                     &open_iface, &iface_name);
>> +            if (ret < 0)
>> +                    goto out_free;
>> +    } else {
>> +            ret = -1;
>> +            goto out_free;
>> +    }
>> +
>> +    if (rte_kvargs_count(kvlist, ETH_VHOST_QUEUES_ARG) == 1) {
>> +            ret = rte_kvargs_process(kvlist, ETH_VHOST_QUEUES_ARG,
>> +                                     &open_queues, &queues);
>> +            if (ret < 0)
>> +                    goto out_free;
>> +
>> +    } else
>> +            queues = 1;
>> +
>> +    eth_dev_vhost_create(name, index,
>> +                    iface_name, queues, rte_socket_id());
> syntax: no line wrap required here
>
>> +
>> +out_free:
>> +    rte_kvargs_free(kvlist);
>> +    return ret;
>> +}
>> +
>> +static int
>> +rte_pmd_vhost_devuninit(const char *name)
>> +{
>> +    struct rte_eth_dev *eth_dev = NULL;
>> +    struct pmd_internal *internal;
>> +    unsigned int i;
>> +
>> +    RTE_LOG(INFO, PMD, "Un-Initializing pmd_vhost for %s\n", name);
>> +
>> +    if (name == NULL)
>> +            return -EINVAL;
> This check is not required, already done in rte_eal_vdev_uninit()
>
>> +
>> +    /* find an ethdev entry */
>> +    eth_dev = rte_eth_dev_allocated(name);
>> +    if (eth_dev == NULL)
>> +            return -ENODEV;
>> +
>> +    internal = eth_dev->data->dev_private;
>> +
>> +    rte_free(vring_states[internal->port_id]);
>> +    vring_states[internal->port_id] = NULL;
>> +
>> +    pthread_mutex_lock(&internal_list_lock);
>> +    TAILQ_REMOVE(&internals_list, internal, next);
>> +    pthread_mutex_unlock(&internal_list_lock);
>> +
>> +    eth_dev_stop(eth_dev);
>> +
>> +    if ((internal) && (internal->dev_name))
> if "internal" can be NULL, above internal->port_id reference will crash, if 
> can't be NULL no need to check here.
>
>> +            free(internal->dev_name);
>> +    if ((internal) && (internal->iface_name))
> Do we need parentheses around internal->iface_name (and internal if it will 
> stay)
>
>> +            free(internal->iface_name);
>> +
>> +    rte_free(eth_dev->data->mac_addrs);
>> +    rte_free(eth_dev->data);
>> +
>> +    for (i = 0; i < internal->nb_rx_queues; i++)
>> +            rte_free(internal->rx_vhost_queues[i]);
>> +    for (i = 0; i < internal->nb_tx_queues; i++)
>> +            rte_free(internal->tx_vhost_queues[i]);
>> +    rte_free(internal);
>> +
>> +    rte_eth_dev_release_port(eth_dev);
>> +
>> +    return 0;
>> +}
>> +
>> +static struct rte_driver pmd_vhost_drv = {
>> +    .name = "eth_vhost",
>> +    .type = PMD_VDEV,
>> +    .init = rte_pmd_vhost_devinit,
>> +    .uninit = rte_pmd_vhost_devuninit,
>> +};
>> +
>> +PMD_REGISTER_DRIVER(pmd_vhost_drv);
>> diff --git a/drivers/net/vhost/rte_eth_vhost.h 
>> b/drivers/net/vhost/rte_eth_vhost.h
>> new file mode 100644
>> index 0000000..8aa894a
>> --- /dev/null
>> +++ b/drivers/net/vhost/rte_eth_vhost.h
>> +/**
>> + * Disable features in feature_mask. Returns 0 on success.
>> + */
>> +int rte_eth_vhost_feature_disable(uint64_t feature_mask);
>> +
>> +/**
>> + *  Enable features in feature_mask. Returns 0 on success.
>> + */
>> +int rte_eth_vhost_feature_enable(uint64_t feature_mask);
>> +
>> +/* Returns currently supported vhost features */
> This also can be commented in doxygen style.
>
>> diff --git a/mk/rte.app.mk b/mk/rte.app.mk
>> index 8ecab41..04f7087 100644
>> --- a/mk/rte.app.mk
>> +++ b/mk/rte.app.mk
>> @@ -159,7 +159,13 @@ _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_QAT)        += 
>> -lrte_pmd_qat
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += -lrte_pmd_aesni_mb
>>  _LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_AESNI_MB)   += 
>> -L$(AESNI_MULTI_BUFFER_LIB_PATH) -lIPSec_MB
>>  
>> -endif # ! $(CONFIG_RTE_BUILD_SHARED_LIB)
>> +ifeq ($(CONFIG_RTE_LIBRTE_VHOST),y)
>> +
>> +_LDLIBS-$(CONFIG_RTE_LIBRTE_PMD_VHOST)      += -lrte_pmd_vhost
>> +
>> +endif # ! $(CONFIG_RTE_LIBRTE_VHOST)
>> +
>> +endif # $(CONFIG_RTE_BUILD_SHARED_LIB)
> I guess "!" in comment is to say this if block is for SHARED_LIB==n, if so we 
> shouldn't update the comment to remove "!",
> And the line you have added should have "!" in comment : "endif # 
> $(CONFIG_RTE_LIBRTE_VHOST)"
>
>>  
>>  endif # ! CONFIG_RTE_BUILD_COMBINE_LIBS
>>  
>> -- 
>> 2.1.4
>>

[dpdk-dev] [PATCH v6 2/2] vhost: Add VHOST PMD

Reply via email to