Hi,

Some comments below.


On Mon,  1 Feb 2016 10:18:54 +0000
Mark Kavanagh <mark.b.kavan...@intel.com> wrote:

> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
> 
> Using this approach, the amount of memory allocated for each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame can be carried
> in a single mbuf, as opposed to partitioning it across multiple mbuf
> segments.
> 
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user when adding a DPDK port to a bridge.
> If an MTU value is not supplied, or the user-supplied value is
> invalid, the MTU for the port defaults to standard Ethernet MTU (i.e.
> 1500B).
> 
> Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com>
> ---
>  INSTALL.DPDK.md   |  59 +++++++++-
>  NEWS              |   2 +
>  lib/netdev-dpdk.c | 347
> +++++++++++++++++++++++++++++++++++++++++------------- 3 files
> changed, 328 insertions(+), 80 deletions(-)
> 
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 96b686c..64ccd15 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -859,10 +859,61 @@ by adding the following string:
>  to <interface> sections of all network devices used by DPDK.
> Parameter 'N' determines how many queues can be used by the guest.
>  
> +
> +Jumbo Frames
> +------------
> +
> +Support for Jumbo Frames may be enabled at run-time for DPDK-type
> ports. +
> +To avail of Jumbo Frame support, add the 'dpdk-mtu' option to the
> ovs-vsctl +'add-port' command-line, along with the required MTU for
> the port. +e.g.
> +
> +     ```
> +     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
> options:dpdk-mtu=9000
> +     ```

Any particular reason to call it dpdk-mtu instead of mtu?



> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf
> segments are +increased, such that a full Jumbo Frame may be
> accommodated inside a single +mbuf segment. Once set, the MTU for a
> DPDK port is immutable. +

I think we need some protection here because if someone tries to
change that, it will segfault PMD threads.


> +Jumbo frame support has been validated against 13312B frames, using
> the +DPDK `igb_uio` driver, but larger frames and other DPDK NIC
> drivers may +theoretically be supported. Supported port types
> excludes vHost-Cuse ports, as +this feature is pending deprecation.
> +
> +
> +vHost Ports and Jumbo Frames
> +----------------------------
> +Jumbo frame support is available for DPDK vHost-User ports only.
> Some additional +configuration is needed to take advantage of this
> feature: +
> +  1. `mergeable buffers` must be enabled for vHost ports, as
> demonstrated in
> +      the QEMU command line snippet below:
> +
> +      ```
> +      '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> +      '-device
> virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> +      ```
> +
> +  2. Where virtio devices are bound to the Linux kernel driver in a
> guest
> +     environment (i.e. interfaces are not bound to an in-guest DPDK
> driver), the
> +     MTU of those logical network interfaces must also be increased.
> This
> +     avoids segmentation of Jumbo Frames in the guest. Note that
> 'MTU' refers
> +     to the length of the IP packet only, and not that of the entire
> frame. +
> +     e.g. To calculate the exact MTU of a standard IPv4 frame,
> subtract the L2
> +     header and CRC lengths (i.e. 18B) from the max supported frame
> size.
> +     So, to set the MTU for a 13312B Jumbo Frame:
> +
> +      ```
> +      ifconfig eth1 mtu 13294
> +      ```
> +
> +
>  Restrictions:
>  -------------
>  
> -  - Work with 1500 MTU, needs few changes in DPDK lib to fix this
> issue.
>    - Currently DPDK port does not make use any offload functionality.
>    - DPDK-vHost support works with 1G huge pages.
>  
> @@ -903,6 +954,12 @@ Restrictions:
>      the next release of DPDK (which includes the above patch) is
> available and integrated into OVS.
>  
> +  Jumbo Frames:
> +  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. The
> source of
> +     this issue is currently being investigated.
> +  - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse
> ports. +
> +
>  Bug Reporting:
>  --------------
>  
> diff --git a/NEWS b/NEWS
> index 5c18867..cd563e0 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -46,6 +46,8 @@ v2.5.0 - xx xxx xxxx
>       abstractions, such as virtual L2 and L3 overlays and security
> groups.
>     - RHEL packaging:
>       * DPDK ports may now be created via network scripts (see
> README.RHEL).
> +   - netdev-dpdk:
> +     * Add Jumbo Frame Support  
>  
>  v2.4.0 - 20 Aug 2015
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index de7e488..76a5dcc 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -62,20 +62,25 @@ static struct vlog_rate_limit rl =
> VLOG_RATE_LIMIT_INIT(5, 20); #define OVS_CACHE_LINE_SIZE
> CACHE_LINE_SIZE #define OVS_VPORT_DPDK "ovs_dpdk"
>  
> +#define NETDEV_DPDK_JUMBO_FRAME_ENABLED     1
> +#define NETDEV_DPDK_DEFAULT_RX_BUFSIZE      1024
> +
>  /*
>   * need to reserve tons of extra space in the mbufs so we can align
> the
>   * DMA addresses to 4KB.
>   * The minimum mbuf size is limited to avoid scatter behaviour and
> drop in
>   * performance for standard Ethernet MTU.
>   */
> -#define MTU_TO_MAX_LEN(mtu)  ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
> -#define MBUF_SIZE_MTU(mtu)   (MTU_TO_MAX_LEN(mtu)        \
> -                              + sizeof(struct dp_packet) \
> -                              + RTE_PKTMBUF_HEADROOM)
> -#define MBUF_SIZE_DRIVER     (2048                       \
> -                              + sizeof (struct rte_mbuf) \
> -                              + RTE_PKTMBUF_HEADROOM)
> -#define MBUF_SIZE(mtu)       MAX(MBUF_SIZE_MTU(mtu),
> MBUF_SIZE_DRIVER) +#define MTU_TO_FRAME_LEN(mtu)       ((mtu) +
> ETHER_HDR_LEN + ETHER_CRC_LEN) +#define FRAME_LEN_TO_MTU(frame_len)
> ((frame_len)- ETHER_HDR_LEN - ETHER_CRC_LEN) +#define
> MBUF_SEGMENT_SIZE(mtu)      ( MTU_TO_FRAME_LEN(mtu)      \
> +                                    + sizeof(struct dp_packet)   \
> +                                    + RTE_PKTMBUF_HEADROOM)
> +
> +/* This value should be specified as a multiple of the DPDK NIC
> driver's
> + * 'min_rx_bufsize' attribute (currently 1024B for 'igb_uio').
> + */
> +#define NETDEV_DPDK_MAX_FRAME_LEN    13312
>  
>  /* Max and min number of packets in the mempool.  OVS tries to
> allocate a
>   * mempool with MAX_NB_MBUF: if this fails (because the system
> doesn't have @@ -86,6 +91,8 @@ static struct vlog_rate_limit rl =
> VLOG_RATE_LIMIT_INIT(5, 20); #define MIN_NB_MBUF          (4096 * 4)
>  #define MP_CACHE_SZ          RTE_MEMPOOL_CACHE_MAX_SIZE
>  
> +#define DPDK_VLAN_TAG_LEN    4

Why not use VLAN_HEADER_LEN ?
We already include packets.h anyway.



>  /* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */
>  BUILD_ASSERT_DECL(MAX_NB_MBUF %
> ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF) == 0); 
> @@ -114,7 +121,6 @@ static const struct rte_eth_conf port_conf = {
>          .header_split   = 0, /* Header Split disabled */
>          .hw_ip_checksum = 0, /* IP checksum offload disabled */
>          .hw_vlan_filter = 0, /* VLAN filtering disabled */
> -        .jumbo_frame    = 0, /* Jumbo Frame Support disabled */
>          .hw_strip_crc   = 0,
>      },
>      .rx_adv_conf = {
> @@ -254,6 +260,41 @@ is_dpdk_class(const struct netdev_class *class)
>      return class->construct == netdev_dpdk_construct;
>  }
>  
> +/* DPDK NIC drivers allocate RX buffers at a particular granularity
> + * (specified by rte_eth_dev_info.min_rx_bufsize - currently 1K for
> igb_uio).
> + * If 'frame_len' is not a multiple of this value, insufficient
> buffers are
> + * allocated to accomodate the packet in its entirety. Furthermore,
> the igb_uio
> + * driver needs to ensure that there is also sufficient space in the
> Rx buffer
> + * to accommodate two VLAN tags (for QinQ frames). If the RX buffer
> is too
> + * small, then the driver enables scatter RX behaviour, which reduces
> + * performance. To prevent this, use a buffer size that is closest to
> + * 'frame_len', but which satisfies the aforementioned criteria.
> + */
> +static uint32_t
> +dpdk_buf_size(struct netdev_dpdk *netdev, int frame_len)
> +{
> +    struct rte_eth_dev_info info;
> +    uint32_t buf_size;
> +    /* XXX: This is a workaround for DPDK v2.2, and needs to be
> refactored with a
> +     * future DPDK release. */
> +    uint32_t len = frame_len + (2 * DPDK_VLAN_TAG_LEN);
> +
> +    if(netdev->type == DPDK_DEV_ETH) {
> +        rte_eth_dev_info_get(netdev->port_id, &info);
> +        buf_size = (info.min_rx_bufsize == 0) ?
> +                   NETDEV_DPDK_DEFAULT_RX_BUFSIZE :
> +                   info.min_rx_bufsize;
> +    } else {
> +        buf_size = NETDEV_DPDK_DEFAULT_RX_BUFSIZE;
> +    }
> +
> +    if(len % buf_size) {
> +        len = buf_size * ((len/buf_size) + 1);
> +    }
> +
> +    return len;
> +}
> +
>  /* XXX: use dpdk malloc for entire OVS. in fact huge page should be
> used
>   * for all other segments data, bss and text. */
>  
> @@ -280,26 +321,65 @@ free_dpdk_buf(struct dp_packet *p)
>  }
>  
>  static void
> -__rte_pktmbuf_init(struct rte_mempool *mp,
> -                   void *opaque_arg OVS_UNUSED,
> -                   void *_m,
> -                   unsigned i OVS_UNUSED)
> +ovs_rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
>  {
> -    struct rte_mbuf *m = _m;
> -    uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet);
> +    struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv;
> +    struct rte_pktmbuf_pool_private default_mbp_priv;
> +    uint16_t roomsz;
>  
>      RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet));
>  
> +    /* if no structure is provided, assume no mbuf private area */
> +
> +    user_mbp_priv = opaque_arg;
> +    if (user_mbp_priv == NULL) {
> +        default_mbp_priv.mbuf_priv_size = 0;
> +        if (mp->elt_size > sizeof(struct dp_packet)) {
> +            roomsz = mp->elt_size - sizeof(struct dp_packet);
> +        } else {
> +            roomsz = 0;
> +        }
> +        default_mbp_priv.mbuf_data_room_size = roomsz;
> +        user_mbp_priv = &default_mbp_priv;
> +    }
> +
> +    RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet) +
> +        user_mbp_priv->mbuf_data_room_size +
> +        user_mbp_priv->mbuf_priv_size);
> +
> +    mbp_priv = rte_mempool_get_priv(mp);
> +    memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv));
> +}
> +
> +/* Initialise some fields in the mbuf structure that are not
> modified by the
> + * user once created (origin pool, buffer start address, etc.*/
> +static void
> +__ovs_rte_pktmbuf_init(struct rte_mempool *mp,
> +                       void *opaque_arg OVS_UNUSED,
> +                       void *_m,
> +                       unsigned i OVS_UNUSED)
> +{
> +    struct rte_mbuf *m = _m;
> +    uint32_t buf_size, buf_len, priv_size;
> +
> +    priv_size = rte_pktmbuf_priv_size(mp);
> +    buf_size = sizeof(struct dp_packet) + priv_size;
> +    buf_len = rte_pktmbuf_data_room_size(mp);
> +
> +    RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) ==
> priv_size);
> +    RTE_MBUF_ASSERT(mp->elt_size >= buf_size);
> +    RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
> +
>      memset(m, 0, mp->elt_size);
>  
> -    /* start of buffer is just after mbuf structure */
> -    m->buf_addr = (char *)m + sizeof(struct dp_packet);
> -    m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
> -                    sizeof(struct dp_packet);
> +    /* start of buffer is after dp_packet structure and priv data */
> +    m->priv_size = priv_size;
> +    m->buf_addr = (char *)m + buf_size;
> +    m->buf_physaddr = rte_mempool_virt2phy(mp, m) + buf_size;
>      m->buf_len = (uint16_t)buf_len;
>  
>      /* keep some headroom between start of buffer and data */
> -    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
> +    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM,
> (uint16_t)m->buf_len); 
>      /* init some constant fields */
>      m->pool = mp;
> @@ -315,7 +395,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>  {
>      struct rte_mbuf *m = _m;
>  
> -    __rte_pktmbuf_init(mp, opaque_arg, _m, i);
> +    __ovs_rte_pktmbuf_init(mp, opaque_arg, m, i);
>  
>      dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len);
>  }
> @@ -326,6 +406,7 @@ dpdk_mp_get(int socket_id, int mtu)
> OVS_REQUIRES(dpdk_mutex) struct dpdk_mp *dmp = NULL;
>      char mp_name[RTE_MEMPOOL_NAMESIZE];
>      unsigned mp_size;
> +    struct rte_pktmbuf_pool_private mbp_priv;
>  
>      LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
>          if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
> @@ -338,6 +419,8 @@ dpdk_mp_get(int socket_id, int mtu)
> OVS_REQUIRES(dpdk_mutex) dmp->socket_id = socket_id;
>      dmp->mtu = mtu;
>      dmp->refcount = 1;
> +    mbp_priv.mbuf_data_room_size = MTU_TO_FRAME_LEN(mtu) +
> RTE_PKTMBUF_HEADROOM;
> +    mbp_priv.mbuf_priv_size = 0;
>  
>      mp_size = MAX_NB_MBUF;
>      do {
> @@ -346,10 +429,10 @@ dpdk_mp_get(int socket_id, int mtu)
> OVS_REQUIRES(dpdk_mutex) return NULL;
>          }
>  
> -        dmp->mp = rte_mempool_create(mp_name, mp_size,
> MBUF_SIZE(mtu),
> +        dmp->mp = rte_mempool_create(mp_name, mp_size,
> MBUF_SEGMENT_SIZE(mtu), MP_CACHE_SZ,
>                                       sizeof(struct
> rte_pktmbuf_pool_private),
> -                                     rte_pktmbuf_pool_init, NULL,
> +                                     ovs_rte_pktmbuf_pool_init,
> &mbp_priv, ovs_rte_pktmbuf_init, NULL,
>                                       socket_id, 0);
>      } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >=
> MIN_NB_MBUF); @@ -433,6 +516,7 @@ dpdk_eth_dev_queue_setup(struct
> netdev_dpdk *dev, int n_rxq, int n_txq) {
>      int diag = 0;
>      int i;
> +    struct rte_eth_conf conf = port_conf;
>  
>      /* A device may report more queues than it makes available (this
> has
>       * been observed for Intel xl710, which reserves some of them for
> @@ -444,7 +528,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk
> *dev, int n_rxq, int n_txq) VLOG_INFO("Retrying setup with (rxq:%d
> txq:%d)", n_rxq, n_txq); }
>  
> -        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq,
> &port_conf);
> +        if(OVS_UNLIKELY(dev->mtu > ETHER_MTU)) {
> +            conf.rxmode.jumbo_frame = NETDEV_DPDK_JUMBO_FRAME_ENABLED;
> +            conf.rxmode.max_rx_pkt_len = MTU_TO_FRAME_LEN(dev->mtu);
> +        }
>

Can we add an "else" here?  Before port_conf was initializing
jumbo_frame=0 for instance, but now we are changing it so I wonder if I
set up a port with jumbo and another without, the last one would get
that bit set. 



> +        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq,
> &conf); if (diag) {
>              break;
>          }
> @@ -586,6 +675,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned
> int port_no, struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>      int sid;
>      int err = 0;
> +    uint32_t buf_size;
>  
>      ovs_mutex_init(&netdev->mutex);
>      ovs_mutex_lock(&netdev->mutex);
> @@ -605,10 +695,16 @@ netdev_dpdk_init(struct netdev *netdev_,
> unsigned int port_no, netdev->port_id = port_no;
>      netdev->type = type;
>      netdev->flags = 0;
> +
> +    /* Initialize port's MTU and frame len to the default Ethernet
> values.
> +     * Larger, user-specified (jumbo) frame buffers are accommodated
> in
> +     * netdev_dpdk_set_config.
> +     */
> +    netdev->max_packet_len = ETHER_MAX_LEN;
>      netdev->mtu = ETHER_MTU;
> -    netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu);
>  
> -    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
> +    buf_size = dpdk_buf_size(netdev, ETHER_MAX_LEN);
> +    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id,
> FRAME_LEN_TO_MTU(buf_size)); if (!netdev->dpdk_mp) {
>          err = ENOMEM;
>          goto unlock;
> @@ -651,6 +747,27 @@ dpdk_dev_parse_name(const char dev_name[], const
> char prefix[], return 0;
>  }
>  
> +static void
> +dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
> +{
> +    const char *mtu_str = smap_get(args, "dpdk-mtu");
> +    char *end_ptr = NULL;
> +    int local_mtu;
> +
> +    if(mtu_str) {
> +        local_mtu = strtoul(mtu_str, &end_ptr, 0);
> +    }
> +    if(!mtu_str || local_mtu < ETHER_MTU ||
> +                   local_mtu >
> FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
> +                   *end_ptr != '\0') {
> +        local_mtu = ETHER_MTU;
> +        VLOG_WARN("Invalid or missing dpdk-mtu parameter -
> defaulting to %d.\n",
> +                   local_mtu);
> +    }
> +
> +    *mtu = local_mtu;
> +}
> +
>  static int
>  vhost_construct_helper(struct netdev *netdev_)
> OVS_REQUIRES(dpdk_mutex) {
> @@ -777,11 +894,77 @@ netdev_dpdk_get_config(const struct netdev
> *netdev_, struct smap *args) smap_add_format(args,
> "configured_rx_queues", "%d", netdev_->n_rxq); smap_add_format(args,
> "requested_tx_queues", "%d", netdev_->n_txq); smap_add_format(args,
> "configured_tx_queues", "%d", dev->real_n_txq);
> +    smap_add_format(args, "mtu", "%d", dev->mtu);
>      ovs_mutex_unlock(&dev->mutex);
>  
>      return 0;
>  }
>  
> +/* Set the mtu of DPDK_DEV_ETH ports */
> +static int
> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    int old_mtu, err;
> +    uint32_t buf_size;
> +    int dpdk_mtu;
> +    struct dpdk_mp *old_mp;
> +    struct dpdk_mp *mp;
> +
> +    ovs_mutex_lock(&dpdk_mutex);
> +    ovs_mutex_lock(&dev->mutex);
> +    if (dev->mtu == mtu) {
> +        err = 0;
> +        goto out;
> +    }
> +
> +    buf_size = dpdk_buf_size(dev, MTU_TO_FRAME_LEN(mtu));
> +    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
> +
> +    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
> +    if (!mp) {
> +        err = ENOMEM;
> +        goto out;
> +    }
> +
> +    rte_eth_dev_stop(dev->port_id);
> +
> +    old_mtu = dev->mtu;
> +    old_mp = dev->dpdk_mp;
> +    dev->dpdk_mp = mp;
> +    dev->mtu = mtu;
> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +
> +    err = dpdk_eth_dev_init(dev);
> +    if (err) {
> +        VLOG_WARN("Unable to set MTU '%d' for '%s'; reverting to
> last known "
> +                  "good value '%d'\n", mtu, dev->up.name, old_mtu);
> +        dpdk_mp_put(mp);
> +        dev->mtu = old_mtu;
> +        dev->dpdk_mp = old_mp;
> +        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +        dpdk_eth_dev_init(dev);
> +        goto out;
> +    } else {
> +        dpdk_mp_put(old_mp);
> +        netdev_change_seq_changed(netdev);
> +    }
> +out:
> +    ovs_mutex_unlock(&dev->mutex);
> +    ovs_mutex_unlock(&dpdk_mutex);
> +    return err;
> +}
> +
> +static int
> +netdev_dpdk_set_config(struct netdev *netdev_, const struct smap
> *args) +{
> +    int mtu;
> +
> +    dpdk_dev_parse_mtu(args, &mtu);
> +
> +    return netdev_dpdk_set_mtu(netdev_, mtu);
> +}
> +
>  static int
>  netdev_dpdk_get_numa_id(const struct netdev *netdev_)
>  {
> @@ -1358,54 +1541,6 @@ netdev_dpdk_get_mtu(const struct netdev
> *netdev, int *mtup) 
>      return 0;
>  }
> -
> -static int
> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
> -{
> -    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> -    int old_mtu, err;
> -    struct dpdk_mp *old_mp;
> -    struct dpdk_mp *mp;
> -
> -    ovs_mutex_lock(&dpdk_mutex);
> -    ovs_mutex_lock(&dev->mutex);
> -    if (dev->mtu == mtu) {
> -        err = 0;
> -        goto out;
> -    }
> -
> -    mp = dpdk_mp_get(dev->socket_id, dev->mtu);
> -    if (!mp) {
> -        err = ENOMEM;
> -        goto out;
> -    }
> -
> -    rte_eth_dev_stop(dev->port_id);
> -
> -    old_mtu = dev->mtu;
> -    old_mp = dev->dpdk_mp;
> -    dev->dpdk_mp = mp;
> -    dev->mtu = mtu;
> -    dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
> -
> -    err = dpdk_eth_dev_init(dev);
> -    if (err) {
> -        dpdk_mp_put(mp);
> -        dev->mtu = old_mtu;
> -        dev->dpdk_mp = old_mp;
> -        dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
> -        dpdk_eth_dev_init(dev);
> -        goto out;
> -    }
> -
> -    dpdk_mp_put(old_mp);
> -    netdev_change_seq_changed(netdev);
> -out:
> -    ovs_mutex_unlock(&dev->mutex);
> -    ovs_mutex_unlock(&dpdk_mutex);
> -    return err;
> -}
> -
>  static int
>  netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>  
> @@ -1682,7 +1817,7 @@ netdev_dpdk_get_status(const struct netdev
> *netdev_, struct smap *args) smap_add_format(args, "numa_id", "%d",
> rte_eth_dev_socket_id(dev->port_id)); smap_add_format(args,
> "driver_name", "%s", dev_info.driver_name); smap_add_format(args,
> "min_rx_bufsize", "%u", dev_info.min_rx_bufsize);
> -    smap_add_format(args, "max_rx_pktlen", "%u",
> dev_info.max_rx_pktlen);
> +    smap_add_format(args, "max_rx_pktlen", "%u",
> dev->max_packet_len); smap_add_format(args, "max_rx_queues", "%u",
> dev_info.max_rx_queues); smap_add_format(args, "max_tx_queues", "%u",
> dev_info.max_tx_queues); smap_add_format(args, "max_mac_addrs", "%u",
> dev_info.max_mac_addrs); @@ -1904,6 +2039,51 @@
> dpdk_vhost_user_class_init(void) return 0;
>  }
>  
> +/* Set the mtu of DPDK_DEV_VHOST ports */
> +static int
> +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    int err = 0;
> +    struct dpdk_mp *old_mp;
> +    struct dpdk_mp *mp;
> +
> +    ovs_mutex_lock(&dpdk_mutex);
> +    ovs_mutex_lock(&dev->mutex);
> +    if (dev->mtu == mtu) {
> +        err = 0;
> +        goto out;
> +    }
> +
> +    mp = dpdk_mp_get(dev->socket_id, mtu);
> +    if (!mp) {
> +        err = ENOMEM;
> +        goto out;
> +    }
> +
> +    old_mp = dev->dpdk_mp;
> +    dev->dpdk_mp = mp;
> +    dev->mtu = mtu;
> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +
> +    dpdk_mp_put(old_mp);
> +    netdev_change_seq_changed(netdev);
> +out:
> +    ovs_mutex_unlock(&dev->mutex);
> +    ovs_mutex_unlock(&dpdk_mutex);
> +    return err;
> +}
> +
> +static int
> +netdev_dpdk_vhost_set_config(struct netdev *netdev_, const struct
> smap *args) +{
> +    int mtu;
> +
> +    dpdk_dev_parse_mtu(args, &mtu);
> +
> +    return netdev_dpdk_vhost_set_mtu(netdev_, mtu);
> +}
> +
>  static void
>  dpdk_common_init(void)
>  {
> @@ -2040,8 +2220,9 @@ unlock_dpdk:
>      return err;
>  }
>  
> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ,
> SEND, \
> -    GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS,
> RXQ_RECV)          \ +#define NETDEV_DPDK_CLASS(NAME, INIT,
> CONSTRUCT, DESTRUCT, SET_CONFIG, \
> +        MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS,
> GET_FEATURES,   \
> +        GET_STATUS,
> RXQ_RECV)                                          \
> {                                                             \
> NAME,                                                     \
> INIT,                       /* init */                    \ @@
> -2053,7 +2234,7 @@ unlock_dpdk:
> DESTRUCT,                                                 \
> netdev_dpdk_dealloc,                                      \
> netdev_dpdk_get_config,                                   \
> -    NULL,                       /* netdev_dpdk_set_config */  \
> +    SET_CONFIG,                                               \
>      NULL,                       /* get_tunnel_config */       \
>      NULL,                       /* build header */            \
>      NULL,                       /* push header */             \
> @@ -2067,7 +2248,7 @@ unlock_dpdk:
>      netdev_dpdk_set_etheraddr,                                \
>      netdev_dpdk_get_etheraddr,                                \
>      netdev_dpdk_get_mtu,                                      \
> -    netdev_dpdk_set_mtu,                                      \
> +    SET_MTU,                                                  \
>      netdev_dpdk_get_ifindex,                                  \
>      GET_CARRIER,                                              \
>      netdev_dpdk_get_carrier_resets,                           \
> @@ -2213,8 +2394,10 @@ static const struct netdev_class dpdk_class =
>          NULL,
>          netdev_dpdk_construct,
>          netdev_dpdk_destruct,
> +        netdev_dpdk_set_config,
>          netdev_dpdk_set_multiq,
>          netdev_dpdk_eth_send,
> +        netdev_dpdk_set_mtu,
>          netdev_dpdk_get_carrier,
>          netdev_dpdk_get_stats,
>          netdev_dpdk_get_features,
> @@ -2227,8 +2410,10 @@ static const struct netdev_class
> dpdk_ring_class = NULL,
>          netdev_dpdk_ring_construct,
>          netdev_dpdk_destruct,
> +        netdev_dpdk_set_config,
>          netdev_dpdk_set_multiq,
>          netdev_dpdk_ring_send,
> +        netdev_dpdk_set_mtu,
>          netdev_dpdk_get_carrier,
>          netdev_dpdk_get_stats,
>          netdev_dpdk_get_features,
> @@ -2241,8 +2426,10 @@ static const struct netdev_class OVS_UNUSED
> dpdk_vhost_cuse_class = dpdk_vhost_cuse_class_init,
>          netdev_dpdk_vhost_cuse_construct,
>          netdev_dpdk_vhost_destruct,
> +        NULL,
>          netdev_dpdk_vhost_set_multiq,
>          netdev_dpdk_vhost_send,
> +        NULL,
>          netdev_dpdk_vhost_get_carrier,
>          netdev_dpdk_vhost_get_stats,
>          NULL,
> @@ -2255,8 +2442,10 @@ static const struct netdev_class OVS_UNUSED
> dpdk_vhost_user_class = dpdk_vhost_user_class_init,
>          netdev_dpdk_vhost_user_construct,
>          netdev_dpdk_vhost_destruct,
> +        netdev_dpdk_vhost_set_config,
>          netdev_dpdk_vhost_set_multiq,
>          netdev_dpdk_vhost_send,
> +        netdev_dpdk_vhost_set_mtu,
>          netdev_dpdk_vhost_get_carrier,
>          netdev_dpdk_vhost_get_stats,
>          NULL,

Other parts look good to me, thanks Mark!

I tried this patch with small and jumbo (8000 bytes) frames from
outside to DPDK port then vhost-user and then to guest using a traffic
generator.  I could see the packets inside of the VM just fine.
I haven't tried with real traffic yet though.

Thanks,
-- 
fbl

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to