Hi,

Some comments below.
On Mon, 1 Feb 2016 10:18:54 +0000
Mark Kavanagh <mark.b.kavan...@intel.com> wrote:

> Add support for Jumbo Frames to DPDK-enabled port types,
> using single-segment-mbufs.
>
> Using this approach, the amount of memory allocated for each mbuf
> to store frame data is increased to a value greater than 1518B
> (typical Ethernet maximum frame length). The increased space
> available in the mbuf means that an entire Jumbo Frame can be carried
> in a single mbuf, as opposed to partitioning it across multiple mbuf
> segments.
>
> The amount of space allocated to each mbuf to hold frame data is
> defined dynamically by the user when adding a DPDK port to a bridge.
> If an MTU value is not supplied, or the user-supplied value is
> invalid, the MTU for the port defaults to standard Ethernet MTU (i.e.
> 1500B).
>
> Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com>
> ---
>  INSTALL.DPDK.md   |  59 +++++++++-
>  NEWS              |   2 +
>  lib/netdev-dpdk.c | 347 +++++++++++++++++++++++++++++++++++++++++-------------
>  3 files changed, 328 insertions(+), 80 deletions(-)
>
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 96b686c..64ccd15 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -859,10 +859,61 @@ by adding the following string:
>  to <interface> sections of all network devices used by DPDK.
>  Parameter 'N' determines how many queues can be used by the guest.
>
> +
> +Jumbo Frames
> +------------
> +
> +Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
> +
> +To avail of Jumbo Frame support, add the 'dpdk-mtu' option to the ovs-vsctl
> +'add-port' command-line, along with the required MTU for the port.
> +e.g.
> +
> +   ```
> +   ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-mtu=9000
> +   ```

Any particular reason to call it dpdk-mtu instead of mtu?

> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
> +increased, such that a full Jumbo Frame may be accommodated inside a single
> +mbuf segment. Once set, the MTU for a DPDK port is immutable.
> +

I think we need some protection here, because if someone tries to change
that, it will segfault the PMD threads.
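Just to sketch what I mean (hypothetical and untested -- the patch instead
re-initializes the device in the new netdev_dpdk_set_mtu() further down):
once a port exists, a guard like the one below could refuse MTU changes
rather than swapping the mempool under the PMD threads.

```
/* Hypothetical guard: refuse MTU changes on an already-created port.
 * 'dev->mtu' was fixed at add-port time from the 'dpdk-mtu' option. */
static int
netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
{
    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
    int err = 0;

    ovs_mutex_lock(&dev->mutex);
    if (dev->mtu != mtu) {
        VLOG_WARN("%s: MTU is immutable once the port exists; keeping %d",
                  netdev_get_name(netdev), dev->mtu);
        err = EINVAL;
    }
    ovs_mutex_unlock(&dev->mutex);
    return err;
}
```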
> +Jumbo frame support has been validated against 13312B frames, using the
> +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
> +theoretically be supported. Supported port types exclude vHost-Cuse ports,
> +as this feature is pending deprecation.
> +
> +
> +vHost Ports and Jumbo Frames
> +----------------------------
> +Jumbo frame support is available for DPDK vHost-User ports only. Some
> +additional configuration is needed to take advantage of this feature:
> +
> +  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
> +     the QEMU command line snippet below:
> +
> +     ```
> +     '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
> +     '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
> +     ```
> +
> +  2. Where virtio devices are bound to the Linux kernel driver in a guest
> +     environment (i.e. interfaces are not bound to an in-guest DPDK driver),
> +     the MTU of those logical network interfaces must also be increased.
> +     This avoids segmentation of Jumbo Frames in the guest. Note that 'MTU'
> +     refers to the length of the IP packet only, and not that of the entire
> +     frame.
> +
> +     e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the
> +     L2 header and CRC lengths (i.e. 18B) from the max supported frame size.
> +     So, to set the MTU for a 13312B Jumbo Frame:
> +
> +     ```
> +     ifconfig eth1 mtu 13294
> +     ```
> +
> +
>  Restrictions:
>  -------------
>
> -  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
>    - Currently DPDK port does not make use any offload functionality.
>    - DPDK-vHost support works with 1G huge pages.
>
> @@ -903,6 +954,12 @@ Restrictions:
>      the next release of DPDK (which includes the above patch) is
>      available and integrated into OVS.
>
> +  Jumbo Frames:
> +  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. The source
> +    of this issue is currently being investigated.
> +  - vHost-Cuse: Jumbo Frame support is not available for vHost-Cuse ports.
> +
> +
>  Bug Reporting:
>  --------------
>
> diff --git a/NEWS b/NEWS
> index 5c18867..cd563e0 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -46,6 +46,8 @@ v2.5.0 - xx xxx xxxx
>       abstractions, such as virtual L2 and L3 overlays and security groups.
>     - RHEL packaging:
>       * DPDK ports may now be created via network scripts (see README.RHEL).
> +   - netdev-dpdk:
> +     * Add Jumbo Frame Support
>
>  v2.4.0 - 20 Aug 2015
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index de7e488..76a5dcc 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -62,20 +62,25 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>  #define OVS_VPORT_DPDK "ovs_dpdk"
>
> +#define NETDEV_DPDK_JUMBO_FRAME_ENABLED 1
> +#define NETDEV_DPDK_DEFAULT_RX_BUFSIZE 1024
> +
>  /*
>   * need to reserve tons of extra space in the mbufs so we can align the
>   * DMA addresses to 4KB.
>   * The minimum mbuf size is limited to avoid scatter behaviour and drop in
>   * performance for standard Ethernet MTU.
>   */
> -#define MTU_TO_MAX_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
> -#define MBUF_SIZE_MTU(mtu) (MTU_TO_MAX_LEN(mtu) \
> -                            + sizeof(struct dp_packet) \
> -                            + RTE_PKTMBUF_HEADROOM)
> -#define MBUF_SIZE_DRIVER (2048 \
> -                          + sizeof (struct rte_mbuf) \
> -                          + RTE_PKTMBUF_HEADROOM)
> -#define MBUF_SIZE(mtu) MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER)
> +#define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
> +#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) - ETHER_HDR_LEN - ETHER_CRC_LEN)
> +#define MBUF_SEGMENT_SIZE(mtu) (MTU_TO_FRAME_LEN(mtu) \
> +                                + sizeof(struct dp_packet) \
> +                                + RTE_PKTMBUF_HEADROOM)
> +
> +/* This value should be specified as a multiple of the DPDK NIC driver's
> + * 'min_rx_bufsize' attribute (currently 1024B for 'igb_uio'). */
> +#define NETDEV_DPDK_MAX_FRAME_LEN 13312
>
>  /* Max and min number of packets in the mempool. OVS tries to allocate a
>   * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
> @@ -86,6 +91,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>  #define MIN_NB_MBUF (4096 * 4)
>  #define MP_CACHE_SZ RTE_MEMPOOL_CACHE_MAX_SIZE
>
> +#define DPDK_VLAN_TAG_LEN 4

Why not use VLAN_HEADER_LEN? We already include packets.h anyway.
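i.e. something like this minimal change (sketch only):

```
#include <stdint.h>
#include "packets.h"   /* OVS header, already included by netdev-dpdk.c;
                        * defines VLAN_HEADER_LEN as 4. */

/* Sketch: the QinQ allowance in dpdk_buf_size() without a new define. */
static uint32_t
qinq_frame_len(uint32_t frame_len)
{
    return frame_len + 2 * VLAN_HEADER_LEN;   /* room for two VLAN tags */
}
```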
>  /* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */
>  BUILD_ASSERT_DECL(MAX_NB_MBUF % ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF) == 0);
>
> @@ -114,7 +121,6 @@ static const struct rte_eth_conf port_conf = {
>          .header_split   = 0, /* Header Split disabled */
>          .hw_ip_checksum = 0, /* IP checksum offload disabled */
>          .hw_vlan_filter = 0, /* VLAN filtering disabled */
> -        .jumbo_frame    = 0, /* Jumbo Frame Support disabled */
>          .hw_strip_crc   = 0,
>      },
>      .rx_adv_conf = {
> @@ -254,6 +260,41 @@ is_dpdk_class(const struct netdev_class *class)
>      return class->construct == netdev_dpdk_construct;
>  }
>
> +/* DPDK NIC drivers allocate RX buffers at a particular granularity
> + * (specified by rte_eth_dev_info.min_rx_bufsize - currently 1K for igb_uio).
> + * If 'frame_len' is not a multiple of this value, insufficient buffers are
> + * allocated to accommodate the packet in its entirety. Furthermore, the
> + * igb_uio driver needs to ensure that there is also sufficient space in the
> + * Rx buffer to accommodate two VLAN tags (for QinQ frames). If the RX buffer
> + * is too small, then the driver enables scatter RX behaviour, which reduces
> + * performance. To prevent this, use a buffer size that is closest to
> + * 'frame_len', but which satisfies the aforementioned criteria.
> + */
> +static uint32_t
> +dpdk_buf_size(struct netdev_dpdk *netdev, int frame_len)
> +{
> +    struct rte_eth_dev_info info;
> +    uint32_t buf_size;
> +    /* XXX: This is a workaround for DPDK v2.2, and needs to be refactored
> +     * with a future DPDK release. */
> +    uint32_t len = frame_len + (2 * DPDK_VLAN_TAG_LEN);
> +
> +    if (netdev->type == DPDK_DEV_ETH) {
> +        rte_eth_dev_info_get(netdev->port_id, &info);
> +        buf_size = (info.min_rx_bufsize == 0) ?
> +                   NETDEV_DPDK_DEFAULT_RX_BUFSIZE :
> +                   info.min_rx_bufsize;
> +    } else {
> +        buf_size = NETDEV_DPDK_DEFAULT_RX_BUFSIZE;
> +    }
> +
> +    if (len % buf_size) {
> +        len = buf_size * ((len / buf_size) + 1);
> +    }
> +
> +    return len;
> +}
> +
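To make the rounding concrete, here is a standalone sketch (the 1024B
granularity is igb_uio's min_rx_bufsize; the numbers are only illustrative):

```
#include <stdio.h>
#include <stdint.h>

/* Same round-up that dpdk_buf_size() applies to frame_len + 2 VLAN tags. */
static uint32_t
round_up_to_bufsize(uint32_t len, uint32_t buf_size)
{
    if (len % buf_size) {
        len = buf_size * ((len / buf_size) + 1);
    }
    return len;
}

int main(void)
{
    /* A 9000B MTU gives a 9018B frame; adding 8B for two VLAN tags gives
     * 9026B, which rounds up to 9216B (9 * 1024). */
    printf("%u\n", round_up_to_bufsize(9018 + 8, 1024));    /* prints 9216 */
    return 0;
}
```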
>  /* XXX: use dpdk malloc for entire OVS. in fact huge page should be used
>   * for all other segments data, bss and text. */
>
> @@ -280,26 +321,65 @@ free_dpdk_buf(struct dp_packet *p)
>  }
>
>  static void
> -__rte_pktmbuf_init(struct rte_mempool *mp,
> -                   void *opaque_arg OVS_UNUSED,
> -                   void *_m,
> -                   unsigned i OVS_UNUSED)
> +ovs_rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
>  {
> -    struct rte_mbuf *m = _m;
> -    uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet);
> +    struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv;
> +    struct rte_pktmbuf_pool_private default_mbp_priv;
> +    uint16_t roomsz;
>
>      RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet));
>
> +    /* if no structure is provided, assume no mbuf private area */
> +    user_mbp_priv = opaque_arg;
> +    if (user_mbp_priv == NULL) {
> +        default_mbp_priv.mbuf_priv_size = 0;
> +        if (mp->elt_size > sizeof(struct dp_packet)) {
> +            roomsz = mp->elt_size - sizeof(struct dp_packet);
> +        } else {
> +            roomsz = 0;
> +        }
> +        default_mbp_priv.mbuf_data_room_size = roomsz;
> +        user_mbp_priv = &default_mbp_priv;
> +    }
> +
> +    RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet) +
> +                    user_mbp_priv->mbuf_data_room_size +
> +                    user_mbp_priv->mbuf_priv_size);
> +
> +    mbp_priv = rte_mempool_get_priv(mp);
> +    memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv));
> +}
> +
> +/* Initialise some fields in the mbuf structure that are not modified by the
> + * user once created (origin pool, buffer start address, etc.). */
> +static void
> +__ovs_rte_pktmbuf_init(struct rte_mempool *mp,
> +                       void *opaque_arg OVS_UNUSED,
> +                       void *_m,
> +                       unsigned i OVS_UNUSED)
> +{
> +    struct rte_mbuf *m = _m;
> +    uint32_t buf_size, buf_len, priv_size;
> +
> +    priv_size = rte_pktmbuf_priv_size(mp);
> +    buf_size = sizeof(struct dp_packet) + priv_size;
> +    buf_len = rte_pktmbuf_data_room_size(mp);
> +
> +    RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size);
> +    RTE_MBUF_ASSERT(mp->elt_size >= buf_size);
> +    RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
> +
>      memset(m, 0, mp->elt_size);
>
> -    /* start of buffer is just after mbuf structure */
> -    m->buf_addr = (char *)m + sizeof(struct dp_packet);
> -    m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
> -                      sizeof(struct dp_packet);
> +    /* start of buffer is after dp_packet structure and priv data */
> +    m->priv_size = priv_size;
> +    m->buf_addr = (char *)m + buf_size;
> +    m->buf_physaddr = rte_mempool_virt2phy(mp, m) + buf_size;
>      m->buf_len = (uint16_t)buf_len;
>
>      /* keep some headroom between start of buffer and data */
> -    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
> +    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len);
>
>      /* init some constant fields */
>      m->pool = mp;
> @@ -315,7 +395,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>  {
>      struct rte_mbuf *m = _m;
>
> -    __rte_pktmbuf_init(mp, opaque_arg, _m, i);
> +    __ovs_rte_pktmbuf_init(mp, opaque_arg, m, i);
>
>      dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len);
>  }
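As an aside, for anyone reading along, this is how a mempool element ends up
laid out after this init (standalone sketch; the dp_packet size below is a
placeholder, the real value comes from sizeof(struct dp_packet)):

```
#include <stdio.h>
#include <stdint.h>

#define ETHER_HDR_LEN        14
#define ETHER_CRC_LEN        4
#define RTE_PKTMBUF_HEADROOM 128
#define DP_PACKET_SIZE       640  /* placeholder for sizeof(struct dp_packet) */

int main(void)
{
    uint32_t mtu = 9000;
    uint32_t priv_size = 0;    /* mbuf_priv_size: OVS passes 0 */
    uint32_t frame_len = mtu + ETHER_HDR_LEN + ETHER_CRC_LEN;    /* 9018 */
    uint32_t data_room = frame_len + RTE_PKTMBUF_HEADROOM;       /* 9146 */

    /* buf_addr points just past the dp_packet struct (plus priv area);
     * the frame itself starts RTE_PKTMBUF_HEADROOM bytes into buf_addr. */
    printf("elt_size = %u (dp_packet %u + priv %u + data room %u)\n",
           DP_PACKET_SIZE + priv_size + data_room,
           DP_PACKET_SIZE, priv_size, data_room);
    return 0;
}
```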
> @@ -326,6 +406,7 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>      struct dpdk_mp *dmp = NULL;
>      char mp_name[RTE_MEMPOOL_NAMESIZE];
>      unsigned mp_size;
> +    struct rte_pktmbuf_pool_private mbp_priv;
>
>      LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
>          if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
> @@ -338,6 +419,8 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>      dmp->socket_id = socket_id;
>      dmp->mtu = mtu;
>      dmp->refcount = 1;
> +    mbp_priv.mbuf_data_room_size = MTU_TO_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM;
> +    mbp_priv.mbuf_priv_size = 0;
>
>      mp_size = MAX_NB_MBUF;
>      do {
> @@ -346,10 +429,10 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>              return NULL;
>          }
>
> -        dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
> +        dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SEGMENT_SIZE(mtu),
>                                       MP_CACHE_SZ,
>                                       sizeof(struct rte_pktmbuf_pool_private),
> -                                     rte_pktmbuf_pool_init, NULL,
> +                                     ovs_rte_pktmbuf_pool_init, &mbp_priv,
>                                       ovs_rte_pktmbuf_init, NULL,
>                                       socket_id, 0);
>      } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= MIN_NB_MBUF);
> @@ -433,6 +516,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>  {
>      int diag = 0;
>      int i;
> +    struct rte_eth_conf conf = port_conf;
>
>      /* A device may report more queues than it makes available (this has
>       * been observed for Intel xl710, which reserves some of them for
> @@ -444,7 +528,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>          VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
>      }
>
> -    diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
> +    if (OVS_UNLIKELY(dev->mtu > ETHER_MTU)) {
> +        conf.rxmode.jumbo_frame = NETDEV_DPDK_JUMBO_FRAME_ENABLED;
> +        conf.rxmode.max_rx_pkt_len = MTU_TO_FRAME_LEN(dev->mtu);
> +    }

Can we add an "else" here? Before, port_conf initialized jumbo_frame=0 for
every port, but now we are changing it, so I wonder: if I set up one port
with jumbo and another without, would the second one get that bit set?
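Something like the below, perhaps (sketch only; since 'conf' is a fresh stack
copy of the const 'port_conf' on every call, the fields should already start
out zeroed, but an explicit else would make the intent unmissable):

```
if (OVS_UNLIKELY(dev->mtu > ETHER_MTU)) {
    conf.rxmode.jumbo_frame = NETDEV_DPDK_JUMBO_FRAME_ENABLED;
    conf.rxmode.max_rx_pkt_len = MTU_TO_FRAME_LEN(dev->mtu);
} else {
    /* Standard MTU: make sure jumbo RX stays off for this port. */
    conf.rxmode.jumbo_frame = 0;
    conf.rxmode.max_rx_pkt_len = 0;   /* not used by DPDK when
                                       * jumbo_frame == 0 */
}
```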
> +    diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
>      if (diag) {
>          break;
>      }
> @@ -586,6 +675,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>      struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>      int sid;
>      int err = 0;
> +    uint32_t buf_size;
>
>      ovs_mutex_init(&netdev->mutex);
>      ovs_mutex_lock(&netdev->mutex);
> @@ -605,10 +695,16 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>      netdev->port_id = port_no;
>      netdev->type = type;
>      netdev->flags = 0;
> +
> +    /* Initialize port's MTU and frame len to the default Ethernet values.
> +     * Larger, user-specified (jumbo) frame buffers are accommodated in
> +     * netdev_dpdk_set_config.
> +     */
> +    netdev->max_packet_len = ETHER_MAX_LEN;
>      netdev->mtu = ETHER_MTU;
> -    netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu);
>
> -    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
> +    buf_size = dpdk_buf_size(netdev, ETHER_MAX_LEN);
> +    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, FRAME_LEN_TO_MTU(buf_size));
>      if (!netdev->dpdk_mp) {
>          err = ENOMEM;
>          goto unlock;
> @@ -651,6 +747,27 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>      return 0;
>  }
>
> +static void
> +dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
> +{
> +    const char *mtu_str = smap_get(args, "dpdk-mtu");
> +    char *end_ptr = NULL;
> +    int local_mtu;
> +
> +    if (mtu_str) {
> +        local_mtu = strtoul(mtu_str, &end_ptr, 0);
> +    }
> +    if (!mtu_str || local_mtu < ETHER_MTU ||
> +        local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
> +        *end_ptr != '\0') {
> +        local_mtu = ETHER_MTU;
> +        VLOG_WARN("Invalid or missing dpdk-mtu parameter - defaulting to %d.\n",
> +                  local_mtu);
> +    }
> +
> +    *mtu = local_mtu;
> +}
> +
>  static int
>  vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
>  {
> @@ -777,11 +894,77 @@ netdev_dpdk_get_config(const struct netdev *netdev_, struct smap *args)
>      smap_add_format(args, "configured_rx_queues", "%d", netdev_->n_rxq);
>      smap_add_format(args, "requested_tx_queues", "%d", netdev_->n_txq);
>      smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
> +    smap_add_format(args, "mtu", "%d", dev->mtu);
>      ovs_mutex_unlock(&dev->mutex);
>
>      return 0;
>  }
>
> +/* Set the mtu of DPDK_DEV_ETH ports */
> +static int
> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    int old_mtu, err;
> +    uint32_t buf_size;
> +    int dpdk_mtu;
> +    struct dpdk_mp *old_mp;
> +    struct dpdk_mp *mp;
> +
> +    ovs_mutex_lock(&dpdk_mutex);
> +    ovs_mutex_lock(&dev->mutex);
> +    if (dev->mtu == mtu) {
> +        err = 0;
> +        goto out;
> +    }
> +
> +    buf_size = dpdk_buf_size(dev, MTU_TO_FRAME_LEN(mtu));
> +    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
> +
> +    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
> +    if (!mp) {
> +        err = ENOMEM;
> +        goto out;
> +    }
> +
> +    rte_eth_dev_stop(dev->port_id);
> +
> +    old_mtu = dev->mtu;
> +    old_mp = dev->dpdk_mp;
> +    dev->dpdk_mp = mp;
> +    dev->mtu = mtu;
> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +
> +    err = dpdk_eth_dev_init(dev);
> +    if (err) {
> +        VLOG_WARN("Unable to set MTU '%d' for '%s'; reverting to last known "
> +                  "good value '%d'\n", mtu, dev->up.name, old_mtu);
> +        dpdk_mp_put(mp);
> +        dev->mtu = old_mtu;
> +        dev->dpdk_mp = old_mp;
> +        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +        dpdk_eth_dev_init(dev);
> +        goto out;
> +    } else {
> +        dpdk_mp_put(old_mp);
> +        netdev_change_seq_changed(netdev);
> +    }
> +out:
> +    ovs_mutex_unlock(&dev->mutex);
> +    ovs_mutex_unlock(&dpdk_mutex);
> +    return err;
> +}
> +
> +static int
> +netdev_dpdk_set_config(struct netdev *netdev_, const struct smap *args)
> +{
> +    int mtu;
> +
> +    dpdk_dev_parse_mtu(args, &mtu);
> +
> +    return netdev_dpdk_set_mtu(netdev_, mtu);
> +}
> +
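For reference, the accepted 'dpdk-mtu' range this gives is [1500, 13294].
A tiny standalone check of the upper bound (constants copied from the patch,
ETHER_* values as in DPDK's rte_ether.h):

```
#include <assert.h>

#define ETHER_HDR_LEN 14
#define ETHER_CRC_LEN 4
#define NETDEV_DPDK_MAX_FRAME_LEN 13312
#define FRAME_LEN_TO_MTU(frame_len) \
    ((frame_len) - ETHER_HDR_LEN - ETHER_CRC_LEN)

int main(void)
{
    /* 13312 - 14 - 4 = 13294, matching the 'ifconfig eth1 mtu 13294'
     * example in INSTALL.DPDK.md. */
    assert(FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) == 13294);
    return 0;
}
```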
>  static int
>  netdev_dpdk_get_numa_id(const struct netdev *netdev_)
>  {
> @@ -1358,54 +1541,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
>      return 0;
>  }
> -
> -static int
> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
> -{
> -    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> -    int old_mtu, err;
> -    struct dpdk_mp *old_mp;
> -    struct dpdk_mp *mp;
> -
> -    ovs_mutex_lock(&dpdk_mutex);
> -    ovs_mutex_lock(&dev->mutex);
> -    if (dev->mtu == mtu) {
> -        err = 0;
> -        goto out;
> -    }
> -
> -    mp = dpdk_mp_get(dev->socket_id, dev->mtu);
> -    if (!mp) {
> -        err = ENOMEM;
> -        goto out;
> -    }
> -
> -    rte_eth_dev_stop(dev->port_id);
> -
> -    old_mtu = dev->mtu;
> -    old_mp = dev->dpdk_mp;
> -    dev->dpdk_mp = mp;
> -    dev->mtu = mtu;
> -    dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
> -
> -    err = dpdk_eth_dev_init(dev);
> -    if (err) {
> -        dpdk_mp_put(mp);
> -        dev->mtu = old_mtu;
> -        dev->dpdk_mp = old_mp;
> -        dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
> -        dpdk_eth_dev_init(dev);
> -        goto out;
> -    }
> -
> -    dpdk_mp_put(old_mp);
> -    netdev_change_seq_changed(netdev);
> -out:
> -    ovs_mutex_unlock(&dev->mutex);
> -    ovs_mutex_unlock(&dpdk_mutex);
> -    return err;
> -}
> -
>  static int
>  netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>
> @@ -1682,7 +1817,7 @@ netdev_dpdk_get_status(const struct netdev *netdev_, struct smap *args)
>      smap_add_format(args, "numa_id", "%d", rte_eth_dev_socket_id(dev->port_id));
>      smap_add_format(args, "driver_name", "%s", dev_info.driver_name);
>      smap_add_format(args, "min_rx_bufsize", "%u", dev_info.min_rx_bufsize);
> -    smap_add_format(args, "max_rx_pktlen", "%u", dev_info.max_rx_pktlen);
> +    smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len);
>      smap_add_format(args, "max_rx_queues", "%u", dev_info.max_rx_queues);
>      smap_add_format(args, "max_tx_queues", "%u", dev_info.max_tx_queues);
>      smap_add_format(args, "max_mac_addrs", "%u", dev_info.max_mac_addrs);
> @@ -1904,6 +2039,51 @@ dpdk_vhost_user_class_init(void)
>      return 0;
>  }
>
> +/* Set the mtu of DPDK_DEV_VHOST ports */
> +static int
> +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
> +{
> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +    int err = 0;
> +    struct dpdk_mp *old_mp;
> +    struct dpdk_mp *mp;
> +
> +    ovs_mutex_lock(&dpdk_mutex);
> +    ovs_mutex_lock(&dev->mutex);
> +    if (dev->mtu == mtu) {
> +        err = 0;
> +        goto out;
> +    }
> +
> +    mp = dpdk_mp_get(dev->socket_id, mtu);
> +    if (!mp) {
> +        err = ENOMEM;
> +        goto out;
> +    }
> +
> +    old_mp = dev->dpdk_mp;
> +    dev->dpdk_mp = mp;
> +    dev->mtu = mtu;
> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
> +
> +    dpdk_mp_put(old_mp);
> +    netdev_change_seq_changed(netdev);
> +out:
> +    ovs_mutex_unlock(&dev->mutex);
> +    ovs_mutex_unlock(&dpdk_mutex);
> +    return err;
> +}
> +
> +static int
> +netdev_dpdk_vhost_set_config(struct netdev *netdev_, const struct smap *args)
> +{
> +    int mtu;
> +
> +    dpdk_dev_parse_mtu(args, &mtu);
> +
> +    return netdev_dpdk_vhost_set_mtu(netdev_, mtu);
> +}
> +
>  static void
>  dpdk_common_init(void)
>  {
> @@ -2040,8 +2220,9 @@ unlock_dpdk:
>      return err;
>  }
>
> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND,    \
> -                          GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, \
> -                          RXQ_RECV)                                         \
> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG,      \
> +                          MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS,    \
> +                          GET_FEATURES, GET_STATUS, RXQ_RECV)               \
>  {                                                                          \
>      NAME,                                                                  \
>      INIT,                       /* init */                                 \
> @@ -2053,7 +2234,7 @@ unlock_dpdk:
>      DESTRUCT,                                                              \
>      netdev_dpdk_dealloc,                                                   \
>      netdev_dpdk_get_config,                                                \
> -    NULL,                       /* netdev_dpdk_set_config */               \
> +    SET_CONFIG,                                                            \
>      NULL,                       /* get_tunnel_config */                    \
>      NULL,                       /* build header */                         \
>      NULL,                       /* push header */                          \
> @@ -2067,7 +2248,7 @@ unlock_dpdk:
>      netdev_dpdk_set_etheraddr,                                             \
>      netdev_dpdk_get_etheraddr,                                             \
>      netdev_dpdk_get_mtu,                                                   \
> -    netdev_dpdk_set_mtu,                                                   \
> +    SET_MTU,                                                               \
>      netdev_dpdk_get_ifindex,                                               \
>      GET_CARRIER,                                                           \
>      netdev_dpdk_get_carrier_resets,                                        \
> @@ -2213,8 +2394,10 @@ static const struct netdev_class dpdk_class =
>      NULL,
>      netdev_dpdk_construct,
>      netdev_dpdk_destruct,
> +    netdev_dpdk_set_config,
>      netdev_dpdk_set_multiq,
>      netdev_dpdk_eth_send,
> +    netdev_dpdk_set_mtu,
>      netdev_dpdk_get_carrier,
>      netdev_dpdk_get_stats,
>      netdev_dpdk_get_features,
> @@ -2227,8 +2410,10 @@ static const struct netdev_class dpdk_ring_class =
>      NULL,
>      netdev_dpdk_ring_construct,
>      netdev_dpdk_destruct,
> +    netdev_dpdk_set_config,
>      netdev_dpdk_set_multiq,
>      netdev_dpdk_ring_send,
> +    netdev_dpdk_set_mtu,
>      netdev_dpdk_get_carrier,
>      netdev_dpdk_get_stats,
>      netdev_dpdk_get_features,
> @@ -2241,8 +2426,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
>      dpdk_vhost_cuse_class_init,
>      netdev_dpdk_vhost_cuse_construct,
>      netdev_dpdk_vhost_destruct,
> +    NULL,
>      netdev_dpdk_vhost_set_multiq,
>      netdev_dpdk_vhost_send,
> +    NULL,
>      netdev_dpdk_vhost_get_carrier,
>      netdev_dpdk_vhost_get_stats,
>      NULL,
> @@ -2255,8 +2442,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
>      dpdk_vhost_user_class_init,
>      netdev_dpdk_vhost_user_construct,
>      netdev_dpdk_vhost_destruct,
> +    netdev_dpdk_vhost_set_config,
>      netdev_dpdk_vhost_set_multiq,
>      netdev_dpdk_vhost_send,
> +    netdev_dpdk_vhost_set_mtu,
>      netdev_dpdk_vhost_get_carrier,
>      netdev_dpdk_vhost_get_stats,
>      NULL,

The other parts look good to me -- thanks, Mark! I tried this patch with
small and jumbo (8000 byte) frames sent from a traffic generator outside,
through a DPDK port and vhost-user into a guest, and I could see the packets
inside the VM just fine. I haven't tried it with real traffic yet, though.

Thanks,
-- 
fbl