Hi Zoltan. Thanks for your mail - responses inline below. Cheers, Mark

> Hi,
>
> I have a related question: is it a possible alternative with the current
> codebase to use KNI for configuring the MTU?

This would (as far as I'm aware) require the implementation of a new KNI
netdev class, along with a handler/set of handlers in order to set the MTU
(for that netdev class only), based on requests from ifconfig/ip link/etc.
It wouldn't make a difference as far as IVSHMEM/Phy ports are concerned, so
I don't think this approach is workable. If I've missed something though,
please do let me know.
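
To make that concrete, the sort of plumbing I have in mind is sketched below.
This is not OVS code - just a minimal illustration of DPDK's generic KNI
callback mechanism; the `ovs_kni_change_mtu` handler and the way it would hook
back into an OVS netdev class are entirely hypothetical.

```
#include <stdio.h>
#include <string.h>
#include <rte_kni.h>
#include <rte_mempool.h>

/* Hypothetical handler: the kernel-side KNI module queues a request when
 * userspace runs e.g. "ip link set vEth0 mtu 9000", and this callback fires
 * when the application polls rte_kni_handle_request(). A KNI-based OVS
 * netdev class would have to translate this into a reconfiguration of the
 * backing DPDK port and its mempool. */
static int
ovs_kni_change_mtu(uint8_t port_id, unsigned int new_mtu)
{
    printf("port %u: MTU change request to %u\n", (unsigned) port_id, new_mtu);
    /* ... look up the corresponding OVS netdev and apply 'new_mtu' ... */
    return 0;
}

/* Hypothetical constructor for a per-port KNI device with the MTU hook. */
static struct rte_kni *
ovs_kni_create(struct rte_mempool *mp, uint8_t port_id)
{
    struct rte_kni_conf conf;
    struct rte_kni_ops ops;

    memset(&conf, 0, sizeof conf);
    memset(&ops, 0, sizeof ops);
    snprintf(conf.name, sizeof conf.name, "vEth%u", (unsigned) port_id);
    conf.group_id = port_id;

    ops.port_id = port_id;
    ops.change_mtu = ovs_kni_change_mtu;    /* MTU requests land here */

    return rte_kni_alloc(mp, &conf, &ops);
}
```

Even with that in place, it only covers ports that actually have a KNI device
attached, which is the IVSHMEM/Phy problem mentioned above.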
> I never actually tried KNI, does it have a performance penalty to just
> have it?

I haven't used it in some time, but the fact that it passes packets to, and
accepts packets from, the Linux network stack means that it's subject to all
of the inherent performance bottlenecks that you'd expect.

> The reason I'm asking these questions is that OVS normally doesn't
> configure the MTU, as it is outside of its scope. If there is a
> reasonable alternative to do that for DPDK (like with KNI), we don't
> need this change.
>
> Sorry for the noise if this idea has been discussed previously.
>
> Regards,
>
> Zoltan
>
> On 01/02/16 10:18, Mark Kavanagh wrote:
>> Add support for Jumbo Frames to DPDK-enabled port types,
>> using single-segment mbufs.
>>
>> Using this approach, the amount of memory allocated for each mbuf
>> to store frame data is increased to a value greater than 1518B
>> (typical Ethernet maximum frame length). The increased space
>> available in the mbuf means that an entire Jumbo Frame can be carried
>> in a single mbuf, as opposed to partitioning it across multiple mbuf
>> segments.
>>
>> The amount of space allocated to each mbuf to hold frame data is
>> defined dynamically by the user when adding a DPDK port to a bridge.
>> If an MTU value is not supplied, or the user-supplied value is invalid,
>> the MTU for the port defaults to the standard Ethernet MTU (i.e. 1500B).
>>
>> Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com>
>> ---
>>  INSTALL.DPDK.md   |  59 +++++++++-
>>  NEWS              |   2 +
>>  lib/netdev-dpdk.c | 347 +++++++++++++++++++++++++++++++++++++++++-------------
>>  3 files changed, 328 insertions(+), 80 deletions(-)
>>
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index 96b686c..64ccd15 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -859,10 +859,61 @@ by adding the following string:
>>  to <interface> sections of all network devices used by DPDK. Parameter 'N'
>>  determines how many queues can be used by the guest.
>>
>> +
>> +Jumbo Frames
>> +------------
>> +
>> +Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
>> +
>> +To avail of Jumbo Frame support, add the 'dpdk-mtu' option to the ovs-vsctl
>> +'add-port' command line, along with the required MTU for the port, e.g.
>> +
>> +   ```
>> +   ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-mtu=9000
>> +   ```
>> +
>> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments is
>> +increased, such that a full Jumbo Frame may be accommodated inside a single
>> +mbuf segment. Once set, the MTU for a DPDK port is immutable.
>> +
>> +Jumbo Frame support has been validated against 13312B frames, using the
>> +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
>> +theoretically be supported. Supported port types exclude vHost-Cuse ports, as
>> +this feature is pending deprecation.
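
(An aside, not part of the patch: the MTU/frame-length distinction used above,
and again in the vHost section below, is easy to trip over, so here is the
arithmetic as a minimal standalone sketch. The two macros mirror the ones this
patch adds to lib/netdev-dpdk.c; the `main()` wrapper and the printed examples
are mine.)

```
#include <stdio.h>

/* Mirrors the macros added to lib/netdev-dpdk.c by this patch. */
#define ETHER_HDR_LEN 14    /* dst MAC + src MAC + ethertype */
#define ETHER_CRC_LEN 4
#define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) - ETHER_HDR_LEN - ETHER_CRC_LEN)

int
main(void)
{
    /* dpdk-mtu=9000 means the port must accept 9018B frames on the wire. */
    printf("mtu 9000    -> frame len %d\n", MTU_TO_FRAME_LEN(9000));

    /* Conversely, the 13312B maximum validated frame length corresponds to
     * the 13294B guest MTU used in the 'ifconfig' example further down. */
    printf("frame 13312 -> mtu %d\n", FRAME_LEN_TO_MTU(13312));
    return 0;
}
```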
>> +
>> +
>> +vHost Ports and Jumbo Frames
>> +----------------------------
>> +Jumbo Frame support is available for DPDK vHost-User ports only. Some additional
>> +configuration is needed to take advantage of this feature:
>> +
>> +  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
>> +     the QEMU command line snippet below:
>> +
>> +     ```
>> +     '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
>> +     '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
>> +     ```
>> +
>> +  2. Where virtio devices are bound to the Linux kernel driver in a guest
>> +     environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
>> +     MTU of those logical network interfaces must also be increased. This
>> +     avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
>> +     to the length of the IP packet only, and not that of the entire frame.
>> +
>> +     e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
>> +     header and CRC lengths (i.e. 18B) from the max supported frame size.
>> +     So, to set the MTU for a 13312B Jumbo Frame:
>> +
>> +     ```
>> +     ifconfig eth1 mtu 13294
>> +     ```
>> +
>> +
>>  Restrictions:
>>  -------------
>>
>> -  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
>>    - Currently DPDK port does not make use any offload functionality.
>>    - DPDK-vHost support works with 1G huge pages.
>>
>> @@ -903,6 +954,12 @@ Restrictions:
>>      the next release of DPDK (which includes the above patch) is available and
>>      integrated into OVS.
>>
>> +  Jumbo Frames:
>> +  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. The source of
>> +    this issue is currently being investigated.
>> +  - vHost-Cuse: Jumbo Frame support is not available for vHost-Cuse ports.
>> +
>> +
>>  Bug Reporting:
>>  --------------
>>
>> diff --git a/NEWS b/NEWS
>> index 5c18867..cd563e0 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -46,6 +46,8 @@ v2.5.0 - xx xxx xxxx
>>       abstractions, such as virtual L2 and L3 overlays and security groups.
>>     - RHEL packaging:
>>       * DPDK ports may now be created via network scripts (see README.RHEL).
>> +   - netdev-dpdk:
>> +     * Add Jumbo Frame support.
>>
>>
>>  v2.4.0 - 20 Aug 2015
>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>> index de7e488..76a5dcc 100644
>> --- a/lib/netdev-dpdk.c
>> +++ b/lib/netdev-dpdk.c
>> @@ -62,20 +62,25 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>>  #define OVS_VPORT_DPDK "ovs_dpdk"
>>
>> +#define NETDEV_DPDK_JUMBO_FRAME_ENABLED 1
>> +#define NETDEV_DPDK_DEFAULT_RX_BUFSIZE 1024
>> +
>>  /*
>>   * need to reserve tons of extra space in the mbufs so we can align the
>>   * DMA addresses to 4KB.
>>   * The minimum mbuf size is limited to avoid scatter behaviour and drop in
>>   * performance for standard Ethernet MTU.
>>   */
>> -#define MTU_TO_MAX_LEN(mtu)  ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
>> -#define MBUF_SIZE_MTU(mtu)   (MTU_TO_MAX_LEN(mtu)        \
>> -                              + sizeof(struct dp_packet) \
>> -                              + RTE_PKTMBUF_HEADROOM)
>> -#define MBUF_SIZE_DRIVER     (2048                       \
>> -                              + sizeof (struct rte_mbuf) \
>> -                              + RTE_PKTMBUF_HEADROOM)
>> -#define MBUF_SIZE(mtu)       MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER)
>> +#define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
>> +#define FRAME_LEN_TO_MTU(frame_len) ((frame_len) - ETHER_HDR_LEN - ETHER_CRC_LEN)
>> +#define MBUF_SEGMENT_SIZE(mtu)      (MTU_TO_FRAME_LEN(mtu)      \
>> +                                     + sizeof(struct dp_packet) \
>> +                                     + RTE_PKTMBUF_HEADROOM)
>> +
>> +/* This value should be specified as a multiple of the DPDK NIC driver's
>> + * 'min_rx_bufsize' attribute (currently 1024B for 'igb_uio').
>> + */
>> +#define NETDEV_DPDK_MAX_FRAME_LEN 13312
>>
>>  /* Max and min number of packets in the mempool. OVS tries to allocate a
>>   * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
>> @@ -86,6 +91,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>>  #define MIN_NB_MBUF          (4096 * 4)
>>  #define MP_CACHE_SZ          RTE_MEMPOOL_CACHE_MAX_SIZE
>>
>> +#define DPDK_VLAN_TAG_LEN 4
>> +
>>  /* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */
>>  BUILD_ASSERT_DECL(MAX_NB_MBUF % ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF) == 0);
>>
>> @@ -114,7 +121,6 @@ static const struct rte_eth_conf port_conf = {
>>          .header_split   = 0, /* Header Split disabled */
>>          .hw_ip_checksum = 0, /* IP checksum offload disabled */
>>          .hw_vlan_filter = 0, /* VLAN filtering disabled */
>> -        .jumbo_frame    = 0, /* Jumbo Frame Support disabled */
>>          .hw_strip_crc   = 0,
>>      },
>>      .rx_adv_conf = {
>> @@ -254,6 +260,41 @@ is_dpdk_class(const struct netdev_class *class)
>>      return class->construct == netdev_dpdk_construct;
>>  }
>>
>> +/* DPDK NIC drivers allocate RX buffers at a particular granularity
>> + * (specified by rte_eth_dev_info.min_rx_bufsize - currently 1K for igb_uio).
>> + * If 'frame_len' is not a multiple of this value, insufficient buffers are
>> + * allocated to accommodate the packet in its entirety. Furthermore, the
>> + * igb_uio driver needs to ensure that there is also sufficient space in the
>> + * RX buffer to accommodate two VLAN tags (for QinQ frames). If the RX buffer
>> + * is too small, then the driver enables scatter RX behaviour, which reduces
>> + * performance. To prevent this, use a buffer size that is closest to
>> + * 'frame_len', but which satisfies the aforementioned criteria.
>> + */
>> +static uint32_t
>> +dpdk_buf_size(struct netdev_dpdk *netdev, int frame_len)
>> +{
>> +    struct rte_eth_dev_info info;
>> +    uint32_t buf_size;
>> +    /* XXX: This is a workaround for DPDK v2.2, and needs to be refactored
>> +     * with a future DPDK release. */
>> +    uint32_t len = frame_len + (2 * DPDK_VLAN_TAG_LEN);
>> +
>> +    if (netdev->type == DPDK_DEV_ETH) {
>> +        rte_eth_dev_info_get(netdev->port_id, &info);
>> +        buf_size = (info.min_rx_bufsize == 0) ? NETDEV_DPDK_DEFAULT_RX_BUFSIZE
>> +                                              : info.min_rx_bufsize;
>> +    } else {
>> +        buf_size = NETDEV_DPDK_DEFAULT_RX_BUFSIZE;
>> +    }
>> +
>> +    if (len % buf_size) {
>> +        len = buf_size * ((len / buf_size) + 1);
>> +    }
>> +
>> +    return len;
>> +}
>> +
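
(Another aside, not part of the patch: the rounding above in isolation, with a
worked example. The `round_up_rx_buf()` helper and the hard-coded 1024B
granularity are mine - the patch reads the real granularity from
`rte_eth_dev_info` at run time.)

```
#include <stdint.h>
#include <stdio.h>

#define DPDK_VLAN_TAG_LEN 4
#define MIN_RX_BUFSIZE    1024    /* granularity reported by igb_uio */

/* Standalone version of the sizing rule used by dpdk_buf_size() above:
 * allow for two VLAN tags (QinQ), then round up to the next multiple of the
 * driver's RX buffer granularity, so the NIC never has to scatter a frame
 * across multiple RX buffers. */
static uint32_t
round_up_rx_buf(uint32_t frame_len)
{
    uint32_t len = frame_len + 2 * DPDK_VLAN_TAG_LEN;

    if (len % MIN_RX_BUFSIZE) {
        len = MIN_RX_BUFSIZE * (len / MIN_RX_BUFSIZE + 1);
    }
    return len;
}

int
main(void)
{
    /* dpdk-mtu=9000 -> 9018B frame -> 9026B with the QinQ allowance -> 9216B,
     * i.e. nine 1024B RX buffers' worth of contiguous space. */
    printf("%u\n", round_up_rx_buf(9018));
    return 0;
}
```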
>>  /* XXX: use dpdk malloc for entire OVS. in fact huge page should be used
>>   * for all other segments data, bss and text. */
>>
>> @@ -280,26 +321,65 @@ free_dpdk_buf(struct dp_packet *p)
>>  }
>>
>>  static void
>> -__rte_pktmbuf_init(struct rte_mempool *mp,
>> -                   void *opaque_arg OVS_UNUSED,
>> -                   void *_m,
>> -                   unsigned i OVS_UNUSED)
>> +ovs_rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
>>  {
>> -    struct rte_mbuf *m = _m;
>> -    uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet);
>> +    struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv;
>> +    struct rte_pktmbuf_pool_private default_mbp_priv;
>> +    uint16_t roomsz;
>>
>>      RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet));
>>
>> +    /* if no structure is provided, assume no mbuf private area */
>> +
>> +    user_mbp_priv = opaque_arg;
>> +    if (user_mbp_priv == NULL) {
>> +        default_mbp_priv.mbuf_priv_size = 0;
>> +        if (mp->elt_size > sizeof(struct dp_packet)) {
>> +            roomsz = mp->elt_size - sizeof(struct dp_packet);
>> +        } else {
>> +            roomsz = 0;
>> +        }
>> +        default_mbp_priv.mbuf_data_room_size = roomsz;
>> +        user_mbp_priv = &default_mbp_priv;
>> +    }
>> +
>> +    RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet) +
>> +                    user_mbp_priv->mbuf_data_room_size +
>> +                    user_mbp_priv->mbuf_priv_size);
>> +
>> +    mbp_priv = rte_mempool_get_priv(mp);
>> +    memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv));
>> +}
>> +
>> +/* Initialise some fields in the mbuf structure that are not modified by the
>> + * user once created (origin pool, buffer start address, etc.). */
>> +static void
>> +__ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>> +                       void *opaque_arg OVS_UNUSED,
>> +                       void *_m,
>> +                       unsigned i OVS_UNUSED)
>> +{
>> +    struct rte_mbuf *m = _m;
>> +    uint32_t buf_size, buf_len, priv_size;
>> +
>> +    priv_size = rte_pktmbuf_priv_size(mp);
>> +    buf_size = sizeof(struct dp_packet) + priv_size;
>> +    buf_len = rte_pktmbuf_data_room_size(mp);
>> +
>> +    RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size);
>> +    RTE_MBUF_ASSERT(mp->elt_size >= buf_size);
>> +    RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
>> +
>>      memset(m, 0, mp->elt_size);
>>
>> -    /* start of buffer is just after mbuf structure */
>> -    m->buf_addr = (char *)m + sizeof(struct dp_packet);
>> -    m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
>> -                      sizeof(struct dp_packet);
>> +    /* start of buffer is after dp_packet structure and priv data */
>> +    m->priv_size = priv_size;
>> +    m->buf_addr = (char *)m + buf_size;
>> +    m->buf_physaddr = rte_mempool_virt2phy(mp, m) + buf_size;
>>      m->buf_len = (uint16_t)buf_len;
>>
>>      /* keep some headroom between start of buffer and data */
>> -    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
>> +    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len);
>>
>>      /* init some constant fields */
>>      m->pool = mp;
>> @@ -315,7 +395,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>>  {
>>      struct rte_mbuf *m = _m;
>>
>> -    __rte_pktmbuf_init(mp, opaque_arg, _m, i);
>> +    __ovs_rte_pktmbuf_init(mp, opaque_arg, m, i);
>>
>>      dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len);
>>  }
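
(Aside, not part of the patch: what the two init hooks above arrange is the
layout of each mempool element - dp_packet metadata, then headroom, then room
for one whole frame. The sizes below are invented placeholders purely to make
the arithmetic concrete; in the patch they come from sizeof(struct dp_packet),
RTE_PKTMBUF_HEADROOM and MTU_TO_FRAME_LEN().)

```
#include <stdint.h>
#include <stdio.h>

/* Placeholder sizes - illustration only. */
#define DP_PACKET_SIZE    128     /* per-element OVS/mbuf metadata      */
#define PKTMBUF_HEADROOM  128     /* gap kept before the frame data     */
#define FRAME_LEN         9018    /* MTU_TO_FRAME_LEN(9000)             */

int
main(void)
{
    /* ovs_rte_pktmbuf_pool_init() records data_room = headroom + frame_len;
     * __ovs_rte_pktmbuf_init() then points buf_addr just past the metadata,
     * so a single element holds metadata + headroom + one whole jumbo frame
     * (this is what MBUF_SEGMENT_SIZE(mtu) evaluates to). */
    uint32_t data_room = PKTMBUF_HEADROOM + FRAME_LEN;
    uint32_t elt_size = DP_PACKET_SIZE + data_room;

    printf("data_room=%u elt_size=%u\n", data_room, elt_size);
    return 0;
}
```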
>> @@ -326,6 +406,7 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>>      struct dpdk_mp *dmp = NULL;
>>      char mp_name[RTE_MEMPOOL_NAMESIZE];
>>      unsigned mp_size;
>> +    struct rte_pktmbuf_pool_private mbp_priv;
>>
>>      LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
>>          if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>> @@ -338,6 +419,8 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>>      dmp->socket_id = socket_id;
>>      dmp->mtu = mtu;
>>      dmp->refcount = 1;
>> +    mbp_priv.mbuf_data_room_size = MTU_TO_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM;
>> +    mbp_priv.mbuf_priv_size = 0;
>>
>>      mp_size = MAX_NB_MBUF;
>>      do {
>> @@ -346,10 +429,10 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>>              return NULL;
>>          }
>>
>> -        dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
>> +        dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SEGMENT_SIZE(mtu),
>>                                       MP_CACHE_SZ,
>>                                       sizeof(struct rte_pktmbuf_pool_private),
>> -                                     rte_pktmbuf_pool_init, NULL,
>> +                                     ovs_rte_pktmbuf_pool_init, &mbp_priv,
>>                                       ovs_rte_pktmbuf_init, NULL,
>>                                       socket_id, 0);
>>      } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= MIN_NB_MBUF);
>> @@ -433,6 +516,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>>  {
>>      int diag = 0;
>>      int i;
>> +    struct rte_eth_conf conf = port_conf;
>>
>>      /* A device may report more queues than it makes available (this has
>>       * been observed for Intel xl710, which reserves some of them for
>> @@ -444,7 +528,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>>              VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
>>          }
>>
>> -        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
>> +        if (OVS_UNLIKELY(dev->mtu > ETHER_MTU)) {
>> +            conf.rxmode.jumbo_frame = NETDEV_DPDK_JUMBO_FRAME_ENABLED;
>> +            conf.rxmode.max_rx_pkt_len = MTU_TO_FRAME_LEN(dev->mtu);
>> +        }
>> +
>> +        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
>>          if (diag) {
>>              break;
>>          }
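
(Aside, not part of the patch: the rxmode fields used above are plain DPDK 2.x
ethdev configuration, nothing OVS-specific. A minimal standalone sketch - the
function name and parameters are mine.)

```
#include <string.h>
#include <rte_ethdev.h>

/* Opt a port into jumbo reception before rte_eth_dev_configure(), as the
 * hunk above does for ports whose MTU exceeds the standard Ethernet MTU. */
static int
configure_jumbo_port(uint8_t port_id, uint16_t nb_rxq, uint16_t nb_txq,
                     uint32_t frame_len)
{
    struct rte_eth_conf conf;

    memset(&conf, 0, sizeof conf);
    if (frame_len > ETHER_MAX_LEN) {
        conf.rxmode.jumbo_frame = 1;            /* enable the jumbo RX path */
        conf.rxmode.max_rx_pkt_len = frame_len; /* largest frame to accept  */
    }
    return rte_eth_dev_configure(port_id, nb_rxq, nb_txq, &conf);
}
```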
>> @@ -586,6 +675,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>>      struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>>      int sid;
>>      int err = 0;
>> +    uint32_t buf_size;
>>
>>      ovs_mutex_init(&netdev->mutex);
>>      ovs_mutex_lock(&netdev->mutex);
>> @@ -605,10 +695,16 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>>      netdev->port_id = port_no;
>>      netdev->type = type;
>>      netdev->flags = 0;
>> +
>> +    /* Initialize port's MTU and frame len to the default Ethernet values.
>> +     * Larger, user-specified (jumbo) frame buffers are accommodated in
>> +     * netdev_dpdk_set_config.
>> +     */
>> +    netdev->max_packet_len = ETHER_MAX_LEN;
>>      netdev->mtu = ETHER_MTU;
>> -    netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu);
>>
>> -    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
>> +    buf_size = dpdk_buf_size(netdev, ETHER_MAX_LEN);
>> +    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, FRAME_LEN_TO_MTU(buf_size));
>>      if (!netdev->dpdk_mp) {
>>          err = ENOMEM;
>>          goto unlock;
>> @@ -651,6 +747,27 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>>      return 0;
>>  }
>>
>> +static void
>> +dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
>> +{
>> +    const char *mtu_str = smap_get(args, "dpdk-mtu");
>> +    char *end_ptr = NULL;
>> +    int local_mtu;
>> +
>> +    if (mtu_str) {
>> +        local_mtu = strtoul(mtu_str, &end_ptr, 0);
>> +    }
>> +    if (!mtu_str || local_mtu < ETHER_MTU ||
>> +        local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
>> +        *end_ptr != '\0') {
>> +        local_mtu = ETHER_MTU;
>> +        VLOG_WARN("Invalid or missing dpdk-mtu parameter - defaulting to %d.\n",
>> +                  local_mtu);
>> +    }
>> +
>> +    *mtu = local_mtu;
>> +}
>> +
>>  static int
>>  vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
>>  {
>> @@ -777,11 +894,77 @@ netdev_dpdk_get_config(const struct netdev *netdev_, struct smap *args)
>>      smap_add_format(args, "configured_rx_queues", "%d", netdev_->n_rxq);
>>      smap_add_format(args, "requested_tx_queues", "%d", netdev_->n_txq);
>>      smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
>> +    smap_add_format(args, "mtu", "%d", dev->mtu);
>>      ovs_mutex_unlock(&dev->mutex);
>>
>>      return 0;
>>  }
>>
>> +/* Set the mtu of DPDK_DEV_ETH ports */
>> +static int
>> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>> +{
>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +    int old_mtu, err;
>> +    uint32_t buf_size;
>> +    int dpdk_mtu;
>> +    struct dpdk_mp *old_mp;
>> +    struct dpdk_mp *mp;
>> +
>> +    ovs_mutex_lock(&dpdk_mutex);
>> +    ovs_mutex_lock(&dev->mutex);
>> +    if (dev->mtu == mtu) {
>> +        err = 0;
>> +        goto out;
>> +    }
>> +
>> +    buf_size = dpdk_buf_size(dev, MTU_TO_FRAME_LEN(mtu));
>> +    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
>> +
>> +    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
>> +    if (!mp) {
>> +        err = ENOMEM;
>> +        goto out;
>> +    }
>> +
>> +    rte_eth_dev_stop(dev->port_id);
>> +
>> +    old_mtu = dev->mtu;
>> +    old_mp = dev->dpdk_mp;
>> +    dev->dpdk_mp = mp;
>> +    dev->mtu = mtu;
>> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> +
>> +    err = dpdk_eth_dev_init(dev);
>> +    if (err) {
>> +        VLOG_WARN("Unable to set MTU '%d' for '%s'; reverting to last known "
>> +                  "good value '%d'\n", mtu, dev->up.name, old_mtu);
>> +        dpdk_mp_put(mp);
>> +        dev->mtu = old_mtu;
>> +        dev->dpdk_mp = old_mp;
>> +        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> +        dpdk_eth_dev_init(dev);
>> +        goto out;
>> +    } else {
>> +        dpdk_mp_put(old_mp);
>> +        netdev_change_seq_changed(netdev);
>> +    }
>> +out:
>> +    ovs_mutex_unlock(&dev->mutex);
>> +    ovs_mutex_unlock(&dpdk_mutex);
>> +    return err;
>> +}
>> +
>> +static int
>> +netdev_dpdk_set_config(struct netdev *netdev_, const struct smap *args)
>> +{
>> +    int mtu;
>> +
>> +    dpdk_dev_parse_mtu(args, &mtu);
>> +
>> +    return netdev_dpdk_set_mtu(netdev_, mtu);
>> +}
>> +
>>  static int
>>  netdev_dpdk_get_numa_id(const struct netdev *netdev_)
>>  {
>> @@ -1358,54 +1541,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
>>
>>      return 0;
>>  }
>> -
>> -static int
>> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>> -{
>> -    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> -    int old_mtu, err;
>> -    struct dpdk_mp *old_mp;
>> -    struct dpdk_mp *mp;
>> -
>> -    ovs_mutex_lock(&dpdk_mutex);
>> -    ovs_mutex_lock(&dev->mutex);
>> -    if (dev->mtu == mtu) {
>> -        err = 0;
>> -        goto out;
>> -    }
>> -
>> -    mp = dpdk_mp_get(dev->socket_id, dev->mtu);
>> -    if (!mp) {
>> -        err = ENOMEM;
>> -        goto out;
>> -    }
>> -
>> -    rte_eth_dev_stop(dev->port_id);
>> -
>> -    old_mtu = dev->mtu;
>> -    old_mp = dev->dpdk_mp;
>> -    dev->dpdk_mp = mp;
>> -    dev->mtu = mtu;
>> -    dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
>> -
>> -    err = dpdk_eth_dev_init(dev);
>> -    if (err) {
>> -        dpdk_mp_put(mp);
>> -        dev->mtu = old_mtu;
>> -        dev->dpdk_mp = old_mp;
>> -        dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
>> -        dpdk_eth_dev_init(dev);
>> -        goto out;
>> -    }
>> -
>> -    dpdk_mp_put(old_mp);
>> -    netdev_change_seq_changed(netdev);
>> -out:
>> -    ovs_mutex_unlock(&dev->mutex);
>> -    ovs_mutex_unlock(&dpdk_mutex);
>> -    return err;
>> -}
>> -
>>  static int
>>  netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>>
>> @@ -1682,7 +1817,7 @@ netdev_dpdk_get_status(const struct netdev *netdev_, struct smap *args)
>>      smap_add_format(args, "numa_id", "%d", rte_eth_dev_socket_id(dev->port_id));
>>      smap_add_format(args, "driver_name", "%s", dev_info.driver_name);
>>      smap_add_format(args, "min_rx_bufsize", "%u", dev_info.min_rx_bufsize);
>> -    smap_add_format(args, "max_rx_pktlen", "%u", dev_info.max_rx_pktlen);
>> +    smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len);
>>      smap_add_format(args, "max_rx_queues", "%u", dev_info.max_rx_queues);
>>      smap_add_format(args, "max_tx_queues", "%u", dev_info.max_tx_queues);
>>      smap_add_format(args, "max_mac_addrs", "%u", dev_info.max_mac_addrs);
>> @@ -1904,6 +2039,51 @@ dpdk_vhost_user_class_init(void)
>>      return 0;
>>  }
>>
>> +/* Set the mtu of DPDK_DEV_VHOST ports */
>> +static int
>> +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
>> +{
>> +    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>> +    int err = 0;
>> +    struct dpdk_mp *old_mp;
>> +    struct dpdk_mp *mp;
>> +
>> +    ovs_mutex_lock(&dpdk_mutex);
>> +    ovs_mutex_lock(&dev->mutex);
>> +    if (dev->mtu == mtu) {
>> +        err = 0;
>> +        goto out;
>> +    }
>> +
>> +    mp = dpdk_mp_get(dev->socket_id, mtu);
>> +    if (!mp) {
>> +        err = ENOMEM;
>> +        goto out;
>> +    }
>> +
>> +    old_mp = dev->dpdk_mp;
>> +    dev->dpdk_mp = mp;
>> +    dev->mtu = mtu;
>> +    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>> +
>> +    dpdk_mp_put(old_mp);
>> +    netdev_change_seq_changed(netdev);
>> +out:
>> +    ovs_mutex_unlock(&dev->mutex);
>> +    ovs_mutex_unlock(&dpdk_mutex);
>> +    return err;
>> +}
>> +
>> +static int
>> +netdev_dpdk_vhost_set_config(struct netdev *netdev_, const struct smap *args)
>> +{
>> +    int mtu;
>> +
>> +    dpdk_dev_parse_mtu(args, &mtu);
>> +
>> +    return netdev_dpdk_vhost_set_mtu(netdev_, mtu);
>> +}
>> +
>>  static void
>>  dpdk_common_init(void)
>>  {
>> @@ -2040,8 +2220,9 @@ unlock_dpdk:
>>      return err;
>>  }
>>
>> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
>> -    GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)          \
>> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG,   \
>> +    MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES,         \
>> +    GET_STATUS, RXQ_RECV)                                                \
>>  {                                                             \
>>      NAME,                                                     \
>>      INIT,                       /* init */                    \
>> @@ -2053,7 +2234,7 @@ unlock_dpdk:
>>      DESTRUCT,                                                 \
>>      netdev_dpdk_dealloc,                                      \
>>      netdev_dpdk_get_config,                                   \
>> -    NULL,                       /* netdev_dpdk_set_config */  \
>> +    SET_CONFIG,                                               \
>>      NULL,                       /* get_tunnel_config */       \
>>      NULL,                       /* build header */           \
>>      NULL,                       /* push header */            \
>> @@ -2067,7 +2248,7 @@ unlock_dpdk:
>>      netdev_dpdk_set_etheraddr,                                \
>>      netdev_dpdk_get_etheraddr,                                \
>>      netdev_dpdk_get_mtu,                                      \
>> -    netdev_dpdk_set_mtu,                                      \
>> +    SET_MTU,                                                  \
>>      netdev_dpdk_get_ifindex,                                  \
>>      GET_CARRIER,                                              \
>>      netdev_dpdk_get_carrier_resets,                           \
>> @@ -2213,8 +2394,10 @@ static const struct netdev_class dpdk_class =
>>          NULL,
>>          netdev_dpdk_construct,
>>          netdev_dpdk_destruct,
>> +        netdev_dpdk_set_config,
>>          netdev_dpdk_set_multiq,
>>          netdev_dpdk_eth_send,
>> +        netdev_dpdk_set_mtu,
>>          netdev_dpdk_get_carrier,
>>          netdev_dpdk_get_stats,
>>          netdev_dpdk_get_features,
>> @@ -2227,8 +2410,10 @@ static const struct netdev_class dpdk_ring_class =
>>          NULL,
>>          netdev_dpdk_ring_construct,
>>          netdev_dpdk_destruct,
>> +        netdev_dpdk_set_config,
>>          netdev_dpdk_set_multiq,
>>          netdev_dpdk_ring_send,
>> +        netdev_dpdk_set_mtu,
>>          netdev_dpdk_get_carrier,
>>          netdev_dpdk_get_stats,
>>          netdev_dpdk_get_features,
>> @@ -2241,8 +2426,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
>>          dpdk_vhost_cuse_class_init,
>>          netdev_dpdk_vhost_cuse_construct,
>>          netdev_dpdk_vhost_destruct,
>> +        NULL,
>>          netdev_dpdk_vhost_set_multiq,
>>          netdev_dpdk_vhost_send,
>> +        NULL,
>>          netdev_dpdk_vhost_get_carrier,
>>          netdev_dpdk_vhost_get_stats,
>>          NULL,
>> @@ -2255,8 +2442,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
>>          dpdk_vhost_user_class_init,
>>          netdev_dpdk_vhost_user_construct,
>>          netdev_dpdk_vhost_destruct,
>> +        netdev_dpdk_vhost_set_config,
>>          netdev_dpdk_vhost_set_multiq,
>>          netdev_dpdk_vhost_send,
>> +        netdev_dpdk_vhost_set_mtu,
>>          netdev_dpdk_vhost_get_carrier,
>>          netdev_dpdk_vhost_get_stats,
>>          NULL,

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev