>Hi,
>
>I am working on a lab environment with OVS and DPDK attempting to get Jumbo
>frames and mbuf initialisation working, for DPDK port types only, with the
>current OVS master (testing with commit hash of
>5d24608388bcf5610018cb51369adc2e6f3816e1) and the DPDK 16.04 release.
>
>I have come across the development of supporting Jumbo frames with OVS with
>these patches:
>
>[ovs-dev,V5,1/2] netdev-dpdk: clean up mbuf initialization -
>https://patchwork.ozlabs.org/patch/585153/
>[ovs-dev,V5,2/2] netdev-dpdk: add jumbo frame support -
>https://patchwork.ozlabs.org/patch/585154/
>
>Part 1 of the patch set is now part of the mainline code. As I understand,
>patch 2 will not be upstreamed as it stands and is waiting upon further
>patchsets to allow runtime modification of netdev properties, such as MTU, as
>discussed here
>http://openvswitch.org/pipermail/dev/2016-February/066940.html
>
>I would be interested in testing jumbo frame support with the current OVS
>master and DPDK versions mentioned, however patch 2/2 fails to apply
>successfully with the latest OVS code. Is there a later version of the jumbo
>patch that can be shared with the current OVS code?
>
>Thanks, Jim

Hi Jim,

I posted the rebased patch as an RFC to ovs-dev -
http://openvswitch.org/pipermail/dev/2016-May/070892.html.

Please note, though, that I've just resolved compilation issues, and haven't
tested jumbo frame functionality itself.

Hope this helps.

Best regards,
Mark

_____________________

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 7f76df8..9b83c78 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -913,10 +913,63 @@ by adding the following string:
 to <interface> sections of all network devices used by DPDK. Parameter 'N'
 determines how many queues can be used by the guest.
 
+Jumbo Frames
+------------
+
+Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
+
+To avail of Jumbo Frame support, add the 'mtu_request' option to the ovs-vsctl
+'add-port' command-line, along with the required MTU for the port.
+e.g.
+
+     ```
+     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:mtu_request=9000
+     ```
+
+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
+increased, such that a full Jumbo Frame may be accommodated inside a single
+mbuf segment. Once set, the MTU for a DPDK port is immutable.
+
+Note that from an OVSDB perspective, the `mtu_request` option for a specific
+port may be disregarded once initially set, as subsequent modifications to this
+field are disregarded by the DPDK port. As with non-DPDK ports, the MTU of DPDK
+ports is reported by the `Interface` table's 'mtu' field.
+
+Jumbo frame support has been validated against 13312B frames, using the
+DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
+theoretically be supported. Supported port types excludes vHost-Cuse ports, as
+that feature is pending deprecation.
+
+vHost Ports and Jumbo Frames
+----------------------------
+Jumbo frame support is available for DPDK vHost-User ports only. Some additional
+configuration is needed to take advantage of this feature:
+
+  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
+     the QEMU command line snippet below:
+
+     ```
+     '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
+     '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
+     ```
+
+  2. Where virtio devices are bound to the Linux kernel driver in a guest
+     environment (i.e. interfaces are not bound to an in-guest DPDK driver), the
+     MTU of those logical network interfaces must also be increased. This
+     avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
+     to the length of the IP packet only, and not that of the entire frame.
+
+     e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
+     header and CRC lengths (i.e. 18B) from the max supported frame size.
+     So, to set the MTU for a 13312B Jumbo Frame:
+
+     ```
+     ifconfig eth1 mtu 13294
+     ```
+
 Restrictions:
 -------------
 
-  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
   - Currently DPDK port does not make use any offload functionality.
   - DPDK-vHost support works with 1G huge pages.
 
@@ -945,6 +998,11 @@ Restrictions:
     increased to the desired number of queues. Both DPDK and OVS must be
     recompiled for this change to take effect.
 
+  Jumbo Frames:
+  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a DPDK
+    issue that is currently being investigated.
+  - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports.
+
 Bug Reporting:
 --------------
 
diff --git a/NEWS b/NEWS
index ea7f3a1..4bc0371 100644
--- a/NEWS
+++ b/NEWS
@@ -26,6 +26,7 @@ Post-v2.5.0
        assignment.
      * Type of log messages from PMD threads changed from INFO to DBG.
      * QoS functionality with sample egress-policer implementation.
+     * Support Jumbo Frames
    - ovs-benchmark: This utility has been removed due to lack of use and
      bitrot.
    - ovs-appctl:
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 208c5f5..98e8c3a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -79,6 +79,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
                                     + sizeof(struct dp_packet) \
                                     + RTE_PKTMBUF_HEADROOM)
 #define NETDEV_DPDK_MBUF_ALIGN 1024
+#define NETDEV_DPDK_MAX_FRAME_LEN 13312
+#define MTU_NOT_SET 0
 
 /* Max and min number of packets in the mempool. OVS tries to allocate a
  * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
@@ -531,6 +533,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
 {
     int diag = 0;
     int i;
+    struct rte_eth_conf conf = port_conf;
 
     /* A device may report more queues than it makes available (this has
      * been observed for Intel xl710, which reserves some of them for
@@ -542,7 +545,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
             VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
         }
 
-        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
+        if (dev->mtu > ETHER_MTU) {
+            conf.rxmode.jumbo_frame = 1;
+            conf.rxmode.max_rx_pkt_len = dev->max_packet_len;
+        } else {
+            conf.rxmode.jumbo_frame = 0;
+            conf.rxmode.max_rx_pkt_len = 0;
+        }
+
+        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
         if (diag) {
             break;
         }
@@ -686,8 +697,6 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
     int sid;
-    int err = 0;
-    uint32_t buf_size;
 
     ovs_mutex_init(&dev->mutex);
     ovs_mutex_lock(&dev->mutex);
@@ -707,15 +716,7 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
     dev->port_id = port_no;
     dev->type = type;
     dev->flags = 0;
-    dev->mtu = ETHER_MTU;
-    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
-    buf_size = dpdk_buf_size(dev->mtu);
-    dev->dpdk_mp = dpdk_mp_get(dev->socket_id, FRAME_LEN_TO_MTU(buf_size));
-    if (!dev->dpdk_mp) {
-        err = ENOMEM;
-        goto unlock;
-    }
+    dev->mtu = MTU_NOT_SET;
 
     /* Initialise QoS configuration to NULL and qos lock to unlocked */
     dev->qos_conf = NULL;
@@ -728,22 +729,14 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no,
 
     if (type == DPDK_DEV_ETH) {
         netdev_dpdk_alloc_txq(dev, NR_QUEUE);
-        err = dpdk_eth_dev_init(dev);
-        if (err) {
-            goto unlock;
-        }
     } else {
         netdev_dpdk_alloc_txq(dev, OVS_VHOST_MAX_QUEUE_NUM);
     }
 
     ovs_list_push_back(&dpdk_list, &dev->list_node);
 
-unlock:
-    if (err) {
-        rte_free(dev->tx_q);
-    }
     ovs_mutex_unlock(&dev->mutex);
-    return err;
+    return 0;
 }
 
 /* dev_name must be the prefix followed by a positive decimal number.
@@ -767,6 +760,31 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
     }
 }
 
+static void
+dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
+{
+    const char *mtu_str = smap_get(args, "mtu_request");
+    char *end_ptr = NULL;
+    int local_mtu;
+
+    if (!mtu_str) {
+        local_mtu = ETHER_MTU;
+    } else {
+        local_mtu = strtoul(mtu_str, &end_ptr, 0);
+        if (local_mtu < ETHER_MTU ||
+            local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
+            *end_ptr != '\0') {
+            local_mtu = ETHER_MTU;
+            VLOG_WARN("Invalid mtu_request parameter - defaulting to %d.\n",
+                      local_mtu);
+        } else {
+            VLOG_INFO("mtu_request parameter %d detected.\n", local_mtu);
+        }
+    }
+
+    *mtu = local_mtu;
+}
+
 static int
 vhost_construct_helper(struct netdev *netdev) OVS_REQUIRES(dpdk_mutex)
 {
@@ -913,15 +931,72 @@ netdev_dpdk_get_config(const struct netdev *netdev, struct smap *args)
     smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq);
     smap_add_format(args, "requested_tx_queues", "%d", netdev->n_txq);
     smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
+    smap_add_format(args, "mtu", "%d", dev->mtu);
     ovs_mutex_unlock(&dev->mutex);
 
     return 0;
 }
 
+/* Set the mtu of DPDK_DEV_ETH ports */
+static int
+netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err, dpdk_mtu;
+    uint32_t buf_size;
+    struct dpdk_mp *mp;
+
+    ovs_mutex_lock(&dpdk_mutex);
+    ovs_mutex_lock(&dev->mutex);
+    if (dev->mtu == mtu) {
+        err = 0;
+        goto out;
+    }
+
+    buf_size = dpdk_buf_size(mtu);
+    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
+
+    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
+    if (!mp) {
+        err = ENOMEM;
+        goto out;
+    }
+
+    rte_eth_dev_stop(dev->port_id);
+
+    dev->dpdk_mp = mp;
+    dev->mtu = mtu;
+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+
+    err = dpdk_eth_dev_init(dev);
+    if (err) {
+        VLOG_WARN("Unable to set MTU '%d' for '%s'; falling back to default "
+                  "MTU '%d'\n", mtu, dev->up.name, ETHER_MTU);
+        dpdk_mp_put(mp);
+        dev->mtu = ETHER_MTU;
+        mp = dpdk_mp_get(dev->socket_id, dev->mtu);
+        if (!mp) {
+            err = ENOMEM;
+            goto out;
+        }
+        dev->dpdk_mp = mp;
+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+        dpdk_eth_dev_init(dev);
+        goto out;
+    } else {
+        netdev_change_seq_changed(netdev);
+    }
+out:
+    ovs_mutex_unlock(&dev->mutex);
+    ovs_mutex_unlock(&dpdk_mutex);
+    return err;
+}
+
 static int
 netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
 {
     struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int mtu;
 
     ovs_mutex_lock(&dev->mutex);
     netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
@@ -929,6 +1004,14 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
     netdev_change_seq_changed(netdev);
     ovs_mutex_unlock(&dev->mutex);
 
+    dpdk_dev_parse_mtu(args, &mtu);
+
+    if (!dev->mtu) {
+        return netdev_dpdk_set_mtu(netdev, mtu);
+    } else if (mtu != dev->mtu) {
+        VLOG_WARN("Unable to set MTU %d for port %d; this port has immutable MTU "
+                  "%d\n", mtu, dev->port_id, dev->mtu);
+    }
     return 0;
 }
 
@@ -1580,57 +1663,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
 }
 
 static int
-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
-{
-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-    int old_mtu, err, dpdk_mtu;
-    struct dpdk_mp *old_mp;
-    struct dpdk_mp *mp;
-    uint32_t buf_size;
-
-    ovs_mutex_lock(&dpdk_mutex);
-    ovs_mutex_lock(&dev->mutex);
-    if (dev->mtu == mtu) {
-        err = 0;
-        goto out;
-    }
-
-    buf_size = dpdk_buf_size(mtu);
-    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
-
-    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
-    if (!mp) {
-        err = ENOMEM;
-        goto out;
-    }
-
-    rte_eth_dev_stop(dev->port_id);
-
-    old_mtu = dev->mtu;
-    old_mp = dev->dpdk_mp;
-    dev->dpdk_mp = mp;
-    dev->mtu = mtu;
-    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-
-    err = dpdk_eth_dev_init(dev);
-    if (err) {
-        dpdk_mp_put(mp);
-        dev->mtu = old_mtu;
-        dev->dpdk_mp = old_mp;
-        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
-        dpdk_eth_dev_init(dev);
-        goto out;
-    }
-
-    dpdk_mp_put(old_mp);
-    netdev_change_seq_changed(netdev);
-out:
-    ovs_mutex_unlock(&dev->mutex);
-    ovs_mutex_unlock(&dpdk_mutex);
-    return err;
-}
-
-static int
 netdev_dpdk_get_carrier(const struct netdev *netdev, bool *carrier);
 
 static int
@@ -2276,6 +2307,61 @@ dpdk_vhost_user_class_init(void)
     return 0;
 }
 
+/* Set the mtu of DPDK_DEV_VHOST ports */
+static int
+netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int err = 0;
+    struct dpdk_mp *mp;
+
+    ovs_mutex_lock(&dpdk_mutex);
+    ovs_mutex_lock(&dev->mutex);
+    if (dev->mtu == mtu) {
+        err = 0;
+        goto out;
+    }
+
+    mp = dpdk_mp_get(dev->socket_id, mtu);
+    if (!mp) {
+        err = ENOMEM;
+        goto out;
+    }
+
+    dev->dpdk_mp = mp;
+    dev->mtu = mtu;
+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
+
+    netdev_change_seq_changed(netdev);
+out:
+    ovs_mutex_unlock(&dev->mutex);
+    ovs_mutex_unlock(&dpdk_mutex);
+    return err;
+}
+
+static int
+netdev_dpdk_vhost_set_config(struct netdev *netdev, const struct smap *args)
+{
+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+    int mtu;
+
+    ovs_mutex_lock(&dev->mutex);
+    netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq",
+                                               netdev->requested_n_rxq), 1);
+    netdev_change_seq_changed(netdev);
+    ovs_mutex_unlock(&dev->mutex);
+
+    dpdk_dev_parse_mtu(args, &mtu);
+
+    if (!dev->mtu) {
+        return netdev_dpdk_vhost_set_mtu(netdev, mtu);
+    } else if (mtu != dev->mtu) {
+        VLOG_WARN("Unable to set MTU %d for vhost port; this port has immutable MTU "
+                  "%d\n", mtu, dev->mtu);
+    }
+    return 0;
+}
+
 static void
 dpdk_common_init(void)
 {
@@ -2661,8 +2747,9 @@ static const struct dpdk_qos_ops egress_policer_ops = {
     egress_policer_run
 };
 
-#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
-    GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)          \
+#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG,    \
+    MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES,          \
+    GET_STATUS, RXQ_RECV)                                                 \
 {                                                             \
     NAME,                                                     \
     true,                       /* is_pmd */                  \
@@ -2675,7 +2762,7 @@ static const struct dpdk_qos_ops egress_policer_ops = {
     DESTRUCT,                                                 \
     netdev_dpdk_dealloc,                                      \
     netdev_dpdk_get_config,                                   \
-    netdev_dpdk_set_config,                                   \
+    SET_CONFIG ,                                              \
     NULL,                       /* get_tunnel_config */       \
     NULL,                       /* build header */            \
     NULL,                       /* push header */             \
@@ -2689,7 +2776,7 @@ static const struct dpdk_qos_ops egress_policer_ops = {
     netdev_dpdk_set_etheraddr,                                \
     netdev_dpdk_get_etheraddr,                                \
     netdev_dpdk_get_mtu,                                      \
-    netdev_dpdk_set_mtu,                                      \
+    SET_MTU,                                                  \
     netdev_dpdk_get_ifindex,                                  \
     GET_CARRIER,                                              \
     netdev_dpdk_get_carrier_resets,                           \
@@ -2834,8 +2921,10 @@ static const struct netdev_class dpdk_class =
         NULL,
         netdev_dpdk_construct,
         netdev_dpdk_destruct,
+        netdev_dpdk_set_config,
         netdev_dpdk_set_multiq,
         netdev_dpdk_eth_send,
+        netdev_dpdk_set_mtu,
         netdev_dpdk_get_carrier,
         netdev_dpdk_get_stats,
         netdev_dpdk_get_features,
@@ -2848,8 +2937,10 @@ static const struct netdev_class dpdk_ring_class =
         NULL,
         netdev_dpdk_ring_construct,
         netdev_dpdk_destruct,
+        netdev_dpdk_set_config,
         netdev_dpdk_set_multiq,
         netdev_dpdk_ring_send,
+        netdev_dpdk_set_mtu,
         netdev_dpdk_get_carrier,
         netdev_dpdk_get_stats,
         netdev_dpdk_get_features,
@@ -2862,8 +2953,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
         dpdk_vhost_cuse_class_init,
         netdev_dpdk_vhost_cuse_construct,
         netdev_dpdk_vhost_destruct,
+        netdev_dpdk_set_config,
         netdev_dpdk_vhost_cuse_set_multiq,
         netdev_dpdk_vhost_send,
+        NULL,
         netdev_dpdk_vhost_get_carrier,
         netdev_dpdk_vhost_get_stats,
         NULL,
@@ -2876,8 +2969,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
         dpdk_vhost_user_class_init,
         netdev_dpdk_vhost_user_construct,
         netdev_dpdk_vhost_destruct,
+        netdev_dpdk_vhost_set_config,
         netdev_dpdk_vhost_set_multiq,
         netdev_dpdk_vhost_send,
+        netdev_dpdk_vhost_set_mtu,
         netdev_dpdk_vhost_get_carrier,
         netdev_dpdk_vhost_get_stats,
         NULL,

>_______________________________________________
>discuss mailing list
>discuss@openvswitch.org
>http://openvswitch.org/mailman/listinfo/discuss
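
For anyone checking the MTU arithmetic from the documentation hunk above, here is a minimal, self-contained C sketch. It is not part of the patch: every `sketch_*` name is hypothetical, and the constants assume an 18B L2 overhead (14B Ethernet header plus 4B CRC), a 1500B default MTU, and the 13312B maximum frame length that the patch defines as NETDEV_DPDK_MAX_FRAME_LEN. The parsing helper only approximates the range check in dpdk_dev_parse_mtu(); it swaps in strtol() with an errno check in place of strtoul().

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Assumed values, not taken from OVS or DPDK headers. */
#define SKETCH_L2_OVERHEAD   18     /* 14B Ethernet header + 4B CRC */
#define SKETCH_DEFAULT_MTU   1500   /* standard Ethernet MTU */
#define SKETCH_MAX_FRAME_LEN 13312  /* value of NETDEV_DPDK_MAX_FRAME_LEN */

/* MTU covers the IP packet only, so strip the L2 header and CRC. */
int
sketch_frame_len_to_mtu(int frame_len)
{
    return frame_len - SKETCH_L2_OVERHEAD;
}

/* Inverse of the above: full frame length for a given MTU. */
int
sketch_mtu_to_frame_len(int mtu)
{
    return mtu + SKETCH_L2_OVERHEAD;
}

/* Hypothetical stand-in for the mtu_request check: returns the requested
 * MTU when the string is a complete number within
 * [SKETCH_DEFAULT_MTU, max MTU], otherwise the default MTU. */
int
sketch_parse_mtu(const char *mtu_str)
{
    char *end = NULL;
    long val;

    if (!mtu_str || *mtu_str == '\0') {
        return SKETCH_DEFAULT_MTU;
    }

    errno = 0;
    val = strtol(mtu_str, &end, 0);
    if (errno || *end != '\0'
        || val < SKETCH_DEFAULT_MTU
        || val > sketch_frame_len_to_mtu(SKETCH_MAX_FRAME_LEN)) {
        return SKETCH_DEFAULT_MTU;
    }
    return (int) val;
}

/* Small self-check that mirrors the 13312B example from the docs. */
void
sketch_self_check(void)
{
    assert(sketch_frame_len_to_mtu(SKETCH_MAX_FRAME_LEN) == 13294);
    assert(sketch_parse_mtu("9000") == 9000);
    assert(sketch_parse_mtu("13295") == SKETCH_DEFAULT_MTU);
}
```

Under those assumptions, a 13312B frame corresponds to an MTU of 13294, which matches the ifconfig example, and an mtu_request of 9000 is accepted while 13295 falls back to 1500.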