On Fri, 19 Feb 2016 11:25:12 +0000 Mark Kavanagh <mark.b.kavan...@intel.com> wrote:
> Add support for Jumbo Frames to DPDK-enabled port types, > using single-segment-mbufs. > > Using this approach, the amount of memory allocated for each mbuf > to store frame data is increased to a value greater than 1518B > (typical Ethernet maximum frame length). The increased space > available in the mbuf means that an entire Jumbo Frame can be carried > in a single mbuf, as opposed to partitioning it across multiple mbuf > segments. > > The amount of space allocated to each mbuf to hold frame data is > defined dynamically by the user when adding a DPDK port to a bridge. > If an MTU value is not supplied, or the user-supplied value is invalid, > the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B). > > Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com> > --- > INSTALL.DPDK.md | 60 ++++++++++++- > NEWS | 3 +- > lib/netdev-dpdk.c | 248 > +++++++++++++++++++++++++++++++++++++----------------- > 3 files changed, 233 insertions(+), 78 deletions(-) > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md > index d892788..4ca98cb 100644 > --- a/INSTALL.DPDK.md > +++ b/INSTALL.DPDK.md > @@ -878,10 +878,63 @@ by adding the following string: > to <interface> sections of all network devices used by DPDK. Parameter 'N' > determines how many queues can be used by the guest. > > +Jumbo Frames > +------------ > + > +Support for Jumbo Frames may be enabled at run-time for DPDK-type ports. > + > +To avail of Jumbo Frame support, add the 'mtu_request' option to the > ovs-vsctl > +'add-port' command-line, along with the required MTU for the port. > +e.g. > + > + ``` > + ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk > options:mtu_request=9000 > + ``` > + > +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are > +increased, such that a full Jumbo Frame may be accommodated inside a single > +mbuf segment. Once set, the MTU for a DPDK port is immutable. > + > +Note that from an OVSDB perspective, the `mtu_request` option for a specific > +port may be disregarded once initially set, as subsequent modifications to > this > +field are disregarded by the DPDK port. As with non-DPDK ports, the MTU of > DPDK > +ports is reported by the `Interface` table's 'mtu' field. > + > +Jumbo frame support has been validated against 13312B frames, using the > +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may > +theoretically be supported. Supported port types excludes vHost-Cuse ports, > as > +that feature is pending deprecation. > + > +vHost Ports and Jumbo Frames > +---------------------------- > +Jumbo frame support is available for DPDK vHost-User ports only. Some > additional > +configuration is needed to take advantage of this feature: > + > + 1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in > + the QEMU command line snippet below: > + > + ``` > + '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \' > + '-device > virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on' > + ``` > + > + 2. Where virtio devices are bound to the Linux kernel driver in a guest > + environment (i.e. interfaces are not bound to an in-guest DPDK driver), > the > + MTU of those logical network interfaces must also be increased. This > + avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers > + to the length of the IP packet only, and not that of the entire frame. > + > + e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the > L2 > + header and CRC lengths (i.e. 18B) from the max supported frame size. > + So, to set the MTU for a 13312B Jumbo Frame: > + > + ``` > + ifconfig eth1 mtu 13294 > + ``` > + > Restrictions: > ------------- > > - - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue. > - Currently DPDK port does not make use any offload functionality. > - DPDK-vHost support works with 1G huge pages. > > @@ -922,6 +975,11 @@ Restrictions: > the next release of DPDK (which includes the above patch) is available > and > integrated into OVS. > > + Jumbo Frames: > + - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a > DPDK > + issue that is currently being investigated. > + - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports. > + > Bug Reporting: > -------------- > > diff --git a/NEWS b/NEWS > index 3e33871..43127f9 100644 > --- a/NEWS > +++ b/NEWS > @@ -10,10 +10,11 @@ Post-v2.5.0 > - DPDK: > * New option "n_rxq" for PMD interfaces. > Old 'other_config:n-dpdk-rxqs' is no longer supported. > + * Support Jumbo Frames > + > - ovs-benchmark: This utility has been removed due to lack of use and > bitrot. > > - > v2.5.0 - xx xxx xxxx > --------------------- > - Dropped support for Python older than version 2.7. As a consequence, > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index 2a06bb5..ac89ee6 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -77,6 +77,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, > 20); > + sizeof(struct dp_packet) \ > + RTE_PKTMBUF_HEADROOM) > #define NETDEV_DPDK_MBUF_ALIGN 1024 > +#define NETDEV_DPDK_MAX_FRAME_LEN 13312 > +#define MTU_NOT_SET 0 > > /* Max and min number of packets in the mempool. OVS tries to allocate a > * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have > @@ -422,6 +424,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int > n_rxq, int n_txq) > { > int diag = 0; > int i; > + struct rte_eth_conf conf = port_conf; > > /* A device may report more queues than it makes available (this has > * been observed for Intel xl710, which reserves some of them for > @@ -433,7 +436,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int > n_rxq, int n_txq) > VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq); > } > > - diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf); > + if (dev->mtu > ETHER_MTU) { > + conf.rxmode.jumbo_frame = 1; > + conf.rxmode.max_rx_pkt_len = dev->max_packet_len; > + } else { > + conf.rxmode.jumbo_frame = 0; > + conf.rxmode.max_rx_pkt_len = 0; > + } > + > + diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf); > if (diag) { > break; > } > @@ -574,8 +585,6 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int > port_no, > { > struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); > int sid; > - int err = 0; > - uint32_t buf_size; > > ovs_mutex_init(&netdev->mutex); > ovs_mutex_lock(&netdev->mutex); > @@ -595,15 +604,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int > port_no, > netdev->port_id = port_no; > netdev->type = type; > netdev->flags = 0; > - netdev->mtu = ETHER_MTU; > - netdev->max_packet_len = MTU_TO_FRAME_LEN(netdev->mtu); > - > - buf_size = dpdk_buf_size(netdev->mtu); > - netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, > FRAME_LEN_TO_MTU(buf_size)); > - if (!netdev->dpdk_mp) { > - err = ENOMEM; > - goto unlock; > - } > + netdev->mtu = MTU_NOT_SET; > > netdev_->n_txq = NR_QUEUE; > netdev_->n_rxq = NR_QUEUE; > @@ -612,20 +613,12 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int > port_no, > > if (type == DPDK_DEV_ETH) { > netdev_dpdk_alloc_txq(netdev, NR_QUEUE); > - err = dpdk_eth_dev_init(netdev); > - if (err) { > - goto unlock; > - } > } > > list_push_back(&dpdk_list, &netdev->list_node); > > -unlock: > - if (err) { > - rte_free(netdev->tx_q); > - } > ovs_mutex_unlock(&netdev->mutex); > - return err; > + return 0; > } > > static int > @@ -643,6 +636,31 @@ dpdk_dev_parse_name(const char dev_name[], const char > prefix[], > return 0; > } > > +static void > +dpdk_dev_parse_mtu(const struct smap *args, int *mtu) > +{ > + const char *mtu_str = smap_get(args, "mtu_request"); > + char *end_ptr = NULL; > + int local_mtu; > + > + if (!mtu_str) { > + local_mtu = ETHER_MTU; > + } else { > + local_mtu = strtoul(mtu_str, &end_ptr, 0); > + if (local_mtu < ETHER_MTU || > + local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) || > + *end_ptr != '\0') { > + local_mtu = ETHER_MTU; > + VLOG_WARN("Invalid mtu_request parameter - defaulting to %d.\n", > + local_mtu); > + } else { > + VLOG_INFO("mtu_request parameter %d detected.\n", local_mtu); That message could be VLOG_DBG because it only tells about a parameter being detected and not if the mtu request succeed. Other than that the patch looks good to me. If no one else has comments to address, maybe a commiter can fix the message level to avoid re-spinning the patchset. Thanks for the patch! Acked-by: Flavio Leitner <f...@sysclose.org> -- fbl > + } > + } > + > + *mtu = local_mtu; > +} > + > static int > vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex) > { > @@ -773,15 +791,72 @@ netdev_dpdk_get_config(const struct netdev *netdev, > struct smap *args) > smap_add_format(args, "configured_rx_queues", "%d", netdev->n_rxq); > smap_add_format(args, "requested_tx_queues", "%d", netdev->n_txq); > smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq); > + smap_add_format(args, "mtu", "%d", dev->mtu); > ovs_mutex_unlock(&dev->mutex); > > return 0; > } > > +/* Set the mtu of DPDK_DEV_ETH ports */ > +static int > +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) > +{ > + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > + int err, dpdk_mtu; > + uint32_t buf_size; > + struct dpdk_mp *mp; > + > + ovs_mutex_lock(&dpdk_mutex); > + ovs_mutex_lock(&dev->mutex); > + if (dev->mtu == mtu) { > + err = 0; > + goto out; > + } > + > + buf_size = dpdk_buf_size(mtu); > + dpdk_mtu = FRAME_LEN_TO_MTU(buf_size); > + > + mp = dpdk_mp_get(dev->socket_id, dpdk_mtu); > + if (!mp) { > + err = ENOMEM; > + goto out; > + } > + > + rte_eth_dev_stop(dev->port_id); > + > + dev->dpdk_mp = mp; > + dev->mtu = mtu; > + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); > + > + err = dpdk_eth_dev_init(dev); > + if (err) { > + VLOG_WARN("Unable to set MTU '%d' for '%s'; falling back to default " > + "MTU '%d'\n", mtu, dev->up.name, ETHER_MTU); > + dpdk_mp_put(mp); > + dev->mtu = ETHER_MTU; > + mp = dpdk_mp_get(dev->socket_id, dev->mtu); > + if (!mp) { > + err = ENOMEM; > + goto out; > + } > + dev->dpdk_mp = mp; > + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); > + dpdk_eth_dev_init(dev); > + goto out; > + } else { > + netdev_change_seq_changed(netdev); > + } > +out: > + ovs_mutex_unlock(&dev->mutex); > + ovs_mutex_unlock(&dpdk_mutex); > + return err; > +} > + > static int > netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args) > { > struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > + int mtu; > > ovs_mutex_lock(&dev->mutex); > netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq", > @@ -789,6 +864,14 @@ netdev_dpdk_set_config(struct netdev *netdev, const > struct smap *args) > netdev_change_seq_changed(netdev); > ovs_mutex_unlock(&dev->mutex); > > + dpdk_dev_parse_mtu(args, &mtu); > + > + if (!dev->mtu) { > + return netdev_dpdk_set_mtu(netdev, mtu); > + } else if (mtu != dev->mtu) { > + VLOG_WARN("Unable to set MTU %d for port %d; this port has immutable > MTU " > + "%d\n", mtu, dev->port_id, dev->mtu); > + } > return 0; > } > > @@ -1407,57 +1490,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int > *mtup) > } > > static int > -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) > -{ > - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > - int old_mtu, err, dpdk_mtu; > - struct dpdk_mp *old_mp; > - struct dpdk_mp *mp; > - uint32_t buf_size; > - > - ovs_mutex_lock(&dpdk_mutex); > - ovs_mutex_lock(&dev->mutex); > - if (dev->mtu == mtu) { > - err = 0; > - goto out; > - } > - > - buf_size = dpdk_buf_size(mtu); > - dpdk_mtu = FRAME_LEN_TO_MTU(buf_size); > - > - mp = dpdk_mp_get(dev->socket_id, dpdk_mtu); > - if (!mp) { > - err = ENOMEM; > - goto out; > - } > - > - rte_eth_dev_stop(dev->port_id); > - > - old_mtu = dev->mtu; > - old_mp = dev->dpdk_mp; > - dev->dpdk_mp = mp; > - dev->mtu = mtu; > - dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); > - > - err = dpdk_eth_dev_init(dev); > - if (err) { > - dpdk_mp_put(mp); > - dev->mtu = old_mtu; > - dev->dpdk_mp = old_mp; > - dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); > - dpdk_eth_dev_init(dev); > - goto out; > - } > - > - dpdk_mp_put(old_mp); > - netdev_change_seq_changed(netdev); > -out: > - ovs_mutex_unlock(&dev->mutex); > - ovs_mutex_unlock(&dpdk_mutex); > - return err; > -} > - > -static int > netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier); > > static int > @@ -2000,6 +2032,61 @@ dpdk_vhost_user_class_init(void) > return 0; > } > > +/* Set the mtu of DPDK_DEV_VHOST ports */ > +static int > +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu) > +{ > + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > + int err = 0; > + struct dpdk_mp *mp; > + > + ovs_mutex_lock(&dpdk_mutex); > + ovs_mutex_lock(&dev->mutex); > + if (dev->mtu == mtu) { > + err = 0; > + goto out; > + } > + > + mp = dpdk_mp_get(dev->socket_id, mtu); > + if (!mp) { > + err = ENOMEM; > + goto out; > + } > + > + dev->dpdk_mp = mp; > + dev->mtu = mtu; > + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); > + > + netdev_change_seq_changed(netdev); > +out: > + ovs_mutex_unlock(&dev->mutex); > + ovs_mutex_unlock(&dpdk_mutex); > + return err; > +} > + > +static int > +netdev_dpdk_vhost_set_config(struct netdev *netdev, const struct smap *args) > +{ > + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); > + int mtu; > + > + ovs_mutex_lock(&dev->mutex); > + netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq", > + netdev->requested_n_rxq), 1); > + netdev_change_seq_changed(netdev); > + ovs_mutex_unlock(&dev->mutex); > + > + dpdk_dev_parse_mtu(args, &mtu); > + > + if (!dev->mtu) { > + return netdev_dpdk_vhost_set_mtu(netdev, mtu); > + } else if (mtu != dev->mtu) { > + VLOG_WARN("Unable to set MTU %d for vhost port; this port has > immutable MTU " > + "%d\n", mtu, dev->mtu); > + } > + return 0; > +} > + > static void > dpdk_common_init(void) > { > @@ -2136,8 +2223,9 @@ unlock_dpdk: > return err; > } > > -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \ > - GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) \ > +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \ > + MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES, \ > + GET_STATUS, RXQ_RECV) \ > { \ > NAME, \ > INIT, /* init */ \ > @@ -2149,7 +2237,7 @@ unlock_dpdk: > DESTRUCT, \ > netdev_dpdk_dealloc, \ > netdev_dpdk_get_config, \ > - netdev_dpdk_set_config, \ > + SET_CONFIG , \ > NULL, /* get_tunnel_config */ \ > NULL, /* build header */ \ > NULL, /* push header */ \ > @@ -2163,7 +2251,7 @@ unlock_dpdk: > netdev_dpdk_set_etheraddr, \ > netdev_dpdk_get_etheraddr, \ > netdev_dpdk_get_mtu, \ > - netdev_dpdk_set_mtu, \ > + SET_MTU, \ > netdev_dpdk_get_ifindex, \ > GET_CARRIER, \ > netdev_dpdk_get_carrier_resets, \ > @@ -2309,8 +2397,10 @@ static const struct netdev_class dpdk_class = > NULL, > netdev_dpdk_construct, > netdev_dpdk_destruct, > + netdev_dpdk_set_config, > netdev_dpdk_set_multiq, > netdev_dpdk_eth_send, > + netdev_dpdk_set_mtu, > netdev_dpdk_get_carrier, > netdev_dpdk_get_stats, > netdev_dpdk_get_features, > @@ -2323,8 +2413,10 @@ static const struct netdev_class dpdk_ring_class = > NULL, > netdev_dpdk_ring_construct, > netdev_dpdk_destruct, > + netdev_dpdk_set_config, > netdev_dpdk_set_multiq, > netdev_dpdk_ring_send, > + netdev_dpdk_set_mtu, > netdev_dpdk_get_carrier, > netdev_dpdk_get_stats, > netdev_dpdk_get_features, > @@ -2337,8 +2429,10 @@ static const struct netdev_class OVS_UNUSED > dpdk_vhost_cuse_class = > dpdk_vhost_cuse_class_init, > netdev_dpdk_vhost_cuse_construct, > netdev_dpdk_vhost_destruct, > + netdev_dpdk_set_config, > netdev_dpdk_vhost_cuse_set_multiq, > netdev_dpdk_vhost_send, > + NULL, > netdev_dpdk_vhost_get_carrier, > netdev_dpdk_vhost_get_stats, > NULL, > @@ -2351,8 +2445,10 @@ static const struct netdev_class OVS_UNUSED > dpdk_vhost_user_class = > dpdk_vhost_user_class_init, > netdev_dpdk_vhost_user_construct, > netdev_dpdk_vhost_destruct, > + netdev_dpdk_vhost_set_config, > netdev_dpdk_vhost_set_multiq, > netdev_dpdk_vhost_send, > + netdev_dpdk_vhost_set_mtu, > netdev_dpdk_vhost_get_carrier, > netdev_dpdk_vhost_get_stats, > NULL, _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev