Hi Mark, I thought about how to make this dynamically reconfigurable and I've come up with a series that generalizes the mechanism used for rx/tx queues reconfiguration:
http://openvswitch.org/pipermail/dev/2016-February/066927.html I also think that it should be better to introduce 'mtu_request' as a separate column in the Interface table, instead of putting it in 'options'. This way 'mtu_request' won't be netdev specific, but could be used for every netdev type. In this scenario vswitch/bridge.c would call netdev_set_mtu() when it detects a change in 'mtu_request' and netdev_dpdk_set_mtu() could be implemented using netdev_request_reconfigure() from the series that I posted. What do you think? Thanks, Daniele On 23/02/2016 06:56, "Kavanagh, Mark B" <mark.b.kavan...@intel.com> wrote: >> >>On Fri, 19 Feb 2016 11:25:12 +0000 >>Mark Kavanagh <mark.b.kavan...@intel.com> wrote: >> >>> Add support for Jumbo Frames to DPDK-enabled port types, >>> using single-segment-mbufs. >>> >>> Using this approach, the amount of memory allocated for each mbuf >>> to store frame data is increased to a value greater than 1518B >>> (typical Ethernet maximum frame length). The increased space >>> available in the mbuf means that an entire Jumbo Frame can be carried >>> in a single mbuf, as opposed to partitioning it across multiple mbuf >>> segments. >>> >>> The amount of space allocated to each mbuf to hold frame data is >>> defined dynamically by the user when adding a DPDK port to a bridge. >>> If an MTU value is not supplied, or the user-supplied value is invalid, >>> the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B). >>> >>> Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com> >>> --- >>> INSTALL.DPDK.md | 60 ++++++++++++- >>> NEWS | 3 +- >>> lib/netdev-dpdk.c | 248 >>>+++++++++++++++++++++++++++++++++++++----------------- >>> 3 files changed, 233 insertions(+), 78 deletions(-) >>> >>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md >>> index d892788..4ca98cb 100644 >>> --- a/INSTALL.DPDK.md >>> +++ b/INSTALL.DPDK.md >>> @@ -878,10 +878,63 @@ by adding the following string: >>> to <interface> sections of all network devices used by DPDK. >>>Parameter 'N' >>> determines how many queues can be used by the guest. >>> >>> +Jumbo Frames >>> +------------ >>> + >>> +Support for Jumbo Frames may be enabled at run-time for DPDK-type >>>ports. >>> + >>> +To avail of Jumbo Frame support, add the 'mtu_request' option to the >>>ovs-vsctl >>> +'add-port' command-line, along with the required MTU for the port. >>> +e.g. >>> + >>> + ``` >>> + ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk >>options:mtu_request=9000 >>> + ``` >>> + >>> +When Jumbo Frames are enabled, the size of a DPDK port's mbuf >>>segments are >>> +increased, such that a full Jumbo Frame may be accommodated inside a >>>single >>> +mbuf segment. Once set, the MTU for a DPDK port is immutable. >>> + >>> +Note that from an OVSDB perspective, the `mtu_request` option for a >>>specific >>> +port may be disregarded once initially set, as subsequent >>>modifications to this >>> +field are disregarded by the DPDK port. As with non-DPDK ports, the >>>MTU of DPDK >>> +ports is reported by the `Interface` table's 'mtu' field. >>> + >>> +Jumbo frame support has been validated against 13312B frames, using >>>the >>> +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers >>>may >>> +theoretically be supported. Supported port types excludes vHost-Cuse >>>ports, as >>> +that feature is pending deprecation. >>> + >>> +vHost Ports and Jumbo Frames >>> +---------------------------- >>> +Jumbo frame support is available for DPDK vHost-User ports only. Some >>>additional >>> +configuration is needed to take advantage of this feature: >>> + >>> + 1. `mergeable buffers` must be enabled for vHost ports, as >>>demonstrated in >>> + the QEMU command line snippet below: >>> + >>> + ``` >>> + '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \' >>> + '-device >>>virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on' >>> + ``` >>> + >>> + 2. Where virtio devices are bound to the Linux kernel driver in a >>>guest >>> + environment (i.e. interfaces are not bound to an in-guest DPDK >>>driver), the >>> + MTU of those logical network interfaces must also be increased. >>>This >>> + avoids segmentation of Jumbo Frames in the guest. Note that >>>'MTU' refers >>> + to the length of the IP packet only, and not that of the entire >>>frame. >>> + >>> + e.g. To calculate the exact MTU of a standard IPv4 frame, >>>subtract the L2 >>> + header and CRC lengths (i.e. 18B) from the max supported frame >>>size. >>> + So, to set the MTU for a 13312B Jumbo Frame: >>> + >>> + ``` >>> + ifconfig eth1 mtu 13294 >>> + ``` >>> + >>> Restrictions: >>> ------------- >>> >>> - - Work with 1500 MTU, needs few changes in DPDK lib to fix this >>>issue. >>> - Currently DPDK port does not make use any offload functionality. >>> - DPDK-vHost support works with 1G huge pages. >>> >>> @@ -922,6 +975,11 @@ Restrictions: >>> the next release of DPDK (which includes the above patch) is >>>available and >>> integrated into OVS. >>> >>> + Jumbo Frames: >>> + - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This >>>is a DPDK >>> + issue that is currently being investigated. >>> + - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse >>>ports. >>> + >>> Bug Reporting: >>> -------------- >>> >>> diff --git a/NEWS b/NEWS >>> index 3e33871..43127f9 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -10,10 +10,11 @@ Post-v2.5.0 >>> - DPDK: >>> * New option "n_rxq" for PMD interfaces. >>> Old 'other_config:n-dpdk-rxqs' is no longer supported. >>> + * Support Jumbo Frames >>> + >>> - ovs-benchmark: This utility has been removed due to lack of use >>>and >>> bitrot. >>> >>> - >>> v2.5.0 - xx xxx xxxx >>> --------------------- >>> - Dropped support for Python older than version 2.7. As a >>>consequence, >>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >>> index 2a06bb5..ac89ee6 100644 >>> --- a/lib/netdev-dpdk.c >>> +++ b/lib/netdev-dpdk.c >>> @@ -77,6 +77,8 @@ static struct vlog_rate_limit rl = >>>VLOG_RATE_LIMIT_INIT(5, 20); >>> + sizeof(struct dp_packet) \ >>> + RTE_PKTMBUF_HEADROOM) >>> #define NETDEV_DPDK_MBUF_ALIGN 1024 >>> +#define NETDEV_DPDK_MAX_FRAME_LEN 13312 >>> +#define MTU_NOT_SET 0 >>> >>> /* Max and min number of packets in the mempool. OVS tries to >>>allocate a >>> * mempool with MAX_NB_MBUF: if this fails (because the system >>>doesn't have >>> @@ -422,6 +424,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, >>>int n_rxq, int n_txq) >>> { >>> int diag = 0; >>> int i; >>> + struct rte_eth_conf conf = port_conf; >>> >>> /* A device may report more queues than it makes available (this >>>has >>> * been observed for Intel xl710, which reserves some of them for >>> @@ -433,7 +436,15 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, >>>int n_rxq, int >>n_txq) >>> VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, >>>n_txq); >>> } >>> >>> - diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, >>>&port_conf); >>> + if (dev->mtu > ETHER_MTU) { >>> + conf.rxmode.jumbo_frame = 1; >>> + conf.rxmode.max_rx_pkt_len = dev->max_packet_len; >>> + } else { >>> + conf.rxmode.jumbo_frame = 0; >>> + conf.rxmode.max_rx_pkt_len = 0; >>> + } >>> + >>> + diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, >>>&conf); >>> if (diag) { >>> break; >>> } >>> @@ -574,8 +585,6 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned >>>int port_no, >>> { >>> struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >>> int sid; >>> - int err = 0; >>> - uint32_t buf_size; >>> >>> ovs_mutex_init(&netdev->mutex); >>> ovs_mutex_lock(&netdev->mutex); >>> @@ -595,15 +604,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned >>>int port_no, >>> netdev->port_id = port_no; >>> netdev->type = type; >>> netdev->flags = 0; >>> - netdev->mtu = ETHER_MTU; >>> - netdev->max_packet_len = MTU_TO_FRAME_LEN(netdev->mtu); >>> - >>> - buf_size = dpdk_buf_size(netdev->mtu); >>> - netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, >>>FRAME_LEN_TO_MTU(buf_size)); >>> - if (!netdev->dpdk_mp) { >>> - err = ENOMEM; >>> - goto unlock; >>> - } >>> + netdev->mtu = MTU_NOT_SET; >>> >>> netdev_->n_txq = NR_QUEUE; >>> netdev_->n_rxq = NR_QUEUE; >>> @@ -612,20 +613,12 @@ netdev_dpdk_init(struct netdev *netdev_, >>>unsigned int port_no, >>> >>> if (type == DPDK_DEV_ETH) { >>> netdev_dpdk_alloc_txq(netdev, NR_QUEUE); >>> - err = dpdk_eth_dev_init(netdev); >>> - if (err) { >>> - goto unlock; >>> - } >>> } >>> >>> list_push_back(&dpdk_list, &netdev->list_node); >>> >>> -unlock: >>> - if (err) { >>> - rte_free(netdev->tx_q); >>> - } >>> ovs_mutex_unlock(&netdev->mutex); >>> - return err; >>> + return 0; >>> } >>> >>> static int >>> @@ -643,6 +636,31 @@ dpdk_dev_parse_name(const char dev_name[], const >>>char prefix[], >>> return 0; >>> } >>> >>> +static void >>> +dpdk_dev_parse_mtu(const struct smap *args, int *mtu) >>> +{ >>> + const char *mtu_str = smap_get(args, "mtu_request"); >>> + char *end_ptr = NULL; >>> + int local_mtu; >>> + >>> + if (!mtu_str) { >>> + local_mtu = ETHER_MTU; >>> + } else { >>> + local_mtu = strtoul(mtu_str, &end_ptr, 0); >>> + if (local_mtu < ETHER_MTU || >>> + local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) || >>> + *end_ptr != '\0') { >>> + local_mtu = ETHER_MTU; >>> + VLOG_WARN("Invalid mtu_request parameter - defaulting to >>>%d.\n", >>> + local_mtu); >>> + } else { >>> + VLOG_INFO("mtu_request parameter %d detected.\n", >>>local_mtu); >> >>That message could be VLOG_DBG because it only tells about a parameter >>being >>detected and not if the mtu request succeed. Other than that the patch >>looks >>good to me. > >Great! > >> >>If no one else has comments to address, maybe a commiter can fix the >>message >>level to avoid re-spinning the patchset. > >Daniele - are you happy with this, or would you prefer that I spin a new >version? > >> >>Thanks for the patch! > >Thanks for the feedback! :) > >> >>Acked-by: Flavio Leitner <f...@sysclose.org> >> >>-- >>fbl >> >> >> >>> + } >>> + } >>> + >>> + *mtu = local_mtu; >>> +} >>> + >>> static int >>> vhost_construct_helper(struct netdev *netdev_) >>>OVS_REQUIRES(dpdk_mutex) >>> { >>> @@ -773,15 +791,72 @@ netdev_dpdk_get_config(const struct netdev >>>*netdev, struct smap >>*args) >>> smap_add_format(args, "configured_rx_queues", "%d", >>>netdev->n_rxq); >>> smap_add_format(args, "requested_tx_queues", "%d", netdev->n_txq); >>> smap_add_format(args, "configured_tx_queues", "%d", >>>dev->real_n_txq); >>> + smap_add_format(args, "mtu", "%d", dev->mtu); >>> ovs_mutex_unlock(&dev->mutex); >>> >>> return 0; >>> } >>> >>> +/* Set the mtu of DPDK_DEV_ETH ports */ >>> +static int >>> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) >>> +{ >>> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>> + int err, dpdk_mtu; >>> + uint32_t buf_size; >>> + struct dpdk_mp *mp; >>> + >>> + ovs_mutex_lock(&dpdk_mutex); >>> + ovs_mutex_lock(&dev->mutex); >>> + if (dev->mtu == mtu) { >>> + err = 0; >>> + goto out; >>> + } >>> + >>> + buf_size = dpdk_buf_size(mtu); >>> + dpdk_mtu = FRAME_LEN_TO_MTU(buf_size); >>> + >>> + mp = dpdk_mp_get(dev->socket_id, dpdk_mtu); >>> + if (!mp) { >>> + err = ENOMEM; >>> + goto out; >>> + } >>> + >>> + rte_eth_dev_stop(dev->port_id); >>> + >>> + dev->dpdk_mp = mp; >>> + dev->mtu = mtu; >>> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>> + >>> + err = dpdk_eth_dev_init(dev); >>> + if (err) { >>> + VLOG_WARN("Unable to set MTU '%d' for '%s'; falling back to >>>default " >>> + "MTU '%d'\n", mtu, dev->up.name, ETHER_MTU); >>> + dpdk_mp_put(mp); >>> + dev->mtu = ETHER_MTU; >>> + mp = dpdk_mp_get(dev->socket_id, dev->mtu); >>> + if (!mp) { >>> + err = ENOMEM; >>> + goto out; >>> + } >>> + dev->dpdk_mp = mp; >>> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>> + dpdk_eth_dev_init(dev); >>> + goto out; >>> + } else { >>> + netdev_change_seq_changed(netdev); >>> + } >>> +out: >>> + ovs_mutex_unlock(&dev->mutex); >>> + ovs_mutex_unlock(&dpdk_mutex); >>> + return err; >>> +} >>> + >>> static int >>> netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args) >>> { >>> struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>> + int mtu; >>> >>> ovs_mutex_lock(&dev->mutex); >>> netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq", >>> @@ -789,6 +864,14 @@ netdev_dpdk_set_config(struct netdev *netdev, >>>const struct smap *args) >>> netdev_change_seq_changed(netdev); >>> ovs_mutex_unlock(&dev->mutex); >>> >>> + dpdk_dev_parse_mtu(args, &mtu); >>> + >>> + if (!dev->mtu) { >>> + return netdev_dpdk_set_mtu(netdev, mtu); >>> + } else if (mtu != dev->mtu) { >>> + VLOG_WARN("Unable to set MTU %d for port %d; this port has >>>immutable MTU " >>> + "%d\n", mtu, dev->port_id, dev->mtu); >>> + } >>> return 0; >>> } >>> >>> @@ -1407,57 +1490,6 @@ netdev_dpdk_get_mtu(const struct netdev >>>*netdev, int *mtup) >>> } >>> >>> static int >>> -netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) >>> -{ >>> - struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>> - int old_mtu, err, dpdk_mtu; >>> - struct dpdk_mp *old_mp; >>> - struct dpdk_mp *mp; >>> - uint32_t buf_size; >>> - >>> - ovs_mutex_lock(&dpdk_mutex); >>> - ovs_mutex_lock(&dev->mutex); >>> - if (dev->mtu == mtu) { >>> - err = 0; >>> - goto out; >>> - } >>> - >>> - buf_size = dpdk_buf_size(mtu); >>> - dpdk_mtu = FRAME_LEN_TO_MTU(buf_size); >>> - >>> - mp = dpdk_mp_get(dev->socket_id, dpdk_mtu); >>> - if (!mp) { >>> - err = ENOMEM; >>> - goto out; >>> - } >>> - >>> - rte_eth_dev_stop(dev->port_id); >>> - >>> - old_mtu = dev->mtu; >>> - old_mp = dev->dpdk_mp; >>> - dev->dpdk_mp = mp; >>> - dev->mtu = mtu; >>> - dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>> - >>> - err = dpdk_eth_dev_init(dev); >>> - if (err) { >>> - dpdk_mp_put(mp); >>> - dev->mtu = old_mtu; >>> - dev->dpdk_mp = old_mp; >>> - dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>> - dpdk_eth_dev_init(dev); >>> - goto out; >>> - } >>> - >>> - dpdk_mp_put(old_mp); >>> - netdev_change_seq_changed(netdev); >>> -out: >>> - ovs_mutex_unlock(&dev->mutex); >>> - ovs_mutex_unlock(&dpdk_mutex); >>> - return err; >>> -} >>> - >>> -static int >>> netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier); >>> >>> static int >>> @@ -2000,6 +2032,61 @@ dpdk_vhost_user_class_init(void) >>> return 0; >>> } >>> >>> +/* Set the mtu of DPDK_DEV_VHOST ports */ >>> +static int >>> +netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu) >>> +{ >>> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>> + int err = 0; >>> + struct dpdk_mp *mp; >>> + >>> + ovs_mutex_lock(&dpdk_mutex); >>> + ovs_mutex_lock(&dev->mutex); >>> + if (dev->mtu == mtu) { >>> + err = 0; >>> + goto out; >>> + } >>> + >>> + mp = dpdk_mp_get(dev->socket_id, mtu); >>> + if (!mp) { >>> + err = ENOMEM; >>> + goto out; >>> + } >>> + >>> + dev->dpdk_mp = mp; >>> + dev->mtu = mtu; >>> + dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>> + >>> + netdev_change_seq_changed(netdev); >>> +out: >>> + ovs_mutex_unlock(&dev->mutex); >>> + ovs_mutex_unlock(&dpdk_mutex); >>> + return err; >>> +} >>> + >>> +static int >>> +netdev_dpdk_vhost_set_config(struct netdev *netdev, const struct smap >>>*args) >>> +{ >>> + struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>> + int mtu; >>> + >>> + ovs_mutex_lock(&dev->mutex); >>> + netdev->requested_n_rxq = MAX(smap_get_int(args, "n_rxq", >>> + >>>netdev->requested_n_rxq), 1); >>> + netdev_change_seq_changed(netdev); >>> + ovs_mutex_unlock(&dev->mutex); >>> + >>> + dpdk_dev_parse_mtu(args, &mtu); >>> + >>> + if (!dev->mtu) { >>> + return netdev_dpdk_vhost_set_mtu(netdev, mtu); >>> + } else if (mtu != dev->mtu) { >>> + VLOG_WARN("Unable to set MTU %d for vhost port; this port has >>>immutable MTU " >>> + "%d\n", mtu, dev->mtu); >>> + } >>> + return 0; >>> +} >>> + >>> static void >>> dpdk_common_init(void) >>> { >>> @@ -2136,8 +2223,9 @@ unlock_dpdk: >>> return err; >>> } >>> >>> -#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, >>>SEND, \ >>> - GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) >>> \ >>> +#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, >>>SET_CONFIG, \ >>> + MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES, >>> \ >>> + GET_STATUS, RXQ_RECV) >>> \ >>> { \ >>> NAME, \ >>> INIT, /* init */ \ >>> @@ -2149,7 +2237,7 @@ unlock_dpdk: >>> DESTRUCT, \ >>> netdev_dpdk_dealloc, \ >>> netdev_dpdk_get_config, \ >>> - netdev_dpdk_set_config, \ >>> + SET_CONFIG , \ >>> NULL, /* get_tunnel_config */ \ >>> NULL, /* build header */ \ >>> NULL, /* push header */ \ >>> @@ -2163,7 +2251,7 @@ unlock_dpdk: >>> netdev_dpdk_set_etheraddr, \ >>> netdev_dpdk_get_etheraddr, \ >>> netdev_dpdk_get_mtu, \ >>> - netdev_dpdk_set_mtu, \ >>> + SET_MTU, \ >>> netdev_dpdk_get_ifindex, \ >>> GET_CARRIER, \ >>> netdev_dpdk_get_carrier_resets, \ >>> @@ -2309,8 +2397,10 @@ static const struct netdev_class dpdk_class = >>> NULL, >>> netdev_dpdk_construct, >>> netdev_dpdk_destruct, >>> + netdev_dpdk_set_config, >>> netdev_dpdk_set_multiq, >>> netdev_dpdk_eth_send, >>> + netdev_dpdk_set_mtu, >>> netdev_dpdk_get_carrier, >>> netdev_dpdk_get_stats, >>> netdev_dpdk_get_features, >>> @@ -2323,8 +2413,10 @@ static const struct netdev_class >>>dpdk_ring_class = >>> NULL, >>> netdev_dpdk_ring_construct, >>> netdev_dpdk_destruct, >>> + netdev_dpdk_set_config, >>> netdev_dpdk_set_multiq, >>> netdev_dpdk_ring_send, >>> + netdev_dpdk_set_mtu, >>> netdev_dpdk_get_carrier, >>> netdev_dpdk_get_stats, >>> netdev_dpdk_get_features, >>> @@ -2337,8 +2429,10 @@ static const struct netdev_class OVS_UNUSED >>>dpdk_vhost_cuse_class = >>> dpdk_vhost_cuse_class_init, >>> netdev_dpdk_vhost_cuse_construct, >>> netdev_dpdk_vhost_destruct, >>> + netdev_dpdk_set_config, >>> netdev_dpdk_vhost_cuse_set_multiq, >>> netdev_dpdk_vhost_send, >>> + NULL, >>> netdev_dpdk_vhost_get_carrier, >>> netdev_dpdk_vhost_get_stats, >>> NULL, >>> @@ -2351,8 +2445,10 @@ static const struct netdev_class OVS_UNUSED >>>dpdk_vhost_user_class = >>> dpdk_vhost_user_class_init, >>> netdev_dpdk_vhost_user_construct, >>> netdev_dpdk_vhost_destruct, >>> + netdev_dpdk_vhost_set_config, >>> netdev_dpdk_vhost_set_multiq, >>> netdev_dpdk_vhost_send, >>> + netdev_dpdk_vhost_set_mtu, >>> netdev_dpdk_vhost_get_carrier, >>> netdev_dpdk_vhost_get_stats, >>> NULL, > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev