On 08/02/2016 03:23, "Kavanagh, Mark B" <mark.b.kavan...@intel.com> wrote:
>> >>Hi Mark, >> > >Hi Daniele, > >Thanks for the review! Responses inline below. > >Cheers, >Mark > >>This patch besides adding Jumbo Frame support also cleans up >>the mbuf initialization (by changing the macros, adding >>dpdk_buf_size, and rewriting __ovs_rte_pktmbuf_init), so thanks >>for this. I think it makes sense to split the patch in two: >>one that does the clenup, and one that allows configuring the >>MTU. > >Okay, sounds good - I'll spin another version, splitting into a two patch >set. Thanks! > >> >>I agree with Flavio's comments as well, more inline >> >>Thanks >> >>On 01/02/2016 02:18, "Mark Kavanagh" <mark.b.kavan...@intel.com> wrote: >> >>>Add support for Jumbo Frames to DPDK-enabled port types, >>>using single-segment-mbufs. >>> >>>Using this approach, the amount of memory allocated for each mbuf >>>to store frame data is increased to a value greater than 1518B >>>(typical Ethernet maximum frame length). The increased space >>>available in the mbuf means that an entire Jumbo Frame can be carried >>>in a single mbuf, as opposed to partitioning it across multiple mbuf >>>segments. >>> >>>The amount of space allocated to each mbuf to hold frame data is >>>defined dynamically by the user when adding a DPDK port to a bridge. >>>If an MTU value is not supplied, or the user-supplied value is invalid, >>>the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B). >>> >>>Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com> >>>--- >>> INSTALL.DPDK.md | 59 +++++++++- >>> NEWS | 2 + >>> lib/netdev-dpdk.c | 347 >>>+++++++++++++++++++++++++++++++++++++++++------------- >>> 3 files changed, 328 insertions(+), 80 deletions(-) >>> >>>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md >>>index 96b686c..64ccd15 100644 >>>--- a/INSTALL.DPDK.md >>>+++ b/INSTALL.DPDK.md >>>@@ -859,10 +859,61 @@ by adding the following string: >>> to <interface> sections of all network devices used by DPDK. Parameter >>>'N' >>> determines how many queues can be used by the guest. >>> >>>+ >>>+Jumbo Frames >>>+------------ >>>+ >>>+Support for Jumbo Frames may be enabled at run-time for DPDK-type >>>ports. >>>+ >>>+To avail of Jumbo Frame support, add the 'dpdk-mtu' option to the >>>ovs-vsctl >>>+'add-port' command-line, along with the required MTU for the port. >>>+e.g. >>>+ >>>+ ``` >>>+ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk >>>options:dpdk-mtu=9000 >>>+ ``` >>>+ >>>+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments >>>are >>>+increased, such that a full Jumbo Frame may be accommodated inside a >>>single >>>+mbuf segment. Once set, the MTU for a DPDK port is immutable. >> >>Why is it immutable? > >I guess my rationale here is that an MTU change can't be triggered via >OVS command-line, nor can it be triggered programmatically via DPDK >(apart from an explicit call to rte_eth_dev_set_mtu). >So, while technically it's possibly, from a user's point of view, there's >no way to configure it, outside of modifying the code directly. >If I've missed something here, please let me know. With ovs-vsctl set Interface options:dpdk-mtu=... you could change it at runtime. Have you considered that in this patch? I guess it's ok to have it configurable only when the port is added for now, but we need to make sure that nothing bad happens if a user tries to change it when it's running. Also, after talking with other people, I think that a better name for the parameter might be 'mtu_request', and we might want to use it also for kernel ports (but that doesn't need to be done now). > >> >>>+ >>>+Jumbo frame support has been validated against 13312B frames, using the >>>+DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may >>>+theoretically be supported. Supported port types excludes vHost-Cuse >>>ports, as >>>+this feature is pending deprecation. >>>+ >>>+ >>>+vHost Ports and Jumbo Frames >>>+---------------------------- >>>+Jumbo frame support is available for DPDK vHost-User ports only. Some >>>additional >>>+configuration is needed to take advantage of this feature: >>>+ >>>+ 1. `mergeable buffers` must be enabled for vHost ports, as >>>demonstrated in >>>+ the QEMU command line snippet below: >>>+ >>>+ ``` >>>+ '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \' >>>+ '-device >>>virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on' >>>+ ``` >>>+ >>>+ 2. Where virtio devices are bound to the Linux kernel driver in a >>>guest >>>+ environment (i.e. interfaces are not bound to an in-guest DPDK >>>driver), the >>>+ MTU of those logical network interfaces must also be increased. >>>This >>>+ avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' >>>refers >>>+ to the length of the IP packet only, and not that of the entire >>>frame. >>>+ >>>+ e.g. To calculate the exact MTU of a standard IPv4 frame, subtract >>>the L2 >>>+ header and CRC lengths (i.e. 18B) from the max supported frame >>>size. >>>+ So, to set the MTU for a 13312B Jumbo Frame: >>>+ >>>+ ``` >>>+ ifconfig eth1 mtu 13294 >>>+ ``` >>>+ >>>+ >>> Restrictions: >>> ------------- >>> >>>- - Work with 1500 MTU, needs few changes in DPDK lib to fix this >>>issue. >>> - Currently DPDK port does not make use any offload functionality. >>> - DPDK-vHost support works with 1G huge pages. >>> >>>@@ -903,6 +954,12 @@ Restrictions: >>> the next release of DPDK (which includes the above patch) is >>>available and >>> integrated into OVS. >>> >>>+ Jumbo Frames: >>>+ - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. The >>>source of >>>+ this issue is currently being investigated. >>>+ - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse >>>ports. >>>+ >>>+ >>> Bug Reporting: >>> -------------- >>> >>>diff --git a/NEWS b/NEWS >>>index 5c18867..cd563e0 100644 >>>--- a/NEWS >>>+++ b/NEWS >>>@@ -46,6 +46,8 @@ v2.5.0 - xx xxx xxxx >>> abstractions, such as virtual L2 and L3 overlays and security >>>groups. >>> - RHEL packaging: >>> * DPDK ports may now be created via network scripts (see >>>README.RHEL). >>>+ - netdev-dpdk: >>>+ * Add Jumbo Frame Support >> >>I don't think we should add this to 2.5 > >Out of curiosity, is this mainly due to potential instability? >In any event, I'll move it into the 'Post-v2.5.0' section. > >> >>> >>> >>> v2.4.0 - 20 Aug 2015 >>>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >>>index de7e488..76a5dcc 100644 >>>--- a/lib/netdev-dpdk.c >>>+++ b/lib/netdev-dpdk.c >>>@@ -62,20 +62,25 @@ static struct vlog_rate_limit rl = >>>VLOG_RATE_LIMIT_INIT(5, 20); >>> #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE >>> #define OVS_VPORT_DPDK "ovs_dpdk" >>> >>>+#define NETDEV_DPDK_JUMBO_FRAME_ENABLED 1 >> >>I don't see particular value in the above #define. > >I've removed this macro, and the section of code that used it in the >latest version of the patch. > >> >>>+#define NETDEV_DPDK_DEFAULT_RX_BUFSIZE 1024 >>>+ >>> /* >>> * need to reserve tons of extra space in the mbufs so we can align the >>> * DMA addresses to 4KB. >>> * The minimum mbuf size is limited to avoid scatter behaviour and drop >>>in >>> * performance for standard Ethernet MTU. >>> */ >>>-#define MTU_TO_MAX_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN) >>>-#define MBUF_SIZE_MTU(mtu) (MTU_TO_MAX_LEN(mtu) \ >>>- + sizeof(struct dp_packet) \ >>>- + RTE_PKTMBUF_HEADROOM) >>>-#define MBUF_SIZE_DRIVER (2048 \ >>>- + sizeof (struct rte_mbuf) \ >>>- + RTE_PKTMBUF_HEADROOM) >>>-#define MBUF_SIZE(mtu) MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER) >>>+#define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + >>>ETHER_CRC_LEN) >>>+#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)- ETHER_HDR_LEN - >>>ETHER_CRC_LEN) >>>+#define MBUF_SEGMENT_SIZE(mtu) ( MTU_TO_FRAME_LEN(mtu) \ >>>+ + sizeof(struct dp_packet) \ >>>+ + RTE_PKTMBUF_HEADROOM) >>>+ >>>+/* This value should be specified as a multiple of the DPDK NIC >>>driver's >>>+ * 'min_rx_bufsize' attribute (currently 1024B for 'igb_uio'). >>>+ */ >>>+#define NETDEV_DPDK_MAX_FRAME_LEN 13312 >>> >>> /* Max and min number of packets in the mempool. OVS tries to >>>allocate a >>> * mempool with MAX_NB_MBUF: if this fails (because the system doesn't >>>have >>>@@ -86,6 +91,8 @@ static struct vlog_rate_limit rl = >>>VLOG_RATE_LIMIT_INIT(5, 20); >>> #define MIN_NB_MBUF (4096 * 4) >>> #define MP_CACHE_SZ RTE_MEMPOOL_CACHE_MAX_SIZE >>> >>>+#define DPDK_VLAN_TAG_LEN 4 >>>+ >>> /* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */ >>> BUILD_ASSERT_DECL(MAX_NB_MBUF % >>>ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF) >>>== 0); >>> >>>@@ -114,7 +121,6 @@ static const struct rte_eth_conf port_conf = { >>> .header_split = 0, /* Header Split disabled */ >>> .hw_ip_checksum = 0, /* IP checksum offload disabled */ >>> .hw_vlan_filter = 0, /* VLAN filtering disabled */ >>>- .jumbo_frame = 0, /* Jumbo Frame Support disabled */ >>> .hw_strip_crc = 0, >>> }, >>> .rx_adv_conf = { >>>@@ -254,6 +260,41 @@ is_dpdk_class(const struct netdev_class *class) >>> return class->construct == netdev_dpdk_construct; >>> } >>> >>>+/* DPDK NIC drivers allocate RX buffers at a particular granularity >>>+ * (specified by rte_eth_dev_info.min_rx_bufsize - currently 1K for >>>igb_uio). >>>+ * If 'frame_len' is not a multiple of this value, insufficient buffers >>>are >>>+ * allocated to accomodate the packet in its entirety. Furthermore, the >>>igb_uio >>>+ * driver needs to ensure that there is also sufficient space in the Rx >>>buffer >>>+ * to accommodate two VLAN tags (for QinQ frames). If the RX buffer is >>>too >>>+ * small, then the driver enables scatter RX behaviour, which reduces >>>+ * performance. To prevent this, use a buffer size that is closest to >>>+ * 'frame_len', but which satisfies the aforementioned criteria. >>>+ */ >>>+static uint32_t >>>+dpdk_buf_size(struct netdev_dpdk *netdev, int frame_len) >>>+{ >>>+ struct rte_eth_dev_info info; >>>+ uint32_t buf_size; >>>+ /* XXX: This is a workaround for DPDK v2.2, and needs to be >>>refactored with a >>>+ * future DPDK release. */ >> >>Could you elaborate on that? > >Due to changes pending in the latest version of the code, this is no >longer relevant and will be removed. > >> >>>+ uint32_t len = frame_len + (2 * DPDK_VLAN_TAG_LEN); >>>+ >>>+ if(netdev->type == DPDK_DEV_ETH) { >>>+ rte_eth_dev_info_get(netdev->port_id, &info); >>>+ buf_size = (info.min_rx_bufsize == 0) ? >>>+ NETDEV_DPDK_DEFAULT_RX_BUFSIZE : >>>+ info.min_rx_bufsize; >>>+ } else { >>>+ buf_size = NETDEV_DPDK_DEFAULT_RX_BUFSIZE; >>>+ } >>>+ >>>+ if(len % buf_size) { >>>+ len = buf_size * ((len/buf_size) + 1); >>>+ } >> >>I think this looks better with the ROUND_UP macro. > >True - I'll update the code accordingly. > >> >>>+ >>>+ return len; >>>+} >>>+ >>> /* XXX: use dpdk malloc for entire OVS. in fact huge page should be >>>used >>> * for all other segments data, bss and text. */ >>> >>>@@ -280,26 +321,65 @@ free_dpdk_buf(struct dp_packet *p) >>> } >>> >>> static void >>>-__rte_pktmbuf_init(struct rte_mempool *mp, >>>- void *opaque_arg OVS_UNUSED, >>>- void *_m, >>>- unsigned i OVS_UNUSED) >>>+ovs_rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg) >>> { >>>- struct rte_mbuf *m = _m; >>>- uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet); >>>+ struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv; >>>+ struct rte_pktmbuf_pool_private default_mbp_priv; >>>+ uint16_t roomsz; >>> >>> RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet)); >>> >>>+ /* if no structure is provided, assume no mbuf private area */ >>>+ >>>+ user_mbp_priv = opaque_arg; >>>+ if (user_mbp_priv == NULL) { >>>+ default_mbp_priv.mbuf_priv_size = 0; >>>+ if (mp->elt_size > sizeof(struct dp_packet)) { >>>+ roomsz = mp->elt_size - sizeof(struct dp_packet); >>>+ } else { >>>+ roomsz = 0; >>>+ } >>>+ default_mbp_priv.mbuf_data_room_size = roomsz; >>>+ user_mbp_priv = &default_mbp_priv; >>>+ } >>>+ >>>+ RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet) + >>>+ user_mbp_priv->mbuf_data_room_size + >>>+ user_mbp_priv->mbuf_priv_size); >>>+ >>>+ mbp_priv = rte_mempool_get_priv(mp); >>>+ memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv)); >>>+} >>>+ >>>+/* Initialise some fields in the mbuf structure that are not modified >>>by >>>the >>>+ * user once created (origin pool, buffer start address, etc.*/ >>>+static void >>>+__ovs_rte_pktmbuf_init(struct rte_mempool *mp, >>>+ void *opaque_arg OVS_UNUSED, >>>+ void *_m, >>>+ unsigned i OVS_UNUSED) >>>+{ >>>+ struct rte_mbuf *m = _m; >>>+ uint32_t buf_size, buf_len, priv_size; >>>+ >>>+ priv_size = rte_pktmbuf_priv_size(mp); >>>+ buf_size = sizeof(struct dp_packet) + priv_size; >>>+ buf_len = rte_pktmbuf_data_room_size(mp); >>>+ >>>+ RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == >>>priv_size); >>>+ RTE_MBUF_ASSERT(mp->elt_size >= buf_size); >>>+ RTE_MBUF_ASSERT(buf_len <= UINT16_MAX); >>>+ >>> memset(m, 0, mp->elt_size); >>> >>>- /* start of buffer is just after mbuf structure */ >>>- m->buf_addr = (char *)m + sizeof(struct dp_packet); >>>- m->buf_physaddr = rte_mempool_virt2phy(mp, m) + >>>- sizeof(struct dp_packet); >>>+ /* start of buffer is after dp_packet structure and priv data */ >>>+ m->priv_size = priv_size; >>>+ m->buf_addr = (char *)m + buf_size; >>>+ m->buf_physaddr = rte_mempool_virt2phy(mp, m) + buf_size; >>> m->buf_len = (uint16_t)buf_len; >>> >>> /* keep some headroom between start of buffer and data */ >>>- m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len); >>>+ m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len); >>> >>> /* init some constant fields */ >>> m->pool = mp; >>>@@ -315,7 +395,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp, >>> { >>> struct rte_mbuf *m = _m; >>> >>>- __rte_pktmbuf_init(mp, opaque_arg, _m, i); >>>+ __ovs_rte_pktmbuf_init(mp, opaque_arg, m, i); >>> >>> dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len); >>> } >>>@@ -326,6 +406,7 @@ dpdk_mp_get(int socket_id, int mtu) >>>OVS_REQUIRES(dpdk_mutex) >>> struct dpdk_mp *dmp = NULL; >>> char mp_name[RTE_MEMPOOL_NAMESIZE]; >>> unsigned mp_size; >>>+ struct rte_pktmbuf_pool_private mbp_priv; >>> >>> LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) { >>> if (dmp->socket_id == socket_id && dmp->mtu == mtu) { >>>@@ -338,6 +419,8 @@ dpdk_mp_get(int socket_id, int mtu) >>>OVS_REQUIRES(dpdk_mutex) >>> dmp->socket_id = socket_id; >>> dmp->mtu = mtu; >>> dmp->refcount = 1; >>>+ mbp_priv.mbuf_data_room_size = MTU_TO_FRAME_LEN(mtu) + >>>RTE_PKTMBUF_HEADROOM; >>>+ mbp_priv.mbuf_priv_size = 0; >>> >>> mp_size = MAX_NB_MBUF; >>> do { >>>@@ -346,10 +429,10 @@ dpdk_mp_get(int socket_id, int mtu) >>>OVS_REQUIRES(dpdk_mutex) >>> return NULL; >>> } >>> >>>- dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu), >>>+ dmp->mp = rte_mempool_create(mp_name, mp_size, >>>MBUF_SEGMENT_SIZE(mtu), >>> MP_CACHE_SZ, >>> sizeof(struct >>>rte_pktmbuf_pool_private), >>>- rte_pktmbuf_pool_init, NULL, >>>+ ovs_rte_pktmbuf_pool_init, >>>&mbp_priv, >>> ovs_rte_pktmbuf_init, NULL, >>> socket_id, 0); >>> } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= >>>MIN_NB_MBUF); >> >>Ok, this is quite intricated. I believe the reason that OVS used its own >>__rte_pktmbuf_init() was that it wasn't possible to have custom metadata >>in mbufs before DPDK 2.1. >> >>If I'm not mistaken, with DPDK commit b507905ff407 there's a way to do >>that >>without copying any code from DPDK with the following incremental (from >>master) >> >>---8<--- >> >>@@ -278,42 +272,12 @@ free_dpdk_buf(struct dp_packet *p) >> } >> >> static void >>-__rte_pktmbuf_init(struct rte_mempool *mp, >>- void *opaque_arg OVS_UNUSED, >>- void *_m, >>- unsigned i OVS_UNUSED) >>-{ >>- struct rte_mbuf *m = _m; >>- uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet); >>- >>- RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet)); >>- >>- memset(m, 0, mp->elt_size); >>- >>- /* start of buffer is just after mbuf structure */ >>- m->buf_addr = (char *)m + sizeof(struct dp_packet); >>- m->buf_physaddr = rte_mempool_virt2phy(mp, m) + >>- sizeof(struct dp_packet); >>- m->buf_len = (uint16_t)buf_len; >>- >>- /* keep some headroom between start of buffer and data */ >>- m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len); >>- >>- /* init some constant fields */ >>- m->pool = mp; >>- m->nb_segs = 1; >>- m->port = 0xff; >>-} >>- >>-static void >>-ovs_rte_pktmbuf_init(struct rte_mempool *mp, >>- void *opaque_arg OVS_UNUSED, >>- void *_m, >>- unsigned i OVS_UNUSED) >>+ovs_rte_pktmbuf_init(struct rte_mempool *mp, void *opaque_arg, >>+ void *_m, unsigned i) >> { >> struct rte_mbuf *m = _m; >> >>- __rte_pktmbuf_init(mp, opaque_arg, _m, i); >>+ rte_pktmbuf_init(mp, opaque_arg, _m, i); >> >> dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len); >> } >>@@ -339,15 +303,21 @@ dpdk_mp_get(int socket_id, int mtu) >>OVS_REQUIRES(dpdk_mutex) >> >> mp_size = MAX_NB_MBUF; >> do { >>+ struct rte_pktmbuf_pool_private mbuf_sizes; >>+ >> if (snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_mp_%d_%d_%u", >> dmp->mtu, dmp->socket_id, mp_size) < 0) { >> return NULL; >> } >> >>+ mbuf_sizes.mbuf_priv_size = sizeof (struct dp_packet) >>+ - sizeof (struct rte_mbuf); >>+ mbuf_sizes.mbuf_data_room_size = MBUF_SIZE(mtu) >>+ - sizeof(struct dp_packet); >> dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu), >> MP_CACHE_SZ, >> sizeof(struct >>rte_pktmbuf_pool_private), >>- rte_pktmbuf_pool_init, NULL, >>+ rte_pktmbuf_pool_init, &mbuf_sizes, >> ovs_rte_pktmbuf_init, NULL, >> socket_id, 0); >> } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= >>MIN_NB_MBUF); >> >>---8<--- >> >>I think this will make the patch cleaner. Do you think this will work >>with >>the increased MTU as well? > >Thanks for this Daniele. When I implemented the code, I attempted to >change as little of the legacy code as possible, but seeing as how I'm >now doing a patchset, I can roll this change in. >Btw, I just tested with P2P 9k frames, and it works fine :) > >> >>>@@ -433,6 +516,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, >>>int >>>n_rxq, int n_txq) >>> { >>> int diag = 0; >>> int i; >>>+ struct rte_eth_conf conf = port_conf; >>> >>> /* A device may report more queues than it makes available (this >>>has >>> * been observed for Intel xl710, which reserves some of them for >>>@@ -444,7 +528,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, >>>int n_rxq, int n_txq) >>> VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, >>>n_txq); >>> } >>> >>>- diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, >>>&port_conf); >>>+ if(OVS_UNLIKELY(dev->mtu > ETHER_MTU)) { >>>+ conf.rxmode.jumbo_frame = NETDEV_DPDK_JUMBO_FRAME_ENABLED; >>>+ conf.rxmode.max_rx_pkt_len = MTU_TO_FRAME_LEN(dev->mtu); >>>+ } >>>+ >>>+ diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, >>>&conf); >>> if (diag) { >>> break; >>> } >>>@@ -586,6 +675,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned >>>int >>>port_no, >>> struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >>> int sid; >>> int err = 0; >>>+ uint32_t buf_size; >>> >>> ovs_mutex_init(&netdev->mutex); >>> ovs_mutex_lock(&netdev->mutex); >>>@@ -605,10 +695,16 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned >>>int port_no, >>> netdev->port_id = port_no; >>> netdev->type = type; >>> netdev->flags = 0; >>>+ >>>+ /* Initialize port's MTU and frame len to the default Ethernet >>>values. >>>+ * Larger, user-specified (jumbo) frame buffers are accommodated in >>>+ * netdev_dpdk_set_config. >>>+ */ >>>+ netdev->max_packet_len = ETHER_MAX_LEN; >>> netdev->mtu = ETHER_MTU; >>>- netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu); >>> >>>- netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu); >>>+ buf_size = dpdk_buf_size(netdev, ETHER_MAX_LEN); >>>+ netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, >>>FRAME_LEN_TO_MTU(buf_size)); >>> if (!netdev->dpdk_mp) { >>> err = ENOMEM; >>> goto unlock; >>>@@ -651,6 +747,27 @@ dpdk_dev_parse_name(const char dev_name[], const >>>char prefix[], >>> return 0; >>> } >>> >>>+static void >>>+dpdk_dev_parse_mtu(const struct smap *args, int *mtu) >>>+{ >>>+ const char *mtu_str = smap_get(args, "dpdk-mtu"); >>>+ char *end_ptr = NULL; >>>+ int local_mtu; >>>+ >>>+ if(mtu_str) { >>>+ local_mtu = strtoul(mtu_str, &end_ptr, 0); >>>+ } >>>+ if(!mtu_str || local_mtu < ETHER_MTU || >>>+ local_mtu > >>>FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) || >>>+ *end_ptr != '\0') { >>>+ local_mtu = ETHER_MTU; >>>+ VLOG_WARN("Invalid or missing dpdk-mtu parameter - defaulting >>>to >>>%d.\n", >>>+ local_mtu); >>>+ } >>>+ >>>+ *mtu = local_mtu; >>>+} >>>+ >>> static int >>> vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex) >>> { >>>@@ -777,11 +894,77 @@ netdev_dpdk_get_config(const struct netdev >>>*netdev_, struct smap *args) >>> smap_add_format(args, "configured_rx_queues", "%d", >>>netdev_->n_rxq); >>> smap_add_format(args, "requested_tx_queues", "%d", netdev_->n_txq); >>> smap_add_format(args, "configured_tx_queues", "%d", >>>dev->real_n_txq); >>>+ smap_add_format(args, "mtu", "%d", dev->mtu); >>> ovs_mutex_unlock(&dev->mutex); >>> >>> return 0; >>> } >>> >>>+/* Set the mtu of DPDK_DEV_ETH ports */ >>>+static int >>>+netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) >>>+{ >>>+ struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>>+ int old_mtu, err; >>>+ uint32_t buf_size; >>>+ int dpdk_mtu; >>>+ struct dpdk_mp *old_mp; >>>+ struct dpdk_mp *mp; >>>+ >>>+ ovs_mutex_lock(&dpdk_mutex); >>>+ ovs_mutex_lock(&dev->mutex); >>>+ if (dev->mtu == mtu) { >>>+ err = 0; >>>+ goto out; >>>+ } >>>+ >>>+ buf_size = dpdk_buf_size(dev, MTU_TO_FRAME_LEN(mtu)); >>>+ dpdk_mtu = FRAME_LEN_TO_MTU(buf_size); >>>+ >>>+ mp = dpdk_mp_get(dev->socket_id, dpdk_mtu); >>>+ if (!mp) { >>>+ err = ENOMEM; >>>+ goto out; >>>+ } >>>+ >>>+ rte_eth_dev_stop(dev->port_id); >>>+ >>>+ old_mtu = dev->mtu; >>>+ old_mp = dev->dpdk_mp; >>>+ dev->dpdk_mp = mp; >>>+ dev->mtu = mtu; >>>+ dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>>+ >>>+ err = dpdk_eth_dev_init(dev); >>>+ if (err) { >>>+ VLOG_WARN("Unable to set MTU '%d' for '%s'; reverting to last >>>known " >>>+ "good value '%d'\n", mtu, dev->up.name, old_mtu); >>>+ dpdk_mp_put(mp); >>>+ dev->mtu = old_mtu; >>>+ dev->dpdk_mp = old_mp; >>>+ dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>>+ dpdk_eth_dev_init(dev); >>>+ goto out; >>>+ } else { >>>+ dpdk_mp_put(old_mp); >>>+ netdev_change_seq_changed(netdev); >>>+ } >>>+out: >>>+ ovs_mutex_unlock(&dev->mutex); >>>+ ovs_mutex_unlock(&dpdk_mutex); >>>+ return err; >>>+} >>>+ >>>+static int >>>+netdev_dpdk_set_config(struct netdev *netdev_, const struct smap *args) >>>+{ >>>+ int mtu; >>>+ >>>+ dpdk_dev_parse_mtu(args, &mtu); >>>+ >>>+ return netdev_dpdk_set_mtu(netdev_, mtu); >>>+} >>>+ >>> static int >>> netdev_dpdk_get_numa_id(const struct netdev *netdev_) >>> { >>>@@ -1358,54 +1541,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, >>>int *mtup) >>> >>> return 0; >>> } >>>- >>>-static int >>>-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) >>>-{ >>>- struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>>- int old_mtu, err; >>>- struct dpdk_mp *old_mp; >>>- struct dpdk_mp *mp; >>>- >>>- ovs_mutex_lock(&dpdk_mutex); >>>- ovs_mutex_lock(&dev->mutex); >>>- if (dev->mtu == mtu) { >>>- err = 0; >>>- goto out; >>>- } >>>- >>>- mp = dpdk_mp_get(dev->socket_id, dev->mtu); >>>- if (!mp) { >>>- err = ENOMEM; >>>- goto out; >>>- } >>>- >>>- rte_eth_dev_stop(dev->port_id); >>>- >>>- old_mtu = dev->mtu; >>>- old_mp = dev->dpdk_mp; >>>- dev->dpdk_mp = mp; >>>- dev->mtu = mtu; >>>- dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu); >>>- >>>- err = dpdk_eth_dev_init(dev); >>>- if (err) { >>>- dpdk_mp_put(mp); >>>- dev->mtu = old_mtu; >>>- dev->dpdk_mp = old_mp; >>>- dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu); >>>- dpdk_eth_dev_init(dev); >>>- goto out; >>>- } >>>- >>>- dpdk_mp_put(old_mp); >>>- netdev_change_seq_changed(netdev); >>>-out: >>>- ovs_mutex_unlock(&dev->mutex); >>>- ovs_mutex_unlock(&dpdk_mutex); >>>- return err; >>>-} >>>- >>> static int >>> netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier); >>> >>>@@ -1682,7 +1817,7 @@ netdev_dpdk_get_status(const struct netdev >>>*netdev_, struct smap *args) >>> smap_add_format(args, "numa_id", "%d", >>>rte_eth_dev_socket_id(dev->port_id)); >>> smap_add_format(args, "driver_name", "%s", dev_info.driver_name); >>> smap_add_format(args, "min_rx_bufsize", "%u", >>>dev_info.min_rx_bufsize); >>>- smap_add_format(args, "max_rx_pktlen", "%u", >>>dev_info.max_rx_pktlen); >>>+ smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len); >>> smap_add_format(args, "max_rx_queues", "%u", >>>dev_info.max_rx_queues); >>> smap_add_format(args, "max_tx_queues", "%u", >>>dev_info.max_tx_queues); >>> smap_add_format(args, "max_mac_addrs", "%u", >>>dev_info.max_mac_addrs); >>>@@ -1904,6 +2039,51 @@ dpdk_vhost_user_class_init(void) >>> return 0; >>> } >>> >>>+/* Set the mtu of DPDK_DEV_VHOST ports */ >>>+static int >>>+netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu) >>>+{ >>>+ struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>>+ int err = 0; >>>+ struct dpdk_mp *old_mp; >>>+ struct dpdk_mp *mp; >>>+ >>>+ ovs_mutex_lock(&dpdk_mutex); >>>+ ovs_mutex_lock(&dev->mutex); >>>+ if (dev->mtu == mtu) { >>>+ err = 0; >>>+ goto out; >>>+ } >>>+ >>>+ mp = dpdk_mp_get(dev->socket_id, mtu); >>>+ if (!mp) { >>>+ err = ENOMEM; >>>+ goto out; >>>+ } >>>+ >>>+ old_mp = dev->dpdk_mp; >>>+ dev->dpdk_mp = mp; >>>+ dev->mtu = mtu; >>>+ dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>>+ >>>+ dpdk_mp_put(old_mp); >>>+ netdev_change_seq_changed(netdev); >>>+out: >>>+ ovs_mutex_unlock(&dev->mutex); >>>+ ovs_mutex_unlock(&dpdk_mutex); >>>+ return err; >>>+} >>>+ >>>+static int >>>+netdev_dpdk_vhost_set_config(struct netdev *netdev_, const struct smap >>>*args) >>>+{ >>>+ int mtu; >>>+ >>>+ dpdk_dev_parse_mtu(args, &mtu); >>>+ >>>+ return netdev_dpdk_vhost_set_mtu(netdev_, mtu); >>>+} >>>+ >>> static void >>> dpdk_common_init(void) >>> { >>>@@ -2040,8 +2220,9 @@ unlock_dpdk: >>> return err; >>> } >>> >>>-#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, >>>SEND, >>>\ >>>- GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) >>>\ >>>+#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, >>>\ >>>+ MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES, >>>\ >>>+ GET_STATUS, RXQ_RECV) >>>\ >>> { \ >>> NAME, \ >>> INIT, /* init */ \ >>>@@ -2053,7 +2234,7 @@ unlock_dpdk: >>> DESTRUCT, \ >>> netdev_dpdk_dealloc, \ >>> netdev_dpdk_get_config, \ >>>- NULL, /* netdev_dpdk_set_config */ \ >>>+ SET_CONFIG, \ >>> NULL, /* get_tunnel_config */ \ >>> NULL, /* build header */ \ >>> NULL, /* push header */ \ >>>@@ -2067,7 +2248,7 @@ unlock_dpdk: >>> netdev_dpdk_set_etheraddr, \ >>> netdev_dpdk_get_etheraddr, \ >>> netdev_dpdk_get_mtu, \ >>>- netdev_dpdk_set_mtu, \ >>>+ SET_MTU, \ >>> netdev_dpdk_get_ifindex, \ >>> GET_CARRIER, \ >>> netdev_dpdk_get_carrier_resets, \ >>>@@ -2213,8 +2394,10 @@ static const struct netdev_class dpdk_class = >>> NULL, >>> netdev_dpdk_construct, >>> netdev_dpdk_destruct, >>>+ netdev_dpdk_set_config, >>> netdev_dpdk_set_multiq, >>> netdev_dpdk_eth_send, >>>+ netdev_dpdk_set_mtu, >>> netdev_dpdk_get_carrier, >>> netdev_dpdk_get_stats, >>> netdev_dpdk_get_features, >>>@@ -2227,8 +2410,10 @@ static const struct netdev_class dpdk_ring_class >>>= >>> NULL, >>> netdev_dpdk_ring_construct, >>> netdev_dpdk_destruct, >>>+ netdev_dpdk_set_config, >>> netdev_dpdk_set_multiq, >>> netdev_dpdk_ring_send, >>>+ netdev_dpdk_set_mtu, >>> netdev_dpdk_get_carrier, >>> netdev_dpdk_get_stats, >>> netdev_dpdk_get_features, >>>@@ -2241,8 +2426,10 @@ static const struct netdev_class OVS_UNUSED >>>dpdk_vhost_cuse_class = >>> dpdk_vhost_cuse_class_init, >>> netdev_dpdk_vhost_cuse_construct, >>> netdev_dpdk_vhost_destruct, >>>+ NULL, >>> netdev_dpdk_vhost_set_multiq, >>> netdev_dpdk_vhost_send, >>>+ NULL, >>> netdev_dpdk_vhost_get_carrier, >>> netdev_dpdk_vhost_get_stats, >>> NULL, >>>@@ -2255,8 +2442,10 @@ static const struct netdev_class OVS_UNUSED >>>dpdk_vhost_user_class = >>> dpdk_vhost_user_class_init, >>> netdev_dpdk_vhost_user_construct, >>> netdev_dpdk_vhost_destruct, >>>+ netdev_dpdk_vhost_set_config, >>> netdev_dpdk_vhost_set_multiq, >>> netdev_dpdk_vhost_send, >>>+ netdev_dpdk_vhost_set_mtu, >>> netdev_dpdk_vhost_get_carrier, >>> netdev_dpdk_vhost_get_stats, >>> NULL, >>>-- >>>1.9.3 >>> >>>_______________________________________________ >>>dev mailing list >>>dev@openvswitch.org >>>http://openvswitch.org/mailman/listinfo/dev > _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev