>Hi Mark,
>
Hi Daniele,
Thanks for the review! Responses inline below.

Cheers,
Mark

>This patch, besides adding Jumbo Frame support, also cleans up
>the mbuf initialization (by changing the macros, adding
>dpdk_buf_size, and rewriting __ovs_rte_pktmbuf_init), so thanks
>for this. I think it makes sense to split the patch in two:
>one that does the cleanup, and one that allows configuring the
>MTU.

Okay, sounds good - I'll spin another version, split into a two-patch set.

>
>I agree with Flavio's comments as well, more inline
>
>Thanks
>
>On 01/02/2016 02:18, "Mark Kavanagh" <mark.b.kavan...@intel.com> wrote:
>
>>Add support for Jumbo Frames to DPDK-enabled port types,
>>using single-segment-mbufs.
>>
>>Using this approach, the amount of memory allocated for each mbuf
>>to store frame data is increased to a value greater than 1518B
>>(typical Ethernet maximum frame length). The increased space
>>available in the mbuf means that an entire Jumbo Frame can be carried
>>in a single mbuf, as opposed to partitioning it across multiple mbuf
>>segments.
>>
>>The amount of space allocated to each mbuf to hold frame data is
>>defined dynamically by the user when adding a DPDK port to a bridge.
>>If an MTU value is not supplied, or the user-supplied value is invalid,
>>the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B).
>>
>>Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com>
>>---
>> INSTALL.DPDK.md   |  59 +++++++++-
>> NEWS              |   2 +
>> lib/netdev-dpdk.c | 347 +++++++++++++++++++++++++++++++++++++++++-------------
>> 3 files changed, 328 insertions(+), 80 deletions(-)
>>
>>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>>index 96b686c..64ccd15 100644
>>--- a/INSTALL.DPDK.md
>>+++ b/INSTALL.DPDK.md
>>@@ -859,10 +859,61 @@ by adding the following string:
>> to <interface> sections of all network devices used by DPDK. Parameter 'N'
>> determines how many queues can be used by the guest.
>>
>>+
>>+Jumbo Frames
>>+------------
>>+
>>+Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
>>+
>>+To avail of Jumbo Frame support, add the 'dpdk-mtu' option to the ovs-vsctl
>>+'add-port' command-line, along with the required MTU for the port.
>>+e.g.
>>+
>>+     ```
>>+     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-mtu=9000
>>+     ```
>>+
>>+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
>>+increased, such that a full Jumbo Frame may be accommodated inside a single
>>+mbuf segment. Once set, the MTU for a DPDK port is immutable.
>
>Why is it immutable?

My rationale here is that an MTU change can't be triggered via the OVS
command line, nor can it be triggered programmatically via DPDK (apart from
an explicit call to rte_eth_dev_set_mtu). So, while it's technically
possible, from a user's point of view there's no way to change it short of
modifying the code directly. If I've missed something here, please let me
know.

>
>>+
>>+Jumbo frame support has been validated against 13312B frames, using the
>>+DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
>>+theoretically be supported. Supported port types excludes vHost-Cuse ports, as
>>+this feature is pending deprecation.
>>+
>>+
>>+vHost Ports and Jumbo Frames
>>+----------------------------
>>+Jumbo frame support is available for DPDK vHost-User ports only. Some additional
>>+configuration is needed to take advantage of this feature:
>>+
>>+  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
>>+     the QEMU command line snippet below:
>>+
>>+     ```
>>+     '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
>>+     '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
>>+     ```
>>+
>>+  2. Where virtio devices are bound to the Linux kernel driver in a guest
>>+     environment (i.e.
>>+     interfaces are not bound to an in-guest DPDK driver), the
>>+     MTU of those logical network interfaces must also be increased. This
>>+     avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
>>+     to the length of the IP packet only, and not that of the entire frame.
>>+
>>+     e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
>>+     header and CRC lengths (i.e. 18B) from the max supported frame size.
>>+     So, to set the MTU for a 13312B Jumbo Frame:
>>+
>>+     ```
>>+     ifconfig eth1 mtu 13294
>>+     ```
>>+
>>+
>> Restrictions:
>> -------------
>>
>>-  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
>>   - Currently DPDK port does not make use any offload functionality.
>>   - DPDK-vHost support works with 1G huge pages.
>>
>>@@ -903,6 +954,12 @@ Restrictions:
>>     the next release of DPDK (which includes the above patch) is available and
>>     integrated into OVS.
>>
>>+  Jumbo Frames:
>>+  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. The source of
>>+    this issue is currently being investigated.
>>+  - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports.
>>+
>>+
>> Bug Reporting:
>> --------------
>>
>>diff --git a/NEWS b/NEWS
>>index 5c18867..cd563e0 100644
>>--- a/NEWS
>>+++ b/NEWS
>>@@ -46,6 +46,8 @@ v2.5.0 - xx xxx xxxx
>>      abstractions, such as virtual L2 and L3 overlays and security groups.
>>    - RHEL packaging:
>>      * DPDK ports may now be created via network scripts (see README.RHEL).
>>+   - netdev-dpdk:
>>+     * Add Jumbo Frame Support
>
>I don't think we should add this to 2.5

Out of curiosity, is this mainly due to potential instability? In any event,
I'll move it into the 'Post-v2.5.0' section.
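
For reference, the MTU/frame-length arithmetic from the INSTALL.DPDK.md text above (13312B frame, minus 18B of L2 header and CRC, gives a 13294B MTU) can be sketched standalone as below. This is only an illustration: the EX_/ex_ names are made up for this example and assume a 14B Ethernet header plus a 4B CRC, which matches the 18B figure quoted above. The patch itself expresses the same conversion with the MTU_TO_FRAME_LEN and FRAME_LEN_TO_MTU macros.

```c
#include <stdint.h>

/* Assumed sizes for illustration only: 14B Ethernet header + 4B CRC,
 * i.e. the 18B of L2 overhead mentioned in the documentation text. */
#define EX_ETHER_HDR_LEN 14
#define EX_ETHER_CRC_LEN 4

/* Full frame length for a given MTU (hypothetical helper). */
static inline uint32_t
ex_mtu_to_frame_len(uint32_t mtu)
{
    return mtu + EX_ETHER_HDR_LEN + EX_ETHER_CRC_LEN;
}

/* MTU for a given full frame length (hypothetical helper).
 * The caller is expected to pass a frame_len of at least 18. */
static inline uint32_t
ex_frame_len_to_mtu(uint32_t frame_len)
{
    return frame_len - EX_ETHER_HDR_LEN - EX_ETHER_CRC_LEN;
}
```

With these assumptions, a 13312B frame maps to the 13294B MTU used in the ifconfig example, and the standard 1500B MTU maps to a 1518B frame.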
>
>>
>>
>> v2.4.0 - 20 Aug 2015
>>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>index de7e488..76a5dcc 100644
>>--- a/lib/netdev-dpdk.c
>>+++ b/lib/netdev-dpdk.c
>>@@ -62,20 +62,25 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>> #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>> #define OVS_VPORT_DPDK "ovs_dpdk"
>>
>>+#define NETDEV_DPDK_JUMBO_FRAME_ENABLED 1
>
>I don't see particular value in the above #define.

I've removed this macro, and the code that used it, in the latest version of
the patch.

>
>>+#define NETDEV_DPDK_DEFAULT_RX_BUFSIZE 1024
>>+
>> /*
>>  * need to reserve tons of extra space in the mbufs so we can align the
>>  * DMA addresses to 4KB.
>>  * The minimum mbuf size is limited to avoid scatter behaviour and drop in
>>  * performance for standard Ethernet MTU.
>>  */
>>-#define MTU_TO_MAX_LEN(mtu)  ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
>>-#define MBUF_SIZE_MTU(mtu)   (MTU_TO_MAX_LEN(mtu)        \
>>-                              + sizeof(struct dp_packet) \
>>-                              + RTE_PKTMBUF_HEADROOM)
>>-#define MBUF_SIZE_DRIVER     (2048                       \
>>-                              + sizeof (struct rte_mbuf) \
>>-                              + RTE_PKTMBUF_HEADROOM)
>>-#define MBUF_SIZE(mtu)       MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER)
>>+#define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
>>+#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)- ETHER_HDR_LEN - ETHER_CRC_LEN)
>>+#define MBUF_SEGMENT_SIZE(mtu)      ( MTU_TO_FRAME_LEN(mtu)    \
>>+                                    + sizeof(struct dp_packet) \
>>+                                    + RTE_PKTMBUF_HEADROOM)
>>+
>>+/* This value should be specified as a multiple of the DPDK NIC driver's
>>+ * 'min_rx_bufsize' attribute (currently 1024B for 'igb_uio').
>>+ */
>>+#define NETDEV_DPDK_MAX_FRAME_LEN    13312
>>
>> /* Max and min number of packets in the mempool.
>>  * OVS tries to allocate a
>>  * mempool with MAX_NB_MBUF: if this fails (because the system doesn't have
>>@@ -86,6 +91,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>> #define MIN_NB_MBUF          (4096 * 4)
>> #define MP_CACHE_SZ          RTE_MEMPOOL_CACHE_MAX_SIZE
>>
>>+#define DPDK_VLAN_TAG_LEN    4
>>+
>> /* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */
>> BUILD_ASSERT_DECL(MAX_NB_MBUF % ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF) == 0);
>>
>>@@ -114,7 +121,6 @@ static const struct rte_eth_conf port_conf = {
>>         .header_split   = 0, /* Header Split disabled */
>>         .hw_ip_checksum = 0, /* IP checksum offload disabled */
>>         .hw_vlan_filter = 0, /* VLAN filtering disabled */
>>-        .jumbo_frame    = 0, /* Jumbo Frame Support disabled */
>>         .hw_strip_crc   = 0,
>>     },
>>     .rx_adv_conf = {
>>@@ -254,6 +260,41 @@ is_dpdk_class(const struct netdev_class *class)
>>     return class->construct == netdev_dpdk_construct;
>> }
>>
>>+/* DPDK NIC drivers allocate RX buffers at a particular granularity
>>+ * (specified by rte_eth_dev_info.min_rx_bufsize - currently 1K for igb_uio).
>>+ * If 'frame_len' is not a multiple of this value, insufficient buffers are
>>+ * allocated to accommodate the packet in its entirety. Furthermore, the igb_uio
>>+ * driver needs to ensure that there is also sufficient space in the Rx buffer
>>+ * to accommodate two VLAN tags (for QinQ frames). If the RX buffer is too
>>+ * small, then the driver enables scatter RX behaviour, which reduces
>>+ * performance. To prevent this, use a buffer size that is closest to
>>+ * 'frame_len', but which satisfies the aforementioned criteria.
>>+ */
>>+static uint32_t
>>+dpdk_buf_size(struct netdev_dpdk *netdev, int frame_len)
>>+{
>>+    struct rte_eth_dev_info info;
>>+    uint32_t buf_size;
>>+    /* XXX: This is a workaround for DPDK v2.2, and needs to be refactored with a
>>+     * future DPDK release. */
>
>Could you elaborate on that?

Due to changes pending in the latest version of the code, this is no longer
relevant and will be removed.

>
>>+    uint32_t len = frame_len + (2 * DPDK_VLAN_TAG_LEN);
>>+
>>+    if(netdev->type == DPDK_DEV_ETH) {
>>+        rte_eth_dev_info_get(netdev->port_id, &info);
>>+        buf_size = (info.min_rx_bufsize == 0) ?
>>+                   NETDEV_DPDK_DEFAULT_RX_BUFSIZE :
>>+                   info.min_rx_bufsize;
>>+    } else {
>>+        buf_size = NETDEV_DPDK_DEFAULT_RX_BUFSIZE;
>>+    }
>>+
>>+    if(len % buf_size) {
>>+        len = buf_size * ((len/buf_size) + 1);
>>+    }
>
>I think this looks better with the ROUND_UP macro.

True - I'll update the code accordingly.

>
>>+
>>+    return len;
>>+}
>>+
>> /* XXX: use dpdk malloc for entire OVS. in fact huge page should be used
>>  * for all other segments data, bss and text. */
>>
>>@@ -280,26 +321,65 @@ free_dpdk_buf(struct dp_packet *p)
>> }
>>
>> static void
>>-__rte_pktmbuf_init(struct rte_mempool *mp,
>>-                   void *opaque_arg OVS_UNUSED,
>>-                   void *_m,
>>-                   unsigned i OVS_UNUSED)
>>+ovs_rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
>> {
>>-    struct rte_mbuf *m = _m;
>>-    uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet);
>>+    struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv;
>>+    struct rte_pktmbuf_pool_private default_mbp_priv;
>>+    uint16_t roomsz;
>>
>>     RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet));
>>
>>+    /* if no structure is provided, assume no mbuf private area */
>>+
>>+    user_mbp_priv = opaque_arg;
>>+    if (user_mbp_priv == NULL) {
>>+        default_mbp_priv.mbuf_priv_size = 0;
>>+        if (mp->elt_size > sizeof(struct dp_packet)) {
>>+            roomsz = mp->elt_size - sizeof(struct dp_packet);
>>+        } else {
>>+            roomsz = 0;
>>+        }
>>+        default_mbp_priv.mbuf_data_room_size = roomsz;
>>+        user_mbp_priv = &default_mbp_priv;
>>+    }
>>+
>>+    RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet) +
>>+        user_mbp_priv->mbuf_data_room_size +
>>+        user_mbp_priv->mbuf_priv_size);
>>+
>>+    mbp_priv = rte_mempool_get_priv(mp);
>>+    memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv));
>>+}
>>+
>>+/* Initialise some fields in the mbuf structure that are not modified by the
>>+ * user once created (origin pool, buffer start address, etc.*/
>>+static void
>>+__ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>>+                       void *opaque_arg OVS_UNUSED,
>>+                       void *_m,
>>+                       unsigned i OVS_UNUSED)
>>+{
>>+    struct rte_mbuf *m = _m;
>>+    uint32_t buf_size, buf_len, priv_size;
>>+
>>+    priv_size = rte_pktmbuf_priv_size(mp);
>>+    buf_size = sizeof(struct dp_packet) + priv_size;
>>+    buf_len = rte_pktmbuf_data_room_size(mp);
>>+
>>+    RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == priv_size);
>>+    RTE_MBUF_ASSERT(mp->elt_size >= buf_size);
>>+    RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
>>+
>>     memset(m, 0, mp->elt_size);
>>
>>-    /* start of buffer is just after mbuf structure */
>>-    m->buf_addr = (char *)m + sizeof(struct dp_packet);
>>-    m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
>>-                      sizeof(struct dp_packet);
>>+    /* start of buffer is after dp_packet structure and priv data */
>>+    m->priv_size = priv_size;
>>+    m->buf_addr = (char *)m + buf_size;
>>+    m->buf_physaddr = rte_mempool_virt2phy(mp, m) + buf_size;
>>     m->buf_len = (uint16_t)buf_len;
>>
>>     /* keep some headroom between start of buffer and data */
>>-    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
>>+    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len);
>>
>>     /* init some constant fields */
>>     m->pool = mp;
>>@@ -315,7 +395,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>> {
>>     struct rte_mbuf *m = _m;
>>
>>-    __rte_pktmbuf_init(mp, opaque_arg, _m, i);
>>+    __ovs_rte_pktmbuf_init(mp, opaque_arg, m, i);
>>
>>     dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len);
>> }
>>@@ -326,6 +406,7 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>>     struct dpdk_mp *dmp = NULL;
>>     char mp_name[RTE_MEMPOOL_NAMESIZE];
>>     unsigned mp_size;
>>+    struct rte_pktmbuf_pool_private mbp_priv;
>>
>>     LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
>>         if (dmp->socket_id == socket_id &&
>>             dmp->mtu == mtu) {
>>@@ -338,6 +419,8 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>>     dmp->socket_id = socket_id;
>>     dmp->mtu = mtu;
>>     dmp->refcount = 1;
>>+    mbp_priv.mbuf_data_room_size = MTU_TO_FRAME_LEN(mtu) + RTE_PKTMBUF_HEADROOM;
>>+    mbp_priv.mbuf_priv_size = 0;
>>
>>     mp_size = MAX_NB_MBUF;
>>     do {
>>@@ -346,10 +429,10 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>>             return NULL;
>>         }
>>
>>-        dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
>>+        dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SEGMENT_SIZE(mtu),
>>                                      MP_CACHE_SZ,
>>                                      sizeof(struct rte_pktmbuf_pool_private),
>>-                                     rte_pktmbuf_pool_init, NULL,
>>+                                     ovs_rte_pktmbuf_pool_init, &mbp_priv,
>>                                      ovs_rte_pktmbuf_init, NULL,
>>                                      socket_id, 0);
>>     } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= MIN_NB_MBUF);
>
>Ok, this is quite intricate. I believe the reason that OVS used its own
>__rte_pktmbuf_init() was that it wasn't possible to have custom metadata
>in mbufs before DPDK 2.1.
>
>If I'm not mistaken, with DPDK commit b507905ff407 there's a way to do that
>without copying any code from DPDK with the following incremental (from
>master)
>
>---8<---
>
>@@ -278,42 +272,12 @@ free_dpdk_buf(struct dp_packet *p)
> }
>
> static void
>-__rte_pktmbuf_init(struct rte_mempool *mp,
>-                   void *opaque_arg OVS_UNUSED,
>-                   void *_m,
>-                   unsigned i OVS_UNUSED)
>-{
>-    struct rte_mbuf *m = _m;
>-    uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet);
>-
>-    RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet));
>-
>-    memset(m, 0, mp->elt_size);
>-
>-    /* start of buffer is just after mbuf structure */
>-    m->buf_addr = (char *)m + sizeof(struct dp_packet);
>-    m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
>-                      sizeof(struct dp_packet);
>-    m->buf_len = (uint16_t)buf_len;
>-
>-    /* keep some headroom between start of buffer and data */
>-    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
>-
>-    /* init some constant fields */
>-    m->pool = mp;
>-    m->nb_segs = 1;
>-    m->port = 0xff;
>-}
>-
>-static void
>-ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>-                     void *opaque_arg OVS_UNUSED,
>-                     void *_m,
>-                     unsigned i OVS_UNUSED)
>+ovs_rte_pktmbuf_init(struct rte_mempool *mp, void *opaque_arg,
>+                     void *_m, unsigned i)
> {
>     struct rte_mbuf *m = _m;
>
>-    __rte_pktmbuf_init(mp, opaque_arg, _m, i);
>+    rte_pktmbuf_init(mp, opaque_arg, _m, i);
>
>     dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len);
> }
>@@ -339,15 +303,21 @@ dpdk_mp_get(int socket_id, int mtu) OVS_REQUIRES(dpdk_mutex)
>
>     mp_size = MAX_NB_MBUF;
>     do {
>+        struct rte_pktmbuf_pool_private mbuf_sizes;
>+
>         if (snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_mp_%d_%d_%u",
>                      dmp->mtu, dmp->socket_id, mp_size) < 0) {
>             return NULL;
>         }
>
>+        mbuf_sizes.mbuf_priv_size = sizeof (struct dp_packet)
>+                                    - sizeof (struct rte_mbuf);
>+        mbuf_sizes.mbuf_data_room_size = MBUF_SIZE(mtu)
>+                                         - sizeof(struct dp_packet);
>         dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
>                                      MP_CACHE_SZ,
>                                      sizeof(struct
>                                      rte_pktmbuf_pool_private),
>-                                     rte_pktmbuf_pool_init, NULL,
>+                                     rte_pktmbuf_pool_init, &mbuf_sizes,
>                                      ovs_rte_pktmbuf_init, NULL,
>                                      socket_id, 0);
>     } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= MIN_NB_MBUF);
>
>---8<---
>
>I think this will make the patch cleaner. Do you think this will work with
>the increased MTU as well?

Thanks for this, Daniele. When I implemented the code, I attempted to change
as little of the legacy code as possible, but seeing as I'm now doing a
patchset, I can roll this change in.
Btw, I just tested with P2P 9k frames, and it works fine :)

>
>>@@ -433,6 +516,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>> {
>>     int diag = 0;
>>     int i;
>>+    struct rte_eth_conf conf = port_conf;
>>
>>     /* A device may report more queues than it makes available (this has
>>      * been observed for Intel xl710, which reserves some of them for
>>@@ -444,7 +528,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>>             VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, n_txq);
>>         }
>>
>>-        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &port_conf);
>>+        if(OVS_UNLIKELY(dev->mtu > ETHER_MTU)) {
>>+            conf.rxmode.jumbo_frame = NETDEV_DPDK_JUMBO_FRAME_ENABLED;
>>+            conf.rxmode.max_rx_pkt_len = MTU_TO_FRAME_LEN(dev->mtu);
>>+        }
>>+
>>+        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
>>         if (diag) {
>>             break;
>>         }
>>@@ -586,6 +675,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>>     struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>>     int sid;
>>     int err = 0;
>>+    uint32_t buf_size;
>>
>>     ovs_mutex_init(&netdev->mutex);
>>     ovs_mutex_lock(&netdev->mutex);
>>@@ -605,10 +695,16 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no,
>>     netdev->port_id = port_no;
>>     netdev->type = type;
>>     netdev->flags = 0;
>>+
>>+    /* Initialize port's MTU and frame len to the default Ethernet values.
>>+     * Larger, user-specified (jumbo) frame buffers are accommodated in
>>+     * netdev_dpdk_set_config.
>>+     */
>>+    netdev->max_packet_len = ETHER_MAX_LEN;
>>     netdev->mtu = ETHER_MTU;
>>-    netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu);
>>
>>-    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
>>+    buf_size = dpdk_buf_size(netdev, ETHER_MAX_LEN);
>>+    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, FRAME_LEN_TO_MTU(buf_size));
>>     if (!netdev->dpdk_mp) {
>>         err = ENOMEM;
>>         goto unlock;
>>@@ -651,6 +747,27 @@ dpdk_dev_parse_name(const char dev_name[], const char prefix[],
>>     return 0;
>> }
>>
>>+static void
>>+dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
>>+{
>>+    const char *mtu_str = smap_get(args, "dpdk-mtu");
>>+    char *end_ptr = NULL;
>>+    int local_mtu;
>>+
>>+    if(mtu_str) {
>>+        local_mtu = strtoul(mtu_str, &end_ptr, 0);
>>+    }
>>+    if(!mtu_str || local_mtu < ETHER_MTU ||
>>+       local_mtu > FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
>>+       *end_ptr != '\0') {
>>+        local_mtu = ETHER_MTU;
>>+        VLOG_WARN("Invalid or missing dpdk-mtu parameter - defaulting to %d.\n",
>>+                  local_mtu);
>>+    }
>>+
>>+    *mtu = local_mtu;
>>+}
>>+
>> static int
>> vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
>> {
>>@@ -777,11 +894,77 @@ netdev_dpdk_get_config(const struct netdev *netdev_, struct smap *args)
>>     smap_add_format(args, "configured_rx_queues", "%d", netdev_->n_rxq);
>>     smap_add_format(args, "requested_tx_queues", "%d", netdev_->n_txq);
>>     smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
>>+    smap_add_format(args, "mtu", "%d", dev->mtu);
>>     ovs_mutex_unlock(&dev->mutex);
>>
>>     return 0;
>> }
>>
>>+/* Set the mtu of DPDK_DEV_ETH ports */
>>+static int
>>+netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>>+{
>>+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>+    int old_mtu, err;
>>+    uint32_t buf_size;
>>+    int dpdk_mtu;
>>+    struct dpdk_mp *old_mp;
>>+    struct dpdk_mp *mp;
>>+
>>+    ovs_mutex_lock(&dpdk_mutex);
>>+    ovs_mutex_lock(&dev->mutex);
>>+    if (dev->mtu == mtu) {
>>+        err = 0;
>>+        goto out;
>>+    }
>>+
>>+    buf_size = dpdk_buf_size(dev, MTU_TO_FRAME_LEN(mtu));
>>+    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
>>+
>>+    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
>>+    if (!mp) {
>>+        err = ENOMEM;
>>+        goto out;
>>+    }
>>+
>>+    rte_eth_dev_stop(dev->port_id);
>>+
>>+    old_mtu = dev->mtu;
>>+    old_mp = dev->dpdk_mp;
>>+    dev->dpdk_mp = mp;
>>+    dev->mtu = mtu;
>>+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>+
>>+    err = dpdk_eth_dev_init(dev);
>>+    if (err) {
>>+        VLOG_WARN("Unable to set MTU '%d' for '%s'; reverting to last known "
>>+                  "good value '%d'\n", mtu, dev->up.name, old_mtu);
>>+        dpdk_mp_put(mp);
>>+        dev->mtu = old_mtu;
>>+        dev->dpdk_mp = old_mp;
>>+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>+        dpdk_eth_dev_init(dev);
>>+        goto out;
>>+    } else {
>>+        dpdk_mp_put(old_mp);
>>+        netdev_change_seq_changed(netdev);
>>+    }
>>+out:
>>+    ovs_mutex_unlock(&dev->mutex);
>>+    ovs_mutex_unlock(&dpdk_mutex);
>>+    return err;
>>+}
>>+
>>+static int
>>+netdev_dpdk_set_config(struct netdev *netdev_, const struct smap *args)
>>+{
>>+    int mtu;
>>+
>>+    dpdk_dev_parse_mtu(args, &mtu);
>>+
>>+    return netdev_dpdk_set_mtu(netdev_, mtu);
>>+}
>>+
>> static int
>> netdev_dpdk_get_numa_id(const struct netdev *netdev_)
>> {
>>@@ -1358,54 +1541,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, int *mtup)
>>
>>     return 0;
>> }
>>-
>>-static int
>>-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>>-{
>>-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>-    int old_mtu, err;
>>-    struct dpdk_mp *old_mp;
>>-    struct dpdk_mp *mp;
>>-
>>-    ovs_mutex_lock(&dpdk_mutex);
>>-    ovs_mutex_lock(&dev->mutex);
>>-    if (dev->mtu == mtu) {
>>-        err = 0;
>>-        goto out;
>>-    }
>>-
>>-    mp = dpdk_mp_get(dev->socket_id, dev->mtu);
>>-    if (!mp) {
>>-        err = ENOMEM;
>>-        goto out;
>>-    }
>>-
>>-    rte_eth_dev_stop(dev->port_id);
>>-
>>-    old_mtu = dev->mtu;
>>-    old_mp = dev->dpdk_mp;
>>-    dev->dpdk_mp = mp;
>>-    dev->mtu = mtu;
>>-    dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
>>-
>>-    err = dpdk_eth_dev_init(dev);
>>-    if (err) {
>>-        dpdk_mp_put(mp);
>>-        dev->mtu = old_mtu;
>>-        dev->dpdk_mp = old_mp;
>>-        dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
>>-        dpdk_eth_dev_init(dev);
>>-        goto out;
>>-    }
>>-
>>-    dpdk_mp_put(old_mp);
>>-    netdev_change_seq_changed(netdev);
>>-out:
>>-    ovs_mutex_unlock(&dev->mutex);
>>-    ovs_mutex_unlock(&dpdk_mutex);
>>-    return err;
>>-}
>>-
>> static int
>> netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>>
>>@@ -1682,7 +1817,7 @@ netdev_dpdk_get_status(const struct netdev *netdev_, struct smap *args)
>>     smap_add_format(args, "numa_id", "%d", rte_eth_dev_socket_id(dev->port_id));
>>     smap_add_format(args, "driver_name", "%s", dev_info.driver_name);
>>     smap_add_format(args, "min_rx_bufsize", "%u", dev_info.min_rx_bufsize);
>>-    smap_add_format(args, "max_rx_pktlen", "%u", dev_info.max_rx_pktlen);
>>+    smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len);
>>     smap_add_format(args, "max_rx_queues", "%u", dev_info.max_rx_queues);
>>     smap_add_format(args, "max_tx_queues", "%u", dev_info.max_tx_queues);
>>     smap_add_format(args, "max_mac_addrs", "%u", dev_info.max_mac_addrs);
>>@@ -1904,6 +2039,51 @@ dpdk_vhost_user_class_init(void)
>>     return 0;
>> }
>>
>>+/* Set the mtu of DPDK_DEV_VHOST ports */
>>+static int
>>+netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
>>+{
>>+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>+    int err = 0;
>>+    struct dpdk_mp *old_mp;
>>+    struct dpdk_mp *mp;
>>+
>>+    ovs_mutex_lock(&dpdk_mutex);
>>+    ovs_mutex_lock(&dev->mutex);
>>+    if (dev->mtu == mtu) {
>>+        err = 0;
>>+        goto out;
>>+    }
>>+
>>+    mp = dpdk_mp_get(dev->socket_id, mtu);
>>+    if (!mp) {
>>+        err = ENOMEM;
>>+        goto out;
>>+    }
>>+
>>+    old_mp = dev->dpdk_mp;
>>+    dev->dpdk_mp = mp;
>>+    dev->mtu = mtu;
>>+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>+
>>+    dpdk_mp_put(old_mp);
>>+    netdev_change_seq_changed(netdev);
>>+out:
>>+    ovs_mutex_unlock(&dev->mutex);
>>+    ovs_mutex_unlock(&dpdk_mutex);
>>+    return err;
>>+}
>>+
>>+static int
>>+netdev_dpdk_vhost_set_config(struct netdev *netdev_, const struct smap *args)
>>+{
>>+    int mtu;
>>+
>>+    dpdk_dev_parse_mtu(args, &mtu);
>>+
>>+    return netdev_dpdk_vhost_set_mtu(netdev_, mtu);
>>+}
>>+
>> static void
>> dpdk_common_init(void)
>> {
>>@@ -2040,8 +2220,9 @@ unlock_dpdk:
>>     return err;
>> }
>>
>>-#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND, \
>>-    GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)          \
>>+#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
>>+    MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES,       \
>>+    GET_STATUS, RXQ_RECV)                                              \
>> {                                                             \
>>     NAME,                                                     \
>>     INIT,                       /* init */                    \
>>@@ -2053,7 +2234,7 @@ unlock_dpdk:
>>     DESTRUCT,                                                 \
>>     netdev_dpdk_dealloc,                                      \
>>     netdev_dpdk_get_config,                                   \
>>-    NULL,                       /* netdev_dpdk_set_config */  \
>>+    SET_CONFIG,                                               \
>>     NULL,                       /* get_tunnel_config */       \
>>     NULL,                       /* build header */            \
>>     NULL,                       /* push header */             \
>>@@ -2067,7 +2248,7 @@ unlock_dpdk:
>>     netdev_dpdk_set_etheraddr,                                \
>>     netdev_dpdk_get_etheraddr,                                \
>>     netdev_dpdk_get_mtu,                                      \
>>-    netdev_dpdk_set_mtu,                                      \
>>+    SET_MTU,                                                  \
>>     netdev_dpdk_get_ifindex,                                  \
>>     GET_CARRIER,                                              \
>>     netdev_dpdk_get_carrier_resets,                           \
>>@@ -2213,8 +2394,10 @@ static const struct netdev_class dpdk_class =
>>     NULL,
>>     netdev_dpdk_construct,
>>     netdev_dpdk_destruct,
>>+    netdev_dpdk_set_config,
>>     netdev_dpdk_set_multiq,
>>     netdev_dpdk_eth_send,
>>+    netdev_dpdk_set_mtu,
>>     netdev_dpdk_get_carrier,
>>     netdev_dpdk_get_stats,
>>     netdev_dpdk_get_features,
>>@@ -2227,8 +2410,10 @@ static const struct netdev_class dpdk_ring_class =
>>     NULL,
>>     netdev_dpdk_ring_construct,
>>     netdev_dpdk_destruct,
>>+    netdev_dpdk_set_config,
>>     netdev_dpdk_set_multiq,
>>     netdev_dpdk_ring_send,
>>+    netdev_dpdk_set_mtu,
>>     netdev_dpdk_get_carrier,
>>     netdev_dpdk_get_stats,
>>     netdev_dpdk_get_features,
>>@@ -2241,8
+2426,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_cuse_class =
>>     dpdk_vhost_cuse_class_init,
>>     netdev_dpdk_vhost_cuse_construct,
>>     netdev_dpdk_vhost_destruct,
>>+    NULL,
>>     netdev_dpdk_vhost_set_multiq,
>>     netdev_dpdk_vhost_send,
>>+    NULL,
>>     netdev_dpdk_vhost_get_carrier,
>>     netdev_dpdk_vhost_get_stats,
>>     NULL,
>>@@ -2255,8 +2442,10 @@ static const struct netdev_class OVS_UNUSED dpdk_vhost_user_class =
>>     dpdk_vhost_user_class_init,
>>     netdev_dpdk_vhost_user_construct,
>>     netdev_dpdk_vhost_destruct,
>>+    netdev_dpdk_vhost_set_config,
>>     netdev_dpdk_vhost_set_multiq,
>>     netdev_dpdk_vhost_send,
>>+    netdev_dpdk_vhost_set_mtu,
>>     netdev_dpdk_vhost_get_carrier,
>>     netdev_dpdk_vhost_get_stats,
>>     NULL,
>>--
>>1.9.3
>>
>>_______________________________________________
>>dev mailing list
>>dev@openvswitch.org
>>http://openvswitch.org/mailman/listinfo/dev
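
For reference, here is a rough standalone sketch of the buffer-size rounding discussed in the dpdk_buf_size() review comment above, written with a round-up helper rather than the open-coded modulo check. It is an illustration under assumptions, not the revised patch: the EX_/ex_ names are hypothetical, the driver's min_rx_bufsize is passed in as a plain argument instead of being read via rte_eth_dev_info_get(), and EX_ROUND_UP is a local stand-in for the ROUND_UP macro that I understand OVS provides in lib/util.h.

```c
#include <stdint.h>

#define EX_VLAN_TAG_LEN        4     /* mirrors DPDK_VLAN_TAG_LEN in the patch */
#define EX_DEFAULT_RX_BUFSIZE  1024  /* mirrors NETDEV_DPDK_DEFAULT_RX_BUFSIZE */

/* Round 'x' up to the nearest multiple of 'y' (y must be non-zero).
 * Local stand-in so that the sketch compiles on its own. */
#define EX_ROUND_UP(x, y) ((((x) + (y) - 1) / (y)) * (y))

/* Hypothetical helper: reserve room for two VLAN tags (QinQ), then round the
 * result up to the driver's RX buffer granularity. A granularity of 0 means
 * "not reported", in which case the 1024B default is used. */
static inline uint32_t
ex_buf_size(uint32_t frame_len, uint32_t min_rx_bufsize)
{
    uint32_t granularity = min_rx_bufsize ? min_rx_bufsize
                                          : EX_DEFAULT_RX_BUFSIZE;
    uint32_t len = frame_len + 2 * EX_VLAN_TAG_LEN;

    return EX_ROUND_UP(len, granularity);
}
```

Under these assumptions, a standard 1518B frame with a 1024B granularity ends up with a 2048B buffer, and a value that is already an exact multiple after adding the VLAN headroom is left unchanged, which matches the behaviour of the original `len % buf_size` check.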