> > >On 08/02/2016 03:23, "Kavanagh, Mark B" <mark.b.kavan...@intel.com> wrote: > >>> >>>Hi Mark, >>> >> >>Hi Daniele, >> >>Thanks for the review! Responses inline below. >> >>Cheers, >>Mark >> >>>This patch besides adding Jumbo Frame support also cleans up >>>the mbuf initialization (by changing the macros, adding >>>dpdk_buf_size, and rewriting __ovs_rte_pktmbuf_init), so thanks >>>for this. I think it makes sense to split the patch in two: >>>one that does the clenup, and one that allows configuring the >>>MTU. >> >>Okay, sounds good - I'll spin another version, splitting into a two patch >>set. > >Thanks! > >> >>> >>>I agree with Flavio's comments as well, more inline >>> >>>Thanks >>> >>>On 01/02/2016 02:18, "Mark Kavanagh" <mark.b.kavan...@intel.com> wrote: >>> >>>>Add support for Jumbo Frames to DPDK-enabled port types, >>>>using single-segment-mbufs. >>>> >>>>Using this approach, the amount of memory allocated for each mbuf >>>>to store frame data is increased to a value greater than 1518B >>>>(typical Ethernet maximum frame length). The increased space >>>>available in the mbuf means that an entire Jumbo Frame can be carried >>>>in a single mbuf, as opposed to partitioning it across multiple mbuf >>>>segments. >>>> >>>>The amount of space allocated to each mbuf to hold frame data is >>>>defined dynamically by the user when adding a DPDK port to a bridge. >>>>If an MTU value is not supplied, or the user-supplied value is invalid, >>>>the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B). >>>> >>>>Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com> >>>>--- >>>> INSTALL.DPDK.md | 59 +++++++++- >>>> NEWS | 2 + >>>> lib/netdev-dpdk.c | 347 >>>>+++++++++++++++++++++++++++++++++++++++++------------- >>>> 3 files changed, 328 insertions(+), 80 deletions(-) >>>> >>>>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md >>>>index 96b686c..64ccd15 100644 >>>>--- a/INSTALL.DPDK.md >>>>+++ b/INSTALL.DPDK.md >>>>@@ -859,10 +859,61 @@ by adding the following string: >>>> to <interface> sections of all network devices used by DPDK. Parameter >>>>'N' >>>> determines how many queues can be used by the guest. >>>> >>>>+ >>>>+Jumbo Frames >>>>+------------ >>>>+ >>>>+Support for Jumbo Frames may be enabled at run-time for DPDK-type >>>>ports. >>>>+ >>>>+To avail of Jumbo Frame support, add the 'dpdk-mtu' option to the >>>>ovs-vsctl >>>>+'add-port' command-line, along with the required MTU for the port. >>>>+e.g. >>>>+ >>>>+ ``` >>>>+ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk >>>>options:dpdk-mtu=9000 >>>>+ ``` >>>>+ >>>>+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments >>>>are >>>>+increased, such that a full Jumbo Frame may be accommodated inside a >>>>single >>>>+mbuf segment. Once set, the MTU for a DPDK port is immutable. >>> >>>Why is it immutable? >> >>I guess my rationale here is that an MTU change can't be triggered via >>OVS command-line, nor can it be triggered programmatically via DPDK >>(apart from an explicit call to rte_eth_dev_set_mtu). >>So, while technically it's possibly, from a user's point of view, there's >>no way to configure it, outside of modifying the code directly. >>If I've missed something here, please let me know. > >With > >ovs-vsctl set Interface options:dpdk-mtu=... > >you could change it at runtime. Have you considered that in this patch? >I guess it's ok to have it configurable only when the port is added for >now,
I'd actually only considered 'ovs-vsctl set Interface dpdk0 mtu XYZ'. Since the MTU field in the Interface table is a status field (i.e. essentially read-only), updating the MTU in this way wouldn't have been possible. Unfortunately, the same can't be said for the 'options' field - I'll need to look at this before I push a new version of the patch. Thanks for the catch! >but we need to make sure that nothing bad happens if a user tries to change >it when it's running. > >Also, after talking with other people, I think that a better name for the >parameter might be 'mtu_request', No problem, I'll rename it. and we might want to use it also for >kernel >ports (but that doesn't need to be done now). > > >> >>> >>>>+ >>>>+Jumbo frame support has been validated against 13312B frames, using the >>>>+DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may >>>>+theoretically be supported. Supported port types excludes vHost-Cuse >>>>ports, as >>>>+this feature is pending deprecation. >>>>+ >>>>+ >>>>+vHost Ports and Jumbo Frames >>>>+---------------------------- >>>>+Jumbo frame support is available for DPDK vHost-User ports only. Some >>>>additional >>>>+configuration is needed to take advantage of this feature: >>>>+ >>>>+ 1. `mergeable buffers` must be enabled for vHost ports, as >>>>demonstrated in >>>>+ the QEMU command line snippet below: >>>>+ >>>>+ ``` >>>>+ '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \' >>>>+ '-device >>>>virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on' >>>>+ ``` >>>>+ >>>>+ 2. Where virtio devices are bound to the Linux kernel driver in a >>>>guest >>>>+ environment (i.e. interfaces are not bound to an in-guest DPDK >>>>driver), the >>>>+ MTU of those logical network interfaces must also be increased. >>>>This >>>>+ avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' >>>>refers >>>>+ to the length of the IP packet only, and not that of the entire >>>>frame. >>>>+ >>>>+ e.g. To calculate the exact MTU of a standard IPv4 frame, subtract >>>>the L2 >>>>+ header and CRC lengths (i.e. 18B) from the max supported frame >>>>size. >>>>+ So, to set the MTU for a 13312B Jumbo Frame: >>>>+ >>>>+ ``` >>>>+ ifconfig eth1 mtu 13294 >>>>+ ``` >>>>+ >>>>+ >>>> Restrictions: >>>> ------------- >>>> >>>>- - Work with 1500 MTU, needs few changes in DPDK lib to fix this >>>>issue. >>>> - Currently DPDK port does not make use any offload functionality. >>>> - DPDK-vHost support works with 1G huge pages. >>>> >>>>@@ -903,6 +954,12 @@ Restrictions: >>>> the next release of DPDK (which includes the above patch) is >>>>available and >>>> integrated into OVS. >>>> >>>>+ Jumbo Frames: >>>>+ - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. The >>>>source of >>>>+ this issue is currently being investigated. >>>>+ - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse >>>>ports. >>>>+ >>>>+ >>>> Bug Reporting: >>>> -------------- >>>> >>>>diff --git a/NEWS b/NEWS >>>>index 5c18867..cd563e0 100644 >>>>--- a/NEWS >>>>+++ b/NEWS >>>>@@ -46,6 +46,8 @@ v2.5.0 - xx xxx xxxx >>>> abstractions, such as virtual L2 and L3 overlays and security >>>>groups. >>>> - RHEL packaging: >>>> * DPDK ports may now be created via network scripts (see >>>>README.RHEL). >>>>+ - netdev-dpdk: >>>>+ * Add Jumbo Frame Support >>> >>>I don't think we should add this to 2.5 >> >>Out of curiosity, is this mainly due to potential instability? >>In any event, I'll move it into the 'Post-v2.5.0' section. >> >>> >>>> >>>> >>>> v2.4.0 - 20 Aug 2015 >>>>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >>>>index de7e488..76a5dcc 100644 >>>>--- a/lib/netdev-dpdk.c >>>>+++ b/lib/netdev-dpdk.c >>>>@@ -62,20 +62,25 @@ static struct vlog_rate_limit rl = >>>>VLOG_RATE_LIMIT_INIT(5, 20); >>>> #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE >>>> #define OVS_VPORT_DPDK "ovs_dpdk" >>>> >>>>+#define NETDEV_DPDK_JUMBO_FRAME_ENABLED 1 >>> >>>I don't see particular value in the above #define. >> >>I've removed this macro, and the section of code that used it in the >>latest version of the patch. >> >>> >>>>+#define NETDEV_DPDK_DEFAULT_RX_BUFSIZE 1024 >>>>+ >>>> /* >>>> * need to reserve tons of extra space in the mbufs so we can align the >>>> * DMA addresses to 4KB. >>>> * The minimum mbuf size is limited to avoid scatter behaviour and drop >>>>in >>>> * performance for standard Ethernet MTU. >>>> */ >>>>-#define MTU_TO_MAX_LEN(mtu) ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN) >>>>-#define MBUF_SIZE_MTU(mtu) (MTU_TO_MAX_LEN(mtu) \ >>>>- + sizeof(struct dp_packet) \ >>>>- + RTE_PKTMBUF_HEADROOM) >>>>-#define MBUF_SIZE_DRIVER (2048 \ >>>>- + sizeof (struct rte_mbuf) \ >>>>- + RTE_PKTMBUF_HEADROOM) >>>>-#define MBUF_SIZE(mtu) MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER) >>>>+#define MTU_TO_FRAME_LEN(mtu) ((mtu) + ETHER_HDR_LEN + >>>>ETHER_CRC_LEN) >>>>+#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)- ETHER_HDR_LEN - >>>>ETHER_CRC_LEN) >>>>+#define MBUF_SEGMENT_SIZE(mtu) ( MTU_TO_FRAME_LEN(mtu) \ >>>>+ + sizeof(struct dp_packet) \ >>>>+ + RTE_PKTMBUF_HEADROOM) >>>>+ >>>>+/* This value should be specified as a multiple of the DPDK NIC >>>>driver's >>>>+ * 'min_rx_bufsize' attribute (currently 1024B for 'igb_uio'). >>>>+ */ >>>>+#define NETDEV_DPDK_MAX_FRAME_LEN 13312 >>>> >>>> /* Max and min number of packets in the mempool. OVS tries to >>>>allocate a >>>> * mempool with MAX_NB_MBUF: if this fails (because the system doesn't >>>>have >>>>@@ -86,6 +91,8 @@ static struct vlog_rate_limit rl = >>>>VLOG_RATE_LIMIT_INIT(5, 20); >>>> #define MIN_NB_MBUF (4096 * 4) >>>> #define MP_CACHE_SZ RTE_MEMPOOL_CACHE_MAX_SIZE >>>> >>>>+#define DPDK_VLAN_TAG_LEN 4 >>>>+ >>>> /* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */ >>>> BUILD_ASSERT_DECL(MAX_NB_MBUF % >>>>ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF) >>>>== 0); >>>> >>>>@@ -114,7 +121,6 @@ static const struct rte_eth_conf port_conf = { >>>> .header_split = 0, /* Header Split disabled */ >>>> .hw_ip_checksum = 0, /* IP checksum offload disabled */ >>>> .hw_vlan_filter = 0, /* VLAN filtering disabled */ >>>>- .jumbo_frame = 0, /* Jumbo Frame Support disabled */ >>>> .hw_strip_crc = 0, >>>> }, >>>> .rx_adv_conf = { >>>>@@ -254,6 +260,41 @@ is_dpdk_class(const struct netdev_class *class) >>>> return class->construct == netdev_dpdk_construct; >>>> } >>>> >>>>+/* DPDK NIC drivers allocate RX buffers at a particular granularity >>>>+ * (specified by rte_eth_dev_info.min_rx_bufsize - currently 1K for >>>>igb_uio). >>>>+ * If 'frame_len' is not a multiple of this value, insufficient buffers >>>>are >>>>+ * allocated to accomodate the packet in its entirety. Furthermore, the >>>>igb_uio >>>>+ * driver needs to ensure that there is also sufficient space in the Rx >>>>buffer >>>>+ * to accommodate two VLAN tags (for QinQ frames). If the RX buffer is >>>>too >>>>+ * small, then the driver enables scatter RX behaviour, which reduces >>>>+ * performance. To prevent this, use a buffer size that is closest to >>>>+ * 'frame_len', but which satisfies the aforementioned criteria. >>>>+ */ >>>>+static uint32_t >>>>+dpdk_buf_size(struct netdev_dpdk *netdev, int frame_len) >>>>+{ >>>>+ struct rte_eth_dev_info info; >>>>+ uint32_t buf_size; >>>>+ /* XXX: This is a workaround for DPDK v2.2, and needs to be >>>>refactored with a >>>>+ * future DPDK release. */ >>> >>>Could you elaborate on that? >> >>Due to changes pending in the latest version of the code, this is no >>longer relevant and will be removed. >> >>> >>>>+ uint32_t len = frame_len + (2 * DPDK_VLAN_TAG_LEN); >>>>+ >>>>+ if(netdev->type == DPDK_DEV_ETH) { >>>>+ rte_eth_dev_info_get(netdev->port_id, &info); >>>>+ buf_size = (info.min_rx_bufsize == 0) ? >>>>+ NETDEV_DPDK_DEFAULT_RX_BUFSIZE : >>>>+ info.min_rx_bufsize; >>>>+ } else { >>>>+ buf_size = NETDEV_DPDK_DEFAULT_RX_BUFSIZE; >>>>+ } >>>>+ >>>>+ if(len % buf_size) { >>>>+ len = buf_size * ((len/buf_size) + 1); >>>>+ } >>> >>>I think this looks better with the ROUND_UP macro. >> >>True - I'll update the code accordingly. >> >>> >>>>+ >>>>+ return len; >>>>+} >>>>+ >>>> /* XXX: use dpdk malloc for entire OVS. in fact huge page should be >>>>used >>>> * for all other segments data, bss and text. */ >>>> >>>>@@ -280,26 +321,65 @@ free_dpdk_buf(struct dp_packet *p) >>>> } >>>> >>>> static void >>>>-__rte_pktmbuf_init(struct rte_mempool *mp, >>>>- void *opaque_arg OVS_UNUSED, >>>>- void *_m, >>>>- unsigned i OVS_UNUSED) >>>>+ovs_rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg) >>>> { >>>>- struct rte_mbuf *m = _m; >>>>- uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet); >>>>+ struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv; >>>>+ struct rte_pktmbuf_pool_private default_mbp_priv; >>>>+ uint16_t roomsz; >>>> >>>> RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet)); >>>> >>>>+ /* if no structure is provided, assume no mbuf private area */ >>>>+ >>>>+ user_mbp_priv = opaque_arg; >>>>+ if (user_mbp_priv == NULL) { >>>>+ default_mbp_priv.mbuf_priv_size = 0; >>>>+ if (mp->elt_size > sizeof(struct dp_packet)) { >>>>+ roomsz = mp->elt_size - sizeof(struct dp_packet); >>>>+ } else { >>>>+ roomsz = 0; >>>>+ } >>>>+ default_mbp_priv.mbuf_data_room_size = roomsz; >>>>+ user_mbp_priv = &default_mbp_priv; >>>>+ } >>>>+ >>>>+ RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet) + >>>>+ user_mbp_priv->mbuf_data_room_size + >>>>+ user_mbp_priv->mbuf_priv_size); >>>>+ >>>>+ mbp_priv = rte_mempool_get_priv(mp); >>>>+ memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv)); >>>>+} >>>>+ >>>>+/* Initialise some fields in the mbuf structure that are not modified >>>>by >>>>the >>>>+ * user once created (origin pool, buffer start address, etc.*/ >>>>+static void >>>>+__ovs_rte_pktmbuf_init(struct rte_mempool *mp, >>>>+ void *opaque_arg OVS_UNUSED, >>>>+ void *_m, >>>>+ unsigned i OVS_UNUSED) >>>>+{ >>>>+ struct rte_mbuf *m = _m; >>>>+ uint32_t buf_size, buf_len, priv_size; >>>>+ >>>>+ priv_size = rte_pktmbuf_priv_size(mp); >>>>+ buf_size = sizeof(struct dp_packet) + priv_size; >>>>+ buf_len = rte_pktmbuf_data_room_size(mp); >>>>+ >>>>+ RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) == >>>>priv_size); >>>>+ RTE_MBUF_ASSERT(mp->elt_size >= buf_size); >>>>+ RTE_MBUF_ASSERT(buf_len <= UINT16_MAX); >>>>+ >>>> memset(m, 0, mp->elt_size); >>>> >>>>- /* start of buffer is just after mbuf structure */ >>>>- m->buf_addr = (char *)m + sizeof(struct dp_packet); >>>>- m->buf_physaddr = rte_mempool_virt2phy(mp, m) + >>>>- sizeof(struct dp_packet); >>>>+ /* start of buffer is after dp_packet structure and priv data */ >>>>+ m->priv_size = priv_size; >>>>+ m->buf_addr = (char *)m + buf_size; >>>>+ m->buf_physaddr = rte_mempool_virt2phy(mp, m) + buf_size; >>>> m->buf_len = (uint16_t)buf_len; >>>> >>>> /* keep some headroom between start of buffer and data */ >>>>- m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len); >>>>+ m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len); >>>> >>>> /* init some constant fields */ >>>> m->pool = mp; >>>>@@ -315,7 +395,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp, >>>> { >>>> struct rte_mbuf *m = _m; >>>> >>>>- __rte_pktmbuf_init(mp, opaque_arg, _m, i); >>>>+ __ovs_rte_pktmbuf_init(mp, opaque_arg, m, i); >>>> >>>> dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len); >>>> } >>>>@@ -326,6 +406,7 @@ dpdk_mp_get(int socket_id, int mtu) >>>>OVS_REQUIRES(dpdk_mutex) >>>> struct dpdk_mp *dmp = NULL; >>>> char mp_name[RTE_MEMPOOL_NAMESIZE]; >>>> unsigned mp_size; >>>>+ struct rte_pktmbuf_pool_private mbp_priv; >>>> >>>> LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) { >>>> if (dmp->socket_id == socket_id && dmp->mtu == mtu) { >>>>@@ -338,6 +419,8 @@ dpdk_mp_get(int socket_id, int mtu) >>>>OVS_REQUIRES(dpdk_mutex) >>>> dmp->socket_id = socket_id; >>>> dmp->mtu = mtu; >>>> dmp->refcount = 1; >>>>+ mbp_priv.mbuf_data_room_size = MTU_TO_FRAME_LEN(mtu) + >>>>RTE_PKTMBUF_HEADROOM; >>>>+ mbp_priv.mbuf_priv_size = 0; >>>> >>>> mp_size = MAX_NB_MBUF; >>>> do { >>>>@@ -346,10 +429,10 @@ dpdk_mp_get(int socket_id, int mtu) >>>>OVS_REQUIRES(dpdk_mutex) >>>> return NULL; >>>> } >>>> >>>>- dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu), >>>>+ dmp->mp = rte_mempool_create(mp_name, mp_size, >>>>MBUF_SEGMENT_SIZE(mtu), >>>> MP_CACHE_SZ, >>>> sizeof(struct >>>>rte_pktmbuf_pool_private), >>>>- rte_pktmbuf_pool_init, NULL, >>>>+ ovs_rte_pktmbuf_pool_init, >>>>&mbp_priv, >>>> ovs_rte_pktmbuf_init, NULL, >>>> socket_id, 0); >>>> } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= >>>>MIN_NB_MBUF); >>> >>>Ok, this is quite intricated. I believe the reason that OVS used its own >>>__rte_pktmbuf_init() was that it wasn't possible to have custom metadata >>>in mbufs before DPDK 2.1. >>> >>>If I'm not mistaken, with DPDK commit b507905ff407 there's a way to do >>>that >>>without copying any code from DPDK with the following incremental (from >>>master) >>> >>>---8<--- >>> >>>@@ -278,42 +272,12 @@ free_dpdk_buf(struct dp_packet *p) >>> } >>> >>> static void >>>-__rte_pktmbuf_init(struct rte_mempool *mp, >>>- void *opaque_arg OVS_UNUSED, >>>- void *_m, >>>- unsigned i OVS_UNUSED) >>>-{ >>>- struct rte_mbuf *m = _m; >>>- uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet); >>>- >>>- RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet)); >>>- >>>- memset(m, 0, mp->elt_size); >>>- >>>- /* start of buffer is just after mbuf structure */ >>>- m->buf_addr = (char *)m + sizeof(struct dp_packet); >>>- m->buf_physaddr = rte_mempool_virt2phy(mp, m) + >>>- sizeof(struct dp_packet); >>>- m->buf_len = (uint16_t)buf_len; >>>- >>>- /* keep some headroom between start of buffer and data */ >>>- m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len); >>>- >>>- /* init some constant fields */ >>>- m->pool = mp; >>>- m->nb_segs = 1; >>>- m->port = 0xff; >>>-} >>>- >>>-static void >>>-ovs_rte_pktmbuf_init(struct rte_mempool *mp, >>>- void *opaque_arg OVS_UNUSED, >>>- void *_m, >>>- unsigned i OVS_UNUSED) >>>+ovs_rte_pktmbuf_init(struct rte_mempool *mp, void *opaque_arg, >>>+ void *_m, unsigned i) >>> { >>> struct rte_mbuf *m = _m; >>> >>>- __rte_pktmbuf_init(mp, opaque_arg, _m, i); >>>+ rte_pktmbuf_init(mp, opaque_arg, _m, i); >>> >>> dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len); >>> } >>>@@ -339,15 +303,21 @@ dpdk_mp_get(int socket_id, int mtu) >>>OVS_REQUIRES(dpdk_mutex) >>> >>> mp_size = MAX_NB_MBUF; >>> do { >>>+ struct rte_pktmbuf_pool_private mbuf_sizes; >>>+ >>> if (snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_mp_%d_%d_%u", >>> dmp->mtu, dmp->socket_id, mp_size) < 0) { >>> return NULL; >>> } >>> >>>+ mbuf_sizes.mbuf_priv_size = sizeof (struct dp_packet) >>>+ - sizeof (struct rte_mbuf); >>>+ mbuf_sizes.mbuf_data_room_size = MBUF_SIZE(mtu) >>>+ - sizeof(struct dp_packet); >>> dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu), >>> MP_CACHE_SZ, >>> sizeof(struct >>>rte_pktmbuf_pool_private), >>>- rte_pktmbuf_pool_init, NULL, >>>+ rte_pktmbuf_pool_init, &mbuf_sizes, >>> ovs_rte_pktmbuf_init, NULL, >>> socket_id, 0); >>> } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >= >>>MIN_NB_MBUF); >>> >>>---8<--- >>> >>>I think this will make the patch cleaner. Do you think this will work >>>with >>>the increased MTU as well? >> >>Thanks for this Daniele. When I implemented the code, I attempted to >>change as little of the legacy code as possible, but seeing as how I'm >>now doing a patchset, I can roll this change in. >>Btw, I just tested with P2P 9k frames, and it works fine :) >> >>> >>>>@@ -433,6 +516,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, >>>>int >>>>n_rxq, int n_txq) >>>> { >>>> int diag = 0; >>>> int i; >>>>+ struct rte_eth_conf conf = port_conf; >>>> >>>> /* A device may report more queues than it makes available (this >>>>has >>>> * been observed for Intel xl710, which reserves some of them for >>>>@@ -444,7 +528,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, >>>>int n_rxq, int n_txq) >>>> VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq, >>>>n_txq); >>>> } >>>> >>>>- diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, >>>>&port_conf); >>>>+ if(OVS_UNLIKELY(dev->mtu > ETHER_MTU)) { >>>>+ conf.rxmode.jumbo_frame = NETDEV_DPDK_JUMBO_FRAME_ENABLED; >>>>+ conf.rxmode.max_rx_pkt_len = MTU_TO_FRAME_LEN(dev->mtu); >>>>+ } >>>>+ >>>>+ diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, >>>>&conf); >>>> if (diag) { >>>> break; >>>> } >>>>@@ -586,6 +675,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned >>>>int >>>>port_no, >>>> struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >>>> int sid; >>>> int err = 0; >>>>+ uint32_t buf_size; >>>> >>>> ovs_mutex_init(&netdev->mutex); >>>> ovs_mutex_lock(&netdev->mutex); >>>>@@ -605,10 +695,16 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned >>>>int port_no, >>>> netdev->port_id = port_no; >>>> netdev->type = type; >>>> netdev->flags = 0; >>>>+ >>>>+ /* Initialize port's MTU and frame len to the default Ethernet >>>>values. >>>>+ * Larger, user-specified (jumbo) frame buffers are accommodated in >>>>+ * netdev_dpdk_set_config. >>>>+ */ >>>>+ netdev->max_packet_len = ETHER_MAX_LEN; >>>> netdev->mtu = ETHER_MTU; >>>>- netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu); >>>> >>>>- netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu); >>>>+ buf_size = dpdk_buf_size(netdev, ETHER_MAX_LEN); >>>>+ netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, >>>>FRAME_LEN_TO_MTU(buf_size)); >>>> if (!netdev->dpdk_mp) { >>>> err = ENOMEM; >>>> goto unlock; >>>>@@ -651,6 +747,27 @@ dpdk_dev_parse_name(const char dev_name[], const >>>>char prefix[], >>>> return 0; >>>> } >>>> >>>>+static void >>>>+dpdk_dev_parse_mtu(const struct smap *args, int *mtu) >>>>+{ >>>>+ const char *mtu_str = smap_get(args, "dpdk-mtu"); >>>>+ char *end_ptr = NULL; >>>>+ int local_mtu; >>>>+ >>>>+ if(mtu_str) { >>>>+ local_mtu = strtoul(mtu_str, &end_ptr, 0); >>>>+ } >>>>+ if(!mtu_str || local_mtu < ETHER_MTU || >>>>+ local_mtu > >>>>FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) || >>>>+ *end_ptr != '\0') { >>>>+ local_mtu = ETHER_MTU; >>>>+ VLOG_WARN("Invalid or missing dpdk-mtu parameter - defaulting >>>>to >>>>%d.\n", >>>>+ local_mtu); >>>>+ } >>>>+ >>>>+ *mtu = local_mtu; >>>>+} >>>>+ >>>> static int >>>> vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex) >>>> { >>>>@@ -777,11 +894,77 @@ netdev_dpdk_get_config(const struct netdev >>>>*netdev_, struct smap *args) >>>> smap_add_format(args, "configured_rx_queues", "%d", >>>>netdev_->n_rxq); >>>> smap_add_format(args, "requested_tx_queues", "%d", netdev_->n_txq); >>>> smap_add_format(args, "configured_tx_queues", "%d", >>>>dev->real_n_txq); >>>>+ smap_add_format(args, "mtu", "%d", dev->mtu); >>>> ovs_mutex_unlock(&dev->mutex); >>>> >>>> return 0; >>>> } >>>> >>>>+/* Set the mtu of DPDK_DEV_ETH ports */ >>>>+static int >>>>+netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) >>>>+{ >>>>+ struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>>>+ int old_mtu, err; >>>>+ uint32_t buf_size; >>>>+ int dpdk_mtu; >>>>+ struct dpdk_mp *old_mp; >>>>+ struct dpdk_mp *mp; >>>>+ >>>>+ ovs_mutex_lock(&dpdk_mutex); >>>>+ ovs_mutex_lock(&dev->mutex); >>>>+ if (dev->mtu == mtu) { >>>>+ err = 0; >>>>+ goto out; >>>>+ } >>>>+ >>>>+ buf_size = dpdk_buf_size(dev, MTU_TO_FRAME_LEN(mtu)); >>>>+ dpdk_mtu = FRAME_LEN_TO_MTU(buf_size); >>>>+ >>>>+ mp = dpdk_mp_get(dev->socket_id, dpdk_mtu); >>>>+ if (!mp) { >>>>+ err = ENOMEM; >>>>+ goto out; >>>>+ } >>>>+ >>>>+ rte_eth_dev_stop(dev->port_id); >>>>+ >>>>+ old_mtu = dev->mtu; >>>>+ old_mp = dev->dpdk_mp; >>>>+ dev->dpdk_mp = mp; >>>>+ dev->mtu = mtu; >>>>+ dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>>>+ >>>>+ err = dpdk_eth_dev_init(dev); >>>>+ if (err) { >>>>+ VLOG_WARN("Unable to set MTU '%d' for '%s'; reverting to last >>>>known " >>>>+ "good value '%d'\n", mtu, dev->up.name, old_mtu); >>>>+ dpdk_mp_put(mp); >>>>+ dev->mtu = old_mtu; >>>>+ dev->dpdk_mp = old_mp; >>>>+ dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>>>+ dpdk_eth_dev_init(dev); >>>>+ goto out; >>>>+ } else { >>>>+ dpdk_mp_put(old_mp); >>>>+ netdev_change_seq_changed(netdev); >>>>+ } >>>>+out: >>>>+ ovs_mutex_unlock(&dev->mutex); >>>>+ ovs_mutex_unlock(&dpdk_mutex); >>>>+ return err; >>>>+} >>>>+ >>>>+static int >>>>+netdev_dpdk_set_config(struct netdev *netdev_, const struct smap *args) >>>>+{ >>>>+ int mtu; >>>>+ >>>>+ dpdk_dev_parse_mtu(args, &mtu); >>>>+ >>>>+ return netdev_dpdk_set_mtu(netdev_, mtu); >>>>+} >>>>+ >>>> static int >>>> netdev_dpdk_get_numa_id(const struct netdev *netdev_) >>>> { >>>>@@ -1358,54 +1541,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev, >>>>int *mtup) >>>> >>>> return 0; >>>> } >>>>- >>>>-static int >>>>-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu) >>>>-{ >>>>- struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>>>- int old_mtu, err; >>>>- struct dpdk_mp *old_mp; >>>>- struct dpdk_mp *mp; >>>>- >>>>- ovs_mutex_lock(&dpdk_mutex); >>>>- ovs_mutex_lock(&dev->mutex); >>>>- if (dev->mtu == mtu) { >>>>- err = 0; >>>>- goto out; >>>>- } >>>>- >>>>- mp = dpdk_mp_get(dev->socket_id, dev->mtu); >>>>- if (!mp) { >>>>- err = ENOMEM; >>>>- goto out; >>>>- } >>>>- >>>>- rte_eth_dev_stop(dev->port_id); >>>>- >>>>- old_mtu = dev->mtu; >>>>- old_mp = dev->dpdk_mp; >>>>- dev->dpdk_mp = mp; >>>>- dev->mtu = mtu; >>>>- dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu); >>>>- >>>>- err = dpdk_eth_dev_init(dev); >>>>- if (err) { >>>>- dpdk_mp_put(mp); >>>>- dev->mtu = old_mtu; >>>>- dev->dpdk_mp = old_mp; >>>>- dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu); >>>>- dpdk_eth_dev_init(dev); >>>>- goto out; >>>>- } >>>>- >>>>- dpdk_mp_put(old_mp); >>>>- netdev_change_seq_changed(netdev); >>>>-out: >>>>- ovs_mutex_unlock(&dev->mutex); >>>>- ovs_mutex_unlock(&dpdk_mutex); >>>>- return err; >>>>-} >>>>- >>>> static int >>>> netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier); >>>> >>>>@@ -1682,7 +1817,7 @@ netdev_dpdk_get_status(const struct netdev >>>>*netdev_, struct smap *args) >>>> smap_add_format(args, "numa_id", "%d", >>>>rte_eth_dev_socket_id(dev->port_id)); >>>> smap_add_format(args, "driver_name", "%s", dev_info.driver_name); >>>> smap_add_format(args, "min_rx_bufsize", "%u", >>>>dev_info.min_rx_bufsize); >>>>- smap_add_format(args, "max_rx_pktlen", "%u", >>>>dev_info.max_rx_pktlen); >>>>+ smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len); >>>> smap_add_format(args, "max_rx_queues", "%u", >>>>dev_info.max_rx_queues); >>>> smap_add_format(args, "max_tx_queues", "%u", >>>>dev_info.max_tx_queues); >>>> smap_add_format(args, "max_mac_addrs", "%u", >>>>dev_info.max_mac_addrs); >>>>@@ -1904,6 +2039,51 @@ dpdk_vhost_user_class_init(void) >>>> return 0; >>>> } >>>> >>>>+/* Set the mtu of DPDK_DEV_VHOST ports */ >>>>+static int >>>>+netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu) >>>>+{ >>>>+ struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); >>>>+ int err = 0; >>>>+ struct dpdk_mp *old_mp; >>>>+ struct dpdk_mp *mp; >>>>+ >>>>+ ovs_mutex_lock(&dpdk_mutex); >>>>+ ovs_mutex_lock(&dev->mutex); >>>>+ if (dev->mtu == mtu) { >>>>+ err = 0; >>>>+ goto out; >>>>+ } >>>>+ >>>>+ mp = dpdk_mp_get(dev->socket_id, mtu); >>>>+ if (!mp) { >>>>+ err = ENOMEM; >>>>+ goto out; >>>>+ } >>>>+ >>>>+ old_mp = dev->dpdk_mp; >>>>+ dev->dpdk_mp = mp; >>>>+ dev->mtu = mtu; >>>>+ dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu); >>>>+ >>>>+ dpdk_mp_put(old_mp); >>>>+ netdev_change_seq_changed(netdev); >>>>+out: >>>>+ ovs_mutex_unlock(&dev->mutex); >>>>+ ovs_mutex_unlock(&dpdk_mutex); >>>>+ return err; >>>>+} >>>>+ >>>>+static int >>>>+netdev_dpdk_vhost_set_config(struct netdev *netdev_, const struct smap >>>>*args) >>>>+{ >>>>+ int mtu; >>>>+ >>>>+ dpdk_dev_parse_mtu(args, &mtu); >>>>+ >>>>+ return netdev_dpdk_vhost_set_mtu(netdev_, mtu); >>>>+} >>>>+ >>>> static void >>>> dpdk_common_init(void) >>>> { >>>>@@ -2040,8 +2220,9 @@ unlock_dpdk: >>>> return err; >>>> } >>>> >>>>-#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, >>>>SEND, >>>>\ >>>>- GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV) >>>>\ >>>>+#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, >>>>\ >>>>+ MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES, >>>>\ >>>>+ GET_STATUS, RXQ_RECV) >>>>\ >>>> { \ >>>> NAME, \ >>>> INIT, /* init */ \ >>>>@@ -2053,7 +2234,7 @@ unlock_dpdk: >>>> DESTRUCT, \ >>>> netdev_dpdk_dealloc, \ >>>> netdev_dpdk_get_config, \ >>>>- NULL, /* netdev_dpdk_set_config */ \ >>>>+ SET_CONFIG, \ >>>> NULL, /* get_tunnel_config */ \ >>>> NULL, /* build header */ \ >>>> NULL, /* push header */ \ >>>>@@ -2067,7 +2248,7 @@ unlock_dpdk: >>>> netdev_dpdk_set_etheraddr, \ >>>> netdev_dpdk_get_etheraddr, \ >>>> netdev_dpdk_get_mtu, \ >>>>- netdev_dpdk_set_mtu, \ >>>>+ SET_MTU, \ >>>> netdev_dpdk_get_ifindex, \ >>>> GET_CARRIER, \ >>>> netdev_dpdk_get_carrier_resets, \ >>>>@@ -2213,8 +2394,10 @@ static const struct netdev_class dpdk_class = >>>> NULL, >>>> netdev_dpdk_construct, >>>> netdev_dpdk_destruct, >>>>+ netdev_dpdk_set_config, >>>> netdev_dpdk_set_multiq, >>>> netdev_dpdk_eth_send, >>>>+ netdev_dpdk_set_mtu, >>>> netdev_dpdk_get_carrier, >>>> netdev_dpdk_get_stats, >>>> netdev_dpdk_get_features, >>>>@@ -2227,8 +2410,10 @@ static const struct netdev_class dpdk_ring_class >>>>= >>>> NULL, >>>> netdev_dpdk_ring_construct, >>>> netdev_dpdk_destruct, >>>>+ netdev_dpdk_set_config, >>>> netdev_dpdk_set_multiq, >>>> netdev_dpdk_ring_send, >>>>+ netdev_dpdk_set_mtu, >>>> netdev_dpdk_get_carrier, >>>> netdev_dpdk_get_stats, >>>> netdev_dpdk_get_features, >>>>@@ -2241,8 +2426,10 @@ static const struct netdev_class OVS_UNUSED >>>>dpdk_vhost_cuse_class = >>>> dpdk_vhost_cuse_class_init, >>>> netdev_dpdk_vhost_cuse_construct, >>>> netdev_dpdk_vhost_destruct, >>>>+ NULL, >>>> netdev_dpdk_vhost_set_multiq, >>>> netdev_dpdk_vhost_send, >>>>+ NULL, >>>> netdev_dpdk_vhost_get_carrier, >>>> netdev_dpdk_vhost_get_stats, >>>> NULL, >>>>@@ -2255,8 +2442,10 @@ static const struct netdev_class OVS_UNUSED >>>>dpdk_vhost_user_class = >>>> dpdk_vhost_user_class_init, >>>> netdev_dpdk_vhost_user_construct, >>>> netdev_dpdk_vhost_destruct, >>>>+ netdev_dpdk_vhost_set_config, >>>> netdev_dpdk_vhost_set_multiq, >>>> netdev_dpdk_vhost_send, >>>>+ netdev_dpdk_vhost_set_mtu, >>>> netdev_dpdk_vhost_get_carrier, >>>> netdev_dpdk_vhost_get_stats, >>>> NULL, >>>>-- >>>>1.9.3 >>>> >>>>_______________________________________________ >>>>dev mailing list >>>>dev@openvswitch.org >>>>http://openvswitch.org/mailman/listinfo/dev >> _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev