>
>Hi Mark,
>

Hi Daniele,

Thanks for the review! Responses inline below.

Cheers,
Mark

>This patch besides adding Jumbo Frame support also cleans up
>the mbuf initialization (by changing the macros, adding
>dpdk_buf_size, and rewriting __ovs_rte_pktmbuf_init), so thanks
>for this.  I think it makes sense to split the patch in two:
>one that does the clenup, and one that allows configuring the
>MTU.

Okay, sounds good - I'll spin another version, splitting into a two patch set.

>
>I agree with Flavio's comments as well, more inline
>
>Thanks
>
>On 01/02/2016 02:18, "Mark Kavanagh" <mark.b.kavan...@intel.com> wrote:
>
>>Add support for Jumbo Frames to DPDK-enabled port types,
>>using single-segment-mbufs.
>>
>>Using this approach, the amount of memory allocated for each mbuf
>>to store frame data is increased to a value greater than 1518B
>>(typical Ethernet maximum frame length). The increased space
>>available in the mbuf means that an entire Jumbo Frame can be carried
>>in a single mbuf, as opposed to partitioning it across multiple mbuf
>>segments.
>>
>>The amount of space allocated to each mbuf to hold frame data is
>>defined dynamically by the user when adding a DPDK port to a bridge.
>>If an MTU value is not supplied, or the user-supplied value is invalid,
>>the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B).
>>
>>Signed-off-by: Mark Kavanagh <mark.b.kavan...@intel.com>
>>---
>> INSTALL.DPDK.md   |  59 +++++++++-
>> NEWS              |   2 +
>> lib/netdev-dpdk.c | 347
>>+++++++++++++++++++++++++++++++++++++++++-------------
>> 3 files changed, 328 insertions(+), 80 deletions(-)
>>
>>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>>index 96b686c..64ccd15 100644
>>--- a/INSTALL.DPDK.md
>>+++ b/INSTALL.DPDK.md
>>@@ -859,10 +859,61 @@ by adding the following string:
>> to <interface> sections of all network devices used by DPDK. Parameter
>>'N'
>> determines how many queues can be used by the guest.
>>
>>+
>>+Jumbo Frames
>>+------------
>>+
>>+Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
>>+
>>+To avail of Jumbo Frame support, add the 'dpdk-mtu' option to the
>>ovs-vsctl
>>+'add-port' command-line, along with the required MTU for the port.
>>+e.g.
>>+
>>+     ```
>>+     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>>options:dpdk-mtu=9000
>>+     ```
>>+
>>+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments
>>are
>>+increased, such that a full Jumbo Frame may be accommodated inside a
>>single
>>+mbuf segment. Once set, the MTU for a DPDK port is immutable.
>
>Why is it immutable?

I guess my rationale here is that an MTU change can't be triggered via OVS 
command-line, nor can it be triggered programmatically via DPDK (apart from an 
explicit call to rte_eth_dev_set_mtu).
So, while technically it's possibly, from a user's point of view, there's no 
way to configure it, outside of modifying the code directly.
If I've missed something here, please let me know.

>
>>+
>>+Jumbo frame support has been validated against 13312B frames, using the
>>+DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
>>+theoretically be supported. Supported port types excludes vHost-Cuse
>>ports, as
>>+this feature is pending deprecation.
>>+
>>+
>>+vHost Ports and Jumbo Frames
>>+----------------------------
>>+Jumbo frame support is available for DPDK vHost-User ports only. Some
>>additional
>>+configuration is needed to take advantage of this feature:
>>+
>>+  1. `mergeable buffers` must be enabled for vHost ports, as
>>demonstrated in
>>+      the QEMU command line snippet below:
>>+
>>+      ```
>>+      '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
>>+      '-device
>>virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
>>+      ```
>>+
>>+  2. Where virtio devices are bound to the Linux kernel driver in a guest
>>+     environment (i.e. interfaces are not bound to an in-guest DPDK
>>driver), the
>>+     MTU of those logical network interfaces must also be increased. This
>>+     avoids segmentation of Jumbo Frames in the guest. Note that 'MTU'
>>refers
>>+     to the length of the IP packet only, and not that of the entire
>>frame.
>>+
>>+     e.g. To calculate the exact MTU of a standard IPv4 frame, subtract
>>the L2
>>+     header and CRC lengths (i.e. 18B) from the max supported frame size.
>>+     So, to set the MTU for a 13312B Jumbo Frame:
>>+
>>+      ```
>>+      ifconfig eth1 mtu 13294
>>+      ```
>>+
>>+
>> Restrictions:
>> -------------
>>
>>-  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
>>   - Currently DPDK port does not make use any offload functionality.
>>   - DPDK-vHost support works with 1G huge pages.
>>
>>@@ -903,6 +954,12 @@ Restrictions:
>>     the next release of DPDK (which includes the above patch) is
>>available and
>>     integrated into OVS.
>>
>>+  Jumbo Frames:
>>+  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. The
>>source of
>>+     this issue is currently being investigated.
>>+  - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse
>>ports.
>>+
>>+
>> Bug Reporting:
>> --------------
>>
>>diff --git a/NEWS b/NEWS
>>index 5c18867..cd563e0 100644
>>--- a/NEWS
>>+++ b/NEWS
>>@@ -46,6 +46,8 @@ v2.5.0 - xx xxx xxxx
>>      abstractions, such as virtual L2 and L3 overlays and security
>>groups.
>>    - RHEL packaging:
>>      * DPDK ports may now be created via network scripts (see
>>README.RHEL).
>>+   - netdev-dpdk:
>>+     * Add Jumbo Frame Support
>
>I don't think we should add this to 2.5

Out of curiosity, is this mainly due to potential instability?
In any event, I'll move it into the 'Post-v2.5.0' section.

>
>>
>>
>> v2.4.0 - 20 Aug 2015
>>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>>index de7e488..76a5dcc 100644
>>--- a/lib/netdev-dpdk.c
>>+++ b/lib/netdev-dpdk.c
>>@@ -62,20 +62,25 @@ static struct vlog_rate_limit rl =
>>VLOG_RATE_LIMIT_INIT(5, 20);
>> #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
>> #define OVS_VPORT_DPDK "ovs_dpdk"
>>
>>+#define NETDEV_DPDK_JUMBO_FRAME_ENABLED     1
>
>I don't see particular value in the above #define.

I've removed this macro, and the section of code that used it in the latest 
version of the patch.

>
>>+#define NETDEV_DPDK_DEFAULT_RX_BUFSIZE      1024
>>+
>> /*
>>  * need to reserve tons of extra space in the mbufs so we can align the
>>  * DMA addresses to 4KB.
>>  * The minimum mbuf size is limited to avoid scatter behaviour and drop
>>in
>>  * performance for standard Ethernet MTU.
>>  */
>>-#define MTU_TO_MAX_LEN(mtu)  ((mtu) + ETHER_HDR_LEN + ETHER_CRC_LEN)
>>-#define MBUF_SIZE_MTU(mtu)   (MTU_TO_MAX_LEN(mtu)        \
>>-                              + sizeof(struct dp_packet) \
>>-                              + RTE_PKTMBUF_HEADROOM)
>>-#define MBUF_SIZE_DRIVER     (2048                       \
>>-                              + sizeof (struct rte_mbuf) \
>>-                              + RTE_PKTMBUF_HEADROOM)
>>-#define MBUF_SIZE(mtu)       MAX(MBUF_SIZE_MTU(mtu), MBUF_SIZE_DRIVER)
>>+#define MTU_TO_FRAME_LEN(mtu)       ((mtu) + ETHER_HDR_LEN +
>>ETHER_CRC_LEN)
>>+#define FRAME_LEN_TO_MTU(frame_len) ((frame_len)- ETHER_HDR_LEN -
>>ETHER_CRC_LEN)
>>+#define MBUF_SEGMENT_SIZE(mtu)      ( MTU_TO_FRAME_LEN(mtu)      \
>>+                                    + sizeof(struct dp_packet)   \
>>+                                    + RTE_PKTMBUF_HEADROOM)
>>+
>>+/* This value should be specified as a multiple of the DPDK NIC driver's
>>+ * 'min_rx_bufsize' attribute (currently 1024B for 'igb_uio').
>>+ */
>>+#define NETDEV_DPDK_MAX_FRAME_LEN    13312
>>
>> /* Max and min number of packets in the mempool.  OVS tries to allocate a
>>  * mempool with MAX_NB_MBUF: if this fails (because the system doesn't
>>have
>>@@ -86,6 +91,8 @@ static struct vlog_rate_limit rl =
>>VLOG_RATE_LIMIT_INIT(5, 20);
>> #define MIN_NB_MBUF          (4096 * 4)
>> #define MP_CACHE_SZ          RTE_MEMPOOL_CACHE_MAX_SIZE
>>
>>+#define DPDK_VLAN_TAG_LEN    4
>>+
>> /* MAX_NB_MBUF can be divided by 2 many times, until MIN_NB_MBUF */
>> BUILD_ASSERT_DECL(MAX_NB_MBUF % ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF)
>>== 0);
>>
>>@@ -114,7 +121,6 @@ static const struct rte_eth_conf port_conf = {
>>         .header_split   = 0, /* Header Split disabled */
>>         .hw_ip_checksum = 0, /* IP checksum offload disabled */
>>         .hw_vlan_filter = 0, /* VLAN filtering disabled */
>>-        .jumbo_frame    = 0, /* Jumbo Frame Support disabled */
>>         .hw_strip_crc   = 0,
>>     },
>>     .rx_adv_conf = {
>>@@ -254,6 +260,41 @@ is_dpdk_class(const struct netdev_class *class)
>>     return class->construct == netdev_dpdk_construct;
>> }
>>
>>+/* DPDK NIC drivers allocate RX buffers at a particular granularity
>>+ * (specified by rte_eth_dev_info.min_rx_bufsize - currently 1K for
>>igb_uio).
>>+ * If 'frame_len' is not a multiple of this value, insufficient buffers
>>are
>>+ * allocated to accomodate the packet in its entirety. Furthermore, the
>>igb_uio
>>+ * driver needs to ensure that there is also sufficient space in the Rx
>>buffer
>>+ * to accommodate two VLAN tags (for QinQ frames). If the RX buffer is
>>too
>>+ * small, then the driver enables scatter RX behaviour, which reduces
>>+ * performance. To prevent this, use a buffer size that is closest to
>>+ * 'frame_len', but which satisfies the aforementioned criteria.
>>+ */
>>+static uint32_t
>>+dpdk_buf_size(struct netdev_dpdk *netdev, int frame_len)
>>+{
>>+    struct rte_eth_dev_info info;
>>+    uint32_t buf_size;
>>+    /* XXX: This is a workaround for DPDK v2.2, and needs to be
>>refactored with a
>>+     * future DPDK release. */
>
>Could you elaborate on that?

Due to changes pending in the latest version of the code, this is no longer 
relevant and will be removed.

>
>>+    uint32_t len = frame_len + (2 * DPDK_VLAN_TAG_LEN);
>>+
>>+    if(netdev->type == DPDK_DEV_ETH) {
>>+        rte_eth_dev_info_get(netdev->port_id, &info);
>>+        buf_size = (info.min_rx_bufsize == 0) ?
>>+                   NETDEV_DPDK_DEFAULT_RX_BUFSIZE :
>>+                   info.min_rx_bufsize;
>>+    } else {
>>+        buf_size = NETDEV_DPDK_DEFAULT_RX_BUFSIZE;
>>+    }
>>+
>>+    if(len % buf_size) {
>>+        len = buf_size * ((len/buf_size) + 1);
>>+    }
>
>I think this looks better with the ROUND_UP macro.

True - I'll update the code accordingly.

>
>>+
>>+    return len;
>>+}
>>+
>> /* XXX: use dpdk malloc for entire OVS. in fact huge page should be used
>>  * for all other segments data, bss and text. */
>>
>>@@ -280,26 +321,65 @@ free_dpdk_buf(struct dp_packet *p)
>> }
>>
>> static void
>>-__rte_pktmbuf_init(struct rte_mempool *mp,
>>-                   void *opaque_arg OVS_UNUSED,
>>-                   void *_m,
>>-                   unsigned i OVS_UNUSED)
>>+ovs_rte_pktmbuf_pool_init(struct rte_mempool *mp, void *opaque_arg)
>> {
>>-    struct rte_mbuf *m = _m;
>>-    uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet);
>>+    struct rte_pktmbuf_pool_private *user_mbp_priv, *mbp_priv;
>>+    struct rte_pktmbuf_pool_private default_mbp_priv;
>>+    uint16_t roomsz;
>>
>>     RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet));
>>
>>+    /* if no structure is provided, assume no mbuf private area */
>>+
>>+    user_mbp_priv = opaque_arg;
>>+    if (user_mbp_priv == NULL) {
>>+        default_mbp_priv.mbuf_priv_size = 0;
>>+        if (mp->elt_size > sizeof(struct dp_packet)) {
>>+            roomsz = mp->elt_size - sizeof(struct dp_packet);
>>+        } else {
>>+            roomsz = 0;
>>+        }
>>+        default_mbp_priv.mbuf_data_room_size = roomsz;
>>+        user_mbp_priv = &default_mbp_priv;
>>+    }
>>+
>>+    RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet) +
>>+        user_mbp_priv->mbuf_data_room_size +
>>+        user_mbp_priv->mbuf_priv_size);
>>+
>>+    mbp_priv = rte_mempool_get_priv(mp);
>>+    memcpy(mbp_priv, user_mbp_priv, sizeof(*mbp_priv));
>>+}
>>+
>>+/* Initialise some fields in the mbuf structure that are not modified by
>>the
>>+ * user once created (origin pool, buffer start address, etc.*/
>>+static void
>>+__ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>>+                       void *opaque_arg OVS_UNUSED,
>>+                       void *_m,
>>+                       unsigned i OVS_UNUSED)
>>+{
>>+    struct rte_mbuf *m = _m;
>>+    uint32_t buf_size, buf_len, priv_size;
>>+
>>+    priv_size = rte_pktmbuf_priv_size(mp);
>>+    buf_size = sizeof(struct dp_packet) + priv_size;
>>+    buf_len = rte_pktmbuf_data_room_size(mp);
>>+
>>+    RTE_MBUF_ASSERT(RTE_ALIGN(priv_size, RTE_MBUF_PRIV_ALIGN) ==
>>priv_size);
>>+    RTE_MBUF_ASSERT(mp->elt_size >= buf_size);
>>+    RTE_MBUF_ASSERT(buf_len <= UINT16_MAX);
>>+
>>     memset(m, 0, mp->elt_size);
>>
>>-    /* start of buffer is just after mbuf structure */
>>-    m->buf_addr = (char *)m + sizeof(struct dp_packet);
>>-    m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
>>-                    sizeof(struct dp_packet);
>>+    /* start of buffer is after dp_packet structure and priv data */
>>+    m->priv_size = priv_size;
>>+    m->buf_addr = (char *)m + buf_size;
>>+    m->buf_physaddr = rte_mempool_virt2phy(mp, m) + buf_size;
>>     m->buf_len = (uint16_t)buf_len;
>>
>>     /* keep some headroom between start of buffer and data */
>>-    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
>>+    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, (uint16_t)m->buf_len);
>>
>>     /* init some constant fields */
>>     m->pool = mp;
>>@@ -315,7 +395,7 @@ ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>> {
>>     struct rte_mbuf *m = _m;
>>
>>-    __rte_pktmbuf_init(mp, opaque_arg, _m, i);
>>+    __ovs_rte_pktmbuf_init(mp, opaque_arg, m, i);
>>
>>     dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len);
>> }
>>@@ -326,6 +406,7 @@ dpdk_mp_get(int socket_id, int mtu)
>>OVS_REQUIRES(dpdk_mutex)
>>     struct dpdk_mp *dmp = NULL;
>>     char mp_name[RTE_MEMPOOL_NAMESIZE];
>>     unsigned mp_size;
>>+    struct rte_pktmbuf_pool_private mbp_priv;
>>
>>     LIST_FOR_EACH (dmp, list_node, &dpdk_mp_list) {
>>         if (dmp->socket_id == socket_id && dmp->mtu == mtu) {
>>@@ -338,6 +419,8 @@ dpdk_mp_get(int socket_id, int mtu)
>>OVS_REQUIRES(dpdk_mutex)
>>     dmp->socket_id = socket_id;
>>     dmp->mtu = mtu;
>>     dmp->refcount = 1;
>>+    mbp_priv.mbuf_data_room_size = MTU_TO_FRAME_LEN(mtu) +
>>RTE_PKTMBUF_HEADROOM;
>>+    mbp_priv.mbuf_priv_size = 0;
>>
>>     mp_size = MAX_NB_MBUF;
>>     do {
>>@@ -346,10 +429,10 @@ dpdk_mp_get(int socket_id, int mtu)
>>OVS_REQUIRES(dpdk_mutex)
>>             return NULL;
>>         }
>>
>>-        dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
>>+        dmp->mp = rte_mempool_create(mp_name, mp_size,
>>MBUF_SEGMENT_SIZE(mtu),
>>                                      MP_CACHE_SZ,
>>                                      sizeof(struct
>>rte_pktmbuf_pool_private),
>>-                                     rte_pktmbuf_pool_init, NULL,
>>+                                     ovs_rte_pktmbuf_pool_init,
>>&mbp_priv,
>>                                      ovs_rte_pktmbuf_init, NULL,
>>                                      socket_id, 0);
>>     } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >=
>>MIN_NB_MBUF);
>
>Ok, this is quite intricated. I believe the reason that OVS used its own
>__rte_pktmbuf_init() was that it wasn't possible to have custom metadata
>in mbufs before DPDK 2.1.
>
>If I'm not mistaken, with DPDK commit b507905ff407 there's a way to do that
>without copying any code from DPDK with the following incremental (from
>master)
>
>---8<---
>
>@@ -278,42 +272,12 @@ free_dpdk_buf(struct dp_packet *p)
> }
>
> static void
>-__rte_pktmbuf_init(struct rte_mempool *mp,
>-                   void *opaque_arg OVS_UNUSED,
>-                   void *_m,
>-                   unsigned i OVS_UNUSED)
>-{
>-    struct rte_mbuf *m = _m;
>-    uint32_t buf_len = mp->elt_size - sizeof(struct dp_packet);
>-
>-    RTE_MBUF_ASSERT(mp->elt_size >= sizeof(struct dp_packet));
>-
>-    memset(m, 0, mp->elt_size);
>-
>-    /* start of buffer is just after mbuf structure */
>-    m->buf_addr = (char *)m + sizeof(struct dp_packet);
>-    m->buf_physaddr = rte_mempool_virt2phy(mp, m) +
>-                    sizeof(struct dp_packet);
>-    m->buf_len = (uint16_t)buf_len;
>-
>-    /* keep some headroom between start of buffer and data */
>-    m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM, m->buf_len);
>-
>-    /* init some constant fields */
>-    m->pool = mp;
>-    m->nb_segs = 1;
>-    m->port = 0xff;
>-}
>-
>-static void
>-ovs_rte_pktmbuf_init(struct rte_mempool *mp,
>-                     void *opaque_arg OVS_UNUSED,
>-                     void *_m,
>-                     unsigned i OVS_UNUSED)
>+ovs_rte_pktmbuf_init(struct rte_mempool *mp, void *opaque_arg,
>+                     void *_m, unsigned i)
> {
>     struct rte_mbuf *m = _m;
>
>-    __rte_pktmbuf_init(mp, opaque_arg, _m, i);
>+    rte_pktmbuf_init(mp, opaque_arg, _m, i);
>
>     dp_packet_init_dpdk((struct dp_packet *) m, m->buf_len);
> }
>@@ -339,15 +303,21 @@ dpdk_mp_get(int socket_id, int mtu)
>OVS_REQUIRES(dpdk_mutex)
>
>     mp_size = MAX_NB_MBUF;
>     do {
>+        struct rte_pktmbuf_pool_private mbuf_sizes;
>+
>         if (snprintf(mp_name, RTE_MEMPOOL_NAMESIZE, "ovs_mp_%d_%d_%u",
>                      dmp->mtu, dmp->socket_id, mp_size) < 0) {
>             return NULL;
>         }
>
>+        mbuf_sizes.mbuf_priv_size = sizeof (struct dp_packet)
>+                                    - sizeof (struct rte_mbuf);
>+        mbuf_sizes.mbuf_data_room_size = MBUF_SIZE(mtu)
>+                                         - sizeof(struct dp_packet);
>         dmp->mp = rte_mempool_create(mp_name, mp_size, MBUF_SIZE(mtu),
>                                      MP_CACHE_SZ,
>                                      sizeof(struct
>rte_pktmbuf_pool_private),
>-                                     rte_pktmbuf_pool_init, NULL,
>+                                     rte_pktmbuf_pool_init, &mbuf_sizes,
>                                      ovs_rte_pktmbuf_init, NULL,
>                                      socket_id, 0);
>     } while (!dmp->mp && rte_errno == ENOMEM && (mp_size /= 2) >=
>MIN_NB_MBUF);
>
>---8<---
>
>I think this will make the patch cleaner. Do you think this will work with
>the increased MTU as well?

Thanks for this Daniele. When I implemented the code, I attempted to change as 
little of the legacy code as possible, but seeing as how I'm now doing a 
patchset, I can roll this change in.
Btw, I just tested with P2P 9k frames, and it works fine :)

>
>>@@ -433,6 +516,7 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int
>>n_rxq, int n_txq)
>> {
>>     int diag = 0;
>>     int i;
>>+    struct rte_eth_conf conf = port_conf;
>>
>>     /* A device may report more queues than it makes available (this has
>>      * been observed for Intel xl710, which reserves some of them for
>>@@ -444,7 +528,12 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev,
>>int n_rxq, int n_txq)
>>             VLOG_INFO("Retrying setup with (rxq:%d txq:%d)", n_rxq,
>>n_txq);
>>         }
>>
>>-        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq,
>>&port_conf);
>>+        if(OVS_UNLIKELY(dev->mtu > ETHER_MTU)) {
>>+            conf.rxmode.jumbo_frame = NETDEV_DPDK_JUMBO_FRAME_ENABLED;
>>+            conf.rxmode.max_rx_pkt_len = MTU_TO_FRAME_LEN(dev->mtu);
>>+        }
>>+
>>+        diag = rte_eth_dev_configure(dev->port_id, n_rxq, n_txq, &conf);
>>         if (diag) {
>>             break;
>>         }
>>@@ -586,6 +675,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int
>>port_no,
>>     struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_);
>>     int sid;
>>     int err = 0;
>>+    uint32_t buf_size;
>>
>>     ovs_mutex_init(&netdev->mutex);
>>     ovs_mutex_lock(&netdev->mutex);
>>@@ -605,10 +695,16 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned
>>int port_no,
>>     netdev->port_id = port_no;
>>     netdev->type = type;
>>     netdev->flags = 0;
>>+
>>+    /* Initialize port's MTU and frame len to the default Ethernet
>>values.
>>+     * Larger, user-specified (jumbo) frame buffers are accommodated in
>>+     * netdev_dpdk_set_config.
>>+     */
>>+    netdev->max_packet_len = ETHER_MAX_LEN;
>>     netdev->mtu = ETHER_MTU;
>>-    netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu);
>>
>>-    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu);
>>+    buf_size = dpdk_buf_size(netdev, ETHER_MAX_LEN);
>>+    netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id,
>>FRAME_LEN_TO_MTU(buf_size));
>>     if (!netdev->dpdk_mp) {
>>         err = ENOMEM;
>>         goto unlock;
>>@@ -651,6 +747,27 @@ dpdk_dev_parse_name(const char dev_name[], const
>>char prefix[],
>>     return 0;
>> }
>>
>>+static void
>>+dpdk_dev_parse_mtu(const struct smap *args, int *mtu)
>>+{
>>+    const char *mtu_str = smap_get(args, "dpdk-mtu");
>>+    char *end_ptr = NULL;
>>+    int local_mtu;
>>+
>>+    if(mtu_str) {
>>+        local_mtu = strtoul(mtu_str, &end_ptr, 0);
>>+    }
>>+    if(!mtu_str || local_mtu < ETHER_MTU ||
>>+                   local_mtu >
>>FRAME_LEN_TO_MTU(NETDEV_DPDK_MAX_FRAME_LEN) ||
>>+                   *end_ptr != '\0') {
>>+        local_mtu = ETHER_MTU;
>>+        VLOG_WARN("Invalid or missing dpdk-mtu parameter - defaulting to
>>%d.\n",
>>+                   local_mtu);
>>+    }
>>+
>>+    *mtu = local_mtu;
>>+}
>>+
>> static int
>> vhost_construct_helper(struct netdev *netdev_) OVS_REQUIRES(dpdk_mutex)
>> {
>>@@ -777,11 +894,77 @@ netdev_dpdk_get_config(const struct netdev
>>*netdev_, struct smap *args)
>>     smap_add_format(args, "configured_rx_queues", "%d", netdev_->n_rxq);
>>     smap_add_format(args, "requested_tx_queues", "%d", netdev_->n_txq);
>>     smap_add_format(args, "configured_tx_queues", "%d", dev->real_n_txq);
>>+    smap_add_format(args, "mtu", "%d", dev->mtu);
>>     ovs_mutex_unlock(&dev->mutex);
>>
>>     return 0;
>> }
>>
>>+/* Set the mtu of DPDK_DEV_ETH ports */
>>+static int
>>+netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>>+{
>>+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>+    int old_mtu, err;
>>+    uint32_t buf_size;
>>+    int dpdk_mtu;
>>+    struct dpdk_mp *old_mp;
>>+    struct dpdk_mp *mp;
>>+
>>+    ovs_mutex_lock(&dpdk_mutex);
>>+    ovs_mutex_lock(&dev->mutex);
>>+    if (dev->mtu == mtu) {
>>+        err = 0;
>>+        goto out;
>>+    }
>>+
>>+    buf_size = dpdk_buf_size(dev, MTU_TO_FRAME_LEN(mtu));
>>+    dpdk_mtu = FRAME_LEN_TO_MTU(buf_size);
>>+
>>+    mp = dpdk_mp_get(dev->socket_id, dpdk_mtu);
>>+    if (!mp) {
>>+        err = ENOMEM;
>>+        goto out;
>>+    }
>>+
>>+    rte_eth_dev_stop(dev->port_id);
>>+
>>+    old_mtu = dev->mtu;
>>+    old_mp = dev->dpdk_mp;
>>+    dev->dpdk_mp = mp;
>>+    dev->mtu = mtu;
>>+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>+
>>+    err = dpdk_eth_dev_init(dev);
>>+    if (err) {
>>+        VLOG_WARN("Unable to set MTU '%d' for '%s'; reverting to last
>>known "
>>+                  "good value '%d'\n", mtu, dev->up.name, old_mtu);
>>+        dpdk_mp_put(mp);
>>+        dev->mtu = old_mtu;
>>+        dev->dpdk_mp = old_mp;
>>+        dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>+        dpdk_eth_dev_init(dev);
>>+        goto out;
>>+    } else {
>>+        dpdk_mp_put(old_mp);
>>+        netdev_change_seq_changed(netdev);
>>+    }
>>+out:
>>+    ovs_mutex_unlock(&dev->mutex);
>>+    ovs_mutex_unlock(&dpdk_mutex);
>>+    return err;
>>+}
>>+
>>+static int
>>+netdev_dpdk_set_config(struct netdev *netdev_, const struct smap *args)
>>+{
>>+    int mtu;
>>+
>>+    dpdk_dev_parse_mtu(args, &mtu);
>>+
>>+    return netdev_dpdk_set_mtu(netdev_, mtu);
>>+}
>>+
>> static int
>> netdev_dpdk_get_numa_id(const struct netdev *netdev_)
>> {
>>@@ -1358,54 +1541,6 @@ netdev_dpdk_get_mtu(const struct netdev *netdev,
>>int *mtup)
>>
>>     return 0;
>> }
>>-
>>-static int
>>-netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
>>-{
>>-    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>-    int old_mtu, err;
>>-    struct dpdk_mp *old_mp;
>>-    struct dpdk_mp *mp;
>>-
>>-    ovs_mutex_lock(&dpdk_mutex);
>>-    ovs_mutex_lock(&dev->mutex);
>>-    if (dev->mtu == mtu) {
>>-        err = 0;
>>-        goto out;
>>-    }
>>-
>>-    mp = dpdk_mp_get(dev->socket_id, dev->mtu);
>>-    if (!mp) {
>>-        err = ENOMEM;
>>-        goto out;
>>-    }
>>-
>>-    rte_eth_dev_stop(dev->port_id);
>>-
>>-    old_mtu = dev->mtu;
>>-    old_mp = dev->dpdk_mp;
>>-    dev->dpdk_mp = mp;
>>-    dev->mtu = mtu;
>>-    dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
>>-
>>-    err = dpdk_eth_dev_init(dev);
>>-    if (err) {
>>-        dpdk_mp_put(mp);
>>-        dev->mtu = old_mtu;
>>-        dev->dpdk_mp = old_mp;
>>-        dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
>>-        dpdk_eth_dev_init(dev);
>>-        goto out;
>>-    }
>>-
>>-    dpdk_mp_put(old_mp);
>>-    netdev_change_seq_changed(netdev);
>>-out:
>>-    ovs_mutex_unlock(&dev->mutex);
>>-    ovs_mutex_unlock(&dpdk_mutex);
>>-    return err;
>>-}
>>-
>> static int
>> netdev_dpdk_get_carrier(const struct netdev *netdev_, bool *carrier);
>>
>>@@ -1682,7 +1817,7 @@ netdev_dpdk_get_status(const struct netdev
>>*netdev_, struct smap *args)
>>     smap_add_format(args, "numa_id", "%d",
>>rte_eth_dev_socket_id(dev->port_id));
>>     smap_add_format(args, "driver_name", "%s", dev_info.driver_name);
>>     smap_add_format(args, "min_rx_bufsize", "%u",
>>dev_info.min_rx_bufsize);
>>-    smap_add_format(args, "max_rx_pktlen", "%u", dev_info.max_rx_pktlen);
>>+    smap_add_format(args, "max_rx_pktlen", "%u", dev->max_packet_len);
>>     smap_add_format(args, "max_rx_queues", "%u", dev_info.max_rx_queues);
>>     smap_add_format(args, "max_tx_queues", "%u", dev_info.max_tx_queues);
>>     smap_add_format(args, "max_mac_addrs", "%u", dev_info.max_mac_addrs);
>>@@ -1904,6 +2039,51 @@ dpdk_vhost_user_class_init(void)
>>     return 0;
>> }
>>
>>+/* Set the mtu of DPDK_DEV_VHOST ports */
>>+static int
>>+netdev_dpdk_vhost_set_mtu(const struct netdev *netdev, int mtu)
>>+{
>>+    struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
>>+    int err = 0;
>>+    struct dpdk_mp *old_mp;
>>+    struct dpdk_mp *mp;
>>+
>>+    ovs_mutex_lock(&dpdk_mutex);
>>+    ovs_mutex_lock(&dev->mutex);
>>+    if (dev->mtu == mtu) {
>>+        err = 0;
>>+        goto out;
>>+    }
>>+
>>+    mp = dpdk_mp_get(dev->socket_id, mtu);
>>+    if (!mp) {
>>+        err = ENOMEM;
>>+        goto out;
>>+    }
>>+
>>+    old_mp = dev->dpdk_mp;
>>+    dev->dpdk_mp = mp;
>>+    dev->mtu = mtu;
>>+    dev->max_packet_len = MTU_TO_FRAME_LEN(dev->mtu);
>>+
>>+    dpdk_mp_put(old_mp);
>>+    netdev_change_seq_changed(netdev);
>>+out:
>>+    ovs_mutex_unlock(&dev->mutex);
>>+    ovs_mutex_unlock(&dpdk_mutex);
>>+    return err;
>>+}
>>+
>>+static int
>>+netdev_dpdk_vhost_set_config(struct netdev *netdev_, const struct smap
>>*args)
>>+{
>>+    int mtu;
>>+
>>+    dpdk_dev_parse_mtu(args, &mtu);
>>+
>>+    return netdev_dpdk_vhost_set_mtu(netdev_, mtu);
>>+}
>>+
>> static void
>> dpdk_common_init(void)
>> {
>>@@ -2040,8 +2220,9 @@ unlock_dpdk:
>>     return err;
>> }
>>
>>-#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, MULTIQ, SEND,
>>\
>>-    GET_CARRIER, GET_STATS, GET_FEATURES, GET_STATUS, RXQ_RECV)
>>\
>>+#define NETDEV_DPDK_CLASS(NAME, INIT, CONSTRUCT, DESTRUCT, SET_CONFIG, \
>>+        MULTIQ, SEND, SET_MTU, GET_CARRIER, GET_STATS, GET_FEATURES,   \
>>+        GET_STATUS, RXQ_RECV)                                          \
>> {                                                             \
>>     NAME,                                                     \
>>     INIT,                       /* init */                    \
>>@@ -2053,7 +2234,7 @@ unlock_dpdk:
>>     DESTRUCT,                                                 \
>>     netdev_dpdk_dealloc,                                      \
>>     netdev_dpdk_get_config,                                   \
>>-    NULL,                       /* netdev_dpdk_set_config */  \
>>+    SET_CONFIG,                                               \
>>     NULL,                       /* get_tunnel_config */       \
>>     NULL,                       /* build header */            \
>>     NULL,                       /* push header */             \
>>@@ -2067,7 +2248,7 @@ unlock_dpdk:
>>     netdev_dpdk_set_etheraddr,                                \
>>     netdev_dpdk_get_etheraddr,                                \
>>     netdev_dpdk_get_mtu,                                      \
>>-    netdev_dpdk_set_mtu,                                      \
>>+    SET_MTU,                                                  \
>>     netdev_dpdk_get_ifindex,                                  \
>>     GET_CARRIER,                                              \
>>     netdev_dpdk_get_carrier_resets,                           \
>>@@ -2213,8 +2394,10 @@ static const struct netdev_class dpdk_class =
>>         NULL,
>>         netdev_dpdk_construct,
>>         netdev_dpdk_destruct,
>>+        netdev_dpdk_set_config,
>>         netdev_dpdk_set_multiq,
>>         netdev_dpdk_eth_send,
>>+        netdev_dpdk_set_mtu,
>>         netdev_dpdk_get_carrier,
>>         netdev_dpdk_get_stats,
>>         netdev_dpdk_get_features,
>>@@ -2227,8 +2410,10 @@ static const struct netdev_class dpdk_ring_class =
>>         NULL,
>>         netdev_dpdk_ring_construct,
>>         netdev_dpdk_destruct,
>>+        netdev_dpdk_set_config,
>>         netdev_dpdk_set_multiq,
>>         netdev_dpdk_ring_send,
>>+        netdev_dpdk_set_mtu,
>>         netdev_dpdk_get_carrier,
>>         netdev_dpdk_get_stats,
>>         netdev_dpdk_get_features,
>>@@ -2241,8 +2426,10 @@ static const struct netdev_class OVS_UNUSED
>>dpdk_vhost_cuse_class =
>>         dpdk_vhost_cuse_class_init,
>>         netdev_dpdk_vhost_cuse_construct,
>>         netdev_dpdk_vhost_destruct,
>>+        NULL,
>>         netdev_dpdk_vhost_set_multiq,
>>         netdev_dpdk_vhost_send,
>>+        NULL,
>>         netdev_dpdk_vhost_get_carrier,
>>         netdev_dpdk_vhost_get_stats,
>>         NULL,
>>@@ -2255,8 +2442,10 @@ static const struct netdev_class OVS_UNUSED
>>dpdk_vhost_user_class =
>>         dpdk_vhost_user_class_init,
>>         netdev_dpdk_vhost_user_construct,
>>         netdev_dpdk_vhost_destruct,
>>+        netdev_dpdk_vhost_set_config,
>>         netdev_dpdk_vhost_set_multiq,
>>         netdev_dpdk_vhost_send,
>>+        netdev_dpdk_vhost_set_mtu,
>>         netdev_dpdk_vhost_get_carrier,
>>         netdev_dpdk_vhost_get_stats,
>>         NULL,
>>--
>>1.9.3
>>
>>_______________________________________________
>>dev mailing list
>>dev@openvswitch.org
>>http://openvswitch.org/mailman/listinfo/dev

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to