[dpdk-dev] [PATCH v2] vhost: dequeue zero copy should restore mbuf before return to pool
The dequeue zero copy path changes buf_addr and buf_iova of the mbuf and
returns it to the mbuf pool without restoring them. This corrupts VM memory
if anything else later allocates an mbuf from the same pool, since the mbuf
reset does not reset buf_addr and buf_iova.

Signed-off-by: Junjie Chen
---
v2 changes:
- Remove useless restore

 lib/librte_vhost/virtio_net.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/lib/librte_vhost/virtio_net.c b/lib/librte_vhost/virtio_net.c
index 568ad0e..d4f9a3a 100644
--- a/lib/librte_vhost/virtio_net.c
+++ b/lib/librte_vhost/virtio_net.c
@@ -1158,6 +1158,22 @@ mbuf_is_consumed(struct rte_mbuf *m)
 	return true;
 }
 
+static __rte_always_inline void
+restore_mbuf(struct rte_mbuf *m)
+{
+	uint32_t mbuf_size, priv_size;
+
+	while (m) {
+		priv_size = rte_pktmbuf_priv_size(m->pool);
+		mbuf_size = sizeof(struct rte_mbuf) + priv_size;
+		/* start of buffer is after mbuf structure and priv data */
+		m->buf_addr = (char *)m + mbuf_size;
+		m->buf_iova = rte_mempool_virt2iova(m) + mbuf_size;
+		m = m->next;
+	}
+}
+
 uint16_t
 rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t count)
@@ -1209,6 +1225,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
 			nr_updated += 1;
 
 			TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
+			restore_mbuf(zmbuf->mbuf);
 			rte_pktmbuf_free(zmbuf->mbuf);
 			put_zmbuf(zmbuf);
 			vq->nr_zmbuf -= 1;
-- 
2.0.1
Re: [dpdk-dev] [RFC] New packet type query API
On 01/16/2018 06:55 PM, Adrien Mazarguil wrote:
> I understand the motivation behind this proposal, however since new ideas
> must be challenged, I have a few comments:
>
> - How about making packet type recognition an optional offload
>   configurable per queue like any other (e.g. DEV_RX_OFFLOAD_PTYPE)?
>   That way the extra processing cost could be avoided for applications
>   that do not care.
>
> - Depending on HW, packet type information inside RX descriptors may not
>   necessarily fit 64-bit, or at least not without transformation. This
>   transformation would still cause wasted cycles on the PMD side.
>
> - In case enable_ptype_direct is enabled, the PMD may not waste CPU
>   cycles but the subsequent look-up with the proposed API would translate
>   to a higher cost on the application side. As a data plane API, how does
>   this benefit applications that want to retrieve packet type
>   information?
>
> - Without a dedicated mbuf flag, an application cannot tell whether
>   enclosed packet type data is in HW format. Even if present, if port
>   information is discarded or becomes invalid (e.g. mbuf stored in an
>   application queue for lengthy periods or passed as is to an unrelated
>   application), there is no way to make sense of the data.
>
> In my opinion, mbufs should only contain data fields in a standardized
> format. Managing packet types like an offload which can be toggled at
> will seems to be the best compromise. Thoughts?

+1
Re: [dpdk-dev] [PATCH] vhost: dequeue zero copy should restore mbuf before return to pool
> > dequeue zero copy changes buf_addr and buf_iova of the mbuf and
> > returns it to the mbuf pool without restoring them. This breaks VM
> > memory if others allocate an mbuf from the same pool, since the mbuf
> > reset doesn't reset buf_addr and buf_iova.
> >
> > Signed-off-by: Junjie Chen
> > ---
> >  lib/librte_vhost/virtio_net.c | 21 +
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/lib/librte_vhost/virtio_net.c
> > b/lib/librte_vhost/virtio_net.c index 568ad0e..e9aaf6d 100644
> > --- a/lib/librte_vhost/virtio_net.c
> > +++ b/lib/librte_vhost/virtio_net.c
> > @@ -1158,6 +1158,26 @@ mbuf_is_consumed(struct rte_mbuf *m)
> >  	return true;
> >  }
> >
> > +static __rte_always_inline void
> > +restore_mbuf(struct rte_mbuf *m)
> > +{
> > +	uint32_t mbuf_size, priv_size;
> > +
> > +	while (m) {
> > +		priv_size = rte_pktmbuf_priv_size(m->pool);
> > +		mbuf_size = sizeof(struct rte_mbuf) + priv_size;
> > +		/* start of buffer is after mbuf structure and priv data */
> > +		m->priv_size = priv_size;
>
> I don't think we need to restore priv_size. Refer to its definition in
> rte_mbuf: "Size of the application private data. In case of an indirect
> mbuf, it stores the direct mbuf private data size."
>
> Thanks,
> Jianfeng

You are right. I also removed the restore for data_len since it is reset
when allocating. Please see v2. Thanks.

> > +		m->buf_addr = (char *)m + mbuf_size;
> > +		m->buf_iova = rte_mempool_virt2iova(m) + mbuf_size;
> > +		m->data_off = RTE_MIN(RTE_PKTMBUF_HEADROOM,
> > +				(uint16_t)m->buf_len);
> > +		m = m->next;
> > +	}
> > +}
> > +
> > uint16_t
> > rte_vhost_dequeue_burst(int vid, uint16_t queue_id,
> > 	struct rte_mempool *mbuf_pool, struct rte_mbuf **pkts, uint16_t
> > 	count)
> > @@ -1209,6 +1229,7 @@ rte_vhost_dequeue_burst(int vid, uint16_t
> > queue_id,
> > 			nr_updated += 1;
> >
> > 			TAILQ_REMOVE(&vq->zmbuf_list, zmbuf, next);
> > +			restore_mbuf(zmbuf->mbuf);
> > 			rte_pktmbuf_free(zmbuf->mbuf);
> > 			put_zmbuf(zmbuf);
> > 			vq->nr_zmbuf -= 1;
> > --
> > 2.0.1
[dpdk-dev] [PATCH v2 1/2] net/ena: convert to new Tx offloads API
Ethdev Tx offloads API has changed since:

commit cba7f53b717d ("ethdev: introduce Tx queue offloads API")

This commit supports the new Tx offloads API. Queue configuration is stored
in ena_ring.offloads. While preparing mbufs for Tx, offloads are allowed
only if the appropriate flags in this field are set.

Signed-off-by: Rafal Kozik
---
v2:
* Check ETH_TXQ_FLAGS_IGNORE flag.
* Use PRIx64 in printf.

 drivers/net/ena/ena_ethdev.c | 74 +++-
 drivers/net/ena/ena_ethdev.h |  3 ++
 2 files changed, 62 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 22db895..54ccc3d 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -164,6 +164,14 @@ static const struct ena_stats ena_stats_ena_com_strings[] = {
 #define ENA_STATS_ARRAY_RX	ARRAY_SIZE(ena_stats_rx_strings)
 #define ENA_STATS_ARRAY_ENA_COM	ARRAY_SIZE(ena_stats_ena_com_strings)
 
+#define QUEUE_OFFLOADS (DEV_TX_OFFLOAD_TCP_CKSUM |\
+	DEV_TX_OFFLOAD_UDP_CKSUM |\
+	DEV_TX_OFFLOAD_IPV4_CKSUM |\
+	DEV_TX_OFFLOAD_TCP_TSO)
+#define MBUF_OFFLOADS (PKT_TX_L4_MASK |\
+	PKT_TX_IP_CKSUM |\
+	PKT_TX_TCP_SEG)
+
 /** Vendor ID used by Amazon devices */
 #define PCI_VENDOR_ID_AMAZON 0x1D0F
 /** Amazon devices */
@@ -227,6 +235,8 @@ static int ena_rss_reta_query(struct rte_eth_dev *dev,
 			      struct rte_eth_rss_reta_entry64 *reta_conf,
 			      uint16_t reta_size);
 static int ena_get_sset_count(struct rte_eth_dev *dev, int sset);
+static bool ena_are_tx_queue_offloads_allowed(struct ena_adapter *adapter,
+					      uint64_t offloads);
 
 static const struct eth_dev_ops ena_dev_ops = {
 	.dev_configure        = ena_dev_configure,
@@ -280,21 +290,24 @@ static inline void ena_rx_mbuf_prepare(struct rte_mbuf *mbuf,
 }
 
 static inline void ena_tx_mbuf_prepare(struct rte_mbuf *mbuf,
-				       struct ena_com_tx_ctx *ena_tx_ctx)
+				       struct ena_com_tx_ctx *ena_tx_ctx,
+				       uint64_t queue_offloads)
 {
 	struct ena_com_tx_meta *ena_meta = &ena_tx_ctx->ena_meta;
 
-	if (mbuf->ol_flags &
-	    (PKT_TX_L4_MASK | PKT_TX_IP_CKSUM | PKT_TX_TCP_SEG)) {
+	if ((mbuf->ol_flags & MBUF_OFFLOADS) &&
+	    (queue_offloads & QUEUE_OFFLOADS)) {
 		/* check if TSO is required */
-		if (mbuf->ol_flags & PKT_TX_TCP_SEG) {
+		if ((mbuf->ol_flags & PKT_TX_TCP_SEG) &&
+		    (queue_offloads & DEV_TX_OFFLOAD_TCP_TSO)) {
 			ena_tx_ctx->tso_enable = true;
 			ena_meta->l4_hdr_len = GET_L4_HDR_LEN(mbuf);
 		}
 
 		/* check if L3 checksum is needed */
-		if (mbuf->ol_flags & PKT_TX_IP_CKSUM)
+		if ((mbuf->ol_flags & PKT_TX_IP_CKSUM) &&
+		    (queue_offloads & DEV_TX_OFFLOAD_IPV4_CKSUM))
 			ena_tx_ctx->l3_csum_enable = true;
 
 		if (mbuf->ol_flags & PKT_TX_IPV6) {
@@ -310,19 +323,17 @@ static inline void ena_tx_mbuf_prepare(struct rte_mbuf *mbuf,
 		}
 
 		/* check if L4 checksum is needed */
-		switch (mbuf->ol_flags & PKT_TX_L4_MASK) {
-		case PKT_TX_TCP_CKSUM:
+		if ((mbuf->ol_flags & PKT_TX_TCP_CKSUM) &&
+		    (queue_offloads & DEV_TX_OFFLOAD_TCP_CKSUM)) {
 			ena_tx_ctx->l4_proto = ENA_ETH_IO_L4_PROTO_TCP;
 			ena_tx_ctx->l4_csum_enable = true;
-			break;
-		case PKT_TX_UDP_CKSUM:
+		} else if ((mbuf->ol_flags & PKT_TX_UDP_CKSUM) &&
+			   (queue_offloads & DEV_TX_OFFLOAD_UDP_CKSUM)) {
 			ena_tx_ctx->l4_proto = ENA_ETH_IO_L4_PROTO_UDP;
 			ena_tx_ctx->l4_csum_enable = true;
-			break;
-		default:
+		} else {
 			ena_tx_ctx->l4_proto = ENA_ETH_IO_L4_PROTO_UNKNOWN;
 			ena_tx_ctx->l4_csum_enable = false;
-			break;
 		}
 
 		ena_meta->mss = mbuf->tso_segsz;
@@ -945,7 +956,7 @@ static int ena_tx_queue_setup(struct rte_eth_dev *dev,
 			      uint16_t queue_idx,
 			      uint16_t nb_desc,
 			      __rte_unused unsigned int socket_id,
-			      __rte_unused const struct rte_eth_txconf *tx_conf)
+			      const struct rte_eth_txconf *tx_conf)
 {
 	struct ena_com_create_io_ctx ctx =
 		/* policy set to _HOST just to satisfy icc compiler */
@@ -982,6 +993,12 @@ static int ena_tx_queue_setup(struct
Re: [dpdk-dev] [PATCH v8 3/3] ring: introduce new header file to support C11 memory model
17/01/2018 05:03, Jia He:
> To support C11 memory model barrier, 2 options are suggested by Jerin:
> 1. use rte_smp_rmb
> 2. use load_acquire/store_release (refer to [1]).
> CONFIG_RTE_RING_USE_C11_MEM_MODEL is provided, and by default it is "n"
> on any architectures so far.

In previous patches, it was enabled for ARM.
You decided to not enable it at all?

> config/common_linuxapp | 2 +

It should be defined in common_base, not common_linuxapp.

> --- /dev/null
> +++ b/lib/librte_ring/rte_ring_c11_mem.h
> @@ -0,0 +1,193 @@
> +/*-
> + * Copyright(c) 2017 Intel Corporation. All rights reserved.
> + * All rights reserved.
> + *
> + * SPDX-License-Identifier: BSD-3-Clause
> + */

It is not complying with the template. Please check the license/ directory.
Why is it Intel copyright? "All rights reserved" is probably not needed.

> +/*
> + * Derived from FreeBSD's bufring.h
> + *
> + ************************************************************************
> + *
> + * Copyright (c) 2007-2009 Kip Macy km...@freebsd.org
> + * All rights reserved.
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions
> + * are met:
> + *
> + * 1. Redistributions of source code must retain the above copyright
> + *    notice, this list of conditions and the following disclaimer.
> + *
> + * 2. The name of Kip Macy nor the names of other contributors may be
> + *    used to endorse or promote products derived from this software
> + *    without specific prior written permission.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + ************************************************************************/

This double license may be an issue. Hemant, comment?
[dpdk-dev] [PATCH v2 2/2] net/ena: convert to new Rx offloads API
Ethdev Rx offloads API has changed since:

commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API")

This commit supports the new Rx offloads API.

Signed-off-by: Rafal Kozik
---
v2:
* Use PRIx64 in printf.

 drivers/net/ena/ena_ethdev.c | 36 ++--
 drivers/net/ena/ena_ethdev.h |  2 ++
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ena/ena_ethdev.c b/drivers/net/ena/ena_ethdev.c
index 54ccc3d..1dfbe39 100644
--- a/drivers/net/ena/ena_ethdev.c
+++ b/drivers/net/ena/ena_ethdev.c
@@ -237,6 +237,8 @@ static int ena_rss_reta_query(struct rte_eth_dev *dev,
 static int ena_get_sset_count(struct rte_eth_dev *dev, int sset);
 static bool ena_are_tx_queue_offloads_allowed(struct ena_adapter *adapter,
 					      uint64_t offloads);
+static bool ena_are_rx_queue_offloads_allowed(struct ena_adapter *adapter,
+					      uint64_t offloads);
 
 static const struct eth_dev_ops ena_dev_ops = {
 	.dev_configure        = ena_dev_configure,
@@ -766,7 +768,8 @@ static uint32_t ena_get_mtu_conf(struct ena_adapter *adapter)
 {
 	uint32_t max_frame_len = adapter->max_mtu;
 
-	if (adapter->rte_eth_dev_data->dev_conf.rxmode.jumbo_frame == 1)
+	if (adapter->rte_eth_dev_data->dev_conf.rxmode.offloads &
+	    DEV_RX_OFFLOAD_JUMBO_FRAME)
 		max_frame_len =
 			adapter->rte_eth_dev_data->dev_conf.rxmode.max_rx_pkt_len;
 
@@ -1066,7 +1069,7 @@ static int ena_rx_queue_setup(struct rte_eth_dev *dev,
 			      uint16_t queue_idx,
 			      uint16_t nb_desc,
 			      __rte_unused unsigned int socket_id,
-			      __rte_unused const struct rte_eth_rxconf *rx_conf,
+			      const struct rte_eth_rxconf *rx_conf,
 			      struct rte_mempool *mp)
 {
 	struct ena_com_create_io_ctx ctx =
@@ -1102,6 +1105,11 @@ static int ena_rx_queue_setup(struct rte_eth_dev *dev,
 		return -EINVAL;
 	}
 
+	if (!ena_are_rx_queue_offloads_allowed(adapter, rx_conf->offloads)) {
+		RTE_LOG(ERR, PMD, "Unsupported queue offloads\n");
+		return -EINVAL;
+	}
+
 	ena_qid = ENA_IO_RXQ_IDX(queue_idx);
 
 	ctx.qid = ena_qid;
@@ -1406,6 +1414,7 @@ static int ena_dev_configure(struct rte_eth_dev *dev)
 	struct ena_adapter *adapter =
 		(struct ena_adapter *)(dev->data->dev_private);
 	uint64_t tx_offloads = dev->data->dev_conf.txmode.offloads;
+	uint64_t rx_offloads = dev->data->dev_conf.rxmode.offloads;
 
 	if ((tx_offloads & adapter->tx_supported_offloads) != tx_offloads) {
 		RTE_LOG(ERR, PMD, "Some Tx offloads are not supported "
@@ -1414,6 +1423,13 @@ static int ena_dev_configure(struct rte_eth_dev *dev)
 		return -ENOTSUP;
 	}
 
+	if ((rx_offloads & adapter->rx_supported_offloads) != rx_offloads) {
+		RTE_LOG(ERR, PMD, "Some Rx offloads are not supported "
+		    "requested 0x%" PRIx64 " supported 0x%" PRIx64 "\n",
+		    rx_offloads, adapter->rx_supported_offloads);
+		return -ENOTSUP;
+	}
+
 	if (!(adapter->state == ENA_ADAPTER_STATE_INIT ||
 	      adapter->state == ENA_ADAPTER_STATE_STOPPED)) {
 		PMD_INIT_LOG(ERR, "Illegal adapter state: %d",
@@ -1435,6 +1451,7 @@ static int ena_dev_configure(struct rte_eth_dev *dev)
 	}
 
 	adapter->tx_selected_offloads = tx_offloads;
+	adapter->rx_selected_offloads = rx_offloads;
 	return 0;
 }
 
@@ -1476,6 +1493,19 @@ static bool ena_are_tx_queue_offloads_allowed(struct ena_adapter *adapter,
 	return true;
 }
 
+static bool ena_are_rx_queue_offloads_allowed(struct ena_adapter *adapter,
+					      uint64_t offloads)
+{
+	uint64_t port_offloads = adapter->rx_selected_offloads;
+
+	/* Check if port supports all requested offloads.
+	 * True if all offloads selected for queue are set for port.
+	 */
+	if ((offloads & port_offloads) != offloads)
+		return false;
+	return true;
+}
+
 static void ena_infos_get(struct rte_eth_dev *dev,
 			  struct rte_eth_dev_info *dev_info)
 {
@@ -1530,6 +1560,7 @@ static void ena_infos_get(struct rte_eth_dev *dev,
 	/* Inform framework about available features */
 	dev_info->rx_offload_capa = rx_feat;
+	dev_info->rx_queue_offload_capa = rx_feat;
 	dev_info->tx_offload_capa = tx_feat;
 	dev_info->tx_queue_offload_capa = tx_feat;
 
@@ -1542,6 +1573,7 @@ static void ena_infos_get(struct rte_eth_dev *dev,
 	dev_info->reta_size = ENA_RX_RSS_TABLE_SIZE;
 
 	adapter->tx_supported_offloads = tx_feat;
+	adapter->rx_supported_offloads = rx_feat;
 }
 
 static uint16_t eth_ena_recv_pk
[dpdk-dev] [PATCH v2 4/6] test: fix memory leak in ring perf autotest
Fixes: ac3fb3019c52 ("app: rework ring tests")
Cc: sta...@dpdk.org

Signed-off-by: Anatoly Burakov
---
 test/test/test_ring_perf.c | 36
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c
index 4363e4d..ebb3939 100644
--- a/test/test/test_ring_perf.c
+++ b/test/test/test_ring_perf.c
@@ -32,9 +32,6 @@
  */
 static const volatile unsigned bulk_sizes[] = { 8, 32 };
 
-/* The ring structure used for tests */
-static struct rte_ring *r;
-
 struct lcore_pair {
 	unsigned c1, c2;
 };
@@ -115,7 +112,7 @@ get_two_sockets(struct lcore_pair *lcp)
 /* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */
 static void
-test_empty_dequeue(void)
+test_empty_dequeue(struct rte_ring *r)
 {
 	const unsigned iter_shift = 26;
 	const unsigned iterations = 1 << iter_shift;
 	unsigned i;
 	void *burst[MAX_BURST] = {0};
@@ -192,6 +191,7 @@ dequeue_bulk(void *p)
 	const unsigned iter_shift = 23;
 	const unsigned iterations = 1 << iter_shift;
+	struct rte_ring *r = params->r;
 	unsigned i;
 	void *burst[MAX_BURST] = {0};
@@ -222,7 +222,7 @@ dequeue_bulk(void *p)
  * used to measure ring perf between hyperthreads, cores and sockets.
  */
 static void
-run_on_core_pair(struct lcore_pair *cores,
+run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r,
 		 lcore_function_t f1, lcore_function_t f2)
 {
 	struct thread_params param1 = {0}, param2 = {0};
@@ -230,6 +230,7 @@ run_on_core_pair(struct lcore_pair *cores,
 	for (i = 0; i < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); i++) {
 		lcore_count = 0;
 		param1.size = param2.size = bulk_sizes[i];
+		param1.r = param2.r = r;
 		if (cores->c1 == rte_get_master_lcore()) {
 			rte_eal_remote_launch(f2, &param2, cores->c2);
 			f1(&param1);
@@ -252,7 +253,7 @@ run_on_core_pair(struct lcore_pair *cores,
  * takes on a single lcore. Result is for comparison with the bulk enq+deq.
  */
 static void
-test_single_enqueue_dequeue(void)
+test_single_enqueue_dequeue(struct rte_ring *r)
 {
 	const unsigned iter_shift = 24;
 	const unsigned iterations = 1 <
[dpdk-dev] [PATCH v2 2/6] test: fix memory leak in reorder autotest
Add a teardown function that frees allocated resources.

Fixes: d0c9b58d7156 ("app/test: new reorder unit test")
Cc: sergio.gonzalez.mon...@intel.com
Cc: reshma.pat...@intek.com
Cc: sta...@dpdk.org

Signed-off-by: Anatoly Burakov
---
 test/test/test_reorder.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/test/test/test_reorder.c b/test/test/test_reorder.c
index e834bac..65e4f38 100644
--- a/test/test/test_reorder.c
+++ b/test/test/test_reorder.c
@@ -331,9 +331,20 @@ test_setup(void)
 	return 0;
 }
 
+static void
+test_teardown(void)
+{
+	rte_reorder_free(test_params->b);
+	test_params->b = NULL;
+	rte_mempool_free(test_params->p);
+	test_params->p = NULL;
+}
+
 static struct unit_test_suite reorder_test_suite  = {
 	.setup = test_setup,
+	.teardown = test_teardown,
 	.suite_name = "Reorder Unit Test Suite",
 	.unit_test_cases = {
 		TEST_CASE(test_reorder_create),
-- 
2.7.4
[dpdk-dev] [PATCH v2 6/6] test: fix memory leak in timer perf autotest
Fixes: 277afaf3dbcb ("app/test: add timer_perf")
Cc: sta...@dpdk.org

Signed-off-by: Anatoly Burakov
---
 test/test/test_timer_perf.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/test/test/test_timer_perf.c b/test/test/test_timer_perf.c
index afa3a06..0fe2b74 100644
--- a/test/test/test_timer_perf.c
+++ b/test/test/test_timer_perf.c
@@ -127,6 +127,7 @@ test_timer_perf(void)
 	printf("Time per rte_timer_manage with zero callbacks: %"PRIu64" cycles\n",
 			(end_tsc - start_tsc + iterations/2) / iterations);
 
+	rte_free(tms);
 	return 0;
 }
-- 
2.7.4
[dpdk-dev] [PATCH v2 1/6] test: fix memory leak in bitmap test
Acked-by: Cristian Dumitrescu

Fixes: c7e4a134e769 ("test: verify bitmap operations")
Cc: pbhagavat...@caviumnetworks.com

Signed-off-by: Anatoly Burakov
---
 test/test/test_bitmap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/test/test/test_bitmap.c b/test/test/test_bitmap.c
index 05d547e..c3169e9 100644
--- a/test/test/test_bitmap.c
+++ b/test/test/test_bitmap.c
@@ -158,6 +158,9 @@ test_bitmap(void)
 	if (test_bitmap_scan_operations(bmp) < 0)
 		return TEST_FAILED;
 
+	rte_bitmap_free(bmp);
+	rte_free(mem);
+
 	return TEST_SUCCESS;
 }
-- 
2.7.4
[dpdk-dev] [PATCH v2 5/6] test: fix memory leak in table autotest
Always deallocate allocated resources after the test is done.

Acked-by: Cristian Dumitrescu

Fixes: 5205954791cb ("app/test: packet framework unit tests")
Cc: cristian.dumitre...@intel.com
Cc: sta...@dpdk.org

Signed-off-by: Anatoly Burakov
---
 test/test/test_table.c | 44
 1 file changed, 28 insertions(+), 16 deletions(-)

diff --git a/test/test/test_table.c b/test/test/test_table.c
index c9268a3..f01652d 100644
--- a/test/test/test_table.c
+++ b/test/test/test_table.c
@@ -55,6 +55,14 @@ uint64_t pipeline_test_hash(void *key,
 }
 
 static void
+app_free_resources(void) {
+	int i;
+	for (i = 0; i < N_PORTS; i++)
+		rte_ring_free(rings_rx[i]);
+	rte_mempool_free(pool);
+}
+
+static void
 app_init_mbuf_pools(void)
 {
 	/* Init the buffer pool */
@@ -113,18 +121,20 @@ app_init_rings(void)
 static int
 test_table(void)
 {
-	int status, failures;
+	int status, ret;
 	unsigned i;
 
-	failures = 0;
+	ret = TEST_SUCCESS;
 
 	app_init_rings();
 	app_init_mbuf_pools();
 
 	printf("\n\n\n\nPipeline tests\n");
-	if (test_table_pipeline() < 0)
-		return -1;
+	if (test_table_pipeline() < 0) {
+		ret = TEST_FAILED;
+		goto end;
+	}
 
 	printf("\n\n\n\nPort tests\n");
 	for (i = 0; i < n_port_tests; i++) {
@@ -132,8 +142,8 @@ test_table(void)
 		if (status < 0) {
 			printf("\nPort test number %d failed (%d).\n", i,
 				status);
-			failures++;
-			return -1;
+			ret = TEST_FAILED;
+			goto end;
 		}
 	}
 
@@ -143,8 +153,8 @@ test_table(void)
 		if (status < 0) {
 			printf("\nTable test number %d failed (%d).\n", i,
 				status);
-			failures++;
-			return -1;
+			ret = TEST_FAILED;
+			goto end;
 		}
 	}
 
@@ -154,21 +164,23 @@ test_table(void)
 		if (status < 0) {
 			printf("\nCombined table test number %d failed with "
 				"reason number %d.\n", i, status);
-			failures++;
-			return -1;
+			ret = TEST_FAILED;
+			goto end;
 		}
 	}
 
-	if (failures)
-		return -1;
-
 #ifdef RTE_LIBRTE_ACL
 	printf("\n\n\n\nACL tests\n");
-	if (test_table_acl() < 0)
-		return -1;
+	if (test_table_acl() < 0) {
+		ret = TEST_FAILED;
+		goto end;
+	}
 #endif
 
-	return 0;
+end:
+	app_free_resources();
+
+	return ret;
 }

REGISTER_TEST_COMMAND(table_autotest, test_table);
-- 
2.7.4
[dpdk-dev] [PATCH v2 3/6] test: fix memory leak in ring autotest
Get rid of the global static ring variable and don't reuse rings between
test runs.

Fixes: af75078fece3 ("first public release")
Cc: sta...@dpdk.org

Signed-off-by: Anatoly Burakov
---
 test/test/test_ring.c | 61
 1 file changed, 33 insertions(+), 28 deletions(-)

diff --git a/test/test/test_ring.c b/test/test/test_ring.c
index 19d497a..b339f04 100644
--- a/test/test/test_ring.c
+++ b/test/test/test_ring.c
@@ -57,8 +57,6 @@

 static rte_atomic32_t synchro;
 
-static struct rte_ring *r;
-
 #define	TEST_RING_VERIFY(exp) \
 	if (!(exp)) { \
 		printf("error at %s:%d\tcondition " #exp " failed\n", \
@@ -73,7 +71,7 @@ static struct rte_ring *r;
  * helper routine for test_ring_basic
  */
 static int
-test_ring_basic_full_empty(void * const src[], void *dst[])
+test_ring_basic_full_empty(struct rte_ring *r, void * const src[], void *dst[])
 {
 	unsigned i, rand;
 	const unsigned rsz = RING_SIZE - 1;
@@ -114,7 +112,7 @@ test_ring_basic_full_empty(void * const src[], void *dst[])
 }
 
 static int
-test_ring_basic(void)
+test_ring_basic(struct rte_ring *r)
 {
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
 	int ret;
@@ -250,7 +248,7 @@ test_ring_basic(void)
 		goto fail;
 	}
 
-	if (test_ring_basic_full_empty(src, dst) != 0)
+	if (test_ring_basic_full_empty(r, src, dst) != 0)
 		goto fail;
 
 	cur_src = src;
@@ -317,7 +315,7 @@ test_ring_basic(void)
 }
 
 static int
-test_ring_burst_basic(void)
+test_ring_burst_basic(struct rte_ring *r)
 {
 	void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL;
 	int ret;
@@ -731,6 +729,7 @@ test_ring_basic_ex(void)
 	ret = 0;
 
 fail_test:
+	rte_ring_free(rp);
 	if (obj != NULL)
 		rte_free(obj);
 
@@ -811,61 +810,67 @@ test_ring_with_exact_size(void)
 static int
 test_ring(void)
 {
+	struct rte_ring *r = NULL;
+
 	/* some more basic operations */
 	if (test_ring_basic_ex() < 0)
-		return -1;
+		goto test_fail;
 
 	rte_atomic32_init(&synchro);
 
+	r = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0);
 	if (r == NULL)
-		r = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0);
-	if (r == NULL)
-		return -1;
+		goto test_fail;
 
 	/* retrieve the ring from its name */
 	if (rte_ring_lookup("test") != r) {
 		printf("Cannot lookup ring from its name\n");
-		return -1;
+		goto test_fail;
 	}
 
 	/* burst operations */
-	if (test_ring_burst_basic() < 0)
-		return -1;
+	if (test_ring_burst_basic(r) < 0)
+		goto test_fail;
 
 	/* basic operations */
-	if (test_ring_basic() < 0)
-		return -1;
+	if (test_ring_basic(r) < 0)
+		goto test_fail;
 
 	/* basic operations */
 	if ( test_create_count_odd() < 0){
-		printf ("Test failed to detect odd count\n");
-		return -1;
-	}
-	else
-		printf ( "Test detected odd count\n");
+		printf("Test failed to detect odd count\n");
+		goto test_fail;
+	} else
+		printf("Test detected odd count\n");
 
 	if ( test_lookup_null() < 0){
-		printf ("Test failed to detect NULL ring lookup\n");
-		return -1;
-	}
-	else
-		printf ( "Test detected NULL ring lookup \n");
+		printf("Test failed to detect NULL ring lookup\n");
+		goto test_fail;
+	} else
+		printf("Test detected NULL ring lookup\n");
 
 	/* test of creating ring with wrong size */
 	if (test_ring_creation_with_wrong_size() < 0)
-		return -1;
+		goto test_fail;
 
 	/* test of creation ring with an used name */
 	if (test_ring_creation_with_an_used_name() < 0)
-		return -1;
+		goto test_fail;
 
 	if (test_ring_with_exact_size() < 0)
-		return -1;
+		goto test_fail;
 
 	/* dump the ring status */
 	rte_ring_list_dump(stdout);
 
+	rte_ring_free(r);
+
 	return 0;
+
+test_fail:
+	rte_ring_free(r);
+
+	return -1;
 }
 
 REGISTER_TEST_COMMAND(ring_autotest, test_ring);
-- 
2.7.4
Re: [dpdk-dev] [PATCH v3 3/8] net/failsafe: support probed sub-devices getting
Hi Gaetan

From: Gaëtan Rivet, Wednesday, January 17, 2018 12:31 AM
> Hi Matan,
>
> On Tue, Jan 16, 2018 at 05:20:27PM +0000, Matan Azrad wrote:
> > Hi Gaetan
> >
> > > > > In 18.05, or 18.08 there should be an EAL function that would be
> > > > > able to identify a device given a specific ID string (very close
> > > > > to an rte_devargs). Currently, this API does not exist.
> > > > >
> > > > > You can hack your way around this for the moment, IF you really,
> > > > > really want: parse your devargs, get the bus, use the
> > > > > bus->parse() function to get a binary device representation, and
> > > > > compare byte per byte the binary representation given by your
> > > > > devargs and by the device->name.
> > > > >
> > > > > But this is a hack, and a pretty ugly one at that: you have no
> > > > > way of knowing the size taken by this binary representation, so
> > > > > you can restrict yourself to the vdev and PCI bus for the moment
> > > > > and take the larger of an rte_vdev_driver pointer and an
> > > > > rte_pci_addr:
> > > > >
> > > > > {
> > > > > 	union {
> > > > > 		rte_vdev_driver *drv;
> > > > > 		struct rte_pci_addr pci_addr;
> > > > > 	} bindev1, bindev2;
> > > > > 	memset(&bindev1, 0, sizeof(bindev1));
> > > > > 	memset(&bindev2, 0, sizeof(bindev2));
> > > > > 	rte_eal_devargs_parse(device->name, da1);
> > > > > 	rte_eal_devargs_parse(your_devstr, da2);
> > > > > 	RTE_ASSERT(da1->bus == rte_bus_find_by_name("pci") ||
> > > > > 		   da1->bus == rte_bus_find_by_name("vdev"));
> > > > > 	RTE_ASSERT(da2->bus == rte_bus_find_by_name("pci") ||
> > > > > 		   da2->bus == rte_bus_find_by_name("vdev"));
> > > > > 	da1->bus->parse(da1->name, &bindev1);
> > > > > 	da1->bus->parse(da2->name, &bindev2);
> > > > > 	if (memcmp(&bindev1, &bindev2, sizeof(bindev1)) == 0) {
> > > > > 		/* found the device */
> > > > > 	} else {
> > > > > 		/* not found */
> > > > > 	}
> > > > > }
> > > > >
> > > > > So, really, really ugly. Anyway.
> > > >
> > > > Yes, ugly :) Thanks for this update!
> > > > Will keep the comparison by device->name.
> > >
> > > Well, as explained above, the comparison by device->name only works
> > > with whitelisted devices.
> > >
> > > So either implement something broken right now that you will need to
> > > update in 18.05, or implement it properly in 18.05 from the get go.
> >
> > For the current needs it is enough.
> > We can also say that it is the user's responsibility to pass to
> > failsafe the same names and same args as he passes to EAL (or default
> > EAL names). I think I emphasized it in documentation.
>
> Okay, as you wish. Just be aware of this limitation.
>
> I think this functionality is good and useful, but it needs to be made
> clean. The proper function should be available soon, then this
> implementation should be cleaned up.

Sure.

> > > > > > > > +	/* Take control of device probed by EAL options. */
> > > > > > > > +	DEBUG("Taking control of a probed sub device"
> > > > > > > > +	      " %d named %s", i, da->name);
> > > > > > >
> > > > > > > In this case, the devargs of the probed device must be
> > > > > > > copied within the sub-device definition and removed from the
> > > > > > > EAL using the proper rte_devargs API.
> > > > > > >
> > > > > > > Note that there is no rte_devargs copy function. You can use
> > > > > > > rte_devargs_parse instead, "parsing" again the original
> > > > > > > devargs into the sub-device one. It is necessary for
> > > > > > > complying with internal rte_devargs requirements (da->args
> > > > > > > being malloc-ed, at the moment, but may evolve).
> > > > > > >
> > > > > > > The rte_eal_devargs_parse function is not easy enough to use
> > > > > > > right now, you will have to build a devargs string (using
> > > > > > > snprintf) and submit it. I proposed a change this release
> > > > > > > for it but it will not make it for 18.02; that would have
> > > > > > > simplified your implementation.
> > > > > >
> > > > > > Got you. You're right, we need to remove the created devargs
> > > > > > at the fail-safe parse level.
> > > > > > What do you think about checking it at the parse level and
> > > > > > avoiding the new devargs creation?
> > > > > > Also about doing the copy at the parse level (same method as
> > > > > > we are doing at the probe level)?
> > > > >
> > > > > Not sure I follow here, but the new rte_devargs is part of the
> > > > > sub-device (it is not a pointer, but allocated alongside the
> > > > > sub_device).
> > > > >
> > > > > So keep everything here, it is the
Re: [dpdk-dev] [PATCH v2 3/6] test: fix memory leak in ring autotest
On 17-Jan-18 8:36 AM, Anatoly Burakov wrote:
> Get rid of global static ring variable and don't reuse rings between
> test runs.
>
> Fixes: af75078fece3 ("first public release")
> Cc: sta...@dpdk.org
>
> Signed-off-by: Anatoly Burakov
> ---
>  test/test/test_ring.c | 61
>  1 file changed, 33 insertions(+), 28 deletions(-)
[snip]

Apologies, for some reason the old and incorrect commit message was
attached. I'll respin it.

-- 
Thanks,
Anatoly
Re: [dpdk-dev] [PATCH v8 3/3] ring: introduce new header file to support C11 memory model
Hi Thomas On 1/17/2018 4:24 PM, Thomas Monjalon Wrote: 17/01/2018 05:03, Jia He: To support C11 memory model barrier, 2 options are suggested by Jerin: 1. use rte_smp_rmb 2. use load_acquire/store_release(refer to [1]). CONFIG_RTE_RING_USE_C11_MEM_MODEL is provided, and by default it is "n" on any architectures so far. In previous patches, it was enabled for ARM. You decided to not enable it at all? Sorry, maybe I misunderstand your previous mail. >This config option should be added in the common file (as disabled). Do you mean CONFIG_RTE_RING_USE_C11_MEM_MODEL=n in comm_base and "y" in armv8 config? Cheers, Jia config/common_linuxapp | 2 + It should be defined in common_base, not common_linuxapp. --- /dev/null +++ b/lib/librte_ring/rte_ring_c11_mem.h @@ -0,0 +1,193 @@ +/*- + * Copyright(c) 2017 Intel Corporation. All rights reserved. + * All rights reserved. + * + * SPDX-License-Identifier: BSD-3-Clause + */ It is not complying with the template. Please check the license/ directory. Why is it Intel Copyright? "All rights reserved" is probably not needed. +/* + * Derived from FreeBSD's bufring.h + * + ** + * + * Copyright (c) 2007-2009 Kip Macy km...@freebsd.org + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * + * 1. Redistributions of source code must retain the above copyright notice, + *this list of conditions and the following disclaimer. + * + * 2. The name of Kip Macy nor the names of other + *contributors may be used to endorse or promote products derived from + *this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" + * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE + * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + * + ***/ This double license may be an issue. Hemant, comment? -- Cheers, Jia
Re: [dpdk-dev] [PATCH v2] vfio: noiommu check error handling
On Tue, 16 Jan 2018 18:01:32 +0100 Maxime Coquelin wrote: Hi Jonas, On 01/16/2018 05:08 PM, Jonas Pfefferle wrote: On Mon, 15 Jan 2018 17:11:58 +0100 Thomas Monjalon wrote: 15/01/2018 13:22, Jonas Pfefferle: On Sat, 13 Jan 2018 23:49:30 +0100 Thomas Monjalon wrote: > 13/01/2018 13:15, Burakov, Anatoly: >> On 11-Jan-18 11:45 PM, Thomas Monjalon wrote: >> > 07/11/2017 10:50, Jonas Pfefferle1: >> >>> Is there something urgent for 17.11? >> >>> Or can it be refined in 18.02? >> >> >> >> Nothing urgent. We can refine this for 18.02. >> >> >> >>> Anatoly, any thought? >> > >> > Anatoly, Jonas, how do you want to proceed with this patch? >> > >> >> I don't see anything to be refined here, it's a simple bug fix - >>code >> assumes noiommu mode support is always available, when it might not >>be >> the case on older kernels. > > As a bug fix, the title must start with "fix" and a tag "Fixes:" > must be added to help with backport. > At the same time, the explanation of the bug must be added in > the commit log please. > > Thanks It's not really a bug fix since it does not change the semantic of the function but just adds nicer error handling. Regarding redefining the code: What I don't like is the special cases we have to check for when using the sPAPR iommu because it does not support VA mappings yet. I think we should decide which iova mode to use based on the iommu types available, i.e. each iommu type should report which iova type it supports. Thoughts? Have you looked at what Maxime did? https://dpdk.org/dev/patchwork/patch/33650/ How does it affect this patch? IMO it has the same problem. We shouldn't add more exception cases in drivers/bus/pci/linux/pci.c but instead keep all the information about what an IOMMU can do in lib/librte_eal/linuxapp/eal/eal_vfio.c I agree adding an exception in drivers/bus/pci/linux/pci.c isn't great, but we need first to fix a regression introduced in v17.11 LTS, and IMHO, we cannot do a big rework as the fix is to be backported. 
Once fixed, I agree we should work on a refactoring. I don't know if eal_vfio is the right place though, as in my case for example I cannot get the information needed through vfio ioctl(). Out of curiosity, what prevents sPAPR to use VA mode for now? Maxime Sounds good to me. The current sPAPR Linux driver cannot use virtual addresses because the DMA window size is restricted to RAM size and always starts at 0. This is not a hardware restriction and we are working on allowing to create arbitrary size windows. Jonas
Re: [dpdk-dev] [PATCH v2 6/6] app/testpmd: adjust ethdev port ownership
Hi Lu From: Lu, Wenzhuo, Wednesday, January 17, 2018 2:47 AM > Hi Matan, > > > -Original Message- > > From: Matan Azrad [mailto:ma...@mellanox.com] > > Sent: Tuesday, January 16, 2018 4:16 PM > > To: Lu, Wenzhuo ; Thomas Monjalon > > ; Gaetan Rivet ; Wu, > > Jingjing > > Cc: dev@dpdk.org; Neil Horman ; Richardson, > > Bruce ; Ananyev, Konstantin > > > > Subject: RE: [dpdk-dev] [PATCH v2 6/6] app/testpmd: adjust ethdev port > > ownership > > > > Hi Lu > > From: Lu, Wenzhuo, Tuesday, January 16, 2018 7:54 AM > > > Hi Matan, > > > > > > > > > > -Original Message- > > > > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Matan Azrad > > > > Sent: Sunday, January 7, 2018 5:46 PM > > > > To: Thomas Monjalon ; Gaetan Rivet > > > > ; Wu, Jingjing > > > > Cc: dev@dpdk.org; Neil Horman ; > Richardson, > > > > Bruce ; Ananyev, Konstantin > > > > > > > > Subject: [dpdk-dev] [PATCH v2 6/6] app/testpmd: adjust ethdev port > > > > ownership > > > > > > > > Testpmd should not use ethdev ports which are managed by other > > > > DPDK entities. > > > > > > > > Set Testpmd ownership to each port which is not used by other > > > > entity and prevent any usage of ethdev ports which are not owned by > Testpmd. > > > Sorry I don't follow all the discussion as there's too much. So it > > > may be a silly question. > > > > No problem, I'm here for any question :) > > > > > Testpmd already has the parameter " --pci-whitelist" to only use the > > > assigned devices. > > > > It is an EAL parameter. No? just say to EAL which devices to create.. > > > > > When using this parameter, all the devices are owned by the current > > > APP. > > > > No, what's about vdev? vdevs may manage devices(even whitlist PCI > > devices) by themselves and want to prevent any app to use these > > devices(see fail- safe PMD). > I'm not an expert of EAL and vdev. Suppose this would be discussed in other > patches. > I don't want to bother you again here as testpmd is only used to show the > result. 
> So I think if this patch is needed just depends on if other patches are > accepted :) > Sure! > > > > > So I don't know why need to set/check the ownership. > > > BTW, in this patch, seem all the change is for ownership checking. I > > > don't find the setting code. Do I miss something? > > > > Yes, see in main function (the first FOREACH). > I think you mean this change, > > @@ -2394,7 +2406,12 @@ uint8_t port_is_bonding_slave(portid_t > slave_pid) > rte_pdump_init(NULL); > #endif > > - nb_ports = (portid_t) rte_eth_dev_count(); > + if (rte_eth_dev_owner_new(&my_owner.id)) > + rte_panic("Failed to get unique owner identifier\n"); > + snprintf(my_owner.name, sizeof(my_owner.name), > TESTPMD_OWNER_NAME); > + RTE_ETH_FOREACH_DEV_OWNED_BY(port_id, > RTE_ETH_DEV_NO_OWNER) > + if (rte_eth_dev_owner_set(port_id, &my_owner) == 0) > + nb_ports++; > if (nb_ports == 0) > RTE_LOG(WARNING, EAL, "No probed ethernet devices\n"); Yes. > But I thought about some code to assign a specific device to a specific APP > explicitly. > This code looks like just occupying the devices with no owner. So, it means > the first APP will occupy all the devices? It makes me confused as I don't see > the benefit or the difference than before. Remember that in EAL init some drivers may create ports and take ownership of them, so this code owns all the ownerless ports. So, ports which are already owned by another DPDK entity will not success to take ownership here. Since Testpmd wanted to manage all the ports before this feature( by mistake ), I guess this is the right behavior now, just use the ports which are not used. > > > > > Thanks, Matan.
[dpdk-dev] [PATCH v2] mempool/dpaa: optimize phy to virt conversion
If the allocation is from a single memzone, optimize the phy-virt address conversions. Signed-off-by: Hemant Agrawal --- v2: use register memory area instead of new flag in mempool drivers/mempool/dpaa/dpaa_mempool.c | 50 ++--- drivers/mempool/dpaa/dpaa_mempool.h | 13 ++ 2 files changed, 60 insertions(+), 3 deletions(-) diff --git a/drivers/mempool/dpaa/dpaa_mempool.c b/drivers/mempool/dpaa/dpaa_mempool.c index ffb81c2..ddc4e47 100644 --- a/drivers/mempool/dpaa/dpaa_mempool.c +++ b/drivers/mempool/dpaa/dpaa_mempool.c @@ -73,6 +73,8 @@ dpaa_mbuf_create_pool(struct rte_mempool *mp) rte_dpaa_bpid_info[bpid].meta_data_size = sizeof(struct rte_mbuf) + rte_pktmbuf_priv_size(mp); rte_dpaa_bpid_info[bpid].dpaa_ops_index = mp->ops_index; + rte_dpaa_bpid_info[bpid].ptov_off = 0; + rte_dpaa_bpid_info[bpid].flags = 0; bp_info = rte_malloc(NULL, sizeof(struct dpaa_bp_info), @@ -145,9 +147,20 @@ dpaa_mbuf_free_bulk(struct rte_mempool *pool, } while (i < n) { + uint64_t phy = rte_mempool_virt2iova(obj_table[i]); + + if (unlikely(!bp_info->ptov_off)) { + /* buffers are not from multiple memzones */ + if (!(bp_info->flags & DPAA_MPOOL_MULTI_MEMZONE)) { + bp_info->ptov_off + = (uint64_t)obj_table[i] - phy; + rte_dpaa_bpid_info[bp_info->bpid].ptov_off + = bp_info->ptov_off; + } + } + dpaa_buf_free(bp_info, - (uint64_t)rte_mempool_virt2iova(obj_table[i]) + - bp_info->meta_data_size); + (uint64_t)phy + bp_info->meta_data_size); i = i + 1; } @@ -215,7 +228,7 @@ dpaa_mbuf_alloc_bulk(struct rte_mempool *pool, * i.e. first buffer is valid, remaining 6 buffers * may be null. 
*/ - bufaddr = (void *)rte_dpaa_mem_ptov(bufs[i].addr); + bufaddr = DPAA_MEMPOOL_PTOV(bp_info, bufs[i].addr); m[n] = (struct rte_mbuf *)((char *)bufaddr - bp_info->meta_data_size); DPAA_MEMPOOL_DPDEBUG("Paddr (%p), FD (%p) from BMAN", @@ -246,6 +259,36 @@ dpaa_mbuf_get_count(const struct rte_mempool *mp) return bman_query_free_buffers(bp_info->bp); } +static int +dpaa_register_memory_area(const struct rte_mempool *mp, + char *vaddr __rte_unused, + rte_iova_t paddr __rte_unused, + size_t len) +{ + struct dpaa_bp_info *bp_info; + unsigned int total_elt_sz; + + MEMPOOL_INIT_FUNC_TRACE(); + + if (!mp || !mp->pool_data) { + DPAA_MEMPOOL_ERR("Invalid mempool provided\n"); + return 0; + } + + bp_info = DPAA_MEMPOOL_TO_POOL_INFO(mp); + total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size; + + DPAA_MEMPOOL_DEBUG("Req size %lu vs Available %u\n", + len, total_elt_sz * mp->size); + + /* Detect pool area has sufficient space for elements in this memzone */ + if (len < total_elt_sz * mp->size) + /* Else, Memory will be allocated from multiple memzones */ + bp_info->flags |= DPAA_MPOOL_MULTI_MEMZONE; + + return 0; +} + struct rte_mempool_ops dpaa_mpool_ops = { .name = "dpaa", .alloc = dpaa_mbuf_create_pool, @@ -253,6 +296,7 @@ struct rte_mempool_ops dpaa_mpool_ops = { .enqueue = dpaa_mbuf_free_bulk, .dequeue = dpaa_mbuf_alloc_bulk, .get_count = dpaa_mbuf_get_count, + .register_memory_area = dpaa_register_memory_area, }; MEMPOOL_REGISTER_OPS(dpaa_mpool_ops); diff --git a/drivers/mempool/dpaa/dpaa_mempool.h b/drivers/mempool/dpaa/dpaa_mempool.h index 91da62f..02aa513 100644 --- a/drivers/mempool/dpaa/dpaa_mempool.h +++ b/drivers/mempool/dpaa/dpaa_mempool.h @@ -28,6 +28,9 @@ /* Maximum release/acquire from BMAN */ #define DPAA_MBUF_MAX_ACQ_REL 8 +/* Buffers are allocated from multiple memzones i.e. 
non phys contiguous */ +#define DPAA_MPOOL_MULTI_MEMZONE 0x01 + struct dpaa_bp_info { struct rte_mempool *mp; struct bman_pool *bp; @@ -35,8 +38,18 @@ struct dpaa_bp_info { uint32_t size; uint32_t meta_data_size; int32_t dpaa_ops_index; + int64_t ptov_off; + uint8_t flags; }; +static inline void * +DPAA_MEMPOOL_PTOV(struct dpaa_bp_info *bp_info, uint64_t addr) +{ + if (bp_info->ptov_off) + return ((void *)(addr + bp_info->ptov_off)); + return rte_dpaa_mem_ptov(addr); +} +
[dpdk-dev] [PATCH] net/i40e: fix link_state update for i40e_ethdev_vf drv
The check for bool was accounting for unwanted bits in the calculation of the truth value Signed-off-by: Tushar Mulkar --- drivers/net/i40e/i40e_ethdev_vf.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/i40e/i40e_ethdev_vf.c b/drivers/net/i40e/i40e_ethdev_vf.c index b96d77a0c..9c14ea278 100644 --- a/drivers/net/i40e/i40e_ethdev_vf.c +++ b/drivers/net/i40e/i40e_ethdev_vf.c @@ -2095,7 +2095,7 @@ i40evf_dev_link_update(struct rte_eth_dev *dev, } /* full duplex only */ new_link.link_duplex = ETH_LINK_FULL_DUPLEX; - new_link.link_status = vf->link_up ? ETH_LINK_UP : + new_link.link_status = (vf->link_up & true) ? ETH_LINK_UP : ETH_LINK_DOWN; new_link.link_autoneg = dev->data->dev_conf.link_speeds & ETH_LINK_SPEED_FIXED; -- 2.11.0
Re: [dpdk-dev] [PATCH v2] vfio: noiommu check error handling
On 01/17/2018 09:48 AM, Jonas Pfefferle wrote: On Tue, 16 Jan 2018 18:01:32 +0100 Maxime Coquelin wrote: Hi Jonas, On 01/16/2018 05:08 PM, Jonas Pfefferle wrote: On Mon, 15 Jan 2018 17:11:58 +0100 Thomas Monjalon wrote: 15/01/2018 13:22, Jonas Pfefferle: On Sat, 13 Jan 2018 23:49:30 +0100 Thomas Monjalon wrote: > 13/01/2018 13:15, Burakov, Anatoly: >> On 11-Jan-18 11:45 PM, Thomas Monjalon wrote: >> > 07/11/2017 10:50, Jonas Pfefferle1: >> >>> Is there something urgent for 17.11? >> >>> Or can it be refined in 18.02? >> >> >> >> Nothing urgent. We can refine this for 18.02. >> >> >> >>> Anatoly, any thought? >> > >> > Anatoly, Jonas, how do you want to proceed with this patch? >> > >> >> I don't see anything to be refined here, it's a simple bug fix - >>code >> assumes noiommu mode support is always available, when it might not >>be >> the case on older kernels. > > As a bug fix, the title must start with "fix" and a tag "Fixes:" > must be added to help with backport. > At the same time, the explanation of the bug must be added in > the commit log please. > > Thanks It's not really a bug fix since it does not change the semantic of the function but just adds nicer error handling. Regarding redefining the code: What I don't like is the special cases we have to check for when using the sPAPR iommu because it does not support VA mappings yet. I think we should decide which iova mode to use based on the iommu types available, i.e. each iommu type should report which iova type it supports. Thoughts? Have you looked at what Maxime did? https://dpdk.org/dev/patchwork/patch/33650/ How does it affect this patch? IMO it has the same problem. 
We shouldn't add more exception cases in drivers/bus/pci/linux/pci.c but instead keep all the information about what an IOMMU can do in lib/librte_eal/linuxapp/eal/eal_vfio.c I agree adding an exception in drivers/bus/pci/linux/pci.c isn't great, but we need first to fix a regression introduced in v17.11 LTS, and IMHO, we cannot do a big rework as the fix is to be backported. Once fixed, I agree we should work on a refactoring. I don't know if eal_vfio is the right place though, as in my case for example I cannot get the information needed through vfio ioctl(). Out of curiosity, what prevents sPAPR to use VA mode for now? Maxime Sounds good to me. The current sPAPR Linux driver cannot use virtual addresses because the DMA window size is restricted to RAM size and always starts at 0. This is not a hardware restriction and we are working on allowing to create arbitrary size windows. Thanks for the clarification, is the DMA window size information accessible from user-space, so that we can enable VA mode or not depending on its value? Maxime Jonas
Re: [dpdk-dev] [PATCH v3 01/19] crypto/ccp: add AMD ccp skeleton PMD
>Hi Ravi, > >> -Original Message- >> From: Kumar, Ravi1 [mailto:ravi1.ku...@amd.com] >> Sent: Thursday, January 11, 2018 6:37 AM >> To: De Lara Guarch, Pablo ; >> dev@dpdk.org >> Cc: Shippen, Greg >> Subject: RE: [PATCH v3 01/19] crypto/ccp: add AMD ccp skeleton PMD >> >> > -Original Message- >> >> From: Kumar, Ravi1 [mailto:ravi1.ku...@amd.com] >> >> Sent: Wednesday, January 10, 2018 10:30 AM >> >> To: De Lara Guarch, Pablo ; >> >> dev@dpdk.org >> >> Cc: Shippen, Greg >> >> Subject: RE: [PATCH v3 01/19] crypto/ccp: add AMD ccp skeleton PMD >> >> >> >> >Hi Ravi, >> >> > >> >> >> -Original Message- >> >> >> From: Ravi Kumar [mailto:ravi1.ku...@amd.com] >> >> >> Sent: Wednesday, January 10, 2018 9:43 AM >> >> >> To: dev@dpdk.org >> >> >> Cc: De Lara Guarch, Pablo >> >> >> Subject: [PATCH v3 01/19] crypto/ccp: add AMD ccp skeleton PMD >> >> >> >> >> >> Signed-off-by: Ravi Kumar >> >> >> --- >> >> > >> >> >... >> >> > >> >> >> --- /dev/null >> >> >> +++ b/drivers/crypto/ccp/Makefile >> >> >> @@ -0,0 +1,55 @@ >> >> >> +# >> >> >> +# Copyright(c) 2018 Advanced Micro Devices, Inc. >> >> >> +# All rights reserved. >> >> >> +# >> >> > >> >> >As Hemant commented, you need to change this full license with >> >> >SPDX >> >> tags: >> >> > >> >> >http://dpdk.org/ml/archives/dev/2018-January/085510.html >> >> > >> >> >Could you submit a v4 with these changes today? >> >> > >> >> >Thanks, >> >> >Pablo >> >> >> >> Hi Pablo, >> >> >> >> Our legal team is still working on the license. We want to get the >> >> code reviewed in parallel. >> >> I will give you and update later if we can upload the v4 patch with >> >> SPDX tags today. >> >> >> > >> >Ok, I have looked at the overall patchset and it looks ok to me. >> >The only thing that I would change is the title of patch 7/19: >> >I would change it to "crypto/ccp: support sessionless operations". >> > >> >Regards, >> >Pablo >> >> Hi Pablo, >> >> Thanks for going through our code. Good to know the code is fine. 
>> >> The new changes in licensing policy has introduced some delays as we >> work through the implications and get agreement from the third party. >> >> We are working on it and will upload the v4 with SPDX license tags as >> soon as possible. >> > >Any news? RC1 will be out on Friday, and the subtree should be close tomorrow >evening, and there will not be more features added after this point. > >Thanks, >Pablo Hi Pablo, We are still in the process of getting the sign off on the new SPDX license tags from the third party. We expect an answer from them by the end of the week at the earliest. I will get back to you as soon as I have some update. Regards, Ravi > >> Regards, >> Ravi >> > >> >> Regards, >> >> Ravi
Re: [dpdk-dev] [PATCH v8 3/3] ring: introduce new header file to support C11 memory model
17/01/2018 09:47, Jia He: > > Hi Thomas > > On 1/17/2018 4:24 PM, Thomas Monjalon Wrote: > > 17/01/2018 05:03, Jia He: > >> To support C11 memory model barrier, 2 options are suggested by Jerin: > >> 1. use rte_smp_rmb > >> 2. use load_acquire/store_release(refer to [1]). > >> CONFIG_RTE_RING_USE_C11_MEM_MODEL is provided, and by default it is "n" > >> on any architectures so far. > > In previous patches, it was enabled for ARM. > > You decided to not enable it at all? > Sorry, maybe I misunderstand your previous mail. > >This config option should be added in the common file (as disabled). > Do you mean CONFIG_RTE_RING_USE_C11_MEM_MODEL=n in comm_base and > "y" in armv8 config? Yes, exactly
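Concretely, the split being agreed on here would look like the following config fragment (the armv8 defconfig file name is indicative of the DPDK tree layout at the time):

```
# config/common_base: option declared everywhere, disabled by default
CONFIG_RTE_RING_USE_C11_MEM_MODEL=n

# config/defconfig_arm64-armv8a-linuxapp-gcc: override, enabled for ARMv8
CONFIG_RTE_RING_USE_C11_MEM_MODEL=y
```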
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
Hi Thomas, On Wed, Jan 17, 2018 at 01:03:50AM +0100, Thomas Monjalon wrote: > 17/01/2018 00:46, Gaëtan Rivet: > > On Wed, Jan 17, 2018 at 12:22:43AM +0100, Thomas Monjalon wrote: > > > 17/01/2018 00:19, Gaëtan Rivet: > > > > It might be a nitpick, but the driver specific properties might not > > > > follow the key/value pair syntax. At least for the fail-safe, a custom > > > > parsing needs to happen. I think vdev in general will need flexibility. > > > > > > What is more flexible than key/value? > > > > fail-safe does not need something more flexible, but different. > > It needs to define substrings describing whole devices, thus substrings > > following the aforementioned syntax. > > > > I choose to use ( and ) as markers of beginning and end of the "special > > sub-device part that we need to skip for the moment". In the end, I need > > a way to mark the beginning and the end of a parameter. Without this, > > the next parameter would be considered as the parameter of the > > sub-device, not of the fail-safe. > > > > = separated key/value pair does not allow for this (or with very > > convoluted additional rules to the syntax). > > OK, I agree we need beginning and end markers. > I wonder whether we should consider devargs as a specific case of value. Not sure I follow: you would want to consider a different syntax whether we are defining or identifying a device? This seems dangerous to me, a single unifying syntax should be used. But I probably misunderstood. > Maybe we just want to allow using marker characters inside values. That would be nice. That, or allow drivers to use arbitrary parameters, by freeing the last field (past the "driver" token of the last category). Do you have a justification for restricting drivers parameters? Why couldn't this only be structured by commas (or any separators), and otherwise left to the drivers to do as they see fit? > So we can use parens or quotes to optionnaly protect the values. 
> But as the shell developers learned, parens are better than quotes in > the long term because it allows nested expressions. > This was the initial reason for using parens in the fail-safe, yes. However, any paired symbol could do, and parens do not actually play nice within a command in shell (the shell keep trying to capture the parens for its own parsing). The usual alternative was to use {}. I'd vote for this. > > > > There could be a note that after the comma past the eventual > > > > "driver=" pair, the syntax is driver-specific and might not follow > > > > the equal-separated key/value pair syntax. > > > > > > Please give an example. > > > > bus=vdev/driver=failsafe,dev(bus=pci,id=00:02.0),fd(/some/file/) > > > > Here, without some kind of "end-of-parameter" mark, fd() would be > > considered as a new parameter of the sub-device 00:02.0 > > Right. > I think an equal sign is missing between "dev" and parenthesis. > > > -- > > > > And while I'm at it, there is an ambiguity that might need to be defined > > before the whole shebang is implemented: whether the parameters > > positions are meaningful or not. Currently some drivers might consider their > > parameters to mean different things depending on their order of appearance. > > > > It could help to explicitly say that the order is asemic and should not > > be considered by drivers. > > > > Why this is important: I think that depending on the new rte_devargs > > representation, it could be beneficial to have a canonical representation > > of an rte_devargs: given an arbitrary string given by users, the devargs > > could then be rewritten in a determinist way, which would help implementing > > comparison, assignment, and some other operations. > > > > However, for this canonicalization to be possible, order needs to be > > explicitly said to be meaningless. > > Good idea. I vote for meaningless ordering, except the first property > of each category, which describes the category. Agreed. -- Gaëtan Rivet 6WIND
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
17/01/2018 10:37, Gaëtan Rivet: > Hi Thomas, > > On Wed, Jan 17, 2018 at 01:03:50AM +0100, Thomas Monjalon wrote: > > 17/01/2018 00:46, Gaëtan Rivet: > > > On Wed, Jan 17, 2018 at 12:22:43AM +0100, Thomas Monjalon wrote: > > > > 17/01/2018 00:19, Gaëtan Rivet: > > > > > It might be a nitpick, but the driver specific properties might not > > > > > follow the key/value pair syntax. At least for the fail-safe, a custom > > > > > parsing needs to happen. I think vdev in general will need > > > > > flexibility. > > > > > > > > What is more flexible than key/value? > > > > > > fail-safe does not need something more flexible, but different. > > > It needs to define substrings describing whole devices, thus substrings > > > following the aforementioned syntax. > > > > > > I choose to use ( and ) as markers of beginning and end of the "special > > > sub-device part that we need to skip for the moment". In the end, I need > > > a way to mark the beginning and the end of a parameter. Without this, > > > the next parameter would be considered as the parameter of the > > > sub-device, not of the fail-safe. > > > > > > = separated key/value pair does not allow for this (or with very > > > convoluted additional rules to the syntax). > > > > OK, I agree we need beginning and end markers. > > I wonder whether we should consider devargs as a specific case of value. > > Not sure I follow: you would want to consider a different syntax whether > we are defining or identifying a device? > > This seems dangerous to me, a single unifying syntax should be used. But > I probably misunderstood. No, I'm just saying that it is a more generic problem: values can contain some characters used in this syntax. So yes, we need to protect them with parens (or braces). > > Maybe we just want to allow using marker characters inside values. > > That would be nice. That, or allow drivers to use arbitrary parameters, > by freeing the last field (past the "driver" token of the last > category). 
> > Do you have a justification for restricting drivers parameters? Why > couldn't this only be structured by commas (or any separators), and otherwise > left to the drivers to do as they see fit? User experience. I don't think key/value is restricting. > > So we can use parens or quotes to optionnaly protect the values. > > But as the shell developers learned, parens are better than quotes in > > the long term because it allows nested expressions. > > This was the initial reason for using parens in the fail-safe, yes. > > However, any paired symbol could do, and parens do not actually play > nice within a command in shell (the shell keep trying to capture the > parens for its own parsing). > > The usual alternative was to use {}. I'd vote for this. Yes braces are also OK. > > > > > There could be a note that after the comma past the eventual > > > > > "driver=" pair, the syntax is driver-specific and might not follow > > > > > the equal-separated key/value pair syntax. > > > > > > > > Please give an example. > > > > > > bus=vdev/driver=failsafe,dev(bus=pci,id=00:02.0),fd(/some/file/) > > > > > > Here, without some kind of "end-of-parameter" mark, fd() would be > > > considered as a new parameter of the sub-device 00:02.0 > > > > Right. > > I think an equal sign is missing between "dev" and parenthesis. > > > > > -- > > > > > > And while I'm at it, there is an ambiguity that might need to be defined > > > before the whole shebang is implemented: whether the parameters > > > positions are meaningful or not. Currently some drivers might consider > > > their > > > parameters to mean different things depending on their order of > > > appearance. > > > > > > It could help to explicitly say that the order is asemic and should not > > > be considered by drivers. 
> > > > > > Why this is important: I think that depending on the new rte_devargs > > > representation, it could be beneficial to have a canonical representation > > > of an rte_devargs: given an arbitrary string given by users, the devargs > > > could then be rewritten in a determinist way, which would help > > > implementing > > > comparison, assignment, and some other operations. > > > > > > However, for this canonicalization to be possible, order needs to be > > > explicitly said to be meaningless. > > > > Good idea. I vote for meaningless ordering, except the first property > > of each category, which describes the category. > > Agreed.
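For reference, the string shapes this thread is converging on look like the following (the fail-safe line is Gaëtan's example from above; whether the sub-device markers end up as parentheses or braces is still open):

```
# plain device: categories separated by '/', key=value pairs by ','
bus=pci,id=0000:01:00.0/class=eth/driver=i40e

# fail-safe with a nested sub-device, needing begin/end markers
bus=vdev/driver=failsafe,dev(bus=pci,id=00:02.0),fd(/some/file/)
```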
Re: [dpdk-dev] [PATCH 1/2] lib/cryptodev: add support to set session private data
Hi Abhinandan, > -Original Message- > From: Gujjar, Abhinandan S > Sent: Wednesday, January 17, 2018 6:35 AM > To: Akhil Goyal ; Doherty, Declan > ; De Lara Guarch, Pablo > ; Jacob, Jerin > > Cc: dev@dpdk.org; Vangati, Narender ; Rao, > Nikhil > Subject: RE: [PATCH 1/2] lib/cryptodev: add support to set session private > data > > Hi Akhil, > ... > I guess, you are suggesting below changes: > diff --git a/lib/librte_cryptodev/rte_cryptodev.h > b/lib/librte_cryptodev/rte_cryptodev.h > index 56958a6..057c39a 100644 > --- a/lib/librte_cryptodev/rte_cryptodev.h > +++ b/lib/librte_cryptodev/rte_cryptodev.h > @@ -892,6 +892,8 @@ struct rte_cryptodev_data { > > /** Cryptodev symmetric crypto session */ struct > rte_cryptodev_sym_session { > + uint16_t private_data_offset; > + /**< Private data offset */ > __extension__ void *sess_private_data[0]; > /**< Private session material */ }; I am ok with this. > > Declan/Pablo, > Is this ok? Do you see any impact on performance or anything else has to be > considered? This is breaking ABI, and since there is a zero length array, this latter has to be at the end of the structure. Therefore, this is not a valid option unless ABI deprecation is announced and then it could be merged in the next release. Pablo
Re: [dpdk-dev] [PATCH 1/2] lib/cryptodev: add support to set session private data
Hi Akhil, > -Original Message- > From: De Lara Guarch, Pablo > Sent: Wednesday, January 17, 2018 3:16 PM > To: Gujjar, Abhinandan S ; Akhil Goyal > ; Doherty, Declan ; Jacob, > Jerin > Cc: dev@dpdk.org; Vangati, Narender ; Rao, > Nikhil > Subject: RE: [PATCH 1/2] lib/cryptodev: add support to set session private > data > > Hi Abhinandan, > > > -Original Message- > > From: Gujjar, Abhinandan S > > Sent: Wednesday, January 17, 2018 6:35 AM > > To: Akhil Goyal ; Doherty, Declan > > ; De Lara Guarch, Pablo > > ; Jacob, Jerin > > > > Cc: dev@dpdk.org; Vangati, Narender ; Rao, > > Nikhil > > Subject: RE: [PATCH 1/2] lib/cryptodev: add support to set session > > private data > > > > Hi Akhil, > > > > ... > > > I guess, you are suggesting below changes: > > diff --git a/lib/librte_cryptodev/rte_cryptodev.h > > b/lib/librte_cryptodev/rte_cryptodev.h > > index 56958a6..057c39a 100644 > > --- a/lib/librte_cryptodev/rte_cryptodev.h > > +++ b/lib/librte_cryptodev/rte_cryptodev.h > > @@ -892,6 +892,8 @@ struct rte_cryptodev_data { > > > > /** Cryptodev symmetric crypto session */ struct > > rte_cryptodev_sym_session { > > + uint16_t private_data_offset; > > + /**< Private data offset */ > > __extension__ void *sess_private_data[0]; > > /**< Private session material */ }; I am ok with this. > > > > Declan/Pablo, > > Is this ok? Do you see any impact on performance or anything else has > > to be considered? > > This is breaking ABI, and since there is a zero length array, this latter has > to be at > the end of the structure. > Therefore, this is not a valid option unless ABI deprecation is announced and > then it could be merged in the next release. What is your opinion on this? Should we consider retaining the enum rte_crypto_op_private_data_type? > > Pablo Abhinandan
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
A new suggestion about the syntax. On Tue, Jan 16, 2018 at 10:50:18PM +0800, Yuanhan Liu wrote: > This patch documents the new devargs syntax, which is going to be > implemented in DPDK v18.05. > > The new devargs proposal is introduced for having a consistent > interface for: > > - whitelisting/blacklisting devices > - identifying ports > - attaching/detaching devices > > Please check the patch content for the details. Also, here is link > for the background: > http://dpdk.org/ml/archives/dev/2017-November/082600.html > > This syntax is suggested by Thomas: > http://dpdk.org/ml/archives/dev/2017-December/084234.html > > Signed-off-by: Yuanhan Liu > --- > doc/guides/prog_guide/env_abstraction_layer.rst | 34 > + > 1 file changed, 34 insertions(+) > > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst > b/doc/guides/prog_guide/env_abstraction_layer.rst > index 34d871c..12f37f2 100644 > --- a/doc/guides/prog_guide/env_abstraction_layer.rst > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst > @@ -213,6 +213,40 @@ device having emitted a Device Removal Event. In such > case, calling > callback. Care must be taken not to close the device from the interrupt > handler > context. It is necessary to reschedule such closing operation. > > +Devargs > +~~~ > + > +The ``devargs`` can be used for whitelisting/blacklisting devices, > identifying > +DPDK ports and attaching/detaching devices. They all share the same syntax. > + > +It is split in 3 categories, where almost everything is optional key/value > pairs: > + > +* bus (pci, vdev, vmbus, fslmc, etc) > +* class (eth, crypto, etc) > +* driver (i40e, mlx5, virtio, etc) > + > +The key/value pair describing the category scope is mandatory and must be the > +first pair in the category properties. Example: bus=pci, must be placed > before > +id=:01:00.0. > > +The syntax has below rules: > + > +* Between categories, the separator is a slash. > +* Inside a category, the separator is a comma. 
> +* Inside a key/value pair, the separator is an equal sign. > +* Each category can be used alone. > + > +Here is an example with all categories:: > + > + > bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55/driver=PMD_NAME,driverspecificproperty=VALUE > + > +It can also be simple like below:: > + > +class=eth,mac=00:11:22:33:44:55 > + 1/ All categories are named, their order is known, and their name comes first. It is thus possible to declare categories unambiguously without using the field name first. pci,id=:01:00.0/eth,mac=00:11:22:33:44:55/PMD_NAME,driverspecificproperty=VALUE eth,mac=00:11:22:33:44:55 pci,id=00:02.0 The only requirement for this to hold is for the layer names not to collide (bus, dev, drivers), which seems like an easy enough requirement to follow. --- > +A device is identified when every properties are matched. Before device is > +probed, only the bus category is relevant. > + 2/ When using the following ID: class=eth,mac=00:11:22:33:44:55 The bus is skipped, as well as the driver. Does that mean that it is allowed to skip any layer, as long as the resulting match is unambiguous? What I mean is that a user could then do driver=net_ring To find the one port using the driver net_ring. --- 3/ The driver would generate a name for the eth_dev structure. I guess it would be possible to identify the device with class=eth,name=net_ring0 Or something. Where does the rte_dev name appears? Is it a property of the bus layer? Is it merged with the eth_dev name? If so, which one will stay, the rte_dev name or the eth_dev name? -- Gaëtan Rivet 6WIND
Re: [dpdk-dev] [PATCH v2] vfio: noiommu check error handling
On Wed, 17 Jan 2018 09:55:37 +0100 Maxime Coquelin wrote: On 01/17/2018 09:48 AM, Jonas Pfefferle wrote: On Tue, 16 Jan 2018 18:01:32 +0100 Maxime Coquelin wrote: Hi Jonas, On 01/16/2018 05:08 PM, Jonas Pfefferle wrote: On Mon, 15 Jan 2018 17:11:58 +0100 Thomas Monjalon wrote: 15/01/2018 13:22, Jonas Pfefferle: On Sat, 13 Jan 2018 23:49:30 +0100 Thomas Monjalon wrote: > 13/01/2018 13:15, Burakov, Anatoly: >> On 11-Jan-18 11:45 PM, Thomas Monjalon wrote: >> > 07/11/2017 10:50, Jonas Pfefferle1: >> >>> Is there something urgent for 17.11? >> >>> Or can it be refined in 18.02? >> >> >> >> Nothing urgent. We can refine this for 18.02. >> >> >> >>> Anatoly, any thought? >> > >> > Anatoly, Jonas, how do you want to proceed with this patch? >> > >> >> I don't see anything to be refined here, it's a simple bug fix - >>code >> assumes noiommu mode support is always available, when it might not >>be >> the case on older kernels. > > As a bug fix, the title must start with "fix" and a tag "Fixes:" > must be added to help with backport. > At the same time, the explanation of the bug must be added in > the commit log please. > > Thanks It's not really a bug fix since it does not change the semantic of the function but just adds nicer error handling. Regarding redefining the code: What I don't like is the special cases we have to check for when using the sPAPR iommu because it does not support VA mappings yet. I think we should decide which iova mode to use based on the iommu types available, i.e. each iommu type should report which iova type it supports. Thoughts? Have you looked at what Maxime did? https://dpdk.org/dev/patchwork/patch/33650/ How does it affect this patch? IMO it has the same problem. 
We shouldn't add more exception cases in drivers/bus/pci/linux/pci.c but instead keep all the information about what an IOMMU can do in lib/librte_eal/linuxapp/eal/eal_vfio.c I agree adding an exception in drivers/bus/pci/linux/pci.c isn't great, but we need first to fix a regression introduced in v17.11 LTS, and IMHO, we cannot do a big rework as the fix is to be backported. Once fixed, I agree we should work on a refactoring. I don't know if eal_vfio is the right place though, as in my case for example I cannot get the information needed through vfio ioctl(). Out of curiosity, what prevents sPAPR to use VA mode for now? Maxime Sounds good to me. The current sPAPR Linux driver cannot use virtual addresses because the DMA window size is restricted to RAM size and always starts at 0. This is not a hardware restriction and we are working on allowing to create arbitrary size windows. Thanks for the clarification, is the DMA window size information accessible from user-space, so that we can enable VA mode or not depending on its value? Maxime Jonas AFAIK it is not queryable. But we can try to create a window and if it fails we try again with PA mode. Jonas
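Jonas's suggestion — try to create the window and fall back to PA mode if it fails — could look roughly like the sketch below. The function names are hypothetical stand-ins for the real VFIO ioctls (e.g. `VFIO_IOMMU_SPAPR_TCE_CREATE`); this only illustrates the probe-and-fall-back control flow, not actual EAL code.

```c
#include <stddef.h>

enum iova_mode { IOVA_MODE_PA, IOVA_MODE_VA };

/* Stand-ins for the kernel's response to a window-create request,
 * used here only to demonstrate both paths. */
int window_accepted(size_t size) { (void)size; return 0; }
int window_rejected(size_t size) { (void)size; return -1; }

/*
 * Since the sPAPR DMA window size is not queryable from user space,
 * attempt to create a window large enough for VA mode first, and fall
 * back to PA mode if the kernel rejects it. try_create_window() stands
 * in for the real VFIO window-create ioctl and returns 0 on success.
 */
enum iova_mode
select_iova_mode(int (*try_create_window)(size_t window_size),
		 size_t va_window_size)
{
	if (try_create_window(va_window_size) == 0)
		return IOVA_MODE_VA;
	/* Kernel refused a window covering the VA range: use PA mode. */
	return IOVA_MODE_PA;
}
```

The same probe-then-degrade shape would let the refactoring discussed above keep all IOMMU-capability knowledge in one place instead of special-casing buses.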
Re: [dpdk-dev] [PATCH v3] lib/librte_vhost: move fdset_del out of conn_mutex
Hi Yuanhan, Huawei, Can you please spare some time for code review? Thanks. Br, Zhike -Original Message- From: 王志克 Sent: Tuesday, January 02, 2018 6:09 PM To: dev@dpdk.org Cc: 王志克 Subject: [PATCH v3] lib/librte_vhost: move fdset_del out of conn_mutex From: wang zhike v3: * Fix duplicate variable name, which leads to unexpected memory write. v2: * Move fdset_del before conn destroy. * Fix coding style. This patch fixes below race condition: 1. one thread calls: rte_vhost_driver_unregister->lock conn_mutex ->fdset_del->loop to check fd.busy. 2. another thread calls fdset_event_dispatch, and the busy flag is changed AFTER handling on the fd, i.e, rcb(). However, the rcb, such as vhost_user_read_cb() would try to retrieve the conn_mutex. So issue is that the 1st thread will loop check the flag while holding the mutex, while the 2nd thread would be blocked by mutex and can not change the flag. Then dead lock is observed. Signed-off-by: zhike wang --- lib/librte_vhost/socket.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/lib/librte_vhost/socket.c b/lib/librte_vhost/socket.c index 422da00..ea01327 100644 --- a/lib/librte_vhost/socket.c +++ b/lib/librte_vhost/socket.c @@ -749,6 +749,9 @@ struct vhost_user_reconnect_list { struct vhost_user_socket *vsocket = vhost_user.vsockets[i]; if (!strcmp(vsocket->path, path)) { + int del_fds[MAX_FDS]; + int num_of_fds = 0, fd_index; + if (vsocket->is_server) { fdset_del(&vhost_user.fdset, vsocket->socket_fd); close(vsocket->socket_fd); @@ -757,13 +760,26 @@ struct vhost_user_reconnect_list { vhost_user_remove_reconnect(vsocket); } + /* fdset_del() must be called without conn_mutex. 
*/ + pthread_mutex_lock(&vsocket->conn_mutex); + for (conn = TAILQ_FIRST(&vsocket->conn_list); +conn != NULL; +conn = next) { + next = TAILQ_NEXT(conn, next); + + del_fds[num_of_fds++] = conn->connfd; + } + pthread_mutex_unlock(&vsocket->conn_mutex); + + for (fd_index = 0; fd_index < num_of_fds; fd_index++) + fdset_del(&vhost_user.fdset, del_fds[fd_index]); + pthread_mutex_lock(&vsocket->conn_mutex); for (conn = TAILQ_FIRST(&vsocket->conn_list); conn != NULL; conn = next) { next = TAILQ_NEXT(conn, next); - fdset_del(&vhost_user.fdset, conn->connfd); RTE_LOG(INFO, VHOST_CONFIG, "free connfd = %d for device '%s'\n", conn->connfd, path); -- 1.8.3.1
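The core of the fix is a two-phase pattern: snapshot the connection fds while holding `conn_mutex`, release the mutex, then run the blocking `fdset_del()` calls on the snapshot, so the dispatch thread's callback can still acquire the mutex and clear the busy flag. A stripped-down sketch of the snapshot phase (the types are illustrative, not the real vhost structures):

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_FDS 64

struct conn {
	int connfd;
	struct conn *next;
};

struct vsocket {
	pthread_mutex_t conn_mutex;
	struct conn *conn_list;
};

/*
 * Copy the fds out while holding conn_mutex, so the caller can run the
 * blocking per-fd teardown *without* the mutex held. This is the core
 * of the deadlock fix: the event-dispatch thread's callback may need
 * conn_mutex before it clears the fd's busy flag, so we must never
 * spin on that flag while holding the mutex ourselves.
 */
int
snapshot_conn_fds(struct vsocket *vs, int del_fds[], int max_fds)
{
	int n = 0;
	struct conn *c;

	pthread_mutex_lock(&vs->conn_mutex);
	for (c = vs->conn_list; c != NULL && n < max_fds; c = c->next)
		del_fds[n++] = c->connfd;
	pthread_mutex_unlock(&vs->conn_mutex);

	return n;
}
```

The caller then iterates `del_fds` and performs the blocking removal for each entry, exactly as the patch does.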
Re: [dpdk-dev] [PATCH v2 3/4] eal: add synchronous multi-process communication
> > Hi Jianfeng, > > > >> -Original Message- > >> From: Tan, Jianfeng > >> Sent: Tuesday, January 16, 2018 8:11 AM > >> To: Ananyev, Konstantin ; dev@dpdk.org; > >> Burakov, Anatoly > >> Cc: Richardson, Bruce ; tho...@monjalon.net > >> Subject: Re: [PATCH v2 3/4] eal: add synchronous multi-process > >> communication > >> > >> Thank you, Konstantin and Anatoly firstly. Other comments are well > >> received and I'll send out a new version. > >> > >> > >> On 1/16/2018 8:00 AM, Ananyev, Konstantin wrote: > We need the synchronous way for multi-process communication, > i.e., blockingly waiting for reply message when we send a request > to the peer process. > > We add two APIs rte_eal_mp_request() and rte_eal_mp_reply() for > such use case. By invoking rte_eal_mp_request(), a request message > is sent out, and then it waits there for a reply message. The > timeout is hard-coded 5 Sec. And the replied message will be copied > in the parameters of this API so that the caller can decide how > to translate those information (including params and fds). Note > if a primary process owns multiple secondary processes, this API > will fail. > > The API rte_eal_mp_reply() is always called by an mp action handler. > Here we add another parameter for rte_eal_mp_t so that the action > handler knows which peer address to reply. > > We use mutex in rte_eal_mp_request() to guarantee that only one > request is on the fly for one pair of processes. > >>> You don't need to do things in such strange and restrictive way. 
> >>> Instead you can do something like that: > >>> 1) Introduce new struct, list for it and mutex > >>>struct sync_request { > >>> int reply_received; > >>> char dst[PATH_MAX]; > >>> char reply[...]; > >>> LIST_ENTRY(sync_request) next; > >>> }; > >>> > >>> static struct > >>> LIST_HEAD(list, sync_request); > >>> pthread_mutex_t lock; > >>> pthread_cond_t cond; > >>> } sync_requests; > >>> > >>> 2) then at request() call: > >>> Grab sync_requests.lock > >>> Check do we already have a pending request for that destination, > >>> If yes - then release the lock and return with error. > >>> - allocate and init new sync_request struct, set reply_received=0 > >>> - do send_msg() > >>> -then in a cycle: > >>> pthread_cond_timedwait(&sync_requests.cond, &sync_request.lock, > >>> &timespec); > >>> - at return from it check if sync_request.reply_received == 1, if not > >>> check if timeout expired and either return a failure or go to the start > >>> of the cycle. > >>> > >>> 3) at mp_handler() if REPLY received - grab sync_request.lock, > >>> search through sync_requests.list for dst[] , > >>> if found, then set its reply_received=1, copy the received message > >>> into reply > >>> and call pthread_cond_broadcast(&sync_requests.cond); > >> The only benefit I can see is that now the sender can request to > >> multiple receivers at the same time. And it makes things more > >> complicated. Do we really need this? > > The benefit is that one thread is blocked waiting for response, > > your mp_handler can still receive and handle other messages. > > This can already be done in the original implementation. mp_handler > listens for msg, request from the other peer(s), and replies the > requests, which is not affected. > > > Plus as you said - other threads can keep sending messages. > > For this one, in the original implementation, other threads can still > send msg, but not request. I suppose the request is not in a fast path, > why we care to make it fast? 
> +int +rte_eal_mp_request(const char *action_name, + void *params, + int len_p, + int fds[], + int fds_in, + int fds_out) +{ + int i, j; + int sockfd; + int nprocs; + int ret = 0; + struct mp_msghdr *req; + struct timeval tv; + char buf[MAX_MSG_LENGTH]; + struct mp_msghdr *hdr; + + RTE_LOG(DEBUG, EAL, "request: %s\n", action_name); + + if (fds_in > SCM_MAX_FD || fds_out > SCM_MAX_FD) { + RTE_LOG(ERR, EAL, "Cannot send more than %d FDs\n", SCM_MAX_FD); + rte_errno = -E2BIG; + return 0; + } + + req = format_msg(action_name, params, len_p, fds_in, MP_REQ); + if (req == NULL) + return 0; + + if ((sockfd = open_unix_fd(0)) < 0) { + free(req); + return 0; + } + + tv.tv_sec = 5; /* 5 Secs Timeout */ + tv.tv_usec = 0; + if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, + (const void *)&tv, sizeof(struct timeval)) < 0) + RTE_LOG(INFO, EAL, "Failed to set recv timeout\n"); If you set it just for one call, why do you not restore it? Also I don't think it is a good idea to
Re: [dpdk-dev] [PATCH 1/2] lib/cryptodev: add support to set session private data
Hi Abhinandan, On 1/17/2018 3:35 PM, Gujjar, Abhinandan S wrote: Hi Akhil, -Original Message- From: De Lara Guarch, Pablo Sent: Wednesday, January 17, 2018 3:16 PM To: Gujjar, Abhinandan S ; Akhil Goyal ; Doherty, Declan ; Jacob, Jerin Cc: dev@dpdk.org; Vangati, Narender ; Rao, Nikhil Subject: RE: [PATCH 1/2] lib/cryptodev: add support to set session private data Hi Abhinandan, -Original Message- From: Gujjar, Abhinandan S Sent: Wednesday, January 17, 2018 6:35 AM To: Akhil Goyal ; Doherty, Declan ; De Lara Guarch, Pablo ; Jacob, Jerin Cc: dev@dpdk.org; Vangati, Narender ; Rao, Nikhil Subject: RE: [PATCH 1/2] lib/cryptodev: add support to set session private data Hi Akhil, ... I guess, you are suggesting below changes: diff --git a/lib/librte_cryptodev/rte_cryptodev.h b/lib/librte_cryptodev/rte_cryptodev.h index 56958a6..057c39a 100644 --- a/lib/librte_cryptodev/rte_cryptodev.h +++ b/lib/librte_cryptodev/rte_cryptodev.h @@ -892,6 +892,8 @@ struct rte_cryptodev_data { /** Cryptodev symmetric crypto session */ struct rte_cryptodev_sym_session { + uint16_t private_data_offset; + /**< Private data offset */ __extension__ void *sess_private_data[0]; /**< Private session material */ }; I am ok with this. Declan/Pablo, Is this ok? Do you see any impact on performance or anything else has to be considered? This is breaking ABI, and since there is a zero length array, this latter has to be at the end of the structure. Therefore, this is not a valid option unless ABI deprecation is announced and then it could be merged in the next release. What is your opinion on this? Should we consider retaining the enum rte_crypto_op_private_data_type? As per our previous discussion we are anyway pushing crypto adapter to next release, then we do have time for the deprecation notice to be sent. Or you can reserve the first byte of private data (internal to library) in the session to check whether the private data is valid or not. 
IMO, private data offset in session is a better approach instead of adding one more enum. Others can suggest. -Akhil Pablo Abhinandan
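For reference, the offset-based scheme being discussed could work roughly as follows. The names mirror the proposal but are hypothetical, not the final rte_cryptodev API, and the caller is assumed to guarantee that the offset and length fit the session allocation.

```c
#include <stdint.h>
#include <string.h>

/*
 * Sketch: the session carries a private_data_offset, and the
 * application's private data lives at that byte offset from the start
 * of the session. An offset of 0 doubles as "no private data set",
 * which avoids the extra enum discussed in the thread.
 */
struct sym_session {
	uint16_t private_data_offset; /* 0 = no private data set */
	uint8_t data[]; /* driver-private material, then app data */
};

/* Return a pointer to the application's private data, or NULL. */
void *
session_private_data(struct sym_session *sess)
{
	if (sess->private_data_offset == 0)
		return NULL;
	return (uint8_t *)sess + sess->private_data_offset;
}

/* Record where the private data lives and copy it in.
 * Bounds are the caller's responsibility in this sketch. */
int
session_set_private_data(struct sym_session *sess, uint16_t offset,
			 const void *data, size_t len)
{
	sess->private_data_offset = offset;
	memcpy((uint8_t *)sess + offset, data, len);
	return 0;
}
```

Note the ABI concern Pablo raises: because `sess_private_data[0]` is a zero-length array, any new field must precede it, which changes the structure layout and therefore requires a deprecation notice first.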
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
17/01/2018 11:11, Gaëtan Rivet: > A new suggestion about the syntax. > > On Tue, Jan 16, 2018 at 10:50:18PM +0800, Yuanhan Liu wrote: > > This patch documents the new devargs syntax, which is going to be > > implemented in DPDK v18.05. > > > > The new devargs proposal is introduced for having a consistent > > interface for: > > > > - whitelisting/blacklisting devices > > - identifying ports > > - attaching/detaching devices > > > > Please check the patch content for the details. Also, here is link > > for the background: > > http://dpdk.org/ml/archives/dev/2017-November/082600.html > > > > This syntax is suggestd by Thomas: > > http://dpdk.org/ml/archives/dev/2017-December/084234.html > > > > Signed-off-by: Yuanhan Liu > > --- > > doc/guides/prog_guide/env_abstraction_layer.rst | 34 > > + > > 1 file changed, 34 insertions(+) > > > > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst > > b/doc/guides/prog_guide/env_abstraction_layer.rst > > index 34d871c..12f37f2 100644 > > --- a/doc/guides/prog_guide/env_abstraction_layer.rst > > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst > > @@ -213,6 +213,40 @@ device having emitted a Device Removal Event. In such > > case, calling > > callback. Care must be taken not to close the device from the interrupt > > handler > > context. It is necessary to reschedule such closing operation. > > > > +Devargs > > +~~~ > > + > > +The ``devargs`` can be used for whitelisting/blacklisting devices, > > identifying > > +DPDK ports and attaching/deatching devices. They all share the same syntax. > > + > > +It is split in 3 categories, where almost everything is optional key/value > > pairs: > > + > > +* bus (pci, vdev, vmbus, fslmc, etc) > > +* class (eth, crypto, etc) > > +* driver (i40e, mlx5, virtio, etc) > > + > > +The key/value pair describing the category scope is mandatory and must be > > the > > +first pair in the category properties. Example: bus=pci, must be placed > > before > > +id=:01:00.0. 
> > + > > +The syntax has below rules: > > + > > +* Between categories, the separator is a slash. > > +* Inside a category, the separator is a comma. > > +* Inside a key/value pair, the separator is an equal sign. > > +* Each category can be used alone. > > + > > +Here is an example with all categories:: > > + > > + > > bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55/driver=PMD_NAME,driverspecificproperty=VALUE > > + > > +It can also be simple like below:: > > + > > +class=eth,mac=00:11:22:33:44:55 > > + > > 1/ > > All categories are named, their order is known, and their name comes > first. The order of categories is known but some categories may be omitted. > It is thus possible to declare categories unambiguously without using > the field name first. > > > pci,id=:01:00.0/eth,mac=00:11:22:33:44:55/PMD_NAME,driverspecificproperty=VALUE > eth,mac=00:11:22:33:44:55 > pci,id=00:02.0 > > The only requirement for this to hold is for the layer names not to collide > (bus, dev, drivers), which seems like an easy enough requirement to follow. I don't like it for 2 reasons: - it is less explicit - it is less obvious to dispatch to parsers > --- > > > +A device is identified when every properties are matched. Before device is > > +probed, only the bus category is relevant. > > + > > 2/ > > When using the following ID: > > class=eth,mac=00:11:22:33:44:55 > > The bus is skipped, as well as the driver. > Does that mean that it is allowed to skip any layer, as long as the > resulting match is unambiguous? > > What I mean is that a user could then do > > driver=net_ring > > To find the one port using the driver net_ring. I think yes, if there is only one net_ring device. > --- > > 3/ > > The driver would generate a name for the eth_dev structure. > I guess it would be possible to identify the device with > > class=eth,name=net_ring0 > > Or something. Where does the rte_dev name appears? Is it a property > of the bus layer? Is it merged with the eth_dev name? 
If so, which one > will stay, the rte_dev name or the eth_dev name? EAL device and ethdev are different layers. There is no 1:1 mapping. So both must stay. rte_dev name should be a property of the bus layer.
Re: [dpdk-dev] [PATCH 1/2] test: rely on dynamic log level to display hexdumps
08/12/2017 14:21, Olivier Matz: > Instead of relying on a compile-time option, use the global log-level > to decide if the hexdumps should be displayed in the tests. > > Valitation: > > # build/app/test --no-huge > RTE>>crc_autotest > Test OK > > # build/app/test --no-huge --log-level=8 > RTE>>crc_autotest > [many hexdumps...] > Test OK > > Signed-off-by: Olivier Matz Series applied, thanks
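The change above swaps a compile-time option for a check of the global log level at run time. The idea can be sketched as follows; the names and the DEBUG threshold value are illustrative, not the actual rte_log API.

```c
#include <stdio.h>

#define LOG_LEVEL_DEBUG 8

/* In DPDK this would come from --log-level; default suppresses dumps. */
static int global_log_level = 7;

/*
 * Dump a buffer only when the runtime log level is at least DEBUG,
 * instead of compiling the dump in or out. Returns the number of
 * bytes dumped, or 0 if the dump was suppressed.
 */
size_t
maybe_hexdump(const char *title, const unsigned char *buf, size_t len)
{
	size_t i;

	if (global_log_level < LOG_LEVEL_DEBUG)
		return 0;

	printf("%s:", title);
	for (i = 0; i < len; i++)
		printf(" %02x", buf[i]);
	printf("\n");
	return len;
}
```

This matches the validation shown in the commit message: the same binary stays quiet by default and prints the hexdumps when started with `--log-level=8`.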
[dpdk-dev] [PATCH v3 1/6] test: fix memory leak in bitmap test
Acked-by: Cristian Dumitrescu Fixes: c7e4a134e769 ("test: verify bitmap operations") Cc: pbhagavat...@caviumnetworks.com Signed-off-by: Anatoly Burakov --- test/test/test_bitmap.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/test/test/test_bitmap.c b/test/test/test_bitmap.c index 05d547e..c3169e9 100644 --- a/test/test/test_bitmap.c +++ b/test/test/test_bitmap.c @@ -158,6 +158,9 @@ test_bitmap(void) if (test_bitmap_scan_operations(bmp) < 0) return TEST_FAILED; + rte_bitmap_free(bmp); + rte_free(mem); + return TEST_SUCCESS; } -- 2.7.4
[dpdk-dev] [PATCH v3 3/6] test: fix memory leak in ring autotest
Get rid of global static ring variable and don't reuse rings between test runs. Fixes: 4e32101f9b01 ("ring: support freeing") Cc: pablo.de.lara.gua...@intel.com Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov --- Notes: v3: fix commit message to point to approriate commit being fixed v2: remove static ring variable test/test/test_ring.c | 61 --- 1 file changed, 33 insertions(+), 28 deletions(-) diff --git a/test/test/test_ring.c b/test/test/test_ring.c index 19d497a..b339f04 100644 --- a/test/test/test_ring.c +++ b/test/test/test_ring.c @@ -57,8 +57,6 @@ static rte_atomic32_t synchro; -static struct rte_ring *r; - #defineTEST_RING_VERIFY(exp) \ if (!(exp)) { \ printf("error at %s:%d\tcondition " #exp " failed\n", \ @@ -73,7 +71,7 @@ static struct rte_ring *r; * helper routine for test_ring_basic */ static int -test_ring_basic_full_empty(void * const src[], void *dst[]) +test_ring_basic_full_empty(struct rte_ring *r, void * const src[], void *dst[]) { unsigned i, rand; const unsigned rsz = RING_SIZE - 1; @@ -114,7 +112,7 @@ test_ring_basic_full_empty(void * const src[], void *dst[]) } static int -test_ring_basic(void) +test_ring_basic(struct rte_ring *r) { void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL; int ret; @@ -250,7 +248,7 @@ test_ring_basic(void) goto fail; } - if (test_ring_basic_full_empty(src, dst) != 0) + if (test_ring_basic_full_empty(r, src, dst) != 0) goto fail; cur_src = src; @@ -317,7 +315,7 @@ test_ring_basic(void) } static int -test_ring_burst_basic(void) +test_ring_burst_basic(struct rte_ring *r) { void **src = NULL, **cur_src = NULL, **dst = NULL, **cur_dst = NULL; int ret; @@ -731,6 +729,7 @@ test_ring_basic_ex(void) ret = 0; fail_test: + rte_ring_free(rp); if (obj != NULL) rte_free(obj); @@ -811,61 +810,67 @@ test_ring_with_exact_size(void) static int test_ring(void) { + struct rte_ring *r = NULL; + /* some more basic operations */ if (test_ring_basic_ex() < 0) - return -1; + goto test_fail; rte_atomic32_init(&synchro); 
+ r = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0); if (r == NULL) - r = rte_ring_create("test", RING_SIZE, SOCKET_ID_ANY, 0); - if (r == NULL) - return -1; + goto test_fail; /* retrieve the ring from its name */ if (rte_ring_lookup("test") != r) { printf("Cannot lookup ring from its name\n"); - return -1; + goto test_fail; } /* burst operations */ - if (test_ring_burst_basic() < 0) - return -1; + if (test_ring_burst_basic(r) < 0) + goto test_fail; /* basic operations */ - if (test_ring_basic() < 0) - return -1; + if (test_ring_basic(r) < 0) + goto test_fail; /* basic operations */ if ( test_create_count_odd() < 0){ - printf ("Test failed to detect odd count\n"); - return -1; - } - else - printf ( "Test detected odd count\n"); + printf("Test failed to detect odd count\n"); + goto test_fail; + } else + printf("Test detected odd count\n"); if ( test_lookup_null() < 0){ - printf ("Test failed to detect NULL ring lookup\n"); - return -1; - } - else - printf ( "Test detected NULL ring lookup \n"); + printf("Test failed to detect NULL ring lookup\n"); + goto test_fail; + } else + printf("Test detected NULL ring lookup\n"); /* test of creating ring with wrong size */ if (test_ring_creation_with_wrong_size() < 0) - return -1; + goto test_fail; /* test of creation ring with an used name */ if (test_ring_creation_with_an_used_name() < 0) - return -1; + goto test_fail; if (test_ring_with_exact_size() < 0) - return -1; + goto test_fail; /* dump the ring status */ rte_ring_list_dump(stdout); + rte_ring_free(r); + return 0; + +test_fail: + rte_ring_free(r); + + return -1; } REGISTER_TEST_COMMAND(ring_autotest, test_ring); -- 2.7.4
[dpdk-dev] [PATCH v3 2/6] test: fix memory leak in reorder autotest
Add a teardown function that frees allocated resources. Fixes: d0c9b58d7156 ("app/test: new reorder unit test") Cc: sergio.gonzalez.mon...@intel.com Cc: reshma.pat...@intek.com Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov --- test/test/test_reorder.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/test/test/test_reorder.c b/test/test/test_reorder.c index e834bac..65e4f38 100644 --- a/test/test/test_reorder.c +++ b/test/test/test_reorder.c @@ -331,9 +331,20 @@ test_setup(void) return 0; } +static void +test_teardown(void) +{ + rte_reorder_free(test_params->b); + test_params->b = NULL; + rte_mempool_free(test_params->p); + test_params->p = NULL; +} + + static struct unit_test_suite reorder_test_suite = { .setup = test_setup, + .teardown = test_teardown, .suite_name = "Reorder Unit Test Suite", .unit_test_cases = { TEST_CASE(test_reorder_create), -- 2.7.4
[dpdk-dev] [PATCH v3 4/6] test: fix memory leak in ring perf autotest
Fixes: ac3fb3019c52 ("app: rework ring tests") Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov --- Notes: v2: remove static ring variable test/test/test_ring_perf.c | 36 1 file changed, 20 insertions(+), 16 deletions(-) diff --git a/test/test/test_ring_perf.c b/test/test/test_ring_perf.c index 4363e4d..ebb3939 100644 --- a/test/test/test_ring_perf.c +++ b/test/test/test_ring_perf.c @@ -32,9 +32,6 @@ */ static const volatile unsigned bulk_sizes[] = { 8, 32 }; -/* The ring structure used for tests */ -static struct rte_ring *r; - struct lcore_pair { unsigned c1, c2; }; @@ -115,7 +112,7 @@ get_two_sockets(struct lcore_pair *lcp) /* Get cycle counts for dequeuing from an empty ring. Should be 2 or 3 cycles */ static void -test_empty_dequeue(void) +test_empty_dequeue(struct rte_ring *r) { const unsigned iter_shift = 26; const unsigned iterations = 1size; unsigned i; void *burst[MAX_BURST] = {0}; @@ -192,6 +191,7 @@ dequeue_bulk(void *p) const unsigned iter_shift = 23; const unsigned iterations = 1 size; unsigned i; void *burst[MAX_BURST] = {0}; @@ -222,7 +222,7 @@ dequeue_bulk(void *p) * used to measure ring perf between hyperthreads, cores and sockets. */ static void -run_on_core_pair(struct lcore_pair *cores, +run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, lcore_function_t f1, lcore_function_t f2) { struct thread_params param1 = {0}, param2 = {0}; @@ -230,6 +230,7 @@ run_on_core_pair(struct lcore_pair *cores, for (i = 0; i < sizeof(bulk_sizes)/sizeof(bulk_sizes[0]); i++) { lcore_count = 0; param1.size = param2.size = bulk_sizes[i]; + param1.r = param2.r = r; if (cores->c1 == rte_get_master_lcore()) { rte_eal_remote_launch(f2, ¶m2, cores->c2); f1(¶m1); @@ -252,7 +253,7 @@ run_on_core_pair(struct lcore_pair *cores, * takes on a single lcore. Result is for comparison with the bulk enq+deq. */ static void -test_single_enqueue_dequeue(void) +test_single_enqueue_dequeue(struct rte_ring *r) { const unsigned iter_shift = 24; const unsigned iterations = 1<
[dpdk-dev] [PATCH v3 6/6] test: fix memory leak in timer perf autotest
Fixes: 277afaf3dbcb ("app/test: add timer_perf") Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov --- test/test/test_timer_perf.c | 1 + 1 file changed, 1 insertion(+) diff --git a/test/test/test_timer_perf.c b/test/test/test_timer_perf.c index afa3a06..0fe2b74 100644 --- a/test/test/test_timer_perf.c +++ b/test/test/test_timer_perf.c @@ -127,6 +127,7 @@ test_timer_perf(void) printf("Time per rte_timer_manage with zero callbacks: %"PRIu64" cycles\n", (end_tsc - start_tsc + iterations/2) / iterations); + rte_free(tms); return 0; } -- 2.7.4
[dpdk-dev] [PATCH v3 5/6] test: fix memory leak in table autotest
Always deallocate allocated resources after the test is done. Acked-by: Cristian Dumitrescu Fixes: 5205954791cb ("app/test: packet framework unit tests") Cc: cristian.dumitre...@intel.com Cc: sta...@dpdk.org Signed-off-by: Anatoly Burakov --- test/test/test_table.c | 44 1 file changed, 28 insertions(+), 16 deletions(-) diff --git a/test/test/test_table.c b/test/test/test_table.c index c9268a3..f01652d 100644 --- a/test/test/test_table.c +++ b/test/test/test_table.c @@ -55,6 +55,14 @@ uint64_t pipeline_test_hash(void *key, } static void +app_free_resources(void) { + int i; + for (i = 0; i < N_PORTS; i++) + rte_ring_free(rings_rx[i]); + rte_mempool_free(pool); +} + +static void app_init_mbuf_pools(void) { /* Init the buffer pool */ @@ -113,18 +121,20 @@ app_init_rings(void) static int test_table(void) { - int status, failures; + int status, ret; unsigned i; - failures = 0; + ret = TEST_SUCCESS; app_init_rings(); app_init_mbuf_pools(); printf("\n\n\n\nPipeline tests\n"); - if (test_table_pipeline() < 0) - return -1; + if (test_table_pipeline() < 0) { + ret = TEST_FAILED; + goto end; + } printf("\n\n\n\nPort tests\n"); for (i = 0; i < n_port_tests; i++) { @@ -132,8 +142,8 @@ test_table(void) if (status < 0) { printf("\nPort test number %d failed (%d).\n", i, status); - failures++; - return -1; + ret = TEST_FAILED; + goto end; } } @@ -143,8 +153,8 @@ test_table(void) if (status < 0) { printf("\nTable test number %d failed (%d).\n", i, status); - failures++; - return -1; + ret = TEST_FAILED; + goto end; } } @@ -154,21 +164,23 @@ test_table(void) if (status < 0) { printf("\nCombined table test number %d failed with " "reason number %d.\n", i, status); - failures++; - return -1; + ret = TEST_FAILED; + goto end; } } - if (failures) - return -1; - #ifdef RTE_LIBRTE_ACL printf("\n\n\n\nACL tests\n"); - if (test_table_acl() < 0) - return -1; + if (test_table_acl() < 0) { + ret = TEST_FAILED; + goto end; + } #endif - return 0; +end: + app_free_resources(); + + return ret; } 
REGISTER_TEST_COMMAND(table_autotest, test_table); -- 2.7.4
Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership
Hi Matan, > Hi Konstantin > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM > > Hi Matan, > > > > > > > > Hi Konstantin > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM > > > > Hi Matan, > > > > > Hi Konstantin > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM > > > > > > Hi Matan, > > > > > > > Hi Konstantin > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 AM > > > > > > > > Hi Matan, > > > > > > > > > Hi Konstantin > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018 2:40 > > > > > > > > > PM > > > > > > > > > > Hi Matan, > > > > > > > > > > > Hi Konstantin > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10, 2018 > > > > > > > > > > > 3:36 PM > > > > > > > > > > > > Hi Matan, > > > > > > > > > > > > > > > It is good to see that now scanning/updating > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it might > > > > > > > > > > > > be not very plausible to protect both data[] and > > > > > > > > > > > > next_owner_id using the > > > > > > same lock. > > > > > > > > > > > > > > > > > > > > > > I guess you mean to the owner structure in > > > > > > rte_eth_dev_data[port_id]. > > > > > > > > > > > The next_owner_id is read by ownership APIs(for owner > > > > > > > > > > > validation), so it > > > > > > > > > > makes sense to use the same lock. > > > > > > > > > > > Actually, why not? > > > > > > > > > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are not > > > > > > > > > > directly > > > > > > > > related. > > > > > > > > > > You may create new owner_id but it doesn't mean you > > > > > > > > > > would update rte_eth_dev_data[] immediately. > > > > > > > > > > And visa-versa - you might just want to update > > > > > > > > > > rte_eth_dev_data[].name or .owner_id. > > > > > > > > > > It is not very good coding practice to use same lock for > > > > > > > > > > non-related data structures. 
> > > > > > > > > > > > > > > > > > > I see the relation like next: > > > > > > > > > Since the ownership mechanism synchronization is in ethdev > > > > > > > > > responsibility, we must protect against user mistakes as > > > > > > > > > much as we can by > > > > > > > > using the same lock. > > > > > > > > > So, if user try to set by invalid owner (exactly the ID > > > > > > > > > which currently is > > > > > > > > allocated) we can protect on it. > > > > > > > > > > > > > > > > Hmm, not sure why you can't do same checking with different > > > > > > > > lock or atomic variable? > > > > > > > > > > > > > > > The set ownership API is protected by ownership lock and > > > > > > > checks the owner ID validity By reading the next owner ID. > > > > > > > So, the owner ID allocation and set API should use the same > > > > > > > atomic > > > > > > mechanism. > > > > > > > > > > > > Sure but all you are doing for checking validity, is check that > > > > > > owner_id > 0 &&& owner_id < next_ownwe_id, right? > > > > > > As you don't allow owner_id overlap (16/3248 bits) you can > > > > > > safely do same check with just atomic_get(&next_owner_id). > > > > > > > > > > > It will not protect it, scenario: > > > > > - current next_id is X. > > > > > - call set ownership of port A with owner id X by thread 0(by user > > mistake). > > > > > - context switch > > > > > - allocate new id by thread 1 and get X and change next_id to X+1 > > > > atomically. > > > > > - context switch > > > > > - Thread 0 validate X by atomic_read and succeed to take ownership. > > > > > - The system loosed the port(or will be managed by two entities) - > > crash. > > > > > > > > > > > > Ok, and how using lock will protect you with such scenario? > > > > > > The owner set API validation by thread 0 should fail because the owner > > validation is included in the protected section. > > > > Then your validation function would fail even if you'll use atomic ops > > instead > > of lock. > No. 
> With atomic this specific scenario will cause the validation to pass. Can you explain to me how? rte_eth_is_valid_owner_id(uint16_t owner_id) { int32_t cur_owner_id = RTE_MIN(rte_atomic32_get(next_owner_id), UINT16_MAX); if (owner_id == RTE_ETH_DEV_NO_OWNER || owner > cur_owner_id) { RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id); return 0; } return 1; } Let say your next_owne_id==X, and you invoke rte_eth_is_valid_owner_id(owner_id=X+1) - it would fail. > With lock no next_id changes can be done while the thread is in the set API. > > > But in fact your code is not protected for that scenario - doesn't matter > > will > > you'll use lock or atomic ops. > > Let's considerer your current code with the following scenario: > > > > next_owner_id == 1 > > 1) Process 0: > > rte_eth_dev_owner_new(&owner_id); > > now owner_id == 1 and next_owner_id == 2 > > 2) Process 1 (by mistake): > > rte_eth_dev_owner_set(port_id=1, owner->id
[dpdk-dev] [PATCH] net/ixgbe: check if security capabilities are enabled by HW
Check if the security enable bits are not fused before setting offload
capabilities for security

Signed-off-by: Radu Nicolau
---
 drivers/net/ixgbe/ixgbe_ethdev.c |  6 --
 drivers/net/ixgbe/ixgbe_ipsec.c  | 15 +++
 drivers/net/ixgbe/ixgbe_ipsec.h  |  1 +
 3 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 43e0132..4f2ab2f 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -3685,8 +3685,10 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
 
 #ifdef RTE_LIBRTE_SECURITY
-	dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_SECURITY;
-	dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_SECURITY;
+	if (ixgbe_crypto_capable(dev)) {
+		dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_SECURITY;
+		dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_SECURITY;
+	}
 #endif
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
diff --git a/drivers/net/ixgbe/ixgbe_ipsec.c b/drivers/net/ixgbe/ixgbe_ipsec.c
index 97f025a8..a495679 100644
--- a/drivers/net/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ixgbe/ixgbe_ipsec.c
@@ -602,6 +602,21 @@ ixgbe_crypto_capabilities_get(void *device __rte_unused)
 
 
 int
+ixgbe_crypto_capable(struct rte_eth_dev *dev)
+{
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	uint32_t reg_i, reg, capable = 1;
+	/* test if rx crypto can be enabled and then write back initial value*/
+	reg_i = IXGBE_READ_REG(hw, IXGBE_SECRXCTRL);
+	IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, 0);
+	reg = IXGBE_READ_REG(hw, IXGBE_SECRXCTRL);
+	if (reg != 0)
+		capable = 0;
+	IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, reg_i);
+	return capable;
+}
+
+int
 ixgbe_crypto_enable_ipsec(struct rte_eth_dev *dev)
 {
 	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
diff --git a/drivers/net/ixgbe/ixgbe_ipsec.h b/drivers/net/ixgbe/ixgbe_ipsec.h
index acd9f3e..eeba39f 100644
--- a/drivers/net/ixgbe/ixgbe_ipsec.h
+++ b/drivers/net/ixgbe/ixgbe_ipsec.h
@@ -112,6 +112,7 @@ struct ixgbe_ipsec {
 
 struct rte_security_ctx *
 ixgbe_ipsec_ctx_create(struct rte_eth_dev *dev);
+int ixgbe_crypto_capable(struct rte_eth_dev *dev);
 int ixgbe_crypto_enable_ipsec(struct rte_eth_dev *dev);
 int ixgbe_crypto_add_ingress_sa_from_flow(const void *sess,
 		const void *ip_spec,
--
2.7.5
Re: [dpdk-dev] [PATCH] net/ixgbe: check if security capabilities are enabled by HW
Hi Radu, > -Original Message- > From: Nicolau, Radu > Sent: Wednesday, January 17, 2018 11:19 AM > To: dev@dpdk.org > Cc: Lu, Wenzhuo ; Ananyev, Konstantin > ; Zhao, XinfengX > ; De Lara Guarch, Pablo > ; Nicolau, Radu > Subject: [PATCH] net/ixgbe: check if security capabilities are enabled by HW > > Check if the security enable bits are not fused before setting > offload capabilities for security In theory dev_info_get() - could be called at any stage of device configuration or even when RX/TX is active. Do you really want to assert SECRXCTRL at that moment? Probably better to do this only once at init time and then just use some stored value? Konstantin > > Signed-off-by: Radu Nicolau > --- > drivers/net/ixgbe/ixgbe_ethdev.c | 6 -- > drivers/net/ixgbe/ixgbe_ipsec.c | 15 +++ > drivers/net/ixgbe/ixgbe_ipsec.h | 1 + > 3 files changed, 20 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > b/drivers/net/ixgbe/ixgbe_ethdev.c > index 43e0132..4f2ab2f 100644 > --- a/drivers/net/ixgbe/ixgbe_ethdev.c > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > @@ -3685,8 +3685,10 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct > rte_eth_dev_info *dev_info) > dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM; > > #ifdef RTE_LIBRTE_SECURITY > - dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_SECURITY; > - dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_SECURITY; > + if (ixgbe_crypto_capable(dev)) { > + dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_SECURITY; > + dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_SECURITY; > + } > #endif > > dev_info->default_rxconf = (struct rte_eth_rxconf) { > diff --git a/drivers/net/ixgbe/ixgbe_ipsec.c b/drivers/net/ixgbe/ixgbe_ipsec.c > index 97f025a8..a495679 100644 > --- a/drivers/net/ixgbe/ixgbe_ipsec.c > +++ b/drivers/net/ixgbe/ixgbe_ipsec.c > @@ -602,6 +602,21 @@ ixgbe_crypto_capabilities_get(void *device __rte_unused) > > > int > +ixgbe_crypto_capable(struct rte_eth_dev *dev) > +{ > + struct ixgbe_hw *hw = 
IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > + uint32_t reg_i, reg, capable = 1; > + /* test if rx crypto can be enabled and then write back initial value*/ > + reg_i = IXGBE_READ_REG(hw, IXGBE_SECRXCTRL); > + IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, 0); > + reg = IXGBE_READ_REG(hw, IXGBE_SECRXCTRL); > + if (reg != 0) > + capable = 0; > + IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, reg_i); > + return capable; > +} > + > +int > ixgbe_crypto_enable_ipsec(struct rte_eth_dev *dev) > { > struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private); > diff --git a/drivers/net/ixgbe/ixgbe_ipsec.h b/drivers/net/ixgbe/ixgbe_ipsec.h > index acd9f3e..eeba39f 100644 > --- a/drivers/net/ixgbe/ixgbe_ipsec.h > +++ b/drivers/net/ixgbe/ixgbe_ipsec.h > @@ -112,6 +112,7 @@ struct ixgbe_ipsec { > > struct rte_security_ctx * > ixgbe_ipsec_ctx_create(struct rte_eth_dev *dev); > +int ixgbe_crypto_capable(struct rte_eth_dev *dev); > int ixgbe_crypto_enable_ipsec(struct rte_eth_dev *dev); > int ixgbe_crypto_add_ingress_sa_from_flow(const void *sess, > const void *ip_spec, > -- > 2.7.5
Re: [dpdk-dev] [PATCH] net/ixgbe: check if security capabilities are enabled by HW
> -Original Message- > From: Ananyev, Konstantin > Sent: Wednesday, January 17, 2018 11:34 AM > To: Nicolau, Radu ; dev@dpdk.org > Cc: Lu, Wenzhuo ; Zhao, XinfengX > ; De Lara Guarch, Pablo > > Subject: RE: [PATCH] net/ixgbe: check if security capabilities are enabled by > HW > > Hi Radu, > > > -Original Message- > > From: Nicolau, Radu > > Sent: Wednesday, January 17, 2018 11:19 AM > > To: dev@dpdk.org > > Cc: Lu, Wenzhuo ; Ananyev, Konstantin > > ; Zhao, XinfengX > > ; De Lara Guarch, Pablo > > ; Nicolau, Radu > > > > Subject: [PATCH] net/ixgbe: check if security capabilities are enabled > > by HW > > > > Check if the security enable bits are not fused before setting offload > > capabilities for security > > In theory dev_info_get() - could be called at any stage of device > configuration or even when RX/TX is active. > Do you really want to assert SECRXCTRL at that moment? > Probably better to do this only once at init time and then just use some > stored value? > Konstantin > Yes, that's true. I will send a v2
[dpdk-dev] [PATCH v2] net/ixgbe: check if security capabilities are enabled by HW
Check if the security enable bits are not fused before setting offload
capabilities for security

Signed-off-by: Radu Nicolau
---
 drivers/net/ixgbe/ixgbe_ethdev.c | 20 +++-
 drivers/net/ixgbe/ixgbe_ipsec.c  | 31 +--
 2 files changed, 36 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 43e0132..67ce052 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1141,13 +1141,6 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 		return 0;
 	}
 
-#ifdef RTE_LIBRTE_SECURITY
-	/* Initialize security_ctx only for primary process*/
-	eth_dev->security_ctx = ixgbe_ipsec_ctx_create(eth_dev);
-	if (eth_dev->security_ctx == NULL)
-		return -ENOMEM;
-#endif
-
 	rte_eth_copy_pci_info(eth_dev, pci_dev);
 
 	/* Vendor and Device ID need to be set before init of shared code */
@@ -1174,6 +1167,13 @@ eth_ixgbe_dev_init(struct rte_eth_dev *eth_dev)
 	/* Unlock any pending hardware semaphore */
 	ixgbe_swfw_lock_reset(hw);
 
+#ifdef RTE_LIBRTE_SECURITY
+	/* Initialize security_ctx only for primary process*/
+	eth_dev->security_ctx = ixgbe_ipsec_ctx_create(eth_dev);
+	if (eth_dev->security_ctx == NULL)
+		return -ENOMEM;
+#endif
+
 	/* Initialize DCB configuration*/
 	memset(dcb_config, 0, sizeof(struct ixgbe_dcb_config));
 	ixgbe_dcb_init(hw, dcb_config);
@@ -3685,8 +3685,10 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev, struct rte_eth_dev_info *dev_info)
 	dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_OUTER_IPV4_CKSUM;
 
 #ifdef RTE_LIBRTE_SECURITY
-	dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_SECURITY;
-	dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_SECURITY;
+	if (dev->security_ctx) {
+		dev_info->rx_offload_capa |= DEV_RX_OFFLOAD_SECURITY;
+		dev_info->tx_offload_capa |= DEV_TX_OFFLOAD_SECURITY;
+	}
 #endif
 
 	dev_info->default_rxconf = (struct rte_eth_rxconf) {
diff --git a/drivers/net/ixgbe/ixgbe_ipsec.c b/drivers/net/ixgbe/ixgbe_ipsec.c
index 97f025a8..a9e501e 100644
--- a/drivers/net/ixgbe/ixgbe_ipsec.c
+++ b/drivers/net/ixgbe/ixgbe_ipsec.c
@@ -694,15 +694,34 @@ static struct rte_security_ops ixgbe_security_ops = {
 	.capabilities_get = ixgbe_crypto_capabilities_get
 };
 
+static int
+ixgbe_crypto_capable(struct rte_eth_dev *dev)
+{
+	struct ixgbe_hw *hw = IXGBE_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+	uint32_t reg_i, reg, capable = 1;
+	/* test if rx crypto can be enabled and then write back initial value*/
+	reg_i = IXGBE_READ_REG(hw, IXGBE_SECRXCTRL);
+	IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, 0);
+	reg = IXGBE_READ_REG(hw, IXGBE_SECRXCTRL);
+	if (reg != 0)
+		capable = 0;
+	IXGBE_WRITE_REG(hw, IXGBE_SECRXCTRL, reg_i);
+	return capable;
+}
+
 struct rte_security_ctx *
 ixgbe_ipsec_ctx_create(struct rte_eth_dev *dev)
 {
-	struct rte_security_ctx *ctx = rte_malloc("rte_security_instances_ops",
-			sizeof(struct rte_security_ctx), 0);
-	if (ctx) {
-		ctx->device = (void *)dev;
-		ctx->ops = &ixgbe_security_ops;
-		ctx->sess_cnt = 0;
+	struct rte_security_ctx *ctx = NULL;
+
+	if (ixgbe_crypto_capable(dev)) {
+		ctx = rte_malloc("rte_security_instances_ops",
+				 sizeof(struct rte_security_ctx), 0);
+		if (ctx) {
+			ctx->device = (void *)dev;
+			ctx->ops = &ixgbe_security_ops;
+			ctx->sess_cnt = 0;
+		}
 	}
 	return ctx;
 }
--
2.7.5
Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership
Hi Konstantin From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM > Hi Matan, > > > Hi Konstantin > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM > > > Hi Matan, > > > > > > > > > > > Hi Konstantin > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM > > > > > Hi Matan, > > > > > > Hi Konstantin > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM > > > > > > > Hi Matan, > > > > > > > > Hi Konstantin > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 > > > > > > > > AM > > > > > > > > > Hi Matan, > > > > > > > > > > Hi Konstantin > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018 > > > > > > > > > > 2:40 PM > > > > > > > > > > > Hi Matan, > > > > > > > > > > > > Hi Konstantin > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10, > > > > > > > > > > > > 2018 > > > > > > > > > > > > 3:36 PM > > > > > > > > > > > > > Hi Matan, > > > > > > > > > > > > > > > > > It is good to see that now scanning/updating > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it > > > > > > > > > > > > > might be not very plausible to protect both > > > > > > > > > > > > > data[] and next_owner_id using the > > > > > > > same lock. > > > > > > > > > > > > > > > > > > > > > > > > I guess you mean to the owner structure in > > > > > > > rte_eth_dev_data[port_id]. > > > > > > > > > > > > The next_owner_id is read by ownership APIs(for > > > > > > > > > > > > owner validation), so it > > > > > > > > > > > makes sense to use the same lock. > > > > > > > > > > > > Actually, why not? > > > > > > > > > > > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are > > > > > > > > > > > not directly > > > > > > > > > related. > > > > > > > > > > > You may create new owner_id but it doesn't mean you > > > > > > > > > > > would update rte_eth_dev_data[] immediately. 
> > > > > > > > > > > And visa-versa - you might just want to update > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id. > > > > > > > > > > > It is not very good coding practice to use same lock > > > > > > > > > > > for non-related data structures. > > > > > > > > > > > > > > > > > > > > > I see the relation like next: > > > > > > > > > > Since the ownership mechanism synchronization is in > > > > > > > > > > ethdev responsibility, we must protect against user > > > > > > > > > > mistakes as much as we can by > > > > > > > > > using the same lock. > > > > > > > > > > So, if user try to set by invalid owner (exactly the > > > > > > > > > > ID which currently is > > > > > > > > > allocated) we can protect on it. > > > > > > > > > > > > > > > > > > Hmm, not sure why you can't do same checking with > > > > > > > > > different lock or atomic variable? > > > > > > > > > > > > > > > > > The set ownership API is protected by ownership lock and > > > > > > > > checks the owner ID validity By reading the next owner ID. > > > > > > > > So, the owner ID allocation and set API should use the > > > > > > > > same atomic > > > > > > > mechanism. > > > > > > > > > > > > > > Sure but all you are doing for checking validity, is check > > > > > > > that owner_id > 0 &&& owner_id < next_ownwe_id, right? > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you can > > > > > > > safely do same check with just atomic_get(&next_owner_id). > > > > > > > > > > > > > It will not protect it, scenario: > > > > > > - current next_id is X. > > > > > > - call set ownership of port A with owner id X by thread 0(by > > > > > > user > > > mistake). > > > > > > - context switch > > > > > > - allocate new id by thread 1 and get X and change next_id to > > > > > > X+1 > > > > > atomically. > > > > > > - context switch > > > > > > - Thread 0 validate X by atomic_read and succeed to take > ownership. 
> > > > > > - The system loosed the port(or will be managed by two > > > > > > entities) - > > > crash. > > > > > > > > > > > > > > > Ok, and how using lock will protect you with such scenario? > > > > > > > > The owner set API validation by thread 0 should fail because the > > > > owner > > > validation is included in the protected section. > > > > > > Then your validation function would fail even if you'll use atomic > > > ops instead of lock. > > No. > > With atomic this specific scenario will cause the validation to pass. > > Can you explain to me how? > > rte_eth_is_valid_owner_id(uint16_t owner_id) { > int32_t cur_owner_id = RTE_MIN(rte_atomic32_get(next_owner_id), > UINT16_MAX); > > if (owner_id == RTE_ETH_DEV_NO_OWNER || owner > > cur_owner_id) { > RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id); > return 0; > } > return 1; > } > > Let say your next_owne_id==X, and you invoke > rte_eth_is_valid_owner_id(owner_id=X+1) - it would fail. Explanation: The scenario with locks: next_owner_id = X. Thread 0 call to set API(with invalid owner Y=X) and take lock
Re: [dpdk-dev] [PATCH] doc: document the new devargs syntax
On 1/16/2018 2:50 PM, Yuanhan Liu wrote: > This patch documents the new devargs syntax, which is going to be > implemented in DPDK v18.05. > > The new devargs proposal is introduced for having a consistent > interface for: > > - whitelisting/blacklisting devices > - identifying ports > - attaching/detaching devices Hi Yuanhan, devargs = device arguments, the PMD specific arguments, similar to module_param in Linux. Currently only "--vdev" and -w/-b eal parameters parse proceeding strings as devargs. Like: "--vdev "net_pcap,iface=lo" . For this case "iface=lo" device specific argument and available to use from pcap PMD. I agree it to have a consistent way to describe device, that makes better whitelist/blacklist support. But that part is not device args, more like device identifier. When you use this string with whitelist/blacklist I think you won't need "iface=lo" part, only need first part. And when using with --vdev, (or perhaps with attach) you don't need to use first part "bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55", PMD already knows it is in virtual bus and its class etc. So does it make sense to separate them logically? Perhaps as "device identifier" and "device args". string can become: "device=bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55;driver=PMD_NAME,driverspecificproperty=VALUE" specific usages can become: -w "device=bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55" --vdev "driver=PMD_NAME,driverspecificproperty=VALUE" And store them in two separate storage, and eal or PMD can ask for "device identifier" or "device args" separately? What do you think? > > Please check the patch content for the details. 
Also, here is link > for the background: > http://dpdk.org/ml/archives/dev/2017-November/082600.html > > This syntax is suggestd by Thomas: > http://dpdk.org/ml/archives/dev/2017-December/084234.html > > Signed-off-by: Yuanhan Liu > --- > doc/guides/prog_guide/env_abstraction_layer.rst | 34 > + > 1 file changed, 34 insertions(+) > > diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst > b/doc/guides/prog_guide/env_abstraction_layer.rst > index 34d871c..12f37f2 100644 > --- a/doc/guides/prog_guide/env_abstraction_layer.rst > +++ b/doc/guides/prog_guide/env_abstraction_layer.rst > @@ -213,6 +213,40 @@ device having emitted a Device Removal Event. In such > case, calling > callback. Care must be taken not to close the device from the interrupt > handler > context. It is necessary to reschedule such closing operation. > > +Devargs > +~~~ > + > +The ``devargs`` can be used for whitelisting/blacklisting devices, > identifying > +DPDK ports and attaching/deatching devices. They all share the same syntax. > + > +It is split in 3 categories, where almost everything is optional key/value > pairs: > + > +* bus (pci, vdev, vmbus, fslmc, etc) > +* class (eth, crypto, etc) > +* driver (i40e, mlx5, virtio, etc) > + > +The key/value pair describing the category scope is mandatory and must be the > +first pair in the category properties. Example: bus=pci, must be placed > before > +id=:01:00.0. > + > +The syntax has below rules: > + > +* Between categories, the separator is a slash. > +* Inside a category, the separator is a comma. > +* Inside a key/value pair, the separator is an equal sign. > +* Each category can be used alone. > + > +Here is an example with all categories:: > + > + > bus=pci,id=:01:00.0/class=eth,mac=00:11:22:33:44:55/driver=PMD_NAME,driverspecificproperty=VALUE > + > +It can also be simple like below:: > + > +class=eth,mac=00:11:22:33:44:55 > + > +A device is identified when every properties are matched. 
Before device is > +probed, only the bus category is relevant. > + > Blacklisting > > >
[dpdk-dev] [PATCH V3] net/failsafe: add Rx interrupts
This patch adds support for registering and waiting for Rx interrupts in
the failsafe PMD. This allows applications to wait for Rx events from the
PMD using the DPDK rte_epoll subsystem.

The failsafe PMD presents to the application a facade of a single device
to be handled by the application, while internally it manages several
devices on its behalf, including packet transmission and reception.
The proposed failsafe Rx interrupt scheme follows this approach. The
failsafe PMD will present the application with a single set of Rx
interrupt vectors representing the failsafe Rx queues, while internally
it will serve as an interrupt proxy for its subdevices. This will allow
applications to wait for Rx traffic from the failsafe PMD by registering
and waiting for Rx events from its Rx queues.

In order to support this the following is suggested:
* Every Rx queue in the failsafe (virtual) device will be assigned a
  Linux event file descriptor (efd) and an enable_interrupts flag.
* The failsafe PMD will fill in its rte_intr_handle structure with the
  Rx efds assigned previously and register them with the EAL.
* The failsafe driver will create a private epoll fd (epfd) and will
  allocate enough space to handle all the Rx events from all its
  subdevices.
* Acting as an application, for each Rx queue in each active subdevice
  the failsafe will:
  o Register the Rx queue with the EAL.
  o Pass the EAL the failsafe private epoll fd as the epfd to register
    the Rx queue event on.
  o Pass the EAL, as a parameter, the pointer to the failsafe Rx queue
    that handles this Rx queue.
  o Using the DPDK service callbacks, the failsafe PMD will launch an
    Rx proxy service that will wait on the epoll fd for Rx events from
    the sub-devices.
  o For each Rx event received the proxy service will:
    - Retrieve the pointer to the failsafe Rx queue that handles this
      subdevice Rx queue from the user info returned by the EAL.
    - Trigger a failsafe Rx event on that queue by writing to the
      event fd, unless interrupts are disabled for that queue.
* The failsafe PMD will also implement the rx_queue_intr_enable and
  rx_queue_intr_disable routines that will enable and disable Rx
  interrupts respectively on both the failsafe and its subdevices.

Signed-off-by: Moti Haimovsky
---
V3: Fixed build failures in FreeBSD10.3_64
V2: Modifications according to inputs from Stephen Hemminger:
 * Removed unneeded (void *) casting.
 * Fixed coding style warning.
---
 doc/guides/nics/features/failsafe.ini   |   1 +
 drivers/net/failsafe/Makefile           |   1 +
 drivers/net/failsafe/failsafe.c         |   4 +
 drivers/net/failsafe/failsafe_ether.c   |   1 +
 drivers/net/failsafe/failsafe_intr.c    | 595
 drivers/net/failsafe/failsafe_ops.c     |  28 ++
 drivers/net/failsafe/failsafe_private.h |  44 +++
 7 files changed, 674 insertions(+)
 create mode 100644 drivers/net/failsafe/failsafe_intr.c

diff --git a/doc/guides/nics/features/failsafe.ini b/doc/guides/nics/features/failsafe.ini
index a42e344..39ee579 100644
--- a/doc/guides/nics/features/failsafe.ini
+++ b/doc/guides/nics/features/failsafe.ini
@@ -6,6 +6,7 @@
 [Features]
 Link status = Y
 Link status event= Y
+Rx interrupt = Y
 MTU update = Y
 Jumbo frame = Y
 Promiscuous mode = Y
diff --git a/drivers/net/failsafe/Makefile b/drivers/net/failsafe/Makefile
index ea2a8fe..91a734b 100644
--- a/drivers/net/failsafe/Makefile
+++ b/drivers/net/failsafe/Makefile
@@ -46,6 +46,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ops.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_rxtx.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_ether.c
 SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_flow.c
+SRCS-$(CONFIG_RTE_LIBRTE_PMD_FAILSAFE) += failsafe_intr.c
 
 # No exported include files
diff --git a/drivers/net/failsafe/failsafe.c b/drivers/net/failsafe/failsafe.c
index b767352..35fcdc2 100644
--- a/drivers/net/failsafe/failsafe.c
+++ b/drivers/net/failsafe/failsafe.c
@@ -244,6 +244,10 @@
 	      mac->addr_bytes[2], mac->addr_bytes[3],
 	      mac->addr_bytes[4], mac->addr_bytes[5]);
 	dev->data->dev_flags |= RTE_ETH_DEV_INTR_LSC;
+	PRIV(dev)->intr_handle = (struct rte_intr_handle){
+		.fd = -1,
+		.type = RTE_INTR_HANDLE_EXT,
+	};
 	return 0;
 free_args:
 	failsafe_args_free(dev);
diff --git a/drivers/net/failsafe/failsafe_ether.c b/drivers/net/failsafe/failsafe_ether.c
index 8a4cacf..0f1630e 100644
--- a/drivers/net/failsafe/failsafe_ether.c
+++ b/drivers/net/failsafe/failsafe_ether.c
@@ -283,6 +283,7 @@
 		return;
 	switch (sdev->state) {
 	case DEV_STARTED:
+		failsafe_rx_intr_uninstall_subdevice(sdev);
 		rte_eth_dev_stop(PORT_ID(sdev));
 		sdev->state = DEV_AC
Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership
> > > Hi Konstantin > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM > > Hi Matan, > > > > > Hi Konstantin > > > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM > > > > Hi Matan, > > > > > > > > > > > > > > Hi Konstantin > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM > > > > > > Hi Matan, > > > > > > > Hi Konstantin > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM > > > > > > > > Hi Matan, > > > > > > > > > Hi Konstantin > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 > > > > > > > > > AM > > > > > > > > > > Hi Matan, > > > > > > > > > > > Hi Konstantin > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018 > > > > > > > > > > > 2:40 PM > > > > > > > > > > > > Hi Matan, > > > > > > > > > > > > > Hi Konstantin > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10, > > > > > > > > > > > > > 2018 > > > > > > > > > > > > > 3:36 PM > > > > > > > > > > > > > > Hi Matan, > > > > > > > > > > > > > > > > > > > It is good to see that now scanning/updating > > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it > > > > > > > > > > > > > > might be not very plausible to protect both > > > > > > > > > > > > > > data[] and next_owner_id using the > > > > > > > > same lock. > > > > > > > > > > > > > > > > > > > > > > > > > > I guess you mean to the owner structure in > > > > > > > > rte_eth_dev_data[port_id]. > > > > > > > > > > > > > The next_owner_id is read by ownership APIs(for > > > > > > > > > > > > > owner validation), so it > > > > > > > > > > > > makes sense to use the same lock. > > > > > > > > > > > > > Actually, why not? > > > > > > > > > > > > > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are > > > > > > > > > > > > not directly > > > > > > > > > > related. 
> > > > > > > > > > > > You may create new owner_id but it doesn't mean you > > > > > > > > > > > > would update rte_eth_dev_data[] immediately. > > > > > > > > > > > > And visa-versa - you might just want to update > > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id. > > > > > > > > > > > > It is not very good coding practice to use same lock > > > > > > > > > > > > for non-related data structures. > > > > > > > > > > > > > > > > > > > > > > > I see the relation like next: > > > > > > > > > > > Since the ownership mechanism synchronization is in > > > > > > > > > > > ethdev responsibility, we must protect against user > > > > > > > > > > > mistakes as much as we can by > > > > > > > > > > using the same lock. > > > > > > > > > > > So, if user try to set by invalid owner (exactly the > > > > > > > > > > > ID which currently is > > > > > > > > > > allocated) we can protect on it. > > > > > > > > > > > > > > > > > > > > Hmm, not sure why you can't do same checking with > > > > > > > > > > different lock or atomic variable? > > > > > > > > > > > > > > > > > > > The set ownership API is protected by ownership lock and > > > > > > > > > checks the owner ID validity By reading the next owner ID. > > > > > > > > > So, the owner ID allocation and set API should use the > > > > > > > > > same atomic > > > > > > > > mechanism. > > > > > > > > > > > > > > > > Sure but all you are doing for checking validity, is check > > > > > > > > that owner_id > 0 &&& owner_id < next_ownwe_id, right? > > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you can > > > > > > > > safely do same check with just atomic_get(&next_owner_id). > > > > > > > > > > > > > > > It will not protect it, scenario: > > > > > > > - current next_id is X. > > > > > > > - call set ownership of port A with owner id X by thread 0(by > > > > > > > user > > > > mistake). 
> > > > > > > - context switch > > > > > > > - allocate new id by thread 1 and get X and change next_id to > > > > > > > X+1 > > > > > > atomically. > > > > > > > - context switch > > > > > > > - Thread 0 validate X by atomic_read and succeed to take > > ownership. > > > > > > > - The system loosed the port(or will be managed by two > > > > > > > entities) - > > > > crash. > > > > > > > > > > > > > > > > > > Ok, and how using lock will protect you with such scenario? > > > > > > > > > > The owner set API validation by thread 0 should fail because the > > > > > owner > > > > validation is included in the protected section. > > > > > > > > Then your validation function would fail even if you'll use atomic > > > > ops instead of lock. > > > No. > > > With atomic this specific scenario will cause the validation to pass. > > > > Can you explain to me how? > > > > rte_eth_is_valid_owner_id(uint16_t owner_id) { > > int32_t cur_owner_id = > > RTE_MIN(rte_atomic32_get(next_owner_id), > > UINT16_MAX); > > > > if (owner_id == RTE_ETH_DEV_NO_OWNER || owner > > > cur_owner_id) { > > RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id); > > return 0; > > } > > return 1; > >
Re: [dpdk-dev] [PATCH v2 3/4] eal: add synchronous multi-process communication
On 1/17/2018 6:50 PM, Ananyev, Konstantin wrote: Hi Jianfeng, -Original Message- From: Tan, Jianfeng Sent: Tuesday, January 16, 2018 8:11 AM To: Ananyev, Konstantin ; dev@dpdk.org; Burakov, Anatoly Cc: Richardson, Bruce ; tho...@monjalon.net Subject: Re: [PATCH v2 3/4] eal: add synchronous multi-process communication Thank you, Konstantin and Anatoly firstly. Other comments are well received and I'll send out a new version. On 1/16/2018 8:00 AM, Ananyev, Konstantin wrote: We need the synchronous way for multi-process communication, i.e., blockingly waiting for reply message when we send a request to the peer process. We add two APIs rte_eal_mp_request() and rte_eal_mp_reply() for such use case. By invoking rte_eal_mp_request(), a request message is sent out, and then it waits there for a reply message. The timeout is hard-coded 5 Sec. And the replied message will be copied in the parameters of this API so that the caller can decide how to translate those information (including params and fds). Note if a primary process owns multiple secondary processes, this API will fail. The API rte_eal_mp_reply() is always called by an mp action handler. Here we add another parameter for rte_eal_mp_t so that the action handler knows which peer address to reply. We use mutex in rte_eal_mp_request() to guarantee that only one request is on the fly for one pair of processes. You don't need to do things in such strange and restrictive way. Instead you can do something like that: 1) Introduce new struct, list for it and mutex struct sync_request { int reply_received; char dst[PATH_MAX]; char reply[...]; LIST_ENTRY(sync_request) next; }; static struct LIST_HEAD(list, sync_request); pthread_mutex_t lock; pthead_cond_t cond; } sync_requests; 2) then at request() call: Grab sync_requests.lock Check do we already have a pending request for that destination, If yes - the release the lock and returns with error. 
- allocate and init a new sync_request struct, set reply_received=0 - do send_msg() - then in a cycle: pthread_cond_timedwait(&sync_requests.cond, &sync_requests.lock, &timespec); - at return from it check if sync_request.reply_received == 1; if not, check if the timeout expired and either return a failure or go to the start of the cycle. 3) at mp_handler() if REPLY received - grab sync_requests.lock, search through sync_requests.list for dst[], if found, then set its reply_received=1, copy the received message into reply and call pthread_cond_broadcast(&sync_requests.cond); The only benefit I can see is that now the sender can request to multiple receivers at the same time. And it makes things more complicated. Do we really need this? The benefit is that while one thread is blocked waiting for a response, your mp_handler can still receive and handle other messages. This can already be done in the original implementation. mp_handler listens for msg, request from the other peer(s), and replies to the requests, which is not affected. Plus, as you said - other threads can keep sending messages. For this one, in the original implementation, other threads can still send msg, but not request. I suppose the request is not in a fast path, so why do we care to make it fast? 
+int +rte_eal_mp_request(const char *action_name, + void *params, + int len_p, + int fds[], + int fds_in, + int fds_out) +{ + int i, j; + int sockfd; + int nprocs; + int ret = 0; + struct mp_msghdr *req; + struct timeval tv; + char buf[MAX_MSG_LENGTH]; + struct mp_msghdr *hdr; + + RTE_LOG(DEBUG, EAL, "request: %s\n", action_name); + + if (fds_in > SCM_MAX_FD || fds_out > SCM_MAX_FD) { + RTE_LOG(ERR, EAL, "Cannot send more than %d FDs\n", SCM_MAX_FD); + rte_errno = -E2BIG; + return 0; + } + + req = format_msg(action_name, params, len_p, fds_in, MP_REQ); + if (req == NULL) + return 0; + + if ((sockfd = open_unix_fd(0)) < 0) { + free(req); + return 0; + } + + tv.tv_sec = 5; /* 5 Secs Timeout */ + tv.tv_usec = 0; + if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, + (const void *)&tv, sizeof(struct timeval)) < 0) + RTE_LOG(INFO, EAL, "Failed to set recv timeout\n"); If you set it just for one call, why do you not restore it? Yes, the original code is buggy, I should have put it into the critical section. Do you mean we just create it once and use it forever? If yes, we could put the open and setting into mp_init(). Also I don't think it is a good idea to change it here - if you'll make timeout a parameter value - then it could be overwritten by different threads. For simplicity, I'm not inclined to put the timeout as a parameter
Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership
Hi Konstantin From: Ananyev, Konstantin, Wednesday, January 17, 2018 2:55 PM > > > > > > Hi Konstantin > > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM > > > Hi Matan, > > > > > > > Hi Konstantin > > > > > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM > > > > > Hi Matan, > > > > > > > > > > > > > > > > > Hi Konstantin > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM > > > > > > > Hi Matan, > > > > > > > > Hi Konstantin > > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 > > > > > > > > PM > > > > > > > > > Hi Matan, > > > > > > > > > > Hi Konstantin > > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 > > > > > > > > > > 2:02 AM > > > > > > > > > > > Hi Matan, > > > > > > > > > > > > Hi Konstantin > > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, > > > > > > > > > > > > 2018 > > > > > > > > > > > > 2:40 PM > > > > > > > > > > > > > Hi Matan, > > > > > > > > > > > > > > Hi Konstantin > > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January > > > > > > > > > > > > > > 10, > > > > > > > > > > > > > > 2018 > > > > > > > > > > > > > > 3:36 PM > > > > > > > > > > > > > > > Hi Matan, > > > > > > > > > > > > > > > > > > > > > It is good to see that now scanning/updating > > > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it > > > > > > > > > > > > > > > might be not very plausible to protect both > > > > > > > > > > > > > > > data[] and next_owner_id using the > > > > > > > > > same lock. > > > > > > > > > > > > > > > > > > > > > > > > > > > > I guess you mean to the owner structure in > > > > > > > > > rte_eth_dev_data[port_id]. > > > > > > > > > > > > > > The next_owner_id is read by ownership > > > > > > > > > > > > > > APIs(for owner validation), so it > > > > > > > > > > > > > makes sense to use the same lock. > > > > > > > > > > > > > > Actually, why not? 
> > > > > > > > > > > > > > > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] > > > > > > > > > > > > > are not directly > > > > > > > > > > > related. > > > > > > > > > > > > > You may create new owner_id but it doesn't mean > > > > > > > > > > > > > you would update rte_eth_dev_data[] immediately. > > > > > > > > > > > > > And visa-versa - you might just want to update > > > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id. > > > > > > > > > > > > > It is not very good coding practice to use same > > > > > > > > > > > > > lock for non-related data structures. > > > > > > > > > > > > > > > > > > > > > > > > > I see the relation like next: > > > > > > > > > > > > Since the ownership mechanism synchronization is > > > > > > > > > > > > in ethdev responsibility, we must protect against > > > > > > > > > > > > user mistakes as much as we can by > > > > > > > > > > > using the same lock. > > > > > > > > > > > > So, if user try to set by invalid owner (exactly > > > > > > > > > > > > the ID which currently is > > > > > > > > > > > allocated) we can protect on it. > > > > > > > > > > > > > > > > > > > > > > Hmm, not sure why you can't do same checking with > > > > > > > > > > > different lock or atomic variable? > > > > > > > > > > > > > > > > > > > > > The set ownership API is protected by ownership lock > > > > > > > > > > and checks the owner ID validity By reading the next owner > ID. > > > > > > > > > > So, the owner ID allocation and set API should use the > > > > > > > > > > same atomic > > > > > > > > > mechanism. > > > > > > > > > > > > > > > > > > Sure but all you are doing for checking validity, is > > > > > > > > > check that owner_id > 0 &&& owner_id < next_ownwe_id, > right? > > > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you > > > > > > > > > can safely do same check with just > atomic_get(&next_owner_id). 
> > > > > > > > > > > > > > > > > It will not protect it, scenario: > > > > > > > > - current next_id is X. > > > > > > > > - call set ownership of port A with owner id X by thread > > > > > > > > 0(by user > > > > > mistake). > > > > > > > > - context switch > > > > > > > > - allocate new id by thread 1 and get X and change next_id > > > > > > > > to > > > > > > > > X+1 > > > > > > > atomically. > > > > > > > > - context switch > > > > > > > > - Thread 0 validate X by atomic_read and succeed to take > > > ownership. > > > > > > > > - The system loosed the port(or will be managed by two > > > > > > > > entities) - > > > > > crash. > > > > > > > > > > > > > > > > > > > > > Ok, and how using lock will protect you with such scenario? > > > > > > > > > > > > The owner set API validation by thread 0 should fail because > > > > > > the owner > > > > > validation is included in the protected section. > > > > > > > > > > Then your validation function would fail even if you'll use > > > > > atomic ops instead of lock. > > > > No. > > > > With atomic this specific scenario will cause the validation to pass. > > > > > > Can you explain to
Re: [dpdk-dev] [PATCH v2 3/4] eal: add synchronous multi-process communication
On 1/17/2018 9:09 PM, Tan, Jianfeng wrote: On 1/17/2018 6:50 PM, Ananyev, Konstantin wrote: [...] +int +rte_eal_mp_request(const char *action_name, + void *params, + int len_p, + int fds[], + int fds_in, + int fds_out) +{ +int i, j; +int sockfd; +int nprocs; +int ret = 0; +struct mp_msghdr *req; +struct timeval tv; +char buf[MAX_MSG_LENGTH]; +struct mp_msghdr *hdr; + +RTE_LOG(DEBUG, EAL, "request: %s\n", action_name); + +if (fds_in > SCM_MAX_FD || fds_out > SCM_MAX_FD) { +RTE_LOG(ERR, EAL, "Cannot send more than %d FDs\n", SCM_MAX_FD); +rte_errno = -E2BIG; +return 0; +} + +req = format_msg(action_name, params, len_p, fds_in, MP_REQ); +if (req == NULL) +return 0; + +if ((sockfd = open_unix_fd(0)) < 0) { +free(req); +return 0; +} + +tv.tv_sec = 5; /* 5 Secs Timeout */ +tv.tv_usec = 0; +if (setsockopt(sockfd, SOL_SOCKET, SO_RCVTIMEO, +(const void *)&tv, sizeof(struct timeval)) < 0) +RTE_LOG(INFO, EAL, "Failed to set recv timeout\n"); If you set it just for one call, why do you not restore it? Yes, the original code is buggy, I should have put it into the critical section. Do you mean we just create it once and use it forever? If yes, we could put the open and setting into mp_init(). On second thought, we shall not put the setting into mp_init(). It'll be set to non-blocking when sending a msg, but blocking when receiving a msg. Thanks, Jianfeng
Re: [dpdk-dev] [PATCH v2 3/3] logs: remove log level config option
12/12/2017 17:36, Olivier MATZ: > On Sat, Dec 09, 2017 at 06:53:58PM +0530, Pavan Nikhilesh wrote: > > Remove RTE_LOG_LEVEL config option, use existing RTE_LOG_DP_LEVEL config > > option for controlling datapath log level. > > RTE_LOG_LEVEL is no longer needed as dynamic logging can be used to > > control global and module specific log levels. > > > > Signed-off-by: Pavan Nikhilesh > > Reviewed-by: Olivier Matz Series applied, thanks
[dpdk-dev] [Bug 10] [Testpmd] NUMA, speed issue
https://dpdk.org/tracker/show_bug.cgi?id=10 Bug ID: 10 Summary: [Testpmd] NUMA, speed issue Product: DPDK Version: unspecified Hardware: x86 OS: All Status: CONFIRMED Severity: normal Priority: Normal Component: testpmd Assignee: dev@dpdk.org Reporter: nounous...@hotmail.com Target Milestone: --- Hello, I need help managing packets using DPDK on an Intel Xeon chip. When I launch testpmd, I'm wondering whether the output traces below point to a blocking issue for checking bandwidth: >./testpmd -l 0-3 -n 4 -- -i --portmask=0x1 --nb-cores=2 EAL: Detected 8 lcore(s) EAL: 1024 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size EAL: Probing VFIO support... EAL: cannot open /proc/self/numa_maps, consider that all memory is in socket_id 0 EAL: PCI device 0000:01:00.0 on NUMA socket 0 EAL: probe driver: 8086:15a4 net_fm10k EAL: PCI device 0000:02:00.0 on NUMA socket 0 EAL: probe driver: 8086:15a4 net_fm10k EAL: PCI device 0000:04:00.0 on NUMA socket 0 EAL: probe driver: 8086:15ab net_ixgbe EAL: PCI device 0000:04:00.1 on NUMA socket 0 EAL: probe driver: 8086:15ab net_ixgbe EAL: PCI device 0000:04:10.1 on NUMA socket 0 EAL: probe driver: 8086:15a8 net_ixgbe_vf EAL: PCI device 0000:04:10.3 on NUMA socket 0 EAL: probe driver: 8086:15a8 net_ixgbe_vf EAL: PCI device 0000:04:10.5 on NUMA socket 0 EAL: probe driver: 8086:15a8 net_ixgbe_vf EAL: PCI device 0000:06:00.0 on NUMA socket 0 EAL: probe driver: 8086:15a4 net_fm10k EAL: PCI device 0000:08:00.0 on NUMA socket 0 EAL: probe driver: 8086:1533 net_e1000_igb Interactive-mode selected previous number of forwarding ports 2 - changed to number of configured ports 1 USER1: create a new mbuf pool : n=171456, size=2240, socket=0 Warning! Cannot handle an odd number of ports with the current port topology. 
Configuration must be changed to have an even number of ports, or relaunch application with --port-topology=chained Configuring Port 0 (socket 0) PMD: fm10k_dev_configure(): fm10k always strip CRC Port 0: 00:A0:C9:23:45:69 Configuring Port 1 (socket 0) PMD: fm10k_dev_configure(): fm10k always strip CRC Port 1: 00:A0:C9:23:45:6A Checking link statuses... Port 0 Link Up - speed 0 Mbps - full-duplex Port 1 Link Up - speed 0 Mbps - full-duplex The traces show that there are NUMA, speed and hugepage issues. Do you have any idea? Thank you -- You are receiving this mail because: You are the assignee for the bug.
Re: [dpdk-dev] [PATCH v2 1/8] eal: introduce DMA memory barriers
16/01/2018 10:10, Jianbo Liu: > The 01/16/2018 10:49, Andrew Rybchenko wrote: > > On 01/16/2018 04:10 AM, Yongseok Koh wrote: > > >This commit introduces rte_dma_wmb() and rte_dma_rmb(), in order to > > >guarantee the ordering of coherent shared memory between the CPU and a DMA > > >capable device. > > > > > >Signed-off-by: Yongseok Koh > > >--- > > > lib/librte_eal/common/include/generic/rte_atomic.h | 18 > > > ++ > > > 1 file changed, 18 insertions(+) > > > > > >diff --git a/lib/librte_eal/common/include/generic/rte_atomic.h > > >b/lib/librte_eal/common/include/generic/rte_atomic.h > > >index 16af5ca57..2e0503ce6 100644 > > >--- a/lib/librte_eal/common/include/generic/rte_atomic.h > > >+++ b/lib/librte_eal/common/include/generic/rte_atomic.h > > >@@ -98,6 +98,24 @@ static inline void rte_io_wmb(void); > > > */ > > > static inline void rte_io_rmb(void); > > >+/** > > >+ * Write memory barrier for coherent memory between lcore and IO device > > >+ * > > >+ * Guarantees that the STORE operations on coherent memory that > > >+ * precede the rte_dma_wmb() call are visible to I/O device before the > > >+ * STORE operations that follow it. > > >+ */ > > >+static inline void rte_dma_wmb(void); > > >+ > > >+/** > > >+ * Read memory barrier for coherent memory between lcore and IO device > > >+ * > > >+ * Guarantees that the LOAD operations on coherent memory updated by > > >+ * IO device that precede the rte_dma_rmb() call are visible to CPU > > >+ * before the LOAD operations that follow it. > > >+ */ > > >+static inline void rte_dma_rmb(void); > > >+ > > > #endif /* __DOXYGEN__ */ > > > /** > > > > I'm not an ARMv8 expert so, my comments could be a bit ignorant. > > I'd like to understand the difference between io and added here dma > > barriers. > > The difference should be clearly explained. Otherwise we'll constantly hit > > on incorrect choice of barrier type. 
> > Also I don't understand why "dma" name is chosen taking into account > > that definition is bound to coherent memory between lcore and IO device. > > A good explanation can be found here. > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1077fa36f23e259858caf6f269a47393a5aff523 I agree that something more is needed to explain when to use rte_io_*. The only difference between rte_io_* and rte_dma_* is "on coherent memory".
[dpdk-dev] [PATCH v5] vhost_user: protect active rings from async ring changes
When performing live migration or memory hot-plugging, the changes to the device and vrings are made by the message handler independently of vring usage by the PMD threads. This causes, for example, segfaults during live migration with MQ enabled, but in general virtually any request sent by qemu that changes the state of the device can cause problems. This patch fixes all the above issues by adding a spinlock to every vring and requiring the message handler to start its operation only after ensuring that all PMD threads related to the device are out of the critical section accessing the vring data. Each vring has its own lock in order not to create contention between PMD threads of different vrings and to prevent performance degradation when scaling the queue pair number. See https://bugzilla.redhat.com/show_bug.cgi?id=1450680 Signed-off-by: Victor Kaplansky --- v5: o get rid of spinlock wrapping functions in vhost.h v4: o moved access_unlock before accessing enable flag and access_unlock after iommu_unlock consistently. o cosmetics: removed blank line. o the access_lock variable moved to be in the same cache line with enable and access_ok flags. o dequeue path is now guarded with trylock and returns zero if unsuccessful. o GET_VRING_BASE operation is not guarded by access lock to avoid deadlock with device_destroy. See the comment in the code. o Fixed error path exit from enqueue and dequeue, carefully unlocking access and iommu locks as appropriate. v3: o Added locking to enqueue flow. o Enqueue path guarded as well as dequeue path. o Changed name of active_lock. o Added initialization of guarding spinlock. o Reworked functions skimming over all virt-queues. o Performance measurements done by Maxime Coquelin show no degradation in bandwidth and throughput. o Spelling. o Taking lock only on set operations. o IOMMU messages are not guarded by access lock. v2: o Fixed checkpatch complaints. o Added Signed-off-by. o Refined placement of guard to exclude IOMMU messages. 
o TODO: performance degradation measurement. lib/librte_vhost/vhost.h | 6 ++-- lib/librte_vhost/vhost.c | 1 + lib/librte_vhost/vhost_user.c | 70 +++ lib/librte_vhost/virtio_net.c | 28 ++--- 4 files changed, 99 insertions(+), 6 deletions(-) diff --git a/lib/librte_vhost/vhost.h b/lib/librte_vhost/vhost.h index 1cc81c17..c8f2a817 100644 --- a/lib/librte_vhost/vhost.h +++ b/lib/librte_vhost/vhost.h @@ -108,12 +108,14 @@ struct vhost_virtqueue { /* Backend value to determine if device should started/stopped */ int backend; + int enabled; + int access_ok; + rte_spinlock_t access_lock; + /* Used to notify the guest (trigger interrupt) */ int callfd; /* Currently unused as polling mode is enabled */ int kickfd; - int enabled; - int access_ok; /* Physical address of used ring, for logging */ uint64_tlog_guest_addr; diff --git a/lib/librte_vhost/vhost.c b/lib/librte_vhost/vhost.c index 4f8b73a0..dcc42fc7 100644 --- a/lib/librte_vhost/vhost.c +++ b/lib/librte_vhost/vhost.c @@ -259,6 +259,7 @@ alloc_vring_queue(struct virtio_net *dev, uint32_t vring_idx) dev->virtqueue[vring_idx] = vq; init_vring_queue(dev, vring_idx); + rte_spinlock_init(&vq->access_lock); dev->nr_vring += 1; diff --git a/lib/librte_vhost/vhost_user.c b/lib/librte_vhost/vhost_user.c index f4c7ce46..0685d4e7 100644 --- a/lib/librte_vhost/vhost_user.c +++ b/lib/librte_vhost/vhost_user.c @@ -1190,12 +1190,47 @@ vhost_user_check_and_alloc_queue_pair(struct virtio_net *dev, VhostUserMsg *msg) return alloc_vring_queue(dev, vring_idx); } +static void +vhost_user_lock_all_queue_pairs(struct virtio_net *dev) +{ + unsigned int i = 0; + unsigned int vq_num = 0; + + while (vq_num < dev->nr_vring) { + struct vhost_virtqueue *vq = dev->virtqueue[i]; + + if (vq) { + rte_spinlock_lock(&vq->access_lock); + vq_num++; + } + i++; + } +} + +static void +vhost_user_unlock_all_queue_pairs(struct virtio_net *dev) +{ + unsigned int i = 0; + unsigned int vq_num = 0; + + while (vq_num < dev->nr_vring) { + struct vhost_virtqueue *vq = 
dev->virtqueue[i]; + + if (vq) { + rte_spinlock_unlock(&vq->access_lock); + vq_num++; + } + i++; + } +} + int vhost_user_msg_handler(int vid, int fd) { struct virtio_net *dev; struct VhostUserMsg msg; int ret; + int unlock_required = 0; dev = get_device(vid); if (dev ==
Re: [dpdk-dev] [dpdk-stable] [PATCH] pdump: fix error code check when creating/canceling pthread
> > On error, pthread_create() returns a positive number (an errno) but does > > not set the errno variable. > > > > Fixes: 278f945402c5 ("pdump: add new library for packet capture") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Olivier Matz > > Acked-by: John McNamara Applied, thanks
Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership
On Wed, Jan 17, 2018 at 12:05:42PM +, Matan Azrad wrote: > > Hi Konstantin > From: Ananyev, Konstantin, Sent: Wednesday, January 17, 2018 1:24 PM > > Hi Matan, > > > > > Hi Konstantin > > > > > > From: Ananyev, Konstantin, Tuesday, January 16, 2018 9:11 PM > > > > Hi Matan, > > > > > > > > > > > > > > Hi Konstantin > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 8:44 PM > > > > > > Hi Matan, > > > > > > > Hi Konstantin > > > > > > > From: Ananyev, Konstantin, Monday, January 15, 2018 1:45 PM > > > > > > > > Hi Matan, > > > > > > > > > Hi Konstantin > > > > > > > > > From: Ananyev, Konstantin, Friday, January 12, 2018 2:02 > > > > > > > > > AM > > > > > > > > > > Hi Matan, > > > > > > > > > > > Hi Konstantin > > > > > > > > > > > From: Ananyev, Konstantin, Thursday, January 11, 2018 > > > > > > > > > > > 2:40 PM > > > > > > > > > > > > Hi Matan, > > > > > > > > > > > > > Hi Konstantin > > > > > > > > > > > > > From: Ananyev, Konstantin, Wednesday, January 10, > > > > > > > > > > > > > 2018 > > > > > > > > > > > > > 3:36 PM > > > > > > > > > > > > > > Hi Matan, > > > > > > > > > > > > > > > > > > > It is good to see that now scanning/updating > > > > > > > > > > > > > > rte_eth_dev_data[] is lock protected, but it > > > > > > > > > > > > > > might be not very plausible to protect both > > > > > > > > > > > > > > data[] and next_owner_id using the > > > > > > > > same lock. > > > > > > > > > > > > > > > > > > > > > > > > > > I guess you mean to the owner structure in > > > > > > > > rte_eth_dev_data[port_id]. > > > > > > > > > > > > > The next_owner_id is read by ownership APIs(for > > > > > > > > > > > > > owner validation), so it > > > > > > > > > > > > makes sense to use the same lock. > > > > > > > > > > > > > Actually, why not? > > > > > > > > > > > > > > > > > > > > > > > > Well to me next_owner_id and rte_eth_dev_data[] are > > > > > > > > > > > > not directly > > > > > > > > > > related. 
> > > > > > > > > > > > You may create new owner_id but it doesn't mean you > > > > > > > > > > > > would update rte_eth_dev_data[] immediately. > > > > > > > > > > > > And visa-versa - you might just want to update > > > > > > > > > > > > rte_eth_dev_data[].name or .owner_id. > > > > > > > > > > > > It is not very good coding practice to use same lock > > > > > > > > > > > > for non-related data structures. > > > > > > > > > > > > > > > > > > > > > > > I see the relation like next: > > > > > > > > > > > Since the ownership mechanism synchronization is in > > > > > > > > > > > ethdev responsibility, we must protect against user > > > > > > > > > > > mistakes as much as we can by > > > > > > > > > > using the same lock. > > > > > > > > > > > So, if user try to set by invalid owner (exactly the > > > > > > > > > > > ID which currently is > > > > > > > > > > allocated) we can protect on it. > > > > > > > > > > > > > > > > > > > > Hmm, not sure why you can't do same checking with > > > > > > > > > > different lock or atomic variable? > > > > > > > > > > > > > > > > > > > The set ownership API is protected by ownership lock and > > > > > > > > > checks the owner ID validity By reading the next owner ID. > > > > > > > > > So, the owner ID allocation and set API should use the > > > > > > > > > same atomic > > > > > > > > mechanism. > > > > > > > > > > > > > > > > Sure but all you are doing for checking validity, is check > > > > > > > > that owner_id > 0 &&& owner_id < next_ownwe_id, right? > > > > > > > > As you don't allow owner_id overlap (16/3248 bits) you can > > > > > > > > safely do same check with just atomic_get(&next_owner_id). > > > > > > > > > > > > > > > It will not protect it, scenario: > > > > > > > - current next_id is X. > > > > > > > - call set ownership of port A with owner id X by thread 0(by > > > > > > > user > > > > mistake). 
> > > > > > > - context switch > > > > > > > - allocate new id by thread 1 and get X and change next_id to > > > > > > > X+1 > > > > > > atomically. > > > > > > > - context switch > > > > > > > - Thread 0 validate X by atomic_read and succeed to take > > ownership. > > > > > > > - The system loosed the port(or will be managed by two > > > > > > > entities) - > > > > crash. > > > > > > > > > > > > > > > > > > Ok, and how using lock will protect you with such scenario? > > > > > > > > > > The owner set API validation by thread 0 should fail because the > > > > > owner > > > > validation is included in the protected section. > > > > > > > > Then your validation function would fail even if you'll use atomic > > > > ops instead of lock. > > > No. > > > With atomic this specific scenario will cause the validation to pass. > > > > Can you explain to me how? > > > > rte_eth_is_valid_owner_id(uint16_t owner_id) { > > int32_t cur_owner_id = > > RTE_MIN(rte_atomic32_get(next_owner_id), > > UINT16_MAX); > > > > if (owner_id == RTE_ETH_DEV_NO_OWNER || owner > > > cur_owner_id) { > > RTE_LOG(ERR, EAL, "Invalid owner_id=%d.\n", owner_id)
[dpdk-dev] [PATCH V5 1/2] net/tap: use new Tx offloads API
Ethdev Tx offloads API has changed since: commit cba7f53b717d ("ethdev: introduce Tx queue offloads API") This commit adds support for the new Tx offloads API. Signed-off-by: Moti Haimovsky --- V5: * Fixed compilation errors caused by not using PRIx64 in log messages when displaying uint64_t values. * Modified patch name V3: * Fixed coding style warnings V2: * Fixed coding style warnings --- drivers/net/tap/rte_eth_tap.c | 97 +-- drivers/net/tap/rte_eth_tap.h | 1 + 2 files changed, 86 insertions(+), 12 deletions(-) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index be73f93..7357dce 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -24,6 +24,7 @@ #include #include #include +#include #include #include #include @@ -346,6 +347,42 @@ enum ioctl_mode { return num_rx; } +static uint64_t +tap_tx_offload_get_port_capa(void) +{ + /* +* In order to support legacy apps, +* report capabilities also as port capabilities. +*/ + return DEV_TX_OFFLOAD_IPV4_CKSUM | + DEV_TX_OFFLOAD_UDP_CKSUM | + DEV_TX_OFFLOAD_TCP_CKSUM; +} + +static uint64_t +tap_tx_offload_get_queue_capa(void) +{ + return DEV_TX_OFFLOAD_IPV4_CKSUM | + DEV_TX_OFFLOAD_UDP_CKSUM | + DEV_TX_OFFLOAD_TCP_CKSUM; +} + +static bool +tap_txq_are_offloads_valid(struct rte_eth_dev *dev, uint64_t offloads) +{ + uint64_t port_offloads = dev->data->dev_conf.txmode.offloads; + uint64_t queue_supp_offloads = tap_tx_offload_get_queue_capa(); + uint64_t port_supp_offloads = tap_tx_offload_get_port_capa(); + + if ((offloads & (queue_supp_offloads | port_supp_offloads)) != + offloads) + return false; + /* Verify we have no conflict with port offloads */ + if ((port_offloads ^ offloads) & port_supp_offloads) + return false; + return true; +} + static void tap_tx_offload(char *packet, uint64_t ol_flags, unsigned int l2_len, unsigned int l3_len) @@ -432,9 +469,10 @@ enum ioctl_mode { rte_pktmbuf_mtod(seg, void *); seg = seg->next; } - if (mbuf->ol_flags & (PKT_TX_IP_CKSUM | 
PKT_TX_IPV4) || - (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM || - (mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM) { + if (txq->csum && + ((mbuf->ol_flags & (PKT_TX_IP_CKSUM | PKT_TX_IPV4) || +(mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_UDP_CKSUM || +(mbuf->ol_flags & PKT_TX_L4_MASK) == PKT_TX_TCP_CKSUM))) { /* Support only packets with all data in the same seg */ if (mbuf->nb_segs > 1) break; @@ -572,6 +610,17 @@ enum ioctl_mode { static int tap_dev_configure(struct rte_eth_dev *dev) { + uint64_t supp_tx_offloads = tap_tx_offload_get_port_capa(); + uint64_t tx_offloads = dev->data->dev_conf.txmode.offloads; + + if ((tx_offloads & supp_tx_offloads) != tx_offloads) { + rte_errno = ENOTSUP; + RTE_LOG(ERR, PMD, + "Some Tx offloads are not supported " + "requested 0x%" PRIx64 " supported 0x%" PRIx64 "\n", + tx_offloads, supp_tx_offloads); + return -rte_errno; + } if (dev->data->nb_rx_queues > RTE_PMD_TAP_MAX_QUEUES) { RTE_LOG(ERR, PMD, "%s: number of rx queues %d exceeds max num of queues %d\n", @@ -648,10 +697,9 @@ enum ioctl_mode { dev_info->rx_offload_capa = (DEV_RX_OFFLOAD_IPV4_CKSUM | DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM); - dev_info->tx_offload_capa = - (DEV_TX_OFFLOAD_IPV4_CKSUM | -DEV_TX_OFFLOAD_UDP_CKSUM | -DEV_TX_OFFLOAD_TCP_CKSUM); + dev_info->tx_queue_offload_capa = tap_tx_offload_get_queue_capa(); + dev_info->tx_offload_capa = dev_info->tx_queue_offload_capa | + tap_tx_offload_get_port_capa(); } static int @@ -1025,21 +1073,46 @@ enum ioctl_mode { uint16_t tx_queue_id, uint16_t nb_tx_desc __rte_unused, unsigned int socket_id __rte_unused, - const struct rte_eth_txconf *tx_conf __rte_unused) + const struct rte_eth_txconf *tx_conf) { struct pmd_internals *internals = dev->data->dev_private; + struct tx_queue *txq; int ret; if (tx_queue_id >= dev->data->nb_tx_queues) return -1; - dev->data->tx_queues[tx_queue_id] = &internals->txq[tx_queue_id]; + txq = dev->data->tx_q
[dpdk-dev] [PATCH V5 2/2] net/tap: use new Rx offloads API
Ethdev Rx offloads API has changed since: commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API") This commit adds support for the new Rx offloads API. Signed-off-by: Moti Haimovsky --- V5: * Fixed compilation errors caused by not using PRIx64 in log messages when displaying uint64_t values. * Modified patch name V4: Modifications according to inputs from Pascal Mazon * Removed extra braces. V3: * Fixed coding style warnings V2: * Fixed coding style warnings --- drivers/net/tap/rte_eth_tap.c | 67 +-- 1 file changed, 59 insertions(+), 8 deletions(-) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 7357dce..2eb8734 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -253,6 +253,43 @@ enum ioctl_mode { } } +static uint64_t +tap_rx_offload_get_port_capa(void) +{ + /* +* In order to support legacy apps, +* report capabilities also as port capabilities. +*/ + return DEV_RX_OFFLOAD_SCATTER | + DEV_RX_OFFLOAD_IPV4_CKSUM | + DEV_RX_OFFLOAD_UDP_CKSUM | + DEV_RX_OFFLOAD_TCP_CKSUM; +} + +static uint64_t +tap_rx_offload_get_queue_capa(void) +{ + return DEV_RX_OFFLOAD_SCATTER | + DEV_RX_OFFLOAD_IPV4_CKSUM | + DEV_RX_OFFLOAD_UDP_CKSUM | + DEV_RX_OFFLOAD_TCP_CKSUM; +} + +static bool +tap_rxq_are_offloads_valid(struct rte_eth_dev *dev, uint64_t offloads) +{ + uint64_t port_offloads = dev->data->dev_conf.rxmode.offloads; + uint64_t queue_supp_offloads = tap_rx_offload_get_queue_capa(); + uint64_t port_supp_offloads = tap_rx_offload_get_port_capa(); + + if ((offloads & (queue_supp_offloads | port_supp_offloads)) != + offloads) + return false; + if ((port_offloads ^ offloads) & port_supp_offloads) + return false; + return true; +} + /* Callback to handle the rx burst of packets to the correct interface and * file descriptor(s) in a multi-queue setup. */ @@ -277,8 +314,9 @@ enum ioctl_mode { int len; len = readv(rxq->fd, *rxq->iovecs, - 1 + (rxq->rxmode->enable_scatter ? 
-rxq->nb_rx_desc : 1)); + 1 + + (rxq->rxmode->offloads & DEV_RX_OFFLOAD_SCATTER ? +rxq->nb_rx_desc : 1)); if (len < (int)sizeof(struct tun_pi)) break; @@ -333,7 +371,7 @@ enum ioctl_mode { seg->next = NULL; mbuf->packet_type = rte_net_get_ptype(mbuf, NULL, RTE_PTYPE_ALL_MASK); - if (rxq->rxmode->hw_ip_checksum) + if (rxq->rxmode->offloads & DEV_RX_OFFLOAD_CHECKSUM) tap_verify_csum(mbuf); /* account for the receive frame */ @@ -694,12 +732,12 @@ enum ioctl_mode { dev_info->min_rx_bufsize = 0; dev_info->pci_dev = NULL; dev_info->speed_capa = tap_dev_speed_capa(); - dev_info->rx_offload_capa = (DEV_RX_OFFLOAD_IPV4_CKSUM | -DEV_RX_OFFLOAD_UDP_CKSUM | -DEV_RX_OFFLOAD_TCP_CKSUM); + dev_info->rx_queue_offload_capa = tap_rx_offload_get_queue_capa(); + dev_info->rx_offload_capa = tap_rx_offload_get_port_capa() | + dev_info->rx_queue_offload_capa; dev_info->tx_queue_offload_capa = tap_tx_offload_get_queue_capa(); - dev_info->tx_offload_capa = dev_info->tx_queue_offload_capa | - tap_tx_offload_get_port_capa(); + dev_info->tx_offload_capa = tap_tx_offload_get_port_capa() | + dev_info->tx_queue_offload_capa; } static int @@ -1015,6 +1053,19 @@ enum ioctl_mode { return -1; } + /* Verify application offloads are valid for our port and queue. */ + if (!tap_rxq_are_offloads_valid(dev, rx_conf->offloads)) { + rte_errno = ENOTSUP; + RTE_LOG(ERR, PMD, + "%p: Rx queue offloads 0x%" PRIx64 + " don't match port offloads 0x%" PRIx64 + " or supported offloads 0x%" PRIx64 "\n", + (void *)dev, rx_conf->offloads, + dev->data->dev_conf.rxmode.offloads, + (tap_rx_offload_get_port_capa() | +tap_rx_offload_get_queue_capa())); + return -rte_errno; + } rxq->mp = mp; rxq->trigger_seen = 1; /* force initial burst */ rxq->in_port = dev->data->port_id; -- 1.8.3.1
Re: [dpdk-dev] [PATCH v6 0/2] app/testpmd: fix invalid rxq and txq number settings
12/01/2018 12:31, Wei Dai: > If an invalid number of RX or TX queues is configured from a testpmd > command like "port config all rxq number" or "port config all txq > number", or from --rxq and --txq in the command to start testpmd, > the global variable nb_rxq or nb_txq is updated with the invalid > input. This can cause testpmd to crash. For example, if the maximum > number of RX or TX queues is 4, testpmd will crash after running > commands "port config all rxq 5", "port config all txq 5" and > "start" in sequence. > With these 2 patches, if an invalid input is detected, it is refused > and testpmd keeps the last correct values of nb_rxq and nb_txq. > > Fixes: ce8d561418d4 ("app/testpmd: add port configuration settings") > Cc: sta...@dpdk.org > > Signed-off-by: Wei Dai > Acked-by: Konstantin Ananyev > Tested-by: Yuan Peng Applied, thanks
[dpdk-dev] [PATCH v4 2/2] net/failsafe: use new Rx offloads API
Ethdev Rx offloads API has changed since: commit ce17eddefc20 ("ethdev: introduce Rx queue offloads API") This commit adds support for the new Rx offloads API. Signed-off-by: Moti Haimovsky --- V4: Modifications according to inputs from Gaetan Rivet in reply to 1515595223-36144-2-git-send-email-mo...@mellanox.com V2: * Fixed coding style warnings. --- drivers/net/failsafe/failsafe_ops.c | 45 - 1 file changed, 44 insertions(+), 1 deletion(-) diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c index 53a242e..a2c74f5 100644 --- a/drivers/net/failsafe/failsafe_ops.c +++ b/drivers/net/failsafe/failsafe_ops.c @@ -77,6 +77,13 @@ DEV_RX_OFFLOAD_UDP_CKSUM | DEV_RX_OFFLOAD_TCP_CKSUM | DEV_RX_OFFLOAD_TCP_LRO, + .rx_queue_offload_capa = + DEV_RX_OFFLOAD_VLAN_STRIP | + DEV_RX_OFFLOAD_QINQ_STRIP | + DEV_RX_OFFLOAD_IPV4_CKSUM | + DEV_RX_OFFLOAD_UDP_CKSUM | + DEV_RX_OFFLOAD_TCP_CKSUM | + DEV_RX_OFFLOAD_TCP_LRO, .tx_offload_capa = 0x0, .flow_type_rss_offloads = 0x0, }; @@ -255,6 +262,25 @@ fs_dev_free_queues(dev); } +static bool +fs_rxq_offloads_valid(struct rte_eth_dev *dev, uint64_t offloads) +{ + uint64_t port_offloads; + uint64_t queue_supp_offloads; + uint64_t port_supp_offloads; + + port_offloads = dev->data->dev_conf.rxmode.offloads; + queue_supp_offloads = PRIV(dev)->infos.rx_queue_offload_capa; + port_supp_offloads = PRIV(dev)->infos.rx_offload_capa; + if ((offloads & (queue_supp_offloads | port_supp_offloads)) != +offloads) + return false; + /* Verify we have no conflict with port offloads */ + if ((port_offloads ^ offloads) & port_supp_offloads) + return false; + return true; +} + static void fs_rx_queue_release(void *queue) { @@ -292,6 +318,18 @@ fs_rx_queue_release(rxq); dev->data->rx_queues[rx_queue_id] = NULL; } + /* Verify application offloads are valid for our port and queue. 
*/ + if (fs_rxq_offloads_valid(dev, rx_conf->offloads) == false) { + rte_errno = ENOTSUP; + ERROR("Rx queue offloads 0x%" PRIx64 + " don't match port offloads 0x%" PRIx64 + " or supported offloads 0x%" PRIx64, + rx_conf->offloads, + dev->data->dev_conf.rxmode.offloads, + PRIV(dev)->infos.rx_offload_capa | + PRIV(dev)->infos.rx_queue_offload_capa); + return -rte_errno; + } rxq = rte_zmalloc(NULL, sizeof(*rxq) + sizeof(rte_atomic64_t) * PRIV(dev)->subs_tail, @@ -595,17 +633,22 @@ rte_memcpy(&PRIV(dev)->infos, &default_infos, sizeof(default_infos)); } else { - uint32_t rx_offload_capa; + uint64_t rx_offload_capa; + uint64_t rxq_offload_capa; rx_offload_capa = default_infos.rx_offload_capa; + rxq_offload_capa = default_infos.rx_queue_offload_capa; FOREACH_SUBDEV_STATE(sdev, i, dev, DEV_PROBED) { rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos); rx_offload_capa &= PRIV(dev)->infos.rx_offload_capa; + rxq_offload_capa &= + PRIV(dev)->infos.rx_queue_offload_capa; } sdev = TX_SUBDEV(dev); rte_eth_dev_info_get(PORT_ID(sdev), &PRIV(dev)->infos); PRIV(dev)->infos.rx_offload_capa = rx_offload_capa; + PRIV(dev)->infos.rx_queue_offload_capa = rxq_offload_capa; PRIV(dev)->infos.tx_offload_capa &= default_infos.tx_offload_capa; PRIV(dev)->infos.tx_queue_offload_capa &= -- 1.8.3.1
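The fs_rxq_offloads_valid() check above is pure bitmask arithmetic; a minimal standalone sketch (the mask values used in testing are hypothetical, not real DEV_RX_OFFLOAD_* flags):

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the two failsafe Rx-offload validation rules:
 * 1. every requested offload must be supported at queue or port level;
 * 2. port-level offloads must not conflict with what was requested
 *    when the port itself was configured. */
static bool
rxq_offloads_valid(uint64_t requested, uint64_t port_conf,
		   uint64_t queue_capa, uint64_t port_capa)
{
	if ((requested & (queue_capa | port_capa)) != requested)
		return false;  /* an unsupported offload was requested */
	if ((port_conf ^ requested) & port_capa)
		return false;  /* mismatch with port configuration */
	return true;
}
```

On failure the driver sets rte_errno to ENOTSUP and logs all three masks, which makes the mismatched bit easy to spot.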
[dpdk-dev] [PATCH v4 1/2] net/failsafe: use new Tx offloads API
Ethdev Tx offloads API has changed since: commit cba7f53b717d ("ethdev: introduce Tx queue offloads API") This commit adds support for the new Tx offloads API. Signed-off-by: Moti Haimovsky --- V4: Modifications according to inputs from Gaetan Rivet in reply to 1515595223-36144-1-git-send-email-mo...@mellanox.com V3: Modifications according to inputs from Stephen Hemminger * Removed unnecessary initialization of tx_queue_offload_capa V2: * Fixed coding style warnings. --- drivers/net/failsafe/failsafe_ops.c | 50 + 1 file changed, 50 insertions(+) diff --git a/drivers/net/failsafe/failsafe_ops.c b/drivers/net/failsafe/failsafe_ops.c index fe957ad..53a242e 100644 --- a/drivers/net/failsafe/failsafe_ops.c +++ b/drivers/net/failsafe/failsafe_ops.c @@ -31,6 +31,7 @@ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ +#include #include #include @@ -84,9 +85,20 @@ fs_dev_configure(struct rte_eth_dev *dev) { struct sub_device *sdev; + uint64_t supp_tx_offloads; + uint64_t tx_offloads; uint8_t i; int ret; + supp_tx_offloads = PRIV(dev)->infos.tx_offload_capa; + tx_offloads = dev->data->dev_conf.txmode.offloads; + if ((tx_offloads & supp_tx_offloads) != tx_offloads) { + rte_errno = ENOTSUP; + ERROR("Some Tx offloads are not supported, " + "requested 0x%" PRIx64 " supported 0x%" PRIx64, + tx_offloads, supp_tx_offloads); + return -rte_errno; + } FOREACH_SUBDEV(sdev, i, dev) { int rmv_interrupt = 0; int lsc_interrupt = 0; @@ -312,6 +324,25 @@ return ret; } +static bool +fs_txq_offloads_valid(struct rte_eth_dev *dev, uint64_t offloads) +{ + uint64_t port_offloads; + uint64_t queue_supp_offloads; + uint64_t port_supp_offloads; + + port_offloads = dev->data->dev_conf.txmode.offloads; + queue_supp_offloads = PRIV(dev)->infos.tx_queue_offload_capa; + port_supp_offloads = PRIV(dev)->infos.tx_offload_capa; + if ((offloads & (queue_supp_offloads | port_supp_offloads)) != +offloads) + return false; + /* Verify we have no conflict with port offloads */ + if 
((port_offloads ^ offloads) & port_supp_offloads) + return false; + return true; +} + static void fs_tx_queue_release(void *queue) { @@ -348,6 +379,23 @@ fs_tx_queue_release(txq); dev->data->tx_queues[tx_queue_id] = NULL; } + /* +* Don't verify queue offloads for applications which +* use the old API. +*/ + if (tx_conf != NULL && + (tx_conf->txq_flags & ETH_TXQ_FLAGS_IGNORE) && + fs_txq_offloads_valid(dev, tx_conf->offloads) == false) { + rte_errno = ENOTSUP; + ERROR("Tx queue offloads 0x%" PRIx64 + " don't match port offloads 0x%" PRIx64 + " or supported offloads 0x%" PRIx64, + tx_conf->offloads, + dev->data->dev_conf.txmode.offloads, + PRIV(dev)->infos.tx_offload_capa | + PRIV(dev)->infos.tx_queue_offload_capa); + return -rte_errno; + } txq = rte_zmalloc("ethdev TX queue", sizeof(*txq) + sizeof(rte_atomic64_t) * PRIV(dev)->subs_tail, @@ -560,6 +608,8 @@ PRIV(dev)->infos.rx_offload_capa = rx_offload_capa; PRIV(dev)->infos.tx_offload_capa &= default_infos.tx_offload_capa; + PRIV(dev)->infos.tx_queue_offload_capa &= + default_infos.tx_queue_offload_capa; PRIV(dev)->infos.flow_type_rss_offloads &= default_infos.flow_type_rss_offloads; } -- 1.8.3.1
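Since failsafe forwards traffic to whichever sub-device is currently active, it can only advertise offloads that every sub-device supports; the FOREACH_SUBDEV loops in both patches compute exactly that intersection. A standalone sketch of the idea (capability values are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>

/* The advertised capability is the bitwise AND (intersection) of all
 * sub-device capabilities: an offload is only usable if every
 * sub-device can still honour it after a fail-over. */
static uint64_t
intersect_capa(const uint64_t *sub_capa, size_t n, uint64_t default_capa)
{
	uint64_t capa = default_capa;
	size_t i;

	for (i = 0; i < n; i++)
		capa &= sub_capa[i];
	return capa;
}

/* Two hypothetical sub-devices: one lacks bit 0x2, one lacks 0x8,
 * so only the common bits survive. */
static uint64_t
demo_intersection(void)
{
	const uint64_t sub_capa[2] = { 0xd, 0x7 };

	return intersect_capa(sub_capa, 2, 0xf);
}
```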
Re: [dpdk-dev] [PATCH v5 00/15] common ethdev linkstatus functions
On 1/17/2018 7:56 AM, Andrew Rybchenko wrote: > On 01/16/2018 09:37 PM, Stephen Hemminger wrote: >> While reviewing drivers, noticed a lot of unnecessary >> duplication of code in drivers for handling the eth_dev link status >> information. While consolidating this, it also became obvious that >> some drivers behave differently for no good reason. >> >> It also was a good chance to introduce atomic exchange primitives >> in EAL because there are other places using cmpset where not >> necessary (such as bonding). >> >> Mostly compile tested only, don't have all of the hardware >> available (except ixgbe and virtio) to test. >> >> Note: the eth_dev_link_update function return value is inconsistent >> across drivers. Should be changed to be void. > > I would say the "link_update" callback return value is inconsistent across > drivers. I'm not sure which direction is right here: make it consistent > or make it void. Also any changes in link information could be > important. As I understand it should not happen without up/down, > but bugs with loss of intermediate transitions are definitely possible. > So, notifying about any changes in link information is definitely safer. > May be not now. Again, why not return the previous link status? It is simple enough to prevent inconsistent usage. rte_eth_link_get() already discards the return value, so it won't be a problem there. For the cases where a PMD would like to know about link changes, it would need to implement almost the same link_update function with a return value, so why not use the existing link_update function? As in virtio, link_update() is used in the interrupt handler and calls a callback process if the status changes. When the return type of link_update() is changed to void, I guess such drivers will need to implement another version of link_update with a return value and use it.
> >> >> v5 >> - checkpatch whitespace cleanup >> >> v4 >> - incorporate review feedback >> - rename _rte_linkstatus to rte_linkstatus >> - change return value of _rte_linkstatus >> - optimize get on 64bit platforms >> - change return value of rte_linkstatus_set >> >> v3 >> - align rte_linkstatus_get with rte_atomic64_read >> - virtio use ETH_SPEED_NUM_10G >> - add net/ >> >> v2 >> - function names changed >> - rebased to current master >> >> Stephen Hemminger (15): >>eal: introduce atomic exchange operation >>ethdev: add linkstatus get/set helper functions >>net/virtio: use eth_linkstatus_set >>net/vmxnet3: use rte_eth_linkstatus_set >>net/dpaa2: use rte_eth_linkstatus_set >>net/nfp: use rte_eth_linkstatus functions >>net/e1000: use rte_eth_linkstatus helpers >>net/ixgbe: use rte_eth_linkstatus functions >>net/sfc: use new rte_eth_linkstatus functions >>net/i40e: use rte_eth_linkstatus functions >>net/liquidio: use rte_eth_linkstatus_set >>net/thunderx: use rte_eth_linkstatus_set >>net/szedata: use _rte_eth_linkstatus_set >>net/octeontx: use rte_eth_linkstatus_set >>net/enic: use rte_eth_linkstatus_set >> >> drivers/net/dpaa2/dpaa2_ethdev.c | 75 ++--- >> drivers/net/e1000/em_ethdev.c | 69 ++-- >> drivers/net/e1000/igb_ethdev.c | 70 ++-- >> drivers/net/enic/enic_ethdev.c | 5 +- >> drivers/net/enic/enic_main.c | 17 ++-- >> drivers/net/i40e/i40e_ethdev.c | 43 ++ >> drivers/net/i40e/i40e_ethdev_vf.c | 18 +--- >> drivers/net/ixgbe/ixgbe_ethdev.c | 96 >> -- >> drivers/net/liquidio/lio_ethdev.c | 53 ++-- >> drivers/net/nfp/nfp_net.c | 77 ++--- >> drivers/net/octeontx/octeontx_ethdev.c | 17 +--- >> drivers/net/sfc/sfc_ethdev.c | 21 + >> drivers/net/sfc/sfc_ev.c | 20 + >> drivers/net/szedata2/rte_eth_szedata2.c| 19 ++--- >> drivers/net/thunderx/nicvf_ethdev.c| 18 +--- >> drivers/net/virtio/virtio_ethdev.c | 65 +++ >> drivers/net/vmxnet3/vmxnet3_ethdev.c | 86 --- >> .../common/include/arch/x86/rte_atomic.h | 24 ++ >> .../common/include/arch/x86/rte_atomic_32.h| 12 +++ >> 
.../common/include/arch/x86/rte_atomic_64.h| 12 +++ >> lib/librte_eal/common/include/generic/rte_atomic.h | 78 ++ >> lib/librte_ether/rte_ethdev.c | 22 + >> lib/librte_ether/rte_ethdev.h | 62 ++ >> 23 files changed, 302 insertions(+), 677 deletions(-) >> >
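The helper at the heart of the series stores the link record as a single 64-bit quantity so it can be published with one atomic exchange; a rough C11 sketch of the idea (types and names are illustrative, not the actual rte_eth_linkstatus API):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed link record; the real rte_eth_link is likewise
 * meant to fit in 64 bits so it can be swapped atomically. */
struct link_info {
	uint32_t speed;
	uint16_t duplex;
	uint16_t status;
};

/* Demo storage standing in for the per-device link field. */
static _Atomic uint64_t g_link_status;

/* Atomically publish a new link record; return non-zero if it
 * differed from the previous one (the "return previous status"
 * behaviour discussed in the thread). */
static int
linkstatus_set(_Atomic uint64_t *dst, const struct link_info *src)
{
	uint64_t new_val, old_val;

	memcpy(&new_val, src, sizeof(new_val));
	old_val = atomic_exchange(dst, new_val);
	return old_val != new_val;
}

/* Exercise an up transition followed by a no-change update. */
static int
demo_link_transition(void)
{
	struct link_info up = { 10000, 1, 1 }; /* 10G, full duplex, up */

	return linkstatus_set(&g_link_status, &up) == 1 &&
	       linkstatus_set(&g_link_status, &up) == 0;
}
```

A single exchange replaces the cmpset loops duplicated across the drivers listed in the diffstat.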
Re: [dpdk-dev] [RFC] New packet type query API
Wednesday, January 17, 2018 10:09 AM, Andrew Rybchenko: > On 01/16/2018 06:55 PM, Adrien Mazarguil wrote: > > I understand the motivation behind this proposal, however since new > > ideas must be challenged, I have a few comments: > > > > - How about making packet type recognition an optional offload > configurable > >per queue like any other (e.g. DEV_RX_OFFLOAD_PTYPE)? That way the > extra > >processing cost could be avoided for applications that do not care. > > > > - Depending on HW, packet type information inside RX descriptors may not > >necessarily fit 64-bit, or at least not without transformation. This > >transformation would still cause wasted cycles on the PMD side. > > > > - In case enable_ptype_direct is enabled, the PMD may not waste CPU > cycles > >but the subsequent look-up with the proposed API would translate to a > >higher cost on the application side. As a data plane API, how does this > >benefit applications that want to retrieve packet type information? > > > > - Without a dedicated mbuf flag, an application cannot tell whether > enclosed > >packet type data is in HW format. Even if present, if port information is > >discarded or becomes invalid (e.g. mbuf stored in an application queue > for > >lengthy periods or passed as is to an unrelated application), there is no > >way to make sense of the data. > > > > In my opinion, mbufs should only contain data fields in a standardized > > format. Managing packet types like an offload which can be toggled at > > will seems to be the best compromise. Thoughts? > > +1 Yes. PTYPE is yet another offload the PMD provides. It should be enabled/disabled in the same way all other offloads are. Applications that are not interested in it and want the extra performance should not enable it.
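Treating packet-type parsing as an ordinary Rx offload would reuse the existing capability-check machinery; a hypothetical sketch (the flag below does not exist in DPDK; it stands in for the DEV_RX_OFFLOAD_PTYPE idea from the thread):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical offload bit for packet-type recognition, named after
 * the DEV_RX_OFFLOAD_PTYPE flag proposed in the discussion. */
#define DEV_RX_OFFLOAD_PTYPE_SKETCH (UINT64_C(1) << 40)

/* An application that does not need ptype simply leaves the bit out
 * of its requested queue offloads; the PMD can then skip the
 * descriptor-to-ptype translation on the hot path. */
static bool
ptype_parsing_enabled(uint64_t queue_offloads)
{
	return (queue_offloads & DEV_RX_OFFLOAD_PTYPE_SKETCH) != 0;
}
```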
[dpdk-dev] [PATCH] examples: update copyright and license
This updates the license on files in examples to be the standard BSD-3-Clause license used for the rest of DPDK, bringing the files in compliance with the DPDK licensing policy. Signed-off-by: Lee Daly --- examples/Makefile | 31 +- examples/cmdline/commands.c| 58 +- examples/cmdline/main.c| 58 +- examples/cmdline/parse_obj_list.c | 58 +- examples/cmdline/parse_obj_list.h | 58 +- examples/flow_filtering/Makefile | 33 +-- examples/flow_filtering/flow_blocks.c | 32 +- examples/flow_filtering/main.c | 32 +- examples/ip_pipeline/parser.c | 69 +- examples/ip_pipeline/pipeline/hash_func_arm64.h| 35 ++- examples/l3fwd/l3fwd_altivec.h | 37 ++-- examples/l3fwd/l3fwd_common.h | 32 +- examples/l3fwd/l3fwd_em_hlm.h | 37 ++-- examples/l3fwd/l3fwd_em_hlm_neon.h | 36 ++- examples/l3fwd/l3fwd_lpm_altivec.h | 36 ++- examples/l3fwd/l3fwd_lpm_neon.h| 36 ++- examples/l3fwd/l3fwd_neon.h| 36 ++- examples/netmap_compat/netmap/netmap.h | 30 +- examples/netmap_compat/netmap/netmap_user.h| 30 +- examples/performance-thread/common/arch/x86/ctx.c | 58 +- .../performance-thread/common/arch/x86/stack.h | 61 ++- examples/performance-thread/common/lthread.c | 56 +- examples/performance-thread/common/lthread.h | 56 +- examples/performance-thread/common/lthread_api.h | 56 +- examples/performance-thread/common/lthread_cond.c | 56 +- examples/performance-thread/common/lthread_cond.h | 56 +- examples/performance-thread/common/lthread_int.h | 58 ++ 27 files changed, 79 insertions(+), 1152 deletions(-) diff --git a/examples/Makefile b/examples/Makefile index 9f7974a..f0cf2a1 100644 --- a/examples/Makefile +++ b/examples/Makefile @@ -1,32 +1,5 @@ -# BSD LICENSE -# -# Copyright(c) 2016 6WIND S.A. -# -# Redistribution and use in source and binary forms, with or without -# modification, are permitted provided that the following conditions -# are met: -# -# * Redistributions of source code must retain the above copyright -# notice, this list of conditions and the following disclaimer. 
-# * Redistributions in binary form must reproduce the above copyright -# notice, this list of conditions and the following disclaimer in -# the documentation and/or other materials provided with the -# distribution. -# * Neither the name of 6WIND S.A. nor the names of its -# contributors may be used to endorse or promote products derived -# from this software without specific prior written permission. -# -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS -# "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT -# LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR -# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT -# OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, -# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT -# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, -# DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY -# THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT -# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +# SPDX-License-Identifier: BSD-3-Clause +# Copyright(c) 2016 6WIND S.A. ifeq ($(RTE_SDK),) $(error "Please define RTE_SDK environment variable") diff --git a/examples/cmdline/commands.c b/examples/cmdline/commands.c index f3ba247..06916d7 100644 --- a/examples/cmdline/commands.c +++ b/examples/cmdline/commands.c @@ -1,61 +1,7 @@ -/*- - * BSD LICENSE - * - * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. - * All rights reserved. - * - * Redistribution and use in source and binary forms, with or without - * modification, are permitted provided that the following conditions - * are met: - * - * * Redistributions of source code must retain the above copyright - * notice, this list of conditions and the following disclaimer. 
- * * Redistributions in binary form must reproduce the above copyright - * notice, this list of conditions and the following disclaimer in - * the documentation and/or other materials provided with the - * distribution.
Re: [dpdk-dev] [PATCH] vhost: do deep copy while reallocate vq
On Mon, Jan 15, 2018 at 06:32:19AM -0500, Junjie Chen wrote: > When vhost reallocates dev and vq for the NUMA-enabled case, it doesn't perform > a deep copy, which leads to 1) an invalid zmbuf list and 2) remote memory access. > This patch re-initializes the zmbuf list and also does the deep copy. > > Signed-off-by: Junjie Chen > --- Applied to dpdk-next-virtio. Thanks. --yliu
Re: [dpdk-dev] [PATCH v3 1/6] net/ixgbevf: unregister irq handler when other interrupts not allowed.
Hi Tonghao, Xiangxia, In general, I have several comments on your patch set. 1. As there are 6 patches in your patch set, a cover letter is needed. 2. You have sent several versions of the patch set, so '--in-reply-to' is needed to make sure everyone can see the version history together. Please follow the guidance on dpdk.org to contribute patches. 3. I saw one ACK for one of your patches. You still need to get ACKs for your whole patch set. Please address the comments, and ask maintainers to help directly. 4. As today is the integration deadline, I don't know if it can be applied in 18.02. Anyway, I will apply it as early as possible whenever everything is ready and reasonable. Thank you very much for your contribution! Regards, Helin > -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of > xiangxia.m@gmail.com > Sent: Sunday, January 14, 2018 6:04 PM > To: Dai, Wei; Xing, Beilei > Cc: dev@dpdk.org; Tonghao Zhang > Subject: [dpdk-dev] [PATCH v3 1/6] net/ixgbevf: unregister irq handler when > other interrupts not allowed. > > From: Tonghao Zhang > > When binding the ixgbe VF (e.g. an 82599 card) to igb_uio and enabling > rx-interrupt, > there will be more than one epoll_wait on intr_handle.fd. > One is in the "eal-intr-thread" thread, and the others are in the threads which > call > "rte_epoll_wait". The problem is that sometimes the "eal-intr-thread" thread > will process the rx interrupt, and then rte_epoll_wait can't get the event > anymore, and packets may be lost. > > The patch unregisters the status interrupt handler in the "eal-intr-thread" > thread; the ixgbe PF is in the same case.
> > Signed-off-by: Tonghao Zhang > Acked-by: Beilei Xing > --- > drivers/net/ixgbe/ixgbe_ethdev.c | 15 +++ > 1 file changed, 15 insertions(+) > > diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c > b/drivers/net/ixgbe/ixgbe_ethdev.c > index ff19a56..0e056a2 100644 > --- a/drivers/net/ixgbe/ixgbe_ethdev.c > +++ b/drivers/net/ixgbe/ixgbe_ethdev.c > @@ -5078,6 +5078,15 @@ static int > ixgbevf_dev_xstats_get_names(__rte_unused struct rte_eth_dev *dev, > } > ixgbevf_configure_msix(dev); > > + if (!rte_intr_allow_others(intr_handle)) { > + rte_intr_callback_unregister(intr_handle, > + ixgbevf_dev_interrupt_handler, > + dev); > + if (dev->data->dev_conf.intr_conf.lsc != 0) > + PMD_INIT_LOG(INFO, "lsc won't enable because of" > + " no intr multiplex"); > + } > + > /* When a VF port is bound to VFIO-PCI, only miscellaneous interrupt >* is mapped to VFIO vector 0 in eth_ixgbevf_dev_init( ). >* If previous VFIO interrupt mapping setting in eth_ixgbevf_dev_init( ) > @@ -5120,6 +5129,12 @@ static int > ixgbevf_dev_xstats_get_names(__rte_unused struct rte_eth_dev *dev, > > ixgbe_dev_clear_queues(dev); > > + if (!rte_intr_allow_others(intr_handle)) > + /* resume to the default handler */ > + rte_intr_callback_register(intr_handle, > +ixgbevf_dev_interrupt_handler, > +(void *)dev); > + > /* Clean datapath event and queue/vec mapping */ > rte_intr_efd_disable(intr_handle); > if (intr_handle->intr_vec != NULL) { > -- > 1.8.3.1
Re: [dpdk-dev] [PATCH v3] net/i40e: fix packet type parser issue
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Zhang, Qi Z > Sent: Tuesday, January 16, 2018 7:56 AM > To: Xing, Beilei > Cc: dev@dpdk.org; sta...@dpdk.org > Subject: Re: [dpdk-dev] [PATCH v3] net/i40e: fix packet type parser issue > > > > > -Original Message- > > From: Xing, Beilei > > Sent: Monday, January 15, 2018 9:51 PM > > To: Zhang, Qi Z > > Cc: dev@dpdk.org; sta...@dpdk.org > > Subject: [PATCH v3] net/i40e: fix packet type parser issue > > > > Ptype mapping table will fail to update when loading PPP profile, fix > > the issue via modifying metadata and adding check. > > This patch also adds parser for IPV4FRAG and IPV6FRAG. > > > > Fixes: ab2e350c4f4b ("net/i40e: improve packet type parser") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Beilei Xing > > Acked-by: Qi Zhang Applied into dpdk-next-net-intel, with minor commit log corrections. Thanks! /Helin
Re: [dpdk-dev] [RFC PATCH 0/6] mempool: add bucket mempool driver
Hi Olivier, first of all many thanks for the review. See my replies/comments below. Also I'll reply to the specific patch mails as well. On 12/14/2017 04:36 PM, Olivier MATZ wrote: Hi Andrew, Please find some comments about this patchset below. I'll also send some comments as replies to the specific patch. On Fri, Nov 24, 2017 at 04:06:25PM +, Andrew Rybchenko wrote: The patch series adds a bucket mempool driver which allows allocating (both physically and virtually) contiguous blocks of objects and adds mempool API to do it. It is still capable of providing separate objects, but it is definitely more heavy-weight than the ring/stack drivers. The target usecase is dequeue in blocks and enqueue separate objects back (which are collected in buckets to be dequeued). So, the memory pool with the bucket driver is created by an application and provided to a networking PMD receive queue. The choice of bucket driver is done using rte_eth_dev_pool_ops_supported(). A PMD that relies upon contiguous block allocation should report the bucket driver as the only supported and preferred one. So, you are planning to use this driver for a future/existing PMD? Yes, we're going to use it in the sfc PMD in the case of a dedicated FW variant which utilizes the bucketing. Do you have numbers about the performance gain, in which conditions, etc... ? And are there conditions where there is a performance loss ? Our idea here is to use it together with HW/FW which understands the bucketing. It adds some load on the CPU to track buckets, but block/bucket dequeue allows us to compensate for it. We'll try to prepare performance figures when we have a solution close to final. Hopefully pretty soon. The number of objects in the contiguous block is a function of bucket memory size (.config option) and total element size. The size of the bucket memory is hardcoded to 32KB. Why this value ? It is just an example. In fact we test mainly with 64K and 128K. Won't that be an issue if the user wants to use larger objects?
Ideally it should be start-time configurable, but that requires a way to specify driver-specific parameters passed to the mempool on allocation. Right now we decided to keep the task for the future since there is no clear understanding of how it should look. If you have ideas, please share; we would be thankful. As I understand it breaks the ABI, so it requires 3 acks in accordance with policy, a deprecation notice and a mempool shared library version bump. If there is a way to avoid the ABI breakage, please let us know. If my understanding is correct, the ABI breakage is caused by the addition of the new block dequeue operation, right? Yes, and we'll have more ops to make population of objects customizable. Thanks, Andrew.
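With a fixed bucket memory size, the object count per bucket follows directly from the total element size; a small sketch of the arithmetic (the helper name is hypothetical; the 32 KB figure comes from the cover letter):

```c
#include <stdint.h>

/* Objects per bucket when each bucket occupies a fixed-size,
 * physically contiguous chunk of memory. */
static unsigned int
objs_per_bucket(uint32_t bucket_mem_size, uint32_t total_elt_sz)
{
	if (total_elt_sz == 0 || total_elt_sz > bucket_mem_size)
		return 0;
	return bucket_mem_size / total_elt_sz;
}
```

This is why the bucket size would ideally be configurable: with large elements a 32 KB bucket holds only a handful of objects, shrinking the benefit of block dequeue.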
Re: [dpdk-dev] [RFC PATCH 1/6] mempool: implement abstract mempool info API
On 12/14/2017 04:36 PM, Olivier MATZ wrote: On Fri, Nov 24, 2017 at 04:06:26PM +, Andrew Rybchenko wrote: From: "Artem V. Andreev" Primarily, it is intended as a way for the mempool driver to provide additional information on how it lays out objects inside the mempool. Signed-off-by: Artem V. Andreev Signed-off-by: Andrew Rybchenko --- lib/librte_mempool/rte_mempool.h | 31 +++ lib/librte_mempool/rte_mempool_ops.c | 15 +++ 2 files changed, 46 insertions(+) diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 721227f..3c59d36 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -217,6 +217,11 @@ struct rte_mempool_memhdr { void *opaque;/**< Argument passed to the free callback */ }; +/* + * Additional information about the mempool + */ +struct rte_mempool_info; + While there is no compilation issue, I find it a bit strange to define this API without defining the content of rte_mempool_info. Agree. Mainly it was an attempt to fit the required way to store objects in memory into the existing approach. I agree that it is significantly better to solve it in a different way as you suggested. So, the patch will go away.
Re: [dpdk-dev] [PATCH] net/i40e: fix link_state update for i40e_ethdev_vf drv
> -Original Message- > From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Tushar Mulkar > Sent: Wednesday, January 17, 2018 4:53 PM > To: dev@dpdk.org > Cc: Xing, Beilei; Zhang, Qi Z; Tushar Mulkar > Subject: [dpdk-dev] [PATCH] net/i40e: fix link_state update for i40e_ethdev_vf > drv > > The check for bool was including unwanted bits in the calculation of the truth > value Could you help to add a bit more about the problem statement? > > Signed-off-by: Tushar Mulkar > --- > drivers/net/i40e/i40e_ethdev_vf.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/i40e/i40e_ethdev_vf.c > b/drivers/net/i40e/i40e_ethdev_vf.c > index b96d77a0c..9c14ea278 100644 > --- a/drivers/net/i40e/i40e_ethdev_vf.c > +++ b/drivers/net/i40e/i40e_ethdev_vf.c > @@ -2095,7 +2095,7 @@ i40evf_dev_link_update(struct rte_eth_dev *dev, > } > /* full duplex only */ > new_link.link_duplex = ETH_LINK_FULL_DUPLEX; > - new_link.link_status = vf->link_up ? ETH_LINK_UP : > + new_link.link_status = (vf->link_up & true) ? ETH_LINK_UP : >ETH_LINK_DOWN; Basically 'link_up' is of bool type, so why add bit manipulation on it? But thanks for your patches! /Helin > new_link.link_autoneg = > dev->data->dev_conf.link_speeds & ETH_LINK_SPEED_FIXED; > -- > 2.11.0
Re: [dpdk-dev] [RFC PATCH 2/6] mempool: implement clustered object allocation
On 12/14/2017 04:37 PM, Olivier MATZ wrote: On Fri, Nov 24, 2017 at 04:06:27PM +, Andrew Rybchenko wrote: From: "Artem V. Andreev" Clustered allocation is required to simplify packaging objects into buckets and search of the bucket control structure by an object. Signed-off-by: Artem V. Andreev Signed-off-by: Andrew Rybchenko --- lib/librte_mempool/rte_mempool.c | 39 +++ lib/librte_mempool/rte_mempool.h | 23 +-- test/test/test_mempool.c | 2 +- 3 files changed, 57 insertions(+), 7 deletions(-) diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c index d50dba4..43455a3 100644 --- a/lib/librte_mempool/rte_mempool.c +++ b/lib/librte_mempool/rte_mempool.c @@ -239,7 +239,8 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags, */ size_t rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift, - unsigned int flags) + unsigned int flags, + const struct rte_mempool_info *info) { size_t obj_per_page, pg_num, pg_sz; unsigned int mask; @@ -252,6 +253,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift, if (total_elt_sz == 0) return 0; + if (flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS) { + unsigned int align_shift = + rte_bsf32( + rte_align32pow2(total_elt_sz * + info->cluster_size)); + if (pg_shift < align_shift) { + return ((elt_num / info->cluster_size) + 2) + << align_shift; + } + } + +Cc Santosh for this To be honest, that was my fear when introducing MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS and MEMPOOL_F_CAPA_PHYS_CONTIG: to see more and more specific flags in generic code. I feel that the hidden meaning of these flags is more "if driver == foo", which shows that something is wrong in the current design. We have to think about another way to do it. Let me try to propose something (to be deepened). The standard way to create a mempool is: mp = create_empty(...)
set_ops_by_name(mp, "my-driver")// optional populate_default(mp)// or populate_*() obj_iter(mp, callback, arg) // optional, to init objects // and optional local func to init mempool priv First, we can consider deprecating some APIs like: - rte_mempool_xmem_create() - rte_mempool_xmem_size() - rte_mempool_xmem_usage() - rte_mempool_populate_iova_tab() These functions were introduced for xen, which was recently removed. They are complex to use, and are not used anywhere else in DPDK. Then, instead of having flags (quite hard to understand without knowing the underlying driver), we can let the mempool drivers do the populate_default() operation. For that we can add a populate_default field in mempool ops. Same for populate_virt(), populate_anon(), and populate_phys(), which can return -ENOTSUP if this is not implemented/implementable on a specific driver, or if flags (NO_CACHE_ALIGN, NO_SPREAD, ...) are not supported. If the function pointer is NULL, use the generic function. Thanks to this, the generic code would remain understandable and won't have to care about how memory should be allocated for a specific driver. Thoughts? Yes, I agree. This week we'll provide an updated version of the RFC which covers it, including the transition of the mempool/octeontx driver. I think it is sufficient to introduce two new ops: 1. To calculate the memory space required to store a specified number of objects 2. To populate objects in the provided memory chunk (the op will be called from rte_mempool_populate_iova(), which is a leaf function for all rte_mempool_populate_*() calls). It will allow us to avoid duplication and keep memchunk housekeeping inside the mempool library. [...]
diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h index 3c59d36..9bcb8b7 100644 --- a/lib/librte_mempool/rte_mempool.h +++ b/lib/librte_mempool/rte_mempool.h @@ -220,7 +220,10 @@ struct rte_mempool_memhdr { /* * Additional information about the mempool */ -struct rte_mempool_info; +struct rte_mempool_info { + /** Number of objects in a cluster */ + unsigned int cluster_size; +}; I think what I'm proposing would also prevent introducing this structure, which is generic but only applies to this driver. Yes
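The two ops discussed above would slot into a driver function-pointer table alongside the existing rte_mempool_ops callbacks; a hypothetical sketch of what such an interface could look like (all names and signatures invented for illustration, not the API that was eventually merged):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical driver interface mirroring the two proposed ops:
 * one to size the memory required for a number of objects, one to
 * lay objects out inside a chunk provided by the library. */
struct mempool_ops_sketch {
	/* Op 1: bytes of memory needed to store obj_num objects. */
	size_t (*calc_mem_size)(uint32_t obj_num, uint32_t total_elt_sz);
	/* Op 2: place objects into [vaddr, vaddr + len); returns the
	 * number of objects actually populated. */
	unsigned int (*populate)(void *vaddr, size_t len,
				 uint32_t total_elt_sz);
};

/* Trivial defaults: densely packed objects, no driver alignment. */
static size_t
default_calc_mem_size(uint32_t obj_num, uint32_t total_elt_sz)
{
	return (size_t)obj_num * total_elt_sz;
}

static unsigned int
default_populate(void *vaddr, size_t len, uint32_t total_elt_sz)
{
	(void)vaddr; /* a real driver would initialize objects here */
	return len / total_elt_sz;
}

/* A driver would register its table much like rte_mempool_ops;
 * a bucket driver would override both callbacks to round sizes up
 * to bucket boundaries and to interleave bucket headers. */
static const struct mempool_ops_sketch default_ops = {
	.calc_mem_size = default_calc_mem_size,
	.populate = default_populate,
};
```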
[dpdk-dev] [PATCH 1/2] net/vmxnet3: set the queue shared buffer at start
From: "Charles (Chas) Williams"

If a reconfiguration happens, queuedesc is reallocated. Any queues that are preserved point to the previous queuedesc, since the queues are only configured during queue setup. Delay configuration of the shared queue pointers until device start, when queuedesc is no longer changing.

Fixes: 8618d19b52b1 ("net/vmxnet3: reallocate shared memzone on re-config")

Signed-off-by: Chas Williams
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 4 ++++
 drivers/net/vmxnet3/vmxnet3_rxtx.c   | 4 ++--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index d3b704b..776a0da 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -644,6 +644,8 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
 		Vmxnet3_TxQueueDesc *tqd = &hw->tqd_start[i];
 		vmxnet3_tx_queue_t *txq = dev->data->tx_queues[i];
 
+		txq->shared = &hw->tqd_start[i];
+
 		tqd->ctrl.txNumDeferred = 0;
 		tqd->ctrl.txThreshold = 1;
 		tqd->conf.txRingBasePA = txq->cmd_ring.basePA;
@@ -664,6 +666,8 @@ vmxnet3_setup_driver_shared(struct rte_eth_dev *dev)
 		Vmxnet3_RxQueueDesc *rqd = &hw->rqd_start[i];
 		vmxnet3_rx_queue_t *rxq = dev->data->rx_queues[i];
 
+		rxq->shared = &hw->rqd_start[i];
+
 		rqd->conf.rxRingBasePA[0] = rxq->cmd_ring[0].basePA;
 		rqd->conf.rxRingBasePA[1] = rxq->cmd_ring[1].basePA;
 		rqd->conf.compRingBasePA = rxq->comp_ring.basePA;
diff --git a/drivers/net/vmxnet3/vmxnet3_rxtx.c b/drivers/net/vmxnet3/vmxnet3_rxtx.c
index f9416f3..64f24e6 100644
--- a/drivers/net/vmxnet3/vmxnet3_rxtx.c
+++ b/drivers/net/vmxnet3/vmxnet3_rxtx.c
@@ -908,7 +908,7 @@ vmxnet3_dev_tx_queue_setup(struct rte_eth_dev *dev,
 	txq->queue_id = queue_idx;
 	txq->port_id = dev->data->port_id;
-	txq->shared = &hw->tqd_start[queue_idx];
+	txq->shared = NULL; /* set in vmxnet3_setup_driver_shared() */
 	txq->hw = hw;
 	txq->qid = queue_idx;
 	txq->stopped = TRUE;
@@ -1011,7 +1011,7 @@ vmxnet3_dev_rx_queue_setup(struct rte_eth_dev *dev,
 	rxq->mp = mp;
 	rxq->queue_id = queue_idx;
 	rxq->port_id = dev->data->port_id;
-	rxq->shared = &hw->rqd_start[queue_idx];
+	rxq->shared = NULL; /* set in vmxnet3_setup_driver_shared() */
 	rxq->hw = hw;
 	rxq->qid1 = queue_idx;
 	rxq->qid2 = queue_idx + hw->num_rx_queues;
--
2.9.5
[dpdk-dev] [PATCH 2/2] net/vmxnet3: keep consistent link status
From: "Charles (Chas) Williams"

Bonding may examine the link properties to ensure that matching interfaces are bound together. If the link is going to have fixed properties, these need to remain consistent regardless of the link_status or the state of the adapter.

Signed-off-by: Chas Williams
---
 drivers/net/vmxnet3/vmxnet3_ethdev.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_ethdev.c b/drivers/net/vmxnet3/vmxnet3_ethdev.c
index 776a0da..d5379dd 100644
--- a/drivers/net/vmxnet3/vmxnet3_ethdev.c
+++ b/drivers/net/vmxnet3/vmxnet3_ethdev.c
@@ -267,6 +267,7 @@ eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev)
 	struct rte_pci_device *pci_dev;
 	struct vmxnet3_hw *hw = eth_dev->data->dev_private;
 	uint32_t mac_hi, mac_lo, ver;
+	struct rte_eth_link link;
 
 	PMD_INIT_FUNC_TRACE();
@@ -369,6 +370,13 @@ eth_vmxnet3_dev_init(struct rte_eth_dev *eth_dev)
 	memset(hw->saved_tx_stats, 0, sizeof(hw->saved_tx_stats));
 	memset(hw->saved_rx_stats, 0, sizeof(hw->saved_rx_stats));
 
+	/* set the initial link status */
+	memset(&link, 0, sizeof(link));
+	link.link_duplex = ETH_LINK_FULL_DUPLEX;
+	link.link_speed = ETH_SPEED_NUM_10G;
+	link.link_autoneg = ETH_LINK_SPEED_FIXED;
+	vmxnet3_dev_atomic_write_link_status(eth_dev, &link);
+
 	return 0;
 }
@@ -857,6 +865,9 @@ vmxnet3_dev_stop(struct rte_eth_dev *dev)
 
 	/* Clear recorded link status */
 	memset(&link, 0, sizeof(link));
+	link.link_duplex = ETH_LINK_FULL_DUPLEX;
+	link.link_speed = ETH_SPEED_NUM_10G;
+	link.link_autoneg = ETH_LINK_SPEED_FIXED;
 	vmxnet3_dev_atomic_write_link_status(dev, &link);
 }
@@ -1145,12 +1156,11 @@ __vmxnet3_dev_link_update(struct rte_eth_dev *dev,
 	VMXNET3_WRITE_BAR1_REG(hw, VMXNET3_REG_CMD, VMXNET3_CMD_GET_LINK);
 	ret = VMXNET3_READ_BAR1_REG(hw, VMXNET3_REG_CMD);
 
-	if (ret & 0x1) {
+	if (ret & 0x1)
 		link.link_status = ETH_LINK_UP;
-		link.link_duplex = ETH_LINK_FULL_DUPLEX;
-		link.link_speed = ETH_SPEED_NUM_10G;
-		link.link_autoneg = ETH_LINK_AUTONEG;
-	}
+	link.link_duplex = ETH_LINK_FULL_DUPLEX;
+	link.link_speed = ETH_SPEED_NUM_10G;
+	link.link_autoneg = ETH_LINK_AUTONEG;
 
 	vmxnet3_dev_atomic_write_link_status(dev, &link);
--
2.9.5
Re: [dpdk-dev] [RFC PATCH 3/6] mempool/bucket: implement bucket mempool manager
On 12/14/2017 04:38 PM, Olivier MATZ wrote:
On Fri, Nov 24, 2017 at 04:06:28PM +, Andrew Rybchenko wrote:
From: "Artem V. Andreev"

The manager provides a way to allocate a physically and virtually contiguous set of objects.

Note: due to the way objects are organized in the bucket manager, get_avail_count may return fewer objects than were enqueued. That breaks the expectation of the mempool and mempool_perf tests.

To me, this can be problematic. The driver should respect the API, or it will trigger hard-to-debug issues in applications. Can't this be fixed in some way or another?

As I understand it, there are no requirements on how fast get_count works. If so, it is doable and we'll fix it in RFCv2.

[...]

--- a/config/common_base
+++ b/config/common_base
@@ -608,6 +608,8 @@ CONFIG_RTE_LIBRTE_MEMPOOL_DEBUG=n
 #
 # Compile Mempool drivers
 #
+CONFIG_RTE_DRIVER_MEMPOOL_BUCKET=y
+CONFIG_RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB=32
 CONFIG_RTE_DRIVER_MEMPOOL_RING=y
 CONFIG_RTE_DRIVER_MEMPOOL_STACK=y

Why 32KB? Why not more, or less? Can it be a runtime parameter? I guess it won't work with too large objects.

We have no good understanding of how driver-specific parameters should be passed on mempool creation. We've simply kept it for the future since it looks like a separate task. If you have ideas, please share - we'll be thankful.

[...]

+struct bucket_data {
+	unsigned int header_size;
+	unsigned int chunk_size;
+	unsigned int bucket_size;
+	uintptr_t bucket_page_mask;
+	struct rte_ring *shared_bucket_ring;
+	struct bucket_stack *buckets[RTE_MAX_LCORE];
+	/*
+	 * Multi-producer single-consumer ring to hold objects that are
+	 * returned to the mempool at a different lcore than initially
+	 * dequeued
+	 */
+	struct rte_ring *adoption_buffer_rings[RTE_MAX_LCORE];
+	struct rte_ring *shared_orphan_ring;
+	struct rte_mempool *pool;
+
+};

I'm seeing per-core structures. Will it work on non-dataplane cores? For instance, if a control thread wants to allocate an mbuf? Maybe I don't understand something.

Does the control thread have a valid rte_lcore_id()?

If possible, these fields should be better documented (or just renamed). For instance, I suggest chunk_size could be called obj_per_bucket, which better describes the content of the field.

Thanks, we'll do.

[...]

+static int
+bucket_enqueue_single(struct bucket_data *data, void *obj)
+{
+	int rc = 0;
+	uintptr_t addr = (uintptr_t)obj;
+	struct bucket_header *hdr;
+	unsigned int lcore_id = rte_lcore_id();
+
+	addr &= data->bucket_page_mask;
+	hdr = (struct bucket_header *)addr;
+
+	if (likely(hdr->lcore_id == lcore_id)) {
+		if (hdr->fill_cnt < data->bucket_size - 1) {
+			hdr->fill_cnt++;
+		} else {
+			hdr->fill_cnt = 0;
+			/* Stack is big enough to put all buckets */
+			bucket_stack_push(data->buckets[lcore_id], hdr);
+		}
+	} else if (hdr->lcore_id != LCORE_ID_ANY) {
+		struct rte_ring *adopt_ring =
+			data->adoption_buffer_rings[hdr->lcore_id];
+
+		rc = rte_ring_enqueue(adopt_ring, obj);
+		/* Ring is big enough to put all objects */
+		RTE_ASSERT(rc == 0);
+	} else if (hdr->fill_cnt < data->bucket_size - 1) {
+		hdr->fill_cnt++;
+	} else {
+		hdr->fill_cnt = 0;
+		rc = rte_ring_enqueue(data->shared_bucket_ring, hdr);
+		/* Ring is big enough to put all buckets */
+		RTE_ASSERT(rc == 0);
+	}
+
+	return rc;
+}

[...]

+static int
+bucket_dequeue_buckets(struct bucket_data *data, void **obj_table,
+		       unsigned int n_buckets)
+{
+	struct bucket_stack *cur_stack = data->buckets[rte_lcore_id()];
+	unsigned int n_buckets_from_stack = RTE_MIN(n_buckets, cur_stack->top);
+	void **obj_table_base = obj_table;
+
+	n_buckets -= n_buckets_from_stack;
+	while (n_buckets_from_stack-- > 0) {
+		void *obj = bucket_stack_pop_unsafe(cur_stack);
+
+		obj_table = bucket_fill_obj_table(data, &obj, obj_table,
+						  data->bucket_size);
+	}
+	while (n_buckets-- > 0) {
+		struct bucket_header *hdr;
+
+		if (unlikely(rte_ring_dequeue(data->shared_bucket_ring,
+					      (void **)&hdr) != 0)) {
+			/* Return the already-dequeued buffers
+			 * back to the mempool
+			 */
+			bucket_enqueue(data->pool, obj_table_base,
+				       obj_table - obj_table_base);
+			rte_errno = ENOBUFS;
+			return -rte_errno;
+		}
+
Re: [dpdk-dev] [RFC PATCH 4/6] mempool: add a function to flush default cache
On 12/14/2017 04:38 PM, Olivier MATZ wrote:
On Fri, Nov 24, 2017 at 04:06:29PM +, Andrew Rybchenko wrote:
From: "Artem V. Andreev"

The mempool get/put API takes care of the cache itself, but sometimes it is required to flush the cache explicitly.

I don't disagree, but do you have some use-case in mind? Ideally mempool objects should be reused ASAP.

Block/bucket dequeue bypasses the cache, since the cache is not block-aware, so the cache should be flushed before a block dequeue. Initially we had the cache flush inside the block dequeue wrapper, but decoupling it gives more freedom for optimizations. Also, a dedicated API allows decoupling it from the block get API (to be added) and provides more fine-grained control.

Signed-off-by: Artem V. Andreev
Signed-off-by: Andrew Rybchenko
---
 lib/librte_mempool/rte_mempool.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/lib/librte_mempool/rte_mempool.h b/lib/librte_mempool/rte_mempool.h
index 9bcb8b7..3a52b93 100644
--- a/lib/librte_mempool/rte_mempool.h
+++ b/lib/librte_mempool/rte_mempool.h
@@ -1161,6 +1161,22 @@ rte_mempool_default_cache(struct rte_mempool *mp, unsigned lcore_id)
 }
 
 /**
+ * Ensure that a default per-lcore mempool cache is flushed, if it is present
+ *
+ * @param mp
+ *   A pointer to the mempool structure.
+ */
+static __rte_always_inline void
+rte_mempool_ensure_cache_flushed(struct rte_mempool *mp)
+{
+	struct rte_mempool_cache *cache;
+	cache = rte_mempool_default_cache(mp, rte_lcore_id());
+	if (cache != NULL && cache->len > 0)
+		rte_mempool_cache_flush(cache, mp);
+}
+

We already have rte_mempool_cache_flush(). Why not just extend it instead of adding a new function? I mean:

static __rte_always_inline void
rte_mempool_cache_flush(struct rte_mempool_cache *cache,
			struct rte_mempool *mp)
{
+	if (cache == NULL)
+		cache = rte_mempool_default_cache(mp, rte_lcore_id());
+	if (cache == NULL || cache->len == 0)
+		return;
 	rte_mempool_ops_enqueue_bulk(mp, cache->objs, cache->len);
 	cache->len = 0;
 }

Thanks, good idea.
[dpdk-dev] [PATCH] net/i40e: add warning log during writing global register
This patch adds a warning log when writing global registers, and adds a limitation doc for the impact of using a 700 series NIC with both the kernel driver and the DPDK PMD. Signed-off-by: Beilei Xing --- doc/guides/nics/i40e.rst | 12 drivers/net/i40e/i40e_ethdev.c | 16 drivers/net/i40e/i40e_ethdev.h | 39 +++ drivers/net/i40e/i40e_fdir.c | 1 + drivers/net/i40e/i40e_flow.c | 1 + 5 files changed, 69 insertions(+) diff --git a/doc/guides/nics/i40e.rst b/doc/guides/nics/i40e.rst index 50d5e36..61d144b 100644 --- a/doc/guides/nics/i40e.rst +++ b/doc/guides/nics/i40e.rst @@ -543,6 +543,18 @@ DCB function DCB works only when RSS is enabled. +Global configuration warning + + +The I40E PMD will set some global registers to enable some functions or set some +configurations. Then when using different ports of the same NIC with the Linux kernel driver +and DPDK, the port with the Linux kernel driver will be impacted by the port with DPDK. +For example, register I40E_GL_SWT_L2TAGCTRL is used to control the L2 tag; the i40e +PMD uses I40E_GL_SWT_L2TAGCTRL to set the VLAN TPID. If setting the TPID on port A +with DPDK, the configuration will also impact port B in the NIC with the +kernel driver, which doesn't want to use that TPID. +So the PMD reports a warning to clarify what is changed by writing a global register.
+ High Performance of Small Packets on 40G NIC diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c index 9626c6a..3907dfa 100644 --- a/drivers/net/i40e/i40e_ethdev.c +++ b/drivers/net/i40e/i40e_ethdev.c @@ -680,6 +680,7 @@ static inline void i40e_GLQF_reg_init(struct i40e_hw *hw) */ I40E_WRITE_REG(hw, I40E_GLQF_ORT(40), 0x0029); I40E_WRITE_REG(hw, I40E_GLQF_PIT(9), 0x9420); + i40e_global_cfg_warning(I40E_WARNING_QINQ_PARSER); } #define I40E_FLOW_CONTROL_ETHERTYPE 0x8808 @@ -1133,6 +1134,7 @@ eth_i40e_dev_init(struct rte_eth_dev *dev) 0x0028, NULL); if (ret) PMD_INIT_LOG(ERR, "Failed to write L3 MAP register %d", ret); + i40e_global_cfg_warning(I40E_WARNING_QINQ_CLOUD_FILTER); /* Need the special FW version to support floating VEB */ config_floating_veb(dev); @@ -1413,6 +1415,7 @@ void i40e_flex_payload_reg_set_default(struct i40e_hw *hw) I40E_WRITE_REG(hw, I40E_GLQF_ORT(33), 0x); I40E_WRITE_REG(hw, I40E_GLQF_ORT(34), 0x); I40E_WRITE_REG(hw, I40E_GLQF_ORT(35), 0x); + i40e_global_cfg_warning(I40E_WARNING_DIS_FLX_PLD); } static int @@ -3260,6 +3263,7 @@ i40e_vlan_tpid_set(struct rte_eth_dev *dev, /* If NVM API < 1.7, keep the register setting */ ret = i40e_vlan_tpid_set_by_registers(dev, vlan_type, tpid, qinq); + i40e_global_cfg_warning(I40E_WARNING_TPID); return ret; } @@ -3502,6 +3506,7 @@ i40e_flow_ctrl_set(struct rte_eth_dev *dev, struct rte_eth_fc_conf *fc_conf) I40E_WRITE_REG(hw, I40E_GLRPB_GLW, pf->fc_conf.low_water[I40E_MAX_TRAFFIC_CLASS] << I40E_KILOSHIFT); + i40e_global_cfg_warning(I40E_WARNING_FLOW_CTL); I40E_WRITE_FLUSH(hw); @@ -7981,6 +7986,7 @@ i40e_dev_set_gre_key_len(struct i40e_hw *hw, uint8_t len) reg, NULL); if (ret != 0) return ret; + i40e_global_cfg_warning(I40E_WARNING_GRE_KEY_LEN); } else { ret = 0; } @@ -8239,6 +8245,8 @@ i40e_set_hash_filter_global_config(struct i40e_hw *hw, i40e_write_rx_ctl(hw, I40E_GLQF_HSYM(j), reg); + i40e_global_cfg_warning( + I40E_WARNING_HSYM); } } } @@ -8265,6 +8273,7 @@ 
i40e_set_hash_filter_global_config(struct i40e_hw *hw, goto out; i40e_write_rx_ctl(hw, I40E_GLQF_CTL, reg); + i40e_global_cfg_warning(I40E_WARNING_QF_CTL); out: I40E_WRITE_FLUSH(hw); @@ -8909,6 +8918,10 @@ i40e_filter_input_set_init(struct i40e_pf *pf) pf->hash_input_set[pctype] = input_set; pf->fdir.input_set[pctype] = input_set; } + + i40e_global_cfg_warning(I40E_WARNING_HASH_INSET); + i40e_global_cfg_warning(I40E_WARNING_FD_MSK); + i40e_global_cfg_warning(I40E_WARNING_HASH_MSK); } int @@ -8969,6 +8982,7 @@ i40e_hash_filter_inset_select(struct i40e_hw *hw, i40e_check_write_reg(hw, I40E_GLQF_HASH_INSET(1, pctype), (uint32_t)((inset_reg >> I40E_32_
Re: [dpdk-dev] [RFC PATCH 2/6] mempool: implement clustered object allocation
On Wednesday 17 January 2018 08:33 PM, Andrew Rybchenko wrote:
> On 12/14/2017 04:37 PM, Olivier MATZ wrote:
>> On Fri, Nov 24, 2017 at 04:06:27PM +, Andrew Rybchenko wrote:
>>> From: "Artem V. Andreev"
>>>
>>> Clustered allocation is required to simplify packaging objects into
>>> buckets and search of the bucket control structure by an object.
>>>
>>> Signed-off-by: Artem V. Andreev
>>> Signed-off-by: Andrew Rybchenko
>>> ---
>>> lib/librte_mempool/rte_mempool.c | 39 +++
>>> lib/librte_mempool/rte_mempool.h | 23 +--
>>> test/test/test_mempool.c | 2 +-
>>> 3 files changed, 57 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
>>> index d50dba4..43455a3 100644
>>> --- a/lib/librte_mempool/rte_mempool.c
>>> +++ b/lib/librte_mempool/rte_mempool.c
>>> @@ -239,7 +239,8 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
>>> */
>>> size_t
>>> rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
>>> - unsigned int flags)
>>> + unsigned int flags,
>>> + const struct rte_mempool_info *info)
>>> {
>>> size_t obj_per_page, pg_num, pg_sz;
>>> unsigned int mask;
>>> @@ -252,6 +253,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
>>> if (total_elt_sz == 0)
>>> return 0;
>>> + if (flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS) {
>>> + unsigned int align_shift =
>>> + rte_bsf32(
>>> + rte_align32pow2(total_elt_sz *
>>> + info->cluster_size));
>>> + if (pg_shift < align_shift) {
>>> + return ((elt_num / info->cluster_size) + 2)
>>> + << align_shift;
>>> + }
>>> + }
>>> +
>>
>> +Cc Santosh for this
>>
>> To be honest, that was my fear when introducing
>> MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS and MEMPOOL_F_CAPA_PHYS_CONTIG to see more
>> and more specific flags in generic code.
>>
>> I feel that the hidden meaning of these flags is more "if driver == foo",
>> which shows that something is wrong in the current design.
>>
>> We have to think about another way to do it. Let me try to propose
>> something (to be deepened).
>>
>> The standard way to create a mempool is:
>>
>> mp = create_empty(...)
>> set_ops_by_name(mp, "my-driver") // optional
>> populate_default(mp) // or populate_*()
>> obj_iter(mp, callback, arg) // optional, to init objects
>> // and optional local func to init mempool priv
>>
>> First, we can consider deprecating some APIs like:
>> - rte_mempool_xmem_create()
>> - rte_mempool_xmem_size()
>> - rte_mempool_xmem_usage()
>> - rte_mempool_populate_iova_tab()
>>
>> These functions were introduced for xen, which was recently
>> removed. They are complex to use, and are not used anywhere else in
>> DPDK.
>>
>> Then, instead of having flags (quite hard to understand without knowing
>> the underlying driver), we can let the mempool drivers do the
>> populate_default() operation. For that we can add a populate_default
>> field in mempool ops. Same for populate_virt(), populate_anon(), and
>> populate_phys() which can return -ENOTSUP if this is not
>> implemented/implementable on a specific driver, or if flags
>> (NO_CACHE_ALIGN, NO_SPREAD, ...) are not supported. If the function
>> pointer is NULL, use the generic function.
>>
>> Thanks to this, the generic code would remain understandable and won't
>> have to care about how memory should be allocated for a specific driver.
>>
>> Thoughts?
>
> Yes, I agree. This week we'll provide an updated version of the RFC which
> covers it, including transition of the mempool/octeontx. I think it is
> sufficient to introduce two new ops:
> 1. To calculate memory space required to store specified number of objects
> 2. To populate objects in the provided memory chunk (the op will be called
> from rte_mempool_populate_iova() which is a leaf function for all
> rte_mempool_populate_*() calls.
> It will allow us to avoid duplication and keep memchunk housekeeping inside
> the mempool library.
>

There is also a downside of letting the mempool driver populate, which was raised in another thread: http://dpdk.org/dev/patchwork/patch/31943/

Thanks.
Re: [dpdk-dev] [PATCH v2] net/bonding: fix link status check
On 11/29/2017 3:42 PM, Tomasz Kulasek wrote:
> Some devices need more time to initialize and bring the interface up. When
> the link is down, the link properties are not valid; e.g. link_speed is
> reported as 0, and this is not a valid speed for the slave, nor for the whole
> bonding.
>
> During NIC (and bonding) initialization there's concurrency between
> updating the link status and adding a slave to the bonding.
>
> This patch:
>
> - adds a delay before configuring bonding (if the link is down) to be sure that
>   the link status of the new slave is valid,
> - propagates information about link status from the first slave with link up
>   instead of the first slave at all, to be sure that the link speed is valid.
>
> Fixes: 6abd94d72ab5 ("net/bonding: fix check slaves link properties")
> Signed-off-by: Tomasz Kulasek
> ---
> v2 changes:
> - Checkpatch warnings,
> - Improved code style

Hi Declan,

Any comment on this patch?

Thanks,
ferruh
Re: [dpdk-dev] [PATCH v5 00/15] common ethdev linkstatus functions
On Wed, 17 Jan 2018 14:32:17 + Ferruh Yigit wrote: > On 1/17/2018 7:56 AM, Andrew Rybchenko wrote: > > On 01/16/2018 09:37 PM, Stephen Hemminger wrote: > >> While reviewing drivers, noticed a lot of unnecessary > >> duplication of code in drivers for handling the eth_dev link status > >> information. While consolidating this, it also became obvious that > >> some drivers behave differently for no good reason. > >> > >> It also was a good chance to introduce atomic exchange primitives > >> in EAL because there are other places using cmpset where not > >> necessary (such as bonding). > >> > >> Mostly only compile tested only, don't have all of the hardware > >> available (except ixgbe and virtio) to test. > >> > >> Note: the eth_dev_link_update function return value is inconsistent > >> across drivers. Should be changed to be void. > > > > I would say "link_update" callback return value is inconsistent across > > drivers. I'm not sure which direction is right here: make it consistent > > or make it void. Also any changes in link information could be > > important. As I understand it should not happen without up/down, > > but bugs with loss of intermediate transitions are definitely possible. > > So, notifying about any changes in link information is definitely safer. > > May be not now. > > Again, why not return previous link status, it is simple enough to prevent > inconsistent usage. > > rte_eth_link_get() already discards the return value, so won't be a problem > there. > > For the cases PMD would like know about link changes, they will need to > implement almost same link_update function with a return value, so why not use > existing link_update function? > > Like been in virtio, link_update() used in interrupt handler, and calls a > callback process if status changes. When link_update() return status changed > to > void, I guess they will need to implement another version of the link_update > with return and use it. 
The interrupt and non-interrupt models are different. Also, the driver internally may want to do something different; this is about the return value of dev_ops->link_update. The code in rte_eth_dev never used the return value. The bonding driver was expecting it to work, but it doesn't. Anyway, drivers shouldn't in general be directly calling other devices' eth_dev_ops.
Re: [dpdk-dev] [PATCH v1] [net/tap] add logic to assign speed from user
On 12/21/2017 1:38 PM, Wiles, Keith wrote:
>
>> On Dec 21, 2017, at 10:53 AM, Vipin Varghese wrote:
>>
>> TAP speed is passed as a user argument, but never set on the interface.
>> New logic brings speed get and set to LOCAL and REMOTE interfaces.
>>
>> Updated the default PMD speed to 10M as per the Linux kernel default
>> value.
>
> The problem in setting the link speed to 10M is that TAP will not limit its
> traffic to 10M. Applications like pktgen and others use the link speed to
> calculate the bit rate, which will be broken now.
>
> I would suggest making the default value 10G or 40G instead, as CPU speeds
> will continue to increase. Forcing someone to always add the link speed seems
> a bit much when we know the systems can send/receive much higher than 10M,
> which is the reason 10G was picked. Please set the default back to 10G or
> some much higher number.

Hi Keith, Vipin, Pascal,

Since we really can't set the speed of a Linux tap interface, what do you think
about removing the speed arg completely from the tap PMD?

Thanks,
ferruh

>
>>
>> Signed-off-by: Vipin Varghese
>> ---
>> drivers/net/tap/rte_eth_tap.c | 185 +-
>> 1 file changed, 182 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
>> index 6b27679..7238504 100644
>> --- a/drivers/net/tap/rte_eth_tap.c
>> +++ b/drivers/net/tap/rte_eth_tap.c
>> @@ -62,6 +62,8 @@
>> #include
>> #include
>> #include
>> +#include
>> +#include
>>
>
> Regards,
> Keith
>
Re: [dpdk-dev] [PATCH v1] [net/tap] add logic to assign speed from user
> On Jan 17, 2018, at 10:06 AM, Yigit, Ferruh wrote:
>
> On 12/21/2017 1:38 PM, Wiles, Keith wrote:
>>
>>> On Dec 21, 2017, at 10:53 AM, Vipin Varghese wrote:
>>>
>>> TAP speed is passed as user argument, but never set to interface.
>>> New logic brings speed get and set to LOCAL and REMOTE interfaces.
>>>
>>> Updated the default PMD speed to 10M as per Linux Kernel default
>>> value.
>>
>> The problem in setting the link speed to 10M is that TAP will not limit its
>> traffic to 10M. Applications like pktgen and others use the Link speed to
>> calculate the bit rate, which will be broken now.
>>
>> I would suggest making the default value 10G or 40G instead as CPU speeds
>> will continue to increase. Forcing someone to always add the link speed
>> seems a bit much when we know the systems can send/receive much higher than
>> 10M, which is the reason 10G was picked. Please set the default back to 10G
>> or some much higher number.
>
> Hi Keith, Vipin, Pascal,
>
> Since we really can't set the interface for Linux tap interface, what do you
> think removing speed arg completely from tap PMD?

I have no problems with it being removed from the PMD. The only problem is that, this being a virtual interface, it can be any speed, but 10M is unreasonable IMO. I would like it to be set to something reasonable as the default (40G or 10G), or would zero be more reasonable? I know the reported speed does not affect the performance, but tools that look at the speed and attempt to use that speed need a value greater than the max bit rate of the interface IMO. If we had a value to indicate a bogus speed then maybe the tools could adjust in some way.
> > Thanks, > ferruh > >> >>> >>> Signed-off-by: Vipin Varghese >>> --- >>> drivers/net/tap/rte_eth_tap.c | 185 >>> +- >>> 1 file changed, 182 insertions(+), 3 deletions(-) >>> >>> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c >>> index 6b27679..7238504 100644 >>> --- a/drivers/net/tap/rte_eth_tap.c >>> +++ b/drivers/net/tap/rte_eth_tap.c >>> @@ -62,6 +62,8 @@ >>> #include >>> #include >>> #include >>> +#include >>> +#include >>> >> >> Regards, >> Keith Regards, Keith
Re: [dpdk-dev] [PATCH v5 00/15] common ethdev linkstatus functions
On 1/17/2018 4:05 PM, Stephen Hemminger wrote: > On Wed, 17 Jan 2018 14:32:17 + > Ferruh Yigit wrote: > >> On 1/17/2018 7:56 AM, Andrew Rybchenko wrote: >>> On 01/16/2018 09:37 PM, Stephen Hemminger wrote: While reviewing drivers, noticed a lot of unnecessary duplication of code in drivers for handling the eth_dev link status information. While consolidating this, it also became obvious that some drivers behave differently for no good reason. It also was a good chance to introduce atomic exchange primitives in EAL because there are other places using cmpset where not necessary (such as bonding). Mostly only compile tested only, don't have all of the hardware available (except ixgbe and virtio) to test. Note: the eth_dev_link_update function return value is inconsistent across drivers. Should be changed to be void. >>> >>> I would say "link_update" callback return value is inconsistent across >>> drivers. I'm not sure which direction is right here: make it consistent >>> or make it void. Also any changes in link information could be >>> important. As I understand it should not happen without up/down, >>> but bugs with loss of intermediate transitions are definitely possible. >>> So, notifying about any changes in link information is definitely safer. >>> May be not now. >> >> Again, why not return previous link status, it is simple enough to prevent >> inconsistent usage. >> >> rte_eth_link_get() already discards the return value, so won't be a problem >> there. >> >> For the cases PMD would like know about link changes, they will need to >> implement almost same link_update function with a return value, so why not >> use >> existing link_update function? >> >> Like been in virtio, link_update() used in interrupt handler, and calls a >> callback process if status changes. When link_update() return status changed >> to >> void, I guess they will need to implement another version of the link_update >> with return and use it. 
>
> The interrupt and non-interrupt model are different.

Yes. But for the virtio-specific usage:

virtio_interrupt_handler()
  virtio_dev_link_update() == 0
    _rte_eth_dev_callback_process()

meantime the same exact virtio_dev_link_update() is used as:

.link_update = virtio_dev_link_update,

so updating virtio_dev_link_update() to not return the status change will change the logic in virtio_interrupt_handler(), no?

> Also the driver internally may want to do something different, this is about
> the return value for dev_ops->link_update.

Agreed, a driver may do something different. And the function that needs to be implemented will be very close to dev_ops->link_update. I thought making dev_ops->link_update more generic could prevent duplication there. And it aligns with the virtio usage.

> The code in rte_eth_dev never
> used the return value. The bonding driver was expecting it to work but it
> doesn't.

Agreed.

> Anyway drivers shouldn't in general be directly calling other
> devices eth_dev_ops

I guess now there are a few overlay PMDs that do this.
Re: [dpdk-dev] [PATCH v1] [net/tap] add logic to assign speed from user
On 1/17/2018 4:14 PM, Wiles, Keith wrote: > > >> On Jan 17, 2018, at 10:06 AM, Yigit, Ferruh wrote: >> >> On 12/21/2017 1:38 PM, Wiles, Keith wrote: >>> >>> On Dec 21, 2017, at 10:53 AM, Vipin Varghese wrote: TAP speed is passed as user argument, but never set to interface. New logic brings speed get and set to LOCAL and REMOTE interfaces. Updated the default PMD speeed to 10M as per Linux Kernel default value. >>> >>> The problem in setting the link speed to 10M is that TAP will not limit its >>> traffic to 10M. Applications like pktgen and others use the Link speed to >>> calculate the bit rate, which will be broken now. >>> >>> I would suggest making the default value 10G or 40G instead as CPU speeds >>> will continue to increase. Forcing someone to always add the link speed >>> seems a bit much when we know the systems can send/receive much higher then >>> 10M, which is the reason 10G was picked. Please set the default back to 10G >>> or some much higher number. >> >> Hi Keith, Vipin, Pascal, >> >> Since we really can't set the interface for Linux tap interface, what do you >> think removing speed arg completely from tap PMD? > > I have no problems with it being remove from the PMD. The only problem is > this being a virtual interface it can be any speed, but 10M is unreasonable > IMO. I would like it to be set to something reasonable as the default (40G or > 10G) or would zero be more reasonable. Agreed. When "speed" arg removed I think there is no reason to change default speed, it can stay as 10G as it is now. > > I know the reported speed does not effect the performance, but tools that > look at the speed and attempt to use that speed need a value greater then the > max bit rate of the interface IMO. If we had a value to indicate a bogus > speed then maybe the tools can adjust in some way. 
> >> >> Thanks, >> ferruh >> >>> Signed-off-by: Vipin Varghese --- drivers/net/tap/rte_eth_tap.c | 185 +- 1 file changed, 182 insertions(+), 3 deletions(-) diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c index 6b27679..7238504 100644 --- a/drivers/net/tap/rte_eth_tap.c +++ b/drivers/net/tap/rte_eth_tap.c @@ -62,6 +62,8 @@ #include #include #include +#include +#include >>> >>> Regards, >>> Keith > > Regards, > Keith >
[dpdk-dev] [PATCH] vfio: fix compilation errors in bsdapp
This patch fixes the following compilation errors in bsdapp:

/home/patchWorkOrg/compilation/lib/librte_eal/bsdapp/eal/eal.c:782:5: error: no previous prototype for function 'rte_vfio_clear_group' [-Werror,-Wmissing-prototypes]
int rte_vfio_clear_group(int vfio_group_fd)
    ^
/home/patchWorkOrg/compilation/lib/librte_eal/bsdapp/eal/eal.c:782:30: error: unused parameter 'vfio_group_fd' [-Werror,-Wunused-parameter]
int rte_vfio_clear_group(int vfio_group_fd)
                             ^

Fixes: c564a2a20093 ("vfio: expose clear group function for internal usages")
Cc: Hemant Agrawal

Signed-off-by: Moti Haimovsky
---
 lib/librte_eal/bsdapp/eal/eal.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/lib/librte_eal/bsdapp/eal/eal.c b/lib/librte_eal/bsdapp/eal/eal.c
index 7239243..cf72537 100644
--- a/lib/librte_eal/bsdapp/eal/eal.c
+++ b/lib/librte_eal/bsdapp/eal/eal.c
@@ -70,6 +70,7 @@
 #include
 #include
 #include
+#include
 
 #include "eal_private.h"
 #include "eal_thread.h"
@@ -779,7 +780,7 @@ int rte_vfio_noiommu_is_enabled(void)
 	return 0;
 }
 
-int rte_vfio_clear_group(int vfio_group_fd)
+int rte_vfio_clear_group(__rte_unused int vfio_group_fd)
 {
 	return 0;
 }
--
1.8.3.1
Re: [dpdk-dev] [PATCH] examples/ipsec-secgw: update mbuf packet type
> -Original Message- > From: Nicolau, Radu > Sent: Tuesday, January 16, 2018 4:12 PM > To: Akhil Goyal ; dev@dpdk.org > Cc: De Lara Guarch, Pablo ; > hemant.agra...@nxp.com; Gonzalez Monroy, Sergio > > Subject: RE: [PATCH] examples/ipsec-secgw: update mbuf packet type > > > > -Original Message- > > From: Akhil Goyal [mailto:akhil.go...@nxp.com] > > Sent: Thursday, December 14, 2017 7:06 AM > > To: dev@dpdk.org > > Cc: De Lara Guarch, Pablo ; > > hemant.agra...@nxp.com; Gonzalez Monroy, Sergio > > ; Nicolau, Radu > > ; Akhil Goyal > > Subject: [PATCH] examples/ipsec-secgw: update mbuf packet type > > > > Packet_type should be updated to remove/add L4 type for > > encrypted/decrypted packet > > > > Signed-off-by: Akhil Goyal > > --- > Acked-by: Radu Nicolau Applied to dpdk-next-crypto. Thanks, Pablo
Re: [dpdk-dev] [PATCH] doc: update pcap documentation
> -Original Message-
> From: Yigit, Ferruh
> Sent: Tuesday, January 16, 2018 6:45 PM
> To: Richardson, Bruce ; Mcnamara, John ; Kovacevic, Marko
> Cc: dev@dpdk.org; Yigit, Ferruh ; Varghese, Vipin ; Glynn, Michael J
> Subject: [PATCH] doc: update pcap documentation
>
> Add a note that the PMD expects the provided network interfaces to be up;
> document the behavior to set expectations right.
>
> Also added a minor fix.
>
> ...
>
> +Notes
> +^
> +
> +The network interface provided to the PMD should be up, PMD will return
> +an error if interface is down, and PMD itself won't change the status of
> the external network interface.

It would probably be a little better as an RST note directive, as used elsewhere in the doc. Like this:

.. note::

   The network interface provided to the PMD should be up. The PMD will
   return an error if the interface is down, and the PMD itself won't
   change the status of the external network interface.

But it isn't a big issue so:

Acked-by: John McNamara
Re: [dpdk-dev] [RFC PATCH 2/6] mempool: implement clustered object allocation
On 01/17/2018 06:55 PM, santosh wrote:
On Wednesday 17 January 2018 08:33 PM, Andrew Rybchenko wrote:
On 12/14/2017 04:37 PM, Olivier MATZ wrote:
On Fri, Nov 24, 2017 at 04:06:27PM +, Andrew Rybchenko wrote:

From: "Artem V. Andreev"

Clustered allocation is required to simplify packaging objects into
buckets and search of the bucket control structure by an object.

Signed-off-by: Artem V. Andreev
Signed-off-by: Andrew Rybchenko
---
 lib/librte_mempool/rte_mempool.c | 39 +++
 lib/librte_mempool/rte_mempool.h | 23 +--
 test/test/test_mempool.c         |  2 +-
 3 files changed, 57 insertions(+), 7 deletions(-)

diff --git a/lib/librte_mempool/rte_mempool.c b/lib/librte_mempool/rte_mempool.c
index d50dba4..43455a3 100644
--- a/lib/librte_mempool/rte_mempool.c
+++ b/lib/librte_mempool/rte_mempool.c
@@ -239,7 +239,8 @@ rte_mempool_calc_obj_size(uint32_t elt_size, uint32_t flags,
  */
 size_t
 rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
-		      unsigned int flags)
+		      unsigned int flags,
+		      const struct rte_mempool_info *info)
 {
 	size_t obj_per_page, pg_num, pg_sz;
 	unsigned int mask;
@@ -252,6 +253,17 @@ rte_mempool_xmem_size(uint32_t elt_num, size_t total_elt_sz, uint32_t pg_shift,
 	if (total_elt_sz == 0)
 		return 0;

+	if (flags & MEMPOOL_F_CAPA_ALLOCATE_IN_CLUSTERS) {
+		unsigned int align_shift =
+			rte_bsf32(
+				rte_align32pow2(total_elt_sz *
+						info->cluster_size));
+		if (pg_shift < align_shift) {
+			return ((elt_num / info->cluster_size) + 2)
+				<< align_shift;
+		}
+	}
+

+Cc Santosh for this

To be honest, that was my fear when introducing
MEMPOOL_F_CAPA_BLK_ALIGNED_OBJECTS and MEMPOOL_F_CAPA_PHYS_CONTIG: to see
more and more specific flags in generic code. I feel that the hidden
meaning of these flags is more "if driver == foo", which shows that
something is wrong in the current design. We have to think about another
way to do it.

Let me try to propose something (to be deepened).

The standard way to create a mempool is:

  mp = create_empty(...)
  set_ops_by_name(mp, "my-driver")  // optional
  populate_default(mp)              // or populate_*()
  obj_iter(mp, callback, arg)       // optional, to init objects
  // and optional local func to init mempool priv

First, we can consider deprecating some APIs like:
- rte_mempool_xmem_create()
- rte_mempool_xmem_size()
- rte_mempool_xmem_usage()
- rte_mempool_populate_iova_tab()

These functions were introduced for xen, which was recently removed.
They are complex to use, and are not used anywhere else in DPDK.

Then, instead of having flags (quite hard to understand without knowing
the underlying driver), we can let the mempool drivers do the
populate_default() operation. For that we can add a populate_default
field in mempool ops. Same for populate_virt(), populate_anon(), and
populate_phys(), which can return -ENOTSUP if this is not
implemented/implementable on a specific driver, or if flags
(NO_CACHE_ALIGN, NO_SPREAD, ...) are not supported. If the function
pointer is NULL, use the generic function.

Thanks to this, the generic code would remain understandable and won't
have to care about how memory should be allocated for a specific driver.

Thoughts?

Yes, I agree. This week we'll provide an updated version of the RFC which
covers it, including the transition of mempool/octeontx. I think it is
sufficient to introduce two new ops:
1. To calculate the memory space required to store the specified number
   of objects.
2. To populate objects in the provided memory chunk (the op will be
   called from rte_mempool_populate_iova(), which is a leaf function for
   all rte_mempool_populate_*() calls).
It will allow us to avoid duplication and keep memchunk housekeeping
inside the mempool library.

There is also a downside of letting the mempool driver populate, which
was raised in another thread:
http://dpdk.org/dev/patchwork/patch/31943/

I've seen the note about code duplication. Let's discuss it when v2 is
sent. I think our approach minimizes it and allows us to have only
specific code in the driver callback.
[dpdk-dev] [PATCH] net/bonding: don't early mark eth device as bonded
From: "Charles (Chas) Williams"

bonding immediately marks the incoming eth device as bonded and doesn't
clear this in later error paths. Delay marking the dev until we are
certain that we are going to add this eth device to the bond group.

Signed-off-by: Chas Williams
---
 drivers/net/bonding/rte_eth_bond_api.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/bonding/rte_eth_bond_api.c b/drivers/net/bonding/rte_eth_bond_api.c
index 534a890..e9c446b 100644
--- a/drivers/net/bonding/rte_eth_bond_api.c
+++ b/drivers/net/bonding/rte_eth_bond_api.c
@@ -242,9 +242,6 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id)
 		return -1;
 	}

-	/* Add slave details to bonded device */
-	slave_eth_dev->data->dev_flags |= RTE_ETH_DEV_BONDED_SLAVE;
-
 	rte_eth_dev_info_get(slave_port_id, &dev_info);
 	if (dev_info.max_rx_pktlen < internals->max_rx_pktlen) {
 		RTE_BOND_LOG(ERR, "Slave (port %u) max_rx_pktlen too small",
@@ -316,18 +313,21 @@ __eth_bond_slave_add_lock_free(uint16_t bonded_port_id, uint16_t slave_port_id)

 	internals->slave_count++;

-	/* Update all slave devices MACs*/
-	mac_address_slaves_update(bonded_eth_dev);
-
 	if (bonded_eth_dev->data->dev_started) {
 		if (slave_configure(bonded_eth_dev, slave_eth_dev) != 0) {
-			slave_eth_dev->data->dev_flags &= (~RTE_ETH_DEV_BONDED_SLAVE);
+			internals->slave_count--;
 			RTE_BOND_LOG(ERR, "rte_bond_slaves_configure: port=%d",
 					slave_port_id);
 			return -1;
 		}
 	}

+	/* Add slave details to bonded device */
+	slave_eth_dev->data->dev_flags |= RTE_ETH_DEV_BONDED_SLAVE;
+
+	/* Update all slave devices MACs*/
+	mac_address_slaves_update(bonded_eth_dev);
+
 	/* Register link status change callback with bonded device pointer as
 	 * argument*/
 	rte_eth_dev_callback_register(slave_port_id, RTE_ETH_EVENT_INTR_LSC,
--
2.9.5
Re: [dpdk-dev] [PATCH v3 5/8] net/vdev_netvsc: implement core functionality
10/01/2018 16:02, Matan Azrad:
> From: Stephen Hemminger, Tuesday, January 9, 2018 8:49 PM
> > On Tue, 9 Jan 2018 14:47:30 +
> > Matan Azrad wrote:
> > > +	ret = rte_eal_alarm_set(VDEV_NETVSC_PROBE_MS * 1000,
> > > +				vdev_netvsc_alarm, NULL);
> > > +	if (ret < 0) {
> > > +		DRV_LOG(ERR, "unable to reschedule alarm callback: %s",
> > > +			rte_strerror(-ret));
> > > +	}
> > > +}
> >
> > Why not use netlink uevent?
>
> As described in the doc, we can improve the hotplug mechanism (here and
> in fail-safe) after the EAL hotplug work is done.
> So, maybe in the next release we will change it to use uevent via EAL
> hotplug.

I don't see any progress here for one week.

Yes, it is a temporary solution waiting for the hotplug event callback in
EAL. Hopefully it will be possible to do such improvements in 18.05.

Am I missing something else? Or can it be applied to next-net?
Re: [dpdk-dev] [PATCH v2 2/6] ethdev: add port ownership
Hi Matan,

> Hi Konstantin
> From: Ananyev, Konstantin, Wednesday, January 17, 2018 2:55 PM
> > > > > > > > > > It is good to see that now scanning/updating
> > > > > > > > > > rte_eth_dev_data[] is lock protected, but it might be
> > > > > > > > > > not very plausible to protect both data[] and
> > > > > > > > > > next_owner_id using the same lock.
> > > > > > > > >
> > > > > > > > > I guess you mean the owner structure in
> > > > > > > > > rte_eth_dev_data[port_id].
> > > > > > > > > The next_owner_id is read by ownership APIs (for owner
> > > > > > > > > validation), so it makes sense to use the same lock.
> > > > > > > > > Actually, why not?
> > > > > > > >
> > > > > > > > Well, to me next_owner_id and rte_eth_dev_data[] are not
> > > > > > > > directly related.
> > > > > > > > You may create a new owner_id but it doesn't mean you would
> > > > > > > > update rte_eth_dev_data[] immediately.
> > > > > > > > And vice-versa - you might just want to update
> > > > > > > > rte_eth_dev_data[].name or .owner_id.
> > > > > > > > It is not very good coding practice to use the same lock
> > > > > > > > for non-related data structures.
> > > > > > >
> > > > > > > I see the relation like next:
> > > > > > > Since the ownership mechanism synchronization is in ethdev
> > > > > > > responsibility, we must protect against user mistakes as much
> > > > > > > as we can by using the same lock.
> > > > > > > So, if a user tries to set by an invalid owner (exactly the
> > > > > > > ID which currently is allocated) we can protect on it.
> > > > > >
> > > > > > Hmm, not sure why you can't do the same checking with a
> > > > > > different lock or atomic variable?
> > > > > >
> > > > > The set ownership API is protected by the ownership lock and
> > > > > checks the owner ID validity by reading the next owner ID.
> > > > > So, the owner ID allocation and set API should use the same
> > > > > atomic mechanism.
> > > >
> > > > Sure, but all you are doing for checking validity is check that
> > > > owner_id > 0 && owner_id < next_owner_id, right?
> > > > As you don't allow owner_id overlap (16/3248 bits) you can safely
> > > > do the same check with just atomic_get(&next_owner_id).
> > > >
> > > It will not protect it, scenario:
> > > - current next_id is X.
> > > - call set ownership of port A with owner id X by thread 0 (by user
> > >   mistake).
> > > - context switch
> > > - allocate new id by thread 1 and get X and change next_id to X+1
> > >   atomically.
> > > - context switch
> > > - Thread 0 validates X by atomic_read and succeeds to take ownership.
> > > - The system has lost the port (or it will be managed by two
> > >   entities) - crash.
> >
> > Ok, and how will using a lock protect you from such a scenario?
>
> The owner set API validation by thread 0 should fail because the owner
> validation is included in the protected section.
Re: [dpdk-dev] [PATCH v2 1/2] lib/net: add IPv6 header fields macros
On 1/16/2018 9:17 AM, Shahaf Shuler wrote:
> From: Shachar Beiser
>
> Support IPv6 header vtc_flow fields : tc , flow_label
>
> Signed-off-by: Shachar Beiser
> ---
> Sending on behalf of Shachar.
>
> On v2:
>  - Addressed Stephen comments on the coding style.

Hi Olivier,

Any objection to the patch?

Thanks,
ferruh

>
> ---
>  lib/librte_net/rte_ip.h | 6 ++
>  1 file changed, 6 insertions(+)
>
> diff --git a/lib/librte_net/rte_ip.h b/lib/librte_net/rte_ip.h
> index 9a62ff667..f32684c66 100644
> --- a/lib/librte_net/rte_ip.h
> +++ b/lib/librte_net/rte_ip.h
> @@ -344,6 +344,12 @@ struct ipv6_hdr {
>  	uint8_t  dst_addr[16]; /**< IP address of destination host(s). */
>  } __attribute__((__packed__));
>
> +/* IPv6 vtc_flow: IPv / TC / flow_label */
> +#define IPV6_HDR_FL_SHIFT 0
> +#define IPV6_HDR_TC_SHIFT 20
> +#define IPV6_HDR_FL_MASK ((1u << IPV6_HDR_TC_SHIFT) - 1)
> +#define IPV6_HDR_TC_MASK (0xf << IPV6_HDR_TC_SHIFT)
> +
>  /**
>   * Process the pseudo-header checksum of an IPv6 header.
>   *
>
Re: [dpdk-dev] [PATCH] examples/ipsec-secgw: fix corner case for spi value
> -----Original Message-----
> From: Nicolau, Radu
> Sent: Tuesday, January 16, 2018 11:02 AM
> To: Akhil Goyal ; dev@dpdk.org
> Cc: De Lara Guarch, Pablo ; hemant.agra...@nxp.com; Gonzalez Monroy, Sergio
> Subject: RE: [PATCH] examples/ipsec-secgw: fix corner case for spi value
>
> > -----Original Message-----
> > From: Akhil Goyal [mailto:akhil.go...@nxp.com]
> > Sent: Thursday, January 11, 2018 11:56 AM
> > To: dev@dpdk.org
> > Cc: De Lara Guarch, Pablo ; hemant.agra...@nxp.com; Gonzalez Monroy, Sergio ; Nicolau, Radu ; Akhil Goyal
> > Subject: [PATCH] examples/ipsec-secgw: fix corner case for spi value
> >
> > application is using index 0 of SA table as error, with current value of
> > IPSEC_SA_MAX_ENTRIES(128) it can not support SA with spi = 128, as it
> > uses sa_idx = 0 in the SA table.
> >
> > With this patch, sa_idx = 0 can also be used.
> >
> > PS: spi = 0 is an invalid SPI and application throws error for it.
> >
> > Fixes: d299106e8e31 ("examples/ipsec-secgw: add IPsec sample
> > application")
> >
> > Signed-off-by: Akhil Goyal
> > ---
> Acked-by: Radu Nicolau

CC'ing stable ML.

Applied to dpdk-next-crypto.

Thanks,
Pablo
Re: [dpdk-dev] [PATCH v2] crypto/dpaa_sec: rewrite Rx/Tx path
> -----Original Message-----
> From: Akhil Goyal [mailto:akhil.go...@nxp.com]
> Sent: Thursday, January 11, 2018 11:44 AM
> To: dev@dpdk.org
> Cc: De Lara Guarch, Pablo ; hemant.agra...@nxp.com; Akhil Goyal ; Nipun Gupta
> Subject: [PATCH v2] crypto/dpaa_sec: rewrite Rx/Tx path
>
> Rx and Tx paths are rewritten with improved internal APIs to improve
> performance.
>
> Signed-off-by: Akhil Goyal
> Signed-off-by: Nipun Gupta

Applied to dpdk-next-crypto.

Thanks,
Pablo
Re: [dpdk-dev] [PATCH] examples/ipsec-secgw: improve ipsec dequeue logic
> -----Original Message-----
> From: Nicolau, Radu
> Sent: Tuesday, January 16, 2018 4:13 PM
> To: Akhil Goyal ; dev@dpdk.org
> Cc: De Lara Guarch, Pablo ; hemant.agra...@nxp.com; Gonzalez Monroy, Sergio
> Subject: RE: [PATCH] examples/ipsec-secgw: improve ipsec dequeue logic
>
> > -----Original Message-----
> > From: Akhil Goyal [mailto:akhil.go...@nxp.com]
> > Sent: Thursday, December 14, 2017 7:19 AM
> > To: dev@dpdk.org
> > Cc: De Lara Guarch, Pablo ; hemant.agra...@nxp.com; Gonzalez Monroy, Sergio ; Nicolau, Radu ; Akhil Goyal
> > Subject: [PATCH] examples/ipsec-secgw: improve ipsec dequeue logic
> >
> > Since the processing of crypto operations may take time due to
> > hardware offload, all the packets may not be available in a single
> > dequeue command. So it may happen that there is leakage of cops, and
> > there is nobody to dequeue the packets, because dequeue of crypto ops
> > is done only once for a particular queue pair even if it has more
> > packets in flight.
> >
> > This patch dequeues the packets again if the inflight packets are more
> > than the max packet burst.
> >
> > Signed-off-by: Akhil Goyal
>
> Acked-by: Radu Nicolau

Applied to dpdk-next-crypto.

Thanks,
Pablo
Re: [dpdk-dev] [PATCH 0/6] net/sfc: convert to the new offload API
On 1/11/2018 8:12 AM, Andrew Rybchenko wrote:
> Maybe it is too late to suggest new API functions for ethdev,
> but hopefully, if the idea is accepted, it could be applied in the
> current release cycle since these functions are trivial.

Agreed, I think they are OK to get in even late.

> I'm not sure that rte_ethdev_version.map is updated correctly
> since the EXPERIMENTAL tag is present and I don't understand how it
> should be handled.
>
> In general, for the transition period from the old offload API to the
> new one, it would be useful to convert Tx offloads to/from txq_flags
> in rte_eth_dev_info_get() for default_txconf and
> rte_eth_tx_queue_info_get(). Unfortunately it was lost during
> the new offload API patch review. However, it would require testing
> all network PMDs, so we decided to follow a more conservative
> approach and kept the code that fills in txq_flags, which should simply
> be removed when txq_flags are removed.
>
> Cc: Thomas Monjalon
> Cc: Ferruh Yigit
> Cc: Shahaf Shuler
>
> Ivan Malov (6):
>   ethdev: add a function to look up Rx offload names
>   ethdev: add a function to look up Tx offload names
>   net/sfc: factor out function to report Rx capabilities
>   net/sfc: convert to the new Rx offload API
>   net/sfc: factor out function to report Tx capabilities
>   net/sfc: convert to the new Tx offload API

Series Reviewed-by: Ferruh Yigit
Re: [dpdk-dev] [PATCH 1/6] ethdev: add a function to look up Rx offload names
On 1/11/2018 8:12 AM, Andrew Rybchenko wrote:
> From: Ivan Malov
>
> Commonly, drivers converted to the new offload API
> may need to log unsupported offloads as a response
> to wrong settings. From this perspective, it would
> be convenient to have generic functions to look up
> offload names. The patch adds such a helper for Rx.
>
> Signed-off-by: Ivan Malov
> Signed-off-by: Andrew Rybchenko
>
> Cc: Thomas Monjalon
> Cc: Ferruh Yigit
> Cc: Shahaf Shuler

Hi Thomas, Shahaf,

Any comment on ethdev patches?
Re: [dpdk-dev] [PATCH 10/10 v4] doc: add DPAA eventdev guide
> -----Original Message-----
> From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Nipun Gupta
> Sent: Tuesday, January 16, 2018 8:44 PM
> To: jerin.ja...@caviumnetworks.com
> Cc: dev@dpdk.org; sunil.k...@nxp.com; hemant.agra...@nxp.com
> Subject: [dpdk-dev] [PATCH 10/10 v4] doc: add DPAA eventdev guide
>
> From: Sunil Kumar Kori
>
> Signed-off-by: Sunil Kumar Kori
> Acked-by: Hemant Agrawal

Acked-by: John McNamara
Re: [dpdk-dev] [PATCH v2] doc: add eventdev pipeline sample app to release notes
> -----Original Message-----
> From: Pavan Nikhilesh [mailto:pbhagavat...@caviumnetworks.com]
> Sent: Tuesday, January 16, 2018 6:06 PM
> To: jerin.ja...@caviumnetworks.com; Kovacevic, Marko ; Mcnamara, John
> Cc: dev@dpdk.org; Pavan Nikhilesh
> Subject: [dpdk-dev] [PATCH v2] doc: add eventdev pipeline sample app to
> release notes
>
> Signed-off-by: Pavan Nikhilesh

Acked-by: John McNamara
Re: [dpdk-dev] [PATCH v3] ethdev: increase flow type limit from 32 to 64
On 1/15/2018 5:33 PM, Kirill Rybalchenko wrote:
> Increase the internal limit for flow types from 32 to 64
> to support future flow type extensions.
> Change type of variables from uint32_t[] to uint64_t[]:
>   rte_eth_fdir_info.flow_types_mask
>   rte_eth_hash_global_conf.sym_hash_enable_mask
>   rte_eth_hash_global_conf.valid_bit_mask
>
> This modification affects the following components:
>   net/i40e
>   net/ixgbe
>   app/testpmd
>
> v2:
> implement versioning of rte_eth_dev_filter_ctrl function
> for ABI backward compatibility with version 17.11 and older
>
> v3:
> fix code style warnings
>
> Signed-off-by: Kirill Rybalchenko

Reviewed-by: Ferruh Yigit

I suggest keeping the deprecation notice and doing clean versioning in
the next release; does it make sense?