Re: [dpdk-dev] [RFC PATCH 0/4] ethdev new offloads API

2017-08-26 Thread Shahaf Shuler
Friday, August 25, 2017 1:32 PM, Jerin Jacob:
> >
> > The new API does not have an equivalent for the below Tx flags:
> >
> > * ETH_TXQ_FLAGS_NOREFCOUNT
> > * ETH_TXQ_FLAGS_NOMULTMEMP
> 
> IMO, it makes sense to keep those flags as a PMD optimization for applications
> that need neither reference counting nor multiple mempools.
> For example, a non-trivial application like l3fwd needs neither of them.

The l3fwd application is yet another simple example from the DPDK tree. I am not
sure that a complete vRouter/vSwitch implementation has the same characteristics.
Moreover, I don't think the existence of an application able to use these flags
is enough. IMO, some basic functionality should always be provided by the PMDs
and not be controlled by flags.
For example, let's say we have an application which always sends mbufs with
the same ol_flags, or even with the same length.
Would it make sense to add more flags to control that?
Would it make sense to run an RFC2544 benchmark with testpmd io forwarding and
those flags set?

If the answer is yes, maybe those flags (and others to follow) belong in a
different location in ethdev. However, they are certainly not offloads.
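
To make concrete what these two flags let a PMD skip on Tx completion, here is a minimal sketch. The types struct pool and struct pkt and both free functions are hypothetical simplifications, not DPDK's rte_mbuf/rte_mempool API. The generic path must check every packet's reference count and owning pool; the fast path assumes what the flags promise (refcnt == 1 and a single pool per queue) and returns the whole burst at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; NOT DPDK's struct rte_mbuf / struct rte_mempool. */
struct pool {
	size_t nb_free;            /* objects returned to this pool */
};

struct pkt {
	struct pool *pool;         /* pool the packet was allocated from */
	uint16_t refcnt;           /* users still holding the packet */
};

/* Generic completion path: any packet may be shared (refcnt > 1) and may
 * come from a different pool, so each one is checked individually. */
static void tx_free_generic(struct pkt **pkts, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (--pkts[i]->refcnt == 0)
			pkts[i]->pool->nb_free++;
	}
}

/* Fast path under NOREFCOUNT + NOMULTMEMP semantics: every packet has
 * refcnt == 1 and shares one pool, so the whole burst goes back to the
 * first packet's pool in a single step (refcnt is not touched). */
static void tx_free_fast(struct pkt **pkts, size_t n)
{
	if (n == 0)
		return;
	pkts[0]->pool->nb_free += n;
}
```

The saving is the per-packet branch and pool lookup; the cost is that the application must never violate the promise, which is the trade-off being debated here.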


[dpdk-dev] [PATCH v2 1/2] net/mlx5: replace memory barrier type

2017-08-26 Thread Shahaf Shuler
A barrier is required between the txq writes and the doorbell record write
to prevent the device from reading the doorbell record's new value before
the txq writes have been flushed to memory.

The full rte_wmb currently used is stronger than necessary and can be
replaced by the more relaxed rte_io_wmb.

Replacing rte_wmb is also expected to improve throughput.
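
The ordering requirement can be sketched as below. This is a hypothetical, simplified queue (struct sketch_txq, sketch_post_send), not the mlx5 code, and it uses a C11 release fence only as a portable stand-in for rte_io_wmb(). C11 fences order accesses as observed by other CPU threads; ordering writes for a DMA-capable device is what DPDK's rte_io_*mb() barriers are for.

```c
#include <stdatomic.h>
#include <stdint.h>

#define SKETCH_RING_SIZE 16

/* Hypothetical queue: a ring of work-queue entries plus a doorbell record
 * that the device reads to learn the new producer index. */
struct sketch_txq {
	uint64_t wqe[SKETCH_RING_SIZE]; /* descriptors the device fetches */
	volatile uint32_t dbrec;        /* doorbell record (producer index) */
	uint32_t wqe_ci;                /* next free descriptor index */
};

static void sketch_post_send(struct sketch_txq *txq, uint64_t desc)
{
	/* 1. Write the descriptor into the ring. */
	txq->wqe[txq->wqe_ci % SKETCH_RING_SIZE] = desc;
	txq->wqe_ci++;
	/* 2. Order the descriptor write before the doorbell record update
	 *    (rte_io_wmb() in the PMD; a C11 release fence here). */
	atomic_thread_fence(memory_order_release);
	/* 3. Publish the new producer index. The real PMD also converts it
	 *    to big endian with htonl(); omitted in this sketch. */
	txq->dbrec = txq->wqe_ci;
}
```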

on v2:
 * replace compiler barrier with rte_io_wmb.

Signed-off-by: Shahaf Shuler 
Signed-off-by: Yongseok Koh 
Signed-off-by: Alexander Solganik 
Signed-off-by: Sagi Grimberg 
Acked-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5_rxtx.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rxtx.h b/drivers/net/mlx5/mlx5_rxtx.h
index b3b161da5..e9895a9c0 100644
--- a/drivers/net/mlx5/mlx5_rxtx.h
+++ b/drivers/net/mlx5/mlx5_rxtx.h
@@ -591,7 +591,7 @@ mlx5_tx_dbrec(struct txq *txq, volatile struct mlx5_wqe *wqe)
 	uint64_t *dst = (uint64_t *)((uintptr_t)txq->bf_reg);
 	volatile uint64_t *src = ((volatile uint64_t *)wqe);
 
-	rte_wmb();
+	rte_io_wmb();
 	*txq->qp_db = htonl(txq->wqe_ci);
 	/* Ensure ordering between DB record and BF copy. */
 	rte_wmb();
-- 
2.12.0



[dpdk-dev] [PATCH v2 0/2] mlx5 high latency observed on send operations

2017-08-26 Thread Shahaf Shuler
from s...@grimberg.me:

While measuring the latency of a latency-critical workload on the mlx5 PMD, we
noticed that high latency can occur due to a delayed flush of the doorbell
record update.

This can be reproduced using the simple program [1] against testpmd in macswap
forwarding mode. The program sends a raw Ethernet frame to the DPDK port and
measures the time between sending it and receiving the mirrored frame.

This patchset guarantees immediate visibility of doorbell updates by mapping
the doorbell as non-cacheable memory.
In addition, we relax the memory barrier for DMA-able memory.

Without this fix, the TSC delta was 3550760-5993019 cycles (roughly 2.1-3.5 ms
on a 1.7 GHz processor).

With the fix applied, the TSC delta dropped to 17740-29663 cycles (roughly
10-17 us).
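
The cycle-to-time conversion is simply delta / f. A minimal helper, assuming a constant-rate TSC whose frequency tsc_hz is known (the helper name is illustrative):

```c
#include <stdint.h>

/* Convert a TSC delta to microseconds, assuming a constant TSC that ticks
 * at tsc_hz cycles per second. */
static double tsc_to_us(uint64_t cycles, uint64_t tsc_hz)
{
	return (double)cycles * 1e6 / (double)tsc_hz;
}
```

For example, tsc_to_us(5993019, 1700000000) is about 3525 us, and tsc_to_us(29663, 1700000000) is about 17.4 us.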

on v2:
 * replace compiler barrier with rte_io_wmb.

Shahaf Shuler (2):
  net/mlx5: replace memory barrier type
  net/mlx5: don't map doorbell register to write combining

 drivers/net/mlx5/mlx5.c  | 2 ++
 drivers/net/mlx5/mlx5_rxtx.h | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

[1]:
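/*
 * compiling: gcc test.c -o test
 * run using: ./test <ifname> <dest_mac>
 */
#include <arpa/inet.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <netinet/ether.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>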

#define BUF_SIZ 1024

static inline uint64_t rte_rdtsc(void)
{
	union {
		uint64_t tsc_64;
		struct {
			uint32_t lo_32;
			uint32_t hi_32;
		};
	} tsc;

	asm volatile("rdtsc" :
		     "=a" (tsc.lo_32),
		     "=d" (tsc.hi_32));
	return tsc.tsc_64;
}

int main(int argc, char *argv[])
{
	int sockfd;
	struct ifreq if_idx;
	struct ifreq if_mac;
	int tx_len = 0;
	char sendbuf[BUF_SIZ];
	struct ether_header *eh = (struct ether_header *) sendbuf;
	struct sockaddr_ll socket_address;
	char ifname[IFNAMSIZ];
	unsigned int values[6];	/* %x in sscanf() expects unsigned int */
	struct ether_header expected;
	uint64_t payload = 0xB16B00B5;
	uint8_t buffer[1024];
	int result;
	uint64_t before_rcv;
	uint64_t after_rcv;
	uint64_t delta;
	int numbytes;

	if (argc != 3) {
		fprintf(stderr, "usage: %s <ifname> <dest_mac>\n", argv[0]);
		return -1;
	}

	/* Bounded copy: argv[1] may be longer than IFNAMSIZ - 1 */
	strncpy(ifname, argv[1], IFNAMSIZ - 1);
	ifname[IFNAMSIZ - 1] = '\0';
	result = sscanf(argv[2], "%x:%x:%x:%x:%x:%x",
			&values[0], &values[1], &values[2], &values[3],
			&values[4], &values[5]);
	if (result != 6) {
		fprintf(stderr, "invalid mac\n");
		return -1;
	}

	/* Open RAW socket to send on */
	if ((sockfd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL))) == -1) {
		perror("socket");
		return -1;
	}

	/* Get the index of the interface to send on */
	memset(&if_idx, 0, sizeof(struct ifreq));
	strncpy(if_idx.ifr_name, ifname, IFNAMSIZ - 1);
	if (ioctl(sockfd, SIOCGIFINDEX, &if_idx) < 0) {
		perror("SIOCGIFINDEX");
		return -1;
	}
	/* Get the MAC address of the interface to send on */
	memset(&if_mac, 0, sizeof(struct ifreq));
	strncpy(if_mac.ifr_name, ifname, IFNAMSIZ - 1);
	if (ioctl(sockfd, SIOCGIFHWADDR, &if_mac) < 0) {
		perror("SIOCGIFHWADDR");
		return -1;
	}

	/* Construct the Ethernet header */
	memset(sendbuf, 0, BUF_SIZ);
	/* Source MAC: the sending interface */
	eh->ether_shost[0] = ((uint8_t *)&if_mac.ifr_hwaddr.sa_data)[0];
	eh->ether_shost[1] = ((uint8_t *)&if_mac.ifr_hwaddr.sa_data)[1];
	eh->ether_shost[2] = ((uint8_t *)&if_mac.ifr_hwaddr.sa_data)[2];
	eh->ether_shost[3] = ((uint8_t *)&if_mac.ifr_hwaddr.sa_data)[3];
	eh->ether_shost[4] = ((uint8_t *)&if_mac.ifr_hwaddr.sa_data)[4];
	eh->ether_shost[5] = ((uint8_t *)&if_mac.ifr_hwaddr.sa_data)[5];
	/* Destination MAC: the DPDK port, from the command line */
	eh->ether_dhost[0] = values[0];
	eh->ether_dhost[1] = values[1];
	eh->ether_dhost[2] = values[2];
	eh->ether_dhost[3] = values[3];
	eh->ether_dhost[4] = values[4];
	eh->ether_dhost[5] = values[5];
	/* Ethertype field */
	eh->ether_type = htons(ETH_P_IP);
	tx_len += sizeof(struct ether_header);

	memcpy(&sendbuf[tx_len], &payload, sizeof(payload));
	tx_len += sizeof(payload);

	/* Zero the address so no field (e.g. sll_protocol) is left uninitialized */
	memset(&socket_address, 0, sizeof(socket_address));
	/* Index of the network device */
	socket_address.sll_ifindex = if_idx.ifr_ifindex;
	/* Address length */
	socket_address.sll_halen = ETH_ALEN;
	/* Destination MAC */
	socket_address.sll_addr[0] = values[0];
	socket_address.sll_addr[1] = values[1];
	socket_address.sll_addr[2] = values[2];
	socket_address.sll_addr[3] = values[3];
	socket_address.sll_addr[4] = values[4];
	socket_address.sll_addr[5] = values[5];

	/* testpmd macswap mirrors the frame with source and destination swapped */
	memcpy(&expected.ether_dhost, &eh->ether_shost, ETH_ALEN);
	memcpy(&expected.ether_shost, &eh->ether_dhost, ETH_ALEN);
	expected.ether_type = eh->ether_type;

	/* Send packet */
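
The program's receive side is not shown here. As a minimal sketch, assuming the mirrored frame is identified by comparing its Ethernet header against expected (the header built above with the MACs swapped), the check could look like this; is_mirrored_frame is a hypothetical helper, not part of the original program.

```c
#include <net/ethernet.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return 1 if buf holds a frame whose Ethernet header
 * matches the expected (MAC-swapped) header, 0 otherwise. */
static int is_mirrored_frame(const unsigned char *buf, size_t len,
			     const struct ether_header *expected)
{
	struct ether_header hdr;

	if (len < sizeof(hdr))
		return 0;
	memcpy(&hdr, buf, sizeof(hdr));	/* copy out to avoid unaligned reads */
	return memcmp(hdr.ether_dhost, expected->ether_dhost, ETH_ALEN) == 0 &&
	       memcmp(hdr.ether_shost, expected->ether_shost, ETH_ALEN) == 0 &&
	       hdr.ether_type == expected->ether_type;
}
```

In a measurement loop, one would take rte_rdtsc() right before sending, keep calling recv() until is_mirrored_frame() returns 1, and take rte_rdtsc() again; the difference is the round-trip delta in cycles.
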

[dpdk-dev] [PATCH v2 2/2] net/mlx5: don't map doorbell register to write combining

2017-08-26 Thread Shahaf Shuler
By default, Verbs maps the doorbell register as write combining.
Write combining is useful for drivers which use BlueFlame for the
doorbell write.

Since the mlx5 PMD uses only doorbells, and a write-combining mapping
requires an extra memory barrier to flush the doorbell after it is
written, set the mapping to uncached by default.

This change is expected to reduce both the maximum and average round-trip
latency.
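
The same effect can be requested from outside the PMD by setting the environment variable before the Verbs device context is opened (assuming, as the patch does, that libmlx5 reads it at that point). A minimal sketch with a hypothetical function name; unlike the patch, which passes overwrite=1 to setenv(), it passes 0 so that a value the user has already exported is kept.

```c
#include <stdlib.h>

/* Hypothetical helper: ask libmlx5 not to use BlueFlame, so the UAR is not
 * mapped write-combining, unless the user already chose a value. */
static int app_default_uncached_uar(void)
{
	return setenv("MLX5_SHUT_UP_BF", "1", 0);	/* 0: don't overwrite */
}
```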

Signed-off-by: Shahaf Shuler 
Signed-off-by: Yongseok Koh 
Signed-off-by: Alexander Solganik 
Signed-off-by: Sagi Grimberg 
Acked-by: Nelio Laranjeiro 
---
 drivers/net/mlx5/mlx5.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index bd66a7c77..50f4ba70a 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -884,6 +884,8 @@ rte_mlx5_pmd_init(void)
 	 * using this PMD, which is not supported in forked processes.
 	 */
 	setenv("RDMAV_HUGEPAGES_SAFE", "1", 1);
+	/* Don't map UAR to WC if BlueFlame is not used.*/
+	setenv("MLX5_SHUT_UP_BF", "1", 1);
 	ibv_fork_init();
 	rte_pci_register(&mlx5_driver);
 }
-- 
2.12.0