[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-13 Thread Tan, Jianfeng
Hello,

On 1/12/2016 11:11 PM, Amit Tomer wrote:
> Hello,
>
>> In vhost-switch, it judges if a virtio device is ready for processing after
>> receiving
>> a pkt from virtio device. So you'd better construct a pkt, and send it out
>> firstly
>> in l2fwd.
> I tried to ping the socket interface from host for the same purpose
> but it didn't work.
>
> Could you please suggest some other approach for achieving same(how
> pkt can be sent out to l2fwd)?
>
> Also, before trying this, I have verified that vhost-switch is working
> ok with testpmd .
>
> Thanks,
> Amit.

You can use below patch for l2fwd to send out an arp packet when it gets 
started.

diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index 720fd5a..572b1ac 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -69,6 +69,8 @@
  #include 
  #include 

+#define SEND_ARP
+
  #define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1

  #define NB_MBUF   8192
@@ -185,6 +187,53 @@ print_stats(void)
printf("\n\n");
  }

+#ifdef SEND_ARP
+static void
+dpdk_send_arp(int portid, struct rte_mempool *mp)
+{
+/*
+ * len = 14 + 46
+ * ARP, Request who-has 10.0.0.1 tell 10.0.0.2, length 46
+ */
+static const uint8_t arp_request[] = {
+/*0x:*/ 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xec, 0xa8, 
0x6b, 0xfd, 0x02, 0x29, 0x08, 0x06, 0x00, 0x01,
+/*0x0010:*/ 0x08, 0x00, 0x06, 0x04, 0x00, 0x01, 0xec, 0xa8, 
0x6b, 0xfd, 0x02, 0x29, 0x0a, 0x00, 0x00, 0x01,
+/*0x0020:*/ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0a, 0x00, 
0x00, 0x02, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
+/*0x0030:*/ 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00
+};
+int ret;
+struct rte_mbuf *m;
+struct ether_addr mac_addr;
+int pkt_len = sizeof(arp_request) - 1;
+
+m = rte_pktmbuf_alloc(mp);
+
+memcpy((void *)((uint64_t)m->buf_addr + m->data_off), arp_request, 
pkt_len);
+rte_pktmbuf_pkt_len(m) = pkt_len;
+rte_pktmbuf_data_len(m) = pkt_len;
+
+rte_eth_macaddr_get(portid, &mac_addr);
+memcpy((void *)((uint64_t)m->buf_addr + m->data_off + 6), 
&mac_addr, 6);
+
+ret = rte_eth_tx_burst(portid, 0, &m, 1);
+if (ret == 1) {
+printf("arp sent: ok\n");
+printf("%02x:%02x:%02x:%02x:%02x:%02x\n",
+mac_addr.addr_bytes[0],
+mac_addr.addr_bytes[1],
+mac_addr.addr_bytes[2],
+mac_addr.addr_bytes[3],
+mac_addr.addr_bytes[4],
+mac_addr.addr_bytes[5]);
+} else {
+printf("arp sent: fail\n");
+}
+
+rte_pktmbuf_free(m);
+}
+#endif
+
+
  /* Send the burst of packets on an output interface */
  static int
  l2fwd_send_burst(struct lcore_queue_conf *qconf, unsigned n, uint8_t port)
@@ -281,6 +330,9 @@ l2fwd_main_loop(void)
  portid = qconf->rx_port_list[i];
  RTE_LOG(INFO, L2FWD, " -- lcoreid=%u portid=%u\n", lcore_id,
  portid);
+#ifdef SEND_ARP
+dpdk_send_arp(portid, l2fwd_pktmbuf_pool);
+#endif
  }

  while (1) {



[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir

2016-01-13 Thread Wu, Jingjing
> 
> diff --git a/lib/librte_ether/rte_eth_ctrl.h b/lib/librte_ether/rte_eth_ctrl.h
> index ce224ad..5cc22a0 100644
> --- a/lib/librte_ether/rte_eth_ctrl.h
> +++ b/lib/librte_ether/rte_eth_ctrl.h
> @@ -74,7 +74,11 @@ extern "C" {
>  #define RTE_ETH_FLOW_IPV6_EX15
>  #define RTE_ETH_FLOW_IPV6_TCP_EX16
>  #define RTE_ETH_FLOW_IPV6_UDP_EX17
> -#define RTE_ETH_FLOW_MAX18
> +#define RTE_ETH_FLOW_PKT_FILTER_IPV4_TCP 18 #define
> +RTE_ETH_FLOW_PKT_FILTER_IPV4_UDP 19 #define
> +RTE_ETH_FLOW_PKT_FILTER_IPV6_TCP 20 #define
> +RTE_ETH_FLOW_PKT_FILTER_IPV6_UDP 21
> +#define RTE_ETH_FLOW_MAX22
> 
How to distinguish RTE_ETH_FLOW_PKT_FILTER_IPV4_XX with 
RTE_ETH_FLOW_NONFRAG_IPV4_XX, what is the difference?
>  /**
>   * Feature filter types
> @@ -407,6 +411,9 @@ struct rte_eth_l2_flow {  struct rte_eth_ipv4_flow {
>   uint32_t src_ip;  /**< IPv4 source address to match. */
>   uint32_t dst_ip;  /**< IPv4 destination address to match. */
> + uint8_t tos;  /**< IPV4 type of service to match. */
> + uint8_t proto;/**< IPV4 proto to match. */
> + uint8_t ttl;  /**< IPV4 time to live to match. */
>  };
> 
>  /**
> @@ -443,6 +450,10 @@ struct rte_eth_sctpv4_flow {  struct
> rte_eth_ipv6_flow {
>   uint32_t src_ip[4];  /**< IPv6 source address to match. */
>   uint32_t dst_ip[4];  /**< IPv6 destination address to match. */
> + uint8_t  tc; /**< IPv6 traffic class to match. */
> + uint32_t flow_label; /**< IPv6 flow label to match. */
> + uint8_t  next_header;/**< IPv6 next header to match. */
> + uint8_t  hop_limit;  /**< IPv6 hop limits to match. */
>  };
> 
There is also a patch http://dpdk.org/dev/patchwork/patch/9661/ which added 
these fields. Maybe we can merge them together.
> +struct rte_eth_pkt_filter_flow {
> + enum rte_eth_pkt_filter_type type;   /**< Type of filter */
> + enum rte_eth_pkt_filter_type prio;
> + /**< Prioritize the filter type when a packet matches several types */
> + struct rte_eth_pkt_filter pkt;  /**< Packet fields to match. */
> + struct rte_eth_pkt_filter mask; /**< Mask for matched fields. */
> +};
> +
> +/**
>   * An union contains the inputs for all types of flow
>   */
>  union rte_eth_fdir_flow {
> @@ -514,6 +570,7 @@ union rte_eth_fdir_flow {
>   struct rte_eth_ipv6_flow   ipv6_flow;
>   struct rte_eth_mac_vlan_flow mac_vlan_flow;
>   struct rte_eth_tunnel_flow   tunnel_flow;
> + struct rte_eth_pkt_filter_flow pkt_filter_flow;
>  };
Why not use rte_eth_XX_flow directly but add a new one? Is it because of the 
mask? If so, how about to add a field in rte_eth_fdir_input like:
struct rte_eth_fdir_input {
uint16_t flow_type;
union rte_eth_fdir_flow flow;
/**< Flow fields to match, dependent on flow_type */
union rte_eth_fdir_flow flow_mask;
struct rte_eth_fdir_flow_ext flow_ext;
/**< Additional fields to match */
};

Thanks
Jingjing


[dpdk-dev] [PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-13 Thread Yuanhan Liu
On Tue, Jan 12, 2016 at 05:35:06PM +0300, Pavel Fedin wrote:
> Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for
> some reason. This causes NULL dereference in qva_to_vva().
> 
> Change-Id: Ibc8f6637fb5fb9885b02c316adf18afd45e0d49a

What's this? An internal track id? If so, you should not include it
here: it's just meaningless to us.

Otherwise, this patch looks good to me.

--yliu


[dpdk-dev] [PATCH 00/12] Add API to get packet type info

2016-01-13 Thread Qiu, Michael
On 12/31/2015 9:53 PM, Jianfeng Tan wrote:
> HAPPRY NEW YEAR!
>
> A new ether API rte_eth_dev_get_ptype_info() is added to query what
> packet type information will be provided by current pmd driver of the
> specifed port.
>
> To achieve this, a new function pointer, dev_ptype_info_get, is added
> into struct eth_dev_ops. For those devices who do not implement it, it
> means it will not provide any ptype info.

I haven't go through all the patches, but I have a question, what's the
usercase of this API?

Thanks,
Michael
> Jianfeng Tan (12):
>   ethdev: add API to query what/if packet type is set
>   pmd/cxgbe: add dev_ptype_info_get implementation
>   pmd/e1000: add dev_ptype_info_get implementation
>   pmd/enic: add dev_ptype_info_get implementation
>   pmd/fm10k: add dev_ptype_info_get implementation
>   pmd/i40e: add dev_ptype_info_get implementation
>   pmd/ixgbe: add dev_ptype_info_get implementation
>   pmd/mlx4: add dev_ptype_info_get implementation
>   pmd/mlx5: add dev_ptype_info_get implementation
>   pmd/nfp: add dev_ptype_info_get implementation
>   pmd/vmxnet3: add dev_ptype_info_get implementation
>   examples/l3fwd: add option to parse ptype
>
>  drivers/net/cxgbe/cxgbe_ethdev.c | 17 +++
>  drivers/net/e1000/igb_ethdev.c   | 48 
>  drivers/net/enic/enic_ethdev.c   | 20 +
>  drivers/net/fm10k/fm10k_ethdev.c | 60 +
>  drivers/net/fm10k/fm10k_rxtx.c   |  5 +++
>  drivers/net/fm10k/fm10k_rxtx_vec.c   |  5 +++
>  drivers/net/i40e/i40e_ethdev.c   |  1 +
>  drivers/net/i40e/i40e_ethdev_vf.c|  1 +
>  drivers/net/i40e/i40e_rxtx.c | 69 -
>  drivers/net/i40e/i40e_rxtx.h |  2 +
>  drivers/net/ixgbe/ixgbe_ethdev.c | 50 +
>  drivers/net/ixgbe/ixgbe_ethdev.h |  2 +
>  drivers/net/ixgbe/ixgbe_rxtx.c   |  5 ++-
>  drivers/net/mlx4/mlx4.c  | 27 +++
>  drivers/net/mlx5/mlx5.c  |  1 +
>  drivers/net/mlx5/mlx5.h  |  2 +
>  drivers/net/mlx5/mlx5_ethdev.c   | 25 +++
>  drivers/net/mlx5/mlx5_rxtx.c |  2 +
>  drivers/net/nfp/nfp_net.c| 18 
>  drivers/net/vmxnet3/vmxnet3_ethdev.c | 20 +
>  examples/l3fwd/main.c| 86 
> 
>  lib/librte_ether/rte_ethdev.c| 12 +
>  lib/librte_ether/rte_ethdev.h| 22 +
>  lib/librte_mbuf/rte_mbuf.h   | 13 ++
>  24 files changed, 511 insertions(+), 2 deletions(-)
>



[dpdk-dev] [PATCH v3 1/4] vmxnet3: restore tx data ring support

2016-01-13 Thread Yong Wang
On 1/5/16, 4:48 PM, "Stephen Hemminger"  wrote:


>On Tue,  5 Jan 2016 16:12:55 -0800
>Yong Wang  wrote:
>
>> @@ -365,6 +366,14 @@ vmxnet3_xmit_pkts(void *tx_queue, struct rte_mbuf 
>> **tx_pkts,
>>  break;
>>  }
>>  
>> +if (rte_pktmbuf_pkt_len(txm) <= VMXNET3_HDR_COPY_SIZE) {
>> +struct Vmxnet3_TxDataDesc *tdd;
>> +
>> +tdd = txq->data_ring.base + txq->cmd_ring.next2fill;
>> +copy_size = rte_pktmbuf_pkt_len(txm);
>> +rte_memcpy(tdd->data, rte_pktmbuf_mtod(txm, char *), 
>> copy_size);
>> +}
>
>Good idea to use a local region which optmizes the copy in the host,
>but this implementation needs to be more general.
>
>As written it is broken for multi-segment packets. A multi-segment
>packet will have a pktlen >= datalen as in:
>  m -> mb_segs=3, pktlen=1200, datalen=200
>-> datalen=900
>-> datalen=100
>
>There are two ways to fix this. You could test for nb_segs == 1
>or better yet. Optimize each segment it might be that the first
>segment (or tail segment) would fit in the available data area.

Currently the vmxnet3 backend has a limitation of 128B data area so
it should work even for the multi-segmented pkt shown above. But
I agree it does not work for all multi-segmented packets.  The
following packet will be such an example.

m -> nb_segs=3, pktlen=128, datalen=64
-> datalen=32
-> datalen=32


It?s unclear if/how we might get into such a multi-segmented pkt
but I agree we should handle this case.  Patch updated taking the
simple approach (checking for nb_segs == 1).  I?ll leave the
optimization as a future patch.


[dpdk-dev] [PATCH v3 2/4] vmxnet3: add tx l4 cksum offload

2016-01-13 Thread Yong Wang
On 1/5/16, 4:51 PM, "Stephen Hemminger"  wrote:

>On Tue,  5 Jan 2016 16:12:56 -0800
>Yong Wang  wrote:
>
>> -if (txq->shared->ctrl.txNumDeferred >= txq->shared->ctrl.txThreshold) {
>> +PMD_TX_LOG(DEBUG, "vmxnet3 txThreshold: %u", 
>> rte_le_to_cpu_32(txq_ctrl->txThreshold));
>
>For bisection, it would be good to split the byte-order fixes from the
>offload changes; in other words make them different commits.

Sure and patched updated.


[dpdk-dev] [PATCH v3 4/4] vmxnet3: announce device offload capability

2016-01-13 Thread Yong Wang
On 1/5/16, 4:52 PM, "Stephen Hemminger"  wrote:

>On Tue,  5 Jan 2016 16:12:58 -0800
>Yong Wang  wrote:
>
>>  
>>  /* return 0 means link status changed, -1 means not changed */
>> @@ -819,7 +831,7 @@ vmxnet3_dev_vlan_filter_set(struct rte_eth_dev *dev, 
>> uint16_t vid, int on)
>>  else
>>  VMXNET3_CLEAR_VFTABLE_ENTRY(hw->shadow_vfta, vid);
>>  
>> -/* don't change active filter if in promiscious mode */
>> +/* don't change active filter if in promiscuous mode */
>
>Maybe send a first patch in series with these message and comment cleanups?
>
>Makes the review easier, and aides bisection.

Sure and patch updated.


[dpdk-dev] [PATCH 1/4] ixgbe: support UDP tunnel add/del

2016-01-13 Thread Lu, Wenzhuo
Hi Thomas,

> -Original Message-
> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> Sent: Tuesday, January 12, 2016 8:37 PM
> To: Lu, Wenzhuo 
> Cc: dev at dpdk.org; Vincent JARDIN 
> Subject: Re: [dpdk-dev] [PATCH 1/4] ixgbe: support UDP tunnel add/del
> 
> Hi,
> 
> 2016-01-11 08:28, Lu, Wenzhuo:
> > [Wenzhuo] The udp_tunnel_add and udp_tunnel_del have already existed.
> I just use them. Honestly I agree with you they are not accurate name. Better
> change them to udp_tunnel_port_add and udp_tunnel_port_del. But it
> should be a ABI change if I?m not wrong. I think we can announce it this
> release and change them in the next release. Would you agree?  Thanks.
> 
> You can introduce the new name and keep the old one for backward compat
> while announcing its deprecation.
Good suggestion, I'll send a new patch set for this change. Thanks.

> Thanks


[dpdk-dev] [PATCH 4/4] doc: update release note for VxLAN & NVGRE checksum off-load support

2016-01-13 Thread Lu, Wenzhuo
Hi John,

> -Original Message-
> From: Mcnamara, John
> Sent: Tuesday, January 12, 2016 9:45 PM
> To: Lu, Wenzhuo ; dev at dpdk.org
> Subject: RE: [dpdk-dev] [PATCH 4/4] doc: update release note for VxLAN &
> NVGRE checksum off-load support
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wenzhuo Lu
> > Sent: Monday, January 11, 2016 7:07 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] [PATCH 4/4] doc: update release note for VxLAN &
> > NVGRE checksum off-load support
> 
> 
> > +* **Support VxLAN & NVGRE checksum off-load on X550**
> > +
> > +  * VxLAN & NVGRE RX/TX checksum off-load is supported on X550.
> > +Provide RX/TX checksum off-load on both inner and outer IP
> > +header and TCP header.
> > +  * Support VxLAN port configuration. Although the default VxLAN
> > +port number is 4789, it can be changed. We should make it
> > +configable to meet the change.
> 
> Hi Wenzhou,
> 
> The release note text should be in the past tense. Something like this would
> be better:
> 
> * **Added support for VxLAN and NVGRE checksum off-load on X550.**
> 
>   * Added support for VxLAN and NVGRE RX/TX checksum off-load on
> X550. RX/TX checksum off-load is provided on both inner and
> outer IP header and TCP header.
> 
>   * Added functions to support for VxLAN port configuration. The
> default VxLAN port number is 4789 but this can be updated
> programmatically.
Thanks for the comments. Let me send a V2.



[dpdk-dev] [RFC PATCH 3/3] doc: add introduction for fm10k FTAG based forwarding

2016-01-13 Thread Wang, Xiao W
Hi, John

> -Original Message-
> From: Mcnamara, John
> Sent: Tuesday, January 12, 2016 10:16 PM
> To: Wang, Xiao W ; Chen, Jing D
> ; Richardson, Bruce 
> Cc: dev at dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 3/3] doc: add introduction for fm10k
> FTAG based forwarding
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Wang Xiao W
> > Sent: Tuesday, January 5, 2016 12:32 PM
> > To: Chen, Jing D; Richardson, Bruce
> > Cc: dev at dpdk.org
> > Subject: [dpdk-dev] [RFC PATCH 3/3] doc: add introduction for fm10k
> > FTAG based forwarding
> >
> > Add a brief introduction on FTAG, describes what's FTAG and how it
> > works in forwarding, introduction on how to run fm10k with FTAG is
> > also included.
> >
> > Signed-off-by: Wang Xiao W 
> > ---
> >  doc/guides/nics/fm10k.rst | 13 +
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/doc/guides/nics/fm10k.rst b/doc/guides/nics/fm10k.rst
> > index
> > 4206b7f..d82bf41 100644
> > --- a/doc/guides/nics/fm10k.rst
> > +++ b/doc/guides/nics/fm10k.rst
> > @@ -34,6 +34,19 @@ FM10K Poll Mode Driver  The FM10K poll mode driver
> > library provides support for the Intel FM1
> >  (FM10K) family of 40GbE/100GbE adapters.
> >
> 
> Hi,
> 
> Some very minor comments.
> 
> 
> > +FTAG Based Forwarding of FM10K
> > +--
> 
> The Documentation Guidelines say to put a newline after section headers.
> 
> > +FTAG Based Forwarding is a unique feature of FM10K. The FM10K family
> > +of NICs support the addition of a Fabric Tag (FTAG) to carry special
> > information.
> > +The FTAG is placed at the beginning of the frame, it contains
> > +information such as where the packet comes from and goes, the vlan tag.
> 
> s/the vlan tag/and the vlan tag
> 
> 
> > +In FTAG based forwarding mode, the switch logic forwards packets
> > +according to glort (global resource tag) information, other than the
> 
> s/other/rather
> 
> 
> > +mac and vlan table. Now this feature works only on PF.
> 
> s/Now/Currently
> 
> 
> > +
> > +To enable this feature, turn CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD to y
> in
> 
> In general variable and config names should be in fixed width quotes:
> 
> ``CONFIG_RTE_LIBRTE_FM10K_FTAG_FWD``
> 
> 
> > +the configuration file. A unit test case fm10k_ftag_autotest is for
> 
> s/for/provided for
> 
> John.
> --
> 
Many thanks for your comments, I'll change accordingly.


[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-13 Thread Tetsuya Mukawa
On 2016/01/12 15:59, Yuanhan Liu wrote:
> +static int
> +virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
> +{
> + uint8_t pos;
> + struct virtio_pci_cap cap;
> + int ret;
> +
> + if (rte_eal_pci_map_device(dev) < 0) {
> + PMD_INIT_LOG(DEBUG, "failed to map pci device!");
> + return -1;
> + }
> +

Do you need to call rte_eal_pci_unmap_device() in somewhere in this file?

Anyway, I've reviewed and tested your all patches.
And it seems except for it, I guess your patches are good.

Thanks,
Tetsuya




[dpdk-dev] [PATCH] ixgbe: fix whitespace

2016-01-13 Thread Zhang, Helin


-Original Message-
From: Stephen Hemminger [mailto:step...@networkplumber.org] 
Sent: Wednesday, January 13, 2016 12:54 PM
To: Zhang, Helin ; Ananyev, Konstantin 

Cc: dev at dpdk.org; Stephen Hemminger 
Subject: [PATCH] ixgbe: fix whitespace

Normally, I ignore random minor whitespace issues, but this driver seems to 
have some authors that don't understand the conventions of putting whitespace 
after keywords.

Signed-off-by: Stephen Hemminger 
Acked-by: Helin Zhang 



[dpdk-dev] [PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-13 Thread Pavel Fedin
 Hello!

> > Change-Id: Ibc8f6637fb5fb9885b02c316adf18afd45e0d49a
> 
> What's this? An internal track id?

 Yes, it's from our gerrit. I've just done git format-patch.

> If so, you should not include it
> here: it's just meaningless to us.
> 
> Otherwise, this patch looks good to me.

 Should i repost, or can you just drop this tag by yourself when applying?

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] [PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-13 Thread Yuanhan Liu
On Wed, Jan 13, 2016 at 10:18:43AM +0300, Pavel Fedin wrote:
>  Hello!
> 
> > > Change-Id: Ibc8f6637fb5fb9885b02c316adf18afd45e0d49a
> > 
> > What's this? An internal track id?
> 
>  Yes, it's from our gerrit. I've just done git format-patch.
> 
> > If so, you should not include it
> > here: it's just meaningless to us.
> > 
> > Otherwise, this patch looks good to me.
> 
>  Should i repost, or can you just drop this tag by yourself when applying?

I'd suggest so. Since it's Thomas (but not me) to apply your patch.
And if you repost, you can add my ACK:

Acked-by: Yuanhan Liu 

--yliu


[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-13 Thread Pavel Fedin
Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for
some reason. This causes NULL dereference in qva_to_vva().

Signed-off-by: Pavel Fedin 
Acked-by: Yuanhan Liu 
---
 lib/librte_vhost/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
index 0ba5045..3e7cec0 100644
--- a/lib/librte_vhost/virtio-net.c
+++ b/lib/librte_vhost/virtio-net.c
@@ -630,7 +630,7 @@ set_vring_addr(struct vhost_device_ctx ctx, struct 
vhost_vring_addr *addr)
struct vhost_virtqueue *vq;

dev = get_device(ctx);
-   if (dev == NULL)
+   if ((dev == NULL) || (dev->mem == NULL))
return -1;

/* addr->index refers to the queue index. The txq 1, rxq is 0. */
-- 
2.1.1




[dpdk-dev] [PATCH] doc: coding style: use linux kernel style for indentation

2016-01-13 Thread Yuanhan Liu
Using two tabs for "if" (or "while") statements is a bit weird to me.
Also, using one tab unconditionaly for function definitions and
prototypes doesn't look great.

Here I'd suggest to use the indentation style the Linux kernel
project prefers: to align with the open brace with tabs and
additonal spaces (when necessary).

Example:

  static int
  rte_eal_foo_bar(int a_long_argument_1, int another_long_argument_2,
  struct foo *yet_another_long_argument_3)

  ret = rte_eal_foo_bar(a_long_argument_1, another_long_argument_2,
yet_another_long_argument_3);

  if (really_long_variable_name_1 == really_long_variable_name_2 &&
  var3 == var4) {
  x = y + z;
  ;
  }

Cc: Thomas Monjalon 
Cc: Siobhan Butler 
Cc: John McNamara 
Signed-off-by: Yuanhan Liu 
---
 doc/guides/contributing/coding_style.rst | 38 ++--
 1 file changed, 26 insertions(+), 12 deletions(-)

diff --git a/doc/guides/contributing/coding_style.rst 
b/doc/guides/contributing/coding_style.rst
index ad1392d..23cd060 100644
--- a/doc/guides/contributing/coding_style.rst
+++ b/doc/guides/contributing/coding_style.rst
@@ -339,9 +339,11 @@ General

 * Do not put any spaces before a tab for indentation.
 * If you have to wrap a long statement, put the operator at the end of the 
line, and indent again.
-* For control statements (if, while, etc.), continuation it is recommended 
that the next line be indented by two tabs, rather than one,
-  to prevent confusion as to whether the second line of the control statement 
forms part of the statement body or not.
-  Alternatively, the line continuation may use additional spaces to line up to 
an appropriately point on the preceding line, for example, to align to an 
opening brace.
+* For control statements (if, while, etc.), function definitions and
+  function prototypes continuation lines, it is recommended that the
+  next line be indented by tab and additional spaces when necessary
+  to align to the opening brace. For others, it's also okay to be
+  indented by tab only.

 .. note::

@@ -350,17 +352,29 @@ General
 .. code-block:: c

  while (really_long_variable_name_1 == really_long_variable_name_2 &&
- var3 == var4){  /* confusing to read as */
- x = y + z;  /* control stmt body lines up with second line of */
- a = b + c;  /* control statement itself if single indent used */
+ var3 == var4) { /* confusing to read as */
+ x = y + z;  /* control stmt body lines up with second line of */
+ a = b + c;  /* control statement itself if single indent used */
  }

  if (really_long_variable_name_1 == really_long_variable_name_2 &&
- var3 == var4){  /* two tabs used */
- x = y + z;  /* statement body no longer lines up */
- a = b + c;
+ var3 == var4) {/* align with above opening if statement */
+ x = y + z; /* statement body no longer lines up */
+ a = b + c;
  }

+ /* The continuation line is indented with two tab + 1 space */
+ static int
+ rte_eal_foo_bar(int a_long_argument_1, int another_long_argument_2,
+ struct foo *yet_another_long_argument_3)
+ {
+...
+ }
+
+ /* The continuation line is indented with two tab + 7 spaces */
+ ret = rte_eal_foo_bar(a_long_argument_1, another_long_argument_2,
+   yet_another_long_argument_3);
+
  z = a + really + long + statement + that + needs +
  two + lines + gets + indented + on + the +
  second + and + subsequent + lines;
@@ -510,9 +524,9 @@ Prototypes
 .. code-block:: c

  static char *function1(int _arg, const char *_arg2,
-struct foo *_arg3,
-struct bar *_arg4,
-struct baz *_arg5);
+struct foo *_arg3,
+struct bar *_arg4,
+struct baz *_arg5);
  static void usage(void);

 .. note::
-- 
1.9.0



[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir

2016-01-13 Thread Rahul Lakkireddy
Hi Jingjing,

On Tuesday, January 01/12/16, 2016 at 17:12:47 -0800, Wu, Jingjing wrote:
> > 
> > diff --git a/lib/librte_ether/rte_eth_ctrl.h 
> > b/lib/librte_ether/rte_eth_ctrl.h
> > index ce224ad..5cc22a0 100644
> > --- a/lib/librte_ether/rte_eth_ctrl.h
> > +++ b/lib/librte_ether/rte_eth_ctrl.h
> > @@ -74,7 +74,11 @@ extern "C" {
> >  #define RTE_ETH_FLOW_IPV6_EX15
> >  #define RTE_ETH_FLOW_IPV6_TCP_EX16
> >  #define RTE_ETH_FLOW_IPV6_UDP_EX17
> > -#define RTE_ETH_FLOW_MAX18
> > +#define RTE_ETH_FLOW_PKT_FILTER_IPV4_TCP 18 #define
> > +RTE_ETH_FLOW_PKT_FILTER_IPV4_UDP 19 #define
> > +RTE_ETH_FLOW_PKT_FILTER_IPV6_TCP 20 #define
> > +RTE_ETH_FLOW_PKT_FILTER_IPV6_UDP 21
> > +#define RTE_ETH_FLOW_MAX22
> > 
> How to distinguish RTE_ETH_FLOW_PKT_FILTER_IPV4_XX with 
> RTE_ETH_FLOW_NONFRAG_IPV4_XX, what is the difference?

The packet filter flow is basically a superset containing Ethernet,
vlan, ipv4/ipv6 and tcp/udp flows whose fields can all be matched at
the same time, unlike in case of the current flow director which seems
to match only one of the flows at any given time.  Additionally, it also
allows specifying masks.  I separated the two to make this meaning
explicit.  If this is not necessary, then I will merge them.

> >  /**
> >   * Feature filter types
> > @@ -407,6 +411,9 @@ struct rte_eth_l2_flow {  struct rte_eth_ipv4_flow {
> > uint32_t src_ip;  /**< IPv4 source address to match. */
> > uint32_t dst_ip;  /**< IPv4 destination address to match. */
> > +   uint8_t tos;  /**< IPV4 type of service to match. */
> > +   uint8_t proto;/**< IPV4 proto to match. */
> > +   uint8_t ttl;  /**< IPV4 time to live to match. */
> >  };
> > 
> >  /**
> > @@ -443,6 +450,10 @@ struct rte_eth_sctpv4_flow {  struct
> > rte_eth_ipv6_flow {
> > uint32_t src_ip[4];  /**< IPv6 source address to match. */
> > uint32_t dst_ip[4];  /**< IPv6 destination address to match. */
> > +   uint8_t  tc; /**< IPv6 traffic class to match. */
> > +   uint32_t flow_label; /**< IPv6 flow label to match. */
> > +   uint8_t  next_header;/**< IPv6 next header to match. */
> > +   uint8_t  hop_limit;  /**< IPv6 hop limits to match. */
> >  };
> > 
> There is also a patch http://dpdk.org/dev/patchwork/patch/9661/ which added 
> these fields. Maybe we can merge them together.

Yes, we can merge them.  Would you like me to merge your patch here?

> > +struct rte_eth_pkt_filter_flow {
> > +   enum rte_eth_pkt_filter_type type;   /**< Type of filter */
> > +   enum rte_eth_pkt_filter_type prio;
> > +   /**< Prioritize the filter type when a packet matches several types */
> > +   struct rte_eth_pkt_filter pkt;  /**< Packet fields to match. */
> > +   struct rte_eth_pkt_filter mask; /**< Mask for matched fields. */
> > +};
> > +
> > +/**
> >   * An union contains the inputs for all types of flow
> >   */
> >  union rte_eth_fdir_flow {
> > @@ -514,6 +570,7 @@ union rte_eth_fdir_flow {
> > struct rte_eth_ipv6_flow   ipv6_flow;
> > struct rte_eth_mac_vlan_flow mac_vlan_flow;
> > struct rte_eth_tunnel_flow   tunnel_flow;
> > +   struct rte_eth_pkt_filter_flow pkt_filter_flow;
> >  };
> Why not use rte_eth_XX_flow directly but add a new one? Is it because of the 
> mask? If so, how about to add a field in rte_eth_fdir_input like:
> struct rte_eth_fdir_input {
> uint16_t flow_type;
> union rte_eth_fdir_flow flow;
> /**< Flow fields to match, dependent on flow_type */
> union rte_eth_fdir_flow flow_mask;
> struct rte_eth_fdir_flow_ext flow_ext;
> /**< Additional fields to match */
> };
> 
> Thanks
> Jingjing

The current rte_eth_XX_flow only allow matching _one_ of the flows
because of the union.  In contrast, rte_eth_pkt_filter_flow can match
several flows at the same time; i.e. it can match ethernet, vlan, ip,
and tcp/udp all at the same time.  rte_eth_pkt_filter_flow is basically
a superset containing several flows that can be matched at the same
time.

In our Chelsio T5 hardware, it's possible to have several flows that can
be matched in a single rule. This is why I've created a superset that
can match several flows in the same rule.

Thanks,
Rahul


[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-13 Thread Xie, Huawei
On 1/13/2016 3:33 PM, Pavel Fedin wrote:
> Malfunctioning virtio clients may not send VHOST_USER_SET_MEM_TABLE for
> some reason. This causes NULL dereference in qva_to_vva().

Do you have examples for the malfunctioning clients? If so, could you
list them in the commit message?

>
> Signed-off-by: Pavel Fedin 
> Acked-by: Yuanhan Liu 
> ---
>  lib/librte_vhost/virtio-net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/lib/librte_vhost/virtio-net.c b/lib/librte_vhost/virtio-net.c
> index 0ba5045..3e7cec0 100644
> --- a/lib/librte_vhost/virtio-net.c
> +++ b/lib/librte_vhost/virtio-net.c
> @@ -630,7 +630,7 @@ set_vring_addr(struct vhost_device_ctx ctx, struct 
> vhost_vring_addr *addr)
>   struct vhost_virtqueue *vq;
>  
>   dev = get_device(ctx);
> - if (dev == NULL)
> + if ((dev == NULL) || (dev->mem == NULL))
>   return -1;
>  
>   /* addr->index refers to the queue index. The txq 1, rxq is 0. */



[dpdk-dev] [PATCH v2 7/7] virtio: add 1.0 support

2016-01-13 Thread Yuanhan Liu
On Wed, Jan 13, 2016 at 12:31:43PM +0900, Tetsuya Mukawa wrote:
> On 2016/01/12 15:59, Yuanhan Liu wrote:
> > +static int
> > +virtio_read_caps(struct rte_pci_device *dev, struct virtio_hw *hw)
> > +{
> > +   uint8_t pos;
> > +   struct virtio_pci_cap cap;
> > +   int ret;
> > +
> > +   if (rte_eal_pci_map_device(dev) < 0) {
> > +   PMD_INIT_LOG(DEBUG, "failed to map pci device!");
> > +   return -1;
> > +   }
> > +
> 
> Do you need to call rte_eal_pci_unmap_device() in somewhere in this file?

Yes, we should. I will see where I can find a proper place for it in
next version; eth_virtio_dev_uninit sounds like a good option.

> Anyway, I've reviewed and tested your all patches.
> And it seems except for it, I guess your patches are good.

Thank you for reviewing and testing it!

--yliu


[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-13 Thread Pavel Fedin
 Hello!

> Do you have examples for the malfunctioning clients? If so, could you
> list them in the commit message?

 The only malfunctioning client was DPDK itself, with virtio for container RFC 
applied. The client-side problem has been fixed
afterwards by http://dpdk.org/ml/archives/dev/2016-January/031169.html. See the 
RFC discussion thread.

Kind regards,
Pavel Fedin
Senior Engineer
Samsung Electronics Research center Russia




[dpdk-dev] Getting error while running DPDK test app on X-Gene1

2016-01-13 Thread Ankit Jindal
Hi,

We are trying to run dpdk on our arm64 based SOC having Intel 10G
ixgbe PCIe card plugged. While running any test app, we are getting
following error.

EAL: PCI device :01:00.0 on NUMA socket 0
EAL:   probe driver: 8086:10fb rte_ixgbe_pmd
EAL: Cannot open /sys/bus/pci/devices/:01:00.0/resource0: No such
file or directory
EAL: Error - exiting with code: 1
  Cause: Requested device :01:00.0 cannot be used

Below are the details on modules, hugepages and device binding.
root at arm64:~# lsmod
Module  Size  Used by
rte_kni   292795  0
igb_uio 4338  0
ixgbe 184456  0

root at arm64:~/dpdk# cat /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
2048

root at arm64:~/dpdk# ./tools/dpdk_nic_bind.py --status

Network devices using DPDK-compatible driver

:01:00.0 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
drv=igb_uio unused=
:01:00.1 '82599ES 10-Gigabit SFI/SFP+ Network Connection'
drv=igb_uio unused=

Network devices using kernel driver
===


Other network devices
=

root at arm64:~/dpdk#

Thanks,
Ankit


[dpdk-dev] patchwork 9704-9710 compilation error

2016-01-13 Thread Jan Viktorin
Hello Yongjie,

I know about this error. It does not work on some GCC versions...
However, this patch is not intended to be merged upstream. It just
shows that the infra works in the RFC series.

Anyway, thank you for your notification.

Regards
Jan

On Wed, 13 Jan 2016 10:17:37 +
"Gu, YongjieX"  wrote:

> Hi,Viktorin:
>   I tested your patch 
> 9704-9710(dpdk-dev-RFC-1-7-eal-common-define-rte_soc_-related-common-interface.patch
>  etc.) on patchwork,and met an error,
> Error info as below,can you help to check.
> 
> 
> OS: fedora20
> 
> Nic: Niantic
> 
> Gcc: 4.8.3
> Kernel: 3.11.10-301.fc20.x86_64
> Target: x86_64-native-linuxapp-gcc
> Baseline commit: 3b60ce8cbb959d7a6839f94ad995a3594c07801e
> 
> Error info:
> 
> cc1: error: -Werror=incompatible-pointer-types: no option 
> -Wincompatible-pointer-types CC test_pmd_perf.o
> 
> make[5]: *** [test_soc.o] Error 1
> 
> Thanks
> Yongjie



-- 
  Jan ViktorinE-mail: Viktorin at RehiveTech.com
  System ArchitectWeb:www.RehiveTech.com
  RehiveTech
  Brno, Czech Republic


[dpdk-dev] rte_prefetch0() is effective?

2016-01-13 Thread Bruce Richardson
On Thu, Dec 24, 2015 at 03:35:14PM +0900, Moon-Sang Lee wrote:
> I see codes as below in example directory, and I wonder it is effective.
> Coherent IO is adopted to modern architectures,
> so I think that DMA initiation by rte_eth_rx_burst() might already fulfills
> cache lines of RX buffers.
> Do I really need to call rte_prefetchX()?
> 
> nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst,
> MAX_PKT_BURST);
> ...
> /* Prefetch and forward already prefetched packets */
> for (j = 0; j < (nb_rx - PREFETCH_OFFSET); j++) {
> rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[
> j + PREFETCH_OFFSET], void *));
> l3fwd_simple_forward(pkts_burst[j], portid,
> qconf);
> }
> 

Good question.
When the first example apps using this style of prefetch were originally 
written,
yes, there was a noticable performance increase achieved by using the prefetch.
Thereafter, I'm not sure that anyone has checked with each generation of
platforms whether the prefetches are still necessary and how much they help, but
I suspect that they still help a bit, and don't hurt performance.
It would be an interesting exercise to check whether the prefetch offsets used
in code like above can be adjusted to give better performance on our latest
supported platforms.

/Bruce


[dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update

2016-01-13 Thread Bruce Richardson
On Tue, Jan 05, 2016 at 11:11:24AM +, Hanoch Haim (hhaim) wrote:
> Hi Oliver, 
> Thank you for the fast response and it would be great to open a discussion on 
> that.
> In general our project can leverage your optimization and I think it is great 
> (we should have thought about it) . We can use it using the workaround I 
> described.
> However, for me it  seems odd that  rte_pktmbuf_attach () that does not 
> *change* anything in m_const, except of the *atomic* ref counter does not 
> work in parallel.
> The example I gave is a classic use case of rte_pktmbuf_attach  (multicast ) 
> and I don't see why it wouldn't work after your optimization. 
> 
> Do you have a pointer to the documentation that state that that you can't 
> call the atomic ref counter from more than one thread?
> 
Hi,

actually, the issue is not that you can't work with the reference counter field
from multiple threads, or that you can't use an mbuf from multiple threads,
it's that if you are working with the same mbuf in multiple threads you have
multiple references to the mbuf and your application must increase the reference
counter appropriately. For example, if thread A is going to pass an mbuf to
thread B and keep using it itself, you must increment the reference counter
in thread A before enqueuing it to B.

Regards,
/Bruce



[dpdk-dev] [PATCH] pci: Add the class_id support in pci probe

2016-01-13 Thread Bruce Richardson
On Thu, Dec 31, 2015 at 09:12:14AM -0800, Stephen Hemminger wrote:
> On Tue, 29 Dec 2015 10:53:26 +0800
> Ziye Yang  wrote:
> 
> > This patch is used to add the class_id support
> > for pci_probe since some devices need the class_info
> > (class_code, subclass_code, programming_interface)
> > 
> > Signed-off-by: Ziye Yang 
> 
> Since rte_pci is exposed to application this breaks the ABI.

But applications are not going to be defining rte_pci_ids values internally, are
they? That is for drivers to use. Is this really an ABI breakage for 
applications that we
need to be concerned about?

/Bruce


[dpdk-dev] [PATCH v2 0/4] fix the issue that DPDK takes over virtio device blindly

2016-01-13 Thread Santosh Shukla
On Tue, Jan 12, 2016 at 10:46 AM, Xie, Huawei  wrote:
> On 1/12/2016 12:24 PM, Santosh Shukla wrote:
>> On Tue, Jan 12, 2016 at 8:32 AM, Xie, Huawei  wrote:
>>> On 1/5/2016 1:25 AM, Stephen Hemminger wrote:
 On Mon,  4 Jan 2016 01:56:09 +0800
 Huawei Xie  wrote:

> v2 changes:
>  Remove unnecessary assignment of NULL to dev->data->mac_addrs
>  Ajust one comment's position
>  change LOG level from ERR to INFO
>
> virtio PMD doesn't set RTE_PCI_DRV_NEED_MAPPING in drv_flags of its
> eth_driver. It will try igb_uio and PORT IO in turn to configure
> virtio device. Even user in guest VM doesn't want to use virtio for
> DPDK, virtio PMD will take over the device blindly.
>
> The more serious problem is kernel driver is still manipulating the
> device, which causes driver conflict.
>
> This patch checks if there is any kernel driver manipulating the
> virtio device before virtio PMD uses port IO to configure the device.
>
> Huawei Xie (4):
>   eal: make the comment more accurate
>   eal: set kdrv to RTE_KDRV_NONE if kernel driver isn't manipulating the 
> device.
>   virtio: return 1 to tell the kernel we don't take over this device
>   virtio: check if any kernel driver is manipulating the virtio device
>
>  drivers/net/virtio/virtio_ethdev.c | 16 ++--
>  lib/librte_eal/common/eal_common_pci.c |  8 
>  lib/librte_eal/linuxapp/eal/eal_pci.c  |  2 +-
>  3 files changed, 19 insertions(+), 7 deletions(-)
>
 Overall looks good, thanks for addressing this.

 It would be good to note that VFIO no-IOMMU mode should work for this
 as well.
>>> It isn't implemented yet in virtio PMD. I could add a note in the commit
>>> message. Do you plan to implement this?
>>>
>> I can send vfio-noiommu patches for this one, as I am looking at
>> vfio-noiommu for virtio  for my arm v4 patch series. Stephen, let me
>> know if you already started working on this?
>>
>> Also for some reason I can't find [3/4] patch, could you point me to
>> patch link? Thanks.
> Thanks. Here is the patch: http://www.dpdk.org/dev/patchwork/patch/9720/
>

Thanks, I am rebasing my v4 series on top of this patch, although I
couldn't see current series creating issue for vfio-noiommu case.
We'll post feedback in v4 cover-letter.


[dpdk-dev] [PATCH v3 09/12] virtio: vfio: Enable RTE_PCI_DRV_NEED_MAPPING flag in driver

2016-01-13 Thread Santosh Shukla
On Tue, Jan 12, 2016 at 12:44 PM, Yuanhan Liu
 wrote:
> On Sat, Jan 09, 2016 at 06:08:46PM +0530, Santosh Shukla wrote:
>> On Thu, Jan 7, 2016 at 11:50 PM, Stephen Hemminger
>>  wrote:
>> > On Thu,  7 Jan 2016 22:03:06 +0530
>> > Santosh Shukla  wrote:
>> >
>> >> +#ifdef RTE_EAL_VFIO
>> >> + .drv_flags = RTE_PCI_DRV_NEED_MAPPING | 
>> >> RTE_PCI_DRV_DETACHABLE,
>> >> +#else
>> >>   .drv_flags = RTE_PCI_DRV_DETACHABLE,
>> >> +#endif
>> >
>> > Since VFIO is determined at runtime not compile time, the flags should
>> > be updated at runtime not compile time.
>> >
>> >
>> In general, Yes, Its a wrong approach i..e. Wrapping __need_mapping
>> flag only for vfio case. I am thinking to add vfio parser routine
>> something similar to virtio_xxx_xx_uio_xx() / virtio_xx_xx_ioport()
>> currently exist. This will remove RTE_EAL_VFIO ifdef clutter for this
>> patch and [08/12] patch and also virtio pmd driver can then initialize
>> device for vfio mode..
>>
>> _but_ I still need _MAPPING flag enabled for in virtio driver as
>> because for vfio case - I want vfio_xx_mmap() routine to create vfio
>> container/group_id and then create vfio_dev_fd for each virtio-net-pci
>> interface.
>
> I'm thinking my following patch will help:
>
> http://dpdk.org/dev/patchwork/patch/9814/
>

Yes, It works, so wont need NEED_MAPPING flag, Sending v4 patch series
rebased on this patch..

> --yliu
>
>> Let me know my approach aligned to your suggestion.


[dpdk-dev] [PATCH] pci: Add the class_id support in pci probe

2016-01-13 Thread Panu Matilainen
On 01/13/2016 01:55 PM, Bruce Richardson wrote:
> On Thu, Dec 31, 2015 at 09:12:14AM -0800, Stephen Hemminger wrote:
>> On Tue, 29 Dec 2015 10:53:26 +0800
>> Ziye Yang  wrote:
>>
>>> This patch is used to add the class_id support
>>> for pci_probe since some devices need the class_info
>>> (class_code, subclass_code, programming_interface)
>>>
>>> Signed-off-by: Ziye Yang 
>>
>> Since rte_pci is exposed to application this breaks the ABI.
>
> But applications are not going to be defining rte_pci_ids values internally, 
> are
> they? That is for drivers to use. Is this really an ABI breakage for 
> applications that we
> need to be concerned about?

There might not be applications using it but drivers are ABI consumers 
too - think of 3rd party drivers and such.

- Panu -



[dpdk-dev] [PATCH 0/2] i40evf: support interrupt based pf reset request

2016-01-13 Thread Jingjing Wu
If DPDK is used on VF while the host is using Linux Kernel driver
as PF driver on FVL NIC, some setting on PF will trigger VF reset.
DPDK VF need to know the event.
This patch set makes the interrupt based request of PF reset from PF
supported by enabling the adminq event process in VF driver.
Users can register a callback for this interrupt event to get
informed, when a PF reset request detected like:
rte_eth_dev_callback_register(portid,
RTE_ETH_EVENT_INTR_RESET,
reset_event_callback,
arg);

Jingjing Wu (2):
  i40evf: allocate virtchnl cmd buffer for each vf
  i40evf: support interrupt based pf reset request

 drivers/net/i40e/i40e_ethdev.h|   2 +
 drivers/net/i40e/i40e_ethdev_vf.c | 425 +++---
 lib/librte_ether/rte_ethdev.h |   1 +
 3 files changed, 304 insertions(+), 124 deletions(-)

-- 
2.4.0



[dpdk-dev] [PATCH 1/2] i40evf: allocate virtchnl cmd buffer for each vf

2016-01-13 Thread Jingjing Wu
Currently, i40evf PMD uses a global static buffer to send virtchnl
command to host driver. It is shared by multi VFs.
This patch changed to allocate virtchnl cmd buffer for each VF.

Signed-off-by: Jingjing Wu 
---
 drivers/net/i40e/i40e_ethdev.h|   2 +
 drivers/net/i40e/i40e_ethdev_vf.c | 183 +++---
 2 files changed, 75 insertions(+), 110 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev.h b/drivers/net/i40e/i40e_ethdev.h
index 1f9792b..93122ad 100644
--- a/drivers/net/i40e/i40e_ethdev.h
+++ b/drivers/net/i40e/i40e_ethdev.h
@@ -494,7 +494,9 @@ struct i40e_vf {
bool link_up;
bool vf_reset;
volatile uint32_t pend_cmd; /* pending command not finished yet */
+   uint32_t cmd_retval; /* return value of the cmd response from PF */
u16 pend_msg; /* flags indicates events from pf not handled yet */
+   uint8_t *aq_resp; /* buffer to store the adminq response from PF */

/* VSI info */
struct i40e_virtchnl_vf_resource *vf_res; /* All VSIs */
diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index 14d2a50..ebcd881 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -103,9 +103,6 @@ enum i40evf_aq_result {
I40EVF_MSG_CMD,  /* Read async command result */
 };

-/* A share buffer to store the command result from PF driver */
-static uint8_t cmd_result_buffer[I40E_AQ_BUF_SZ];
-
 static int i40evf_dev_configure(struct rte_eth_dev *dev);
 static int i40evf_dev_start(struct rte_eth_dev *dev);
 static void i40evf_dev_stop(struct rte_eth_dev *dev);
@@ -237,31 +234,39 @@ i40evf_set_mac_type(struct i40e_hw *hw)
 }

 /*
- * Parse admin queue message.
- *
- * return value:
- *  < 0: meet error
- *  0: read sys msg
- *  > 0: read cmd result
+ * Read data in admin queue to get msg from pf driver
  */
 static enum i40evf_aq_result
-i40evf_parse_pfmsg(struct i40e_vf *vf,
-  struct i40e_arq_event_info *event,
-  struct i40evf_arq_msg_info *data)
+i40evf_read_pfmsg(struct rte_eth_dev *dev, struct i40evf_arq_msg_info *data)
 {
-   enum i40e_virtchnl_ops opcode = (enum i40e_virtchnl_ops)\
-   rte_le_to_cpu_32(event->desc.cookie_high);
-   enum i40e_status_code retval = (enum i40e_status_code)\
-   rte_le_to_cpu_32(event->desc.cookie_low);
-   enum i40evf_aq_result ret = I40EVF_MSG_CMD;
+   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
+   struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
+   struct i40e_arq_event_info event;
+   enum i40e_virtchnl_ops opcode;
+   enum i40e_status_code retval;
+   int ret;
+   enum i40evf_aq_result result = I40EVF_MSG_NON;

+   event.buf_len = data->buf_len;
+   event.msg_buf = data->msg;
+   ret = i40e_clean_arq_element(hw, &event, NULL);
+   /* Can't read any msg from adminQ */
+   if (ret) {
+   if (ret == I40E_ERR_ADMIN_QUEUE_NO_WORK)
+   result = I40EVF_MSG_NON;
+   else
+   result = I40EVF_MSG_ERR;
+   return result;
+   }
+
+   opcode = (enum 
i40e_virtchnl_ops)rte_le_to_cpu_32(event.desc.cookie_high);
+   retval = (enum i40e_status_code)rte_le_to_cpu_32(event.desc.cookie_low);
/* pf sys event */
if (opcode == I40E_VIRTCHNL_OP_EVENT) {
struct i40e_virtchnl_pf_event *vpe =
-   (struct i40e_virtchnl_pf_event *)event->msg_buf;
+   (struct i40e_virtchnl_pf_event *)event.msg_buf;

-   /* Initialize ret to sys event */
-   ret = I40EVF_MSG_SYS;
+   result = I40EVF_MSG_SYS;
switch (vpe->event) {
case I40E_VIRTCHNL_EVENT_LINK_CHANGE:
vf->link_up =
@@ -286,74 +291,17 @@ i40evf_parse_pfmsg(struct i40e_vf *vf,
}
} else {
/* async reply msg on command issued by vf previously */
-   ret = I40EVF_MSG_CMD;
+   result = I40EVF_MSG_CMD;
/* Actual data length read from PF */
-   data->msg_len = event->msg_len;
+   data->msg_len = event.msg_len;
}
-   /* fill the ops and result to notify VF */
+
data->result = retval;
data->ops = opcode;

-   return ret;
-}
-
-/*
- * Read data in admin queue to get msg from pf driver
- */
-static enum i40evf_aq_result
-i40evf_read_pfmsg(struct rte_eth_dev *dev, struct i40evf_arq_msg_info *data)
-{
-   struct i40e_hw *hw = I40E_DEV_PRIVATE_TO_HW(dev->data->dev_private);
-   struct i40e_vf *vf = I40EVF_DEV_PRIVATE_TO_VF(dev->data->dev_private);
-   struct i40e_arq_event_info event;
-   int ret;
-   enum i40evf_aq_result result = I40EVF_MSG_NON;
-
-   event.buf_len = data->buf_len;
-   event.msg_buf = data->msg;
-   ret =

[dpdk-dev] [PATCH 2/2] i40evf: support interrupt based pf reset request

2016-01-13 Thread Jingjing Wu
Interrupt based request of PF reset from PF is supported by
enabling the adminq event process in VF driver.
Users can register a callback for this interrupt event to get
informed, when a PF reset request detected like:
  rte_eth_dev_callback_register(portid,
RTE_ETH_EVENT_INTR_RESET,
reset_event_callback,
arg);

Signed-off-by: Jingjing Wu 
---
 drivers/net/i40e/i40e_ethdev_vf.c | 274 +-
 lib/librte_ether/rte_ethdev.h |   1 +
 2 files changed, 245 insertions(+), 30 deletions(-)

diff --git a/drivers/net/i40e/i40e_ethdev_vf.c 
b/drivers/net/i40e/i40e_ethdev_vf.c
index ebcd881..cfeee77 100644
--- a/drivers/net/i40e/i40e_ethdev_vf.c
+++ b/drivers/net/i40e/i40e_ethdev_vf.c
@@ -74,8 +74,6 @@
 #define I40EVF_BUSY_WAIT_DELAY 10
 #define I40EVF_BUSY_WAIT_COUNT 50
 #define MAX_RESET_WAIT_CNT 20
-/*ITR index for NOITR*/
-#define I40E_QINT_RQCTL_MSIX_INDX_NOITR 3

 struct i40evf_arq_msg_info {
enum i40e_virtchnl_ops ops;
@@ -151,6 +149,9 @@ static int
 i40evf_dev_rx_queue_intr_enable(struct rte_eth_dev *dev, uint16_t queue_id);
 static int
 i40evf_dev_rx_queue_intr_disable(struct rte_eth_dev *dev, uint16_t queue_id);
+static void i40evf_handle_pf_event(__rte_unused struct rte_eth_dev *dev,
+  uint8_t *msg,
+  uint16_t msglen);

 /* Default hash key buffer for RSS */
 static uint32_t rss_key_default[I40E_VFQF_HKEY_MAX_INDEX + 1];
@@ -357,20 +358,42 @@ i40evf_execute_vf_cmd(struct rte_eth_dev *dev, struct 
vf_cmd_info *args)
return err;
}

-   do {
-   /* Delay some time first */
-   rte_delay_ms(ASQ_DELAY_MS);
-   ret = i40evf_read_pfmsg(dev, &info);
-   if (ret == I40EVF_MSG_CMD) {
-   err = 0;
-   break;
-   } else if (ret == I40EVF_MSG_ERR) {
-   err = -1;
-   break;
-   }
-   /* If don't read msg or read sys event, continue */
-   } while (i++ < MAX_TRY_TIMES);
-   _clear_cmd(vf);
+   switch (args->ops) {
+   case I40E_VIRTCHNL_OP_RESET_VF:
+   /*no need to process in this function */
+   break;
+   case I40E_VIRTCHNL_OP_VERSION:
+   case I40E_VIRTCHNL_OP_GET_VF_RESOURCES:
+   /* for init adminq commands, need to poll the response */
+   do {
+   /* Delay some time first */
+   rte_delay_ms(ASQ_DELAY_MS);
+   ret = i40evf_read_pfmsg(dev, &info);
+   if (ret == I40EVF_MSG_CMD) {
+   err = 0;
+   break;
+   } else if (ret == I40EVF_MSG_ERR) {
+   err = -1;
+   break;
+   }
+   /* If don't read msg or read sys event, continue */
+   } while (i++ < MAX_TRY_TIMES);
+   _clear_cmd(vf);
+   break;
+
+   default:
+   /* for other adminq in running time, waiting the cmd done flag 
*/
+   do {
+   /* Delay some time first */
+   rte_delay_ms(ASQ_DELAY_MS);
+   if (vf->pend_cmd == I40E_VIRTCHNL_OP_UNKNOWN) {
+   err = 0;
+   break;
+   }
+   /* If don't read msg or read sys event, continue */
+   } while (i++ < MAX_TRY_TIMES);
+   break;
+   }

return (err | vf->cmd_retval);
 }
@@ -719,7 +742,7 @@ i40evf_config_irq_map(struct rte_eth_dev *dev)

map_info = (struct i40e_virtchnl_irq_map_info *)cmd_buffer;
map_info->num_vectors = 1;
-   map_info->vecmap[0].rxitr_idx = I40E_QINT_RQCTL_MSIX_INDX_NOITR;
+   map_info->vecmap[0].rxitr_idx = I40E_ITR_INDEX_DEFAULT;
map_info->vecmap[0].vsi_id = vf->vsi_res->vsi_id;
/* Alway use default dynamic MSIX interrupt */
map_info->vecmap[0].vector_id = vector_id;
@@ -1093,6 +1116,38 @@ i40evf_dev_atomic_write_link_status(struct rte_eth_dev 
*dev,
return 0;
 }

+/* Disable IRQ0 */
+static inline void
+i40evf_disable_irq0(struct i40e_hw *hw)
+{
+   /* Disable all interrupt types */
+   I40E_WRITE_REG(hw, I40E_VFINT_ICR0_ENA1, 0);
+   I40E_WRITE_REG(hw, I40E_VFINT_DYN_CTL01,
+  I40E_VFINT_DYN_CTL01_ITR_INDX_MASK);
+   I40EVF_WRITE_FLUSH(hw);
+}
+
+/* Enable IRQ0 */
+static inline void
+i40evf_enable_irq0(struct i40e_hw *hw)
+{
+   /* Enable admin queue interrupt trigger */
+   uint32_t val;
+
+   i40evf_disable_irq0(hw);
+   val = I40E_READ_REG(hw, I40E_VFINT_ICR0_ENA1);
+   val |= I40E_VFINT_ICR0_ENA1_ADMINQ_MASK |
+   I40E_VFINT_ICR0_ENA1_LINK_STAT_CHANGE_MASK;
+  

[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode

2016-01-13 Thread Anatoly Burakov
This commit is adding a generic mechanism to support multiple IOMMU
types. For now, it's only type 1 (x86 IOMMU) and no-IOMMU (a special
VFIO mode that doesn't use IOMMU at all), but it's easily extended
by adding necessary definitions into eal_pci_init.h and a DMA
mapping function to eal_pci_vfio_dma.c.

Since type 1 IOMMU module is no longer necessary to have VFIO,
we fix the module check to check for vfio-pci instead. It's not
ideal and triggers VFIO checks more often (and thus produces more
error output, which was the reason behind the module check in the
first place), so we compensate for that by providing more verbose
logging, indicating whether VFIO initialization has succeeded or
failed.

Signed-off-by: Anatoly Burakov 
Signed-off-by: Santosh Shukla 
Tested-by: Santosh Shukla 
---
v2 changes:
  Compile fix (hat-tip to Santosh Shukla)
  Tested-by is provisional, since only superficial testing was done
---
 lib/librte_eal/linuxapp/eal/Makefile   |   1 +
 lib/librte_eal/linuxapp/eal/eal_pci_init.h |  22 
 lib/librte_eal/linuxapp/eal/eal_pci_vfio.c | 143 -
 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c |  84 +++
 lib/librte_eal/linuxapp/eal/eal_vfio.h |   5 +
 5 files changed, 202 insertions(+), 53 deletions(-)
 create mode 100644 lib/librte_eal/linuxapp/eal/eal_pci_vfio_dma.c

diff --git a/lib/librte_eal/linuxapp/eal/Makefile 
b/lib/librte_eal/linuxapp/eal/Makefile
index 26eced5..5c9e9d9 100644
--- a/lib/librte_eal/linuxapp/eal/Makefile
+++ b/lib/librte_eal/linuxapp/eal/Makefile
@@ -59,6 +59,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_log.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_uio.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio.c
+SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_dma.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_pci_vfio_mp_sync.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_debug.c
 SRCS-$(CONFIG_RTE_LIBRTE_EAL_LINUXAPP) += eal_lcore.c
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_init.h 
b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
index a17c708..da1c431 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_init.h
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_init.h
@@ -106,6 +106,28 @@ struct vfio_config {
struct vfio_group vfio_groups[VFIO_MAX_GROUPS];
 };

+/* function pointer typedef for DMA mapping functions */
+typedef  int (*vfio_dma_func_t)(int);
+
+/* Structure to hold supported IOMMU types */
+struct vfio_iommu_type {
+   int type_id;
+   const char *name;
+   vfio_dma_func_t dma_map_func;
+};
+
+/* function prototypes for different IOMMU types */
+int vfio_iommu_type1_dma_map(int container_fd);
+int vfio_iommu_noiommu_dma_map(int container_fd);
+
+/* IOMMU types we support */
+static const struct vfio_iommu_type iommu_types[] = {
+   /* x86 IOMMU, otherwise known as type 1 */
+   { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
+   /* IOMMU-less mode */
+   { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
+};
+
 #endif

 #endif /* EAL_PCI_INIT_H_ */
diff --git a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c 
b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
index 74f91ba..5eb6cd0 100644
--- a/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
+++ b/lib/librte_eal/linuxapp/eal/eal_pci_vfio.c
@@ -72,6 +72,7 @@ EAL_REGISTER_TAILQ(rte_vfio_tailq)
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
+#define VFIO_NOIOMMU_GROUP_FMT "/dev/vfio/noiommu-%u"
 #define VFIO_GET_REGION_ADDR(x) ((uint64_t) x << 40ULL)

 /* per-process VFIO config */
@@ -208,42 +209,58 @@ pci_vfio_set_bus_master(int dev_fd)
return 0;
 }

-/* set up DMA mappings */
-static int
-pci_vfio_setup_dma_maps(int vfio_container_fd)
-{
-   const struct rte_memseg *ms = rte_eal_get_physmem_layout();
-   int i, ret;
-
-   ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
-   VFIO_TYPE1_IOMMU);
-   if (ret) {
-   RTE_LOG(ERR, EAL, "  cannot set IOMMU type, "
-   "error %i (%s)\n", errno, strerror(errno));
-   return -1;
+/* pick IOMMU type. returns a pointer to vfio_iommu_type or NULL for error */
+static const struct vfio_iommu_type *
+pci_vfio_set_iommu_type(int vfio_container_fd) {
+   unsigned idx;
+   for (idx = 0; idx < RTE_DIM(iommu_types); idx++) {
+   const struct vfio_iommu_type *t = &iommu_types[idx];
+
+   int ret = ioctl(vfio_container_fd, VFIO_SET_IOMMU,
+   t->type_id);
+   if (!ret) {
+   RTE_LOG(NOTICE, EAL, "  using IOMMU type %d (%s)\n",
+   t->type_id, t->name);
+   return t;
+   }
+   /* not an error, there may be more supported IOMMU types */
+ 

[dpdk-dev] [RESEND PATCH] vhost_user: Make sure that memory map is set before attempting address translation

2016-01-13 Thread Xie, Huawei
On 1/13/2016 5:40 PM, Pavel Fedin wrote:
>  Hello!
>
>> Do you have examples for the malfunctioning clients? If so, could you
>> list them in the commit message?
>  The only malfunctioning client was DPDK itself, with virtio for container 
> RFC applied. The client-side problem has been fixed
> afterwards by http://dpdk.org/ml/archives/dev/2016-January/031169.html. See 
> the RFC discussion thread.

If this is the case, i am wondering whether we should include
"malfunctioning clients" in commit message. It triggers me to think if
there are existing buggy implementations.

Anyway, check is OK.

> Kind regards,
> Pavel Fedin
> Senior Engineer
> Samsung Electronics Research center Russia
>
>
>



[dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new behavior switch to fdir

2016-01-13 Thread Wu, Jingjing
Hi, Rahul

> -Original Message-
> From: Rahul Lakkireddy [mailto:rahul.lakkireddy at chelsio.com]
> Sent: Wednesday, January 13, 2016 4:49 PM
> To: Wu, Jingjing
> Cc: dev at dpdk.org; Felix Marti; Kumar A S; Nirranjan Kirubaharan
> Subject: Re: [dpdk-dev] [RFC v2 1/2] ethdev: add packet filter flow and new 
> behavior
> switch to fdir
> 
> Hi Jingjing,
> 
> On Tuesday, January 01/12/16, 2016 at 17:12:47 -0800, Wu, Jingjing wrote:
> > >
> > > diff --git a/lib/librte_ether/rte_eth_ctrl.h 
> > > b/lib/librte_ether/rte_eth_ctrl.h
> > > index ce224ad..5cc22a0 100644
> > > --- a/lib/librte_ether/rte_eth_ctrl.h
> > > +++ b/lib/librte_ether/rte_eth_ctrl.h
> > > @@ -74,7 +74,11 @@ extern "C" {
> > >  #define RTE_ETH_FLOW_IPV6_EX15
> > >  #define RTE_ETH_FLOW_IPV6_TCP_EX16
> > >  #define RTE_ETH_FLOW_IPV6_UDP_EX17
> > > -#define RTE_ETH_FLOW_MAX18
> > > +#define RTE_ETH_FLOW_PKT_FILTER_IPV4_TCP 18 #define
> > > +RTE_ETH_FLOW_PKT_FILTER_IPV4_UDP 19 #define
> > > +RTE_ETH_FLOW_PKT_FILTER_IPV6_TCP 20 #define
> > > +RTE_ETH_FLOW_PKT_FILTER_IPV6_UDP 21
> > > +#define RTE_ETH_FLOW_MAX22
> > >
> > How to distinguish RTE_ETH_FLOW_PKT_FILTER_IPV4_XX with
> RTE_ETH_FLOW_NONFRAG_IPV4_XX, what is the difference?
> 
> The packet filter flow is basically a superset containing Ethernet,
> vlan, ipv4/ipv6 and tcp/udp flows whose fields can all be matched at
> the same time, unlike in case of the current flow director which seems
> to match only one of the flows at any given time.  Additionally, it also
> allows specifying masks.  I separated the two to make this meaning
> explicit.  If this is not necessary, then I will merge them.
Thanks for clarification, now I understand. How about just define one to 
indicate using pkt_filter?
And move the IPV4_XX info to the structure rte_eth_pkt_filter?
> > There is also a patch http://dpdk.org/dev/patchwork/patch/9661/ which added 
> > these
> fields. Maybe we can merge them together.
> 
> Yes, we can merge them.  Would you like me to merge your patch here?
The i40e driver implementation is done based on the change. If you'd like to 
merge,
maybe other patches in the patch set also need to be merged. Anyway, I think
maintainer can deal with it. 

> 
> The current rte_eth_XX_flow only allow matching _one_ of the flows
> because of the union.  In contrast, rte_eth_pkt_filter_flow can match
> several flows at the same time; i.e. it can match ethernet, vlan, ip,
> and tcp/udp all at the same time.  rte_eth_pkt_filter_flow is basically
> a superset containing several flows that can be matched at the same
> time.
> 
> In our Chelsio T5 hardware, it's possible to have several flows that can
> be matched in a single rule. This is why I've created a superset that
> can match several flows in the same rule.
> 

> Thanks,
> Rahul

Thanks for clarification, it's clear. And it's great to have this feature.
Even it is a superset containing several flows, we still can use the
existing structs like
struct rte_eth_ipv4_flow {
struct rte_eth_l2_flow ether;
struct rte_eth_vlan_flow vlan;
uint32_t src_ip;
uint32_t dst_ip;
};
What do you think?

Thanks
Jingjing


[dpdk-dev] [PATCH 03/14] eal/common: introduce union rte_device and related

2016-01-13 Thread Bruce Richardson
On Mon, Jan 04, 2016 at 09:08:15PM +0100, Jan Viktorin wrote:
> The union rte_device can be used in situations where we want to work with all
> devices without distinguishing among bus-specific features (PCI, ...).
> The target device type can be detected by reading the magic.
> 
> Also, the macros RTE_DEVICE_DECL and RTE_DEVICE_PTR_DECL are introduced to
> provide a generic way to declare a device or a pointer to a device. The macros
> aim to preserve API backwards-compatibility. Eg.
> 
> struct old_super_struct { =>   struct old_super_struct {
> struct rte_pci_device *pci_dev;   =>RTE_DEVICE_PTR_DECL(pci_dev);
> ...   =>   ...
> };=>   };
> 
> struct old_super_struct inst;
> 
> The new code should reference inst.dev.pci, the old code can still use the
> inst.pci_dev. The previously introduced magic is included so one can ask the
> instance about its type:
> 
> if (inst.dev.magic == RTE_PCI_DEVICE_MAGIC) {
> ...
> }

Rather than magic numbers i.e. #defines, an enum might be better.

/Bruce



[dpdk-dev] [PATCH 00/14] Step towards PCI independency

2016-01-13 Thread David Marchand
Hello Jan,

On Mon, Jan 11, 2016 at 6:29 PM, Jan Viktorin  
wrote:
> Hello David,
>
> did you find time to see the patchset? I am working on a PMD on top of
> these so I'd be glad to base on the code close to the (potentially)
> upstreamed one.

I took a quick look but since we still have an abi breakage, I am
trying to put my ideas on paper and will share to you asap.


Regards,
-- 
David Marchand


[dpdk-dev] [PATCH 03/14] eal/common: introduce union rte_device and related

2016-01-13 Thread Jan Viktorin
On Wed, 13 Jan 2016 14:01:19 +
Bruce Richardson  wrote:

> On Mon, Jan 04, 2016 at 09:08:15PM +0100, Jan Viktorin wrote:
> > The union rte_device can be used in situations where we want to work with 
> > all
> > devices without distinguishing among bus-specific features (PCI, ...).
> > The target device type can be detected by reading the magic.
> > 
> > Also, the macros RTE_DEVICE_DECL and RTE_DEVICE_PTR_DECL are introduced to
> > provide a generic way to declare a device or a pointer to a device. The 
> > macros
> > aim to preserve API backwards-compatibility. Eg.
> > 
> > struct old_super_struct { =>   struct old_super_struct {
> > struct rte_pci_device *pci_dev;   =>
> > RTE_DEVICE_PTR_DECL(pci_dev);
> > ...   =>   ...
> > };=>   };
> > 
> > struct old_super_struct inst;
> > 
> > The new code should reference inst.dev.pci, the old code can still use the
> > inst.pci_dev. The previously introduced magic is included so one can ask the
> > instance about its type:
> > 
> > if (inst.dev.magic == RTE_PCI_DEVICE_MAGIC) {
> > ...
> > }  
> 
> Rather than magic numbers i.e. #defines, an enum might be better.

True. However, would it be helpful to put really some _magic_ numbers
there for debugging purposes (to clearly recognize the data type)? Or,
is it sufficient to just say 1 for PCI, 2 for SoC, 3 for xxx...?

> 
> /Bruce
> 



-- 
   Jan Viktorin  E-mail: Viktorin at RehiveTech.com
   System Architect  Web:www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic


[dpdk-dev] [PATCH 03/14] eal/common: introduce union rte_device and related

2016-01-13 Thread Bruce Richardson
On Wed, Jan 13, 2016 at 03:12:27PM +0100, Jan Viktorin wrote:
> On Wed, 13 Jan 2016 14:01:19 +
> Bruce Richardson  wrote:
> 
> > On Mon, Jan 04, 2016 at 09:08:15PM +0100, Jan Viktorin wrote:
> > > The union rte_device can be used in situations where we want to work with 
> > > all
> > > devices without distinguishing among bus-specific features (PCI, ...).
> > > The target device type can be detected by reading the magic.
> > > 
> > > Also, the macros RTE_DEVICE_DECL and RTE_DEVICE_PTR_DECL are introduced to
> > > provide a generic way to declare a device or a pointer to a device. The 
> > > macros
> > > aim to preserve API backwards-compatibility. Eg.
> > > 
> > > struct old_super_struct { =>   struct old_super_struct {
> > > struct rte_pci_device *pci_dev;   =>
> > > RTE_DEVICE_PTR_DECL(pci_dev);
> > > ...   =>   ...
> > > };=>   };
> > > 
> > > struct old_super_struct inst;
> > > 
> > > The new code should reference inst.dev.pci, the old code can still use the
> > > inst.pci_dev. The previously introduced magic is included so one can ask 
> > > the
> > > instance about its type:
> > > 
> > > if (inst.dev.magic == RTE_PCI_DEVICE_MAGIC) {
> > > ...
> > > }  
> > 
> > Rather than magic numbers i.e. #defines, an enum might be better.
> 
> True. However, would it be helpful to put really some _magic_ numbers
> there for debugging purposes (to clearly recognize the data type)? Or,
> is it sufficient to just say 1 for PCI, 2 for SoC, 3 for xxx...?
> 
I'd find it hard to see the need for actual magic numbers. I think the magic 
field
should be renamed to "type" and the values taken from a device_type enum. Should
make the code more readable e.g.

if (inst.dev.type == RTE_DEVTYPE_PCI) { ... }

/Bruce


[dpdk-dev] [PATCH v2 6/7] eal: pci: export pci_map_device

2016-01-13 Thread Santosh Shukla
On Tue, Jan 12, 2016 at 2:35 PM, Yuanhan Liu
 wrote:
> On Tue, Jan 12, 2016 at 04:40:43PM +0800, Yuanhan Liu wrote:
>> On Tue, Jan 12, 2016 at 09:31:05AM +0100, David Marchand wrote:
>> > On Tue, Jan 12, 2016 at 7:59 AM, Yuanhan Liu > > linux.intel.com>
>> > wrote:
>> >
>> > Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will
>> > invoke pci_map_device internally for us. From that point view, there
>> > is no need to export pci_map_device.
>> >
>> > However, for virtio pmd driver, which is designed to work without
>> > binding UIO (or something similar first), pci_map_device() will fail,
>> > which ends up with virtio pmd driver being skipped. Therefore, we can
>> > not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver.
>> >
>> > Therefore, this patch exports pci_map_device, and let virtio pmd
>> > call it when necessary.
>> >
>> >
>> > Well, if you introduce map function, I suppose, for hotplug, you would need
>> > unmap.
>>
>> Good remind. Thanks. I will export pci_unmap_device as well.
>
> And here you go.
>
> --yliu
>
> -- >8 --
> From aa3d9d0fa827781d1563fd4c06ba04a8fafdc41c Mon Sep 17 00:00:00 2001
> From: Yuanhan Liu 
> Date: Mon, 11 Jan 2016 16:51:35 +0800
> Subject: [PATCH] eal: pci: export pci_[un]map_device
>
> Normally we could set RTE_PCI_DRV_NEED_MAPPING flag so that eal will
> invoke pci_map_device internally for us. From that point view, there
> is no need to export pci_map_device.
>
> However, for virtio pmd driver, which is designed to work without
> binding UIO (or something similar first), pci_map_device() will fail,
> which ends up with virtio pmd driver being skipped. Therefore, we can
> not set RTE_PCI_DRV_NEED_MAPPING blindly at virtio pmd driver.
>
> Therefore, this patch exports pci_map_device, and let virtio pmd
> call it when necessary.
>
> Signed-off-by: Yuanhan Liu 
> ---
> v2: - export pci_unmap_device as well
>
> - Add few more comments about rte_eal_pci_map_device().

This patch tested for vfio-noIOMMU mode for arm64 platform.
Tested-By: Santosh Shukla 


[dpdk-dev] [PATCH v3 06/12] eal: pci: vfio: add rd/wr func for pci bar space

2016-01-13 Thread Bruce Richardson
On Thu, Jan 07, 2016 at 10:19:25AM -0800, Stephen Hemminger wrote:
> On Thu,  7 Jan 2016 22:03:03 +0530
> Santosh Shukla  wrote:
> 
> >  
> > +int rte_eal_pci_read_bar(const struct rte_pci_device *device __rte_unused,
> > +void *buf __rte_unused,
> > +size_t len __rte_unused,
> > +off_t offset __rte_unused,
> > +int bar_idx __rte_unused)
> > +{
> > +#ifdef VFIO_PRESENT
> > +   const struct rte_intr_handle *intr_handle = &device->intr_handle;
> > +   return pci_vfio_read_bar(intr_handle, buf, len, offset, bar_idx);
> > +#else
> > +   return 0; /* UIO's not applicable */
> > +#endif
> > +}
> 
> It seems wrong to declare all the parameters as unused but then use them.
> Maybe there is a way to have a macro for USED(x) in the #else case

There is RTE_SET_USED() in rte_common.h.
I'd prefer that macro used than the option of duplicating the entire function
which really pads out the code.

/Bruce


[dpdk-dev] [PATCH 0/4] virtio support for container

2016-01-13 Thread Amit Tomer
Hello,

>
> You can use below patch for l2fwd to send out an arp packet when it gets
> started.

I tried to send out arp packet using this patch but buffer allocation
for arp packets itself gets failed:

 m = rte_pktmbuf_alloc(mp);

Return a NULL Value.

Thanks,
Amit.


[dpdk-dev] [PATCH] doc: coding style: use linux kernel style for indentation

2016-01-13 Thread Bruce Richardson
On Wed, Jan 13, 2016 at 03:58:49PM +0800, Yuanhan Liu wrote:
> Using two tabs for "if" (or "while") statements is a bit weird to me.
> Also, using one tab unconditionaly for function definitions and
> prototypes doesn't look great.
> 
> Here I'd suggest to use the indentation style the Linux kernel
> project prefers: to align with the open brace with tabs and
> additonal spaces (when necessary).
> 
> Example:
> 
>   static int
>   rte_eal_foo_bar(int a_long_argument_1, int another_long_argument_2,
>   struct foo *yet_another_long_argument_3)
> 
>   ret = rte_eal_foo_bar(a_long_argument_1, another_long_argument_2,
> yet_another_long_argument_3);
> 
>   if (really_long_variable_name_1 == really_long_variable_name_2 &&
>   var3 == var4) {
>   x = y + z;
>   ;
>   }
> 
> Cc: Thomas Monjalon 
> Cc: Siobhan Butler 
> Cc: John McNamara 
> Signed-off-by: Yuanhan Liu 
> ---

While it's not a big deal - yet is something likely to trigger massive 
discussion
my objections to this style of indentation is two-fold:

1. It means that we are using a mix of tabs and spaces for indentation at the 
start of a line. I think it's more consistent that all whitespace at the start
of a line should be either tabs or spaces, but not a mixture.
2. It makes how the code look much more dependent upon the tab-size being used
in the viewer - being able to adjust how much whitespace is seem at the start of
each line is a major advantage of using tabs rather than spaces for indentation.
For anyone using a 4-character tabstop - probably the most popular tabstop value
after an 8-char one - aligning an if-statement continuation to the opening brace
will cause it to align with the body of the block.

So, while the two-tab indent may look "a bit weird" it does solve the two issues
above. I believe practical benefits should override initial impressions. [It 
took
me a while to get used to also, but now I very much like it as a style.]

Regards,
/Bruce



[dpdk-dev] rte_prefetch0() is effective?

2016-01-13 Thread Polehn, Mike A
Prefetchs make a big difference because a powerful CPU like IA is always trying 
to find items to prefetch and the priority of these is not always easy to 
determine. This is especially a problem across subroutine calls since the 
compiler cannot determine what is of priority in the other subroutines and the 
runtime CPU logic cannot always have the future well predicted far enough in 
the future for all possible paths, especially if you have a cache miss, which 
takes eons of clock cycles to do a memory access probably resulting in a CPU 
stall.

Until we get to the point of the computers full understanding the logic of the 
program and writing optimum code (putting programmers out of business) , the 
understanding of what is important as the program progresses gives the 
programmer knowledge of what is desirable to prefetch. It is difficult to 
determine if the CPU is going to have the same priority of the prefetch, so 
having a prefetch may or may not show up as a measureable performance 
improvement under some conditions, but having the prefetch decision in place 
can make prefetch priority decision correct in these other cases, which make a 
performance improvement.

Removing a prefetch without thinking through and fully understanding the logic 
of why it is there, or what he added cost (in the case of calculating an 
address for the prefetch that affects other current operations) if any, is just 
plain amateur  work.  It is not to say people do not make bad judgments on what 
needs to be prefetched and put poor prefetch placement and should only be 
removed if not logically proper for expected runtime operation.

Only more primitive CPUs with no prefetch capabilities don't benefit from 
properly placed prefetches. 

Mike

-Original Message-
From: dev [mailto:dev-boun...@dpdk.org] On Behalf Of Bruce Richardson
Sent: Wednesday, January 13, 2016 3:35 AM
To: Moon-Sang Lee
Cc: dev at dpdk.org
Subject: Re: [dpdk-dev] rte_prefetch0() is effective?

On Thu, Dec 24, 2015 at 03:35:14PM +0900, Moon-Sang Lee wrote:
> I see codes as below in example directory, and I wonder it is effective.
> Coherent IO is adopted to modern architectures, so I think that DMA 
> initiation by rte_eth_rx_burst() might already fulfills cache lines of 
> RX buffers.
> Do I really need to call rte_prefetchX()?
> 
> nb_rx = rte_eth_rx_burst(portid, queueid, pkts_burst, 
> MAX_PKT_BURST);
> ...
> /* Prefetch and forward already prefetched packets */
> for (j = 0; j < (nb_rx - PREFETCH_OFFSET); j++) {
> rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[
> j + PREFETCH_OFFSET], void *));
> l3fwd_simple_forward(pkts_burst[j], portid,
> qconf);
> }
> 

Good question.
When the first example apps using this style of prefetch were originally 
written, yes, there was a noticable performance increase achieved by using the 
prefetch.
Thereafter, I'm not sure that anyone has checked with each generation of 
platforms whether the prefetches are still necessary and how much they help, 
but I suspect that they still help a bit, and don't hurt performance.
It would be an interesting exercise to check whether the prefetch offsets used 
in code like above can be adjusted to give better performance on our latest 
supported platforms.

/Bruce


[dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update

2016-01-13 Thread Hanoch Haim (hhaim)
Hi Bruce. 
This is exactly what was possible before the patch and does *not* work after 
this patch. 
I think it is more related to the semantic of the API. My understanding of the 
API was that I could allocate a share mbuf (ref=1) and then attach it from many 
concurrent threads (as each thread attach opt inc in 1 and driver reduces, 
atomic by 1)
After the patch there is a need to make the ref of the share mbuf to be 2 
(why?) and free it twice at the end( ?), this makes it cumbersome to use it and 
for sure there is a need to describe this semantic change.

However, this change is good for the common case, and I think it is better to 
keep it.
The options I see are:
1) Document the current behavior 
2) Create two sets of API with two types of semantic 

thanks,
Hanoh

-Original Message-
From: Bruce Richardson [mailto:bruce.richard...@intel.com] 
Sent: Wednesday, January 13, 2016 1:48 PM
To: Hanoch Haim (hhaim)
Cc: Olivier MATZ; dev at dpdk.org; Ido Barnea (ibarnea); Itay Marom (imarom)
Subject: Re: [dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update

On Tue, Jan 05, 2016 at 11:11:24AM +, Hanoch Haim (hhaim) wrote:
> Hi Oliver,
> Thank you for the fast response and it would be great to open a discussion on 
> that.
> In general our project can leverage your optimization and I think it is great 
> (we should have thought about it) . We can use it using the workaround I 
> described.
> However, for me it  seems odd that  rte_pktmbuf_attach () that does not 
> *change* anything in m_const, except of the *atomic* ref counter does not 
> work in parallel.
> The example I gave is a classic use case of rte_pktmbuf_attach  (multicast ) 
> and I don't see why it wouldn't work after your optimization. 
> 
> Do you have a pointer to the documentation that state that that you can't 
> call the atomic ref counter from more than one thread?
> 
Hi,

actually, the issue is not that you can't work with the reference counter field 
from multiple threads, or that you can't use an mbuf from multiple threads, 
it's that if you are working with the same mbuf in multiple threads you have 
multiple references to the mbuf and your application must increase the 
reference counter appropriately. For example, if thread A is going to pass an 
mbuf to thread B and keep using it itself, you must increment the reference 
counter in thread A before enqueuing it to B.

Regards,
/Bruce



[dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update

2016-01-13 Thread Bruce Richardson
On Wed, Jan 13, 2016 at 04:28:34PM +, Hanoch Haim (hhaim) wrote:
> Hi Bruce. 
> This is exactly what was possible before the patch and does *not* work after 
> this patch. 
> I think it is more related to the semantic of the API. My understanding of 
> the API was that I could allocate a share mbuf (ref=1) and then attach it 
> from many concurrent threads (as each thread attach opt inc in 1 and driver 
> reduces, atomic by 1)
> After the patch there is a need to make the ref of the share mbuf to be 2 
> (why?) and free it twice at the end( ?), this makes it cumbersome to use it 
> and for sure there is a need to describe this semantic change.
> 

There is no such thing as allocating a shared mbuf. 
Mbufs are always allocated with a reference count of one. 
To share an mbuf among threads, then the ref count must be increased.

NOTE: this is a different operation to that of "attaching" another mbuf to it -
it just happens that when attaching one mbuf to another you have to increase
the reference count as the attaching mbuf acts like another thread holding a
pointer to the original mbuf. 

Once finished with the mbuf, each thread (or pointing mbuf) needs to
decrement the reference counter, and the last one holding the reference is then
responsible for freeing that buffer. The easiest way to do this is to use the
free call and let it handle the reference counts, but if it makes more logical
sense in an app to do the decrement and checking itself (so that inc's and dec's
match in the app code), then that can work too, obviously.

> However, this change is good for the common case, and I think it is better to 
> keep it.
> The options I see are:
> 1) Document the current behavior 
> 2) Create two sets of API with two types of semantic 

Given the confusion I think documenting the way things are "meant to work" is
a good idea.

/Bruce

> 
> thanks,
> Hanoh
> 
> -Original Message-
> From: Bruce Richardson [mailto:bruce.richardson at intel.com] 
> Sent: Wednesday, January 13, 2016 1:48 PM
> To: Hanoch Haim (hhaim)
> Cc: Olivier MATZ; dev at dpdk.org; Ido Barnea (ibarnea); Itay Marom (imarom)
> Subject: Re: [dpdk-dev] [PATCH v2] mbuf: optimize rte_mbuf_refcnt_update
> 
> On Tue, Jan 05, 2016 at 11:11:24AM +, Hanoch Haim (hhaim) wrote:
> > Hi Oliver,
> > Thank you for the fast response and it would be great to open a discussion 
> > on that.
> > In general our project can leverage your optimization and I think it is 
> > great (we should have thought about it) . We can use it using the 
> > workaround I described.
> > However, for me it  seems odd that  rte_pktmbuf_attach () that does not 
> > *change* anything in m_const, except of the *atomic* ref counter does not 
> > work in parallel.
> > The example I gave is a classic use case of rte_pktmbuf_attach  (multicast 
> > ) and I don't see why it wouldn't work after your optimization. 
> > 
> > Do you have a pointer to the documentation that state that that you can't 
> > call the atomic ref counter from more than one thread?
> > 
> Hi,
> 
> actually, the issue is not that you can't work with the reference counter 
> field from multiple threads, or that you can't use an mbuf from multiple 
> threads, it's that if you are working with the same mbuf in multiple threads 
> you have multiple references to the mbuf and your application must increase 
> the reference counter appropriately. For example, if thread A is going to 
> pass an mbuf to thread B and keep using it itself, you must increment the 
> reference counter in thread A before enqueuing it to B.
> 
> Regards,
> /Bruce
> 


[dpdk-dev] [PATCH v2] vfio: Support for no-IOMMU mode

2016-01-13 Thread Stephen Hemminger
On Wed, 13 Jan 2016 12:36:09 +
Anatoly Burakov  wrote:

> +/* IOMMU types we support */
> +static const struct vfio_iommu_type iommu_types[] = {
> + /* x86 IOMMU, otherwise known as type 1 */
> + { VFIO_TYPE1_IOMMU, "Type 1", &vfio_iommu_type1_dma_map},
> + /* IOMMU-less mode */
> + { VFIO_NOIOMMU_IOMMU, "No-IOMMU", &vfio_iommu_noiommu_dma_map},
> +};
> +

Nit.. Why full-tab indent here?


[dpdk-dev] [PATCH] doc: coding style: use linux kernel style for indentation

2016-01-13 Thread Stephen Hemminger
On Wed, 13 Jan 2016 15:07:08 +
Bruce Richardson  wrote:

> So, while the two-tab indent may look "a bit weird" it does solve the two 
> issues
> above. I believe practical benefits should override initial impressions. [It 
> took
> me a while to get used to also, but now I very much like it as a style.]

I don't think that deviating from kernel style for this case is justified.


[dpdk-dev] UX Bug in Sphinx HTML Layout for Programming Guide (and maybe other guides?)

2016-01-13 Thread Matthew Hall
When you go to this link:

http://dpdk.org/doc/guides/prog_guide/perf_opt_guidelines.html

There is a bug in the Sphinx layout, where the subchapters of a chapter are
invisible even after the chapter is clicked.

It is a pain when you are trying to figure out the different sections in a 
widely variable chapter like the performance chapter, to know which section to 
read without having to click one-by-one to find the right part.

I am not sure about what it would take to show the subchapters in the outline 
view to make this easier for someone to figure out. Or maybe a TOC for a 
chapter at the beginning of the chapter?

This would really help a lot.

Sincerely,
Matthew.


[dpdk-dev] rte_prefetch0() is effective?

2016-01-13 Thread Matthew Hall
On Wed, Jan 13, 2016 at 11:34:33AM +, Bruce Richardson wrote:
> When the first example apps using this style of prefetch were originally 
> written, yes, there was a noticable performance increase achieved by using 
> the prefetch. Thereafter, I'm not sure that anyone has checked with each 
> generation of platforms whether the prefetches are still necessary and how 
> much they help, but I suspect that they still help a bit, and don't hurt 
> performance.

FYI, for me as a community member this paragraph describes one of my top 
irritations about DPDK.

The Intel accelerations, such as adding prefetches, or support for new 
features like the librte_power, are treated as one-off projects not as ongoing 
technical efforts which need periodic retesting and maintenance. Thus it 
turned out that after waiting over a month for a reply, I eventually 
discovered librte_power probably never worked right at all since at least 
Sandy Bridge, which is a very old chip by now for servers.

The accelerations are also treated like black magic. Meaning no comments are 
put in the code about how and why they work, so an outsider trying his best to 
measure things in VTune to help provide the ongoing testing and maintenance, 
can not tell why something was done or how it might be adjusted to work right 
in their environment if their hardware is older or newer than whatever 
undocumented hardware was used in developing the example. There's nowhere I 
know of that says the reference platform and core generation used for 
developing an example either so I could get some idea if it's current or old 
code.

When I ask a high level question, such as "Which Intel accelerations should 
one make sure are enabled to get best performance?" it normally doesn't get 
any reply. This makes life difficult because there are many dozen 
accelerations listed in the data sheet of a typical modern Intel core and no 
guidance is provided on the priority of the different accelerations for DPDK. 
So I don't have a good idea about where to focus my time to get the best 
acceleration out of all the technology it must have cost Intel millions or 
billions to create. To me that's very sad.

I am hoping maybe there are some resources we could make available to help 
understand the principles behind the accelerations so it is easier for the 
community to take part in maintaining them and maybe even helping create new 
ones.

Note: I read through all the subchapters here:

http://dpdk.org/doc/guides/prog_guide/perf_opt_guidelines.html

None of them mention any CPU acceleration details whatsoever.

They don't explain any specifics on prefetch or branch prediction. Only that 
they exist and do things.

Sincerely,
Matthew.


[dpdk-dev] [PATCH] app/testpmd Fix max_socket detection

2016-01-13 Thread Stephen Hurd
Previously, max_socket was set to the highest numbered socket with
an enabled lcore.  The intent is to set it to the highest socket
regardless of it being enabled.

Change-Id: I6306af0f90aa3c1fc5ffed75d1eed8297d29e132
Signed-off-by: Stephen Hurd 
---
 app/test-pmd/testpmd.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6129c26..26a2cce 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -359,17 +359,17 @@ set_default_fwd_lcores_config(void)

nb_lc = 0;
for (i = 0; i < RTE_MAX_LCORE; i++) {
+   if (sock_num > max_socket) {
+   if (sock_num > RTE_MAX_NUMA_NODES)
+   rte_exit(EXIT_FAILURE, "Total sockets greater 
than %u\n", RTE_MAX_NUMA_NODES);
+   max_socket = sock_num;
+   }
if (! rte_lcore_is_enabled(i))
continue;
if (i == rte_get_master_lcore())
continue;
fwd_lcores_cpuids[nb_lc++] = i;
sock_num = rte_lcore_to_socket_id(i) + 1;
-   if (sock_num > max_socket) {
-   if (sock_num > RTE_MAX_NUMA_NODES)
-   rte_exit(EXIT_FAILURE, "Total sockets greater 
than %u\n", RTE_MAX_NUMA_NODES);
-   max_socket = sock_num;
-   }
}
nb_lcores = (lcoreid_t) nb_lc;
nb_cfg_lcores = nb_lcores;
-- 
1.9.1



[dpdk-dev] [PATCH v2] app/testpmd Fix max_socket detection

2016-01-13 Thread Stephen Hurd
Previously, max_socket was set to the highest numbered socket with
an enabled lcore.  The intent is to set it to the highest socket
regardless of it being enabled.

Change-Id: I6306af0f90aa3c1fc5ffed75d1eed8297d29e132
Signed-off-by: Stephen Hurd 

v2: Forgot to commit before sending email... sorry for the nouse.
---
 app/test-pmd/testpmd.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c
index 6129c26..a4088f9 100644
--- a/app/test-pmd/testpmd.c
+++ b/app/test-pmd/testpmd.c
@@ -359,17 +359,17 @@ set_default_fwd_lcores_config(void)

nb_lc = 0;
for (i = 0; i < RTE_MAX_LCORE; i++) {
-   if (! rte_lcore_is_enabled(i))
-   continue;
-   if (i == rte_get_master_lcore())
-   continue;
-   fwd_lcores_cpuids[nb_lc++] = i;
sock_num = rte_lcore_to_socket_id(i) + 1;
if (sock_num > max_socket) {
if (sock_num > RTE_MAX_NUMA_NODES)
rte_exit(EXIT_FAILURE, "Total sockets greater 
than %u\n", RTE_MAX_NUMA_NODES);
max_socket = sock_num;
}
+   if (! rte_lcore_is_enabled(i))
+   continue;
+   if (i == rte_get_master_lcore())
+   continue;
+   fwd_lcores_cpuids[nb_lc++] = i;
}
nb_lcores = (lcoreid_t) nb_lc;
nb_cfg_lcores = nb_lcores;
-- 
1.9.1



[dpdk-dev] VFIO no-iommu

2016-01-13 Thread Alex Williamson
On Thu, 2016-01-14 at 14:03 +0800, Jike Song wrote:
> On Wed, Dec 16, 2015 at 12:38 PM, Alex Williamson
>  wrote:
> > 
> > So it works.??Is it acceptable???Useful???Sufficiently complete???Does
> > it imply deprecating the uio interface???I believe the feature that
> > started this discussion was support for MSI/X interrupts so that VFs
> > can support some kind of interrupt (uio only supports INTx since it
> > doesn't allow DMA).??Implementing that would be the ultimate test of
> > whether this provides dpdk with not only a more consistent interface,
> > but the feature dpdk wants that's missing in uio. Thanks,
> > 
> Hi Alex,
> 
> Sorry for jumping in.??Just being curious, how does VFIO No-IOMMU mode
> support DMA from userspace drivers???If I understand correctly, due to
> the absence of IOMMU, pcidev has to use physaddr to start a DMA
> transaction, but how it is supposed to get physaddr from userspace
> drivers, /proc//pagemap or something else?

Hi Jike,

vfio no-iommu does nothing to facilitate DMA mappings, UIO didn't
either and DPDK managed to work with that. ?vfio no-iommu is meant to
be an enabler and provide a consistent vfio device model (with MSI/X
interrupts), but fundamentally the idea of a non-iommu protected, user
owned device capable of DMA is unsupportable. ?This is why vfio no-
iommu taints the kernel. ?With that in mind, one of the design goals is
to introduce as little code as possible for vfio no-iommu. ?A new vfio
iommu backend with pinning and virt-to-bus translation goes against
that design goal. ?I don't know the details of how DPDK did this with
UIO, but the same lack of DMA mapping facilities is present with vfio
no-iommu. ?It really just brings the vfio device model, nothing more.
Thanks,

Alex