[dpdk-dev] [PATCH 1/6] ixgbe: Support VMDq RSS in non-SRIOV environment

2015-08-25 Thread Ouyang, Changchun
Hi Michael,

Please review the latest version (v4).

Thanks for your effort.
Changchun


> -Original Message-
> From: Qiu, Michael
> Sent: Monday, August 24, 2015 6:42 PM
> To: Ouyang, Changchun; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH 1/6] ixgbe: Support VMDq RSS in non-SRIOV
> environment
> 
> On 5/21/2015 3:50 PM, Ouyang Changchun wrote:
> > In non-SRIOV environment, VMDq RSS could be enabled by the MRQC register.
> > In theory, the queue number per pool could be 2 or 4, but only 2
> > queues are available due to a HW limitation; the same limit also exists
> > in the Linux ixgbe driver.
> >
> > Signed-off-by: Changchun Ouyang 
> > ---
> >  lib/librte_ether/rte_ethdev.c     | 40 +++
> >  lib/librte_pmd_ixgbe/ixgbe_rxtx.c | 82 +--
> >  2 files changed, 111 insertions(+), 11 deletions(-)
> >
> > diff --git a/lib/librte_ether/rte_ethdev.c b/lib/librte_ether/rte_ethdev.c
> > index 024fe8b..6535715 100644
> > --- a/lib/librte_ether/rte_ethdev.c
> > +++ b/lib/librte_ether/rte_ethdev.c
> > @@ -933,6 +933,16 @@ rte_eth_dev_check_vf_rss_rxq_num(uint8_t port_id, uint16_t nb_rx_q)
> >     return 0;
> >  }
> >
> > +#define VMDQ_RSS_RX_QUEUE_NUM_MAX 4
> > +
> > +static int
> > +rte_eth_dev_check_vmdq_rss_rxq_num(__rte_unused uint8_t port_id, uint16_t nb_rx_q)
> > +{
> > +   if (nb_rx_q > VMDQ_RSS_RX_QUEUE_NUM_MAX)
> > +           return -EINVAL;
> > +   return 0;
> > +}
> > +
> >  static int
> >  rte_eth_dev_check_mq_mode(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
> >                           const struct rte_eth_conf *dev_conf)
> > @@ -1093,6 +1103,36 @@ rte_eth_dev_check_mq_mode(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
> >                     return -EINVAL;
> >             }
> >     }
> > +
> > +   if (dev_conf->rxmode.mq_mode == ETH_MQ_RX_VMDQ_RSS) {
> > +           uint32_t nb_queue_pools =
> > +                   dev_conf->rx_adv_conf.vmdq_rx_conf.nb_queue_pools;
> > +           struct rte_eth_dev_info dev_info;
> > +
> > +           rte_eth_dev_info_get(port_id, &dev_info);
> > +           dev->data->dev_conf.rxmode.mq_mode = ETH_MQ_RX_VMDQ_RSS;
> > +           if (nb_queue_pools == ETH_32_POOLS || nb_queue_pools == ETH_64_POOLS)
> > +                   RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool =
> > +                           dev_info.max_rx_queues/nb_queue_pools;
> > +           else {
> > +                   PMD_DEBUG_TRACE("ethdev port_id=%d VMDQ "
> > +                                   "nb_queue_pools=%d invalid "
> > +                                   "in VMDQ RSS\n"
> 
> Is a "," missing here?

Yes, it is fixed in later version.

> 
> Thanks,
> Michael
> 
> > +                                   port_id,
> > +                                   nb_queue_pools);
> > +                   return -EINVAL;
> > +           }
> > +
> > +           if (rte_eth_dev_check_vmdq_rss_rxq_num(port_id,
> > +                           RTE_ETH_DEV_SRIOV(dev).nb_q_per_pool) != 0) {
> > +                   PMD_DEBUG_TRACE("ethdev port_id=%d"
> > +                                   " SRIOV active, invalid queue"
> > +                                   " number for VMDQ RSS, allowed"
> > +                                   " value are 1, 2 or 4\n",
> > +                                   port_id);
> > +                   return -EINVAL;
> > +           }
> > +   }
> >     }
> >     return 0;
> >  }
> >

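[Illustrative sketch, not part of the original thread] The mode validated by the patch above is requested by the application through rte_eth_dev_configure(). A minimal, hypothetical example in C (queue counts and the RSS hash mask are illustrative only):

#include <string.h>
#include <rte_ethdev.h>

/* Hypothetical helper: ask for VMDq + RSS on one port. The per-pool queue
 * count resulting from this configuration is what the new
 * rte_eth_dev_check_vmdq_rss_rxq_num() check validates. */
static int
configure_vmdq_rss(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q)
{
	struct rte_eth_conf port_conf;

	memset(&port_conf, 0, sizeof(port_conf));
	port_conf.rxmode.mq_mode = ETH_MQ_RX_VMDQ_RSS;
	port_conf.rx_adv_conf.vmdq_rx_conf.nb_queue_pools = ETH_64_POOLS;
	port_conf.rx_adv_conf.rss_conf.rss_hf = ETH_RSS_IP;

	/* rte_eth_dev_configure() ends up in rte_eth_dev_check_mq_mode(),
	 * which is where the VMDQ_RSS_RX_QUEUE_NUM_MAX check from this
	 * patch runs. */
	return rte_eth_dev_configure(port_id, nb_rx_q, nb_tx_q, &port_conf);
}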


[dpdk-dev] vhost compliant virtio based networking interface in container

2015-08-25 Thread Tetsuya Mukawa
Hi Xie and Yanping,


May I ask you some questions?
It seems we are also developing almost the same thing.

On 2015/08/20 19:14, Xie, Huawei wrote:
> Added dev at dpdk.org
>
> On 8/20/2015 6:04 PM, Xie, Huawei wrote:
>> Yanping:
>> I read your mail, seems what we did are quite similar. Here i wrote a
>> quick mail to describe our design. Let me know if it is the same thing.
>>
>> Problem Statement:
>> We don't have a high performance networking interface in container for
>> NFV. Current veth pair based interface couldn't be easily accelerated.
>>
>> The key components involved:
>> 1.DPDK based virtio PMD driver in container.
>> 2.device simulation framework in container.
>> 3.dpdk(or kernel) vhost running in host.
>>
>> How virtio is created?
>> A:  There is no "real" virtio-pci device in container environment.
>> 1). Host maintains pools of memories, and shares memory to container.
>> This could be accomplished through host share a huge page file to container.
>> 2). Containers creates virtio rings based on the shared memory.
>> 3). Container creates mbuf memory pools on the shared memory.
>> 4) Container send the memory and vring information to vhost through
>> vhost message. This could be done either through ioctl call or vhost
>> user message.
>>
>> How vhost message is sent?
>> A: There are two alternative ways to do this.
>> 1) The customized virtio PMD is responsible for all the vring creation,
>> and vhost message sending.

Above is our approach so far.
It seems Yanping also takes this kind of approach.
We are using vhost-user functionality instead of using the vhost-net
kernel module.
Probably this is the difference between Yanping and us.

BTW, we are going to submit a vhost PMD for DPDK 2.2.
This PMD is implemented on top of librte_vhost.
It allows a DPDK application to handle a vhost-user (or vhost-cuse) backend
as a normal NIC port.
This PMD should work with both Xie's and Yanping's approaches.
(In the case of Yanping's approach, we may need vhost-cuse.)
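[Illustrative sketch, not part of the original mail] Steps 1)-3) quoted above (host shares a hugepage file; the container maps it and carves vrings and mbuf pools out of it) could look roughly like this; path handling and error policy are hypothetical:

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical: map a hugepage file that the host created and shared with
 * the container. Both sides map the same file, so a host virtual address
 * and a container virtual address of the same byte differ only by a fixed
 * offset, which is what makes the later address translation possible. */
static void *
map_shared_hugepage(const char *path, size_t *len)
{
	struct stat st;
	void *va;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return NULL;
	if (fstat(fd, &st) < 0) {
		close(fd);
		return NULL;
	}
	va = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);
	if (va == MAP_FAILED)
		return NULL;

	*len = st.st_size;
	return va;	/* vrings and mbuf pools are then carved out of this region */
}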

>> 2) We could do this through a lightweight device simulation framework.
>> The device simulation creates simple PCI bus. On the PCI bus,
>> virtio-net PCI devices are created. The device simulations provides
>> IOAPI for MMIO/IO access.

Does it mean you implemented a kernel module?
If so, do you still need vhost-cuse functionality to handle vhost
messages in userspace?

>>2.1  virtio PMD configures the pseudo virtio device as how it does in
>> KVM guest enviroment.
>>2.2  Rather than using io instruction, virtio PMD uses IOAPI for IO
>> operation on the virtio-net PCI device.
>>2.3  The device simulation is responsible for device state machine
>> simulation.
>>2.4   The device simulation is responsbile for talking to vhost.
>>  With this approach, we could minimize the virtio PMD modifications.
>> The virtio PMD is like configuring a real virtio-net PCI device.
>>
>> Memory mapping?
>> A: QEMU could access the whole guest memory in KVM enviroment. We need
>> to fill the gap.
>> container maps the shared memory to container's virtual address space
>> and host maps it to host's virtual address space. There is a fixed
>> offset mapping.
>> Container creates shared vring based on the memory. Container also
>> creates mbuf memory pool based on the shared memroy.
>> In VHOST_SET_MEMORY_TABLE message, we send the memory mapping
>> information for the shared memory. As we require mbuf pool created on
>> the shared memory, and buffers are allcoated from the mbuf pools, dpdk
>> vhost could translate the GPA in vring desc to host virtual.
>>
>>
>> GPA or CVA in vring desc?
>> To ease the memory translation, rather than using GPA, here we use
>> CVA(container virtual address). This the tricky thing here.
>> 1) virtio PMD writes vring's VFN rather than PFN to PFN register through
>> IOAPI.
>> 2) device simulation framework will use VFN as PFN.
>> 3) device simulation sends SET_VRING_ADDR with CVA.
>> 4) virtio PMD fills vring desc with CVA of the mbuf data pointer rather
>> than GPA.
>> So when host sees the CVA, it could translates it to HVA(host virtual
>> address).
>>
>> Worth to note:
>> The virtio interface in container follows the vhost message format, and
>> is compliant with dpdk vhost implmentation, i.e, no dpdk vhost
>> modification is needed.
>> vHost isn't aware whether the incoming virtio comes from KVM guest or
>> container.
>>
>> The pretty much covers the high level design. There are quite some low
>> level issues. For example, 32bit PFN is enough for KVM guest, since we
>> use 64bit VFN(virtual page frame number),  trick is done here through a
>> special IOAPI.

In addition to the above, we might consider the "namespace" kernel
functionality. Technically it would not be a big problem, but it is related
to security, so it would be nice to take it into account.

Regards,
Tetsuya

>> /huawei
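[Illustrative sketch, not part of the original mail] The fixed-offset scheme described in the quoted "Memory mapping" and "GPA or CVA" sections boils down to something like the following; the structure and names are hypothetical:

#include <stdint.h>

/* One shared-memory region as both sides see it. */
struct shm_region {
	uint64_t container_va;  /* CVA: where the container mapped the file */
	uint64_t host_va;       /* HVA: where the host (vhost) mapped it    */
	uint64_t len;
};

/* vhost side: translate a CVA found in a vring descriptor into an HVA.
 * Because both processes map the same file, the translation is a single
 * fixed offset per region. Returns 0 when the address is out of range. */
static uint64_t
cva_to_hva(const struct shm_region *r, uint64_t cva)
{
	if (cva < r->container_va || cva >= r->container_va + r->len)
		return 0;
	return cva - r->container_va + r->host_va;
}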



[dpdk-dev] Why the offloads of the guest's virtio-net network adapter are disabled when vhost-user is used?

2015-08-25 Thread Tetsuya Mukawa
On 2015/08/24 22:09, leo zhu wrote:
> Hi all,
>
> I am running the vhost sample application on my server.
>
> According to the dpdk-sample-applications-user-guide.pdf, I run the Virtual
> Machine with vhost-user enabled.
> Following is the command that is used to run the virtual machine.
>
> qemu-system-x86_64 /root/leo/ubuntu-1.img -enable-kvm -m 1024 -vnc :5 \
> -chardev socket,id=char1,path=/root/leo/dpdk-2.0.0/examples/vhost/usvhost \
> -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce \
> -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 \
> -object memory-backend-file,id=mem,size=1024M,mem-path=/dev/hugepages,share=on \
> -numa node,memdev=mem -mem-prealloc
>
> After the Virtual Machine is started, I found the offloads of the
> Virtual Machine's virtio-net network adapter are all disabled. The
> offload status is checked with the command "ethtool -k eth0". I tried
> to enable the offloads with ethtool, but it does not work.
>
> My questions are:
>
> 1. Can the offloads of the guest's virtio-net network adapter be
> enabled when vhost-user is used?

Hi Leo,

I guess we need additional implementations in librte_vhost to enable
offloads.


> 2. If the offloads can't be enabled when vhost-user is used, what is the 
> reason?

Features are negotiated not only between the virtio-net driver in the guest
and the virtio-net device in QEMU, but also between the virtio-net device in
QEMU and the vhost-user backend in librte_vhost.
As a result, if the vhost-user backend doesn't support some features, the
virtio-net driver in the guest cannot use them either.

Please see "lib/librte_vhost/virtio-net.c"

/* Features supported by this lib. */
#define VHOST_SUPPORTED_FEATURES ((1ULL << VIRTIO_NET_F_MRG_RXBUF) | \
                                  (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
                                  (1ULL << VIRTIO_NET_F_CTRL_RX) | \
                                  (1ULL << VHOST_F_LOG_ALL))

This is all current librte_vhost supports.

Thanks,
Tetsuya
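[Illustrative sketch, not part of the original mail] Supporting guest offloads would roughly mean advertising the corresponding virtio feature bits in the mask above (bit positions as defined by the virtio-net spec) and then honouring them on the data path; the macro name below is hypothetical:

/* Hypothetical extension -- checksum and TSO feature bits, in addition to
 * the four bits librte_vhost advertises today. */
#define VHOST_OFFLOAD_FEATURES ((1ULL << VIRTIO_NET_F_CSUM) | \
                                (1ULL << VIRTIO_NET_F_GUEST_CSUM) | \
                                (1ULL << VIRTIO_NET_F_GUEST_TSO4) | \
                                (1ULL << VIRTIO_NET_F_HOST_TSO4))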

> It will be great if someone from the forum could give the answers and clues.
>
> Thanks.
> Leo



[dpdk-dev] Why the offloads of the guest's virtio-net network adapter are disabled when vhost-user is used?

2015-08-25 Thread Liu, Jijiang


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tetsuya Mukawa
> Sent: Tuesday, August 25, 2015 11:23 AM
> To: leo zhu; dev at dpdk.org
> Subject: Re: [dpdk-dev] Why the offloads of the guest's virtio-net network
> adapter are disabled when vhost-user is used?
> 
> On 2015/08/24 22:09, leo zhu wrote:
> > Hi all,
> >
> > I am running the vhost sample application on my server.
> >
> > According to the dpdk-sample-applications-user-guide.pdf, I run the
> > Virtual Machine with vhost-user enabled.
> > Following is the command that is used to run the virtual machine.
> >
> >
> >
> >
> >
> >
> > *qemu-system-x86_64 /root/leo/ubuntu-1.img -enable-kvm -m 1024 -
> vnc :5
> > -chardev
> > \socket,id=char1,path=/root/leo/dpdk-2.0.0/examples/vhost/usvhost
> > -netdev type=vhost-user, \id=mynet1,chardev=char1,vhostforce -device
> > virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 \-object
> > memory-backend-file,id=mem,size=1024M,mem-
> path=/dev/hugepages,share=on
> > -numa node,memdev=mem -mem-prealloc*
> >
> > After the Virtual Machine is started, I found the offloads of the
> > Virtual Machine's virtio-net network adapter are all disabled*.* The
> > offloads status is checked with command* ethtool -k eth0*. I try to
> > enables the offloads with ethtool command, but it does not work.
> >
> > My questions are:
> >
> > 1. Can the offloads of the guest's virtio-net network adapter be
> > enabled when vhost-user is used?
> 
> Hi Leo,
> 
> I guess we need additional implementations in librte_vhost to enable
> offloads.
> 
> 
> > 2. If the offloads can't be enabled when vhost-user is used, what is the
> reason?
> 
> Features are negotiated not olny between virtio-net driver on guest and
> virtio-net device in QEMU, but also virtio-net device in QEMU and vhost-user
> backend in librte_vhost.
> As a result, if vhost-user backend doesn't support some features, virtio-net
> driver on guest also cannot use them.
> 
> Please see "lib/librte_vhost/virtio-net.c"
Yes, you are correct.

I'm working on the vhost TSO offload; the offload fields set in struct
'virtio_net_hdr' need to be considered on both the virtio-net and vhost sides.
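[Editor's illustration, not part of the original mail] For context, the header referred to here is the standard per-packet virtio-net header; roughly (field names per the virtio spec / linux uapi):

struct virtio_net_hdr {
	uint8_t  flags;        /* e.g. VIRTIO_NET_HDR_F_NEEDS_CSUM */
	uint8_t  gso_type;     /* e.g. VIRTIO_NET_HDR_GSO_TCPV4 for TSO */
	uint16_t hdr_len;      /* length of the L2..L4 headers */
	uint16_t gso_size;     /* MSS to use when segmenting */
	uint16_t csum_start;   /* where checksumming starts */
	uint16_t csum_offset;  /* offset of the checksum field from csum_start */
};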

> /* Features supported by this lib. */
> #define VHOST_SUPPORTED_FEATURES ((1ULL <<
> VIRTIO_NET_F_MRG_RXBUF) | \
> (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> (1ULL << VIRTIO_NET_F_CTRL_RX) | \
> (1ULL << VHOST_F_LOG_ALL))
> 
> This is all current librte_vhost supports.
> 
> Thanks,
> Tetsuya
> 
> > It will be great if someone from the forum could give the answers and clues.
> >
> > Thanks.
> > Leo



[dpdk-dev] Why the offloads of the guest's virtio-net network adapter are disabled when vhost-user is used?

2015-08-25 Thread leo zhu
Hi Tetsuya & Jijiang,
Great!  Thanks a lot for your explanations.

Thanks.
Leo

On Tue, Aug 25, 2015 at 11:27 AM, Liu, Jijiang 
wrote:

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Tetsuya Mukawa
> > Sent: Tuesday, August 25, 2015 11:23 AM
> > To: leo zhu; dev at dpdk.org
> > Subject: Re: [dpdk-dev] Why the offloads of the guest's virtio-net
> network
> > adapter are disabled when vhost-user is used?
> >
> > On 2015/08/24 22:09, leo zhu wrote:
> > > Hi all,
> > >
> > > I am running the vhost sample application on my server.
> > >
> > > According to the dpdk-sample-applications-user-guide.pdf, I run the
> > > Virtual Machine with vhost-user enabled.
> > > Following is the command that is used to run the virtual machine.
> > >
> > >
> > >
> > >
> > >
> > >
> > > *qemu-system-x86_64 /root/leo/ubuntu-1.img -enable-kvm -m 1024 -
> > vnc :5
> > > -chardev
> > > \socket,id=char1,path=/root/leo/dpdk-2.0.0/examples/vhost/usvhost
> > > -netdev type=vhost-user, \id=mynet1,chardev=char1,vhostforce -device
> > > virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 \-object
> > > memory-backend-file,id=mem,size=1024M,mem-
> > path=/dev/hugepages,share=on
> > > -numa node,memdev=mem -mem-prealloc*
> > >
> > > After the Virtual Machine is started, I found the offloads of the
> > > Virtual Machine's virtio-net network adapter are all disabled*.* The
> > > offloads status is checked with command* ethtool -k eth0*. I try to
> > > enables the offloads with ethtool command, but it does not work.
> > >
> > > My questions are:
> > >
> > > 1. Can the offloads of the guest's virtio-net network adapter be
> > > enabled when vhost-user is used?
> >
> > Hi Leo,
> >
> > I guess we need additional implementations in librte_vhost to enable
> > offloads.
> >
> >
> > > 2. If the offloads can't be enabled when vhost-user is used, what is
> the
> > reason?
> >
> > Features are negotiated not olny between virtio-net driver on guest and
> > virtio-net device in QEMU, but also virtio-net device in QEMU and
> vhost-user
> > backend in librte_vhost.
> > As a result, if vhost-user backend doesn't support some features,
> virtio-net
> > driver on guest also cannot use them.
> >
> > Please see "lib/librte_vhost/virtio-net.c"
> Yes, you are correct.
>
> I'm working on the vhost TSO offload,  the offload set in struct
> 'virtio_net_hdr' need to be considered in both virtio-net and vhost side.
>
> > /* Features supported by this lib. */
> > #define VHOST_SUPPORTED_FEATURES ((1ULL <<
> > VIRTIO_NET_F_MRG_RXBUF) | \
> > (1ULL << VIRTIO_NET_F_CTRL_VQ) | \
> > (1ULL << VIRTIO_NET_F_CTRL_RX) | \
> > (1ULL << VHOST_F_LOG_ALL))
> >
> > This is all current librte_vhost supports.
> >
> > Thanks,
> > Tetsuya
> >
> > > It will be great if someone from the forum could give the answers and
> clues.
> > >
> > > Thanks.
> > > Leo
>
>


[dpdk-dev] vhost compliant virtio based networking interface in container

2015-08-25 Thread Xie, Huawei
On 8/25/2015 10:59 AM, Tetsuya Mukawa wrote:
> Hi Xie and Yanping,
>
>
> May I ask you some questions?
> It seems we are also developing an almost same one.

Good to know that we are tackling the same problem and have a similar idea.
What is your status now? We have the PoC running, and it is compliant with
dpdk vhost.
Interrupt-like notification isn't supported.

>
> On 2015/08/20 19:14, Xie, Huawei wrote:
>> Added dev at dpdk.org
>>
>> On 8/20/2015 6:04 PM, Xie, Huawei wrote:
>>> Yanping:
>>> I read your mail, seems what we did are quite similar. Here i wrote a
>>> quick mail to describe our design. Let me know if it is the same thing.
>>>
>>> Problem Statement:
>>> We don't have a high performance networking interface in container for
>>> NFV. Current veth pair based interface couldn't be easily accelerated.
>>>
>>> The key components involved:
>>> 1.DPDK based virtio PMD driver in container.
>>> 2.device simulation framework in container.
>>> 3.dpdk(or kernel) vhost running in host.
>>>
>>> How virtio is created?
>>> A:  There is no "real" virtio-pci device in container environment.
>>> 1). Host maintains pools of memories, and shares memory to container.
>>> This could be accomplished through host share a huge page file to container.
>>> 2). Containers creates virtio rings based on the shared memory.
>>> 3). Container creates mbuf memory pools on the shared memory.
>>> 4) Container send the memory and vring information to vhost through
>>> vhost message. This could be done either through ioctl call or vhost
>>> user message.
>>>
>>> How vhost message is sent?
>>> A: There are two alternative ways to do this.
>>> 1) The customized virtio PMD is responsible for all the vring creation,
>>> and vhost message sending.
> Above is our approach so far.
> It seems Yanping also takes this kind of approach.
> We are using vhost-user functionality instead of using the vhost-net
> kernel module.
> Probably this is the difference between Yanping and us.

In my current implementation, the device simulation layer talks to "user
space" vhost through the cuse interface. It could also be done through a
vhost-user socket. This isn't the key point.
Here "vhost-user" is kind of confusing; maybe "user space vhost" is more
accurate, whether over cuse or a unix domain socket. :)

As for Yanping, they are now connecting to the vhost-net kernel module, but
they are also trying to connect to "user space" vhost. Correct me if I am wrong.
Yes, there is some difference between these two: the vhost-net kernel module
can directly access another process's memory, while with
vhost-user (cuse/socket) we need to do the memory mapping.
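[Illustrative sketch, not part of the original mail] The memory mapping mentioned here is ultimately described to the backend as a table of shared regions; the struct layout is the kernel's vhost memory table (linux/vhost.h), while the filling code is purely hypothetical:

#include <linux/vhost.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical: describe one shared region so that the backend can translate
 * the addresses found in vring descriptors into its own virtual addresses. */
static struct vhost_memory *
build_mem_table(uint64_t guest_phys, uint64_t user_va, uint64_t len)
{
	struct vhost_memory *mem;

	mem = calloc(1, sizeof(*mem) + sizeof(struct vhost_memory_region));
	if (mem == NULL)
		return NULL;

	mem->nregions = 1;
	mem->regions[0].guest_phys_addr = guest_phys; /* CVA in this design */
	mem->regions[0].userspace_addr  = user_va;    /* container VA */
	mem->regions[0].memory_size     = len;
	return mem; /* passed via VHOST_SET_MEM_TABLE (ioctl) or the
	             * equivalent vhost-user message */
}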
>
> BTW, we are going to submit a vhost PMD for DPDK-2.2.
> This PMD is implemented on librte_vhost.
> It allows DPDK application to handle a vhost-user(cuse) backend as a
> normal NIC port.
> This PMD should work with both Xie and Yanping approach.
> (In the case of Yanping approach, we may need vhost-cuse)
>
>>> 2) We could do this through a lightweight device simulation framework.
>>> The device simulation creates simple PCI bus. On the PCI bus,
>>> virtio-net PCI devices are created. The device simulations provides
>>> IOAPI for MMIO/IO access.
> Does it mean you implemented a kernel module?
> If so, do you still need vhost-cuse functionality to handle vhost
> messages n userspace?

The device simulation is a library running in user space in the container.
It is linked with the DPDK app. It creates pseudo buses and virtio-net PCI
devices.
The virtio-container PMD configures the virtio-net pseudo devices through
the IOAPI provided by the device simulation rather than through IO
instructions as in a KVM guest.
Why do we use device simulation?
We could create other virtio devices in the container, and provide a common
way to talk to the vhost-xx modules.
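[Illustrative sketch, not part of the original mail] An IOAPI of the kind described might be little more than a vtable the PMD uses instead of inb/outb; the names here are hypothetical:

#include <stdint.h>

/* Hypothetical interface exposed by the device-simulation library to the
 * virtio PMD: the PMD performs its "IO" through these callbacks instead of
 * real port/MMIO accesses, and the library forwards the effects to vhost. */
struct sim_io_ops {
	uint32_t (*io_read)(void *dev, uint64_t offset, int len);
	void     (*io_write)(void *dev, uint64_t offset, uint32_t val, int len);
};

/* Example use in the PMD: kick the device after filling the avail ring.
 * Offset 16 is VIRTIO_PCI_QUEUE_NOTIFY in the legacy virtio-PCI layout. */
static void
notify_queue(struct sim_io_ops *ops, void *dev, uint16_t queue_id)
{
	ops->io_write(dev, 16, queue_id, 2);
}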

>>>2.1  virtio PMD configures the pseudo virtio device as how it does in
>>> KVM guest enviroment.
>>>2.2  Rather than using io instruction, virtio PMD uses IOAPI for IO
>>> operation on the virtio-net PCI device.
>>>2.3  The device simulation is responsible for device state machine
>>> simulation.
>>>2.4   The device simulation is responsbile for talking to vhost.
>>>  With this approach, we could minimize the virtio PMD modifications.
>>> The virtio PMD is like configuring a real virtio-net PCI device.
>>>
>>> Memory mapping?
>>> A: QEMU could access the whole guest memory in KVM enviroment. We need
>>> to fill the gap.
>>> container maps the shared memory to container's virtual address space
>>> and host maps it to host's virtual address space. There is a fixed
>>> offset mapping.
>>> Container creates shared vring based on the memory. Container also
>>> creates mbuf memory pool based on the shared memroy.
>>> In VHOST_SET_MEMORY_TABLE message, we send the memory mapping
>>> information for the shared memory. As we require mbuf pool created on
>>> the shared memory, and buffers are allcoated from the mbuf pools, dpdk
>>> vhost could translate the GPA in vring desc to host virtual

[dpdk-dev] [PATCH 3/3] app/test: enable test_red to build on non x86 platform

2015-08-25 Thread Thomas Monjalon
2015-08-18 18:10, Jerin Jacob:
> --- a/app/test/test_red.c
> +++ b/app/test/test_red.c
> +#if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686) || defined(RTE_ARCH_X86_X32)
>  #ifdef __PIC__
>  asm volatile (
>  "mov %%ebx, %%edi\n"
> @@ -155,6 +156,7 @@ static inline void rdtsc_prof_start(struct rdtsc_prof *p)
>  #else
>   asm( "cpuid" : : : "%eax", "%ebx", "%ecx", "%edx" );
>  #endif
> +#endif
>   p->clk_start = rte_rdtsc();

The right fix would be to move that arch-specific code into an EAL abstraction.
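[Editor's illustration, not part of the original mail] One possible shape for such an abstraction, sketched with existing generic helpers (a full memory barrier standing in for the cpuid serialization; an x86-specific version could keep the original asm):

#include <rte_atomic.h>
#include <rte_cycles.h>

/* Portable profiling timestamp: serialize with a full barrier, then read
 * the (EAL-abstracted) timestamp counter. */
static inline uint64_t
prof_rdtsc_serialized(void)
{
	rte_mb();
	return rte_rdtsc();
}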



[dpdk-dev] [PATCH] qos_meter: Fix compilation with APP_MODE_FWD

2015-08-25 Thread Thomas Monjalon
2015-08-18 16:55, Ian Stokes:
> The qos_meter sample app will fail to compile if APP_MODE
> is set to APP_MODE_FWD. This patch changes the variable
> name 'color' in main.h to the expected variable name
> 'input_color' to allow compilation with APP_MODE_FWD.

Thanks for raising the issue.

> --- a/examples/qos_meter/main.h
> +++ b/examples/qos_meter/main.h
>  #if APP_MODE == APP_MODE_FWD
>  
> -#define FUNC_METER(a,b,c,d) color, flow_id=flow_id, pkt_len=pkt_len, time=time
> +#define FUNC_METER(a,b,c,d) input_color, flow_id=flow_id, pkt_len=pkt_len, time=time

This patch should not be accepted to discourage build-time options.
Patch for run-time option is welcome.


[dpdk-dev] [PATCH] qos_meter: Fix compilation with APP_MODE_FWD

2015-08-25 Thread Mcnamara, John


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> Sent: Tuesday, August 25, 2015 1:37 PM
> To: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH] qos_meter: Fix compilation with APP_MODE_FWD
> 
> > --- a/examples/qos_meter/main.h
> > +++ b/examples/qos_meter/main.h
> >  #if APP_MODE == APP_MODE_FWD
> >
> > -#define FUNC_METER(a,b,c,d) color, flow_id=flow_id, pkt_len=pkt_len,
> > time=time
> > +#define FUNC_METER(a,b,c,d) input_color, flow_id=flow_id,
> > +pkt_len=pkt_len, time=time
> 
> This patch should not be accepted to discourage build-time options.
> Patch for run-time option is welcome.

Hi,

The patch is fixing a compilation issue, which seems reasonable. It isn't
introducing a build-time option; it is merely fixing a typo in an existing one.

Yes, it would be better not to have this build time option (in which case the 
issue would have been found sooner) but that isn't the responsibility of the 
person submitting this patch.

That is something that should be pushed back to the author/maintainer.

In the meantime this patch is still valid and should be applied.

John







[dpdk-dev] [PATCH] qos_meter: Fix compilation with APP_MODE_FWD

2015-08-25 Thread Thomas Monjalon
2015-08-25 13:34, Mcnamara, John:
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Thomas Monjalon
> > This patch should not be accepted to discourage build-time options.
> > Patch for run-time option is welcome.
> 
> The patch is fixing a compilation issue, which seems reasonable. It isn't
> introducing a build time option, it is merely fixing an typo in an
> existing one.

Yes

> Yes, it would be better not to have this build time option (in which case
> the issue would have been found sooner) but that isn't the responsibility
> of the person submitting this patch.

Yes

> That is something that should be pushed back to the author/maintainer.

Yes

> In the meantime this patch is still valid and should be applied.

No
After trying to request this kind of cleanup for several months, nothing has
happened. Maybe it will be more efficient to leave such bugs in place until
someone submits a real cleanup.



[dpdk-dev] working example commands for ethertype/flow_director_filter ?

2015-08-25 Thread Wu, Jingjing
Hi,

If you use "help filters", you can probably get the command format like:

ethertype_filter (port_id) (add|del) (mac_addr|mac_ignr) (mac_address) 
ethertype (ether_type) (drop|fwd) queue (queue_id)
   Add/Del an ethertype filter.
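[Editor's illustration, not part of the original mail] For anyone driving this from code rather than the testpmd prompt, the same filter can be programmed through the filter-ctrl API; a rough sketch assuming the DPDK 2.x rte_eth_ctrl.h definitions (whether MAC matching is accepted depends on the NIC, which is exactly the limitation discussed below):

#include <string.h>
#include <rte_ethdev.h>
#include <rte_eth_ctrl.h>

/* Hypothetical helper: steer ARP (ethertype 0x0806) frames on 'port' to
 * 'queue', ignoring the MAC address. */
static int
add_arp_ethertype_filter(uint8_t port, uint16_t queue)
{
	struct rte_eth_ethertype_filter filter;

	memset(&filter, 0, sizeof(filter));
	filter.ether_type = 0x0806;  /* ARP */
	filter.flags = 0;            /* no RTE_ETHTYPE_FLAGS_MAC / _DROP */
	filter.queue = queue;

	return rte_eth_dev_filter_ctrl(port, RTE_ETH_FILTER_ETHERTYPE,
	                               RTE_ETH_FILTER_ADD, &filter);
}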


> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> Sent: Tuesday, August 25, 2015 2:56 AM
> To: dev at dpdk.org
> Subject: Re: [dpdk-dev] working example commands for 
> ethertype/flow_director_filter ?
> 
> testpmd>  mac_addr add 0 00:10:E0:3B:3B:50
> 
> testpmd> set promisc all on
> 
> testpmd>  ethertype_filter 0 add mac_addr 00:10:E0:3B:3B:50  ethertype 0x0806 
> fwd queue
> 1
> 
> ethertype filter programming error: (Invalid argument)
> 
> testpmd>  ethertype_filter 0 add mac_addr 00:10:E0:3B:3B:50 ethertype 0x0806 
> fwd queue 1
> 
> ethertype filter programming error: (Invalid argument)

[Wu, Jingjing] The error above may be raised because the NIC you are using
does not support an ethertype_filter with a MAC address.
> 
> testpmd>  ethertype_filter 0 add mac_ignr ethertype 0x0806 fwd queue 1
> 
> Bad arguments
> 
> 
[Wu, Jingjing] I think a command like "ethertype_filter 0 add mac_ignr
00:10:E0:3B:3B:50 ethertype 0x0806 fwd queue 1" may work.
> 
> 
> 
> Has anyone been able to get this to work!!!
> 
> All I want to is steer the traffic on port0 to go to some other queue 
> (instead of default 0)
> 
> 
> 
> And I want to filter on the mac_address.so using the ethertype_filter.
> 
> 
> 
> Thanks
> 
> -Navneet
> 
> 
> 
> 
> 
> -Original Message-
> From: Navneet Rao
> Sent: Friday, August 21, 2015 2:55 PM
> To: dev at dpdk.org
> Subject: [dpdk-dev] working example commands for 
> ethertype/flow_director_filter ?
> 
> 
> 
> Hello:
> 
> 
> 
> 
> 
> If anybody has any working example commands for ethertype or 
> flow_director_filter,  can
> you please send it across..
> 
> 
> 
> I am using the testpmd app, and it is constantly reporting "bad-arguments" 
> even for the legal
> commands in the doc!!!
> 
> 
> 
> 
> 
> Thanks
> 
> 
> 
> -Navneet
> 
> 
> 
> 
> 
> 


[dpdk-dev] flow_director_filter error!!

2015-08-25 Thread Wu, Jingjing
Hi, Navneet

I'm sorry, but I have no idea about a NIC called i540. Are you talking about the X540?
If so, I guess you can't classify on the MAC address to steer to different queues
with the ethertype filter, because in the X540 datasheet the ethertype filter is
described as below:
" 7.1.2.3 L2 Ethertype Filters
These filters identify packets by their L2 Ethertype, 802.1Q user priority and 
optionally
assign them to a receive queue."

So the MAC address is not part of the filter's input.

Thanks
Jingjing

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> Sent: Friday, August 21, 2015 2:57 AM
> To: Mcnamara, John; dev at dpdk.org
> Subject: Re: [dpdk-dev] flow_director_filter error!!
> 
> Thanks John.
> 
> I am trying to setup/use the flow-director-filter on the i540.
> 
> -- When I try to setup the flow-director-filter as per the example, I am 
> getting "bad
> arguments"!!!
>  So decided to see if the flush command would work.
> 
> 
> In the interim --- I am using ethertype filter to accomplish the following.
> What I am trying to do is this --
> Use 2 different i540 cards
> Use the igb_uio driver.
> Use the testpmd app.
> Setup 5 different MAC-ADDRESSes on each port. (using the set mac_addr command)
> Setup 5 different RxQs and TxQs on each port.
> And then use the testpmd app to generate traffic..
> 
> I am assuming that the testpmd app will now send and receive traffic using 
> the 5 different
> MAC_ADDRESSes..
> On each port's receive I will now want to classify on the MAC-ADDRESS and 
> steer the traffic to
> different queues.
> 
> Is there an example/reference on how to achieve this?
> 
> Next, I would want to do "classify" on "flexbytes" and send/steer the traffic 
> to different
> queues using flow-director-filter.
> 
> Thanks
> -Navneet
> 
> 
> 
> 
> -Original Message-
> From: Mcnamara, John [mailto:john.mcnamara at intel.com]
> Sent: Wednesday, August 19, 2015 3:39 PM
> To: Navneet Rao; dev at dpdk.org
> Subject: RE: [dpdk-dev] flow_director_filter error!!
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> > Sent: Tuesday, August 18, 2015 4:01 PM
> > To:  HYPERLINK "mailto:dev at dpdk.org" dev at dpdk.org
> > Subject: [dpdk-dev] flow_director_filter error!!
> >
> > After I start the testpmd app, I am flusing the flow_director_filter
> > settings and get the following error -
> >
> >
> >
> > testpmd> flush_flow_director 0
> >
> > PMD: ixgbe_fdir_flush(): Failed to re-initialize FD table.
> >
> > flow director table flushing error: (Too many open files in system)
> 
> Hi,
> 
> Are you setting a flow director filter before flushing? If so, could you give 
> an example.
> 
> John.
> --
> 


[dpdk-dev] Industry's Tiny and Powerful Wireless LAN Controller - Running On Intel NUC D34010WYKH - Built on DPDK

2015-08-25 Thread Venkateswara Rao Thummala
Hi,

We DO have plans to Open Source the entire Controller, not just the Data
Plane, in the near future.

Regards
Venkat
www.onehopnetworks.com



On 19 August 2015 at 02:02, Thomas F Herbert  wrote:

>
>
> On 8/18/15 12:50 PM, Stephen Hemminger wrote:
>
>> On Tue, 18 Aug 2015 13:33:00 +0530
>> Venkateswara Rao Thummala  wrote:
>>
>> Hi,
>>>
>>> We are happy to announce that, we have Just launched the Industry's Tiny
>>> and Powerful Wireless LAN Controller - running on Intel NUC D34010WYKH.
>>>
>>> - Built on High Performance Virtual Data Plane, which is built using DPDK
>>> - Supports wide range of Third Party Access Points
>>>
>>> Please visit our Website www.onehopnetworks.com for more details.
>>>
>>> Regards
>>> Venkat
>>> Founder & Chief Architect
>>> OneHop Networks Pvt Ltd
>>> [www.onehopnetworks.com]
>>>
>> Congratulations. Do you plan to contribute code to help DPDK advance?
>>
> +1
>
>>
>>
>>
>> --
>> Thomas F Herbert Red Hat
>>
>


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Ananyev, Konstantin
Hi Vlad,

> -Original Message-
> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Thursday, August 20, 2015 10:07 AM
> To: Ananyev, Konstantin; Lu, Wenzhuo
> Cc: dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for 
> all NICs but 82598
> 
> 
> 
> On 08/20/15 12:05, Vlad Zolotarov wrote:
> >
> >
> > On 08/20/15 11:56, Vlad Zolotarov wrote:
> >>
> >>
> >> On 08/20/15 11:41, Ananyev, Konstantin wrote:
> >>> Hi Vlad,
> >>>
>  -Original Message-
>  From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>  Sent: Wednesday, August 19, 2015 11:03 AM
>  To: Ananyev, Konstantin; Lu, Wenzhuo
>  Cc: dev at dpdk.org
>  Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
>  above 1 for all NICs but 82598
> 
> 
> 
>  On 08/19/15 10:43, Ananyev, Konstantin wrote:
> > Hi Vlad,
> > Sorry for delay with review, I am OOO till next week.
> > Meanwhile, few questions/comments from me.
>  Hi, Konstantin, long time no see... ;)
> 
> >> This patch fixes the Tx hang we were constantly hitting with a
> >> seastar-based
> >> application on x540 NIC.
> > Could you help to share with us how to reproduce the tx hang
> > issue,
> >> with using
> > typical DPDK examples?
>  Sorry. I'm not very familiar with the typical DPDK examples to
>  help u
>  here. However this is quite irrelevant since without this this
>  patch
>  ixgbe PMD obviously abuses the HW spec as has been explained
>  above.
> 
>  We saw the issue when u stressed the xmit path with a lot of
>  highly
>  fragmented TCP frames (packets with up to 33 fragments with
>  non-headers
>  fragments as small as 4 bytes) with all offload features enabled.
> > Could you provide us with the pcap file to reproduce the issue?
>  Well, the thing is it takes some time to reproduce it (a few
>  minutes of
>  heavy load) therefore a pcap would be quite large.
> >>> Probably you can upload it to some place, from which we will be able
> >>> to download it?
> >>
> >> I'll see what I can do but no promises...
> >
> > On a second thought pcap file won't help u much since in order to
> > reproduce the issue u have to reproduce exactly the same structure of
> > clusters i give to HW and it's not what u see on wire in a TSO case.
> 
> And not only in a TSO case... ;)

I understand that, but my thought was that you could add some sort of TX
callback for rte_eth_tx_burst() into your code that would write the packets
into a pcap file, and then re-run your hang scenario.
I know that it means extra work for you - but I think it would be very helpful
if we were able to reproduce your hang scenario:
- if the HW guys confirm that setting the RS bit for every EOP packet is not
  really required, then we probably have to look at what else can cause it;
- it might be added to our validation cycle, to prevent hitting a similar
  problem in the future.
Thanks
Konstantin
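[Illustrative sketch, not part of the original mail] A capture hook of the kind suggested here could be registered with rte_eth_add_tx_callback() (this requires the RTE_ETHDEV_RXTX_CALLBACKS build option); the sketch below flattens each mbuf chain and appends it to a pcap file via libpcap. The names and the single static buffer are hypothetical, debug-only choices:

#include <string.h>
#include <sys/time.h>
#include <pcap/pcap.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

static pcap_dumper_t *dumper; /* opened once, see the note at the bottom */

static uint16_t
tx_capture_cb(uint8_t port __rte_unused, uint16_t queue __rte_unused,
              struct rte_mbuf *pkts[], uint16_t nb_pkts,
              void *user __rte_unused)
{
	static uint8_t buf[65535]; /* single-queue debug use only */
	uint16_t i;

	for (i = 0; i < nb_pkts; i++) {
		struct pcap_pkthdr hdr;
		struct rte_mbuf *m;
		uint32_t off = 0;

		/* Flatten the segment chain into one contiguous buffer. */
		for (m = pkts[i]; m != NULL; m = m->next) {
			if (off + m->data_len > sizeof(buf))
				break;
			memcpy(buf + off, rte_pktmbuf_mtod(m, void *),
			       m->data_len);
			off += m->data_len;
		}
		gettimeofday(&hdr.ts, NULL);
		hdr.caplen = off;
		hdr.len = off;
		pcap_dump((u_char *)dumper, &hdr, buf);
	}
	return nb_pkts; /* let all packets go out unchanged */
}

/* Registration, e.g. right after the TX queues are set up:
 *   dumper = pcap_dump_open(pcap_open_dead(DLT_EN10MB, 65535), "tx.pcap");
 *   rte_eth_add_tx_callback(port_id, queue_id, tx_capture_cb, NULL);
 */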

> 
> >
> >>
> >>> Or might be you have some sort of scapy script to generate it?
> >>> I suppose we'll need something to reproduce the issue and verify the
> >>> fix.
> >>
> >> Since the original code abuses the HW spec u don't have to... ;)
> >>
> >>>
> > My concern with you approach is that it would affect TX performance.
>  It certainly will ;) But it seem inevitable. See below.
> 
> > Right now, for simple TX PMD usually reads only
> > (nb_tx_desc/tx_rs_thresh) TXDs,
> > While with your patch (if I understand it correctly) it has to
> > read all TXDs in the HW TX ring.
>  If by "simple" u refer an always single fragment per Tx packet -
>  then u
>  are absolutely correct.
> 
>  My initial patch was to only set RS on every EOP descriptor without
>  changing the rs_thresh value and this patch worked.
>  However HW spec doesn't ensure in a general case that packets are
>  always
>  handled/completion write-back completes in the same order the packets
>  are placed on the ring (see "Tx arbitration schemes" chapter in 82599
>  spec for instance). Therefore AFAIU one should not assume that if
>  packet[x+1] DD bit is set then packet[x] is completed too.
> >>>  From my understanding, TX arbitration controls the order in which
> >>> TXDs from
> >>> different queues are fetched/processed.
> >>> But descriptors from the same TX queue are processed in FIFO order.
> >>> So, I think that  - yes, if TXD[x+1] DD bit is set, then TXD[x] is
> >>> completed too,
> >>> and setting RS on every EOP TXD should be enough.
> >>
> >> Ok. I'll rework the patch under this assumption then.
> >>
> >>>
>  That's why I changed the patch to be as u see it now. However if I
>  miss
>  something here and your HW people ensure the in-order completion
>  this of
>  course

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Avi Kivity
On 08/25/2015 08:33 PM, Ananyev, Konstantin wrote:
> Hi Vlad,
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Thursday, August 20, 2015 10:07 AM
>> To: Ananyev, Konstantin; Lu, Wenzhuo
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 
>> for all NICs but 82598
>>
>>
>>
>> On 08/20/15 12:05, Vlad Zolotarov wrote:
>>>
>>> On 08/20/15 11:56, Vlad Zolotarov wrote:

 On 08/20/15 11:41, Ananyev, Konstantin wrote:
> Hi Vlad,
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Wednesday, August 19, 2015 11:03 AM
>> To: Ananyev, Konstantin; Lu, Wenzhuo
>> Cc: dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
>> above 1 for all NICs but 82598
>>
>>
>>
>> On 08/19/15 10:43, Ananyev, Konstantin wrote:
>>> Hi Vlad,
>>> Sorry for delay with review, I am OOO till next week.
>>> Meanwhile, few questions/comments from me.
>> Hi, Konstantin, long time no see... ;)
>>
 This patch fixes the Tx hang we were constantly hitting with a
 seastar-based
 application on x540 NIC.
>>> Could you help to share with us how to reproduce the tx hang
>>> issue,
 with using
>>> typical DPDK examples?
>> Sorry. I'm not very familiar with the typical DPDK examples to
>> help u
>> here. However this is quite irrelevant since without this this
>> patch
>> ixgbe PMD obviously abuses the HW spec as has been explained
>> above.
>>
>> We saw the issue when u stressed the xmit path with a lot of
>> highly
>> fragmented TCP frames (packets with up to 33 fragments with
>> non-headers
>> fragments as small as 4 bytes) with all offload features enabled.
>>> Could you provide us with the pcap file to reproduce the issue?
>> Well, the thing is it takes some time to reproduce it (a few
>> minutes of
>> heavy load) therefore a pcap would be quite large.
> Probably you can upload it to some place, from which we will be able
> to download it?
 I'll see what I can do but no promises...
>>> On a second thought pcap file won't help u much since in order to
>>> reproduce the issue u have to reproduce exactly the same structure of
>>> clusters i give to HW and it's not what u see on wire in a TSO case.
>> And not only in a TSO case... ;)
> I understand that, but my thought was you can add some sort of TX callback 
> for the rte_eth_tx_burst()
> into your code that would write the packet into pcap file and then re-run 
> your hang scenario.
> I know that it means extra work for you - but I think it would be very 
> helpful if we would be able to reproduce your hang scenario:
> - if HW guys would confirm that setting RS bit for every EOP packet is not 
> really required,
>then we probably have to look at what else can cause it.
> - it might be added to our validation cycle, to prevent hitting similar 
> problem in future.
> Thanks
> Konstantin
>


I think if you send packets with random fragment chains up to 32 mbufs 
you might see this.  TSO was not required to trigger this problem.
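[Illustrative sketch, not part of the original mail] One way to provoke this from a test application is to hand-build heavily fragmented chains before calling rte_eth_tx_burst(); a sketch with hypothetical sizes (it assumes 'hdr' fits in the first mbuf's tailroom):

#include <string.h>
#include <rte_mbuf.h>

/* Hypothetical: build one packet split into 'nb_segs' segments -- the first
 * carrying the prebuilt headers, the rest carrying 4-byte payload chunks,
 * similar to the traffic pattern described in this thread. */
static struct rte_mbuf *
build_fragmented_pkt(struct rte_mempool *mp, const uint8_t *hdr,
                     uint16_t hdr_len, uint16_t nb_segs)
{
	struct rte_mbuf *head = rte_pktmbuf_alloc(mp);
	struct rte_mbuf *prev = head;
	uint16_t i;

	if (head == NULL)
		return NULL;
	memcpy(rte_pktmbuf_append(head, hdr_len), hdr, hdr_len);

	for (i = 1; i < nb_segs; i++) {
		struct rte_mbuf *seg = rte_pktmbuf_alloc(mp);

		if (seg == NULL) {
			rte_pktmbuf_free(head);
			return NULL;
		}
		memset(rte_pktmbuf_append(seg, 4), 0xab, 4);
		prev->next = seg;
		prev = seg;
		head->nb_segs++;
		head->pkt_len += 4;
	}
	return head;	/* pass an array of such packets to rte_eth_tx_burst() */
}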



[dpdk-dev] [PATCH v2] Change rte_eal_vdev_init to update port_id

2015-08-25 Thread Ravi Kerur
Hi Thomas, David

Let us know how you want us to fix this. To make rte_eal_vdev_init and
rte_eal_pci_probe_one return the allocated port_id, we had two approaches
mentioned in the earlier discussion. In addition to those, we have another
approach with changes isolated to the rte_ether component. I am attaching
preliminary diffs with this email. Please let us know your input, since
it involves the EAL component.

Thanks,
Ravi
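[Illustrative sketch, not part of the attached diffs] The rte_ether-side helper that Approach 2 relies on can be a simple name lookup over the ethdev array; a sketch, assuming port indices match positions in rte_eth_devices[] as they do for attached devices in this release:

#include <errno.h>
#include <string.h>
#include <rte_ethdev.h>

/* Sketch: map a device name to the port_id that rte_eth_dev_allocate()
 * assigned to it. */
int
rte_eth_dev_get_port_by_name(const char *name, uint8_t *port_id)
{
	uint8_t i;

	if (name == NULL || port_id == NULL)
		return -EINVAL;

	for (i = 0; i < rte_eth_dev_count(); i++) {
		if (strcmp(name, rte_eth_devices[i].data->name) == 0) {
			*port_id = i;
			return 0;
		}
	}
	return -ENODEV;
}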


On Thu, Aug 20, 2015 at 8:33 PM, Tetsuya Mukawa  wrote:

> On 2015/08/21 4:16, Ravi Kerur wrote:
> >
> > >  /**
> > >   * Uninitalize a driver specified by name.
> > > @@ -125,6 +127,38 @@ int rte_eal_vdev_init(const char *name,
> > const char *args);
> > >   */
> > >  int rte_eal_vdev_uninit(const char *name);
> > >
> > > +/**
> > > + * Given name, return port_id associated with the device.
> > > + *
> > > + * @param name
> > > + *   Name associated with device.
> > > + * @param port_id
> > > + *   The port identifier of the device.
> > > + *
> > > + * @return
> > > + *   - 0: Success.
> > > + *   - -EINVAL: NULL string (name)
> > > + *   - -ENODEV failure
> >
> > Please define above in 'rte_ethdev.h.'
> >
> >
> > Hi Tetsuya,
> >
> > I would like to take a step back and explain why function declarations
> > are in rte_dev.h and not in rte_ethdev.h
> >
> > Approach 1:
> > Initially I thought of modifying driver init routine to return/update
> > port_id as the init routine is the place port_id gets allocated and it
> > would have been clean approach. However, it required changes to all
> > PMD_VDEV driver init routine to modify function signature for the
> > changes which I thought may be an overkill.
> >
> > Approach 2:
> > Instead I chose to define 2 functions in librte_ether/rte_ethdev.c and
> > make use of it. In this approach new functions are invoked from
> > librte_eal/common/.c to get port_id. If I had new function
> > declarations in rte_ethdev.h and included that file in
> > librte_eal/common/.c files it creates circular dependancy and
> > compilation fails, hence I took hybrid approach of definitions in
> > librte_ether and declarations in librte_eal.
> >
> > Please let me know if there is a better approach to take care of your
> > comments. As it stands declarations cannot be moved to rte_ethdev.h
> > for compilation reasons.
> >
> > Thanks,
> > Ravi
> >
>
> Hi Ravi,
> (Adding David)
>
> I appreciate your description. I understand why you define the functions
> in rte_dev.h.
>
> About Approach2, I don't know a way to implement cleanly.
> I guess if we define the functions in rte_dev.h, the developers who want
> to use the functions will be confused because the functions are
> implemented in ethdev.c, but it is needed to include rte_dev.h.
>
> To avoid such a confusion, following implementation might be worked, but
> I am not sure this cording style is allowed in eal library.
>
> 
> Define the functions in rte_ethdev.h, then fix librte_eal/common/.c
> files like below
>
> ex) lib/librte_eal/common/eal_common_dev.c
> 
> +#include 
>  #include 
>  #include 
>  #include 
>
>  #include "eal_private.h"
>
> +extern int rte_eth_dev_get_port_by_name(const char *name, uint8_t
> *port_id);
> +extern int rte_eth_dev_get_port_by_addr(const struct rte_pci_addr
> *addr, uint8_t *port_id);
> 
>
> In this case, the developer might be able to notice that above usage in
> eal library is some kind of exception. But I guess the DPDK code won't
> be clean if we start having a exception.
> So it might be good to choose Approach1, because apparently it is
> straight forward.
> Anyone won't be confused and complained about coding style.
>
>
> Hi David,
>
> Could you please let us know what you think?
> Do you have a good approach for this?
>
> Thanks,
> Tetsuya
>
>


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Zhang, Helin
Hi Vlad

In addition, I'd like to double-check with you: what is the maximum number of
descriptors that would be used for transmitting a single packet?
The datasheet says that it supports up to 8. I am wondering if more than 8 were
used in your case?
Thank you very much!

Regards,
Helin

From: Zhang, Helin
Sent: Wednesday, August 19, 2015 10:29 AM
To: Vladislav Zolotarov
Cc: Lu, Wenzhuo; dev at dpdk.org
Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for 
all NICs but 82598

Hi Vlad

Thank you very much for the patches! Give me a few more time to double check 
with more guys, and possibly hardware experts.

Regards,
Helin

From: Vladislav Zolotarov [mailto:vl...@cloudius-systems.com]
Sent: Tuesday, August 18, 2015 9:56 PM
To: Lu, Wenzhuo
Cc: dev at dpdk.org; Zhang, Helin
Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for 
all NICs but 82598


On Aug 19, 2015 03:42, "Lu, Wenzhuo"  wrote:
>
> Hi Helin,
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] 
> > On Behalf Of Vlad Zolotarov
> > Sent: Friday, August 14, 2015 1:38 PM
> > To: Zhang, Helin; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 
> > for
> > all NICs but 82598
> >
> >
> >
> > On 08/13/15 23:28, Zhang, Helin wrote:
> > > Hi Vlad
> > >
> > > I don't think the changes are needed. It says in datasheet that the RS
> > > bit should be set on the last descriptor of every packet, ONLY WHEN
> > TXDCTL.WTHRESH equals to ZERO.
> >
> > Of course it's needed! See below.
> > Exactly the same spec a few lines above the place u've just quoted states:
> >
> > "Software should not set the RS bit when TXDCTL.WTHRESH is greater than
> > zero."
> >
> > And since all three (3) ixgbe xmit callbacks are utilizing RS bit notation 
> > ixgbe PMD
> > is actually not supporting any value of WTHRESH different from zero.
> I think Vlad is right. We need to fix this issue. Any suggestion? If not, I'd 
> like to ack this patch.

Pls., note that there is a v2 of this patch on the list. I forgot to patch 
ixgbevf_dev_info_get() in v1.

>
> >
> > >
> > > Regards,
> > > Helin
> > >
> > >> -Original Message-
> > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > >> Sent: Thursday, August 13, 2015 11:07 AM
> > >> To: dev at dpdk.org
> > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi at 
> > >> cloudius-systems.com; Vlad
> > >> Zolotarov
> > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for
> > all
> > >> NICs but 82598
> > >>
> > >> According to 82599 and x540 HW specifications RS bit *must* be set in the
> > last
> > >> descriptor of *every* packet.
> > > There is a condition that if TXDCTL.WTHRESH equal to zero.
> >
> > Right and ixgbe PMD requires this condition to be fulfilled in order to
> > function. See above.
> >
> > >
> > >> This patch fixes the Tx hang we were constantly hitting with a 
> > >> seastar-based
> > >> application on x540 NIC.
> > > Could you help to share with us how to reproduce the tx hang issue, with 
> > > using
> > > typical DPDK examples?
> >
> > Sorry. I'm not very familiar with the typical DPDK examples to help u
> > here. However this is quite irrelevant since without this this patch
> > ixgbe PMD obviously abuses the HW spec as has been explained above.
> >
> > We saw the issue when u stressed the xmit path with a lot of highly
> > fragmented TCP frames (packets with up to 33 fragments with non-headers
> > fragments as small as 4 bytes) with all offload features enabled.
> >
> > Thanks,
> > vlad
> > >
> > >> Signed-off-by: Vlad Zolotarov  > >> cloudius-systems.com>
> > >> ---
> > >>   drivers/net/ixgbe/ixgbe_ethdev.c |  9 +
> > >>   drivers/net/ixgbe/ixgbe_rxtx.c   | 23 ++-
> > >>   2 files changed, 31 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > >> b/drivers/net/ixgbe/ixgbe_ethdev.c
> > >> index b8ee1e9..6714fd9 100644
> > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev,
> > struct
> > >> rte_eth_dev_info *dev_info)
> > >>.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
> > >>ETH_TXQ_FLAGS_NOOFFLOADS,
> > >>};
> > >> +
> > >> +  /*
> > >> +   * According to 82599 and x540 specifications RS bit *must* be set on
> > the
> > >> +   * last descriptor of *every* packet. Therefore we will not allow the
> > >> +   * tx_rs_thresh above 1 for all NICs newer than 82598.
> > >> +   */
> > >> +  if (hw->mac.type > ixgbe_mac_82598EB)
> > >> +  dev_info->default_txconf.tx_rs_thresh = 1;
> > >> +
> > >>dev_info->hash_ke

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Vladislav Zolotarov
On Aug 25, 2015 21:14, "Zhang, Helin"  wrote:
>
> Hi Vlad
>
>
>
> In addition, I?d double check with you what?s the maximum number of
descriptors would be used for a single packet transmitting?
>
> Datasheet said that it supports up to 8. I am wondering if more than 8
were used in your case?

If memory serves me well, the maximum number of data descriptors per single
xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK is always
zero, that gives us 38 segments. We limit them to 33.

>
> Thank you very much!
>
>
>
> Regards,
>
> Helin
>
>
>
> From: Zhang, Helin
> Sent: Wednesday, August 19, 2015 10:29 AM
> To: Vladislav Zolotarov
> Cc: Lu, Wenzhuo; dev at dpdk.org
>
> Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1
for all NICs but 82598
>
>
>
> Hi Vlad
>
>
>
> Thank you very much for the patches! Give me a few more time to double
check with more guys, and possibly hardware experts.
>
>
>
> Regards,
>
> Helin
>
>
>
> From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Tuesday, August 18, 2015 9:56 PM
> To: Lu, Wenzhuo
> Cc: dev at dpdk.org; Zhang, Helin
> Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1
for all NICs but 82598
>
>
>
>
> On Aug 19, 2015 03:42, "Lu, Wenzhuo"  wrote:
> >
> > Hi Helin,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
> > > Sent: Friday, August 14, 2015 1:38 PM
> > > To: Zhang, Helin; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
above 1 for
> > > all NICs but 82598
> > >
> > >
> > >
> > > On 08/13/15 23:28, Zhang, Helin wrote:
> > > > Hi Vlad
> > > >
> > > > I don't think the changes are needed. It says in datasheet that the
RS
> > > > bit should be set on the last descriptor of every packet, ONLY WHEN
> > > TXDCTL.WTHRESH equals to ZERO.
> > >
> > > Of course it's needed! See below.
> > > Exactly the same spec a few lines above the place u've just quoted
states:
> > >
> > > "Software should not set the RS bit when TXDCTL.WTHRESH is greater
than
> > > zero."
> > >
> > > And since all three (3) ixgbe xmit callbacks are utilizing RS bit
notation ixgbe PMD
> > > is actually not supporting any value of WTHRESH different from zero.
> > I think Vlad is right. We need to fix this issue. Any suggestion? If
not, I'd like to ack this patch.
>
> Pls., note that there is a v2 of this patch on the list. I forgot to
patch ixgbevf_dev_info_get() in v1.
>
> >
> > >
> > > >
> > > > Regards,
> > > > Helin
> > > >
> > > >> -Original Message-
> > > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > > >> Sent: Thursday, August 13, 2015 11:07 AM
> > > >> To: dev at dpdk.org
> > > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi at cloudius-systems.com;
Vlad
> > > >> Zolotarov
> > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
above 1 for
> > > all
> > > >> NICs but 82598
> > > >>
> > > >> According to 82599 and x540 HW specifications RS bit *must* be set
in the
> > > last
> > > >> descriptor of *every* packet.
> > > > There is a condition that if TXDCTL.WTHRESH equal to zero.
> > >
> > > Right and ixgbe PMD requires this condition to be fulfilled in order
to
> > > function. See above.
> > >
> > > >
> > > >> This patch fixes the Tx hang we were constantly hitting with a
seastar-based
> > > >> application on x540 NIC.
> > > > Could you help to share with us how to reproduce the tx hang issue,
with using
> > > > typical DPDK examples?
> > >
> > > Sorry. I'm not very familiar with the typical DPDK examples to help u
> > > here. However this is quite irrelevant since without this this patch
> > > ixgbe PMD obviously abuses the HW spec as has been explained above.
> > >
> > > We saw the issue when u stressed the xmit path with a lot of highly
> > > fragmented TCP frames (packets with up to 33 fragments with
non-headers
> > > fragments as small as 4 bytes) with all offload features enabled.
> > >
> > > Thanks,
> > > vlad
> > > >
> > > >> Signed-off-by: Vlad Zolotarov 
> > > >> ---
> > > >>   drivers/net/ixgbe/ixgbe_ethdev.c |  9 +
> > > >>   drivers/net/ixgbe/ixgbe_rxtx.c   | 23 ++-
> > > >>   2 files changed, 31 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > >> b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > >> index b8ee1e9..6714fd9 100644
> > > >> --- a/drivers/net/ixgbe/ixgbe_ethdev.c
> > > >> +++ b/drivers/net/ixgbe/ixgbe_ethdev.c
> > > >> @@ -2414,6 +2414,15 @@ ixgbe_dev_info_get(struct rte_eth_dev *dev,
> > > struct
> > > >> rte_eth_dev_info *dev_info)
> > > >>.txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS |
> > > >>ETH_TXQ_FLAGS_NOOFFLOADS,
> > > >>};
> > > >> +
> > > >> +  /*
> > > >> +   * According to 82599 and x540 specifications RS bit *must* be
set on
> > > the
> > > >> +   * last descriptor of *every* packet. Therefore we will not
allow the
> > > >> +   * tx_rs_thr

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Zhang, Helin
Hi Vlad

I think this could possibly be the root cause of your TX hang issue. Please try
to limit the number to 8 or less, and then see whether the issue is still
there.
The driver does not have any check for the number of descriptors to be used for
a single packet; it relies on the users to give correct mbuf chains.

We may need a check for this somewhere. Of course, the point you indicated also
needs to be carefully investigated or fixed.

Regards,
Helin
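[Illustrative sketch, not part of the original mail] The kind of check mentioned above could be as simple as counting the segments of each mbuf chain before it is posted; the limit is left as a parameter, since the correct bound (8 vs. 40-2-WTHRESH) is exactly what this thread is debating:

#include <rte_mbuf.h>

/* Hypothetical guard in the xmit path: refuse chains that would need more
 * data descriptors than the device allows. */
static inline int
check_tx_seg_limit(const struct rte_mbuf *pkt, unsigned int max_segs)
{
	const struct rte_mbuf *m;
	unsigned int n = 0;

	for (m = pkt; m != NULL; m = m->next)
		n++;

	return (n <= max_segs) ? 0 : -1;
}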

From: Vladislav Zolotarov [mailto:vl...@cloudius-systems.com]
Sent: Tuesday, August 25, 2015 11:34 AM
To: Zhang, Helin
Cc: Lu, Wenzhuo; dev at dpdk.org
Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for 
all NICs but 82598


On Aug 25, 2015 21:14, "Zhang, Helin"  wrote:
>
> Hi Vlad
>
>
>
> In addition, I?d double check with you what?s the maximum number of 
> descriptors would be used for a single packet transmitting?
>
> Datasheet said that it supports up to 8. I am wondering if more than 8 were 
> used in your case?

If memory serves me well the maximum number of data descriptors per single xmit 
packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK is always zero it 
gives us 38 segments. We limit them by 33.

>
> Thank you very much!
>
>
>
> Regards,
>
> Helin
>
>
>
> From: Zhang, Helin
> Sent: Wednesday, August 19, 2015 10:29 AM
> To: Vladislav Zolotarov
> Cc: Lu, Wenzhuo; dev at dpdk.org
>
> Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for 
> all NICs but 82598
>
>
>
> Hi Vlad
>
>
>
> Thank you very much for the patches! Give me a few more time to double check 
> with more guys, and possibly hardware experts.
>
>
>
> Regards,
>
> Helin
>
>
>
> From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Tuesday, August 18, 2015 9:56 PM
> To: Lu, Wenzhuo
> Cc: dev at dpdk.org; Zhang, Helin
> Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for 
> all NICs but 82598
>
>
>
>
> On Aug 19, 2015 03:42, "Lu, Wenzhuo"  wrote:
> >
> > Hi Helin,
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Vlad Zolotarov
> > > Sent: Friday, August 14, 2015 1:38 PM
> > > To: Zhang, Helin; dev at dpdk.org
> > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 
> > > for
> > > all NICs but 82598
> > >
> > >
> > >
> > > On 08/13/15 23:28, Zhang, Helin wrote:
> > > > Hi Vlad
> > > >
> > > > I don't think the changes are needed. It says in datasheet that the RS
> > > > bit should be set on the last descriptor of every packet, ONLY WHEN
> > > TXDCTL.WTHRESH equals to ZERO.
> > >
> > > Of course it's needed! See below.
> > > Exactly the same spec a few lines above the place u've just quoted states:
> > >
> > > "Software should not set the RS bit when TXDCTL.WTHRESH is greater than
> > > zero."
> > >
> > > And since all three (3) ixgbe xmit callbacks are utilizing RS bit 
> > > notation ixgbe PMD
> > > is actually not supporting any value of WTHRESH different from zero.
> > I think Vlad is right. We need to fix this issue. Any suggestion? If not, 
> > I'd like to ack this patch.
>
> Pls., note that there is a v2 of this patch on the list. I forgot to patch 
> ixgbevf_dev_info_get() in v1.
>
> >
> > >
> > > >
> > > > Regards,
> > > > Helin
> > > >
> > > >> -Original Message-
> > > >> From: Vlad Zolotarov [mailto:vladz at 
> > > >> cloudius-systems.com]
> > > >> Sent: Thursday, August 13, 2015 11:07 AM
> > > >> To: dev at dpdk.org
> > > >> Cc: Zhang, Helin; Ananyev, Konstantin; avi at 
> > > >> cloudius-systems.com; Vlad
> > > >> Zolotarov
> > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 
> > > >> for
> > > all
> > > >> NICs but 82598
> > > >>
> > > >> According to 82599 and x540 HW specifications RS bit *must* be set in 
> > > >> the
> > > last
> > > >> descriptor of *every* packet.
> > > > There is a condition that if TXDCTL.WTHRESH equal to zero.
> > >
> > > Right and ixgbe PMD requires this condition to be fulfilled in order to
> > > function. See above.
> > >
> > > >
> > > >> This patch fixes the Tx hang we were constantly hitting with a 
> > > >> seastar-based
> > > >> application on x540 NIC.
> > > > Could you help to share with us how to reproduce the tx hang issue, 
> > > > with using
> > > > typical DPDK examples?
> > >
> > > Sorry. I'm not very familiar with the typical DPDK examples to help u
> > > here. However this is quite irrelevant since without this this patch
> > > ixgbe PMD obviously abuses the HW spec as has been explained above.
> > >
> > > We saw the issue when u stressed the xmit path with a lot of highly
> > > fragmented TCP frame

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Vlad Zolotarov


On 08/25/15 21:43, Zhang, Helin wrote:
>
> Hi Vlad
>
> I think this could possibly be the root cause of your TX hang issue. 
> Please try to limit the number to 8 or less, and then see if the issue 
> will still be there or not?
>

Helin, the issue has been seen on x540 devices. Pls., see chapter 
7.2.1.1 of the x540 spec:

A packet (or multiple packets in transmit segmentation) can span any number of
buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 (see
Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for WTHRESH
details). For best performance it is recommended to minimize the number of 
buffers
as possible.

Could u, pls., clarify why u think that the maximum number of data 
buffers is limited to 8?

thanks,
vlad
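
Since the subject patch is about forcing tx_rs_thresh down to 1 on 82599/x540,
a minimal sketch of the corresponding queue setup follows (port, queue and
ring-size values are arbitrary examples, not taken from this thread):

/*
 * Hedged illustration: tx_rs_thresh = 1 requests the RS bit on every packet's
 * last descriptor, with tx_wthresh kept at 0 as the spec text quoted above
 * requires.
 */
#include <rte_ethdev.h>

static int
setup_txq_rs_every_packet(uint8_t port_id, unsigned int socket_id)
{
	struct rte_eth_txconf txconf = {
		.tx_thresh = {
			.pthresh = 32,
			.hthresh = 0,
			.wthresh = 0,	/* RS-per-packet usage assumes WTHRESH == 0 */
		},
		.tx_rs_thresh = 1,	/* report status for every packet */
		.tx_free_thresh = 32,
	};

	return rte_eth_tx_queue_setup(port_id, 0 /* queue */, 512 /* descriptors */,
				      socket_id, &txconf);
}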

> It does not have any check for the number of descriptors to be used 
> for a single packet, and it relies on the users to give correct mbuf 
> chains.
>
> We may need a check of this somewhere. Of cause the point you 
> indicated we also need to carefully investigate or fix.
>
> Regards,
>
> Helin
>
> *From:*Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> *Sent:* Tuesday, August 25, 2015 11:34 AM
> *To:* Zhang, Helin
> *Cc:* Lu, Wenzhuo; dev at dpdk.org
> *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh 
> above 1 for all NICs but 82598
>
>
> On Aug 25, 2015 21:14, "Zhang, Helin" wrote:
> >
> > Hi Vlad
> >
> >
> >
> > In addition, I'd double check with you what's the maximum number of 
> descriptors would be used for a single packet transmitting?
> >
> > Datasheet said that it supports up to 8. I am wondering if more than 
> 8 were used in your case?
>
> If memory serves me well the maximum number of data descriptors per 
> single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK 
> is always zero it gives us 38 segments. We limit them by 33.
>
> >
> > Thank you very much!
> >
> >
> >
> > Regards,
> >
> > Helin
> >
> >
> >
> > From: Zhang, Helin
> > Sent: Wednesday, August 19, 2015 10:29 AM
> > To: Vladislav Zolotarov
> > Cc: Lu, Wenzhuo; dev at dpdk.org 
> >
> > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh 
> above 1 for all NICs but 82598
> >
> >
> >
> > Hi Vlad
> >
> >
> >
> > Thank you very much for the patches! Give me a few more time to 
> double check with more guys, and possibly hardware experts.
> >
> >
> >
> > Regards,
> >
> > Helin
> >
> >
> >
> > From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com 
> ]
> > Sent: Tuesday, August 18, 2015 9:56 PM
> > To: Lu, Wenzhuo
> > Cc: dev at dpdk.org ; Zhang, Helin
> > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh 
> above 1 for all NICs but 82598
> >
> >
> >
> >
> > On Aug 19, 2015 03:42, "Lu, Wenzhuo" wrote:
> > >
> > > Hi Helin,
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org 
> ] On Behalf Of Vlad Zolotarov
> > > > Sent: Friday, August 14, 2015 1:38 PM
> > > > To: Zhang, Helin; dev at dpdk.org 
> > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid 
> tx_rs_thresh above 1 for
> > > > all NICs but 82598
> > > >
> > > >
> > > >
> > > > On 08/13/15 23:28, Zhang, Helin wrote:
> > > > > Hi Vlad
> > > > >
> > > > > I don't think the changes are needed. It says in datasheet 
> that the RS
> > > > > bit should be set on the last descriptor of every packet, ONLY 
> WHEN
> > > > TXDCTL.WTHRESH equals to ZERO.
> > > >
> > > > Of course it's needed! See below.
> > > > Exactly the same spec a few lines above the place u've just 
> quoted states:
> > > >
> > > > "Software should not set the RS bit when TXDCTL.WTHRESH is 
> greater than
> > > > zero."
> > > >
> > > > And since all three (3) ixgbe xmit callbacks are utilizing RS 
> bit notation ixgbe PMD
> > > > is actually not supporting any value of WTHRESH different from zero.
> > > I think Vlad is right. We need to fix this issue. Any suggestion? 
> If not, I'd like to ack this patch.
> >
> > Pls., note that there is a v2 of this patch on the list. I forgot to 
> patch ixgbevf_dev_info_get() in v1.
> >
> > >
> > > >
> > > > >
> > > > > Regards,
> > > > > Helin
> > > > >
> > > > >> -Original Message-
> > > > >> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com 
> ]
> > > > >> Sent: Thursday, August 13, 2015 11:07 AM
> > > > >> To: dev at dpdk.org 
> > > > >> Cc: Zhang, Helin; Ananyev, Konstantin; 
> avi at cloudius-systems.com ; Vlad
> > > > >> Zolotarov
> > > > >> Subject: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh 
> above 1 for
> > > > all
> > > > >> NICs but 82598
> > > > >>
> > > > >> According to 82599 and x540 HW specifications RS bit *must* 
> be set in the
> > > > l

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Zhang, Helin


> -Original Message-
> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> Sent: Tuesday, August 25, 2015 11:53 AM
> To: Zhang, Helin
> Cc: Lu, Wenzhuo; dev at dpdk.org
> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for
> all NICs but 82598
> 
> 
> 
> On 08/25/15 21:43, Zhang, Helin wrote:
> >
> > Hi Vlad
> >
> > I think this could possibly be the root cause of your TX hang issue.
> > Please try to limit the number to 8 or less, and then see if the issue
> > will still be there or not?
> >
> 
> Helin, the issue has been seen on x540 devices. Pls., see a chapter
> 7.2.1.1 of x540 devices spec:
> 
> A packet (or multiple packets in transmit segmentation) can span any number of
> buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 (see
> Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for WTHRESH
> details). For best performance it is recommended to minimize the number of
> buffers as possible.
> 
> Could u, pls., clarify why do u think that the maximum number of data buffers 
> is
> limited by 8?
OK, the i40e hardware limit is 8, so I'd assume x540 could have a similar one. Yes, in 
your case the limit could be around 38, right?
Could you help to make sure that no packet to be transmitted uses more than
38 descriptors?
I heard that there is a similar hang issue on X710 if using more than 8 
descriptors for
a single packet. I am wondering if the issue is similar on x540.

Regards,
Helin

> 
> thanks,
> vlad
> 
> > It does not have any check for the number of descriptors to be used
> > for a single packet, and it relies on the users to give correct mbuf
> > chains.
> >
> > We may need a check of this somewhere. Of cause the point you
> > indicated we also need to carefully investigate or fix.
> >
> > Regards,
> >
> > Helin
> >
> > *From:*Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> > *Sent:* Tuesday, August 25, 2015 11:34 AM
> > *To:* Zhang, Helin
> > *Cc:* Lu, Wenzhuo; dev at dpdk.org
> > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > above 1 for all NICs but 82598
> >
> >
> > On Aug 25, 2015 21:14, "Zhang, Helin" wrote:
> > >
> > > Hi Vlad
> > >
> > >
> > >
> > > In addition, I'd double check with you what's the maximum number of
> > descriptors would be used for a single packet transmitting?
> > >
> > > Datasheet said that it supports up to 8. I am wondering if more than
> > 8 were used in your case?
> >
> > If memory serves me well the maximum number of data descriptors per
> > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK
> > is always zero it gives us 38 segments. We limit them by 33.
> >
> > >
> > > Thank you very much!
> > >
> > >
> > >
> > > Regards,
> > >
> > > Helin
> > >
> > >
> > >
> > > From: Zhang, Helin
> > > Sent: Wednesday, August 19, 2015 10:29 AM
> > > To: Vladislav Zolotarov
> > > Cc: Lu, Wenzhuo; dev at dpdk.org 
> > >
> > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > above 1 for all NICs but 82598
> > >
> > >
> > >
> > > Hi Vlad
> > >
> > >
> > >
> > > Thank you very much for the patches! Give me a few more time to
> > double check with more guys, and possibly hardware experts.
> > >
> > >
> > >
> > > Regards,
> > >
> > > Helin
> > >
> > >
> > >
> > > From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com
> > ]
> > > Sent: Tuesday, August 18, 2015 9:56 PM
> > > To: Lu, Wenzhuo
> > > Cc: dev at dpdk.org ; Zhang, Helin
> > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > above 1 for all NICs but 82598
> > >
> > >
> > >
> > >
> > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" wrote:
> > > >
> > > > Hi Helin,
> > > >
> > > > > -Original Message-
> > > > > From: dev [mailto:dev-bounces at dpdk.org
> > ] On Behalf Of Vlad Zolotarov
> > > > > Sent: Friday, August 14, 2015 1:38 PM
> > > > > To: Zhang, Helin; dev at dpdk.org 
> > > > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid
> > tx_rs_thresh above 1 for
> > > > > all NICs but 82598
> > > > >
> > > > >
> > > > >
> > > > > On 08/13/15 23:28, Zhang, Helin wrote:
> > > > > > Hi Vlad
> > > > > >
> > > > > > I don't think the changes are needed. It says in datasheet
> > that the RS
> > > > > > bit should be set on the last descriptor of every packet, ONLY
> > WHEN
> > > > > TXDCTL.WTHRESH equals to ZERO.
> > > > >
> > > > > Of course it's needed! See below.
> > > > > Exactly the same spec a few lines above the place u've just
> > quoted states:
> > > > >
> > > > > "Software should not set the RS bit when TXDCTL.WTHRESH is
> > greater than
> > > > > zero."
> > > > >
> > > > > And since all three (3) ixgbe xmit callbacks are utilizing RS
> > bit notation ixgbe PMD
> > > > > is actually not supporting any value of WTHRESH diff

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Avi Kivity
On 08/25/2015 10:16 PM, Zhang, Helin wrote:
>
>> -Original Message-
>> From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
>> Sent: Tuesday, August 25, 2015 11:53 AM
>> To: Zhang, Helin
>> Cc: Lu, Wenzhuo; dev at dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for
>> all NICs but 82598
>>
>>
>>
>> On 08/25/15 21:43, Zhang, Helin wrote:
>>> Hi Vlad
>>>
>>> I think this could possibly be the root cause of your TX hang issue.
>>> Please try to limit the number to 8 or less, and then see if the issue
>>> will still be there or not?
>>>
>> Helin, the issue has been seen on x540 devices. Pls., see a chapter
>> 7.2.1.1 of x540 devices spec:
>>
>> A packet (or multiple packets in transmit segmentation) can span any number 
>> of
>> buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 
>> (see
>> Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for WTHRESH
>> details). For best performance it is recommended to minimize the number of
>> buffers as possible.
>>
>> Could u, pls., clarify why do u think that the maximum number of data 
>> buffers is
>> limited by 8?
> OK, i40e hardware is 8, so I'd assume x540 could have a similar one. Yes, in 
> your case,
> the limit could be around 38, right?
> Could you help to make sure there is no packet to be transmitted uses more 
> than
> 38 descriptors?
> I heard that there is a similar hang issue on X710 if using more than 8 
> descriptors for
> a single packet. I am wondering if the issue is similar on x540.
>
>

I believe that the ixgbe Linux driver does not limit packets to 8 
fragments, so apparently the hardware is capable.


[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Vladislav Zolotarov
On Aug 25, 2015 22:16, "Zhang, Helin"  wrote:
>
>
>
> > -Original Message-
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > Sent: Tuesday, August 25, 2015 11:53 AM
> > To: Zhang, Helin
> > Cc: Lu, Wenzhuo; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above
1 for
> > all NICs but 82598
> >
> >
> >
> > On 08/25/15 21:43, Zhang, Helin wrote:
> > >
> > > Hi Vlad
> > >
> > > I think this could possibly be the root cause of your TX hang issue.
> > > Please try to limit the number to 8 or less, and then see if the issue
> > > will still be there or not?
> > >
> >
> > Helin, the issue has been seen on x540 devices. Pls., see a chapter
> > 7.2.1.1 of x540 devices spec:
> >
> > A packet (or multiple packets in transmit segmentation) can span any
number of
> > buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus
2 (see
> > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for
WTHRESH
> > details). For best performance it is recommended to minimize the number
of
> > buffers as possible.
> >
> > Could u, pls., clarify why do u think that the maximum number of data
buffers is
> > limited by 8?
> OK, i40e hardware is 8

For i40e it's a bit more complicated than just "not more than 8" - it's not
more than 8 for a non-TSO packet and not more than 8 for each MSS, including
header buffers, for TSO. But this thread is not about i40e, so this doesn't
seem to be relevant anyway.
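
For readers who want to see what that rule amounts to in code, here is a
hedged, simplified sketch; the helper name and the conservative 7-buffer
windowing heuristic are illustrative assumptions, not the i40e PMD's actual
check:

/*
 * Non-TSO: the chain must not use more than 8 data descriptors.
 * TSO: conservatively require that every 7 consecutive data buffers carry at
 * least one MSS of payload, so that MSS bytes starting anywhere in the chain
 * touch at most 8 buffers (header-buffer accounting is ignored here).
 */
#include <stdbool.h>
#include <stdint.h>
#include <rte_mbuf.h>

#define MAX_DESC_PER_SEGMENT 8

static bool
xmit_segment_rule_ok(const struct rte_mbuf *m, bool is_tso, uint16_t mss)
{
	const struct rte_mbuf *seg;
	uint32_t win[MAX_DESC_PER_SEGMENT - 1] = {0};
	uint32_t sum = 0;
	unsigned int i = 0, n = 0;

	if (!is_tso || m->nb_segs <= MAX_DESC_PER_SEGMENT)
		return m->nb_segs <= MAX_DESC_PER_SEGMENT;

	for (seg = m; seg != NULL; seg = seg->next) {
		if (n == MAX_DESC_PER_SEGMENT - 1)
			sum -= win[i];		/* drop the oldest buffer length */
		win[i] = seg->data_len;
		sum += win[i];
		i = (i + 1) % (MAX_DESC_PER_SEGMENT - 1);
		if (n < MAX_DESC_PER_SEGMENT - 1)
			n++;
		if (n == MAX_DESC_PER_SEGMENT - 1 && sum < mss)
			return false;
	}
	return true;
}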

, so I'd assume x540 could have a similar one.

x540 spec assumes otherwise... ?

Yes, in your case,
> the limit could be around 38, right?

If by "around 38" u mean "exactly 38" then u are absolutely right... ?

> Could you help to make sure there is no packet to be transmitted uses
more than
> 38 descriptors?

Just like I've already mentioned, we limit the cluster to at most 33 data
segments. Therefore we are good here...

> I heard that there is a similar hang issue on X710 if using more than 8
descriptors for
> a single packet. I am wondering if the issue is similar on x540.

What's x710? If that's the xl710 40G NICs (i40e driver), then it has its own
specs with its own HW limitations I've mentioned above. It has nothing to
do with this thread, which is all about 10G NICs managed by the ixgbe driver.

There is a different thread where I've raised the 40G NICs' xmit issues.
See the "i40e xmit path HW limitation" thread.

>
> Regards,
> Helin
>
> >
> > thanks,
> > vlad
> >
> > > It does not have any check for the number of descriptors to be used
> > > for a single packet, and it relies on the users to give correct mbuf
> > > chains.
> > >
> > > We may need a check of this somewhere. Of cause the point you
> > > indicated we also need to carefully investigate or fix.
> > >
> > > Regards,
> > >
> > > Helin
> > >
> > > *From:*Vladislav Zolotarov [mailto:vladz at cloudius-systems.com]
> > > *Sent:* Tuesday, August 25, 2015 11:34 AM
> > > *To:* Zhang, Helin
> > > *Cc:* Lu, Wenzhuo; dev at dpdk.org
> > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > >
> > >
> > > On Aug 25, 2015 21:14, "Zhang, Helin" wrote:
> > > >
> > > > Hi Vlad
> > > >
> > > >
> > > >
> > > > In addition, I'd double check with you what's the maximum number of
> > > descriptors would be used for a single packet transmitting?
> > > >
> > > > Datasheet said that it supports up to 8. I am wondering if more than
> > > 8 were used in your case?
> > >
> > > If memory serves me well the maximum number of data descriptors per
> > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK
> > > is always zero it gives us 38 segments. We limit them by 33.
> > >
> > > >
> > > > Thank you very much!
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > >
> > > >
> > > > From: Zhang, Helin
> > > > Sent: Wednesday, August 19, 2015 10:29 AM
> > > > To: Vladislav Zolotarov
> > > > Cc: Lu, Wenzhuo; dev at dpdk.org 
> > > >
> > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > > >
> > > >
> > > >
> > > > Hi Vlad
> > > >
> > > >
> > > >
> > > > Thank you very much for the patches! Give me a few more time to
> > > double check with more guys, and possibly hardware experts.
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > >
> > > >
> > > > From: Vladislav Zolotarov [mailto:vladz at cloudius-systems.com
> > > ]
> > > > Sent: Tuesday, August 18, 2015 9:56 PM
> > > > To: Lu, Wenzhuo
> > > > Cc: dev at dpdk.org ; Zhang, Helin
> > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > > >
> > > >
> > > >
> > > >
> > > > On Aug 19, 2015 03:42, "Lu, Wenzhuo" wrote:
> > > > >
> > > > > Hi Helin,
> > > > >
> > > > > > -Original

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Vlad Zolotarov


On 08/25/15 22:30, Vladislav Zolotarov wrote:
>
>
> On Aug 25, 2015 22:16, "Zhang, Helin" wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com 
> ]
> > > Sent: Tuesday, August 25, 2015 11:53 AM
> > > To: Zhang, Helin
> > > Cc: Lu, Wenzhuo; dev at dpdk.org 
> > > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh 
> above 1 for
> > > all NICs but 82598
> > >
> > >
> > >
> > > On 08/25/15 21:43, Zhang, Helin wrote:
> > > >
> > > > Hi Vlad
> > > >
> > > > I think this could possibly be the root cause of your TX hang issue.
> > > > Please try to limit the number to 8 or less, and then see if the 
> issue
> > > > will still be there or not?
> > > >
> > >
> > > Helin, the issue has been seen on x540 devices. Pls., see a chapter
> > > 7.2.1.1 of x540 devices spec:
> > >
> > > A packet (or multiple packets in transmit segmentation) can span 
> any number of
> > > buffers (and their descriptors) up to a limit of 40 minus WTHRESH 
> minus 2 (see
> > > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 
> for WTHRESH
> > > details). For best performance it is recommended to minimize the 
> number of
> > > buffers as possible.
> > >
> > > Could u, pls., clarify why do u think that the maximum number of 
> data buffers is
> > > limited by 8?
> > OK, i40e hardware is 8
>
> For i40 it's a bit more complicated than just "not more than 8" - it's 
> not more than 8 for a non-TSO packet and not more than 8 for each MSS 
> including headers buffers for TSO. But this thread is not about i40e 
> so this doesn't seem to be relevant anyway.
>
> , so I'd assume x540 could have a similar one.
>
> x540 spec assumes otherwise... ?
>
> Yes, in your case,
> > the limit could be around 38, right?
>
> If by "around 38" u mean "exactly 38" then u are absolutely right... ?
>
> > Could you help to make sure there is no packet to be transmitted 
> uses more than
> > 38 descriptors?
>
> Just like i've already mentioned, we limit the cluster by at most 33 
> data segments. Therefore we are good here...
>
> > I heard that there is a similar hang issue on X710 if using more 
> than 8 descriptors for
> > a single packet. I am wondering if the issue is similar on x540.
>
> What's x710? If that's xl710 40G nics (i40e driver),
>

I've found what x710 NICs are - they are another NIC family managed by the 
i40e PMD. Therefore the rest of what I said stands the same... ;)

> then it has its own specs with its own HW limitations i've mentioned 
> above. It has nothing to do with this thread that is all about 10G 
> nics managed by ixgbe driver.
>
> There is a different thread, where i've raised the 40G NICs xmit 
> issues. See "i40e xmit path HW limitation" thread.
>
> >
> > Regards,
> > Helin
> >
> > >
> > > thanks,
> > > vlad
> > >
> > > > It does not have any check for the number of descriptors to be used
> > > > for a single packet, and it relies on the users to give correct mbuf
> > > > chains.
> > > >
> > > > We may need a check of this somewhere. Of cause the point you
> > > > indicated we also need to carefully investigate or fix.
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > > *From:*Vladislav Zolotarov [mailto:vladz at cloudius-systems.com 
> ]
> > > > *Sent:* Tuesday, August 25, 2015 11:34 AM
> > > > *To:* Zhang, Helin
> > > > *Cc:* Lu, Wenzhuo; dev at dpdk.org 
> > > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > > above 1 for all NICs but 82598
> > > >
> > > >
> > > > On Aug 25, 2015 21:14, "Zhang, Helin" wrote:
> > > > >
> > > > > Hi Vlad
> > > > >
> > > > >
> > > > >
> > > > > In addition, I'd double check with you what's the maximum 
> number of
> > > > descriptors would be used for a single packet transmitting?
> > > > >
> > > > > Datasheet said that it supports up to 8. I am wondering if 
> more than
> > > > 8 were used in your case?
> > > >
> > > > If memory serves me well the maximum number of data descriptors per
> > > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in 
> DPDK
> > > > is always zero it gives us 38 segments. We limit them by 33.
> > > >
> > > > >
> > > > > Thank you very much!
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > >
> > > > > Helin
> > > > >
> > > > >
> > > > >
> > > > > From: Zhang, Helin
> > > > > Sent: Wednesday, August 19, 2015 10:29 AM
> > > > > To: Vladislav Zolotarov
> > > > > Cc: Lu, Wenzhuo; dev at dpdk.org  
> >
> > > > >
> > > > > Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > > above 1 for all NICs but 82598
> > > > >
> > > > >
> > > > >
> > > > > Hi Vlad
> > > > >
> > > 

[dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for all NICs but 82598

2015-08-25 Thread Zhang, Helin
Yes, I got the perfect answers. Thank you very much!
I just wanted to make sure the test case was OK with the limit on the maximum 
number of descriptors, as I heard there is a hang issue on other NICs when using 
more descriptors than the hardware allows.
OK. I am still waiting for the answers/confirmation from the x540 hardware 
designers. We all need to agree on your patches to avoid risks.

Regards,
Helin

From: Vladislav Zolotarov [mailto:vl...@cloudius-systems.com]
Sent: Tuesday, August 25, 2015 12:30 PM
To: Zhang, Helin
Cc: Lu, Wenzhuo; dev at dpdk.org
Subject: RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 for 
all NICs but 82598


On Aug 25, 2015 22:16, "Zhang, Helin" mailto:helin.zhang at intel.com>> wrote:
>
>
>
> > -Original Message-
> > From: Vlad Zolotarov [mailto:vladz at cloudius-systems.com]
> > Sent: Tuesday, August 25, 2015 11:53 AM
> > To: Zhang, Helin
> > Cc: Lu, Wenzhuo; dev at dpdk.org
> > Subject: Re: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh above 1 
> > for
> > all NICs but 82598
> >
> >
> >
> > On 08/25/15 21:43, Zhang, Helin wrote:
> > >
> > > Hi Vlad
> > >
> > > I think this could possibly be the root cause of your TX hang issue.
> > > Please try to limit the number to 8 or less, and then see if the issue
> > > will still be there or not?
> > >
> >
> > Helin, the issue has been seen on x540 devices. Pls., see a chapter
> > 7.2.1.1 of x540 devices spec:
> >
> > A packet (or multiple packets in transmit segmentation) can span any number 
> > of
> > buffers (and their descriptors) up to a limit of 40 minus WTHRESH minus 2 
> > (see
> > Section 7.2.3.3 for Tx Ring details and section Section 7.2.3.5.1 for 
> > WTHRESH
> > details). For best performance it is recommended to minimize the number of
> > buffers as possible.
> >
> > Could u, pls., clarify why do u think that the maximum number of data 
> > buffers is
> > limited by 8?
> OK, i40e hardware is 8

For i40 it's a bit more complicated than just "not more than 8" - it's not more 
than 8 for a non-TSO packet and not more than 8 for each MSS including headers 
buffers for TSO. But this thread is not about i40e so this doesn't seem to be 
relevant anyway.

, so I'd assume x540 could have a similar one.

x540 spec assumes otherwise... ?

Yes, in your case,
> the limit could be around 38, right?

If by "around 38" u mean "exactly 38" then u are absolutely right... ?

> Could you help to make sure there is no packet to be transmitted uses more 
> than
> 38 descriptors?

Just like i've already mentioned, we limit the cluster by at most 33 data 
segments. Therefore we are good here...

> I heard that there is a similar hang issue on X710 if using more than 8 
> descriptors for
> a single packet. I am wondering if the issue is similar on x540.

What's x710? If that's xl710 40G nics (i40e driver), then it has its own specs 
with its own HW limitations i've mentioned above. It has nothing to do with 
this thread that is all about 10G nics managed by ixgbe driver.

There is a different thread, where i've raised the 40G NICs xmit issues. See 
"i40e xmit path HW limitation" thread.

>
> Regards,
> Helin
>
> >
> > thanks,
> > vlad
> >
> > > It does not have any check for the number of descriptors to be used
> > > for a single packet, and it relies on the users to give correct mbuf
> > > chains.
> > >
> > > We may need a check of this somewhere. Of cause the point you
> > > indicated we also need to carefully investigate or fix.
> > >
> > > Regards,
> > >
> > > Helin
> > >
> > > *From:*Vladislav Zolotarov [mailto:vladz at 
> > > cloudius-systems.com]
> > > *Sent:* Tuesday, August 25, 2015 11:34 AM
> > > *To:* Zhang, Helin
> > > *Cc:* Lu, Wenzhuo; dev at dpdk.org
> > > *Subject:* RE: [dpdk-dev] [PATCH v1] ixgbe_pmd: forbid tx_rs_thresh
> > > above 1 for all NICs but 82598
> > >
> > >
> > > On Aug 25, 2015 21:14, "Zhang, Helin" wrote:
> > > >
> > > > Hi Vlad
> > > >
> > > >
> > > >
> > > > In addition, I'd double check with you what's the maximum number of
> > > descriptors would be used for a single packet transmitting?
> > > >
> > > > Datasheet said that it supports up to 8. I am wondering if more than
> > > 8 were used in your case?
> > >
> > > If memory serves me well the maximum number of data descriptors per
> > > single xmit packet is 40 minus 2 minus WTHRESH. Since WTHRESH in DPDK
> > > is always zero it gives us 38 segments. We limit them by 33.
> > >
> > > >
> > > > Thank you very much!
> > > >
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Helin
> > > >
> > > >
> > > >
> > > > From: Zhang, Helin
> > > > Sent: Wednesday, August 19, 2015 10:29 AM
> > > > To: Vladislav Zolotarov
> > > > Cc: Lu, Wenzhuo; dev at dpdk.org  > > > d

[dpdk-dev] flow_director_filter error!!

2015-08-25 Thread Navneet Rao
Hi Jingjing:

Thanks.

I did have the ethertype_filter ignore the mac_addr and look only at the 
ethertype filter, and it still got a "bad arguments" message :-(

testpmd>  ethertype_filter 0 add mac_ignr ethertype 0x0806 fwd queue 1
Bad arguments
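
(One thing that may be worth checking - this is an assumption, not something
confirmed in this thread: the testpmd cmdline for this filter appears to expect
a MAC address token even when mac_ignr is selected, so a placeholder address
would still have to be supplied, along the lines of:

testpmd> ethertype_filter 0 add mac_ignr 00:00:00:00:00:00 ethertype 0x0806 fwd queue 1

If the grammar in your testpmd build differs, the built-in help output would be
the place to confirm the exact token order.)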



-Original Message-
From: Wu, Jingjing [mailto:jingjing...@intel.com] 
Sent: Tuesday, August 25, 2015 6:55 AM
To: Navneet Rao; Mcnamara, John; dev at dpdk.org
Subject: RE: [dpdk-dev] flow_director_filter error!!

Hi, Navneet

I'm sorry, but I have no idea about the NIC i540. Are you talking about X540?
If X540, I guess you can't classify on the MAC-ADDRESS to steer to different 
queues with the ethertype filter, because in the X540 datasheet the ethertype 
filter is described as below:
" 7.1.2.3 L2 Ethertype Filters
These filters identify packets by their L2 Ethertype, 802.1Q user priority and 
optionally assign them to a receive queue."

So the mac_address is not the filter's input.
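
As a hedged illustration of the same point (API names as I recall them from
that era's rte_eth_ctrl.h - please double-check against your DPDK version), the
equivalent programmatic setup that steers ARP frames to queue 1 without
matching on a MAC address could look like:

/*
 * Sketch only: add an L2 ethertype filter that forwards ARP (0x0806) frames
 * to RX queue 1.  flags == 0 means no MAC match and no drop action.
 */
#include <string.h>
#include <rte_ethdev.h>
#include <rte_eth_ctrl.h>

static int
steer_arp_to_queue1(uint8_t port_id)
{
	struct rte_eth_ethertype_filter filter;

	memset(&filter, 0, sizeof(filter));
	filter.ether_type = 0x0806;	/* ARP */
	filter.flags = 0;		/* MAC address ignored, fwd (not drop) */
	filter.queue = 1;

	return rte_eth_dev_filter_ctrl(port_id, RTE_ETH_FILTER_ETHERTYPE,
				       RTE_ETH_FILTER_ADD, &filter);
}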

Thanks
Jingjing

> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> Sent: Friday, August 21, 2015 2:57 AM
> To: Mcnamara, John; dev at dpdk.org
> Subject: Re: [dpdk-dev] flow_director_filter error!!
> 
> Thanks John.
> 
> I am trying to setup/use the flow-director-filter on the i540.
> 
> -- When I try to setup the flow-director-filter as per the example, I 
> am getting "bad arguments"!!!
>  So decided to see if the flush command would work.
> 
> 
> In the interim --- I am using ethertype filter to accomplish the following.
> What I am trying to do is this --
> Use 2 different i540 cards
> Use the igb_uio driver.
> Use the testpmd app.
> Setup 5 different MAC-ADDRESSes on each port. (using the set mac_addr 
> command) Setup 5 different RxQs and TxQs on each port.
> And then use the testpmd app to generate traffic..
> 
> I am assuming that the testpmd app will now send and receive traffic 
> using the 5 different MAC_ADDRESSes..
> On each port's receive I will now want to classify on the MAC-ADDRESS 
> and steer the traffic to different queues.
> 
> Is there an example/reference on how to achieve this?
> 
> Next, I would want to do "classify" on "flexbytes" and send/steer the 
> traffic to different queues using flow-director-filter.
> 
> Thanks
> -Navneet
> 
> 
> 
> 
> -Original Message-
> From: Mcnamara, John [mailto:john.mcnamara at intel.com]
> Sent: Wednesday, August 19, 2015 3:39 PM
> To: Navneet Rao; dev at dpdk.org
> Subject: RE: [dpdk-dev] flow_director_filter error!!
> 
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Navneet Rao
> > Sent: Tuesday, August 18, 2015 4:01 PM
> > To:  HYPERLINK "mailto:dev at dpdk.org" dev at dpdk.org
> > Subject: [dpdk-dev] flow_director_filter error!!
> >
> > After I start the testpmd app, I am flusing the flow_director_filter 
> > settings and get the following error -
> >
> >
> >
> > testpmd> flush_flow_director 0
> >
> > PMD: ixgbe_fdir_flush(): Failed to re-initialize FD table.
> >
> > flow director table flushing error: (Too many open files in system)
> 
> Hi,
> 
> Are you setting a flow director filter before flushing? If so, could you give 
> an example.
> 
> John.
> --
>