[dpdk-dev] CPU does not support x86-64 instruction set

2014-06-23 Thread Alex Markuze
Hi, I'm new to DPDK and I'm trying to compile on an x86 Ubuntu 14.04 VM (KVM),
and I'm getting this error:

"error: CPU you selected does not support x86-64 instruction set"

I've seen in the archive that Jinho had this same issue last year; I'd be
glad to know how it was resolved.

Thanks
Alex.


[dpdk-dev] Fwd: CPU does not support x86-64 instruction set

2014-06-24 Thread Alex Markuze
Thomas, thanks for your reply.
I've resolved the issue in a similar way, by modifying the VM XML config
file with this line:

CONFIG_RTE_MACHINE="x86-64"
But it didn't seem to matter (I didn't explore this much further; I
didn't really try a clean build and didn't make sure that the .config file
I modified was read).

Alex.



On Mon, Jun 23, 2014 at 5:42 PM, Thomas Monjalon 
wrote:

> Hi,
>
> 2014-06-23 15:42, Alex Markuze:
> > Hi, I'm new to DPDK and Im trying to compile on a x86 Ubuntu 14.04
> VM(KVM).
> > And I'm getting this error:
> >
> > "error: CPU you selected does not support x86-64 instruction set"
>
> You should try "-cpu host" option of Qemu/KVM in order to have the full
> instruction set of your host.
>
> Please confirm it's working.
> --
> Thomas
>


[dpdk-dev] Fwd: CPU does not support x86-64 instruction set

2014-06-24 Thread Alex Markuze
On Tue, Jun 24, 2014 at 11:21 AM, Thomas Monjalon  wrote:

> Welcome Alex.
> Please, for future messages, try to answer below as explained here:
> http://dpdk.org/ml
>
> 2014-06-24 11:12, Alex Markuze:
> > Thomas Monjalon  wrote:
> > > 2014-06-23 15:42, Alex Markuze:
> > > > Hi, I'm new to DPDK and Im trying to compile on a x86 Ubuntu 14.04
> > > VM(KVM).
> > > > And I'm getting this error:
> > > >
> > > > "error: CPU you selected does not support x86-64 instruction set"
> > >
> > > You should try "-cpu host" option of Qemu/KVM in order to have the full
> > > instruction set of your host.
> >
> > I've resolved the issue in a similar way by modifying the VM xml config
> > file with this line
> >  ->
> > CONFIG_RTE_MACHINE="x86-64"'
> > But it didn't seem to matter (I didn't explore this much farther, I
> > dint really try a clean build and didn't make sure that the .config file
> > I've modified was read).
>
> Not sure to understand what you want.
> If you try to build DPDK for most of machines (including VM), you should
> set
> CONFIG_RTE_MACHINE="default"
> in your .config file.
>
My wish is to be able to compile DPDK on a VM whose CPU reports "model name :
QEMU Virtual CPU version 2.0.0".
I'm guessing that gcc doesn't recognize the arch, so it throws this
error:
"error: CPU you selected does not support x86-64 instruction set"

I was looking for a way to tell gcc not to worry about the QEMU CPU
and just compile an x86_64 binary.

> --
> Thomas
>


[dpdk-dev] Memory Pinning.

2014-06-30 Thread Alex Markuze
Hi, Guys.
I have several newbie questions about the DPDK design that I was hoping
someone could answer.

In both the RX and TX flows, the buffer memory must be pinned and not
swappable.
In RDMA, memory is explicitly registered and pinned (up to the limit
defined in /etc/security/limits.conf). With regular sockets and a kernel driver,
the NIC DMAs the buffers from/to kernel memory, which is by definition
unswappable.

So I'm guessing that at least the TX/RX buffers are mapped to kernel space.

My questions are:
1. How are the buffers made unswappable? Are they shared with the kernel?
2. When and which buffers are mapped/unmapped to kernel space?
3. When are the buffers DMA-mapped, and by whom?

And another "bonus" question: on the TX flow I didn't find a way to receive a
send completion.
So how can I know when it's safe to modify the sent buffers (besides
waiting for the ring buffer to complete a full circle)?


Thanks.
Alex.


[dpdk-dev] bifurcated driver

2014-11-05 Thread Alex Markuze
On Wed, Nov 5, 2014 at 3:00 PM, Thomas Monjalon 
wrote:

> Hi Danny,
>
> 2014-10-31 17:36, O'driscoll, Tim:
> > Bifurcated Driver (Danny.Zhou at intel.com)
>
> Thanks for the presentation of bifurcated driver during the community call.
> I asked if you looked at ibverbs and you wanted a link to check.
> The kernel module is here:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
> The userspace library:
> http://git.kernel.org/cgit/libs/infiniband/libibverbs.git
>
> Extract from Kconfig:
> "
> config INFINIBAND_USER_ACCESS
> tristate "InfiniBand userspace access (verbs and CM)"
> select ANON_INODES
> ---help---
>   Userspace InfiniBand access support.  This enables the
>   kernel side of userspace verbs and the userspace
>   communication manager (CM).  This allows userspace processes
>   to set up connections and directly access InfiniBand
>   hardware for fast-path operations.  You will also need
>   libibverbs, libibcm and a hardware driver library from
>   .
> "
>
> It seems to be close to the bifurcated driver needs.
> Not sure if it can solve the security issues if there is no dedicated MMU
> in the NIC.
>

Mellanox NICs and other RDMA HW (InfiniBand/RoCE/iWARP) have MTT units -
memory translation units - a dedicated MMU. These are filled via
ibv_reg_mr() calls, which create a process-VM to physical/iova memory
mapping in the NIC. Thus each process can access only its own memory via
the NIC. This is the way RNICs resolve the security issue; I'm not sure how
standard Intel NICs could support this scheme.
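For anyone unfamiliar with verbs, this is roughly what that registration step
looks like from the application side - a minimal sketch against the libibverbs
API, where the protection domain (pd) and buffer are assumed to exist already:

#include <stddef.h>
#include <infiniband/verbs.h>

/* Sketch: register a buffer with the RNIC. The returned memory region
 * (MR) pins the pages and fills the NIC's translation tables so that
 * only this process (via this protection domain) can DMA to or from
 * this virtual address range. */
static struct ibv_mr *
register_buffer(struct ibv_pd *pd, void *buf, size_t len)
{
        return ibv_reg_mr(pd, buf, len,
                          IBV_ACCESS_LOCAL_WRITE |
                          IBV_ACCESS_REMOTE_READ |
                          IBV_ACCESS_REMOTE_WRITE);
}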

There is already a 6WIND PMD for Mellanox NICs. I'm assuming this PMD is
verbs-based and behaves similarly to the proposed bifurcated driver.
http://www.mellanox.com/page/press_release_item?id=979

One thing that I don't understand (and I'll be happy if someone could
shed some light on it) is how the NIC is supposed to distinguish between
packets that need to go to the kernel driver rings and packets going to
user-space rings.

> I feel we should sum up pros and cons of
> - igb_uio
> - uio_pci_generic
> - VFIO
> - ibverbs
> - bifurcated driver
> I suggest to consider these criterias:
> - upstream status
> - usable with kernel netdev
> - usable in a vm
> - usable for ethernet
> - hardware requirements
> - security protection
> - performance
>
Regarding ibverbs - I'm not sure how it's relevant to future DPDK
development, but this is the rundown as I know it:
- It is a veteran package called OFED, or its counterpart Mellanox OFED.
- The kernel drivers are upstream.
- The PCI device stays in the kernel's care throughout its life span.
- SR-IOV support exists; paravirt support exists only (AFAIK) as an
  Office of the CTO (VMware) project called vRDMA.
- Eth/RoCE (RDMA over Converged Ethernet)/IB.
- HW: RDMA-capable HW only.
- Security is designed into the RDMA HW.
- Stellar performance - favored by HPC.


> --
> Thomas
>


[dpdk-dev] bifurcated driver

2014-11-05 Thread Alex Markuze
On Wed, Nov 5, 2014 at 5:14 PM, Alex Markuze  wrote:

> On Wed, Nov 5, 2014 at 3:00 PM, Thomas Monjalon  > wrote:
>
>> Hi Danny,
>>
>> 2014-10-31 17:36, O'driscoll, Tim:
>> > Bifurcated Driver (Danny.Zhou at intel.com)
>>
>> Thanks for the presentation of bifurcated driver during the community
>> call.
>> I asked if you looked at ibverbs and you wanted a link to check.
>> The kernel module is here:
>>
>> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
>> The userspace library:
>> http://git.kernel.org/cgit/libs/infiniband/libibverbs.git
>>
>> Extract from Kconfig:
>> "
>> config INFINIBAND_USER_ACCESS
>> tristate "InfiniBand userspace access (verbs and CM)"
>> select ANON_INODES
>> ---help---
>>   Userspace InfiniBand access support.  This enables the
>>   kernel side of userspace verbs and the userspace
>>   communication manager (CM).  This allows userspace processes
>>   to set up connections and directly access InfiniBand
>>   hardware for fast-path operations.  You will also need
>>   libibverbs, libibcm and a hardware driver library from
>>   <http://www.openfabrics.org/git/>.
>> "
>>
>> It seems to be close to the bifurcated driver needs.
>> Not sure if it can solve the security issues if there is no dedicated MMU
>> in the NIC.
>>
>
> Mellanox NIC's and other  RDMA HW (Infiniband/RoCE/iWARP) have MTT units -
> memory translation units - a dedicated MMU. These are filled via an
> ibv_reg_mr sys calls - this creates a Process VM to physical/iova memory
> mapping in the NIC. Thus each process can access only its own memory via
> the NIC. This is the way RNIC*s resolve the security issue I'm not sure how
> standard intel nics could support this scheme.
>
> There is already a 6wind PMD for mellanox Nics. I'm assuming this PMD is
> verbs based and behaves similar to the bifurcated driver proposed.
> http://www.mellanox.com/page/press_release_item?id=979
>
> One, thing that I don't understand (And will be happy if some one could
> shed some light on), is how does the NIC supposed do distinguish between
> packets that need to go to the kernel driver rings and packets going to
> user space rings.
>
> I feel we should sum up pros and cons of
>> - igb_uio
>> - uio_pci_generic
>> - VFIO
>> - ibverbs
>> - bifurcated driver
>> I suggest to consider these criterias:
>> - upstream status
>> - usable with kernel netdev
>> - usable in a vm
>> - usable for ethernet
>> - hardware requirements
>> - security protection
>> - performance
>>
>> Regarding IBVERBS - I'm not sure how its relevant to future DPDK
> development , but this is the run down as I know It.
>  This is a veteran package called OFED , or its counterpart Mellanox OFED.
> The kernel drivers are upstream
> The PCI dev stays in the kernels care trough out its life span
> SRIOV support exists, paravirt support exists only(AFAIK) as an
> Office of the CTO(VMware) project called vRDMA
> Eth/RoCE (RDMA over Converged Ethernet)/IB
>=== HW === RDMA capable HW ONLY.
> Security is designed into RDMA HW
>
    Stellar performance - Favored by HPC.
>

*RNIC - RDMA (Remote DMA - iWARP/InfiniBand/RoCE) capable NICs.

>
>
>> --
>> Thomas
>>
>
>


[dpdk-dev] bifurcated driver

2014-11-06 Thread Alex Markuze
Danny sums up the issue perfectly IMHO.
While both verbs and DPDK aim to provide generic user-space networking, the
similarities end there.
Verbs and RDMA HW are closely coupled, behave differently than standard
Ethernet NICs, and are not related to the netdev mechanisms.

Or, welcome to this discussion.

Those interested can read the IB specs (1K+ pages) available from
OpenFabrics*.
*https://www.openfabrics.org/index.php




On Thu, Nov 6, 2014 at 6:45 AM, Zhou, Danny  wrote:

> I roughly read libibverbs related code and relevant infiniband/rdma
> documents, and found though
> many concepts in libibverbs looks similar to bifurcated driver, but there
> are still lots of differences as
> illustrated below based on my understanding:
>
> 1) Queue pair defined in RDMA specification are abstract concept, where
> the queue pairs term used in
>   bifurcated driver are rx/tx queue pairs in the NIC.
> 2) Bifurcated PMD in DPDK directly access NIC resources as a slave driver
> (no NIC control), while libibverbs
>   as a user space library rather than driver offloads certain operations
> to kernel driver and NIC by invoking
>   "verbs" APIs.
> 3) Libibverbs invokes infiniband specific system calls to allow
> user/kernel space communication based on
>   "verbs" defined in infiniband/RDMA spec, while bifurcated driver build
> on top of af_packet module
>   and new socket options to do things like hw queue split-off , map
> certain pages on I/O space to user space
>   operations, etc.
> 4) There is a specific embedded MMU unit in Infiniband/RDMA to provides
> memory protection, while
>   bifurcated driver uses IOMMU rather than NIC to provide memory
> protection.
>
> IMHO, libibverbs and corresponding kernel modules/drivers are specifically
> designed and implemented for
> direct access to RDMA hardware from userspace, and it highly depends on
> "verbs" related system calls
> supported by infiniband/rdma mechanism in kernel, rather than netdev
> mechanism that bifurcated driver
> solution depends on.
>
> > -Original Message-
> > From: Vincent JARDIN [mailto:vincent.jardin at 6wind.com]
> > Sent: Thursday, November 06, 2014 9:31 AM
> > To: Zhou, Danny
> > Cc: Thomas Monjalon; dev at dpdk.org; Fastabend, John R; Or Gerlitz
> > Subject: Re: [dpdk-dev] bifurcated driver
> >
> > +Or
> >
> > On 05/11/2014 23:48, Zhou, Danny wrote:
> > > Hi Thomas,
> > >
> > > Thanks for sharing the links to ibverbs, I will take a close look at
> it and compare it to bifurcated driver. My take
> > > after a rough review is that idea is very much similar, but bifurcated
> driver implementation is generic for any
> > > Ethernet device based on existing af_packet mechanism, with extension
> of exchanging the messages between
> > > user space and kernel space driver.
> > >
> > > I have an internal document to summary the pros and cons of below
> solutions, except for ibvers, but
> > > will be adding it shortly.
> > >
> > > - igb_uio
> > > - uio_pci_generic
> > > - VFIO
> > > - bifurcated driver
> > >
> > > Short answers to your questions:
> > >>- upstream status
> > > Adding IOMMU based memory protection and generic descriptor
> description support now, into version 2
> > > kernel patches.
> > >
> > >>- usable with kernel netdev
> > > af_packet based, and relevant patchset will be submitted to netdev for
> sure.
> > >
> > >>- usable in a vm
> > > No, it does no coexist with SRIOV for number of reasons. but if you
> pass-through a PF to a VM, it works perfect.
> > >
> > >>- usable for Ethernet
> > > It could work with all Ethernet NICs, as flow director is available
> and NIC driver support new net_ops to split off
> > > queue pairs for user space.
> > >
> > >>- hardware requirements
> > > No specific hardware requirements. All mainstream NICs have multiple
> qpairs and flow director support.
> > >
> > >>- security protection
> > > Leverage IOMMU to provide memory protection on Intel platform. Other
> archs provide similar memory protection
> > > mechanism, so we only use arch-agnostic DMA memory allocation APIs in
> kernel to support memory protection.
> > >
> > >>- performance
> > > DPDK native performance on user space queues, as long as drop_en is
> enabled to avoid head-of-line blocking.
> > >
> > > -Danny
> > >
> > >> -Original Message-
> > >> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
> > >> Sent: Wednesday, November 05, 2014 9:01 PM
> > >> To: Zhou, Danny
> > >> Cc: dev at dpdk.org; Fastabend, John R
> > >> Subject: Re: [dpdk-dev] bifurcated driver
> > >>
> > >> Hi Danny,
> > >>
> > >> 2014-10-31 17:36, O'driscoll, Tim:
> > >>> Bifurcated Driver (Danny.Zhou at intel.com)
> > >>
> > >> Thanks for the presentation of bifurcated driver during the community
> call.
> > >> I asked if you looked at ibverbs and you wanted a link to check.
> > >> The kernel module is here:
> > >>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/infiniband/core
> > >> The userspace l

[dpdk-dev] UDP Checksum

2014-11-06 Thread Alex Markuze
Hi,
I'm seeing "UDP: bad checksum." messages(dmesg) for packets sent by my dpdk
app to a socket on a remote machine.
Looking at the packets the scum value is set, its just not what wireshark
expects.

When sending I'm setting these fields in the egress packets.

pkt->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);

pkt->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);

pkt->ol_flags |= (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK);
//PKT_TX_OFFLOAD_MASK;


I'm working with a 82599 VF.


Any thoughts? I'm not sure what else to check.


[dpdk-dev] UDP Checksum

2014-11-06 Thread Alex Markuze
I was setting both the IP and UDP checksum fields to 0. PKT_TX_UDP_CKSUM ==
PKT_TX_L4_MASK == 0x6000.

I was not aware of get_ipv4_psd_sum(ipv4_hdr),
and I'm quite frankly surprised the HW doesn't already do this. Furthermore,
I don't remember kernel drivers messing with
L3 headers (bnx2x/mlx4). Is this true for all PMDs that do checksum offloads?

I will give it a try now.
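For reference, this is roughly what I'll be trying on my TX path - a sketch
only, assuming the DPDK 1.7 mbuf layout and a pseudo-header helper like the
get_ipv4_psd_sum() from app/test-pmd/csumonly.c quoted below:

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_udp.h>

/* Pseudo-header sum helper, assumed copied from app/test-pmd/csumonly.c. */
uint16_t get_ipv4_psd_sum(struct ipv4_hdr *ip_hdr);

/* Sketch: prepare one Ethernet/IPv4/UDP mbuf for HW UDP checksum offload. */
static void
setup_udp_tx_cksum(struct rte_mbuf *pkt)
{
        struct ipv4_hdr *ip = (struct ipv4_hdr *)
                (rte_pktmbuf_mtod(pkt, unsigned char *) +
                 sizeof(struct ether_hdr));
        struct udp_hdr *udp = (struct udp_hdr *)
                ((unsigned char *)ip + sizeof(struct ipv4_hdr));

        pkt->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
        pkt->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
        pkt->ol_flags |= (PKT_TX_IP_CKSUM | PKT_TX_UDP_CKSUM);

        ip->hdr_checksum = 0;                      /* required by the HW */
        udp->dgram_cksum = get_ipv4_psd_sum(ip);   /* pseudo-header sum  */
}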


On Thu, Nov 6, 2014 at 6:15 PM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> > Sent: Thursday, November 06, 2014 4:05 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] UDP Checksum
> >
> > Hi,
> > I'm seeing "UDP: bad checksum." messages(dmesg) for packets sent by my
> dpdk
> > app to a socket on a remote machine.
> > Looking at the packets the scum value is set, its just not what wireshark
> > expects.
> >
> > When sending I'm setting these fields in the egress packets.
> >
> > pkt->pkt.vlan_macip.f.l2_len = sizeof(struct ether_hdr);
> >
> > pkt->pkt.vlan_macip.f.l3_len = sizeof(struct ipv4_hdr);
> >
> > pkt->ol_flags |= (PKT_TX_IP_CKSUM | PKT_TX_L4_MASK);
> > //PKT_TX_OFFLOAD_MASK;
> >
> >
> > I'm working with a 82599 VF.
> >
> >
> > Any thoughts? I'm not sure what else to check.
>
> As I remember, you have to setup  IPV4 header checksum to 0 and
> calculate and setup pseudo-header checksum for UDP.
> From app/test-pmd/csumonly.c:
> ...
> if (pkt_ol_flags & (PKT_RX_IPV4_HDR | PKT_RX_TUNNEL_IPV4_HDR)) {
>
> /* Do not support ipv4 option field */
> l3_len = sizeof(struct ipv4_hdr) ;
>
> ...
>
> /* Do not delete, this is required by HW*/
> ipv4_hdr->hdr_checksum = 0;
>
>...
>
>   if (l4_proto == IPPROTO_UDP) {
> udp_hdr = (struct udp_hdr*)
> (rte_pktmbuf_mtod(mb,
> unsigned char *) + l2_len
> + l3_len);
> if (tx_ol_flags & 0x2) {
> /* HW Offload */
> ol_flags |= PKT_TX_UDP_CKSUM;
> if (ipv4_tunnel)
> udp_hdr->dgram_cksum = 0;
> else
> /* Pseudo header sum need
> be set properly */
> udp_hdr->dgram_cksum =
>
> get_ipv4_psd_sum(ipv4_hdr);
>
>
>
>


[dpdk-dev] DPDK memory mechanism

2014-08-03 Thread Alex Markuze
I had several similar concerns a few weeks back; please look for the
email thread "Memory Pinning." on this mailing list.
Bruce was very helpful. I'll forward the thread to you now. Hope it helps.

On Fri, Aug 1, 2014 at 11:54 PM, Wenji Wu  wrote:
> Hello, everybody,
>
> I am new on DPDK, and have several questions on DPDK.
>
> Is "Mbuf Pool? pinned to avoid being swapped out? I checked the source code, 
> and found there is API called ?rte_mem_lock_page?. But it seems this API is 
> never by called. Do I miss something?
>
> Thanks,
>
> wenji
>
>
>
>


[dpdk-dev] Debugging EAL PCI / Driver Init

2014-08-03 Thread Alex Markuze
Hi Matt, dev,
I'm trying to compile an app linking to DPDK and DPDK-based libs,
and I'm seeing the same issue you've reported.
The probe function doesn't seem to find any ixgbevf (SR-IOV VM) ports.
The same code compiled as a DPDK app works fine.

In your solution to this issue you refer to -lintel_dpdk? I
couldn't find any reference to it.

Thanks
Alex.


On Sat, Aug 2, 2014 at 7:46 PM, Matthew Hall  wrote:
> On Sun, Aug 03, 2014 at 01:37:06AM +0900, Masaru Oki wrote:
>> cc links library funtion from archive only if call from other object.
>> but new dpdk pmd library has constractor section and not call directly.
>> ld always links library funtion with constractor section.
>> use -Xlinker, or use ld instead of cc.
>
> Hello Oki-san,
>
> The trick to fix it was this, I finally found it in the example Makefiles with
> V=1 flag.
>
> -Wl,--whole-archive -Wl,--start-group -lintel_dpdk -Wl,--end-group 
> -Wl,--no-whole-archive
>
> Thank you for the advice you provided, I couldn't have fixed it without your
> suggestions... it got me to look more closely at the linking. Importantly,
> "-Wl,--whole-archive" includes the entire archive whether or not it's called
> from other objects, so we don't lose the constructors, just like you said.
>
> Matthew.


[dpdk-dev] Debugging EAL PCI / Driver Init

2014-08-03 Thread Alex Markuze
Resolved just as Matt described.
To remove any ambiguity (for future reference):

This line in the gcc command resolves the issue in my case (a
different NIC may need a different lib):
-Wl,--whole-archive -Wl,--start-group -lrte_pmd_ixgbe -Wl,--end-group -Wl,--no-whole-archive

The problem is that the probe code iterates over all PCI devices and
tries to find a matching driver. The drivers register themselves
with the macros below, which are only run when the --whole-archive
option is provided - that is actually the reason for this flag's
existence. Without this flag the driver list is empty.

PMD_REGISTER_DRIVER(rte_ixgbe_driver);
PMD_REGISTER_DRIVER(rte_ixgbevf_driver);
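For anyone wondering why the flag is needed at all: those macros boil down to
GCC constructor functions that add each driver to a global list before main()
runs, and nothing in the application references them directly. A minimal
sketch of the mechanism (an illustration only, not the exact DPDK macro
expansion):

#include <stdio.h>

/* Toy constructor-based driver registration. Nothing in main() refers
 * to the driver symbols, so a normal static link would drop them from
 * the archive - which is exactly what -Wl,--whole-archive prevents. */
struct toy_driver {
        const char *name;
        struct toy_driver *next;
};

static struct toy_driver *driver_list;

static struct toy_driver ixgbe_drv = { .name = "toy_ixgbe" };

__attribute__((constructor))
static void register_ixgbe(void)
{
        ixgbe_drv.next = driver_list;
        driver_list = &ixgbe_drv;
}

int
main(void)
{
        const struct toy_driver *d;

        for (d = driver_list; d != NULL; d = d->next)
                printf("registered: %s\n", d->name);
        return 0;
}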


On Sun, Aug 3, 2014 at 1:38 PM, Alex Markuze  wrote:
> Hi Matt, Dev
> I'm Trying to compile ann app linking to dpdk and dpdk based libs.
> And I'm seeing the same issue you've reported.
> The probe function doesn't seem to find any ixgbevf(SRIOV VM) ports.
> Same code compiled as a dpdk app works fine.
>
> In your solution to this issue you are referring to  -lintel_dpdk? I
> couldn't find any reference to it.
>
> Thanks
> Alex.
>
>
> On Sat, Aug 2, 2014 at 7:46 PM, Matthew Hall  wrote:
>> On Sun, Aug 03, 2014 at 01:37:06AM +0900, Masaru Oki wrote:
>>> cc links library funtion from archive only if call from other object.
>>> but new dpdk pmd library has constractor section and not call directly.
>>> ld always links library funtion with constractor section.
>>> use -Xlinker, or use ld instead of cc.
>>
>> Hello Oki-san,
>>
>> The trick to fix it was this, I finally found it in the example Makefiles 
>> with
>> V=1 flag.
>>
>> -Wl,--whole-archive -Wl,--start-group -lintel_dpdk -Wl,--end-group 
>> -Wl,--no-whole-archive
>>
>> Thank you for the advice you provided, I couldn't have fixed it without your
>> suggestions... it got me to look more closely at the linking. Importantly,
>> "-Wl,--whole-archive" includes the entire archive whether or not it's called
>> from other objects, so we don't lose the constructors, just like you said.
>>
>> Matthew.


[dpdk-dev] Performance issue with vmxnet3 pmd

2014-08-13 Thread Alex Markuze
Hi guys, I will continue this thread.

On Ubuntu 14.04 (kernel 3.13) with DPDK 1.7, vmxnet3-usermap 1.2 doesn't compile.
From the git history it seems to have last been updated 3 months ago.
Is this project going to be killed, and should I look for different
alternatives?

On Tue, Jul 8, 2014 at 6:08 PM, Hyunseok  wrote:
> Thomas,
>
> The last time I tried vmxnet3-usermap a couple of weeks ago, it did not
> compile against the latest kernel (3.11).  Is it still the case?  Or do you
> have the latest version which is compatible with newer kernels?
>
> Also, do you have any benchmark numbers with vmxnet3-usermap in any chance?
>
> Regards,
> -Hyunseok
>
>
>
>
> On Tue, Jul 8, 2014 at 3:05 AM, Thomas Monjalon 
> wrote:
>
>> Hi,
>>
>> 2014-07-07 18:22, Hyunseok:
>> > I was testing l2-fwd with vmxnet3 pmd (included in dpdk).
>>
>> Have you tested vmxnet3-usermap (http://dpdk.org/doc/vmxnet3-usermap)?
>>
>> > The maximum forwarding rate I got from vmxnet3 pmd with l2fwd is only 2.5
>> > to 2.8 Gbps.
>>
>> It could be interesting to know your exact testing procedure with numbers
>> for
>> vmxnet3-usermap.
>>
>> Thanks
>> --
>> Thomas
>>


[dpdk-dev] Vmxnet3 pmd

2014-08-13 Thread Alex Markuze
Hi, I have a simple DPDK app - basically a KNI interface with the DPDK
layer serving only as a pipeline.

This lets me ping between vEth0 interfaces on different VMs; it works great with ixgbevf.
Now I've moved to an ESXi 5.5, Ubuntu 14.04 VM (DPDK 1.7).
When running the same code*, I discovered that polling doesn't
retrieve any packets after vEth0 gets an IP. I've resolved this issue
by removing the dev restart calls I had in the callback:

//  rte_eth_dev_stop(port_id);
//  ret = rte_eth_dev_start(port_id);

Is this a known issue? How can I report a bug if it's not?

Thanks
Alex.

*Except for this line I needed to add to set up the TXQ:
tx_conf.txq_flags |= (ETH_TXQ_FLAGS_NOMULTSEGS | ETH_TXQ_FLAGS_NOOFFLOADS);


[dpdk-dev] [PATCH 3/6]i40e:Add VxLAN Cloud filter API

2014-08-13 Thread Alex Markuze
All are L2 over L3 (UDP) - the general name is network overlay.


On Wed, Aug 13, 2014 at 4:50 PM, Thomas Monjalon
 wrote:
> 2014-08-13 08:23, Liu, Jijiang:
>> From: Thomas Monjalon [mailto:thomas.monjalon at 6wind.com]
>> > About API, why name it cloud filter instead of VxLAN?
>>
>> VxLAN is just a kind tunnel type, there are another tunnel types based
>> on protocol type, they are below.
>> Tunnel Type:
>> * 0x0: VXLAN
>> * 0x1: NVGRE or other MAC in GRE
>> * 0x2: Geneve
>>   0x3: IP in GRE
>> Currently, I just implemented VxLAN tunnel type, and we will support
>> another tunnel types in cloud filter API later.
>
> OK, I understand. But cloud filter is just a marketing name.
> Please let's stick to technical and precise names.
> It seems these tunnels are L2 over IP, right?
>
> --
> Thomas


[dpdk-dev] VMware Fusion + DPDK and KNI

2014-08-20 Thread Alex Markuze
I'm pretty sure I will stumble on this issue in the near future.
Thanks for the heads up.

On Mon, Aug 18, 2014 at 9:16 PM, Jay Rolette  wrote:
> Thought I'd put this out there in case anyone else runs into it.
>
> Using DPDK 1.6 on Ubuntu 14.04 LTS in a hardware appliance. Also using KNI
> to share the data ports with an app that needs a normal TCP/IP stack
> interface.
>
> We had everything working reasonably well on the hardware, but most of our
> developers run their code in a VM on their laptops: OS X (Mavericks) +
> VMware Fusion 6 Pro.
>
> On some VMs, we were getting errors trying to configure KNI ports:
>
> $ sudo ifconfig ffEth0 10.111.2.100 netmask 255.255.0.0 up
> SIOCSIFFLAGS: Timer expired
> SIOCSIFFLAGS: Timer expired
>
> Skipping the "fun" involved with trying to track down the problem, here's
> what ended up fixing it.
>
> We had 4 network ports on the VM:
>
>- eth0 - Management port
>- eth1 - "other" function not related to the problem
>- eth2 & eth3 - inline datapath (bump-in-the-wire), but also KNI mapped
>to ffEth0 & ffEth1 by our DPDK app
>
> If eth2 and eth3 are on the same vmnet, you'll get the "SIOCSIFFLAGS: Timer
> expired" errors. Depending on what parameters you try to set, ifconfig may
> think some of them have taken effect (they haven't) or it won't (setting
> the MTU, etc.).
>
> If you put eth2 and eth3 on separate vmnets, then no issues and you can
> configure the KNI ports via ifconfig as expected.
>
> No idea why having the ports on the same vmnet matters, since our app
> doesn't care, but I haven't gone spelunking through the KNI source to find
> the root cause.
>
> Doubtful this will matter to many (any?), but maybe it'll save someone some
> time.
>
> Jay Rolette
> *infinite io*


[dpdk-dev] ixgbe network card has dev_info.max_rx_queues == 0

2014-08-21 Thread Alex Markuze
RX and TX are shorthand for the receive and transmit queues.
These queues store the ingress/egress packets.

Just looking at the info you've sent, it tells you that max_rx_queues
for this device is 0 (clearly something is wrong here), so the nb_rx_q
of 3 is an invalid value: -EINVAL == -22.
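As a sanity check, you can query the device limits yourself before configuring
it - a minimal sketch, assuming a probed port numbered port_id:

#include <stdio.h>
#include <errno.h>
#include <rte_ethdev.h>

/* Sketch: dump the advertised queue limits and refuse impossible
 * configurations before rte_eth_dev_configure() does. */
static int
configure_port(uint8_t port_id, uint16_t nb_rx_q, uint16_t nb_tx_q,
               const struct rte_eth_conf *conf)
{
        struct rte_eth_dev_info dev_info;

        rte_eth_dev_info_get(port_id, &dev_info);
        printf("port %u: max_rx_queues=%u max_tx_queues=%u\n",
               (unsigned)port_id, (unsigned)dev_info.max_rx_queues,
               (unsigned)dev_info.max_tx_queues);

        if (nb_rx_q > dev_info.max_rx_queues ||
            nb_tx_q > dev_info.max_tx_queues)
                return -EINVAL; /* same check the ethdev layer performs */

        return rte_eth_dev_configure(port_id, nb_rx_q, nb_tx_q, conf);
}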

On Thu, Aug 21, 2014 at 3:26 PM, Sergey Mironov  wrote:
> Hi. I have face a strange error on one of my network cards. Call to
> rte_eth_dev_configure returns with error code -22. Increaing the
> verbosity level shows the following:
>
>
> PMD: rte_eth_dev_configure: ethdev port_id=2 nb_rx_queues=3 > 0
> EAL: Error - exiting with code: 1
>
> here is the snippet of code which returns the error
>
>
> ./lib/librte_ether/rte_ethdev.c : 513
>
> (*dev->dev_ops->dev_infos_get)(dev, &dev_info);
> if (nb_rx_q > dev_info.max_rx_queues) {
> PMD_DEBUG_TRACE("ethdev port_id=%d nb_rx_queues=%d > %d\n",
> port_id, nb_rx_q, dev_info.max_rx_queues);
> return (-EINVAL);
> }
>
> What does this error means (what is rx queues of an adapter?) What may
> cause such a problem? I am using dpdk 1.5.1r1.
>
> Thanks in advance,
> Sergey


[dpdk-dev] VMXNET 3

2014-08-26 Thread Alex Markuze
Hi,
I'm looking for a reasonable DPDK-based solution in a fully virtualised
VMware environment.
From what I've seen there are several flavours of VMXNET3 driver for
DPDK, and not all of them seem to be alive - vmxnet3-usermap was last updated in May
and doesn't compile on DPDK 1.7.

So, to my question: what is the state of the art in DPDK vmxnet drivers,
and what performance
could one expect over a 10/40/56G NIC?

Thanks
Alex.


[dpdk-dev] overcommitting CPUs

2014-08-27 Thread Alex Markuze
IMHO adding "Interrupt Mode" to dpdk is important as this can open
DPDK to a larger public of consumers, I can easily imagine someone
trying to find user space networking  solution (And deciding against
verbs - RDMA) for the obvious reasons and not needing deterministic
latency.

A few thoughts:

Deterministic latency: it's a fiction, in the sense that it is something
you will only be able to see in a small, controlled environment, as
network latencies in data centres (DC) are dominated by switch queuing
(one good reference is http://fastpass.mit.edu, which Vincent shared a
few days back).

Virtual environments: here this is especially
interesting, as the NIC driver (hypervisor) is working in IRQ mode which,
unless the interrupts are pinned to different CPUs than the VM, will
have a disruptive effect on the VM's performance. Moving to interrupt
mode in paravirtualised environments makes sense, as in any
environment that is not carefully crafted you should not expect any
deterministic guarantees and would opt for a simpler programming model
- like interrupt mode.

NAPI: with 10G NICs, most CPUs' poll rate is faster than the NIC message
rate, resulting in a 1:1 napi_poll-callback-to-IRQ ratio; this is true
even with small packets. In some cases where the CPU works slower
- for example when intel_iommu=on,strict is set - you can actually see
a performance inversion where the "slower" CPU reaches higher B/W,
because the slowdown makes NAPI effectively move the kernel into
polling mode.

I think that a smarter DPDK-NAPI is important, but it is a next step
IFF the interrupt mode is adopted.

On Wed, Aug 27, 2014 at 8:48 AM, Patel, Rashmin N
 wrote:
> You're right and I've felt the same harder part of determinism with other 
> hypervisors' soft switch solutions as well. I think it's worth thinking about.
>
> Thanks,
> Rashmin
>
> On Aug 26, 2014 9:15 PM, Stephen Hemminger  
> wrote:
> The way to handle switch between out of poll mode is to use IRQ coalescing
> parameters.
> You want to hold off IRQ until there are a couple packets or a short delay.
> Going out of poll mode
> is harder to determine.
>
>
> On Tue, Aug 26, 2014 at 9:59 AM, Zhou, Danny  wrote:
>
>>
>> > -Original Message-
>> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Stephen Hemminger
>> > Sent: Wednesday, August 27, 2014 12:39 AM
>> > To: Michael Marchetti
>> > Cc: dev at dpdk.org
>> > Subject: Re: [dpdk-dev] overcommitting CPUs
>> >
>> > On Tue, 26 Aug 2014 16:27:14 +
>> > "Michael  Marchetti"  wrote:
>> >
>> > > Hi, has there been any consideration to introduce a non-spinning
>> network driver (interrupt based), for the purpose of overcommitting
>> > CPUs in a virtualized environment?  This would obviously have reduced
>> high-end performance but would allow for increased guest
>> > density (sharing of physical CPUs) on a host.
>> > >
>> > > I am interested in adding support for this kind of operation, is there
>> any interest in the community?
>> > >
>> > > Thanks,
>> > >
>> > > Mike.
>> >
>> > Better to implement a NAPI like algorithm that adapts from poll to
>> interrupt.
>>
>> Agreed, but DPDK is currently pure poll-mode based, so unlike the NAPI'
>> simple algorithm, the new heuristic algorithm should not switch from
>> poll-mode to interrupt-mode immediately once there is no packet in the
>> recent poll. Otherwise, mode switching will be too frequent which brings
>> serious negative performance impact to DPDK.
>>


[dpdk-dev] DPDK and custom memory

2014-08-31 Thread Alex Markuze
Artur, I don't have the details of what you are trying to achieve, but
it sounds like something that is covered by an IOMMU, SW or HW. The
IOMMU creates an iova (I/O virtual address) that the NIC can access; the
range is controlled with flags passed to the dma_map functions.

So I understand your question this way: how does DPDK work on an
IOMMU-enabled system, and can you influence the mapping?
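To illustrate what I mean by the dma_map flags, here is a minimal
kernel-driver-side sketch (Linux DMA API, not DPDK code; the function and
variable names are just for illustration):

#include <linux/dma-mapping.h>

/* Sketch: map one RX buffer so the device sees it at an iova chosen by
 * the IOMMU; the direction flag restricts how the device may access it. */
static dma_addr_t map_rx_buffer(struct device *dev, void *buf, size_t len)
{
        dma_addr_t iova;

        iova = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
        if (dma_mapping_error(dev, iova))
                return 0;   /* sketch: treat 0 as "mapping failed" */

        /* With an IOMMU active, 'iova' (not the physical address) is
         * what goes into the device's descriptors. */
        return iova;
}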


On Sat, Aug 30, 2014 at 4:03 PM, Thomas Monjalon
 wrote:
> Hello,
>
> 2014-08-29 18:40, Saygin, Artur:
>> Imagine a PMD for an FPGA-based NIC that is limited to accessing certain
>> memory regions .
>
> Does it mean Intel is making an FPGA-based NIC?
>
>> Is there a way to make DPDK use that exact memory?
>
> Maybe I don't understand the question well, because it doesn't seem really
> different of what other PMDs do.
> Assuming your NIC is PCI, you can access it via uio (igb_uio) or VFIO.
>
>> Perhaps this is more of a hugetlbfs question than DPDK but I thought I'd
>> start here.
>
> It's a pleasure to receive new drivers.
> Welcome here :)
>
> --
> Thomas


[dpdk-dev] Next Community Call, Tuesday 2nd December, 8:00 AM GMT

2014-12-01 Thread Alex Markuze
Hi, I've noticed that there is no bridge for Israel this time.
Could one be set up?

On Mon, Dec 1, 2014 at 4:49 AM, Tetsuya Mukawa  wrote:

> Hi Tim,
>
> Could I explain port hotplug function at next conference call?
> For this explanation, I've prepared slides. could I send a PDF to this ML?
> That slides describe what is the function, how it works and what is
> current progress.
> And it's under 100KB.
>
> Regards,
> Tetsuya
>
> (2014/11/22 1:08), O'driscoll, Tim wrote:
> > We're going to hold our next community call on Tuesday 2nd December.
> This time, we're going to try a time that's more suitable for participants
> in Asia, so we're going to hold it at 8:00 AM GMT. The meeting time in a
> variety of timezones is included below.
> >
> > Generally, GoToMeeting worked well last time, although there was a
> limitation that Neil was unable to present slides as he joined from a Linux
> system. We'll stick with GoToMeeting again this time as we don't yet have a
> better solution. Details for joining the GoToMeeting session are included
> below.
> >
> > I'll record the session again and post the video to YouTube afterwards
> for anybody who can't make it. This seemed to work well last time, and as
> Kevin pointed out, the audio quality on the recording is good.
> >
> > For the agenda, we'd like to discuss the following:
> > - Remaining 2.0 candidate features, especially PCI Hotplug as there's
> been a lot of discussion on that on the mailing list. Hopefully Tetsuya
> Mukawa can join us to describe his work on this.
> > - DPDK Test Suite. We hope to announce the release of this next week.
> Waterman Cao and Yong (Marvin) Liu from our Shanghai team will describe the
> functionality and benefits of this.
> >
> >
> > Meeting Time:
> > Dublin (Ireland) Tuesday, December 2, 2014 at 8:00:00 AM GMT UTC
> > Paris (France) Tuesday, December 2, 2014 at 9:00:00 AM CET UTC+1 hour
> > Tel Aviv (Israel) Tuesday, December 2, 2014 at 10:00:00 AM IST UTC+2
> hours
> > Moscow (Russia) Tuesday, December 2, 2014 at 11:00:00 AM MSK UTC+3 hours
> > New Delhi (India - Delhi) Tuesday, December 2, 2014 at 1:30:00 PM IST
> UTC+5:30 hours
> > Shanghai (China - Shanghai Municipality) Tuesday, December 2, 2014 at
> 4:00:00 PM CST UTC+8 hours
> > Tokyo (Japan) Tuesday, December 2, 2014 at 5:00:00 PM JST UTC+9 hours
> > San Francisco (U.S.A. - California) Midnight between Monday, December 1,
> 2014 and Tuesday, December 2, 2014 PST UTC-8 hours
> > Phoenix (U.S.A. - Arizona) Tuesday, December 2, 2014 at 1:00:00 AM MST
> UTC-7 hours
> > New York (U.S.A. - New York) Tuesday, December 2, 2014 at 3:00:00 AM EST
> UTC-5 hours
> > Ottawa (Canada - Ontario) Tuesday, December 2, 2014 at 3:00:00 AM EST
> UTC-5 hours
> > Corresponding UTC (GMT) Tuesday, December 2, 2014 at 08:00:00
> >
> >
> > GoToMeeting Details:
> > To join, follow the meeting link:
> https://global.gotomeeting.com/join/772753069. This will start the
> GoToMeeting web viewer. You then have two options for audio:
> >
> > 1. To use your computer's audio via a headset, you need to switch to the
> desktop version of GoToMeeting. You can do this by clicking the GoToMeeting
> icon on the top right hand side of the web viewer, and then selecting
> "Switch to the desktop version". The desktop version will need to download
> and install, so if you plan to use this you may want to get it set up in
> advance. Once it starts, under the Audio section, you can select "Mic &
> Speakers". The desktop version is only available for Windows and Mac, so if
> you're using Linux then you need to use option 2 below.
> >
> > 2. You can join using a phone via one of the numbers listed below. The
> Access Code is 772-753-069. You'll also be asked for an Audio PIN, which is
> accessible by clicking the phone icon in the GoToMeeting web viewer after
> you've joined the meeting.
> > - Australia   (Long distance): +61 2 9091 7601
> > - Austria   (Long distance): +43 (0) 7 2088 1036
> > - Belgium   (Long distance): +32 (0) 28 08 4345
> > - Canada   (Long distance): +1 (647) 497-9372
> > - Denmark   (Long distance): +45 (0) 69 91 89 24
> > - Finland   (Long distance): +358 (0) 942 45 0382
> > - France   (Long distance): +33 (0) 170 950 586
> > - Germany   (Long distance): +49 (0) 692 5736 7206
> > - Ireland   (Long distance): +353 (0) 15 255 598
> > - Italy   (Long distance): +39 0 694 80 31 28
> > - Netherlands   (Long distance): +31 (0) 208 084 055
> > - New Zealand   (Long distance): +64 (0) 4 974 7243
> > - Norway   (Long distance): +47 23 96 01 18
> > - Spain   (Long distance): +34 932 20 0506
> > - Sweden   (Long distance): +46 (0) 852 500 182
> > - Switzerland   (Long distance): +41 (0) 435 0824 78
> > - United Kingdom   (Long distance): +44 (0) 330 221 0098
> > - United States   (Long distance): +1 (626) 521-0013
> > Access Code 772-753-069.
> >
> > Info on downloading the desktop app is available at:
> http://support.citrixonline.com/en_US/meeting/help_files/G2M010002?title=Download%7D
> > Info on the 

[dpdk-dev] i40 on dpdk 1.7

2014-12-01 Thread Alex Markuze
Hi, we are currently using DPDK 1.7, and I've seen lots of patches for the
i40e PMD adding features and bug fixes.
How much functionality for the XL710 VF exists in DPDK 1.7?

Thanks
Alex


[dpdk-dev] Assign randomly generated MAC address

2014-12-02 Thread Alex Markuze
Hi, I'm seeing this message on EAL init, running from an ESX VM with an
Intel 82599 VF:

Assign randomly generated MAC address 02:09:c0:88:05:c6

The result is that the NIC anti-spoofing kills all my TX traffic, and for
some reason jumbo frames fail to go out from the second VF, which does work.

Can anyone shed some light on the issue? What additional info should I
provide?

The same code does run on a different host/VM; it seems that some configuration
is missing on the PF.

Thanks.


[dpdk-dev] two tso related questions

2014-12-16 Thread Alex Markuze
On Mon, Dec 15, 2014 at 10:20 PM, Helmut Sim  wrote:
>
> Hi,
>
> While working on TSO based solution I faced the following two questions:
>
> 1.
> is there a maximum pkt_len to be used with TSO?, e.g. let's say if seg_sz
> is 1400 can the entire segmented pkt be 256K (higer than 64K) ?, then the
> driver gets a list of chanined mbufs while the first mbuf is set to TSO
> offload.
>

TSO segments a TCP packet into MTU-sized pieces. The TCP/IP protocols are
limited to 64K because the length fields are 16 bits wide, so you can't build a
valid packet longer than 64K regardless of the NIC.
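For what it's worth, this is roughly how the TSO request is expressed on an
mbuf in DPDK 1.8 - a sketch only, assuming the 1.8 mbuf layout with the
l2_len/l3_len/l4_len/tso_segsz fields (check rte_mbuf.h for your version):

#include <rte_mbuf.h>
#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_tcp.h>

/* Sketch: request TSO for one (possibly chained) Ethernet/IPv4/TCP mbuf
 * of up to 64K total length. The driver may also expect the TCP
 * pseudo-header checksum to be pre-filled (see app/test-pmd/csumonly.c). */
static void
request_tso(struct rte_mbuf *m, uint16_t mss)
{
        m->l2_len = sizeof(struct ether_hdr);
        m->l3_len = sizeof(struct ipv4_hdr);
        m->l4_len = sizeof(struct tcp_hdr);
        m->tso_segsz = mss;          /* payload bytes per segment */
        m->ol_flags |= PKT_TX_IPV4 | PKT_TX_IP_CKSUM | PKT_TX_TCP_SEG;
}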


> 2.
> I wonder, Is there a specific reason why TSO is supported only for IXGBE
> and not for IGB ? the 82576 NIC supports TSO though.
> Is it due to a kind of tecnical barrier or is it because of priorities?
>
> It will be great if someone from the forum could address this.
>
> Thanks,
> Sim
>


[dpdk-dev] two tso related questions

2014-12-16 Thread Alex Markuze
On Tue, Dec 16, 2014 at 2:24 PM, Helmut Sim  wrote:
>
> Thanks Alex,
>
> So i probably miss something...
> what you are saying is correct for IP segmentation where the segmentation
> is at the IP level, and all segments are identified according to the
> Identification field in the IP header.
>
> However in TCP segmentation the segments are at the TCP level (isn't it?),
> where each frame is at a size of
> MSS+sizeof(tcp_hdr)+sizeof(ip_hdr)+sizeof(eth_hdr).
> Hence, for each of the sent packets, the IP Identification is 0 and the IP
> total length is MSS+sizeof(tcp_hdr)+sizeof(ip_hdr).
>
> Please correct me if i am wrong.
>
TSO takes one packet of max size 64KB (not counting the MAC/VLAN headers) and
breaks it into valid MTU-sized packets, each with its own IP and TCP header.
I'm not sure how the identification/frag-off fields are filled. You
can easily check it by running a short TCP stream (iperf/netperf) between two
machines and capturing the packets with tcpdump (open them in Wireshark); use
ethtool -K to disable LRO/GRO (the receive-side kernel driver will
rearrange the headers otherwise).

I hope this helps.

>
>
thanks.
>
> On Tue, Dec 16, 2014 at 11:10 AM, Alex Markuze  wrote:
>>
>>
>>
>> On Mon, Dec 15, 2014 at 10:20 PM, Helmut Sim  wrote:
>>>
>>> Hi,
>>>
>>> While working on TSO based solution I faced the following two questions:
>>>
>>> 1.
>>> is there a maximum pkt_len to be used with TSO?, e.g. let's say if seg_sz
>>> is 1400 can the entire segmented pkt be 256K (higer than 64K) ?, then the
>>> driver gets a list of chanined mbufs while the first mbuf is set to TSO
>>> offload.
>>>
>>
>> TSO segments a TCP packet into mtu sied bits. The TCP/IP protocols are
>> limited to 64K due to the length fields being 16bit wide. You can't build a
>> valid packet longer then 64K regardless of the NIC.
>>
>>
>>> 2.
>>> I wonder, Is there a specific reason why TSO is supported only for IXGBE
>>> and not for IGB ? the 82576 NIC supports TSO though.
>>> Is it due to a kind of tecnical barrier or is it because of priorities?
>>>
>>> It will be great if someone from the forum could address this.
>>>
>>> Thanks,
>>> Sim
>>>
>>


[dpdk-dev] rte_mempool_create fails with ENOMEM

2014-12-18 Thread Alex Markuze
I've also seen a similar issue when trying to run a DPDK app which
allocates huge pools (~0.5 GB) after a memory-heavy operation on the machine.

I've come to the same conclusion as you did: that internal fragmentation is
causing the pool creation failures.
It seems that rte_mempool_xmem_create/rte_memzone_reserve_aligned are
attempting to create physically contiguous pools, which may offer a slight
performance gain(?) but may cause unpredictable allocation issues - a
big risk for DC deployments, where hundreds or even thousands of machines
may be deployed with a DPDK app and fail inexplicably.
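One way to see the fragmentation is to dump the hugepage segments the EAL
ended up with - a minimal sketch, assuming DPDK 1.7's
rte_eal_get_physmem_layout() and the RTE_MAX_MEMSEG constant from the build
config:

#include <stdio.h>
#include <inttypes.h>
#include <rte_memory.h>

/* Sketch: print each physical memory segment the EAL reserved, to see
 * whether any contiguous segment is still large enough for the pool. */
static void
dump_memsegs(void)
{
        const struct rte_memseg *ms = rte_eal_get_physmem_layout();
        unsigned i;

        for (i = 0; i < RTE_MAX_MEMSEG; i++) {
                if (ms[i].addr == NULL)
                        break;
                printf("seg %u: phys=0x%" PRIx64 " len=%zu hugepage_sz=%" PRIu64 "\n",
                       i, (uint64_t)ms[i].phys_addr, ms[i].len,
                       ms[i].hugepage_sz);
        }
}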

I didn't really get the chance to dig into the memory management internals
of DPDK, so feel free to correct me where I'm off.

Thanks.

On Thu, Dec 18, 2014 at 3:25 PM, Newman Poborsky 
wrote:
>
> Hi,
>
> could someone please provide any explanation why sometimes mempool creation
> fails with ENOMEM?
>
> I run my test app several times without any problems and then I start
> getting ENOMEM error when creating mempool that are used for packets. I try
> to delete everything from /mnt/huge, I increase the number of huge pages,
> remount /mnt/huge but nothing helps.
>
> There is more than enough memory on server. I tried to debug
> rte_mempool_create() call and it seems that after server is restarted free
> mem segments are bigger than 2MB, but after running test app for several
> times, it seems that all free mem segments have a size of 2MB, and since I
> am requesting 8MB for my packet mempool, this fails.  I'm not really sure
> that this conclusion is correct.
>
> Does anybody have any idea what to check and how running my test app
> several times affects hugepages?
>
> For me, this doesn't make any since because after test app exits, resources
> should be freed, right?
>
> This has been driving me crazy for days now. I tried reading a bit more
> theory about hugepages, but didn't find out anything that could help me.
> Maybe it's something else and completely trivial, but I can't figure it
> out, so any help is appreciated.
>
> Thank you!
>
> BR,
> Newman P.
>


[dpdk-dev] VLAN header insertion and removal

2014-12-21 Thread Alex Markuze
On ingress, when configuring the device:
1. set rte_eth_conf.rxmode.hw_vlan_strip = 1.

On egress you need to modify the vlan_macip field in the sent mbuf*
(rte_mbuf.h):
1. add PKT_TX_VLAN_PKT to the ol_flags field;
2. fill the VLAN tag in pkt.vlan_macip.f.vlan_tci.

*This is true for DPDK 1.7; the structs may have moved in 1.8.
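A minimal sketch of both sides for DPDK 1.7 (field names as in the 1.7
rte_ethdev.h/rte_mbuf.h):

#include <rte_ethdev.h>
#include <rte_mbuf.h>

/* RX side: ask the NIC to strip the VLAN header on ingress. */
static struct rte_eth_conf port_conf = {
        .rxmode = {
                .hw_vlan_strip = 1,
        },
};

/* TX side: ask the NIC to insert a VLAN tag on egress (DPDK 1.7 mbuf). */
static void
tag_for_tx(struct rte_mbuf *pkt, uint16_t vlan_tci)
{
        pkt->ol_flags |= PKT_TX_VLAN_PKT;
        pkt->pkt.vlan_macip.f.vlan_tci = vlan_tci;
}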

On Sat, Dec 20, 2014 at 8:39 PM, Padam Jeet Singh 
wrote:

> Hello,
>
> I have done a simple mbuf adjust and prepend to achieve the removal and
> insertion of the vlan header and it works fine. The use case is something
> similar to the l3fwd example where one port has traffic coming in on
> multiple vlans and the other port has no vlans. The packet processing path
> in the middle inserts or removes the vlan.
>
> Is there an offload flag or mode where this can be done in hardware? Or is
> there a more optimized way to do this in dpdk? For sake of performance I
> want to avoid the copy ethernet header, modify mbuf, copy back the header
> steps.
>
> Thanks,
> Padam
> ---
> Sent from my mobile. Please excuse the brevity, spelling and punctuation.
>
>


[dpdk-dev] Air conditioner repair man is running late, ergo I'll be delayed.

2014-12-31 Thread Alex Markuze



[dpdk-dev] Memory Pinning.

2014-07-01 Thread Alex Markuze
On Mon, Jun 30, 2014 at 7:55 PM, Richardson, Bruce <
bruce.richardson at intel.com> wrote:

> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> > Sent: Monday, June 30, 2014 3:01 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Memory Pinning.
> >
> > Hi, Guys.
> > I have several newbie questions about the DPDK design I was hoping some
> one
> > could answer.
> >
> > Both in the RX and TX flow, the Buffer Memory must be pinned and not
> > swappable.
> > In RDMA, memory is explicitly registered and made pinned (to the limit
> > defined @ /etc/security/limits.conf) .With regular sockets/kernel driver
> > the NIC DMA's the buffer from/to the kernel which are by definition un
> > swappable.
> >
> > So I'm guessing that at least the TX/RX buffers are mapped to kernel
> space.
> >
> > My questions are 1. How are the buffers made unswappable ? Are they
> shared
> > with the kernel 2. When and Which buffers are mapped/unmapped to the
> kernel
> > space. 3. When are the buffers DMA mapped and by whom?
>
> The memory used is all hugepage memory and as such is not swappable by the
> kernel, so remains in place for the duration of the application. At
> initialization time, we query from the kernel via /proc the physical
> address of the pages being used, and when sending buffers to the NIC we use
> those physical addresses directly.
>
>Thanks for the clarification, the actual physical memory can be used in
the write descriptor only when the iova is the same as the physical
address. When the IOMMU is enabled (intel_iommu=on) - which AFAIK defaults
to deferred protection - each device will have its own
notion of the iova (which is what can actually be used for the DMA op) for the same
physical address.

So how does DPDK handle IOMMU currently?
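(As a side note on the mechanism Bruce describes above: the virtual-to-physical
lookup is also exposed to applications. A minimal sketch, assuming DPDK 1.7's
rte_mem_virt2phy() from rte_memory.h:)

#include <stdio.h>
#include <inttypes.h>
#include <rte_memory.h>

/* Sketch: print the physical address the EAL recorded for a buffer that
 * lives in hugepage memory (e.g. mbuf data). */
static void
print_phys(const void *buf)
{
        phys_addr_t pa = rte_mem_virt2phy(buf);

        printf("%p -> phys 0x%" PRIx64 "\n", buf, (uint64_t)pa);
}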

> >
> > And another "bonus" Question. On TX flow I didn't find a way to receive a
> > send completion.
> > So how Can I know when its safe to modify the sent buffers (besides of
> > waiting for the ring buffer to complete a full circle)?
>
> This will depend upon the configuration of the NIC on TX. By default when
> using the fast-path we have the NIC only write-back confirmation of a
> packet being sent every 32 packets. You can poll the ring for this
> notification and which point you know all previous packets have been sent.
> If you want to know on a per-packet basis as soon as the packet is sent,
> you'll need to change the write-back threshold to write back every packet.
> That will impact performance, though. Note, too, that there are no APIs
> right now to query if a particular packet is sent, so you will have to
> write the code to scan the TX rings directly yourself.
>
> /Bruce
>


[dpdk-dev] Intel DPDK: exception_path:RTE_ARCH

2014-07-02 Thread Alex Markuze
You need to define the following variables before compiling; please refer
to the DPDK documentation.


export RTE_TARGET=x86_64-native-linuxapp-gcc

export RTE_SDK=/home/user/dpdk


On Wed, Jul 2, 2014 at 12:48 PM, sothy shan  wrote:

> Hello!
>
> I started playing Intel DPDK example. I used to compile exception_path
> code.
> When I do make command "make", I got an error,
>
> RTE_ARCH is not set. So I set the variable via terminal bash using
>
> export RTE_ARCH=x86_64
>
> Stilll it is not working. Any wrong anywhere?
>
> Thanks for your reponse.
>
> Best regards
> Sothy
>


[dpdk-dev] KNI hw Address.

2014-07-02 Thread Alex Markuze
Hi, I'm playing with KNI on a VM (KVM). The interface that is created has
no MAC address until an IP is set via ifconfig - then a random MAC is
assigned.
The VF has a MAC address that is easily retrieved with rte_eth_macaddr_get.

What I did not find is a way to create the KNI interface with that specific MAC
address.
What are the ways to set the KNI MAC address?

Thanks
Alex


[dpdk-dev] KNI hw Address.

2014-07-03 Thread Alex Markuze
Thanks, guys.
I think I will modify the KNI alloc API, since with the way things are
implemented today the KNI interface can't transmit because of the NIC MAC
anti-spoofing (without hacking the data path, similar to the l2fwd example). The
other issue is that because the interface is created with an initial MAC address
of all zeros, I'm pretty sure DHCP is also out of the question.

Thanks for the detailed info.


On Thu, Jul 3, 2014 at 9:22 AM, Zhang, Helin  wrote:

>
>
> > -Original Message-
> > From: Padam J. Singh [mailto:padam.singh at inventum.net]
> > Sent: Thursday, July 3, 2014 1:06 PM
> > To: Zhang, Helin
> > Cc: Alex Markuze; dev at dpdk.org
> > Subject: Re: [dpdk-dev] KNI hw Address.
> >
> > Zhang, Alex,
> >
> > Please see the patch I had submitted a few days back which allows
> setting the
> > MAC address using
> >
> > ifconfig ... hw ether MAC-ADDRESS
> >
> > An "ifconfig DEV up" , followed by this sets the MAC address.
> >
> > Thanks,
> > Padam
> >
> > On 03-Jul-2014, at 10:26 am, Zhang, Helin  wrote:
> >
> > >
> > >> -Original Message-
> > >> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> > >> Sent: Wednesday, July 2, 2014 11:57 PM
> > >> To: dev at dpdk.org
> > >> Subject: [dpdk-dev] KNI hw Address.
> > >>
> > >> Hi, I'm playing with KNI on a VM (kvm), the Interface that is created
> > >> has no MAC address until the IP is set via ifconfig  - then a random
> mac is
> > created.
> > >> The VF has a mac address that is easily retrieved with
> > rte_eth_macaddr_get.
> > >>
> > >> What I did not find is a way to create the KNI with that specific mac
> > >> address enabled.
> > >> What are the ways to set the KNI mac address?
> > >>
> > >> Thanks
> > >> Alex
> > >
> > > Hi Alex
> > >
> > > No way without modifying the code. Two ways can be taken into account
> as
> > below.
> > >
> > > 1. Implement ndo_set_mac in KNI kernel module to set the MAC address.
> > > 2. Add mac address as one more parameters in user space KNI interface,
> to
> > tell the kernel module the mac during kni device creation.
> > >
> > > Regards,
> > > Helin
>
> Hi Padam
>
> Great! I think you have implemented the first way I listed. It is good for
> VM environments. I remember that might be adopted by some projects based on
> DPDK. Thank you!
> I will review your patch, and possibly add reviewed-by: to your patch.
>
> Regards,
> Helin
>


[dpdk-dev] DPDK Performance issue with l2fwd

2014-07-10 Thread Alex Markuze
Hi Zachary,
Your issue may be with PCIe 3.0: with 16 lanes, each slot is limited to
128 Gb/s [3].
Now, AFAIK [1] the CPU is connected to the I/O with a single PCIe slot.

Several thoughts that may help you:

1. You can figure out the max B/W by running netperf over the kernel
interfaces (without DPDK). Each CPU can handle the netperf stream and the completion
interrupts with grace (packets of 64K and all offloads on) for 10Gb NICs.
With more than 10 NICs I would disable the IRQ balancer and make sure
interrupts are spread evenly by setting the IRQ affinity manually [2].
As long as you have a physical core (no hyperthreading) per NIC port, you can
figure out the max B/W you can get with all the NICs.

2. You can try using (if available to you, obviously) 40Gb and 56Gb NICs
(Mellanox). In this case, for each netperf flow you will need to separate
each netperf stream and the interrupts onto different cores to reach wire
speed, as long as both cores are on the same NUMA node (lscpu).

Hope this helps.

[1]
http://komposter.com.ua/documents/PCI_Express_Base_Specification_Revision_3.0.pdf
[2]
http://h50146.www5.hp.com/products/software/oe/linux/mainstream/support/whitepaper/pdfs/4AA4-9294ENW.pdf
[3]http://en.wikipedia.org/wiki/PCI_Express#PCI_Express_3.x


On Thu, Jul 10, 2014 at 11:07 AM,  wrote:

> Hey Guys,
>
> Recently, I have used l2fwd to test 160G (82599 10G * 16 ports), but I
> got a strange pheromone in my test.
>
> When I used 12 ports to test the performance of l2fwd, it can work fine
> and achieve 120G.
> But it got abnormal when I using over than 12 port. Part of ports seems
> something wrong and no any Tx/Rx.
> Has anyone know about this?
>
> My testing Environment.
> 1. E5-2658 v2 (10 cores) * 2
>
> http://ark.intel.com/zh-tw/products/76160/Intel-Xeon-Processor-E5-2658-v2-25M-Cache-2_40-GHz
> 2. one core handle one port. (In order to get best performance.)
> 3. No any QPI crossing  issue.
> 4. l2fwd parameters
>  4.1 -c 0xF0FF -- -P 0xF00FF  => 120G get!
>  4.2 -c 0xFF0FF -- -P 0xFF0FF => Failed! Only first 10 ports can
> work well.
>  4.3 -c 0x3F3FF -- -P 0x3F3FF => Failed! Only first 10 ports can
> work well.
>
> BTW, I have tried lots of parameter sets and if I set the ports number
> over than 12 ports, it only first 10 ports got work.
> Else, everything got well.
>
> Can anyone help me to solve the issue? Or DPDK only can set less equal
> than 12 ports?
> Or DPDK max throughput is 120G?
>
>


[dpdk-dev] Hardware Offloads Support for VF

2014-07-14 Thread Alex Markuze
Hi,
I have a Virtual setup with an Intel 82599 NIC (VF).

I'm trying to disable CRC stripping, and the flag is gracefully ignored.
This seems to be documented in the DPDK June release notes (6.16).

Are these limitations (jumbo frames, CRC stripping, checksum) NIC
(HW/FW) limitations? Or is this something that can be resolved in SW to
allow separate per-VF configuration?

Thanks.


[dpdk-dev] [PATCH v2] kni: use netif_rx instead of netif_receive_skb in which ocurr deallock on userpace contex

2014-07-17 Thread Alex Markuze
On Thu, Jul 17, 2014 at 3:02 PM, Thomas Monjalon
 wrote:
> Hi,
>
> 2014-07-11 23:37, Yao-Po Wang:
>> Per netif_receive_skb function description, it may only be called from
>> interrupt contex, but KNI is run on kthread that like as user-space
>> contex. It may occur deallock, if netif_receive_skb called from kthread,
>> so it should be repleaced by netif_rx or adding local_bh_disable/enable
>> around netif_receive_skb.
>>
>> Signed-off-by: Yao-Po Wang 
>
>> --- a/lib/librte_eal/linuxapp/kni/kni_net.c
>> +++ b/lib/librte_eal/linuxapp/kni/kni_net.c
>>   /* Call netif interface */
>> - netif_receive_skb(skb);
>> + netif_rx(skb);
>
> Is there someone confident to approve this change?

Yao-Po is correct.

Please see this comment in the Linux source code:
http://lxr.free-electrons.com/source/net/core/dev.c#L3715

Context:
All of today's network drivers (ixgbe is no exception) use NAPI, which runs
the receive flow in softirq context.
That is the context netif_receive_skb should be called in.
KNI does not run in softirq context, so the generic netif_rx is the
right function.
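For completeness, a sketch of the two safe patterns the patch description
mentions (Linux kernel code running in kthread/process context; an
illustration only, not the exact KNI patch):

#include <linux/netdevice.h>
#include <linux/bottom_half.h>

/* Option 1: hand the skb to the backlog queue; safe from any context. */
static void kni_rx_option1(struct sk_buff *skb)
{
        netif_rx(skb);
}

/* Option 2: keep netif_receive_skb(), but disable bottom halves around
 * it so it is not invoked from plain kthread context. */
static void kni_rx_option2(struct sk_buff *skb)
{
        local_bh_disable();
        netif_receive_skb(skb);
        local_bh_enable();
}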

>
> --
> Thomas


[dpdk-dev] About round trip latency with DPDK

2014-07-24 Thread Alex Markuze
Kai, the latency depends both on what you do and how much you send.
A bigger packet will take longer time to transmit.

Now that thats out of the way I propose you use perf to see how busy
is the cpu and with what.

FYI, ~10us is something that can be achieved with netperf   with a
kernel driver based on interrupts.
The 0.7m latency indicates that something is wrong with your system.

Basically make sure that the cpu is busy with polling only and a small
percent handling the messages.

Hope this helps somehow.

On Wed, Jul 23, 2014 at 10:24 PM, Kai Zhang  wrote:
> Hello,
>
> I am trying to develop a low-latency application, and I measured the round
> trip latency with DPDK. However I got an average of 650~720 microseconds
> round-trip latency with Intel 82599 10Gbps NIC.
>
> The experiment method is as follows. 2 machines (A and B) are connected
> back-to-back. Machine A embeds a time stamp in the packet and sends to B, B
> (use testpmd or l2fwd) forwards packets back to A immediately (A->B->A),
> and A receives packets and calculates time difference between current time
> and the embedded time stamp. (code :
> https://github.com/kay21s/dpdk/tree/master/examples/recv_send)
>
> I have 3 machines, and performing the above experiment on each pair leads
> to a similar latency. However, previous academic papers report that DPDK
> offers only a few 10 microseconds round trip latency.
>
> What's the round trip latency DPDK is supposed to offer? Have you measured
> it at Intel?
>
> Thanks a lot,
> Kai


[dpdk-dev] DPDK Demos at IDF conference using DDIO

2014-09-28 Thread Alex Markuze
Even if only the demo is available, it would be useful.
I assume the people behind the demo are part of this mailing list (or
someone on the mailing list knows them).
It would be great if the demo were publicly available anywhere.

On Thu, Sep 25, 2014 at 11:09 PM, Matthew Hall  wrote:
> On Thu, Sep 25, 2014 at 07:27:21PM +, Anjali Kulkarni wrote:
>> Actually, in the demo that I saw they had probably used as many of the
>> accelerations as possible to get the kind of rates they described. Even if
>> we could see (a documentation of) what all things they used in this
>> particular application, it would help.
>> From my discussions, it seemed as if there were some specific lookup APIs
>> that they used to get better performance.
>>
>> Anjali
>
> Indeed it would be best if this stuff were documented first, then demoed.
> Otherwise it's hard to get reliably reproducible results.
>
> In particular something which went all the way through the processing pipeline
> from Rx to Tx L1-L7.
>
> Matthew.


[dpdk-dev] GSO support by PMD drivers

2014-09-28 Thread Alex Markuze
LSO/TSO support is an important feature; I'm surprised it's not
supported in DPDK.
I personally would like to see these patches.


On Fri, Sep 26, 2014 at 1:23 PM, Vadim Suraev  wrote:
> Hi, all,
> I found that ixgbe together with rte_mbuf (and probably other PMD drivers)
> doesn't support GSO. I reverse engineered the Linux kernel's ixgbe GSO
> support and got it working in 1.6. Could it be useful to provide the patch?
> Regards,
>  Vadim.


[dpdk-dev] DPDK doesn't work with iommu=pt

2014-09-28 Thread Alex Markuze
iommu=pt effectively disables the IOMMU for the kernel; the IOMMU is
enabled only for KVM.
http://lwn.net/Articles/329174/

Basically, unless you have KVM running, you can remove both options for
the same effect.
On the other hand, if you do have KVM and you do want iommu=on, you can
remove the iommu=pt for the same performance, because AFAIK, unlike the
kernel drivers, DPDK doesn't dma_map and dma_unmap each and every
ingress/egress packet (please correct me if I'm wrong), and so will not
suffer any performance penalties.
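
To spell out the two boot-parameter variants being discussed (illustrative
only):

# IOMMU translation bypassed for host drivers, still active for KVM guests:
intel_iommu=on iommu=pt

# Full IOMMU translation for the host as well (see the note on kernel
# drivers below):
intel_iommu=on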

FYI, kernel NIC drivers:
When iommu=on{,strict}, the kernel network drivers suffer a heavy
performance penalty due to constant IOVA modifications (both HW and SW are
at fault here). ixgbe and Mellanox reuse DMA-mapped pages on the
receive side to avoid this penalty, but still pay the IOMMU cost on TX.

On Fri, Sep 26, 2014 at 5:47 PM, Choi, Sy Jong  
wrote:
> Hi Shimamoto-san,
>
> There are a lot of sighting relate to "DMAR:[fault reason 06] PTE Read access 
> is not set"
> https://www.mail-archive.com/kvm at vger.kernel.org/msg106573.html
>
> This might be related to IOMMU, and kernel code.
>
> Here is what we know :-
> 1) Disabling VT-d in bios also removed the symptom
> 2) Switch to another OS distribution also removed the symptom
> 3) even different HW we will not see the symptom. In my case, switch from 
> Engineering board to EPSD board.
>
> Regards,
> Choi, Sy Jong
> Platform Application Engineer
>
>
> -Original Message-
> From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto
> Sent: Friday, September 26, 2014 5:14 PM
> To: dev at dpdk.org
> Cc: Hayato Momma
> Subject: [dpdk-dev] DPDK doesn't work with iommu=pt
>
> I encountered an issue that DPDK doesn't work with "iommu=pt intel_iommu=on"
> on HP ProLiant DL380p Gen8 server. I'm using the following environment;
>
>   HW: ProLiant DL380p Gen8
>   CPU: E5-2697 v2
>   OS: RHEL7
>   kernel: kernel-3.10.0-123 and the latest kernel 3.17-rc6+
>   DPDK: v1.7.1-53-gce5abac
>   NIC: 82599ES
>
> When boot with "iommu=pt intel_iommu=on", I got the below message and no 
> packets are handled.
>
>   [  120.809611] dmar: DRHD: handling fault status reg 2
>   [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr 
> aa01
>   DMAR:[fault reason 02] Present bit in context entry is clear
>
> How to reproduce;
> just run testpmd
> # ./testpmd -c 0xf -n 4 -- -i
>
> Configuring Port 0 (socket 0)
> PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x754eafc0 
> hw_ring=0x7420 dma_addr=0xaa00
> PMD: ixgbe_dev_tx_queue_setup(): Using full-featured tx code path
> PMD: ixgbe_dev_tx_queue_setup():  - txq_flags = 0 [IXGBE_SIMPLE_FLAGS=f01]
> PMD: ixgbe_dev_tx_queue_setup():  - tx_rs_thresh = 32 
> [RTE_PMD_IXGBE_TX_MAX_BURST=32]
> PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740 
> hw_ring=0x7421 dma_addr=0xaa01
> PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc 
> Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
> PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are not 
> satisfied, Scattered Rx is requested, or RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC 
> is not enabled (port=0, queue=0).
> PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc 
> Preconditions: rxq->rx_free_thresh=0, RTE_PMD_IXGBE_RX_MAX_BURST=32
>
> testpmd> start
>   io packet forwarding - CRC stripping disabled - packets/burst=32
>   nb forwarding cores=1 - nb forwarding ports=2
>   RX queues=1 - RX desc=128 - RX free threshold=0
>   RX threshold registers: pthresh=8 hthresh=8 wthresh=0
>   TX queues=1 - TX desc=512 - TX free threshold=0
>   TX threshold registers: pthresh=32 hthresh=0 wthresh=0
>   TX RS bit threshold=0 - TXQ flags=0x0
>
>
> and ping from another box to this server.
> # ping6 -I eth2 ff02::1
>
> I got the below error message and no packet is received.
> I couldn't see any increase RX/TX count in testpmt statistics
>
> testpmd> show port stats 0
>
>    NIC statistics for port 0  
>   RX-packets: 6  RX-missed: 0  RX-bytes:  732
>   RX-badcrc:  0  RX-badlen: 0  RX-errors: 0
>   RX-nombuf:  0
>   TX-packets: 0  TX-errors: 0  TX-bytes:  0
>   
> testpmd> show port stats 0
>
>    NIC statistics for port 0  
>   RX-packets: 6  RX-missed: 0  RX-bytes:  732
>   RX-badcrc:  0  RX-badlen: 0  RX-errors: 0
>   RX-nombuf:  0
>   TX-packets: 0  TX-errors: 0  TX-bytes:  0
>   
>
>
> The fault addr in error message must be RX DMA descriptor
>
> error message
>   [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr 
> aa01
>
> log in testpmd
>   PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740 
>

[dpdk-dev] DPDK doesn't work with iommu=pt

2014-09-29 Thread Alex Markuze
On Mon, Sep 29, 2014 at 2:53 AM, Hiroshi Shimamoto 
wrote:
> Hi,
>
>> Subject: Re: [dpdk-dev] DPDK doesn't work with iommu=pt
>>
>> iommu=pt effectively disables iommu for the kernel and iommu is
>> enabled only for KVM.
>> http://lwn.net/Articles/329174/
>
> thanks for pointing that.
>
> Okay, I think DPDK cannot handle IOMMU because of no kernel code in
> DPDK application.
>
> And now, I think "iommu=pt" doesn't work correctly DMA on host PMD
> causes DMAR fault which means IOMMU catches a wrong operation.
> Will dig around "iommu=pt".
>
I agree with your analysis. It seems that a fairly recent patch (3~4
months old) has introduced a bug that confuses an unprotected DMA access by
the device with an IOMMU access, and produces the equivalent of a page fault.

>>
>> Basically unless you have KVM running you can remove both lines for
>> the same effect.
>> On the other hand if you do have KVM and you do want iommu=on You can
>> remove the iommu=pt for the same performance because AFAIK unlike the
>> kernel drivers DPDK doesn't dma_map and dma_unman each and every
>> ingress/egress packet (Please correct me if I'm wrong), and will not
>> suffer any performance penalties.
>
> I also tried "iommu=on", but it didn't fix the issue.
> I saw the same error messages in kernel.
>

Just to clarify, what I suggested you try is leaving only this string on
the command line: "intel_iommu=on", i.e. without iommu=pt.
But this would only work if DPDK can handle IOVAs (I/O virtual addresses).

>   [   46.978097] dmar: DRHD: handling fault status reg 2
>   [   46.978120] dmar: DMAR:[DMA Read] Request device [21:00.0] fault addr
aa01
>   DMAR:[fault reason 02] Present bit in context entry is clear
>
> thanks,
> Hiroshi
>
>>
>> FYI. Kernel NIC drivers:
>> When iommu=on{,strict} the kernel network drivers will suffer a heavy
>> performance penalty due to regular IOVA modifications (both HW and SW
>> at fault here). Ixgbe and Mellanox reuse dma_mapped pages on the
>> receive side to avoid this penalty, but still suffer from iommu on TX.
>>
>> On Fri, Sep 26, 2014 at 5:47 PM, Choi, Sy Jong 
wrote:
>> > Hi Shimamoto-san,
>> >
>> > There are a lot of sighting relate to "DMAR:[fault reason 06] PTE Read
access is not set"
>> > https://www.mail-archive.com/kvm at vger.kernel.org/msg106573.html
>> >
>> > This might be related to IOMMU, and kernel code.
>> >
>> > Here is what we know :-
>> > 1) Disabling VT-d in bios also removed the symptom
>> > 2) Switch to another OS distribution also removed the symptom
>> > 3) even different HW we will not see the symptom. In my case, switch
from Engineering board to EPSD board.
>> >
>> > Regards,
>> > Choi, Sy Jong
>> > Platform Application Engineer
>> >
>> >
>> > -Original Message-
>> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Hiroshi Shimamoto
>> > Sent: Friday, September 26, 2014 5:14 PM
>> > To: dev at dpdk.org
>> > Cc: Hayato Momma
>> > Subject: [dpdk-dev] DPDK doesn't work with iommu=pt
>> >
>> > I encountered an issue that DPDK doesn't work with "iommu=pt intel_
iommu=on"
>> > on HP ProLiant DL380p Gen8 server. I'm using the following environment;
>> >
>> >   HW: ProLiant DL380p Gen8
>> >   CPU: E5-2697 v2
>> >   OS: RHEL7
>> >   kernel: kernel-3.10.0-123 and the latest kernel 3.17-rc6+
>> >   DPDK: v1.7.1-53-gce5abac
>> >   NIC: 82599ES
>> >
>> > When boot with "iommu=pt intel_iommu=on", I got the below message and
no packets are handled.
>> >
>> >   [  120.809611] dmar: DRHD: handling fault status reg 2
>> >   [  120.809635] dmar: DMAR:[DMA Read] Request device [21:00.0] fault
addr aa01
>> >   DMAR:[fault reason 02] Present bit in context entry is clear
>> >
>> > How to reproduce;
>> > just run testpmd
>> > # ./testpmd -c 0xf -n 4 -- -i
>> >
>> > Configuring Port 0 (socket 0)
>> > PMD: ixgbe_dev_tx_queue_setup(): sw_ring=0x754eafc0 
>> > hw_ring=0x7420
dma_addr=0xaa00
>> > PMD: ixgbe_dev_tx_queue_setup(): Using full-featured tx code path
>> > PMD: ixgbe_dev_tx_queue_setup():  - txq_flags = 0 [IXGBE
_SIMPLE_FLAGS=f01]
>> > PMD: ixgbe_dev_tx_queue_setup():  - tx_rs_thresh = 32 [RTE_PMD_IXGBE
_TX_MAX_BURST=32]
>> > PMD: ixgbe_dev_rx_queue_setup(): sw_ring=0x754ea740 
>> > hw_ring=0x7421
dma_addr=0xaa01
>> > PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc
Preconditions: rxq->rx_free_thresh=0,
>> RTE_PMD_IXGBE_RX_MAX_BURST=32
>> > PMD: ixgbe_dev_rx_queue_setup(): Rx Burst Bulk Alloc Preconditions are
not satisfied, Scattered Rx is requested, or
>> RTE_LIBRTE_IXGBE_RX_ALLOW_BULK_ALLOC is not enabled (port=0, queue=0).
>> > PMD: check_rx_burst_bulk_alloc_preconditions(): Rx Burst Bulk Alloc
Preconditions: rxq->rx_free_thresh=0,
>> RTE_PMD_IXGBE_RX_MAX_BURST=32
>> >
>> > testpmd> start
>> >   io packet forwarding - CRC stripping disabled - packets/burst=32
>> >   nb forwarding cores=1 - nb forwarding ports=2
>> >   RX queues=1 - RX desc=128 - RX free threshold=0
>> >   RX threshold registers: pthresh=8 hthresh=8 wthresh=0
>> >   TX q

[dpdk-dev] two tso related questions

2015-01-04 Thread Alex Markuze
On Sun, Jan 4, 2015 at 10:50 AM, Helmut Sim  wrote:

> Hi Alex and Olivier,
>
> Alex, I made the test and the segmentation is not at the IP level (i.e.
> each packet ip total length indicated the mss length), hence the 16 bits
> total length limitation is not relevant here.
>

Oliver, thanks for reporting back. This is interesting, but it doesn't come as
a surprise, as the headers must be correct once on the wire; what I can't
tell you is what happens with the identification/fragment-offset fields.

The IP length limitation comes from the send-side network stack;
theoretically it is possible to send a packet of any size, as long as your
network stack doesn't mind sending a packet with a malformed IP header (as
the 16-bit length field cannot be set correctly).
The send-side HW receives a single packet whose IP header carries the length
of the whole packet. Assuming that the ixgbe HW takes the packet length for
its TSO segmentation from the TX descriptor rather than from the IP header, it
should be able to send as much as the HW supports.
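
As a point of reference, this is roughly how a TSO send is described to the
HW through the mbuf with the API that landed around DPDK 1.8 (field names per
that release; the values here are illustrative). The point is that the MSS and
header lengths travel in the descriptor, not in the IP header:

#include <rte_ether.h>
#include <rte_ip.h>
#include <rte_mbuf.h>
#include <rte_tcp.h>

/* 'm' is the first mbuf of a (possibly chained) packet holding the full TCP
 * payload; the PMD/HW will cut it into MSS-sized segments on the wire. */
static void prepare_tso(struct rte_mbuf *m, uint16_t mss)
{
        m->ol_flags |= PKT_TX_TCP_SEG | PKT_TX_IP_CKSUM | PKT_TX_IPV4;
        m->l2_len    = sizeof(struct ether_hdr);
        m->l3_len    = sizeof(struct ipv4_hdr);
        m->l4_len    = sizeof(struct tcp_hdr);     /* no TCP options assumed */
        m->tso_segsz = mss;                        /* e.g. 1400 */
}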


> I went over the 82599 datasheet and as Olivier mentioned it is a 18 bits
> field, hence allowing up to 256KB length.
>
> Olivier, although tcp window size field is 16 bits the advertised window
> is typically higher than 64KB using the TCP window scaling option (which is
> the common usage today).
>
> Hence I think that the API should allow at least up to 256KB packet
> length, while finding a solution to make sure it also support lower lengths
> for other NICs.
>
> Any idea?
>
> Sim
>
> On Wed, Dec 17, 2014 at 3:02 PM, Olivier MATZ 
> wrote:
>
>> Hi Helmut,
>>
>> On 12/17/2014 08:17 AM, Helmut Sim wrote:
>>
>>> While working on TSO based solution I faced the following two questions:
>>>
>>> 1.
>>> is there a maximum pkt_len to be used with TSO?, e.g. let's say if
>>> seg_sz
>>> is 1400 can the entire segmented pkt be 256K (higer than 64K) ?, then
>>> the
>>> driver gets a list of chanined mbufs while the first mbuf is set to
>>> TSO
>>> offload.
>>>
>>
>> I think the limitations depend on:
>>
>> - the window size advertised by the peer: your stack should handle this
>>   and not generate more packets that what the peer can receive
>>
>> - the driver: on ixgbe, the maximum payload length is 2^18. I don't know
>>   if there is a limitation on number of chained descriptors.
>>
>> I think we should define a way to know this limitation in the API. Maybe
>> a comment saying that the TSO length should not be higher than 256KB (or
>> fix it to 64KB in case future drivers do not support 256KB) is enough.
>>
>> Regards,
>> Olivier
>>
>>
>


[dpdk-dev] IOMMU and VF

2015-01-08 Thread Alex Markuze
Hi, Guys,
I'm trying to run a DPDK (1.7.1) application that has previously been tested
on Xen/VMware VMs. I have both iommu=pt and intel_iommu=on.
I would expect things to work as usual, but unfortunately the VF I'm using
is unable to send or receive any packets (the TX queue fills up, and the
packets never leave).

Looking at dmesg I see this:
IOMMU: hardware identity mapping for device :83:00.0
IOMMU: hardware identity mapping for device :83:00.1

These are the bus addresses of the physical functions.
I don't know if I need to see the VFs listed here as well.

Any suggestions?


[dpdk-dev] IOMMU and VF

2015-01-09 Thread Alex Markuze
Thanks Zhang, I'm familiar with the issue you mentioned, but I don't
think it is related.
The OS is RHEL 6.5, which is kernel 2.6.32 (with whatever newer patches
RH has cherry-picked).

Moving to DPDK 1.8 is not a viable option for us right now.
Could you please elaborate on the new MAC-type patches and how they can
relate to the issue at hand?

This feels like an IOMMU issue, as we are able to configure the ports
successfully, but there is no traffic in or out.
That leads me to believe that the RX/TX descriptors are at fault.
Another clue that I've already mentioned is this snippet from dmesg:
IOMMU: hardware identity mapping for device :83:00.0
IOMMU: hardware identity mapping for device :83:00.1

These include the PF but not the VF. The IOMMU has a separate translation table
for each peripheral device (just like the CPU has a separate translation
table and TLB for each process). I'm guessing that the IOMMU is not SR-IOV
aware and should maintain a separate translation table for each virtual
function (unless I'm missing something). If this is true, I would expect to
see that an identity mapping has been set up for the VFs as well.

It may be some config issue.

Any advice would be appreciated.

Thanks




On Fri, Jan 9, 2015 at 2:39 AM, Zhang, Helin  wrote:

> Hi Alex
>
> Could you help to try 1.8? I remember there might a fix of supporting some
> newly mac types.
> In addition, what's the kernel version of your host? We observed issues
> recently before kernel version 3.18. I'd suggest to try kernel 3.18.
>
> Hopefully it is helpful!
>
> Regards,
> Helin
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> > Sent: Friday, January 9, 2015 1:57 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] IOMMU and VF
> >
> > Hi, Guys,
> > I'm trying to run a DPDK(1.7.1) application that has been previously
> tested on
> > Xen/VMware VM's. I have both iommu=pt and intel_iommu=on.
> > I would expect things to work as usual but unfortunately the VF I'm
> taking is
> > unable to send or receive any packets (The TXQ gets filled out, and the
> packets
> > never leave).
> >
> > Looking at the demise I se this:
> > IOMMU: hardware identity mapping for device :83:00.0
> > IOMMU: hardware identity mapping for device :83:00.1
> >
> > These are the bus addresses of the physical functions.
> > I don't know If I need to see the VF's listed here as well.
> >
> > Any suggestions?
>


[dpdk-dev] 82599 Ethernet Controller Virtual Function

2014-11-16 Thread Alex Markuze
Hi,
I'm working with VMware SR-IOV and Intel 10G NICs.
I'm using two virtual functions per VM, mainly because I need loopback
(the LLE (PFVMTXSW[n]) register does not allow loopback on a VF by default).

After adding a second VF to the VMs, I often see issues where no traffic
passes between the VFs.

Each VM has 4 legs:
   1. mgmnt (vmxnet over the onboard 1 GbE) - ssh connections
   2. data mgmnt (vmxnet over the physical function) - "Trusted interface"
      to ping DPDK apps/KNI
   3 + 4. VFs bound to igb_uio - for the application.

The issue I see is that some VFs cannot communicate with other VFs, whether on
the same NIC or a different NIC. For example, two "data mgmnt" vmxnet3
interfaces (not bound to DPDK) cannot see each other.

The only thing that seems to solve it is a VM shutdown (a reboot of the VM
will not help).

Has anyone seen something like this? Any suggestions on how to debug or
analyse what I'm seeing?


[dpdk-dev] Huge Pages.

2014-10-01 Thread Alex Markuze
Hi,
How well does DPDK play with other applications using huge pages?
Looking at eal_init/eal_hugepage_info_init, it seems that DPDK will try to
grab all available huge pages.

Is there an existing way to limit the number of huge pages taken ?
My goal is to be able to run several applications, each with its own DPDK
instance, plus a possible 3rd-party application, all using huge pages. Is this
possible under DPDK's current design?

Sharing the NICs is fairly simple if we specify the available PCI functions
per DPDK instance, but at first glance sharing huge pages looks like a
problem.

Thanks.


[dpdk-dev] Aligned RX data.

2014-10-07 Thread Alex Markuze
Hi, I'm trying to receive aligned packets from the wire,
meaning that for all received packets pkt.data is always aligned to
(512 - H).

Looking at the PMDs for ixgbe/vmxnet3, I see that they call
__rte_mbuf_raw_alloc and set the RX descriptor with
RTE_MBUF_DATA_DMA_ADDR_DEFAULT
instead of the more appropriate RTE_MBUF_DATA_DMA_ADDR.

Do I need to modify each PMD I'm using to be able to receive aligned data?
Or have I missed something?

Thanks


[dpdk-dev] Aligned RX data.

2014-10-07 Thread Alex Markuze
RTE_PKTMBUF_HEADROOM defines the headroom; this would be true only if the
buffer start were aligned to 512, which it is not.

On Tue, Oct 7, 2014 at 1:05 PM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> > Sent: Tuesday, October 07, 2014 10:40 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Aligned RX data.
> >
> > Hi , I'm trying to receive aligned packets from the wire.
> > Meaning that for all received packets the pkt.data is always aligned to
> > (512 -H).
> >
> > Looking at the pmds of ixgbe/vmxnet I see that the pmds call
> > __rte_mbuf_raw_alloc and set the rx descriptor with a
> > RTE_MBUF_DATA_DMA_ADDR_DEFAULT
> > Instead of the more appropriate RTE_MBUF_DATA_DMA_ADDR.
> >
> > Do I need to modify each pmd I'm using to be able to receive aligned
> data?
>
> Make sure that your all your mbufs are aligned by 512 and set in your
> config RTE_PKTMBUF_HEADROOM=512-H?
>
>
> > Or have I missed something?
> >
> > Thanks
>


[dpdk-dev] Aligned RX data.

2014-10-11 Thread Alex Markuze
OK, and how would I do that?
I'm guessing there is something I can control in rte_pktmbuf_pool_init?
I would appreciate it if you could spare a word or two on the matter.

On Tue, Oct 7, 2014 at 7:11 PM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

>
>
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Tuesday, October 07, 2014 5:03 PM
> > To: Ananyev, Konstantin
> > Subject: FW: [dpdk-dev] Aligned RX data.
> >
> >
> >
> > From: Alex Markuze [mailto:alex at weka.io]
> > Sent: Tuesday, October 07, 2014 4:52 PM
> > To: Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] Aligned RX data.
> >
> > RTE_PKTMBUF_HEADROOM defines the headroom
>
> Yes.
>
> >this would be true only if the buff_start was aligned to 512 which is not.
>
> As I said: " Make sure that your all your mbufs are aligned by 512".
>
> Konstantin
>
> >
> > On Tue, Oct 7, 2014 at 1:05 PM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com> wrote:
> >
> >
> > > -Original Message-
> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> > > Sent: Tuesday, October 07, 2014 10:40 AM
> > > To: dev at dpdk.org
> > > Subject: [dpdk-dev] Aligned RX data.
> > >
> > > Hi , I'm trying to receive aligned packets from the wire.
> > > Meaning that for all received packets the pkt.data is always aligned to
> > > (512 -H).
> > >
> > > Looking at the pmds of ixgbe/vmxnet I see that the pmds call
> > > __rte_mbuf_raw_alloc and set the rx descriptor with a
> > > RTE_MBUF_DATA_DMA_ADDR_DEFAULT
> > > Instead of the more appropriate RTE_MBUF_DATA_DMA_ADDR.
> > >
> > > Do I need to modify each pmd I'm using to be able to receive aligned
> data?
> > Make sure that your all your mbufs are aligned by 512 and set in your
> config RTE_PKTMBUF_HEADROOM=512-H?
> >
> >
> > > Or have I missed something?
> > >
> > > Thanks
>
>


[dpdk-dev] Aligned RX data.

2014-10-13 Thread Alex Markuze
Hi All,
Is there a way to create a mempool such that all mbufs are aligned to X,
let's say X = 512?

Thanks.


On Sat, Oct 11, 2014 at 5:04 PM, Alex Markuze  wrote:

> O.k, And how would I do that?
> I'm guessing there is something I can control in rte_pktmbuf_pool_init?
> I would appreciate If you could spare a word or two in the matter.
>
> On Tue, Oct 7, 2014 at 7:11 PM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com> wrote:
>
>>
>>
>> > -Original Message-
>> > From: Ananyev, Konstantin
>> > Sent: Tuesday, October 07, 2014 5:03 PM
>> > To: Ananyev, Konstantin
>> > Subject: FW: [dpdk-dev] Aligned RX data.
>> >
>> >
>> >
>> > From: Alex Markuze [mailto:alex at weka.io]
>> > Sent: Tuesday, October 07, 2014 4:52 PM
>> > To: Ananyev, Konstantin
>> > Cc: dev at dpdk.org
>> > Subject: Re: [dpdk-dev] Aligned RX data.
>> >
>> > RTE_PKTMBUF_HEADROOM defines the headroom
>>
>> Yes.
>>
>> >this would be true only if the buff_start was aligned to 512 which is
>> not.
>>
>> As I said: " Make sure that your all your mbufs are aligned by 512".
>>
>> Konstantin
>>
>> >
>> > On Tue, Oct 7, 2014 at 1:05 PM, Ananyev, Konstantin <
>> konstantin.ananyev at intel.com> wrote:
>> >
>> >
>> > > -Original Message-
>> > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
>> > > Sent: Tuesday, October 07, 2014 10:40 AM
>> > > To: dev at dpdk.org
>> > > Subject: [dpdk-dev] Aligned RX data.
>> > >
>> > > Hi , I'm trying to receive aligned packets from the wire.
>> > > Meaning that for all received packets the pkt.data is always aligned
>> to
>> > > (512 -H).
>> > >
>> > > Looking at the pmds of ixgbe/vmxnet I see that the pmds call
>> > > __rte_mbuf_raw_alloc and set the rx descriptor with a
>> > > RTE_MBUF_DATA_DMA_ADDR_DEFAULT
>> > > Instead of the more appropriate RTE_MBUF_DATA_DMA_ADDR.
>> > >
>> > > Do I need to modify each pmd I'm using to be able to receive aligned
>> data?
>> > Make sure that your all your mbufs are aligned by 512 and set in your
>> config RTE_PKTMBUF_HEADROOM=512-H?
>> >
>> >
>> > > Or have I missed something?
>> > >
>> > > Thanks
>>
>>
>


[dpdk-dev] Aligned RX data.

2014-10-13 Thread Alex Markuze
Very helpful, thanks a lot.
It sure does seem to do the trick.
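
For anyone finding this later, this is roughly how I ended up using it in my
init path, based on Konstantin's mempool_xz1_create() quoted below (port_id,
rx_conf and the sizes are just from my test):

/* Pool of 8192 mbufs whose data buffers land on 512B boundaries. */
struct rte_mempool *mp = mempool_xz1_create(8192, rte_socket_id());

/* Feed it to the RX queue; together with RTE_PKTMBUF_HEADROOM=512-H in the
 * build config, received pkt.data then has the desired alignment. */
rte_eth_rx_queue_setup(port_id, 0, 128, rte_socket_id(), &rx_conf, mp);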

On Mon, Oct 13, 2014 at 2:43 PM, Ananyev, Konstantin <
konstantin.ananyev at intel.com> wrote:

>
>
> > -Original Message-
> > From: Ananyev, Konstantin
> > Sent: Monday, October 13, 2014 12:30 PM
> > To: Ananyev, Konstantin
> > Subject: FW: [dpdk-dev] Aligned RX data.
> >
> >
> >
> > From: Alex Markuze [mailto:alex at weka.io]
> > Sent: Monday, October 13, 2014 9:47 AM
> > To: Ananyev, Konstantin
> > Cc: dev at dpdk.org
> > Subject: Re: [dpdk-dev] Aligned RX data.
> >
> > Hi All,
> > Is there a way to create a mempool such that all mbufs are aligned to X.
> lets say X is 512.
> >
> > Thanks.
> >
>
> For example something like that:
>
> struct rte_mempool *
> mempool_xz1_create(uint32_t elt_num, int32_t socket_id)
> {
> struct rte_mempool *mp;
> const struct rte_memzone *mz;
> struct rte_mempool_objsz obj_sz;
> uint32_t flags, elt_size, total_size;
> size_t sz;
> phys_addr_t pa;
> void *va;
>
> /* mp element header_size==64B,  trailer_size==0. */
> flags = MEMPOOL_F_NO_SPREAD;
>
> /* to make total element size of mp 2K. */
> elt_size = 2048 - 64;
>
> total_size = rte_mempool_calc_obj_size(elt_size, flags, &obj_sz);
> sz = elt_num * total_size + 512;
>
> if ((mz = rte_memzone_reserve_aligned("xz1_obj", sz, socket_id,
> 0, 512)) == NULL)
> return (NULL);
>
> va = (char *)mz->addr + 512 - obj_sz.header_size;
> pa = mz->phys_addr + 512 - obj_sz.header_size;
>
> mp = rte_mempool_xmem_create("xz1", elt_num, elt_size,
> 256, sizeof(struct rte_pktmbuf_pool_private),
> rte_pktmbuf_pool_init, NULL,
> rte_pktmbuf_init, NULL,
> socket_id, flags, va, &pa,
> MEMPOOL_PG_NUM_DEFAULT, MEMPOOL_PG_SHIFT_MAX);
>
> return (mp);
> }
>
> Each mbuf will be aligned on a 512B boundary, with 1856B of buffer space
> (2K - 64B header - 128B mbuf).
>
> Alternative way - is to provide your own element constructor instead of
> rte_pktmbuf_init() for mempool_create.
> And inside it align buf_addr and buf_physaddr.
> Though in that case you have to set RTE_MBUF_REFCNT=n in your config.
> That's why I'd say it is not recommended.
>
> Konstantin
>
> >
> > On Sat, Oct 11, 2014 at 5:04 PM, Alex Markuze  wrote:
> > O.k, And how would I do that?
> > I'm guessing there is something I can control in rte_pktmbuf_pool_init?
> > I would appreciate If you could spare a word or two in the matter.
> >
> > On Tue, Oct 7, 2014 at 7:11 PM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com> wrote:
> >
> >
> > > -Original Message-
> > > From: Ananyev, Konstantin
> > > Sent: Tuesday, October 07, 2014 5:03 PM
> > > To: Ananyev, Konstantin
> > > Subject: FW: [dpdk-dev] Aligned RX data.
> > >
> > >
> > >
> > > From: Alex Markuze [mailto:alex at weka.io]
> > > Sent: Tuesday, October 07, 2014 4:52 PM
> > > To: Ananyev, Konstantin
> > > Cc: dev at dpdk.org
> > > Subject: Re: [dpdk-dev] Aligned RX data.
> > >
> > > RTE_PKTMBUF_HEADROOM defines the headroom
> >
> > Yes.
> >
> > >this would be true only if the buff_start was aligned to 512 which is
> not.
> >
> > As I said: " Make sure that your all your mbufs are aligned by 512".
> >
> > Konstantin
> >
> > >
> > > On Tue, Oct 7, 2014 at 1:05 PM, Ananyev, Konstantin <
> konstantin.ananyev at intel.com> wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> > > > Sent: Tuesday, October 07, 2014 10:40 AM
> > > > To: dev at dpdk.org
> > > > Subject: [dpdk-dev] Aligned RX data.
> > > >
> > > > Hi , I'm trying to receive aligned packets from the wire.
> > > > Meaning that for all received packets the pkt.data is always aligned
> to
> > > > (512 -H).
> > > >
> > > > Looking at the pmds of ixgbe/vmxnet I see that the pmds call
> > > > __rte_mbuf_raw_alloc and set the rx descriptor with a
> > > > RTE_MBUF_DATA_DMA_ADDR_DEFAULT
> > > > Instead of the more appropriate RTE_MBUF_DATA_DMA_ADDR.
> > > >
> > > > Do I need to modify each pmd I'm using to be able to receive aligned
> data?
> > > Make sure that your all your mbufs are aligned by 512 and set in your
> config RTE_PKTMBUF_HEADROOM=512-H?
> > >
> > >
> > > > Or have I missed something?
> > > >
> > > > Thanks
> >
>
>


[dpdk-dev] nic loopback

2014-10-20 Thread Alex Markuze
Hi,
I'm trying to send packets from an application to itself, meaning smac ==
dmac.
I'm working with an Intel 82599 virtual function, but it seems that these
packets are lost.

Is there a software/HW limitation I'm missing here (some additional
anti-spoofing)? AFAIK modern NICs with SR-IOV are mini switches, so the HW
loopback should work; at least that's the theory.


Thanks.


[dpdk-dev] Why do we need iommu=pt?

2014-10-21 Thread Alex Markuze
DPDK uses a 1:1 mapping and doesn't support the IOMMU. The IOMMU allows for
simpler VM physical address translation.
The second role of the IOMMU is to protect against unwanted memory access
by an unsafe device that has DMA privileges. Unfortunately this protection
comes with an extremely high performance cost for high-speed NICs.

To your question: iommu=pt disables IOMMU translation for the hypervisor/host itself.

On Tue, Oct 21, 2014 at 1:39 AM, Xie, Huawei  wrote:

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Shivapriya Hiremath
> > Sent: Monday, October 20, 2014 2:59 PM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] Why do we need iommu=pt?
> >
> > Hi,
> >
> > My question is that if the Poll mode  driver used the DMA kernel
> interface
> > to set up its mappings appropriately, would it still require that
> iommu=pt
> > be set?
> > What is the purpose of setting iommu=pt ?
> PMD allocates memory though hugetlb file system, and fills the physical
> address
> into the descriptor.
> pt is used to pass through iotlb translation. Refer to the below link.
> http://lkml.iu.edu/hypermail/linux/kernel/0906.2/02129.html
> >
> > Thank you.
>


[dpdk-dev] nic loopback

2014-10-21 Thread Alex Markuze
How can I set/query this bit (LLE(PFVMTXSW[n]), intel 82599 ) on ESX, or
any other friendlier environment like Linux?

On Tue, Oct 21, 2014 at 4:18 AM, Liang, Cunming 
wrote:

>
>
> > -Original Message-
> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Alex Markuze
> > Sent: Tuesday, October 21, 2014 12:24 AM
> > To: dev at dpdk.org
> > Subject: [dpdk-dev] nic loopback
> >
> > Hi,
> > I'm trying to send packets from an application to it self, meaning smac
> ==
> > dmac.
> > I'm working with intel 82599 virtual function. But it seems that these
> > packets are lost.
> >
> > Is there a software/hw limitation I'm missing here (some additional
> > anti-spoofing)? AFAIK modern NICs with sriov are mini switches so the hw
> > loopback should work, at least thats the theory.
> >
> [Liang, Cunming] You could have a check on register LLE(PFVMTXSW[n]).
> Which allow an individual pool to be able to send traffic and have it
> loopback to itself.
> >
> > Thanks.
>


[dpdk-dev] nic loopback

2014-10-21 Thread Alex Markuze
Thanks Thomas,
unfortunately these patches are only valid for a PF*. This is also evident from
the ixgbe PMD code, which is the only one looking at this bit (lpbk_mode).
The ixgbevf functions are agnostic to this capability.

*
http://www.intel.com/content/dam/doc/design-guide/82599-sr-iov-driver-companion-guide.pdf
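
For reference, on a PF the knob added by those commits is driven through the
port configuration, something like the sketch below (value 1 corresponds to
IXGBE_LPBK_82599_TX_RX in the ixgbe PMD); the ixgbevf PMD never reads this
field, which is exactly the problem for my VF case:

#include <rte_ethdev.h>

/* PF only: request Tx->Rx loopback on the 82599. */
static const struct rte_eth_conf pf_loopback_conf = {
        .lpbk_mode = 1,
};
/* ... then pass &pf_loopback_conf to rte_eth_dev_configure() on the PF port. */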

On Tue, Oct 21, 2014 at 6:32 PM, Thomas Monjalon 
wrote:

> 2014-10-20 19:24, Alex Markuze:
> > I'm trying to send packets from an application to it self, meaning smac
> ==
> > dmac.
> > I'm working with intel 82599 virtual function. But it seems that these
> > packets are lost.
> >
> > Is there a software/hw limitation I'm missing here (some additional
> > anti-spoofing)? AFAIK modern NICs with sriov are mini switches so the hw
> > loopback should work, at least thats the theory.
>
> I think you should look at these commits:
>
> ixgbe: add Tx->Rx loopback mode for 82599
> http://dpdk.org/browse/dpdk/commit/?id=db035925617
> app/testpmd: add loopback topology
> http://dpdk.org/browse/dpdk/commit/?id=3e2006d6186
>
> --
> Thomas
>


[dpdk-dev] Why do we need iommu=pt?

2014-10-23 Thread Alex Markuze
> > > > An engine inside the NIC using a modified DPDK PMD is not trustable,
> > > > as it can potentially DMA to/from arbitrary memory regions using
> > > > physical addresses, so the IOMMU
> > > > is needed to provide strict memory protection, at the cost of a negative
> > > > performance impact.
> > > >
> > > > So if you want to seek high performance, disable the IOMMU in BIOS or OS.
> > > > And if security is a major
> > > > concern, turn it on and trade off between performance and security. But I
> > > > do NOT think it comes with
> > > > an extremely high performance cost according to our performance
> > > > measurements, though that is probably true
> > > > for a 100G NIC.
> > > >
> > > > > -Original Message-
> > > > > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of Shivapriya
> Hiremath
> > > > > Sent: Wednesday, October 22, 2014 12:54 AM
> > > > > To: Alex Markuze
> > > > > Cc: dev at dpdk.org
> > > > > Subject: Re: [dpdk-dev] Why do we need iommu=pt?
> > > > >
> > > > > Hi,
> > > > >
> > > > > Thank you for all the replies.
> > > > > I am trying to understand the impact of this on DPDK. What will be
> the
> > > > > repercussions of disabling "iommu=pt" on the DPDK performance?
> > > > >
> > > > >
> > > > > On Tue, Oct 21, 2014 at 12:32 AM, Alex Markuze 
> wrote:
> > > > >
> > > > > > DPDK uses a 1:1 mapping and doesn't support IOMMU.  IOMMU allows
> for
> > > > > > simpler VM physical address translation.
> > > > > > The second role of IOMMU is to allow protection from unwanted
> memory
> > > > > > access by an unsafe devise that has DMA privileges.
> Unfortunately this
> > > > > > protection comes with an extremely high performance costs for
> high
> > > > speed
> > > > > > nics.
> > > > > >
> > > > > > To your question iommu=pt disables IOMMU support for the
> hypervisor.
> > > > > >
> > > > > > On Tue, Oct 21, 2014 at 1:39 AM, Xie, Huawei <
> huawei.xie at intel.com>
> > > > wrote:
> > > > > >
> > > > > >>
> > > > > >>
> > > > > >> > -Original Message-
> > > > > >> > From: dev [mailto:dev-bounces at dpdk.org] On Behalf Of
> Shivapriya
> > > > > >> Hiremath
> > > > > >> > Sent: Monday, October 20, 2014 2:59 PM
> > > > > >> > To: dev at dpdk.org
> > > > > >> > Subject: [dpdk-dev] Why do we need iommu=pt?
> > > > > >> >
> > > > > >> > Hi,
> > > > > >> >
> > > > > >> > My question is that if the Poll mode  driver used the DMA
> kernel
> > > > > >> interface
> > > > > >> > to set up its mappings appropriately, would it still require
> that
> > > > > >> iommu=pt
> > > > > >> > be set?
> > > > > >> > What is the purpose of setting iommu=pt ?
> > > > > >> PMD allocates memory though hugetlb file system, and fills the
> > > > physical
> > > > > >> address
> > > > > >> into the descriptor.
> > > > > >> pt is used to pass through iotlb translation. Refer to the
> below link.
> > > > > >> http://lkml.iu.edu/hypermail/linux/kernel/0906.2/02129.html
> > > > > >> >
> > > > > >> > Thank you.
> > > > > >>
> > > > > >
> > > > > >
> > > >
>


[dpdk-dev] Fwd: [dpdk-announce] DPDK Features for Q1 2015

2014-10-23 Thread Alex Markuze
On Thu, Oct 23, 2014 at 5:18 PM, Jay Rolette  wrote:

> Tim,
>
> Thanks for sharing this. If nothing else, I wanted to at least provide some
> feedback on the parts that look useful to me for my applications/product.
> Bits that make me interested in the release:
>
>
>
> *> 2.0 (Q1 2015) DPDK Features:> Bifurcated Driver: With the Bifurcated
> Driver, the kernel will retain direct control of the NIC, and will assign
> specific queue pairs to DPDK. Configuration of the NIC is controlled by the
> kernel via ethtool.*
>
> Having NIC configuration, port stats, etc. available via the normal Linux
> tools is very helpful - particularly on new products just getting started
> with DPDK.
>
>
> *> Packet Reordering: Assign a sequence number to packets on Rx, and then
> provide the ability to reorder on Tx to preserve the original order.*
>
> This could be extremely useful but it depends on where it goes. The current
> design being discussed seems fundamentally flawed to me. See the thread on
> the RFC for details.
>
>
> *> Packet Distributor (phase 2): Implement the following enhancements to
> the Packet Distributor that was originally delivered in the DPDK 1.7
> release: performance improvements; the ability for packets from a flow to
> be processed by multiple worker cores in parallel and then reordered on Tx
> using the Packet Reordering feature; the ability to have multiple
> Distributors which share Worker cores.*
>
> TBD on this for me. The 1.0 version of our product is based on DPDK 1.6 and
> I haven't had a chance to look at what is happening with Packet Distributor
> yet. An area of potential interest at least.
>
>
> *> Cuckoo Hash: A new hash algorithm was implemented as part of the Cuckoo
> Switch project (see http://www.cs.cmu.edu/~dongz/papers/cuckooswitch.pdf
> ), and shows some
> promising performance results. This needs to be modified to make it more
> generic, and then incorporated into DPDK.*
>
> More performance == creamy goodness, especially if it is in the plumbing
> and doesn't require significant app changes.
>
>
> *> Interrupt mode for PMD: Allow DPDK process to transition to interrupt
> mode when load is low so that other processes can run, or else power can be
> saved. This will increase latency/jitter.*
>
> Yes! I don't care about power savings, but I do care about giving a good
> product impression in the lab during evals without having to sacrifice
> overall system performance when under load. Hybrid drivers that use
> interrupts when load is low and poll-mode when loaded are ideal, IMO.
>
> It seems an odd thing, but during lab testing, it is normal for customers
> to fire the box up and just start running pings or some other low volume
> traffic through the box. If the PMDs are configured to batch in sizes
> optimal for best performance under load, the system can look *really* bad
> in these initial tests. We go through a fair bit of gymnastics right now to
> work around this without just giving up on batching in the PMDs.
>


I second this. DPDK is great for kernel bypass and zero-copy, but not
all apps are network bound, so interrupt mode is something that is
extremely helpful.


> *> DPDK Headroom: Provide a mechanism to indicate how much headroom (spare
> capacity) exists in a DPDK process.*
>
> Very helpful in the field. Anything that helps customers understand how
> much headroom is left on their box before they need to take action is a
> huge win. CPU utilization is a bad indicator, especially with a PMD
> architecture.
>
> Hope this type of feedback is helpful.
>
> Regards,
> Jay
>


[dpdk-dev] VIRTIO indication

2014-10-28 Thread Alex Markuze
Each device can tell you the driver name it is bound to. This is enough to
have separate configuration paths for vmxnet3/virtio/ixgbe/etc.
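
Something along these lines (a sketch; the exact driver-name strings should
be checked against your DPDK version, and txq_flags_for() is just a name I
made up):

#include <stdint.h>
#include <string.h>
#include <rte_ethdev.h>

/* Pick the TX queue flags based on the PMD the port is bound to. */
static uint32_t txq_flags_for(uint8_t port_id)
{
        struct rte_eth_dev_info info;

        rte_eth_dev_info_get(port_id, &info);
        if (strstr(info.driver_name, "virtio") != NULL)
                return ETH_TXQ_FLAGS_NOMULTSEGS | ETH_TXQ_FLAGS_NOOFFLOADS;
        return 0;       /* pass-through / other PMDs: keep the default */
}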

On Tue, Oct 28, 2014 at 7:21 PM, Yan Freedland  wrote:

> Hi
>
> In my multi process system I need to support 2 modes of work: pass through
> and VIRTIO.
> I saw that in order to work in a VIRTIO mode I need to update the
> txq_flags (part of the rte_eth_txconf structure) as follows:
> .txq_flags = ETH_TXQ_FLAGS_NOMULTSEGS | ETH_TXQ_FLAGS_NOOFFLOADS
>
> In pass - through however, this parameter should remain 0 (default value).
> Apparently I need to update this parameter in runtime. Is there any flag
> that can give me any indication on a mode I am running in.
>
> Thank you,
> Yan
>
>


[dpdk-dev] segmented recv ixgbevf

2014-10-30 Thread Alex Markuze
Hi,
I'm seeing unwanted behaviour in the receive flow of ixgbevf. When
using jumbo frames and sending 4K+ bytes, the receive side breaks the
packets up into 2K buffers, and I receive 3 mbufs per packet.

I'm setting .max_rx_pkt_len to 4.5K and the mempool has 5K-sized
elements.

Is there anything else I'm missing here? The goal is to have all 4K+ bytes in
one single contiguous buffer.

Thanks
Alex.


[dpdk-dev] segmented recv ixgbevf

2014-10-30 Thread Alex Markuze
For posterity.

1. When using an MTU larger than 2K, it's advised to provide the buffer size
to rte_pktmbuf_pool_init.
2. ixgbevf rounds down ("mbuf size" - RTE_PKTMBUF_HEADROOM) to the
nearest 1K multiple when deciding on the receive capabilities [buffer
size] of the buffers in the pool.
(The rounding appears to come from how the buffer size is programmed into the
SRRCTL register.)
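
A sketch of what point 1 means in practice for a 4.5K max_rx_pkt_len (sizes
and names are illustrative; the key is passing the data-room size as the
opaque argument of rte_pktmbuf_pool_init, so that after the 1K round-down the
PMD still sees buffers large enough for a whole frame):

#include <stdint.h>
#include <rte_lcore.h>
#include <rte_mbuf.h>
#include <rte_mempool.h>

#define JUMBO_DATA_ROOM  (5120 + RTE_PKTMBUF_HEADROOM) /* >= 4.5K after round-down */
#define JUMBO_MBUF_SIZE  (JUMBO_DATA_ROOM + sizeof(struct rte_mbuf))

static struct rte_mempool *jumbo_pool_create(void)
{
        return rte_mempool_create("jumbo_pool", 4096, JUMBO_MBUF_SIZE, 32,
                                  sizeof(struct rte_pktmbuf_pool_private),
                                  rte_pktmbuf_pool_init,
                                  (void *)(uintptr_t)JUMBO_DATA_ROOM, /* data room */
                                  rte_pktmbuf_init, NULL,
                                  rte_socket_id(), 0);
}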