[dpdk-dev] Linux Plumbers Conference 2014: Network Virtualization (Security)

2014-03-13 Thread Thomas Graf
Hi

[Cross posting this to dpdk-dev for exposure]

We had excellent technical discussions around network virtualization
at LPC13 last year and would love to provide the same forum at this
year's LPC again.

I believe this would be a good opportunity to discuss integration of
DPDK with the Linux network stack community. Let us know if you have
interest.

We have started collecting ideas on possible discussion topics
on the following wiki page. Feel free to add your own ideas and/or
provide feedback on existing proposals:

http://wiki.linuxplumbersconf.org/2014:network_virtualization_security


 Original Message 
Date: Wed, 12 Mar 2014 16:26:36 -0700
From: Tom Herbert 
To: Linux Netdev List 

Hi,

We (me, Chris, and Thomas) are planning to hold another
micro-conference at this year's LPC, Oct. 15-17 in Dusseldorf, Germany.
Similar to last year, we'd like to focus on network virtualization,
though with some extra emphasis on related topics in the security area.
I'd really like to encourage HW vendors to participate as well, since HW
support for network virtualization seems to be a very hot topic right now!

As usual, we want to facilitate discussion around design and
implementation in the network stack. If you have ideas that you think
would benefit from discussion at the Plumbers Conference, please add
them to the wiki and email us.

Thanks,
Tom




[dpdk-dev] [PATCH] vmxnet3: rename library

2014-03-21 Thread Thomas Graf
On 03/21/2014 01:59 PM, Thomas Monjalon wrote:
> In order to clearly distinguish this implementation from the
> vmxnet3-usermap extension, it is renamed to reflect its usage of the
> uio framework.
>
> Signed-off-by: Thomas Monjalon 

LGTM

Acked-by: Thomas Graf 


[dpdk-dev] [PATCH 03/16] pkg: add recipe for RPM

2014-04-02 Thread Thomas Graf
On 04/02/2014 11:53 AM, Thomas Monjalon wrote:
> 2014-02-26 14:07, Thomas Graf:
>>> +BuildRequires: kernel-devel, kernel-headers, doxygen
>>
>> Is a python environment required as well?
>
> Python is only needed to run some tools on the target. But it is optional.
> Do you think it should be written somewhere?

Not sure what "target" means in this context. You only need to list
python if it must be present at build time.

>> What about calling it just "libdpdk"?
>
> In this case, it should be libdpdk-core in order to distinguish it from dpdk
> extensions. But the name of the project is dpdk so it seems simpler to call it
> dpdk-core.
> Is the "lib" prefix mandatory for libraries?

Not at all. You are free to name the package as you see fit. The
mandatory part is that runtime and development files must be separated,
with development files such as headers going into a -devel package.

dpdk-core
dpdk-core-devel

>> This brings up the question of multiple parallel DPDK installations.
>> A specific application linking to library version X will also require
>> tools of version X, right? A second application linking against version
>> Y will require tools version Y. Right now, these could not be installed
>> in parallel. Any chance we can make the runtime version independent?
>
> Are you thinking about installing different major versions? In my
> understanding, we cannot install 2 different minor versions of a package.
> As long as there is no stable API, there are no major versions defined.
> So don't you think we should speak about it later?

That's right, you can't install multiple versions of the same package
unless you name them differently. This is why we need to come up with a
strategy for handling naming and upgrades now, before we push it into
Fedora.

Example:
Let's assume we push DPDK 1.6.0 into Fedora as 'dpdk-core' with NVR
dpdk-core-1.6.0-1. Let's assume we later push Open vSwitch 2.2 into
Fedora, which will consume DPDK 1.6.0 via "Requires: dpdk-core = 1.6.0".
We can't do dpdk-core >= 1.6.0 because there is no compatibility.
DPDK 1.6.1 gets released and we push it as dpdk-core-1.6.1-1. We then
push dpdk-pktgen into Fedora, which is based on DPDK 1.6.1 and requires
"dpdk-core = 1.6.1". Users won't be able to install both OVS and
dpdk-pktgen in parallel at this point because they can't install both
1.6.0 and 1.6.1.

Fedora inclusion will require a strategy to resolve this. A unique name
for each release is an option (every DPDK release is currently a new
major release). This can slowly transform into compatible releases once
stable ABIs are in place.

Unique names are not enough though, as multiple packages would still
attempt to install the same files, e.g. header files. This would
typically be resolved by installing headers and other non-versioned
files with a prefix, as outlined below. That still leaves the problem of
tool versioning: the tools also seem to be bound to specific DPDK
versions but can't be prefixed because they need to be part of $PATH.

So while we don't have to enforce stable ABIs at this point, we have to
account for the lack of them in the package names and packaging
structure.

Packages could be named

dpdk-1.6.0-core
dpdk-1.6.0-core-devel


https://fedoraproject.org/wiki/Packaging:NamingGuidelines#General_Naming
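
As a rough sketch of what that could look like in the spec itself
(version, names and paths are purely illustrative):

Name: dpdk-1.6.0
Version: 1.6.0
Release: 1%{?dist}

%package core
Summary: DPDK 1.6.0 runtime libraries and tools

%package core-devel
Summary: DPDK 1.6.0 headers and build files
Requires: %{name}-core = %{version}-%{release}

Headers and other non-versioned files would then live under a versioned
directory, e.g. %{_includedir}/dpdk-1.6.0/, so the -devel packages of
two releases don't conflict.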


>> Same applies to header files. A good option here would be to install
>> them to /usr/include/libdpdk{version}/ and have a dpdk-1.5.2.pc which
>> provides Cflags: -I${includedir}/libdpdk${version}
>
> Yes same applies :)
> I agree that a .pc file would be a good idea. But we must also allow
> building with the DPDK framework.

Definitely.
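
Something like this minimal sketch is what I have in mind (paths,
version and the library list are illustrative only):

# /usr/lib64/pkgconfig/dpdk-1.5.2.pc
prefix=/usr
includedir=${prefix}/include
libdir=${prefix}/lib64

Name: libdpdk
Description: DPDK core libraries
Version: 1.5.2
Cflags: -I${includedir}/libdpdk-1.5.2
Libs: -L${libdir} -lrte_eal -lrte_mempool -lrte_mbuf

Applications would then pick up the right include path via
"pkg-config --cflags dpdk-1.5.2" instead of hardcoding it.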



[dpdk-dev] [vmxnet3-usermap PATCH 07/16] pkg: add recipe for RPM

2014-04-02 Thread Thomas Graf
On 04/02/2014 12:08 PM, Thomas Monjalon wrote:
> 2014-02-26 14:22, Thomas Graf:
>> On 02/04/2014 04:54 PM, Thomas Monjalon wrote:
>>> +BuildRequires: dpdk-core-devel, kernel-devel, kernel-headers
>>> +Requires: dpdk-core-runtime
>>
>> What does the compatibility mapping look like? I assume a given vmxnet3
>> version can only be linked against certain dpdk versions? We need to
>> encode that mapping in the spec file somehow.
>
> Since vmxnet3-usermap-1.1, any dpdk >= 1.3 is supported.
> But RPM packaging is not supported for these old versions.
> So do you think it's needed to encode a restriction for these old versions?

The restriction is only needed if RPM packages for these old versions
actually exist.

On a more general note: while it is extremely nice to have this spec
file to ease the building process, Fedora does not allow inclusion of
external kernel modules:

https://fedoraproject.org/wiki/Packaging:Guidelines#No_External_Kernel_Modules


[dpdk-dev] [PATCH 03/16] pkg: add recipe for RPM

2014-04-02 Thread Thomas Graf
On 04/02/2014 11:01 AM, Thomas Monjalon wrote:
> Hello,
>
> Sorry for the long delay.
>
> 2014-02-24 08:52, Chris Wright:
>>>   pkg/rpm.spec |  143
>>
>> This should be dpdk.spec
>
> Actually it should be dpdk-core.spec.
> Since it is a file hosted in the project, is it mandatory to have such naming?
> Could you explain why?
> When building it with "rpmbuild -ta dpdk.tar.gz", the .spec name has no
> importance.

You are right, it doesn't matter for external building, but this would
likely get pointed out in the Fedora package review process.

https://fedoraproject.org/wiki/Packaging:NamingGuidelines#Spec_file_name


[dpdk-dev] [PATCH 03/16] pkg: add recipe for RPM

2014-02-26 Thread Thomas Graf
On 02/04/2014 04:54 PM, Thomas Monjalon wrote:
> Packages can be built with:
>   RPM_BUILD_NCPUS=8 rpmbuild -ta dpdk-1.5.2r2.tar.gz
>
> There are packages for runtime, static libraries and development.
> Once devel package installed, it can be used like this:
>   make -C /usr/share/dpdk/examples/helloworld RTE_SDK=/usr/share/dpdk
>
> Signed-off-by: Thomas Monjalon 

Thanks for getting this started! Some comments below. I'll be glad to
help push this into Fedora.

> +Name: dpdk
> +Version: 1.5.2r1
> +Release: 1

What kind of upgrade strategy do you have in mind?

I'm raising this because Fedora and other distributions will require
a unique package name for every version of the package that is not
backwards compatible.

Typically, libraries provide backwards compatibility within a major
release, i.e. all 1.x.x releases would be compatible. I realize that
this might not be applicable yet, but maybe 1.5.x?

Depending on the versioning schema the name would be dpdk15, dpdk16, ...
or dpdk152, dpdk153, ...

> +BuildRequires: kernel-devel, kernel-headers, doxygen

Is a python environment required as well?

> +%description
> +Dummy main package. Make only subpackages.

I would just call the main package "libdpdk152" so you don't have to
repeat the version encoding in all the subpackages.

> +
> +%package core-runtime

What about calling it just "libdpdk"?

> +Summary: Intel(r) Data Plane Development Kit core for runtime
> +%description core-runtime
> +Intel(r) DPDK runtime includes kernel modules, core libraries and tools.
> +testpmd application allows to test fast packet processing environments
> +on x86 platforms. For instance, it can be used to check that environment
> +can support fast path applications such as 6WINDGate, pktgen, rumptcpip, etc.
> +More libraries are available as extensions in other packages.
> +
> +%package core-static

Based on the above: "libdpdk-static"

Packages that link against dpdk statically will do:
BuildRequires: libdpdk152-static

> +%files core-runtime
> +%dir %{datadir}
> +%{datadir}/config
> +%{datadir}/tools
> +%{moddir}/*
> +%{_sbindir}/*
> +%{_bindir}/*
> +%{_libdir}/*.so

This brings up the question of multiple parallel DPDK installations.
A specific application linking to library version X will also require
tools of version X, right? A second application linking against version
Y will require tools version Y. Right now, these could not be installed
in parallel. Any chance we can make the runtime version independent?

Same applies to header files. A good option here would be to install
them to /usr/include/libdpdk{version}/ and have a dpdk-1.5.2.pc which
provides Cflags: -I${includedir}/libdpdk${version}

> +%files core-static
> +%{_libdir}/*.a
> +
> +%files core-devel
> +%{_includedir}/*
> +%{datadir}/mk
> +%{datadir}/%{target}
> +%{datadir}/examples
> +%doc %{docdir}
>

You'll also need the following for all packages and subpackages that
install shared libraries:

%post -p /sbin/ldconfig
%postun -p /sbin/ldconfig





[dpdk-dev] [vmxnet3-usermap PATCH 07/16] pkg: add recipe for RPM

2014-02-26 Thread Thomas Graf
On 02/04/2014 04:54 PM, Thomas Monjalon wrote:
> +BuildRequires: dpdk-core-devel, kernel-devel, kernel-headers
> +Requires: dpdk-core-runtime

What does the compatibility mapping look like? I assume a given vmxnet3
version can only be linked against certain dpdk versions? We need to
encode that mapping in the spec file somehow.




[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.

2014-01-29 Thread Thomas Graf
On 01/28/2014 07:17 PM, Pravin Shelar wrote:
> Right, version mismatch will not work. The APIs provided by DPDK are not
> stable, so OVS has to be built for different releases for now.
>
> I do not see how we can fix it from the OVS side. DPDK needs to
> standardize its API. Actually OVS also needs more APIs, like DPDK
> initialization, mempool destroy, etc.

Agreed. It's not fixable from the OVS side. I also don't want to
object to including this. I'm just raising awareness of the issue
as this will become essential for distribution.

The obvious and usual best practice would be for DPDK to guarantee
ABI stability between minor releases.

Since dpdk-dev is copied as well, any comments?


[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.

2014-01-29 Thread Thomas Graf
On 01/28/2014 02:48 AM, pshelar at nicira.com wrote:
> From: Pravin B Shelar 
>
> Following patch adds DPDK netdev-class to userspace datapath.
> The approach taken in this patch differs from Intel(R) DPDK vSwitch,
> where DPDK datapath switching is done in a separate process.  This
> patch adds support for DPDK type port and uses OVS userspace
> datapath for switching.  Therefore all DPDK processing and flow
> miss handling is done in single process.  This also avoids code
> duplication by reusing OVS userspace datapath switching and
> therefore it supports all flow matching and actions that
> user-space datapath supports.  Refer to INSTALL.DPDK doc for
> further info.
>
> With this patch I got similar performance for netperf TCP_STREAM
> tests compared to kernel datapath.
>
> This is based on a patch from Gerald Rogers.
>
> Signed-off-by: Pravin B Shelar 
> CC: "Gerald Rogers" 

Pravin,

Some initial comments below. I will provide more after deeper
digging.

Do you have any ideas on how to implement the TX batching yet?

> +
> +static int
> +netdev_dpdk_rx_drain(struct netdev_rx *rx_)
> +{
> +struct netdev_rx_dpdk *rx = netdev_rx_dpdk_cast(rx_);
> +int pending;
> +int i;
> +
> +pending = rx->ofpbuf_cnt;
> +if (pending) {

This conditional seems unneeded.

> +for (i = 0; i < pending; i++) {
> + build_ofpbuf(rx, &rx->ofpbuf[i], NULL);
> +}
> +rx->ofpbuf_cnt = 0;
> +return 0;
> +}
> +
> +return 0;
> +}
> +
> +/* Tx function. Transmit packets indefinitely */
> +static int
> +dpdk_do_tx_copy(struct netdev *netdev, char *buf, int size)
> +{
> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +struct rte_mbuf *pkt;
> +uint32_t nb_tx = 0;
> +
> +pkt = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
> +if (!pkt) {
> +return 0;

Silent drop? ;-) Shouldn't these drops be accounted for somehow?
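
Something along these lines is what I have in mind; the stats field is
hypothetical, just whatever drop counter the netdev ends up keeping:

pkt = rte_pktmbuf_alloc(dev->dpdk_mp->mp);
if (!pkt) {
    /* Account the drop instead of failing silently
     * (hypothetical counter). */
    ovs_mutex_lock(&dev->mutex);
    dev->stats.tx_dropped++;
    ovs_mutex_unlock(&dev->mutex);
    return 0;
}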

> +}
> +
> +/* We have to do a copy for now */
> +memcpy(pkt->pkt.data, buf, size);
> +
> +rte_pktmbuf_data_len(pkt) = size;
> +rte_pktmbuf_pkt_len(pkt) = size;
> +
> +rte_spinlock_lock(&dev->tx_lock);

What is the purpose of tx_lock here? Multiple threads writing to
the same Q? The lock is not acquired for the zerocopy path below.

> +nb_tx = rte_eth_tx_burst(dev->port_id, NR_QUEUE, &pkt, 1);
> +rte_spinlock_unlock(&dev->tx_lock);
> +
> +if (nb_tx != 1) {
> +/* free buffers if we couldn't transmit packets */
> +rte_mempool_put_bulk(dev->dpdk_mp->mp, (void **)&pkt, 1);
> +}
> +return nb_tx;
> +}
> +
> +static int
> +netdev_dpdk_send(struct netdev *netdev,
> + struct ofpbuf *ofpbuf, bool may_steal)
> +{
> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +
> +if (ofpbuf->size > dev->max_packet_len) {
> +VLOG_ERR("2big size %d max_packet_len %d",
> +  (int)ofpbuf->size , dev->max_packet_len);

Should probably use VLOG_RATE_LIMIT_INIT
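e.g. a rate-limited variant along these lines (sketch only, message
text made up):

static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
...
VLOG_ERR_RL(&rl, "packet size %d exceeds max_packet_len %d",
            (int) ofpbuf->size, dev->max_packet_len);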

> +return E2BIG;
> +}
> +
> +rte_prefetch0(&ofpbuf->private_p);
> +if (!may_steal ||
> +!ofpbuf->private_p || ofpbuf->source != OFPBUF_DPDK) {
> +dpdk_do_tx_copy(netdev, (char *) ofpbuf->data, ofpbuf->size);
> +} else {
> +struct rte_mbuf *pkt;
> +uint32_t nb_tx;
> +int qid;
> +
> +pkt = ofpbuf->private_p;
> +ofpbuf->private_p = NULL;
> +rte_pktmbuf_data_len(pkt) = ofpbuf->size;
> +rte_pktmbuf_pkt_len(pkt) = ofpbuf->size;
> +
> +/* TODO: TX batching. */
> +qid = rte_lcore_id() % NR_QUEUE;
> +nb_tx = rte_eth_tx_burst(dev->port_id, qid, &pkt, 1);
> +if (nb_tx != 1) {
> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +
> +rte_mempool_put_bulk(dev->dpdk_mp->mp, (void **)&pkt, 1);
> +VLOG_ERR("TX error, zero packets sent");

Same here

> +   }
> +}
> +return 0;
> +}

> +static int
> +netdev_dpdk_set_mtu(const struct netdev *netdev, int mtu)
> +{
> +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
> +int old_mtu, err;
> +struct dpdk_mp *old_mp;
> +struct dpdk_mp *mp;
> +
> +ovs_mutex_lock(&dpdk_mutex);
> +ovs_mutex_lock(&dev->mutex);
> +if (dev->mtu == mtu) {
> +err = 0;
> +goto out;
> +}
> +
> +mp = dpdk_mp_get(dev->socket_id, dev->mtu);
> +if (!mp) {
> +err = ENOMEM;
> +goto out;
> +}
> +
> +rte_eth_dev_stop(dev->port_id);
> +
> +old_mtu = dev->mtu;
> +old_mp = dev->dpdk_mp;
> +dev->dpdk_mp = mp;
> +dev->mtu = mtu;
> +dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
> +
> +err = dpdk_eth_dev_init(dev);
> +if (err) {
> +
> +dpdk_mp_put(mp);
> +dev->mtu = old_mtu;
> +dev->dpdk_mp = old_mp;
> +dev->max_packet_len = MTU_TO_MAX_LEN(dev->mtu);
> +dpdk_eth_dev_init(dev);

It would be nice if we didn't need these constructs and DPDK provided
an all-or-nothing init method.

> +goto 

[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.

2014-01-29 Thread Thomas Graf
Vincent,

On 01/29/2014 11:26 AM, Vincent JARDIN wrote:
> DPDK's ABIs are not Kernel's ABIs, they are not POSIX, there is no
> standard. Currently, there is no such plan to have a stable ABI since we
> need to keep freedom to chase CPU cycles over having a stable ABI. For
> instance, some applications on top of the DPDK process the packets in
> less than 150 CPU cycles (have a look at testpmd:
>http://dpdk.org/browse/dpdk/tree/app/test-pmd )

I understand the requirement to not introduce overhead with wrappers
or shim layers. No problem with that. I believe this is mainly a policy
and release process issue.

Without a concept of stable interfaces, it will be difficult to
package and distribute RTE libraries, PMDs, and DPDK applications. Right
now, the obvious path would be to package the PMD bits together
with each DPDK application, tied to the version of DPDK the binary
was compiled against. This is clearly not ideal.

> I agree that some areas could be improved since they are not in the
> critical datapath of packets, but other areas remain very CPU
> constrained. For instance:
> http://dpdk.org/browse/dpdk/commit/lib/librte_ether/rte_ethdev.h?id=c3d0564cf0f00c3c9a61cf72bd4bd1c441740637
>
> is bad:
> struct eth_dev_ops
> is churned, no comment, and a #ifdef that changes the structure
> according to compilation!

This is a very good example as it outlines the difference between
control structures and the fast path. We have this exact same trade-off
in the kernel a lot, where we have highly optimized internal APIs
towards modules and drivers but want to provide binary compatibility to
a certain extent.

As for the specific example you mention, it is relatively trivial to
make eth_dev_ops backwards compatible by appending appropriate padding
to the struct before a new major release and ensuring that new members
are added by replacing the padding accordingly. Obviously no ifdefs
would be allowed anymore.
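
A sketch of the idea (callback names and signatures are made up, not
the real eth_dev_ops layout):

struct rte_eth_dev;   /* opaque here, illustration only */

struct eth_dev_ops {
    int (*dev_configure)(struct rte_eth_dev *dev);
    int (*dev_start)(struct rte_eth_dev *dev);
    /* ... existing callbacks ... */
    void (*reserved[8])(void);   /* spare slots for future members */
};

/* A later minor release adds a callback by consuming one reserved slot
 * (new member plus reserved[7]); the struct size and the offsets of
 * existing members stay the same, so PMDs built against the previous
 * release keep working. */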

> Should an application use the librte libraries of the DPDK:
>- you can use RTE_VERSION and RTE_VERSION_NUM :
> http://dpdk.org/doc/api/rte__version_8h.html#a8775053b0f721b9fa0457494cfbb7ed9

Right. This would be more or less identical to requiring a specific
DPDK version in OVS_CHECK_DPDK. It's not ideal to require applications to
clutter their code with #ifdefs all over for every new minor release
though.
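
For illustration, this is the kind of guard applications end up
scattering around (RTE_VERSION / RTE_VERSION_NUM as referenced above):

#include <rte_version.h>

#if RTE_VERSION >= RTE_VERSION_NUM(1, 6, 0, 0)
    /* use the 1.6 variant of the call */
#else
    /* fall back to the older API */
#endif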

>- you can write your own wrapper (with CPU overhead) in order to have
> a stable ABI; that wrapper should be tied to the versions of the librte
> => the overhead is part of your application instead of the DPDK,
>- *otherwise recompile your software, it is opensource, what's the
> issue?*
>
> We are open to any suggestion to have a stable ABI, but it should never
> remove the options to have fast/efficient/compilation/CPU execution
> processing.

Absolutely agreed. We also don't want to add tons of abstraction and
overcomplicate everything. Still, I strongly believe that the definition
of stable interfaces towards applications and especially PMD is
essential.

I'm not proposing to standardize all the APIs towards applications on
the level of POSIX. DPDK is in early stages and disruptive changes will
come along. What I would propose on an abstract level is:

1. Extend but not break API between minor releases. Postpone API
breakages to the next major release. High cadence of major
releases initially, lower cadence as DPDK matures.

2. Define ABI stability towards PMD for minor releases to allow
isolated packaging of PMD by padding control structures and keeping
functions ABI stable.

I realize that this might be less trivial than it seems without
sacrificing performance but I consider it effort well spent.

Thomas


[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.

2014-01-29 Thread Thomas Graf
On 01/29/2014 05:34 PM, Vincent JARDIN wrote:
> Thomas,
>
> First and easy answer: it is open source, so anyone can recompile. So,
> what's the issue?

I'm talking from a pure distribution perspective here: requiring a
recompile of all DPDK-based applications to distribute a bugfix or to
add support for a new PMD is not ideal.

So ideally OVS would be able to link against the shared library in the
long term.

> I get lost: do you mean ABI + API toward the PMDs or towards the
> applications using the librte ?

Stabilizing towards the PMDs is more straightforward, so it seems
logical to focus on that first.

A stable API and ABI for librte seems required as well in the long
term, since DPDK does offer shared libraries, but I realize that this
is a stretch goal in the initial phase.


[dpdk-dev] [ovs-dev] [PATCH RFC] dpif-netdev: Add support Intel DPDK based ports.

2014-01-30 Thread Thomas Graf
On 01/29/2014 09:47 PM, François-Frédéric Ozog wrote:
> In the telecom world, if you fix the underlying framework of an app, you
> will still have to validate the solution, ie app/framework. In addition, the
> idea of shared libraries introduces the implied requirement to validate apps
> against diverse versions of DPDK shared libraries. This translates into
> development and support costs.
>
> I also expect many DPDK applications to tackle core networking features,
> with sub-microsecond packet handling delays, even lower than 200ns
> (NAT64...). The lazy binding based on ELF PLT represents quite a cost, not
> to mention that optimization stops at shared library boundaries (gcc
> whole-program optimization can be very effective...). Microsoft DLL linkage
> is an order of magnitude faster. If Linux was to provide that, I would
> probably revise my judgment. (I haven't checked Linux dynamic linking
> implementation for some time so my understanding of Linux dynamic linking
> may be outdated).

All very valid points and I am not suggesting to stop offering the
static linking option in any way. Dynamic linking will by design result
in more cycles. My sole point is that for a core platform component
like OVS, the shared library benefits _might_ outweigh the performance
difference. In order for a shared library to be effective, some form of
ABI compatibility must be guaranteed though.
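
To make the distribution angle concrete (library name and versions
purely illustrative): if the runtime package shipped

/usr/lib64/librte_eal.so.1 -> librte_eal.so.1.6.1

then an OVS binary linked against librte_eal.so.1 from 1.6.0 could pick
up the 1.6.1 bugfix package without a rebuild, as long as the .so.1 ABI
is not broken between the two releases.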

> I don't think it is so straight forward. Many recent cards such as Chelsio
> and Myricom have a very different "packet memory layout" that does not fit
> so easily into actual DPDK architecture.
>
> 1) "traditional" architecture: the driver reserves X buffers and provide the
> card with descriptors of those buffers. Each packet is DMA'ed into exactly
> one buffer. Typically you have 2K buffers, a 64 byte packet consumes exactly
> one buffer
>
> 2) "alternative" new architecture: the driver reserves a memory zone, say
> 4MB, without any structure, and provide a single zone description and a
> ring buffer to the card. (there no individual buffer descriptors any more).
> The card fills the memory zone with packets, one next to the other and
> specifies where the packets are by updating the supplied ring. Out of the
> many issues fitting this scheme into DPDK, you cannot free a single mbuf:
> you have to maintain a ref count to the memory zone so that, when all mbufs
> have been "released", the memory zone can be freed.
> That's quite a stretch from actual paradigm.
>
> Apart from this aspect, managing RSS is too tied to Intel's flow director
> concepts and cannot directly accommodate smarter or dumber RSS mechanisms.
>
> That said, I fully agree PMD API should be revisited.

Fair enough. I don't see a reason why multiple interfaces could not
coexist in order to support multiple memory layouts. What I'm hearing
so far is that while there is no objection to bringing stability to the
APIs, it should not result in performance side effects, and it is still
too early to nail down the still-fluid APIs.