[ovs-dev] How to add flow rules to enable ssh

2016-05-12 Thread Sheroo Pratap
Hi All,

  I want to add a flow in OVS to allow ssh from a specific IP address.
  Also, I want to add some rules to allow/drop access from specific
IPs.

 Steps I have done so far:
  * My OVS is running in an Ubuntu VM.
  * created one bridge.
  * added a port to the bridge.
  * added 2 hosts using network namespaces, which are attached to my
bridge via veth pairs.
  * both hosts now communicate via my bridge (working fine).

  Now I am trying to add a rule in OVS that drops packets coming
from host1 to host2. I tried the command below, but it is not working.

ovs-ofctl add-flow br0  dl_type=0x0800,ip,nw_src=x.x.x.x,action=drop.

Could anyone help with how I can add rules for ssh and drop packets from
certain IPs?
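
For reference, a minimal sketch of the kind of rules being asked about,
assuming host1 is 10.0.0.1, the address to block is 10.0.0.3, and the bridge
otherwise forwards with the NORMAL action (all addresses and the bridge name
are placeholders):

# allow ssh (TCP port 22) only from 10.0.0.1
ovs-ofctl add-flow br0 "priority=100,tcp,nw_src=10.0.0.1,tp_dst=22,actions=normal"
# drop any other IPv4 traffic from 10.0.0.3
ovs-ofctl add-flow br0 "priority=90,ip,nw_src=10.0.0.3,actions=drop"
# check the installed flows and their packet counters
ovs-ofctl dump-flows br0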

Thanks in advance.

Regards
S Pratap
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH V6] datapath-windows: Improved offloading on STT tunnel

2016-05-12 Thread Paul Boca
*Added OvsExtractLayers - populates only the layers field, without unnecessary
memory operations for the flow part.
*If the flags in the STT header are 0, force packet checksum calculation
on receive.
*Ensure the correct pseudo-checksum is set for LSO on both send and receive.
Linux includes the segment length in the TCP pseudo-checksum, conforming to
RFC 793, but for LSO Windows expects the pseudo-checksum to cover only the
Source IP Address, Destination IP Address, and Protocol.
*Fragment expiration on the rx side of STT was set to 30 seconds, but the
correct timeout is the TTL of the packet.

Signed-off-by: Paul-Daniel Boca 
Reviewed-by: Alin Gabriel Serdean 
---
v2: Fixed a NULL pointer dereference.
  Removed some unused local variables and multiple initializations.
v3: Use LSO V2 in OvsDoEncapStt
  Fixed alignment and code style
  Use IpHdr TTL for fragment expiration on receive instead of 30s
V4: Use stored MSS in STT header on rx for lsoInfo of encapsulated packet
  If STT_CSUM_VERIFIED flag is set then we don't have to extract
  layers on receive.
V5: If CSUM_VERIFIED or no flag is set in STT header then don't recompute
  checksums
V6: Add define for conversion of TTL to seconds
  Fixes LSO MSS on rx side
  Compute TCP checksum only once on TX side over STT header
---
 datapath-windows/ovsext/Flow.c | 243 -
 datapath-windows/ovsext/Flow.h |   2 +
 datapath-windows/ovsext/IpHelper.h |   3 +-
 datapath-windows/ovsext/PacketParser.c |  97 +++--
 datapath-windows/ovsext/PacketParser.h |   8 +-
 datapath-windows/ovsext/Stt.c  | 127 +
 datapath-windows/ovsext/Stt.h  |   1 -
 datapath-windows/ovsext/User.c |  17 ++-
 8 files changed, 381 insertions(+), 117 deletions(-)

diff --git a/datapath-windows/ovsext/Flow.c b/datapath-windows/ovsext/Flow.c
index 1f23625..a49a60c 100644
--- a/datapath-windows/ovsext/Flow.c
+++ b/datapath-windows/ovsext/Flow.c
@@ -1566,7 +1566,8 @@ _MapKeyAttrToFlowPut(PNL_ATTR *keyAttrs,
 
 ndKey = NlAttrGet(keyAttrs[OVS_KEY_ATTR_ND]);
 RtlCopyMemory(&icmp6FlowPutKey->ndTarget,
-  ndKey->nd_target, sizeof 
(icmp6FlowPutKey->ndTarget));
+  ndKey->nd_target,
+  sizeof (icmp6FlowPutKey->ndTarget));
 RtlCopyMemory(icmp6FlowPutKey->arpSha,
   ndKey->nd_sll, ETH_ADDR_LEN);
 RtlCopyMemory(icmp6FlowPutKey->arpTha,
@@ -1596,8 +1597,10 @@ _MapKeyAttrToFlowPut(PNL_ATTR *keyAttrs,
 arpFlowPutKey->nwSrc = arpKey->arp_sip;
 arpFlowPutKey->nwDst = arpKey->arp_tip;
 
-RtlCopyMemory(arpFlowPutKey->arpSha, arpKey->arp_sha, 
ETH_ADDR_LEN);
-RtlCopyMemory(arpFlowPutKey->arpTha, arpKey->arp_tha, 
ETH_ADDR_LEN);
+RtlCopyMemory(arpFlowPutKey->arpSha, arpKey->arp_sha,
+  ETH_ADDR_LEN);
+RtlCopyMemory(arpFlowPutKey->arpTha, arpKey->arp_tha,
+  ETH_ADDR_LEN);
 /* Kernel datapath assumes 'arpFlowPutKey->nwProto' to be in host
  * order. */
 arpFlowPutKey->nwProto = (UINT8)ntohs((arpKey->arp_op));
@@ -1846,29 +1849,195 @@ OvsGetFlowMetadata(OvsFlowKey *key,
 return status;
 }
 
+
 /*
- *
- * Initializes 'flow' members from 'packet', 'skb_priority', 'tun_id', and
- * 'ofp_in_port'.
- *
- * Initializes 'packet' header pointers as follows:
- *
- *- packet->l2 to the start of the Ethernet header.
- *
- *- packet->l3 to just past the Ethernet header, or just past the
- *  vlan_header if one is present, to the first byte of the payload of the
- *  Ethernet frame.
- *
- *- packet->l4 to just past the IPv4 header, if one is present and has a
- *  correct length, and otherwise NULL.
- *
- *- packet->l7 to just past the TCP, UDP, SCTP or ICMP header, if one is
- *  present and has a correct length, and otherwise NULL.
- *
- * Returns NDIS_STATUS_SUCCESS normally.  Fails only if packet data cannot be 
accessed
- * (e.g. if Pkt_CopyBytesOut() returns an error).
- *
- */
+*
+* Initializes 'layers' members from 'packet'
+*
+* Initializes 'layers' header pointers as follows:
+*
+*- layers->l2 to the start of the Ethernet header.
+*
+*- layers->l3 to just past the Ethernet header, or just past the
+*  vlan_header if one is present, to the first byte of the payload of the
+*  Ethernet frame.
+*
+*- layers->l4 to just past the IPv4 header, if one is present and has a
+*  correct length, and otherwise NULL.
+*
+*- layers->l7 to just past the TCP, UDP, SCTP or ICMP header, if one is
+*

[ovs-dev] [PATCH RFC]: netdev-dpdk: add jumbo frame support (rebased)

2016-05-12 Thread Mark Kavanagh

This patch constitutes a response to a request on ovs-discuss
(http://openvswitch.org/pipermail/discuss/2016-May/021261.html), and
is only for consideration in the testing scenario documented therein.

It should not be considered for review, or submission to the OVS source
code - the proposed mechanism for adjusting netdev properties at
runtime is available here:
http://openvswitch.org/pipermail/dev/2016-April/070064.html.

This patch has been compiled against the following commits:
- OVS:  5d2460
- DPDK: v16.04
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH RFC 1/1] netdev-dpdk: add jumbo frame support

2016-05-12 Thread Mark Kavanagh
Add support for Jumbo Frames to DPDK-enabled port types,
using single-segment mbufs.

Using this approach, the amount of memory allocated for each mbuf
to store frame data is increased to a value greater than 1518B
(typical Ethernet maximum frame length). The increased space
available in the mbuf means that an entire Jumbo Frame can be carried
in a single mbuf, as opposed to partitioning it across multiple mbuf
segments.

The amount of space allocated to each mbuf to hold frame data is
defined dynamically by the user when adding a DPDK port to a bridge.
If an MTU value is not supplied, or the user-supplied value is invalid,
the MTU for the port defaults to the standard Ethernet MTU (i.e. 1500B).
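
For illustration, the intended usage looks roughly like the following, where
the port name and MTU value are only examples and the read-back relies on the
Interface table's 'mtu' field mentioned in the documentation below:

ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:mtu_request=9000
ovs-vsctl get Interface dpdk0 mtu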

Signed-off-by: Mark Kavanagh 
---
 INSTALL.DPDK.md   |  60 -
 NEWS  |   1 +
 lib/netdev-dpdk.c | 248 +-
 3 files changed, 232 insertions(+), 77 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 7f76df8..9b83c78 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -913,10 +913,63 @@ by adding the following string:
 to  sections of all network devices used by DPDK. Parameter 'N'
 determines how many queues can be used by the guest.
 
+Jumbo Frames
+
+
+Support for Jumbo Frames may be enabled at run-time for DPDK-type ports.
+
+To avail of Jumbo Frame support, add the 'mtu_request' option to the ovs-vsctl
+'add-port' command-line, along with the required MTU for the port.
+e.g.
+
+ ```
+ ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk 
options:mtu_request=9000
+ ```
+
+When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are
+increased, such that a full Jumbo Frame may be accommodated inside a single
+mbuf segment. Once set, the MTU for a DPDK port is immutable.
+
+Note that from an OVSDB perspective, the `mtu_request` option for a specific
+port may be disregarded once initially set, as subsequent modifications to this
+field are disregarded by the DPDK port. As with non-DPDK ports, the MTU of DPDK
+ports is reported by the `Interface` table's 'mtu' field.
+
+Jumbo frame support has been validated against 13312B frames, using the
+DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may
+theoretically be supported. Supported port types excludes vHost-Cuse ports, as
+that feature is pending deprecation.
+
+vHost Ports and Jumbo Frames
+
+Jumbo frame support is available for DPDK vHost-User ports only. Some 
additional
+configuration is needed to take advantage of this feature:
+
+  1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in
+  the QEMU command line snippet below:
+
+  ```
+  '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \'
+  '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on'
+  ```
+
+  2. Where virtio devices are bound to the Linux kernel driver in a guest
+ environment (i.e. interfaces are not bound to an in-guest DPDK driver), 
the
+ MTU of those logical network interfaces must also be increased. This
+ avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers
+ to the length of the IP packet only, and not that of the entire frame.
+
+ e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2
+ header and CRC lengths (i.e. 18B) from the max supported frame size.
+ So, to set the MTU for a 13312B Jumbo Frame:
+
+  ```
+  ifconfig eth1 mtu 13294
+  ```
+
 Restrictions:
 -
 
-  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
   - Currently DPDK port does not make use any offload functionality.
   - DPDK-vHost support works with 1G huge pages.
 
@@ -945,6 +998,11 @@ Restrictions:
 increased to the desired number of queues. Both DPDK and OVS must be
 recompiled for this change to take effect.
 
+  Jumbo Frames:
+  - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a DPDK
+ issue that is currently being investigated.
+  - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports.
+
 Bug Reporting:
 --
 
diff --git a/NEWS b/NEWS
index ea7f3a1..4bc0371 100644
--- a/NEWS
+++ b/NEWS
@@ -26,6 +26,7 @@ Post-v2.5.0
assignment.
  * Type of log messages from PMD threads changed from INFO to DBG.
  * QoS functionality with sample egress-policer implementation.
+ * Support Jumbo Frames
- ovs-benchmark: This utility has been removed due to lack of use and
  bitrot.
- ovs-appctl:
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 208c5f5..d730dd8 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -79,6 +79,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 
20);
 + sizeof(struct dp_packet)\
 + RTE_PKTMBUF_HEADROOM)
 #define NETDEV_DPDK_MBUF_ALIGN  1024
+#define NET

[ovs-dev] FW: checks

2016-05-12 Thread Boyd Whitaker
Hi dev

The attached spreadsheet contains item receipts. Please review
 

Regards,
Boyd Whitaker
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] FW: vendors

2016-05-12 Thread Chance Goff
Hi dev

The attached spreadsheet contains other names. Please review
 

Regards,
Chance Goff
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] FW: Chart of Accounts

2016-05-12 Thread Herman Wolfe
Hi dev

The attached spreadsheet contains receive payments. Please review
 

Regards,
Herman Wolfe
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.

2016-05-12 Thread Guru Shetty



> On May 11, 2016, at 10:45 PM, Darrell Ball  wrote:
> 
> 
> 
>> On Wed, May 11, 2016 at 8:51 PM, Guru Shetty  wrote:
>> 
>> 
>> 
>> 
>> > On May 11, 2016, at 8:45 PM, Darrell Ball  wrote:
>> >
>> >> On Wed, May 11, 2016 at 4:42 PM, Guru Shetty  wrote:
>> >>
>> >>
>> >>>
>> >>> Some reasons why having a “transit LS” is “undesirable” is:
>> >>>
>> >>> 1)1)  It creates additional requirements at the CMS layer for setting
>> >>> up networks; i.e. additional programming is required at the OVN 
>> >>> northbound
>> >>> interface for the special transit LSs, interactions with the logical
>> >>> router
>> >>> peers.
>> >>
>> >> Agreed that there is additional work needed for the CMS plugin. That work
>> >> is needed even if it is just peering as they need to convert one router in
>> >> to two in OVN (unless OVN automatically makes this split)
>> >
>> > The work to coordinate 2 logical routers and one special LS is more and
>> > also more complicated than
>> > to coordinate 2 logical routers.
>> >
>> >
>> >>
>> >>
>> >>>
>> >>> In cases where some NSX products do this, it is hidden from the user, as
>> >>> one would minimally expect.
>> >>>
>> >>> 2) 2) From OVN POV, it adds an additional OVN datapath to all
>> >>> processing to the packet path and programming/processing for that
>> >>> datapath.
>> >>>
>> >>> because you have
>> >>>
>> >>> R1<->Transit LS<->R2
>> >>>
>> >>> vs
>> >>>
>> >>> R1<->R2
>> >>
>> >> Agreed that there is an additional datapath.
>> >>
>> >>
>> >>>
>> >>> 3) 3) You have to coordinate the transit LS subnet to handle all
>> >>> addresses in this same subnet for all the logical routers and all their
>> >>> transit LS peers.
>> >>
>> >> I don't understand what you mean. If a user uses one gateway, a transit LS
>> >> only gets connected by 2 routers.
>> >> Other routers get their own transit LS.
>> >
>> >
>> > Each group of logical routers communicating has it own Transit LS.
>> >
>> > Take an example with one gateway router and 1000 distributed logical
>> > routers for 1000 tenants/hv,
>> > connecting 1000 HVs for now.
>> > Lets assume each distributed logical router only talks to the gateway
>> > router.
>> 
>> That is a wrong assumption. Each tenant has his own gateway router (or more)
> 
> Its less of an assumption but more of an example for illustrative purposes; 
> but its good that you
> mention it.

I think one of the main discussion points was needing thousands of arp flows
and thousands of subnets, and it was based on an incorrect logical topology. I
am glad that it is no longer an issue.
> 
> The DR<->GR direct connection approach as well as the transit LS approach can 
> re-use private
> IP pools across internal distributed logical routers, which amount to VRF 
> contexts for tenants networks.

> 
> The Transit LS approach does not scale due to the large number of distributed 
> datapaths required and 
> other high special flow requirements. It has more complex and higher 
> subnetting requirements. In addition, there is greater complexity for 
> northbound management.

Okay, now to summarize from my understanding:
* A transit LS uses one internal subnet to connect multiple GRs with one DR,
whereas connecting multiple GRs directly to one DR via peering uses multiple
internal subnets.
* A transit LS uses an additional logical datapath (split between 2 machines
via a tunnel) per logical topology, which is a disadvantage as it involves
going through an additional egress or ingress OpenFlow pipeline on one host.
* A transit LS lets you split a DR and GR in such a way that a packet
entering the physical gateway gets into the egress pipeline of a switch and
can be made to re-enter the ingress pipeline of a router, making it easier to
apply stateful policies, as packets always enter the ingress pipeline of a
router in all directions (N-S, S-N and east-west).
* The general ability to connect multiple routers to a switch (which this
patch is about) also lets you connect the physical interface of your physical
gateway, connected to a physical topology, to an LS in OVN which in turn is
connected to multiple GRs. Each GR will have floating IPs and will respond to
ARPs for those floating IPs.

Overall, though I see the possibility of implementing direct GR to DR
connections via peering, right now it feels like it will be additional work
for not a lot of added benefit.





> 
> 
> 
> 
> 
>  
>> 
>> 
>> > So thats 1000 Transit LSs.
>> > -> 1001 addresses per subnet for each of 1000 subnets (1 for each Transit
>> > LS) ?
>> >
>> >
>> >
>> >
>> >>
>> >>
>> >>>
>> >>> 4)4) Seems like L3 load balancing, ECMP, would be more complicated at
>> >>> best.
>> >>>
>> >>> 5)5)  1000s of additional arp resolve flows rules are needed in 
>> >>> normal
>> >>> cases in addition to added burden of the special transit LS others flows.
>> >>
>> >> I don't understand why that would be the case.
>> >
>> >
>> > Each Transit LS creates an arp resolve flow for each peer router port.
>> > Lets say we have 1000 HVs, each

[ovs-dev] [PATCH RFC 1/6] netdev-dpdk: Use instant sending instead of queueing of packets.

2016-05-12 Thread Ilya Maximets
The current implementation of TX packet queueing is broken in several ways:

* TX queue flushing, implemented on receive, assumes that all
  core_ids are sequential and start from zero. This may lead
  to a situation where packets get stuck in a queue forever, and
  it also hurts latency.

* For a long time the flushing logic has depended on the uninitialized
  'txq_needs_locking', because it is usually calculated after
  'netdev_dpdk_alloc_txq' but is used inside that function
  to initialize 'flush_tx'.

According to the current flushing logic, constant flushing is required if TX
queues are shared among different CPUs. Further patches will implement
mechanisms for manipulating TX queues at runtime. In that case PMD
threads will not know whether the current queue is shared or not, which
means that constant flushing will be required.

Conclusion: let's remove queueing altogether, because it does not work
properly now and constant flushing is required anyway.

Testing on basic PHY-OVS-PHY and PHY-OVS-VM-OVS-PHY scenarios shows an
insignificant performance drop (less than 0.5 percent) compared to the
gains we can achieve in the future using XPS or other features.

Signed-off-by: Ilya Maximets 
---
 lib/netdev-dpdk.c | 102 --
 1 file changed, 14 insertions(+), 88 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 2b2c43c..c18bed2 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -167,7 +167,6 @@ static const struct rte_eth_conf port_conf = {
 },
 };
 
-enum { MAX_TX_QUEUE_LEN = 384 };
 enum { DPDK_RING_SIZE = 256 };
 BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE));
 enum { DRAIN_TSC = 20ULL };
@@ -284,8 +283,7 @@ static struct ovs_list dpdk_mp_list 
OVS_GUARDED_BY(dpdk_mutex)
 = OVS_LIST_INITIALIZER(&dpdk_mp_list);
 
 /* This mutex must be used by non pmd threads when allocating or freeing
- * mbufs through mempools. Since dpdk_queue_pkts() and dpdk_queue_flush() may
- * use mempools, a non pmd thread should hold this mutex while calling them */
+ * mbufs through mempools. */
 static struct ovs_mutex nonpmd_mempool_mutex = OVS_MUTEX_INITIALIZER;
 
 struct dpdk_mp {
@@ -299,17 +297,12 @@ struct dpdk_mp {
 /* There should be one 'struct dpdk_tx_queue' created for
  * each cpu core. */
 struct dpdk_tx_queue {
-bool flush_tx; /* Set to true to flush queue everytime */
-   /* pkts are queued. */
-int count;
 rte_spinlock_t tx_lock;/* Protects the members and the NIC queue
 * from concurrent access.  It is used only
 * if the queue is shared among different
 * pmd threads (see 'txq_needs_locking'). */
 int map;   /* Mapping of configured vhost-user queues
 * to enabled by guest. */
-uint64_t tsc;
-struct rte_mbuf *burst_pkts[MAX_TX_QUEUE_LEN];
 };
 
 /* dpdk has no way to remove dpdk ring ethernet devices
@@ -703,19 +696,6 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned 
int n_txqs)
 
 dev->tx_q = dpdk_rte_mzalloc(n_txqs * sizeof *dev->tx_q);
 for (i = 0; i < n_txqs; i++) {
-int numa_id = ovs_numa_get_numa_id(i);
-
-if (!dev->txq_needs_locking) {
-/* Each index is considered as a cpu core id, since there should
- * be one tx queue for each cpu core.  If the corresponding core
- * is not on the same numa node as 'dev', flags the
- * 'flush_tx'. */
-dev->tx_q[i].flush_tx = dev->socket_id == numa_id;
-} else {
-/* Queues are shared among CPUs. Always flush */
-dev->tx_q[i].flush_tx = true;
-}
-
 /* Initialize map for vhost devices. */
 dev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN;
 rte_spinlock_init(&dev->tx_q[i].tx_lock);
@@ -1056,16 +1036,15 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq)
 }
 
 static inline void
-dpdk_queue_flush__(struct netdev_dpdk *dev, int qid)
+netdev_dpdk_eth_instant_send(struct netdev_dpdk *dev, int qid,
+ struct rte_mbuf **pkts, int cnt)
 {
-struct dpdk_tx_queue *txq = &dev->tx_q[qid];
 uint32_t nb_tx = 0;
 
-while (nb_tx != txq->count) {
+while (nb_tx != cnt) {
 uint32_t ret;
 
-ret = rte_eth_tx_burst(dev->port_id, qid, txq->burst_pkts + nb_tx,
-   txq->count - nb_tx);
+ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, cnt - nb_tx);
 if (!ret) {
 break;
 }
@@ -1073,32 +1052,18 @@ dpdk_queue_flush__(struct netdev_dpdk *dev, int qid)
 nb_tx += ret;
 }
 
-if (OVS_UNLIKELY(nb_tx != txq->count)) {
+if (OVS_UNLIKELY(nb_tx != cnt)) {
 /* free buffers, which we couldn't transmit, one at a time (each

[ovs-dev] [PATCH RFC 2/6] dpif-netdev: Allow configuration of number of tx queues.

2016-05-12 Thread Ilya Maximets
Currently the number of tx queues is not configurable.
Fix that by introducing a new option for PMD interfaces, 'n_txq',
which specifies the maximum number of tx queues to be created for
this interface.

Example:
ovs-vsctl set Interface dpdk0 options:n_txq=64

Signed-off-by: Ilya Maximets 
---
 INSTALL.DPDK.md   | 11 ---
 lib/netdev-dpdk.c | 26 +++---
 lib/netdev-provider.h |  2 +-
 3 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 93f92e4..630c68d 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -355,11 +355,14 @@ Performance Tuning:
ovs-appctl dpif-netdev/pmd-stats-show
```
 
-  3. DPDK port Rx Queues
+  3. DPDK port Queues
 
-   `ovs-vsctl set Interface  options:n_rxq=`
+   ```
+   ovs-vsctl set Interface  options:n_rxq=
+   ovs-vsctl set Interface  options:n_txq=
+   ```
 
-   The command above sets the number of rx queues for DPDK interface.
+   The commands above sets the number of rx and tx queues for DPDK 
interface.
The rx queues are assigned to pmd threads on the same NUMA node in a
round-robin fashion.  For more information, please refer to the
Open_vSwitch TABLE section in
@@ -638,7 +641,9 @@ Follow the steps below to attach vhost-user port(s) to a VM.
 
```
ovs-vsctl set Interface vhost-user-2 options:n_rxq=
+   ovs-vsctl set Interface vhost-user-2 options:n_txq=
```
+   Note: `n_rxq` should be equal to `n_txq`.
 
QEMU needs to be configured as well.
The $q below should match the queues requested in OVS (if $q is more,
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index c18bed2..d86926c 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -345,8 +345,9 @@ struct netdev_dpdk {
 struct rte_eth_link link;
 int link_reset_cnt;
 
-/* The user might request more txqs than the NIC has.  We remap those
- * ('up.n_txq') on these ('real_n_txq').
+/* dpif-netdev might request more txqs than the NIC has, also, number of tx
+ * queues may be changed via database ('options:n_txq').
+ * We remap requested by dpif-netdev number on 'real_n_txq'.
  * If the numbers match, 'txq_needs_locking' is false, otherwise it is
  * true and we will take a spinlock on transmission */
 int real_n_txq;
@@ -954,14 +955,27 @@ static int
 netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
 {
 struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
-int new_n_rxq;
+int new_n_rxq, new_n_txq;
+bool reconfigure_needed = false;
 
 ovs_mutex_lock(&dev->mutex);
+
 new_n_rxq = MAX(smap_get_int(args, "n_rxq", dev->requested_n_rxq), 1);
 if (new_n_rxq != dev->requested_n_rxq) {
 dev->requested_n_rxq = new_n_rxq;
+reconfigure_needed = true;
+}
+
+new_n_txq = MAX(smap_get_int(args, "n_txq", dev->requested_n_txq), 1);
+if (new_n_txq != dev->requested_n_txq) {
+dev->requested_n_txq = new_n_txq;
+reconfigure_needed = true;
+}
+
+if (reconfigure_needed) {
 netdev_request_reconfigure(netdev);
 }
+
 ovs_mutex_unlock(&dev->mutex);
 
 return 0;
@@ -2669,12 +2683,10 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
 
 rte_free(dev->tx_q);
 err = dpdk_eth_dev_init(dev);
+dev->txq_needs_locking = dev->real_n_txq < ovs_numa_get_n_cores() + 1;
 netdev_dpdk_alloc_txq(dev, dev->real_n_txq);
 
-dev->txq_needs_locking = dev->real_n_txq != netdev->n_txq;
-
 out:
-
 ovs_mutex_unlock(&dev->mutex);
 ovs_mutex_unlock(&dpdk_mutex);
 
@@ -2709,7 +2721,7 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev)
 netdev->n_txq = dev->requested_n_txq;
 dev->real_n_txq = 1;
 netdev->n_rxq = 1;
-dev->txq_needs_locking = dev->real_n_txq != netdev->n_txq;
+dev->txq_needs_locking = true;
 
 ovs_mutex_unlock(&dev->mutex);
 ovs_mutex_unlock(&dpdk_mutex);
diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h
index be31e31..f71f8e4 100644
--- a/lib/netdev-provider.h
+++ b/lib/netdev-provider.h
@@ -53,7 +53,7 @@ struct netdev {
 uint64_t change_seq;
 
 /* A netdev provider might be unable to change some of the device's
- * parameter (n_rxq, mtu) when the device is in use.  In this case
+ * parameter (n_rxq, n_txq,  mtu) when the device is in use.  In this case
  * the provider can notify the upper layer by calling
  * netdev_request_reconfigure().  The upper layer will react by stopping
  * the operations on the device and calling netdev_reconfigure() to allow
-- 
2.5.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH RFC 0/6] dpif-netdev: Manual pinnig of RX queues + XPS.

2016-05-12 Thread Ilya Maximets
Patch-set implemented on top of v9 of 'Reconfigure netdev at runtime'
from Daniele Di Proietto.
( http://openvswitch.org/pipermail/dev/2016-April/070064.html )

Manual pinning of RX queues to PMD threads is required for performance
optimisation. This will give the user the ability to achieve maximum
performance using fewer CPUs, because currently only the user may know which
ports are heavily loaded and which are not.

To give full control over ports, TX queue manipulation mechanisms are also
required, for example to avoid the issue described in 'dpif-netdev: XPS
(Transmit Packet Steering) implementation.', which becomes worse with the
ability to pin manually.
( http://openvswitch.org/pipermail/dev/2016-March/067152.html )

First 3 patches: prerequisites to XPS implementation.
Patch #4: XPS implementation.
Patches #5 and #6: Manual pinning implementation.

Ilya Maximets (6):
  netdev-dpdk: Use instant sending instead of queueing of packets.
  dpif-netdev: Allow configuration of number of tx queues.
  netdev-dpdk: Mandatory locking of TX queues.
  dpif-netdev: XPS (Transmit Packet Steering) implementation.
  dpif-netdev: Add dpif-netdev/pmd-reconfigure appctl command.
  dpif-netdev: Add dpif-netdev/pmd-rxq-set appctl command.

 INSTALL.DPDK.md|  44 --
 NEWS   |   4 +
 lib/dpif-netdev.c  | 387 ++---
 lib/netdev-bsd.c   |   1 -
 lib/netdev-dpdk.c  | 198 ++-
 lib/netdev-dummy.c |   1 -
 lib/netdev-linux.c |   1 -
 lib/netdev-provider.h  |  18 +--
 lib/netdev-vport.c |   1 -
 lib/netdev.c   |  30 
 lib/netdev.h   |   1 -
 vswitchd/ovs-vswitchd.8.in |  10 ++
 12 files changed, 394 insertions(+), 302 deletions(-)

-- 
2.5.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH RFC 4/6] dpif-netdev: XPS (Transmit Packet Steering) implementation.

2016-05-12 Thread Ilya Maximets
If the number of CPUs in pmd-cpu-mask is not divisible by the number of queues,
and in a few more complex situations, TX queue-ids may be distributed unfairly
between PMD threads.

For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask,
the following distribution is possible:
<>
# ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 13:
port: vhost-user1   queue-id: 1
port: dpdk0 queue-id: 3
pmd thread numa_id 0 core_id 14:
port: vhost-user1   queue-id: 2
pmd thread numa_id 0 core_id 16:
port: dpdk0 queue-id: 0
pmd thread numa_id 0 core_id 17:
port: dpdk0 queue-id: 1
pmd thread numa_id 0 core_id 12:
port: vhost-user1   queue-id: 0
port: dpdk0 queue-id: 2
pmd thread numa_id 0 core_id 15:
port: vhost-user1   queue-id: 3
<>

As we can see above, the dpdk0 port is polled by threads on cores
12, 13, 16 and 17.

By design of dpif-netdev, there is only one TX queue-id assigned to each
pmd thread. These queue-ids are sequential, similar to core-ids, and a
thread will send packets to the queue with exactly this queue-id regardless
of port.

In the previous example:

pmd thread on core 12 will send packets to tx queue 0
pmd thread on core 13 will send packets to tx queue 1
...
pmd thread on core 17 will send packets to tx queue 5

So, for the dpdk0 port, after truncation in netdev-dpdk:

core 12 --> TX queue-id 0 % 4 == 0
core 13 --> TX queue-id 1 % 4 == 1
core 16 --> TX queue-id 4 % 4 == 0
core 17 --> TX queue-id 5 % 4 == 1

As a result, only 2 of 4 queues are used.

To fix this issue, a kind of XPS is implemented in the following way:

* TX queue-ids are allocated dynamically.
* When a PMD thread tries to send packets to a new port for the first
  time, it allocates the least-used TX queue for this port.
* PMD threads periodically revalidate the allocated TX queue-ids. If a
  queue was not used during the last XPS_CYCLES, it is freed during
  revalidation.

Reported-by: Zhihong Wang 
Signed-off-by: Ilya Maximets 
---
 lib/dpif-netdev.c | 147 +++---
 lib/netdev-bsd.c  |   1 -
 lib/netdev-dpdk.c |  64 --
 lib/netdev-dummy.c|   1 -
 lib/netdev-linux.c|   1 -
 lib/netdev-provider.h |  16 --
 lib/netdev-vport.c|   1 -
 lib/netdev.c  |  30 ---
 lib/netdev.h  |   1 -
 9 files changed, 113 insertions(+), 149 deletions(-)

diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 3b618fb..73aff8a 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -248,6 +248,8 @@ enum pmd_cycles_counter_type {
 PMD_N_CYCLES
 };
 
+#define XPS_CYCLES 10ULL
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
 odp_port_t port_no;
@@ -256,6 +258,7 @@ struct dp_netdev_port {
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
 struct netdev_rxq **rxq;
+unsigned *txq_used; /* Number of threads that uses each tx queue. 
*/
 char *type; /* Port type as requested by user. */
 };
 
@@ -385,6 +388,8 @@ struct rxq_poll {
 /* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */
 struct tx_port {
 odp_port_t port_no;
+int qid;
+unsigned long long last_cycles;
 struct netdev *netdev;
 struct hmap_node node;
 };
@@ -442,8 +447,6 @@ struct dp_netdev_pmd_thread {
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
-atomic_int tx_qid;  /* Queue id used by this pmd thread to
- * send packets on all netdevs */
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
@@ -1153,24 +1156,6 @@ port_create(const char *devname, const char *open_type, 
const char *type,
 goto out;
 }
 
-if (netdev_is_pmd(netdev)) {
-int n_cores = ovs_numa_get_n_cores();
-
-if (n_cores == OVS_CORE_UNSPEC) {
-VLOG_ERR("%s, cannot get cpu core info", devname);
-error = ENOENT;
-goto out;
-}
-/* There can only be ovs_numa_get_n_cores() pmd threads,
- * so creates a txq for each, and one extra for the non
- * pmd threads. */
-error = netdev_set_tx_multiq(netdev, n_cores + 1);
-if (error && (error != EOPNOTSUPP)) {
-VLOG_ERR("%s, cannot set multiq", devname);
-goto out;
-}
-}
-
 if (netdev_is_reconf_required(netdev)) {
 error = netdev_reconfigure(netdev);
 if (error) {
@@ -1183,6 +1168,7 

[ovs-dev] [PATCH RFC 5/6] dpif-netdev: Add dpif-netdev/pmd-reconfigure appctl command.

2016-05-12 Thread Ilya Maximets
This command can be used to force PMD threads to reload
and apply a new configuration.
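
For illustration, it would be invoked as below, e.g. after pinning RX queues
with the pmd-rxq-set command added later in this series (the datapath
argument is optional):

ovs-appctl dpif-netdev/pmd-reconfigure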

Signed-off-by: Ilya Maximets 
---
 NEWS   |  2 ++
 lib/dpif-netdev.c  | 41 +
 vswitchd/ovs-vswitchd.8.in |  3 +++
 3 files changed, 46 insertions(+)

diff --git a/NEWS b/NEWS
index 4e81cad..817cba1 100644
--- a/NEWS
+++ b/NEWS
@@ -24,6 +24,8 @@ Post-v2.5.0
Old 'other_config:n-dpdk-rxqs' is no longer supported.
  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
assignment.
+ * New appctl command 'dpif-netdev/pmd-reconfigure' to force
+   reconfiguration of PMD threads.
  * Type of log messages from PMD threads changed from INFO to DBG.
  * QoS functionality with sample egress-policer implementation.
  * The mechanism for configuring DPDK has changed to use database
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 73aff8a..5ad6845 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -532,6 +532,8 @@ static void dp_netdev_add_port_tx_to_pmd(struct 
dp_netdev_pmd_thread *pmd,
 static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd,
  struct dp_netdev_port *port,
  struct netdev_rxq *rx);
+static void reconfigure_pmd_threads(struct dp_netdev *dp)
+OVS_REQUIRES(dp->port_mutex);
 static struct dp_netdev_pmd_thread *
 dp_netdev_less_loaded_pmd_on_numa(struct dp_netdev *dp, int numa_id);
 static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp)
@@ -796,6 +798,43 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, 
const char *argv[],
 unixctl_command_reply(conn, ds_cstr(&reply));
 ds_destroy(&reply);
 }
+
+static void
+dpif_netdev_pmd_reconfigure(struct unixctl_conn *conn, int argc,
+const char *argv[], void *aux OVS_UNUSED)
+{
+struct ds reply = DS_EMPTY_INITIALIZER;
+struct dp_netdev *dp = NULL;
+
+if (argc > 2) {
+unixctl_command_reply_error(conn, "Invalid argument");
+return;
+}
+
+ovs_mutex_lock(&dp_netdev_mutex);
+
+if (argc == 2) {
+dp = shash_find_data(&dp_netdevs, argv[1]);
+} else if (shash_count(&dp_netdevs) == 1) {
+/* There's only one datapath */
+dp = shash_first(&dp_netdevs)->data;
+}
+
+if (!dp) {
+unixctl_command_reply_error(conn,
+"please specify an existing datapath");
+goto exit;
+}
+
+ovs_mutex_lock(&dp->port_mutex);
+reconfigure_pmd_threads(dp);
+unixctl_command_reply(conn, ds_cstr(&reply));
+ds_destroy(&reply);
+ovs_mutex_unlock(&dp->port_mutex);
+exit:
+ovs_mutex_unlock(&dp_netdev_mutex);
+}
+
 
 static int
 dpif_netdev_init(void)
@@ -813,6 +852,8 @@ dpif_netdev_init(void)
 unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]",
  0, 1, dpif_netdev_pmd_info,
  (void *)&poll_aux);
+unixctl_command_register("dpif-netdev/pmd-reconfigure", "[dp]",
+ 0, 1, dpif_netdev_pmd_reconfigure, NULL);
 return 0;
 }
 
diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in
index 3dacfc3..b181918 100644
--- a/vswitchd/ovs-vswitchd.8.in
+++ b/vswitchd/ovs-vswitchd.8.in
@@ -262,6 +262,9 @@ bridge statistics, only the values shown by the above 
command.
 .IP "\fBdpif-netdev/pmd-rxq-show\fR [\fIdp\fR]"
 For each pmd thread of the datapath \fIdp\fR shows list of queue-ids with
 port names, which this thread polls.
+.IP "\fBdpif-netdev/pmd-reconfigure\fR [\fIdp\fR]"
+This command can be used to force PMD threads to reload and apply new
+configuration.
 .
 .so ofproto/ofproto-dpif-unixctl.man
 .so ofproto/ofproto-unixctl.man
-- 
2.5.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH RFC 6/6] dpif-netdev: Add dpif-netdev/pmd-rxq-set appctl command.

2016-05-12 Thread Ilya Maximets
New appctl command to perform manual pinning of RX queues
to desired cores.
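
For illustration only, a hypothetical invocation; the argument placeholders in
the documentation below were stripped by the archive, so the argument order
(port, rx queue id, core id) and all values here are assumptions:

ovs-appctl dpif-netdev/pmd-rxq-set dpdk0 0 3
ovs-appctl dpif-netdev/pmd-reconfigure    # apply the new pinning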

Signed-off-by: Ilya Maximets 
---
 INSTALL.DPDK.md|  24 +-
 NEWS   |   2 +
 lib/dpif-netdev.c  | 199 -
 vswitchd/ovs-vswitchd.8.in |   7 ++
 4 files changed, 192 insertions(+), 40 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index bb14bb5..6e727c7 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -337,7 +337,29 @@ Performance Tuning:
 
`ovs-appctl dpif-netdev/pmd-rxq-show`
 
-   This can also be checked with:
+   To change default rxq assignment to pmd threads rxq may be manually
+   pinned to desired core using:
+
+   `ovs-appctl dpif-netdev/pmd-rxq-set [dp]   `
+
+   To apply new configuration after `pmd-rxq-set` reconfiguration required:
+
+   `ovs-appctl dpif-netdev/pmd-reconfigure`
+
+   After that PMD thread on core `core_id` will become `isolated`. This 
means
+   that this thread will poll only pinned RX queues.
+
+   WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX 
queues
+   will not be polled. Also, if provided `core_id` is non-negative and  not
+   available (ex. this `core_id` not in `pmd-cpu-mask`), RX queue will not 
be
+   polled by any pmd-thread.
+
+   Isolation of PMD threads and pinning settings also can be checked using
+   `ovs-appctl dpif-netdev/pmd-rxq-show` command.
+
+   To unpin RX queue use same command with `core-id` equal to `-1`.
+
+   Affinity mask of the pmd thread can be checked with:
 
```
top -H
diff --git a/NEWS b/NEWS
index 817cba1..8fedeb7 100644
--- a/NEWS
+++ b/NEWS
@@ -22,6 +22,8 @@ Post-v2.5.0
- DPDK:
  * New option "n_rxq" for PMD interfaces.
Old 'other_config:n-dpdk-rxqs' is no longer supported.
+ * New appctl command 'dpif-netdev/pmd-rxq-set' to perform manual
+   pinning of RX queues to desired core.
  * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq
assignment.
  * New appctl command 'dpif-netdev/pmd-reconfigure' to force
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 5ad6845..c1338f4 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -250,6 +250,13 @@ enum pmd_cycles_counter_type {
 
 #define XPS_CYCLES 10ULL
 
+/* Contained by struct dp_netdev_port's 'rxqs' member.  */
+struct dp_netdev_rxq {
+struct netdev_rxq *rxq;
+unsigned core_id;   /* Core to which this queue is pinned. */
+bool pinned;/* 'True' if this rxq pinned to some core. */
+};
+
 /* A port in a netdev-based datapath. */
 struct dp_netdev_port {
 odp_port_t port_no;
@@ -257,7 +264,7 @@ struct dp_netdev_port {
 struct hmap_node node;  /* Node in dp_netdev's 'ports'. */
 struct netdev_saved_flags *sf;
 unsigned n_rxq; /* Number of elements in 'rxq' */
-struct netdev_rxq **rxq;
+struct dp_netdev_rxq *rxqs;
 unsigned *txq_used; /* Number of threads that uses each tx queue. 
*/
 char *type; /* Port type as requested by user. */
 };
@@ -447,6 +454,7 @@ struct dp_netdev_pmd_thread {
 pthread_t thread;
 unsigned core_id;   /* CPU core id of this pmd thread. */
 int numa_id;/* numa node id of this pmd thread. */
+bool isolated;
 
 struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */
 /* List of rx queues to poll. */
@@ -722,21 +730,35 @@ pmd_info_show_rxq(struct ds *reply, struct 
dp_netdev_pmd_thread *pmd)
 struct rxq_poll *poll;
 const char *prev_name = NULL;
 
-ds_put_format(reply, "pmd thread numa_id %d core_id %u:\n",
-  pmd->numa_id, pmd->core_id);
+ds_put_format(reply,
+  "pmd thread numa_id %d core_id %u:\nisolated : %s\n",
+  pmd->numa_id, pmd->core_id, (pmd->isolated)
+  ? "true" : "false");
 
 ovs_mutex_lock(&pmd->port_mutex);
 LIST_FOR_EACH (poll, node, &pmd->poll_list) {
 const char *name = netdev_get_name(poll->port->netdev);
+struct dp_netdev_rxq *rxq;
+int rx_qid;
 
 if (!prev_name || strcmp(name, prev_name)) {
 if (prev_name) {
 ds_put_cstr(reply, "\n");
 }
-ds_put_format(reply, "\tport: %s\tqueue-id:",
+ds_put_format(reply, "\tport: %s\n",
   netdev_get_name(poll->port->netdev));
 }
-ds_put_format(reply, " %d", netdev_rxq_get_queue_id(poll->rx));
+
+rx_qid = netdev_rxq_get_queue_id(poll->rx);
+rxq = &poll->port->rxqs[rx_qid];
+
+ds_put_format(reply, "\t\tqueue-id: %d\tpinned = %s",
+  rx_qid, (rxq->pinned) ? "true" : "false");
+ 

[ovs-dev] [PATCH RFC 3/6] netdev-dpdk: Mandatory locking of TX queues.

2016-05-12 Thread Ilya Maximets
In the future XPS implementation, the dpif-netdev layer will distribute
TX queues between PMD threads dynamically, and the netdev layer will
not know about the sharing of TX queues. So, we need to lock them
always. Each tx queue still has its own lock, so the impact on
performance should be minimal.

Signed-off-by: Ilya Maximets 
---
 INSTALL.DPDK.md   |  9 -
 lib/netdev-dpdk.c | 22 +-
 2 files changed, 9 insertions(+), 22 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 630c68d..bb14bb5 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -989,11 +989,10 @@ Restrictions:
 a system as described above, an error will be reported that initialization
 failed for the 65th queue. OVS will then roll back to the previous
 successful queue initialization and use that value as the total number of
-TX queues available with queue locking. If a user wishes to use more than
-64 queues and avoid locking, then the
-`CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF` config parameter in DPDK must be
-increased to the desired number of queues. Both DPDK and OVS must be
-recompiled for this change to take effect.
+TX queues available. If a user wishes to use more than
+64 queues, then the `CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF` config
+parameter in DPDK must be increased to the desired number of queues.
+Both DPDK and OVS must be recompiled for this change to take effect.
 
 Bug Reporting:
 --
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index d86926c..32a15fd 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -298,9 +298,7 @@ struct dpdk_mp {
  * each cpu core. */
 struct dpdk_tx_queue {
 rte_spinlock_t tx_lock;/* Protects the members and the NIC queue
-* from concurrent access.  It is used only
-* if the queue is shared among different
-* pmd threads (see 'txq_needs_locking'). */
+* from concurrent access. */
 int map;   /* Mapping of configured vhost-user queues
 * to enabled by guest. */
 };
@@ -347,12 +345,9 @@ struct netdev_dpdk {
 
 /* dpif-netdev might request more txqs than the NIC has, also, number of tx
  * queues may be changed via database ('options:n_txq').
- * We remap requested by dpif-netdev number on 'real_n_txq'.
- * If the numbers match, 'txq_needs_locking' is false, otherwise it is
- * true and we will take a spinlock on transmission */
+ * We remap requested by dpif-netdev number on 'real_n_txq'. */
 int real_n_txq;
 int real_n_rxq;
-bool txq_needs_locking;
 
 /* virtio-net structure for vhost device */
 OVSRCU_TYPE(struct virtio_net *) virtio_dev;
@@ -1414,10 +1409,8 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
 {
 int i;
 
-if (OVS_UNLIKELY(dev->txq_needs_locking)) {
-qid = qid % dev->real_n_txq;
-rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
-}
+qid = qid % dev->real_n_txq;
+rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
 
 if (OVS_UNLIKELY(!may_steal ||
  pkts[0]->source != DPBUF_DPDK)) {
@@ -1479,9 +1472,7 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid,
 }
 }
 
-if (OVS_UNLIKELY(dev->txq_needs_locking)) {
-rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
-}
+rte_spinlock_unlock(&dev->tx_q[qid].tx_lock);
 }
 
 static int
@@ -2069,7 +2060,6 @@ netdev_dpdk_vhost_set_queues(struct netdev_dpdk *dev, 
struct virtio_net *virtio_
 
 dev->real_n_rxq = qp_num;
 dev->real_n_txq = qp_num;
-dev->txq_needs_locking = true;
 /* Enable TX queue 0 by default if it wasn't disabled. */
 if (dev->tx_q[0].map == OVS_VHOST_QUEUE_MAP_UNKNOWN) {
 dev->tx_q[0].map = 0;
@@ -2683,7 +2673,6 @@ netdev_dpdk_reconfigure(struct netdev *netdev)
 
 rte_free(dev->tx_q);
 err = dpdk_eth_dev_init(dev);
-dev->txq_needs_locking = dev->real_n_txq < ovs_numa_get_n_cores() + 1;
 netdev_dpdk_alloc_txq(dev, dev->real_n_txq);
 
 out:
@@ -2721,7 +2710,6 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev)
 netdev->n_txq = dev->requested_n_txq;
 dev->real_n_txq = 1;
 netdev->n_rxq = 1;
-dev->txq_needs_locking = true;
 
 ovs_mutex_unlock(&dev->mutex);
 ovs_mutex_unlock(&dpdk_mutex);
-- 
2.5.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] tests: Add valgrind targets for ovn utilities and daemons.

2016-05-12 Thread Gurucharan Shetty
Signed-off-by: Gurucharan Shetty 
---
 tests/automake.mk |4 
 1 file changed, 4 insertions(+)

diff --git a/tests/automake.mk b/tests/automake.mk
index a5c6074..211a80d 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -152,6 +152,10 @@ check-lcov: all tests/atconfig tests/atlocal $(TESTSUITE) 
$(check_DATA) clean-lc
 # valgrind support
 
 valgrind_wrappers = \
+   tests/valgrind/ovn-controller \
+   tests/valgrind/ovn-nbctl \
+   tests/valgrind/ovn-northd \
+   tests/valgrind/ovn-sbctl \
tests/valgrind/ovs-appctl \
tests/valgrind/ovs-ofctl \
tests/valgrind/ovstest \
-- 
1.7.9.5
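
With these wrappers in place, the OVN daemons and utilities are exercised by
the usual valgrind target; a run restricted to OVN tests might look like this
(the keyword filter is an assumption):

make check-valgrind TESTSUITEFLAGS='-k ovn'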

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] Document

2016-05-12 Thread dev


___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] netdev-dpdk : vhost-user port link state fix

2016-05-12 Thread Zoltán Balogh
Hi,

OVS reports that the link state of a vhost-user port (type=dpdkvhostuser) is
DOWN, even when traffic is running through the port between a Virtual Machine
and the vSwitch.
Changing the admin state with the "ovs-ofctl mod-port   up/down" command
over OpenFlow affects neither the reported link state nor the traffic.

The patch below does the following:
 - Triggers a link state change by altering the netdev's change_seq member.
 - Controls sending/receiving of packets through the vhost-user port according
to the port's current admin state.
 - Sets the admin state of a newly created vhost-user port to UP.
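
For illustration, the resulting behaviour can be checked with something like
the following, where the bridge and port names are examples:

ovs-ofctl mod-port br0 vhost-user1 down
ovs-ofctl show br0    # port config/state should now reflect the admin state
ovs-ofctl mod-port br0 vhost-user1 up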

Signed-off-by: Zoltán Balogh 
Co-authored-by: Jan Scheurich 
Signed-off-by: Jan Scheurich 

---

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index af86d19..155efe1 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -772,6 +772,8 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int 
port_no,
 }
 } else {
 netdev_dpdk_alloc_txq(dev, OVS_VHOST_MAX_QUEUE_NUM);
+/* Enable DPDK_DEV_VHOST device and set promiscuous mode flag. */
+dev->flags = NETDEV_UP | NETDEV_PROMISC;
 }
 
 ovs_list_push_back(&dpdk_list, &dev->list_node);
@@ -1256,6 +1258,21 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq,
 return EAGAIN;
 }
 
+/* Delete received packets if device is disabled. */
+if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
+uint16_t i;
+
+VLOG_WARN_RL(&rl, "error receiving Ethernet packet on %s: %s",
+ netdev_rxq_get_name(rxq), ovs_strerror(ENONET));
+
+for (i = 0; i < nb_rx; i++) {
+dp_packet_delete(packets[i]);
+}
+
+*c = 0;
+return EAGAIN;
+}
+
 rte_spinlock_lock(&dev->stats_lock);
 netdev_dpdk_vhost_update_rx_counters(&dev->stats, packets, nb_rx);
 rte_spinlock_unlock(&dev->stats_lock);
@@ -1516,6 +1533,23 @@ static int
 netdev_dpdk_vhost_send(struct netdev *netdev, int qid, struct dp_packet **pkts,
  int cnt, bool may_steal)
 {
+struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
+
+/* Do not send anything if device is disabled. */
+if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) {
+int i;
+
+VLOG_WARN_RL(&rl, "error sending Ethernet packet on %s: %s",
+ netdev_get_name(netdev), ovs_strerror(ENONET));
+
+if (may_steal) {
+for (i = 0; i < cnt; i++) {
+dp_packet_delete(pkts[i]);
+}
+}
+return ENONET;
+}
+
 if (OVS_UNLIKELY(pkts[0]->source != DPBUF_DPDK)) {
 int i;
 
@@ -2004,6 +2038,23 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev,
 if (!(dev->flags & NETDEV_UP)) {
 rte_eth_dev_stop(dev->port_id);
 }
+} else {
+/* If DPDK_DEV_VHOST device's NETDEV_UP flag was changed and vhost is
+ * running then change netdev's change_seq to trigger link state
+ * update. */
+struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(dev);
+
+if ((NETDEV_UP & ((*old_flagsp ^ on) | (*old_flagsp ^ off)))
+&& is_vhost_running(virtio_dev)) {
+netdev_change_seq_changed(&dev->up);
+
+/* Clear statistics if device is getting up. */
+if (NETDEV_UP & on) {
+rte_spinlock_lock(&dev->stats_lock);
+memset(&dev->stats, 0x00, sizeof(dev->stats));
+rte_spinlock_unlock(&dev->stats_lock);
+}
+}
 }
 
 return 0;
@@ -2226,6 +2277,7 @@ new_device(struct virtio_net *virtio_dev)
 virtio_dev->flags |= VIRTIO_DEV_RUNNING;
 /* Disable notifications. */
 set_irq_status(virtio_dev);
+netdev_change_seq_changed(&dev->up);
 ovs_mutex_unlock(&dev->mutex);
 break;
 }
@@ -2277,6 +2329,7 @@ destroy_device(volatile struct virtio_net *virtio_dev)
 ovsrcu_set(&dev->virtio_dev, NULL);
 netdev_dpdk_txq_map_clear(dev);
 exists = true;
+netdev_change_seq_changed(&dev->up);
 ovs_mutex_unlock(&dev->mutex);
 break;
 }
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn utilities and daemons.

2016-05-12 Thread Ryan Moats


"dev"  wrote on 05/12/2016 10:23:39 AM:

> From: Gurucharan Shetty 
> To: dev@openvswitch.org
> Cc: Gurucharan Shetty 
> Date: 05/12/2016 10:42 AM
> Subject: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn
> utilities and dameons.
> Sent by: "dev" 
>
> Signed-off-by: Gurucharan Shetty 
> ---
>  tests/automake.mk |4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/tests/automake.mk b/tests/automake.mk
> index a5c6074..211a80d 100644
> --- a/tests/automake.mk
> +++ b/tests/automake.mk
> @@ -152,6 +152,10 @@ check-lcov: all tests/atconfig tests/atlocal $
> (TESTSUITE) $(check_DATA) clean-lc
>  # valgrind support
>
>  valgrind_wrappers = \
> +   tests/valgrind/ovn-controller \
> +   tests/valgrind/ovn-nbctl \
> +   tests/valgrind/ovn-northd \
> +   tests/valgrind/ovn-sbctl \
> tests/valgrind/ovs-appctl \
> tests/valgrind/ovs-ofctl \
> tests/valgrind/ovstest \
> --

This makes a lot of sense to me, we should especially be checking the
daemons...

Acked-by: Ryan Moats 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.

2016-05-12 Thread Darrell Ball
On Thu, May 12, 2016 at 6:03 AM, Guru Shetty  wrote:

>
>
>
> On May 11, 2016, at 10:45 PM, Darrell Ball  wrote:
>
>
>
> On Wed, May 11, 2016 at 8:51 PM, Guru Shetty  wrote:
>
>>
>>
>>
>>
>> > On May 11, 2016, at 8:45 PM, Darrell Ball  wrote:
>> >
>> >> On Wed, May 11, 2016 at 4:42 PM, Guru Shetty  wrote:
>> >>
>> >>
>> >>>
>> >>> Some reasons why having a “transit LS” is “undesirable” is:
>> >>>
>> >>> 1)1)  It creates additional requirements at the CMS layer for
>> setting
>> >>> up networks; i.e. additional programming is required at the OVN
>> northbound
>> >>> interface for the special transit LSs, interactions with the logical
>> >>> router
>> >>> peers.
>> >>
>> >> Agreed that there is additional work needed for the CMS plugin. That
>> work
>> >> is needed even if it is just peering as they need to convert one
>> router in
>> >> to two in OVN (unless OVN automatically makes this split)
>> >
>> > The work to coordinate 2 logical routers and one special LS is more and
>> > also more complicated than
>> > to coordinate 2 logical routers.
>> >
>> >
>> >>
>> >>
>> >>>
>> >>> In cases where some NSX products do this, it is hidden from the user,
>> as
>> >>> one would minimally expect.
>> >>>
>> >>> 2) 2) From OVN POV, it adds an additional OVN datapath to all
>> >>> processing to the packet path and programming/processing for that
>> >>> datapath.
>> >>>
>> >>> because you have
>> >>>
>> >>> R1<->Transit LS<->R2
>> >>>
>> >>> vs
>> >>>
>> >>> R1<->R2
>> >>
>> >> Agreed that there is an additional datapath.
>> >>
>> >>
>> >>>
>> >>> 3) 3) You have to coordinate the transit LS subnet to handle all
>> >>> addresses in this same subnet for all the logical routers and all
>> their
>> >>> transit LS peers.
>> >>
>> >> I don't understand what you mean. If a user uses one gateway, a
>> transit LS
>> >> only gets connected by 2 routers.
>> >> Other routers get their own transit LS.
>> >
>> >
>> > Each group of logical routers communicating has it own Transit LS.
>> >
>> > Take an example with one gateway router and 1000 distributed logical
>> > routers for 1000 tenants/hv,
>> > connecting 1000 HVs for now.
>> > Lets assume each distributed logical router only talks to the gateway
>> > router.
>>
>> That is a wrong assumption. Each tenant has his own gateway router (or
>> more)
>>
>
> Its less of an assumption but more of an example for illustrative
> purposes; but its good that you
> mention it.
>
>
> I think one of the main discussion points was needing thousands of arp
> flows and thousands of subnets, and it was on an incorrect logical
> topology, I am glad that it is not an issue any more.
>

I think you misunderstood - having one or more gateways per tenant does not
make Transit LS better in flow scale.
The size of a Transit LS subnet and management across Transit LSs is one of
the 5 issues I mentioned, and it remains the same,
as do the other issues.

Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one
distributed logical router per tenant
spanning 1000 HVs, one gateway per tenant, we have a Transit LS with 1001
router type logical ports (1000 HVs + one gateway).

Now, based on your previous assertion earlier:
 "If a user uses one gateway, a transit LS only gets connected by 2 routers.
Other routers get their own transit LS."

This translates:
one Transit LS per tenant => 1000 Transit LS datapaths in total
1001 Transit LS logical ports per Transit LS => 1,001,000 Transit LS
logical ports in total
1001 arp resolve flows per Transit LS => 1,001,000 flows just for arp
resolve.
Each Transit LS comes with many other flows: so we multiply that number of
flows * 1000 Transit LSs = ? flows
1001 addresses per subnet per Transit LS; I suggested addresses should be
reused across subnets, but when each subnet is as large as it is with a
Transit LS, and there are 1000 subnet instances to keep context for - one for
each Transit LS - it gets harder to manage.
  These Transit LSs and their subnets will not be identical across
Transit LSs in real scenarios.


We can go to more complex examples and larger HV scale (1 is the later
goal ?) if you wish,
but I think the minimal case is enough to point out the issues.




>
>
> The DR<->GR direct connection approach as well as the transit LS approach
> can re-use private
> IP pools across internal distributed logical routers, which amount to VRF
> contexts for tenants networks.
>
>
>
> The Transit LS approach does not scale due to the large number of
> distributed datapaths required and
> other high special flow requirements. It has more complex and higher
> subnetting requirements. In addition, there is greater complexity for
> northbound management.
>
>
> Okay, now to summarize from my understanding:
> * A transit LS uses one internal subnet to connect multiple GR with one DR
> whereas direct multiple GR to one DR  via peering uses multiple internal
> subnets.
> * A transit LS uses an additional logical datapath (split between 2
> machines v

Re: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn utilities and daemons.

2016-05-12 Thread William Tu
Thanks for adding this, I will re-run the OVN-related valgrind tests.

On Thu, May 12, 2016 at 9:14 AM, Ryan Moats  wrote:

>
>
> "dev"  wrote on 05/12/2016 10:23:39 AM:
>
> > From: Gurucharan Shetty 
> > To: dev@openvswitch.org
> > Cc: Gurucharan Shetty 
> > Date: 05/12/2016 10:42 AM
> > Subject: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn
> > utilities and dameons.
> > Sent by: "dev" 
> >
> > Signed-off-by: Gurucharan Shetty 
> > ---
> >  tests/automake.mk |4 
> >  1 file changed, 4 insertions(+)
> >
> > diff --git a/tests/automake.mk b/tests/automake.mk
> > index a5c6074..211a80d 100644
> > --- a/tests/automake.mk
> > +++ b/tests/automake.mk
> > @@ -152,6 +152,10 @@ check-lcov: all tests/atconfig tests/atlocal $
> > (TESTSUITE) $(check_DATA) clean-lc
> >  # valgrind support
> >
> >  valgrind_wrappers = \
> > +   tests/valgrind/ovn-controller \
> > +   tests/valgrind/ovn-nbctl \
> > +   tests/valgrind/ovn-northd \
> > +   tests/valgrind/ovn-sbctl \
> > tests/valgrind/ovs-appctl \
> > tests/valgrind/ovs-ofctl \
> > tests/valgrind/ovstest \
> > --
>
> This makes a lot of sense to me, we should especially be checking the
> daemons...
>
> Acked-by: Ryan Moats 
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn utilities and daemons.

2016-05-12 Thread Guru Shetty
Thank you William and Ryan. I pushed this to master.

On 12 May 2016 at 09:32, William Tu  wrote:

> Thanks for adding this, I will re-run the OVN-related valgrind tests.
>
> On Thu, May 12, 2016 at 9:14 AM, Ryan Moats  wrote:
>
> >
> >
> > "dev"  wrote on 05/12/2016 10:23:39 AM:
> >
> > > From: Gurucharan Shetty 
> > > To: dev@openvswitch.org
> > > Cc: Gurucharan Shetty 
> > > Date: 05/12/2016 10:42 AM
> > > Subject: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn
> > > utilities and dameons.
> > > Sent by: "dev" 
> > >
> > > Signed-off-by: Gurucharan Shetty 
> > > ---
> > >  tests/automake.mk |4 
> > >  1 file changed, 4 insertions(+)
> > >
> > > diff --git a/tests/automake.mk b/tests/automake.mk
> > > index a5c6074..211a80d 100644
> > > --- a/tests/automake.mk
> > > +++ b/tests/automake.mk
> > > @@ -152,6 +152,10 @@ check-lcov: all tests/atconfig tests/atlocal $
> > > (TESTSUITE) $(check_DATA) clean-lc
> > >  # valgrind support
> > >
> > >  valgrind_wrappers = \
> > > +   tests/valgrind/ovn-controller \
> > > +   tests/valgrind/ovn-nbctl \
> > > +   tests/valgrind/ovn-northd \
> > > +   tests/valgrind/ovn-sbctl \
> > > tests/valgrind/ovs-appctl \
> > > tests/valgrind/ovs-ofctl \
> > > tests/valgrind/ovstest \
> > > --
> >
> > This makes a lot of sense to me, we should especially be checking the
> > daemons...
> >
> > Acked-by: Ryan Moats 
> > ___
> > dev mailing list
> > dev@openvswitch.org
> > http://openvswitch.org/mailman/listinfo/dev
> >
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.

2016-05-12 Thread Guru Shetty
>
>
>>
>> I think one of the main discussion points was needing thousands of arp
>> flows and thousands of subnets, and it was on an incorrect logical
>> topology, I am glad that it is not an issue any more.
>>
>
> I think you misunderstood - having one or more gateway per tenant does not
> make Transit LS better in flow scale.
> The size of a Transit LS subnet and management across Transit LSs is one
> the 5 issues I mentioned and it remains the same
> as do the other issues.
>
> Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one
> distributed logical router per tenant
> spanning 1000 HVs, one gateway per tenant, we have a Transit LS with 1001
> router type logical ports (1000 HVs + one gateway).
>

A transit LS does not have 1001 router-type ports. It has just two. One of
them resides only in the gateway. The other one resides in every
hypervisor. This is the same as a router peer port. A transit LS adds one
extra port per hypervisor, which I have agreed is a disadvantage. If that is
what you mean, then it is right.

>
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v3] Add configurable OpenFlow port name.

2016-05-12 Thread Flavio Leitner
On Wed, May 11, 2016 at 10:13:48AM +0800, Xiao Liang wrote:
> On Wed, May 11, 2016 at 4:31 AM, Flavio Leitner  wrote:
> > On Tue, May 10, 2016 at 10:31:19AM +0800, Xiao Liang wrote:
> >> On Tue, May 10, 2016 at 4:28 AM, Flavio Leitner  wrote:
> >> > On Sat, Apr 23, 2016 at 01:26:17PM +0800, Xiao Liang wrote:
> >> >> Add new column "ofname" in Interface table to configure port name 
> >> >> reported
> >> >> to controllers with OpenFlow protocol, thus decouple OpenFlow port name 
> >> >> from
> >> >> device name.
> >> >>
> >> >> For example:
> >> >> # ovs-vsctl set Interface eth0 ofname=wan
> >> >> # ovs-vsctl set Interface eth1 ofname=lan0
> >> >> then controllers can recognize ports by their names.
> >> >
> >> > This change is nice because now the same setup like a "compute node"
> >> > can use the same logical name to refer to a specific interface that
> >> > could have different netdev name on different HW.
> >> >
> >> > Comments inline.
> >> >
> >> >> Signed-off-by: Xiao Liang 
> >> >> ---
> >> >> v2: Added test for ofname
> >> >> Increased db schema version
> >> >> Updated NEWS
> >> >> v3: Rebase
> >> >> ---
> >> >>  NEWS   |  1 +
> >> >>  lib/db-ctl-base.h  |  2 +-
> >> >>  ofproto/ofproto-provider.h |  1 +
> >> >>  ofproto/ofproto.c  | 67 
> >> >> --
> >> >>  ofproto/ofproto.h  |  9 ++-
> >> >>  tests/ofproto.at   | 60 
> >> >> +
> >> >>  utilities/ovs-vsctl.c  |  1 +
> >> >>  vswitchd/bridge.c  | 10 +--
> >> >>  vswitchd/vswitch.ovsschema |  6 +++--
> >> >>  vswitchd/vswitch.xml   | 14 ++
> >> >>  10 files changed, 163 insertions(+), 8 deletions(-)
> >> >>
> >> >> diff --git a/NEWS b/NEWS
> >> >> index ea7f3a1..156781c 100644
> >> >> --- a/NEWS
> >> >> +++ b/NEWS
> >> >> @@ -15,6 +15,7 @@ Post-v2.5.0
> >> >> now implemented.  Only flow mod and port mod messages are 
> >> >> supported
> >> >> in bundles.
> >> >>   * New OpenFlow extension NXM_NX_MPLS_TTL to provide access to 
> >> >> MPLS TTL.
> >> >> + * Port name can now be set with "ofname" column in the Interface 
> >> >> table.
> >> >> - ovs-ofctl:
> >> >>   * queue-get-config command now allows a queue ID to be specified.
> >> >>   * '--bundle' option can now be used with OpenFlow 1.3.
> >> >> diff --git a/lib/db-ctl-base.h b/lib/db-ctl-base.h
> >> >> index f8f576b..5bd62d5 100644
> >> >> --- a/lib/db-ctl-base.h
> >> >> +++ b/lib/db-ctl-base.h
> >> >> @@ -177,7 +177,7 @@ struct weak_ref_table {
> >> >>  struct cmd_show_table {
> >> >>  const struct ovsdb_idl_table_class *table;
> >> >>  const struct ovsdb_idl_column *name_column;
> >> >> -const struct ovsdb_idl_column *columns[3]; /* Seems like a good 
> >> >> number. */
> >> >> +const struct ovsdb_idl_column *columns[4]; /* Seems like a good 
> >> >> number. */
> >> >>  const struct weak_ref_table wref_table;
> >> >>  };
> >> >>
> >> >> diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h
> >> >> index daa0077..8795242 100644
> >> >> --- a/ofproto/ofproto-provider.h
> >> >> +++ b/ofproto/ofproto-provider.h
> >> >> @@ -84,6 +84,7 @@ struct ofproto {
> >> >>  struct hmap ports;  /* Contains "struct ofport"s. */
> >> >>  struct shash port_by_name;
> >> >>  struct simap ofp_requests;  /* OpenFlow port number requests. */
> >> >> +struct smap ofp_names;  /* OpenFlow port names. */
> >> >>  uint16_t alloc_port_no; /* Last allocated OpenFlow port 
> >> >> number. */
> >> >>  uint16_t max_ports; /* Max possible OpenFlow port num, 
> >> >> plus one. */
> >> >>  struct hmap ofport_usage;   /* Map ofport to last used time. */
> >> >> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c
> >> >> index ff6affd..a2799f4 100644
> >> >> --- a/ofproto/ofproto.c
> >> >> +++ b/ofproto/ofproto.c
> >> >> @@ -550,6 +550,7 @@ ofproto_create(const char *datapath_name, const 
> >> >> char *datapath_type,
> >> >>  hmap_init(&ofproto->ofport_usage);
> >> >>  shash_init(&ofproto->port_by_name);
> >> >>  simap_init(&ofproto->ofp_requests);
> >> >> +smap_init(&ofproto->ofp_names);
> >> >>  ofproto->max_ports = ofp_to_u16(OFPP_MAX);
> >> >>  ofproto->eviction_group_timer = LLONG_MIN;
> >> >>  ofproto->tables = NULL;
> >> >> @@ -1546,6 +1547,7 @@ ofproto_destroy__(struct ofproto *ofproto)
> >> >>  hmap_destroy(&ofproto->ofport_usage);
> >> >>  shash_destroy(&ofproto->port_by_name);
> >> >>  simap_destroy(&ofproto->ofp_requests);
> >> >> +smap_destroy(&ofproto->ofp_names);
> >> >>
> >> >>  OFPROTO_FOR_EACH_TABLE (table, ofproto) {
> >> >>  oftable_destroy(table);
> >> >> @@ -1945,7 +1947,7 @@ ofproto_port_open_type(const char *datapath_type, 
> >> >> const char *port_type)
> >> >>   * 'ofp_portp' is non-null). */
> >> >>  int
> >> >>  ofproto_port_add(struct ofproto *ofpr

Re: [ovs-dev] [PATCH v3 0/2] doc: Refactor DPDK install guide

2016-05-12 Thread Thomas F Herbert

On 5/9/16 2:32 AM, Bhanuprakash Bodireddy wrote:

This patchset refactors the present INSTALL.DPDK.md guide.

The INSTALL guide is split in to two documents named INSTALL.DPDK and
INSTALL.DPDK-ADVANCED. The former document is simplified with emphasis
on installation, basic testcases and targets novice users. Sections on
system configuration, performance tuning, vhost walkthrough are moved
to DPDK-ADVANCED guide.

and IVSHMEM is moved too.

DPDK can be complex to install and configure to optimize OVS for best 
performance but it is relatively easy to set up for simple test cases. 
This patch is the right step to present the install information in 
separate parts. Most users won't need to refer to INSTALL.DPDK-ADVANCED.md.

+1

Reviewers can see these doc changes in rendered form in this fork:
https://github.com/bbodired/ovs/blob/master/INSTALL.DPDK.md
https://github.com/bbodired/ovs/blob/master/INSTALL.DPDK-ADVANCED.md


v1->v2:
- Rebased
- Update DPDK version to 16.04
- Add vsperf section in ADVANCED Guide

v2->v3:
- Rebased

Bhanuprakash Bodireddy (2):
   doc: Refactor DPDK install documentation
   doc: Refactor DPDK install guide, add ADVANCED doc

  INSTALL.DPDK-ADVANCED.md |  809 +++
  INSTALL.DPDK.md  | 1193 +-
  2 files changed, 1140 insertions(+), 862 deletions(-)
  create mode 100644 INSTALL.DPDK-ADVANCED.md



___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation

2016-05-12 Thread Thomas F Herbert

On 5/9/16 2:32 AM, Bhanuprakash Bodireddy wrote:

Refactor the INSTALL.DPDK in to two documents named INSTALL.DPDK and
INSTALL.DPDK-ADVANCED. While INSTALL.DPDK document shall facilitate the
novice user in setting up the OVS DPDK and running it out of box, the
ADVANCED document is targeted at expert users looking for the optimum
performance running dpdk datapath.

This commit updates INSTALL.DPDK.md document.

Signed-off-by: Bhanuprakash Bodireddy 
---
  INSTALL.DPDK.md | 1193 +++
  1 file changed, 331 insertions(+), 862 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 93f92e4..bf646bf 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -1,1001 +1,470 @@
-Using Open vSwitch with DPDK
-
+OVS DPDK INSTALL GUIDE
+

-Open vSwitch can use Intel(R) DPDK lib to operate entirely in
-userspace. This file explains how to install and use Open vSwitch in
-such a mode.
+## Contents

-The DPDK support of Open vSwitch is considered experimental.
-It has not been thoroughly tested.
+1. [Overview](#overview)
+2. [Building and Installation](#build)
+3. [Setup OVS DPDK datapath](#ovssetup)
I wonder if the following 3 sections should be in the advanced guide, with a
note here to refer to the advanced guide for configuration in the VM,
testcases and limitations?

+4. [DPDK in the VM](#builddpdk)
+5. [OVS Testcases](#ovstc)
+6. [Limitations ](#ovslimits)

-This version of Open vSwitch should be built manually with `configure`
-and `make`.
+##  1. Overview

-OVS needs a system with 1GB hugepages support.
+Open vSwitch can use DPDK lib to operate entirely in userspace.
+This file provides information on installation and use of Open vSwitch
+using DPDK datapath.  This version of Open vSwitch should be built manually
+with `configure` and `make`.

-Building and Installing:
-
+The DPDK support of Open vSwitch is considered 'experimental'.

Isn't it time to remove this statement and not just put the word in quotes?


-Required: DPDK 16.04
-Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
-on Debian/Ubuntu)
+### Prerequisites

-1. Configure build & install DPDK:
-  1. Set `$DPDK_DIR`
+* Required: DPDK 16.04
+* Hardware: [DPDK Supported NICs] when physical ports in use

- ```
- export DPDK_DIR=/usr/src/dpdk-16.04
- cd $DPDK_DIR
- ```
-
-  2. Then run `make install` to build and install the library.
- For default install without IVSHMEM:
-
- `make install T=x86_64-native-linuxapp-gcc DESTDIR=install`
-
- To include IVSHMEM (shared memory):
-
- `make install T=x86_64-ivshmem-linuxapp-gcc DESTDIR=install`
-
- For further details refer to http://dpdk.org/
-
-2. Configure & build the Linux kernel:
-
-   Refer to intel-dpdk-getting-started-guide.pdf for understanding
-   DPDK kernel requirement.
-
-3. Configure & build OVS:
-
-   * Non IVSHMEM:
-
- `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/`
-
-   * IVSHMEM:
-
- `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/`
-
-   ```
-   cd $(OVS_DIR)/
-   ./boot.sh
-   ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-align"]
-   make
-   ```
-
-   Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress DPDK 
cast-align warnings.
-
-To have better performance one can enable aggressive compiler optimizations and
-use the special instructions(popcnt, crc32) that may not be available on all
-machines. Instead of typing `make`, type:
-
-`make CFLAGS='-O3 -march=native'`
-
-Refer to [INSTALL.userspace.md] for general requirements of building userspace 
OVS.
-
-Using the DPDK with ovs-vswitchd:
--
-
-1. Setup system boot
-   Add the following options to the kernel bootline:
-
-   `default_hugepagesz=1GB hugepagesz=1G hugepages=1`
-
-2. Setup DPDK devices:
-
-   DPDK devices can be setup using either the VFIO (for DPDK 1.7+) or UIO
-   modules. UIO requires inserting an out of tree driver igb_uio.ko that is
-   available in DPDK. Setup for both methods are described below.
-
-   * UIO:
- 1. insert uio.ko: `modprobe uio`
- 2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko`
- 3. Bind network device to igb_uio:
- `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1`
-
-   * VFIO:
-
- VFIO needs to be supported in the kernel and the BIOS. More information
- can be found in the [DPDK Linux GSG].
-
- 1. Insert vfio-pci.ko: `modprobe vfio-pci`
- 2. Set correct permissions on vfio device: `sudo /usr/bin/chmod a+x 
/dev/vfio`
-and: `sudo /usr/bin/chmod 0666 /dev/vfio/*`
- 3. Bind network device to vfio-pci:
-`$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`
-
-3. Mount the hugetable filesystem
-
-   `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`
-
-   Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.
-
-4. Follow the instr

Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.

2016-05-12 Thread Guru Shetty
>
>
> I think you misunderstood - having one or more gateway per tenant does not
> make Transit LS better in flow scale.
> The size of a Transit LS subnet and management across Transit LSs is one
> the 5 issues I mentioned and it remains the same
> as do the other issues.
>
> Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one
> distributed logical router per tenant
> spanning 1000 HVs, one gateway per tenant, we have a Transit LS with 1001
> router type logical ports (1000 HVs + one gateway).
>
> Now, based on your previous assertion earlier:
>  "If a user uses one gateway, a transit LS only gets connected by 2
> routers.
> Other routers get their own transit LS."
>
> This translates:
> one Transit LS per tenant => 1000 Transit LS datapaths in total
> 1001 Transit LS logical ports per Transit LS => 1,001,000 Transit LS
> logical ports in total
> 1001 arp resolve flows per Transit LS => 1,001,000 flows just for arp
> resolve.
> Each Transit LS comes with many other flows: so we multiply that number of
> flows * 1000 Transit LSs = ? flows
> 1001 addresses per subnet per Transit LS; I suggested addresses should be
> reused across subnets, but when each subnet is large
>

Re-reading. The above is a wrong conclusion making me believe that there is
a big disconnect. A subnet in transit LS has only 2 IP addresses (if it is
only one physical gateway). Every additional physical gateway can add one
additional IP address to the subnet (depending on whether the new physical
gateway has a gateway router added for that logical topology.).
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add ADVANCED doc

2016-05-12 Thread Thomas F Herbert

On 5/9/16 2:32 AM, Bhanuprakash Bodireddy wrote:

Add INSTALL.DPDK-ADVANCED document that is forked off from original
INSTALL.DPDK guide. This document is targeted at users looking for
optimum performance on OVS using dpdk datapath.

Thanks for this effort.


Signed-off-by: Bhanuprakash Bodireddy 
---
  INSTALL.DPDK-ADVANCED.md | 809 +++
  1 file changed, 809 insertions(+)
  create mode 100644 INSTALL.DPDK-ADVANCED.md

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
new file mode 100644
index 000..dd09d36
--- /dev/null
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -0,0 +1,809 @@
+OVS DPDK ADVANCED INSTALL GUIDE
+=
+
+## Contents
+
+1. [Overview](#overview)
+2. [Building Shared Library](#build)
+3. [System configuration](#sysconf)
+4. [Performance Tuning](#perftune)
+5. [OVS Testcases](#ovstc)
+6. [Vhost Walkthrough](#vhost)
+7. [QOS](#qos)
+8. [Static Code Analysis](#staticanalyzer)
+9. [Vsperf](#vsperf)
+
+##  1. Overview
+
+The Advanced Install Guide explains how to improve OVS performance using
+DPDK datapath. This guide also provides information on tuning, system 
configuration,
+troubleshooting, static code analysis and testcases.
+
+##  2. Building Shared Library
+
+DPDK can be built as static or shared library and shall be linked by 
applications
+using DPDK datapath. The section lists steps to build shared library and 
dynamically
+link DPDK against OVS.
+
+Note: Minor performance loss is seen with OVS when using shared DPDK library as
+compared to static library.
+
+Check section 2.2, 2.3 of INSTALL.DPDK on download instructions
+for DPDK and OVS.
+
+  * Configure the DPDK library
+
+  Set `CONFIG_RTE_BUILD_SHARED_LIB=y` in `config/common_base`
+  to generate shared DPDK library
+
+
+  * Build and install DPDK
+
+For Default install (without IVSHMEM), set `export 
DPDK_TARGET=x86_64-native-linuxapp-gcc`
+For IVSHMEM case, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`
+
+```
+export DPDK_DIR=/usr/src/dpdk-16.04
+export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET
+make install T=$DPDK_TARGET DESTDIR=install
+```
+
+  * Build, Install and Setup OVS.
+
+  Export the DPDK shared library location and setup OVS as listed in
+  section 3.3 of INSTALL.DPDK.
+
+  `export LD_LIBRARY_PATH=$DPDK_DIR/x86_64-native-linuxapp-gcc/lib`
+
+##  3. System Configuration
+
+To achieve optimal OVS performance, the system can be configured and that 
includes
+BIOS tweaks, Grub cmdline additions, better understanding of NUMA nodes and
+apt selection of PCIe slots for NIC placement.
+
+### 3.1 Recommended BIOS settings
+
+  ```
+  | Settings                  | values      | comments
+  |---------------------------|-------------|---------
+  | C3 power state            | Disabled    | -
+  | C6 power state            | Disabled    | -
+  | MLC Streamer              | Enabled     | -
+  | MLC Spacial prefetcher    | Enabled     | -
+  | DCU Data prefetcher       | Enabled     | -
+  | DCA                       | Enabled     | -
+  | CPU power and performance | Performance | -
+  | Memory RAS and perf       |             | -
+    config-> NUMA optimized   | Enabled     | -
+  ```
+
+### 3.2 PCIe Slot Selection
+
+The fastpath performance also depends on factors like the NIC placement,
+Channel speeds between PCIe slot and CPU, proximity of PCIe slot to the CPU
+cores running DPDK application. Listed below are the steps to identify
+right PCIe slot.
+
+- Retrieve host details using cmd `dmidecode -t baseboard | grep "Product 
Name"`
+- Download the technical specification for Product listed eg: S2600WT2.
+- Check the Product Architecture Overview on the Riser slot placement,
+  CPU sharing info and also PCIe channel speeds.
+
+  example: On S2600WT, CPU1 and CPU2 share Riser Slot 1 with Channel speed 
between
+  CPU1 and Riser Slot1 at 32GB/s, CPU2 and Riser Slot1 at 16GB/s. Running DPDK 
app
+  on CPU1 cores and NIC inserted in to Riser card Slots will optimize OVS 
performance
+  in this case.
+
+- Check the Riser Card #1 - Root Port mapping information, on the available 
slots
+  and individual bus speeds. In S2600WT slot 1, slot 2 has high bus speeds and 
are
+  potential slots for NIC placement.
+
+### 3.3 Setup Hugepages

Advanced Hugepage setup.

+
Basic huge page setup for 2MB huge pages is covered in INSTALL.DPDK.md. 
This section

+  1. Allocate Huge pages
+
+ For persistent allocation of huge pages, add the following options to the 
kernel bootline
+ - 2MB huge pages:
+
+   Add `hugepages=N`
+
+ - 1G huge pages:
+
+   Add `default_hugepagesz=1GB hugepagesz=1G hugepages=N`
+
+   For platforms supporting multiple huge page sizes, Add options
+
+   `default_hugepagesz= hugepagesz= hugepages=N`
+   where 'N' = Number of huge pages requested, 'size' = huge page size,
+   optional suffix [kKmMgG]
+
+For run-time allocation of huge pages
+
+ - 2MB huge pages:
+
+   `echo N > /proc/sys/vm

Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling performance using DPDK Rx checksum offloading feature.

2016-05-12 Thread pravin shelar
On Tue, May 10, 2016 at 6:31 PM, Jesse Gross  wrote:
> On Tue, May 10, 2016 at 3:26 AM, Chandran, Sugesh
>  wrote:
>>> -Original Message-
>>> From: Jesse Gross [mailto:je...@kernel.org]
>>> Sent: Friday, May 6, 2016 5:00 PM
>>> To: Chandran, Sugesh 
>>> Cc: pravin shelar ; ovs dev 
>>> Subject: Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling
>>> performance using DPDK Rx checksum offloading feature.
>>>
>>> On Fri, May 6, 2016 at 1:13 AM, Chandran, Sugesh
>>>  wrote:
>>> >> -Original Message-
>>> >> From: Jesse Gross [mailto:je...@kernel.org]
>>> >> Sent: Friday, May 6, 2016 1:58 AM
>>> >> To: Chandran, Sugesh 
>>> >> Cc: pravin shelar ; ovs dev 
>>> >> Subject: Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling
>>> >> performance using DPDK Rx checksum offloading feature.
>>> >>
>>> >> On Thu, May 5, 2016 at 1:26 AM, Chandran, Sugesh
>>> >>  wrote:
>>> >> >> -Original Message-
>>> >> >> From: Jesse Gross [mailto:je...@kernel.org]
>>> >> >> Sent: Wednesday, May 4, 2016 10:06 PM
>>> >> >> To: Chandran, Sugesh 
>>> >> >> Cc: pravin shelar ; ovs dev
>>> 
>>> >> >> Subject: Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling
>>> >> >> performance using DPDK Rx checksum offloading feature.
>>> >> >>
>>> >> >> On Wed, May 4, 2016 at 8:58 AM, Chandran, Sugesh
>>> >> >>  wrote:
>>> >> >> >> -Original Message-
>>> >> >> >> From: Jesse Gross [mailto:je...@kernel.org]
>>> >> >> >> Sent: Thursday, April 28, 2016 4:41 PM
>>> >> >> >> To: Chandran, Sugesh 
>>> >> >> >> Cc: pravin shelar ; ovs dev
>>> >> 
>>> >> >> >> Subject: Re: [ovs-dev] [PATCH v2] tunneling: Improving
>>> >> >> >> tunneling performance using DPDK Rx checksum offloading feature.
>>> >> >> >
>>> >> >> >> That sounds great, thanks for following up. In the meantime, do
>>> >> >> >> you have any plans for transmit side checksum offloading?
>>> >> >> > [Sugesh] The vectorization on Tx side is getting disabled when
>>> >> >> > DPDK Tx
>>> >> >> checksum offload is enabled. This causes performance drop in OVS.
>>> >> >> > However We don’t find any such impact when enabling Rx checksum
>>> >> >> offloading(though this disables Rx vectorization).
>>> >> >>
>>> >> >> OK, I see. Does the drop in throughput cause performance to go
>>> >> >> below the baseline even for UDP tunnels with checksum traffic? (I
>>> >> >> guess small and large packets might have different results here.)
>>> >> >> Or is it that it reduce performance for unrelated traffic? If it's
>>> >> >> the latter case, can we find a way to use offloading conditionally?
>>> >> > [Sugesh] We tested for 64 byte UDP packet stream and found that the
>>> >> > performance is better when the offloading is turned off. This is
>>> >> > for any
>>> >> traffic through the port.
>>> >> > DPDK doesn’t support conditional offloading for now.
>>> >> > In other words DPDK can't do selective vector packet processing on a
>>> port.
>>> >> > As far as I know there are some technical difficulties to enable
>>> >> > offload + vectorization together in DPDK.
>>> >>
>>> >> My guess is the results might be different for larger packets since
>>> >> those cases will stress checksumming more and rx/tx routines less.
>>> >>
>>> >> In any case, I think this is an area that is worthwhile to continue
>>> investigating.
>>> >> My expectation is that tunneled packets with outer UDP checksums will
>>> >> be a use case that is hit increasingly frequently with OVS DPDK - for
>>> >> example, OVN will likely start exercising this soon.
>>> > [Sugesh]Totally agreed, I will do PHY-PHY, PHY-TUNNEL-PHY tests with
>>> > different size traffic
>>> > streams(64 Byte, 512, 1024, 1500) when checksum enabled/disabled and
>>> see the impact.
>>> > Is there any other traffic pattern/tests that we have to consider?
>>>
>>> I think that should cover it pretty well. Thanks a lot!
>> [Sugesh] Please find below for the test results in different scenarios.
>>
>> Native (Rx, Tx checksum offloading OFF)
>>
>> Test               64 Bytes   128 Bytes   256 Bytes   512 Bytes   1500 bytes   Mix
>> PHY-PHY-BIDIR      9.2        8.445       4.528       2.349       0.822        6.205
>> PHY-VM-PHY-BIDIR   2.564      2.503       2.205       1.901       0.822        2.29
>> PHY-VXLAN-PHY      4.165      4.084       3.834       2.147       0.849        3.964
>>
>>
>> Rx Checksum ON/Rx Vector OFF
>>
>> Test               64 Bytes   128 Bytes   256 Bytes   512 Bytes   1500 bytes   Mix
>> PHY-PHY-BIDIR      9.12       8.445       4.528       2.349       0.822        6.205
>> PHY-VM-PHY-BIDIR   2.535      2.513       2.21        1.913       0.822        2.25
>> PHY-VXLAN-PHY      4.475      4.47        3.834       2.147       0.849        4.4
>>
>>
>>
>> Tx Checksum ON/Tx Vector OFF
>>
>> Test64 

Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling performance using DPDK Rx checksum offloading feature.

2016-05-12 Thread Jesse Gross
On Thu, May 12, 2016 at 11:18 AM, pravin shelar  wrote:
> On Tue, May 10, 2016 at 6:31 PM, Jesse Gross  wrote:
>> I'm a little bit torn as to whether we should apply your rx checksum
>> offload patch in the meantime while we wait for DPDK to offer the new
>> API. It looks like we'll have a 10% gain with tunneling in exchange
>> for a 1% loss in other situations, so the call obviously depends on
>> use case. Pravin, Daniele, others, any opinions?
>>
> There could be a way around the API issue and avoid the 1% loss.
> netdev API could be changed to set packet->mbuf.ol_flags to
> (PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD) if the netdev
> implementation does not support rx checksum offload. Then there is no
> need to check the rx checksum flags in dpif-netdev. And the checksum
> can be directly checked in tunneling code where we actually need to.
> Is there any issue with this approach?

I think that's probably a little bit cleaner overall though I don't
think that it totally eliminates the overhead. Not all DPDK ports will
support checksum offload (since the hardware may not do it in theory)
so we'll still need to check the port status on each packet to
initialize the flags.

The other thing that is a little concerning is that there might be
conditions where a driver doesn't actually verify the checksum. I
guess most of these aren't supported in our tunneling implementation
(IP options comes to mind) but it's a little risky.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling performance using DPDK Rx checksum offloading feature.

2016-05-12 Thread pravin shelar
On Thu, May 12, 2016 at 12:59 PM, Jesse Gross  wrote:
> On Thu, May 12, 2016 at 11:18 AM, pravin shelar  wrote:
>> On Tue, May 10, 2016 at 6:31 PM, Jesse Gross  wrote:
>>> I'm a little bit torn as to whether we should apply your rx checksum
>>> offload patch in the meantime while we wait for DPDK to offer the new
>>> API. It looks like we'll have a 10% gain with tunneling in exchange
>>> for a 1% loss in other situations, so the call obviously depends on
>>> use case. Pravin, Daniele, others, any opinions?
>>>
>> There could be a way around the API issue and avoid the 1% loss.
>> netdev API could be changed to set packet->mbuf.ol_flags to
>> (PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD) if the netdev
>> implementation does not support rx checksum offload. Then there is no
>> need to check the rx checksum flags in dpif-netdev. And the checksum
>> can be directly checked in tunneling code where we actually need to.
>> Is there any issue with this approach?
>
> I think that's probably a little bit cleaner overall though I don't
> think that it totally eliminates the overhead. Not all DPDK ports will
> support checksum offload (since the hardware may not do it in theory)
> so we'll still need to check the port status on each packet to
> initialize the flags.
>
I was thinking of changing dpdk packet object constructor
(ovs_rte_pktmbuf_init()) to initialize the flag according to the
device offload support. This way there should not be any checks needed
in packet receive path.
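
To make the idea concrete, a minimal sketch of such a constructor (illustration
only, not the actual patch: the function name and the 'rx_csum_offload_supported'
flag are placeholders, and a real change would still have to deal with PMD rx
routines that overwrite ol_flags, as noted further down the thread):

    #include <stdbool.h>
    #include <rte_mbuf.h>
    #include <rte_mempool.h>

    static bool rx_csum_offload_supported;  /* placeholder for a per-device capability check */

    /* Sketch: mark packets from devices without RX checksum offload as
     * "checksum not verified" at mbuf-constructor time, so the fast path
     * does not need a per-packet capability check and the tunnel receive
     * code can checksum in software only when these bits are set. */
    static void
    example_pktmbuf_init(struct rte_mempool *mp, void *opaque_arg,
                         void *_m, unsigned i)
    {
        struct rte_mbuf *m = _m;

        rte_pktmbuf_init(mp, opaque_arg, _m, i);   /* standard DPDK mbuf init */

        if (!rx_csum_offload_supported) {
            m->ol_flags |= PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD;
        }
    }

Whether a per-device capability fits naturally at constructor time is part of
what would need working out.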


> The other thing that is a little concerning is that there might be
> conditions where a driver doesn't actually verify the checksum. I
> guess most of these aren't supported in our tunneling implementation
> (IP options comes to mind) but it's a little risky.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/5] ovn: Introduce l3 gateway router.

2016-05-12 Thread Mickey Spiegel
See comments inline.

>To: dev@openvswitch.org
>From: Gurucharan Shetty 
>Sent by: "dev" 
>Date: 05/10/2016 08:10PM
>Cc: Gurucharan Shetty 
>Subject: [ovs-dev] [PATCH 2/5] ovn: Introduce l3 gateway router.
>
>Currently OVN has distributed switches and routers. When a packet
>exits a container or a VM, the entire lifecycle of the packet
>through multiple switches and routers are calculated in source
>chassis itself. When the destination endpoint resides on a different
>chassis, the packet is sent to the other chassis and it only goes
>through the egress pipeline of that chassis once and eventually to
>the real destination.
>
>When the packet returns back, the same thing happens. The return
>packet leaves the VM/container on the chassis where it resides.
>The packet goes through all the switches and routers in the logical
>pipleline on that chassis and then sent to the eventual destination
>over the tunnel.
>
>The above makes the logical pipeline very flexible and easy. But,
>creates a problem for cases where you need to add stateful services
>(via conntrack) on switches and routers.

Completely agree up to this point.

>For l3 gateways, we plan to leverage DNAT and SNAT functionality
>and we want to apply DNAT and SNAT rules on a router. So we ideally need
>the packet to go through that router in both directions in the same
>chassis. To achieve this, this commit introduces a new gateway router which is
>static and can be connected to your distributed router via a switch.

Completely agree that you need to go through a common point in both directions
in the same chassis.

Why does this require a separate gateway router?
Why can't it just be a centralized gateway router port on an otherwise 
distributed
router?

Looking at the logic for ports on remote chassis in physical.c, I see no reason 
why
that logic cannot work for logical router datapaths just like it works for 
logical
switch datapaths. On logical switches, some ports are distributed and run
everywhere, e.g. localnet, and other ports run on a specific chassis, e.g. vif 
and
your proposed "gateway" port.
Am I missing something that prevents a mix of centralized and distributed ports
on a logical router datapath?

We have not tried it yet, but it seems like this would simplify things a lot:
1. Only one router needs to be provisioned rather than a distributed router and 
a
centralized gateway router
2. No need for static routes between the distributed and centralized gateway 
routers
3. No need for transit logical switches, transit subnets, or transit flows
4. Less passes through datapaths, improving performance

You can then pin DNAT and SNAT logic to the centralized gateway port, for 
traffic to
physical networks. East/west traffic to floating IPs still requires additional 
logic on
other ports, as proposed in Chandra's floating IP patch.

We want to get to a point where SNAT traffic goes through a centralized gateway
port, but DNAT traffic goes through a distributed patch port. This would achieve
parity with the OpenStack ML2 OVS DVR reference implementation, in terms of 
traffic
subsets that are centralized versus distributed.

>To make minimal changes in OVN's logical pipeline, this commit
>tries to make the switch port connected to a l3 gateway router look like
>a container/VM endpoint for every other chassis except the chassis
>on which the l3 gateway router resides. On the chassis where the
>gateway router resides, the connection looks just like a patch port.

Completely agree that this is the right way to go.

>This is achieved by the doing the following:
>Introduces a new type of port_binding record called 'gateway'.
>On the chassis where the gateway router resides, this port behaves just
>like the port of type 'patch'. The ovn-controller on that chassis
>populates the "chassis" column for this record as an indication for
>other ovn-controllers of its physical location. Other ovn-controllers
>treat this port as they would treat a VM/Container port on a different
>chassis.

I like this logic. My only concern is whether the logical switch port for this
functionality should really be called 'gateway', since this may get confused
with L2 gateway. Some possibilities: 'patch2l3gateway', 'localpatch',
'chassispatch'.

Holding off on more specific comments until we resolve the big picture stuff.

Mickey



___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/5] ovn: Introduce l3 gateway router.

2016-05-12 Thread Guru Shetty
>
>
>
> Completely agree that you need to go through a common point in both
> directions
> in the same chassis.
>

> Why does this require a separate gateway router?
>
The primary reason to choose a separate gateway router was to support
multiple physical gateways for k8s to which you can load-balance your
traffic from the external world, i.e. you will have a router in each physical
gateway with its own floating IP per service. From the external world, you can
load-balance traffic to your gateways. The floating IP is further
load-balanced to an internal workload.



> Why can't it just be a centralized gateway router port on an otherwise
> distributed
> router?
>
It is indeed one of the solutions for my problem statement (provided you
can support multiple physical gateway chassis). I haven't spent too much
time thinking about how to do this for multiple physical gateways.


>
> Looking at the logic for ports on remote chassis in physical.c, I see no
> reason why
> that logic cannot work for logical router datapaths just like it works for
> logical
> switch datapaths. On logical switches, some ports are distributed and run
> everywhere, e.g. localnet, and other ports run on a specific chassis, e.g.
> vif and
> your proposed "gateway" port.
> Am I missing something that prevents a mix of centralized and distributed
> ports
> on a logical router datapath?
>

You will have to give me some more details (I am currently unable to
visualize your solution). Maybe start with a simple topology of one DR
connected to two LS. Simple packet walkthrough (in English) for north-south
(external to internal via floating IPs) and its return traffic (going
through conntrack), south-north traffic (and its return traffic) and
east-west (via central gateway).  My thinking is this:

If we want to do NAT in a router, then we need to have an ingress pipeline
as well as an egress pipeline. A router has multiple ports. When a packet
comes in on any router port, I want to be able to do DNAT (and reverse its
effect), and when a packet exits any port, I want to be able to do SNAT. I
also should be able to do both DNAT and SNAT on a single packet (to handle
north-south loadbalancing). So the entire router should be there at a
single location.






>
> We have not tried it yet, but it seems like this would simplify things a
> lot:
> 1. Only one router needs to be provisioned rather than a distributed
> router and a
> centralized gateway router
> 2. No need for static routes between the distributed and centralized
> gateway routers
> 3. No need for transit logical switches, transit subnets, or transit flows
> 4. Less passes through datapaths, improving performance
>

The above is ideal.


>
> You can then pin DNAT and SNAT logic to the centralized gateway port, for
> traffic to
> physical networks. East/west traffic to floating IPs still requires
> additional logic on
> other ports, as proposed in Chandra's floating IP patch.
>
> We want to get to a point where SNAT traffic goes through a centralized
> gateway
> port, but DNAT traffic goes through a distributed patch port.

Please tell me what DNAT means and what SNAT means for you. I may
be talking about the opposite thing from you.

dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.

2016-05-12 Thread Darrell Ball
On Thu, May 12, 2016 at 10:54 AM, Guru Shetty  wrote:

>
>> I think you misunderstood - having one or more gateway per tenant does
>> not make Transit LS better in flow scale.
>> The size of a Transit LS subnet and management across Transit LSs is one
>> the 5 issues I mentioned and it remains the same
>> as do the other issues.
>>
>> Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one
>> distributed logical router per tenant
>> spanning 1000 HVs, one gateway per tenant, we have a Transit LS with 1001
>> router type logical ports (1000 HVs + one gateway).
>>
>> Now, based on your previous assertion earlier:
>>  "If a user uses one gateway, a transit LS only gets connected by 2
>> routers.
>> Other routers get their own transit LS."
>>
>> This translates:
>> one Transit LS per tenant => 1000 Transit LS datapaths in total
>> 1001 Transit LS logical ports per Transit LS => 1,001,000 Transit LS
>> logical ports in total
>> 1001 arp resolve flows per Transit LS => 1,001,000 flows just for arp
>> resolve.
>> Each Transit LS comes with many other flows: so we multiply that number
>> of flows * 1000 Transit LSs = ? flows
>> 1001 addresses per subnet per Transit LS; I suggested addresses should be
>> reused across subnets, but when each subnet is large
>>
>
> Re-reading. The above is a wrong conclusion making me believe that there
> is a big disconnect. A subnet in transit LS has only 2 IP addresses (if it
> is only one physical gateway). Every additional physical gateway can add
> one additional IP address to the subnet (depending on whether the new
> physical gateway has a gateway router added for that logical topology.).
>


With respect to the IP address usage, I think a diagram would help,
especially for the K8 case,
which I had heard in other conversations may have a separate gateway on
every HV. Hence, I would like to know what that means - i.e. were you
thinking to run separate gateway routers on every HV for K8?

With respect to the other questions, I think the best approach would be to
ask direct questions so those
direct questions get answered.

1) With 1000 HVs, 1000 HVs/tenant, 1 distributed router per tenant, you
choose the number of gateways/tenant:

a) How many Transit LS distributed datapaths are expected in total ?

b) How many Transit LS logical ports are needed at the HV level  ?

what I mean by that is: let's say we have one additional logical port at
the northd level and 1000 HVs; if we need to download that port to 1000
HVs, I consider that to be 1000 logical ports at the HV level, because
downloading and maintaining state across HVs at scale is more expensive
than for a single hypervisor.

c) How many Transit LS arp resolve entries at the HV level ?

what I mean by that is: let's say we have one additional arp resolve flow at
the northd level and 1000 HVs; if we need to download that arp resolve flow
to 1000 HVs, I consider that to be 1000 flows at the HV level, because
downloading and maintaining state across multiple HVs is more expensive
than for a single hypervisor.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.

2016-05-12 Thread Guru Shetty
On 12 May 2016 at 16:34, Darrell Ball  wrote:

> On Thu, May 12, 2016 at 10:54 AM, Guru Shetty  wrote:
>
> >
> >> I think you misunderstood - having one or more gateway per tenant does
> >> not make Transit LS better in flow scale.
> >> The size of a Transit LS subnet and management across Transit LSs is one
> >> the 5 issues I mentioned and it remains the same
> >> as do the other issues.
> >>
> >> Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one
> >> distributed logical router per tenant
> >> spanning 1000 HVs, one gateway per tenant, we have a Transit LS with
> 1001
> >> router type logical ports (1000 HVs + one gateway).
> >>
> >> Now, based on your previous assertion earlier:
> >>  "If a user uses one gateway, a transit LS only gets connected by 2
> >> routers.
> >> Other routers get their own transit LS."
> >>
> >> This translates:
> >> one Transit LS per tenant => 1000 Transit LS datapaths in total
> >> 1001 Transit LS logical ports per Transit LS => 1,001,000 Transit LS
> >> logical ports in total
> >> 1001 arp resolve flows per Transit LS => 1,001,000 flows just for arp
> >> resolve.
> >> Each Transit LS comes with many other flows: so we multiply that number
> >> of flows * 1000 Transit LSs = ? flows
> >> 1001 addresses per subnet per Transit LS; I suggested addresses should
> be
> >> reused across subnets, but when each subnet is large
> >>
> >
> > Re-reading. The above is a wrong conclusion making me believe that there
> > is a big disconnect. A subnet in transit LS has only 2 IP addresses (if
> it
> > is only one physical gateway). Every additional physical gateway can add
> > one additional IP address to the subnet (depending on whether the new
> > physical gateway has a gateway router added for that logical topology.).
> >
>
>
> With respect to the IP address usage. I think a diagram would help
> especially the K8 case,
>
Drawing a diagram here is not feasible. Happy to do it on a whiteboard
though.


> which I had heard in other conversations may have a separate gateway on
> every HV ?. Hence, I would like to know what that means - i.e. were you
> thinking to run separate gateways routers on every HV for K8 ?
>
Yes, that's the plan (as many as possible). 100 routers is a target. Not an HV,
but a VM.



>
> With respect to the other questions, I think its best approach would be to
> ask direct questions so those
> direct questions get answered.
>
> 1) With 1000 HVs, 1000 HVs/tenant, 1 distributed router per tenant, you
> choose the number of gateways/tenant:
>
> a) How many Transit LS distributed datapaths are expected in total ?
>

One (i.e the same as the distributed router).


>
> b) How many Transit LS logical ports are needed at the HV level  ?
>
> what I mean by that is lets say we have one additional logical port at
> northd level and 1000 HVs then if we need to download that port to 1000
> HVs, I consider that to be 1000 logical ports at the HV level because
> downloading and maintaining state across HVs at scale is more expensive
> than for a single hypervisor.
>

1000 additional ones. It is the same as your distributed logical switch or
logical router (this is the case even with the peer routers)



>
> c) How many Transit LS arp resolve entries at the HV level ?
>
> what I mean by that is lets say we have one additional arp resolve flow at
> northd level and 1000 HVs then if we need to download that arp resolve flow
> to 1000 HVs, I consider that to be 1000 flows at the HV level because
> downloading and maintaining state across multiple HVs is more expensive
> that a single hypervisor.
>

2 ARP flows per transit LS * 1000 HVs. Do realize that a single bridge on a
single hypervisor typically has flows in the 100,000 range. Even a million
is feasible. Microsegmentation use cases have 1 ACLs per logical switch.
So that is 1 * 1000 for your case from a single LS. So do you have some
comparative perspective?



dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] netdev-dpdk: Fix locking during get_stats.

2016-05-12 Thread Daniele Di Proietto
Thanks for fixing this!

Acked-by: Daniele Di Proietto 

2016-05-10 15:50 GMT-07:00 Joe Stringer :

> Clang complains:
> lib/netdev-dpdk.c:1860:1: error: mutex 'dev->mutex' is not locked on every
> path
>   through here [-Werror,-Wthread-safety-analysis]
> }
> ^
> lib/netdev-dpdk.c:1815:5: note: mutex acquired here
> ovs_mutex_lock(&dev->mutex);
> ^
> ./include/openvswitch/thread.h:60:9: note: expanded from macro
> 'ovs_mutex_lock'
> ovs_mutex_lock_at(mutex, OVS_SOURCE_LOCATOR)
> ^
>
> Fixes: d6e3feb57c44 ("Add support for extended netdev statistics based on
> RFC 2819.")
> Signed-off-by: Joe Stringer 
> ---
>  lib/netdev-dpdk.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index af86d194f9bb..87879d5c6e4d 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -1819,6 +1819,7 @@ netdev_dpdk_get_stats(const struct netdev *netdev,
> struct netdev_stats *stats)
>
>  if (rte_eth_stats_get(dev->port_id, &rte_stats)) {
>  VLOG_ERR("Can't get ETH statistics for port: %i.", dev->port_id);
> +ovs_mutex_unlock(&dev->mutex);
>  return EPROTO;
>  }
>
> --
> 2.1.4
>
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
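
As a general note on this class of bug, routing every exit through a single
unlock label keeps all paths balanced and tends to satisfy clang's
thread-safety analysis; a rough sketch with placeholder names (not the real
netdev-dpdk structures):

    #include <errno.h>
    #include "openvswitch/thread.h"   /* ovs_mutex_lock()/ovs_mutex_unlock() */

    struct example_dev {              /* placeholder for the real device struct */
        struct ovs_mutex mutex;
        int hw_error;                 /* placeholder for the rte_eth_stats_get() result */
    };

    static int
    example_get_stats(struct example_dev *dev)
    {
        int error = 0;

        ovs_mutex_lock(&dev->mutex);
        if (dev->hw_error) {
            error = EPROTO;
            goto out;                 /* the error path still unlocks */
        }
        /* ... copy counters while holding the mutex ... */
    out:
        ovs_mutex_unlock(&dev->mutex);
        return error;
    }

Either style works; the patch above keeps the existing structure and simply
adds the missing unlock.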


Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.

2016-05-12 Thread Darrell Ball
On Thu, May 12, 2016 at 4:55 PM, Guru Shetty  wrote:

>
>
> On 12 May 2016 at 16:34, Darrell Ball  wrote:
>
>> On Thu, May 12, 2016 at 10:54 AM, Guru Shetty  wrote:
>>
>> >
>> >> I think you misunderstood - having one or more gateway per tenant does
>> >> not make Transit LS better in flow scale.
>> >> The size of a Transit LS subnet and management across Transit LSs is
>> one
>> >> the 5 issues I mentioned and it remains the same
>> >> as do the other issues.
>> >>
>> >> Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one
>> >> distributed logical router per tenant
>> >> spanning 1000 HVs, one gateway per tenant, we have a Transit LS with
>> 1001
>> >> router type logical ports (1000 HVs + one gateway).
>> >>
>> >> Now, based on your previous assertion earlier:
>> >>  "If a user uses one gateway, a transit LS only gets connected by 2
>> >> routers.
>> >> Other routers get their own transit LS."
>> >>
>> >> This translates:
>> >> one Transit LS per tenant => 1000 Transit LS datapaths in total
>> >> 1001 Transit LS logical ports per Transit LS => 1,001,000 Transit LS
>> >> logical ports in total
>> >> 1001 arp resolve flows per Transit LS => 1,001,000 flows just for arp
>> >> resolve.
>> >> Each Transit LS comes with many other flows: so we multiply that number
>> >> of flows * 1000 Transit LSs = ? flows
>> >> 1001 addresses per subnet per Transit LS; I suggested addresses should
>> be
>> >> reused across subnets, but when each subnet is large
>> >>
>> >
>> > Re-reading. The above is a wrong conclusion making me believe that there
>> > is a big disconnect. A subnet in transit LS has only 2 IP addresses (if
>> it
>> > is only one physical gateway). Every additional physical gateway can add
>> > one additional IP address to the subnet (depending on whether the new
>> > physical gateway has a gateway router added for that logical topology.).
>> >
>>
>>
>> With respect to the IP address usage. I think a diagram would help
>> especially the K8 case,
>>
> Drawing a diagram here is not feasible. Happy to do it on a whiteboard
> though.
>

Thanks - let's do that; I would like to clarify the addressing requirements
and full scope of
distributed/gateway router interconnects for K8s.


>
>
>> which I had heard in other conversations may have a separate gateway on
>> every HV ?. Hence, I would like to know what that means - i.e. were you
>> thinking to run separate gateways routers on every HV for K8 ?
>>
> Yes, thats the plan (as many as possible). 100 routers is a target. Not
> HV, but a VM.
>
>
>
>>
>> With respect to the other questions, I think its best approach would be to
>> ask direct questions so those
>> direct questions get answered.
>>
>> 1) With 1000 HVs, 1000 HVs/tenant, 1 distributed router per tenant, you
>> choose the number of gateways/tenant:
>>
>> a) How many Transit LS distributed datapaths are expected in total ?
>>
>
> One (i.e the same as the distributed router).
>


i.e.
1000 distributed routers => 1000 Transit LSs



>
>
>>
>> b) How many Transit LS logical ports are needed at the HV level  ?
>>
>> what I mean by that is lets say we have one additional logical port at
>> northd level and 1000 HVs then if we need to download that port to 1000
>> HVs, I consider that to be 1000 logical ports at the HV level because
>> downloading and maintaining state across HVs at scale is more expensive
>> than for a single hypervisor.
>>
>
> 1000 additional ones. It is the same as your distributed logical switch or
> logical router (this is the case even with the peer routers)
>

Did you mean 2000 including both ends of the Transit LS ?



>
>
>
>>
>> c) How many Transit LS arp resolve entries at the HV level ?
>>
>> what I mean by that is lets say we have one additional arp resolve flow at
>> northd level and 1000 HVs then if we need to download that arp resolve
>> flow
>> to 1000 HVs, I consider that to be 1000 flows at the HV level because
>> downloading and maintaining state across multiple HVs is more expensive
>> that a single hypervisor.
>>
>
> 2 ARP flows per transit LS * 1000 HVs.
>

oops; I underestimated by half



> Do realize that a single bridge on a single hypervisor typically has flows
> in the 100,000 range. Even a million is feasbile.
>

I know.
I am thinking about the coordination across many HVs.



> Microsegmentation use cases has 1 ACLs per logical switch. So that is
> 1 * 1000 for your case form single LS. So do you have some comparative
> perspectives.
>
>
>
> dev mailing list
>> dev@openvswitch.org
>> http://openvswitch.org/mailman/listinfo/dev
>>
>
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.

2016-05-12 Thread Guru Shetty
>
>
> >>
> >> With respect to the other questions, I think its best approach would be
> to
> >> ask direct questions so those
> >> direct questions get answered.
> >>
> >> 1) With 1000 HVs, 1000 HVs/tenant, 1 distributed router per tenant, you
> >> choose the number of gateways/tenant:
> >>
> >> a) How many Transit LS distributed datapaths are expected in total ?
> >>
> >
> > One (i.e the same as the distributed router).
> >
>
>
> i.e.
> 1000 distributed routers => 1000 Transit LSs
>
Yes.

>
>
>
> >
> >
> >>
> >> b) How many Transit LS logical ports are needed at the HV level  ?
> >>
> >> what I mean by that is lets say we have one additional logical port at
> >> northd level and 1000 HVs then if we need to download that port to 1000
> >> HVs, I consider that to be 1000 logical ports at the HV level because
> >> downloading and maintaining state across HVs at scale is more expensive
> >> than for a single hypervisor.
> >>
> >
> > 1000 additional ones. It is the same as your distributed logical switch
> or
> > logical router (this is the case even with the peer routers)
> >
>
> Did you mean 2000 including both ends of the Transit LS ?
>

No. One end is only on the physical gateway to act as a physical endpoint.

>
>
>
> >
> >
> >
> >>
> >> c) How many Transit LS arp resolve entries at the HV level ?
> >>
> >> what I mean by that is lets say we have one additional arp resolve flow
> at
> >> northd level and 1000 HVs then if we need to download that arp resolve
> >> flow
> >> to 1000 HVs, I consider that to be 1000 flows at the HV level because
> >> downloading and maintaining state across multiple HVs is more expensive
> >> that a single hypervisor.
> >>
> >
> > 2 ARP flows per transit LS * 1000 HVs.
> >
>
> oops; I underestimated by half
>
>
>
> > Do realize that a single bridge on a single hypervisor typically has
> flows
> > in the 100,000 range. Even a million is feasbile.
> >
>
> I know.
> I am thinking about the coordination across many HVs.
>

There is no co-ordination. The HV just downloads from ovn-sb. This is
no different from any of the other distributed datapaths that
we have. If the introduction of one additional datapath is a problem, then OVN
has a problem in general, because it then simply means that it can only do
one DR per logical topology. A transit LS is much less resource intensive
(as it consumes just one additional port) than a DR connected to another DR
(not GR) as peers (in that case you have 2 additional ports per DR, plus
whatever additional switch ports are connected to them).

If the larger concern is about having 1000 tenants, then we need to pass
more hints to ovn-controller about interconnections so that it only
programs things relevant to local VMs and containers, which are limited by
the number of CPUs and memory and are usually on the order of tens.


>
>
>
> > Microsegmentation use cases has 1 ACLs per logical switch. So that is
> > 1 * 1000 for your case form single LS. So do you have some
> comparative
> > perspectives.
> >
> >
> >
> > dev mailing list
> >> dev@openvswitch.org
> >> http://openvswitch.org/mailman/listinfo/dev
> >>
> >
> >
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling performance using DPDK Rx checksum offloading feature.

2016-05-12 Thread Daniele Di Proietto
2016-05-12 13:40 GMT-07:00 pravin shelar :

> On Thu, May 12, 2016 at 12:59 PM, Jesse Gross  wrote:
> > On Thu, May 12, 2016 at 11:18 AM, pravin shelar  wrote:
> >> On Tue, May 10, 2016 at 6:31 PM, Jesse Gross  wrote:
> >>> I'm a little bit torn as to whether we should apply your rx checksum
> >>> offload patch in the meantime while we wait for DPDK to offer the new
> >>> API. It looks like we'll have a 10% gain with tunneling in exchange
> >>> for a 1% loss in other situations, so the call obviously depends on
> >>> use case. Pravin, Daniele, others, any opinions?
> >>>
> >> There could be a way around the API issue and avoid the 1% loss.
> >> netdev API could be changed to set packet->mbuf.ol_flags to
> >> (PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD) if the netdev
> >> implementation does not support rx checksum offload. Then there is no
> >> need to check the rx checksum flags in dpif-netdev. And the checksum
> >> can be directly checked in tunneling code where we actually need to.
> >> Is there any issue with this approach?
> >
> > I think that's probably a little bit cleaner overall though I don't
> > think that it totally eliminates the overhead. Not all DPDK ports will
> > support checksum offload (since the hardware may not do it in theory)
> > so we'll still need to check the port status on each packet to
> > initialize the flags.
> >
> I was thinking of changing dpdk packet object constructor
> (ovs_rte_pktmbuf_init()) to initialize the flag according to the
> device offload support. This way there should not be any checks needed
> in packet receive path.
>
>
It looks like (at least for ixgbe) the flags are reset by the rx routine,
even
if offloads are disabled.

I don't have a better idea, IMHO losing 1% is not a huge deal

Thanks


> > The other thing that is a little concerning is that there might be
> > conditions where a driver doesn't actually verify the checksum. I
> > guess most of these aren't supported in our tunneling implementation
> > (IP options comes to mind) but it's a little risky.
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 1/3] datapath-windows: add nlMsgHdr to OvsPacketExecute

2016-05-12 Thread Nithin Raju
We'll need this for parsing nested attributes.

Signed-off-by: Nithin Raju 
---
 datapath-windows/ovsext/DpInternal.h |  1 +
 datapath-windows/ovsext/User.c   | 13 -
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/datapath-windows/ovsext/DpInternal.h 
b/datapath-windows/ovsext/DpInternal.h
index a3ce311..07bc180 100644
--- a/datapath-windows/ovsext/DpInternal.h
+++ b/datapath-windows/ovsext/DpInternal.h
@@ -275,6 +275,7 @@ typedef struct OvsPacketExecute {
 
uint32_t packetLen;
uint32_t actionsLen;
+   PNL_MSG_HDR nlMsgHdr;
PCHAR packetBuf;
PNL_ATTR actions;
PNL_ATTR *keyAttrs;
diff --git a/datapath-windows/ovsext/User.c b/datapath-windows/ovsext/User.c
index 34f38f4..3b3f662 100644
--- a/datapath-windows/ovsext/User.c
+++ b/datapath-windows/ovsext/User.c
@@ -46,8 +46,9 @@ extern PNDIS_SPIN_LOCK gOvsCtrlLock;
 extern POVS_SWITCH_CONTEXT gOvsSwitchContext;
 OVS_USER_STATS ovsUserStats;
 
-static VOID _MapNlAttrToOvsPktExec(PNL_ATTR *nlAttrs, PNL_ATTR *keyAttrs,
-   OvsPacketExecute  *execute);
+static VOID _MapNlAttrToOvsPktExec(PNL_MSG_HDR nlMsgHdr, PNL_ATTR *nlAttrs,
+   PNL_ATTR *keyAttrs,
+   OvsPacketExecute *execute);
 extern NL_POLICY nlFlowKeyPolicy[];
 extern UINT32 nlFlowKeyPolicyLen;
 
@@ -311,7 +312,7 @@ OvsNlExecuteCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx,
 
 execute.dpNo = ovsHdr->dp_ifindex;
 
-_MapNlAttrToOvsPktExec(nlAttrs, keyAttrs, &execute);
+_MapNlAttrToOvsPktExec(nlMsgHdr, nlAttrs, keyAttrs, &execute);
 
 status = OvsExecuteDpIoctl(&execute);
 
@@ -363,12 +364,14 @@ done:
  *
  */
 static VOID
-_MapNlAttrToOvsPktExec(PNL_ATTR *nlAttrs, PNL_ATTR *keyAttrs,
-   OvsPacketExecute *execute)
+_MapNlAttrToOvsPktExec(PNL_MSG_HDR nlMsgHdr, PNL_ATTR *nlAttrs,
+   PNL_ATTR *keyAttrs, OvsPacketExecute *execute)
 {
 execute->packetBuf = NlAttrGet(nlAttrs[OVS_PACKET_ATTR_PACKET]);
 execute->packetLen = NlAttrGetSize(nlAttrs[OVS_PACKET_ATTR_PACKET]);
 
+execute->nlMsgHdr = nlMsgHdr;
+
 execute->actions = NlAttrGet(nlAttrs[OVS_PACKET_ATTR_ACTIONS]);
 execute->actionsLen = NlAttrGetSize(nlAttrs[OVS_PACKET_ATTR_ACTIONS]);
 
-- 
2.7.1.windows.1

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 3/3] datapath-windows: Use l2 port and tunkey during execute

2016-05-12 Thread Nithin Raju
While testing DFW and recirc code it was found that userspace
was calling into packet execute with the tunnel key and the
vport added as part of the execute structure. We were not passing
this along to the code that executes actions. The right thing is
to construct the key based on all of the attributes sent down from
userspace.

Signed-off-by: Nithin Raju 
---
 datapath-windows/ovsext/User.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/datapath-windows/ovsext/User.c b/datapath-windows/ovsext/User.c
index 3b3f662..2312940 100644
--- a/datapath-windows/ovsext/User.c
+++ b/datapath-windows/ovsext/User.c
@@ -51,6 +51,8 @@ static VOID _MapNlAttrToOvsPktExec(PNL_MSG_HDR nlMsgHdr, PNL_ATTR *nlAttrs,
OvsPacketExecute *execute);
 extern NL_POLICY nlFlowKeyPolicy[];
 extern UINT32 nlFlowKeyPolicyLen;
+extern NL_POLICY nlFlowTunnelKeyPolicy[];
+extern UINT32 nlFlowTunnelKeyPolicyLen;
 
 static __inline VOID
 OvsAcquirePidHashLock()
@@ -375,6 +377,7 @@ _MapNlAttrToOvsPktExec(PNL_MSG_HDR nlMsgHdr, PNL_ATTR *nlAttrs,
 execute->actions = NlAttrGet(nlAttrs[OVS_PACKET_ATTR_ACTIONS]);
 execute->actionsLen = NlAttrGetSize(nlAttrs[OVS_PACKET_ATTR_ACTIONS]);
 
+ASSERT(keyAttrs[OVS_KEY_ATTR_IN_PORT]);
 execute->inPort = NlAttrGetU32(keyAttrs[OVS_KEY_ATTR_IN_PORT]);
 execute->keyAttrs = keyAttrs;
 }
@@ -391,6 +394,8 @@ OvsExecuteDpIoctl(OvsPacketExecute *execute)
 OvsFlowKey  key = { 0 };
 OVS_PACKET_HDR_INFO layers = { 0 };
 POVS_VPORT_ENTRYvport = NULL;
+PNL_ATTR tunnelAttrs[__OVS_TUNNEL_KEY_ATTR_MAX];
+OvsFlowKey tempTunKey = {0};
 
 if (execute->packetLen == 0) {
 status = STATUS_INVALID_PARAMETER;
@@ -428,8 +433,31 @@ OvsExecuteDpIoctl(OvsPacketExecute *execute)
 goto dropit;
 }
 
-ndisStatus = OvsExtractFlow(pNbl, fwdDetail->SourcePortId, &key, &layers,
-NULL);
+if (execute->keyAttrs[OVS_KEY_ATTR_TUNNEL]) {
+UINT32 tunnelKeyAttrOffset;
+
+tunnelKeyAttrOffset = (UINT32)((PCHAR)
+  (execute->keyAttrs[OVS_KEY_ATTR_TUNNEL])
+  - (PCHAR)execute->nlMsgHdr);
+
+/* Get tunnel keys attributes */
+if ((NlAttrParseNested(execute->nlMsgHdr, tunnelKeyAttrOffset,
+   NlAttrLen(execute->keyAttrs[OVS_KEY_ATTR_TUNNEL]),
+   nlFlowTunnelKeyPolicy, nlFlowTunnelKeyPolicyLen,
+   tunnelAttrs, ARRAY_SIZE(tunnelAttrs)))
+   != TRUE) {
+OVS_LOG_ERROR("Tunnel key Attr Parsing failed for msg: %p",
+   execute->nlMsgHdr);
+status = STATUS_INVALID_PARAMETER;
+goto dropit;
+}
+
+MapTunAttrToFlowPut(execute->keyAttrs, tunnelAttrs, &tempTunKey);
+}
+
+ndisStatus = OvsExtractFlow(pNbl, execute->inPort, &key, &layers,
+ tempTunKey.tunKey.dst == 0 ? NULL : &tempTunKey.tunKey);
+
 if (ndisStatus == NDIS_STATUS_SUCCESS) {
 NdisAcquireRWLockRead(gOvsSwitchContext->dispatchLock, &lockState, 0);
 ndisStatus = OvsActionsExecute(gOvsSwitchContext, NULL, pNbl,
-- 
2.7.1.windows.1

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 2/3] datapath-windows: Make _MapTunAttrToFlowPut() global

2016-05-12 Thread Nithin Raju
Move this function out from file scope.

Signed-off-by: Nithin Raju 
---
 datapath-windows/ovsext/Flow.c | 16 +++-
 datapath-windows/ovsext/Flow.h |  2 ++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/datapath-windows/ovsext/Flow.c b/datapath-windows/ovsext/Flow.c
index 1f23625..0682617 100644
--- a/datapath-windows/ovsext/Flow.c
+++ b/datapath-windows/ovsext/Flow.c
@@ -54,9 +54,6 @@ static VOID _MapKeyAttrToFlowPut(PNL_ATTR *keyAttrs,
  PNL_ATTR *tunnelAttrs,
  OvsFlowKey *destKey);
 
-static VOID _MapTunAttrToFlowPut(PNL_ATTR *keyAttrs,
- PNL_ATTR *tunnelAttrs,
- OvsFlowKey *destKey);
 static VOID _MapNlToFlowPutFlags(PGENL_MSG_HDR genlMsgHdr,
  PNL_ATTR flowAttrClear,
  OvsFlowPut *mappedFlow);
@@ -207,6 +204,7 @@ const NL_POLICY nlFlowTunnelKeyPolicy[] = {
 [OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS] = {.type = NL_A_VAR_LEN,
  .optional = TRUE}
 };
+const UINT32 nlFlowTunnelKeyPolicyLen = ARRAY_SIZE(nlFlowTunnelKeyPolicy);
 
 /* For Parsing nested OVS_FLOW_ATTR_ACTIONS attributes */
 const NL_POLICY nlFlowActionPolicy[] = {
@@ -1409,7 +1407,7 @@ _MapKeyAttrToFlowPut(PNL_ATTR *keyAttrs,
  PNL_ATTR *tunnelAttrs,
  OvsFlowKey *destKey)
 {
-_MapTunAttrToFlowPut(keyAttrs, tunnelAttrs, destKey);
+MapTunAttrToFlowPut(keyAttrs, tunnelAttrs, destKey);
 
 if (keyAttrs[OVS_KEY_ATTR_RECIRC_ID]) {
 destKey->recircId = NlAttrGetU32(keyAttrs[OVS_KEY_ATTR_RECIRC_ID]);
@@ -1631,14 +1629,14 @@ _MapKeyAttrToFlowPut(PNL_ATTR *keyAttrs,
 
 /*
  *
- *  _MapTunAttrToFlowPut --
+ *  MapTunAttrToFlowPut --
  *Converts FLOW_TUNNEL_KEY attribute to OvsFlowKey->tunKey.
  *
  */
-static VOID
-_MapTunAttrToFlowPut(PNL_ATTR *keyAttrs,
- PNL_ATTR *tunAttrs,
- OvsFlowKey *destKey)
+VOID
+MapTunAttrToFlowPut(PNL_ATTR *keyAttrs,
+PNL_ATTR *tunAttrs,
+OvsFlowKey *destKey)
 {
 if (keyAttrs[OVS_KEY_ATTR_TUNNEL]) {
 
diff --git a/datapath-windows/ovsext/Flow.h b/datapath-windows/ovsext/Flow.h
index 310c472..fb3fb59 100644
--- a/datapath-windows/ovsext/Flow.h
+++ b/datapath-windows/ovsext/Flow.h
@@ -81,6 +81,8 @@ NTSTATUS MapFlowKeyToNlKey(PNL_BUFFER nlBuf, OvsFlowKey *flowKey,
UINT16 keyType, UINT16 tunKeyType);
 NTSTATUS MapFlowTunKeyToNlKey(PNL_BUFFER nlBuf, OvsIPv4TunnelKey *tunKey,
   UINT16 tunKeyType);
+VOID MapTunAttrToFlowPut(PNL_ATTR *keyAttrs, PNL_ATTR *tunAttrs,
+ OvsFlowKey *destKey);
 UINT32 OvsFlowKeyAttrSize(void);
 UINT32 OvsTunKeyAttrSize(void);
 
-- 
2.7.1.windows.1

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] multiple ovs with multiple controller

2016-05-12 Thread 刘文学


          controller1  eth2 -------------- eth2  controller2
               |                                     |
               |                                     |
(10.1.1.2) eth0 -- eth1 | ovs1 | eth0 -- eth1 | ovs2 | eth0 -- eth0 Host1 (10.1.1.1)
    pc1                   pc2                   pc3                     pc4

I created the topology as follows:

pc2:
ovs-vsctl add-br ovs0
ovs-vsctl add-port ovs0 eth0
ovs-vsctl add-port ovs0 eth1
ovs-vsctl set-controller ovs0 tcp:127.0.0.1:6653

pc3:
ovs-vsctl add-br ovs0
ovs-vsctl add-port ovs0 eth0
ovs-vsctl add-port ovs0 eth1
ovs-vsctl set-controller ovs0 tcp:127.0.0.1:6653

controller1 connects to controller2 via eth2.

Now the ping from pc1 to pc4 fails. The flow rule on pc2 is written
correctly:

cookie=0x20, duration=4456.775s, table=0, n_packets=4456, n_bytes=436688, idle_timeout=5, idle_age=0, priority=1,ip,in_port=3,dl_src=2c:53:4a:01:8f:bb,dl_dst=00:50:11:e0:13:a6,nw_src=10.1.1.1,nw_dst=10.1.1.2 actions=output:7

Capturing with tcpdump on pc2 eth0: the ICMP is sent out of eth0 correctly.
Capturing with tcpdump on pc3 eth1: the ICMP from eth2 is received correctly.
Capturing with tcpdump on pc3 lo: the packet-in is sent, but the controller
does not receive the packet-in.

What I have tried:
1. If I point ovs0 on pc2 and ovs0 on pc3 at the same controller (controller1
   or controller2), the ping works.
2. Set eth0 on pc2 and eth1 on pc3 as patch ports. It still does not work.

Any help will be much appreciated.

Thanks.

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

