[ovs-dev] How to add flow rules to enable ssh
Hi All,

I want to add a flow in OVS that allows ssh from a specific IP address. I also want to add rules that allow or drop traffic from specific IPs.

Steps I have done so far:
* My OVS is running in an Ubuntu VM.
* Created one bridge.
* Added a port to the bridge.
* Added 2 hosts using network namespaces, attached to my bridge via veth pairs.
* Both hosts now communicate via my bridge (working fine).

Now I am trying to add a rule in OVS that drops packets coming from host1 to host2. I tried the command below, but it is not working:

ovs-ofctl add-flow br0 dl_type=0x0800,ip,nw_src=x.x.x.x,action=drop

Could anyone help with how to add rules for ssh and for dropping packets from specific IPs? Thanks in advance.

Regards
S Pratap
___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
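A minimal sketch of the kind of rules being asked about, assuming placeholder addresses 10.0.0.1 for host1 and 10.0.0.2 for host2 on bridge `br0`. One likely culprit in the command above is the keyword: `ovs-ofctl` expects `actions=`, not `action=`.

```
# Allow SSH (TCP port 22) to host2 only from the trusted host1 address.
ovs-ofctl add-flow br0 "priority=100,tcp,nw_src=10.0.0.1,nw_dst=10.0.0.2,tp_dst=22,actions=normal"

# Drop any other IP traffic from host1 to host2.
ovs-ofctl add-flow br0 "priority=50,ip,nw_src=10.0.0.1,nw_dst=10.0.0.2,actions=drop"

# Let everything else use the default L2 forwarding.
ovs-ofctl add-flow br0 "priority=0,actions=normal"

# Check which flows are installed and whether their packet counters increase.
ovs-ofctl dump-flows br0
```

The `tcp` and `ip` keywords already imply `dl_type=0x0800`, so the explicit `dl_type` match is not needed, and flows with a higher `priority` are matched first.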
[ovs-dev] [PATCH V6] datapath-windows: Improved offloading on STT tunnel
*Added OvsExtractLayers - populates only the layers field without unnecessary memory operations for flow part *If in STT header the flags are 0 then force packets checksums calculation on receive. *Ensure correct pseudo checksum is set for LSO both on send and receive. Linux includes the segment length to TCP pseudo-checksum conforming to RFC 793 but in case of LSO Windows expects this to be only on Source IP Address, Destination IP Address, and Protocol. *Fragment expiration on rx side of STT was set to 30 seconds, but the correct timeout would be TTL of the packet Signed-off-by: Paul-Daniel Boca Reviewed-by: Alin Gabriel Serdean --- v2: Fixed a NULL pointer dereference. Removed some unused local variables and multiple initializations. v3: Use LSO V2 in OvsDoEncapStt Fixed alignment and code style Use IpHdr TTL for fragment expiration on receive instead 30s V4: Use stored MSS in STT header on rx for lsoInfo of encapsulated packet If STT_CSUM_VERIFIED flag is set then we don't have to extract layers on receive. V5: If CSUM_VERIFIED or no flag is set in STT header then don't recompute checksums V6: Add define for conversion of TTL to seconds Fixes LSO MSS on rx side Compute TCP checksum only once on TX side over STT header --- datapath-windows/ovsext/Flow.c | 243 - datapath-windows/ovsext/Flow.h | 2 + datapath-windows/ovsext/IpHelper.h | 3 +- datapath-windows/ovsext/PacketParser.c | 97 +++-- datapath-windows/ovsext/PacketParser.h | 8 +- datapath-windows/ovsext/Stt.c | 127 + datapath-windows/ovsext/Stt.h | 1 - datapath-windows/ovsext/User.c | 17 ++- 8 files changed, 381 insertions(+), 117 deletions(-) diff --git a/datapath-windows/ovsext/Flow.c b/datapath-windows/ovsext/Flow.c index 1f23625..a49a60c 100644 --- a/datapath-windows/ovsext/Flow.c +++ b/datapath-windows/ovsext/Flow.c @@ -1566,7 +1566,8 @@ _MapKeyAttrToFlowPut(PNL_ATTR *keyAttrs, ndKey = NlAttrGet(keyAttrs[OVS_KEY_ATTR_ND]); RtlCopyMemory(&icmp6FlowPutKey->ndTarget, - ndKey->nd_target, sizeof (icmp6FlowPutKey->ndTarget)); + ndKey->nd_target, + sizeof (icmp6FlowPutKey->ndTarget)); RtlCopyMemory(icmp6FlowPutKey->arpSha, ndKey->nd_sll, ETH_ADDR_LEN); RtlCopyMemory(icmp6FlowPutKey->arpTha, @@ -1596,8 +1597,10 @@ _MapKeyAttrToFlowPut(PNL_ATTR *keyAttrs, arpFlowPutKey->nwSrc = arpKey->arp_sip; arpFlowPutKey->nwDst = arpKey->arp_tip; -RtlCopyMemory(arpFlowPutKey->arpSha, arpKey->arp_sha, ETH_ADDR_LEN); -RtlCopyMemory(arpFlowPutKey->arpTha, arpKey->arp_tha, ETH_ADDR_LEN); +RtlCopyMemory(arpFlowPutKey->arpSha, arpKey->arp_sha, + ETH_ADDR_LEN); +RtlCopyMemory(arpFlowPutKey->arpTha, arpKey->arp_tha, + ETH_ADDR_LEN); /* Kernel datapath assumes 'arpFlowPutKey->nwProto' to be in host * order. */ arpFlowPutKey->nwProto = (UINT8)ntohs((arpKey->arp_op)); @@ -1846,29 +1849,195 @@ OvsGetFlowMetadata(OvsFlowKey *key, return status; } + /* - * - * Initializes 'flow' members from 'packet', 'skb_priority', 'tun_id', and - * 'ofp_in_port'. - * - * Initializes 'packet' header pointers as follows: - * - *- packet->l2 to the start of the Ethernet header. - * - *- packet->l3 to just past the Ethernet header, or just past the - * vlan_header if one is present, to the first byte of the payload of the - * Ethernet frame. - * - *- packet->l4 to just past the IPv4 header, if one is present and has a - * correct length, and otherwise NULL. - * - *- packet->l7 to just past the TCP, UDP, SCTP or ICMP header, if one is - * present and has a correct length, and otherwise NULL. - * - * Returns NDIS_STATUS_SUCCESS normally. 
Fails only if packet data cannot be accessed - * (e.g. if Pkt_CopyBytesOut() returns an error). - * - */ +* +* Initializes 'layers' members from 'packet' +* +* Initializes 'layers' header pointers as follows: +* +*- layers->l2 to the start of the Ethernet header. +* +*- layers->l3 to just past the Ethernet header, or just past the +* vlan_header if one is present, to the first byte of the payload of the +* Ethernet frame. +* +*- layers->l4 to just past the IPv4 header, if one is present and has a +* correct length, and otherwise NULL. +* +*- layers->l7 to just past the TCP, UDP, SCTP or ICMP header, if one is +*
[ovs-dev] [PATCH RFC]: netdev-dpdk: add jumbo frame support (rebased)
This patch constitutes a response to a request on ovs-discuss (http://openvswitch.org/pipermail/discuss/2016-May/021261.html), and is only for consideration in the testing scenario documented therein. It should not be considered for review, or submission to the OVS source code - the proposed mechanism for adjusting netdev properties at runtime is available here: http://openvswitch.org/pipermail/dev/2016-April/070064.html. This patch has been compiled against the following commits: - OVS: 5d2460 - DPDK: v16.04 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH RFC 1/1] netdev-dpdk: add jumbo frame support
Add support for Jumbo Frames to DPDK-enabled port types, using single-segment-mbufs. Using this approach, the amount of memory allocated for each mbuf to store frame data is increased to a value greater than 1518B (typical Ethernet maximum frame length). The increased space available in the mbuf means that an entire Jumbo Frame can be carried in a single mbuf, as opposed to partitioning it across multiple mbuf segments. The amount of space allocated to each mbuf to hold frame data is defined dynamically by the user when adding a DPDK port to a bridge. If an MTU value is not supplied, or the user-supplied value is invalid, the MTU for the port defaults to standard Ethernet MTU (i.e. 1500B). Signed-off-by: Mark Kavanagh --- INSTALL.DPDK.md | 60 - NEWS | 1 + lib/netdev-dpdk.c | 248 +- 3 files changed, 232 insertions(+), 77 deletions(-) diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 7f76df8..9b83c78 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -913,10 +913,63 @@ by adding the following string: to sections of all network devices used by DPDK. Parameter 'N' determines how many queues can be used by the guest. +Jumbo Frames + + +Support for Jumbo Frames may be enabled at run-time for DPDK-type ports. + +To avail of Jumbo Frame support, add the 'mtu_request' option to the ovs-vsctl +'add-port' command-line, along with the required MTU for the port. +e.g. + + ``` + ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:mtu_request=9000 + ``` + +When Jumbo Frames are enabled, the size of a DPDK port's mbuf segments are +increased, such that a full Jumbo Frame may be accommodated inside a single +mbuf segment. Once set, the MTU for a DPDK port is immutable. + +Note that from an OVSDB perspective, the `mtu_request` option for a specific +port may be disregarded once initially set, as subsequent modifications to this +field are disregarded by the DPDK port. As with non-DPDK ports, the MTU of DPDK +ports is reported by the `Interface` table's 'mtu' field. + +Jumbo frame support has been validated against 13312B frames, using the +DPDK `igb_uio` driver, but larger frames and other DPDK NIC drivers may +theoretically be supported. Supported port types excludes vHost-Cuse ports, as +that feature is pending deprecation. + +vHost Ports and Jumbo Frames + +Jumbo frame support is available for DPDK vHost-User ports only. Some additional +configuration is needed to take advantage of this feature: + + 1. `mergeable buffers` must be enabled for vHost ports, as demonstrated in + the QEMU command line snippet below: + + ``` + '-netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \' + '-device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=on' + ``` + + 2. Where virtio devices are bound to the Linux kernel driver in a guest + environment (i.e. interfaces are not bound to an in-guest DPDK driver), the + MTU of those logical network interfaces must also be increased. This + avoids segmentation of Jumbo Frames in the guest. Note that 'MTU' refers + to the length of the IP packet only, and not that of the entire frame. + + e.g. To calculate the exact MTU of a standard IPv4 frame, subtract the L2 + header and CRC lengths (i.e. 18B) from the max supported frame size. + So, to set the MTU for a 13312B Jumbo Frame: + + ``` + ifconfig eth1 mtu 13294 + ``` + Restrictions: - - - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue. - Currently DPDK port does not make use any offload functionality. - DPDK-vHost support works with 1G huge pages. 
@@ -945,6 +998,11 @@ Restrictions: increased to the desired number of queues. Both DPDK and OVS must be recompiled for this change to take effect. + Jumbo Frames: + - `virtio-pmd`: DPDK apps in the guest do not exit gracefully. This is a DPDK + issue that is currently being investigated. + - vHost-Cuse: Jumbo Frame support is not available for vHost Cuse ports. + Bug Reporting: -- diff --git a/NEWS b/NEWS index ea7f3a1..4bc0371 100644 --- a/NEWS +++ b/NEWS @@ -26,6 +26,7 @@ Post-v2.5.0 assignment. * Type of log messages from PMD threads changed from INFO to DBG. * QoS functionality with sample egress-policer implementation. + * Support Jumbo Frames - ovs-benchmark: This utility has been removed due to lack of use and bitrot. - ovs-appctl: diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 208c5f5..d730dd8 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -79,6 +79,8 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); + sizeof(struct dp_packet)\ + RTE_PKTMBUF_HEADROOM) #define NETDEV_DPDK_MBUF_ALIGN 1024 +#define NET
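A short sketch tying together the jumbo-frame settings described in the patch above; the 9000B MTU and the names `br0`/`dpdk0` are illustrative.

```
# Request a jumbo MTU when the DPDK port is added; per the documentation
# above, the MTU cannot be changed once the port exists.
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:mtu_request=9000

# The MTU in effect is reported through the Interface table's 'mtu' column.
ovs-vsctl get Interface dpdk0 mtu
```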
Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.
> On May 11, 2016, at 10:45 PM, Darrell Ball wrote: > > > >> On Wed, May 11, 2016 at 8:51 PM, Guru Shetty wrote: >> >> >> >> >> > On May 11, 2016, at 8:45 PM, Darrell Ball wrote: >> > >> >> On Wed, May 11, 2016 at 4:42 PM, Guru Shetty wrote: >> >> >> >> >> >>> >> >>> Some reasons why having a “transit LS” is “undesirable” is: >> >>> >> >>> 1)1) It creates additional requirements at the CMS layer for setting >> >>> up networks; i.e. additional programming is required at the OVN >> >>> northbound >> >>> interface for the special transit LSs, interactions with the logical >> >>> router >> >>> peers. >> >> >> >> Agreed that there is additional work needed for the CMS plugin. That work >> >> is needed even if it is just peering as they need to convert one router in >> >> to two in OVN (unless OVN automatically makes this split) >> > >> > The work to coordinate 2 logical routers and one special LS is more and >> > also more complicated than >> > to coordinate 2 logical routers. >> > >> > >> >> >> >> >> >>> >> >>> In cases where some NSX products do this, it is hidden from the user, as >> >>> one would minimally expect. >> >>> >> >>> 2) 2) From OVN POV, it adds an additional OVN datapath to all >> >>> processing to the packet path and programming/processing for that >> >>> datapath. >> >>> >> >>> because you have >> >>> >> >>> R1<->Transit LS<->R2 >> >>> >> >>> vs >> >>> >> >>> R1<->R2 >> >> >> >> Agreed that there is an additional datapath. >> >> >> >> >> >>> >> >>> 3) 3) You have to coordinate the transit LS subnet to handle all >> >>> addresses in this same subnet for all the logical routers and all their >> >>> transit LS peers. >> >> >> >> I don't understand what you mean. If a user uses one gateway, a transit LS >> >> only gets connected by 2 routers. >> >> Other routers get their own transit LS. >> > >> > >> > Each group of logical routers communicating has it own Transit LS. >> > >> > Take an example with one gateway router and 1000 distributed logical >> > routers for 1000 tenants/hv, >> > connecting 1000 HVs for now. >> > Lets assume each distributed logical router only talks to the gateway >> > router. >> >> That is a wrong assumption. Each tenant has his own gateway router (or more) > > Its less of an assumption but more of an example for illustrative purposes; > but its good that you > mention it. I think one of the main discussion points was needing thousands of arp flows and thousands of subnets, and it was on an incorrect logical topology, I am glad that it is not an issue any more. > > The DR<->GR direct connection approach as well as the transit LS approach can > re-use private > IP pools across internal distributed logical routers, which amount to VRF > contexts for tenants networks. > > The Transit LS approach does not scale due to the large number of distributed > datapaths required and > other high special flow requirements. It has more complex and higher > subnetting requirements. In addition, there is greater complexity for > northbound management. Okay, now to summarize from my understanding: * A transit LS uses one internal subnet to connect multiple GR with one DR whereas direct multiple GR to one DR via peering uses multiple internal subnets. * A transit LS uses an additional logical datapath (split between 2 machines via tunnel) per logical topology which is a disadvantage as it involves going through an additional egress or ingress openflow pipeline in one host. 
* A transit LS lets you split a DR and GR in such a way that the packet entering physical gateway gets into egress pipeline of a switch and can be made to renter the ingress pipeline of a router making it easier to apply stateful policies as packets always enter ingress pipeline of a router in all directions (NS, SN and east west) * The general ability to connect multiple router to a switch (which this patch is about) also lets you connect your physical interface of your physical gateway connected to a physical topology to a LS in ovn which inturn is connected to multiple GRs. Each GR will have floating ips and will respond to ARPs for those floating IPs. Overall, Though I see the possibility of implementing direct GR to DR connections via peering, it feels right now that it will be additional work for not a lot of added benefits. > > > > > > >> >> >> > So thats 1000 Transit LSs. >> > -> 1001 addresses per subnet for each of 1000 subnets (1 for each Transit >> > LS) ? >> > >> > >> > >> > >> >> >> >> >> >>> >> >>> 4)4) Seems like L3 load balancing, ECMP, would be more complicated at >> >>> best. >> >>> >> >>> 5)5) 1000s of additional arp resolve flows rules are needed in >> >>> normal >> >>> cases in addition to added burden of the special transit LS others flows. >> >> >> >> I don't understand why that would be the case. >> > >> > >> > Each Transit LS creates an arp resolve flow for each peer router port. >> > Lets say we have 1000 HVs, each
[ovs-dev] [PATCH RFC 1/6] netdev-dpdk: Use instant sending instead of queueing of packets.
Current implementarion of TX packet's queueing is broken in several ways: * TX queue flushing implemented on receive assumes that all core_id-s are sequential and starts from zero. This may lead to situation when packets will stuck in queue forever and, also, this influences on latency. * For a long time flushing logic depends on uninitialized 'txq_needs_locking', because it usually calculated after 'netdev_dpdk_alloc_txq' but used inside of this function for initialization of 'flush_tx'. According to current flushing logic, constant flushing required if TX queues will be shared among different CPUs. Further patches will implement mechanisms for manipulations with TX queues in runtime. In this case PMD threads will not know is current queue shared or not. This means that constant flushing will be required. Conclusion: Lets remove queueing at all because it doesn't work properly now and, also, constant flushing required anyway. Testing on basic PHY-OVS-PHY and PHY-OVS-VM-OVS-PHY scenarios shows insignificant performance drop (less than 0.5 percents) in compare to profit that we can achieve in the future using XPS or other features. Signed-off-by: Ilya Maximets --- lib/netdev-dpdk.c | 102 -- 1 file changed, 14 insertions(+), 88 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 2b2c43c..c18bed2 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -167,7 +167,6 @@ static const struct rte_eth_conf port_conf = { }, }; -enum { MAX_TX_QUEUE_LEN = 384 }; enum { DPDK_RING_SIZE = 256 }; BUILD_ASSERT_DECL(IS_POW2(DPDK_RING_SIZE)); enum { DRAIN_TSC = 20ULL }; @@ -284,8 +283,7 @@ static struct ovs_list dpdk_mp_list OVS_GUARDED_BY(dpdk_mutex) = OVS_LIST_INITIALIZER(&dpdk_mp_list); /* This mutex must be used by non pmd threads when allocating or freeing - * mbufs through mempools. Since dpdk_queue_pkts() and dpdk_queue_flush() may - * use mempools, a non pmd thread should hold this mutex while calling them */ + * mbufs through mempools. */ static struct ovs_mutex nonpmd_mempool_mutex = OVS_MUTEX_INITIALIZER; struct dpdk_mp { @@ -299,17 +297,12 @@ struct dpdk_mp { /* There should be one 'struct dpdk_tx_queue' created for * each cpu core. */ struct dpdk_tx_queue { -bool flush_tx; /* Set to true to flush queue everytime */ - /* pkts are queued. */ -int count; rte_spinlock_t tx_lock;/* Protects the members and the NIC queue * from concurrent access. It is used only * if the queue is shared among different * pmd threads (see 'txq_needs_locking'). */ int map; /* Mapping of configured vhost-user queues * to enabled by guest. */ -uint64_t tsc; -struct rte_mbuf *burst_pkts[MAX_TX_QUEUE_LEN]; }; /* dpdk has no way to remove dpdk ring ethernet devices @@ -703,19 +696,6 @@ netdev_dpdk_alloc_txq(struct netdev_dpdk *dev, unsigned int n_txqs) dev->tx_q = dpdk_rte_mzalloc(n_txqs * sizeof *dev->tx_q); for (i = 0; i < n_txqs; i++) { -int numa_id = ovs_numa_get_numa_id(i); - -if (!dev->txq_needs_locking) { -/* Each index is considered as a cpu core id, since there should - * be one tx queue for each cpu core. If the corresponding core - * is not on the same numa node as 'dev', flags the - * 'flush_tx'. */ -dev->tx_q[i].flush_tx = dev->socket_id == numa_id; -} else { -/* Queues are shared among CPUs. Always flush */ -dev->tx_q[i].flush_tx = true; -} - /* Initialize map for vhost devices. 
*/ dev->tx_q[i].map = OVS_VHOST_QUEUE_MAP_UNKNOWN; rte_spinlock_init(&dev->tx_q[i].tx_lock); @@ -1056,16 +1036,15 @@ netdev_dpdk_rxq_dealloc(struct netdev_rxq *rxq) } static inline void -dpdk_queue_flush__(struct netdev_dpdk *dev, int qid) +netdev_dpdk_eth_instant_send(struct netdev_dpdk *dev, int qid, + struct rte_mbuf **pkts, int cnt) { -struct dpdk_tx_queue *txq = &dev->tx_q[qid]; uint32_t nb_tx = 0; -while (nb_tx != txq->count) { +while (nb_tx != cnt) { uint32_t ret; -ret = rte_eth_tx_burst(dev->port_id, qid, txq->burst_pkts + nb_tx, - txq->count - nb_tx); +ret = rte_eth_tx_burst(dev->port_id, qid, pkts + nb_tx, cnt - nb_tx); if (!ret) { break; } @@ -1073,32 +1052,18 @@ dpdk_queue_flush__(struct netdev_dpdk *dev, int qid) nb_tx += ret; } -if (OVS_UNLIKELY(nb_tx != txq->count)) { +if (OVS_UNLIKELY(nb_tx != cnt)) { /* free buffers, which we couldn't transmit, one at a time (each
[ovs-dev] [PATCH RFC 2/6] dpif-netdev: Allow configuration of number of tx queues.
Currently number of tx queues is not configurable. Fix that by introducing of new option for PMD interfaces: 'n_txq', which specifies the maximum number of tx queues to be created for this interface. Example: ovs-vsctl set Interface dpdk0 options:n_txq=64 Signed-off-by: Ilya Maximets --- INSTALL.DPDK.md | 11 --- lib/netdev-dpdk.c | 26 +++--- lib/netdev-provider.h | 2 +- 3 files changed, 28 insertions(+), 11 deletions(-) diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 93f92e4..630c68d 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -355,11 +355,14 @@ Performance Tuning: ovs-appctl dpif-netdev/pmd-stats-show ``` - 3. DPDK port Rx Queues + 3. DPDK port Queues - `ovs-vsctl set Interface options:n_rxq=` + ``` + ovs-vsctl set Interface options:n_rxq= + ovs-vsctl set Interface options:n_txq= + ``` - The command above sets the number of rx queues for DPDK interface. + The commands above sets the number of rx and tx queues for DPDK interface. The rx queues are assigned to pmd threads on the same NUMA node in a round-robin fashion. For more information, please refer to the Open_vSwitch TABLE section in @@ -638,7 +641,9 @@ Follow the steps below to attach vhost-user port(s) to a VM. ``` ovs-vsctl set Interface vhost-user-2 options:n_rxq= + ovs-vsctl set Interface vhost-user-2 options:n_txq= ``` + Note: `n_rxq` should be equal to `n_txq`. QEMU needs to be configured as well. The $q below should match the queues requested in OVS (if $q is more, diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index c18bed2..d86926c 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -345,8 +345,9 @@ struct netdev_dpdk { struct rte_eth_link link; int link_reset_cnt; -/* The user might request more txqs than the NIC has. We remap those - * ('up.n_txq') on these ('real_n_txq'). +/* dpif-netdev might request more txqs than the NIC has, also, number of tx + * queues may be changed via database ('options:n_txq'). + * We remap requested by dpif-netdev number on 'real_n_txq'. 
* If the numbers match, 'txq_needs_locking' is false, otherwise it is * true and we will take a spinlock on transmission */ int real_n_txq; @@ -954,14 +955,27 @@ static int netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args) { struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); -int new_n_rxq; +int new_n_rxq, new_n_txq; +bool reconfigure_needed = false; ovs_mutex_lock(&dev->mutex); + new_n_rxq = MAX(smap_get_int(args, "n_rxq", dev->requested_n_rxq), 1); if (new_n_rxq != dev->requested_n_rxq) { dev->requested_n_rxq = new_n_rxq; +reconfigure_needed = true; +} + +new_n_txq = MAX(smap_get_int(args, "n_txq", dev->requested_n_txq), 1); +if (new_n_txq != dev->requested_n_txq) { +dev->requested_n_txq = new_n_txq; +reconfigure_needed = true; +} + +if (reconfigure_needed) { netdev_request_reconfigure(netdev); } + ovs_mutex_unlock(&dev->mutex); return 0; @@ -2669,12 +2683,10 @@ netdev_dpdk_reconfigure(struct netdev *netdev) rte_free(dev->tx_q); err = dpdk_eth_dev_init(dev); +dev->txq_needs_locking = dev->real_n_txq < ovs_numa_get_n_cores() + 1; netdev_dpdk_alloc_txq(dev, dev->real_n_txq); -dev->txq_needs_locking = dev->real_n_txq != netdev->n_txq; - out: - ovs_mutex_unlock(&dev->mutex); ovs_mutex_unlock(&dpdk_mutex); @@ -2709,7 +2721,7 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev) netdev->n_txq = dev->requested_n_txq; dev->real_n_txq = 1; netdev->n_rxq = 1; -dev->txq_needs_locking = dev->real_n_txq != netdev->n_txq; +dev->txq_needs_locking = true; ovs_mutex_unlock(&dev->mutex); ovs_mutex_unlock(&dpdk_mutex); diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h index be31e31..f71f8e4 100644 --- a/lib/netdev-provider.h +++ b/lib/netdev-provider.h @@ -53,7 +53,7 @@ struct netdev { uint64_t change_seq; /* A netdev provider might be unable to change some of the device's - * parameter (n_rxq, mtu) when the device is in use. In this case + * parameter (n_rxq, n_txq, mtu) when the device is in use. In this case * the provider can notify the upper layer by calling * netdev_request_reconfigure(). The upper layer will react by stopping * the operations on the device and calling netdev_reconfigure() to allow -- 2.5.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH RFC 0/6] dpif-netdev: Manual pinning of RX queues + XPS.
Patch set implemented on top of v9 of 'Reconfigure netdev at runtime' from Daniele Di Proietto. ( http://openvswitch.org/pipermail/dev/2016-April/070064.html )

Manual pinning of RX queues to PMD threads is required for performance optimisation. It gives the user the ability to achieve maximum performance with fewer CPUs, because only the user knows which ports are heavily loaded and which are not. To give full control over a port's TX queues, manipulation mechanisms are also required, for example to avoid the issue described in 'dpif-netdev: XPS (Transmit Packet Steering) implementation.', which becomes worse with the ability to pin manually. ( http://openvswitch.org/pipermail/dev/2016-March/067152.html )

First 3 patches: prerequisites for the XPS implementation.
Patch #4: XPS implementation.
Patches #5 and #6: Manual pinning implementation.

Ilya Maximets (6):
  netdev-dpdk: Use instant sending instead of queueing of packets.
  dpif-netdev: Allow configuration of number of tx queues.
  netdev-dpdk: Mandatory locking of TX queues.
  dpif-netdev: XPS (Transmit Packet Steering) implementation.
  dpif-netdev: Add dpif-netdev/pmd-reconfigure appctl command.
  dpif-netdev: Add dpif-netdev/pmd-rxq-set appctl command.

 INSTALL.DPDK.md            | 44 --
 NEWS                       | 4 +
 lib/dpif-netdev.c          | 387 ++---
 lib/netdev-bsd.c           | 1 -
 lib/netdev-dpdk.c          | 198 ++-
 lib/netdev-dummy.c         | 1 -
 lib/netdev-linux.c         | 1 -
 lib/netdev-provider.h      | 18 +--
 lib/netdev-vport.c         | 1 -
 lib/netdev.c               | 30
 lib/netdev.h               | 1 -
 vswitchd/ovs-vswitchd.8.in | 10 ++
 12 files changed, 394 insertions(+), 302 deletions(-)

--
2.5.0
___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
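A small sketch of how the current distribution and load could be inspected before deciding what to pin; both commands already exist upstream and are referenced elsewhere in this series.

```
# Per-PMD statistics, useful for spotting the most heavily loaded threads.
ovs-appctl dpif-netdev/pmd-stats-show

# Current rx queue to PMD thread assignment.
ovs-appctl dpif-netdev/pmd-rxq-show
```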
[ovs-dev] [PATCH RFC 4/6] dpif-netdev: XPS (Transmit Packet Steering) implementation.
If CPU number in pmd-cpu-mask is not divisible by the number of queues and in a few more complex situations there may be unfair distribution of TX queue-ids between PMD threads. For example, if we have 2 ports with 4 queues and 6 CPUs in pmd-cpu-mask such distribution is possible: <> # ovs-appctl dpif-netdev/pmd-rxq-show pmd thread numa_id 0 core_id 13: port: vhost-user1 queue-id: 1 port: dpdk0 queue-id: 3 pmd thread numa_id 0 core_id 14: port: vhost-user1 queue-id: 2 pmd thread numa_id 0 core_id 16: port: dpdk0 queue-id: 0 pmd thread numa_id 0 core_id 17: port: dpdk0 queue-id: 1 pmd thread numa_id 0 core_id 12: port: vhost-user1 queue-id: 0 port: dpdk0 queue-id: 2 pmd thread numa_id 0 core_id 15: port: vhost-user1 queue-id: 3 <> As we can see above dpdk0 port polled by threads on cores: 12, 13, 16 and 17. By design of dpif-netdev, there is only one TX queue-id assigned to each pmd thread. This queue-id's are sequential similar to core-id's. And thread will send packets to queue with exact this queue-id regardless of port. In previous example: pmd thread on core 12 will send packets to tx queue 0 pmd thread on core 13 will send packets to tx queue 1 ... pmd thread on core 17 will send packets to tx queue 5 So, for dpdk0 port after truncating in netdev-dpdk: core 12 --> TX queue-id 0 % 4 == 0 core 13 --> TX queue-id 1 % 4 == 1 core 16 --> TX queue-id 4 % 4 == 0 core 17 --> TX queue-id 5 % 4 == 1 As a result only 2 of 4 queues used. To fix this issue some kind of XPS implemented in following way: * TX queue-ids are allocated dynamically. * When PMD thread first time tries to send packets to new port it allocates less used TX queue for this port. * PMD threads periodically performes revalidation of allocated TX queue-ids. If queue wasn't used in last XPS_CYCLES it will be freed while revalidation. Reported-by: Zhihong Wang Signed-off-by: Ilya Maximets --- lib/dpif-netdev.c | 147 +++--- lib/netdev-bsd.c | 1 - lib/netdev-dpdk.c | 64 -- lib/netdev-dummy.c| 1 - lib/netdev-linux.c| 1 - lib/netdev-provider.h | 16 -- lib/netdev-vport.c| 1 - lib/netdev.c | 30 --- lib/netdev.h | 1 - 9 files changed, 113 insertions(+), 149 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 3b618fb..73aff8a 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -248,6 +248,8 @@ enum pmd_cycles_counter_type { PMD_N_CYCLES }; +#define XPS_CYCLES 10ULL + /* A port in a netdev-based datapath. */ struct dp_netdev_port { odp_port_t port_no; @@ -256,6 +258,7 @@ struct dp_netdev_port { struct netdev_saved_flags *sf; unsigned n_rxq; /* Number of elements in 'rxq' */ struct netdev_rxq **rxq; +unsigned *txq_used; /* Number of threads that uses each tx queue. */ char *type; /* Port type as requested by user. */ }; @@ -385,6 +388,8 @@ struct rxq_poll { /* Contained by struct dp_netdev_pmd_thread's 'port_cache' or 'tx_ports'. */ struct tx_port { odp_port_t port_no; +int qid; +unsigned long long last_cycles; struct netdev *netdev; struct hmap_node node; }; @@ -442,8 +447,6 @@ struct dp_netdev_pmd_thread { pthread_t thread; unsigned core_id; /* CPU core id of this pmd thread. */ int numa_id;/* numa node id of this pmd thread. */ -atomic_int tx_qid; /* Queue id used by this pmd thread to - * send packets on all netdevs */ struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */ /* List of rx queues to poll. 
*/ @@ -1153,24 +1156,6 @@ port_create(const char *devname, const char *open_type, const char *type, goto out; } -if (netdev_is_pmd(netdev)) { -int n_cores = ovs_numa_get_n_cores(); - -if (n_cores == OVS_CORE_UNSPEC) { -VLOG_ERR("%s, cannot get cpu core info", devname); -error = ENOENT; -goto out; -} -/* There can only be ovs_numa_get_n_cores() pmd threads, - * so creates a txq for each, and one extra for the non - * pmd threads. */ -error = netdev_set_tx_multiq(netdev, n_cores + 1); -if (error && (error != EOPNOTSUPP)) { -VLOG_ERR("%s, cannot set multiq", devname); -goto out; -} -} - if (netdev_is_reconf_required(netdev)) { error = netdev_reconfigure(netdev); if (error) { @@ -1183,6 +1168,7
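For reference, a configuration that would produce the kind of layout shown in the commit message above (two ports with four rx queues each and six PMD cores); the mask value and queue counts are assumptions for illustration, not part of the patch.

```
# Bits 12-17 of the mask select the six PMD cores used in the example.
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x3F000

# Two ports with four rx queues each.
ovs-vsctl set Interface dpdk0 options:n_rxq=4
ovs-vsctl set Interface vhost-user1 options:n_rxq=4

# Inspect the resulting rx queue distribution across PMD threads.
ovs-appctl dpif-netdev/pmd-rxq-show
```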
[ovs-dev] [PATCH RFC 5/6] dpif-netdev: Add dpif-netdev/pmd-reconfigure appctl command.
This command can be used to force PMD threads to reload and apply new configuration. Signed-off-by: Ilya Maximets --- NEWS | 2 ++ lib/dpif-netdev.c | 41 + vswitchd/ovs-vswitchd.8.in | 3 +++ 3 files changed, 46 insertions(+) diff --git a/NEWS b/NEWS index 4e81cad..817cba1 100644 --- a/NEWS +++ b/NEWS @@ -24,6 +24,8 @@ Post-v2.5.0 Old 'other_config:n-dpdk-rxqs' is no longer supported. * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq assignment. + * New appctl command 'dpif-netdev/pmd-reconfigure' to force + reconfiguration of PMD threads. * Type of log messages from PMD threads changed from INFO to DBG. * QoS functionality with sample egress-policer implementation. * The mechanism for configuring DPDK has changed to use database diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 73aff8a..5ad6845 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -532,6 +532,8 @@ static void dp_netdev_add_port_tx_to_pmd(struct dp_netdev_pmd_thread *pmd, static void dp_netdev_add_rxq_to_pmd(struct dp_netdev_pmd_thread *pmd, struct dp_netdev_port *port, struct netdev_rxq *rx); +static void reconfigure_pmd_threads(struct dp_netdev *dp) +OVS_REQUIRES(dp->port_mutex); static struct dp_netdev_pmd_thread * dp_netdev_less_loaded_pmd_on_numa(struct dp_netdev *dp, int numa_id); static void dp_netdev_reset_pmd_threads(struct dp_netdev *dp) @@ -796,6 +798,43 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[], unixctl_command_reply(conn, ds_cstr(&reply)); ds_destroy(&reply); } + +static void +dpif_netdev_pmd_reconfigure(struct unixctl_conn *conn, int argc, +const char *argv[], void *aux OVS_UNUSED) +{ +struct ds reply = DS_EMPTY_INITIALIZER; +struct dp_netdev *dp = NULL; + +if (argc > 2) { +unixctl_command_reply_error(conn, "Invalid argument"); +return; +} + +ovs_mutex_lock(&dp_netdev_mutex); + +if (argc == 2) { +dp = shash_find_data(&dp_netdevs, argv[1]); +} else if (shash_count(&dp_netdevs) == 1) { +/* There's only one datapath */ +dp = shash_first(&dp_netdevs)->data; +} + +if (!dp) { +unixctl_command_reply_error(conn, +"please specify an existing datapath"); +goto exit; +} + +ovs_mutex_lock(&dp->port_mutex); +reconfigure_pmd_threads(dp); +unixctl_command_reply(conn, ds_cstr(&reply)); +ds_destroy(&reply); +ovs_mutex_unlock(&dp->port_mutex); +exit: +ovs_mutex_unlock(&dp_netdev_mutex); +} + static int dpif_netdev_init(void) @@ -813,6 +852,8 @@ dpif_netdev_init(void) unixctl_command_register("dpif-netdev/pmd-rxq-show", "[dp]", 0, 1, dpif_netdev_pmd_info, (void *)&poll_aux); +unixctl_command_register("dpif-netdev/pmd-reconfigure", "[dp]", + 0, 1, dpif_netdev_pmd_reconfigure, NULL); return 0; } diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in index 3dacfc3..b181918 100644 --- a/vswitchd/ovs-vswitchd.8.in +++ b/vswitchd/ovs-vswitchd.8.in @@ -262,6 +262,9 @@ bridge statistics, only the values shown by the above command. .IP "\fBdpif-netdev/pmd-rxq-show\fR [\fIdp\fR]" For each pmd thread of the datapath \fIdp\fR shows list of queue-ids with port names, which this thread polls. +.IP "\fBdpif-netdev/pmd-reconfigure\fR [\fIdp\fR]" +This command can be used to force PMD threads to reload and apply new +configuration. . .so ofproto/ofproto-dpif-unixctl.man .so ofproto/ofproto-unixctl.man -- 2.5.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH RFC 6/6] dpif-netdev: Add dpif-netdev/pmd-rxq-set appctl command.
New appctl command to perform manual pinning of RX queues to desired cores. Signed-off-by: Ilya Maximets --- INSTALL.DPDK.md| 24 +- NEWS | 2 + lib/dpif-netdev.c | 199 - vswitchd/ovs-vswitchd.8.in | 7 ++ 4 files changed, 192 insertions(+), 40 deletions(-) diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index bb14bb5..6e727c7 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -337,7 +337,29 @@ Performance Tuning: `ovs-appctl dpif-netdev/pmd-rxq-show` - This can also be checked with: + To change default rxq assignment to pmd threads rxq may be manually + pinned to desired core using: + + `ovs-appctl dpif-netdev/pmd-rxq-set [dp] ` + + To apply new configuration after `pmd-rxq-set` reconfiguration required: + + `ovs-appctl dpif-netdev/pmd-reconfigure` + + After that PMD thread on core `core_id` will become `isolated`. This means + that this thread will poll only pinned RX queues. + + WARNING: If there are no `non-isolated` PMD threads, `non-pinned` RX queues + will not be polled. Also, if provided `core_id` is non-negative and not + available (ex. this `core_id` not in `pmd-cpu-mask`), RX queue will not be + polled by any pmd-thread. + + Isolation of PMD threads and pinning settings also can be checked using + `ovs-appctl dpif-netdev/pmd-rxq-show` command. + + To unpin RX queue use same command with `core-id` equal to `-1`. + + Affinity mask of the pmd thread can be checked with: ``` top -H diff --git a/NEWS b/NEWS index 817cba1..8fedeb7 100644 --- a/NEWS +++ b/NEWS @@ -22,6 +22,8 @@ Post-v2.5.0 - DPDK: * New option "n_rxq" for PMD interfaces. Old 'other_config:n-dpdk-rxqs' is no longer supported. + * New appctl command 'dpif-netdev/pmd-rxq-set' to perform manual + pinning of RX queues to desired core. * New appctl command 'dpif-netdev/pmd-rxq-show' to check the port/rxq assignment. * New appctl command 'dpif-netdev/pmd-reconfigure' to force diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 5ad6845..c1338f4 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -250,6 +250,13 @@ enum pmd_cycles_counter_type { #define XPS_CYCLES 10ULL +/* Contained by struct dp_netdev_port's 'rxqs' member. */ +struct dp_netdev_rxq { +struct netdev_rxq *rxq; +unsigned core_id; /* Сore to which this queue is pinned. */ +bool pinned;/* 'True' if this rxq pinned to some core. */ +}; + /* A port in a netdev-based datapath. */ struct dp_netdev_port { odp_port_t port_no; @@ -257,7 +264,7 @@ struct dp_netdev_port { struct hmap_node node; /* Node in dp_netdev's 'ports'. */ struct netdev_saved_flags *sf; unsigned n_rxq; /* Number of elements in 'rxq' */ -struct netdev_rxq **rxq; +struct dp_netdev_rxq *rxqs; unsigned *txq_used; /* Number of threads that uses each tx queue. */ char *type; /* Port type as requested by user. */ }; @@ -447,6 +454,7 @@ struct dp_netdev_pmd_thread { pthread_t thread; unsigned core_id; /* CPU core id of this pmd thread. */ int numa_id;/* numa node id of this pmd thread. */ +bool isolated; struct ovs_mutex port_mutex;/* Mutex for 'poll_list' and 'tx_ports'. */ /* List of rx queues to poll. */ @@ -722,21 +730,35 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd) struct rxq_poll *poll; const char *prev_name = NULL; -ds_put_format(reply, "pmd thread numa_id %d core_id %u:\n", - pmd->numa_id, pmd->core_id); +ds_put_format(reply, + "pmd thread numa_id %d core_id %u:\nisolated : %s\n", + pmd->numa_id, pmd->core_id, (pmd->isolated) + ? 
"true" : "false"); ovs_mutex_lock(&pmd->port_mutex); LIST_FOR_EACH (poll, node, &pmd->poll_list) { const char *name = netdev_get_name(poll->port->netdev); +struct dp_netdev_rxq *rxq; +int rx_qid; if (!prev_name || strcmp(name, prev_name)) { if (prev_name) { ds_put_cstr(reply, "\n"); } -ds_put_format(reply, "\tport: %s\tqueue-id:", +ds_put_format(reply, "\tport: %s\n", netdev_get_name(poll->port->netdev)); } -ds_put_format(reply, " %d", netdev_rxq_get_queue_id(poll->rx)); + +rx_qid = netdev_rxq_get_queue_id(poll->rx); +rxq = &poll->port->rxqs[rx_qid]; + +ds_put_format(reply, "\t\tqueue-id: %d\tpinned = %s", + rx_qid, (rxq->pinned) ? "true" : "false"); +
[ovs-dev] [PATCH RFC 3/6] netdev-dpdk: Mandatory locking of TX queues.
In future XPS implementation dpif-netdev layer will distribute TX queues between PMD threads dynamically and netdev layer will not know about sharing of TX queues. So, we need to lock them always. Each tx queue still has its own lock, so, impact on performance should be minimal. Signed-off-by: Ilya Maximets --- INSTALL.DPDK.md | 9 - lib/netdev-dpdk.c | 22 +- 2 files changed, 9 insertions(+), 22 deletions(-) diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 630c68d..bb14bb5 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -989,11 +989,10 @@ Restrictions: a system as described above, an error will be reported that initialization failed for the 65th queue. OVS will then roll back to the previous successful queue initialization and use that value as the total number of -TX queues available with queue locking. If a user wishes to use more than -64 queues and avoid locking, then the -`CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF` config parameter in DPDK must be -increased to the desired number of queues. Both DPDK and OVS must be -recompiled for this change to take effect. +TX queues available. If a user wishes to use more than +64 queues, then the `CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF` config +parameter in DPDK must be increased to the desired number of queues. +Both DPDK and OVS must be recompiled for this change to take effect. Bug Reporting: -- diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index d86926c..32a15fd 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -298,9 +298,7 @@ struct dpdk_mp { * each cpu core. */ struct dpdk_tx_queue { rte_spinlock_t tx_lock;/* Protects the members and the NIC queue -* from concurrent access. It is used only -* if the queue is shared among different -* pmd threads (see 'txq_needs_locking'). */ +* from concurrent access. */ int map; /* Mapping of configured vhost-user queues * to enabled by guest. */ }; @@ -347,12 +345,9 @@ struct netdev_dpdk { /* dpif-netdev might request more txqs than the NIC has, also, number of tx * queues may be changed via database ('options:n_txq'). - * We remap requested by dpif-netdev number on 'real_n_txq'. - * If the numbers match, 'txq_needs_locking' is false, otherwise it is - * true and we will take a spinlock on transmission */ + * We remap requested by dpif-netdev number on 'real_n_txq'. */ int real_n_txq; int real_n_rxq; -bool txq_needs_locking; /* virtio-net structure for vhost device */ OVSRCU_TYPE(struct virtio_net *) virtio_dev; @@ -1414,10 +1409,8 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid, { int i; -if (OVS_UNLIKELY(dev->txq_needs_locking)) { -qid = qid % dev->real_n_txq; -rte_spinlock_lock(&dev->tx_q[qid].tx_lock); -} +qid = qid % dev->real_n_txq; +rte_spinlock_lock(&dev->tx_q[qid].tx_lock); if (OVS_UNLIKELY(!may_steal || pkts[0]->source != DPBUF_DPDK)) { @@ -1479,9 +1472,7 @@ netdev_dpdk_send__(struct netdev_dpdk *dev, int qid, } } -if (OVS_UNLIKELY(dev->txq_needs_locking)) { -rte_spinlock_unlock(&dev->tx_q[qid].tx_lock); -} +rte_spinlock_unlock(&dev->tx_q[qid].tx_lock); } static int @@ -2069,7 +2060,6 @@ netdev_dpdk_vhost_set_queues(struct netdev_dpdk *dev, struct virtio_net *virtio_ dev->real_n_rxq = qp_num; dev->real_n_txq = qp_num; -dev->txq_needs_locking = true; /* Enable TX queue 0 by default if it wasn't disabled. 
*/ if (dev->tx_q[0].map == OVS_VHOST_QUEUE_MAP_UNKNOWN) { dev->tx_q[0].map = 0; @@ -2683,7 +2673,6 @@ netdev_dpdk_reconfigure(struct netdev *netdev) rte_free(dev->tx_q); err = dpdk_eth_dev_init(dev); -dev->txq_needs_locking = dev->real_n_txq < ovs_numa_get_n_cores() + 1; netdev_dpdk_alloc_txq(dev, dev->real_n_txq); out: @@ -2721,7 +2710,6 @@ netdev_dpdk_vhost_cuse_reconfigure(struct netdev *netdev) netdev->n_txq = dev->requested_n_txq; dev->real_n_txq = 1; netdev->n_rxq = 1; -dev->txq_needs_locking = true; ovs_mutex_unlock(&dev->mutex); ovs_mutex_unlock(&dpdk_mutex); -- 2.5.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] tests: Add valgrind targets for ovn utilities and daemons.
Signed-off-by: Gurucharan Shetty --- tests/automake.mk |4 1 file changed, 4 insertions(+) diff --git a/tests/automake.mk b/tests/automake.mk index a5c6074..211a80d 100644 --- a/tests/automake.mk +++ b/tests/automake.mk @@ -152,6 +152,10 @@ check-lcov: all tests/atconfig tests/atlocal $(TESTSUITE) $(check_DATA) clean-lc # valgrind support valgrind_wrappers = \ + tests/valgrind/ovn-controller \ + tests/valgrind/ovn-nbctl \ + tests/valgrind/ovn-northd \ + tests/valgrind/ovn-sbctl \ tests/valgrind/ovs-appctl \ tests/valgrind/ovs-ofctl \ tests/valgrind/ovstest \ -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
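With the new wrappers in place, the OVN binaries are exercised by the existing valgrind machinery; a typical invocation might look like the following (the `-k ovn` keyword filter is only an example).

```
# Run the testsuite with the valgrind wrapper scripts on PATH;
# -k restricts the run to tests whose keywords match.
make check-valgrind TESTSUITEFLAGS='-k ovn'
```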
[ovs-dev] [PATCH] netdev-dpdk : vhost-user port link state fix
Hi, OVS reports that link state of a vhost-user port (type=dpdkvhostuser) is DOWN, even when traffic is running through the port between a Virtual Machine and the vSwitch. Changing admin state with the "ovs-ofctl mod-port up/down" command over OpenFlow does affect neither the reported link state nor the traffic. The patch below does the flowing: - Triggers link state change by altering netdev's change_seq member. - Controls sending/receiving of packets through vhost-user port according to the port's current admin state. - Sets admin state of newly created vhost-user port to UP. Signed-off-by: Zoltán Balogh Co-authored-by: Jan Scheurich Signed-off-by: Jan Scheurich --- diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index af86d19..155efe1 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -772,6 +772,8 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int port_no, } } else { netdev_dpdk_alloc_txq(dev, OVS_VHOST_MAX_QUEUE_NUM); +/* Enable DPDK_DEV_VHOST device and set promiscuous mode flag. */ +dev->flags = NETDEV_UP | NETDEV_PROMISC; } ovs_list_push_back(&dpdk_list, &dev->list_node); @@ -1256,6 +1258,21 @@ netdev_dpdk_vhost_rxq_recv(struct netdev_rxq *rxq, return EAGAIN; } +/* Delete received packets if device is disabled. */ +if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) { +uint16_t i; + +VLOG_WARN_RL(&rl, "error receiving Ethernet packet on %s: %s", + netdev_rxq_get_name(rxq), ovs_strerror(ENONET)); + +for (i = 0; i < nb_rx; i++) { +dp_packet_delete(packets[i]); +} + +*c = 0; +return EAGAIN; +} + rte_spinlock_lock(&dev->stats_lock); netdev_dpdk_vhost_update_rx_counters(&dev->stats, packets, nb_rx); rte_spinlock_unlock(&dev->stats_lock); @@ -1516,6 +1533,23 @@ static int netdev_dpdk_vhost_send(struct netdev *netdev, int qid, struct dp_packet **pkts, int cnt, bool may_steal) { +struct netdev_dpdk *dev = netdev_dpdk_cast(netdev); + +/* Do not send anything if device is disabled. */ +if (OVS_UNLIKELY(!(dev->flags & NETDEV_UP))) { +int i; + +VLOG_WARN_RL(&rl, "error sending Ethernet packet on %s: %s", + netdev_get_name(netdev), ovs_strerror(ENONET)); + +if (may_steal) { +for (i = 0; i < cnt; i++) { +dp_packet_delete(pkts[i]); +} +} +return ENONET; +} + if (OVS_UNLIKELY(pkts[0]->source != DPBUF_DPDK)) { int i; @@ -2004,6 +2038,23 @@ netdev_dpdk_update_flags__(struct netdev_dpdk *dev, if (!(dev->flags & NETDEV_UP)) { rte_eth_dev_stop(dev->port_id); } +} else { +/* If DPDK_DEV_VHOST device's NETDEV_UP flag was changed and vhost is + * running then change netdev's change_seq to trigger link state + * update. */ +struct virtio_net *virtio_dev = netdev_dpdk_get_virtio(dev); + +if ((NETDEV_UP & ((*old_flagsp ^ on) | (*old_flagsp ^ off))) +&& is_vhost_running(virtio_dev)) { +netdev_change_seq_changed(&dev->up); + +/* Clear statistics if device is getting up. */ +if (NETDEV_UP & on) { +rte_spinlock_lock(&dev->stats_lock); +memset(&dev->stats, 0x00, sizeof(dev->stats)); +rte_spinlock_unlock(&dev->stats_lock); +} +} } return 0; @@ -2226,6 +2277,7 @@ new_device(struct virtio_net *virtio_dev) virtio_dev->flags |= VIRTIO_DEV_RUNNING; /* Disable notifications. */ set_irq_status(virtio_dev); +netdev_change_seq_changed(&dev->up); ovs_mutex_unlock(&dev->mutex); break; } @@ -2277,6 +2329,7 @@ destroy_device(volatile struct virtio_net *virtio_dev) ovsrcu_set(&dev->virtio_dev, NULL); netdev_dpdk_txq_map_clear(dev); exists = true; +netdev_change_seq_changed(&dev->up); ovs_mutex_unlock(&dev->mutex); break; } ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
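A small sketch of how the behaviour described above could be exercised once the patch is applied; the bridge and port names are placeholders.

```
# Administratively disable the vhost-user port: with this patch, rx/tx stop
# and the link state change is propagated.
ovs-ofctl mod-port br0 vhostuser0 down
ovs-vsctl get Interface vhostuser0 admin_state link_state

# Bring the port back up; the patch also clears statistics at this point.
ovs-ofctl mod-port br0 vhostuser0 up
```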
Re: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn utilities and daemons.
"dev" wrote on 05/12/2016 10:23:39 AM: > From: Gurucharan Shetty > To: dev@openvswitch.org > Cc: Gurucharan Shetty > Date: 05/12/2016 10:42 AM > Subject: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn > utilities and dameons. > Sent by: "dev" > > Signed-off-by: Gurucharan Shetty > --- > tests/automake.mk |4 > 1 file changed, 4 insertions(+) > > diff --git a/tests/automake.mk b/tests/automake.mk > index a5c6074..211a80d 100644 > --- a/tests/automake.mk > +++ b/tests/automake.mk > @@ -152,6 +152,10 @@ check-lcov: all tests/atconfig tests/atlocal $ > (TESTSUITE) $(check_DATA) clean-lc > # valgrind support > > valgrind_wrappers = \ > + tests/valgrind/ovn-controller \ > + tests/valgrind/ovn-nbctl \ > + tests/valgrind/ovn-northd \ > + tests/valgrind/ovn-sbctl \ > tests/valgrind/ovs-appctl \ > tests/valgrind/ovs-ofctl \ > tests/valgrind/ovstest \ > -- This makes a lot of sense to me, we should especially be checking the daemons... Acked-by: Ryan Moats ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.
On Thu, May 12, 2016 at 6:03 AM, Guru Shetty wrote: > > > > On May 11, 2016, at 10:45 PM, Darrell Ball wrote: > > > > On Wed, May 11, 2016 at 8:51 PM, Guru Shetty wrote: > >> >> >> >> >> > On May 11, 2016, at 8:45 PM, Darrell Ball wrote: >> > >> >> On Wed, May 11, 2016 at 4:42 PM, Guru Shetty wrote: >> >> >> >> >> >>> >> >>> Some reasons why having a “transit LS” is “undesirable” is: >> >>> >> >>> 1)1) It creates additional requirements at the CMS layer for >> setting >> >>> up networks; i.e. additional programming is required at the OVN >> northbound >> >>> interface for the special transit LSs, interactions with the logical >> >>> router >> >>> peers. >> >> >> >> Agreed that there is additional work needed for the CMS plugin. That >> work >> >> is needed even if it is just peering as they need to convert one >> router in >> >> to two in OVN (unless OVN automatically makes this split) >> > >> > The work to coordinate 2 logical routers and one special LS is more and >> > also more complicated than >> > to coordinate 2 logical routers. >> > >> > >> >> >> >> >> >>> >> >>> In cases where some NSX products do this, it is hidden from the user, >> as >> >>> one would minimally expect. >> >>> >> >>> 2) 2) From OVN POV, it adds an additional OVN datapath to all >> >>> processing to the packet path and programming/processing for that >> >>> datapath. >> >>> >> >>> because you have >> >>> >> >>> R1<->Transit LS<->R2 >> >>> >> >>> vs >> >>> >> >>> R1<->R2 >> >> >> >> Agreed that there is an additional datapath. >> >> >> >> >> >>> >> >>> 3) 3) You have to coordinate the transit LS subnet to handle all >> >>> addresses in this same subnet for all the logical routers and all >> their >> >>> transit LS peers. >> >> >> >> I don't understand what you mean. If a user uses one gateway, a >> transit LS >> >> only gets connected by 2 routers. >> >> Other routers get their own transit LS. >> > >> > >> > Each group of logical routers communicating has it own Transit LS. >> > >> > Take an example with one gateway router and 1000 distributed logical >> > routers for 1000 tenants/hv, >> > connecting 1000 HVs for now. >> > Lets assume each distributed logical router only talks to the gateway >> > router. >> >> That is a wrong assumption. Each tenant has his own gateway router (or >> more) >> > > Its less of an assumption but more of an example for illustrative > purposes; but its good that you > mention it. > > > I think one of the main discussion points was needing thousands of arp > flows and thousands of subnets, and it was on an incorrect logical > topology, I am glad that it is not an issue any more. > I think you misunderstood - having one or more gateway per tenant does not make Transit LS better in flow scale. The size of a Transit LS subnet and management across Transit LSs is one the 5 issues I mentioned and it remains the same as do the other issues. Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one distributed logical router per tenant spanning 1000 HVs, one gateway per tenant, we have a Transit LS with 1001 router type logical ports (1000 HVs + one gateway). Now, based on your previous assertion earlier: "If a user uses one gateway, a transit LS only gets connected by 2 routers. Other routers get their own transit LS." This translates: one Transit LS per tenant => 1000 Transit LS datapaths in total 1001 Transit LS logical ports per Transit LS => 1,001,000 Transit LS logical ports in total 1001 arp resolve flows per Transit LS => 1,001,000 flows just for arp resolve. 
Each Transit LS comes with many other flows: so we multiply that number of flows * 1000 Transit LSs = ? flows 1001 addresses per subnet per Transit LS; I suggested addresses should be reused across subnets, but when each subnet is large as it with Transit LS, and there are 1000 subnets instances to keep context for - one for each Transit LS, its get harder to manage. These Transit LSs and their subnets will not be identical across Transit LSs in real scenarios. We can go to more complex examples and larger HV scale (1 is the later goal ?) if you wish, but I think the minimal case is enough to point out the issues. > > > The DR<->GR direct connection approach as well as the transit LS approach > can re-use private > IP pools across internal distributed logical routers, which amount to VRF > contexts for tenants networks. > > > > The Transit LS approach does not scale due to the large number of > distributed datapaths required and > other high special flow requirements. It has more complex and higher > subnetting requirements. In addition, there is greater complexity for > northbound management. > > > Okay, now to summarize from my understanding: > * A transit LS uses one internal subnet to connect multiple GR with one DR > whereas direct multiple GR to one DR via peering uses multiple internal > subnets. > * A transit LS uses an additional logical datapath (split between 2 > machines v
Re: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn utilities and daemons.
Thanks for adding this, I will re-run the OVN-related valgrind tests. On Thu, May 12, 2016 at 9:14 AM, Ryan Moats wrote: > > > "dev" wrote on 05/12/2016 10:23:39 AM: > > > From: Gurucharan Shetty > > To: dev@openvswitch.org > > Cc: Gurucharan Shetty > > Date: 05/12/2016 10:42 AM > > Subject: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn > > utilities and dameons. > > Sent by: "dev" > > > > Signed-off-by: Gurucharan Shetty > > --- > > tests/automake.mk |4 > > 1 file changed, 4 insertions(+) > > > > diff --git a/tests/automake.mk b/tests/automake.mk > > index a5c6074..211a80d 100644 > > --- a/tests/automake.mk > > +++ b/tests/automake.mk > > @@ -152,6 +152,10 @@ check-lcov: all tests/atconfig tests/atlocal $ > > (TESTSUITE) $(check_DATA) clean-lc > > # valgrind support > > > > valgrind_wrappers = \ > > + tests/valgrind/ovn-controller \ > > + tests/valgrind/ovn-nbctl \ > > + tests/valgrind/ovn-northd \ > > + tests/valgrind/ovn-sbctl \ > > tests/valgrind/ovs-appctl \ > > tests/valgrind/ovs-ofctl \ > > tests/valgrind/ovstest \ > > -- > > This makes a lot of sense to me, we should especially be checking the > daemons... > > Acked-by: Ryan Moats > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn utilities and daemons.
Thank you William and Ryan. I pushed this to master. On 12 May 2016 at 09:32, William Tu wrote: > Thanks for adding this, I will re-run the OVN-related valgrind tests. > > On Thu, May 12, 2016 at 9:14 AM, Ryan Moats wrote: > > > > > > > "dev" wrote on 05/12/2016 10:23:39 AM: > > > > > From: Gurucharan Shetty > > > To: dev@openvswitch.org > > > Cc: Gurucharan Shetty > > > Date: 05/12/2016 10:42 AM > > > Subject: [ovs-dev] [PATCH] tests: Add valgrind targets for ovn > > > utilities and dameons. > > > Sent by: "dev" > > > > > > Signed-off-by: Gurucharan Shetty > > > --- > > > tests/automake.mk |4 > > > 1 file changed, 4 insertions(+) > > > > > > diff --git a/tests/automake.mk b/tests/automake.mk > > > index a5c6074..211a80d 100644 > > > --- a/tests/automake.mk > > > +++ b/tests/automake.mk > > > @@ -152,6 +152,10 @@ check-lcov: all tests/atconfig tests/atlocal $ > > > (TESTSUITE) $(check_DATA) clean-lc > > > # valgrind support > > > > > > valgrind_wrappers = \ > > > + tests/valgrind/ovn-controller \ > > > + tests/valgrind/ovn-nbctl \ > > > + tests/valgrind/ovn-northd \ > > > + tests/valgrind/ovn-sbctl \ > > > tests/valgrind/ovs-appctl \ > > > tests/valgrind/ovs-ofctl \ > > > tests/valgrind/ovstest \ > > > -- > > > > This makes a lot of sense to me, we should especially be checking the > > daemons... > > > > Acked-by: Ryan Moats > > ___ > > dev mailing list > > dev@openvswitch.org > > http://openvswitch.org/mailman/listinfo/dev > > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.
> > >> >> I think one of the main discussion points was needing thousands of arp >> flows and thousands of subnets, and it was on an incorrect logical >> topology, I am glad that it is not an issue any more. >> > > I think you misunderstood - having one or more gateway per tenant does not > make Transit LS better in flow scale. > The size of a Transit LS subnet and management across Transit LSs is one > the 5 issues I mentioned and it remains the same > as do the other issues. > > Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one > distributed logical router per tenant > spanning 1000 HVs, one gateway per tenant, we have a Transit LS with 1001 > router type logical ports (1000 HVs + one gateway). > A transit LS does not have 1001 router type ports. It has just two. One of them only resides in the gateway. The other one resides in every hypervisor. This is the same as a router peer port. Transit LS adds one extra per hypervisor, which I have agreed as a disadvantage. If that is what you mean, then it is right. > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v3] Add configurable OpenFlow port name.
On Wed, May 11, 2016 at 10:13:48AM +0800, Xiao Liang wrote: > On Wed, May 11, 2016 at 4:31 AM, Flavio Leitner wrote: > > On Tue, May 10, 2016 at 10:31:19AM +0800, Xiao Liang wrote: > >> On Tue, May 10, 2016 at 4:28 AM, Flavio Leitner wrote: > >> > On Sat, Apr 23, 2016 at 01:26:17PM +0800, Xiao Liang wrote: > >> >> Add new column "ofname" in Interface table to configure port name > >> >> reported > >> >> to controllers with OpenFlow protocol, thus decouple OpenFlow port name > >> >> from > >> >> device name. > >> >> > >> >> For example: > >> >> # ovs-vsctl set Interface eth0 ofname=wan > >> >> # ovs-vsctl set Interface eth1 ofname=lan0 > >> >> then controllers can recognize ports by their names. > >> > > >> > This change is nice because now the same setup like a "compute node" > >> > can use the same logical name to refer to a specific interface that > >> > could have different netdev name on different HW. > >> > > >> > Comments inline. > >> > > >> >> Signed-off-by: Xiao Liang > >> >> --- > >> >> v2: Added test for ofname > >> >> Increased db schema version > >> >> Updated NEWS > >> >> v3: Rebase > >> >> --- > >> >> NEWS | 1 + > >> >> lib/db-ctl-base.h | 2 +- > >> >> ofproto/ofproto-provider.h | 1 + > >> >> ofproto/ofproto.c | 67 > >> >> -- > >> >> ofproto/ofproto.h | 9 ++- > >> >> tests/ofproto.at | 60 > >> >> + > >> >> utilities/ovs-vsctl.c | 1 + > >> >> vswitchd/bridge.c | 10 +-- > >> >> vswitchd/vswitch.ovsschema | 6 +++-- > >> >> vswitchd/vswitch.xml | 14 ++ > >> >> 10 files changed, 163 insertions(+), 8 deletions(-) > >> >> > >> >> diff --git a/NEWS b/NEWS > >> >> index ea7f3a1..156781c 100644 > >> >> --- a/NEWS > >> >> +++ b/NEWS > >> >> @@ -15,6 +15,7 @@ Post-v2.5.0 > >> >> now implemented. Only flow mod and port mod messages are > >> >> supported > >> >> in bundles. > >> >> * New OpenFlow extension NXM_NX_MPLS_TTL to provide access to > >> >> MPLS TTL. > >> >> + * Port name can now be set with "ofname" column in the Interface > >> >> table. > >> >> - ovs-ofctl: > >> >> * queue-get-config command now allows a queue ID to be specified. > >> >> * '--bundle' option can now be used with OpenFlow 1.3. > >> >> diff --git a/lib/db-ctl-base.h b/lib/db-ctl-base.h > >> >> index f8f576b..5bd62d5 100644 > >> >> --- a/lib/db-ctl-base.h > >> >> +++ b/lib/db-ctl-base.h > >> >> @@ -177,7 +177,7 @@ struct weak_ref_table { > >> >> struct cmd_show_table { > >> >> const struct ovsdb_idl_table_class *table; > >> >> const struct ovsdb_idl_column *name_column; > >> >> -const struct ovsdb_idl_column *columns[3]; /* Seems like a good > >> >> number. */ > >> >> +const struct ovsdb_idl_column *columns[4]; /* Seems like a good > >> >> number. */ > >> >> const struct weak_ref_table wref_table; > >> >> }; > >> >> > >> >> diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h > >> >> index daa0077..8795242 100644 > >> >> --- a/ofproto/ofproto-provider.h > >> >> +++ b/ofproto/ofproto-provider.h > >> >> @@ -84,6 +84,7 @@ struct ofproto { > >> >> struct hmap ports; /* Contains "struct ofport"s. */ > >> >> struct shash port_by_name; > >> >> struct simap ofp_requests; /* OpenFlow port number requests. */ > >> >> +struct smap ofp_names; /* OpenFlow port names. */ > >> >> uint16_t alloc_port_no; /* Last allocated OpenFlow port > >> >> number. */ > >> >> uint16_t max_ports; /* Max possible OpenFlow port num, > >> >> plus one. */ > >> >> struct hmap ofport_usage; /* Map ofport to last used time. 
*/ > >> >> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c > >> >> index ff6affd..a2799f4 100644 > >> >> --- a/ofproto/ofproto.c > >> >> +++ b/ofproto/ofproto.c > >> >> @@ -550,6 +550,7 @@ ofproto_create(const char *datapath_name, const > >> >> char *datapath_type, > >> >> hmap_init(&ofproto->ofport_usage); > >> >> shash_init(&ofproto->port_by_name); > >> >> simap_init(&ofproto->ofp_requests); > >> >> +smap_init(&ofproto->ofp_names); > >> >> ofproto->max_ports = ofp_to_u16(OFPP_MAX); > >> >> ofproto->eviction_group_timer = LLONG_MIN; > >> >> ofproto->tables = NULL; > >> >> @@ -1546,6 +1547,7 @@ ofproto_destroy__(struct ofproto *ofproto) > >> >> hmap_destroy(&ofproto->ofport_usage); > >> >> shash_destroy(&ofproto->port_by_name); > >> >> simap_destroy(&ofproto->ofp_requests); > >> >> +smap_destroy(&ofproto->ofp_names); > >> >> > >> >> OFPROTO_FOR_EACH_TABLE (table, ofproto) { > >> >> oftable_destroy(table); > >> >> @@ -1945,7 +1947,7 @@ ofproto_port_open_type(const char *datapath_type, > >> >> const char *port_type) > >> >> * 'ofp_portp' is non-null). */ > >> >> int > >> >> ofproto_port_add(struct ofproto *ofpr
Re: [ovs-dev] [PATCH v3 0/2] doc: Refactor DPDK install guide
On 5/9/16 2:32 AM, Bhanuprakash Bodireddy wrote: This patchset refactors the present INSTALL.DPDK.md guide. The INSTALL guide is split in to two documents named INSTALL.DPDK and INSTALL.DPDK-ADVANCED. The former document is simplified with emphasis on installation, basic testcases and targets novice users. Sections on system configuration, performance tuning, vhost walkthrough are moved to DPDK-ADVANCED guide. and IVSHMEM is moved too. DPDK can be complex to install and configure to optimize OVS for best performance but it is relatively easy to set up for simple test cases. This patch is the right step to present the install information in separate parts. Most users won't need to refer to INSTALL.DPDK-ADVANCED.md. +1 Reviewers can see these doc changes in rendered form in this fork: https://github.com/bbodired/ovs/blob/master/INSTALL.DPDK.md https://github.com/bbodired/ovs/blob/master/INSTALL.DPDK-ADVANCED.md v1->v2: - Rebased - Update DPDK version to 16.04 - Add vsperf section in ADVANCED Guide v2->v3: - Rebased Bhanuprakash Bodireddy (2): doc: Refactor DPDK install documentation doc: Refactor DPDK install guide, add ADVANCED doc INSTALL.DPDK-ADVANCED.md | 809 +++ INSTALL.DPDK.md | 1193 +- 2 files changed, 1140 insertions(+), 862 deletions(-) create mode 100644 INSTALL.DPDK-ADVANCED.md ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation
On 5/9/16 2:32 AM, Bhanuprakash Bodireddy wrote: Refactor the INSTALL.DPDK in to two documents named INSTALL.DPDK and INSTALL.DPDK-ADVANCED. While INSTALL.DPDK document shall facilitate the novice user in setting up the OVS DPDK and running it out of box, the ADVANCED document is targeted at expert users looking for the optimum performance running dpdk datapath. This commit updates INSTALL.DPDK.md document. Signed-off-by: Bhanuprakash Bodireddy --- INSTALL.DPDK.md | 1193 +++ 1 file changed, 331 insertions(+), 862 deletions(-) diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 93f92e4..bf646bf 100644 --- a/INSTALL.DPDK.md +++ b/INSTALL.DPDK.md @@ -1,1001 +1,470 @@ -Using Open vSwitch with DPDK - +OVS DPDK INSTALL GUIDE + -Open vSwitch can use Intel(R) DPDK lib to operate entirely in -userspace. This file explains how to install and use Open vSwitch in -such a mode. +## Contents -The DPDK support of Open vSwitch is considered experimental. -It has not been thoroughly tested. +1. [Overview](#overview) +2. [Building and Installation](#build) +3. [Setup OVS DPDK datapath](#ovssetup) I wonder if the following 3 sections be in the advanced guide? with a note here to refer to the advanced guide for configuration in the VM, testcases and limitations? +4. [DPDK in the VM](#builddpdk) +5. [OVS Testcases](#ovstc) +6. [Limitations ](#ovslimits) -This version of Open vSwitch should be built manually with `configure` -and `make`. +## 1. Overview -OVS needs a system with 1GB hugepages support. +Open vSwitch can use DPDK lib to operate entirely in userspace. +This file provides information on installation and use of Open vSwitch +using DPDK datapath. This version of Open vSwitch should be built manually +with `configure` and `make`. -Building and Installing: - +The DPDK support of Open vSwitch is considered 'experimental'. Isn't it time to remove this statement and not just put the word in quotes? -Required: DPDK 16.04 -Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev` -on Debian/Ubuntu) +### Prerequisites -1. Configure build & install DPDK: - 1. Set `$DPDK_DIR` +* Required: DPDK 16.04 +* Hardware: [DPDK Supported NICs] when physical ports in use - ``` - export DPDK_DIR=/usr/src/dpdk-16.04 - cd $DPDK_DIR - ``` - - 2. Then run `make install` to build and install the library. - For default install without IVSHMEM: - - `make install T=x86_64-native-linuxapp-gcc DESTDIR=install` - - To include IVSHMEM (shared memory): - - `make install T=x86_64-ivshmem-linuxapp-gcc DESTDIR=install` - - For further details refer to http://dpdk.org/ - -2. Configure & build the Linux kernel: - - Refer to intel-dpdk-getting-started-guide.pdf for understanding - DPDK kernel requirement. - -3. Configure & build OVS: - - * Non IVSHMEM: - - `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/` - - * IVSHMEM: - - `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/` - - ``` - cd $(OVS_DIR)/ - ./boot.sh - ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-align"] - make - ``` - - Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress DPDK cast-align warnings. - -To have better performance one can enable aggressive compiler optimizations and -use the special instructions(popcnt, crc32) that may not be available on all -machines. Instead of typing `make`, type: - -`make CFLAGS='-O3 -march=native'` - -Refer to [INSTALL.userspace.md] for general requirements of building userspace OVS. - -Using the DPDK with ovs-vswitchd: -- - -1. 
Setup system boot - Add the following options to the kernel bootline: - - `default_hugepagesz=1GB hugepagesz=1G hugepages=1` - -2. Setup DPDK devices: - - DPDK devices can be setup using either the VFIO (for DPDK 1.7+) or UIO - modules. UIO requires inserting an out of tree driver igb_uio.ko that is - available in DPDK. Setup for both methods are described below. - - * UIO: - 1. insert uio.ko: `modprobe uio` - 2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko` - 3. Bind network device to igb_uio: - `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1` - - * VFIO: - - VFIO needs to be supported in the kernel and the BIOS. More information - can be found in the [DPDK Linux GSG]. - - 1. Insert vfio-pci.ko: `modprobe vfio-pci` - 2. Set correct permissions on vfio device: `sudo /usr/bin/chmod a+x /dev/vfio` -and: `sudo /usr/bin/chmod 0666 /dev/vfio/*` - 3. Bind network device to vfio-pci: -`$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1` - -3. Mount the hugetable filesystem - - `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages` - - Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup. - -4. Follow the instr
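For readers skimming the refactored guide, the non-IVSHMEM build steps quoted above condense to roughly the following (a sketch using the paths from the draft; $OVS_DIR stands for the OVS source checkout):

    # DPDK 16.04, default (non-IVSHMEM) target:
    export DPDK_DIR=/usr/src/dpdk-16.04
    cd $DPDK_DIR
    make install T=x86_64-native-linuxapp-gcc DESTDIR=install

    # Build OVS against it:
    export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/
    cd $OVS_DIR
    ./boot.sh
    ./configure --with-dpdk=$DPDK_BUILD
    make
    # optionally: make CFLAGS='-O3 -march=native' for the aggressive
    # optimization build mentioned in the guide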
Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.
> > > I think you misunderstood - having one or more gateway per tenant does not > make Transit LS better in flow scale. > The size of a Transit LS subnet and management across Transit LSs is one > the 5 issues I mentioned and it remains the same > as do the other issues. > > Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one > distributed logical router per tenant > spanning 1000 HVs, one gateway per tenant, we have a Transit LS with 1001 > router type logical ports (1000 HVs + one gateway). > > Now, based on your previous assertion earlier: > "If a user uses one gateway, a transit LS only gets connected by 2 > routers. > Other routers get their own transit LS." > > This translates: > one Transit LS per tenant => 1000 Transit LS datapaths in total > 1001 Transit LS logical ports per Transit LS => 1,001,000 Transit LS > logical ports in total > 1001 arp resolve flows per Transit LS => 1,001,000 flows just for arp > resolve. > Each Transit LS comes with many other flows: so we multiply that number of > flows * 1000 Transit LSs = ? flows > 1001 addresses per subnet per Transit LS; I suggested addresses should be > reused across subnets, but when each subnet is large > Re-reading. The above is a wrong conclusion making me believe that there is a big disconnect. A subnet in transit LS has only 2 IP addresses (if it is only one physical gateway). Every additional physical gateway can add one additional IP address to the subnet (depending on whether the new physical gateway has a gateway router added for that logical topology.). ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add ADVANCED doc
On 5/9/16 2:32 AM, Bhanuprakash Bodireddy wrote: Add INSTALL.DPDK-ADVANCED document that is forked off from original INSTALL.DPDK guide. This document is targeted at users looking for optimum performance on OVS using dpdk datapath. Thanks for this effort. Signed-off-by: Bhanuprakash Bodireddy --- INSTALL.DPDK-ADVANCED.md | 809 +++ 1 file changed, 809 insertions(+) create mode 100644 INSTALL.DPDK-ADVANCED.md diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md new file mode 100644 index 000..dd09d36 --- /dev/null +++ b/INSTALL.DPDK-ADVANCED.md @@ -0,0 +1,809 @@ +OVS DPDK ADVANCED INSTALL GUIDE += + +## Contents + +1. [Overview](#overview) +2. [Building Shared Library](#build) +3. [System configuration](#sysconf) +4. [Performance Tuning](#perftune) +5. [OVS Testcases](#ovstc) +6. [Vhost Walkthrough](#vhost) +7. [QOS](#qos) +8. [Static Code Analysis](#staticanalyzer) +9. [Vsperf](#vsperf) + +## 1. Overview + +The Advanced Install Guide explains how to improve OVS performance using +DPDK datapath. This guide also provides information on tuning, system configuration, +troubleshooting, static code analysis and testcases. + +## 2. Building Shared Library + +DPDK can be built as static or shared library and shall be linked by applications +using DPDK datapath. The section lists steps to build shared library and dynamically +link DPDK against OVS. + +Note: Minor performance loss is seen with OVS when using shared DPDK library as +compared to static library. + +Check section 2.2, 2.3 of INSTALL.DPDK on download instructions +for DPDK and OVS. + + * Configure the DPDK library + + Set `CONFIG_RTE_BUILD_SHARED_LIB=y` in `config/common_base` + to generate shared DPDK library + + + * Build and install DPDK + +For Default install (without IVSHMEM), set `export DPDK_TARGET=x86_64-native-linuxapp-gcc` +For IVSHMEM case, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc` + +``` +export DPDK_DIR=/usr/src/dpdk-16.04 +export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET +make install T=$DPDK_TARGET DESTDIR=install +``` + + * Build, Install and Setup OVS. + + Export the DPDK shared library location and setup OVS as listed in + section 3.3 of INSTALL.DPDK. + + `export LD_LIBRARY_PATH=$DPDK_DIR/x86_64-native-linuxapp-gcc/lib` + +## 3. System Configuration + +To achieve optimal OVS performance, the system can be configured and that includes +BIOS tweaks, Grub cmdline additions, better understanding of NUMA nodes and +apt selection of PCIe slots for NIC placement. + +### 3.1 Recommended BIOS settings + + ``` + | Settings | values| comments + |---|---|--- + | C3 power state| Disabled | - + | C6 power state| Disabled | - + | MLC Streamer | Enabled | - + | MLC Spacial prefetcher| Enabled | - + | DCU Data prefetcher | Enabled | - + | DCA | Enabled | - + | CPU power and performance | Performance - + | Memory RAS and perf | | - +config-> NUMA optimized | Enabled | - + ``` + +### 3.2 PCIe Slot Selection + +The fastpath performance also depends on factors like the NIC placement, +Channel speeds between PCIe slot and CPU, proximity of PCIe slot to the CPU +cores running DPDK application. Listed below are the steps to identify +right PCIe slot. + +- Retrieve host details using cmd `dmidecode -t baseboard | grep "Product Name"` +- Download the technical specification for Product listed eg: S2600WT2. +- Check the Product Architecture Overview on the Riser slot placement, + CPU sharing info and also PCIe channel speeds. 
+ + example: On S2600WT, CPU1 and CPU2 share Riser Slot 1 with Channel speed between + CPU1 and Riser Slot1 at 32GB/s, CPU2 and Riser Slot1 at 16GB/s. Running DPDK app + on CPU1 cores and NIC inserted in to Riser card Slots will optimize OVS performance + in this case. + +- Check the Riser Card #1 - Root Port mapping information, on the available slots + and individual bus speeds. In S2600WT slot 1, slot 2 has high bus speeds and are + potential slots for NIC placement. + +### 3.3 Setup Hugepages Advanced Hugepage setup. + Basic huge page setup for 2MB huge pages is covered in INSTALL.DPDK.md. This section + 1. Allocate Huge pages + + For persistent allocation of huge pages, add the following options to the kernel bootline + - 2MB huge pages: + + Add `hugepages=N` + + - 1G huge pages: + + Add `default_hugepagesz=1GB hugepagesz=1G hugepages=N` + + For platforms supporting multiple huge page sizes, Add options + + `default_hugepagesz= hugepagesz= hugepages=N` + where 'N' = Number of huge pages requested, 'size' = huge page size, + optional suffix [kKmMgG] + +For run-time allocation of huge pages + + - 2MB huge pages: + + `echo N > /proc/sys/vm
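Pulling the hugepage fragments quoted above together (the page counts are arbitrary examples; the full run-time path is /proc/sys/vm/nr_hugepages):

    # Persistent 1G huge pages: add to the kernel boot line
    #   default_hugepagesz=1GB hugepagesz=1G hugepages=4

    # Run-time allocation of 2MB huge pages:
    echo 2048 > /proc/sys/vm/nr_hugepages

    # Mount hugetlbfs so DPDK can use the pages (1G page example):
    mount -t hugetlbfs -o pagesize=1G none /dev/hugepages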
Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling performance using DPDK Rx checksum offloading feature.
On Tue, May 10, 2016 at 6:31 PM, Jesse Gross wrote: > On Tue, May 10, 2016 at 3:26 AM, Chandran, Sugesh > wrote: >>> -Original Message- >>> From: Jesse Gross [mailto:je...@kernel.org] >>> Sent: Friday, May 6, 2016 5:00 PM >>> To: Chandran, Sugesh >>> Cc: pravin shelar ; ovs dev >>> Subject: Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling >>> performance using DPDK Rx checksum offloading feature. >>> >>> On Fri, May 6, 2016 at 1:13 AM, Chandran, Sugesh >>> wrote: >>> >> -Original Message- >>> >> From: Jesse Gross [mailto:je...@kernel.org] >>> >> Sent: Friday, May 6, 2016 1:58 AM >>> >> To: Chandran, Sugesh >>> >> Cc: pravin shelar ; ovs dev >>> >> Subject: Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling >>> >> performance using DPDK Rx checksum offloading feature. >>> >> >>> >> On Thu, May 5, 2016 at 1:26 AM, Chandran, Sugesh >>> >> wrote: >>> >> >> -Original Message- >>> >> >> From: Jesse Gross [mailto:je...@kernel.org] >>> >> >> Sent: Wednesday, May 4, 2016 10:06 PM >>> >> >> To: Chandran, Sugesh >>> >> >> Cc: pravin shelar ; ovs dev >>> >>> >> >> Subject: Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling >>> >> >> performance using DPDK Rx checksum offloading feature. >>> >> >> >>> >> >> On Wed, May 4, 2016 at 8:58 AM, Chandran, Sugesh >>> >> >> wrote: >>> >> >> >> -Original Message- >>> >> >> >> From: Jesse Gross [mailto:je...@kernel.org] >>> >> >> >> Sent: Thursday, April 28, 2016 4:41 PM >>> >> >> >> To: Chandran, Sugesh >>> >> >> >> Cc: pravin shelar ; ovs dev >>> >> >>> >> >> >> Subject: Re: [ovs-dev] [PATCH v2] tunneling: Improving >>> >> >> >> tunneling performance using DPDK Rx checksum offloading feature. >>> >> >> > >>> >> >> >> That sounds great, thanks for following up. In the meantime, do >>> >> >> >> you have any plans for transmit side checksum offloading? >>> >> >> > [Sugesh] The vectorization on Tx side is getting disabled when >>> >> >> > DPDK Tx >>> >> >> checksum offload is enabled. This causes performance drop in OVS. >>> >> >> > However We don’t find any such impact when enabling Rx checksum >>> >> >> offloading(though this disables Rx vectorization). >>> >> >> >>> >> >> OK, I see. Does the drop in throughput cause performance to go >>> >> >> below the baseline even for UDP tunnels with checksum traffic? (I >>> >> >> guess small and large packets might have different results here.) >>> >> >> Or is it that it reduce performance for unrelated traffic? If it's >>> >> >> the latter case, can we find a way to use offloading conditionally? >>> >> > [Sugesh] We tested for 64 byte UDP packet stream and found that the >>> >> > performance is better when the offloading is turned off. This is >>> >> > for any >>> >> traffic through the port. >>> >> > DPDK doesn’t support conditional offloading for now. >>> >> > In other words DPDK can't do selective vector packet processing on a >>> port. >>> >> > As far as I know there are some technical difficulties to enable >>> >> > offload + vectorization together in DPDK. >>> >> >>> >> My guess is the results might be different for larger packets since >>> >> those cases will stress checksumming more and rx/tx routines less. >>> >> >>> >> In any case, I think this is an area that is worthwhile to continue >>> investigating. >>> >> My expectation is that tunneled packets with outer UDP checksums will >>> >> be a use case that is hit increasingly frequently with OVS DPDK - for >>> >> example, OVN will likely start exercising this soon. 
>>> > [Sugesh]Totally agreed, I will do PHY-PHY, PHY-TUNNEL-PHY tests with >>> > different size traffic >>> > streams(64 Byte, 512, 1024, 1500) when checksum enabled/disabled and >>> see the impact. >>> > Is there any other traffic pattern/tests that we have to consider? >>> >>> I think that should cover it pretty well. Thanks a lot! >> [Sugesh] Please find below for the test results in different scenarios. >> >> Native(Rx, Tx checksum offloading OFF) >> Test64 Bytes128 Bytes 256 Bytes 512 >> Bytes 1500 bytes Mix >> PHY-PHY-BIDIR 9.2 8.445 4.528 >> 2.349 0.822 6.205 >> PHY-VM-PHY-BIDIR2.564 2.503 2.205 >> 1.901 0.822 2.29 >> PHY-VXLAN-PHY 4.165 4.084 3.834 2.147 >>0.849 3.964 >> >> >> Rx Checksum ON/Rx Vector OFF >> >> Test64 Bytes128 Bytes 256 Bytes 512 >> Bytes 1500 bytes Mix >> PHY-PHY-BIDIR 9.128.445 4.528 >> 2.349 0.822 6.205 >> PHY-VM-PHY-BIDIR2.535 2.513 2.21 >> 1.913 0.822 2.25 >> PHY-VXLAN-PHY 4.475 4.473.834 2.147 >>0.849 4.4 >> >> >> >> Tx Checksum ON/Tx Vector OFF >> >> Test64
Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling performance using DPDK Rx checksum offloading feature.
On Thu, May 12, 2016 at 11:18 AM, pravin shelar wrote: > On Tue, May 10, 2016 at 6:31 PM, Jesse Gross wrote: >> I'm a little bit torn as to whether we should apply your rx checksum >> offload patch in the meantime while we wait for DPDK to offer the new >> API. It looks like we'll have a 10% gain with tunneling in exchange >> for a 1% loss in other situations, so the call obviously depends on >> use case. Pravin, Daniele, others, any opinions? >> > There could be a way around the API issue and avoid the 1% loss. > netdev API could be changed to set packet->mbuf.ol_flags to > (PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD) if the netdev > implementation does not support rx checksum offload. Then there is no > need to check the rx checksum flags in dpif-netdev. And the checksum > can be directly checked in tunneling code where we actually need to. > Is there any issue with this approach? I think that's probably a little bit cleaner overall though I don't think that it totally eliminates the overhead. Not all DPDK ports will support checksum offload (since the hardware may not do it in theory) so we'll still need to check the port status on each packet to initialize the flags. The other thing that is a little concerning is that there might be conditions where a driver doesn't actually verify the checksum. I guess most of these aren't supported in our tunneling implementation (IP options comes to mind) but it's a little risky. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
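As a rough cross-check of the "10% gain with tunneling in exchange for a 1% loss" summary against the figures Sugesh posted earlier in the thread (back-of-envelope arithmetic on those numbers only, values approximate):

    # PHY-VXLAN-PHY, Rx checksum ON vs native:
    echo 'scale=2; 100*(4.475-4.165)/4.165' | bc    # 64B packets: roughly +7%
    echo 'scale=2; 100*(4.400-3.964)/3.964' | bc    # Mix traffic: roughly +11%
    # PHY-PHY, Rx checksum ON vs native:
    echo 'scale=2; 100*(9.12-9.2)/9.2' | bc         # 64B packets: roughly -1%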
Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling performance using DPDK Rx checksum offloading feature.
On Thu, May 12, 2016 at 12:59 PM, Jesse Gross wrote: > On Thu, May 12, 2016 at 11:18 AM, pravin shelar wrote: >> On Tue, May 10, 2016 at 6:31 PM, Jesse Gross wrote: >>> I'm a little bit torn as to whether we should apply your rx checksum >>> offload patch in the meantime while we wait for DPDK to offer the new >>> API. It looks like we'll have a 10% gain with tunneling in exchange >>> for a 1% loss in other situations, so the call obviously depends on >>> use case. Pravin, Daniele, others, any opinions? >>> >> There could be a way around the API issue and avoid the 1% loss. >> netdev API could be changed to set packet->mbuf.ol_flags to >> (PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD) if the netdev >> implementation does not support rx checksum offload. Then there is no >> need to check the rx checksum flags in dpif-netdev. And the checksum >> can be directly checked in tunneling code where we actually need to. >> Is there any issue with this approach? > > I think that's probably a little bit cleaner overall though I don't > think that it totally eliminates the overhead. Not all DPDK ports will > support checksum offload (since the hardware may not do it in theory) > so we'll still need to check the port status on each packet to > initialize the flags. > I was thinking of changing dpdk packet object constructor (ovs_rte_pktmbuf_init()) to initialize the flag according to the device offload support. This way there should not be any checks needed in packet receive path. > The other thing that is a little concerning is that there might be > conditions where a driver doesn't actually verify the checksum. I > guess most of these aren't supported in our tunneling implementation > (IP options comes to mind) but it's a little risky. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/5] ovn: Introduce l3 gateway router.
See comments inline. >To: dev@openvswitch.org >From: Gurucharan Shetty >Sent by: "dev" >Date: 05/10/2016 08:10PM >Cc: Gurucharan Shetty >Subject: [ovs-dev] [PATCH 2/5] ovn: Introduce l3 gateway router. > >Currently OVN has distributed switches and routers. When a packet >exits a container or a VM, the entire lifecycle of the packet >through multiple switches and routers are calculated in source >chassis itself. When the destination endpoint resides on a different >chassis, the packet is sent to the other chassis and it only goes >through the egress pipeline of that chassis once and eventually to >the real destination. > >When the packet returns back, the same thing happens. The return >packet leaves the VM/container on the chassis where it resides. >The packet goes through all the switches and routers in the logical >pipleline on that chassis and then sent to the eventual destination >over the tunnel. > >The above makes the logical pipeline very flexible and easy. But, >creates a problem for cases where you need to add stateful services >(via conntrack) on switches and routers. Completely agree up to this point. >For l3 gateways, we plan to leverage DNAT and SNAT functionality >and we want to apply DNAT and SNAT rules on a router. So we ideally need >the packet to go through that router in both directions in the same >chassis. To achieve this, this commit introduces a new gateway router which is >static and can be connected to your distributed router via a switch. Completely agree that you need to go through a common point in both directions in the same chassis. Why does this require a separate gateway router? Why can't it just be a centralized gateway router port on an otherwise distributed router? Looking at the logic for ports on remote chassis in physical.c, I see no reason why that logic cannot work for logical router datapaths just like it works for logical switch datapaths. On logical switches, some ports are distributed and run everywhere, e.g. localnet, and other ports run on a specific chassis, e.g. vif and your proposed "gateway" port. Am I missing something that prevents a mix of centralized and distributed ports on a logical router datapath? We have not tried it yet, but it seems like this would simplify things a lot: 1. Only one router needs to be provisioned rather than a distributed router and a centralized gateway router 2. No need for static routes between the distributed and centralized gateway routers 3. No need for transit logical switches, transit subnets, or transit flows 4. Less passes through datapaths, improving performance You can then pin DNAT and SNAT logic to the centralized gateway port, for traffic to physical networks. East/west traffic to floating IPs still requires additional logic on other ports, as proposed in Chandra's floating IP patch. We want to get to a point where SNAT traffic goes through a centralized gateway port, but DNAT traffic goes through a distributed patch port. This would achieve parity with the OpenStack ML2 OVS DVR reference implementation, in terms of traffic subsets that are centralized versus distributed. >To make minimal changes in OVN's logical pipeline, this commit >tries to make the switch port connected to a l3 gateway router look like >a container/VM endpoint for every other chassis except the chassis >on which the l3 gateway router resides. On the chassis where the >gateway router resides, the connection looks just like a patch port. Completely agree that this is the right way to go. 
>This is achieved by the doing the following: >Introduces a new type of port_binding record called 'gateway'. >On the chassis where the gateway router resides, this port behaves just >like the port of type 'patch'. The ovn-controller on that chassis >populates the "chassis" column for this record as an indication for >other ovn-controllers of its physical location. Other ovn-controllers >treat this port as they would treat a VM/Container port on a different >chassis. I like this logic. My only concern is whether the logical switch port for this functionality should really be called 'gateway', since this may get confused with L2 gateway. Some possibilities: 'patch2l3gateway', 'localpatch', 'chassispatch'. Holding off on more specific comments until we resolve the big picture stuff. Mickey ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/5] ovn: Introduce l3 gateway router.
> > > > Completely agree that you need to go through a common point in both > directions > in the same chassis. > > Why does this require a separate gateway router? > The primary reason to choose a separate gateway router was to support multiple physical gateways for k8s to which you can loadbalance your traffic from external world. i.e you will have a router in each physical gateway with its own floating IP per service. From external world, you can loadbalance traffic to your gateways. The floating IP is further loadbalanced to an internal workload. > Why can't it just be a centralized gateway router port on an otherwise > distributed > router? > It is indeed one of the solutions for my problem statement (provided you can support multiple physical gateway chassis.). I haven't spent too much time thinking on how to do this for multiple physical gateways. > > Looking at the logic for ports on remote chassis in physical.c, I see no > reason why > that logic cannot work for logical router datapaths just like it works for > logical > switch datapaths. On logical switches, some ports are distributed and run > everywhere, e.g. localnet, and other ports run on a specific chassis, e.g. > vif and > your proposed "gateway" port. > Am I missing something that prevents a mix of centralized and distributed > ports > on a logical router datapath? > You will have to give me some more details (I am currently unable to visualize your solution). May be start with a simple topology of one DR connected to two LS. Simple packet walkthrough (in english) for north-south (external to internal via floating IPs) and its return traffic (going through conntrack), south-north traffic (and its return traffic) and east-west (via central gateway). My thinking is this: If we want to do NAT in a router, then we need to have a ingress pipeline as well as an egress pipeline. A router has multiple ports. When a packet comes in any router port, I want to be able to do DNAT (and reverse its effect) and when packet exits any port, I want to be able to do SNAT. I also should be able to do both DNAT and SNAT on a single packet (to handle north-south loadbalancing). So the entire router should be there at a single location. > > We have not tried it yet, but it seems like this would simplify things a > lot: > 1. Only one router needs to be provisioned rather than a distributed > router and a > centralized gateway router > 2. No need for static routes between the distributed and centralized > gateway routers > 3. No need for transit logical switches, transit subnets, or transit flows > 4. Less passes through datapaths, improving performance > The above is ideal. > > You can then pin DNAT and SNAT logic to the centralized gateway port, for > traffic to > physical networks. East/west traffic to floating IPs still requires > additional logic on > other ports, as proposed in Chandra's floating IP patch. > > We want to get to a point where SNAT traffic goes through a centralized > gateway > port, but DNAT traffic goes through a distributed patch port. Please tell me what does DNAT mean and what does SNAT mean for you. I may be talking the opposite thing than you. dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.
On Thu, May 12, 2016 at 10:54 AM, Guru Shetty wrote: > >> I think you misunderstood - having one or more gateway per tenant does >> not make Transit LS better in flow scale. >> The size of a Transit LS subnet and management across Transit LSs is one >> the 5 issues I mentioned and it remains the same >> as do the other issues. >> >> Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one >> distributed logical router per tenant >> spanning 1000 HVs, one gateway per tenant, we have a Transit LS with 1001 >> router type logical ports (1000 HVs + one gateway). >> >> Now, based on your previous assertion earlier: >> "If a user uses one gateway, a transit LS only gets connected by 2 >> routers. >> Other routers get their own transit LS." >> >> This translates: >> one Transit LS per tenant => 1000 Transit LS datapaths in total >> 1001 Transit LS logical ports per Transit LS => 1,001,000 Transit LS >> logical ports in total >> 1001 arp resolve flows per Transit LS => 1,001,000 flows just for arp >> resolve. >> Each Transit LS comes with many other flows: so we multiply that number >> of flows * 1000 Transit LSs = ? flows >> 1001 addresses per subnet per Transit LS; I suggested addresses should be >> reused across subnets, but when each subnet is large >> > > Re-reading. The above is a wrong conclusion making me believe that there > is a big disconnect. A subnet in transit LS has only 2 IP addresses (if it > is only one physical gateway). Every additional physical gateway can add > one additional IP address to the subnet (depending on whether the new > physical gateway has a gateway router added for that logical topology.). > With respect to the IP address usage. I think a diagram would help especially the K8 case, which I had heard in other conversations may have a separate gateway on every HV ?. Hence, I would like to know what that means - i.e. were you thinking to run separate gateways routers on every HV for K8 ? With respect to the other questions, I think its best approach would be to ask direct questions so those direct questions get answered. 1) With 1000 HVs, 1000 HVs/tenant, 1 distributed router per tenant, you choose the number of gateways/tenant: a) How many Transit LS distributed datapaths are expected in total ? b) How many Transit LS logical ports are needed at the HV level ? what I mean by that is lets say we have one additional logical port at northd level and 1000 HVs then if we need to download that port to 1000 HVs, I consider that to be 1000 logical ports at the HV level because downloading and maintaining state across HVs at scale is more expensive than for a single hypervisor. c) How many Transit LS arp resolve entries at the HV level ? what I mean by that is lets say we have one additional arp resolve flow at northd level and 1000 HVs then if we need to download that arp resolve flow to 1000 HVs, I consider that to be 1000 flows at the HV level because downloading and maintaining state across multiple HVs is more expensive that a single hypervisor. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.
On 12 May 2016 at 16:34, Darrell Ball wrote: > On Thu, May 12, 2016 at 10:54 AM, Guru Shetty wrote: > > > > >> I think you misunderstood - having one or more gateway per tenant does > >> not make Transit LS better in flow scale. > >> The size of a Transit LS subnet and management across Transit LSs is one > >> the 5 issues I mentioned and it remains the same > >> as do the other issues. > >> > >> Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one > >> distributed logical router per tenant > >> spanning 1000 HVs, one gateway per tenant, we have a Transit LS with > 1001 > >> router type logical ports (1000 HVs + one gateway). > >> > >> Now, based on your previous assertion earlier: > >> "If a user uses one gateway, a transit LS only gets connected by 2 > >> routers. > >> Other routers get their own transit LS." > >> > >> This translates: > >> one Transit LS per tenant => 1000 Transit LS datapaths in total > >> 1001 Transit LS logical ports per Transit LS => 1,001,000 Transit LS > >> logical ports in total > >> 1001 arp resolve flows per Transit LS => 1,001,000 flows just for arp > >> resolve. > >> Each Transit LS comes with many other flows: so we multiply that number > >> of flows * 1000 Transit LSs = ? flows > >> 1001 addresses per subnet per Transit LS; I suggested addresses should > be > >> reused across subnets, but when each subnet is large > >> > > > > Re-reading. The above is a wrong conclusion making me believe that there > > is a big disconnect. A subnet in transit LS has only 2 IP addresses (if > it > > is only one physical gateway). Every additional physical gateway can add > > one additional IP address to the subnet (depending on whether the new > > physical gateway has a gateway router added for that logical topology.). > > > > > With respect to the IP address usage. I think a diagram would help > especially the K8 case, > Drawing a diagram here is not feasible. Happy to do it on a whiteboard though. > which I had heard in other conversations may have a separate gateway on > every HV ?. Hence, I would like to know what that means - i.e. were you > thinking to run separate gateways routers on every HV for K8 ? > Yes, thats the plan (as many as possible). 100 routers is a target. Not HV, but a VM. > > With respect to the other questions, I think its best approach would be to > ask direct questions so those > direct questions get answered. > > 1) With 1000 HVs, 1000 HVs/tenant, 1 distributed router per tenant, you > choose the number of gateways/tenant: > > a) How many Transit LS distributed datapaths are expected in total ? > One (i.e the same as the distributed router). > > b) How many Transit LS logical ports are needed at the HV level ? > > what I mean by that is lets say we have one additional logical port at > northd level and 1000 HVs then if we need to download that port to 1000 > HVs, I consider that to be 1000 logical ports at the HV level because > downloading and maintaining state across HVs at scale is more expensive > than for a single hypervisor. > 1000 additional ones. It is the same as your distributed logical switch or logical router (this is the case even with the peer routers) > > c) How many Transit LS arp resolve entries at the HV level ? 
> > what I mean by that is lets say we have one additional arp resolve flow at > northd level and 1000 HVs then if we need to download that arp resolve flow > to 1000 HVs, I consider that to be 1000 flows at the HV level because > downloading and maintaining state across multiple HVs is more expensive > that a single hypervisor. > 2 ARP flows per transit LS * 1000 HVs. Do realize that a single bridge on a single hypervisor typically has flows in the 100,000 range. Even a million is feasbile. Microsegmentation use cases has 1 ACLs per logical switch. So that is 1 * 1000 for your case form single LS. So do you have some comparative perspectives. dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] netdev-dpdk: Fix locking during get_stats.
Thanks for fixing this! Acked-by: Daniele Di Proietto 2016-05-10 15:50 GMT-07:00 Joe Stringer : > Clang complains: > lib/netdev-dpdk.c:1860:1: error: mutex 'dev->mutex' is not locked on every > path > through here [-Werror,-Wthread-safety-analysis] > } > ^ > lib/netdev-dpdk.c:1815:5: note: mutex acquired here > ovs_mutex_lock(&dev->mutex); > ^ > ./include/openvswitch/thread.h:60:9: note: expanded from macro > 'ovs_mutex_lock' > ovs_mutex_lock_at(mutex, OVS_SOURCE_LOCATOR) > ^ > > Fixes: d6e3feb57c44 ("Add support for extended netdev statistics based on > RFC 2819.") > Signed-off-by: Joe Stringer > --- > lib/netdev-dpdk.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index af86d194f9bb..87879d5c6e4d 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -1819,6 +1819,7 @@ netdev_dpdk_get_stats(const struct netdev *netdev, > struct netdev_stats *stats) > > if (rte_eth_stats_get(dev->port_id, &rte_stats)) { > VLOG_ERR("Can't get ETH statistics for port: %i.", dev->port_id); > +ovs_mutex_unlock(&dev->mutex); > return EPROTO; > } > > -- > 2.1.4 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.
On Thu, May 12, 2016 at 4:55 PM, Guru Shetty wrote: > > > On 12 May 2016 at 16:34, Darrell Ball wrote: > >> On Thu, May 12, 2016 at 10:54 AM, Guru Shetty wrote: >> >> > >> >> I think you misunderstood - having one or more gateway per tenant does >> >> not make Transit LS better in flow scale. >> >> The size of a Transit LS subnet and management across Transit LSs is >> one >> >> the 5 issues I mentioned and it remains the same >> >> as do the other issues. >> >> >> >> Based on the example with 1000 HVs, 1000 tenants, 1000 HV/tenant, one >> >> distributed logical router per tenant >> >> spanning 1000 HVs, one gateway per tenant, we have a Transit LS with >> 1001 >> >> router type logical ports (1000 HVs + one gateway). >> >> >> >> Now, based on your previous assertion earlier: >> >> "If a user uses one gateway, a transit LS only gets connected by 2 >> >> routers. >> >> Other routers get their own transit LS." >> >> >> >> This translates: >> >> one Transit LS per tenant => 1000 Transit LS datapaths in total >> >> 1001 Transit LS logical ports per Transit LS => 1,001,000 Transit LS >> >> logical ports in total >> >> 1001 arp resolve flows per Transit LS => 1,001,000 flows just for arp >> >> resolve. >> >> Each Transit LS comes with many other flows: so we multiply that number >> >> of flows * 1000 Transit LSs = ? flows >> >> 1001 addresses per subnet per Transit LS; I suggested addresses should >> be >> >> reused across subnets, but when each subnet is large >> >> >> > >> > Re-reading. The above is a wrong conclusion making me believe that there >> > is a big disconnect. A subnet in transit LS has only 2 IP addresses (if >> it >> > is only one physical gateway). Every additional physical gateway can add >> > one additional IP address to the subnet (depending on whether the new >> > physical gateway has a gateway router added for that logical topology.). >> > >> >> >> With respect to the IP address usage. I think a diagram would help >> especially the K8 case, >> > Drawing a diagram here is not feasible. Happy to do it on a whiteboard > though. > Thanks - lets do that; I would like to clarify the addressing requirements and full scope of distributed/gateway router interconnects for K8s. > > >> which I had heard in other conversations may have a separate gateway on >> every HV ?. Hence, I would like to know what that means - i.e. were you >> thinking to run separate gateways routers on every HV for K8 ? >> > Yes, thats the plan (as many as possible). 100 routers is a target. Not > HV, but a VM. > > > >> >> With respect to the other questions, I think its best approach would be to >> ask direct questions so those >> direct questions get answered. >> >> 1) With 1000 HVs, 1000 HVs/tenant, 1 distributed router per tenant, you >> choose the number of gateways/tenant: >> >> a) How many Transit LS distributed datapaths are expected in total ? >> > > One (i.e the same as the distributed router). > i.e. 1000 distributed routers => 1000 Transit LSs > > >> >> b) How many Transit LS logical ports are needed at the HV level ? >> >> what I mean by that is lets say we have one additional logical port at >> northd level and 1000 HVs then if we need to download that port to 1000 >> HVs, I consider that to be 1000 logical ports at the HV level because >> downloading and maintaining state across HVs at scale is more expensive >> than for a single hypervisor. >> > > 1000 additional ones. 
It is the same as your distributed logical switch or > logical router (this is the case even with the peer routers) > Did you mean 2000 including both ends of the Transit LS ? > > > >> >> c) How many Transit LS arp resolve entries at the HV level ? >> >> what I mean by that is lets say we have one additional arp resolve flow at >> northd level and 1000 HVs then if we need to download that arp resolve >> flow >> to 1000 HVs, I consider that to be 1000 flows at the HV level because >> downloading and maintaining state across multiple HVs is more expensive >> that a single hypervisor. >> > > 2 ARP flows per transit LS * 1000 HVs. > oops; I underestimated by half > Do realize that a single bridge on a single hypervisor typically has flows > in the 100,000 range. Even a million is feasbile. > I know. I am thinking about the coordination across many HVs. > Microsegmentation use cases has 1 ACLs per logical switch. So that is > 1 * 1000 for your case form single LS. So do you have some comparative > perspectives. > > > > dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev >> > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn-northd: Support connecting multiple routers to a switch.
> > > >> > >> With respect to the other questions, I think its best approach would be > to > >> ask direct questions so those > >> direct questions get answered. > >> > >> 1) With 1000 HVs, 1000 HVs/tenant, 1 distributed router per tenant, you > >> choose the number of gateways/tenant: > >> > >> a) How many Transit LS distributed datapaths are expected in total ? > >> > > > > One (i.e the same as the distributed router). > > > > > i.e. > 1000 distributed routers => 1000 Transit LSs > Yes. > > > > > > > > >> > >> b) How many Transit LS logical ports are needed at the HV level ? > >> > >> what I mean by that is lets say we have one additional logical port at > >> northd level and 1000 HVs then if we need to download that port to 1000 > >> HVs, I consider that to be 1000 logical ports at the HV level because > >> downloading and maintaining state across HVs at scale is more expensive > >> than for a single hypervisor. > >> > > > > 1000 additional ones. It is the same as your distributed logical switch > or > > logical router (this is the case even with the peer routers) > > > > Did you mean 2000 including both ends of the Transit LS ? > No. One end is only on the physical gateway to act as a physical endpoint. > > > > > > > > > > >> > >> c) How many Transit LS arp resolve entries at the HV level ? > >> > >> what I mean by that is lets say we have one additional arp resolve flow > at > >> northd level and 1000 HVs then if we need to download that arp resolve > >> flow > >> to 1000 HVs, I consider that to be 1000 flows at the HV level because > >> downloading and maintaining state across multiple HVs is more expensive > >> that a single hypervisor. > >> > > > > 2 ARP flows per transit LS * 1000 HVs. > > > > oops; I underestimated by half > > > > > Do realize that a single bridge on a single hypervisor typically has > flows > > in the 100,000 range. Even a million is feasbile. > > > > I know. > I am thinking about the coordination across many HVs. > There is no co-ordination. HV just downloads from ovn-sb. This is absolutely not different than any of the other distributed datapaths that we have. If introduction of one additional datapath is a problem, then OVN has a problem in general because it then simply means that it can only do one DR per logical topology. A transit LS is much less resource intensive (as it consumes just one additional port) than a DR connected to another DR (not GR) as peers (in this case have 2 additional ports per DR and then whatever additional switch ports that are connected to it). If the larger concern is about having 1000 tenants, then we need to pass more hints to ovn-controller about interconnections so that it only programs things relevant to local VMs and containers which are limited by the number of CPUs and Memory and is usually in the order of 10s. > > > > > Microsegmentation use cases has 1 ACLs per logical switch. So that is > > 1 * 1000 for your case form single LS. So do you have some > comparative > > perspectives. > > > > > > > > dev mailing list > >> dev@openvswitch.org > >> http://openvswitch.org/mailman/listinfo/dev > >> > > > > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] tunneling: Improving tunneling performance using DPDK Rx checksum offloading feature.
2016-05-12 13:40 GMT-07:00 pravin shelar : > On Thu, May 12, 2016 at 12:59 PM, Jesse Gross wrote: > > On Thu, May 12, 2016 at 11:18 AM, pravin shelar wrote: > >> On Tue, May 10, 2016 at 6:31 PM, Jesse Gross wrote: > >>> I'm a little bit torn as to whether we should apply your rx checksum > >>> offload patch in the meantime while we wait for DPDK to offer the new > >>> API. It looks like we'll have a 10% gain with tunneling in exchange > >>> for a 1% loss in other situations, so the call obviously depends on > >>> use case. Pravin, Daniele, others, any opinions? > >>> > >> There could be a way around the API issue and avoid the 1% loss. > >> netdev API could be changed to set packet->mbuf.ol_flags to > >> (PKT_RX_IP_CKSUM_BAD | PKT_RX_L4_CKSUM_BAD) if the netdev > >> implementation does not support rx checksum offload. Then there is no > >> need to check the rx checksum flags in dpif-netdev. And the checksum > >> can be directly checked in tunneling code where we actually need to. > >> Is there any issue with this approach? > > > > I think that's probably a little bit cleaner overall though I don't > > think that it totally eliminates the overhead. Not all DPDK ports will > > support checksum offload (since the hardware may not do it in theory) > > so we'll still need to check the port status on each packet to > > initialize the flags. > > > I was thinking of changing dpdk packet object constructor > (ovs_rte_pktmbuf_init()) to initialize the flag according to the > device offload support. This way there should not be any checks needed > in packet receive path. > > It looks like (at least for ixgbe) the flags are reset by the rx routine, even if offloads are disabled. I don't have a better idea, IMHO losing 1% is not a huge deal Thanks > > The other thing that is a little concerning is that there might be > > conditions where a driver doesn't actually verify the checksum. I > > guess most of these aren't supported in our tunneling implementation > > (IP options comes to mind) but it's a little risky. > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 1/3] datapath-windows: add nlMsgHdr to OvsPacketExecute
We'll need this for parsing nested attributes. Signed-off-by: Nithin Raju --- datapath-windows/ovsext/DpInternal.h | 1 + datapath-windows/ovsext/User.c | 13 - 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/datapath-windows/ovsext/DpInternal.h b/datapath-windows/ovsext/DpInternal.h index a3ce311..07bc180 100644 --- a/datapath-windows/ovsext/DpInternal.h +++ b/datapath-windows/ovsext/DpInternal.h @@ -275,6 +275,7 @@ typedef struct OvsPacketExecute { uint32_t packetLen; uint32_t actionsLen; + PNL_MSG_HDR nlMsgHdr; PCHAR packetBuf; PNL_ATTR actions; PNL_ATTR *keyAttrs; diff --git a/datapath-windows/ovsext/User.c b/datapath-windows/ovsext/User.c index 34f38f4..3b3f662 100644 --- a/datapath-windows/ovsext/User.c +++ b/datapath-windows/ovsext/User.c @@ -46,8 +46,9 @@ extern PNDIS_SPIN_LOCK gOvsCtrlLock; extern POVS_SWITCH_CONTEXT gOvsSwitchContext; OVS_USER_STATS ovsUserStats; -static VOID _MapNlAttrToOvsPktExec(PNL_ATTR *nlAttrs, PNL_ATTR *keyAttrs, - OvsPacketExecute *execute); +static VOID _MapNlAttrToOvsPktExec(PNL_MSG_HDR nlMsgHdr, PNL_ATTR *nlAttrs, + PNL_ATTR *keyAttrs, + OvsPacketExecute *execute); extern NL_POLICY nlFlowKeyPolicy[]; extern UINT32 nlFlowKeyPolicyLen; @@ -311,7 +312,7 @@ OvsNlExecuteCmdHandler(POVS_USER_PARAMS_CONTEXT usrParamsCtx, execute.dpNo = ovsHdr->dp_ifindex; -_MapNlAttrToOvsPktExec(nlAttrs, keyAttrs, &execute); +_MapNlAttrToOvsPktExec(nlMsgHdr, nlAttrs, keyAttrs, &execute); status = OvsExecuteDpIoctl(&execute); @@ -363,12 +364,14 @@ done: * */ static VOID -_MapNlAttrToOvsPktExec(PNL_ATTR *nlAttrs, PNL_ATTR *keyAttrs, - OvsPacketExecute *execute) +_MapNlAttrToOvsPktExec(PNL_MSG_HDR nlMsgHdr, PNL_ATTR *nlAttrs, + PNL_ATTR *keyAttrs, OvsPacketExecute *execute) { execute->packetBuf = NlAttrGet(nlAttrs[OVS_PACKET_ATTR_PACKET]); execute->packetLen = NlAttrGetSize(nlAttrs[OVS_PACKET_ATTR_PACKET]); +execute->nlMsgHdr = nlMsgHdr; + execute->actions = NlAttrGet(nlAttrs[OVS_PACKET_ATTR_ACTIONS]); execute->actionsLen = NlAttrGetSize(nlAttrs[OVS_PACKET_ATTR_ACTIONS]); -- 2.7.1.windows.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 3/3] datapath-windows: Use l2 port and tunkey during execute
While testing DFW and recirc code it was found that userspace was calling into packet execute with the tunnel key and the vport added as part of the execute structure. We were not passing this along to the code that executes actions. The right thing is to contruct the key based on all of the attributes sent down from userspace. Signed-off-by: Nithin Raju --- datapath-windows/ovsext/User.c | 32 ++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/datapath-windows/ovsext/User.c b/datapath-windows/ovsext/User.c index 3b3f662..2312940 100644 --- a/datapath-windows/ovsext/User.c +++ b/datapath-windows/ovsext/User.c @@ -51,6 +51,8 @@ static VOID _MapNlAttrToOvsPktExec(PNL_MSG_HDR nlMsgHdr, PNL_ATTR *nlAttrs, OvsPacketExecute *execute); extern NL_POLICY nlFlowKeyPolicy[]; extern UINT32 nlFlowKeyPolicyLen; +extern NL_POLICY nlFlowTunnelKeyPolicy[]; +extern UINT32 nlFlowTunnelKeyPolicyLen; static __inline VOID OvsAcquirePidHashLock() @@ -375,6 +377,7 @@ _MapNlAttrToOvsPktExec(PNL_MSG_HDR nlMsgHdr, PNL_ATTR *nlAttrs, execute->actions = NlAttrGet(nlAttrs[OVS_PACKET_ATTR_ACTIONS]); execute->actionsLen = NlAttrGetSize(nlAttrs[OVS_PACKET_ATTR_ACTIONS]); +ASSERT(keyAttrs[OVS_KEY_ATTR_IN_PORT]); execute->inPort = NlAttrGetU32(keyAttrs[OVS_KEY_ATTR_IN_PORT]); execute->keyAttrs = keyAttrs; } @@ -391,6 +394,8 @@ OvsExecuteDpIoctl(OvsPacketExecute *execute) OvsFlowKey key = { 0 }; OVS_PACKET_HDR_INFO layers = { 0 }; POVS_VPORT_ENTRYvport = NULL; +PNL_ATTR tunnelAttrs[__OVS_TUNNEL_KEY_ATTR_MAX]; +OvsFlowKey tempTunKey = {0}; if (execute->packetLen == 0) { status = STATUS_INVALID_PARAMETER; @@ -428,8 +433,31 @@ OvsExecuteDpIoctl(OvsPacketExecute *execute) goto dropit; } -ndisStatus = OvsExtractFlow(pNbl, fwdDetail->SourcePortId, &key, &layers, -NULL); +if (execute->keyAttrs[OVS_KEY_ATTR_TUNNEL]) { +UINT32 tunnelKeyAttrOffset; + +tunnelKeyAttrOffset = (UINT32)((PCHAR) + (execute->keyAttrs[OVS_KEY_ATTR_TUNNEL]) + - (PCHAR)execute->nlMsgHdr); + +/* Get tunnel keys attributes */ +if ((NlAttrParseNested(execute->nlMsgHdr, tunnelKeyAttrOffset, + NlAttrLen(execute->keyAttrs[OVS_KEY_ATTR_TUNNEL]), + nlFlowTunnelKeyPolicy, nlFlowTunnelKeyPolicyLen, + tunnelAttrs, ARRAY_SIZE(tunnelAttrs))) + != TRUE) { +OVS_LOG_ERROR("Tunnel key Attr Parsing failed for msg: %p", + execute->nlMsgHdr); +status = STATUS_INVALID_PARAMETER; +goto dropit; +} + +MapTunAttrToFlowPut(execute->keyAttrs, tunnelAttrs, &tempTunKey); +} + +ndisStatus = OvsExtractFlow(pNbl, execute->inPort, &key, &layers, + tempTunKey.tunKey.dst == 0 ? NULL : &tempTunKey.tunKey); + if (ndisStatus == NDIS_STATUS_SUCCESS) { NdisAcquireRWLockRead(gOvsSwitchContext->dispatchLock, &lockState, 0); ndisStatus = OvsActionsExecute(gOvsSwitchContext, NULL, pNbl, -- 2.7.1.windows.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 2/3] datapath-windows: Make _MapTunAttrToFlowPut() global
Move this function out from file scope.

Signed-off-by: Nithin Raju
---
 datapath-windows/ovsext/Flow.c | 16 +++-
 datapath-windows/ovsext/Flow.h |  2 ++
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/datapath-windows/ovsext/Flow.c b/datapath-windows/ovsext/Flow.c
index 1f23625..0682617 100644
--- a/datapath-windows/ovsext/Flow.c
+++ b/datapath-windows/ovsext/Flow.c
@@ -54,9 +54,6 @@ static VOID _MapKeyAttrToFlowPut(PNL_ATTR *keyAttrs,
                                  PNL_ATTR *tunnelAttrs,
                                  OvsFlowKey *destKey);
-static VOID _MapTunAttrToFlowPut(PNL_ATTR *keyAttrs,
-                                 PNL_ATTR *tunnelAttrs,
-                                 OvsFlowKey *destKey);
 static VOID _MapNlToFlowPutFlags(PGENL_MSG_HDR genlMsgHdr,
                                  PNL_ATTR flowAttrClear,
                                  OvsFlowPut *mappedFlow);
@@ -207,6 +204,7 @@ const NL_POLICY nlFlowTunnelKeyPolicy[] = {
     [OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS] = {.type = NL_A_VAR_LEN, .optional = TRUE}
 };
+const UINT32 nlFlowTunnelKeyPolicyLen = ARRAY_SIZE(nlFlowTunnelKeyPolicy);

 /* For Parsing nested OVS_FLOW_ATTR_ACTIONS attributes */
 const NL_POLICY nlFlowActionPolicy[] = {
@@ -1409,7 +1407,7 @@ _MapKeyAttrToFlowPut(PNL_ATTR *keyAttrs,
                      PNL_ATTR *tunnelAttrs,
                      OvsFlowKey *destKey)
 {
-    _MapTunAttrToFlowPut(keyAttrs, tunnelAttrs, destKey);
+    MapTunAttrToFlowPut(keyAttrs, tunnelAttrs, destKey);

     if (keyAttrs[OVS_KEY_ATTR_RECIRC_ID]) {
         destKey->recircId = NlAttrGetU32(keyAttrs[OVS_KEY_ATTR_RECIRC_ID]);
@@ -1631,14 +1629,14 @@ _MapKeyAttrToFlowPut(PNL_ATTR *keyAttrs,
 /*
  *
- * _MapTunAttrToFlowPut --
+ * MapTunAttrToFlowPut --
  *    Converts FLOW_TUNNEL_KEY attribute to OvsFlowKey->tunKey.
  *
  */
-static VOID
-_MapTunAttrToFlowPut(PNL_ATTR *keyAttrs,
-                     PNL_ATTR *tunnelAttrs,
-                     OvsFlowKey *destKey)
+VOID
+MapTunAttrToFlowPut(PNL_ATTR *keyAttrs,
+                    PNL_ATTR *tunAttrs,
+                    OvsFlowKey *destKey)
 {
     if (keyAttrs[OVS_KEY_ATTR_TUNNEL]) {
diff --git a/datapath-windows/ovsext/Flow.h b/datapath-windows/ovsext/Flow.h
index 310c472..fb3fb59 100644
--- a/datapath-windows/ovsext/Flow.h
+++ b/datapath-windows/ovsext/Flow.h
@@ -81,6 +81,8 @@ NTSTATUS MapFlowKeyToNlKey(PNL_BUFFER nlBuf, OvsFlowKey *flowKey,
                            UINT16 keyType, UINT16 tunKeyType);
 NTSTATUS MapFlowTunKeyToNlKey(PNL_BUFFER nlBuf, OvsIPv4TunnelKey *tunKey,
                               UINT16 tunKeyType);
+VOID MapTunAttrToFlowPut(PNL_ATTR *keyAttrs, PNL_ATTR *tunAttrs,
+                         OvsFlowKey *destKey);
 UINT32 OvsFlowKeyAttrSize(void);
 UINT32 OvsTunKeyAttrSize(void);
--
2.7.1.windows.1
___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
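The mechanical pattern in this patch is the usual way to widen a helper's linkage in C: drop the static qualifier from the definition, add a prototype to the shared header, and export the policy array's element count next to its definition (computed with ARRAY_SIZE) so other translation units never hard-code it. A small, self-contained sketch of that idiom under hypothetical names, collapsed into one file for brevity:

#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef struct Policy { int type; int optional; } Policy;

/* What the shared header (the Flow.h role) would expose: the helper's
 * prototype plus extern declarations for the array and its length. */
void MapPolicyToKey(const Policy *policies, uint32_t numPolicies);
extern const Policy tunnelKeyPolicy[];
extern const uint32_t tunnelKeyPolicyLen;

/* What the defining file (the Flow.c role) provides.  The length lives
 * right next to the array definition, so consumers stay in sync. */
const Policy tunnelKeyPolicy[] = {
    { .type = 1, .optional = 0 },
    { .type = 2, .optional = 1 },
};
const uint32_t tunnelKeyPolicyLen = ARRAY_SIZE(tunnelKeyPolicy);

/* No longer "static": any file that includes the header can call it. */
void MapPolicyToKey(const Policy *policies, uint32_t numPolicies)
{
    for (uint32_t i = 0; i < numPolicies; i++) {
        printf("policy %u: type=%d optional=%d\n",
               (unsigned)i, policies[i].type, policies[i].optional);
    }
}

int main(void)
{
    MapPolicyToKey(tunnelKeyPolicy, tunnelKeyPolicyLen);
    return 0;
}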
[ovs-dev] Multiple OVS with multiple controllers
Topology: pc1 (10.1.1.2, eth0) connects to eth1 of ovs1 on pc2; eth0 of ovs1 connects to eth1 of ovs2 on pc3; eth0 of ovs2 connects to eth0 of Host1 (10.1.1.1) on pc4. ovs1 is managed by controller1 and ovs2 by controller2, and controller1 is connected to controller2 over eth2.

I create the topology as follows.

pc2:
ovs-vsctl add-br ovs0
ovs-vsctl add-port ovs0 eth0
ovs-vsctl add-port ovs0 eth1
ovs-vsctl set-controller ovs0 tcp:127.0.0.1:6653

pc3:
ovs-vsctl add-br ovs0
ovs-vsctl add-port ovs0 eth0
ovs-vsctl add-port ovs0 eth1
ovs-vsctl set-controller ovs0 tcp:127.0.0.1:6653

controller1 connects to controller2 over eth2. Now the ping from pc1 to pc4 fails, even though the flow rule on pc2 is written correctly:

cookie=0x20, duration=4456.775s, table=0, n_packets=4456, n_bytes=436688, idle_timeout=5, idle_age=0, priority=1,ip,in_port=3,dl_src=2c:53:4a:01:8f:bb,dl_dst=00:50:11:e0:13:a6,nw_src=10.1.1.1,nw_dst=10.1.1.2 actions=output:7

Capturing packets with tcpdump shows:
- on pc2 eth0, the ICMP packets are sent out correctly;
- on pc3 eth1, the ICMP packets are received correctly;
- on pc3 lo, the packet-in is sent, but the controller never receives it.

What I have tried:
1. If I point ovs0 on pc2 and ovs0 on pc3 at the same controller (either controller1 or controller2), the ping works.
2. Configuring eth0 on pc2 and eth1 on pc3 as patch ports; it still does not work.

Any help will be much appreciated. Thanks. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev