Re: [ovs-dev] plan to integrate Hyper-V support
Hi Nithin, I'd also add at least: 1) Design for mapping hyper-v ports to OVS ports 2) CI testing plans 3) Roadmap Thanks, Alessandro On 29.07.2014, at 05:01, "Nithin Raju" wrote: > 10.00 AM Pacific - Sounds good to me. > > hi Alessandro, > We'll be meeting tomorrow. In the light of that, we had the following items > on mind that we could discuss. Pls. feel free to edit the list: > > 1. Helping Alin to run some tests successfully (with the V2 patch that > Saurabh posted last week). > 2. Any design or code walkthrough for the kernel code. > 3. Netlink interface related discussion. > 4. Architectural topics. > > See you tomorrow! > > thanks, > Nithin ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] bitmap: Optimize bitmap_equal and bitmap_scan
On Tue, Jul 29, 2014 at 1:01 AM, Ben Pfaff wrote: > Needs a sign-off. > Oh.. I will not forget it in the next version. > > I'm pretty sure that bitmap_equal() is wrong: doesn't it access one > past the end of the bitmap arrays? You are right. I'll fix it. > I think that rightmost_1bit_idx() might make bitmap_scan() faster. > I'll give it a try. > The nice thing about the previous implementations was that they were > obviously correct. Now, I'd prefer to have some tests. > > No problem, I'll add some tests in the next version. Thanks for reviewing ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 2/2] netlink: Convert netlink_lookup() to use RCU protected hash table
Heavy Netlink users such as Open vSwitch spend a considerable amount of time in netlink_lookup() due to the read-lock on nl_table_lock. Use of RCU relieves the lock contention. Makes use of the new resizable hash table to avoid locking on the lookup. The hash table will grow if entries exceeds 75% of table size up to a total table size of 64K. It will automatically shrink if usage falls below 50%. Also splits nl_table_lock into a separate spinlock to protect hash table mutations. This avoids a possible deadlock when the hash table growing waits on RCU readers to complete via synchronize_rcu() while readers holding RCU read lock are waiting on the nl_table_lock() to be released to lock the table for broadcasting. Before: 9.16% kpktgend_0 [openvswitch] [k] masked_flow_lookup 6.42% kpktgend_0 [pktgen] [k] mod_cur_headers 6.26% kpktgend_0 [pktgen] [k] pktgen_thread_worker 6.23% kpktgend_0 [kernel.kallsyms] [k] memset 4.79% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup 4.37% kpktgend_0 [kernel.kallsyms] [k] memcpy 3.60% kpktgend_0 [openvswitch] [k] ovs_flow_extract 2.69% kpktgend_0 [kernel.kallsyms] [k] jhash2 After: 15.26% kpktgend_0 [openvswitch] [k] masked_flow_lookup 8.12% kpktgend_0 [pktgen] [k] pktgen_thread_worker 7.92% kpktgend_0 [pktgen] [k] mod_cur_headers 5.11% kpktgend_0 [kernel.kallsyms] [k] memset 4.11% kpktgend_0 [openvswitch] [k] ovs_flow_extract 4.06% kpktgend_0 [kernel.kallsyms] [k] _raw_spin_lock 3.90% kpktgend_0 [kernel.kallsyms] [k] jhash2 [...] 0.67% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup Signed-off-by: Thomas Graf --- net/netlink/af_netlink.c | 285 ++- net/netlink/af_netlink.h | 18 +-- net/netlink/diag.c | 11 +- 3 files changed, 119 insertions(+), 195 deletions(-) diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index 1b38f7f..7f44468 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -58,7 +58,9 @@ #include #include #include +#include #include +#include #include #include @@ -100,6 +102,18 @@ static atomic_t nl_table_users = ATOMIC_INIT(0); #define nl_deref_protected(X) rcu_dereference_protected(X, lockdep_is_held(&nl_table_lock)); +/* Protects netlink socket hash table mutations */ +DEFINE_SPINLOCK(nl_sk_hash_lock); + +static int lockdep_nl_sk_hash_is_held(void) +{ +#ifdef CONFIG_LOCKDEP + return (debug_locks) ? lockdep_is_held(&nl_sk_hash_lock) : 1; +#else + return 1; +#endif +} + static ATOMIC_NOTIFIER_HEAD(netlink_chain); static DEFINE_SPINLOCK(netlink_tap_lock); @@ -110,11 +124,6 @@ static inline u32 netlink_group_mask(u32 group) return group ? 1 << (group - 1) : 0; } -static inline struct hlist_head *nl_portid_hashfn(struct nl_portid_hash *hash, u32 portid) -{ - return &hash->table[jhash_1word(portid, hash->rnd) & hash->mask]; -} - int netlink_add_tap(struct netlink_tap *nt) { if (unlikely(nt->dev->type != ARPHRD_NETLINK)) @@ -983,105 +992,48 @@ netlink_unlock_table(void) wake_up(&nl_table_wait); } -static bool netlink_compare(struct net *net, struct sock *sk) -{ - return net_eq(sock_net(sk), net); -} - -static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid) +struct netlink_compare_arg { - struct netlink_table *table = &nl_table[protocol]; - struct nl_portid_hash *hash = &table->hash; - struct hlist_head *head; - struct sock *sk; - - read_lock(&nl_table_lock); - head = nl_portid_hashfn(hash, portid); - sk_for_each(sk, head) { - if (table->compare(net, sk) && - (nlk_sk(sk)->portid == portid)) { - sock_hold(sk); - goto found; - } - } - sk = NULL; -found: - read_unlock(&nl_table_lock); - return sk; -} + struct net *net; + u32 portid; +}; -static struct hlist_head *nl_portid_hash_zalloc(size_t size) +static bool netlink_compare(void *ptr, void *arg) { - if (size <= PAGE_SIZE) - return kzalloc(size, GFP_ATOMIC); - else - return (struct hlist_head *) - __get_free_pages(GFP_ATOMIC | __GFP_ZERO, -get_order(size)); -} + struct netlink_compare_arg *x = arg; + struct sock *sk = ptr; -static void nl_portid_hash_free(struct hlist_head *table, size_t size) -{ - if (size <= PAGE_SIZE) - kfree(table); - else - free_pages((unsigned long)table, get_order(size)); + return nlk_sk(sk)->portid == x->portid && + net_eq(sock_net(sk), x->net); } -static int nl_portid_hash_rehash(struct nl_portid_hash *hash, int grow) +static struct sock *__netlink_lookup(struct netlink_table *table, u32 portid,
[ovs-dev] [PATCH 0/2 net-next] Lockless netlink_lookup() with new concurrent hash table
Netlink sockets are maintained in a hash table to allow efficient lookup via the port ID for unicast messages. However, lookups currently require a read lock to be taken. This series adds a new generic, resizable, scalable, concurrent hash table based on the paper referenced in the first patch. It then makes use of the new data type to implement lockless netlink_lookup(). Against net-next since the initial user of the new hash table is in net/ Thomas Graf (2): lib: Resizable, Scalable, Concurrent Hash Table netlink: Convert netlink_lookup() to use RCU protected hash table include/linux/rhashtable.h | 208 lib/Kconfig.debug | 8 + lib/Makefile | 2 +- lib/rhashtable.c | 783 + net/netlink/af_netlink.c | 285 +++-- net/netlink/af_netlink.h | 18 +- net/netlink/diag.c | 11 +- 7 files changed, 1119 insertions(+), 196 deletions(-) create mode 100644 include/linux/rhashtable.h create mode 100644 lib/rhashtable.c -- 1.9.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 1/2] lib: Resizable, Scalable, Concurrent Hash Table
Generic implementation of a resizable, scalable, concurrent hash table based on [0]. The implementation supports both, fixed size keys specified via an offset and length, or arbitrary keys via own hash and compare functions. Lookups are lockless and protected as RCU read side critical sections. Automatic growing/shrinking based on user configurable watermarks is available while allowing concurrent lookups to take place. Objects to be hashed must include a struct rhash_head. The reason for not using the existing struct hlist_head is that the expansion and shrinking will have two buckets point to a single entry which would lead in obscure reverse chaining behaviour. Code includes a boot selftest if CONFIG_TEST_RHASHTABLE is defined. [0] https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf Signed-off-by: Thomas Graf --- include/linux/rhashtable.h | 208 lib/Kconfig.debug | 8 + lib/Makefile | 2 +- lib/rhashtable.c | 783 + 4 files changed, 1000 insertions(+), 1 deletion(-) create mode 100644 include/linux/rhashtable.h create mode 100644 lib/rhashtable.c diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h new file mode 100644 index 000..f784f7a --- /dev/null +++ b/include/linux/rhashtable.h @@ -0,0 +1,208 @@ +/* + * Resizable, Scalable, Concurrent Hash Table + * + * Copyright (c) 2014 Thomas Graf + * + * Based on the following paper by Josh Triplett, Paul E. McKenney + * and Jonathan Walpole: + * https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf + * + * Code partially derived from nft_hash: + * Copyright (c) 2008-2014 Patrick McHardy + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License version 2 as + * published by the Free Software Foundation. + */ + +#ifndef _LINUX_RHASHTABLE_H +#define _LINUX_RHASHTABLE_H + +#include + +struct rhash_head { + struct rhash_head *next; +}; + +#define INIT_HASH_HEAD(ptr) ((ptr)->next = NULL) + +struct bucket_table { + size_t size; + struct rhash_head __rcu *buckets[]; +}; + +typedef u32 (*rht_hashfn_t)(const void *data, u32 len, u32 seed); +typedef u32 (*rht_obj_hashfn_t)(const void *data, u32 seed); + +struct rhashtable; + +/** + * struct rhashtable_params - Hash table construction parameters + * @nelem_hint: Hint on number of elements expected + * @key_len: Length of key + * @key_offset: Offset of key in struct to be hashed + * @head_offset: Offset of rhash_head in struct to be hashed + * @hash_rnd: Seed to use while hashing + * @max_shift: Maximum number of shifts while expanding + * @hashfn: Function to hash key + * @obj_hashfn: Function to hash object + * @grow_decision: If defined, may return true if table should expand + * @shrink_decision: If defined, may return true if table should shrink + * @mutex_is_held: Must return true if protecting mutex is held + */ +struct rhashtable_params { + size_t nelem_hint; + size_t key_len; + size_t key_offset; + size_t head_offset; + u32 hash_rnd; + size_t max_shift; + rht_hashfn_thashfn; + rht_obj_hashfn_tobj_hashfn; + bool(*grow_decision)(const struct rhashtable *ht, +size_t new_size); + bool(*shrink_decision)(const struct rhashtable *ht, + size_t new_size); + int (*mutex_is_held)(void); +}; + +/** + * struct rhashtable - Hash table handle + * @tbl: Bucket table + * @nelems: Number of elements in table + * @shift: Current size (1 << shift) + * @p: Configuration parameters + */ +struct rhashtable { + struct bucket_table __rcu *tbl; + size_t nelems; + size_t shift; + struct rhashtable_paramsp; +}; + +#ifdef CONFIG_PROVE_LOCKING +int lockdep_rht_mutex_is_held(const struct rhashtable *ht); +#else +static inline int lockdep_rht_mutex_is_held(const struct rhashtable *ht) +{ + return 1; +} +#endif /* CONFIG_PROVE_LOCKING */ + +int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params); + +u32 rhashtable_hashfn(const struct rhashtable *ht, const void *key, u32 len); +u32 rhashtable_obj_hashfn(const struct rhashtable *ht, void *ptr); + +int rhashtable_insert(struct rhashtable *ht, struct rhash_head *node, gfp_t); +bool rhashtable_remove(struct rhashtable *ht, struct rhash_head *node, gfp_t); + +bool rht_grow_above_75(const struct rhashtable *ht, size_t new_size); +bool rht_shrink_below_50(const struct rhashtable *ht, size_t new_size); + +int rhashtable_expand(struct rhashtable *ht,
[ovs-dev] [PATCH v2 2/5] Add NETLINK defines
Add MAX_LINKS define needed for nl_pool. Add NLM_F_CREATE and NLM_F_EXCL defines also. Signed-off-by: Alin Gabriel Serdean --- lib/netlink-protocol.h | 4 1 file changed, 4 insertions(+) diff --git a/lib/netlink-protocol.h b/lib/netlink-protocol.h index 88b7abf..8938055 100644 --- a/lib/netlink-protocol.h +++ b/lib/netlink-protocol.h @@ -48,7 +48,9 @@ #define NLM_F_ROOT 0x100 #define NLM_F_MATCH 0x200 +#define NLM_F_EXCL 0x200 #define NLM_F_ATOMIC0x400 +#define NLM_F_CREATE0x400 #define NLM_F_DUMP (NLM_F_ROOT | NLM_F_MATCH) /* nlmsg_type values. */ @@ -59,6 +61,8 @@ #define NLMSG_MIN_TYPE 0x10 +#define MAX_LINKS 32 + struct nlmsghdr { uint32_t nlmsg_len; uint16_t nlmsg_type; -- 1.9.0.msysgit.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] vtep-ctl: Add Tunnel table to vtep_ctl_table_class.
This is needed to create, get, set records in the Tunnel table. Signed-off-by: Gurucharan Shetty --- vtep/vtep-ctl.c |4 1 file changed, 4 insertions(+) diff --git a/vtep/vtep-ctl.c b/vtep/vtep-ctl.c index 0b9463a..129fa03 100644 --- a/vtep/vtep-ctl.c +++ b/vtep/vtep-ctl.c @@ -2283,6 +2283,10 @@ static const struct vtep_ctl_table_class tables[] = { {{&vteprec_table_physical_switch, &vteprec_physical_switch_col_name, NULL}, {NULL, NULL, NULL}}}, +{&vteprec_table_tunnel, + {{NULL, NULL, NULL}, + {NULL, NULL, NULL}}}, + {NULL, {{NULL, NULL, NULL}, {NULL, NULL, NULL}}} }; -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v2 3/5] Bypass HAVE_NETLINK for MSVC
Bypass the error compilation when compiling under MSVC. Signed-off-by: Alin Gabriel Serdean --- lib/netlink-socket.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/netlink-socket.h b/lib/netlink-socket.h index d53db4e..2b9ec52 100644 --- a/lib/netlink-socket.h +++ b/lib/netlink-socket.h @@ -57,8 +57,10 @@ struct nl_sock; #ifndef HAVE_NETLINK +#ifndef _WIN32 #error "netlink-socket.h is only for hosts that support Netlink sockets" #endif +#endif /* Netlink sockets. */ int nl_sock_create(int protocol, struct nl_sock **); -- 1.9.0.msysgit.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v2 4/5] Changes needed to netlink-socket for MSVC
Add two functions set_sock_pid_in_kernel and portid_next. This will allow the channel identification for the kernel extension to send back messages. Replace send with WriteFile equivalent and ignore nl_sock_drain for the moment under MSVC. Replace sendmsg and recvmsg with ReadFile and WriteFile equivalents. On MSVC put in handle instead of fd(sock->fd becomes sock->handle). Creation of the netlink socket will be replaced by CreateFile equivalent. Add MAX_STACK_LENGTH for MSVC this will be our maximmum for on-stack copy buffer. Signed-off-by: Alin Gabriel Serdean --- lib/netlink-socket.c | 178 ++- 1 file changed, 177 insertions(+), 1 deletion(-) diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index 09d3a61..9ce4c53 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -48,6 +48,33 @@ COVERAGE_DEFINE(netlink_sent); #define SOL_NETLINK 270 #endif +#ifdef _WIN32 +static struct ovs_mutex portid_mutex = OVS_MUTEX_INITIALIZER; +static uint32_t g_last_portid = 0; + +/* Port IDs must be unique! */ +uint32_t +portid_next() OVS_GUARDED_BY(portid_mutex) +{ +g_last_portid++; +return g_last_portid; +} + +void +set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) OVS_GUARDED_BY(portid_mutex) +{ +struct nlmsghdr msg = { 0 }; + +msg.nlmsg_len = sizeof(struct nlmsghdr); +msg.nlmsg_type = 80; /* target = set file pid */ +msg.nlmsg_flags = 0; +msg.nlmsg_seq = 0; +msg.nlmsg_pid = pid; + +WriteFile(handle, &msg, sizeof(struct nlmsghdr), NULL, NULL); +} +#endif + /* A single (bad) Netlink message can in theory dump out many, many log * messages, so the burst size is set quite high here to avoid missing useful * information. Also, at high logging levels we log *all* Netlink messages. */ @@ -60,7 +87,11 @@ static void log_nlmsg(const char *function, int error, /* Netlink sockets. */ struct nl_sock { +#ifdef _WIN32 +HANDLE handle; +#else int fd; +#endif uint32_t next_seq; uint32_t pid; int protocol; @@ -88,7 +119,9 @@ nl_sock_create(int protocol, struct nl_sock **sockp) { static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER; struct nl_sock *sock; +#ifndef _WIN32 struct sockaddr_nl local, remote; +#endif socklen_t local_size; int rcvbuf; int retval = 0; @@ -114,15 +147,39 @@ nl_sock_create(int protocol, struct nl_sock **sockp) *sockp = NULL; sock = xmalloc(sizeof *sock); +#ifdef _WIN32 +sock->handle = CreateFileA(".\\OpenVSwitchDevice", + GENERIC_READ | GENERIC_WRITE, + FILE_SHARE_READ | FILE_SHARE_WRITE, + NULL, OPEN_EXISTING, + FILE_ATTRIBUTE_NORMAL, NULL); + +int last_error = GetLastError(); + +if (sock->handle == INVALID_HANDLE_VALUE) { +VLOG_ERR("fcntl: %s", ovs_strerror(last_error)); +goto error; +} +#else sock->fd = socket(AF_NETLINK, SOCK_RAW, protocol); if (sock->fd < 0) { VLOG_ERR("fcntl: %s", ovs_strerror(errno)); goto error; } +#endif + sock->protocol = protocol; sock->next_seq = 1; rcvbuf = 1024 * 1024; +#ifdef _WIN32 +sock->rcvbuf = rcvbuf; +ovs_mutex_init(&portid_mutex); +ovs_mutex_lock(&portid_mutex); +sock->pid = portid_next(); +set_sock_pid_in_kernel(sock->handle, sock->pid); +ovs_mutex_unlock(&portid_mutex); +#else if (setsockopt(sock->fd, SOL_SOCKET, SO_RCVBUFFORCE, &rcvbuf, sizeof rcvbuf)) { /* Only root can use SO_RCVBUFFORCE. Everyone else gets EPERM. @@ -161,6 +218,7 @@ nl_sock_create(int protocol, struct nl_sock **sockp) goto error; } sock->pid = local.nl_pid; +#endif *sockp = sock; return 0; @@ -172,9 +230,15 @@ error: retval = EINVAL; } } +#ifdef _WIN32 +if (sock->handle != INVALID_HANDLE_VALUE) { +CloseHandle(sock->handle); +} +#else if (sock->fd >= 0) { close(sock->fd); } +#endif free(sock); return retval; } @@ -193,7 +257,11 @@ void nl_sock_destroy(struct nl_sock *sock) { if (sock) { +#ifdef _WIN32 +CloseHandle(sock->handle); +#else close(sock->fd); +#endif free(sock); } } @@ -212,12 +280,40 @@ nl_sock_destroy(struct nl_sock *sock) int nl_sock_join_mcgroup(struct nl_sock *sock, unsigned int multicast_group) { +#ifdef _WIN32 +#define OVS_VPORT_MCGROUP_FALLBACK_ID 33 +struct ofpbuf msg_buf; +struct message_multicast +{ +struct nlmsghdr; +/* if true, join; if else, leave */ +unsigned char join; +unsigned int groupId; +}; + +struct message_multicast msg = { 0 }; + +msg.nlmsg_len = sizeof(struct message_multicast); +msg.nlmsg_type = OVS_VPORT_MCGROUP_FALLBACK_ID; +msg.nlmsg_flags = 0; +msg.nlmsg_seq = 0; +msg.nlmsg_pid = sock->pid; +
[ovs-dev] [PATCH v2 5/5] Add more files to the openvswitch library on MSVC
Add netlink related files to the windows build. Signed-off-by: Alin Gabriel Serdean --- lib/automake.mk | 9 + 1 file changed, 9 insertions(+) diff --git a/lib/automake.mk b/lib/automake.mk index 0997df5..87a8faa 100644 --- a/lib/automake.mk +++ b/lib/automake.mk @@ -319,6 +319,15 @@ lib_libopenvswitch_la_SOURCES += \ lib/netdev-dpdk.h endif +if WIN32 +lib_libopenvswitch_la_SOURCES += \ + lib/netlink-notifier.c \ + lib/netlink-notifier.h \ + lib/netlink-protocol.h \ + lib/netlink-socket.c \ + lib/netlink-socket.h +endif + if HAVE_POSIX_AIO lib_libopenvswitch_la_SOURCES += lib/async-append-aio.c else -- 1.9.0.msysgit.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v2 1/5] Add defines, enums and headers for MSVC
Add defines needed to compile netlink-socket.c and netlink.c. Add a wrapper and the functionality behind it for syconf. Add the newly created files to the noinst_HEADERS in windows/automake.mk Signed-off-by: Alin Gabriel Serdean --- include/windows/automake.mk| 1 + include/windows/net/if.h | 50 ++ include/windows/netpacket/packet.h | 40 ++ include/windows/sys/uio.h | 22 + include/windows/unistd.h | 50 ++ 5 files changed, 163 insertions(+) create mode 100644 include/windows/netpacket/packet.h diff --git a/include/windows/automake.mk b/include/windows/automake.mk index 4d5a42c..2f1b631 100644 --- a/include/windows/automake.mk +++ b/include/windows/automake.mk @@ -11,6 +11,7 @@ noinst_HEADERS += \ include/windows/getopt.h \ include/windows/net/if.h \ include/windows/netdb.h \ + include/windows/netpacket/packet.h \ include/windows/netinet/icmp6.h \ include/windows/netinet/in.h \ include/windows/netinet/in_systm.h \ diff --git a/include/windows/net/if.h b/include/windows/net/if.h index 893ffe4..3a064ae 100644 --- a/include/windows/net/if.h +++ b/include/windows/net/if.h @@ -21,4 +21,54 @@ #define IFNAMSIZ IF_NAMESIZE +enum { +IFLA_UNSPEC, +IFLA_ADDRESS, +IFLA_BROADCAST, +IFLA_IFNAME, +IFLA_MTU, +IFLA_LINK, +IFLA_QDISC, +IFLA_STATS, +IFLA_COST, +#define IFLA_COST IFLA_COST +IFLA_PRIORITY, +#define IFLA_PRIORITY IFLA_PRIORITY +IFLA_MASTER, +#define IFLA_MASTER IFLA_MASTER +IFLA_WIRELESS, +#define IFLA_WIRELESS IFLA_WIRELESS +IFLA_PROTINFO, +#define IFLA_PROTINFO IFLA_PROTINFO +IFLA_TXQLEN, +#define IFLA_TXQLEN IFLA_TXQLEN +IFLA_MAP, +#define IFLA_MAP IFLA_MAP +IFLA_WEIGHT, +#define IFLA_WEIGHT IFLA_WEIGHT +IFLA_OPERSTATE, +IFLA_LINKMODE, +IFLA_LINKINFO, +#define IFLA_LINKINFO IFLA_LINKINFO +IFLA_NET_NS_PID, +IFLA_IFALIAS, +IFLA_NUM_VF, +IFLA_VFINFO_LIST, +IFLA_STATS64, +IFLA_VF_PORTS, +IFLA_PORT_SELF, +IFLA_AF_SPEC, +IFLA_GROUP, +IFLA_NET_NS_FD, +IFLA_EXT_MASK, +IFLA_PROMISCUITY, +#define IFLA_PROMISCUITY IFLA_PROMISCUITY +IFLA_NUM_TX_QUEUES, +IFLA_NUM_RX_QUEUES, +IFLA_CARRIER, +IFLA_PHYS_PORT_ID, +__IFLA_MAX +}; +#define IFLA_MAX (__IFLA_MAX - 1) + #endif /* net/if.h */ diff --git a/include/windows/netpacket/packet.h b/include/windows/netpacket/packet.h new file mode 100644 index 000..cf26a70 --- /dev/null +++ b/include/windows/netpacket/packet.h @@ -0,0 +1,40 @@ +/* + * Copyright 2014 Cloudbase Solutions Srl + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef __NETPACKET_PACKET_H +#define __NETPACKET_PACKET_H 1 + +struct iovec +{ +void *iov_base; +unsigned int iov_len; +}; + +struct msghdr + { +void *msg_name; +socklen_t msg_namelen; + +struct iovec *msg_iov; +size_t msg_iovlen; + +void *msg_control; +size_t msg_controllen; + +int msg_flags; + }; + +#endif /* netpacket/packet.h */ \ No newline at end of file diff --git a/include/windows/sys/uio.h b/include/windows/sys/uio.h index e69de29..b51c367 100644 --- a/include/windows/sys/uio.h +++ b/include/windows/sys/uio.h @@ -0,0 +1,22 @@ +/* + * Copyright 2014 Cloudbase Solutions Srl + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +#ifndef __SYS_UIO_H +#define __SYS_UIO_H 1 + +#include + +#endif /* sys/uio.h */ \ No newline at end of file diff --git a/include/windows/unistd.h b/include/windows/unistd.h index d9ded5a..8629f7e 100644 --- a/include/windows/unistd.h +++ b/include/windows/unistd.h @@ -16,6 +16,9 @@ #ifndef _UNISTD_H #define _UNISTD_H 1 +#define WIN32_LEAN_AND_MEAN +#include + #define fsync _commit /* Standard file descriptors. */ @@ -23,4 +26,51 @@ #define STDOUT_FILENO 1 /* Standard outpu
Re: [ovs-dev] [PATCH 1/2] lib: Resizable, Scalable, Concurrent Hash Table
On Tue, Jul 29, 2014 at 01:41:32PM +0200, Thomas Graf wrote: > Generic implementation of a resizable, scalable, concurrent hash table > based on [0]. The implementation supports both, fixed size keys specified > via an offset and length, or arbitrary keys via own hash and compare > functions. > > Lookups are lockless and protected as RCU read side critical sections. > Automatic growing/shrinking based on user configurable watermarks is > available while allowing concurrent lookups to take place. > > Objects to be hashed must include a struct rhash_head. The reason for not > using the existing struct hlist_head is that the expansion and shrinking > will have two buckets point to a single entry which would lead in obscure > reverse chaining behaviour. > > Code includes a boot selftest if CONFIG_TEST_RHASHTABLE is defined. > > [0] https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf > > Signed-off-by: Thomas Graf Thanks for working on this! Currently reading the algorithm implementation in more detail. A couple of minor issues below. > --- /dev/null > +++ b/include/linux/rhashtable.h [...] > +struct rhash_head { > + struct rhash_head *next; > +}; Arguably this could become a new singly-linked list type in list.h; we don't currently have one. > +/** > + * rht_for_each_entry_safe - safely iterate over hash chain of given type > + * @pos: type * to use as a loop cursor. > + * @n: type * to use for temporary next object storage > + * @head:head of the hash chain (struct rhash_head *) > + * @ht: pointer to your struct rhashtable > + * @member: name of the rhash_head within the hashable struct. > + * > + * This hash chain list-traversal primitive allows for the looped code to > + * remove the loop cursor from the list. > + */ > +#define rht_for_each_entry_safe(pos, n, head, ht, member)\ > + for (pos = rht_entry_safe(rht_dereference(head, ht), \ > +typeof(*(pos)), member), \ > + n = rht_entry_safe(rht_dereference((pos)->member.next, ht), \ > + typeof(*(pos)), member); \ > + pos; \ > + pos = n, \ > + n = rht_entry_safe(rht_dereference((pos)->member.next, ht), \ > + typeof(*(pos)), member)) Given that removal requires special care, is this something actually needed in the public interface, or only internally? You don't necessarily need to define a full set of list primitives. (Unless of course you make this a new list type in list.h.) > --- /dev/null > +++ b/lib/rhashtable.c [...] > +#define ASSERT_RHT_MUTEX(HT) BUG_ON(unlikely(!lockdep_rht_mutex_is_held(HT))) BUG_ON and WARN_ON already include unlikely(); you don't need to add it yourself. > +/** > + * rht_obj - cast hash head to outer object > + * @ht: hash table > + * @he: hashed node > + */ > +void *rht_obj(const struct rhashtable *ht, const struct rhash_head *he) > +{ > + return (void *) he - ht->p.head_offset; > +} > +EXPORT_SYMBOL_GPL(rht_obj); Should this be a static inline? And, will the runtime indirection involved in head_offset add unnecessary overhead for tables of a known type? (I'd expect a head_offset of 0 in common cases.) - Josh Triplett ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] netlink-socket: Add conceptual documentation.
This is excellent. Thanks, Ben. > -Original Message- > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff > Sent: Monday, July 28, 2014 11:35 PM > To: dev@openvswitch.org > Cc: Ben Pfaff > Subject: [ovs-dev] [PATCH] netlink-socket: Add conceptual documentation. > > Based on a conversation with the VMware Hyper-V team earlier today. > > This commit also changes a couple of functions that were only used with > netlink-socket.c into static functions. I couldn't think of a reason for code > outside that file to use them. > > Signed-off-by: Ben Pfaff > --- > lib/netlink-socket.c | 124 +-- > lib/netlink-socket.h | 145 > +++ > 2 files changed, 197 insertions(+), 72 deletions(-) > > diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index > 09d3a61..0ff85d6 > 100644 > --- a/lib/netlink-socket.c > +++ b/lib/netlink-socket.c > @@ -537,26 +537,7 @@ nl_sock_transact_multiple__(struct nl_sock *sock, > return error; > } > > -/* Sends the 'request' member of the 'n' transactions in 'transactions' on > - * 'sock', in order, and receives responses to all of them. Fills in the > - * 'error' member of each transaction with 0 if it was successful, otherwise > - * with a positive errno value. If 'reply' is nonnull, then it will be > filled > - * with the reply if the message receives a detailed reply. In other cases, > - * i.e. where the request failed or had no reply beyond an indication of > - * success, 'reply' will be cleared if it is nonnull. > - * > - * The caller is responsible for destroying each request and reply, and the > - * transactions array itself. > - * > - * Before sending each message, this function will finalize nlmsg_len in each > - * 'request' to match the ofpbuf's size, set nlmsg_pid to 'sock''s pid, and > - * initialize nlmsg_seq. > - * > - * Bare Netlink is an unreliable transport protocol. This function layers > - * reliable delivery and reply semantics on top of bare Netlink. See > - * nl_sock_transact() for some caveats. > - */ > -void > +static void > nl_sock_transact_multiple(struct nl_sock *sock, >struct nl_transaction **transactions, size_t n) { > @@ -611,47 > +592,7 @@ nl_sock_transact_multiple(struct nl_sock *sock, > } > } > > -/* Sends 'request' to the kernel via 'sock' and waits for a response. If > - * successful, returns 0. On failure, returns a positive errno value. > - * > - * If 'replyp' is nonnull, then on success '*replyp' is set to the kernel's > - * reply, which the caller is responsible for freeing with ofpbuf_delete(), > and > - * on failure '*replyp' is set to NULL. If 'replyp' is null, then the > kernel's > - * reply, if any, is discarded. > - * > - * Before the message is sent, nlmsg_len in 'request' will be finalized to > - * match ofpbuf_size(msg), nlmsg_pid will be set to 'sock''s pid, and > nlmsg_seq will > - * be initialized, NLM_F_ACK will be set in nlmsg_flags. > - * > - * The caller is responsible for destroying 'request'. > - * > - * Bare Netlink is an unreliable transport protocol. This function layers > - * reliable delivery and reply semantics on top of bare Netlink. > - * > - * In Netlink, sending a request to the kernel is reliable enough, because > the > - * kernel will tell us if the message cannot be queued (and we will in that > - * case put it on the transmit queue and wait until it can be delivered). > - * > - * Receiving the reply is the real problem: if the socket buffer is full when > - * the kernel tries to send the reply, the reply will be dropped. However, > the > - * kernel sets a flag that a reply has been dropped. The next call to recv > - * then returns ENOBUFS. We can then re-send the request. > - * > - * Caveats: > - * > - * 1. Netlink depends on sequence numbers to match up requests and > - * replies. The sender of a request supplies a sequence number, and > - * the reply echos back that sequence number. > - * > - * This is fine, but (1) some kernel netlink implementations are > - * broken, in that they fail to echo sequence numbers and (2) this > - * function will drop packets with non-matching sequence numbers, so > - * that only a single request can be usefully transacted at a time. > - * > - * 2. Resending the request causes it to be re-executed, so the request > - * needs to be idempotent. > - */ > -int > +static int > nl_sock_transact(struct nl_sock *sock, const struct ofpbuf *request, > struct ofpbuf **replyp) { @@ -1124,6 +1065,47 @@ > nl_pool_release(struct nl_sock *sock) > } > } > > +/* Sends 'request' to the kernel on a Netlink socket for the given 'protocol' > + * (e.g. NETLINK_ROUTE or NETLINK_GENERIC) and waits for a response. > +If > + * successful, returns 0. On failure, returns a positive errno value. > + * > + * If 'replyp' is nonnull, then on s
Re: [ovs-dev] [PATCH 2/2] netlink: Convert netlink_lookup() to use RCU protected hash table
On 2014-07-29 at 13:41:33 +0200, Thomas Graf wrote: > Heavy Netlink users such as Open vSwitch spend a considerable amount of > time in netlink_lookup() due to the read-lock on nl_table_lock. Use of > RCU relieves the lock contention. > > Makes use of the new resizable hash table to avoid locking on the > lookup. > > The hash table will grow if entries exceeds 75% of table size up to a > total table size of 64K. It will automatically shrink if usage falls > below 50%. > > Also splits nl_table_lock into a separate spinlock to protect hash table > mutations. This avoids a possible deadlock when the hash table growing > waits on RCU readers to complete via synchronize_rcu() while readers > holding RCU read lock are waiting on the nl_table_lock() to be released > to lock the table for broadcasting. > > Before: >9.16% kpktgend_0 [openvswitch] [k] masked_flow_lookup >6.42% kpktgend_0 [pktgen] [k] mod_cur_headers >6.26% kpktgend_0 [pktgen] [k] pktgen_thread_worker >6.23% kpktgend_0 [kernel.kallsyms] [k] memset >4.79% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup >4.37% kpktgend_0 [kernel.kallsyms] [k] memcpy >3.60% kpktgend_0 [openvswitch] [k] ovs_flow_extract >2.69% kpktgend_0 [kernel.kallsyms] [k] jhash2 > > After: > 15.26% kpktgend_0 [openvswitch] [k] masked_flow_lookup >8.12% kpktgend_0 [pktgen] [k] pktgen_thread_worker >7.92% kpktgend_0 [pktgen] [k] mod_cur_headers >5.11% kpktgend_0 [kernel.kallsyms] [k] memset >4.11% kpktgend_0 [openvswitch] [k] ovs_flow_extract >4.06% kpktgend_0 [kernel.kallsyms] [k] _raw_spin_lock >3.90% kpktgend_0 [kernel.kallsyms] [k] jhash2 >[...] >0.67% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup > > Signed-off-by: Thomas Graf > --- > net/netlink/af_netlink.c | 285 > ++- > net/netlink/af_netlink.h | 18 +-- > net/netlink/diag.c | 11 +- > 3 files changed, 119 insertions(+), 195 deletions(-) > > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c > index 1b38f7f..7f44468 100644 > --- a/net/netlink/af_netlink.c > +++ b/net/netlink/af_netlink.c [...] > @@ -2996,28 +2939,26 @@ static void *netlink_seq_next(struct seq_file *seq, > void *v, loff_t *pos) > > net = seq_file_net(seq); > iter = seq->private; > - s = v; > - do { > - s = sk_next(s); > - } while (s && !nl_table[s->sk_protocol].compare(net, s)); > - if (s) > - return s; > + nlk = v; > + > + rht_for_each_entry_rcu(nlk, nlk->node.next, node) > + if (net_eq(sock_net((struct sock *)nlk), net)) > + return nlk; > > i = iter->link; > j = iter->hash_idx + 1; > > do { > - struct nl_portid_hash *hash = &nl_table[i].hash; > - > - for (; j <= hash->mask; j++) { > - s = sk_head(&hash->table[j]); > + struct rhashtable *ht = &nl_table[i].hash; > + const struct bucket_table *tbl = rht_dereference(ht->tbl, ht); > > - while (s && !nl_table[s->sk_protocol].compare(net, s)) > - s = sk_next(s); > - if (s) { > - iter->link = i; > - iter->hash_idx = j; > - return s; > + for (; j <= tbl->size; j++) { Should the condition j < tbl->size here, since the bucket table only contains `size' buckets? > + rht_for_each_entry_rcu(nlk, tbl->buckets[j], node) { > + if (net_eq(sock_net((struct sock *)nlk), net)) { > + iter->link = i; > + iter->hash_idx = j; > + return nlk; > + } > } > } [...] > diff --git a/net/netlink/diag.c b/net/netlink/diag.c > index 1af29624..7588f34 100644 > --- a/net/netlink/diag.c > +++ b/net/netlink/diag.c [...] > @@ -101,16 +102,20 @@ static int __netlink_diag_dump(struct sk_buff *skb, > struct netlink_callback *cb, > int protocol, int s_num) > { > struct netlink_table *tbl = &nl_table[protocol]; > - struct nl_portid_hash *hash = &tbl->hash; > + struct rhashtable *ht = &tbl->hash; > + const struct bucket_table *htbl = rht_dereference(ht->tbl, ht); > struct net *net = sock_net(skb->sk); > struct netlink_diag_req *req; > + struct netlink_sock *nlsk; > struct sock *sk; > int ret = 0, num = 0, i; > > req = nlmsg_data(cb->nlh); > > - for (i = 0; i <= hash->mask; i++) { > - sk_for_each(sk, &hash->table[i]) { > + for (i = 0; i <= htbl->size; i++) { Same here, condition should be i < htbl->size > + rht_for_each_entry(nlsk, h
Re: [ovs-dev] [PATCH] netlink-socket: Add conceptual documentation.
Thanks for the review. I realized that I left off a couple of important points, so I'm folding in the following and will apply this to master soon. diff --git a/lib/netlink-socket.h b/lib/netlink-socket.h index 1450862..cdf32d2 100644 --- a/lib/netlink-socket.h +++ b/lib/netlink-socket.h @@ -149,6 +149,13 @@ * to select one of the PIDs, and sends the packet (encapsulated in an * Open vSwitch Netlink message) to the socket with the selected PID. * + * nl_sock_recv() reads notifications sent this way. + * + * Messages received this way can overflow, just like multicast + * subscription messages, and they are reported the same way. Because + * packet notification messages do not report the state of a table, there + * is no way to recover the dropped packets; they are simply lost. + * * The main reason to support multiple PIDs per vport is to increase * fairness, that is, to make it harder for a single high-flow-rate * sender to drown out lower rate sources. Multiple PIDs per vport might On Tue, Jul 29, 2014 at 03:29:53PM +, Saurabh Shah wrote: > This is excellent. Thanks, Ben. > > > -Original Message- > > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff > > Sent: Monday, July 28, 2014 11:35 PM > > To: dev@openvswitch.org > > Cc: Ben Pfaff > > Subject: [ovs-dev] [PATCH] netlink-socket: Add conceptual documentation. > > > > Based on a conversation with the VMware Hyper-V team earlier today. > > > > This commit also changes a couple of functions that were only used with > > netlink-socket.c into static functions. I couldn't think of a reason for > > code > > outside that file to use them. > > > > Signed-off-by: Ben Pfaff > > --- > > lib/netlink-socket.c | 124 +-- > > lib/netlink-socket.h | 145 > > +++ > > 2 files changed, 197 insertions(+), 72 deletions(-) > > > > diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index > > 09d3a61..0ff85d6 > > 100644 > > --- a/lib/netlink-socket.c > > +++ b/lib/netlink-socket.c > > @@ -537,26 +537,7 @@ nl_sock_transact_multiple__(struct nl_sock *sock, > > return error; > > } > > > > -/* Sends the 'request' member of the 'n' transactions in 'transactions' on > > - * 'sock', in order, and receives responses to all of them. Fills in the > > - * 'error' member of each transaction with 0 if it was successful, > > otherwise > > - * with a positive errno value. If 'reply' is nonnull, then it will be > > filled > > - * with the reply if the message receives a detailed reply. In other > > cases, > > - * i.e. where the request failed or had no reply beyond an indication of > > - * success, 'reply' will be cleared if it is nonnull. > > - * > > - * The caller is responsible for destroying each request and reply, and the > > - * transactions array itself. > > - * > > - * Before sending each message, this function will finalize nlmsg_len in > > each > > - * 'request' to match the ofpbuf's size, set nlmsg_pid to 'sock''s pid, > > and > > - * initialize nlmsg_seq. > > - * > > - * Bare Netlink is an unreliable transport protocol. This function layers > > - * reliable delivery and reply semantics on top of bare Netlink. See > > - * nl_sock_transact() for some caveats. > > - */ > > -void > > +static void > > nl_sock_transact_multiple(struct nl_sock *sock, > >struct nl_transaction **transactions, size_t n) > > { @@ -611,47 > > +592,7 @@ nl_sock_transact_multiple(struct nl_sock *sock, > > } > > } > > > > -/* Sends 'request' to the kernel via 'sock' and waits for a response. If > > - * successful, returns 0. On failure, returns a positive errno value. > > - * > > - * If 'replyp' is nonnull, then on success '*replyp' is set to the kernel's > > - * reply, which the caller is responsible for freeing with > > ofpbuf_delete(), and > > - * on failure '*replyp' is set to NULL. If 'replyp' is null, then the > > kernel's > > - * reply, if any, is discarded. > > - * > > - * Before the message is sent, nlmsg_len in 'request' will be finalized to > > - * match ofpbuf_size(msg), nlmsg_pid will be set to 'sock''s pid, and > > nlmsg_seq will > > - * be initialized, NLM_F_ACK will be set in nlmsg_flags. > > - * > > - * The caller is responsible for destroying 'request'. > > - * > > - * Bare Netlink is an unreliable transport protocol. This function layers > > - * reliable delivery and reply semantics on top of bare Netlink. > > - * > > - * In Netlink, sending a request to the kernel is reliable enough, because > > the > > - * kernel will tell us if the message cannot be queued (and we will in that > > - * case put it on the transmit queue and wait until it can be delivered). > > - * > > - * Receiving the reply is the real problem: if the socket buffer is full > > when > > - * the kernel tries to send the reply, the reply will be dropped.
Re: [ovs-dev] [PATCH] cmap: Merge CMAP_FOR_EACH_SAFE into CMAP_FOR_EACH.
On Mon, Jul 28, 2014 at 02:03:49PM -0700, Jarno Rajahalme wrote: > Ben, > > Thanks for reminding, somehow I lost track of this. > > This is actually a nice cleanup :-) > > Acked-by: Jarno Rajahalme Thanks, applied to master. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] tests: Add another test for extended registers.
ONF-JIRA: EXT-244 Suggested-by: Jean Tourrilhes Signed-off-by: Ben Pfaff --- tests/ofproto-dpif.at | 21 + 1 file changed, 21 insertions(+) diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at index 4386639..9ee1698 100644 --- a/tests/ofproto-dpif.at +++ b/tests/ofproto-dpif.at @@ -410,6 +410,27 @@ AT_CHECK([tail -1 stdout], [0], OVS_VSWITCHD_STOP AT_CLEANUP +dnl Tests that the standardized xregs are mapped onto the legacy OVS registers +dnl in the manner documented in ovs-ofctl(8). +AT_SETUP([ofproto-dpif - extended registers]) +OVS_VSWITCHD_START +ADD_OF_PORTS([br0], [1], [2], [3]) +AT_DATA([flows.txt], [dnl +table=0 actions=load:0xfedcba9876543210->OXM_OF_PKT_REG1[[]],resubmit(,1) +table=1,reg2=0xfedcba98,reg3=0x76543210 actions=2 + +# These low-priority rules shouldn't match. They're here only to make really +# sure that the test fails if either of the above rules fails to match. +table=0,priority=0actions=3 +table=1,priority=0actions=3 +]) +AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) +AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)'], [0], [stdout]) +AT_CHECK([tail -1 stdout], [0], [Datapath actions: 2 +]) +OVS_VSWITCHD_STOP +AT_CLEANUP + AT_SETUP([ofproto-dpif - push-pop]) OVS_VSWITCHD_START ADD_OF_PORTS([br0], [20], [21], [22], [33], [90]) -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 1/5] Add defines, enums and headers for MSVC
On Tue, Jul 29, 2014 at 03:21:16PM +, Alin Serdean wrote: > Add defines needed to compile netlink-socket.c and netlink.c. > > Add a wrapper and the functionality behind it for syconf. > > Add the newly created files to the noinst_HEADERS in windows/automake.mk > > Signed-off-by: Alin Gabriel Serdean Applied to master, thanks! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] plan to integrate Hyper-V support
We have pushed in the changes we made upon our repository: https://github.com/cloudbase/openvswitch-hyperv-kernel.git This conspicuous update includes: * Rebased on the the tip of master. * Functionality to take into account the NETLINK padding. * Hashing (SpookyHash) to improve performance over previous linear searches. * Some heavy refactoring to make the code more readable. * Packet fragmentation fixes * DirectIO performance improvements in place of the previous buffer copying Kind Regards, Alin. From: Alessandro Pilotti Sent: Tuesday, July 29, 2014 11:32 AM To: Nithin Raju Cc: Ben Pfaff; Saurabh Shah; dev@openvswitch.org; Alin Serdean; Guolin Yang; Eitan Eliahu; Rajiv Krishnamurthy Subject: Re: plan to integrate Hyper-V support Hi Nithin, I'd also add at least: 1) Design for mapping hyper-v ports to OVS ports 2) CI testing plans 3) Roadmap Thanks, Alessandro On 29.07.2014, at 05:01, "Nithin Raju" wrote: > 10.00 AM Pacific - Sounds good to me. > > hi Alessandro, > We'll be meeting tomorrow. In the light of that, we had the following items > on mind that we could discuss. Pls. feel free to edit the list: > > 1. Helping Alin to run some tests successfully (with the V2 patch that > Saurabh posted last week). > 2. Any design or code walkthrough for the kernel code. > 3. Netlink interface related discussion. > 4. Architectural topics. > > See you tomorrow! > > thanks, > Nithin ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 3/5] Bypass HAVE_NETLINK for MSVC
On Tue, Jul 29, 2014 at 03:22:37PM +, Alin Serdean wrote: > Bypass the error compilation when compiling under MSVC. > > Signed-off-by: Alin Gabriel Serdean Applied to master, thanks! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] netlink: Convert netlink_lookup() to use RCU protected hash table
On 07/29/14 at 05:58pm, Tobias Klauser wrote: > On 2014-07-29 at 13:41:33 +0200, Thomas Graf wrote: > > Heavy Netlink users such as Open vSwitch spend a considerable amount of > > time in netlink_lookup() due to the read-lock on nl_table_lock. Use of > > RCU relieves the lock contention. > > > > Makes use of the new resizable hash table to avoid locking on the > > lookup. > > > > The hash table will grow if entries exceeds 75% of table size up to a > > total table size of 64K. It will automatically shrink if usage falls > > below 50%. > > > > Also splits nl_table_lock into a separate spinlock to protect hash table > > mutations. This avoids a possible deadlock when the hash table growing > > waits on RCU readers to complete via synchronize_rcu() while readers > > holding RCU read lock are waiting on the nl_table_lock() to be released > > to lock the table for broadcasting. > > > > Before: > >9.16% kpktgend_0 [openvswitch] [k] masked_flow_lookup > >6.42% kpktgend_0 [pktgen] [k] mod_cur_headers > >6.26% kpktgend_0 [pktgen] [k] pktgen_thread_worker > >6.23% kpktgend_0 [kernel.kallsyms] [k] memset > >4.79% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup > >4.37% kpktgend_0 [kernel.kallsyms] [k] memcpy > >3.60% kpktgend_0 [openvswitch] [k] ovs_flow_extract > >2.69% kpktgend_0 [kernel.kallsyms] [k] jhash2 > > > > After: > > 15.26% kpktgend_0 [openvswitch] [k] masked_flow_lookup > >8.12% kpktgend_0 [pktgen] [k] pktgen_thread_worker > >7.92% kpktgend_0 [pktgen] [k] mod_cur_headers > >5.11% kpktgend_0 [kernel.kallsyms] [k] memset > >4.11% kpktgend_0 [openvswitch] [k] ovs_flow_extract > >4.06% kpktgend_0 [kernel.kallsyms] [k] _raw_spin_lock > >3.90% kpktgend_0 [kernel.kallsyms] [k] jhash2 > >[...] > >0.67% kpktgend_0 [kernel.kallsyms] [k] netlink_lookup > > > > Signed-off-by: Thomas Graf > > --- > > net/netlink/af_netlink.c | 285 > > ++- > > net/netlink/af_netlink.h | 18 +-- > > net/netlink/diag.c | 11 +- > > 3 files changed, 119 insertions(+), 195 deletions(-) > > > > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c > > index 1b38f7f..7f44468 100644 > > --- a/net/netlink/af_netlink.c > > +++ b/net/netlink/af_netlink.c > [...] > > @@ -2996,28 +2939,26 @@ static void *netlink_seq_next(struct seq_file *seq, > > void *v, loff_t *pos) > > > > net = seq_file_net(seq); > > iter = seq->private; > > - s = v; > > - do { > > - s = sk_next(s); > > - } while (s && !nl_table[s->sk_protocol].compare(net, s)); > > - if (s) > > - return s; > > + nlk = v; > > + > > + rht_for_each_entry_rcu(nlk, nlk->node.next, node) > > + if (net_eq(sock_net((struct sock *)nlk), net)) > > + return nlk; > > > > i = iter->link; > > j = iter->hash_idx + 1; > > > > do { > > - struct nl_portid_hash *hash = &nl_table[i].hash; > > - > > - for (; j <= hash->mask; j++) { > > - s = sk_head(&hash->table[j]); > > + struct rhashtable *ht = &nl_table[i].hash; > > + const struct bucket_table *tbl = rht_dereference(ht->tbl, ht); > > > > - while (s && !nl_table[s->sk_protocol].compare(net, s)) > > - s = sk_next(s); > > - if (s) { > > - iter->link = i; > > - iter->hash_idx = j; > > - return s; > > + for (; j <= tbl->size; j++) { > > Should the condition j < tbl->size here, since the bucket table only > contains `size' buckets? Good catch, thanks > > > > + rht_for_each_entry_rcu(nlk, tbl->buckets[j], node) { > > + if (net_eq(sock_net((struct sock *)nlk), net)) { > > + iter->link = i; > > + iter->hash_idx = j; > > + return nlk; > > + } > > } > > } > [...] > > diff --git a/net/netlink/diag.c b/net/netlink/diag.c > > index 1af29624..7588f34 100644 > > --- a/net/netlink/diag.c > > +++ b/net/netlink/diag.c > [...] > > @@ -101,16 +102,20 @@ static int __netlink_diag_dump(struct sk_buff *skb, > > struct netlink_callback *cb, > > int protocol, int s_num) > > { > > struct netlink_table *tbl = &nl_table[protocol]; > > - struct nl_portid_hash *hash = &tbl->hash; > > + struct rhashtable *ht = &tbl->hash; > > + const struct bucket_table *htbl = rht_dereference(ht->tbl, ht); > > struct net *net = sock_net(skb->sk); > > struct netlink_diag_req *req; > > + struct netlink_sock *nlsk; > > struct sock *sk; > > int ret = 0, num = 0, i; > > > > req = nlmsg_data(cb->nlh); > > > > - fo
Re: [ovs-dev] [PATCH] tests: Add another test for extended registers.
Hi, Unfortunately, I don't really have time to test the code, but from here it looks good and make sense. Have fun... Jean Acked-by: Jean Tourrilhes On Tue, Jul 29, 2014 at 09:27:23AM -0700, Ben Pfaff wrote: > ONF-JIRA: EXT-244 > Suggested-by: Jean Tourrilhes > Signed-off-by: Ben Pfaff > --- > tests/ofproto-dpif.at | 21 + > 1 file changed, 21 insertions(+) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 4/5] Changes needed to netlink-socket for MSVC
On Tue, Jul 29, 2014 at 03:23:28PM +, Alin Serdean wrote: > Add two functions set_sock_pid_in_kernel and portid_next. This will allow > the channel identification for the kernel extension to send back messages. > > Replace send with WriteFile equivalent and ignore nl_sock_drain for the moment > under MSVC. > > Replace sendmsg and recvmsg with ReadFile and WriteFile equivalents. > > On MSVC put in handle instead of fd(sock->fd becomes sock->handle). > > Creation of the netlink socket will be replaced by CreateFile equivalent. > > Add MAX_STACK_LENGTH for MSVC this will be our maximmum for on-stack copy > buffer. > > Signed-off-by: Alin Gabriel Serdean I'm not convinced that this will be the final Netlink implementation for Windows, but it seems like an OK place to start. I'm folding in the following, which is mostly coding style change but the removal of the ovs_mutex_init() call is actually important. diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index d648761..3dbfe59 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -53,15 +53,17 @@ static struct ovs_mutex portid_mutex = OVS_MUTEX_INITIALIZER; static uint32_t g_last_portid = 0; /* Port IDs must be unique! */ -uint32_t -portid_next() OVS_GUARDED_BY(portid_mutex) +static uint32_t +portid_next(void) +OVS_GUARDED_BY(portid_mutex) { g_last_portid++; return g_last_portid; } -void -set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) OVS_GUARDED_BY(portid_mutex) +static void +set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) +OVS_GUARDED_BY(portid_mutex) { struct nlmsghdr msg = { 0 }; @@ -73,7 +75,7 @@ set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) OVS_GUARDED_BY(portid_mutex) WriteFile(handle, &msg, sizeof(struct nlmsghdr), NULL, NULL); } -#endif +#endif /* _WIN32 */ /* A single (bad) Netlink message can in theory dump out many, many log * messages, so the burst size is set quite high here to avoid missing useful @@ -174,7 +176,6 @@ nl_sock_create(int protocol, struct nl_sock **sockp) rcvbuf = 1024 * 1024; #ifdef _WIN32 sock->rcvbuf = rcvbuf; -ovs_mutex_init(&portid_mutex); ovs_mutex_lock(&portid_mutex); sock->pid = portid_next(); set_sock_pid_in_kernel(sock->handle, sock->pid); ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 5/5] Add more files to the openvswitch library on MSVC
On Tue, Jul 29, 2014 at 03:24:08PM +, Alin Serdean wrote: > Add netlink related files to the windows build. > > > Signed-off-by: Alin Gabriel Serdean Applied to master, thanks! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] tests: Add another test for extended registers.
That's good enough for me, for a review of a test case. Thanks! I've applied this to master. On Tue, Jul 29, 2014 at 09:35:57AM -0700, Jean Tourrilhes wrote: > Hi, > > Unfortunately, I don't really have time to test the code, but > from here it looks good and make sense. > Have fun... > > Jean > > Acked-by: Jean Tourrilhes > > On Tue, Jul 29, 2014 at 09:27:23AM -0700, Ben Pfaff wrote: > > ONF-JIRA: EXT-244 > > Suggested-by: Jean Tourrilhes > > Signed-off-by: Ben Pfaff > > --- > > tests/ofproto-dpif.at | 21 + > > 1 file changed, 21 insertions(+) ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 4/5] Changes needed to netlink-socket for MSVC
Thank you for review and the changes. Kind Regards, Alin. From: Ben Pfaff [b...@nicira.com] Sent: Tuesday, July 29, 2014 7:38 PM To: Alin Serdean Cc: dev@openvswitch.org Subject: Re: [ovs-dev] [PATCH v2 4/5] Changes needed to netlink-socket for MSVC On Tue, Jul 29, 2014 at 03:23:28PM +, Alin Serdean wrote: > Add two functions set_sock_pid_in_kernel and portid_next. This will allow > the channel identification for the kernel extension to send back messages. > > Replace send with WriteFile equivalent and ignore nl_sock_drain for the moment > under MSVC. > > Replace sendmsg and recvmsg with ReadFile and WriteFile equivalents. > > On MSVC put in handle instead of fd(sock->fd becomes sock->handle). > > Creation of the netlink socket will be replaced by CreateFile equivalent. > > Add MAX_STACK_LENGTH for MSVC this will be our maximmum for on-stack copy > buffer. > > Signed-off-by: Alin Gabriel Serdean I'm not convinced that this will be the final Netlink implementation for Windows, but it seems like an OK place to start. I'm folding in the following, which is mostly coding style change but the removal of the ovs_mutex_init() call is actually important. diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index d648761..3dbfe59 100644 --- a/lib/netlink-socket.c +++ b/lib/netlink-socket.c @@ -53,15 +53,17 @@ static struct ovs_mutex portid_mutex = OVS_MUTEX_INITIALIZER; static uint32_t g_last_portid = 0; /* Port IDs must be unique! */ -uint32_t -portid_next() OVS_GUARDED_BY(portid_mutex) +static uint32_t +portid_next(void) +OVS_GUARDED_BY(portid_mutex) { g_last_portid++; return g_last_portid; } -void -set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) OVS_GUARDED_BY(portid_mutex) +static void +set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) +OVS_GUARDED_BY(portid_mutex) { struct nlmsghdr msg = { 0 }; @@ -73,7 +75,7 @@ set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) OVS_GUARDED_BY(portid_mutex) WriteFile(handle, &msg, sizeof(struct nlmsghdr), NULL, NULL); } -#endif +#endif /* _WIN32 */ /* A single (bad) Netlink message can in theory dump out many, many log * messages, so the burst size is set quite high here to avoid missing useful @@ -174,7 +176,6 @@ nl_sock_create(int protocol, struct nl_sock **sockp) rcvbuf = 1024 * 1024; #ifdef _WIN32 sock->rcvbuf = rcvbuf; -ovs_mutex_init(&portid_mutex); ovs_mutex_lock(&portid_mutex); sock->pid = portid_next(); set_sock_pid_in_kernel(sock->handle, sock->pid); ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] tests: Turn off appctl poll_loop logging for long output.
One of the VMware internal autobuilder builds failed due to extraneous logging in these tests of the form: 2014-07-28T21:11:07Z|1|poll_loop|INFO|wakeup due to [POLLIN] on fd 3 (...) at lib/stream-fd-unix.c:124 (93% CPU usage) I think this must be because these tests have tons of output and so on a loaded machine it can take some CPU to pull it down. We don't want to fail for that reason, so disable this logging. Signed-off-by: Ben Pfaff --- tests/ofproto-dpif.at |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at index 9ee1698..0253cb0 100644 --- a/tests/ofproto-dpif.at +++ b/tests/ofproto-dpif.at @@ -5458,7 +5458,7 @@ AT_BANNER([ofproto-dpif - flow translation resource limits]) AT_SETUP([ofproto-dpif - infinite resubmit]) OVS_VSWITCHD_START AT_CHECK([ovs-ofctl add-flow br0 actions=resubmit:1,resubmit:2,output:3]) -AT_CHECK([ovs-appctl ofproto/trace br0 'eth_dst=ff:ff:ff:ff:ff:ff'], +AT_CHECK([ovs-appctl -vpoll_loop:off ofproto/trace br0 'eth_dst=ff:ff:ff:ff:ff:ff'], [0], [stdout]) AT_CHECK([tail -1 stdout], [0], [Datapath actions: drop ]) @@ -5477,7 +5477,7 @@ ADD_OF_PORTS([br0], 1) done echo "in_port=65, actions=local") > flows AT_CHECK([ovs-ofctl add-flows br0 flows]) -AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1'], [0], [stdout]) +AT_CHECK([ovs-appctl -vpoll_loop:off ofproto/trace br0 'in_port=1'], [0], [stdout]) AT_CHECK([grep -c 'over 4096 resubmit actions' ovs-vswitchd.log], [0], [1 ]) OVS_VSWITCHD_STOP(["/over.*resubmit actions/d"]) @@ -5492,7 +5492,7 @@ ADD_OF_PORTS([br0], 1) done echo "in_port=13, actions=local,local,local,local,local,local,local,local") > flows AT_CHECK([ovs-ofctl add-flows br0 flows]) -AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1'], [0], [stdout]) +AT_CHECK([ovs-appctl -vpoll_loop:off ofproto/trace br0 'in_port=1'], [0], [stdout]) AT_CHECK([grep -c -e '- Uses action(s) not supported by datapath' stdout], [0], [1 ]) @@ -5511,7 +5511,7 @@ ADD_OF_PORTS([br0], 1) push="push:NXM_NX_REG0[[]]" echo "in_port=13, actions=$push,$push,$push,$push,$push,$push,$push,$push") > flows AT_CHECK([ovs-ofctl add-flows br0 flows]) -AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1'], [0], [stdout]) +AT_CHECK([ovs-appctl -vpoll_loop:off ofproto/trace br0 'in_port=1'], [0], [stdout]) AT_CHECK([grep -c 'resubmits yielded over 64 kB of stack' ovs-vswitchd.log], [0], [1 ]) OVS_VSWITCHD_STOP(["/resubmits yielded over 64 kB of stack/d"]) -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] lib: Resizable, Scalable, Concurrent Hash Table
On 07/29/14 at 08:30am, Josh Triplett wrote: > On Tue, Jul 29, 2014 at 01:41:32PM +0200, Thomas Graf wrote: > > Generic implementation of a resizable, scalable, concurrent hash table > > based on [0]. The implementation supports both, fixed size keys specified > > via an offset and length, or arbitrary keys via own hash and compare > > functions. > > > > Lookups are lockless and protected as RCU read side critical sections. > > Automatic growing/shrinking based on user configurable watermarks is > > available while allowing concurrent lookups to take place. > > > > Objects to be hashed must include a struct rhash_head. The reason for not > > using the existing struct hlist_head is that the expansion and shrinking > > will have two buckets point to a single entry which would lead in obscure > > reverse chaining behaviour. > > > > Code includes a boot selftest if CONFIG_TEST_RHASHTABLE is defined. > > > > [0] https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf > > > > Signed-off-by: Thomas Graf > > Thanks for working on this! > > Currently reading the algorithm implementation in more detail. A couple > of minor issues below. Thanks for the initial review! > > > --- /dev/null > > +++ b/include/linux/rhashtable.h > [...] > > +struct rhash_head { > > + struct rhash_head *next; > > +}; > > Arguably this could become a new singly-linked list type in list.h; > we don't currently have one. No objections and I'm very open to this if there is use for it aside from this hash table implementation. > > +/** > > + * rht_for_each_entry_safe - safely iterate over hash chain of given type > > + * @pos: type * to use as a loop cursor. > > + * @n: type * to use for temporary next object storage > > + * @head: head of the hash chain (struct rhash_head *) > > + * @ht:pointer to your struct rhashtable > > + * @member:name of the rhash_head within the hashable struct. > > + * > > + * This hash chain list-traversal primitive allows for the looped code to > > + * remove the loop cursor from the list. > > + */ > > +#define rht_for_each_entry_safe(pos, n, head, ht, member) \ > > + for (pos = rht_entry_safe(rht_dereference(head, ht), \ > > + typeof(*(pos)), member), \ > > +n = rht_entry_safe(rht_dereference((pos)->member.next, ht), \ > > +typeof(*(pos)), member); \ > > +pos; \ > > +pos = n, \ > > +n = rht_entry_safe(rht_dereference((pos)->member.next, ht), \ > > +typeof(*(pos)), member)) > > Given that removal requires special care, is this something actually > needed in the public interface, or only internally? You don't > necessarily need to define a full set of list primitives. (Unless of > course you make this a new list type in list.h.) The main purpose of the safe variant is to allow API users to easily destruct all table entries in the end and do something like this: tbl = rht_dereference(ht->tbl, ht); for (i = 0; i < tbl->size; i++) rht_for_each_entry_safe(obj, next, tbl->buckets[i], ht, node) kfree(obj); > > --- /dev/null > > +++ b/lib/rhashtable.c > [...] > > +#define ASSERT_RHT_MUTEX(HT) > > BUG_ON(unlikely(!lockdep_rht_mutex_is_held(HT))) > > BUG_ON and WARN_ON already include unlikely(); you don't need to add it > yourself. Thanks, will fix. > > +/** > > + * rht_obj - cast hash head to outer object > > + * @ht:hash table > > + * @he:hashed node > > + */ > > +void *rht_obj(const struct rhashtable *ht, const struct rhash_head *he) > > +{ > > + return (void *) he - ht->p.head_offset; > > +} > > +EXPORT_SYMBOL_GPL(rht_obj); > > Should this be a static inline? And, will the runtime indirection > involved in head_offset add unnecessary overhead for tables of a known > type? (I'd expect a head_offset of 0 in common cases.) I left the optimization to the compiler here as I would expect it to inline this automatically. As for the overhead of the head_offset, I think it is minimal and justified by the added flexibility. The overhead is basically the same as for lists. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] datapath: Remove redundant key ref from upcall_info.
struct dp_upcall_info has pointer to pkt_key which is already available in OVS_CB. This also simplifies upcall handling for gso packet. Signed-off-by: Pravin B Shelar --- datapath/actions.c | 3 --- datapath/datapath.c | 34 +- datapath/datapath.h | 4 +--- 3 files changed, 18 insertions(+), 23 deletions(-) diff --git a/datapath/actions.c b/datapath/actions.c index 39a21f4..792c3a9 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -503,10 +503,7 @@ static int output_userspace(struct datapath *dp, struct sk_buff *skb, const struct nlattr *a; int rem; - BUG_ON(!OVS_CB(skb)->pkt_key); - upcall.cmd = OVS_PACKET_CMD_ACTION; - upcall.key = OVS_CB(skb)->pkt_key; upcall.userdata = NULL; upcall.portid = 0; diff --git a/datapath/datapath.c b/datapath/datapath.c index 94539eb..1185f60 100644 --- a/datapath/datapath.c +++ b/datapath/datapath.c @@ -262,6 +262,7 @@ void ovs_dp_process_packet_with_key(struct sk_buff *skb, u32 n_mask_hit; stats = this_cpu_ptr(dp->stats_percpu); + OVS_CB(skb)->pkt_key = pkt_key; /* Look up flow. */ flow = ovs_flow_tbl_lookup_stats(&dp->table, pkt_key, skb_get_hash(skb), @@ -270,7 +271,6 @@ void ovs_dp_process_packet_with_key(struct sk_buff *skb, struct dp_upcall_info upcall; upcall.cmd = OVS_PACKET_CMD_MISS; - upcall.key = pkt_key; upcall.userdata = NULL; upcall.portid = ovs_vport_find_upcall_portid(p, skb); ovs_dp_upcall(dp, skb, &upcall); @@ -279,7 +279,6 @@ void ovs_dp_process_packet_with_key(struct sk_buff *skb, goto out; } - OVS_CB(skb)->pkt_key = pkt_key; OVS_CB(skb)->flow = flow; ovs_flow_stats_update(OVS_CB(skb)->flow, pkt_key->tp.flags, skb); @@ -316,6 +315,8 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb, struct dp_stats_percpu *stats; int err; + BUG_ON(!OVS_CB(skb)->pkt_key); + if (upcall_info->portid == 0) { err = -ENOTCONN; goto err; @@ -344,7 +345,6 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb, const struct dp_upcall_info *upcall_info) { unsigned short gso_type = skb_shinfo(skb)->gso_type; - struct dp_upcall_info later_info; struct sw_flow_key later_key; struct sk_buff *segs, *nskb; int err; @@ -353,25 +353,25 @@ static int queue_gso_packets(struct datapath *dp, struct sk_buff *skb, if (IS_ERR(segs)) return PTR_ERR(segs); + if (gso_type & SKB_GSO_UDP) { + /* The initial flow key extracted by ovs_flow_extract() +* in this case is for a first fragment, so we need to +* properly mark later fragments. +*/ + later_key = *OVS_CB(skb)->pkt_key; + later_key.ip.frag = OVS_FRAG_TYPE_LATER; + } + /* Queue all of the segments. */ skb = segs; do { + if (gso_type & SKB_GSO_UDP && skb != segs) + OVS_CB(skb)->pkt_key = &later_key; + err = queue_userspace_packet(dp, skb, upcall_info); if (err) break; - if (skb == segs && gso_type & SKB_GSO_UDP) { - /* The initial flow key extracted by ovs_flow_extract() -* in this case is for a first fragment, so we need to -* properly mark later fragments. -*/ - later_key = *upcall_info->key; - later_key.ip.frag = OVS_FRAG_TYPE_LATER; - - later_info = *upcall_info; - later_info.key = &later_key; - upcall_info = &later_info; - } } while ((skb = skb->next)); /* Free all of the segments. */ @@ -437,6 +437,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb, struct ovs_header *upcall; struct sk_buff *nskb = NULL; struct sk_buff *user_skb; /* to be queued to userspace */ + struct sw_flow_key *pkt_key = OVS_CB(skb)->pkt_key; struct nlattr *nla; struct genl_info info = { #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,14,0) @@ -497,8 +498,7 @@ static int queue_userspace_packet(struct datapath *dp, struct sk_buff *skb, upcall->dp_ifindex = dp_ifindex; nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_KEY); - err = ovs_nla_put_flow(dp, upcall_info->key, - upcall_info->key, user_skb); + err = ovs_nla_put_flow(dp, pkt_key, pkt_key, user_skb); BUG_ON(err); nla_nest_end(user_skb, nla); diff --git a/datapath/datapath.h b/datapath/datapath.h index d6dee50..f2e8d6b 100644 --- a/dat
[ovs-dev] [PATCH 1/2] connmgr: Fix a typo.
Signed-off-by: Alex Wang --- ofproto/connmgr.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c index b7d5d3b..c6432bf 100644 --- a/ofproto/connmgr.c +++ b/ofproto/connmgr.c @@ -1496,7 +1496,7 @@ ofconn_wants_packet_in_on_miss(struct ofconn *ofconn, /* The default "table-miss" behaviour for OpenFlow1.3+ is to drop the * packet rather than to send the packet to the controller. * - * This function returns false to indicate that a packet_in message + * This function returns true to indicate that a packet_in message * for a "table-miss" should be sent to at least one controller. * That is there is at least one controller with controller_id 0 * which connected using an OpenFlow version earlier than OpenFlow1.3. -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.
connmgr_wants_packet_in_on_miss() is called by multiple threads and thusly should be protected by the mutex. Signed-off-by: Alex Wang --- ofproto/connmgr.c |6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c index c6432bf..89af6b6 100644 --- a/ofproto/connmgr.c +++ b/ofproto/connmgr.c @@ -1506,19 +1506,23 @@ ofconn_wants_packet_in_on_miss(struct ofconn *ofconn, * This logic assumes that "table-miss" packet_in messages * are always sent to controller_id 0. */ bool -connmgr_wants_packet_in_on_miss(struct connmgr *mgr) +connmgr_wants_packet_in_on_miss(struct connmgr *mgr) OVS_EXCLUDED(ofproto_mutex) { struct ofconn *ofconn; +ovs_mutex_lock(&ofproto_mutex); LIST_FOR_EACH (ofconn, node, &mgr->all_conns) { enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); if (ofconn->controller_id == 0 && (protocol == OFPUTIL_P_NONE || ofputil_protocol_to_ofp_version(protocol) < OFP13_VERSION)) { +ovs_mutex_unlock(&ofproto_mutex); return true; } } +ovs_mutex_unlock(&ofproto_mutex); + return false; } -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.
The connmgr_wants_packet_in_on_miss() should only check 'ofconn's of type OFCONN_PRIMARY (i.e. controller connections). This commit makes it happen. Signed-off-by: Alex Wang --- ofproto/connmgr.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c index 89af6b6..49ced1f 100644 --- a/ofproto/connmgr.c +++ b/ofproto/connmgr.c @@ -1512,13 +1512,15 @@ connmgr_wants_packet_in_on_miss(struct connmgr *mgr) OVS_EXCLUDED(ofproto_mutex) ovs_mutex_lock(&ofproto_mutex); LIST_FOR_EACH (ofconn, node, &mgr->all_conns) { -enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); - -if (ofconn->controller_id == 0 && -(protocol == OFPUTIL_P_NONE || - ofputil_protocol_to_ofp_version(protocol) < OFP13_VERSION)) { -ovs_mutex_unlock(&ofproto_mutex); -return true; +if (ofconn->type == OFCONN_PRIMARY) { +enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); + +if (ofconn->controller_id == 0 && +(protocol == OFPUTIL_P_NONE || + ofputil_protocol_to_ofp_version(protocol) < OFP13_VERSION)) { +ovs_mutex_unlock(&ofproto_mutex); +return true; +} } } ovs_mutex_unlock(&ofproto_mutex); -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.
Hey Simon, I think we should not check 'ofconn's for OFCONN_SERVICE... This change broke several tests: *Please send `tests/testsuite.log' and all information you think might help:* * To: >* * Subject: [openvswitch 2.3.0] testsuite: 737 739 747 817 failed* Wherein we use the ofctl monitor to simulate controller and capture the packet_ins. So, could you help me clarify if we need consider ofctl conn as a controller connection? Thanks, Alex Wang, On Tue, Jul 29, 2014 at 11:24 AM, Alex Wang wrote: > The connmgr_wants_packet_in_on_miss() should only check 'ofconn's > of type OFCONN_PRIMARY (i.e. controller connections). This commit > makes it happen. > > Signed-off-by: Alex Wang > --- > ofproto/connmgr.c | 16 +--- > 1 file changed, 9 insertions(+), 7 deletions(-) > > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c > index 89af6b6..49ced1f 100644 > --- a/ofproto/connmgr.c > +++ b/ofproto/connmgr.c > @@ -1512,13 +1512,15 @@ connmgr_wants_packet_in_on_miss(struct connmgr > *mgr) OVS_EXCLUDED(ofproto_mutex) > > ovs_mutex_lock(&ofproto_mutex); > LIST_FOR_EACH (ofconn, node, &mgr->all_conns) { > -enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); > - > -if (ofconn->controller_id == 0 && > -(protocol == OFPUTIL_P_NONE || > - ofputil_protocol_to_ofp_version(protocol) < OFP13_VERSION)) { > -ovs_mutex_unlock(&ofproto_mutex); > -return true; > +if (ofconn->type == OFCONN_PRIMARY) { > +enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); > + > +if (ofconn->controller_id == 0 && > +(protocol == OFPUTIL_P_NONE || > + ofputil_protocol_to_ofp_version(protocol) < > OFP13_VERSION)) { > +ovs_mutex_unlock(&ofproto_mutex); > +return true; > +} > } > } > ovs_mutex_unlock(&ofproto_mutex); > -- > 1.7.9.5 > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] connmgr: Fix a typo.
Acked-by: Andy Zhou On Tue, Jul 29, 2014 at 11:20 AM, Alex Wang wrote: > Signed-off-by: Alex Wang > --- > ofproto/connmgr.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c > index b7d5d3b..c6432bf 100644 > --- a/ofproto/connmgr.c > +++ b/ofproto/connmgr.c > @@ -1496,7 +1496,7 @@ ofconn_wants_packet_in_on_miss(struct ofconn *ofconn, > /* The default "table-miss" behaviour for OpenFlow1.3+ is to drop the > * packet rather than to send the packet to the controller. > * > - * This function returns false to indicate that a packet_in message > + * This function returns true to indicate that a packet_in message > * for a "table-miss" should be sent to at least one controller. > * That is there is at least one controller with controller_id 0 > * which connected using an OpenFlow version earlier than OpenFlow1.3. > -- > 1.7.9.5 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.
Should we also consider lock annotate the ofproto->connmgr variable? Also, it seems to me the lock can be per ofproto, instead of a global lock, but I am not sure this use case will benefit from a finer grain lock. On Tue, Jul 29, 2014 at 11:20 AM, Alex Wang wrote: > connmgr_wants_packet_in_on_miss() is called by multiple threads > and thusly should be protected by the mutex. > > Signed-off-by: Alex Wang > --- > ofproto/connmgr.c |6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c > index c6432bf..89af6b6 100644 > --- a/ofproto/connmgr.c > +++ b/ofproto/connmgr.c > @@ -1506,19 +1506,23 @@ ofconn_wants_packet_in_on_miss(struct ofconn *ofconn, > * This logic assumes that "table-miss" packet_in messages > * are always sent to controller_id 0. */ > bool > -connmgr_wants_packet_in_on_miss(struct connmgr *mgr) > +connmgr_wants_packet_in_on_miss(struct connmgr *mgr) > OVS_EXCLUDED(ofproto_mutex) > { > struct ofconn *ofconn; > > +ovs_mutex_lock(&ofproto_mutex); > LIST_FOR_EACH (ofconn, node, &mgr->all_conns) { > enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); > > if (ofconn->controller_id == 0 && > (protocol == OFPUTIL_P_NONE || > ofputil_protocol_to_ofp_version(protocol) < OFP13_VERSION)) { > +ovs_mutex_unlock(&ofproto_mutex); > return true; > } > } > +ovs_mutex_unlock(&ofproto_mutex); > + > return false; > } > > -- > 1.7.9.5 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.
Thx for the suggestion, I think it will make the connmgr module safer if we annotate the ofproto->connmgr. With a quick look, I think it will take more checking and efforts, (for functions we know that are only called by the main thread, there is no lock protection) For branch-2.3, I want to use this fix. Do you think it is okay? Thanks, Alex Wang, On Tue, Jul 29, 2014 at 1:23 PM, Andy Zhou wrote: > Should we also consider lock annotate the ofproto->connmgr variable? > > Also, it seems to me the lock can be per ofproto, instead of a global > lock, but I am not sure this use case will benefit from a finer grain > lock. > > > On Tue, Jul 29, 2014 at 11:20 AM, Alex Wang wrote: > > connmgr_wants_packet_in_on_miss() is called by multiple threads > > and thusly should be protected by the mutex. > > > > Signed-off-by: Alex Wang > > --- > > ofproto/connmgr.c |6 +- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c > > index c6432bf..89af6b6 100644 > > --- a/ofproto/connmgr.c > > +++ b/ofproto/connmgr.c > > @@ -1506,19 +1506,23 @@ ofconn_wants_packet_in_on_miss(struct ofconn > *ofconn, > > * This logic assumes that "table-miss" packet_in messages > > * are always sent to controller_id 0. */ > > bool > > -connmgr_wants_packet_in_on_miss(struct connmgr *mgr) > > +connmgr_wants_packet_in_on_miss(struct connmgr *mgr) > OVS_EXCLUDED(ofproto_mutex) > > { > > struct ofconn *ofconn; > > > > +ovs_mutex_lock(&ofproto_mutex); > > LIST_FOR_EACH (ofconn, node, &mgr->all_conns) { > > enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); > > > > if (ofconn->controller_id == 0 && > > (protocol == OFPUTIL_P_NONE || > > ofputil_protocol_to_ofp_version(protocol) < > OFP13_VERSION)) { > > +ovs_mutex_unlock(&ofproto_mutex); > > return true; > > } > > } > > +ovs_mutex_unlock(&ofproto_mutex); > > + > > return false; > > } > > > > -- > > 1.7.9.5 > > > > ___ > > dev mailing list > > dev@openvswitch.org > > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.
On Tue, Jul 29, 2014 at 2:09 PM, Alex Wang wrote: > Thx for the suggestion, I think it will make the connmgr module safer > if we annotate the ofproto->connmgr. With a quick look, I think it will > take more checking and efforts, (for functions we know that are only called > by the main thread, there is no lock protection) Is this safe if ofproto->connmgr can be accessed by other threads? > > For branch-2.3, I want to use this fix. Do you think it is okay? I suppose so, as long as the fix is safe. Your call. > > Thanks, > Alex Wang, > > > On Tue, Jul 29, 2014 at 1:23 PM, Andy Zhou wrote: >> >> Should we also consider lock annotate the ofproto->connmgr variable? >> >> Also, it seems to me the lock can be per ofproto, instead of a global >> lock, but I am not sure this use case will benefit from a finer grain >> lock. >> >> >> On Tue, Jul 29, 2014 at 11:20 AM, Alex Wang wrote: >> > connmgr_wants_packet_in_on_miss() is called by multiple threads >> > and thusly should be protected by the mutex. >> > >> > Signed-off-by: Alex Wang >> > --- >> > ofproto/connmgr.c |6 +- >> > 1 file changed, 5 insertions(+), 1 deletion(-) >> > >> > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c >> > index c6432bf..89af6b6 100644 >> > --- a/ofproto/connmgr.c >> > +++ b/ofproto/connmgr.c >> > @@ -1506,19 +1506,23 @@ ofconn_wants_packet_in_on_miss(struct ofconn >> > *ofconn, >> > * This logic assumes that "table-miss" packet_in messages >> > * are always sent to controller_id 0. */ >> > bool >> > -connmgr_wants_packet_in_on_miss(struct connmgr *mgr) >> > +connmgr_wants_packet_in_on_miss(struct connmgr *mgr) >> > OVS_EXCLUDED(ofproto_mutex) >> > { >> > struct ofconn *ofconn; >> > >> > +ovs_mutex_lock(&ofproto_mutex); >> > LIST_FOR_EACH (ofconn, node, &mgr->all_conns) { >> > enum ofputil_protocol protocol = ofconn_get_protocol(ofconn); >> > >> > if (ofconn->controller_id == 0 && >> > (protocol == OFPUTIL_P_NONE || >> > ofputil_protocol_to_ofp_version(protocol) < >> > OFP13_VERSION)) { >> > +ovs_mutex_unlock(&ofproto_mutex); >> > return true; >> > } >> > } >> > +ovs_mutex_unlock(&ofproto_mutex); >> > + >> > return false; >> > } >> > >> > -- >> > 1.7.9.5 >> > >> > ___ >> > dev mailing list >> > dev@openvswitch.org >> > http://openvswitch.org/mailman/listinfo/dev > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.
On Tue, Jul 29, 2014 at 2:16 PM, Andy Zhou wrote: > On Tue, Jul 29, 2014 at 2:09 PM, Alex Wang wrote: > > Thx for the suggestion, I think it will make the connmgr module safer > > if we annotate the ofproto->connmgr. With a quick look, I think it will > > take more checking and efforts, (for functions we know that are only > called > > by the main thread, there is no lock protection) > Is this safe if ofproto->connmgr can be accessed by other threads? > I think it is safe. since main thread creates/deletes the connmgr, > > > > For branch-2.3, I want to use this fix. Do you think it is okay? > I suppose so, as long as the fix is safe. Your call. > Thx, will backport it soon, ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.
Thx, pushed both patches to master and branch-2.3. I'll try improving the thread-safety of connmgr module later, On Tue, Jul 29, 2014 at 2:32 PM, Alex Wang wrote: > > > > On Tue, Jul 29, 2014 at 2:16 PM, Andy Zhou wrote: > >> On Tue, Jul 29, 2014 at 2:09 PM, Alex Wang wrote: >> > Thx for the suggestion, I think it will make the connmgr module safer >> > if we annotate the ofproto->connmgr. With a quick look, I think it will >> > take more checking and efforts, (for functions we know that are only >> called >> > by the main thread, there is no lock protection) >> Is this safe if ofproto->connmgr can be accessed by other threads? >> > > > I think it is safe. since main thread creates/deletes the connmgr, > > > > >> > >> > For branch-2.3, I want to use this fix. Do you think it is okay? >> I suppose so, as long as the fix is safe. Your call. >> > > > Thx, will backport it soon, > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.
On Tue, Jul 29, 2014 at 11:30:12AM -0700, Alex Wang wrote: > I think we should not check 'ofconn's for OFCONN_SERVICE... This > change broke several tests: > > *Please send `tests/testsuite.log' and all information you think might > help:* > > * To: >* > > * Subject: [openvswitch 2.3.0] testsuite: 737 739 747 817 failed* I'm a little confused now. I'm not sure why you addressed this to Simon. I don't think he has any recent commits. >From the [openvswitch 2.3.0] it looks like you ran the tests against branch-2.3. I don't see the same failures locally when I run the tests on branch-2.3 (commit 9067b54) or master (commit 57fa816). Are these intermittent failures or do you see them every time? Do they appear only after the two recent additional commits that you pushed? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.
Hey Ben, Sorry for the confusion, On Tue, Jul 29, 2014 at 2:52 PM, Ben Pfaff wrote: > On Tue, Jul 29, 2014 at 11:30:12AM -0700, Alex Wang wrote: > > I think we should not check 'ofconn's for OFCONN_SERVICE... This > > change broke several tests: > > > > *Please send `tests/testsuite.log' and all information you think might > > help:* > > > > * To: >* > > > > * Subject: [openvswitch 2.3.0] testsuite: 737 739 747 817 failed* > > I'm a little confused now. > > I'm not sure why you addressed this to Simon. I don't think he has any > recent commits. > 1. The test failure is caused by this RFC patch, 2. connmgr_wants_packet_in_on_miss() was added by Simon. I think this function should only check for controller connections. And want to confirm it with Simon. 3. unit test 737 corresponds to this: (which is also added by Simon) 527ae97e (Simon Horman 2014-03-13 15:52:54 +0900 1565) 527ae97e (Simon Horman 2014-03-13 15:52:54 +0900 1566) AT_SETUP([ofproto-dpif - table-miss flow (OpenFlow 1.0)]) 527ae97e (Simon Horman 2014-03-13 15:52:54 +0900 1567) OVS_VSWITCHD_START([dnl 527ae97e (Simon Horman 2014-03-13 15:52:54 +0900 1568)add-port br0 p1 -- set Interface p1 type=dummy 527ae97e (Simon Horman 2014-03-13 15:52:54 +0900 1569) ]) I think that's why I sent email asking him. > > From the [openvswitch 2.3.0] it looks like you ran the tests against > branch-2.3. I don't see the same failures locally when I run the tests > on branch-2.3 (commit 9067b54) or master (commit 57fa816). Are these > intermittent failures or do you see them every time? Do they appear > only after the two recent additional commits that you pushed? > 4. I saw them every time. Those failures are caused only by this RFC patch (unrelated to the two recent commits I just pused) 5. The cause of failure is that connmgr_wants_packet_in_on_miss() now checks only the 'ofconn's of type OFCONN_PRIMARY. so, the 'ofconn' for 'ofctl monitor' is ignored, and we always drop the pkt_ins. Thanks, Alex Wang, ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] datapath: refactor ovs flow extract API.
OVS flow extract is called on packet receive or packet execute code path. Following patch defines separate API for these two cases to simplify code. Signed-off-by: Pravin B Shelar --- datapath/actions.c | 3 +- datapath/datapath.c | 8 ++--- datapath/flow.c | 82 - datapath/flow.h | 4 ++- datapath/flow_netlink.c | 31 --- datapath/flow_netlink.h | 4 +-- 6 files changed, 75 insertions(+), 57 deletions(-) diff --git a/datapath/actions.c b/datapath/actions.c index 792c3a9..65b4892 100644 --- a/datapath/actions.c +++ b/datapath/actions.c @@ -649,11 +649,10 @@ static int execute_recirc(struct datapath *dp, struct sk_buff *skb, const struct nlattr *a) { struct sw_flow_key recirc_key; - const struct vport *p = OVS_CB(skb)->input_vport; uint32_t hash = OVS_CB(skb)->pkt_key->ovs_flow_hash; int err; - err = ovs_flow_extract(skb, p->port_no, &recirc_key); + err = ovs_flow_extract(skb, &recirc_key); if (err) { kfree_skb(skb); return err; diff --git a/datapath/datapath.c b/datapath/datapath.c index 1185f60..7d4b599 100644 --- a/datapath/datapath.c +++ b/datapath/datapath.c @@ -300,7 +300,7 @@ void ovs_dp_process_received_packet(struct sk_buff *skb) struct sw_flow_key key; /* Extract flow from 'skb' into 'key'. */ - error = ovs_flow_extract(skb, OVS_CB(skb)->input_vport->port_no, &key); + error = ovs_flow_extract(skb, &key); if (unlikely(error)) { kfree_skb(skb); return; @@ -581,13 +581,11 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, struct genl_info *info) if (IS_ERR(flow)) goto err_kfree_skb; - err = ovs_flow_extract(packet, -1, &flow->key); + err = ovs_flow_extract_nla_metadata(a[OVS_PACKET_ATTR_KEY], packet, + &flow->key); if (err) goto err_flow_free; - err = ovs_nla_get_flow_metadata(flow, a[OVS_PACKET_ATTR_KEY]); - if (err) - goto err_flow_free; acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_PACKET_ATTR_ACTIONS])); err = PTR_ERR(acts); if (IS_ERR(acts)) diff --git a/datapath/flow.c b/datapath/flow.c index e234796..89f4d05 100644 --- a/datapath/flow.c +++ b/datapath/flow.c @@ -16,8 +16,6 @@ * 02110-1301, USA */ -#include "flow.h" -#include "datapath.h" #include #include #include @@ -45,6 +43,10 @@ #include #include +#include "datapath.h" +#include "flow.h" +#include "flow_netlink.h" + #include "mpls.h" #include "vlan.h" @@ -425,10 +427,9 @@ invalid: } /** - * ovs_flow_extract - extracts a flow key from an Ethernet frame. + * flow_extract - extracts a flow key from an Ethernet frame. * @skb: sk_buff that contains the frame, with skb->data pointing to the * Ethernet header - * @in_port: port number on which @skb was received. * @key: output flow key * * The caller must ensure that skb->len >= ETH_HLEN. @@ -447,35 +448,11 @@ invalid: * of a correct length, otherwise the same as skb->network_header. * For other key->eth.type values it is left untouched. */ -int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key) +static int flow_extract(struct sk_buff *skb, struct sw_flow_key *key) { int error; struct ethhdr *eth; - if (OVS_CB(skb)->tun_info) { - struct ovs_tunnel_info *tun_info = OVS_CB(skb)->tun_info; - memcpy(&key->tun_key, &tun_info->tunnel, - sizeof(key->tun_key)); - if (tun_info->options) { - BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) * 8)) - 1 - > sizeof(key->tun_opts)); - memcpy(GENEVE_OPTS(key, tun_info->options_len), - tun_info->options, tun_info->options_len); - key->tun_opts_len = tun_info->options_len; - } else { - key->tun_opts_len = 0; - } - } else { - key->tun_opts_len = 0; - memset(&key->tun_key, 0, sizeof(key->tun_key)); - } - - key->phy.priority = skb->priority; - key->phy.in_port = in_port; - key->phy.skb_mark = skb->mark; - key->ovs_flow_hash = 0; - key->recirc_id = 0; - /* Flags are always used as part of stats. */ key->tp.flags = 0; @@ -694,3 +671,50 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key) return 0; } + +int ovs_flow_extract(struct sk_buff *skb, struct sw_flow_key *key) +{ + /* Extract metadata from packet. */ + + if (OVS_CB(skb)->tun_info) { + struct ovs_tunnel_info *tun_info = OVS_CB(skb)->tun_info; + + memcpy(&ke
Re: [ovs-dev] [PATCH] openvswitch: Use IS_ERR_OR_NULL
From: Himangi Saraogi Date: Sun, 27 Jul 2014 12:37:46 +0530 > This patch introduces the use of the macro IS_ERR_OR_NULL in place of > tests for NULL and IS_ERR. > > The following Coccinelle semantic patch was used for making the change: > > @@ > expression e; > @@ > > - e == NULL || IS_ERR(e) > + IS_ERR_OR_NULL(e) > || ... > > Signed-off-by: Himangi Saraogi > Acked-by: Julia Lawall Applied. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] netdev-dpdk: Use different constant for ring size
DPDK rings must have a power-of-two size. Signed-off-by: Daniele Di Proietto --- lib/netdev-dpdk.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index b45e367..c1b644a 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -125,6 +125,7 @@ static const struct rte_eth_txconf tx_conf = { enum { MAX_RX_QUEUE_LEN = 192 }; enum { MAX_TX_QUEUE_LEN = 384 }; +enum { DPDK_RING_SIZE = 256 }; /* Must be a power of 2 */ enum { DRAIN_TSC = 20ULL }; static int rte_eal_init_ret = ENODEV; @@ -1236,7 +1237,7 @@ dpdk_ring_create(const char dev_name[], unsigned int port_no, return -err; } -ivshmem->cring_tx = rte_ring_create(ring_name, MAX_RX_QUEUE_LEN, SOCKET0, 0); +ivshmem->cring_tx = rte_ring_create(ring_name, DPDK_RING_SIZE, SOCKET0, 0); if (ivshmem->cring_tx == NULL) { rte_free(ivshmem); return ENOMEM; @@ -1247,7 +1248,7 @@ dpdk_ring_create(const char dev_name[], unsigned int port_no, return -err; } -ivshmem->cring_rx = rte_ring_create(ring_name, MAX_RX_QUEUE_LEN, SOCKET0, 0); +ivshmem->cring_rx = rte_ring_create(ring_name, DPDK_RING_SIZE, SOCKET0, 0); if (ivshmem->cring_rx == NULL) { rte_free(ivshmem); return ENOMEM; -- 2.0.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCHv3] datapath: do not use vport type to determine presence of Geneve attributes
This patch fixes following kernel crash that could happen, if geneve vport was not added yet, but revalidator thread attempted to dump flows. To reproduce: 1. switch tunnel type between geneve and gre in a loop; and 2. run ping. BUG: unable to handle kernel NULL pointer dereference at 0048 IP: [] ovs_nla_put_flow+0x3d0/0x7c0 [openvswitch] PGD 3b32b067 PUD 3b2ef067 PMD 0 Oops: [#2] SMP ... CPU: 0 PID: 6450 Comm: revalidator2 Tainted: GF DO 3.13.0-24-generic #46-Ubuntu Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2012 task: 88003b4aafe0 ti: 88003d314000 task.ti: 88003d314000 RIP: 0010:[] [] ovs_nla_put_flow+0x3d0/0x7c0 [openvswitch] RSP: 0018:88003d315a10 EFLAGS: 00010246 RAX: RBX: 88003a9a9960 RCX: RDX: 0002 RSI: ffc8 RDI: 88003babcb80 RBP: 88003d315a68 R08: R09: 0004 R10: 880039c23034 R11: 0008 R12: 88003a861600 R13: 88003a9a9960 R14: 88003babcb80 R15: q88003a861600 FS: 7ff0f5d94700() GS:88003f60() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 0048 CR3: 3b55b000 CR4: 07f0 Stack: 81385093 8800 88003d315a58 880039c23014 88003a9a97a0 88003babcb80 880039c23018 88003a861600 88003d315ad0 Call Trace: [] ? __nla_reserve+0x43/0x50 [] ovs_flow_cmd_fill_info+0x93/0x2b0 [openvswitch] [] ? ovs_flow_tbl_dump_next+0x49/0xc0 [openvswitch] [] ovs_flow_cmd_dump+0x80/0xd0 [openvswitch] [] netlink_dump+0x84/0x240 [] __netlink_dump_start+0x1ab/0x220 [] genl_family_rcv_msg+0x337/0x370 [] ? ovs_flow_cmd_fill_info+0x2b0/0x2b0 [openvswitch] [] ? __kmalloc_node_track_caller+0x58/0x1e0 [] ? genl_family_rcv_msg+0x370/0x370 [] genl_rcv_msg+0x91/0xd0 [] netlink_rcv_skb+0xa9/0xc0 [] genl_rcv+0x28/0x40 [] netlink_unicast+0xd5/0x1b0 [] netlink_sendmsg+0x2ff/0x740 [] sock_sendmsg+0x8b/0xc0 [] ? __sb_end_write+0x31/0x60 [] ? touch_atime+0x10f/0x140 [] ? pipe_read+0x371/0x400 [] SYSC_sendto+0x121/0x1c0 [] ? vtime_account_user+0x54/0x60 [] ? syscall_trace_enter+0x145/0x250 [] SyS_sendto+0xe/0x10 [] tracesys+0xe1/0xe6 Signed-Off-By: Ansis Atteka --- datapath/flow_netlink.c|6 ++ datapath/linux/compat/include/net/ip_tunnels.h |1 + datapath/vport-geneve.c|2 +- 3 files changed, 4 insertions(+), 5 deletions(-) diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c index e1eadbb..41199ff 100644 --- a/datapath/flow_netlink.c +++ b/datapath/flow_netlink.c @@ -421,6 +421,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr, tun_flags |= TUNNEL_OAM; break; case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS: + tun_flags |= TUNNEL_OPTIONS_PRESENT; if (nla_len(a) > sizeof(match->key->tun_opts)) { OVS_NLERR("Geneve option length exceeds " "maximum size (len %d, max %zu).\n", @@ -1051,10 +1052,7 @@ int ovs_nla_put_flow(struct datapath *dp, const struct sw_flow_key *swkey, const struct geneve_opt *opts = NULL; if (!is_mask) { - struct vport *in_port; - - in_port = ovs_vport_ovsl_rcu(dp, swkey->phy.in_port); - if (in_port->ops->type == OVS_VPORT_TYPE_GENEVE) + if (swkey->tun_key.tun_flags & TUNNEL_OPTIONS_PRESENT) opts = GENEVE_OPTS(output, swkey->tun_opts_len); } else { if (output->tun_opts_len) diff --git a/datapath/linux/compat/include/net/ip_tunnels.h b/datapath/linux/compat/include/net/ip_tunnels.h index c7a14ef..2a6470a 100644 --- a/datapath/linux/compat/include/net/ip_tunnels.h +++ b/datapath/linux/compat/include/net/ip_tunnels.h @@ -48,5 +48,6 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto); /* Not yet upstream */ #define TUNNEL_OAM __cpu_to_be16(0x0200) #define TUNNEL_CRIT_OPT__cpu_to_be16(0x0400) +#define TUNNEL_OPTIONS_PRESENT __cpu_to_be16(0x0800) #endif /* __NET_IP_TUNNELS_H */ diff --git a/datapath/vport-geneve.c b/datapath/vport-geneve.c index 99841d4..65a996f 100644 --- a/datapath/vport-geneve.c +++ b/datapath/vport-geneve.c @@ -189,7 +189,7 @@ static int geneve_rcv(struct sock *sk, struct sk_buff *skb) geneveh = geneve_hdr(skb); - flags = TUNNEL_KEY | + flags = TUNNEL_KEY | TUNNEL_OPTIONS_PRESENT | (udp_hdr(skb)->check != 0 ? TUNNEL_CSUM : 0) | (geneveh->oam ? TUNNEL_OAM : 0) | (geneveh->critical ? TUNNEL_CRIT_OPT : 0); -- 1.7.9.5 __
Re: [ovs-dev] [PATCH] datapath: do not add Geneve attributes if vport does not exist
On Mon, Jul 28, 2014 at 7:54 PM, Jesse Gross wrote: > On Tue, Jul 22, 2014 at 9:34 AM, Jesse Gross wrote: >> On Mon, Jul 21, 2014 at 5:11 PM, Ansis Atteka wrote: >>> On Thu, Jul 17, 2014 at 4:08 PM, Jesse Gross wrote: One thing that I worry about is that this has the possibility to change how the flow is reported - before the flow deletion it has Geneve options but immediately after the flow still exists but without options. OVS will likely deal with this but it doesn't seem like a great thing. >>> I talked with Pravin on how to solve this "flow changing while vport >>> gets added" issue and he seemed to prefer the idea where we simply >>> look into tun_opts_len to figure out whether to append Geneve >>> attributes. >>> >>> Basically, the new patch makes assumption that if tun_opts_len>0 then >>> that is Geneve port that could potentially have Geneve options. Also, >>> I remember you mentioning that it might be good to distinguish between >>> the case when there are no tunnel options at all and the case when >>> Geneve attributes contain an empty set. Patch v2 does not address >>> that. How important is that in your opinion? >> >> The biggest concern that I have is if there is a flow that matches on >> Geneve packets without any options. If we it changed to look at >> tun_opts_len only, when the upcall goes to userspace there would be no >> OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS. Userspace would generally like to >> just echo the key back, so that means that we would have a mask >> without a key since it needs to specify a mask to match. >> >> This is currently allowed but not really ideal - we have this already >> for some existing netlink attributes like the overall >> OVS_KEY_ATTR_TUNNEL but it's been a source of bugs and special casing. >> The other choice is that we force userspace to insert the key even if >> the kernel didn't provide it for Geneve flows but that's somewhat >> complicated. >> >> One other possible solution is to tuck a bit into the flow to >> differentiate between the cases of an empty option set and no options >> but I was somewhat hoping to avoid that. > > Pravin, Andy, and I just talked about this and I think that we reached > a conclusion. > > It seems like the best thing to do is to use a flag in the key that > indicates whether options were originally present in either the flow > (because userspace passed it down) or header (because it came from a > Geneve port). Essentially, this would act like the current port number > check but with less crashing. > > So what we would do is: > * Add a TUNNEL_OPTIONS_PRESENT flag to the existing list of > TUNNEL_CSUM, TUNNEL_KEY, etc. > * Geneve ports would always set this when initializing flags in > geneve_rcv(), other tunnel ports never would. > * When receiving a flow setup from userspace, we would set this flag > if there is a GENEVE_OPTIONS key attribute present. > * When sending a flow from the kernel to userspace, we would change > the current check based on the port number to one based on this flag. > > Does that sound reasonable to you? Yes, that sounds reasonable. I sent it out for review here - http://openvswitch.org/pipermail/dev/2014-July/043220.html ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH net-next] net: Remove unlikely() for WARN_ON() conditions
No need for the unlikely(), WARN_ON() and BUG_ON() internally use unlikely() on the condition. Signed-off-by: Thomas Graf --- net/core/dev.c | 2 +- net/openvswitch/datapath.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index e1b7cfa..b370230 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2326,7 +2326,7 @@ __be16 skb_network_protocol(struct sk_buff *skb, int *depth) */ if (type == htons(ETH_P_8021Q) || type == htons(ETH_P_8021AD)) { if (vlan_depth) { - if (unlikely(WARN_ON(vlan_depth < VLAN_HLEN))) + if (WARN_ON(vlan_depth < VLAN_HLEN)) return 0; vlan_depth -= VLAN_HLEN; } else { diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h index 7ede507..701b573 100644 --- a/net/openvswitch/datapath.h +++ b/net/openvswitch/datapath.h @@ -144,7 +144,7 @@ int lockdep_ovsl_is_held(void); #define lockdep_ovsl_is_held() 1 #endif -#define ASSERT_OVSL() WARN_ON(unlikely(!lockdep_ovsl_is_held())) +#define ASSERT_OVSL() WARN_ON(!lockdep_ovsl_is_held()) #define ovsl_dereference(p)\ rcu_dereference_protected(p, lockdep_ovsl_is_held()) #define rcu_dereference_ovsl(p)\ -- 1.9.3 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.
On Tue, Jul 29, 2014 at 03:01:03PM -0700, Alex Wang wrote: > Hey Ben, > > Sorry for the confusion, > > On Tue, Jul 29, 2014 at 2:52 PM, Ben Pfaff wrote: > > > On Tue, Jul 29, 2014 at 11:30:12AM -0700, Alex Wang wrote: > > > I think we should not check 'ofconn's for OFCONN_SERVICE... This > > > change broke several tests: > > > > > > *Please send `tests/testsuite.log' and all information you think might > > > help:* > > > > > > * To: >* > > > > > > * Subject: [openvswitch 2.3.0] testsuite: 737 739 747 817 failed* > > > > I'm a little confused now. > > > > I'm not sure why you addressed this to Simon. I don't think he has any > > recent commits. > > > > > 1. The test failure is caused by this RFC patch, > > 2. connmgr_wants_packet_in_on_miss() was added by Simon. I think this > function should only check for controller connections. And want to > confirm > it with Simon. > > 3. unit test 737 corresponds to this: (which is also added by Simon) > > 527ae97e (Simon Horman 2014-03-13 15:52:54 +0900 1565) > 527ae97e (Simon Horman 2014-03-13 15:52:54 +0900 1566) > AT_SETUP([ofproto-dpif - table-miss flow (OpenFlow 1.0)]) > 527ae97e (Simon Horman 2014-03-13 15:52:54 +0900 1567) > OVS_VSWITCHD_START([dnl > 527ae97e (Simon Horman 2014-03-13 15:52:54 +0900 1568)add-port br0 > p1 -- set Interface p1 type=dummy > 527ae97e (Simon Horman 2014-03-13 15:52:54 +0900 1569) ]) > > > I think that's why I sent email asking him. I don't wish to add to any confusion but I will none the less make a comment on this change which you may choose to ignore if you like. The motivation for adding connmgr_wants_packet_in_on_miss() is to help implement per-OpenFlow version behaviour of packet_in messages. So if monitors should have per-OpenFlow version behaviour then I am not sure your patch is correct. My recollection is that this behaviour of monitors seemed useful for the purpose of testing the per-OpenFlow version behaviour of packet in messages. But it was a while ago so my recollection may not be accurate. > > From the [openvswitch 2.3.0] it looks like you ran the tests against > > branch-2.3. I don't see the same failures locally when I run the tests > > on branch-2.3 (commit 9067b54) or master (commit 57fa816). Are these > > intermittent failures or do you see them every time? Do they appear > > only after the two recent additional commits that you pushed? > > > > > 4. I saw them every time. Those failures are caused only by this RFC > patch (unrelated to the two recent commits I just pused) > > 5. The cause of failure is that connmgr_wants_packet_in_on_miss() now > checks only the 'ofconn's of type OFCONN_PRIMARY. so, the 'ofconn' > for 'ofctl monitor' is ignored, and we always drop the pkt_ins. > > Thanks, > Alex Wang, ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.
On Tue, Jul 29, 2014 at 2:46 PM, Alex Wang wrote: > Thx, pushed both patches to master and branch-2.3. > > I'll try improving the thread-safety of connmgr module later, Sounds good. Thanks. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.
> > > I don't wish to add to any confusion but I will none the less > make a comment on this change which you may choose to ignore if you like. > > The motivation for adding connmgr_wants_packet_in_on_miss() is to help > implement per-OpenFlow version behaviour of packet_in messages. So if > monitors should have per-OpenFlow version behaviour then I am not sure your > patch is correct. My recollection is that this behaviour of monitors seemed > useful for the purpose of testing the per-OpenFlow version behaviour of > packet in messages. But it was a while ago so my recollection may not be > accurate. > > > Hey Simon, Thanks for the explanation, I think we do want to have per-OpenFlow version 'table-miss' behavior for 'ofctl monitor'. My original understanding was that if there is no established 'ofconn' to controller, connmgr_wants_packet_in_on_miss() should always returns false, and 'ofctl monitor' should not work. After checking the code further, I think we treat 'ofconn' from 'ovs-ofctl monitor' as 'controller' also. And i see why you wrote connmgr_wants_packet_in_on_miss(). So, I'm dropping this patch. Thanks, Alex Wang, ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] vport-internal_dev:Drop packets when interdev is not up
From: Chunhe Li If the internal device is not up, it should drop received packets. Sometimes it receive the broadcast or multicast packets, and the ip protocol stack will casue more cpu usage wasted. Signed-off-by: Chunhe Li --- datapath/vport-internal_dev.c | 5 + 1 file changed, 5 insertions(+) diff --git a/datapath/vport-internal_dev.c b/datapath/vport-internal_dev.c index 3b0f9a7..b0e1b24 100644 --- a/datapath/vport-internal_dev.c +++ b/datapath/vport-internal_dev.c @@ -242,6 +242,11 @@ static int internal_dev_recv(struct vport *vport, struct sk_buff *skb) struct net_device *netdev = netdev_vport_priv(vport)->dev; int len; + if (netdev && !(netdev->flags & IFF_UP)) { + kfree_skb(skb); + return 0; + } + #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37) if (vlan_tx_tag_present(skb)) { if (unlikely(!__vlan_put_tag(skb, -- 1.9.2.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovs-lib.in:Add process name checking when start ovs service
From: Chunhe Li Only check wheather is daemon pid exist is not enough, becasue the pid which store in pidfile maybe assign to another process by OS. So it will checking failed for pid exist, but the starting process which own the pid is not the ovs daemon. Signed-off-by: Chunhe Li Signed-off-by: wuyunfei --- utilities/ovs-lib.in | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in index 48d0c36..8c1a8b2 100644 --- a/utilities/ovs-lib.in +++ b/utilities/ovs-lib.in @@ -244,5 +244,5 @@ daemon_status () { daemon_is_running () { pidfile=$rundir/$1.pid -test -e "$pidfile" && pid=`cat "$pidfile"` && pid_exists "$pid" +test -e "$pidfile" && pid=`cat "$pidfile"` && pid_exists "$pid" && pidof "$1" } >/dev/null 2>&1 -- 1.9.2.0 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev