Re: [ovs-dev] plan to integrate Hyper-V support

2014-07-29 Thread Alessandro Pilotti
Hi Nithin,

I'd also add at least:

1) Design for mapping hyper-v ports to OVS ports
2) CI testing plans
3) Roadmap

Thanks,

Alessandro

On 29.07.2014, at 05:01, "Nithin Raju"  wrote:

> 10.00 AM Pacific - Sounds good to me.
> 
> hi Alessandro,
> We'll be meeting tomorrow. In the light of that, we had the following items 
> on mind that we could discuss. Pls. feel free to edit the list:
> 
> 1. Helping Alin to run some tests successfully (with the V2 patch that 
> Saurabh posted last week).
> 2. Any design or code walkthrough for the kernel code.
> 3. Netlink interface related discussion.
> 4. Architectural topics.
> 
> See you tomorrow!
> 
> thanks,
> Nithin
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] bitmap: Optimize bitmap_equal and bitmap_scan

2014-07-29 Thread Kmindg G
On Tue, Jul 29, 2014 at 1:01 AM, Ben Pfaff  wrote:

> Needs a sign-off.
>

Oh..  I will not forget it in the next version.


>
> I'm pretty sure that bitmap_equal() is wrong: doesn't it access one
> past the end of the bitmap arrays?


You are right. I'll fix it.


> I think that rightmost_1bit_idx() might make bitmap_scan() faster.
>

I'll give it a try.


> The nice thing about the previous implementations was that they were
> obviously correct.  Now, I'd prefer to have some tests.
>
> No problem, I'll add some tests in the next version.

Thanks for reviewing
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 2/2] netlink: Convert netlink_lookup() to use RCU protected hash table

2014-07-29 Thread Thomas Graf
Heavy Netlink users such as Open vSwitch spend a considerable amount of
time in netlink_lookup() due to the read-lock on nl_table_lock. Use of
RCU relieves the lock contention.

Makes use of the new resizable hash table to avoid locking on the
lookup.

The hash table will grow if entries exceeds 75% of table size up to a
total table size of 64K. It will automatically shrink if usage falls
below 50%.

Also splits nl_table_lock into a separate spinlock to protect hash table
mutations. This avoids a possible deadlock when the hash table growing
waits on RCU readers to complete via synchronize_rcu() while readers
holding RCU read lock are waiting on the nl_table_lock() to be released
to lock the table for broadcasting.

Before:
   9.16%  kpktgend_0  [openvswitch]  [k] masked_flow_lookup
   6.42%  kpktgend_0  [pktgen]   [k] mod_cur_headers
   6.26%  kpktgend_0  [pktgen]   [k] pktgen_thread_worker
   6.23%  kpktgend_0  [kernel.kallsyms]  [k] memset
   4.79%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
   4.37%  kpktgend_0  [kernel.kallsyms]  [k] memcpy
   3.60%  kpktgend_0  [openvswitch]  [k] ovs_flow_extract
   2.69%  kpktgend_0  [kernel.kallsyms]  [k] jhash2

After:
  15.26%  kpktgend_0  [openvswitch]  [k] masked_flow_lookup
   8.12%  kpktgend_0  [pktgen]   [k] pktgen_thread_worker
   7.92%  kpktgend_0  [pktgen]   [k] mod_cur_headers
   5.11%  kpktgend_0  [kernel.kallsyms]  [k] memset
   4.11%  kpktgend_0  [openvswitch]  [k] ovs_flow_extract
   4.06%  kpktgend_0  [kernel.kallsyms]  [k] _raw_spin_lock
   3.90%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
   [...]
   0.67%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup

Signed-off-by: Thomas Graf 
---
 net/netlink/af_netlink.c | 285 ++-
 net/netlink/af_netlink.h |  18 +--
 net/netlink/diag.c   |  11 +-
 3 files changed, 119 insertions(+), 195 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 1b38f7f..7f44468 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -58,7 +58,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 
 #include 
 #include 
@@ -100,6 +102,18 @@ static atomic_t nl_table_users = ATOMIC_INIT(0);
 
 #define nl_deref_protected(X) rcu_dereference_protected(X, 
lockdep_is_held(&nl_table_lock));
 
+/* Protects netlink socket hash table mutations */
+DEFINE_SPINLOCK(nl_sk_hash_lock);
+
+static int lockdep_nl_sk_hash_is_held(void)
+{
+#ifdef CONFIG_LOCKDEP
+   return (debug_locks) ? lockdep_is_held(&nl_sk_hash_lock) : 1;
+#else
+   return 1;
+#endif
+}
+
 static ATOMIC_NOTIFIER_HEAD(netlink_chain);
 
 static DEFINE_SPINLOCK(netlink_tap_lock);
@@ -110,11 +124,6 @@ static inline u32 netlink_group_mask(u32 group)
return group ? 1 << (group - 1) : 0;
 }
 
-static inline struct hlist_head *nl_portid_hashfn(struct nl_portid_hash *hash, 
u32 portid)
-{
-   return &hash->table[jhash_1word(portid, hash->rnd) & hash->mask];
-}
-
 int netlink_add_tap(struct netlink_tap *nt)
 {
if (unlikely(nt->dev->type != ARPHRD_NETLINK))
@@ -983,105 +992,48 @@ netlink_unlock_table(void)
wake_up(&nl_table_wait);
 }
 
-static bool netlink_compare(struct net *net, struct sock *sk)
-{
-   return net_eq(sock_net(sk), net);
-}
-
-static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid)
+struct netlink_compare_arg
 {
-   struct netlink_table *table = &nl_table[protocol];
-   struct nl_portid_hash *hash = &table->hash;
-   struct hlist_head *head;
-   struct sock *sk;
-
-   read_lock(&nl_table_lock);
-   head = nl_portid_hashfn(hash, portid);
-   sk_for_each(sk, head) {
-   if (table->compare(net, sk) &&
-   (nlk_sk(sk)->portid == portid)) {
-   sock_hold(sk);
-   goto found;
-   }
-   }
-   sk = NULL;
-found:
-   read_unlock(&nl_table_lock);
-   return sk;
-}
+   struct net *net;
+   u32 portid;
+};
 
-static struct hlist_head *nl_portid_hash_zalloc(size_t size)
+static bool netlink_compare(void *ptr, void *arg)
 {
-   if (size <= PAGE_SIZE)
-   return kzalloc(size, GFP_ATOMIC);
-   else
-   return (struct hlist_head *)
-   __get_free_pages(GFP_ATOMIC | __GFP_ZERO,
-get_order(size));
-}
+   struct netlink_compare_arg *x = arg;
+   struct sock *sk = ptr;
 
-static void nl_portid_hash_free(struct hlist_head *table, size_t size)
-{
-   if (size <= PAGE_SIZE)
-   kfree(table);
-   else
-   free_pages((unsigned long)table, get_order(size));
+   return nlk_sk(sk)->portid == x->portid &&
+  net_eq(sock_net(sk), x->net);
 }
 
-static int nl_portid_hash_rehash(struct nl_portid_hash *hash, int grow)
+static struct sock *__netlink_lookup(struct netlink_table *table, u32 portid,

[ovs-dev] [PATCH 0/2 net-next] Lockless netlink_lookup() with new concurrent hash table

2014-07-29 Thread Thomas Graf
Netlink sockets are maintained in a hash table to allow efficient lookup
via the port ID for unicast messages. However, lookups currently require
a read lock to be taken. This series adds a new generic, resizable,
scalable, concurrent hash table based on the paper referenced in the first
patch. It then makes use of the new data type to implement lockless
netlink_lookup().

Against net-next since the initial user of the new hash table is in net/

Thomas Graf (2):
  lib: Resizable, Scalable, Concurrent Hash Table
  netlink: Convert netlink_lookup() to use RCU protected hash table

 include/linux/rhashtable.h | 208 
 lib/Kconfig.debug  |   8 +
 lib/Makefile   |   2 +-
 lib/rhashtable.c   | 783 +
 net/netlink/af_netlink.c   | 285 +++--
 net/netlink/af_netlink.h   |  18 +-
 net/netlink/diag.c |  11 +-
 7 files changed, 1119 insertions(+), 196 deletions(-)
 create mode 100644 include/linux/rhashtable.h
 create mode 100644 lib/rhashtable.c

-- 
1.9.3

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 1/2] lib: Resizable, Scalable, Concurrent Hash Table

2014-07-29 Thread Thomas Graf
Generic implementation of a resizable, scalable, concurrent hash table
based on [0]. The implementation supports both, fixed size keys specified
via an offset and length, or arbitrary keys via own hash and compare
functions.

Lookups are lockless and protected as RCU read side critical sections.
Automatic growing/shrinking based on user configurable watermarks is
available while allowing concurrent lookups to take place.

Objects to be hashed must include a struct rhash_head. The reason for not
using the existing struct hlist_head is that the expansion and shrinking
will have two buckets point to a single entry which would lead in obscure
reverse chaining behaviour.

Code includes a boot selftest if CONFIG_TEST_RHASHTABLE is defined.

[0] https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf

Signed-off-by: Thomas Graf 
---
 include/linux/rhashtable.h | 208 
 lib/Kconfig.debug  |   8 +
 lib/Makefile   |   2 +-
 lib/rhashtable.c   | 783 +
 4 files changed, 1000 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/rhashtable.h
 create mode 100644 lib/rhashtable.c

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
new file mode 100644
index 000..f784f7a
--- /dev/null
+++ b/include/linux/rhashtable.h
@@ -0,0 +1,208 @@
+/*
+ * Resizable, Scalable, Concurrent Hash Table
+ *
+ * Copyright (c) 2014 Thomas Graf 
+ *
+ * Based on the following paper by Josh Triplett, Paul E. McKenney
+ * and Jonathan Walpole:
+ * https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf
+ *
+ * Code partially derived from nft_hash:
+ * Copyright (c) 2008-2014 Patrick McHardy 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _LINUX_RHASHTABLE_H
+#define _LINUX_RHASHTABLE_H
+
+#include 
+
+struct rhash_head {
+   struct rhash_head   *next;
+};
+
+#define INIT_HASH_HEAD(ptr) ((ptr)->next = NULL)
+
+struct bucket_table {
+   size_t  size;
+   struct rhash_head __rcu *buckets[];
+};
+
+typedef u32 (*rht_hashfn_t)(const void *data, u32 len, u32 seed);
+typedef u32 (*rht_obj_hashfn_t)(const void *data, u32 seed);
+
+struct rhashtable;
+
+/**
+ * struct rhashtable_params - Hash table construction parameters
+ * @nelem_hint: Hint on number of elements expected
+ * @key_len: Length of key
+ * @key_offset: Offset of key in struct to be hashed
+ * @head_offset: Offset of rhash_head in struct to be hashed
+ * @hash_rnd: Seed to use while hashing
+ * @max_shift: Maximum number of shifts while expanding
+ * @hashfn: Function to hash key
+ * @obj_hashfn: Function to hash object
+ * @grow_decision: If defined, may return true if table should expand
+ * @shrink_decision: If defined, may return true if table should shrink
+ * @mutex_is_held: Must return true if protecting mutex is held
+ */
+struct rhashtable_params {
+   size_t  nelem_hint;
+   size_t  key_len;
+   size_t  key_offset;
+   size_t  head_offset;
+   u32 hash_rnd;
+   size_t  max_shift;
+   rht_hashfn_thashfn;
+   rht_obj_hashfn_tobj_hashfn;
+   bool(*grow_decision)(const struct rhashtable *ht,
+size_t new_size);
+   bool(*shrink_decision)(const struct rhashtable *ht,
+  size_t new_size);
+   int (*mutex_is_held)(void);
+};
+
+/**
+ * struct rhashtable - Hash table handle
+ * @tbl: Bucket table
+ * @nelems: Number of elements in table
+ * @shift: Current size (1 << shift)
+ * @p: Configuration parameters
+ */
+struct rhashtable {
+   struct bucket_table __rcu   *tbl;
+   size_t  nelems;
+   size_t  shift;
+   struct rhashtable_paramsp;
+};
+
+#ifdef CONFIG_PROVE_LOCKING
+int lockdep_rht_mutex_is_held(const struct rhashtable *ht);
+#else
+static inline int lockdep_rht_mutex_is_held(const struct rhashtable *ht)
+{
+   return 1;
+}
+#endif /* CONFIG_PROVE_LOCKING */
+
+int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params);
+
+u32 rhashtable_hashfn(const struct rhashtable *ht, const void *key, u32 len);
+u32 rhashtable_obj_hashfn(const struct rhashtable *ht, void *ptr);
+
+int rhashtable_insert(struct rhashtable *ht, struct rhash_head *node, gfp_t);
+bool rhashtable_remove(struct rhashtable *ht, struct rhash_head *node, gfp_t);
+
+bool rht_grow_above_75(const struct rhashtable *ht, size_t new_size);
+bool rht_shrink_below_50(const struct rhashtable *ht, size_t new_size);
+
+int rhashtable_expand(struct rhashtable *ht,

[ovs-dev] [PATCH v2 2/5] Add NETLINK defines

2014-07-29 Thread Alin Serdean
Add MAX_LINKS define needed for nl_pool.

Add NLM_F_CREATE and NLM_F_EXCL defines also.

Signed-off-by: Alin Gabriel Serdean 
---
 lib/netlink-protocol.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/lib/netlink-protocol.h b/lib/netlink-protocol.h
index 88b7abf..8938055 100644
--- a/lib/netlink-protocol.h
+++ b/lib/netlink-protocol.h
@@ -48,7 +48,9 @@
 
 #define NLM_F_ROOT  0x100
 #define NLM_F_MATCH 0x200
+#define NLM_F_EXCL  0x200
 #define NLM_F_ATOMIC0x400
+#define NLM_F_CREATE0x400
 #define NLM_F_DUMP  (NLM_F_ROOT | NLM_F_MATCH)
 
 /* nlmsg_type values. */
@@ -59,6 +61,8 @@
 
 #define NLMSG_MIN_TYPE  0x10
 
+#define MAX_LINKS   32
+
 struct nlmsghdr {
 uint32_t nlmsg_len;
 uint16_t nlmsg_type;
-- 
1.9.0.msysgit.0
 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] vtep-ctl: Add Tunnel table to vtep_ctl_table_class.

2014-07-29 Thread Gurucharan Shetty
This is needed to create, get, set records in the Tunnel table.

Signed-off-by: Gurucharan Shetty 
---
 vtep/vtep-ctl.c |4 
 1 file changed, 4 insertions(+)

diff --git a/vtep/vtep-ctl.c b/vtep/vtep-ctl.c
index 0b9463a..129fa03 100644
--- a/vtep/vtep-ctl.c
+++ b/vtep/vtep-ctl.c
@@ -2283,6 +2283,10 @@ static const struct vtep_ctl_table_class tables[] = {
  {{&vteprec_table_physical_switch, &vteprec_physical_switch_col_name, 
NULL},
   {NULL, NULL, NULL}}},
 
+{&vteprec_table_tunnel,
+ {{NULL, NULL, NULL},
+  {NULL, NULL, NULL}}},
+
 {NULL, {{NULL, NULL, NULL}, {NULL, NULL, NULL}}}
 };
 
-- 
1.7.9.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v2 3/5] Bypass HAVE_NETLINK for MSVC

2014-07-29 Thread Alin Serdean
Bypass the error compilation when compiling under MSVC.

Signed-off-by: Alin Gabriel Serdean 
---
 lib/netlink-socket.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/netlink-socket.h b/lib/netlink-socket.h
index d53db4e..2b9ec52 100644
--- a/lib/netlink-socket.h
+++ b/lib/netlink-socket.h
@@ -57,8 +57,10 @@
 struct nl_sock;
 
 #ifndef HAVE_NETLINK
+#ifndef _WIN32
 #error "netlink-socket.h is only for hosts that support Netlink sockets"
 #endif
+#endif
 
 /* Netlink sockets. */
 int nl_sock_create(int protocol, struct nl_sock **);
-- 
1.9.0.msysgit.0
 
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v2 4/5] Changes needed to netlink-socket for MSVC

2014-07-29 Thread Alin Serdean
Add two functions set_sock_pid_in_kernel and portid_next. This will allow
the channel identification for the kernel extension to send back messages.

Replace send with WriteFile equivalent and ignore nl_sock_drain for the moment
under MSVC.

Replace sendmsg and recvmsg with ReadFile and WriteFile equivalents.

On MSVC put in handle instead of fd(sock->fd becomes sock->handle).

Creation of the netlink socket will be replaced by CreateFile equivalent.

Add MAX_STACK_LENGTH for MSVC this will be our maximmum for on-stack copy 
buffer.

Signed-off-by: Alin Gabriel Serdean 
---
 lib/netlink-socket.c | 178 ++-
 1 file changed, 177 insertions(+), 1 deletion(-)

diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
index 09d3a61..9ce4c53 100644
--- a/lib/netlink-socket.c
+++ b/lib/netlink-socket.c
@@ -48,6 +48,33 @@ COVERAGE_DEFINE(netlink_sent);
 #define SOL_NETLINK 270
 #endif
 
+#ifdef _WIN32
+static struct ovs_mutex portid_mutex = OVS_MUTEX_INITIALIZER;
+static uint32_t g_last_portid = 0;
+
+/* Port IDs must be unique! */
+uint32_t
+portid_next() OVS_GUARDED_BY(portid_mutex)
+{
+g_last_portid++;
+return g_last_portid;
+}
+
+void
+set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) 
OVS_GUARDED_BY(portid_mutex)
+{
+struct nlmsghdr msg = { 0 };
+
+msg.nlmsg_len = sizeof(struct nlmsghdr);
+msg.nlmsg_type = 80; /* target = set file pid */
+msg.nlmsg_flags = 0;
+msg.nlmsg_seq = 0;
+msg.nlmsg_pid = pid;
+
+WriteFile(handle, &msg, sizeof(struct nlmsghdr), NULL, NULL);
+}
+#endif
+
 /* A single (bad) Netlink message can in theory dump out many, many log
  * messages, so the burst size is set quite high here to avoid missing useful
  * information.  Also, at high logging levels we log *all* Netlink messages. */
@@ -60,7 +87,11 @@ static void log_nlmsg(const char *function, int error,
 /* Netlink sockets. */
 
 struct nl_sock {
+#ifdef _WIN32
+HANDLE handle;
+#else
 int fd;
+#endif
 uint32_t next_seq;
 uint32_t pid;
 int protocol;
@@ -88,7 +119,9 @@ nl_sock_create(int protocol, struct nl_sock **sockp)
 {
 static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
 struct nl_sock *sock;
+#ifndef _WIN32
 struct sockaddr_nl local, remote;
+#endif
 socklen_t local_size;
 int rcvbuf;
 int retval = 0;
@@ -114,15 +147,39 @@ nl_sock_create(int protocol, struct nl_sock **sockp)
 *sockp = NULL;
 sock = xmalloc(sizeof *sock);
 
+#ifdef _WIN32
+sock->handle = CreateFileA(".\\OpenVSwitchDevice",
+   GENERIC_READ | GENERIC_WRITE,
+   FILE_SHARE_READ | FILE_SHARE_WRITE,
+   NULL, OPEN_EXISTING,
+   FILE_ATTRIBUTE_NORMAL, NULL);
+
+int last_error = GetLastError();
+
+if (sock->handle == INVALID_HANDLE_VALUE) {
+VLOG_ERR("fcntl: %s", ovs_strerror(last_error));
+goto error;
+}
+#else
 sock->fd = socket(AF_NETLINK, SOCK_RAW, protocol);
 if (sock->fd < 0) {
 VLOG_ERR("fcntl: %s", ovs_strerror(errno));
 goto error;
 }
+#endif
+
 sock->protocol = protocol;
 sock->next_seq = 1;
 
 rcvbuf = 1024 * 1024;
+#ifdef _WIN32
+sock->rcvbuf = rcvbuf;
+ovs_mutex_init(&portid_mutex);
+ovs_mutex_lock(&portid_mutex);
+sock->pid = portid_next();
+set_sock_pid_in_kernel(sock->handle, sock->pid);
+ovs_mutex_unlock(&portid_mutex);
+#else
 if (setsockopt(sock->fd, SOL_SOCKET, SO_RCVBUFFORCE,
&rcvbuf, sizeof rcvbuf)) {
 /* Only root can use SO_RCVBUFFORCE.  Everyone else gets EPERM.
@@ -161,6 +218,7 @@ nl_sock_create(int protocol, struct nl_sock **sockp)
 goto error;
 }
 sock->pid = local.nl_pid;
+#endif
 
 *sockp = sock;
 return 0;
@@ -172,9 +230,15 @@ error:
 retval = EINVAL;
 }
 }
+#ifdef _WIN32
+if (sock->handle != INVALID_HANDLE_VALUE) {
+CloseHandle(sock->handle);
+}
+#else
 if (sock->fd >= 0) {
 close(sock->fd);
 }
+#endif
 free(sock);
 return retval;
 }
@@ -193,7 +257,11 @@ void
 nl_sock_destroy(struct nl_sock *sock)
 {
 if (sock) {
+#ifdef _WIN32
+CloseHandle(sock->handle);
+#else
 close(sock->fd);
+#endif
 free(sock);
 }
 }
@@ -212,12 +280,40 @@ nl_sock_destroy(struct nl_sock *sock)
 int
 nl_sock_join_mcgroup(struct nl_sock *sock, unsigned int multicast_group)
 {
+#ifdef _WIN32
+#define OVS_VPORT_MCGROUP_FALLBACK_ID 33
+struct ofpbuf msg_buf;
+struct message_multicast
+{
+struct nlmsghdr;
+/* if true, join; if else, leave */
+unsigned char join;
+unsigned int groupId;
+};
+
+struct message_multicast msg = { 0 };
+
+msg.nlmsg_len = sizeof(struct message_multicast);
+msg.nlmsg_type = OVS_VPORT_MCGROUP_FALLBACK_ID;
+msg.nlmsg_flags = 0;
+msg.nlmsg_seq = 0;
+msg.nlmsg_pid = sock->pid;
+

[ovs-dev] [PATCH v2 5/5] Add more files to the openvswitch library on MSVC

2014-07-29 Thread Alin Serdean
Add netlink related files to the windows build.


Signed-off-by: Alin Gabriel Serdean 
---
 lib/automake.mk | 9 +
 1 file changed, 9 insertions(+)

diff --git a/lib/automake.mk b/lib/automake.mk
index 0997df5..87a8faa 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -319,6 +319,15 @@ lib_libopenvswitch_la_SOURCES += \
lib/netdev-dpdk.h
 endif
 
+if WIN32
+lib_libopenvswitch_la_SOURCES += \
+   lib/netlink-notifier.c \
+   lib/netlink-notifier.h \
+   lib/netlink-protocol.h \
+   lib/netlink-socket.c \
+   lib/netlink-socket.h
+endif
+
 if HAVE_POSIX_AIO
 lib_libopenvswitch_la_SOURCES += lib/async-append-aio.c
 else
-- 
1.9.0.msysgit.0
  
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH v2 1/5] Add defines, enums and headers for MSVC

2014-07-29 Thread Alin Serdean
Add defines needed to compile netlink-socket.c and netlink.c.

Add a wrapper and the functionality behind it for syconf.

Add the newly created files to the noinst_HEADERS in windows/automake.mk

Signed-off-by: Alin Gabriel Serdean 
---
 include/windows/automake.mk|  1 +
 include/windows/net/if.h   | 50 ++
 include/windows/netpacket/packet.h | 40 ++
 include/windows/sys/uio.h  | 22 +
 include/windows/unistd.h   | 50 ++
 5 files changed, 163 insertions(+)
 create mode 100644 include/windows/netpacket/packet.h

diff --git a/include/windows/automake.mk b/include/windows/automake.mk
index 4d5a42c..2f1b631 100644
--- a/include/windows/automake.mk
+++ b/include/windows/automake.mk
@@ -11,6 +11,7 @@ noinst_HEADERS += \
include/windows/getopt.h \
include/windows/net/if.h \
include/windows/netdb.h \
+   include/windows/netpacket/packet.h \
include/windows/netinet/icmp6.h \
include/windows/netinet/in.h \
include/windows/netinet/in_systm.h \
diff --git a/include/windows/net/if.h b/include/windows/net/if.h
index 893ffe4..3a064ae 100644
--- a/include/windows/net/if.h
+++ b/include/windows/net/if.h
@@ -21,4 +21,54 @@
 
 #define IFNAMSIZ IF_NAMESIZE
 
+enum {
+IFLA_UNSPEC,
+IFLA_ADDRESS,
+IFLA_BROADCAST,
+IFLA_IFNAME,
+IFLA_MTU,
+IFLA_LINK,
+IFLA_QDISC,
+IFLA_STATS,
+IFLA_COST,
+#define IFLA_COST IFLA_COST
+IFLA_PRIORITY,
+#define IFLA_PRIORITY IFLA_PRIORITY
+IFLA_MASTER,
+#define IFLA_MASTER IFLA_MASTER
+IFLA_WIRELESS,
+#define IFLA_WIRELESS IFLA_WIRELESS
+IFLA_PROTINFO,
+#define IFLA_PROTINFO IFLA_PROTINFO
+IFLA_TXQLEN,
+#define IFLA_TXQLEN IFLA_TXQLEN
+IFLA_MAP,
+#define IFLA_MAP IFLA_MAP
+IFLA_WEIGHT,
+#define IFLA_WEIGHT IFLA_WEIGHT
+IFLA_OPERSTATE,
+IFLA_LINKMODE,
+IFLA_LINKINFO,
+#define IFLA_LINKINFO IFLA_LINKINFO
+IFLA_NET_NS_PID,
+IFLA_IFALIAS,
+IFLA_NUM_VF,
+IFLA_VFINFO_LIST,
+IFLA_STATS64,
+IFLA_VF_PORTS,
+IFLA_PORT_SELF,
+IFLA_AF_SPEC,
+IFLA_GROUP,
+IFLA_NET_NS_FD,
+IFLA_EXT_MASK,
+IFLA_PROMISCUITY,
+#define IFLA_PROMISCUITY IFLA_PROMISCUITY
+IFLA_NUM_TX_QUEUES,
+IFLA_NUM_RX_QUEUES,
+IFLA_CARRIER,
+IFLA_PHYS_PORT_ID,
+__IFLA_MAX
+};
+#define IFLA_MAX (__IFLA_MAX - 1)
+
 #endif /* net/if.h */
diff --git a/include/windows/netpacket/packet.h 
b/include/windows/netpacket/packet.h
new file mode 100644
index 000..cf26a70
--- /dev/null
+++ b/include/windows/netpacket/packet.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright 2014 Cloudbase Solutions Srl
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef __NETPACKET_PACKET_H
+#define __NETPACKET_PACKET_H 1
+
+struct iovec
+{
+void *iov_base;
+unsigned int iov_len;
+};
+
+struct msghdr
+  {
+void *msg_name;
+socklen_t msg_namelen;
+
+struct iovec *msg_iov;
+size_t msg_iovlen;
+
+void *msg_control;
+size_t msg_controllen;
+
+int msg_flags;
+  };
+
+#endif /* netpacket/packet.h */
\ No newline at end of file
diff --git a/include/windows/sys/uio.h b/include/windows/sys/uio.h
index e69de29..b51c367 100644
--- a/include/windows/sys/uio.h
+++ b/include/windows/sys/uio.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright 2014 Cloudbase Solutions Srl
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#ifndef __SYS_UIO_H
+#define __SYS_UIO_H 1
+
+#include 
+
+#endif /* sys/uio.h */
\ No newline at end of file
diff --git a/include/windows/unistd.h b/include/windows/unistd.h
index d9ded5a..8629f7e 100644
--- a/include/windows/unistd.h
+++ b/include/windows/unistd.h
@@ -16,6 +16,9 @@
 #ifndef _UNISTD_H
 #define _UNISTD_H   1
 
+#define WIN32_LEAN_AND_MEAN
+#include 
+
 #define fsync _commit
 
 /* Standard file descriptors.  */
@@ -23,4 +26,51 @@
 #define STDOUT_FILENO   1   /* Standard outpu

Re: [ovs-dev] [PATCH 1/2] lib: Resizable, Scalable, Concurrent Hash Table

2014-07-29 Thread Josh Triplett
On Tue, Jul 29, 2014 at 01:41:32PM +0200, Thomas Graf wrote:
> Generic implementation of a resizable, scalable, concurrent hash table
> based on [0]. The implementation supports both, fixed size keys specified
> via an offset and length, or arbitrary keys via own hash and compare
> functions.
> 
> Lookups are lockless and protected as RCU read side critical sections.
> Automatic growing/shrinking based on user configurable watermarks is
> available while allowing concurrent lookups to take place.
> 
> Objects to be hashed must include a struct rhash_head. The reason for not
> using the existing struct hlist_head is that the expansion and shrinking
> will have two buckets point to a single entry which would lead in obscure
> reverse chaining behaviour.
> 
> Code includes a boot selftest if CONFIG_TEST_RHASHTABLE is defined.
> 
> [0] https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf
> 
> Signed-off-by: Thomas Graf 

Thanks for working on this!

Currently reading the algorithm implementation in more detail.  A couple
of minor issues below.

> --- /dev/null
> +++ b/include/linux/rhashtable.h
[...]
> +struct rhash_head {
> + struct rhash_head   *next;
> +};

Arguably this could become a new singly-linked list type in list.h;
we don't currently have one.

> +/**
> + * rht_for_each_entry_safe - safely iterate over hash chain of given type
> + * @pos: type * to use as a loop cursor.
> + * @n:   type * to use for temporary next object storage
> + * @head:head of the hash chain (struct rhash_head *)
> + * @ht:  pointer to your struct rhashtable
> + * @member:  name of the rhash_head within the hashable struct.
> + *
> + * This hash chain list-traversal primitive allows for the looped code to
> + * remove the loop cursor from the list.
> + */
> +#define rht_for_each_entry_safe(pos, n, head, ht, member)\
> + for (pos = rht_entry_safe(rht_dereference(head, ht), \
> +typeof(*(pos)), member), \
> +  n = rht_entry_safe(rht_dereference((pos)->member.next, ht), \
> +  typeof(*(pos)), member); \
> +  pos; \
> +  pos = n, \
> +  n = rht_entry_safe(rht_dereference((pos)->member.next, ht), \
> +  typeof(*(pos)), member))

Given that removal requires special care, is this something actually
needed in the public interface, or only internally?  You don't
necessarily need to define a full set of list primitives.  (Unless of
course you make this a new list type in list.h.)

> --- /dev/null
> +++ b/lib/rhashtable.c
[...]
> +#define ASSERT_RHT_MUTEX(HT) BUG_ON(unlikely(!lockdep_rht_mutex_is_held(HT)))

BUG_ON and WARN_ON already include unlikely(); you don't need to add it
yourself.

> +/**
> + * rht_obj - cast hash head to outer object
> + * @ht:  hash table
> + * @he:  hashed node
> + */
> +void *rht_obj(const struct rhashtable *ht, const struct rhash_head *he)
> +{
> + return (void *) he - ht->p.head_offset;
> +}
> +EXPORT_SYMBOL_GPL(rht_obj);

Should this be a static inline?  And, will the runtime indirection
involved in head_offset add unnecessary overhead for tables of a known
type?  (I'd expect a head_offset of 0 in common cases.)

- Josh Triplett
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] netlink-socket: Add conceptual documentation.

2014-07-29 Thread Saurabh Shah
This is excellent. Thanks, Ben.

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff
> Sent: Monday, July 28, 2014 11:35 PM
> To: dev@openvswitch.org
> Cc: Ben Pfaff
> Subject: [ovs-dev] [PATCH] netlink-socket: Add conceptual documentation.
> 
> Based on a conversation with the VMware Hyper-V team earlier today.
> 
> This commit also changes a couple of functions that were only used with
> netlink-socket.c into static functions.  I couldn't think of a reason for code
> outside that file to use them.
> 
> Signed-off-by: Ben Pfaff 
> ---
>  lib/netlink-socket.c | 124 +--
>  lib/netlink-socket.h | 145
> +++
>  2 files changed, 197 insertions(+), 72 deletions(-)
> 
> diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index 
> 09d3a61..0ff85d6
> 100644
> --- a/lib/netlink-socket.c
> +++ b/lib/netlink-socket.c
> @@ -537,26 +537,7 @@ nl_sock_transact_multiple__(struct nl_sock *sock,
>  return error;
>  }
> 
> -/* Sends the 'request' member of the 'n' transactions in 'transactions' on
> - * 'sock', in order, and receives responses to all of them.  Fills in the
> - * 'error' member of each transaction with 0 if it was successful, otherwise
> - * with a positive errno value.  If 'reply' is nonnull, then it will be 
> filled
> - * with the reply if the message receives a detailed reply.  In other cases,
> - * i.e. where the request failed or had no reply beyond an indication of
> - * success, 'reply' will be cleared if it is nonnull.
> - *
> - * The caller is responsible for destroying each request and reply, and the
> - * transactions array itself.
> - *
> - * Before sending each message, this function will finalize nlmsg_len in each
> - * 'request' to match the ofpbuf's size,  set nlmsg_pid to 'sock''s pid, and
> - * initialize nlmsg_seq.
> - *
> - * Bare Netlink is an unreliable transport protocol.  This function layers
> - * reliable delivery and reply semantics on top of bare Netlink.  See
> - * nl_sock_transact() for some caveats.
> - */
> -void
> +static void
>  nl_sock_transact_multiple(struct nl_sock *sock,
>struct nl_transaction **transactions, size_t n)  { 
> @@ -611,47
> +592,7 @@ nl_sock_transact_multiple(struct nl_sock *sock,
>  }
>  }
> 
> -/* Sends 'request' to the kernel via 'sock' and waits for a response.  If
> - * successful, returns 0.  On failure, returns a positive errno value.
> - *
> - * If 'replyp' is nonnull, then on success '*replyp' is set to the kernel's
> - * reply, which the caller is responsible for freeing with ofpbuf_delete(), 
> and
> - * on failure '*replyp' is set to NULL.  If 'replyp' is null, then the 
> kernel's
> - * reply, if any, is discarded.
> - *
> - * Before the message is sent, nlmsg_len in 'request' will be finalized to
> - * match ofpbuf_size(msg), nlmsg_pid will be set to 'sock''s pid, and
> nlmsg_seq will
> - * be initialized, NLM_F_ACK will be set in nlmsg_flags.
> - *
> - * The caller is responsible for destroying 'request'.
> - *
> - * Bare Netlink is an unreliable transport protocol.  This function layers
> - * reliable delivery and reply semantics on top of bare Netlink.
> - *
> - * In Netlink, sending a request to the kernel is reliable enough, because 
> the
> - * kernel will tell us if the message cannot be queued (and we will in that
> - * case put it on the transmit queue and wait until it can be delivered).
> - *
> - * Receiving the reply is the real problem: if the socket buffer is full when
> - * the kernel tries to send the reply, the reply will be dropped.  However,
> the
> - * kernel sets a flag that a reply has been dropped.  The next call to recv
> - * then returns ENOBUFS.  We can then re-send the request.
> - *
> - * Caveats:
> - *
> - *  1. Netlink depends on sequence numbers to match up requests and
> - * replies.  The sender of a request supplies a sequence number, and
> - * the reply echos back that sequence number.
> - *
> - * This is fine, but (1) some kernel netlink implementations are
> - * broken, in that they fail to echo sequence numbers and (2) this
> - * function will drop packets with non-matching sequence numbers, so
> - * that only a single request can be usefully transacted at a time.
> - *
> - *  2. Resending the request causes it to be re-executed, so the request
> - * needs to be idempotent.
> - */
> -int
> +static int
>  nl_sock_transact(struct nl_sock *sock, const struct ofpbuf *request,
>   struct ofpbuf **replyp)  { @@ -1124,6 +1065,47 @@
> nl_pool_release(struct nl_sock *sock)
>  }
>  }
> 
> +/* Sends 'request' to the kernel on a Netlink socket for the given 'protocol'
> + * (e.g. NETLINK_ROUTE or NETLINK_GENERIC) and waits for a response.
> +If
> + * successful, returns 0.  On failure, returns a positive errno value.
> + *
> + * If 'replyp' is nonnull, then on s

Re: [ovs-dev] [PATCH 2/2] netlink: Convert netlink_lookup() to use RCU protected hash table

2014-07-29 Thread Tobias Klauser
On 2014-07-29 at 13:41:33 +0200, Thomas Graf  wrote:
> Heavy Netlink users such as Open vSwitch spend a considerable amount of
> time in netlink_lookup() due to the read-lock on nl_table_lock. Use of
> RCU relieves the lock contention.
> 
> Makes use of the new resizable hash table to avoid locking on the
> lookup.
> 
> The hash table will grow if entries exceeds 75% of table size up to a
> total table size of 64K. It will automatically shrink if usage falls
> below 50%.
> 
> Also splits nl_table_lock into a separate spinlock to protect hash table
> mutations. This avoids a possible deadlock when the hash table growing
> waits on RCU readers to complete via synchronize_rcu() while readers
> holding RCU read lock are waiting on the nl_table_lock() to be released
> to lock the table for broadcasting.
> 
> Before:
>9.16%  kpktgend_0  [openvswitch]  [k] masked_flow_lookup
>6.42%  kpktgend_0  [pktgen]   [k] mod_cur_headers
>6.26%  kpktgend_0  [pktgen]   [k] pktgen_thread_worker
>6.23%  kpktgend_0  [kernel.kallsyms]  [k] memset
>4.79%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
>4.37%  kpktgend_0  [kernel.kallsyms]  [k] memcpy
>3.60%  kpktgend_0  [openvswitch]  [k] ovs_flow_extract
>2.69%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
> 
> After:
>   15.26%  kpktgend_0  [openvswitch]  [k] masked_flow_lookup
>8.12%  kpktgend_0  [pktgen]   [k] pktgen_thread_worker
>7.92%  kpktgend_0  [pktgen]   [k] mod_cur_headers
>5.11%  kpktgend_0  [kernel.kallsyms]  [k] memset
>4.11%  kpktgend_0  [openvswitch]  [k] ovs_flow_extract
>4.06%  kpktgend_0  [kernel.kallsyms]  [k] _raw_spin_lock
>3.90%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
>[...]
>0.67%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
> 
> Signed-off-by: Thomas Graf 
> ---
>  net/netlink/af_netlink.c | 285 
> ++-
>  net/netlink/af_netlink.h |  18 +--
>  net/netlink/diag.c   |  11 +-
>  3 files changed, 119 insertions(+), 195 deletions(-)
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 1b38f7f..7f44468 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
[...]
> @@ -2996,28 +2939,26 @@ static void *netlink_seq_next(struct seq_file *seq, 
> void *v, loff_t *pos)
>  
>   net = seq_file_net(seq);
>   iter = seq->private;
> - s = v;
> - do {
> - s = sk_next(s);
> - } while (s && !nl_table[s->sk_protocol].compare(net, s));
> - if (s)
> - return s;
> + nlk = v;
> +
> + rht_for_each_entry_rcu(nlk, nlk->node.next, node)
> + if (net_eq(sock_net((struct sock *)nlk), net))
> + return nlk;
>  
>   i = iter->link;
>   j = iter->hash_idx + 1;
>  
>   do {
> - struct nl_portid_hash *hash = &nl_table[i].hash;
> -
> - for (; j <= hash->mask; j++) {
> - s = sk_head(&hash->table[j]);
> + struct rhashtable *ht = &nl_table[i].hash;
> + const struct bucket_table *tbl = rht_dereference(ht->tbl, ht);
>  
> - while (s && !nl_table[s->sk_protocol].compare(net, s))
> - s = sk_next(s);
> - if (s) {
> - iter->link = i;
> - iter->hash_idx = j;
> - return s;
> + for (; j <= tbl->size; j++) {

Should the condition j < tbl->size here, since the bucket table only
contains `size' buckets?

> + rht_for_each_entry_rcu(nlk, tbl->buckets[j], node) {
> + if (net_eq(sock_net((struct sock *)nlk), net)) {
> + iter->link = i;
> + iter->hash_idx = j;
> + return nlk;
> + }
>   }
>   }
[...]
> diff --git a/net/netlink/diag.c b/net/netlink/diag.c
> index 1af29624..7588f34 100644
> --- a/net/netlink/diag.c
> +++ b/net/netlink/diag.c
[...]
> @@ -101,16 +102,20 @@ static int __netlink_diag_dump(struct sk_buff *skb, 
> struct netlink_callback *cb,
>   int protocol, int s_num)
>  {
>   struct netlink_table *tbl = &nl_table[protocol];
> - struct nl_portid_hash *hash = &tbl->hash;
> + struct rhashtable *ht = &tbl->hash;
> + const struct bucket_table *htbl = rht_dereference(ht->tbl, ht);
>   struct net *net = sock_net(skb->sk);
>   struct netlink_diag_req *req;
> + struct netlink_sock *nlsk;
>   struct sock *sk;
>   int ret = 0, num = 0, i;
>  
>   req = nlmsg_data(cb->nlh);
>  
> - for (i = 0; i <= hash->mask; i++) {
> - sk_for_each(sk, &hash->table[i]) {
> + for (i = 0; i <= htbl->size; i++) {

Same here, condition should be i < htbl->size

> + rht_for_each_entry(nlsk, h

Re: [ovs-dev] [PATCH] netlink-socket: Add conceptual documentation.

2014-07-29 Thread Ben Pfaff
Thanks for the review.

I realized that I left off a couple of important points, so I'm
folding in the following and will apply this to master soon.

diff --git a/lib/netlink-socket.h b/lib/netlink-socket.h
index 1450862..cdf32d2 100644
--- a/lib/netlink-socket.h
+++ b/lib/netlink-socket.h
@@ -149,6 +149,13 @@
  *   to select one of the PIDs, and sends the packet (encapsulated in an
  *   Open vSwitch Netlink message) to the socket with the selected PID.
  *
+ *   nl_sock_recv() reads notifications sent this way.
+ *
+ *   Messages received this way can overflow, just like multicast
+ *   subscription messages, and they are reported the same way.  Because
+ *   packet notification messages do not report the state of a table, there
+ *   is no way to recover the dropped packets; they are simply lost.
+ *
  *   The main reason to support multiple PIDs per vport is to increase
  *   fairness, that is, to make it harder for a single high-flow-rate
  *   sender to drown out lower rate sources.  Multiple PIDs per vport might


On Tue, Jul 29, 2014 at 03:29:53PM +, Saurabh Shah wrote:
> This is excellent. Thanks, Ben.
> 
> > -Original Message-
> > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ben Pfaff
> > Sent: Monday, July 28, 2014 11:35 PM
> > To: dev@openvswitch.org
> > Cc: Ben Pfaff
> > Subject: [ovs-dev] [PATCH] netlink-socket: Add conceptual documentation.
> > 
> > Based on a conversation with the VMware Hyper-V team earlier today.
> > 
> > This commit also changes a couple of functions that were only used with
> > netlink-socket.c into static functions.  I couldn't think of a reason for 
> > code
> > outside that file to use them.
> > 
> > Signed-off-by: Ben Pfaff 
> > ---
> >  lib/netlink-socket.c | 124 +--
> >  lib/netlink-socket.h | 145
> > +++
> >  2 files changed, 197 insertions(+), 72 deletions(-)
> > 
> > diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c index 
> > 09d3a61..0ff85d6
> > 100644
> > --- a/lib/netlink-socket.c
> > +++ b/lib/netlink-socket.c
> > @@ -537,26 +537,7 @@ nl_sock_transact_multiple__(struct nl_sock *sock,
> >  return error;
> >  }
> > 
> > -/* Sends the 'request' member of the 'n' transactions in 'transactions' on
> > - * 'sock', in order, and receives responses to all of them.  Fills in the
> > - * 'error' member of each transaction with 0 if it was successful, 
> > otherwise
> > - * with a positive errno value.  If 'reply' is nonnull, then it will be 
> > filled
> > - * with the reply if the message receives a detailed reply.  In other 
> > cases,
> > - * i.e. where the request failed or had no reply beyond an indication of
> > - * success, 'reply' will be cleared if it is nonnull.
> > - *
> > - * The caller is responsible for destroying each request and reply, and the
> > - * transactions array itself.
> > - *
> > - * Before sending each message, this function will finalize nlmsg_len in 
> > each
> > - * 'request' to match the ofpbuf's size,  set nlmsg_pid to 'sock''s pid, 
> > and
> > - * initialize nlmsg_seq.
> > - *
> > - * Bare Netlink is an unreliable transport protocol.  This function layers
> > - * reliable delivery and reply semantics on top of bare Netlink.  See
> > - * nl_sock_transact() for some caveats.
> > - */
> > -void
> > +static void
> >  nl_sock_transact_multiple(struct nl_sock *sock,
> >struct nl_transaction **transactions, size_t n)  
> > { @@ -611,47
> > +592,7 @@ nl_sock_transact_multiple(struct nl_sock *sock,
> >  }
> >  }
> > 
> > -/* Sends 'request' to the kernel via 'sock' and waits for a response.  If
> > - * successful, returns 0.  On failure, returns a positive errno value.
> > - *
> > - * If 'replyp' is nonnull, then on success '*replyp' is set to the kernel's
> > - * reply, which the caller is responsible for freeing with 
> > ofpbuf_delete(), and
> > - * on failure '*replyp' is set to NULL.  If 'replyp' is null, then the 
> > kernel's
> > - * reply, if any, is discarded.
> > - *
> > - * Before the message is sent, nlmsg_len in 'request' will be finalized to
> > - * match ofpbuf_size(msg), nlmsg_pid will be set to 'sock''s pid, and
> > nlmsg_seq will
> > - * be initialized, NLM_F_ACK will be set in nlmsg_flags.
> > - *
> > - * The caller is responsible for destroying 'request'.
> > - *
> > - * Bare Netlink is an unreliable transport protocol.  This function layers
> > - * reliable delivery and reply semantics on top of bare Netlink.
> > - *
> > - * In Netlink, sending a request to the kernel is reliable enough, because 
> > the
> > - * kernel will tell us if the message cannot be queued (and we will in that
> > - * case put it on the transmit queue and wait until it can be delivered).
> > - *
> > - * Receiving the reply is the real problem: if the socket buffer is full 
> > when
> > - * the kernel tries to send the reply, the reply will be dropped.  

Re: [ovs-dev] [PATCH] cmap: Merge CMAP_FOR_EACH_SAFE into CMAP_FOR_EACH.

2014-07-29 Thread Ben Pfaff
On Mon, Jul 28, 2014 at 02:03:49PM -0700, Jarno Rajahalme wrote:
> Ben,
> 
> Thanks for reminding, somehow I lost track of this.
> 
> This is actually a nice cleanup :-)
> 
> Acked-by: Jarno Rajahalme 

Thanks, applied to master.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] tests: Add another test for extended registers.

2014-07-29 Thread Ben Pfaff
ONF-JIRA: EXT-244
Suggested-by: Jean Tourrilhes 
Signed-off-by: Ben Pfaff 
---
 tests/ofproto-dpif.at |   21 +
 1 file changed, 21 insertions(+)

diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 4386639..9ee1698 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -410,6 +410,27 @@ AT_CHECK([tail -1 stdout], [0],
 OVS_VSWITCHD_STOP
 AT_CLEANUP
 
+dnl Tests that the standardized xregs are mapped onto the legacy OVS registers
+dnl in the manner documented in ovs-ofctl(8).
+AT_SETUP([ofproto-dpif - extended registers])
+OVS_VSWITCHD_START
+ADD_OF_PORTS([br0], [1], [2], [3])
+AT_DATA([flows.txt], [dnl
+table=0 actions=load:0xfedcba9876543210->OXM_OF_PKT_REG1[[]],resubmit(,1)
+table=1,reg2=0xfedcba98,reg3=0x76543210   actions=2
+
+# These low-priority rules shouldn't match.  They're here only to make really
+# sure that the test fails if either of the above rules fails to match.
+table=0,priority=0actions=3
+table=1,priority=0actions=3
+])
+AT_CHECK([ovs-ofctl add-flows br0 flows.txt])
+AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 
'in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=1,tos=0,ttl=128,frag=no),icmp(type=8,code=0)'],
 [0], [stdout])
+AT_CHECK([tail -1 stdout], [0], [Datapath actions: 2
+])
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
 AT_SETUP([ofproto-dpif - push-pop])
 OVS_VSWITCHD_START
 ADD_OF_PORTS([br0], [20], [21], [22], [33], [90])
-- 
1.7.10.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2 1/5] Add defines, enums and headers for MSVC

2014-07-29 Thread Ben Pfaff
On Tue, Jul 29, 2014 at 03:21:16PM +, Alin Serdean wrote:
> Add defines needed to compile netlink-socket.c and netlink.c.
> 
> Add a wrapper and the functionality behind it for syconf.
> 
> Add the newly created files to the noinst_HEADERS in windows/automake.mk
> 
> Signed-off-by: Alin Gabriel Serdean 

Applied to master, thanks!
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] plan to integrate Hyper-V support

2014-07-29 Thread Alin Serdean
We have pushed in the changes we made upon our repository:
https://github.com/cloudbase/openvswitch-hyperv-kernel.git

This conspicuous update includes:

* Rebased on the the tip of master.
* Functionality to take into account the NETLINK padding.
* Hashing (SpookyHash) to improve performance over previous linear searches. 
* Some heavy refactoring to make the code more readable.
* Packet fragmentation fixes
* DirectIO performance improvements in place of the previous buffer copying

Kind Regards,
Alin.

From: Alessandro Pilotti
Sent: Tuesday, July 29, 2014 11:32 AM
To: Nithin Raju
Cc: Ben Pfaff; Saurabh Shah; dev@openvswitch.org; Alin Serdean; Guolin Yang; 
Eitan Eliahu; Rajiv Krishnamurthy
Subject: Re: plan to integrate Hyper-V support

Hi Nithin,

I'd also add at least:

1) Design for mapping hyper-v ports to OVS ports
2) CI testing plans
3) Roadmap

Thanks,

Alessandro

On 29.07.2014, at 05:01, "Nithin Raju"  wrote:

> 10.00 AM Pacific - Sounds good to me.
>
> hi Alessandro,
> We'll be meeting tomorrow. In the light of that, we had the following items 
> on mind that we could discuss. Pls. feel free to edit the list:
>
> 1. Helping Alin to run some tests successfully (with the V2 patch that 
> Saurabh posted last week).
> 2. Any design or code walkthrough for the kernel code.
> 3. Netlink interface related discussion.
> 4. Architectural topics.
>
> See you tomorrow!
>
> thanks,
> Nithin
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2 3/5] Bypass HAVE_NETLINK for MSVC

2014-07-29 Thread Ben Pfaff
On Tue, Jul 29, 2014 at 03:22:37PM +, Alin Serdean wrote:
> Bypass the error compilation when compiling under MSVC.
> 
> Signed-off-by: Alin Gabriel Serdean 

Applied to master, thanks!
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] netlink: Convert netlink_lookup() to use RCU protected hash table

2014-07-29 Thread Thomas Graf
On 07/29/14 at 05:58pm, Tobias Klauser wrote:
> On 2014-07-29 at 13:41:33 +0200, Thomas Graf  wrote:
> > Heavy Netlink users such as Open vSwitch spend a considerable amount of
> > time in netlink_lookup() due to the read-lock on nl_table_lock. Use of
> > RCU relieves the lock contention.
> > 
> > Makes use of the new resizable hash table to avoid locking on the
> > lookup.
> > 
> > The hash table will grow if entries exceeds 75% of table size up to a
> > total table size of 64K. It will automatically shrink if usage falls
> > below 50%.
> > 
> > Also splits nl_table_lock into a separate spinlock to protect hash table
> > mutations. This avoids a possible deadlock when the hash table growing
> > waits on RCU readers to complete via synchronize_rcu() while readers
> > holding RCU read lock are waiting on the nl_table_lock() to be released
> > to lock the table for broadcasting.
> > 
> > Before:
> >9.16%  kpktgend_0  [openvswitch]  [k] masked_flow_lookup
> >6.42%  kpktgend_0  [pktgen]   [k] mod_cur_headers
> >6.26%  kpktgend_0  [pktgen]   [k] pktgen_thread_worker
> >6.23%  kpktgend_0  [kernel.kallsyms]  [k] memset
> >4.79%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
> >4.37%  kpktgend_0  [kernel.kallsyms]  [k] memcpy
> >3.60%  kpktgend_0  [openvswitch]  [k] ovs_flow_extract
> >2.69%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
> > 
> > After:
> >   15.26%  kpktgend_0  [openvswitch]  [k] masked_flow_lookup
> >8.12%  kpktgend_0  [pktgen]   [k] pktgen_thread_worker
> >7.92%  kpktgend_0  [pktgen]   [k] mod_cur_headers
> >5.11%  kpktgend_0  [kernel.kallsyms]  [k] memset
> >4.11%  kpktgend_0  [openvswitch]  [k] ovs_flow_extract
> >4.06%  kpktgend_0  [kernel.kallsyms]  [k] _raw_spin_lock
> >3.90%  kpktgend_0  [kernel.kallsyms]  [k] jhash2
> >[...]
> >0.67%  kpktgend_0  [kernel.kallsyms]  [k] netlink_lookup
> > 
> > Signed-off-by: Thomas Graf 
> > ---
> >  net/netlink/af_netlink.c | 285 
> > ++-
> >  net/netlink/af_netlink.h |  18 +--
> >  net/netlink/diag.c   |  11 +-
> >  3 files changed, 119 insertions(+), 195 deletions(-)
> > 
> > diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> > index 1b38f7f..7f44468 100644
> > --- a/net/netlink/af_netlink.c
> > +++ b/net/netlink/af_netlink.c
> [...]
> > @@ -2996,28 +2939,26 @@ static void *netlink_seq_next(struct seq_file *seq, 
> > void *v, loff_t *pos)
> >  
> > net = seq_file_net(seq);
> > iter = seq->private;
> > -   s = v;
> > -   do {
> > -   s = sk_next(s);
> > -   } while (s && !nl_table[s->sk_protocol].compare(net, s));
> > -   if (s)
> > -   return s;
> > +   nlk = v;
> > +
> > +   rht_for_each_entry_rcu(nlk, nlk->node.next, node)
> > +   if (net_eq(sock_net((struct sock *)nlk), net))
> > +   return nlk;
> >  
> > i = iter->link;
> > j = iter->hash_idx + 1;
> >  
> > do {
> > -   struct nl_portid_hash *hash = &nl_table[i].hash;
> > -
> > -   for (; j <= hash->mask; j++) {
> > -   s = sk_head(&hash->table[j]);
> > +   struct rhashtable *ht = &nl_table[i].hash;
> > +   const struct bucket_table *tbl = rht_dereference(ht->tbl, ht);
> >  
> > -   while (s && !nl_table[s->sk_protocol].compare(net, s))
> > -   s = sk_next(s);
> > -   if (s) {
> > -   iter->link = i;
> > -   iter->hash_idx = j;
> > -   return s;
> > +   for (; j <= tbl->size; j++) {
> 
> Should the condition j < tbl->size here, since the bucket table only
> contains `size' buckets?

Good catch, thanks

>
> 
> > +   rht_for_each_entry_rcu(nlk, tbl->buckets[j], node) {
> > +   if (net_eq(sock_net((struct sock *)nlk), net)) {
> > +   iter->link = i;
> > +   iter->hash_idx = j;
> > +   return nlk;
> > +   }
> > }
> > }
> [...]
> > diff --git a/net/netlink/diag.c b/net/netlink/diag.c
> > index 1af29624..7588f34 100644
> > --- a/net/netlink/diag.c
> > +++ b/net/netlink/diag.c
> [...]
> > @@ -101,16 +102,20 @@ static int __netlink_diag_dump(struct sk_buff *skb, 
> > struct netlink_callback *cb,
> > int protocol, int s_num)
> >  {
> > struct netlink_table *tbl = &nl_table[protocol];
> > -   struct nl_portid_hash *hash = &tbl->hash;
> > +   struct rhashtable *ht = &tbl->hash;
> > +   const struct bucket_table *htbl = rht_dereference(ht->tbl, ht);
> > struct net *net = sock_net(skb->sk);
> > struct netlink_diag_req *req;
> > +   struct netlink_sock *nlsk;
> > struct sock *sk;
> > int ret = 0, num = 0, i;
> >  
> > req = nlmsg_data(cb->nlh);
> >  
> > -   fo

Re: [ovs-dev] [PATCH] tests: Add another test for extended registers.

2014-07-29 Thread Jean Tourrilhes
Hi,

Unfortunately, I don't really have time to test the code, but
from here it looks good and make sense.
Have fun...

Jean

Acked-by: Jean Tourrilhes 

On Tue, Jul 29, 2014 at 09:27:23AM -0700, Ben Pfaff wrote:
> ONF-JIRA: EXT-244
> Suggested-by: Jean Tourrilhes 
> Signed-off-by: Ben Pfaff 
> ---
>  tests/ofproto-dpif.at |   21 +
>  1 file changed, 21 insertions(+)
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2 4/5] Changes needed to netlink-socket for MSVC

2014-07-29 Thread Ben Pfaff
On Tue, Jul 29, 2014 at 03:23:28PM +, Alin Serdean wrote:
> Add two functions set_sock_pid_in_kernel and portid_next. This will allow
> the channel identification for the kernel extension to send back messages.
> 
> Replace send with WriteFile equivalent and ignore nl_sock_drain for the moment
> under MSVC.
> 
> Replace sendmsg and recvmsg with ReadFile and WriteFile equivalents.
> 
> On MSVC put in handle instead of fd(sock->fd becomes sock->handle).
> 
> Creation of the netlink socket will be replaced by CreateFile equivalent.
> 
> Add MAX_STACK_LENGTH for MSVC this will be our maximmum for on-stack copy 
> buffer.
> 
> Signed-off-by: Alin Gabriel Serdean 

I'm not convinced that this will be the final Netlink implementation
for Windows, but it seems like an OK place to start.

I'm folding in the following, which is mostly coding style change but
the removal of the ovs_mutex_init() call is actually important.

diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
index d648761..3dbfe59 100644
--- a/lib/netlink-socket.c
+++ b/lib/netlink-socket.c
@@ -53,15 +53,17 @@ static struct ovs_mutex portid_mutex = 
OVS_MUTEX_INITIALIZER;
 static uint32_t g_last_portid = 0;
 
 /* Port IDs must be unique! */
-uint32_t
-portid_next() OVS_GUARDED_BY(portid_mutex)
+static uint32_t
+portid_next(void)
+OVS_GUARDED_BY(portid_mutex)
 {
 g_last_portid++;
 return g_last_portid;
 }
 
-void
-set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) 
OVS_GUARDED_BY(portid_mutex)
+static void
+set_sock_pid_in_kernel(HANDLE handle, uint32_t pid)
+OVS_GUARDED_BY(portid_mutex)
 {
 struct nlmsghdr msg = { 0 };
 
@@ -73,7 +75,7 @@ set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) 
OVS_GUARDED_BY(portid_mutex)
 
 WriteFile(handle, &msg, sizeof(struct nlmsghdr), NULL, NULL);
 }
-#endif
+#endif  /* _WIN32 */
 
 /* A single (bad) Netlink message can in theory dump out many, many log
  * messages, so the burst size is set quite high here to avoid missing useful
@@ -174,7 +176,6 @@ nl_sock_create(int protocol, struct nl_sock **sockp)
 rcvbuf = 1024 * 1024;
 #ifdef _WIN32
 sock->rcvbuf = rcvbuf;
-ovs_mutex_init(&portid_mutex);
 ovs_mutex_lock(&portid_mutex);
 sock->pid = portid_next();
 set_sock_pid_in_kernel(sock->handle, sock->pid);
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2 5/5] Add more files to the openvswitch library on MSVC

2014-07-29 Thread Ben Pfaff
On Tue, Jul 29, 2014 at 03:24:08PM +, Alin Serdean wrote:
> Add netlink related files to the windows build.
> 
> 
> Signed-off-by: Alin Gabriel Serdean 

Applied to master, thanks!
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH] tests: Add another test for extended registers.

2014-07-29 Thread Ben Pfaff
That's good enough for me, for a review of a test case.  Thanks!  I've
applied this to master.

On Tue, Jul 29, 2014 at 09:35:57AM -0700, Jean Tourrilhes wrote:
>   Hi,
> 
>   Unfortunately, I don't really have time to test the code, but
> from here it looks good and make sense.
>   Have fun...
> 
>   Jean
> 
> Acked-by: Jean Tourrilhes 
> 
> On Tue, Jul 29, 2014 at 09:27:23AM -0700, Ben Pfaff wrote:
> > ONF-JIRA: EXT-244
> > Suggested-by: Jean Tourrilhes 
> > Signed-off-by: Ben Pfaff 
> > ---
> >  tests/ofproto-dpif.at |   21 +
> >  1 file changed, 21 insertions(+)
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH v2 4/5] Changes needed to netlink-socket for MSVC

2014-07-29 Thread Alin Serdean
Thank you for review and the changes. 

Kind Regards,
Alin.

From: Ben Pfaff [b...@nicira.com]
Sent: Tuesday, July 29, 2014 7:38 PM
To: Alin Serdean
Cc: dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH v2 4/5] Changes needed to netlink-socket for MSVC

On Tue, Jul 29, 2014 at 03:23:28PM +, Alin Serdean wrote:
> Add two functions set_sock_pid_in_kernel and portid_next. This will allow
> the channel identification for the kernel extension to send back messages.
>
> Replace send with WriteFile equivalent and ignore nl_sock_drain for the moment
> under MSVC.
>
> Replace sendmsg and recvmsg with ReadFile and WriteFile equivalents.
>
> On MSVC put in handle instead of fd(sock->fd becomes sock->handle).
>
> Creation of the netlink socket will be replaced by CreateFile equivalent.
>
> Add MAX_STACK_LENGTH for MSVC this will be our maximmum for on-stack copy 
> buffer.
>
> Signed-off-by: Alin Gabriel Serdean 

I'm not convinced that this will be the final Netlink implementation
for Windows, but it seems like an OK place to start.

I'm folding in the following, which is mostly coding style change but
the removal of the ovs_mutex_init() call is actually important.

diff --git a/lib/netlink-socket.c b/lib/netlink-socket.c
index d648761..3dbfe59 100644
--- a/lib/netlink-socket.c
+++ b/lib/netlink-socket.c
@@ -53,15 +53,17 @@ static struct ovs_mutex portid_mutex = 
OVS_MUTEX_INITIALIZER;
 static uint32_t g_last_portid = 0;

 /* Port IDs must be unique! */
-uint32_t
-portid_next() OVS_GUARDED_BY(portid_mutex)
+static uint32_t
+portid_next(void)
+OVS_GUARDED_BY(portid_mutex)
 {
 g_last_portid++;
 return g_last_portid;
 }

-void
-set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) 
OVS_GUARDED_BY(portid_mutex)
+static void
+set_sock_pid_in_kernel(HANDLE handle, uint32_t pid)
+OVS_GUARDED_BY(portid_mutex)
 {
 struct nlmsghdr msg = { 0 };

@@ -73,7 +75,7 @@ set_sock_pid_in_kernel(HANDLE handle, uint32_t pid) 
OVS_GUARDED_BY(portid_mutex)

 WriteFile(handle, &msg, sizeof(struct nlmsghdr), NULL, NULL);
 }
-#endif
+#endif  /* _WIN32 */

 /* A single (bad) Netlink message can in theory dump out many, many log
  * messages, so the burst size is set quite high here to avoid missing useful
@@ -174,7 +176,6 @@ nl_sock_create(int protocol, struct nl_sock **sockp)
 rcvbuf = 1024 * 1024;
 #ifdef _WIN32
 sock->rcvbuf = rcvbuf;
-ovs_mutex_init(&portid_mutex);
 ovs_mutex_lock(&portid_mutex);
 sock->pid = portid_next();
 set_sock_pid_in_kernel(sock->handle, sock->pid);
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] tests: Turn off appctl poll_loop logging for long output.

2014-07-29 Thread Ben Pfaff
One of the VMware internal autobuilder builds failed due to extraneous
logging in these tests of the form:

   2014-07-28T21:11:07Z|1|poll_loop|INFO|wakeup due to [POLLIN] on fd 3
   (...) at lib/stream-fd-unix.c:124 (93% CPU usage)

I think this must be because these tests have tons of output and so on a
loaded machine it can take some CPU to pull it down.  We don't want to fail
for that reason, so disable this logging.

Signed-off-by: Ben Pfaff 
---
 tests/ofproto-dpif.at |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 9ee1698..0253cb0 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -5458,7 +5458,7 @@ AT_BANNER([ofproto-dpif - flow translation resource 
limits])
 AT_SETUP([ofproto-dpif - infinite resubmit])
 OVS_VSWITCHD_START
 AT_CHECK([ovs-ofctl add-flow br0 actions=resubmit:1,resubmit:2,output:3])
-AT_CHECK([ovs-appctl ofproto/trace br0 'eth_dst=ff:ff:ff:ff:ff:ff'],
+AT_CHECK([ovs-appctl -vpoll_loop:off ofproto/trace br0 
'eth_dst=ff:ff:ff:ff:ff:ff'],
   [0], [stdout])
 AT_CHECK([tail -1 stdout], [0], [Datapath actions: drop
 ])
@@ -5477,7 +5477,7 @@ ADD_OF_PORTS([br0], 1)
  done
  echo "in_port=65, actions=local") > flows
  AT_CHECK([ovs-ofctl add-flows br0 flows])
-AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1'], [0], [stdout])
+AT_CHECK([ovs-appctl -vpoll_loop:off ofproto/trace br0 'in_port=1'], [0], 
[stdout])
 AT_CHECK([grep -c 'over 4096 resubmit actions' ovs-vswitchd.log], [0], [1
 ])
 OVS_VSWITCHD_STOP(["/over.*resubmit actions/d"])
@@ -5492,7 +5492,7 @@ ADD_OF_PORTS([br0], 1)
  done
  echo "in_port=13, actions=local,local,local,local,local,local,local,local") > 
flows
 AT_CHECK([ovs-ofctl add-flows br0 flows])
-AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1'], [0], [stdout])
+AT_CHECK([ovs-appctl -vpoll_loop:off ofproto/trace br0 'in_port=1'], [0], 
[stdout])
 AT_CHECK([grep -c -e '- Uses action(s) not supported by datapath' stdout],
   [0], [1
 ])
@@ -5511,7 +5511,7 @@ ADD_OF_PORTS([br0], 1)
  push="push:NXM_NX_REG0[[]]"
  echo "in_port=13, actions=$push,$push,$push,$push,$push,$push,$push,$push") > 
flows
  AT_CHECK([ovs-ofctl add-flows br0 flows])
-AT_CHECK([ovs-appctl ofproto/trace br0 'in_port=1'], [0], [stdout])
+AT_CHECK([ovs-appctl -vpoll_loop:off ofproto/trace br0 'in_port=1'], [0], 
[stdout])
 AT_CHECK([grep -c 'resubmits yielded over 64 kB of stack' ovs-vswitchd.log], 
[0], [1
 ])
 OVS_VSWITCHD_STOP(["/resubmits yielded over 64 kB of stack/d"])
-- 
1.7.10.4

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 1/2] lib: Resizable, Scalable, Concurrent Hash Table

2014-07-29 Thread Thomas Graf
On 07/29/14 at 08:30am, Josh Triplett wrote:
> On Tue, Jul 29, 2014 at 01:41:32PM +0200, Thomas Graf wrote:
> > Generic implementation of a resizable, scalable, concurrent hash table
> > based on [0]. The implementation supports both, fixed size keys specified
> > via an offset and length, or arbitrary keys via own hash and compare
> > functions.
> > 
> > Lookups are lockless and protected as RCU read side critical sections.
> > Automatic growing/shrinking based on user configurable watermarks is
> > available while allowing concurrent lookups to take place.
> > 
> > Objects to be hashed must include a struct rhash_head. The reason for not
> > using the existing struct hlist_head is that the expansion and shrinking
> > will have two buckets point to a single entry which would lead in obscure
> > reverse chaining behaviour.
> > 
> > Code includes a boot selftest if CONFIG_TEST_RHASHTABLE is defined.
> > 
> > [0] https://www.usenix.org/legacy/event/atc11/tech/final_files/Triplett.pdf
> > 
> > Signed-off-by: Thomas Graf 
> 
> Thanks for working on this!
> 
> Currently reading the algorithm implementation in more detail.  A couple
> of minor issues below.

Thanks for the initial review!

> 
> > --- /dev/null
> > +++ b/include/linux/rhashtable.h
> [...]
> > +struct rhash_head {
> > +   struct rhash_head   *next;
> > +};
> 
> Arguably this could become a new singly-linked list type in list.h;
> we don't currently have one.

No objections and I'm very open to this if there is use for it
aside from this hash table implementation.

> > +/**
> > + * rht_for_each_entry_safe - safely iterate over hash chain of given type
> > + * @pos:   type * to use as a loop cursor.
> > + * @n: type * to use for temporary next object storage
> > + * @head:  head of the hash chain (struct rhash_head *)
> > + * @ht:pointer to your struct rhashtable
> > + * @member:name of the rhash_head within the hashable struct.
> > + *
> > + * This hash chain list-traversal primitive allows for the looped code to
> > + * remove the loop cursor from the list.
> > + */
> > +#define rht_for_each_entry_safe(pos, n, head, ht, member)  \
> > +   for (pos = rht_entry_safe(rht_dereference(head, ht), \
> > +  typeof(*(pos)), member), \
> > +n = rht_entry_safe(rht_dereference((pos)->member.next, ht), \
> > +typeof(*(pos)), member); \
> > +pos; \
> > +pos = n, \
> > +n = rht_entry_safe(rht_dereference((pos)->member.next, ht), \
> > +typeof(*(pos)), member))
> 
> Given that removal requires special care, is this something actually
> needed in the public interface, or only internally?  You don't
> necessarily need to define a full set of list primitives.  (Unless of
> course you make this a new list type in list.h.)

The main purpose of the safe variant is to allow API users to easily
destruct all table entries in the end and do something like this:

tbl = rht_dereference(ht->tbl, ht);
for (i = 0; i < tbl->size; i++)
rht_for_each_entry_safe(obj, next, tbl->buckets[i], ht, node)
kfree(obj);

> > --- /dev/null
> > +++ b/lib/rhashtable.c
> [...]
> > +#define ASSERT_RHT_MUTEX(HT) 
> > BUG_ON(unlikely(!lockdep_rht_mutex_is_held(HT)))
> 
> BUG_ON and WARN_ON already include unlikely(); you don't need to add it
> yourself.

Thanks, will fix.

> > +/**
> > + * rht_obj - cast hash head to outer object
> > + * @ht:hash table
> > + * @he:hashed node
> > + */
> > +void *rht_obj(const struct rhashtable *ht, const struct rhash_head *he)
> > +{
> > +   return (void *) he - ht->p.head_offset;
> > +}
> > +EXPORT_SYMBOL_GPL(rht_obj);
> 
> Should this be a static inline?  And, will the runtime indirection
> involved in head_offset add unnecessary overhead for tables of a known
> type?  (I'd expect a head_offset of 0 in common cases.)

I left the optimization to the compiler here as I would expect it to
inline this automatically. As for the overhead of the head_offset, I
think it is minimal and justified by the added flexibility. The
overhead is basically the same as for lists.

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] datapath: Remove redundant key ref from upcall_info.

2014-07-29 Thread Pravin B Shelar
struct dp_upcall_info has pointer to pkt_key which is already
available in OVS_CB.  This also simplifies upcall handling
for gso packet.

Signed-off-by: Pravin B Shelar 
---
 datapath/actions.c  |  3 ---
 datapath/datapath.c | 34 +-
 datapath/datapath.h |  4 +---
 3 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/datapath/actions.c b/datapath/actions.c
index 39a21f4..792c3a9 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -503,10 +503,7 @@ static int output_userspace(struct datapath *dp, struct 
sk_buff *skb,
const struct nlattr *a;
int rem;
 
-   BUG_ON(!OVS_CB(skb)->pkt_key);
-
upcall.cmd = OVS_PACKET_CMD_ACTION;
-   upcall.key = OVS_CB(skb)->pkt_key;
upcall.userdata = NULL;
upcall.portid = 0;
 
diff --git a/datapath/datapath.c b/datapath/datapath.c
index 94539eb..1185f60 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -262,6 +262,7 @@ void ovs_dp_process_packet_with_key(struct sk_buff *skb,
u32 n_mask_hit;
 
stats = this_cpu_ptr(dp->stats_percpu);
+   OVS_CB(skb)->pkt_key = pkt_key;
 
/* Look up flow. */
flow = ovs_flow_tbl_lookup_stats(&dp->table, pkt_key, skb_get_hash(skb),
@@ -270,7 +271,6 @@ void ovs_dp_process_packet_with_key(struct sk_buff *skb,
struct dp_upcall_info upcall;
 
upcall.cmd = OVS_PACKET_CMD_MISS;
-   upcall.key = pkt_key;
upcall.userdata = NULL;
upcall.portid = ovs_vport_find_upcall_portid(p, skb);
ovs_dp_upcall(dp, skb, &upcall);
@@ -279,7 +279,6 @@ void ovs_dp_process_packet_with_key(struct sk_buff *skb,
goto out;
}
 
-   OVS_CB(skb)->pkt_key = pkt_key;
OVS_CB(skb)->flow = flow;
 
ovs_flow_stats_update(OVS_CB(skb)->flow, pkt_key->tp.flags, skb);
@@ -316,6 +315,8 @@ int ovs_dp_upcall(struct datapath *dp, struct sk_buff *skb,
struct dp_stats_percpu *stats;
int err;
 
+   BUG_ON(!OVS_CB(skb)->pkt_key);
+
if (upcall_info->portid == 0) {
err = -ENOTCONN;
goto err;
@@ -344,7 +345,6 @@ static int queue_gso_packets(struct datapath *dp, struct 
sk_buff *skb,
 const struct dp_upcall_info *upcall_info)
 {
unsigned short gso_type = skb_shinfo(skb)->gso_type;
-   struct dp_upcall_info later_info;
struct sw_flow_key later_key;
struct sk_buff *segs, *nskb;
int err;
@@ -353,25 +353,25 @@ static int queue_gso_packets(struct datapath *dp, struct 
sk_buff *skb,
if (IS_ERR(segs))
return PTR_ERR(segs);
 
+   if (gso_type & SKB_GSO_UDP) {
+   /* The initial flow key extracted by ovs_flow_extract()
+* in this case is for a first fragment, so we need to
+* properly mark later fragments.
+*/
+   later_key = *OVS_CB(skb)->pkt_key;
+   later_key.ip.frag = OVS_FRAG_TYPE_LATER;
+   }
+
/* Queue all of the segments. */
skb = segs;
do {
+   if (gso_type & SKB_GSO_UDP && skb != segs)
+   OVS_CB(skb)->pkt_key = &later_key;
+
err = queue_userspace_packet(dp, skb, upcall_info);
if (err)
break;
 
-   if (skb == segs && gso_type & SKB_GSO_UDP) {
-   /* The initial flow key extracted by ovs_flow_extract()
-* in this case is for a first fragment, so we need to
-* properly mark later fragments.
-*/
-   later_key = *upcall_info->key;
-   later_key.ip.frag = OVS_FRAG_TYPE_LATER;
-
-   later_info = *upcall_info;
-   later_info.key = &later_key;
-   upcall_info = &later_info;
-   }
} while ((skb = skb->next));
 
/* Free all of the segments. */
@@ -437,6 +437,7 @@ static int queue_userspace_packet(struct datapath *dp, 
struct sk_buff *skb,
struct ovs_header *upcall;
struct sk_buff *nskb = NULL;
struct sk_buff *user_skb; /* to be queued to userspace */
+   struct sw_flow_key *pkt_key = OVS_CB(skb)->pkt_key;
struct nlattr *nla;
struct genl_info info = {
 #if LINUX_VERSION_CODE >= KERNEL_VERSION(3,14,0)
@@ -497,8 +498,7 @@ static int queue_userspace_packet(struct datapath *dp, 
struct sk_buff *skb,
upcall->dp_ifindex = dp_ifindex;
 
nla = nla_nest_start(user_skb, OVS_PACKET_ATTR_KEY);
-   err = ovs_nla_put_flow(dp, upcall_info->key,
-  upcall_info->key, user_skb);
+   err = ovs_nla_put_flow(dp, pkt_key, pkt_key, user_skb);
BUG_ON(err);
nla_nest_end(user_skb, nla);
 
diff --git a/datapath/datapath.h b/datapath/datapath.h
index d6dee50..f2e8d6b 100644
--- a/dat

[ovs-dev] [PATCH 1/2] connmgr: Fix a typo.

2014-07-29 Thread Alex Wang
Signed-off-by: Alex Wang 
---
 ofproto/connmgr.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index b7d5d3b..c6432bf 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -1496,7 +1496,7 @@ ofconn_wants_packet_in_on_miss(struct ofconn *ofconn,
 /* The default "table-miss" behaviour for OpenFlow1.3+ is to drop the
  * packet rather than to send the packet to the controller.
  *
- * This function returns false to indicate that a packet_in message
+ * This function returns true to indicate that a packet_in message
  * for a "table-miss" should be sent to at least one controller.
  * That is there is at least one controller with controller_id 0
  * which connected using an OpenFlow version earlier than OpenFlow1.3.
-- 
1.7.9.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.

2014-07-29 Thread Alex Wang
connmgr_wants_packet_in_on_miss() is called by multiple threads
and thusly should be protected by the mutex.

Signed-off-by: Alex Wang 
---
 ofproto/connmgr.c |6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index c6432bf..89af6b6 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -1506,19 +1506,23 @@ ofconn_wants_packet_in_on_miss(struct ofconn *ofconn,
  * This logic assumes that "table-miss" packet_in messages
  * are always sent to controller_id 0. */
 bool
-connmgr_wants_packet_in_on_miss(struct connmgr *mgr)
+connmgr_wants_packet_in_on_miss(struct connmgr *mgr) 
OVS_EXCLUDED(ofproto_mutex)
 {
 struct ofconn *ofconn;
 
+ovs_mutex_lock(&ofproto_mutex);
 LIST_FOR_EACH (ofconn, node, &mgr->all_conns) {
 enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
 
 if (ofconn->controller_id == 0 &&
 (protocol == OFPUTIL_P_NONE ||
  ofputil_protocol_to_ofp_version(protocol) < OFP13_VERSION)) {
+ovs_mutex_unlock(&ofproto_mutex);
 return true;
 }
 }
+ovs_mutex_unlock(&ofproto_mutex);
+
 return false;
 }
 
-- 
1.7.9.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.

2014-07-29 Thread Alex Wang
The connmgr_wants_packet_in_on_miss() should only check 'ofconn's
of type OFCONN_PRIMARY (i.e. controller connections).  This commit
makes it happen.

Signed-off-by: Alex Wang 
---
 ofproto/connmgr.c |   16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
index 89af6b6..49ced1f 100644
--- a/ofproto/connmgr.c
+++ b/ofproto/connmgr.c
@@ -1512,13 +1512,15 @@ connmgr_wants_packet_in_on_miss(struct connmgr *mgr) 
OVS_EXCLUDED(ofproto_mutex)
 
 ovs_mutex_lock(&ofproto_mutex);
 LIST_FOR_EACH (ofconn, node, &mgr->all_conns) {
-enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
-
-if (ofconn->controller_id == 0 &&
-(protocol == OFPUTIL_P_NONE ||
- ofputil_protocol_to_ofp_version(protocol) < OFP13_VERSION)) {
-ovs_mutex_unlock(&ofproto_mutex);
-return true;
+if (ofconn->type == OFCONN_PRIMARY) {
+enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
+
+if (ofconn->controller_id == 0 &&
+(protocol == OFPUTIL_P_NONE ||
+ ofputil_protocol_to_ofp_version(protocol) < OFP13_VERSION)) {
+ovs_mutex_unlock(&ofproto_mutex);
+return true;
+}
 }
 }
 ovs_mutex_unlock(&ofproto_mutex);
-- 
1.7.9.5

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.

2014-07-29 Thread Alex Wang
Hey Simon,

I think we should not check 'ofconn's for OFCONN_SERVICE... This
change broke several tests:

*Please send `tests/testsuite.log' and all information you think might
help:*

*   To: >*

*   Subject: [openvswitch 2.3.0] testsuite: 737 739 747 817 failed*

Wherein we use the ofctl monitor to simulate controller and capture the
packet_ins.

So, could you help me clarify if we need consider ofctl conn as a

controller connection?


Thanks,

Alex Wang,


On Tue, Jul 29, 2014 at 11:24 AM, Alex Wang  wrote:

> The connmgr_wants_packet_in_on_miss() should only check 'ofconn's
> of type OFCONN_PRIMARY (i.e. controller connections).  This commit
> makes it happen.
>
> Signed-off-by: Alex Wang 
> ---
>  ofproto/connmgr.c |   16 +---
>  1 file changed, 9 insertions(+), 7 deletions(-)
>
> diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
> index 89af6b6..49ced1f 100644
> --- a/ofproto/connmgr.c
> +++ b/ofproto/connmgr.c
> @@ -1512,13 +1512,15 @@ connmgr_wants_packet_in_on_miss(struct connmgr
> *mgr) OVS_EXCLUDED(ofproto_mutex)
>
>  ovs_mutex_lock(&ofproto_mutex);
>  LIST_FOR_EACH (ofconn, node, &mgr->all_conns) {
> -enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
> -
> -if (ofconn->controller_id == 0 &&
> -(protocol == OFPUTIL_P_NONE ||
> - ofputil_protocol_to_ofp_version(protocol) < OFP13_VERSION)) {
> -ovs_mutex_unlock(&ofproto_mutex);
> -return true;
> +if (ofconn->type == OFCONN_PRIMARY) {
> +enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
> +
> +if (ofconn->controller_id == 0 &&
> +(protocol == OFPUTIL_P_NONE ||
> + ofputil_protocol_to_ofp_version(protocol) <
> OFP13_VERSION)) {
> +ovs_mutex_unlock(&ofproto_mutex);
> +return true;
> +}
>  }
>  }
>  ovs_mutex_unlock(&ofproto_mutex);
> --
> 1.7.9.5
>
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 1/2] connmgr: Fix a typo.

2014-07-29 Thread Andy Zhou
Acked-by: Andy Zhou 

On Tue, Jul 29, 2014 at 11:20 AM, Alex Wang  wrote:
> Signed-off-by: Alex Wang 
> ---
>  ofproto/connmgr.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
> index b7d5d3b..c6432bf 100644
> --- a/ofproto/connmgr.c
> +++ b/ofproto/connmgr.c
> @@ -1496,7 +1496,7 @@ ofconn_wants_packet_in_on_miss(struct ofconn *ofconn,
>  /* The default "table-miss" behaviour for OpenFlow1.3+ is to drop the
>   * packet rather than to send the packet to the controller.
>   *
> - * This function returns false to indicate that a packet_in message
> + * This function returns true to indicate that a packet_in message
>   * for a "table-miss" should be sent to at least one controller.
>   * That is there is at least one controller with controller_id 0
>   * which connected using an OpenFlow version earlier than OpenFlow1.3.
> --
> 1.7.9.5
>
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.

2014-07-29 Thread Andy Zhou
Should we also consider lock annotate the ofproto->connmgr variable?

Also, it seems to me the lock can be per ofproto, instead of a global
lock, but I am not sure  this use case will benefit from a finer grain
lock.


On Tue, Jul 29, 2014 at 11:20 AM, Alex Wang  wrote:
> connmgr_wants_packet_in_on_miss() is called by multiple threads
> and thusly should be protected by the mutex.
>
> Signed-off-by: Alex Wang 
> ---
>  ofproto/connmgr.c |6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
> index c6432bf..89af6b6 100644
> --- a/ofproto/connmgr.c
> +++ b/ofproto/connmgr.c
> @@ -1506,19 +1506,23 @@ ofconn_wants_packet_in_on_miss(struct ofconn *ofconn,
>   * This logic assumes that "table-miss" packet_in messages
>   * are always sent to controller_id 0. */
>  bool
> -connmgr_wants_packet_in_on_miss(struct connmgr *mgr)
> +connmgr_wants_packet_in_on_miss(struct connmgr *mgr) 
> OVS_EXCLUDED(ofproto_mutex)
>  {
>  struct ofconn *ofconn;
>
> +ovs_mutex_lock(&ofproto_mutex);
>  LIST_FOR_EACH (ofconn, node, &mgr->all_conns) {
>  enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
>
>  if (ofconn->controller_id == 0 &&
>  (protocol == OFPUTIL_P_NONE ||
>   ofputil_protocol_to_ofp_version(protocol) < OFP13_VERSION)) {
> +ovs_mutex_unlock(&ofproto_mutex);
>  return true;
>  }
>  }
> +ovs_mutex_unlock(&ofproto_mutex);
> +
>  return false;
>  }
>
> --
> 1.7.9.5
>
> ___
> dev mailing list
> dev@openvswitch.org
> http://openvswitch.org/mailman/listinfo/dev
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.

2014-07-29 Thread Alex Wang
Thx for the suggestion, I think it will make the connmgr module safer
if we annotate the ofproto->connmgr.  With a quick look, I think it will
take more checking and efforts, (for functions we know that are only called
by the main thread, there is no lock protection)

For branch-2.3, I want to use this fix.  Do you think it is okay?

Thanks,
Alex Wang,


On Tue, Jul 29, 2014 at 1:23 PM, Andy Zhou  wrote:

> Should we also consider lock annotate the ofproto->connmgr variable?
>
> Also, it seems to me the lock can be per ofproto, instead of a global
> lock, but I am not sure  this use case will benefit from a finer grain
> lock.
>
>
> On Tue, Jul 29, 2014 at 11:20 AM, Alex Wang  wrote:
> > connmgr_wants_packet_in_on_miss() is called by multiple threads
> > and thusly should be protected by the mutex.
> >
> > Signed-off-by: Alex Wang 
> > ---
> >  ofproto/connmgr.c |6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
> > index c6432bf..89af6b6 100644
> > --- a/ofproto/connmgr.c
> > +++ b/ofproto/connmgr.c
> > @@ -1506,19 +1506,23 @@ ofconn_wants_packet_in_on_miss(struct ofconn
> *ofconn,
> >   * This logic assumes that "table-miss" packet_in messages
> >   * are always sent to controller_id 0. */
> >  bool
> > -connmgr_wants_packet_in_on_miss(struct connmgr *mgr)
> > +connmgr_wants_packet_in_on_miss(struct connmgr *mgr)
> OVS_EXCLUDED(ofproto_mutex)
> >  {
> >  struct ofconn *ofconn;
> >
> > +ovs_mutex_lock(&ofproto_mutex);
> >  LIST_FOR_EACH (ofconn, node, &mgr->all_conns) {
> >  enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
> >
> >  if (ofconn->controller_id == 0 &&
> >  (protocol == OFPUTIL_P_NONE ||
> >   ofputil_protocol_to_ofp_version(protocol) <
> OFP13_VERSION)) {
> > +ovs_mutex_unlock(&ofproto_mutex);
> >  return true;
> >  }
> >  }
> > +ovs_mutex_unlock(&ofproto_mutex);
> > +
> >  return false;
> >  }
> >
> > --
> > 1.7.9.5
> >
> > ___
> > dev mailing list
> > dev@openvswitch.org
> > http://openvswitch.org/mailman/listinfo/dev
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.

2014-07-29 Thread Andy Zhou
On Tue, Jul 29, 2014 at 2:09 PM, Alex Wang  wrote:
> Thx for the suggestion, I think it will make the connmgr module safer
> if we annotate the ofproto->connmgr.  With a quick look, I think it will
> take more checking and efforts, (for functions we know that are only called
> by the main thread, there is no lock protection)
Is this safe if ofproto->connmgr can be accessed by other threads?
>
> For branch-2.3, I want to use this fix.  Do you think it is okay?
I suppose so, as long as the fix is safe. Your call.
>
> Thanks,
> Alex Wang,
>
>
> On Tue, Jul 29, 2014 at 1:23 PM, Andy Zhou  wrote:
>>
>> Should we also consider lock annotate the ofproto->connmgr variable?
>>
>> Also, it seems to me the lock can be per ofproto, instead of a global
>> lock, but I am not sure  this use case will benefit from a finer grain
>> lock.
>>
>>
>> On Tue, Jul 29, 2014 at 11:20 AM, Alex Wang  wrote:
>> > connmgr_wants_packet_in_on_miss() is called by multiple threads
>> > and thusly should be protected by the mutex.
>> >
>> > Signed-off-by: Alex Wang 
>> > ---
>> >  ofproto/connmgr.c |6 +-
>> >  1 file changed, 5 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/ofproto/connmgr.c b/ofproto/connmgr.c
>> > index c6432bf..89af6b6 100644
>> > --- a/ofproto/connmgr.c
>> > +++ b/ofproto/connmgr.c
>> > @@ -1506,19 +1506,23 @@ ofconn_wants_packet_in_on_miss(struct ofconn
>> > *ofconn,
>> >   * This logic assumes that "table-miss" packet_in messages
>> >   * are always sent to controller_id 0. */
>> >  bool
>> > -connmgr_wants_packet_in_on_miss(struct connmgr *mgr)
>> > +connmgr_wants_packet_in_on_miss(struct connmgr *mgr)
>> > OVS_EXCLUDED(ofproto_mutex)
>> >  {
>> >  struct ofconn *ofconn;
>> >
>> > +ovs_mutex_lock(&ofproto_mutex);
>> >  LIST_FOR_EACH (ofconn, node, &mgr->all_conns) {
>> >  enum ofputil_protocol protocol = ofconn_get_protocol(ofconn);
>> >
>> >  if (ofconn->controller_id == 0 &&
>> >  (protocol == OFPUTIL_P_NONE ||
>> >   ofputil_protocol_to_ofp_version(protocol) <
>> > OFP13_VERSION)) {
>> > +ovs_mutex_unlock(&ofproto_mutex);
>> >  return true;
>> >  }
>> >  }
>> > +ovs_mutex_unlock(&ofproto_mutex);
>> > +
>> >  return false;
>> >  }
>> >
>> > --
>> > 1.7.9.5
>> >
>> > ___
>> > dev mailing list
>> > dev@openvswitch.org
>> > http://openvswitch.org/mailman/listinfo/dev
>
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.

2014-07-29 Thread Alex Wang
On Tue, Jul 29, 2014 at 2:16 PM, Andy Zhou  wrote:

> On Tue, Jul 29, 2014 at 2:09 PM, Alex Wang  wrote:
> > Thx for the suggestion, I think it will make the connmgr module safer
> > if we annotate the ofproto->connmgr.  With a quick look, I think it will
> > take more checking and efforts, (for functions we know that are only
> called
> > by the main thread, there is no lock protection)
> Is this safe if ofproto->connmgr can be accessed by other threads?
>


I think it is safe.  since main thread creates/deletes the connmgr,




> >
> > For branch-2.3, I want to use this fix.  Do you think it is okay?
> I suppose so, as long as the fix is safe. Your call.
>


Thx, will backport it soon,
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.

2014-07-29 Thread Alex Wang
Thx, pushed both patches to master and branch-2.3.

I'll try improving the thread-safety of connmgr module later,


On Tue, Jul 29, 2014 at 2:32 PM, Alex Wang  wrote:

>
>
>
> On Tue, Jul 29, 2014 at 2:16 PM, Andy Zhou  wrote:
>
>> On Tue, Jul 29, 2014 at 2:09 PM, Alex Wang  wrote:
>> > Thx for the suggestion, I think it will make the connmgr module safer
>> > if we annotate the ofproto->connmgr.  With a quick look, I think it will
>> > take more checking and efforts, (for functions we know that are only
>> called
>> > by the main thread, there is no lock protection)
>> Is this safe if ofproto->connmgr can be accessed by other threads?
>>
>
>
> I think it is safe.  since main thread creates/deletes the connmgr,
>
>
>
>
>> >
>> > For branch-2.3, I want to use this fix.  Do you think it is okay?
>> I suppose so, as long as the fix is safe. Your call.
>>
>
>
> Thx, will backport it soon,
>
>
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.

2014-07-29 Thread Ben Pfaff
On Tue, Jul 29, 2014 at 11:30:12AM -0700, Alex Wang wrote:
> I think we should not check 'ofconn's for OFCONN_SERVICE... This
> change broke several tests:
> 
> *Please send `tests/testsuite.log' and all information you think might
> help:*
> 
> *   To: >*
> 
> *   Subject: [openvswitch 2.3.0] testsuite: 737 739 747 817 failed*

I'm a little confused now.

I'm not sure why you addressed this to Simon.  I don't think he has any
recent commits.

>From the [openvswitch 2.3.0] it looks like you ran the tests against
branch-2.3.  I don't see the same failures locally when I run the tests
on branch-2.3 (commit 9067b54) or master (commit 57fa816).  Are these
intermittent failures or do you see them every time?  Do they appear
only after the two recent additional commits that you pushed?
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.

2014-07-29 Thread Alex Wang
Hey Ben,

Sorry for the confusion,

On Tue, Jul 29, 2014 at 2:52 PM, Ben Pfaff  wrote:

> On Tue, Jul 29, 2014 at 11:30:12AM -0700, Alex Wang wrote:
> > I think we should not check 'ofconn's for OFCONN_SERVICE... This
> > change broke several tests:
> >
> > *Please send `tests/testsuite.log' and all information you think might
> > help:*
> >
> > *   To: >*
> >
> > *   Subject: [openvswitch 2.3.0] testsuite: 737 739 747 817 failed*
>
> I'm a little confused now.
>
> I'm not sure why you addressed this to Simon.  I don't think he has any
> recent commits.
>


1. The test failure is caused by this RFC patch,

2. connmgr_wants_packet_in_on_miss() was added by Simon.  I think this
function should only check for controller connections.  And want to
confirm
it with Simon.

3. unit test 737 corresponds to this: (which is also added by Simon)

527ae97e (Simon Horman  2014-03-13 15:52:54 +0900 1565)
527ae97e (Simon Horman  2014-03-13 15:52:54 +0900 1566)
AT_SETUP([ofproto-dpif - table-miss flow (OpenFlow 1.0)])
527ae97e (Simon Horman  2014-03-13 15:52:54 +0900 1567)
OVS_VSWITCHD_START([dnl
527ae97e (Simon Horman  2014-03-13 15:52:54 +0900 1568)add-port br0
p1 -- set Interface p1 type=dummy
527ae97e (Simon Horman  2014-03-13 15:52:54 +0900 1569) ])


I think that's why I sent email asking him.



>
> From the [openvswitch 2.3.0] it looks like you ran the tests against
> branch-2.3.  I don't see the same failures locally when I run the tests
> on branch-2.3 (commit 9067b54) or master (commit 57fa816).  Are these
> intermittent failures or do you see them every time?  Do they appear
> only after the two recent additional commits that you pushed?
>


4. I saw them every time.  Those failures are caused only by this RFC
patch (unrelated to the two recent commits I just pused)

5. The cause of failure is that connmgr_wants_packet_in_on_miss() now
 checks only the 'ofconn's of type OFCONN_PRIMARY.  so, the 'ofconn'
 for 'ofctl monitor' is ignored, and we always drop the pkt_ins.

Thanks,
Alex Wang,
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] datapath: refactor ovs flow extract API.

2014-07-29 Thread Pravin B Shelar
OVS flow extract is called on packet receive or packet
execute code path.  Following patch defines separate API
for these two cases to simplify code.

Signed-off-by: Pravin B Shelar 
---
 datapath/actions.c  |  3 +-
 datapath/datapath.c |  8 ++---
 datapath/flow.c | 82 -
 datapath/flow.h |  4 ++-
 datapath/flow_netlink.c | 31 ---
 datapath/flow_netlink.h |  4 +--
 6 files changed, 75 insertions(+), 57 deletions(-)

diff --git a/datapath/actions.c b/datapath/actions.c
index 792c3a9..65b4892 100644
--- a/datapath/actions.c
+++ b/datapath/actions.c
@@ -649,11 +649,10 @@ static int execute_recirc(struct datapath *dp, struct 
sk_buff *skb,
 const struct nlattr *a)
 {
struct sw_flow_key recirc_key;
-   const struct vport *p = OVS_CB(skb)->input_vport;
uint32_t hash = OVS_CB(skb)->pkt_key->ovs_flow_hash;
int err;
 
-   err = ovs_flow_extract(skb, p->port_no, &recirc_key);
+   err = ovs_flow_extract(skb, &recirc_key);
if (err) {
kfree_skb(skb);
return err;
diff --git a/datapath/datapath.c b/datapath/datapath.c
index 1185f60..7d4b599 100644
--- a/datapath/datapath.c
+++ b/datapath/datapath.c
@@ -300,7 +300,7 @@ void ovs_dp_process_received_packet(struct sk_buff *skb)
struct sw_flow_key key;
 
/* Extract flow from 'skb' into 'key'. */
-   error = ovs_flow_extract(skb, OVS_CB(skb)->input_vport->port_no, &key);
+   error = ovs_flow_extract(skb, &key);
if (unlikely(error)) {
kfree_skb(skb);
return;
@@ -581,13 +581,11 @@ static int ovs_packet_cmd_execute(struct sk_buff *skb, 
struct genl_info *info)
if (IS_ERR(flow))
goto err_kfree_skb;
 
-   err = ovs_flow_extract(packet, -1, &flow->key);
+   err = ovs_flow_extract_nla_metadata(a[OVS_PACKET_ATTR_KEY], packet,
+   &flow->key);
if (err)
goto err_flow_free;
 
-   err = ovs_nla_get_flow_metadata(flow, a[OVS_PACKET_ATTR_KEY]);
-   if (err)
-   goto err_flow_free;
acts = ovs_nla_alloc_flow_actions(nla_len(a[OVS_PACKET_ATTR_ACTIONS]));
err = PTR_ERR(acts);
if (IS_ERR(acts))
diff --git a/datapath/flow.c b/datapath/flow.c
index e234796..89f4d05 100644
--- a/datapath/flow.c
+++ b/datapath/flow.c
@@ -16,8 +16,6 @@
  * 02110-1301, USA
  */
 
-#include "flow.h"
-#include "datapath.h"
 #include 
 #include 
 #include 
@@ -45,6 +43,10 @@
 #include 
 #include 
 
+#include "datapath.h"
+#include "flow.h"
+#include "flow_netlink.h"
+
 #include "mpls.h"
 #include "vlan.h"
 
@@ -425,10 +427,9 @@ invalid:
 }
 
 /**
- * ovs_flow_extract - extracts a flow key from an Ethernet frame.
+ * flow_extract - extracts a flow key from an Ethernet frame.
  * @skb: sk_buff that contains the frame, with skb->data pointing to the
  * Ethernet header
- * @in_port: port number on which @skb was received.
  * @key: output flow key
  *
  * The caller must ensure that skb->len >= ETH_HLEN.
@@ -447,35 +448,11 @@ invalid:
  *  of a correct length, otherwise the same as skb->network_header.
  *  For other key->eth.type values it is left untouched.
  */
-int ovs_flow_extract(struct sk_buff *skb, u16 in_port, struct sw_flow_key *key)
+static int flow_extract(struct sk_buff *skb, struct sw_flow_key *key)
 {
int error;
struct ethhdr *eth;
 
-   if (OVS_CB(skb)->tun_info) {
-   struct ovs_tunnel_info *tun_info = OVS_CB(skb)->tun_info;
-   memcpy(&key->tun_key, &tun_info->tunnel,
-   sizeof(key->tun_key));
-   if (tun_info->options) {
-   BUILD_BUG_ON((1 << (sizeof(tun_info->options_len) * 8)) 
- 1
-   > sizeof(key->tun_opts));
-   memcpy(GENEVE_OPTS(key, tun_info->options_len),
-   tun_info->options, tun_info->options_len);
-   key->tun_opts_len = tun_info->options_len;
-   } else {
-   key->tun_opts_len = 0;
-   }
-   } else {
-   key->tun_opts_len = 0;
-   memset(&key->tun_key, 0, sizeof(key->tun_key));
-   }
-
-   key->phy.priority = skb->priority;
-   key->phy.in_port = in_port;
-   key->phy.skb_mark = skb->mark;
-   key->ovs_flow_hash = 0;
-   key->recirc_id = 0;
-
/* Flags are always used as part of stats. */
key->tp.flags = 0;
 
@@ -694,3 +671,50 @@ int ovs_flow_extract(struct sk_buff *skb, u16 in_port, 
struct sw_flow_key *key)
 
return 0;
 }
+
+int ovs_flow_extract(struct sk_buff *skb, struct sw_flow_key *key)
+{
+   /* Extract metadata from packet. */
+
+   if (OVS_CB(skb)->tun_info) {
+   struct ovs_tunnel_info *tun_info = OVS_CB(skb)->tun_info;
+
+   memcpy(&ke

Re: [ovs-dev] [PATCH] openvswitch: Use IS_ERR_OR_NULL

2014-07-29 Thread David Miller
From: Himangi Saraogi 
Date: Sun, 27 Jul 2014 12:37:46 +0530

> This patch introduces the use of the macro IS_ERR_OR_NULL in place of
> tests for NULL and IS_ERR.
> 
> The following Coccinelle semantic patch was used for making the change:
> 
> @@
> expression e;
> @@
> 
> - e == NULL || IS_ERR(e)
> + IS_ERR_OR_NULL(e)
>  || ...
> 
> Signed-off-by: Himangi Saraogi 
> Acked-by: Julia Lawall 

Applied.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] netdev-dpdk: Use different constant for ring size

2014-07-29 Thread Daniele Di Proietto
DPDK rings must have a power-of-two size.

Signed-off-by: Daniele Di Proietto 
---
 lib/netdev-dpdk.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index b45e367..c1b644a 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -125,6 +125,7 @@ static const struct rte_eth_txconf tx_conf = {
 
 enum { MAX_RX_QUEUE_LEN = 192 };
 enum { MAX_TX_QUEUE_LEN = 384 };
+enum { DPDK_RING_SIZE = 256 }; /* Must be a power of 2 */
 enum { DRAIN_TSC = 20ULL };
 
 static int rte_eal_init_ret = ENODEV;
@@ -1236,7 +1237,7 @@ dpdk_ring_create(const char dev_name[], unsigned int 
port_no,
 return -err;
 }
 
-ivshmem->cring_tx = rte_ring_create(ring_name, MAX_RX_QUEUE_LEN, SOCKET0, 
0);
+ivshmem->cring_tx = rte_ring_create(ring_name, DPDK_RING_SIZE, SOCKET0, 0);
 if (ivshmem->cring_tx == NULL) {
 rte_free(ivshmem);
 return ENOMEM;
@@ -1247,7 +1248,7 @@ dpdk_ring_create(const char dev_name[], unsigned int 
port_no,
 return -err;
 }
 
-ivshmem->cring_rx = rte_ring_create(ring_name, MAX_RX_QUEUE_LEN, SOCKET0, 
0);
+ivshmem->cring_rx = rte_ring_create(ring_name, DPDK_RING_SIZE, SOCKET0, 0);
 if (ivshmem->cring_rx == NULL) {
 rte_free(ivshmem);
 return ENOMEM;
-- 
2.0.0

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCHv3] datapath: do not use vport type to determine presence of Geneve attributes

2014-07-29 Thread Ansis Atteka
This patch fixes following kernel crash that could happen, if geneve
vport was not added yet, but revalidator thread attempted to dump flows.
To reproduce:
1. switch tunnel type between geneve and gre in a loop; and
2. run ping.

BUG: unable to handle kernel NULL pointer dereference at 0048
IP: [] ovs_nla_put_flow+0x3d0/0x7c0 [openvswitch]
PGD 3b32b067 PUD 3b2ef067 PMD 0
Oops:  [#2] SMP
...
CPU: 0 PID: 6450 Comm: revalidator2 Tainted: GF DO
3.13.0-24-generic #46-Ubuntu
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference
Platform, BIOS 6.00 07/02/2012
task: 88003b4aafe0 ti: 88003d314000 task.ti: 88003d314000
RIP: 0010:[]  []
ovs_nla_put_flow+0x3d0/0x7c0 [openvswitch]
RSP: 0018:88003d315a10  EFLAGS: 00010246
RAX:  RBX: 88003a9a9960 RCX: 
RDX: 0002 RSI: ffc8 RDI: 88003babcb80
RBP: 88003d315a68 R08:  R09: 0004
R10: 880039c23034 R11: 0008 R12: 88003a861600
R13: 88003a9a9960 R14: 88003babcb80 R15: q88003a861600
FS:  7ff0f5d94700() GS:88003f60() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 0048 CR3: 3b55b000 CR4: 07f0
Stack:
81385093   
8800 88003d315a58 880039c23014 88003a9a97a0
88003babcb80 880039c23018 88003a861600 88003d315ad0
Call Trace:
[] ? __nla_reserve+0x43/0x50
[] ovs_flow_cmd_fill_info+0x93/0x2b0 [openvswitch]
[] ? ovs_flow_tbl_dump_next+0x49/0xc0 [openvswitch]
[] ovs_flow_cmd_dump+0x80/0xd0 [openvswitch]
[] netlink_dump+0x84/0x240
[] __netlink_dump_start+0x1ab/0x220
[] genl_family_rcv_msg+0x337/0x370
[] ? ovs_flow_cmd_fill_info+0x2b0/0x2b0 [openvswitch]
[] ? __kmalloc_node_track_caller+0x58/0x1e0
[] ? genl_family_rcv_msg+0x370/0x370
[] genl_rcv_msg+0x91/0xd0
[] netlink_rcv_skb+0xa9/0xc0
[] genl_rcv+0x28/0x40
[] netlink_unicast+0xd5/0x1b0
[] netlink_sendmsg+0x2ff/0x740
[] sock_sendmsg+0x8b/0xc0
[] ? __sb_end_write+0x31/0x60
[] ? touch_atime+0x10f/0x140
[] ? pipe_read+0x371/0x400
[] SYSC_sendto+0x121/0x1c0
[] ? vtime_account_user+0x54/0x60
[] ? syscall_trace_enter+0x145/0x250
[] SyS_sendto+0xe/0x10
[] tracesys+0xe1/0xe6

Signed-Off-By: Ansis Atteka 
---
 datapath/flow_netlink.c|6 ++
 datapath/linux/compat/include/net/ip_tunnels.h |1 +
 datapath/vport-geneve.c|2 +-
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c
index e1eadbb..41199ff 100644
--- a/datapath/flow_netlink.c
+++ b/datapath/flow_netlink.c
@@ -421,6 +421,7 @@ static int ipv4_tun_from_nlattr(const struct nlattr *attr,
tun_flags |= TUNNEL_OAM;
break;
case OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS:
+   tun_flags |= TUNNEL_OPTIONS_PRESENT;
if (nla_len(a) > sizeof(match->key->tun_opts)) {
OVS_NLERR("Geneve option length exceeds "
  "maximum size (len %d, max %zu).\n",
@@ -1051,10 +1052,7 @@ int ovs_nla_put_flow(struct datapath *dp, const struct 
sw_flow_key *swkey,
const struct geneve_opt *opts = NULL;
 
if (!is_mask) {
-   struct vport *in_port;
-
-   in_port = ovs_vport_ovsl_rcu(dp, swkey->phy.in_port);
-   if (in_port->ops->type == OVS_VPORT_TYPE_GENEVE)
+   if (swkey->tun_key.tun_flags & TUNNEL_OPTIONS_PRESENT)
opts = GENEVE_OPTS(output, swkey->tun_opts_len);
} else {
if (output->tun_opts_len)
diff --git a/datapath/linux/compat/include/net/ip_tunnels.h 
b/datapath/linux/compat/include/net/ip_tunnels.h
index c7a14ef..2a6470a 100644
--- a/datapath/linux/compat/include/net/ip_tunnels.h
+++ b/datapath/linux/compat/include/net/ip_tunnels.h
@@ -48,5 +48,6 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, 
__be16 inner_proto);
 /* Not yet upstream */
 #define TUNNEL_OAM __cpu_to_be16(0x0200)
 #define TUNNEL_CRIT_OPT__cpu_to_be16(0x0400)
+#define TUNNEL_OPTIONS_PRESENT __cpu_to_be16(0x0800)
 
 #endif /* __NET_IP_TUNNELS_H */
diff --git a/datapath/vport-geneve.c b/datapath/vport-geneve.c
index 99841d4..65a996f 100644
--- a/datapath/vport-geneve.c
+++ b/datapath/vport-geneve.c
@@ -189,7 +189,7 @@ static int geneve_rcv(struct sock *sk, struct sk_buff *skb)
 
geneveh = geneve_hdr(skb);
 
-   flags = TUNNEL_KEY |
+   flags = TUNNEL_KEY | TUNNEL_OPTIONS_PRESENT |
(udp_hdr(skb)->check != 0 ? TUNNEL_CSUM : 0) |
(geneveh->oam ? TUNNEL_OAM : 0) |
(geneveh->critical ? TUNNEL_CRIT_OPT : 0);
-- 
1.7.9.5

__

Re: [ovs-dev] [PATCH] datapath: do not add Geneve attributes if vport does not exist

2014-07-29 Thread Ansis Atteka
On Mon, Jul 28, 2014 at 7:54 PM, Jesse Gross  wrote:
> On Tue, Jul 22, 2014 at 9:34 AM, Jesse Gross  wrote:
>> On Mon, Jul 21, 2014 at 5:11 PM, Ansis Atteka  wrote:
>>> On Thu, Jul 17, 2014 at 4:08 PM, Jesse Gross  wrote:
 One thing that I worry about is that this has the possibility to
 change how the flow is reported - before the flow deletion it has
 Geneve options but immediately after the flow still exists but without
 options. OVS will likely deal with this but it doesn't seem like a
 great thing.
>>> I talked with Pravin on how to solve this "flow changing while vport
>>> gets added" issue and he seemed to prefer the idea where we simply
>>> look into tun_opts_len to figure out whether to append Geneve
>>> attributes.
>>>
>>> Basically, the new patch makes assumption that if tun_opts_len>0 then
>>> that is Geneve port that could potentially have Geneve options. Also,
>>> I remember you mentioning that it might be good to distinguish between
>>> the case when there are no tunnel options at all and the case when
>>> Geneve attributes contain an empty set. Patch v2 does not address
>>> that. How important is that in your opinion?
>>
>> The biggest concern that I have is if there is a flow that matches on
>> Geneve packets without any options. If we it changed to look at
>> tun_opts_len only, when the upcall goes to userspace there would be no
>> OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS. Userspace would generally like to
>> just echo the key back, so that means that we would have a mask
>> without a key since it needs to specify a mask to match.
>>
>> This is currently allowed but not really ideal - we have this already
>> for some existing netlink attributes like the overall
>> OVS_KEY_ATTR_TUNNEL but it's been a source of bugs and special casing.
>> The other choice is that we force userspace to insert the key even if
>> the kernel didn't provide it for Geneve flows but that's somewhat
>> complicated.
>>
>> One other possible solution is to tuck a bit into the flow to
>> differentiate between the cases of an empty option set and no options
>> but I was somewhat hoping to avoid that.
>
> Pravin, Andy, and I just talked about this and I think that we reached
> a conclusion.
>
> It seems like the best thing to do is to use a flag in the key that
> indicates whether options were originally present in either the flow
> (because userspace passed it down) or header (because it came from a
> Geneve port). Essentially, this would act like the current port number
> check but with less crashing.
>
> So what we would do is:
>  * Add a TUNNEL_OPTIONS_PRESENT flag to the existing list of
> TUNNEL_CSUM, TUNNEL_KEY, etc.
>  * Geneve ports would always set this when initializing flags in
> geneve_rcv(), other tunnel ports never would.
>  * When receiving a flow setup from userspace, we would set this flag
> if there is a GENEVE_OPTIONS key attribute present.
>  * When sending a flow from the kernel to userspace, we would change
> the current check based on the port number to one based on this flag.
>
> Does that sound reasonable to you?
Yes, that sounds reasonable. I sent it out for review here -
http://openvswitch.org/pipermail/dev/2014-July/043220.html
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH net-next] net: Remove unlikely() for WARN_ON() conditions

2014-07-29 Thread Thomas Graf
No need for the unlikely(), WARN_ON() and BUG_ON() internally use
unlikely() on the condition.

Signed-off-by: Thomas Graf 
---
 net/core/dev.c | 2 +-
 net/openvswitch/datapath.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e1b7cfa..b370230 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2326,7 +2326,7 @@ __be16 skb_network_protocol(struct sk_buff *skb, int 
*depth)
 */
if (type == htons(ETH_P_8021Q) || type == htons(ETH_P_8021AD)) {
if (vlan_depth) {
-   if (unlikely(WARN_ON(vlan_depth < VLAN_HLEN)))
+   if (WARN_ON(vlan_depth < VLAN_HLEN))
return 0;
vlan_depth -= VLAN_HLEN;
} else {
diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
index 7ede507..701b573 100644
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -144,7 +144,7 @@ int lockdep_ovsl_is_held(void);
 #define lockdep_ovsl_is_held() 1
 #endif
 
-#define ASSERT_OVSL()  WARN_ON(unlikely(!lockdep_ovsl_is_held()))
+#define ASSERT_OVSL()  WARN_ON(!lockdep_ovsl_is_held())
 #define ovsl_dereference(p)\
rcu_dereference_protected(p, lockdep_ovsl_is_held())
 #define rcu_dereference_ovsl(p)\
-- 
1.9.3

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.

2014-07-29 Thread Simon Horman
On Tue, Jul 29, 2014 at 03:01:03PM -0700, Alex Wang wrote:
> Hey Ben,
> 
> Sorry for the confusion,
> 
> On Tue, Jul 29, 2014 at 2:52 PM, Ben Pfaff  wrote:
> 
> > On Tue, Jul 29, 2014 at 11:30:12AM -0700, Alex Wang wrote:
> > > I think we should not check 'ofconn's for OFCONN_SERVICE... This
> > > change broke several tests:
> > >
> > > *Please send `tests/testsuite.log' and all information you think might
> > > help:*
> > >
> > > *   To: >*
> > >
> > > *   Subject: [openvswitch 2.3.0] testsuite: 737 739 747 817 failed*
> >
> > I'm a little confused now.
> >
> > I'm not sure why you addressed this to Simon.  I don't think he has any
> > recent commits.
> >
> 
> 
> 1. The test failure is caused by this RFC patch,
> 
> 2. connmgr_wants_packet_in_on_miss() was added by Simon.  I think this
> function should only check for controller connections.  And want to
> confirm
> it with Simon.
> 
> 3. unit test 737 corresponds to this: (which is also added by Simon)
> 
> 527ae97e (Simon Horman  2014-03-13 15:52:54 +0900 1565)
> 527ae97e (Simon Horman  2014-03-13 15:52:54 +0900 1566)
> AT_SETUP([ofproto-dpif - table-miss flow (OpenFlow 1.0)])
> 527ae97e (Simon Horman  2014-03-13 15:52:54 +0900 1567)
> OVS_VSWITCHD_START([dnl
> 527ae97e (Simon Horman  2014-03-13 15:52:54 +0900 1568)add-port br0
> p1 -- set Interface p1 type=dummy
> 527ae97e (Simon Horman  2014-03-13 15:52:54 +0900 1569) ])
> 
> 
> I think that's why I sent email asking him.

I don't wish to add to any confusion but I will none the less
make a comment on this change which you may choose to ignore if you like.

The motivation for adding connmgr_wants_packet_in_on_miss() is to help
implement per-OpenFlow version behaviour of packet_in messages.  So if
monitors should have per-OpenFlow version behaviour then I am not sure your
patch is correct. My recollection is that this behaviour of monitors seemed
useful for the purpose of testing the per-OpenFlow version behaviour of
packet in messages. But it was a while ago so my recollection may not be
accurate.

> > From the [openvswitch 2.3.0] it looks like you ran the tests against
> > branch-2.3.  I don't see the same failures locally when I run the tests
> > on branch-2.3 (commit 9067b54) or master (commit 57fa816).  Are these
> > intermittent failures or do you see them every time?  Do they appear
> > only after the two recent additional commits that you pushed?
> >
> 
> 
> 4. I saw them every time.  Those failures are caused only by this RFC
> patch (unrelated to the two recent commits I just pused)
> 
> 5. The cause of failure is that connmgr_wants_packet_in_on_miss() now
>  checks only the 'ofconn's of type OFCONN_PRIMARY.  so, the 'ofconn'
>  for 'ofctl monitor' is ignored, and we always drop the pkt_ins.
> 
> Thanks,
> Alex Wang,
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [PATCH 2/2] connmgr: Make call to connmgr_wants_packet_in_on_miss() thread safe.

2014-07-29 Thread Andy Zhou
On Tue, Jul 29, 2014 at 2:46 PM, Alex Wang  wrote:
> Thx, pushed both patches to master and branch-2.3.
>
> I'll try improving the thread-safety of connmgr module later,

Sounds good. Thanks.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


Re: [ovs-dev] [RFC] connmgr: Make connmgr_wants_packet_in_on_miss() only check controller connections.

2014-07-29 Thread Alex Wang
>
>
> I don't wish to add to any confusion but I will none the less
> make a comment on this change which you may choose to ignore if you like.
>
> The motivation for adding connmgr_wants_packet_in_on_miss() is to help
> implement per-OpenFlow version behaviour of packet_in messages.  So if
> monitors should have per-OpenFlow version behaviour then I am not sure your
> patch is correct. My recollection is that this behaviour of monitors seemed
> useful for the purpose of testing the per-OpenFlow version behaviour of
> packet in messages. But it was a while ago so my recollection may not be
> accurate.
>
>
>

Hey Simon,

Thanks for the explanation, I think we do want to have per-OpenFlow
version 'table-miss' behavior for 'ofctl monitor'.

My original understanding was that if there is no established 'ofconn' to
controller, connmgr_wants_packet_in_on_miss() should always returns false,
and 'ofctl monitor' should not work.

After checking the code further, I think we treat 'ofconn' from 'ovs-ofctl
monitor' as 'controller' also.  And i see why you wrote
connmgr_wants_packet_in_on_miss().

So, I'm dropping this patch.

Thanks,
Alex Wang,
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] vport-internal_dev:Drop packets when interdev is not up

2014-07-29 Thread lichunhe
From: Chunhe Li 

If the internal device is not up, it should drop received packets. Sometimes
it receive the broadcast or multicast packets, and the ip protocol stack will
casue more cpu usage wasted.

Signed-off-by: Chunhe Li 
---
 datapath/vport-internal_dev.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/datapath/vport-internal_dev.c b/datapath/vport-internal_dev.c
index 3b0f9a7..b0e1b24 100644
--- a/datapath/vport-internal_dev.c
+++ b/datapath/vport-internal_dev.c
@@ -242,6 +242,11 @@ static int internal_dev_recv(struct vport *vport, struct 
sk_buff *skb)
struct net_device *netdev = netdev_vport_priv(vport)->dev;
int len;
 
+   if (netdev && !(netdev->flags & IFF_UP)) {
+   kfree_skb(skb);
+   return 0;
+   }
+
 #if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,37)
if (vlan_tx_tag_present(skb)) {
if (unlikely(!__vlan_put_tag(skb,
-- 
1.9.2.0


___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev


[ovs-dev] [PATCH] ovs-lib.in:Add process name checking when start ovs service

2014-07-29 Thread lichunhe
From: Chunhe Li 

Only check wheather is daemon pid exist is not enough, becasue the pid which 
store in pidfile maybe assign to another process by OS. So it will checking
failed for pid exist, but the starting process which own the pid is not the
ovs daemon.

Signed-off-by: Chunhe Li 
Signed-off-by: wuyunfei 
---
 utilities/ovs-lib.in | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/utilities/ovs-lib.in b/utilities/ovs-lib.in
index 48d0c36..8c1a8b2 100644
--- a/utilities/ovs-lib.in
+++ b/utilities/ovs-lib.in
@@ -244,5 +244,5 @@ daemon_status () {
 
 daemon_is_running () {
 pidfile=$rundir/$1.pid
-test -e "$pidfile" && pid=`cat "$pidfile"` && pid_exists "$pid"
+test -e "$pidfile" && pid=`cat "$pidfile"` && pid_exists "$pid" && pidof 
"$1"
 } >/dev/null 2>&1
-- 
1.9.2.0


___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev