Re: [ovs-dev] [PATCH v4 1/3] Add basic implementation for OpenFlow 1.4 bundles
Hi Alex, On 04/22/2014 09:31 PM, Alexandru Copot wrote: +/* List of 'struct bundle_message's */ +struct list msg_list; +struct ovs_mutex list_mutex; I think we can remove this mutex too. Daniel. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v3] Rapid Spanning Protocol Implementation (IEEE 802.1D) + functional tests
Jarno Rajahalme wrote: Daniele, Martino, Thank you for the contribution, and sorry that it has taken so long for me to get to the review! Thank you for the thorough revision. We are going to take a deeper look at it and provide modifications/comments. B.R., Martino. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3.13] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
On Tue, Apr 22, 2014 at 08:18:04PM +0100, Ben Hutchings wrote: > From: Zoltan Kiss > > commit 36d5fe6a000790f56039afe26834265db0a3ad4c upstream. > > skb_zerocopy can copy elements of the frags array between skbs, but it doesn't > orphan them. Also, it doesn't handle errors, so this patch takes care of that > as well, and modify the callers accordingly. skb_tx_error() is also added to > the callers so they will signal the failed delivery towards the creator of the > skb. > > Signed-off-by: Zoltan Kiss > Signed-off-by: David S. Miller > [bwh: Backported to 3.13: skb_zerocopy() is new in 3.14, but was moved from a > static function in nfnetlink_queue. We need to patch that and its caller, > but > not openvswitch.] > Signed-off-by: Ben Hutchings > --- > On Tue, 2014-04-22 at 19:02 +0100, Zoltan Kiss wrote: > > On 22/04/14 16:38, Ben Hutchings wrote: > > > On Mon, 2014-04-21 at 12:26 +0100, Luis Henriques wrote: > > >> Hi David, > > >> > > >> On Thu, Mar 27, 2014 at 03:29:56PM -0400, David Miller wrote: > > >>> From: Zoltan Kiss > > >>> Date: Wed, 26 Mar 2014 22:37:45 + > > >>> > > skb_zerocopy can copy elements of the frags array between skbs, but it > > doesn't > > orphan them. Also, it doesn't handle errors, so this patch takes care > > of that > > as well, and modify the callers accordingly. skb_tx_error() is also > > added to > > the callers so they will signal the failed delivery towards the > > creator of the > > skb. > > > > Signed-off-by: Zoltan Kiss > > >>> > > >>> Fingers crossed :-) Applied, thanks Zoltan. > > >>> -- > > >>> To unsubscribe from this list: send the line "unsubscribe netdev" in > > >>> the body of a message to majord...@vger.kernel.org > > >>> More majordomo info at http://vger.kernel.org/majordomo-info.html > > >> > > >> Could you please queue this for stable as well? It has CVE-2014-2568 > > >> assigned to it and I believe it is applicable to some of the trees. > > > > > > It was applied to 3.13, but needs backporting to earlier versions. I > > > posted my attempt in > > > <1397429860.10849.86.ca...@deadeye.wl.decadent.org.uk> but it needs > > > testing/reviewing. > > > > This one is a different issue, although it is very similar. > > Sorry for mixing these up. I believe this bug has been present since > zerocopy was added to nfqueue in Linux 3.10 (commit ae08ce002108). This > is the version I used for Debian's 3.13 branch, which might be usable > for older stable branches too. > > Ben. Thank you for this backport Ben. It looks correct to me and it seems to be applicable to 3.10+ kernels Cheers, -- Luís > --- > --- a/net/netfilter/nfnetlink_queue_core.c > +++ b/net/netfilter/nfnetlink_queue_core.c > @@ -235,22 +235,23 @@ nfqnl_flush(struct nfqnl_instance *queue > spin_unlock_bh(&queue->lock); > } > > -static void > +static int > nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int > hlen) > { > int i, j = 0; > int plen = 0; /* length of skb->head fragment */ > + int ret; > struct page *page; > unsigned int offset; > > /* dont bother with small payloads */ > - if (len <= skb_tailroom(to)) { > - skb_copy_bits(from, 0, skb_put(to, len), len); > - return; > - } > + if (len <= skb_tailroom(to)) > + return skb_copy_bits(from, 0, skb_put(to, len), len); > > if (hlen) { > - skb_copy_bits(from, 0, skb_put(to, hlen), hlen); > + ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen); > + if (unlikely(ret)) > + return ret; > len -= hlen; > } else { > plen = min_t(int, skb_headlen(from), len); > @@ -268,6 +269,11 @@ nfqnl_zcopy(struct sk_buff *to, const st > to->len += len + plen; > to->data_len += len + plen; > > + if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) { > + skb_tx_error(from); > + return -ENOMEM; > + } > + > for (i = 0; i < skb_shinfo(from)->nr_frags; i++) { > if (!len) > break; > @@ -278,6 +284,8 @@ nfqnl_zcopy(struct sk_buff *to, const st > j++; > } > skb_shinfo(to)->nr_frags = j; > + > + return 0; > } > > static int > @@ -374,13 +382,16 @@ nfqnl_build_packet_message(struct net *n > > skb = nfnetlink_alloc_skb(net, size, queue->peer_portid, > GFP_ATOMIC); > - if (!skb) > + if (!skb) { > + skb_tx_error(entskb); > return NULL; > + } > > nlh = nlmsg_put(skb, 0, 0, > NFNL_SUBSYS_QUEUE << 8 | NFQNL_MSG_PACKET, > sizeof(struct nfgenmsg), 0); > if (!nlh) { > + skb_tx_error(entskb); > kfree_skb(skb); > return NULL; > } > @@ -504,13 +515,15 @@ nfqnl_build_packet_message(struct net *n >
[ovs-dev] [PATCH v4 1/3] Add basic implementation for OpenFlow 1.4 bundles
This is only the communication part of the bundles functionality. The actual message pre-validation and commits are not implemented. Signed-off-by: Alexandru Copot --- v4: * adjust copyright * fix style issues * remove all locking * delete bundles in ofconn_destroy() v3: * rebase * adjusted to use ofpbuf_l3() * bug fixes v2: rebase --- lib/learning-switch.c | 2 + lib/ofp-errors.h | 52 ++ lib/ofp-msgs.h| 10 ++ lib/ofp-print.c | 92 ++ lib/ofp-util.c| 58 lib/ofp-util.h| 21 + lib/rconn.c | 2 + ofproto/automake.mk | 5 +- ofproto/bundles.c | 256 ++ ofproto/bundles.h | 49 ++ ofproto/connmgr.c | 16 ofproto/connmgr.h | 2 + ofproto/ofproto.c | 70 ++ 13 files changed, 634 insertions(+), 1 deletion(-) create mode 100644 ofproto/bundles.c create mode 100644 ofproto/bundles.h diff --git a/lib/learning-switch.c b/lib/learning-switch.c index c818a32..ca57911 100644 --- a/lib/learning-switch.c +++ b/lib/learning-switch.c @@ -398,6 +398,8 @@ lswitch_process_packet(struct lswitch *sw, const struct ofpbuf *msg) case OFPTYPE_METER_FEATURES_STATS_REPLY: case OFPTYPE_TABLE_FEATURES_STATS_REQUEST: case OFPTYPE_TABLE_FEATURES_STATS_REPLY: +case OFPTYPE_BUNDLE_CONTROL: +case OFPTYPE_BUNDLE_ADD_MESSAGE: default: if (VLOG_IS_DBG_ENABLED()) { char *s = ofp_to_string(ofpbuf_data(msg), ofpbuf_size(msg), 2); diff --git a/lib/ofp-errors.h b/lib/ofp-errors.h index c80a75e..b1bcf7c 100644 --- a/lib/ofp-errors.h +++ b/lib/ofp-errors.h @@ -557,6 +557,58 @@ enum ofperr { /* OF1.3+(13,5). Permissions error. */ OFPERR_OFPTFFC_EPERM, +/* ## ## */ +/* ## OFPET_BUNDLE_FAILED ## */ +/* ## ## */ + +/* OF1.4+(17,0). Unspecified error. */ +OFPERR_OFPBFC_UNKNOWN, + +/* OF1.4+(17,1). Permissions error. */ +OFPERR_OFPBFC_EPERM, + +/* OF1.4+(17,2). Bundle ID doesn't exist. */ +OFPERR_OFPBFC_BAD_ID, + +/* OF1.4+(17,3). Bundle ID already exists. */ +OFPERR_OFPBFC_BUNDLE_EXIST, + +/* OF1.4+(17,4). Bundle ID is closed. */ +OFPERR_OFPBFC_BUNDLE_CLOSED, + +/* OF1.4+(17,5). Too many bundle IDs. */ +OFPERR_OFPBFC_OUT_OF_BUNDLES, + +/* OF1.4+(17,6). Unsupported of unknown message control type. */ +OFPERR_OFPBFC_BAD_TYPE, + +/* OF1.4+(17,7). Unsupported, unknown, or inconsistent flags. */ +OFPERR_OFPBFC_BAD_FLAGS, + +/* OF1.4+(17,8). Length problem in included message. */ +OFPERR_OFPBFC_MSG_BAD_LEN, + +/* OF1.4+(17,9). Inconsistent or duplicate XID. */ +OFPERR_OFPBFC_MSG_BAD_XID, + +/* OF1.4+(17,10). Unsupported message in this bundle. */ +OFPERR_OFPBFC_MSG_UNSUP, + +/* OF1.4+(17,11). Unsupported message combination in this bundle. */ +OFPERR_OFPBFC_MSG_CONFLICT, + +/* OF1.4+(17,12). Cant handle this many messages in bundle. */ +OFPERR_OFPBFC_MSG_TOO_MANY, + +/* OF1.4+(17,13). One message in bundle failed. */ +OFPERR_OFPBFC_MSG_FAILED, + +/* OF1.4+(17,14). Bundle is taking too long. */ +OFPERR_OFPBFC_TIMEOUT, + +/* OF1.4+(17,15). Bundle is locking the resource. */ +OFPERR_OFPBFC_BUNDLE_IN_PROGRESS, + /* ## -- ## */ /* ## OFPET_EXPERIMENTER ## */ /* ## -- ## */ diff --git a/lib/ofp-msgs.h b/lib/ofp-msgs.h index d8dee5b..5cded7c 100644 --- a/lib/ofp-msgs.h +++ b/lib/ofp-msgs.h @@ -233,6 +233,12 @@ enum ofpraw { /* OFPT 1.4+ (30): struct ofp14_role_status, uint8_t[8][]. */ OFPRAW_OFPT14_ROLE_STATUS, +/* OFPT 1.4+ (33): struct ofp14_bundle_ctrl_msg, uint8_t[8][]. */ +OFPRAW_OFPT14_BUNDLE_CONTROL, + +/* OFPT 1.4+ (34): struct ofp14_bundle_add_msg, struct ofp_header, uint8_t[]. */ +OFPRAW_OFPT14_BUNDLE_ADD_MESSAGE, + /* Standard statistics. */ /* OFPST 1.0+ (0): void. */ @@ -508,6 +514,10 @@ enum ofptype { /* Controller role change event messages. */ OFPTYPE_ROLE_STATUS, /* OFPRAW_OFPT14_ROLE_STATUS. */ +OFPTYPE_BUNDLE_CONTROL, /* OFPRAW_OFPT14_BUNDLE_CONTROL. */ + +OFPTYPE_BUNDLE_ADD_MESSAGE, /* OFPRAW_OFPT14_BUNDLE_ADD_MESSAGE. */ + /* Statistics. */ OFPTYPE_DESC_STATS_REQUEST, /* OFPRAW_OFPST_DESC_REQUEST. */ OFPTYPE_DESC_STATS_REPLY,/* OFPRAW_OFPST_DESC_REPLY. */ diff --git a/lib/ofp-print.c b/lib/ofp-print.c index 9091b1b..b1bd2c3 100644 --- a/lib/ofp-print.c +++ b/lib/ofp-print.c @@ -2665,6 +2665,90 @@ ofp_print_table_features(struct ds *s, const struct ofp_header *oh) } } +static const char * +bundle_flags_to_name(uint32_t bit) +{ +switch (bit) { +case OFPBF_ATOMIC: +return "atomic"; +case OFPBF_ORDERED: +return "ordered"; +default: +return NULL; +} +} + +static void +ofp_print_bundle_ctrl(struct ds *s, con
Re: [ovs-dev] [PATCH 3.13] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
On 22/04/14 20:18, Ben Hutchings wrote: From: Zoltan Kiss commit 36d5fe6a000790f56039afe26834265db0a3ad4c upstream. skb_zerocopy can copy elements of the frags array between skbs, but it doesn't orphan them. Also, it doesn't handle errors, so this patch takes care of that as well, and modify the callers accordingly. skb_tx_error() is also added to the callers so they will signal the failed delivery towards the creator of the skb. Signed-off-by: Zoltan Kiss Signed-off-by: David S. Miller [bwh: Backported to 3.13: skb_zerocopy() is new in 3.14, but was moved from a static function in nfnetlink_queue. We need to patch that and its caller, but not openvswitch.] Signed-off-by: Ben Hutchings --- On Tue, 2014-04-22 at 19:02 +0100, Zoltan Kiss wrote: On 22/04/14 16:38, Ben Hutchings wrote: On Mon, 2014-04-21 at 12:26 +0100, Luis Henriques wrote: Hi David, On Thu, Mar 27, 2014 at 03:29:56PM -0400, David Miller wrote: From: Zoltan Kiss Date: Wed, 26 Mar 2014 22:37:45 + skb_zerocopy can copy elements of the frags array between skbs, but it doesn't orphan them. Also, it doesn't handle errors, so this patch takes care of that as well, and modify the callers accordingly. skb_tx_error() is also added to the callers so they will signal the failed delivery towards the creator of the skb. Signed-off-by: Zoltan Kiss Fingers crossed :-) Applied, thanks Zoltan. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Could you please queue this for stable as well? It has CVE-2014-2568 assigned to it and I believe it is applicable to some of the trees. It was applied to 3.13, but needs backporting to earlier versions. I posted my attempt in <1397429860.10849.86.ca...@deadeye.wl.decadent.org.uk> but it needs testing/reviewing. This one is a different issue, although it is very similar. Sorry for mixing these up. I believe this bug has been present since zerocopy was added to nfqueue in Linux 3.10 (commit ae08ce002108). This is the version I used for Debian's 3.13 branch, which might be usable for older stable branches too. Seems OK, thanks! Ben. --- --- a/net/netfilter/nfnetlink_queue_core.c +++ b/net/netfilter/nfnetlink_queue_core.c @@ -235,22 +235,23 @@ nfqnl_flush(struct nfqnl_instance *queue spin_unlock_bh(&queue->lock); } -static void +static int nfqnl_zcopy(struct sk_buff *to, const struct sk_buff *from, int len, int hlen) { int i, j = 0; int plen = 0; /* length of skb->head fragment */ + int ret; struct page *page; unsigned int offset; /* dont bother with small payloads */ - if (len <= skb_tailroom(to)) { - skb_copy_bits(from, 0, skb_put(to, len), len); - return; - } + if (len <= skb_tailroom(to)) + return skb_copy_bits(from, 0, skb_put(to, len), len); if (hlen) { - skb_copy_bits(from, 0, skb_put(to, hlen), hlen); + ret = skb_copy_bits(from, 0, skb_put(to, hlen), hlen); + if (unlikely(ret)) + return ret; len -= hlen; } else { plen = min_t(int, skb_headlen(from), len); @@ -268,6 +269,11 @@ nfqnl_zcopy(struct sk_buff *to, const st to->len += len + plen; to->data_len += len + plen; + if (unlikely(skb_orphan_frags(from, GFP_ATOMIC))) { + skb_tx_error(from); + return -ENOMEM; + } + for (i = 0; i < skb_shinfo(from)->nr_frags; i++) { if (!len) break; @@ -278,6 +284,8 @@ nfqnl_zcopy(struct sk_buff *to, const st j++; } skb_shinfo(to)->nr_frags = j; + + return 0; } static int @@ -374,13 +382,16 @@ nfqnl_build_packet_message(struct net *n skb = nfnetlink_alloc_skb(net, size, queue->peer_portid, GFP_ATOMIC); - if (!skb) + if (!skb) { + skb_tx_error(entskb); return NULL; + } nlh = nlmsg_put(skb, 0, 0, NFNL_SUBSYS_QUEUE << 8 | NFQNL_MSG_PACKET, sizeof(struct nfgenmsg), 0); if (!nlh) { + skb_tx_error(entskb); kfree_skb(skb); return NULL; } @@ -504,13 +515,15 @@ nfqnl_build_packet_message(struct net *n nla->nla_type = NFQA_PAYLOAD; nla->nla_len = nla_attr_size(data_len); - nfqnl_zcopy(skb, entskb, data_len, hlen); + if (nfqnl_zcopy(skb, entskb, data_len, hlen)) + goto nla_put_failure; } nlh->nlmsg_len = skb->len; return skb; nla_put_failure: + skb_tx_error(entskb); kfree_skb(skb); net_err_ratelimited("nf_queue: error creating packet message\n");
Re: [ovs-dev] [PATCH] lib/util: Input validation in str_to_uint
On 22/04/14 22:39, Ben Pfaff wrote: On Tue, Apr 22, 2014 at 06:27:18PM +0100, Zoltan Kiss wrote: This function returns true when 's' is negative or greater than UINT_MAX. Also, the representation of 'int' and 'unsigned int' is implementation dependent, so converting [INT_MAX..UINT_MAX] values with str_to_int is fragile. Instead, we should convert straight to 'long long' and do a boundary check before returning the converted value. Signed-off-by: Zoltan Kiss It's implementation dependent in theory but in practice OVS only runs on two's complement systems. The intent here was to accept negative or out of range values and adjust them to lie within the range as if by truncation of the two's complement bitwise pattern. This commit changes the semantics. However, all of the existing users of str_to_uint() would be better off with the semantics that you propose. Let's change it. I think that we'd be better off with this in the .c file now. It was only in the .h file because it was completely trivial. I think that the other str_to_u*() functions should adopt the new semantics too. Or we could delete them, since they have no users. Here, I would remove the OVS_UNLIKELY, not because it is wrong but because this is not on any fast path and such annotations make code harder to read. I would also remove the inner (): Ok, I'll repost with fixes. +if (OVS_UNLIKELY(!ok || (ll < 0) || (ll > UINT_MAX))) { + *u = 0; + return false; +} +else { + *u = ll; + return true; +} Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3.13] core, nfqueue, openvswitch: Orphan frags in skb_zerocopy and handle errors
On Tue, Apr 22, 2014 at 3:18 PM, Ben Hutchings wrote: > From: Zoltan Kiss > > commit 36d5fe6a000790f56039afe26834265db0a3ad4c upstream. > > skb_zerocopy can copy elements of the frags array between skbs, but it doesn't > orphan them. Also, it doesn't handle errors, so this patch takes care of that > as well, and modify the callers accordingly. skb_tx_error() is also added to > the callers so they will signal the failed delivery towards the creator of the > skb. > > Signed-off-by: Zoltan Kiss > Signed-off-by: David S. Miller > [bwh: Backported to 3.13: skb_zerocopy() is new in 3.14, but was moved from a > static function in nfnetlink_queue. We need to patch that and its caller, > but > not openvswitch.] > Signed-off-by: Ben Hutchings Deja-vu. I sent almost the same patch to netdev a while ago and you acked it there. So I guess I'll ack your copy as well :). Not sure why either version wasn't picked up by Dave. josh ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v2] lib/util: Input validation in str_to_uint
This function returns true when 's' is negative or greater than UINT_MAX. Also, the representation of 'int' and 'unsigned int' is implementation dependent, so converting [INT_MAX..UINT_MAX] values with str_to_int is fragile. Instead, we should convert straight to 'long long' and do a boundary check before returning the converted value. This patch also move the function to the .c file as it's not-trivial now, and deletes the other str_to_u* functions as they are not used. Signed-off-by: Zoltan Kiss --- v2: - remove OVS_UNLIKELY and unnecessary () - move the function to util.c as it became not-trivial - remove the other str_to_u* functions as they are not used diff --git a/lib/util.c b/lib/util.c index 3f08c4a..54065fe 100644 --- a/lib/util.c +++ b/lib/util.c @@ -613,6 +613,21 @@ str_to_llong(const char *s, int base, long long *x) } } +bool +str_to_uint(const char *s, int base, unsigned int *u) +{ +long long ll; +bool ok = str_to_llong(s, base, &ll); +if (!ok || ll < 0 || ll > UINT_MAX) { + *u = 0; + return false; +} +else { + *u = ll; + return true; +} +} + /* Converts floating-point string 's' into a double. If successful, stores * the double in '*d' and returns true; on failure, stores 0 in '*d' and * returns false. diff --git a/lib/util.h b/lib/util.h index aff17a5..bff6153 100644 --- a/lib/util.h +++ b/lib/util.h @@ -290,24 +290,7 @@ void ovs_hex_dump(FILE *, const void *, size_t, uintptr_t offset, bool ascii); bool str_to_int(const char *, int base, int *); bool str_to_long(const char *, int base, long *); bool str_to_llong(const char *, int base, long long *); - -static inline bool -str_to_uint(const char *s, int base, unsigned int *u) -{ -return str_to_int(s, base, (int *) u); -} - -static inline bool -str_to_ulong(const char *s, int base, unsigned long *ul) -{ -return str_to_long(s, base, (long *) ul); -} - -static inline bool -str_to_ullong(const char *s, int base, unsigned long long *ull) -{ -return str_to_llong(s, base, (long long *) ull); -} +bool str_to_uint(const char *, int base, unsigned int *); bool ovs_scan(const char *s, const char *format, ...) SCANF_FORMAT(2, 3); ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 07/11] daemon-windows: Recognize --no-chdir option for windows.
On Tue, Apr 22, 2014 at 2:58 PM, Ben Pfaff wrote: > On Tue, Apr 22, 2014 at 01:40:55PM -0700, Gurucharan Shetty wrote: >> > Should OVS chdir to the root (on all drives?) on Windows >> > also? >> Looks like chdir("/") on windows takes me to C:/ . So doing that on >> windows should be good. I will re-spin this. > > OK. > > If Windows works like it used to, there's an independent notion of the > current working directory on every drive, so one would have to do > something like: > > char c; > > for (c = 'a'; c <= 'z'; c++) { > char dir[] = {c, ':', '/', 0}; > chdir(dir); > } > > to really chdir to the root everywhere. But I do not know whether it > matters. I see that this notion is true in current versions of windows too. I suppose a working directory for every drive is useful if a program changes its drive midway in the program. (This is what I understood after trying reading some msdn documentation, though there are probably other reasons too.) I decided to use the first version of this patch for the time being (which does not do anything other than accept the "--chdir" option. Once I see a real use case for doing something, I will add the feature. Thanks, Guru ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH V2 3/4] datapath: Convert mask list in mask array.
mask caches index of mask in mask_list. On packet recv OVS need to traverse mask-list to get cached mask. Therefore array is better for retrieving cached mask. This also allows better cache replacement algorithm by directly checking mask's existence. Signed-off-by: Pravin B Shelar --- datapath/flow.h |1 - datapath/flow_table.c | 200 ++--- datapath/flow_table.h |8 +- 3 files changed, 163 insertions(+), 46 deletions(-) diff --git a/datapath/flow.h b/datapath/flow.h index d05a9f4..2018691 100644 --- a/datapath/flow.h +++ b/datapath/flow.h @@ -127,7 +127,6 @@ struct sw_flow_key_range { struct sw_flow_mask { int ref_count; struct rcu_head rcu; - struct list_head list; struct sw_flow_key_range range; struct sw_flow_key key; }; diff --git a/datapath/flow_table.c b/datapath/flow_table.c index cc0eaf2..c8bd9d1 100644 --- a/datapath/flow_table.c +++ b/datapath/flow_table.c @@ -47,6 +47,7 @@ #include "vlan.h" #define TBL_MIN_BUCKETS1024 +#define MASK_ARRAY_SIZE_MIN16 #define REHASH_INTERVAL(10 * 60 * HZ) #define MC_HASH_SHIFT 8 @@ -211,26 +212,83 @@ static struct table_instance *table_instance_alloc(int new_size) return ti; } +static void mask_array_rcu_cb(struct rcu_head *rcu) +{ + struct mask_array *ma = container_of(rcu, struct mask_array, rcu); + + kfree(ma); +} + +static struct mask_array *tbl_mask_array_alloc(int size) +{ + struct mask_array *new; + + new = kzalloc(sizeof(struct mask_array) + + sizeof(struct sw_flow_mask *) * size, GFP_KERNEL); + if (!new) + return NULL; + + new->count = 0; + new->max = size; + + return new; +} + +static int tbl_mask_array_realloc(struct flow_table *tbl, int size) +{ + struct mask_array *old; + struct mask_array *new; + + new = tbl_mask_array_alloc(size); + if (!new) + return -ENOMEM; + + old = ovsl_dereference(tbl->mask_array); + if (old) { + int i; + + for (i = 0; i < old->max; i++) { + if (old->masks[i]) + new->masks[new->count++] = old->masks[i]; + } + } + rcu_assign_pointer(tbl->mask_array, new); + + if (old) + call_rcu(&old->rcu, mask_array_rcu_cb); + + return 0; +} + int ovs_flow_tbl_init(struct flow_table *table) { struct table_instance *ti; + struct mask_array *ma; table->mask_cache = __alloc_percpu(sizeof(struct mask_cache_entry) * MC_HASH_ENTRIES, __alignof__(struct mask_cache_entry)); if (!table->mask_cache) return -ENOMEM; + ma = tbl_mask_array_alloc(MASK_ARRAY_SIZE_MIN); + if (!ma) + goto free_mask_cache; + ti = table_instance_alloc(TBL_MIN_BUCKETS); - if (!ti) { - free_percpu(table->mask_cache); - return -ENOMEM; - } + if (!ti) + goto free_mask_array; rcu_assign_pointer(table->ti, ti); - INIT_LIST_HEAD(&table->mask_list); + rcu_assign_pointer(table->mask_array, ma); table->last_rehash = jiffies; table->count = 0; return 0; + +free_mask_array: + kfree((struct mask_array __force *)table->mask_array); +free_mask_cache: + free_percpu(table->mask_cache); + return -ENOMEM; } static void flow_tbl_destroy_rcu_cb(struct rcu_head *rcu) @@ -276,6 +334,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table) struct table_instance *ti = (struct table_instance __force *)table->ti; free_percpu(table->mask_cache); + kfree((struct mask_array __force *)table->mask_array); table_instance_destroy(ti, false); } @@ -457,17 +516,27 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti, static struct sw_flow *flow_lookup(struct flow_table *tbl, struct table_instance *ti, + struct mask_array *ma, const struct sw_flow_key *key, - u32 *n_mask_hit) + u32 *n_mask_hit, + u32 *index) { - struct sw_flow_mask *mask; struct sw_flow *flow; + int i; - list_for_each_entry_rcu(mask, &tbl->mask_list, list) { - flow = masked_flow_lookup(ti, key, mask, n_mask_hit); - if (flow) /* Found */ - return flow; + for (i = 0; i < ma->max; i++) { + struct sw_flow_mask *mask; + + mask = rcu_dereference_ovsl(ma->masks[i]); + if (mask) { + flow = masked_flow_lookup(ti, key, mask, n_mask_hit); + if (flow) { /* Fou
[ovs-dev] [PATCH V2 4/4] datapath: Compact mask list array.
Along with flow-table rehashing OVS can compact masks array. This allows us to calculate highest index for mask array. Signed-off-by: Pravin B Shelar --- datapath/flow_table.c | 24 datapath/flow_table.h |2 +- 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/datapath/flow_table.c b/datapath/flow_table.c index c8bd9d1..ea4100f 100644 --- a/datapath/flow_table.c +++ b/datapath/flow_table.c @@ -230,6 +230,7 @@ static struct mask_array *tbl_mask_array_alloc(int size) new->count = 0; new->max = size; + new->hi_index = 0; return new; } @@ -252,6 +253,8 @@ static int tbl_mask_array_realloc(struct flow_table *tbl, int size) new->masks[new->count++] = old->masks[i]; } } + + new->hi_index = new->count; rcu_assign_pointer(tbl->mask_array, new); if (old) @@ -260,6 +263,17 @@ static int tbl_mask_array_realloc(struct flow_table *tbl, int size) return 0; } +static void tbl_mask_array_compact(struct flow_table *tbl) +{ + struct mask_array *ma; + int size; + + ma = ovsl_dereference(tbl->mask_array); + + size = roundup(ma->count, MASK_ARRAY_SIZE_MIN); + tbl_mask_array_realloc(tbl, size); +} + int ovs_flow_tbl_init(struct flow_table *table) { struct table_instance *ti; @@ -524,7 +538,7 @@ static struct sw_flow *flow_lookup(struct flow_table *tbl, struct sw_flow *flow; int i; - for (i = 0; i < ma->max; i++) { + for (i = 0; i < ma->hi_index; i++) { struct sw_flow_mask *mask; mask = rcu_dereference_ovsl(ma->masks[i]); @@ -648,7 +662,7 @@ static void flow_mask_remove(struct flow_table *tbl, struct sw_flow_mask *mask) int i; ma = ovsl_dereference(tbl->mask_array); - for (i = 0; i < ma->max; i++) { + for (i = 0; i < ma->hi_index; i++) { if (mask == ovsl_dereference(ma->masks[i])) { RCU_INIT_POINTER(ma->masks[i], NULL); ma->count--; @@ -705,7 +719,7 @@ static struct sw_flow_mask *flow_mask_find(const struct flow_table *tbl, int i; ma = ovsl_dereference(tbl->mask_array); - for (i = 0; i < ma->max; i++) { + for (i = 0; i < ma->hi_index; i++) { struct sw_flow_mask *t; t = ovsl_dereference(ma->masks[i]); @@ -753,8 +767,9 @@ static int flow_mask_insert(struct flow_table *tbl, struct sw_flow *flow, t = ovsl_dereference(ma->masks[i]); if (!t) { - rcu_assign_pointer(ma->masks[i], mask); ma->count++; + ma->hi_index = i + 1; + rcu_assign_pointer(ma->masks[i], mask); break; } } @@ -794,6 +809,7 @@ int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow, if (new_ti) { rcu_assign_pointer(table->ti, new_ti); table_instance_destroy(ti, true); + tbl_mask_array_compact(table); table->last_rehash = jiffies; } return 0; diff --git a/datapath/flow_table.h b/datapath/flow_table.h index ee86953..cf898ba 100644 --- a/datapath/flow_table.h +++ b/datapath/flow_table.h @@ -43,7 +43,7 @@ struct mask_cache_entry { struct mask_array { struct rcu_head rcu; - int count, max; + int count, max, hi_index; struct sw_flow_mask __rcu *masks[]; }; -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH V2 2/4] datapath: Add flow mask cache.
On every packet OVS needs to lookup flow-table with every mask until it finds a match. The packet flow-key is first masked with mask in the list and then the masked key is looked up in flow-table. Therefore number of masks can affect packet processing performance. Following patch adds mask index to mask cache from last pakcet lookup in same flow. Index of mask is stored in this cache. This cache is searched by 5 tuple hash (skb rxhash). Signed-off-by: Pravin B Shelar --- Fixed according to comments from Thomas Graf - Added comment for cache lookup. - check for zero hash - break of loop in case of hash collision. --- datapath/datapath.c |3 +- datapath/flow_table.c | 104 +++-- datapath/flow_table.h | 11 +- 3 files changed, 103 insertions(+), 15 deletions(-) diff --git a/datapath/datapath.c b/datapath/datapath.c index fcaafd1..10706f5 100644 --- a/datapath/datapath.c +++ b/datapath/datapath.c @@ -253,7 +253,8 @@ void ovs_dp_process_packet_with_key(struct sk_buff *skb, stats = this_cpu_ptr(dp->stats_percpu); /* Look up flow. */ - flow = ovs_flow_tbl_lookup_stats(&dp->table, pkt_key, &n_mask_hit); + flow = ovs_flow_tbl_lookup_stats(&dp->table, pkt_key, skb_get_rxhash(skb), +&n_mask_hit); if (unlikely(!flow)) { struct dp_upcall_info upcall; diff --git a/datapath/flow_table.c b/datapath/flow_table.c index be2d7d5..cc0eaf2 100644 --- a/datapath/flow_table.c +++ b/datapath/flow_table.c @@ -49,6 +49,10 @@ #define TBL_MIN_BUCKETS1024 #define REHASH_INTERVAL(10 * 60 * HZ) +#define MC_HASH_SHIFT 8 +#define MC_HASH_ENTRIES(1u << MC_HASH_SHIFT) +#define MC_HASH_SEGS ((sizeof(uint32_t) * 8) / MC_HASH_SHIFT) + static struct kmem_cache *flow_cache; struct kmem_cache *flow_stats_cache __read_mostly; @@ -211,10 +215,16 @@ int ovs_flow_tbl_init(struct flow_table *table) { struct table_instance *ti; - ti = table_instance_alloc(TBL_MIN_BUCKETS); + table->mask_cache = __alloc_percpu(sizeof(struct mask_cache_entry) * + MC_HASH_ENTRIES, __alignof__(struct mask_cache_entry)); + if (!table->mask_cache) + return -ENOMEM; - if (!ti) + ti = table_instance_alloc(TBL_MIN_BUCKETS); + if (!ti) { + free_percpu(table->mask_cache); return -ENOMEM; + } rcu_assign_pointer(table->ti, ti); INIT_LIST_HEAD(&table->mask_list); @@ -265,6 +275,7 @@ void ovs_flow_tbl_destroy(struct flow_table *table) { struct table_instance *ti = (struct table_instance __force *)table->ti; + free_percpu(table->mask_cache); table_instance_destroy(ti, false); } @@ -420,7 +431,8 @@ bool ovs_flow_cmp_unmasked_key(const struct sw_flow *flow, static struct sw_flow *masked_flow_lookup(struct table_instance *ti, const struct sw_flow_key *unmasked, - struct sw_flow_mask *mask) + struct sw_flow_mask *mask, + u32 *n_mask_hit) { struct sw_flow *flow; struct hlist_head *head; @@ -432,6 +444,7 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti, ovs_flow_mask_key(&masked_key, unmasked, mask); hash = flow_hash(&masked_key, key_start, key_end); head = find_bucket(ti, hash); + (*n_mask_hit)++; hlist_for_each_entry_rcu(flow, head, hash_node[ti->node_ver]) { if (flow->mask == mask && flow->hash == hash && flow_cmp_masked_key(flow, &masked_key, @@ -441,30 +454,97 @@ static struct sw_flow *masked_flow_lookup(struct table_instance *ti, return NULL; } -struct sw_flow *ovs_flow_tbl_lookup_stats(struct flow_table *tbl, - const struct sw_flow_key *key, - u32 *n_mask_hit) + +static struct sw_flow *flow_lookup(struct flow_table *tbl, + struct table_instance *ti, + const struct sw_flow_key *key, + u32 *n_mask_hit) { - struct table_instance *ti = rcu_dereference_ovsl(tbl->ti); struct sw_flow_mask *mask; struct sw_flow *flow; - *n_mask_hit = 0; list_for_each_entry_rcu(mask, &tbl->mask_list, list) { - (*n_mask_hit)++; - flow = masked_flow_lookup(ti, key, mask); + flow = masked_flow_lookup(ti, key, mask, n_mask_hit); if (flow) /* Found */ return flow; } return NULL; } +/* + * mask_cache maps flow to probable mask. This cache is not tightly + * coupled cache, It means updates to mask list can result i
[ovs-dev] [PATCH V2 1/4] datapath: Move table destroy to dp-rcu callback.
Ths simplifies flow-table-destroy API. This change is required for following patches. Signed-off-by: Pravin B Shelar --- Added comment --- datapath/datapath.c |5 ++--- datapath/flow_table.c |8 +--- datapath/flow_table.h |2 +- 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/datapath/datapath.c b/datapath/datapath.c index aa4c109..fcaafd1 100644 --- a/datapath/datapath.c +++ b/datapath/datapath.c @@ -187,6 +187,7 @@ static void destroy_dp_rcu(struct rcu_head *rcu) { struct datapath *dp = container_of(rcu, struct datapath, rcu); + ovs_flow_tbl_destroy(&dp->table); free_percpu(dp->stats_percpu); release_net(ovs_dp_get_net(dp)); kfree(dp->ports); @@ -1447,7 +1448,7 @@ err_destroy_ports_array: err_destroy_percpu: free_percpu(dp->stats_percpu); err_destroy_table: - ovs_flow_tbl_destroy(&dp->table, false); + ovs_flow_tbl_destroy(&dp->table); err_free_dp: release_net(ovs_dp_get_net(dp)); kfree(dp); @@ -1479,8 +1480,6 @@ static void __dp_destroy(struct datapath *dp) ovs_dp_detach_port(ovs_vport_ovsl(dp, OVSP_LOCAL)); /* RCU destroy the flow table */ - ovs_flow_tbl_destroy(&dp->table, true); - call_rcu(&dp->rcu, destroy_dp_rcu); } diff --git a/datapath/flow_table.c b/datapath/flow_table.c index 159572b..be2d7d5 100644 --- a/datapath/flow_table.c +++ b/datapath/flow_table.c @@ -259,11 +259,13 @@ skip_flows: __table_instance_destroy(ti); } -void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred) +/* No need for locking this function is called from RCU callback or + * error path. */ +void ovs_flow_tbl_destroy(struct flow_table *table) { - struct table_instance *ti = ovsl_dereference(table->ti); + struct table_instance *ti = (struct table_instance __force *)table->ti; - table_instance_destroy(ti, deferred); + table_instance_destroy(ti, false); } struct sw_flow *ovs_flow_tbl_dump_next(struct table_instance *ti, diff --git a/datapath/flow_table.h b/datapath/flow_table.h index ca8a582..ddf0c01 100644 --- a/datapath/flow_table.h +++ b/datapath/flow_table.h @@ -62,7 +62,7 @@ void ovs_flow_free(struct sw_flow *, bool deferred); int ovs_flow_tbl_init(struct flow_table *); int ovs_flow_tbl_count(struct flow_table *table); -void ovs_flow_tbl_destroy(struct flow_table *table, bool deferred); +void ovs_flow_tbl_destroy(struct flow_table *table); int ovs_flow_tbl_flush(struct flow_table *flow_table); int ovs_flow_tbl_insert(struct flow_table *table, struct sw_flow *flow, -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 07/11] daemon-windows: Recognize --no-chdir option for windows.
On Wed, Apr 23, 2014 at 8:47 AM, Gurucharan Shetty wrote: > On Tue, Apr 22, 2014 at 2:58 PM, Ben Pfaff wrote: >> On Tue, Apr 22, 2014 at 01:40:55PM -0700, Gurucharan Shetty wrote: >>> > Should OVS chdir to the root (on all drives?) on Windows >>> > also? >>> Looks like chdir("/") on windows takes me to C:/ . So doing that on >>> windows should be good. I will re-spin this. >> >> OK. >> >> If Windows works like it used to, there's an independent notion of the >> current working directory on every drive, so one would have to do >> something like: >> >> char c; >> >> for (c = 'a'; c <= 'z'; c++) { >> char dir[] = {c, ':', '/', 0}; >> chdir(dir); >> } >> >> to really chdir to the root everywhere. But I do not know whether it >> matters. > I see that this notion is true in current versions of windows too. > > I suppose a working directory for every drive is useful if a program > changes its drive midway in the program. (This is what I understood > after trying reading some msdn documentation, though there are > probably other reasons too.) > > I decided to use the first version of this patch for the time being > (which does not do anything other than accept the "--chdir" option. > Once I see a real use case for doing something, I will add the > feature. OK, I'm happy with that. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] run-ryu: Use unix socket rather than patch ports
On Wed, Apr 23, 2014 at 02:06:12PM +0900, Simon Horman wrote: > My understanding of the implementation of patch ports is that they > are rather special, being handled as a special case inside > compose_output_action__() and do not exist in the datapath. > > A side effect of the implementation of patch ports (though perhaps not the > portion mentioned above) is that the OFPUTIL_PC_PORT_DOWN bit may not be > set via a port mod message. In particular, the call to > netdev_turn_flags_on() in update_port_config() fails. > > There is a test provided by Ryu that test this via port mod and thus fails. > > While that test could be modified or the results ignored it seems to me > that it would be best if ryu-check used ports which were more fully > featured and not special cases. > > Thus this patch moves run-ryu to use unix socket backed ports rather than > patch ports. > > Signed-off-by: Simon Horman Applied, thanks. The following seemed good rationale so I folded it into the commit message: > * I believe a more significant problem with the use of patch ports > is that they will require (more) special case code in order to correctly > handle recirculation. As Ryu provides many tests that exercise > recirculation for MPLS it would be nice if they could be used to exercise > recirculation for MPLS (which I have provided patches for separately[1]) > without the need to add more special-case code for that purpose. > > I believe that patch ports are also incompatible with recirculation for > bonding, which has already been merged, though I have not verified that > and it is not strictly related to this patch as I do not believe that Ryu > provides any tests to exercise that case. > > The key problem with patch ports in the context of recirculation is that > the ofproto and in_port may change during translation. And this > information is lost by the time that execution occurs. > > Furthermore the new in_port will not exist in the datapath as it is a > patch port. That particular problem may be addressed by executing the > actions in user-space, I have posted patches to provide infrastructure > for that[1]. > > Overall it is not clear to me that the complexity of supporting > recirculation for patch-ports would have sufficient pay-off. > > [1] [PATCH v3 00/16] Flow-Based Recirculation for MPLS ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv2] revalidator: Prevent handling the same flow twice.
On Tue, Apr 22, 2014 at 10:24 PM, Alex Wang wrote: > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c >> index 416cfdc..906bf17 100644 >> --- a/ofproto/ofproto-dpif-upcall.c >> +++ b/ofproto/ofproto-dpif-upcall.c >> @@ -45,6 +45,8 @@ >> >> VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall); >> >> +COVERAGE_DEFINE(upcall_duplicate_flow); >> + >> /* A thread that reads upcalls from dpif, forwards each upcall's packet, >> * and possibly sets up a kernel flow as a cache. */ >> struct handler { >> @@ -161,6 +163,7 @@ struct udpif_key { >> long long int created; /* Estimation of creation time. */ >> >> bool mark; /* Used by mark and sweep GC >> algorithm. */ >> +bool flow_exists; /* Ensures flows are only deleted >> once. */ >> >> > > Do we still need mark? I think the function of 'mark' and 'flow_exists' > is overlapped. > Based on discussion offline, we still need the 'mark' for garbage collection. Don't know if you want to add more comments to explain them, Acked-by: Alex Wang ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv13] ofproto-dpif-upcall: Remove the flow_dumper thread.
On Tue, Apr 22, 2014 at 04:54:24PM +1200, Joe Stringer wrote: > From: Ethan Jackson > > Previously, we had a separate flow_dumper thread that fetched flows from > the datapath to distribute to revalidator threads. This patch takes the > logic for dumping and pushes it into the revalidator threads, resulting > in simpler code with similar performance to the current code. > > One thread, the "leader", is responsible for beginning and ending each > flow dump, maintaining the flow_limit, and checking whether the > revalidator threads need to exit. All revalidator threads dump, > revalidate, delete datapath flows and garbage collect ukeys. > > Co-authored-by: Joe Stringer > Signed-off-by: Joe Stringer Here in the definition of struct udpif, I would mention that there are n_revalidators member of 'ukeys'. Otherwise the casual reader might guess that there was only one: > +/* During the flow dump phase, revalidators insert into these with a > random > + * distribution. During the garbage collection phase, each revalidator > + * takes care of garbage collecting one of these hmaps. */ > +struct { > +struct ovs_mutex mutex;/* Guards the following. */ > +struct hmap hmap OVS_GUARDED; /* Datapath flow keys. */ > +} *ukeys; In the definition of struct udpif_key, the synchronization of most of the members is pretty clear, except for 'xcache'. Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] lib/util: Input validation in str_to_uint
On Wed, Apr 23, 2014 at 02:45:21PM +0100, Zoltan Kiss wrote: > This function returns true when 's' is negative or greater than UINT_MAX. > Also, > the representation of 'int' and 'unsigned int' is implementation dependent, so > converting [INT_MAX..UINT_MAX] values with str_to_int is fragile. > Instead, we should convert straight to 'long long' and do a boundary check > before returning the converted value. > This patch also move the function to the .c file as it's not-trivial now, and > deletes the other str_to_u* functions as they are not used. > > Signed-off-by: Zoltan Kiss > --- > v2: > - remove OVS_UNLIKELY and unnecessary () > - move the function to util.c as it became not-trivial > - remove the other str_to_u* functions as they are not used Applied. I fixed up the coding style slightly: diff --git a/lib/util.c b/lib/util.c index 54065fe..1ebe22a 100644 --- a/lib/util.c +++ b/lib/util.c @@ -621,8 +621,7 @@ str_to_uint(const char *s, int base, unsigned int *u) if (!ok || ll < 0 || ll > UINT_MAX) { *u = 0; return false; -} -else { +} else { *u = ll; return true; } ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] Userspace Netlink MMAP status
Hi, I would like to ask, what's the status of enabling Netlink MMAP in the userspace? I'm interested to see this progressing, but digging the mail archive I've found it run into scalability issues: http://openvswitch.org/pipermail/dev/2013-December/034546.html And also it can't handle frags at the moment properly: http://openvswitch.org/pipermail/dev/2014-March/037337.html I was thinking about this scalability issue, and I think maybe we shouldn't stick to the 16 KB frame size. I think in most cases the packets we are actually sending up are small ones, in case of TCP the packets of the handshakes are less than 100 bytes. And for the rest, we can call genlmsg_new_unicast() with the full packet size, so it will fall back to non-mmaped communication. And this would solve the frag-handling problem as well. Another approach to keep the frame size lower is to pass references to the frags, which would point outside the ring buffer, to the actual page (which need to be mapped for the userspace). I don't know how feasible would that be, but huge linear buffers also has to be sliced up to frags. Regards, Zoli ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] Userspace Netlink MMAP status
The problem has actually gotten worse since we've gotten rid of the dispatcher thread. Now each thread has it's own channel per port. I wonder if the right approach is to simply ditch the per-port fairness in the case where mmap netlink is enabled. I.E. we simply have one channel per thread and call it a day. Anyways, I don't have a lot of context on this thread, so take everything above with a grain of salt. Ethan On Wed, Apr 23, 2014 at 1:05 PM, Zoltan Kiss wrote: > Hi, > > I would like to ask, what's the status of enabling Netlink MMAP in the > userspace? I'm interested to see this progressing, but digging the mail > archive I've found it run into scalability issues: > > http://openvswitch.org/pipermail/dev/2013-December/034546.html > > And also it can't handle frags at the moment properly: > > http://openvswitch.org/pipermail/dev/2014-March/037337.html > > I was thinking about this scalability issue, and I think maybe we shouldn't > stick to the 16 KB frame size. I think in most cases the packets we are > actually sending up are small ones, in case of TCP the packets of the > handshakes are less than 100 bytes. And for the rest, we can call > genlmsg_new_unicast() with the full packet size, so it will fall back to > non-mmaped communication. And this would solve the frag-handling problem as > well. > Another approach to keep the frame size lower is to pass references to the > frags, which would point outside the ring buffer, to the actual page (which > need to be mapped for the userspace). I don't know how feasible would that > be, but huge linear buffers also has to be sliced up to frags. > > Regards, > > Zoli > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ofproto-dpif: Improve code clarity and comments on recirc changes to rule_dpif_lookup
This patch improves the code readability and comments on the recirculation related changes to rule_dpif_lookup() base on off-line discussions with Jarno. There is no behavior changes. Signed-off-by: Andy Zhou --- ofproto/ofproto-dpif.c | 21 ++--- 1 file changed, 14 insertions(+), 7 deletions(-) diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index 6a725e4..983cad5 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -3199,14 +3199,21 @@ rule_dpif_lookup(struct ofproto_dpif *ofproto, struct flow *flow, uint8_t table_id; if (ofproto_dpif_get_enable_recirc(ofproto)) { -if (flow->recirc_id == 0) { -if (wc) { -wc->masks.recirc_id = UINT32_MAX; -} -table_id = 0; -} else { -table_id = TBL_INTERNAL; +/* Always exactly match recirc_id since datapath supports + * recirculation. */ +if (wc) { +wc->masks.recirc_id = UINT32_MAX; } + +/* Start looking up from internal table for post recirculation flows + * or packets. We can also simply send all, including normal flows + * or packets to the internal table. They will not match any post + * recirculation rules except the 'catch all' rule that resubmit + * them to table 0. + * + * As an optimization, we send normal flows and packets to table 0 + * directly, saving one table lookup. */ +table_id = flow->recirc_id ? TBL_INTERNAL : 0; } else { table_id = 0; } -- 1.9.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ofproto-dpif: Improve code clarity and comments on recirc changes to rule_dpif_lookup
Acked-by: Ethan Jackson On Wed, Apr 23, 2014 at 1:26 PM, Andy Zhou wrote: > This patch improves the code readability and comments on the > recirculation related changes to rule_dpif_lookup() base on off-line > discussions with Jarno. There is no behavior changes. > > Signed-off-by: Andy Zhou > --- > ofproto/ofproto-dpif.c | 21 ++--- > 1 file changed, 14 insertions(+), 7 deletions(-) > > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c > index 6a725e4..983cad5 100644 > --- a/ofproto/ofproto-dpif.c > +++ b/ofproto/ofproto-dpif.c > @@ -3199,14 +3199,21 @@ rule_dpif_lookup(struct ofproto_dpif *ofproto, struct > flow *flow, > uint8_t table_id; > > if (ofproto_dpif_get_enable_recirc(ofproto)) { > -if (flow->recirc_id == 0) { > -if (wc) { > -wc->masks.recirc_id = UINT32_MAX; > -} > -table_id = 0; > -} else { > -table_id = TBL_INTERNAL; > +/* Always exactly match recirc_id since datapath supports > + * recirculation. */ > +if (wc) { > +wc->masks.recirc_id = UINT32_MAX; > } > + > +/* Start looking up from internal table for post recirculation flows > + * or packets. We can also simply send all, including normal flows > + * or packets to the internal table. They will not match any post > + * recirculation rules except the 'catch all' rule that resubmit > + * them to table 0. > + * > + * As an optimization, we send normal flows and packets to table 0 > + * directly, saving one table lookup. */ > +table_id = flow->recirc_id ? TBL_INTERNAL : 0; > } else { > table_id = 0; > } > -- > 1.9.1 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] Userspace Netlink MMAP status
On 04/23/2014 10:12 PM, Ethan Jackson wrote: The problem has actually gotten worse since we've gotten rid of the dispatcher thread. Now each thread has it's own channel per port. I wonder if the right approach is to simply ditch the per-port fairness in the case where mmap netlink is enabled. I.E. we simply have one channel per thread and call it a day. Anyways, I don't have a lot of context on this thread, so take everything above with a grain of salt. I agree with Ethan's statement. Even with a reduced frame size the cost of an individual ring buffer per port is likely still too large and we lose the benefit of zerocopy for large packets which are typically the expensive packets. As we extend the GSO path into the upcall and make use of the new DPDK style ofpbuf to avoid the memcpy() for the mmap case the cost of an upcall will be minimal and the fairness issues of the upcall itself will be minimized. However, I remember someone (Alex?) stating that the unfairness was primarily caused by the classification. Thomas Ethan On Wed, Apr 23, 2014 at 1:05 PM, Zoltan Kiss wrote: Hi, I would like to ask, what's the status of enabling Netlink MMAP in the userspace? I'm interested to see this progressing, but digging the mail archive I've found it run into scalability issues: http://openvswitch.org/pipermail/dev/2013-December/034546.html And also it can't handle frags at the moment properly: http://openvswitch.org/pipermail/dev/2014-March/037337.html I was thinking about this scalability issue, and I think maybe we shouldn't stick to the 16 KB frame size. I think in most cases the packets we are actually sending up are small ones, in case of TCP the packets of the handshakes are less than 100 bytes. And for the rest, we can call genlmsg_new_unicast() with the full packet size, so it will fall back to non-mmaped communication. And this would solve the frag-handling problem as well. Another approach to keep the frame size lower is to pass references to the frags, which would point outside the ring buffer, to the actual page (which need to be mapped for the userspace). I don't know how feasible would that be, but huge linear buffers also has to be sliced up to frags. Regards, Zoli ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH V2 1/4] datapath: Move table destroy to dp-rcu callback.
On 04/23/2014 05:49 PM, Pravin B Shelar wrote: Ths simplifies flow-table-destroy API. This change is required for following patches. Signed-off-by: Pravin B Shelar Thanks for adding the comment. Acked-by: Thomas Graf ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH V2 2/4] datapath: Add flow mask cache.
On 04/23/2014 05:49 PM, Pravin B Shelar wrote: On every packet OVS needs to lookup flow-table with every mask until it finds a match. The packet flow-key is first masked with mask in the list and then the masked key is looked up in flow-table. Therefore number of masks can affect packet processing performance. Following patch adds mask index to mask cache from last pakcet lookup in same flow. Index of mask is stored in this cache. This cache is searched by 5 tuple hash (skb rxhash). Signed-off-by: Pravin B Shelar This looks excellent now, thanks! Acked-by: Thomas Graf ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH V2 3/4] datapath: Convert mask list in mask array.
On 04/23/2014 05:49 PM, Pravin B Shelar wrote: mask caches index of mask in mask_list. On packet recv OVS need to traverse mask-list to get cached mask. Therefore array is better for retrieving cached mask. This also allows better cache replacement algorithm by directly checking mask's existence. Signed-off-by: Pravin B Shelar LGTM Acked-by: Thomas Graf + if (old) + call_rcu(&old->rcu, mask_array_rcu_cb); At some point we should simply backport rcupdate.h to make kfree_rcu() available and get rid of all these wrappers. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH V2 4/4] datapath: Compact mask list array.
On 04/23/2014 05:49 PM, Pravin B Shelar wrote: Along with flow-table rehashing OVS can compact masks array. This allows us to calculate highest index for mask array. Signed-off-by: Pravin B Shelar Looks great in general, see comment below. --- datapath/flow_table.c | 24 datapath/flow_table.h |2 +- 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/datapath/flow_table.c b/datapath/flow_table.c index c8bd9d1..ea4100f 100644 --- a/datapath/flow_table.c +++ b/datapath/flow_table.c @@ -230,6 +230,7 @@ static struct mask_array *tbl_mask_array_alloc(int size) new->count = 0; new->max = size; + new->hi_index = 0; return new; } @@ -252,6 +253,8 @@ static int tbl_mask_array_realloc(struct flow_table *tbl, int size) new->masks[new->count++] = old->masks[i]; } } + + new->hi_index = new->count; rcu_assign_pointer(tbl->mask_array, new); if (old) @@ -260,6 +263,17 @@ static int tbl_mask_array_realloc(struct flow_table *tbl, int size) return 0; } +static void tbl_mask_array_compact(struct flow_table *tbl) +{ + struct mask_array *ma; + int size; + + ma = ovsl_dereference(tbl->mask_array); + + size = roundup(ma->count, MASK_ARRAY_SIZE_MIN); + tbl_mask_array_realloc(tbl, size); I think you have to clear the masked cache as well. There might still be cached masks pointing to the now out of bounds array if you compact after having removed flow masks, right? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ofproto-dpif: Improve code clarity and comments on recirc changes to rule_dpif_lookup
Pushed. Thanks for the review. On Wed, Apr 23, 2014 at 2:30 PM, Ethan Jackson wrote: > Acked-by: Ethan Jackson > > > On Wed, Apr 23, 2014 at 1:26 PM, Andy Zhou wrote: >> This patch improves the code readability and comments on the >> recirculation related changes to rule_dpif_lookup() base on off-line >> discussions with Jarno. There is no behavior changes. >> >> Signed-off-by: Andy Zhou >> --- >> ofproto/ofproto-dpif.c | 21 ++--- >> 1 file changed, 14 insertions(+), 7 deletions(-) >> >> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c >> index 6a725e4..983cad5 100644 >> --- a/ofproto/ofproto-dpif.c >> +++ b/ofproto/ofproto-dpif.c >> @@ -3199,14 +3199,21 @@ rule_dpif_lookup(struct ofproto_dpif *ofproto, >> struct flow *flow, >> uint8_t table_id; >> >> if (ofproto_dpif_get_enable_recirc(ofproto)) { >> -if (flow->recirc_id == 0) { >> -if (wc) { >> -wc->masks.recirc_id = UINT32_MAX; >> -} >> -table_id = 0; >> -} else { >> -table_id = TBL_INTERNAL; >> +/* Always exactly match recirc_id since datapath supports >> + * recirculation. */ >> +if (wc) { >> +wc->masks.recirc_id = UINT32_MAX; >> } >> + >> +/* Start looking up from internal table for post recirculation flows >> + * or packets. We can also simply send all, including normal flows >> + * or packets to the internal table. They will not match any post >> + * recirculation rules except the 'catch all' rule that resubmit >> + * them to table 0. >> + * >> + * As an optimization, we send normal flows and packets to table 0 >> + * directly, saving one table lookup. */ >> +table_id = flow->recirc_id ? TBL_INTERNAL : 0; >> } else { >> table_id = 0; >> } >> -- >> 1.9.1 >> >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv2] revalidator: Prevent handling the same flow twice.
On 24 April 2014 04:56, Alex Wang wrote: > > On Tue, Apr 22, 2014 at 10:24 PM, Alex Wang wrote: > >> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c >>> index 416cfdc..906bf17 100644 >>> --- a/ofproto/ofproto-dpif-upcall.c >>> +++ b/ofproto/ofproto-dpif-upcall.c >>> @@ -45,6 +45,8 @@ >>> >>> VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall); >>> >>> +COVERAGE_DEFINE(upcall_duplicate_flow); >>> + >>> /* A thread that reads upcalls from dpif, forwards each upcall's packet, >>> * and possibly sets up a kernel flow as a cache. */ >>> struct handler { >>> @@ -161,6 +163,7 @@ struct udpif_key { >>> long long int created; /* Estimation of creation time. */ >>> >>> bool mark; /* Used by mark and sweep GC >>> algorithm. */ >>> +bool flow_exists; /* Ensures flows are only deleted >>> once. */ >>> >>> >> >> Do we still need mark? I think the function of 'mark' and 'flow_exists' >> is overlapped. >> > Strictly speaking, no. But I think there are a couple of benefits: (1) If they are separate, then we can detect that the flow was already handled and chosen to be kept: * Revalidator finishes a round of revalidation. mark is set to false. flow_exists is true. * Consider when we dump a flow, revalidate it and choose to keep it. We set the mark to true. Flow_exists is true. * If we dump a duplicate of the flow, then "mark" will be true. This means that the flow was handled already. If it was just handled, there is unlikely to be any changes. We can skip execution. (2) If they are separate, then we know when the datapath has skipped a flow during flow_dump (if we've seen the flow before at all): * If a flow is dumped and we decide to delete it, "mark" and "flow_exists" will both be set to false. * If a flow is dumped and we decide to keep it, "mark" and "flow_exists" will both be set to true. * If the datapath doesn't dump a flow that does exist (due to some race condition), then "mark" will be false from previous revalidator_sweep() and "flow_exists" will be true. * Revalidator_sweep(): If a flow has "mark" set false and "flow_exists" set true, then we haven't seen the flow. Currently we delete such flows from the datapath, so that we can garbage collect the ukey and update the stats for that flow. Thanks for the review Alex. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 1/3] ofproto: RCU postpone rule destruction.
This allows rules to be used without taking references while RCU protected. The last step of destroying an ofproto also needs to be postponed, as the rule destruction requires the class structure to be available at the postponed destruction callback. Signed-off-by: Jarno Rajahalme --- ofproto/ofproto.c | 40 ++-- 1 file changed, 26 insertions(+), 14 deletions(-) diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c index f16005c..f10d3ae 100644 --- a/ofproto/ofproto.c +++ b/ofproto/ofproto.c @@ -259,7 +259,6 @@ struct ofport_usage { }; /* rule. */ -static void ofproto_rule_destroy__(struct rule *); static void ofproto_rule_send_removed(struct rule *, uint8_t reason); static bool rule_is_modifiable(const struct rule *rule, enum ofputil_flow_mod_flags flag); @@ -1372,7 +1371,8 @@ ofproto_destroy(struct ofproto *p) } p->ofproto_class->destruct(p); -ofproto_destroy__(p); +/* Destroying rules is deferred, must have 'ofproto' around for them. */ +ovsrcu_postpone(ofproto_destroy__, p); } /* Destroys the datapath with the respective 'name' and 'type'. With the Linux @@ -2642,6 +2642,23 @@ update_mtu(struct ofproto *p, struct ofport *port) } } +static void +ofproto_rule_destroy__(struct rule *rule) +OVS_NO_THREAD_SAFETY_ANALYSIS +{ +cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr)); +rule_actions_destroy(rule_get_actions(rule)); +ovs_mutex_destroy(&rule->mutex); +rule->ofproto->ofproto_class->rule_dealloc(rule); +} + +static void +rule_destroy_cb(struct rule *rule) +{ +rule->ofproto->ofproto_class->rule_destruct(rule); +ofproto_rule_destroy__(rule); +} + void ofproto_rule_ref(struct rule *rule) { @@ -2650,25 +2667,20 @@ ofproto_rule_ref(struct rule *rule) } } +/* Decrements 'rule''s ref_count and schedules 'rule' to be destroyed if the + * ref_count reaches 0. + * + * Use of RCU allows short term use (between RCU quiescent periods) without + * keeping a reference. A reference must be taken if the rule needs to + * stay around accross the RCU quiescent periods. */ void ofproto_rule_unref(struct rule *rule) { if (rule && ovs_refcount_unref(&rule->ref_count) == 1) { -rule->ofproto->ofproto_class->rule_destruct(rule); -ofproto_rule_destroy__(rule); +ovsrcu_postpone(rule_destroy_cb, rule); } } -static void -ofproto_rule_destroy__(struct rule *rule) -OVS_NO_THREAD_SAFETY_ANALYSIS -{ -cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr)); -rule_actions_destroy(rule_get_actions(rule)); -ovs_mutex_destroy(&rule->mutex); -rule->ofproto->ofproto_class->rule_dealloc(rule); -} - static uint32_t get_provider_meter_id(const struct ofproto *, uint32_t of_meter_id); -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 2/3] ofproto: Make taking rule reference conditional on lookup.
Prior to this paths the rule lookup functions have always taken a reference on the found rule before returning. Make this conditional, so that unnecessary refs/unrefs can be avoided in a later patch. Signed-off-by: Jarno Rajahalme --- ofproto/ofproto-dpif-xlate.c |8 +++--- ofproto/ofproto-dpif.c | 58 +- ofproto/ofproto-dpif.h | 20 --- 3 files changed, 62 insertions(+), 24 deletions(-) diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index 248382f..c62424a 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -2057,7 +2057,7 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t in_port, uint8_t table_id, !skip_wildcards ? &ctx->xout->wc : NULL, honor_table_miss, - &ctx->table_id, &rule); + &ctx->table_id, &rule, true); ctx->xin->flow.in_port.ofp_port = old_in_port; if (ctx->xin->resubmit_hook) { @@ -2090,7 +2090,7 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t in_port, uint8_t table_id, } choose_miss_rule(config, ctx->xbridge->miss_rule, - ctx->xbridge->no_packet_in_rule, &rule); + ctx->xbridge->no_packet_in_rule, &rule, true); match: if (rule) { @@ -2654,7 +2654,7 @@ xlate_learn_action(struct xlate_ctx *ctx, entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); entry->u.learn.ofproto = ctx->xin->ofproto; rule_dpif_lookup(ctx->xbridge->ofproto, &ctx->xin->flow, NULL, - &entry->u.learn.rule); + &entry->u.learn.rule, true); } } @@ -3263,7 +3263,7 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out *xout) if (!xin->ofpacts && !ctx.rule) { ctx.table_id = rule_dpif_lookup(ctx.xbridge->ofproto, flow, !xin->skip_wildcards ? wc : NULL, -&rule); +&rule, true); if (ctx.xin->resubmit_stats) { rule_dpif_credit_stats(rule, ctx.xin->resubmit_stats); } diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index 983cad5..5669cd1 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -3189,10 +3189,16 @@ rule_dpif_get_actions(const struct rule_dpif *rule) * * The return value will be zero unless there was a miss and * OFPTC11_TABLE_MISS_CONTINUE is in effect for the sequence of tables - * where misses occur. */ + * where misses occur. + * + * The rule is returned in '*rule', which is valid at least until the next + * RCU quiescent period. If the '*rule' needs to stay around longer, + * a non-zero 'take_ref' must be passed in to cause a reference to be taken + * on it before this returns. */ uint8_t rule_dpif_lookup(struct ofproto_dpif *ofproto, struct flow *flow, - struct flow_wildcards *wc, struct rule_dpif **rule) + struct flow_wildcards *wc, struct rule_dpif **rule, + bool take_ref) { enum rule_dpif_lookup_verdict verdict; enum ofputil_port_config config = 0; @@ -3219,7 +3225,7 @@ rule_dpif_lookup(struct ofproto_dpif *ofproto, struct flow *flow, } verdict = rule_dpif_lookup_from_table(ofproto, flow, wc, true, - &table_id, rule); + &table_id, rule, take_ref); switch (verdict) { case RULE_DPIF_LOOKUP_VERDICT_MATCH: @@ -3248,13 +3254,17 @@ rule_dpif_lookup(struct ofproto_dpif *ofproto, struct flow *flow, } choose_miss_rule(config, ofproto->miss_rule, - ofproto->no_packet_in_rule, rule); + ofproto->no_packet_in_rule, rule, take_ref); return table_id; } +/* The returned rule is valid at least until the next RCU quiescent period. + * If the '*rule' needs to stay around longer, a non-zero 'take_ref' must be + * passed in to cause a reference to be taken on it before this returns. */ static struct rule_dpif * rule_dpif_lookup_in_table(struct ofproto_dpif *ofproto, uint8_t table_id, - const struct flow *flow, struct flow_wildcards *wc) + const struct flow *flow, struct flow_wildcards *wc, + bool take_ref) { struct classifier *cls = &ofproto->up.tables[table_id].cls; const struct cls_rule *cls_rule; @@ -3288,7 +3298,9 @@ rule_dpif_lookup_in_table(struct ofproto_dpif *ofproto, uint8_t table_id, } rule = rule_dpif_cast(rule_from_cls_rule(cls_rule)); -rule_dpif_ref(rule); +if (take_ref) { +rule_dpif_ref(rule); +} fat_rwlock_unlock(&cls->r
[ovs-dev] [PATCH 3/3] ofproto: Reduce taking rule references.
Only take reference to a looked up rule when needed. This reduces the total CPU utilization of rule_ref/unref calls by 80%, from 5% of total server CPU capacity to 1% in a netperf TCP_CRR test stressing the userspace. Signed-off-by: Jarno Rajahalme --- ofproto/ofproto-dpif-xlate.c | 49 +++--- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index c62424a..6a56a15 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -1996,13 +1996,6 @@ xlate_recursively(struct xlate_ctx *ctx, struct rule_dpif *rule) if (ctx->xin->resubmit_stats) { rule_dpif_credit_stats(rule, ctx->xin->resubmit_stats); } -if (ctx->xin->xcache) { -struct xc_entry *entry; - -entry = xlate_cache_add_entry(ctx->xin->xcache, XC_RULE); -entry->u.rule = rule; -rule_dpif_ref(rule); -} ctx->resubmits++; ctx->recurse++; @@ -2057,7 +2050,8 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t in_port, uint8_t table_id, !skip_wildcards ? &ctx->xout->wc : NULL, honor_table_miss, - &ctx->table_id, &rule, true); + &ctx->table_id, &rule, + ctx->xin->xcache != NULL); ctx->xin->flow.in_port.ofp_port = old_in_port; if (ctx->xin->resubmit_hook) { @@ -2090,12 +2084,22 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t in_port, uint8_t table_id, } choose_miss_rule(config, ctx->xbridge->miss_rule, - ctx->xbridge->no_packet_in_rule, &rule, true); + ctx->xbridge->no_packet_in_rule, &rule, + ctx->xin->xcache != NULL); match: if (rule) { +/* Fill in the cache entry here instead of xlate_recursively + * to make the reference counting more explicit. We take a + * reference in the lookups above if we are going to cache the + * rule. */ +if (ctx->xin->xcache) { +struct xc_entry *entry; + +entry = xlate_cache_add_entry(ctx->xin->xcache, XC_RULE); +entry->u.rule = rule; +} xlate_recursively(ctx, rule); -rule_dpif_unref(rule); } ctx->table_id = old_table_id; @@ -2653,6 +2657,8 @@ xlate_learn_action(struct xlate_ctx *ctx, entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); entry->u.learn.ofproto = ctx->xin->ofproto; +/* Lookup the learned rule, taking a reference on it. The reference + * is released when this cache entry is deleted. */ rule_dpif_lookup(ctx->xbridge->ofproto, &ctx->xin->flow, NULL, &entry->u.learn.rule, true); } @@ -2678,10 +2684,11 @@ xlate_fin_timeout(struct xlate_ctx *ctx, struct xc_entry *entry; entry = xlate_cache_add_entry(ctx->xin->xcache, XC_FIN_TIMEOUT); +/* XC_RULE already holds a reference on the rule, none is taken + * here. */ entry->u.fin.rule = ctx->rule; entry->u.fin.idle = oft->fin_idle_timeout; entry->u.fin.hard = oft->fin_hard_timeout; -rule_dpif_ref(ctx->rule); } } } @@ -3230,7 +3237,7 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out *xout) ctx.xbridge = xbridge_lookup(xin->ofproto); if (!ctx.xbridge) { -goto out; +return; } ctx.rule = xin->rule; @@ -3263,7 +3270,7 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out *xout) if (!xin->ofpacts && !ctx.rule) { ctx.table_id = rule_dpif_lookup(ctx.xbridge->ofproto, flow, !xin->skip_wildcards ? wc : NULL, -&rule, true); +&rule, ctx.xin->xcache != NULL); if (ctx.xin->resubmit_stats) { rule_dpif_credit_stats(rule, ctx.xin->resubmit_stats); } @@ -3271,7 +3278,6 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out *xout) struct xc_entry *entry; entry = xlate_cache_add_entry(ctx.xin->xcache, XC_RULE); -rule_dpif_ref(rule); entry->u.rule = rule; } ctx.rule = rule; @@ -3309,7 +3315,7 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out *xout) break; case OFPC_FRAG_DROP: -goto out; +return; case OFPC_FRAG_REASM: OVS_NOT_REACHED(); @@ -3456,9 +3462,6 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out *xout) wc->masks.tp_src &= htons(UINT8_MAX)
Re: [ovs-dev] [PATCH 1/3] ofproto: RCU postpone rule destruction.
Acked-by: Ethan Jackson On Wed, Apr 23, 2014 at 4:20 PM, Jarno Rajahalme wrote: > This allows rules to be used without taking references while RCU > protected. > > The last step of destroying an ofproto also needs to be postponed, as > the rule destruction requires the class structure to be available at > the postponed destruction callback. > > Signed-off-by: Jarno Rajahalme > --- > ofproto/ofproto.c | 40 ++-- > 1 file changed, 26 insertions(+), 14 deletions(-) > > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c > index f16005c..f10d3ae 100644 > --- a/ofproto/ofproto.c > +++ b/ofproto/ofproto.c > @@ -259,7 +259,6 @@ struct ofport_usage { > }; > > /* rule. */ > -static void ofproto_rule_destroy__(struct rule *); > static void ofproto_rule_send_removed(struct rule *, uint8_t reason); > static bool rule_is_modifiable(const struct rule *rule, > enum ofputil_flow_mod_flags flag); > @@ -1372,7 +1371,8 @@ ofproto_destroy(struct ofproto *p) > } > > p->ofproto_class->destruct(p); > -ofproto_destroy__(p); > +/* Destroying rules is deferred, must have 'ofproto' around for them. */ > +ovsrcu_postpone(ofproto_destroy__, p); > } > > /* Destroys the datapath with the respective 'name' and 'type'. With the > Linux > @@ -2642,6 +2642,23 @@ update_mtu(struct ofproto *p, struct ofport *port) > } > } > > +static void > +ofproto_rule_destroy__(struct rule *rule) > +OVS_NO_THREAD_SAFETY_ANALYSIS > +{ > +cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr)); > +rule_actions_destroy(rule_get_actions(rule)); > +ovs_mutex_destroy(&rule->mutex); > +rule->ofproto->ofproto_class->rule_dealloc(rule); > +} > + > +static void > +rule_destroy_cb(struct rule *rule) > +{ > +rule->ofproto->ofproto_class->rule_destruct(rule); > +ofproto_rule_destroy__(rule); > +} > + > void > ofproto_rule_ref(struct rule *rule) > { > @@ -2650,25 +2667,20 @@ ofproto_rule_ref(struct rule *rule) > } > } > > +/* Decrements 'rule''s ref_count and schedules 'rule' to be destroyed if the > + * ref_count reaches 0. > + * > + * Use of RCU allows short term use (between RCU quiescent periods) without > + * keeping a reference. A reference must be taken if the rule needs to > + * stay around accross the RCU quiescent periods. */ > void > ofproto_rule_unref(struct rule *rule) > { > if (rule && ovs_refcount_unref(&rule->ref_count) == 1) { > -rule->ofproto->ofproto_class->rule_destruct(rule); > -ofproto_rule_destroy__(rule); > +ovsrcu_postpone(rule_destroy_cb, rule); > } > } > > -static void > -ofproto_rule_destroy__(struct rule *rule) > -OVS_NO_THREAD_SAFETY_ANALYSIS > -{ > -cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr)); > -rule_actions_destroy(rule_get_actions(rule)); > -ovs_mutex_destroy(&rule->mutex); > -rule->ofproto->ofproto_class->rule_dealloc(rule); > -} > - > static uint32_t get_provider_meter_id(const struct ofproto *, >uint32_t of_meter_id); > > -- > 1.7.10.4 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/3] ofproto: Make taking rule reference conditional on lookup.
Traditionally we've put function comments just in the .c file, not in the .c and .h file as you've done for rule_dpif_lookup(). That said, I don't know where that convention came from and don't care that much . . . Acked-by: Ethan Jackson On Wed, Apr 23, 2014 at 4:20 PM, Jarno Rajahalme wrote: > Prior to this paths the rule lookup functions have always taken a > reference on the found rule before returning. Make this conditional, > so that unnecessary refs/unrefs can be avoided in a later patch. > > Signed-off-by: Jarno Rajahalme > --- > ofproto/ofproto-dpif-xlate.c |8 +++--- > ofproto/ofproto-dpif.c | 58 > +- > ofproto/ofproto-dpif.h | 20 --- > 3 files changed, 62 insertions(+), 24 deletions(-) > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index 248382f..c62424a 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -2057,7 +2057,7 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t > in_port, uint8_t table_id, >!skip_wildcards >? &ctx->xout->wc : NULL, >honor_table_miss, > - &ctx->table_id, &rule); > + &ctx->table_id, &rule, true); > ctx->xin->flow.in_port.ofp_port = old_in_port; > > if (ctx->xin->resubmit_hook) { > @@ -2090,7 +2090,7 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t > in_port, uint8_t table_id, > } > > choose_miss_rule(config, ctx->xbridge->miss_rule, > - ctx->xbridge->no_packet_in_rule, &rule); > + ctx->xbridge->no_packet_in_rule, &rule, true); > > match: > if (rule) { > @@ -2654,7 +2654,7 @@ xlate_learn_action(struct xlate_ctx *ctx, > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); > entry->u.learn.ofproto = ctx->xin->ofproto; > rule_dpif_lookup(ctx->xbridge->ofproto, &ctx->xin->flow, NULL, > - &entry->u.learn.rule); > + &entry->u.learn.rule, true); > } > } > > @@ -3263,7 +3263,7 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out > *xout) > if (!xin->ofpacts && !ctx.rule) { > ctx.table_id = rule_dpif_lookup(ctx.xbridge->ofproto, flow, > !xin->skip_wildcards ? wc : NULL, > -&rule); > +&rule, true); > if (ctx.xin->resubmit_stats) { > rule_dpif_credit_stats(rule, ctx.xin->resubmit_stats); > } > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c > index 983cad5..5669cd1 100644 > --- a/ofproto/ofproto-dpif.c > +++ b/ofproto/ofproto-dpif.c > @@ -3189,10 +3189,16 @@ rule_dpif_get_actions(const struct rule_dpif *rule) > * > * The return value will be zero unless there was a miss and > * OFPTC11_TABLE_MISS_CONTINUE is in effect for the sequence of tables > - * where misses occur. */ > + * where misses occur. > + * > + * The rule is returned in '*rule', which is valid at least until the next > + * RCU quiescent period. If the '*rule' needs to stay around longer, > + * a non-zero 'take_ref' must be passed in to cause a reference to be taken > + * on it before this returns. */ > uint8_t > rule_dpif_lookup(struct ofproto_dpif *ofproto, struct flow *flow, > - struct flow_wildcards *wc, struct rule_dpif **rule) > + struct flow_wildcards *wc, struct rule_dpif **rule, > + bool take_ref) > { > enum rule_dpif_lookup_verdict verdict; > enum ofputil_port_config config = 0; > @@ -3219,7 +3225,7 @@ rule_dpif_lookup(struct ofproto_dpif *ofproto, struct > flow *flow, > } > > verdict = rule_dpif_lookup_from_table(ofproto, flow, wc, true, > - &table_id, rule); > + &table_id, rule, take_ref); > > switch (verdict) { > case RULE_DPIF_LOOKUP_VERDICT_MATCH: > @@ -3248,13 +3254,17 @@ rule_dpif_lookup(struct ofproto_dpif *ofproto, struct > flow *flow, > } > > choose_miss_rule(config, ofproto->miss_rule, > - ofproto->no_packet_in_rule, rule); > + ofproto->no_packet_in_rule, rule, take_ref); > return table_id; > } > > +/* The returned rule is valid at least until the next RCU quiescent period. > + * If the '*rule' needs to stay around longer, a non-zero 'take_ref' must be > + * passed in to cause a reference to be taken on it before this returns. */ > static struct rule_dpif * > rule_dpif_lookup_in_table(struct ofproto_dpif *ofproto, uint8_t table_id, > - const struct flow *flow, struct flow_wildcards *wc) > +
Re: [ovs-dev] [PATCH 3/3] ofproto: Reduce taking rule references.
LGTM Acked-by: Ethan Jackson On Wed, Apr 23, 2014 at 4:20 PM, Jarno Rajahalme wrote: > Only take reference to a looked up rule when needed. > > This reduces the total CPU utilization of rule_ref/unref calls by 80%, > from 5% of total server CPU capacity to 1% in a netperf TCP_CRR > test stressing the userspace. > > Signed-off-by: Jarno Rajahalme > --- > ofproto/ofproto-dpif-xlate.c | 49 > +++--- > 1 file changed, 27 insertions(+), 22 deletions(-) > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index c62424a..6a56a15 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -1996,13 +1996,6 @@ xlate_recursively(struct xlate_ctx *ctx, struct > rule_dpif *rule) > if (ctx->xin->resubmit_stats) { > rule_dpif_credit_stats(rule, ctx->xin->resubmit_stats); > } > -if (ctx->xin->xcache) { > -struct xc_entry *entry; > - > -entry = xlate_cache_add_entry(ctx->xin->xcache, XC_RULE); > -entry->u.rule = rule; > -rule_dpif_ref(rule); > -} > > ctx->resubmits++; > ctx->recurse++; > @@ -2057,7 +2050,8 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t > in_port, uint8_t table_id, >!skip_wildcards >? &ctx->xout->wc : NULL, >honor_table_miss, > - &ctx->table_id, &rule, true); > + &ctx->table_id, &rule, > + ctx->xin->xcache != NULL); > ctx->xin->flow.in_port.ofp_port = old_in_port; > > if (ctx->xin->resubmit_hook) { > @@ -2090,12 +2084,22 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t > in_port, uint8_t table_id, > } > > choose_miss_rule(config, ctx->xbridge->miss_rule, > - ctx->xbridge->no_packet_in_rule, &rule, true); > + ctx->xbridge->no_packet_in_rule, &rule, > + ctx->xin->xcache != NULL); > > match: > if (rule) { > +/* Fill in the cache entry here instead of xlate_recursively > + * to make the reference counting more explicit. We take a > + * reference in the lookups above if we are going to cache the > + * rule. */ > +if (ctx->xin->xcache) { > +struct xc_entry *entry; > + > +entry = xlate_cache_add_entry(ctx->xin->xcache, XC_RULE); > +entry->u.rule = rule; > +} > xlate_recursively(ctx, rule); > -rule_dpif_unref(rule); > } > > ctx->table_id = old_table_id; > @@ -2653,6 +2657,8 @@ xlate_learn_action(struct xlate_ctx *ctx, > > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); > entry->u.learn.ofproto = ctx->xin->ofproto; > +/* Lookup the learned rule, taking a reference on it. The reference > + * is released when this cache entry is deleted. */ > rule_dpif_lookup(ctx->xbridge->ofproto, &ctx->xin->flow, NULL, > &entry->u.learn.rule, true); > } > @@ -2678,10 +2684,11 @@ xlate_fin_timeout(struct xlate_ctx *ctx, > struct xc_entry *entry; > > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_FIN_TIMEOUT); > +/* XC_RULE already holds a reference on the rule, none is taken > + * here. */ > entry->u.fin.rule = ctx->rule; > entry->u.fin.idle = oft->fin_idle_timeout; > entry->u.fin.hard = oft->fin_hard_timeout; > -rule_dpif_ref(ctx->rule); > } > } > } > @@ -3230,7 +3237,7 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out > *xout) > > ctx.xbridge = xbridge_lookup(xin->ofproto); > if (!ctx.xbridge) { > -goto out; > +return; > } > > ctx.rule = xin->rule; > @@ -3263,7 +3270,7 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out > *xout) > if (!xin->ofpacts && !ctx.rule) { > ctx.table_id = rule_dpif_lookup(ctx.xbridge->ofproto, flow, > !xin->skip_wildcards ? wc : NULL, > -&rule, true); > +&rule, ctx.xin->xcache != NULL); > if (ctx.xin->resubmit_stats) { > rule_dpif_credit_stats(rule, ctx.xin->resubmit_stats); > } > @@ -3271,7 +3278,6 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out > *xout) > struct xc_entry *entry; > > entry = xlate_cache_add_entry(ctx.xin->xcache, XC_RULE); > -rule_dpif_ref(rule); > entry->u.rule = rule; > } > ctx.rule = rule; > @@ -3309,7 +3315,7 @@ xlate_actions__(struct xlate_in *xin, struct x
[ovs-dev] [PATCH] bridge: When ports disappear from a datapath, add them back.
Before commit 2a73b1d73d4bdb (bridge: Reconfigure in single pass.), if a port disappeared, for one reason or another, from a datapath, the next bridge reconfiguration pass would notice and, if the port was still configured in the database, add the port back to the datapath. That commit, however, removed the logic from bridge_refresh_ofp_port() that did that and failed to add the same logic to the replacement function bridge_delete_or_reconfigure_ports(). This commit fixes the problem. To see this problem on a Linux kernel system: ovs-vsctl add-br br0 # 1 tunctl -t tap# 2 ovs-vsctl add-port br0 tap # 3 ovs-dpctl show # 4 tunctl -d tap# 5 ovs-dpctl show # 6 tunctl -t tap# 7 ovs-vsctl del-port tap -- add-port br0 tap # 8 ovs-dpctl show # 9 Steps 1-4 create a bridge and a tap and add it to the bridge and demonstrate that the tap is part of the datapath. Step 5 and 6 delete the tap and demonstrate that it has therefore disappeared from the datapath. Step 7 recreates a tap with the same name, and step 8 forces ovs-vswitchd to reconfigure. Step 9 shows the effect of the fix: without the fix, the new tap is not added back to the datapath; with this fix, it is. Special thanks to Gurucharan Shetty for finding a simple reproduction case and then bisecting to find the commit that introduced the problem. Bug #1238467. Reported-by: Ronald Lee Signed-off-by: Ben Pfaff --- vswitchd/bridge.c | 44 +--- 1 file changed, 41 insertions(+), 3 deletions(-) diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c index f46d002..45a1491 100644 --- a/vswitchd/bridge.c +++ b/vswitchd/bridge.c @@ -252,6 +252,7 @@ static bool iface_is_internal(const struct ovsrec_interface *iface, static const char *iface_get_type(const struct ovsrec_interface *, const struct ovsrec_bridge *); static void iface_destroy(struct iface *); +static void iface_destroy__(struct iface *); static struct iface *iface_lookup(const struct bridge *, const char *name); static struct iface *iface_find(const char *name); static struct iface *iface_from_ofp_port(const struct bridge *, @@ -667,6 +668,9 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) struct ofproto_port ofproto_port; struct ofproto_port_dump dump; +struct sset ofproto_ports; +struct port *port, *port_next; + /* List of "ofp_port"s to delete. We make a list instead of deleting them * right away because ofproto implementations aren't necessarily able to * iterate through a changing list of ports in an entirely robust way. */ @@ -676,11 +680,21 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) del = NULL; n = allocated = 0; +sset_init(&ofproto_ports); +/* Main task: Iterate over the ports in 'br->ofproto' and remove the ports + * that are not configured in the database. (This commonly happens when + * ports have been deleted, e.g. with "ovs-vsctl del-port".) + * + * Side tasks: Reconfigure the ports that are still in 'br'. Delete ports + * that have the wrong OpenFlow port number (and arrange to add them back + * with the correct OpenFlow port number). */ OFPROTO_PORT_FOR_EACH (&ofproto_port, &dump, br->ofproto) { ofp_port_t requested_ofp_port; struct iface *iface; +sset_add(&ofproto_ports, ofproto_port.name); + iface = iface_lookup(br, ofproto_port.name); if (!iface) { /* No such iface is configured, so we should delete this @@ -746,11 +760,37 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) iface_destroy(iface); del = add_ofp_port(ofproto_port.ofp_port, del, &n, &allocated); } - for (i = 0; i < n; i++) { ofproto_port_del(br->ofproto, del[i]); } free(del); + +/* Iterate over this module's idea of interfaces in 'br'. Remove any ports + * that we didn't see when we iterated through the datapath, i.e. ports + * that disappeared underneath use. This is an unusual situation, but it + * can happen in some cases: + * + * - An admin runs a command like "ovs-dpctl del-port" (which is a bad + * idea but could happen). + * + * - The port represented a device that disappeared, e.g. a tuntap + * device destroyed via "tunctl -d", a physical Ethernet device + * whose module was just unloaded via "rmmod", or a virtual NIC for a + * VM whose VM was just terminated. */ +HMAP_FOR_EACH_SAFE (port, port_next, hmap_node, &br->ports) { +struct iface *iface, *iface_next; + +LIST_FOR_EACH_SAFE (iface, iface_next, port_elem, &port->ifaces) { +if (!sse
Re: [ovs-dev] [PATCH] run-ryu: Use unix socket rather than patch ports
On Wed, Apr 23, 2014 at 09:36:30AM -0700, Ben Pfaff wrote: > On Wed, Apr 23, 2014 at 02:06:12PM +0900, Simon Horman wrote: > > My understanding of the implementation of patch ports is that they > > are rather special, being handled as a special case inside > > compose_output_action__() and do not exist in the datapath. > > > > A side effect of the implementation of patch ports (though perhaps not the > > portion mentioned above) is that the OFPUTIL_PC_PORT_DOWN bit may not be > > set via a port mod message. In particular, the call to > > netdev_turn_flags_on() in update_port_config() fails. > > > > There is a test provided by Ryu that test this via port mod and thus fails. > > > > While that test could be modified or the results ignored it seems to me > > that it would be best if ryu-check used ports which were more fully > > featured and not special cases. > > > > Thus this patch moves run-ryu to use unix socket backed ports rather than > > patch ports. > > > > Signed-off-by: Simon Horman > > Applied, thanks. > > The following seemed good rationale so I folded it into the commit message: Thanks! ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv2] revalidator: Prevent handling the same flow twice.
Pushed to master and branch-2.1. On 24 April 2014 11:06, Joe Stringer wrote: > On 24 April 2014 04:56, Alex Wang wrote: > >> >> On Tue, Apr 22, 2014 at 10:24 PM, Alex Wang wrote: >> >>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 416cfdc..906bf17 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -45,6 +45,8 @@ VLOG_DEFINE_THIS_MODULE(ofproto_dpif_upcall); +COVERAGE_DEFINE(upcall_duplicate_flow); + /* A thread that reads upcalls from dpif, forwards each upcall's packet, * and possibly sets up a kernel flow as a cache. */ struct handler { @@ -161,6 +163,7 @@ struct udpif_key { long long int created; /* Estimation of creation time. */ bool mark; /* Used by mark and sweep GC algorithm. */ +bool flow_exists; /* Ensures flows are only deleted once. */ >>> >>> Do we still need mark? I think the function of 'mark' and 'flow_exists' >>> is overlapped. >>> >> > Strictly speaking, no. But I think there are a couple of benefits: > > (1) If they are separate, then we can detect that the flow was already > handled and chosen to be kept: > * Revalidator finishes a round of revalidation. mark is set to false. > flow_exists is true. > * Consider when we dump a flow, revalidate it and choose to keep it. We > set the mark to true. Flow_exists is true. > * If we dump a duplicate of the flow, then "mark" will be true. This means > that the flow was handled already. If it was just handled, there is > unlikely to be any changes. We can skip execution. > > (2) If they are separate, then we know when the datapath has skipped a > flow during flow_dump (if we've seen the flow before at all): > * If a flow is dumped and we decide to delete it, "mark" and "flow_exists" > will both be set to false. > * If a flow is dumped and we decide to keep it, "mark" and "flow_exists" > will both be set to true. > * If the datapath doesn't dump a flow that does exist (due to some race > condition), then "mark" will be false from previous revalidator_sweep() and > "flow_exists" will be true. > * Revalidator_sweep(): If a flow has "mark" set false and "flow_exists" > set true, then we haven't seen the flow. Currently we delete such flows > from the datapath, so that we can garbage collect the ukey and update the > stats for that flow. > > Thanks for the review Alex. > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] bridge: When ports disappear from a datapath, add them back.
Thanks for taking care of this. Acked-by: Ethan Jackson On Wed, Apr 23, 2014 at 5:06 PM, Ben Pfaff wrote: > Before commit 2a73b1d73d4bdb (bridge: Reconfigure in single pass.), if a > port disappeared, for one reason or another, from a datapath, the next > bridge reconfiguration pass would notice and, if the port was still > configured in the database, add the port back to the datapath. That > commit, however, removed the logic from bridge_refresh_ofp_port() that > did that and failed to add the same logic to the replacement function > bridge_delete_or_reconfigure_ports(). This commit fixes the problem. > > To see this problem on a Linux kernel system: > > ovs-vsctl add-br br0 # 1 > tunctl -t tap# 2 > ovs-vsctl add-port br0 tap # 3 > ovs-dpctl show # 4 > tunctl -d tap# 5 > ovs-dpctl show # 6 > tunctl -t tap# 7 > ovs-vsctl del-port tap -- add-port br0 tap # 8 > ovs-dpctl show # 9 > > Steps 1-4 create a bridge and a tap and add it to the bridge and > demonstrate that the tap is part of the datapath. Step 5 and 6 delete > the tap and demonstrate that it has therefore disappeared from the > datapath. Step 7 recreates a tap with the same name, and step 8 > forces ovs-vswitchd to reconfigure. Step 9 shows the effect of the > fix: without the fix, the new tap is not added back to the datapath; > with this fix, it is. > > Special thanks to Gurucharan Shetty for finding a > simple reproduction case and then bisecting to find the commit that > introduced the problem. > > Bug #1238467. > Reported-by: Ronald Lee > Signed-off-by: Ben Pfaff > --- > vswitchd/bridge.c | 44 +--- > 1 file changed, 41 insertions(+), 3 deletions(-) > > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c > index f46d002..45a1491 100644 > --- a/vswitchd/bridge.c > +++ b/vswitchd/bridge.c > @@ -252,6 +252,7 @@ static bool iface_is_internal(const struct > ovsrec_interface *iface, > static const char *iface_get_type(const struct ovsrec_interface *, >const struct ovsrec_bridge *); > static void iface_destroy(struct iface *); > +static void iface_destroy__(struct iface *); > static struct iface *iface_lookup(const struct bridge *, const char *name); > static struct iface *iface_find(const char *name); > static struct iface *iface_from_ofp_port(const struct bridge *, > @@ -667,6 +668,9 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) > struct ofproto_port ofproto_port; > struct ofproto_port_dump dump; > > +struct sset ofproto_ports; > +struct port *port, *port_next; > + > /* List of "ofp_port"s to delete. We make a list instead of deleting > them > * right away because ofproto implementations aren't necessarily able to > * iterate through a changing list of ports in an entirely robust way. */ > @@ -676,11 +680,21 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) > > del = NULL; > n = allocated = 0; > +sset_init(&ofproto_ports); > > +/* Main task: Iterate over the ports in 'br->ofproto' and remove the > ports > + * that are not configured in the database. (This commonly happens when > + * ports have been deleted, e.g. with "ovs-vsctl del-port".) > + * > + * Side tasks: Reconfigure the ports that are still in 'br'. Delete > ports > + * that have the wrong OpenFlow port number (and arrange to add them back > + * with the correct OpenFlow port number). */ > OFPROTO_PORT_FOR_EACH (&ofproto_port, &dump, br->ofproto) { > ofp_port_t requested_ofp_port; > struct iface *iface; > > +sset_add(&ofproto_ports, ofproto_port.name); > + > iface = iface_lookup(br, ofproto_port.name); > if (!iface) { > /* No such iface is configured, so we should delete this > @@ -746,11 +760,37 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) > iface_destroy(iface); > del = add_ofp_port(ofproto_port.ofp_port, del, &n, &allocated); > } > - > for (i = 0; i < n; i++) { > ofproto_port_del(br->ofproto, del[i]); > } > free(del); > + > +/* Iterate over this module's idea of interfaces in 'br'. Remove any > ports > + * that we didn't see when we iterated through the datapath, i.e. ports > + * that disappeared underneath use. This is an unusual situation, but it > + * can happen in some cases: > + * > + * - An admin runs a command like "ovs-dpctl del-port" (which is a > bad > + * idea but could happen). > + * > + * - The port represented a device that disappeared, e.g. a tuntap > + * device destroyed via "tunctl -d", a physical Ethernet device > + * whos
Re: [ovs-dev] [PATCH] bridge: When ports disappear from a datapath, add them back.
Thanks, I applied this to master and branch-2.1. On Wed, Apr 23, 2014 at 05:27:02PM -0700, Ethan Jackson wrote: > Thanks for taking care of this. > > Acked-by: Ethan Jackson > > > On Wed, Apr 23, 2014 at 5:06 PM, Ben Pfaff wrote: > > Before commit 2a73b1d73d4bdb (bridge: Reconfigure in single pass.), if a > > port disappeared, for one reason or another, from a datapath, the next > > bridge reconfiguration pass would notice and, if the port was still > > configured in the database, add the port back to the datapath. That > > commit, however, removed the logic from bridge_refresh_ofp_port() that > > did that and failed to add the same logic to the replacement function > > bridge_delete_or_reconfigure_ports(). This commit fixes the problem. > > > > To see this problem on a Linux kernel system: > > > > ovs-vsctl add-br br0 # 1 > > tunctl -t tap# 2 > > ovs-vsctl add-port br0 tap # 3 > > ovs-dpctl show # 4 > > tunctl -d tap# 5 > > ovs-dpctl show # 6 > > tunctl -t tap# 7 > > ovs-vsctl del-port tap -- add-port br0 tap # 8 > > ovs-dpctl show # 9 > > > > Steps 1-4 create a bridge and a tap and add it to the bridge and > > demonstrate that the tap is part of the datapath. Step 5 and 6 delete > > the tap and demonstrate that it has therefore disappeared from the > > datapath. Step 7 recreates a tap with the same name, and step 8 > > forces ovs-vswitchd to reconfigure. Step 9 shows the effect of the > > fix: without the fix, the new tap is not added back to the datapath; > > with this fix, it is. > > > > Special thanks to Gurucharan Shetty for finding a > > simple reproduction case and then bisecting to find the commit that > > introduced the problem. > > > > Bug #1238467. > > Reported-by: Ronald Lee > > Signed-off-by: Ben Pfaff > > --- > > vswitchd/bridge.c | 44 +--- > > 1 file changed, 41 insertions(+), 3 deletions(-) > > > > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c > > index f46d002..45a1491 100644 > > --- a/vswitchd/bridge.c > > +++ b/vswitchd/bridge.c > > @@ -252,6 +252,7 @@ static bool iface_is_internal(const struct > > ovsrec_interface *iface, > > static const char *iface_get_type(const struct ovsrec_interface *, > >const struct ovsrec_bridge *); > > static void iface_destroy(struct iface *); > > +static void iface_destroy__(struct iface *); > > static struct iface *iface_lookup(const struct bridge *, const char *name); > > static struct iface *iface_find(const char *name); > > static struct iface *iface_from_ofp_port(const struct bridge *, > > @@ -667,6 +668,9 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) > > struct ofproto_port ofproto_port; > > struct ofproto_port_dump dump; > > > > +struct sset ofproto_ports; > > +struct port *port, *port_next; > > + > > /* List of "ofp_port"s to delete. We make a list instead of deleting > > them > > * right away because ofproto implementations aren't necessarily able > > to > > * iterate through a changing list of ports in an entirely robust way. > > */ > > @@ -676,11 +680,21 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) > > > > del = NULL; > > n = allocated = 0; > > +sset_init(&ofproto_ports); > > > > +/* Main task: Iterate over the ports in 'br->ofproto' and remove the > > ports > > + * that are not configured in the database. (This commonly happens > > when > > + * ports have been deleted, e.g. with "ovs-vsctl del-port".) > > + * > > + * Side tasks: Reconfigure the ports that are still in 'br'. Delete > > ports > > + * that have the wrong OpenFlow port number (and arrange to add them > > back > > + * with the correct OpenFlow port number). */ > > OFPROTO_PORT_FOR_EACH (&ofproto_port, &dump, br->ofproto) { > > ofp_port_t requested_ofp_port; > > struct iface *iface; > > > > +sset_add(&ofproto_ports, ofproto_port.name); > > + > > iface = iface_lookup(br, ofproto_port.name); > > if (!iface) { > > /* No such iface is configured, so we should delete this > > @@ -746,11 +760,37 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) > > iface_destroy(iface); > > del = add_ofp_port(ofproto_port.ofp_port, del, &n, &allocated); > > } > > - > > for (i = 0; i < n; i++) { > > ofproto_port_del(br->ofproto, del[i]); > > } > > free(del); > > + > > +/* Iterate over this module's idea of interfaces in 'br'. Remove any > > ports > > + * that we didn't see when we iterated through the datapath, i.e. ports > > + * that disappeared underneath use. This is an unusual situation, but > > it > >
Re: [ovs-dev] [PATCHv13] ofproto-dpif-upcall: Remove the flow_dumper thread.
Thanks, I tidied up the comments, added threadsafety annotations for xcache and pushed. On 24 April 2014 06:38, Ben Pfaff wrote: > On Tue, Apr 22, 2014 at 04:54:24PM +1200, Joe Stringer wrote: > > From: Ethan Jackson > > > > Previously, we had a separate flow_dumper thread that fetched flows from > > the datapath to distribute to revalidator threads. This patch takes the > > logic for dumping and pushes it into the revalidator threads, resulting > > in simpler code with similar performance to the current code. > > > > One thread, the "leader", is responsible for beginning and ending each > > flow dump, maintaining the flow_limit, and checking whether the > > revalidator threads need to exit. All revalidator threads dump, > > revalidate, delete datapath flows and garbage collect ukeys. > > > > Co-authored-by: Joe Stringer > > Signed-off-by: Joe Stringer > > Here in the definition of struct udpif, I would mention that there are > n_revalidators member of 'ukeys'. Otherwise the casual reader might > guess that there was only one: > > +/* During the flow dump phase, revalidators insert into these with > a random > > + * distribution. During the garbage collection phase, each > revalidator > > + * takes care of garbage collecting one of these hmaps. */ > > +struct { > > +struct ovs_mutex mutex;/* Guards the following. */ > > +struct hmap hmap OVS_GUARDED; /* Datapath flow keys. */ > > +} *ukeys; > > In the definition of struct udpif_key, the synchronization of most of > the members is pretty clear, except for 'xcache'. > > Acked-by: Ben Pfaff > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv13] ofproto-dpif-upcall: Remove the flow_dumper thread.
W00t, nice to see this in. Ethan On Wed, Apr 23, 2014 at 9:32 PM, Joe Stringer wrote: > Thanks, I tidied up the comments, added threadsafety annotations for xcache > and pushed. > > > On 24 April 2014 06:38, Ben Pfaff wrote: >> >> On Tue, Apr 22, 2014 at 04:54:24PM +1200, Joe Stringer wrote: >> > From: Ethan Jackson >> > >> > Previously, we had a separate flow_dumper thread that fetched flows from >> > the datapath to distribute to revalidator threads. This patch takes the >> > logic for dumping and pushes it into the revalidator threads, resulting >> > in simpler code with similar performance to the current code. >> > >> > One thread, the "leader", is responsible for beginning and ending each >> > flow dump, maintaining the flow_limit, and checking whether the >> > revalidator threads need to exit. All revalidator threads dump, >> > revalidate, delete datapath flows and garbage collect ukeys. >> > >> > Co-authored-by: Joe Stringer >> > Signed-off-by: Joe Stringer >> >> Here in the definition of struct udpif, I would mention that there are >> n_revalidators member of 'ukeys'. Otherwise the casual reader might >> guess that there was only one: >> > +/* During the flow dump phase, revalidators insert into these with >> > a random >> > + * distribution. During the garbage collection phase, each >> > revalidator >> > + * takes care of garbage collecting one of these hmaps. */ >> > +struct { >> > +struct ovs_mutex mutex;/* Guards the following. */ >> > +struct hmap hmap OVS_GUARDED; /* Datapath flow keys. */ >> > +} *ukeys; >> >> In the definition of struct udpif_key, the synchronization of most of >> the members is pretty clear, except for 'xcache'. >> >> Acked-by: Ben Pfaff > > > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev