[ovs-dev] Is there any way to simulate the tunnel traffic in test cases to test IPFIX feature?
I am trying to simulate traffic in test cases to test IPFIX feature. For normal traffic, I can use "ovs-appctl netdev-dummy/receive " to simulate it, just refer to the netflow and sflow cases in tests/ofproto-dpif.at And I can capture the IPFIX packets on lo interface as expected. But for tunnel traffic, I haven't find a workable way to generate this kind of traffic. For example, I try to use "ovs-appctl ofproto/trace ovs-dummy" command with "-generate" option. And we can see that the "sample" actions has been inserted as expected, but these actions are not executed, and no IPFIX packets can be captured on lo interface. Anyone knows about the usage of this command? Or other way to generate the tunnel traffic to trigger IPFIX sample action? Thanks a lot. Following is the script to simulate tunnel traffic, but it doesn't work for IPFIX. AT_SETUP([ofproto-dpif - IPFIX tunnel packet sampling]) OVS_VSWITCHD_START([add-port br0 p1 -- set Interface p1 type=gre \ options:remote_ip=1.1.1.1 ofport_request=1\ -- add-port br0 p2 -- set Interface p2 type=gre \ options:local_ip=2.2.2.2 options:remote_ip=1.1.1.1 \ ofport_request=2 \ -- add-port br0 p3 -- set Interface p3 type=gre \ options:remote_ip=2.2.2.2 ofport_request=3]) AT_DATA([flows.txt], [dnl actions=IN_PORT ]) ovs-appctl time/stop ON_EXIT([kill `cat test-ipfix.pid`]) AT_CHECK([ovstest test-ipfix --log-file --detach --no-chdir --pidfile 0:127.0.0.1 > ipfix.log], [0], [], [ignore]) AT_CAPTURE_FILE([ipfix.log]) IPFIX_PORT=`parse_listening_port < test-ipfix.log` ovs-vsctl \ set Bridge br0 ipfix=@i -- \ --id=@i create IPFIX targets=\"127.0.0.1:$IPFIX_PORT\" \ sampling=1 obs_domain_id=123 obs_point_id=456 cache_active_timeout=1 cache_max_flows=0 other_config:enable-tunnel-sampling=true AT_CHECK([ovs-ofctl add-flows br0 flows.txt]) AT_CHECK([ovs-appctl dpif/show | tail -n +3], [0], [dnl br0 65534/100: (dummy) p1 1/1: (gre: remote_ip=1.1.1.1) p2 2/1: (gre: local_ip=2.2.2.2, remote_ip=1.1.1.1) p3 3/1: (gre: remote_ip=2.2.2.2) ]) dnl remote_ip AT_CHECK([ovs-appctl ofproto/trace ovs-dummy 'tunnel(tun_id=0x0,src=1.1.1.1,dst=1.2.3.4,tos=0x0,ttl=64,flags()),in_port(1),eth(src=50:54:00:00:00:05,dst=50:54:00:00:00:07),eth_type(0x0800),ipv4(src=192.168.0.1,dst=192.168.0.2,proto=6,tos=0,ttl=64,frag=no),tcp(src=8,dst=9)' -generate], [0], [stdout]) AT_CHECK([tail -1 stdout], [0], [Datapath actions: sample(sample=100.0%,actions(userspace(pid=0,ipfix(output_port=4294967295,set(tunnel(tun_id=0x0,src=0.0.0.0,dst=1.1.1.1,tos=0x0,ttl=64,flags(df))),sample(sample=100.0%,actions(userspace(pid=0,ipfix(output_port=1),tunnel_out_port=1))),1 Bests, Wenyu ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 2/2] ofproto: Remove ofproto_refresh_rule().
The only user of this function was removed in the previous patch, so remove it. Signed-off-by: Joe Stringer --- ofproto/ofproto-dpif.c | 12 ofproto/ofproto-provider.h |2 -- ofproto/ofproto.c | 41 - 3 files changed, 55 deletions(-) diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c index 06be234..22ba14e 100644 --- a/ofproto/ofproto-dpif.c +++ b/ofproto/ofproto-dpif.c @@ -364,18 +364,6 @@ ofproto_dpif_flow_mod(struct ofproto_dpif *ofproto, ofproto_flow_mod(&ofproto->up, fm); } -/* Resets the modified time for 'rule' or an equivalent rule. If 'rule' is not - * in the classifier, but an equivalent rule is, unref 'rule' and ref the new - * rule. Otherwise if 'rule' is no longer installed in the classifier, - * reinstall it. - * - * Returns the rule whose modified time has been reset. */ -struct rule_dpif * -ofproto_dpif_refresh_rule(struct rule_dpif *rule) -{ -return rule_dpif_cast(ofproto_refresh_rule(&rule->up)); -} - /* Appends 'pin' to the queue of "packet ins" to be sent to the controller. * Takes ownership of 'pin' and pin->packet. */ void diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h index ff539b9..7741044 100644 --- a/ofproto/ofproto-provider.h +++ b/ofproto/ofproto-provider.h @@ -1716,8 +1716,6 @@ BUILD_ASSERT_DECL(OFPROTO_POSTPONE < OFPERR_OFS); int ofproto_flow_mod(struct ofproto *, struct ofputil_flow_mod *) OVS_EXCLUDED(ofproto_mutex); -struct rule *ofproto_refresh_rule(struct rule *rule) -OVS_EXCLUDED(ofproto_mutex); void ofproto_add_flow(struct ofproto *, const struct match *, unsigned int priority, const struct ofpact *ofpacts, size_t ofpacts_len) diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c index 24a709b..f34cfde 100644 --- a/ofproto/ofproto.c +++ b/ofproto/ofproto.c @@ -1978,47 +1978,6 @@ ofproto_flow_mod(struct ofproto *ofproto, struct ofputil_flow_mod *fm) return handle_flow_mod__(ofproto, NULL, fm, NULL); } -/* Resets the modified time for 'rule' or an equivalent rule. If 'rule' is not - * in the classifier, but an equivalent rule is, unref 'rule' and ref the new - * rule. Otherwise if 'rule' is no longer installed in the classifier, - * reinstall it. - * - * Returns the rule whose modified time has been reset. */ -struct rule * -ofproto_refresh_rule(struct rule *rule) -{ -const struct oftable *table = &rule->ofproto->tables[rule->table_id]; -const struct cls_rule *cr = &rule->cr; -struct rule *r; - -/* do_add_flow() requires that the rule is not installed. We lock the - * ofproto_mutex here so that another thread cannot add the flow before - * we get a chance to add it.*/ -ovs_mutex_lock(&ofproto_mutex); - -fat_rwlock_rdlock(&table->cls.rwlock); -r = rule_from_cls_rule(classifier_find_rule_exactly(&table->cls, cr)); -if (r != rule) { -ofproto_rule_ref(r); -} -fat_rwlock_unlock(&table->cls.rwlock); - -if (!r) { -do_add_flow(rule->ofproto, NULL, NULL, 0, rule); -} else if (r != rule) { -ofproto_rule_unref(rule); -rule = r; -} -ovs_mutex_unlock(&ofproto_mutex); - -/* Refresh the modified time for the rule. */ -ovs_mutex_lock(&rule->mutex); -rule->modified = MAX(rule->modified, time_msec()); -ovs_mutex_unlock(&rule->mutex); - -return rule; -} - /* Searches for a rule with matching criteria exactly equal to 'target' in * ofproto's table 0 and, if it finds one, deletes it. * -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
Caching the results of xlate_learn was previously dependent on the state of the 'may_learn' flag. This meant that if the caller did not specify that this flow may learn, then a learn entry would not be cached. However, the xlate_cache tends to be used on a recurring basis, so failing to cache the learn entry can provide unexpected behaviour later on, particularly in corner cases. Such a corner case occurred previously:- * Revalidation was requested. * A flow with a learn action was dumped. * The flow had no packets. * The flow's corresponding xcache was cleared, and the flow revalidated. * The flow went on to receive packets after the xcache is re-created. In this case, the xcache would be re-created, but would not refresh the timeouts on the learnt flow until the next time it was cleared, even if it received more traffic. This would cause flows to time out sooner than expected. Symptoms of this bug may include unexpected forwarding behaviour or extraneous statistics being attributed to the wrong flow. This patch fixes the issue by caching the entire flow_mod, including actions, upon translating an xlate_learn action. This is used to perform a flow_mod from scratch with the original flow, rather than simply refreshing the rule that was created during the creation of the xcache. Bug #1252997. Reported-by: Scott Hendricks Signed-off-by: Joe Stringer --- ofproto/ofproto-dpif-xlate.c | 58 ++ 1 file changed, 30 insertions(+), 28 deletions(-) diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index 068d54f..e8f3dbf 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -265,7 +265,9 @@ struct xc_entry { uint16_t vid; } bond; struct { -struct rule_dpif *rule; +struct ofproto_dpif *ofproto; +struct ofputil_flow_mod *fm; +struct ofpbuf *ofpacts; } learn; struct { struct ofproto_dpif *ofproto; @@ -2977,34 +2979,38 @@ xlate_bundle_action(struct xlate_ctx *ctx, } static void -xlate_learn_action(struct xlate_ctx *ctx, - const struct ofpact_learn *learn) +xlate_learn_action__(struct xlate_ctx *ctx, const struct ofpact_learn *learn, + struct ofputil_flow_mod *fm, struct ofpbuf *ofpacts) { -uint64_t ofpacts_stub[1024 / 8]; -struct ofputil_flow_mod fm; -struct ofpbuf ofpacts; +learn_execute(learn, &ctx->xin->flow, fm, ofpacts); +if (ctx->xin->may_learn) { +ofproto_dpif_flow_mod(ctx->xbridge->ofproto, fm); +} +} +static void +xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn *learn) +{ ctx->xout->has_learn = true; - learn_mask(learn, &ctx->xout->wc); -if (!ctx->xin->may_learn) { -return; -} - -ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub); -learn_execute(learn, &ctx->xin->flow, &fm, &ofpacts); -ofproto_dpif_flow_mod(ctx->xbridge->ofproto, &fm); -ofpbuf_uninit(&ofpacts); - if (ctx->xin->xcache) { struct xc_entry *entry; entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); -/* Lookup the learned rule, taking a reference on it. The reference - * is released when this cache entry is deleted. */ -rule_dpif_lookup(ctx->xbridge->ofproto, &ctx->xin->flow, NULL, - &entry->u.learn.rule, true, NULL); +entry->u.learn.ofproto = ctx->xbridge->ofproto; +entry->u.learn.fm = xmalloc(sizeof *entry->u.learn.fm); +entry->u.learn.ofpacts = ofpbuf_new(1024); +xlate_learn_action__(ctx, learn, entry->u.learn.fm, + entry->u.learn.ofpacts); +} else if (ctx->xin->may_learn) { +uint64_t ofpacts_stub[1024 / 8]; +struct ofputil_flow_mod fm; +struct ofpbuf ofpacts; + +ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub); +xlate_learn_action__(ctx, learn, &fm, &ofpacts); +ofpbuf_uninit(&ofpacts); } } @@ -3911,12 +3917,8 @@ xlate_push_stats(struct xlate_cache *xcache, bool may_learn, break; case XC_LEARN: if (may_learn) { -struct rule_dpif *rule = entry->u.learn.rule; - -/* Reset the modified time for a rule that is equivalent to - * the currently cached rule. If the rule is not the exact - * rule we have cached, update the reference that we have. */ -entry->u.learn.rule = ofproto_dpif_refresh_rule(rule); +ofproto_dpif_flow_mod(entry->u.learn.ofproto, + entry->u.learn.fm); } break; case XC_NORMAL: @@ -3989,8 +3991,8 @@ xlate_cache_clear(struct xlate_cache *xcache) mbridge_unref(entry->u.mirror.mbridge); break; case XC_LEARN: -/* 'u.learn.rule' is th
Re: [ovs-dev] [PATCH 2/3] lib/ovs-rcu: Evaluate parameters before ovsrcu_set and ovsrcu_init.
Thank you for the reviews. Series applied with an incremental shown below, Jarno On Jun 2, 2014, at 9:24 PM, YAMAMOTO Takashi wrote: >> ovsrcu_set() and ovsrcu_init() look like functions, so users are >> justified in expecting the arguments to be evalueated before any of > > evaluated > Thanks. >> >> +#define ovsrcu_set__(VAR, VALUE, ORDER) \ >> +({ \ >> +typeof(VAR) var__ = (VAR); \ >> +typeof(VALUE) value__ = (VALUE);\ >> +memory_order order__ = (ORDER); \ >> +\ >> +atomic_store_explicit(&var__->p, value__, order__); \ > > ovs-atomic-gcc4+ version of atomic_store_explicit also has > a variable named order__. is it safe? > Good catch, applied with the following incremental: diff --git a/lib/ovs-rcu.h b/lib/ovs-rcu.h index e27dfad..87651d8 100644 --- a/lib/ovs-rcu.h +++ b/lib/ovs-rcu.h @@ -136,14 +136,14 @@ * any of the body of the atomic_store_explicit. Since the type of * 'VAR' is not fixed, we cannot use an inline function to get * function semantics for this. */ -#define ovsrcu_set__(VAR, VALUE, ORDER) \ -({ \ -typeof(VAR) var__ = (VAR); \ -typeof(VALUE) value__ = (VALUE);\ -memory_order order__ = (ORDER); \ -\ -atomic_store_explicit(&var__->p, value__, order__); \ -(void *) 0; \ +#define ovsrcu_set__(VAR, VALUE, ORDER) \ +({ \ +typeof(VAR) ovsrcu_var = (VAR); \ +typeof(VALUE) ovsrcu_value = (VALUE); \ +memory_order ovsrcu_order = (ORDER);\ +\ +atomic_store_explicit(&ovsrcu_var->p, ovsrcu_value, ovsrcu_order); \ +(void *) 0; \ }) #else /* not GNU C */ struct ovsrcu_pointer { ATOMIC(void *) p; }; > otherwise, > Acked-by: YAMAMOTO Takashi > > YAMAMOTO Takashi ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
Ryan, See comments below. Jarno On Jun 2, 2014, at 3:54 PM, Ryan Wilson wrote: > When a bridge of datatype type netdev receives a packet, it copies the > packet from the NIC to a buffer in userspace. Currently, when making > an upcall, the packet is again copied to the upcall's buffer. However, > this extra copy is not necessary when the datapath exists in userspace > as the upcall can directly access the packet data. > > This patch eliminates this extra copy of the packet data in most cases. > In cases where the packet may still be used later by callers of > dp_netdev_execute_actions, making a copy of the packet data is still > necessary. > --- > lib/dpif-netdev.c | 15 +-- > lib/ofpbuf.c | 18 ++ > lib/ofpbuf.h |2 ++ > 3 files changed, 25 insertions(+), 10 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 91c83d6..7cb0478 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct ofpbuf > *packet, >miniflow_hash_5tuple(&key.flow, 0) >% dp->n_handlers, >DPIF_UC_MISS, &key.flow, NULL); > -ofpbuf_delete(packet); > } > } > > @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct > ofpbuf *packet, > if (userdata) { > buf_size += NLA_ALIGN(userdata->nla_len); > } > -buf_size += ofpbuf_size(packet); > ofpbuf_init(buf, buf_size); > > /* Put ODP flow. */ > @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev *dp, > struct ofpbuf *packet, > NLA_ALIGN(userdata->nla_len)); > } > > -ofpbuf_set_data(&upcall->packet, > -ofpbuf_put(buf, ofpbuf_data(packet), > ofpbuf_size(packet))); > -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); > +ofpbuf_set(&upcall->packet, packet); > Did you consider the possibility to just assign the packet instead? We do this elsewhere, so it should work here too: upcall->packet = *packet; > seq_change(q->seq); > > error = 0; > } else { > dp_netdev_count_packet(dp, DP_STAT_LOST); > +ofpbuf_delete(packet); > error = ENOBUFS; > } > ovs_mutex_unlock(&q->mutex); > @@ -2120,19 +2117,17 @@ dp_execute_cb(void *aux_, struct ofpbuf *packet, > break; > > case OVS_ACTION_ATTR_USERSPACE: { > +struct ofpbuf *userspace_packet; > const struct nlattr *userdata; > > userdata = nl_attr_find_nested(a, OVS_USERSPACE_ATTR_USERDATA); > +userspace_packet = may_steal ? packet : ofpbuf_clone(packet); > > -dp_netdev_output_userspace(aux->dp, packet, > +dp_netdev_output_userspace(aux->dp, userspace_packet, >miniflow_hash_5tuple(aux->key, 0) >% aux->dp->n_handlers, >DPIF_UC_ACTION, aux->key, >userdata); > - > -if (may_steal) { > -ofpbuf_delete(packet); > -} > break; > } > > diff --git a/lib/ofpbuf.c b/lib/ofpbuf.c > index 1f4b61d..ade940b 100644 > --- a/lib/ofpbuf.c > +++ b/lib/ofpbuf.c > @@ -555,3 +555,21 @@ ofpbuf_resize_l2(struct ofpbuf *b, int increment) > ofpbuf_adjust_layer_offset(&b->l2_5_ofs, increment); > return b->frame; > } > + > +/* Sets ofpbuf 'dst' equal to 'src'. Note that 'dst' and 'src' share the > + * same buffer, so calling ofpbuf_delete('dst') will also delete 'src'. */ > +void > +ofpbuf_set(struct ofpbuf *dst, struct ofpbuf *src) > +{ > +ofpbuf_set_base(dst, ofpbuf_base(src)); > +ofpbuf_set_data(dst, ofpbuf_data(src)); > +ofpbuf_set_size(dst, ofpbuf_size(src)); > + > +dst->allocated = src->allocated; > +dst->frame = src->frame; > +dst->l2_5_ofs = src->l2_5_ofs; > +dst->l3_ofs = src->l3_ofs; > +dst->l4_ofs = src->l4_ofs; > +dst->source = src->source; > +dst->list_node = src->list_node; > +} This would be unnecessary if the plain assignment works also for DPDK. > diff --git a/lib/ofpbuf.h b/lib/ofpbuf.h > index 85be899..ec08c27 100644 > --- a/lib/ofpbuf.h > +++ b/lib/ofpbuf.h > @@ -101,6 +101,8 @@ static inline const void *ofpbuf_get_udp_payload(const > struct ofpbuf *); > static inline const void *ofpbuf_get_sctp_payload(const struct ofpbuf *); > static inline const void *ofpbuf_get_icmp_payload(const struct ofpbuf *); > > +void ofpbuf_set(struct ofpbuf *, struct ofpbuf *); > + > void ofpbuf_use(struct ofpbuf *, void *, size_t); > void ofpbuf_use_stack(struct ofpbuf *, void *, size_t); > void ofpbuf_use_stub(struct ofpbuf *, void *, size_t); > -- > 1.7.9.5 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/l
[ovs-dev] [PATCH] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
When a bridge of datatype type netdev receives a packet, it copies the packet from the NIC to a buffer in userspace. Currently, when making an upcall, the packet is again copied to the upcall's buffer. However, this extra copy is not necessary when the datapath exists in userspace as the upcall can directly access the packet data. This patch eliminates this extra copy of the packet data in most cases. In cases where the packet may still be used later by callers of dp_netdev_execute_actions, making a copy of the packet data is still necessary. Signed-off-by: Ryan Wilson --- v2: Addressed Jarno's comment to use direct pointer assignment for upcall->packet instead of ofpbuf_set(). --- lib/dpif-netdev.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 91c83d6..c89ae20 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct ofpbuf *packet, miniflow_hash_5tuple(&key.flow, 0) % dp->n_handlers, DPIF_UC_MISS, &key.flow, NULL); -ofpbuf_delete(packet); } } @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet, if (userdata) { buf_size += NLA_ALIGN(userdata->nla_len); } -buf_size += ofpbuf_size(packet); ofpbuf_init(buf, buf_size); /* Put ODP flow. */ @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet, NLA_ALIGN(userdata->nla_len)); } -ofpbuf_set_data(&upcall->packet, -ofpbuf_put(buf, ofpbuf_data(packet), ofpbuf_size(packet))); -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); +upcall->packet = *packet; seq_change(q->seq); error = 0; } else { dp_netdev_count_packet(dp, DP_STAT_LOST); +ofpbuf_delete(packet); error = ENOBUFS; } ovs_mutex_unlock(&q->mutex); @@ -2120,19 +2117,17 @@ dp_execute_cb(void *aux_, struct ofpbuf *packet, break; case OVS_ACTION_ATTR_USERSPACE: { +struct ofpbuf *userspace_packet; const struct nlattr *userdata; userdata = nl_attr_find_nested(a, OVS_USERSPACE_ATTR_USERDATA); +userspace_packet = may_steal ? packet : ofpbuf_clone(packet); -dp_netdev_output_userspace(aux->dp, packet, +dp_netdev_output_userspace(aux->dp, userspace_packet, miniflow_hash_5tuple(aux->key, 0) % aux->dp->n_handlers, DPIF_UC_ACTION, aux->key, userdata); - -if (may_steal) { -ofpbuf_delete(packet); -} break; } -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
I confirm this patch fix the bug, but have a high level concern, Assume another learnt flow with hard_timeout=10 and was never hit after install, now, if there is frequent change to the flow table and need_revalidate is constantly 'true', this learnt flow will never timeout... since we always redo the flow_mod during revalidation, Let's discuss further on this scenario, I'm thinking about solutions, Thanks, Alex Wang, On Tue, Jun 3, 2014 at 2:59 AM, Joe Stringer wrote: > Caching the results of xlate_learn was previously dependent on the state > of the 'may_learn' flag. This meant that if the caller did not specify > that this flow may learn, then a learn entry would not be cached. > However, the xlate_cache tends to be used on a recurring basis, so > failing to cache the learn entry can provide unexpected behaviour later > on, particularly in corner cases. > > Such a corner case occurred previously:- > * Revalidation was requested. > * A flow with a learn action was dumped. > * The flow had no packets. > * The flow's corresponding xcache was cleared, and the flow revalidated. > * The flow went on to receive packets after the xcache is re-created. > > In this case, the xcache would be re-created, but would not refresh the > timeouts on the learnt flow until the next time it was cleared, even if > it received more traffic. This would cause flows to time out sooner than > expected. Symptoms of this bug may include unexpected forwarding > behaviour or extraneous statistics being attributed to the wrong flow. > > This patch fixes the issue by caching the entire flow_mod, including > actions, upon translating an xlate_learn action. This is used to perform > a flow_mod from scratch with the original flow, rather than simply > refreshing the rule that was created during the creation of the xcache. > > Bug #1252997. > > Reported-by: Scott Hendricks > Signed-off-by: Joe Stringer > --- > ofproto/ofproto-dpif-xlate.c | 58 > ++ > 1 file changed, 30 insertions(+), 28 deletions(-) > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index 068d54f..e8f3dbf 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -265,7 +265,9 @@ struct xc_entry { > uint16_t vid; > } bond; > struct { > -struct rule_dpif *rule; > +struct ofproto_dpif *ofproto; > +struct ofputil_flow_mod *fm; > +struct ofpbuf *ofpacts; > } learn; > struct { > struct ofproto_dpif *ofproto; > @@ -2977,34 +2979,38 @@ xlate_bundle_action(struct xlate_ctx *ctx, > } > > static void > -xlate_learn_action(struct xlate_ctx *ctx, > - const struct ofpact_learn *learn) > +xlate_learn_action__(struct xlate_ctx *ctx, const struct ofpact_learn > *learn, > + struct ofputil_flow_mod *fm, struct ofpbuf *ofpacts) > { > -uint64_t ofpacts_stub[1024 / 8]; > -struct ofputil_flow_mod fm; > -struct ofpbuf ofpacts; > +learn_execute(learn, &ctx->xin->flow, fm, ofpacts); > +if (ctx->xin->may_learn) { > +ofproto_dpif_flow_mod(ctx->xbridge->ofproto, fm); > +} > +} > > +static void > +xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn > *learn) > +{ > ctx->xout->has_learn = true; > - > learn_mask(learn, &ctx->xout->wc); > > -if (!ctx->xin->may_learn) { > -return; > -} > - > -ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub); > -learn_execute(learn, &ctx->xin->flow, &fm, &ofpacts); > -ofproto_dpif_flow_mod(ctx->xbridge->ofproto, &fm); > -ofpbuf_uninit(&ofpacts); > - > if (ctx->xin->xcache) { > struct xc_entry *entry; > > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); > -/* Lookup the learned rule, taking a reference on it. The > reference > - * is released when this cache entry is deleted. */ > -rule_dpif_lookup(ctx->xbridge->ofproto, &ctx->xin->flow, NULL, > - &entry->u.learn.rule, true, NULL); > +entry->u.learn.ofproto = ctx->xbridge->ofproto; > +entry->u.learn.fm = xmalloc(sizeof *entry->u.learn.fm); > +entry->u.learn.ofpacts = ofpbuf_new(1024); > +xlate_learn_action__(ctx, learn, entry->u.learn.fm, > + entry->u.learn.ofpacts); > +} else if (ctx->xin->may_learn) { > +uint64_t ofpacts_stub[1024 / 8]; > +struct ofputil_flow_mod fm; > +struct ofpbuf ofpacts; > + > +ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub); > +xlate_learn_action__(ctx, learn, &fm, &ofpacts); > +ofpbuf_uninit(&ofpacts); > } > } > > @@ -3911,12 +3917,8 @@ xlate_push_stats(struct xlate_cache *xcache, bool > may_learn, > break; > case XC_LEARN: > if (may_learn) { > -struct rule_dpif *rule = entry->u.le
Re: [ovs-dev] [PATCH] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
Hey Jarno, I didn't consider that possibility, but that is a much better solution which makes for a much cleaner patch. I just posted a new patch using this direct pointer assignment. Looks like I'm a bit rusty on my C :P Thanks for the review! Ryan On 6/3/14 10:10 AM, "Jarno Rajahalme" wrote: >Ryan, > >See comments below. > > Jarno > >On Jun 2, 2014, at 3:54 PM, Ryan Wilson wrote: > >> When a bridge of datatype type netdev receives a packet, it copies the >> packet from the NIC to a buffer in userspace. Currently, when making >> an upcall, the packet is again copied to the upcall's buffer. However, >> this extra copy is not necessary when the datapath exists in userspace >> as the upcall can directly access the packet data. >> >> This patch eliminates this extra copy of the packet data in most cases. >> In cases where the packet may still be used later by callers of >> dp_netdev_execute_actions, making a copy of the packet data is still >> necessary. >> --- >> lib/dpif-netdev.c | 15 +-- >> lib/ofpbuf.c | 18 ++ >> lib/ofpbuf.h |2 ++ >> 3 files changed, 25 insertions(+), 10 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 91c83d6..7cb0478 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct >>ofpbuf *packet, >>miniflow_hash_5tuple(&key.flow, 0) >>% dp->n_handlers, >>DPIF_UC_MISS, &key.flow, NULL); >> -ofpbuf_delete(packet); >> } >> } >> >> @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, >>struct ofpbuf *packet, >> if (userdata) { >> buf_size += NLA_ALIGN(userdata->nla_len); >> } >> -buf_size += ofpbuf_size(packet); >> ofpbuf_init(buf, buf_size); >> >> /* Put ODP flow. */ >> @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev >>*dp, struct ofpbuf *packet, >> NLA_ALIGN(userdata->nla_len)); >> } >> >> -ofpbuf_set_data(&upcall->packet, >> -ofpbuf_put(buf, ofpbuf_data(packet), >>ofpbuf_size(packet))); >> -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); >> +ofpbuf_set(&upcall->packet, packet); >> > >Did you consider the possibility to just assign the packet instead? We do >this elsewhere, so it should work here too: > >upcall->packet = *packet; > >> seq_change(q->seq); >> >> error = 0; >> } else { >> dp_netdev_count_packet(dp, DP_STAT_LOST); >> +ofpbuf_delete(packet); >> error = ENOBUFS; >> } >> ovs_mutex_unlock(&q->mutex); >> @@ -2120,19 +2117,17 @@ dp_execute_cb(void *aux_, struct ofpbuf *packet, >> break; >> >> case OVS_ACTION_ATTR_USERSPACE: { >> +struct ofpbuf *userspace_packet; >> const struct nlattr *userdata; >> >> userdata = nl_attr_find_nested(a, OVS_USERSPACE_ATTR_USERDATA); >> +userspace_packet = may_steal ? packet : ofpbuf_clone(packet); >> >> -dp_netdev_output_userspace(aux->dp, packet, >> +dp_netdev_output_userspace(aux->dp, userspace_packet, >>miniflow_hash_5tuple(aux->key, 0) >>% aux->dp->n_handlers, >>DPIF_UC_ACTION, aux->key, >>userdata); >> - >> -if (may_steal) { >> -ofpbuf_delete(packet); >> -} >> break; >> } >> >> diff --git a/lib/ofpbuf.c b/lib/ofpbuf.c >> index 1f4b61d..ade940b 100644 >> --- a/lib/ofpbuf.c >> +++ b/lib/ofpbuf.c >> @@ -555,3 +555,21 @@ ofpbuf_resize_l2(struct ofpbuf *b, int increment) >> ofpbuf_adjust_layer_offset(&b->l2_5_ofs, increment); >> return b->frame; >> } >> + >> +/* Sets ofpbuf 'dst' equal to 'src'. Note that 'dst' and 'src' share >>the >> + * same buffer, so calling ofpbuf_delete('dst') will also delete >>'src'. */ >> +void >> +ofpbuf_set(struct ofpbuf *dst, struct ofpbuf *src) >> +{ >> +ofpbuf_set_base(dst, ofpbuf_base(src)); >> +ofpbuf_set_data(dst, ofpbuf_data(src)); >> +ofpbuf_set_size(dst, ofpbuf_size(src)); >> + >> +dst->allocated = src->allocated; >> +dst->frame = src->frame; >> +dst->l2_5_ofs = src->l2_5_ofs; >> +dst->l3_ofs = src->l3_ofs; >> +dst->l4_ofs = src->l4_ofs; >> +dst->source = src->source; >> +dst->list_node = src->list_node; >> +} > >This would be unnecessary if the plain assignment works also for DPDK. > >> diff --git a/lib/ofpbuf.h b/lib/ofpbuf.h >> index 85be899..ec08c27 100644 >> --- a/lib/ofpbuf.h >> +++ b/lib/ofpbuf.h >> @@ -101,6 +101,8 @@ static inline const void >>*ofpbuf_get_udp_payload(const struct ofpbuf *); >> static inline const void *ofpbuf_get_sctp_payload(const struct ofpbuf >>*); >> static inline cons
Re: [ovs-dev] [PATCH] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
On Tue, Jun 3, 2014 at 10:33 AM, Ryan Wilson wrote: > When a bridge of datatype type netdev receives a packet, it copies the > packet from the NIC to a buffer in userspace. Currently, when making > an upcall, the packet is again copied to the upcall's buffer. However, > this extra copy is not necessary when the datapath exists in userspace > as the upcall can directly access the packet data. > > This patch eliminates this extra copy of the packet data in most cases. > In cases where the packet may still be used later by callers of > dp_netdev_execute_actions, making a copy of the packet data is still > necessary. > > Signed-off-by: Ryan Wilson > --- > v2: Addressed Jarno's comment to use direct pointer assignment for > upcall->packet instead of ofpbuf_set(). > --- > lib/dpif-netdev.c | 15 +-- > 1 file changed, 5 insertions(+), 10 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 91c83d6..c89ae20 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct ofpbuf > *packet, > miniflow_hash_5tuple(&key.flow, 0) > % dp->n_handlers, > DPIF_UC_MISS, &key.flow, NULL); > -ofpbuf_delete(packet); > } > } > > @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct > ofpbuf *packet, > if (userdata) { > buf_size += NLA_ALIGN(userdata->nla_len); > } > -buf_size += ofpbuf_size(packet); > ofpbuf_init(buf, buf_size); > > /* Put ODP flow. */ > @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev *dp, > struct ofpbuf *packet, >NLA_ALIGN(userdata->nla_len)); > } > > -ofpbuf_set_data(&upcall->packet, > -ofpbuf_put(buf, ofpbuf_data(packet), > ofpbuf_size(packet))); > -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); > +upcall->packet = *packet; > This would not work with DPDK. ofpbuf from dpdk is special case where ofpbuf and data are allocated from same memory object. Therefore moving ofpbuf->data is nontrivial. To make it work we need atleast following covered. 1. Define flag for source of ofpbuf header. So that we can track header and data independently. 2. Fix dpdk ofpbuf destructor to free correct dpdk memory object. Thanks, Pravin. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
I think Pravin will need to take a look in case he has something in the pipeline that might conflict with this. On my part: Acked-by: Jarno Rajahalme Jarno On Jun 3, 2014, at 10:33 AM, Ryan Wilson wrote: > When a bridge of datatype type netdev receives a packet, it copies the > packet from the NIC to a buffer in userspace. Currently, when making > an upcall, the packet is again copied to the upcall's buffer. However, > this extra copy is not necessary when the datapath exists in userspace > as the upcall can directly access the packet data. > > This patch eliminates this extra copy of the packet data in most cases. > In cases where the packet may still be used later by callers of > dp_netdev_execute_actions, making a copy of the packet data is still > necessary. > > Signed-off-by: Ryan Wilson > --- > v2: Addressed Jarno's comment to use direct pointer assignment for > upcall->packet instead of ofpbuf_set(). > --- > lib/dpif-netdev.c | 15 +-- > 1 file changed, 5 insertions(+), 10 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 91c83d6..c89ae20 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct ofpbuf > *packet, >miniflow_hash_5tuple(&key.flow, 0) >% dp->n_handlers, >DPIF_UC_MISS, &key.flow, NULL); > -ofpbuf_delete(packet); > } > } > > @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct > ofpbuf *packet, > if (userdata) { > buf_size += NLA_ALIGN(userdata->nla_len); > } > -buf_size += ofpbuf_size(packet); > ofpbuf_init(buf, buf_size); > > /* Put ODP flow. */ > @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev *dp, > struct ofpbuf *packet, > NLA_ALIGN(userdata->nla_len)); > } > > -ofpbuf_set_data(&upcall->packet, > -ofpbuf_put(buf, ofpbuf_data(packet), > ofpbuf_size(packet))); > -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); > +upcall->packet = *packet; > > seq_change(q->seq); > > error = 0; > } else { > dp_netdev_count_packet(dp, DP_STAT_LOST); > +ofpbuf_delete(packet); > error = ENOBUFS; > } > ovs_mutex_unlock(&q->mutex); > @@ -2120,19 +2117,17 @@ dp_execute_cb(void *aux_, struct ofpbuf *packet, > break; > > case OVS_ACTION_ATTR_USERSPACE: { > +struct ofpbuf *userspace_packet; > const struct nlattr *userdata; > > userdata = nl_attr_find_nested(a, OVS_USERSPACE_ATTR_USERDATA); > +userspace_packet = may_steal ? packet : ofpbuf_clone(packet); > > -dp_netdev_output_userspace(aux->dp, packet, > +dp_netdev_output_userspace(aux->dp, userspace_packet, >miniflow_hash_5tuple(aux->key, 0) >% aux->dp->n_handlers, >DPIF_UC_ACTION, aux->key, >userdata); > - > -if (may_steal) { > -ofpbuf_delete(packet); > -} > break; > } > > -- > 1.7.9.5 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] acinclude.m4: dpdk: use AC_SEARCH_LIBS and -ldl
Hi Ben, thanks for the suggestion. I’m about to send another patch that makes use of AC_LINK_IFELSE directly. Daniele On May 12, 2014, at 5:55 PM, Ben Pfaff wrote: > On Mon, May 12, 2014 at 12:47:47PM -0700, Daniele Di Proietto wrote: >> On some systems libintel_dpdk.a fails to link with libopenvswitch >> unless -ldl is used. This should address the issue >> >> Signed-off-by: Daniele Di Proietto > > Hmm. > > I have never noticed "unset" used in this way in Autoconf tests. I > suspect that it means that we are misusing AC_SEARCH_LIBS. Probably, > that means that we should instead use AC_LINK_IFELSE directly ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] acinclude.m4: dpdk: link with -ldl if necessary
On some systems libintel_dpdk.a fails to link with libopenvswitch unless -ldl is used. This should address the issue Signed-off-by: Daniele Di Proietto --- acinclude.m4 | 30 +++--- 1 file changed, 27 insertions(+), 3 deletions(-) diff --git a/acinclude.m4 b/acinclude.m4 index bd71a82..69d65f0 100644 --- a/acinclude.m4 +++ b/acinclude.m4 @@ -170,10 +170,34 @@ AC_DEFUN([OVS_CHECK_DPDK], [ DPDK_INCLUDE=$RTE_SDK/include DPDK_LIB_DIR=$RTE_SDK/lib -DPDK_LIBS="$DPDK_LIB_DIR/libintel_dpdk.a" -LIBS="$DPDK_LIBS $LIBS" -CPPFLAGS="-I$DPDK_INCLUDE $CPPFLAGS" +LDFLAGS="$LDFLAGS -L$DPDK_LIB_DIR" +CFLAGS="$CFLAGS -I$DPDK_INCLUDE" + +# On some systems we have to add -ldl to link with dpdk +# +# This code, at first, tries to link without -ldl (""), +# then adds it and tries again. +# Before each attempt the search cache must be unset, +# otherwise autoconf will stick with the old result + +found=false +save_LIBS=$LIBS +for extras in "" "-ldl"; do +LIBS="-lintel_dpdk $extras $save_LIBS" +AC_LINK_IFELSE( + [AC_LANG_PROGRAM([#include + #include ], +[int rte_argc; char ** rte_argv; + rte_eal_init(rte_argc, rte_argv);])], + [found=true]) +if $found; then +break +fi +done +if $found; then :; else +AC_MSG_ERROR([cannot link with dpdk]) +fi AC_DEFINE([DPDK_NETDEV], [1], [System uses the DPDK module.]) else -- 2.0.0.rc2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/1] netdev-dpdk-vhost / add dpdk vhost ports
On Tue, Jun 3, 2014 at 3:20 AM, maryam.tahhan wrote: > This patch implements the vhost-net offload API. It adds support for > a new port type to userspace datapath called dpdkvhost. This allows KVM > (QEMU) to offload the servicing of virtio-net devices to it's associated > dpdkvhost port. Instructions for use are in INSTALL.DPDK. > > This has been tested on Intel multi-core platforms and with clients that > have virtio-net interfaces. > Thanks for the patch. I have not done with reviewing this patch, but have a high level comment. I see some common code in netdev-dpdk and netdev-dpdk-vhost, Is it possible to reuse that code? something like Gerald did for dpdk rings patch. > Signed-off-by: maryam.tahhan > --- > INSTALL.DPDK | 190 + > Makefile.am | 44 +- > configure.ac |1 + > lib/automake.mk |9 +- > lib/netdev-bsd.c |1 - > lib/netdev-dpdk-vhost.c | 1219 > + > lib/netdev-dpdk-vhost.h | 62 ++ > lib/netdev-dpdk.c | 62 +- > lib/netdev-dpdk.h | 13 + > lib/netdev.c |5 +- > lib/vhost-net-cdev.c | 374 ++ > lib/vhost-net-cdev.h | 81 +++ > lib/virtio-net.c | 989 ++ > lib/virtio-net.h | 117 > utilities/automake.mk |3 +- > utilities/eventfd_link/Makefile.in| 86 +++ > utilities/eventfd_link/eventfd_link.c | 189 + > utilities/eventfd_link/eventfd_link.h | 79 +++ > utilities/qemu-wrap.py| 389 +++ > 19 files changed, 3858 insertions(+), 55 deletions(-) > create mode 100644 lib/netdev-dpdk-vhost.c > create mode 100644 lib/netdev-dpdk-vhost.h > create mode 100644 lib/vhost-net-cdev.c > create mode 100644 lib/vhost-net-cdev.h > create mode 100644 lib/virtio-net.c > create mode 100644 lib/virtio-net.h > create mode 100644 utilities/eventfd_link/Makefile.in > create mode 100644 utilities/eventfd_link/eventfd_link.c > create mode 100644 utilities/eventfd_link/eventfd_link.h > create mode 100755 utilities/qemu-wrap.py > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] Can VXLAN work with VIP (Keepalived)?
On Mon, Jun 2, 2014 at 6:37 PM, Changbin Liu wrote: > Hi Folks, > > Using Open vSwitch I am setting up VXLAN tunnels between hosts. I intend to > make VXLAN tunnels "highly available" via Keepalived. Specifically, there > are one master node and one backup node for each host, and they share a > Virtual IP (VIP) via Keepalived, and the VXLAN tunnel will use the VIP as > endpoint IP. This way when the master node fails, the backup node will > automatically have the VIP shifted to it. Therefore, human efforts are not > involved in failure recovery. > > I thought VXLAN can work with VIP this way, since VXLAN simply uses UDP > packets, and as long as the endpoint VIP does not change, it should work. > But it turns out that it didn't at all. I wonder whether this is intended > behavior or I missed something. It should be possible to use VXLAN with a VIP. The IP stack is configured outside of OVS so you should make sure that it is working in Linux first. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] cmap test failures
[adding ovs-dev, hope that's OK] On Tue, Jun 03, 2014 at 10:19:23AM -0700, Jarno Rajahalme wrote: > > On Jun 3, 2014, at 9:56 AM, Gurucharan Shetty wrote: > > > Hello Jarno, > > Occasionally the cmap tests fail on Windows platform with the following o/p. > > > > #tests/ovstest test-cmap check 1 > > test-cmap: lib/ovs-atomic-locked.c:49: pthread_mutex_lock failed > > (Resource deadlock avoided) > > > > This is now fixed in master. However, I also noticed that cmap (or > any heavy use of ovs-rcu and/or ovs-atomic) is horribly slow with > pthreads. This is because pthreads doesn?t have memory barriers > (other than the locks), so lockless atomics cannot be supported with > pthreads alone. > > We should look into having more native support for MSVC atomics in > lib/ovs-atomic.h. Depending on the level of support for C11 atomics, > any of the existing lib/ovs-atomic-*.h files could serve as a > starting point. I?m pretty sure a thing like ova-atomic-gcc4+.h is > possible, since MSVC has memory barrier support. I forgot to mention that I looked at this earlier and found a couple of obstacles. The main one is that MSVC doesn't have the GNU C typeof or ({...}) extensions. I don't see a way to implement the C11 atomics without those and without evaluating parameters to the atomic_*() operations more than once. So we might have to change the interface to require clients to tolerate multiple evaluation. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 2/2] lib/classifier: Optimize megaflows for single rule case.
On May 22, 2014, at 10:43 AM, Ben Pfaff wrote: > I find myself wondering whether we could end up with a funny > paradoxical mask, e.g. mark some bit of a TCP port as looked at even > though we didn't mark the IP protocol as looked at. I am not sure > whether this can happen or whether this is a problem. Actually, > looking further, I guess it can't happen because we always unwildcard > all the fields up to the current stage when we jump to range_out. In > fact, I think that range_out wildcards the current stage, too: is that > defeating the intended optimization here? (I see that you updated the > tests so presumably not?) Coming back to an old comment here… So, what we could do instead is to fill in the mask bits as we successfully compare, and fill in one bit of the differing uint32_t. This would get rid of another scan through the map in both outcomes, so it would be faster and simpler, and would take care of the prerequisites independent of the chosen boundaries of the lookup ranges (as long as the difference and the prerequisite are not in the same uint32_t, which is not the case for any field in struct flow). This would be the incremental, what do you think: diff --git a/lib/classifier.c b/lib/classifier.c index b5416c6..2d47310 100644 --- a/lib/classifier.c +++ b/lib/classifier.c @@ -1385,8 +1385,7 @@ check_tries(struct trie_ctx trie_ctx[CLS_MAX_TRIES], unsigned int n_tries, static inline bool miniflow_and_mask_matches_flow(const struct miniflow *flow, const struct minimask *mask, - const struct flow *target, - struct flow_wildcards *wc) + const struct flow *target) { const uint32_t *flowp = miniflow_get_u32_values(flow); const uint32_t *maskp = miniflow_get_u32_values(&mask->masks); @@ -1396,12 +1395,6 @@ miniflow_and_mask_matches_flow(const struct miniflow *flow, uint32_t diff = (*flowp++ ^ FLOW_U32_VALUE(target, idx)) & *maskp++; if (diff) { -/* Only unwildcard if none of the differing bits is already - * exact-matched. */ -if (wc && !(FLOW_U32_VALUE(&wc->masks, idx) & diff)) { -/* Keep one bit of the difference. */ -FLOW_U32_VALUE(&wc->masks, idx) |= rightmost_1bit(diff); -} return false; } } @@ -1417,7 +1410,7 @@ find_match(const struct cls_subtable *subtable, const struct flow *flow, HMAP_FOR_EACH_WITH_HASH (rule, hmap_node, hash, &subtable->rules) { if (miniflow_and_mask_matches_flow(&rule->flow, &subtable->mask, - flow, NULL)) { + flow)) { return rule; } } @@ -1425,6 +1418,42 @@ find_match(const struct cls_subtable *subtable, const struct flow *flow, return NULL; } +/* Returns true if 'target' satisifies 'flow'/'mask', that is, if each bit + * for which 'flow', for which 'mask' has a bit set, specifies a particular + * value has the correct value in 'target'. + * + * This function is equivalent to miniflow_and_mask_matches_flow() but this + * version fills in the mask bits in 'wc'. */ +static inline bool +miniflow_and_mask_matches_flow_wc(const struct miniflow *flow, + const struct minimask *mask, + const struct flow *target, + struct flow_wildcards *wc) +{ +const uint32_t *flowp = miniflow_get_u32_values(flow); +const uint32_t *maskp = miniflow_get_u32_values(&mask->masks); +uint32_t idx; + +MAP_FOR_EACH_INDEX(idx, mask->masks.map) { +uint32_t mask = *maskp++; +uint32_t diff = (*flowp++ ^ FLOW_U32_VALUE(target, idx)) & mask; + +if (diff) { +/* Only unwildcard if none of the differing bits is already + * exact-matched. */ +if (!(FLOW_U32_VALUE(&wc->masks, idx) & diff)) { +/* Keep one bit of the difference. */ +FLOW_U32_VALUE(&wc->masks, idx) |= rightmost_1bit(diff); +} +return false; +} +/* Fill in the bits that were looked at. */ +FLOW_U32_VALUE(&wc->masks, idx) |= mask; +} + +return true; +} + static struct cls_match * find_match_wc(const struct cls_subtable *subtable, const struct flow *flow, struct trie_ctx trie_ctx[CLS_MAX_TRIES], unsigned int n_tries, @@ -1467,11 +1496,11 @@ find_match_wc(const struct cls_subtable *subtable, const struct flow *flow, * optimization. */ if (!inode->s) { ASSIGN_CONTAINER(rule, inode - i, index_nodes); -if (miniflow_and_mask_matches_flow(&rule->flow, &subtable->mask, - flow, wc)) { -goto out; +if (miniflow_and_mask_matches_flow_wc(&rule->flow, &subtable->m
[ovs-dev] MS HyperV-End Users database
Hi, Would you be interested to acquire a list of MS HyperV-End Users & Partners across worldwide who can be potential clients for your company? Data fields are name, title, email, phone, address and company details, linkedIn Url or database can be filtered based on the criteria given by the clients. Let me know if you are focused on any particular application/technology users or titles so that I can get back to you with all the relevant details. Await your response. Kind Regards, June White Business Development Our Services:- Email List I Tracked Email campaign I Email/Data Appending I Search Engine Optimization I Custom Built List I Tele Marketing I Multi Channel Marketing I Lead Nurturing I Data Analysis I Customer profiling I Web Analysis I Campaign Planning If you are not interested to receive further communication in the future, please send a reply with the email subject line as "Leave Out". ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] INSTALL: Note about compiler atomics support.
OVS is slow when compiled with pthreads atomics. Add a generic note in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment provides additional detail. Signed-off-by: Jarno Rajahalme --- INSTALL |4 lib/ovs-atomic.h |3 +++ 2 files changed, 7 insertions(+) diff --git a/INSTALL b/INSTALL index 5c869b2..7b08172 100644 --- a/INSTALL +++ b/INSTALL @@ -28,6 +28,10 @@ you will need the following software: analysis and thread-safety checks. For Ubuntu, there are nightly built packages available on clang's website. +* While OVS may be compatible with other compilers, optimal + support for atomic operations may be missing, making OVS + very slow (see lib/ovs-atomic.h). + - libssl, from OpenSSL, is optional but recommended if you plan to connect the Open vSwitch to an OpenFlow controller. libssl is required to establish confidentiality and authenticity in the diff --git a/lib/ovs-atomic.h b/lib/ovs-atomic.h index c21b7ee..2452846 100644 --- a/lib/ovs-atomic.h +++ b/lib/ovs-atomic.h @@ -257,6 +257,9 @@ #elif HAVE_GCC4_ATOMICS #include "ovs-atomic-gcc4+.h" #else +/* ovs-atomic-pthreads implementation is provided for portability. + * It might be too slow for real use because Open vSwitch is + * optimized for platforms where real atomic ops are available. */ #include "ovs-atomic-pthreads.h" #endif #undef IN_OVS_ATOMIC_H -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] cmap: Fix deadlock with ovs-atomic-pthreads
On Jun 2, 2014, at 8:50 PM, YAMAMOTO Takashi wrote: >> >> On Jun 2, 2014, at 12:51 PM, Ben Pfaff wrote: >> >>> On Mon, Jun 02, 2014 at 12:50:45PM -0700, Jarno Rajahalme wrote: On Jun 2, 2014, at 11:45 AM, Ben Pfaff wrote: > I'd really hope that any serious OVS implementation would be able to > use C11 or GCC or Clang or other compiler-specific techniques to get a > "real" implementation of atomics. Really the pthreads version is just > to make porting easier. Are we already documenting this somewhere? >>> >>> I don't think so. It wasn't the attitude I started out with; >>> originally I hoped that the pthreads implementation would be viable >>> for real use, but I'm really not sure about that any longer. >> >> I just ran \x93time ./ovstest test-cmap check 1\x94, with >> ovs-atomics-pthreads it takes 16 times longer than on master. So it is >> likely unusable in practice, so this would be the time to document it, I >> guess. > > while i guess test-cmap is far from the real usage, > documenting expectations is a good idea. > > "ovs-atomic-pthreads implementation is provided for portability. > It might be too slow for real use because Open vSwitch is > optimized for platforms where real atomic ops are available.” Thanks, I just sent a patch incorporating this into a comment and adding a more generic note in INSTALL. Jarno ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
After discussing the issue with few people, I think we should keep the old_xcache during the full revalidation. And in xlate_learn_action(), if may_learn is false, we should check if the learned rule is in the old_xcache, if it is, we should copy the XC_LEARN entry to the new_xcache list (but we do not refresh the learnt rule again, since may_learn to false). Thoughts? Alex Wang, On Tue, Jun 3, 2014 at 10:39 AM, Alex Wang wrote: > I confirm this patch fix the bug, but have a high level concern, > > > Assume another learnt flow with hard_timeout=10 and was never hit after > install, now, if there is frequent change to the flow table and > need_revalidate > is constantly 'true', this learnt flow will never timeout... since we > always > redo the flow_mod during revalidation, > > Let's discuss further on this scenario, I'm thinking about solutions, > > Thanks, > Alex Wang, > > > > > On Tue, Jun 3, 2014 at 2:59 AM, Joe Stringer > wrote: > >> Caching the results of xlate_learn was previously dependent on the state >> of the 'may_learn' flag. This meant that if the caller did not specify >> that this flow may learn, then a learn entry would not be cached. >> However, the xlate_cache tends to be used on a recurring basis, so >> failing to cache the learn entry can provide unexpected behaviour later >> on, particularly in corner cases. >> >> Such a corner case occurred previously:- >> * Revalidation was requested. >> * A flow with a learn action was dumped. >> * The flow had no packets. >> * The flow's corresponding xcache was cleared, and the flow revalidated. >> * The flow went on to receive packets after the xcache is re-created. >> >> In this case, the xcache would be re-created, but would not refresh the >> timeouts on the learnt flow until the next time it was cleared, even if >> it received more traffic. This would cause flows to time out sooner than >> expected. Symptoms of this bug may include unexpected forwarding >> behaviour or extraneous statistics being attributed to the wrong flow. >> >> This patch fixes the issue by caching the entire flow_mod, including >> actions, upon translating an xlate_learn action. This is used to perform >> a flow_mod from scratch with the original flow, rather than simply >> refreshing the rule that was created during the creation of the xcache. >> >> Bug #1252997. >> >> Reported-by: Scott Hendricks >> Signed-off-by: Joe Stringer >> --- >> ofproto/ofproto-dpif-xlate.c | 58 >> ++ >> 1 file changed, 30 insertions(+), 28 deletions(-) >> >> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c >> index 068d54f..e8f3dbf 100644 >> --- a/ofproto/ofproto-dpif-xlate.c >> +++ b/ofproto/ofproto-dpif-xlate.c >> @@ -265,7 +265,9 @@ struct xc_entry { >> uint16_t vid; >> } bond; >> struct { >> -struct rule_dpif *rule; >> +struct ofproto_dpif *ofproto; >> +struct ofputil_flow_mod *fm; >> +struct ofpbuf *ofpacts; >> } learn; >> struct { >> struct ofproto_dpif *ofproto; >> @@ -2977,34 +2979,38 @@ xlate_bundle_action(struct xlate_ctx *ctx, >> } >> >> static void >> -xlate_learn_action(struct xlate_ctx *ctx, >> - const struct ofpact_learn *learn) >> +xlate_learn_action__(struct xlate_ctx *ctx, const struct ofpact_learn >> *learn, >> + struct ofputil_flow_mod *fm, struct ofpbuf *ofpacts) >> { >> -uint64_t ofpacts_stub[1024 / 8]; >> -struct ofputil_flow_mod fm; >> -struct ofpbuf ofpacts; >> +learn_execute(learn, &ctx->xin->flow, fm, ofpacts); >> +if (ctx->xin->may_learn) { >> +ofproto_dpif_flow_mod(ctx->xbridge->ofproto, fm); >> +} >> +} >> >> +static void >> +xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn >> *learn) >> +{ >> ctx->xout->has_learn = true; >> - >> learn_mask(learn, &ctx->xout->wc); >> >> -if (!ctx->xin->may_learn) { >> -return; >> -} >> - >> -ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub); >> -learn_execute(learn, &ctx->xin->flow, &fm, &ofpacts); >> -ofproto_dpif_flow_mod(ctx->xbridge->ofproto, &fm); >> -ofpbuf_uninit(&ofpacts); >> - >> if (ctx->xin->xcache) { >> struct xc_entry *entry; >> >> entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); >> -/* Lookup the learned rule, taking a reference on it. The >> reference >> - * is released when this cache entry is deleted. */ >> -rule_dpif_lookup(ctx->xbridge->ofproto, &ctx->xin->flow, NULL, >> - &entry->u.learn.rule, true, NULL); >> +entry->u.learn.ofproto = ctx->xbridge->ofproto; >> +entry->u.learn.fm = xmalloc(sizeof *entry->u.learn.fm); >> +entry->u.learn.ofpacts = ofpbuf_new(1024); >> +xlate_learn_action__(ctx, learn, entry->u.learn.fm, >> + entry->u.learn.ofpacts
Re: [ovs-dev] [PATCH] INSTALL: Note about compiler atomics support.
On Tue, Jun 03, 2014 at 01:09:11PM -0700, Jarno Rajahalme wrote: > OVS is slow when compiled with pthreads atomics. Add a generic note > in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment > provides additional detail. > > Signed-off-by: Jarno Rajahalme I think this is only likely to affect Windows (everyone else uses GCC or Clang) so should we put it in BUILD.Windows? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v3 2/2] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
When a bridge of datatype type netdev receives a packet, it copies the packet from the NIC to a buffer in userspace. Currently, when making an upcall, the packet is again copied to the upcall's buffer. However, this extra copy is not necessary when the datapath exists in userspace as the upcall can directly access the packet data. This patch eliminates this extra copy of the packet data in most cases. In cases where the packet may still be used later by callers of dp_netdev_execute_actions, making a copy of the packet data is still necessary. Signed-off-by: Ryan Wilson Acked-by: Jarno Rajahalme --- v2: Addressed Jarno's comment to use direct pointer assignment for upcall->packet instead of ofpbuf_set(). v3: Addressed issues with DPDK mentioned by Pravin with previous patch. --- lib/dpif-netdev.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 91c83d6..c89ae20 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct ofpbuf *packet, miniflow_hash_5tuple(&key.flow, 0) % dp->n_handlers, DPIF_UC_MISS, &key.flow, NULL); -ofpbuf_delete(packet); } } @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet, if (userdata) { buf_size += NLA_ALIGN(userdata->nla_len); } -buf_size += ofpbuf_size(packet); ofpbuf_init(buf, buf_size); /* Put ODP flow. */ @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet, NLA_ALIGN(userdata->nla_len)); } -ofpbuf_set_data(&upcall->packet, -ofpbuf_put(buf, ofpbuf_data(packet), ofpbuf_size(packet))); -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); +upcall->packet = *packet; seq_change(q->seq); error = 0; } else { dp_netdev_count_packet(dp, DP_STAT_LOST); +ofpbuf_delete(packet); error = ENOBUFS; } ovs_mutex_unlock(&q->mutex); @@ -2120,19 +2117,17 @@ dp_execute_cb(void *aux_, struct ofpbuf *packet, break; case OVS_ACTION_ATTR_USERSPACE: { +struct ofpbuf *userspace_packet; const struct nlattr *userdata; userdata = nl_attr_find_nested(a, OVS_USERSPACE_ATTR_USERDATA); +userspace_packet = may_steal ? packet : ofpbuf_clone(packet); -dp_netdev_output_userspace(aux->dp, packet, +dp_netdev_output_userspace(aux->dp, userspace_packet, miniflow_hash_5tuple(aux->key, 0) % aux->dp->n_handlers, DPIF_UC_ACTION, aux->key, userdata); - -if (may_steal) { -ofpbuf_delete(packet); -} break; } -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] INSTALL: Note about compiler atomics support.
On Jun 3, 2014, at 2:03 PM, Ben Pfaff wrote: > On Tue, Jun 03, 2014 at 01:09:11PM -0700, Jarno Rajahalme wrote: >> OVS is slow when compiled with pthreads atomics. Add a generic note >> in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment >> provides additional detail. >> >> Signed-off-by: Jarno Rajahalme > > I think this is only likely to affect Windows (everyone else uses GCC > or Clang) so should we put it in BUILD.Windows? So you think this should not be mentioned in INSTALL? If so, I think the text there should be modified to something like “* GCC 4.x or clang C compiler” instead of the current “A C compiler, such as…”. How about adding the following to BUILD.Windows: diff --git a/BUILD.Windows b/BUILD.Windows index ca0d252..05956ba 100644 --- a/BUILD.Windows +++ b/BUILD.Windows @@ -39,6 +39,10 @@ project from ftp://sourceware.org/pub/pthreads-win32/prebuilt-dll-2-9-1-release to a directory (e.g.: C:/pthread). +OVS currently has no native support for atomics on Windows. Pthreads +are used as a fallback, but some features, such as OVS-RCU are really +slow without native atomics support. + * Get the Open vSwitch sources from either cloning the repo using git or from a distribution tar ball. Jarno ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] INSTALL: Note about compiler atomics support.
May be we should add a new TODO section in BUILD.Windows like: diff --git a/BUILD.Windows b/BUILD.Windows index ca0d252..41e6eab 100644 --- a/BUILD.Windows +++ b/BUILD.Windows @@ -85,3 +85,11 @@ For example, --with-openssl="C:/OpenSSL-Win32" * Run make for the ported executables. + +TODO: + + +* OVS currently has no native support for atomics on Windows. Pthreads +are used as a fallback, but some features, such as OVS-RCU are really +slow without native atomics support. Atomics support for Windows has to +be brought in. On Tue, Jun 3, 2014 at 2:16 PM, Jarno Rajahalme wrote: > > On Jun 3, 2014, at 2:03 PM, Ben Pfaff wrote: > >> On Tue, Jun 03, 2014 at 01:09:11PM -0700, Jarno Rajahalme wrote: >>> OVS is slow when compiled with pthreads atomics. Add a generic note >>> in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment >>> provides additional detail. >>> >>> Signed-off-by: Jarno Rajahalme >> >> I think this is only likely to affect Windows (everyone else uses GCC >> or Clang) so should we put it in BUILD.Windows? > > > So you think this should not be mentioned in INSTALL? If so, I think the text > there should be modified to something like “* GCC 4.x or clang C compiler” > instead of the current “A C compiler, such as…”. > > How about adding the following to BUILD.Windows: > > diff --git a/BUILD.Windows b/BUILD.Windows > index ca0d252..05956ba 100644 > --- a/BUILD.Windows > +++ b/BUILD.Windows > @@ -39,6 +39,10 @@ project from > ftp://sourceware.org/pub/pthreads-win32/prebuilt-dll-2-9-1-release to a > directory (e.g.: C:/pthread). > > +OVS currently has no native support for atomics on Windows. Pthreads > +are used as a fallback, but some features, such as OVS-RCU are really > +slow without native atomics support. > + > * Get the Open vSwitch sources from either cloning the repo using git > or from a distribution tar ball. > > Jarno > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
Hey Pravin, Thanks for the catch here! Turns out the header is already tracked in DPDK with rte_mbuf's buffer address - sizeof(ofpbuf). Thus, I submitted another patch that, in free_dpdk_buf(), always gets this header and uses this to free memory. http://patchwork.openvswitch.org/patch/4375/ Let me know if this patch works. Cheers, Ryan On 6/3/14 10:50 AM, "Pravin Shelar" wrote: >On Tue, Jun 3, 2014 at 10:33 AM, Ryan Wilson wrote: >> When a bridge of datatype type netdev receives a packet, it copies the >> packet from the NIC to a buffer in userspace. Currently, when making >> an upcall, the packet is again copied to the upcall's buffer. However, >> this extra copy is not necessary when the datapath exists in userspace >> as the upcall can directly access the packet data. >> >> This patch eliminates this extra copy of the packet data in most cases. >> In cases where the packet may still be used later by callers of >> dp_netdev_execute_actions, making a copy of the packet data is still >> necessary. >> >> Signed-off-by: Ryan Wilson >> --- >> v2: Addressed Jarno's comment to use direct pointer assignment for >> upcall->packet instead of ofpbuf_set(). >> --- >> lib/dpif-netdev.c | 15 +-- >> 1 file changed, 5 insertions(+), 10 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 91c83d6..c89ae20 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct >>ofpbuf *packet, >> miniflow_hash_5tuple(&key.flow, 0) >> % dp->n_handlers, >> DPIF_UC_MISS, &key.flow, NULL); >> -ofpbuf_delete(packet); >> } >> } >> >> @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, >>struct ofpbuf *packet, >> if (userdata) { >> buf_size += NLA_ALIGN(userdata->nla_len); >> } >> -buf_size += ofpbuf_size(packet); >> ofpbuf_init(buf, buf_size); >> >> /* Put ODP flow. */ >> @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev >>*dp, struct ofpbuf *packet, >> >>NLA_ALIGN(userdata->nla_len)); >> } >> >> -ofpbuf_set_data(&upcall->packet, >> -ofpbuf_put(buf, ofpbuf_data(packet), >>ofpbuf_size(packet))); >> -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); >> +upcall->packet = *packet; >> > >This would not work with DPDK. ofpbuf from dpdk is special case where >ofpbuf and data are allocated from same memory object. Therefore >moving ofpbuf->data is nontrivial. > >To make it work we need atleast following covered. >1. Define flag for source of ofpbuf header. So that we can track >header and data independently. >2. Fix dpdk ofpbuf destructor to free correct dpdk memory object. > >Thanks, >Pravin. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] INSTALL: Note about compiler atomics support.
This better, Acked-by: Jarno Rajahalme On Jun 3, 2014, at 2:20 PM, Gurucharan Shetty wrote: > May be we should add a new TODO section in BUILD.Windows like: > > diff --git a/BUILD.Windows b/BUILD.Windows > index ca0d252..41e6eab 100644 > --- a/BUILD.Windows > +++ b/BUILD.Windows > @@ -85,3 +85,11 @@ For example, > --with-openssl="C:/OpenSSL-Win32" > > * Run make for the ported executables. > + > +TODO: > + > + > +* OVS currently has no native support for atomics on Windows. Pthreads > +are used as a fallback, but some features, such as OVS-RCU are really > +slow without native atomics support. Atomics support for Windows has to > +be brought in. > > On Tue, Jun 3, 2014 at 2:16 PM, Jarno Rajahalme wrote: >> >> On Jun 3, 2014, at 2:03 PM, Ben Pfaff wrote: >> >>> On Tue, Jun 03, 2014 at 01:09:11PM -0700, Jarno Rajahalme wrote: OVS is slow when compiled with pthreads atomics. Add a generic note in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment provides additional detail. Signed-off-by: Jarno Rajahalme >>> >>> I think this is only likely to affect Windows (everyone else uses GCC >>> or Clang) so should we put it in BUILD.Windows? >> >> >> So you think this should not be mentioned in INSTALL? If so, I think the >> text there should be modified to something like “* GCC 4.x or clang C >> compiler” instead of the current “A C compiler, such as…”. >> >> How about adding the following to BUILD.Windows: >> >> diff --git a/BUILD.Windows b/BUILD.Windows >> index ca0d252..05956ba 100644 >> --- a/BUILD.Windows >> +++ b/BUILD.Windows >> @@ -39,6 +39,10 @@ project from >> ftp://sourceware.org/pub/pthreads-win32/prebuilt-dll-2-9-1-release to a >> directory (e.g.: C:/pthread). >> >> +OVS currently has no native support for atomics on Windows. Pthreads >> +are used as a fallback, but some features, such as OVS-RCU are really >> +slow without native atomics support. >> + >> * Get the Open vSwitch sources from either cloning the repo using git >> or from a distribution tar ball. >> >> Jarno >> >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] INSTALL: Note about compiler atomics support.
On Tue, Jun 03, 2014 at 02:16:29PM -0700, Jarno Rajahalme wrote: > > On Jun 3, 2014, at 2:03 PM, Ben Pfaff wrote: > > > On Tue, Jun 03, 2014 at 01:09:11PM -0700, Jarno Rajahalme wrote: > >> OVS is slow when compiled with pthreads atomics. Add a generic note > >> in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment > >> provides additional detail. > >> > >> Signed-off-by: Jarno Rajahalme > > > > I think this is only likely to affect Windows (everyone else uses GCC > > or Clang) so should we put it in BUILD.Windows? > > > So you think this should not be mentioned in INSTALL? If so, I think > the text there should be modified to something like ?* GCC 4.x or > clang C compiler? instead of the current ?A C compiler, such as??. I phrased that badly. I think we should mention it *additionally* in BUILD.Windows, not *only* there. I lean toward recommending GCC or Clang, except MSVC 2013 for Windows, but not claim that those are the only possibilities. > How about adding the following to BUILD.Windows: That looks good, thanks. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v3 1/2] netdev-dpdk: Free DPDK buffer using mbuf's buffer address.
Previously, DPDK buffers were freed using 'struct ofpbuf' pointers passed to free_dpdk_buf(). However, if a new ofpbuf allocated on the stack shares data with the old ofpbuf and the new ofpbuf is to be freed with free_dpdk_buf(), this will cause an error. This patch gets the beginning of the allocated buffer from 'struct rte_mbuf'. This allows ofpbufs to share packet data when using DPDK. Signed-off-by: Ryan Wilson --- lib/netdev-dpdk.c |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index ba41d2e..155cbf0 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -206,8 +206,9 @@ void free_dpdk_buf(struct ofpbuf *b) { struct rte_mbuf *pkt = (struct rte_mbuf *) b; +struct rte_mbuf *buf = (char *) pkt->buf_addr - sizeof(struct ofpbuf); -rte_mempool_put(pkt->pool, pkt); +rte_mempool_put(pkt->pool, buf); } static void -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] INSTALL: Note about compiler atomics support.
Me too. Acked-by: Ben Pfaff On Tue, Jun 03, 2014 at 02:22:48PM -0700, Jarno Rajahalme wrote: > This better, > > Acked-by: Jarno Rajahalme > > On Jun 3, 2014, at 2:20 PM, Gurucharan Shetty wrote: > > > May be we should add a new TODO section in BUILD.Windows like: > > > > diff --git a/BUILD.Windows b/BUILD.Windows > > index ca0d252..41e6eab 100644 > > --- a/BUILD.Windows > > +++ b/BUILD.Windows > > @@ -85,3 +85,11 @@ For example, > > --with-openssl="C:/OpenSSL-Win32" > > > > * Run make for the ported executables. > > + > > +TODO: > > + > > + > > +* OVS currently has no native support for atomics on Windows. Pthreads > > +are used as a fallback, but some features, such as OVS-RCU are really > > +slow without native atomics support. Atomics support for Windows has to > > +be brought in. > > > > On Tue, Jun 3, 2014 at 2:16 PM, Jarno Rajahalme > > wrote: > >> > >> On Jun 3, 2014, at 2:03 PM, Ben Pfaff wrote: > >> > >>> On Tue, Jun 03, 2014 at 01:09:11PM -0700, Jarno Rajahalme wrote: > OVS is slow when compiled with pthreads atomics. Add a generic note > in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment > provides additional detail. > > Signed-off-by: Jarno Rajahalme > >>> > >>> I think this is only likely to affect Windows (everyone else uses GCC > >>> or Clang) so should we put it in BUILD.Windows? > >> > >> > >> So you think this should not be mentioned in INSTALL? If so, I think the > >> text there should be modified to something like ?* GCC 4.x or clang C > >> compiler? instead of the current ?A C compiler, such as??. > >> > >> How about adding the following to BUILD.Windows: > >> > >> diff --git a/BUILD.Windows b/BUILD.Windows > >> index ca0d252..05956ba 100644 > >> --- a/BUILD.Windows > >> +++ b/BUILD.Windows > >> @@ -39,6 +39,10 @@ project from > >> ftp://sourceware.org/pub/pthreads-win32/prebuilt-dll-2-9-1-release to a > >> directory (e.g.: C:/pthread). > >> > >> +OVS currently has no native support for atomics on Windows. Pthreads > >> +are used as a fallback, but some features, such as OVS-RCU are really > >> +slow without native atomics support. > >> + > >> * Get the Open vSwitch sources from either cloning the repo using git > >> or from a distribution tar ball. > >> > >> Jarno > >> > >> ___ > >> dev mailing list > >> dev@openvswitch.org > >> http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [slow path HASH and RECIRC action v2] dpif: Fix slow action handling for DP_HASH and RECIRC
In case DP_HASH and RECIRC actions need to be executed in slow path, current implementation simply don't handle them -- vswitchd simply crashes. This patch fixes them by supply an implementation for them. RECIRC will be handled by the datapath, same as the output action. DP_HASH, on the other hand, is handled in the user space. Although the resulting hash values may not match those computed by the datapath, it is less expensive; current use case (bonding) does not require a strict match to work properly. Reported-by: YAMAMOTO Takashi Signed-off-by: Andy Zhou --- v1->v2: Addresses Ben's review feedback fix the comment on why execute dp_hash in user space. Add assert for unknown hash algorithm specification. --- lib/dpif.c| 4 ++-- lib/odp-execute.c | 23 ++- 2 files changed, 24 insertions(+), 3 deletions(-) diff --git a/lib/dpif.c b/lib/dpif.c index 84dba28..ac73be1 100644 --- a/lib/dpif.c +++ b/lib/dpif.c @@ -1069,6 +1069,7 @@ dpif_execute_helper_cb(void *aux_, struct ofpbuf *packet, switch ((enum ovs_action_attr)type) { case OVS_ACTION_ATTR_OUTPUT: case OVS_ACTION_ATTR_USERSPACE: +case OVS_ACTION_ATTR_RECIRC: execute.actions = action; execute.actions_len = NLA_ALIGN(action->nla_len); execute.packet = packet; @@ -1077,6 +1078,7 @@ dpif_execute_helper_cb(void *aux_, struct ofpbuf *packet, aux->error = aux->dpif->dpif_class->execute(aux->dpif, &execute); break; +case OVS_ACTION_ATTR_HASH: case OVS_ACTION_ATTR_PUSH_VLAN: case OVS_ACTION_ATTR_POP_VLAN: case OVS_ACTION_ATTR_PUSH_MPLS: @@ -1084,8 +1086,6 @@ dpif_execute_helper_cb(void *aux_, struct ofpbuf *packet, case OVS_ACTION_ATTR_SET: case OVS_ACTION_ATTR_SAMPLE: case OVS_ACTION_ATTR_UNSPEC: -case OVS_ACTION_ATTR_RECIRC: -case OVS_ACTION_ATTR_HASH: case __OVS_ACTION_ATTR_MAX: OVS_NOT_REACHED(); } diff --git a/lib/odp-execute.c b/lib/odp-execute.c index 12ad679..cc18536 100644 --- a/lib/odp-execute.c +++ b/lib/odp-execute.c @@ -26,6 +26,7 @@ #include "ofpbuf.h" #include "odp-util.h" #include "packets.h" +#include "flow.h" #include "unaligned.h" #include "util.h" @@ -208,7 +209,6 @@ odp_execute_actions__(void *dp, struct ofpbuf *packet, bool steal, case OVS_ACTION_ATTR_OUTPUT: case OVS_ACTION_ATTR_USERSPACE: case OVS_ACTION_ATTR_RECIRC: -case OVS_ACTION_ATTR_HASH: if (dp_execute_action) { /* Allow 'dp_execute_action' to steal the packet data if we do * not need it any more. */ @@ -219,6 +219,27 @@ odp_execute_actions__(void *dp, struct ofpbuf *packet, bool steal, } break; +case OVS_ACTION_ATTR_HASH: { +const struct ovs_action_hash *hash_act = nl_attr_get(a); + +/* Calculate a hash value directly. This might not match the + * value computed by the datapath, but it is much less expensive, + * and the current use case (bonding) does not require a strict + * match to work properly. */ +if (hash_act->hash_alg == OVS_HASH_ALG_L4) { +struct flow flow; +uint32_t hash; + +flow_extract(packet, md, &flow); +hash = flow_hash_5tuple(&flow, hash_act->hash_basis); +md->dp_hash = hash ? hash : 1; +} else { +/* Assert on unknown hash algorithm. */ +OVS_NOT_REACHED(); +} +break; +} + case OVS_ACTION_ATTR_PUSH_VLAN: { const struct ovs_action_push_vlan *vlan = nl_attr_get(a); eth_push_vlan(packet, htons(ETH_TYPE_VLAN), vlan->vlan_tci); -- 1.9.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] dpif: Fix slow action handling for DP_HASH and RECIRC
Ben, thanks for the review. Both are valid. I will send out a V2 soon that should address both points. On Thu, May 29, 2014 at 2:02 PM, Ben Pfaff wrote: > On Wed, May 28, 2014 at 05:00:48PM -0700, Andy Zhou wrote: >> In case DP_HASH and RECIRC actions need to be executed in slow path, >> current implementation simply don't handle them -- vswitchd simply >> crashes. This patch fixes them by supply an implementation for them. >> >> RECIRC will be handled by the datapath, same as the output action. >> >> DP_HASH, on the other hand, is handled in the user space. Although the >> resulting hash values may not match those computed by the datapath, it >> is less expensive; current use case (bonding) does not require a strict >> match to work properly. >> >> Reported-by: YAMAMOTO Takashi >> Signed-off-by: Andy Zhou > > I would adjust the comment in odp_execute_actions__() to match the > changelog better, maybe: > /* Calculate a hash value directly. This might not match the > * value computed by the datapath, but it is much less expensive, > * and the current use case (bonding) does not require a strict > * match to work properly. */ > > Should we ovs_assert() that md->dp_hash is OVS_HASH_ALG_L4? > Presumably we won't ever create an ODP action that uses a hash > algorithm we don't implement, and if we do we ought to find out about > it. > > Thanks, > > Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] dpif: Fix slow action handling for DP_HASH and RECIRC
Thanks! On Tue, Jun 03, 2014 at 02:30:12PM -0700, Andy Zhou wrote: > Ben, thanks for the review. Both are valid. > > I will send out a V2 soon that should address both points. > > On Thu, May 29, 2014 at 2:02 PM, Ben Pfaff wrote: > > On Wed, May 28, 2014 at 05:00:48PM -0700, Andy Zhou wrote: > >> In case DP_HASH and RECIRC actions need to be executed in slow path, > >> current implementation simply don't handle them -- vswitchd simply > >> crashes. This patch fixes them by supply an implementation for them. > >> > >> RECIRC will be handled by the datapath, same as the output action. > >> > >> DP_HASH, on the other hand, is handled in the user space. Although the > >> resulting hash values may not match those computed by the datapath, it > >> is less expensive; current use case (bonding) does not require a strict > >> match to work properly. > >> > >> Reported-by: YAMAMOTO Takashi > >> Signed-off-by: Andy Zhou > > > > I would adjust the comment in odp_execute_actions__() to match the > > changelog better, maybe: > > /* Calculate a hash value directly. This might not match the > > * value computed by the datapath, but it is much less > > expensive, > > * and the current use case (bonding) does not require a strict > > * match to work properly. */ > > > > Should we ovs_assert() that md->dp_hash is OVS_HASH_ALG_L4? > > Presumably we won't ever create an ODP action that uses a hash > > algorithm we don't implement, and if we do we ought to find out about > > it. > > > > Thanks, > > > > Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
On 4 June 2014 05:39, Alex Wang wrote: > I confirm this patch fix the bug, but have a high level concern, > > > Assume another learnt flow with hard_timeout=10 and was never hit after > install, now, if there is frequent change to the flow table and > need_revalidate > is constantly 'true', this learnt flow will never timeout... since we > always > redo the flow_mod during revalidation, > (Correct me if I'm wrong, but this is how I see it:) For this particular case, we will not redo the flow_mod during revalidation. The "may_learn" flag is the same as before. Now, we will create the flow_mod and cache it, but only do the flow_mod if may_learn is true. The "execute_learn" function is perhaps a bit misleading here: It only assembles the flow_mod, doesn't actually execute it. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
You are right, sorry for misreading your patch. Acked-by: Alex Wang On Tue, Jun 3, 2014 at 2:51 PM, Joe Stringer wrote: > On 4 June 2014 05:39, Alex Wang wrote: > >> I confirm this patch fix the bug, but have a high level concern, >> >> >> Assume another learnt flow with hard_timeout=10 and was never hit after >> install, now, if there is frequent change to the flow table and >> need_revalidate >> is constantly 'true', this learnt flow will never timeout... since we >> always >> redo the flow_mod during revalidation, >> > > (Correct me if I'm wrong, but this is how I see it:) > For this particular case, we will not redo the flow_mod during > revalidation. The "may_learn" flag is the same as before. Now, we will > create the flow_mod and cache it, but only do the flow_mod if may_learn is > true. > > The "execute_learn" function is perhaps a bit misleading here: It only > assembles the flow_mod, doesn't actually execute it. > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] ofproto: Remove ofproto_refresh_rule().
Acked-by: Alex Wang On Tue, Jun 3, 2014 at 2:59 AM, Joe Stringer wrote: > The only user of this function was removed in the previous patch, so > remove it. > > Signed-off-by: Joe Stringer > --- > ofproto/ofproto-dpif.c | 12 > ofproto/ofproto-provider.h |2 -- > ofproto/ofproto.c | 41 > - > 3 files changed, 55 deletions(-) > > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c > index 06be234..22ba14e 100644 > --- a/ofproto/ofproto-dpif.c > +++ b/ofproto/ofproto-dpif.c > @@ -364,18 +364,6 @@ ofproto_dpif_flow_mod(struct ofproto_dpif *ofproto, > ofproto_flow_mod(&ofproto->up, fm); > } > > -/* Resets the modified time for 'rule' or an equivalent rule. If 'rule' > is not > - * in the classifier, but an equivalent rule is, unref 'rule' and ref the > new > - * rule. Otherwise if 'rule' is no longer installed in the classifier, > - * reinstall it. > - * > - * Returns the rule whose modified time has been reset. */ > -struct rule_dpif * > -ofproto_dpif_refresh_rule(struct rule_dpif *rule) > -{ > -return rule_dpif_cast(ofproto_refresh_rule(&rule->up)); > -} > - > /* Appends 'pin' to the queue of "packet ins" to be sent to the > controller. > * Takes ownership of 'pin' and pin->packet. */ > void > diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h > index ff539b9..7741044 100644 > --- a/ofproto/ofproto-provider.h > +++ b/ofproto/ofproto-provider.h > @@ -1716,8 +1716,6 @@ BUILD_ASSERT_DECL(OFPROTO_POSTPONE < OFPERR_OFS); > > int ofproto_flow_mod(struct ofproto *, struct ofputil_flow_mod *) > OVS_EXCLUDED(ofproto_mutex); > -struct rule *ofproto_refresh_rule(struct rule *rule) > -OVS_EXCLUDED(ofproto_mutex); > void ofproto_add_flow(struct ofproto *, const struct match *, >unsigned int priority, >const struct ofpact *ofpacts, size_t ofpacts_len) > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c > index 24a709b..f34cfde 100644 > --- a/ofproto/ofproto.c > +++ b/ofproto/ofproto.c > @@ -1978,47 +1978,6 @@ ofproto_flow_mod(struct ofproto *ofproto, struct > ofputil_flow_mod *fm) > return handle_flow_mod__(ofproto, NULL, fm, NULL); > } > > -/* Resets the modified time for 'rule' or an equivalent rule. If 'rule' > is not > - * in the classifier, but an equivalent rule is, unref 'rule' and ref the > new > - * rule. Otherwise if 'rule' is no longer installed in the classifier, > - * reinstall it. > - * > - * Returns the rule whose modified time has been reset. */ > -struct rule * > -ofproto_refresh_rule(struct rule *rule) > -{ > -const struct oftable *table = &rule->ofproto->tables[rule->table_id]; > -const struct cls_rule *cr = &rule->cr; > -struct rule *r; > - > -/* do_add_flow() requires that the rule is not installed. We lock the > - * ofproto_mutex here so that another thread cannot add the flow > before > - * we get a chance to add it.*/ > -ovs_mutex_lock(&ofproto_mutex); > - > -fat_rwlock_rdlock(&table->cls.rwlock); > -r = rule_from_cls_rule(classifier_find_rule_exactly(&table->cls, cr)); > -if (r != rule) { > -ofproto_rule_ref(r); > -} > -fat_rwlock_unlock(&table->cls.rwlock); > - > -if (!r) { > -do_add_flow(rule->ofproto, NULL, NULL, 0, rule); > -} else if (r != rule) { > -ofproto_rule_unref(rule); > -rule = r; > -} > -ovs_mutex_unlock(&ofproto_mutex); > - > -/* Refresh the modified time for the rule. */ > -ovs_mutex_lock(&rule->mutex); > -rule->modified = MAX(rule->modified, time_msec()); > -ovs_mutex_unlock(&rule->mutex); > - > -return rule; > -} > - > /* Searches for a rule with matching criteria exactly equal to 'target' in > * ofproto's table 0 and, if it finds one, deletes it. > * > -- > 1.7.10.4 > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
On Wed, Jun 04, 2014 at 09:51:16AM +1200, Joe Stringer wrote: > On 4 June 2014 05:39, Alex Wang wrote: > > > I confirm this patch fix the bug, but have a high level concern, > > > > > > Assume another learnt flow with hard_timeout=10 and was never hit after > > install, now, if there is frequent change to the flow table and > > need_revalidate > > is constantly 'true', this learnt flow will never timeout... since we > > always > > redo the flow_mod during revalidation, > > > > (Correct me if I'm wrong, but this is how I see it:) > For this particular case, we will not redo the flow_mod during > revalidation. The "may_learn" flag is the same as before. Now, we will > create the flow_mod and cache it, but only do the flow_mod if may_learn is > true. > > The "execute_learn" function is perhaps a bit misleading here: It only > assembles the flow_mod, doesn't actually execute it. I think this is logically correct. Have you confirmed that it fixes the original bug? One concern I have is memory consumption. struct ofputil_flow_mod is 452 bytes (which we use and initialize), plus we allocate another 1024 bytes for actions. I don't see an easy way to get rid of the former, but I think that the actions would normally be pretty small. I think we'd probably be better off with an initial stub for the latter that we then xmemdup() to save. Alternatively, if you changed the starting size of the actions ofpbuf to just, say, 64 bytes, I think that would save a lot of memory in the common case. Assuming it fixes the bug, Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
Yes, I have confirmed that this patch fix the bug. On Tue, Jun 3, 2014 at 3:11 PM, Ben Pfaff wrote: > On Wed, Jun 04, 2014 at 09:51:16AM +1200, Joe Stringer wrote: > > On 4 June 2014 05:39, Alex Wang wrote: > > > > > I confirm this patch fix the bug, but have a high level concern, > > > > > > > > > Assume another learnt flow with hard_timeout=10 and was never hit after > > > install, now, if there is frequent change to the flow table and > > > need_revalidate > > > is constantly 'true', this learnt flow will never timeout... since we > > > always > > > redo the flow_mod during revalidation, > > > > > > > (Correct me if I'm wrong, but this is how I see it:) > > For this particular case, we will not redo the flow_mod during > > revalidation. The "may_learn" flag is the same as before. Now, we will > > create the flow_mod and cache it, but only do the flow_mod if may_learn > is > > true. > > > > The "execute_learn" function is perhaps a bit misleading here: It only > > assembles the flow_mod, doesn't actually execute it. > > I think this is logically correct. Have you confirmed that it fixes > the original bug? > > One concern I have is memory consumption. struct ofputil_flow_mod is > 452 bytes (which we use and initialize), plus we allocate another 1024 > bytes for actions. I don't see an easy way to get rid of the former, > but I think that the actions would normally be pretty small. I think > we'd probably be better off with an initial stub for the latter that > we then xmemdup() to save. Alternatively, if you changed the starting > size of the actions ofpbuf to just, say, 64 bytes, I think that would > save a lot of memory in the common case. > > Assuming it fixes the bug, > Acked-by: Ben Pfaff > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
I'm not against this idea; It seems correct to me, to keep the cache the same even if we need to revalidate, because we will throw it away if revalidation fails anyway. We already had a discussion about using the xcache even when revalidation is required:- http://openvswitch.org/pipermail/dev/2014-May/040440.html I wonder if we need to extend that, so that rather than clearing the xcache in the revalidation case, we keep it (and make sure that it is not overwritten by the xlate_actions__). Regardless, I would feel more comfortable with this patch going in, as the cases of creating xcache and learning are separate, and not handled correctly in that function currently. On 4 June 2014 08:26, Alex Wang wrote: > After discussing the issue with few people, > > I think we should keep the old_xcache during the full revalidation. > > And in xlate_learn_action(), if may_learn is false, we should check if the > learned rule is in the old_xcache, if it is, we should copy the XC_LEARN > entry > to the new_xcache list (but we do not refresh the learnt rule again, since > may_learn to false). > > Thoughts? > > Alex Wang, > > > On Tue, Jun 3, 2014 at 10:39 AM, Alex Wang wrote: > >> I confirm this patch fix the bug, but have a high level concern, >> >> >> Assume another learnt flow with hard_timeout=10 and was never hit after >> install, now, if there is frequent change to the flow table and >> need_revalidate >> is constantly 'true', this learnt flow will never timeout... since we >> always >> redo the flow_mod during revalidation, >> >> Let's discuss further on this scenario, I'm thinking about solutions, >> >> Thanks, >> Alex Wang, >> >> >> >> >> On Tue, Jun 3, 2014 at 2:59 AM, Joe Stringer >> wrote: >> >>> Caching the results of xlate_learn was previously dependent on the state >>> of the 'may_learn' flag. This meant that if the caller did not specify >>> that this flow may learn, then a learn entry would not be cached. >>> However, the xlate_cache tends to be used on a recurring basis, so >>> failing to cache the learn entry can provide unexpected behaviour later >>> on, particularly in corner cases. >>> >>> Such a corner case occurred previously:- >>> * Revalidation was requested. >>> * A flow with a learn action was dumped. >>> * The flow had no packets. >>> * The flow's corresponding xcache was cleared, and the flow revalidated. >>> * The flow went on to receive packets after the xcache is re-created. >>> >>> In this case, the xcache would be re-created, but would not refresh the >>> timeouts on the learnt flow until the next time it was cleared, even if >>> it received more traffic. This would cause flows to time out sooner than >>> expected. Symptoms of this bug may include unexpected forwarding >>> behaviour or extraneous statistics being attributed to the wrong flow. >>> >>> This patch fixes the issue by caching the entire flow_mod, including >>> actions, upon translating an xlate_learn action. This is used to perform >>> a flow_mod from scratch with the original flow, rather than simply >>> refreshing the rule that was created during the creation of the xcache. >>> >>> Bug #1252997. >>> >>> Reported-by: Scott Hendricks >>> Signed-off-by: Joe Stringer >>> --- >>> ofproto/ofproto-dpif-xlate.c | 58 >>> ++ >>> 1 file changed, 30 insertions(+), 28 deletions(-) >>> >>> diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c >>> index 068d54f..e8f3dbf 100644 >>> --- a/ofproto/ofproto-dpif-xlate.c >>> +++ b/ofproto/ofproto-dpif-xlate.c >>> @@ -265,7 +265,9 @@ struct xc_entry { >>> uint16_t vid; >>> } bond; >>> struct { >>> -struct rule_dpif *rule; >>> +struct ofproto_dpif *ofproto; >>> +struct ofputil_flow_mod *fm; >>> +struct ofpbuf *ofpacts; >>> } learn; >>> struct { >>> struct ofproto_dpif *ofproto; >>> @@ -2977,34 +2979,38 @@ xlate_bundle_action(struct xlate_ctx *ctx, >>> } >>> >>> static void >>> -xlate_learn_action(struct xlate_ctx *ctx, >>> - const struct ofpact_learn *learn) >>> +xlate_learn_action__(struct xlate_ctx *ctx, const struct ofpact_learn >>> *learn, >>> + struct ofputil_flow_mod *fm, struct ofpbuf >>> *ofpacts) >>> { >>> -uint64_t ofpacts_stub[1024 / 8]; >>> -struct ofputil_flow_mod fm; >>> -struct ofpbuf ofpacts; >>> +learn_execute(learn, &ctx->xin->flow, fm, ofpacts); >>> +if (ctx->xin->may_learn) { >>> +ofproto_dpif_flow_mod(ctx->xbridge->ofproto, fm); >>> +} >>> +} >>> >>> +static void >>> +xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn >>> *learn) >>> +{ >>> ctx->xout->has_learn = true; >>> - >>> learn_mask(learn, &ctx->xout->wc); >>> >>> -if (!ctx->xin->may_learn) { >>> -return; >>> -} >>> - >>> -ofpbuf_use_stub(&ofpacts, ofpacts_stub, sizeof ofpacts_stub); >>> -learn_execute(learn, &ctx-
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
OK. My memory consumption comment is just quibbling. I'd like to get in the fix, then we can reduce the memory consumption if need be. On Tue, Jun 03, 2014 at 03:14:01PM -0700, Alex Wang wrote: > Yes, I have confirmed that this patch fix the bug. > > > On Tue, Jun 3, 2014 at 3:11 PM, Ben Pfaff wrote: > > > On Wed, Jun 04, 2014 at 09:51:16AM +1200, Joe Stringer wrote: > > > On 4 June 2014 05:39, Alex Wang wrote: > > > > > > > I confirm this patch fix the bug, but have a high level concern, > > > > > > > > > > > > Assume another learnt flow with hard_timeout=10 and was never hit after > > > > install, now, if there is frequent change to the flow table and > > > > need_revalidate > > > > is constantly 'true', this learnt flow will never timeout... since we > > > > always > > > > redo the flow_mod during revalidation, > > > > > > > > > > (Correct me if I'm wrong, but this is how I see it:) > > > For this particular case, we will not redo the flow_mod during > > > revalidation. The "may_learn" flag is the same as before. Now, we will > > > create the flow_mod and cache it, but only do the flow_mod if may_learn > > is > > > true. > > > > > > The "execute_learn" function is perhaps a bit misleading here: It only > > > assembles the flow_mod, doesn't actually execute it. > > > > I think this is logically correct. Have you confirmed that it fixes > > the original bug? > > > > One concern I have is memory consumption. struct ofputil_flow_mod is > > 452 bytes (which we use and initialize), plus we allocate another 1024 > > bytes for actions. I don't see an easy way to get rid of the former, > > but I think that the actions would normally be pretty small. I think > > we'd probably be better off with an initial stub for the latter that > > we then xmemdup() to save. Alternatively, if you changed the starting > > size of the actions ofpbuf to just, say, 64 bytes, I think that would > > save a lot of memory in the common case. > > > > Assuming it fixes the bug, > > Acked-by: Ben Pfaff > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
Would this incremental alleviate your concerns? diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index e8f3dbf..aa8cea1 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -3000,11 +3000,11 @@ xlate_learn_action(struct xlate_ctx *ctx, const struct ofpact_learn *learn) entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); entry->u.learn.ofproto = ctx->xbridge->ofproto; entry->u.learn.fm = xmalloc(sizeof *entry->u.learn.fm); -entry->u.learn.ofpacts = ofpbuf_new(1024); +entry->u.learn.ofpacts = ofpbuf_new(64); xlate_learn_action__(ctx, learn, entry->u.learn.fm, entry->u.learn.ofpacts); } else if (ctx->xin->may_learn) { -uint64_t ofpacts_stub[1024 / 8]; +uint64_t ofpacts_stub[64 / 8]; struct ofputil_flow_mod fm; struct ofpbuf ofpacts; On 4 June 2014 10:11, Ben Pfaff wrote: > On Wed, Jun 04, 2014 at 09:51:16AM +1200, Joe Stringer wrote: > > On 4 June 2014 05:39, Alex Wang wrote: > > > > > I confirm this patch fix the bug, but have a high level concern, > > > > > > > > > Assume another learnt flow with hard_timeout=10 and was never hit after > > > install, now, if there is frequent change to the flow table and > > > need_revalidate > > > is constantly 'true', this learnt flow will never timeout... since we > > > always > > > redo the flow_mod during revalidation, > > > > > > > (Correct me if I'm wrong, but this is how I see it:) > > For this particular case, we will not redo the flow_mod during > > revalidation. The "may_learn" flag is the same as before. Now, we will > > create the flow_mod and cache it, but only do the flow_mod if may_learn > is > > true. > > > > The "execute_learn" function is perhaps a bit misleading here: It only > > assembles the flow_mod, doesn't actually execute it. > > I think this is logically correct. Have you confirmed that it fixes > the original bug? > > One concern I have is memory consumption. struct ofputil_flow_mod is > 452 bytes (which we use and initialize), plus we allocate another 1024 > bytes for actions. I don't see an easy way to get rid of the former, > but I think that the actions would normally be pretty small. I think > we'd probably be better off with an initial stub for the latter that > we then xmemdup() to save. Alternatively, if you changed the starting > size of the actions ofpbuf to just, say, 64 bytes, I think that would > save a lot of memory in the common case. > > Assuming it fixes the bug, > Acked-by: Ben Pfaff > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
On Wed, Jun 04, 2014 at 10:30:13AM +1200, Joe Stringer wrote: > Would this incremental alleviate your concerns? > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index e8f3dbf..aa8cea1 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -3000,11 +3000,11 @@ xlate_learn_action(struct xlate_ctx *ctx, const > struct ofpact_learn *learn) > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); > entry->u.learn.ofproto = ctx->xbridge->ofproto; > entry->u.learn.fm = xmalloc(sizeof *entry->u.learn.fm); This is good: > -entry->u.learn.ofpacts = ofpbuf_new(1024); > +entry->u.learn.ofpacts = ofpbuf_new(64); > xlate_learn_action__(ctx, learn, entry->u.learn.fm, > entry->u.learn.ofpacts); > } else if (ctx->xin->may_learn) { I'd leave this out, though, since 1024 bytes of stack space doesn't really cost anything: > -uint64_t ofpacts_stub[1024 / 8]; > +uint64_t ofpacts_stub[64 / 8]; > struct ofputil_flow_mod fm; > struct ofpbuf ofpacts; Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
OK, thanks. I'll push this patchset to master and branch-2.3 soon. On 4 June 2014 10:33, Ben Pfaff wrote: > On Wed, Jun 04, 2014 at 10:30:13AM +1200, Joe Stringer wrote: > > Would this incremental alleviate your concerns? > > > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > > index e8f3dbf..aa8cea1 100644 > > --- a/ofproto/ofproto-dpif-xlate.c > > +++ b/ofproto/ofproto-dpif-xlate.c > > @@ -3000,11 +3000,11 @@ xlate_learn_action(struct xlate_ctx *ctx, const > > struct ofpact_learn *learn) > > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); > > entry->u.learn.ofproto = ctx->xbridge->ofproto; > > entry->u.learn.fm = xmalloc(sizeof *entry->u.learn.fm); > > This is good: > > > -entry->u.learn.ofpacts = ofpbuf_new(1024); > > +entry->u.learn.ofpacts = ofpbuf_new(64); > > xlate_learn_action__(ctx, learn, entry->u.learn.fm, > > entry->u.learn.ofpacts); > > } else if (ctx->xin->may_learn) { > > I'd leave this out, though, since 1024 bytes of stack space doesn't > really cost anything: > > > -uint64_t ofpacts_stub[1024 / 8]; > > +uint64_t ofpacts_stub[64 / 8]; > > struct ofputil_flow_mod fm; > > struct ofpbuf ofpacts; > > Thanks, > > Ben. > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
Thank you! Also many thanks to Alex. On Wed, Jun 04, 2014 at 10:34:44AM +1200, Joe Stringer wrote: > OK, thanks. I'll push this patchset to master and branch-2.3 soon. > > > On 4 June 2014 10:33, Ben Pfaff wrote: > > > On Wed, Jun 04, 2014 at 10:30:13AM +1200, Joe Stringer wrote: > > > Would this incremental alleviate your concerns? > > > > > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > > > index e8f3dbf..aa8cea1 100644 > > > --- a/ofproto/ofproto-dpif-xlate.c > > > +++ b/ofproto/ofproto-dpif-xlate.c > > > @@ -3000,11 +3000,11 @@ xlate_learn_action(struct xlate_ctx *ctx, const > > > struct ofpact_learn *learn) > > > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); > > > entry->u.learn.ofproto = ctx->xbridge->ofproto; > > > entry->u.learn.fm = xmalloc(sizeof *entry->u.learn.fm); > > > > This is good: > > > > > -entry->u.learn.ofpacts = ofpbuf_new(1024); > > > +entry->u.learn.ofpacts = ofpbuf_new(64); > > > xlate_learn_action__(ctx, learn, entry->u.learn.fm, > > > entry->u.learn.ofpacts); > > > } else if (ctx->xin->may_learn) { > > > > I'd leave this out, though, since 1024 bytes of stack space doesn't > > really cost anything: > > > > > -uint64_t ofpacts_stub[1024 / 8]; > > > +uint64_t ofpacts_stub[64 / 8]; > > > struct ofputil_flow_mod fm; > > > struct ofpbuf ofpacts; > > > > Thanks, > > > > Ben. > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2.58] datapath: Add basic MPLS support to kernel
On Mon, Jun 2, 2014 at 9:04 PM, Simon Horman wrote: > Hi Jesse, > > thanks for your feedback. > > On Mon, Jun 02, 2014 at 05:58:10PM -0700, Jesse Gross wrote: >> On Sun, May 25, 2014 at 5:22 PM, Simon Horman wrote: >> > diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c >> > index 803a94c..8ce596c 100644 >> > --- a/datapath/flow_netlink.c >> > +++ b/datapath/flow_netlink.c >> > + case OVS_ACTION_ATTR_POP_MPLS: >> > + if (!eth_p_mpls(eth_type)) >> > + return -EINVAL; >> >> Should this also take into account the VLAN tag? It's really part of >> the EtherType although it has been stripped out here. Actually, maybe >> it's better to not track the vlan_tci separately at all during >> validation but just fold it into the EtherType. >> >> The practical implication of this is that you wouldn't be able to pop >> out from underneath a VLAN. This may be a good thing if we are trying >> to avoid tag order issues - after all, you can't push underneath a >> VLAN either. I'm not sure what effects this has on the need to track >> mac_len, if any. > > My thinking is that the ordering problem only surfaces in relation > to push MPLS actions where should it go in relation to VLAN tags. > For pop actions it seems to me that the outermost tag should be removed > regardless of its position in relation to other tags. > > So I think that the code above is safe. Though now you mention > it I do notice that it only allows pop MPLS if there is at most > one VLAN tag present. > > That said, I would not mind particularly disabling pop MPLS in the > presence of VLAN tags. At the very least it is related to the > painful issue of tag ordering. I agree that it is safe but my thought was the it avoids a number of potential corner cases such as: * Difference between push and pop underneath vlan tags. * Pop with multiple vlan tags * Differences with varying EtherTypes used for vlans > I explored your idea of tracking only eth_type rather than both > it and vlan_tci. I did this by adding the following logic to > ovs_nla_copy_actions(). > > if (key->eth.tci & htons(VLAN_TAG_PRESENT)) > eth_type = htons(ETH_P_8021Q); > else > eth_type = key->eth.type; > > I then updated the usage of eth_type in ovs_nla_copy_actions__() accordingly. > One problem that I have run into is what to do about pop VLAN. > > I don't believe its possible to know what the new eth type is. > This makes subsequent checks of the eth type for validate_set() > a little awkward. And seems to indicate that an extra parameter would > be needed. > > For this reason I am inclined to leave the eth_type and vlan_tci > parameters in place. In this case there is no problem with pop VLAN > as the ether type inside the VLAN tag should be the value of eth_type. Can't we populate eth_type with the EtherType from the flow key in pop_vlan? This doesn't provide us with the ability to look arbitrarily deep into the packet but it should at least retain the functionality that we have today. >> Otherwise, I'm happy with this. I think that we need to conclude the >> discussion on the other patch and update this appropriately first. > > Yes, lets get that sorted out. > > Assuming the other patch is accepted do you want me to increase the > coverage of the compatibility code (in this patch) up to whichever version > of the kernel the other patch is included in? It seems logical to me but I > do not have strong feelings about it. Yes, I think that probably makes sense. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ofproto-dpif-xlate: Cache full flowmod for learning.
Yes, Alex has helped a lot with this. Thanks Alex! On 4 June 2014 10:36, Ben Pfaff wrote: > Thank you! Also many thanks to Alex. > > On Wed, Jun 04, 2014 at 10:34:44AM +1200, Joe Stringer wrote: > > OK, thanks. I'll push this patchset to master and branch-2.3 soon. > > > > > > On 4 June 2014 10:33, Ben Pfaff wrote: > > > > > On Wed, Jun 04, 2014 at 10:30:13AM +1200, Joe Stringer wrote: > > > > Would this incremental alleviate your concerns? > > > > > > > > diff --git a/ofproto/ofproto-dpif-xlate.c > b/ofproto/ofproto-dpif-xlate.c > > > > index e8f3dbf..aa8cea1 100644 > > > > --- a/ofproto/ofproto-dpif-xlate.c > > > > +++ b/ofproto/ofproto-dpif-xlate.c > > > > @@ -3000,11 +3000,11 @@ xlate_learn_action(struct xlate_ctx *ctx, > const > > > > struct ofpact_learn *learn) > > > > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); > > > > entry->u.learn.ofproto = ctx->xbridge->ofproto; > > > > entry->u.learn.fm = xmalloc(sizeof *entry->u.learn.fm); > > > > > > This is good: > > > > > > > -entry->u.learn.ofpacts = ofpbuf_new(1024); > > > > +entry->u.learn.ofpacts = ofpbuf_new(64); > > > > xlate_learn_action__(ctx, learn, entry->u.learn.fm, > > > > entry->u.learn.ofpacts); > > > > } else if (ctx->xin->may_learn) { > > > > > > I'd leave this out, though, since 1024 bytes of stack space doesn't > > > really cost anything: > > > > > > > -uint64_t ofpacts_stub[1024 / 8]; > > > > +uint64_t ofpacts_stub[64 / 8]; > > > > struct ofputil_flow_mod fm; > > > > struct ofpbuf ofpacts; > > > > > > Thanks, > > > > > > Ben. > > > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] ofproto: Remove ofproto_refresh_rule().
Thanks, I pushed this series to master and branch-2.3. On 4 June 2014 10:06, Alex Wang wrote: > Acked-by: Alex Wang > > > On Tue, Jun 3, 2014 at 2:59 AM, Joe Stringer > wrote: > >> The only user of this function was removed in the previous patch, so >> remove it. >> >> Signed-off-by: Joe Stringer >> --- >> ofproto/ofproto-dpif.c | 12 >> ofproto/ofproto-provider.h |2 -- >> ofproto/ofproto.c | 41 >> - >> 3 files changed, 55 deletions(-) >> >> diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c >> index 06be234..22ba14e 100644 >> --- a/ofproto/ofproto-dpif.c >> +++ b/ofproto/ofproto-dpif.c >> @@ -364,18 +364,6 @@ ofproto_dpif_flow_mod(struct ofproto_dpif *ofproto, >> ofproto_flow_mod(&ofproto->up, fm); >> } >> >> -/* Resets the modified time for 'rule' or an equivalent rule. If 'rule' >> is not >> - * in the classifier, but an equivalent rule is, unref 'rule' and ref >> the new >> - * rule. Otherwise if 'rule' is no longer installed in the classifier, >> - * reinstall it. >> - * >> - * Returns the rule whose modified time has been reset. */ >> -struct rule_dpif * >> -ofproto_dpif_refresh_rule(struct rule_dpif *rule) >> -{ >> -return rule_dpif_cast(ofproto_refresh_rule(&rule->up)); >> -} >> - >> /* Appends 'pin' to the queue of "packet ins" to be sent to the >> controller. >> * Takes ownership of 'pin' and pin->packet. */ >> void >> diff --git a/ofproto/ofproto-provider.h b/ofproto/ofproto-provider.h >> index ff539b9..7741044 100644 >> --- a/ofproto/ofproto-provider.h >> +++ b/ofproto/ofproto-provider.h >> @@ -1716,8 +1716,6 @@ BUILD_ASSERT_DECL(OFPROTO_POSTPONE < OFPERR_OFS); >> >> int ofproto_flow_mod(struct ofproto *, struct ofputil_flow_mod *) >> OVS_EXCLUDED(ofproto_mutex); >> -struct rule *ofproto_refresh_rule(struct rule *rule) >> -OVS_EXCLUDED(ofproto_mutex); >> void ofproto_add_flow(struct ofproto *, const struct match *, >>unsigned int priority, >>const struct ofpact *ofpacts, size_t ofpacts_len) >> diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c >> index 24a709b..f34cfde 100644 >> --- a/ofproto/ofproto.c >> +++ b/ofproto/ofproto.c >> @@ -1978,47 +1978,6 @@ ofproto_flow_mod(struct ofproto *ofproto, struct >> ofputil_flow_mod *fm) >> return handle_flow_mod__(ofproto, NULL, fm, NULL); >> } >> >> -/* Resets the modified time for 'rule' or an equivalent rule. If 'rule' >> is not >> - * in the classifier, but an equivalent rule is, unref 'rule' and ref >> the new >> - * rule. Otherwise if 'rule' is no longer installed in the classifier, >> - * reinstall it. >> - * >> - * Returns the rule whose modified time has been reset. */ >> -struct rule * >> -ofproto_refresh_rule(struct rule *rule) >> -{ >> -const struct oftable *table = &rule->ofproto->tables[rule->table_id]; >> -const struct cls_rule *cr = &rule->cr; >> -struct rule *r; >> - >> -/* do_add_flow() requires that the rule is not installed. We lock the >> - * ofproto_mutex here so that another thread cannot add the flow >> before >> - * we get a chance to add it.*/ >> -ovs_mutex_lock(&ofproto_mutex); >> - >> -fat_rwlock_rdlock(&table->cls.rwlock); >> -r = rule_from_cls_rule(classifier_find_rule_exactly(&table->cls, >> cr)); >> -if (r != rule) { >> -ofproto_rule_ref(r); >> -} >> -fat_rwlock_unlock(&table->cls.rwlock); >> - >> -if (!r) { >> -do_add_flow(rule->ofproto, NULL, NULL, 0, rule); >> -} else if (r != rule) { >> -ofproto_rule_unref(rule); >> -rule = r; >> -} >> -ovs_mutex_unlock(&ofproto_mutex); >> - >> -/* Refresh the modified time for the rule. */ >> -ovs_mutex_lock(&rule->mutex); >> -rule->modified = MAX(rule->modified, time_msec()); >> -ovs_mutex_unlock(&rule->mutex); >> - >> -return rule; >> -} >> - >> /* Searches for a rule with matching criteria exactly equal to 'target' >> in >> * ofproto's table 0 and, if it finds one, deletes it. >> * >> -- >> 1.7.10.4 >> >> > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
I've ran into some unexpected issues while perf testing this, lets hold off on looking at this. I'll submit another patch when I've had all the kinks worked out. Ryan On 6/3/14 2:21 PM, "Ryan Wilson 76511" wrote: >Hey Pravin, > >Thanks for the catch here! Turns out the header is already tracked in DPDK >with rte_mbuf's buffer address - sizeof(ofpbuf). Thus, I submitted another >patch that, in free_dpdk_buf(), always gets this header and uses this to >free memory. > >http://patchwork.openvswitch.org/patch/4375/ > > >Let me know if this patch works. > >Cheers, > >Ryan > >On 6/3/14 10:50 AM, "Pravin Shelar" wrote: > >>On Tue, Jun 3, 2014 at 10:33 AM, Ryan Wilson wrote: >>> When a bridge of datatype type netdev receives a packet, it copies the >>> packet from the NIC to a buffer in userspace. Currently, when making >>> an upcall, the packet is again copied to the upcall's buffer. However, >>> this extra copy is not necessary when the datapath exists in userspace >>> as the upcall can directly access the packet data. >>> >>> This patch eliminates this extra copy of the packet data in most cases. >>> In cases where the packet may still be used later by callers of >>> dp_netdev_execute_actions, making a copy of the packet data is still >>> necessary. >>> >>> Signed-off-by: Ryan Wilson >>> --- >>> v2: Addressed Jarno's comment to use direct pointer assignment for >>> upcall->packet instead of ofpbuf_set(). >>> --- >>> lib/dpif-netdev.c | 15 +-- >>> 1 file changed, 5 insertions(+), 10 deletions(-) >>> >>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >>> index 91c83d6..c89ae20 100644 >>> --- a/lib/dpif-netdev.c >>> +++ b/lib/dpif-netdev.c >>> @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct >>>ofpbuf *packet, >>> miniflow_hash_5tuple(&key.flow, 0) >>> % dp->n_handlers, >>> DPIF_UC_MISS, &key.flow, NULL); >>> -ofpbuf_delete(packet); >>> } >>> } >>> >>> @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, >>>struct ofpbuf *packet, >>> if (userdata) { >>> buf_size += NLA_ALIGN(userdata->nla_len); >>> } >>> -buf_size += ofpbuf_size(packet); >>> ofpbuf_init(buf, buf_size); >>> >>> /* Put ODP flow. */ >>> @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev >>>*dp, struct ofpbuf *packet, >>> >>>NLA_ALIGN(userdata->nla_len)); >>> } >>> >>> -ofpbuf_set_data(&upcall->packet, >>> -ofpbuf_put(buf, ofpbuf_data(packet), >>>ofpbuf_size(packet))); >>> -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); >>> +upcall->packet = *packet; >>> >> >>This would not work with DPDK. ofpbuf from dpdk is special case where >>ofpbuf and data are allocated from same memory object. Therefore >>moving ofpbuf->data is nontrivial. >> >>To make it work we need atleast following covered. >>1. Define flag for source of ofpbuf header. So that we can track >>header and data independently. >>2. Fix dpdk ofpbuf destructor to free correct dpdk memory object. >> >>Thanks, >>Pravin. > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH v5 net-next] MPLS: Use mpls_features to activate software MPLS GSO segmentation
If an MPLS packet requires segmentation then use mpls_features to determine if the software implementation should be used. As no driver advertises MPLS GSO segmentation this will always be the case. I had not noticed that this was necessary before as software MPLS GSO segmentation was already being used in my test environment. I believe that the reason for that is the skbs in question always had fragments and the driver I used does not advertise NETIF_F_FRAGLIST (which seems to be the case for most drivers). Thus software segmentation was activated by skb_gso_ok(). This introduces the overhead of an extra call to skb_network_protocol() in the case where where CONFIG_NET_MPLS_GSO is set and skb->ip_summed == CHECKSUM_NONE. Thanks to Jesse Gross for prompting me to investigate this. Signed-off-by: Simon Horman Acked-by: YAMAMOTO Takashi --- v5 * As suggested by Yamamoto-san - Remove unused tmp variable - Do not use unlikely() for MPLS ethertype check * Added Ack from Yamamoto-san v4.2 * As suggested by Eric Dumazet - Use htons() instead of cpu_to_be16() - Refactor code to pass type to net_mpls_features. v4.1 * Following fedback from Thomas Graff and Jesse Gross - Use ethertype of packet to detect MPLS rather than relying on mac_len indicating a gap between the end of L2 and the beginning of L3. That assumption seems to be broken by the GRE GSO code. - Move mpls_features handling into harmonize_features() This allows an existing call in there to skb_network_protocol() to be leveraged. - Removed acks as the patch has now changed in a material way v4 * As suggested by YAMAMOTO Takashi - Correct typos in comment * Added Ack from YAMAMOTO Takashi v3 * As requested by David Miller - Do not mark net_mpls_features as inline - Correct alignment of parameters v2 * Added Ack from Jesse Gross * Removed duplicate 'Thus' from changelog --- net/core/dev.c | 28 +++- 1 file changed, 27 insertions(+), 1 deletion(-) diff --git a/net/core/dev.c b/net/core/dev.c index 0355ca5..f726bfb 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2498,13 +2498,39 @@ static int dev_gso_segment(struct sk_buff *skb, netdev_features_t features) return 0; } +/* If MPLS offload request, verify we are testing hardware MPLS features + * instead of standard features for the netdev. + */ +#ifdef CONFIG_NET_MPLS_GSO +static netdev_features_t net_mpls_features(struct sk_buff *skb, + netdev_features_t features, + __be16 type) +{ + if (type == htons(ETH_P_MPLS_UC) || type == htons(ETH_P_MPLS_MC)) + features &= skb->dev->mpls_features; + + return features; +} +#else +static netdev_features_t net_mpls_features(struct sk_buff *skb, + netdev_features_t features, + __be16 type) +{ + return features; +} +#endif + static netdev_features_t harmonize_features(struct sk_buff *skb, netdev_features_t features) { int tmp; + __be16 type; + + type = skb_network_protocol(skb, &tmp); + features = net_mpls_features(skb, features, type); if (skb->ip_summed != CHECKSUM_NONE && - !can_checksum_protocol(features, skb_network_protocol(skb, &tmp))) { + !can_checksum_protocol(features, type)) { features &= ~NETIF_F_ALL_CSUM; } else if (illegal_highdma(skb->dev, skb)) { features &= ~NETIF_F_SG; -- 2.0.0.rc2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/5] dpif-netdev: batch packet receiving
Hi Pravin, On May 30, 2014, at 4:53 PM, Pravin Shelar wrote: > We just need to reset mf->map, Can you add api for that ? > infact miniflow_extract() does not read from map, is there need to > initialize it on every packet? You’re right. mini flow_extract() should be enough. > This metadata is same for given batch, we can sent just one instance of it. > This is true, and helps improving performances too, since we do not need to initialize a big amount of memory. Thanks, Daniele___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] Is there any way to simulate the tunnel traffic in test cases to test IPFIX feature?
On Tue, Jun 3, 2014 at 2:51 AM, Wenyu Zhang wrote: > > I am trying to simulate traffic in test cases to test IPFIX feature. > > For normal traffic, I can use "ovs-appctl netdev-dummy/receive " to simulate > it, just refer to the netflow and sflow cases in tests/ofproto-dpif.at > And I can capture the IPFIX packets on lo interface as expected. > > But for tunnel traffic, I haven't find a workable way to generate this kind > of traffic. > For example, I try to use "ovs-appctl ofproto/trace ovs-dummy" command with > "-generate" option. > And we can see that the "sample" actions has been inserted as expected, but > these actions are not executed, and no IPFIX packets can be captured on lo > interface. > > Anyone knows about the usage of this command? Or other way to generate the > tunnel traffic to trigger IPFIX sample action? Thanks a lot. I don't believe that this is implemented. netdev-dummy/receive constructs a packet from the flow and the discards everything else. Since there is no support for additional metadata beyond the input port or tunneling implemented in the dummy datapath, it basically ends up looking like a non-tunneled packet arriving on a tunnel port. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/5] netdev-dpdk: implement send_batch
On May 30, 2014, at 4:54 PM, Pravin Shelar wrote: > On Fri, May 23, 2014 at 11:04 AM, Daniele Di Proietto > wrote: > need to skip this packet. Ops, added a continue. >> >> +/* TODO: warn if any packet bigger than dev->max_packet_len */ > > I think we have to check this. Better to add it now, and optimize it later. I added a check. I does not seem to slow down things too much. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 5/5] dpif-netdev: batch packet sending
On May 30, 2014, at 4:55 PM, Pravin Shelar wrote: >> >> int i; >> >> +memset(&batch, 0, sizeof batch); >> + > Is there need to reset batch? can batch_init() take care of it? You’re right. It should be enough to set batch.flow to NULL. > >> +for (i = 0; i < c; i++) { >> +struct ofpbuf * packet = packets[i]; > extra space before packet. > need blank line after declaration. Fixed, sorry. >> +if (q->head - q->tail < MAX_QUEUE_LEN) { >> +struct dp_netdev_upcall *u = &q->upcalls[q->head++ & >> QUEUE_MASK]; >> +struct dpif_upcall *upcall = &u->upcall; >> +struct ofpbuf *buf = &u->buf; >> +size_t buf_size; >> +struct flow flow; >> + >> +upcall->type = type; >> + >> +/* Allocate buffer big enough for everything. */ >> +buf_size = ODPUTIL_FLOW_KEY_BYTES; >> +if (userdata) { >> +buf_size += NLA_ALIGN(userdata->nla_len); >> +} >> +buf_size += ofpbuf_size(packet); >> +ofpbuf_init(buf, buf_size); >> + >> +/* Put ODP flow. */ >> +miniflow_expand(key, &flow); >> +odp_flow_key_from_flow(buf, &flow, NULL, flow.in_port.odp_port, >> true); >> +upcall->key = ofpbuf_data(buf); >> +upcall->key_len = ofpbuf_size(buf); >> + >> +/* Put userdata. */ >> +if (userdata) { >> +upcall->userdata = ofpbuf_put(buf, userdata, >> + NLA_ALIGN(userdata->nla_len)); >> +} >> >> -ofpbuf_set_data(&upcall->packet, >> -ofpbuf_put(buf, ofpbuf_data(packet), >> ofpbuf_size(packet))); >> -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); >> +ofpbuf_set_data(&upcall->packet, >> +ofpbuf_put(buf, ofpbuf_data(packet), >> +ofpbuf_size(packet))); >> +ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); >> >> -seq_change(q->seq); >> +seq_change(q->seq); >> >> -error = 0; >> -} else { >> -dp_netdev_count_packet(dp, DP_STAT_LOST, 1); >> -error = ENOBUFS; >> +error = 0; >> +} else { >> +dp_netdev_count_packet(dp, DP_STAT_LOST, 1); >> +error = ENOBUFS; >> +} > Can you make separate function to send a packet for upcall processing? > it is easier to read that way. Sure, I did it. At some point, though, we might want to batch that too. Thanks for the suggestions, I’m about to send an updated patch set. Daniele ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 2/3] netdev: netdev_send accepts multiple packets
The netdev_send function has been modified to accept multiple packets, to allow netdev providers to amortize locking and queuing costs. This is especially true for netdev-dpdk. Later commits exploit the new API. Signed-off-by: Daniele Di Proietto --- lib/dpif-netdev.c | 2 +- lib/netdev-bsd.c | 55 ++ lib/netdev-dpdk.c | 158 -- lib/netdev-dummy.c| 66 - lib/netdev-linux.c| 47 ++- lib/netdev-provider.h | 18 +++--- lib/netdev.c | 17 -- lib/netdev.h | 3 +- 8 files changed, 226 insertions(+), 140 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 91c83d6..e0e5ad8 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2115,7 +2115,7 @@ dp_execute_cb(void *aux_, struct ofpbuf *packet, case OVS_ACTION_ATTR_OUTPUT: p = dp_netdev_lookup_port(aux->dp, u32_to_odp(nl_attr_get_u32(a))); if (p) { -netdev_send(p->netdev, packet, may_steal); +netdev_send(p->netdev, &packet, 1, may_steal); } break; diff --git a/lib/netdev-bsd.c b/lib/netdev-bsd.c index 8dc33df..4f4aa28 100644 --- a/lib/netdev-bsd.c +++ b/lib/netdev-bsd.c @@ -681,13 +681,13 @@ netdev_bsd_rxq_drain(struct netdev_rxq *rxq_) * system or a tap device. */ static int -netdev_bsd_send(struct netdev *netdev_, struct ofpbuf *pkt, bool may_steal) +netdev_bsd_send(struct netdev *netdev_, struct ofpbuf **pkts, int cnt, +bool may_steal) { struct netdev_bsd *dev = netdev_bsd_cast(netdev_); const char *name = netdev_get_name(netdev_); -const void *data = ofpbuf_data(pkt); -size_t size = ofpbuf_size(pkt); int error; +int i; ovs_mutex_lock(&dev->mutex); if (dev->tap_fd < 0 && !dev->pcap) { @@ -696,35 +696,42 @@ netdev_bsd_send(struct netdev *netdev_, struct ofpbuf *pkt, bool may_steal) error = 0; } -while (!error) { -ssize_t retval; -if (dev->tap_fd >= 0) { -retval = write(dev->tap_fd, data, size); -} else { -retval = pcap_inject(dev->pcap, data, size); -} -if (retval < 0) { -if (errno == EINTR) { -continue; +for (i = 0; i < cnt; i++) { +const void *data = ofpbuf_data(pkts[i]); +size_t size = ofpbuf_size(pkts[i]); + +while (!error) { +ssize_t retval; +if (dev->tap_fd >= 0) { +retval = write(dev->tap_fd, data, size); } else { -error = errno; -if (error != EAGAIN) { -VLOG_WARN_RL(&rl, "error sending Ethernet packet on %s: " - "%s", name, ovs_strerror(error)); +retval = pcap_inject(dev->pcap, data, size); +} +if (retval < 0) { +if (errno == EINTR) { +continue; +} else { +error = errno; +if (error != EAGAIN) { +VLOG_WARN_RL(&rl, "error sending Ethernet packet on %s: " + "%s", name, ovs_strerror(error)); +} } +} else if (retval != size) { +VLOG_WARN_RL(&rl, "sent partial Ethernet packet (%"PRIuSIZE"d bytes of " + "%"PRIuSIZE") on %s", retval, size, name); +error = EMSGSIZE; +} else { +break; } -} else if (retval != size) { -VLOG_WARN_RL(&rl, "sent partial Ethernet packet (%"PRIuSIZE"d bytes of " - "%"PRIuSIZE") on %s", retval, size, name); -error = EMSGSIZE; -} else { -break; } } ovs_mutex_unlock(&dev->mutex); if (may_steal) { -ofpbuf_delete(pkt); +for (i = 0; i < cnt; i++) { +ofpbuf_delete(pkt); +} } return error; diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index 7a78c34..b731985 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -608,103 +608,147 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct ofpbuf **packets, int *c) } inline static void -dpdk_queue_pkt(struct netdev_dpdk *dev, int qid, - struct rte_mbuf *pkt) +dpdk_queue_pkts(struct netdev_dpdk *dev, int qid, + struct rte_mbuf **pkts, int cnt) { struct dpdk_tx_queue *txq = &dev->tx_q[qid]; uint64_t diff_tsc; uint64_t cur_tsc; uint32_t nb_tx; +int i = 0; + rte_spinlock_lock(&txq->tx_lock); -txq->burst_pkts[txq->count++] = pkt; -if (txq->count == MAX_TX_QUEUE_LEN) { -goto flush; -} -cur_tsc = rte_get_timer_cycles(); -if (txq->count == 1) { -txq->tsc = cur_tsc; -} -diff_tsc = cur_tsc - txq->tsc; -if (diff_tsc >= DRAIN_TSC) { -
[ovs-dev] [PATCH 1/3] netdev-dpdk: receive up to NETDEV_MAX_RX_BATCH
As per netdev-provider interface, netdev_dpdk_rxq_recv should receive at most NETDEV_MAX_RX_BATCH. Signed-off-by: Daniele Di Proietto --- lib/netdev-dpdk.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index ba41d2e..7a78c34 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -595,7 +595,9 @@ netdev_dpdk_rxq_recv(struct netdev_rxq *rxq_, struct ofpbuf **packets, int *c) dpdk_queue_flush(dev, rxq_->queue_id); nb_rx = rte_eth_rx_burst(rx->port_id, rxq_->queue_id, - (struct rte_mbuf **) packets, MAX_RX_QUEUE_LEN); + (struct rte_mbuf **) packets, + MIN((int)NETDEV_MAX_RX_BATCH, + (int)MAX_RX_QUEUE_LEN)); if (!nb_rx) { return EAGAIN; } -- 2.0.0.rc2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 3/3] dpif-netdev: batch packet processing
This change in dpif-netdev allows faster packet processing for devices which implement batching (netdev-dpdk currently). Signed-off-by: Daniele Di Proietto --- lib/dpif-netdev.c| 240 ++- lib/dpif.c | 7 +- lib/odp-execute.c| 49 ++--- lib/odp-execute.h| 4 +- ofproto/ofproto-dpif-xlate.c | 2 +- 5 files changed, 211 insertions(+), 91 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index e0e5ad8..fba1d65 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -331,18 +331,18 @@ static void dp_netdev_destroy_all_queues(struct dp_netdev *dp) OVS_REQ_WRLOCK(dp->queue_rwlock); static int dpif_netdev_open(const struct dpif_class *, const char *name, bool create, struct dpif **); -static int dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *, - int queue_no, int type, +static int dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf **, + int cnt, int queue_no, int type, const struct miniflow *, const struct nlattr *userdata); static void dp_netdev_execute_actions(struct dp_netdev *dp, const struct miniflow *, - struct ofpbuf *, bool may_steal, + struct ofpbuf **, int cnt, bool may_steal, struct pkt_metadata *, const struct nlattr *actions, size_t actions_len); -static void dp_netdev_port_input(struct dp_netdev *dp, struct ofpbuf *packet, - struct pkt_metadata *); +static void dp_netdev_port_input(struct dp_netdev *dp, struct ofpbuf **packets, + int cnt, odp_port_t port_no); static void dp_netdev_set_pmd_threads(struct dp_netdev *, int n); @@ -1518,7 +1518,6 @@ static int dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute) { struct dp_netdev *dp = get_dp_netdev(dpif); -struct pkt_metadata *md = &execute->md; struct { struct miniflow flow; uint32_t buf[FLOW_U32S]; @@ -1531,9 +1530,9 @@ dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute) /* Extract flow key. */ miniflow_initialize(&key.flow, key.buf); -miniflow_extract(execute->packet, md, &key.flow); +miniflow_extract(execute->packet, &execute->md, &key.flow); -dp_netdev_execute_actions(dp, &key.flow, execute->packet, false, md, +dp_netdev_execute_actions(dp, &key.flow, &execute->packet, 1, false, &execute->md, execute->actions, execute->actions_len); return 0; @@ -1747,17 +1746,12 @@ dp_netdev_process_rxq_port(struct dp_netdev *dp, struct dp_netdev_port *port, struct netdev_rxq *rxq) { -struct ofpbuf *packet[NETDEV_MAX_RX_BATCH]; -int error, c; +struct ofpbuf *packets[NETDEV_MAX_RX_BATCH]; +int error, cnt; -error = netdev_rxq_recv(rxq, packet, &c); +error = netdev_rxq_recv(rxq, packets, &cnt); if (!error) { -struct pkt_metadata md = PKT_METADATA_INITIALIZER(port->port_no); -int i; - -for (i = 0; i < c; i++) { -dp_netdev_port_input(dp, packet[i], &md); -} +dp_netdev_port_input(dp, packets, cnt, port->port_no); } else if (error != EAGAIN && error != EOPNOTSUPP) { static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); @@ -1954,7 +1948,7 @@ dp_netdev_flow_stats_new_cb(void) static void dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, -const struct ofpbuf *packet, +int cnt, int size, const struct miniflow *key) { uint16_t tcp_flags = miniflow_get_tcp_flags(key); @@ -1966,8 +1960,8 @@ dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, ovs_mutex_lock(&bucket->mutex); bucket->used = MAX(now, bucket->used); -bucket->packet_count++; -bucket->byte_count += ofpbuf_size(packet); +bucket->packet_count += cnt; +bucket->byte_count += size; bucket->tcp_flags |= tcp_flags; ovs_mutex_unlock(&bucket->mutex); } @@ -1981,74 +1975,143 @@ dp_netdev_stats_new_cb(void) } static void -dp_netdev_count_packet(struct dp_netdev *dp, enum dp_stat_type type) +dp_netdev_count_packet(struct dp_netdev *dp, enum dp_stat_type type, int cnt) { struct dp_netdev_stats *bucket; bucket = ovsthread_stats_bucket_get(&dp->stats, dp_netdev_stats_new_cb); ovs_mutex_lock(&bucket->mutex); -bucket->n[type]++; +bucket->n[type] += cnt; ovs_mutex_unlock(&bucket->mutex); } +struct batch_pkt_execute { +unsigned int packet_co
Re: [ovs-dev] [PATCH] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
Can you combine it with second patch, otherwise it introduces a bug. On Tue, Jun 3, 2014 at 4:22 PM, Ryan Wilson 76511 wrote: > I've ran into some unexpected issues while perf testing this, lets hold > off on looking at this. I'll submit another patch when I've had all the > kinks worked out. > > Ryan > > On 6/3/14 2:21 PM, "Ryan Wilson 76511" wrote: > >>Hey Pravin, >> >>Thanks for the catch here! Turns out the header is already tracked in DPDK >>with rte_mbuf's buffer address - sizeof(ofpbuf). Thus, I submitted another >>patch that, in free_dpdk_buf(), always gets this header and uses this to >>free memory. >> >>http://patchwork.openvswitch.org/patch/4375/ >> >> >>Let me know if this patch works. >> >>Cheers, >> >>Ryan >> >>On 6/3/14 10:50 AM, "Pravin Shelar" wrote: >> >>>On Tue, Jun 3, 2014 at 10:33 AM, Ryan Wilson wrote: When a bridge of datatype type netdev receives a packet, it copies the packet from the NIC to a buffer in userspace. Currently, when making an upcall, the packet is again copied to the upcall's buffer. However, this extra copy is not necessary when the datapath exists in userspace as the upcall can directly access the packet data. This patch eliminates this extra copy of the packet data in most cases. In cases where the packet may still be used later by callers of dp_netdev_execute_actions, making a copy of the packet data is still necessary. Signed-off-by: Ryan Wilson --- v2: Addressed Jarno's comment to use direct pointer assignment for upcall->packet instead of ofpbuf_set(). --- lib/dpif-netdev.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 91c83d6..c89ae20 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct ofpbuf *packet, miniflow_hash_5tuple(&key.flow, 0) % dp->n_handlers, DPIF_UC_MISS, &key.flow, NULL); -ofpbuf_delete(packet); } } @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet, if (userdata) { buf_size += NLA_ALIGN(userdata->nla_len); } -buf_size += ofpbuf_size(packet); ofpbuf_init(buf, buf_size); /* Put ODP flow. */ @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet, NLA_ALIGN(userdata->nla_len)); } -ofpbuf_set_data(&upcall->packet, -ofpbuf_put(buf, ofpbuf_data(packet), ofpbuf_size(packet))); -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); +upcall->packet = *packet; >>> >>>This would not work with DPDK. ofpbuf from dpdk is special case where >>>ofpbuf and data are allocated from same memory object. Therefore >>>moving ofpbuf->data is nontrivial. >>> >>>To make it work we need atleast following covered. >>>1. Define flag for source of ofpbuf header. So that we can track >>>header and data independently. >>>2. Fix dpdk ofpbuf destructor to free correct dpdk memory object. >>> >>>Thanks, >>>Pravin. >> > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] revalidator: Refactor ukey creation/lookup.
I agree, that would look a bit tidier. I'll send a patch. On 3 June 2014 08:53, Ethan Jackson wrote: > I'm wondering if it'd be cleaner to have ukey_create() return a locked > ukey, and to pull the try_lock() into the critical section for the > ukeys map? I.E. > > lock_hmap() > ukey = ukey_lookup() > if (!ukey) { > //create and insert > } else { > // trylock > } > unlock_hmap() > > What do you think? > > > > On Thu, May 29, 2014 at 6:38 PM, Joe Stringer > wrote: > > This patch refactors the code around ukey creation and lookup to > > simplify the code for callers. A new function ukey_acquire() combines > > these functions and attempts to acquire a lock on the ukey. Failure to > > acquire a lock on the ukey is usually a sign that another thread is > > handling the same flow concurrently, which means the flow does not need > > to be handled anyway. > > > > Signed-off-by: Joe Stringer > > --- > > ofproto/ofproto-dpif-upcall.c | 86 > ++--- > > 1 file changed, 37 insertions(+), 49 deletions(-) > > > > diff --git a/ofproto/ofproto-dpif-upcall.c > b/ofproto/ofproto-dpif-upcall.c > > index 1c82b6b..36e9d41 100644 > > --- a/ofproto/ofproto-dpif-upcall.c > > +++ b/ofproto/ofproto-dpif-upcall.c > > @@ -216,6 +216,12 @@ static void upcall_unixctl_set_flow_limit(struct > unixctl_conn *conn, int argc, > > > > static struct udpif_key *ukey_create(const struct nlattr *key, size_t > key_len, > > long long int used); > > +static struct udpif_key *ukey_lookup(struct udpif *udpif, > > + const struct nlattr *key, size_t > key_len, > > + uint32_t hash); > > +static bool ukey_acquire(struct udpif *udpif, const struct nlattr *key, > > + size_t key_len, long long int used, > > + struct udpif_key **result); > > static void ukey_delete(struct revalidator *, struct udpif_key *); > > > > static atomic_bool enable_megaflows = ATOMIC_VAR_INIT(true); > > @@ -983,8 +989,9 @@ handle_upcalls(struct handler *handler, struct > upcall *upcalls, > > > > /* Must be called with udpif->ukeys[hash % > udpif->n_revalidators].mutex. */ > > static struct udpif_key * > > -ukey_lookup__(struct udpif *udpif, const struct nlattr *key, size_t > key_len, > > - uint32_t hash) > > +ukey_lookup(struct udpif *udpif, const struct nlattr *key, size_t > key_len, > > +uint32_t hash) > > +OVS_REQUIRES(udpif->ukeys->mutex) > > { > > struct udpif_key *ukey; > > struct hmap *hmap = &udpif->ukeys[hash % > udpif->n_revalidators].hmap; > > @@ -998,21 +1005,8 @@ ukey_lookup__(struct udpif *udpif, const struct > nlattr *key, size_t key_len, > > } > > > > static struct udpif_key * > > -ukey_lookup(struct udpif *udpif, const struct nlattr *key, size_t > key_len, > > -uint32_t hash) > > -{ > > -struct udpif_key *ukey; > > -uint32_t idx = hash % udpif->n_revalidators; > > - > > -ovs_mutex_lock(&udpif->ukeys[idx].mutex); > > -ukey = ukey_lookup__(udpif, key, key_len, hash); > > -ovs_mutex_unlock(&udpif->ukeys[idx].mutex); > > - > > -return ukey; > > -} > > - > > -static struct udpif_key * > > ukey_create(const struct nlattr *key, size_t key_len, long long int > used) > > +OVS_NO_THREAD_SAFETY_ANALYSIS > > { > > struct udpif_key *ukey = xmalloc(sizeof *ukey); > > ovs_mutex_init(&ukey->mutex); > > @@ -1020,38 +1014,47 @@ ukey_create(const struct nlattr *key, size_t > key_len, long long int used) > > ukey->key = (struct nlattr *) &ukey->key_buf; > > memcpy(&ukey->key_buf, key, key_len); > > ukey->key_len = key_len; > > - > > -ovs_mutex_lock(&ukey->mutex); > > ukey->mark = false; > > ukey->flow_exists = true; > > ukey->created = used ? used : time_msec(); > > memset(&ukey->stats, 0, sizeof ukey->stats); > > ukey->xcache = NULL; > > -ovs_mutex_unlock(&ukey->mutex); > > > > return ukey; > > } > > > > -/* Checks for a ukey in 'udpif->ukeys' with the same 'ukey->key' and > 'hash', > > - * and inserts 'ukey' if it does not exist. > > +/* Searches for a ukey in 'udpif->ukeys' that matches 'key' and > 'key_len' and > > + * attempts to lock the ukey. If the ukey does not exist, create it. > > * > > - * Returns true if 'ukey' was inserted into 'udpif->ukeys', false > otherwise. */ > > + * Returns true on success, setting *result to the matching ukey and > returning > > + * it in a locked state. Otherwise, returns false and clears *result. */ > > static bool > > -udpif_insert_ukey(struct udpif *udpif, struct udpif_key *ukey, uint32_t > hash) > > +ukey_acquire(struct udpif *udpif, const struct nlattr *key, size_t > key_len, > > + long long int used, struct udpif_key **result) > > +OVS_TRY_LOCK(true, (*result)->mutex) > > { > > -struct udpif_key *duplicate; > > -uint32_t idx = hash % udpif->n_revalidators; >
[ovs-dev] [PATCHv3 1/2] ovs-dev.py: Add option to specify which tests to run.
Signed-off-by: Joe Stringer --- v3: Don't require users to specify "-k" for searches. v2: No change. v1: First post. --- utilities/ovs-dev.py | 14 ++ 1 file changed, 14 insertions(+) diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index 814a9d0..2c3af19 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -144,6 +144,14 @@ commands.append(make) def check(): +flags = "" +if options.tests: +for arg in str.split(options.tests): +if arg[0].isdigit(): +flags += "%s " % arg +else: +flags += "-k %s " % arg +ENV["TESTSUITEFLAGS"] = flags make("check") commands.append(check) @@ -355,6 +363,12 @@ def main(): const=i, help="compile with -O%d" % i) parser.add_option_group(group) +group = optparse.OptionGroup(parser, "check") +group.add_option("--tests", dest="tests", metavar="FILTER", + help="""run specific tests and/or a test category + eg, --tests=\"1-10 megaflow\) +parser.add_option_group(group) + group = optparse.OptionGroup(parser, "run") group.add_option("-g", "--gdb", dest="gdb", action="store_true", help="run ovs-vswitchd under gdb") -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCHv3 2/2] ovs-dev.py: Add option to run tests in parallel.
Signed-off-by: Joe Stringer --- v3: Set the default to -j8. v2: Make it configurable. v1: First post. --- utilities/ovs-dev.py |6 ++ 1 file changed, 6 insertions(+) diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index 2c3af19..fa74d57 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -145,6 +145,10 @@ commands.append(make) def check(): flags = "" +if options.parallel: +flags += "-j%d " % options.parallel +else: +flags += "-j8 " if options.tests: for arg in str.split(options.tests): if arg[0].isdigit(): @@ -364,6 +368,8 @@ def main(): parser.add_option_group(group) group = optparse.OptionGroup(parser, "check") +group.add_option("-p", dest="parallel", metavar="N_TESTS", type="int", + help="Run N_TESTS in parallel") group.add_option("--tests", dest="tests", metavar="FILTER", help="""run specific tests and/or a test category eg, --tests=\"1-10 megaflow\) -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv3 1/2] ovs-dev.py: Add option to specify which tests to run.
Acked-by: Ethan Jackson On Tue, Jun 3, 2014 at 5:41 PM, Joe Stringer wrote: > Signed-off-by: Joe Stringer > --- > v3: Don't require users to specify "-k" for searches. > v2: No change. > v1: First post. > --- > utilities/ovs-dev.py | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py > index 814a9d0..2c3af19 100755 > --- a/utilities/ovs-dev.py > +++ b/utilities/ovs-dev.py > @@ -144,6 +144,14 @@ commands.append(make) > > > def check(): > +flags = "" > +if options.tests: > +for arg in str.split(options.tests): > +if arg[0].isdigit(): > +flags += "%s " % arg > +else: > +flags += "-k %s " % arg > +ENV["TESTSUITEFLAGS"] = flags > make("check") > commands.append(check) > > @@ -355,6 +363,12 @@ def main(): > const=i, help="compile with -O%d" % i) > parser.add_option_group(group) > > +group = optparse.OptionGroup(parser, "check") > +group.add_option("--tests", dest="tests", metavar="FILTER", > + help="""run specific tests and/or a test category > + eg, --tests=\"1-10 megaflow\) > +parser.add_option_group(group) > + > group = optparse.OptionGroup(parser, "run") > group.add_option("-g", "--gdb", dest="gdb", action="store_true", > help="run ovs-vswitchd under gdb") > -- > 1.7.10.4 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv3 2/2] ovs-dev.py: Add option to run tests in parallel.
Sorry to nit pick, but could we change the option name to "--jobs" and the abbreviated option to "-j". This is consistent with how make does it. Acked-by: Ethan Jackson On Tue, Jun 3, 2014 at 5:41 PM, Joe Stringer wrote: > Signed-off-by: Joe Stringer > --- > v3: Set the default to -j8. > v2: Make it configurable. > v1: First post. > --- > utilities/ovs-dev.py |6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py > index 2c3af19..fa74d57 100755 > --- a/utilities/ovs-dev.py > +++ b/utilities/ovs-dev.py > @@ -145,6 +145,10 @@ commands.append(make) > > def check(): > flags = "" > +if options.parallel: > +flags += "-j%d " % options.parallel > +else: > +flags += "-j8 " > if options.tests: > for arg in str.split(options.tests): > if arg[0].isdigit(): > @@ -364,6 +368,8 @@ def main(): > parser.add_option_group(group) > > group = optparse.OptionGroup(parser, "check") > +group.add_option("-p", dest="parallel", metavar="N_TESTS", type="int", > + help="Run N_TESTS in parallel") > group.add_option("--tests", dest="tests", metavar="FILTER", > help="""run specific tests and/or a test category >eg, --tests=\"1-10 megaflow\) > -- > 1.7.10.4 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2.58] datapath: Add basic MPLS support to kernel
On Tue, Jun 03, 2014 at 03:40:27PM -0700, Jesse Gross wrote: > On Mon, Jun 2, 2014 at 9:04 PM, Simon Horman wrote: > > Hi Jesse, > > > > thanks for your feedback. > > > > On Mon, Jun 02, 2014 at 05:58:10PM -0700, Jesse Gross wrote: > >> On Sun, May 25, 2014 at 5:22 PM, Simon Horman wrote: > >> > diff --git a/datapath/flow_netlink.c b/datapath/flow_netlink.c > >> > index 803a94c..8ce596c 100644 > >> > --- a/datapath/flow_netlink.c > >> > +++ b/datapath/flow_netlink.c > >> > + case OVS_ACTION_ATTR_POP_MPLS: > >> > + if (!eth_p_mpls(eth_type)) > >> > + return -EINVAL; > >> > >> Should this also take into account the VLAN tag? It's really part of > >> the EtherType although it has been stripped out here. Actually, maybe > >> it's better to not track the vlan_tci separately at all during > >> validation but just fold it into the EtherType. > >> > >> The practical implication of this is that you wouldn't be able to pop > >> out from underneath a VLAN. This may be a good thing if we are trying > >> to avoid tag order issues - after all, you can't push underneath a > >> VLAN either. I'm not sure what effects this has on the need to track > >> mac_len, if any. > > > > My thinking is that the ordering problem only surfaces in relation > > to push MPLS actions where should it go in relation to VLAN tags. > > For pop actions it seems to me that the outermost tag should be removed > > regardless of its position in relation to other tags. > > > > So I think that the code above is safe. Though now you mention > > it I do notice that it only allows pop MPLS if there is at most > > one VLAN tag present. > > > > That said, I would not mind particularly disabling pop MPLS in the > > presence of VLAN tags. At the very least it is related to the > > painful issue of tag ordering. > > I agree that it is safe but my thought was the it avoids a number of > potential corner cases such as: > * Difference between push and pop underneath vlan tags. > * Pop with multiple vlan tags > * Differences with varying EtherTypes used for vlans > > > I explored your idea of tracking only eth_type rather than both > > it and vlan_tci. I did this by adding the following logic to > > ovs_nla_copy_actions(). > > > > if (key->eth.tci & htons(VLAN_TAG_PRESENT)) > > eth_type = htons(ETH_P_8021Q); > > else > > eth_type = key->eth.type; > > > > I then updated the usage of eth_type in ovs_nla_copy_actions__() > > accordingly. > > One problem that I have run into is what to do about pop VLAN. > > > > I don't believe its possible to know what the new eth type is. > > This makes subsequent checks of the eth type for validate_set() > > a little awkward. And seems to indicate that an extra parameter would > > be needed. > > > > For this reason I am inclined to leave the eth_type and vlan_tci > > parameters in place. In this case there is no problem with pop VLAN > > as the ether type inside the VLAN tag should be the value of eth_type. > > Can't we populate eth_type with the EtherType from the flow key in > pop_vlan? This doesn't provide us with the ability to look arbitrarily > deep into the packet but it should at least retain the functionality > that we have today. I think that things are still a bit tricky for at least two reasons. 1. Tracking eth_type if there are more push VLAN actions than pop VLAN actions e.g.: A packet with no VLAN tags with the following actions applied: push VLAN, push VLAN, pop VLAN I believe that with the change your propose eth_type would end up being that of the original packet than of the packet with one VLAN tag removed after two are added. I think would allow a push MPLS action but it should be rejected. I think this could be resolved by setting eth_type to a bogus value (e.g. 0) but that would may cause subsequent calls to validate_set() to fail when they should pass. 2. The use of eth_type in validate_set(). Suppose we have an IPv6 packet with one VLAN. validate_set() allows an IPv6 set operation if the eth_type is IPv6. Which would be true if the eth_type is that of the inner rather than the outer packet. However, I believe with the scheme you are proposing this becomes a bit tricky as the eth_type will be that of the outer packet. I suppose this particular problem could be resolved using something like this in validate_set(): if (eth_type == htons(ETH_P_8021Q)) eth_type = flow_key.eth_type; But that may get a bit tricky. For example suppose there is an IPv6 packet with no VLAN tags and the following actions are applied. push MPLS, push VLAN, set IPv6 validate_set() would be passed: eth_type: htons(ETH_P_8021Q) flow_key.eth_type: htons(ETH_P_IPV6) But validate_set should be checking against an MPLS ethtype. I think this particular problem could be resolved with a new restriction,
Re: [ovs-dev] [PATCH 3/3] dpif-netdev: batch packet processing
Couple of questions. First, this patch adds a bit of code which extends beyond 79 characters. dp_netdev_input() does this rather interesting stuff with two keys that mfk switches between. This is definitely not obvious when first looking at it. Minimally it needs a comment explaining why. That said I don't totally understand it, why can't we ruse the same key after each batch is executed? If we can't, why is using only two keys sufficient? This seems a bit odd. Actually on that note, do we need miniflowkey at all? Could it just be embedded at the end of batch_pkt_execute? After this patch, both the hash and recirc and set actions have the wrong behavior. I'd rather they abort() the switch than silently misbehave. Looking at this now, I think this is a rather deep issue that we need to think through. Let's discuss it tomorrow. Ethan On Tue, Jun 3, 2014 at 5:10 PM, Daniele Di Proietto wrote: > This change in dpif-netdev allows faster packet processing for devices which > implement batching (netdev-dpdk currently). > > Signed-off-by: Daniele Di Proietto > --- > lib/dpif-netdev.c| 240 > ++- > lib/dpif.c | 7 +- > lib/odp-execute.c| 49 ++--- > lib/odp-execute.h| 4 +- > ofproto/ofproto-dpif-xlate.c | 2 +- > 5 files changed, 211 insertions(+), 91 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index e0e5ad8..fba1d65 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -331,18 +331,18 @@ static void dp_netdev_destroy_all_queues(struct > dp_netdev *dp) > OVS_REQ_WRLOCK(dp->queue_rwlock); > static int dpif_netdev_open(const struct dpif_class *, const char *name, > bool create, struct dpif **); > -static int dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *, > - int queue_no, int type, > +static int dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf **, > + int cnt, int queue_no, int type, >const struct miniflow *, >const struct nlattr *userdata); > static void dp_netdev_execute_actions(struct dp_netdev *dp, >const struct miniflow *, > - struct ofpbuf *, bool may_steal, > + struct ofpbuf **, int cnt, bool > may_steal, >struct pkt_metadata *, >const struct nlattr *actions, >size_t actions_len); > -static void dp_netdev_port_input(struct dp_netdev *dp, struct ofpbuf *packet, > - struct pkt_metadata *); > +static void dp_netdev_port_input(struct dp_netdev *dp, struct ofpbuf > **packets, > + int cnt, odp_port_t port_no); > > static void dp_netdev_set_pmd_threads(struct dp_netdev *, int n); > > @@ -1518,7 +1518,6 @@ static int > dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute) > { > struct dp_netdev *dp = get_dp_netdev(dpif); > -struct pkt_metadata *md = &execute->md; > struct { > struct miniflow flow; > uint32_t buf[FLOW_U32S]; > @@ -1531,9 +1530,9 @@ dpif_netdev_execute(struct dpif *dpif, struct > dpif_execute *execute) > > /* Extract flow key. */ > miniflow_initialize(&key.flow, key.buf); > -miniflow_extract(execute->packet, md, &key.flow); > +miniflow_extract(execute->packet, &execute->md, &key.flow); > > -dp_netdev_execute_actions(dp, &key.flow, execute->packet, false, md, > +dp_netdev_execute_actions(dp, &key.flow, &execute->packet, 1, false, > &execute->md, >execute->actions, execute->actions_len); > > return 0; > @@ -1747,17 +1746,12 @@ dp_netdev_process_rxq_port(struct dp_netdev *dp, >struct dp_netdev_port *port, >struct netdev_rxq *rxq) > { > -struct ofpbuf *packet[NETDEV_MAX_RX_BATCH]; > -int error, c; > +struct ofpbuf *packets[NETDEV_MAX_RX_BATCH]; > +int error, cnt; > > -error = netdev_rxq_recv(rxq, packet, &c); > +error = netdev_rxq_recv(rxq, packets, &cnt); > if (!error) { > -struct pkt_metadata md = PKT_METADATA_INITIALIZER(port->port_no); > -int i; > - > -for (i = 0; i < c; i++) { > -dp_netdev_port_input(dp, packet[i], &md); > -} > +dp_netdev_port_input(dp, packets, cnt, port->port_no); > } else if (error != EAGAIN && error != EOPNOTSUPP) { > static struct vlog_rate_limit rl > = VLOG_RATE_LIMIT_INIT(1, 5); > @@ -1954,7 +1948,7 @@ dp_netdev_flow_stats_new_cb(void) > > static void > dp_netdev_flow_used(struct dp_netdev_flow *netdev_flow, > -
[ovs-dev] [PATCH v4] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
When a bridge of datatype type netdev receives a packet, it copies the packet from the NIC to a buffer in userspace. Currently, when making an upcall, the packet is again copied to the upcall's buffer. However, this extra copy is not necessary when the datapath exists in userspace as the upcall can directly access the packet data. This patch eliminates this extra copy of the packet data in most cases. In cases where the packet may still be used later by callers of dp_netdev_execute_actions, making a copy of the packet data is still necessary. This patch also adds a header_ field to 'struct ofpbuf' when using DPDK. This field holds a pointer to the allocated buffer in the rte_mempool. Thus, an upcall packet ofpbuf allocated on the stack can now share data and free memory of a rte_mempool allocated ofpbuf. Signed-off-by: Ryan Wilson Acked-by: Jarno Rajahalme --- v2: Addressed Jarno's comment to use direct pointer assignment for upcall->packet instead of ofpbuf_set(). v3: Addressed issues with DPDK mentioned by Pravin. v4: Fixed various bugs found in perf test --- lib/dpif-netdev.c | 15 +-- lib/netdev-dpdk.c |2 +- lib/ofpbuf.c |6 +- lib/ofpbuf.h | 20 ++-- 4 files changed, 25 insertions(+), 18 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 91c83d6..c89ae20 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct ofpbuf *packet, miniflow_hash_5tuple(&key.flow, 0) % dp->n_handlers, DPIF_UC_MISS, &key.flow, NULL); -ofpbuf_delete(packet); } } @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet, if (userdata) { buf_size += NLA_ALIGN(userdata->nla_len); } -buf_size += ofpbuf_size(packet); ofpbuf_init(buf, buf_size); /* Put ODP flow. */ @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *packet, NLA_ALIGN(userdata->nla_len)); } -ofpbuf_set_data(&upcall->packet, -ofpbuf_put(buf, ofpbuf_data(packet), ofpbuf_size(packet))); -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); +upcall->packet = *packet; seq_change(q->seq); error = 0; } else { dp_netdev_count_packet(dp, DP_STAT_LOST); +ofpbuf_delete(packet); error = ENOBUFS; } ovs_mutex_unlock(&q->mutex); @@ -2120,19 +2117,17 @@ dp_execute_cb(void *aux_, struct ofpbuf *packet, break; case OVS_ACTION_ATTR_USERSPACE: { +struct ofpbuf *userspace_packet; const struct nlattr *userdata; userdata = nl_attr_find_nested(a, OVS_USERSPACE_ATTR_USERDATA); +userspace_packet = may_steal ? packet : ofpbuf_clone(packet); -dp_netdev_output_userspace(aux->dp, packet, +dp_netdev_output_userspace(aux->dp, userspace_packet, miniflow_hash_5tuple(aux->key, 0) % aux->dp->n_handlers, DPIF_UC_ACTION, aux->key, userdata); - -if (may_steal) { -ofpbuf_delete(packet); -} break; } diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index ba41d2e..e48bfde 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -205,7 +205,7 @@ dpdk_rte_mzalloc(size_t sz) void free_dpdk_buf(struct ofpbuf *b) { -struct rte_mbuf *pkt = (struct rte_mbuf *) b; +struct rte_mbuf *pkt = (struct rte_mbuf *) ofpbuf_header(b); rte_mempool_put(pkt->pool, pkt); } diff --git a/lib/ofpbuf.c b/lib/ofpbuf.c index 1f4b61d..9490a1e 100644 --- a/lib/ofpbuf.c +++ b/lib/ofpbuf.c @@ -117,6 +117,9 @@ void ofpbuf_init_dpdk(struct ofpbuf *b, size_t allocated) { ofpbuf_init__(b, allocated, OFPBUF_DPDK); +#ifdef DPDK_NETDEV +ofpbuf_set_header(b, b); +#endif } /* Initializes 'b' as an empty ofpbuf with an initial capacity of 'size' @@ -134,8 +137,9 @@ ofpbuf_uninit(struct ofpbuf *b) if (b) { if (b->source == OFPBUF_MALLOC) { free(ofpbuf_base(b)); +} else if (b->source == OFPBUF_DPDK) { +free_dpdk_buf(b); } -ovs_assert(b->source != OFPBUF_DPDK); } } diff --git a/lib/ofpbuf.h b/lib/ofpbuf.h index 85be899..54113d0 100644 --- a/lib/ofpbuf.h +++ b/lib/ofpbuf.h @@ -59,6 +59,7 @@ enum OVS_PACKED_ENUM ofpbuf_source { struct ofpbuf { #ifdef DPDK_NETDEV struct rte_mbuf mbuf; /* DPDK mbuf */ +struct ofpbuf *header_; /* First byte of allocated buffer header. */ #else void *base_; /* First byte of allocated space. */ void *data_; /* First byte actually in use. */
Re: [ovs-dev] [PATCH 3/3] dpif-netdev: batch packet processing
Oh, also I'm getting these errors when compiling with clang's thread saftey analysis. ../lib/dpif-netdev.c:2115:12: error: reading variable 'head' requires locking any mutex [-Werror,-Wthread-safety-analysis] if (q->head - q->tail < MAX_QUEUE_LEN) { ^ ../lib/dpif-netdev.c:2115:22: error: reading variable 'tail' requires locking any mutex [-Werror,-Wthread-safety-analysis] if (q->head - q->tail < MAX_QUEUE_LEN) { ^ ../lib/dpif-netdev.c:2116:53: error: writing variable 'head' requires locking any mutex exclusively [-Werror,-Wthread-safety-analysis] struct dp_netdev_upcall *u = &q->upcalls[q->head++ & QUEUE_MASK]; On Tue, Jun 3, 2014 at 6:05 PM, Ethan Jackson wrote: > Couple of questions. > > First, this patch adds a bit of code which extends beyond 79 characters. > > dp_netdev_input() does this rather interesting stuff with two keys > that mfk switches between. This is definitely not obvious when first > looking at it. Minimally it needs a comment explaining why. That > said I don't totally understand it, why can't we ruse the same key > after each batch is executed? If we can't, why is using only two keys > sufficient? This seems a bit odd. > > Actually on that note, do we need miniflowkey at all? Could it just > be embedded at the end of batch_pkt_execute? > > After this patch, both the hash and recirc and set actions have the > wrong behavior. I'd rather they abort() the switch than silently > misbehave. Looking at this now, I think this is a rather deep issue > that we need to think through. Let's discuss it tomorrow. > > Ethan > > On Tue, Jun 3, 2014 at 5:10 PM, Daniele Di Proietto > wrote: >> This change in dpif-netdev allows faster packet processing for devices which >> implement batching (netdev-dpdk currently). >> >> Signed-off-by: Daniele Di Proietto >> --- >> lib/dpif-netdev.c| 240 >> ++- >> lib/dpif.c | 7 +- >> lib/odp-execute.c| 49 ++--- >> lib/odp-execute.h| 4 +- >> ofproto/ofproto-dpif-xlate.c | 2 +- >> 5 files changed, 211 insertions(+), 91 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index e0e5ad8..fba1d65 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -331,18 +331,18 @@ static void dp_netdev_destroy_all_queues(struct >> dp_netdev *dp) >> OVS_REQ_WRLOCK(dp->queue_rwlock); >> static int dpif_netdev_open(const struct dpif_class *, const char *name, >> bool create, struct dpif **); >> -static int dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf *, >> - int queue_no, int type, >> +static int dp_netdev_output_userspace(struct dp_netdev *dp, struct ofpbuf >> **, >> + int cnt, int queue_no, int type, >>const struct miniflow *, >>const struct nlattr *userdata); >> static void dp_netdev_execute_actions(struct dp_netdev *dp, >>const struct miniflow *, >> - struct ofpbuf *, bool may_steal, >> + struct ofpbuf **, int cnt, bool >> may_steal, >>struct pkt_metadata *, >>const struct nlattr *actions, >>size_t actions_len); >> -static void dp_netdev_port_input(struct dp_netdev *dp, struct ofpbuf >> *packet, >> - struct pkt_metadata *); >> +static void dp_netdev_port_input(struct dp_netdev *dp, struct ofpbuf >> **packets, >> + int cnt, odp_port_t port_no); >> >> static void dp_netdev_set_pmd_threads(struct dp_netdev *, int n); >> >> @@ -1518,7 +1518,6 @@ static int >> dpif_netdev_execute(struct dpif *dpif, struct dpif_execute *execute) >> { >> struct dp_netdev *dp = get_dp_netdev(dpif); >> -struct pkt_metadata *md = &execute->md; >> struct { >> struct miniflow flow; >> uint32_t buf[FLOW_U32S]; >> @@ -1531,9 +1530,9 @@ dpif_netdev_execute(struct dpif *dpif, struct >> dpif_execute *execute) >> >> /* Extract flow key. */ >> miniflow_initialize(&key.flow, key.buf); >> -miniflow_extract(execute->packet, md, &key.flow); >> +miniflow_extract(execute->packet, &execute->md, &key.flow); >> >> -dp_netdev_execute_actions(dp, &key.flow, execute->packet, false, md, >> +dp_netdev_execute_actions(dp, &key.flow, &execute->packet, 1, false, >> &execute->md, >>execute->actions, execute->actions_len); >> >> return 0; >> @@ -1747,17 +1746,12 @@ dp_netdev_process_rxq_port(struct dp_netdev *dp, >>struct dp_netdev_port *port, >>struct netdev_rxq *rxq) >> { >> -
Re: [ovs-dev] [PATCH] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
Sure, I combined them and sent out a new patch fixing all the bugs I encountered in the DPDK perf test. http://patchwork.openvswitch.org/patch/4388/ One issue is that since ofproto-dpif-upcall calls ofpbuf_uninit(), I had to free the DPDK buffer in that function as opposed to ofpbuf_delete(). I'm not sure I entirely like that since we're freeing the header too, which goes directly against the philosophy of ofpbuf_uninit(). One idea I had was to only allow ofpbuf_uninit to be called with OFPBUF_DPDK only if struct ofpbuf *b != ofpbuf_header(b) (meaning b is a copy of the header and thus we can free the original header). I figured I'd let you check out the patch and see if you had any other ideas. Cheers, Ryan On 6/3/14 5:22 PM, "Pravin Shelar" wrote: >Can you combine it with second patch, otherwise it introduces a bug. > >On Tue, Jun 3, 2014 at 4:22 PM, Ryan Wilson 76511 >wrote: >> I've ran into some unexpected issues while perf testing this, lets hold >> off on looking at this. I'll submit another patch when I've had all the >> kinks worked out. >> >> Ryan >> >> On 6/3/14 2:21 PM, "Ryan Wilson 76511" wrote: >> >>>Hey Pravin, >>> >>>Thanks for the catch here! Turns out the header is already tracked in >>>DPDK >>>with rte_mbuf's buffer address - sizeof(ofpbuf). Thus, I submitted >>>another >>>patch that, in free_dpdk_buf(), always gets this header and uses this to >>>free memory. >>> >>>https://urldefense.proofpoint.com/v1/url?u=http://patchwork.openvswitch. >>>org/patch/4375/&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=TfBS78Vw3dzttvXidh >>>bffg%3D%3D%0A&m=gZZtvgxp3GIUGlkz85Olmu3cqs62R0tf%2F4IxSAQLQ9Y%3D%0A&s=20 >>>9e52cdfc433e5fa4deb7ef48a561f78207981f0dabf4dd5ecb3257bde613e2 >>> >>> >>>Let me know if this patch works. >>> >>>Cheers, >>> >>>Ryan >>> >>>On 6/3/14 10:50 AM, "Pravin Shelar" wrote: >>> On Tue, Jun 3, 2014 at 10:33 AM, Ryan Wilson wrote: > When a bridge of datatype type netdev receives a packet, it copies >the > packet from the NIC to a buffer in userspace. Currently, when making > an upcall, the packet is again copied to the upcall's buffer. >However, > this extra copy is not necessary when the datapath exists in >userspace > as the upcall can directly access the packet data. > > This patch eliminates this extra copy of the packet data in most >cases. > In cases where the packet may still be used later by callers of > dp_netdev_execute_actions, making a copy of the packet data is still > necessary. > > Signed-off-by: Ryan Wilson > --- > v2: Addressed Jarno's comment to use direct pointer assignment for > upcall->packet instead of ofpbuf_set(). > --- > lib/dpif-netdev.c | 15 +-- > 1 file changed, 5 insertions(+), 10 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 91c83d6..c89ae20 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct >ofpbuf *packet, > miniflow_hash_5tuple(&key.flow, >0) > % dp->n_handlers, > DPIF_UC_MISS, &key.flow, NULL); > -ofpbuf_delete(packet); > } > } > > @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev >*dp, >struct ofpbuf *packet, > if (userdata) { > buf_size += NLA_ALIGN(userdata->nla_len); > } > -buf_size += ofpbuf_size(packet); > ofpbuf_init(buf, buf_size); > > /* Put ODP flow. */ > @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev >*dp, struct ofpbuf *packet, > >NLA_ALIGN(userdata->nla_len)); > } > > -ofpbuf_set_data(&upcall->packet, > -ofpbuf_put(buf, ofpbuf_data(packet), >ofpbuf_size(packet))); > -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); > +upcall->packet = *packet; > This would not work with DPDK. ofpbuf from dpdk is special case where ofpbuf and data are allocated from same memory object. Therefore moving ofpbuf->data is nontrivial. To make it work we need atleast following covered. 1. Define flag for source of ofpbuf header. So that we can track header and data independently. 2. Fix dpdk ofpbuf destructor to free correct dpdk memory object. Thanks, Pravin. >>> >> ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v4] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
> On Jun 3, 2014, at 6:11 PM, Ryan Wilson wrote: > > When a bridge of datatype type netdev receives a packet, it copies the > packet from the NIC to a buffer in userspace. Currently, when making > an upcall, the packet is again copied to the upcall's buffer. However, > this extra copy is not necessary when the datapath exists in userspace > as the upcall can directly access the packet data. > > This patch eliminates this extra copy of the packet data in most cases. > In cases where the packet may still be used later by callers of > dp_netdev_execute_actions, making a copy of the packet data is still > necessary. > > This patch also adds a header_ field to 'struct ofpbuf' when using > DPDK. This field holds a pointer to the allocated buffer in the > rte_mempool. Thus, an upcall packet ofpbuf allocated on the stack can > now share data and free memory of a rte_mempool allocated ofpbuf. > > Signed-off-by: Ryan Wilson > Acked-by: Jarno Rajahalme > --- > v2: Addressed Jarno's comment to use direct pointer assignment for > upcall->packet instead of ofpbuf_set(). > v3: Addressed issues with DPDK mentioned by Pravin. > v4: Fixed various bugs found in perf test > --- > lib/dpif-netdev.c | 15 +-- > lib/netdev-dpdk.c |2 +- > lib/ofpbuf.c |6 +- > lib/ofpbuf.h | 20 ++-- > 4 files changed, 25 insertions(+), 18 deletions(-) > > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c > index 91c83d6..c89ae20 100644 > --- a/lib/dpif-netdev.c > +++ b/lib/dpif-netdev.c > @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct ofpbuf > *packet, >miniflow_hash_5tuple(&key.flow, 0) >% dp->n_handlers, >DPIF_UC_MISS, &key.flow, NULL); > -ofpbuf_delete(packet); > } > } > > @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev *dp, struct > ofpbuf *packet, > if (userdata) { > buf_size += NLA_ALIGN(userdata->nla_len); > } > -buf_size += ofpbuf_size(packet); > ofpbuf_init(buf, buf_size); > > /* Put ODP flow. */ > @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev *dp, > struct ofpbuf *packet, > NLA_ALIGN(userdata->nla_len)); > } > > -ofpbuf_set_data(&upcall->packet, > -ofpbuf_put(buf, ofpbuf_data(packet), > ofpbuf_size(packet))); > -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); > +upcall->packet = *packet; > > seq_change(q->seq); > > error = 0; > } else { > dp_netdev_count_packet(dp, DP_STAT_LOST); > +ofpbuf_delete(packet); > error = ENOBUFS; > } > ovs_mutex_unlock(&q->mutex); > @@ -2120,19 +2117,17 @@ dp_execute_cb(void *aux_, struct ofpbuf *packet, > break; > > case OVS_ACTION_ATTR_USERSPACE: { > +struct ofpbuf *userspace_packet; > const struct nlattr *userdata; > > userdata = nl_attr_find_nested(a, OVS_USERSPACE_ATTR_USERDATA); > +userspace_packet = may_steal ? packet : ofpbuf_clone(packet); > > -dp_netdev_output_userspace(aux->dp, packet, > +dp_netdev_output_userspace(aux->dp, userspace_packet, >miniflow_hash_5tuple(aux->key, 0) >% aux->dp->n_handlers, >DPIF_UC_ACTION, aux->key, >userdata); > - > -if (may_steal) { > -ofpbuf_delete(packet); > -} > break; > } > > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c > index ba41d2e..e48bfde 100644 > --- a/lib/netdev-dpdk.c > +++ b/lib/netdev-dpdk.c > @@ -205,7 +205,7 @@ dpdk_rte_mzalloc(size_t sz) > void > free_dpdk_buf(struct ofpbuf *b) > { > -struct rte_mbuf *pkt = (struct rte_mbuf *) b; > +struct rte_mbuf *pkt = (struct rte_mbuf *) ofpbuf_header(b); > Could use the address of the 'mbuf' member instead of a cast here? > rte_mempool_put(pkt->pool, pkt); > } > diff --git a/lib/ofpbuf.c b/lib/ofpbuf.c > index 1f4b61d..9490a1e 100644 > --- a/lib/ofpbuf.c > +++ b/lib/ofpbuf.c > @@ -117,6 +117,9 @@ void > ofpbuf_init_dpdk(struct ofpbuf *b, size_t allocated) > { > ofpbuf_init__(b, allocated, OFPBUF_DPDK); > +#ifdef DPDK_NETDEV > +ofpbuf_set_header(b, b); > +#endif > } > > /* Initializes 'b' as an empty ofpbuf with an initial capacity of 'size' > @@ -134,8 +137,9 @@ ofpbuf_uninit(struct ofpbuf *b) > if (b) { > if (b->source == OFPBUF_MALLOC) { > free(ofpbuf_base(b)); > +} else if (b->source == OFPBUF_DPDK) { > +free_dpdk_buf(b); > } > -ovs_assert(b->source != OFPBUF_DPDK); > } > } > > diff --git a/lib/ofpbuf.h b/lib/ofpbuf.h > index 85be899..54113d0 100644 > --- a/lib/ofpbuf.h > +++ b/lib/ofpbuf.h > @@ -59,6 +59,7 @@ enum OVS_
Re: [ovs-dev] [PATCH] dpif-netdev: Remove an extra memcpy of packet data from datapath-upcall interface for bridges with datapath_type=netdev.
> On Jun 3, 2014, at 6:22 PM, Ryan Wilson 76511 wrote: > > Sure, I combined them and sent out a new patch fixing all the bugs I > encountered in the DPDK perf test. > > http://patchwork.openvswitch.org/patch/4388/ > > > One issue is that since ofproto-dpif-upcall calls ofpbuf_uninit(), I had > to free the DPDK buffer in that function as opposed to ofpbuf_delete(). ofpbuf_delete() is supposed to delete the ofpbuf itself, while ofpbuf_ininit() deletes any associated storage for an ofpbuf whose own memory is managed elsewhere (stack, some struct, etc). Upcall handling will call uninit on a copy of the dpdk ofpbuf, not on the dpdk allocated buf itself, so the ofpbuf pointer given as a parameter to uninit remains valid even after the call. > I'm not sure I entirely like that since we're freeing the header too, > which goes directly against the philosophy of ofpbuf_uninit(). > As said above, you should be fine in this regard. > One idea I had was to only allow ofpbuf_uninit to be called with > OFPBUF_DPDK only if struct ofpbuf *b != ofpbuf_header(b) (meaning b is a > copy of the header and thus we can free the original header). I figured > I'd let you check out the patch and see if you had any other ideas. > An assert to this effect would help detect/avoid bugs. > Cheers, > > Ryan > >> On 6/3/14 5:22 PM, "Pravin Shelar" wrote: >> >> Can you combine it with second patch, otherwise it introduces a bug. >> >> On Tue, Jun 3, 2014 at 4:22 PM, Ryan Wilson 76511 >> wrote: >>> I've ran into some unexpected issues while perf testing this, lets hold >>> off on looking at this. I'll submit another patch when I've had all the >>> kinks worked out. >>> >>> Ryan >>> On 6/3/14 2:21 PM, "Ryan Wilson 76511" wrote: Hey Pravin, Thanks for the catch here! Turns out the header is already tracked in DPDK with rte_mbuf's buffer address - sizeof(ofpbuf). Thus, I submitted another patch that, in free_dpdk_buf(), always gets this header and uses this to free memory. https://urldefense.proofpoint.com/v1/url?u=http://patchwork.openvswitch. org/patch/4375/&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=TfBS78Vw3dzttvXidh bffg%3D%3D%0A&m=gZZtvgxp3GIUGlkz85Olmu3cqs62R0tf%2F4IxSAQLQ9Y%3D%0A&s=20 9e52cdfc433e5fa4deb7ef48a561f78207981f0dabf4dd5ecb3257bde613e2 Let me know if this patch works. Cheers, Ryan > On 6/3/14 10:50 AM, "Pravin Shelar" wrote: > >> On Tue, Jun 3, 2014 at 10:33 AM, Ryan Wilson wrote: >> When a bridge of datatype type netdev receives a packet, it copies >> the >> packet from the NIC to a buffer in userspace. Currently, when making >> an upcall, the packet is again copied to the upcall's buffer. >> However, >> this extra copy is not necessary when the datapath exists in >> userspace >> as the upcall can directly access the packet data. >> >> This patch eliminates this extra copy of the packet data in most >> cases. >> In cases where the packet may still be used later by callers of >> dp_netdev_execute_actions, making a copy of the packet data is still >> necessary. >> >> Signed-off-by: Ryan Wilson >> --- >> v2: Addressed Jarno's comment to use direct pointer assignment for >> upcall->packet instead of ofpbuf_set(). >> --- >> lib/dpif-netdev.c | 15 +-- >> 1 file changed, 5 insertions(+), 10 deletions(-) >> >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 91c83d6..c89ae20 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -2024,7 +2024,6 @@ dp_netdev_input(struct dp_netdev *dp, struct >> ofpbuf *packet, >>miniflow_hash_5tuple(&key.flow, >> 0) >>% dp->n_handlers, >>DPIF_UC_MISS, &key.flow, NULL); >> -ofpbuf_delete(packet); >> } >> } >> >> @@ -2063,7 +2062,6 @@ dp_netdev_output_userspace(struct dp_netdev >> *dp, >> struct ofpbuf *packet, >> if (userdata) { >> buf_size += NLA_ALIGN(userdata->nla_len); >> } >> -buf_size += ofpbuf_size(packet); >> ofpbuf_init(buf, buf_size); >> >> /* Put ODP flow. */ >> @@ -2078,15 +2076,14 @@ dp_netdev_output_userspace(struct dp_netdev >> *dp, struct ofpbuf *packet, >> >> NLA_ALIGN(userdata->nla_len)); >> } >> >> -ofpbuf_set_data(&upcall->packet, >> -ofpbuf_put(buf, ofpbuf_data(packet), >> ofpbuf_size(packet))); >> -ofpbuf_set_size(&upcall->packet, ofpbuf_size(packet)); >> +upcall->packet = *packet; > > This would not work with DPDK. ofpbuf from dpdk is special case where > ofpbuf and data are allocated from same memory object. Th
[ovs-dev] [PATCHv2 3/3] revalidator: Use xcache when revalidation is required.
One of the reasons that xlate_cache was introduced was to ensure that statistics were attributed to the correct rules and interfaces according to the flow that was installed into the datapath, rather than according to the current state of the flow table. This patch makes the revalidators use the xlate_cache to attribute stats when full revalidation is required. Furthermore, rather than deleting the xcache when revalidating, keep it around. Signed-off-by: Joe Stringer --- v2: Don't delete the old xcache when revalidating. Removed extraneous "ukey->xcache = NULL". --- ofproto/ofproto-dpif-upcall.c | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 90e18e3..b4aa48b 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -1148,10 +1148,16 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, } may_learn = push.n_packets > 0; -if (ukey->xcache && !udpif->need_revalidate) { +if (ukey->xcache) { xlate_push_stats(ukey->xcache, may_learn, &push); -ok = true; -goto exit; +if (udpif->need_revalidate) { +push.n_packets = 0; +push.n_bytes = 0; +may_learn = false; +} else { +ok = true; +goto exit; +} } error = xlate_receive(udpif->backer, NULL, ukey->key, ukey->key_len, &flow, @@ -1160,16 +1166,11 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, goto exit; } -if (udpif->need_revalidate) { -xlate_cache_clear(ukey->xcache); -} -if (!ukey->xcache) { -ukey->xcache = xlate_cache_new(); -} - xlate_in_init(&xin, ofproto, &flow, NULL, push.tcp_flags, NULL); xin.resubmit_stats = push.n_packets ? &push : NULL; -xin.xcache = ukey->xcache; +if (!ukey->xcache) { +ukey->xcache = xin.xcache = xlate_cache_new(); +} xin.may_learn = may_learn; xin.skip_wildcards = !udpif->need_revalidate; xlate_actions(&xin, &xout); -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCHv2 1/3] revalidator: Refactor ukey creation/lookup.
This patch refactors the code around ukey creation and lookup to simplify the code for callers. A new function ukey_acquire() combines these functions and attempts to acquire a lock on the ukey. Failure to acquire a lock on the ukey is usually a sign that another thread is handling the same flow concurrently, which means the flow does not need to be handled anyway. Signed-off-by: Joe Stringer --- v2: Make ukey_create() return the ukey locked. Rearrange the logic in ukey_acquire(). --- ofproto/ofproto-dpif-upcall.c | 95 +++-- 1 file changed, 43 insertions(+), 52 deletions(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 9d48e98..3a75690 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -216,6 +216,12 @@ static void upcall_unixctl_set_flow_limit(struct unixctl_conn *conn, int argc, static struct udpif_key *ukey_create(const struct nlattr *key, size_t key_len, long long int used); +static struct udpif_key *ukey_lookup(struct udpif *udpif, + const struct nlattr *key, size_t key_len, + uint32_t hash); +static bool ukey_acquire(struct udpif *udpif, const struct nlattr *key, + size_t key_len, long long int used, + struct udpif_key **result); static void ukey_delete(struct revalidator *, struct udpif_key *); static atomic_bool enable_megaflows = ATOMIC_VAR_INIT(true); @@ -983,8 +989,9 @@ handle_upcalls(struct handler *handler, struct upcall *upcalls, /* Must be called with udpif->ukeys[hash % udpif->n_revalidators].mutex. */ static struct udpif_key * -ukey_lookup__(struct udpif *udpif, const struct nlattr *key, size_t key_len, - uint32_t hash) +ukey_lookup(struct udpif *udpif, const struct nlattr *key, size_t key_len, +uint32_t hash) +OVS_REQUIRES(udpif->ukeys->mutex) { struct udpif_key *ukey; struct hmap *hmap = &udpif->ukeys[hash % udpif->n_revalidators].hmap; @@ -997,26 +1004,15 @@ ukey_lookup__(struct udpif *udpif, const struct nlattr *key, size_t key_len, return NULL; } -static struct udpif_key * -ukey_lookup(struct udpif *udpif, const struct nlattr *key, size_t key_len, -uint32_t hash) -{ -struct udpif_key *ukey; -uint32_t idx = hash % udpif->n_revalidators; - -ovs_mutex_lock(&udpif->ukeys[idx].mutex); -ukey = ukey_lookup__(udpif, key, key_len, hash); -ovs_mutex_unlock(&udpif->ukeys[idx].mutex); - -return ukey; -} - +/* Creates a ukey for 'key' and 'key_len', returning it with ukey->mutex in + * a locked state. */ static struct udpif_key * ukey_create(const struct nlattr *key, size_t key_len, long long int used) +OVS_NO_THREAD_SAFETY_ANALYSIS { struct udpif_key *ukey = xmalloc(sizeof *ukey); -ovs_mutex_init(&ukey->mutex); +ovs_mutex_init(&ukey->mutex); ukey->key = (struct nlattr *) &ukey->key_buf; memcpy(&ukey->key_buf, key, key_len); ukey->key_len = key_len; @@ -1027,33 +1023,44 @@ ukey_create(const struct nlattr *key, size_t key_len, long long int used) ukey->created = used ? used : time_msec(); memset(&ukey->stats, 0, sizeof ukey->stats); ukey->xcache = NULL; -ovs_mutex_unlock(&ukey->mutex); return ukey; } -/* Checks for a ukey in 'udpif->ukeys' with the same 'ukey->key' and 'hash', - * and inserts 'ukey' if it does not exist. +/* Searches for a ukey in 'udpif->ukeys' that matches 'key' and 'key_len' and + * attempts to lock the ukey. If the ukey does not exist, create it. * - * Returns true if 'ukey' was inserted into 'udpif->ukeys', false otherwise. */ + * Returns true on success, setting *result to the matching ukey and returning + * it in a locked state. Otherwise, returns false and clears *result. */ static bool -udpif_insert_ukey(struct udpif *udpif, struct udpif_key *ukey, uint32_t hash) +ukey_acquire(struct udpif *udpif, const struct nlattr *key, size_t key_len, + long long int used, struct udpif_key **result) +OVS_TRY_LOCK(true, (*result)->mutex) { -struct udpif_key *duplicate; -uint32_t idx = hash % udpif->n_revalidators; -bool ok; +struct udpif_key *ukey; +uint32_t hash, idx; +bool locked = false; + +hash = hash_bytes(key, key_len, udpif->secret); +idx = hash % udpif->n_revalidators; ovs_mutex_lock(&udpif->ukeys[idx].mutex); -duplicate = ukey_lookup__(udpif, ukey->key, ukey->key_len, hash); -if (duplicate) { -ok = false; -} else { +ukey = ukey_lookup(udpif, key, key_len, hash); +if (!ukey) { +ukey = ukey_create(key, key_len, used); hmap_insert(&udpif->ukeys[idx].hmap, &ukey->hmap_node, hash); -ok = true; +locked = true; +} else if (!ovs_mutex_trylock(&ukey->mutex)) { +locked = true; } ovs_mutex_unlock(&udpif->ukeys[idx].mutex)
[ovs-dev] [PATCHv2 2/3] revalidator: Replace ukey->mark with dump_seq.
Rather than setting and resetting the 'mark' field in the ukey, this patch introduces a seq to track whether a flow has been seen during the most recent dump. This tidies the code and simplifies the logic for detecting when flows are duplicated from the datapath. Signed-off-by: Joe Stringer --- v2: Rebase --- ofproto/ofproto-dpif-upcall.c | 33 + 1 file changed, 17 insertions(+), 16 deletions(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 3a75690..90e18e3 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -179,8 +179,7 @@ struct udpif_key { struct ovs_mutex mutex; /* Guards the following. */ struct dpif_flow_stats stats OVS_GUARDED; /* Last known stats.*/ long long int created OVS_GUARDED;/* Estimate of creation time. */ -bool mark OVS_GUARDED;/* For mark and sweep garbage - collection. */ +uint64_t dump_seq OVS_GUARDED;/* Tracks udpif->dump_seq. */ bool flow_exists OVS_GUARDED; /* Ensures flows are only deleted once. */ @@ -1018,7 +1017,7 @@ ukey_create(const struct nlattr *key, size_t key_len, long long int used) ukey->key_len = key_len; ovs_mutex_lock(&ukey->mutex); -ukey->mark = false; +ukey->dump_seq = 0; ukey->flow_exists = true; ukey->created = used ? used : time_msec(); memset(&ukey->stats, 0, sizeof ukey->stats); @@ -1327,8 +1326,10 @@ revalidate(struct revalidator *revalidator) { struct udpif *udpif = revalidator->udpif; struct dpif_flow_dump_thread *dump_thread; +uint64_t dump_seq; unsigned int flow_limit; +dump_seq = seq_read(udpif->dump_seq); atomic_read(&udpif->flow_limit, &flow_limit); dump_thread = dpif_flow_dump_thread_create(udpif->dump); for (;;) { @@ -1370,7 +1371,7 @@ revalidate(struct revalidator *revalidator) for (f = flows; f < &flows[n_dumped]; f++) { long long int used = f->stats.used; struct udpif_key *ukey; -bool already_dumped, mark; +bool already_dumped, keep; if (!ukey_acquire(udpif, f->key, f->key_len, used, &ukey)) { /* We couldn't acquire the ukey. This means that @@ -1380,7 +1381,7 @@ revalidate(struct revalidator *revalidator) continue; } -already_dumped = ukey->mark || !ukey->flow_exists; +already_dumped = ukey->dump_seq == dump_seq; if (already_dumped) { /* The flow has already been dumped and handled by another * revalidator during this flow dump operation. Skip it. */ @@ -1393,13 +1394,14 @@ revalidate(struct revalidator *revalidator) used = ukey->created; } if (kill_them_all || (used && used < now - max_idle)) { -mark = false; +keep = false; } else { -mark = revalidate_ukey(udpif, ukey, f); +keep = revalidate_ukey(udpif, ukey, f); } -ukey->mark = ukey->flow_exists = mark; +ukey->dump_seq = dump_seq; +ukey->flow_exists = keep; -if (!mark) { +if (!keep) { dump_op_init(&ops[n_ops++], f->key, f->key_len, ukey); } ovs_mutex_unlock(&ukey->mutex); @@ -1419,22 +1421,21 @@ revalidator_sweep__(struct revalidator *revalidator, bool purge) struct dump_op ops[REVALIDATE_MAX_BATCH]; struct udpif_key *ukey, *next; size_t n_ops; +uint64_t dump_seq; n_ops = 0; +dump_seq = seq_read(revalidator->udpif->dump_seq); /* During garbage collection, this revalidator completely owns its ukeys * map, and therefore doesn't need to do any locking. */ HMAP_FOR_EACH_SAFE (ukey, next, hmap_node, revalidator->ukeys) { -if (!purge && ukey->mark) { -ukey->mark = false; -} else if (!ukey->flow_exists) { +if (!ukey->flow_exists) { ukey_delete(revalidator, ukey); -} else { +} else if (purge || ukey->dump_seq != dump_seq) { struct dump_op *op = &ops[n_ops++]; -/* If we have previously seen a flow in the datapath, but didn't - * see it during the most recent dump, delete it. This allows us - * to clean up the ukey and keep the statistics consistent. */ +/* If we have previously seen a flow in the datapath, but it + * hasn't been seen in the current dump, delete it. */ dump_op_init(op, ukey->key, ukey->key_len, ukey); if (n_ops == REVALIDATE_MAX_BATCH) { push_dump_ops(revalidator, ops, n_ops); -- 1.7.10.4 ___ dev mailing list dev@openvs
Re: [ovs-dev] [PATCHv3 2/2] ovs-dev.py: Add option to run tests in parallel.
Thanks, I applied this with the following incremental: diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index fa74d57..efc65c8 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -145,8 +145,8 @@ commands.append(make) def check(): flags = "" -if options.parallel: -flags += "-j%d " % options.parallel +if options.jobs: +flags += "-j%d " % options.jobs else: flags += "-j8 " if options.tests: @@ -368,7 +368,7 @@ def main(): parser.add_option_group(group) group = optparse.OptionGroup(parser, "check") -group.add_option("-p", dest="parallel", metavar="N_TESTS", type="int", - help="Run N_TESTS in parallel") +group.add_option("-j", "--jobs", dest="jobs", metavar="N", type="int", + help="Run N tests in parallel") group.add_option("--tests", dest="tests", metavar="FILTER", help="""run specific tests and/or a test category On 4 June 2014 12:45, Ethan Jackson wrote: > Sorry to nit pick, but could we change the option name to "--jobs" and > the abbreviated option to "-j". This is consistent with how make does > it. > > Acked-by: Ethan Jackson > > > On Tue, Jun 3, 2014 at 5:41 PM, Joe Stringer > wrote: > > Signed-off-by: Joe Stringer > > --- > > v3: Set the default to -j8. > > v2: Make it configurable. > > v1: First post. > > --- > > utilities/ovs-dev.py |6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py > > index 2c3af19..fa74d57 100755 > > --- a/utilities/ovs-dev.py > > +++ b/utilities/ovs-dev.py > > @@ -145,6 +145,10 @@ commands.append(make) > > > > def check(): > > flags = "" > > +if options.parallel: > > +flags += "-j%d " % options.parallel > > +else: > > +flags += "-j8 " > > if options.tests: > > for arg in str.split(options.tests): > > if arg[0].isdigit(): > > @@ -364,6 +368,8 @@ def main(): > > parser.add_option_group(group) > > > > group = optparse.OptionGroup(parser, "check") > > +group.add_option("-p", dest="parallel", metavar="N_TESTS", > type="int", > > + help="Run N_TESTS in parallel") > > group.add_option("--tests", dest="tests", metavar="FILTER", > > help="""run specific tests and/or a test category > >eg, --tests=\"1-10 megaflow\) > > -- > > 1.7.10.4 > > > > ___ > > dev mailing list > > dev@openvswitch.org > > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] datapath: Clean up files on distclean.
Thanks Pravin, I applied this to master. On 31 May 2014 07:33, Pravin Shelar wrote: > On Mon, May 26, 2014 at 6:47 PM, Joe Stringer > wrote: > > This was causing failures in 'make distcleancheck'. > > > > Signed-off-by: Joe Stringer > > LGTM > Acked-by: Pravin B Shelar > > > --- > > datapath/linux/Makefile.main.in |3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/datapath/linux/Makefile.main.in b/datapath/linux/ > Makefile.main.in > > index 88f144c..7d18253 100644 > > --- a/datapath/linux/Makefile.main.in > > +++ b/datapath/linux/Makefile.main.in > > @@ -29,7 +29,8 @@ check: all > > installcheck: > > mostlyclean: > > clean: > > - rm -f *.o *.ko *.mod.* Module.symvers *.cmd kcompat.h.new > > + rm -f *.o *.ko *.mod.* Module.symvers .*.cmd kcompat.h.new \ > > + modules.order .tmp_versions/openvswitch.mod > > for d in $(build_links); do if test -h $$d; then rm $$d; fi; done > > distclean: clean > > rm -f kcompat.h > > -- > > 1.7.10.4 > > > > ___ > > dev mailing list > > dev@openvswitch.org > > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCHv3 1/2] ovs-dev.py: Add option to specify which tests to run.
Thanks, pushed to master. On 4 June 2014 12:44, Ethan Jackson wrote: > Acked-by: Ethan Jackson > > > On Tue, Jun 3, 2014 at 5:41 PM, Joe Stringer > wrote: > > Signed-off-by: Joe Stringer > > --- > > v3: Don't require users to specify "-k" for searches. > > v2: No change. > > v1: First post. > > --- > > utilities/ovs-dev.py | 14 ++ > > 1 file changed, 14 insertions(+) > > > > diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py > > index 814a9d0..2c3af19 100755 > > --- a/utilities/ovs-dev.py > > +++ b/utilities/ovs-dev.py > > @@ -144,6 +144,14 @@ commands.append(make) > > > > > > def check(): > > +flags = "" > > +if options.tests: > > +for arg in str.split(options.tests): > > +if arg[0].isdigit(): > > +flags += "%s " % arg > > +else: > > +flags += "-k %s " % arg > > +ENV["TESTSUITEFLAGS"] = flags > > make("check") > > commands.append(check) > > > > @@ -355,6 +363,12 @@ def main(): > > const=i, help="compile with -O%d" % i) > > parser.add_option_group(group) > > > > +group = optparse.OptionGroup(parser, "check") > > +group.add_option("--tests", dest="tests", metavar="FILTER", > > + help="""run specific tests and/or a test category > > + eg, --tests=\"1-10 megaflow\) > > +parser.add_option_group(group) > > + > > group = optparse.OptionGroup(parser, "run") > > group.add_option("-g", "--gdb", dest="gdb", action="store_true", > > help="run ovs-vswitchd under gdb") > > -- > > 1.7.10.4 > > > > ___ > > dev mailing list > > dev@openvswitch.org > > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] timeval: Add monotonic time functionality for NetBSD and FreeBSD.
> This patch also checks the system platform as clock_gettime > could exist on different platforms but with different values of > CLOCK_MONOTONIC and different definitions of 'struct timespec'. > In this case, the system call would be expected to catch the > error, which is dangerous. > > This patch ensures Linux, NetBSD and FreeBSD platforms use > clock_gettime with their corresponding correct values and > definitions. All other platforms use time.time(). > > Signed-off-by: Ryan Wilson Acked-by: YAMAMOTO Takashi > > --- > v2: Added Yamamoto's diff to fix NetBSD case > --- > python/ovs/timeval.py | 34 +++--- > 1 file changed, 27 insertions(+), 7 deletions(-) > > diff --git a/python/ovs/timeval.py b/python/ovs/timeval.py > index 4463e87..a16bf60 100644 > --- a/python/ovs/timeval.py > +++ b/python/ovs/timeval.py > @@ -12,22 +12,41 @@ > # See the License for the specific language governing permissions and > # limitations under the License. > > +import sys > import time > > -LIBRT = 'librt.so.1' > -CLOCK_MONOTONIC = 1 > - > try: > import ctypes > > +LIBRT = 'librt.so.1' > +clock_gettime_name = 'clock_gettime' > + > +if sys.platform.startswith("linux"): > +CLOCK_MONOTONIC = 1 > +time_t = ctypes.c_long > +elif sys.platform.startswith("netbsd"): > +# NetBSD uses function renaming for ABI versioning. While the proper > +# way to get the appropriate version is of course "#include > ", > +# it is difficult with ctypes. The following is appropriate for > +# recent versions of NetBSD, including NetBSD-6. > +LIBRT = 'libc.so.12' > +clock_gettime_name = '__clock_gettime50' > +CLOCK_MONOTONIC = 3 > +time_t = ctypes.c_int64 > +elif sys.platform.startswith("freebsd"): > +CLOCK_MONOTONIC = 4 > +time_t = ctypes.c_int64 > +else: > +raise Exception > + > class timespec(ctypes.Structure): > _fields_ = [ > -('tv_sec', ctypes.c_long), > +('tv_sec', time_t), > ('tv_nsec', ctypes.c_long), > ] > > librt = ctypes.CDLL(LIBRT) > -clock_gettime = librt.clock_gettime > +clock_gettime = getattr(librt, clock_gettime_name) > clock_gettime.argtypes = [ctypes.c_int, ctypes.POINTER(timespec)] > except: > # Librt shared library could not be loaded > @@ -48,8 +67,9 @@ if not hasattr(time, 'monotonic'): > time.monotonic = monotonic > > def msec(): > -"""Returns the current time, as the amount of time since the epoch, in > -milliseconds, as a float.""" > +""" Returns the system's monotonic time if possible, otherwise returns > the > +current time as the amount of time since the epoch, in milliseconds, as a > +float.""" > return time.monotonic() * 1000.0 > > > -- > 1.7.9.5 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] timeval: Add monotonic time functionality for NetBSD and FreeBSD.
> Thanks for the fix! Sorry, I don't know too much about the intricacies of > NetBSD. My first version seemed to work on my NetBSD instance though, so I > wrongly assumed it would work for most versions of NetBSD. it picked the old (pre-64bit-time_t) compat version of the function. yes, it should work. but better to use the latest one which all newly compiled C programs would use. YAMAMOTO Takashi > > I tested this on FreeBSD and it works, but I don't know the intricacies of > FreeBSD either. > > I applied your incremental in my next patch. > > http://patchwork.openvswitch.org/patch/4367/ > > > Again thanks! > > Ryan > > On 6/2/14 10:38 PM, "YAMAMOTO Takashi" wrote: > >>> This patch also checks the system platform as clock_gettime >>> could exist on different platforms but with different values of >>> CLOCK_MONOTONIC and different definitions of 'struct timespec'. >>> In this case, the system call would be expected to catch the >>> error, which is dangerous. >>> >>> This patch ensures Linux, NetBSD and FreeBSD platforms use >>> clock_gettime with their corresponding correct values and >>> definitions. All other platforms use time.time(). >> >>thanks. >> >>the following is incremental diff to fix NetBSD case. >>please feel free to fold it into yours. >> >>i haven't checked FreeBSD part. >> >>YAMAMOTO Takashi >> >>From 71891a0fc4ba892e26fec8e704167ab8ccde408a Mon Sep 17 00:00:00 2001 >>From: YAMAMOTO Takashi >>Date: Tue, 3 Jun 2014 14:17:23 +0900 >>Subject: [PATCH] timeval: Fix NetBSD case >> >>Signed-off-by: YAMAMOTO Takashi >>--- >> python/ovs/timeval.py | 9 - >> 1 file changed, 8 insertions(+), 1 deletion(-) >> >>diff --git a/python/ovs/timeval.py b/python/ovs/timeval.py >>index 6f3cbb3..a34ef2b 100644 >>--- a/python/ovs/timeval.py >>+++ b/python/ovs/timeval.py >>@@ -16,6 +16,7 @@ import sys >> import time >> >> LIBRT = 'librt.so.1' >>+clock_gettime_name = 'clock_gettime' >> >> try: >> import ctypes >>@@ -24,6 +25,12 @@ try: >> CLOCK_MONOTONIC = 1 >> time_t = ctypes.c_long >> elif sys.platform.startswith("netbsd"): >>+# NetBSD uses function renaming for ABI versioning. While the >>proper >>+# way to get the appropriate version is of course "#include >>", >>+# it is difficult with ctypes. The following is appropriate for >>+# recent versions of NetBSD, including NetBSD-6. >>+LIBRT = 'libc.so.12' >>+clock_gettime_name = '__clock_gettime50' >> CLOCK_MONOTONIC = 3 >> time_t = ctypes.c_int64 >> elif sys.platform.startswith("freebsd"): >>@@ -39,7 +46,7 @@ try: >> ] >> >> librt = ctypes.CDLL(LIBRT) >>-clock_gettime = librt.clock_gettime >>+clock_gettime = getattr(librt, clock_gettime_name) >> clock_gettime.argtypes = [ctypes.c_int, ctypes.POINTER(timespec)] >> except: >> # Librt shared library could not be loaded >>-- >>1.9.0 >>___ >>dev mailing list >>dev@openvswitch.org >>https://urldefense.proofpoint.com/v1/url?u=http://openvswitch.org/mailman/ >>listinfo/dev&k=oIvRg1%2BdGAgOoM1BIlLLqw%3D%3D%0A&r=TfBS78Vw3dzttvXidhbffg% >>3D%3D%0A&m=EIzWMNYZlk7pLtwllkmlJqm%2FDcGWDbOB7Icp95%2Fr1tA%3D%0A&s=0e9550c >>56b07bdd67032cee1451ade1acb6b88ebcdc237cc36e1cb1f61e8855b > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] tests: Rename "ofproto-if"->"ofproto-dpif"
It's unclear why these tests are named "ofproto-if..." unlike all of the other ofproto-dpif tests. Rename them to be more consistent. Signed-off-by: Joe Stringer --- tests/ofproto-dpif.at |8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at index 2c03ea9..c7d5fa1 100644 --- a/tests/ofproto-dpif.at +++ b/tests/ofproto-dpif.at @@ -4630,7 +4630,7 @@ OVS_VSWITCHD_STOP(["/resubmits yielded over 64 kB of stack/d"]) AT_CLEANUP -AT_SETUP([ofproto-if packet-out controller]) +AT_SETUP([ofproto-dpif packet-out controller]) OVS_VSWITCHD_START ADD_OF_PORTS([br0], 1, 2) @@ -4673,7 +4673,7 @@ OVS_VSWITCHD_STOP AT_CLEANUP -AT_SETUP([ofproto-if packet-out controller (patch port)]) +AT_SETUP([ofproto-dpif packet-out controller (patch port)]) OVS_VSWITCHD_START( [-- \ add-port br0 p1 -- \ @@ -4725,7 +4725,7 @@ OVS_VSWITCHD_STOP AT_CLEANUP -AT_SETUP([ofproto-if packet-out goto_table]) +AT_SETUP([ofproto-dpif packet-out goto_table]) OVS_VSWITCHD_START ADD_OF_PORTS([br0], 1, 2) @@ -4777,7 +4777,7 @@ OVS_VSWITCHD_STOP AT_CLEANUP -AT_SETUP([ofproto-if packet-out table-miss (continue)]) +AT_SETUP([ofproto-dpif packet-out table-miss (continue)]) OVS_VSWITCHD_START ADD_OF_PORTS([br0], 1, 2) -- 1.7.10.4 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] Is there any way to simulate the tunnel traffic in test cases to test IPFIX feature?
Thanks for your help. And how about "ovs-appctl ofproto/trace ovs-dummy" command? The manual about ovs-vswitchd says that it can generate a packet with "-generate" option. Can it fit my requirement? Bests, Wenyu - Original Message - From: "Jesse Gross" To: "Wenyu Zhang" Cc: dev@openvswitch.org Sent: Wednesday, June 4, 2014 8:11:30 AM Subject: Re: [ovs-dev] Is there any way to simulate the tunnel traffic in test cases to test IPFIX feature? On Tue, Jun 3, 2014 at 2:51 AM, Wenyu Zhang wrote: > > I am trying to simulate traffic in test cases to test IPFIX feature. > > For normal traffic, I can use "ovs-appctl netdev-dummy/receive " to simulate > it, just refer to the netflow and sflow cases in tests/ofproto-dpif.at > And I can capture the IPFIX packets on lo interface as expected. > > But for tunnel traffic, I haven't find a workable way to generate this kind > of traffic. > For example, I try to use "ovs-appctl ofproto/trace ovs-dummy" command with > "-generate" option. > And we can see that the "sample" actions has been inserted as expected, but > these actions are not executed, and no IPFIX packets can be captured on lo > interface. > > Anyone knows about the usage of this command? Or other way to generate the > tunnel traffic to trigger IPFIX sample action? Thanks a lot. I don't believe that this is implemented. netdev-dummy/receive constructs a packet from the flow and the discards everything else. Since there is no support for additional metadata beyond the input port or tunneling implemented in the dummy datapath, it basically ends up looking like a non-tunneled packet arriving on a tunnel port ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] tests: Rename "ofproto-if"->"ofproto-dpif"
On Wed, Jun 04, 2014 at 02:54:27PM +1200, Joe Stringer wrote: > It's unclear why these tests are named "ofproto-if..." unlike all of the > other ofproto-dpif tests. Rename them to be more consistent. > > Signed-off-by: Joe Stringer Acked-by: Simon Horman > --- > tests/ofproto-dpif.at |8 > 1 file changed, 4 insertions(+), 4 deletions(-) > > diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at > index 2c03ea9..c7d5fa1 100644 > --- a/tests/ofproto-dpif.at > +++ b/tests/ofproto-dpif.at > @@ -4630,7 +4630,7 @@ OVS_VSWITCHD_STOP(["/resubmits yielded over 64 kB of > stack/d"]) > AT_CLEANUP > > > -AT_SETUP([ofproto-if packet-out controller]) > +AT_SETUP([ofproto-dpif packet-out controller]) > OVS_VSWITCHD_START > ADD_OF_PORTS([br0], 1, 2) > > @@ -4673,7 +4673,7 @@ OVS_VSWITCHD_STOP > AT_CLEANUP > > > -AT_SETUP([ofproto-if packet-out controller (patch port)]) > +AT_SETUP([ofproto-dpif packet-out controller (patch port)]) > OVS_VSWITCHD_START( >[-- \ > add-port br0 p1 -- \ > @@ -4725,7 +4725,7 @@ OVS_VSWITCHD_STOP > AT_CLEANUP > > > -AT_SETUP([ofproto-if packet-out goto_table]) > +AT_SETUP([ofproto-dpif packet-out goto_table]) > OVS_VSWITCHD_START > ADD_OF_PORTS([br0], 1, 2) > > @@ -4777,7 +4777,7 @@ OVS_VSWITCHD_STOP > AT_CLEANUP > > > -AT_SETUP([ofproto-if packet-out table-miss (continue)]) > +AT_SETUP([ofproto-dpif packet-out table-miss (continue)]) > OVS_VSWITCHD_START > ADD_OF_PORTS([br0], 1, 2) > > -- > 1.7.10.4 > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] Can VXLAN work with VIP (Keepalived)?
Thanks, Jesse. I figured out the issue. For those who are interested in using VIP as VXLAN endpoint IP: after the backup node takes over VIP, its "source address" also needs to be updated. By default (I tested with Ubuntu 14.04 OS), the bakcup node's source address keeps unchanged. That explains why my VXLAN tunnels did not work previously. Thanks Changbin On Tue, Jun 3, 2014 at 3:39 PM, Jesse Gross wrote: > On Mon, Jun 2, 2014 at 6:37 PM, Changbin Liu > wrote: > > Hi Folks, > > > > Using Open vSwitch I am setting up VXLAN tunnels between hosts. I intend > to > > make VXLAN tunnels "highly available" via Keepalived. Specifically, there > > are one master node and one backup node for each host, and they share a > > Virtual IP (VIP) via Keepalived, and the VXLAN tunnel will use the VIP as > > endpoint IP. This way when the master node fails, the backup node will > > automatically have the VIP shifted to it. Therefore, human efforts are > not > > involved in failure recovery. > > > > I thought VXLAN can work with VIP this way, since VXLAN simply uses UDP > > packets, and as long as the endpoint VIP does not change, it should work. > > But it turns out that it didn't at all. I wonder whether this is intended > > behavior or I missed something. > > It should be possible to use VXLAN with a VIP. The IP stack is > configured outside of OVS so you should make sure that it is working > in Linux first. > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] Can VXLAN work with VIP (Keepalived)?
2014-06-04 8:12 GMT+04:00 Changbin Liu : > I figured out the issue. For those who are interested in using VIP as VXLAN > endpoint IP: after the backup node takes over VIP, its "source address" also > needs to be updated. By default (I tested with Ubuntu 14.04 OS), the bakcup > node's source address keeps unchanged. That explains why my VXLAN tunnels > did not work previously. And how you solve source address change? -- Vasiliy Tolstov, e-mail: v.tols...@selfip.ru jabber: v...@selfip.ru ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] Can VXLAN work with VIP (Keepalived)?
It appears that we can change source address via updating the routing table. http://www.symantec.com/business/support/index?page=content&id=HOWTO58872 But somehow it doesn't work on Ubuntu 14.04. Maybe I missed something. So right now, manually I first delete the default IP on the interface, and then update the routing table. I don't have an automated solution yet. Any ideas? Thanks Changbin On Wed, Jun 4, 2014 at 12:15 AM, Vasiliy Tolstov wrote: > 2014-06-04 8:12 GMT+04:00 Changbin Liu : > > I figured out the issue. For those who are interested in using VIP as > VXLAN > > endpoint IP: after the backup node takes over VIP, its "source address" > also > > needs to be updated. By default (I tested with Ubuntu 14.04 OS), the > bakcup > > node's source address keeps unchanged. That explains why my VXLAN tunnels > > did not work previously. > > > And how you solve source address change? > > -- > Vasiliy Tolstov, > e-mail: v.tols...@selfip.ru > jabber: v...@selfip.ru > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ovs-sandbox: Use the correct make
On some platforms including NetBSD, GNU make is usually installed as "gmake". Signed-off-by: YAMAMOTO Takashi --- tutorial/automake.mk | 2 +- tutorial/ovs-sandbox | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tutorial/automake.mk b/tutorial/automake.mk index d88d5b4..82ad66d 100644 --- a/tutorial/automake.mk +++ b/tutorial/automake.mk @@ -9,4 +9,4 @@ EXTRA_DIST += \ tutorial/t-stage4 sandbox: all - cd $(srcdir)/tutorial && ./ovs-sandbox -b $(abs_builddir) + cd $(srcdir)/tutorial && MAKE=$(MAKE) ./ovs-sandbox -b $(abs_builddir) diff --git a/tutorial/ovs-sandbox b/tutorial/ovs-sandbox index db980d2..21066d1 100755 --- a/tutorial/ovs-sandbox +++ b/tutorial/ovs-sandbox @@ -190,7 +190,7 @@ OVS_SYSCONFDIR=$sandbox; export OVS_SYSCONFDIR if $built; then # Easy access to OVS manpages. -(cd "$builddir" && make install-man mandir="$sandbox"/man) +(cd "$builddir" && ${MAKE} install-man mandir="$sandbox"/man) MANPATH=$sandbox/man:; export MANPATH fi -- 1.8.3.1 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [slow path HASH and RECIRC action v2] dpif: Fix slow action handling for DP_HASH and RECIRC
> In case DP_HASH and RECIRC actions need to be executed in slow path, > current implementation simply don't handle them -- vswitchd simply > crashes. This patch fixes them by supply an implementation for them. > > RECIRC will be handled by the datapath, same as the output action. > > DP_HASH, on the other hand, is handled in the user space. Although the > resulting hash values may not match those computed by the datapath, it > is less expensive; current use case (bonding) does not require a strict > match to work properly. > > Reported-by: YAMAMOTO Takashi > Signed-off-by: Andy Zhou Acked-by: YAMAMOTO Takashi > > --- > v1->v2: Addresses Ben's review feedback > fix the comment on why execute dp_hash in user space. > Add assert for unknown hash algorithm specification. > --- > lib/dpif.c| 4 ++-- > lib/odp-execute.c | 23 ++- > 2 files changed, 24 insertions(+), 3 deletions(-) > > diff --git a/lib/dpif.c b/lib/dpif.c > index 84dba28..ac73be1 100644 > --- a/lib/dpif.c > +++ b/lib/dpif.c > @@ -1069,6 +1069,7 @@ dpif_execute_helper_cb(void *aux_, struct ofpbuf > *packet, > switch ((enum ovs_action_attr)type) { > case OVS_ACTION_ATTR_OUTPUT: > case OVS_ACTION_ATTR_USERSPACE: > +case OVS_ACTION_ATTR_RECIRC: > execute.actions = action; > execute.actions_len = NLA_ALIGN(action->nla_len); > execute.packet = packet; > @@ -1077,6 +1078,7 @@ dpif_execute_helper_cb(void *aux_, struct ofpbuf > *packet, > aux->error = aux->dpif->dpif_class->execute(aux->dpif, &execute); > break; > > +case OVS_ACTION_ATTR_HASH: > case OVS_ACTION_ATTR_PUSH_VLAN: > case OVS_ACTION_ATTR_POP_VLAN: > case OVS_ACTION_ATTR_PUSH_MPLS: > @@ -1084,8 +1086,6 @@ dpif_execute_helper_cb(void *aux_, struct ofpbuf > *packet, > case OVS_ACTION_ATTR_SET: > case OVS_ACTION_ATTR_SAMPLE: > case OVS_ACTION_ATTR_UNSPEC: > -case OVS_ACTION_ATTR_RECIRC: > -case OVS_ACTION_ATTR_HASH: > case __OVS_ACTION_ATTR_MAX: > OVS_NOT_REACHED(); > } > diff --git a/lib/odp-execute.c b/lib/odp-execute.c > index 12ad679..cc18536 100644 > --- a/lib/odp-execute.c > +++ b/lib/odp-execute.c > @@ -26,6 +26,7 @@ > #include "ofpbuf.h" > #include "odp-util.h" > #include "packets.h" > +#include "flow.h" > #include "unaligned.h" > #include "util.h" > > @@ -208,7 +209,6 @@ odp_execute_actions__(void *dp, struct ofpbuf *packet, > bool steal, > case OVS_ACTION_ATTR_OUTPUT: > case OVS_ACTION_ATTR_USERSPACE: > case OVS_ACTION_ATTR_RECIRC: > -case OVS_ACTION_ATTR_HASH: > if (dp_execute_action) { > /* Allow 'dp_execute_action' to steal the packet data if we > do > * not need it any more. */ > @@ -219,6 +219,27 @@ odp_execute_actions__(void *dp, struct ofpbuf *packet, > bool steal, > } > break; > > +case OVS_ACTION_ATTR_HASH: { > +const struct ovs_action_hash *hash_act = nl_attr_get(a); > + > +/* Calculate a hash value directly. This might not match the > + * value computed by the datapath, but it is much less expensive, > + * and the current use case (bonding) does not require a strict > + * match to work properly. */ > +if (hash_act->hash_alg == OVS_HASH_ALG_L4) { > +struct flow flow; > +uint32_t hash; > + > +flow_extract(packet, md, &flow); > +hash = flow_hash_5tuple(&flow, hash_act->hash_basis); > +md->dp_hash = hash ? hash : 1; > +} else { > +/* Assert on unknown hash algorithm. */ > +OVS_NOT_REACHED(); > +} > +break; > +} > + > case OVS_ACTION_ATTR_PUSH_VLAN: { > const struct ovs_action_push_vlan *vlan = nl_attr_get(a); > eth_push_vlan(packet, htons(ETH_TYPE_VLAN), vlan->vlan_tci); > -- > 1.9.1 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] INSTALL: Note about compiler atomics support.
Acked-by: YAMAMOTO Takashi > Me too. > > Acked-by: Ben Pfaff > > > On Tue, Jun 03, 2014 at 02:22:48PM -0700, Jarno Rajahalme wrote: >> This better, >> >> Acked-by: Jarno Rajahalme >> >> On Jun 3, 2014, at 2:20 PM, Gurucharan Shetty wrote: >> >> > May be we should add a new TODO section in BUILD.Windows like: >> > >> > diff --git a/BUILD.Windows b/BUILD.Windows >> > index ca0d252..41e6eab 100644 >> > --- a/BUILD.Windows >> > +++ b/BUILD.Windows >> > @@ -85,3 +85,11 @@ For example, >> > --with-openssl="C:/OpenSSL-Win32" >> > >> > * Run make for the ported executables. >> > + >> > +TODO: >> > + >> > + >> > +* OVS currently has no native support for atomics on Windows. Pthreads >> > +are used as a fallback, but some features, such as OVS-RCU are really >> > +slow without native atomics support. Atomics support for Windows has to >> > +be brought in. >> > >> > On Tue, Jun 3, 2014 at 2:16 PM, Jarno Rajahalme >> > wrote: >> >> >> >> On Jun 3, 2014, at 2:03 PM, Ben Pfaff wrote: >> >> >> >>> On Tue, Jun 03, 2014 at 01:09:11PM -0700, Jarno Rajahalme wrote: >> OVS is slow when compiled with pthreads atomics. Add a generic note >> in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment >> provides additional detail. >> >> Signed-off-by: Jarno Rajahalme >> >>> >> >>> I think this is only likely to affect Windows (everyone else uses GCC >> >>> or Clang) so should we put it in BUILD.Windows? >> >> >> >> >> >> So you think this should not be mentioned in INSTALL? If so, I think the >> >> text there should be modified to something like ?* GCC 4.x or clang C >> >> compiler? instead of the current ?A C compiler, such as??. >> >> >> >> How about adding the following to BUILD.Windows: >> >> >> >> diff --git a/BUILD.Windows b/BUILD.Windows >> >> index ca0d252..05956ba 100644 >> >> --- a/BUILD.Windows >> >> +++ b/BUILD.Windows >> >> @@ -39,6 +39,10 @@ project from >> >> ftp://sourceware.org/pub/pthreads-win32/prebuilt-dll-2-9-1-release to a >> >> directory (e.g.: C:/pthread). >> >> >> >> +OVS currently has no native support for atomics on Windows. Pthreads >> >> +are used as a fallback, but some features, such as OVS-RCU are really >> >> +slow without native atomics support. >> >> + >> >> * Get the Open vSwitch sources from either cloning the repo using git >> >> or from a distribution tar ball. >> >> >> >> Jarno >> >> >> >> ___ >> >> dev mailing list >> >> dev@openvswitch.org >> >> http://openvswitch.org/mailman/listinfo/dev >> > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] INSTALL: Note about compiler atomics support.
> OVS is slow when compiled with pthreads atomics. Add a generic note > in INSTALL, with a reference to lib/ovs-atomic.h, where a new comment > provides additional detail. > > Signed-off-by: Jarno Rajahalme Acked-by: YAMAMOTO Takashi > --- > INSTALL |4 > lib/ovs-atomic.h |3 +++ > 2 files changed, 7 insertions(+) > > diff --git a/INSTALL b/INSTALL > index 5c869b2..7b08172 100644 > --- a/INSTALL > +++ b/INSTALL > @@ -28,6 +28,10 @@ you will need the following software: >analysis and thread-safety checks. For Ubuntu, there are >nightly built packages available on clang's website. > > +* While OVS may be compatible with other compilers, optimal > + support for atomic operations may be missing, making OVS > + very slow (see lib/ovs-atomic.h). > + > - libssl, from OpenSSL, is optional but recommended if you plan to >connect the Open vSwitch to an OpenFlow controller. libssl is >required to establish confidentiality and authenticity in the > diff --git a/lib/ovs-atomic.h b/lib/ovs-atomic.h > index c21b7ee..2452846 100644 > --- a/lib/ovs-atomic.h > +++ b/lib/ovs-atomic.h > @@ -257,6 +257,9 @@ > #elif HAVE_GCC4_ATOMICS > #include "ovs-atomic-gcc4+.h" > #else > +/* ovs-atomic-pthreads implementation is provided for portability. > + * It might be too slow for real use because Open vSwitch is > + * optimized for platforms where real atomic ops are available. */ > #include "ovs-atomic-pthreads.h" > #endif > #undef IN_OVS_ATOMIC_H > -- > 1.7.10.4 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev