[ovs-dev] [DPDK Upcalls 07/11] xlate: Handle VLAN splinters in xlate_actions().
I find this quite a bit simpler to reason about. Also, future patches require xlate_actions() not to modify the packet. Signed-off-by: Ethan Jackson --- ofproto/ofproto-dpif-upcall.c | 25 ++ ofproto/ofproto-dpif-xlate.c | 80 +-- ofproto/ofproto-dpif-xlate.h | 9 +++-- ofproto/ofproto-dpif.c| 6 ++-- tests/vlan-splinters.at | 4 +-- 5 files changed, 58 insertions(+), 66 deletions(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 48393b0..b0fd203 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -858,7 +858,7 @@ convert_upcall(struct udpif *udpif, struct upcall *upcall) odp_port_t odp_in_port; int error; -error = xlate_receive(udpif->backer, packet, dupcall->key, +error = xlate_receive(udpif->backer, dupcall->key, dupcall->key_len, &flow, &ofproto, &ipfix, &sflow, NULL, &odp_in_port); @@ -967,25 +967,6 @@ handle_upcalls(struct udpif *udpif, struct upcall *upcalls, fail_open = fail_open || upcall->xout.fail_open; -if (upcall->flow.in_port.ofp_port -!= vsp_realdev_to_vlandev(upcall->ofproto, - upcall->flow.in_port.ofp_port, - upcall->flow.vlan_tci)) { -/* This packet was received on a VLAN splinter port. We - * added a VLAN to the packet to make the packet resemble - * the flow, but the actions were composed assuming that - * the packet contained no VLAN. So, we must remove the - * VLAN header from the packet before trying to execute the - * actions. */ -if (ofpbuf_size(upcall->xout.odp_actions)) { -eth_pop_vlan(packet); -} - -/* Remove the flow vlan tags inserted by vlan splinter logic - * to ensure megaflow masks generated match the data path flow. */ -upcall->flow.vlan_tci = 0; -} - /* Do not install a flow into the datapath if: * *- The datapath already has too many flows. @@ -1257,7 +1238,7 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, goto exit; } -error = xlate_receive(udpif->backer, NULL, ukey->key, ukey->key_len, &flow, +error = xlate_receive(udpif->backer, ukey->key, ukey->key_len, &flow, &ofproto, NULL, NULL, &netflow, &odp_in_port); if (error) { goto exit; @@ -1385,7 +1366,7 @@ push_dump_ops__(struct udpif *udpif, struct dump_op *ops, size_t n_ops) } ovs_mutex_unlock(&op->ukey->mutex); -error = xlate_receive(udpif->backer, NULL, op->op.u.flow_del.key, +error = xlate_receive(udpif->backer, op->op.u.flow_del.key, op->op.u.flow_del.key_len, &flow, &ofproto, NULL, NULL, &netflow, NULL); if (!error) { diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index c816135..f115849 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -934,23 +934,16 @@ xlate_ofport_remove(struct ofport_dpif *ofport) xlate_xport_remove(new_xcfg, xport); } -/* Given a datpath, packet, and flow metadata ('backer', 'packet', and 'key' - * respectively), populates 'flow' with the result of odp_flow_key_to_flow(). - * Optionally populates 'ofproto' with the ofproto_dpif, 'odp_in_port' with - * the datapath in_port, that 'packet' ingressed, and 'ipfix', 'sflow', and - * 'netflow' with the appropriate handles for those protocols if they're - * enabled. Caller is responsible for unrefing them. +/* Given a datpath and flow metadata ('backer', and 'key' respectively), + * populates 'flow' with the result of odp_flow_key_to_flow(). Optionally + * populates 'ofproto' with the ofproto_dpif, 'odp_in_port' with the datapath + * in_port, and 'ipfix', 'sflow', and 'netflow' with the appropriate handles + * for those protocols if they're enabled. Caller is responsible for unrefing + * them. * * If 'ofproto' is nonnull, requires 'flow''s in_port to exist. Otherwise sets * 'flow''s in_port to OFPP_NONE. * - * This function does post-processing on data returned from - * odp_flow_key_to_flow() to help make VLAN splinters transparent to the rest - * of the upcall processing logic. In particular, if the extracted in_port is - * a VLAN splinter port, it replaces flow->in_port by the "re
[ovs-dev] [DPDK Upcalls 01/11] style: Replace TODO with XXX.
In accordance with CodingStyle. Signed-off-by: Ethan Jackson --- lib/dpif-netdev.c | 4 ++-- lib/netdev-dpdk.c | 10 +- ofproto/bundles.c | 4 ++-- 3 files changed, 9 insertions(+), 9 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index bbc4eea..76e36d9 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1690,7 +1690,7 @@ reload: if (lc++ > 1024) { ovsrcu_quiesce(); -/* TODO: need completely userspace based signaling method. +/* XXX: need completely userspace based signaling method. * to keep this thread entirely in userspace. * For now using atomic counter. */ lc = 0; @@ -2112,7 +2112,7 @@ dp_execute_cb(void *aux_, struct dpif_packet **packets, int cnt, for (i = 0; i < cnt; i++) { -/* TODO: this is slow. Use RSS hash in the future */ +/* XXX: this is slow. Use RSS hash in the future */ miniflow_extract(&packets[i]->ofpbuf, md, &key.flow); if (hash_act->hash_alg == OVS_HASH_ALG_L4) { diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index b45e367..6ee9803 100644 --- a/lib/netdev-dpdk.c +++ b/lib/netdev-dpdk.c @@ -64,7 +64,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); #define MBUF_SIZE(mtu) (MTU_TO_MAX_LEN(mtu) + (512) + \ sizeof(struct rte_mbuf) + RTE_PKTMBUF_HEADROOM) -/* TODO: mempool size should be based on system resources. */ +/* XXX: mempool size should be based on system resources. */ #define NB_MBUF (4096 * 64) #define MP_CACHE_SZ (256 * 2) #define SOCKET0 0 @@ -74,7 +74,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20); #define NIC_PORT_RX_Q_SIZE 2048 /* Size of Physical NIC RX Queue, Max (n+32<=4096)*/ #define NIC_PORT_TX_Q_SIZE 2048 /* Size of Physical NIC TX Queue, Max (n+32<=4096)*/ -/* TODO: Needs per NIC value for these constants. */ +/* XXX: Needs per NIC value for these constants. */ #define RX_PTHRESH 32 /* Default values of RX prefetch threshold reg. */ #define RX_HTHRESH 32 /* Default values of RX host threshold reg. */ #define RX_WTHRESH 16 /* Default values of RX write-back threshold reg. */ @@ -215,7 +215,7 @@ is_dpdk_class(const struct netdev_class *class) return class->construct == netdev_dpdk_construct; } -/* TODO: use dpdk malloc for entire OVS. infact huge page shld be used +/* XXX: use dpdk malloc for entire OVS. infact huge page shld be used * for all other sengments data, bss and text. */ static void * @@ -478,7 +478,7 @@ netdev_dpdk_init(struct netdev *netdev_, unsigned int port_no) OVS_REQUIRES(dpdk netdev->mtu = ETHER_MTU; netdev->max_packet_len = MTU_TO_MAX_LEN(netdev->mtu); -/* TODO: need to discover device node at run time. */ +/* XXX: need to discover device node at run time. */ netdev->socket_id = SOCKET0; netdev->dpdk_mp = dpdk_mp_get(netdev->socket_id, netdev->mtu); @@ -569,7 +569,7 @@ netdev_dpdk_get_config(const struct netdev *netdev_, struct smap *args) ovs_mutex_lock(&dev->mutex); -/* TODO: Allow to configure number of queues. */ +/* XXX: Allow to configure number of queues. */ smap_add_format(args, "configured_rx_queues", "%u", netdev_->n_rxq); smap_add_format(args, "configured_tx_queues", "%u", netdev_->n_rxq); ovs_mutex_unlock(&dev->mutex); diff --git a/ofproto/bundles.c b/ofproto/bundles.c index 9904465..ddc74c6 100644 --- a/ofproto/bundles.c +++ b/ofproto/bundles.c @@ -143,7 +143,7 @@ ofp_bundle_open(struct ofconn *ofconn, uint32_t id, uint16_t flags) return OFPERR_OFPBFC_BAD_ID; } -/* TODO: Check the limit of open bundles */ +/* XXX: Check the limit of open bundles */ bundle = ofp_bundle_create(id, flags); bundle->state = BS_OPEN; @@ -198,7 +198,7 @@ ofp_bundle_commit(struct ofconn *ofconn, uint32_t id, uint16_t flags) return OFPERR_OFPBFC_BAD_FLAGS; } -/* TODO: actual commit */ +/* XXX: actual commit */ return OFPERR_OFPBFC_MSG_UNSUP; } -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [DPDK Upcalls 05/11] classifier: classifier_lookup_miniflow_batch() indicate failures.
This patch causes classifier_lookup_miniflow_batch() to return a boolean indicating whether any rules could not be successfully looked up. Used in future patches. Signed-off-by: Ethan Jackson --- lib/classifier.c | 10 +++--- lib/classifier.h | 2 +- 2 files changed, 8 insertions(+), 4 deletions(-) diff --git a/lib/classifier.c b/lib/classifier.c index a6a582c..ae03251 100644 --- a/lib/classifier.c +++ b/lib/classifier.c @@ -1004,8 +1004,10 @@ find_match_miniflow(const struct cls_subtable *subtable, * This function is optimized for use in the userspace datapath and therefore * does not implement a lot of features available in the standard * classifier_lookup() function. Specifically, it does not implement - * priorities, instead returning any rule which matches the flow. */ -void + * priorities, instead returning any rule which matches the flow. + * + * Returns true if all flows found a corresponding rule. */ +bool classifier_lookup_miniflow_batch(const struct classifier *cls, const struct miniflow **flows, struct cls_rule **rules, size_t len) @@ -1034,9 +1036,11 @@ classifier_lookup_miniflow_batch(const struct classifier *cls, begin++; } if (begin >= len) { -break; +return true; } } + +return false; } /* Finds and returns a rule in 'cls' with exactly the same priority and diff --git a/lib/classifier.h b/lib/classifier.h index 4203eb8..b394724 100644 --- a/lib/classifier.h +++ b/lib/classifier.h @@ -295,7 +295,7 @@ void classifier_remove(struct classifier *, struct cls_rule *); struct cls_rule *classifier_lookup(const struct classifier *, const struct flow *, struct flow_wildcards *); -void classifier_lookup_miniflow_batch(const struct classifier *cls, +bool classifier_lookup_miniflow_batch(const struct classifier *cls, const struct miniflow **flows, struct cls_rule **rules, size_t len); bool classifier_rule_overlaps(const struct classifier *, -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [DPDK Upcalls 02/11] ovs-dev.py: Support check-valgrind in the Makefile.
Signed-off-by: Ethan Jackson --- utilities/ovs-dev.py | 1 + 1 file changed, 1 insertion(+) diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index 74143ca..496f6dc 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -131,6 +131,7 @@ def conf(): if clang: mf.write(make_str % BUILD_CLANG) mf.write("\t$(MAKE) -C %s %s $@\n" % (BUILD_GCC, c1)) +mf.write("\ncheck-valgrind:\n") mf.write("\ncheck:\n") mf.write(make_str % BUILD_GCC) mf.close() -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [DPDK Upcalls 08/11] xlate: Pull key parsing out of xlate_receive().
In future patches, dpif-netdev will not create a flow key for each upcall. To accommodate this, we require callers to handle flow parsing themselves. Signed-off-by: Ethan Jackson --- ofproto/ofproto-dpif-upcall.c | 30 -- ofproto/ofproto-dpif-xlate.c | 23 +-- ofproto/ofproto-dpif-xlate.h | 8 ofproto/ofproto-dpif.c| 17 - 4 files changed, 45 insertions(+), 33 deletions(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index b0fd203..3003455 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -858,9 +858,14 @@ convert_upcall(struct udpif *udpif, struct upcall *upcall) odp_port_t odp_in_port; int error; -error = xlate_receive(udpif->backer, dupcall->key, - dupcall->key_len, &flow, - &ofproto, &ipfix, &sflow, NULL, &odp_in_port); +if (odp_flow_key_to_flow(dupcall->key, dupcall->key_len, &flow) +== ODP_FIT_ERROR) { +error = EINVAL; +goto destroy_upcall; +} + +error = xlate_receive(udpif->backer, &flow, &ofproto, &ipfix, &sflow, NULL, + &odp_in_port); if (error) { if (error == ENODEV) { @@ -1238,8 +1243,13 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, goto exit; } -error = xlate_receive(udpif->backer, ukey->key, ukey->key_len, &flow, - &ofproto, NULL, NULL, &netflow, &odp_in_port); +if (odp_flow_key_to_flow(ukey->key, ukey->key_len, &flow) +== ODP_FIT_ERROR) { +goto exit; +} + +error = xlate_receive(udpif->backer, &flow, &ofproto, NULL, NULL, &netflow, + &odp_in_port); if (error) { goto exit; } @@ -1355,7 +1365,6 @@ push_dump_ops__(struct udpif *udpif, struct dump_op *ops, size_t n_ops) struct netflow *netflow; struct flow flow; bool may_learn; -int error; may_learn = push->n_packets > 0; ovs_mutex_lock(&op->ukey->mutex); @@ -1366,10 +1375,11 @@ push_dump_ops__(struct udpif *udpif, struct dump_op *ops, size_t n_ops) } ovs_mutex_unlock(&op->ukey->mutex); -error = xlate_receive(udpif->backer, op->op.u.flow_del.key, - op->op.u.flow_del.key_len, &flow, &ofproto, - NULL, NULL, &netflow, NULL); -if (!error) { +if (odp_flow_key_to_flow(op->op.u.flow_del.key, + op->op.u.flow_del.key_len, + &flow) != ODP_FIT_ERROR +&& !xlate_receive(udpif->backer, &flow, &ofproto, NULL, NULL, + &netflow, NULL)) { struct xlate_in xin; xlate_in_init(&xin, ofproto, &flow, NULL, push->tcp_flags, diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index f115849..a1d13c7 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -934,12 +934,11 @@ xlate_ofport_remove(struct ofport_dpif *ofport) xlate_xport_remove(new_xcfg, xport); } -/* Given a datpath and flow metadata ('backer', and 'key' respectively), - * populates 'flow' with the result of odp_flow_key_to_flow(). Optionally - * populates 'ofproto' with the ofproto_dpif, 'odp_in_port' with the datapath - * in_port, and 'ipfix', 'sflow', and 'netflow' with the appropriate handles - * for those protocols if they're enabled. Caller is responsible for unrefing - * them. +/* Given a datpath and flow metadata ('backer', and 'flow' respectively), + * Optionally populates 'ofproto' with the ofproto_dpif, 'odp_in_port' with the + * datapath in_port, and 'ipfix', 'sflow', and 'netflow' with the appropriate + * handles for those protocols if they're enabled. Caller is responsible for + * unrefing them. * * If 'ofproto' is nonnull, requires 'flow''s in_port to exist. Otherwise sets * 'flow''s in_port to OFPP_NONE. @@ -951,18 +950,14 @@ xlate_ofport_remove(struct ofport_dpif *ofport) * Returns 0 if successful, ENODEV if the parsed flow has no associated ofport, * or some other positive errno if there are other problems. */ int -xlate_receive(const struct dpif_backer *backer, const struct nlattr *key, - size_t key_len, struct flow *flow, struct ofproto_dpif **ofproto, - struct dpif_ipfix **ipfix, struct dpif_sflow **sflow, -
[ovs-dev] [DPDK Upcalls 04/11] dpif-netdev: Avoid useless flow copy in dp_netdev_flow_add().
This patch gives dp_netdev_flow_add() a match with which it can initialize the classifier rule. This prevents it from needing to copy a flow and flow_wildcards into the match first. Signed-off-by: Ethan Jackson --- lib/dpif-netdev.c | 27 +++ 1 file changed, 11 insertions(+), 16 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 76e36d9..c637d9f 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -1210,29 +1210,25 @@ dpif_netdev_flow_get(const struct dpif *dpif, } static int -dp_netdev_flow_add(struct dp_netdev *dp, const struct flow *flow, - const struct flow_wildcards *wc, - const struct nlattr *actions, - size_t actions_len) +dp_netdev_flow_add(struct dp_netdev *dp, struct match *match, + const struct nlattr *actions, size_t actions_len) OVS_REQUIRES(dp->flow_mutex) { struct dp_netdev_flow *netdev_flow; -struct match match; netdev_flow = xzalloc(sizeof *netdev_flow); -*CONST_CAST(struct flow *, &netdev_flow->flow) = *flow; +*CONST_CAST(struct flow *, &netdev_flow->flow) = match->flow; ovsthread_stats_init(&netdev_flow->stats); ovsrcu_set(&netdev_flow->actions, dp_netdev_actions_create(actions, actions_len)); -match_init(&match, flow, wc); cls_rule_init(CONST_CAST(struct cls_rule *, &netdev_flow->cr), - &match, NETDEV_RULE_PRIORITY); + match, NETDEV_RULE_PRIORITY); cmap_insert(&dp->flow_table, CONST_CAST(struct cmap_node *, &netdev_flow->node), -flow_hash(flow, 0)); +flow_hash(&match->flow, 0)); classifier_insert(&dp->cls, CONST_CAST(struct cls_rule *, &netdev_flow->cr)); @@ -1260,22 +1256,21 @@ dpif_netdev_flow_put(struct dpif *dpif, const struct dpif_flow_put *put) { struct dp_netdev *dp = get_dp_netdev(dpif); struct dp_netdev_flow *netdev_flow; -struct flow flow; struct miniflow miniflow; -struct flow_wildcards wc; +struct match match; int error; -error = dpif_netdev_flow_from_nlattrs(put->key, put->key_len, &flow); +error = dpif_netdev_flow_from_nlattrs(put->key, put->key_len, &match.flow); if (error) { return error; } error = dpif_netdev_mask_from_nlattrs(put->key, put->key_len, put->mask, put->mask_len, - &flow, &wc.masks); + &match.flow, &match.wc.masks); if (error) { return error; } -miniflow_init(&miniflow, &flow); +miniflow_init(&miniflow, &match.flow); ovs_mutex_lock(&dp->flow_mutex); netdev_flow = dp_netdev_lookup_flow(dp, &miniflow); @@ -1285,7 +1280,7 @@ dpif_netdev_flow_put(struct dpif *dpif, const struct dpif_flow_put *put) if (put->stats) { memset(put->stats, 0, sizeof *put->stats); } -error = dp_netdev_flow_add(dp, &flow, &wc, put->actions, +error = dp_netdev_flow_add(dp, &match, put->actions, put->actions_len); } else { error = EFBIG; @@ -1295,7 +1290,7 @@ dpif_netdev_flow_put(struct dpif *dpif, const struct dpif_flow_put *put) } } else { if (put->flags & DPIF_FP_MODIFY -&& flow_equal(&flow, &netdev_flow->flow)) { +&& flow_equal(&match.flow, &netdev_flow->flow)) { struct dp_netdev_actions *new_actions; struct dp_netdev_actions *old_actions; -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [DPDK Upcalls 09/11] xlate: Don't modify provided struct flow.
Necessary for future patches which need the provided flow to be const. Signed-off-by: Ethan Jackson --- ofproto/ofproto-dpif-upcall.c | 37 +++-- ofproto/ofproto-dpif-xlate.c | 23 --- ofproto/ofproto-dpif-xlate.h | 6 +++--- ofproto/ofproto-dpif.c| 10 ++ 4 files changed, 40 insertions(+), 36 deletions(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 3003455..96a7e0b 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -153,7 +153,6 @@ struct upcall { size_t key_len; enum dpif_upcall_type upcall_type; struct dpif_flow_stats stats; -odp_port_t odp_in_port; uint64_t slow_path_buf[128 / 8]; struct odputil_keybuf mask_buf; @@ -726,7 +725,7 @@ compose_slow_path(struct udpif *udpif, struct xlate_out *xout, static void upcall_init(struct upcall *upcall, struct flow *flow, struct ofpbuf *packet, struct ofproto_dpif *ofproto, struct dpif_upcall *dupcall, -odp_port_t odp_in_port) +ofp_port_t ofp_in_port) { struct pkt_metadata md = pkt_metadata_from_flow(flow); struct xlate_in xin; @@ -741,9 +740,8 @@ upcall_init(struct upcall *upcall, struct flow *flow, struct ofpbuf *packet, upcall->stats.n_bytes = ofpbuf_size(packet); upcall->stats.used = time_msec(); upcall->stats.tcp_flags = ntohs(upcall->flow.tcp_flags); -upcall->odp_in_port = odp_in_port; -xlate_in_init(&xin, upcall->ofproto, &upcall->flow, NULL, +xlate_in_init(&xin, upcall->ofproto, &upcall->flow, ofp_in_port, NULL, upcall->stats.tcp_flags, packet); if (upcall->upcall_type == DPIF_UC_MISS) { @@ -853,9 +851,9 @@ convert_upcall(struct udpif *udpif, struct upcall *upcall) struct ofproto_dpif *ofproto; struct dpif_sflow *sflow; struct dpif_ipfix *ipfix; -struct flow flow; +ofp_port_t ofp_in_port; enum upcall_type type; -odp_port_t odp_in_port; +struct flow flow; int error; if (odp_flow_key_to_flow(dupcall->key, dupcall->key_len, &flow) @@ -865,7 +863,7 @@ convert_upcall(struct udpif *udpif, struct upcall *upcall) } error = xlate_receive(udpif->backer, &flow, &ofproto, &ipfix, &sflow, NULL, - &odp_in_port); + &ofp_in_port); if (error) { if (error == ENODEV) { @@ -876,7 +874,7 @@ convert_upcall(struct udpif *udpif, struct upcall *upcall) * so that future packets of the flow are inexpensively dropped * in the kernel. */ VLOG_INFO_RL(&rl, "received packet on unassociated datapath " - "port %"PRIu32, odp_in_port); + "port %"PRIu32, flow.in_port.odp_port); dpif_flow_put(udpif->dpif, DPIF_FP_CREATE | DPIF_FP_MODIFY, dupcall->key, dupcall->key_len, NULL, 0, NULL, 0, NULL); @@ -886,7 +884,7 @@ convert_upcall(struct udpif *udpif, struct upcall *upcall) type = classify_upcall(upcall); if (type == MISS_UPCALL) { -upcall_init(upcall, &flow, packet, ofproto, dupcall, odp_in_port); +upcall_init(upcall, &flow, packet, ofproto, dupcall, ofp_in_port); return error; } @@ -898,7 +896,7 @@ convert_upcall(struct udpif *udpif, struct upcall *upcall) memset(&cookie, 0, sizeof cookie); memcpy(&cookie, nl_attr_get(dupcall->userdata), sizeof cookie.sflow); -dpif_sflow_received(sflow, packet, &flow, odp_in_port, +dpif_sflow_received(sflow, packet, &flow, flow.in_port.odp_port, &cookie); } break; @@ -1014,7 +1012,7 @@ handle_upcalls(struct udpif *udpif, struct upcall *upcalls, ofpbuf_use_stack(&buf, upcall->slow_path_buf, sizeof upcall->slow_path_buf); compose_slow_path(udpif, &upcall->xout, &upcall->flow, - upcall->odp_in_port, &buf); + upcall->flow.in_port.odp_port, &buf); op->u.flow_put.actions = ofpbuf_data(&buf); op->u.flow_put.actions_len = ofpbuf_size(&buf); } @@ -1202,7 +1200,7 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, struct ofpbuf xout_actions; struct flow flow, dp_mask; uint32_t *dp32, *xout32; -odp_port_t odp_in_port; +ofp_port_t ofp_in_port; struct xlate_in xin; long long int last_used; int error; @@ -1249,7 +1247,7 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, } error = xla
[ovs-dev] [DPDK Upcalls 06/11] xlate: Allow callers to supply their own odp_actions buffer.
In future patches, this will allow dpif-netdev to avoid a copy. Signed-off-by: Ethan Jackson --- ofproto/ofproto-dpif-upcall.c | 16 - ofproto/ofproto-dpif-xlate.c | 75 +++ ofproto/ofproto-dpif-xlate.h | 8 - ofproto/ofproto-dpif.c| 10 +++--- 4 files changed, 61 insertions(+), 48 deletions(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 654fbd3..48393b0 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -977,7 +977,7 @@ handle_upcalls(struct udpif *udpif, struct upcall *upcalls, * the packet contained no VLAN. So, we must remove the * VLAN header from the packet before trying to execute the * actions. */ -if (ofpbuf_size(&upcall->xout.odp_actions)) { +if (ofpbuf_size(upcall->xout.odp_actions)) { eth_pop_vlan(packet); } @@ -1020,8 +1020,8 @@ handle_upcalls(struct udpif *udpif, struct upcall *upcalls, op->u.flow_put.stats = NULL; if (!upcall->xout.slow) { -op->u.flow_put.actions = ofpbuf_data(&upcall->xout.odp_actions); -op->u.flow_put.actions_len = ofpbuf_size(&upcall->xout.odp_actions); +op->u.flow_put.actions = ofpbuf_data(upcall->xout.odp_actions); +op->u.flow_put.actions_len = ofpbuf_size(upcall->xout.odp_actions); } else { struct ofpbuf buf; @@ -1034,15 +1034,15 @@ handle_upcalls(struct udpif *udpif, struct upcall *upcalls, } } -if (ofpbuf_size(&upcall->xout.odp_actions)) { +if (ofpbuf_size(upcall->xout.odp_actions)) { op = &ops[n_ops++]; op->type = DPIF_OP_EXECUTE; op->u.execute.packet = packet; odp_key_to_pkt_metadata(upcall->key, upcall->key_len, &op->u.execute.md); -op->u.execute.actions = ofpbuf_data(&upcall->xout.odp_actions); -op->u.execute.actions_len = ofpbuf_size(&upcall->xout.odp_actions); +op->u.execute.actions = ofpbuf_data(upcall->xout.odp_actions); +op->u.execute.actions_len = ofpbuf_size(upcall->xout.odp_actions); op->u.execute.needs_help = (upcall->xout.slow & SLOW_ACTION) != 0; } } @@ -1284,8 +1284,8 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, } if (!xout.slow) { -ofpbuf_use_const(&xout_actions, ofpbuf_data(&xout.odp_actions), - ofpbuf_size(&xout.odp_actions)); +ofpbuf_use_const(&xout_actions, ofpbuf_data(xout.odp_actions), + ofpbuf_size(xout.odp_actions)); } else { ofpbuf_use_stack(&xout_actions, slow_path_buf, sizeof slow_path_buf); compose_slow_path(udpif, &xout, &flow, odp_in_port, &xout_actions); diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c index 4aedb59..c816135 100644 --- a/ofproto/ofproto-dpif-xlate.c +++ b/ofproto/ofproto-dpif-xlate.c @@ -1368,7 +1368,7 @@ add_mirror_actions(struct xlate_ctx *ctx, const struct flow *orig_flow) "%s, which is reserved exclusively for mirroring", ctx->xbridge->name, in_xbundle->name); } -ofpbuf_clear(&ctx->xout->odp_actions); +ofpbuf_clear(ctx->xout->odp_actions); return; } @@ -2276,7 +2276,7 @@ static void add_sflow_action(struct xlate_ctx *ctx) { ctx->user_cookie_offset = compose_sflow_action(ctx->xbridge, - &ctx->xout->odp_actions, + ctx->xout->odp_actions, &ctx->xin->flow, ODPP_NONE); ctx->sflow_odp_port = 0; ctx->sflow_n_outputs = 0; @@ -2287,7 +2287,7 @@ add_sflow_action(struct xlate_ctx *ctx) static void add_ipfix_action(struct xlate_ctx *ctx) { -compose_ipfix_action(ctx->xbridge, &ctx->xout->odp_actions, +compose_ipfix_action(ctx->xbridge, ctx->xout->odp_actions, &ctx->xin->flow); } @@ -2304,7 +2304,7 @@ fix_sflow_action(struct xlate_ctx *ctx) return; } -cookie = ofpbuf_at(&ctx->xout->odp_actions, ctx->user_cookie_offset, +cookie = ofpbuf_at(ctx->xout->odp_actions, ctx->user_cookie_offset, sizeof cookie->sflow); ovs_assert(cookie->type == USER_ACTION_COOKIE_SFLOW); @@ -2415,12 +2415,12 @@ compose_output_action__(struct xlate_ctx *ctx, ofp_port_t ofp_port, /* Forwarding is disable
[ovs-dev] [DPDK Upcalls 03/11] flow: Parse MPLS should return the actual number of labels.
This problem is uncovered by a future patch. Signed-off-by: Ethan Jackson --- lib/flow.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/flow.c b/lib/flow.c index 5e04015..cfd90d6 100644 --- a/lib/flow.c +++ b/lib/flow.c @@ -210,7 +210,7 @@ parse_mpls(void **datap, size_t *sizep) break; } } -return MAX(count, FLOW_MAX_MPLS_LABELS); +return count; } static inline ovs_be16 -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [DPDK Upcalls 10/11] ofproto-dpif-upcall: Reorganize miss handling.
This patch reorganizes miss handling so that it can support future patches which don't go through the standard dpif interfaces. As an added side benefit, the resulting code is a bit easier to understand. Signed-off-by: Ethan Jackson --- ofproto/ofproto-dpif-ipfix.c | 8 +- ofproto/ofproto-dpif-ipfix.h | 4 +- ofproto/ofproto-dpif-sflow.c | 2 +- ofproto/ofproto-dpif-sflow.h | 6 +- ofproto/ofproto-dpif-upcall.c | 386 +- ofproto/ofproto-dpif-upcall.h | 3 - ofproto/ofproto-dpif.c| 5 +- 7 files changed, 207 insertions(+), 207 deletions(-) diff --git a/ofproto/ofproto-dpif-ipfix.c b/ofproto/ofproto-dpif-ipfix.c index 1584c25..ff498d0 100644 --- a/ofproto/ofproto-dpif-ipfix.c +++ b/ofproto/ofproto-dpif-ipfix.c @@ -1019,7 +1019,7 @@ ipfix_cache_update(struct dpif_ipfix_exporter *exporter, static void ipfix_cache_entry_init(struct ipfix_flow_cache_entry *entry, - struct ofpbuf *packet, const struct flow *flow, + const struct ofpbuf *packet, const struct flow *flow, uint64_t packet_delta_count, uint32_t obs_domain_id, uint32_t obs_point_id) { @@ -1284,7 +1284,7 @@ ipfix_send_data_msg(struct dpif_ipfix_exporter *exporter, static void dpif_ipfix_sample(struct dpif_ipfix_exporter *exporter, - struct ofpbuf *packet, const struct flow *flow, + const struct ofpbuf *packet, const struct flow *flow, uint64_t packet_delta_count, uint32_t obs_domain_id, uint32_t obs_point_id) { @@ -1298,7 +1298,7 @@ dpif_ipfix_sample(struct dpif_ipfix_exporter *exporter, } void -dpif_ipfix_bridge_sample(struct dpif_ipfix *di, struct ofpbuf *packet, +dpif_ipfix_bridge_sample(struct dpif_ipfix *di, const struct ofpbuf *packet, const struct flow *flow) OVS_EXCLUDED(mutex) { uint64_t packet_delta_count; @@ -1315,7 +1315,7 @@ dpif_ipfix_bridge_sample(struct dpif_ipfix *di, struct ofpbuf *packet, } void -dpif_ipfix_flow_sample(struct dpif_ipfix *di, struct ofpbuf *packet, +dpif_ipfix_flow_sample(struct dpif_ipfix *di, const struct ofpbuf *packet, const struct flow *flow, uint32_t collector_set_id, uint16_t probability, uint32_t obs_domain_id, uint32_t obs_point_id) OVS_EXCLUDED(mutex) diff --git a/ofproto/ofproto-dpif-ipfix.h b/ofproto/ofproto-dpif-ipfix.h index 6ebf8b0..9de17ab 100644 --- a/ofproto/ofproto-dpif-ipfix.h +++ b/ofproto/ofproto-dpif-ipfix.h @@ -35,9 +35,9 @@ void dpif_ipfix_set_options( const struct ofproto_ipfix_bridge_exporter_options *, const struct ofproto_ipfix_flow_exporter_options *, size_t); -void dpif_ipfix_bridge_sample(struct dpif_ipfix *, struct ofpbuf *, +void dpif_ipfix_bridge_sample(struct dpif_ipfix *, const struct ofpbuf *, const struct flow *); -void dpif_ipfix_flow_sample(struct dpif_ipfix *, struct ofpbuf *, +void dpif_ipfix_flow_sample(struct dpif_ipfix *, const struct ofpbuf *, const struct flow *, uint32_t, uint16_t, uint32_t, uint32_t); diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c index c7e092a..fad066b 100644 --- a/ofproto/ofproto-dpif-sflow.c +++ b/ofproto/ofproto-dpif-sflow.c @@ -561,7 +561,7 @@ dpif_sflow_odp_port_to_ifindex(const struct dpif_sflow *ds, } void -dpif_sflow_received(struct dpif_sflow *ds, struct ofpbuf *packet, +dpif_sflow_received(struct dpif_sflow *ds, const struct ofpbuf *packet, const struct flow *flow, odp_port_t odp_in_port, const union user_action_cookie *cookie) OVS_EXCLUDED(mutex) diff --git a/ofproto/ofproto-dpif-sflow.h b/ofproto/ofproto-dpif-sflow.h index d53c95c..130568a 100644 --- a/ofproto/ofproto-dpif-sflow.h +++ b/ofproto/ofproto-dpif-sflow.h @@ -46,10 +46,8 @@ void dpif_sflow_del_port(struct dpif_sflow *, odp_port_t odp_port); void dpif_sflow_run(struct dpif_sflow *); void dpif_sflow_wait(struct dpif_sflow *); -void dpif_sflow_received(struct dpif_sflow *, - struct ofpbuf *, - const struct flow *, - odp_port_t odp_port, +void dpif_sflow_received(struct dpif_sflow *, const struct ofpbuf *, + const struct flow *, odp_port_t odp_port, const union user_action_cookie *); int dpif_sflow_odp_port_to_ifindex(const struct dpif_sflow *, diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 96a7e0b..ba74359 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -140,6 +140,7 @@ struct udpif { enum upcall_type { BAD_UPCALL, /* Some kind of bug somewhere. */ MISS_UPCALL,/* A flow miss. */ +SLOW_UPCALL,/* A
[ovs-dev] [DPDK Upcalls 11/11] dpif-netdev: Streamline miss handling.
This patch avoids the relatively inefficient miss handling processes dictated by the dpif process, by calling into ofproto-dpif directly through a callback. Signed-off-by: Ethan Jackson --- lib/dpif-netdev.c | 299 -- lib/dpif-provider.h | 5 +- lib/dpif.c| 4 +- lib/dpif.h| 18 ++- ofproto/ofproto-dpif-upcall.c | 96 ++ tests/dpif-netdev.at | 10 +- tests/ofproto-dpif.at | 54 7 files changed, 258 insertions(+), 228 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index c637d9f..d4a30bb 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -85,14 +85,7 @@ static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER; static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex) = SHASH_INITIALIZER(&dp_netdevs); -struct dp_netdev_queue { -unsigned int packet_count; - -struct dpif_upcall upcalls[NETDEV_MAX_RX_BATCH]; -struct ofpbuf bufs[NETDEV_MAX_RX_BATCH]; -}; - -#define DP_NETDEV_QUEUE_INITIALIZER { .packet_count = 0 } +static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600); /* Datapath based on the network device interface from netdev.h. * @@ -140,7 +133,8 @@ struct dp_netdev { /* Protects access to ofproto-dpif-upcall interface during revalidator * thread synchronization. */ struct fat_rwlock upcall_rwlock; -exec_upcall_cb *upcall_cb; /* Callback function for executing upcalls. */ +upcall_callback *upcall_cb; /* Callback function for executing upcalls. */ +void *upcall_aux; /* Forwarding threads. */ struct latch exit_latch; @@ -320,12 +314,6 @@ static void do_del_port(struct dp_netdev *dp, struct dp_netdev_port *) OVS_REQUIRES(dp->port_mutex); static int dpif_netdev_open(const struct dpif_class *, const char *name, bool create, struct dpif **); -static int dp_netdev_queue_userspace_packet(struct dp_netdev_queue *, -struct ofpbuf *, int type, -const struct miniflow *, -const struct nlattr *); -static void dp_netdev_execute_userspace_queue(struct dp_netdev_queue *, - struct dp_netdev *); static void dp_netdev_execute_actions(struct dp_netdev *dp, struct dpif_packet **, int c, bool may_steal, struct pkt_metadata *, @@ -474,6 +462,7 @@ create_dp_netdev(const char *name, const struct dpif_class *class, /* Disable upcalls by default. */ dp_netdev_disable_upcall(dp); +dp->upcall_aux = NULL; dp->upcall_cb = NULL; ovs_mutex_lock(&dp->port_mutex); @@ -1232,6 +1221,35 @@ dp_netdev_flow_add(struct dp_netdev *dp, struct match *match, classifier_insert(&dp->cls, CONST_CAST(struct cls_rule *, &netdev_flow->cr)); +if (OVS_UNLIKELY(VLOG_IS_DBG_ENABLED())) { +struct ds ds = DS_EMPTY_INITIALIZER; +struct ofpbuf key, mask; + +ofpbuf_init(&key, 0); +ofpbuf_init(&mask, 0); + +ds_put_cstr(&ds, "flow_add: "); + +odp_flow_key_from_flow(&key, &match->flow, &match->wc.masks, + match->flow.in_port.odp_port, true); + +odp_flow_key_from_mask(&mask, &match->wc.masks, &match->flow, + odp_to_u32(match->wc.masks.in_port.odp_port), + SIZE_MAX, true); + +odp_flow_format(ofpbuf_data(&key), ofpbuf_size(&key), +ofpbuf_data(&mask), ofpbuf_size(&mask), NULL, &ds, +true); + +ds_put_cstr(&ds, ", actions:"); +format_odp_actions(&ds, actions, actions_len); +VLOG_DBG("%s", ds_cstr(&ds)); + +ds_destroy(&ds); +ofpbuf_uninit(&key); +ofpbuf_uninit(&mask); +} + return 0; } @@ -1822,6 +1840,44 @@ dp_netdev_count_packet(struct dp_netdev *dp, enum dp_stat_type type, int cnt) ovs_mutex_unlock(&bucket->mutex); } +static int +dp_netdev_upcall(struct dp_netdev *dp, struct dpif_packet *packet_, + struct flow *flow, struct flow_wildcards *wc, + enum dpif_upcall_type type, const struct nlattr *userdata, + struct ofpbuf *actions, struct ofpbuf *put_actions) +{ +struct ofpbuf *packet = &packet_->ofpbuf; + +if (type == DPIF_UC_MISS) { +dp_netdev_count_packet(dp, DP_STAT_MISS, 1); +} + +if (OVS_UNLIKELY(!VLOG_DROP_DBG(&upcall_rl))) { +struct ds ds = DS_EMPTY_INITIALIZER; +struct ofpbuf key; +char *packe
Re: [ovs-dev] [DPDK Upcalls 01/11] style: Replace TODO with XXX.
1 through 10 are all ofproto reorganization, so I think you or jarno would be appropriate reviewers. Ethan (iPhone) > On Aug 3, 2014, at 19:24, Ben Pfaff wrote: > >> On Fri, Aug 01, 2014 at 06:39:12PM -0700, Ethan Jackson wrote: >> In accordance with CodingStyle. >> >> Signed-off-by: Ethan Jackson > > Pravin has been doing most of the review for DPDK work, I think, but let > me know if you want me to look at any of these. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [DPDK Upcalls 06/11] xlate: Allow callers to supply their own odp_actions buffer.
I did but I was lazy. I'll change it. Ethan On Tue, Aug 5, 2014 at 1:02 PM, Ben Pfaff wrote: > On Fri, Aug 01, 2014 at 06:39:17PM -0700, Ethan Jackson wrote: >> In future patches, this will allow dpif-netdev to avoid a copy. >> >> Signed-off-by: Ethan Jackson > > It might be better to require callers to supply a buffer, because it > reduces special cases. Did you consider that approach? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [DPDK Upcalls 06/11] xlate: Allow callers to supply their own odp_actions buffer.
Actually, the more I think about it. I think I made the right case. The only caller that wants to supply it's own odp_actions buffer is dpif-netdev. For everyone else, requiring a user supplied buffer would require some rather complicated memory management. I'd rather hide that complexity within the xlate module. Thoughts? Ethan On Tue, Aug 5, 2014 at 2:04 PM, Ethan Jackson wrote: > I did but I was lazy. I'll change it. > > Ethan > > On Tue, Aug 5, 2014 at 1:02 PM, Ben Pfaff wrote: >> On Fri, Aug 01, 2014 at 06:39:17PM -0700, Ethan Jackson wrote: >>> In future patches, this will allow dpif-netdev to avoid a copy. >>> >>> Signed-off-by: Ethan Jackson >> >> It might be better to require callers to supply a buffer, because it >> reduces special cases. Did you consider that approach? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [DPDK Upcalls 07/11] xlate: Handle VLAN splinters in xlate_actions().
Basically I want the packet passed into the upcall callback to be "const" so I have some guarantee that it's not modified in a way that will confuse the datapath. Unlike in the standard linux datapath, we're dealing with the actual packet, not a copy, so modifications could have some rather surprising results. Perhaps a compromise solution would be to pull the VLAN splinters handling out of xlate_recieve(), and only call it in the dpif-linux case, not in the dpif-netdev callback case. Thoughts? Ethan On Tue, Aug 5, 2014 at 1:31 PM, Ben Pfaff wrote: > On Fri, Aug 01, 2014 at 06:39:18PM -0700, Ethan Jackson wrote: >> I find this quite a bit simpler to reason about. Also, future patches >> require xlate_actions() not to modify the packet. >> >> Signed-off-by: Ethan Jackson > > This actually make xlate_actions() modify the packet more than before, > although it does change it back. Did you mean that future patches > require xlate_receive() not to modify the packet? > > The reason that we do this in xlate_receive() is basically the > end-to-end principle: if we add the VLAN header as early as possible, > then it means that nothing else in userspace has to know that the > packet and the flow aren't really correct. If we do it later on, we > have to audit everything between xlate_receive() and xlate_actions() > to make sure that it doesn't depend on vlan_tci or whether there's a > VLAN header in the packet. That's hard to do (did you do it?), and if > you get it wrong in a corner case we probably won't find out for a > long time because very few installations need the VLAN splinters > feature. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH RFC v2] lacp: Prefer slaves with running partner when selecting lead
Based on my (long ago) reading of the LACP spec, only supporting a single aggregator is a valid configuration. Furthermore, it's what makes the most sense given the structure of the OVS bonding configuration. I'd really rather not make a non standard change to the protocol to support a buggy upstream mlag implementation cause I don't know how it could affect other less buggy switches. My preferences is to shelve this for now FWIW. Ethan On Tue, Aug 5, 2014 at 2:16 PM, Flavio Leitner wrote: > On Mon, Aug 04, 2014 at 12:08:48PM -0700, Andy Zhou wrote: >> Zoltan, >> >> Sorry it took a while to get back to you. I am just coming up to >> speed on OVS LACP implementation, so my understanding may not be >> correct. Please feel free to point them out If I am wrong. >> >> According to wikipeida MC-LAG entry, there is no standard for it, they >> are mostly designed and implemented by vendors. >> >> After reading through the commit message, and comparing with the >> 802.1AX spec, I feel this seems like there is a bug in the MC-LAG >> implementation/configuration issue. When the partner on port A comes >> back again, should it wait for MC-LAG sync before using the default >> profile to exchange states with OVS? > > I agree that it sounds like a problem in the MC-LAG. However, I also > agree that OVS could do better. > > The aggregation selection policy is somewhat a gray area not defined > in any spec. The bonding driver offers ad_select= parameter which > allows to switch to the new aggregator only if, for instance, all the > ports are down in an active aggregator. > > The Team driver implementing 802.3ad also provides the policy selection > parameter. The default is to consider the prio in the LACPDU, but you > can also tell to not select any other aggregator if the current one is > still usable, or per bandwidth or per number of ports available. > > My suggestion if we want to change something is to stick with bonding > driver default behavior regarding to select a new aggregator: > """ > table or 0 > > The active aggregator is chosen by largest aggregate > bandwidth. > > Reselection of the active aggregator occurs only when all > slaves of the active aggregator are down or the active > aggregator has no slaves. > > This is the default value. > """ > Documentation/networking/bonding.txt > > That would avoid problems with transient states like the reported one. > > fbl > >> On Mon, Jul 14, 2014 at 3:11 PM, Ben Pfaff wrote: >> > On Tue, Jul 08, 2014 at 05:35:57PM +0100, Zoltan Kiss wrote: >> >> This patch modifies the LACP selection logic by prefering a slaves with >> >> up and >> >> running partners when looking for a lead. >> >> That fixes the following scenario: >> >> - bond has 2 ports, A and B, their other ends are in separate chassis with >> >> MC-LAG sync >> >> - the partner of port A is restarted >> >> - port B is still working >> >> - the partner on port A comes back, but temporarily it is using a default >> >> config, as MC-LAG haven't synced yet >> >> - apparently that default config has a sys_priority which is smaller than >> >> the >> >> other, still running port, plus completely different sys_id >> >> - therefore OVS choose port A despite it won't ever comes up into >> >> collecting-distributing state >> >> - and port B is disabled, causing the whole bond goes down >> >> >> >> Checking through the 802.1ax standard, when port A comes up again, the two >> >> links fall apart due to the different LAG IDs. They should be attached to >> >> different Aggregators, and the Aggregators should live separately. In OVS >> >> there >> >> is no such concept as Aggregator, but I think it should be said that it >> >> has only >> >> one Aggregator, and it has an unique policy to choose which ports can >> >> join. >> >> Although changing the chassis' default config can also fix this, detecting >> >> such problems quite hard, therefore I think it is still valid to improve >> >> things >> >> in OVS side. >> >> Btw. the Linux kernel bonding drivers' LACP implementation allows more >> >> aggregators, and therefore it could handle this situation properly. >> >> >> >> Signed-off-by: Zoltan Kiss >> > >> > I verified that the unit tests still pass with this applied. >> > >> > Andy Zhou said he'd review the patch. >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev >> > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [DPDK Upcalls v2 1/3] flow: Parse MPLS should return the actual number of labels.
This problem is uncovered by a future patch. Signed-off-by: Ethan Jackson --- lib/flow.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/flow.c b/lib/flow.c index 5e04015..64f7ba7 100644 --- a/lib/flow.c +++ b/lib/flow.c @@ -210,7 +210,7 @@ parse_mpls(void **datap, size_t *sizep) break; } } -return MAX(count, FLOW_MAX_MPLS_LABELS); +return MIN(count, FLOW_MAX_MPLS_LABELS); } static inline ovs_be16 -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [DPDK Upcalls v2 2/3] ofproto: Reorganize in preparation for direct dpdk upcalls.
This patch reorganizes ofproto-dpif in preparation for future patches which allow direct upcall processing from dpif-netdev. The main goals are to share as much code as possible between the dpif-linux and dpif-netdev upcall paths. Additionally, to avoid confusing the dpif-netdev fast path, the packet processing path should treat packets and struct flow's as const. Signed-off-by: Ethan Jackson --- ofproto/ofproto-dpif-ipfix.c | 8 +- ofproto/ofproto-dpif-ipfix.h | 4 +- ofproto/ofproto-dpif-sflow.c | 2 +- ofproto/ofproto-dpif-sflow.h | 6 +- ofproto/ofproto-dpif-upcall.c | 550 -- ofproto/ofproto-dpif-upcall.h | 3 - ofproto/ofproto-dpif-xlate.c | 132 +- ofproto/ofproto-dpif-xlate.h | 17 +- ofproto/ofproto-dpif.c| 60 +++-- ofproto/ofproto-dpif.h| 3 +- 10 files changed, 433 insertions(+), 352 deletions(-) diff --git a/ofproto/ofproto-dpif-ipfix.c b/ofproto/ofproto-dpif-ipfix.c index 1584c25..ff498d0 100644 --- a/ofproto/ofproto-dpif-ipfix.c +++ b/ofproto/ofproto-dpif-ipfix.c @@ -1019,7 +1019,7 @@ ipfix_cache_update(struct dpif_ipfix_exporter *exporter, static void ipfix_cache_entry_init(struct ipfix_flow_cache_entry *entry, - struct ofpbuf *packet, const struct flow *flow, + const struct ofpbuf *packet, const struct flow *flow, uint64_t packet_delta_count, uint32_t obs_domain_id, uint32_t obs_point_id) { @@ -1284,7 +1284,7 @@ ipfix_send_data_msg(struct dpif_ipfix_exporter *exporter, static void dpif_ipfix_sample(struct dpif_ipfix_exporter *exporter, - struct ofpbuf *packet, const struct flow *flow, + const struct ofpbuf *packet, const struct flow *flow, uint64_t packet_delta_count, uint32_t obs_domain_id, uint32_t obs_point_id) { @@ -1298,7 +1298,7 @@ dpif_ipfix_sample(struct dpif_ipfix_exporter *exporter, } void -dpif_ipfix_bridge_sample(struct dpif_ipfix *di, struct ofpbuf *packet, +dpif_ipfix_bridge_sample(struct dpif_ipfix *di, const struct ofpbuf *packet, const struct flow *flow) OVS_EXCLUDED(mutex) { uint64_t packet_delta_count; @@ -1315,7 +1315,7 @@ dpif_ipfix_bridge_sample(struct dpif_ipfix *di, struct ofpbuf *packet, } void -dpif_ipfix_flow_sample(struct dpif_ipfix *di, struct ofpbuf *packet, +dpif_ipfix_flow_sample(struct dpif_ipfix *di, const struct ofpbuf *packet, const struct flow *flow, uint32_t collector_set_id, uint16_t probability, uint32_t obs_domain_id, uint32_t obs_point_id) OVS_EXCLUDED(mutex) diff --git a/ofproto/ofproto-dpif-ipfix.h b/ofproto/ofproto-dpif-ipfix.h index 6ebf8b0..9de17ab 100644 --- a/ofproto/ofproto-dpif-ipfix.h +++ b/ofproto/ofproto-dpif-ipfix.h @@ -35,9 +35,9 @@ void dpif_ipfix_set_options( const struct ofproto_ipfix_bridge_exporter_options *, const struct ofproto_ipfix_flow_exporter_options *, size_t); -void dpif_ipfix_bridge_sample(struct dpif_ipfix *, struct ofpbuf *, +void dpif_ipfix_bridge_sample(struct dpif_ipfix *, const struct ofpbuf *, const struct flow *); -void dpif_ipfix_flow_sample(struct dpif_ipfix *, struct ofpbuf *, +void dpif_ipfix_flow_sample(struct dpif_ipfix *, const struct ofpbuf *, const struct flow *, uint32_t, uint16_t, uint32_t, uint32_t); diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c index 69dfc03..32cf51b 100644 --- a/ofproto/ofproto-dpif-sflow.c +++ b/ofproto/ofproto-dpif-sflow.c @@ -563,7 +563,7 @@ dpif_sflow_odp_port_to_ifindex(const struct dpif_sflow *ds, } void -dpif_sflow_received(struct dpif_sflow *ds, struct ofpbuf *packet, +dpif_sflow_received(struct dpif_sflow *ds, const struct ofpbuf *packet, const struct flow *flow, odp_port_t odp_in_port, const union user_action_cookie *cookie) OVS_EXCLUDED(mutex) diff --git a/ofproto/ofproto-dpif-sflow.h b/ofproto/ofproto-dpif-sflow.h index d53c95c..130568a 100644 --- a/ofproto/ofproto-dpif-sflow.h +++ b/ofproto/ofproto-dpif-sflow.h @@ -46,10 +46,8 @@ void dpif_sflow_del_port(struct dpif_sflow *, odp_port_t odp_port); void dpif_sflow_run(struct dpif_sflow *); void dpif_sflow_wait(struct dpif_sflow *); -void dpif_sflow_received(struct dpif_sflow *, - struct ofpbuf *, - const struct flow *, - odp_port_t odp_port, +void dpif_sflow_received(struct dpif_sflow *, const struct ofpbuf *, + const struct flow *, odp_port_t odp_port, const union user_action_cookie *); int dpif_sflow_odp_port_to_ifindex(const struct dpif_sflow *, diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 66728e6..bf
[ovs-dev] [DPDK Upcalls v2 3/3] dpif-netdev: Streamline miss handling.
This patch avoids the relatively inefficient miss handling processes dictated by the dpif process, by calling into ofproto-dpif directly through a callback. Signed-off-by: Ethan Jackson --- lib/dpif-netdev.c | 299 -- lib/dpif-provider.h | 5 +- lib/dpif.c| 4 +- lib/dpif.h| 18 ++- ofproto/ofproto-dpif-upcall.c | 109 ++- tests/dpif-netdev.at | 10 +- tests/ofproto-dpif.at | 54 7 files changed, 262 insertions(+), 237 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index c637d9f..d4a30bb 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -85,14 +85,7 @@ static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER; static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex) = SHASH_INITIALIZER(&dp_netdevs); -struct dp_netdev_queue { -unsigned int packet_count; - -struct dpif_upcall upcalls[NETDEV_MAX_RX_BATCH]; -struct ofpbuf bufs[NETDEV_MAX_RX_BATCH]; -}; - -#define DP_NETDEV_QUEUE_INITIALIZER { .packet_count = 0 } +static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600); /* Datapath based on the network device interface from netdev.h. * @@ -140,7 +133,8 @@ struct dp_netdev { /* Protects access to ofproto-dpif-upcall interface during revalidator * thread synchronization. */ struct fat_rwlock upcall_rwlock; -exec_upcall_cb *upcall_cb; /* Callback function for executing upcalls. */ +upcall_callback *upcall_cb; /* Callback function for executing upcalls. */ +void *upcall_aux; /* Forwarding threads. */ struct latch exit_latch; @@ -320,12 +314,6 @@ static void do_del_port(struct dp_netdev *dp, struct dp_netdev_port *) OVS_REQUIRES(dp->port_mutex); static int dpif_netdev_open(const struct dpif_class *, const char *name, bool create, struct dpif **); -static int dp_netdev_queue_userspace_packet(struct dp_netdev_queue *, -struct ofpbuf *, int type, -const struct miniflow *, -const struct nlattr *); -static void dp_netdev_execute_userspace_queue(struct dp_netdev_queue *, - struct dp_netdev *); static void dp_netdev_execute_actions(struct dp_netdev *dp, struct dpif_packet **, int c, bool may_steal, struct pkt_metadata *, @@ -474,6 +462,7 @@ create_dp_netdev(const char *name, const struct dpif_class *class, /* Disable upcalls by default. */ dp_netdev_disable_upcall(dp); +dp->upcall_aux = NULL; dp->upcall_cb = NULL; ovs_mutex_lock(&dp->port_mutex); @@ -1232,6 +1221,35 @@ dp_netdev_flow_add(struct dp_netdev *dp, struct match *match, classifier_insert(&dp->cls, CONST_CAST(struct cls_rule *, &netdev_flow->cr)); +if (OVS_UNLIKELY(VLOG_IS_DBG_ENABLED())) { +struct ds ds = DS_EMPTY_INITIALIZER; +struct ofpbuf key, mask; + +ofpbuf_init(&key, 0); +ofpbuf_init(&mask, 0); + +ds_put_cstr(&ds, "flow_add: "); + +odp_flow_key_from_flow(&key, &match->flow, &match->wc.masks, + match->flow.in_port.odp_port, true); + +odp_flow_key_from_mask(&mask, &match->wc.masks, &match->flow, + odp_to_u32(match->wc.masks.in_port.odp_port), + SIZE_MAX, true); + +odp_flow_format(ofpbuf_data(&key), ofpbuf_size(&key), +ofpbuf_data(&mask), ofpbuf_size(&mask), NULL, &ds, +true); + +ds_put_cstr(&ds, ", actions:"); +format_odp_actions(&ds, actions, actions_len); +VLOG_DBG("%s", ds_cstr(&ds)); + +ds_destroy(&ds); +ofpbuf_uninit(&key); +ofpbuf_uninit(&mask); +} + return 0; } @@ -1822,6 +1840,44 @@ dp_netdev_count_packet(struct dp_netdev *dp, enum dp_stat_type type, int cnt) ovs_mutex_unlock(&bucket->mutex); } +static int +dp_netdev_upcall(struct dp_netdev *dp, struct dpif_packet *packet_, + struct flow *flow, struct flow_wildcards *wc, + enum dpif_upcall_type type, const struct nlattr *userdata, + struct ofpbuf *actions, struct ofpbuf *put_actions) +{ +struct ofpbuf *packet = &packet_->ofpbuf; + +if (type == DPIF_UC_MISS) { +dp_netdev_count_packet(dp, DP_STAT_MISS, 1); +} + +if (OVS_UNLIKELY(!VLOG_DROP_DBG(&upcall_rl))) { +struct ds ds = DS_EMPTY_INITIALIZER; +struct ofpbuf key; +char *packe
Re: [ovs-dev] [PATCH 1/2] bfd: Flip the default value of bfd ip source and destination.
Acked-by: Ethan Jackson On Fri, Aug 8, 2014 at 5:02 PM, Alex Wang wrote: > This commit flips the default value of bfd ip source and destination, > so that they match the default value of ip destination and source > of vtep schema. > > Signed-off-by: Alex Wang > --- > lib/bfd.c|4 ++-- > vswitchd/vswitch.xml |4 ++-- > 2 files changed, 4 insertions(+), 4 deletions(-) > > diff --git a/lib/bfd.c b/lib/bfd.c > index 892dfe8..9069582 100644 > --- a/lib/bfd.c > +++ b/lib/bfd.c > @@ -453,14 +453,14 @@ bfd_configure(struct bfd *bfd, const char *name, const > struct smap *cfg, > if (ip_src && bfd_lookup_ip(ip_src, &in_addr)) { > memcpy(&bfd->ip_src, &in_addr, sizeof in_addr); > } else { > -bfd->ip_src = htonl(0xA9FE0100); /* 169.254.1.0. */ > +bfd->ip_src = htonl(0xA9FE0101); /* 169.254.1.1. */ > } > > ip_dst = smap_get(cfg, "bfd_dst_ip"); > if (ip_dst && bfd_lookup_ip(ip_dst, &in_addr)) { > memcpy(&bfd->ip_dst, &in_addr, sizeof in_addr); > } else { > -bfd->ip_dst = htonl(0xA9FE0101); /* 169.254.1.1. */ > +bfd->ip_dst = htonl(0xA9FE0100); /* 169.254.1.0. */ > } > > forwarding_if_rx = smap_get_bool(cfg, "forwarding_if_rx", false); > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index d47fc1a..bff8fb6 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -2091,12 +2091,12 @@ > > >Set to an IPv4 address to set the IP address used as source for > - transmitted BFD packets. The default is 169.254.1.0. > + transmitted BFD packets. The default is 169.254.1.1. > > > >Set to an IPv4 address to set the IP address used as destination > - for transmitted BFD packets. The default is > 169.254.1.1. > + for transmitted BFD packets. The default is > 169.254.1.0. > > > > -- > 1.7.9.5 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] bfd: Allow users to set local/remote src/dst MAC address.
For the local eth_src and eth_dst, instead of having the various is set booleans, wouldn't it be simpler to just copy the default value into the struct bfd in bfd_configure() instead of doing it in bfd_put_packet()? I'm not sure we should have the rmt_eth_src flag. We should probably accept bfd packets with the appropriate eth_dst no matter what the eth_src is. Ethan On Fri, Aug 8, 2014 at 5:02 PM, Alex Wang wrote: > This commit adds four options for configuring the MAC addresses > in BFD state machine. Therein, the "bfd_local_src_mac" and > "bfd_local_dst_mac" configure the MAC address of sent BFD > packets. The "bfd_remote_src_mac" and "bfd_remote_dst_mac" > configure the matching of MAC address on recevied BFD packets. > > Signed-off-by: Alex Wang > --- > lib/bfd.c| 60 > +++--- > vswitchd/vswitch.xml | 32 +++ > 2 files changed, 75 insertions(+), 17 deletions(-) > > diff --git a/lib/bfd.c b/lib/bfd.c > index 9069582..165eb61 100644 > --- a/lib/bfd.c > +++ b/lib/bfd.c > @@ -169,8 +169,15 @@ struct bfd { > > uint32_t rmt_disc;/* bfd.RemoteDiscr. */ > > -uint8_t eth_dst[ETH_ADDR_LEN];/* Ethernet destination address. */ > -bool eth_dst_set; /* 'eth_dst' set through database. */ > +uint8_t local_eth_src[ETH_ADDR_LEN]; /* Local eth src address. */ > +bool local_eth_src_set; /* Local 'eth_src' set through db. > */ > +uint8_t local_eth_dst[ETH_ADDR_LEN]; /* Local eth dst address. */ > +bool local_eth_dst_set; /* Local 'eth_dst' set through db. > */ > + > +uint8_t rmt_eth_src[ETH_ADDR_LEN]; /* Remote eth src address. */ > +bool rmt_eth_src_set;/* Remote 'eth_src' set through db. > */ > +uint8_t rmt_eth_dst[ETH_ADDR_LEN]; /* Remote eth dst address. */ > +bool rmt_eth_dst_set;/* Remote 'eth_dst' set through db. > */ > > ovs_be32 ip_src; /* IPv4 source address. */ > ovs_be32 ip_dst; /* IPv4 destination address. */ > @@ -388,8 +395,6 @@ bfd_configure(struct bfd *bfd, const char *name, const > struct smap *cfg, > > bfd_set_state(bfd, STATE_DOWN, DIAG_NONE); > > -memcpy(bfd->eth_dst, eth_addr_bfd, ETH_ADDR_LEN); > - > bfd_status_changed(bfd); > } > > @@ -440,13 +445,36 @@ bfd_configure(struct bfd *bfd, const char *name, const > struct smap *cfg, > need_poll = true; > } > > -hwaddr = smap_get(cfg, "bfd_dst_mac"); > +hwaddr = smap_get(cfg, "bfd_local_src_mac"); > +if (hwaddr && eth_addr_from_string(hwaddr, ea) && !eth_addr_is_zero(ea)) > { > +memcpy(bfd->local_eth_src, ea, ETH_ADDR_LEN); > +bfd->local_eth_src_set = true; > +} else { > +bfd->local_eth_src_set = false; > +} > + > +hwaddr = smap_get(cfg, "bfd_local_dst_mac"); > if (hwaddr && eth_addr_from_string(hwaddr, ea) && !eth_addr_is_zero(ea)) > { > -memcpy(bfd->eth_dst, ea, ETH_ADDR_LEN); > -bfd->eth_dst_set = true; > -} else if (bfd->eth_dst_set) { > -memcpy(bfd->eth_dst, eth_addr_bfd, ETH_ADDR_LEN); > -bfd->eth_dst_set = false; > +memcpy(bfd->local_eth_dst, ea, ETH_ADDR_LEN); > +bfd->local_eth_dst_set = true; > +} else { > +bfd->local_eth_dst_set = false; > +} > + > +hwaddr = smap_get(cfg, "bfd_remote_src_mac"); > +if (hwaddr && eth_addr_from_string(hwaddr, ea) && !eth_addr_is_zero(ea)) > { > +memcpy(bfd->rmt_eth_src, ea, ETH_ADDR_LEN); > +bfd->rmt_eth_src_set = true; > +} else { > +bfd->rmt_eth_src_set = false; > +} > + > +hwaddr = smap_get(cfg, "bfd_remote_dst_mac"); > +if (hwaddr && eth_addr_from_string(hwaddr, ea) && !eth_addr_is_zero(ea)) > { > +memcpy(bfd->rmt_eth_dst, ea, ETH_ADDR_LEN); > +bfd->rmt_eth_dst_set = true; > +} else { > +bfd->rmt_eth_dst_set = false; > } > > ip_src = smap_get(cfg, "bfd_src_ip"); > @@ -600,8 +628,10 @@ bfd_put_packet(struct bfd *bfd, struct ofpbuf *p, > > ofpbuf_reserve(p, 2); /* Properly align after the ethernet header. */ > eth = ofpbuf_put_uninit(p, sizeof *eth); > -memcpy(eth->eth_src, eth_src, ETH_ADDR_LEN); > -memcpy(eth->eth_dst, bfd->eth_dst, ETH_ADDR_LEN); > +memcpy(eth->eth_src, bfd->local_eth_src_set ? bfd->local_eth_src : > eth_src, > + ETH_ADDR_LEN); > +memcpy(eth->eth_dst, bfd->local_eth_dst_set ? bfd->local_eth_dst > +: eth_addr_bfd, > ETH_ADDR_LEN); > eth->eth_type = htons(ETH_TYPE_IP); > > ip = ofpbuf_put_zeros(p, sizeof *ip); > @@ -656,8 +686,12 @@ bfd_should_process_flow(const struct bfd *bfd_, const > struct flow *flow, > struct bfd *bfd = CONST_CAST(struct bfd *, bfd_); > bool check_tnl_key; > > +memset(&wc->masks.dl_src, 0xff, sizeof wc->masks.dl_src); > memset(&wc->masks.dl_dst,
Re: [ovs-dev] [PATCH RFC v2] lacp: Prefer slaves with running partner when selecting lead
In OVS we only support a single aggregator per port. I understand that Linux supports multiple, and that perhaps this problem would be easier to work around if we did as well. But today we don't, and IMO implementing this feature would be rather complex. Of course, an implementation would be welcome. The spec is quite clear about which slaves should enabled within a particular LACP aggregator. I'm not saying that the spec should be followed in all cases, but the bar for making non standard changes should be extremely high. Basically in order to merge something like this, I would like evidence that either my interpretation of the spec is incorrect, or that some major LACP implementation (i.e. cisco) has similar non standard behavior. Protocol changes like this can have significant unforeseen consequences which is driving my inclination to be conservative in this case. I understand that this change would be a convenient workaround to a buggy m-lag implementation. However, based on my reading of this thread so far, I'm not totally convinced that it's prudent. I could be convinced, but I'm not quite there yet. I think it's best to shelve it for now. Ethan On Mon, Aug 11, 2014 at 6:09 PM, Flavio Leitner wrote: > On Mon, Aug 11, 2014 at 05:47:48PM +0100, Zoltan Kiss wrote: >> On 08/08/14 02:38, Flavio Leitner wrote: >> >On Thu, Aug 07, 2014 at 04:19:17PM +0100, Zoltan Kiss wrote: >> >>On 05/08/14 22:16, Flavio Leitner wrote: >> >>>On Mon, Aug 04, 2014 at 12:08:48PM -0700, Andy Zhou wrote: >> Zoltan, >> >> Sorry it took a while to get back to you. I am just coming up to >> speed on OVS LACP implementation, so my understanding may not be >> correct. Please feel free to point them out If I am wrong. >> >> According to wikipeida MC-LAG entry, there is no standard for it, they >> are mostly designed and implemented by vendors. >> >> After reading through the commit message, and comparing with the >> 802.1AX spec, I feel this seems like there is a bug in the MC-LAG >> implementation/configuration issue. When the partner on port A comes >> back again, should it wait for MC-LAG sync before using the default >> profile to exchange states with OVS? >> >>> >> >>>I agree that it sounds like a problem in the MC-LAG. However, I also >> >>>agree that OVS could do better. >> >>> >> >>>The aggregation selection policy is somewhat a gray area not defined >> >>>in any spec. The bonding driver offers ad_select= parameter which >> >>>allows to switch to the new aggregator only if, for instance, all the >> >>>ports are down in an active aggregator. >> >>> >> >>>The Team driver implementing 802.3ad also provides the policy selection >> >>>parameter. The default is to consider the prio in the LACPDU, but you >> >>>can also tell to not select any other aggregator if the current one is >> >>>still usable, or per bandwidth or per number of ports available. >> >>> >> >>>My suggestion if we want to change something is to stick with bonding >> >>>driver default behavior regarding to select a new aggregator: >> >>>""" >> >>>table or 0 >> >>> >> >>> The active aggregator is chosen by largest aggregate >> >>> bandwidth. >> >>> >> >>> Reselection of the active aggregator occurs only when all >> >>> slaves of the active aggregator are down or the active >> >>> aggregator has no slaves. >> >>> >> >>> This is the default value. >> >>>""" >> >>>Documentation/networking/bonding.txt >> >>As far as I understood, there isn't really a concept of aggregator in OVS, >> >>but "lead" in lacp_update_attached() is something similar. However it is >> >>recalculated at every iteration, and it is simply just the slave with the >> >>lower priority. >> > >> >Yes, that's my understanding as well. Rhe slave with lower priority >> >carries the information about the partner which then is used to >> >compare with other slaves. The ones matching with lead's partner info >> >are in the same trunk, so they are attached, otherwise it's either not >> >established or is in another trunk (not attached). >> > >> >>Do you mean we should save "lead" into a new field in struct lacp, and >> >>select a new one only if lacp_partner_ready(lead) == false? >> > >> >I think it will require a bit more than that because if the "lead" >> >slave fails for some reason, the slave is not CURRENT anymore but the >> >LACP might be still valid because of another slave in CURRENT state, so >> >the "lead" might need to be replaced by another slave in the same trunk >> >to keep it up. Otherwise, the failing "lead" might receive another >> >partner info (the default config in your case) and that would break >> >the established trunk. >> In my scenario one of the slaves reboot, so it becomes DEFAULTED, and the >> another one carry on, and becomes lead. And when this rebooting slave comes >> back with the default config, it becomes CURRENT and then lead immediately >> because it's priority is lower. However its par
Re: [ovs-dev] [DPDK Upcalls v2 2/3] ofproto: Reorganize in preparation for direct dpdk upcalls.
> In ofproto-dpif-upcall.c, struct upcall seriously needs comments on > the members. Some of them are baffling at first glance (put_actions, > userdata, vsp_adjusted). Done. > I'm not sure of the value of the new 'put_actions' member. It only > gets used in one place. That place can't compose_slow_path() itself? > It did before this commit. So in the final patch, dpif-netdev will use the put_actions to figure exactly what to install for slow path actions. An alternative would be for the upcall callback to somehow indicate that slow path actions are required, and that it should do the conversion itself. Do you think that might be cleaner? > There is a new distinction between MISS_UPCALL and SLOW_UPCALL, but > they are handled identically. Why distinguish? Leftover from old patch versions. I've changed it. > recv_upcalls() and exec_upcalls() have a lot of code in common. Can > recv_upcalls() use exec_upcalls() as a helper? Yep, this is just an intermediate issue, the very next patch fixes this. I'd prefer to leave it if that's alright as it makes the diffs a bit cleaner. > struct upcall is quite large (about 600 bytes) because of struct > xlate_out (about 500 bytes). That makes the memset() at the beginning > of upcall_receive() quite expensive. I think that it is unnecessary, > because about half of the space in the xlate_out is an ofpbuf stub, > and in any case xlate_actions() will initialize everything necessary > in the xlate_out. Done > s/datpath/datapath/ in the comment on xlate_receive(). Done ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [DPDK Upcalls v2 3/3] dpif-netdev: Streamline miss handling.
> In ofproto-dpif-upcall.c, there are two blank lines above upcall_cb > (horrors!). I'm experimenting with a new form of avant-garde post-post-post-modern dadaist syntax formatting. Only those with the most discerning taste could possibly comprehend its momentous implications. > Did you test with megaflows disabled, by the way? I am not sure > whether all-1-bits is an appropriate wildcard mask. There's a unit test which catches it. Everything else made sense so I change the code accordingly. Ethan ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [DPDK Upcalls v3 2/3] ofproto: Reorganize in preparation for direct dpdk upcalls.
This patch reorganizes ofproto-dpif in preparation for future patches which allow direct upcall processing from dpif-netdev. The main goals are to share as much code as possible between the dpif-linux and dpif-netdev upcall paths. Additionally, to avoid confusing the dpif-netdev fast path, the packet processing path should treat packets and struct flow's as const. Signed-off-by: Ethan Jackson --- ofproto/ofproto-dpif-ipfix.c | 8 +- ofproto/ofproto-dpif-ipfix.h | 4 +- ofproto/ofproto-dpif-sflow.c | 2 +- ofproto/ofproto-dpif-sflow.h | 6 +- ofproto/ofproto-dpif-upcall.c | 555 -- ofproto/ofproto-dpif-upcall.h | 3 - ofproto/ofproto-dpif-xlate.c | 132 +- ofproto/ofproto-dpif-xlate.h | 17 +- ofproto/ofproto-dpif.c| 60 +++-- ofproto/ofproto-dpif.h| 3 +- 10 files changed, 437 insertions(+), 353 deletions(-) diff --git a/ofproto/ofproto-dpif-ipfix.c b/ofproto/ofproto-dpif-ipfix.c index 1584c25..ff498d0 100644 --- a/ofproto/ofproto-dpif-ipfix.c +++ b/ofproto/ofproto-dpif-ipfix.c @@ -1019,7 +1019,7 @@ ipfix_cache_update(struct dpif_ipfix_exporter *exporter, static void ipfix_cache_entry_init(struct ipfix_flow_cache_entry *entry, - struct ofpbuf *packet, const struct flow *flow, + const struct ofpbuf *packet, const struct flow *flow, uint64_t packet_delta_count, uint32_t obs_domain_id, uint32_t obs_point_id) { @@ -1284,7 +1284,7 @@ ipfix_send_data_msg(struct dpif_ipfix_exporter *exporter, static void dpif_ipfix_sample(struct dpif_ipfix_exporter *exporter, - struct ofpbuf *packet, const struct flow *flow, + const struct ofpbuf *packet, const struct flow *flow, uint64_t packet_delta_count, uint32_t obs_domain_id, uint32_t obs_point_id) { @@ -1298,7 +1298,7 @@ dpif_ipfix_sample(struct dpif_ipfix_exporter *exporter, } void -dpif_ipfix_bridge_sample(struct dpif_ipfix *di, struct ofpbuf *packet, +dpif_ipfix_bridge_sample(struct dpif_ipfix *di, const struct ofpbuf *packet, const struct flow *flow) OVS_EXCLUDED(mutex) { uint64_t packet_delta_count; @@ -1315,7 +1315,7 @@ dpif_ipfix_bridge_sample(struct dpif_ipfix *di, struct ofpbuf *packet, } void -dpif_ipfix_flow_sample(struct dpif_ipfix *di, struct ofpbuf *packet, +dpif_ipfix_flow_sample(struct dpif_ipfix *di, const struct ofpbuf *packet, const struct flow *flow, uint32_t collector_set_id, uint16_t probability, uint32_t obs_domain_id, uint32_t obs_point_id) OVS_EXCLUDED(mutex) diff --git a/ofproto/ofproto-dpif-ipfix.h b/ofproto/ofproto-dpif-ipfix.h index 6ebf8b0..9de17ab 100644 --- a/ofproto/ofproto-dpif-ipfix.h +++ b/ofproto/ofproto-dpif-ipfix.h @@ -35,9 +35,9 @@ void dpif_ipfix_set_options( const struct ofproto_ipfix_bridge_exporter_options *, const struct ofproto_ipfix_flow_exporter_options *, size_t); -void dpif_ipfix_bridge_sample(struct dpif_ipfix *, struct ofpbuf *, +void dpif_ipfix_bridge_sample(struct dpif_ipfix *, const struct ofpbuf *, const struct flow *); -void dpif_ipfix_flow_sample(struct dpif_ipfix *, struct ofpbuf *, +void dpif_ipfix_flow_sample(struct dpif_ipfix *, const struct ofpbuf *, const struct flow *, uint32_t, uint16_t, uint32_t, uint32_t); diff --git a/ofproto/ofproto-dpif-sflow.c b/ofproto/ofproto-dpif-sflow.c index 69dfc03..32cf51b 100644 --- a/ofproto/ofproto-dpif-sflow.c +++ b/ofproto/ofproto-dpif-sflow.c @@ -563,7 +563,7 @@ dpif_sflow_odp_port_to_ifindex(const struct dpif_sflow *ds, } void -dpif_sflow_received(struct dpif_sflow *ds, struct ofpbuf *packet, +dpif_sflow_received(struct dpif_sflow *ds, const struct ofpbuf *packet, const struct flow *flow, odp_port_t odp_in_port, const union user_action_cookie *cookie) OVS_EXCLUDED(mutex) diff --git a/ofproto/ofproto-dpif-sflow.h b/ofproto/ofproto-dpif-sflow.h index d53c95c..130568a 100644 --- a/ofproto/ofproto-dpif-sflow.h +++ b/ofproto/ofproto-dpif-sflow.h @@ -46,10 +46,8 @@ void dpif_sflow_del_port(struct dpif_sflow *, odp_port_t odp_port); void dpif_sflow_run(struct dpif_sflow *); void dpif_sflow_wait(struct dpif_sflow *); -void dpif_sflow_received(struct dpif_sflow *, - struct ofpbuf *, - const struct flow *, - odp_port_t odp_port, +void dpif_sflow_received(struct dpif_sflow *, const struct ofpbuf *, + const struct flow *, odp_port_t odp_port, const union user_action_cookie *); int dpif_sflow_odp_port_to_ifindex(const struct dpif_sflow *, diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 66728e6..dd
[ovs-dev] [DPDK Upcalls v3 3/3] dpif-netdev: Streamline miss handling.
This patch avoids the relatively inefficient miss handling processes dictated by the dpif process, by calling into ofproto-dpif directly through a callback. Signed-off-by: Ethan Jackson --- lib/dpif-netdev.c | 293 +++--- lib/dpif-provider.h | 5 +- lib/dpif.c| 4 +- lib/dpif.h| 35 - ofproto/ofproto-dpif-upcall.c | 107 ++- tests/dpif-netdev.at | 24 ++-- tests/ofproto-dpif.at | 168 7 files changed, 335 insertions(+), 301 deletions(-) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 0993b0c..56f56ce 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -85,14 +85,7 @@ static struct ovs_mutex dp_netdev_mutex = OVS_MUTEX_INITIALIZER; static struct shash dp_netdevs OVS_GUARDED_BY(dp_netdev_mutex) = SHASH_INITIALIZER(&dp_netdevs); -struct dp_netdev_queue { -unsigned int packet_count; - -struct dpif_upcall upcalls[NETDEV_MAX_RX_BATCH]; -struct ofpbuf bufs[NETDEV_MAX_RX_BATCH]; -}; - -#define DP_NETDEV_QUEUE_INITIALIZER { .packet_count = 0 } +static struct vlog_rate_limit upcall_rl = VLOG_RATE_LIMIT_INIT(600, 600); /* Datapath based on the network device interface from netdev.h. * @@ -140,7 +133,8 @@ struct dp_netdev { /* Protects access to ofproto-dpif-upcall interface during revalidator * thread synchronization. */ struct fat_rwlock upcall_rwlock; -exec_upcall_cb *upcall_cb; /* Callback function for executing upcalls. */ +upcall_callback *upcall_cb; /* Callback function for executing upcalls. */ +void *upcall_aux; /* Forwarding threads. */ struct latch exit_latch; @@ -324,12 +318,6 @@ static void do_del_port(struct dp_netdev *dp, struct dp_netdev_port *) OVS_REQUIRES(dp->port_mutex); static int dpif_netdev_open(const struct dpif_class *, const char *name, bool create, struct dpif **); -static int dp_netdev_queue_userspace_packet(struct dp_netdev_queue *, -struct ofpbuf *, int type, -const struct miniflow *, -const struct nlattr *); -static void dp_netdev_execute_userspace_queue(struct dp_netdev_queue *, - struct dp_netdev *); static void dp_netdev_execute_actions(struct dp_netdev *dp, struct dpif_packet **, int c, bool may_steal, struct pkt_metadata *, @@ -478,6 +466,7 @@ create_dp_netdev(const char *name, const struct dpif_class *class, /* Disable upcalls by default. */ dp_netdev_disable_upcall(dp); +dp->upcall_aux = NULL; dp->upcall_cb = NULL; ovs_mutex_lock(&dp->port_mutex); @@ -1246,6 +1235,19 @@ dp_netdev_flow_add(struct dp_netdev *dp, struct match *match, classifier_insert(&dp->cls, CONST_CAST(struct cls_rule *, &netdev_flow->cr)); +if (OVS_UNLIKELY(VLOG_IS_DBG_ENABLED())) { +struct ds ds = DS_EMPTY_INITIALIZER; + +ds_put_cstr(&ds, "flow_add: "); +match_format(match, &ds, OFP_DEFAULT_PRIORITY); +ds_put_cstr(&ds, ", actions:"); +format_odp_actions(&ds, actions, actions_len); + +VLOG_DBG("%s", ds_cstr(&ds)); + +ds_destroy(&ds); +} + return 0; } @@ -1860,6 +1862,48 @@ dp_netdev_count_packet(struct dp_netdev *dp, enum dp_stat_type type, int cnt) ovs_mutex_unlock(&bucket->mutex); } +static int +dp_netdev_upcall(struct dp_netdev *dp, struct dpif_packet *packet_, + struct flow *flow, struct flow_wildcards *wc, + enum dpif_upcall_type type, const struct nlattr *userdata, + struct ofpbuf *actions, struct ofpbuf *put_actions) +{ +struct ofpbuf *packet = &packet_->ofpbuf; + +if (type == DPIF_UC_MISS) { +dp_netdev_count_packet(dp, DP_STAT_MISS, 1); +} + +if (OVS_UNLIKELY(!dp->upcall_cb)) { +return ENODEV; +} + +if (OVS_UNLIKELY(!VLOG_DROP_DBG(&upcall_rl))) { +struct ds ds = DS_EMPTY_INITIALIZER; +struct ofpbuf key; +char *packet_str; + +ofpbuf_init(&key, 0); +odp_flow_key_from_flow(&key, flow, &wc->masks, flow->in_port.odp_port, + true); + +packet_str = ofp_packet_to_string(ofpbuf_data(packet), + ofpbuf_size(packet)); + +odp_flow_key_format(ofpbuf_data(&key), ofpbuf_size(&key), &ds); + +VLOG_DBG("%s: %s upcall:\n%s\n%s", dp->name, + dpif_upcall_type_to_string(type), ds_cstr(&ds), packet_str); + +ofpbuf_uninit(&key); +free(packet_str
[ovs-dev] [DPDK Upcalls v3 1/3] flow: Parse MPLS should return the actual number of labels.
This problem is uncovered by a future patch. Signed-off-by: Ethan Jackson Acked-by: Ben Pfaff --- lib/flow.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/flow.c b/lib/flow.c index 2e5ca0a..29b331e 100644 --- a/lib/flow.c +++ b/lib/flow.c @@ -210,7 +210,7 @@ parse_mpls(void **datap, size_t *sizep) break; } } -return MAX(count, FLOW_MAX_MPLS_LABELS); +return MIN(count, FLOW_MAX_MPLS_LABELS); } static inline ovs_be16 -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH V2] bfd: Allow users to set local/remote src/dst MAC address.
Acked-by: Ethan Jackson On Mon, Aug 11, 2014 at 6:34 PM, Alex Wang wrote: > This commit adds options for configuring the MAC addresses > in BFD state machine. Therein, the "bfd_local_src_mac" and > "bfd_local_dst_mac" configure the MAC address of sent BFD > packets. The "bfd_remote_dst_mac" configure the matching > of MAC address on recevied BFD packets. > > Signed-off-by: Alex Wang > > --- > PATCH -> V2: > - simplify the code. > --- > lib/bfd.c| 47 +-- > vswitchd/vswitch.xml | 23 +++ > 2 files changed, 52 insertions(+), 18 deletions(-) > > diff --git a/lib/bfd.c b/lib/bfd.c > index 9069582..5c86451 100644 > --- a/lib/bfd.c > +++ b/lib/bfd.c > @@ -169,8 +169,10 @@ struct bfd { > > uint32_t rmt_disc;/* bfd.RemoteDiscr. */ > > -uint8_t eth_dst[ETH_ADDR_LEN];/* Ethernet destination address. */ > -bool eth_dst_set; /* 'eth_dst' set through database. */ > +uint8_t local_eth_src[ETH_ADDR_LEN]; /* Local eth src address. */ > +uint8_t local_eth_dst[ETH_ADDR_LEN]; /* Local eth dst address. */ > + > +uint8_t rmt_eth_dst[ETH_ADDR_LEN]; /* Remote eth dst address. */ > > ovs_be32 ip_src; /* IPv4 source address. */ > ovs_be32 ip_dst; /* IPv4 destination address. */ > @@ -388,8 +390,6 @@ bfd_configure(struct bfd *bfd, const char *name, const > struct smap *cfg, > > bfd_set_state(bfd, STATE_DOWN, DIAG_NONE); > > -memcpy(bfd->eth_dst, eth_addr_bfd, ETH_ADDR_LEN); > - > bfd_status_changed(bfd); > } > > @@ -440,13 +440,25 @@ bfd_configure(struct bfd *bfd, const char *name, const > struct smap *cfg, > need_poll = true; > } > > -hwaddr = smap_get(cfg, "bfd_dst_mac"); > -if (hwaddr && eth_addr_from_string(hwaddr, ea) && !eth_addr_is_zero(ea)) > { > -memcpy(bfd->eth_dst, ea, ETH_ADDR_LEN); > -bfd->eth_dst_set = true; > -} else if (bfd->eth_dst_set) { > -memcpy(bfd->eth_dst, eth_addr_bfd, ETH_ADDR_LEN); > -bfd->eth_dst_set = false; > +hwaddr = smap_get(cfg, "bfd_local_src_mac"); > +if (hwaddr && eth_addr_from_string(hwaddr, ea)) { > +memcpy(bfd->local_eth_src, ea, ETH_ADDR_LEN); > +} else { > +memset(bfd->local_eth_src, 0, ETH_ADDR_LEN); > +} > + > +hwaddr = smap_get(cfg, "bfd_local_dst_mac"); > +if (hwaddr && eth_addr_from_string(hwaddr, ea)) { > +memcpy(bfd->local_eth_dst, ea, ETH_ADDR_LEN); > +} else { > +memset(bfd->local_eth_dst, 0, ETH_ADDR_LEN); > +} > + > +hwaddr = smap_get(cfg, "bfd_remote_dst_mac"); > +if (hwaddr && eth_addr_from_string(hwaddr, ea)) { > +memcpy(bfd->rmt_eth_dst, ea, ETH_ADDR_LEN); > +} else { > +memset(bfd->rmt_eth_dst, 0, ETH_ADDR_LEN); > } > > ip_src = smap_get(cfg, "bfd_src_ip"); > @@ -600,8 +612,14 @@ bfd_put_packet(struct bfd *bfd, struct ofpbuf *p, > > ofpbuf_reserve(p, 2); /* Properly align after the ethernet header. */ > eth = ofpbuf_put_uninit(p, sizeof *eth); > -memcpy(eth->eth_src, eth_src, ETH_ADDR_LEN); > -memcpy(eth->eth_dst, bfd->eth_dst, ETH_ADDR_LEN); > +memcpy(eth->eth_src, > + eth_addr_is_zero(bfd->local_eth_src) ? eth_src > +: bfd->local_eth_src, > + ETH_ADDR_LEN); > +memcpy(eth->eth_dst, > + eth_addr_is_zero(bfd->local_eth_dst) ? eth_addr_bfd > +: bfd->local_eth_dst, > + ETH_ADDR_LEN); > eth->eth_type = htons(ETH_TYPE_IP); > > ip = ofpbuf_put_zeros(p, sizeof *ip); > @@ -657,7 +675,8 @@ bfd_should_process_flow(const struct bfd *bfd_, const > struct flow *flow, > bool check_tnl_key; > > memset(&wc->masks.dl_dst, 0xff, sizeof wc->masks.dl_dst); > -if (bfd->eth_dst_set && memcmp(bfd->eth_dst, flow->dl_dst, > ETH_ADDR_LEN)) { > +if (!eth_addr_is_zero(bfd->rmt_eth_dst) > +&& memcmp(bfd->rmt_eth_dst, flow->dl_dst, ETH_ADDR_LEN)) { > return false; > } > > diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml > index bff8fb6..ff9be17 100644 > --- a/vswitchd/vswitch.xml > +++ b/vswitchd/vswitch.xml > @@ -2081,12 +2081,27 @@ >tunnel key. > > > - > + > Set to an Ethernet address in the
Re: [ovs-dev] [DPDK Upcalls v3 2/3] ofproto: Reorganize in preparation for direct dpdk upcalls.
Good catch. I've fixed it and will merge soon. Ethan On Thu, Aug 14, 2014 at 9:49 AM, Ben Pfaff wrote: > On Wed, Aug 13, 2014 at 06:46:17PM -0700, Ethan Jackson wrote: >> This patch reorganizes ofproto-dpif in preparation for future patches >> which allow direct upcall processing from dpif-netdev. The main goals >> are to share as much code as possible between the dpif-linux and >> dpif-netdev upcall paths. Additionally, to avoid confusing the >> dpif-netdev fast path, the packet processing path should treat packets >> and struct flow's as const. >> >> Signed-off-by: Ethan Jackson > > recv_upcalls() calls upcall_receive() and if that fails it jumps to > cleanup to call upcall_uninit(). However, my reading of > upcall_receive() is that in its only failure case the upcall is not > properly initialized to call upcall_uninit(). For example, I believe > that the upcall's xout_initialized member is not initialized at that > point. I think that this error case should jump to free_dupcall > instead. > > Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ofproto-dpif-upcall: Properly initialize 'recv_buf'.
Due to a typo, the latest upcall refactoring caused dpif_recv() to be called on an un-initialized chunk of memory. Signed-off-by: Ethan Jackson Reported-by: Justin Pettit --- ofproto/ofproto-dpif-upcall.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 3b4ff5f..180684c 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -586,7 +586,7 @@ recv_upcalls(struct handler *handler) struct flow flow; int error; -ofpbuf_use_stub(&recv_buf[n_upcalls], recv_stubs[n_upcalls], +ofpbuf_use_stub(recv_buf, recv_stubs[n_upcalls], sizeof recv_stubs[n_upcalls]); if (dpif_recv(udpif->dpif, handler->handler_id, &dupcall, recv_buf)) { ofpbuf_uninit(recv_buf); -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Fix use of cleared stack memory.
Acked-by: Ethan Jackson Thanks for taking care of this. Ethan On Fri, Aug 15, 2014 at 1:15 AM, Alex Wang wrote: > Commit cc377352d (ofproto: Reorganize in preparation for direct > dpdk upcalls.) introduced the bug that uses variable defined on > the stack inside while loop for reading dpif upcalls and keeps > reference to attributes of the variable within the same function > after the stack is cleared. This bug can cause ovs abort. > > This commit fixes the above issue by defining an array of the > variable on the function stack. > > Signed-off-by: Alex Wang > --- > ofproto/ofproto-dpif-upcall.c | 27 ++- > 1 file changed, 14 insertions(+), 13 deletions(-) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 180684c..9f68a7d 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -574,55 +574,56 @@ recv_upcalls(struct handler *handler) > struct udpif *udpif = handler->udpif; > uint64_t recv_stubs[UPCALL_MAX_BATCH][512 / 8]; > struct ofpbuf recv_bufs[UPCALL_MAX_BATCH]; > +struct dpif_upcall dupcalls[UPCALL_MAX_BATCH]; > struct upcall upcalls[UPCALL_MAX_BATCH]; > size_t n_upcalls, i; > > n_upcalls = 0; > while (n_upcalls < UPCALL_MAX_BATCH) { > struct ofpbuf *recv_buf = &recv_bufs[n_upcalls]; > +struct dpif_upcall *dupcall = &dupcalls[n_upcalls]; > struct upcall *upcall = &upcalls[n_upcalls]; > -struct dpif_upcall dupcall; > struct pkt_metadata md; > struct flow flow; > int error; > > ofpbuf_use_stub(recv_buf, recv_stubs[n_upcalls], > sizeof recv_stubs[n_upcalls]); > -if (dpif_recv(udpif->dpif, handler->handler_id, &dupcall, recv_buf)) > { > +if (dpif_recv(udpif->dpif, handler->handler_id, dupcall, recv_buf)) { > ofpbuf_uninit(recv_buf); > break; > } > > -if (odp_flow_key_to_flow(dupcall.key, dupcall.key_len, &flow) > +if (odp_flow_key_to_flow(dupcall->key, dupcall->key_len, &flow) > == ODP_FIT_ERROR) { > goto free_dupcall; > } > > -error = upcall_receive(upcall, udpif->backer, &dupcall.packet, > - dupcall.type, dupcall.userdata, &flow); > +error = upcall_receive(upcall, udpif->backer, &dupcall->packet, > + dupcall->type, dupcall->userdata, &flow); > if (error) { > if (error == ENODEV) { > /* Received packet on datapath port for which we couldn't > * associate an ofproto. This can happen if a port is > removed > * while traffic is being received. Print a rate-limited > * message in case it happens frequently. */ > -dpif_flow_put(udpif->dpif, DPIF_FP_CREATE, dupcall.key, > - dupcall.key_len, NULL, 0, NULL, 0, NULL); > +dpif_flow_put(udpif->dpif, DPIF_FP_CREATE, dupcall->key, > + dupcall->key_len, NULL, 0, NULL, 0, NULL); > VLOG_INFO_RL(&rl, "received packet on unassociated datapath " > "port %"PRIu32, flow.in_port.odp_port); > } > goto free_dupcall; > } > > -upcall->key = dupcall.key; > -upcall->key_len = dupcall.key_len; > +upcall->key = dupcall->key; > +upcall->key_len = dupcall->key_len; > > -if (vsp_adjust_flow(upcall->ofproto, &flow, &dupcall.packet)) { > +if (vsp_adjust_flow(upcall->ofproto, &flow, &dupcall->packet)) { > upcall->vsp_adjusted = true; > } > > md = pkt_metadata_from_flow(&flow); > -flow_extract(&dupcall.packet, &md, &flow); > +flow_extract(&dupcall->packet, &md, &flow); > > error = process_upcall(udpif, upcall, NULL); > if (error) { > @@ -635,14 +636,14 @@ recv_upcalls(struct handler *handler) > cleanup: > upcall_uninit(upcall); > free_dupcall: > -ofpbuf_uninit(&dupcall.packet); > +ofpbuf_uninit(&dupcall->packet); > ofpbuf_uninit(recv_buf); > } > > if (n_upcalls) { > handle_upcalls(handler->udpif, upcalls, n_upcalls); > for (i = 0; i < n_upcalls; i++) { > -ofpbuf_uninit(CONST_CAST(struct ofpbuf *, upcalls[i].packet)); > +ofpbuf_uninit(&dupcalls[i].packet); > ofpbuf_uninit(&recv_bufs[i]); > upcall_uninit(&upcalls[i]); > } > -- > 1.7.9.5 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] packets: Suppress sparse warning.
warning: incorrect type in argument 1 (different base types) expected restricted ovs_be16 [usertype] old_csum got unsigned short [unsigned] [usertype] icmp6_cksum Signed-off-by: Ethan Jackson --- lib/packets.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/packets.c b/lib/packets.c index ace7d8e..ddf7752 100644 --- a/lib/packets.c +++ b/lib/packets.c @@ -718,7 +718,9 @@ packet_update_csum128(struct ofpbuf *packet, uint8_t proto, } else if (proto == IPPROTO_ICMPV6 && l4_size >= sizeof(struct icmp6_hdr)) { struct icmp6_hdr *icmp = ofpbuf_l4(packet); +#ifndef __CHECKER__ /* icmp6_csum is a short instead of an ovs_be16. */ icmp->icmp6_cksum = recalc_csum128(icmp->icmp6_cksum, addr, new_addr); +#endif } } -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] packets: Suppress sparse warning.
oops, I missed it. I'll drop this then On Tue, Sep 2, 2014 at 6:44 PM, Jesse Gross wrote: > On Tue, Sep 2, 2014 at 6:36 PM, Ethan Jackson wrote: >> warning: incorrect type in argument 1 (different base types) >> expected restricted ovs_be16 [usertype] old_csum >> got unsigned short [unsigned] [usertype] icmp6_cksum >> >> Signed-off-by: Ethan Jackson > > I actually just sent out a patch for this about a half hour ago. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [ovs-dev 1/3] ovs-dev.py: Add aggressive compile optimization options.
These options don't make sense when building portable code, but when using the dev script, OVS is built on the same system it's run on. They make a small difference in the OVS DPDK testing, hence their addition. Signed-off-by: Ethan Jackson --- utilities/ovs-dev.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index 496f6dc..e454e18 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -63,7 +63,7 @@ def conf(): "--with-logdir=%s/log" % ROOT, "--with-rundir=%s/run" % ROOT, "--enable-silent-rules", "--with-dbdir=" + ROOT, "--silent"] -cflags = "-g -fno-omit-frame-pointer" +cflags = "-g -fno-omit-frame-pointer -march=native" if options.werror: configure.append("--enable-Werror") @@ -83,8 +83,6 @@ def conf(): cflags += " -O%d" % options.optimize -ENV["CFLAGS"] = cflags - _sh("./boot.sh") try: @@ -93,6 +91,7 @@ def conf(): pass # Directory exists. os.chdir(BUILD_GCC) +ENV["CFLAGS"] = cflags + " -finline-limit=5" _sh(*(configure + ["--with-linux=/lib/modules/%s/build" % uname()])) try: @@ -114,6 +113,7 @@ def conf(): pass # Directory exists. ENV["CC"] = "clang" +ENV["CFLAGS"] = cflags os.chdir(BUILD_CLANG) _sh(*configure) -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [ovs-dev 2/3] ovs-dev.py: Support additional optimization flags.
They may or may not make a difference, but there's no reason not to support passing them. Signed-off-by: Ethan Jackson --- utilities/ovs-dev.py | 11 +++ 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index e454e18..3686391 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -81,7 +81,7 @@ def conf(): if options.optimize is None: options.optimize = 0 -cflags += " -O%d" % options.optimize +cflags += " -O%s" % str(options.optimize) _sh("./boot.sh") @@ -362,10 +362,13 @@ def main(): help="configure the man documentation install directory") group.add_option("--with-dpdk", dest="with_dpdk", metavar="DPDK_BUILD", help="built with dpdk libraries located at DPDK_BUILD"); +parser.add_option_group(group) -for i in range(4): -group.add_option("--O%d" % i, dest="optimize", action="store_const", - const=i, help="compile with -O%d" % i) +group = optparse.OptionGroup(parser, "Optimization Flags") +for i in ["s", "g"] + range(4) + ["fast"]: +group.add_option("--O%s" % str(i), dest="optimize", + action="store_const", const=i, + help="compile with -O%s" % str(i)) parser.add_option_group(group) group = optparse.OptionGroup(parser, "check") -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [ovs-dev 3/3] ovs-dev.py: Support running the clang binaries.
They have slightly different support characteristics, so it's nice to easily switch between them for testing. Signed-off-by: Ethan Jackson --- utilities/ovs-dev.py | 18 ++ 1 file changed, 14 insertions(+), 4 deletions(-) diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py index 3686391..491d5ff 100755 --- a/utilities/ovs-dev.py +++ b/utilities/ovs-dev.py @@ -26,14 +26,15 @@ OVS_SRC = HOME + "/ovs" ROOT = HOME + "/root" BUILD_GCC = OVS_SRC + "/_build-gcc" BUILD_CLANG = OVS_SRC + "/_build-clang" -PATH = "%(ovs)s/utilities:%(ovs)s/ovsdb:%(ovs)s/vswitchd" % {"ovs": BUILD_GCC} - -ENV["PATH"] = PATH + ":" + ENV["PATH"] options = None parser = None commands = [] +def set_path(build): +PATH = "%(ovs)s/utilities:%(ovs)s/ovsdb:%(ovs)s/vswitchd" % {"ovs": build} + +ENV["PATH"] = PATH + ":" + ENV["PATH"] def _sh(*args, **kwargs): print "--> " + " ".join(args) @@ -236,7 +237,8 @@ def run(): _sh("ovs-vsctl --no-wait set Open_vSwitch %s ovs_version=%s" % (root_uuid, version)) -cmd = [BUILD_GCC + "/vswitchd/ovs-vswitchd"] +build = BUILD_CLANG if options.clang else BUILD_GCC +cmd = [build + "/vswitchd/ovs-vswitchd"] if options.dpdk: cmd.append("--dpdk") @@ -387,6 +389,9 @@ def main(): group.add_option("--dpdk", dest="dpdk", action="callback", callback=parse_subargs, help="run ovs-vswitchd with dpdk subopts (ended by --)") +group.add_option("--clang", dest="clang", action="store_true", + help="Use binaries built by clang") + parser.add_option_group(group) options, args = parser.parse_args() @@ -396,6 +401,11 @@ def main(): print "Unknown argument " + arg doc() +if options.clang: +set_path(BUILD_CLANG) +else: +set_path(BUILD_GCC) + try: os.chdir(OVS_SRC) except OSError: -- 1.8.1.2 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [ovs-dev 1/3] ovs-dev.py: Add aggressive compile optimization options.
> Is there any point to testing OVS in a configuration that we won't > distribute? This is a subject in which there could legitimately be many points of view, but I'll take a stab at explaining how I think about it. To answer your question: In general no, but with DPDK yes. If you look at how stock OVS is typically deployed, a vendor (Redhat, Debian, etc), builds it in a portable fashion and distributes binaries to whoever wants them. In this case, I completely agree that these sorts of compiler optimizations don't make sense. With DPDK I think the situation is slightly different. The DPDK libraries are specifically designed for people to heavily tune the applications built on top of them, and I suspect the people interested in OVS on DPDK at this point are the same. To the extent that OVS on DPDK is deployed, it will be in very specific circumstances where performance is critical and these sorts of compile time optimizations are typical. Anyways that's my thinking on it. If you prefer, I could make it an option, but for the dev script I don't think it really matters that much. Keep in mind that real numbers we publish are based on official builds which don't use this script. Ethan ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ofproto: Increase default datapath max_idle time.
Acked-by: Ethan Jackson On Thu, Sep 11, 2014 at 3:03 PM, Joe Stringer wrote: > The datapath max_idle value determines how long to wait before deleting > an idle datapath flow when operating below the flow_limit. This patch > increases the max_idle to 10 seconds, which allows datapath flows to be > remain cached even if they are used less consistently, and provides a > small improvement in the supported number of flows when operating around > the flow_limit. > > Signed-off-by: Joe Stringer > --- > ofproto/ofproto.h |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/ofproto/ofproto.h b/ofproto/ofproto.h > index 1b8709a..d60b198 100644 > --- a/ofproto/ofproto.h > +++ b/ofproto/ofproto.h > @@ -267,7 +267,7 @@ int ofproto_port_dump_done(struct ofproto_port_dump *); > ) > > #define OFPROTO_FLOW_LIMIT_DEFAULT 20 > -#define OFPROTO_MAX_IDLE_DEFAULT 1500 > +#define OFPROTO_MAX_IDLE_DEFAULT 1 /* ms */ > > const char *ofproto_port_open_type(const char *datapath_type, > const char *port_type); > -- > 1.7.10.4 > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] ovs-dev.py: skip clang if configure fails
Acked-by: Ethan Jackson On Sun, Sep 14, 2014 at 8:06 PM, Joe Stringer wrote: > Although I reviewed and applied #1, I'll leave this one up to Ethan as it's > more relevant to his other proposed changes. > > On 13 September 2014 05:35, Daniele Di Proietto > wrote: >> >> If 'configure' with clang fails (this can happen for example if it doesn't >> support all the cflags), simply skip it >> >> Signed-off-by: Daniele Di Proietto >> --- >> utilities/ovs-dev.py | 9 ++--- >> 1 file changed, 6 insertions(+), 3 deletions(-) >> >> diff --git a/utilities/ovs-dev.py b/utilities/ovs-dev.py >> index 869d89e..934cddb 100755 >> --- a/utilities/ovs-dev.py >> +++ b/utilities/ovs-dev.py >> @@ -114,9 +114,12 @@ def conf(): >> except OSError: >> pass # Directory exists. >> >> -ENV["CC"] = "clang" >> -os.chdir(BUILD_CLANG) >> -_sh(*configure) >> +try: >> +ENV["CC"] = "clang" >> +os.chdir(BUILD_CLANG) >> +_sh(*configure) >> +except subprocess.CalledProcessError: >> +clang = False >> >> if sparse: >> c1 = "C=1" >> -- >> 2.1.0.rc1 >> >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] datapaht/README: fixed some minor typos
The patch introduces some trailing whitespace. > - - If userspace's notion of the flow key for the packet matches the > + - If the userspace's notion of the flow key for the packet matches the I don't agree with this change. It's pretty common to talk about userspace without a "the" as the README originally did. Otherwise looks good. If you resend it with those changes I'll merge it. Ethan ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] AUTHORS: added Luis E. P. to list
Typically adding yourself to the authors file wouldn't be a separate patch, I'd just merge this into the previous one. Also, Looking at the file now the first list of people are those who have "authored or signed off on commits in the Open vSwitch source code". I think the patch you suggested technically fits that bill, so I'd go ahead and add yourself to the first list instead of the second. Ethan On Thu, Jul 2, 2015 at 8:58 AM, Luis E. P wrote: > Signed-off-by: Luis E. P > --- > AUTHORS | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/AUTHORS b/AUTHORS > index 845d20c..7aad9fa 100644 > --- a/AUTHORS > +++ b/AUTHORS > @@ -294,6 +294,7 @@ Krishna Miriyalakris...@nicira.com > Len Gao l...@vmware.com > Logan Rosen logatron...@gmail.com > Luca Falavigna dktrkr...@debian.org > +Luis E. P. l...@hotmail.com > Luiz Henrique Ozaki luiz.oz...@gmail.com > Marco d'Itrim...@linux.it > Martin Vizvary vizv...@ics.muni.cz > -- > 2.3.2 (Apple Git-55) > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] datapaht/README: fixed some minor typos
> Also, there is a typo in the subject. Doh, I should have caught that. Ethan ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] AUTHORS: added Luis E. P. to list
I would ditch this patch and add your self to the authors file in the previous one. Ethan On Thu, Jul 2, 2015 at 12:12 PM, Justin Pettit wrote: > Yeah, I guess so. If you're doing that, you might as well roll in the other > feedback. We typically version the patches, so I'd put a v3 in the next patch. > > --Justin > > >> On Jul 2, 2015, at 12:01 PM, Luis E Pena wrote: >> >> Should I redo this patch with my name in the first list? >> >>> On Jul 2, 2015, at 11:15, Ben Pfaff wrote: >>> >>> On Thu, Jul 02, 2015 at 11:13:54AM -0700, Ethan Jackson wrote: >>>> Also, Looking at the file now the first list of people are those who >>>> have "authored or signed off on commits in the Open vSwitch source >>>> code". I think the patch you suggested technically fits that bill, so >>>> I'd go ahead and add yourself to the first list instead of the second. >>> >>> Yes, it's intentionally a broad category, not a narrow one, anyone who >>> submits a patch goes in the first list. >> >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v3] datapath/README: fixed some typos
Acked-by: Ethan Jackson Merged. Changed some tabs to spaces Ethan On Thu, Jul 2, 2015 at 12:34 PM, Luis E. P wrote: > Signed-off-by: Luis E. P > --- > AUTHORS| 1 + > datapath/README.md | 7 --- > 2 files changed, 5 insertions(+), 3 deletions(-) > > diff --git a/AUTHORS b/AUTHORS > index 845d20c..b82d5ed 100644 > --- a/AUTHORS > +++ b/AUTHORS > @@ -109,6 +109,7 @@ Lorand Jakabloja...@cisco.com > Luca Giraudolgira...@nicira.com > Lucian Petrut lpet...@cloudbasesolutions.com > Luigi Rizzo ri...@iet.unipi.it > +Luis E. P. l...@hotmail.com > Madhu Challacha...@noironetworks.com > Mark D. Graymark.d.g...@intel.com > Mark Hamilton mhamil...@nicira.com > diff --git a/datapath/README.md b/datapath/README.md > index 1a4d8e1..8faecc0 100644 > --- a/datapath/README.md > +++ b/datapath/README.md > @@ -99,10 +99,11 @@ passed over the Netlink socket. A flow key, exactly as > described above, and an > optional corresponding flow mask. > > A wildcarded flow can represent a group of exact match flows. Each '1' bit > -in the mask specifies a exact match with the corresponding bit in the flow > key. > +in the mask specifies an exact match with the corresponding bit in the flow > key. > A '0' bit specifies a don't care bit, which will match either a '1' or '0' > bit > -of a incoming packet. Using wildcarded flow can improve the flow set up rate > -by reduce the number of new flows need to be processed by the user space > program. > +of an incoming packet. Using a wildcarded flow can improve the flow set up > rate > +by reducing the number of new flows that need to be processed by the user > space > +program. > > Support for the mask Netlink attribute is optional for both the kernel and > user > space program. The kernel can ignore the mask attribute, installing an exact > -- > 2.3.2 (Apple Git-55) > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] doc: Document proposed OVN Gateway HA design.
High availability for gateways in network virtualization deployments is fairly difficult to get right. There are a ton of options, most of which are too complicated or perform badly. To help solve this problem, this patch proposes an HA design based on some of the lessons learned building similar systems. The hope is that it can be used as a starting point for design discussions and an eventual implementation. Signed-off-by: Ethan Jackson --- OVN-GW-HA.md | 374 +++ 1 file changed, 374 insertions(+) create mode 100644 OVN-GW-HA.md diff --git a/OVN-GW-HA.md b/OVN-GW-HA.md new file mode 100644 index 000..ea598b2 --- /dev/null +++ b/OVN-GW-HA.md @@ -0,0 +1,374 @@ +OVN Gateway High Availability Plan +== +``` + +---+ + | | + | External Network | + | | + +-^-+ + | + | + +---+ + | | + | Gateway | + | | + +---+ + ^ + | + | + +-v-+ + | | + |OVN Virtual Network| + | | + +---+ + +OVN Gateway +``` + +The OVN gateway is responsible for shuffling traffic between logical space +(governed by ovn-northd), and the legacy physical network. In a naive +implementation, the gateway is a single x86 server, or hardware VTEP. For most +deployments, a single system has enough forwarding capacity to service the +entire virtualized network, however, it introduces a single point of failure. +If this system dies, the entire OVN deployment becomes unavailable. To mitgate +this risk, an HA solution is critical ??? by spreading responsibilty across +multiple systems, no single server failure can take down the network. + +An HA solution is both critical to the performance and manageability of the +system, and extremely difficult to get right. The purpose of this document, is +to propose a plan for OVN Gateway High Availability which takes into account +our past experience building similar systems. It should be considered a fluid +changing proposal, not a set-in-stone decree. + +Basic Architecture +-- +In an OVN deployment, the set of hypervisors and network elements operating +under the guidance of ovn-northd are in what's called "logical space". These +servers use VXLAN, STT, or Geneve to communicate, oblivious to the details of +the underlying physical network. When these systems need to communicate with +legacy networks, traffic must be routed through a Gateway which translates from +OVN controlled tunnel traffic, to raw physical network traffic. + +Since the broader internet is managed outside of the OVN network domain, all +traffic between logical space and the WAN must travel through this gateway. +This makes it a critical single point of failure ??? if the gateway dies, +communication with the WAN ceases for all systems in logical space. + +To mitigate this risk, multiple gateways should be run in a "High Availability +Cluster" or "HA Cluster". The HA cluster will be responsible for performing +the duties of a gateways, while being able to recover gracefully from +individual memember failures. + +``` + +---+ + | | + | External Network | + | | + +-^-+ + | + | ++--v--+ +| | +| High Availability Cluster | +| | +| +---+ +---+ +---+ | +| | | | | | | | +| | Gateway | | Gateway | | Gateway | | +| | | | | | | | +| +---+ +---+ +---+ | ++--^--+ + | + | + +-v-+ + | | + |OVN Virtual Network| + | | + +---+ + +OVN Gateway HA Cluster +``` + +# L2 vs L3 High Availability +In order to achieve this goal, there are two broad approaches one can take. +The HA cluster can appear to the network like a giant Layer 2 Ethernet Switch, +or like a giant IP Router. These approaches are called L2HA, and L3HA +respectively. L2HA allows ethernet broadcast domains to extend into logical +space, a signifi
Re: [ovs-dev] [PATCH] doc: Document proposed OVN Gateway HA design.
Thanks for the review, I've addressed the comments and will send out another version of the patch shortly. One comment on the feedback below. > Under "Controller Independent Active-backup", I am not sure that I buy > the argument here, because currently ovn-northd doesn't care about the > layout of the physical network. The other argument rings true for me of > course: I'm not sure I fully understood this comment. What part of the paragraph do you disagree with specifically? Whether or not ovn-northd cares about the layout of the physical network, it still needs to know which gateway is up, and under naive active-backup react to changes. Ethan > > This can significantly increase downtime in the event of a failover > as the (often already busy) ovn-northd controller has to recompute > state for the new leader. > > Here are some spelling fixes as a patch. This also replaces the fancy > Unicode U+2014 em dashes by the more common (in OVS, anyway) ASCII "--". > > Thanks again for writing this! > > diff --git a/OVN-GW-HA.md b/OVN-GW-HA.md > index ea598b2..e0d5c9f 100644 > --- a/OVN-GW-HA.md > +++ b/OVN-GW-HA.md > @@ -30,8 +30,8 @@ The OVN gateway is responsible for shuffling traffic > between logical space > implementation, the gateway is a single x86 server, or hardware VTEP. For > most > deployments, a single system has enough forwarding capacity to service the > entire virtualized network, however, it introduces a single point of failure. > -If this system dies, the entire OVN deployment becomes unavailable. To > mitgate > -this risk, an HA solution is critical — by spreading responsibilty across > +If this system dies, the entire OVN deployment becomes unavailable. To > mitigate > +this risk, an HA solution is critical -- by spreading responsibility across > multiple systems, no single server failure can take down the network. > > An HA solution is both critical to the performance and manageability of the > @@ -51,7 +51,7 @@ OVN controlled tunnel traffic, to raw physical network > traffic. > > Since the broader internet is managed outside of the OVN network domain, all > traffic between logical space and the WAN must travel through this gateway. > -This makes it a critical single point of failure — if the gateway dies, > +This makes it a critical single point of failure -- if the gateway dies, > communication with the WAN ceases for all systems in logical space. > > To mitigate this risk, multiple gateways should be run in a "High > Availability > @@ -128,15 +128,15 @@ absolute simplest way to achive this is what we'll call > "naive-active-backup". > Naive Active Backup HA Implementation > ``` > > -In a naive active-bakup, one of the Gateways is choosen (arbitrarily) as a > +In a naive active-backup, one of the Gateways is choosen (arbitrarily) as a > leader. All logical routers (A, B, C in the figure), are scheduled on this > leader gateway and all traffic flows through it. ovn-northd monitors this > gateway via OpenFlow hello messages (or some equivalent), and if the gateway > dies, it recreates the routers on one of the backups. > > This approach basically works in most cases and should likely be the starting > -point for OVN — it's strictly better than no HA solution and is a good > -foundation for more sophisticated solutions. That said, it's not without > it's > +point for OVN -- it's strictly better than no HA solution and is a good > +foundation for more sophisticated solutions. That said, it's not without its > limitations. Specifically, this approach doesn't coordinate with the physical > network to minimize disruption during failures, and it tightly couples > failover > to ovn-northd (we'll discuss why this is bad in a bit), and wastes resources > by > @@ -167,7 +167,7 @@ ethernet source address of the RARP is that of the > logical router it > corresponds to, and its destination is the broadcast address. This causes > the > RARP to travel to every L2 switch in the broadcast domain, updating > forwarding > tables accordingly. This strategy is recommended in all failover mechanisms > -discussed in this document — when a router newly boots on a new leader, it > +discussed in this document -- when a router newly boots on a new leader, it > should RARP its MAC address. > > ### Controller Independent Active-backup > @@ -188,7 +188,7 @@ Controller Independent Active-Backup Implementation > ``` > > The fundamental problem with naive active-backup, is it tightly couples the > -failover solution to ovn-northd. This can signifcantly increase downtime in > +failover solution to ovn-northd. This can significantly increase downtime in > the event of a failover as the (often already busy) ovn-northd controller has > to recompute state for the new leader. Worse, if ovn-northd goes down, we > can't perform gateway failover at all. This violates the principle that > @@ -207,7 +207,7 @@ priority to each node it controls. Nodes use the >
[ovs-dev] [PATCH v2] doc: Document proposed OVN Gateway HA design.
High availability for gateways in network virtualization deployments is fairly difficult to get right. There are a ton of options, most of which are too complicated or perform badly. To help solve this problem, this patch proposes an HA design based on some of the lessons learned building similar systems. The hope is that it can be used as a starting point for design discussions and an eventual implementation. Signed-off-by: Ethan Jackson --- ovn/OVN-GW-HA.md | 375 +++ 1 file changed, 375 insertions(+) create mode 100644 ovn/OVN-GW-HA.md diff --git a/ovn/OVN-GW-HA.md b/ovn/OVN-GW-HA.md new file mode 100644 index 000..b26ee68 --- /dev/null +++ b/ovn/OVN-GW-HA.md @@ -0,0 +1,375 @@ +OVN Gateway High Availability Plan +== +``` + +---+ + | | + | External Network | + | | + +-^-+ + | + | + +---+ + | | + | Gateway | + | | + +---+ + ^ + | + | + +-v-+ + | | + |OVN Virtual Network| + | | + +---+ + +OVN Gateway +``` + +The OVN gateway is responsible for shuffling traffic between the tunneled +overlay network (governed by ovn-northd), and the legacy physical network. In +a naive implementation, the gateway is a single x86 server, or hardware VTEP. +For most deployments, a single system has enough forwarding capacity to service +the entire virtualized network, however, it introduces a single point of +failure. If this system dies, the entire OVN deployment becomes unavailable. +To mitigate this risk, an HA solution is critical -- by spreading +responsibility across multiple systems, no single server failure can take down +the network. + +An HA solution is both critical to the manageability of the system, and +extremely difficult to get right. The purpose of this document, is to propose +a plan for OVN Gateway High Availability which takes into account our past +experience building similar systems. It should be considered a fluid changing +proposal, not a set-in-stone decree. + +Basic Architecture +-- +In an OVN deployment, the set of hypervisors and network elements operating +under the guidance of ovn-northd are in what's called "logical space". These +servers use VXLAN, STT, or Geneve to communicate, oblivious to the details of +the underlying physical network. When these systems need to communicate with +legacy networks, traffic must be routed through a Gateway which translates from +OVN controlled tunnel traffic, to raw physical network traffic. + +Since the gateway is typically the only system with a connection to the +physical network all traffic between logical space and the WAN must travel +through it. This makes it a critical single point of failure -- if +the gateway dies, communication with the WAN ceases for all systems in logical +space. + +To mitigate this risk, multiple gateways should be run in a "High Availability +Cluster" or "HA Cluster". The HA cluster will be responsible for performing +the duties of a gateways, while being able to recover gracefully from +individual member failures. + +``` + +---+ + | | + | External Network | + | | + +-^-+ + | + | ++--v--+ +| | +| High Availability Cluster | +| | +| +---+ +---+ +---+ | +| | | | | | | | +| | Gateway | | Gateway | | Gateway | | +| | | | | | | | +| +---+ +---+ +---+ | ++--^--+ + | + | + +-v-+ + | | + |OVN Virtual Network| + | | + +---+ + +OVN Gateway HA Cluster +``` + +# L2 vs L3 High Availability +In order to achieve this goal, there are two broad approaches one can take. +The HA cluster can appear to the network like a giant Layer 2 Ethernet Switch, +or like a giant IP Router. These approaches are called L2HA, and L3HA +respectively. L2HA allows ethernet broadcast domains to extend into logi
Re: [ovs-dev] [PATCH] doc: Document proposed OVN Gateway HA design.
No worries. I sent a new version (actually ran spell check on this one ;) ) On Tue, Jul 21, 2015 at 11:35 AM, Ben Pfaff wrote: > On Tue, Jul 21, 2015 at 11:32:21AM -0700, Ethan Jackson wrote: >> Thanks for the review, I've addressed the comments and will send out >> another version of the patch shortly. One comment on the feedback >> below. >> >> > Under "Controller Independent Active-backup", I am not sure that I buy >> > the argument here, because currently ovn-northd doesn't care about the >> > layout of the physical network. The other argument rings true for me of >> > course: >> >> I'm not sure I fully understood this comment. What part of the >> paragraph do you disagree with specifically? Whether or not >> ovn-northd cares about the layout of the physical network, it still >> needs to know which gateway is up, and under naive active-backup react >> to changes. > > Never mind, I think I misunderstood. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] netdev-dpdk: add support for rings in secondary processes in IVSHMEM setups
Sort of. Helped with the creation of the patch a bit so I figured I should add a sign off. Someone other than me should review it for this reason. Ethan On Wed, Jul 22, 2015 at 3:07 PM, Ben Pfaff wrote: > On Wed, Jul 22, 2015 at 01:51:46PM -0700, Melvin Walls wrote: >> In order for OVS running inside a VM using IVSHMEM to recognize ports created >> on the host, you have to start vswitchd with the --proc-type=secondary EAL >> option. >> >> When creating rings in secondary processes functions like >> rte_eth_dev_configure() fail with the error code E_RTE_SECONDARY, i.e., the >> operations are not allowed in secondary processes. Avoiding this requires >> some >> changes to the way secondary processes handle dpdk rings. >> >> This patch changes dpdk_ring_create() to use rte_ring_lookup() instead of >> rte_ring_create() when called from a secondary process. It also introduces >> two >> functions: netdev_dpdk_ring_rxq_recv() and netdev_dpdk_ring_send__() to >> handle >> tx/rx on dpdk rings in secondary processes. >> >> Signed-off-by: Melvin Walls >> Signed-off-by: Ethan Jackson > > These sign-offs are weird. Ethan, is this Signed-off-by meant to be an > Acked-by from you? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] netdev-dpdk: add support for rings in secondary processes in IVSHMEM setups
Sure, Melvin could you add that to the next version you send out (after this version is reviewed). Ethan On Wed, Jul 22, 2015 at 4:31 PM, Ben Pfaff wrote: > I understand now. I recommend adding a Co-authored-by, then, to make > that clear. > > On Wed, Jul 22, 2015 at 04:29:55PM -0700, Ethan Jackson wrote: >> Sort of. Helped with the creation of the patch a bit so I figured I >> should add a sign off. Someone other than me should review it for >> this reason. >> >> Ethan >> >> On Wed, Jul 22, 2015 at 3:07 PM, Ben Pfaff wrote: >> > On Wed, Jul 22, 2015 at 01:51:46PM -0700, Melvin Walls wrote: >> >> In order for OVS running inside a VM using IVSHMEM to recognize ports >> >> created >> >> on the host, you have to start vswitchd with the --proc-type=secondary EAL >> >> option. >> >> >> >> When creating rings in secondary processes functions like >> >> rte_eth_dev_configure() fail with the error code E_RTE_SECONDARY, i.e., >> >> the >> >> operations are not allowed in secondary processes. Avoiding this requires >> >> some >> >> changes to the way secondary processes handle dpdk rings. >> >> >> >> This patch changes dpdk_ring_create() to use rte_ring_lookup() instead of >> >> rte_ring_create() when called from a secondary process. It also >> >> introduces two >> >> functions: netdev_dpdk_ring_rxq_recv() and netdev_dpdk_ring_send__() to >> >> handle >> >> tx/rx on dpdk rings in secondary processes. >> >> >> >> Signed-off-by: Melvin Walls >> >> Signed-off-by: Ethan Jackson >> > >> > These sign-offs are weird. Ethan, is this Signed-off-by meant to be an >> > Acked-by from you? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] netdev-dpdk: add support for rings in secondary processes in IVSHMEM setups
Probably someone with some experience with the DPDK code. My feeling is that Daniele or Pravin would be appropriate. Wouldn't mind if someone from Intel had a look too. Ethan On Wed, Jul 22, 2015 at 4:36 PM, Ben Pfaff wrote: > Who should review it? Daniele? Pravin? > > On Wed, Jul 22, 2015 at 04:34:39PM -0700, Ethan Jackson wrote: >> Sure, Melvin could you add that to the next version you send out >> (after this version is reviewed). >> >> Ethan >> >> On Wed, Jul 22, 2015 at 4:31 PM, Ben Pfaff wrote: >> > I understand now. I recommend adding a Co-authored-by, then, to make >> > that clear. >> > >> > On Wed, Jul 22, 2015 at 04:29:55PM -0700, Ethan Jackson wrote: >> >> Sort of. Helped with the creation of the patch a bit so I figured I >> >> should add a sign off. Someone other than me should review it for >> >> this reason. >> >> >> >> Ethan >> >> >> >> On Wed, Jul 22, 2015 at 3:07 PM, Ben Pfaff wrote: >> >> > On Wed, Jul 22, 2015 at 01:51:46PM -0700, Melvin Walls wrote: >> >> >> In order for OVS running inside a VM using IVSHMEM to recognize ports >> >> >> created >> >> >> on the host, you have to start vswitchd with the --proc-type=secondary >> >> >> EAL >> >> >> option. >> >> >> >> >> >> When creating rings in secondary processes functions like >> >> >> rte_eth_dev_configure() fail with the error code E_RTE_SECONDARY, >> >> >> i.e., the >> >> >> operations are not allowed in secondary processes. Avoiding this >> >> >> requires some >> >> >> changes to the way secondary processes handle dpdk rings. >> >> >> >> >> >> This patch changes dpdk_ring_create() to use rte_ring_lookup() instead >> >> >> of >> >> >> rte_ring_create() when called from a secondary process. It also >> >> >> introduces two >> >> >> functions: netdev_dpdk_ring_rxq_recv() and netdev_dpdk_ring_send__() >> >> >> to handle >> >> >> tx/rx on dpdk rings in secondary processes. >> >> >> >> >> >> Signed-off-by: Melvin Walls >> >> >> Signed-off-by: Ethan Jackson >> >> > >> >> > These sign-offs are weird. Ethan, is this Signed-off-by meant to be an >> >> > Acked-by from you? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] netdev-dpdk: Restore txq/rxq number if initialization fails.
Ben, Justin, should this be backported? I'm not up on the policy at the moment. Etha On Thu, Jul 23, 2015 at 3:42 AM, Traynor, Kevin wrote: > >> -Original Message- >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di >> Proietto >> Sent: Thursday, July 16, 2015 7:48 PM >> To: dev@openvswitch.org >> Subject: [ovs-dev] [PATCH 1/2] netdev-dpdk: Restore txq/rxq number if >> initialization fails. >> >> netdev_dpdk_set_multiq() should not set the number of configured rxq >> and txq if the driver initialization fails (meaning that the driver >> failed to setup the queues). Otherwise, on a subsequent call to >> netdev_dpdk_set_multiq(), the code may believe that the queues have >> already been setup and there's no work to be done. >> >> This commit fixes the problem by restoring the old values if >> dpdk_eth_dev_init() fails. >> >> Reported-by: Ian Stokes >> Signed-off-by: Daniele Di Proietto >> --- >> lib/netdev-dpdk.c | 9 + >> 1 file changed, 9 insertions(+) >> >> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >> index 8b843db..5ae805e 100644 >> --- a/lib/netdev-dpdk.c >> +++ b/lib/netdev-dpdk.c >> @@ -743,6 +743,7 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned >> int n_txq, >> { >> struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >> int err = 0; >> +int old_rxq, old_txq; >> >> if (netdev->up.n_txq == n_txq && netdev->up.n_rxq == n_rxq) { >> return err; >> @@ -753,12 +754,20 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned >> int n_txq, >> >> rte_eth_dev_stop(netdev->port_id); >> >> +old_txq = netdev->up.n_txq; >> +old_rxq = netdev->up.n_rxq; >> netdev->up.n_txq = n_txq; >> netdev->up.n_rxq = n_rxq; >> >> rte_free(netdev->tx_q); >> err = dpdk_eth_dev_init(netdev); >> netdev_dpdk_alloc_txq(netdev, netdev->real_n_txq); >> +if (err) { >> +/* If there has been an error, it means that the requested queues >> + * have not been created. Restore the old numbers. */ >> +netdev->up.n_txq = old_txq; >> +netdev->up.n_rxq = old_rxq; > > I had thought that we should restore the previous netdev->tx_q but at present > txq's are fixed, so I think it is fine. If txq's become configurable we can > change. > > It would be good to get these patches into OVS2.4 branch if there is still > time? > > Acked-by: Kevin Traynor > >> +} >> >> netdev->txq_needs_locking = netdev->real_n_txq != netdev->up.n_txq; >> >> -- >> 2.1.4 >> >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] netdev-dpdk: Restore txq/rxq number if initialization fails.
Would you please summarize the errors on list? Daniele should probably have a look at it since he wrote the patch originally. Ethan On Thu, Jul 23, 2015 at 4:27 PM, Luis E Pena wrote: > I ran into errors when using master when adding a dpdk port. I worked with > Pravin yesterday and we think that the cause is this patch. > When we tried branch-2.4, we ran into no errors. > Today I am working on confirming that this is the patch. > > Luis E. P. > > Sent from my iPhone > >> On Jul 23, 2015, at 15:38, Ethan Jackson wrote: >> >> Ben, Justin, should this be backported? I'm not up on the policy at the >> moment. >> >> Etha >> >>> On Thu, Jul 23, 2015 at 3:42 AM, Traynor, Kevin >>> wrote: >>> >>>> -Original Message- >>>> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di >>>> Proietto >>>> Sent: Thursday, July 16, 2015 7:48 PM >>>> To: dev@openvswitch.org >>>> Subject: [ovs-dev] [PATCH 1/2] netdev-dpdk: Restore txq/rxq number if >>>> initialization fails. >>>> >>>> netdev_dpdk_set_multiq() should not set the number of configured rxq >>>> and txq if the driver initialization fails (meaning that the driver >>>> failed to setup the queues). Otherwise, on a subsequent call to >>>> netdev_dpdk_set_multiq(), the code may believe that the queues have >>>> already been setup and there's no work to be done. >>>> >>>> This commit fixes the problem by restoring the old values if >>>> dpdk_eth_dev_init() fails. >>>> >>>> Reported-by: Ian Stokes >>>> Signed-off-by: Daniele Di Proietto >>>> --- >>>> lib/netdev-dpdk.c | 9 + >>>> 1 file changed, 9 insertions(+) >>>> >>>> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c >>>> index 8b843db..5ae805e 100644 >>>> --- a/lib/netdev-dpdk.c >>>> +++ b/lib/netdev-dpdk.c >>>> @@ -743,6 +743,7 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, unsigned >>>> int n_txq, >>>> { >>>> struct netdev_dpdk *netdev = netdev_dpdk_cast(netdev_); >>>> int err = 0; >>>> +int old_rxq, old_txq; >>>> >>>> if (netdev->up.n_txq == n_txq && netdev->up.n_rxq == n_rxq) { >>>> return err; >>>> @@ -753,12 +754,20 @@ netdev_dpdk_set_multiq(struct netdev *netdev_, >>>> unsigned >>>> int n_txq, >>>> >>>> rte_eth_dev_stop(netdev->port_id); >>>> >>>> +old_txq = netdev->up.n_txq; >>>> +old_rxq = netdev->up.n_rxq; >>>> netdev->up.n_txq = n_txq; >>>> netdev->up.n_rxq = n_rxq; >>>> >>>> rte_free(netdev->tx_q); >>>> err = dpdk_eth_dev_init(netdev); >>>> netdev_dpdk_alloc_txq(netdev, netdev->real_n_txq); >>>> +if (err) { >>>> +/* If there has been an error, it means that the requested queues >>>> + * have not been created. Restore the old numbers. */ >>>> +netdev->up.n_txq = old_txq; >>>> +netdev->up.n_rxq = old_rxq; >>> >>> I had thought that we should restore the previous netdev->tx_q but at >>> present >>> txq's are fixed, so I think it is fine. If txq's become configurable we can >>> change. >>> >>> It would be good to get these patches into OVS2.4 branch if there is still >>> time? >>> >>> Acked-by: Kevin Traynor >>> >>>> +} >>>> >>>> netdev->txq_needs_locking = netdev->real_n_txq != netdev->up.n_txq; >>>> >>>> -- >>>> 2.1.4 >>>> >>>> ___ >>>> dev mailing list >>>> dev@openvswitch.org >>>> https://urldefense.proofpoint.com/v2/url?u=http-3A__openvswitch.org_mailman_listinfo_dev&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=vLBOU1QC09fNCgzE9c0HcQ&m=BYGI54GsZwT1uHiylyB910omvTxfD_EkW_A8R8jN_5M&s=3CQhoT6BwcZjI5qd5549pOjn6pDX3E3MsUbJc8QKufg&e= >>> ___ >>> dev mailing list >>> dev@openvswitch.org >>> https://urldefense.proofpoint.com/v2/url?u=http-3A__openvswitch.org_mailman_listinfo_dev&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=vLBOU1QC09fNCgzE9c0HcQ&m=BYGI54GsZwT1uHiylyB910omvTxfD_EkW_A8R8jN_5M&s=3CQhoT6BwcZjI5qd5549pOjn6pDX3E3MsUbJc8QKufg&e= >> ___ >> dev mailing list >> dev@openvswitch.org >> https://urldefense.proofpoint.com/v2/url?u=http-3A__openvswitch.org_mailman_listinfo_dev&d=BQIGaQ&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=vLBOU1QC09fNCgzE9c0HcQ&m=BYGI54GsZwT1uHiylyB910omvTxfD_EkW_A8R8jN_5M&s=3CQhoT6BwcZjI5qd5549pOjn6pDX3E3MsUbJc8QKufg&e= ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ovn: Fix broken build.
Ah my bad sorry. Acked-by: Ethan Jackson On Tue, Jul 28, 2015 at 6:53 PM, Ben Pfaff wrote: > Fixes the following error: > > The following files are in git but not the distribution: > ovn/OVN-GW-HA.md > > CC: Ethan Jackson > Signed-off-by: Ben Pfaff > --- > ovn/automake.mk | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/ovn/automake.mk b/ovn/automake.mk > index 459ee36..1170a0b 100644 > --- a/ovn/automake.mk > +++ b/ovn/automake.mk > @@ -72,7 +72,8 @@ DISTCLEANFILES += ovn/ovn-nbctl.8 ovn/ovn-architecture.7 > > EXTRA_DIST += \ > ovn/TODO \ > - ovn/CONTAINERS.OpenStack.md > + ovn/CONTAINERS.OpenStack.md \ > + ovn/OVN-GW-HA.md > > # ovn-nbctl > bin_PROGRAMS += ovn/ovn-nbctl > -- > 2.1.3 > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] netdev-dpdk: add support for rings in secondary processes in IVSHMEM setups
> However, there is a bit of a problem with this model as it currently stands > which is > kind of independent of this patch and was suggested by Daniel above. Freeing > mbufs in the guest may cause packet corruption. This was an issue with DPDK > (I think > it still is). When returning an mbuf to the mempool, DPDK will first try to > return the > mbuf to a per-lcore cache. The per-lcore cache will get periodically flushed > to the > central mempool. The per-lcore cache is not threadsafe but the central > mempool is. > This is not a major issue when everything is running on the host but when you > run a > dpdk application in the guest you potentially have two DPDK threads with an > lcore of 0 > (one in the host and one in the guest!). You can hit a situation whereby both > threads > simultaneously try to access the per-lcore cache for lcore 0 and corrupt it. Is this just an issue if two cores share lcore 0, or would the same thing happen if two cores have lcore 1? Would a hacky solution be to just somehow detect overlapping lcore ids between the guest and the host, and avoid them? > When we did something similar previously, we approached it differently by > adding ivshmem support directly to DPDK (there is an ivshmem target). Using > this, you > are able to select which dpdk objects you would like to share to the guest > from > the host. For example, you would say "I want to share this ring and this > mempool". > Then, the DPDK process running in the guest would automatically detect that it > was running in a virtualized environment with ivshmem (it would detect the > ivshmem > pci device) and allow the guest application to get those objects. In this > model, you have > better security (you don’t need to share an entire page - you can just share > objects), you > can run a primary process in the guest (even if the host process is a primary > process). > However, you still have the problem with per-lcore cache corruption described > above. > The way you can resolve this is : > a) don’t ever free (or allocate) packets in the guest This doesn't seem feasible in general. It's somewhat of an odd requirement that would be hard to enforce I suspect. > b) create a free and alloc ring that gets handled in the host. If the guest > needs an mbuf, > it would request one from the alloc ring. If it needs to free an mbuf, it > would push it > to the free ring. This seems like a lot of work. Do we really care that much about IVSHMEM? My impression was that we're more-or-less standardizing around vhost-user. My inclination would be to drop the patch instead of implementing this logic unless there's consensus that IVSHMEM is going to be an relevant IO mechanism in the future. > However, maybe for your use-case this isn’t an issue? Yeah for our (Melvin + Me) specific use case, we'll just do (a) and not free or allocate packets, shouldn't be too hard for us. Ethan > >> > >> >Signed-off-by: Melvin Walls >> >Signed-off-by: Ethan Jackson >> >--- >> > lib/netdev-dpdk.c | 158 >> >+- >> > 1 file changed, 122 insertions(+), 36 deletions(-) >> > >> >diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c index >> >5ae805e..5abe90f 100644 >> >--- a/lib/netdev-dpdk.c >> >+++ b/lib/netdev-dpdk.c >> >@@ -227,6 +227,10 @@ struct netdev_dpdk { >> > /* Identifier used to distinguish vhost devices from each other */ >> > char vhost_id[PATH_MAX]; >> > >> >+/* Rings for secondary processes in IVSHMEM setups, NULL otherwise >> */ >> >+struct rte_ring *rx_ring; >> >+struct rte_ring *tx_ring; >> >+ >> > /* In dpdk_list. */ >> > struct ovs_list list_node OVS_GUARDED_BY(dpdk_mutex); }; @@ >> >-340,12 +344,16 @@ dpdk_mp_get(int socket_id, int mtu) >> >OVS_REQUIRES(dpdk_mutex) >> > return NULL; >> > } >> > >> >-dmp->mp = rte_mempool_create(mp_name, mp_size, >> MBUF_SIZE(mtu), >> >- MP_CACHE_SZ, >> >- sizeof(struct >> >rte_pktmbuf_pool_private), >> >- rte_pktmbuf_pool_init, NULL, >> >- ovs_rte_pktmbuf_init, NULL, >> >- socket_id, 0); >> >+if (rte_eal_process_type() == RTE_PROC_PRIMARY) { >> >+dmp->mp = rte_mempool_create(mp_name, mp_size, >> >MBUF_SIZE(mtu), >> >+
Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup until we don't get any failure.
Sorry for taking so long to get to this. The one question I have is: Is OVS the right layer to be fixing this? Isn't this really an issue of DPDK reporting a number of available queues that for practical purposes is wrong? I.E. Shouldn't this be fixed by the DPDK driver of this system? This patch feels like a hack to me . . . Ethan On Tue, Jul 28, 2015 at 2:36 AM, Stokes, Ian wrote: > Hi all, > > Just wondering what the status of this patch is? Is there any feedback > or queries we can answer to help? > > Thanks > Ian > >> -Original Message- >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Traynor, >> Kevin >> Sent: Thursday, July 23, 2015 11:28 AM >> To: Daniele Di Proietto; dev@openvswitch.org >> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup >> until we don't get any failure. >> >> >> > -Original Message- >> > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele Di >> > Proietto >> > Sent: Thursday, July 16, 2015 7:48 PM >> > To: dev@openvswitch.org >> > Subject: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup >> until we >> > don't get any failure. >> > >> > It has been observed that some DPDK device (e.g intel xl710) report an >> > high number of queues but make some of them available only for special >> > functions (SRIOV). Therefore the queues will be counted in >> > rte_eth_dev_info_get(), but rte_eth_tx_queue_setup() will fail. >> > >> > This commit works around the issue by retrying the device >> initialization >> > with a smaller number of queues, if a queue fails to setup. >> > >> > Reported-by: Ian Stokes >> > Signed-off-by: Daniele Di Proietto >> > --- >> > lib/netdev-dpdk.c | 100 +++-- >> --- >> > -- >> > 1 file changed, 73 insertions(+), 27 deletions(-) >> >> >> Acked-by: Kevin Traynor >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup until we don't get any failure.
I personally am fine with waiting on this being fixed until December. My reading is, it only happens on one specific NIC, and even then only if you devote a huge number of cores to forwarding on that NIC. That said, I won't block this if another committer disagrees with me. Ben Justin Pravin? What do you think? Ethan On Thu, Jul 30, 2015 at 1:13 AM, Stokes, Ian wrote: > So ideally this will be fixed in a future release of DPDK. We have flagged > this. However that solution will not be in place until the DPDK 2.2 release > in December at the earliest (DPDK 2.1 is currently in release candidate mode > at the moment so it won't make it to that). When this has been changed in > DPDK we can revisit the OVS code. > > Technically DPDK is doing what it is supposed to with the current > implementation i.e. it is returning the max number of queues it supports. > From the OVS side I think we need to understand that this has a different > connotation to what it had with previously with NICS in terms of how many of > those queues are usable. > > Unfortunately I don’t see another way to negotiate the tx queue > initialization without something like the patch below. > Not until we have more explicit configuration details available for the HW > device from DPDK. > > Thanks > Ian > >> -Original Message- >> From: Ethan Jackson [mailto:et...@nicira.com] >> Sent: Wednesday, July 29, 2015 10:13 PM >> To: Stokes, Ian >> Cc: Traynor, Kevin; Daniele Di Proietto; dev@openvswitch.org >> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup >> until we don't get any failure. >> >> Sorry for taking so long to get to this. The one question I have is: >> Is OVS the right layer to be fixing this? Isn't this really an issue >> of DPDK reporting a number of available queues that for practical >> purposes is wrong? I.E. Shouldn't this be fixed by the DPDK driver of >> this system? This patch feels like a hack to me . . . >> >> Ethan >> >> On Tue, Jul 28, 2015 at 2:36 AM, Stokes, Ian >> wrote: >> > Hi all, >> > >> > Just wondering what the status of this patch is? Is there any feedback >> > or queries we can answer to help? >> > >> > Thanks >> > Ian >> > >> >> -Original Message- >> >> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Traynor, >> >> Kevin >> >> Sent: Thursday, July 23, 2015 11:28 AM >> >> To: Daniele Di Proietto; dev@openvswitch.org >> >> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue >> setup >> >> until we don't get any failure. >> >> >> >> >> >> > -Original Message- >> >> > From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Daniele >> Di >> >> > Proietto >> >> > Sent: Thursday, July 16, 2015 7:48 PM >> >> > To: dev@openvswitch.org >> >> > Subject: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup >> >> until we >> >> > don't get any failure. >> >> > >> >> > It has been observed that some DPDK device (e.g intel xl710) report >> an >> >> > high number of queues but make some of them available only for >> special >> >> > functions (SRIOV). Therefore the queues will be counted in >> >> > rte_eth_dev_info_get(), but rte_eth_tx_queue_setup() will fail. >> >> > >> >> > This commit works around the issue by retrying the device >> >> initialization >> >> > with a smaller number of queues, if a queue fails to setup. >> >> > >> >> > Reported-by: Ian Stokes >> >> > Signed-off-by: Daniele Di Proietto >> >> > --- >> >> > lib/netdev-dpdk.c | 100 +++--- >> --- >> >> --- >> >> > -- >> >> > 1 file changed, 73 insertions(+), 27 deletions(-) >> >> >> >> >> >> Acked-by: Kevin Traynor >> >> ___ >> >> dev mailing list >> >> dev@openvswitch.org >> >> http://openvswitch.org/mailman/listinfo/dev >> > ___ >> > dev mailing list >> > dev@openvswitch.org >> > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v3] dpif-netdev: fix race for queues between pmd threads
Hey, Really sorry this has taken so long to merge. I dropped the ball. It's in now. Ethan On Tue, Aug 4, 2015 at 5:26 AM, Ilya Maximets wrote: > Will anyone plan to apply this patch? > > Best regards, Ilya Maximets. > > On 28.07.2015 23:48, Flavio Leitner wrote: >> On Tue, Jul 28, 2015 at 09:55:52AM +0300, Ilya Maximets wrote: >>> Currently pmd threads select queues in pmd_load_queues() according to >>> get_n_pmd_threads_on_numa(). This behavior leads to race between pmds, >>> beacause dp_netdev_set_pmds_on_numa() starts them one by one and >>> current number of threads changes incrementally. >>> >>> As a result we may have the following situation with 2 pmd threads: >>> >>> * dp_netdev_set_pmds_on_numa() >>> * pmd12 thread started. Currently only 1 pmd thread exists. >>> dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_1' >>> dpif_netdev(pmd12)|INFO|Core 1 processing port 'port_2' >>> * pmd14 thread started. 2 pmd threads exists. >>> dpif_netdev|INFO|Created 2 pmd threads on numa node 0 >>> dpif_netdev(pmd14)|INFO|Core 2 processing port 'port_2' >>> >>> We have: >>> core 1 --> port 1, port 2 >>> core 2 --> port 2 >>> >>> Fix this by starting pmd threads only after all of them have >>> been configured. >>> >>> Cc: Daniele Di Proietto >>> Cc: Dyasly Sergey >>> Signed-off-by: Ilya Maximets >>> --- >> >> >> Looks good to me. >> Acked-by: Flavio Leitner >> >> >> > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup until we don't get any failure.
Sure let's do that. If you write an INSTALL.DPDK note, and post it here. I'll incorporate it into the patch and merge. Thanks for your patience. Ethan On Tue, Aug 4, 2015 at 4:48 AM, Stokes, Ian wrote: > This issue currently affects Fortville cards on systems with a logical core > count greater than 64. > > Just to call out however that the number of cores setup by default to forward > traffic to a NIC using OVS with DPDK will be the total number of cores > detected on the system. This is the default behavior, thus there is no way > user can "devote" less cores to forwarding on the NIC. > > This means they will always hit this issue when using a system with 72 > logical cores and a Fortville card. There is no way they can work around the > issue with tx configuration changes. > > If this preferable then I can write up a note for theINSTALL.DPDK.md > explaining the issue. > > Thanks > Ian > >> -Original Message- >> From: Ben Pfaff [mailto:b...@nicira.com] >> Sent: Monday, August 03, 2015 6:11 PM >> To: Ethan Jackson >> Cc: Stokes, Ian; Justin Pettit; Pravin Shelar; Traynor, Kevin; Daniele >> Di Proietto; dev@openvswitch.org >> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup >> until we don't get any failure. >> >> If that's an accurate description of the problem then that seems fine to >> me. We could add a note to INSTALL.DPDK.md describing the issue I >> suppose, if someone wants to write one up. >> >> On Sat, Aug 01, 2015 at 02:40:16PM -0700, Ethan Jackson wrote: >> > I personally am fine with waiting on this being fixed until December. >> > My reading is, it only happens on one specific NIC, and even then only >> > if you devote a huge number of cores to forwarding on that NIC. >> > >> > That said, I won't block this if another committer disagrees with me. >> > Ben Justin Pravin? What do you think? >> > >> > Ethan >> > >> > On Thu, Jul 30, 2015 at 1:13 AM, Stokes, Ian >> wrote: >> > > So ideally this will be fixed in a future release of DPDK. We have >> flagged this. However that solution will not be in place until the DPDK >> 2.2 release in December at the earliest (DPDK 2.1 is currently in >> release candidate mode at the moment so it won't make it to that). When >> this has been changed in DPDK we can revisit the OVS code. >> > > >> > > Technically DPDK is doing what it is supposed to with the current >> implementation i.e. it is returning the max number of queues it >> supports. From the OVS side I think we need to understand that this has >> a different connotation to what it had with previously with NICS in >> terms of how many of those queues are usable. >> > > >> > > Unfortunately I don’t see another way to negotiate the tx queue >> initialization without something like the patch below. >> > > Not until we have more explicit configuration details available for >> the HW device from DPDK. >> > > >> > > Thanks >> > > Ian >> > > >> > >> -Original Message- >> > >> From: Ethan Jackson [mailto:et...@nicira.com] >> > >> Sent: Wednesday, July 29, 2015 10:13 PM >> > >> To: Stokes, Ian >> > >> Cc: Traynor, Kevin; Daniele Di Proietto; dev@openvswitch.org >> > >> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue >> setup >> > >> until we don't get any failure. >> > >> >> > >> Sorry for taking so long to get to this. The one question I have >> is: >> > >> Is OVS the right layer to be fixing this? Isn't this really an >> issue >> > >> of DPDK reporting a number of available queues that for practical >> > >> purposes is wrong? I.E. Shouldn't this be fixed by the DPDK driver >> of >> > >> this system? This patch feels like a hack to me . . . >> > >> >> > >> Ethan >> > >> >> > >> On Tue, Jul 28, 2015 at 2:36 AM, Stokes, Ian >> > >> wrote: >> > >> > Hi all, >> > >> > >> > >> > Just wondering what the status of this patch is? Is there any >> feedback >> > >> > or queries we can answer to help? >> > >> > >> > >> > Thanks >> > >> > Ian >> > >> > >> > >> >> -Original Message- >> > >> >> From:
[ovs-dev] [PATCH] ofproto: Allow in-place modifications of datapath flows.
There are certain use cases (such as bond rebalancing) where a datapath flow's actions may change, while it's wildcard pattern remains the same. Before this patch, revalidators would note the change, delete the flow, and wait for the handlers to install an updated version. This is inefficient, as many packets could get punted to userspace before the new flow is finally installed. To improve the situation, this patch implements in place modification of datapath flows. If the revalidators detect the only change to a given ukey is its actions, instead of deleting it, it does a put with the MODIFY flag set. Signed-off-by: Ethan Jackson --- ofproto/ofproto-dpif-upcall.c | 118 +++--- 1 file changed, 87 insertions(+), 31 deletions(-) diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c index 59010c2..7abc97d 100644 --- a/ofproto/ofproto-dpif-upcall.c +++ b/ofproto/ofproto-dpif-upcall.c @@ -150,6 +150,12 @@ enum upcall_type { IPFIX_UPCALL/* Per-bridge sampling. */ }; +enum reval_result { +UKEY_KEEP, +UKEY_DELETE, +UKEY_MODIFY +}; + struct upcall { struct ofproto_dpif *ofproto; /* Parent ofproto. */ const struct recirc_id_node *recirc; /* Recirculation context. */ @@ -1663,33 +1669,41 @@ should_revalidate(const struct udpif *udpif, uint64_t packets, return false; } -static bool +/* Verifies that the datapath actions of 'ukey' are still correct, and pushes + * 'stats' for it. + * + * Returns a recommended action for 'ukey', options include: UKEY_DELETE, + * meaning the ukey should be deleted. UKEY_KEEP, meaning the ukey is fine as + * is. Or UKEY_MODIFY, meaning the ukey's actions should be changed but is + * otherwise fine. If UKEY_MODIFY, is returned, callers should change the + * ukey's actions to those found in the caller supplied 'odp_actions' buffer. + * */ +static enum reval_result revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, -const struct dpif_flow_stats *stats, uint64_t reval_seq) +const struct dpif_flow_stats *stats, +struct ofpbuf *odp_actions, uint64_t reval_seq) OVS_REQUIRES(ukey->mutex) { -uint64_t odp_actions_stub[1024 / 8]; -struct ofpbuf odp_actions = OFPBUF_STUB_INITIALIZER(odp_actions_stub); - struct xlate_out xout, *xoutp; struct netflow *netflow; struct ofproto_dpif *ofproto; struct dpif_flow_stats push; struct flow flow, dp_mask; struct flow_wildcards wc; +enum reval_result result; uint64_t *dp64, *xout64; ofp_port_t ofp_in_port; struct xlate_in xin; long long int last_used; int error; size_t i; -bool ok; bool need_revalidate; -ok = false; +result = UKEY_DELETE; xoutp = NULL; netflow = NULL; +ofpbuf_clear(odp_actions); need_revalidate = (ukey->reval_seq != reval_seq); last_used = ukey->stats.used; push.used = stats->used; @@ -1703,20 +1717,19 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, if (need_revalidate && last_used && !should_revalidate(udpif, push.n_packets, last_used)) { -ok = false; goto exit; } /* We will push the stats, so update the ukey stats cache. */ ukey->stats = *stats; if (!push.n_packets && !need_revalidate) { -ok = true; +result = UKEY_KEEP; goto exit; } if (ukey->xcache && !need_revalidate) { xlate_push_stats(ukey->xcache, &push); -ok = true; +result = UKEY_KEEP; goto exit; } @@ -1739,7 +1752,7 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, } xlate_in_init(&xin, ofproto, &flow, ofp_in_port, NULL, push.tcp_flags, - NULL, need_revalidate ? &wc : NULL, &odp_actions); + NULL, need_revalidate ? &wc : NULL, odp_actions); if (push.n_packets) { xin.resubmit_stats = &push; xin.may_learn = true; @@ -1749,18 +1762,14 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, xoutp = &xout; if (!need_revalidate) { -ok = true; +result = UKEY_KEEP; goto exit; } if (xout.slow) { -ofpbuf_clear(&odp_actions); +ofpbuf_clear(odp_actions); compose_slow_path(udpif, &xout, &flow, flow.in_port.odp_port, - &odp_actions); -} - -if (!ofpbuf_equal(&odp_actions, ukey->actions)) { -goto exit; + odp_actions); } if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, ukey->key, @@ -1781,18 +1790,24 @@ revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, } } -ok = true; +if (!ofpbuf_equal(
Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup until we don't get any failure.
Great, I'll merge both today. Thanks a lot, Ethan On Thu, Aug 6, 2015 at 9:00 AM, Stokes, Ian wrote: > No problem, > > I've sent a patch to the mailing list with these changes for INSTALL.DPDK.md. > > Thanks > Ian > >> -Original Message- >> From: Ethan Jackson [mailto:et...@nicira.com] >> Sent: Tuesday, August 04, 2015 10:46 PM >> To: Stokes, Ian >> Cc: Ben Pfaff; Justin Pettit; Pravin Shelar; Traynor, Kevin; Daniele Di >> Proietto; dev@openvswitch.org >> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup >> until we don't get any failure. >> >> Sure let's do that. If you write an INSTALL.DPDK note, and post it >> here. I'll incorporate it into the patch and merge. Thanks for your >> patience. >> >> Ethan >> >> On Tue, Aug 4, 2015 at 4:48 AM, Stokes, Ian >> wrote: >> > This issue currently affects Fortville cards on systems with a logical >> core count greater than 64. >> > >> > Just to call out however that the number of cores setup by default to >> forward traffic to a NIC using OVS with DPDK will be the total number of >> cores detected on the system. This is the default behavior, thus there >> is no way user can "devote" less cores to forwarding on the NIC. >> > >> > This means they will always hit this issue when using a system with 72 >> logical cores and a Fortville card. There is no way they can work around >> the issue with tx configuration changes. >> > >> > If this preferable then I can write up a note for theINSTALL.DPDK.md >> explaining the issue. >> > >> > Thanks >> > Ian >> > >> >> -Original Message- >> >> From: Ben Pfaff [mailto:b...@nicira.com] >> >> Sent: Monday, August 03, 2015 6:11 PM >> >> To: Ethan Jackson >> >> Cc: Stokes, Ian; Justin Pettit; Pravin Shelar; Traynor, Kevin; >> Daniele >> >> Di Proietto; dev@openvswitch.org >> >> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue >> setup >> >> until we don't get any failure. >> >> >> >> If that's an accurate description of the problem then that seems fine >> to >> >> me. We could add a note to INSTALL.DPDK.md describing the issue I >> >> suppose, if someone wants to write one up. >> >> >> >> On Sat, Aug 01, 2015 at 02:40:16PM -0700, Ethan Jackson wrote: >> >> > I personally am fine with waiting on this being fixed until >> December. >> >> > My reading is, it only happens on one specific NIC, and even then >> only >> >> > if you devote a huge number of cores to forwarding on that NIC. >> >> > >> >> > That said, I won't block this if another committer disagrees with >> me. >> >> > Ben Justin Pravin? What do you think? >> >> > >> >> > Ethan >> >> > >> >> > On Thu, Jul 30, 2015 at 1:13 AM, Stokes, Ian >> >> wrote: >> >> > > So ideally this will be fixed in a future release of DPDK. We >> have >> >> flagged this. However that solution will not be in place until the >> DPDK >> >> 2.2 release in December at the earliest (DPDK 2.1 is currently in >> >> release candidate mode at the moment so it won't make it to that). >> When >> >> this has been changed in DPDK we can revisit the OVS code. >> >> > > >> >> > > Technically DPDK is doing what it is supposed to with the current >> >> implementation i.e. it is returning the max number of queues it >> >> supports. From the OVS side I think we need to understand that this >> has >> >> a different connotation to what it had with previously with NICS in >> >> terms of how many of those queues are usable. >> >> > > >> >> > > Unfortunately I don’t see another way to negotiate the tx queue >> >> initialization without something like the patch below. >> >> > > Not until we have more explicit configuration details available >> for >> >> the HW device from DPDK. >> >> > > >> >> > > Thanks >> >> > > Ian >> >> > > >> >> > >> -Original Message- >> >> > >> From: Ethan Jackson [mailto:et...@nicira.com] >> >> > >> Sent: Wednesday, July 29, 2015 10:13 PM >> >> > >>
Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup until we don't get any failure.
Merged and backported to 2.4. Thanks Ethan On Thu, Aug 6, 2015 at 11:23 AM, Ethan Jackson wrote: > Great, I'll merge both today. > > Thanks a lot, > Ethan > > On Thu, Aug 6, 2015 at 9:00 AM, Stokes, Ian wrote: >> No problem, >> >> I've sent a patch to the mailing list with these changes for INSTALL.DPDK.md. >> >> Thanks >> Ian >> >>> -Original Message- >>> From: Ethan Jackson [mailto:et...@nicira.com] >>> Sent: Tuesday, August 04, 2015 10:46 PM >>> To: Stokes, Ian >>> Cc: Ben Pfaff; Justin Pettit; Pravin Shelar; Traynor, Kevin; Daniele Di >>> Proietto; dev@openvswitch.org >>> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue setup >>> until we don't get any failure. >>> >>> Sure let's do that. If you write an INSTALL.DPDK note, and post it >>> here. I'll incorporate it into the patch and merge. Thanks for your >>> patience. >>> >>> Ethan >>> >>> On Tue, Aug 4, 2015 at 4:48 AM, Stokes, Ian >>> wrote: >>> > This issue currently affects Fortville cards on systems with a logical >>> core count greater than 64. >>> > >>> > Just to call out however that the number of cores setup by default to >>> forward traffic to a NIC using OVS with DPDK will be the total number of >>> cores detected on the system. This is the default behavior, thus there >>> is no way user can "devote" less cores to forwarding on the NIC. >>> > >>> > This means they will always hit this issue when using a system with 72 >>> logical cores and a Fortville card. There is no way they can work around >>> the issue with tx configuration changes. >>> > >>> > If this preferable then I can write up a note for theINSTALL.DPDK.md >>> explaining the issue. >>> > >>> > Thanks >>> > Ian >>> > >>> >> -Original Message- >>> >> From: Ben Pfaff [mailto:b...@nicira.com] >>> >> Sent: Monday, August 03, 2015 6:11 PM >>> >> To: Ethan Jackson >>> >> Cc: Stokes, Ian; Justin Pettit; Pravin Shelar; Traynor, Kevin; >>> Daniele >>> >> Di Proietto; dev@openvswitch.org >>> >> Subject: Re: [ovs-dev] [PATCH 2/2] netdev-dpdk: Retry tx/rx queue >>> setup >>> >> until we don't get any failure. >>> >> >>> >> If that's an accurate description of the problem then that seems fine >>> to >>> >> me. We could add a note to INSTALL.DPDK.md describing the issue I >>> >> suppose, if someone wants to write one up. >>> >> >>> >> On Sat, Aug 01, 2015 at 02:40:16PM -0700, Ethan Jackson wrote: >>> >> > I personally am fine with waiting on this being fixed until >>> December. >>> >> > My reading is, it only happens on one specific NIC, and even then >>> only >>> >> > if you devote a huge number of cores to forwarding on that NIC. >>> >> > >>> >> > That said, I won't block this if another committer disagrees with >>> me. >>> >> > Ben Justin Pravin? What do you think? >>> >> > >>> >> > Ethan >>> >> > >>> >> > On Thu, Jul 30, 2015 at 1:13 AM, Stokes, Ian >>> >> wrote: >>> >> > > So ideally this will be fixed in a future release of DPDK. We >>> have >>> >> flagged this. However that solution will not be in place until the >>> DPDK >>> >> 2.2 release in December at the earliest (DPDK 2.1 is currently in >>> >> release candidate mode at the moment so it won't make it to that). >>> When >>> >> this has been changed in DPDK we can revisit the OVS code. >>> >> > > >>> >> > > Technically DPDK is doing what it is supposed to with the current >>> >> implementation i.e. it is returning the max number of queues it >>> >> supports. From the OVS side I think we need to understand that this >>> has >>> >> a different connotation to what it had with previously with NICS in >>> >> terms of how many of those queues are usable. >>> >> > > >>> >> > > Unfortunately I don’t see another way to negotiate the tx queue >>> >> initialization without something like the patch below. >>> >&
Re: [ovs-dev] [PATCH] ofproto: Allow in-place modifications of datapath flows.
Your call either way. I don't feel strongly. Ethan On Thu, Aug 6, 2015 at 1:07 PM, Justin Pettit wrote: > This is cleaner than I was expecting. Do you think we want to backport it to > 2.3? > > --Justin > > >> On Aug 6, 2015, at 12:30 PM, Jarno Rajahalme wrote: >> >> Ethan, >> >> This ended up being a very clean patch, thank you! >> >> IMO this is good to go, but I’d like to see a test case that shows the >> modify operation in the logs to prevent regression in the future. Maybe work >> off some of the bond rebalance test cases to prove that modify is used in >> the case we currently know suffers from the lack of modify support? >> >> Also, there are some minor comments below you could address at the same time. >> >> Jarno >> >>> On Aug 5, 2015, at 4:07 PM, Ethan Jackson wrote: >>> >>> There are certain use cases (such as bond rebalancing) where a >>> datapath flow's actions may change, while it's wildcard pattern >>> remains the same. Before this patch, revalidators would note the >>> change, delete the flow, and wait for the handlers to install an >>> updated version. This is inefficient, as many packets could get >>> punted to userspace before the new flow is finally installed. >>> >>> To improve the situation, this patch implements in place modification >>> of datapath flows. If the revalidators detect the only change to a >>> given ukey is its actions, instead of deleting it, it does a put with >>> the MODIFY flag set. >>> >>> Signed-off-by: Ethan Jackson >>> --- >>> ofproto/ofproto-dpif-upcall.c | 118 >>> +++--- >>> 1 file changed, 87 insertions(+), 31 deletions(-) >>> >>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c >>> index 59010c2..7abc97d 100644 >>> --- a/ofproto/ofproto-dpif-upcall.c >>> +++ b/ofproto/ofproto-dpif-upcall.c >>> @@ -150,6 +150,12 @@ enum upcall_type { >>>IPFIX_UPCALL/* Per-bridge sampling. */ >>> }; >>> >>> +enum reval_result { >>> +UKEY_KEEP, >>> +UKEY_DELETE, >>> +UKEY_MODIFY >>> +}; >>> + >>> struct upcall { >>>struct ofproto_dpif *ofproto; /* Parent ofproto. */ >>>const struct recirc_id_node *recirc; /* Recirculation context. */ >>> @@ -1663,33 +1669,41 @@ should_revalidate(const struct udpif *udpif, >>> uint64_t packets, >>>return false; >>> } >>> >>> -static bool >>> +/* Verifies that the datapath actions of 'ukey' are still correct, and >>> pushes >>> + * 'stats' for it. >>> + * >>> + * Returns a recommended action for 'ukey', options include: UKEY_DELETE, >>> + * meaning the ukey should be deleted. UKEY_KEEP, meaning the ukey is >>> fine as >>> + * is. Or UKEY_MODIFY, meaning the ukey's actions should be changed but is >>> + * otherwise fine. If UKEY_MODIFY, is returned, callers should change the >>> + * ukey's actions to those found in the caller supplied 'odp_actions' >>> buffer. >>> + * */ >> >> This would be easier to read if you reformat this with each option listed in >> it’s own line, and drop the repeating “meaning the”. >> >>> +static enum reval_result >>> revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey, >>> -const struct dpif_flow_stats *stats, uint64_t reval_seq) >>> +const struct dpif_flow_stats *stats, >>> +struct ofpbuf *odp_actions, uint64_t reval_seq) >>>OVS_REQUIRES(ukey->mutex) >>> { >>> -uint64_t odp_actions_stub[1024 / 8]; >>> -struct ofpbuf odp_actions = OFPBUF_STUB_INITIALIZER(odp_actions_stub); >>> - >>>struct xlate_out xout, *xoutp; >>>struct netflow *netflow; >>>struct ofproto_dpif *ofproto; >>>struct dpif_flow_stats push; >>>struct flow flow, dp_mask; >>>struct flow_wildcards wc; >>> +enum reval_result result; >>>uint64_t *dp64, *xout64; >>>ofp_port_t ofp_in_port; >>>struct xlate_in xin; >>>long long int last_used; >>>int error; >>>size_t i; >>> -bool ok; >>>bool need_revalidate; >>> >>> -ok = false; >>> +result = UKEY_DELETE; >>>
Re: [ovs-dev] [PATCH v2 1/2] upcall: Fix minor race when deleting ukeys.
Yeah sorry this patch is not great. That second problem is solved in the follow on. I'd be somewhat inclined to drop this one all together. Ethan On Tue, Aug 11, 2015 at 11:15 AM, Jarno Rajahalme wrote: > >> On Aug 11, 2015, at 10:42 AM, Jarno Rajahalme wrote: >> >> >>> On Aug 10, 2015, at 6:46 PM, Ethan J. Jackson wrote: >>> >>> From: Ethan Jackson >>> >>> Since revalidator_sweep() doesn't hold the ukey mutex for each full >>> loop iteration, it's theoretically possible that two threads may >>> call ukey_delete() on the same ukey. If this happens, they both will >>> attempt to remove the ukey from he cmap, causing the loser of the race >>> to fail. >>> >>> Signed-off-by: Ethan J. Jackson >>> --- >>> ofproto/ofproto-dpif-upcall.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c >>> index 0f2e186..fddb535 100644 >>> --- a/ofproto/ofproto-dpif-upcall.c >>> +++ b/ofproto/ofproto-dpif-upcall.c >>> @@ -2076,7 +2076,6 @@ revalidator_sweep__(struct revalidator *revalidator, >>> bool purge) >>>flow_exists = ukey->flow_exists; >>>seq_mismatch = (ukey->dump_seq != dump_seq >>>&& ukey->reval_seq != reval_seq); >>> -ovs_mutex_unlock(&ukey->mutex); >>> >>>if (flow_exists >>>&& (purge >> >> There is a call to push_ukey_ops() between these blocks. It calls >> push_ukey_ops__(), which will take a lock on the ukeys, including the last >> one added and still locked, so this would result in double locking. >> > > I missed the fact that handle_missed_revalidation() in here also takes the > ukey lock. > > Jarno > >> This could be resolved by moving the “ if (n_ops == REVALIDATE_MAX_BATCH)” >> block after the (moved) ukey unlock, similarly to the remainder ops handling >> at the end. However, I’m not sure if this would defeat the purpose of this >> patch. >> >> Also, it seems that generally the umap lock is taken first and ukey locks >> after, so there is a possibility of a deadlock if the locks are taken in the >> reverse order, like in this patch. >> >> It would be really great if we could employ clang thread safety analysis >> more than we do in this file currently, but I’m not sure if that is possible. >> >> Jarno >> >>> @@ -2095,6 +2094,7 @@ revalidator_sweep__(struct revalidator *revalidator, >>> bool purge) >>>ukey_delete(umap, ukey); >>>ovs_mutex_unlock(&umap->mutex); >>>} >>> +ovs_mutex_unlock(&ukey->mutex); >>>} >>> >>>if (n_ops) { >>> -- >>> 2.1.0 >>> >>> ___ >>> dev mailing list >>> dev@openvswitch.org >>> http://openvswitch.org/mailman/listinfo/dev >> > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2 1/2] upcall: Fix minor race when deleting ukeys.
Ah, that's a good point I had missed. I think the logic is fine as is then. I think I'll add a comment to that affect at the beginning of the function. Ethan On Tue, Aug 11, 2015 at 1:51 PM, Joe Stringer wrote: > FWIW the "slice" logic in the for loop is meant to ensure that no > revalidator threads concurrently clean the same map. > > There's two cases where the main thread might call into this as well > though, one is when reconfiguring threads (which shouldn't conflict, > because it stops all revalidators first), and the ovs-appctl > revalidator/purge case, which could hit something like this, but I > don't think there is a case where we use this functionality while OVS > is under load, so the likelyhood of problems is low. > > On 11 August 2015 at 12:18, Ethan Jackson wrote: >> Yeah sorry this patch is not great. That second problem is solved in >> the follow on. I'd be somewhat inclined to drop this one all >> together. >> >> Ethan >> >> On Tue, Aug 11, 2015 at 11:15 AM, Jarno Rajahalme >> wrote: >>> >>>> On Aug 11, 2015, at 10:42 AM, Jarno Rajahalme >>>> wrote: >>>> >>>> >>>>> On Aug 10, 2015, at 6:46 PM, Ethan J. Jackson wrote: >>>>> >>>>> From: Ethan Jackson >>>>> >>>>> Since revalidator_sweep() doesn't hold the ukey mutex for each full >>>>> loop iteration, it's theoretically possible that two threads may >>>>> call ukey_delete() on the same ukey. If this happens, they both will >>>>> attempt to remove the ukey from he cmap, causing the loser of the race >>>>> to fail. >>>>> >>>>> Signed-off-by: Ethan J. Jackson >>>>> --- >>>>> ofproto/ofproto-dpif-upcall.c | 2 +- >>>>> 1 file changed, 1 insertion(+), 1 deletion(-) >>>>> >>>>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c >>>>> index 0f2e186..fddb535 100644 >>>>> --- a/ofproto/ofproto-dpif-upcall.c >>>>> +++ b/ofproto/ofproto-dpif-upcall.c >>>>> @@ -2076,7 +2076,6 @@ revalidator_sweep__(struct revalidator >>>>> *revalidator, bool purge) >>>>>flow_exists = ukey->flow_exists; >>>>>seq_mismatch = (ukey->dump_seq != dump_seq >>>>>&& ukey->reval_seq != reval_seq); >>>>> -ovs_mutex_unlock(&ukey->mutex); >>>>> >>>>>if (flow_exists >>>>>&& (purge >>>> >>>> There is a call to push_ukey_ops() between these blocks. It calls >>>> push_ukey_ops__(), which will take a lock on the ukeys, including the last >>>> one added and still locked, so this would result in double locking. >>>> >>> >>> I missed the fact that handle_missed_revalidation() in here also takes the >>> ukey lock. >>> >>> Jarno >>> >>>> This could be resolved by moving the “ if (n_ops == REVALIDATE_MAX_BATCH)” >>>> block after the (moved) ukey unlock, similarly to the remainder ops >>>> handling at the end. However, I’m not sure if this would defeat the >>>> purpose of this patch. >>>> >>>> Also, it seems that generally the umap lock is taken first and ukey locks >>>> after, so there is a possibility of a deadlock if the locks are taken in >>>> the reverse order, like in this patch. >>>> >>>> It would be really great if we could employ clang thread safety analysis >>>> more than we do in this file currently, but I’m not sure if that is >>>> possible. >>>> >>>> Jarno >>>> >>>>> @@ -2095,6 +2094,7 @@ revalidator_sweep__(struct revalidator >>>>> *revalidator, bool purge) >>>>>ukey_delete(umap, ukey); >>>>>ovs_mutex_unlock(&umap->mutex); >>>>>} >>>>> +ovs_mutex_unlock(&ukey->mutex); >>>>>} >>>>> >>>>>if (n_ops) { >>>>> -- >>>>> 2.1.0 >>>>> >>>>> ___ >>>>> dev mailing list >>>>> dev@openvswitch.org >>>>> http://openvswitch.org/mailman/listinfo/dev >>>> >>> >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v3 1/2] ofproto-dpif-upcall: Make ukey actions modifiable with RCU.
agreed, sent a new version. Ethan On Wed, Aug 12, 2015 at 5:01 PM, Jarno Rajahalme wrote: > >> On Aug 12, 2015, at 4:13 PM, Ethan J. Jackson wrote: >> >> From: Ethan Jackson >> >> Future patches will need to modify ukey actions in some instances. >> This patch makes this possible by protecting them with RCU. It also >> adds thread safety checks to enforce the new protection mechanism. >> >> Signed-off-by: Ethan J. Jackson >> --- >> ofproto/ofproto-dpif-upcall.c | 47 >> +++ >> 1 file changed, 39 insertions(+), 8 deletions(-) >> >> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c >> index 0f2e186..c57cebd 100644 >> --- a/ofproto/ofproto-dpif-upcall.c >> +++ b/ofproto/ofproto-dpif-upcall.c >> @@ -215,7 +215,6 @@ struct udpif_key { >> size_t key_len;/* Length of 'key'. */ >> const struct nlattr *mask; /* Datapath flow mask. */ >> size_t mask_len; /* Length of 'mask'. */ >> -struct ofpbuf *actions;/* Datapath flow actions as nlattrs. */ >> ovs_u128 ufid; /* Unique flow identifier. */ >> bool ufid_present; /* True if 'ufid' is in datapath. */ >> uint32_t hash; /* Pre-computed hash for 'key'. */ >> @@ -228,6 +227,10 @@ struct udpif_key { >> uint64_t reval_seq OVS_GUARDED; /* Tracks udpif->reval_seq. */ >> bool flow_exists OVS_GUARDED; /* Ensures flows are only >> deleted >> once. */ >> +/* Datapath flow actions as nlattrs. Protected by RCU. Lockless reads >> can >> + * be made with ukey_get_actions(), otherwise a lock is required. >> Updates >> + * should be made with the ukey_set_actions() function. */ >> +const struct ofpbuf *actions OVS_GUARDED; >> > > IMO it would be prudent to define this as: > > OVSRCU_TYPE(const struct ofpbuf *) actions; > > and then always use ovsrcu_get() to access it, and ovsrcu_set() to set it. > This makes sure the memory barriers are there, i.e., the compiler will not > rearrange any initialization of the new actions after the point in which > other threads can see it. > > With this the OVS_GUARDED would not be needed here. > > Jarno > >> struct xlate_cache *xcache OVS_GUARDED; /* Cache for xlate entries that >>* are affected by this ukey. >> @@ -287,6 +290,8 @@ static struct udpif_key *ukey_create_from_upcall(struct >> upcall *, >> static int ukey_create_from_dpif_flow(const struct udpif *, >> const struct dpif_flow *, >> struct udpif_key **); >> +static void ukey_get_actions(struct udpif_key *, const struct nlattr >> **actions, >> + size_t *size); >> static bool ukey_install_start(struct udpif *, struct udpif_key *ukey); >> static bool ukey_install_finish(struct udpif_key *ukey, int error); >> static bool ukey_install(struct udpif *udpif, struct udpif_key *ukey); >> @@ -1127,7 +1132,7 @@ process_upcall(struct udpif *udpif, struct upcall >> *upcall, >> if (upcall->sflow) { >> union user_action_cookie cookie; >> const struct nlattr *actions; >> -int actions_len = 0; >> +size_t actions_len = 0; >> struct dpif_sflow_actions sflow_actions; >> memset(&sflow_actions, 0, sizeof sflow_actions); >> memset(&cookie, 0, sizeof cookie); >> @@ -1145,8 +1150,7 @@ process_upcall(struct udpif *udpif, struct upcall >> *upcall, >> /* Lookup actions in userspace cache. */ >> struct udpif_key *ukey = ukey_lookup(udpif, upcall->ufid); >> if (ukey) { >> -actions = ukey->actions->data; >> -actions_len = ukey->actions->size; >> +ukey_get_actions(ukey, &actions, &actions_len); >> dpif_sflow_read_actions(flow, actions, actions_len, >> &sflow_actions); >> } >> @@ -1273,8 +1277,8 @@ handle_upcalls(struct udpif *udpif, struct upcall >> *upcalls, >> op->dop.u.flow_put.mask_len = ukey->mask_len; >> op->dop.u.flow_put.ufid = upcall->ufid; >> op->dop.u.flow_put.sta
Re: [ovs-dev] [PATCH] authors: Update authors list with my new email.
Leather elbow patches are reserved for candidates. You should know better Justin. Ethan On Tue, Sep 1, 2015 at 10:40 AM, Justin Pettit wrote: > Have you been fitted for your leather elbow patches yet? > > Acked-by: Justin Pettit > > --Justin > > > > On Sep 1, 2015, at 10:38 AM, Ethan J. Jackson wrote: > > > > From: Ethan Jackson > > > > Signed-off-by: Ethan J. Jackson > > --- > > AUTHORS | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/AUTHORS b/AUTHORS > > index 41e7f57..6c0989a 100644 > > --- a/AUTHORS > > +++ b/AUTHORS > > @@ -61,7 +61,7 @@ Edward Tomasz Napierała tr...@freebsd.org > > Eitan Eliahuelia...@vmware.com > > Eohyung Lee liquidnu...@gmail.com > > Eric Sesterhenn eric.sesterh...@lsexperts.de > > -Ethan Jackson et...@nicira.com > > +Ethan J. Jacksone...@eecs.berkeley.edu > > Flavio Leitner f...@redhat.com > > Francesco Fusco ffu...@redhat.com > > FUJITA Tomonori fujita.tomon...@lab.ntt.co.jp > > -- > > 2.1.0 > > > > ___ > > dev mailing list > > dev@openvswitch.org > > http://openvswitch.org/mailman/listinfo/dev > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] authors: Update authors list with my new email.
Yeah, there's another Ethan Jackson at MSR unfortunately. On Tue, Sep 1, 2015 at 10:42 AM, Ben Pfaff wrote: > On Tue, Sep 01, 2015 at 10:38:26AM -0700, Ethan J. Jackson wrote: >> From: Ethan Jackson >> >> Signed-off-by: Ethan J. Jackson >> --- >> AUTHORS | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/AUTHORS b/AUTHORS >> index 41e7f57..6c0989a 100644 >> --- a/AUTHORS >> +++ b/AUTHORS >> @@ -61,7 +61,7 @@ Edward Tomasz Napierała tr...@freebsd.org >> Eitan Eliahuelia...@vmware.com >> Eohyung Lee liquidnu...@gmail.com >> Eric Sesterhenn eric.sesterh...@lsexperts.de >> -Ethan Jackson et...@nicira.com >> +Ethan J. Jacksone...@eecs.berkeley.edu > > I see you're changing your name, too! > > Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH V9 1/3] dpif-linux: Implement the API functions to allow multiple handler threads read upcall.
Acked-by: Ethan Jackson On Thu, Apr 17, 2014 at 5:41 PM, Alex Wang wrote: > Signed-off-by: Alex Wang > > --- > V8 -> v9: > - move the fat-lock acquisition inside the if-statement in dpif_linux_run(). > - fix an error: dpif_linux_recv_purge() should acquire wrlock. > > V7 -> V8: > - rebase. > > V6 -> V7: > - rebase. > - set n_handlers to 0 in destroy_all_channels(). > - fix the indexing error in dpif_linux_recv_purge(). > > V5 -> V6: > - move the 'struct dpif_epoll' in 'struct dpif_handler' > - fix typos based on review. > - refine refresh_channels() comments. > - rename 'n_pids' of 'struct dpif_linux_vport' to 'n_upcall_pids'. > > V4 -> V5: > - rebase. > > V3 -> V4: > - add check of handler_id range. > > V2 -> V3: > - use OVS_DP_ATTR_USER_FEATURES to inform datapath about the type of > OVS_VPORT_ATTR_UPCALL_PID attribute. > - close epoll fd when destroying all channels. > > PATCH -> V2: > - rebase. > > major changes since RFC: > - free the 'upcall_pids' pointer in dpif_linux_refresh_channels(). > - for cache access efficiency, in 'recv()' index the handlers > array first and then port channels array. > --- > lib/dpif-linux.c | 549 > +++--- > lib/dpif-linux.h |3 +- > 2 files changed, 361 insertions(+), 191 deletions(-) > > diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c > index e8efdac..c119c12 100644 > --- a/lib/dpif-linux.c > +++ b/lib/dpif-linux.c > @@ -36,6 +36,7 @@ > #include "dpif-provider.h" > #include "dynamic-string.h" > #include "flow.h" > +#include "fat-rwlock.h" > #include "netdev.h" > #include "netdev-linux.h" > #include "netdev-vport.h" > @@ -132,7 +133,13 @@ struct dpif_channel { > long long int last_poll;/* Last time this channel was polled. */ > }; > > -static void report_loss(struct dpif *, struct dpif_channel *); > +struct dpif_handler { > +struct dpif_channel *channels;/* Array of channels for each handler. */ > +struct epoll_event *epoll_events; > +int epoll_fd; /* epoll fd that includes channel socks. */ > +int n_events; /* Num events returned by epoll_wait(). */ > +int event_offset; /* Offset into 'epoll_events'. */ > +}; > > /* Datapath interface for the openvswitch Linux kernel module. */ > struct dpif_linux { > @@ -140,19 +147,20 @@ struct dpif_linux { > int dp_ifindex; > > /* Upcall messages. */ > -struct ovs_mutex upcall_lock; > -int uc_array_size; /* Size of 'channels' and 'epoll_events'. */ > -struct dpif_channel *channels; > -struct epoll_event *epoll_events; > -int epoll_fd; /* epoll fd that includes channel socks. */ > -int n_events; /* Num events returned by epoll_wait(). */ > -int event_offset; /* Offset into 'epoll_events'. */ > +struct fat_rwlock upcall_lock; > +struct dpif_handler *handlers; > +uint32_t n_handlers; /* Num of upcall handlers. */ > +int uc_array_size; /* Size of 'handler->channels' and */ > + /* 'handler->epoll_events'. */ > > /* Change notification. */ > struct nl_sock *port_notifier; /* vport multicast group subscriber. */ > bool refresh_channels; > }; > > +static void report_loss(struct dpif *, struct dpif_channel *, uint32_t > ch_idx, > +uint32_t handler_id); > + > static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(, 5); > > /* Generic Netlink family numbers for OVS. > @@ -172,8 +180,7 @@ static int dpif_linux_init(void); > static int open_dpif(const struct dpif_linux_dp *, struct dpif **); > static uint32_t dpif_linux_port_get_pid(const struct dpif *, > odp_port_t port_no, uint32_t hash); > -static int dpif_linux_refresh_channels(struct dpif *); > - > +static int dpif_linux_refresh_channels(struct dpif *, uint32_t n_handlers); > static void dpif_linux_vport_to_ofpbuf(const struct dpif_linux_vport *, > struct ofpbuf *); > static int dpif_linux_vport_from_ofpbuf(struct dpif_linux_vport *, > @@ -238,6 +245,7 @@ dpif_linux_open(const struct dpif_class *class > OVS_UNUSED, const char *name, > } > dp_request.name = name; > dp_request.user_features |= OVS_DP_F_UNALIGNED; > +dp_request.user_features |= OVS_DP_F_VPORT_PIDS; >
Re: [ovs-dev] [PATCH V9 2/3] dpif-linux: Pass 'struct dpif_linux *' to internal static functions.
Acked-by: Ethan Jackson On Thu, Apr 17, 2014 at 5:41 PM, Alex Wang wrote: > This commit reformats the dpif-linux module so that all internal > static functions take 'struct dpif_linux *' as input argument. > This will allow the adding of thread-safety annotations. > > Signed-off-by: Alex Wang > > --- > V9: > - necessary changes for adding thread-safety annotations. > --- > lib/dpif-linux.c | 113 > +++--- > 1 file changed, 56 insertions(+), 57 deletions(-) > > diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c > index c119c12..b70ddef 100644 > --- a/lib/dpif-linux.c > +++ b/lib/dpif-linux.c > @@ -158,8 +158,8 @@ struct dpif_linux { > bool refresh_channels; > }; > > -static void report_loss(struct dpif *, struct dpif_channel *, uint32_t > ch_idx, > -uint32_t handler_id); > +static void report_loss(struct dpif_linux *, struct dpif_channel *, > +uint32_t ch_idx, uint32_t handler_id); > > static struct vlog_rate_limit error_rl = VLOG_RATE_LIMIT_INIT(, 5); > > @@ -180,7 +180,7 @@ static int dpif_linux_init(void); > static int open_dpif(const struct dpif_linux_dp *, struct dpif **); > static uint32_t dpif_linux_port_get_pid(const struct dpif *, > odp_port_t port_no, uint32_t hash); > -static int dpif_linux_refresh_channels(struct dpif *, uint32_t n_handlers); > +static int dpif_linux_refresh_channels(struct dpif_linux *, uint32_t > n_handlers); > static void dpif_linux_vport_to_ofpbuf(const struct dpif_linux_vport *, > struct ofpbuf *); > static int dpif_linux_vport_from_ofpbuf(struct dpif_linux_vport *, > @@ -538,7 +538,7 @@ dpif_linux_run(struct dpif *dpif_) > if (dpif->refresh_channels) { > dpif->refresh_channels = false; > fat_rwlock_wrlock(&dpif->upcall_lock); > -dpif_linux_refresh_channels(dpif_, dpif->n_handlers); > +dpif_linux_refresh_channels(dpif, dpif->n_handlers); > fat_rwlock_unlock(&dpif->upcall_lock); > } > } > @@ -623,10 +623,9 @@ netdev_to_ovs_vport_type(const struct netdev *netdev) > } > > static int > -dpif_linux_port_add__(struct dpif *dpif_, struct netdev *netdev, > +dpif_linux_port_add__(struct dpif_linux *dpif, struct netdev *netdev, >odp_port_t *port_nop) > { > -struct dpif_linux *dpif = dpif_linux_cast(dpif_); > const struct netdev_tunnel_config *tnl_cfg; > char namebuf[NETDEV_VPORT_NAME_BUFSIZE]; > const char *name = netdev_vport_get_dpif_port(netdev, > @@ -654,7 +653,7 @@ dpif_linux_port_add__(struct dpif *dpif_, struct netdev > *netdev, > if (request.type == OVS_VPORT_TYPE_UNSPEC) { > VLOG_WARN_RL(&error_rl, "%s: cannot create port `%s' because it has " > "unsupported type `%s'", > - dpif_name(dpif_), name, type); > + dpif_name(&dpif->dpif), name, type); > vport_del_socksp(socksp, dpif->n_handlers); > return EINVAL; > } > @@ -684,7 +683,7 @@ dpif_linux_port_add__(struct dpif *dpif_, struct netdev > *netdev, > } else { > if (error == EBUSY && *port_nop != ODPP_NONE) { > VLOG_INFO("%s: requested port %"PRIu32" is in use", > - dpif_name(dpif_), *port_nop); > + dpif_name(&dpif->dpif), *port_nop); > } > > vport_del_socksp(socksp, dpif->n_handlers); > @@ -695,7 +694,7 @@ dpif_linux_port_add__(struct dpif *dpif_, struct netdev > *netdev, > error = vport_add_channels(dpif, *port_nop, socksp); > if (error) { > VLOG_INFO("%s: could not add channel for port %s", > - dpif_name(dpif_), name); > + dpif_name(&dpif->dpif), name); > > /* Delete the port. */ > dpif_linux_vport_init(&request); > @@ -724,16 +723,15 @@ dpif_linux_port_add(struct dpif *dpif_, struct netdev > *netdev, > int error; > > fat_rwlock_wrlock(&dpif->upcall_lock); > -error = dpif_linux_port_add__(dpif_, netdev, port_nop); > +error = dpif_linux_port_add__(dpif, netdev, port_nop); > fat_rwlock_unlock(&dpif->upcall_lock); > > return error; > } > > static int > -dpif_linux_port_del__(struct dpif *dpif_, odp_port_t port_no) > +dpif_linux_port_del__(struct dpif_linux *dpif, odp_port_t port_no) > { > -struct dpif_linux *dpif = dpif_linux_cast(dpif_); &
Re: [ovs-dev] [PATCH V9 3/3] dpif-linux: Add thread-safety annotations.
Acked-by: Ethan Jackson On Thu, Apr 17, 2014 at 5:41 PM, Alex Wang wrote: > Signed-off-by: Alex Wang > > --- > V9: > - add thread-safety annotations. > --- > lib/dpif-linux.c | 60 > +- > 1 file changed, 46 insertions(+), 14 deletions(-) > > diff --git a/lib/dpif-linux.c b/lib/dpif-linux.c > index b70ddef..a575b78 100644 > --- a/lib/dpif-linux.c > +++ b/lib/dpif-linux.c > @@ -180,7 +180,8 @@ static int dpif_linux_init(void); > static int open_dpif(const struct dpif_linux_dp *, struct dpif **); > static uint32_t dpif_linux_port_get_pid(const struct dpif *, > odp_port_t port_no, uint32_t hash); > -static int dpif_linux_refresh_channels(struct dpif_linux *, uint32_t > n_handlers); > +static int dpif_linux_refresh_channels(struct dpif_linux *, > + uint32_t n_handlers); > static void dpif_linux_vport_to_ofpbuf(const struct dpif_linux_vport *, > struct ofpbuf *); > static int dpif_linux_vport_from_ofpbuf(struct dpif_linux_vport *, > @@ -459,7 +460,7 @@ vport_del_channels(struct dpif_linux *dpif, odp_port_t > port_no) > } > > static void > -destroy_all_channels(struct dpif_linux *dpif) > +destroy_all_channels(struct dpif_linux *dpif) > OVS_REQ_WRLOCK(dpif->upcall_lock) > { > unsigned int i; > > @@ -625,6 +626,7 @@ netdev_to_ovs_vport_type(const struct netdev *netdev) > static int > dpif_linux_port_add__(struct dpif_linux *dpif, struct netdev *netdev, >odp_port_t *port_nop) > +OVS_REQ_WRLOCK(dpif->upcall_lock) > { > const struct netdev_tunnel_config *tnl_cfg; > char namebuf[NETDEV_VPORT_NAME_BUFSIZE]; > @@ -731,6 +733,7 @@ dpif_linux_port_add(struct dpif *dpif_, struct netdev > *netdev, > > static int > dpif_linux_port_del__(struct dpif_linux *dpif, odp_port_t port_no) > +OVS_REQ_WRLOCK(dpif->upcall_lock) > { > struct dpif_linux_vport vport; > int error; > @@ -809,14 +812,13 @@ dpif_linux_port_query_by_name(const struct dpif *dpif_, > const char *devname, > } > > static uint32_t > -dpif_linux_port_get_pid(const struct dpif *dpif_, odp_port_t port_no, > -uint32_t hash) > +dpif_linux_port_get_pid__(const struct dpif_linux *dpif, odp_port_t port_no, > + uint32_t hash) > +OVS_REQ_RDLOCK(dpif->upcall_lock) > { > -struct dpif_linux *dpif = dpif_linux_cast(dpif_); > uint32_t port_idx = odp_to_u32(port_no); > uint32_t pid = 0; > > -fat_rwlock_rdlock(&dpif->upcall_lock); > if (dpif->handlers) { > /* The ODPP_NONE "reserved" port number uses the "ovs-system"'s > * channel, since it is not heavily loaded. */ > @@ -825,11 +827,24 @@ dpif_linux_port_get_pid(const struct dpif *dpif_, > odp_port_t port_no, > > pid = nl_sock_pid(h->channels[idx].sock); > } > -fat_rwlock_unlock(&dpif->upcall_lock); > > return pid; > } > > +static uint32_t > +dpif_linux_port_get_pid(const struct dpif *dpif_, odp_port_t port_no, > +uint32_t hash) > +{ > +const struct dpif_linux *dpif = dpif_linux_cast(dpif_); > +uint32_t ret; > + > +fat_rwlock_rdlock(&dpif->upcall_lock); > +ret = dpif_linux_port_get_pid__(dpif, port_no, hash); > +fat_rwlock_unlock(&dpif->upcall_lock); > + > +return ret; > +} > + > static int > dpif_linux_flow_flush(struct dpif *dpif_) > { > @@ -1461,6 +1476,7 @@ dpif_linux_operate(struct dpif *dpif_, struct dpif_op > **ops, size_t n_ops) > * backing kernel vports. */ > static int > dpif_linux_refresh_channels(struct dpif_linux *dpif, uint32_t n_handlers) > +OVS_REQ_WRLOCK(dpif->upcall_lock) > { > unsigned long int *keep_channels; > struct dpif_linux_vport vport; > @@ -1586,6 +1602,7 @@ dpif_linux_refresh_channels(struct dpif_linux *dpif, > uint32_t n_handlers) > > static int > dpif_linux_recv_set__(struct dpif_linux *dpif, bool enable) > +OVS_REQ_WRLOCK(dpif->upcall_lock) > { > if ((dpif->handlers != NULL) == enable) { > return 0; > @@ -1702,6 +1719,7 @@ parse_odp_packet(struct ofpbuf *buf, struct dpif_upcall > *upcall, > static int > dpif_linux_recv__(struct dpif_linux *dpif, uint32_t handler_id, >struct dpif_upcall *upcall, struct ofpbuf *buf) > +OVS_REQ_RDLOCK(dpif->upcall_lock) > { > struct dpif_handler *handler; > int read_tries = 0; > @@ -1787,25 +1805,30 @@ dpif_linu
Re: [ovs-dev] [PATCH RFC V9] ofproto-dpif-upcall: Remove the dispatcher thread.
> + *- "flow_dumper" thread that reads the kernel flow table and dispatches Lost the initial "A" here. > struct upcall { > -struct list list_node; /* For queuing upcalls. */ > +bool is_valid; /* If the upcall can be used. */ > struct flow_miss *flow_miss;/* This upcall's flow_miss. */ Looks like is_valid isn't necessary anymore. I was going to suggest some other major changes, but I think it probably makes more sense to do it as a separate patch. Go ahead and merge. Acked-by: Ethan Jackson > > /* Raw upcall plus data for keeping track of the memory backing it. */ > @@ -219,15 +215,16 @@ struct flow_miss { > bool put; > }; > > -static void upcall_destroy(struct upcall *); > - > static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); > static struct list all_udpifs = LIST_INITIALIZER(&all_udpifs); > > -static void recv_upcalls(struct udpif *); > -static void handle_upcalls(struct handler *handler, struct list *upcalls); > +static size_t read_upcalls(struct handler *, > + struct upcall upcalls[FLOW_MISS_MAX_BATCH], > + struct flow_miss miss_buf[FLOW_MISS_MAX_BATCH], > + struct hmap *); > +static void handle_upcalls(struct handler *, struct hmap *, struct upcall *, > + size_t n_upcalls); > static void *udpif_flow_dumper(void *); > -static void *udpif_dispatcher(void *); > static void *udpif_upcall_handler(void *); > static void *udpif_revalidator(void *); > static uint64_t udpif_get_n_flows(struct udpif *); > @@ -315,9 +312,6 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > for (i = 0; i < udpif->n_handlers; i++) { > struct handler *handler = &udpif->handlers[i]; > > -ovs_mutex_lock(&handler->mutex); > -xpthread_cond_signal(&handler->wake_cond); > -ovs_mutex_unlock(&handler->mutex); > xpthread_join(handler->thread, NULL); > } > > @@ -331,7 +325,6 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > } > > xpthread_join(udpif->flow_dumper, NULL); > -xpthread_join(udpif->dispatcher, NULL); > > for (i = 0; i < udpif->n_revalidators; i++) { > struct revalidator *revalidator = &udpif->revalidators[i]; > @@ -353,17 +346,7 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > } > > for (i = 0; i < udpif->n_handlers; i++) { > -struct handler *handler = &udpif->handlers[i]; > -struct upcall *miss, *next; > - > -LIST_FOR_EACH_SAFE (miss, next, list_node, &handler->upcalls) { > -list_remove(&miss->list_node); > -upcall_destroy(miss); > -} > -ovs_mutex_destroy(&handler->mutex); > - > -xpthread_cond_destroy(&handler->wake_cond); > -free(handler->name); > +free(udpif->handlers[i].name); > } > latch_poll(&udpif->exit_latch); > > @@ -376,7 +359,7 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > udpif->n_handlers = 0; > } > > -error = dpif_handlers_set(udpif->dpif, 1); > +error = dpif_handlers_set(udpif->dpif, n_handlers); > if (error) { > VLOG_ERR("failed to configure handlers in dpif %s: %s", > dpif_name(udpif->dpif), ovs_strerror(error)); > @@ -395,10 +378,7 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > struct handler *handler = &udpif->handlers[i]; > > handler->udpif = udpif; > -list_init(&handler->upcalls); > -handler->need_signal = false; > -xpthread_cond_init(&handler->wake_cond, NULL); > -ovs_mutex_init(&handler->mutex); > +handler->handler_id = i; > xpthread_create(&handler->thread, NULL, udpif_upcall_handler, > handler); > } > @@ -416,7 +396,6 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > xpthread_create(&revalidator->thread, NULL, udpif_revalidator, > revalidator); > } > -xpthread_create(&udpif->dispatcher, NULL, udpif_dispatcher, udpif); > xpthread_create(&udpif->flow_dumper, NULL, udpif_flow_dumper, udpif); > } > > @@ -461,16 +440,9 @@ udpif_get_memory_usage(struct udpif *udpif, struct simap
Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Fix ovs-vswitchd crash.
Acked-by: Ethan Jackson On Mon, Apr 21, 2014 at 6:13 PM, Alex Wang wrote: > On current master, caller of udpif_set_threads() can pass 0 value > on n_handlers and n_revalidators to delete all handler and revalidator > threads. > > After commit 9a159f748866 (ofproto-dpif-upcall: Remove the dispatcher > thread.), udpif_set_threads() also calls the dpif_handlers_set() with > the 0 value 'n_handlers'. Since dpif level always assume the 'n_handlers' > be non-zero, this causes warnings and even crash of ovs-vswitchd. > > This commit fixes the above issue by defining separate functions for > starting and stopping handler and revalidator threads. So udpif_set_threads() > will never be called with 0 value arguments. > > Reported-by: Andy Zhou > Signed-off-by: Alex Wang > Co-authored-by: Ethan Jackson > --- > ofproto/ofproto-dpif-upcall.c | 76 > +++-- > 1 file changed, 51 insertions(+), 25 deletions(-) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 4ee5bf5..4d87b88 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -223,6 +223,9 @@ static size_t read_upcalls(struct handler *, > struct hmap *); > static void handle_upcalls(struct handler *, struct hmap *, struct upcall *, > size_t n_upcalls); > +static void udpif_stop_threads(struct udpif *); > +static void udpif_start_threads(struct udpif *, size_t n_handlers, > +size_t n_revalidators); > static void *udpif_flow_dumper(void *); > static void *udpif_upcall_handler(void *); > static void *udpif_revalidator(void *); > @@ -278,8 +281,7 @@ udpif_create(struct dpif_backer *backer, struct dpif > *dpif) > void > udpif_destroy(struct udpif *udpif) > { > -udpif_set_threads(udpif, 0, 0); > -udpif_flush(udpif); > +udpif_stop_threads(udpif); > > list_remove(&udpif->list_node); > latch_destroy(&udpif->exit_latch); > @@ -289,18 +291,11 @@ udpif_destroy(struct udpif *udpif) > free(udpif); > } > > -/* Tells 'udpif' how many threads it should use to handle upcalls. Disables > - * all threads if 'n_handlers' and 'n_revalidators' is zero. 'udpif''s > - * datapath handle must have packet reception enabled before starting > threads. > - */ > -void > -udpif_set_threads(struct udpif *udpif, size_t n_handlers, > - size_t n_revalidators) > +/* Stops the handler and revalidator threads, must be enclosed in > + * ovsrcu quiescent state unless when destroying udpif. */ > +static void > +udpif_stop_threads(struct udpif *udpif) > { > -int error; > - > -ovsrcu_quiesce_start(); > -/* Stop the old threads (if any). */ > if (udpif->handlers && > (udpif->n_handlers != n_handlers > || udpif->n_revalidators != n_revalidators)) { > @@ -347,6 +342,7 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > for (i = 0; i < udpif->n_handlers; i++) { > free(udpif->handlers[i].name); > } > + > latch_poll(&udpif->exit_latch); > > free(udpif->revalidators); > @@ -357,15 +353,14 @@ udpif_set_threads(struct udpif *udpif, size_t > n_handlers, > udpif->handlers = NULL; > udpif->n_handlers = 0; > } > +} > > -error = dpif_handlers_set(udpif->dpif, n_handlers); > -if (error) { > -VLOG_ERR("failed to configure handlers in dpif %s: %s", > - dpif_name(udpif->dpif), ovs_strerror(error)); > -return; > -} > - > -/* Start new threads (if necessary). */ > +/* Starts the handler and revalidator threads, must be enclosed in > + * ovsrcu quiescent state. */ > +static void > +udpif_start_threads(struct udpif *udpif, size_t n_handlers, > +size_t n_revalidators) > +{ > if (!udpif->handlers && n_handlers) { > size_t i; > > @@ -397,7 +392,31 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > } > xpthread_create(&udpif->flow_dumper, NULL, udpif_flow_dumper, udpif); > } > +} > + > +/* Tells 'udpif' how many threads it should use to handle upcalls. > + * 'n_handlers' and 'n_revalidators' can never be zero. 'udpif''s > + * datapath handle must have packet reception enabled before starting > + * threads. */ > +void > +udpif_set_threads(struct udpif *udpif, size_t n_handlers, > + size_t n
Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Fix logic error in handler/revalidator threads creation and deletion.
Acked-by: Ethan Jackson On Mon, Apr 21, 2014 at 8:25 PM, Alex Wang wrote: > Commit 1f8675481e (ofproto-dpif-upcall: Fix ovs-vswitchd crash.) > directly copied the udpif_set_threads() logic to udpif_stop_threads() > and udpif_start_threads(). In fact, this was erroneous and caused > unittest failures. > > This commit fixes the above issue by correcting the checks in > udpif_stop_threads() and udpif_start_threads(), and adding necessary > checks in udpif_set_threads(). > > Signed-off-by: Alex Wang > --- > ofproto/ofproto-dpif-upcall.c | 17 +++-- > 1 file changed, 11 insertions(+), 6 deletions(-) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 8e43e84..f8c0301 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -296,9 +296,8 @@ udpif_destroy(struct udpif *udpif) > static void > udpif_stop_threads(struct udpif *udpif) > { > -if (udpif->handlers && > -(udpif->n_handlers != n_handlers > - || udpif->n_revalidators != n_revalidators)) { > +if (udpif->handlers > +&& (udpif->n_handlers != 0 || udpif->n_revalidators != 0)) { > size_t i; > > latch_set(&udpif->exit_latch); > @@ -360,7 +359,7 @@ static void > udpif_start_threads(struct udpif *udpif, size_t n_handlers, > size_t n_revalidators) > { > -if (!udpif->handlers && n_handlers) { > +if (!udpif->handlers && !udpif->revalidators) { > size_t i; > > udpif->n_handlers = n_handlers; > @@ -406,7 +405,11 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > ovs_assert(n_handlers && n_revalidators); > > ovsrcu_quiesce_start(); > -udpif_stop_threads(udpif); > +if (udpif->handlers && > +(udpif->n_handlers != n_handlers > + || udpif->n_revalidators != n_revalidators)) { > +udpif_stop_threads(udpif); > +} > > error = dpif_handlers_set(udpif->dpif, n_handlers); > if (error) { > @@ -415,7 +418,9 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > return; > } > > -udpif_start_threads(udpif, n_handlers, n_revalidators); > +if (!udpif->handlers && n_handlers) { > +udpif_start_threads(udpif, n_handlers, n_revalidators); > +} > ovsrcu_quiesce_end(); > } > > -- > 1.7.9.5 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ofproto-dpif-upcall: Fix logic error in handler/revalidator threads creation and deletion.
LGTM as well On Mon, Apr 21, 2014 at 9:09 PM, Alex Wang wrote: > Commit 1f8675481e (ofproto-dpif-upcall: Fix ovs-vswitchd crash.) > directly copied the udpif_set_threads() logic to udpif_stop_threads() > and udpif_start_threads(). In fact, this was erroneous and caused > unittest failures. > > This commit fixes the above issue by correcting the checks in > udpif_stop_threads() and udpif_start_threads(), and adding necessary > checks in udpif_set_threads(). > > Acked-by: Ethan Jackson > Signed-off-by: Alex Wang > > --- > PATCH -> V2: > - refine the checks. > --- > ofproto/ofproto-dpif-upcall.c | 16 ++-- > 1 file changed, 10 insertions(+), 6 deletions(-) > > diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c > index 8e43e84..416cfdc 100644 > --- a/ofproto/ofproto-dpif-upcall.c > +++ b/ofproto/ofproto-dpif-upcall.c > @@ -296,9 +296,7 @@ udpif_destroy(struct udpif *udpif) > static void > udpif_stop_threads(struct udpif *udpif) > { > -if (udpif->handlers && > -(udpif->n_handlers != n_handlers > - || udpif->n_revalidators != n_revalidators)) { > +if (udpif && (udpif->n_handlers != 0 || udpif->n_revalidators != 0)) { > size_t i; > > latch_set(&udpif->exit_latch); > @@ -360,7 +358,7 @@ static void > udpif_start_threads(struct udpif *udpif, size_t n_handlers, > size_t n_revalidators) > { > -if (!udpif->handlers && n_handlers) { > +if (udpif && (!udpif->handlers && !udpif->revalidators)) { > size_t i; > > udpif->n_handlers = n_handlers; > @@ -403,10 +401,14 @@ udpif_set_threads(struct udpif *udpif, size_t > n_handlers, > { > int error; > > +ovs_assert(udpif); > ovs_assert(n_handlers && n_revalidators); > > ovsrcu_quiesce_start(); > -udpif_stop_threads(udpif); > +if (udpif->n_handlers != n_handlers > +|| udpif->n_revalidators != n_revalidators) { > +udpif_stop_threads(udpif); > +} > > error = dpif_handlers_set(udpif->dpif, n_handlers); > if (error) { > @@ -415,7 +417,9 @@ udpif_set_threads(struct udpif *udpif, size_t n_handlers, > return; > } > > -udpif_start_threads(udpif, n_handlers, n_revalidators); > +if (!udpif->handlers && !udpif->revalidators) { > +udpif_start_threads(udpif, n_handlers, n_revalidators); > +} > ovsrcu_quiesce_end(); > } > > -- > 1.7.9.5 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] netdev: Add random tag to struct netdev.
> I don't understand why, if you need to know when vswitchd restarts (why do > you?), it should be tied to netdevs. So if you're monitoring a netdev for packet stats, you need to know if vswitchd restarted so that you can take account of the fact the stats were reset. I suppose it needs to be tied to netdevs to take account of the case when a tunnel is removed and re-added for whatever reason. Does that make sense? If so, an explanation of the sort should be added to the commit message. Ethan > > On Apr 22, 2014 4:09 PM, "Ryan Wilson 76511" wrote: >> >> netdev: Add random tag to struct netdev. >> >> >> Before, there would be no way to tell if ovs-vswitchd had >> been restarted or killed via ovsdb logging. Now, a random >> tag will be generated upon initialization of the netdev >> struct which happens when ovs-vswitchd is restarted. A >> change in tag value in ovsdb's Interface table likely >> indicates that ovs-vswitchd has been restarted or killed. >> >> Signed-off-by: Ryan Wilson >> >> --- >> >> diff --git a/lib/netdev-provider.h b/lib/netdev-provider.h >> >> index 9d8e67f..ea2e59c 100644 >> --- a/lib/netdev-provider.h >> +++ b/lib/netdev-provider.h >> @@ -40,6 +40,11 @@ struct netdev { >> const struct netdev_class *netdev_class; /* Functions to control >> this device. */ >> >> + /* A random number generated upon initialization of the netdev struct. >> + * If this number changes, the netdev struct has been re-initialized, >> + * likely indicating ovs-vswitchd has restarted. */ >> + uint32_t tag; >> + >> /* A sequence number which indicates changes in one of 'netdev''s >> * properties. It must be nonzero so that users have a value which >> * they may use as a reset when tracking 'netdev'. >> diff --git a/lib/netdev.c b/lib/netdev.c >> index 4736a97..70cb3e8 100644 >> --- a/lib/netdev.c >> +++ b/lib/netdev.c >> @@ -340,6 +340,7 @@ netdev_open(const char *name, const char *type, struct >> netdev **netdevp) >> netdev->name = xstrdup(name); >> netdev->change_seq = 1; >> netdev->node = shash_add(&netdev_shash, name, netdev); >> + netdev->tag = random_uint32(); >> >> /* By default enable one rx queue per netdev. */ >> if (netdev->netdev_class->rxq_alloc) { >> @@ -664,6 +665,13 @@ netdev_set_etheraddr(struct netdev *netdev, const >> uint8_t mac[ETH_ADDR_LEN]) >> return netdev->netdev_class->set_etheraddr(netdev, mac); >> } >> >> +/* Retrieves 'netdev''s random tag generated upon initialization. */ >> +int >> +netdev_get_tag(const struct netdev *netdev) >> +{ >> + return (int) netdev->tag; >> +} >> + >> /* Retrieves 'netdev''s MAC address. If successful, returns 0 and copies >> the >> * the MAC address into 'mac'. On failure, returns a positive errno value >> and >> * clears 'mac' to all-zeros. */ >> diff --git a/lib/netdev.h b/lib/netdev.h >> index 7cb3c80..169d595 100644 >> --- a/lib/netdev.h >> +++ b/lib/netdev.h >> @@ -158,6 +158,7 @@ const char *netdev_get_type_from_name(const char *); >> int netdev_get_mtu(const struct netdev *, int *mtup); >> int netdev_set_mtu(const struct netdev *, int mtu); >> int netdev_get_ifindex(const struct netdev *); >> +int netdev_get_tag(const struct netdev *netdev); >> >> /* Packet reception. */ >> int netdev_rxq_open(struct netdev *, struct netdev_rxq **, int id); >> diff --git a/tests/ofproto-macros.at b/tests/ofproto-macros.at >> index a82a9b1..ad2f342 100644 >> --- a/tests/ofproto-macros.at >> +++ b/tests/ofproto-macros.at >> @@ -38,6 +38,31 @@ m4_define([STRIP_DURATION], [[sed >> 's/\bduration=[0-9.]*s/duration=?s/']]) >> m4_define([STRIP_USED], [[sed 's/used:[0-9]\.[0-9]*/used:0.0/']]) >> m4_define([TESTABLE_LOG], [-vPATTERN:ANY:'%c|%p|%m']) >> >> +# OVS_VSWITCHD_CHECK_STDERR >> +# >> +# Checks ovs-vswitchd stderr output on startup for consistent messages. >> +m4_define([OVS_VSWITCHD_CHECK_STDERR], >> + [AT_CHECK([[sed < stderr ' >> +/vlog|INFO|opened log file/d >> +/vswitchd|INFO|ovs-vswitchd (Open vSwitch)/d >> +/reconnect|INFO|/d >> +/ofproto_dpif|INFO|/d >> +/ofproto_dpif|DBG|need revalidate in ofproto_wait_cb()/d >> +/bridge|INFO|/d >> +/connmgr|INFO|/d >> +/ofproto|INFO|using datapath ID/d >> +/ofproto|INFO|datapath ID changed to fedcba9876543210/d']])]) >> + >> +# OVS_VSWITCHD_SETENV >> +# >> +# Sets environment variables for ovs-server and ovs-vswitchd processes. >> +m4_define([OVS_VSWITCHD_SETENV], >> + [OVS_RUNDIR=`pwd`; export OVS_RUNDIR >> + OVS_LOGDIR=`pwd`; export OVS_LOGDIR >> + OVS_DBDIR=`pwd`; export OVS_DBDIR >> + OVS_SYSCONFDIR=`pwd`; export OVS_SYSCONFDIR >> + ON_EXIT([kill `cat ovsdb-server.pid ovs-vswitchd.pid`])]) >> + >> # OVS_VSWITCHD_START([vsctl-args], [vsctl-output], [=override]) >> # >> # Creates a database and starts ovsdb-server, starts ovs-vswitchd >> @@ -52,11 +77,7 @@ m4_define([TESTABLE_LOG], [-vPATTERN:ANY:'%c|%p|%m']) >> # won't work at all (which makes sense because tests should not access a >> # system's real Ethernet devices). >> m4_define([OVS_VSWITCHD_START], >> - [OVS_RUNDIR=`pwd`; export OVS_RUNDIR >> - OVS_LOGDIR=`pwd`;
Re: [ovs-dev] bug fixes
Though we've discussed it, at this point there is no public bug tracker for the OVS project. We just moved to github, so there may be one there, but to date we haven't used it. If you want to see what went into the 1.10 release, you may consider checking the 1.10 git log for the 1.10 branch. We write pretty comprehensive commit messages which should explain any problems which were fixed. Ethan On Tue, Apr 22, 2014 at 9:36 PM, Neelakantam Gaddam wrote: > Hi All, > > Please guide me to find out the openvswitch 1.10.0 bugs list and their > fixes. > > > -- > Thanks & Regards > Neelakantam Gaddam > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] Userspace Netlink MMAP status
The problem has actually gotten worse since we've gotten rid of the dispatcher thread. Now each thread has it's own channel per port. I wonder if the right approach is to simply ditch the per-port fairness in the case where mmap netlink is enabled. I.E. we simply have one channel per thread and call it a day. Anyways, I don't have a lot of context on this thread, so take everything above with a grain of salt. Ethan On Wed, Apr 23, 2014 at 1:05 PM, Zoltan Kiss wrote: > Hi, > > I would like to ask, what's the status of enabling Netlink MMAP in the > userspace? I'm interested to see this progressing, but digging the mail > archive I've found it run into scalability issues: > > http://openvswitch.org/pipermail/dev/2013-December/034546.html > > And also it can't handle frags at the moment properly: > > http://openvswitch.org/pipermail/dev/2014-March/037337.html > > I was thinking about this scalability issue, and I think maybe we shouldn't > stick to the 16 KB frame size. I think in most cases the packets we are > actually sending up are small ones, in case of TCP the packets of the > handshakes are less than 100 bytes. And for the rest, we can call > genlmsg_new_unicast() with the full packet size, so it will fall back to > non-mmaped communication. And this would solve the frag-handling problem as > well. > Another approach to keep the frame size lower is to pass references to the > frags, which would point outside the ring buffer, to the actual page (which > need to be mapped for the userspace). I don't know how feasible would that > be, but huge linear buffers also has to be sliced up to frags. > > Regards, > > Zoli > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] ofproto-dpif: Improve code clarity and comments on recirc changes to rule_dpif_lookup
Acked-by: Ethan Jackson On Wed, Apr 23, 2014 at 1:26 PM, Andy Zhou wrote: > This patch improves the code readability and comments on the > recirculation related changes to rule_dpif_lookup() base on off-line > discussions with Jarno. There is no behavior changes. > > Signed-off-by: Andy Zhou > --- > ofproto/ofproto-dpif.c | 21 ++--- > 1 file changed, 14 insertions(+), 7 deletions(-) > > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c > index 6a725e4..983cad5 100644 > --- a/ofproto/ofproto-dpif.c > +++ b/ofproto/ofproto-dpif.c > @@ -3199,14 +3199,21 @@ rule_dpif_lookup(struct ofproto_dpif *ofproto, struct > flow *flow, > uint8_t table_id; > > if (ofproto_dpif_get_enable_recirc(ofproto)) { > -if (flow->recirc_id == 0) { > -if (wc) { > -wc->masks.recirc_id = UINT32_MAX; > -} > -table_id = 0; > -} else { > -table_id = TBL_INTERNAL; > +/* Always exactly match recirc_id since datapath supports > + * recirculation. */ > +if (wc) { > +wc->masks.recirc_id = UINT32_MAX; > } > + > +/* Start looking up from internal table for post recirculation flows > + * or packets. We can also simply send all, including normal flows > + * or packets to the internal table. They will not match any post > + * recirculation rules except the 'catch all' rule that resubmit > + * them to table 0. > + * > + * As an optimization, we send normal flows and packets to table 0 > + * directly, saving one table lookup. */ > +table_id = flow->recirc_id ? TBL_INTERNAL : 0; > } else { > table_id = 0; > } > -- > 1.9.1 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/3] ofproto: RCU postpone rule destruction.
Acked-by: Ethan Jackson On Wed, Apr 23, 2014 at 4:20 PM, Jarno Rajahalme wrote: > This allows rules to be used without taking references while RCU > protected. > > The last step of destroying an ofproto also needs to be postponed, as > the rule destruction requires the class structure to be available at > the postponed destruction callback. > > Signed-off-by: Jarno Rajahalme > --- > ofproto/ofproto.c | 40 ++-- > 1 file changed, 26 insertions(+), 14 deletions(-) > > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c > index f16005c..f10d3ae 100644 > --- a/ofproto/ofproto.c > +++ b/ofproto/ofproto.c > @@ -259,7 +259,6 @@ struct ofport_usage { > }; > > /* rule. */ > -static void ofproto_rule_destroy__(struct rule *); > static void ofproto_rule_send_removed(struct rule *, uint8_t reason); > static bool rule_is_modifiable(const struct rule *rule, > enum ofputil_flow_mod_flags flag); > @@ -1372,7 +1371,8 @@ ofproto_destroy(struct ofproto *p) > } > > p->ofproto_class->destruct(p); > -ofproto_destroy__(p); > +/* Destroying rules is deferred, must have 'ofproto' around for them. */ > +ovsrcu_postpone(ofproto_destroy__, p); > } > > /* Destroys the datapath with the respective 'name' and 'type'. With the > Linux > @@ -2642,6 +2642,23 @@ update_mtu(struct ofproto *p, struct ofport *port) > } > } > > +static void > +ofproto_rule_destroy__(struct rule *rule) > +OVS_NO_THREAD_SAFETY_ANALYSIS > +{ > +cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr)); > +rule_actions_destroy(rule_get_actions(rule)); > +ovs_mutex_destroy(&rule->mutex); > +rule->ofproto->ofproto_class->rule_dealloc(rule); > +} > + > +static void > +rule_destroy_cb(struct rule *rule) > +{ > +rule->ofproto->ofproto_class->rule_destruct(rule); > +ofproto_rule_destroy__(rule); > +} > + > void > ofproto_rule_ref(struct rule *rule) > { > @@ -2650,25 +2667,20 @@ ofproto_rule_ref(struct rule *rule) > } > } > > +/* Decrements 'rule''s ref_count and schedules 'rule' to be destroyed if the > + * ref_count reaches 0. > + * > + * Use of RCU allows short term use (between RCU quiescent periods) without > + * keeping a reference. A reference must be taken if the rule needs to > + * stay around accross the RCU quiescent periods. */ > void > ofproto_rule_unref(struct rule *rule) > { > if (rule && ovs_refcount_unref(&rule->ref_count) == 1) { > -rule->ofproto->ofproto_class->rule_destruct(rule); > -ofproto_rule_destroy__(rule); > +ovsrcu_postpone(rule_destroy_cb, rule); > } > } > > -static void > -ofproto_rule_destroy__(struct rule *rule) > -OVS_NO_THREAD_SAFETY_ANALYSIS > -{ > -cls_rule_destroy(CONST_CAST(struct cls_rule *, &rule->cr)); > -rule_actions_destroy(rule_get_actions(rule)); > -ovs_mutex_destroy(&rule->mutex); > -rule->ofproto->ofproto_class->rule_dealloc(rule); > -} > - > static uint32_t get_provider_meter_id(const struct ofproto *, >uint32_t of_meter_id); > > -- > 1.7.10.4 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/3] ofproto: Make taking rule reference conditional on lookup.
Traditionally we've put function comments just in the .c file, not in the .c and .h file as you've done for rule_dpif_lookup(). That said, I don't know where that convention came from and don't care that much . . . Acked-by: Ethan Jackson On Wed, Apr 23, 2014 at 4:20 PM, Jarno Rajahalme wrote: > Prior to this paths the rule lookup functions have always taken a > reference on the found rule before returning. Make this conditional, > so that unnecessary refs/unrefs can be avoided in a later patch. > > Signed-off-by: Jarno Rajahalme > --- > ofproto/ofproto-dpif-xlate.c |8 +++--- > ofproto/ofproto-dpif.c | 58 > +- > ofproto/ofproto-dpif.h | 20 --- > 3 files changed, 62 insertions(+), 24 deletions(-) > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index 248382f..c62424a 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -2057,7 +2057,7 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t > in_port, uint8_t table_id, >!skip_wildcards >? &ctx->xout->wc : NULL, >honor_table_miss, > - &ctx->table_id, &rule); > + &ctx->table_id, &rule, true); > ctx->xin->flow.in_port.ofp_port = old_in_port; > > if (ctx->xin->resubmit_hook) { > @@ -2090,7 +2090,7 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t > in_port, uint8_t table_id, > } > > choose_miss_rule(config, ctx->xbridge->miss_rule, > - ctx->xbridge->no_packet_in_rule, &rule); > + ctx->xbridge->no_packet_in_rule, &rule, true); > > match: > if (rule) { > @@ -2654,7 +2654,7 @@ xlate_learn_action(struct xlate_ctx *ctx, > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); > entry->u.learn.ofproto = ctx->xin->ofproto; > rule_dpif_lookup(ctx->xbridge->ofproto, &ctx->xin->flow, NULL, > - &entry->u.learn.rule); > + &entry->u.learn.rule, true); > } > } > > @@ -3263,7 +3263,7 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out > *xout) > if (!xin->ofpacts && !ctx.rule) { > ctx.table_id = rule_dpif_lookup(ctx.xbridge->ofproto, flow, > !xin->skip_wildcards ? wc : NULL, > -&rule); > +&rule, true); > if (ctx.xin->resubmit_stats) { > rule_dpif_credit_stats(rule, ctx.xin->resubmit_stats); > } > diff --git a/ofproto/ofproto-dpif.c b/ofproto/ofproto-dpif.c > index 983cad5..5669cd1 100644 > --- a/ofproto/ofproto-dpif.c > +++ b/ofproto/ofproto-dpif.c > @@ -3189,10 +3189,16 @@ rule_dpif_get_actions(const struct rule_dpif *rule) > * > * The return value will be zero unless there was a miss and > * OFPTC11_TABLE_MISS_CONTINUE is in effect for the sequence of tables > - * where misses occur. */ > + * where misses occur. > + * > + * The rule is returned in '*rule', which is valid at least until the next > + * RCU quiescent period. If the '*rule' needs to stay around longer, > + * a non-zero 'take_ref' must be passed in to cause a reference to be taken > + * on it before this returns. */ > uint8_t > rule_dpif_lookup(struct ofproto_dpif *ofproto, struct flow *flow, > - struct flow_wildcards *wc, struct rule_dpif **rule) > + struct flow_wildcards *wc, struct rule_dpif **rule, > + bool take_ref) > { > enum rule_dpif_lookup_verdict verdict; > enum ofputil_port_config config = 0; > @@ -3219,7 +3225,7 @@ rule_dpif_lookup(struct ofproto_dpif *ofproto, struct > flow *flow, > } > > verdict = rule_dpif_lookup_from_table(ofproto, flow, wc, true, > - &table_id, rule); > + &table_id, rule, take_ref); > > switch (verdict) { > case RULE_DPIF_LOOKUP_VERDICT_MATCH: > @@ -3248,13 +3254,17 @@ rule_dpif_lookup(struct ofproto_dpif *ofproto, struct > flow *flow, > } > > choose_miss_rule(config, ofproto->miss_rule, > - ofproto->no_packet_in_rule, rule); > + ofpr
Re: [ovs-dev] [PATCH 3/3] ofproto: Reduce taking rule references.
LGTM Acked-by: Ethan Jackson On Wed, Apr 23, 2014 at 4:20 PM, Jarno Rajahalme wrote: > Only take reference to a looked up rule when needed. > > This reduces the total CPU utilization of rule_ref/unref calls by 80%, > from 5% of total server CPU capacity to 1% in a netperf TCP_CRR > test stressing the userspace. > > Signed-off-by: Jarno Rajahalme > --- > ofproto/ofproto-dpif-xlate.c | 49 > +++--- > 1 file changed, 27 insertions(+), 22 deletions(-) > > diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c > index c62424a..6a56a15 100644 > --- a/ofproto/ofproto-dpif-xlate.c > +++ b/ofproto/ofproto-dpif-xlate.c > @@ -1996,13 +1996,6 @@ xlate_recursively(struct xlate_ctx *ctx, struct > rule_dpif *rule) > if (ctx->xin->resubmit_stats) { > rule_dpif_credit_stats(rule, ctx->xin->resubmit_stats); > } > -if (ctx->xin->xcache) { > -struct xc_entry *entry; > - > -entry = xlate_cache_add_entry(ctx->xin->xcache, XC_RULE); > -entry->u.rule = rule; > -rule_dpif_ref(rule); > -} > > ctx->resubmits++; > ctx->recurse++; > @@ -2057,7 +2050,8 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t > in_port, uint8_t table_id, >!skip_wildcards >? &ctx->xout->wc : NULL, >honor_table_miss, > - &ctx->table_id, &rule, true); > + &ctx->table_id, &rule, > + ctx->xin->xcache != NULL); > ctx->xin->flow.in_port.ofp_port = old_in_port; > > if (ctx->xin->resubmit_hook) { > @@ -2090,12 +2084,22 @@ xlate_table_action(struct xlate_ctx *ctx, ofp_port_t > in_port, uint8_t table_id, > } > > choose_miss_rule(config, ctx->xbridge->miss_rule, > - ctx->xbridge->no_packet_in_rule, &rule, true); > + ctx->xbridge->no_packet_in_rule, &rule, > + ctx->xin->xcache != NULL); > > match: > if (rule) { > +/* Fill in the cache entry here instead of xlate_recursively > + * to make the reference counting more explicit. We take a > + * reference in the lookups above if we are going to cache the > + * rule. */ > +if (ctx->xin->xcache) { > +struct xc_entry *entry; > + > +entry = xlate_cache_add_entry(ctx->xin->xcache, XC_RULE); > +entry->u.rule = rule; > +} > xlate_recursively(ctx, rule); > -rule_dpif_unref(rule); > } > > ctx->table_id = old_table_id; > @@ -2653,6 +2657,8 @@ xlate_learn_action(struct xlate_ctx *ctx, > > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_LEARN); > entry->u.learn.ofproto = ctx->xin->ofproto; > +/* Lookup the learned rule, taking a reference on it. The reference > + * is released when this cache entry is deleted. */ > rule_dpif_lookup(ctx->xbridge->ofproto, &ctx->xin->flow, NULL, > &entry->u.learn.rule, true); > } > @@ -2678,10 +2684,11 @@ xlate_fin_timeout(struct xlate_ctx *ctx, > struct xc_entry *entry; > > entry = xlate_cache_add_entry(ctx->xin->xcache, XC_FIN_TIMEOUT); > +/* XC_RULE already holds a reference on the rule, none is taken > + * here. */ > entry->u.fin.rule = ctx->rule; > entry->u.fin.idle = oft->fin_idle_timeout; > entry->u.fin.hard = oft->fin_hard_timeout; > -rule_dpif_ref(ctx->rule); > } > } > } > @@ -3230,7 +3237,7 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out > *xout) > > ctx.xbridge = xbridge_lookup(xin->ofproto); > if (!ctx.xbridge) { > -goto out; > +return; > } > > ctx.rule = xin->rule; > @@ -3263,7 +3270,7 @@ xlate_actions__(struct xlate_in *xin, struct xlate_out > *xout) > if (!xin->ofpacts && !ctx.rule) { > ctx.table_id = rule_dpif_lookup(ctx.xbridge->ofproto, flow, > !xin->skip_wildcards ? wc : NULL, > -&rule, true); > +
Re: [ovs-dev] [PATCH] bridge: When ports disappear from a datapath, add them back.
Thanks for taking care of this. Acked-by: Ethan Jackson On Wed, Apr 23, 2014 at 5:06 PM, Ben Pfaff wrote: > Before commit 2a73b1d73d4bdb (bridge: Reconfigure in single pass.), if a > port disappeared, for one reason or another, from a datapath, the next > bridge reconfiguration pass would notice and, if the port was still > configured in the database, add the port back to the datapath. That > commit, however, removed the logic from bridge_refresh_ofp_port() that > did that and failed to add the same logic to the replacement function > bridge_delete_or_reconfigure_ports(). This commit fixes the problem. > > To see this problem on a Linux kernel system: > > ovs-vsctl add-br br0 # 1 > tunctl -t tap# 2 > ovs-vsctl add-port br0 tap # 3 > ovs-dpctl show # 4 > tunctl -d tap# 5 > ovs-dpctl show # 6 > tunctl -t tap# 7 > ovs-vsctl del-port tap -- add-port br0 tap # 8 > ovs-dpctl show # 9 > > Steps 1-4 create a bridge and a tap and add it to the bridge and > demonstrate that the tap is part of the datapath. Step 5 and 6 delete > the tap and demonstrate that it has therefore disappeared from the > datapath. Step 7 recreates a tap with the same name, and step 8 > forces ovs-vswitchd to reconfigure. Step 9 shows the effect of the > fix: without the fix, the new tap is not added back to the datapath; > with this fix, it is. > > Special thanks to Gurucharan Shetty for finding a > simple reproduction case and then bisecting to find the commit that > introduced the problem. > > Bug #1238467. > Reported-by: Ronald Lee > Signed-off-by: Ben Pfaff > --- > vswitchd/bridge.c | 44 +--- > 1 file changed, 41 insertions(+), 3 deletions(-) > > diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c > index f46d002..45a1491 100644 > --- a/vswitchd/bridge.c > +++ b/vswitchd/bridge.c > @@ -252,6 +252,7 @@ static bool iface_is_internal(const struct > ovsrec_interface *iface, > static const char *iface_get_type(const struct ovsrec_interface *, >const struct ovsrec_bridge *); > static void iface_destroy(struct iface *); > +static void iface_destroy__(struct iface *); > static struct iface *iface_lookup(const struct bridge *, const char *name); > static struct iface *iface_find(const char *name); > static struct iface *iface_from_ofp_port(const struct bridge *, > @@ -667,6 +668,9 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) > struct ofproto_port ofproto_port; > struct ofproto_port_dump dump; > > +struct sset ofproto_ports; > +struct port *port, *port_next; > + > /* List of "ofp_port"s to delete. We make a list instead of deleting > them > * right away because ofproto implementations aren't necessarily able to > * iterate through a changing list of ports in an entirely robust way. */ > @@ -676,11 +680,21 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) > > del = NULL; > n = allocated = 0; > +sset_init(&ofproto_ports); > > +/* Main task: Iterate over the ports in 'br->ofproto' and remove the > ports > + * that are not configured in the database. (This commonly happens when > + * ports have been deleted, e.g. with "ovs-vsctl del-port".) > + * > + * Side tasks: Reconfigure the ports that are still in 'br'. Delete > ports > + * that have the wrong OpenFlow port number (and arrange to add them back > + * with the correct OpenFlow port number). */ > OFPROTO_PORT_FOR_EACH (&ofproto_port, &dump, br->ofproto) { > ofp_port_t requested_ofp_port; > struct iface *iface; > > +sset_add(&ofproto_ports, ofproto_port.name); > + > iface = iface_lookup(br, ofproto_port.name); > if (!iface) { > /* No such iface is configured, so we should delete this > @@ -746,11 +760,37 @@ bridge_delete_or_reconfigure_ports(struct bridge *br) > iface_destroy(iface); > del = add_ofp_port(ofproto_port.ofp_port, del, &n, &allocated); > } > - > for (i = 0; i < n; i++) { > ofproto_port_del(br->ofproto, del[i]); > } > free(del); > + > +/* Iterate over this module's idea of interfaces in 'br'. Remove any > ports > + * that we didn't see when we iterated through the datapath, i.e. ports > + * th
Re: [ovs-dev] [PATCHv13] ofproto-dpif-upcall: Remove the flow_dumper thread.
W00t, nice to see this in. Ethan On Wed, Apr 23, 2014 at 9:32 PM, Joe Stringer wrote: > Thanks, I tidied up the comments, added threadsafety annotations for xcache > and pushed. > > > On 24 April 2014 06:38, Ben Pfaff wrote: >> >> On Tue, Apr 22, 2014 at 04:54:24PM +1200, Joe Stringer wrote: >> > From: Ethan Jackson >> > >> > Previously, we had a separate flow_dumper thread that fetched flows from >> > the datapath to distribute to revalidator threads. This patch takes the >> > logic for dumping and pushes it into the revalidator threads, resulting >> > in simpler code with similar performance to the current code. >> > >> > One thread, the "leader", is responsible for beginning and ending each >> > flow dump, maintaining the flow_limit, and checking whether the >> > revalidator threads need to exit. All revalidator threads dump, >> > revalidate, delete datapath flows and garbage collect ukeys. >> > >> > Co-authored-by: Joe Stringer >> > Signed-off-by: Joe Stringer >> >> Here in the definition of struct udpif, I would mention that there are >> n_revalidators member of 'ukeys'. Otherwise the casual reader might >> guess that there was only one: >> > +/* During the flow dump phase, revalidators insert into these with >> > a random >> > + * distribution. During the garbage collection phase, each >> > revalidator >> > + * takes care of garbage collecting one of these hmaps. */ >> > +struct { >> > +struct ovs_mutex mutex;/* Guards the following. */ >> > +struct hmap hmap OVS_GUARDED; /* Datapath flow keys. */ >> > +} *ukeys; >> >> In the definition of struct udpif_key, the synchronization of most of >> the members is pretty clear, except for 'xcache'. >> >> Acked-by: Ben Pfaff > > > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 01/10] lib/flow: Simplify miniflow accessors, add ipv6 support.
In the commit message s/suppoort/support/ Out of curiosity why replace the miniflow_get inline functions with macros? > +/* Separate loops for better optimization. */ Why do we need separate loops? You think it somehow improves the branch predictor or something? Does this actually help? It smells like premature optimization. > +hash = mhash_add(hash, saddr[0]); > +hash = mhash_add(hash, saddr[1]); > +hash = mhash_add(hash, saddr[2]); > +hash = mhash_add(hash, saddr[3]); > +hash = mhash_add(hash, daddr[0]); > +hash = mhash_add(hash, daddr[1]); > +hash = mhash_add(hash, daddr[2]); > +hash = mhash_add(hash, daddr[3]); Maybe use a loop? Acked-by: Ethan Jackson ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 02/10] lib: Inline functions used in classifier_lookup
The first line in the commit message needs a period. As Yamamoto asked, please verify that this actually helps. If it doesn't I'd prefer not to merge it. I think we need to get rid of the hindex anyways, so I'd prefer we don't inline it now. Assuming the above is addressed: Acked-by: Ethan Jackson On Mon, Apr 21, 2014 at 8:15 AM, Jarno Rajahalme wrote: > > On Apr 20, 2014, at 7:54 PM, YAMAMOTO Takashi wrote: > >> do these inlining yield measurable improvements? >> > > I have not measured this recently, but I’ll report back with later on this. > >> YAMAMOTO Takashi > > Jarno > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 03/10] ofproto: Use classifer cursor API to collect vlan usage.
Acked-by: Ethan Jackson On Fri, Apr 18, 2014 at 12:41 PM, Jarno Rajahalme wrote: > This was the only place in OVS code that accessed classifier internal > data structures directly. Use the classifier cursor API instead, so > that following patches can hide classifier internal data structures. > > Note: There seems to be no test case to verify that this vlan usage > collection is implemented correctly. > > Signed-off-by: Jarno Rajahalme > --- > ofproto/ofproto.c | 25 +++-- > 1 file changed, 15 insertions(+), 10 deletions(-) > > diff --git a/ofproto/ofproto.c b/ofproto/ofproto.c > index f16005c..e2a593e 100644 > --- a/ofproto/ofproto.c > +++ b/ofproto/ofproto.c > @@ -6984,25 +6984,30 @@ ofproto_unixctl_init(void) > void > ofproto_get_vlan_usage(struct ofproto *ofproto, unsigned long int > *vlan_bitmap) > { > +struct match match; > +struct cls_rule target; > const struct oftable *oftable; > > +match_init_catchall(&match); > +match_set_vlan_vid_masked(&match, htons(VLAN_CFI), htons(VLAN_CFI)); > +cls_rule_init(&target, &match, 0); > + > free(ofproto->vlan_bitmap); > ofproto->vlan_bitmap = bitmap_allocate(4096); > ofproto->vlans_changed = false; > > OFPROTO_FOR_EACH_TABLE (oftable, ofproto) { > -const struct cls_subtable *table; > +struct cls_cursor cursor; > +struct rule *rule; > > fat_rwlock_rdlock(&oftable->cls.rwlock); > -HMAP_FOR_EACH (table, hmap_node, &oftable->cls.subtables) { > -if (minimask_get_vid_mask(&table->mask) == VLAN_VID_MASK) { > -const struct cls_rule *rule; > - > -HMAP_FOR_EACH (rule, hmap_node, &table->rules) { > -uint16_t vid = miniflow_get_vid(&rule->match.flow); > -bitmap_set1(vlan_bitmap, vid); > -bitmap_set1(ofproto->vlan_bitmap, vid); > -} > +cls_cursor_init(&cursor, &oftable->cls, &target); > +CLS_CURSOR_FOR_EACH (rule, cr, &cursor) { > +if (minimask_get_vid_mask(&rule->cr.match.mask) == > VLAN_VID_MASK) { > +uint16_t vid = miniflow_get_vid(&rule->cr.match.flow); > + > +bitmap_set1(vlan_bitmap, vid); > +bitmap_set1(ofproto->vlan_bitmap, vid); > } > } > fat_rwlock_unlock(&oftable->cls.rwlock); > -- > 1.7.10.4 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 04/10] lib/classifier: Hide more of the internal data structures.
So it seems to me that the only data needed in struct classifier by callers is the fat_rwlock. What if we add a classifier_rdlock() and a classifier_wrlock() function. That done, we could entirely hide struct classifier, and would need to make the distinction between it and the cls_classifier. Thoughts? Ethan On Fri, Apr 18, 2014 at 12:41 PM, Jarno Rajahalme wrote: > It is better not to expose definitions not needed by users. > > Signed-off-by: Jarno Rajahalme > --- > lib/classifier.c| 134 > --- > lib/classifier.h| 65 --- > tests/test-classifier.c |6 +-- > 3 files changed, 115 insertions(+), 90 deletions(-) > > diff --git a/lib/classifier.c b/lib/classifier.c > index 8ab1f9c..e48f0a1 100644 > --- a/lib/classifier.c > +++ b/lib/classifier.c > @@ -30,18 +30,70 @@ > > VLOG_DEFINE_THIS_MODULE(classifier); > > +struct trie_node; > + > +/* Prefix trie for a 'field' */ > +struct cls_trie { > +const struct mf_field *field; /* Trie field, or NULL. */ > +struct trie_node *root; /* NULL if none. */ > +}; > + > +struct cls_classifier { > +int n_rules;/* Total number of rules. */ > +uint8_t n_flow_segments; > +uint8_t flow_segments[CLS_MAX_INDICES]; /* Flow segment boundaries to use > + * for staged lookup. */ > +struct hmap subtables; /* Contains "struct cls_subtable"s. */ > +struct list subtables_priority; /* Subtables in descending priority > order. > + */ > +struct hmap partitions; /* Contains "struct cls_partition"s. */ > +struct cls_trie tries[CLS_MAX_TRIES]; /* Prefix tries. */ > +unsigned int n_tries; > +}; > + > +/* A set of rules that all have the same fields wildcarded. */ > +struct cls_subtable { > +struct hmap_node hmap_node; /* Within struct cls_classifier 'subtables' > + * hmap. */ > +struct list list_node; /* Within classifier 'subtables_priority' > list. > + */ > +struct hmap rules; /* Contains "struct cls_rule"s. */ > +struct minimask mask; /* Wildcards for fields. */ > +int n_rules;/* Number of rules, including duplicates. */ > +unsigned int max_priority; /* Max priority of any rule in the subtable. > */ > +unsigned int max_count; /* Count of max_priority rules. */ > +tag_type tag; /* Tag generated from mask for partitioning. > */ > +uint8_t n_indices; /* How many indices to use. */ > +uint8_t index_ofs[CLS_MAX_INDICES]; /* u32 flow segment boundaries. */ > +struct hindex indices[CLS_MAX_INDICES]; /* Staged lookup indices. */ > +unsigned int trie_plen[CLS_MAX_TRIES]; /* Trie prefix length in 'mask'. > */ > +}; > + > +/* Associates a metadata value (that is, a value of the OpenFlow 1.1+ > metadata > + * field) with tags for the "cls_subtable"s that contain rules that match > that > + * metadata value. */ > +struct cls_partition { > +struct hmap_node hmap_node; /* In struct cls_classifier's 'partitions' > + * hmap. */ > +ovs_be64 metadata; /* metadata value for this partition. */ > +tag_type tags; /* OR of each flow's cls_subtable tag. */ > +struct tag_tracker tracker; /* Tracks the bits in 'tags'. */ > +}; > + > + > + > struct trie_ctx; > -static struct cls_subtable *find_subtable(const struct classifier *, > +static struct cls_subtable *find_subtable(const struct cls_classifier *, >const struct minimask *); > -static struct cls_subtable *insert_subtable(struct classifier *, > +static struct cls_subtable *insert_subtable(struct cls_classifier *, > const struct minimask *); > > -static void destroy_subtable(struct classifier *, struct cls_subtable *); > +static void destroy_subtable(struct cls_classifier *, struct cls_subtable *); > > -static void update_subtables_after_insertion(struct classifier *, > +static void update_subtables_after_insertion(struct cls_classifier *, > struct cls_subtable *, > unsigned int new_priority); > -static void update_subtables_after_removal(struct classifier *, > +static void update_subtables_after_removal(struct cls_classifier *, > struct cls_subtable *, > unsigned int del_priority); > > @@ -51,7 +103,7 @@ static struct cls_rule *find_match_wc(const struct > cls_subtable *, >struct flow_wildcards *); > static struct cls_rule *find_equal(struct cls_subtable *, > const struct miniflow *, uint32_t hash); > -static struct cls_rule *ins
Re: [ovs-dev] [PATCH 05/10] lib/flow: Optimize minimask_has_extra() and minimask_is_catchall()
Acked-by: Ethan Jackson On Fri, Apr 18, 2014 at 12:41 PM, Jarno Rajahalme wrote: > We only need to iterate over the bits masked by the 'b' in > minimask_has_extra(), since for zeroes in 'b' there can be no 'extra' > wildcards in 'a', as 'b' has already wildcarded all the bits. > > minimask_is_catchall() can be simplified by the invariant that mask's > map never has 1-bits for all-zero values. > > Signed-off-by: Jarno Rajahalme > --- > lib/flow.c | 31 ++- > lib/flow.h | 12 +++- > 2 files changed, 17 insertions(+), 26 deletions(-) > > diff --git a/lib/flow.c b/lib/flow.c > index a227f73..22e95d2 100644 > --- a/lib/flow.c > +++ b/lib/flow.c > @@ -1862,19 +1862,17 @@ minimask_equal(const struct minimask *a, const struct > minimask *b) > return miniflow_equal(&a->masks, &b->masks); > } > > -/* Returns true if at least one bit is wildcarded in 'a_' but not in 'b_', > +/* Returns true if at least one bit matched by 'b' is wildcarded by 'a', > * false otherwise. */ > bool > -minimask_has_extra(const struct minimask *a_, const struct minimask *b_) > +minimask_has_extra(const struct minimask *a, const struct minimask *b) > { > -const struct miniflow *a = &a_->masks; > -const struct miniflow *b = &b_->masks; > +const uint32_t *p = b->masks.values; > uint64_t map; > > -for (map = a->map | b->map; map; map = zero_rightmost_1bit(map)) { > -int ofs = raw_ctz(map); > -uint32_t a_u32 = miniflow_get(a, ofs); > -uint32_t b_u32 = miniflow_get(b, ofs); > +for (map = b->masks.map; map; map = zero_rightmost_1bit(map)) { > +uint32_t a_u32 = minimask_get(a, raw_ctz(map)); > +uint32_t b_u32 = *p++; > > if ((a_u32 & b_u32) != b_u32) { > return true; > @@ -1883,20 +1881,3 @@ minimask_has_extra(const struct minimask *a_, const > struct minimask *b_) > > return false; > } > - > -/* Returns true if 'mask' matches every packet, false if 'mask' fixes any > bits > - * or fields. */ > -bool > -minimask_is_catchall(const struct minimask *mask_) > -{ > -const struct miniflow *mask = &mask_->masks; > -const uint32_t *p = mask->values; > -uint64_t map; > - > -for (map = mask->map; map; map = zero_rightmost_1bit(map)) { > -if (*p++) { > -return false; > -} > -} > -return true; > -} > diff --git a/lib/flow.h b/lib/flow.h > index 0ed4f6a..47b 100644 > --- a/lib/flow.h > +++ b/lib/flow.h > @@ -484,7 +484,17 @@ static inline ovs_be64 minimask_get_metadata_mask(const > struct minimask *); > > bool minimask_equal(const struct minimask *a, const struct minimask *b); > bool minimask_has_extra(const struct minimask *, const struct minimask *); > -bool minimask_is_catchall(const struct minimask *); > + > +/* Returns true if 'mask' matches every packet, false if 'mask' fixes any > bits > + * or fields. */ > +static inline bool > +minimask_is_catchall(const struct minimask *mask) > +{ > +/* For every 1-bit in mask's map, the corresponding value is non-zero, > + * so the only way the mask can not fix any bits or fields is for the > + * map the be zero. */ > +return mask->masks.map == 0; > +} > > > > -- > 1.7.10.4 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 06/10] lib: Add prefetch support (for GCC)
I think there should be a comment explaining when to use prefetch vs prefetch_write. I certainly don't know off the top of my head. Acked-by: Ethan Jackson On Fri, Apr 18, 2014 at 12:41 PM, Jarno Rajahalme wrote: > Define OVS_PREFETCH() and OVS_PREFETCH_WRITE() using builtin prefetch > for GCC, and ovs_prefetch_range() for prefetching a range of addresses. > > Signed-off-by: Jarno Rajahalme > --- > lib/compiler.h |8 > lib/util.h | 14 ++ > 2 files changed, 22 insertions(+) > > diff --git a/lib/compiler.h b/lib/compiler.h > index 3b59813..981605c 100644 > --- a/lib/compiler.h > +++ b/lib/compiler.h > @@ -206,4 +206,12 @@ > static void f(void) > #endif > > +#if __GNUC__ > +#define OVS_PREFETCH(addr) __builtin_prefetch((addr)) > +#define OVS_PREFETCH_WRITE(addr) __builtin_prefetch((addr), 1) > +#else > +#define OVS_PREFETCH(addr) > +#define OVS_PREFETCH_WRITE(addr) > +#endif > + > #endif /* compiler.h */ > diff --git a/lib/util.h b/lib/util.h > index aff17a5..5c3668a 100644 > --- a/lib/util.h > +++ b/lib/util.h > @@ -151,6 +151,19 @@ is_pow2(uintmax_t x) > #define CACHE_LINE_SIZE 64 > BUILD_ASSERT_DECL(IS_POW2(CACHE_LINE_SIZE)); > > +#define CACHE_LINE_SIZE 64 /* Correct for most CPUs. */ > + > +static inline void > +ovs_prefetch_range(const void *start, size_t size) > +{ > +const char *addr = (const char *)start; > +size_t ofs; > + > +for (ofs = 0; ofs < size; ofs += CACHE_LINE_SIZE) { > +OVS_PREFETCH(addr + ofs); > +} > +} > + > #ifndef MIN > #define MIN(X, Y) ((X) < (Y) ? (X) : (Y)) > #endif > @@ -503,6 +516,7 @@ uint64_t bitwise_get(const void *src, unsigned int > src_len, > unsigned int src_ofs, unsigned int n_bits); > > void xsleep(unsigned int seconds); > + > #ifdef _WIN32 > > char *ovs_format_message(int error); > -- > 1.7.10.4 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 07/10] classifier: Use array for subtables instead of a list.
In the cache_push_back function, you might consider using the x2nrealloc() function. Does this actually help? I've found we spend most of our time getting the memory for the rule, not the subtable itself. Ethan On Fri, Apr 18, 2014 at 12:41 PM, Jarno Rajahalme wrote: > Using a linear array allows more efficient memory access for lookups. > > Signed-off-by: Jarno Rajahalme > --- > lib/classifier.c | 239 > +- > 1 file changed, 182 insertions(+), 57 deletions(-) > > diff --git a/lib/classifier.c b/lib/classifier.c > index e48f0a1..0e67b7f 100644 > --- a/lib/classifier.c > +++ b/lib/classifier.c > @@ -38,14 +38,26 @@ struct cls_trie { > struct trie_node *root; /* NULL if none. */ > }; > > +struct cls_subtable_entry { > +struct cls_subtable *subtable; > +uint32_t *mask_values; > +tag_type tag; > +unsigned int max_priority; > +}; > + > +struct cls_subtable_cache { > +struct cls_subtable_entry *subtables; > +size_t alloc_size; /* Number of allocated elements. */ > +size_t size; /* One past last valid array element. */ > +}; > + > struct cls_classifier { > int n_rules;/* Total number of rules. */ > uint8_t n_flow_segments; > uint8_t flow_segments[CLS_MAX_INDICES]; /* Flow segment boundaries to use > * for staged lookup. */ > struct hmap subtables; /* Contains "struct cls_subtable"s. */ > -struct list subtables_priority; /* Subtables in descending priority > order. > - */ > +struct cls_subtable_cache subtables_priority; > struct hmap partitions; /* Contains "struct cls_partition"s. */ > struct cls_trie tries[CLS_MAX_TRIES]; /* Prefix tries. */ > unsigned int n_tries; > @@ -55,8 +67,6 @@ struct cls_classifier { > struct cls_subtable { > struct hmap_node hmap_node; /* Within struct cls_classifier 'subtables' > * hmap. */ > -struct list list_node; /* Within classifier 'subtables_priority' > list. > - */ > struct hmap rules; /* Contains "struct cls_rule"s. */ > struct minimask mask; /* Wildcards for fields. */ > int n_rules;/* Number of rules, including duplicates. */ > @@ -131,6 +141,84 @@ static void mask_set_prefix_bits(struct flow_wildcards > *, uint8_t be32ofs, > unsigned int nbits); > static bool mask_prefix_bits_set(const struct flow_wildcards *, > uint8_t be32ofs, unsigned int nbits); > + > +static void > +cls_subtable_cache_init(struct cls_subtable_cache *array) > +{ > +memset(array, 0, sizeof *array); > +} > + > +static void > +cls_subtable_cache_destroy(struct cls_subtable_cache *array) > +{ > +free(array->subtables); > +memset(array, 0, sizeof *array); > +} > + > +/* Array insertion. */ > +static void > +cls_subtable_cache_push_back(struct cls_subtable_cache *array, > + struct cls_subtable_entry a) > +{ > +if (array->size == array->alloc_size) { > +array->alloc_size = (array->alloc_size + 1) * 2; > +array->subtables = xrealloc(array->subtables, > +array->alloc_size * sizeof a); > +} > + > +array->subtables[array->size++] = a; > +} > + > +/* Only for rearranging entries in the same cache. */ > +static inline void > +cls_subtable_cache_splice(struct cls_subtable_entry *to, > + struct cls_subtable_entry *start, > + struct cls_subtable_entry *end) > +{ > +if (to > end) { > +/* Same as splicing entries to (start) from [end, to). */ > +struct cls_subtable_entry *temp = to; > +to = start; start = end; end = temp; > +} > +if (to < start) { > +while (start != end) { > +struct cls_subtable_entry temp = *start; > + > +memmove(to + 1, to, (start - to) * sizeof *to); > +*to = temp; > +start++; > +} > +} /* Else nothing to be done. */ > +} > + > +/* Array removal. */ > +static inline void > +cls_subtable_cache_remove(struct cls_subtable_cache *array, > + struct cls_subtable_entry *elem) > +{ > +ssize_t size = (&array->subtables[array->size] > +- (elem + 1)) * sizeof *elem; > +if (size > 0) { > +memmove(elem, elem + 1, size); > +} > +array->size--; > +} > + > +#define CLS_SUBTABLE_CACHE_FOR_EACH(SUBTABLE, ITER, ARRAY) \ > +for (ITER = (ARRAY)->subtables; \ > + ITER < &(ARRAY)->subtables[(ARRAY)->size] \ > + && OVS_LIKELY(SUBTABLE = ITER->subtable); \ > + ++ITER) > +#define CLS_SUBTABLE_CACHE_FOR_EACH_CONTINUE(SUBTABLE, ITER, ARRAY) \ > +for (++ITER;
Re: [ovs-dev] [PATCH 08/10] lib/classifier: Separate cls_rule internals from the API.
I haven't read this patch yet, but a high level question. Why not just hide all of struct cls_rule and make callers embed a pointer to it? We'd replace the cls_rule_init() function with a cls_rule_create() function which mallocs the rule and returns it (for example). Ethan On Fri, Apr 18, 2014 at 12:41 PM, Jarno Rajahalme wrote: > Keep an internal representation of a rule separate from the one > embedded into user's structs. This allows for further memory > optimization in the classifier. > > Signed-off-by: Jarno Rajahalme > --- > lib/classifier.c| 211 > +-- > lib/classifier.h| 22 ++--- > tests/test-classifier.c | 18 ++-- > 3 files changed, 149 insertions(+), 102 deletions(-) > > diff --git a/lib/classifier.c b/lib/classifier.c > index 0e67b7f..0517aa7 100644 > --- a/lib/classifier.c > +++ b/lib/classifier.c > @@ -51,6 +51,10 @@ struct cls_subtable_cache { > size_t size; /* One past last valid array element. */ > }; > > +enum { > +CLS_MAX_INDICES = 3 /* Maximum number of lookup indices per subtable. > */ > +}; > + > struct cls_classifier { > int n_rules;/* Total number of rules. */ > uint8_t n_flow_segments; > @@ -90,7 +94,30 @@ struct cls_partition { > struct tag_tracker tracker; /* Tracks the bits in 'tags'. */ > }; > > +/* Internal representation of a rule in a "struct cls_subtable". */ > +struct cls_match { > +struct cls_rule *cls_rule; > +struct hindex_node index_nodes[CLS_MAX_INDICES]; /* Within subtable's > + * 'indices'. */ > +struct hmap_node hmap_node; /* Within struct cls_subtable 'rules'. */ > +unsigned int priority; /* Larger numbers are higher priorities. */ > +struct cls_partition *partition; > +struct list list; /* List of identical, lower-priority rules. > */ > +struct minimatch match; /* Matching rule. */ > +}; > > +static struct cls_match * > +cls_match_alloc(struct cls_rule *rule) > +{ > +struct cls_match *cls_match = xmalloc(sizeof *cls_match); > + > +cls_match->cls_rule = rule; > +minimatch_clone(&cls_match->match, &rule->match); > +cls_match->priority = rule->priority; > +rule->cls_match = cls_match; > + > +return cls_match; > +} > > struct trie_ctx; > static struct cls_subtable *find_subtable(const struct cls_classifier *, > @@ -107,14 +134,14 @@ static void update_subtables_after_removal(struct > cls_classifier *, > struct cls_subtable *, > unsigned int del_priority); > > -static struct cls_rule *find_match_wc(const struct cls_subtable *, > - const struct flow *, struct trie_ctx *, > - unsigned int n_tries, > - struct flow_wildcards *); > -static struct cls_rule *find_equal(struct cls_subtable *, > - const struct miniflow *, uint32_t hash); > -static struct cls_rule *insert_rule(struct cls_classifier *, > -struct cls_subtable *, struct cls_rule > *); > +static struct cls_match *find_match_wc(const struct cls_subtable *, > + const struct flow *, struct trie_ctx > *, > + unsigned int n_tries, > + struct flow_wildcards *); > +static struct cls_match *find_equal(struct cls_subtable *, > +const struct miniflow *, uint32_t hash); > +static struct cls_match *insert_rule(struct cls_classifier *, > + struct cls_subtable *, struct cls_rule > *); > > /* Iterates RULE over HEAD and all of the cls_rules on HEAD->list. */ > #define FOR_EACH_RULE_IN_LIST(RULE, HEAD) \ > @@ -124,8 +151,8 @@ static struct cls_rule *insert_rule(struct cls_classifier > *, > (RULE) != NULL && ((NEXT) = next_rule_in_list(RULE), true);\ > (RULE) = (NEXT)) > > -static struct cls_rule *next_rule_in_list__(struct cls_rule *); > -static struct cls_rule *next_rule_in_list(struct cls_rule *); > +static struct cls_match *next_rule_in_list__(struct cls_match *); > +static struct cls_match *next_rule_in_list(struct cls_match *); > > static unsigned int minimask_get_prefix_len(const struct minimask *, > const struct mf_field *); > @@ -415,6 +442,7 @@ cls_rule_init(struct cls_rule *rule, > { > minimatch_init(&rule->match, match); > rule->priority = priority; > +rule->cls_match = NULL; > } > > /* Same as cls_rule_init() for initialization from a "struct minimatch". */ > @@ -425,6 +453,7 @@ cls_rule_init_from_minimatch(struct cls_rule *rule, > { > minimatch_clone(&rule->match, match); > rule->priori
Re: [ovs-dev] [PATCH 08/10] lib/classifier: Separate cls_rule internals from the API.
Acked-by: Ethan Jackson I don't really like the API we've ended up with here. But I think it's fine for now. As discussed offline, I intend to change it later. Ethan On Thu, Apr 24, 2014 at 6:25 PM, Ethan Jackson wrote: > I haven't read this patch yet, but a high level question. Why not > just hide all of struct cls_rule and make callers embed a pointer to > it? We'd replace the cls_rule_init() function with a > cls_rule_create() function which mallocs the rule and returns it (for > example). > > Ethan > > On Fri, Apr 18, 2014 at 12:41 PM, Jarno Rajahalme > wrote: >> Keep an internal representation of a rule separate from the one >> embedded into user's structs. This allows for further memory >> optimization in the classifier. >> >> Signed-off-by: Jarno Rajahalme >> --- >> lib/classifier.c| 211 >> +-- >> lib/classifier.h| 22 ++--- >> tests/test-classifier.c | 18 ++-- >> 3 files changed, 149 insertions(+), 102 deletions(-) >> >> diff --git a/lib/classifier.c b/lib/classifier.c >> index 0e67b7f..0517aa7 100644 >> --- a/lib/classifier.c >> +++ b/lib/classifier.c >> @@ -51,6 +51,10 @@ struct cls_subtable_cache { >> size_t size; /* One past last valid array element. */ >> }; >> >> +enum { >> +CLS_MAX_INDICES = 3 /* Maximum number of lookup indices per subtable. >> */ >> +}; >> + >> struct cls_classifier { >> int n_rules;/* Total number of rules. */ >> uint8_t n_flow_segments; >> @@ -90,7 +94,30 @@ struct cls_partition { >> struct tag_tracker tracker; /* Tracks the bits in 'tags'. */ >> }; >> >> +/* Internal representation of a rule in a "struct cls_subtable". */ >> +struct cls_match { >> +struct cls_rule *cls_rule; >> +struct hindex_node index_nodes[CLS_MAX_INDICES]; /* Within subtable's >> + * 'indices'. */ >> +struct hmap_node hmap_node; /* Within struct cls_subtable 'rules'. */ >> +unsigned int priority; /* Larger numbers are higher priorities. */ >> +struct cls_partition *partition; >> +struct list list; /* List of identical, lower-priority rules. >> */ >> +struct minimatch match; /* Matching rule. */ >> +}; >> >> +static struct cls_match * >> +cls_match_alloc(struct cls_rule *rule) >> +{ >> +struct cls_match *cls_match = xmalloc(sizeof *cls_match); >> + >> +cls_match->cls_rule = rule; >> +minimatch_clone(&cls_match->match, &rule->match); >> +cls_match->priority = rule->priority; >> +rule->cls_match = cls_match; >> + >> +return cls_match; >> +} >> >> struct trie_ctx; >> static struct cls_subtable *find_subtable(const struct cls_classifier *, >> @@ -107,14 +134,14 @@ static void update_subtables_after_removal(struct >> cls_classifier *, >> struct cls_subtable *, >> unsigned int del_priority); >> >> -static struct cls_rule *find_match_wc(const struct cls_subtable *, >> - const struct flow *, struct trie_ctx >> *, >> - unsigned int n_tries, >> - struct flow_wildcards *); >> -static struct cls_rule *find_equal(struct cls_subtable *, >> - const struct miniflow *, uint32_t hash); >> -static struct cls_rule *insert_rule(struct cls_classifier *, >> -struct cls_subtable *, struct cls_rule >> *); >> +static struct cls_match *find_match_wc(const struct cls_subtable *, >> + const struct flow *, struct trie_ctx >> *, >> + unsigned int n_tries, >> + struct flow_wildcards *); >> +static struct cls_match *find_equal(struct cls_subtable *, >> +const struct miniflow *, uint32_t hash); >> +static struct cls_match *insert_rule(struct cls_classifier *, >> + struct cls_subtable *, struct cls_rule >> *); >> >> /* Iterates RULE over HEAD and all of the cls_rules on HEAD->list. */ >> #define FOR_EACH_RULE_IN_LIST(RULE, HEAD) \ >> @@
Re: [ovs-dev] [PATCH 09/10] lib/flow: Maintain miniflow offline values explicitly.
Acked-by: Ethan Jackson I'm not sure I totally follow Kmindg's comment, perhaps he could explain further? At any rate, once addressed I"m happy with this. Ethan On Sat, Apr 19, 2014 at 10:09 PM, Kmindg G wrote: > On Sat, Apr 19, 2014 at 3:42 AM, Jarno Rajahalme > wrote: >> This allows use of miniflows that have all of their values inline. >> >> Signed-off-by: Jarno Rajahalme >> --- >> lib/classifier.c | 36 +++-- >> lib/dpif-netdev.c | 32 ++- >> lib/flow.c| 91 >> ++--- >> lib/flow.h| 70 + >> lib/match.c |4 +-- >> 5 files changed, 127 insertions(+), 106 deletions(-) >> >> diff --git a/lib/classifier.c b/lib/classifier.c >> index 0517aa7..75befad 100644 >> --- a/lib/classifier.c >> +++ b/lib/classifier.c >> @@ -40,7 +40,7 @@ struct cls_trie { >> >> struct cls_subtable_entry { >> struct cls_subtable *subtable; >> -uint32_t *mask_values; >> +const uint32_t *mask_values; >> tag_type tag; >> unsigned int max_priority; >> }; >> @@ -279,8 +279,9 @@ static inline uint32_t >> flow_hash_in_minimask(const struct flow *flow, const struct minimask *mask, >>uint32_t basis) >> { >> +const uint32_t *mask_values = miniflow_get_u32_values(&mask->masks); >> const uint32_t *flow_u32 = (const uint32_t *)flow; >> -const uint32_t *p = mask->masks.values; >> +const uint32_t *p = mask_values; >> uint32_t hash; >> uint64_t map; >> >> @@ -289,7 +290,7 @@ flow_hash_in_minimask(const struct flow *flow, const >> struct minimask *mask, >> hash = mhash_add(hash, flow_u32[raw_ctz(map)] & *p++); >> } >> >> -return mhash_finish(hash, (p - mask->masks.values) * 4); >> +return mhash_finish(hash, (p - mask_values) * 4); >> } >> >> /* Returns a hash value for the bits of 'flow' where there are 1-bits in >> @@ -301,7 +302,8 @@ static inline uint32_t >> miniflow_hash_in_minimask(const struct miniflow *flow, >>const struct minimask *mask, uint32_t basis) >> { >> -const uint32_t *p = mask->masks.values; >> +const uint32_t *mask_values = miniflow_get_u32_values(&mask->masks); >> +const uint32_t *p = mask_values; >> uint32_t hash = basis; >> uint32_t flow_u32; >> >> @@ -309,7 +311,7 @@ miniflow_hash_in_minimask(const struct miniflow *flow, >> hash = mhash_add(hash, flow_u32 & *p++); >> } >> >> -return mhash_finish(hash, (p - mask->masks.values) * 4); >> +return mhash_finish(hash, (p - mask_values) * 4); >> } >> >> /* Returns a hash value for the bits of range [start, end) in 'flow', >> @@ -322,11 +324,12 @@ flow_hash_in_minimask_range(const struct flow *flow, >> const struct minimask *mask, >> uint8_t start, uint8_t end, uint32_t *basis) >> { >> +const uint32_t *mask_values = miniflow_get_u32_values(&mask->masks); >> const uint32_t *flow_u32 = (const uint32_t *)flow; >> unsigned int offset; >> uint64_t map = miniflow_get_map_in_range(&mask->masks, start, end, >> &offset); >> -const uint32_t *p = mask->masks.values + offset; >> +const uint32_t *p = mask_values + offset; >> uint32_t hash = *basis; >> >> for (; map; map = zero_rightmost_1bit(map)) { >> @@ -334,7 +337,7 @@ flow_hash_in_minimask_range(const struct flow *flow, >> } >> >> *basis = hash; /* Allow continuation from the unfinished value. */ >> -return mhash_finish(hash, (p - mask->masks.values) * 4); >> +return mhash_finish(hash, (p - mask_values) * 4); >> } >> >> /* Fold minimask 'mask''s wildcard mask into 'wc's wildcard mask. */ >> @@ -356,7 +359,7 @@ flow_wildcards_fold_minimask_range(struct flow_wildcards >> *wc, >> unsigned int offset; >> uint64_t map = miniflow_get_map_in_range(&mask->masks, start, end, >> &offset); >> -const uint32_t *p = mask->masks.values + offset; >> +const uint32_t *p = miniflow_get_u32_values(&mask->masks) + offset; >> >> for (; map; map = zero_rightmost_1bit(map)) { >
Re: [ovs-dev] [PATCH 10/10] lib/classifier: Support variable sized miniflows.
The comment of miniflow_and_mask_matches_flow() need to be reworded. Do we really need that function? Is it really that much faster? Having two functions which do the exact same thing is a bit confusing. So I suppose the only reason for a non zero inline_values array is for those callers who can't allocate something that's variable sized, but still want to avoid the extra memory allocation. However, I don't think there are any of those callers which are performance critical. I'm wondering if we could simplify this significantly by always having offline values passed in by the caller. For those callers which can make it variable sized, they would, everyone else would make it FLOW_U32s sized? What do you think? Ethan On Sat, Apr 19, 2014 at 9:51 PM, Kmindg G wrote: > On Sat, Apr 19, 2014 at 3:42 AM, Jarno Rajahalme > wrote: >> Change the classifier to allocate variable sized miniflows and >> minimasks in cls_match and cls_subtable, respectively. Do not >> duplicate the mask in cls_rule any more. >> >> miniflow_clone and miniflow_move can now take variably sized miniflows >> as source. The destination is assumed to be regularly sized miniflow. >> >> Inlining miniflow and mask values reduces memory indirection and helps >> reduce cache misses. >> >> Signed-off-by: Jarno Rajahalme >> --- >> lib/classifier.c | 82 >> +++--- >> lib/flow.c | 55 +++- >> lib/flow.h | 30 ++-- >> 3 files changed, 134 insertions(+), 33 deletions(-) >> >> diff --git a/lib/classifier.c b/lib/classifier.c >> index 75befad..1198a76 100644 >> --- a/lib/classifier.c >> +++ b/lib/classifier.c >> @@ -40,7 +40,6 @@ struct cls_trie { >> >> struct cls_subtable_entry { >> struct cls_subtable *subtable; >> -const uint32_t *mask_values; >> tag_type tag; >> unsigned int max_priority; >> }; >> @@ -72,7 +71,6 @@ struct cls_subtable { >> struct hmap_node hmap_node; /* Within struct cls_classifier 'subtables' >> * hmap. */ >> struct hmap rules; /* Contains "struct cls_rule"s. */ >> -struct minimask mask; /* Wildcards for fields. */ >> int n_rules;/* Number of rules, including duplicates. */ >> unsigned int max_priority; /* Max priority of any rule in the >> subtable. */ >> unsigned int max_count; /* Count of max_priority rules. */ >> @@ -81,6 +79,8 @@ struct cls_subtable { >> uint8_t index_ofs[CLS_MAX_INDICES]; /* u32 flow segment boundaries. */ >> struct hindex indices[CLS_MAX_INDICES]; /* Staged lookup indices. */ >> unsigned int trie_plen[CLS_MAX_TRIES]; /* Trie prefix length in >> 'mask'. */ >> +struct minimask mask; /* Wildcards for fields. */ >> +/* 'mask' must be the last field. */ >> }; >> >> /* Associates a metadata value (that is, a value of the OpenFlow 1.1+ >> metadata >> @@ -103,16 +103,21 @@ struct cls_match { >> unsigned int priority; /* Larger numbers are higher priorities. */ >> struct cls_partition *partition; >> struct list list; /* List of identical, lower-priority rules. >> */ >> -struct minimatch match; /* Matching rule. */ >> +struct miniflow flow; /* Matching rule. Mask is in the subtable. >> */ >> +/* 'flow' must be the last field. */ >> }; >> >> static struct cls_match * >> cls_match_alloc(struct cls_rule *rule) >> { >> -struct cls_match *cls_match = xmalloc(sizeof *cls_match); >> +int count = count_1bits(rule->match.flow.map); >> + >> +struct cls_match *cls_match >> += xmalloc(sizeof *cls_match - sizeof cls_match->flow.inline_values >> + + MINIFLOW_VALUES_SIZE(count)); > > Would it lead to a potential array access violation problem when > 'sizeof cls_match->flow.inline_values' is bigger than > 'MINIFLOW_VALUES_SIZE(count)'? > >> >> cls_match->cls_rule = rule; >> -minimatch_clone(&cls_match->match, &rule->match); >> +miniflow_clone_inline(&cls_match->flow, &rule->match.flow, count); >> cls_match->priority = rule->priority; >> rule->cls_match = cls_match; >> >> @@ -875,7 +880,6 @@ static inline void >> lookahead_subtable(const struct cls_subtable_entry *subtables) >> { >> ovs_prefetch_range(subtables->subtable, sizeof *subtables->subtable); >> -ovs_prefetch_range(subtables->mask_values, 1); >> } >> >> /* Finds and returns the highest-priority rule in 'cls' that matches 'flow'. >> @@ -969,16 +973,19 @@ classifier_lookup(const struct classifier *cls_, const >> struct flow *flow, >> } >> >> /* Returns true if 'target' satisifies 'match', that is, if each bit for >> which >> - * 'match' specifies a particular value has the correct value in 'target'. >> */ >> + * 'match' specifies a particular value has the correct value in 'target'. >> + * >> + * 'flow' and 'mask' have the same mask! */ >> static bool >> -minimatch_ma
Re: [ovs-dev] [PATCH 04/10] lib/classifier: Hide more of the internal data structures.
Fine with me go ahead and push it. Ethan On Tue, Apr 29, 2014 at 12:40 PM, Jarno Rajahalme wrote: > > On Apr 24, 2014, at 5:40 PM, Ethan Jackson wrote: > >> So it seems to me that the only data needed in struct classifier by >> callers is the fat_rwlock. What if we add a classifier_rdlock() and a >> classifier_wrlock() function. That done, we could entirely hide >> struct classifier, and would need to make the distinction between it >> and the cls_classifier. Thoughts? > > I wanted to avoid API changes in this series, but I like this in general. > However, this would prevent the OVS_REQ_RDLOCK() type of static analyzer > annotations on the API functions, and would result in additional function > call indirections for the lock and unlock functions. > > Moreover, we should consider how to replace (most of) the locking with RCU, > which would be another change on the classifier locking interface. > > Considering the above I would be inclined to push this patch as a step > towards the above goals, with the understanding that more work on this will > be done later. > > Jarno > >> >> Ethan >> >> On Fri, Apr 18, 2014 at 12:41 PM, Jarno Rajahalme >> wrote: >>> It is better not to expose definitions not needed by users. >>> >>> Signed-off-by: Jarno Rajahalme >>> --- >>> lib/classifier.c| 134 >>> --- >>> lib/classifier.h| 65 --- >>> tests/test-classifier.c |6 +-- >>> 3 files changed, 115 insertions(+), 90 deletions(-) >>> >>> diff --git a/lib/classifier.c b/lib/classifier.c >>> index 8ab1f9c..e48f0a1 100644 >>> --- a/lib/classifier.c >>> +++ b/lib/classifier.c >>> @@ -30,18 +30,70 @@ >>> >>> VLOG_DEFINE_THIS_MODULE(classifier); >>> >>> +struct trie_node; >>> + >>> +/* Prefix trie for a 'field' */ >>> +struct cls_trie { >>> +const struct mf_field *field; /* Trie field, or NULL. */ >>> +struct trie_node *root; /* NULL if none. */ >>> +}; >>> + >>> +struct cls_classifier { >>> +int n_rules;/* Total number of rules. */ >>> +uint8_t n_flow_segments; >>> +uint8_t flow_segments[CLS_MAX_INDICES]; /* Flow segment boundaries to >>> use >>> + * for staged lookup. */ >>> +struct hmap subtables; /* Contains "struct cls_subtable"s. */ >>> +struct list subtables_priority; /* Subtables in descending priority >>> order. >>> + */ >>> +struct hmap partitions; /* Contains "struct cls_partition"s. */ >>> +struct cls_trie tries[CLS_MAX_TRIES]; /* Prefix tries. */ >>> +unsigned int n_tries; >>> +}; >>> + >>> +/* A set of rules that all have the same fields wildcarded. */ >>> +struct cls_subtable { >>> +struct hmap_node hmap_node; /* Within struct cls_classifier 'subtables' >>> + * hmap. */ >>> +struct list list_node; /* Within classifier 'subtables_priority' >>> list. >>> + */ >>> +struct hmap rules; /* Contains "struct cls_rule"s. */ >>> +struct minimask mask; /* Wildcards for fields. */ >>> +int n_rules;/* Number of rules, including duplicates. >>> */ >>> +unsigned int max_priority; /* Max priority of any rule in the >>> subtable. */ >>> +unsigned int max_count; /* Count of max_priority rules. */ >>> +tag_type tag; /* Tag generated from mask for >>> partitioning. */ >>> +uint8_t n_indices; /* How many indices to use. */ >>> +uint8_t index_ofs[CLS_MAX_INDICES]; /* u32 flow segment boundaries. */ >>> +struct hindex indices[CLS_MAX_INDICES]; /* Staged lookup indices. */ >>> +unsigned int trie_plen[CLS_MAX_TRIES]; /* Trie prefix length in >>> 'mask'. */ >>> +}; >>> + >>> +/* Associates a metadata value (that is, a value of the OpenFlow 1.1+ >>> metadata >>> + * field) with tags for the "cls_subtable"s that contain rules that match >>> that >>> + * metadata value. */ >>> +struct cls_partition { >>> +struct hmap_node hmap_node; /* In struct cls_classifier's 'partitions
Re: [ovs-dev] Userspace Netlink MMAP status
> The common use case is a flow expiring during the lifetime of a TCP > connection. It will result in multiple data packets being sent upwards. > It's much less likely in the megaflows era though. I think this is pretty unlikely except in some extreme circumstances. For a flow to be removed from the datapath, it needs to be idle (unused) for about 1.5s. One could construct pathological cases in which this would be the case for bursty high bandwidth flows, but I highly doubt it's typical. Ethan > >> >As we extend the GSO path into the upcall and make use of the new DPDK >> >style ofpbuf to avoid the memcpy() for the mmap case >> Can you elaborate a bit more on this? > > The current upcall code does segmentation which is not required and is > expensive for the above mentioned case. A single 64K GSO packet will > automatically result in up to 50 upcalls. > > Also, right now, the first thing we do in the mmap case is copy the > buffer into an ofpbuf. This is not required at all and the copy is > expensive, instead, we should make use of the shared memory just like > in the DPDK case and only release the buffer after the packet has been > fully processed. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 07/10] classifier: Use array for subtables instead of a list.
Sounds good. Ethan On Tue, Apr 29, 2014 at 12:58 PM, Jarno Rajahalme wrote: > > On Apr 24, 2014, at 6:15 PM, Ethan Jackson wrote: > >> In the cache_push_back function, you might consider using the >> x2nrealloc() function. >> > > I did not know this existed, thanks! > >> Does this actually help? I've found we spend most of our time getting >> the memory for the rule, not the subtable itself. >> > > After this series I see cache misses only on fetching the ‘next’ rule. > Before, I saw also misses for the subtable (being fetched via a next pointer > as well) and/or it’s mask. So this helps :-) > > Jarno > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 10/10] lib/classifier: Support variable sized miniflows.
> The almost equivalent function miniflow_equal_flow_in_minimask() does not > require the miniflow and minimask to have the same map, and it is slower for > it. However, it is only used by the test-classifier.c, so one option to > reduce the confusion would be to move it there. What do you think? That sounds like a good option to me. > I suggest we work on this as on work-in-progress, and check the current > series in to get a stable basis for that work. Sounds reasonable. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 10/10] lib/classifier: Support variable sized miniflows.
Meh I changed my mind again. Just merge the series instead of holding back, I'll make my tweaks on top of master. Ethan On Tue, Apr 29, 2014 at 2:34 PM, Ethan Jackson wrote: >> The almost equivalent function miniflow_equal_flow_in_minimask() does not >> require the miniflow and minimask to have the same map, and it is slower for >> it. However, it is only used by the test-classifier.c, so one option to >> reduce the confusion would be to move it there. What do you think? > > That sounds like a good option to me. > >> I suggest we work on this as on work-in-progress, and check the current >> series in to get a stable basis for that work. > > Sounds reasonable. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev