Re: [ovs-dev] [threads 28/28] vlog: Make thread-safe.
On 10 July 2013 19:04, Ben Pfaff wrote: > static void format_log_message(const struct vlog_module *, enum vlog_level, > - enum vlog_facility, unsigned int msg_num, > + enum vlog_facility, > const char *message, va_list, struct ds *) > PRINTF_FORMAT(5, 0); This should be PRINTF_FORMAT(4, 0) now. X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 05/28] timeval: Always block SIGALRM in time_poll().
On 10 July 2013 19:03, Ben Pfaff wrote: > (The usual reason for ppoll() is to ensure atomicity of changing the signal > mask and blocking, but time_poll() does not need atomicity. Therefore, > even a userspace emulated ppoll() would work here in a race-free manner, if > the kernel does not support ppoll().) And we'll need that userspace emulated ppoll() for non-Linux systems; seem reasonable to just implement it in timeval.c inside an #ifndef HAVE_PPOLL from autoconf? X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 01/28] fatal-signal: Remove write-only variable fatal_signal_set.
On 10 July 2013 19:03, Ben Pfaff wrote: > Signed-off-by: Ben Pfaff > --- > lib/fatal-signal.c |5 - Acked-by: Ed Maste X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 24/28] vlog: Make vlog_should_drop() thread-safe.
On 10 July 2013 19:04, Ben Pfaff wrote: > Signed-off-by: Ben Pfaff Acked-by: Ed Maste X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 13/28] poll-loop: Fix typo in comment.
On 10 July 2013 19:03, Ben Pfaff wrote: > Signed-off-by: Ben Pfaff Acked-by: Ed Maste X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 08/28] timeval: Remove backtrace feature.
On 10 July 2013 19:03, Ben Pfaff wrote: > The backtrace feature of timeval is useful because it provides a "poor > man's profile" view of Open vSwitch. But it is not likely to be useful in > a multithreaded process, because signal delivery doesn't necessarily follow > the profile when there is more than one thread. (A signal in a > multithreaded process are delivered to an arbitrary thread.) > > Another problem with the backtrace feature is that it is difficult for > format_backtraces() to synchronize properly with the signal handler in a > multithreaded process. In a single-threaded process, it can just block > the signal handler, but in a multithreaded process this does not prevent > signal delivery to threads other than the one running format_backtrace(). > > Signed-off-by: Ben Pfaff Acked-by: Ed Maste X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 23/28] util: Make subprogram_name thread-specific.
On 10 July 2013 19:04, Ben Pfaff wrote: > Signed-off-by: Ben Pfaff > +void > +set_subprogram_name(const char *name) > +{ > +free(subprogram_name_set(xstrdup(name))); > +} How about calling (through appropriate abstraction) pthread_setname_np (Linux) or pthread_set_name_np (FreeBSD)? The thread names will then appear in gdb or lldb, and in top(1) (on FreeBSD at least). Acked-by: Ed Maste X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 06/28] timeval: New function xclock_gettime().
On 10 July 2013 19:03, Ben Pfaff wrote: > Signed-off-by: Ben Pfaff Acked-by: Ed Maste X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 09/28] timeval: Fix typo in comment.
On 10 July 2013 19:03, Ben Pfaff wrote: > Signed-off-by: Ben Pfaff Acked-by: Ed Maste X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 27/28] vlog: Remove unused function vlog_exit().
On 10 July 2013 19:04, Ben Pfaff wrote: > This is harder to implement once threads are introduced. > > Signed-off-by: Ben Pfaff Acked-by: Ed Maste X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 14/28] vlog: Remove support for worker process.
On 10 July 2013 19:03, Ben Pfaff wrote: > The worker process implementation isn't thread-safe and, once OVS > itself is threaded, it doesn't make much sense to have a worker > process anyway. > > Signed-off-by: Ben Pfaff Acked-by: Ed Maste X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 26/28] vlog: Remove unused function vlog_get_log_file().
On 10 July 2013 19:04, Ben Pfaff wrote: > Signed-off-by: Ben Pfaff Acked-by: Ed Maste X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH V2] bfd: Implements BFD decay
When there is no incoming data traffic in the tunnel for a period, BFD decay allows the bfd session to increase the min_rx. This isa helpful in that usually some tunnels are rarely used. And cpu consumption can be reduced by processing fewer bfd control packets. Signed-off-by: Alex Wang --- v1 -> v2: 1. replace the gradual decay to immediately decay. 2. add configurable decay_min_rx. --- lib/bfd.c | 89 ++-- lib/bfd.h |5 ++- ofproto/ofproto-dpif.c |3 +- vswitchd/vswitch.xml | 18 ++ 4 files changed, 110 insertions(+), 5 deletions(-) diff --git a/lib/bfd.c b/lib/bfd.c index aa1a3f7..0d4805a 100644 --- a/lib/bfd.c +++ b/lib/bfd.c @@ -152,6 +152,9 @@ struct bfd { bool cpath_down; /* Concatenated Path Down. */ uint8_t mult; /* bfd.DetectMult. */ +struct netdev *netdev; +uint64_t rx_packets; /* Packets received by 'netdev'. */ + enum state state; /* bfd.SessionState. */ enum state rmt_state; /* bfd.RemoteSessionState. */ @@ -182,6 +185,11 @@ struct bfd { int ref_cnt; int forwarding_override; /* Manual override of 'forwarding' status. */ + +/* BFD decay related variables. */ +bool decay_enabled; /* Flag that enables bfd decay. */ +int decay_min_rx; +long long int decay_detect_time; }; static bool bfd_in_poll(const struct bfd *); @@ -191,6 +199,9 @@ static const char *bfd_state_str(enum state); static long long int bfd_min_tx(const struct bfd *); static long long int bfd_tx_interval(const struct bfd *); static long long int bfd_rx_interval(const struct bfd *); +static uint64_t bfd_rx_packets(const struct bfd *); +static void bfd_decay_min_rx_set_default(struct bfd *); +static void bfd_decay(struct bfd *); static void bfd_set_next_tx(struct bfd *); static void bfd_set_state(struct bfd *, enum state, enum diag); static uint32_t generate_discriminator(void); @@ -243,7 +254,8 @@ bfd_get_status(const struct bfd *bfd, struct smap *smap) * Also returns NULL if cfg is NULL. */ struct bfd * bfd_configure(struct bfd *bfd, const char *name, - const struct smap *cfg) + const struct smap *cfg, + struct netdev *netdev) { static uint16_t udp_src = 0; static bool init = false; @@ -276,6 +288,9 @@ bfd_configure(struct bfd *bfd, const char *name, bfd->min_tx = 1000; bfd->mult = 3; bfd->ref_cnt = 1; +bfd->netdev = netdev; +bfd->decay_detect_time = 0; +bfd->rx_packets = bfd_rx_packets(bfd); /* RFC 5881 section 4 * The source port MUST be in the range 49152 through 65535. The same @@ -309,6 +324,15 @@ bfd_configure(struct bfd *bfd, const char *name, bfd_poll(bfd); } +bfd->decay_enabled = smap_get_bool(cfg, "decay_enable", false); +bfd->decay_min_rx = smap_get_int(cfg, "decay_min_rx", -1); +/* Set the default decay_min_rx if decay is enabled and decay_min_rx + is not specified. */ +if (bfd->decay_enabled && bfd->decay_min_rx < 0) { +bfd_decay_min_rx_set_default(bfd); +bfd_poll(bfd); +} + cpath_down = smap_get_bool(cfg, "cpath_down", false); if (bfd->cpath_down != cpath_down) { bfd->cpath_down = cpath_down; @@ -360,11 +384,19 @@ bfd_wait(const struct bfd *bfd) void bfd_run(struct bfd *bfd) { -if (bfd->state > STATE_DOWN && time_msec() >= bfd->detect_time) { +long long int now = time_msec(); + +if (bfd->state > STATE_DOWN && now >= bfd->detect_time) { bfd_set_state(bfd, STATE_DOWN, DIAG_EXPIRED); } -if (bfd->min_tx != bfd->cfg_min_tx || bfd->min_rx != bfd->cfg_min_rx) { +if (bfd->state == STATE_UP && bfd->decay_enabled +&& now >= bfd->decay_detect_time) { +bfd_decay(bfd); +} + +if (!bfd->decay_enabled && +(bfd->min_tx != bfd->cfg_min_tx || bfd->min_rx != bfd->cfg_min_rx)) { bfd_poll(bfd); } } @@ -804,6 +836,57 @@ bfd_set_state(struct bfd *bfd, enum state state, enum diag diag) } } +static uint64_t +bfd_rx_packets(const struct bfd *bfd) +{ +struct netdev_stats stats; + +if (bfd->netdev && !netdev_get_stats(bfd->netdev, &stats)) { +return stats.rx_packets; +} else { +return 0; +} +} + +static void +bfd_decay_min_rx_set_default(struct bfd *bfd) +{ + +bfd->decay_min_rx = bfd->mult * bfd->mult * MAX(bfd->cfg_min_rx, +bfd->rmt_min_tx); +/* Always restore the min_rx. */ +bfd->min_rx = bfd->cfg_min_rx; +} + +static void +bfd_decay(struct bfd *bfd) +{ +uint64_t rx_packets = bfd_rx_packets(bfd); +int64_t diff; + +diff = rx_packets - bfd->rx_packets; +bfd->rx_packets = rx_packets; +bfd->decay_detect_time = bfd->decay_min_rx + time_msec(); + +if (diff <= (bfd->decay_min_rx / bfd->min_rx + 5)) { +
[ovs-dev] [PATCH 1/2] ovs-vswitchd: Correct the documentation on the limits for bridges and ports.
Signed-off-by: Gurucharan Shetty --- vswitchd/ovs-vswitchd.8.in | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in index 86500c9..da43336 100644 --- a/vswitchd/ovs-vswitchd.8.in +++ b/vswitchd/ovs-vswitchd.8.in @@ -237,13 +237,12 @@ We believe these limits to be accurate as of this writing. These limits assume the use of the Linux kernel datapath. . .IP \(bu -Approximately 256 bridges given the allowance of 5,000 file -descriptors that \fBovs\-ctl\fR(8) configures. (\fBovs\-vswitchd\fR -requires 17 file descriptors per datapath.) -. -.IP \(bu -65,280 ports per bridge. Performance will degrade beyond 1,024 ports -per bridge due to fixed hash table sizing. +\fBovs\-vswitchd\fR started through \fBovs\-ctl\fR(8) provides a limit of 5000 +file descriptors. Creation of a single bridge consumes 3 file descriptors and +adding a port consumes 1 file descriptor. The limits on the number of bridges +and ports is decided by the availability of file descriptors. There is also a +theoritical limit of 65,280 ports per bridge. Performance will degrade beyond +1,024 ports per bridge due to fixed hash table sizing. . .IP \(bu 2,048 MAC learning entries per bridge, by default. (This is -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 2/2] ovs-ctl.in: Increase the limit on file descriptors.
Testing shows that creation of 5000 internal ports and using it to do some meaningful tasks works fine on a 12 cpu hardware. Since a single port needs one file descriptor and a bridge needs 3 file descriptors, we will have to increase the file descriptor limit to a higher number from the current limit of 5000. 7500 feels like a decent increase with enough room for further scale testing. Bug #18383. Signed-off-by: Gurucharan Shetty --- utilities/ovs-ctl.in |8 vswitchd/ovs-vswitchd.8.in |2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/utilities/ovs-ctl.in b/utilities/ovs-ctl.in index 6465fdf..0735160 100755 --- a/utilities/ovs-ctl.in +++ b/utilities/ovs-ctl.in @@ -224,10 +224,10 @@ start_forwarding () { log_success_msg "ovs-vswitchd is already running" else # Increase the limit on the number of open file descriptors. -# On Linux, ovs-vswitchd needs about one file descriptor per -# switch port, so this allows a very large number of switch -# ports. -ulimit -n 5000 +# On Linux, ovs-vswitchd needs about three file descriptors +# per bridge and one file descriptor per bridge port, so this +# allows a very large number of bridges and ports. +ulimit -n 7500 # Start ovs-vswitchd. set ovs-vswitchd unix:"$DB_SOCK" diff --git a/vswitchd/ovs-vswitchd.8.in b/vswitchd/ovs-vswitchd.8.in index da43336..7d06918 100644 --- a/vswitchd/ovs-vswitchd.8.in +++ b/vswitchd/ovs-vswitchd.8.in @@ -237,7 +237,7 @@ We believe these limits to be accurate as of this writing. These limits assume the use of the Linux kernel datapath. . .IP \(bu -\fBovs\-vswitchd\fR started through \fBovs\-ctl\fR(8) provides a limit of 5000 +\fBovs\-vswitchd\fR started through \fBovs\-ctl\fR(8) provides a limit of 7500 file descriptors. Creation of a single bridge consumes 3 file descriptors and adding a port consumes 1 file descriptor. The limits on the number of bridges and ports is decided by the availability of file descriptors. There is also a -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ovs-vswitchd: Correct the documentation on the limits for bridges and ports.
On Thu, Jul 11, 2013 at 09:37:45AM -0700, Gurucharan Shetty wrote: > Signed-off-by: Gurucharan Shetty I'm not sure it's worth mentioning the 65,280 port limit, because it's about the same as the maximum number of file descriptors. At first thought, these limits seemed Linux specific, but I think that FreeBSD also needs a file descriptor per port (for a pcap handle), so maybe it's not worth saying that it is Linux specific. The final note about performance beyond 1,024 ports is definitely specific to the Linux kernel datapath though. Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 2/2] ovs-ctl.in: Increase the limit on file descriptors.
On Thu, Jul 11, 2013 at 09:37:46AM -0700, Gurucharan Shetty wrote: > Testing shows that creation of 5000 internal ports and using it > to do some meaningful tasks works fine on a 12 cpu hardware. > Since a single port needs one file descriptor and a bridge > needs 3 file descriptors, we will have to increase the file > descriptor limit to a higher number from the current limit of 5000. > 7500 feels like a decent increase with enough room for further > scale testing. > > Bug #18383. > Signed-off-by: Gurucharan Shetty Acked-by: Ben Pfaff ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ovs-vswitchd: Correct the documentation on the limits for bridges and ports.
On Thu, Jul 11, 2013 at 10:22 AM, Ben Pfaff wrote: > On Thu, Jul 11, 2013 at 09:37:45AM -0700, Gurucharan Shetty wrote: > > Signed-off-by: Gurucharan Shetty > > I'm not sure it's worth mentioning the 65,280 port limit, because it's > about the same as the maximum number of file descriptors. > > At first thought, these limits seemed Linux specific, but I think that > FreeBSD also needs a file descriptor per port (for a pcap handle), so > maybe it's not worth saying that it is Linux specific. The final note > about performance beyond 1,024 ports is definitely specific to the > Linux kernel datapath though. > Okay. How about the following wording? ovs-vswitchd started through ovs-ctl provides a limit of 5000 file descriptors. Creation of a single bridge consumes 3 file descriptors and adding a port consumes 1 file descriptor. The limits on the number of bridges and ports is decided by the availability of file descriptors. (For Linux kernel datapath, performance will degrade beyond 1,024 ports per bridge due to fixed hash table sizing.) > > Thanks, > > Ben. > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ovs-vswitchd: Correct the documentation on the limits for bridges and ports.
On Thu, Jul 11, 2013 at 10:34 AM, Gurucharan Shetty wrote: > On Thu, Jul 11, 2013 at 10:22 AM, Ben Pfaff wrote: > >> On Thu, Jul 11, 2013 at 09:37:45AM -0700, Gurucharan Shetty wrote: >> > Signed-off-by: Gurucharan Shetty >> >> I'm not sure it's worth mentioning the 65,280 port limit, because it's >> about the same as the maximum number of file descriptors. >> >> At first thought, these limits seemed Linux specific, but I think that >> FreeBSD also needs a file descriptor per port (for a pcap handle), so >> maybe it's not worth saying that it is Linux specific. The final note >> about performance beyond 1,024 ports is definitely specific to the >> Linux kernel datapath though. >> > Okay. How about the following wording? > > ovs-vswitchd started through ovs-ctl provides a limit of 5000 file > descriptors. Creation of a single bridge consumes 3 file descriptors and > adding a port consumes 1 file descriptor. The limits on the number of > bridges and ports is decided by the availability of file descriptors. (For > Linux kernel datapath, performance will degrade beyond 1,024 ports per > bridge due to fixed hash table sizing.) > Or probably make the entire thing Linux specific. ovs-vswitchd started through ovs-ctl provides a limit of 5000 file descriptors. The limits on the number of bridges and ports is decided by the availability of file descriptors. On Linux, creation of a single bridge consumes 3 file descriptors and adding a port consumes 1 file descriptor. Performance will degrade beyond 1,024 ports per bridge due to fixed hash table sizing. Other platforms may have different limitations. > > >> >> Thanks, >> >> Ben. >> > > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ovs-vswitchd: Correct the documentation on the limits for bridges and ports.
On Thu, Jul 11, 2013 at 10:34:06AM -0700, Gurucharan Shetty wrote: > On Thu, Jul 11, 2013 at 10:22 AM, Ben Pfaff wrote: > > > On Thu, Jul 11, 2013 at 09:37:45AM -0700, Gurucharan Shetty wrote: > > > Signed-off-by: Gurucharan Shetty > > > > I'm not sure it's worth mentioning the 65,280 port limit, because it's > > about the same as the maximum number of file descriptors. > > > > At first thought, these limits seemed Linux specific, but I think that > > FreeBSD also needs a file descriptor per port (for a pcap handle), so > > maybe it's not worth saying that it is Linux specific. The final note > > about performance beyond 1,024 ports is definitely specific to the > > Linux kernel datapath though. > > > Okay. How about the following wording? > > ovs-vswitchd started through ovs-ctl provides a limit of 5000 file > descriptors. Creation of a single bridge consumes 3 file descriptors and > adding a port consumes 1 file descriptor. The limits on the number of > bridges and ports is decided by the availability of file descriptors. (For > Linux kernel datapath, performance will degrade beyond 1,024 ports per > bridge due to fixed hash table sizing.) I like it. Thank you. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] ovs-vswitchd: Correct the documentation on the limits for bridges and ports.
On Thu, Jul 11, 2013 at 10:39:38AM -0700, Gurucharan Shetty wrote: > On Thu, Jul 11, 2013 at 10:34 AM, Gurucharan Shetty wrote: > > > On Thu, Jul 11, 2013 at 10:22 AM, Ben Pfaff wrote: > > > >> On Thu, Jul 11, 2013 at 09:37:45AM -0700, Gurucharan Shetty wrote: > >> > Signed-off-by: Gurucharan Shetty > >> > >> I'm not sure it's worth mentioning the 65,280 port limit, because it's > >> about the same as the maximum number of file descriptors. > >> > >> At first thought, these limits seemed Linux specific, but I think that > >> FreeBSD also needs a file descriptor per port (for a pcap handle), so > >> maybe it's not worth saying that it is Linux specific. The final note > >> about performance beyond 1,024 ports is definitely specific to the > >> Linux kernel datapath though. > >> > > Okay. How about the following wording? > > > > ovs-vswitchd started through ovs-ctl provides a limit of 5000 file > > descriptors. Creation of a single bridge consumes 3 file descriptors and > > adding a port consumes 1 file descriptor. The limits on the number of > > bridges and ports is decided by the availability of file descriptors. (For > > Linux kernel datapath, performance will degrade beyond 1,024 ports per > > bridge due to fixed hash table sizing.) > > > Or probably make the entire thing Linux specific. > ovs-vswitchd started through ovs-ctl provides a limit of 5000 file > descriptors. The limits on the number of bridges and ports is decided by > the availability of file descriptors. On Linux, creation of a single bridge > consumes 3 file descriptors and adding a port consumes 1 file descriptor. > Performance will degrade beyond 1,024 ports per bridge due to fixed hash > table sizing. Other platforms may have different limitations. How about "With the Linux kernel datapath" instead of "On Linux", because the netdev datapath has slightly different characteristics. Otherwise I like this. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH V3] feature: Create specific types for ofp and odp port
On Thu, Jul 11, 2013 at 12:44:50PM +0900, YAMAMOTO Takashi wrote: > > On Tue, Jul 09, 2013 at 05:18:34PM +0900, YAMAMOTO Takashi wrote: > >> hi, > >> > >> > @@ -708,9 +711,7 @@ dpif_netdev_flow_from_nlattrs(const struct nlattr > >> > *key, uint32_t key_len, > >> > return EINVAL; > >> > } > >> > > >> > -if (flow->in_port < OFPP_MAX > >> > -? flow->in_port >= MAX_PORTS > >> > -: flow->in_port != OFPP_LOCAL && flow->in_port != OFPP_NONE) { > >> > +if (odp_to_u32(flow->in_port.odp_port) >= MAX_PORTS) { > >> > return EINVAL; > >> > } > >> > >> it seems that this change broke packet_out with in_port=controller. > > > > Oops. > > > > I sent out a fix for review: > > http://openvswitch.org/pipermail/dev/2013-July/029436.html > > Can you confirm that it fixes the problem for you? > > thanks. > > i haven't tested your patch but it seems fine. > it's mostly identical to my local patch which is working for me. Thanks for the review. I applied this to master. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [partition v2 3/3] classifier: Speed up lookup when metadata partitions the flow table.
Thanks for the reminder. They all look like good comments to me, but I'm reposting the series with the goal of getting some performance measurements. (The 19% in the commit log is obsolete.) If the performance measurements are good, I'll revise it. I'm going to repost the series again in a moment. I'll mention this in the commit log this time. On Fri, Jun 28, 2013 at 07:49:49AM +, Rajahalme, Jarno (NSN - FI/Espoo) wrote: > I sent some comments (mostly about comments it seems) earlier you may want to > revisit (or not): > > http://openvswitch.org/pipermail/dev/2013-May/027297.html > > Jarno > > On Jun 28, 2013, at 1:30 , ext Ben Pfaff wrote: > > > We have a controller that puts many rules with different metadata values > > into the flow table, where metadata is used (by "resubmit"s) to distinguish > > stages in a pipeline. Thus, any given flow only needs to be hashed into > > classifier "cls_table"s that contain a match for the flow's metadata value. > > This commit optimizes the classifier lookup by (probabilistically) skipping > > the "cls_table"s that can't possibly match. > > > > (The "metadata" referred to here is the OpenFlow 1.1+ "metadata" field, > > which is a 64-bit field similar in purpose to the "registers" defined by > > Open vSwitch.) > > > > This speeds up flow setup performance with the controller in question by > > about 19%. > > > > Bug #14282. > > Signed-off-by: Ben Pfaff > > --- > > lib/classifier.c | 116 > > + > > lib/classifier.h | 74 +++--- > > lib/flow.h | 27 > > lib/tag.h|8 +++- > > 4 files changed, 209 insertions(+), 16 deletions(-) > > > > diff --git a/lib/classifier.c b/lib/classifier.c > > index bd202f6..0af6e17 100644 > > --- a/lib/classifier.c > > +++ b/lib/classifier.c > > @@ -41,7 +41,7 @@ static void update_tables_after_removal(struct classifier > > *, > >unsigned int del_priority); > > > > static struct cls_rule *find_match(const struct cls_table *, > > - const struct flow *); > > + const struct flow *, tag_type tags); > > static struct cls_rule *find_equal(struct cls_table *, > > const struct miniflow *, uint32_t hash); > > static struct cls_rule *insert_rule(struct classifier *, > > @@ -143,6 +143,7 @@ classifier_init(struct classifier *cls) > >cls->n_rules = 0; > >hmap_init(&cls->tables); > >list_init(&cls->tables_priority); > > +hmap_init(&cls->partitions); > > } > > > > /* Destroys 'cls'. Rules within 'cls', if any, are not freed; this is the > > @@ -151,12 +152,20 @@ void > > classifier_destroy(struct classifier *cls) > > { > >if (cls) { > > +struct cls_table *partition, *next_partition; > >struct cls_table *table, *next_table; > > > >HMAP_FOR_EACH_SAFE (table, next_table, hmap_node, &cls->tables) { > >destroy_table(cls, table); > >} > >hmap_destroy(&cls->tables); > > + > > +HMAP_FOR_EACH_SAFE (partition, next_partition, hmap_node, > > +&cls->partitions) { > > +hmap_remove(&cls->partitions, &partition->hmap_node); > > +free(partition); > > +} > > +hmap_destroy(&cls->partitions); > >} > > } > > > > @@ -174,6 +183,46 @@ classifier_count(const struct classifier *cls) > >return cls->n_rules; > > } > > > > +static uint32_t > > +hash_metadata(ovs_be64 metadata_) > > +{ > > +uint64_t metadata = (OVS_FORCE uint64_t) metadata_; > > +return hash_2words(metadata, metadata >> 32); > > +} > > + > > +static struct cls_partition * > > +find_partition(const struct classifier *cls, ovs_be64 metadata, uint32_t > > hash) > > +{ > > +struct cls_partition *partition; > > + > > +HMAP_FOR_EACH_IN_BUCKET (partition, hmap_node, hash, &cls->partitions) > > { > > +if (partition->metadata == metadata) { > > +return partition; > > +} > > +} > > + > > +return NULL; > > +} > > + > > +static struct cls_partition * > > +create_partition(struct classifier *cls, struct cls_table *table, > > + ovs_be64 metadata) > > +{ > > +uint32_t hash = hash_metadata(metadata); > > +struct cls_partition *partition = find_partition(cls, metadata, hash); > > +if (partition) { > > +partition->n_refs++; > > +partition->tags |= table->tag; > > +} else { > > +partition = xmalloc(sizeof *partition); > > +partition->n_refs = 1; > > +partition->tags = table->tag; > > +partition->metadata = metadata; > > +hmap_insert(&cls->partitions, &partition->hmap_node, hash); > > +} > > +return partition; > > +} > > + > > /* Inserts 'rule' into 'cls'. Until 'rule' is removed from 'cls', the > > caller > > * must not modify or free it. > > * > > @@ -200
Re: [ovs-dev] [PATCH 1/2] ovs-vswitchd: Correct the documentation on the limits for bridges and ports.
On Thu, Jul 11, 2013 at 10:43 AM, Ben Pfaff wrote: > On Thu, Jul 11, 2013 at 10:39:38AM -0700, Gurucharan Shetty wrote: > > On Thu, Jul 11, 2013 at 10:34 AM, Gurucharan Shetty >wrote: > > > > > On Thu, Jul 11, 2013 at 10:22 AM, Ben Pfaff wrote: > > > > > >> On Thu, Jul 11, 2013 at 09:37:45AM -0700, Gurucharan Shetty wrote: > > >> > Signed-off-by: Gurucharan Shetty > > >> > > >> I'm not sure it's worth mentioning the 65,280 port limit, because it's > > >> about the same as the maximum number of file descriptors. > > >> > > >> At first thought, these limits seemed Linux specific, but I think that > > >> FreeBSD also needs a file descriptor per port (for a pcap handle), so > > >> maybe it's not worth saying that it is Linux specific. The final note > > >> about performance beyond 1,024 ports is definitely specific to the > > >> Linux kernel datapath though. > > >> > > > Okay. How about the following wording? > > > > > > ovs-vswitchd started through ovs-ctl provides a limit of 5000 file > > > descriptors. Creation of a single bridge consumes 3 file descriptors > and > > > adding a port consumes 1 file descriptor. The limits on the number of > > > bridges and ports is decided by the availability of file descriptors. > (For > > > Linux kernel datapath, performance will degrade beyond 1,024 ports per > > > bridge due to fixed hash table sizing.) > > > > > Or probably make the entire thing Linux specific. > > ovs-vswitchd started through ovs-ctl provides a limit of 5000 file > > descriptors. The limits on the number of bridges and ports is decided by > > the availability of file descriptors. On Linux, creation of a single > bridge > > consumes 3 file descriptors and adding a port consumes 1 file descriptor. > > Performance will degrade beyond 1,024 ports per bridge due to fixed hash > > table sizing. Other platforms may have different limitations. > > How about "With the Linux kernel datapath" instead of "On Linux", > because the netdev datapath has slightly different characteristics. > Otherwise I like this. > Sounds good. I applied it and pushed this series to master and 1.11. Thanks for the reviews. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [partition v3 3/3] classifier: Speed up lookup when metadata partitions the flow table.
We have a controller that puts many rules with different metadata values into the flow table, where metadata is used (by "resubmit"s) to distinguish stages in a pipeline. Thus, any given flow only needs to be hashed into classifier "cls_table"s that contain a match for the flow's metadata value. This commit optimizes the classifier lookup by (probabilistically) skipping the "cls_table"s that can't possibly match. (The "metadata" referred to here is the OpenFlow 1.1+ "metadata" field, which is a 64-bit field similar in purpose to the "registers" defined by Open vSwitch.) Previous versions of this patch, with earlier versions of the controller in question, improved flow setup performance by about 19%. This version needs performance testing before it is worth reviewing. Some review comments on previous versions, not yet addressed, are posted here: http://openvswitch.org/pipermail/dev/2013-May/027297.html Bug #14282. Signed-off-by: Ben Pfaff --- lib/classifier.c | 116 + lib/classifier.h | 74 +++--- lib/flow.h | 27 lib/tag.h|8 +++- 4 files changed, 209 insertions(+), 16 deletions(-) diff --git a/lib/classifier.c b/lib/classifier.c index bd202f6..0af6e17 100644 --- a/lib/classifier.c +++ b/lib/classifier.c @@ -41,7 +41,7 @@ static void update_tables_after_removal(struct classifier *, unsigned int del_priority); static struct cls_rule *find_match(const struct cls_table *, - const struct flow *); + const struct flow *, tag_type tags); static struct cls_rule *find_equal(struct cls_table *, const struct miniflow *, uint32_t hash); static struct cls_rule *insert_rule(struct classifier *, @@ -143,6 +143,7 @@ classifier_init(struct classifier *cls) cls->n_rules = 0; hmap_init(&cls->tables); list_init(&cls->tables_priority); +hmap_init(&cls->partitions); } /* Destroys 'cls'. Rules within 'cls', if any, are not freed; this is the @@ -151,12 +152,20 @@ void classifier_destroy(struct classifier *cls) { if (cls) { +struct cls_table *partition, *next_partition; struct cls_table *table, *next_table; HMAP_FOR_EACH_SAFE (table, next_table, hmap_node, &cls->tables) { destroy_table(cls, table); } hmap_destroy(&cls->tables); + +HMAP_FOR_EACH_SAFE (partition, next_partition, hmap_node, +&cls->partitions) { +hmap_remove(&cls->partitions, &partition->hmap_node); +free(partition); +} +hmap_destroy(&cls->partitions); } } @@ -174,6 +183,46 @@ classifier_count(const struct classifier *cls) return cls->n_rules; } +static uint32_t +hash_metadata(ovs_be64 metadata_) +{ +uint64_t metadata = (OVS_FORCE uint64_t) metadata_; +return hash_2words(metadata, metadata >> 32); +} + +static struct cls_partition * +find_partition(const struct classifier *cls, ovs_be64 metadata, uint32_t hash) +{ +struct cls_partition *partition; + +HMAP_FOR_EACH_IN_BUCKET (partition, hmap_node, hash, &cls->partitions) { +if (partition->metadata == metadata) { +return partition; +} +} + +return NULL; +} + +static struct cls_partition * +create_partition(struct classifier *cls, struct cls_table *table, + ovs_be64 metadata) +{ +uint32_t hash = hash_metadata(metadata); +struct cls_partition *partition = find_partition(cls, metadata, hash); +if (partition) { +partition->n_refs++; +partition->tags |= table->tag; +} else { +partition = xmalloc(sizeof *partition); +partition->n_refs = 1; +partition->tags = table->tag; +partition->metadata = metadata; +hmap_insert(&cls->partitions, &partition->hmap_node, hash); +} +return partition; +} + /* Inserts 'rule' into 'cls'. Until 'rule' is removed from 'cls', the caller * must not modify or free it. * @@ -200,8 +249,17 @@ classifier_replace(struct classifier *cls, struct cls_rule *rule) old_rule = insert_rule(cls, table, rule); if (!old_rule) { +if (minimask_get_metadata_mask(&rule->match.mask) == OVS_BE64_MAX) { +ovs_be64 metadata = miniflow_get_metadata(&rule->match.flow); +rule->partition = create_partition(cls, table, metadata); +} else { +rule->partition = NULL; +} + table->n_table_rules++; cls->n_rules++; +} else { +rule->partition = old_rule->partition; } return old_rule; } @@ -225,6 +283,7 @@ classifier_insert(struct classifier *cls, struct cls_rule *rule) void classifier_remove(struct classifier *cls, struct cls_rule *rule) { +struct cls_partition *partition; struct cls_rule *head; struct
[ovs-dev] [partition v3 0/3] speed up classification based on metadata
v1->v2: rebase v2->v3: rebase Reposting to get performance numbers. Some review comments from v1 have not yet been addressed and won't be worth addressing unless the performance numbers are good. Ben Pfaff (3): openvswitch/types.h: New macros OVS_BE16_MAX, OVS_BE32_MAX, OVS_BE64_MAX. match: New function minimatch_matches_flow(). classifier: Speed up lookup when metadata partitions the flow table. include/openvswitch/types.h |4 ++ lib/classifier.c| 117 lib/classifier.h| 74 +++-- lib/flow.c | 126 +- lib/flow.h | 37 +++-- lib/learn.c |2 +- lib/match.c | 61 +++-- lib/match.h | 18 --- lib/meta-flow.c | 13 ++--- lib/nx-match.c |8 ++-- lib/odp-util.c | 13 ++--- lib/ofp-actions.c |2 +- lib/ofp-parse.c |6 +- lib/ofp-print.c |2 +- lib/ofp-util.c | 18 +++--- lib/packets.c |2 +- lib/tag.h |8 +++- ofproto/ofproto.c | 12 ++-- tests/test-classifier.c | 16 +++--- tests/test-util.c |2 +- 20 files changed, 421 insertions(+), 120 deletions(-) -- 1.7.2.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [partition v3 2/3] match: New function minimatch_matches_flow().
Signed-off-by: Ben Pfaff --- lib/classifier.c |3 +- lib/flow.c | 126 ++ lib/flow.h | 10 +++-- lib/match.c | 31 +- lib/match.h | 18 +--- 5 files changed, 147 insertions(+), 41 deletions(-) diff --git a/lib/classifier.c b/lib/classifier.c index a717bd3..bd202f6 100644 --- a/lib/classifier.c +++ b/lib/classifier.c @@ -654,8 +654,7 @@ find_match(const struct cls_table *table, const struct flow *flow) struct cls_rule *rule; HMAP_FOR_EACH_WITH_HASH (rule, hmap_node, hash, &table->rules) { -if (miniflow_equal_flow_in_minimask(&rule->match.flow, flow, -&table->mask)) { +if (minimatch_matches_flow(&rule->match, flow)) { return rule; } } diff --git a/lib/flow.c b/lib/flow.c index d899d26..0094ccc 100644 --- a/lib/flow.c +++ b/lib/flow.c @@ -1065,13 +1065,38 @@ miniflow_alloc_values(struct miniflow *flow, int n) } } +/* Completes an initialization of 'dst' as a miniflow copy of 'src' begun by + * the caller. The caller must have already initialized 'dst->map' properly + * to indicate the nonzero uint32_t elements of 'src'. 'n' must be the number + * of 1-bits in 'dst->map'. + * + * This function initializes 'dst->values' (either inline if possible or with + * malloc() otherwise) and copies the nonzero uint32_t elements of 'src' into + * it. */ +static void +miniflow_init__(struct miniflow *dst, const struct flow *src, int n) +{ +const uint32_t *src_u32 = (const uint32_t *) src; +unsigned int ofs; +int i; + +dst->values = miniflow_alloc_values(dst, n); +ofs = 0; +for (i = 0; i < MINI_N_MAPS; i++) { +uint32_t map; + +for (map = dst->map[i]; map; map = zero_rightmost_1bit(map)) { +dst->values[ofs++] = src_u32[raw_ctz(map) + i * 32]; +} +} +} + /* Initializes 'dst' as a copy of 'src'. The caller must eventually free 'dst' * with miniflow_destroy(). */ void miniflow_init(struct miniflow *dst, const struct flow *src) { const uint32_t *src_u32 = (const uint32_t *) src; -unsigned int ofs; unsigned int i; int n; @@ -1085,16 +1110,17 @@ miniflow_init(struct miniflow *dst, const struct flow *src) } } -/* Initialize dst->values. */ -dst->values = miniflow_alloc_values(dst, n); -ofs = 0; -for (i = 0; i < MINI_N_MAPS; i++) { -uint32_t map; +miniflow_init__(dst, src, n); +} -for (map = dst->map[i]; map; map = zero_rightmost_1bit(map)) { -dst->values[ofs++] = src_u32[raw_ctz(map) + i * 32]; -} -} +/* Initializes 'dst' as a copy of 'src', using 'mask->map' as 'dst''s map. The + * caller must eventually free 'dst' with miniflow_destroy(). */ +void +miniflow_init_with_minimask(struct miniflow *dst, const struct flow *src, +const struct minimask *mask) +{ +memcpy(dst->map, mask->masks.map, sizeof dst->map); +miniflow_init__(dst, src, miniflow_n_values(dst)); } /* Initializes 'dst' as a copy of 'src'. The caller must eventually free 'dst' @@ -1177,16 +1203,35 @@ miniflow_get_vid(const struct miniflow *flow) bool miniflow_equal(const struct miniflow *a, const struct miniflow *b) { +const uint32_t *ap = a->values; +const uint32_t *bp = b->values; int i; for (i = 0; i < MINI_N_MAPS; i++) { -if (a->map[i] != b->map[i]) { -return false; +const uint32_t a_map = a->map[i]; +const uint32_t b_map = b->map[i]; +uint32_t map; + +if (a_map == b_map) { +for (map = a_map; map; map = zero_rightmost_1bit(map)) { +if (*ap++ != *bp++) { +return false; +} +} +} else { +for (map = a_map | b_map; map; map = zero_rightmost_1bit(map)) { +uint32_t bit = rightmost_1bit(map); +uint32_t a_value = a_map & bit ? *ap++ : 0; +uint32_t b_value = b_map & bit ? *bp++ : 0; + +if (a_value != b_value) { +return false; +} +} } } -return !memcmp(a->values, b->values, - miniflow_n_values(a) * sizeof *a->values); +return true; } /* Returns true if 'a' and 'b' are equal at the places where there are 1-bits @@ -1246,10 +1291,24 @@ miniflow_equal_flow_in_minimask(const struct miniflow *a, const struct flow *b, uint32_t miniflow_hash(const struct miniflow *flow, uint32_t basis) { -BUILD_ASSERT_DECL(MINI_N_MAPS == 2); -return hash_3words(flow->map[0], flow->map[1], - hash_words(flow->values, miniflow_n_values(flow), - basis)); +const uint32_t *p = flow->values; +uint32_t hash = basis; +int i; + +for (i = 0; i < MINI_N_MAPS; i++) { +uint32_t hash_m
[ovs-dev] [partition v3 1/3] openvswitch/types.h: New macros OVS_BE16_MAX, OVS_BE32_MAX, OVS_BE64_MAX.
These seem slightly nicer than e.g. htons(UINT16_MAX). Signed-off-by: Ben Pfaff --- include/openvswitch/types.h |4 lib/learn.c |2 +- lib/match.c | 30 +++--- lib/meta-flow.c | 13 ++--- lib/nx-match.c |8 lib/odp-util.c | 13 ++--- lib/ofp-actions.c |2 +- lib/ofp-parse.c |6 +++--- lib/ofp-print.c |2 +- lib/ofp-util.c | 18 +- lib/packets.c |2 +- ofproto/ofproto.c | 12 ++-- tests/test-classifier.c | 16 tests/test-util.c |2 +- 14 files changed, 66 insertions(+), 64 deletions(-) diff --git a/include/openvswitch/types.h b/include/openvswitch/types.h index d07c1e8..8bfb218 100644 --- a/include/openvswitch/types.h +++ b/include/openvswitch/types.h @@ -38,6 +38,10 @@ typedef __be16 ovs_be16; typedef __be32 ovs_be32; typedef __be64 ovs_be64; + +#define OVS_BE16_MAX ((OVS_FORCE ovs_be16) 0x) +#define OVS_BE32_MAX ((OVS_FORCE ovs_be32) 0x) +#define OVS_BE64_MAX ((OVS_FORCE ovs_be64) 0xULL) /* Netlink and OpenFlow both contain 64-bit values that are only guaranteed to * be aligned on 32-bit boundaries. These types help. diff --git a/lib/learn.c b/lib/learn.c index 49d9efd..f4b827c 100644 --- a/lib/learn.c +++ b/lib/learn.c @@ -303,7 +303,7 @@ learn_execute(const struct ofpact_learn *learn, const struct flow *flow, fm->cookie = htonll(0); fm->cookie_mask = htonll(0); fm->new_cookie = htonll(learn->cookie); -fm->modify_cookie = fm->new_cookie != htonll(UINT64_MAX); +fm->modify_cookie = fm->new_cookie != OVS_BE64_MAX; fm->table_id = learn->table_id; fm->command = OFPFC_MODIFY_STRICT; fm->idle_timeout = learn->idle_timeout; diff --git a/lib/match.c b/lib/match.c index 6d66eba..bf82d6c 100644 --- a/lib/match.c +++ b/lib/match.c @@ -181,7 +181,7 @@ match_set_reg_masked(struct match *match, unsigned int reg_idx, void match_set_metadata(struct match *match, ovs_be64 metadata) { -match_set_metadata_masked(match, metadata, htonll(UINT64_MAX)); +match_set_metadata_masked(match, metadata, OVS_BE64_MAX); } void @@ -195,7 +195,7 @@ match_set_metadata_masked(struct match *match, void match_set_tun_id(struct match *match, ovs_be64 tun_id) { -match_set_tun_id_masked(match, tun_id, htonll(UINT64_MAX)); +match_set_tun_id_masked(match, tun_id, OVS_BE64_MAX); } void @@ -208,7 +208,7 @@ match_set_tun_id_masked(struct match *match, ovs_be64 tun_id, ovs_be64 mask) void match_set_tun_src(struct match *match, ovs_be32 src) { -match_set_tun_src_masked(match, src, htonl(UINT32_MAX)); +match_set_tun_src_masked(match, src, OVS_BE32_MAX); } void @@ -221,7 +221,7 @@ match_set_tun_src_masked(struct match *match, ovs_be32 src, ovs_be32 mask) void match_set_tun_dst(struct match *match, ovs_be32 dst) { -match_set_tun_dst_masked(match, dst, htonl(UINT32_MAX)); +match_set_tun_dst_masked(match, dst, OVS_BE32_MAX); } void @@ -294,7 +294,7 @@ match_set_skb_mark(struct match *match, uint32_t skb_mark) void match_set_dl_type(struct match *match, ovs_be16 dl_type) { -match->wc.masks.dl_type = htons(UINT16_MAX); +match->wc.masks.dl_type = OVS_BE16_MAX; match->flow.dl_type = dl_type; } @@ -405,7 +405,7 @@ match_set_dl_vlan(struct match *match, ovs_be16 dl_vlan) { flow_set_dl_vlan(&match->flow, dl_vlan); if (dl_vlan == htons(OFP10_VLAN_NONE)) { -match->wc.masks.vlan_tci = htons(UINT16_MAX); +match->wc.masks.vlan_tci = OVS_BE16_MAX; } else { match->wc.masks.vlan_tci |= htons(VLAN_VID_MASK | VLAN_CFI); } @@ -512,7 +512,7 @@ match_set_mpls_bos(struct match *match, uint8_t mpls_bos) void match_set_tp_src(struct match *match, ovs_be16 tp_src) { -match_set_tp_src_masked(match, tp_src, htons(UINT16_MAX)); +match_set_tp_src_masked(match, tp_src, OVS_BE16_MAX); } void @@ -525,7 +525,7 @@ match_set_tp_src_masked(struct match *match, ovs_be16 port, ovs_be16 mask) void match_set_tp_dst(struct match *match, ovs_be16 tp_dst) { -match_set_tp_dst_masked(match, tp_dst, htons(UINT16_MAX)); +match_set_tp_dst_masked(match, tp_dst, OVS_BE16_MAX); } void @@ -546,7 +546,7 @@ void match_set_nw_src(struct match *match, ovs_be32 nw_src) { match->flow.nw_src = nw_src; -match->wc.masks.nw_src = htonl(UINT32_MAX); +match->wc.masks.nw_src = OVS_BE32_MAX; } void @@ -561,7 +561,7 @@ void match_set_nw_dst(struct match *match, ovs_be32 nw_dst) { match->flow.nw_dst = nw_dst; -match->wc.masks.nw_dst = htonl(UINT32_MAX); +match->wc.masks.nw_dst = OVS_BE32_MAX; } void @@ -686,7 +686,7 @@ match_set_ipv6_dst_masked(struct match *match, const struct in6_addr *dst, void match_set_ipv6_label(struct match *match, ovs_be32 ipv6_label)
Re: [ovs-dev] [PATCH] FAQ: Add supported kernel versions for newer OVS releases.
On Tue, Jul 9, 2013 at 2:13 PM, Jesse Gross wrote: > Reported-by: Kris zhang > Signed-off-by: Jesse Gross Acked-by: Pravin B Shelar > --- > FAQ | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/FAQ b/FAQ > index 0210477..6b5d8da 100644 > --- a/FAQ > +++ b/FAQ > @@ -146,6 +146,9 @@ A: The following table lists the Linux kernel versions > against which the > 1.7.x 2.6.18 to 3.3 > 1.8.x 2.6.18 to 3.4 > 1.9.x 2.6.18 to 3.8 > + 1.10.x 2.6.18 to 3.8 > + 1.11.x 2.6.18 to 3.8 > + 1.12.x 2.6.18 to 3.8 > > Open vSwitch userspace should also work with the Linux kernel module > built into Linux 3.3 and later. > -- > 1.8.1.2 > > ___ > dev mailing list > dev@openvswitch.org > http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] datapath: Improve net-namespace compat code.
On Wed, Jul 10, 2013 at 3:52 PM, Jesse Gross wrote: > On Mon, Jul 1, 2013 at 3:28 PM, Pravin B Shelar wrote: >> diff --git a/datapath/linux/compat/net_namespace.c >> b/datapath/linux/compat/net_namespace.c >> index 4e8a891..39b4a28 100644 >> --- a/datapath/linux/compat/net_namespace.c >> +++ b/datapath/linux/compat/net_namespace.c >> int rpl_register_pernet_##PNET_TYPE(struct rpl_pernet_operations *rpl_pnet) >>\ >> { >>\ >> pnet_##PNET_TYPE = rpl_pnet; >>\ >> - return register_pernet_##PNET_TYPE(pnet_##PNET_TYPE->id, >> &pnet_compat_##PNET_TYPE); \ >> + rpl_pnet->ops = pnet_compat_##PNET_TYPE; >>\ >> + return register_pernet_##PNET_TYPE(pnet_##PNET_TYPE->id, >> &rpl_pnet->ops); \ >> } > > I think this works for initializing the device for existing namespaces > but if a new namespace is brought up then it will have problems. In > that case, it will call the registration function for the last device > repeatedly. > right. > We could avoid this issue if we just used contain_of instead of a set > of global variables to recover the original ops. That might also allow > us to significantly reduce the amount of token pasting, which would > also make this easier to read. I am not sure what are you suggesting. problem is compat-per-net-init function has net as parameter and it need id to get private net obj. but there is no way to get net_id/net-obj for this compat function without global pointer, but that does not work as you explained. I can think of some more macro tricks to hide this compat code, that will involve moving this code to header and using file-name (and may be line no) to build specific compat functions rather than sharing same functions for multiple net-ops. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] datapath: Improve net-namespace compat code.
On Thu, Jul 11, 2013 at 12:25 PM, Pravin Shelar wrote: > On Wed, Jul 10, 2013 at 3:52 PM, Jesse Gross wrote: >> On Mon, Jul 1, 2013 at 3:28 PM, Pravin B Shelar wrote: >>> diff --git a/datapath/linux/compat/net_namespace.c >>> b/datapath/linux/compat/net_namespace.c >>> index 4e8a891..39b4a28 100644 >>> --- a/datapath/linux/compat/net_namespace.c >>> +++ b/datapath/linux/compat/net_namespace.c >>> int rpl_register_pernet_##PNET_TYPE(struct rpl_pernet_operations >>> *rpl_pnet)\ >>> { >>> \ >>> pnet_##PNET_TYPE = rpl_pnet; >>> \ >>> - return register_pernet_##PNET_TYPE(pnet_##PNET_TYPE->id, >>> &pnet_compat_##PNET_TYPE); \ >>> + rpl_pnet->ops = pnet_compat_##PNET_TYPE; >>> \ >>> + return register_pernet_##PNET_TYPE(pnet_##PNET_TYPE->id, >>> &rpl_pnet->ops); \ >>> } >> >> I think this works for initializing the device for existing namespaces >> but if a new namespace is brought up then it will have problems. In >> that case, it will call the registration function for the last device >> repeatedly. >> > right. > >> We could avoid this issue if we just used contain_of instead of a set >> of global variables to recover the original ops. That might also allow >> us to significantly reduce the amount of token pasting, which would >> also make this easier to read. > I am not sure what are you suggesting. > problem is compat-per-net-init function has net as parameter and it > need id to get private net obj. but there is no way to get > net_id/net-obj for this compat function without global pointer, but > that does not work as you explained. You're right, it doesn't work. > I can think of some more macro tricks to hide this compat code, that > will involve moving this code to header and using file-name (and may > be line no) to build specific compat functions rather than sharing > same functions for multiple net-ops. I guess that works. It will be hard to read but I don't have a better idea. X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] FAQ: Add supported kernel versions for newer OVS releases.
On Thu, Jul 11, 2013 at 11:42 AM, Pravin Shelar wrote: > On Tue, Jul 9, 2013 at 2:13 PM, Jesse Gross wrote: >> Reported-by: Kris zhang >> Signed-off-by: Jesse Gross > Acked-by: Pravin B Shelar Thanks, applied to master and I'll push appropriate versions to the other affected branches. X-CudaMail-Whitelist-To: dev@openvswitch.org ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] Aldrich
Contact me regarding Aldrich. Fu. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 01/28] fatal-signal: Remove write-only variable fatal_signal_set.
On Thu, Jul 11, 2013 at 09:17:50AM -0400, Ed Maste wrote: > On 10 July 2013 19:03, Ben Pfaff wrote: > > Signed-off-by: Ben Pfaff > > --- > > lib/fatal-signal.c |5 - > > Acked-by: Ed Maste Thanks, applied to master. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] add flow aggregation and caching to IPFIX exporter
This patch implements an aggregator in the IPFIX exporter, to reduce the volume of IPFIX messages sent. There might be issues with multithreading, since the flow caches are accessed both by upcalls and by the main run loop. I had no problems in my (very small scale) tests. Should I add locking to protect access to each cache? ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH] ipfix: implement flow caching and aggregation in exporter
Implement a per-exporter flow cache with active timeout expiration. Add columns "cache_active_timeout" and "cache_max_flows" into table "IPFIX" to configure each cache. Add per-flow elements "octetDeltaSumOfSquares", "minimumIpTotalLength", and "maximumIpTotalLength" to replace "ethernetTotalLength". Add per-flow element "flowEndReason" to indicate whether a flow has expired because of an active timeout, the cache size limit being reached, or the exporter being stopped. Signed-off-by: Romain Lenglet --- ofproto/ofproto-dpif-ipfix.c | 758 +- ofproto/ofproto-dpif-ipfix.h |3 + ofproto/ofproto-dpif.c | 23 +- ofproto/ofproto.h|4 + utilities/ovs-vsctl.8.in |5 +- vswitchd/bridge.c| 10 + vswitchd/vswitch.ovsschema | 12 +- vswitchd/vswitch.xml | 12 + 8 files changed, 727 insertions(+), 100 deletions(-) diff --git a/ofproto/ofproto-dpif-ipfix.c b/ofproto/ofproto-dpif-ipfix.c index ef0e980..e066199 100644 --- a/ofproto/ofproto-dpif-ipfix.c +++ b/ofproto/ofproto-dpif-ipfix.c @@ -15,15 +15,18 @@ */ #include +#include #include "ofproto-dpif-ipfix.h" #include "byte-order.h" #include "collectors.h" #include "flow.h" #include "hash.h" #include "hmap.h" +#include "list.h" #include "ofpbuf.h" #include "ofproto.h" #include "packets.h" +#include "poll-loop.h" #include "sset.h" #include "util.h" #include "timeval.h" @@ -40,7 +43,11 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(1, 5); struct dpif_ipfix_exporter { struct collectors *collectors; uint32_t seq_number; -time_t last_template_set_time; +__kernel_time_t last_template_set_time; +struct hmap cache_flow_key_map; /* ipfix_flow_cache_entry. */ +struct list cache_flow_start_timestamp_list; /* ipfix_flow_cache_entry. */ +uint32_t cache_active_timeout; /* In seconds. */ +uint32_t cache_max_flows; }; struct dpif_ipfix_bridge_exporter { @@ -61,7 +68,7 @@ struct dpif_ipfix_flow_exporter_map_node { struct dpif_ipfix { struct dpif_ipfix_bridge_exporter bridge_exporter; -struct hmap flow_exporter_map; /* dpif_ipfix_flow_exporter_map_nodes. */ +struct hmap flow_exporter_map; /* dpif_ipfix_flow_exporter_map_node. */ int ref_cnt; }; @@ -138,29 +145,27 @@ struct ipfix_template_field_specifier { } __attribute__((packed)); BUILD_ASSERT_DECL(sizeof(struct ipfix_template_field_specifier) == 4); -/* Part of data record for common metadata and Ethernet entities. */ -struct ipfix_data_record_common { +/* Part of data record flow key for common metadata and Ethernet entities. */ +struct ipfix_data_record_flow_key_common { ovs_be32 observation_point_id; /* OBSERVATION_POINT_ID */ -ovs_be64 packet_delta_count; /* PACKET_DELTA_COUNT */ -ovs_be64 layer2_octet_delta_count; /* LAYER2_OCTET_DELTA_COUNT */ uint8_t source_mac_address[6]; /* SOURCE_MAC_ADDRESS */ uint8_t destination_mac_address[6]; /* DESTINATION_MAC_ADDRESS */ ovs_be16 ethernet_type; /* ETHERNET_TYPE */ -ovs_be16 ethernet_total_length; /* ETHERNET_TOTAL_LENGTH */ uint8_t ethernet_header_length; /* ETHERNET_HEADER_LENGTH */ } __attribute__((packed)); -BUILD_ASSERT_DECL(sizeof(struct ipfix_data_record_common) == 37); +BUILD_ASSERT_DECL(sizeof(struct ipfix_data_record_flow_key_common) == 19); -/* Part of data record for VLAN entities. */ -struct ipfix_data_record_vlan { +/* Part of data record flow key for VLAN entities. */ +struct ipfix_data_record_flow_key_vlan { ovs_be16 vlan_id; /* VLAN_ID */ ovs_be16 dot1q_vlan_id; /* DOT1Q_VLAN_ID */ uint8_t dot1q_priority; /* DOT1Q_PRIORITY */ } __attribute__((packed)); -BUILD_ASSERT_DECL(sizeof(struct ipfix_data_record_vlan) == 5); +BUILD_ASSERT_DECL(sizeof(struct ipfix_data_record_flow_key_vlan) == 5); -/* Part of data record for IP entities. */ -struct ipfix_data_record_ip { +/* Part of data record flow key for IP entities. */ +/* XXX: Replace IP_TTL with MINIMUM_TTL and MAXIMUM_TTL? */ +struct ipfix_data_record_flow_key_ip { uint8_t ip_version; /* IP_VERSION */ uint8_t ip_ttl; /* IP_TTL */ uint8_t protocol_identifier; /* PROTOCOL_IDENTIFIER */ @@ -168,29 +173,118 @@ struct ipfix_data_record_ip { uint8_t ip_precedence; /* IP_PRECEDENCE */ uint8_t ip_class_of_service; /* IP_CLASS_OF_SERVICE */ } __attribute__((packed)); -BUILD_ASSERT_DECL(sizeof(struct ipfix_data_record_ip) == 6); +BUILD_ASSERT_DECL(sizeof(struct ipfix_data_record_flow_key_ip) == 6); -/* Part of data record for IPv4 entities. */ -struct ipfix_data_record_ipv4 { +/* Part of data record flow key for IPv4 entities. */ +struct ipfix_data_record_flow_key_ipv4 { ovs_be32 source_ipv4_address; /* SOURCE_IPV4_ADDRESS */ ovs_be32 destination_ipv4_address; /* DESTINATION_IPV4_ADDRESS */ } __attribute__((packed)); -BUILD_ASSERT_DECL(sizeof(struct ipfix_data_record_ipv4) == 8); +BUILD_ASSERT_DECL(sizeof(struct i
Re: [ovs-dev] [threads 05/28] timeval: Always block SIGALRM in time_poll().
On Thu, Jul 11, 2013 at 09:13:15AM -0400, Ed Maste wrote: > On 10 July 2013 19:03, Ben Pfaff wrote: > > (The usual reason for ppoll() is to ensure atomicity of changing the signal > > mask and blocking, but time_poll() does not need atomicity. Therefore, > > even a userspace emulated ppoll() would work here in a race-free manner, if > > the kernel does not support ppoll().) > > And we'll need that userspace emulated ppoll() for non-Linux systems; > seem reasonable to just implement it in timeval.c inside an #ifndef > HAVE_PPOLL from autoconf? Hmm, I had this notion that ppoll() had been standard for a long time now, but I see that it's not even in SUSv4. I'm dropping this patch for now since the rationale is based entirely on speculation, not a good reason to reduce portability. Thanks, Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 06/28] timeval: New function xclock_gettime().
On Thu, Jul 11, 2013 at 09:43:44AM -0400, Ed Maste wrote: > On 10 July 2013 19:03, Ben Pfaff wrote: > > Signed-off-by: Ben Pfaff > > Acked-by: Ed Maste Thanks, applied to master. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 08/28] timeval: Remove backtrace feature.
On Thu, Jul 11, 2013 at 09:31:07AM -0400, Ed Maste wrote: > On 10 July 2013 19:03, Ben Pfaff wrote: > > The backtrace feature of timeval is useful because it provides a "poor > > man's profile" view of Open vSwitch. But it is not likely to be useful in > > a multithreaded process, because signal delivery doesn't necessarily follow > > the profile when there is more than one thread. (A signal in a > > multithreaded process are delivered to an arbitrary thread.) > > > > Another problem with the backtrace feature is that it is difficult for > > format_backtraces() to synchronize properly with the signal handler in a > > multithreaded process. In a single-threaded process, it can just block > > the signal handler, but in a multithreaded process this does not prevent > > signal delivery to threads other than the one running format_backtrace(). > > > > Signed-off-by: Ben Pfaff > > Acked-by: Ed Maste Thanks. I applied this to master. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 09/28] timeval: Fix typo in comment.
On Thu, Jul 11, 2013 at 09:45:21AM -0400, Ed Maste wrote: > On 10 July 2013 19:03, Ben Pfaff wrote: > > Signed-off-by: Ben Pfaff > > Acked-by: Ed Maste Thanks. I applied this to master. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 13/28] poll-loop: Fix typo in comment.
On Thu, Jul 11, 2013 at 09:28:10AM -0400, Ed Maste wrote: > On 10 July 2013 19:03, Ben Pfaff wrote: > > Signed-off-by: Ben Pfaff > > Acked-by: Ed Maste Thanks, applied to master. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 14/28] vlog: Remove support for worker process.
On Thu, Jul 11, 2013 at 10:00:21AM -0400, Ed Maste wrote: > On 10 July 2013 19:03, Ben Pfaff wrote: > > The worker process implementation isn't thread-safe and, once OVS > > itself is threaded, it doesn't make much sense to have a worker > > process anyway. > > > > Signed-off-by: Ben Pfaff > > Acked-by: Ed Maste Applied to master, thanks. I had to move this hunk to the later "worker: Delete library." patch, because in isolation it just caused assertion failures. It made more logical sense in the latter place anyhow: commit 14017b032237bc166b3d4af256d2c06b6f79817a Author: Ben Pfaff Date: Thu Jul 11 17:10:16 2013 -0700 daemon diff --git a/lib/daemon.c b/lib/daemon.c index 3c1e5c3..edeeca2 100644 --- a/lib/daemon.c +++ b/lib/daemon.c @@ -529,6 +529,8 @@ daemonize_start(void) /* Running in daemon process. */ } +forbid_forking("running in daemon process"); + if (pidfile) { make_pidfile(); } ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [threads 23/28] util: Make subprogram_name thread-specific.
On Thu, Jul 11, 2013 at 09:40:57AM -0400, Ed Maste wrote: > On 10 July 2013 19:04, Ben Pfaff wrote: > > Signed-off-by: Ben Pfaff > > > +void > > +set_subprogram_name(const char *name) > > +{ > > +free(subprogram_name_set(xstrdup(name))); > > +} > > How about calling (through appropriate abstraction) pthread_setname_np > (Linux) or pthread_set_name_np (FreeBSD)? The thread names will then > appear in gdb or lldb, and in top(1) (on FreeBSD at least). I didn't know about those functions. Thanks, I'll look into it. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/1] Option to toggle "Require successful LACP negotiation when configured"
> We do not have any known usecases from XenServer customers where this is > needed. We also do not expect a customer to explicitly configure this way. > However, we want to avoid the case of the customer having configured this > way with proper connectivity that breaks on upgrade. I really don't want to go down the rabbit hole of retaining optional backwards compatibility for every little feature we remove just because there might possibly me some badly configured user who is affected on upgrade. I'd prefer not to accept this patch until there's a more specific use case for it. Btw, in 1.4 it's easy to check if a user is running LACP, but has fallen back to SLB. Why not warn or fail this upgrade if this is the situation? Ethan > > >> >>Ethan >> >>> On 6/28/13 1:04 PM, "Ethan Jackson" wrote: >>> This doesn't constitute a full review, but I have a couple of comments. Can you describe a realistic situation in which a user configured their hypervisor to run LACP, the negotiation failed, and the correct behavior is to forward traffic despite a clear problem in their network config? At best, the upstream switch is dead or missing, at worst it's completely misconfigured and we could be doing more harm than good. In short, I don't understand the use case for this option. Assuming we decide to optionally fall back. I'd prefer we fall back to active-backup instead of SLB which is far more dangerous. Please add your signed-off-by to this patch. Ethan On Fri, Jun 28, 2013 at 9:58 AM, Ravi Kondamuru wrote: > Commit bdebeece5 (lacp: Require successful LACP negotiations when > configured.) makes successful LACP negotiation mandatory. > This change makes it configurable. The user has the option to > set other-config:lacp-fallback-slb to true if pre 1.5.0 behavior > is needed. The switch will fallback to balance-slb when LACP > negotiation fails or is unsupported by the partner switch. The > default is false, requiring a successful LACP negotiation (which is > the current behavior). > > Signed-off-by: Dominic Curran > --- > lib/bond.c | 41 +++-- > lib/bond.h |1 + > lib/lacp.c | 15 +-- > lib/lacp.h |1 + > vswitchd/bridge.c| 17 + > vswitchd/vswitch.xml | 19 +-- > 6 files changed, 76 insertions(+), 18 deletions(-) > > diff --git a/lib/bond.c b/lib/bond.c > index 68ac068..6365cd0 100644 > --- a/lib/bond.c > +++ b/lib/bond.c > @@ -103,6 +103,7 @@ struct bond { > > /* Legacy compatibility. */ > long long int next_fake_iface_update; /* LLONG_MAX if disabled. >*/ > +bool lacp_fallback_slb; /* fallback to balance-slb on lacp >failure >*/ > > /* Tag set saved for next bond_run(). This tag set is a kluge >for >cases > * where we can't otherwise provide revalidation feedback to the >client. > @@ -238,6 +239,11 @@ bond_reconfigure(struct bond *bond, const struct >bond_settings *s) > bond->updelay = s->up_delay; > bond->downdelay = s->down_delay; > > +if (bond->lacp_fallback_slb != s->lacp_fallback_slb_cfg) { > +bond->lacp_fallback_slb = s->lacp_fallback_slb_cfg; > +revalidate = true; > +} > + > if (bond->rebalance_interval != s->rebalance_interval) { > bond->rebalance_interval = s->rebalance_interval; > revalidate = true; > @@ -463,8 +469,9 @@ bond_wait(struct bond *bond) > static bool > may_send_learning_packets(const struct bond *bond) > { > -return bond->lacp_status == LACP_DISABLED > -&& (bond->balance == BM_SLB || bond->balance == BM_AB) > +return ((bond->lacp_status == LACP_DISABLED > +&& (bond->balance == BM_SLB || bond->balance == BM_AB)) > +|| (bond->lacp_fallback_slb && bond->lacp_status == >LACP_CONFIGURED)) > && bond->active_slave; > } > > @@ -546,10 +553,12 @@ bond_check_admissibility(struct bond *bond, >const >void *slave_, > * packets to them. > * > * If LACP is configured, but LACP negotiations have been >unsuccessful, we > - * drop all incoming traffic. */ > + * drop all incoming traffic except if lacp_fallback_slb is true. >*/ > switch (bond->lacp_status) { > case LACP_NEGOTIATED: return slave->enabled ? BV_ACCEPT : >BV_DROP; > -case LACP_CONFIGURED: return BV_DROP; > +case LACP_CONFIGURED: > +if (!bond->lacp_fallback_slb) > +return BV_DROP; > case LACP_DISABLED: break; > } > > @@ -578,9 +587,11 @@ bond_check_admissibility(struct bond *bond, const
[ovs-dev] [PATCH 2/2] bfd: Implements forwarding_if_rx
This commit adds a new boolean option "forwarding_if_rx" to bfd. When forwarding_if_rx is true and BFD state is down, the forwarding field in "ovs-appctl bfd/show" output will still be true as long as there are incoming packets received. This is for indicating the tunnel liveness when the tunnel is congested and consecutive BFD control packets are lost. Or when BFD is enabled only on one side of the tunnel. Signed-off-by: Alex Wang --- lib/bfd.c| 48 +++- vswitchd/vswitch.xml |8 2 files changed, 51 insertions(+), 5 deletions(-) diff --git a/lib/bfd.c b/lib/bfd.c index 0d4805a..a16a154 100644 --- a/lib/bfd.c +++ b/lib/bfd.c @@ -186,6 +186,13 @@ struct bfd { int ref_cnt; int forwarding_override; /* Manual override of 'forwarding' status. */ +/* When forward_if_rx is true and bfd->state is STATE_DOWN, the + * bfd_forwarding() will return true as long as there are incoming + * packets received. Note, forwarding_override still has + * higher priority. */ +bool forwarding_if_rx; +bool has_rx; + /* BFD decay related variables. */ bool decay_enabled; /* Flag that enables bfd decay. */ int decay_min_rx; @@ -202,6 +209,7 @@ static long long int bfd_rx_interval(const struct bfd *); static uint64_t bfd_rx_packets(const struct bfd *); static void bfd_decay_min_rx_set_default(struct bfd *); static void bfd_decay(struct bfd *); +static void bfd_check_rx(struct bfd *); static void bfd_set_next_tx(struct bfd *); static void bfd_set_state(struct bfd *, enum state, enum diag); static uint32_t generate_discriminator(void); @@ -226,10 +234,11 @@ bfd_forwarding(const struct bfd *bfd) return bfd->forwarding_override == 1; } -return bfd->state == STATE_UP -&& bfd->rmt_diag != DIAG_PATH_DOWN -&& bfd->rmt_diag != DIAG_CPATH_DOWN -&& bfd->rmt_diag != DIAG_RCPATH_DOWN; +return (bfd->forwarding_if_rx && bfd->state <= STATE_DOWN +? bfd->has_rx : bfd->state == STATE_UP) + && bfd->rmt_diag != DIAG_PATH_DOWN + && bfd->rmt_diag != DIAG_CPATH_DOWN + && bfd->rmt_diag != DIAG_RCPATH_DOWN; } /* Returns a 'smap' of key value pairs representing the status of 'bfd' @@ -261,7 +270,7 @@ bfd_configure(struct bfd *bfd, const char *name, static bool init = false; long long int min_tx, min_rx; -bool cpath_down; +bool cpath_down, forwarding_if_rx; if (!init) { unixctl_command_register("bfd/show", "[interface]", 0, 1, @@ -291,6 +300,8 @@ bfd_configure(struct bfd *bfd, const char *name, bfd->netdev = netdev; bfd->decay_detect_time = 0; bfd->rx_packets = bfd_rx_packets(bfd); +bfd->forwarding_if_rx = false; +bfd->has_rx = false; /* RFC 5881 section 4 * The source port MUST be in the range 49152 through 65535. The same @@ -341,6 +352,14 @@ bfd_configure(struct bfd *bfd, const char *name, } bfd_poll(bfd); } + +forwarding_if_rx = smap_get_bool(cfg, "forwarding_if_rx", false); +if (bfd->forwarding_if_rx != forwarding_if_rx) { +bfd->forwarding_if_rx = forwarding_if_rx; +if (bfd->state <= STATE_DOWN && bfd->forwarding_if_rx) { +bfd_check_rx(bfd); +} +} return bfd; } @@ -390,6 +409,11 @@ bfd_run(struct bfd *bfd) bfd_set_state(bfd, STATE_DOWN, DIAG_EXPIRED); } +if (bfd->state <= STATE_DOWN && bfd->forwarding_if_rx +&& now >= bfd->detect_time) { +bfd_check_rx(bfd); +} + if (bfd->state == STATE_UP && bfd->decay_enabled && now >= bfd->decay_detect_time) { bfd_decay(bfd); @@ -887,6 +911,20 @@ bfd_decay(struct bfd *bfd) } } +static void +bfd_check_rx(struct bfd *bfd) +{ +uint64_t rx_packets = bfd_rx_packets(bfd); +int64_t diff; + +diff = rx_packets - bfd->rx_packets; +ovs_assert(diff >= 0); + +bfd->rx_packets = rx_packets; +bfd->has_rx = (diff > 0); +bfd->detect_time = bfd_rx_interval(bfd) * bfd->mult + time_msec(); +} + static uint32_t generate_discriminator(void) { diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml index f888f4b..8c1170d 100644 --- a/vswitchd/vswitch.xml +++ b/vswitchd/vswitch.xml @@ -1896,6 +1896,14 @@ . + + When forwarding_if_rx is true and BFD state is down, + the forwarding field in "ovs-appctl bfd/show" + output will still be true as long as there are incoming + packets received. This option is for indicating the tunnel liveness + when the tunnel becomes congested and consecutive BFD control + packets are lost. + Concatenated path down may be used when the local system should not -- 1.7.9.5 ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
[ovs-dev] [PATCH 1/2] bfd: Implements BFD decay
When there is no incoming data traffic in the tunnel for a period, BFD decay allows the bfd session to increase the min_rx. This is helpful in that usually some tunnels are rarely used. And cpu consumption can be reduced by processing fewer bfd control packets. Signed-off-by: Alex Wang --- lib/bfd.c | 89 ++-- lib/bfd.h |5 ++- ofproto/ofproto-dpif.c |3 +- vswitchd/vswitch.xml | 17 + 4 files changed, 109 insertions(+), 5 deletions(-) diff --git a/lib/bfd.c b/lib/bfd.c index aa1a3f7..0d4805a 100644 --- a/lib/bfd.c +++ b/lib/bfd.c @@ -152,6 +152,9 @@ struct bfd { bool cpath_down; /* Concatenated Path Down. */ uint8_t mult; /* bfd.DetectMult. */ +struct netdev *netdev; +uint64_t rx_packets; /* Packets received by 'netdev'. */ + enum state state; /* bfd.SessionState. */ enum state rmt_state; /* bfd.RemoteSessionState. */ @@ -182,6 +185,11 @@ struct bfd { int ref_cnt; int forwarding_override; /* Manual override of 'forwarding' status. */ + +/* BFD decay related variables. */ +bool decay_enabled; /* Flag that enables bfd decay. */ +int decay_min_rx; +long long int decay_detect_time; }; static bool bfd_in_poll(const struct bfd *); @@ -191,6 +199,9 @@ static const char *bfd_state_str(enum state); static long long int bfd_min_tx(const struct bfd *); static long long int bfd_tx_interval(const struct bfd *); static long long int bfd_rx_interval(const struct bfd *); +static uint64_t bfd_rx_packets(const struct bfd *); +static void bfd_decay_min_rx_set_default(struct bfd *); +static void bfd_decay(struct bfd *); static void bfd_set_next_tx(struct bfd *); static void bfd_set_state(struct bfd *, enum state, enum diag); static uint32_t generate_discriminator(void); @@ -243,7 +254,8 @@ bfd_get_status(const struct bfd *bfd, struct smap *smap) * Also returns NULL if cfg is NULL. */ struct bfd * bfd_configure(struct bfd *bfd, const char *name, - const struct smap *cfg) + const struct smap *cfg, + struct netdev *netdev) { static uint16_t udp_src = 0; static bool init = false; @@ -276,6 +288,9 @@ bfd_configure(struct bfd *bfd, const char *name, bfd->min_tx = 1000; bfd->mult = 3; bfd->ref_cnt = 1; +bfd->netdev = netdev; +bfd->decay_detect_time = 0; +bfd->rx_packets = bfd_rx_packets(bfd); /* RFC 5881 section 4 * The source port MUST be in the range 49152 through 65535. The same @@ -309,6 +324,15 @@ bfd_configure(struct bfd *bfd, const char *name, bfd_poll(bfd); } +bfd->decay_enabled = smap_get_bool(cfg, "decay_enable", false); +bfd->decay_min_rx = smap_get_int(cfg, "decay_min_rx", -1); +/* Set the default decay_min_rx if decay is enabled and decay_min_rx + is not specified. */ +if (bfd->decay_enabled && bfd->decay_min_rx < 0) { +bfd_decay_min_rx_set_default(bfd); +bfd_poll(bfd); +} + cpath_down = smap_get_bool(cfg, "cpath_down", false); if (bfd->cpath_down != cpath_down) { bfd->cpath_down = cpath_down; @@ -360,11 +384,19 @@ bfd_wait(const struct bfd *bfd) void bfd_run(struct bfd *bfd) { -if (bfd->state > STATE_DOWN && time_msec() >= bfd->detect_time) { +long long int now = time_msec(); + +if (bfd->state > STATE_DOWN && now >= bfd->detect_time) { bfd_set_state(bfd, STATE_DOWN, DIAG_EXPIRED); } -if (bfd->min_tx != bfd->cfg_min_tx || bfd->min_rx != bfd->cfg_min_rx) { +if (bfd->state == STATE_UP && bfd->decay_enabled +&& now >= bfd->decay_detect_time) { +bfd_decay(bfd); +} + +if (!bfd->decay_enabled && +(bfd->min_tx != bfd->cfg_min_tx || bfd->min_rx != bfd->cfg_min_rx)) { bfd_poll(bfd); } } @@ -804,6 +836,57 @@ bfd_set_state(struct bfd *bfd, enum state state, enum diag diag) } } +static uint64_t +bfd_rx_packets(const struct bfd *bfd) +{ +struct netdev_stats stats; + +if (bfd->netdev && !netdev_get_stats(bfd->netdev, &stats)) { +return stats.rx_packets; +} else { +return 0; +} +} + +static void +bfd_decay_min_rx_set_default(struct bfd *bfd) +{ + +bfd->decay_min_rx = bfd->mult * bfd->mult * MAX(bfd->cfg_min_rx, +bfd->rmt_min_tx); +/* Always restore the min_rx. */ +bfd->min_rx = bfd->cfg_min_rx; +} + +static void +bfd_decay(struct bfd *bfd) +{ +uint64_t rx_packets = bfd_rx_packets(bfd); +int64_t diff; + +diff = rx_packets - bfd->rx_packets; +bfd->rx_packets = rx_packets; +bfd->decay_detect_time = bfd->decay_min_rx + time_msec(); + +if (diff <= (bfd->decay_min_rx / bfd->min_rx + 5)) { +/* Decay when there is no obvious data traffic. */ +if (bfd->min_rx != bfd->decay_min_rx)
Re: [ovs-dev] [PATCH v2.34] datapath: Add basic MPLS support to kernel
On Wed, Jul 10, 2013 at 5:16 PM, Simon Horman wrote: > On Mon, Jul 08, 2013 at 11:51:18PM -0700, Jesse Gross wrote: >> On Tue, Jul 2, 2013 at 6:31 PM, Simon Horman wrote: >> > On Tue, Jul 02, 2013 at 02:59:51PM -0700, Jesse Gross wrote: >> >> On Mon, Jul 1, 2013 at 9:16 PM, Simon Horman wrote: >> >> > On Mon, Jul 01, 2013 at 04:42:46PM -0700, Jesse Gross wrote: >> >> >> On Wed, Jun 26, 2013 at 1:18 AM, Simon Horman >> >> >> wrote: >> >> >> > Allow datapath to recognize and extract MPLS labels into flow keys >> >> >> > and execute actions which push, pop, and set labels on packets. >> >> >> > >> >> >> > Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and >> >> >> > Joe Stringer. >> >> >> > >> >> >> > Cc: Ravi K >> >> >> > Cc: Leo Alterman >> >> >> > Cc: Isaku Yamahata >> >> >> > Cc: Joe Stringer >> >> >> > Signed-off-by: Simon Horman >> >> >> >> >> >> Simon, have you thought any more about the header ordering issues? I >> >> >> don't think we've reached a conclusion at this point. >> >> > >> >> > I believe that you then raised the issue of QinQ, which somehow >> >> > I failed to respond to, I apologise for that. >> >> > >> >> > In particular, my understanding is that you are concerned the code will >> >> > miss-calculate the end of L2 headers in the presence of multiple VLAN >> >> > tags. >> >> > Thus resulting in an MPLS push action inserting an MPLS LSE after the >> >> > first >> >> > rather than the last VLAN tag. And that would likely change if QinQ >> >> > support >> >> > was added to Open vSwtich. >> >> > >> >> > I wonder if a good way forwards is to enhance the calculation >> >> > of the end of L2 headers (mac_len) and the beginning of L3 headers >> >> > (network_header) in ovs_flow_extract() such that it takes into >> >> > account the presence of more than one VLAN tag. >> >> >> >> The problem is that this requires being able to calculate the length >> >> of all possible headers that we might want to support in the future. >> >> In the case of QinQ, this would mean the various EtherTypes. You could >> >> also imagine other things like MAC-in-MAC that are farther afield from >> >> what we currently support. >> > >> > That is true. >> > >> > I think that the key problem is it that it is hard for us >> > to correctly determine where the end of the L2 header is, >> > or more specifically where the MPLS tag should go, for all cases. >> > Particularly cases which are yet to be defined. >> > >> > Having spoken with Joe about this we see two main options: >> > >> > 1. The status quo as of this patch. Which is that MPLS actions >> >may be invalid for some cases. >> > >> >While it should be be possible to solve individual cases, >> >this doesn't solve the general case. >> > >> > 2. Only allow MPLS actions on ether types where the implementation >> >is known to work. >> > >> >This could act as a white list of sorts. It could start with >> >the obvious candidates: IPv4, IPv6, ARP, 802.1Q,... >> >And support for more protocols could be added in the future. >> > >> >This would seem to reflect the somewhat special nature of MPLS. >> >> I think what is really necessary at the kernel level is some >> flexibility about where to put the newly inserted MPLS header. Then >> you could basically chose either of the two options above or export >> the flexibility through OpenFlow (which by my reading of the spec is >> already supposed to be allowed). Furthermore, you could do different >> things in different situations as OpenFlow evolves, which really has >> to be done at the userspace level since only it has the full set of >> knowledge about the protocol. > > I wonder if this can be achieved by adding a list of features to > the MPLS push action, for example as a possibly zero-length array of > integers which encode feature bits. > > At the time that MPLS support is added to the datapath we could define that > all the bits are zero and the behaviour of the code at that time is the > expected behaviour. > > Suppose that a new feature is added in the future. I will use the example > of support for 802.1ad (the standardised variant of Q-in-Q). > > The logic in the datapath to determine the end of the L2 header and thus > the top of the MPLS LSE stack could be guarded by a new feature bit, > the ad-bit. > > If an MPLS push action, supplied by userspace, has the ad-bit set then the > new logic is used and the MPLS LSE is inserted accordingly. > Conversely, if the MPLS push action does not have the bit set then the > old logic is used and the MPLS LSE is inserted as if the datapath > didn't understand 802.1ad. > > In this way it would be possible for userspace to select the desired > behaviour. An old user-space would use the old behaviour. A new userspace > may choose the old or the new behaviour. > > And it would be possible for the datapath to reject facets with MPLS > push actions with feature bits or combinations of feature bits that > are not supported. Hmm, I
Re: [ovs-dev] [PATCH v2.34] datapath: Add basic MPLS support to kernel
On Thu, Jul 11, 2013 at 06:26:32PM -0700, Jesse Gross wrote: > On Wed, Jul 10, 2013 at 5:16 PM, Simon Horman wrote: > > On Mon, Jul 08, 2013 at 11:51:18PM -0700, Jesse Gross wrote: > >> On Tue, Jul 2, 2013 at 6:31 PM, Simon Horman wrote: > >> > On Tue, Jul 02, 2013 at 02:59:51PM -0700, Jesse Gross wrote: > >> >> On Mon, Jul 1, 2013 at 9:16 PM, Simon Horman wrote: > >> >> > On Mon, Jul 01, 2013 at 04:42:46PM -0700, Jesse Gross wrote: > >> >> >> On Wed, Jun 26, 2013 at 1:18 AM, Simon Horman > >> >> >> wrote: > >> >> >> > Allow datapath to recognize and extract MPLS labels into flow keys > >> >> >> > and execute actions which push, pop, and set labels on packets. > >> >> >> > > >> >> >> > Based heavily on work by Leo Alterman, Ravi K, Isaku Yamahata and > >> >> >> > Joe Stringer. > >> >> >> > > >> >> >> > Cc: Ravi K > >> >> >> > Cc: Leo Alterman > >> >> >> > Cc: Isaku Yamahata > >> >> >> > Cc: Joe Stringer > >> >> >> > Signed-off-by: Simon Horman > >> >> >> > >> >> >> Simon, have you thought any more about the header ordering issues? I > >> >> >> don't think we've reached a conclusion at this point. > >> >> > > >> >> > I believe that you then raised the issue of QinQ, which somehow > >> >> > I failed to respond to, I apologise for that. > >> >> > > >> >> > In particular, my understanding is that you are concerned the code > >> >> > will > >> >> > miss-calculate the end of L2 headers in the presence of multiple VLAN > >> >> > tags. > >> >> > Thus resulting in an MPLS push action inserting an MPLS LSE after the > >> >> > first > >> >> > rather than the last VLAN tag. And that would likely change if QinQ > >> >> > support > >> >> > was added to Open vSwtich. > >> >> > > >> >> > I wonder if a good way forwards is to enhance the calculation > >> >> > of the end of L2 headers (mac_len) and the beginning of L3 headers > >> >> > (network_header) in ovs_flow_extract() such that it takes into > >> >> > account the presence of more than one VLAN tag. > >> >> > >> >> The problem is that this requires being able to calculate the length > >> >> of all possible headers that we might want to support in the future. > >> >> In the case of QinQ, this would mean the various EtherTypes. You could > >> >> also imagine other things like MAC-in-MAC that are farther afield from > >> >> what we currently support. > >> > > >> > That is true. > >> > > >> > I think that the key problem is it that it is hard for us > >> > to correctly determine where the end of the L2 header is, > >> > or more specifically where the MPLS tag should go, for all cases. > >> > Particularly cases which are yet to be defined. > >> > > >> > Having spoken with Joe about this we see two main options: > >> > > >> > 1. The status quo as of this patch. Which is that MPLS actions > >> >may be invalid for some cases. > >> > > >> >While it should be be possible to solve individual cases, > >> >this doesn't solve the general case. > >> > > >> > 2. Only allow MPLS actions on ether types where the implementation > >> >is known to work. > >> > > >> >This could act as a white list of sorts. It could start with > >> >the obvious candidates: IPv4, IPv6, ARP, 802.1Q,... > >> >And support for more protocols could be added in the future. > >> > > >> >This would seem to reflect the somewhat special nature of MPLS. > >> > >> I think what is really necessary at the kernel level is some > >> flexibility about where to put the newly inserted MPLS header. Then > >> you could basically chose either of the two options above or export > >> the flexibility through OpenFlow (which by my reading of the spec is > >> already supposed to be allowed). Furthermore, you could do different > >> things in different situations as OpenFlow evolves, which really has > >> to be done at the userspace level since only it has the full set of > >> knowledge about the protocol. > > > > I wonder if this can be achieved by adding a list of features to > > the MPLS push action, for example as a possibly zero-length array of > > integers which encode feature bits. > > > > At the time that MPLS support is added to the datapath we could define that > > all the bits are zero and the behaviour of the code at that time is the > > expected behaviour. > > > > Suppose that a new feature is added in the future. I will use the example > > of support for 802.1ad (the standardised variant of Q-in-Q). > > > > The logic in the datapath to determine the end of the L2 header and thus > > the top of the MPLS LSE stack could be guarded by a new feature bit, > > the ad-bit. > > > > If an MPLS push action, supplied by userspace, has the ad-bit set then the > > new logic is used and the MPLS LSE is inserted accordingly. > > Conversely, if the MPLS push action does not have the bit set then the > > old logic is used and the MPLS LSE is inserted as if the datapath > > didn't understand 802.1ad. > > > > In this way it would be possible for userspace to select the desired > > behaviour. A