On Tue, Mar 12, 2013 at 07:44:10PM -0700, Ethan Jackson wrote:
> Traditionally, Open vSwitch has used a variant of 802.1ag "CFM" for
> interface liveness detection.  This has served us well until now,
> but has several serious drawbacks which have steadily become more
> inconvenient.  First, the 802.1ag standard does not implement
> several useful features forcing us to (optionally) break
> compatibility.  Second, 802.1.ag is not particularly popular
> outside of carrier grade networking equipment.  Third, 802.1ag is
> simply quite awkward.
> 
> In an effort to solve the aforementioned problems, this patch
> implements BFD which is ubiquitous, well designed, straight
> forward, and implements required features in a standard way.  The
> initial cut of the protocol focuses on getting the basics of the
> specification correct, leaving performance optimizations, and
> advanced features as future work.  The protocol should be
> considered experimental pending future testing.
> 
> Signed-off-by: Ethan Jackson <et...@nicira.com>

bfd.c
-----

> +#include <config.h>
> +#include <bfd.h>

We usually use double quotes ("bfd.h") for our own headers.

> +/* XXX Finish Experimnetal BFD support.

What part is experimental?  (The protocol or the implementation?)

> + * - CFM random VLAN option equivalent.  Instead of vlan, randomly modulate 
> the
> + *   UDP source port to get the required entropy.

Ugh.  Do we really need random anything?  I've never liked that
"feature".

> +#define BFD_VERSION 1

I think the following is microseconds, could you add a comment /*
microseconds */, put _USEC in the name, etc.?

> +#define OVS_BFD_MIN_RX 200000

I'd be half-inclined to write the states as 0 << 6, 1 << 6, 2 << 6, 3
<< 6.  But it doesn't really matter:

> +enum state {
> +    STATE_ADMIN_DOWN = 0,
> +    STATE_DOWN = 1,
> +    STATE_INIT = 2,
> +    STATE_UP = 3
> +};

...

> +/* RFC 5880 Section 4.1

I think you left out the line that gives the first digits of the
two-digit numbers here.  It's harder to read without them:

> + *  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |Vers |  Diag   |Sta|P|F|C|A|D|M|  Detect Mult  |    Length     |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |                       My Discriminator                        |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |                      Your Discriminator                       |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |                    Desired Min TX Interval                    |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |                   Required Min RX Interval                    |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |                 Required Min Echo RX Interval                 |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ */

The BFD timers are 32-bit counts of microseconds.  The timers in
struct bfd are 64-bit and appear to be milliseconds, but I'm not
certain that all of them have the same unit.  Comments on units would
be welcome.  Do they need to be 64-bit?

...

> +/* Returns a 'smap' of key value pairs representing the status of 'bfd'
> + * intended for the OVS database. */
> +void
> +bfd_get_status(const struct bfd *bfd, struct smap *smap)
> +{
> +    smap_add(smap, "forwarding", bfd_forwarding(bfd) ? "true" : "false");
> +    smap_add(smap, "state", bfd_state_str(bfd->state));
> +    smap_add(smap, "diagnostic", bfd_diag_str(bfd->diag));
> +
> +    if (bfd->state != STATE_DOWN) {

Spaces are OK in the OVSDB protocol but I don't think we use them
much.  Usually we use hyphen or underscores (probably should
standardize that in fact):

> +        smap_add(smap, "remote state", bfd_state_str(bfd->rmt_state));
> +        smap_add(smap, "remote diagnostic", bfd_diag_str(bfd->rmt_diag));
> +    }
> +}

The interface for bfd_configure() is unusual.  It does seem to be
appropriate.

bfd_configure() checks for a key "enable" in the configuration that is
passed in, but I don't see any documentation for that key.

This might be a personal problem, but I find the following:
> +        udp_src++;
> +        udp_src = udp_src < 49152 ? 49152 : udp_src;
> +        bfd->udp_src = udp_src;
a little harder to verify as correct than:
        bfd->udp_src = (udp_src++ & 16383) + 49152;

The following, and similar, bugs me a bit because it can call
smap_get_int() twice due to the macro expansion:
> +    min_tx = MAX(smap_get_int(cfg, "min_tx", 100), 100);

> +void
> +bfd_run(struct bfd *bfd)
> +{
> +    if (bfd->state > STATE_DOWN && time_msec() > bfd->detect_time) {

The test above should probably be >= bfd->detect_time or we should
wait until detect_time + 1 in bfd_wait(), otherwise we could wake up 1
ms too early.

> +        bfd_set_state(bfd, STATE_DOWN, DIAG_EXPIRED);
> +    }
> +
> +    if (bfd->min_tx != bfd->cfg_min_tx || bfd->min_rx != bfd->cfg_min_rx) {
> +        bfd_poll(bfd);
> +    }
> +}

...

In bfd_put_packet(), it might be wise to start out with
ofpbuf_reserve(p, 2); to properly align what comes afterward.

> +    eth = ofpbuf_put_uninit(p, sizeof *eth);
> +    memcpy(eth->eth_dst, eth_addr_broadcast, sizeof eth->eth_dst);
> +    memcpy(eth->eth_src, eth_src, ETH_ADDR_LEN);

It would be nice to use sizeof in both cases or neither, above.

> +    eth->eth_type = htons(ETH_TYPE_IP);
> +
> +    ip = ofpbuf_put_zeros(p, sizeof *ip);
> +    ip->ip_ihl_ver = IP_IHL_VER(5, 4);
> +    ip->ip_tot_len = htons(sizeof *ip + sizeof *udp + sizeof(struct msg));
> +    ip->ip_ttl = 1;

I understand why we're using ip_ttl == 1, but I wonder whether you
considered following RFC 5082 and therefore using (and checking for)
ip_ttl == 255.

> +    ip->ip_proto = IPPROTO_UDP;
> +    ip->ip_src = htonl(0xA9FE0100); /* 169.254.1.0 Link Local. */
> +    ip->ip_dst = htonl(0xA9FE0101); /* 169.254.1.1 Link Local. */
> +    ip->ip_csum = csum(ip, sizeof *ip);
> +
> +    udp = ofpbuf_put_zeros(p, sizeof *udp);
> +    udp->udp_src = htons(bfd->udp_src);
> +    udp->udp_dst = htons(BFD_DEST_PORT);
> +    udp->udp_len = htons(sizeof *udp + sizeof(struct msg));

I'd be inclined to write "sizeof *msg" instead of "sizeof(struct
msg)" above.

> +    msg = ofpbuf_put_uninit(p, sizeof(struct msg));

Here too.

It looks like bfd_process_packet() expects the caller to have pulled
off everything except the L7 payload.  It would be nice to put this in
a comment.

I don't see a check for this requirement from 6.8.6 (I think it only
requires comparing p->size to msg->length):

      If the Length field is greater than the payload of the
      encapsulating protocol, the packet MUST be discarded.

The pseudocode at the end of 6.8.6 only mentions updating
bfd.LocalDiag in a couple of cases, but the implementation appears to
update LocalDiag in every case where it changes the session state.

I don't see anything in bfd_process_packet() that corresponds to:

      If the packet was not discarded, it has been received for purposes
      of the Detection Time expiration rules in section 6.8.4.

but maybe I just missed it.

Perhaps the log message in bfd_poll() should include bfd->name.

In bfd_min_tx(), I'd be inclined to remove the outer () here:
> +    return (bfd->state == STATE_UP ? bfd->min_tx : MAX(bfd->min_tx, 1000));

In bfd_tx_interval(), I'm not wild about putting MAX around a function
call here either.

In bfd_flag_str(), please use ovs_strlcpy() instead of strncpy().
There is occasionally a good reason to use strncpy().  However:

        * Using strncpy() into a large buffer can be very inefficient.
          strncpy() always writes to every byte in the destination
          buffer, which can waste a lot of time if the destination
          buffer is much longer than the source string.

        * If the source string is longer than the size of the
          destination buffer, then strncpy() doesn't write a
          terminating null.  So a call to strncpy() must be followed
          by explicitly writing a null terminator at the end of the
          destination buffer in most cases.

bfd_flag_str() could return a const string.

> +static void
> +log_msg(enum vlog_level level, const struct msg *p, const char *message,
> +        const struct bfd *bfd)
> +{
> +    struct ds ds = DS_EMPTY_INITIALIZER;
> +
> +    if (level != VLL_DBG && vlog_should_drop(THIS_MODULE, level, &rl)) {

If we really don't want to rate-limit debug messages, then we should
at least check for VLOG_IS_DBG_ENABLED() in that case, so that in the
common case we don't spend a lot of time composing a message that will
just be discarded later.

It might be a good idea to rate-limit debug messages anyway, because
these days we have an ovs-appctl command to disable rate limits on
particular modules.

Would there be any value in showing the full times in microseconds?
Occasionally I find that every detail is valuable in debugging.

The implementation of generate_discriminator() could be expensive if
we have thousands of BFD instances (which I know we will have).  Would
it be worthwhile to use a global discriminator-based hmap instead of a
global list to hold the BFD instances?

> +static long long int
> +msec_diff(long long int earlier, long long int later) {

We usually put the { on a line by itself.

> +    later -= earlier;
> +    return later < 0 ? 0 : later;
> +}

The following loses a bit of information by truncating to 0.  Did you
consider a format like "now%+lld ms" so that you get output like
"now+5 ms" or "now-10 ms"?
> +    ds_put_format(ds, "\tDetect Time: %lldms from now\n",
> +                  msec_diff(time_msec(), bfd->detect_time));
> +    ds_put_format(ds, "\tNext TX Time: %lldms from now\n",
> +                  msec_diff(time_msec(), bfd->next_tx));
> +    ds_put_format(ds, "\tLast TX Time: %lldms ago\n",
> +                  msec_diff(bfd->last_tx, time_msec()));

bfd.h
-----

I think that the parameter to bfd_should_send_packet() could be 'const':
> +bool bfd_should_send_packet(struct bfd *);

odp-util.h
----------

I think this comment can now add SLOW_BFD:

    /* Mutually exclusive with SLOW_CFM, SLOW_LACP, SLOW_STP.
     * Could possibly appear with SLOW_IN_BAND. */


ofproto-dpif.c
--------------

Why do we now need to call time_refresh() in run_fast()?

vswitch.xml
-----------

Should we mention RFC 5881 also?

"three times the expected reception rate" is the way we configure BFD,
I guess, but it's not protocol-specified.

"aims comply" => "aims to comply":
            The Open vSwitch implementation of BFD aims comply faithfully with

Many of the column specifications could be more specific.  For
example:
      <column name="bfd" key="min_rx" 
              type='{"type": "integer", "minInteger": 1}'>

and 
      <column name="bfd_status" key="state">
              type='{"type": "string", 
                     "enum": ["set", ["UP", "DOWN", "INIT", "ADMIN_DOWN"]]}'>

We usually use quotes ``like this'' in vswitch.xml, here:
> +            omission is "Demand Mode", which we hope to include in future.
> +            Open vSwitch does not implement the optional Authentication or
and here:
> +            "Echo Mode" features.
> +        </p>


> +      <column name="bfd" key="min_rx">

I might say "fastest" instead of "minimum" here:

> +          The minimum rate, in milliseconds, at which this BFD session is

may "be" slower:

> +          willing to receive BFD control messages.  The actual rate may 
> slower
> +          if the remote endpoint isn't willing to transmit as quickly as
> +          specified.  Defaults to <code>1000</code>.
> +      </column>
> +
> +      <column name="bfd" key="min_tx">

"fastest" here?
> +          The minimum rate, in milliseconds, at which this BFD session is
> +          willing to transmit BFD control messages.  The actual rate may be
> +          slower if the remote endpoint isn't willing to receive as quickly 
> as
> +          specified.  Defaults to <code>100</code>.
> +      </column>
> +
> +      <column name="bfd" key="cpath_down">

The following description doesn't actually tell the user how to use
cpath_down.  Rephrase slightly?

> +          Concatenated path down may be used when the local system should not
> +          have traffic forwarded to it for some reason other than a 
> connectivty
> +          failure on the interface being monitored.  The local BFD session 
> will
> +          notify the remote session of the connectivity problem, at which 
> time
> +          the remote session may choose not to forward traffic.  Defaults to
> +          <code>false</code>.
> +      </column>
> +
> +      <column name="bfd_status" key="state">

Any reason to make the state values all-uppercase?  We tend to use
lowercase elsewhere in the db.

> +          State of the BFD session.  One of <code>ADMIN_DOWN</code>,
> +          <code>DOWN</code>, <code>INIT</code>, or <code>UP</code>.
> +      </column>
> +
> +      <column name="bfd_status" key="forwarding">
> +          True if the BFD session believes this <ref table="Interface"/> may 
> be
> +          used to forward traffic.  Typically this means the local session is

"signaling":

> +          up, and the remote system isn't signalling a problem such as
> +          concatenated path down.
> +      </column>

Thanks,

Ben.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Reply via email to