Re: [ovs-dev] [PATCH ovn v3 0/5] Implement matching expressions for OVN.

Ben Pfaff Wed, 15 Apr 2015 14:01:10 -0700

On Thu, Apr 09, 2015 at 02:55:45PM -0400, Russell Bryant wrote:
> On 04/01/2015 12:52 AM, Ben Pfaff wrote:
> Nice work!  This is really impressive stuff.  Thanks a bunch for the
> extensive testing.  I'm relying on that pretty heavily for correctness
> of most of the logic.  I was mostly reading through and leaving style
> nits and looking for other subtle things.  Feel free to incorporate (or
> not) whichever pieces you'd like.
> Acked-by: Russell Bryant <rbry...@redhat.com>


Thank you for the review!

> > +/* Returns the name of measurement level 'level'. */
> > +const char *
> > +expr_level_to_string(enum expr_level level)
> 
> As a general comment, I probably would have used an 'ovn_' prefix for
> all the new types and functions exposed by libovn.  If I were you, I'd
> probably pass on making this change at this point because it would be a
> pain.  :-)

You might be right.  I'll think about it.  I did do this for OVSDB.

It's not really harder to do this after commit, so I'm going to postpone
it for now.

> > +{
> > +    switch (level) {
> > +    case EXPR_L_NOMINAL: return "nominal";
> > +    case EXPR_L_BOOLEAN: return "Boolean";
> > +    case EXPR_L_ORDINAL: return "ordinal";
> > +    default: OVS_NOT_REACHED();
> 
> This is a case where I'd drop default so you get the compiler warning if
> a new entry is added to the enum but not to this switch statement.

The compiler options we use ensure that this happens regardless.  Try
deleting one of the cases above.  You get Clang and GCC diagnostics:

    ../ovn/expr.c:33:13: error: enumeration value 'EXPR_L_ORDINAL' not 
explicitly
          handled in switch [-Werror,-Wswitch-enum]

    ../ovn/expr.c: In function 'expr_level_to_string':
    ../ovn/expr.c:33:5: error: enumeration value 'EXPR_L_ORDINAL' not handled 
in switch [-Werror=switch-enum]

> I won't repeat this same comment for any other cases in this patch, but
> it generally applies to any of the default: OVS_NOT_REACHED() cases.

We can't really delete the OVS_NOT_REACHED() cases because then GCC
gives a different diagnostic:

    ../ovn/expr.c: In function 'expr_level_to_string':
    ../ovn/expr.c:38:1: error: control reaches end of non-void function 
[-Werror=return-type]

> > +    }
> > +}
> > +
> > +bool
> > +expr_relop_from_token(enum lex_type type, enum expr_relop *relop)
> > +{
> > +    enum expr_relop r;
> > +
> > +    switch ((int) type) {
> 
> I'm curious why the cast was needed here.

Oh, that suppresses warnings related to the point above, which aren't
really helpful here.  Try removing the cast:

    ../ovn/expr.c:63:13: error: 19 enumeration values not explicitly handled in
          switch: 'LEX_T_END', 'LEX_T_ID', 'LEX_T_STRING'... 
[-Werror,-Wswitch-enum]

or

    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_END' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_ID' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_STRING' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_INTEGER' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_MASKED_INTEGER' not 
handled in switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_ERROR' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_LPAREN' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_RPAREN' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_LCURLY' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_RCURLY' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_LSQUARE' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_RSQUARE' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_LOG_NOT' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_LOG_AND' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_LOG_OR' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_ELLIPSIS' not handled 
in switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_COMMA' not handled in 
switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_SEMICOLON' not handled 
in switch [-Werror=switch-enum]
    ../ovn/expr.c:63:5: error: enumeration value 'LEX_T_EQUALS' not handled in 
switch [-Werror=switch-enum]

It didn't seem worth listing cases for all the non-relational operator
tokens here.

> > +/* Constructing and manipulating expressions. */
> > +
> > +/* Creates and returns a logical AND or OR expression (according to 'type',
> > + * which must be EXPR_T_AND or EXPR_T_OR) that initially has no
> > + * sub-expressions.  (To satisfy the invariants for expressions, the caller
> > + * must add at least two sub-expressions whose types are different from
> > + * 'type'.) */
> > +struct expr *
> > +expr_create_andor(enum expr_type type)
> > +{
> > +    struct expr *e = xmalloc(sizeof *e);
> > +    e->type = type;
> > +    list_init(&e->andor);
> > +    return e;
> 
> This leaves e->node uninitialized.  Maybe use xzalloc() instead?

I see limited value in initializing the node member, since it isn't
meaningful outside of the context of some parent node.

> > +/* Returns an EXPR_T_BOOLEAN expression with value 'b'. */
> > +struct expr *
> > +expr_create_boolean(bool b)
> > +{
> > +    struct expr *e = xmalloc(sizeof *e);
> 
> Maybe use xzalloc() here to avoid e->node being uninitialized?

Same here.

> > +    e->type = EXPR_T_BOOLEAN;
> > +    e->boolean = b;
> > +    return e;
> > +}
> 
> > +static void
> > +expr_format_string(const char *s, struct ds *ds)
> > +{
> > +    struct json json;
> > +    json.type = JSON_STRING;
> > +    json.u.string = CONST_CAST(char *, s);
> 
> This could be:
> 
>     struct json json = {
>         .type = JSON_STRING,
>         .u.string = CONST_CAST(char *, s),
>     };
> 
> It doesn't make a difference now, but would ensure the whole struct is
> initialized if more was ever added to struct json.

Sure, good idea, I made that change.

I made the same change to lex_token_format_string(), which is a
cut-and-paste of the same code.  (I have another patch, which I'm not
sure I've posted yet, that unifies 3 copies of this same snippet in the
tree.)

> > +    json_to_ds(&json, 0, ds);
> > +}
> > +
> > +static void
> > +expr_format_cmp(const struct expr *e, struct ds *s)
> > +{
> > +    if (e->cmp.symbol->width) {
> 
> really minor style nit ... I'd be tempted to change this to:
> 
>     if (!e->cmp.symbol->width) {
>         ds_put_format(s, "%s %s ", e->cmp.symbol->name,
>                       expr_relop_to_string(e->cmp.relop));
>         expr_format_string(e->cmp.string, s);
>         return;
>     }
> 
> and then bring the majority of the function back to not being nested.

I don't care much either way, so I made this change.

> > +        int ofs, n;
> > +
> > +        find_bitwise_range(&e->cmp.mask, e->cmp.symbol->width, &ofs, &n);
> > +        if (n == 1
> > +            && (e->cmp.relop == EXPR_R_EQ || e->cmp.relop == EXPR_R_NE)) {
> > +            bool positive;
> > +
> > +            positive = bitwise_get_bit(&e->cmp.value, sizeof e->cmp.value,
> > +                                       ofs);
> > +            positive ^= e->cmp.relop == EXPR_R_NE;
> > +            if (!positive) {
> > +                ds_put_char(s, '!');
> > +            }
> > +            ds_put_cstr(s, e->cmp.symbol->name);
> > +            if (e->cmp.symbol->width > 1) {
> > +                ds_put_format(s, "[%d]", ofs);
> > +            }
> > +            return;
> > +        }
> > +
> > +        ds_put_cstr(s, e->cmp.symbol->name);
> > +        if (n > 0 && n < e->cmp.symbol->width) {
> > +            if (n > 1) {
> > +                ds_put_format(s, "[%d..%d]", ofs, ofs + n - 1);
> > +            } else {
> > +                ds_put_format(s, "[%d]", ofs);
> > +            }
> > +        }
> > +
> > +        ds_put_format(s, " %s ", expr_relop_to_string(e->cmp.relop));
> > +
> > +        if (n) {
> > +            union mf_subvalue value;
> > +
> > +            memset(&value, 0, sizeof value);
> 
> I prefer something like the following:
> 
>     union mf_subvalue value = {
>         .integer = 0,
>     };
> 
> It results in the same thing as the memset(), but gives the opportunity
> for the memset to be optimized out if possible.  There are some other
> cases where the same change could be made if you like it.

I do tend to forget about the .member initialization syntax, because OVS
style didn't allow it until recently (now that recent Visual C supports
it), so I do underuse it.  But there are a couple of reasons why I don't
think it's the best choice here.

First, it reads to the programmer as if I only cared about the 'integer'
member, when really I want the whole thing zero-initialized.  It does
read to the compiler as "initialize all the members to zero" but in a
case like this, that always makes me go look and figure out whether
there is any non-member padding that needs to be zero-initialized (the
compiler isn't obligated to do that and I don't know whether Clang and
GCC and MSVC consistently do).

Second, the C standard has some pretty fussy "aliasing" requirements for
unions.  If you write one member, you're strictly supposed to only read
from that member.  GCC, in some modes, will screw up your code if you
don't pay attention to that rule, and we've seen it happen, see the
following commits:

commit 538919d3a672eab8a561048acf9a8e1721cd559c
Author: Ben Pfaff <b...@nicira.com>
Date:   Tue Sep 30 17:03:07 2014 -0700

    configure: Disable strict aliasing.
    
    The C standard allows compilers to do type-based alias analysis, which
    means that the compiler is allowed to assume that pointers to objects of
    different types are pointers to different objects.  For example, a compiler
    may assume that "uint16_t *a" and "uint32_t *b" point to different and
    nonoverlapping locations because the pointed-to types are different.  This
    can lead to surprising "optimizations" with compilers that by default do
    this kind of analysis, which includes GCC and Clang.
    
    The one escape clause that the C standard gives us is that character types
    must be assumed to alias any other object.  We've always tried to use this
    escape clause to avoid problems with type-based alias analysis in the past.
    I think that we should continue to try to do this in the future.  It's hard
    to tell what compiler we might want to use in the future, and one never
    knows what kind of control that compiler allows over alias analysis.
    
    However, recently I helped another developer debug a nasty and confusing
    issue, which turned out to be the result of a surprising compiler
    optimization due to alias analysis.  I've seen enough of these that I don't
    think it's worthwhile to risk more problems than we have to.  Thus, this
    commit turns off type-based alias analysis in GCC and Clang.
    
    Linus Torvalds thinks that type-base alias analysis is not sane, at least
    as GCC implements it: https://lkml.org/lkml/2003/2/26/158
    
    The GCC manual says that -Wstrict-aliasing is only effective without
    -fno-strict-aliasing, otherwise I'd keep -Wstrict-aliasing also.
    
    Indications are that MSVC doesn't do type-based alias analysis by default.
    
    Signed-off-by: Ben Pfaff <b...@nicira.com>
    Acked-by: Jarno Rajahalme <jrajaha...@nicira.com>

commit 02334943a07e2e9a40e350397192d7793b9962f4
Author: Jarno Rajahalme <jrajaha...@nicira.com>
Date:   Tue Aug 5 13:51:19 2014 -0700

    Fix strict aliasing violations with GCC 4.1 and 4.4.
    
    The typical use of struct sockaddr_storage is flagged as a strict
    aliasing violation by GCC 4.4.7.  Using an explicit union lets the
    compiler know that accessing the same location via different types is
    not an error.
    
    GCC 4.1.2 had a similar complaint about a cast of ukey's key_buf to
    nlattr.  After this patch there are no further warnings with the
    XenServer build, so we could start treating warnings as errors in the
    builds.
    
    Signed-off-by: Jarno Rajahalme <jrajaha...@nicira.com>
    Acked-by: Ben Pfaff <b...@nicira.com>

I see that the lkml.org link above is dead, but it's in the Wayback
Machine:
        
https://web.archive.org/web/20150130022255/https://lkml.org/lkml/2003/2/26/158

Since memset() is character-based, it's safer than initializing a
non-character-based union member.

Finally, I don't know why the compiler would be better or worse at
memset() versus initialization.  GCC understands what memset() does, so
I think it ought to be able to do just as good a job.  Also, old
versions of GCC (3.x?) had pretty bad reputations regarding complicated
initializers; I don't have citations, but I remember Dave Miller on
netdev telling people not to use them because they generated bad code.

So, while I like member-based initialization better than memset() in
general, I don't think that this is a good place to substitute it.

> > +static void OVS_PRINTF_FORMAT(2, 3)
> > +expr_error(struct expr_context *ctx, const char *message, ...)
> > +{
> > +    if (!ctx->error) {
> > +        if (ctx->lexer->token.type == LEX_T_ERROR) {
> > +            ctx->error = xstrdup(ctx->lexer->token.s);
> > +        } else {
> > +            va_list args;
> > +            va_start(args, message);
> > +            ctx->error = xvasprintf(message, args);
> > +            va_end(args);
> > +        }
> > +    }
> > +}
> 
> This will silently drop the error message if expr_error() is called
> twice.  It avoids a mem leak, which is great, but I wonder if a
> VLOG_WARN would be good here.
> 
> I suppose another alternative would be to make it stack error messages.
>  It could append something like: "(next error: %s)".

I figured that dropping the later error messages was a feature, since
the cascade seemed to me to be unlikely to be useful.

> > +static void OVS_PRINTF_FORMAT(2, 3)
> > +expr_syntax_error(struct expr_context *ctx, const char *message, ...)
> > +{
> > +    if (!ctx->error) {
> > +        if (ctx->lexer->token.type == LEX_T_ERROR) {
> > +            ctx->error = xstrdup(ctx->lexer->token.s);
> > +        } else {
> > +            struct ds s;
> > +
> > +            ds_init(&s);
> > +            ds_put_cstr(&s, "Syntax error ");
> > +            if (ctx->lexer->token.type == LEX_T_END) {
> > +                ds_put_cstr(&s, "at end of input ");
> > +            } else if (ctx->lexer->start) {
> > +                ds_put_format(&s, "at `%.*s' ",
> > +                              (int) (ctx->lexer->input - 
> > ctx->lexer->start),
> > +                              ctx->lexer->start);
> > +            }
> > +
> > +            va_list args;
> > +            va_start(args, message);
> > +            ds_put_format_valist(&s, message, args);
> > +            va_end(args);
> > +
> > +            ctx->error = ds_steal_cstr(&s);
> > +        }
> > +    }
> > +}
> 
> You could reduce nesting in the 2 functions above with:
> 
>     if (ctx->error) {
>         return;
>     }

I guess they have a little common code too.  Thanks, I folded this in
(with "diff -w" to ignore white space changes):

@@ -460,28 +460,44 @@ static void expr_not(struct expr *);
 static void expr_constant_set_destroy(struct expr_constant_set *);
 static bool parse_field(struct expr_context *, struct expr_field *);
 
-static void OVS_PRINTF_FORMAT(2, 3)
-expr_error(struct expr_context *ctx, const char *message, ...)
+static bool
+expr_error_handle_common(struct expr_context *ctx)
 {
-    if (!ctx->error) {
-        if (ctx->lexer->token.type == LEX_T_ERROR) {
+    if (ctx->error) {
+        /* Already have an error, suppress this one since the cascade seems
+         * unlikely to be useful. */
+        return true;
+    } else if (ctx->lexer->token.type == LEX_T_ERROR) {
+        /* The lexer signaled an error.  Nothing at the expression level
+         * accepts an error token, so we'll inevitably end up here with some
+         * meaningless parse error.  Report the lexical error instead. */
         ctx->error = xstrdup(ctx->lexer->token.s);
+        return true;
     } else {
+        return false;
+    }
+}
+
+static void OVS_PRINTF_FORMAT(2, 3)
+expr_error(struct expr_context *ctx, const char *message, ...)
+{
+    if (expr_error_handle_common(ctx)) {
+        return;
+    }
+
     va_list args;
     va_start(args, message);
     ctx->error = xvasprintf(message, args);
     va_end(args);
 }
-    }
-}
 
 static void OVS_PRINTF_FORMAT(2, 3)
 expr_syntax_error(struct expr_context *ctx, const char *message, ...)
 {
-    if (!ctx->error) {
-        if (ctx->lexer->token.type == LEX_T_ERROR) {
-            ctx->error = xstrdup(ctx->lexer->token.s);
-        } else {
+    if (expr_error_handle_common(ctx)) {
+        return;
+    }
+
     struct ds s;
 
     ds_init(&s);
@@ -501,8 +517,6 @@ expr_syntax_error(struct expr_context *ctx, const char 
*message, ...)
 
     ctx->error = ds_steal_cstr(&s);
 }
-    }
-}
 
 static struct expr *
 make_cmp__(const struct expr_field *f, enum expr_relop r,

> > +static bool
> > +parse_constant_set(struct expr_context *ctx, struct expr_constant_set *cs)
> > +{
> > +    size_t allocated_values = 0;
> > +    bool ok;
> > +
> > +    memset(cs, 0, sizeof *cs);
> 
> I went looking around the code seeing if there was any possibility of
> leaking anything here.  Perhaps add a comment above the function that
> clarifies the precondition that cs is uninitialized (or at least empty),
> as this function will initialize it.
> 
> An expr_constant_set_init() might be nice for more consistency, bit it
> may be overkill to do that for everything.

I added a comment:

/* Parses a single or {}-enclosed set of integer or string constants into 'cs',
 * which the caller need not have initialized.  Returns true on success, in
 * which case the caller owns 'cs', false on failure, in which case 'cs' is
 * indeterminate. */

> > +    if (list_is_empty(&expr->andor)) {
> > +        if (is_all_zeros(&mask, sizeof mask)) {
> > +            expr_destroy(expr);
> > +            return expr_create_boolean(true);
> > +        } else {
> > +            struct expr *cmp;
> > +            cmp = xmalloc(sizeof *cmp);
> > +            cmp->type = EXPR_T_CMP;
> > +            cmp->cmp.symbol = symbol;
> > +            cmp->cmp.relop = EXPR_R_EQ;
> > +            cmp->cmp.value = value;
> > +            cmp->cmp.mask = mask;
> > +            expr_destroy(expr);
> > +            return cmp;
> 
> This code leaves part of cmp unitialized (cmp->node).

I guess I just think of 'node' as owned by the parent of an expression,
not by the expression itself, so I don't bother to initialize it until
it's a subexpression of something else.

> > +static struct expr_match *
> > +expr_match_new(const struct match *m, uint8_t clause, uint8_t n_clauses,
> > +               uint32_t conj_id)
> > +{
> > +    struct expr_match *match = xmalloc(sizeof *match);
> 
> How about xzalloc() here?
> 
> if m is NULL, match->match is uninitialized.

I didn't want to initialize match->match if the caller doesn't know what
it is yet (because "struct match" is large--hundreds of bytes), so I
left that possibility.  However, both callers that do that first call
match_init_catchall(&match->match), so I guess we might as well do that
in expr_match_new().

I folded this in:

@@ -2048,6 +2066,17 @@ expr_normalize(struct expr *expr)
     OVS_NOT_REACHED();
 }
 
+/* Creates, initializes, and returns a new 'struct expr_match'.  If 'm' is
+ * nonnull then it is copied into the new expr_match, otherwise the new
+ * expr_match's 'match' member is initialized to a catch-all match for the
+ * caller to refine in-place.
+ *
+ * If 'conj_id' is nonzero, adds one conjunction based on 'conj_id', 'clause',
+ * and 'n_clauses' to the returned 'struct expr_match', otherwise the
+ * expr_match will not have any conjunctions.
+ *
+ * The caller should use expr_match_add() to add the expr_match to a hash table
+ * after it is finalized. */
 static struct expr_match *
 expr_match_new(const struct match *m, uint8_t clause, uint8_t n_clauses,
                uint32_t conj_id)
@@ -2055,6 +2084,8 @@ expr_match_new(const struct match *m, uint8_t clause, 
uint8_t n_clauses,
     struct expr_match *match = xmalloc(sizeof *match);
     if (m) {
         match->match = *m;
+    } else {
+        match_init_catchall(&m->match);
     }
     if (conj_id) {
         match->conjunctions = xmalloc(sizeof *match->conjunctions);
@@ -2071,6 +2102,11 @@ expr_match_new(const struct match *m, uint8_t clause, 
uint8_t n_clauses,
     return match;
 }
 
+/* Adds 'match' to hash table 'matches', which becomes the new owner of
+ * 'match'.
+ *
+ * This might actually destroy 'match' because it gets merged together with
+ * some existing conjunction.*/
 static void
 expr_match_add(struct hmap *matches, struct expr_match *match)
 {
@@ -2175,7 +2211,6 @@ static void
 add_cmp_flow(const struct expr *cmp, struct hmap *matches)
 {
     struct expr_match *m = expr_match_new(NULL, 0, 0, 0);
-    match_init_catchall(&m->match);
     constrain_match(cmp, &m->match);
     expr_match_add(matches, m);
 }
@@ -2221,7 +2256,6 @@ expr_to_matches(const struct expr *expr, struct hmap 
*matches)
     case EXPR_T_BOOLEAN:
         if (expr->boolean) {
             struct expr_match *m = expr_match_new(NULL, 0, 0, 0);
-            match_init_catchall(&m->match);
             expr_match_add(matches, m);
         } else {
             /* No match. */

> All cases leave match->hmap_node uninitialized.

I'm comfortable with that, it's owned by the parent hash table not by
the node itself.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH ovn v3 0/5] Implement matching expressions for OVN.

Reply via email to