Re: [ovs-dev] [PATCH ovn v3 1/5] lex: New lexical analyzer module for use in OVN.

Ben Pfaff Wed, 15 Apr 2015 16:11:41 -0700

On Wed, Apr 08, 2015 at 05:17:33PM -0400, Russell Bryant wrote:
> On 04/01/2015 12:52 AM, Ben Pfaff wrote:
> > I'm determined not to let the terrible style of pseudo-parsing we have in
> > OVS leak into OVN.  Here's the first step.
> > 
> > Signed-off-by: Ben Pfaff <b...@nicira.com>
> 
> Did you consider flex and bison for this patch series?  It's mostly a
> weakly informed question out of curiosity.  I don't have much experience
> with them, but it seems the lexer could have been done with flex, for
> example.


I did think about that.  I've done things both ways in the past, and
even written my own parser-generators for that matter.  My experience is
that machine-generated lexers and parsers are most useful when the
lexing and parsing is somewhat tricky.  In this case, it's not very
difficult, so I prefer to avoid adding a dependency.

> I left several style related comments inline, but it's all optional
> stuff that I noted while reading through.  I didn't find anything
> obviously wrong.  Nice work on the tests.  That certainly helps the
> confidence.  :-)
> 
> Acked-by: Russell Bryant <rbry...@redhat.com>

Thanks for the review.

> > +lib_LTLIBRARIES += lib/libovn.la
> > +lib_libovn_la_SOURCES = ovn/lex.c ovn/lex.h
> > +
> 
> This can probably just be done later, but I like the idea mentioned in
> another thread about starting to create some subdirs under ovn/.  In
> this case, perhaps we can have the sources go in ovn/lib/?

Given that we already have ovn/controller, I guess that's a good idea.
Done.

> > +/* Initializes 'token'. */
> > +void
> > +lex_token_init(struct lex_token *token)
> > +{
> > +    token->type = LEX_T_END;
> > +    token->s = NULL;
> > +}
> 
> Perhaps this could be updated to initialize all of lex_token?  I can't
> find any case where it's actually a problem, but that's the behavior I'd
> expect without looking.

I'd like to avoid that simply because I expect "union mf_subvalue" to
get quite large when Geneve options are fully supported.

> > +static size_t
> > +lex_token_n_zeros(enum lex_format format)
> > +{
> > +    switch (format) {
> > +    case LEX_F_DECIMAL:     return offsetof(union mf_subvalue, integer);
> > +    case LEX_F_HEXADECIMAL: return 0;
> > +    case LEX_F_IPV4:        return offsetof(union mf_subvalue, ipv4);
> > +    case LEX_F_IPV6:        return offsetof(union mf_subvalue, ipv6);
> > +    case LEX_F_ETHERNET:    return offsetof(union mf_subvalue, mac);
> > +    default: OVS_NOT_REACHED();
> > +    }
> 
> IIRC, if you remove default here, you should get a compiler warning if
> something gets added to the enum and not added to this switch statement.
>  That way we'd catch it at compile time instead of runtime.

Oh, we do anyhow due to our GCC options, please try it out.

> > +static void
> > +lex_token_format_string(const char *s, struct ds *ds)
> > +{
> > +    struct json json;
> > +    json.type = JSON_STRING;
> > +    json.u.string = CONST_CAST(char *, s);
> 
> Minor style thing ... I'd be tempted to write this as:
> 
>     struct json json = {
>         .type = JSON_STRING,
>         .u.string = CONST_CAST(char *, s),
>     };
> 
> It doesn't make any difference now, but would mean the rest of the json
> struct would automatically be 0 if any more struct members were added later.

Changed, thanks.

> > +    case LEX_T_STRING:
> > +        lex_token_format_string(token->s, s);
> > +        break;
> > +
> > +        break;
> 
> There's an extra break here.

Oops, fixed, thanks.

> > +
> > +}
> > +
> > +/* lex_token_parse(). */
> > +
> > +static void OVS_PRINTF_FORMAT(2, 3)
> > +lex_error(struct lex_token *token, const char *message, ...)
> > +{
> > +    token->type = LEX_T_ERROR;
> > +
> > +    va_list args;
> > +    va_start(args, message);
> > +    token->s = xvasprintf(message, args);
> > +    va_end(args);
> 
> I wonder if it's worth any error checking here in case lex_error() was
> called on the same token twice.  If it happened, there'd be a memory
> leak of token->s.

OK, I added an assertion:
    ovs_assert(!token->s);

> > +static const char *
> > +lex_parse_integer(const char *p, struct lex_token *token)
> > +{
> > +    memset(&token->value, 0, sizeof token->value);
> > +    p = lex_parse_integer__(p, token);
> > +    if (token->type == LEX_T_INTEGER && *p == '/') {
> 
> style nit ... I personally like to try to cut down on nesting levels
> where I can.  In this case, almost the whole function is inside this if
> block.  The condition could be reversed and the rest of the code brought
> in a level.

OK, I fussed with this a bit, here's an incremental diff with white
space only changes omitted:

diff --git a/ovn/lib/lex.c b/ovn/lib/lex.c
index fc88afe..99877af 100644
--- a/ovn/lib/lex.c
+++ b/ovn/lib/lex.c
@@ -241,6 +241,7 @@ lex_token_format(struct lex_token *token, struct ds *s)
 static void OVS_PRINTF_FORMAT(2, 3)
 lex_error(struct lex_token *token, const char *message, ...)
 {
+    ovs_assert(!token->s);
     token->type = LEX_T_ERROR;
 
     va_list args;
@@ -341,19 +342,24 @@ lex_parse_integer__(const char *p, struct lex_token 
*token)
 }
 
 static const char *
-lex_parse_integer(const char *p, struct lex_token *token)
+lex_parse_mask(const char *p, struct lex_token *token)
 {
-    memset(&token->value, 0, sizeof token->value);
-    p = lex_parse_integer__(p, token);
-    if (token->type == LEX_T_INTEGER && *p == '/') {
     struct lex_token mask;
 
+    /* Parse just past the '/' as a second integer.  Handle errors. */
     lex_token_init(&mask);
     memset(&mask.value, 0, sizeof mask.value);
     p = lex_parse_integer__(p + 1, &mask);
-        if (mask.type == LEX_T_INTEGER) {
-            token->type = LEX_T_MASKED_INTEGER;
+    if (mask.type == LEX_T_ERROR) {
+        lex_token_swap(&mask, token);
+        lex_token_destroy(&mask);
+        return p;
+    }
+    ovs_assert(mask.type == LEX_T_INTEGER);
 
+    /* Now convert the value and mask into a masked integer token.
+     * We have a few special cases. */
+    token->type = LEX_T_MASKED_INTEGER;
     uint32_t prefix_bits = ntohll(mask.value.integer);
     if (token->format == mask.format) {
         /* Same format value and mask is always OK. */
@@ -362,8 +368,7 @@ lex_parse_integer(const char *p, struct lex_token *token)
                && mask.format == LEX_F_DECIMAL
                && prefix_bits <= 32) {
         /* IPv4 address with decimal mask is a CIDR prefix. */
-                token->mask.integer = htonll(ntohl(be32_prefix_mask(
-                                                       prefix_bits)));
+        token->mask.integer = htonll(ntohl(be32_prefix_mask(prefix_bits)));
     } else if (token->format == LEX_F_IPV6
                && mask.format == LEX_F_DECIMAL
                && prefix_bits <= 128) {
@@ -380,6 +385,8 @@ lex_parse_integer(const char *p, struct lex_token *token)
         return p;
     }
 
+    /* Check invariant that a 1-bit in the value corresponds to a 1-bit in the
+     * mask. */
     for (int i = 0; i < ARRAY_SIZE(token->mask.be32); i++) {
         ovs_be32 v = token->value.be32[i];
         ovs_be32 m = token->mask.be32[i];
@@ -390,15 +397,18 @@ lex_parse_integer(const char *p, struct lex_token *token)
         }
     }
 
+    /* Done! */
+    lex_token_destroy(&mask);
     return p;
-        } else {
-            /* 'mask' isn't an integer so it must be an error message.  Swap
-             * the integer for the error message so that we return the error
-             * message (and destroy the pre-mask token just below). */
-            ovs_assert(mask.type == LEX_T_ERROR);
-            lex_token_swap(&mask, token);
 }
-        lex_token_destroy(&mask);
+
+static const char *
+lex_parse_integer(const char *p, struct lex_token *token)
+{
+    memset(&token->value, 0, sizeof token->value);
+    p = lex_parse_integer__(p, token);
+    if (token->type == LEX_T_INTEGER && *p == '/') {
+        p = lex_parse_mask(p, token);
     }
     return p;
 }

> > +/* Initializes 'lexer' for parsing 'input'.
> > + *
> > + * While the lexer is in use, 'input' must remain available, but the caller
> > + * otherwise retains ownership of 'input'.
> > + *
> > + * The caller must call lexer_get() to obtain the first token. */
> > +void
> > +lexer_init(struct lexer *lexer, const char *input)
> > +{
> > +    lexer->input = input;
> > +    lexer->start = NULL;
> > +    memset(&lexer->token, 0, sizeof lexer->token);
> 
> How about lex_token_init() instead?

Fixed, thanks.

I had to fix the test case too:

diff --git a/tests/test-ovn.c b/tests/test-ovn.c
index db97224..b2f1838 100644
--- a/tests/test-ovn.c
+++ b/tests/test-ovn.c
@@ -52,13 +52,14 @@ compare_token(const struct lex_token *a, const struct 
lex_token *b)
             fprintf(stderr, "mask differs\n");
             return;
         }
-    }
 
-    if (a->format != b->format
-        && !(a->format == LEX_F_HEXADECIMAL
-             && b->format == LEX_F_DECIMAL
-             && a->value.integer == 0)) {
-        fprintf(stderr, "format differs: %d -> %d\n", a->format, b->format);
+        if (a->format != b->format
+            && !(a->format == LEX_F_HEXADECIMAL
+                 && b->format == LEX_F_DECIMAL
+                 && a->value.integer == 0)) {
+            fprintf(stderr, "format differs: %d -> %d\n",
+                    a->format, b->format);
+        }
     }
 }
 
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

Re: [ovs-dev] [PATCH ovn v3 1/5] lex: New lexical analyzer module for use in OVN.

Reply via email to