On Wed, Apr 08, 2015 at 05:17:33PM -0400, Russell Bryant wrote: > On 04/01/2015 12:52 AM, Ben Pfaff wrote: > > I'm determined not to let the terrible style of pseudo-parsing we have in > > OVS leak into OVN. Here's the first step. > > > > Signed-off-by: Ben Pfaff <b...@nicira.com> > > Did you consider flex and bison for this patch series? It's mostly a > weakly informed question out of curiosity. I don't have much experience > with them, but it seems the lexer could have been done with flex, for > example.
I did think about that. I've done things both ways in the past, and even written my own parser-generators for that matter. My experience is that machine-generated lexers and parsers are most useful when the lexing and parsing is somewhat tricky. In this case, it's not very difficult, so I prefer to avoid adding a dependency. > I left several style related comments inline, but it's all optional > stuff that I noted while reading through. I didn't find anything > obviously wrong. Nice work on the tests. That certainly helps the > confidence. :-) > > Acked-by: Russell Bryant <rbry...@redhat.com> Thanks for the review. > > +lib_LTLIBRARIES += lib/libovn.la > > +lib_libovn_la_SOURCES = ovn/lex.c ovn/lex.h > > + > > This can probably just be done later, but I like the idea mentioned in > another thread about starting to create some subdirs under ovn/. In > this case, perhaps we can have the sources go in ovn/lib/? Given that we already have ovn/controller, I guess that's a good idea. Done. > > +/* Initializes 'token'. */ > > +void > > +lex_token_init(struct lex_token *token) > > +{ > > + token->type = LEX_T_END; > > + token->s = NULL; > > +} > > Perhaps this could be updated to initialize all of lex_token? I can't > find any case where it's actually a problem, but that's the behavior I'd > expect without looking. I'd like to avoid that simply because I expect "union mf_subvalue" to get quite large when Geneve options are fully supported. > > +static size_t > > +lex_token_n_zeros(enum lex_format format) > > +{ > > + switch (format) { > > + case LEX_F_DECIMAL: return offsetof(union mf_subvalue, integer); > > + case LEX_F_HEXADECIMAL: return 0; > > + case LEX_F_IPV4: return offsetof(union mf_subvalue, ipv4); > > + case LEX_F_IPV6: return offsetof(union mf_subvalue, ipv6); > > + case LEX_F_ETHERNET: return offsetof(union mf_subvalue, mac); > > + default: OVS_NOT_REACHED(); > > + } > > IIRC, if you remove default here, you should get a compiler warning if > something gets added to the enum and not added to this switch statement. > That way we'd catch it at compile time instead of runtime. Oh, we do anyhow due to our GCC options, please try it out. > > +static void > > +lex_token_format_string(const char *s, struct ds *ds) > > +{ > > + struct json json; > > + json.type = JSON_STRING; > > + json.u.string = CONST_CAST(char *, s); > > Minor style thing ... I'd be tempted to write this as: > > struct json json = { > .type = JSON_STRING, > .u.string = CONST_CAST(char *, s), > }; > > It doesn't make any difference now, but would mean the rest of the json > struct would automatically be 0 if any more struct members were added later. Changed, thanks. > > + case LEX_T_STRING: > > + lex_token_format_string(token->s, s); > > + break; > > + > > + break; > > There's an extra break here. Oops, fixed, thanks. > > + > > +} > > + > > +/* lex_token_parse(). */ > > + > > +static void OVS_PRINTF_FORMAT(2, 3) > > +lex_error(struct lex_token *token, const char *message, ...) > > +{ > > + token->type = LEX_T_ERROR; > > + > > + va_list args; > > + va_start(args, message); > > + token->s = xvasprintf(message, args); > > + va_end(args); > > I wonder if it's worth any error checking here in case lex_error() was > called on the same token twice. If it happened, there'd be a memory > leak of token->s. OK, I added an assertion: ovs_assert(!token->s); > > +static const char * > > +lex_parse_integer(const char *p, struct lex_token *token) > > +{ > > + memset(&token->value, 0, sizeof token->value); > > + p = lex_parse_integer__(p, token); > > + if (token->type == LEX_T_INTEGER && *p == '/') { > > style nit ... I personally like to try to cut down on nesting levels > where I can. In this case, almost the whole function is inside this if > block. The condition could be reversed and the rest of the code brought > in a level. OK, I fussed with this a bit, here's an incremental diff with white space only changes omitted: diff --git a/ovn/lib/lex.c b/ovn/lib/lex.c index fc88afe..99877af 100644 --- a/ovn/lib/lex.c +++ b/ovn/lib/lex.c @@ -241,6 +241,7 @@ lex_token_format(struct lex_token *token, struct ds *s) static void OVS_PRINTF_FORMAT(2, 3) lex_error(struct lex_token *token, const char *message, ...) { + ovs_assert(!token->s); token->type = LEX_T_ERROR; va_list args; @@ -341,19 +342,24 @@ lex_parse_integer__(const char *p, struct lex_token *token) } static const char * -lex_parse_integer(const char *p, struct lex_token *token) +lex_parse_mask(const char *p, struct lex_token *token) { - memset(&token->value, 0, sizeof token->value); - p = lex_parse_integer__(p, token); - if (token->type == LEX_T_INTEGER && *p == '/') { struct lex_token mask; + /* Parse just past the '/' as a second integer. Handle errors. */ lex_token_init(&mask); memset(&mask.value, 0, sizeof mask.value); p = lex_parse_integer__(p + 1, &mask); - if (mask.type == LEX_T_INTEGER) { - token->type = LEX_T_MASKED_INTEGER; + if (mask.type == LEX_T_ERROR) { + lex_token_swap(&mask, token); + lex_token_destroy(&mask); + return p; + } + ovs_assert(mask.type == LEX_T_INTEGER); + /* Now convert the value and mask into a masked integer token. + * We have a few special cases. */ + token->type = LEX_T_MASKED_INTEGER; uint32_t prefix_bits = ntohll(mask.value.integer); if (token->format == mask.format) { /* Same format value and mask is always OK. */ @@ -362,8 +368,7 @@ lex_parse_integer(const char *p, struct lex_token *token) && mask.format == LEX_F_DECIMAL && prefix_bits <= 32) { /* IPv4 address with decimal mask is a CIDR prefix. */ - token->mask.integer = htonll(ntohl(be32_prefix_mask( - prefix_bits))); + token->mask.integer = htonll(ntohl(be32_prefix_mask(prefix_bits))); } else if (token->format == LEX_F_IPV6 && mask.format == LEX_F_DECIMAL && prefix_bits <= 128) { @@ -380,6 +385,8 @@ lex_parse_integer(const char *p, struct lex_token *token) return p; } + /* Check invariant that a 1-bit in the value corresponds to a 1-bit in the + * mask. */ for (int i = 0; i < ARRAY_SIZE(token->mask.be32); i++) { ovs_be32 v = token->value.be32[i]; ovs_be32 m = token->mask.be32[i]; @@ -390,15 +397,18 @@ lex_parse_integer(const char *p, struct lex_token *token) } } + /* Done! */ + lex_token_destroy(&mask); return p; - } else { - /* 'mask' isn't an integer so it must be an error message. Swap - * the integer for the error message so that we return the error - * message (and destroy the pre-mask token just below). */ - ovs_assert(mask.type == LEX_T_ERROR); - lex_token_swap(&mask, token); } - lex_token_destroy(&mask); + +static const char * +lex_parse_integer(const char *p, struct lex_token *token) +{ + memset(&token->value, 0, sizeof token->value); + p = lex_parse_integer__(p, token); + if (token->type == LEX_T_INTEGER && *p == '/') { + p = lex_parse_mask(p, token); } return p; } > > +/* Initializes 'lexer' for parsing 'input'. > > + * > > + * While the lexer is in use, 'input' must remain available, but the caller > > + * otherwise retains ownership of 'input'. > > + * > > + * The caller must call lexer_get() to obtain the first token. */ > > +void > > +lexer_init(struct lexer *lexer, const char *input) > > +{ > > + lexer->input = input; > > + lexer->start = NULL; > > + memset(&lexer->token, 0, sizeof lexer->token); > > How about lex_token_init() instead? Fixed, thanks. I had to fix the test case too: diff --git a/tests/test-ovn.c b/tests/test-ovn.c index db97224..b2f1838 100644 --- a/tests/test-ovn.c +++ b/tests/test-ovn.c @@ -52,13 +52,14 @@ compare_token(const struct lex_token *a, const struct lex_token *b) fprintf(stderr, "mask differs\n"); return; } - } - if (a->format != b->format - && !(a->format == LEX_F_HEXADECIMAL - && b->format == LEX_F_DECIMAL - && a->value.integer == 0)) { - fprintf(stderr, "format differs: %d -> %d\n", a->format, b->format); + if (a->format != b->format + && !(a->format == LEX_F_HEXADECIMAL + && b->format == LEX_F_DECIMAL + && a->value.integer == 0)) { + fprintf(stderr, "format differs: %d -> %d\n", + a->format, b->format); + } } } _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev