Thanks for reviewing my patch! On Mon, Jul 4, 2011 at 7:10 AM, Bernd Helmle <maili...@oopsware.de> wrote: > +comment = 'data type for storing and manipulating JSON content' > > I'm not sure, if "manipulating" is a correct description. Maybe i missed it, > but i didn't see functions to manipulate JSON strings directly, for example, > adding an object to a JSON string, ...
Indeed. I intend to add manipulation functions in the future. The words "and manipulating" shouldn't be there yet. > -- Coding > ... > ... It is noticable that the parser seems to define > its own is_space() and is_digit() functions, e.g.: > > +#define is_space(c) ((c) == '\t' || (c) == '\n' || (c) == '\r' || (c) == ' > ') > > Wouldn't it be more clean to use isspace() instead? isspace() accepts '\f', '\v', and sometimes other characters as well, depending on locale. Likewise, isdigit() accepts some non-ASCII characters in addition to [0-9], depending on the locale and platform. Thus, to avoid parsing a superset of the JSON spec, I can't use the <ctype.h> functions. I should add a comment with this explanation. > This code block in function json_escape_unicode() looks suspicious: > > + /* Convert to UTF-8, if necessary. */ > + { > + const char *orig = json; > + json = (const char *) > + pg_do_encoding_conversion((unsigned char *) json, length, > + GetDatabaseEncoding(), PG_UTF8); > + if (json != orig) > + length = strlen(json); > + } > > Seems it *always* wants to convert the string. Isn't there a "if" condition > missing? pg_do_encoding_conversion does this check already. Perhaps I should rephrase the comment slightly: + /* Convert to UTF-8 (only if necessary). */ > There's a commented code fragment missing, which should be removed (see last > two lines of the snippet below): > > +static unsigned int > +read_hex16(const char *in) > ... > + Assert(is_hex_digit(c)); > + > + if (c >= '0' && c <= '9') > + tmp = c - '0'; > + else if (c >= 'A' && c <= 'F') > + tmp = c - 'A' + 10; > + else /* if (c >= 'a' && c <= 'f') */ > + tmp = c - 'a' + 10; That comment is there for documentation purposes, to indicate the range check that's skipped because we know c is a hex digit, and [a-f] is the only thing it could be (and must be) at that line. Should it still be removed? > json.c introduces new appendStringInfo* functions, e.g. > appendStringInfoEscape() and appendStringInfoUtf8(). Maybe it's better > to name them to something different, since it may occur someday that the > backend > will introduce the same notion with a different meaning. They are static, > but why not naming them into something like jsonAppendStringInfoUtf8() or > similar? I agree. > -- Regression Tests ... > It also tests UNICODE sequences and input encoding with other server > encoding than UTF-8. It tests with other *client* encodings than UTF-8, but not other server encodings than UTF-8. Is it possible to write tests for different server encodings? -- Self-review There's a minor bug in the normalization code: > select $$ "\u0009" $$::json; json ---------- "\u0009" (1 row) This should produce "\t" instead. I'm thinking that my expect_*/next_*pop_char/... parsing framework is overkill, and that, for a syntax as simple as JSON's, visibly messy parsing code like this: if (s < e && (*s == 'E' || *s == 'e')) { s++; if (s < e && (*s == '+' || *s == '-')) s++; if (!(s < e && is_digit(*s))) return false; do s++; while (s < e && is_digit(*s)); } would be easier to maintain than the deceptively elegant-looking code currently in place: if (optional_char_cond(s, e, *s == 'E' || *s == 'e')) { optional_char_cond(s, e, *s == '+' || *s == '-'); skip1_pred(s, e, is_digit); } I don't plan on gutting the current macro-driven parser just yet. When I do, the new parser will support building a parse tree (only when needed), and the parsing functions will have signatures like this: static bool parse_value(const char **sp, const char *e, JsonNode **out); static bool parse_string(const char **sp, const char *e, StringInfo *out); ... The current patch doesn't provide manipulation features where a parse tree would come in handy. I'd rather hold off on rewriting the parser until such features are added. I'll try to submit a revised patch within the next couple days. Thanks! - Joey -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers