Hi David, "Thompson, David" <dthomps...@worcester.edu> writes:
> Hi Maxim, > > Thanks for the review! My pleasure! It's a small thing I can do to try to help nudge builtin JSON support in Guile, which I agree has its place in 2025. In general, I'd like Guile to compete a bit more with Python in the out-of-the-box library capabilities. Especially since outside of Guix installing Guile libraries is probably not that fun (I'd suspect many Guile libraries to be missing or outdated on various distributions), so a more useful core has even more value there. [...] >> Is there particular reason for using vectors instead of plain list to >> represent JSON arrays? The later would be more idiomatic unless there >> are technical reasons (perhaps performance?). > > Back in 2015 I chose lists out of convenience because lists are such a > fundamental data structure in Scheme. It is very tempting! However, > there are a couple of issues: 1) O(n) access time for lists is not > great when there's a perfectly suitable O(1) data structure with read > syntax sitting right there and 2) It creates ambiguity when lists are > also used to represent objects. My previous patch used the symbol '@ > as a sentinel to mark objects, but it was kinda gross and I grew to > dislike it more and more over the years. SRFI-180 uses vectors, > Racket's JSON library uses vectors, etc. and I think they made the > right call. I see. It makes sense, thanks for explaining. [...] >> Hm, I wonder what (list port) looks like in the irritants when the >> exception is reported; is it useful? Shouldn't it show instead the >> problematic value? > > By including the port in the irritants it allows the exception handler > to use 'port-line' and 'port-column' to show where in the JSON > document the error occurred (for ports with such information like file > ports). OK! >> > + (define (consume-whitespace) >> > + (case (peek-char port) >> > + ((#\space #\tab #\return #\newline) >> >> Should a match + ((? char-whitespace?)) predicate pattern be used here >> instead, or similar? Or perhaps the above is faster and more >> self-contained, which can be a good thing. > > ECMA-404 states that these 4 characters are the only acceptable > whitespace characters: > > "Whitespace is any sequence of one or more of the following code > points: character tabulation (U+0009), line feed (U+000A), carriage > return (U+000D), and space (U+0020)." Makes sense. [...] >> > + (define (read-fraction) >> > + (case (peek-char port) >> > + ((#\.) >> > + (read-char port) >> > + (let lp ((mag 10)) >> > + (let ((n (read-digit-maybe))) >> > + (if n (+ (/ n mag) (lp (* mag 10))) 0)))) >> > + (else 0))) >> >> Should the above be named 'read-decimal' ? Does a decimal number in >> JSON always start with '.' and not with 0. ? I was a bit puzzled on >> what 'mag' may mean here, I guess 'magnitude' although there doesn't >> appear to have a clear terminology for it. > > It's called 'read-fraction' because it reads the fractional part of > the number. I considered 'read-decimal' but I think it's less > descriptive. This procedure is called from two places, after the > respective caller has already read the digits before the decimal > point. 'mag' does indeed mean 'magnitude', as each digit read > increases the order of magnitude of the resulting number. I borrowed > this name from Hoot's 'read' implementation, which means it's probably > in Guile's pure-Scheme 'read' implementation as well. OK, thanks for explaining. I realized later in my review that these are very dependent on when they are called (they are very specific or narrow, more than their name might suggest). It's fine. [...] > The following clause checks for '.'. We need to distinguish between 3 > types of input here: > > - invalid extraneous zeroes like '09' (ECMA-404 says this is not allowed) > - fractional notation like '0.123' > - plain '0' > > If I leave out this clause, we can no longer distinguish errors from > plain zeroes. There's likely other ways to express it but this feels > straightforward enough. > > However, it was a great idea to take another look at this code because > there is a bug! '0e3' is a valid JSON number that this code rejects. > It should parse to 0. My updated patch fixes this and adds a test > case. Good catch! [...] >> > + (define (write-object obj) >> > + (put-char port #\{) >> > + (match obj >> > + ((head . rest) >> > + (write-pair head) >> > + (let lp ((obj rest)) >> > + (match obj >> > + (() (values)) >> >> Any reason to return (values) instead of some dummy #t to denote 'no-op' >> ?. > > The loop is evaluated for effect and thus there is nothing to return > so returning 0 values makes sense. This is something I picked up from > Andy Wingo. Interesting. >> > + ((head . rest) >> > + (put-char port #\,) >> > + (write-pair head) >> > + (lp rest)) >> > + (_ (fail "invalid object" obj)))))) >> > + (put-char port #\})) >> > + (define (write-array v) >> > + (put-char port #\[) >> > + (match (vector-length v) >> > + (0 (values)) >> > + (n >> > + (write-value (vector-ref v 0)) >> > + (do ((i 1 (1+ i))) >> > + ((= i n)) >> > + (put-char port #\,) >> > + (write-value (vector-ref v i))))) >> >> I suppose the above is more efficient than a for-each loop? I'd be >> curious to see it profiled, if you still have data. At least now I see >> than for > 100k, vector-ref is faster than list-ref, which probably >> explains why you went with vectors (could still be an implementation >> detail with the list->vector call left in the writer though, in my >> opinion). > > Yes, it is more efficient because it avoids closure allocation. Also, > I just don't see any reason to import (rnrs base) or (srfi srfi-43) to > get vector-for-each when looping over a vector is trivial. I didn't > profile anything. Yeah, I was thinking in terms of plain lists, didn't know we didn't iterators for vectors in the core (I've seldom used vectors). Sounds reasonable. >> > + (put-char port #\])) >> > + (define (write-number x) >> > + (if (or (exact-integer? x) >> > + (and (real? x) >> > + (inexact? x) >> > + ;; NaNs and infinities are not allowed. >> > + (not (or (nan? x) (inf? x))))) >> > + ;; Scheme's string representations of exact integers and floats >> > + ;; are compatible with JSON. >> > + (put-string port (number->string x)) >> > + (fail "invalid number" x))) >> > + (define (write-value x) >> > + (match x >> > + (#t (put-string port "true")) >> > + (#f (put-string port "false")) >> > + ('null (put-string port "null")) >> > + (() (put-string port "{}")) >> > + ((? pair?) (write-object x)) >> > + ((? vector?) (write-array x)) >> > + ((? string?) (write-string x)) >> > + ((? number?) (write-number x)) >> > + (_ (fail "invalid value" x)))) >> > + (write-value exp)) >> >> Phew. That's a pretty low-level parser! I hope it's fast, otherwise it >> seems it'd be more concise/fun/maintainable to devise a PEG-based one, >> which appears to be doable for JSON, from what I've read. Perhaps >> sprinkle with a few performance-related comments where such concerns >> impacted the design choices, so that we can remember and retest/reverify >> these in the future when Guile evolves. > > JSON is a pretty simple format and thus I think a hand-rolled parser > is appropriate. It's much simpler than 'read', anyway. I suppose it > would be more concise, but "PEG parser" and "fun" do not go together > for me. At ~300 lines of quite simple code (I did not go hog wild on > macros or fancy abstractions nor did I sacrifice readable code for > performance) I don't think there is much concern regarding > maintenance. If someone wants to experiment to see how a PEG parser > compares, though, feel free. Someone could always make that experiment in the future and suggest a replacement, if the performance is as good or better and the code simpler. LGTM. Reviewed-by: Maxim Cournoyer <maxim.cournoyer@gmail> -- Thanks, Maxim