On Fri, Nov 22, 2019 at 8:50 AM Neil Van Dyke <n...@neilvandyke.org> wrote:
> That hypothetical parser assembling the parsed representation *could*
> then concatenate sequences of 2 or more contiguous strings representing
> CDATA, but that could be expensive, and might not be needed. Consider
> how large some XML and HTML documents can be, and how little information
> out of them is sometimes needed (e.g., price scraper) --
> performance-wise, the concatenation might be best left up to whatever
> uses that parsed representation.

I think a key point here is that the very features that make a representation of XML ideal for some uses will be troublesome for other uses.

I parse a lot of XML in Racket, and I often wish the x-expression grammar were different in various ways, which basically amount to eliminating artifacts of the concrete syntax: turning numeric entities (`valid-char?`) and the `cdata` struct into strings, plus concatenating contiguous strings. When I wander over toward the front-end, though, I start writing HTML pages as x-expressions, and then I want adjacent strings to be allowed so I can format my code nicely (perhaps with Scribble's at-syntax). If I were writing an XML-aware text editor (at one point I took a few small steps in that direction), I would very much care about the concrete syntax and even source-location information. While I don't personally want this, some people have even wished that x-expressions supported HTML-isms like "boolean attributes."

Of course, these tensions aren't specific to XML: one could also wish for fancier representations of strings than linear (mutable!) sequences of characters, like "ropes"/"cords"/"texts" (a tree representation) or substring "views" that can share storage.

To me, the fact that x-expressions are a good-enough representation of XML for a lot of different uses suggests that they're in the right neighborhood for a general-purpose library representation.
(As Neil knows, there are also Racket libraries that use a different representation, SXML <https://docs.racket-lang.org/sxml-intro/index.html>, that's a fairly close neighbor in the design space.)

I particularly like that, when I'm doing the kind of parsing where I want a more normalized representation, I can come up with a subset of the x-expression grammar that meets my needs (and enforce it with memoized contracts) and do a normalization pass: I can rely on stronger invariants internally while still taking full advantage of existing libraries (for x-expressions, lists, etc.).

-Philip
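
For illustration, here is a minimal sketch of the kind of normalization pass described above, not code from the original message. It assumes the `xml` library's x-expression grammar and that a `cdata` struct's string carries its `<![CDATA[` ... `]]>` delimiters, as the library documentation describes. The names `normalize`, `merge-strings`, `cdata->string`, and `attributes?` are hypothetical helpers introduced here. Named entities (symbols), comments, and processing instructions are left untouched.

```racket
#lang racket/base
(require racket/match racket/string xml)

;; Hypothetical helper: does v look like an x-expression attribute list,
;; e.g. '((href "a") (class "b"))? The empty list counts as one.
(define (attributes? v)
  (and (list? v)
       (for/and ([a (in-list v)])
         (match a
           [(list (? symbol?) (? string?)) #t]
           [_ #f]))))

;; Hypothetical helper: strip the <![CDATA[ ... ]]> delimiters, if present.
(define (cdata->string c)
  (define s (cdata-string c))
  (define prefix "<![CDATA[")
  (define suffix "]]>")
  (if (and (>= (string-length s)
               (+ (string-length prefix) (string-length suffix)))
           (string-prefix? s prefix)
           (string-suffix? s suffix))
      (substring s
                 (string-length prefix)
                 (- (string-length s) (string-length suffix)))
      s))

;; Concatenate runs of adjacent strings in an element body.
;; Simple rather than fast: long runs are re-appended pairwise.
(define (merge-strings xs)
  (match xs
    [(list* (? string? a) (? string? b) more)
     (merge-strings (cons (string-append a b) more))]
    [(cons x more) (cons x (merge-strings more))]
    ['() '()]))

;; Turn numeric entities and cdata structs into strings, then merge
;; contiguous strings, recursively through the element tree.
(define (normalize x)
  (match x
    [(? valid-char? n) (string (integer->char n))]
    [(? cdata? c) (cdata->string c)]
    [(list* (? symbol? tag) (? attributes? attrs) body)
     (list* tag attrs (merge-strings (map normalize body)))]
    [(cons (? symbol? tag) body)
     (cons tag (merge-strings (map normalize body)))]
    [_ x]))

(define example
  `(p ((class "x"))
      "a" 38 ,(cdata #f #f "<![CDATA[<b>]]>")
      (em "c" "d") nbsp "e"))

(normalize example)
;; => '(p ((class "x")) "a&<b>" (em "cd") nbsp "e")
```

The result is still a valid x-expression, so existing functions such as `xexpr->string` keep working on it. A predicate for the normalized subset could be layered on top as a contract.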