On Fri, Nov 22, 2019 at 8:50 AM Neil Van Dyke <n...@neilvandyke.org> wrote:

> That hypothetical parser assembling the parsed representation *could*
> then concatenate sequences of 2 or more contiguous strings representing
> CDATA, but that could be expensive, and might not be needed.  Consider
> how large some XML and HTML documents can be, and how little information
> out of them is sometimes needed (e.g., price scraper) --
> performance-wise, the concatenation might be best left up to whatever
> uses that parsed representation.
>

I think a key point here is that the very features that make a
representation of XML ideal for some uses will be troublesome for other
uses.

I parse a lot of XML in Racket, and I often wish the x-expression grammar
were different in various ways, which basically amount to eliminating
artifacts of the concrete syntax: turning numeric entities (`valid-char?`)
and the `cdata` struct into strings, plus concatenating contiguous strings.
When I wander over toward the front-end, though, I start writing HTML pages
as x-expressions, and then I want adjacent strings to be allowed so I can
format my code nicely (perhaps with Scribble's at-syntax). If I were
writing an XML-aware text editor (at one point I took a few small steps in
that direction), I would very much care about the concrete syntax and even
source-location information. While I don't personally want this, some
people have even wished that x-expressions supported HTML-isms like
"boolean attributes."

Of course, these tensions aren't specific to XML: one could also wish for
fancier representations of strings than linear (mutable!) sequences of
characters, like "ropes"/"cords"/"texts" (a tree representation) or
substring "views" that can share storage.

To me, the fact that x-expressions are a good-enough representation of XML
for a lot of different uses suggests that they're in the right neighborhood
for a general-purpose library representation. (As Neil knows, there are
also Racket libraries that use a different representation, SXML
<https://docs.racket-lang.org/sxml-intro/index.html>, that's a fairly close
neighbor in the design space.) I particularly like that, when I'm doing the
kind of parsing where I want a more normalized representation, I can come
up with a subset of the x-expression grammar that meets my needs (and
enforce it with memoized contracts) and do a normalization pass: I can rely
on stronger invariants internally while still taking full advantage of
existing libraries (for x-expressions, lists, etc.).
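
As a rough illustration, such a normalization pass might look something like
the sketch below. This is not code from any existing library: the names
`cdata->string`, `merge-strings`, and `normalize-xexpr` are hypothetical, and
it assumes the `cdata` string field still carries its `<![CDATA[` ... `]]>`
wrapper, as the `xml` documentation describes. It turns numeric entities and
`cdata` structs into strings, merges contiguous strings, and always emits an
explicit attribute list.

```racket
#lang racket/base
(require racket/match
         racket/string
         xml)

;; Hypothetical helper: drop the "<![CDATA[" ... "]]>" wrapper that the
;; `cdata` struct keeps in its string field (assumed to be present).
(define (cdata->string c)
  (define s (cdata-string c))
  (if (and (string-prefix? s "<![CDATA[") (string-suffix? s "]]>"))
      (substring s 9 (- (string-length s) 3))
      s))

;; Hypothetical helper: replace each run of adjacent strings with a single
;; string, appending each run once rather than pairwise.
(define (merge-strings items)
  (let loop ([items items] [run '()])
    ;; `run` holds the strings of the current run, most recent first.
    (define (flush rest)
      (if (null? run)
          rest
          (cons (apply string-append (reverse run)) rest)))
    (cond
      [(null? items) (flush '())]
      [(string? (car items)) (loop (cdr items) (cons (car items) run))]
      [else (flush (cons (car items) (loop (cdr items) '())))])))

;; Normalize an x-expression: numeric entities and cdata become strings,
;; contiguous strings are merged, and every element gets an attribute list.
;; Symbols (named entities), comments, and p-i structs pass through as-is.
(define (normalize-xexpr x)
  (match x
    [(? cdata?) (cdata->string x)]
    [(? valid-char?) (string (integer->char x))]
    [(cons (? symbol? tag) rest)
     (define-values (attrs body)
       (match rest
         ;; An attribute list is '() or a list whose first item is a pair.
         [(cons (and a (or '() (cons (? pair?) _))) kids) (values a kids)]
         [_ (values '() rest)]))
     (list* tag attrs (merge-strings (map normalize-xexpr body)))]
    [_ x]))
```

For example, `(normalize-xexpr `(p "a" 38 "b" ,(cdata #f #f "<![CDATA[<c>]]>") (br) "d"))`
should produce `'(p () "a&b<c>" (br ()) "d")`. Whether named entities such as
`'nbsp` should also become strings is a separate choice that this sketch
leaves alone.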

-Philip
