Kira wrote on 11/22/19 12:43 AM:
I am trying to understand what purpose it serves. Is this done intentionally, or is it just a random side effect?
I suspect it's an implementation decision of the parser, made for ease of implementation or runtime efficiency. It's not unusual in the XML and HTML parsers I've seen.
For example, imagine a parser that has a fast way to scan an input stream for the next special character (including `&`), and then take that chunk of all the non-special characters as a string. That string can then be used as-is in the parsed representation. Then a different mode of the parser starts parsing from the `&`, and ends up adding a new, different string for the result of that, perhaps coming from a lookup table.
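To make that hypothetical parser concrete, here is a minimal Python sketch. It is not how Racket's parser is actually written; the names `ENTITIES` and `parse_text`, the tiny lookup table, and the lenient handling of a stray `&` are all assumptions made for illustration.

```python
import re

# Hypothetical lookup table; a real parser would know many more entities.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

_ENTITY = re.compile(r"&(\w+);")

def parse_text(source):
    """Split character data into chunks: one per plain run, one per entity."""
    chunks = []
    pos = 0
    while pos < len(source):
        # Fast scan for the next special character.
        amp = source.find("&", pos)
        if amp == -1:
            chunks.append(source[pos:])      # rest of the input, used as-is
            break
        if amp > pos:
            chunks.append(source[pos:amp])   # plain run before the '&'
        # Different mode: parse the entity reference starting at '&'.
        match = _ENTITY.match(source, amp)
        if match and match.group(1) in ENTITIES:
            chunks.append(ENTITIES[match.group(1)])  # string from the table
            pos = match.end()
        else:
            chunks.append("&")               # assumption: keep a stray '&'
            pos = amp + 1
    return chunks

print(parse_text("AT&amp;T rocks"))
```

With this sketch, `"AT&amp;T rocks"` comes back as three separate strings rather than one, which is the kind of result you were asking about.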
That hypothetical parser assembling the parsed representation *could* then concatenate sequences of 2 or more contiguous strings representing CDATA, but that could be expensive, and might not be needed. Consider how large some XML and HTML documents can be, and how little information out of them is sometimes needed (e.g., price scraper) -- performance-wise, the concatenation might be best left up to whatever uses that parsed representation.
If you're using a DSL for XML querying, pattern-matching, extraction, transformation, etc., then you might have the DSL do that concatenation when worthwhile (e.g., when extracting the content of an element, with type-checking). I've implemented such a DSL before. Or you might do that concatenation in your application code, as needed. Or you might not do the concatenation at all, because, even if you used query tools to narrow in on the information you wanted, you're streaming it out to somewhere else, or transforming it in some way that doesn't benefit from (and even might suffer from) an intermediate concatenation.
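As a rough sketch of those two options, again in Python rather than Racket, and with hypothetical helper names (`coalesce_strings`, `stream_text`) that don't come from any existing library: the first function merges adjacent strings on demand, and the second streams them out without ever building the concatenated string.

```python
from itertools import groupby

def coalesce_strings(children):
    """Merge each run of adjacent strings into one; leave other items alone."""
    merged = []
    for is_text, run in groupby(children, key=lambda c: isinstance(c, str)):
        if is_text:
            merged.append("".join(run))   # one string per contiguous run
        else:
            merged.extend(run)            # non-string children pass through
    return merged

def stream_text(children, out):
    """Write string children straight to `out`, with no intermediate join."""
    for child in children:
        if isinstance(child, str):
            out.write(child)

# Assumption: child elements are represented here as tuples, purely for the demo.
print(coalesce_strings(["AT", "&", "T", ("br",), "x", "y"]))
```

Which one is appropriate depends on the consumer: a price scraper pulling out one element's content might coalesce just that element, while something re-serializing the document would probably stream.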
Anyway, that's just a quick explanation in answer to your question of *why* a parser might happen to do it the way you say. But I agree that it's not intuitive, and that you'd also like to have better off-the-shelf DSLs for working with that parsed representation. XML processing is no longer nearly as popular as it was in the early days of Racket (PLT Scheme), which is when almost all of the XML tools available for Racket were written.
If you wanted, you could make better tools. Be aware, though, that I think the "market" for XML tools in Racket is even smaller now than it used to be. So I suggest only making such tools for your own reasons, not out of altruism to help solve this problem for others, nor to "promote" Racket. (Racket was promoted by some of the XML and HTML tools earlier, but not anymore that I'm aware of.)