Re: [racket-users] xml library clarification - """ symbol parsing

Neil Van Dyke Fri, 22 Nov 2019 05:50:48 -0800

Kira wrote on 11/22/19 12:43 AM:

I am trying to understand what purpose it serves. Is this done intentionally, or is it just a random side effect?

I suspect it's an implementation decision of the parser, done for reasons of implementation ease or runtime efficiency. It's not unusual in XML and HTML parsers I've seen.

For example, imagine a parser that has a fast way to scan an input stream for the next special character (including `&`), and then take that chunk of all the non-special characters as a string. That string can then be used as-is in the parsed representation. Then a different mode of the parser starts parsing from the `&`, and ends up adding a new, different string for the result of that, perhaps coming from a lookup table.
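A minimal sketch of that hypothetical parser, written in Python purely for illustration. This is not how Racket's xml library is implemented; the names `parse_text`, `ENTITIES`, and `ENTITY_RE` are made up, and only the five predefined XML entities are handled.

```python
import re

# Hypothetical lookup table covering only the five predefined XML entities.
ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}

# Matches one named entity reference such as &quot;
ENTITY_RE = re.compile(r"&([A-Za-z]+);")


def parse_text(text):
    """Split character data into a list of strings, one per scanned chunk."""
    pieces = []
    pos = 0
    while pos < len(text):
        amp = text.find("&", pos)  # fast scan for the next special character
        if amp == -1:
            pieces.append(text[pos:])  # no more specials: take the rest as-is
            break
        if amp > pos:
            pieces.append(text[pos:amp])  # chunk before the '&', used as-is
        match = ENTITY_RE.match(text, amp)  # switch to entity-parsing mode
        if match is None or match.group(1) not in ENTITIES:
            raise ValueError(f"bad entity reference at offset {amp}")
        pieces.append(ENTITIES[match.group(1)])  # new string from the table
        pos = match.end()
    return pieces


print(parse_text("say &quot;hi&quot; now"))
```

Each entity reference comes out as its own string, sitting between the untouched chunks around it. Nothing in this loop ever needs to copy or join the chunks, which is the appeal of doing it this way.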

That hypothetical parser assembling the parsed representation *could* then concatenate sequences of 2 or more contiguous strings representing CDATA, but that could be expensive, and might not be needed. Consider how large some XML and HTML documents can be, and how little information out of them is sometimes needed (e.g., price scraper). Performance-wise, the concatenation might be best left up to whatever uses that parsed representation.

If you're using a DSL for XML querying, pattern-matching, extraction, transformation, etc., then you might have the DSL do that concatenation when worthwhile (e.g., when extracting the content of an element, with type-checking). I've implemented such a DSL before. Or you might do that concatenation in your application code, as needed. Or you might not do the concatenation at all, because, even if you used query tools to narrow in on the information you wanted, you're streaming it out to somewhere else, or transforming it in some way that doesn't benefit from (and even might suffer from) an intermediate concatenation.
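As a rough illustration of doing that concatenation in application code, here is a small Python sketch. The helper name `coalesce_strings` and the sample child list are made up for this example; the tuple stands in for whatever a nested element looks like in your parsed representation.

```python
def coalesce_strings(items):
    """Merge each run of adjacent strings in a list into a single string.

    Non-string items (e.g., nested elements) are passed through unchanged
    and act as boundaries between runs.
    """
    merged = []
    run = []  # strings collected for the current run
    for item in items:
        if isinstance(item, str):
            run.append(item)
        else:
            if run:
                merged.append("".join(run))  # one join per run, not per pair
                run = []
            merged.append(item)
    if run:
        merged.append("".join(run))
    return merged


# Hypothetical element content: text split around entities, plus one
# nested element represented here as a tuple.
children = ["say ", '"', "hi", '"', ("br",), " now", "!"]
print(coalesce_strings(children))
```

Calling this only on the elements you actually extract keeps the cost proportional to what you use, rather than to the size of the whole document.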

Anyway, that's just a quick explanation in answer to your question of *why* a parser might happen to do it the way you say. But I agree that it's not intuitive, and you'd also like to have better off-the-shelf DSLs for working with that parsed representation. XML processing is no longer nearly as popular as in the early days of Racket (PLT Scheme), which is when almost all of the XML tools available for Racket were written.

If you wanted, you could make better tools. Though be aware that I think the "market" for XML tools in Racket is even smaller now than it used to be. So I suggest only making such for your own reasons, not out of altruism to help solve this problem for others, nor to "promote" Racket. (Racket was promoted by some of the XML and HTML tools earlier, but not anymore that I'm aware of.)


