David Storrs wrote on 01/04/2018 02:27 PM:
> [...] (xexp->html (html->xexp "<P>This is<br<b<I>bold </foo>italic</ b > text.</p>"))
> [Result is:] "<p>This is<br><b><i>bold italic</i></b> text.</p>" [...]
> I would have expected instead: "<p>This is<br<b<i>bold italic</i></ b > text.</p>"
I think you might be talking about the behavior of the `html-parsing` parser, when encountering *invalid* HTML. Which is invalid.
This parser was written in portable Scheme in 2001, for the HTML of the time, and over the years it has been tweaked to accommodate the evolution of HTML: HTML 4.x, XHTML, HTML5.
Browser developers have always seemed to encourage invalid HTML, and even sought to standardize interpretation of invalid HTML. I'm not saying that they are bad people, but I'm not saying that they aren't. :)
I think the default engineering opinion is that it would've been better to specify that invalid HTML "raises an exception", or at least "behavior is undefined".
Back then, I actually proposed to a certain knight that the browsers could at least put a big red indicator in the status bar when encountering invalid HTML, to warn sloppy dotcom-bubble Aeron-hammock "HTML programmers" before they deployed, or embarrass their company after. ("Warning: this site can't even get their HTML right. Don't give them an IPO.")
However, since the original purpose of the `html-parsing` parser was scraping arbitrary real-world Web pages, and invalid HTML was rampant, I did make it permissive (the original name of the parser included the word "pragmatic"), and it mimics some of the apparent browser behavior when encountering invalid HTML.
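To see that permissiveness directly, here is a minimal sketch that feeds the invalid input quoted at the top of this thread to the parser and prints the resulting tree. It assumes the `html-parsing` package is installed; the name `messy-html` is just for illustration.

```racket
#lang racket/base
;; Assumption: the `html-parsing` package is installed
;; (for example via `raco pkg install html-parsing`).
(require html-parsing)

;; The invalid input quoted at the top of this thread.
(define messy-html
  "<P>This is<br<b<I>bold </foo>italic</ b > text.</p>")

;; html->xexp accepts a string (or an input port) and returns an
;; SXML-style tree rooted at *TOP*. For this input it returns a tree
;; rather than raising an exception.
(define parsed (html->xexp messy-html))

;; Print the tree so you can inspect how the parser interpreted the
;; malformed tags.
(write parsed)
(newline)
```

Looking at the tree, rather than only at the string produced by `xexp->html`, makes it easier to tell which repairs come from the parser and which from the writer.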
The browser behavior in response to invalid HTML might have changed over time, but I'd guess that HTML in the wild is not invalid as often as it used to be (when more HTML was written by hand, or generated by very sketchy-looking Perl scripts).
So, for any contemporary real-world invalid HTML, if it turns out that popular browsers handle it differently than the `html-parsing` parser does, in a way that matters to your code, that would be a good argument for changing the parser behavior. Probably with a new major version without backward compatibility, if you're breaking the regression tests or you might be breaking some unspecified behavior on which people are understandably dependent.
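If some unspecified behavior does matter to your code, one option is to pin it down in your own test, so that a future parser change shows up as a test failure rather than a surprise. The following is only a sketch: it assumes the `html-parsing` and `html-writing` packages are installed, `round-trip` is a hypothetical helper, and the expected string is simply the result reported earlier in this thread, which may differ on other versions.

```racket
#lang racket/base
;; Assumption: the `html-parsing` and `html-writing` packages are installed.
(require html-parsing html-writing rackunit)

;; Hypothetical helper: parse HTML permissively, then write it back out.
(define (round-trip html)
  (xexp->html (html->xexp html)))

;; The invalid input quoted at the top of this thread.
(define messy-html
  "<P>This is<br<b<I>bold </foo>italic</ b > text.</p>")

;; The output reported in this thread. Treat it as an assumption about
;; the installed versions, not as specified behavior.
(define reported-output
  "<p>This is<br><b><i>bold italic</i></b> text.</p>")

;; rackunit reports a failure if the behavior ever changes.
(check-equal? (round-trip messy-html) reported-output)
```

A similar check against what a popular browser produces for the same input would be the kind of evidence that could justify changing the parser.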
Someday, I'd like to write new HTML5 and other Web-related tools in Racket, but I can't justify that at the moment (the current tools work for my immediate needs, and I don't have enough discretionary time), so I'd need funding for it.