Re: [racket-users] reading html

2016-02-25 Thread Sam Tobin-Hochstadt
Probably the documentation should clarify that it only works for the older specification, then.
Sam
On Thu, Feb 25, 2016 at 1:48 PM, Jay McCarthy wrote:
> The `html` library, however, is specifically for parsing HTML4. HTML5 is a totally new beast basically unrelated to old HTML. We could im

Re: [racket-users] reading html

2016-02-25 Thread Jay McCarthy
The `html` library, however, is specifically for parsing HTML4. HTML5 is a totally new beast basically unrelated to old HTML. We could imaginably have a new html library
Jay
On Thu, Feb 25, 2016 at 1:45 PM, Sam Tobin-Hochstadt wrote:
> Note that HTML4 is quite out of date (from 1999), the most r

Re: [racket-users] reading html

2016-02-25 Thread Sam Tobin-Hochstadt
Note that HTML4 is quite out of date (from 1999); the most recent HTML standard from the W3C, from 2014, is here: https://www.w3.org/TR/html/. However, if you plan to reference the standard to build software, the most useful spec is https://html.spec.whatwg.org/ which is what browsers and other appli

Re: [racket-users] reading html

2016-02-25 Thread jon stenerson
Thanks Neil. Jay, it seems to me that the html spec at w3.org says that and can be used as inline elements so that may be a reasonable change to html-spec.rkt.
On 2/25/2016 11:30 AM, Neil Van Dyke wrote:
> Jay McCarthy wrote on 02/25/2016 01:21 PM:
>> Since you mention "in the wild", I think yo

Re: [racket-users] reading html

2016-02-25 Thread jon stenerson
Thank you! I wasn't aware of the html-parsing library.
Jon
On 2/25/2016 11:21 AM, Jay McCarthy wrote:
> You should double check against the HTML 4.01 spec https://www.w3.org/TR/html4/ Since you mention "in the wild", I think you probably don't want to use the html library but instead want to u

Re: [racket-users] reading html

2016-02-25 Thread Neil Van Dyke
Jay McCarthy wrote on 02/25/2016 01:21 PM:
> Since you mention "in the wild", I think you probably don't want to use the html library but instead want to use http://docs.racket-lang.org/html-parsing/index.html
BTW, the `html-parsing` package uses SXML, and you'll want to read this brand-new documen

Re: [racket-users] reading html

2016-02-25 Thread Jay McCarthy
You should double check against the HTML 4.01 spec https://www.w3.org/TR/html4/
Since you mention "in the wild", I think you probably don't want to use the html library but instead want to use http://docs.racket-lang.org/html-parsing/index.html
Jay
On Thu, Feb 25, 2016 at 1:13 PM, jon stenerso

[racket-users] reading html

2016-02-25 Thread jon stenerson
I find that when I use the html library I have to make a few simple changes to html-spec.rkt. It seems that and are not treated like and . You can see in this example that while remains in the enclosing , does not. I also find that I have to allow pcdata as a child of and . I don't know