Hi all, I need a function to provide a rough textual preview (without formatting except newlines) of the content of a web page.
So far I'm using this:

    (require net/url html-parsing sxml)

    (provide fetch fetch-string-content)

    (define (fetch url)
      (call/input-url url get-pure-port port->string))

    (define (fetch-string-content url)
      (sxml:text ((sxpath '(html body)) (html->xexp (fetch url)))))

The sxpath correctly returns the body sexp, but fetch-string-content still returns only an empty string or a bunch of "\n\n\n". I guess the problem is that sxml:text only returns the text immediately below the element, and that's not what I want. Web pages contain all kinds of unknown, nested div and span tags.

I'm looking for a way to get a simplified version of the textual content of the html body. If I were on Linux only, I'd use "lynx -dump -nolist" in a subprocess, but it needs to be cross-platform.

Is there an sxml trick to achieve that? It doesn't need to be perfect.

Best,

Erich
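
For illustration, one possible direction is a recursive walk over the parsed tree that collects every string at any depth, instead of only the strings directly under body. The following is a minimal sketch under stated assumptions, not a tested answer for real-world pages: the names node->strings, xexp->plain-text, skipped-tags and block-tags are hypothetical, the two tag lists are guesses, and the code uses only the Racket standard library, so it does not rely on any sxml function.

```racket
#lang racket/base
(require racket/list racket/string)

;; Hypothetical list: nodes whose contents are never visible text.
;; `@` is the attribute list; *COMMENT*, *PI*, *DECL* are markup nodes.
(define skipped-tags '(script style head @ *COMMENT* *PI* *DECL*))

;; Hypothetical list: elements after which a line break is emitted.
(define block-tags '(p div br li tr h1 h2 h3 h4 h5 h6 blockquote pre))

;; node->strings : xexp -> (listof string)
;; Collects all strings below `node`, at any nesting depth.
(define (node->strings node)
  (cond
    [(string? node) (list node)]
    [(and (list? node) (pair? node) (symbol? (car node)))
     (define tag (car node))
     (cond
       [(memq tag skipped-tags) '()]
       [else
        (define inner (append-map node->strings (cdr node)))
        (if (memq tag block-tags)
            (append inner (list "\n"))
            inner)])]
    ;; Anything else (e.g. bare symbols inside entity nodes) is ignored.
    [else '()]))

;; xexp->plain-text : xexp -> string
;; Joins the collected strings, collapses runs of spaces/tabs into one
;; space and runs of line breaks into one newline, then trims the ends.
(define (xexp->plain-text x)
  (define raw (string-append* (node->strings x)))
  (define spaces-collapsed (regexp-replace* #rx"[ \t]+" raw " "))
  (string-trim (regexp-replace* #rx"\n[ \t\n]*" spaces-collapsed "\n")))

;; Hand-written sample in the shape html->xexp is assumed to produce.
(define sample
  '(*TOP* (html (head (title "T"))
                (body (div (@ (class "x"))
                           (p "Hello " (b "world") "!")
                           (script "var x = 1;")
                           (p "Second   line"))))))

(xexp->plain-text sample) ; => "Hello world!\nSecond line"
```

With the definitions from the post, something like (xexp->plain-text (html->xexp (fetch url))) should then give a rough preview, assuming html->xexp yields the usual (*TOP* (html ...)) shape. The whole tree can be passed in because head, script and style are skipped by the walk itself.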