> On Dec 9, 2015, at 10:59 PM, David K. Storrs <david.sto...@gmail.com> wrote:
>
> On Wednesday, December 9, 2015 at 6:33:02 PM UTC-8, Neil Van Dyke wrote:
>> David K. Storrs wrote on 12/09/2015 08:50 PM:
>>> 1) Is there a web-spidering package that people recommend? I could use
>>> wget and then parse things from disk, but I'd like to have something that's
>>> easily composable into CLI scripts.
>>
>> I've done a lot of Web crawling and scraping successfully with Racket
>> and Scheme, over the last 14-15 years. I released an HTML parser
>> ("http://www.neilvandyke.org/racket-html-parsing/"), which I still use
>> today. From that parse, you might then extract the info you need with
>> `sxml-match`
>> ("http://planet.racket-lang.org/display.ss?package=sxml-match.plt&owner=jim")
>> and/or SXPath.
>
> Thank you; I've been rolling through the docs and playing around on these, and
> they seem really useful. One question though -- I stumbled across a mention
> of the sxml/html module while I was reading, but had no luck installing it.
> None of the following worked:
>
> (require sxml/html)
> $ raco pkg install sxml/html
> $ raco pkg install 'sxml/html' # Maybe the shell was having trouble with '/'?
>
> I don't know that I need it, but I'd like to know how to deal with modules
> like this in future.
I don't believe that package names can contain slashes. However, it could easily be the case that a package (presumably one from Neil Van Dyke) installs code into the `html` subdirectory of the `sxml` collection, and I'm guessing that's what you're referring to. In that case, `sxml/html` is a module path that you pass to `require` once the package is installed, not a package name that you pass to `raco pkg install`.

It's perhaps also worth mentioning that Racket has an older package system (PLaneT) and a newer one (packages), and the transition between the two can cause a certain amount of confusion. When you write `raco pkg install …`, you're using the new system. It appears to me that, for instance, Neil's html-parsing library is not currently available in package form. (ObResearch: a quick search for 'neil@ne' through pkgs.)

John Clements