It should be fine. Hundreds of sites is not really that many. You just need backoff logic and the like so you don't hammer any one site and get blacklisted. Racket's sync and friends would make implementing the concurrent fetching easy.
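For instance, the retry-with-backoff part is only a few lines. Here is a rough sketch (in Python for illustration, since it's quick to read; in Racket you'd do the same with threads and sync). The names `fetch_with_backoff` and the default urllib GET are just placeholders for whatever HTTP code you end up using:

```python
import time
import urllib.request

def fetch_with_backoff(url, fetch=None, max_tries=5, base_delay=1.0,
                       sleep=time.sleep):
    """Fetch `url`, retrying on failure with exponential backoff
    (1s, 2s, 4s, ...) so a flaky or rate-limited site isn't hammered.

    `fetch` defaults to a plain urllib GET; pass your own function to
    add headers, politeness delays, robots.txt checks, etc.
    """
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u, timeout=30).read()
    for attempt in range(max_tries):
        try:
            return fetch(url)
        except OSError:
            if attempt == max_tries - 1:
                raise  # out of retries; let the caller decide
            sleep(base_delay * (2 ** attempt))
```

A per-site version of this (one backoff state per hostname), driven by a small pool of worker threads, is about all the "scalable" part needs at the hundreds-of-sites level.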
If you want to extract unstructured data, there is some good reading here:
http://metaoptimize.com/qa/questions/3440/text-extraction-from-html-pages

Roping together various existing systems is probably the most efficient way to get a scalable solution working. See Apache Tika and the projects referenced above. There is also a surprising amount of work on "scalable web spiders" (Google that phrase if you're interested).

HTH,
N.

On Fri, Mar 18, 2011 at 7:29 PM, Geoffrey S. Knauth <ge...@knauth.org> wrote:
> I'm evaluating whether to use Racket to data mine hundreds of websites
> pulling out business information within an industry. I think Racket is up to
> it, but I'm wondering if anyone else has had experiences positive or
> negative. I've used other tools to do rudimentary digging, but this project
> is likely to touch AI, which brings me back to the Lisp family.
>
> Geoff

_________________________________________________
For list-related administrative tasks:
http://lists.racket-lang.org/listinfo/users