From: "Mike McClain" <mike.j...@nethere.com> > Hi, > A few years ago I wrote a script to search a couple of dozen sites > like CalJobs, craigslist, Dice, Indeed, several temp agencies in the > area and a few of the major companies who use electronics techs > for jobs I might care to apply for. > At the time I used LWP::Simple, LWP::UserAgent, HTML::TreeBuilder, > WWW::Mechanize & HTTP::Cookies but many of the sites have modified > their pages so that my program needs to be rewritten. > I'm wondering if anyone has suggestions of modules that make this > sort of task easier. > Thanks, > Mike
Which part of the process do you find hard and want to make easier?

The process has two important parts:

- downloading the pages
- scraping them

To download, Mechanize is good because it is higher level and offers some
helpful methods, but it won't help you if those pages are hard to get, for
example if they use some kind of anti-scraping protection. In that case LWP
is better, but since WWW::Mechanize is a subclass of LWP::UserAgent,
Mechanize can use LWP's methods too (there is a short sketch at the end of
this message).

For scraping the content, HTML::TreeBuilder is very good. If you have good
XPath knowledge, you may find HTML::TreeBuilder::XPath helpful. CSS
selectors can match only a subset of what XPath can express, so they are
not as powerful, but the syntax is nicer, so if you know CSS well you may
use other scrapers like:

WWW::Mechanize::Query
Web::Scraper
Scrappy::Scraper::Parser
Mojo::UserAgent

All of them do the same thing, so it depends on which type of syntax you
like the most. A few rough sketches follow.
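To make the download part concrete, here is a minimal WWW::Mechanize
sketch. The URL and query string are made up for illustration; the point
is the higher-level helpers (autocheck, find_link/follow_link) that plain
LWP does not give you.

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# autocheck => 1 makes every request die on failure, so we don't
# have to test $mech->success after each call.
my $mech = WWW::Mechanize->new( autocheck => 1 );

# agent() comes straight from LWP::UserAgent -- some sites refuse
# the default libwww-perl agent string.
$mech->agent('Mozilla/5.0');

# Hypothetical search URL, just for illustration.
$mech->get('https://jobs.example.com/search?q=electronics+technician');

# Higher-level helpers that plain LWP lacks: follow a "next page"
# link if there is one.
if ( $mech->find_link( text_regex => qr/next/i ) ) {
    $mech->follow_link( text_regex => qr/next/i );
}

my $html = $mech->content;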
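For the XPath side, a sketch along these lines; the markup and the XPath
expression are assumptions, since every site structures its listings
differently:

use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Stand-in markup; in the real script you would feed in the HTML
# that Mechanize downloaded.
my $html = '<div class="job"><a href="/jobs/123">Electronics Tech</a></div>';

my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

# findnodes() takes any XPath expression, which is where this module
# is more powerful than the CSS-selector scrapers.
for my $node ( $tree->findnodes('//div[@class="job"]/a') ) {
    printf "%s => %s\n", $node->as_trimmed_text, $node->attr('href');
}

$tree->delete;    # free the parse tree when done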
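And for the CSS-selector style, Mojo::UserAgent covers both parts in one
module: it downloads the page, and its DOM supports CSS selectors. Again,
the URL and the selector are only placeholders:

use strict;
use warnings;
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new( max_redirects => 3 );

# result() dies on connection errors; dom() gives a Mojo::DOM object
# whose find() takes a CSS selector instead of XPath.
$ua->get('https://jobs.example.com/search?q=electronics+technician')
   ->result->dom->find('div.job a')->each( sub {
        my $link = shift;
        printf "%s => %s\n", $link->text, $link->attr('href') // '';
   });

Octavian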