From: "Mike McClain" <mike.j...@nethere.com>
> Hi,
>    A few years ago I wrote a script to search a couple of dozen sites
> like CalJobs, craigslist, Dice, Indeed, several temp agencies in the
> area and a few of the major companies who use electronics techs
> for jobs I might care to apply for.
>    At the time I used LWP::Simple, LWP::UserAgent, HTML::TreeBuilder,
> WWW::Mechanize & HTTP::Cookies but many of the sites have modified
> their pages so that my program needs to be rewritten.
>    I'm wondering if anyone has suggestions of modules that make this
> sort of task easier.
> Thanks,
> Mike



Which part of the process do you find hard and want to make easier?

The process has two important parts:
- downloading the pages
- scraping them

For downloading, WWW::Mechanize is good because it is higher level and 
offers some helpful methods, but it won't help much if the pages are hard 
to get, for example if they use some kind of anti-scraping protection. In 
that case the lower-level LWP::UserAgent gives you more control, and since 
Mechanize is a subclass of LWP::UserAgent, you can call LWP's methods on a 
Mechanize object anyway.
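For example, a minimal sketch (untested; the URL and the agent string are 
just placeholders, not one of your real job sites):

#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Mechanize is a subclass of LWP::UserAgent, so LWP methods like
# agent() and timeout() work on the same object.
my $mech = WWW::Mechanize->new( autocheck => 1 );

# Some sites reject the default agent string, so pretend to be a
# browser (the string below is only an example).
$mech->agent('Mozilla/5.0 (X11; Linux x86_64)');
$mech->timeout(30);

# example.com stands in for one of the job sites.
$mech->get('http://www.example.com/jobs');
print $mech->title, "\n";

# follow_link() is one of the higher-level helpers plain LWP lacks.
$mech->follow_link( text_regex => qr/next/i );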

For scraping the content, HTML::TreeBuilder is very good.
If you know XPath well, you may find HTML::TreeBuilder::XPath helpful.
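Something like this (again untested; the URL and the XPath expression 
depend entirely on the real page markup):

#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);
use HTML::TreeBuilder::XPath;

# example.com and the div.job markup are placeholders; adjust the
# XPath to whatever the real listing page uses.
my $html = get('http://www.example.com/jobs')
    or die "Couldn't fetch the page\n";

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;

# Print the text and href of every link inside a job listing.
for my $link ( $tree->findnodes('//div[@class="job"]//a') ) {
    printf "%s => %s\n", $link->as_text, $link->attr('href');
}

$tree->delete;    # TreeBuilder trees must be freed explicitly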

CSS selectors can express less than XPath, so they are not as advanced, 
but they have a nicer syntax, so if you know CSS well, you may prefer one 
of the scrapers that accept CSS selectors (there is a short 
Mojo::UserAgent sketch after the list):
WWW::Mechanize::Query
Web::Scraper
Scrappy::Scraper::Parser
Mojo::UserAgent
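Here is the same scrape as above done with Mojo::UserAgent and a CSS 
selector (untested, with the same placeholder URL and markup):

#!/usr/bin/perl
use strict;
use warnings;
use Mojo::UserAgent;

my $ua = Mojo::UserAgent->new;

# Same placeholder page as above; 'div.job a' stands in for whatever
# CSS selector matches the real listings.
my $dom = $ua->get('http://www.example.com/jobs')->result->dom;

$dom->find('div.job a')->each( sub {
    my $link = shift;
    printf "%s => %s\n", $link->text, $link->attr('href');
} );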

All of them do roughly the same thing, so it comes down to which syntax 
you like the most.

Octavian



