On 9/12/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On 11 Sep 2007 at 16:09, Chas Owens wrote:
> > On 9/11/07, Jenda Krynicky <[EMAIL PROTECTED]> wrote:
> > > On 11 Sep 2007 at 15:15, Srinivas wrote:
> > > > I want to write a perl script that scrapes various job sites like
> > > > monster, dice, career builders etc.
> > > >
> > > > Given the job id and web site name it should scrape the
> > > > information and store in a mySQL database.
> > >
> > > And are you sure they won't mind? I don't work there anymore, but
> > > still ... you should make sure what you plan to do is OK with them.
> > snip
> >
> > The easiest way to do this is to obey their robots.txt file.  You can
> > learn more about robots.txt here:
> > http://www.robotstxt.org/wc/faq.html.  Also, be careful, the text you
> > are copying is still copyrighted and you cannot republish more than a
> > snippet without running into potential legal hazards.
>
> I don't think that's enough. It's one thing to index a site for
> searching (think ... Google) and another to scrape the data and
> present it elsewhere as yours. The fact that it's OK to run a script
> to download some data doesn't mean all uses of said data are all right.
snip

Right, that is why I warned about the possible legal hazards*.  A
script should never** request data that is under a URL marked
Disallow in robots.txt, but even if it is acceptable to read data, it
is almost never acceptable to display more than a snippet of the data
(think of the one or two lines after a search result in Google).
However, you may, if my understanding of US and international
copyright law is correct, derive new information from the data you
scrape off a website.  So, you could create a robot that scrapes all
of the new jobs off of several job websites (assuming the robots.txt
allows you to) and then create a webpage that looks like this:

Monster has
5 new jobs requiring Perl
20 new jobs requiring Java
1000 new jobs requiring Befunge

Dice has
15 new jobs requiring Perl
50 new jobs requiring Java
0 new jobs requiring Befunge
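
A summary like that only needs counts, not the job text itself.  Here
is a minimal sketch in Python of how the tally could be derived.  The
function names, the skill list, and the sample postings are all made
up for illustration; real input would come from whatever your robot
was allowed to fetch.

```python
import re
from collections import Counter


def count_jobs_by_skill(postings, skills):
    """Count how many postings mention each skill as a whole word."""
    counts = Counter()
    for text in postings:
        for skill in skills:
            # Whole-word match so that "Java" does not count "JavaScript".
            pattern = r"(?<!\w)" + re.escape(skill) + r"(?!\w)"
            if re.search(pattern, text, re.IGNORECASE):
                counts[skill] += 1
    return counts


def render_summary(site, counts, skills):
    """Build the plain-text summary; only derived numbers are shown."""
    lines = [f"{site} has"]
    for skill in skills:
        lines.append(f"{counts[skill]} new jobs requiring {skill}")
    return "\n".join(lines)


# Hypothetical sample data, not real postings.
skills = ["Perl", "Java", "Befunge"]
postings = [
    "Senior Perl developer",
    "Java and Perl engineer",
    "JavaScript front end",
]
summary = render_summary("Monster", count_jobs_by_skill(postings, skills), skills)
print(summary)
```

Only the numbers leave the script, so the page never reproduces the
postings themselves.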

It would not be legal (in my opinion) to then have the count lines in
that summary page link to the full text of the jobs in question on
your own website, but a page like this

New jobs on Monster:
DBA/Developer - Sacramento, CA
Sys Admin - BFE, OK
Senior Sysadmin - Atlanta, GA
Perl Developer - Norfolk, VA
Data Munging Expert - Portland, OR

where each line deep links to the job offer on Monster's website would
be legal (again, in my opinion).
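
As for not requesting anything marked Disallow, here is a rough sketch
of that check using Python's standard urllib.robotparser module.  The
robots.txt content, the example.com URLs, and the "JobCounterBot"
user-agent name are assumptions for illustration only; a real robot
would load the site's actual file with set_url() and read() instead
of parsing a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, hard-coded so the sketch runs offline.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, user_agent, urls):
    """Keep only the URLs that robots.txt lets this user agent fetch."""
    return [url for url in urls if parser.can_fetch(user_agent, url)]


parser = build_parser(SAMPLE_ROBOTS_TXT)
candidates = [
    "http://example.com/jobs/123",
    "http://example.com/private/456",
]
# Only URLs that pass this filter should ever be requested.
fetchable = allowed_urls(parser, "JobCounterBot", candidates)
print(fetchable)
```

Passing this check only says the robot may read the page; it says
nothing about what you may do with the text afterwards.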

* always consult legal counsel before working with someone else's data.
** for certain values of never
