Re: HTML Parsing and Indexing

Andy Dingley Mon, 13 Nov 2006 15:15:56 -0800

[EMAIL PROTECTED] wrote:

>     I am involved in one project which tends to collect news
> information published on selected, known web sites in the format of
> HTML, RSS, etc.


I just can't imagine why anyone would still want to do this.

With RSS, it's an easy (if not trivial) problem.
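
To illustrate why the RSS case is easy, here is a minimal sketch using only
the standard library. It assumes a plain RSS 2.0 feed; the sample feed, its
URLs, and the helper name parse_rss_items are made up for illustration and
are not taken from the original poster's project.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be fetched over HTTP first
# (for example with urllib.request) and passed in as text.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
      <pubDate>Mon, 13 Nov 2006 12:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return one dict (title, link, pubDate) per <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns the default when the child element is absent
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "pubDate": item.findtext("pubDate", default=""),
        })
    return items


for entry in parse_rss_items(SAMPLE_RSS):
    print(entry["title"], entry["link"])
```

The point is that the feed already names its fields, so nothing here depends
on a site's page layout. Real feeds vary (Atom, namespaces, odd encodings),
so a dedicated feed parser is probably the better choice for production use.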

With HTML it's hard, it's unstable, and the legality of recycling
others' content like this is far from clear.  Are you _sure_ there's
still a need to do this thoroughly awkward task?  How many sites are
there that are worth scraping, permit scraping, and don't yet offer
RSS?

