[EMAIL PROTECTED] wrote:
> I am involved in one project which tends to collect news
> information published on selected, known web sites in the format of
> HTML, RSS, etc.

I just can't imagine why anyone would still want to do this. With RSS, it's an easy (if not trivial) problem. With HTML it's hard, it's unstable, and the legality of recycling others' content like this is far from clear.

Are you _sure_ there's still a need to do this thoroughly awkward task? How many sites are there that are worth scraping, permit scraping, and don't yet offer RSS?
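
To illustrate why the RSS case is the easy one, here is a minimal sketch using only the standard library. It assumes a plain RSS 2.0 feed with <item> elements carrying <title> and <link>; the sample feed, the example.com URLs and the function name parse_rss_items are made up for illustration, and a real feed (Atom, namespaces, odd encodings) would need more care.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed, standing in for one fetched from a news site.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text):
    """Return a list of {'title', 'link'} dicts, one per <item> in the feed."""
    root = ET.fromstring(xml_text)
    items = []
    # Assumes un-namespaced RSS 2.0; iter() finds <item> at any depth.
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_RSS):
        print(entry["title"], entry["link"])
```

The same job against HTML has no fixed element names to rely on, so every site needs its own extraction rules, and those rules break whenever the site's markup changes.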