On 22 Dec 2004 09:22:15 -0800, Zhang Le <[EMAIL PROTECTED]> wrote:
> Hello,
> I'm writing a little Tkinter application to retrieve news from
> various news websites such as http://news.bbc.co.uk/, and display them
> in a Tk listbox. All I want is the title and URL of each news item.
Well, the BBC pub
Title: RE: extract news article from web
Excel in later versions of Office has a "web query" feature.
(sorry about top posting)
-Original Message-
From: Steve Holden [mailto:[EMAIL PROTECTED]]
Sent: Thursday, 23 December 2004 12:59
To: python-list@python.org
Subject: R
If you have a reliably structured page, then you can write a custom
parser. As Steve points out, BeautifulSoup would be a very good place
to start.
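
As a rough illustration of the "custom parser" idea, here is a minimal
sketch that pulls (title, url) pairs out of anchor tags. It uses the
standard library's html.parser rather than BeautifulSoup, so that it runs
without extra installs. The class name, the helper function, the sample
markup and the example.com URLs are all made up for the example; a real
news page would need site-specific filtering on top of this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HeadlineParser(HTMLParser):
    """Collect (title, url) pairs from every <a href=...> element.

    Hypothetical helper for illustration; not taken from the thread.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (title, url) pairs
        self._href = None    # absolute href of the <a> currently open
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            if title:
                self.links.append((title, self._href))
            self._href = None


def extract_headlines(html, base_url):
    parser = HeadlineParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Made-up markup standing in for a "reliably structured" page.
SAMPLE = """
<div class="story"><a href="/1/hi/world/1.stm">First  headline</a></div>
<div class="story"><a href="http://example.org/2.stm">Second <b>headline</b></a></div>
<a name="top"></a>
"""

headlines = extract_headlines(SAMPLE, "http://news.example.com/")
for title, url in headlines:
    print(title, url)
```

The resulting list of pairs can be fed straight into a Tk listbox, one
title per row, with the URLs kept in a parallel list.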
This is the problem that RSS was designed to solve. Many news sites will
supply exactly the information you want as an RSS feed. You should then
use U
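
To show why a feed is easier to work with than scraped HTML, here is a
small sketch that reads titles and links from an RSS 2.0 document using
only the standard library's xml.etree.ElementTree. This is not necessarily
the tool the poster goes on to recommend; the function name and the sample
feed are invented for the example, and real feeds may use other formats
(Atom, RSS 1.0 with namespaces) that this does not handle.

```python
import xml.etree.ElementTree as ET

# Made-up feed content; a real one would be downloaded from the site.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://news.example.com/</link>
    <item>
      <title>First headline</title>
      <link>http://news.example.com/1.stm</link>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://news.example.com/2.stm</link>
    </item>
  </channel>
</rss>
"""


def parse_rss_items(xml_text):
    """Return (title, link) pairs for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        # Skip items that lack either piece of information.
        if title and link:
            items.append((title, link))
    return items


rss_items = parse_rss_items(SAMPLE_RSS)
for title, link in rss_items:
    print(title, link)
```

For a live feed, the text could be fetched with urllib.request.urlopen and
passed to the same function.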
Zhang Le wrote:
Thanks for the hint. The xml-rpc service is great, but I want some
general techniques to parse news information in the usual html pages.
Currently I'm looking at a script-based approach found at:
http://www.namo.com/products/handstory/manual/hsceditor/
User can write some simple template to extrac
Steve Holden wrote:
[...]
However, the code to extract the news is pretty simple. Here's the whole
program, modulo newsreader wrapping. It would be shorter if I weren't
stashing the extracted links in a relational database:
[...]
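
Steve's program is elided in the quote above and is not reproduced here.
Purely as a separate illustration of "stashing the extracted links in a
relational database", the following sketch stores (title, url) pairs in
SQLite via the standard sqlite3 module. The table name, column names and
function name are assumptions made for the example, not details from his
code.

```python
import sqlite3


def store_links(conn, links):
    """Insert (title, url) pairs, ignoring URLs that are already stored.

    Returns the total number of rows in the table afterwards.
    """
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS news ("
            " url TEXT PRIMARY KEY,"
            " title TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR IGNORE INTO news (url, title) VALUES (?, ?)",
            [(url, title) for title, url in links],
        )
    return conn.execute("SELECT COUNT(*) FROM news").fetchone()[0]


# In-memory database so the example leaves no file behind.
conn = sqlite3.connect(":memory:")
count_first = store_links(conn, [
    ("First headline", "http://news.example.com/1.stm"),
    ("Second headline", "http://news.example.com/2.stm"),
])
# The second run repeats one URL; the primary key keeps it from duplicating.
count_second = store_links(conn, [
    ("Second headline", "http://news.example.com/2.stm"),
    ("Third headline", "http://news.example.com/3.stm"),
])
print(count_first, count_second)
```

Keying the table on the URL means the extraction script can be re-run
against the same page without piling up duplicate rows.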
I see that, as is so often the case, I only told half the story,
Zhang Le wrote:
Hello,
I'm writing a little Tkinter application to retrieve news from
various news websites such as http://news.bbc.co.uk/, and display them
in a Tk listbox. All I want is the title and URL of each news item. Since
each news site has a different layout, I think I need some
template-base