Hello jk,

> For a project, I need to develop a corpus of online news stories. I'm
> looking for an application that, given the url of a web page, "copies"
> the rendered text of the web page (not the source HTML text), opens a
> text editor (Notepad), and displays the copied text for the user to
> examine and save into a text file. Graphics and sidebars to be
> ignored. The examples I have come across are much too complex for me
> to customize for this simple job. Can anyone lead me to the right
> direction?

Going simple :)

from os import system
from sys import argv

OUTFILE = "geturl.txt"

# Let lynx render the page and dump the plain text into OUTFILE.
# The URL is quoted so that characters like "&" survive the shell.
system('lynx -dump "%s" > %s' % (argv[1], OUTFILE))
# Open the result in Notepad (uses the Windows "start" command).
system("start notepad %s" % OUTFILE)

(You can find lynx at http://lynx.browser.org/)

Note that removing sidebars is a very difficult problem. Search for
"wrapper induction" to see some work on the subject.

HTH,
--
Miki <[EMAIL PROTECTED]>
http://pythonwise.blogspot.com
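
P.S. If installing lynx is not an option, something along these lines might get you most of the way using only the standard library. This is a rough sketch, not a replacement for lynx: it swaps the lynx renderer for Python's html.parser, and the names TextExtractor, html_to_text, fetch_text and save_and_open are made up for this example. It only drops script/style content and collapses whitespace; it makes no attempt to remove sidebars.

```python
import subprocess
from html.parser import HTMLParser
from urllib.request import urlopen

OUTFILE = "geturl.txt"

# Elements whose contents are never displayed as page text.
SKIP_TAGS = {"script", "style"}
# Elements that should start a new line in the extracted text.
BLOCK_TAGS = {"p", "div", "br", "li", "tr",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring script and style elements."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw = "".join(parser.parts)
    # Collapse runs of whitespace and drop empty lines.
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def fetch_text(url):
    """Download a page and return its visible text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return html_to_text(html)


def save_and_open(url, outfile=OUTFILE):
    """Write the page text to outfile and open it in Notepad (Windows)."""
    with open(outfile, "w", encoding="utf-8") as f:
        f.write(fetch_text(url))
    subprocess.Popen(["notepad", outfile])
```

To use it the same way as the script above, call save_and_open(sys.argv[1]) at the bottom of the file (with "import sys" at the top). Expect the output to be cruder than what lynx -dump gives you, since no layout is done at all.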