Frederic,

Good points... I have a plain text file containing the html and words (keywords) that I want removed from the html file. After processing the html file, the program would save the result as a plain text file.
So the program would import the keywords, remove them from the html file, and save the result as something.txt. I would post the data but it's secret. I can post an example:

index.html (html page)
" <div><p><em>"Python has been an important part of Google since the beginning, and remains so as the system grows and evolves. "</em></p> <p>-- Peter Norvig, <a class="reference" "

replace.txt (keywords)
" <div id="quote" class="homepage-box"> <div><p><em>" "</em></p> <p>-- Peter Norvig, <a class="reference" "

something.txt (file after editing)
" Python has been an important part of Google since the beginning, and remains so as the system grows and evolves. "

Larry, I've looked into using BeautifulSoup but came to the conclusion that my idea would work better in the end.

Thanks for the help.

Anthra Norell wrote:
> DH,
> Could you be more specific describing what you have and what you want?
> You are addressing people, many of whom are good at
> stripping useless junk once you tell them what 'useless junk' is.
> Also it helps to post some of your data that you need to process and a
> sample of the same data as it should look once it is
> processed.
>
> Frederic
>
> ----- Original Message -----
> From: "DH" <[EMAIL PROTECTED]>
> Newsgroups: comp.lang.python
> To: <python-list@python.org>
> Sent: Thursday, August 24, 2006 2:11 AM
> Subject: Taking data from a text file to parse html page
>
>
> > Hi,
> >
> > I'm trying to strip the html and other useless junk from a html page..
> > I'd like to create something like an automated text editor, where it
> > takes the keywords from a txt file and removes them from the html page
> > (replace the words in the html page with blank space) I'm new to python
> > and could use a little push in the right direction, any ideas on how to
> > implement this?
> >
> > Thanks!
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list