On 11/11/10 09:07, chad wrote:
> Let's say that I have an article. What I want to do is read in this file and have the program skip over every instance of the words "the", "and", "or", and "but". What would be the general strategy for attacking a problem like this?
I'd keep a file of "stop words" and read them into a set, normalizing case in the process. Then, as I skim over each word in the target file, I'd check whether the case-normalized version of the word is in the stop-word set and skip it if it is. It might look something like this:
    def normalize_word(s):
        # Strip surrounding whitespace (including the newline) and fold case.
        return s.strip().upper()

    # stop_words.txt holds one stop word per line.
    stop_words = set(
        normalize_word(word)
        for word in open('stop_words.txt')
    )

    for line in open('data.txt'):
        for word in line.split():
            if normalize_word(word) in stop_words:
                continue
            process(word)  # placeholder for whatever you do with the kept words

-tkc
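
P.S. One thing the snippet above does not handle is punctuation: split() leaves "the," and "but." with their punctuation attached, so they would slip past the stop-word check. Here is a rough, self-contained sketch of one way to deal with that, assuming it is acceptable to strip leading and trailing punctuation before comparing. The helper names load_stop_words and filter_words are my own inventions for illustration, and in-memory lists stand in for the two files so it runs as-is.

```python
import string

def normalize_word(s):
    # Strip surrounding whitespace, then surrounding punctuation, then fold case.
    return s.strip().strip(string.punctuation).upper()

def load_stop_words(lines):
    # Hypothetical helper: accepts any iterable of lines (an open file works too).
    return set(normalize_word(line) for line in lines if line.strip())

def filter_words(lines, stop_words):
    # Hypothetical helper: yield every word that is not a stop word,
    # in its original form (punctuation and case untouched).
    for line in lines:
        for word in line.split():
            if normalize_word(word) in stop_words:
                continue
            yield word

# Lists stand in for stop_words.txt and data.txt.
stop_words = load_stop_words(["the", "and", "or", "but"])
text = ["The cat and the dog,", "but not the bird."]
kept = list(filter_words(text, stop_words))
```

To run it against real files, pass open('stop_words.txt') and open('data.txt') in place of the lists. Only the comparison uses the normalized form, so the words you get back keep their original spelling and punctuation.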