On 11/11/10 09:07, chad wrote:
Let's say that I have an article. What I want to do is read in
this file and have the program skip over every instance of the
words "the", "and", "or", and "but". What would be the
general strategy for attacking a problem like this?

I'd keep a file of "stop words" and read them into a set, normalizing case in the process. Then, as I skim over each word in the target file, I'd check whether the case-normalized version of the word is in the stop-word set and skip it if so. It might look something like this:

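  def normalize_word(s):
    # Strip surrounding whitespace and fold case so "The" matches "the".
    return s.strip().upper()

  # stop_words.txt holds one stop word per line.
  with open('stop_words.txt') as f:
    stop_words = set(normalize_word(word) for word in f)

  with open('data.txt') as f:
    for line in f:
      for word in line.split():
        if normalize_word(word) in stop_words: continue
        process(word)  # placeholder for whatever you do with kept words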
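
The same idea can be tried without any files on disk. The following is only a sketch: the helper names load_stop_words and kept_words are made up for illustration, in-memory lists stand in for the two files, and stripping punctuation is an added assumption (so that "the," still counts as "the"), not something the approach above requires.

```python
import string

def normalize_word(s):
    # Drop surrounding whitespace and punctuation, then fold case.
    # Punctuation stripping is an assumption added for this sketch.
    return s.strip().strip(string.punctuation).upper()

def load_stop_words(lines):
    # Accepts any iterable of lines (an open file works); skips blank lines.
    return {normalize_word(line) for line in lines if line.strip()}

def kept_words(lines, stop_words):
    # Yield each word of the text whose normalized form is not a stop word.
    for line in lines:
        for word in line.split():
            if normalize_word(word) not in stop_words:
                yield word

stop_words = load_stop_words(["the", "and", "or", "but"])
text = ["The cat and the dog, but not the bird."]
print(list(kept_words(text, stop_words)))
# prints ['cat', 'dog,', 'not', 'bird.']
```

The kept words come back exactly as they appeared in the text; only the membership test uses the normalized form. Looking a word up in a set takes roughly constant time, so the stop list can grow without slowing the loop much.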

-tkc



--
http://mail.python.org/mailman/listinfo/python-list