I second the idea of just using the islower(), isupper(), and istitle() methods. You could write a function, let's call it checkCase(), that returns the tag you want as a string.
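The three methods overlap, so the order of the checks matters. The quick check below shows how each method treats a few sample words. The helper name case_flags and the sample words are made up for illustration. A one-letter word like 'I' is both upper case and title case, so it gets whichever tag is tested first. Words like 'McDonald' or '42' pass none of the three tests, so a checkCase() without a final else branch would raise UnboundLocalError on them.

```python
def case_flags(word):
    """Return (islower, isupper, istitle) for word (hypothetical helper)."""
    return (word.islower(), word.isupper(), word.istitle())

# Illustrative sample words, not taken from any real corpus.
samples = ['hello', 'HELLO', 'Hello', 'I', 'McDonald', '42']

for word in samples:
    # Each row shows the word and its (islower, isupper, istitle) flags.
    print('%-10s %s' % (word, case_flags(word)))
```

With the checks in the order islower(), isupper(), istitle(), the word 'I' is tagged allcaps rather than cap.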
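    def checkCase(word):
        if word.islower():
            tag = 'nocap'
        elif word.isupper():
            tag = 'allcaps'
        elif word.istitle():
            tag = 'cap'
        else:
            # Added fallback ('other' is not one of the original tags):
            # mixed-case words like 'McDonald' and caseless ones like '42'
            # match none of the tests above and would leave tag unset.
            tag = 'other'
        return tag

Then take an input file and pass every word through the function:

    # 'path:to:file' is a placeholder; substitute your own input and
    # output paths (reusing the same path overwrites the input file).
    with open('path:to:file', 'r') as f:
        corpus_text = f.read()

    # Collect "word/tag" pieces and join once, instead of growing a string.
    tagged_words = []
    for w in corpus_text.split():
        tagtext = checkCase(w)
        tagged_words.append(w + '/' + tagtext)
    tagged_corpus = ' '.join(tagged_words)

    # The with block closes the output file, flushing what was written.
    with open('path:to:file', 'w') as output_file:
        output_file.write(tagged_corpus)
    print('All Done!')

Also, if you're doing natural language processing in Python, you should get NLTK.

-- 
http://mail.python.org/mailman/listinfo/python-list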