brad <[EMAIL PROTECTED]> writes:
> Crazy question, but has anyone attempted this or seen Python code that
> does? For example, if a text file contained 'Guido' and or 'Robert'
> and or 'Susan', then we should return True, otherwise return False.
A few ideas:

1. If you don't have a list of names, find a list of words that doesn't contain proper nouns (there are a few word lists out there, not sure if any exclude people's names, though). Look for short runs of two or three "words" (punctuation-separated tokens) in the email that aren't in the dictionary. Some of them will be people's names.

2. Send the text through Google translate and look for runs of words that are unchanged. Some of them will be people's names.

3. Search the literature and look for fancy algorithms. Here are some papers (the last mentions some commercial software to do this):

   http://citeseer.ist.psu.edu/bikel99algorithm.html
   http://citeseer.ist.psu.edu/618945.html
   http://arxiv.org/html/cmp-lg/9706017

John
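For what it's worth, idea 1 could be sketched roughly like this. It is only an illustration under some assumptions: COMMON_WORDS is a tiny hypothetical stand-in for a real word list without proper nouns, the function names are made up for this example, and treating anything other than letters, apostrophes and whitespace as a run boundary is my own simplification. Expect false positives (typos, jargon, foreign words) and misses (names that are also ordinary words).

```python
import re

# Hypothetical stand-in for a real word list with proper nouns removed.
# All entries are lowercase.
COMMON_WORDS = {
    "a", "and", "at", "for", "from", "has", "in", "is", "lunch", "met",
    "the", "to", "today", "with",
}


def candidate_name_runs(text, known_words, min_len=2, max_len=3):
    """Return runs of min_len..max_len consecutive words not in known_words.

    Punctuation ends a run. Runs longer than max_len are dropped, on the
    guess that they are more likely jargon or foreign text than a name.
    """
    runs = []
    # Split on anything that is not a letter, apostrophe or whitespace,
    # so a run never crosses a comma, full stop, digit, etc.
    for segment in re.split(r"[^A-Za-z'\s]+", text):
        current = []
        for word in segment.split():
            token = word.strip("'")  # drop surrounding quote marks
            if not token:
                continue
            if token.lower() in known_words:
                # A dictionary word closes the current run.
                if min_len <= len(current) <= max_len:
                    runs.append(" ".join(current))
                current = []
            else:
                current.append(token)
        # The end of the segment also closes the current run.
        if min_len <= len(current) <= max_len:
            runs.append(" ".join(current))
    return runs


def contains_possible_name(text, known_words, min_len=2, max_len=3):
    """True if the text has at least one candidate run, else False."""
    return bool(candidate_name_runs(text, known_words, min_len, max_len))


if __name__ == "__main__":
    sample = "Guido van Rossum met Susan for lunch today."
    print(candidate_name_runs(sample, COMMON_WORDS))
    print(candidate_name_runs(sample, COMMON_WORDS, min_len=1))
    print(contains_possible_name("the lunch is at the lunch", COMMON_WORDS))
```

Passing min_len=1 also picks up bare first names such as 'Susan', at the cost of flagging every single unknown word. To run it on a file, read the file's text with open(...).read() and pass the result in as text.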