In <[EMAIL PROTECTED]>, brad wrote:

> I am developing a list of 3 character strings like this:
> 
> and
> bra
> cam
> dom
> emi
> mar
> smi
> ...
> 
> The goal of the list is to have enough strings to identify files that 
> may contain the names of people. Missing a name in a file is unacceptable.

Then simply return `True` for any file that contains at least two or three
ASCII letters in a row.  Easily written as a short re.  ;-)
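
A minimal sketch of that check, assuming the file content has already been
read into a string.  The names `LETTER_RUN` and `may_contain_name` are made
up for illustration:

```python
import re

# Three ASCII letters in a row (hypothetical constant name).
LETTER_RUN = re.compile(r"[A-Za-z]{3}")


def may_contain_name(text):
    """Return True if text has at least three consecutive ASCII letters."""
    return LETTER_RUN.search(text) is not None
```

This flags practically every text file, which is the point: without
restrictions on what counts as a name, nothing narrower is safe.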

> I may end up with a thousand or so of these 3 character strings. Is that 
> too much for an re.compile to handle? Also, is this a bad way to 
> approach this problem? Any ideas for improvement are welcome!

Unless you can come up with some restrictions to the names, just follow
the advice above or give up.  I saw a documentary about someone with the
name "Scary Guy" in his ID papers recently.  What about names with letters
not in the ASCII range?
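
For what it's worth, here is a rough sketch of both variants, assuming
Python 3 `str` patterns.  The first joins the quoted three-letter strings
into one alternation; as far as I know a thousand or so short literals is
not a problem for `re.compile`.  The second approximates "three letters in
a row" beyond ASCII with `[^\W\d_]`, which is only roughly "any letter".
All names below are hypothetical:

```python
import re

# Sample strings taken from the quoted list; the real list would be longer.
trigrams = ["and", "bra", "cam", "dom", "emi", "mar", "smi"]

# One alternation of escaped literals, matched case-insensitively.
TRIGRAM_RE = re.compile("|".join(map(re.escape, trigrams)), re.IGNORECASE)

# Word characters minus digits and underscore: an approximation of
# "Unicode letter", not an exact definition.
UNICODE_LETTER_RUN = re.compile(r"[^\W\d_]{3}")


def matches_trigram(text):
    """Return True if any of the listed strings occurs in text."""
    return TRIGRAM_RE.search(text) is not None


def has_letter_run(text):
    """Return True if text has three consecutive (roughly) Unicode letters."""
    return UNICODE_LETTER_RUN.search(text) is not None
```

The alternation still misses every name whose letters are not on the list,
so it does not meet the "missing a name is unacceptable" requirement.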

Ciao,
        Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list
