Re: Python's re module and genealogy problem

BrJohan Fri, 13 Jun 2014 08:22:32 -0700

On 11/06/2014 14:23, BrJohan wrote:

For some genealogical purposes I consider using Python's re module.


Rather many names can be spelled in a number of similar ways, and in
order to match names even if they are spelled differently, I will build
regular expressions, each of which is supposed to match a number of
similar names.

I guess that there will be a few hundred such regular expressions
covering most popular names.

Now, my problem: Is there a way to decide whether any two - or more - of
those regular expressions will match the same string?

Or, stated a little differently:

Can it be decided, for a pair of regular expressions, whether at least
one string matching both of them can be constructed?

If it is possible to make such a decision, then how? Anyone aware of an
algorithm for this?


Thank you all for valuable input and interesting thoughts.

After having reconsidered my problem, it might be better to approach it a little differently.


Either to state the regexps simply like:
"(Kristina)|(Christina)|(Cristine)|(Kristine)"
instead of "((K|(Ch))ristina)|([CK]ristine)"

Or to put the name variants in a sequence of sets, each having elements like:
("Kristina", "Christina", "Cristine", "Kristine")
Matching is then just applying the 'in' operator.
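
For comparison, here is a minimal sketch of both options side by side. The helper names (matches_regexp, matches_set) and the use of frozenset are my own illustrative choices, not anything prescribed above. It uses re.fullmatch so that a longer string such as "Kristinas" is not accepted just because it starts with a listed variant.

```python
import re

# The two spellings of the regexp quoted above.
FLAT = re.compile(r"(Kristina)|(Christina)|(Cristine)|(Kristine)")
COMPACT = re.compile(r"((K|(Ch))ristina)|([CK]ristine)")

# The same variants held as a set (frozenset is an assumption; a plain
# set or tuple works with 'in' as well).
KRISTINA = frozenset({"Kristina", "Christina", "Cristine", "Kristine"})


def matches_regexp(pattern, name):
    """Return True only if the whole name matches the pattern."""
    return pattern.fullmatch(name) is not None


def matches_set(variants, name):
    """Return True if the name is one of the listed variants."""
    return name in variants


for candidate in ("Christina", "Kristinas", "Christine"):
    print(candidate,
          matches_regexp(FLAT, candidate),
          matches_regexp(COMPACT, candidate),
          matches_set(KRISTINA, candidate))
```

All three approaches should agree on every candidate; the set version just makes the accepted spellings visible at a glance.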

I see two distinct advantages.
1. Readability and maintainability

2. If each name variant occurs in just one regexp or set, there is no risk of erroneous matching.
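
With explicit sets, the second point can be checked mechanically. The following is only a sketch: the function name find_shared_variants and the "Christian" group are invented for illustration, with "Cristine" placed in both groups on purpose to trigger a report.

```python
from collections import defaultdict


def find_shared_variants(groups):
    """Return {variant: [group labels]} for variants listed in more than one group."""
    owners = defaultdict(list)
    for label, variants in groups.items():
        for variant in variants:
            owners[variant].append(label)
    return {v: labels for v, labels in owners.items() if len(labels) > 1}


# Hypothetical data: the second group is made up, and "Cristine" is
# deliberately duplicated so the check has something to find.
groups = {
    "Kristina": {"Kristina", "Christina", "Cristine", "Kristine"},
    "Christian": {"Christian", "Kristian", "Cristine"},
}

print(find_shared_variants(groups))
# prints {'Cristine': ['Kristina', 'Christian']}
```

An empty result would mean that no variant can be matched by two different groups.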


Comments?


--
https://mail.python.org/mailman/listinfo/python-list
