Kay Schluehr wrote:
> I have a list of strings ls = [s_1, s_2, ..., s_n] and want to create a
> regular expression sx from it, such that sx.match(s) yields an SRE_Match
> object when s starts with an s_i for some i in [0, ..., n]. There might
> be relations between those strings: s_k.startswith(s_1) -> True or
> s_k.endswith(s_1) -> True. An extreme case would be ls = ['a', 'aa',
> ..., 'aaaa...ab']. For this reason SRE_Match should provide the longest
> possible match.
In a very similar case I used a simple tree of dictionaries, one node per letter, to represent the strings. This naturally collapses cases like ['a', 'aa', 'aaa']. A recursive function then walks the tree to find the desired longest prefix. This was far faster than the "re" module for large n (the tradeoff point for me was around n ~ 1000). It requires a bit more coding, but I think it is the natural data structure for this problem. As others have suggested, you should first try the most naive implementation before turning this into a hard optimization problem.

> Is there a Python module able to create an optimized regex rx from ls
> for the given constraints?
>
> Regards,
> Kay

-- 
http://mail.python.org/mailman/listinfo/python-list
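For illustration, a minimal sketch of the tree-of-dictionaries (trie) idea described above; the function names and the END sentinel are my own choices, not from the original post, and an iterative walk is used in place of the recursive function mentioned there:

```python
END = None  # sentinel key marking "a stored string ends here"

def build_tree(strings):
    # One dict node per letter; shared prefixes share nodes,
    # which collapses cases like ['a', 'aa', 'aaa'].
    root = {}
    for s in strings:
        node = root
        for ch in s:
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def longest_prefix(tree, s):
    """Return the longest stored string that is a prefix of s, or None."""
    node = tree
    best = -1  # length of the longest match seen so far
    for i, ch in enumerate(s):
        if END in node:
            best = i
        if ch not in node:
            break
        node = node[ch]
    else:
        # Consumed all of s; s itself may be a stored string.
        if END in node:
            best = len(s)
    return s[:best] if best >= 0 else None

tree = build_tree(['a', 'aa', 'aaab'])
print(longest_prefix(tree, 'aaac'))  # 'aa'
```

Lookup is O(len(s)) regardless of how many strings are stored, which is where the speedup over a large alternation regex comes from.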