David Hirschfield <[EMAIL PROTECTED]> writes: >Here's the problem: Given a list of item names like:
>apple1 >apple2 >apple3_SD >formA >formB >formC >kla_MM >kla_MB >kca_MM >which is a subset of a much larger list of items, >is there an efficient algorithm to create condensed forms that match >those items, and only those items? Such as: >apple[12] >apple3_SD >form[ABC] >kla_M[MB] >kca_MM >The condensed expression syntax only has [...] and * as operators. [...] >matches a set of individual characters, * matches any string. >I'd be satisfied with a solution that only uses the [...] syntax, since >I don't think it's possible to use * without potentially matching items >not explicitly in the given list. >I'm not sure what this condensed expression syntax is called (looks a >bit like shell name expansion syntax), and I'm not even sure there is an >efficient way to do what I'm asking. Any ideas would be appreciated. If you used regular expressions instead of globs then I think what you want is to optimize the regular expression for: 'apple1|apple2|apple3_SD|formA|formB|formC|kla_MM|kla_MB|kca_MM' then spit the optimised finite automaton back out. Of course if you're just searching Python might do a decent job of optimising it anyway. Looking at: http://en.wikipedia.org/wiki/Finite_state_machine#Optimization they make it sound so easy!. There's also a whole bunch of tools on that page. Maybe there's an optimiser you can use in one of them. Failing that I guess you build a tree and try to merge nodes where they fork. I suspect an optimal solution would be hard but if you knew your input did have lots of redundancy a simple algorithm might work. Or I could be talking crap as usual. Eddie -- http://mail.python.org/mailman/listinfo/python-list