First of all thank you very much for the reply. I hope I'm not too verbose here, just hope that if someone else runs into a similar problem they can find an answer here.
> This appears to be a Computer Science 101 Data Structures and > Algorithms question, not a Python question, but here's an answer > anyway Heh, it's been awhile since I took data structures 101 but I asked here after reading about python generators, and although I can't quite get my head around them, I have a sense that they could help me check all mutations (is this the right word?) of the candidate strings. Thanks for the ideas on finding matches, I will have to try the bit filter out to see if its faster. The trie works similar to this in that it will return a 3 if the substring is not in the dictionary so that you don't have to check any superset of the mutation. Although to clarify, the reason I posted was mostly for help on my weak wildcard algorithm. > If "aa" is in your dictionary, what queries would retrieve it? Any query that contains either "a" and "a" or "a" and "?" or "?" and "?". So some examples would be: "aa", "a?", "aba", "baa", "bbbaba", "b?b?", "bab?", "?abbbbb" etc... > Searching with wild cards: your example of query == "??" seems to yield > all two-letter words. I'd like to see what you expect for "a?", "?a", > "ab?", and "aa?" before suggesting how to tackle wild cards. > Reverse-engineering requirements out of other folks' code is not > something I do for fun :-) My apologies, in my haste my example of what query == "??" yields was unclear. Every list of candidate letters should yield every unique (if possible) mutation, of length 1 to len(candidate). Heh, exactly what each of those strings would yield is pretty much the root of my problem: ?? yields (the order of the results dont matter): a, b, c, d, e, f, ..., x, y, z, aa, ab, ac, ..., ax, ay, az, ba, bb, bc, ..., ya, yb, yc, ..., yx, yy, yz, za, zb, zc, ..., zx, zy, zz a? yields a, b, c, d, ..., x, y, z, aa, ab, ac, ..., ax, ay, az, ba, ca, da, ..., xa, ya, za ?a yields same as a? ab? yields a, b, c, d, ..., x, y, z, aa, ab, ac, ..., ax, ay, az, ba, bb, bc, ..., bx, by, bz, ca, cb, da, db, ea, eb, ..., xa, xb, ya, yb, za, zb, aba, abb, abc, abd, ..., abx, aby, abz, baa, bab, bac, ..., bax, bay, baz, aab, acb, adb, ..., axb, ayb, azb, bba, bca, bda, ... bxa, bya, bza, cba, dba, eba, xba, yba, zba HTH, basically every possible string out of the given pool of up to 10 letters. Ok writing these out just gave me an idea on how I could try and write this another way. Ok done. Wow, this program actually ran much much slower than my original 135 seconds vs 9 seconds!! I'm not sure if I made a huge mistake or if it's because in the original I can stop searching superset-candidates if the candidate is not a substring of a dictionary word (in the same way the mask would reject impossible mutations). This produces the correct mutations however (although not unique as I wrote out above): -- begin code -- !def getmutations(pool): ! ! for i, letter in enumerate(pool): ! pool2 = pool[:] ! pool2.pop(i) ! ! for mutation in getmutations(pool2): ! if letter != '?': ! yield letter + mutation ! else: ! for c in alpha: ! yield c + mutation ! ! yield "" !letters = list('abae?s?') !s = time.time() !output = open('words.log','w') !for candidate in getmutations(letters): ! # Check if word is valid ! result = mytrie.query(candidate) ! ! # Found a word, add it to the database ! if result == 1: ! words[candidate] = 1 ! !print time.time() - s --- end code --- -- http://mail.python.org/mailman/listinfo/python-list