On Oct 23, 4:12 pm, "[EMAIL PROTECTED]" <[EMAIL PROTECTED]> wrote: > On Oct 23, 3:21 am, [EMAIL PROTECTED] wrote: > > > > > > > This was from a random website I found on practising good programming > > techniques and I thought I'd see what ways people could find to write > > out this example. Below are my two examples (which should work...). > > > I am merely interested in other techniques people have (without > > resorting to overusage of C modules and such like), just python and > > good computer science techniques. > > > For the wordlist go to this link <a href="http://codekata.pragprog.com/ > > 2007/01/kata_six_anagra.html">Kata Anagrams</a> > > Anyway here are my two examples. > > > This one uses a dictionary to store prime values of each letter in the > > alphabet and for each line multiple the results of the characters > > (which is unique for each anagram) and add them to a dictionary for > > printing latter. > > <pre> > > def anagram_finder(): > > primeAlpha = {'a':2, 'b':3, 'c':5, 'd':7,'e' : 11, 'f':13, 'g':17, > > 'h':19, \ > > 'i':23, 'j':29, 'k':31, 'l':37, 'm':41, 'n':43, 'o': > > 47, 'p':53, \ > > 'q':59, 'r':61, 's':67, 't':71, 'u':73, 'v':79, 'w': > > 83, 'x':89, \ > > 'y':97, 'z':101} > > dictAna = {} > > file = open ("wordlist.txt", "r") > > for line in file: > > value = 1 > > for s in line: > > if s.lower() in primeAlpha: > > value *= primeAlpha[s.lower()] > > dictAna.setdefault(value, []).append(line.strip()) > > file.close() > > print "\n".join(['Anagrams are: %s' % (v) for k, v in > > dictAna.items() if len(dictAna[k]) > 1]) > > </pre> > > > My second is a little bit simpler. Each dictionary key is an > > alphabetical ordering of the line in question, so that anagrams will > > appear the same. It will add to the dictionary the new word either in > > an existing key, or create a new one for it. > > > <pre> > > def anagram_finder(): > > dict = {} > > file = ('wordlist.txt', 'r') > > for line in file: > > strip_line = line.strip().lower() > > sort_line = str(sorted(strip_line)) > > dict.setdefault(sort_line, []).append(strip_line) > > file.close() > > print '\n'.join(['Anagrams are: %s' % (v) for k, v in dict.items() > > if len(dict[k]) > 1]) > > </pre> > > > Comparing them with timeit, they both took around 1 second or so, with > > the first being slightly faster. > > > What other ways do people think will work (and do mine actually work > > for other people!) > > ## I took a somewhat different approach. Instead of in a file, > ## I've got my word list (562456 words) in an MS-Access database. > ## And instead of calculating the signature on the fly, I did it > ## once and added the signature as a second field: > ## > ## TABLE CONS_alpha_only_signature_unique > ## -------------------------------------- > ## CONS text 75 > ## signature text 26 > ## > ## The signature is a 26 character string where each character is > ## the count of occurences of the matching letter. Luckily, in > ## only a single case was there more than 9 occurences of any > ## given letter, which turned not to be a word but a series of > ## words concatenated so I just deleted it from the database > ## (lots of crap in the original word list I used). > ## > ## Example: > ## > ## CONS signature > ## aah 20000001000000000000000000 # 'a' occurs twice & 'h' once > ## aahed 20011001000000000000000000 > ## aahing 20000011100001000000000000 > ## aahs 20000001000000000010000000 > ## aaii 20000000200000000000000000 > ## aaker 20001000001000000100000000 > ## aal 20000000000100000000000000 > ## aalborg 21000010000100100100000000 > ## aalesund > 20011000000101000010100000 > ## > ## Any words with identical signatures must be anagrams. > ## > ## Once this was been set up, I wrote a whole bunch of queries > ## to use this table. I use the normal Access drag and drop > ## design, but the SQL can be extracted from each, so I can > ## simply open the query from Python or I can grab the SQL > ## and build it inside the program. The query > ## > ## signatures_anagrams_select_signature > ## > ## is hard coded for criteria 9 & 10 and should be cast inside > ## Python so the criteria can be changed dynamically. > ## > ## > ## QUERY signatures_anagrams_longest > ## --------------------------------- > ## SELECT Len([CONS]) AS Expr1, > ## Count(Cons_alpha_only_signature_unique.CONS) AS > CountOfCONS, > ## Cons_alpha_only_signature_unique.signature > ## FROM Cons_alpha_only_signature_unique > ## GROUP BY Len([CONS]), > ## Cons_alpha_only_signature_unique.signature > ## HAVING (((Count(Cons_alpha_only_signature_unique.CONS))>1)) > ## ORDER BY Len([CONS]) DESC , > ## Count(Cons_alpha_only_signature_unique.CONS) DESC; > ## > ## This is why I don't use SQLite3, must have crosstab queries. > ## > ## QUERY signatures_anagram_summary > ## -------------------------------- > ## TRANSFORM Count(signatures_anagrams_longest.signature) AS > CountOfsignature > ## SELECT signatures_anagrams_longest.Expr1 AS [length of word] > ## FROM signatures_anagrams_longest > ## GROUP BY signatures_anagrams_longest.Expr1 > ## PIVOT signatures_anagrams_longest.CountOfCONS; > ## > ## > ## QUERY signatures_anagrams_select_signature > ## ------------------------------------------ > ## SELECT Len([CONS]) AS Expr1, > ## Count(Cons_alpha_only_signature_unique.CONS) AS > CountOfCONS, > ## Cons_alpha_only_signature_unique.signature > ## FROM Cons_alpha_only_signature_unique > ## GROUP BY Len([CONS]), > ## Cons_alpha_only_signature_unique.signature > ## HAVING (((Len([CONS]))=9) AND > ## ((Count(Cons_alpha_only_signature_unique.CONS))=10)) > ## ORDER BY Len([CONS]) DESC , > ## Count(Cons_alpha_only_signature_unique.CONS) DESC; > ## > ## QUERY signatures_lookup_by_anagram_select_signature > ## --------------------------------------------------- > ## SELECT signatures_anagrams_select_signature.Expr1, > ## signatures_anagrams_select_signature.CountOfCONS, > ## Cons_alpha_only_signature_unique.CONS, > ## Cons_alpha_only_signature_unique.signature > ## FROM signatures_anagrams_select_signature > ## INNER JOIN Cons_alpha_only_signature_unique > ## ON signatures_anagrams_select_signature.signature > ## = Cons_alpha_only_signature_unique.signature; > ## > ## > ## Now it's a simple matter to use the ODBC from Win32 to extract > ## the query output into Python. > > import dbi > import odbc > > con = odbc.odbc("words") > cursor = con.cursor() > > ## This first section grabs the anagram summary. Note that > ## queries act just like tables (as long as they don't have > ## internal dependencies). I read somewhere you can get the > ## field names, but here I put them in by hand. > > ##cursor.execute("SELECT * FROM signature_anagram_summary") > ## > ##results = cursor.fetchall() > ## > ##for i in results: > ## for j in i: > ## print '%4s' % (str(j)), > ## print > > ## (if this wraps, each line is 116 characters)
Sorry, s/b 96 characters, I was looking at the line number. Anyway, each line starts with a double hash, so it's easy to fix. > ## 2 3 4 5 6 7 8 9 10 11 12 13 > 14 15 16 17 18 23 > ## 2 259 None None None None None None None None None None None > None None None None None None > ## 3 487 348 218 150 102 None None None None None None None > None None None None None None > ## 4 1343 718 398 236 142 101 51 26 25 9 8 3 > 2 None None None None None > ## 5 3182 1424 777 419 274 163 106 83 53 23 20 10 > 6 4 5 1 3 1 > ## 6 5887 2314 1051 545 302 170 114 54 43 21 15 6 > 5 4 4 2 None None > ## 7 7321 2251 886 390 151 76 49 37 14 7 5 1 > 1 1 None None None None > ## 8 6993 1505 452 166 47 23 8 6 4 2 2 None > None None None None None None > ## 9 5127 830 197 47 17 6 None None 1 None None None > None None None None None None > ## 10 2975 328 66 8 2 None None None None None None None > None None None None None None > ## 11 1579 100 5 4 2 None None None None None None None > None None None None None None > ## 12 781 39 2 1 None None None None None None None None > None None None None None None > ## 13 326 11 2 None None None None None None None None None > None None None None None None > ## 14 166 2 None None None None None None None None None None > None None None None None None > ## 15 91 None 1 None None None None None None None None None > None None None None None None > ## 16 60 None None None None None None None None None None None > None None None None None None > ## 17 35 None None None None None None None None None None None > None None None None None None > ## 18 24 None None None None None None None None None None None > None None None None None None > ## 19 11 None None None None None None None None None None None > None None None None None None > ## 20 6 None None None None None None None None None None None > None None None None None None > ## 21 6 None None None None None None None None None None None > None None None None None None > ## 22 4 None None None None None None None None None None None > None None None None None None > > ## From the query we have the word size as row header and size of > ## anagram set as column header. The data value is the count of > ## how many different anagram sets match the row/column header. > ## > ## For example, there are 7321 different 7-letter signatures that > ## have 2 anagram sets. There is 1 5-letter signature having a > ## 23 member anagram set. > ## > ## We can then pick any of these, say the single 10 member anagram > ## set of 9-letter words, and query out out the anagrams: > > cursor.execute("SELECT * FROM > signatures_lookup_by_anagram_select_signature") > results = cursor.fetchall() > for i in results: > for j in i: > print j, > print > > ## 9 10 anoretics 10101000100001100111000000 > ## 9 10 atroscine 10101000100001100111000000 > ## 9 10 certosina 10101000100001100111000000 > ## 9 10 creations 10101000100001100111000000 > ## 9 10 narcotise 10101000100001100111000000 > ## 9 10 ostracine > > read more ยป- Hide quoted text - > > - Show quoted text -...
-- http://mail.python.org/mailman/listinfo/python-list