On Nov 14, 11:56 am, [EMAIL PROTECTED] wrote:
> Referred here from the tutor list.
>
> > I'm trying to write a program to test someone's typing speed and show
> > them their mistakes. However I'm getting weird results when looking
> > for the differences in longer (than 100 chars) strings:
>
> > import difflib
>
> > # a tape measure string (just makes it easier to locate a given index)
> > a = ('1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69'
> >      '-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139'
> >      '-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200')
>
> > # now with a few mistakes
> > b = ('1-3-5-7-'
> >      'l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
> >      '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
> >      '151-m55-159-163-167-a71-175j179-183-187-191-195--200')
>
> > s = difflib.SequenceMatcher(None, a, b)
> > ms = s.get_matching_blocks()
>
> > print ms
>
> > >>> [(0, 0, 8), (200, 200, 0)]
>
> > Have I made a mistake or is this function designed to give up when the
> > input strings get too long? If so what could I use instead to compute
> > the mistakes in a typed text?
>
> ---------- Forwarded message ----------
> From: Evert Rol
>
> Hi Tom,
>
> Ok, I wasn't on the list last year, but I was a few days ago, so
> persistence pays off; partly, as I don't have a full answer.
>
> I got curious and looked at the source of difflib. There's a method
> __chain_b() which sets up the b2j variable, which contains the
> occurrences of characters in string b.
> So cutting b to 199 characters, it looks like this:
> b2j= 19 {'a': [168], 'b': [122], 'm': [152], 'k': [86], 'v':
> [125], '-': [1, 3, 5, 7, 9, 12, 15, 18, 21, 24, 27, 30, 33, 36, 42,
> 45, 48, 51, 54, 57, 60, 63, 66, 69, 72, 75, 78, 81, 84, 87, 90, 93,
> 96, 99, 103, 107, 111, 115, 119, 123, 127, 131, 135, 139, 143, 147,
> 151, 155, 159, 163, 167, 171, 179, 183, 187, 191, 195, 196], 'l': [8,
> 98], 'o': [39], 'j': [175], '1': [0, 10, 13, 16, 20, 50, 80, 100,
> 104, 108, 109, 110, 112, 113, 116, 117, 120, 124, 128, 130, 132, 136,
> 140, 144, 148, 150, 156, 160, 164, 170, 172, 176, 180, 184, 188, 190,
> 192], '0': [29, 59, 89, 101, 105, 198], '3': [2, 28, 31, 32, 34, 37,
> 62, 92, 102, 129, 133, 137, 142, 162, 182], '2': [11, 19, 22, 25, 41,
> 71, 121, 197], '5': [4, 14, 44, 49, 52, 55, 74, 114, 134, 149, 153,
> 154, 157, 174, 194], '4': [23, 40, 43, 46, 53, 83, 141, 145], '7':
> [6, 26, 56, 70, 73, 76, 106, 126, 146, 166, 169, 173, 177, 186], '6':
> [35, 58, 61, 64, 65, 67, 95, 161, 165], '9': [38, 68, 88, 91, 94, 97,
> 118, 138, 158, 178, 189, 193], '8': [17, 47, 77, 79, 82, 85, 181,
> 185]}
>
> This little detour is because of how b2j is built. Here's a part from
> the comments of __chain_b():
>
> # Before the tricks described here, __chain_b was by far the most
> # time-consuming routine in the whole module! If anyone sees
> # Jim Roskind, thank him again for profile.py -- I never would
> # have guessed that.
>
> And the part of the actual code reads:
>     b = self.b
>     n = len(b)
>     self.b2j = b2j = {}
>     populardict = {}
>     for i, elt in enumerate(b):
>         if elt in b2j:
>             indices = b2j[elt]
>             if n >= 200 and len(indices) * 100 > n:  # <--- !!
>                 populardict[elt] = 1
>                 del indices[:]
>             else:
>                 indices.append(i)
>         else:
>             b2j[elt] = [i]
>
> So you're right: it has a stop at the (somewhat arbitrary) limit of
> 200 characters.
> How exactly that works, I don't know (it needs more
> delving into the code), though it looks like there also needs to be a
> lot of indices (len(indices) * 100 > n); I guess that's caused in your
> strings by the dashes, '1's and '0's (that's why I printed the b2j
> string).
> If you feel safe enough and are on a fast platform, you can probably up
> that limit (or even put it somewhere as an optional variable in the
> code, which I would think is generally better).
> Not sure who the author of the module is (it isn't listed in the file
> itself), but perhaps you can find out and email him/her, to see what
> can be altered.
>
> Hope that helps.
>
> Evert
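To make the check quoted above concrete, here is a minimal sketch, assuming Python 3.2 or later, where SequenceMatcher accepts an autojunk keyword that switches this popularity heuristic off. The helper name matched_chars and the popular list are only for illustration and are not part of difflib. The sketch lists the characters of b that the quoted test would flag for a 200-character string, then counts matched characters with and without the heuristic.

```python
import difflib
from collections import Counter

# The two strings from the original question (200 characters each).
a = ('1-3-5-7-9-12-15-18-21-24-27-30-33-36-39-42-45-48-51-54-57-60-63-66-69'
     '-72-75-78-81-84-87-90-93-96-99-103-107-111-115-119-123-127-131-135-139'
     '-143-147-151-155-159-163-167-171-175-179-183-187-191-195--200')
b = ('1-3-5-7-'
     'l-12-15-18-21-24-27-30-33-36-39o42-45-48-51-54-57-60-63-66-69-72-75-78'
     '-81-84-8k-90-93-96-9l-103-107-111-115-119-12b-1v7-131-135-139-143-147-'
     '151-m55-159-163-167-a71-175j179-183-187-191-195--200')

n = len(b)

# In the quoted loop the test runs before the current index is appended,
# so the k-th occurrence of a character trips it when (k - 1) * 100 > n.
popular = sorted(ch for ch, count in Counter(b).items()
                 if n >= 200 and (count - 1) * 100 > n)


def matched_chars(autojunk):
    """Total size of all matching blocks between a and b (hypothetical helper)."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=autojunk)
    return sum(block.size for block in matcher.get_matching_blocks())


matched_default = matched_chars(True)         # heuristic on (the default)
matched_no_heuristic = matched_chars(False)   # heuristic off

print('popular characters:', ''.join(popular))
print('matched with heuristic:   ', matched_default)
print('matched without heuristic:', matched_no_heuristic)
```

If the reading of the source above is right, every digit and the dash count as popular here, which would leave only the mistyped letters in b2j and explain why so little gets matched by default. Passing autojunk=False avoids patching the 200 limit in difflib itself.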
I would use the time module to time the user. You can then compare the
original string with the string the user typed using cmp, although that
only tells you whether the two differ, not where the mistakes are.

<code>
# untested (Python 2; on Python 3 use input() instead of raw_input())
import time

start = time.time()
print('some complicated long string')
# you should use a GUI toolkit's textbox rather than
# using a variable
user_string = raw_input('Please type the string above as quickly and accurately as you can:\n\n')
end = time.time()
print('amount of time to complete: %s seconds' % (end - start))
# do the comparison here
# which I am not sure how to do right now
</code>

See the following for ideas on comparing similar strings/iterables:

http://www.velocityreviews.com/forums/t345107-comparing-2-similar-strings.html

Mike
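For the comparison step left open above, one possible sketch, assuming Python 3.2 or later: difflib's get_opcodes() reports which spans were replaced, inserted or deleted, and autojunk=False keeps the popularity heuristic from hiding matches in longer texts. The function name find_mistakes is made up for this example.

```python
import difflib


def find_mistakes(expected, typed):
    """Return the opcodes that describe where typed differs from expected.

    Each item is (tag, i1, i2, j1, j2): expected[i1:i2] was turned into
    typed[j1:j2], and tag is 'replace', 'delete' or 'insert'.
    """
    # autojunk=False disables difflib's popularity heuristic.
    matcher = difflib.SequenceMatcher(None, expected, typed, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != 'equal']


expected = 'hello world'
typed = 'hellp world'
for tag, i1, i2, j1, j2 in find_mistakes(expected, typed):
    print('%s at %d: expected %r, got %r' % (tag, i1, expected[i1:i2], typed[j1:j2]))
```

The same function could be fed the user_string from the timing snippet. Note that SequenceMatcher looks for long matching runs rather than a minimal edit script, so the reported spans may not always line up with how a person would describe the typos.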