"John Machin" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> In that case, the problem can be solved in O(n) time by a simple loop > which counts the number of differences and notes the position of the > first (if any) difference. Any full-distance Levenshtein method that > does no pre-processing must take O(n**2) time. It is possible to take > advantage of the fact that the OP doesn't care about what the distance > is exactly if it is not 1; 0 is just as bad as 2, 3, 4, etc. > Here is a class that implements 4 different approaches to this comparison problem, with a configurable number of maximum mismatches (where a and b are the strings being compared): 1. use sum(map(lambda (x,y): x!=y, itertools.izip(a,b))) to count mismatches - no short-circuiting 2. use sum(map(abs, itertools.starmap(cmp, itertools.izip(a,b)))) to count mismatches - no short-circuiting 3. use explicit for loop to compare each tuple returned from itertools.izip(a,b) - short-circuits when number of mismatches exceeds allowable number 4. use for loop over itertools.takewhile to count mismatches - short-circuits when number of mismatches exceeds allowable number Also note the short-circuiting in the initializer - if the strings being compared are shorter in length then the number of allowed mismatches, than they will always match, no comparisons required. (In the OP's specific case, any two one-character strings will match within one character). Of course this does nothing to address the larger issue of the program design - I just wanted to learn a little more about itertools. :) -- Paul from itertools import izip,starmap,takewhile class offByNoMoreThanOneCharacter(object): def __init__(self,a,b,maxMismatches=1): len_a = len(a) self.val = None if len_a != len(b): self.val = False elif len_a <= maxMismatches: self.val = True else: self.a = a self.b = b self.maxMismatches = maxMismatches def eval1(self): # one-liner using map with lambda - sum counts mismatches self.val = sum(map(lambda (x,y): x!=y, izip(self.a,self.b))) <= self.maxMismatches def eval2(self): # one-liner using map with abs and cmp - sum counts mismatches self.val = sum(map(abs, starmap(cmp, izip(self.a,self.b)))) <= self.maxMismatches def eval3(self): # explicit traversal, with short-circuit when too many mismatches found mismatches = 0 for (ac,bc) in izip(self.a,self.b): if ac != bc: mismatches += 1 if mismatches > self.maxMismatches: self.val = False break else: self.val = True def eval4(self): # traversal using takewhile, also short-circuits when too many mismatches found numMismatches = 0 maxMismatches = self.maxMismatches stillOk = lambda (s,t)=(None,None) : numMismatches <= maxMismatches for (ac,bc) in takewhile(stillOk, izip(self.a,self.b)): numMismatches += (ac != bc) self.val = stillOk() # special instance method to return "boolean-ness" of this object def __nonzero__(self): if self.val is None: # change this line to use whichever eval method you want to test self.eval1() return self.val def test(a,b): print a,b print bool(offByNoMoreThanOneCharacter(a,b)) print test("abc","abd") test("abc","axd") test("yaqtil","yaqtel") test("yiqtol","yaqtel") -- http://mail.python.org/mailman/listinfo/python-list