Re: Pattern Matching Given # of Characters and no String Input; use RegularExpressions?

tiissa Mon, 18 Apr 2005 12:05:06 -0700

Synonymous wrote:

tiissa <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
tiissa wrote:
If you know the number of characters to match can't you just compare slices?
If you don't, you can still do it by hand:
In [7]: def cmp(s1,s2):
   ....:     diff_map=[chr(s1[i]!=s2[i]) for i in range(min(len(s1), len(s2)))]
   ....:     diff_index=''.join(diff_map).find(chr(True))
   ....:     if -1==diff_index:
   ....:         return min(len(s1), len(s2))
   ....:     else:
   ....:         return diff_index
   ....:
I will look at that, although if I have 300 images I don't want to type
all the comparisons (In [9]: cmp('ccc','cccap')) by hand; it would
just be easier to sort them then :).
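
For reference, the two ideas quoted above (comparing slices when the prefix size is known, and finding where two names start to differ when it is not) could be sketched roughly as follows. The names same_prefix and common_prefix_len are made up for this illustration and are not from the thread; the second function is meant to return the same value as the cmp helper quoted above, without shadowing a builtin name.

```python
def same_prefix(s1, s2, size):
    """Return True when the first `size` characters of both strings match."""
    return s1[:size] == s2[:size]


def common_prefix_len(s1, s2):
    """Return the length of the longest common prefix of s1 and s2."""
    count = 0
    # zip stops at the end of the shorter string, like min(len(s1), len(s2)).
    for c1, c2 in zip(s1, s2):
        if c1 != c2:
            break
        count += 1
    return count


print(same_prefix('cccat', 'cccap', 4))   # True
print(common_prefix_len('ccc', 'cccap'))  # 3
```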

I didn't mean you had to type it by hand. I was thinking of writing a small script (as opposed to using something from the standard tools). It might look like:

In [22]: def make_group(L):
   ....:     root,res='',[]
   ....:     for i in range(1,len(L)):
   ....:         if ''==root:
   ....:             root=L[i][:cmp(L[i-1],L[i])]
   ....:             if ''==root:
   ....:                 res.append((L[i-1],[L[i-1]]))
   ....:             else:
   ....:                 res.append((root,[L[i-1],L[i]]))
   ....:         elif len(root)==cmp(root,L[i]):
   ....:             res[-1][1].append(L[i])
   ....:         else:
   ....:             root=''
   ....:     if ''==root:
   ....:         res.append((L[-1],[L[-1]]))
   ....:     return res
   ....:

In [23]: L=['cccat','cccap','cccan','dddfa','dddfg','dddfz']

In [24]: L.sort()

In [25]: make_group(L)
Out[25]: [('ccca', ['cccan', 'cccap', 'cccat']), ('dddf', ['dddfa', 'dddfg', 'dddfz'])]

However, I guarantee no optimality in the number of classes (but, hey, that's what happens when you don't specify the size of the prefix). (Actually, I guarantee nothing at all ;p) In particular, a file can end up singled out in a group of its own:

In [26]: make_group(['cccan','cccap','cccat','cccb'])
Out[26]: [('ccca', ['cccan', 'cccap', 'cccat']), ('cccb', ['cccb'])]

It is a matter of choice: either you want to specify the size of the prefix by hand, in which case you'd rather look at itertools as pointed out by Kent, or you don't, and a variation on the above code might do the job.
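
For the fixed-size case, a minimal sketch with itertools.groupby might look like the following. This is only a guess at what such a script could be, not necessarily what Kent suggested; the function name group_by_prefix and its size parameter are invented for this example.

```python
from itertools import groupby


def group_by_prefix(names, size):
    """Group names by their first `size` characters.

    groupby only merges adjacent items, so the names are sorted first
    to bring equal prefixes next to each other.
    """
    def prefix(name):
        return name[:size]

    return [(key, list(group)) for key, group in groupby(sorted(names), key=prefix)]


L = ['cccat', 'cccap', 'cccan', 'dddfa', 'dddfg', 'dddfz']
print(group_by_prefix(L, 4))
# [('ccca', ['cccan', 'cccap', 'cccat']), ('dddf', ['dddfa', 'dddfg', 'dddfz'])]
```

With size=4 the list ['cccan','cccap','cccat','cccb'] still yields 'cccb' as a group of its own, whereas size=3 would put all four names under 'ccc'.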
