On Wed, 10 Aug 2005 16:51:55 +0200, Paolino <[EMAIL PROTECTED]> wrote:
>I have a self organizing net which aim is clustering words.
>Let's think the clustering is about their 2-grams set.
>Words then are instances of this class.
>
>class clusterable(str):
>  def __abs__(self):# the set of q-grams (to be calculated only once)
>    return set([(self+self[0])[n:n+2] for n in range(len(self))])
>  def __sub__(self,other): # the q-grams distance between 2 words
>    set1=abs(self)
>    set2=abs(other)
>    return len(set1|set2)-len(set1&set2)
>
>I'm looking for the medium of a set of words, as the word which
>minimizes the sum of the distances from those words.
>
>Aka:sum([medium-word for word in words])
>
>
>Thanks for ideas, Paolino
>
Just wondering if this is a desired result:

 >>> clusterable('banana')-clusterable('bananana')
 0

i.e., resulting from

 >>> abs(clusterable('banana'))-abs(clusterable('bananana'))
 set([])
 >>> abs(clusterable('banana'))
 set(['na', 'ab', 'ba', 'an'])
 >>> abs(clusterable('bananana'))
 set(['na', 'ab', 'ba', 'an'])

Regards,
Bengt Richter
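
For anyone who wants to reproduce the observation outside the class, here is a minimal sketch of the same 2-gram distance as plain functions, plus one possible reading of the "medium". The names qgrams, distance and medoid are hypothetical, and restricting the candidates to the given words themselves is an assumption, not something stated in the original question.

```python
def qgrams(word):
    """Set of cyclic 2-grams, mirroring the quoted __abs__ method."""
    # word[:1] wraps the first character around to the end, like self[0],
    # but it also tolerates an empty string.
    padded = word + word[:1]
    return {padded[n:n + 2] for n in range(len(word))}


def distance(a, b):
    """2-gram distance from the quoted __sub__: |A union B| - |A intersect B|."""
    set_a, set_b = qgrams(a), qgrams(b)
    return len(set_a | set_b) - len(set_a & set_b)


def medoid(words):
    """Word minimizing the summed distance to all words.

    Assumption: only the given words are considered as candidates.
    """
    return min(words, key=lambda cand: sum(distance(cand, w) for w in words))


if __name__ == "__main__":
    print(distance("banana", "bananana"))
    print(medoid(["banana", "bananas", "ananas"]))
```

Since two different words can be at distance 0, this is not a metric in the strict sense, so the minimizing word need not be unique; min() simply returns the first candidate that reaches the minimum.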