On 16.07.2014 13:49, Jose Amoreira wrote:
Hello
I wrote a function that, given a list of numbers, finds clusters of
values by proximity and returns a reduced list containing the centers of
these clusters. However, I find it rather unclear. I would appreciate
any comments on how pythonic my function is and suggestions to improve
its readability.
The function is:

def aglomerate(x_lst, delta=1.e-5):
     clusters = [] #list of pairs [center, number of clustered values]
     for x in x_lst:
         close_to = [abs(x - y) < delta for y,_ in clusters]
         if any(close_to):
             # x is close to a cluster
             index = close_to.index(True)
             center, n = clusters[index]
             #update the cluster center including the new value,
             #and increment dimension of cluster
             clusters[index] = (n * center + x)/(n+1), n+1

careful here: you just stored a tuple instead of a list; doesn't matter for your current implementation, but may bite you at some point.

         else:
             # x does not belong to any cluster, create a new one
             clusters.append([x,1])
     # return list with centers
     return [center for center, _ in clusters]


Your building of the close_to list with Trues and Falses, then using it to find the original element again makes me think that you're a regular R user ? Using such a logical vector (I guess that's what you'd call it in R) should (almost) never be required in Python. Possible solutions for your case depend somewhat on what you want to be able to do. Currently, you are identifying all possible clusters that a new value may belong to, but then you are simply adding it to the first cluster. If that's all your code needs to do, then you could do (this avoids indexing altogether by exploiting the fact that lists are mutable):

def aglomerate(x_lst, delta=1.e-5):
    clusters = [] #list of pairs [center, number of clustered values]
    for x in x_lst:
        for cluster in clusters:
            # cluster is now a list of two elements
            # lists are mutable objects
            center, n = cluster # to keep things readable
            if abs(x - center) < delta:
                # x is close to a cluster
                #update the cluster center including the new value,
                #and increment dimension of cluster
                # since list are mutable objects you can do this as an
                # in-place operation
                cluster[0] = (n * center + x)/(n+1)
                cluster[1] = n+1
                break
        else:
            # this block is executed if the break in the preceeding
            # block wasn't reached =>
            # x does not belong to any cluster, create a new one
            clusters.append([x,1])
    # return list with centers
    return [center for center, _ in clusters]

Given your return value on the other hand, it looks like your clusters data structure is somewhat suboptimal. In fact, you would be better off with two separate lists for centers and sizes, in which case you could simply
return centers.

Here's an implementation of this idea demonstrating how, in Python, you typically use the builtin enumerate function to avoid "logical vectors" or other kinds of index juggling:

def aglomerate(x_lst, delta=1.e-5):
    centers = []
    sizes = []
    for x in x_lst:
        for i, center in enumerate(centers):
            if abs(x - center) < delta:
                # x is close to a cluster
                #update the cluster center including the new value,
                #and increment dimension of cluster
                n = sizes[i]
                centers[i] = (n * center + x)/(n+1)
                sizes[i] = n+1
                break
        else:
            # this block is executed only when the break in the preceeding
            # block wasn't reached =>
            # x does not belong to any cluster, create a new one
            centers.append(x)
            sizes.append(1)
    # return list with centers
    return centers

In summary, indices in Python should be handled using enumerate and sometimes can be avoided completely by in-place manipulations of mutable objects.

Best,
Wolfgang
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor

Reply via email to