Gundala Viswanath wrote:

> I have the following list of lists that contains 6 entries:
>
> lol = [['a', 3, 1.01], ['x',5, 1.00],['k',7, 2.02],['p',8, 3.00],
>        ['b', 10, 1.09],
>        ['f', 12, 2.03]]
>
> each list in lol contain 3 elements:
>
> ['a', 3, 1.01]
>   e1  e2  e3
>
> The list above is already sorted according to the e2 (i.e, 2nd element)
>
> I'd like to 'cluster' the above list following roughly these steps:
>
> 1. Pick the lowest entry (wrt. e2) in lol as the key of first cluster
> 2. Assign that as first member of the cluster (dictionary of list)
> 3. Calculate the difference current e3 in next list with first member
>    of existing clusters.
> 3. If the difference is less than threshold, assign that list as the
>    member of the corresponding cluster Else, create new cluster with
>    current list as new key.
> 3. Repeat the rest until finish
>
> The final result will look like this, with threshold <= 0.1.
>
> dol = {'a':['a','x','b'], 'k':['k','f'], 'p':['p']}
>
> I'm stuck with this step what's the right way to do it:
>
> __BEGIN__
> import json
> from collections import defaultdict
>
> thres = 0.1
> tmp_e3 = 0
> tmp_e1 = "-"
>
> lol = [['a', 3, 1.01], ['x',5, 1.00],['k',7, 2.02],
>        ['p',8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]
>
> dol = defaultdict(list)
> for thelist in lol:
>     e1, e2, e3 = thelist
>
>     if tmp_e1 == "-":
>         tmp_e1 = e1
>     else:
>         diff = abs(tmp_e3 - e3)
>         if diff > thres:
>             tmp_e1 = e1
>
>     dol[tmp_e1].append(e1)
>     tmp_e1 = e1
>     tmp_e3 = e3
>
> print json.dumps(dol, indent=4)
> __END__

I won't provide the complete solution for a homework question, but
here's the idea that led me to working code:

Introduce a helper list (let's call it `pairs`) where you put (e1, e3)
tuples (called `inner_e1`, `inner_e3` below), one for the first member
of each cluster found so far.

Inside the loop over `lol`, loop over `pairs` -- and if you find an item
where inner_e3 is close enough to e3, you stop and use inner_e1 as the
key:

    dol[inner_e1].append(e1)

If no inner_e3 is close enough, you add another item to pairs and a new
key to dol:

    pairs.append((e1, e3))
    dol[e1] = [e1]

Hints: `for ... else` with a `break` thrown in fits the above
description of the inner loop nicely. The `else` branch runs only when
the loop finishes without hitting `break`, i.e. when no existing
cluster matched.

Note that this two-loop approach is not very efficient if the number of
clusters is large -- but first make it work, then -- maybe -- make it
fast.
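
For readers who want to see how the hints fit together, here is one possible reading of them. It is a sketch, not necessarily the code the author had in mind. The function name `cluster` and its `thres` parameter are made up for illustration, the comparison uses `<=` to match the "threshold <= 0.1" wording in the question, and the code is written for Python 3 (the question's `print` statement is Python 2).

```python
def cluster(lol, thres=0.1):
    """Group e1 labels by how close their e3 is to a cluster's first member.

    `cluster` and `thres` are hypothetical names chosen for this sketch.
    """
    pairs = []  # (inner_e1, inner_e3) of the first member of each cluster
    dol = {}    # cluster key -> list of member labels

    for e1, e2, e3 in lol:
        for inner_e1, inner_e3 in pairs:
            if abs(inner_e3 - e3) <= thres:
                # Close enough to an existing cluster: join it and stop looking.
                dol[inner_e1].append(e1)
                break
        else:
            # The inner loop never hit `break`: no cluster matched,
            # so this entry starts a new one.
            pairs.append((e1, e3))
            dol[e1] = [e1]

    return dol


lol = [['a', 3, 1.01], ['x', 5, 1.00], ['k', 7, 2.02],
       ['p', 8, 3.00], ['b', 10, 1.09], ['f', 12, 2.03]]

result = cluster(lol)
print(result)
```

With the data from the question this groups 'a', 'x' and 'b' under 'a', 'k' and 'f' under 'k', and leaves 'p' on its own, which matches the desired `dol`. Each entry joins the first cluster whose first member is within the threshold, so the outcome depends on the order of `lol`.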