After changing to frozenset, it seems the lines can be classified into 7 groups of repeated lines,
but in the real data, consecutive lines like the ones below can also form a group, and I cannot find them.

Actually, I do not understand how the function works. Are there any algorithm tutorials, books, or external libraries that give better results for finding repeated lines as groups in a grouping application?

    1,[(1, 0, 1)]
    1,[(1, 0, 1)]
    1,[(1, 0, 1)]
    1,[(1, 0, 1)]

Real data:

    1,[(1, 0, 1)]
    1,[]
    1,[(-1, 1, -2), (2, -1/2, 1)]
    0,[(1, 0, 0)]
    1,[(1, 0, 1)]
    1,[]
    1,[]
    0,[{a: 0}, {b: -1/c, a: 0}, {c: 1, b: -1, a: 0}, {c: -1, b: 1, a: 0}]
    0,[(1/2 + sqrt(5)/2, -sqrt(5)/2 + 1/2, -1/2 + sqrt(5)/2), (-sqrt(5)/2 + 1/2, 1/2 + sqrt(5)/2, -sqrt(5)/2 - 1/2)]
    0,[(1, 0, 0)]
    1,[(1, 0, 1)]
    1,[(1, 0, 1)]
    1,[(1, 0, 1)]
    1,[(1, 0, 1)]
    1,[]
    1,[]
    0,[(1/2 + sqrt(5)/2, -sqrt(5)/2 + 1/2, -1/2 + sqrt(5)/2), (-sqrt(5)/2 + 1/2, 1/2 + sqrt(5)/2, -sqrt(5)/2 - 1/2)]
    0,[(1, 0, 0)]
    1,[]
    1,[(-1, 1, -2), (2, -1/2, 1)]
    0,[(1, 0, 0)]
    1,[(1, 0, 1)]
    1,[(1, 0, 1)]
    1,[]
    1,[(-1, 1, -2), (2, -1/2, 1)]
    0,[(-1, -1, 1), (1, 1, -1)]
    1,[]
    0,[(1/2 + sqrt(5)/2, -sqrt(5)/2 + 1/2, -1/2 + sqrt(5)/2), (-sqrt(5)/2 + 1/2, 1/2 + sqrt(5)/2, -sqrt(5)/2 - 1/2)]
    0,[(-1, -1, 1), (1, 1, -1)]
    1,[]
    0,[(1/2 + sqrt(5)/2, -sqrt(5)/2 + 1/2, -1/2 + sqrt(5)/2), (-sqrt(5)/2 + 1/2, 1/2 + sqrt(5)/2, -sqrt(5)/2 - 1/2)]
    0,[(-1, -1, 1), (1, 1, -1)]
    1,[(-1, 1, -2), (2, -1/2, 1)]
    0,[(-1, -1, 1), (1, 1, -1)]
    1,[(-1, 1, -2), (2, -1/2, 1)]
    0,[(1, 0, 0)]
    1,[]
    1,[(-1, 1, -2), (2, -1/2, 1)]
    0,[(-1, -1, 1), (1, 1, -1)]
    1,[]
    0,[(1/2 + sqrt(5)/2, -sqrt(5)/2 + 1/2, -1/2 + sqrt(5)/2), (-sqrt(5)/2 + 1/2, 1/2 + sqrt(5)/2, -sqrt(5)/2 - 1/2)]
    0,[(1, 0, 0)]
    1,[(1, 0, 1)]
    1,[]

The code:

    def consolidate(sets):
        # Merge all sets that share at least one element.
        setlist = [s for s in sets if s]
        for i, s1 in enumerate(setlist):
            if s1:
                for s2 in setlist[i+1:]:
                    intersection = s1.intersection(s2)
                    if intersection:
                        # Fold s1 into s2, empty s1, and continue with the merged set.
                        s2.update(s1)
                        s1.clear()
                        s1 = s2
        return [s for s in setlist if s]

    def wrapper(seqs):
        consolidated = consolidate(map(set, seqs))
        # Map every element to the index of the consolidated set that contains it.
        groupmap = {x: i for i, seq in enumerate(consolidated) for x in seq}
        output = {}
        for seq in seqs:
            target = output.setdefault(groupmap[seq[0]], [])
            target.append(seq)
        return list(output.values())

    with open("testing1.txt", "r") as myfile:
        content = myfile.readlines()

    gr = [['']]
    # Use at most the first 500 lines; slicing avoids an IndexError on shorter files.
    for line in content[:500]:
        try:
            gr = [[frozenset(line.split())]] + gr
        except Exception:
            print("error" + str(line))

    groups = wrapper(gr)

    for i, group in enumerate(groups):
        print('g{}:'.format(i), group)

    print("\n")

On Wednesday, October 5, 2016 at 3:40:25 PM UTC+8, dieter wrote:
> meInvent bbird <> writes:
> ... not looking at the details ...
>
> "'str' object has not attribute 'intersection'": apparently,
> something is really a string (an 'str') while you expect it to be a set.
>
> "unhashable set": maybe, you try to put a set into another set (or a dict;
> or somewhere else where hashability is necessary). A "set" itself is
> unhashable (like many mutable standard data types); you may consider to use
> "frozenset" in those cases (of course, a "frozenset" is immutable, i.e.
> cannot be changed after creation).
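
The hashability point in the quoted reply can be illustrated with a small sketch. It is only an illustration, not part of the posted program: the helper `can_hash`, the variable names, and the two sample lines (copied from the data above) are chosen here for the example.

```python
# Sketch: an ordinary set cannot be a dict key or a set element,
# while a frozenset built from the same tokens can.
line_a = "1,[(1, 0, 1)]"
line_b = "0,[(1, 0, 0)]"


def can_hash(obj):
    """Hypothetical helper: True if obj is usable as a dict key or set element."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


mutable_tokens = set(line_a.split())       # mutable, therefore unhashable
frozen_tokens = frozenset(line_a.split())  # immutable, therefore hashable

# Because frozensets are hashable, they can serve as dict keys,
# for example to count how often each line occurs.
counts = {}
for line in [line_a, line_b, line_a]:
    key = frozenset(line.split())
    counts[key] = counts.get(key, 0) + 1

print(can_hash(mutable_tokens), can_hash(frozen_tokens))
print(counts[frozen_tokens])
```

Note that a frozenset of tokens ignores token order and duplicate tokens, so two lines that differ only in those respects would be treated as the same key. If that matters, the stripped line string itself is also hashable and could be used as the key instead.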