This is a first try. Is something like this enough for you?

data = """apple1 apple2 apple3_SD formA formB formC kla_MM kla_MB kca_MM""".split()

# Map each head (the word minus its last character) to the last characters seen.
headtails = {}
for word in data:
    head = word[:-1]
    if head in headtails:
        headtails[head].append(word[-1])
    else:
        headtails[head] = [word[-1]]

# Print the heads in sorted order, collapsing several endings into [...].
for head, tails in sorted(headtails.items()):
    if len(tails) == 1:
        print(head + tails[0])
    else:
        print(head + "[%s]" % "".join(tails))

Output:

apple[12]
apple3_SD
form[ABC]
kca_MM
kla_M[MB]

It looks only at the last letter of each word. It modifies the original order (it sorts the sequence on the root-word). If you don't need sorted results you can remove the sorted().

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
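
On the point about removing sorted(): a plain dict only keeps insertion order on Python 3.7 and later, so on older interpreters dropping the sort gives an arbitrary order rather than the original one. Below is a rough sketch of the same last-letter grouping wrapped in a function, assuming Python 3.7+. The names group_by_last_char and keep_order are made up for this illustration and are not part of the script above.

```python
from collections import defaultdict


def group_by_last_char(words, keep_order=False):
    """Group words that differ only in their final character.

    Hypothetical helper: keep_order=True relies on dicts preserving
    insertion order (guaranteed from Python 3.7 on).
    """
    tails_by_head = defaultdict(list)
    for word in words:
        # head = everything but the last character, tail = the last character
        tails_by_head[word[:-1]].append(word[-1])

    # Either keep first-seen order of the heads or sort them alphabetically.
    heads = list(tails_by_head) if keep_order else sorted(tails_by_head)

    result = []
    for head in heads:
        tails = tails_by_head[head]
        if len(tails) == 1:
            result.append(head + tails[0])
        else:
            result.append("%s[%s]" % (head, "".join(tails)))
    return result


data = "apple1 apple2 apple3_SD formA formB formC kla_MM kla_MB kca_MM".split()
for line in group_by_last_char(data, keep_order=True):
    print(line)
```

With keep_order=True the kla_M group comes before kca_MM, as in the input. With the default it matches the sorted output shown above. Like the original, this assumes every word has at least one character.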