Robert R.:
> i would like to write a piece of code to help me to align some sequence
> of words and suggest me the ordered common subwords of them
[...]
> a trouble i have if when having many different strings my results tend
> to be nothing while i still would like to have one of the, or maybe,
> all the best matches.
This is my first attempt at a solution; surely there are faster, shorter, better ones...

    from collections import defaultdict
    from itertools import chain
    from graph import Graph # http://sourceforge.net/projects/pynetwork/

    def commonOrdered(*strings):
        # Lowercase each string and keep only the purely alphabetic tokens.
        lists = [[w for w in string.lower().split() if w.isalpha()]
                 for string in strings]

        # Count each word once per string, so that freqs[w] is the number
        # of strings containing w (a word repeated inside one string must
        # not be counted twice).
        freqs = defaultdict(int)
        for w in chain(*map(set, lists)):
            freqs[w] += 1

        # Add every string to the graph as a path of words.
        g = Graph()
        for words in lists:
            g.addPath(words)

        len_strings = len(strings)
        return [w for w in g.toposort() if freqs[w] == len_strings]

    s0 = "this is an example of a thing i would like to have"
    s1 = "another example of something else i would like to have"
    s2 = 'and this is another " example " but of something ; now i would still like to have'

    print(commonOrdered(s0, s1, s2))

It builds a graph from the paths of words, sorts the graph topologically, then keeps only the words of that ordering that are present in all the original strings.

With a bit of work the code can also handle input where quoted words are written like "example" instead of " example ".

An xtoposort method could also be added to the Graph class...

Bye,
bearophile
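For readers who do not have the graph module installed, the same idea can be sketched with the standard library only. This is not the pynetwork code: the name common_ordered, the Kahn-style topological sort and the ValueError raised when the strings disagree on word order are all assumptions made for illustration, and the Graph class in the post may behave differently in that last case.

```python
from collections import defaultdict, deque


def common_ordered(*strings):
    """Return the words present in every string, in a shared order.

    Hypothetical stand-in for commonOrdered() that needs no Graph class.
    """
    # Lowercase each string and keep only the purely alphabetic tokens.
    lists = [[w for w in s.lower().split() if w.isalpha()] for s in strings]

    # Count in how many strings each word occurs (once per string).
    freqs = defaultdict(int)
    for words in lists:
        for w in set(words):
            freqs[w] += 1

    # Build a directed graph with an arc a -> b for adjacent words a, b.
    successors = defaultdict(list)
    indegree = {}
    for words in lists:
        for w in words:
            indegree.setdefault(w, 0)
        for a, b in zip(words, words[1:]):
            if a != b and b not in successors[a]:
                successors[a].append(b)
                indegree[b] += 1

    # Kahn's algorithm: repeatedly emit a word with no pending predecessor.
    queue = deque(w for w, d in indegree.items() if d == 0)
    order = []
    while queue:
        w = queue.popleft()
        order.append(w)
        for nxt in successors[w]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)

    # Words left unemitted mean the strings contradict each other on order.
    if len(order) != len(indegree):
        raise ValueError("the strings disagree on word order (cycle)")

    return [w for w in order if freqs[w] == len(strings)]
```

Called on s0, s1 and s2 from the post, this sketch returns ['example', 'of', 'i', 'would', 'like', 'to', 'have'].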