"rh0dium" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] > Hi all, > > I am having a bit of difficulty in figuring out an efficient way to > split up my data and identify the unique pieces of it. > > list=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log'] > > Now I want to split each item up on the "_" and compare it with all > others on the list, if there is a difference I want to create a list of > the possible choices, and ask the user which choice of the list they > want. <snip>

Check out difflib.

>>> data=['1p2m_3.3-1.8v_sal_ms','1p2m_3.3-1.8_sal_log']
>>> data[0].split("_")
['1p2m', '3.3-1.8v', 'sal', 'ms']
>>> data[1].split("_")
['1p2m', '3.3-1.8', 'sal', 'log']
>>> from difflib import SequenceMatcher
>>> s = SequenceMatcher(None, data[0].split("_"), data[1].split("_"))
>>> s.get_matching_blocks()
[(0, 0, 1), (2, 2, 1), (4, 4, 0)]

Note that the blocks come from the get_matching_blocks() method; the
matching_blocks attribute is None until that method has been called.
Newer Python versions print each entry as a named tuple, e.g.
Match(a=0, b=0, size=1).

Each tuple reads as:
(seq1index, seq2index, numberOfMatchingItems)

In your case, the sequences match at element 0 and at element 2, and
each match has length 1. The final (4, 4, 0) tuple is a sentinel: the
last entry is always (len(seq1), len(seq2), 0), and it is the only one
with a length of 0.

From here, you could locate the gaps between the matching blocks
(elements 1 and 3 in this example) and prompt for the user's choice.

-- Paul
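
A rough sketch of the gap-finding step suggested above, in case it helps. Rather than walking the matching blocks by hand, it uses SequenceMatcher.get_opcodes(), which reports the non-matching ranges directly. The helper name differing_fields and its sep parameter are made up for this illustration, and the prompt itself is left out; this is one possible approach under those assumptions, not the only way to do it.

```python
from difflib import SequenceMatcher


def differing_fields(first, second, sep="_"):
    """Return a list of (fields_from_first, fields_from_second) pairs,
    one pair for each stretch where the two split strings differ.

    Hypothetical helper for illustration only.
    """
    a = first.split(sep)
    b = second.split(sep)
    matcher = SequenceMatcher(None, a, b)
    gaps = []
    # get_opcodes() yields (tag, i1, i2, j1, j2); any tag other than
    # "equal" marks a range where a[i1:i2] and b[j1:j2] do not match.
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            gaps.append((a[i1:i2], b[j1:j2]))
    return gaps


data = ['1p2m_3.3-1.8v_sal_ms', '1p2m_3.3-1.8_sal_log']
gaps = differing_fields(data[0], data[1])
for from_first, from_second in gaps:
    # Each pair holds the candidate values to offer the user.
    print("choices:", from_first + from_second)
```

Each pair in gaps holds the fields that differ at one position, so the combined list is what you would show the user (for example with input()) before rebuilding the string from the chosen pieces. With more than two items in the list, you would have to compare them pairwise or against a common reference.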