Steven Bethard wrote: > I have a string with a bunch of whitespace in it, and a series of chunks > of that string whose indices I need to find. However, the chunks have > been whitespace-normalized, so that multiple spaces and newlines have > been converted to single spaces as if by ' '.join(chunk.split()). Some
If you are willing to get your hands dirty with regexps: import re _reLump = re.compile(r"\S+") def indices(text, chunks): lumps = _reLump.finditer(text) for chunk in chunks: lump = [lumps.next() for _ in chunk.split()] yield lump[0].start(), lump[-1].end() def main(): text = """\ aaa bb ccc dd eee. fff gggg hh i. jjj kk. """ chunks = ['aaa bb', 'ccc dd eee.', 'fff gggg hh i.', 'jjj', 'kk.'] assert list(indices(text, chunks)) == [(3, 10), (11, 22), (24, 40), (44, 47), (48, 51)] if __name__ == "__main__": main() Not tested beyond what you see. Peter -- http://mail.python.org/mailman/listinfo/python-list