It's not clear to me from your posting what order the tags may appear in. Assuming you will always END a section before beginning a new one, e.g. it's always:

    A
    some A-section lines.
    END A
    B
    some B-section lines.
    END B
    etc.

and never:

    A
    some A-section lines.
    B
    some B-section lines.
    END B
    END A
    etc.

it should be fairly simple. And if the file is several GB, you ought to use a generator in order to overcome the memory problem. Something like this:

    def make_tag_lookup(begin_tags):
        # create a dict with each {begin_tag : end_tag}
        end_tags = [('END ' + begin_tag) for begin_tag in begin_tags]
        return dict(zip(begin_tags, end_tags))

    def return_sections(filepath, lookup):
        # Generator returning each section
        inside_section = False
        # iterate over the file object itself; readlines() would load
        # the whole file into memory at once
        with open(filepath, 'r') as infile:
            for line in infile:
                line = line.strip()
                if not inside_section:
                    if line in lookup:
                        inside_section = True
                        data_section = []
                        section_end_tag = lookup[line]
                        data_section.append(line)  # store section start tag
                else:
                    if line == section_end_tag:
                        data_section.append(line)  # store section end tag
                        inside_section = False
                        yield data_section  # yield entire section
                    else:
                        data_section.append(line)  # store each line within section

    # create the generator yielding each section
    # (datafile and list_of_begin_tags are whatever you have in your program)
    sections = return_sections(datafile, make_tag_lookup(list_of_begin_tags))

    for section in sections:
        for line in section:
            print(line)
        print('\n')
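If you want to try the idea on something small before pointing it at the real file, here is a self-contained sketch. It is a condensed variant of the generator above, not a drop-in for your data: the sample contents, the tags 'A' and 'B', and the temporary file are all made up for illustration, and it assumes sections are never nested. It loops over the file object directly, so only the current section is held in memory.

```python
import os
import tempfile


def make_tag_lookup(begin_tags):
    # Map each begin tag to its end tag, e.g. {'A': 'END A'}
    return dict((tag, 'END ' + tag) for tag in begin_tags)


def return_sections(filepath, lookup):
    # Yield one section at a time as a list of stripped lines.
    section = None      # None means "not inside a section"
    end_tag = None
    with open(filepath, 'r') as infile:
        for line in infile:             # lazy, line-by-line read
            line = line.strip()
            if section is None:
                if line in lookup:      # a begin tag opens a section
                    section = [line]
                    end_tag = lookup[line]
            else:
                section.append(line)
                if line == end_tag:     # the matching end tag closes it
                    yield section
                    section = None


# Hypothetical sample data: two sections with a stray line between them.
sample = ("A\nfirst a-line\nsecond a-line\nEND A\n"
          "ignored line\n"
          "B\nonly b-line\nEND B\n")

handle, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(handle, 'w') as outfile:
    outfile.write(sample)

# list() is only used here because the sample is tiny; on a large file,
# loop over the generator instead so sections are processed one by one.
sections = list(return_sections(path, make_tag_lookup(['A', 'B'])))
os.remove(path)

for section in sections:
    print(section)
```

Lines outside any section (like "ignored line" in the sample) are simply skipped, which matches the behaviour of the code above.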