Here is a simple function that scans through an input file and groups the lines of the file into sections. Sections start with 'Name:' and end with a blank line. The function yields sections as they are found.

def makeSections(f):
    currSection = []

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            if currSection:
                yield currSection
                currSection = []
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            if currSection:
                yield currSection
                currSection = []

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    if currSection:
        yield currSection

There is some obvious code duplication in the function - this bit is repeated 2.67 times ;-):
            if currSection:
                yield currSection
                currSection = []

As a firm believer in Once and Only Once, I would like to factor this out into a separate function, either as a nested function of makeSections() or as a separate method of a class implementation. Something like this:

def makeSections(f):    ### DOESN'T WORK ###
    currSection = []

    def yieldSection():
        if currSection:
            yield currSection
            del currSection[:]

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            yieldSection()
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            yieldSection()

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    yieldSection()


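The problem is that yieldSection() is now the generator function and makeSections() is not. makeSections() no longer contains a yield, so it is an ordinary function, and each call to yieldSection() just creates a new generator object that is thrown away without its body ever running. No section is ever yielded.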
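To make that concrete, here is a minimal sketch of the behaviour, separate from the section-parsing code. The names outer(), inner() and ran are made up for illustration only.

```python
import types

def outer():
    # Hypothetical stand-in for makeSections(): it has no yield of its own,
    # so it is an ordinary function, not a generator function.
    ran = []

    def inner():
        # The yield below makes inner() a generator function.
        ran.append(True)
        yield 'section'

    result = inner()   # creates a generator object; the body has not run yet
    return result, ran

result, ran = outer()
is_generator = isinstance(result, types.GeneratorType)
body_ran_before_iteration = bool(ran)
first_item = next(result)   # only now does the body of inner() execute
body_ran_after_iteration = bool(ran)
```

The body of inner() runs only once something iterates over the returned object, which the refactored makeSections() never does.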


Is there a way to do this or do I have to live with the duplication?
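The only workaround I can see is to keep the helper as a generator and re-yield whatever it produces from the outer function. This is a sketch under my own assumptions, not a claim that it is the best way. The helper name flushSection() is hypothetical, and it yields a copy of the list because clearing the list in place would otherwise also empty the section already handed to the caller.

```python
def makeSections(f):
    currSection = []

    def flushSection():
        # Generator helper: yields a copy of the pending section (if any),
        # then empties the shared list in place.
        if currSection:
            yield currSection[:]
            del currSection[:]

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            for section in flushSection():
                yield section
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            for section in flushSection():
                yield section

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    for section in flushSection():
        yield section
```

This trades the three-line duplication for a two-line one, so it is only a partial answer. On Python 3.3 and later each two-line loop could be written as the single statement `yield from flushSection()`.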

Thanks,
Kent


Here is a complete program:

data = '''
Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx
....................
xxxxxxxxxxxxxxxxxxxx


Name:
City:
xxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxx

'''

import cStringIO    # just for test

def makeSections(f):
    ''' This is a generator function. It will return successive sections
        of f until EOF.

        Sections are every line from a 'Name:' line to the first blank line.
        Sections are returned as a list of lines with line endings stripped.
    '''

    currSection = []

    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            if currSection:
                yield currSection
                currSection = []
            currSection.append(line)

        elif not line:
            # Blank line ends a section
            if currSection:
                yield currSection
                currSection = []

        else:
            # Accumulate into a section
            currSection.append(line)

    # Yield the last section
    if currSection:
        yield currSection


f = cStringIO.StringIO(data)

for section in makeSections(f):
    print 'Section'
    for line in section:
        print '   ', line
    print
--
http://mail.python.org/mailman/listinfo/python-list
