Re: Refactoring a generator function

Steven Bethard Sat, 04 Dec 2004 08:45:34 -0800

Kent Johnson wrote:

Here is a simple function that scans through an input file and groups the lines of the file into sections. Sections start with 'Name:' and end with a blank line. The function yields sections as they are found.
def makeSections(f):
    currSection = []
    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section
            if currSection:
                yield currSection
                currSection = []
            currSection.append(line)
        elif not line:
            # Blank line ends a section
            if currSection:
                yield currSection
                currSection = []
        else:
            # Accumulate into a section
            currSection.append(line)
    # Yield the last section
    if currSection:
        yield currSection
There is some obvious code duplication in the function - this bit is repeated 2.67 times ;-):

if currSection:
    yield currSection
    currSection = []


You can write:

for section in yieldSection():
    yield section

in both places, but I assume you still don't like the code duplication this would create.
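A minimal sketch of what that could look like. The helper name flush and the snake_case names are my own, not from the original post; the helper yields a copy of the list and then empties it in place, so the caller keeps working with the same list object:

```python
def flush(section):
    """Yield a copy of the accumulated section (if any), then empty the list in place."""
    if section:
        yield list(section)
        del section[:]


def make_sections(f):
    curr_section = []
    for line in f:
        line = line.strip()
        if line == 'Name:':
            # Start of a new section: emit whatever was pending first
            for section in flush(curr_section):
                yield section
            curr_section.append(line)
        elif not line:
            # Blank line ends a section
            for section in flush(curr_section):
                yield section
        else:
            # Accumulate into the current section
            curr_section.append(line)
    # Emit the last section
    for section in flush(curr_section):
        yield section
```

The duplicated bit shrinks from three lines to two, which is the remaining duplication mentioned here. On Python 3.3 and later each of those two-line loops can be written as a single `yield from flush(curr_section)`.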

How about something like (completely untested):

if line == 'Name:' or not line:
    if currSection:
        yield currSection
        currSection = []
    if line == 'Name:':
        currSection.append(line)
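Dropped into the whole function, that fragment might look like the sketch below. It assumes the rest of the original makeSections stays as it was (the accumulate branch and the final yield); only the two section-ending branches are merged, and the snake_case names are mine:

```python
def make_sections(f):
    curr_section = []
    for line in f:
        line = line.strip()
        if line == 'Name:' or not line:
            # Both a 'Name:' line and a blank line end the pending section
            if curr_section:
                yield curr_section
                curr_section = []
            # Only a 'Name:' line also starts a new one
            if line == 'Name:':
                curr_section.append(line)
        else:
            # Accumulate into the current section
            curr_section.append(line)
    # Yield the last section
    if curr_section:
        yield curr_section
```

This leaves two copies of the yield-and-reset bit instead of three, at the cost of testing for 'Name:' twice.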

Another consideration: in Python 2.4, itertools has a groupby function that you could probably get some benefit from:

>>> import itertools
>>> class Sections(object):
...     def __init__(self):
...         self.is_section = False
...     def __call__(self, line):
...         if line == 'Name:\n':
...             self.is_section = True
...         elif line == '\n':
...             self.is_section = False
...         return self.is_section
...
>>> def make_sections(f):
...     for _, section in itertools.groupby(f, Sections()):
...         result = ''.join(section)
...         if result != '\n':
...             yield result
...
>>> f = 'Name:\nA\nx\ny\nz\n\nName:\nB\na\nb\nc\n'.splitlines(True)
>>> list(make_sections(f))
['Name:\nA\nx\ny\nz\n', 'Name:\nB\na\nb\nc\n']
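
One caveat with the groupby version above: groupby only starts a new group when the key changes, so two sections with no blank line between them come out merged, and a run of several blank lines ('\n\n') slips past the `result != '\n'` check. A possible variant, sketched under my own assumptions (the class name SectionCounter and the choice to ignore lines outside any section are not from the original post), keys each line by a section number instead of a boolean:

```python
import itertools


class SectionCounter(object):
    """Stateful key function: section number inside a section, None outside."""

    def __init__(self):
        self.index = 0
        self.in_section = False

    def __call__(self, line):
        if line == 'Name:\n':
            # Every 'Name:' line gets a fresh key, so adjacent sections split
            self.index += 1
            self.in_section = True
        elif line == '\n':
            self.in_section = False
        return self.index if self.in_section else None


def make_sections(f):
    for key, section in itertools.groupby(f, SectionCounter()):
        # Key None marks blank lines and anything before the first 'Name:'
        if key is not None:
            yield ''.join(section)
```

Unlike the original makeSections, this drops any lines that appear before the first 'Name:' or between a blank line and the next 'Name:', so it is only a fit if such lines can be ignored.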
