James wrote:
Hello all,
I'm working on some NLP code - what I'm doing is passing a large
number of tokens through a number of filtering / processing steps.

The filters take a token as input, and may or may not yield a token as
a result. For example, I might have filters that lowercase the
input, filter out boring words, and filter out duplicates, all
chained together.

I originally had code like this:
for t0 in token_stream:
  for t1 in lowercase_token(t0):
    for t2 in remove_boring(t1):
      for t3 in remove_dupes(t2):
        yield t3

Apart from being ugly as sin, I only get one token out as
StopIteration is raised before the whole token stream is consumed.

Any suggestions on an elegant way to chain together a bunch of
generators, with processing steps in between?

What you should be doing is letting the filters accept an iterator and
yield values on demand:

def lowercase_token(stream):
    for t in stream:
        yield t.lower()

def remove_boring(stream):
    # `boring` is assumed to be a collection of stop words defined elsewhere
    for t in stream:
        if t not in boring:
            yield t

def remove_dupes(stream):
    seen = set()
    for t in stream:
        if t not in seen:
            yield t
            seen.add(t)

def compound_filter(token_stream):
    stream = lowercase_token(token_stream)
    stream = remove_boring(stream)
    stream = remove_dupes(stream)
    for t in stream:
        yield t
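
If the list of filters varies, the same chaining can probably be written once as a small helper. The following is only a sketch: the name pipeline, the BORING set, and the sample tokens are made up for illustration and are not from the original post. Each filter takes an iterator and returns an iterator, so nothing is processed until the final result is consumed.

```python
from functools import reduce

# Hypothetical stop-word set; the original post does not define `boring`.
BORING = {"the", "a", "of"}

def lowercase_token(stream):
    for t in stream:
        yield t.lower()

def remove_boring(stream):
    for t in stream:
        if t not in BORING:
            yield t

def remove_dupes(stream):
    seen = set()
    for t in stream:
        if t not in seen:
            seen.add(t)
            yield t

def pipeline(stream, *filters):
    """Wrap `stream` in each filter in order and return the outermost iterator."""
    return reduce(lambda s, f: f(s), filters, stream)

# Made-up sample input for illustration.
tokens = ["The", "Cat", "sat", "on", "the", "cat", "Mat"]
result = list(pipeline(tokens, lowercase_token, remove_boring, remove_dupes))
print(result)  # ['cat', 'sat', 'on', 'mat']
```

The order of the filters matters: lowercasing first means "The" and "the" are both caught by the stop-word check, and "Cat" and "cat" count as duplicates.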
--
http://mail.python.org/mailman/listinfo/python-list
