Jason Friedman wrote:

> I have a file such as:
>
> $ cat my_data
> Starting a new group
> a
> b
> c
> Starting a new group
> 1
> 2
> 3
> 4
> Starting a new group
> X
> Y
> Z
> Starting a new group
>
> I am wanting a list of lists:
> ['a', 'b', 'c']
> ['1', '2', '3', '4']
> ['X', 'Y', 'Z']
> []
>
> I wrote this:
> ------------------------------------
> #!/usr/bin/python3
> from itertools import groupby
>
> def get_lines_from_file(file_name):
>     with open(file_name) as reader:
>         for line in reader.readlines():

readlines() slurps the whole file into memory! Don't do that; iterate over
the file directly instead:

          for line in reader:

>             yield(line.strip())
>
> counter = 0
> def key_func(x):
>     if x.startswith("Starting a new group"):
>         global counter
>         counter += 1
>     return counter
>
> for key, group in groupby(get_lines_from_file("my_data"), key_func):
>     print(list(group)[1:])
> ------------------------------------
>
> I get the output I desire, but I'm wondering if there is a solution
> without the global counter.

If you were to drop the empty groups you could simplify it to

import itertools

def is_header(line):
    return line.startswith("Starting a new group")

with open("my_data") as lines:
    stripped_lines = (line.strip() for line in lines)
    # Consecutive headers fall into one group, so empty groups vanish.
    for header, group in itertools.groupby(stripped_lines, key=is_header):
        if not header:
            print(list(group))

And here's a refactoring for your initial code. The main point is the use of
nonlocal instead of global state to make the function reentrant.

import itertools

def split_groups(items, header):
    odd = True
    def group_key(item):
        # Flip the key at every header so each header starts a new group.
        nonlocal odd
        if header(item):
            odd = not odd
        return odd

    for _key, group in itertools.groupby(items, key=group_key):
        # Skip the first item of each group (normally the header line).
        yield itertools.islice(group, 1, None)

def is_header(line):
    return line.startswith("Starting a new group")

with open("my_data") as lines:
    stripped_lines = map(str.strip, lines)
    for group in split_groups(stripped_lines, header=is_header):
        print(list(group))

One remaining problem with that code is that it will silently drop the first
line of the file if it doesn't start with a header:

$ cat my_data
alpha
beta
gamma
Starting a new group
a
b
c
Starting a new group
Starting a new group
1
2
3
4
Starting a new group
X
Y
Z
Starting a new group
$ python3 group.py
['beta', 'gamma'] # where's alpha?
['a', 'b', 'c']
[]
['1', '2', '3', '4']
['X', 'Y', 'Z']
[]

How do you want to handle that case?

-- 
http://mail.python.org/mailman/listinfo/python-list
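
If the answer to the question above turns out to be "keep the lines before the first header as a group of their own", one possible sketch follows. It is not the groupby approach from the post: it swaps in a plain loop that collects each group into a list, so groups are no longer lazy. The name split_groups_keep_leading and the inline sample data are made up for illustration.

```python
def split_groups_keep_leading(items, is_header):
    """Yield one list per group; each header item starts a new group.

    Assumption: items seen before the first header form a group of
    their own instead of losing their first line.
    """
    group = None  # None means no group has been opened yet
    for item in items:
        if is_header(item):
            if group is not None:
                yield group  # close the group that was open
            group = []       # the header itself is not stored
        else:
            if group is None:
                group = []   # leading items with no header before them
            group.append(item)
    if group is not None:
        yield group          # last group, possibly empty


def is_header(line):
    return line.startswith("Starting a new group")


# Hypothetical in-memory stand-in for the stripped lines of my_data.
sample = [
    "alpha", "beta", "gamma",
    "Starting a new group", "a", "b", "c",
    "Starting a new group",
    "Starting a new group", "1", "2", "3", "4",
    "Starting a new group",
]

for group in split_groups_keep_leading(sample, is_header):
    print(group)
```

With a file you would pass map(str.strip, lines) as items, as in the post. If lines before the first header should be rejected rather than kept, raising an exception in the branch that opens the leading group would be the place to do it.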