Aha, cool, that's a good idea =) It seems I should spend some time getting to know generators and iterators.
Also, sorry if this is basic, but once I have the "block" list itself, what is the best way to parse each relevant line? In this case, the first line is a timestamp, the next two lines are system stats, then there is a blank line, and then there is one line for each block device. I could just hardcode the line indexes, but that seems ugly:

    for block in parse_iostat(f):
        for i, line in enumerate(block):
            if i == 0:
                print("timestamp is {}".format(line))
            elif i == 1 or i == 2:
                print("system stats: {}".format(line))
            elif i >= 4:
                print("disk stats: {}".format(line))

Is there a prettier or more Pythonic way of doing this?

Thanks,
Victor

On Wednesday, 1 July 2015 02:03:01 UTC+10, Chris Angelico wrote:
> On Wed, Jul 1, 2015 at 1:47 AM, Skip Montanaro <skip.montan...@gmail.com>
> wrote:
> > Maybe define a class which wraps a file-like object. Its next() method (or
> > is it __next__() method?) can just buffer up lines starting with one which
> > successfully parses as a timestamp, accumulates all the rest, until a blank
> > line or EOF is seen, then return that, either as a list of strings, one
> > massive string, or some higher level representation (presumably an instance
> > of another class) which represents one "paragraph" of iostat output.
>
> next() in Py2, __next__() in Py3. But I'd do it, instead, as a
> generator - that takes care of all the details, and you can simply
> yield useful information whenever you have it. Something like this
> (untested):
>
>     def parse_iostat(lines):
>         """Parse lines of iostat information, yielding ... something
>
>         lines should be an iterable yielding separate lines of output
>         """
>         block = None
>         for line in lines:
>             line = line.strip()
>             try:
>                 tm = datetime.datetime.strptime(line, "%m/%d/%Y %I:%M:%S %p")
>                 if block: yield block
>                 block = [tm]
>             except ValueError:
>                 # It's not a new timestamp, so add it to the existing block
>                 block.append(line)
>         if block: yield block
>
> This is a fairly classic line-parsing generator.
> You can pass it a file-like object, a list of strings, or anything else that it can
> iterate over; it'll yield some sort of aggregate object representing
> each time's block. In this case, all it does is append strings to a
> list, so this will result in a series of lists of strings, each one
> representing a single timestamp; you can parse the other lines in any
> way you like and aggregate useful data. Usage would be something like
> this:
>
>     with open("logfile") as f:
>         for block in parse_iostat(f):
>             # do stuff with block
>
> This will work quite happily with an ongoing stream, too, so if you're
> working with a pipe from a currently-running process, it'll pick stuff
> up just fine. (However, since it uses the timestamp as its signature,
> it won't yield anything till it gets the *next* timestamp. If the
> blank line is sufficient to denote the end of a block, you could
> change the loop to look for that instead.)
>
> Hope that helps!
>
> ChrisA

-- 
https://mail.python.org/mailman/listinfo/python-list
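On the question of avoiding the `if i == 0 / elif i >= 4` chain: one option is to unpack each block by position and give every part a name once. This is only a sketch, and it assumes every block has exactly the layout described above (a timestamp, two system-stats lines, one blank line, then device lines), with blank lines arriving as "" because the generator strips each line. `IostatSample` and `split_block` are hypothetical names, and the sample block uses made-up placeholder strings, not real iostat output.

```python
import datetime
from typing import List, NamedTuple


class IostatSample(NamedTuple):
    """Hypothetical container for one parsed iostat block."""
    timestamp: datetime.datetime
    system_stats: List[str]
    disk_stats: List[str]


def split_block(block):
    """Split one block into named parts by position.

    Assumed layout: timestamp, two system-stats lines, a blank line,
    then one line per block device (possibly followed by blank lines).
    """
    # Extended unpacking names each position once instead of testing
    # indexes inside a loop; it raises ValueError if there are fewer
    # than four items.
    timestamp, stats1, stats2, _blank, *rest = block
    # Drop empty strings (stripped blank lines) from the device section.
    disks = [line for line in rest if line]
    return IostatSample(timestamp, [stats1, stats2], disks)


# Placeholder block shaped like the output of parse_iostat().
block = [
    datetime.datetime(2015, 7, 1, 2, 3, 1),
    "sys-1",
    "sys-2",
    "",
    "sda stats",
    "sdb stats",
    "",  # a trailing blank line, if the output has one
]

sample = split_block(block)
print("timestamp is {}".format(sample.timestamp))
for line in sample.system_stats:
    print("system stats: {}".format(line))
for line in sample.disk_stats:
    print("disk stats: {}".format(line))
```

Because the unpacking needs at least four items, a truncated block raises `ValueError` instead of being silently misread; whether that is what you want depends on how clean the input is.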
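The quoted generator is marked untested, and one detail matters for real input: if any non-timestamp line arrives before the first timestamp (such as a header line at the top of the output), `block` is still `None` and `block.append(line)` raises `AttributeError`. Below is a variant of that sketch, not Chris's code, that skips such lines, adds the `datetime` import, and keeps the `try` around `strptime` alone so that only a failed timestamp parse is treated as "not a timestamp". The `fmt` parameter and the sample lines are assumptions for illustration. Ending a block at a blank line instead would not fit this particular layout directly, since per the description above a blank line also separates the system stats from the device lines inside each block.

```python
import datetime


def parse_iostat(lines, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Yield one list per timestamped block: [datetime, line, line, ...].

    Lines seen before the first timestamp are skipped rather than
    appended to a block that does not exist yet.
    """
    block = None
    for line in lines:
        line = line.strip()
        try:
            tm = datetime.datetime.strptime(line, fmt)
        except ValueError:
            # Not a timestamp: it belongs to the current block, if any.
            if block is not None:
                block.append(line)
            continue
        # A new timestamp closes the previous block.
        if block is not None:
            yield block
        block = [tm]
    # Flush the final block when the input ends.
    if block is not None:
        yield block


# Made-up input: a header line, then two timestamped blocks.
lines = [
    "header line before any timestamp",
    "07/01/2015 02:03:01 AM",
    "sys-1",
    "",
    "07/01/2015 02:03:02 AM",
    "sys-2",
]
blocks = list(parse_iostat(lines))
```

As in the original, a block is only yielded once the next timestamp (or the end of input) is seen.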