Victor Hooi wrote:

> Hi Peter,
>
> Hmm, are you sure that will work?

If you want the starting line for the batch, yes:

$ cat tmp.txt
alpha (line #1)
beta (line #2)
gamma (line #3)
delta (line #4)
epsilon (line #5)
zeta (line #6)
eta (line #7)
theta (line #8)
iota (line #9)
kappa (line #10)
$ cat grouper_demo.py
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

_BATCH_SIZE = 3

with open("tmp.txt", 'r') as f:
    for index, chunk in enumerate(grouper(f, _BATCH_SIZE)):
        print("batch starting at line", index * _BATCH_SIZE + 1)
        print(chunk)
$ python3 grouper_demo.py
batch starting at line 1
('alpha (line #1)\n', 'beta (line #2)\n', 'gamma (line #3)\n')
batch starting at line 4
('delta (line #4)\n', 'epsilon (line #5)\n', 'zeta (line #6)\n')
batch starting at line 7
('eta (line #7)\n', 'theta (line #8)\n', 'iota (line #9)\n')
batch starting at line 10
('kappa (line #10)\n', None, None)

> The indexes returned by enumerate will start from zero.
>
> Also, I've realised line_number is a bit of a misnomer here - it's
> actually the index for the chunks that grouper() is returning.
>
> So say I had a 10-line textfile, and I was using a _BATCH_SIZE of 50.
>
> If I do:
>
>     print(line_number * _BATCH_SIZE)
>
> I'd just get (0 * 50) = 0 printed out 10 times.
>
> Even if I add one:
>
>     print((line_number + 1) * _BATCH_SIZE)
>
> I will just get 50 printed out 10 times.

So you are trying to solve a slightly different problem.
You can attack that by moving the enumerate() call:

$ cat grouper_demo2.py
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

_BATCH_SIZE = 3

with open("tmp.txt", 'r') as f:
    for chunk in grouper(
            enumerate(f, 1), _BATCH_SIZE, fillvalue=(None, None)):
        print("--- batch ---")
        for index, line in chunk:
            if index is None:
                break
            print(index, line, end="")
        print()
$ python3 grouper_demo2.py
--- batch ---
1 alpha (line #1)
2 beta (line #2)
3 gamma (line #3)

--- batch ---
4 delta (line #4)
5 epsilon (line #5)
6 zeta (line #6)

--- batch ---
7 eta (line #7)
8 theta (line #8)
9 iota (line #9)

--- batch ---
10 kappa (line #10)

$
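
If the None padding in the last batch is a nuisance, one possible alternative is to slice the iterator with itertools.islice instead of zipping it with itself. The following is only a sketch, not something from the thread: the helper name batches() is made up for illustration, and a StringIO object stands in for the ten-line tmp.txt so the snippet runs on its own.

```python
from io import StringIO
from itertools import islice


def batches(lines, size):
    """Yield (first_line_number, list_of_lines) per batch.

    Hypothetical helper; the last batch is simply shorter, never padded.
    """
    it = iter(lines)
    start = 1
    while True:
        batch = list(islice(it, size))  # take up to `size` items
        if not batch:                   # iterator exhausted
            return
        yield start, batch
        start += len(batch)


# Stand-in for the ten-line tmp.txt (assumption: any iterable of lines works)
sample = StringIO("".join("line %d\n" % i for i in range(1, 11)))

result = list(batches(sample, 3))
for start, batch in result:
    print("batch starting at line", start, "with", len(batch), "lines")
```

Because the starting line number is tracked alongside each batch, no sentinel check is needed in the inner loop. The trade-off is that every batch is materialised as a list.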