[Joerg Schuster]
> I am looking for a method to "shuffle" the lines of a large file.
>
> I have a corpus of sorted and "uniqed" English sentences that has been
> produced with (1):
>
> (1) sort corpus | uniq > corpus.uniq
>
> corpus.uniq is 80G large.
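Since the corpus is huge, the Python portion should not pull it all into
memory. The best bet is to let the OS tools take care of that part.
Python only streams through the file once, prefixing each line with a
random key:

>>> from random import random
>>> out = open('corpus.decorated', 'w')
>>> for line in open('corpus.uniq'):
...     print('%.14f %s' % (random(), line), end='', file=out)
...
>>> out.close()

Then sort on that key and cut it off again:

sort corpus.decorated | cut -c 18- > corpus.randomized

The key written by '%.14f' is always 16 characters wide ("0." plus 14
digits), followed by one space, so each original line starts at
character 18; that is what cut -c 18- keeps.

Raymond Hettinger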
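To make the decorate/sort/undecorate idea concrete, here is a minimal in-memory sketch of the same three steps in Python 3. The names decorate, undecorate and shuffle_lines are hypothetical, and the in-memory list.sort() merely stands in for the external sort command; for an 80G file the shell pipeline above is still what does the heavy lifting.

```python
import random
from typing import Iterable, List, Optional

# A key written with '%.14f' is 16 characters ("0." + 14 digits); with the
# separating space that makes 17, so the original line starts at character 18.
# This mirrors the `cut -c 18-` step of the shell pipeline.
KEY_WIDTH = 17


def decorate(line: str, rng: random.Random) -> str:
    """Prefix a line with a fixed-width random sort key (hypothetical helper)."""
    return '%.14f %s' % (rng.random(), line)


def undecorate(line: str) -> str:
    """Strip the key again: the Python analogue of `cut -c 18-`."""
    return line[KEY_WIDTH:]


def shuffle_lines(lines: Iterable[str], seed: Optional[int] = None) -> List[str]:
    """Decorate-sort-undecorate shuffle, in memory (small inputs only)."""
    rng = random.Random(seed)
    decorated = [decorate(line, rng) for line in lines]
    # Equal-width keys sort correctly as plain strings; this stands in
    # for the external `sort` command used on the real corpus.
    decorated.sort()
    return [undecorate(line) for line in decorated]


if __name__ == '__main__':
    sample = ['apple\n', 'banana\n', 'cherry\n', 'date\n']
    print(''.join(shuffle_lines(sample, seed=42)), end='')
```

Because every key has the same width, no numeric parsing is needed: string order equals numeric order, which is exactly why a plain sort followed by a fixed-column cut works.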