On 11 April 2013 02:21, gry <georgeryo...@gmail.com> wrote:
> Dear pythonistas,
> I am writing a tiny utility to produce a file consisting of a
> specified number of lines of a given length of random ascii
> characters. I am hoping to find a more time- and memory-efficient
> way that is still fairly simple, clear, and _pythonic_.
>
> I would like to have something that I can use at both extremes of
> data:
>
> 32M chars per line * 100 lines
> or
> 5 chars per line * 1e8 lines.
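[For context, a minimal pure-Python baseline for the utility described above; this is not from the thread, and the helper name `random_lines` and the charset are illustrative choices only:]

```python
import random
import string

# Hypothetical baseline, not from the thread: the straightforward
# pure-Python version of the utility described above.  The charset
# and the helper name `random_lines` are illustrative choices.
CHARS = string.digits + string.ascii_letters

def random_lines(nrows, ncols, rng=None):
    """Yield `nrows` strings of `ncols` random ASCII characters each."""
    rng = rng or random.Random()
    for _ in range(nrows):
        yield ''.join(rng.choice(CHARS) for _ in range(ncols))

lines = list(random_lines(3, 5))
```

[This is the version whose per-character Python-level loop becomes the bottleneck at the scales the poster mentions, which is what the numpy answer below avoids.]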
I would definitely use numpy for this. The script below seems to be
io-bound on my machine:

#!/usr/bin/env python

from sys import stdout
import numpy as np
from numpy import random

# The character set to draw from, viewed as a byte array for fast
# fancy indexing.
CHARS = (
    '0123456789'
    'abcdefghijklmnopqrstuvwxyz'
    'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    '!#$%& \'()*+,-./:;<=>?@[\\]^_`{}'
)
ARRAY_CHARS = np.frombuffer(CHARS, np.uint8)
NCHARS = len(CHARS)

CHUNK_SIZE = 4096
NCOLS = 32000000
NROWS = 10

def chunk_sizes(total, chunk_size):
    # Split `total` bytes into CHUNK_SIZE-sized pieces plus a remainder.
    numchunks, remainder = divmod(total, chunk_size)
    for _ in range(numchunks):
        yield chunk_size
    if remainder:
        yield remainder

def chunks():
    bytes_per_line = NCOLS + 1  # NCOLS characters plus the newline
    total_bytes = bytes_per_line * NROWS
    newline_index = NCOLS       # offset of the next newline in this chunk
    newline = ord('\n')
    for size in chunk_sizes(total_bytes, CHUNK_SIZE):
        # Fancy indexing copies, so `chars` is writable even though
        # the frombuffer array is read-only.
        chars = ARRAY_CHARS[random.randint(0, NCHARS, size)]
        chars[newline_index::bytes_per_line] = newline
        newline_index = (newline_index - CHUNK_SIZE) % bytes_per_line
        yield chars

for chunk in chunks():
    chunk.tofile(stdout)

-- 
http://mail.python.org/mailman/listinfo/python-list
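[The subtle part of the script above is the newline bookkeeping: each chunk writes newlines at `newline_index::bytes_per_line`, and the start offset shifts by `CHUNK_SIZE` modulo the line stride between chunks. A toy-scale sketch of the same technique, small enough to verify directly — the tiny sizes, the `b'abcdef'` charset, and the use of `np.random.randint` in place of the script's imported `random.randint` are choices made here for the demonstration, with `CHUNK_SIZE` deliberately not a multiple of the line stride:]

```python
import io
import numpy as np

# Toy-scale re-run of the chunked newline-placement technique from the
# script above, with sizes small enough to check the output directly.
# CHUNK_SIZE (7) is deliberately not a multiple of the line stride (6),
# so the newline offset shifts from chunk to chunk.
CHARS = b'abcdef'
ARRAY_CHARS = np.frombuffer(CHARS, np.uint8)
NCHARS = len(CHARS)
CHUNK_SIZE = 7
NCOLS = 5
NROWS = 4

def chunk_sizes(total, chunk_size):
    # Full-size chunks, then any remainder, so the sizes sum to `total`.
    numchunks, remainder = divmod(total, chunk_size)
    for _ in range(numchunks):
        yield chunk_size
    if remainder:
        yield remainder

def chunks():
    bytes_per_line = NCOLS + 1          # NCOLS characters plus the newline
    total_bytes = bytes_per_line * NROWS
    newline_index = NCOLS               # offset of next newline in the chunk
    newline = ord('\n')
    for size in chunk_sizes(total_bytes, CHUNK_SIZE):
        # Fancy indexing copies, so `chars` is writable even though the
        # frombuffer array is read-only.
        chars = ARRAY_CHARS[np.random.randint(0, NCHARS, size)]
        chars[newline_index::bytes_per_line] = newline
        newline_index = (newline_index - CHUNK_SIZE) % bytes_per_line
        yield chars

# Collect into a buffer instead of stdout so the result can be inspected.
buf = io.BytesIO()
for chunk in chunks():
    buf.write(chunk.tobytes())
lines = buf.getvalue().split(b'\n')
```

[Splitting on the newline should give exactly NROWS lines of NCOLS bytes each, plus the empty trailer from the final newline — which confirms the modular index update keeps the newlines aligned even when a line straddles a chunk boundary.]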