On Apr 2, 2:09 pm, "Derek Tracy" <[EMAIL PROTECTED]> wrote:
> On Wed, Apr 2, 2008 at 10:59 AM, Derek Tracy <[EMAIL PROTECTED]> wrote:
> > I am trying to write a script that reads in a large binary file (over
> > 2Gb), saves the header (169088 bytes) into one file, then takes the
> > rest of the data and dumps it into another file. I generated code that
> > works wonderfully for files under 2Gb in size, but the majority of the
> > files I am dealing with are over the 2Gb limit.
> >
> > import array
> >
> > INPUT = open(infile, 'rb')
> > header = INPUT.read(169088)
> >
> > ary = array.array('H', INPUT.read())
> >
> > INPUT.close()
> >
> > OUTF1 = open(outfile1, 'wb')
> > OUTF1.write(header)
> >
> > OUTF2 = open(outfile2, 'wb')
> > ary.tofile(OUTF2)
> >
> > When I try to use the above on files over 2Gb I get:
> >
> > OverflowError: requested number of bytes is more than a Python string
> > can hold
> >
> > Does anybody have an idea as to how I can get by this hurdle?
> >
> > I am working in an environment that does not allow me to freely
> > download modules to use. Python version 2.5.1
> >
> > R/S --
> > ---------------------------------
> > Derek Tracy
> > [EMAIL PROTECTED]
> > ---------------------------------
>
> I now have 2 solutions, one using partial and the other using array.
> Both are clocking in at the same time (1m 5s for 2.6Gb); are there any
> ways I can optimize either solution? Would turning off the read/write
> buffering increase speed?
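For reference, the usual way past that OverflowError is to never read the
whole payload in one call: copy it in bounded chunks instead. A minimal
sketch of the header/body split, assuming the infile/outfile1/outfile2
names from the post (the values below are placeholders) and an arbitrary
64Kb chunk size:

infile, outfile1, outfile2 = 'big.dat', 'header.dat', 'body.dat'  # placeholders
HEADER_SIZE = 169088          # header size from the post
CHUNK = 64 * 1024             # assumed chunk size; tune to taste

inp = open(infile, 'rb')
try:
    # The header is small, so one read() is fine here.
    out1 = open(outfile1, 'wb')
    out1.write(inp.read(HEADER_SIZE))
    out1.close()

    # Copy the remainder without ever holding more than CHUNK bytes.
    out2 = open(outfile2, 'wb')
    chunk = inp.read(CHUNK)
    while chunk:
        out2.write(chunk)
        chunk = inp.read(CHUNK)
    out2.close()
finally:
    inp.close()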
You may try to increase the buffering size when you open() the file and
see if this helps:

from functools import partial

def iterchunks(filename, buffering):
    return iter(partial(open(filename, buffering=buffering).read,
                        buffering), '')

for chunk in iterchunks(filename, 32*1024): pass
#for chunk in iterchunks(filename, 1024**2): pass
#for chunk in iterchunks(filename, 10*1024**2): pass

George
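A rough harness for comparing those buffer sizes, in case it helps; this
is only a sketch: the filename is a placeholder, the timing is coarse, and
this variant of iterchunks opens the file in binary mode since the data in
question is binary:

import time
from functools import partial

def iterchunks(filename, buffering):
    # Yield successive blocks of at most `buffering` bytes.
    return iter(partial(open(filename, 'rb', buffering).read, buffering), '')

for buffering in (32 * 1024, 1024 ** 2, 10 * 1024 ** 2):
    start = time.time()
    for chunk in iterchunks('bigfile.dat', buffering):   # placeholder name
        pass
    print '%9d-byte buffer: %.1fs' % (buffering, time.time() - start)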