On Apr 2, 2:09 pm, "Derek Tracy" <[EMAIL PROTECTED]> wrote:
> On Wed, Apr 2, 2008 at 10:59 AM, Derek Tracy <[EMAIL PROTECTED]> wrote:
> > I am trying to write a script that reads in a large binary file (over
> > 2Gb), saves the header (169088 bytes) into one file, then takes the
> > rest of the data and dumps it into another file. I generated code that
> > works wonderfully for files under 2Gb in size, but the majority of the
> > files I am dealing with are over the 2Gb limit.
> >
> > import array
> >
> > INPUT = open(infile, 'rb')
> > header = INPUT.read(169088)
> >
> > ary = array.array('H', INPUT.read())
> >
> > INPUT.close()
> >
> > OUTF1 = open(outfile1, 'wb')
> > OUTF1.write(header)
> >
> > OUTF2 = open(outfile2, 'wb')
> > ary.tofile(OUTF2)
> >
> > When I try to use the above on files over 2Gb I get:
> >
> > OverflowError: requested number of bytes is more than a Python string
> > can hold
> >
> > Does anybody have an idea as to how I can get by this hurdle?
> >
> > I am working in an environment that does not allow me to freely
> > download modules to use. Python version 2.5.1
> >
> > R/S --
> > ---------------------------------
> > Derek Tracy
> > [EMAIL PROTECTED]
> > ---------------------------------
>
> I now have 2 solutions, one using partial and the other using array.
> Both are clocking in at the same time (1m 5s for 2.6Gb); are there any
> ways I can optimize either solution? Would turning off the read/write
> buffering increase speed?
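For reference, the usual way past that OverflowError is to never read the
whole payload in one call: copy it in bounded chunks instead. A minimal
sketch of the header/body split, assuming the infile/outfile1/outfile2
names from the post (the values below are placeholders) and an arbitrary
64Kb chunk size:

infile, outfile1, outfile2 = 'big.dat', 'header.dat', 'body.dat'  # placeholders
HEADER_SIZE = 169088          # header size from the post
CHUNK = 64 * 1024             # assumed chunk size; tune to taste

inp = open(infile, 'rb')
try:
    # The header is small, so one read() is fine here.
    out1 = open(outfile1, 'wb')
    out1.write(inp.read(HEADER_SIZE))
    out1.close()

    # Copy the remainder without ever holding more than CHUNK bytes.
    out2 = open(outfile2, 'wb')
    chunk = inp.read(CHUNK)
    while chunk:
        out2.write(chunk)
        chunk = inp.read(CHUNK)
    out2.close()
finally:
    inp.close()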
You may try to increase the buffering size when you open() the file and
see if this helps:

from functools import partial

def iterchunks(filename, buffering):
    return iter(partial(open(filename, buffering=buffering).read,
                        buffering), '')

for chunk in iterchunks(filename, 32*1024): pass
#for chunk in iterchunks(filename, 1024**2): pass
#for chunk in iterchunks(filename, 10*1024**2): pass

George
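A rough harness for comparing those buffer sizes, in case it helps; this
is only a sketch: the filename is a placeholder, the timing is coarse, and
this variant of iterchunks opens the file in binary mode since the data in
question is binary:

import time
from functools import partial

def iterchunks(filename, buffering):
    # Yield successive blocks of at most `buffering` bytes.
    return iter(partial(open(filename, 'rb', buffering).read, buffering), '')

for buffering in (32 * 1024, 1024 ** 2, 10 * 1024 ** 2):
    start = time.time()
    for chunk in iterchunks('bigfile.dat', buffering):   # placeholder name
        pass
    print '%9d-byte buffer: %.1fs' % (buffering, time.time() - start)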