As nice as it would be to use 64-bit offsets, I am instead mmapping the file in 1GB chunks, and that gets me the results I need. I would still be interested in a 64-bit solution, though.
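For anyone who wants to try the same workaround, here is a minimal sketch of the chunked approach, not necessarily the exact code I ended up with. The function name find_offsets and the max_match_len parameter are made up for illustration. It assumes no single match is longer than max_match_len bytes, so each window is mapped with that much overlap and a match that straddles a chunk boundary is still seen whole. Each window stays well under 2GB, and the absolute offset is the window's base plus the match's start, which is an ordinary Python integer and so not limited to 32 bits.

```python
import mmap
import os
import re


def find_offsets(path, pattern, chunk_size=1 << 30, max_match_len=4096):
    """Yield absolute start offsets of `pattern` (a bytes regex) in the file.

    Assumption: no match is longer than max_match_len bytes.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    # mmap offsets must be multiples of the allocation granularity.
    chunk_size = max(gran, chunk_size - chunk_size % gran)
    regex = re.compile(pattern)
    file_size = os.path.getsize(path)

    with open(path, "rb") as f:
        base = 0
        while base < file_size:
            # Map one chunk plus an overlap so boundary matches are complete.
            length = min(chunk_size + max_match_len, file_size - base)
            window = mmap.mmap(f.fileno(), length,
                               access=mmap.ACCESS_READ, offset=base)
            try:
                # Collect the starts first so nothing still refers to the
                # window's buffer when it is closed.
                starts = [m.start() for m in regex.finditer(window)]
            finally:
                window.close()
            for start in starts:
                # A match starting in the overlap belongs to the next
                # window; skipping it here avoids reporting it twice.
                if start < chunk_size:
                    yield base + start
            base += chunk_size
```

Something like `for offset in find_offsets(file_path, b"pattern"): write_to_sqlite(offset)` would then replace the original loop. One caveat: every window starts a fresh scan, so for patterns that can overlap themselves, or greedy patterns that could run past the overlap, the results may differ from a single finditer() pass over the whole file.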
jt

On Wed, Oct 6, 2010 at 2:41 PM, jay thompson <jayryan.thomp...@gmail.com> wrote:
> Hello everyone,
>
> I'm trying to extract some data from a large memory mapped file (the
> largest is ~30GB) with re.finditer() and re.start(). Python's regular
> expression module is great but the size of re.start() is 32 bits (signed, so I
> can really only address 2GB). I was wondering if anyone here had some
> suggestions on how to get the long offsets I need. btw... I can't break up
> the file because the pattern I'm looking for can occur anywhere and on any
> boundary.
>
> Also, is seek() limited to 32-bit addresses?
>
> This is what I have in Python 2.7 AMD64:
>
> with open(file_path, 'r+b') as file:
>     file_map = mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ)
>     file_map.seek(0)
>
>     pattern = re.compile("pattern")
>
>     for iii in re.finditer(pattern, file_map):
>         offset = iii.start()
>         write_to_sqlite(offset)

--
"It's quite difficult to remind people that all this stuff was here for a
million years before people. So the idea that we are required to manage it
is ridiculous. What we are having to manage is us." ...Bill Ballantine,
marine biologist.
-- http://mail.python.org/mailman/listinfo/python-list