MrsEntity wrote:
> Based on heapy, a db based solution would be serious overkill.

I've embraced overkill and my life is better for it. Don't confuse overkill 
with cost. Overkill is your friend.

The facts of the case: You need to save some derived strings for each of 2M 
input lines. Even half the input runs over the 2GB RAM in your (virtual) 
machine. You're using Ubuntu 12.04 in Virtualbox on Win7/64, Python 2.7/64.

That screams "sqlite3". It's overkill, in a good way. It's already there for 
the importing.
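
A minimal sketch of the sqlite3 route, for illustration only. The table layout, the batch size, and the derive() function are hypothetical stand-ins, since I don't know what your real per-line processing produces. The point is that only one batch of rows lives in RAM at a time.

```python
import sqlite3

def derive(line):
    # Hypothetical stand-in for the real per-line processing.
    return line.upper(), line[::-1]

def store_derived(conn, lines, batch_size=10000):
    """Write the derived strings for each input line to an sqlite3 table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS derived "
        "(line_no INTEGER PRIMARY KEY, first TEXT, second TEXT)"
    )
    insert = "INSERT OR REPLACE INTO derived VALUES (?, ?, ?)"
    batch = []
    for line_no, line in enumerate(lines, 1):
        first, second = derive(line.rstrip("\n"))
        batch.append((line_no, first, second))
        if len(batch) >= batch_size:
            # Flush to disk so the batch can be garbage-collected.
            conn.executemany(insert, batch)
            conn.commit()
            batch = []
    if batch:
        # Write whatever is left over after the last full batch.
        conn.executemany(insert, batch)
        conn.commit()
```

Typical use would be to open the database with sqlite3.connect() on a file path and pass the open input file as `lines`, since iterating a file object yields one line at a time.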

Other approaches? You could try to keep everything in RAM, but use less. Tim 
Chase pointed out the memory-efficiency of named tuples. You could save some 
more by switching to Win7/32, Python 2.7/32; VirtualBox makes trying such 
alternatives quick and easy.
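
A rough way to see the named-tuple saving for yourself, assuming the baseline is one dict per input line (I'm guessing at that, and the field names here are made up). Note that sys.getsizeof() measures only the container, not the strings it refers to.

```python
import sys
from collections import namedtuple

# Hypothetical record layout: one key plus two derived strings.
Record = namedtuple("Record", ["key", "first", "second"])

as_dict = {"key": "k1", "first": "abc", "second": "def"}
as_tuple = Record(key="k1", first="abc", second="def")

# Size of the container object itself, excluding the strings it holds.
dict_size = sys.getsizeof(as_dict)
tuple_size = sys.getsizeof(as_tuple)

print("dict: %d bytes, namedtuple: %d bytes" % (dict_size, tuple_size))
```

The exact numbers depend on the Python version and on 32-bit versus 64-bit, so run it on the build you actually use.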

Or you could add memory. Compared to good old 32-bit, 64-bit operation consumes 
significantly more memory and supports vastly more memory. There's a bit of a 
mismatch in a 64-bit system with just 2GB of RAM. I know, sounds weird, "just" 
two billion bytes of RAM. I'll rephrase: just ten dollars' worth of RAM. Less if 
you buy it where I do.

I don't know why the memory profiling tools are misleading you. I can think of 
plausible explanations, but they'd just be guesses. There's nothing all that 
surprising in running out of RAM, given what you've explained. A couple K per 
line is easy to burn. 
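
The back-of-envelope version, taking "a couple K" to mean a round 2 KiB per line (my assumption, not a measurement):

```python
lines = 2 * 1000 * 1000        # 2M input lines
bytes_per_line = 2 * 1024      # assumed round figure for "a couple K"

total_bytes = lines * bytes_per_line
total_gib = total_bytes / float(2 ** 30)

print("all lines:  %.1f GiB" % total_gib)
print("half of it: %.1f GiB" % (total_gib / 2))
```

That comes to nearly 4 GiB for the whole input and nearly 2 GiB for half of it, before the OS and the interpreter take their share.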

-Bryan
-- 
http://mail.python.org/mailman/listinfo/python-list
