dmtr wrote:
>> > Well... 63 bytes per item for very short unicode strings... Is there
>> > any way to do better than that? Perhaps some compact unicode objects?
>>
>> There is a certain price you pay for having full-feature Python objects.
>
> Are there any *compact* Python objects? Optimized for compactness?
>
>> What are you trying to accomplish anyway? Maybe the array module can be
>> of some help. Or numpy?
>
> Ultimately a dict that can store ~20,000,000 entries: (u'short
> string' : (int, int, int, int, int, int, int)).
I don't know to what extent it still applies, but switching off cyclic
garbage collection with

    import gc
    gc.disable()

while building large data structures used to speed things up
significantly. That's what I would try first with your real data.

Encoding your unicode strings as UTF-8 could save some memory.

When your integers fit into two bytes, say, you can use an
array.array() instead of the tuple.
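A rough sketch combining the three ideas (untested; it assumes the seven
integers fit into signed 16-bit values and that the keys are unicode
strings as in your example -- the function and variable names are just
for illustration):

    import gc
    from array import array

    def build_table(pairs):
        # pairs: iterable of (unicode_key, 7-tuple of ints)
        gc.disable()                # skip cyclic GC while building
        try:
            table = {}
            for key, values in pairs:
                # UTF-8 byte string as key, compact 2-byte ints as value
                table[key.encode("utf-8")] = array("h", values)
            return table
        finally:
            gc.enable()             # restore normal collection afterwards

    # look up a key by encoding the query string the same way:
    # table[u"short string".encode("utf-8")]

array("h") stores each value in two bytes instead of a full int object,
and encoding the keys avoids the per-key unicode object overhead.

Peter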