On Aug 6, 6:56 pm, dmtr <dchich...@gmail.com> wrote:
> > > Well... 63 bytes per item for very short unicode strings... Is there
> > > any way to do better than that? Perhaps some compact unicode objects?
>
> > There is a certain price you pay for having full-feature Python objects.
>
> Are there any *compact* Python objects? Optimized for compactness?
Yes, but probably not in the way that'd be useful to you. Look at the array module, and also consider the third-party numpy library. They store compact arrays of numeric types (mostly), but they have character type storage as well. That probably won't help you, though, since you have variable-length strings. I don't know of any third-party types that can do what you want, but there might be some. Search PyPI.

> > What are you trying to accomplish anyway? Maybe the array module can be
> > of some help. Or numpy?
>
> Ultimately a dict that can store ~20,000,000 entries:
> (u'short string' : (int, int, int, int, int, int, int)).

My recommendation would be to use sqlite3. Only if you know for sure that it's too slow--meaning that you've actually tried it and it was too slow, and nothing else--should you bother with a custom in-memory structure.

For that I'd probably go with binary search over sorted arrays rather than a hash. So you have a huge numpy character array that stores all 20 million short strings end-to-end (in lexical order, so that you can look up the strings with a binary search), then you have a numpy integer array that stores the indices into this string where the word boundaries are, and then an Nx7 numpy integer array storing the int return values. That's three compact arrays.

Carl Banks
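A rough sketch of the sqlite3 approach, assuming a single table keyed by the short string with seven integer columns (the table and column names here are invented for illustration):

import sqlite3

# One table: the short string is the primary key, the seven ints are columns.
conn = sqlite3.connect("entries.db")   # or ":memory:" to benchmark without the disk
conn.execute("""
    CREATE TABLE IF NOT EXISTS entries (
        word TEXT PRIMARY KEY,
        a INTEGER, b INTEGER, c INTEGER, d INTEGER,
        e INTEGER, f INTEGER, g INTEGER
    )
""")

def put(word, values):
    # values is the 7-tuple of ints
    conn.execute("INSERT OR REPLACE INTO entries VALUES (?,?,?,?,?,?,?,?)",
                 (word,) + tuple(values))

def get(word):
    # returns the 7-tuple, or None if the word isn't present
    return conn.execute(
        "SELECT a, b, c, d, e, f, g FROM entries WHERE word = ?",
        (word,)).fetchone()

put(u"short string", (1, 2, 3, 4, 5, 6, 7))
conn.commit()
print(get(u"short string"))    # (1, 2, 3, 4, 5, 6, 7)

The PRIMARY KEY gives you an index for the lookups, and the 20,000,000 rows live on disk rather than as Python objects in memory.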
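And a small sketch of the three-array layout, with the toy data and names made up; for brevity the concatenated buffer is a plain unicode string here rather than a numpy character array:

import numpy as np

words  = [u"apple", u"banana", u"cherry"]          # already in lexical order
values = [(1, 2, 3, 4, 5, 6, 7),
          (8, 9, 10, 11, 12, 13, 14),
          (15, 16, 17, 18, 19, 20, 21)]

# 1) all strings stored end-to-end in one buffer
blob = u"".join(words)
# 2) integer array of word boundaries: word i is blob[starts[i]:starts[i+1]]
starts = np.zeros(len(words) + 1, dtype=np.int64)
starts[1:] = np.cumsum([len(w) for w in words])
# 3) Nx7 integer array of the associated values
table = np.array(values, dtype=np.int64)

def lookup(word):
    # Binary search over the sorted, concatenated keys.
    lo, hi = 0, len(words)
    while lo < hi:
        mid = (lo + hi) // 2
        key = blob[starts[mid]:starts[mid + 1]]
        if key < word:
            lo = mid + 1
        elif key > word:
            hi = mid
        else:
            return tuple(int(x) for x in table[mid])
    return None

print(lookup(u"banana"))    # (8, 9, 10, 11, 12, 13, 14)

The point is that none of the 20 million strings exists as a separate Python object, so the per-object overhead the thread started with disappears.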