Mark Shannon <m...@hotpy.org> added the comment: Jim Jewett wrote: > Jim Jewett <jimjjew...@gmail.com> added the comment: > > As of Feb 28, 2012, the PEP mentions an additional optimization of storing > the values in an array indexed by (effectively) key insertion order, rather > than key position. ("Alternative Implementation") > > It states that this would reduce memory usage for the values array by 1/3. > 1/3 is a worst-case measurement; average is 1/2. (At savings of less than > 1/3, the keys would resize, to initial savings of 2/3. And yes, that means > in practice, the average savings would be greater than half, because the > frequency of dicts of size N decreases with N.) >
I presume you mean to allocate a values array of size == actual keys rather than size == usable keys. Making the values array smaller than the the number of usable keys causes a number of issues: 1. The values_size will need to be kept in the dict, not in the keys, which would make the dict larger for size 3 or 5, the same size for size 2 or 4, but it would save memory for sizes 6-8 (see below). So it might not save memory at all, it depends on the distribution of the sizes of shared-key dicts. 2. dk_usable would need to be reverted to dk_fill (c.f ma_fill) in order to quickly allocate values arrays. This would slow down the resizing check, which is now done before insertion, so might be slower. (not really a problem, but it might slow down inserts) 3. An extra check needs to be performed for each read to ensure the index is in-bounds (again not really a problem, but it might slow down reads) Comparative sizes of values array (including size field): Items Proposed Alternative Other Alternative Delta (size == usuable) (size == keys (+1)) 1 4 3 2 -1 2 4 3 3 0 3 4 3 4 +1 4 8 5 5 0 5 8 5 6 +1 6 16 10 7 -3 7 16 10 8 -2 8 16 10 9 -1 9 16 10 10 0 10 16 10 11 +1 12 32 21 13 -8 14 32 21 15 -6 The memory savings of the two alternative implementations are broadly similar. Clearly, either of the alternatives are attractive in terms of memory use. I think it is definitely worth investigating further, but I would like to get the proposed implementation into CPython first. > It states that the keys table would need an additional "values_size" field, > but in the absence of dummies, this is just ma_used. values_size != ma_used values_size is the size of the values array, not the number of values. Don't forget deletions or the case where items are inserted in a different order from that of another dict sharing the same keys. Although there are no dummies, (key != NULL, value == NULL) is a legal pair representing a missing or deleted value. Cheers, Mark. ---------- _______________________________________ Python tracker <rep...@bugs.python.org> <http://bugs.python.org/issue13903> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com