[issue13903] New shared-keys dictionary implementation

Mark Shannon Wed, 29 Feb 2012 08:24:30 -0800

Mark Shannon <m...@hotpy.org> added the comment:

Jim Jewett wrote:
> Jim Jewett <jimjjew...@gmail.com> added the comment:
> 
> As of Feb 28, 2012, the PEP mentions an additional optimization of storing 
> the values in an array indexed by (effectively) key insertion order, rather 
> than key position. ("Alternative Implementation")
> 
> It states that this would reduce memory usage for the values array by 1/3.  
> 1/3 is a worst-case measurement; average is 1/2.  (At savings of less than 
> 1/3, the keys would resize, to initial savings of 2/3.  And yes, that means 
> in practice, the average savings would be greater than half, because the 
> frequency of dicts of size N decreases with N.)
>


I presume you mean to allocate a values array of size == actual keys
rather than size == usable keys.

Making the  values array smaller than the the number of usable keys
causes a number of issues:

1. The values_size will need to be kept in the dict, not in the keys, 
which would make the dict larger for size 3 or 5, the same size for
size 2 or 4, but it would save memory for sizes 6-8 (see below).
So it might not save memory at all, it depends on the distribution of 
the sizes of shared-key dicts.
2. dk_usable would need to be reverted to dk_fill (c.f ma_fill) in order
to quickly allocate values arrays. This would slow down the resizing 
check, which is now done before insertion, so might be slower.
(not really a problem, but it might slow down inserts)
3. An extra check needs to be performed for each read to ensure the 
index is in-bounds
(again not really a problem, but it might slow down reads)

Comparative sizes of values array (including size field):

Items Proposed    Alternative    Other Alternative    Delta
                (size == usuable) (size == keys (+1))
   1      4           3                 2              -1       
   2      4           3                 3               0
   3      4           3                 4              +1
   4      8           5                 5               0
   5      8           5                 6              +1
   6     16          10                 7              -3
   7     16          10                 8              -2
   8     16          10                 9              -1
   9     16          10                10               0
  10     16          10                11              +1
  12     32          21                13              -8
  14     32          21                15              -6

The memory savings of the two alternative implementations are broadly
similar.

Clearly, either of the alternatives are attractive in terms of memory 
use. I think it is definitely worth investigating further, but I would 
like to get the proposed implementation into CPython first.

> It states that the keys table would need an additional "values_size" field, 
> but in the absence of dummies, this is just ma_used.

values_size != ma_used
values_size is the size of the values array, not the number of values.

Don't forget deletions or the case where items are inserted in a 
different order from that of another dict sharing the same keys.
Although there are no dummies, (key != NULL, value == NULL) is a legal
pair representing a missing or deleted value.

Cheers,
Mark.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13903>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue13903] New shared-keys dictionary implementation

Reply via email to