New submission from Larry Hastings:

Matt Mackall brought this over to me today.

If list "l" has one million pairs and you do
  dict(l)
it creates the dict, then resizes it to a million elements, then starts adding 
name/value pairs.  But since it's sized to a million elements, it gets beyond 
its "fill ratio" comfort level of 666k and resizes.  And when you're up to a 
million elements, the resize is quite slow.  It should have internally resized 
the dict correctly from the beginning such that no resize was necessary.

The same is true for dict.fromkeys--passed in a list of 1m elements, it uses an 
initial size of 1m, not 1.5m.

Attached is a sample patch to change the behavior of both these operations so 
no resizes are needed during the insertions.  I haven't checked that it 
actually fixes the behavior, I just made the change and ran the regression 
test.  (I'm oversubscribed here at the sprints, and am kind of rushing this 
diff out just to get the conversation rolling.)

Benjamin: what do you think?  Would it be appropriate to make a change like 
this in 2.7?

Python 3 (or at least 3.5 trunk) is fancy here.  It starts with the minimum 
size of a dict, then iterates until the new size is > 1.5*the  number of 
elements.  This means the dicts it creates are of the same size as they would 
be if we'd started with a minsize dict and inserted one element at a time.  
This might help minimize wear and tear on the small block allocator for smaller 
dicts.

BTW, the internal function _PyDict_NewPresize should really have taken the fill 
ratio into account too.  But, even though it's an internal-only function, it's 
probably too late to change its behavior.

----------
components: Interpreter Core
messages: 241179
nosy: benjamin.peterson, larry, marmoute, mpm
priority: normal
severity: normal
stage: patch review
status: open
title: dict(list) and dict.fromkeys() doesn't account for 2/3 fill ratio
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23971>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to