Dmitry Chichkov added the comment:
Use case: a custom immutable array with a large number of items and indirect
key field access. For example, ctypes.array, memoryview, ctypes.pointer, or any
other custom container.
1. I'm not sure how anyone can consider a precached key array as a righ
Dmitry Chichkov added the comment:
Yes, it looks like you are right. And while there is some slight performance
degradation, at least nothing drastic is happening up to 30M keys. Using your
modified test:
1000 words (961 keys), 3609555 words/s, 19239926 lookups/s, 51 bytes/key
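For readers who want to produce numbers of this shape themselves, here is a minimal sketch of such a measurement. It is not the attached dc.dict.bench script: the function name bench_dict is hypothetical, it assumes Python 3, and it uses integer keys rather than words, so absolute figures will differ from the ones quoted above.

```python
import sys
import time


def bench_dict(n_keys):
    """Build a dict with n_keys integer keys, then time a lookup of every key.

    Hypothetical helper for illustration only; not the benchmark attached
    to the tracker issue.
    """
    keys = list(range(n_keys))

    start = time.perf_counter()
    d = {k: k for k in keys}
    build_s = time.perf_counter() - start

    start = time.perf_counter()
    hits = 0
    for k in keys:
        if k in d:
            hits += 1
    lookup_s = time.perf_counter() - start

    return {
        "keys": len(d),
        "hits": hits,
        "inserts_per_s": n_keys / build_s if build_s else float("inf"),
        "lookups_per_s": n_keys / lookup_s if lookup_s else float("inf"),
        # sys.getsizeof reports only the dict's own table,
        # not the key and value objects it refers to.
        "table_bytes_per_key": sys.getsizeof(d) / len(d),
    }


if __name__ == "__main__":
    for n in (1000, 100000):
        r = bench_dict(n)
        print("%d keys, %.0f inserts/s, %.0f lookups/s, %.1f table bytes/key"
              % (r["keys"], r["inserts_per_s"], r["lookups_per_s"],
                 r["table_bytes_per_key"]))
```

If lookups stay O(1), lookups/s should remain roughly flat as the key count grows; a steady drop at larger sizes would be the deviation worth reporting.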
Changes by Dmitry Chichkov :
Added file: http://bugs.python.org/file18515/dc.dict.bench.0.02.py
___
Python tracker
<http://bugs.python.org/issue9520>
___
___
Python-bug
Dmitry Chichkov added the comment:
Yes. Data containers optimized for very large datasets, compactness and strict
adherence to O(1) can be beneficial.
Python has great high-performance containers, but there is a certain lack of
compact ones. For example, on the x64 machine the following
Dmitry Chichkov added the comment:
No, I'm not simply running out of system memory: this is an 8 GB x64 Linux
machine, and in my test cases I've only seen ~25% of memory utilized. Good idea,
though; I'll try to play with the cyclic garbage collector.
It is harder than I thought to make a solid synt
Dmitry Chichkov added the comment:
Thank you for your comment. Perhaps we should try to separate this into two issues:
1) Bug. Python's dict() is unusable on datasets with 10,000,000+ keys. Here I
should provide a solid test case showing a deviation from O(1);
2) Feature request/idea
New submission from Dmitry Chichkov :
On large data sets (10-100 million keys) the default Python dictionary
implementation fails to meet memory and performance constraints. It also
apparently fails to keep O(1) complexity (after just 1M keys). As such, there
is a need for good, optimized
Dmitry Chichkov added the comment:
Interestingly, in precisely these applications you often don't care about
namespaces at all. Often all you need is to extract 'text' or 'name' elements
regardless of the namespace.
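A minimal sketch of that kind of extraction, assuming Python 3 and the standard xml.etree.ElementTree module. The helper names local_name and iter_text and the sample document are hypothetical. This strips the namespace from each tag after parsing, which is a workaround in user code and not a parser-level option.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Return the tag without its '{namespace-uri}' prefix, if it has one."""
    return tag.rsplit("}", 1)[-1]


def iter_text(source, wanted):
    """Yield (local tag name, text) for elements whose local name is in wanted."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name in wanted:
            yield name, elem.text


# Hypothetical sample: the same local names appear under two namespaces.
SAMPLE = b"""<root xmlns="http://example.com/a" xmlns:b="http://example.com/b">
  <name>first</name>
  <b:name>second</b:name>
  <b:text>body</b:text>
</root>"""

found = list(iter_text(io.BytesIO(SAMPLE), {"name", "text"}))
print(found)  # [('name', 'first'), ('name', 'second'), ('text', 'body')]
```

The cost of this approach is one string split per element, which is why a parser-level switch can be attractive on very large inputs.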
--
Dmitry Chichkov added the comment:
I agree that the argument name choice is poor. But it has already been made by
whoever coded the EXPAT parser which cElementTree.XMLParser wraps. So there is
not much room here.
As to 'proposed feature have to be used with great care by users'
Dmitry Chichkov added the comment:
This patch does not modify the existing behavior of the library. The
namespace_separator parameter is optional. The parameter already exists in the
EXPAT library, but it is hard-coded in the cElementTree.XMLParser code.
Fredrik, yes, namespaces are a
Dmitry Chichkov added the comment:
And obviously iterparse can be either overridden in the local user code or
patched in the library. Here's the iterparse code/test code:
import cElementTree
from cStringIO import StringIO
class iterparse(object):
    root = None
    def __init__(self,
Changes by Dmitry Chichkov :
--
keywords: +patch
Added file: http://bugs.python.org/file17153/issue-8583.patch
___
Python tracker
<http://bugs.python.org/issue8
New submission from Dmitry Chichkov :
The namespace_separator parameter is hard-coded in the cElementTree.XMLParser
class, which rules out ignoring XML namespaces with the cElementTree
library.
Here's the code example:
from xml.etree.cElementTree import iterparse
from StringIO i
Dmitry Chichkov added the comment:
Yes. This patch is nowhere near production level. Unfortunately it works
for me, and at the moment I don't have time to improve it further. The current
version doesn't check the item's width upfront; there is definitely room for
improvement.
Dmitry Chichkov added the comment:
Quick, dirty and utterly incorrect patch that works for me. Includes
issue_5131.patch (defaultdict support, etc). Targets trunk (2.6), revision
77310.
--
keywords: +patch
Added file: http://bugs.python.org/file16640/issue_8228.dirty.patch
New submission from Dmitry Chichkov :
I've run into a case where pprint isn't really pretty.
import pprint
pprint.PrettyPrinter().pprint([1]*100)
This prints a lengthy column of '1's; not pretty at all. Look:
[1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1,
1
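For comparison, a small sketch assuming Python 3.4 or later, where pprint accepts a compact keyword argument. It is shown only to contrast the one-item-per-line layout reported above with a packed layout; it is not the patch attached to this issue.

```python
import pprint

data = [1] * 100

# Default layout: the list does not fit in 80 columns,
# so every item is placed on its own line.
default_text = pprint.pformat(data)

# compact=True packs as many items as fit into each line.
compact_text = pprint.pformat(data, compact=True)

print(len(default_text.splitlines()))  # 100
print(len(compact_text.splitlines()))
print(compact_text)
```

The width argument (80 by default) still bounds each output line in compact mode.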