[issue9520] Add Patricia Trie high performance container

2010-11-13 Thread Raymond Hettinger
Raymond Hettinger added the comment: I don't think a Patricia Trie is going to find its way into the code distribution. It has a better chance as a third-party module listed on PyPI (much in the same way that people access memcached from Python). -- resolution: -> rejected status: o

[issue9520] Add Patricia Trie high performance container

2010-11-11 Thread Łukasz Langa
Changes by Łukasz Langa: -- nosy: +lukasz.langa

[issue9520] Add Patricia Trie high performance container

2010-11-11 Thread Raymond Hettinger
Changes by Raymond Hettinger: -- assignee: -> rhettinger

[issue9520] Add Patricia Trie high performance container

2010-11-11 Thread Dirkjan Ochtman
Changes by Dirkjan Ochtman: -- nosy: +djc

[issue9520] Add Patricia Trie high performance container

2010-08-14 Thread Dmitry Chichkov
Dmitry Chichkov added the comment: Yes, it looks like you are right. And while there is some slight performance degradation, at least nothing drastic is happening up to 30M keys. Using your modified test: 1000 words ( 961 keys), 3609555 words/s, 19239926 lookups/s, 51 bytes/key (0.0M

[issue9520] Add Patricia Trie high performance container

2010-08-14 Thread Antoine Pitrou
Antoine Pitrou added the comment: So, here is the modified benchmark. It first creates a cache of the random wordlist, because it takes quite a while to compute (for N up to 1000). The cache is reused in subsequent runs. This takes some memory, though (probably won't run it if you have less than
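
(For illustration, a minimal sketch of that caching scheme; the cache file name, word length, and alphabet are assumptions, not taken from the attached script.)

import os
import pickle
import random
import string

CACHE = "wordlist.cache.pickle"   # hypothetical cache file name

def random_words(n, length=7):
    # Build n pseudo-random lowercase "words" of the given length.
    letters = string.ascii_lowercase
    return ["".join(random.choice(letters) for _ in range(length))
            for _ in range(n)]

def load_words(n):
    # Reuse the cached wordlist when it is big enough; otherwise rebuild and store it.
    if os.path.exists(CACHE):
        with open(CACHE, "rb") as f:
            words = pickle.load(f)
        if len(words) >= n:
            return words[:n]
    words = random_words(n)
    with open(CACHE, "wb") as f:
        pickle.dump(words, f, pickle.HIGHEST_PROTOCOL)
    return words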

[issue9520] Add Patricia Trie high performance container

2010-08-14 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> For example, on the x64 machine the following dict() mapping
> 10,000,000 very short unicode keys (~7 chars) to integers eats 149
> bytes per entry. This is counting the keys too.
Under 3.x:
>>> d = {}
>>> for i in range(0, 1000): d[str(i)] = i
...
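
(For reference, one way to reproduce that kind of per-entry figure; this is a sketch only. The use of resource.getrusage and the exact loop are assumptions, not necessarily how the number above was obtained, and it needs several GB of RAM.)

import resource

def rss_bytes():
    # Peak resident set size; ru_maxrss is kilobytes on Linux (bytes on OS X).
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss * 1024

before = rss_bytes()
d = {}
for i in range(10000000):     # ~7-character str keys mapped to ints
    d[str(i)] = i
after = rss_bytes()
print("bytes per entry: %.1f" % ((after - before) / len(d)))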

[issue9520] Add Patricia Trie high performance container

2010-08-13 Thread Dmitry Chichkov
Changes by Dmitry Chichkov: Added file: http://bugs.python.org/file18515/dc.dict.bench.0.02.py

[issue9520] Add Patricia Trie high performance container

2010-08-08 Thread Dmitry Chichkov
Dmitry Chichkov added the comment: Yes. Data containers optimized for very large datasets, compactness, and strict adherence to O(1) can be beneficial. Python has great high-performance containers, but there is a certain lack of compact ones. For example, on the x64 machine the following dict() mapping 10,000,000 very short unicode keys (~7 chars) to integers eats 149 bytes per entry. This is counting the keys too.

[issue9520] Add Patricia Trie high performance container

2010-08-08 Thread Terry J. Reedy
Terry J. Reedy added the comment: A decade ago, 8 GB of memory was rare; now it is almost standard for high-end consumer machines. People are now doing bigger computations and stressing the implementation more. So it is plausible that we need to either tweak the core class implementations or (no

[issue9520] Add Patricia Trie high performance container

2010-08-06 Thread Daniel Urban
Changes by Daniel Urban: -- nosy: +durban

[issue9520] Add Patricia Trie high performance container

2010-08-05 Thread Dmitry Chichkov
Dmitry Chichkov added the comment: No, I'm not simply running out of system memory (8 GB/x64/Linux), and in my test cases I've only seen ~25% of memory utilized. Good idea, though; I'll try to play with the cyclic garbage collector. It is harder than I thought to make a solid synthetic test case ad

[issue9520] Add Patricia Trie high performance container

2010-08-05 Thread Antoine Pitrou
Antoine Pitrou added the comment:
> 1) Bug. Python's dict() is unusable on datasets with 10,000,000+ keys.
> Here I should provide a solid test case showing a deviation from O(1);
Try to disable the cyclic garbage collector before the test (gc.disable()). -- nosy: +pitrou
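
(For illustration, a minimal sketch of what that looks like; the insertion loop is a stand-in, not the reporter's benchmark, and a build of this size needs several GB of RAM.)

import gc
import time

gc.disable()                    # keep the cyclic collector out of the timing
try:
    d = {}
    start = time.time()
    for i in range(10000000):
        d[str(i)] = i
    print("build time: %.1f s" % (time.time() - start))
finally:
    gc.enable()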

[issue9520] Add Patricia Trie high performance container

2010-08-05 Thread Mark Dickinson
Mark Dickinson added the comment:
> 1) Bug. Python's dict() is unusable on datasets with 10,000,000+ keys.
> Here I should provide a solid test case showing a deviation from O(1);
That would be helpful. Are you sure that the slow-down you're seeing isn't simply due to running out of system memory?

[issue9520] Add Patricia Trie high performance container

2010-08-05 Thread Dmitry Chichkov
Dmitry Chichkov added the comment: Thank you for your comment. Perhaps we should try to separate this into two issues:
1) Bug. Python's dict() is unusable on datasets with 10,000,000+ keys. Here I should provide a solid test case showing a deviation from O(1);
2) Feature request/idea. Add Patricia Trie high performance container.
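
(For illustration, one shape such a test case could take; the batch size, key format, and total count are assumptions, and the benchmarks actually attached to the issue differ.)

import time

def batch_insert_times(total=30000000, batch=1000000):
    # Time each batch of dict insertions. Roughly constant per-batch times
    # suggest amortized O(1); steadily growing times would show a deviation.
    # Note: 30M str -> int entries need several GB of RAM.
    d = {}
    times = []
    for start in range(0, total, batch):
        t0 = time.time()
        for i in range(start, start + batch):
            d[str(i)] = i
        times.append(time.time() - t0)
    return times

if __name__ == "__main__":
    for n, t in enumerate(batch_insert_times(), 1):
        print("%2dM keys: %.2f s for the last million inserts" % (n, t))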

[issue9520] Add Patricia Trie high performance container

2010-08-05 Thread Éric Araujo
Changes by Éric Araujo: -- type: performance -> feature request

[issue9520] Add Patricia Trie high performance container

2010-08-05 Thread Éric Araujo
Éric Araujo added the comment: Thanks for the request, Dmitry. You may want to read http://www.python.org/dev/peps/pep-0002/ to learn which steps are required to add a module to the stdlib. In particular, the module has to be proven in the field and the author needs to sign a contributor agreement.

[issue9520] Add Patricia Trie high performance container (python's defaultdict(int) is unusable on datasets with 10,000,000+ keys.)

2010-08-04 Thread Ezio Melotti
Ezio Melotti added the comment: I think it would be better to propose this on the python-ideas mailing list. -- nosy: +ezio.melotti, rhettinger type: performance -> feature request versions: -Python 3.1, Python 3.3

[issue9520] Add Patricia Trie high performance container (python's defaultdict(int) is unusable on datasets with 10,000,000+ keys.)

2010-08-04 Thread Dmitry Chichkov
New submission from Dmitry Chichkov: On large data sets (10-100 million keys) the default Python dictionary implementation fails to meet memory and performance constraints. It also apparently fails to keep O(1) complexity (after just 1M keys). As such, there is a need for good, optimized, pra
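
(For reference, a toy sketch of the kind of structure being proposed: a path-compressed, Patricia-style radix trie in pure Python. Class and method names are illustrative; this is not the implementation attached to the issue, and a real candidate would be a compact C extension.)

class _Node(object):
    __slots__ = ("children", "value", "has_value")
    def __init__(self):
        self.children = {}      # first char of edge label -> (label, child)
        self.value = None
        self.has_value = False

class RadixTrie(object):
    # Toy mapping with path-compressed edges (shared key prefixes stored once).

    def __init__(self):
        self._root = _Node()

    def __setitem__(self, key, value):
        node, rest = self._root, key
        while rest:
            entry = node.children.get(rest[0])
            if entry is None:
                # No edge starts with this character: hang the rest on a new leaf.
                leaf = _Node()
                node.children[rest[0]] = (rest, leaf)
                node, rest = leaf, ""
                break
            label, child = entry
            i = 0
            while i < len(label) and i < len(rest) and label[i] == rest[i]:
                i += 1
            if i == len(label):
                node, rest = child, rest[i:]        # whole edge matched, descend
            else:
                # Partial match: split the edge at the divergence point.
                mid = _Node()
                node.children[rest[0]] = (label[:i], mid)
                mid.children[label[i]] = (label[i:], child)
                node, rest = mid, rest[i:]
        node.value, node.has_value = value, True

    def __getitem__(self, key):
        node, rest = self._root, key
        while rest:
            entry = node.children.get(rest[0])
            if entry is None or not rest.startswith(entry[0]):
                raise KeyError(key)
            label, node = entry
            rest = rest[len(label):]
        if not node.has_value:
            raise KeyError(key)
        return node.value

t = RadixTrie()
t["romane"] = 1
t["romanus"] = 2
print(t["romanus"])    # -> 2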