On Wed, Mar 19, 2014 at 7:16 PM, Rhodri James wrote:
> 65536 is a suspiciously round number. You might want to double-
> check that there's no 16-bit overflow causing something unexpected.
It's because I'm using powers of 2. All the numbers in the report are
round in hex.
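For readers skimming the archive, a minimal sketch of the doubling scheme being described: keep doubling the workload size until a run exceeds the cutoff and report the last size that finished in time. The 120-second cutoff comes from the results quoted below; run_workload is a placeholder, not the actual test script.

    import time

    def largest_size_within_cutoff(run_workload, cutoff=120.0):
        """Return (size, duration) for the largest power-of-two workload
        that still completes within `cutoff` seconds."""
        size, last_ok = 1024, None
        while True:
            start = time.monotonic()
            run_workload(size)                  # placeholder for the real test
            duration = time.monotonic() - start
            if duration > cutoff:
                return last_ok                  # the previous, passing run
            last_ok = (size, duration)
            size *= 2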
On Tue, 18 Mar 2014 21:45:52 -, Dan Stromberg
wrote:
blist.sorteddict was able to do 65536 operations on a dictionary
before taking more than 120 seconds to complete - it took 77.3
seconds to do 65536 operations.
65536 is a suspiciously round number. You might want to double-
check that there's no 16-bit overflow causing something unexpected.
On 03/18/2014 06:15 PM, Steven D'Aprano wrote:
On Tue, 18 Mar 2014 15:21:28 -0700, Dan Stromberg wrote:
On Tue, Mar 18, 2014 at 3:03 PM, Marko Rauhamaa
wrote:
Dan Stromberg :
For a proper comparison, I'd like a fixed, identical dataset and set of
operations run against each data structure.
How about this test program:
In article <53299eac$0$29994$c3e8da3$54964...@news.astraweb.com>,
Steven D'Aprano wrote:
> If you have a million items, then the odds that those
> million items happen to be sorted (the worst-case order) are 1 in a
> million factorial. That's a rather small number, small enough that we can
>
Steven D'Aprano :
> Please re-read what I wrote. I didn't say "if your data comes to you
> fully randomized". I said "if you are in a position to randomize the
> data before storing it". In other words, if you actively randomize the
> data yourself.
Yes, you could implement a "hash table" that wa
On Wed, 19 Mar 2014 10:49:33 +0200, Marko Rauhamaa wrote:
> Steven D'Aprano :
>
>> If you are in a position to randomize the data before storing it in the
>> tree, an unbalanced binary tree is a solid contender.
>
> Applications that can assume randomly distributed data are exceedingly
> rare
Steven D'Aprano :
> If you are in a position to randomize the data before storing it in
> the tree, an unbalanced binary tree is a solid contender.
Applications that can assume randomly distributed data are exceedingly
rare and even then, you can't easily discount the possibility of
worst-case or
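To make the "randomize the data before storing it" idea concrete, something along these lines is presumably what Steven has in mind; the tree code here is a generic unbalanced-BST sketch, not anyone's actual implementation.

    import random

    class _Node:
        __slots__ = ("key", "left", "right")
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def _insert(node, key):
        # Plain, unbalanced BST insertion; duplicates are ignored.
        if node is None:
            return _Node(key)
        if key < node.key:
            node.left = _insert(node.left, key)
        elif key > node.key:
            node.right = _insert(node.right, key)
        return node

    def build_shuffled_bst(items):
        """Shuffle before inserting, so an already-sorted input is
        vanishingly unlikely to degenerate into an O(n) chain."""
        items = list(items)
        random.shuffle(items)
        root = None
        for item in items:
            root = _insert(root, item)
        return root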
On Tue, 18 Mar 2014 15:21:28 -0700, Dan Stromberg wrote:
> On Tue, Mar 18, 2014 at 3:03 PM, Marko Rauhamaa
> wrote:
>> Dan Stromberg :
>> For a proper comparison, I'd like a fixed, identical dataset and set of
>> operations run against each data structure.
>>
>> How about this test program:
>
>
On Wed, 19 Mar 2014 01:11:33 +0200, Marko Rauhamaa wrote:
> Dan Stromberg :
>> Rather than throw out unbalanced binary tree altogether, it makes more
>> sense to run it until it gets "too slow".
>
> I disagree strongly. You should throw out unbalanced binary trees and
> linked lists and the like
>
> On Tue, Mar 18, 2014 at 1:55 PM, Marko Rauhamaa wrote:
>
Dan Stromberg :
>> > The results are at
>> >
>> http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03/
>
>
Size: 1048576, duration: 75.3, dictionary type: dict
> [...]
> Size: 262144, duration: 66.
Dan Stromberg :
> On Tue, Mar 18, 2014 at 3:03 PM, Marko Rauhamaa wrote:
>> Dan Stromberg :
>> For a proper comparison, I'd like a fixed, identical dataset and set
>> of operations run against each data structure.
>>
>> How about this test program:
>
> I used to do essentially this, but it was time-prohibitive and produced
On Tue, Mar 18, 2014 at 3:03 PM, Marko Rauhamaa wrote:
> Dan Stromberg :
> For a proper comparison, I'd like a fixed, identical dataset and set of
> operations run against each data structure.
>
> How about this test program:
I used to do essentially this, but it was time-prohibitive and
produced
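The test program itself is not reproduced in this archive view, but the shape of what Marko is asking for is roughly the following: generate one fixed operation sequence up front and replay it, unchanged, against every candidate mapping. The candidates listed are placeholders; anything with a dict-like interface would slot in.

    import random
    import timeit

    random.seed(0)
    OPS = [(random.choice(("get", "set")), random.randrange(10**6))
           for _ in range(10**5)]              # one fixed, shared workload

    def replay(mapping_factory):
        m = mapping_factory()
        for op, key in OPS:
            if op == "set":
                m[key] = key
            else:
                m.get(key)                     # misses are fine; we only time it
        return m

    for factory in (dict,):                    # add AVL/red-black/treap classes here
        seconds = timeit.timeit(lambda: replay(factory), number=1)
        print(factory.__name__, seconds)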
Dan Stromberg :
> dict was able to do 1048576 operations on a dictionary before taking
> more than 120 seconds to complete - it took 75.3 seconds to do 1048576
> operations.
>
> AVL_tree was able to do 262144 operations on a dictionary before
> taking more than 120 seconds to complete - it took 66
On Tue, Mar 18, 2014 at 1:55 PM, Marko Rauhamaa wrote:
> Dan Stromberg :
>
>> The results are at
>> http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03/
>
> Unfortunately I'm having a hard time understanding the results.
>
> The 50/50 get/set ratio is most interesting to me.
Dan Stromberg :
> The results are at
> http://stromberg.dnsalias.org/~strombrg/python-tree-and-heap-comparison/2014-03/
Unfortunately I'm having a hard time understanding the results.
The 50/50 get/set ratio is most interesting to me.
I'm seeing (under cpython-3.3):
Size: 1048576, duration: 75.3, dictionary type: dict
On Mon, Mar 17, 2014 at 3:05 PM, Marko Rauhamaa wrote:
> Joshua Landau :
>
>> The thing we really need is for the blist containers to become stdlib
>> (but not to replace the current list implementation).
>
> Very interesting. Downloaded blist but didn't compile it yet. It *could*
> be the missing link.
On 18 March 2014 01:01, Daniel Stutzbach wrote:
> I would love to have include macro-benchmarks. I keep waiting for the PyPy
> benchmark suite to get ported to Python 3...
*grins*
>> "Delete a slice" is fudged from its inclusion of multiplication, which
>> is far faster on blists. I admit that
On 17 March 2014 21:16, Daniel Stutzbach wrote:
> On Fri, Mar 14, 2014 at 6:13 PM, Joshua Landau wrote:
>>
>> Now, I understand there are downsides to blist. Particularly, I've
>> looked through the "benchmarks" and they seem untruthful.
>
> I worked hard to make those benchmarks as fair as possible.
Joshua Landau :
> The thing we really need is for the blist containers to become stdlib
> (but not to replace the current list implementation).
Very interesting. Downloaded blist but didn't compile it yet. It *could*
be the missing link.
I would *love* to see some comparative performance results
On 15/03/2014 01:13, Joshua Landau wrote:
On 8 March 2014 20:37, Mark Lawrence wrote:
I've found this link useful http://kmike.ru/python-data-structures/
I also don't want all sorts of data structures added to the Python library.
I believe that there are advantages to leaving specialist data structures on
pypi or other sites, plus it means Python in a Nutshell can still fit in your
pocket and not a 40 ton articulated lorry, unlike the Java equivalent.
On 8 March 2014 20:37, Mark Lawrence wrote:
> I've found this link useful http://kmike.ru/python-data-structures/
>
> I also don't want all sorts of data structures added to the Python library.
> I believe that there are advantages to leaving specialist data structures on
> pypi or other sites, plus it means Python in a Nutshell can still fit in
> your pocket and not a 40 ton articulated lorry, unlike the Java equivalent.
On Thu, 13 Mar 2014 20:12:41 -0700, Dan Stromberg wrote:
> Sorting the same (slightly tweaked) data inside of a tight loop is
> rarely a good idea - despite the fact that the "sort" itself tends to be
> O(n) with Python's rather awesome builtin sorting algorithm. This is
> because sorting inside
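A small illustration of the point being made: if the loop only adds one element at a time, keeping the list sorted with bisect.insort avoids re-running the sort over data that is already in order. The sizes here are arbitrary.

    import bisect
    import random

    data = sorted(random.random() for _ in range(100_000))

    # Re-sorting inside the loop: each sort is cheap for Timsort because the
    # input is almost sorted, but it is still a full O(n) pass per iteration.
    for _ in range(1_000):
        data.append(random.random())
        data.sort()

    # Keeping the list sorted incrementally: O(log n) search plus an O(n)
    # shift, with no repeated pass over the already-sorted prefix.
    for _ in range(1_000):
        bisect.insort(data, random.random())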
On Thu, Mar 13, 2014 at 4:57 PM, Steven D'Aprano
wrote:
> On Mon, 10 Mar 2014 19:34:48 +0200, Marko Rauhamaa wrote:
>
>>> With a high level language like Python, using the provided hash table
>>> will almost always cream any hand-built tree, no matter how
>>> advantageous the data is to the tree.
On Mon, 10 Mar 2014 19:34:48 +0200, Marko Rauhamaa wrote:
>> With a high level language like Python, using the provided hash table
>> will almost always cream any hand-built tree, no matter how
>> advantageous the data is to the tree.
>
> The main thing is there are use cases where order is essential.
On Mon, Mar 10, 2014 at 11:34 AM, Marko Rauhamaa wrote:
> The main thing is there are use cases where order is essential. That's
> why I have had to implement the AVL tree in Python myself. No biggie,
> but a C implementation would probably be much faster. Also, a standard
> version would likely b
On 11-03-14 00:24, Roy Smith wrote:
> In article <8761nmrnfk@elektro.pacujo.net>,
> Marko Rauhamaa wrote:
>
>> Anyway, this whole debate is rather unnecessary since every developer is
>> supposed to have both weapons in their arsenal.
> The problem with having a choice is that it opens up the possibility of
> making the wrong one :-)
On 11/03/2014 8:12 PM, Marko Rauhamaa wrote:
Python should let skilled professionals do their work. Thankfully, for
the most part, it does.
Skilled professionals don't solely rely on the standard library, either.
If you know you need a balanced tree, you'll also know where to find an
implementation.
On Tue, 11 Mar 2014 04:28:25 -, Chris Angelico
wrote:
On Tue, Mar 11, 2014 at 2:38 PM, Ian Kelly wrote:
On Mon, Mar 10, 2014 at 7:45 PM, Chris Angelico
wrote:
No no,
I could make this so much better by using the 80x86 "REP MOVSW"
command (or commands, depending on your point of view)
Steven D'Aprano :
> On Mon, 10 Mar 2014 19:24:07 -0400, Roy Smith wrote:
>
>> In article <8761nmrnfk@elektro.pacujo.net>,
>> Marko Rauhamaa wrote:
>>
>>> Anyway, this whole debate is rather unnecessary since every developer
>>> is supposed to have both weapons in their arsenal.
>>
>> The p
On Mon, 10 Mar 2014 19:24:07 -0400, Roy Smith wrote:
> In article <8761nmrnfk@elektro.pacujo.net>,
> Marko Rauhamaa wrote:
>
>> Anyway, this whole debate is rather unnecessary since every developer
>> is supposed to have both weapons in their arsenal.
>
> The problem with having a choice is that it opens up the possibility of
> making the wrong one :-)
On Tue, Mar 11, 2014 at 1:20 PM, Roy Smith wrote:
> In article <531e6eca$0$29994$c3e8da3$54964...@news.astraweb.com>,
> Steven D'Aprano wrote:
>
>> There's only so much matter in the
>> universe, so talking about limits as the amount of data approaches
>> infinity is nonsense. Where would you st
On Tue, Mar 11, 2014 at 2:38 PM, Ian Kelly wrote:
> On Mon, Mar 10, 2014 at 7:45 PM, Chris Angelico wrote:
>> No no,
>> I could make this so much better by using the 80x86 "REP MOVSW"
>> command (or commands, depending on your point of view). That would be
>> so much better than all those separat
On Mon, Mar 10, 2014 at 7:45 PM, Chris Angelico wrote:
> On Tue, Mar 11, 2014 at 12:26 PM, Steven D'Aprano
> wrote:
>> In my experience, the average developer has an amazing talent for
>> pessimising code when they think they are optimising it.
>
> I remember a number of incidents from personal e
In article <531e6eca$0$29994$c3e8da3$54964...@news.astraweb.com>,
Steven D'Aprano wrote:
> There's only so much matter in the
> universe, so talking about limits as the amount of data approaches
> infinity is nonsense. Where would you store it?
Funny you should ask that. I just finished watc
In article ,
Dan Stromberg wrote:
> On Mon, Mar 10, 2014 at 6:59 AM, Roy Smith wrote:
> > On the other hand, log n, for n = 1 million, is just 20. It's not hard
> > to imagine a hash function which costs 20x what a node traversal does,
> > in which case, the log n lookup is ahead for all n < 1 million.
On Mon, 10 Mar 2014 09:01:23 -0700, Rustom Mody wrote:
> 2. Being pointed out that a finite-input table-lookup being called a
> hash-function is a rather nonsensical claim and goes counter to the
> basic tenets of asymptotic notation. (In CS unlike in math 'asymptote'
> is always infinity) IOW
Th
On Tue, Mar 11, 2014 at 12:26 PM, Steven D'Aprano
wrote:
> In my experience, the average developer has an amazing talent for
> pessimising code when they think they are optimising it.
I remember a number of incidents from personal experience when I was a
*very* average developer. One time, writin
On Mon, 10 Mar 2014 19:24:07 -0400, Roy Smith wrote:
> In article <8761nmrnfk@elektro.pacujo.net>,
> Marko Rauhamaa wrote:
>
>> Anyway, this whole debate is rather unnecessary since every developer
>> is supposed to have both weapons in their arsenal.
>
> The problem with having a choice is that it opens up the possibility of
> making the wrong one :-)
On Mon, Mar 10, 2014 at 6:59 AM, Roy Smith wrote:
> On the other hand, log n, for n = 1 million, is just 20. It's not hard
> to imagine a hash function which costs 20x what a node traversal does,
> in which case, the log n lookup is ahead for all n < 1 million.
FWIW, both the hash table and the
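The arithmetic behind Roy's estimate, for anyone who wants to check it (purely illustrative):

    import math

    n = 1_000_000
    print(math.log2(n))      # ~19.93: a balanced tree visits about 20 nodes

    # If hashing one key really cost about 20x a single node traversal,
    # the two lookups would break even near n = 2**20.
    print(2 ** 20)           # 1048576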
On Tue, Mar 11, 2014 at 10:24 AM, Roy Smith wrote:
> As this discussion has shown, figuring out whether a hash table or a
> tree is better for a given problem is non-trivial. My guess is that if
> you gave 1000 typical developers both data structures and let them pick
> freely, the number of case
In article <8761nmrnfk@elektro.pacujo.net>,
Marko Rauhamaa wrote:
> Anyway, this whole debate is rather unnecessary since every developer is
> supposed to have both weapons in their arsenal.
The problem with having a choice is that it opens up the possibility of
making the wrong one :-)
A
Chris Angelico :
> Supposed to have? What does that mean, a language isn't ISO-compliant
> unless it provides both?
It's an ancient, fundamental data structure, right up there with dynamic
lists. There's no reason it shouldn't be available in every programming
environment.
> With a high level la
On Tue, Mar 11, 2014 at 4:08 AM, Marko Rauhamaa wrote:
> Chris Angelico :
>
>> The only difference between a tree and a hash here is that the tree
>> might be able to short-cut the comparisons. But if there are a whole
>> bunch of songs with the same "song" and "user", then the tree has to
>> comp
Chris Angelico :
> The only difference between a tree and a hash here is that the tree
> might be able to short-cut the comparisons. But if there are a whole
> bunch of songs with the same "song" and "user", then the tree has to
> compare (song->song? same; user->user? same; add_time->add_time?
>
On Tue, Mar 11, 2014 at 3:33 AM, Tim Chase
wrote:
> On 2014-03-11 03:24, Chris Angelico wrote:
>> Imagine, worst case, all one million records have the same
>> song/user/add_time and you need to do twenty comparisons involving
>> four fields. That's gotta be worse than one hashing of five fields.
On 2014-03-11 03:24, Chris Angelico wrote:
> Imagine, worst case, all one million records have the same
> song/user/add_time and you need to do twenty comparisons involving
> four fields. That's gotta be worse than one hashing of five fields.
And if you have one million songs that are indistinguis
On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith wrote:
> Looking at the Songza source, I see we have one user-defined hash
> function:
>
> def __hash__(self):
> return hash((self.song,
> self.user,
> self.add_time,
> self.delet
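The quoted method is cut off above; the general pattern it uses, hashing a tuple of the identifying fields, looks like the sketch below. Class and field names here are illustrative only (anything past add_time is not visible in the quote), and a class used as a dict or set key needs an __eq__ that agrees with its __hash__.

    class PlaylistEntry:
        def __init__(self, song, user, add_time, delete_time):
            self.song = song
            self.user = user
            self.add_time = add_time
            self.delete_time = delete_time

        def _key(self):
            return (self.song, self.user, self.add_time, self.delete_time)

        def __hash__(self):
            # Tuples hash by combining the hashes of their elements.
            return hash(self._key())

        def __eq__(self, other):
            # Must agree with __hash__ for dict/set use.
            return isinstance(other, PlaylistEntry) and self._key() == other._key()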
On Tue, Mar 11, 2014 at 12:59 AM, Roy Smith wrote:
> The hash vs. tree argument can get very complicated. For example, if
> your tree is not completely resident in memory, the cost of paging in a
> node will swamp everything else, and improving lookup speed will boil
> down to reducing the number
On Monday, March 10, 2014 4:27:13 PM UTC+5:30, Ned Batchelder wrote:
> You are right that you and Steven have had a hard time communicating.
> You are part of "you and Steven", it would be at least polite to
> consider that part of the reason for the difficulty has to do with your
> style. It c
In article <87eh2atw6s@elektro.pacujo.net>,
Marko Rauhamaa wrote:
> Steven D'Aprano :
>
> > Proof: I create a hash table that accepts unsigned bytes as keys, where
>
> The O(f(n)) notation has no meaning when n is limited.
>
> This thing is not just pedantry. The discussion was how a balanced tree
> fares in comparison with hash tables.
On 3/10/14 5:41 AM, Marko Rauhamaa wrote:
Steven D'Aprano :
If I am right, that certainly would explain your apparent inability to
understand the difference between "is" and == operators, your
insistence that object IDs are addresses, and your declaration that
object identity is philosophically untenable.
On 10/03/2014 08:53, Steven D'Aprano wrote:
In other words, I suspect you are trolling.
s/suspect/know/ , he didn't make captain of my dream team for nothing,
you know :)
--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.
Mark Lawrence
Steven D'Aprano :
> If I am right, that certainly would explain your apparent inability to
> understand the difference between "is" and == operators, your
> insistence that object IDs are addresses, and your declaration that
> object identity is philosophically untenable.
You and I certainly have
On Mon, 10 Mar 2014 08:16:43 +0200, Marko Rauhamaa wrote:
> Steven D'Aprano :
>
>> Proof: I create a hash table that accepts unsigned bytes as keys, where
>
> The O(f(n)) notation has no meaning when n is limited.
It has obvious meaning: O(1) means that look-ups take constant time, not
(for ex
Steven D'Aprano :
> Proof: I create a hash table that accepts unsigned bytes as keys, where
The O(f(n)) notation has no meaning when n is limited.
This thing is not just pedantry. The discussion was how a balanced tree
fares in comparison with hash tables. A simplistic O(1) vs O(log n) was
pres
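The "unsigned bytes as keys" construction being argued about is truncated above; one way to realize it is plain direct addressing, sketched below. With a fixed, finite key space there are no collisions and no resizing, so every lookup is a constant amount of work no matter how many of the 256 slots are occupied.

    class ByteKeyTable:
        """Mapping whose keys are unsigned bytes (0..255); a sketch only."""

        _EMPTY = object()                      # sentinel, so None can be a value

        def __init__(self):
            self._slots = [self._EMPTY] * 256  # one slot per possible key

        def _check(self, key):
            if not isinstance(key, int) or not 0 <= key <= 255:
                raise KeyError("keys must be unsigned bytes (0..255)")

        def __setitem__(self, key, value):
            self._check(key)
            self._slots[key] = value

        def __getitem__(self, key):
            self._check(key)
            value = self._slots[key]
            if value is self._EMPTY:
                raise KeyError(key)
            return value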
On Sun, 09 Mar 2014 18:04:46 -0600, Ian Kelly wrote:
> On Sun, Mar 9, 2014 at 4:08 PM, Dan Stromberg
> wrote:
>> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa
>> wrote:
>>> Dan Stromberg :
>>>
On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa
wrote:
> There is no O(1) hash table.
>>
On Sun, 09 Mar 2014 23:32:54 +0200, Marko Rauhamaa wrote:
> Dan Stromberg :
>
>> This is not just a detail: O(1) tends to be beat O(logn) pretty easily
>> for large n.
>
> There is no O(1) hash table.
Of course there are.
While it is true that hash tables *in general* are not *always* O(1), th
On Sun, Mar 9, 2014 at 4:08 PM, Dan Stromberg wrote:
> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa wrote:
>> Dan Stromberg :
>>
>>> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa wrote:
There is no O(1) hash table.
>>>
>>> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
Dan Stromberg :
> On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa wrote:
There is no O(1) hash table.
> [...]
>
> it's still amortized O(1).
So we are in perfect agreement.
Hash tables are a useful family of techniques but involve quite a bit of
cost/benefit heuristics. You can only claim O
On Sun, Mar 9, 2014 at 2:43 PM, Marko Rauhamaa wrote:
> Dan Stromberg :
>
>> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa wrote:
>>> There is no O(1) hash table.
>>
>> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
>
> Please elaborate.
A hash table of fixed size is O(
Dan Stromberg :
> On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa wrote:
>> There is no O(1) hash table.
>
> http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
Please elaborate.
Marko
On Sun, Mar 9, 2014 at 2:32 PM, Marko Rauhamaa wrote:
> Dan Stromberg :
>
>> This is not just a detail: O(1) tends to be beat O(logn) pretty easily
>> for large n.
>
> There is no O(1) hash table.
http://stackoverflow.com/questions/2771368/can-hash-tables-really-be-o1
Dan Stromberg :
> This is not just a detail: O(1) tends to be beat O(logn) pretty easily
> for large n.
There is no O(1) hash table.
Marko
On Sun, Mar 9, 2014 at 1:27 AM, Marko Rauhamaa wrote:
> Dan Stromberg :
>
>> On Sat, Mar 8, 2014 at 1:21 PM, Marko Rauhamaa wrote:
>>> If I had to choose between a hash table and AVL (or RB) tree in the
>>> standard library, it would definitely have to be the latter. It is more
>>> generally usab
Dan Stromberg :
> On Sat, Mar 8, 2014 at 1:21 PM, Marko Rauhamaa wrote:
>> If I had to choose between a hash table and AVL (or RB) tree in the
>> standard library, it would definitely have to be the latter. It is more
>> generally usable, has fewer corner cases and probably has an equal
>> perfor
Roy Smith :
> Marko Rauhamaa wrote:
>
>> If I had to choose between a hash table and AVL (or RB) tree in the
>> standard library, it would definitely have to be the latter. It is more
>> generally usable, has fewer corner cases and probably has an equal
>> performance even in hash tables' sweet
On Sat, Mar 8, 2014 at 1:21 PM, Marko Rauhamaa wrote:
> If I had to choose between a hash table and AVL (or RB) tree in the
> standard library, it would definitely have to be the latter. It is more
> generally usable, has fewer corner cases and probably has an equal
> performance even in hash tabl
In article <87eh2ctmht@elektro.pacujo.net>,
Marko Rauhamaa wrote:
> If I had to choose between a hash table and AVL (or RB) tree in the
> standard library, it would definitely have to be the latter. It is more
> generally usable, has fewer corner cases and probably has an equal
> performance
Mark Lawrence :
> I believe that there are advantages to leaving specialist data
> structures on pypi or other sites, plus it means Python in a Nutshell
> can still fit in your pocket and not a 40 ton articulated lorry,
> unlike the Java equivalent.
An ordered map is a foundational data structure
On 08/03/2014 19:58, Dan Stromberg wrote:
On Sat, Mar 8, 2014 at 12:34 AM, Marko Rauhamaa wrote:
Ian Kelly :
I already mentioned this earlier in the thread, but a balanced binary
tree might implement += as node insertion and then return a different
object if the balancing causes the root node to change.
On Sat, Mar 8, 2014 at 12:34 AM, Marko Rauhamaa wrote:
> Ian Kelly :
>
>> I already mentioned this earlier in the thread, but a balanced binary
>> tree might implement += as node insertion and then return a different
>> object if the balancing causes the root node to change.
>
> True.
>
> Speaking of which, are there plans to add a balanced tree to the
> "batteries" of Python?
Ian Kelly :
> Peeking at the code, it appears to use a heapq-based priority queue.
> Why would a balanced binary tree be better?
AFAIK, a heap queue doesn't allow for the deletion of a random element
forcing you to leave the canceled timers in the queue to be deleted
later.
In a very typical sce
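For comparison with the AVL approach, the usual heapq workaround for exactly this problem is lazy cancellation: mark the cancelled entry and skip it when it surfaces, the pattern described in the heapq module documentation. A minimal sketch (not Marko's code):

    import heapq
    import itertools

    class TimerQueue:
        """Timers on a heap, with O(log n) schedule and lazy cancel."""

        def __init__(self):
            self._heap = []
            self._entries = {}                 # timer id -> live heap entry
            self._ids = itertools.count()      # tie-breaker; callbacks never compared

        def schedule(self, when, callback):
            timer_id = next(self._ids)
            entry = [when, timer_id, callback]
            self._entries[timer_id] = entry
            heapq.heappush(self._heap, entry)
            return timer_id

        def cancel(self, timer_id):
            # heapq cannot delete an arbitrary element, so mark the entry
            # dead and skip it when it eventually reaches the top.
            entry = self._entries.pop(timer_id, None)
            if entry is not None:
                entry[2] = None

        def pop_due(self, now):
            """Pop and return the callbacks of all live timers due by `now`."""
            due = []
            while self._heap and self._heap[0][0] <= now:
                _when, timer_id, callback = heapq.heappop(self._heap)
                if callback is not None:       # skip cancelled tombstones
                    self._entries.pop(timer_id, None)
                    due.append(callback)
            return due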
On Sat, Mar 8, 2014 at 1:34 AM, Marko Rauhamaa wrote:
> Speaking of which, are there plans to add a balanced tree to the
> "batteries" of Python? Timers, cache aging and the like need it. I'm
> using my own AVL tree implementation, but I'm wondering why Python
> still doesn't have one.
None curre