On 14 May 2018 at 13:53, Chris Angelico <ros...@gmail.com> wrote:
> On Mon, May 14, 2018 at 10:49 PM, Paul Moore <p.f.mo...@gmail.com> wrote:
>> On 14 May 2018 at 13:27, Chris Angelico <ros...@gmail.com> wrote:
>>> On Mon, May 14, 2018 at 9:59 PM, Paul Moore <p.f.mo...@gmail.com> wrote:
>>>> The problem is that supplying cum_weights allows the code to run in
>>>> O(log n) by using bisection. This is significantly faster on large
>>>> populations. Adding a test that the cumulative weights are
>>>> nondecreasing would add an O(n) step to the code.
>>>>
>>>
>>> Hang on - are the 'n' and 'log n' there referring to the same n?
>>
>> Yes. The number of elements in the sample population (which is the
>> same as the number of entries in the weights/cum_weights arrays).
>
> Okay, cool. Thanks. I was a little confused as to whether the weights
> were getting grouped up or not. Have seen too many cases where someone
> panics about an O(n²) on a tiny n that's unrelated to the important
> O(n) on a huge n :)
Yeah, for all of *my* uses of the functions in random, n is so small as
to make all this irrelevant. But when I looked into how cum_weights
worked, I realised it's aimed at people passing significant-sized data
sets, and they would probably be hit hard by a change from O(log n) to
O(n). (A rough sketch of the bisection approach is below, for anyone
curious.)

One thing I always liked about C++ was the way the standard library
documented the big-O complexity of its operations. Not only did that
make it easier to know what was costly and what wasn't, it also made it
much clearer which functions were intended for use on large data sets.
I sort of miss that information in Python - not least because functions
like random.choices are often a lot faster than I'd naively expect.

Paul
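To make the O(log n) point concrete, here's a minimal sketch of the
cum_weights/bisection idea. It's my own illustration of the technique,
not the actual CPython implementation of random.choices:

    import random
    from bisect import bisect          # bisect_right
    from itertools import accumulate

    def weighted_choice(population, weights):
        """Pick one element with probability proportional to its weight."""
        # Building the cumulative weights is O(n), but only needs doing
        # once; random.choices lets you pass cum_weights directly to
        # skip this step on repeated calls.
        cum_weights = list(accumulate(weights))
        total = cum_weights[-1]
        # random() is in [0.0, 1.0), so x < total and the bisect index
        # is always a valid position. The lookup itself is O(log n).
        x = random.random() * total
        return population[bisect(cum_weights, x)]

    # Roughly 5/7 red, 1/7 green, 1/7 blue
    print(weighted_choice(['red', 'green', 'blue'], [5, 1, 1]))

Checking that a user-supplied cum_weights list really is nondecreasing
would mean walking all n entries, which is the O(n) validation cost
being discussed above.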