forgot to edit the subject. Sorry. paul c. On Mon, May 14, 2018 at 12:02 PM, Paul <tallp...@gmail.com> wrote:
> Hello all, > Thanks for the thoughtful (and non-snarky) replies. > > First, a suggestion for a documentation change: > > To this paragraph: > > *If neither weights nor cum_weights are specified, selections are made > with equal probability. If a weights sequence is supplied, it must be the > same length as the population sequence. It is a TypeError > <https://docs.python.org/3/library/exceptions.html#TypeError> to specify > both weights and cum_weights.* > Add this sentence: > > "A cum_weights sequence, if supplied, must be in strictly-ascending order, > else incorrect results will be (silently) returned." > > Secondly, about the cost of verifying the sequence: > > 1) I understand the added cost of verifying the sequence. However, this > appears to be a one-time cost. E.G., if I submit this, > > random.choices(lm,cum_weights=[25,26,36,46,136],k=400 > > then the code will do an O(n log n) operation 400 times. > > If verification was added, then the the code would do an O(n log n) > operation 400 times, plus an O(n) operation done *one* time. So, I'm not > sure that this would be a significant efficiency hit (except in rare cases). > > 2) Paul Moore wrote: > > > So the people who *really* need cum_weights are those > > > who have the cumulative weights already, and cannot > > > afford an O(n)precalculation step. > > > I agree that with the "already have the cum_weights" argument. Based on > my point #1, I'm not convinced about the "can't afford" argument. > > 3) A minor point. The documentation also says: "so supplying the > cumulative weights saves work." However, this is work done (once, as noted > above) by a computer rather than work done (even if aided by a a computer) > by a human, so I'd vote for having the computer do it. :) > > > To conclude, I would still lean slightly toward having the code enforce > the 'strictly-ascending sequence' requirement. However, given that a) > improving the documentation is much more doable and that, b) in some cases, > the addition of an order O(n) step might be significant, I'd be more than > happy if the documentation could be improved (as suggested). > > thanks > Paul Czyzewki > > PS. I see the issue which steven.daprano opened. Thanks, Steven. > However, I'm not sure what's appropriate in terms of updating that issue, > or even if I have permission to update it, so I'd appreciate if someone > would add this response to the issue. Thanks. > > > >> From: Paul Moore <p.f.mo...@gmail.com> >> To: "Steven D'Aprano" <steve+comp.lang.pyt...@pearwood.info> >> Cc: Python <python-list@python.org> >> Bcc: >> Date: Mon, 14 May 2018 14:35:34 +0100 >> Subject: Re: random.choices() Suggest that the code confirm that >> cum_weights sequence is in ascending order >> On 14 May 2018 at 14:07, Steven D'Aprano >> <steve+comp.lang.pyt...@pearwood.info> wrote: >> > On Mon, 14 May 2018 12:59:28 +0100, Paul Moore wrote: >> > >> >> The problem is that supplying cum_weights allows the code to run in >> >> O(log n) by using bisection. This is significantly faster on large >> >> populations. Adding a test that the cumulative weights are >> nondecreasing >> >> would add an O(n) step to the code. >> >> >> >> So while I understand the OP's problem, I don't think it's soluble >> >> without making the cum_weights argument useless in practice. >> > >> > How does O(N) make it "useless"? There are lots of O(N) algorithms, even >> > O(N**2) and O(2**N) which are nevertheless still useful. >> >> Well, I've never seen an actual use case for this argument (I can't >> think of a case where I'd even have cumulative weights rather than >> weights, and obviously calculating the cumulative weights from the >> actual weights is what we're trying to avoid). And if you have >> cum_weights and O(n) is fine, then calculating weights from >> cum_weights is acceptable (although pointless, as it simply duplicates >> work). So the people who *really* need cum_weights are those who have >> the cumulative weights already, and cannot afford an O(n) >> precalculation step. >> >> But yes, clearly in itself an O(n) algorithm isn't useless. And >> agreed, in most cases whether random.choices() is O(n) or O(log n) is >> irrelevant in practice. >> >> Paul >> > -- https://mail.python.org/mailman/listinfo/python-list