On Sun, Feb 12, 2017 at 10:35:04AM +0000, Dean Rasheed wrote: > On 11 February 2017 at 01:17, Tomas Vondra <tomas.von...@2ndquadrant.com> > wrote: > > Thanks for the feedback, I'll fix this. I've allowed myself to be a bit > > sloppy because the number of attributes in the statistics is currently > > limited to 8, so the overflows are currently not an issue. But it doesn't > > hurt to make it future-proof, in case we change that mostly artificial limit > > sometime in the future. > > > > Ah right, so it can't overflow at present, but it's neater to have an > overflow-proof algorithm. > > Thinking about the exactness of the division steps is quite > interesting. Actually, the order of the multiplying factors doesn't > matter as long as the divisors are in increasing order. So in both my > proposal: > > result = 1 > for (i = 1; i <= k; i++) > result = (result * (n-k+i)) / i; > > and David's proposal, which is equivalent but has the multiplying > factors in the opposite order, equivalent to: > > result = 1 > for (i = 1; i <= k; i++) > result = (result * (n-i+1)) / i; > > the divisions are exact at each step. The first time through the loop > it divides by 1 which is trivially exact. The second time it divides > by 2, having multiplied by 2 consecutive factors, one of which is > therefore guaranteed to be divisible by 2. The third time it divides > by 3, having multiplied by 3 consecutive factors, one of which is > therefore guaranteed to be divisible by 3, and so on.
Right. You know you can use integer division, which make sense as permutations of discrete sets are always integers. > My approach originally seemed more logical to me because of the way it > derives from the recurrence relation binomial(n, k) = binomial(n-1, > k-1) * n / k, but they both work fine as long as they have suitable > overflow checks. Right. We could even cache those checks (sorry) based on data type limits by architecture and OS if performance on those operations ever matters that much. > It's also interesting that descriptions of this algorithm tend to > talk about setting k to min(k, n-k) at the start as an optimisation > step, as I did in fact, whereas it's actually more than that -- it > helps prevent unnecessary intermediate overflows when k > n/2. Of > course, that's not a worry for the current use of this function, but > it's good to have a robust algorithm. Indeed. :) Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers