FYI Julia (on M1 MBP) seems much faster: julia> @which factorial(big(100000000)) factorial(x::BigInt) in Base.GMP at gmp.jl:645
julia> @time begin; factorial(big(100000000)); 1; end 27.849116 seconds (1.39 M allocations: 11.963 GiB, 0.22% gc time) Probably they use Schönhage-Strassen multiplication algorithm for very large numbers as the 1E8! result will have over a 3/4 billion digits. I should try this in Gambit-Scheme (which has an excellent multiply implementation). > On Jan 12, 2024, at 9:32 PM, Rob Pike <r...@golang.org> wrote: > > Thanks for the tip. A fairly straightforward implementation of this algorithm > gives me about a factor of two speedup for pretty much any value. I went up > to 1e8!, which took about half an hour compared to nearly an hour for > MulRange. > > I'll probably stick in ivy after a little more tuning. I may even try > parallelization. > > -rob > > > On Tue, Jan 9, 2024 at 4:54 PM Bakul Shah <ba...@iitbombay.org > <mailto:ba...@iitbombay.org>> wrote: >> For that you may wish to explore Peter Luschny's "prime swing" factorial >> algorithm and variations! >> https://oeis.org/A000142/a000142.pdf >> >> And implementations in various languages including go: >> https://github.com/PeterLuschny/Fast-Factorial-Functions >> >>> On Jan 8, 2024, at 9:22 PM, Rob Pike <r...@golang.org >>> <mailto:r...@golang.org>> wrote: >>> >>> Here's an example where it's the bottleneck: ivy factorial >>> >>> >>> !1e7 >>> 1.20242340052e+65657059 >>> >>> )cpu >>> 1m10s (1m10s user, 167.330ms sys) >>> >>> >>> -rob >>> >>> >>> On Tue, Jan 9, 2024 at 2:21 PM Bakul Shah <ba...@iitbombay.org >>> <mailto:ba...@iitbombay.org>> wrote: >>>> Perhaps you were thinking of this? >>>> >>>> At iteration number k, the value xk contains O(klog(k)) digits, thus the >>>> computation of xk+1 = kxk has cost O(klog(k)). Finally, the total cost >>>> with this basic approach is O(2log(2)+¼+n log(n)) = O(n2log(n)). >>>> A better approach is the binary splitting : it just consists in >>>> recursively cutting the product of m consecutive integers in half. It >>>> leads to better results when products on large integers are performed with >>>> a fast method. >>>> >>>> http://numbers.computation.free.fr/Constants/Algorithms/splitting.html >>>> >>>> I think you can do recursive splitting without using function recursion by >>>> allocating N/2 array (where b = a+N-1) and iterating over it; each time >>>> the array "shrinks" by half. A "cleverer" algorithm would allocate an >>>> array of *words* of a bignum, as you know that the upper limit on size is >>>> N*64 (for 64 bit numbers) so you can just reuse the same space for each >>>> outer iteration (N/2 multiplie, N/4 ...) and apply Karatsuba 2nd outer >>>> iteration onwards. Not sure if this is easy in Go. >>>> >>>>> On Jan 8, 2024, at 11:47 AM, Robert Griesemer <g...@golang.org >>>>> <mailto:g...@golang.org>> wrote: >>>>> >>>>> Hello John; >>>>> >>>>> Thanks for your interest in this code. >>>>> >>>>> In a (long past) implementation of the factorial function, I noticed that >>>>> computing a * (a+1) * (a+2) * ... (b-1) * b was much faster when computed >>>>> in a recursive fashion than when computed iteratively: the reason (I >>>>> believed) was that the iterative approach seemed to produce a lot more >>>>> "internal fragmentation", that is medium-size intermediate results where >>>>> the most significant word (or "limb" as is the term in other >>>>> implementations) is only marginally used, resulting in more work than >>>>> necessary if those words were fully used. >>>>> >>>>> I never fully investigated, it was enough at the time that the recursive >>>>> approach was much faster. In retrospect, I don't quite believe my own >>>>> theory. Also, that implementation didn't have Karatsuba multiplication, >>>>> it just used grade-school multiplication. >>>>> >>>>> Since a, b are uint64 values (words), this could probably be implemented >>>>> in terms of mulAddVWW directly, with a suitable initial allocation for >>>>> the result - ideally this should just need one allocation (not sure how >>>>> close we can get to the right size). That would cut down the allocations >>>>> massively. >>>>> >>>>> In a next step, one should benchmark the implementation again. >>>>> >>>>> But at the very least, the overflow bug should be fixed, thanks for >>>>> finding it! I will send out a CL to fix that today. >>>>> >>>>> Thanks, >>>>> - gri >>>>> >>>>> >>>>> >>>>> On Sun, Jan 7, 2024 at 4:47 AM John Jannotti <janno...@gmail.com >>>>> <mailto:janno...@gmail.com>> wrote: >>>>>> Actually, both implementations have bugs! >>>>>> >>>>>> The recursive implementation ends with: >>>>>> ``` >>>>>> m := (a + b) / 2 >>>>>> return z.mul(nat(nil).mulRange(a, m), nat(nil).mulRange(m+1, b)) >>>>>> ``` >>>>>> >>>>>> That's a bug whenever `(a+b)` overflows, making `m` small. >>>>>> FIX: `m := a + (b-a)/2` >>>>>> >>>>>> My iterative implementation went into an infinite loop here: >>>>>> `for m := a + 1; m <= b; m++ {` >>>>>> if b is `math.MaxUint64` >>>>>> FIX: add `&& m > a` to the exit condition is an easy fix, but pays a >>>>>> small penalty for the vast majority of calls that don't have b=MaxUint64 >>>>>> >>>>>> I would add these to `mulRangesN` of the unit test: >>>>>> ``` >>>>>> {math.MaxUint64 - 3, math.MaxUint64 - 1, >>>>>> "6277101735386680760773248120919220245411599323494568951784"}, >>>>>> {math.MaxUint64 - 3, math.MaxUint64, >>>>>> "115792089237316195360799967654821100226821973275796746098729803619699194331160"} >>>>>> ``` >>>>>> >>>>>> On Sun, Jan 7, 2024 at 6:34 AM John Jannotti <janno...@gmail.com >>>>>> <mailto:janno...@gmail.com>> wrote: >>>>>>> I'm equally curious. >>>>>>> >>>>>>> FWIW, I realized the loop should perhaps be >>>>>>> ``` >>>>>>> mb := nat(nil).setUint64(b) // ensure mb starts big enough for b, even >>>>>>> on 32-bit arch >>>>>>> for m := a + 1; m <= b; m++ { >>>>>>> mb.setUint64(m) >>>>>>> z = z.mul(z, mb) >>>>>>> } >>>>>>> ``` >>>>>>> to avoid allocating repeatedly for `m`, which yields: >>>>>>> BenchmarkIterativeMulRangeN-10 354685 3032 ns/op 2129 B/op >>>>>>> 48 allocs/op >>>>>>> >>>>>>> On Sun, Jan 7, 2024 at 2:41 AM Rob Pike <r...@golang.org >>>>>>> <mailto:r...@golang.org>> wrote: >>>>>>>> It seems reasonable but first I'd like to understand why the recursive >>>>>>>> method is used. I can't deduce why, but the CL that adds it, by gri, >>>>>>>> does Karatsuba multiplication, which implies something deep is going >>>>>>>> on. I'll add him to the conversation. >>>>>>>> >>>>>>>> -rob >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Sun, Jan 7, 2024 at 5:46 PM John Jannotti <janno...@gmail.com >>>>>>>> <mailto:janno...@gmail.com>> wrote: >>>>>>>>> I enjoy bignum implementations, so I was looking through nat.go and >>>>>>>>> saw that `mulRange` is implemented in a surprising, recursive way,. >>>>>>>>> In the non-base case, `mulRange(a, b)` returns `mulrange(a, (a+b)/2) >>>>>>>>> * mulRange(1+(a+b)/2, b)` (lots of big.Int ceremony elided). >>>>>>>>> >>>>>>>>> That's fine, but I didn't see any advantage over the straightforward >>>>>>>>> (and simpler?) for loop. >>>>>>>>> >>>>>>>>> ``` >>>>>>>>> z = z.setUint64(a) >>>>>>>>> for m := a + 1; m <= b; m++ { >>>>>>>>> z = z.mul(z, nat(nil).setUint64(m)) >>>>>>>>> } >>>>>>>>> return z >>>>>>>>> ``` >>>>>>>>> >>>>>>>>> In fact, I suspected the existing code was slower, and allocated a >>>>>>>>> lot more. That seems true. A quick benchmark, using the existing >>>>>>>>> unit test as the benchmark, yields >>>>>>>>> BenchmarkRecusiveMulRangeN-10 169417 6856 >>>>>>>>> ns/op 9452 B/op 338 allocs/op >>>>>>>>> BenchmarkIterativeMulRangeN-10 265354 4269 >>>>>>>>> ns/op 2505 B/op 196 allocs/op >>>>>>>>> >>>>>>>>> I doubt `mulRange` is a performance bottleneck in anyone's code! But >>>>>>>>> it is exported as `int.MulRange` so I guess it's viewed with some >>>>>>>>> value. And seeing as how the for-loop seems even easier to >>>>>>>>> understand that the recursive version, maybe it's worth submitting a >>>>>>>>> PR? (If so, should I create an issue first?) >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> You received this message because you are subscribed to the Google >>>>>>>>> Groups "golang-nuts" group. >>>>>>>>> To unsubscribe from this group and stop receiving emails from it, >>>>>>>>> send an email to golang-nuts+unsubscr...@googlegroups.com >>>>>>>>> <mailto:golang-nuts+unsubscr...@googlegroups.com>. >>>>>>>>> To view this discussion on the web visit >>>>>>>>> https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com >>>>>>>>> >>>>>>>>> <https://groups.google.com/d/msgid/golang-nuts/e6ceb75a-f8b7-4f77-97dc-9445fb750782n%40googlegroups.com?utm_medium=email&utm_source=footer>. >>>>> >>>>> >>>>> -- >>>>> You received this message because you are subscribed to the Google Groups >>>>> "golang-nuts" group. >>>>> To unsubscribe from this group and stop receiving emails from it, send an >>>>> email to golang-nuts+unsubscr...@googlegroups.com >>>>> <mailto:golang-nuts+unsubscr...@googlegroups.com>. >>>>> To view this discussion on the web visit >>>>> https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com >>>>> >>>>> <https://groups.google.com/d/msgid/golang-nuts/CAKy0tf7Lcd8hiF2Qv3NFfjGcfvXDn%2BA%2BxJ1bfKta1w9P-OAs%3Dw%40mail.gmail.com?utm_medium=email&utm_source=footer>. >>>> >> -- You received this message because you are subscribed to the Google Groups "golang-nuts" group. To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/B219273B-D067-4D57-9280-EEAB297778D7%40iitbombay.org.